
analysis python_parser

aakash-anko edited this page May 25, 2026 · 1 revision

analysis/python_parser.py

Python-specific parser using the built-in ast module. Extracts functions, classes, decorators, arguments, base classes, and source code segments.


Key Concepts

Term | Definition | Example
--- | --- | ---
AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)].
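You can inspect the tree from the table directly with the standard library's ast module. This minimal sketch uses only stdlib calls and does not depend on codewalk:

```python
import ast

# Parse the one-line function from the table into a Module node.
tree = ast.parse("def add(a, b): return a+b")

func = tree.body[0]                       # FunctionDef for `add`
ret = func.body[0]                        # the Return statement in its body
arg_names = [a.arg for a in func.args.args]

print(type(func).__name__, func.name, arg_names)                       # FunctionDef add ['a', 'b']
print(type(ret).__name__, type(ret.value).__name__, type(ret.value.op).__name__)  # Return BinOp Add

# Full nested structure of the return statement (exact formatting varies by Python version).
print(ast.dump(ret, indent=2))
```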

Source: src/codewalk/analysis/python_parser.py


parse_python_file

Parses a Python file using ast.parse() and extracts all function and class definitions with their metadata.

Example

Input file_path: "/repo/src/codewalk/config.py"

The file contains:

import os

class Settings:
    host = "localhost"
    port = 8080

@staticmethod
def load_config(path: str) -> Settings:
    return Settings()

Line 7: source = Path(file_path).read_text(encoding="utf-8") → the full file text
Line 12: tree = ast.parse(source) → Python AST
Line 15: lines = source.splitlines() → ["import os", "", "class Settings:", '    host = "localhost"', "    port = 8080", "", "@staticmethod", "def load_config(path: str) -> Settings:", "    return Settings()"]
Line 16: items = []

Line 18: ast.walk(tree) — walks every node in the AST in no particular order:

Node: class Settings (ClassDef)

  • Line 28: isinstance(node, ast.ClassDef)
  • Line 29–35:
    • type = "class"
    • name = "Settings"
    • start_line = 3, end_line = 5
    • code = get_source_segment(lines, 3, 5) → "class Settings:\n    host = \"localhost\"\n    port = 8080"
    • bases = [] (no parent classes)
    • methods = [] (no methods inside)

Node: load_config (FunctionDef)

  • Line 19: isinstance(node, ast.FunctionDef)
  • Line 20–27:
    • type = "function"
    • name = "load_config"
    • start_line = 8, end_line = 9 (since Python 3.8, node.lineno for a decorated function points at the def line, not at the @staticmethod decorator on line 7)
    • code = get_source_segment(lines, 8, 9) → lines[7:9] → "def load_config(path: str) -> Settings:\n    return Settings()" (the decorator line is not included)
    • decorators = [get_decorator_name(d) for d in node.decorator_list] → ["staticmethod"]
    • args = [arg.arg for arg in node.args.args] → ["path"]
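You can check the line-number and argument behaviour in the bullets above against the example file with the standard ast module. The second half shows which parameter kinds node.args.args leaves out. This page does not say whether the real parser also reads posonlyargs, kwonlyargs, vararg or kwarg, so the sketch only shows where those fields live:

```python
import ast

SOURCE = '''import os

class Settings:
    host = "localhost"
    port = 8080

@staticmethod
def load_config(path: str) -> Settings:
    return Settings()
'''

tree = ast.parse(SOURCE)
func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))

# Python 3.8+: lineno is the `def` line; the decorator node keeps its own position.
print(func.lineno, func.end_lineno)      # 8 9
print(func.decorator_list[0].lineno)     # 7

# node.args.args covers only ordinary (positional-or-keyword) parameters.
print([a.arg for a in func.args.args])   # ['path']

# Other parameter kinds live in separate fields of ast.arguments.
sig = ast.parse("def f(a, /, b, *rest, c, **kw): pass").body[0].args
print([a.arg for a in sig.args])         # ['b']
print([a.arg for a in sig.posonlyargs], sig.vararg.arg,
      [a.arg for a in sig.kwonlyargs], sig.kwarg.arg)   # ['a'] rest ['c'] kw
```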

Return:

[
    {
        "type": "class",
        "name": "Settings",
        "start_line": 3,
        "end_line": 5,
        "code": "class Settings:\n    host = \"localhost\"\n    port = 8080",
        "bases": [],
        "methods": [],
    },
    {
        "type": "function",
        "name": "load_config",
        "start_line": 8,
        "end_line": 9,
        "code": "def load_config(path: str) -> Settings:\n    return Settings()",
        "decorators": ["staticmethod"],
        "args": ["path"],
    },
]
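The walkthrough above can be condensed into a minimal sketch. It is a reconstruction based on this page, not the actual source of python_parser.py. A few parts are assumptions:

- It uses the standard ast.unparse (Python 3.9+) in place of get_name and get_decorator_name. For the plain and dotted names on this page, that gives the same strings.
- It handles only FunctionDef and ClassDef, as the walkthrough does.
- The "methods" list is filled with hypothetical method names, because the page only shows an empty list.

```python
import ast
import tempfile
from pathlib import Path


def _segment(lines, node):
    # Same 1-indexed -> 0-indexed conversion as get_source_segment.
    return "\n".join(lines[node.lineno - 1:node.end_lineno])


def parse_python_file(file_path):
    """Sketch: collect every function and class (nested ones too) with metadata."""
    source = Path(file_path).read_text(encoding="utf-8")
    tree = ast.parse(source)
    lines = source.splitlines()
    items = []

    for node in ast.walk(tree):  # visits every node; order is not guaranteed
        if isinstance(node, ast.FunctionDef):
            items.append({
                "type": "function",
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "code": _segment(lines, node),
                # ast.unparse stands in for get_decorator_name; a call keeps only its callee.
                "decorators": [
                    ast.unparse(d.func if isinstance(d, ast.Call) else d)
                    for d in node.decorator_list
                ],
                "args": [arg.arg for arg in node.args.args],
            })
        elif isinstance(node, ast.ClassDef):
            items.append({
                "type": "class",
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "code": _segment(lines, node),
                "bases": [ast.unparse(b) for b in node.bases],  # stands in for get_name
                # Hypothetical: names of methods defined directly in the class body.
                "methods": [n.name for n in node.body if isinstance(n, ast.FunctionDef)],
            })
    return items


EXAMPLE = '''import os

class Settings:
    host = "localhost"
    port = 8080

@staticmethod
def load_config(path: str) -> Settings:
    return Settings()
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.py"
    path.write_text(EXAMPLE, encoding="utf-8")
    items = parse_python_file(path)

for item in items:
    print(item["type"], item["name"], item["start_line"], item["end_line"])
```

Because ast.walk also descends into class bodies, a real method would appear as its own "function" item in addition to any entry in "methods".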

get_source_segment

Extracts source code lines from a list, converting from 1-indexed line numbers to 0-indexed list indices.

Example

Input lines: ["import os", "", "class Settings:", "    host = \"localhost\"", "    port = 8080"]
Input start: 3
Input end: 5

Line 39: lines[3-1 : 5] → lines[2:5] → ["class Settings:", '    host = "localhost"', "    port = 8080"]
Line 39: "\n".join(...) → Return: "class Settings:\n    host = \"localhost\"\n    port = 8080"
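The slice on line 39 amounts to a one-line helper. This is a sketch inferred from the lines[start-1 : end] expression above; the real function may differ in type hints or edge-case handling:

```python
def get_source_segment(lines: list[str], start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) joined with newlines."""
    # start - 1 converts to a 0-indexed position; `end` needs no adjustment
    # because the slice stop is exclusive.
    return "\n".join(lines[start - 1:end])


lines = ["import os", "", "class Settings:", '    host = "localhost"', "    port = 8080"]
print(get_source_segment(lines, 3, 5))
```

Despite the shared name, the standard library's ast.get_source_segment(source, node) (Python 3.8+) is a different function. It takes the full source string and a node, and it also uses column offsets.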


get_decorator_name

Extracts a decorator's name from an AST node. Handles simple names, dotted names, and call-style decorators.

Example 1: Simple decorator

Input node: <ast.Name id="staticmethod">

Line 43: isinstance(node, ast.Name) ✓ → Return: "staticmethod"

Example 2: Dotted decorator

Input node: <ast.Attribute value=<Name "app">, attr="route">

Line 45: isinstance(node, ast.Attribute)
Line 46: get_name(node.value) → "app", node.attr → "route"
Return: "app.route"

Example 3: Call decorator @app.route("/api")

Input node: <ast.Call func=<Attribute "app.route">>

Line 47: isinstance(node, ast.Call)
Line 48: get_decorator_name(node.func) → recurses into the Attribute case → Return: "app.route"
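The three cases above can be sketched as follows. Name, Attribute and Call are the branches this page documents. The ast.unparse fallback for any other node type is an assumption, because the page does not say what the real helper does there. A minimal get_name is included so the block runs on its own:

```python
import ast


def get_name(node: ast.AST) -> str:
    """Dotted name for Name / Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{get_name(node.value)}.{node.attr}"
    return ast.unparse(node)  # assumption: fallback for other node types


def get_decorator_name(node: ast.AST) -> str:
    if isinstance(node, ast.Name):        # @staticmethod
        return node.id
    if isinstance(node, ast.Attribute):   # @app.route
        return f"{get_name(node.value)}.{node.attr}"
    if isinstance(node, ast.Call):        # @app.route("/api"): drop the call arguments
        return get_decorator_name(node.func)
    return ast.unparse(node)              # assumption: fallback for other node types


SOURCE = '''
@staticmethod
@app.route
@app.route("/api")
def handler():
    pass
'''

# ast.parse does not resolve names, so the undefined `app` is fine here.
func = ast.parse(SOURCE).body[0]
names = [get_decorator_name(d) for d in func.decorator_list]
print(names)  # ['staticmethod', 'app.route', 'app.route']
```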


get_name

Extracts a dotted name string from an AST node. Used by both get_decorator_name and the bases extraction for classes.

Example 1: Simple name

Input node: <ast.Name id="Settings">

Line 52: isinstance(node, ast.Name) ✓ → Return: "Settings"

Example 2: Dotted name

Input node: <ast.Attribute value=<Attribute value=<Name "src">, attr="codewalk">, attr="config">

Line 54: isinstance(node, ast.Attribute)
Line 55: get_name(node.value) → recurses:

  • node.value is <Attribute value=<Name "src">, attr="codewalk">
  • get_name(<Name "src">) → "src"
  • Combined with attr "codewalk" → returns "src.codewalk"

Back in the outer call: f"{get_name(node.value)}.{node.attr}" combines "src.codewalk" with "config" → Return: "src.codewalk.config"
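The recursion traced above can be reproduced with this tracing variant, applied to a class base as in the bases extraction. The depth parameter and the print calls exist only to show each step; they are hypothetical additions, not part of the real helper. The ast.unparse fallback is likewise an assumption:

```python
import ast


def get_name(node: ast.AST, depth: int = 0) -> str:
    """Dotted name for Name/Attribute nodes, printing each recursive step."""
    indent = "  " * depth
    if isinstance(node, ast.Name):
        print(f"{indent}Name -> {node.id!r}")
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = get_name(node.value, depth + 1)  # resolve the left-hand side first
        result = f"{prefix}.{node.attr}"
        print(f"{indent}Attribute(attr={node.attr!r}) -> {result!r}")
        return result
    return ast.unparse(node)  # assumption: fallback for other node types


# A class whose base is the dotted name from the example above.
cls = ast.parse("class Settings(src.codewalk.config): pass").body[0]
bases = [get_name(b) for b in cls.bases]
print(bases)
# Printed trace (innermost call first):
#     Name -> 'src'
#   Attribute(attr='codewalk') -> 'src.codewalk'
# Attribute(attr='config') -> 'src.codewalk.config'
# ['src.codewalk.config']
```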
