analysis code_parser
Multi-language source code parser using tree-sitter. Loads grammar modules, extracts function/class names and parameters from ASTs, and supports 14 languages.
| Term | Definition | Example |
|---|---|---|
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| tree-sitter | A fast, multi-language parser that builds ASTs. Supports 100+ languages without needing each language's compiler. | tree-sitter parses config.py into an AST, then we extract function/class nodes from it. |
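
To make the AST row above concrete, here is a small sketch that uses Python's built-in ast module (not tree-sitter, which this parser actually relies on) to inspect the same one-line function. The node names it prints are those of Python's own AST, which is where FunctionDef, Return and BinOp come from.

```python
import ast

# Parse the one-line function from the table into Python's own AST.
source = "def add(a, b): return a+b"
tree = ast.parse(source)

func = tree.body[0]                          # ast.FunctionDef
arg_names = [a.arg for a in func.args.args]  # positional parameter names
ret = func.body[0]                           # ast.Return
binop = ret.value                            # ast.BinOp (a + b)

print(type(func).__name__, func.name, arg_names)
print(type(ret).__name__, type(binop).__name__, type(binop.op).__name__)
```

tree-sitter produces a comparable tree but with its own node type names (for example function_definition instead of FunctionDef), which is why the per-language node-type mapping described below is needed.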
Source: src/codewalk/analysis/code_parser.py
Maps language name → tree-sitter grammar module name (the importable name; the pip packages use hyphens, e.g. tree-sitter-python):
{
"python": "tree_sitter_python",
"javascript": "tree_sitter_javascript",
"typescript": "tree_sitter_typescript",
"dart": "tree_sitter_dart",
... # 14 languages total
}
Module-level dict that caches loaded Language objects so each grammar is loaded only once.
Per-language mapping of which AST node types represent functions vs classes, and which child fields hold the name and parameters. Example for Python:
{
"function": ["function_definition"],
"class": ["class_definition"],
"name_field": "name",
"params_field": "parameters"
}
Loads a tree-sitter Language object for a given language name. Returns from cache if already loaded, otherwise imports the grammar module and creates the Language.
Input language: "python"
Line 107: "python" in _language_cache → False (first call)
Line 110: model_name = GRAMMAR_MAP.get("python") → "tree_sitter_python"
Line 112: model_name is not None → skip the return None
Line 115: grammar_module = importlib.import_module("tree_sitter_python")
Line 117: language == "typescript"? No
Line 119: language == "php"? No
Line 121: lang = Language(grammar_module.language()) — calls the grammar's language() C function
Line 123: _language_cache["python"] = lang — cached for next time
Return: <Language object for Python>
Second call with "python" → Line 107: cache hit → returns immediately.
- "typescript" → calls grammar_module.language_typescript() (has separate TS + TSX grammars)
- "php" → calls grammar_module.language_php()
- Unknown language → returns None
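
The lookup, special-casing and caching steps above can be sketched as follows. This is a minimal illustration, not the code in code_parser.py: the import_module and wrap parameters are hypothetical hooks added so the sketch runs without tree-sitter installed, the "php" entry in GRAMMAR_MAP is assumed, and in the real module wrap would correspond to Language(...).

```python
import importlib
from types import SimpleNamespace

GRAMMAR_MAP = {
    "python": "tree_sitter_python",
    "typescript": "tree_sitter_typescript",
    "php": "tree_sitter_php",  # assumed entry, not shown in the excerpt above
}
_language_cache = {}


def get_language(language, import_module=importlib.import_module, wrap=lambda raw: raw):
    """Return a cached language object, loading its grammar on first use."""
    if language in _language_cache:
        return _language_cache[language]
    module_name = GRAMMAR_MAP.get(language)
    if module_name is None:
        return None  # unsupported language
    grammar_module = import_module(module_name)
    # Some grammar packages expose several entry points instead of language().
    if language == "typescript":
        raw = grammar_module.language_typescript()
    elif language == "php":
        raw = grammar_module.language_php()
    else:
        raw = grammar_module.language()
    lang = wrap(raw)  # stands in for tree_sitter.Language(raw)
    _language_cache[language] = lang
    return lang


# Demo with a fake importer that records which modules were requested.
calls = []

def fake_import(name):
    calls.append(name)
    return SimpleNamespace(
        language=lambda: f"{name}:default",
        language_typescript=lambda: f"{name}:typescript",
        language_php=lambda: f"{name}:php",
    )

first = get_language("python", import_module=fake_import)
second = get_language("python", import_module=fake_import)  # served from cache
ts = get_language("typescript", import_module=fake_import)
unknown = get_language("cobol", import_module=fake_import)
print(first, ts, unknown, calls)
```

Because the second "python" call is answered from _language_cache, the fake importer is invoked only once per language.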
Creates a tree-sitter Parser loaded with the grammar for the given language.
Input language: "dart"
Line 131: lang = get_language("dart") → loads and returns the Dart Language object
Line 133: lang is not None → skip return None
Line 135: Creates Parser(lang) and returns it
Return: <Parser object configured for Dart>
If language is unsupported → get_language returns None → this returns None.
Pulls the function/class name out of an AST node by looking up a named child field.
Input node: <function_definition node for "def scan_directory(path):">
Input name_field: "name"
Line 138: name_node = node.child_by_field_name("name") → <identifier node "scan_directory">
Line 139: name_node is truthy → Return: name_node.text.decode("utf-8") → "scan_directory"
Input node: <function_definition for "int main(int argc, char *argv[])">
Input name_field: "declarator"
Line 138: name_node = node.child_by_field_name("declarator") → None (C nests the name deeper)
Lines 143–146: Fallback loop that iterates over node.children:
- Finds child.type == "function_declarator" ✓
- inner = child.child_by_field_name("declarator") → <identifier "main">
- Return: "main"
Input node: <method_signature wrapping a function_signature>
Input name_field: "name"
Line 138: Direct lookup fails → None
Lines 143–146: No function_declarator child
Lines 149–153: Finds child with type == "function_signature", looks up "name" inside it → returns the method name
If all fallbacks fail → Return: "<anonymous>"
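
The three lookup paths above can be sketched with a tiny stand-in for tree-sitter nodes. The Node class here is hypothetical and only mimics the attributes the walkthrough uses (type, text, children, child_by_field_name); the function body follows the described order of fallbacks and is not copied from code_parser.py.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node, for illustration only."""
    type: str
    text: bytes = b""
    children: list = field(default_factory=list)
    fields: dict = field(default_factory=dict)

    def child_by_field_name(self, name):
        return self.fields.get(name)


def extract_name(node, name_field):
    # 1. Direct lookup of the configured name field.
    name_node = node.child_by_field_name(name_field)
    if name_node:
        return name_node.text.decode("utf-8")
    # 2. C-style: the name sits inside a function_declarator child.
    for child in node.children:
        if child.type == "function_declarator":
            inner = child.child_by_field_name("declarator")
            if inner:
                return inner.text.decode("utf-8")
    # 3. Dart-style: a method_signature wraps a function_signature.
    for child in node.children:
        if child.type == "function_signature":
            inner = child.child_by_field_name("name")
            if inner:
                return inner.text.decode("utf-8")
    return "<anonymous>"


py_fn = Node("function_definition", fields={"name": Node("identifier", b"scan_directory")})
c_fn = Node("function_definition", children=[
    Node("function_declarator", fields={"declarator": Node("identifier", b"main")}),
])
dart_fn = Node("method_signature", children=[
    Node("function_signature", fields={"name": Node("identifier", b"build")}),
])

print(extract_name(py_fn, "name"))           # scan_directory
print(extract_name(c_fn, "declarator"))      # main
print(extract_name(dart_fn, "name"))         # build
print(extract_name(Node("lambda"), "name"))  # <anonymous>
```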
Pulls parameter names from a function's AST node.
Input node: <function_definition for "def greet(name, age):">
Input params_field: "parameters"
Line 157: params_node = node.child_by_field_name("parameters") → <parameters node "(name, age)">
Line 158: params_node is truthy → skip fallback
Line 167: param_names = []
Lines 169–180: Loop through params_node.children:
- child = "(" → type is "(" → continue
- child = <identifier "name">:
  - Line 174: name_node = child.child_by_field_name("name") → None (it IS the identifier)
  - Line 176: child.type == "identifier" ✓ → param_names.append("name")
- child = "," → continue
- child = <identifier "age">:
  - Same path → param_names.append("age")
- child = ")" → continue
Return: ["name", "age"]
Input node: <function_definition for "def greet(name: str):">
The child is a typed_parameter node (not an identifier). Neither child_by_field_name("name") nor child.type == "identifier" matches.
Lines 179–182: Fallback that iterates over child.children:
- Finds sub.type == "identifier" → param_names.append("name") → break
Return: ["name"]
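
Both walkthroughs above (plain and typed parameters) can be reproduced with the sketch below. It again uses a hypothetical Node stand-in rather than real tree-sitter nodes. Two details are assumptions: the punctuation check is written as a set membership test, and a missing parameters field simply returns an empty list, since the fallback for that case is not described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node, for illustration only."""
    type: str
    text: bytes = b""
    children: list = field(default_factory=list)
    fields: dict = field(default_factory=dict)

    def child_by_field_name(self, name):
        return self.fields.get(name)


PUNCTUATION = {"(", ")", ","}  # assumed form of the "skip punctuation" check


def extract_params(node, params_field):
    params_node = node.child_by_field_name(params_field)
    if not params_node:
        return []  # assumption: the real fallback is not shown in this document
    param_names = []
    for child in params_node.children:
        if child.type in PUNCTUATION:
            continue
        name_node = child.child_by_field_name("name")
        if name_node:
            param_names.append(name_node.text.decode("utf-8"))
        elif child.type == "identifier":
            param_names.append(child.text.decode("utf-8"))
        else:
            # Wrapper nodes such as typed_parameter: take the first identifier.
            for sub in child.children:
                if sub.type == "identifier":
                    param_names.append(sub.text.decode("utf-8"))
                    break
    return param_names


def ident(name):
    return Node("identifier", name.encode("utf-8"))


def function_with(*param_children):
    params = Node("parameters", children=list(param_children))
    return Node("function_definition", fields={"parameters": params})


plain = function_with(Node("("), ident("name"), Node(","), ident("age"), Node(")"))
typed = function_with(
    Node("("),
    Node("typed_parameter", children=[ident("name"), Node(":"), Node("type", b"str")]),
    Node(")"),
)

print(extract_params(plain, "parameters"))  # ['name', 'age']
print(extract_params(typed, "parameters"))  # ['name']
```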
Recursively walks the concrete syntax tree (CST) and yields nodes whose type matches a given set.
Input node: <root of a Python file with two function defs>
Input target_types: {"function_definition", "class_definition"}
Input skip_children_types: None
Line 223: node.type = "module" → not in target_types → skip yield
Lines 228–229: Recurse into each child of module:
- Child 1: <function_definition> → type in target_types ✓ → yield this node
  - Not in skip_children_types → recurse into its children (won't find nested functions in this example)
- Child 2: <function_definition> → yield
Yields: 2 function_definition nodes
For Dart: method_signature contains a function_signature inside it. Without skip_children_types, you'd get both (duplicate). Setting skip_children_types = {"method_signature"} prevents recursing into matched method_signature nodes.
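
The effect of skip_children_types can be seen in a short generator sketch. The function name walk_tree and the Node stand-in are hypothetical; only the behaviour (yield matching nodes, do not descend into matched nodes whose type is in skip_children_types) follows the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node, for illustration only."""
    type: str
    children: list = field(default_factory=list)


def walk_tree(node, target_types, skip_children_types=None):
    """Yield every node whose type is in target_types, depth first."""
    if node.type in target_types:
        yield node
        # Do not descend into matched nodes that are known to wrap another match.
        if skip_children_types and node.type in skip_children_types:
            return
    for child in node.children:
        yield from walk_tree(child, target_types, skip_children_types)


# A Dart-like shape: one method (signature wrapping a signature) and one top-level function.
dart_tree = Node("program", [
    Node("class_body", [
        Node("method_signature", [Node("function_signature")]),
    ]),
    Node("function_signature"),
])
targets = {"method_signature", "function_signature"}

without_skip = [n.type for n in walk_tree(dart_tree, targets)]
with_skip = [n.type for n in walk_tree(dart_tree, targets, {"method_signature"})]

print(without_skip)  # ['method_signature', 'function_signature', 'function_signature']
print(with_skip)     # ['method_signature', 'function_signature']
```

Without the skip set the method is reported twice, once for the wrapper and once for the inner signature; with it, each definition appears once.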
Parses any supported language file and returns a list of function and class definitions with their source code.
Input file_path: "/repo/src/config.py"
Input language: "python"
The file contains:
class Settings:
    host = "localhost"

def load_config(path):
    return Settings()
Line 234: parser = get_parser_for_language("python") → Parser object
Line 239: node_types = NODE_TYPES.get("python") → {"function": ["function_definition"], "class": ["class_definition"], "name_field": "name", "params_field": "parameters"}
Line 245: Reads file as bytes
Line 250: tree = parser.parse(source) → AST
Line 252: lines = source.decode(...).splitlines() → ["class Settings:", '    host = "localhost"', "", "def load_config(path):", "    return Settings()"]
Line 255: function_types = {"function_definition"}
Line 256: class_types = {"class_definition"}
Line 257: all_target_types = {"function_definition", "class_definition"}
Line 261: Walk tree, find matching nodes:
Node 1: class_definition → item_type = "class"
- start_line = 1, end_line = 2
- name = extract_name(node, "name") → "Settings"
- code = "class Settings:\n    host = \"localhost\""
- Appends {"type": "class", "name": "Settings", "start_line": 1, "end_line": 2, "code": "..."}
Node 2: function_definition → item_type = "function"
- start_line = 4, end_line = 5
- name = "load_config"
- code = "def load_config(path):\n    return Settings()"
- args = extract_params(node, "parameters") → ["path"]
- Appends {"type": "function", "name": "load_config", "start_line": 4, "end_line": 5, "code": "...", "args": ["path"]}
Return:
[
{"type": "class", "name": "Settings", "start_line": 1, "end_line": 2, "code": "..."},
{"type": "function", "name": "load_config", "start_line": 4, "end_line": 5, "code": "...", "args": ["path"]},
]
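
The line numbers and code snippets in the result can be derived from a node's position. tree-sitter nodes expose start_point and end_point as 0-based (row, column) pairs, so a plausible way to build each record is sketched below. The helper build_item is hypothetical and the exact arithmetic in code_parser.py may differ.

```python
def build_item(item_type, name, start_point, end_point, lines, args=None):
    """Build one result record from 0-based (row, column) positions."""
    start_line = start_point[0] + 1  # convert 0-based row to 1-based line
    end_line = end_point[0] + 1
    item = {
        "type": item_type,
        "name": name,
        "start_line": start_line,
        "end_line": end_line,
        "code": "\n".join(lines[start_line - 1:end_line]),
    }
    if args is not None:  # only functions carry an "args" key
        item["args"] = args
    return item


source = (
    'class Settings:\n'
    '    host = "localhost"\n'
    '\n'
    'def load_config(path):\n'
    '    return Settings()\n'
)
lines = source.splitlines()

# Positions written by hand here; a real parser would supply node.start_point / node.end_point.
class_item = build_item("class", "Settings", (0, 0), (1, 22), lines)
func_item = build_item("function", "load_config", (3, 0), (4, 21), lines, args=["path"])

print(class_item["start_line"], class_item["end_line"])
print(func_item["start_line"], func_item["end_line"], func_item["args"])
```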