analysis dependency_graph
Extracts import statements from source files using tree-sitter, resolves them to actual file paths in the repo, and builds a file-level dependency graph.
| Term | Definition | Example |
|---|---|---|
| edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py. |
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| tree-sitter | A fast, multi-language parser that builds ASTs. Supports 100+ languages without needing each language's compiler. | tree-sitter parses config.py into an AST, then we extract function/class nodes from it. |
Source: src/codewalk/analysis/dependency_graph.py
IMPORT_NODE_TYPES maps each language to the AST node types that represent import statements:
{
"python": ["import_statement", "import_from_statement"],
"javascript": ["import_statement", "call_expression"], # call_expression for require()
"dart": ["import_or_export"],
"java": ["import_declaration"],
"go": ["import_declaration", "import_spec"],
"rust": ["use_declaration"],
... # 14 languages
}

extract_imports parses a file with tree-sitter and returns a list of raw import strings (not yet resolved to file paths).
Input file_path: "/repo/src/codewalk/pipeline.py"
Input language: "python"
The file contains:
import os
from src.codewalk.config import Settings

Line 37: "python" in IMPORT_NODE_TYPES ✓
Line 40: parser = get_parser_for_language("python") → Parser object
Line 45: source = Path(file_path).read_bytes() → raw bytes of the file
Line 49: tree = parser.parse(source) → AST
Line 50: root = tree.root_node
Line 52: target_types = {"import_statement", "import_from_statement"}
Line 54: imports = []
Line 56: Calls _walk_for_imports(root, target_types) → yields 2 import nodes
Line 57: For each node, calls _extract_raw_import(node, "python"):
- Node 1: import_statement for "import os" → extracts "os"
- Node 2: import_from_statement for "from src.codewalk.config import Settings" → extracts "src.codewalk.config"
Return: ["os", "src.codewalk.config"]
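To make the expected output concrete without installing tree-sitter, here is a minimal sketch that swaps in Python's standard-library `ast` module as a stand-in parser. It is not how the module itself works (the module uses tree-sitter and handles many languages); `extract_python_imports` is a hypothetical name used only for illustration.

```python
import ast


def extract_python_imports(source: str) -> list:
    """Return raw import strings from Python source (stdlib-ast stand-in)."""
    imports = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os" -> "os"
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from a.b import c" -> "a.b"; leading dots mark relative imports
            imports.append("." * node.level + (node.module or ""))
    return imports


source = "import os\nfrom src.codewalk.config import Settings\n"
raw_imports = extract_python_imports(source)
```

For the two-line file in the walkthrough, `raw_imports` holds the same two strings as the trace: the module name for the plain import and the dotted source module for the from-import.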
_walk_for_imports recursively walks the AST and yields nodes whose type matches the target import types.
Input node: <root of a JS file with "import express from 'express'" and "const fs = require('fs')">
Input target_types: {"import_statement", "call_expression"}
Line 63: node.type = "program" → not in target_types → skip yield
Line 70: Recurse into children:
- Child 1: <import_statement> → type in target_types ✓
  - Line 63: Not a call_expression → yield directly
- Child 2: <expression_statement> → not in target_types → recurse
  - Finds <call_expression> → type in target_types ✓
  - Line 65: It IS a call_expression → check: func.text == "require" ✓ → yield
Yields: the import_statement node and the require() call_expression node
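The traversal above can be sketched with a small stand-in node class. `Node` here is hypothetical and only mimics the `type`, `text`, and `children` attributes used in the trace; treating the first child of a call as its function, and continuing to recurse after a match, are both assumptions rather than confirmed details of the real function.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: list[Node] = field(default_factory=list)


def walk_for_imports(node: Node, target_types: set[str]) -> Iterator[Node]:
    if node.type in target_types:
        if node.type != "call_expression":
            yield node
        else:
            # Only require(...) calls count as imports (assumed: child 0 is the callee).
            func = node.children[0] if node.children else None
            if func is not None and func.text == b"require":
                yield node
    for child in node.children:
        yield from walk_for_imports(child, target_types)


tree = Node("program", children=[
    Node("import_statement", b"import express from 'express'"),
    Node("expression_statement", children=[
        Node("call_expression", children=[Node("identifier", b"require"), Node("arguments")]),
    ]),
    Node("expression_statement", children=[
        Node("call_expression", children=[Node("identifier", b"parseInt"), Node("arguments")]),
    ]),
])
found = [n.type for n in walk_for_imports(tree, {"import_statement", "call_expression"})]
```

The third statement is an ordinary function call, so it is filtered out even though `call_expression` is a target type.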
_extract_dart_import walks the nested Dart AST structure to extract the URI string from an import statement.
Input node: <import_or_export for "import 'package:flutter/material.dart';">
Line 75: Iterates children, finds child.type == "library_import" ✓
Line 76: Inside that, finds spec.type == "import_specification" ✓
Line 77: Inside that, finds part.type == "configurable_uri" ✓
Line 78: Inside that, finds uri_node.type == "uri" ✓
Line 79: uri_node.text.decode("utf-8").strip("'\"") → "package:flutter/material.dart"
Return: "package:flutter/material.dart"
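A minimal sketch of the same descent, assuming a hypothetical `Node` stand-in with `type`, `text`, and `children`. Returning `None` when the chain is incomplete is an assumption; the walkthrough does not say what the real function returns in that case.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: list[Node] = field(default_factory=list)


# Node types visited on the way down, in the order given by the trace.
DART_URI_CHAIN = ("library_import", "import_specification", "configurable_uri", "uri")


def extract_dart_import(node: Node) -> Optional[str]:
    current = [node]
    for wanted in DART_URI_CHAIN:
        # Keep only the children of the current level that match the next link.
        current = [c for n in current for c in n.children if c.type == wanted]
    if not current:
        return None
    return current[0].text.decode("utf-8").strip("'\"")


dart_node = Node("import_or_export", children=[
    Node("library_import", children=[
        Node("import_specification", children=[
            Node("configurable_uri", children=[
                Node("uri", b"'package:flutter/material.dart'"),
            ]),
        ]),
    ]),
])
uri = extract_dart_import(dart_node)
```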
_extract_raw_import takes an import AST node and extracts the module/path string being imported, with one branch per language.
Input node: <import_from_statement for "from pathlib import Path">
Input language: "python"
Line 86: language == "python" ✓
Lines 87–91: Loop through node.children:
- child.type == "dotted_name" → child.text = "pathlib" → Return: "pathlib"
Input node: <import_statement for "import express from 'express'">
Input language: "javascript"
Line 95: language in ("javascript", "typescript") ✓
Line 97: node.type == "import_statement" ✓
Lines 98–100: Loop through children:
- Finds child.type == "string" → child.text = "'express'" → .strip("'\"") → Return: "express"
Input node: <call_expression for "require('./auth')">
Input language: "javascript"
Line 104: node.type == "call_expression" ✓
Line 105: func.text == "require" ✓
Line 107: Gets arguments node, finds string child → .strip("'\"") → Return: "./auth"
Each language branch follows the same pattern: iterate children, find the node type that holds the import path/name, decode and return it:
- Dart → delegates to _extract_dart_import
- Java → looks for scoped_identifier → "com.google.gson.Gson"
- Go → looks for interpreted_string_literal → "fmt"
- C/C++ → looks for string_literal or system_lib_string → "stdio.h"
- Rust → looks for scoped_identifier / identifier → "crate::config"
- C# → looks for qualified_name → "System.IO"
- PHP → looks inside namespace_use_clause for qualified_name
- Ruby → checks that the call node text starts with "require", then finds the string in argument_list
- Kotlin → looks for qualified_identifier → "okio.internal.Buffer"
- Swift → looks for identifier → "Foundation"
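Because the simpler branches share one shape, they can be sketched as a lookup table plus a single loop. This is an illustration, not the module's actual structure: the `Node` class, the `PATH_NODE_TYPES` table, and the language keys (`"c"`, `"cpp"`, `"csharp"`) are all hypothetical, and the Dart, PHP, and Ruby branches are left out because they need extra steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: list[Node] = field(default_factory=list)


# Child node types that hold the import path, per language (taken from the list above).
PATH_NODE_TYPES = {
    "java": ("scoped_identifier",),
    "go": ("interpreted_string_literal",),
    "c": ("string_literal", "system_lib_string"),
    "cpp": ("string_literal", "system_lib_string"),
    "rust": ("scoped_identifier", "identifier"),
    "csharp": ("qualified_name",),
    "kotlin": ("qualified_identifier",),
    "swift": ("identifier",),
}


def extract_raw_import(node: Node, language: str) -> Optional[str]:
    wanted = PATH_NODE_TYPES.get(language, ())
    for child in node.children:
        if child.type in wanted:
            # Drop surrounding quotes or angle brackets, e.g. "fmt" or <stdio.h>.
            return child.text.decode("utf-8").strip("\"'<>")
    return None


go_spec = Node("import_spec", children=[Node("interpreted_string_literal", b'"fmt"')])
c_include = Node("preproc_include", children=[Node("system_lib_string", b"<stdio.h>")])
go_import = extract_raw_import(go_spec, "go")
c_import = extract_raw_import(c_include, "c")
```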
Resolves a Java import string to an actual file path in the repo.
Input raw_import: "com.google.gson.Gson"
Input all_files: ["com/google/gson/Gson.java", "com/google/gson/GsonBuilder.java"]
Line 201: as_path = "com/google/gson/Gson" (dots → slashes)
Line 202: suffix = "com/google/gson/Gson.java"
Line 204: "com/google/gson/Gson.java" in all_files ✓ → Return: "com/google/gson/Gson.java"
If not found directly, tries iterating all_files for endswith(suffix), then falls back to _suffix_match.
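The first two steps can be sketched as follows. `resolve_java_import` is a hypothetical name, the final `_suffix_match` fallback is omitted, and returning `None` on a miss is an assumption.

```python
from typing import List, Optional


def resolve_java_import(raw_import: str, all_files: List[str]) -> Optional[str]:
    # "com.google.gson.Gson" -> "com/google/gson/Gson.java"
    suffix = raw_import.replace(".", "/") + ".java"
    if suffix in all_files:
        return suffix
    # Source roots such as "src/main/java/" put a prefix in front of the package path.
    for path in all_files:
        if path.endswith(suffix):
            return path
    return None


direct = resolve_java_import(
    "com.google.gson.Gson",
    ["com/google/gson/Gson.java", "com/google/gson/GsonBuilder.java"],
)
nested = resolve_java_import(
    "com.google.gson.Gson",
    ["gson/src/main/java/com/google/gson/Gson.java"],
)
```

`direct` exercises the exact-match branch from the trace, while `nested` exercises the `endswith` scan.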
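_suffix_match tries progressively shorter suffixes of a path against all file paths. This handles imports that carry leading path components absent from the repo-relative file paths, for example when the scanned root is a subdirectory of the project.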
Input as_path: "src/codewalk/config"
Input extensions: [".py", "/__init__.py"]
Input all_files: ["config.py", "pipeline.py"]
Line 220: parts = ["src", "codewalk", "config"]
Lines 221–226: Try suffixes from the end:
- i = 1 → suffix = "codewalk/config"
  - "codewalk/config.py" in all_files? No
  - "codewalk/config/__init__.py" in all_files? No
- i = 2 → suffix = "config"
  - "config.py" in all_files? ✓ → Return: "config.py"
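The loop traced above can be sketched in a few lines. The function name and the `None` return on a miss are assumptions; the starting index of 1 follows the trace.

```python
from typing import List, Optional


def suffix_match(as_path: str, extensions: List[str], all_files: List[str]) -> Optional[str]:
    files = set(all_files)
    parts = as_path.split("/")
    # Drop one leading component at a time: "codewalk/config", then "config".
    for i in range(1, len(parts)):
        suffix = "/".join(parts[i:])
        for ext in extensions:
            candidate = suffix + ext
            if candidate in files:
                return candidate
    return None


match = suffix_match(
    "src/codewalk/config",
    [".py", "/__init__.py"],
    ["config.py", "pipeline.py"],
)
```

Longer suffixes are tried first, so the most specific match wins when several files share a base name.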
resolve_import_to_file is the master resolver: it takes a raw import string, tries to match it to an actual file in the repo, and dispatches to language-specific logic to do so.
Input raw_import: "src.codewalk.config"
Input language: "python"
Input all_files: ["src/codewalk/config.py", "src/codewalk/pipeline.py"]
Input source_file: "src/codewalk/pipeline.py"
Line 240: language == "python" ✓
Line 241: as_path = "src/codewalk/config" (dots → slashes)
Line 242: candidates = ["src/codewalk/config.py", "src/codewalk/config/__init__.py"]
Lines 243–244: "src/codewalk/config.py" in all_files ✓ → Return: "src/codewalk/config.py"
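A minimal sketch of the Python branch, covering absolute imports only. `resolve_python_import` is a hypothetical name, and any further fallback the real resolver applies after the two candidates fail is left out; returning the raw string on a miss mirrors how unresolved imports are reported elsewhere in this document.

```python
from typing import List


def resolve_python_import(raw_import: str, all_files: List[str]) -> str:
    as_path = raw_import.replace(".", "/")  # "src.codewalk.config" -> "src/codewalk/config"
    # A module is either a single file or a package directory with __init__.py.
    for candidate in (as_path + ".py", as_path + "/__init__.py"):
        if candidate in all_files:
            return candidate
    return raw_import  # unresolved: hand back the raw string unchanged


resolved = resolve_python_import(
    "src.codewalk.config",
    ["src/codewalk/config.py", "src/codewalk/pipeline.py"],
)
```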
Input raw_import: "./auth"
Input language: "javascript"
Input all_files: ["src/services/auth.ts", "src/routes/index.ts"]
Input source_file: "src/routes/index.ts"
Line 248: raw_import.startswith(".") ✓
Line 250: source_dir = "src/routes"
Line 251: resolved_base = posixpath.normpath("src/routes/./auth") → "src/routes/auth"
Line 253: No extension on resolved_base → fall to extensionless branch
Lines 263–266: Try extensions:
- "src/routes/auth.ts" in all_files? No
- "src/routes/auth.js" in all_files? No
- ...none match. Try index files:
- "src/routes/auth/index.ts" in all_files? No
Falls through → Return: "./auth" (unresolved)
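The relative-import branch can be sketched with `posixpath` from the standard library. The extension list `JS_EXTENSIONS` and the function name are assumptions (the walkthrough only shows `.ts` and `.js` being tried), and the sketch collapses the "already has an extension" check into a direct membership test.

```python
import posixpath
from typing import List

# Assumed probe order; the walkthrough elides the full list.
JS_EXTENSIONS = (".ts", ".tsx", ".js", ".jsx")


def resolve_relative_js_import(raw_import: str, source_file: str, all_files: List[str]) -> str:
    if not raw_import.startswith("."):
        return raw_import  # bare specifiers such as "express" are not handled here
    files = set(all_files)
    source_dir = posixpath.dirname(source_file)
    base = posixpath.normpath(posixpath.join(source_dir, raw_import))
    if base in files:
        return base
    for ext in JS_EXTENSIONS:
        if base + ext in files:
            return base + ext
    for ext in JS_EXTENSIONS:
        if base + "/index" + ext in files:
            return base + "/index" + ext
    return raw_import  # unresolved


unresolved = resolve_relative_js_import(
    "./auth", "src/routes/index.ts", ["src/services/auth.ts", "src/routes/index.ts"]
)
found = resolve_relative_js_import(
    "../services/auth", "src/routes/index.ts", ["src/services/auth.ts", "src/routes/index.ts"]
)
```

The first call reproduces the unresolved case from the trace: `auth.ts` lives under `src/services`, not next to the importing file. The second call shows the same file resolving once the import path points at the right directory.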
Input raw_import: "package:my_app/models/user.dart"
Input language: "dart"
Input dart_package: "my_app"
Input all_files: ["lib/models/user.dart", "lib/main.dart"]
Line 279: raw_import.startswith("package:") ✓ and dart_package is set
Line 280: prefix = "package:my_app/"
Line 281: raw_import.startswith(prefix) ✓
Line 282: candidate = "lib/" + "models/user.dart" → "lib/models/user.dart"
Line 283: "lib/models/user.dart" in all_files ✓ → Return: "lib/models/user.dart"
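A sketch of the self-referencing package branch, using a hypothetical function name. It assumes, as the trace does, that package-relative paths live under `lib/`, and it returns the raw string when nothing matches.

```python
from typing import List


def resolve_dart_package_import(raw_import: str, dart_package: str, all_files: List[str]) -> str:
    if raw_import.startswith("package:") and dart_package:
        prefix = f"package:{dart_package}/"
        if raw_import.startswith(prefix):
            # "package:my_app/models/user.dart" -> "lib/models/user.dart"
            candidate = "lib/" + raw_import[len(prefix):]
            if candidate in all_files:
                return candidate
    return raw_import  # third-party package or missing file: leave unresolved


own = resolve_dart_package_import(
    "package:my_app/models/user.dart", "my_app", ["lib/models/user.dart", "lib/main.dart"]
)
external = resolve_dart_package_import(
    "package:flutter/material.dart", "my_app", ["lib/models/user.dart", "lib/main.dart"]
)
```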
Input raw_import: "crate::config"
Input language: "rust"
Input source_file: "src/lib.rs"
Input all_files: ["Cargo.toml", "src/lib.rs", "src/config.rs"]
Line 311: raw_import.startswith("crate") ✓
Lines 314–318: Walk up from "src" looking for Cargo.toml → found at "" (root)
Line 320: crate_root = "", crate_src = "src"
Line 321: as_path = "src/config" (replace crate:: with src/, :: with /)
Line 324: candidates = ["src/config.rs", "src/config/mod.rs"]
Line 325: "src/config.rs" in all_files ✓ → Return: "src/config.rs"
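The crate-relative lookup can be sketched as below. The function name is hypothetical, and the sketch maps the whole path after `crate::` to a file, so an import that names an item inside a module (for example `crate::config::Settings`) would stay unresolved here; how the real resolver treats that case is not shown in the walkthrough.

```python
import posixpath
from typing import List


def resolve_rust_crate_import(raw_import: str, source_file: str, all_files: List[str]) -> str:
    if not raw_import.startswith("crate"):
        return raw_import
    files = set(all_files)
    # Walk up from the source file's directory until a Cargo.toml is found.
    crate_root = posixpath.dirname(source_file)
    while posixpath.join(crate_root, "Cargo.toml") not in files:
        if not crate_root:
            return raw_import  # reached the repo root without finding a manifest
        crate_root = posixpath.dirname(crate_root)
    crate_src = posixpath.join(crate_root, "src")
    # "crate::config" -> "config"; nested modules turn "::" into "/".
    rel = raw_import[len("crate"):].lstrip(":").replace("::", "/")
    as_path = posixpath.join(crate_src, rel) if rel else crate_src
    for candidate in (as_path + ".rs", as_path + "/mod.rs"):
        if candidate in files:
            return candidate
    return raw_import


resolved = resolve_rust_crate_import(
    "crate::config", "src/lib.rs", ["Cargo.toml", "src/lib.rs", "src/config.rs"]
)
```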
Reads pubspec.yaml to find the Dart package name, used for resolving self-referencing package imports.
Input files: [{"file_path": "pubspec.yaml", "absolute_path": "/repo/pubspec.yaml"}, ...]
The pubspec.yaml contains:
name: my_app
version: 1.0.0

Line 411: file_info["file_path"].endswith("pubspec.yaml") ✓
Line 413: Reads content
Lines 414–416: Splits by \n, finds line starting with "name:" → splits on : → .strip() → Return: "my_app"
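The line scan can be sketched on the file content alone, leaving out the file lookup and read. `parse_pubspec_name` is a hypothetical name, and returning an empty string when no name line exists is an assumption chosen to match the empty package name used when there is no pubspec.yaml.

```python
def parse_pubspec_name(content: str) -> str:
    for line in content.split("\n"):
        # Only a top-level key matches; indented "name:" keys do not start the line.
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip()
    return ""


package_name = parse_pubspec_name("name: my_app\nversion: 1.0.0\n")
```

This is a plain text scan rather than a YAML parse, so it avoids a YAML dependency at the cost of ignoring quoting and comments on the name line.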
Builds the complete file-level dependency graph for all scanned files. Extracts imports from each file, resolves them to file paths, and returns the graph + stats.
Input files: [
{"file_path": "pipeline.py", "absolute_path": "/repo/pipeline.py", "language": "python"},
{"file_path": "config.py", "absolute_path": "/repo/config.py", "language": "python"},
]
pipeline.py contains from config import Settings.
config.py contains import os.
Line 443: all_file_paths = ["pipeline.py", "config.py"]
Line 449: dart_package_name = "" (no pubspec.yaml)
Line 451: graph = {}, total_edges = 0, unresolved_count = 0
Iteration 1: file_path = "pipeline.py"
- Line 458: raw_imports = extract_imports("/repo/pipeline.py", "python") → ["config"]
- Line 462: Resolve "config":
  - resolve_import_to_file("config", "python", ["pipeline.py", "config.py"], ...) → tries "config.py" → found → "config.py"
- resolved_imports = ["config.py"]
- graph["pipeline.py"] = ["config.py"], total_edges = 1

Iteration 2: file_path = "config.py"
- raw_imports = ["os"]
- Resolve "os" → tries "os.py", "os/__init__.py" → neither in all_files → unresolved → stays "os"
- resolved_imports = ["os"], unresolved_count = 1
- graph["config.py"] = ["os"], total_edges = 2
Return:
{
"graph": {
"pipeline.py": ["config.py"],
"config.py": ["os"],
},
"stats": {
"total_files": 2,
"total_edges": 2,
"unresolved": 1,
},
}
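The bookkeeping in this walkthrough can be sketched with the extraction and resolution steps passed in as callables, so the example runs without tree-sitter. Everything named here (`build_graph_sketch`, `fake_extract`, `fake_resolve`, and the callable signatures) is hypothetical; counting an import as unresolved when the resolver's result is not a known file is also an assumption.

```python
from typing import Callable, Dict, List


def build_graph_sketch(
    files: List[Dict[str, str]],
    extract: Callable[[str, str], List[str]],
    resolve: Callable[[str, str, List[str], str], str],
) -> dict:
    all_file_paths = [f["file_path"] for f in files]
    known = set(all_file_paths)
    graph, total_edges, unresolved = {}, 0, 0
    for info in files:
        resolved_imports = []
        for raw in extract(info["absolute_path"], info["language"]):
            target = resolve(raw, info["language"], all_file_paths, info["file_path"])
            if target not in known:
                unresolved += 1  # resolver handed back something that is not a repo file
            resolved_imports.append(target)
        graph[info["file_path"]] = resolved_imports
        total_edges += len(resolved_imports)
    stats = {"total_files": len(files), "total_edges": total_edges, "unresolved": unresolved}
    return {"graph": graph, "stats": stats}


# Stand-ins for the real extraction and resolution steps.
FAKE_IMPORTS = {"/repo/pipeline.py": ["config"], "/repo/config.py": ["os"]}


def fake_extract(absolute_path: str, language: str) -> List[str]:
    return FAKE_IMPORTS[absolute_path]


def fake_resolve(raw: str, language: str, all_files: List[str], source_file: str) -> str:
    candidate = raw.replace(".", "/") + ".py"
    return candidate if candidate in all_files else raw


files = [
    {"file_path": "pipeline.py", "absolute_path": "/repo/pipeline.py", "language": "python"},
    {"file_path": "config.py", "absolute_path": "/repo/config.py", "language": "python"},
]
result = build_graph_sketch(files, fake_extract, fake_resolve)
```

With these stand-ins, `result` has the same graph and stats as the return value shown above: two edges in total, one of which (`"os"`) is unresolved.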