analysis dependency_graph

aakash-anko edited this page May 25, 2026 · 1 revision

analysis/dependency_graph.py

Extracts import statements from source files using tree-sitter, resolves them to actual file paths in the repo, and builds a file-level dependency graph.


Key Concepts

Term | Definition | Example
edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py.
AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)].
tree-sitter | A fast, multi-language parser that builds ASTs. Supports 100+ languages without needing each language's compiler. | tree-sitter parses config.py into an AST, then we extract function/class nodes from it.

Source: src/codewalk/analysis/dependency_graph.py


Constants

IMPORT_NODE_TYPES (line 14)

Maps each language to the AST node types that represent import statements:

{
    "python":     ["import_statement", "import_from_statement"],
    "javascript": ["import_statement", "call_expression"],   # call_expression for require()
    "dart":       ["import_or_export"],
    "java":       ["import_declaration"],
    "go":         ["import_declaration", "import_spec"],
    "rust":       ["use_declaration"],
    ...  # 14 languages
}
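
As a rough illustration of how such a map might be consulted, the sketch below turns a language's entry into a set for fast membership tests. Only the six entries shown above are included (the rest are elided in the source), and import_node_types_for is a hypothetical helper name, not a function from the module.

```python
# Only the entries listed above; the remaining languages are elided in the source.
IMPORT_NODE_TYPES = {
    "python": ["import_statement", "import_from_statement"],
    "javascript": ["import_statement", "call_expression"],
    "dart": ["import_or_export"],
    "java": ["import_declaration"],
    "go": ["import_declaration", "import_spec"],
    "rust": ["use_declaration"],
}


def import_node_types_for(language: str) -> set[str]:
    """Hypothetical helper: the import node types for a language, as a set."""
    # Unknown languages map to an empty set, so no node will ever match.
    return set(IMPORT_NODE_TYPES.get(language, []))


print(sorted(import_node_types_for("python")))
print(import_node_types_for("cobol"))
```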

extract_imports

Parses a file with tree-sitter and returns a list of raw import strings (not yet resolved to file paths).

Example

Input file_path: "/repo/src/codewalk/pipeline.py"
Input language: "python"

The file contains:

import os
from src.codewalk.config import Settings

Line 37: "python" in IMPORT_NODE_TYPES ✓
Line 40: parser = get_parser_for_language("python") → Parser object

Line 45: source = Path(file_path).read_bytes() → raw bytes of the file
Line 49: tree = parser.parse(source) → AST
Line 50: root = tree.root_node

Line 52: target_types = {"import_statement", "import_from_statement"}

Line 54: imports = []

Line 56: Calls _walk_for_imports(root, target_types) → yields 2 import nodes
Line 57: For each node, calls _extract_raw_import(node, "python"):

  • Node 1: import_statement for import os → extracts "os"
  • Node 2: import_from_statement for from src.codewalk.config import Settings → extracts "src.codewalk.config"

Return: ["os", "src.codewalk.config"]
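
To try the same extraction without installing tree-sitter, the sketch below swaps in Python's standard-library ast module as a stand-in parser. It is not the module's implementation: it handles Python source only, the function names are hypothetical, and listing every name of a multi-name import (import a, b) is an assumption.

```python
import ast
from typing import Iterator


def walk_imports(node: ast.AST) -> Iterator[ast.AST]:
    """Depth-first walk that yields Import and ImportFrom nodes."""
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        yield node
    for child in ast.iter_child_nodes(node):
        yield from walk_imports(child)


def extract_python_imports(source: str) -> list[str]:
    """Return raw (unresolved) module names imported by Python source."""
    raw = []
    for node in walk_imports(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b" contributes both names (assumption of this sketch).
            raw.extend(alias.name for alias in node.names)
        elif node.module:
            # ImportFrom: module is None for "from . import x", which is skipped.
            raw.append(node.module)
    return raw


source = "import os\nfrom src.codewalk.config import Settings\n"
print(extract_python_imports(source))  # ['os', 'src.codewalk.config']
```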


_walk_for_imports

Recursively walks the AST and yields nodes whose type matches the target import types.

Example

Input node: <root of a JS file with "import express from 'express'" and "const fs = require('fs')">
Input target_types: {"import_statement", "call_expression"}

Line 63: node.type = "program" → not in target_types → skip yield
Line 70: Recurse into children:

  • Child 1: <import_statement> → type in target_types ✓
    • Line 63: Not a call_expression → yield directly
  • Child 2: <expression_statement> → not in target_types → recurse
    • Finds <call_expression> → type in target_types ✓
    • Line 65: It IS a call_expression → check: func.text == "require" ✓ → yield

Yields: the import_statement node and the require() call_expression node
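
The walk above can be sketched as a recursive generator. Node here is a hypothetical stand-in that mimics only the type, text, and children attributes of a tree-sitter node. Treating the first child of a call_expression as the called function, and continuing to descend after a match, are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: list["Node"] = field(default_factory=list)


def walk_for_imports(node: Node, target_types: set[str]) -> Iterator[Node]:
    if node.type in target_types:
        if node.type != "call_expression":
            yield node
        else:
            # Assumption: the callee is the first child of the call node.
            func = node.children[0] if node.children else None
            if func is not None and func.text == b"require":
                yield node  # only require(...) calls count as imports
    for child in node.children:
        yield from walk_for_imports(child, target_types)


require_call = Node("call_expression", b"require('fs')", [
    Node("identifier", b"require"),
    Node("arguments", b"('fs')"),
])
log_call = Node("call_expression", b"console.log('hi')", [
    Node("member_expression", b"console.log"),
    Node("arguments", b"('hi')"),
])
root = Node("program", children=[
    Node("import_statement", b"import express from 'express'"),
    Node("expression_statement", children=[require_call]),
    Node("expression_statement", children=[log_call]),
])

found = list(walk_for_imports(root, {"import_statement", "call_expression"}))
print([n.type for n in found])  # ['import_statement', 'call_expression']
```

The console.log call is a call_expression too, but it is filtered out because its callee is not require.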


_extract_dart_import

Walks the nested Dart AST structure to extract the URI string from an import statement.

Example

Input node: <import_or_export for "import 'package:flutter/material.dart';">

Line 75: Iterates children, finds child.type == "library_import"
Line 76: Inside that, finds spec.type == "import_specification"
Line 77: Inside that, finds part.type == "configurable_uri"
Line 78: Inside that, finds uri_node.type == "uri"
Line 79: uri_node.text.decode("utf-8").strip("'\"") → "package:flutter/material.dart"

Return: "package:flutter/material.dart"
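
The nested descent can be sketched as a loop over the expected node types, one level at a time. Node is a hypothetical stand-in for a tree-sitter node, and taking only the first matching child at each level is a simplification of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: list["Node"] = field(default_factory=list)


# The nesting described above, outermost first.
DART_URI_PATH = ("library_import", "import_specification", "configurable_uri", "uri")


def extract_dart_import(node: Node) -> Optional[str]:
    current = node
    for type_name in DART_URI_PATH:
        # Simplification: follow the first child of the expected type.
        current = next((c for c in current.children if c.type == type_name), None)
        if current is None:
            return None  # structure did not match; nothing to extract
    return current.text.decode("utf-8").strip("'\"")


uri = Node("uri", b"'package:flutter/material.dart'")
import_node = Node("import_or_export", children=[
    Node("library_import", children=[
        Node("import_specification", children=[
            Node("configurable_uri", children=[uri]),
        ]),
    ]),
])
print(extract_dart_import(import_node))  # package:flutter/material.dart
```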


_extract_raw_import

Given an import AST node, extracts the module/path string being imported. Has per-language branches.

Example (Python)

Input node: <import_from_statement for "from pathlib import Path">
Input language: "python"

Line 86: language == "python"
Lines 87–91: Loop through node.children:

  • child.type == "dotted_name" → child.text = "pathlib" → Return: "pathlib"

Example (JavaScript ES import)

Input node: <import_statement for "import express from 'express'">
Input language: "javascript"

Line 95: language in ("javascript", "typescript")
Line 97: node.type == "import_statement"
Lines 98–100: Loop through children:

  • Finds child.type == "string" → child.text = "'express'" → .strip("'\"") → Return: "express"

Example (JavaScript CommonJS require)

Input node: <call_expression for "require('./auth')">
Input language: "javascript"

Line 104: node.type == "call_expression"
Line 105: func.text == "require"
Line 107: Gets arguments node, finds string child → .strip("'\"") → Return: "./auth"

Other languages

Each language branch follows the same pattern: iterate children, find the node type that holds the import path/name, decode and return it:

  • Dart → delegates to _extract_dart_import
  • Java → looks for scoped_identifier → "com.google.gson.Gson"
  • Go → looks for interpreted_string_literal → "fmt"
  • C/C++ → looks for string_literal or system_lib_string → "stdio.h"
  • Rust → looks for scoped_identifier/identifier → "crate::config"
  • C# → looks for qualified_name → "System.IO"
  • PHP → looks inside namespace_use_clause for qualified_name
  • Ruby → checks call node text starts with "require", finds string in argument_list
  • Kotlin → looks for qualified_identifier → "okio.internal.Buffer"
  • Swift → looks for identifier → "Foundation"
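
For the branches that only need to find one child node, the shared pattern might be sketched as a lookup table. The node type names come from the list above. The language keys, the PATH_NODE_TYPES name, the Node stand-in, and stripping angle brackets from C includes are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: list["Node"] = field(default_factory=list)


# Language keys are assumed; node types are the ones listed above.
PATH_NODE_TYPES = {
    "java": ("scoped_identifier",),
    "go": ("interpreted_string_literal",),
    "c": ("string_literal", "system_lib_string"),
    "rust": ("scoped_identifier", "identifier"),
    "kotlin": ("qualified_identifier",),
    "swift": ("identifier",),
}


def extract_raw_import(node: Node, language: str) -> Optional[str]:
    wanted = PATH_NODE_TYPES.get(language, ())
    for child in node.children:
        if child.type in wanted:
            # Drop surrounding quotes, and <...> for system includes.
            return child.text.decode("utf-8").strip("'\"<>")
    return None


java_import = Node("import_declaration", children=[
    Node("import", b"import"),
    Node("scoped_identifier", b"com.google.gson.Gson"),
])
c_include = Node("preproc_include", children=[
    Node("system_lib_string", b"<stdio.h>"),
])
print(extract_raw_import(java_import, "java"))  # com.google.gson.Gson
print(extract_raw_import(c_include, "c"))       # stdio.h
```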

_resolve_java

Resolves a Java import string to an actual file path in the repo.

Example

Input raw_import: "com.google.gson.Gson"
Input all_files: ["com/google/gson/Gson.java", "com/google/gson/GsonBuilder.java"]

Line 201: as_path = "com/google/gson/Gson" (dots → slashes)
Line 202: suffix = "com/google/gson/Gson.java"
Line 204: "com/google/gson/Gson.java" in all_files ✓ → Return: "com/google/gson/Gson.java"

If not found directly, tries iterating all_files for endswith(suffix), then falls back to _suffix_match.
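
A minimal sketch of the first two steps (exact match, then endswith scan) follows. The final _suffix_match fallback is left out, and returning None when nothing matches is an assumption, since the source does not say what the real function returns in that case.

```python
from typing import Optional


def resolve_java(raw_import: str, all_files: list[str]) -> Optional[str]:
    # "com.google.gson.Gson" -> "com/google/gson/Gson.java"
    suffix = raw_import.replace(".", "/") + ".java"
    if suffix in all_files:
        return suffix
    # The package root may sit below the repo root (e.g. under src/main/java).
    for path in all_files:
        if path.endswith(suffix):
            return path
    return None  # assumption: the real code falls back to _suffix_match here


print(resolve_java("com.google.gson.Gson", ["com/google/gson/Gson.java"]))
print(resolve_java("com.google.gson.Gson", ["gson/src/main/java/com/google/gson/Gson.java"]))
```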


_suffix_match

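Tries progressively shorter suffixes of a path against all file paths. This handles cases where the import path carries leading components (such as "src/codewalk/") that do not appear in the scanned file paths, for example when the scanned repo root is itself a subdirectory of the package root.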

Example

Input as_path: "src/codewalk/config"
Input extensions: [".py", "/__init__.py"]
Input all_files: ["config.py", "pipeline.py"]

Line 220: parts = ["src", "codewalk", "config"]
Lines 221–226: Try suffixes from the end:

  • i = 1 → suffix = "codewalk/config"
    • "codewalk/config.py" in all_files? No
    • "codewalk/config/__init__.py" in all_files? No
  • i = 2 → suffix = "config"
    • "config.py" in all_files? ✓ → Return: "config.py"
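
The loop traced above can be sketched as follows. Returning None when no suffix matches is an assumption of this sketch.

```python
from typing import Optional


def suffix_match(as_path: str, extensions: list[str], all_files: list[str]) -> Optional[str]:
    parts = as_path.split("/")
    known = set(all_files)
    # Drop leading components one at a time: i = 1, 2, ...
    for i in range(1, len(parts)):
        suffix = "/".join(parts[i:])
        for ext in extensions:
            candidate = suffix + ext
            if candidate in known:
                return candidate
    return None


print(suffix_match("src/codewalk/config", [".py", "/__init__.py"], ["config.py", "pipeline.py"]))  # config.py
```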

resolve_import_to_file

Master resolver — takes a raw import string and tries to match it to an actual file in the repo. Dispatches to language-specific logic.

Example (Python)

Input raw_import: "src.codewalk.config"
Input language: "python"
Input all_files: ["src/codewalk/config.py", "src/codewalk/pipeline.py"]
Input source_file: "src/codewalk/pipeline.py"

Line 240: language == "python"
Line 241: as_path = "src/codewalk/config" (dots → slashes)
Line 242: candidates = ["src/codewalk/config.py", "src/codewalk/config/__init__.py"]
Lines 243–244: "src/codewalk/config.py" in all_files ✓ → Return: "src/codewalk/config.py"
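
The Python branch can be sketched in a few lines. The function name resolve_python is hypothetical. Handing back the raw string when nothing matches follows the unresolved "os" case in the build_dependency_graph example.

```python
def resolve_python(raw_import: str, all_files: list[str]) -> str:
    as_path = raw_import.replace(".", "/")  # dots -> slashes
    # A module is either a single file or a package directory.
    for candidate in (as_path + ".py", as_path + "/__init__.py"):
        if candidate in all_files:
            return candidate
    return raw_import  # unresolved: keep the raw import string


files = ["src/codewalk/config.py", "src/codewalk/pipeline.py"]
print(resolve_python("src.codewalk.config", files))  # src/codewalk/config.py
print(resolve_python("os", files))                   # os
```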

Example (JS relative import)

Input raw_import: "./auth"
Input language: "javascript"
Input all_files: ["src/services/auth.ts", "src/routes/index.ts"]
Input source_file: "src/routes/index.ts"

Line 248: raw_import.startswith(".")
Line 250: source_dir = "src/routes"
Line 251: resolved_base = posixpath.normpath("src/routes/./auth") → "src/routes/auth"
Line 253: No extension on resolved_base → fall to extensionless branch
Lines 263–266: Try extensions:

  • "src/routes/auth.ts" in all_files? No
  • "src/routes/auth.js" in all_files? No
  • ...none match. Try index files:
  • "src/routes/auth/index.ts" in all_files? No

Falls through → Return: "./auth" (unresolved)
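
A sketch of the relative-import branch, using the standard posixpath module, is shown below. The extension list and its order (JS_EXTENSIONS) are assumptions, as is the handling of imports that already carry an extension. The source only traces the extensionless case.

```python
import posixpath

# Assumed extension list and order; the source shows only .ts and .js being tried.
JS_EXTENSIONS = (".ts", ".tsx", ".js", ".jsx")


def resolve_js_relative(raw_import: str, source_file: str, all_files: list[str]) -> str:
    if not raw_import.startswith("."):
        return raw_import  # bare package import such as "express"
    known = set(all_files)
    source_dir = posixpath.dirname(source_file)
    base = posixpath.normpath(posixpath.join(source_dir, raw_import))
    if posixpath.splitext(base)[1] and base in known:
        return base  # the import already named an existing file
    for ext in JS_EXTENSIONS:
        if base + ext in known:
            return base + ext
    for ext in JS_EXTENSIONS:
        index_file = posixpath.join(base, "index" + ext)
        if index_file in known:
            return index_file
    return raw_import  # unresolved


files = ["src/services/auth.ts", "src/routes/index.ts"]
print(resolve_js_relative("./auth", "src/routes/index.ts", files))            # ./auth
print(resolve_js_relative("../services/auth", "src/routes/index.ts", files))  # src/services/auth.ts
```

The first call stays unresolved because auth.ts lives in src/services, not src/routes. The second call shows the same file resolving once the relative path points at the right directory.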

Example (Dart package import)

Input raw_import: "package:my_app/models/user.dart"
Input language: "dart"
Input dart_package: "my_app"
Input all_files: ["lib/models/user.dart", "lib/main.dart"]

Line 279: raw_import.startswith("package:") ✓ and dart_package is set
Line 280: prefix = "package:my_app/"
Line 281: raw_import.startswith(prefix)
Line 282: candidate = "lib/" + "models/user.dart" → "lib/models/user.dart"
Line 283: "lib/models/user.dart" in all_files ✓ → Return: "lib/models/user.dart"
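
The package-import branch can be sketched as a prefix rewrite. The function name is hypothetical, and returning the raw import for anything that does not match is an assumption.

```python
def resolve_dart_package(raw_import: str, dart_package: str, all_files: list[str]) -> str:
    prefix = f"package:{dart_package}/"
    if dart_package and raw_import.startswith(prefix):
        # Self-referencing package imports map onto the lib/ directory.
        candidate = "lib/" + raw_import[len(prefix):]
        if candidate in all_files:
            return candidate
    return raw_import  # third-party package or unresolved


files = ["lib/models/user.dart", "lib/main.dart"]
print(resolve_dart_package("package:my_app/models/user.dart", "my_app", files))  # lib/models/user.dart
print(resolve_dart_package("package:flutter/material.dart", "my_app", files))    # package:flutter/material.dart
```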

Example (Rust crate import)

Input raw_import: "crate::config"
Input language: "rust"
Input source_file: "src/lib.rs"
Input all_files: ["Cargo.toml", "src/lib.rs", "src/config.rs"]

Line 311: raw_import.startswith("crate")
Lines 314–318: Walk up from "src" looking for Cargo.toml → found at "" (root)
Line 320: crate_root = "", crate_src = "src"
Line 321: as_path = "src/config" (replace crate:: with src/, :: with /)
Line 324: candidates = ["src/config.rs", "src/config/mod.rs"]
Line 325: "src/config.rs" in all_files ✓ → Return: "src/config.rs"
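
The crate lookup can be sketched as an upward search for Cargo.toml followed by a path rewrite. The function name is hypothetical, and returning the raw import when no Cargo.toml or candidate file is found is an assumption.

```python
import posixpath


def resolve_rust_crate(raw_import: str, source_file: str, all_files: list[str]) -> str:
    if not raw_import.startswith("crate"):
        return raw_import
    known = set(all_files)
    # Walk up from the source file's directory until a Cargo.toml is found.
    directory = posixpath.dirname(source_file)
    crate_root = None
    while True:
        if posixpath.join(directory, "Cargo.toml") in known:
            crate_root = directory
            break
        if not directory:
            break  # reached the repo root without finding a manifest
        directory = posixpath.dirname(directory)
    if crate_root is None:
        return raw_import
    crate_src = posixpath.join(crate_root, "src")
    # "crate::config" -> "config"; nested "a::b" -> "a/b"
    relative = raw_import[len("crate"):].lstrip(":").replace("::", "/")
    as_path = posixpath.join(crate_src, relative)
    for candidate in (as_path + ".rs", as_path + "/mod.rs"):
        if candidate in known:
            return candidate
    return raw_import


files = ["Cargo.toml", "src/lib.rs", "src/config.rs"]
print(resolve_rust_crate("crate::config", "src/lib.rs", files))  # src/config.rs
```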


_detect_dart_package_name

Reads pubspec.yaml to find the Dart package name, used for resolving self-referencing package imports.

Example

Input files: [{"file_path": "pubspec.yaml", "absolute_path": "/repo/pubspec.yaml"}, ...]

The pubspec.yaml contains:

name: my_app
version: 1.0.0

Line 411: file_info["file_path"].endswith("pubspec.yaml")
Line 413: Reads content
Lines 414–416: Splits by \n, finds the line starting with "name:" → splits on ":" and calls .strip() → Return: "my_app"
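
The lookup can be sketched as below. The usage part writes a throwaway pubspec.yaml to a temporary directory so the snippet runs on its own. Returning an empty string when no pubspec.yaml is present matches the dart_package_name = "" value in the build_dependency_graph example, but is otherwise an assumption.

```python
import tempfile
from pathlib import Path


def detect_dart_package_name(files: list[dict]) -> str:
    for file_info in files:
        if file_info["file_path"].endswith("pubspec.yaml"):
            content = Path(file_info["absolute_path"]).read_text(encoding="utf-8")
            for line in content.split("\n"):
                if line.startswith("name:"):
                    # "name: my_app" -> "my_app"
                    return line.split(":", 1)[1].strip()
    return ""  # no pubspec.yaml, or no name: line


with tempfile.TemporaryDirectory() as repo:
    pubspec = Path(repo) / "pubspec.yaml"
    pubspec.write_text("name: my_app\nversion: 1.0.0\n", encoding="utf-8")
    files = [{"file_path": "pubspec.yaml", "absolute_path": str(pubspec)}]
    name = detect_dart_package_name(files)

print(name)  # my_app
```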


build_dependency_graph

Builds the complete file-level dependency graph for all scanned files. Extracts imports from each file, resolves them to file paths, and returns the graph together with summary stats.

Example

Input files: [
    {"file_path": "pipeline.py", "absolute_path": "/repo/pipeline.py", "language": "python"},
    {"file_path": "config.py",   "absolute_path": "/repo/config.py",   "language": "python"},
]

pipeline.py contains from config import Settings.
config.py contains import os.

Line 443: all_file_paths = ["pipeline.py", "config.py"]
Line 449: dart_package_name = "" (no pubspec.yaml)
Line 451: graph = {}, total_edges = 0, unresolved_count = 0

Iteration 1: file_path = "pipeline.py"

  • Line 458: raw_imports = extract_imports("/repo/pipeline.py", "python") → ["config"]
  • Line 462: Resolve "config":
    • resolve_import_to_file("config", "python", ["pipeline.py", "config.py"], ...) → tries "config.py" → found → "config.py"
  • resolved_imports = ["config.py"]
  • graph["pipeline.py"] = ["config.py"], total_edges = 1

Iteration 2: file_path = "config.py"

  • raw_imports = ["os"]
  • Resolve "os" → tries "os.py", "os/__init__.py" → neither in all_files → unresolved → stays "os"
  • resolved_imports = ["os"], unresolved_count = 1
  • graph["config.py"] = ["os"], total_edges = 2 (the unresolved import still counts as an edge)

Return:

{
    "graph": {
        "pipeline.py": ["config.py"],
        "config.py":   ["os"],
    },
    "stats": {
        "total_files": 2,
        "total_edges": 2,
        "unresolved": 1,
    },
}
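
The overall loop can be sketched as follows. To keep the snippet runnable without tree-sitter, import extraction is passed in as a function and faked with a dictionary. fake_extract_imports, FAKE_IMPORTS, and the simplified Python-only resolve_python are hypothetical stand-ins, not the module's code.

```python
from typing import Callable


def resolve_python(raw_import: str, all_files: list[str]) -> str:
    """Simplified Python-only resolver; returns the raw string if unresolved."""
    as_path = raw_import.replace(".", "/")
    for candidate in (as_path + ".py", as_path + "/__init__.py"):
        if candidate in all_files:
            return candidate
    return raw_import


def build_dependency_graph(files: list[dict], extract_imports: Callable[[str, str], list[str]]) -> dict:
    all_file_paths = [f["file_path"] for f in files]
    graph, total_edges, unresolved_count = {}, 0, 0
    for f in files:
        resolved_imports = []
        for raw in extract_imports(f["absolute_path"], f["language"]):
            target = resolve_python(raw, all_file_paths)
            if target not in all_file_paths:
                unresolved_count += 1  # kept in the graph as a raw string
            resolved_imports.append(target)
        graph[f["file_path"]] = resolved_imports
        total_edges += len(resolved_imports)
    stats = {"total_files": len(files), "total_edges": total_edges, "unresolved": unresolved_count}
    return {"graph": graph, "stats": stats}


# Hypothetical stand-in for tree-sitter extraction.
FAKE_IMPORTS = {"/repo/pipeline.py": ["config"], "/repo/config.py": ["os"]}


def fake_extract_imports(absolute_path: str, language: str) -> list[str]:
    return FAKE_IMPORTS[absolute_path]


files = [
    {"file_path": "pipeline.py", "absolute_path": "/repo/pipeline.py", "language": "python"},
    {"file_path": "config.py", "absolute_path": "/repo/config.py", "language": "python"},
]
result = build_dependency_graph(files, fake_extract_imports)
print(result["graph"])
print(result["stats"])
```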
