Skip to content

graph graph_runtime

aakash-anko edited this page May 25, 2026 · 1 revision

graph_runtime.py

In-memory graph for fast traversal and analysis. Loads edges from GraphStore (DuckDB) into igraph (C-speed library). Rebuilt on every startup — takes milliseconds even for large codebases. Maintains two separate graphs: file_graph (file-level import edges) and module_graph (module-level dependency edges).


Key Concepts

Term Definition Example
vertex A node in a graph representing a single entity (a file, module, etc.). In a file graph, pipeline.py is one vertex.
edge A connection between two vertices in a graph, representing a relationship (e.g., an import). If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py.
directed graph A graph where edges have direction — A→B doesn't mean B→A. pipeline.py → scanner.py means pipeline imports scanner, not the reverse.
DAG Directed Acyclic Graph — a directed graph with no cycles (no circular paths back to the same node). A→B→C is a DAG. A→B→C→A is NOT (it has a cycle).
topological sort Ordering vertices so that for every edge A→B, A comes after B. Only works on DAGs. If A imports B and B imports C, topological order is: C, B, A (dependencies first).
cycle A path in a graph that starts and ends at the same vertex. A→B→C→A is a cycle. If auth.py imports user.py and user.py imports auth.py, that's a cycle.
strongly connected component A group of vertices where every vertex can reach every other vertex. These form cycle groups. If A→B→C→A, then {A, B, C} is one strongly connected component.
feedback arc set The minimum set of edges you must remove to break all cycles and make the graph a DAG. In cycle A→B→C→A, removing just A→B breaks the cycle. {A→B} is a feedback arc set.
betweenness centrality How often a vertex sits on the shortest path between other vertices. High = important hub file. config.py with betweenness=45.0 means 45 shortest paths between other files pass through it.
pagerank Algorithm that ranks vertices by importance based on how many other important vertices link to them. Originally used by Google for web pages. utils.py with pagerank=0.12 is important because many important files import it.
in-degree Number of edges pointing INTO a vertex (how many files import this file). utils.py with in-degree=15 means 15 other files import it.
neighborhood All vertices reachable from a starting vertex within a given number of hops. neighborhood(pipeline.py, order=2, mode="in") = all files that import pipeline.py, plus files that import THOSE files.
blast radius All files that would be affected if a given file changes — found by following reverse import edges transitively. If A imports B and C imports A, changing B has blast radius = {A, C}.
transitive dependency An indirect dependency through a chain. If A imports B and B imports C, then A transitively depends on C. Changing C could break A even though A never directly imports C.
DuckDB An embedded SQL database (like SQLite but optimized for analytics). Used here to store file metadata and import edges. conn.execute("SELECT path FROM files WHERE language='python'").fetchall() returns all Python file paths.
AST Abstract Syntax Tree — a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)].
igraph A high-performance C library for graph analysis with Python bindings. Much faster than pure-Python graph libraries. ig.Graph.TupleList([("a.py", "b.py"), ("b.py", "c.py")], directed=True) builds a graph with 3 vertices and 2 edges instantly.
orphan file A file that has no import relationships — it neither imports nor is imported by any other file. constants.py that's only read at runtime (not imported) would be an orphan in the import graph.

Class: GraphRuntime (Line 11)

Wraps two igraph.Graph instances built from DuckDB edges. All graph algorithms (blast radius, topological sort, cycle detection, centrality, shortest path) run on these in-memory graphs.


__init__(self, store: GraphStore) (Line 22)

Builds both in-memory graphs from DuckDB edge data.

Example

Input: store = GraphStore instance with:
  import edges: [("main.py", "config.py"), ("main.py", "utils.py")]
  module dep edges: [("api", "core")]

Line 23self.store = store: Stores the GraphStore reference for later queries (e.g. get_all_files()).

Line 24self.file_graph = self._build_graph(store.get_import_edges()):

store.get_import_edges() returns [("main.py", "config.py"), ("main.py", "utils.py")]
_build_graph builds a directed igraph with 3 vertices and 2 edges
self.file_graph.vcount() = 3   # main.py, config.py, utils.py
self.file_graph.ecount() = 2   # main→config, main→utils

Line 25self.module_graph = self._build_graph(store.get_module_dep_edges()):

store.get_module_dep_edges() returns [("api", "core")]
self.module_graph.vcount() = 2  # api, core
self.module_graph.ecount() = 1  # api→core

Lines 26–30 — Logs: [GraphRuntime] Built file_graph: 3 vertices, 2 edges | module_graph: 2 vertices, 1 edges

Returns: A GraphRuntime instance with self.store, self.file_graph, self.module_graph.


_build_graph(edges: list[tuple[str, str]]) -> ig.Graph (Line 32) — static

Builds a directed igraph from (source, target) string tuples.

Example

Input: edges = [("main.py", "config.py"), ("main.py", "utils.py")]

Line 34if not edges:False (list has 2 items)

Line 36ig.Graph.TupleList(edges, directed=True): igraph creates vertices "main.py", "config.py", "utils.py" and directed edges between them automatically from the tuples.

Returns: An ig.Graph with 3 vertices and 2 directed edges.

If edges is empty, returns an empty directed graph (ig.Graph(directed=True) with 0 vertices).


rebuild(self) (Line 38)

Rebuilds both in-memory graphs from DuckDB. Call after re-analysis to pick up new/changed edges.

Example

# After re-running analysis that added a new import: utils.py → config.py
runtime.rebuild()

Line 40self.file_graph = self._build_graph(self.store.get_import_edges()): Re-reads all import edges from DuckDB and builds a fresh igraph. Now has 3 edges instead of 2.

Line 41self.module_graph = self._build_graph(self.store.get_module_dep_edges()): Re-reads all module dep edges.

Lines 42–46 — Logs updated vertex/edge counts.

Returns: None — both self.file_graph and self.module_graph are replaced.


_find_vertex(self, graph: ig.Graph, name: str) -> Optional[int] (Line 48)

Finds a vertex index by its name attribute. Returns None if the vertex doesn't exist in the graph.

Example

Input: graph = file_graph (vertices: main.py@0, config.py@1, utils.py@2)
       name = "config.py"

Line 50graph.vs.find(name="config.py"): igraph searches its vertex sequence for a vertex with name == "config.py". Found at index 1.

Line 50.index extracts the integer index:

1

Returns: 1

If name = "missing.py", graph.vs.find(name="missing.py") raises ValueError, caught on line 51, returns None.


get_blast_radius(self, file_path: str) -> list[str] (Line 54)

Returns all files that would be affected if the given file changes — i.e., all transitive reverse dependencies (files that import this file, directly or indirectly).

Example

Input: file_path = "config.py"
Assume file_graph edges: main.py → config.py, app.py → main.py

Line 56start = self._find_vertex(self.file_graph, "config.py")1

Line 57–58 — Not None, so continue.

Line 60self.file_graph.neighborhood(1, order=999, mode="in"):

  • mode="in" means follow edges backwards (who imports me?)
  • order=999 means unlimited depth (transitive)
  • Starting from config.py (index 1):
    • main.py (index 0) imports config.py → included
    • app.py (index 2) imports main.py → included (transitive)
affected_indices = [1, 0, 2]   # includes self (1)

Lines 62–65 — Filter out start (index 1) and map indices to names:

[file_graph.vs[0]["name"], file_graph.vs[2]["name"]] = ["main.py", "app.py"]

Returns: ["main.py", "app.py"]

Returns [] if the file isn't in the graph.


topological_sort(self) -> list[str] (Line 67)

Returns all files in dependency order: leaf dependencies first, dependents later. This is the recommended reading order. Includes orphan files (files with no import edges) appended at the end.

Example

Assume file_graph edges: main.py → config.py, main.py → utils.py
Assume files table also has: README.md (no import edges — orphan)

Line 77self.file_graph.vcount()3 (not 0, so continue)

Line 80self.file_graph.is_dag()True (no cycles)

Lines 84–85self.file_graph.topological_sorting(mode="in"):

  • mode="in": for edge A→B (A imports B), B comes first
  • config.py and utils.py have no outgoing imports → they come first
  • main.py imports both → it comes last
sorted_indices = [1, 2, 0]   # config.py, utils.py, main.py

Line 86 — Map to names:

sorted_files = ["config.py", "utils.py", "main.py"]

Line 89self.store.get_all_files() returns ["config.py", "utils.py", "main.py", "README.md"]

Line 90graph_files = {"config.py", "utils.py", "main.py"}

Line 91orphans = sorted(f for f in all_files if f not in graph_files):

orphans = ["README.md"]

Line 97 — Return concatenated:

["config.py", "utils.py", "main.py", "README.md"]

Returns: ["config.py", "utils.py", "main.py", "README.md"]

Cycle fallback (lines 80–83): If the graph has cycles, uses in-degree sort instead — files imported by fewer others come first. Not a true topological sort, but a reasonable approximation.

Empty graph (lines 77–79): If no import edges exist at all, returns all files from DuckDB sorted alphabetically.


detect_cycles(self) -> dict (Line 99)

Detects circular dependencies in the file graph using strongly connected components and suggests minimum edges to break all cycles.

Example

Assume file_graph edges: a.py → b.py, b.py → c.py, c.py → a.py (cycle!)

Line 100self.file_graph.vcount()3 (not 0)

Line 103self.file_graph.is_dag()False (has cycle)

Line 106self.file_graph.components(mode="STRONG"): Finds strongly connected components. A cycle group is any component with >1 vertex.

components = [[0, 1, 2]]   # all 3 vertices are in one cycle

Lines 107–110 — Filter to components with len > 1, map to names:

cycle_groups = [["a.py", "b.py", "c.py"]]

Line 113self.file_graph.feedback_arc_set(): Finds the minimum set of edge indices to remove to break all cycles.

fas = [2]   # e.g., edge index 2 = c.py → a.py

Lines 114–119 — Map edge indices to (source_name, target_name) tuples:

edges_to_break = [("c.py", "a.py")]

Returns:

{
  "has_cycles": True,
  "cycle_groups": [["a.py", "b.py", "c.py"]],
  "edges_to_break": [("c.py", "a.py")]
}

If no cycles: returns {"has_cycles": False, "cycle_groups": [], "edges_to_break": []}.


centrality(self, top_n: int = 10) -> dict (Line 126)

Returns top files by two centrality metrics: betweenness (most import paths pass through this file) and PageRank (transitively depended on by most code).

Example

Input: top_n = 2
Assume file_graph: app.py → api.py → config.py, app.py → config.py

Line 128self.file_graph.vcount()3 (not 0)

Line 131names = self.file_graph.vs["name"]:

names = ["app.py", "api.py", "config.py"]

Line 133betweenness = self.file_graph.betweenness(): Betweenness centrality for each vertex — how many shortest paths pass through it.

betweenness = [0.0, 0.0, 0.0]   # small graph, no intermediaries

Line 134page_rank = self.file_graph.pagerank(): PageRank score — importance based on who links to you.

page_rank = [0.15, 0.28, 0.57]   # config.py highest (imported by both)

Lines 137–139zip(names, betweenness), sort descending, take top 2:

top_betweenness = [("app.py", 0.0), ("api.py", 0.0)]

Lines 141–143zip(names, page_rank), sort descending, take top 2:

top_pagerank = [("config.py", 0.57), ("api.py", 0.28)]

Returns:

{
  "betweenness": [{"file": "app.py", "score": 0.0}, {"file": "api.py", "score": 0.0}],
  "pagerank": [{"file": "config.py", "score": 0.570000}, {"file": "api.py", "score": 0.280000}]
}

Returns {"betweenness": [], "pagerank": []} if the graph is empty.


shortest_path(self, source: str, target: str) -> list[str] (Line 149)

Finds the shortest import chain from one file to another.

Example

Input: source = "app.py", target = "config.py"
Assume file_graph: app.py → api.py → config.py, app.py → config.py

Line 151source_index = self._find_vertex(self.file_graph, "app.py")0

Line 152target_index = self._find_vertex(self.file_graph, "config.py")2

Line 154 — Both are not None, continue.

Line 156self.file_graph.get_shortest_paths(0, 2): Finds shortest path(s) from vertex 0 to vertex 2. There's a direct edge app.py → config.py (length 1).

paths = [[0, 2]]

Line 157paths is not empty and paths[0] is not empty.

Line 159 — Map indices to names:

[file_graph.vs[0]["name"], file_graph.vs[2]["name"]] = ["app.py", "config.py"]

Returns: ["app.py", "config.py"]

Returns [] if either file is not in the graph, or if no path exists.


get_graph_stats(self) -> dict (Line 161)

Returns summary statistics for both graphs: vertex count, edge count, and whether each is a DAG (directed acyclic graph).

Example

Assume file_graph: 3 vertices, 2 edges, no cycles
Assume module_graph: 2 vertices, 1 edge, no cycles

Lines 163–174 — Build stats dict:

Returns:

{
  "file_graph": {"vertices": 3, "edges": 2, "is_dag": True},
  "module_graph": {"vertices": 2, "edges": 1, "is_dag": True}
}

For empty graphs (vcount() == 0), is_dag defaults to True.


Summary

Function/Method Line Purpose
GraphRuntime.__init__ 22 Build both igraphs from DuckDB edges
GraphRuntime._build_graph 32 Static: (source, target) tuples → directed igraph
GraphRuntime.rebuild 38 Re-read DuckDB, rebuild both igraphs
GraphRuntime._find_vertex 48 Look up vertex index by name
GraphRuntime.get_blast_radius 54 Transitive reverse deps of a file
GraphRuntime.topological_sort 67 All files in dependency-first reading order
GraphRuntime.detect_cycles 99 Find cycles + minimum edges to break them
GraphRuntime.centrality 126 Top files by betweenness and PageRank
GraphRuntime.shortest_path 149 Shortest import chain between two files
GraphRuntime.get_graph_stats 161 Vertex/edge counts + DAG status for both graphs

Clone this wiki locally