graph graph_runtime

graph_runtime.py

In-memory graph for fast traversal and analysis. Loads edges from GraphStore (DuckDB) into igraph (C-speed library). Rebuilt on every startup — takes milliseconds even for large codebases. Maintains two separate graphs: file_graph (file-level import edges) and module_graph (module-level dependency edges).

Key Concepts

Term	Definition	Example
vertex	A node in a graph representing a single entity (a file, module, etc.).	In a file graph, `pipeline.py` is one vertex.
edge	A connection between two vertices in a graph, representing a relationship (e.g., an import).	If `pipeline.py` imports `scanner.py`, there's a directed edge `pipeline.py → scanner.py`.
directed graph	A graph where edges have direction — A→B doesn't mean B→A.	`pipeline.py → scanner.py` means pipeline imports scanner, not the reverse.
DAG	Directed Acyclic Graph — a directed graph with no cycles (no circular paths back to the same node).	A→B→C is a DAG. A→B→C→A is NOT (it has a cycle).
topological sort	Ordering vertices so that for every edge A→B, A comes after B. Only works on DAGs.	If A imports B and B imports C, topological order is: C, B, A (dependencies first).
cycle	A path in a graph that starts and ends at the same vertex. A→B→C→A is a cycle.	If `auth.py` imports `user.py` and `user.py` imports `auth.py`, that's a cycle.
strongly connected component	A group of vertices where every vertex can reach every other vertex. These form cycle groups.	If A→B→C→A, then {A, B, C} is one strongly connected component.
feedback arc set	The minimum set of edges you must remove to break all cycles and make the graph a DAG.	In cycle A→B→C→A, removing just A→B breaks the cycle. {A→B} is a feedback arc set.
betweenness centrality	How often a vertex sits on the shortest path between other vertices. High = important hub file.	`config.py` with betweenness=45.0 means 45 shortest paths between other files pass through it.
pagerank	Algorithm that ranks vertices by importance based on how many other important vertices link to them. Originally used by Google for web pages.	`utils.py` with pagerank=0.12 is important because many important files import it.
in-degree	Number of edges pointing INTO a vertex (how many files import this file).	`utils.py` with in-degree=15 means 15 other files import it.
neighborhood	All vertices reachable from a starting vertex within a given number of hops.	`neighborhood(pipeline.py, order=2, mode="in")` = all files that import pipeline.py, plus files that import THOSE files.
blast radius	All files that would be affected if a given file changes — found by following reverse import edges transitively.	If A imports B and C imports A, changing B has blast radius = {A, C}.
transitive dependency	An indirect dependency through a chain. If A imports B and B imports C, then A transitively depends on C.	Changing C could break A even though A never directly imports C.
DuckDB	An embedded SQL database (like SQLite but optimized for analytics). Used here to store file metadata and import edges.	`conn.execute("SELECT path FROM files WHERE language='python'").fetchall()` returns all Python file paths.
AST	Abstract Syntax Tree — a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.).	`def add(a, b): return a+b` becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)].
igraph	A high-performance C library for graph analysis with Python bindings. Much faster than pure-Python graph libraries.	`ig.Graph.TupleList([("a.py", "b.py"), ("b.py", "c.py")], directed=True)` builds a graph with 3 vertices and 2 edges instantly.
orphan file	A file that has no import relationships — it neither imports nor is imported by any other file.	`constants.py` that's only read at runtime (not imported) would be an orphan in the import graph.

Class: `GraphRuntime` (Line 11)

Wraps two igraph.Graph instances built from DuckDB edges. All graph algorithms (blast radius, topological sort, cycle detection, centrality, shortest path) run on these in-memory graphs.

`init(self, store: GraphStore)` (Line 22)

Builds both in-memory graphs from DuckDB edge data.

Example

Input: store = GraphStore instance with:
  import edges: [("main.py", "config.py"), ("main.py", "utils.py")]
  module dep edges: [("api", "core")]

Line 23 — self.store = store: Stores the GraphStore reference for later queries (e.g. get_all_files()).

Line 24 — self.file_graph = self._build_graph(store.get_import_edges()):

store.get_import_edges() returns [("main.py", "config.py"), ("main.py", "utils.py")]
_build_graph builds a directed igraph with 3 vertices and 2 edges
self.file_graph.vcount() = 3   # main.py, config.py, utils.py
self.file_graph.ecount() = 2   # main→config, main→utils

Line 25 — self.module_graph = self._build_graph(store.get_module_dep_edges()):

store.get_module_dep_edges() returns [("api", "core")]
self.module_graph.vcount() = 2  # api, core
self.module_graph.ecount() = 1  # api→core

Lines 26–30 — Logs: [GraphRuntime] Built file_graph: 3 vertices, 2 edges | module_graph: 2 vertices, 1 edges

Returns: A GraphRuntime instance with self.store, self.file_graph, self.module_graph.

`_build_graph(edges: list[tuple[str, str]]) -> ig.Graph` (Line 32) — static

Builds a directed igraph from (source, target) string tuples.

Example

Input: edges = [("main.py", "config.py"), ("main.py", "utils.py")]

Line 34 — if not edges: → False (list has 2 items)

Line 36 — ig.Graph.TupleList(edges, directed=True): igraph creates vertices "main.py", "config.py", "utils.py" and directed edges between them automatically from the tuples.

Returns: An ig.Graph with 3 vertices and 2 directed edges.

If edges is empty, returns an empty directed graph (ig.Graph(directed=True) with 0 vertices).

`rebuild(self)` (Line 38)

Rebuilds both in-memory graphs from DuckDB. Call after re-analysis to pick up new/changed edges.

Example

# After re-running analysis that added a new import: utils.py → config.py
runtime.rebuild()

Line 40 — self.file_graph = self._build_graph(self.store.get_import_edges()): Re-reads all import edges from DuckDB and builds a fresh igraph. Now has 3 edges instead of 2.

Line 41 — self.module_graph = self._build_graph(self.store.get_module_dep_edges()): Re-reads all module dep edges.

Lines 42–46 — Logs updated vertex/edge counts.

Returns: None — both self.file_graph and self.module_graph are replaced.

`_find_vertex(self, graph: ig.Graph, name: str) -> Optional[int]` (Line 48)

Finds a vertex index by its name attribute. Returns None if the vertex doesn't exist in the graph.

Example

Input: graph = file_graph (vertices: main.py@0, config.py@1, utils.py@2)
       name = "config.py"

Line 50 — graph.vs.find(name="config.py"): igraph searches its vertex sequence for a vertex with name == "config.py". Found at index 1.

Line 50 — .index extracts the integer index:

Returns: 1

If name = "missing.py", graph.vs.find(name="missing.py") raises ValueError, caught on line 51, returns None.

`get_blast_radius(self, file_path: str) -> list[str]` (Line 54)

Returns all files that would be affected if the given file changes — i.e., all transitive reverse dependencies (files that import this file, directly or indirectly).

Example

Input: file_path = "config.py"
Assume file_graph edges: main.py → config.py, app.py → main.py

Line 56 — start = self._find_vertex(self.file_graph, "config.py") → 1

Line 57–58 — Not None, so continue.

Line 60 — self.file_graph.neighborhood(1, order=999, mode="in"):

mode="in" means follow edges backwards (who imports me?)
order=999 means unlimited depth (transitive)
Starting from config.py (index 1):
- main.py (index 0) imports config.py → included
- app.py (index 2) imports main.py → included (transitive)

affected_indices = [1, 0, 2]   # includes self (1)

Lines 62–65 — Filter out start (index 1) and map indices to names:

[file_graph.vs[0]["name"], file_graph.vs[2]["name"]] = ["main.py", "app.py"]

Returns: ["main.py", "app.py"]

Returns [] if the file isn't in the graph.

`topological_sort(self) -> list[str]` (Line 67)

Returns all files in dependency order: leaf dependencies first, dependents later. This is the recommended reading order. Includes orphan files (files with no import edges) appended at the end.

Example

Assume file_graph edges: main.py → config.py, main.py → utils.py
Assume files table also has: README.md (no import edges — orphan)

Line 77 — self.file_graph.vcount() → 3 (not 0, so continue)

Line 80 — self.file_graph.is_dag() → True (no cycles)

Lines 84–85 — self.file_graph.topological_sorting(mode="in"):

mode="in": for edge A→B (A imports B), B comes first
config.py and utils.py have no outgoing imports → they come first
main.py imports both → it comes last

sorted_indices = [1, 2, 0]   # config.py, utils.py, main.py

Line 86 — Map to names:

sorted_files = ["config.py", "utils.py", "main.py"]

Line 89 — self.store.get_all_files() returns ["config.py", "utils.py", "main.py", "README.md"]

Line 90 — graph_files = {"config.py", "utils.py", "main.py"}

Line 91 — orphans = sorted(f for f in all_files if f not in graph_files):

orphans = ["README.md"]

Line 97 — Return concatenated:

["config.py", "utils.py", "main.py", "README.md"]

Returns: ["config.py", "utils.py", "main.py", "README.md"]

Cycle fallback (lines 80–83): If the graph has cycles, uses in-degree sort instead — files imported by fewer others come first. Not a true topological sort, but a reasonable approximation.

Empty graph (lines 77–79): If no import edges exist at all, returns all files from DuckDB sorted alphabetically.

`detect_cycles(self) -> dict` (Line 99)

Detects circular dependencies in the file graph using strongly connected components and suggests minimum edges to break all cycles.

Example

Assume file_graph edges: a.py → b.py, b.py → c.py, c.py → a.py (cycle!)

Line 100 — self.file_graph.vcount() → 3 (not 0)

Line 103 — self.file_graph.is_dag() → False (has cycle)

Line 106 — self.file_graph.components(mode="STRONG"): Finds strongly connected components. A cycle group is any component with >1 vertex.

components = [[0, 1, 2]]   # all 3 vertices are in one cycle

Lines 107–110 — Filter to components with len > 1, map to names:

cycle_groups = [["a.py", "b.py", "c.py"]]

Line 113 — self.file_graph.feedback_arc_set(): Finds the minimum set of edge indices to remove to break all cycles.

fas = [2]   # e.g., edge index 2 = c.py → a.py

Lines 114–119 — Map edge indices to (source_name, target_name) tuples:

edges_to_break = [("c.py", "a.py")]

Returns:

{
  "has_cycles": True,
  "cycle_groups": [["a.py", "b.py", "c.py"]],
  "edges_to_break": [("c.py", "a.py")]
}

If no cycles: returns {"has_cycles": False, "cycle_groups": [], "edges_to_break": []}.

`centrality(self, top_n: int = 10) -> dict` (Line 126)

Returns top files by two centrality metrics: betweenness (most import paths pass through this file) and PageRank (transitively depended on by most code).

Example

Input: top_n = 2
Assume file_graph: app.py → api.py → config.py, app.py → config.py

Line 128 — self.file_graph.vcount() → 3 (not 0)

Line 131 — names = self.file_graph.vs["name"]:

names = ["app.py", "api.py", "config.py"]

Line 133 — betweenness = self.file_graph.betweenness(): Betweenness centrality for each vertex — how many shortest paths pass through it.

betweenness = [0.0, 0.0, 0.0]   # small graph, no intermediaries

Line 134 — page_rank = self.file_graph.pagerank(): PageRank score — importance based on who links to you.

page_rank = [0.15, 0.28, 0.57]   # config.py highest (imported by both)

Lines 137–139 — zip(names, betweenness), sort descending, take top 2:

top_betweenness = [("app.py", 0.0), ("api.py", 0.0)]

Lines 141–143 — zip(names, page_rank), sort descending, take top 2:

top_pagerank = [("config.py", 0.57), ("api.py", 0.28)]

Returns:

{
  "betweenness": [{"file": "app.py", "score": 0.0}, {"file": "api.py", "score": 0.0}],
  "pagerank": [{"file": "config.py", "score": 0.570000}, {"file": "api.py", "score": 0.280000}]
}

Returns {"betweenness": [], "pagerank": []} if the graph is empty.

`shortest_path(self, source: str, target: str) -> list[str]` (Line 149)

Finds the shortest import chain from one file to another.

Example

Input: source = "app.py", target = "config.py"
Assume file_graph: app.py → api.py → config.py, app.py → config.py

Line 151 — source_index = self._find_vertex(self.file_graph, "app.py") → 0

Line 152 — target_index = self._find_vertex(self.file_graph, "config.py") → 2

Line 154 — Both are not None, continue.

Line 156 — self.file_graph.get_shortest_paths(0, 2): Finds shortest path(s) from vertex 0 to vertex 2. There's a direct edge app.py → config.py (length 1).

paths = [[0, 2]]

Line 157 — paths is not empty and paths[0] is not empty.

Line 159 — Map indices to names:

[file_graph.vs[0]["name"], file_graph.vs[2]["name"]] = ["app.py", "config.py"]

Returns: ["app.py", "config.py"]

Returns [] if either file is not in the graph, or if no path exists.

`get_graph_stats(self) -> dict` (Line 161)

Returns summary statistics for both graphs: vertex count, edge count, and whether each is a DAG (directed acyclic graph).

Example

Assume file_graph: 3 vertices, 2 edges, no cycles
Assume module_graph: 2 vertices, 1 edge, no cycles

Lines 163–174 — Build stats dict:

Returns:

{
  "file_graph": {"vertices": 3, "edges": 2, "is_dag": True},
  "module_graph": {"vertices": 2, "edges": 1, "is_dag": True}
}

For empty graphs (vcount() == 0), is_dag defaults to True.

Summary

Function/Method	Line	Purpose
`GraphRuntime.__init__`	22	Build both igraphs from DuckDB edges
`GraphRuntime._build_graph`	32	Static: (source, target) tuples → directed igraph
`GraphRuntime.rebuild`	38	Re-read DuckDB, rebuild both igraphs
`GraphRuntime._find_vertex`	48	Look up vertex index by name
`GraphRuntime.get_blast_radius`	54	Transitive reverse deps of a file
`GraphRuntime.topological_sort`	67	All files in dependency-first reading order
`GraphRuntime.detect_cycles`	99	Find cycles + minimum edges to break them
`GraphRuntime.centrality`	126	Top files by betweenness and PageRank
`GraphRuntime.shortest_path`	149	Shortest import chain between two files
`GraphRuntime.get_graph_stats`	161	Vertex/edge counts + DAG status for both graphs

graph graph_runtime

graph_runtime.py

Key Concepts

Class: GraphRuntime (Line 11)

__init__(self, store: GraphStore) (Line 22)

Example

_build_graph(edges: list[tuple[str, str]]) -> ig.Graph (Line 32) — static

Example

rebuild(self) (Line 38)

Example

_find_vertex(self, graph: ig.Graph, name: str) -> Optional[int] (Line 48)

Example

get_blast_radius(self, file_path: str) -> list[str] (Line 54)

Example

topological_sort(self) -> list[str] (Line 67)

Example

detect_cycles(self) -> dict (Line 99)

Example

centrality(self, top_n: int = 10) -> dict (Line 126)

Example

shortest_path(self, source: str, target: str) -> list[str] (Line 149)

Example

get_graph_stats(self) -> dict (Line 161)

Example

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Class: `GraphRuntime` (Line 11)

`init(self, store: GraphStore)` (Line 22)

`_build_graph(edges: list[tuple[str, str]]) -> ig.Graph` (Line 32) — static

`rebuild(self)` (Line 38)

`_find_vertex(self, graph: ig.Graph, name: str) -> Optional[int]` (Line 48)

`get_blast_radius(self, file_path: str) -> list[str]` (Line 54)

`topological_sort(self) -> list[str]` (Line 67)

`detect_cycles(self) -> dict` (Line 99)

`centrality(self, top_n: int = 10) -> dict` (Line 126)

`shortest_path(self, source: str, target: str) -> list[str]` (Line 149)

`get_graph_stats(self) -> dict` (Line 161)