-
Notifications
You must be signed in to change notification settings - Fork 0
graph graph_runtime
In-memory graph for fast traversal and analysis. Loads edges from GraphStore (DuckDB) into igraph (C-speed library). Rebuilt on every startup — takes milliseconds even for large codebases. Maintains two separate graphs: file_graph (file-level import edges) and module_graph (module-level dependency edges).
| Term | Definition | Example |
|---|---|---|
| vertex | A node in a graph representing a single entity (a file, module, etc.). | In a file graph, pipeline.py is one vertex. |
| edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py. |
| directed graph | A graph where edges have direction — A→B doesn't mean B→A. |
pipeline.py → scanner.py means pipeline imports scanner, not the reverse. |
| DAG | Directed Acyclic Graph — a directed graph with no cycles (no circular paths back to the same node). | A→B→C is a DAG. A→B→C→A is NOT (it has a cycle). |
| topological sort | Ordering vertices so that for every edge A→B, A comes after B. Only works on DAGs. | If A imports B and B imports C, topological order is: C, B, A (dependencies first). |
| cycle | A path in a graph that starts and ends at the same vertex. A→B→C→A is a cycle. | If auth.py imports user.py and user.py imports auth.py, that's a cycle. |
| strongly connected component | A group of vertices where every vertex can reach every other vertex. These form cycle groups. | If A→B→C→A, then {A, B, C} is one strongly connected component. |
| feedback arc set | The minimum set of edges you must remove to break all cycles and make the graph a DAG. | In cycle A→B→C→A, removing just A→B breaks the cycle. {A→B} is a feedback arc set. |
| betweenness centrality | How often a vertex sits on the shortest path between other vertices. High = important hub file. |
config.py with betweenness=45.0 means 45 shortest paths between other files pass through it. |
| pagerank | Algorithm that ranks vertices by importance based on how many other important vertices link to them. Originally used by Google for web pages. |
utils.py with pagerank=0.12 is important because many important files import it. |
| in-degree | Number of edges pointing INTO a vertex (how many files import this file). |
utils.py with in-degree=15 means 15 other files import it. |
| neighborhood | All vertices reachable from a starting vertex within a given number of hops. |
neighborhood(pipeline.py, order=2, mode="in") = all files that import pipeline.py, plus files that import THOSE files. |
| blast radius | All files that would be affected if a given file changes — found by following reverse import edges transitively. | If A imports B and C imports A, changing B has blast radius = {A, C}. |
| transitive dependency | An indirect dependency through a chain. If A imports B and B imports C, then A transitively depends on C. | Changing C could break A even though A never directly imports C. |
| DuckDB | An embedded SQL database (like SQLite but optimized for analytics). Used here to store file metadata and import edges. |
conn.execute("SELECT path FROM files WHERE language='python'").fetchall() returns all Python file paths. |
| AST | Abstract Syntax Tree — a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). |
def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| igraph | A high-performance C library for graph analysis with Python bindings. Much faster than pure-Python graph libraries. |
ig.Graph.TupleList([("a.py", "b.py"), ("b.py", "c.py")], directed=True) builds a graph with 3 vertices and 2 edges instantly. |
| orphan file | A file that has no import relationships — it neither imports nor is imported by any other file. |
constants.py that's only read at runtime (not imported) would be an orphan in the import graph. |
Wraps two igraph.Graph instances built from DuckDB edges. All graph algorithms (blast radius, topological sort, cycle detection, centrality, shortest path) run on these in-memory graphs.
Builds both in-memory graphs from DuckDB edge data.
Input: store = GraphStore instance with:
import edges: [("main.py", "config.py"), ("main.py", "utils.py")]
module dep edges: [("api", "core")]
Line 23 — self.store = store:
Stores the GraphStore reference for later queries (e.g. get_all_files()).
Line 24 — self.file_graph = self._build_graph(store.get_import_edges()):
store.get_import_edges() returns [("main.py", "config.py"), ("main.py", "utils.py")]
_build_graph builds a directed igraph with 3 vertices and 2 edges
self.file_graph.vcount() = 3 # main.py, config.py, utils.py
self.file_graph.ecount() = 2 # main→config, main→utils
Line 25 — self.module_graph = self._build_graph(store.get_module_dep_edges()):
store.get_module_dep_edges() returns [("api", "core")]
self.module_graph.vcount() = 2 # api, core
self.module_graph.ecount() = 1 # api→core
Lines 26–30 — Logs: [GraphRuntime] Built file_graph: 3 vertices, 2 edges | module_graph: 2 vertices, 1 edges
Returns: A GraphRuntime instance with self.store, self.file_graph, self.module_graph.
Builds a directed igraph from (source, target) string tuples.
Input: edges = [("main.py", "config.py"), ("main.py", "utils.py")]
Line 34 — if not edges: → False (list has 2 items)
Line 36 — ig.Graph.TupleList(edges, directed=True):
igraph creates vertices "main.py", "config.py", "utils.py" and directed edges between them automatically from the tuples.
Returns: An ig.Graph with 3 vertices and 2 directed edges.
If edges is empty, returns an empty directed graph (ig.Graph(directed=True) with 0 vertices).
Rebuilds both in-memory graphs from DuckDB. Call after re-analysis to pick up new/changed edges.
# After re-running analysis that added a new import: utils.py → config.py
runtime.rebuild()
Line 40 — self.file_graph = self._build_graph(self.store.get_import_edges()):
Re-reads all import edges from DuckDB and builds a fresh igraph. Now has 3 edges instead of 2.
Line 41 — self.module_graph = self._build_graph(self.store.get_module_dep_edges()):
Re-reads all module dep edges.
Lines 42–46 — Logs updated vertex/edge counts.
Returns: None — both self.file_graph and self.module_graph are replaced.
Finds a vertex index by its name attribute. Returns None if the vertex doesn't exist in the graph.
Input: graph = file_graph (vertices: main.py@0, config.py@1, utils.py@2)
name = "config.py"
Line 50 — graph.vs.find(name="config.py"):
igraph searches its vertex sequence for a vertex with name == "config.py". Found at index 1.
Line 50 — .index extracts the integer index:
1
Returns: 1
If name = "missing.py", graph.vs.find(name="missing.py") raises ValueError, caught on line 51, returns None.
Returns all files that would be affected if the given file changes — i.e., all transitive reverse dependencies (files that import this file, directly or indirectly).
Input: file_path = "config.py"
Assume file_graph edges: main.py → config.py, app.py → main.py
Line 56 — start = self._find_vertex(self.file_graph, "config.py") → 1
Line 57–58 — Not None, so continue.
Line 60 — self.file_graph.neighborhood(1, order=999, mode="in"):
-
mode="in"means follow edges backwards (who imports me?) -
order=999means unlimited depth (transitive) - Starting from
config.py(index 1):-
main.py(index 0) importsconfig.py→ included -
app.py(index 2) importsmain.py→ included (transitive)
-
affected_indices = [1, 0, 2] # includes self (1)
Lines 62–65 — Filter out start (index 1) and map indices to names:
[file_graph.vs[0]["name"], file_graph.vs[2]["name"]] = ["main.py", "app.py"]
Returns: ["main.py", "app.py"]
Returns [] if the file isn't in the graph.
Returns all files in dependency order: leaf dependencies first, dependents later. This is the recommended reading order. Includes orphan files (files with no import edges) appended at the end.
Assume file_graph edges: main.py → config.py, main.py → utils.py
Assume files table also has: README.md (no import edges — orphan)
Line 77 — self.file_graph.vcount() → 3 (not 0, so continue)
Line 80 — self.file_graph.is_dag() → True (no cycles)
Lines 84–85 — self.file_graph.topological_sorting(mode="in"):
-
mode="in": for edge A→B (A imports B), B comes first - config.py and utils.py have no outgoing imports → they come first
- main.py imports both → it comes last
sorted_indices = [1, 2, 0] # config.py, utils.py, main.py
Line 86 — Map to names:
sorted_files = ["config.py", "utils.py", "main.py"]
Line 89 — self.store.get_all_files() returns ["config.py", "utils.py", "main.py", "README.md"]
Line 90 — graph_files = {"config.py", "utils.py", "main.py"}
Line 91 — orphans = sorted(f for f in all_files if f not in graph_files):
orphans = ["README.md"]
Line 97 — Return concatenated:
["config.py", "utils.py", "main.py", "README.md"]
Returns: ["config.py", "utils.py", "main.py", "README.md"]
Cycle fallback (lines 80–83): If the graph has cycles, uses in-degree sort instead — files imported by fewer others come first. Not a true topological sort, but a reasonable approximation.
Empty graph (lines 77–79): If no import edges exist at all, returns all files from DuckDB sorted alphabetically.
Detects circular dependencies in the file graph using strongly connected components and suggests minimum edges to break all cycles.
Assume file_graph edges: a.py → b.py, b.py → c.py, c.py → a.py (cycle!)
Line 100 — self.file_graph.vcount() → 3 (not 0)
Line 103 — self.file_graph.is_dag() → False (has cycle)
Line 106 — self.file_graph.components(mode="STRONG"):
Finds strongly connected components. A cycle group is any component with >1 vertex.
components = [[0, 1, 2]] # all 3 vertices are in one cycle
Lines 107–110 — Filter to components with len > 1, map to names:
cycle_groups = [["a.py", "b.py", "c.py"]]
Line 113 — self.file_graph.feedback_arc_set():
Finds the minimum set of edge indices to remove to break all cycles.
fas = [2] # e.g., edge index 2 = c.py → a.py
Lines 114–119 — Map edge indices to (source_name, target_name) tuples:
edges_to_break = [("c.py", "a.py")]
Returns:
{
"has_cycles": True,
"cycle_groups": [["a.py", "b.py", "c.py"]],
"edges_to_break": [("c.py", "a.py")]
}
If no cycles: returns {"has_cycles": False, "cycle_groups": [], "edges_to_break": []}.
Returns top files by two centrality metrics: betweenness (most import paths pass through this file) and PageRank (transitively depended on by most code).
Input: top_n = 2
Assume file_graph: app.py → api.py → config.py, app.py → config.py
Line 128 — self.file_graph.vcount() → 3 (not 0)
Line 131 — names = self.file_graph.vs["name"]:
names = ["app.py", "api.py", "config.py"]
Line 133 — betweenness = self.file_graph.betweenness():
Betweenness centrality for each vertex — how many shortest paths pass through it.
betweenness = [0.0, 0.0, 0.0] # small graph, no intermediaries
Line 134 — page_rank = self.file_graph.pagerank():
PageRank score — importance based on who links to you.
page_rank = [0.15, 0.28, 0.57] # config.py highest (imported by both)
Lines 137–139 — zip(names, betweenness), sort descending, take top 2:
top_betweenness = [("app.py", 0.0), ("api.py", 0.0)]
Lines 141–143 — zip(names, page_rank), sort descending, take top 2:
top_pagerank = [("config.py", 0.57), ("api.py", 0.28)]
Returns:
{
"betweenness": [{"file": "app.py", "score": 0.0}, {"file": "api.py", "score": 0.0}],
"pagerank": [{"file": "config.py", "score": 0.570000}, {"file": "api.py", "score": 0.280000}]
}
Returns {"betweenness": [], "pagerank": []} if the graph is empty.
Finds the shortest import chain from one file to another.
Input: source = "app.py", target = "config.py"
Assume file_graph: app.py → api.py → config.py, app.py → config.py
Line 151 — source_index = self._find_vertex(self.file_graph, "app.py") → 0
Line 152 — target_index = self._find_vertex(self.file_graph, "config.py") → 2
Line 154 — Both are not None, continue.
Line 156 — self.file_graph.get_shortest_paths(0, 2):
Finds shortest path(s) from vertex 0 to vertex 2. There's a direct edge app.py → config.py (length 1).
paths = [[0, 2]]
Line 157 — paths is not empty and paths[0] is not empty.
Line 159 — Map indices to names:
[file_graph.vs[0]["name"], file_graph.vs[2]["name"]] = ["app.py", "config.py"]
Returns: ["app.py", "config.py"]
Returns [] if either file is not in the graph, or if no path exists.
Returns summary statistics for both graphs: vertex count, edge count, and whether each is a DAG (directed acyclic graph).
Assume file_graph: 3 vertices, 2 edges, no cycles
Assume module_graph: 2 vertices, 1 edge, no cycles
Lines 163–174 — Build stats dict:
Returns:
{
"file_graph": {"vertices": 3, "edges": 2, "is_dag": True},
"module_graph": {"vertices": 2, "edges": 1, "is_dag": True}
}
For empty graphs (vcount() == 0), is_dag defaults to True.
| Function/Method | Line | Purpose |
|---|---|---|
GraphRuntime.__init__ |
22 | Build both igraphs from DuckDB edges |
GraphRuntime._build_graph |
32 | Static: (source, target) tuples → directed igraph |
GraphRuntime.rebuild |
38 | Re-read DuckDB, rebuild both igraphs |
GraphRuntime._find_vertex |
48 | Look up vertex index by name |
GraphRuntime.get_blast_radius |
54 | Transitive reverse deps of a file |
GraphRuntime.topological_sort |
67 | All files in dependency-first reading order |
GraphRuntime.detect_cycles |
99 | Find cycles + minimum edges to break them |
GraphRuntime.centrality |
126 | Top files by betweenness and PageRank |
GraphRuntime.shortest_path |
149 | Shortest import chain between two files |
GraphRuntime.get_graph_stats |
161 | Vertex/edge counts + DAG status for both graphs |