api state
Single source of truth for all runtime state shared between the MCP server and the FastAPI HTTP API: stores, agents, analysis caches, and graph objects.
| Term | Definition | Example |
|---|---|---|
| edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py. |
| embedding | A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors. | The code def add(a, b): return a+b might become [0.12, -0.45, 0.78, ...] (e.g. 1536 numbers with OpenAI's text-embedding-3-small). |
| vector store | A database optimized for storing embeddings and finding the most similar ones quickly. | ChromaDB stores code chunk embeddings and returns the 5 most similar chunks to your query. |
| ChromaDB | An open-source vector database for storing and searching embeddings. Used here to store code chunks. | `collection.query(query_texts=["scan files"], n_results=5)` returns the 5 closest code chunks. |
| DuckDB | An embedded SQL database (like SQLite, but optimized for analytics). Used here to store file metadata and import edges. | `conn.execute("SELECT path FROM files WHERE language='python'").fetchall()` returns all Python file paths. |
| chunk | A piece of source code (usually one function or class) stored as a unit for search. | The function `def scan_directory(root): ...` (20 lines) is one chunk. |
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | `def add(a, b): return a+b` becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| glob pattern | A wildcard pattern for matching file paths. `*` matches any characters within a single path segment; `**` matches across directory levels. | `**/*.py` matches all Python files in any subdirectory; `src/*.ts` matches TypeScript files directly inside src/ only. |
| igraph | A high-performance graph-analysis library written in C, with Python bindings; much faster than pure-Python graph libraries. | `ig.Graph.TupleList([("a.py", "b.py"), ("b.py", "c.py")], directed=True)` builds a directed graph with 3 vertices and 2 edges. |
| hunk | A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file). | A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed). |
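The AST entry can be checked directly with Python's standard `ast` module. This minimal sketch parses the glossary's `add` example and inspects the resulting nodes; it is illustrative only and is not taken from codewalk's own parsing code.

```python
import ast

source = "def add(a, b): return a+b"
tree = ast.parse(source)  # Module node wrapping the whole snippet

func = tree.body[0]       # FunctionDef node for `add`
ret = func.body[0]        # Return node inside the function body

print(type(func).__name__, func.name)                          # FunctionDef add
print([arg.arg for arg in func.args.args])                     # ['a', 'b']
print(type(ret.value).__name__, type(ret.value.op).__name__)   # BinOp Add
```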
These module-level private variables hold the entire runtime state. Every getter and setter in this file reads or writes them.
_store = None # VectorStore (ChromaDB wrapper)
_agent = None # Compiled LangGraph agent
_modules_result = None # detect_modules() output
_analyze_result = None # Last /analyze result dict
_files = None # scan_directory() result (list of file dicts)
_deps = None # build_dependency_graph() result
_repo_path = None # Target repo being analyzed
_graph_store = None # GraphStore (DuckDB persistent)
_graph_runtime = None # GraphRuntime (igraph in-memory)
_doc_store = None # DocStore (ChromaDB "docs" collection)
_banner_shown = False # One-time upgrade banner flag
_init_lock = threading.Lock() # Prevents double-initialization
Returns the VectorStore instance. Raises RuntimeError if no codebase has been analyzed yet.
State: _store = VectorStore(persist_dir="/app/.codewalk/chroma")
Line 35: _store is None → False (store exists)
Line 37: return VectorStore(persist_dir="/app/.codewalk/chroma")
State: _store = None
Line 35: _store is None → True
Line 36: raises RuntimeError("No codebase analyzed yet. Call POST /analyze first.")
Returns the compiled LangGraph agent. Raises RuntimeError if not initialized.
State: _agent = <CompiledGraph object>
Line 42: _agent is None → False
Line 44: return <CompiledGraph object>
Returns the modules detection result dict. Raises RuntimeError if not initialized.
State: _modules_result = {"modules": {"api": {...}, "analysis": {...}}, "stats": {"total_files": 45, "total_modules": 5}, "module_graph": {"api": ["analysis"]}}
Line 49: _modules_result is None → False
Line 51: return {"modules": {"api": {...}, "analysis": {...}}, "stats": {...}, "module_graph": {...}}
Returns the last analyze result dict (contains repo_path, files_scanned, etc.). Raises if not initialized.
State: _analyze_result = {"repo_path": "/home/user/my-app", "files_scanned": 127, "chunks_created": 843}
Line 56: _analyze_result is None → False
Line 58: return {"repo_path": "/home/user/my-app", "files_scanned": 127, "chunks_created": 843}
Returns the cached scan_directory() result. Raises if not initialized.
State: _files = [{"path": "src/main.py", "language": "python", "size": 2048}, {"path": "src/utils.py", "language": "python", "size": 512}]
Line 63: _files is None → False
Line 65: return [{"path": "src/main.py", ...}, {"path": "src/utils.py", ...}]
Returns the cached build_dependency_graph() result. Raises if not initialized.
State: _deps = {"graph": {"src/main.py": ["src/utils.py"], "src/utils.py": []}, "reverse": {"src/utils.py": ["src/main.py"]}}
Line 70: _deps is None → False
Line 72: return {"graph": {"src/main.py": ["src/utils.py"], ...}, "reverse": {...}}
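The `"reverse"` entry in `_deps` is the forward `"graph"` map with every edge flipped: for each file, the files that import it. How `build_dependency_graph()` actually computes it is not shown here; this sketch (with the hypothetical helper `invert_dependencies`) simply inverts the forward adjacency map from the example state.

```python
from collections import defaultdict


def invert_dependencies(graph):
    """Map each imported file to the list of files that import it."""
    reverse = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)  # edge source -> target becomes target <- source
    return dict(reverse)


deps_graph = {"src/main.py": ["src/utils.py"], "src/utils.py": []}
print(invert_dependencies(deps_graph))  # {'src/utils.py': ['src/main.py']}
```

Files that nothing imports (here `src/main.py`) get no key, which matches the example state above.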
Returns the GraphRuntime (igraph-based in-memory graph). Raises if not initialized.
State: _graph_runtime = GraphRuntime(<GraphStore>)
Line 77: _graph_runtime is None → False
Line 79: return GraphRuntime(<GraphStore>)
Returns the GraphStore (DuckDB). Returns None if not initialized (does NOT raise).
State: _graph_store = None
Line 83: return None
Returns the DocStore instance, creating it lazily on first call. Unlike the code stores, this doesn't require ensure_initialized() — docs can be indexed independently of codebase analysis.
State: _doc_store = None (first call)
Line: _doc_store is None → True
Line: col_name = f"{get_collection_name()}_docs" → e.g. "myrepo_docs"
Line: _doc_store = DocStore(persist_dir, collection_name="myrepo_docs") → creates ChromaDB client
Line: _doc_store.create_collection() → gets/creates "myrepo_docs" collection
Line: return DocStore instance
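Lazy creation as traced above can be sketched as follows. `DocStore` here is a hypothetical stand-in class that only records calls (the real class wraps a ChromaDB client), and the `persist_dir` and `repo_name` parameters are illustrative; the real code derives the name from `get_collection_name()`. This sketch does not lock, so it is not safe against two threads racing on the first call.

```python
class DocStore:
    """Hypothetical stand-in for the ChromaDB-backed DocStore."""

    def __init__(self, persist_dir, collection_name):
        self.persist_dir = persist_dir
        self.collection_name = collection_name
        self.collections_created = 0

    def create_collection(self):
        self.collections_created += 1  # the real class gets/creates a Chroma collection


_doc_store = None


def get_doc_store(persist_dir="/home/user/myrepo/.codewalk/chroma", repo_name="myrepo"):
    """Create the DocStore on first use, then keep returning the same instance."""
    global _doc_store
    if _doc_store is None:
        _doc_store = DocStore(persist_dir, collection_name=f"{repo_name}_docs")
        _doc_store.create_collection()
    return _doc_store


first = get_doc_store()
second = get_doc_store()
print(first is second, first.collection_name, first.collections_created)
# True myrepo_docs 1
```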
CODEWALK_VERSION is imported from __version__ in src/codewalk/__init__.py, the single source of truth for the version string. _check_upgrade_banner() and _write_meta() use it to track which version built the index.
Sets all state after a successful /analyze. This is the main "write" function — everything else reads from what this sets.
Input:
store = VectorStore(persist_dir="/app/.codewalk/chroma")
agent = <CompiledGraph>
modules_result = {"modules": {"api": {"files": ["main.py"], "file_count": 1, "languages": {"python": 1}}}, "stats": {"total_files": 1, "total_modules": 1}, "module_graph": {}}
analyze_result = {"repo_path": "/app", "files_scanned": 1, "chunks_created": 10}
files = [{"path": "main.py", "language": "python"}]
deps = {"graph": {"main.py": []}}
repo_path = "/app"
embedded_chunks = [{"id": "chunk-1", "text": "def hello():", "metadata": {"file": "main.py"}}]
Line 91: the global statement makes all module-level state variables assignable inside the function
Line 92: _store → VectorStore(persist_dir="/app/.codewalk/chroma")
Line 93: _agent → <CompiledGraph>
Line 94: _modules_result → {"modules": {"api": {...}}, ...}
Line 95: _analyze_result → {"repo_path": "/app", "files_scanned": 1, ...}
Line 96: _files → [{"path": "main.py", ...}]
Line 97: _deps → {"graph": {"main.py": []}}
Line 99: repo_path is truthy → _repo_path → "/app"
Line 102: files and deps and modules_result → all truthy, enter block
Line 103: repo → "/app" (from _repo_path)
Line 104: db_path → "/app/.codewalk/graph.duckdb"
Line 105: _graph_store is not None → False (first call), skip close
Line 106: _graph_store → GraphStore("/app/.codewalk/graph.duckdb")
Line 107: _graph_store.populate_from_analysis(files, deps, modules_result, embedded_chunks=embedded_chunks) — writes files, edges, modules, and chunks into DuckDB
Line 109: _graph_runtime → GraphRuntime(_graph_store) — builds an igraph from DuckDB data
Line 110: _agent → create_agent(...) — recreates agent with graph-aware tools
Updates cached analysis data and rebuilds the graph. Does NOT re-embed. Used by POST /refresh.
Input:
files = [{"path": "main.py", ...}, {"path": "new_file.py", ...}]
deps = {"graph": {"main.py": ["new_file.py"], "new_file.py": []}}
modules_result = {"modules": {"api": {"files": ["main.py", "new_file.py"], "file_count": 2}}, ...}
Line 116: _files → [{"path": "main.py", ...}, {"path": "new_file.py", ...}]
Line 117: _deps → {"graph": {"main.py": ["new_file.py"], "new_file.py": []}}
Line 118: _modules_result → updated dict with 2 files
Line 121: repo → "/app" (from _repo_path)
Line 122: db_path → "/app/.codewalk/graph.duckdb"
Lines 123-124: close the old _graph_store if it exists
Line 125: _graph_store → new GraphStore("/app/.codewalk/graph.duckdb")
Line 126: populate_from_analysis(...) — writes fresh data to DuckDB
Line 127: _graph_runtime → new GraphRuntime(_graph_store)
Returns the current repo path from state, falling back to settings.repo_path.
State: _repo_path = "/home/user/my-app"
Line 133: return "/home/user/my-app"
State: _repo_path = None, settings.repo_path = "/default/repo"
Line 133: _repo_path is None → return "/default/repo"
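The fallback can be sketched in a few lines. `settings` here is a hypothetical `SimpleNamespace` standing in for the real settings object. The sketch tests `is not None` explicitly; a one-liner such as `_repo_path or settings.repo_path` behaves the same for `None` but would also fall back when `_repo_path` is an empty string.

```python
from types import SimpleNamespace

settings = SimpleNamespace(repo_path="/default/repo")  # hypothetical settings object
_repo_path = None


def get_repo_path():
    """Prefer the path set by the last analyze; otherwise use the configured default."""
    return _repo_path if _repo_path is not None else settings.repo_path


print(get_repo_path())  # /default/repo
_repo_path = "/home/user/my-app"
print(get_repo_path())  # /home/user/my-app
```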
Derives the ChromaDB collection name from the repo path by taking the last path segment.
State: _repo_path = "/home/user/my-project"
Line 142: path → "/home/user/my-project"
Line 143: path.rstrip("/") → "/home/user/my-project"
Line 143: .split("/") → ["", "home", "user", "my-project"]
Line 143: [-1] → "my-project"
Line 143: "my-project" or "codebase" → "my-project"
Line 143: return "my-project"
Edge case — _repo_path = "/":
Line 143: "/".rstrip("/") → ""
Line 143: "".split("/") → [""]
Line 143: [-1] → ""
Line 143: "" or "codebase" → "codebase"
Line 143: return "codebase"
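The expression traced on line 143 can be reproduced as a standalone function. The name `collection_name_for` and its explicit `path` parameter are hypothetical (the real function reads `_repo_path` from state), but the expression itself follows the trace step by step.

```python
def collection_name_for(path):
    """Last path segment of the repo path, or "codebase" if that segment is empty."""
    return path.rstrip("/").split("/")[-1] or "codebase"


print(collection_name_for("/home/user/my-project"))   # my-project
print(collection_name_for("/home/user/my-project/"))  # my-project (trailing slash stripped)
print(collection_name_for("/"))                       # codebase
```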
Returns the ChromaDB directory path, {repo_path}/.codewalk/chroma.
State: _repo_path = "/home/user/my-project"
Line 148: repo → "/home/user/my-project"
Line 149: return "/home/user/my-project/.codewalk/chroma"
Re-scans files and rebuilds the in-memory caches (_files, _deps, _modules_result). Does NOT touch DuckDB.
State: _repo_path = "/home/user/my-app"
Line 155: repo_path → "/home/user/my-app"
Line 156: _repo_path → "/home/user/my-app"
Line 157: _files → scan_directory("/home/user/my-app") → e.g. [{"path": "src/main.py", ...}, ...] (127 files)
Line 158: _deps → build_dependency_graph(_files) → {"graph": {...}, "reverse": {...}}
Line 159: _modules_result → detect_modules(_files, _deps) → {"modules": {...}, "stats": {...}}
Line 160: logs "[cache] Memory caches rebuilt: 127 files, 95 in graph, 5 modules"
Re-scans files, rebuilds in-memory caches, AND repopulates DuckDB. Used after analyze, reindex, or refresh.
Input: embedded_chunks = [{"id": "c1", "text": "...", "metadata": {"file": "main.py"}}]
Line 167: calls _rebuild_memory_caches() — sets _files, _deps, _modules_result
Line 169: repo_path → "/home/user/my-app"
Line 170: db_path → "/home/user/my-app/.codewalk/graph.duckdb"
Lines 171-172: close the old _graph_store if it exists
Line 173: _graph_store → new GraphStore("/home/user/my-app/.codewalk/graph.duckdb")
Line 174: populate_from_analysis(_files, _deps, _modules_result, embedded_chunks=embedded_chunks) — writes everything to DuckDB
Line 176: _graph_runtime → new GraphRuntime(_graph_store) — builds igraph from DuckDB
Auto-loads index and analysis cache from disk if not already in memory. Uses double-checked locking to be thread-safe.
State: _store = VectorStore(...), _modules_result = {...}
Line 185: _store is not None and _modules_result is not None → True
Line 186: return immediately (no work done)
State: _store = None, _modules_result = None
Line 185: _store is not None and _modules_result is not None → False
Line 188: acquires _init_lock
Line 190: double-check: still None → proceed
Line 193: calls _ensure_initialized_locked()
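Inner initialization logic, run under _init_lock. Loads the ChromaDB index from disk, rebuilds the memory caches, opens the existing DuckDB file without repopulating it from a fresh analysis (only the chunks table is backfilled, and only if it is empty), and recreates the agent.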
State: _repo_path = "/home/user/my-app", chroma dir exists on disk with 843 chunks
Line 201: chroma → "/home/user/my-app/.codewalk/chroma"
Line 202: os.path.isdir(chroma) → True (index exists on disk)
Line 206: calls _rebuild_memory_caches() — sets _files, _deps, _modules_result
Line 209: repo → "/home/user/my-app"
Line 210: db_path → "/home/user/my-app/.codewalk/graph.duckdb"
Line 213: _graph_store → GraphStore("/home/user/my-app/.codewalk/graph.duckdb") (opens existing DB)
Line 214: _graph_runtime → GraphRuntime(_graph_store)
Line 216: _store → VectorStore(persist_dir="/home/user/my-app/.codewalk/chroma")
Line 217: _store.create_collection("my-app") — opens existing collection
Line 220: _graph_store.populate_chunks_from_chromadb(_store) — backfills chunks table if empty
Line 222: count → 843
Line 224: logs "[ensure_initialized] Loaded 843 chunks from /home/user/my-app/.codewalk/chroma"
Line 227: _analyze_result → {"repo_path": "/home/user/my-app", "skipped": True}
Line 230: _agent is None → True, _store is not None → True, _modules_result is not None → True
Line 231: _agent → create_agent(...) (agent recreated with full state)
Line 232: logs "[ensure_initialized] Agent recreated"
Shows a one-time banner if the on-disk index was built with an older version of codewalk.
Input: repo_path = "/home/user/my-app"
Disk: .codewalk/meta.json contains {"codewalk_version": "1.2.0"}
Code: CODEWALK_VERSION = "1.3.0"
Line 240: _banner_shown → False
Line 243: meta_path → "/home/user/my-app/.codewalk/meta.json"
Line 244: os.path.exists(meta_path) → True
Line 247: reads file → meta → {"codewalk_version": "1.2.0"}
Line 251: stored_version → "1.2.0"
Line 254: current_version → "1.3.0"
Line 256: "1.2.0" < "1.3.0" → True
Lines 257-260: log the banner:
⚡ Codewalk v1.3.0 — index was built with v1.2.0
Run codewalk_analyze_codebase to rebuild with latest features.
Line 262: _banner_shown → True (won't show again this process)
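The trace shows the check as "1.2.0" < "1.3.0". If the versions are compared as plain strings, that happens to work here but not in general: string comparison goes character by character, so "1.10.0" < "1.9.0" is True. A minimal sketch of a numeric comparison, assuming purely dotted-integer versions with no pre-release suffixes (`packaging.version.Version` is the usual tool when suffixes are possible); `parse_version` and `index_is_stale` are hypothetical names.

```python
def parse_version(v):
    """Turn "1.10.0" into (1, 10, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in v.split("."))


def index_is_stale(stored_version, current_version):
    """True when the on-disk index was built by an older codewalk version."""
    return parse_version(stored_version) < parse_version(current_version)


print("1.10.0" < "1.9.0")                 # True (string comparison misorders these)
print(index_is_stale("1.2.0", "1.3.0"))   # True
print(index_is_stale("1.10.0", "1.9.0"))  # False
```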