
api state

aakash-anko edited this page May 28, 2026 · 3 revisions

api/state.py

Single source of truth for all runtime state shared between the MCP server and the FastAPI API — stores, agents, analysis caches, and graph objects.


Key Concepts

| Term | Definition | Example |
|---|---|---|
| edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py. |
| embedding | A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors. | The code def add(a, b): return a+b might become [0.12, -0.45, 0.78, ...] (1536 numbers for OpenAI's text-embedding-3-small). |
| vector store | A database optimized for storing embeddings and finding the most similar ones quickly. | ChromaDB stores code chunk embeddings and returns the 5 most similar chunks to your query. |
| ChromaDB | An open-source vector database for storing and searching embeddings. Used here to store code chunks. | collection.query(query_texts=["scan files"], n_results=5) returns the 5 closest code chunks. |
| DuckDB | An embedded SQL database (like SQLite but optimized for analytics). Used here to store file metadata and import edges. | conn.execute("SELECT path FROM files WHERE language='python'").fetchall() returns all Python file paths. |
| chunk | A piece of source code (usually one function or class) stored as a unit for search. | The function def scan_directory(root): ... (20 lines) is one chunk. |
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| glob pattern | A wildcard pattern for matching file paths. * matches anything in one directory, ** matches across directories. | **/*.py matches all Python files in any subdirectory. src/*.ts matches TypeScript files only in src/. |
| igraph | A high-performance C library for graph analysis with Python bindings. Much faster than pure-Python graph libraries. | ig.Graph.TupleList([("a.py", "b.py"), ("b.py", "c.py")], directed=True) builds a directed graph with 3 vertices and 2 edges. |
| hunk | A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file). | A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed). |
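
The AST entry can be checked with Python's standard `ast` module. This is a small illustrative sketch and not code taken from api/state.py; the variable names are chosen only for the demonstration.

```python
import ast

# Parse the one-line function used in the AST example.
tree = ast.parse("def add(a, b): return a + b")

func = tree.body[0]   # the FunctionDef node
ret = func.body[0]    # the Return node inside the function body

# Walk down the tree: FunctionDef -> Return -> BinOp
node_types = [type(func).__name__, type(ret).__name__, type(ret.value).__name__]
arg_names = [arg.arg for arg in func.args.args]

print(node_types)  # ['FunctionDef', 'Return', 'BinOp']
print(arg_names)   # ['a', 'b']
```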

Module-level globals — line 20

These private variables hold the entire runtime state. Every getter and setter in this file reads/writes these.

_store            = None   # VectorStore (ChromaDB wrapper)
_agent            = None   # Compiled LangGraph agent
_modules_result   = None   # detect_modules() output
_analyze_result   = None   # Last /analyze result dict
_files            = None   # scan_directory() result (list of file dicts)
_deps             = None   # build_dependency_graph() result
_repo_path        = None   # Target repo being analyzed
_graph_store      = None   # GraphStore (DuckDB persistent)
_graph_runtime    = None   # GraphRuntime (igraph in-memory)
_doc_store        = None   # DocStore (ChromaDB "docs" collection)
_banner_shown     = False  # One-time upgrade banner flag
_init_lock        = threading.Lock()  # Prevents double-initialization

get_store() — line 33

Returns the VectorStore instance. Raises RuntimeError if no codebase has been analyzed yet.

Example

State: _store = VectorStore(persist_dir="/app/.codewalk/chroma")

Line 35: _store is None → False (store exists)
Line 37: return VectorStore(persist_dir="/app/.codewalk/chroma")

State: _store = None

Line 35: _store is None → True
Line 36: raises RuntimeError("No codebase analyzed yet. Call POST /analyze first.")


get_agent() — line 40

Returns the compiled LangGraph agent. Raises RuntimeError if not initialized.

Example

State: _agent = <CompiledGraph object>

Line 42: _agent is None → False
Line 44: return <CompiledGraph object>


get_modules_result() — line 47

Returns the modules detection result dict. Raises RuntimeError if not initialized.

Example

State: _modules_result = {"modules": {"api": {...}, "analysis": {...}}, "stats": {"total_files": 45, "total_modules": 5}, "module_graph": {"api": ["analysis"]}}

Line 49: _modules_result is None → False
Line 51: return {"modules": {"api": {...}, "analysis": {...}}, "stats": {...}, "module_graph": {...}}


get_analyze_result() — line 54

Returns the last analyze result dict (contains repo_path, files_scanned, etc.). Raises if not initialized.

Example

State: _analyze_result = {"repo_path": "/home/user/my-app", "files_scanned": 127, "chunks_created": 843}

Line 56: _analyze_result is None → False
Line 58: return {"repo_path": "/home/user/my-app", "files_scanned": 127, "chunks_created": 843}


get_files() — line 61

Returns the cached scan_directory() result. Raises if not initialized.

Example

State: _files = [{"path": "src/main.py", "language": "python", "size": 2048}, {"path": "src/utils.py", "language": "python", "size": 512}]

Line 63: _files is None → False
Line 65: return [{"path": "src/main.py", ...}, {"path": "src/utils.py", ...}]


get_deps() — line 68

Returns the cached build_dependency_graph() result. Raises if not initialized.

Example

State: _deps = {"graph": {"src/main.py": ["src/utils.py"], "src/utils.py": []}, "reverse": {"src/utils.py": ["src/main.py"]}}

Line 70: _deps is None → False
Line 72: return {"graph": {"src/main.py": ["src/utils.py"], ...}, "reverse": {...}}


get_graph_runtime() — line 75

Returns the GraphRuntime (igraph-based in-memory graph). Raises if not initialized.

Example

State: _graph_runtime = GraphRuntime(<GraphStore>)

Line 77: _graph_runtime is None → False
Line 79: return GraphRuntime(<GraphStore>)


get_graph_store() — line 81

Returns the GraphStore (DuckDB). Returns None if not initialized (does NOT raise).

Example

State: _graph_store = None

Line 83: return None


get_doc_store()

Returns the DocStore instance, creating it lazily on first call. Unlike the code stores, this doesn't require ensure_initialized() — docs can be indexed independently of codebase analysis.

Example

State: _doc_store = None (first call)

Line: _doc_store is None → True
Line: col_name = f"{get_collection_name()}_docs" → e.g. "myrepo_docs"
Line: _doc_store = DocStore(persist_dir, collection_name="myrepo_docs") → creates ChromaDB client
Line: _doc_store.create_collection() → gets/creates "myrepo_docs" collection
Line: return DocStore instance
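
The lazy creation traced above can be sketched as follows. The `DocStore` class here is a hypothetical stand-in so the example runs on its own (the real one wraps ChromaDB), and the function parameters and default values are assumptions made for the illustration.

```python
_doc_store = None  # created on first use, then reused


class DocStore:
    """Hypothetical stand-in for the real ChromaDB-backed DocStore."""

    def __init__(self, persist_dir, collection_name):
        self.persist_dir = persist_dir
        self.collection_name = collection_name
        self.collection_ready = False

    def create_collection(self):
        # The real method gets or creates a ChromaDB collection.
        self.collection_ready = True


def get_doc_store(persist_dir="/tmp/.codewalk/chroma", collection="myrepo"):
    """Create the doc store on the first call; return the same object afterwards."""
    global _doc_store
    if _doc_store is None:
        _doc_store = DocStore(persist_dir, collection_name=f"{collection}_docs")
        _doc_store.create_collection()
    return _doc_store


first = get_doc_store()
second = get_doc_store()
print(first is second, first.collection_name)  # True myrepo_docs
```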


Version Import

CODEWALK_VERSION is imported from src/codewalk/__init__.__version__ (single source of truth). Used by _check_upgrade_banner() and _write_meta() to track which version built the index.


initialize() — line 87

Sets all state after a successful /analyze. This is the main "write" function — everything else reads from what this sets.

Example

Input:

store         = VectorStore(persist_dir="/app/.codewalk/chroma")
agent         = <CompiledGraph>
modules_result = {"modules": {"api": {"files": ["main.py"], "file_count": 1, "languages": {"python": 1}}}, "stats": {"total_files": 1, "total_modules": 1}, "module_graph": {}}
analyze_result = {"repo_path": "/app", "files_scanned": 1, "chunks_created": 10}
files         = [{"path": "main.py", "language": "python"}]
deps          = {"graph": {"main.py": []}}
repo_path     = "/app"
embedded_chunks = [{"id": "chunk-1", "text": "def hello():", "metadata": {"file": "main.py"}}]

Line 91: global declares all module-level variables writable
Line 92: _store → VectorStore(persist_dir="/app/.codewalk/chroma")
Line 93: _agent → <CompiledGraph>
Line 94: _modules_result → {"modules": {"api": {...}}, ...}
Line 95: _analyze_result → {"repo_path": "/app", "files_scanned": 1, ...}
Line 96: _files → [{"path": "main.py", ...}]
Line 97: _deps → {"graph": {"main.py": []}}
Line 99: repo_path is truthy, so _repo_path → "/app"

Line 102: files and deps and modules_result → all truthy, enter block
Line 103: repo → "/app" (from _repo_path)
Line 104: db_path → "/app/.codewalk/graph.duckdb"
Line 105: _graph_store is not None → False (first call), skip close
Line 106: _graph_store → GraphStore("/app/.codewalk/graph.duckdb")
Line 107: _graph_store.populate_from_analysis(files, deps, modules_result, embedded_chunks=embedded_chunks) writes files, edges, modules, and chunks into DuckDB
Line 109: _graph_runtime → GraphRuntime(_graph_store), which builds an igraph from DuckDB data
Line 110: _agent → create_agent(...), which recreates the agent with graph-aware tools


refresh() — line 113

Updates cached analysis data and rebuilds the graph. Does NOT re-embed. Used by POST /refresh.

Example

Input:

files          = [{"path": "main.py", ...}, {"path": "new_file.py", ...}]
deps           = {"graph": {"main.py": ["new_file.py"], "new_file.py": []}}
modules_result = {"modules": {"api": {"files": ["main.py", "new_file.py"], "file_count": 2}}, ...}

Line 116: _files → [{"path": "main.py", ...}, {"path": "new_file.py", ...}]
Line 117: _deps → {"graph": {"main.py": ["new_file.py"], "new_file.py": []}}
Line 118: _modules_result → updated dict with 2 files

Line 121: repo → "/app" (from _repo_path)
Line 122: db_path → "/app/.codewalk/graph.duckdb"
Line 123-124: Close old _graph_store if it exists
Line 125: _graph_store → new GraphStore("/app/.codewalk/graph.duckdb")
Line 126: populate_from_analysis(...) writes fresh data to DuckDB
Line 127: _graph_runtime → new GraphRuntime(_graph_store)


get_repo_path() — line 131

Returns the current repo path from state, falling back to settings.repo_path.

Example

State: _repo_path = "/home/user/my-app"

Line 133: return "/home/user/my-app"

State: _repo_path = None, settings.repo_path = "/default/repo"

Line 133: _repo_path is None → return "/default/repo"


get_collection_name() — line 136

Derives the ChromaDB collection name from the repo path by taking the last path segment.

Example

State: _repo_path = "/home/user/my-project"

Line 142: path → "/home/user/my-project"
Line 143: path.rstrip("/") → "/home/user/my-project"
Line 143: .split("/") → ["", "home", "user", "my-project"]
Line 143: [-1] → "my-project"
Line 143: "my-project" or "codebase" → "my-project"
Line 143: return "my-project"

Edge case: _repo_path = "/"

Line 143: "/".rstrip("/") → ""
Line 143: "".split("/") → [""]
Line 143: [-1] → ""
Line 143: "" or "codebase" → "codebase"
Line 143: return "codebase"
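
The expression traced above can be run in isolation. In this sketch the function takes the path as a parameter and has an illustrative name, whereas the real get_collection_name() reads the path from module state.

```python
def collection_name_from_path(path: str) -> str:
    """Take the last path segment, falling back to "codebase" when it is empty."""
    return path.rstrip("/").split("/")[-1] or "codebase"


print(collection_name_from_path("/home/user/my-project"))   # my-project
print(collection_name_from_path("/home/user/my-project/"))  # my-project
print(collection_name_from_path("/"))                       # codebase
```

The `or "codebase"` fallback works because an empty string is falsy in Python, so the root path and the empty path both resolve to the default name.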


chroma_path() — line 146

Returns the ChromaDB directory path: {repo_path}/.codewalk/chroma/.

Example

State: _repo_path = "/home/user/my-project"

Line 148: repo → "/home/user/my-project"
Line 149: return "/home/user/my-project/.codewalk/chroma"
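
This page names two on-disk locations under the repo: .codewalk/chroma and .codewalk/graph.duckdb. The sketch below shows one way they could be derived from a repo path. Both helper names are hypothetical, and `posixpath` is used so the output matches the POSIX-style paths in the examples on any platform; the actual file may build these strings differently.

```python
import posixpath


def chroma_dir(repo_path: str) -> str:
    """Directory holding the ChromaDB index for a repo."""
    return posixpath.join(repo_path, ".codewalk", "chroma")


def graph_db_path(repo_path: str) -> str:
    """DuckDB file holding the persistent graph for a repo."""
    return posixpath.join(repo_path, ".codewalk", "graph.duckdb")


print(chroma_dir("/home/user/my-project"))  # /home/user/my-project/.codewalk/chroma
print(graph_db_path("/app"))                # /app/.codewalk/graph.duckdb
```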


_rebuild_memory_caches() — line 152

Re-scans files and rebuilds the in-memory caches (_files, _deps, _modules_result). Does NOT touch DuckDB.

Example

State: _repo_path = "/home/user/my-app"

Line 155: repo_path → "/home/user/my-app"
Line 156: _repo_path → "/home/user/my-app"
Line 157: _files = scan_directory("/home/user/my-app") → e.g. [{"path": "src/main.py", ...}, ...] (127 files)
Line 158: _deps = build_dependency_graph(_files) → {"graph": {...}, "reverse": {...}}
Line 159: _modules_result = detect_modules(_files, _deps) → {"modules": {...}, "stats": {...}}
Line 160: logs "[cache] Memory caches rebuilt: 127 files, 95 in graph, 5 modules"


rebuild_analysis_cache() — line 163

Re-scans files, rebuilds in-memory caches, AND repopulates DuckDB. Used after analyze, reindex, or refresh.

Example

Input: embedded_chunks = [{"id": "c1", "text": "...", "metadata": {"file": "main.py"}}]

Line 167: calls _rebuild_memory_caches(), which sets _files, _deps, _modules_result
Line 169: repo_path → "/home/user/my-app"
Line 170: db_path → "/home/user/my-app/.codewalk/graph.duckdb"
Line 171-172: close old _graph_store if exists
Line 173: _graph_store → new GraphStore("/home/user/my-app/.codewalk/graph.duckdb")
Line 174: populate_from_analysis(_files, _deps, _modules_result, embedded_chunks=embedded_chunks) writes everything to DuckDB
Line 176: _graph_runtime → new GraphRuntime(_graph_store), which builds igraph from DuckDB


ensure_initialized() — line 179

Auto-loads index and analysis cache from disk if not already in memory. Uses double-checked locking to be thread-safe.

Example — already initialized

State: _store = VectorStore(...), _modules_result = {...}

Line 185: _store is not None and _modules_result is not None → True
Line 186: return immediately (no work done)

Example — cold start, index on disk

State: _store = None, _modules_result = None

Line 185: _store is not None and _modules_result is not None → False
Line 188: acquires _init_lock
Line 190: double-check: still None → proceed
Line 193: calls _ensure_initialized_locked()


_ensure_initialized_locked() — line 196

Inner initialization logic run under _init_lock. Loads ChromaDB index from disk, rebuilds memory caches, opens DuckDB (without repopulating), and recreates the agent.

Example

State: _repo_path = "/home/user/my-app", chroma dir exists on disk with 843 chunks

Line 201: chroma → "/home/user/my-app/.codewalk/chroma"
Line 202: os.path.isdir(chroma) → True (index exists on disk)

Line 206: calls _rebuild_memory_caches(), which sets _files, _deps, _modules_result

Line 209: repo → "/home/user/my-app"
Line 210: db_path → "/home/user/my-app/.codewalk/graph.duckdb"
Line 213: _graph_store → GraphStore("/home/user/my-app/.codewalk/graph.duckdb") (opens existing DB)
Line 214: _graph_runtime → GraphRuntime(_graph_store)

Line 216: _store → VectorStore(persist_dir="/home/user/my-app/.codewalk/chroma")
Line 217: _store.create_collection("my-app") opens the existing collection

Line 220: _graph_store.populate_chunks_from_chromadb(_store) backfills the chunks table if empty

Line 222: count → 843
Line 224: logs "[ensure_initialized] Loaded 843 chunks from /home/user/my-app/.codewalk/chroma"

Line 227: _analyze_result → {"repo_path": "/home/user/my-app", "skipped": True}

Line 230: _agent is None → True, _store is not None → True, _modules_result is not None → True
Line 231: _agent → create_agent(...) (agent recreated with full state)
Line 232: logs "[ensure_initialized] Agent recreated"


_check_upgrade_banner() — line 234

Shows a one-time banner if the on-disk index was built with an older version of codewalk.

Example

Input: repo_path = "/home/user/my-app"
Disk: .codewalk/meta.json contains {"codewalk_version": "1.2.0"}
Code: CODEWALK_VERSION = "1.3.0"

Line 240: _banner_shown → False
Line 243: meta_path → "/home/user/my-app/.codewalk/meta.json"
Line 244: os.path.exists(meta_path) → True

Line 247: reads file → meta → {"codewalk_version": "1.2.0"}
Line 251: stored_version → "1.2.0"
Line 254: current_version → "1.3.0"

Line 256: "1.2.0" < "1.3.0" → True
Line 257-260: logs banner:

  ⚡ Codewalk v1.3.0 — index was built with v1.2.0
     Run codewalk_analyze_codebase to rebuild with latest features.

Line 262: _banner_shown → True (won't show again this process)
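
The trace shows the comparison "1.2.0" < "1.3.0" but not how the file performs it. If it is a plain string comparison (an assumption; the source is not shown here), it would misorder versions such as "1.9.0" and "1.10.0". A minimal alternative sketch compares integer tuples instead. It assumes purely numeric dotted versions and would need extending for pre-release tags; both function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn "1.10.0" into (1, 10, 0) so parts compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_older(stored: str, current: str) -> bool:
    """True when the on-disk index was built by an older version."""
    return parse_version(stored) < parse_version(current)


print(is_older("1.2.0", "1.3.0"))   # True
print(is_older("1.9.0", "1.10.0"))  # True
print("1.9.0" < "1.10.0")           # False: string comparison gets this wrong
```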
