api main

api/main.py

FastAPI application with all HTTP endpoints — analyze, chat, overview, modules, blast radius, reading order, execution flow, review, voice, and health check.

Key Concepts

Term	Definition	Example
edge	A connection between two vertices in a graph, representing a relationship (e.g., an import).	If `pipeline.py` imports `scanner.py`, there's a directed edge `pipeline.py → scanner.py`.
DAG	Directed Acyclic Graph — a directed graph with no cycles (no circular paths back to the same node).	A→B→C is a DAG. A→B→C→A is NOT (it has a cycle).
cycle	A path in a graph that starts and ends at the same vertex. A→B→C→A is a cycle.	If `auth.py` imports `user.py` and `user.py` imports `auth.py`, that's a cycle.
betweenness centrality	How often a vertex sits on the shortest path between other vertices. High = important hub file.	`config.py` with unnormalized betweenness=45.0 lies on the equivalent of 45 shortest paths between other files; normalized scores fall between 0 and 1.
blast radius	All files that would be affected if a given file changes — found by following reverse import edges transitively.	If A imports B and C imports A, changing B has blast radius = {A, C}.
transitive dependency	An indirect dependency through a chain. If A imports B and B imports C, then A transitively depends on C.	Changing C could break A even though A never directly imports C.
embedding	A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors.	The code `def add(a, b): return a+b` might become `[0.12, -0.45, 0.78, ...]` (1536 numbers for OpenAI).
vector store	A database optimized for storing embeddings and finding the most similar ones quickly.	ChromaDB stores code chunk embeddings and returns the 5 most similar chunks to your query.
cosine distance	Measures how different two vectors are: 1 minus their cosine similarity. 0.0 = same direction (identical meaning), 1.0 = orthogonal (unrelated), 2.0 = opposite.	Query `"scan files"` has cosine distance 0.15 to `scan_directory()` (very similar) and 0.85 to `grade_answer()` (very different).
ChromaDB	An open-source vector database for storing and searching embeddings. Used here to store code chunks.	`collection.query(query_texts=["scan files"], n_results=5)` returns the 5 closest code chunks.
chunk	A piece of source code (usually one function or class) stored as a unit for search.	The function `def scan_directory(root): ...` (20 lines) is one chunk.
AST	Abstract Syntax Tree — a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.).	`def add(a, b): return a+b` becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)].
LLM	Large Language Model — an AI model (like GPT-4, Claude) that generates text given a prompt.	`get_llm()` returns a ChatOpenAI instance that can answer questions about code.
glob pattern	A wildcard pattern for matching file paths. `*` matches anything in one directory, `**` matches across directories.	`**/*.py` matches all Python files in any subdirectory. `src/*.ts` matches TypeScript files only in `src/`.
diff	The set of changes between two versions of code, showing added (+) and removed (-) lines.	`- old_line\n+ new_line` shows `old_line` was replaced with `new_line`.
hunk	A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file).	A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed).
STT	Speech-to-Text — converting spoken audio into text (transcription).	User speaks into microphone → STT produces `"What does scan_directory do?"`.
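To make the blast radius and transitive dependency definitions concrete, here is a minimal sketch that follows reverse import edges breadth-first. The `imports` dict and the helper names `build_reverse` and `blast_radius` are illustrative assumptions, not the project's actual functions.

```python
from collections import deque

def build_reverse(imports: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'file -> files it imports' into 'file -> files that import it'."""
    reverse: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for dst in targets:
            reverse.setdefault(dst, set()).add(src)
    return reverse

def blast_radius(changed: str, imports: dict[str, list[str]]) -> set[str]:
    """Return every file that directly or transitively imports `changed`."""
    reverse = build_reverse(imports)
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for importer in reverse.get(current, ()):
            if importer not in affected and importer != changed:
                affected.add(importer)
                queue.append(importer)
    return affected

# Hypothetical graph from the table: A imports B, C imports A.
imports = {"A": ["B"], "C": ["A"], "B": []}
print(sorted(blast_radius("B", imports)))  # ['A', 'C']
```

Changing `C` in this graph affects nothing, because no file imports it.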

`app` — line 46

Creates the FastAPI application instance with CORS middleware allowing all origins.

app = FastAPI(title="Codewalk API", description="AI-powered codebase onboarding tool", version="1.0.0")

CORS is wide-open (allow_origins=["*"]) so the Next.js frontend can call the API from any port.

`global_exception_handler()` — line 57

Catches all unhandled exceptions and converts them to user-friendly JSON messages via classify_error().

Example

Input: an unhandled ValueError("chromadb collection not found")

Line 59: user_message → classify_error(ValueError(...)) → e.g. "ChromaDB collection not found. Try re-analyzing."
Line 60: logs "[api] Error: chromadb collection not found"
Line 61-63: returns JSONResponse(status_code=500, content={"detail": "ChromaDB collection not found. Try re-analyzing."})
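The real rules inside classify_error() are not shown here, so the following is only a sketch of the general idea: inspect the exception, pick a friendly message, and wrap it in a 500 payload. The matching rules and the `to_error_payload` helper are hypothetical.

```python
def classify_error(exc: Exception) -> str:
    """Map an exception to a user-facing message (hypothetical rules)."""
    text = str(exc).lower()
    if "chromadb" in text and "not found" in text:
        return "ChromaDB collection not found. Try re-analyzing."
    if isinstance(exc, FileNotFoundError):
        return "A required file or directory was not found."
    return "Something went wrong. Check the server logs for details."

def to_error_payload(exc: Exception) -> tuple[int, dict[str, str]]:
    """Build the (status_code, body) pair a JSON error response would carry."""
    return 500, {"detail": classify_error(exc)}

status, body = to_error_payload(ValueError("chromadb collection not found"))
print(status, body)
```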

`POST /analyze` — line 66

Indexes a codebase: scan → chunk → embed → store → build agent.

Modes

`index_mode`	Behavior
`"auto"`	Skip indexing if collection already has data
`"reindex"`	Smart re-index (only changed/new/deleted files)
`"full"`	Nuke everything, re-embed from scratch

Example

Input: AnalyzeRequest(repo_path="/home/user/my-app", collection_name="", index_mode="auto")

Line 77: request.repo_path → "/home/user/my-app" (not empty, stays as is)
Line 78: request.collection_name is "" → enter block
Line 79: "/home/user/my-app".rstrip("/").split("/")[-1] → "my-app"
Line 79: request.collection_name → "my-app"
Line 80: persist_dir → "/home/user/my-app/.codewalk/chroma"
Line 81: store → VectorStore(persist_dir="/home/user/my-app/.codewalk/chroma")
Line 82: store.create_collection("my-app")
Line 83: existing_count → store.chunk_count() → e.g. 843

Line 86: request.index_mode == "full" → False, existing_count == 0 → False
Line 88: request.index_mode == "reindex" → False
Line 90: auto mode + data exists → skip indexing
Line 91-95: index_result → {"repo_path": "/home/user/my-app", "files_scanned": 0, "chunks_created": 0, "skipped": True}
Line 96: logs "[api] Skipping indexing — collection already has 843 chunks"

Line 99: files → scan_directory("/home/user/my-app") → list of 127 file dicts
Line 100: deps → build_dependency_graph(files) → {"graph": {...}, "reverse": {...}}
Line 101: modules_result → detect_modules(files, deps) → {"modules": {"api": {...}, ...}, "stats": {...}}

Line 104: agent → create_agent(store, modules_result, files=files, deps=deps)

Line 107: state.initialize(store, agent, modules_result, index_result, files=files, deps=deps, repo_path="/home/user/my-app", embedded_chunks=None)

Line 110-112: loads guidelines store if REVIEW_GUIDELINES_PATH is set

Line 114-120: returns:

{
  "status": "complete",
  "repo_path": "/home/user/my-app",
  "files_scanned": 0,
  "chunks_created": 0,
  "modules": ["api", "analysis", "embeddings", "ingestion", "generation"]
}
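The collection-name default (Line 79) and the mode branching (Lines 86-90) in the trace above can be sketched as two small functions. The function names are hypothetical; the conditions mirror what the trace shows and may not match the source exactly.

```python
def default_collection_name(repo_path: str) -> str:
    """Use the last path component of the repo as the collection name."""
    return repo_path.rstrip("/").split("/")[-1]

def choose_action(index_mode: str, existing_count: int) -> str:
    """Return 'full', 'reindex', or 'skip' for a given mode and index size."""
    if index_mode == "full" or existing_count == 0:
        return "full"      # nothing indexed yet, or a rebuild was requested
    if index_mode == "reindex":
        return "reindex"   # only changed/new/deleted files
    return "skip"          # auto mode and the collection already has data

print(default_collection_name("/home/user/my-app"))  # my-app
print(choose_action("auto", 843))                    # skip
```

Note that under these conditions an empty collection is indexed in full regardless of the requested mode.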

`POST /analyze/stream` — line 124

Streams analysis progress via Server-Sent Events (SSE). Same logic as /analyze but yields progress events at each step.

SSE Event Format

Each event is a JSON line: data: {"step": "<step>", "message": "<msg>"}\n\n

Steps

Step	When
`init`	Checking existing index
`scan`	Scanning directory
`filter`	LLM-based file filtering (if enabled)
`chunk`	Chunking + embedding
`embed`	Embedding complete
`store`	Storing in ChromaDB
`reindex`	Smart re-index stats
`skip`	Index exists, skipping
`analyze`	Building dependency graph
`agent`	Creating AI agent
`guidelines`	Embedding guidelines
`done`	Final event with full result
`error`	Exception message

Example (auto mode, index exists)

Input: AnalyzeRequest(repo_path="/home/user/my-app", index_mode="auto")

Events yielded:

data: {"step": "init", "message": "Checking existing index..."}
data: {"step": "skip", "message": "Index exists (843 chunks) — skipping"}
data: {"step": "analyze", "message": "Building dependency graph..."}
data: {"step": "analyze", "message": "Detected 5 modules"}
data: {"step": "agent", "message": "Creating AI agent..."}
data: {"step": "done", "message": "Analysis complete!", "result": {"status": "complete", ...}}

Returns a StreamingResponse with media_type="text/event-stream" and the header X-Accel-Buffering: no, which tells nginx-style reverse proxies not to buffer the stream.
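As a rough illustration of the SSE event format described above, the sketch below serializes progress events with the standard library only. `sse_event` and `progress_events` are hypothetical names, and the event sequence is shortened; in the real endpoint a generator like this would be handed to the StreamingResponse.

```python
import json
from typing import Iterator

def sse_event(step: str, message: str, **extra: object) -> str:
    """Serialize one progress event as a single SSE data frame."""
    payload = {"step": step, "message": message, **extra}
    # A blank line (two newlines) terminates each SSE event.
    return f"data: {json.dumps(payload)}\n\n"

def progress_events() -> Iterator[str]:
    """Yield a shortened, illustrative event sequence."""
    yield sse_event("init", "Checking existing index...")
    yield sse_event("analyze", "Building dependency graph...")
    yield sse_event("done", "Analysis complete!", result={"status": "complete"})

for frame in progress_events():
    print(frame, end="")
```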

`POST /chat` — line 247

Asks the agent a question about the codebase.

Example

Input: ChatRequest(message="How does authentication work?", thread_id="session-42")

Line 253: state.ensure_initialized() (auto-loads if needed)
Line 254: agent → the compiled LangGraph agent
Line 255-258: config → {"configurable": {"thread_id": "session-42"}}
Line 259-262: result → agent.invoke({"messages": [("human", "How does authentication work?")]}, config=config)
Line 262: answer → result["messages"][-1].content → e.g. "Auth uses JWT tokens issued by..."
Line 263: returns ChatResponse(answer="Auth uses JWT tokens issued by...", thread_id="session-42")

`GET /overview` — line 268

Returns project overview: tech stack, modules, Mermaid diagram, LLM summary, and top 30 riskiest files.

Example

Line 274: state.ensure_initialized()
Line 275: modules_result → {"modules": {"api": {...}, ...}, "stats": {"total_files": 45, "total_modules": 5}, "module_graph": {"api": ["analysis"]}}
Line 276: store → VectorStore(...)

Line 279: diagram → generate_module_diagram({"api": ["analysis"]}) → "graph LR\n api --> analysis"

Line 282: analyze_result → {"repo_path": "/home/user/my-app", ...}
Line 283: tech → detect_tech_stack("/home/user/my-app") → ["Python", "FastAPI", "ChromaDB"]

Line 286: overview_text → generate_overview(tech, modules_result, diagram) → LLM-generated summary string

Line 288: deps → {"graph": {...}}
Line 289: runtime → _graph_runtime (or fallback to deps["graph"])
Line 290: blast_map → calculate_full_blast_map(runtime) → {"blast_map": [{"file": "config.py", "count": 30}, ...]}
Line 291: top_files → ["config.py", "utils.py", ...] (first 30)
Line 292: top_risky, _ → compute_file_risks(top_files, runtime) → list of dicts with risk levels

Line 294-302: returns OverviewResponse(tech_stack=["Python", "FastAPI", "ChromaDB"], total_files=45, total_modules=5, ...)
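The diagram step (Line 279) turns the module graph into Mermaid text. A minimal sketch of such a renderer follows; `module_diagram` is a hypothetical stand-in for generate_module_diagram(), and the indentation and sorting are assumptions.

```python
def module_diagram(module_graph: dict[str, list[str]]) -> str:
    """Render module -> dependency edges as a left-to-right Mermaid graph."""
    lines = ["graph LR"]
    for module in sorted(module_graph):
        for dep in sorted(module_graph[module]):
            lines.append(f"    {module} --> {dep}")
    return "\n".join(lines)

print(module_diagram({"api": ["analysis"]}))
```

Sorting both modules and dependencies keeps the output stable between runs, which makes the diagram easier to diff.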

`GET /modules/{module_name}` — line 310

Returns details about a specific module: files, languages, dependencies, and blast radius.

Example

Input: module_name = "api"

Line 316: state.ensure_initialized()
Line 317: module_result → full modules dict
Line 318: modules → {"api": {"files": [...], "file_count": 3, "languages": {"python": 3}}, ...}
Line 319: module_graph → {"api": ["analysis", "embeddings"], ...}

Line 321-323: actual_name → "api", info → {"files": [...], "file_count": 3, ...}, matched_as_feature → False

Line 331: depends_on → module_graph["api"] → ["analysis", "embeddings"]
Line 332-334: depended_by → scan all modules → e.g. [] (nothing depends on api)

Line 337: runtime → _graph_runtime or fallback Line 338: file_risks, max_risk → compute_file_risks(["src/api/main.py", "src/api/models.py", "src/api/state.py"], runtime) → e.g. ([{"file": "main.py", "risk_level": "medium", ...}], "medium")

Line 340-348: returns:

{
  "name": "api",
  "file_count": 3,
  "files": ["src/api/main.py", "src/api/models.py", "src/api/state.py"],
  "languages": {"python": 3},
  "depends_on": ["analysis", "embeddings"],
  "depended_by": [],
  "blast_radius": [{"file": "main.py", "risk_level": "medium", ...}],
  "module_risk": "medium"
}
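The depends_on and depended_by fields in the response above come from a forward lookup and a scan over all modules. A small sketch of that lookup, assuming a module graph shaped like the one in the example (the `dependency_summary` helper and the sample graph are hypothetical):

```python
def dependency_summary(module: str, module_graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return what `module` depends on and which modules depend on it."""
    depends_on = list(module_graph.get(module, []))
    depended_by = sorted(
        other for other, deps in module_graph.items()
        if other != module and module in deps
    )
    return {"depends_on": depends_on, "depended_by": depended_by}

# Hypothetical module graph for illustration.
module_graph = {
    "api": ["analysis", "embeddings"],
    "analysis": ["embeddings"],
    "embeddings": [],
}
print(dependency_summary("api", module_graph))
print(dependency_summary("embeddings", module_graph))
```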

Module not found

If resolve_module_with_fallback returns None:

Line 325-329: raises HTTPException(status_code=404, detail="Module 'xyz' not found. Available: analysis, api, embeddings, ...")

`GET /blast-radius/{module_name}` and `GET /blast-radius` — line 355

Returns blast radius (change risk) for files, optionally scoped to a module.

Example — scoped to a module

Input: module_name = "analysis"

Line 362: state.ensure_initialized()
Line 363: modules_result → full modules dict
Line 364-365: runtime → graph runtime

Line 368: module_name is "analysis" → truthy, enter block
Line 370: actual_name → "analysis"
Line 376: target_files → ["src/analysis/blast_radius.py", "src/analysis/dependency_graph.py", ...] (sorted)
Line 377: scope → "analysis"

Line 381: file_results, max_risk → compute_file_risks(target_files, runtime) → e.g. ([...], "high")

Line 383-387: returns:

{
  "module": "analysis",
  "module_risk": "high",
  "total_files": 6,
  "files": [{"file": "dependency_graph.py", "risk_level": "high", "affected_files": 18}, ...]
}

Example — whole repo (no module_name)

Line 368: module_name is "" → falsy
Line 379: target_files → all files from deps["graph"] (sorted)
Line 380: scope → "all"

`GET /modules` — line 396

Lists all available module names.

Example

Line 401: state.ensure_initialized()
Line 402: modules_result → {"modules": {"api": {...}, "analysis": {...}}, "stats": {"total_modules": 5}}
Line 403-405: returns:

{
  "modules": ["api", "analysis", "embeddings", "ingestion", "generation"],
  "total": 5
}

`GET /reading-order` — line 410

Returns the recommended file reading order with risk annotations.

Example

Line 415: state.ensure_initialized()
Line 416: files → list of 127 file dicts
Line 417: deps → dependency graph
Line 418: runtime → graph runtime
Line 419: order → generate_reading_order(files, deps, graph_runtime=runtime) → {"order": [{"file": "config.py", "relevance": "essential", "why": "Used by every module"}, ...]}
Line 420: order_files → ["config.py", "utils.py", ...]
Line 421: risks, _ → compute_file_risks(order_files, runtime)
Line 422: risks_by_file → {"config.py": {"risk_level": "critical", ...}, ...}
Line 423-428: enriches each item with risk_level, affected_files, direct, transitive
Line 429-430: maps relevance → priority, why → reason for frontend compatibility

Returns the enriched order dict.
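The enrichment and field-mapping steps above might look roughly like the sketch below. `enrich_reading_order` is a hypothetical helper, the "unknown" and 0 defaults are assumptions, and the `direct` and `transitive` fields are omitted for brevity.

```python
def enrich_reading_order(order: dict, risks: list[dict]) -> dict:
    """Attach risk fields to each item and add frontend-friendly aliases."""
    risks_by_file = {r["file"]: r for r in risks}
    for item in order["order"]:
        risk = risks_by_file.get(item["file"], {})
        item["risk_level"] = risk.get("risk_level", "unknown")   # assumed default
        item["affected_files"] = risk.get("affected_files", 0)   # assumed default
        # Aliases the frontend expects: relevance -> priority, why -> reason.
        item["priority"] = item.get("relevance")
        item["reason"] = item.get("why")
    return order

order = {"order": [{"file": "config.py", "relevance": "essential", "why": "Used by every module"}]}
risks = [{"file": "config.py", "risk_level": "critical", "affected_files": 30}]
enriched = enrich_reading_order(order, risks)
print(enriched["order"][0])
```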

`GET /execution-flow` — line 435

Returns the execution flow diagram and narration.

Example

Line 439: state.ensure_initialized()
Line 440: analyze_result → {"repo_path": "/home/user/my-app", ...}
Line 441: repo_path → "/home/user/my-app"
Line 442-444: files, deps, runtime from state
Line 445: order → generate_reading_order(files, deps, graph_runtime=runtime)
Line 446: flow → generate_execution_flow(order, deps) → Mermaid diagram + narration text
Line 447: returns {"flow": "<mermaid + narration>"}

`POST /refresh` — line 450

Re-scans files and rebuilds dependency graph + modules. Does NOT re-embed or re-index.

Example

Line 458: state.ensure_initialized()
Line 459: state.rebuild_analysis_cache() (re-scans, rebuilds graph)

Line 461-464: returns:

{
  "status": "refreshed",
  "files": 127,
  "modules": ["api", "analysis", "embeddings", "ingestion", "generation"]
}

`POST /incremental-reindex` — line 472

Re-embeds only files that changed since last indexing (hash-based comparison).

Example

Line 478: store → current VectorStore
Line 479: repo_path → "/home/user/my-app"
Line 480: collection_name → "my-app"
Line 481: persist_dir → "/home/user/my-app/.codewalk/chroma"
Line 482: indexed_files → ["src/main.py", "src/utils.py", ...] (all files currently in ChromaDB)
Line 483-484: if empty → HTTPException(400, "No files indexed yet. Run /analyze first.")

Line 486: result → incremental_reindex(indexed_files, repo_path, collection_name, persist_dir=persist_dir) → {"new_files": 2, "changed_files": 1, "deleted_files": 0, ...}

Line 489: state.rebuild_analysis_cache(embedded_chunks=result.get("embedded_chunks")) — refresh graph

Line 491: returns the result dict
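The hash-based comparison behind incremental re-indexing can be illustrated with a short sketch. This is not the project's incremental_reindex(): the SHA-256 choice, the `diff_index` helper, and the in-memory inputs are assumptions, and the sketch returns file lists where the real result reports counts.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash file contents so changes can be detected without storing the file."""
    return hashlib.sha256(data).hexdigest()

def diff_index(indexed: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored hashes (path -> hash) with current contents (path -> bytes)."""
    new = sorted(p for p in current if p not in indexed)
    deleted = sorted(p for p in indexed if p not in current)
    changed = sorted(
        p for p in current
        if p in indexed and content_hash(current[p]) != indexed[p]
    )
    return {"new_files": new, "changed_files": changed, "deleted_files": deleted}

indexed = {"a.py": content_hash(b"x = 1"), "b.py": content_hash(b"y = 2"), "gone.py": content_hash(b"z")}
current = {"a.py": b"x = 1", "b.py": b"y = 3", "new.py": b"n = 0"}
print(diff_index(indexed, current))
```

Only files listed under new_files and changed_files would need re-embedding; chunks for deleted_files would be removed from the store.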

`POST /review` — line 498

Reviews the current git diff for bugs, security issues, and style.

Example

Input: ReviewRequest(staged=True, target_branch="main")

Line 505: state.ensure_initialized()
Line 507-511: gets store and deps (non-fatal if missing)

Line 513-520: result → review_diff(staged=True, target_branch="main", use_llm=True, store=store, deps=deps, graph_store=..., repo_path=...)

Line 522-533: transforms result.issues into list of dicts:

[{
  "severity": "high",
  "category": "security",
  "file_path": "src/api/main.py",
  "line_number": 42,
  "title": "SQL injection risk",
  "explanation": "...",
  "suggestion": "...",
  "code_snippet": "..."
}]

Line 535-540: returns {"issues": [...], "summary": "...", "files_reviewed": 3, "lines_added": 45, "lines_removed": 12}

`POST /review/file` — line 547

Reviews a single file against codebase conventions using LLM + vector search for context.

Example

Input: ReviewFileRequest(file_path="src/codewalk/api/main.py")

Line 554: store → VectorStore
Line 556-557: reads file content from disk
Line 559: results → store.search("code in src/codewalk/api/main.py", n_results=5) (top 5 similar chunks)
Line 561: filtered, _ → filter_by_distance(results) (removes low-quality matches)
Line 562: patterns → format_context(filtered) (formats chunks as context string)

Line 564: llm → get_llm(temperature=0) (deterministic)
Line 565-574: invokes LLM with system prompt (review for consistency, error handling, naming, bugs) and user prompt (file content + patterns from elsewhere)

Line 576: returns {"review": "<LLM review text>", "file_path": "src/codewalk/api/main.py"}

`POST /review/guidelines` — line 583

Loads team coding guidelines from a directory of markdown/text files.

Example

Input: GuidelinesRequest(docs_path="/home/user/my-app/docs/guidelines")

Line 589: path → "/home/user/my-app/docs/guidelines" (from request)
Line 590-594: validates path is not empty
Line 595: os.path.isdir(path) → True

Line 598: store → get_guidelines_store() (embeds guideline files into ChromaDB)
Line 602: count → store.chunk_count() → e.g. 24
Line 603: returns {"status": "loaded", "chunks": 24, "path": "/home/user/my-app/docs/guidelines"}

`POST /voice/ask` — line 608

Voice-in, voice-out codebase Q&A. Accepts an audio file, transcribes it, routes the question to the right tool, executes that tool, and speaks the result.

Example

Input: audio file (webm from browser mic) saying "What does the config module do?"

Line 624: audio_bytes → raw bytes of the uploaded audio
Line 625: question → transcribe_bytes(audio_bytes, file_name="audio.webm") → "What does the config module do?"

Line 627-634: question.strip() is truthy → skip fallback

Line 636: route_result → route("What does the config module do?") → {"tool": "codewalk_get_module_info", "arguments": {"module_name": "config"}}
Line 637: tool_name → "codewalk_get_module_info"
Line 638: arguments → {"module_name": "config"}

Line 649: state.ensure_initialized()

Line 652: result → execute_direct("codewalk_get_module_info", {"module_name": "config"}) → module info dict

Line 655: voice → format_voice_response(result) → {"technical": "<full detail>", "speech": "The config module has 1 file..."}

Line 658: audio_response → synthesize("The config module has 1 file...") → MP3 bytes

Line 660-666: returns:

{
  "question": "What does the config module do?",
  "tool": "codewalk_get_module_info",
  "answer": "<full technical detail>",
  "speech": "The config module has 1 file...",
  "audio_base64": "<base64-encoded MP3>"
}
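The audio_base64 field carries binary audio inside a JSON string. The sketch below shows one way to assemble such a payload and decode it on the client side; `build_voice_payload` is a hypothetical helper and the byte string stands in for real synthesized audio.

```python
import base64

def build_voice_payload(question: str, tool: str, answer: str, speech: str, audio: bytes) -> dict[str, str]:
    """Assemble the JSON body, base64-encoding the synthesized audio bytes."""
    return {
        "question": question,
        "tool": tool,
        "answer": answer,
        "speech": speech,
        "audio_base64": base64.b64encode(audio).decode("ascii"),
    }

payload = build_voice_payload(
    "What does the config module do?",
    "codewalk_get_module_info",
    "<full technical detail>",
    "The config module has 1 file...",
    b"fake-mp3-bytes",  # stand-in for real synthesized audio
)
# A client reverses the encoding before playing the audio.
decoded = base64.b64decode(payload["audio_base64"])
print(len(decoded), "bytes of audio")
```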

`GET /cycles` — line 697

Detects circular dependencies in the codebase.

Example

Line 700: state.ensure_initialized()
Line 701: runtime → state.get_graph_runtime()
Line 702: returns runtime.detect_cycles() → e.g. {"cycles": [["a.py", "b.py", "a.py"]], "count": 1}
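How runtime.detect_cycles() works internally is not shown here. As an illustration of the idea, the sketch below finds cycles with a depth-first search that tracks the current path; it reports one cycle per back edge rather than every elementary cycle, and it is recursive, so very deep graphs would need an iterative variant.

```python
def detect_cycles(graph: dict[str, list[str]]) -> dict:
    """Find import cycles with a DFS that remembers the path being explored."""
    cycles: list[list[str]] = []
    visited: set[str] = set()
    path: list[str] = []
    on_path: set[str] = set()

    def visit(node: str) -> None:
        visited.add(node)
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_path:
                # Back edge: the cycle is the path from nxt's first occurrence.
                start = path.index(nxt)
                cycles.append(path[start:] + [nxt])
            elif nxt not in visited:
                visit(nxt)
        path.pop()
        on_path.discard(node)

    for node in sorted(graph):
        if node not in visited:
            visit(node)
    return {"cycles": cycles, "count": len(cycles)}

# Hypothetical graph: a.py and b.py import each other; c.py imports a.py.
graph = {"a.py": ["b.py"], "b.py": ["a.py"], "c.py": ["a.py"]}
print(detect_cycles(graph))  # {'cycles': [['a.py', 'b.py', 'a.py']], 'count': 1}
```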

`GET /architecture` — line 705

Architecture health report: graph stats, centrality scores, and cycles.

Example

Line 708: state.ensure_initialized()
Line 709: runtime → graph runtime
Line 710-713: returns:

{
  "stats": {"nodes": 127, "edges": 203, "is_dag": false},
  "centrality": [{"file": "config.py", "betweenness": 0.42}, ...],
  "cycles": {"cycles": [...], "count": 2}
}
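The stats block in the response above (nodes, edges, is_dag) can be approximated with the standard library. The sketch below uses Kahn's algorithm for the DAG check; `graph_stats` is a hypothetical helper, and the real graph runtime may compute these values differently.

```python
from collections import deque

def graph_stats(graph: dict[str, list[str]]) -> dict:
    """Count nodes and edges, and test acyclicity with Kahn's algorithm."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    indegree = {n: 0 for n in nodes}
    edges = 0
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
            edges += 1

    # Repeatedly remove nodes with no incoming edges; a cycle blocks removal.
    queue = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for target in graph.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)

    return {"nodes": len(nodes), "edges": edges, "is_dag": removed == len(nodes)}

print(graph_stats({"A": ["B"], "B": ["C"], "C": []}))     # acyclic chain
print(graph_stats({"A": ["B"], "B": ["C"], "C": ["A"]}))  # contains a cycle
```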

`GET /health` — line 718

Simple health check. Always returns {"status": "ok"}.

`POST /docs/index`

Indexes a folder of .md/.pdf/.txt documents into the DocStore.

Example

Input: {"docs_path": "/Users/me/team-docs"}

Line: doc_store = state.get_doc_store()
Line: result = doc_store.index_docs("/Users/me/team-docs")
Line: returns {"docs_found": 5, "chunks_stored": 42}

`POST /docs/search`

Semantic search across indexed documents.

Example

Input: {"query": "deployment process", "n_results": 3}

Line: doc_store = state.get_doc_store()
Line: results = doc_store.search("deployment process", n_results=3)
Line: returns [{"text": "...", "metadata": {...}, "distance": 0.12}, ...]
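The distance value in each result is a cosine distance between the query embedding and a chunk embedding. The real search is delegated to the vector store; the sketch below only illustrates the ranking idea with tiny hand-made vectors. `cosine_distance`, `top_n`, and the sample embeddings are hypothetical, and non-zero vectors are assumed.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_n(query: list[float], chunks: dict[str, list[float]], n_results: int) -> list[tuple[str, float]]:
    """Rank chunks by ascending distance to the query and keep the closest."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1])[:n_results]

# Hypothetical 2-D embeddings; real ones have hundreds of dimensions.
chunks = {"deploy.md": [1.0, 0.1], "style.md": [0.0, 1.0], "legacy.md": [-1.0, 0.0]}
for name, distance in top_n([1.0, 0.0], chunks, n_results=2):
    print(name, round(distance, 3))
```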

`POST /docs/ask`

Ask a question answered by indexed documents. Uses DOC_ASK_PROMPT with LLM.

Example

Input: {"question": "How do we deploy to production?", "n_results": 5}

Line: searches docs → gets top 5 chunks
Line: formats DOC_ASK_PROMPT with chunks as context + question
Line: answer = get_llm().invoke(prompt)
Line: returns {"answer": "...", "sources": [{"doc_path": "deploy.md", "section": "Steps"}]}
