api main

aakash-anko edited this page May 28, 2026 · 2 revisions

api/main.py

FastAPI application with all HTTP endpoints — analyze, chat, overview, modules, blast radius, reading order, execution flow, review, voice, and health check.


Key Concepts

| Term | Definition | Example |
| --- | --- | --- |
| edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py. |
| DAG | Directed Acyclic Graph: a directed graph with no cycles (no circular paths back to the same node). | A→B→C is a DAG. A→B→C→A is NOT (it has a cycle). |
| cycle | A path in a graph that starts and ends at the same vertex. | A→B→C→A is a cycle. If auth.py imports user.py and user.py imports auth.py, that's a cycle. |
| betweenness centrality | How often a vertex sits on the shortest path between other vertices. High = important hub file. | config.py with betweenness=45.0 means 45 shortest paths between other files pass through it. |
| blast radius | All files that would be affected if a given file changes, found by following reverse import edges transitively. | If A imports B and C imports A, changing B has blast radius = {A, C}. |
| transitive dependency | An indirect dependency through a chain. | If A imports B and B imports C, then A transitively depends on C. Changing C could break A even though A never directly imports C. |
| embedding | A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors. | The code def add(a, b): return a+b might become [0.12, -0.45, 0.78, ...] (1536 numbers for OpenAI). |
| vector store | A database optimized for storing embeddings and finding the most similar ones quickly. | ChromaDB stores code chunk embeddings and returns the 5 most similar chunks to your query. |
| cosine distance | Measures how different two vectors are. 0.0 = identical meaning, 1.0 = completely different, 2.0 = opposite. | Query "scan files" has cosine distance 0.15 to scan_directory() (very similar) and 0.85 to grade_answer() (very different). |
| ChromaDB | An open-source vector database for storing and searching embeddings. Used here to store code chunks. | collection.query(query_texts=["scan files"], n_results=5) returns the 5 closest code chunks. |
| chunk | A piece of source code (usually one function or class) stored as a unit for search. | The function def scan_directory(root): ... (20 lines) is one chunk. |
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| LLM | Large Language Model: an AI model (like GPT-4, Claude) that generates text given a prompt. | get_llm() returns a ChatOpenAI instance that can answer questions about code. |
| glob pattern | A wildcard pattern for matching file paths. * matches anything in one directory, ** matches across directories. | **/*.py matches all Python files in any subdirectory. src/*.ts matches TypeScript files only in src/. |
| diff | The set of changes between two versions of code, showing added (+) and removed (-) lines. | - old_line\n+ new_line shows old_line was replaced with new_line. |
| hunk | A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file). | A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed). |
| STT | Speech-to-Text: converting spoken audio into text (transcription). | User speaks into microphone → STT produces "What does scan_directory do?". |
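
The blast radius and transitive dependency definitions above can be made concrete with a small traversal. This is a minimal sketch, not the project's implementation: the function name blast_radius and the dict-of-lists graph shape are assumptions chosen for illustration. It inverts the import edges and walks them breadth-first from the changed file.

```python
from collections import deque


def blast_radius(imports: dict[str, list[str]], changed: str) -> set[str]:
    """Return every file that directly or transitively imports `changed`."""
    # Invert the import graph: importers[x] holds the files that import x.
    importers: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for dst in targets:
            importers.setdefault(dst, set()).add(src)

    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for importer in importers.get(current, ()):
            # Skip files already seen, and the changed file itself (cycles).
            if importer not in affected and importer != changed:
                affected.add(importer)
                queue.append(importer)
    return affected


# The table's example: A imports B, C imports A.
imports = {"A": ["B"], "C": ["A"], "B": []}
print(sorted(blast_radius(imports, "B")))  # ['A', 'C']
```

C never imports B directly, yet it lands in the result because the walk follows the chain C → A → B in reverse.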

app — line 46

Creates the FastAPI application instance with CORS middleware allowing all origins.

app = FastAPI(title="Codewalk API", description="AI-powered codebase onboarding tool", version="1.0.0")

CORS is wide-open (allow_origins=["*"]) so the Next.js frontend can call the API from any port.


global_exception_handler() — line 57

Catches all unhandled exceptions and converts them to user-friendly JSON messages via classify_error().

Example

Input: an unhandled ValueError("chromadb collection not found")

Line 59: user_message = classify_error(ValueError(...)) → e.g. "ChromaDB collection not found. Try re-analyzing."
Line 60: logs "[api] Error: chromadb collection not found"
Line 61-63: returns JSONResponse(status_code=500, content={"detail": "ChromaDB collection not found. Try re-analyzing."})
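
How classify_error() decides on a message is not shown on this page. The sketch below is one plausible shape, assuming simple substring rules. The rule list, the fallback text, and the names classify_error_sketch and to_error_payload are all hypothetical. Only the first message and the 500 status come from the example above.

```python
# Hypothetical rules: (substring to look for, message shown to the user).
ERROR_RULES = [
    ("collection not found", "ChromaDB collection not found. Try re-analyzing."),
    ("api key", "LLM API key is missing or invalid."),
]
FALLBACK_MESSAGE = "Something went wrong. Check the server logs for details."


def classify_error_sketch(exc: Exception) -> str:
    """Map an exception to a user-friendly message via substring matching."""
    text = str(exc).lower()
    for needle, message in ERROR_RULES:
        if needle in text:
            return message
    return FALLBACK_MESSAGE


def to_error_payload(exc: Exception) -> tuple[int, dict]:
    """Build the (status_code, body) pair that a JSON error response would carry."""
    return 500, {"detail": classify_error_sketch(exc)}


status, body = to_error_payload(ValueError("chromadb collection not found"))
print(status, body["detail"])
```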


POST /analyze — line 66

Indexes a codebase: scan → chunk → embed → store → build agent.

Modes

| index_mode | Behavior |
| --- | --- |
| "auto" | Skip indexing if collection already has data |
| "reindex" | Smart re-index (only changed/new/deleted files) |
| "full" | Nuke everything, re-embed from scratch |

Example

Input: AnalyzeRequest(repo_path="/home/user/my-app", collection_name="", index_mode="auto")

Line 77: request.repo_path = "/home/user/my-app" (not empty, stays as is)
Line 78: request.collection_name is "" → enter block
Line 79: "/home/user/my-app".rstrip("/").split("/")[-1] → "my-app"
Line 79: request.collection_name = "my-app"
Line 80: persist_dir = "/home/user/my-app/.codewalk/chroma"
Line 81: store = VectorStore(persist_dir="/home/user/my-app/.codewalk/chroma")
Line 82: store.create_collection("my-app")
Line 83: existing_count = store.chunk_count() → e.g. 843

Line 86: request.index_mode == "full" → False, existing_count == 0 → False
Line 88: request.index_mode == "reindex" → False
Line 90: auto mode + data exists → skip indexing
Line 91-95: index_result = {"repo_path": "/home/user/my-app", "files_scanned": 0, "chunks_created": 0, "skipped": True}
Line 96: logs "[api] Skipping indexing — collection already has 843 chunks"

Line 99: files = scan_directory("/home/user/my-app") → list of 127 file dicts
Line 100: deps = build_dependency_graph(files) → {"graph": {...}, "reverse": {...}}
Line 101: modules_result = detect_modules(files, deps) → {"modules": {"api": {...}, ...}, "stats": {...}}

Line 104: agent = create_agent(store, modules_result, files=files, deps=deps)

Line 107: state.initialize(store, agent, modules_result, index_result, files=files, deps=deps, repo_path="/home/user/my-app", embedded_chunks=None)

Line 110-112: loads guidelines store if REVIEW_GUIDELINES_PATH is set

Line 114-120: returns:

{
  "status": "complete",
  "repo_path": "/home/user/my-app",
  "files_scanned": 0,
  "chunks_created": 0,
  "modules": ["api", "analysis", "embeddings", "ingestion", "generation"]
}
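
The request preparation in the walkthrough above (default collection name, persist directory, and the choice between the three index modes) can be sketched as follows. The rstrip/split expression and the .codewalk/chroma location come from the walkthrough. The helper names are hypothetical. The sketch assumes the two checks on line 86 are combined with "or", which this page does not show.

```python
import posixpath


def default_collection_name(repo_path: str) -> str:
    """Use the last path component of the repo as the collection name."""
    return repo_path.rstrip("/").split("/")[-1]


def default_persist_dir(repo_path: str) -> str:
    """Place the Chroma data under <repo>/.codewalk/chroma."""
    return posixpath.join(repo_path.rstrip("/"), ".codewalk", "chroma")


def choose_index_action(index_mode: str, existing_count: int) -> str:
    """Hypothetical helper: decide what /analyze does with the collection."""
    # Assumption: "full" and an empty collection both lead to a full index.
    if index_mode == "full" or existing_count == 0:
        return "full"
    if index_mode == "reindex":
        return "reindex"
    # "auto" with existing data: keep the index as it is.
    return "skip"


repo = "/home/user/my-app/"
print(default_collection_name(repo))      # my-app
print(default_persist_dir(repo))          # /home/user/my-app/.codewalk/chroma
print(choose_index_action("auto", 843))   # skip
```

Splitting on "/" only handles POSIX-style paths; a Windows path with backslashes would need os.path.basename or pathlib instead.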

POST /analyze/stream — line 124

Streams analysis progress via Server-Sent Events (SSE). Same logic as /analyze but yields progress events at each step.

SSE Event Format

Each event is a JSON line: data: {"step": "<step>", "message": "<msg>"}\n\n

Steps

| Step | When |
| --- | --- |
| init | Checking existing index |
| scan | Scanning directory |
| filter | LLM-based file filtering (if enabled) |
| chunk | Chunking + embedding |
| embed | Embedding complete |
| store | Storing in ChromaDB |
| reindex | Smart re-index stats |
| skip | Index exists, skipping |
| analyze | Building dependency graph |
| agent | Creating AI agent |
| guidelines | Embedding guidelines |
| done | Final event with full result |
| error | Exception message |

Example (auto mode, index exists)

Input: AnalyzeRequest(repo_path="/home/user/my-app", index_mode="auto")

Events yielded:

data: {"step": "init", "message": "Checking existing index..."}
data: {"step": "skip", "message": "Index exists (843 chunks) — skipping"}
data: {"step": "analyze", "message": "Building dependency graph..."}
data: {"step": "analyze", "message": "Detected 5 modules"}
data: {"step": "agent", "message": "Creating AI agent..."}
data: {"step": "done", "message": "Analysis complete!", "result": {"status": "complete", ...}}

Returns StreamingResponse with media_type="text/event-stream" and X-Accel-Buffering: no.
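
The event format described above can be illustrated with a small encoder and a matching client-side parser. This is a sketch only: sse_event and parse_sse are hypothetical names, and the real endpoint may build its events differently.

```python
import json
from typing import Iterator


def sse_event(step: str, message: str, **extra) -> str:
    """Serialize one progress event as an SSE 'data:' line plus a blank line."""
    payload = {"step": step, "message": message, **extra}
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse(stream: str) -> Iterator[dict]:
    """Decode the JSON payload of every 'data:' line in a received stream."""
    for block in stream.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])


stream = (
    sse_event("init", "Checking existing index...")
    + sse_event("done", "Analysis complete!", result={"status": "complete"})
)
for event in parse_sse(stream):
    print(event["step"], "-", event["message"])
```

Because json.dumps escapes newlines inside strings, a message can never contain the blank line that terminates an event.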


POST /chat — line 247

Asks the agent a question about the codebase.

Example

Input: ChatRequest(message="How does authentication work?", thread_id="session-42")

Line 253: state.ensure_initialized() (auto-loads if needed)
Line 254: agent → the compiled LangGraph agent
Line 255-258: config = {"configurable": {"thread_id": "session-42"}}
Line 259-262: result = agent.invoke({"messages": [("human", "How does authentication work?")]}, config=config)
Line 262: answer = result["messages"][-1].content → e.g. "Auth uses JWT tokens issued by..."
Line 263: returns ChatResponse(answer="Auth uses JWT tokens issued by...", thread_id="session-42")


GET /overview — line 268

Returns project overview: tech stack, modules, Mermaid diagram, LLM summary, and top 30 riskiest files.

Example

Line 274: state.ensure_initialized()
Line 275: modules_result = {"modules": {"api": {...}, ...}, "stats": {"total_files": 45, "total_modules": 5}, "module_graph": {"api": ["analysis"]}}
Line 276: store = VectorStore(...)

Line 279: diagram = generate_module_diagram({"api": ["analysis"]}) → "graph LR\n api --> analysis"

Line 282: analyze_result = {"repo_path": "/home/user/my-app", ...}
Line 283: tech = detect_tech_stack("/home/user/my-app") → ["Python", "FastAPI", "ChromaDB"]

Line 286: overview_text = generate_overview(tech, modules_result, diagram) → LLM-generated summary string

Line 288: deps = {"graph": {...}}
Line 289: runtime = _graph_runtime (or fallback to deps["graph"])
Line 290: blast_map = calculate_full_blast_map(runtime) → {"blast_map": [{"file": "config.py", "count": 30}, ...]}
Line 291: top_files = ["config.py", "utils.py", ...] (first 30)
Line 292: top_risky, _ = compute_file_risks(top_files, runtime) → list of dicts with risk levels

Line 294-302: returns OverviewResponse(tech_stack=["Python", "FastAPI", "ChromaDB"], total_files=45, total_modules=5, ...)


GET /modules/{module_name} — line 310

Returns details about a specific module: files, languages, dependencies, and blast radius.

Example

Input: module_name = "api"

Line 316: state.ensure_initialized()
Line 317: module_result → full modules dict
Line 318: modules = {"api": {"files": [...], "file_count": 3, "languages": {"python": 3}}, ...}
Line 319: module_graph = {"api": ["analysis", "embeddings"], ...}

Line 321-323: actual_name = "api", info = {"files": [...], "file_count": 3, ...}, matched_as_feature = False

Line 331: depends_on = module_graph["api"] → ["analysis", "embeddings"]
Line 332-334: depended_by → scan all modules → e.g. [] (nothing depends on api)

Line 337: runtime = _graph_runtime or fallback
Line 338: file_risks, max_risk = compute_file_risks(["src/api/main.py", "src/api/models.py", "src/api/state.py"], runtime) → e.g. ([{"file": "main.py", "risk_level": "medium", ...}], "medium")

Line 340-348: returns:

{
  "name": "api",
  "file_count": 3,
  "files": ["src/api/main.py", "src/api/models.py", "src/api/state.py"],
  "languages": {"python": 3},
  "depends_on": ["analysis", "embeddings"],
  "depended_by": [],
  "blast_radius": [{"file": "main.py", "risk_level": "medium", ...}],
  "module_risk": "medium"
}

Module not found

If resolve_module_with_fallback returns None:

Line 325-329: raises HTTPException(status_code=404, detail="Module 'xyz' not found. Available: analysis, api, embeddings, ...")
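
The depended_by step in the example (lines 332-334, "scan all modules") amounts to a reverse lookup over module_graph. A minimal sketch, assuming module_graph maps each module name to the list of modules it depends on, as in the example; the function name is hypothetical.

```python
def find_dependents(module_graph: dict[str, list[str]], name: str) -> list[str]:
    """Return the modules whose dependency list contains `name`."""
    return sorted(
        module
        for module, depends_on in module_graph.items()
        if name in depends_on and module != name
    )


module_graph = {
    "api": ["analysis", "embeddings"],
    "analysis": ["embeddings"],
    "embeddings": [],
}
print(find_dependents(module_graph, "api"))         # []
print(find_dependents(module_graph, "embeddings"))  # ['analysis', 'api']
```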


GET /blast-radius/{module_name} and GET /blast-radius — line 355

Returns blast radius (change risk) for files, optionally scoped to a module.

Example — scoped to a module

Input: module_name = "analysis"

Line 362: state.ensure_initialized()
Line 363: modules_result → full modules dict
Line 364-365: runtime → graph runtime

Line 368: module_name is "analysis" → truthy, enter block
Line 370: actual_name = "analysis"
Line 376: target_files = ["src/analysis/blast_radius.py", "src/analysis/dependency_graph.py", ...] (sorted)
Line 377: scope = "analysis"

Line 381: file_results, max_risk = compute_file_risks(target_files, runtime) → e.g. ([...], "high")

Line 383-387: returns:

{
  "module": "analysis",
  "module_risk": "high",
  "total_files": 6,
  "files": [{"file": "dependency_graph.py", "risk_level": "high", "affected_files": 18}, ...]
}

Example — whole repo (no module_name)

Line 368: module_name is "" → falsy
Line 379: target_files → all files from deps["graph"] (sorted)
Line 380: scope = "all"


GET /modules — line 396

Lists all available module names.

Example

Line 401: state.ensure_initialized()
Line 402: modules_result = {"modules": {"api": {...}, "analysis": {...}}, "stats": {"total_modules": 5}}
Line 403-405: returns:

{
  "modules": ["api", "analysis", "embeddings", "ingestion", "generation"],
  "total": 5
}

GET /reading-order — line 410

Returns the recommended file reading order with risk annotations.

Example

Line 415: state.ensure_initialized()
Line 416: files → list of 127 file dicts
Line 417: deps → dependency graph
Line 418: runtime → graph runtime
Line 419: order = generate_reading_order(files, deps, graph_runtime=runtime) → {"order": [{"file": "config.py", "relevance": "essential", "why": "Used by every module"}, ...]}
Line 420: order_files = ["config.py", "utils.py", ...]
Line 421: risks, _ = compute_file_risks(order_files, runtime)
Line 422: risks_by_file = {"config.py": {"risk_level": "critical", ...}, ...}
Line 423-428: enriches each item with risk_level, affected_files, direct, transitive
Line 429-430: maps relevance → priority, why → reason for frontend compatibility

Returns the enriched order dict.
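
The enrichment step (lines 423-430) can be sketched as a merge of two dicts per item. The field names follow the walkthrough; the function name, the sample risk numbers, and the choice to keep relevance and why alongside the new priority and reason keys are assumptions.

```python
RISK_FIELDS = ("risk_level", "affected_files", "direct", "transitive")


def enrich_order(order: dict, risks_by_file: dict) -> dict:
    """Attach risk fields and frontend aliases to each reading-order item."""
    enriched = []
    for item in order["order"]:
        risk = risks_by_file.get(item["file"], {})
        merged = dict(item)  # copy so the input is not mutated
        for key in RISK_FIELDS:
            merged[key] = risk.get(key)  # None when no risk data exists
        # Aliases expected by the frontend.
        merged["priority"] = item.get("relevance")
        merged["reason"] = item.get("why")
        enriched.append(merged)
    return {**order, "order": enriched}


order = {"order": [{"file": "config.py", "relevance": "essential",
                    "why": "Used by every module"}]}
# Hypothetical risk values, for illustration only.
risks_by_file = {"config.py": {"risk_level": "critical", "affected_files": 30,
                               "direct": 10, "transitive": 20}}
print(enrich_order(order, risks_by_file)["order"][0]["priority"])  # essential
```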


GET /execution-flow — line 435

Returns the execution flow diagram and narration.

Example

Line 439: state.ensure_initialized()
Line 440: analyze_result = {"repo_path": "/home/user/my-app", ...}
Line 441: repo_path = "/home/user/my-app"
Line 442-444: files, deps, runtime from state
Line 445: order = generate_reading_order(files, deps, graph_runtime=runtime)
Line 446: flow = generate_execution_flow(order, deps) → Mermaid diagram + narration text
Line 447: returns {"flow": "<mermaid + narration>"}


POST /refresh — line 450

Re-scans files and rebuilds dependency graph + modules. Does NOT re-embed or re-index.

Example

Line 458: state.ensure_initialized()
Line 459: state.rebuild_analysis_cache() (re-scans, rebuilds graph)

Line 461-464: returns:

{
  "status": "refreshed",
  "files": 127,
  "modules": ["api", "analysis", "embeddings", "ingestion", "generation"]
}

POST /incremental-reindex — line 472

Re-embeds only files that changed since last indexing (hash-based comparison).

Example

Line 478: store → current VectorStore
Line 479: repo_path = "/home/user/my-app"
Line 480: collection_name = "my-app"
Line 481: persist_dir = "/home/user/my-app/.codewalk/chroma"
Line 482: indexed_files = ["src/main.py", "src/utils.py", ...] (all files currently in ChromaDB)
Line 483-484: if empty → HTTPException(400, "No files indexed yet. Run /analyze first.")

Line 486: result = incremental_reindex(indexed_files, repo_path, collection_name, persist_dir=persist_dir) → {"new_files": 2, "changed_files": 1, "deleted_files": 0, ...}

Line 489: state.rebuild_analysis_cache(embedded_chunks=result.get("embedded_chunks")) — refresh graph

Line 491: returns the result dict
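
The hash-based comparison behind this endpoint can be sketched as follows. The internals of incremental_reindex() are not shown on this page, so the SHA-256 choice, the helper names, and the in-memory inputs are assumptions. The sketch also returns file lists, whereas the example result above reports counts.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Fingerprint a file's content."""
    return hashlib.sha256(content).hexdigest()


def classify_changes(stored_hashes: dict[str, str],
                     current_files: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored hashes with the files on disk (passed in as bytes)."""
    current = {path: content_hash(data) for path, data in current_files.items()}
    return {
        "new_files": sorted(p for p in current if p not in stored_hashes),
        "changed_files": sorted(
            p for p in current
            if p in stored_hashes and stored_hashes[p] != current[p]
        ),
        "deleted_files": sorted(p for p in stored_hashes if p not in current),
    }


stored = {"src/main.py": content_hash(b"print('v1')"),
          "src/old.py": content_hash(b"pass")}
on_disk = {"src/main.py": b"print('v2')", "src/utils.py": b"x = 1"}
print(classify_changes(stored, on_disk))
```

Only files in new_files and changed_files would need re-embedding; chunks belonging to deleted_files would be removed from the collection.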


POST /review — line 498

Reviews the current git diff for bugs, security issues, and style.

Example

Input: ReviewRequest(staged=True, target_branch="main")

Line 505: state.ensure_initialized()
Line 507-511: gets store and deps (non-fatal if missing)

Line 513-520: result = review_diff(staged=True, target_branch="main", use_llm=True, store=store, deps=deps, graph_store=..., repo_path=...)

Line 522-533: transforms result.issues into list of dicts:

[{
  "severity": "high",
  "category": "security",
  "file_path": "src/api/main.py",
  "line_number": 42,
  "title": "SQL injection risk",
  "explanation": "...",
  "suggestion": "...",
  "code_snippet": "..."
}]

Line 535-540: returns {"issues": [...], "summary": "...", "files_reviewed": 3, "lines_added": 45, "lines_removed": 12}
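
The lines_added and lines_removed fields in the response can be derived from a unified diff by counting + and - lines inside hunks. The sketch below shows that idea only. It is not how review_diff() is known to work. It assumes git-style output, where every file section starts with a "diff --git" line, and the function name is hypothetical.

```python
def summarize_diff(diff_text: str) -> dict[str, int]:
    """Count hunks and added/removed lines in git-style unified diff text."""
    hunks = added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section: ---/+++ headers follow
        elif line.startswith("@@"):
            in_hunk = True
            hunks += 1
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return {"hunks": hunks, "lines_added": added, "lines_removed": removed}


sample = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -10,2 +10,2 @@
-old_line
+new_line
 context
@@ -80,1 +80,2 @@
 keep
+added
"""
print(summarize_diff(sample))
```

Tracking whether the parser is inside a hunk keeps the --- and +++ file headers from being miscounted as changes.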


POST /review/file — line 547

Reviews a single file against codebase conventions using LLM + vector search for context.

Example

Input: ReviewFileRequest(file_path="src/codewalk/api/main.py")

Line 554: store → VectorStore
Line 556-557: reads file content from disk
Line 559: results = store.search("code in src/codewalk/api/main.py", n_results=5) (top 5 similar chunks)
Line 561: filtered, _ = filter_by_distance(results) (removes low-quality matches)
Line 562: patterns = format_context(filtered) (formats chunks as context string)

Line 564: llm = get_llm(temperature=0) (as deterministic as the model allows)
Line 565-574: invokes LLM with system prompt (review for consistency, error handling, naming, bugs) and user prompt (file content + patterns from elsewhere)

Line 576: returns {"review": "<LLM review text>", "file_path": "src/codewalk/api/main.py"}


POST /review/guidelines — line 583

Loads team coding guidelines from a directory of markdown/text files.

Example

Input: GuidelinesRequest(docs_path="/home/user/my-app/docs/guidelines")

Line 589: path = "/home/user/my-app/docs/guidelines" (from request)
Line 590-594: validates path is not empty
Line 595: os.path.isdir(path) → True

Line 598: store = get_guidelines_store() (embeds guideline files into ChromaDB)
Line 602: count = store.chunk_count() → e.g. 24
Line 603: returns {"status": "loaded", "chunks": 24, "path": "/home/user/my-app/docs/guidelines"}


POST /voice/ask — line 608

Voice-in, voice-out codebase Q&A. Accepts audio file, transcribes, routes to the right tool, executes, and speaks the result.

Example

Input: audio file (webm from browser mic) saying "What does the config module do?"

Line 624: audio_bytes → raw bytes of the uploaded audio
Line 625: question = transcribe_bytes(audio_bytes, file_name="audio.webm") → "What does the config module do?"

Line 627-634: question.strip() is truthy → skip fallback

Line 636: route_result = route("What does the config module do?") → {"tool": "codewalk_get_module_info", "arguments": {"module_name": "config"}}
Line 637: tool_name = "codewalk_get_module_info"
Line 638: arguments = {"module_name": "config"}

Line 649: state.ensure_initialized()

Line 652: result = execute_direct("codewalk_get_module_info", {"module_name": "config"}) → module info dict

Line 655: voice = format_voice_response(result) → {"technical": "<full detail>", "speech": "The config module has 1 file..."}

Line 658: audio_response = synthesize("The config module has 1 file...") → MP3 bytes

Line 660-666: returns:

{
  "question": "What does the config module do?",
  "tool": "codewalk_get_module_info",
  "answer": "<full technical detail>",
  "speech": "The config module has 1 file...",
  "audio_base64": "<base64-encoded MP3>"
}
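
The audio_base64 field exists because raw MP3 bytes cannot be placed in JSON directly. A minimal sketch of assembling the response, assuming the field names shown above; build_voice_payload is a hypothetical helper and the byte string stands in for real synthesized audio.

```python
import base64
import json


def build_voice_payload(question: str, tool: str, technical: str,
                        speech: str, audio_bytes: bytes) -> dict:
    """Bundle the text answer and the Base64-encoded audio into one dict."""
    return {
        "question": question,
        "tool": tool,
        "answer": technical,
        "speech": speech,
        # Base64 turns arbitrary bytes into ASCII text that JSON can carry.
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
    }


fake_mp3 = b"ID3\x03\x00 not real audio"  # placeholder bytes
payload = build_voice_payload(
    "What does the config module do?", "codewalk_get_module_info",
    "<full technical detail>", "The config module has 1 file...", fake_mp3,
)
body = json.dumps(payload)  # serializes without errors
assert base64.b64decode(payload["audio_base64"]) == fake_mp3
```

A client reverses the step with a Base64 decode before handing the bytes to an audio player.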

GET /cycles — line 697

Detects circular dependencies in the codebase.

Example

Line 700: state.ensure_initialized()
Line 701: runtime = state.get_graph_runtime()
Line 702: returns runtime.detect_cycles() → e.g. {"cycles": [["a.py", "b.py", "a.py"]], "count": 1}
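
How runtime.detect_cycles() works internally is not shown here. One common way to produce output of this shape is a depth-first search that records a cycle whenever it meets a node still on the current path. The sketch below assumes a dict-of-lists import graph and reports one cycle per back edge, which is not necessarily every elementary cycle. It is recursive, so very deep graphs would need an iterative version.

```python
def detect_cycles(graph: dict[str, list[str]]) -> dict:
    """Find cycles via DFS; each cycle is listed from and back to its start."""
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state: dict[str, int] = {}
    path: list[str] = []
    cycles: list[list[str]] = []

    def visit(node: str) -> None:
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph.get(node, []):
            seen = state.get(nxt, UNSEEN)
            if seen == UNSEEN:
                visit(nxt)
            elif seen == ON_PATH:
                # Back edge: the cycle is the path from nxt to here, closed.
                cycles.append(path[path.index(nxt):] + [nxt])
        path.pop()
        state[node] = DONE

    for node in sorted(graph):
        if state.get(node, UNSEEN) == UNSEEN:
            visit(node)
    return {"cycles": cycles, "count": len(cycles)}


print(detect_cycles({"a.py": ["b.py"], "b.py": ["a.py"]}))
# {'cycles': [['a.py', 'b.py', 'a.py']], 'count': 1}
```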


GET /architecture — line 705

Architecture health report: graph stats, centrality scores, and cycles.

Example

Line 708: state.ensure_initialized()
Line 709: runtime → graph runtime
Line 710-713: returns:

{
  "stats": {"nodes": 127, "edges": 203, "is_dag": false},
  "centrality": [{"file": "config.py", "betweenness": 0.42}, ...],
  "cycles": {"cycles": [...], "count": 2}
}
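
The stats block above (nodes, edges, is_dag) can be computed without a graph library. This sketch uses Kahn's algorithm for the DAG check: repeatedly remove nodes with no incoming edges, and if any node is left over, the graph has a cycle. The function name and input shape are assumptions; the real runtime may rely on a graph library instead.

```python
from collections import deque


def graph_stats(graph: dict[str, list[str]]) -> dict:
    """Count nodes and edges and test for cycles with Kahn's algorithm."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    indegree = {node: 0 for node in nodes}
    edges = 0
    for targets in graph.values():
        for dst in set(targets):  # ignore duplicate edges
            indegree[dst] += 1
            edges += 1

    queue = deque(node for node, deg in indegree.items() if deg == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for dst in set(graph.get(node, [])):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)

    # Every node can be removed only when no cycle exists.
    return {"nodes": len(nodes), "edges": edges, "is_dag": removed == len(nodes)}


print(graph_stats({"A": ["B"], "B": ["C"]}))              # is_dag: True
print(graph_stats({"A": ["B"], "B": ["C"], "C": ["A"]}))  # is_dag: False
```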

GET /health — line 718

Simple health check. Always returns {"status": "ok"}.


POST /docs/index

Indexes a folder of .md/.pdf/.txt documents into the DocStore.

Example

Input: {"docs_path": "/Users/me/team-docs"}

Line: doc_store = state.get_doc_store()
Line: result = doc_store.index_docs("/Users/me/team-docs")
Line: returns {"docs_found": 5, "chunks_stored": 42}


POST /docs/search

Semantic search across indexed documents.

Example

Input: {"query": "deployment process", "n_results": 3}

Line: doc_store = state.get_doc_store()
Line: results = doc_store.search("deployment process", n_results=3)
Line: returns [{"text": "...", "metadata": {...}, "distance": 0.12}, ...]
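
The distance value in each result measures how far a chunk's embedding is from the query's embedding. Assuming the store is configured for cosine distance (the metric defined under Key Concepts; this page does not confirm which metric the DocStore uses), the ranking can be sketched in plain Python. The function names and the two-dimensional toy vectors are illustrative only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query_vec: list[float], docs: list[tuple[str, list[float]]],
            n_results: int = 3) -> list[dict]:
    """Rank (text, vector) pairs by distance to the query, closest first."""
    scored = [{"text": text, "distance": cosine_distance(query_vec, vec)}
              for text, vec in docs]
    return sorted(scored, key=lambda item: item["distance"])[:n_results]


docs = [("deploy steps", [0.9, 0.1]), ("lunch menu", [0.0, 1.0])]
for hit in nearest([1.0, 0.0], docs, n_results=2):
    print(hit["text"], round(hit["distance"], 3))
```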


POST /docs/ask

Ask a question answered by indexed documents. Uses DOC_ASK_PROMPT with LLM.

Example

Input: {"question": "How do we deploy to production?", "n_results": 5}

Line: searches docs → gets top 5 chunks
Line: formats DOC_ASK_PROMPT with chunks as context + question
Line: answer = get_llm().invoke(prompt)
Line: returns {"answer": "...", "sources": [{"doc_path": "deploy.md", "section": "Steps"}]}
