api main
FastAPI application with all HTTP endpoints — analyze, chat, overview, modules, blast radius, reading order, execution flow, review, voice, and health check.
| Term | Definition | Example |
|---|---|---|
| edge | A connection between two vertices in a graph, representing a relationship (e.g., an import). | If pipeline.py imports scanner.py, there's a directed edge pipeline.py → scanner.py. |
| DAG | Directed Acyclic Graph — a directed graph with no cycles (no circular paths back to the same node). | A→B→C is a DAG. A→B→C→A is NOT (it has a cycle). |
| cycle | A path in a graph that starts and ends at the same vertex. A→B→C→A is a cycle. | If auth.py imports user.py and user.py imports auth.py, that's a cycle. |
| betweenness centrality | How often a vertex sits on the shortest path between other vertices. High = important hub file. | config.py with betweenness=45.0 means 45 shortest paths between other files pass through it. |
| blast radius | All files that would be affected if a given file changes — found by following reverse import edges transitively. | If A imports B and C imports A, changing B has blast radius = {A, C}. |
| transitive dependency | An indirect dependency through a chain. If A imports B and B imports C, then A transitively depends on C. | Changing C could break A even though A never directly imports C. |
| embedding | A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors. | The code def add(a, b): return a+b might become [0.12, -0.45, 0.78, ...] (1536 numbers for OpenAI). |
| vector store | A database optimized for storing embeddings and finding the most similar ones quickly. | ChromaDB stores code chunk embeddings and returns the 5 most similar chunks to your query. |
| cosine distance | Measures how different two vectors are. 0.0 = identical meaning, 1.0 = completely different, 2.0 = opposite. | Query "scan files" has cosine distance 0.15 to scan_directory() (very similar) and 0.85 to grade_answer() (very different). |
| ChromaDB | An open-source vector database for storing and searching embeddings. Used here to store code chunks. | collection.query(query_texts=["scan files"], n_results=5) returns the 5 closest code chunks. |
| chunk | A piece of source code (usually one function or class) stored as a unit for search. | The function def scan_directory(root): ... (20 lines) is one chunk. |
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | def add(a, b): return a+b becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| LLM | Large Language Model: an AI model (like GPT-4, Claude) that generates text given a prompt. | get_llm() returns a ChatOpenAI instance that can answer questions about code. |
| glob pattern | A wildcard pattern for matching file paths. * matches anything in one directory, ** matches across directories. | **/*.py matches all Python files in any subdirectory. src/*.ts matches TypeScript files only in src/. |
| diff | The set of changes between two versions of code, showing added (+) and removed (-) lines. | - old_line\n+ new_line shows old_line was replaced with new_line. |
| hunk | A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file). | A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed). |
| STT | Speech-to-Text — converting spoken audio into text (transcription). | User speaks into microphone → STT produces "What does scan_directory do?". |
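The cosine distance row in the glossary can be made concrete with a few lines of plain Python. This is a minimal sketch of the standard formula (1 minus cosine similarity), not the code ChromaDB runs internally; the function name `cosine_distance` is chosen here for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Same direction, orthogonal, and opposite vectors cover the 0 / 1 / 2 cases.
same = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
```

Only the direction of the vectors matters: scaling a vector by a positive constant leaves its distance to any other vector unchanged.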
Creates the FastAPI application instance with CORS middleware allowing all origins.
app = FastAPI(title="Codewalk API", description="AI-powered codebase onboarding tool", version="1.0.0")
CORS is wide-open (allow_origins=["*"]) so the Next.js frontend can call the API from any port.
Catches all unhandled exceptions and converts them to user-friendly JSON messages via classify_error().
Input: an unhandled ValueError("chromadb collection not found")
Line 59: user_message → classify_error(ValueError(...)) → e.g. "ChromaDB collection not found. Try re-analyzing."
Line 60: logs "[api] Error: chromadb collection not found"
Line 61-63: returns JSONResponse(status_code=500, content={"detail": "ChromaDB collection not found. Try re-analyzing."})
Indexes a codebase: scan → chunk → embed → store → build agent.
| index_mode | Behavior |
|---|---|
| "auto" | Skip indexing if collection already has data |
| "reindex" | Smart re-index (only changed/new/deleted files) |
| "full" | Nuke everything, re-embed from scratch |
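The three modes can be read as a small decision function. The sketch below is inferred from the walkthrough in this section (the checks at lines 79, 86, 88, and 90), not copied from the source; `derive_collection_name` and `choose_index_action` are names made up for illustration.

```python
def derive_collection_name(repo_path):
    """Use the last path segment as the collection name, as line 79 does."""
    return repo_path.rstrip("/").split("/")[-1]


def choose_index_action(index_mode, existing_count):
    """Return "full", "reindex", or "skip" for the given mode and chunk count.

    Assumption: an empty collection always triggers a full index, which is
    how the `existing_count == 0` check at line 86 reads.
    """
    if index_mode == "full" or existing_count == 0:
        return "full"
    if index_mode == "reindex":
        return "reindex"
    return "skip"  # "auto" with existing data


name = derive_collection_name("/home/user/my-app")
action = choose_index_action("auto", 843)
```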
Input: AnalyzeRequest(repo_path="/home/user/my-app", collection_name="", index_mode="auto")
Line 77: request.repo_path → "/home/user/my-app" (not empty, stays as is)
Line 78: request.collection_name is "" → enter block
Line 79: "/home/user/my-app".rstrip("/").split("/")[-1] → "my-app"
Line 79: request.collection_name → "my-app"
Line 80: persist_dir → "/home/user/my-app/.codewalk/chroma"
Line 81: store → VectorStore(persist_dir="/home/user/my-app/.codewalk/chroma")
Line 82: store.create_collection("my-app")
Line 83: existing_count → store.chunk_count() → e.g. 843
Line 86: request.index_mode == "full" → False, existing_count == 0 → False
Line 88: request.index_mode == "reindex" → False
Line 90: auto mode + data exists → skip indexing
Line 91-95: index_result → {"repo_path": "/home/user/my-app", "files_scanned": 0, "chunks_created": 0, "skipped": True}
Line 96: logs "[api] Skipping indexing — collection already has 843 chunks"
Line 99: files → scan_directory("/home/user/my-app") → list of 127 file dicts
Line 100: deps → build_dependency_graph(files) → {"graph": {...}, "reverse": {...}}
Line 101: modules_result → detect_modules(files, deps) → {"modules": {"api": {...}, ...}, "stats": {...}}
Line 104: agent → create_agent(store, modules_result, files=files, deps=deps)
Line 107: state.initialize(store, agent, modules_result, index_result, files=files, deps=deps, repo_path="/home/user/my-app", embedded_chunks=None)
Line 110-112: loads guidelines store if REVIEW_GUIDELINES_PATH is set
Line 114-120: returns:
{
"status": "complete",
"repo_path": "/home/user/my-app",
"files_scanned": 0,
"chunks_created": 0,
"modules": ["api", "analysis", "embeddings", "ingestion", "generation"]
}
Streams analysis progress via Server-Sent Events (SSE). Same logic as /analyze but yields progress events at each step.
Each event is a JSON line: data: {"step": "<step>", "message": "<msg>"}\n\n
| Step | When |
|---|---|
| init | Checking existing index |
| scan | Scanning directory |
| filter | LLM-based file filtering (if enabled) |
| chunk | Chunking + embedding |
| embed | Embedding complete |
| store | Storing in ChromaDB |
| reindex | Smart re-index stats |
| skip | Index exists, skipping |
| analyze | Building dependency graph |
| agent | Creating AI agent |
| guidelines | Embedding guidelines |
| done | Final event with full result |
| error | Exception message |
Input: AnalyzeRequest(repo_path="/home/user/my-app", index_mode="auto")
Events yielded:
data: {"step": "init", "message": "Checking existing index..."}
data: {"step": "skip", "message": "Index exists (843 chunks) — skipping"}
data: {"step": "analyze", "message": "Building dependency graph..."}
data: {"step": "analyze", "message": "Detected 5 modules"}
data: {"step": "agent", "message": "Creating AI agent..."}
data: {"step": "done", "message": "Analysis complete!", "result": {"status": "complete", ...}}
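Returns StreamingResponse with media_type="text/event-stream" and the header X-Accel-Buffering: no, which asks nginx-style reverse proxies not to buffer the stream so events reach the client as they are produced.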
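The event framing described above (one `data:` line holding JSON, terminated by a blank line) can be sketched with the standard library alone. The helper names `sse_event` and `progress_events` are hypothetical, and the generator emits only a subset of the steps; in the real endpoint a generator like this would be handed to StreamingResponse.

```python
import json


def sse_event(step, message, **extra):
    """Format one Server-Sent Event carrying a JSON payload."""
    payload = {"step": step, "message": message, **extra}
    return f"data: {json.dumps(payload)}\n\n"


def progress_events(module_count):
    """Yield a shortened version of the event sequence shown above."""
    yield sse_event("init", "Checking existing index...")
    yield sse_event("analyze", "Building dependency graph...")
    yield sse_event("analyze", f"Detected {module_count} modules")
    yield sse_event("done", "Analysis complete!", result={"status": "complete"})


events = list(progress_events(5))
```

Keeping the framing in one helper means every step, including `error`, is guaranteed to end with the blank line that SSE clients use to detect the end of an event.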
Asks the agent a question about the codebase.
Input: ChatRequest(message="How does authentication work?", thread_id="session-42")
Line 253: state.ensure_initialized() — auto-loads if needed
Line 254: agent → the compiled LangGraph agent
Line 255-258: config → {"configurable": {"thread_id": "session-42"}}
Line 259-262: result → agent.invoke({"messages": [("human", "How does authentication work?")]}, config=config)
Line 262: answer → result["messages"][-1].content → e.g. "Auth uses JWT tokens issued by..."
Line 263: returns ChatResponse(answer="Auth uses JWT tokens issued by...", thread_id="session-42")
Returns project overview: tech stack, modules, Mermaid diagram, LLM summary, and top 30 riskiest files.
Line 274: state.ensure_initialized()
Line 275: modules_result → {"modules": {"api": {...}, ...}, "stats": {"total_files": 45, "total_modules": 5}, "module_graph": {"api": ["analysis"]}}
Line 276: store → VectorStore(...)
Line 279: diagram → generate_module_diagram({"api": ["analysis"]}) → "graph LR\n api --> analysis"
Line 282: analyze_result → {"repo_path": "/home/user/my-app", ...}
Line 283: tech → detect_tech_stack("/home/user/my-app") → ["Python", "FastAPI", "ChromaDB"]
Line 286: overview_text → generate_overview(tech, modules_result, diagram) → LLM-generated summary string
Line 288: deps → {"graph": {...}}
Line 289: runtime → _graph_runtime (or fallback to deps["graph"])
Line 290: blast_map → calculate_full_blast_map(runtime) → {"blast_map": [{"file": "config.py", "count": 30}, ...]}
Line 291: top_files → ["config.py", "utils.py", ...] (first 30)
Line 292: top_risky, _ → compute_file_risks(top_files, runtime) → list of dicts with risk levels
Line 294-302: returns OverviewResponse(tech_stack=["Python", "FastAPI", "ChromaDB"], total_files=45, total_modules=5, ...)
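The diagram step at line 279 turns the module graph into Mermaid text. A minimal sketch of that rendering follows. `module_diagram_sketch` is a hypothetical stand-in for generate_module_diagram(), and the four-space indent and sorted ordering are assumptions, since the exact output formatting is not shown here.

```python
def module_diagram_sketch(module_graph):
    """Render {"module": ["dependency", ...]} as a left-to-right Mermaid graph."""
    lines = ["graph LR"]
    for source in sorted(module_graph):
        for target in sorted(module_graph[source]):
            # One Mermaid edge per module-level dependency.
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


diagram = module_diagram_sketch({"api": ["analysis"]})
```

Module names containing spaces or punctuation would need quoting or sanitizing before Mermaid accepts them; this sketch assumes simple identifiers.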
Returns details about a specific module: files, languages, dependencies, and blast radius.
Input: module_name = "api"
Line 316: state.ensure_initialized()
Line 317: module_result → full modules dict
Line 318: modules → {"api": {"files": [...], "file_count": 3, "languages": {"python": 3}}, ...}
Line 319: module_graph → {"api": ["analysis", "embeddings"], ...}
Line 321-323: actual_name → "api", info → {"files": [...], "file_count": 3, ...}, matched_as_feature → False
Line 331: depends_on → module_graph["api"] → ["analysis", "embeddings"]
Line 332-334: depended_by → scan all modules → e.g. [] (nothing depends on api)
Line 337: runtime → _graph_runtime or fallback
Line 338: file_risks, max_risk → compute_file_risks(["src/api/main.py", "src/api/models.py", "src/api/state.py"], runtime) → e.g. ([{"file": "main.py", "risk_level": "medium", ...}], "medium")
Line 340-348: returns:
{
"name": "api",
"file_count": 3,
"files": ["src/api/main.py", "src/api/models.py", "src/api/state.py"],
"languages": {"python": 3},
"depends_on": ["analysis", "embeddings"],
"depended_by": [],
"blast_radius": [{"file": "main.py", "risk_level": "medium", ...}],
"module_risk": "medium"
}
If resolve_module_with_fallback returns None:
Line 325-329: raises HTTPException(status_code=404, detail="Module 'xyz' not found. Available: analysis, api, embeddings, ...")
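The depends_on and depended_by values in this walkthrough come from the same module graph read in two directions. Here is a small sketch of that lookup, assuming the graph maps each module to the modules it imports; the function name `module_dependencies` and the sample graph are illustrative.

```python
def module_dependencies(module_graph, name):
    """Return (depends_on, depended_by) for one module."""
    # Forward direction: what this module imports.
    depends_on = list(module_graph.get(name, []))
    # Reverse direction: scan every other module for an edge pointing at `name`.
    depended_by = sorted(
        module
        for module, targets in module_graph.items()
        if name in targets and module != name
    )
    return depends_on, depended_by


graph = {"api": ["analysis", "embeddings"], "analysis": ["embeddings"], "embeddings": []}
api_deps = module_dependencies(graph, "api")
embeddings_deps = module_dependencies(graph, "embeddings")
```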
Returns blast radius (change risk) for files, optionally scoped to a module.
Input: module_name = "analysis"
Line 362: state.ensure_initialized()
Line 363: modules_result → full modules dict
Line 364-365: runtime → graph runtime
Line 368: module_name is "analysis" → truthy, enter block
Line 370: actual_name → "analysis"
Line 376: target_files → ["src/analysis/blast_radius.py", "src/analysis/dependency_graph.py", ...] (sorted)
Line 377: scope → "analysis"
Line 381: file_results, max_risk → compute_file_risks(target_files, runtime) → e.g. ([...], "high")
Line 383-387: returns:
{
"module": "analysis",
"module_risk": "high",
"total_files": 6,
"files": [{"file": "dependency_graph.py", "risk_level": "high", "affected_files": 18}, ...]
}
If module_name is empty, the endpoint covers every file instead:
Line 368: module_name is "" → falsy
Line 379: target_files → all files from deps["graph"] (sorted)
Line 380: scope → "all"
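The glossary defines blast radius as the set reached by following reverse import edges transitively. A minimal breadth-first sketch of that definition is below. It is not the project's compute_file_risks() implementation; the function names are hypothetical, and the graph is assumed to map each file to the files it imports.

```python
from collections import deque


def build_reverse_graph(graph):
    """Invert {importer: [imported, ...]} into {imported: {importer, ...}}."""
    reverse = {node: set() for node in graph}
    for importer, imported in graph.items():
        for target in imported:
            reverse.setdefault(target, set()).add(importer)
    return reverse


def blast_radius(graph, changed):
    """Return every file that directly or transitively imports `changed`."""
    reverse = build_reverse_graph(graph)
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            # Skip files already seen so import cycles cannot loop forever.
            if dependent not in affected and dependent != changed:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Glossary example: A imports B and C imports A, so changing B affects A and C.
imports = {"A": ["B"], "C": ["A"], "B": []}
radius = blast_radius(imports, "B")
```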
Lists all available module names.
Line 401: state.ensure_initialized()
Line 402: modules_result → {"modules": {"api": {...}, "analysis": {...}}, "stats": {"total_modules": 5}}
Line 403-405: returns:
{
"modules": ["api", "analysis", "embeddings", "ingestion", "generation"],
"total": 5
}
Returns the recommended file reading order with risk annotations.
Line 415: state.ensure_initialized()
Line 416: files → list of 127 file dicts
Line 417: deps → dependency graph
Line 418: runtime → graph runtime
Line 419: order → generate_reading_order(files, deps, graph_runtime=runtime) → {"order": [{"file": "config.py", "relevance": "essential", "why": "Used by every module"}, ...]}
Line 420: order_files → ["config.py", "utils.py", ...]
Line 421: risks, _ → compute_file_risks(order_files, runtime)
Line 422: risks_by_file → {"config.py": {"risk_level": "critical", ...}, ...}
Line 423-428: enriches each item with risk_level, affected_files, direct, transitive
Line 429-430: maps relevance → priority, why → reason for frontend compatibility
Returns the enriched order dict.
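The enrichment at lines 423-430 merges two lookups per file and adds frontend-friendly aliases. A sketch of that merge, using the field names from the walkthrough, is shown below; the function name `enrich_reading_order` and the defaults used for files without risk data ("unknown", 0) are assumptions.

```python
def enrich_reading_order(order, risks_by_file):
    """Attach risk fields and frontend aliases to each reading-order item."""
    enriched = []
    for item in order:
        risk = risks_by_file.get(item["file"], {})
        enriched.append({
            **item,
            "risk_level": risk.get("risk_level", "unknown"),  # assumed default
            "affected_files": risk.get("affected_files", 0),
            "direct": risk.get("direct", 0),
            "transitive": risk.get("transitive", 0),
            # Aliases kept for frontend compatibility.
            "priority": item.get("relevance"),
            "reason": item.get("why"),
        })
    return enriched


order = [{"file": "config.py", "relevance": "essential", "why": "Used by every module"}]
risks = {"config.py": {"risk_level": "critical", "affected_files": 30, "direct": 12, "transitive": 18}}
enriched = enrich_reading_order(order, risks)
```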
Returns the execution flow diagram and narration.
Line 439: state.ensure_initialized()
Line 440: analyze_result → {"repo_path": "/home/user/my-app", ...}
Line 441: repo_path → "/home/user/my-app"
Line 442-444: files, deps, runtime from state
Line 445: order → generate_reading_order(files, deps, graph_runtime=runtime)
Line 446: flow → generate_execution_flow(order, deps) → Mermaid diagram + narration text
Line 447: returns {"flow": "<mermaid + narration>"}
Re-scans files and rebuilds dependency graph + modules. Does NOT re-embed or re-index.
Line 458: state.ensure_initialized()
Line 459: state.rebuild_analysis_cache() — re-scans, rebuilds graph
Line 461-464: returns:
{
"status": "refreshed",
"files": 127,
"modules": ["api", "analysis", "embeddings", "ingestion", "generation"]
}
Re-embeds only files that changed since last indexing (hash-based comparison).
Line 478: store → current VectorStore
Line 479: repo_path → "/home/user/my-app"
Line 480: collection_name → "my-app"
Line 481: persist_dir → "/home/user/my-app/.codewalk/chroma"
Line 482: indexed_files → ["src/main.py", "src/utils.py", ...] (all files currently in ChromaDB)
Line 483-484: if empty → HTTPException(400, "No files indexed yet. Run /analyze first.")
Line 486: result → incremental_reindex(indexed_files, repo_path, collection_name, persist_dir=persist_dir) → {"new_files": 2, "changed_files": 1, "deleted_files": 0, ...}
Line 489: state.rebuild_analysis_cache(embedded_chunks=result.get("embedded_chunks")) — refresh graph
Line 491: returns the result dict
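Hash-based comparison, as mentioned for this endpoint, usually means storing a content digest per file and diffing it against the current state. The sketch below shows one way to do that with SHA-256. It is an assumption about the approach, not the actual incremental_reindex() code, and it returns file lists where the real result reports counts.

```python
import hashlib


def content_hash(text):
    """Return a stable hex digest of a file's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_by_hash(stored_hashes, current_files):
    """Classify files as new, changed, or deleted relative to the stored hashes.

    stored_hashes: {path: digest recorded at last indexing}
    current_files: {path: current file text}
    """
    current_hashes = {path: content_hash(text) for path, text in current_files.items()}
    return {
        "new_files": sorted(p for p in current_hashes if p not in stored_hashes),
        "changed_files": sorted(
            p for p in current_hashes
            if p in stored_hashes and current_hashes[p] != stored_hashes[p]
        ),
        "deleted_files": sorted(p for p in stored_hashes if p not in current_hashes),
    }


stored = {"src/main.py": content_hash("print(1)"), "src/old.py": content_hash("x = 1")}
current = {"src/main.py": "print(2)", "src/utils.py": "def f(): pass"}
changes = diff_by_hash(stored, current)
```

Only the files in `new_files` and `changed_files` need re-embedding; chunks belonging to `deleted_files` can be dropped from the collection.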
Reviews the current git diff for bugs, security issues, and style.
Input: ReviewRequest(staged=True, target_branch="main")
Line 505: state.ensure_initialized()
Line 507-511: gets store and deps (non-fatal if missing)
Line 513-520: result → review_diff(staged=True, target_branch="main", use_llm=True, store=store, deps=deps, graph_store=..., repo_path=...)
Line 522-533: transforms result.issues into list of dicts:
[{
"severity": "high",
"category": "security",
"file_path": "src/api/main.py",
"line_number": 42,
"title": "SQL injection risk",
"explanation": "...",
"suggestion": "...",
"code_snippet": "..."
}]
Line 535-540: returns {"issues": [...], "summary": "...", "files_reviewed": 3, "lines_added": 45, "lines_removed": 12}
Reviews a single file against codebase conventions using LLM + vector search for context.
Input: ReviewFileRequest(file_path="src/codewalk/api/main.py")
Line 554: store → VectorStore
Line 556-557: reads file content from disk
Line 559: results → store.search("code in src/codewalk/api/main.py", n_results=5) — top 5 similar chunks
Line 561: filtered, _ → filter_by_distance(results) — removes low-quality matches
Line 562: patterns → format_context(filtered) — formats chunks as context string
Line 564: llm → get_llm(temperature=0) (deterministic)
Line 565-574: invokes LLM with system prompt (review for consistency, error handling, naming, bugs) and user prompt (file content + patterns from elsewhere)
Line 576: returns {"review": "<LLM review text>", "file_path": "src/codewalk/api/main.py"}
Loads team coding guidelines from a directory of markdown/text files.
Input: GuidelinesRequest(docs_path="/home/user/my-app/docs/guidelines")
Line 589: path → "/home/user/my-app/docs/guidelines" (from request)
Line 590-594: validates path is not empty
Line 595: os.path.isdir(path) → True
Line 598: store → get_guidelines_store() — embeds guideline files into ChromaDB
Line 602: count → store.chunk_count() → e.g. 24
Line 603: returns {"status": "loaded", "chunks": 24, "path": "/home/user/my-app/docs/guidelines"}
Voice-in, voice-out codebase Q&A. Accepts audio file, transcribes, routes to the right tool, executes, and speaks the result.
Input: audio file (webm from browser mic) saying "What does the config module do?"
Line 624: audio_bytes → raw bytes of the uploaded audio
Line 625: question → transcribe_bytes(audio_bytes, file_name="audio.webm") → "What does the config module do?"
Line 627-634: question.strip() is truthy → skip fallback
Line 636: route_result → route("What does the config module do?") → {"tool": "codewalk_get_module_info", "arguments": {"module_name": "config"}}
Line 637: tool_name → "codewalk_get_module_info"
Line 638: arguments → {"module_name": "config"}
Line 649: state.ensure_initialized()
Line 652: result → execute_direct("codewalk_get_module_info", {"module_name": "config"}) → module info dict
Line 655: voice → format_voice_response(result) → {"technical": "<full detail>", "speech": "The config module has 1 file..."}
Line 658: audio_response → synthesize("The config module has 1 file...") → MP3 bytes
Line 660-666: returns:
{
"question": "What does the config module do?",
"tool": "codewalk_get_module_info",
"answer": "<full technical detail>",
"speech": "The config module has 1 file...",
"audio_base64": "<base64-encoded MP3>"
}
Detects circular dependencies in the codebase.
Line 700: state.ensure_initialized()
Line 701: runtime → state.get_graph_runtime()
Line 702: returns runtime.detect_cycles() → e.g. {"cycles": [["a.py", "b.py", "a.py"]], "count": 1}
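One common way to find import cycles is a depth-first search that records a cycle whenever it meets a node already on the current path. The sketch below uses that technique and returns the same shape as the example above. It is not the runtime's detect_cycles() implementation, and it reports one cycle per back edge rather than every elementary cycle.

```python
def detect_cycles_sketch(graph):
    """Find cycles in {file: [imported files]} using DFS with node coloring."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []
    cycles = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for target in graph.get(node, []):
            state = color.get(target, WHITE)
            if state == GRAY:
                # Back edge: slice the current path from the repeated node.
                cycles.append(path[path.index(target):] + [target])
            elif state == WHITE:
                visit(target)
        path.pop()
        color[node] = BLACK

    for node in sorted(graph):
        if color[node] == WHITE:
            visit(node)
    return {"cycles": cycles, "count": len(cycles)}


result = detect_cycles_sketch({"a.py": ["b.py"], "b.py": ["a.py"]})
```

The recursion depth grows with the longest import chain, so a very deep graph would call for an explicit stack instead.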
Architecture health report: graph stats, centrality scores, and cycles.
Line 708: state.ensure_initialized()
Line 709: runtime → graph runtime
Line 710-713: returns:
{
"stats": {"nodes": 127, "edges": 203, "is_dag": false},
"centrality": [{"file": "config.py", "betweenness": 0.42}, ...],
"cycles": {"cycles": [...], "count": 2}
}
Simple health check. Always returns {"status": "ok"}.
Indexes a folder of .md/.pdf/.txt documents into the DocStore.
Input: {"docs_path": "/Users/me/team-docs"}
Line: doc_store = state.get_doc_store()
Line: result = doc_store.index_docs("/Users/me/team-docs")
Line: returns {"docs_found": 5, "chunks_stored": 42}
Semantic search across indexed documents.
Input: {"query": "deployment process", "n_results": 3}
Line: doc_store = state.get_doc_store()
Line: results = doc_store.search("deployment process", n_results=3)
Line: returns [{"text": "...", "metadata": {...}, "distance": 0.12}, ...]
Ask a question answered by indexed documents. Uses DOC_ASK_PROMPT with LLM.
Input: {"question": "How do we deploy to production?", "n_results": 5}
Line: searches docs → gets top 5 chunks
Line: formats DOC_ASK_PROMPT with chunks as context + question
Line: answer = get_llm().invoke(prompt)
Line: returns {"answer": "...", "sources": [{"doc_path": "deploy.md", "section": "Steps"}]}
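The ask flow above is a retrieve-then-prompt pattern: join the retrieved chunks into a context string, fill a template, and report where the chunks came from. A sketch of the two pure-Python pieces follows. The template text is hypothetical (the real DOC_ASK_PROMPT is not shown in this document), and the `doc_path` and `section` metadata keys are assumed from the sources example.

```python
# Hypothetical template standing in for DOC_ASK_PROMPT.
DOC_ASK_PROMPT_SKETCH = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def build_doc_prompt(question, chunks):
    """Join chunk texts into one context block and fill the template."""
    context = "\n\n---\n\n".join(chunk["text"] for chunk in chunks)
    return DOC_ASK_PROMPT_SKETCH.format(context=context, question=question)


def collect_sources(chunks):
    """Return the distinct (doc_path, section) pairs, in retrieval order."""
    sources = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        entry = {"doc_path": meta.get("doc_path"), "section": meta.get("section")}
        if entry not in sources:
            sources.append(entry)
    return sources


chunks = [
    {"text": "Run the deploy script.", "metadata": {"doc_path": "deploy.md", "section": "Steps"}},
    {"text": "Tag the release first.", "metadata": {"doc_path": "deploy.md", "section": "Steps"}},
]
prompt = build_doc_prompt("How do we deploy to production?", chunks)
sources = collect_sources(chunks)
```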