api models
Pydantic request and response models that define the shape of every JSON body the API accepts and returns.
| Term | Definition | Example |
|---|---|---|
| blast radius | All files that would be affected if a given file changes — found by following reverse import edges transitively. | If A imports B and C imports A, changing B has blast radius = {A, C}. |
| embedding | A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors. | The code `def add(a, b): return a+b` might become `[0.12, -0.45, 0.78, ...]` (1536 numbers with OpenAI's `text-embedding-3-small` or `text-embedding-ada-002`). |
| ChromaDB | An open-source vector database for storing and searching embeddings. Used here to store code chunks. | `collection.query(query_texts=["scan files"], n_results=5)` returns the 5 closest code chunks. |
| chunk | A piece of source code (usually one function or class) stored as a unit for search. | The function def scan_directory(root): ... (20 lines) is one chunk. |
| AST | Abstract Syntax Tree: a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.). | `def add(a, b): return a+b` becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)]. |
| LLM | Large Language Model: an AI model (like GPT-4 or Claude) that generates text given a prompt. | `get_llm()` returns a `ChatOpenAI` instance that can answer questions about code. |
| Pydantic | A Python library for data validation using type hints. Defines schemas as classes with typed fields. | `class QueryRoute(BaseModel): route: str; target: str` guarantees that every instance has string `route` and `target` fields. |
| diff | The set of changes between two versions of code, showing added (+) and removed (-) lines. | `- old_line\n+ new_line` shows that old_line was replaced with new_line. |
| hunk | A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file). | A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed). |
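The blast-radius definition above is a reachability search over reversed import edges. A minimal sketch, assuming the import graph is available as a plain dict mapping each file to the set of files it imports directly (both that input shape and the `blast_radius` function are hypothetical, not taken from this codebase):

```python
from collections import deque


def blast_radius(imports: dict[str, set[str]], changed: str) -> set[str]:
    """Return every file that transitively imports `changed`.

    `imports` maps each file to the set of files it imports directly.
    """
    # Invert the import graph: for each file, which files import it?
    importers: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for target in targets:
            importers.setdefault(target, set()).add(src)

    # Breadth-first walk along the reversed edges.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in importers.get(current, set()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    affected.discard(changed)  # an import cycle can lead back to the changed file itself
    return affected


# Glossary example: A imports B, C imports A.
graph = {"A": {"B"}, "C": {"A"}}
print(sorted(blast_radius(graph, "B")))  # ['A', 'C']
```

Because each file is queued at most once, import cycles cannot make the walk loop forever.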
Request body for POST /analyze. Tells the API which repo to index and how.
| Field | Type | Default | Purpose |
|---|---|---|---|
| repo_path | str | "" | Absolute path to the repo on disk |
| collection_name | str | "" | ChromaDB collection name (derived from the repo path if empty) |
| index_mode | str | "auto" | "auto" / "reindex" / "full" |
Input JSON: `{"repo_path": "/home/user/my-app", "collection_name": "", "index_mode": "full"}`

- repo_path → "/home/user/my-app"
- collection_name → "" (empty; the API endpoint later derives "my-app" from the path)
- index_mode → "full" (discard the existing index and re-embed everything)

The resulting AnalyzeRequest object:

`AnalyzeRequest(repo_path="/home/user/my-app", collection_name="", index_mode="full")`
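A sketch of this model in Pydantic v2, plus one way the endpoint could derive the collection name when it is left empty. Taking the last path component is an assumption that reproduces the "my-app" example; the real endpoint may sanitize the name further. `resolve_collection_name` is hypothetical.

```python
from pathlib import PurePath

from pydantic import BaseModel


class AnalyzeRequest(BaseModel):
    repo_path: str = ""
    collection_name: str = ""
    index_mode: str = "auto"  # "auto" / "reindex" / "full"


def resolve_collection_name(req: AnalyzeRequest) -> str:
    """Hypothetical helper: fall back to the repo folder name when no name is given."""
    if req.collection_name:
        return req.collection_name
    # PurePath only manipulates the string; it never touches the filesystem.
    return PurePath(req.repo_path).name


req = AnalyzeRequest.model_validate(
    {"repo_path": "/home/user/my-app", "collection_name": "", "index_mode": "full"}
)
print(resolve_collection_name(req))  # my-app
```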
Request body for POST /chat. Carries the user's question and a thread ID for conversation memory.
| Field | Type | Default | Purpose |
|---|---|---|---|
| message | str | (required) | The question to ask the agent |
| thread_id | str | "default" | Conversation thread for multi-turn memory |
Input JSON: `{"message": "How does authentication work?", "thread_id": "session-42"}`

- message → "How does authentication work?"
- thread_id → "session-42"

The resulting ChatRequest object:

`ChatRequest(message="How does authentication work?", thread_id="session-42")`
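The "(required)" marker and the "default" fallback map directly onto Pydantic field declarations. A sketch assuming Pydantic v2: a field with no default is required, and omitting it raises `ValidationError`.

```python
from pydantic import BaseModel, ValidationError


class ChatRequest(BaseModel):
    message: str                # no default, so the field is required
    thread_id: str = "default"  # used when the client omits thread_id


req = ChatRequest.model_validate({"message": "How does authentication work?"})
print(req.thread_id)  # default

try:
    ChatRequest.model_validate({"thread_id": "session-42"})
except ValidationError as exc:
    # The error report points at the missing field.
    print(exc.errors()[0]["loc"])  # ('message',)
```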
Response body for POST /analyze. Reports what the indexing pipeline produced.
| Field | Type | Purpose |
|---|---|---|
| status | str | Always "complete" on success |
| repo_path | str | The repo that was analyzed |
| files_scanned | int | Number of source files found |
| chunks_created | int | Number of text chunks embedded |
| modules | list[str] | Detected module names |
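On the client side, the serialized JSON can be parsed back into a typed object in one call. A sketch assuming Pydantic v2 (`model_validate_json` and `model_dump` are v2 method names) and a client that shares this model definition, which this page does not state:

```python
from pydantic import BaseModel


class AnalyzeResponse(BaseModel):
    status: str
    repo_path: str
    files_scanned: int
    chunks_created: int
    modules: list[str]


payload = (
    '{"status": "complete", "repo_path": "/home/user/my-app", '
    '"files_scanned": 127, "chunks_created": 843, '
    '"modules": ["api", "models", "services", "utils"]}'
)

# Parse and validate the JSON body in one step; wrong types raise ValidationError.
resp = AnalyzeResponse.model_validate_json(payload)
print(resp.files_scanned + resp.chunks_created)  # 970
print(resp.model_dump()["modules"])  # ['api', 'models', 'services', 'utils']
```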
Input (constructed by the /analyze endpoint):

```python
AnalyzeResponse(
    status="complete",
    repo_path="/home/user/my-app",
    files_scanned=127,
    chunks_created=843,
    modules=["api", "models", "services", "utils"]
)
```

Serialized JSON:

```json
{
  "status": "complete",
  "repo_path": "/home/user/my-app",
  "files_scanned": 127,
  "chunks_created": 843,
  "modules": ["api", "models", "services", "utils"]
}
```

Response body for POST /chat. Contains the agent's answer.
| Field | Type | Purpose |
|---|---|---|
| answer | str | The agent's response text |
| thread_id | str | Echoed back for client-side correlation |
Input:

```python
ChatResponse(answer="Auth uses JWT tokens issued by the /login endpoint.", thread_id="session-42")
```

Serialized JSON:

```json
{
  "answer": "Auth uses JWT tokens issued by the /login endpoint.",
  "thread_id": "session-42"
}
```

Response body for GET /modules/{name}. Full details about one module.
| Field | Type | Default | Purpose |
|---|---|---|---|
| name | str | | Module name (or "users (inside 'features')" if matched as a sub-folder) |
| file_count | int | | Number of files in the module |
| files | list[str] | | Sorted list of file paths |
| languages | dict[str, int] | | Language → file count mapping |
| depends_on | list[str] | | Modules this one imports from |
| depended_by | list[str] | | Modules that import from this one |
| blast_radius | list[dict] | [] | Per-file risk data |
| module_risk | str | "low" | Highest risk level across all files |
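module_risk is described as the highest risk level across a module's files, defaulting to "low". A sketch of that reduction, assuming the severity order low < medium < high < critical (these level names appear in this page's examples, but the ordering and the `highest_risk` helper are assumptions). The `src/api/state.py` entry below is made up for illustration.

```python
# Assumed severity order, lowest to highest.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def highest_risk(blast_radius: list[dict]) -> str:
    """Return the most severe risk_level among the entries, or "low" if there are none."""
    levels = [entry["risk_level"] for entry in blast_radius]
    # default= mirrors the model's "low" default for modules without risk data.
    return max(levels, key=RISK_ORDER.__getitem__, default="low")


entries = [
    {"file": "src/api/main.py", "risk_level": "high", "affected_files": 12},
    {"file": "src/api/state.py", "risk_level": "medium", "affected_files": 4},
]
print(highest_risk(entries))  # high
print(highest_risk([]))  # low
```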
Input:

```python
ModuleResponse(
    name="api",
    file_count=3,
    files=["src/api/main.py", "src/api/models.py", "src/api/state.py"],
    languages={"python": 3},
    depends_on=["services", "models"],
    depended_by=[],
    blast_radius=[{"file": "src/api/main.py", "risk_level": "high", "affected_files": 12}],
    module_risk="high",
)
```

Serialized JSON:

```json
{
  "name": "api",
  "file_count": 3,
  "files": ["src/api/main.py", "src/api/models.py", "src/api/state.py"],
  "languages": {"python": 3},
  "depends_on": ["services", "models"],
  "depended_by": [],
  "blast_radius": [{"file": "src/api/main.py", "risk_level": "high", "affected_files": 12}],
  "module_risk": "high"
}
```

Response body for GET /overview. Full project summary with diagram and risk data.
| Field | Type | Default | Purpose |
|---|---|---|---|
| tech_stack | list[str] | | Detected technologies (e.g. ["Python", "FastAPI"]) |
| total_files | int | | Total source files in the repo |
| total_modules | int | | Number of detected modules |
| modules | list[str] | | Module names |
| diagram | str | | Mermaid diagram markup of the module graph |
| overview_text | str | | LLM-generated project summary |
| riskiest_files | list[dict] | [] | Top 30 riskiest files with blast radius data |
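The diagram field holds Mermaid markup: a `graph LR` header followed by edges such as `api --> analysis`. One way to render it, assuming module dependencies are available as a dict from each module to the modules it imports (the input shape and `module_diagram` are hypothetical):

```python
def module_diagram(depends_on: dict[str, list[str]]) -> str:
    """Render module dependencies as a left-to-right Mermaid flowchart."""
    lines = ["graph LR"]
    # Sorting keeps the markup stable between runs, which makes it easy to diff.
    for module in sorted(depends_on):
        for target in sorted(depends_on[module]):
            # "A --> B" reads as: module A imports from module B.
            lines.append(f"    {module} --> {target}")
    return "\n".join(lines)


print(module_diagram({"api": ["embeddings", "analysis"]}))
# graph LR
#     api --> analysis
#     api --> embeddings
```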
```python
OverviewResponse(
    tech_stack=["Python", "FastAPI", "ChromaDB"],
    total_files=45,
    total_modules=5,
    modules=["api", "embeddings", "analysis", "ingestion", "generation"],
    diagram="graph LR\n api --> analysis\n api --> embeddings",
    overview_text="This project is an AI-powered codebase onboarding tool...",
    riskiest_files=[{"file": "src/config.py", "risk_level": "critical", "affected_files": 30}],
)
```

Response body for GET /blast-radius/{module_name}. Shows which files break if you change files in a module.
| Field | Type | Purpose |
|---|---|---|
| module | str | Module name, or "all" for the whole repo |
| module_risk | str | Highest risk across all files |
| total_files | int | Number of files in scope |
| files | list[dict] | Per-file risk details |
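The files list is scoped to one module, or to the whole repo when module is "all". A sketch of that scoping, assuming a file belongs to a module when the module name is one of its parent folders and that entries are ordered by affected_files, highest first (both are assumptions, as is the `files_in_scope` name):

```python
from pathlib import PurePosixPath


def files_in_scope(entries: list[dict], module: str) -> list[dict]:
    """Keep the entries for `module` ("all" keeps everything), riskiest first."""
    if module == "all":
        scoped = list(entries)
    else:
        # Assumption: membership means the module name is one of the file's folders.
        scoped = [e for e in entries if module in PurePosixPath(e["file"]).parts[:-1]]
    # Most affected files first; ties broken by path for a stable order.
    return sorted(scoped, key=lambda e: (-e["affected_files"], e["file"]))


entries = [
    {"file": "src/analysis/module_detector.py", "risk_level": "medium", "affected_files": 7},
    {"file": "src/api/main.py", "risk_level": "high", "affected_files": 12},
    {"file": "src/analysis/dependency_graph.py", "risk_level": "high", "affected_files": 18},
]
print([e["file"] for e in files_in_scope(entries, "analysis")])
# ['src/analysis/dependency_graph.py', 'src/analysis/module_detector.py']
print(len(files_in_scope(entries, "all")))  # 3
```

If the overview ranks files the same way, slicing this ordering with `[:30]` would yield a riskiest_files-style list.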
```python
BlastRadiusResponse(
    module="analysis",
    module_risk="high",
    total_files=6,
    files=[
        {"file": "src/analysis/dependency_graph.py", "risk_level": "high", "affected_files": 18},
        {"file": "src/analysis/module_detector.py", "risk_level": "medium", "affected_files": 7},
    ],
)
```

Request body for POST /review. Controls which git diff to review.
| Field | Type | Default | Purpose |
|---|---|---|---|
| staged | bool | False | If True, review only staged changes (--staged) |
| target_branch | str \| None | None | Diff against a branch (e.g. "main" for a full PR review) |
Input JSON: `{"staged": true, "target_branch": "main"}`

- staged → True (review only changes staged with `git add`)
- target_branch → "main" (diff the current branch against main)
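How the endpoint turns staged and target_branch into a git command is not spelled out here. A hypothetical mapping (`build_diff_command` is illustrative) that simply appends both options, a combination git accepts: `git diff --staged main` compares the staged changes against main, and `git diff main` compares the working tree against main.

```python
from typing import Optional


def build_diff_command(staged: bool = False, target_branch: Optional[str] = None) -> list[str]:
    """Hypothetical mapping from ReviewRequest fields to a `git diff` invocation."""
    cmd = ["git", "diff"]
    if staged:
        cmd.append("--staged")  # compare the index instead of the working tree
    if target_branch:
        cmd.append(target_branch)  # compare against the tip of this branch
    return cmd


print(build_diff_command(staged=True, target_branch="main"))
# ['git', 'diff', '--staged', 'main']
```

The list form can be passed to `subprocess.run(cmd, capture_output=True, text=True)`, which runs git without a shell, so branch names need no quoting.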
Request body for POST /review/file. Points to a single file to review.
| Field | Type | Purpose |
|---|---|---|
| file_path | str | Absolute or relative path to the file |
Input JSON: `{"file_path": "src/codewalk/api/main.py"}`

- file_path → "src/codewalk/api/main.py"
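file_path may be absolute or relative, but what a relative path is relative to is not stated. This sketch assumes the analyzed repo root; `resolve_review_path` and the `/tmp/scratch.py` path are made up. It relies on a pathlib rule: joining onto an absolute path discards everything to its left.

```python
from pathlib import PurePosixPath


def resolve_review_path(file_path: str, repo_root: str) -> PurePosixPath:
    """Hypothetical helper: anchor relative paths at the repo root, keep absolute ones."""
    # If file_path is absolute, the `/` operator drops repo_root entirely.
    return PurePosixPath(repo_root) / file_path


print(resolve_review_path("src/codewalk/api/main.py", "/home/user/my-app"))
# /home/user/my-app/src/codewalk/api/main.py
print(resolve_review_path("/tmp/scratch.py", "/home/user/my-app"))
# /tmp/scratch.py
```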
Request body for POST /review/guidelines. Loads team coding standards.
| Field | Type | Default | Purpose |
|---|---|---|---|
| docs_path | str \| None | None | Path to a directory with .md/.txt guideline files |
Input JSON: `{"docs_path": "/home/user/my-app/docs/guidelines"}`

- docs_path → "/home/user/my-app/docs/guidelines"
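A sketch of collecting guideline files from docs_path, assuming only the `.md`/`.txt` extensions listed above count. Searching subfolders, matching suffixes case-insensitively, and the `find_guideline_files` name are all assumptions; what the endpoint does when docs_path is None is not covered here.

```python
import tempfile
from pathlib import Path

# Assumption: only these extensions count, compared case-insensitively.
GUIDELINE_SUFFIXES = {".md", ".txt"}


def find_guideline_files(docs_path: str) -> list[Path]:
    """Hypothetical helper: every .md/.txt file under docs_path, subfolders included."""
    root = Path(docs_path)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in GUIDELINE_SUFFIXES
    )


# Usage with a throwaway directory standing in for docs/guidelines.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("style.md", "naming.txt", "logo.png"):
        (Path(tmp) / name).write_text("placeholder")
    print([p.name for p in find_guideline_files(tmp)])  # ['naming.txt', 'style.md']
```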
Request body for POST /docs/index. Points to a folder of documents to index.
| Field | Type | Purpose |
|---|---|---|
| docs_path | str | Absolute path to a directory with .md/.pdf/.txt files |
Input JSON: {"docs_path": "/Users/me/team-docs"}
Request body for POST /docs/search. Semantic search across indexed documents.
| Field | Type | Default | Purpose |
|---|---|---|---|
| query | str | | Search query text |
| n_results | int | 5 | Number of results to return |
Input JSON: {"query": "deployment process", "n_results": 3}
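Semantic search ranks stored embeddings by their similarity to the query's embedding and returns the n_results closest. This sketch uses plain cosine similarity over toy 3-dimensional vectors as a stand-in for the vector database; the metric the project's ChromaDB collection actually uses depends on how it is configured and is not stated here. `cosine`, `search`, and the document names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], docs: dict[str, list[float]], n_results: int = 5) -> list[str]:
    """Return the ids of the n_results stored vectors most similar to query_vec."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query_vec, docs[doc_id]), reverse=True)
    return ranked[:n_results]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
docs = {
    "deploy.md": [0.9, 0.1, 0.0],
    "onboarding.md": [0.1, 0.9, 0.0],
    "release-checklist.md": [0.8, 0.2, 0.1],
}
print(search([1.0, 0.0, 0.0], docs, n_results=2))  # ['deploy.md', 'release-checklist.md']
```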
Request body for POST /docs/ask. Ask a question answered from indexed documents.
| Field | Type | Default | Purpose |
|---|---|---|---|
| question | str | | The question to answer |
| n_results | int | 5 | Number of doc chunks to include as context |
Input JSON: {"question": "How do we deploy to production?", "n_results": 5}
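Here n_results bounds how many retrieved chunks reach the LLM as context. A sketch of that assembly step, assuming the chunks arrive already ranked by relevance; the prompt wording, the numbering scheme, and the `build_prompt` name are assumptions, and the sample chunks are invented.

```python
def build_prompt(question: str, chunks: list[str], n_results: int = 5) -> str:
    """Hypothetical: number the top n_results chunks and append the question."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks[:n_results], start=1)
    )
    return (
        "Answer using only the documentation excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    "Deploys run through the CI pipeline.",
    "Production needs two approvals.",
    "Unrelated note about office hours.",
]
print(build_prompt("How do we deploy to production?", chunks, n_results=2))
```

Numbering the excerpts gives the model a simple way to cite which chunk an answer came from.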
Generic error response model, used across all endpoints.
| Field | Type | Default | Purpose |
|---|---|---|---|
| error | str | | Short error message |
| detail | str | "" | Optional extended explanation |
```python
ErrorResponse(error="Module not found", detail="Available: api, models, services")
```

Serialized JSON:

```json
{
  "error": "Module not found",
  "detail": "Available: api, models, services"
}
```