Home

Codewalk Wiki

Codewalk is an AI-powered codebase onboarding tool that analyzes repositories, builds dependency graphs, embeds code into vector stores, and answers questions about codebases using RAG (Retrieval-Augmented Generation).

Key Concepts

Term	Definition	Example
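topological sort	Ordering the vertices of a directed graph so that every edge points the same way through the order. Here an edge A→B means "A imports B", and the dependency B is placed before A. Only works on DAGs, because a cycle has no valid order.	If A imports B and B imports C, topological order is: C, B, A (dependencies first).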
blast radius	All files that would be affected if a given file changes — found by following reverse import edges transitively.	If A imports B and C imports A, changing B has blast radius = {A, C}.
transitive dependency	An indirect dependency through a chain. If A imports B and B imports C, then A transitively depends on C.	Changing C could break A even though A never directly imports C.
embedding	A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors.	The code `def add(a, b): return a+b` might become `[0.12, -0.45, 0.78, ...]` (1536 numbers for OpenAI's `text-embedding-ada-002` and `text-embedding-3-small` models).
vector store	A database optimized for storing embeddings and finding the most similar ones quickly.	ChromaDB stores code chunk embeddings and returns the 5 most similar chunks to your query.
cosine distance	Measures how different two vectors are, computed as 1 minus their cosine similarity. 0.0 = same direction (identical meaning), 1.0 = orthogonal (unrelated), 2.0 = opposite.	Query `"scan files"` has cosine distance 0.15 to `scan_directory()` (very similar) and 0.85 to `grade_answer()` (very different).
ChromaDB	An open-source vector database for storing and searching embeddings. Used here to store code chunks.	`collection.query(query_texts=["scan files"], n_results=5)` returns the 5 closest code chunks.
DuckDB	An embedded SQL database (like SQLite but optimized for analytics). Used here to store file metadata and import edges.	`conn.execute("SELECT path FROM files WHERE language='python'").fetchall()` returns all Python file paths.
chunk	A piece of source code (usually one function or class) stored as a unit for search.	The function `def scan_directory(root): ...` (20 lines) is one chunk.
AST	Abstract Syntax Tree — a tree representation of source code structure, where each node is a language construct (function, class, if-statement, etc.).	`def add(a, b): return a+b` becomes a tree: FunctionDef → [args: a, b] → [body: Return → BinOp(a + b)].
tree-sitter	A fast, multi-language parser that builds ASTs. Supports 100+ languages without needing each language's compiler.	tree-sitter parses `config.py` into an AST, then we extract function/class nodes from it.
RAG	Retrieval-Augmented Generation — instead of asking an LLM to answer from memory, first retrieve relevant documents, then include them in the prompt.	Question: `"What does scan_directory do?"` → retrieve the source code of scan_directory → include it in the LLM prompt → get an accurate answer.
LLM	Large Language Model — an AI model (like GPT-4, Claude) that generates text given a prompt.	`get_llm()` returns a ChatOpenAI instance that can answer questions about code.
Pydantic	A Python library for data validation using type hints. Defines schemas as classes with typed fields.	`class QueryRoute(BaseModel): route: str; target: str` — any instance is guaranteed to have string `route` and `target` fields.
glob pattern	A wildcard pattern for matching file paths. `*` matches anything in one directory, `**` matches across directories.	`**/*.py` matches all Python files in any subdirectory. `src/*.ts` matches TypeScript files only in `src/`.
diff	The set of changes between two versions of code, showing added (+) and removed (-) lines.	`- old_line\n+ new_line` shows `old_line` was replaced with `new_line`.
hunk	A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file).	A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed).
TTS	Text-to-Speech — converting text into spoken audio.	`speak("The config module loads settings")` generates an audio file and plays it.
STT	Speech-to-Text — converting spoken audio into text (transcription).	User speaks into microphone → STT produces `"What does scan_directory do?"`.
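
The topological sort and blast radius rows above can be tried out on the same tiny A/B/C graphs used in the table. This is a minimal sketch using only the standard library (`graphlib`, `collections`); the function names `reading_order` and `blast_radius` and the dict-of-sets graph shape are assumptions for illustration, not necessarily what Codewalk's own modules use.

```python
from collections import deque
from graphlib import TopologicalSorter


def reading_order(imports):
    """Return files ordered so that every file comes after the files it imports."""
    # TopologicalSorter takes node -> predecessors. Passing "file -> files it
    # imports" therefore emits dependencies first. Raises CycleError on cycles.
    return list(TopologicalSorter(imports).static_order())


def blast_radius(imports, changed):
    """Return every file that directly or transitively imports `changed`."""
    # Invert the edges: file -> files that import it.
    importers = {}
    for src, targets in imports.items():
        for dst in targets:
            importers.setdefault(dst, set()).add(src)

    # Breadth-first walk over the reverse import edges.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    affected.discard(changed)  # a file in an import cycle should not list itself
    return affected


# A imports B, B imports C
print(reading_order({"A": {"B"}, "B": {"C"}, "C": set()}))
# A imports B, C imports A; what is affected when B changes?
print(blast_radius({"A": {"B"}, "C": {"A"}, "B": set()}, "B"))
```

The first call yields dependencies first (C, B, A), and the second returns the set containing A and C, matching the examples in the table.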

Glossary

Full definitions of all technical terms (vertex, edge, DAG, embedding, RAG, etc.) with examples.

Table of Contents

Core Modules

config.py — Settings and LLM factory for all providers
errors.py — User-friendly error classification
log.py — Shared logging utility (stderr + file)
pipeline.py — Full indexing pipeline: scan → chunk → embed → store
query.py — Core query logic shared by MCP, API, and agent

Analysis Package

module_detector.py — Detects logical modules from file paths
dependency_graph.py — Builds file-level import dependency graphs
blast_radius.py — Calculates change impact (direct + transitive dependents)
reading_order.py — Suggests file reading order via topological sort
code_parser.py — Extracts functions/classes from source files (Tree-sitter)
python_parser.py — Python-specific AST parser for symbols
relevance_filter.py — LLM-based file filtering for indexing
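
To make the idea of extracting functions and classes from an AST concrete, here is a small sketch built on Python's standard `ast` module. It is an illustration only: the helper name `extract_symbols` and the tuple layout are assumptions, and the real `code_parser.py` and `python_parser.py` may work differently (for example through tree-sitter).

```python
import ast


def extract_symbols(source):
    """Return (kind, name, start_line, end_line) for each function and class."""
    symbols = []
    # ast.walk visits every node, so nested methods are found as well.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno, node.end_lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno, node.end_lineno))
    # Sort by starting line so the result follows the order of the file.
    return sorted(symbols, key=lambda symbol: symbol[2])


sample = '''\
def add(a, b):
    return a + b

class Scanner:
    def scan(self, root):
        return []
'''

for symbol in extract_symbols(sample):
    print(symbol)
```

Each tuple carries the line range of one symbol, which is the kind of information a chunker would need to cut a file into function-sized and class-sized pieces.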

Embeddings Package

chunker.py — Splits source files into parent/child chunks
embedder.py — Generates vector embeddings for code chunks
vector_store.py — ChromaDB wrapper for storing/querying embeddings
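
The similarity search behind a vector store can be sketched in a few lines of plain Python. The example below computes cosine distance by hand and ranks a toy set of two-dimensional vectors. The names `cosine_distance` and `nearest` and the sample vectors are hypothetical, and a real store such as ChromaDB uses indexed search rather than this linear scan.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity: 0.0 same direction, 1.0 orthogonal, 2.0 opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def nearest(query_vec, stored, k=5):
    """Return the k (name, distance) pairs closest to query_vec, closest first."""
    scored = [(name, cosine_distance(query_vec, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Toy 2-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
stored = {"scan_directory": [0.9, 0.1], "grade_answer": [0.1, 0.9]}
print(nearest([1.0, 0.0], stored, k=1))
```

With these toy vectors the query points almost the same way as `scan_directory`, so that chunk is ranked first.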

Ingestion Package

scanner.py — Walks directories and collects source files
file_filter.py — Pattern-based file inclusion/exclusion
tech_detect.py — Detects project tech stack from marker files
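
Pattern-based inclusion is easiest to see on a throwaway directory. The sketch below builds a few empty files in a temporary folder and applies two glob patterns with `pathlib.Path.glob`. The helper names `match_files` and `demo` and the file layout are made up for illustration. Note that under `pathlib` semantics `**/*.py` also matches files directly in the root, and other glob implementations may differ on that point.

```python
import tempfile
from pathlib import Path


def match_files(root, pattern):
    """Return files under root that match a glob pattern, as sorted relative paths."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )


def demo():
    """Create a small hypothetical project tree and run two patterns against it."""
    names = ["main.py", "src/app.ts", "src/util.py", "src/ui/view.ts", "src/ui/helpers.py"]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in names:
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
        return {
            "**/*.py": match_files(root, "**/*.py"),   # recursive match
            "src/*.ts": match_files(root, "src/*.ts"),  # one directory only
        }


print(demo())
```

The recursive pattern picks up Python files at every depth, while `src/*.ts` skips `src/ui/view.ts` because `*` does not cross a directory boundary.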

Generation Package

overview_generator.py — LLM-generated project overviews
module_explainer.py — LLM-generated module explanations
flow_generator.py — LLM-generated execution flow narratives
diagram_generator.py — Mermaid diagram generation

Graph Package

graph_store.py — DuckDB-backed import/call graph storage
graph_runtime.py — In-memory graph for traversal (NetworkX)
call_extractor.py — Extracts function call relationships from AST

RAG Package

chain.py — RAG answer generation chain
query_router.py — Routes queries to graph/vector/hybrid
query_rewriter.py — Rewrites queries for better retrieval
chunk_grader.py — Grades retrieved chunks for relevance
answer_grader.py — Grades generated answers for hallucination
graph_expansion.py — Expands retrieval using dependency graph
retrieval_quality.py — Distance-based filtering of search results
prompts.py — Prompt templates for RAG chain
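
The retrieve, filter, then prompt flow that these modules describe can be outlined without any LLM or vector store. The following is a sketch under stated assumptions: `filter_by_distance`, `build_prompt`, the 0.5 cutoff, and the prompt wording are all hypothetical and are not taken from `retrieval_quality.py` or `prompts.py`.

```python
def filter_by_distance(results, max_distance=0.5):
    """Keep (chunk_id, distance) pairs at or below max_distance, closest first.

    The default cutoff of 0.5 is an arbitrary illustrative value.
    """
    kept = [(chunk_id, dist) for chunk_id, dist in results if dist <= max_distance]
    return sorted(kept, key=lambda item: item[1])


def build_prompt(question, chunks):
    """Assemble a prompt that places the retrieved code ahead of the question."""
    context = "\n\n".join(f"# {name}\n{code}" for name, code in chunks)
    return (
        "Answer using only the code below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Pretend search results: (chunk id, cosine distance to the query).
results = [("grade_answer", 0.85), ("scan_directory", 0.15)]
sources = {"scan_directory": "def scan_directory(root): ...", "grade_answer": "def grade_answer(a): ..."}

relevant = filter_by_distance(results)
prompt = build_prompt(
    "What does scan_directory do?",
    [(chunk_id, sources[chunk_id]) for chunk_id, _ in relevant],
)
print(prompt)
```

Only the close match survives the filter, so the prompt sent to the model contains the source of `scan_directory` and nothing unrelated.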

Agent Package

graph.py — LangGraph agent workflow definition
tools.py — Tool definitions for the LangGraph agent
prompts.py — System prompts for the agent

API Package

main.py — FastAPI REST endpoints
models.py — Pydantic request/response models
state.py — Global application state management

MCP Package

server.py — Model Context Protocol server for VS Code integration

Review Package

reviewer.py — Code review engine (diff + file review)
diff_parser.py — Git diff parsing into structured hunks
models.py — Review result data models
review_prompts.py — Prompt templates for code review
guidelines_loader.py — Loads team coding standards
test_coverage.py — Test coverage gap detection
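
Parsing a diff into structured hunks comes down to recognising the `@@ -old,count +new,count @@` headers of the unified diff format. The sketch below handles a single-file diff only, and the `Hunk` dataclass and `parse_hunks` function are illustrative names rather than the actual contents of `diff_parser.py` or `models.py`.

```python
import re
from dataclasses import dataclass, field

# Unified diff hunk header, e.g. "@@ -10,3 +10,3 @@" (the counts are optional).
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


@dataclass
class Hunk:
    old_start: int
    new_start: int
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)


def parse_hunks(diff_text):
    """Split a single-file unified diff into hunks with added/removed lines."""
    hunks = []
    for line in diff_text.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            hunks.append(Hunk(int(header.group(1)), int(header.group(2))))
        elif not hunks:
            continue  # "---" / "+++" file headers come before the first hunk
        elif line.startswith("+"):
            hunks[-1].added.append(line[1:])
        elif line.startswith("-"):
            hunks[-1].removed.append(line[1:])
        # Lines starting with a space are unchanged context and are skipped.
    return hunks


sample = "\n".join([
    "--- a/config.py",
    "+++ b/config.py",
    "@@ -10,3 +10,3 @@",
    " import os",
    "-old_line",
    "+new_line",
    ' print("ok")',
    "@@ -80,2 +80,3 @@",
    " def load():",
    "+    validate()",
    "     return settings",
])

for hunk in parse_hunks(sample):
    print(hunk)
```

The sample produces two hunks, one replacing `old_line` with `new_line` near line 10 and one adding a call near line 80, which mirrors the two-hunk example in the Key Concepts table.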

Doc Knowledge Package

doc_parser.py — Parses .md, .pdf, .txt documents into chunks
doc_store.py — ChromaDB store for document chunks
prompts.py — Prompt template for doc Q&A

Voice Package

companion.py — Voice companion orchestrator
stt.py — Speech-to-text (microphone recording + transcription)
tts.py — Text-to-speech output
router.py — Routes voice transcriptions to codewalk tools
backends.py — TTS/STT backend implementations
