You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A RAG-powered developer tool that indexes GitHub repositories and answers natural language questions about codebases with cited file paths.
Stack
Layer
Technology
Embeddings
Voyage AI voyage-code-3 (1024-dim)
Vector store
pgvector on Postgres 16
Answer synthesis
Google Gemini 2.5 Flash
Chunking
Tree-sitter AST-aware
API
Python 3.12 + FastAPI
Frontend
React 18 + Vite
Orchestration
Docker Compose
Quick start
# 1. Configure environment
cp .env.example .env
# Edit .env — fill in ANTHROPIC_API_KEY and VOYAGE_API_KEY# 2. Start the database, API, and frontend
docker compose up db api frontend
# 3. Index a repository (runs as a one-shot CLI container)
docker compose --profile ingest run ingestion --repo https://github.com/org/repo
# Optional: specify a branch
docker compose --profile ingest run ingestion --repo https://github.com/org/repo --branch develop
# 4. Open the UI
open http://localhost:5173
How it works
Ingestion pipeline
Clone / pull — GitPython clones the repo to /tmp/repos/<name> (idempotent: git pull on subsequent runs).
Walk files — skips .git, node_modules, vendor, __pycache__, dist, build, .next, binaries, files > 500 KB.
AST chunking — Tree-sitter extracts functions, classes, and methods. Files with no AST hits fall back to 60-line sliding windows with 10-line overlap. Chunks < 5 lines are dropped; chunks > 150 lines are split into 100-line sub-chunks with 15-line overlap.
Embed — Voyage AI voyage-code-3 in batches of 128 with a 1 s inter-batch sleep.
Upsert — clean re-index per file: delete existing chunks then bulk insert with embeddings. The HNSW index (created by db/init.sql) handles approximate nearest-neighbour search.
Query API (POST /query)
Embed the question with voyage-code-3 (input_type="query").
Cosine similarity search via pgvector's <=> operator; return top-8 chunks with similarity > 0.3.
Build a context block of retrieved chunks with file paths and line numbers.
Call Claude claude-sonnet-4-6 with a system prompt that instructs it to cite every claim.
Return { answer, chunks, latency_ms }.
API reference
Method
Path
Description
GET
/repos
List all indexed repositories
POST
/query
Ask a question; body: { question, repo_id, top_k? }