A production-grade Retrieval-Augmented Generation platform built from scratch with zero external vector DB dependencies. RAGPilot implements custom HNSW indexing, hybrid BM25+vector search with reciprocal rank fusion, multi-strategy document chunking, and a full evaluation framework — all in pure Python and TypeScript.
| Metric | Value |
|---|---|
| Endpoints | 45+ |
| Components | 40+ |
| Chunking Strategies | 5 |
| Search Modes | 3 |
| Eval Metrics | 5 |
| Lines of Code | 8500+ |
- Multi-format ingestion supporting PDF, DOCX, HTML, and Markdown
- Automatic content extraction with metadata preservation
- Batch upload and processing pipeline
- Document versioning and collection management
- Fixed-size chunking with configurable character windows
- Sentence-aware chunking that preserves linguistic boundaries
- Paragraph-based chunking for structurally coherent segments
- Semantic chunking using embedding similarity thresholds
- Recursive chunking with hierarchical separator fallback
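
The recursive strategy above can be sketched as follows. This is a minimal illustration, not RAGPilot's actual `chunking_service.py`: the function name `recursive_chunk`, the `max_chars` parameter, and the separator order are assumptions made for the example.

```python
from typing import List, Tuple


def recursive_chunk(
    text: str,
    max_chars: int = 200,
    separators: Tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split text into chunks of at most max_chars, trying coarse separators first."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a fixed-size character window.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    buffer = ""
    for part in text.split(sep):
        candidate = f"{buffer}{sep}{part}" if buffer else part
        if len(candidate) <= max_chars:
            buffer = candidate  # keep merging small parts into one chunk
            continue
        if buffer:
            chunks.append(buffer)
            buffer = ""
        if len(part) <= max_chars:
            buffer = part
        else:
            # The part alone is too long: retry with the next, finer separator.
            chunks.extend(recursive_chunk(part, max_chars, finer))
    if buffer:
        chunks.append(buffer)
    return chunks


sample = "Alpha beta gamma.\n\nDelta epsilon zeta eta theta.\n\nIota."
print(recursive_chunk(sample, max_chars=20))
```

Paragraph breaks are tried first, so a chunk only falls back to word-level or character-level splitting when no coarser boundary fits the size limit.
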
- Custom HNSW (Hierarchical Navigable Small World) approximate nearest neighbor index
- Exact KNN brute-force search for baseline comparison
- Rich metadata filtering with operators: `$gt`, `$gte`, `$lt`, `$lte`, `$ne`, `$in`, `$nin`
- Cosine similarity scoring with normalized vectors
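
To make the exact KNN baseline and the filter operators concrete, here is a small sketch under stated assumptions. The names `matches` and `knn_search`, the `(id, vector, metadata)` item layout, and the rule that a missing field never matches are hypothetical choices for this example, not the behavior of RAGPilot's `vector_store.py`.

```python
import math
from typing import Any, Dict, List, Optional, Tuple

# One predicate per filter operator listed above.
OPS = {
    "$gt": lambda v, t: v > t,
    "$gte": lambda v, t: v >= t,
    "$lt": lambda v, t: v < t,
    "$lte": lambda v, t: v <= t,
    "$ne": lambda v, t: v != t,
    "$in": lambda v, t: v in t,
    "$nin": lambda v, t: v not in t,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in flt."""
    for field, cond in flt.items():
        if field not in metadata:
            return False  # assumption: a missing field never matches
        value = metadata[field]
        if isinstance(cond, dict):
            if not all(OPS[op](value, target) for op, target in cond.items()):
                return False
        elif value != cond:  # a bare value is treated as an equality test
            return False
    return True


def normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def knn_search(
    query: List[float],
    items: List[Tuple[str, List[float], Dict[str, Any]]],
    k: int = 3,
    flt: Optional[Dict[str, Any]] = None,
) -> List[Tuple[str, float]]:
    """Brute-force search: cosine similarity is the dot product of unit vectors."""
    q = normalize(query)
    scored = []
    for item_id, vec, meta in items:
        if flt and not matches(meta, flt):
            continue
        scored.append((item_id, sum(a * b for a, b in zip(q, normalize(vec)))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


items = [
    ("a", [1.0, 0.0], {"year": 2020, "lang": "en"}),
    ("b", [0.9, 0.1], {"year": 2023, "lang": "en"}),
    ("c", [0.0, 1.0], {"year": 2024, "lang": "de"}),
]
hits = knn_search([1.0, 0.0], items, flt={"year": {"$gte": 2021}, "lang": {"$in": ["en", "de"]}})
print([item_id for item_id, _ in hits])  # ['b', 'c']
```

Because every vector is scanned, this form is only practical for small collections, which is why it serves as a correctness baseline for the approximate HNSW index.
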
- BM25 sparse retrieval with TF-IDF weighting
- Dense vector search via custom HNSW or exact KNN
- Reciprocal Rank Fusion (RRF) for combining sparse and dense results
- Configurable alpha blending between search modalities
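
Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over every ranking it appears in. The sketch below assumes the common constant `k = 60` and models alpha blending as a per-ranking weight; the function name and the weighting scheme are illustrative assumptions, not necessarily how RAGPilot combines the two modes.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def reciprocal_rank_fusion(
    rankings: List[List[str]],
    k: int = 60,
    weights: Optional[List[float]] = None,
) -> List[Tuple[str, float]]:
    """Fuse several ranked ID lists into one list sorted by RRF score."""
    weights = weights or [1.0] * len(rankings)
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


bm25_ranking = ["d1", "d2", "d3"]   # sparse results, best first
dense_ranking = ["d3", "d1", "d4"]  # vector results, best first

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']

# Hypothetical alpha blending: alpha weights the dense list, 1 - alpha the sparse one.
alpha = 0.8
blended = reciprocal_rank_fusion([bm25_ranking, dense_ranking], weights=[1 - alpha, alpha])
print([doc_id for doc_id, _ in blended])
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale.
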
- Cross-encoder simulation for fine-grained relevance scoring
- Maximal Marginal Relevance (MMR) for result diversification
- Cohere-style re-ranking simulation
- Configurable top-k post-rerank selection
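
Maximal Marginal Relevance picks results greedily, trading relevance to the query against similarity to results already chosen. The following is a minimal sketch; the function names and the `lambda_` parameter are assumptions for illustration rather than the signature used in RAGPilot's `reranker.py`.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query_vec: List[float],
    candidates: Dict[str, List[float]],
    top_k: int = 3,
    lambda_: float = 0.5,
) -> List[str]:
    """Select up to top_k IDs; lambda_=1.0 is pure relevance, lower values favor diversity."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_k:

        def score(doc_id: str) -> float:
            relevance = cosine(query_vec, remaining[doc_id])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(remaining[doc_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


docs = {
    "a": [1.0, 0.1],
    "b": [1.0, 0.12],  # near-duplicate of "a"
    "c": [1.0, -1.0],  # less relevant, but different
}
print(mmr([1.0, 0.0], docs, top_k=2, lambda_=0.5))  # ['a', 'c']
print(mmr([1.0, 0.0], docs, top_k=2, lambda_=1.0))  # ['a', 'b']
```

With `lambda_=0.5` the near-duplicate `b` is skipped in favor of `c`, while `lambda_=1.0` reduces to a plain relevance ranking.
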
- Pluggable prompt templates with variable substitution
- Streaming response generation
- Source citation with chunk-level attribution
- Context window management with automatic truncation
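
As a rough sketch of template substitution, chunk-level citation, and truncation working together, the example below uses the standard library's `string.Template`. The template text, the `build_prompt` name, and the character budget are all assumptions; a real context manager would more likely budget in tokens.

```python
from string import Template
from typing import List

# Hypothetical template; $context and $question are substituted at call time.
PROMPT = Template(
    "Answer using only the numbered context and cite sources like [1].\n\n"
    "Context:\n$context\n\nQuestion: $question"
)


def build_prompt(question: str, chunks: List[str], max_context_chars: int = 500) -> str:
    """Number each chunk for citation and stop adding chunks once the budget is spent."""
    lines: List[str] = []
    used = 0
    for index, chunk in enumerate(chunks, start=1):
        entry = f"[{index}] {chunk}"
        if used + len(entry) > max_context_chars:
            break  # truncate: drop this chunk and all lower-ranked ones
        lines.append(entry)
        used += len(entry)
    return PROMPT.substitute(context="\n".join(lines), question=question)


prompt = build_prompt(
    "What does HNSW stand for?",
    ["HNSW means Hierarchical Navigable Small World.", "x" * 600],
    max_context_chars=100,
)
print(prompt)
```

Chunks are assumed to arrive in rank order, so truncation removes the least relevant context first.
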
- Sliding window memory for recent message retention
- Summary-based compression for long conversation histories
- Hybrid window + summary strategy
- Per-conversation isolation with session management
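
The sliding window strategy with per-conversation isolation can be sketched with a bounded `deque` per session. The class name `SlidingWindowMemory` and its methods are hypothetical and chosen only for this illustration.

```python
from collections import deque
from typing import Deque, Dict, List


class SlidingWindowMemory:
    """Keep only the most recent messages, stored separately per session."""

    def __init__(self, max_messages: int = 4) -> None:
        self.max_messages = max_messages
        self._sessions: Dict[str, Deque[Dict[str, str]]] = {}

    def add(self, session_id: str, role: str, content: str) -> None:
        # deque(maxlen=...) silently discards the oldest entry when full.
        window = self._sessions.setdefault(session_id, deque(maxlen=self.max_messages))
        window.append({"role": role, "content": content})

    def history(self, session_id: str) -> List[Dict[str, str]]:
        return list(self._sessions.get(session_id, ()))


memory = SlidingWindowMemory(max_messages=3)
for i in range(5):
    memory.add("session-1", "user", f"message {i}")

print([m["content"] for m in memory.history("session-1")])
# ['message 2', 'message 3', 'message 4']
print(memory.history("session-2"))  # []
```

A summary-based or hybrid strategy would replace the discarded messages with a compressed summary instead of dropping them outright.
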
- Faithfulness scoring for hallucination detection
- Relevance measurement between queries and retrieved context
- Correctness comparison against ground-truth answers
- Precision and Recall metrics for retrieval quality
- Batch evaluation runs with aggregate reporting
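
Retrieval precision and recall compare the retrieved chunk IDs against a labeled set of relevant IDs. The helper below is a minimal sketch; the function name and the optional `k` cutoff are assumptions, not the API of RAGPilot's eval service.

```python
from typing import List, Optional, Set, Tuple


def precision_recall(
    retrieved: List[str],
    relevant: Set[str],
    k: Optional[int] = None,
) -> Tuple[float, float]:
    """Return (precision, recall) over the top-k retrieved IDs (all of them if k is None)."""
    top = set(retrieved if k is None else retrieved[:k])
    hits = len(top & relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved_ids = ["c1", "c2", "c3", "c4"]
relevant_ids = {"c2", "c4", "c9"}

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Two of the four retrieved chunks are relevant (precision 0.5), and two of the three relevant chunks were found (recall 2/3).
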
System Architecture Diagram
graph TB
User[User] --> Frontend[Frontend - Vue.js SPA]
Frontend --> API[API Gateway - FastAPI]
API --> DocService[Document Service]
API --> SearchService[Search Service]
API --> ChatService[Chat Service]
API --> EvalService[Eval Service]
DocService --> Chunking[Chunking Engine]
Chunking --> Embedding[Embedding Service]
Embedding --> VectorStore[(Vector Store)]
SearchService --> VectorStore
SearchService --> BM25[BM25 Index]
SearchService --> Reranker[Re-ranker]
BM25 --> Reranker
ChatService --> Memory[Conversation Memory]
ChatService --> LLM[LLM Service]
ChatService --> SearchService
style User fill:#667eea,color:#fff
style Frontend fill:#764ba2,color:#fff
style API fill:#667eea,color:#fff
style DocService fill:#7c3aed,color:#fff
style SearchService fill:#7c3aed,color:#fff
style ChatService fill:#7c3aed,color:#fff
style EvalService fill:#7c3aed,color:#fff
style Chunking fill:#6d28d9,color:#fff
style Embedding fill:#6d28d9,color:#fff
style VectorStore fill:#5b21b6,color:#fff
style BM25 fill:#5b21b6,color:#fff
style Reranker fill:#6d28d9,color:#fff
style Memory fill:#6d28d9,color:#fff
style LLM fill:#6d28d9,color:#fff
# Clone and setup
cp .env.example .env
# Backend
cd backend && pip install -r requirements.txt && uvicorn app.main:app --reload
# Frontend
cd frontend && npm install && npm run dev

# Docker
docker-compose up --build

The frontend will be available at http://localhost:3000 and the API at http://localhost:8000.
ragpilot/
├── backend/
│   ├── app/
│   │   ├── main.py
│   │   ├── config.py
│   │   ├── models.py
│   │   ├── routers/
│   │   │   ├── chat.py
│   │   │   ├── collections.py
│   │   │   ├── documents.py
│   │   │   ├── eval.py
│   │   │   └── search.py
│   │   └── services/
│   │       ├── chunking_service.py
│   │       ├── document_service.py
│   │       ├── embedding_service.py
│   │       ├── llm_service.py
│   │       ├── reranker.py
│   │       └── vector_store.py
│   ├── tests/
│   │   ├── test_chunking.py
│   │   ├── test_vector_store.py
│   │   └── test_eval.py
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   ├── views/
│   │   ├── stores/
│   │   ├── services/
│   │   └── router/
│   ├── Dockerfile
│   ├── nginx.conf
│   └── package.json
├── docker-compose.yml
├── .env.example
├── .editorconfig
├── .gitignore
├── LICENSE
└── README.md
Built with ❤️ by panaceya