Local-first RAG experimentation platform for testing and comparing retrieval techniques. Experiment with different indexing strategies, retrieval methods, query expansion techniques, and orchestration approaches to understand how they impact retrieval quality and performance.
This application provides a comprehensive playground for experimenting with Retrieval-Augmented Generation (RAG) techniques. Users can test individual RAG pipelines to see how data is retrieved, or run comparison mode to evaluate up to two different RAG pipelines side-by-side and analyze differences in results, retrieval quality, and performance metrics.
Test individual RAG pipelines with full control over technique selection:
- Indexing Strategies: Standard chunking, parent document, semantic chunking, headers-based, proposition-based
- Retrieval Methods: Basic vector search, hybrid (vector + BM25 fusion)
- Query Expansion: HyDE (Hypothetical Document Embeddings)
- Filtering & Post-processing: Reranking, contextual compression
- Orchestration: Self-RAG, CRAG (Corrective RAG), Adaptive Retrieval
- View retrieved chunks, response quality, and performance metrics
Side-by-side comparison of up to two RAG pipelines:
- Configure two different technique combinations
- Execute the same query against both pipelines simultaneously
- Compare results with semantic similarity scoring
- Analyze differences in:
  - Retrieved chunks and relevance
  - Response quality and accuracy
  - Latency and performance
  - Retrieval effectiveness
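The semantic similarity scoring used in comparison mode can be sketched as cosine similarity between the embeddings of the two responses. This is only an illustration of the idea; the platform's actual scoring implementation is not shown here:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In practice the two pipeline answers would each be embedded (e.g. with `nomic-embed-text`) and their vectors compared with this function.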
The platform supports techniques across five categories:
Indexing (Document Chunking)
- `standard_chunking` - Fixed-size text chunks with overlap
- `parent_document` - Small chunks with parent document context
- `semantic_chunking` - Meaning-based chunk boundaries
- `headers_chunking` - Section-aware chunking
- `proposition_chunking` - Sentence-level proposition extraction
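As a rough illustration of the simplest strategy, standard chunking slides a fixed-size window over the text with some overlap between consecutive chunks. This is a simplified character-based sketch, not the implementation in `app/rag/techniques/indexing/`:

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # window advances by chunk_size minus the overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap preserves context that would otherwise be cut at chunk boundaries, at the cost of some duplicated text in the index.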
Retrieval Methods
- `basic_rag` - Vector similarity search
- `fusion_retrieval` - Hybrid vector + BM25 fusion
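Fusion retrieval merges a vector-search ranking with a BM25 keyword ranking. One common way to combine them is reciprocal rank fusion (RRF); whether `fusion_retrieval` uses RRF or another fusion scheme is not specified here, so treat this as an illustrative sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; docs ranked highly in any list float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # 1 / (k + rank) dampens the influence of any single ranking
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document near the top of both the vector and BM25 lists outranks one that scores well in only a single list.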
Query Expansion
- `hyde` - Hypothetical Document Embeddings for query enhancement
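Conceptually, HyDE asks the LLM to write a hypothetical answer first, then embeds that passage instead of the raw query, since an answer-shaped passage tends to sit closer to real answer chunks in embedding space. A sketch with placeholder `generate_passage`, `embed`, and `vector_search` callables standing in for the Ollama LLM, the embedding model, and the ChromaDB lookup:

```python
def hyde_search(query, generate_passage, embed, vector_search, top_k=5):
    """HyDE sketch: retrieve using the embedding of a hypothetical answer."""
    # 1. Ask the LLM to draft a passage that would answer the query.
    hypothetical = generate_passage(f"Write a short passage answering: {query}")
    # 2. Embed the hypothetical passage instead of the raw query.
    vector = embed(hypothetical)
    # 3. Search the vector store with that embedding.
    return vector_search(vector, top_k)
```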
Filtering & Post-processing
- `reranking` - Re-rank retrieved documents by relevance
- `compression` - Contextual compression to reduce noise
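Reranking re-scores the initial candidates with a stronger relevance signal and keeps only the best few. A toy sketch, using word overlap where a real pipeline would use a cross-encoder or an LLM grader:

```python
def rerank(query: str, docs: list[str], score, top_k: int = 3) -> list[str]:
    """Order retrieved docs by a relevance score and keep the top few."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]

def word_overlap(query: str, doc: str) -> int:
    """Stand-in scorer: shared word count (a real reranker uses a model)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))
```

The first-stage retriever optimizes recall; the reranker then spends more compute per candidate to improve precision.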
Orchestration (Advanced Controllers)
- `self_rag` - Self-reflective retrieval with quality evaluation
- `crag` - Corrective RAG with web search fallback
- `adaptive_retrieval` - Adaptive retrieval strategy selection
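The orchestration controllers wrap retrieval in a feedback loop. For instance, Corrective RAG grades what came back and falls back to web search when nothing passes the bar. An illustrative sketch with placeholder callables, not the platform's implementation:

```python
def corrective_rag(query, retrieve, grade, web_search, threshold=0.5):
    """CRAG sketch: keep docs graded as relevant; otherwise take corrective action."""
    docs = retrieve(query)
    relevant = [doc for doc in docs if grade(query, doc) >= threshold]
    if not relevant:
        # No retrieved doc was judged relevant -> fall back to web search
        relevant = web_search(query)
    return relevant
```

Self-RAG and adaptive retrieval follow the same pattern: an evaluation step between retrieval and generation decides whether to answer, retry, or change strategy.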
- FastAPI - Modern async web framework
- LangChain v1.0+ - RAG pipeline orchestration
- Ollama - Local LLM service (llama3.2, nomic-embed-text)
- ChromaDB - Embedded vector database
- SQLModel - Database ORM with SQLite
- rank-bm25 - BM25 keyword search
- PyMuPDF - PDF document processing
- Pydantic Settings - Configuration management
- Uvicorn - ASGI server
- React - UI framework
- TypeScript - Type-safe JavaScript
- Vite - Build tool and dev server
- Tailwind CSS - Utility-first CSS
- Axios - HTTP client
- Python 3.11+ (currently using Python 3.13.3)
- Node.js 18+
- uv package manager
- Ollama - Local LLM service
- Install Ollama and pull required models:

```bash
# Install Ollama from https://ollama.ai
# Pull required models
ollama pull nomic-embed-text
ollama pull llama3.2:3b
```

- Configure Ollama for optimal performance:

```bash
# Set these BEFORE starting Ollama (REQUIRED for parallel execution)
export OLLAMA_NUM_PARALLEL=4       # Allow parallel requests
export OLLAMA_MAX_LOADED_MODELS=2  # Keep models in memory

# Start Ollama
ollama serve
```

Performance Impact:
- ✅ With settings: `basic_rag + hyde + reranking` = ~8-10s
- ❌ Without settings: `basic_rag + hyde + reranking` = ~20-23s
- Install dependencies:

```bash
cd rag-lab
uv sync
```

- Start the FastAPI server:

```bash
uv run uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000
- Navigate to the frontend directory:

```bash
cd frontend
```

- Install dependencies:

```bash
npm install
```

- Start the development server:

```bash
npm run dev
```

The application will be available at http://localhost:5173
Create a `.env` file in the root directory (optional, defaults work for local development):

```
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.2:3b
DATABASE_URL=sqlite:///./playground.db
```

```
rag-lab/
├── app/                          # FastAPI application
│   ├── main.py                   # Application entry point
│   ├── core/                     # Core functionality
│   │   ├── config.py             # Configuration management
│   │   ├── health.py             # Ollama health checks
│   │   ├── concurrency.py        # LlmLockManager
│   │   └── dependencies.py       # Dependency injection
│   ├── db/                       # Database layer
│   │   ├── database.py           # SQLModel engine
│   │   ├── models.py             # Database models
│   │   └── repositories.py       # Data access layer
│   ├── services/                 # Business logic services
│   │   ├── vectorstore.py        # ChromaDB & LocalFileStore
│   │   ├── bm25_manager.py       # BM25 index management
│   │   ├── ingestion.py          # PDF processing
│   │   └── rag_query_service.py  # RAG query orchestration
│   ├── rag/                      # RAG pipeline
│   │   ├── techniques/           # RAG techniques
│   │   │   ├── indexing/         # Chunking strategies
│   │   │   ├── retrieval/        # Retrieval methods
│   │   │   ├── query_expansion/  # Query enhancement
│   │   │   ├── filtering/        # Reranking & compression
│   │   │   └── orchestration/    # Advanced controllers
│   │   └── pipelines/            # Pipeline orchestration
│   ├── api/                      # API routes
│   │   └── v1/
│   │       ├── documents.py      # Document management
│   │       ├── rag.py            # RAG query endpoints
│   │       └── results.py        # Result retrieval
│   ├── models/                   # Pydantic schemas
│   └── utils/                    # Utilities
├── frontend/                     # React application
│   ├── src/
│   │   ├── components/           # React components
│   │   ├── hooks/                # Custom hooks
│   │   ├── services/             # API client
│   │   └── contexts/             # React context
│   └── public/                   # Static assets
├── chromadb/                     # Vector store data (gitignored)
├── storage/parents/              # Parent document store (gitignored)
├── uploads/                      # Uploaded PDFs (gitignored)
├── playground.db                 # SQLite database (gitignored)
└── pyproject.toml                # Project dependencies
```
- `GET /` - Root endpoint
- `GET /health` - Health check (verifies Ollama connectivity)

- `POST /api/v1/documents/upload` - Upload PDF with indexing configuration
- `GET /api/v1/documents` - List documents (optionally filtered by session)
- `GET /api/v1/documents/{doc_id}` - Get document details
- `DELETE /api/v1/documents/{doc_id}` - Delete document

- `POST /api/v1/rag/query` - Execute RAG query with technique selection
- `POST /api/v1/rag/compare` - Compare two RAG pipelines
- `POST /api/v1/rag/validate` - Validate technique combination

- `GET /api/v1/results/{session_id}` - Get all results for a session
- `GET /api/v1/results/compare?result_ids=id1,id2` - Compare results
- `GET /api/v1/results/detail/{result_id}` - Get result details
Full API Documentation: Visit http://localhost:8000/docs while the server is running
```bash
# Run with uv
uv run pytest

# With coverage
uv run pytest --cov=app
```

```bash
# Install black and ruff
uv add --dev black ruff

# Format code
uv run black app/
uv run ruff check app/ --fix
```

Error: "Cannot connect to Ollama"

Solution: Ensure Ollama is running (`ollama serve`) and the required models are pulled.
Symptom: `basic_rag + hyde` takes more than 10 seconds (should be 3-4s)

Solution:

- Check whether the environment variables are set:

```bash
echo $OLLAMA_NUM_PARALLEL
echo $OLLAMA_MAX_LOADED_MODELS
```

- If they are not set, export them BEFORE starting Ollama:

```bash
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_LOADED_MODELS=2
```

- Restart Ollama after setting the variables

Expected Performance:
- ✅ With proper config: `basic_rag + hyde` = 3-4s
- ❌ Without config: `basic_rag + hyde` = 20-25s
Error: "Unable to open database file"

Solution: Ensure the application has write permissions in the project directory.

Error: "No module named 'langchain_ollama'"

Solution: Run `uv sync` to install all dependencies.
See LICENSE file for details.
