**MeVe** (Multi-phase Efficient Vector Retrieval) is a 5-phase RAG pipeline that optimizes context selection through progressive filtering and intelligent token budgeting.
Unlike simple vector search, MeVe combines vector similarity, cross-encoder verification, BM25 fallback, MMR deduplication, and token budgeting to deliver high-quality, budget-aware context for LLMs.
Traditional RAG systems often:
- Return irrelevant chunks despite high similarity scores
- Waste tokens on redundant information
- Fail silently when vector search underperforms
- Ignore token budget constraints
MeVe solves these problems with a smart 5-phase pipeline:
- **Quality First** - Cross-encoder verification ensures relevance
- **Adaptive Fallback** - BM25 backup when vector search fails
- **Zero Redundancy** - MMR-based deduplication
- **Budget Aware** - Greedy token packing within limits
- **Production Ready** - Tested on the HotpotQA dataset
```bash
# Clone the repository
git clone https://github.com/nakulbh/Meve-framework.git
cd meve

# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
```

```python
from meve import MeVeEngine, MeVeConfig, ContextChunk
from meve.services.vector_db_client import VectorDBClient

# 1. Prepare your data
chunks = [
    ContextChunk("The Eiffel Tower is in Paris, France.", "doc1"),
    ContextChunk("Paris is the capital of France.", "doc2"),
    ContextChunk("The Louvre Museum is in Paris.", "doc3"),
]

# 2. Initialize ChromaDB (default vector database)
vector_db = VectorDBClient(
    chunks=chunks,
    collection_name="my_collection",
    is_persistent=False  # In-memory for quick testing
)

# 3. Configure the pipeline
config = MeVeConfig(
    k_init=10,              # Initial candidates from vector search
    tau_relevance=0.5,      # Relevance threshold (0-1)
    n_min=3,                # Min chunks to avoid fallback
    theta_redundancy=0.85,  # Similarity threshold for deduplication
    t_max=512               # Maximum token budget
)

# 4. Initialize engine with ChromaDB
engine = MeVeEngine(config=config, vector_db_client=vector_db)

# 5. Retrieve context
context = engine.run("Where is the Eiffel Tower?")
print(context)
```

MeVe uses ChromaDB as its default vector database. There are three ways to use it:
```python
# Option 1: In-memory (fastest, temporary)
vector_db = VectorDBClient(chunks=chunks, is_persistent=False)

# Option 2: Persistent storage (production)
vector_db = VectorDBClient(chunks=chunks, is_persistent=True)

# Option 3: Load existing collection
vector_db = VectorDBClient(
    collection_name="my_collection",
    is_persistent=True,
    load_existing=True  # No re-embedding needed!
)
```

Quick start: `python examples/quickstart_chromadb.py`
Full guide: see `docs/chromadb_guide.md`
```bash
# Download HotpotQA dataset
make download-data

# Run with real data (loads 50 examples by default)
make run

# Or run the basic example
make example
```

```
Query → kNN Search → Verification → [Fallback?] → Deduplication → Budgeting → Context
          Phase 1      Phase 2        Phase 3        Phase 4       Phase 5
```
- **Phase 1 (kNN)** - Vector similarity search via ChromaDB. Returns the top `k_init` candidates using `all-MiniLM-L6-v2` embeddings.
- **Phase 2 (Verification)** - Cross-encoder re-ranking. Uses `cross-encoder/ms-marco-MiniLM-L-6-v2` to filter by the `tau_relevance` threshold.
- **Phase 3 (Fallback)** - Conditional BM25 retrieval. Triggers only when `|verified| < n_min`, supplementing results with lexical search.
- **Phase 4 (Prioritization)** - MMR-based deduplication. Removes redundant chunks using the `theta_redundancy` similarity threshold.
- **Phase 5 (Budgeting)** - Greedy token packing. Fits top chunks within the `t_max` budget using the GPT-2 tokenizer.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `k_init` | int | 10 | Initial candidates from vector search |
| `tau_relevance` | float | 0.5 | Cross-encoder threshold (0-1) |
| `n_min` | int | 3 | Min verified chunks to skip fallback |
| `theta_redundancy` | float | 0.85 | Similarity threshold for deduplication |
| `t_max` | int | 512 | Maximum token budget |
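To make `theta_redundancy` concrete, here is a toy redundancy filter in the spirit of Phase 4. This is a sketch, not MeVe's code: word-overlap Jaccard stands in for the embedding similarity MeVe uses, and chunks are plain strings rather than `ContextChunk` objects:

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two chunk texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def deduplicate(chunks: list[str], theta: float = 0.85) -> list[str]:
    """Keep a chunk only if it is below the similarity threshold
    against every chunk already kept."""
    kept: list[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, k) < theta for k in kept):
            kept.append(chunk)
    return kept

docs = [
    "The Eiffel Tower is in Paris, France.",
    "The Eiffel Tower is in Paris, France.",  # duplicate, similarity 1.0
    "Paris is the capital of France.",
]
print(deduplicate(docs))  # the exact duplicate is dropped
```

Raising `theta` keeps more near-duplicates; lowering it prunes more aggressively.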
```python
# Development - Fast iteration
config = MeVeConfig(k_init=5, tau_relevance=0.3, n_min=2, t_max=256)

# Production - Quality focus
config = MeVeConfig(k_init=20, tau_relevance=0.6, n_min=5, t_max=1024)

# Tight budget - Minimal tokens
config = MeVeConfig(k_init=10, tau_relevance=0.7, n_min=2, t_max=128)
```

```bash
# Install development dependencies
make install-dev

# Run tests
make test

# Format code
make format

# Lint code
make lint

# Clean cache
make clean
```

```
meve/
├── core/
│   ├── engine.py                  # MeVeEngine orchestrator
│   └── models.py                  # ContextChunk, MeVeConfig, Query
├── phases/
│   ├── phase1_knn.py              # Vector search
│   ├── phase2_verification.py     # Cross-encoder
│   ├── phase3_fallback.py         # BM25 fallback
│   ├── phase4_prioritization.py   # MMR deduplication
│   └── phase5_budgeting.py        # Token packing
├── services/
│   └── vector_db_client.py        # ChromaDB wrapper
└── utils/
```
```python
# meve/phases/phase6_custom.py
from typing import List

from meve import ContextChunk, MeVeConfig


def execute_phase_6(query: str, chunks: List[ContextChunk], config: MeVeConfig) -> List[ContextChunk]:
    """Your custom phase logic."""
    processed_chunks = chunks  # Process chunks here
    return processed_chunks

# Update MeVeEngine.run() to call your phase
# Add parameters to MeVeConfig if needed
```

- Question Answering - Retrieve precise context for factual queries
- Chatbots - Budget-aware context for conversational AI
- Document Search - Hybrid vector + lexical retrieval
- Knowledge Bases - Deduplicated, relevant snippets
```bash
# Run all tests
pytest

# Run specific test file
pytest __tests__/unit/test_engine.py

# Run with coverage
pytest --cov=meve
```

Test fixtures are available in `__tests__/fixtures/sample_data.py`.
See `CONTRIBUTING.md` for development guidelines.
Commit convention: `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `chore:`
MIT License - see LICENSE for details.
Built with ❤️ for the RAG community