A research framework for optimizing Retrieval-Augmented Generation (RAG) pipelines using Generative Evolutionary Prompt Adjustment (GEPA).
Retrieval-Augmented Generation (RAG) systems rely on complex interactions between multiple components (Query Planner, Retriever, Reranker, Generator). Optimizing these components individually often leads to sub-optimal end-to-end performance. We introduce a Staged Evolutionary Optimization approach that iteratively refines prompts for each module, ensuring downstream components adapt to the improved signal distributions of upstream modules. Our framework provides robust evaluation, statistical significance testing, and reproducibility for high-stakes domains like financial document analysis.
- Iterative Staged Optimization: Optimizes components in topological order (Query Planner → Reranker → Generator) to maximize holistic performance.
- Modular RAG Architecture:
- Query Planner: Decomposes complex queries.
- Reranker: Cross-encoder-based filtering and deduplication.
- Generator: Context-aware response generation.
- Robust Evaluation Engine:
- Strict Train/Validation/Test splits to prevent data leakage.
- Comprehensive metrics: Precision/Recall, BLEU/ROUGE, and custom RAGAS-based scores.
- Statistical significance testing (Confidence Intervals, p-values).
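The significance testing above can be sketched with stdlib-only code. This is an illustrative example, not the framework's actual evaluation module: a percentile bootstrap for a confidence interval over per-query scores, plus a paired sign-flip permutation test for a p-value between a baseline and an optimized pipeline.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-query scores."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(scores) for _ in scores]
        means.append(statistics.mean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.mean(scores), (lo, hi)

def paired_permutation_p(baseline, optimized, n_perms=10_000, seed=0):
    """Two-sided paired permutation (sign-flip) test on score differences."""
    rng = random.Random(seed)
    diffs = [o - b for b, o in zip(baseline, optimized)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_perms):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            hits += 1
    return hits / n_perms

# Illustrative per-query scores (e.g. RAGAS faithfulness) for two runs.
baseline  = [0.62, 0.55, 0.71, 0.48, 0.66]
optimized = [0.70, 0.61, 0.74, 0.52, 0.69]
mean, (lo, hi) = bootstrap_ci(optimized)
p = paired_permutation_p(baseline, optimized)
```

With only a handful of queries the permutation test has limited resolution, which is why the experiments below use `--n_queries 100`.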
- Python 3.9+
- OpenAI API Key (for GPT-4/GPT-3.5)
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment: create a `.env` file in the root directory:

  ```
  OPENAI_API_KEY=your_sk_...
  ```
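For reference, the `.env` file can be read with a few lines of stdlib-only code (a sketch; the `python-dotenv` package is the usual choice and the project may already use it):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Tiny .env loader: KEY=VALUE lines, '#' comments; existing env wins."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Because `setdefault` is used, a key already exported in the shell takes precedence over the file.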
To reproduce the full research experiment with train/val/test splits and staged optimization:
```bash
python run_research_experiment.py \
  --experiment_name "output_001" \
  --n_queries 100 \
  --model "gpt-4-turbo"
```

To run a standalone optimization pass:

```bash
python run_optimization.py \
  --data_path data/train/ \
  --output_dir gepa_runs/rag-optimization/
```
```
├── modules/                     # Core RAG Component Implementations
│   ├── evaluation/              # Metrics and RAGAS integration
│   ├── generator/               # LLM Response Generation
│   ├── query_planner/           # Query decomposition and strategy
│   ├── reranker/                # Context filtering and ranking
│   ├── base.py                  # Base abstractions
│   └── pipeline.py              # End-to-end pipeline orchestrator
├── gepa_adapters/               # GEPA Optimization Interfaces
│   ├── generator_adapter.py
│   ├── query_planner_adapter.py
│   └── reranker_adapter.py
├── run_research_experiment.py   # Main entry point for research exp
├── run_optimization.py          # Optimization runner
└── requirements.txt             # Dependencies
```
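The base abstractions in `modules/base.py` might look something like the following. This is a hypothetical sketch; the repository's actual class names and signatures are not shown here:

```python
from abc import ABC, abstractmethod

class RAGModule(ABC):
    """Common interface so the pipeline can orchestrate modules uniformly."""

    def __init__(self, prompt: str):
        self.prompt = prompt  # the unit GEPA evolves for this module

    @abstractmethod
    def run(self, inputs: dict) -> dict:
        """Transform inputs (query, documents, context) into outputs."""

class QueryPlanner(RAGModule):
    def run(self, inputs: dict) -> dict:
        # Placeholder decomposition: split on " and " as a stand-in
        # for the real LLM-driven query planning.
        return {"sub_queries": inputs["query"].split(" and ")}

planner = QueryPlanner(prompt="Decompose the query into sub-queries.")
out = planner.run({"query": "revenue growth and debt ratio"})
```

A shared interface like this is what lets the GEPA adapters treat each module as an interchangeable optimization target.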
Our approach optimizes the RAG pipeline in three distinct stages:
- Query Planner Optimization: Evolves decomposition strategies to maximize retrieval recall.
- Reranker Optimization: Tunes filtering logic using the optimized queries from Stage 1, focusing on precision and context window utilization.
- Generator Optimization: Refines response synthesis prompts conditioned on the high-quality context from Stage 2.
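The three stages above can be sketched as a single upstream-first loop. The `Pipeline` stub and `optimize_module` placeholder below are illustrative stand-ins, not the repository's actual classes:

```python
STAGES = ["query_planner", "reranker", "generator"]  # topological order

class Pipeline:
    """Toy pipeline holding one prompt per module."""
    def __init__(self):
        self.prompts = {stage: "baseline" for stage in STAGES}

def optimize_module(pipeline, stage, train_data):
    # Placeholder for GEPA's evolutionary prompt search on train_data.
    return f"evolved:{stage}"

def staged_optimize(pipeline, train_data):
    """Optimize modules upstream-first; each stage is frozen before the
    next is tuned, so downstream prompts adapt to the improved signal
    distributions of the already-optimized upstream modules."""
    for stage in STAGES:
        pipeline.prompts[stage] = optimize_module(pipeline, stage, train_data)
    return pipeline

pipe = staged_optimize(Pipeline(), train_data=[])
```

The key design choice is freezing each stage before the next begins, which prevents downstream modules from being tuned against an upstream output distribution that is about to change.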
This project is licensed under the MIT License - see the LICENSE file for details.