This project implements a Retrieval-Augmented Generation (RAG) system using:
- 🌐 LangChain
- 📦 ChromaDB for vector storage
- 🔠 BM25 for keyword-based retrieval
- 🤖 Gemini 2.5 Flash (via langchain-google-genai) as the LLM
- 📄 Support for loading local and web documents (PDF, DOCX, TXT, URLs)
It includes a hybrid retrieval mechanism combining semantic search and BM25 keyword search, with an iterative critique-and-revision loop for fact-checked, accurate answers.
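As an illustration of how hybrid results can be merged, here is a minimal, self-contained sketch of Reciprocal Rank Fusion, a common way to combine BM25 and vector-search rankings (the document IDs and ranked lists below are hypothetical; the repository's actual fusion logic lives in `rag_engine.py` and may differ):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document IDs into one ranking.

    Each document's fused score is sum(1 / (k + rank)) over every list
    it appears in; higher is better. Documents ranked well by either
    retriever float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top hits from each retriever:
bm25_hits = ["doc_a", "doc_c", "doc_b"]
vector_hits = ["doc_b", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Here `doc_a` wins because it ranks highly in both lists, even though neither retriever placed it first.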
```
hybrid-rag-gemini/
│
├── rag_engine.py        # Core logic: document loading, hybrid retriever, RAG pipeline
├── app.py               # (Optional) Streamlit frontend for user interaction
├── .env                 # Stores your Google API key
├── data/                # Directory for local documents (PDF, DOCX, TXT)
│   ├── attention_is_all_you_need.pdf
│   └── realm_retrieval_augmented_pretraining.pdf
├── chroma_db/           # Persistent vector store directory (created automatically on first run)
├── requirements.txt     # Python dependencies
└── README.md            # This file
```
- Clone this repository:

  ```
  git clone https://github.com/TargetTactician/Hybrid_RAG_Gemini.git
  cd Hybrid_RAG_Gemini
  ```

- Set up your environment:

  ```
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Set your API key: create a `.env` file in the root directory:

  ```
  GOOGLE_API_KEY=your_google_api_key_here
  ```
Launch the Streamlit app with:

```
streamlit run app.py
```

- Hybrid Retrieval: Combines semantic vector search (via Chroma) and keyword-based BM25.
- Query Rewriting: Reformulates user input for better recall.
- Critique Loop: Uses the LLM to critique and revise its own answers for factuality.
- Context Compression: Limits token budget for Gemini models.
- Source Attribution: Displays original source(s) for traceability.
- Persistence: Stores ChromaDB to disk — no need to re-embed each time.
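To show what context compression can look like, here is a simplified greedy sketch that keeps the highest-ranked chunks until a token budget is spent. The whitespace-based `count_tokens` default is a stand-in for a real tokenizer, and the function is illustrative only, not the implementation in `rag_engine.py`:

```python
def compress_context(chunks, token_budget, count_tokens=lambda s: len(s.split())):
    """Keep retrieved chunks (best-ranked first) while they fit the budget.

    Chunks are never truncated mid-text; the first chunk that would
    overflow the budget stops the loop.
    """
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > token_budget:
            break
        kept.append(chunk)
        used += cost
    return kept

# Hypothetical retrieved chunks, already sorted by relevance:
chunks = ["alpha beta", "gamma delta epsilon", "zeta"]
kept = compress_context(chunks, token_budget=5)
```

A production version would count tokens with the model's own tokenizer and might summarize dropped chunks rather than discard them.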
Q: What are the benefits of Retrieval Augmented Generation according to latch.ai?
Q: What are the main components of an AI agent described by Lilian Weng?
Q: What are the key ideas from the paper "Attention Is All You Need"?
- If ChromaDB already exists, it loads from disk. Otherwise, it creates and persists embeddings.
- LLM token limit is handled by compressing context chunks dynamically.
- Gemini 2.5 Flash is used for both generation and critique.
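The critique-and-revision loop described above can be sketched as the following simplified control flow. The prompts and the `llm` callable are hypothetical placeholders (the real pipeline calls Gemini 2.5 Flash through langchain-google-genai):

```python
def critique_and_revise(llm, question, context, answer, max_rounds=2):
    """Ask the LLM to critique its own answer against the retrieved
    context, then revise, until it reports no issues or rounds run out."""
    for _ in range(max_rounds):
        critique = llm(
            f"Context:\n{context}\n\nQuestion: {question}\n"
            f"Answer: {answer}\n"
            "List factual errors, or reply OK if none."
        )
        if critique.strip() == "OK":
            break  # the model found nothing to fix
        answer = llm(
            f"Context:\n{context}\n\nQuestion: {question}\n"
            f"Draft answer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer fixing the issues."
        )
    return answer
```

Bounding the loop with `max_rounds` keeps latency and API cost predictable even when the model keeps finding nits.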