
🔍 Hybrid RAG with Gemini 2.5 + BM25 (LangChain + ChromaDB)

This project implements a Retrieval-Augmented Generation (RAG) system using:

  • 🌐 LangChain
  • 📦 ChromaDB for vector storage
  • 🔠 BM25 for keyword-based retrieval
  • 🤖 Gemini 2.5 Flash (via langchain-google-genai) as the LLM
  • 📄 Support for loading local and web documents (PDF, DOCX, TXT, URLs)

It includes a hybrid retrieval mechanism combining semantic search and BM25 keyword search, with an iterative critique-and-revision loop for fact-checked, accurate answers.
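Conceptually, hybrid retrieval fuses two ranked result lists, one from the vector store and one from BM25. The sketch below illustrates the idea with weighted Reciprocal Rank Fusion in plain Python; it is an illustration only (the project itself combines a Chroma retriever and `BM25Retriever` through LangChain, and the weights and constant `k` here are assumed, not taken from the repo):

```python
def hybrid_rank(semantic_hits, keyword_hits, k=60, w_sem=0.5, w_kw=0.5):
    """Fuse two best-first lists of document IDs with weighted
    Reciprocal Rank Fusion: score(d) = sum_i w_i / (k + rank_i(d) + 1)."""
    scores = {}
    for rank, doc in enumerate(semantic_hits):
        scores[doc] = scores.get(doc, 0.0) + w_sem / (k + rank + 1)
    for rank, doc in enumerate(keyword_hits):
        scores[doc] = scores.get(doc, 0.0) + w_kw / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A document that ranks well in both lists (e.g. a chunk that is semantically close *and* shares rare keywords with the query) rises above documents that appear in only one list.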


📁 Project Structure


hybrid-rag-gemini/
│
├── rag_engine.py         # Core logic: document loading, hybrid retriever, RAG pipeline
├── app.py                # (Optional) Streamlit frontend for user interaction
├── .env                  # Stores your Google API key
├── data/                 # Directory for local documents (PDF, DOCX, TXT)
│   ├── attention_is_all_you_need.pdf
│   └── realm_retrieval_augmented_pretraining.pdf
├── chroma_db/            # Persistent vector store (created automatically on first run)
├── requirements.txt      # Python dependencies
└── README.md             # This file


⚙️ Installation

  1. Clone this repository:

    git clone https://github.com/TargetTactician/Hybrid_RAG_Gemini.git
    cd Hybrid_RAG_Gemini
    
  2. Set up your environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set your API key: create a .env file in the project root:

    GOOGLE_API_KEY=your_google_api_key_here
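Typically this file is loaded with `python-dotenv` (`load_dotenv()`) before LangChain's Google integration reads `GOOGLE_API_KEY` from the environment. For clarity, the equivalent behaviour in plain Python looks roughly like this (`load_env` is a hypothetical helper, not a function from the repo):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: reads KEY=value lines into os.environ,
    skipping blank lines and '#' comments. Existing variables win."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```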

🚀 Usage

🖼️ Run Streamlit App

Launch the Streamlit interface defined in app.py:

streamlit run app.py

🔍 Features

  • Hybrid Retrieval: Combines semantic vector search (via Chroma) and keyword-based BM25.
  • Query Rewriting: Reformulates user input for better recall.
  • Critique Loop: Uses the LLM to critique and revise its own answers for factuality.
  • Context Compression: Limits token budget for Gemini models.
  • Source Attribution: Displays original source(s) for traceability.
  • Persistence: Stores ChromaDB to disk — no need to re-embed each time.
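The critique loop from the feature list can be pictured as a small control flow: draft an answer, ask the model to fact-check it against the retrieved context, and revise until the critique passes or a round limit is hit. The sketch below uses injected callables in place of the two Gemini 2.5 Flash calls; the function and parameter names are illustrative, not taken from `rag_engine.py`:

```python
def answer_with_critique(question, context, generate, critique, max_rounds=2):
    """Generate a draft, then iteratively critique and revise it.

    `generate(question, context, feedback)` produces an answer (optionally
    incorporating critique feedback); `critique(answer, context)` returns
    None when it finds no factual problems, else a feedback string.
    """
    answer = generate(question, context, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(answer, context)
        if feedback is None:           # answer passed the fact check
            break
        answer = generate(question, context, feedback=feedback)
    return answer
```

Bounding the loop with `max_rounds` keeps latency and token spend predictable even when the critic never fully approves.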

📄 Example Queries

Q: What are the benefits of Retrieval Augmented Generation according to latch.ai?

Q: What are the main components of an AI agent described by Lilian Weng?

Q: What are the key ideas from the paper "Attention Is All You Need"?

📌 Notes

  • If ChromaDB already exists, it loads from disk. Otherwise, it creates and persists embeddings.
  • LLM token limit is handled by compressing context chunks dynamically.
  • Gemini 2.5 Flash is used for both generation and critique.
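Dynamic context compression, as in the second note, amounts to packing retrieved chunks into a fixed token budget. A minimal sketch of one such policy, greedy whole-chunk packing in retrieval order, is below; it approximates token counts by whitespace words, whereas the real pipeline would count tokens for the Gemini model, and the budget value is an assumption:

```python
def compress_context(chunks, max_tokens=1000):
    """Greedily keep whole chunks (best-first) until the budget is spent.

    Token counts are approximated by whitespace-separated words; a real
    pipeline would use the model's tokenizer.
    """
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue   # skip chunks that would overflow the budget
        kept.append(chunk)
        used += cost
    return kept
```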
