A Retrieval-Augmented Generation (RAG) chatbot built with Python, FAISS, and Groq LLMs. This system allows you to query, search, and chat with your own local documents using semantic search and context-aware response generation.
This project demonstrates a production-style RAG pipeline, focusing on clean architecture, modular design, and scalability.
Traditional LLMs do not have access to your private or local documents. RAG ChatBot bridges this gap by combining:
- Vector-based retrieval (FAISS)
- Semantic embeddings (all-MiniLM-L6-v2)
- LLM-based response generation (Groq/Llama 3)
You ask a natural language question, and the system:
- Searches your documents intelligently (Semantic Search).
- Retrieves the most relevant content chunks.
- Generates accurate, grounded answers based strictly on your data.
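The difference between keyword and semantic matching is easy to demonstrate. The snippet below is illustrative only (not code from this repo): it scores two passages against a question using the same all-MiniLM-L6-v2 model the project uses, and the semantically related passage wins even though it shares no keywords with the query:

```python
# Illustrative semantic-similarity demo; not part of the repo's source.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I reset my password?"
passages = [
    "Account credentials can be changed from the security settings page.",
    "Our office is closed on public holidays.",
]

# Encode the query and passages, then rank by cosine similarity.
q_vec = model.encode(query, convert_to_tensor=True)
p_vecs = model.encode(passages, convert_to_tensor=True)
scores = util.cos_sim(q_vec, p_vecs)[0]
print(max(zip(scores.tolist(), passages)))  # the credentials passage scores highest
```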
✨ Features
- 📄 Multi-Format Support: Chat with PDFs, text files, or markdown notes.
- 🔍 Semantic Intelligence: Uses vector similarity instead of simple keyword matching.
- ⚡ Lightning Fast: High-performance similarity search using FAISS.
- 🧠 Context-Aware: Answers powered by Groq's high-speed inference engine.
- 🔐 Secure: Environment-based API key management.
- 🧩 Modular: Clean separation of concerns for easy extension.
📂 Project Structure
```
RAGChatBot/
├── app.py               # Main entry point for the chatbot
├── main.py              # Optional alternative entry/test script
│
├── src/                 # Core application logic
│   ├── __init__.py
│   ├── data_loader.py   # Document loading and preprocessing
│   ├── embedding.py     # Chunking & embedding pipeline
│   ├── vectorstore.py   # FAISS vector store management
│   └── search.py        # RAG search & retrieval logic
│
├── data/                # Place your source documents here
├── faiss_store/         # Persisted FAISS index files
├── notebook/            # Jupyter notebooks for experiments
│
├── .env                 # Environment variables (local only)
├── .gitignore           # Prevents sensitive data from being pushed
├── pyproject.toml       # Project metadata
├── uv.lock              # Locked dependencies (uv)
└── README.md
```
🛠️ Installation
1️⃣ Clone the Repository
```bash
git clone https://github.com/YourUsername/RAGChatBot.git
cd RAGChatBot
```
2️⃣ Set Up Virtual Environment (using uv)
Ensure uv is installed.
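If uv is not already available, one option is to install it with pip (the standalone installer from the uv docs works too):

```bash
pip install uv
```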
```bash
uv venv

# Activate the environment:
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # macOS/Linux

uv sync
```
3️⃣ Configure Environment Variables
Create a .env file in the project root:
```
GROQ_API_KEY=your_actual_groq_api_key
```
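At runtime the key is read from the environment. A minimal sketch of how that typically looks, assuming the project uses python-dotenv (the exact loading code lives in the repo's source):

```python
# Minimal sketch of loading the Groq key from .env; assumes python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into the process environment
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
```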
Usage
Run the ChatBot:
```bash
python app.py
```
🔄 The Pipeline in Action
1. Load: Documents are read and cleaned from the data/ folder.
2. Embed: Text is split into chunks and converted into vector embeddings.
3. Store: Vectors are indexed in faiss_store/ for instant retrieval.
4. Query: Your question is compared against the index to find relevant text.
5. Generate: The LLM receives the question plus the retrieved text to give a factual answer.
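The same flow, condensed into one illustrative script. The repo factors this across the modules in src/, so the names and structure below are not the project's actual API; it assumes sentence-transformers, faiss-cpu, and the groq SDK are installed, and that GROQ_API_KEY is set:

```python
# Condensed, illustrative version of the pipeline; the real app factors this
# across src/data_loader.py, src/embedding.py, src/vectorstore.py, src/search.py.
import os

import faiss
from groq import Groq
from sentence_transformers import SentenceTransformer

# Load + Embed: in the real app, docs come from data/ and are chunked first.
docs = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Groq provides high-speed inference for open LLMs such as Llama 3.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(docs, normalize_embeddings=True)

# Store: inner product over unit vectors equals cosine similarity.
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

# Query: embed the question and fetch the most similar chunk(s).
question = "What is FAISS used for?"
q_vec = model.encode([question], normalize_embeddings=True)
_, ids = index.search(q_vec, 1)
context = "\n".join(docs[i] for i in ids[0])

# Generate: ground the LLM's answer in the retrieved context.
client = Groq(api_key=os.environ["GROQ_API_KEY"])
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(resp.choices[0].message.content)
```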
🧩 Adding New Documents
1. Add your new files (.txt, .pdf, etc.) to the data/ directory.
2. In app.py, temporarily uncomment:
```python
# store.build_from_documents(docs)
```
3. Run the app once to rebuild the index, then re-comment the line to save time on future runs.
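An alternative to editing app.py each time is to guard the rebuild behind an environment flag. A minimal sketch of that idea, meant to slot into app.py where store and docs are already defined (REBUILD_INDEX is a hypothetical variable name, not something the repo defines):

```python
import os

# Hypothetical toggle: rebuild the FAISS index only when REBUILD_INDEX=1 is set,
# so app.py no longer needs to be edited for each new batch of documents.
if os.getenv("REBUILD_INDEX") == "1":
    store.build_from_documents(docs)
```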
⚙️ Configuration
- Top-K Retrieval: Change top_k in search.py or app.py to control how much context the LLM receives.
- Chunk Size: Adjust the chunking logic in src/embedding.py to handle very long or very short documents more effectively (see the sketch after this list).
- Model Selection: Switch between Groq models (e.g., llama-3.3-70b-versatile or llama-3.1-8b-instant) depending on whether you prioritize answer quality or speed.
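For reference, a minimal fixed-size chunker with overlap looks like the following; the repo's actual logic in src/embedding.py may differ, and the names here are illustrative:

```python
# Illustrative fixed-size chunker with overlap; not the repo's actual
# implementation in src/embedding.py.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunk_size-character pieces, overlapping by overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.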
🧪 Tech Stack
- Python 3.12+
- FAISS (Facebook AI Similarity Search)
- LangChain / Groq API
- HuggingFace Embeddings
- uv (package management)