A document Q&A chatbot that lets you ask questions about your documents using two different approaches: Document Injection and Retrieval-Augmented Generation (RAG). Compare both methods side-by-side through an intuitive Gradio web interface with real-time token usage tracking.
- Dual Chat Modes – Switch between Document Injection and Full RAG to compare approaches
- Document Upload – Upload `.txt`, `.md`, and `.pdf` files, or use default documents
- Real-time Token Tracking – Monitor cumulative input/output/total token usage per session
- Vector Similarity Search – RAG mode uses HuggingFace embeddings with an in-memory vector store
- Conversation History – Multi-turn conversations with full context retention
- Shareable Link – Gradio generates a public URL for remote access
- Standalone CLI Tools – Run RAG or injection modes from the command line
flowchart TB
subgraph UI["Gradio Web UI"]
Upload["📁 Upload Documents"]
ModeSelect["🔀 Mode Selector"]
Chat["💬 Chat Interface"]
Tokens["📊 Token Stats"]
end
subgraph Injection["Document Injection Mode"]
AllDocs["All Documents"] --> SysPrompt["System Prompt"]
end
subgraph RAG["Full RAG Mode"]
Embed["HuggingFace Embeddings<br/><i>all-MiniLM-L6-v2</i>"]
VecStore["In-Memory Vector Store"]
SimSearch["Similarity Search<br/><i>k=2</i>"]
Embed --> VecStore --> SimSearch --> RAGPrompt["System Prompt"]
end
subgraph LLM["Language Model"]
OpenAI["OpenAI GPT-4o-mini"]
end
Upload --> |".txt / .md / .pdf"| Loader["Document Loader"]
Loader --> Injection
Loader --> RAG
ModeSelect --> |"Document Injection"| Injection
ModeSelect --> |"Full RAG"| RAG
SysPrompt --> OpenAI
RAGPrompt --> OpenAI
OpenAI --> Chat
OpenAI --> Tokens
| | Document Injection | Full RAG |
|---|---|---|
| Approach | Injects all document content into the system prompt | Embeds documents into vectors, retrieves only the 2 most relevant chunks |
| Pros | Simple, complete context available | Token-efficient, scales to larger document sets |
| Cons | Token-heavy, limited by context window | May miss relevant info if not in top-k results |
| Best for | Small document sets | Larger document collections |
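The two prompt-construction strategies in the table can be sketched in plain Python. This is illustrative only: the real app builds prompts with LangChain messages and dense embeddings, and the `build_injection_prompt`/`build_rag_prompt` helpers (with word-overlap scoring standing in for vector similarity) are hypothetical names, not functions from the codebase.

```python
def build_injection_prompt(documents: dict[str, str]) -> str:
    """Document Injection: every document goes into the system prompt."""
    corpus = "\n\n".join(f"## {path}\n{text}" for path, text in documents.items())
    return f"Answer using only these documents:\n\n{corpus}"


def build_rag_prompt(documents: dict[str, str], question: str, k: int = 2) -> str:
    """Full RAG (toy version): keep only the k documents that share the most
    words with the question. The real app scores with vector embeddings."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    corpus = "\n\n".join(f"## {path}\n{text}" for path, text in scored[:k])
    return f"Answer using only these documents:\n\n{corpus}"


docs = {
    "a.txt": "Rabbit Company sells carrots and garden tools.",
    "b.txt": "Serena Company builds solar panels.",
    "c.txt": "An unrelated note about office parking.",
}
prompt = build_rag_prompt(docs, "Who sells carrots", k=1)  # keeps only a.txt
```

The trade-off from the table falls out directly: the injection prompt grows linearly with the corpus, while the RAG prompt stays bounded by `k` regardless of how many documents are loaded.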
| Library | Version | Purpose |
|---|---|---|
| LangChain | ≥ 0.3.0 | LLM orchestration framework |
| langchain-openai | ≥ 0.2.0 | OpenAI GPT integration |
| langchain-huggingface | ≥ 0.1.0 | HuggingFace embedding models |
| langchain-community | ≥ 0.3.0 | Community document loaders |
| langchain-google-genai | ≥ 2.0.0 | Google Gemini integration (optional) |
| Gradio | ≥ 5.0.0 | Web UI framework |
| pypdf | ≥ 4.0.0 | PDF text extraction |
| sentence-transformers | ≥ 3.0.0 | Embedding model runtime |
| python-dotenv | ≥ 1.0.0 | Environment variable management |
Rag_System_LangChain/
├── app.py                          # Main application – Gradio web UI with both modes
├── full_rag_system.py              # Standalone CLI – RAG mode with directory loaders
├── context_injection.py            # Standalone CLI – Document injection mode
├── documents_injection_system.py   # Utility – Recursive document loader (.txt/.md/.pdf)
├── requirements.txt                # Python dependencies
├── common-sections-template.md     # Sample markdown template
├── .env                            # API keys (not committed)
└── docs/                           # Default document directory
    ├── Rabbit Company Sample.txt   # Sample company data
    └── Serena Company Sample.txt   # Sample company data
The primary entrypoint with the full Gradio web interface. Implements both chat modes in a single application.
| Function | Description |
|---|---|
| `load_file_content(file_path)` | Reads `.txt`, `.md`, or `.pdf` files and returns text content |
| `load_default_documents()` | Loads all documents from the `docs/` folder |
| `load_uploaded_documents(files)` | Loads documents from user-uploaded files |
| `get_documents(uploaded_files)` | Returns uploaded docs (priority) or defaults |
| `create_vector_store(documents)` | Creates an InMemoryVectorStore with HuggingFace embeddings |
| `injection_chat(message, history, documents)` | Handles Document Injection mode – all docs in the system prompt |
| `rag_chat(message, history, documents)` | Handles RAG mode – similarity search → relevant chunks only |
| `chat(message, history, mode, uploaded_files)` | Routes to the appropriate chat mode |
Command-line RAG chatbot using LangChain's PyPDFDirectoryLoader and DirectoryLoader. Loads all documents from docs/, builds a vector store, and enters an interactive Q&A loop.
Standalone utility that recursively reads all .txt, .md, and .pdf files from a folder using pypdf and pathlib. Returns a list of {filepath: content} dictionaries.
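A loader along those lines can be sketched as below. This is a hedged stand-in rather than the project's actual code: `load_documents` is a hypothetical name, and the pypdf branch is guarded so the sketch still runs when pypdf is not installed.

```python
from pathlib import Path


def load_documents(folder: str) -> list[dict[str, str]]:
    """Recursively read .txt/.md/.pdf files into {filepath: content} dicts."""
    results = []
    for path in sorted(Path(folder).rglob("*")):
        if path.suffix == ".pdf":
            try:
                from pypdf import PdfReader  # optional dependency
            except ImportError:
                continue  # skip PDFs if pypdf is unavailable
            text = "\n".join(
                page.extract_text() or "" for page in PdfReader(path).pages
            )
        elif path.suffix in {".txt", ".md"}:
            text = path.read_text(encoding="utf-8")
        else:
            continue  # ignore directories and unsupported extensions
        results.append({str(path): text})
    return results
```

Returning one small dict per file (rather than one big dict) keeps the filepath alongside each document, which is handy when labelling chunks in the system prompt.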
- Python 3.10+
- An OpenAI API key
# Clone the repository
git clone https://github.com/YOUR_USERNAME/Rag_System_LangChain.git
cd Rag_System_LangChain
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env
# Edit .env and add your OpenAI API key

Create a `.env` file in the project root:
OPENAI_API_KEY=sk-your-api-key-here
# Optional: for Google Gemini support
# GOOGLE_API_KEY=your-google-api-key

python app.py

This launches the Gradio interface and prints a local URL (and a public share link). Open it in your browser to:
- Select a mode – Choose between "Document Injection" or "Full RAG"
- Load documents – Upload files or click "Default" to use the `docs/` folder
- Ask questions – Type questions about your documents in the chat
- Monitor tokens – Track cumulative token usage in the stats bar below the chat
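The cumulative token stats amount to summing the usage each model response reports. Here is a minimal stdlib sketch; the `TokenTally` class and its wiring are hypothetical, though the `input_tokens`/`output_tokens`/`total_tokens` keys mirror the `usage_metadata` dict LangChain attaches to `AIMessage` responses.

```python
class TokenTally:
    """Accumulates per-session token usage across chat turns."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def add(self, usage: dict) -> None:
        # `usage` mirrors LangChain's response.usage_metadata, e.g.
        # {"input_tokens": 120, "output_tokens": 45, "total_tokens": 165}
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def stats(self) -> str:
        """Render the stats bar shown below the chat."""
        return (f"Input: {self.input_tokens} | "
                f"Output: {self.output_tokens} | Total: {self.total_tokens}")


tally = TokenTally()
tally.add({"input_tokens": 120, "output_tokens": 45, "total_tokens": 165})
tally.add({"input_tokens": 200, "output_tokens": 60, "total_tokens": 260})
```

Because the tally is cumulative, it makes the cost difference between the two modes visible within a single session: injection turns grow the input count much faster than RAG turns.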
python full_rag_system.py

Interactive command-line chatbot using RAG with vector similarity search. Loads all documents from docs/ automatically.
Place your .txt, .md, or .pdf files in the docs/ folder, or upload them directly through the web UI.
In app.py, modify the model initialization:
# Default
llm = ChatOpenAI(model='gpt-4o-mini')
# Use a different model
llm = ChatOpenAI(model='gpt-4o')        # More capable, higher cost
llm = ChatOpenAI(model='gpt-3.5-turbo') # Lower cost

In the rag_chat() function, adjust the number of retrieved chunks:
# Retrieve more context (default: k=2)
retrieved_docs = vector_store.similarity_search(message, k=4)

Uncomment the Gemini configuration in full_rag_system.py and set GOOGLE_API_KEY in your .env:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

Contributions are welcome! Here's how to get started:
- Fork the repository
- Create a branch for your feature or fix:
git checkout -b feature/my-feature
- Make your changes and ensure they work
- Commit with a clear message:
git commit -m "Add: description of your change"
- Push and open a Pull Request:
git push origin feature/my-feature
- Add persistent vector store support (ChromaDB, FAISS)
- Implement document chunking for large files
- Add support for more file formats (`.docx`, `.csv`)
- Build a conversation export feature
- Add streaming responses
This project is licensed under the MIT License.