A comprehensive Retrieval-Augmented Generation (RAG) system with a modern Streamlit frontend for document processing, vector search, and AI-powered question answering.
- Document Processing: Support for PDF, DOCX, TXT, and MD files
- Vector Database: ChromaDB for efficient similarity search and retrieval
- LLM Integration: OpenAI GPT models for intelligent answer generation
- Semantic Search: Advanced similarity matching with configurable chunk sizes
- RAG Fusion: LLM-powered query expansion for enhanced retrieval accuracy
- Modern Streamlit Interface: Clean, responsive web interface with custom styling
- Real-time Processing: Live document upload and processing with progress indicators
- Interactive Chat: Natural conversation interface with question-answer flow
- Source Attribution: Detailed display of which documents and chunks were used for answers
- Similarity Scores: Visual similarity percentages showing how well each chunk matches your question
- Expandable Source Chunks: Click to view the exact document chunks used for each answer
- Query Variations Display: See how RAG Fusion expands your questions for better retrieval
- System Status Dashboard: Real-time monitoring of vector store and LLM health
- Document Analytics: Track number of document chunks, conversations, and processing metrics
- Chunk Configuration Display: View current chunk size and overlap settings
- Health Indicators: Visual status indicators for vector store and LLM availability
- Conversation History: Persistent tracking and display of all Q&A interactions
- Data Export: Export complete conversation history to text files
- Data Clearing: One-click clearing of all documents and conversation data
- Vector Store Reinitialization: Manual reinitialization option for troubleshooting
- Tooltip Explanations: Hover-over tooltips explaining system components and metrics
- File Upload Management: Support for multiple file uploads with size tracking
- Temporary File Handling: Secure processing of uploaded documents
- Error Handling: Comprehensive error messages and recovery suggestions
- Advanced Retrieval: Query expansion, similarity filtering, and content-based reranking for better chunk relevance
- RAG Fusion: Intelligent query variation generation using LLM for improved retrieval coverage
1. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Environment Variables: Create a `.env` file in the root directory:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```

3. Run the Application:

   ```bash
   streamlit run app.py
   ```
- Upload Documents: Use the sidebar to upload PDF, DOCX, TXT, or MD files
- Process Documents: Click "🔄 Process Documents" to create embeddings and store in vector database
- Ask Questions: Use the chat interface to ask questions about your documents
- View Sources: See which documents and specific chunks were used to generate answers
- Monitor System Health: Check the sidebar for real-time system status and analytics
- View Similarity Scores: See percentage scores showing how well each document chunk matches your question
- Explore Source Chunks: Click expandable sections to view the exact text chunks used for answers
- Export Conversations: Download complete conversation history as text files
- Manage Data: Clear all data or reinitialize the vector store when needed
- Document Upload: Drag and drop or select multiple files
- System Status: Real-time health indicators for vector store and LLM
- Analytics Dashboard:
  - Document chunks count
  - Total conversations
  - Chunk size and overlap settings
  - Vector store and LLM status
- Data Management: Clear data, reinitialize vector store, export conversations
- Question Input: Natural language questions about your documents
- Answer Display: AI-generated responses with source attribution
- Source Information: Shows which documents were referenced
- Similarity Scores: Percentage scores indicating chunk relevance
- Expandable Chunks: Detailed view of source document chunks
- Query Variations: Display of how RAG Fusion expanded your question for better retrieval
- Conversation History: Complete history of all interactions
```
RAG/
├── app.py                     # Main Streamlit application with UI
├── config.py                  # Configuration settings
├── rag_system/
│   ├── __init__.py
│   ├── document_processor.py  # Document loading and chunking
│   ├── vector_store.py        # ChromaDB vector database operations
│   ├── llm_interface.py       # OpenAI LLM integration
│   ├── rag_fusion.py          # RAG Fusion query expansion system
│   └── rag_engine.py          # Main RAG orchestration engine
├── embeddings_*/              # Vector embeddings storage directories
├── requirements.txt           # Python dependencies
└── README.md                  # This file
```
The RAG system follows a classic Retrieval-Augmented Generation architecture with four main components working together. Here's the step-by-step process:
Component: `DocumentProcessor` class
- Purpose: Converts uploaded documents into searchable chunks
- Process:
  - File Loading: Supports PDF, DOCX, TXT, and MD files
  - Text Extraction: Extracts raw text from each file
  - Chunking: Splits text into smaller chunks (default 500 characters with 100-character overlap)
  - Metadata Addition: Adds source file information to each chunk (see the sketch below)
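The concrete implementation lives in `rag_system/document_processor.py`. As a minimal sketch of the chunking step, assuming LangChain's `RecursiveCharacterTextSplitter` (named under Technical Details below) and a hypothetical input file:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on natural boundaries using the documented defaults
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # characters per chunk
    chunk_overlap=100,  # characters shared between adjacent chunks
)

with open("example.txt", encoding="utf-8") as f:  # hypothetical input file
    text = f.read()

chunks = splitter.split_text(text)

# Attach source metadata to each chunk, as described above;
# the exact metadata layout is an illustrative assumption
documents = [
    {"content": chunk, "metadata": {"source": "example.txt", "chunk_id": i}}
    for i, chunk in enumerate(chunks)
]
```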
Component: `VectorStore` class
- Purpose: Creates and manages a searchable vector database
- Process:
  - Embedding Generation: Uses HuggingFace's `all-MiniLM-L6-v2` model to convert text chunks into vectors
  - Vector Storage: Stores embeddings in ChromaDB with metadata
  - Persistence: Saves vectors to disk for future use (see the sketch below)
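A sketch of embedding and storage, assuming the `sentence-transformers` and `chromadb` packages and reusing the `documents` list from the previous sketch; the collection name and storage path are illustrative:

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
chroma_client = chromadb.PersistentClient(path="embeddings_demo")  # persisted to disk
collection = chroma_client.get_or_create_collection("documents")

texts = [d["content"] for d in documents]
embeddings = model.encode(texts).tolist()  # one vector per chunk

collection.add(
    ids=[str(d["metadata"]["chunk_id"]) for d in documents],
    documents=texts,
    embeddings=embeddings,
    metadatas=[d["metadata"] for d in documents],
)
```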
When a user asks a question:
Step 1: RAG Fusion Query Expansion
- Component: `RAGFusion` class
- Purpose: Generates multiple query variations to improve retrieval coverage
- Process:
  - Original Query: Takes the user's original question
  - LLM-Powered Expansion: Uses the LLM to generate 5 additional variations considering:
    - Synonyms and related terms
    - Different phrasings and structures
    - Broader and narrower interpretations
    - Technical and non-technical language
    - Different aspects of the same topic
  - Query Set: Creates a set of 6 queries (original + 5 variations); see the sketch below
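The exact prompt lives in `rag_system/rag_fusion.py`; this hedged sketch shows the shape of the expansion step, with the prompt wording being an illustrative assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment / .env

def expand_query(question: str, n_variations: int = 5) -> list[str]:
    prompt = (
        f"Generate {n_variations} alternative phrasings of the question below, "
        "using synonyms, different structures, broader and narrower "
        "interpretations, and both technical and non-technical language. "
        "Return one variation per line with no numbering.\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    variations = response.choices[0].message.content.strip().splitlines()
    return [question] + variations[:n_variations]  # 6 queries total
```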
Step 2: Multi-Vector Search
- Component: `VectorStore` with `RAGFusion`
- Purpose: Searches the vector database with all query variations
- Process:
  - Parallel Search: Performs similarity search with each query variation
  - Result Collection: Gathers results from all 6 searches
  - Deduplication: Removes duplicate chunks based on content hash
  - Fusion Scoring: Combines scores using a weighted algorithm (see the sketch below)
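A sketch of the multi-query search, reusing `model` and `collection` from the storage example above. Converting a distance to a similarity with `1 - distance` is a simplification whose correctness depends on the collection's distance metric:

```python
import hashlib

def multi_query_search(queries: list[str], k: int = 8) -> dict[str, dict]:
    results: dict[str, dict] = {}
    for query in queries:
        q_emb = model.encode([query]).tolist()
        hits = collection.query(query_embeddings=q_emb, n_results=k)
        for text, distance in zip(hits["documents"][0], hits["distances"][0]):
            key = hashlib.md5(text.encode()).hexdigest()  # dedupe by content hash
            entry = results.setdefault(key, {"text": text, "scores": []})
            entry["scores"].append(1.0 - distance)        # simplistic score conversion
    return results
```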
Step 3: Scoring Algorithm
- Component: `RAGFusion._combine_and_rank_results()`
- Purpose: Intelligently combines results from multiple queries
- Process:
  - Score Aggregation: For each unique chunk, collects all scores from different queries
  - Fusion Calculation: Computes a weighted fusion score using:
    - Max Score (50% weight): Best similarity score from any query variation
    - Average Score (30% weight): Mean of all scores for the chunk
    - Frequency Bonus (20% weight): How many queries found this chunk (capped at 3)
  - Final Score: `fusion_score = (max_score × 0.5) + (avg_score × 0.3) + (frequency_bonus × 0.2)`
  - Ranking: Sorts chunks by fusion score in descending order (see the sketch below)
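A direct transcription of that formula, reusing the helpers from the previous sketches; this is a sketch, not the verbatim body of `RAGFusion._combine_and_rank_results()`, and normalizing the frequency bonus as `count / 3` is an assumption:

```python
def fusion_score(scores: list[float]) -> float:
    max_score = max(scores)
    avg_score = sum(scores) / len(scores)
    frequency_bonus = min(len(scores), 3) / 3.0  # capped at 3 queries (assumed normalization)
    return max_score * 0.5 + avg_score * 0.3 + frequency_bonus * 0.2

# Rank all deduplicated chunks by fusion score, descending
ranked = sorted(
    multi_query_search(expand_query("What is RAG Fusion?")).values(),
    key=lambda entry: fusion_score(entry["scores"]),
    reverse=True,
)
```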
Component: `LLMInterface` class
- Purpose: Generates human-like answers using retrieved context
- Process:
  - Context Preparation: Combines retrieved document chunks into context
  - Prompt Engineering: Creates a structured prompt with context and question
  - LLM Generation: Uses OpenAI's GPT-4o-mini to generate the answer
  - Source Tracking: Tracks which documents were used as sources (see the sketch below)
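A hedged sketch of answer generation, reusing the OpenAI `client` from the query-expansion example; the prompt wording is illustrative, not the exact template in `rag_system/llm_interface.py`:

```python
def generate_answer(question: str, chunks: list[dict]) -> str:
    context = "\n\n".join(c["text"] for c in chunks)  # context preparation
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content
```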
Additional Features:
- Follow-up Questions: Generates 3 relevant follow-up questions
- Source Attribution: Shows which documents were referenced
- Similarity Scores: Displays how relevant each chunk was
- Conversation History: Maintains chat history for context
- Query Variations Display: Shows all generated query variations for transparency
Embedding Model: sentence-transformers/all-MiniLM-L6-v2
- Lightweight, fast, and effective for semantic search
- Runs on CPU for accessibility
Vector Database: ChromaDB
- Persistent storage with automatic indexing
- Supports similarity search with scores
LLM: OpenAI GPT-4o-mini
- Configurable temperature (default: 0.7)
- Structured prompting for consistent responses
- Used for both answer generation and query expansion
Chunking Strategy: RecursiveCharacterTextSplitter
- Splits on natural boundaries (paragraphs, sentences, words)
- Maintains semantic coherence within chunks
RAG Fusion Configuration:
- Query Variations: 6 total (original + 5 LLM-generated)
- Fusion Weights: 50% max score, 30% average score, 20% frequency bonus
- Fallback: Simple term extraction if LLM unavailable
```
Documents → Text Extraction → Chunking → Embedding → Vector Store
                                                         ↓
User Question → RAG Fusion → Multiple Query Variations → Similarity Search → Context Retrieval
                                                         ↓
                Context + Question → LLM Prompt → Answer Generation → Response
```
This architecture ensures that the system can provide accurate, source-attributed answers based on the specific documents you've uploaded, rather than relying solely on the LLM's pre-trained knowledge. The RAG Fusion component significantly improves retrieval accuracy by exploring multiple ways to interpret and search for the user's question.
- Chunk Size: 500 characters per document chunk
- Chunk Overlap: 100 characters overlap between chunks
- LLM Model: OpenAI GPT-4o-mini (configurable)
- Embedding Model: sentence-transformers/all-MiniLM-L6-v2
- Vector Store: ChromaDB with persistent storage
- RAG Fusion: Enabled by default for enhanced retrieval
- Retrieval Settings:
  - Search Candidates: 8 chunks retrieved initially
  - Final Chunks: 4 chunks returned after filtering/reranking
  - Min Similarity Threshold: 0.3 (minimum similarity required for a chunk to be kept)
  - Query Expansion: Enabled (expands queries with key terms)
  - Reranking: Enabled (combines semantic similarity with content relevance)
  - Query Variations: 6 per question (original + 5 generated)
- Adjust chunk size and overlap in `rag_system/document_processor.py`
- Modify LLM settings in `rag_system/llm_interface.py`
- Configure vector store parameters in `rag_system/vector_store.py`
- Tune retrieval settings in `config.py` (sketched below):
  - `DEFAULT_K`: Number of initial search candidates
  - `MAX_CHUNKS_TO_RETURN`: Final number of chunks returned
  - `MIN_SIMILARITY_THRESHOLD`: Minimum similarity score for filtering
  - `USE_QUERY_EXPANSION`: Enable/disable query expansion
  - `USE_RERANKING`: Enable/disable content-based reranking
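A plausible shape for those tunables in `config.py`, with values mirroring the documented defaults; the actual file layout may differ:

```python
# config.py — retrieval tunables (defaults as documented above)
DEFAULT_K = 8                   # initial search candidates
MAX_CHUNKS_TO_RETURN = 4        # final chunks after filtering/reranking
MIN_SIMILARITY_THRESHOLD = 0.3  # minimum similarity for a chunk to be kept
USE_QUERY_EXPANSION = True      # expand queries with key terms
USE_RERANKING = True            # content-based reranking
```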
The RAG system combines several retrieval techniques to improve chunk relevance:
- LLM-Powered Query Expansion: Uses an LLM to generate multiple variations of user questions
- Intelligent Variations: Creates synonyms, different phrasings, broader/narrower interpretations
- Fusion Scoring: Combines results from all variations using weighted scoring (max score, average score, frequency)
- Enhanced Coverage: Finds relevant chunks even when exact terms don't match the original question
- Transparency: Displays all generated query variations in the interface for user understanding
- Key Term Extraction: Automatically identifies important terms from user questions
- Query Variations: Creates multiple search queries using key terms
- Broader Coverage: Finds relevant chunks even when exact terms don't match
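A hedged sketch of this non-LLM fallback; the stopword list and the way variations are built from key terms are illustrative assumptions:

```python
STOPWORDS = {"the", "a", "an", "is", "are", "what", "how", "of", "in", "to"}

def fallback_variations(question: str, max_terms: int = 4) -> list[str]:
    # Key term extraction: drop stopwords, keep the informative words
    terms = [w for w in question.lower().split() if w not in STOPWORDS]
    variations = [" ".join(terms[:max_terms])]  # all key terms together
    variations += terms[:max_terms]             # one query per key term
    return [question] + variations
```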
- Threshold-Based Filtering: Only includes chunks above minimum similarity threshold
- Quality Control: Ensures only relevant chunks are considered for answers
- Configurable Threshold: Adjustable minimum similarity score (default: 0.3)
- Term Overlap Analysis: Calculates how many query terms appear in each chunk
- Position Weighting: Gives higher scores to chunks where terms appear earlier
- Combined Scoring: Merges semantic similarity with content relevance
- Weighted Average: 70% semantic similarity + 30% content relevance
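A sketch combining the threshold filter with the 70/30 reranking blend described above; position weighting is omitted for brevity, and the assumption that candidates arrive as dicts with `text` and `similarity` keys is illustrative:

```python
def filter_and_rerank(question: str, candidates: list[dict],
                      threshold: float = 0.3, top_k: int = 4) -> list[dict]:
    terms = set(question.lower().split())
    # Threshold-based filtering: drop chunks below the minimum similarity
    kept = [c for c in candidates if c["similarity"] >= threshold]
    for c in kept:
        words = c["text"].lower().split()
        overlap = sum(w in terms for w in words) / max(len(words), 1)
        c["final_score"] = 0.7 * c["similarity"] + 0.3 * overlap  # 70/30 blend
    return sorted(kept, key=lambda c: c["final_score"], reverse=True)[:top_k]
```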
- Cosine Similarity: Advanced similarity calculation between questions and document chunks
- Percentage Scores: Visual similarity percentages (0-100%) for each chunk
- Relevance Ranking: Automatic ranking of chunks by relevance to your question
- Tooltip Explanations: Hover over similarity scores for detailed explanations
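For reference, one way the displayed percentage could be derived from a raw cosine score; a sketch, since the app's exact conversion may differ:

```python
import numpy as np

def similarity_percentage(query_vec: np.ndarray, chunk_vec: np.ndarray) -> float:
    # Cosine similarity between the question and chunk embeddings
    cos = float(np.dot(query_vec, chunk_vec)
                / (np.linalg.norm(query_vec) * np.linalg.norm(chunk_vec)))
    return round(max(cos, 0.0) * 100, 1)  # clamp negatives, display as a %
```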
- Document Sources: Shows which files were used for each answer
- Chunk-Level Detail: Displays specific document chunks with IDs
- Content Preview: Expandable sections showing exact chunk content
- File Path Display: Clear indication of source document names
- Health Checks: Real-time status of vector store and LLM
- Performance Metrics: Track document count, conversation count, and processing stats
- Error Recovery: Automatic suggestions for common issues
- Manual Controls: Options to reinitialize components when needed
- Session Persistence: Maintains conversation history and uploaded files
- Export Functionality: Download conversations with timestamps and sources
- Clean Slate: Complete data clearing for fresh starts
- File Management: Secure handling of uploaded documents
- Vector Store Not Ready: Click "🔄 Reinitialize Vector Store" in the sidebar
- LLM Not Available: Check your OpenAI API key in the `.env` file
- Processing Errors: Ensure uploaded files are valid PDF, DOCX, TXT, or MD format
- Memory Issues: Clear data periodically if processing large documents
- Chunk Size: Smaller chunks (500-1000 chars) for precise answers, larger chunks (1000-2000 chars) for context
- Document Size: Process documents in batches for better performance
- Regular Maintenance: Export conversations and clear data periodically
MIT License - see LICENSE file for details.
