A powerful document question-answering system using RAG (Retrieval Augmented Generation) technology. This application allows users to upload PDF documents and ask questions about their contents, receiving accurate answers based on the document context.
## Features

- 📚 PDF document processing and storage
- 🔍 Semantic search using vector embeddings
- 🤖 Advanced LLM-based question answering
- 💻 User-friendly web interface
- 📊 Source tracking and analysis details
- ⚡ Performance optimizations with caching
- 🛡️ Robust error handling and logging
## Prerequisites

- Python 3.8+
- Ollama installed and running locally
- Sufficient disk space for document storage and vector embeddings
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/pushpendra-tripathi/docwhisperer.git
  cd docwhisperer
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file (optional):
  ```env
  CHROMA_PATH=chroma
  DATA_PATH=data
  DEFAULT_MODEL=deepseek-r1:1.5b
  EMBEDDING_MODEL=snowflake-arctic-embed2
  CHUNK_SIZE=800
  CHUNK_OVERLAP=80
  TOP_K_RESULTS=20
  TEMPERATURE=0.0
  ```

## Usage

- Start the web application:

  ```bash
  streamlit run app.py
  ```

- Open your browser and navigate to the provided URL (typically http://localhost:8501).
- Use the sidebar to:
  - Upload PDF documents
  - Process uploaded documents
  - Reset the database if needed
- Ask questions about the uploaded documents in the chat interface.
## Project Structure

- `app.py`: Main Streamlit web application
- `query_data.py`: Core RAG query engine implementation
- `populate_database.py`: Document processing and database management
- `config.py`: Configuration settings
- `get_embedding_function.py`: Embedding model configuration
- `test_rag.py`: Test suite for RAG functionality
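As a rough sketch of how these pieces fit together, the core RAG query loop retrieves the most relevant chunks and assembles them into a prompt for the LLM. All names below (`retrieve_chunks`, `build_prompt`, the prompt template) are illustrative assumptions, not the project's actual API, and the LLM call itself is omitted.

```python
# Illustrative RAG query flow: rank stored chunks by a (pre-computed)
# similarity score, take the top k, and assemble a prompt. A real
# implementation would embed the question and query the vector store.
from typing import List, Tuple

PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Question: {question}
"""

def retrieve_chunks(store: List[Tuple[str, float]], k: int = 3) -> List[str]:
    """Return the k chunk texts with the highest similarity scores."""
    ranked = sorted(store, key=lambda pair: pair[1], reverse=True)
    return [text for text, _score in ranked[:k]]

def build_prompt(question: str, chunks: List[str]) -> str:
    """Join retrieved chunks into a context block and fill the template."""
    context = "\n\n---\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```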
## Configuration

The application can be configured through environment variables or the `config.py` file:

- `CHROMA_PATH`: Directory for the vector store
- `DATA_PATH`: Directory for uploaded documents
- `DEFAULT_MODEL`: Ollama model for question answering
- `EMBEDDING_MODEL`: Model for document embeddings
- `CHUNK_SIZE`: Document chunk size for processing
- `CHUNK_OVERLAP`: Overlap between consecutive chunks
- `TOP_K_RESULTS`: Number of relevant chunks to retrieve
- `TEMPERATURE`: LLM temperature setting
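A minimal sketch of how `config.py` might read these settings, assuming plain `os.getenv` lookups with the defaults shown in the `.env` example (the project's actual file may differ):

```python
# Hypothetical configuration loading: each setting falls back to a default
# when the corresponding environment variable is unset. Numeric settings
# are parsed explicitly since environment variables are always strings.
import os

CHROMA_PATH = os.getenv("CHROMA_PATH", "chroma")
DATA_PATH = os.getenv("DATA_PATH", "data")
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "deepseek-r1:1.5b")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "800"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "80"))
TOP_K_RESULTS = int(os.getenv("TOP_K_RESULTS", "20"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.0"))
```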
## Performance

The application includes several performance optimizations:
- LRU caching for document retrieval
- Efficient document chunking
- Optimized vector similarity search
- Retries for LLM queries
- Streamlined UI updates
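The LRU-caching idea can be sketched as below; `search_db` is a hypothetical stand-in for the real vector similarity search, and the point is that repeated questions hit the cache instead of the database. Returning tuples (hashable and immutable) keeps the results cache-friendly.

```python
# Sketch of LRU caching for document retrieval. The names here are
# illustrative, not the app's actual helpers.
from functools import lru_cache

def search_db(question: str, k: int) -> tuple:
    # Placeholder for the actual vector search; returns dummy chunks here.
    return tuple(f"chunk-{i} for {question!r}" for i in range(k))

CALLS = {"count": 0}

@lru_cache(maxsize=128)
def retrieve(question: str, k: int = 20) -> tuple:
    CALLS["count"] += 1  # only incremented on a cache miss
    return search_db(question, k)
```

Note that `lru_cache` requires hashable arguments and return-value sharing across callers, which is why the sketch returns tuples rather than lists.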
## Error Handling

The application implements comprehensive error handling:
- Graceful handling of upload errors
- Retry mechanism for LLM queries
- Detailed error logging
- User-friendly error messages
- Database state validation
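The retry mechanism for LLM queries can be sketched as a small wrapper with exponential backoff and logging; `query_with_retries` and its parameters are illustrative assumptions, not the app's actual helpers.

```python
# Hypothetical retry wrapper: invoke `call()`, and on failure wait
# base_delay * 2**attempt before retrying, logging each failure.
# The last failure is re-raised so callers can surface a friendly message.
import logging
import time

logger = logging.getLogger(__name__)

def query_with_retries(call, retries: int = 3, base_delay: float = 1.0):
    for attempt in range(retries):
        try:
            return call()
        except Exception as exc:
            logger.warning(
                "LLM query failed (attempt %d/%d): %s", attempt + 1, retries, exc
            )
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```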
## Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.

