A Retrieval-Augmented Generation (RAG) system that combines large language models with vector-based information retrieval. The system enables users to upload documents, process them into embeddings, and ask natural language questions that are answered based on the document content.
- 📄 Document Ingestion: Support for PDF, TXT, MD, and DOCX files
- 🔍 Vector Search: High-performance similarity search using Qdrant
- 🤖 Local LLM: Privacy-focused responses using local LLM models via Ollama
- 🚀 REST API: Clean and well-documented API endpoints
- 🎯 Multiple Collections: Support for multiple document collections
- 📊 Monitoring: Built-in metrics and monitoring support
The system follows a layered architecture:
- API Layer: FastAPI-based REST API
- Service Layer: Business logic and orchestration
- Data Layer: Vector database (Qdrant) and file storage
- Infrastructure Layer: Configuration, logging, and monitoring
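As a rough illustration of how these layers interact, the sketch below wires a stubbed data layer into a service layer. All class and method names here are hypothetical, not the project's actual code:

```python
# Hypothetical sketch of the layered design; class and method names are
# illustrative only, not the project's real modules.

class VectorStore:
    """Data layer: stands in for the Qdrant client."""
    def search(self, query: str, collection: str, top_k: int = 5) -> list[str]:
        # A real implementation would embed the query and search Qdrant.
        return [f"chunk {i} matching {query!r}" for i in range(top_k)]

class QueryService:
    """Service layer: orchestrates retrieval and generation."""
    def __init__(self, store: VectorStore):
        self.store = store

    def answer(self, query: str, collection: str) -> str:
        chunks = self.store.search(query, collection)  # retrieve context
        # A real implementation would prompt the LLM (via Ollama) with the chunks.
        return f"Answer based on {len(chunks)} chunks"

# The API layer (FastAPI) would expose QueryService.answer via a REST endpoint.
service = QueryService(VectorStore())
print(service.answer("What is RAG?", "my_collection"))  # Answer based on 5 chunks
```

The point of the separation is that the API layer stays thin: swapping Qdrant for another store, or Ollama for another runtime, only touches one layer.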
- Backend: Python 3.11+, FastAPI
- Vector Database: Qdrant
- LLM Runtime: Ollama (Llama 2, Mistral, etc.)
- Embeddings: sentence-transformers (Granite embedding model)
  - Model: ibm-granite/granite-embedding-small-english-r2
  - Dimension: 384
  - Max context: 8192 tokens
  - Libraries: transformers 5.0.0+, sentence-transformers 5.2.2+, torch 2.10.0+
- Document Processing: PyPDF2, python-docx
- Deployment: Docker, Docker Compose
Note: For more details on the Granite model migration, see VERSION_UPDATES.md or GRANITE_MIGRATION.md.
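For intuition on what the vector search layer does: Qdrant ranks stored chunks by similarity (typically cosine) between the query embedding and each chunk embedding. A toy sketch, using 3-dimensional stand-ins for the real 384-dimensional Granite vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, a common ranking metric for vector search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d stand-ins for the real 384-d Granite embeddings:
query = [0.1, 0.9, 0.0]
chunks = {"chunk_a": [0.1, 0.8, 0.1], "chunk_b": [0.9, 0.1, 0.0]}
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)  # chunk_a points in nearly the same direction as the query
```

In the real system the embedding model maps both documents and queries into the same 384-dimensional space, so this comparison is meaningful; the toy vectors above are purely illustrative.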
- Python 3.11 or higher
- Docker and Docker Compose
- Git
- Clone the repository:

```bash
git clone https://github.com/yourusername/simple-rag-system.git
cd simple-rag-system
```

- Set up environment variables:

```bash
# Windows
copy env.example .env

# Linux/Mac
cp env.example .env

# Or use the setup script
scripts\setup_env.bat   # Windows
./scripts/setup_env.sh  # Linux/Mac

# Edit .env with your configuration
```

- Start services with Docker Compose:

```bash
docker-compose -f deployments/docker/docker-compose.yml up -d
```

- Wait for services to be ready:

```bash
docker-compose logs -f rag-app
```

- Access the API:
- API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
- Create conda environment:

```bash
conda env create -f environment.yml
conda activate simple-rag-system
```

- Install dev dependencies (optional):

```bash
pip install -r requirements-dev.txt
```

See CONDA_SETUP.md for detailed conda setup instructions.
- Create virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

- Start Qdrant:

```bash
docker run -d -p 6333:6333 -p 6334:6334 --name qdrant qdrant/qdrant:latest
```

- Start Ollama:

```bash
docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
```

- Pull a model:

```bash
docker exec ollama ollama pull llama2
```

- Run the application:

```bash
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
```

Upload a document:

```bash
curl -X POST "http://localhost:8000/api/v1/documents" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf" \
  -F "collection=my_collection"
```

Ask a question:

```bash
curl -X POST "http://localhost:8000/api/v1/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main topic of the document?",
    "collection": "my_collection",
    "top_k": 5
  }'
```

List collections:

```bash
curl -X GET "http://localhost:8000/api/v1/collections"
```

Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
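The same query can be issued from Python using only the standard library. This sketch assumes the request and response shape of the curl examples above and a server running on localhost:8000:

```python
# Minimal client sketch; the endpoint and payload mirror the curl examples
# above and assume a local deployment on port 8000.
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/query"

def build_payload(query: str, collection: str = "my_collection", top_k: int = 5) -> bytes:
    """Serialize the request body expected by the /api/v1/query endpoint."""
    return json.dumps(
        {"query": query, "collection": collection, "top_k": top_k}
    ).encode("utf-8")

def query_rag(query: str, **kwargs) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=build_payload(query, **kwargs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires the server to be up
        return json.loads(resp.read())

# Usage (server must be running):
#   answer = query_rag("What is the main topic of the document?")
```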
```text
simple-rag-system/
├── src/                  # Source code
│   ├── api/              # API layer
│   ├── services/         # Business logic
│   ├── core/             # Core functionality
│   ├── utils/            # Utilities
│   ├── parsers/          # Document parsers
│   ├── embedding/        # Embedding models
│   └── llm/              # LLM integration
├── tests/                # Test suite
├── deployments/          # Docker configurations
├── docs/                 # Documentation
└── scripts/              # Utility scripts
```
For detailed project structure, see PROJECT_STRUCTURE.md
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/unit/test_document_processor.py
```

```bash
# Format code
black src/ tests/
isort src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/
```

Install the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

For production, use the production compose file:

```bash
docker-compose -f deployments/docker/docker-compose.prod.yml up -d
```

See env.example for all available configuration options. For detailed setup instructions:
- QUICK_START.md - Quick start guide
- LOCAL_SETUP.md - Detailed local setup
- CONDA_SETUP.md - Conda-specific setup guide
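A unit test in tests/unit/ might look like the sketch below. `split_into_chunks` is a hypothetical stand-in for the project's actual document processor, shown only to illustrate the pytest style:

```python
# Hypothetical test example; split_into_chunks is illustrative,
# not the project's real implementation.
def split_into_chunks(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks with overlap, a common RAG strategy."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def test_chunks_cover_full_text():
    text = "x" * 250
    chunks = split_into_chunks(text, chunk_size=100, overlap=20)
    # Dropping the trailing overlap from all but the last chunk reconstructs the text.
    assert "".join(c[:80] for c in chunks[:-1]) + chunks[-1] == text

def test_consecutive_chunks_overlap():
    chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=2)
    assert chunks == ["abcd", "cdef", "efgh", "ghij"]
    assert all(a[-2:] == b[:2] for a, b in zip(chunks, chunks[1:]))
```

Run a file like this with `pytest tests/unit/<filename>.py`, matching the commands above.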
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
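If the app exposes a Prometheus metrics endpoint, a minimal scrape job could look like the fragment below; the job name, metrics path, and target are assumptions, not confirmed by this README:

```yaml
# prometheus.yml fragment -- job name, metrics path, and target are assumptions
scrape_configs:
  - job_name: rag-app
    metrics_path: /metrics
    static_configs:
      - targets: ["rag-app:8000"]
```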
To enable monitoring:

```bash
docker-compose -f deployments/docker/docker-compose.yml --profile monitoring up -d
```

- Basic Design - System overview
- C4 Model - Architecture diagrams
- High-Level Design - Architectural patterns
- Data Flow - Data flow diagrams
- Sequence Diagrams - Interaction sequences
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI - Modern web framework
- Qdrant - Vector similarity search engine
- Ollama - Run LLMs locally
- sentence-transformers - Sentence embeddings
- LangChain - Framework for LLM applications
For issues, questions, or contributions, please:
- Open an issue on GitHub
- Contact: your.email@example.com
- Web UI for easier document management
- Chat history and conversation memory
- Multi-modal support (images, audio)
- Advanced chunking strategies
- Reranking models
- Multi-language support
- Fine-tuning capabilities
Note: This is a simple RAG system designed for demonstration and learning purposes. For production use, consider additional security measures, monitoring, and optimization.