A Retrieval-Augmented Generation (RAG) system that combines large language models with vector-based information retrieval. The system enables users to upload documents, process them into embeddings, and ask natural language questions that are answered based on the document content.
- 📄 Document Ingestion: Support for PDF, TXT, MD, and DOCX files
- 🔍 Vector Search: High-performance similarity search using Qdrant
- 🤖 Local LLM: Privacy-focused responses using local LLM models via Ollama
- 🚀 REST API: Clean and well-documented API endpoints
- 🎯 Multiple Collections: Support for multiple document collections
- 📊 Monitoring: Built-in metrics and monitoring support
The system follows a layered architecture:
- API Layer: FastAPI-based REST API
- Service Layer: Business logic and orchestration
- Data Layer: Vector database (Qdrant) and file storage
- Infrastructure Layer: Configuration, logging, and monitoring
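As a rough illustration of how these layers interact, the sketch below wires a stubbed data layer into a service layer. All class and method names here are hypothetical, not the project's actual code:

```python
# Hypothetical sketch of the layered design; class and method names are
# illustrative only, not the project's real modules.

class VectorStore:
    """Data layer: stands in for the Qdrant client."""
    def search(self, query: str, collection: str, top_k: int = 5) -> list[str]:
        # A real implementation would embed the query and search Qdrant.
        return [f"chunk {i} matching {query!r}" for i in range(top_k)]

class QueryService:
    """Service layer: orchestrates retrieval and generation."""
    def __init__(self, store: VectorStore):
        self.store = store

    def answer(self, query: str, collection: str) -> str:
        chunks = self.store.search(query, collection)  # retrieve context
        # A real implementation would prompt the LLM (via Ollama) with the chunks.
        return f"Answer based on {len(chunks)} chunks"

# The API layer (FastAPI) would expose QueryService.answer via a REST endpoint.
service = QueryService(VectorStore())
print(service.answer("What is RAG?", "my_collection"))  # Answer based on 5 chunks
```

The point of the separation is that the API layer stays thin: swapping Qdrant for another store, or Ollama for another runtime, only touches one layer.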
- Backend: Python 3.11+, FastAPI
- Vector Database: Qdrant
- LLM Runtime: Ollama (Llama 2, Mistral, etc.)
- Embeddings: sentence-transformers (Granite embedding model)
  - Model: ibm-granite/granite-embedding-small-english-r2
  - Dimension: 384
  - Max context: 8192 tokens
  - Libraries: transformers 5.0.0+, sentence-transformers 5.2.2+, torch 2.10.0+
- Document Processing: PyPDF2, python-docx
- Deployment: Docker, Docker Compose
Note: For more details on the Granite model migration, see VERSION_UPDATES.md or GRANITE_MIGRATION.md.
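For intuition on what the vector search layer does: Qdrant ranks stored chunks by similarity (typically cosine) between the query embedding and each chunk embedding. A toy sketch, using 3-dimensional stand-ins for the real 384-dimensional Granite vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, a common ranking metric for vector search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d stand-ins for the real 384-d Granite embeddings:
query = [0.1, 0.9, 0.0]
chunks = {"chunk_a": [0.1, 0.8, 0.1], "chunk_b": [0.9, 0.1, 0.0]}
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)  # chunk_a points in nearly the same direction as the query
```

In the real system the embedding model maps both documents and queries into the same 384-dimensional space, so this comparison is meaningful; the toy vectors above are purely illustrative.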
- Python 3.11 or higher
- Docker and Docker Compose
- Git
- Clone the repository:

```bash
git clone https://github.com/yourusername/simple-rag-system.git
cd simple-rag-system
```

- Set up environment variables:

```bash
# Windows
copy env.example .env

# Linux/Mac
cp env.example .env

# Or use the setup script
scripts\setup_env.bat   # Windows
./scripts/setup_env.sh  # Linux/Mac

# Edit .env with your configuration
```

- Start services with Docker Compose:

```bash
docker-compose -f deployments/docker/docker-compose.yml up -d
```

- Wait for services to be ready:

```bash
docker-compose logs -f rag-app
```

- Access the API:
- API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
- Create conda environment:

```bash
conda env create -f environment.yml
conda activate simple-rag-system
```

- Install dev dependencies (optional):

```bash
pip install -r requirements-dev.txt
```

See CONDA_SETUP.md for detailed conda setup instructions.
- Create virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

- Start Qdrant:

```bash
docker run -d -p 6333:6333 -p 6334:6334 --name qdrant qdrant/qdrant:latest
```

- Start Ollama:

```bash
docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
```

- Pull a model:

```bash
docker exec ollama ollama pull llama2
```

- Run the application:

```bash
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
```

Upload a document:

```bash
curl -X POST "http://localhost:8000/api/v1/documents" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf" \
  -F "collection=my_collection"
```

Ask a question:

```bash
curl -X POST "http://localhost:8000/api/v1/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main topic of the document?",
    "collection": "my_collection",
    "top_k": 5
  }'
```

List collections:

```bash
curl -X GET "http://localhost:8000/api/v1/collections"
```

Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
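The same query can be issued from Python using only the standard library. This sketch assumes the request and response shape of the curl examples above and a server running on localhost:8000:

```python
# Minimal client sketch; the endpoint and payload mirror the curl examples
# above and assume a local deployment on port 8000.
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/query"

def build_payload(query: str, collection: str = "my_collection", top_k: int = 5) -> bytes:
    """Serialize the request body expected by the /api/v1/query endpoint."""
    return json.dumps(
        {"query": query, "collection": collection, "top_k": top_k}
    ).encode("utf-8")

def query_rag(query: str, **kwargs) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=build_payload(query, **kwargs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires the server to be up
        return json.loads(resp.read())

# Usage (server must be running):
#   answer = query_rag("What is the main topic of the document?")
```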
```text
simple-rag-system/
├── src/                  # Source code
│   ├── api/              # API layer
│   ├── services/         # Business logic
│   ├── core/             # Core functionality
│   ├── utils/            # Utilities
│   ├── parsers/          # Document parsers
│   ├── embedding/        # Embedding models
│   └── llm/              # LLM integration
├── tests/                # Test suite
├── deployments/          # Docker configurations
├── docs/                 # Documentation
└── scripts/              # Utility scripts
```
For detailed project structure, see PROJECT_STRUCTURE.md
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/unit/test_document_processor.py
```

```bash
# Format code
black src/ tests/
isort src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/
```

Install the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

For production, use the production compose file:

```bash
docker-compose -f deployments/docker/docker-compose.prod.yml up -d
```

See env.example for all available configuration options. For detailed setup instructions:
- QUICK_START.md - Quick start guide
- LOCAL_SETUP.md - Detailed local setup
- CONDA_SETUP.md - Conda-specific setup guide
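A unit test in tests/unit/ might look like the sketch below. `split_into_chunks` is a hypothetical stand-in for the project's actual document processor, shown only to illustrate the pytest style:

```python
# Hypothetical test example; split_into_chunks is illustrative,
# not the project's real implementation.
def split_into_chunks(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks with overlap, a common RAG strategy."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def test_chunks_cover_full_text():
    text = "x" * 250
    chunks = split_into_chunks(text, chunk_size=100, overlap=20)
    # Dropping the trailing overlap from all but the last chunk reconstructs the text.
    assert "".join(c[:80] for c in chunks[:-1]) + chunks[-1] == text

def test_consecutive_chunks_overlap():
    chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=2)
    assert chunks == ["abcd", "cdef", "efgh", "ghij"]
    assert all(a[-2:] == b[:2] for a, b in zip(chunks, chunks[1:]))
```

Run a file like this with `pytest tests/unit/<filename>.py`, matching the commands above.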
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
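If the app exposes a Prometheus metrics endpoint, a minimal scrape job could look like the fragment below; the job name, metrics path, and target are assumptions, not confirmed by this README:

```yaml
# prometheus.yml fragment -- job name, metrics path, and target are assumptions
scrape_configs:
  - job_name: rag-app
    metrics_path: /metrics
    static_configs:
      - targets: ["rag-app:8000"]
```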
To enable monitoring:

```bash
docker-compose -f deployments/docker/docker-compose.yml --profile monitoring up -d
```

- Basic Design - System overview
- C4 Model - Architecture diagrams
- High-Level Design - Architectural patterns
- Data Flow - Data flow diagrams
- Sequence Diagrams - Interaction sequences
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI - Modern web framework
- Qdrant - Vector similarity search engine
- Ollama - Run LLMs locally
- sentence-transformers - Sentence embeddings
- LangChain - Framework for LLM applications
For issues, questions, or contributions, please:
- Open an issue on GitHub
- Contact: your.email@example.com
- Web UI for easier document management
- Chat history and conversation memory
- Multi-modal support (images, audio)
- Advanced chunking strategies
- Reranking models
- Multi-language support
- Fine-tuning capabilities
Note: This is a simple RAG system designed for demonstration and learning purposes. For production use, consider additional security measures, monitoring, and optimization.