RetrievAI — Built for Scalable, Reliable AI Responses

Intelligent Document Retrieval & AI-Powered Answers

A production-ready Retrieval-Augmented Generation (RAG) system that lets you upload documents, ask questions in natural language, and get precise, cited AI-powered answers with confidence scoring.

📸 Screenshots

Landing Page

Features & How It Works (screenshots: Features, How It Works)

Call to Action

Workspace (screenshots: Document Q&A, General Knowledge)

✨ Features

Core RAG Capabilities

📄 Multi-Format Documents — PDF, DOCX, TXT, HTML, and images
🧠 Smart Chunking — Context-aware document splitting for optimal retrieval
🔍 Vector Search — Fast semantic retrieval with ChromaDB
💬 Structured Responses — Clean markdown formatting with source citations
📊 Confidence Scoring — Know when answers are well-supported by your documents

User Features

👤 Firebase Authentication — Secure user sessions with Google login
💾 Session Management — Organize conversations into separate workspaces
📌 Pin Messages — Save important responses for quick access
🔄 Session Cloning — Duplicate and modify conversations
📥 Export Chat — Download conversations as markdown
⏱️ Temporary Docs — Auto-cleanup of test documents

Developer Features

🚀 FastAPI Backend — Fast, async Python API
⚛️ React + TypeScript Frontend — Modern, responsive UI with Framer Motion
🎯 Type-Safe — Full TypeScript coverage
🔌 RESTful API — Clean, documented endpoints
📦 Docker Ready — Containerization support

🚀 Quick Start

Prerequisites

Python 3.11+
Node.js 18+
pnpm
HuggingFace account with an access token that can call Inference Providers

1. Clone and Setup

```bash
# Clone into a short directory name so the next command works
git clone https://github.com/coder-msk/RetrievAI---Built-for-Scalable-Reliable-AI-Responses.git RetrievAI
cd RetrievAI

# Backend setup
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your HF_TOKEN

# Frontend setup
cd ../frontend
pnpm install
cp .env.example .env
# Edit .env and add your Firebase config
```

2. Get Required API Keys

HuggingFace Token (Required):

Go to https://huggingface.co/settings/tokens
Create a Fine-grained token with "Inference Providers" permission
Add to backend/.env: HF_TOKEN=hf_your_token_here
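Before starting the backend, it can help to confirm that `backend/.env` actually holds a real token rather than the placeholder. The sketch below is a hypothetical helper, not part of the project; it assumes the `.env` file uses simple `KEY=VALUE` lines and that valid tokens start with `hf_`, as the placeholder suggests.

```python
"""Hypothetical check that backend/.env defines a plausible HF_TOKEN.

Assumptions: simple KEY=VALUE lines, and tokens start with "hf_".
"""
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. HF_TOKEN="hf_..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def hf_token_looks_valid(env: dict[str, str]) -> bool:
    """True if HF_TOKEN is set, has the hf_ prefix, and is not the placeholder."""
    token = env.get("HF_TOKEN", "")
    return token.startswith("hf_") and token != "hf_your_token_here"


if __name__ == "__main__":
    env_path = Path("backend/.env")
    if not env_path.exists():
        print("backend/.env not found")
    elif hf_token_looks_valid(read_env_file(env_path)):
        print("HF_TOKEN looks OK")
    else:
        print("HF_TOKEN missing or still the placeholder")
```

This only checks the format locally; whether the token has the right permission is only confirmed when the backend first calls the model.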

Firebase (Optional, for auth):

Create project at https://console.firebase.google.com
Get config from Project Settings → Web App
Add to frontend/.env

3. Run

```bash
# Terminal 1 - Backend (port 8000)
cd backend
python app.py

# Terminal 2 - Frontend (port 3000)
cd frontend
pnpm install
pnpm dev
```

Visit http://localhost:3000

📁 Project Structure

```
RetrievAI/
├── backend/
│   ├── app.py              # FastAPI main application
│   ├── rag_system.py       # Core RAG logic
│   ├── llm_provider.py     # HuggingFace LLM integration
│   ├── config.py           # Configuration
│   ├── vector_store.py     # ChromaDB vector database
│   ├── document_loader.py  # Document processing
│   ├── embeddings_provider.py
│   ├── db.py               # SQLite for metadata
│   ├── auth.py             # Firebase authentication
│   └── requirements.txt
│
├── frontend/
│   ├── client/
│   │   └── src/
│   │       ├── pages/
│   │       │   ├── LandingPage.tsx
│   │       │   ├── FeaturesPage.tsx
│   │       │   ├── AuthPage.tsx
│   │       │   └── WorkspacePage.tsx
│   │       ├── components/
│   │       ├── hooks/
│   │       ├── services/
│   │       └── types/
│   ├── package.json
│   └── vite.config.ts
│
├── NOTICE              # Attribution notice
├── LICENSE             # Apache 2.0
└── README.md           # This file
```

⚙️ Configuration

Backend (`backend/config.py`)

```python
# Model Selection
LLM_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
EMBEDDING_MODEL = "Alibaba-NLP/Qwen3-Embedding-0.6B"

# Generation Parameters
MAX_TOKENS = 2048      # Maximum tokens generated per response
TEMPERATURE = 0.3      # Lower = more consistent

# RAG Behavior
TOP_K_RESULTS = 4      # Number of chunks to retrieve
CHUNK_SIZE = 1000      # Maximum size of each document chunk
CHUNK_OVERLAP = 200    # Overlap between consecutive chunks

# Storage
VECTOR_DB_DIR = "./data/chroma"
DB_PATH = "./data/rag.db"
```
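To see what `CHUNK_SIZE` and `CHUNK_OVERLAP` mean in practice, here is a minimal sliding-window chunker. It is a sketch, not the project's `document_loader.py`: it assumes sizes are measured in characters and ignores sentence or paragraph boundaries, which a real splitter may respect.

```python
"""Sketch of fixed-size chunking with overlap (sizes assumed to be characters)."""
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into windows of `size` chars; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print([len(c) for c in chunk_text("x" * 2500)])
```

With the defaults, a 2,500-character document yields three chunks of 1000, 1000 and 900 characters, and each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.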

Recommended Models

| Model | Speed | Quality | Notes |
|---|---|---|---|
| `Qwen/Qwen2.5-72B-Instruct` | ⭐⭐ | ⭐⭐⭐⭐⭐ | Best quality |
| `Qwen/Qwen2.5-32B-Instruct` | ⭐⭐⭐ | ⭐⭐⭐ | Good balance |
| `mistralai/Mixtral-8x7B-Instruct-v0.1` | ⭐⭐⭐⭐ | ⭐⭐⭐ | Fast |

🔧 API Endpoints

Sessions

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/sessions` | Create new chat session |
| `GET` | `/sessions` | List user sessions |
| `PATCH` | `/sessions/{id}` | Rename session |
| `DELETE` | `/sessions/{id}` | Delete session |
| `POST` | `/sessions/{id}/clone` | Clone session |

Messages

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/sessions/{id}/messages` | Get chat history |
| `POST` | `/sessions/{id}/messages` | Send question, get RAG response |
| `PATCH` | `/sessions/{id}/messages/{msg_id}/pin` | Pin/unpin message |
| `GET` | `/sessions/{id}/export` | Export as markdown |

Documents

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/upload` | Upload document (PDF, DOCX, etc.) |
| `GET` | `/documents` | List uploaded documents |
| `GET` | `/documents/{id}/preview` | Preview document |
| `DELETE` | `/documents/{id}` | Delete document |

System

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/stats` | System statistics |
| `GET` | `/me` | Current user info |
| `DELETE` | `/clear` | Clear all user data |

📊 Query Response Format

```json
{
  "question": "What are the HR policies?",
  "response": "# Human Resources Policies\n\nThe company provides...",
  "sources": [
    {"source": "hr_manual.pdf", "chunk": 0, "id": "doc_123"}
  ],
  "num_sources": 1,
  "supported_by_documents": true,
  "confidence": {
    "score": 0.89,
    "label": "high"
  },
  "mode": "hybrid",
  "retrieval_ms": 45,
  "generation_ms": 3200
}
```
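On the client side, it can be convenient to load this JSON into typed objects. The sketch below follows the field names in the example response; the `Source` and `QueryResponse` class names are hypothetical, and `num_sources` is omitted because it can be derived from `len(sources)`.

```python
"""Sketch: typed view of the query response (class names are hypothetical)."""
import json
from dataclasses import dataclass


@dataclass
class Source:
    source: str  # original file name
    chunk: int   # chunk index within that file
    id: str      # document id


@dataclass
class QueryResponse:
    question: str
    response: str  # markdown answer
    sources: list[Source]
    supported_by_documents: bool
    confidence_score: float
    confidence_label: str
    mode: str
    retrieval_ms: int
    generation_ms: int

    @classmethod
    def from_json(cls, raw: str) -> "QueryResponse":
        data = json.loads(raw)
        return cls(
            question=data["question"],
            response=data["response"],
            # Pick known keys explicitly so extra source fields do not break parsing
            sources=[Source(source=s["source"], chunk=s["chunk"], id=s["id"])
                     for s in data.get("sources", [])],
            supported_by_documents=data["supported_by_documents"],
            confidence_score=data["confidence"]["score"],
            confidence_label=data["confidence"]["label"],
            mode=data["mode"],
            retrieval_ms=data["retrieval_ms"],
            generation_ms=data["generation_ms"],
        )
```

For the example above, `retrieval_ms + generation_ms` gives an end-to-end latency of 3245 ms, almost all of it spent in generation.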

Response Modes:

hybrid (default) — Uses documents if available, falls back to general knowledge
strict — Only answers from uploaded documents
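The difference between the two modes can be sketched as a small decision function. This illustrates the documented behavior only and is not the logic in `rag_system.py`; it assumes "documents available" means at least one chunk was retrieved (the real system may also apply a relevance threshold), and the function name and return values are hypothetical.

```python
"""Hypothetical sketch of hybrid vs strict mode routing."""


def choose_answer_strategy(mode: str, num_retrieved_chunks: int) -> str:
    """Return "documents", "general_knowledge", or "refuse"."""
    if mode not in ("hybrid", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    if num_retrieved_chunks > 0:
        return "documents"  # both modes answer from retrieved chunks when they exist
    # No supporting chunks: hybrid falls back to general knowledge, strict declines
    return "general_knowledge" if mode == "hybrid" else "refuse"
```

Strict mode trades coverage for trust: a question with no supporting chunks gets no answer instead of an unsupported one.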

Confidence Levels:

high (>0.75) — Strong document support
medium (0.45-0.75) — Moderate support
low (<0.45) — Weak or no document support
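These thresholds map directly to a small function. It is a sketch, not the project's scoring code; treating exactly 0.75 and exactly 0.45 as `medium` is an assumption read from the ranges as written.

```python
"""Sketch: map a confidence score to the documented labels."""


def confidence_label(score: float) -> str:
    """high above 0.75, medium from 0.45 to 0.75 inclusive, low below 0.45."""
    if score > 0.75:
        return "high"
    if score >= 0.45:
        return "medium"
    return "low"
```

For the example response above, a score of 0.89 maps to `high`.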

🚢 Deployment

Docker

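The backend and frontend are separate images, so each goes in its own Dockerfile, built from the repository root.

```dockerfile
# Backend
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .
CMD ["python", "app.py"]
```

```dockerfile
# Frontend
FROM node:18-alpine
WORKDIR /app
# With multiple sources, the destination must be a directory ending in /
COPY frontend/package.json frontend/pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install --frozen-lockfile
COPY frontend/ .
RUN pnpm build
CMD ["pnpm", "preview"]
```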

Railway / Render

Push to GitHub
Connect to your platform
Add environment variables (HF_TOKEN, Firebase config, etc.)
Deploy

🔒 Security Best Practices

Never commit .env files — Use .env.example templates
Rotate API keys regularly
Use Firebase Authentication for production
Set appropriate CORS origins in backend/.env
Enable HTTPS in production
Implement rate limiting for API endpoints
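One common way to rate-limit API endpoints is a per-user token bucket. The sketch below is hypothetical (not part of the project): the capacity and refill rate are placeholder values, keying buckets by Firebase UID is an assumption, and the clock is injectable so the logic can be tested deterministically.

```python
"""Hypothetical per-user token-bucket rate limiter."""
import time
from collections.abc import Callable


class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means the caller should get HTTP 429."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill proportionally to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def allow_request(user_id: str) -> bool:
    """Look up (or create) the bucket for one user, e.g. keyed by Firebase UID."""
    bucket = buckets.get(user_id)
    if bucket is None:
        bucket = buckets[user_id] = TokenBucket(capacity=10, refill_per_sec=0.5)
    return bucket.allow()
```

A bucket allows short bursts up to `capacity` while capping the sustained rate at `refill_per_sec`, which suits chat traffic better than a fixed per-minute counter. This in-memory version only works for a single backend process.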

🐛 Troubleshooting

| Issue | Solution |
|---|---|
| Invalid HF token | Create a new token at https://huggingface.co/settings/tokens with the "Inference Providers" permission |
| Model not found | Check the model name in `config.py`; wait about 20 s for the model to warm up |
| Slow responses | Switch to a faster model, reduce `MAX_TOKENS`, or upgrade your HF plan |
| Poor quality | Use a larger (7B+) model, keep `MAX_TOKENS` at 1024 or higher, lower `TEMPERATURE` |
| Rate limiting | Upgrade to HuggingFace Pro ($9/mo) or cache responses |

📈 Tech Stack

| Component | Technology |
|---|---|
| Backend API | FastAPI (Python) |
| LLM Provider | HuggingFace Inference API |
| Vector Database | ChromaDB |
| Embeddings | Alibaba-NLP/Qwen3-Embedding-0.6B |
| Authentication | Firebase Auth |
| Frontend | React 19 + TypeScript |
| Styling | Tailwind CSS |
| Animations | Framer Motion |
| Build Tool | Vite |

📝 License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

👤 Author

Muhammad Salman Khan

GitHub: @coder-msk
Email: salmanmarwat06@gmail.com

Version: 1.0
Status: ✅ Production Ready
