Intelligent Document Retrieval & AI-Powered Answers
A production-ready Retrieval-Augmented Generation (RAG) system that lets you upload documents, ask questions in natural language, and get precise, cited AI-powered answers with confidence scoring.
Screenshots: Features, How It Works, Document Q&A, General Knowledge.
- Multi-Format Documents: PDF, DOCX, TXT, HTML, and images
- Smart Chunking: Context-aware document splitting for optimal retrieval
- Vector Search: Fast semantic retrieval with ChromaDB
- Structured Responses: Clean markdown formatting with source citations
- Confidence Scoring: Know when answers are well-supported by your documents
- Firebase Authentication: Secure user sessions with Google login
- Session Management: Organize conversations into separate workspaces
- Pin Messages: Save important responses for quick access
- Session Cloning: Duplicate and modify conversations
- Export Chat: Download conversations as markdown
- Temporary Docs: Auto-cleanup of test documents
- FastAPI Backend: Fast, async Python API
- React + TypeScript Frontend: Modern, responsive UI with Framer Motion
- Type-Safe: Full TypeScript coverage
- RESTful API: Clean, documented endpoints
- Docker Ready: Containerization support
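To make the "Vector Search" feature concrete, here is a minimal sketch of top-k semantic retrieval using brute-force cosine similarity. It is an illustration only: ChromaDB performs this step internally with its own index, and the function names, chunk IDs, and toy 3-dimensional vectors below are hypothetical, not taken from the project's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=4):
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(cid, score) for score, cid in scored[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
chunks = {
    "doc_1:0": [1.0, 0.0, 0.0],
    "doc_1:1": [0.9, 0.1, 0.0],
    "doc_2:0": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

The default `k=4` mirrors the `TOP_K_RESULTS` setting in the configuration; a vector database replaces the linear scan with an approximate index so that lookups stay fast as the collection grows.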
- Python 3.11+
- Node.js 18+
- pnpm
- HuggingFace account with Inference API access
git clone https://github.com/coder-msk/RetrievAI---Built-for-Scalable-Reliable-AI-Responses.git
# git names the checkout directory after the repository
cd RetrievAI---Built-for-Scalable-Reliable-AI-Responses
# Backend setup
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your HF_TOKEN
# Frontend setup
cd ../frontend
pnpm install
cp .env.example .env
# Edit .env and add your Firebase config

HuggingFace Token (Required):
- Go to https://huggingface.co/settings/tokens
- Create a Fine-grained token with "Inference Providers" permission
- Add to backend/.env: HF_TOKEN=hf_your_token_here
Firebase (Optional, for auth):
- Create project at https://console.firebase.google.com
- Get config from Project Settings > Web App
- Add to frontend/.env
# Terminal 1 - Backend (port 8000)
cd backend
python app.py
# Terminal 2 - Frontend (port 3000)
cd frontend
pnpm install
pnpm dev

Visit http://localhost:3000
RetrievAI/
├── backend/
│   ├── app.py                  # FastAPI main application
│   ├── rag_system.py           # Core RAG logic
│   ├── llm_provider.py         # HuggingFace LLM integration
│   ├── config.py               # Configuration
│   ├── vector_store.py         # ChromaDB vector database
│   ├── document_loader.py      # Document processing
│   ├── embeddings_provider.py
│   ├── db.py                   # SQLite for metadata
│   ├── auth.py                 # Firebase authentication
│   └── requirements.txt
│
├── frontend/
│   ├── client/
│   │   └── src/
│   │       ├── pages/
│   │       │   ├── LandingPage.tsx
│   │       │   ├── FeaturesPage.tsx
│   │       │   ├── AuthPage.tsx
│   │       │   └── WorkspacePage.tsx
│   │       ├── components/
│   │       ├── hooks/
│   │       ├── services/
│   │       └── types/
│   ├── package.json
│   └── vite.config.ts
│
├── NOTICE                      # Attribution notice
├── LICENSE                     # Apache 2.0
└── README.md                   # This file
# Model Selection
LLM_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
EMBEDDING_MODEL = "Alibaba-NLP/Qwen3-Embedding-0.6B"
# Generation Parameters
MAX_TOKENS = 2048 # Response length
TEMPERATURE = 0.3 # Lower = more consistent
# RAG Behavior
TOP_K_RESULTS = 4 # Number of chunks to retrieve
CHUNK_SIZE = 1000 # Document chunk size
CHUNK_OVERLAP = 200 # Overlap between chunks
# Storage
VECTOR_DB_DIR = "./data/chroma"
DB_PATH = "./data/rag.db"

| Model | Speed | Quality | Notes |
|---|---|---|---|
| Qwen/Qwen2.5-72B-Instruct | ⭐⭐ | ⭐⭐⭐⭐⭐ | Best quality |
| Qwen/Qwen2.5-32B-Instruct | ⭐⭐⭐ | ⭐⭐⭐ | Good balance |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | ⭐⭐⭐⭐ | ⭐⭐⭐ | Fast |
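To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, the sketch below splits text with a plain sliding character window. This is an assumption made for illustration: the project describes its chunking as context-aware, so the real splitter in `document_loader.py` likely respects sentence or section boundaries, and `split_text` is a hypothetical name.

```python
CHUNK_SIZE = 1000     # characters per chunk, as in the configuration above
CHUNK_OVERLAP = 200   # characters shared between neighbouring chunks


def split_text(text, chunk_size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2,500-character sample yields windows starting at offsets 0, 800, and 1600.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_text(sample)
```

With the default values each chunk begins 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. A larger overlap reduces the chance that a relevant sentence is cut in half, at the cost of storing more redundant text.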
| Method | Endpoint | Description |
|---|---|---|
| POST | /sessions | Create new chat session |
| GET | /sessions | List user sessions |
| PATCH | /sessions/{id} | Rename session |
| DELETE | /sessions/{id} | Delete session |
| POST | /sessions/{id}/clone | Clone session |

| Method | Endpoint | Description |
|---|---|---|
| GET | /sessions/{id}/messages | Get chat history |
| POST | /sessions/{id}/messages | Send question, get RAG response |
| PATCH | /sessions/{id}/messages/{msg_id}/pin | Pin/unpin message |
| GET | /sessions/{id}/export | Export as markdown |

| Method | Endpoint | Description |
|---|---|---|
| POST | /upload | Upload document (PDF, DOCX, etc.) |
| GET | /documents | List uploaded documents |
| GET | /documents/{id}/preview | Preview document |
| DELETE | /documents/{id} | Delete document |

| Method | Endpoint | Description |
|---|---|---|
| GET | /stats | System statistics |
| GET | /me | Current user info |
| DELETE | /clear | Clear all user data |
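As a rough illustration of calling the question endpoint from a script, the sketch below builds a `POST /sessions/{id}/messages` request with the standard library. The request body fields (`question`, `mode`) and the bearer-token header are assumptions inferred from the response format and the Firebase auth feature; check the backend's request model before relying on them. `build_question_request` and the session ID are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend port from the run instructions


def build_question_request(session_id, question, token=None, mode="hybrid"):
    """Build (but do not send) a request that asks a question in a session."""
    # Assumed payload shape; the actual field names may differ.
    payload = json.dumps({"question": question, "mode": mode}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed auth scheme: a Firebase ID token sent as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/sessions/{session_id}/messages",
        data=payload,
        headers=headers,
        method="POST",
    )


req = build_question_request("abc123", "What are the HR policies?")

# To send it against a running backend:
# with urllib.request.urlopen(req) as resp:
#     answer = json.load(resp)
```

Separating request construction from sending keeps the example testable without a running server.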
{
"question": "What are the HR policies?",
"response": "# Human Resources Policies\n\nThe company provides...",
"sources": [
{"source": "hr_manual.pdf", "chunk": 0, "id": "doc_123"}
],
"num_sources": 1,
"supported_by_documents": true,
"confidence": {
"score": 0.89,
"label": "high"
},
"mode": "hybrid",
"retrieval_ms": 45,
"generation_ms": 3200
}

Response Modes:
- hybrid (default): Uses documents if available, falls back to general knowledge
- strict: Only answers from uploaded documents

Confidence Levels:
- high (>0.75): Strong document support
- medium (0.45-0.75): Moderate support
- low (<0.45): Weak or no document support
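The confidence thresholds above can be expressed as a small helper. This is a sketch that follows the documented ranges, with both boundary values (0.45 and 0.75) treated as "medium"; `confidence_label` and `needs_caution` are hypothetical names, not functions from the backend.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0, 1] to the documented label."""
    if score > 0.75:
        return "high"
    if score >= 0.45:
        return "medium"
    return "low"


def needs_caution(response: dict) -> bool:
    """True when an answer lacks document support or has low confidence."""
    unsupported = not response.get("supported_by_documents", False)
    low = response.get("confidence", {}).get("label") == "low"
    return unsupported or low


# Trimmed version of the example response shown above.
example = {
    "supported_by_documents": True,
    "confidence": {"score": 0.89, "label": "high"},
    "mode": "hybrid",
}
label = confidence_label(example["confidence"]["score"])
```

A client could use `needs_caution` to show a warning banner when a hybrid-mode answer falls back to general knowledge.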
# Backend
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install -r requirements.txt
COPY backend/ .
CMD ["python", "app.py"]

# Frontend
FROM node:18-alpine
WORKDIR /app
# With multiple sources, the COPY destination must be a directory ending in /
COPY frontend/package.json frontend/pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install
COPY frontend/ .
RUN pnpm build
CMD ["pnpm", "preview"]

- Push to GitHub
- Connect to your platform
- Add environment variables (HF_TOKEN, Firebase config, etc.)
- Deploy
- Never commit .env files; use .env.example templates
- Rotate API keys regularly
- Use Firebase Authentication for production
- Set appropriate CORS origins in backend/.env
- Enable HTTPS in production
- Implement rate limiting for API endpoints
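For the rate-limiting recommendation, one common approach is a token bucket. The sketch below is an in-memory, single-process illustration under that assumption; the project does not specify a technique, and `TokenBucket` is a hypothetical class, not part of the backend. A multi-worker deployment would need shared state (for example in Redis) instead.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per user: bursts of 5 requests, then 1 request per second.
buckets = {}


def is_allowed(user_id):
    bucket = buckets.setdefault(user_id, TokenBucket(capacity=5, refill_per_second=1.0))
    return bucket.allow()
```

In an API handler, a request for which `is_allowed(user_id)` returns `False` would typically be rejected with HTTP status 429.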
| Issue | Solution |
|---|---|
| Invalid HF token | Get a new token from https://huggingface.co/settings/tokens with the "Inference Providers" permission |
| Model not found | Check model name in config.py, wait 20s for model warmup |
| Slow responses | Switch to faster model, reduce MAX_TOKENS, upgrade HF plan |
| Poor quality | Use a 7B+ model, keep MAX_TOKENS at 1024 or higher, lower TEMPERATURE |
| Rate limiting | Upgrade to HuggingFace Pro ($9/mo), implement caching |
| Component | Technology |
|---|---|
| Backend API | FastAPI (Python) |
| LLM Provider | HuggingFace Inference API |
| Vector Database | ChromaDB |
| Embeddings | Alibaba-NLP/Qwen3-Embedding-0.6B |
| Authentication | Firebase Auth |
| Frontend | React 19 + TypeScript |
| Styling | Tailwind CSS |
| Animations | Framer Motion |
| Build Tool | Vite |
Copyright 2026 Muhammad Salman Khan
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Muhammad Salman Khan
- GitHub: @coder-msk
- Email: salmanmarwat06@gmail.com
Version: 1.0 | Status: Production Ready





