
coder-msk/RetrievAI---Built-for-Scalable-Reliable-AI-Responses


RetrievAI: Built for Scalable, Reliable AI Responses

Intelligent Document Retrieval & AI-Powered Answers

A production-ready Retrieval-Augmented Generation (RAG) system that lets you upload documents, ask questions in natural language, and get precise, cited AI-powered answers with confidence scoring.


📸 Screenshots

Landing Page: [screenshot: Hero Section]

Features & How It Works: [screenshots: Features; How It Works]

Call to Action: [screenshot: CTA Section]

Workspace: [screenshots: Document Q&A (Document Chat); General Knowledge (RAG Chat)]

✨ Features

Core RAG Capabilities

  • 📄 Multi-Format Documents: PDF, DOCX, TXT, HTML, and images
  • 🧠 Smart Chunking: Context-aware document splitting for better retrieval
  • 🔍 Vector Search: Fast semantic retrieval with ChromaDB
  • 💬 Structured Responses: Clean Markdown formatting with source citations
  • 📊 Confidence Scoring: Know when answers are well supported by your documents
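To make "semantic retrieval" concrete, here is a minimal, dependency-free sketch of top-k search by cosine similarity over chunk embeddings. It is an illustration, not RetrievAI's code: the project delegates this step to ChromaDB (which supports several distance functions; cosine is used here for clarity), and the `top_k_chunks` helper, the chunk ids, and the 3-dimensional toy vectors are hypothetical. `k` plays the role of `TOP_K_RESULTS` in backend/config.py.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: dict[str, Sequence[float]],
    k: int = 4,  # mirrors TOP_K_RESULTS in config.py
) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best match first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones come from the embedding model.
chunks = {
    "hr_manual.pdf#0": [0.9, 0.1, 0.0],
    "hr_manual.pdf#1": [0.2, 0.8, 0.1],
    "faq.txt#0": [0.0, 0.1, 0.9],
}
print(top_k_chunks([1.0, 0.0, 0.0], chunks, k=2))
```

A vector database does the same ranking but uses an index so it does not have to score every stored chunk for each query.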

User Features

  • 👤 Firebase Authentication: Secure user sessions with Google login
  • 💾 Session Management: Organize conversations into separate workspaces
  • 📌 Pin Messages: Save important responses for quick access
  • 🔄 Session Cloning: Duplicate and modify conversations
  • 📥 Export Chat: Download conversations as Markdown
  • ⏱️ Temporary Docs: Auto-cleanup of test documents
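The README does not say how temporary documents are cleaned up or after how long. As a hedged sketch, assuming each document records a timezone-aware upload time and a temporary flag, a periodic job could select expired documents like this; `TempDoc`, `split_expired`, and the 24-hour TTL are hypothetical. A real cleanup would presumably also delete each expired document's vectors from ChromaDB and its metadata row in SQLite.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TempDoc:
    doc_id: str
    uploaded_at: datetime  # timezone-aware upload time
    temporary: bool = True  # hypothetical flag marking test documents


def split_expired(
    docs: list[TempDoc],
    now: datetime,
    ttl: timedelta = timedelta(hours=24),  # hypothetical retention window
) -> tuple[list[TempDoc], list[TempDoc]]:
    """Partition docs into (kept, expired); only temporary docs can expire."""
    kept, expired = [], []
    for doc in docs:
        if doc.temporary and now - doc.uploaded_at >= ttl:
            expired.append(doc)
        else:
            kept.append(doc)
    return kept, expired


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
docs = [
    TempDoc("old_test", now - timedelta(hours=30)),
    TempDoc("fresh_test", now - timedelta(hours=1)),
    TempDoc("policy", now - timedelta(days=90), temporary=False),
]
kept, expired = split_expired(docs, now)
print([d.doc_id for d in expired])  # ['old_test']
```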

Developer Features

  • 🚀 FastAPI Backend: Fast, async Python API
  • ⚛️ React + TypeScript Frontend: Modern, responsive UI with Framer Motion
  • 🎯 Type-Safe: Full TypeScript coverage in the frontend
  • 🔌 RESTful API: Clean, documented endpoints
  • 📦 Docker Ready: Containerization support

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • pnpm
  • HuggingFace account with access to Inference Providers

1. Clone and Setup

git clone https://github.com/coder-msk/RetrievAI---Built-for-Scalable-Reliable-AI-Responses.git RetrievAI
cd RetrievAI

# Backend setup
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your HF_TOKEN

# Frontend setup
cd ../frontend
pnpm install
cp .env.example .env
# Edit .env and add your Firebase config

2. Get Required API Keys

HuggingFace Token (Required):

  1. Go to https://huggingface.co/settings/tokens
  2. Create a Fine-grained token with "Inference Providers" permission
  3. Add to backend/.env: HF_TOKEN=hf_your_token_here
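Before starting the backend, a quick format check on the token can save a confusing startup error. This hypothetical helper only checks that HF_TOKEN is present, starts with `hf_`, and is not the placeholder value `hf_your_token_here`; it cannot tell whether the token has the right permissions. It reads the process environment, so export HF_TOKEN in your shell first (how app.py itself loads backend/.env is not documented here).

```python
import os
from collections.abc import Mapping

PLACEHOLDER = "hf_your_token_here"  # the example value shown in the setup steps


def hf_token_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with HF_TOKEN (an empty list means it looks OK)."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return ["HF_TOKEN is not set"]
    problems = []
    if token == PLACEHOLDER:
        problems.append("HF_TOKEN is still the placeholder value")
    elif not token.startswith("hf_"):
        problems.append("HF_TOKEN does not start with 'hf_'")
    return problems


if __name__ == "__main__":
    issues = hf_token_problems(os.environ)
    print("HF_TOKEN looks OK" if not issues else "; ".join(issues))
```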

Firebase (Optional, for auth):

  1. Create a project at https://console.firebase.google.com
  2. Get the web app config from Project Settings → Web App
  3. Fill in the matching values in frontend/.env

3. Run

# Terminal 1 - Backend (port 8000)
cd backend
python app.py

# Terminal 2 - Frontend (port 3000)
cd frontend
pnpm dev

Visit http://localhost:3000


📁 Project Structure

RetrievAI/
├── backend/
│   ├── app.py              # FastAPI main application
│   ├── rag_system.py       # Core RAG logic
│   ├── llm_provider.py     # HuggingFace LLM integration
│   ├── config.py           # Configuration
│   ├── vector_store.py     # ChromaDB vector database
│   ├── document_loader.py  # Document processing
│   ├── embeddings_provider.py
│   ├── db.py               # SQLite for metadata
│   ├── auth.py             # Firebase authentication
│   └── requirements.txt
│
├── frontend/
│   ├── client/
│   │   └── src/
│   │       ├── pages/
│   │       │   ├── LandingPage.tsx
│   │       │   ├── FeaturesPage.tsx
│   │       │   ├── AuthPage.tsx
│   │       │   └── WorkspacePage.tsx
│   │       ├── components/
│   │       ├── hooks/
│   │       ├── services/
│   │       └── types/
│   ├── package.json
│   └── vite.config.ts
│
├── NOTICE              # Attribution notice
├── LICENSE             # Apache 2.0
└── README.md           # This file

⚙️ Configuration

Backend (backend/config.py)

# Model Selection
LLM_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
EMBEDDING_MODEL = "Alibaba-NLP/Qwen3-Embedding-0.6B"

# Generation Parameters
MAX_TOKENS = 2048      # Maximum tokens generated per response
TEMPERATURE = 0.3      # Lower = more consistent

# RAG Behavior
TOP_K_RESULTS = 4      # Number of chunks to retrieve
CHUNK_SIZE = 1000      # Document chunk size
CHUNK_OVERLAP = 200    # Overlap between chunks

# Storage
VECTOR_DB_DIR = "./data/chroma"
DB_PATH = "./data/rag.db"
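CHUNK_SIZE and CHUNK_OVERLAP describe a sliding window: each chunk starts CHUNK_SIZE - CHUNK_OVERLAP units after the previous one, so neighbouring chunks share CHUNK_OVERLAP units of text. The sketch below assumes the units are characters (the README does not say whether they are characters or tokens) and uses plain fixed-size windows, which is simpler than the context-aware splitting listed under Features; `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Defaults mirror CHUNK_SIZE = 1000 and CHUNK_OVERLAP = 200 from config.py.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the defaults, a 2,500-character document becomes three chunks covering characters 0-999, 800-1799, and 1600-2499. The overlap keeps a sentence that straddles a boundary intact in at least one chunk.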

Recommended Models

| Model | Speed | Quality | Notes |
|---|---|---|---|
| Qwen/Qwen2.5-72B-Instruct | ⭐⭐ | ⭐⭐⭐⭐⭐ | Best quality |
| Qwen/Qwen2.5-32B-Instruct | ⭐⭐⭐ | ⭐⭐⭐ | Good balance |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | ⭐⭐⭐⭐ | ⭐⭐⭐ | Fast |

🔧 API Endpoints

Sessions

| Method | Endpoint | Description |
|---|---|---|
| POST | /sessions | Create new chat session |
| GET | /sessions | List user sessions |
| PATCH | /sessions/{id} | Rename session |
| DELETE | /sessions/{id} | Delete session |
| POST | /sessions/{id}/clone | Clone session |

Messages

| Method | Endpoint | Description |
|---|---|---|
| GET | /sessions/{id}/messages | Get chat history |
| POST | /sessions/{id}/messages | Send a question, get a RAG response |
| PATCH | /sessions/{id}/messages/{msg_id}/pin | Pin/unpin message |
| GET | /sessions/{id}/export | Export as Markdown |
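The export endpoint's exact Markdown layout is not documented. As a rough sketch of the idea, assuming each message carries a role, its text, and a pinned flag (hypothetical field names), a session could be rendered like this:

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user" or "assistant" (hypothetical field names)
    content: str
    pinned: bool = False


def export_markdown(title: str, messages: list[Message]) -> str:
    """Render a session as Markdown, marking pinned messages."""
    lines = [f"# {title}", ""]
    for msg in messages:
        heading = "Question" if msg.role == "user" else "Answer"
        if msg.pinned:
            heading += " (pinned)"
        lines += [f"## {heading}", "", msg.content.strip(), ""]
    # Join, drop trailing blank lines, and end with exactly one newline.
    return "\n".join(lines).rstrip() + "\n"


md = export_markdown(
    "HR questions",
    [Message("user", "What are the HR policies?"),
     Message("assistant", "The company provides...", pinned=True)],
)
print(md)
```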

Documents

| Method | Endpoint | Description |
|---|---|---|
| POST | /upload | Upload document (PDF, DOCX, etc.) |
| GET | /documents | List uploaded documents |
| GET | /documents/{id}/preview | Preview document |
| DELETE | /documents/{id} | Delete document |

System

| Method | Endpoint | Description |
|---|---|---|
| GET | /stats | System statistics |
| GET | /me | Current user info |
| DELETE | /clear | Clear all user data |
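For scripting against these endpoints, a small client helper could look like the following. It uses only the standard library and builds requests without sending them. Several details are assumptions, not documented behaviour: JSON request bodies, a `question` field for POST /sessions/{id}/messages (mirroring the response format), and a Firebase ID token passed as an `Authorization: Bearer` header.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # backend port from the Quick Start


def build_request(
    method: str,
    path: str,
    body: Optional[dict[str, Any]] = None,
    id_token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request to a RetrievAI endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if id_token:
        # Assumption: the backend expects a Firebase ID token as a Bearer token.
        req.add_header("Authorization", f"Bearer {id_token}")
    return req


# Assumption: the question is sent in a "question" field.
ask = build_request("POST", "/sessions/abc123/messages", {"question": "What are the HR policies?"})

# Against a running backend you would send it with:
# with urllib.request.urlopen(ask) as resp:
#     answer = json.load(resp)
```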

📊 Query Response Format

{
  "question": "What are the HR policies?",
  "response": "# Human Resources Policies\n\nThe company provides...",
  "sources": [
    {"source": "hr_manual.pdf", "chunk": 0, "id": "doc_123"}
  ],
  "num_sources": 1,
  "supported_by_documents": true,
  "confidence": {
    "score": 0.89,
    "label": "high"
  },
  "mode": "hybrid",
  "retrieval_ms": 45,
  "generation_ms": 3200
}
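Clients can map this payload onto typed objects. The sketch below uses only the field names shown in the example response; whether the backend ever omits or adds fields is not documented, so a missing key raises `KeyError` here. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Source:
    source: str  # file name of the cited document
    chunk: int   # index of the cited chunk
    id: str      # document id


@dataclass
class QueryResponse:
    question: str
    response: str  # Markdown answer
    sources: list[Source]
    num_sources: int
    supported_by_documents: bool
    confidence_score: float
    confidence_label: str
    mode: str
    retrieval_ms: int
    generation_ms: int


def parse_response(raw: str) -> QueryResponse:
    """Parse the JSON body of a RAG answer into typed fields."""
    data = json.loads(raw)
    return QueryResponse(
        question=data["question"],
        response=data["response"],
        sources=[Source(**s) for s in data["sources"]],
        num_sources=data["num_sources"],
        supported_by_documents=data["supported_by_documents"],
        confidence_score=data["confidence"]["score"],
        confidence_label=data["confidence"]["label"],
        mode=data["mode"],
        retrieval_ms=data["retrieval_ms"],
        generation_ms=data["generation_ms"],
    )


example = '''{"question": "What are the HR policies?",
  "response": "# Human Resources Policies\\n\\nThe company provides...",
  "sources": [{"source": "hr_manual.pdf", "chunk": 0, "id": "doc_123"}],
  "num_sources": 1, "supported_by_documents": true,
  "confidence": {"score": 0.89, "label": "high"},
  "mode": "hybrid", "retrieval_ms": 45, "generation_ms": 3200}'''
parsed = parse_response(example)
print(parsed.confidence_label, parsed.retrieval_ms + parsed.generation_ms)  # high 3245
```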

Response Modes:

  • hybrid (default): Uses documents if available, falls back to general knowledge
  • strict: Only answers from uploaded documents
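The difference between the two modes comes down to what happens when retrieval finds nothing useful. A minimal sketch of that decision, assuming "documents available" means at least one relevant chunk was retrieved (the actual relevance test is not documented); the strategy names are hypothetical:

```python
from enum import Enum


class Mode(str, Enum):
    HYBRID = "hybrid"  # default
    STRICT = "strict"


def choose_strategy(mode: Mode, num_relevant_chunks: int) -> str:
    """Pick an answering strategy from the mode and the number of relevant chunks."""
    if num_relevant_chunks > 0:
        return "answer_from_documents"  # both modes prefer document-grounded answers
    if mode is Mode.STRICT:
        return "decline"  # strict: never fall back to general knowledge
    return "general_knowledge"  # hybrid: fall back when nothing relevant was found


print(choose_strategy(Mode("strict"), 0))  # decline
print(choose_strategy(Mode.HYBRID, 0))  # general_knowledge
```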

Confidence Levels:

  • high (> 0.75): Strong document support
  • medium (0.45-0.75): Moderate support
  • low (< 0.45): Weak or no document support
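How the score itself is computed is not documented, but the label thresholds are. A sketch of the mapping, assuming both ends of the medium range are inclusive (so exactly 0.75 counts as medium). With it, the 0.89 in the example response maps to high, matching its label.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0, 1] to the high/medium/low labels."""
    if score > 0.75:
        return "high"    # strong document support
    if score >= 0.45:
        return "medium"  # moderate support (0.45-0.75, both ends inclusive here)
    return "low"         # weak or no document support


print(confidence_label(0.89))  # high
```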

🚒 Deployment

Docker

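# Backend Dockerfile (build context: repository root)
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .
# app.py must listen on 0.0.0.0 (not 127.0.0.1) for this port to be reachable
EXPOSE 8000
CMD ["python", "app.py"]

# Frontend Dockerfile (build context: repository root)
FROM node:18-alpine
WORKDIR /app
# With several sources, the COPY destination must be a directory ending in /
COPY frontend/package.json frontend/pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install --frozen-lockfile
COPY frontend/ .
RUN pnpm build
# vite preview listens on localhost by default; pass --host to expose it outside the container
CMD ["pnpm", "preview"]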

Railway / Render

  1. Push to GitHub
  2. Connect the repository to Railway or Render
  3. Add environment variables (HF_TOKEN, Firebase config, etc.)
  4. Deploy

🔒 Security Best Practices

  1. Never commit .env files; use the .env.example templates instead
  2. Rotate API keys regularly
  3. Use Firebase Authentication for production
  4. Set appropriate CORS origins in backend/.env
  5. Enable HTTPS in production
  6. Implement rate limiting for API endpoints
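For item 6, one common technique is a token bucket per user. This is a generic sketch, not part of RetrievAI: the class, the per-user dictionary keyed by a user id, and the limits (0.5 requests per second, bursts of 5) are hypothetical. It keeps state in process memory, so several backend workers would need a shared store instead; a rejected request would typically be answered with HTTP 429.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def allow_request(user_id: str) -> bool:
    """Hypothetical per-user check; a False result would map to HTTP 429."""
    bucket = buckets.get(user_id)
    if bucket is None:
        bucket = buckets[user_id] = TokenBucket(rate=0.5, capacity=5)
    return bucket.allow()
```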

🐛 Troubleshooting

| Issue | Solution |
|---|---|
| Invalid HF token | Create a new token at https://huggingface.co/settings/tokens with the "Inference Providers" permission |
| Model not found | Check the model name in config.py; if the model is still loading, wait about 20 s and retry |
| Slow responses | Switch to a faster model, reduce MAX_TOKENS, or upgrade your HF plan |
| Poor quality | Use a 7B+ model, keep MAX_TOKENS at 1024 or higher (default 2048), and lower TEMPERATURE |
| Rate limiting | Upgrade to HuggingFace Pro ($9/mo) or cache responses |
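For transient failures such as a model that is still warming up or a rate-limit response, retrying with exponential backoff is a common remedy. A generic sketch: `RetryableError` is a hypothetical marker exception, and which errors the HuggingFace client actually raises for these cases is an assumption you would need to map yourself. With the defaults it waits 5 s, 10 s, then 20 s between attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (model warming up, HTTP 429/503)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, ...
    raise ValueError("attempts must be at least 1")


# Usage with a stand-in call that fails twice before succeeding:
calls_seen = []


def flaky_call() -> str:
    calls_seen.append(1)
    if len(calls_seen) < 3:
        raise RetryableError("model is loading")
    return "answer"


print(with_retries(flaky_call, sleep=lambda seconds: None))  # answer
```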

📈 Tech Stack

| Component | Technology |
|---|---|
| Backend API | FastAPI (Python) |
| LLM Provider | HuggingFace Inference API |
| Vector Database | ChromaDB |
| Embeddings | Alibaba-NLP/Qwen3-Embedding-0.6B |
| Authentication | Firebase Auth |
| Frontend | React 19 + TypeScript |
| Styling | Tailwind CSS |
| Animations | Framer Motion |
| Build Tool | Vite |

📝 License

Copyright 2026 Muhammad Salman Khan

Licensed under the Apache License, Version 2.0. See LICENSE for details.

👤 Author

Muhammad Salman Khan


Version: 1.0 | Status: ✅ Production Ready

About

RetrievAI: A scalable, full-stack RAG system. Features a Python backend, TypeScript frontend, and optimized vector retrieval for reliable AI-driven responses.
