RAG Chat Application

Retrieval-Augmented Generation (RAG) chat experience built with a FastAPI backend and a React/Vite frontend. Users can ask questions grounded in the project knowledge base (RAG_doc.md). Local sentence-transformer embeddings keep requests free and fast, while OpenAI embeddings remain available when needed.

Features

  • Full RAG pipeline: document chunking, hybrid embedding support, ChromaDB vector store, retrieval, prompt assembly, and OpenAI generation.
  • Batch embedding generation with async endpoints so long indexing jobs never block the API.
  • Detailed logging piped into responses so the UI can show a "thinking" trace.
  • SQLite (default) or Postgres chat history storage via SQLAlchemy async sessions.
  • Fast response caches (backend LRU + frontend memoization) so repeated questions return instantly.
  • Responsive chat interface that stays editable even when a response is pending.
  • Automatic fallback to local embeddings whenever the configured OpenAI embedding model is unavailable.
  • OpenAI-style chat surface with a bright theme, custom avatars, hover-only scrollbars, cache badges, and a Gemini-like composer pill.
  • Sidebar history panel with client-side elastic-style search (token scoring + ranking) so users can instantly filter through prior answers without new API calls.

Architecture Overview

Backend (/backend)

  • FastAPI + Uvicorn web service with /chat, /health, /reindex, and /history.
  • rag_service.py loads RAG_doc.md, chunks it, builds embeddings (local or OpenAI), retrieves top k matches, and prompts OpenAI for grounded answers.
  • vector_store.py wraps a persistent ChromaDB collection stored under backend/chroma_db.
  • database.py / models.py define the async SQLAlchemy engine and ChatHistory model (defaults to sqlite+aiosqlite:///./chat_history.db).
  • .env controls OpenAI keys, embedding providers, DB URLs, etc. (.env.example documents every flag).
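
The retrieve-augment-generate flow that rag_service.py implements can be sketched as follows. This is an illustrative outline only; the function names (embed, retrieve, generate) are hypothetical stand-ins, not the repo's actual API.

```python
# Illustrative sketch of the retrieve -> augment -> generate flow.
# The callables passed in (embed, retrieve, generate) are hypothetical
# stand-ins for the real embedding, ChromaDB, and OpenAI clients.

def build_prompt(context_chunks, question):
    """Assemble a grounded prompt that separates context from the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def answer(question, embed, retrieve, generate, k=3):
    """Run one RAG turn: embed the query, fetch top-k chunks, prompt the LLM."""
    query_vector = embed(question)
    chunks = retrieve(query_vector, k=k)
    return generate(build_prompt(chunks, question))
```

The key design point is that retrieval happens before generation, so the model only ever sees question text plus the retrieved context.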

Frontend (/frontend)

  • React 18 app bootstrapped with Vite.
  • src/components/Chat.jsx handles the conversation flow, optimistic user messages, streaming-style typing indicator, and log viewer.
  • Axios is used for all HTTP calls; CSS modules keep styling isolated.
  • Local cache short-circuits repeated user prompts and the UI uses hover-only scrollbars, custom avatars, and a bright neutral palette inspired by ChatGPT.

Prerequisites

  • Python 3.10+
  • Node.js 18+ (with npm)
  • (Optional) OpenAI API key for generation/embeddings when not using local mode
  • Git Bash / WSL / PowerShell for running scripts on Windows

Backend Setup

cd backend
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
cp .env.example .env

Populate .env:

OPENAI_API_KEY=sk-your-key
USE_LOCAL_EMBEDDINGS=true          # default; set false to force OpenAI embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_MODEL=all-MiniLM-L6-v2   # local sentence-transformer slug
DATABASE_URL=sqlite+aiosqlite:///./chat_history.db
OPENAI_MODEL=gpt-4o-mini
RESPONSE_CACHE_SIZE=20             # in-memory cache entries for repeated questions

Leave USE_LOCAL_EMBEDDINGS=true if you do not want to spend API credits. When false, ensure OPENAI_EMBEDDING_MODEL is one your OpenAI project can access; otherwise the backend will automatically drop back to the local model.

Database options

  • SQLite (default): Ready out of the box (stored at backend/chat_history.db). Works great for local development.
  • Postgres: Set DATABASE_URL=postgresql+asyncpg://user:pass@host:port/dbname. If your password includes special characters (@, &, ^, etc.) percent-encode them. Create the database manually (CREATE DATABASE rag_chat;) before launching the API.
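
One way to percent-encode a password safely is with Python's standard library. The credentials below are placeholders, not real values from this project:

```python
# Percent-encode reserved characters in a Postgres password before
# embedding it in DATABASE_URL. Password and host are placeholders.
from urllib.parse import quote

password = "p@ss&word^1"            # raw password with reserved characters
encoded = quote(password, safe="")  # -> "p%40ss%26word%5E1"
url = f"postgresql+asyncpg://rag_user:{encoded}@localhost:5432/rag_chat"
```

Using quote(..., safe="") rather than quote_plus avoids turning spaces into "+", which can be misread inside the userinfo part of a URL.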

Frontend Setup

cd frontend
npm install

Running the App

Option A - helper script

./scripts/start.sh

Starts the backend on http://localhost:8000 and the Vite dev server on http://localhost:3000.

Option B - manual

# Terminal 1
cd backend
venv\Scripts\activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Terminal 2
cd frontend
npm run dev

Embedding Modes

  • Local (default): Uses sentence-transformers (all-MiniLM-L6-v2). No API calls or cost. Requires installing the Python dependencies listed in requirements.txt.
  • OpenAI: Set USE_LOCAL_EMBEDDINGS=false and provide a supported OPENAI_EMBEDDING_MODEL. Handy when you want higher-quality embeddings. If OpenAI returns 403 model_not_found, the backend logs the failure, switches back to the local model, and continues serving queries.
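
The fallback behavior described above can be sketched like this. The helper names (openai_embed, local_embed) are illustrative; the backend's actual implementation may differ:

```python
# Sketch of the OpenAI-to-local embedding fallback. The two embedding
# callables are hypothetical stand-ins for the real clients.
import logging

logger = logging.getLogger("rag")

def embed_batch(texts, use_local, openai_embed, local_embed):
    """Return embeddings, preferring OpenAI but dropping back to local on failure."""
    if use_local:
        return local_embed(texts)
    try:
        return openai_embed(texts)
    except Exception as exc:  # e.g. a 403 model_not_found from OpenAI
        logger.warning("OpenAI embeddings unavailable (%s); using local model", exc)
        return local_embed(texts)
```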

API Endpoints

Method  Endpoint   Description
GET     /          Health message.
GET     /health    Returns RAG readiness (chunks loaded, vector DB ready, API key present).
GET     /history   Latest 50 chat exchanges from the async SQL DB (frontend adds elastic-style search locally).
POST    /chat      Runs the full RAG pipeline for the provided message.
POST    /reindex   Clears ChromaDB and rebuilds embeddings from RAG_doc.md.

Requests to /chat expect { "message": "..." } and respond with { "response": "...", "logs": { "steps": [...] } }.
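
The /chat contract can be expressed as two small helpers. The field names follow the request/response shapes documented above; the helpers themselves are illustrative and not part of the backend:

```python
# Build and parse /chat payloads. Field names match the documented
# contract; these helpers are illustrative, not the repo's code.
import json

def build_chat_request(message: str) -> str:
    """Serialize the JSON body POSTed to /chat."""
    return json.dumps({"message": message})

def parse_chat_response(body: str):
    """Pull the answer text and the 'thinking' trace out of a /chat reply."""
    data = json.loads(body)
    return data["response"], data["logs"]["steps"]
```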

RAG Pipeline

  1. Load knowledge: Read RAG_doc.md and split into overlapping ~1000-character chunks.
  2. Embed: Generate embeddings in batches via sentence-transformers or OpenAI.
  3. Persist: Save embeddings + metadata in ChromaDB for instant lookups.
  4. Retrieve: Embed incoming queries and fetch the top k similar chunks.
  5. Augment prompt: Build a contextual prompt that clearly separates context and user question.
  6. Generate: Call OpenAI's chat completions API (default gpt-4o-mini) and log each step.
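
The chunking and retrieval steps above can be sketched in plain Python. The chunk size, overlap value, and the brute-force cosine search below are illustrative stand-ins for the real ChromaDB-backed implementation:

```python
# Sketch of steps 1 and 4: overlapping character chunking plus a
# brute-force cosine top-k search standing in for ChromaDB.
import math

def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping fixed-size character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = sorted(zip(chunk_vecs, chunks),
                    key=lambda pair: cosine(query_vec, pair[0]),
                    reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.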

Updating the Knowledge Base

  1. Edit RAG_doc.md (keep it concise and factual for better retrieval).
  2. Restart the backend or call POST /reindex to wipe and rebuild the ChromaDB collection.
  3. Wait for the “Knowledge base initialization complete” log before sending new questions.

Tip: the UI does not reindex automatically. Use a curl command (curl -X POST http://localhost:8000/reindex) or add a temporary button when needed.

Troubleshooting

  • ModuleNotFoundError: sentence-transformers - run pip install -r backend/requirements.txt inside the backend venv.
  • OpenAI 403 / model_not_found: either switch USE_LOCAL_EMBEDDINGS=true or configure OPENAI_EMBEDDING_MODEL to a permitted model; the service will fall back automatically but logs the warning.
  • Port conflicts: stop existing services or edit the port numbers in scripts/start.sh.
  • Frontend build errors: run npm install from frontend/, then npm run build or npm run dev.
  • Cache disabled/too small: set RESPONSE_CACHE_SIZE in .env (0 disables caching, higher values improve hit rate for repeated questions at the cost of RAM).
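
A minimal LRU cache with the RESPONSE_CACHE_SIZE=0 "disabled" behavior described above could look like this. It is a sketch only; the backend's actual cache may differ in detail:

```python
# Minimal LRU response cache. Size 0 disables caching entirely,
# matching the RESPONSE_CACHE_SIZE semantics described above.
from collections import OrderedDict

class ResponseCache:
    def __init__(self, max_size: int):
        self.max_size = max_size
        self._store = OrderedDict()

    def get(self, question):
        if self.max_size <= 0 or question not in self._store:
            return None
        self._store.move_to_end(question)  # mark as most recently used
        return self._store[question]

    def put(self, question, answer):
        if self.max_size <= 0:  # size 0 disables caching entirely
            return
        self._store[question] = answer
        self._store.move_to_end(question)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used
```

Higher sizes trade RAM for a better hit rate on repeated questions, exactly as the troubleshooting note describes.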

Reference Docs

  • DESIGN.md – design tokens, layout rules, and UI conventions (composer sizing, sidebar behavior, etc.).
  • ERRORS.md – mapping between backend error strings and the friendly text shown in the UI.
  • IMPLEMENTATION_PLAN.md – rolling backlog and open technical tasks.
  • FOLDER_STRUCTURE.md – quick refresher on the repo layout.

License

Demo project for showcasing RAG patterns. Use internally or extend at your own risk.
