A Retrieval-Augmented Generation (RAG) chat experience built with a FastAPI backend and a React/Vite frontend. Users ask questions grounded in the project knowledge base (`RAG_doc.md`). Local sentence-transformer embeddings keep requests free and fast, while OpenAI embeddings remain available when needed.
- Full RAG pipeline: document chunking, hybrid embedding support, ChromaDB vector store, retrieval, prompt assembly, and OpenAI generation.
- Batch embedding generation with async endpoints so long indexing jobs never block the API.
- Detailed logging piped into responses so the UI can show a "thinking" trace.
- SQLite (default) or Postgres chat history storage via SQLAlchemy async sessions.
- Two-tier caching (backend LRU + frontend memoization) so repeated questions return instantly.
- Responsive chat interface that stays editable even when a response is pending.
- Automatic fallback to local embeddings whenever the configured OpenAI embedding model is unavailable.
- OpenAI-style chat surface with a bright theme, custom avatars, hover-only scrollbars, cache badges, and a Gemini-like composer pill.
- Sidebar history panel with client-side elastic-style search (token scoring + ranking) so users can instantly filter prior answers without new API calls.
- FastAPI + Uvicorn web service exposing `/chat`, `/health`, `/reindex`, and `/history`.
- `rag_service.py` loads `RAG_doc.md`, chunks it, builds embeddings (local or OpenAI), retrieves the top-`k` matches, and prompts OpenAI for grounded answers.
- `vector_store.py` wraps a persistent ChromaDB collection stored under `backend/chroma_db`.
- `database.py`/`models.py` define the async SQLAlchemy engine and the `ChatHistory` model (defaults to `sqlite+aiosqlite:///./chat_history.db`).
- `.env` controls OpenAI keys, embedding providers, DB URLs, etc. (`.env.example` documents every flag).
- React 18 app bootstrapped with Vite.
- `src/components/Chat.jsx` handles the conversation flow, optimistic user messages, the streaming-style typing indicator, and the log viewer.
- Axios is used for all HTTP calls; CSS modules keep styling isolated.
- A local cache short-circuits repeated user prompts, and the UI uses hover-only scrollbars, custom avatars, and a bright neutral palette inspired by ChatGPT.
- Python 3.10+
- Node.js 18+ (with npm)
- (Optional) OpenAI API key for generation/embeddings when not using local mode
- Git Bash / WSL / PowerShell for running scripts on Windows
```bash
cd backend
python -m venv venv
venv\Scripts\activate       # Windows
# source venv/bin/activate  # macOS/Linux
pip install -r requirements.txt
```
```bash
cp .env.example .env
```

Populate `.env`:

```env
OPENAI_API_KEY=sk-your-key
USE_LOCAL_EMBEDDINGS=true    # default; set false to force OpenAI embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_MODEL=all-MiniLM-L6-v2    # local sentence-transformer slug
DATABASE_URL=sqlite+aiosqlite:///./chat_history.db
OPENAI_MODEL=gpt-4o-mini
RESPONSE_CACHE_SIZE=20    # in-memory cache entries for repeated questions
```

Leave `USE_LOCAL_EMBEDDINGS=true` if you do not want to spend API credits. When it is `false`, make sure `OPENAI_EMBEDDING_MODEL` is one your OpenAI project can access; otherwise the backend automatically drops back to the local model.
- SQLite (default): Ready out of the box (stored at `backend/chat_history.db`). Works great for local development.
- Postgres: Set `DATABASE_URL=postgresql+asyncpg://user:pass@host:port/dbname`. If your password includes special characters (`@`, `&`, `^`, etc.), percent-encode them. Create the database manually (`CREATE DATABASE rag_chat;`) before launching the API.
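The percent-encoding can be done with the standard library; the password below is a made-up example:

```python
# Build a safe asyncpg URL from a password containing special characters.
from urllib.parse import quote_plus

password = "p@ss&word^1"         # example credential only
encoded = quote_plus(password)   # -> "p%40ss%26word%5E1"
db_url = f"postgresql+asyncpg://rag_user:{encoded}@localhost:5432/rag_chat"
print(db_url)
```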
```bash
cd frontend
npm install
```

```bash
./scripts/start.sh
```

Starts the backend on http://localhost:8000 and the Vite dev server on http://localhost:3000.
```bash
# Terminal 1
cd backend
venv\Scripts\activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Terminal 2
cd frontend
npm run dev
```

- Local (default): Uses `sentence-transformers` (`all-MiniLM-L6-v2`). No API calls or cost. Requires the Python dependencies listed in `requirements.txt`.
- OpenAI: Set `USE_LOCAL_EMBEDDINGS=false` and provide a supported `OPENAI_EMBEDDING_MODEL`. Handy when you want higher-quality embeddings. If OpenAI returns `403 model_not_found`, the backend logs the failure, switches back to the local model, and continues serving queries.
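The fallback described above can be sketched roughly like this; the function names and exception handling are assumptions, not the repo's actual code:

```python
# Choose an embedding provider, falling back to local on OpenAI failure.
import os

def embed_texts(texts, openai_embed, local_embed):
    """openai_embed / local_embed are callables: list[str] -> list[list[float]]."""
    if os.getenv("USE_LOCAL_EMBEDDINGS", "true").lower() == "true":
        return local_embed(texts)
    try:
        return openai_embed(texts)
    except Exception as exc:  # e.g. a 403 model_not_found error
        print(f"OpenAI embeddings failed ({exc}); falling back to local model")
        return local_embed(texts)
```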
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health message. |
| GET | `/health` | Returns RAG readiness (chunks loaded, vector DB ready, API key present). |
| GET | `/history` | Latest 50 chat exchanges from the async SQL DB (the frontend adds elastic-style search locally). |
| POST | `/chat` | Runs the full RAG pipeline for the provided message. |
| POST | `/reindex` | Clears ChromaDB and rebuilds embeddings from `RAG_doc.md`. |

Requests to `/chat` expect `{ "message": "..." }` and respond with `{ "response": "...", "logs": { "steps": [...] } }`.
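A request can be issued with nothing but the standard library (assuming the backend is running on port 8000):

```python
# POST a question to /chat and decode the JSON reply.
import json
import urllib.request

def ask(message: str, base_url: str = "http://localhost:8000") -> dict:
    payload = json.dumps({"message": message}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # {"response": "...", "logs": {"steps": [...]}}
```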
- Load knowledge: Read `RAG_doc.md` and split it into overlapping ~1000-character chunks.
- Embed: Generate embeddings in batches via sentence-transformers or OpenAI.
- Persist: Save embeddings + metadata in ChromaDB for instant lookups.
- Retrieve: Embed incoming queries and fetch the top-`k` similar chunks.
- Augment prompt: Build a contextual prompt that clearly separates the context from the user question.
- Generate: Call OpenAI's chat completions API (default `gpt-4o-mini`) and log each step.
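The chunking step can be sketched as a sliding window; the exact size and overlap values used in the repo may differ:

```python
# Split text into ~1000-character chunks with a 200-character overlap so
# sentences near a boundary land in two adjacent chunks.
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```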
- Edit `RAG_doc.md` (keep it concise and factual for better retrieval).
- Restart the backend or call `POST /reindex` to wipe and rebuild the ChromaDB collection.
- Wait for the "Knowledge base initialization complete" log before sending new questions.

Tip: the UI does not reindex automatically. Use a curl command (`curl -X POST http://localhost:8000/reindex`) or add a temporary button when needed.
- `ModuleNotFoundError: sentence-transformers`: run `pip install -r backend/requirements.txt` inside the backend venv.
- OpenAI 403 / `model_not_found`: either switch `USE_LOCAL_EMBEDDINGS=true` or configure `OPENAI_EMBEDDING_MODEL` to a permitted model; the service falls back automatically but logs a warning.
- Port conflicts: stop existing services or edit the port numbers in `scripts/start.sh`.
- Frontend build errors: run `npm install` from `frontend/`, then `npm run build` or `npm run dev`.
- Cache disabled/too small: set `RESPONSE_CACHE_SIZE` in `.env` (0 disables caching; higher values improve the hit rate for repeated questions at the cost of RAM).
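The `RESPONSE_CACHE_SIZE` behavior can be modeled with a small LRU built on `OrderedDict`; this is an illustrative sketch, not the backend's actual class:

```python
# Bounded LRU response cache: evicts the least-recently-used entry once the
# limit is reached; a size of 0 disables caching entirely.
from collections import OrderedDict

class ResponseCache:
    def __init__(self, max_size: int):
        self.max_size = max_size
        self._data: OrderedDict[str, str] = OrderedDict()

    def get(self, question: str):
        if question in self._data:
            self._data.move_to_end(question)  # mark as recently used
            return self._data[question]
        return None

    def put(self, question: str, answer: str) -> None:
        if self.max_size <= 0:  # RESPONSE_CACHE_SIZE=0 disables caching
            return
        self._data[question] = answer
        self._data.move_to_end(question)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least-recently-used
```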
- `DESIGN.md` – design tokens, layout rules, and UI conventions (composer sizing, sidebar behavior, etc.).
- `ERRORS.md` – mapping between backend error strings and the friendly text shown in the UI.
- `IMPLEMENTATION_PLAN.md` – rolling backlog and open technical tasks.
- `FOLDER_STRUCTURE.md` – quick refresher on the repo layout.
Demo project for showcasing RAG patterns. Use internally or extend at your own risk.