An end-to-end conversational Retrieval-Augmented Generation (RAG) stack that keeps multi-user context, manages memory across sessions, and orchestrates seven specialized LangGraph agents. The stack now ships with a working API, Streamlit UI, uv-powered Docker images, and Gemini-powered embeddings by default.
- LangGraph orchestration with dedicated nodes for query understanding, rewriting, retrieval routing, context synthesis, memory management, summarization, and response shaping.
- Persistent memory via SQLAlchemy (SQLite by default; bring your own PostgreSQL/MySQL via `MEMORY_DB_URL`).
- Chroma vector store with Gemini `text-embedding-004` embeddings; no local `torch` download required.
- Document ingestion pipeline covering PDF (PyMuPDF), Markdown, and HTML with metadata extraction and chunk tracking.
- Full-stack experience: FastAPI backend + Streamlit UI, both dockerized and wired together through Docker Compose.
- Tracing-ready with LangSmith optional toggles per environment.
| Component | Status | Notes |
|---|---|---|
| FastAPI service | ✅ running on :8000 | Includes `/health`, `/query`, `/documents/upload`, `/sessions`, `/sessions/{id}`, `/sessions/{id}/close` |
| Streamlit UI | ✅ running on :8501 | Talks to the API over the internal Docker network |
| Docker images | ✅ docker-api + docker-ui | Built with Python 3.11-slim, uv, and cached `.env` defaults |
| Embeddings | ✅ Gemini text embeddings | Requires at least one `GEMINI_API_KEY_*` env var |
| PDF parsing | ✅ PyMuPDF installed | `langchain_community` loader now finds `pymupdf` |
```
etech_conversational_rag/
├── docker/            # Dockerfiles, compose stack, docs
├── sample_data/       # HTML, MD, PDF fixtures for ingestion demos
├── src/               # FastAPI app, LangGraph orchestration, agents, RAG, memory, UI
├── tests/             # Pytest suites for agents, graph, rag, api, memory
├── requirements.txt   # Shared runtime + dev dependencies
├── .env               # Runtime secrets (Gemini, LangSmith, etc.)
└── docs/              # System architecture and swagger files
```
Create a `.env` in the repo root (Docker copies it during the build). At minimum, supply one Gemini key; LangSmith is optional but enabled by default.
```env
# Gemini LLM + embedding keys (round-robin rotation supported)
GEMINI_API_KEY_0=AIza...
GEMINI_API_KEY_1=AIza...

# Optional LangSmith tracing
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_API_KEY=lsv2_...
LANGCHAIN_PROJECT=etech-conversational-rag-dev

# Memory + vector persistence (override for Postgres/MySQL)
MEMORY_DB_URL=sqlite:////app/data/memory/memory.db
CHROMA_PERSIST_DIR=/app/chroma_data

# UI ↔ API wiring
API_BASE_URL=http://api:8000
```

💡 Tip: To rotate keys automatically, add `GEMINI_API_KEY_{0..N}`. The built-in KeyManager handles round-robin, least-used, or random strategies.
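A minimal sketch of how round-robin rotation over `GEMINI_API_KEY_*` variables could work. The class name and env-var parsing below are illustrative, not the project's actual KeyManager implementation:

```python
import itertools
import os


class RoundRobinKeys:
    """Illustrative round-robin rotation over GEMINI_API_KEY_* env vars."""

    def __init__(self, env=os.environ):
        # Collect GEMINI_API_KEY_* entries in name order
        # (lexicographic; fine for indices 0-9 in a sketch).
        keys = sorted(
            (name, value)
            for name, value in env.items()
            if name.startswith("GEMINI_API_KEY_")
        )
        if not keys:
            raise RuntimeError("set at least one GEMINI_API_KEY_* variable")
        self._cycle = itertools.cycle(value for _, value in keys)

    def next_key(self) -> str:
        # Each call hands out the next key, spreading quota across keys.
        return next(self._cycle)
```

Each call to `next_key()` returns the next key in sequence and wraps around, so per-key quota is consumed evenly.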
1. Clone & enter the repo

   ```bash
   git clone <repo-url>
   cd etech_conversational_rag
   ```

2. Create `.env` (see snippet above). Secrets never ship in Git history.

3. Launch the stack

   ```bash
   cd docker
   docker compose up --build
   ```

   - Uses uv for dependency installs (2-3× faster than pip).
   - Named volumes persist chat memory (`memory_data`), uploads, and Chroma vectors between runs.

4. Visit the services

   | Service | URL | Purpose |
   |---|---|---|
   | Streamlit UI | http://localhost:8501 | Upload docs, create sessions, chat |
   | FastAPI | http://localhost:8000 | Base API (see `/health`, `/query`, `/documents/upload`, `/sessions`) |
   | API Docs | http://localhost:8000/docs | Swagger UI |

5. Stop everything with `Ctrl+C`, then `docker compose down` (add `-v` to wipe persisted data).
1. Create a Python 3.11 virtual env

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

2. Env setup: copy `.env` or use the `python-dotenv` CLI.

3. Run the API

   ```bash
   uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
   ```

4. Run the Streamlit UI (new terminal, same env)

   ```bash
   streamlit run src/ui/streamlit_app.py --server.port 8501
   ```

5. Optional: switch `MEMORY_DB_URL` to a managed Postgres instance for multi-host deployments.
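For example, an illustrative Postgres DSN for `MEMORY_DB_URL` (hostname, credentials, and database name are placeholders, not values the project ships with):

```env
MEMORY_DB_URL=postgresql+psycopg2://rag_user:rag_pass@db.internal:5432/rag_memory
```

SQLAlchemy DSNs follow the `dialect+driver://user:password@host:port/database` form, so the same variable also accepts a MySQL URL.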
- Use the Streamlit uploader or hit the API directly:

  ```bash
  curl -X POST http://localhost:8000/documents/upload \
    -F "file=@sample_data/md_files/sample_langchain_guide.md" \
    -F "document_type=markdown" \
    -F "document_name=langchain_guide" \
    -F 'metadata={"category":"docs","version":"1.0"}'
  ```
- The ingestion pipeline:
- Validates + normalizes file metadata.
- Chunks text (configurable size/overlap).
- Generates Gemini embeddings.
- Persists chunks + metadata to Chroma.
Sample fixtures live in sample_data/{md_files,pdf_files,html_files} for quick demos.
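The chunking step above can be sketched as a simple sliding window. The function name, default size, and overlap values here are illustrative, not the pipeline's actual configuration:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where each chunk overlaps its
    predecessor by `overlap` characters, preserving context at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

Each chunk (plus its metadata) is then embedded and written to Chroma; the overlap keeps sentences that straddle a boundary retrievable from either neighbor.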
- REST (non-streaming)

  ```bash
  curl -X POST http://localhost:8000/api/v1/query \
    -H "Content-Type: application/json" \
    -d '{
      "session_id": "session-abc",
      "user_id": "user-123",
      "query": "Summarize LangGraph routing",
      "include_metadata": true
    }'
  ```
- Streaming endpoints are not yet available on this build; the Streamlit UI hides the toggle (`STREAMING_SUPPORTED=False`).
```bash
# Run the focused suites that touch recent changes
pytest tests/rag/test_embeddings.py tests/rag/test_vector_store.py

# Or run the full suite with coverage
pytest tests -v --cov=src --cov-report=term-missing
```

All tests run against lightweight SQLite + in-memory stubs, so no external services are required.
```
┌────────────────────────────────────────────┐
│        Streamlit UI (chat, uploads)        │
└───────────────┬────────────────────────────┘
                │ REST (JSON)
┌───────────────▼────────────────────────────┐
│        FastAPI Backend (src/api)           │
│  • Session + memory endpoints              │
│  • Document ingestion                      │
│  • Query API                               │
└───────────────┬────────────────────────────┘
                │ invokes LangGraph
┌───────────────▼────────────────────────────┐
│   LangGraph Orchestrator (7 nodes)         │
│   query → rewrite → route → synthesize →   │
│   memory → summarize → respond             │
└───────────────┬────────────────────────────┘
       ┌────────┴────────┬────────┐
       │                 │        │
┌──────▼──────┐  ┌───────▼──────┐ ┌───────────────┐
│Memory Store │  │Chroma Vector │ │Gemini LLM/Emb │
│(SQLite/SQL) │  │DB (persisted)│ │API (cloud)    │
└─────────────┘  └──────────────┘ └───────────────┘
```
- Memory Store: SQLAlchemy models for conversations, summaries, and long-term facts. Default DSN is SQLite; override via `MEMORY_DB_URL`.
- Vector Store: Chroma persisted under `/app/chroma_data` (Docker volume `chroma_data`).
- Gemini: Used for both embeddings and generation; swap out by editing `src/rag/embeddings.py` and `src/config/llm_config.py` if needed.
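To make the memory store concrete, here is a hedged sketch of the kind of conversation table it persists, written against the stdlib `sqlite3` for brevity. The project itself defines SQLAlchemy models, and the table and column names below are assumptions, not the real schema:

```python
import sqlite3

# Illustrative schema only; the real models are SQLAlchemy classes and
# the table/column names here are assumptions for the sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    user_id    TEXT NOT NULL,
    role       TEXT NOT NULL,   -- 'user' or 'assistant'
    content    TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def recent_messages(conn: sqlite3.Connection, session_id: str, limit: int = 10):
    """Fetch the most recent turns for a session, returned oldest-first."""
    rows = conn.execute(
        "SELECT role, content FROM conversations "
        "WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return list(reversed(rows))
```

A query like `recent_messages` is what lets the orchestrator's memory node rebuild per-session context before summarization.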
| Symptom | Fix |
|---|---|
| `ModuleNotFoundError: pymupdf` | Already bundled in `requirements.txt`; re-run `uv pip install -r requirements.txt` or rebuild the Docker image. |
| 401 from Gemini | Ensure at least one `GEMINI_API_KEY_*` is valid and not rate-limited; keys rotate automatically but still obey quota. |
| LangSmith warnings | Set `LANGCHAIN_TRACING_V2=false` or remove `LANGCHAIN_API_KEY` in `.env`. |
| Docker build slow | Cached layers depend on `.env` + `requirements.txt`. After editing those, the uv install step reruns; otherwise it reuses the cache. |