Acuity — Production RAG with Hybrid Search + Evals

Hybrid BM25 + pgvector retrieval, citation-grounded answers, and a 48-question eval harness that gates every prompt change.

What it does

Acuity is an end-to-end RAG system over an arXiv corpus. It ingests papers, embeds them with OpenAI text-embedding-3-small into pgvector, and indexes content into a Postgres tsvector for BM25. Answers are served through a streaming SSE chat endpoint that fuses both retrievers via reciprocal rank fusion, optionally reranks with a cross-encoder, and grounds every claim with [Sₙ] citation markers that point back to the source chunk.

A separate /eval pipeline runs a fixed 48-question test set on a schedule and persists run history with P@5, R@5, MRR, and faithfulness plus four RAGAS metrics. The frontend dashboard shows a 28-day trend chart of those runs, and CI blocks merges if any metric regresses.
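The retrieval metrics named above follow standard definitions, so a minimal sketch may help make them concrete. The function names, the metric keys, and the `tolerance` parameter are illustrative assumptions, not the names used in this repository, and the gate shown is only one plausible reading of "any metric regresses".

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved ids that are relevant (ids assumed unique)."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of all relevant ids that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant id, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[str]], golds: Sequence[Set[str]]) -> float:
    """MRR: the mean of per-question reciprocal ranks."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in zip(runs, golds)) / len(runs)


def regressed_metrics(
    current: Dict[str, float], baseline: Dict[str, float], tolerance: float = 0.0
) -> List[str]:
    """Hypothetical CI gate: names of metrics that fell below baseline - tolerance."""
    return sorted(
        name for name, base in baseline.items()
        if current.get(name, 0.0) < base - tolerance
    )


ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
print(precision_at_k(ranked, relevant))   # 0.4
print(reciprocal_rank(ranked, relevant))  # 0.5
```

A CI step could exit non-zero whenever `regressed_metrics(...)` returns a non-empty list; whether the real gate allows a tolerance band is not stated here.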

Features

Hybrid retrieval — Postgres tsvector (BM25 via GIN index) ∪ pgvector HNSW, fused with reciprocal rank fusion at k = 60.
Confidence-gated reranking — sentence-transformers ms-marco-MiniLM-L-6-v2 reranks the top-15 candidates; skipped automatically when RRF score is already high.
Streaming citation-grounded generation — sse-starlette streams Claude's response with inline [Sₙ] markers; a post-hoc verifier discards completions whose claims lack a retrieved chunk.
Per-claim faithfulness scoring — entailment scored against the cited chunk; answers below 0.7 auto-retry once with widened k before failing.
Persistent eval harness — 48-question test set, 4 core + 4 RAGAS metrics, run history with git SHA + config snapshot; CI gate on regression.
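To illustrate the fusion step in the first two features, here is a minimal sketch of reciprocal rank fusion at k = 60 with a simple rerank gate. RRF itself is standard (each list contributes 1 / (k + rank) per document). The gate is an assumption: the README only says reranking is skipped "when RRF score is already high", so the `skip_above` threshold and the choice to look at the top fused score are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def should_rerank(fused: Sequence[Tuple[str, float]], skip_above: float = 0.032) -> bool:
    """Hypothetical gate: rerank unless the top fused score already clears skip_above.

    With two retrievers and k = 60 the maximum possible score is 2/61 (about 0.0328),
    reached when both lists rank the same chunk first.
    """
    return bool(fused) and fused[0][1] < skip_above


bm25_hits = ["c1", "c2", "c3"]      # lexical ranking, best first
vector_hits = ["c3", "c1", "c4"]    # ANN ranking, best first
fused = rrf_fuse([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # ['c1', 'c3', 'c2', 'c4']
```

Because RRF uses only ranks, the lexical and vector scores never need to be normalized against each other, which is the usual reason for choosing it in hybrid setups.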

Stack

Layer	Tech
Backend	Python 3.11, FastAPI, sse-starlette, SQLAlchemy 2 + asyncpg, Alembic, Pydantic 2
Storage	Postgres 16, pgvector (HNSW), Postgres `tsvector` + GIN (BM25)
Retrieval	reciprocal rank fusion (k = 60), sentence-transformers cross-encoder
Generation	Anthropic Claude `sonnet-4-6`, OpenAI `text-embedding-3-small`, tiktoken
Eval	custom + RAGAS metrics, persisted to `eval_runs`, CI gate on regression
Frontend	Next.js 14, TypeScript, Tailwind, Recharts
Ops	Docker Compose, structlog, slowapi rate limiting

Run locally

git clone https://github.com/phantomdev0826/acuity-rag
cd acuity-rag
cp .env.example .env       # add OPENAI_API_KEY + ANTHROPIC_API_KEY
docker compose up -d --build
docker compose exec backend alembic upgrade head
docker compose exec backend python -m scripts.seed_fake      # 3 papers / 12 chunks via OpenAI embeddings
docker compose exec backend python -m scripts.run_eval       # populate eval_runs

Open http://localhost:3000 for the chat UI, http://localhost:3000/eval for the metrics dashboard, and http://localhost:8000/docs for the OpenAPI explorer.

Architecture

                    ┌──────────────┐
  user query ──────▶│   /chat      │ ── stream ───────▶ frontend (SSE)
                    └─────┬────────┘
                          │
                  ┌───────┴────────┐
        ┌─────────▼──────┐  ┌──────▼─────────┐
        │ BM25 (Postgres │  │ pgvector ANN   │
        │ GIN/tsvector)  │  │ HNSW           │
        └─────────┬──────┘  └──────┬─────────┘
                  │                │
                  └──── RRF k=60 ──┘
                          │
                ┌─────────▼──────────┐
                │ cross-encoder      │
                │ rerank (gated)     │
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │ Claude generation  │
                │ + [Sₙ] markers     │
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │ per-claim          │
                │ entailment verifier│   ──── grounded < 0.7 → retry
                └─────────┬──────────┘
                          │
                          ▼
                   final SSE event
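The last stage of the diagram (retry when grounding falls below 0.7) can be sketched as a small control loop. This is an illustration under stated assumptions: the callables `retrieve`, `generate`, and `score` stand in for the real pipeline stages, and the concrete values `k=15` and `widened_k=30` are hypothetical, since the README does not say how far k is widened.

```python
from dataclasses import dataclass
from typing import Callable, List

FAITHFULNESS_THRESHOLD = 0.7  # threshold stated in the README


@dataclass
class Answer:
    text: str
    faithfulness: float
    attempts: int
    k_used: int


class LowFaithfulnessError(RuntimeError):
    """Raised when the answer is still below threshold after the single retry."""

    def __init__(self, answer: Answer):
        super().__init__(
            f"faithfulness {answer.faithfulness:.2f} below threshold "
            f"after {answer.attempts} attempts"
        )
        self.answer = answer


def answer_with_retry(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, List[str]], str],
    score: Callable[[str, List[str]], float],
    k: int = 15,           # assumed initial candidate count
    widened_k: int = 30,   # assumed widened count for the retry
) -> Answer:
    """Generate once; if faithfulness < 0.7, retry exactly once with a wider k."""
    last = None
    for attempt, k_used in enumerate((k, widened_k), start=1):
        chunks = retrieve(query, k_used)
        text = generate(query, chunks)
        last = Answer(text, score(text, chunks), attempt, k_used)
        if last.faithfulness >= FAITHFULNESS_THRESHOLD:
            return last
    raise LowFaithfulnessError(last)
```

Passing the stages in as callables keeps the loop testable with stubs, independent of the database and the model APIs.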

Tests

docker compose exec backend pytest

Unit tests cover the RRF combiner, the BM25/pgvector adapters, the citation extractor, and the entailment verifier.
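As an illustration of the kind of unit under test, here is a hedged sketch of a citation extractor together with a pytest-style check. It assumes the [Sₙ] markers appear in model output as ASCII forms such as [S1] and [S2], with n a 1-based index into the retrieved chunks; the function names are hypothetical and not taken from the repository.

```python
import re
from typing import List, Set

# Assumption: markers look like [S1], [S12]; n is a 1-based chunk index.
CITATION_RE = re.compile(r"\[S([0-9]+)\]")


def extract_citations(text: str) -> List[int]:
    """Return cited source indices in order of first appearance, without repeats."""
    seen: Set[int] = set()
    ordered: List[int] = []
    for match in CITATION_RE.finditer(text):
        idx = int(match.group(1))
        if idx not in seen:
            seen.add(idx)
            ordered.append(idx)
    return ordered


def dangling_citations(text: str, num_chunks: int) -> List[int]:
    """Cited indices that do not correspond to any retrieved chunk."""
    return [i for i in extract_citations(text) if not 1 <= i <= num_chunks]


def test_extract_and_validate_citations() -> None:
    answer = "RRF needs no score normalization [S2]. HNSW trades recall for speed [S1][S2]."
    assert extract_citations(answer) == [2, 1]
    assert dangling_citations("See [S4] and [S1].", num_chunks=3) == [4]
```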

License

MIT
