A RAG API for document Q&A: it ingests PDFs, retrieves the most relevant chunks, and returns grounded answers with low latency.
On upload, PDFs are extracted, chunked with overlap, embedded (768-d), and stored in PostgreSQL/pgvector for cosine search. On ask, the query is embedded, the top-k chunks are retrieved, and Gemini 1.5 Flash generates a context-bound answer, with Redis caching and SQLAlchemy-managed sessions. Endpoints: `POST /upload`, `POST /ask`.
Cuts manual reading time, keeps answers faithful to your documents, and runs locally with simple ops—fast to prototype, easy to productionize.
Tech stack

FastAPI, LangGraph, Google GenAI `embedding-001` (768-d), Gemini 1.5 Flash, PostgreSQL + pgvector (IVFFlat), Redis cache, SQLAlchemy, Docker Compose, `.env` config (`DATABASE_URL`, `REDIS_URL`, `GEMINI_API_KEY`), batching + exponential backoff, cosine similarity search.
Prerequisites

- Python 3.11+
- Docker Desktop
- Google AI Studio API key (`GEMINI_API_KEY`)
Quickstart

Create and activate a virtual environment:

```
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows PowerShell
source .venv/bin/activate      # macOS/Linux
```

Install dependencies:

```
pip install -r requirements.txt
```

Set up environment variables

Create `.env` in the repo root:

```
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/postgres
REDIS_URL=redis://localhost:6379
GEMINI_API_KEY=YOUR_GOOGLE_AI_STUDIO_KEY
```

Start Postgres + Redis:

```
docker compose -f docker-compose.dev.yml up -d
```

Initialize the database schema:

```
docker cp infra/sql_init.sql docs-ai-minimal-pg-1:/sql_init.sql
docker exec -it docs-ai-minimal-pg-1 psql -U postgres -f /sql_init.sql
```

Run the API server:

```
python -m uvicorn api.main:app --reload
```

Upload a PDF:

```
curl -X POST -F "file=@data/examplepdf.pdf" http://localhost:8000/upload
```

Ask a question:

```
curl -X POST http://localhost:8000/ask -H "Content-Type: application/json" -d "{\"question\":\"Summarize the document in 2 sentences.\"}"
```

- `/upload` → Extracts PDF text, chunks it (with overlap), generates embeddings via `google-genai` (`output_dimensionality=768`), and stores them in Postgres pgvector.
- `/ask` → Embeds the query, searches for similar chunks in pgvector, and sends the top results to Gemini 1.5 Flash for the answer.
Chunking config (in `core/ingest.py`):

```
def chunk(text: str, max_chars=2400, overlap=150):
    ...
```

- `max_chars` → chunk size
- `overlap` → shared characters between chunks
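The repo shows only the signature above; below is a minimal sketch of what an overlap chunker with these parameters typically does, not the project's exact implementation:

```python
def chunk(text: str, max_chars: int = 2400, overlap: int = 150):
    """Split text into windows of at most max_chars characters,
    each sharing `overlap` trailing characters with the previous one."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # step back so consecutive chunks share context across the boundary
        start = end - overlap
    return chunks
```

Larger `max_chars` means fewer embedding calls but coarser retrieval; the overlap keeps sentences that straddle a boundary retrievable from either side.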
Teardown (removes all DB + Redis data):

```
docker compose -f docker-compose.dev.yml down -v
```

High-level flow

- `POST /upload` → extract PDF text → chunk → embed chunks (batching + backoff) → store text + vectors in Postgres (pgvector).
- `POST /ask` → embed query → cosine search in pgvector → pass top docs as context to Gemini → return grounded answer.
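The `/ask` flow can be sketched as a plain function. Here `embed_query`, `search_similar`, and `generate` are illustrative stand-ins for the project's `core.llm` and `db.pg` helpers, not the repo's actual signatures:

```python
def answer_question(question, embed_query, search_similar, generate, top_k=5):
    """POST /ask flow: embed the query, fetch the top-k most similar
    chunks from pgvector, and generate a context-bound answer."""
    query_vec = embed_query(question)                  # 768-d query vector
    context_chunks = search_similar(query_vec, top_k)  # cosine search in pgvector
    return generate(question, context_chunks)          # grounded generation
```

Keeping the three stages as separate callables is what makes the pipeline easy to wire up as LangGraph nodes (retrieve → generate) and to stub out in tests.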
Directories

- `api/`
  - `main.py` – FastAPI app + router registration.
  - `routes.py` – HTTP endpoints: `/healthz`, `/upload`, `/ask`.
- `core/`
  - `ingest.py` – PDF → text, chunking; calls `embed_docs()` and persists rows.
  - `llm.py` – Text generation (Gemini 1.5 Flash) and embeddings (google-genai) with batching + retry/backoff.
  - `workflows.py` – Minimal RAG pipeline (retrieve → generate) built with LangGraph.
- `db/`
  - `session.py` – SQLAlchemy engine/session from `DATABASE_URL`.
  - `pg.py` – `insert_embeddings()` and `similarity_search()` with pgvector.
  - `redis.py` – tiny JSON cache for retrieved docs.
- `infra/`
  - `sql_init.sql` – pgvector schema (768-d vectors + IVFFlat index).
Endpoints

- `GET /healthz` – liveness probe.
- `POST /upload` (multipart form) – field `file` (PDF). Returns `{status, chunks}`.
- `POST /ask` (JSON) – body `{"question": "..."}`. Returns `{answer}`.
Chunking (`core/ingest.py`)

`chunk(text, max_chars=2400, overlap=150)` → the overlap preserves context across chunk boundaries.
Embeddings (`core/llm.py`)

- Model: `models/embedding-001` with `output_dimensionality=768` (matches `vector(768)`).
- Task pairing: docs → `RETRIEVAL_DOCUMENT`, queries → `RETRIEVAL_QUERY`.
- Batching: default `EMBED_BATCH_SIZE=24` (env override) with exponential backoff on 429/5xx.
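The batching-plus-backoff behavior can be sketched as below. `embed_batch` is a placeholder for the real google-genai embedding call, and the exact retry policy (doubling delay with jitter, five attempts) is an assumption, not necessarily what `core/llm.py` does:

```python
import random
import time

def embed_with_backoff(texts, embed_batch, batch_size=24, max_retries=5):
    """Embed texts in batches; retry each batch with exponential backoff
    on transient errors (e.g. 429/5xx from the embedding API)."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # retries exhausted: surface the error
                # back off 1s, 2s, 4s, ... plus jitter before retrying
                time.sleep(2 ** attempt + random.random())
    return vectors
```

Lowering `batch_size` (via `EMBED_BATCH_SIZE`) trades throughput for a smaller chance of hitting the rate limit in the first place.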
Retrieval (`db/pg.py`)

- Cosine distance: `ORDER BY (e.embedding <=> :query_vec)`.
- Returns top-k chunk texts to the generator.
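For reference, pgvector's `<=>` operator computes cosine distance, i.e. 1 minus cosine similarity, so ascending `ORDER BY` puts the most similar chunks first. In pure Python:

```python
import math

def cosine_distance(a, b):
    """What pgvector's `<=>` computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Identical directions give distance 0; orthogonal vectors give 1; magnitude is ignored, which is why the same metric works regardless of chunk length.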
Generation (`core/llm.py` → `generate()`)

- Model: `gemini-1.5-flash`.
- Prompt instructs the model to answer strictly from the provided context, and to say it can't find the answer if it's missing.
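A hypothetical sketch of how such a context-bound prompt might be assembled; the actual wording inside `generate()` may differ:

```python
def build_prompt(question, context_chunks):
    """Assemble a grounding prompt: the model must answer only from the
    retrieved chunks and admit when the answer is not present."""
    context = "\n\n---\n\n".join(context_chunks)
    return (
        "Answer strictly from the provided context. If the answer is not "
        "in the context, say you can't find it.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Putting the instruction before the context, and the question last, keeps the constraint and the query adjacent to the model's continuation point, a common pattern for grounded Q&A.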
Config

- `.env` – `DATABASE_URL`, `REDIS_URL`, `GEMINI_API_KEY`.
- (Optional) `EMBED_BATCH_SIZE` – tune request burst size to avoid throttling.
- Adjust `max_chars`/`overlap` in `core/ingest.py` for ingestion speed vs. recall.
