# Minimal RAG Backend (Text Uploads + GPT-5 Reasoning Token Streaming)
A tiny Go backend that:

- Ingests text files, stores them, and generates embeddings (Supabase/Postgres `vector` column).
- Performs RAG: retrieves the top-K documents and builds a compact prompt for completion.
- Exposes a non-streaming completion endpoint (MVP) and a streaming reasoning-tokens API (WebSocket/SSE) for GPT-5.
## Features

- Text upload (`multipart/form-data`) → file persisted locally (MVP) and embedded via OpenAI.
- Doc listing for a quick sanity check of what's retrievable.
- RAG completion (non-streaming): builds a short prompt from nearest neighbors and returns a single answer.
- Reasoning token streaming (design + endpoint): live token feed via WebSocket/SSE for the GPT-5 "thinking" stream.
- Strict JSON errors, CORS allow-list, timeouts, small input caps.
## API

Base path: `/`

### Health check

`GET /health` → `{"status":"ok"}`
### Upload a document

`POST /upload` (`multipart/form-data`, field: `myFile`)

- Stores the file on disk under `./files/`
- Creates an OpenAI embedding
- Inserts a row into the `docs` table: `{ name, location, vector }`
- Response: `{"message":"uploaded","name":"...","location":"files/upload-..."}`

The MVP accepts text files for embedding; non-text files are rejected unless you extract text upstream.
### List docs

`GET /docs`

Response: `{"docs":[{"name":"..."}, ...], "count": N}` (the MVP counts the returned results rather than running a slow exact count.)
### RAG completion

`POST /completion` with `Content-Type: application/json`:

```json
{ "prompt": "Your query here" }
```

- Retrieves the top-K documents (MVP: K=3) by cosine similarity on `docs.vector`
- Builds a compact prompt and calls the OpenAI chat completion API
- Response: `{"response":"final answer text"}`
### Reasoning tokens (streaming): design & contract
## Quickstart

### Prerequisites

- Go 1.22+
- Supabase project (or Postgres with a Supabase REST adapter)
- OpenAI API key
### Env

Create `.env`:

```
OPENAI_API_KEY=...
POSTGRES_URL=...   # Supabase URL
POSTGRES_KEY=...   # Supabase anon/service key (match your SDK usage)
```
### Database (minimal)

Table: `docs`

```sql
create table if not exists docs (
  id bigserial primary key,
  name text not null,
  location text not null,
  -- Adjust the dimension to the embedding model you use
  -- (e.g., text-embedding-3-small is 1536).
  vector vector(1536) not null
);
```
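For intuition, the similarity metric used for retrieval can be written out in Go (illustrative only: the real query runs in Postgres, where pgvector's `<=>` operator computes cosine distance, i.e. 1 - similarity):

```go
package main

import "math"

// cosineSimilarity returns the cosine similarity of two equal-length
// vectors: dot(a,b) / (|a| * |b|). This mirrors what a pgvector
// cosine-distance query computes server-side.
func cosineSimilarity(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0 // define similarity with a zero vector as 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}
```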
### Run

```sh
go run .
```

The server listens on `:8090` by default. The dev CORS allow-list includes `http://localhost:5173` and `"null"` (for `file://` testing).
### Example requests

Upload:

```sh
curl -F "myFile=@notes.txt" http://localhost:8090/upload
```

List:

```sh
curl http://localhost:8090/docs
```

Non-streaming completion:

```sh
curl -X POST http://localhost:8090/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt":"What does the design say about streaming tokens?"}'
```
## TODOs

### High priority

- Implement streaming
- Deploy to AWS
- Refactor handlers further into helpers
- Better handling of model output
- Chunking

### Medium priority

- Middleware layer for panic recovery, global logging, etc.
- Add an index to the database
- Make the frontend look nicer
- Authentication/authorization

### Low priority

- Reformat README
- Unit tests