Quickstart — RAG Support Assistant

The minimum steps to run the service locally and verify it works.

0. Requirements

Python 3.11+ (tested on 3.13)
Docker Desktop (for Postgres + Redis in dev and for regression eval)
~8 GB disk space for embeddings/reranker/cache; explicit Ollama mode requires additional space for models.

Per selected profile:

GraceKelly at D:\GraceKelly\ (port 8011) — default local orchestrator for Claude Sonnet 4.6 / GPT-5 / Gemini via Perplexity Pro.
Ollama (https://ollama.com/download) — for explicit local-first scenario or fallback.
Mistral API key (MISTRAL_API_KEY) — for direct Mistral fast-tier.

1. Dependencies

cd D:\RAG_Support_Assistant
python -m venv .venv
. .venv/Scripts/activate          # Windows PowerShell: . .venv\Scripts\Activate.ps1
pip install --require-hashes -r requirements.lock

2. Configuration

cp .env.example .env              # Windows: copy .env.example .env

Open .env and fill in the required values. Minimal scenarios:

| Scenario | Required variables |
| --- | --- |
| GraceKelly primary (default) | `GRACEKELLY_BASE_URL=http://127.0.0.1:8011`, `LLM_PROVIDER_PROFILE=gracekelly-primary` is implied |
| Local-only Ollama | `LLM_PROVIDER_PROFILE=local-first` |
| + Mistral fast tier | `MISTRAL_API_KEY=<key>` + `LLM_PROVIDER_PROFILE=external-mistral` |
| GraceKelly mixed routing (Claude Sonnet 4.6 reasoning) | `MISTRAL_API_KEY=<key>` + `LLM_PROVIDER_PROFILE=gracekelly-mixed` + `GRACEKELLY_REQUEST_TIMEOUT_SEC=120` |

Full list of variables — see README.md section Environment Variables.
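
A quick pre-flight check can catch a missing variable before the service starts. The sketch below transcribes the scenario table into a lookup; the helper name `missing_vars` and the `REQUIRED_VARS` mapping are hypothetical and not part of the project, and it assumes the values from `.env` have already been loaded into a dictionary or the process environment.

```python
# Required variables per profile, transcribed from the scenario table above.
# This helper is a hypothetical illustration, not part of the project.
REQUIRED_VARS = {
    "gracekelly-primary": ["GRACEKELLY_BASE_URL"],
    "local-first": [],
    "external-mistral": ["MISTRAL_API_KEY"],
    "gracekelly-mixed": ["MISTRAL_API_KEY", "GRACEKELLY_REQUEST_TIMEOUT_SEC"],
}

# The table states that gracekelly-primary is implied when no profile is set.
DEFAULT_PROFILE = "gracekelly-primary"


def missing_vars(env):
    """Return required variables that are unset or empty for the active profile."""
    profile = env.get("LLM_PROVIDER_PROFILE") or DEFAULT_PROFILE
    if profile not in REQUIRED_VARS:
        raise ValueError(f"unknown profile: {profile}")
    return [name for name in REQUIRED_VARS[profile] if not env.get(name)]
```

Calling `missing_vars(dict(os.environ))` after loading `.env` returns an empty list when the selected scenario is fully configured.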

3. Infrastructure (Postgres + Redis)

For dev — spin up disposable containers:

docker run -d --name rag-postgres -p 5432:5432 \
    -e POSTGRES_USER=rag -e POSTGRES_PASSWORD=rag_dev_password -e POSTGRES_DB=rag_assistant \
    postgres:16-alpine

docker run -d --name rag-redis -p 6379:6379 redis:7-alpine

Then run migrations:

alembic upgrade head
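
Freshly started containers can take a few seconds to accept connections, so a migration run immediately after `docker run` may fail. The sketch below waits until the two published ports (5432 and 6379, from the commands above) accept TCP connections. The function names are illustrative, and an open port is only a rough signal: it does not prove that Postgres is ready to serve queries.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(targets, attempts=30, delay=1.0):
    """Poll until every (host, port) pair accepts connections; return success."""
    for _ in range(attempts):
        if all(port_open(host, port) for host, port in targets):
            return True
        time.sleep(delay)
    return False
```

For the dev containers this would be called as `wait_for_ports([("127.0.0.1", 5432), ("127.0.0.1", 6379)])` before running the migrations.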

4. Scenario A — GraceKelly primary (default)

# Start GraceKelly in a separate terminal
cd D:\GraceKelly
uvicorn gracekelly.main:create_app --factory --host 127.0.0.1 --port 8011

# Launch RAG Support Assistant
cd D:\RAG_Support_Assistant
python main.py

Open http://localhost:8000/static/login.html (password + SSO) or http://localhost:8000/static/chat.html (chat UI). After login, go to /agent for the agent copilot dashboard. The legacy / index UI was removed on 2026-04-27 because it was unauthenticated; see SESSION-NOTES-2026-04-27.

The gracekelly-primary profile routes the fast and strong tiers through the local GraceKelly orchestrator. /api/health/ready checks GraceKelly readiness and does not require Ollama when the active profile does not use it.

5. Scenario B — explicit Local-only Ollama

# Start Ollama and pull models
ollama serve &
ollama pull qwen2.5:7b

# Launch with explicit local-first profile (Bash)
LLM_PROVIDER_PROFILE=local-first python main.py
# PowerShell (Windows): $env:LLM_PROVIDER_PROFILE = "local-first"; python main.py

6. Scenario C — GraceKelly mixed routing

Useful when you need reasoning quality (Claude Sonnet 4.6) for final answers but want background processing (classification, grade_docs, verify_facts) handled by the fast Mistral API.

Start GraceKelly (separate project):

cd D:\GraceKelly
$env:GRACEKELLY_EXECUTION_PROFILE = "hybrid"
uvicorn gracekelly.main:create_app --factory --host 127.0.0.1 --port 8011

In D:\RAG_Support_Assistant\.env:

MISTRAL_API_KEY=<your-key>
LLM_PROVIDER_PROFILE=gracekelly-mixed
GRACEKELLY_REQUEST_TIMEOUT_SEC=120

Launch RAG:
```
python main.py
```

The gracekelly-mixed profile routes the fast tier through the Mistral API (~1-3 s/call) and the strong tier (final answer) through the GraceKelly browser → Perplexity Pro (Claude Sonnet 4.6, ~30-60 s/call).
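
To get a feel for end-to-end latency under this profile, the per-call figures above can be combined into a rough estimate. The sketch assumes one fast-tier call per background step and strictly sequential execution. Both are simplifications for illustration, not measured behavior of the service.

```python
# Per-call latency ranges in seconds, taken from the figures quoted above.
FAST_TIER_SEC = (1, 3)      # Mistral API
STRONG_TIER_SEC = (30, 60)  # GraceKelly browser -> Perplexity Pro


def estimate_latency(fast_calls, strong_calls=1):
    """Return (best, worst) total seconds, assuming calls run one after another."""
    best = fast_calls * FAST_TIER_SEC[0] + strong_calls * STRONG_TIER_SEC[0]
    worst = fast_calls * FAST_TIER_SEC[1] + strong_calls * STRONG_TIER_SEC[1]
    return best, worst


# Three background steps (classification, grade_docs, verify_facts)
# plus one final answer on the strong tier.
print(estimate_latency(fast_calls=3))  # (33, 69)
```

Under these assumptions the strong-tier call dominates the total, and its upper bound of about 60 seconds sits well inside `GRACEKELLY_REQUEST_TIMEOUT_SEC=120`.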

7. Document ingestion and first query

# Upload a document for ingestion (PDF, MD, TXT)
# PowerShell (Windows) — note: curl.exe, not curl (which is the Invoke-WebRequest alias)
curl.exe -X POST http://localhost:8000/api/upload `
    -H "Authorization: Bearer <admin-jwt>" `
    -F "file=@docs/warranty.md"

# Bash (Linux/macOS)
curl -X POST http://localhost:8000/api/upload \
    -H "Authorization: Bearer <admin-jwt>" \
    -F "file=@docs/warranty.md"

# First query
# PowerShell (Windows)
curl.exe -X POST http://localhost:8000/api/ask `
    -H "Authorization: Bearer <admin-jwt>" `
    -H "Content-Type: application/json" `
    -d '{"question":"What is the warranty period?"}'

# Bash (Linux/macOS)
curl -X POST http://localhost:8000/api/ask \
    -H "Authorization: Bearer <admin-jwt>" \
    -H "Content-Type: application/json" \
    -d '{"question":"What is the warranty period?"}'

To get an admin JWT in dev, send POST /api/auth/login with the credentials admin/admin (if ADMIN_PASSWORD_HASH is not set in .env).
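
For scripted checks, the same query can be sent from Python without shell quoting differences between PowerShell and Bash. This sketch mirrors the `/api/ask` curl commands above using only the standard library. The function names, the 120-second default timeout, and the assumption that the endpoint returns a JSON body are illustrative and not confirmed by this guide.

```python
import json
import urllib.request


def build_ask_request(base_url, token, question):
    """Build the same POST /api/ask request as the curl commands above."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def ask(base_url, token, question, timeout=120):
    """Send the question and decode the response (assumed to be JSON)."""
    request = build_ask_request(base_url, token, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `ask("http://localhost:8000", "<admin-jwt>", "What is the warranty period?")`, with the token obtained from the login endpoint.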

8. Health checks

curl http://localhost:8000/api/health/live      # liveness
curl http://localhost:8000/api/health/ready     # readiness (dependencies)
curl http://localhost:8000/api/metrics          # metrics snapshot
curl -H "Authorization: Bearer <admin-jwt>" http://localhost:8000/api/admin/providers  # active routing profile + recent usage (requires auth)
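
When starting the service from a script, it can be convenient to block until the readiness endpoint responds. The sketch below polls a URL such as http://localhost:8000/api/health/ready. It assumes the endpoint answers HTTP 200 when ready and a non-200 status otherwise, which is a common convention but is not stated in this guide. The function names are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_ready(url, timeout=5.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and non-2xx responses (HTTPError).
        return False


def wait_until_ready(url, attempts=30, delay=2.0):
    """Poll the endpoint until it reports ready or attempts run out."""
    for _ in range(attempts):
        if is_ready(url):
            return True
        time.sleep(delay)
    return False
```

With the defaults this waits up to roughly a minute, plus any request timeouts, before giving up.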

9. Regression eval

For continuous quality checks against a curated 20-case dataset:

# Mock provider benchmark (no GK, no quota burn)
python scripts/regression_eval.py \
    --baseline ollama-small \
    --candidate mistral-small-latest \
    --max-cases 5 \
    --no-persist

# Live GK mixed routing (requires explicit paid/API opt-in)
python scripts/regression_eval.py \
    --baseline ministral-3b-latest \
    --candidate-profile gracekelly-mixed \
    --max-cases 20 \
    --allow-paid-apis

Without --allow-paid-apis, provider/model targets run in mock-provider-benchmark mode: answers and cost/latency metrics are simulated from evaluation/curated_cases.jsonl, so the command does not call GraceKelly or Mistral and does not persist to the DB when --no-persist is set. Live provider calls require explicit opt-in via --allow-paid-apis.

Results are written to reports/regression/<timestamp>-*.{json,md}. The PowerShell wrapper scripts\run_regression_via_gracekelly.ps1 -AllowLive starts disposable Postgres and Redis containers, runs ingestion, and then runs the regression in one command, after explicit live opt-in.
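
After several runs the report directory fills up, so a small helper for locating the newest report can save time. This sketch picks the most recently modified file by modification time rather than parsing the `<timestamp>` prefix, because the timestamp format is not specified here. The function name is illustrative.

```python
from pathlib import Path


def latest_report(report_dir="reports/regression", suffix=".json"):
    """Return the most recently modified report with the given suffix, or None."""
    candidates = [p for p in Path(report_dir).glob(f"*{suffix}") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Passing `suffix=".md"` selects the Markdown report instead of the JSON one.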

10. Common issues

:::caution[vector store is not initialized]
No ingested documents found. Upload via POST /api/upload or run the ingestion script.
:::

:::caution[[provider_unavailable]]
Circuit breaker is open at the adapter. Wait for cooldown (60s) or reset manually: POST /api/admin/circuit-breaker/reset.
:::

:::caution[[model_mismatch] … but UI shows 'Sonar']
The Perplexity server-side auto-router replaced the model. This is an external GK error, classified as infrastructure_failure in regression eval. Reconsider the query or try a different phrasing.
:::

:::caution[HF_HUB_OFFLINE=1]
If HF_HUB_OFFLINE=1 is set in env but BAAI/bge-m3 is not cached, embeddings will fail. Pull the model once with HF_HUB_OFFLINE empty, then re-enable.
:::

:::caution[Postgres DuplicateObject on ENUM]
Migration 012 is sensitive to duplicate CREATE TYPE. Fixed in d163942; if you hit this, update the branch.
:::

11. Where to go next

README.md — full list of env vars + public endpoints + Prometheus metrics.
docs/runbook.md — operational runbook for on-call (alerts, diagnostics, actions).
docs/disaster-recovery.md — DR scenarios A-F (data loss, encryption, encryption-key).
docs/operations/ — runbooks for backup, helm, gracekelly smoke.
docs/CHANGELOG.md — change history by arcs.
