Turn unstructured documents into schema-consistent knowledge graphs. Explore them visually. Ask grounded questions. Evaluate what to trust.
OntographRAG is an ontology-guided KG-RAG system for document intelligence. It builds Neo4j-backed knowledge graphs from raw text, retrieves over both graph structure and chunk vectors, and exposes answer-grounding and uncertainty signals for downstream use.
The project is organized around three interactive workflows plus one evaluation pipeline:
- Ingest: turn documents or benchmark corpora into named knowledge graphs.
- Explore: inspect entities, relationships, provenance, and graph structure.
- Ask: query the active graph with grounded RAG.
- Evaluate: benchmark KG-RAG against vanilla RAG and compare uncertainty measures from the CLI.
OntographRAG works across domains such as biomedical literature, legal documents, financial reports, and technical manuals. It is especially useful when schema consistency matters across many documents.
From a source checkout, the most reliable form is:
.venv/bin/python -m ontographrag.cli ...
If you want the shorter `ontograph ...` command, run `uv sync` after pulling the latest changes so the console script is installed into your virtualenv.
# 1. Clone and install
git clone https://github.com/julka01/OntographRAG.git
cd OntographRAG
uv sync
source .venv/bin/activate
# 2. Build the React frontend (required once after clone or after frontend changes)
cd frontend && npm install && npm run build && cd ..
# 3. Start Neo4j
docker compose up -d neo4j
# 4. Check readiness
.venv/bin/python -m ontographrag.cli doctor
# 5. Start the app
.venv/bin/python -m ontographrag.cli serve
# or directly:
.venv/bin/uvicorn ontographrag.api.app:app --host 0.0.0.0 --port 8000
# 6. Open the GUI
# → http://localhost:8000

Happy path in the app:
- select a file
- optionally attach an ontology
- create a named KG
- inspect the graph
- ask questions against the active KG
# 1. Clone and install
git clone https://github.com/julka01/OntographRAG.git
cd OntographRAG
uv sync
source .venv/bin/activate
docker compose up -d neo4j
# 2. Download benchmark datasets (see exact paths below)
mkdir -p MIRAGE/rawdata/{pubmedqa/data,hotpotqa,2wikimultihopqa,musique,multihoprag,realmedqa,bioasq/Task10BGoldenEnriched}
# PubMedQA — https://github.com/pubmedqa/pubmedqa
# Download test_set.json from the repo and place at:
# MIRAGE/rawdata/pubmedqa/data/test_set.json
# HotpotQA — https://hotpotqa.github.io/
wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_fullwiki_v1.json \
-O MIRAGE/rawdata/hotpotqa/hotpot_dev_fullwiki_v1.json
# 2WikiMultiHopQA — https://github.com/Alab-NII/2wikimultihop
# Download dev.json from the GitHub release and place at:
# MIRAGE/rawdata/2wikimultihopqa/dev.json
# MuSiQue — https://github.com/StonyBrookNLP/musique
# Download musique_ans_v1.0_dev.jsonl from the GitHub release and place at:
# MIRAGE/rawdata/musique/musique_ans_v1.0_dev.jsonl
# MultiHopRAG — https://github.com/yixuantt/MultiHop-RAG
# Download MultiHopRAG.json and corpus.json and place at:
# MIRAGE/rawdata/multihoprag/MultiHopRAG.json
# MIRAGE/rawdata/multihoprag/corpus.json
# RealMedQA — https://huggingface.co/datasets/k2141255/RealMedQA
# Download and place at: MIRAGE/rawdata/realmedqa/RealMedQA.json
# BioASQ — http://bioasq.org/participate/challenges (free registration required)
# Download Task10BGoldenEnriched and place at:
# MIRAGE/rawdata/bioasq/Task10BGoldenEnriched/10B1_golden.json
# Then build the shared PubMed abstract corpus:
python experiments/prepare_bioasq_corpus.py \
--bioasq-path MIRAGE/rawdata/bioasq/Task10BGoldenEnriched/10B1_golden.json \
--output MIRAGE/rawdata/bioasq/pubmed_abstracts.jsonl \
--email you@example.com \
--verbose
# 3. Run a cheap smoke test
python experiments/experiment.py \
--datasets hotpotqa --num-samples 30 --subset-seed 42 --rebuild-kg --evaluation-mode accuracy_only
# 4. Run a full metric pass (paper configuration: seed 42, n=100)
python experiments/experiment.py \
--datasets hotpotqa 2wikimultihopqa musique pubmedqa multihoprag \
--num-samples 100 --subset-seed 42 --rebuild-kg --evaluation-mode full_metrics \
--llm-provider openrouter --llm-model openai/gpt-4o-mini \
--retrieval-temperature-values 0.0
# 5. Add BioASQ
python experiments/experiment.py \
--datasets bioasq --num-samples 100 --subset-seed 42 --rebuild-kg --evaluation-mode full_metrics \
--llm-provider openrouter --llm-model openai/gpt-4o-mini \
--retrieval-temperature-values 0.0

Most GraphRAG tools (including Microsoft's GraphRAG) let an LLM freely decide what to extract, which often leads to type drift, duplicate entities, and schema inconsistency across documents. OntographRAG takes the opposite approach: you define the schema, and the system enforces it.
| | OntographRAG | Microsoft GraphRAG |
|---|---|---|
| Schema control | Bring your own OWL/RDF/JSON ontology; extraction is constrained to your types | LLM decides freely; no schema enforcement |
| Graph storage | Neo4j with named KGs, Cypher, vector indexes, and provenance | Parquet files in a local directory |
| Retrieval | Routed hybrid: entity-first linking, retriever-first graph expansion, vector fallback, and evidence organization | Community summarisation or entity search |
| Trust signals | App surfaces Structural and Grounding support; evaluation computes the full uncertainty suite | None by default |
| Interfaces | Web UI, REST API, CLI, experiments | CLI + Python library |
# Ingest a document into a running server
.venv/bin/python -m ontographrag.cli ingest report.pdf --kg-name demo-kg
# Explore the available graphs
.venv/bin/python -m ontographrag.cli explore list
.venv/bin/python -m ontographrag.cli explore show demo-kg
# Ask a grounded question
.venv/bin/python -m ontographrag.cli ask "What are the main findings?" --kg-name demo-kg
# Evaluate benchmark runs
.venv/bin/python -m ontographrag.cli evaluate --datasets hotpotqa --num-samples 30 --subset-seed 42

Supply a clinical ontology (SNOMED CT, ICD-10, HPO) and process patient notes, discharge summaries, or EHR exports in bulk. Because every patient's data is extracted into the same schema, the whole population becomes queryable as a single graph:
MATCH (p:Patient)-[:HAS_DIAGNOSIS]->(d:Diagnosis {name: "Hypertension"})
-[:CO_OCCURS_WITH]->(c:Diagnosis)
WHERE p.age > 60
RETURN c.name, count(*) AS frequency ORDER BY frequency DESC

Process a corpus of papers, extract entities and relationships consistently across documents, and ask cross-paper questions that single documents cannot answer alone.
Legal (case law entities), finance (company relationships), engineering (component hierarchies). Supply the domain ontology; OntographRAG handles the rest.
Supply a .owl / .rdf / .ttl ontology file and every extracted entity and relationship is validated against your schema. The same document processed twice produces the same graph shape. Across a corpus of documents, every entity lands in the same type hierarchy — enabling aggregation, comparison, and population-level queries that are impossible with free-form extraction.
This is the core differentiator. Without schema enforcement, LLMs produce synonym explosion ("myocardial infarction", "heart attack", "MI", "AMI" as four separate nodes), type drift (the same concept classified differently across documents), and graphs that can't be meaningfully queried at scale. The ontology collapses all of this into a consistent, traversable structure.
Without an ontology, extraction still works — the LLM infers types — but schema-constrained extraction is what unlocks population-level reasoning.
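The constraint step can be pictured as a filter over raw LLM output. A minimal sketch, assuming a flat allow-list of types and relation triples — all names here are illustrative, not OntographRAG's actual API:

```python
# Hypothetical schema derived from an ontology; the real system parses
# OWL/RDF/TTL with owlready2 and supports type hierarchies.
ALLOWED_TYPES = {"Diagnosis", "Medication", "Procedure"}
ALLOWED_RELATIONS = {
    ("Diagnosis", "CO_OCCURS_WITH", "Diagnosis"),
    ("Medication", "TREATS", "Diagnosis"),
}

def validate_extraction(entities, relationships):
    """Keep only entities and edges that conform to the schema."""
    kept_entities = [e for e in entities if e["type"] in ALLOWED_TYPES]
    name_to_type = {e["name"]: e["type"] for e in kept_entities}
    kept_rels = [
        r for r in relationships
        if r["source"] in name_to_type and r["target"] in name_to_type
        and (name_to_type[r["source"]], r["type"], name_to_type[r["target"]])
        in ALLOWED_RELATIONS
    ]
    return kept_entities, kept_rels

entities = [
    {"name": "Hypertension", "type": "Diagnosis"},
    {"name": "Aspirin", "type": "Medication"},
    {"name": "Dr. Smith", "type": "Person"},  # not in schema -> dropped
]
rels = [
    {"source": "Aspirin", "type": "TREATS", "target": "Hypertension"},
    {"source": "Aspirin", "type": "PRESCRIBED_BY", "target": "Dr. Smith"},  # dropped
]
ents, kept = validate_extraction(entities, rels)
```

Anything outside the allow-list is discarded rather than written to the graph, which is what prevents type drift from accumulating across documents.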
Graphs are persisted in Neo4j with:
- Vector indexes (384-dim `all-MiniLM-L6-v2` by default) for semantic search over chunks
- Named KGs — multiple independent graphs in one database, scoped by name tag
- Full Cypher access — query or extend the graph with any Cypher statement
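Conceptually, the chunk vector index answers cosine top-k queries over the embedded chunks. A toy sketch of that operation (3-dim vectors stand in for the real 384-dim embeddings; this is not the Neo4j API):

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Rank chunk vectors by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(-sims)[:k]
    return order.tolist(), sims[order].tolist()

chunks = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
])
idx, scores = top_k_chunks(np.array([1.0, 0.0, 0.0]), chunks)
# idx → [0, 1]
```

In the running system the same ranking is delegated to Neo4j's vector index rather than computed client-side.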
Queries do not rely on one brittle retrieval path. OntographRAG now uses a routed KG-RAG stack:
- Entity-first retrieval: symbolic matching plus per-entity ANN over entity embeddings
- Graph expansion: question-local traversal with provenance-aware edges
- PPR-style scoring: chunks are ranked by support flowing through the local entity subgraph, not only by hop count
- Retriever-first graph expansion: when entity anchoring is weak, dense passage retrieval seeds the graph instead
- Vector fallback: if graph signal is weak, the system falls back cleanly to vector retrieval rather than forcing a bad subgraph
This makes the retriever much closer to recent strong GraphRAG systems while preserving a single shared interface for vanilla RAG and KG-RAG.
Retrieved graph paths and supporting passages are organized into explicit reasoning chains before generation. In the app, the chat view surfaces two trust signals that are easy to interpret in practice:
- Structural: whether the answer is supported by graph paths
- Grounding: whether the retrieved evidence actually grounded the question
The live UI deliberately keeps these signals simple. The full uncertainty suite remains available in the evaluation pipeline.
A dedicated evaluation pipeline (experiments/uncertainty_metrics.py) computes uncertainty metrics per answer in three families. The core challenge: standard output-variance metrics collapse under KG-RAG because deterministic graph retrieval gives every sample identical context → identical outputs → artificially low variance. The structural and grounding families are designed to remain discriminative in this regime.
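The collapse is easy to see with discrete semantic entropy over response clusters. A minimal illustration (hypothetical cluster labels stand in for the NLI-based clustering):

```python
import math
from collections import Counter

def discrete_entropy(cluster_labels):
    """Entropy over empirical cluster frequencies of N sampled answers."""
    n = len(cluster_labels)
    return -sum((c / n) * math.log(c / n)
                for c in Counter(cluster_labels).values())

# Varied contexts -> varied answers -> several clusters -> entropy > 0.
vanilla_rag_clusters = ["A", "A", "B", "C", "B"]
# Deterministic KG retrieval -> identical context -> one cluster -> entropy 0,
# regardless of whether the single answer is right or wrong.
kg_rag_clusters = ["A", "A", "A", "A", "A"]
```

When every sample collapses into one cluster, the metric reports certainty even for confidently wrong answers, which is why the structural and grounding families below are needed.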
| Metric | Formula | Ref |
|---|---|---|
| `semantic_entropy` | NLI-cluster N responses with DeBERTa; entropy over cluster probabilities, H = −Σ_c p(c) log p(c) | Farquhar et al., Nature 2024 |
| `discrete_semantic_entropy` | Same clustering; entropy over empirical cluster frequencies p̂(c) = n_c / N | Farquhar et al., Nature 2024 |
| `p_true` | Fraction of samples in the same NLI cluster as the most probable response | Farquhar et al., Nature 2024 |
| `selfcheckgpt` | Pairwise NLI contradiction rate: contradictions / (2 × pairs) across all response pairs | Manakul et al., EMNLP 2023 |
| `sre_uq` | KME = weighted mean response embedding; dispersion around the KME under a Gaussian kernel | Vipulanandan et al., ICLR 2026 |
| `vn_entropy` ⭐ | L2-normalise embeddings; von Neumann entropy −Tr(ρ log ρ) of the trace-normalised Gram matrix | This work |
| `sd_uq` ⭐ | Gram-Schmidt: project out the question direction, then compute the same entropy on the residual embeddings | This work |
`vn_entropy` is a soft, parameter-free analogue of semantic entropy (no NLI, no threshold). `sd_uq` extends it by conditioning out the question direction, estimating the spread that remains in the answers once variation along the question embedding is removed.
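A minimal numpy sketch of the `vn_entropy` idea, based on our reading of the description above (not the repo's exact code): L2-normalise the response embeddings, form a trace-1 density matrix from their Gram matrix, and take the entropy of its eigenvalues.

```python
import numpy as np

def vn_entropy(embeddings):
    """Von Neumann entropy of the normalised Gram matrix of N response embeddings."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    rho = (E @ E.T) / E.shape[0]          # trace-1 density matrix
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]                # drop numerical zeros
    return float(-(lam * np.log(lam)).sum())

# All responses semantically identical -> entropy 0.
identical = np.tile(np.array([[1.0, 0.0]]), (5, 1))
# Responses split between two orthogonal meanings -> entropy ln 2.
spread = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
```

Because no NLI model or clustering threshold is involved, the score is a continuous function of the embedding geometry alone.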
No LLM sampling required. Path queries filtered to edges with confidence ≥ 0.4. Immune to context determinism.
| Metric | Formula | Intuition |
|---|---|---|
| `graph_path_support` (GPS) ⭐ | Find Q-entities and A-entities by name; per-entity reachability query ≤ 3 hops; GPS = fraction of answer entities with no high-confidence path from any question entity | 0 = KG fully supports the answer path; 1 = answer has no structural grounding. Single-sample metric; does not collapse under context determinism. |
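The GPS computation can be pictured with a toy in-memory graph — a BFS stand-in for the real confidence-filtered Cypher queries (adjacency map and entity names are hypothetical):

```python
from collections import deque

def reachable(graph, start, goal, max_hops=3):
    """Breadth-first search: is goal within max_hops of start?"""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return True
        if dist < max_hops:
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
    return False

def graph_path_support(graph, q_entities, a_entities, max_hops=3):
    """Fraction of answer entities unreachable from every question entity."""
    unsupported = [a for a in a_entities
                   if not any(reachable(graph, q, a, max_hops)
                              for q in q_entities)]
    return len(unsupported) / len(a_entities)

graph = {"Q1": ["X"], "X": ["A1"]}        # Q1 -> X -> A1 (2 hops)
gps = graph_path_support(graph, ["Q1"], ["A1", "A2"])
# A1 reachable, A2 not → gps = 0.5
```

The production metric additionally filters traversed edges by extraction confidence, so "reachable" means reachable through trusted edges only.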
NLI between retrieved chunks and the generated answer. Works even when all N samples receive identical context.
| Metric | Formula | Intuition |
|---|---|---|
| `support_entailment_uncertainty` (SEU) ⭐ | DeBERTa NLI(chunk → answer) per chunk; average over chunks, with entail → 0, neutral → 0.5, contradict → 1 | 0 = all chunks entail the answer; 0.5 = neutral; 1 = all chunks contradict. Key signal for abstentions and wrong-hop answers on multi-hop benchmarks. |
| `evidence_conflict_uncertainty` (ECU) ⭐ | Count entail–contradict chunk pairs; normalise by the number of chunk pairs | Variance complement to SEU. High = some chunks support while others contradict — genuine evidentiary conflict. Particularly diagnostic on multi-hop questions. |
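The aggregation step can be sketched with per-chunk NLI labels taken as given (the real pipeline produces them with a DeBERTa NLI model; the entail → 0 / neutral → 0.5 / contradict → 1 mapping is our reading of the intuition column, not necessarily the repo's exact scoring):

```python
from itertools import combinations

SCORE = {"entail": 0.0, "neutral": 0.5, "contradict": 1.0}

def seu(labels):
    """Mean chunk->answer entailment score: 0 = supported, 1 = contradicted."""
    return sum(SCORE[l] for l in labels) / len(labels)

def ecu(labels):
    """Fraction of chunk pairs in direct entail-vs-contradict conflict."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 0.0
    conflicts = sum(1 for a, b in pairs if {a, b} == {"entail", "contradict"})
    return conflicts / len(pairs)

labels = ["entail", "entail", "contradict", "neutral"]
```

Because both scores are functions of one answer against its retrieved chunks, they stay informative even when all N samples received identical context.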
After each run the pipeline computes per metric:
- AUROC — discriminates correct from incorrect answers using the metric as a score. Higher = better (0.5 = random, 1.0 = perfect).
- AUREC — Area Under the Rejection-Error Curve. Reject most uncertain questions first; measure error rate on retained questions at each rejection level. Lower = better.
⭐ = novel contribution of this work.
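The rejection-error idea behind AUREC can be sketched directly: sort questions by uncertainty, reject them one at a time, and average the error rate of the retained set. This is an illustration of the concept, not the pipeline's exact estimator:

```python
def aurec(uncertainties, correct):
    """Average retained-set error rate as the most uncertain questions are rejected."""
    order = sorted(range(len(correct)), key=lambda i: -uncertainties[i])
    retained = list(order)
    errors = []
    while retained:
        errors.append(sum(1 for i in retained if not correct[i]) / len(retained))
        retained.pop(0)      # reject the most uncertain remaining question
    return sum(errors) / len(errors)

# A useful metric puts high uncertainty on the wrong answers, so error
# falls quickly under rejection (lower AUREC).
good = aurec([0.9, 0.8, 0.1, 0.2], [False, False, True, True])
# An anti-correlated metric rejects the correct answers first (higher AUREC).
bad = aurec([0.1, 0.2, 0.9, 0.8], [False, False, True, True])
```

AUROC asks "does the score separate right from wrong?"; AUREC asks the operational question "how much does selective answering actually help?".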
Every endpoint accepts a provider + model pair. Supported providers: OpenRouter (free tier available), OpenAI, Google Gemini, Ollama (local), DeepSeek, HuggingFace. Switch model per request with no code changes.
- Quick Start
- Workflow Cheatsheet
- Setup
- Web UI
- API Reference
- Experiments
- Architecture
- Utility scripts
- Docker
- Configuration
- Python 3.11+
- Node.js 18+ and npm (for the React frontend)
- Neo4j 5.0+ (via Docker or local install)
- 8 GB RAM minimum (16 GB recommended for large documents)
# Python dependencies
uv sync # recommended
# or
pip install -r requirements.txt
# React frontend (required — the backend serves the built assets)
cd frontend && npm install && npm run build && cd ..

Copy .env.example to .env and fill in your values:
# ── Neo4j ───────────────────────────────────────────────────────────────────
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j
# ── LLM providers (at least one required) ───────────────────────────────────
OPENROUTER_API_KEY=your-key # free-tier models available
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=...
HF_API_TOKEN=...
OLLAMA_HOST=http://localhost:11434
# ── Embeddings ───────────────────────────────────────────────────────────────
EMBEDDING_PROVIDER=sentence_transformers # recommended runtime default; or openai
# ── Weights & Biases (optional — experiment tracking) ────────────────────────
WANDB_API_KEY= # if set, experiment runs are logged to W&B automatically
WANDB_PROJECT=ontographrag # default project name
# ── Security (production) ────────────────────────────────────────────────────
APP_API_KEY= # set to enforce API key auth on all endpoints
ALLOWED_ORIGINS=* # comma-separated origins for CORS
# ── Server ───────────────────────────────────────────────────────────────────
LOG_LEVEL=INFO
LLM_TIMEOUT_SECONDS=120

Chunking, retrieval thresholds, and benchmark sweeps are now controlled primarily by constructor defaults and CLI flags rather than by top-level environment variables. For the live benchmark flags, use experiments/README.md as the source of truth.
The web interface is served at http://localhost:8000. It is a React + TypeScript single-page app (frontend/) built with Vite. Run cd frontend && npm install && npm run build once after clone (or after any frontend changes) — the backend serves the built assets from frontend/dist/.
- Build KG — upload a document (PDF, TXT, CSV, JSON, XML ≤ 50 MB), choose provider/model, optionally attach an ontology file. Extraction progress streams to the UI in real time via SSE and the graph loads automatically on completion.
- Graph visualisation — interactive force-directed network. Node size scales with degree. Click a node to open its detail panel (type, properties, connected nodes).
- Search — dims non-matching nodes rather than hiding them; shows match count.
- Filter — per-type checkboxes with node/edge counts.
- Named KG management — create, list, and switch between multiple saved graphs.
- Ask questions against the active knowledge graph; answers cite source chunks.
- Trust pills — the chat surface exposes two lightweight support signals inline with each response:
- Structural: graph-path support for the answer
- Grounding: how well the retrieved evidence supports the question
- Chat history persisted in `localStorage`.
- Highlighted nodes — entities used in the answer are highlighted in the graph.
- Thinking indicator while waiting for the LLM response.
Server runs on port 8000. Interactive docs at http://localhost:8000/docs.
Authentication: set `APP_API_KEY` in `.env` to require `X-API-Key: <key>` on all requests. Unset = open (development mode).
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/create_ontology_guided_kg` | Build an ontology-guided KG from a file upload |
| `POST` | `/extract_graph` | Extract a raw KG (no ontology) from a file |
| `POST` | `/load_kg_from_file` | Load a graph from file into Neo4j |
| `GET` | `/kg_progress_stream` | SSE stream of KG build progress |
Multipart form:
| Field | Type | Default | Description |
|---|---|---|---|
| `file` | file | required | Document (PDF/TXT/CSV/JSON/XML, ≤ 50 MB) |
| `provider` | string | `openai` | LLM provider |
| `model` | string | `gpt-3.5-turbo` | Model name |
| `embedding_model` | string | `sentence_transformers` | Embedding backend for chunks and entities |
| `ontology_file` | file | optional | Custom ontology (.owl/.rdf/.ttl/.xml) |
| `max_chunks` | int | optional | Max text chunks to process (1..500) |
| `kg_name` | string | optional | Name tag for the resulting KG |
| `enable_coreference_resolution` | bool | `false` | Optional build-time coreference pass |
Response:
{
"kg_id": "uuid",
"kg_name": "my-kg",
"graph_data": { "nodes": [...], "relationships": [...] },
"method": "ontology_guided"
}

Server-Sent Events. Connect with EventSource:
data: {"line": "✓ Extracted 42 entities from chunk 3/10"}
data: {"done": true}
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/kg/create` | Create a named KG record |
| `GET` | `/kg/list` | List all KGs with document counts |
| `GET` | `/kg/{kg_name}` | Stats for a specific KG |
| `DELETE` | `/kg/{kg_name}` | Delete a KG |
| `GET` | `/kg/{kg_name}/entities` | List entities in a KG |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/save_kg_to_neo4j` | Persist an in-memory KG to Neo4j |
| `POST` | `/load_kg_from_neo4j` | Load a KG from Neo4j by name |
| `POST` | `/clear_kg` | Delete all nodes and relationships |
| `GET` | `/health/neo4j` | Connectivity check |
Rate limited: 30 requests/minute per IP.
JSON body:
| Field | Type | Default | Description |
|---|---|---|---|
| `question` | string | required | Question to answer (max 4096 chars) |
| `provider_rag` | string | `openrouter` | LLM provider |
| `model_rag` | string | `openai/gpt-oss-120b:free` | Model name |
| `kg_name` | string | optional | Restrict retrieval to a specific KG |
| `document_names` | string[] | `[]` | Restrict to specific documents |
| `session_id` | string | `default_session` | Session identifier |
Response:
{
"session_id": "default_session",
"message": "...",
"info": {
"sources": ["chunk_id_1"],
"model": "openai/gpt-oss-120b:free",
"chunk_count": 5,
"entity_count": 12,
"relationship_count": 8,
"confidence": 0.87,
"kg_confidence": 0.74,
"structural_support": 0.74,
"grounding_support": 0.81,
"guardrail": {},
"entities": { "used_entities": [...] }
}
}

The UI uses `structural_support` and `grounding_support` as the main trust signals. The older `confidence` field is still returned for compatibility, but it is no longer the primary app-facing summary.
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/validate_csv` | Validate a CSV before bulk processing |
| `POST` | `/bulk_process_csv` | Build KGs from all rows of a CSV |
| Field | Type | Default | Description |
|---|---|---|---|
| `file` | file | required | CSV file |
| `provider` | string | `openai` | LLM provider |
| `model` | string | `gpt-3.5-turbo` | LLM model |
| `text_column` | string | `full_report_text` | Column containing the text to process |
| `id_column` | string | optional | Column to use as document ID |
| `start_row` | int | `0` | First row to process |
| `batch_size` | int | `50` | Rows per batch |
`GET /models/{provider}` — lists available models for a provider.
# Build a KG with ontology
curl -X POST http://localhost:8000/create_ontology_guided_kg \
-F "file=@document.pdf" \
-F "provider=openrouter" \
-F "model=openai/gpt-4o-mini" \
-F "ontology_file=@schema.owl" \
-F "kg_name=my-kg"
# Ask a question
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"question": "What are the main concepts?", "kg_name": "my-kg", "provider_rag": "openrouter", "model_rag": "openai/gpt-4o-mini"}'
# Stream build progress
curl -N http://localhost:8000/kg_progress_stream
# List KGs
curl http://localhost:8000/kg/list
# Health check
curl http://localhost:8000/health/neo4j

The experiments/ directory runs the current benchmark pipeline for vanilla RAG vs KG-RAG across biomedical and multi-hop QA datasets. It now uses seeded deterministic subsets, dataset-scoped KGs, official-style answer EM/F1 where supported, and the current 15-metric uncertainty suite. See experiments/README.md for the live flag list and dataset caveats.
Weights & Biases integration — if WANDB_API_KEY is set in the environment, each run is automatically logged to W&B with per-question tables, per-metric AUROC/AUREC scores, and run metadata (dataset, model, seed, evaluation mode). No flags required; set the key and it activates. Results are also always written locally under results/runs/<run_id>/ regardless of W&B.
# 30-question smoke test
python experiments/experiment.py \
--datasets hotpotqa \
--num-samples 30 \
--subset-seed 42 \
--rebuild-kg \
--evaluation-mode accuracy_only
# Full uncertainty pass (paper configuration: seed 42, n=100)
python experiments/experiment.py \
--datasets hotpotqa 2wikimultihopqa musique pubmedqa multihoprag \
--num-samples 100 \
--subset-seed 42 \
--rebuild-kg \
--evaluation-mode full_metrics \
--retrieval-temperature-values 0.0

| Dataset | Task | Download |
|---|---|---|
| `pubmedqa` | Biomedical yes/no/maybe over source abstracts | pubmedqa/pubmedqa |
| `realmedqa` | Clinical recommendation QA over NICE guidance | RealMedQA on Hugging Face |
| `hotpotqa` | Multi-hop Wikipedia QA | HotpotQA |
| `2wikimultihopqa` | Multi-hop Wikipedia QA | 2WikiMultiHopQA |
| `musique` | Compositional multi-hop QA | MuSiQue |
| `multihoprag` | Multi-hop RAG benchmark with shared corpus | MultiHop-RAG |
| `bioasq` | Biomedical factoid / yes-no QA | bioasq.org (free registration; needs shared-corpus prep) |
Place downloaded files under MIRAGE/rawdata/ — see experiments/README.md for exact paths.
| Flag | Default | Description |
|---|---|---|
| `--num-samples` | all | Questions per dataset |
| `--subset-seed` | `42` | Deterministic question-subset seed |
| `--entropy-samples` | `5` | Responses per question for uncertainty metrics |
| `--similarity-thresholds` | `[0.1]` | Cosine similarity cutoffs to sweep |
| `--max-chunks-values` | `[10]` | Retrieved chunk counts to sweep |
| `--llm-provider` | `openai` | LLM provider |
| `--llm-model` | `gpt-4o-mini` | Model |
| `--datasets` | `pubmedqa bioasq` | Datasets to run |
| `--rebuild-kg` | `False` | Rebuild the dataset KG |
| `--max-kg-contexts` | unset | Cap the passages indexed into the KG build |
| `--dataset-kg-scope` | `evaluation_subset` | Build the KG from the selected subset or the full normalized dataset |
| `--allow-gold-evidence-contexts` | `False` | Controlled-evidence mode only; bypasses corpus-safety guardrails |
| `--no-llm-judge` | `False` | Disable LLM-as-judge and use heuristic matching only |
| `--judge-provider` | generation provider | Separate provider for the correctness judge |
| `--judge-model` | generation model | Separate model for the correctness judge |
| `--temperature` | `1.0` | Generation temperature for uncertainty sampling |
| `--retrieval-temperature-values` | `[0.0]` | Final-stage retrieval sampling temperature sweep |
| `--retrieval-shortlist-factor` | `4` | Overfetch factor for retrieval-temperature sampling |
| `--multi-temperature` | `False` | Also run T=0, 0.5, 1.0 output-side sweeps |
| `--evaluation-mode` | `full_metrics` | `accuracy_only` or `full_metrics` |
Run artifacts are written under results/runs/<run_id>/ and checkpoints under results/checkpoints/.
- Adjacent chunk expansion — when retrieval uses the `retrieval_vector` index, seed element IDs are resolved to their parent `Chunk` before expanding to positional neighbours, so answers split across chunk boundaries are correctly reassembled.
- Confidence-aware graph filtering — traversal queries and PPR subgraph fetch now apply `coalesce(r.confidence, 1.0) >= 0.4` to skip low-confidence edges extracted during KG build.
- GPS (Graph Path Support) — switched from full `[*1..N]` path enumeration (times out on dense graphs) to one query per answer entity with `LIMIT 1`, preserving the confidence filter that `shortestPath` cannot support. GPS now correctly returns non-zero values when answer entities are not reachable via high-confidence paths.
- SEU (Support Entailment Uncertainty) — now computed even when generation failed: retrieved chunks still exist and are evaluated against the expected answer as hypothesis. This turns SEU into a signal for why the model abstained (context didn't entail the answer vs. other failure modes). ECU receives the same fix.
- SPS (Subgraph Perturbation Stability) — entity caps tightened from 20 to 5 per query with 20 s timeout to prevent silent hangs.
- Ingest — file uploaded; PDF text extracted via PyMuPDF, plaintext decoded
- Chunk — deterministic overlapping text windows are created for passage-level extraction
- Ontology load — custom `.owl`/`.ttl` parsed (owlready2), or free-form extraction if none supplied
- LLM extraction — each chunk is processed with an ontology-constrained prompt; entities and relationships are returned as structured JSON
- Cross-chunk extraction — adjacent chunk pairs get a second pass for span-overflow relations that would otherwise be missed
- Entity harmonization — duplicate and synonym entities are merged; alias surfaces are retained in `synonyms`; the most specific compatible type wins
- Relationship provenance — edges are stamped with chunk-position, passage, and question-local provenance so later retrieval can stay passage-local when needed
- Specificity stats — entities receive `passage_count` and `node_specificity = 1 / passage_count` so generic hubs can be down-weighted at retrieval time
- Embed — chunks and entities are embedded; entity vectors are name-centered so short query mentions align cleanly at ANN lookup time
- Write — nodes, relationships, embeddings, and provenance are stored in Neo4j; entities are tagged by `kgName`; progress streams via SSE
- Entity-first seeding — when enabled, the system extracts named mentions from the question, runs symbolic alias matching plus per-entity ANN, and anchors retrieval on those entity seeds
- Question-local traversal — provenance-aware graph traversal keeps path hops local to the current KG scope and, for bundle-style benchmarks, to the current question bundle
- Graph scoring — local entity neighborhoods are ranked with PPR-style support flow rather than a fixed hop table alone
- Retriever-first graph expansion — if entity anchoring is weak, dense passage retrieval seeds a second graph-expansion pass from chunk-linked entities
- Fallback retrieval — if graph signal stays weak, the system falls back to vector retrieval and then text search rather than forcing a brittle graph answer
- Evidence organization — graph paths and supporting passages are grouped into chain-style evidence blocks before generation
- Answer synthesis — the LLM answers from the evidence block, while the app surfaces simplified Structural and Grounding support signals
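The routing policy in the steps above can be sketched as a cascade of retrieval stages with a score gate. The stage functions and the threshold below are illustrative stand-ins, not the real retriever's interface:

```python
def route_query(question, entity_first, retriever_first, vector_search,
                min_graph_score=0.5):
    """Try entity-first graph retrieval, then retriever-first expansion,
    then fall back to plain vector retrieval."""
    result = entity_first(question)
    if result and result["score"] >= min_graph_score:
        return {"route": "entity_first", **result}
    result = retriever_first(question)
    if result and result["score"] >= min_graph_score:
        return {"route": "retriever_first", **result}
    return {"route": "vector_fallback", **vector_search(question)}

# Toy stages: entity anchoring is weak, retriever-first expansion succeeds.
decision = route_query(
    "who founded X?",
    entity_first=lambda q: {"score": 0.2, "evidence": []},
    retriever_first=lambda q: {"score": 0.8, "evidence": ["chunk-7"]},
    vector_search=lambda q: {"score": 0.0, "evidence": []},
)
# decision["route"] → "retriever_first"
```

The point of the cascade is graceful degradation: a weak graph signal demotes the query to the next stage instead of forcing a brittle subgraph answer.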
frontend/ # React + TypeScript web UI (Vite)
├── src/
│ ├── components/ # Chat, graph, KG build, layout UI components
│ ├── hooks/ # useChat, useGraph, useModels, useHealth, ...
│ ├── context/ # AppContext, ThemeContext
│ └── types/ # Shared TypeScript types
└── dist/ # Built assets served by FastAPI (git-ignored; run npm run build)
ontographrag/
├── api/
│ └── app.py # FastAPI application, all endpoints; serves frontend/dist/
├── kg/
│ ├── builders/
│ │ ├── ontology_guided_kg_creator.py # OntologyGuidedKGCreator — core extraction, harmonization, Neo4j write
│ │ └── enhanced_kg_creator.py # UnifiedOntologyGuidedKGCreator — API-facing wrapper + CSV bulk ops
│ ├── chunking.py # Hierarchical chunking (large extraction chunks + small retrieval sub-chunks)
│ └── utils/
│ ├── common_functions.py # Shared helpers (embedding, text normalization)
│ └── constants.py # Default values and Neo4j label constants
├── rag/
│ ├── systems/
│ │ ├── enhanced_rag_system.py # KG-RAG: entity-first + PPR scoring + RFGE + vector fallback
│ │ └── vanilla_rag_system.py # Vanilla RAG: vector-only baseline with adjacent chunk expansion
│ ├── answer_guardrails.py # Runtime answer quality guardrails
│ └── retrieval_sampling.py # Retrieval temperature sampling helpers
├── schemas/
│ └── models.py # Pydantic models: Chunk, Entity, Relationship, KGContext, RetrievalResult
└── providers/
└── model_providers.py # LLM + embedding provider abstractions
experiments/
├── experiment.py # Main benchmark runner
├── uncertainty_metrics.py # 15-metric UQ suite (output, structural, grounding)
├── dataset_adapters.py # Dataset normalization and corpus-role metadata
├── prepare_bioasq_corpus.py # Build shared PubMed abstract corpus for BioASQ
└── visualize_results.py # Plotting utilities
| Component | Detail |
|---|---|
| Embeddings | all-MiniLM-L6-v2 (384-dim), runs locally on CPU |
| Vector similarity | Cosine, default threshold 0.1 |
| Chunk size | 1500 chars, 200 overlap |
| Graph database | Neo4j 5.0+ with vector indexes |
| Graph visualisation | React + force-directed graph (frontend) |
| File upload limit | 50 MB |
| Chat rate limit | 30 req/min per IP |
| KG build rate limit | 5 req/min per IP |
The supported product surface is the CLI, web app, and experiment runner. A few root-level Python modules remain because the app imports them directly:
| Module | Purpose |
|---|---|
| `graphDB_dataAccess.py` | Low-level Neo4j data access layer used by the API |
| `csv_processor.py` | CSV validation and bulk-processing helpers for the app |
| `shared/common_fn.py` | Shared text and embedding utilities used across modules |
# Neo4j only (recommended for development)
docker compose up -d neo4j
# Full stack (Neo4j + API server)
docker compose up -d
# Logs
docker compose logs -f
# Stop
docker compose down
# Neo4j Browser → http://localhost:7474
# Connect to bolt://localhost:7687

| Provider | Env var | Notes |
|---|---|---|
| `openrouter` | `OPENROUTER_API_KEY` | Recommended; free-tier models available |
| `openai` | `OPENAI_API_KEY` | GPT-3.5, GPT-4, GPT-4o |
| `gemini` | `GEMINI_API_KEY` | Gemini Pro, Flash |
| `ollama` | — | Local models; set `OLLAMA_HOST` if non-default |
| `huggingface` | `HF_API_TOKEN` | HuggingFace Inference API |
| `deepseek` | `DEEPSEEK_API_KEY` | DeepSeek Chat/Coder |
| Provider | Typical use | Notes |
|---|---|---|
| `sentence_transformers` (runtime default) | Local CPU/GPU embeddings | Uses the local MiniLM-family sentence-transformers path |
| `openai` | Hosted embeddings | Requires `OPENAI_API_KEY` |
| `huggingface`, `vertexai` | Advanced/provider-helper integrations | Supported by lower-level provider helpers, but the current runtime paths default to `sentence_transformers` or `openai` |
| Variable | Default | Effect |
|---|---|---|
| `EMBEDDING_PROVIDER` | `sentence_transformers` | Runtime embedding backend for app / retrieval |
| `OLLAMA_HOST` | `http://localhost:11434` | Base URL for local Ollama models |
| `APP_API_KEY` | unset | Enables API-key enforcement when present |
| `ALLOWED_ORIGINS` | `*` | CORS policy for the FastAPI server |
| `LLM_TIMEOUT_SECONDS` | `120` | Per-request LLM timeout |
These are no longer primarily env-var driven:
- KG build chunk windows and overlaps are code-level defaults in the builder / benchmark runner
- retrieval thresholds and chunk-count sweeps are controlled by CLI flags documented in experiments/README.md
- hybrid-retrieval behavior is controlled by retriever settings such as `retrieval_mode`, `use_rfge`, `use_ppr_scoring`, and `use_evidence_block`
MIT. See LICENSE for details.
Issues and feature requests: GitHub Issues
