ctxgraph

Typed knowledge graph for AI agents. Single Rust binary. Single SQLite file. One LLM call per write. Zero LLM calls for 90% of reads.

brew install rohansx/tap/ctxgraph
ctxgraph init
ctxgraph log "Migrated auth from Redis sessions to JWT. Chose JWT for stateless scaling."
ctxgraph query "why did we move away from Redis?"

Working spec: docs/CLARITY.md — product, decisions, the 5 pieces to build, launch pitch. Architecture: docs/ARCHITECTURE.md — as-built (§1-4) + v0.3 target (§5-14). Roadmap: docs/ROADMAP.md — 5 pieces + 12-week schedule + this-weekend todo. Benchmarks: docs/BENCHMARKS.md — measured F1 numbers + hostile-reader audit.

Headline result (measured 2026-05-13)

Same LLM, same fixture, same scoring code. ctxgraph's single-call schema-typed prompt vs Graphiti's 6-call pipeline:

Metric (Gemma 4 26B-A4B in both columns)	ctxgraph	Graphiti	Δ
entity F1 (pair-fuzzy)	0.819	0.824	-0.005
relation F1 (pair-fuzzy)	0.555	0.096	+0.459 (5.8×)
combined F1	0.687	0.460	+0.227

The win replicates with Gemma 4 31B (ctxgraph 0.739 vs Graphiti 0.467, +0.272 combined / +0.499 relation). It's architectural, not model-specific. Graphiti's 6-call pipeline tops out at combined F1 ≈ 0.46 regardless of which Gemma you feed it.

Fixture: 29 hand-labeled cross-domain episodes covering 25 domains (crates/ctxgraph-extract/tests/fixtures/cross_domain_v2.json). Scoring code: scripts/openrouter_bench.py + scripts/graphiti_openrouter_bench.py. Raw per-episode outputs: scripts/results/v0.9_cross_domain_v2/*.json. Total cost to reproduce: ~$0.15.

Full benchmark detail → docs/BENCHMARKS.md

How it works

                 ┌──────────────────────────────────────┐
                 │       WRITE PATH (one LLM call)       │
                 │  Tier 1: GLiNER2 ONNX (CPU, ~30ms)    │
                 │  Tier 2: NuExtract 2.0 (local Ollama) │
                 │  Tier 3: Cloud (only if needed)       │
                 │    Mode B default: Cerebras free       │
                 │    Paid: DeepInfra Gemma-4-26B-A4B    │
                 └──────────────────────────────────────┘
                                  │
                                  ▼
                 ┌──────────────────────────────────────┐
                 │       SQLite + FTS5 + sqlite-vec      │
                 │       bi-temporal edges, RRF search   │
                 └──────────────────────────────────────┘
                                  ▲
                                  │
                 ┌──────────────────────────────────────┐
                 │   READ PATH (zero LLM in 90% cases)   │
                 │  Simple (90%):                        │
                 │    verb → typed relation via cosine   │
                 │    embedding match (~30 LOC)          │
                 │    then deterministic SQL             │
                 │  Complex (10%):                       │
                 │    local Qwen3-1.5B parses NL →       │
                 │    graph op, then SQL                 │
                 │  NO cloud LLM ever in read path       │
                 └──────────────────────────────────────┘

Two architectural bets:

One LLM call per write. Tiered escalation: local ONNX handles ~70% of episodes, local LLM another 25%, cloud only when both fail. Compare to Graphiti's 6 calls per episode.
Zero LLM calls in the read path for 90% of queries. The universal schema's 10 typed relations are a closed set — your user verb cosine-matches to one of them, then SQL runs deterministically. Only multi-hop / time-filter / conjunction queries (~10%) call a tiny local Qwen3-1.5B. No cloud LLM ever sees a read.

This is the bit competitors can't match. Graphiti, Mem0, Letta all need an LLM at read time because their relation types are free-form text the SQL engine can't reason about.

The universal schema (v0.3 target)

9 entity types, 10 relations, hardcoded. Users never write a schema.

Entity types	Relation types
Person, Place, Organization, Concept, Artifact, Event, Time, Idea, Fact	mentions, located_at, related_to, caused, preceded, references, owned_by, part_of, depends_on, participated_in

Broad enough to handle personal wikis, work notes, research, recipes, code, journal entries — anything text-shaped. Edge-case domains (recipes need "Ingredient", scientific datasets need "Measurement") get handled by an automatic schema-improvement loop: the LLM logs suggestions to a side-table; a nightly cron promotes types that show up across ≥ 5 distinct episodes with cosine-similarity < 0.85 to any existing type. Users see this as a one-line notice the next time they invoke the CLI.

Full schema rationale → docs/CLARITY.md § 3

Three modes

You pick one at ctxgraph init. All three keep reads local.

Mode	Writes	Cost / 1k eps	Best for
`local-only`	GLiNER2 → NuExtract 2.0 → Qwen3-8B (all local)	$0	Privacy / offline / sensitive data
`cloud-fallback` (default)	Local first; Cerebras free tier when local is stuck	$0 in practice*	Most users
`cloud-quality`	Skip local; every episode goes to Cerebras Qwen3-32B or DeepInfra Gemma-4-26B-A4B	$0–$0.11	Long-form text, research papers

* Cerebras free tier = 1M tokens/day, 30 RPM. Enough for ~1 250 episodes/day. DeepInfra Gemma-4-26B-A4B ($0.07 in / $0.34 out, ~$0.11/1k eps) is the paid fallback when Cerebras rate-limits.

allow_cloud = false in ~/.ctxgraph/config.toml forces Mode A regardless of mode — the privacy override.

Competitive landscape

	ctxgraph	Graphiti / Zep	Mem0	Letta	Cognee
Distribution	single Rust binary	Python + Neo4j + Docker	Python SDK	Python	Python + Neo4j
Local-only mode	yes	no	no	yes (Apache 2.0)	no
LLM calls per write	1	6	N	varies	varies
LLM in read path	no (90% of queries)	yes	yes	yes	yes
Schema-typed extraction	yes (universal 9/10)	free-form verbs	free-form	typed but manual	typed but manual
Bi-temporal edges	yes	yes	no	no	no
Verified $/1k eps (Gemma 4 26B)	$0.11	~$0.66 (6×)	N/A	N/A	N/A
Apples-to-apples combined F1 vs ctxgraph (same model)	0.687	0.460	not measured	not measured	not measured
Stars (rough, May 2026)	early	~20K	~50K	~30K	~15K

More competitor analysis → docs/ROADMAP.md § "Competitive landscape"

What's in the box today (v0.8.0)

Component	Status	Lines
`ctxgraph-core` — SQLite + FTS5 + bi-temporal graph	shipped	~2 000
`ctxgraph-extract` — tiered extraction (current: GLiNER + GLiREL + LLM gate)	shipped	~8 500
`ctxgraph-embed` — fastembed wrapper, all-MiniLM-L6-v2 (384-dim)	shipped	~70
`ctxgraph-cli` — init, log, query, entities, stats, models, mcp start	shipped	~600
`ctxgraph-mcp` — MCP server, 6 tools	shipped	~870

v0.3 is the next launch — see docs/ROADMAP.md. It swaps GLiNER + GLiREL for GLiNER2 (single forward pass), adopts the universal schema, adds the no-LLM read path, defaults to Cerebras free tier, and re-runs the 29-episode benchmark to confirm the headline lands at ≥ 0.745 combined F1 with a fully local stack.

Install

# macOS + Linux (prebuilt binaries via Homebrew)
brew install rohansx/tap/ctxgraph

# or from source (Rust 1.85+)
cargo install ctxgraph-cli

Quick start

ctxgraph init
ctxgraph log "Alice chose PostgreSQL over MySQL for the new billing service."
ctxgraph log "PostgreSQL replaced MySQL in prod on 2026-04-12."
ctxgraph query "what did Alice choose?"
ctxgraph query "what was replaced?"

MCP server (Claude Code / Cursor / Cline)

{
  "mcpServers": {
    "ctxgraph": { "command": "ctxgraph-mcp" }
  }
}

Tool	Description
`ctxgraph_add_episode`	Record an event or decision
`ctxgraph_search`	Fused FTS5 + semantic + graph search
`ctxgraph_traverse`	Walk the graph from an entity
`ctxgraph_find_precedents`	Find similar past events
`ctxgraph_list_entities`	List entities with filters
`ctxgraph_export_graph`	Export entities and edges

Rust SDK

use ctxgraph::{Graph, Episode};

let mut graph = Graph::init(".ctxgraph")?;
graph.add_episode(
    Episode::builder("Chose Postgres over Mongo for the billing rewrite").build()
)?;
let results = graph.search("why Postgres?", 10)?;

Project structure

crates/
├── ctxgraph-core/      types, storage, query, temporal
├── ctxgraph-extract/   tiered extraction (ONNX + LLM)
├── ctxgraph-embed/     local embeddings (384-dim)
├── ctxgraph-cli/       CLI binary
└── ctxgraph-mcp/       MCP server

Reproducing the benchmark

export OPENROUTER_API_KEY=sk-or-...
python scripts/openrouter_bench.py \
  --model google/gemma-4-26b-a4b-it \
  --out bench.json \
  --skip-tech \
  --cd-fixture crates/ctxgraph-extract/tests/fixtures/cross_domain_v2.json

# Spin up Neo4j for Graphiti
docker run -d --name neo4j-bench -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/benchpass123 neo4j:5.26-community

# Graphiti through OpenRouter (needs Python 3.12 venv)
python3.12 -m venv /tmp/graphiti_venv
/tmp/graphiti_venv/bin/pip install graphiti-core neo4j openai
/tmp/graphiti_venv/bin/python scripts/graphiti_openrouter_bench.py \
  --model google/gemma-4-26b-a4b-it \
  --out graphiti.json

python scripts/compare_v2.py

Total cost: ~$0.15. Total wall-clock: ~90 minutes.

Contributing

See CONTRIBUTING.md. For design discussions, docs/CLARITY.md is the working doc — propose changes against it.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
.claude		.claude
.github		.github
crates		crates
docs		docs
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
ctxgraph.toml.example		ctxgraph.toml.example

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ctxgraph

Headline result (measured 2026-05-13)

How it works

The universal schema (v0.3 target)

Three modes

Competitive landscape

What's in the box today (v0.8.0)

Install

Quick start

MCP server (Claude Code / Cursor / Cline)

Rust SDK

Project structure

Reproducing the benchmark

Contributing

License

About

Uh oh!

Releases 8

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ctxgraph

Headline result (measured 2026-05-13)

How it works

The universal schema (v0.3 target)

Three modes

Competitive landscape

What's in the box today (v0.8.0)

Install

Quick start

MCP server (Claude Code / Cursor / Cline)

Rust SDK

Project structure

Reproducing the benchmark

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages