Evidence-backed temporal memory for long-horizon AI agents.
Quipu is pronounced KEE-poo. The name comes from knotted Andean record systems: durable memory encoded as structure, not just loose text.
Quipu gives an agent a local memory layer it can inspect, verify, update, and forget. It stores raw evidence, extracts scoped facts and preferences from that evidence, retrieves the right memories for a task, and keeps enough provenance that you can ask why a memory exists or remove it later.
curl -fsSL https://raw.githubusercontent.com/jeffhajewski/quipu/main/scripts/install.sh | bash
export PATH="$HOME/.quipu/bin:$PATH"
quipu --db "$HOME/.quipu/memory.lattice" health
The installer builds the Zig core against a system-installed LatticeDB native
library for durable graph, full-text, vector, and stream storage. You need
git, zig, lattice.h, and liblattice installed; set LATTICE_INCLUDE
and LATTICE_LIB_DIR when they are not in standard system paths.
Agents do not just need longer prompts. They need memory with discipline.
Most memory layers eventually become a pile of chunks. They can recall stale facts, mix data across users or projects, lose the evidence behind a summary, or fail to fully suppress deleted information. Quipu is built around stricter invariants:
- Raw evidence is preserved unless it is explicitly forgotten.
- Derived facts link back to the messages that justify them.
- Current facts are temporal: old values become historical, not silently wrong.
- Retrieval is scoped, budgeted, and inspectable.
- Forgetting propagates from raw evidence to derived memories.
- SDKs stay thin; memory semantics live in the local daemon.
The result is a memory system that is useful for real agent work: repo preferences, user style, project procedures, facts that change over time, and audit trails for what the agent used.
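The temporal and provenance invariants above can be illustrated with a small in-memory sketch. It is not Quipu's implementation: the `Fact` and `FactStore` names, their fields, and the half-open validity interval are assumptions made for illustration. It only shows the idea that a new value closes the old one instead of overwriting it, and that every fact keeps the ids of the messages behind it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    """One value of a single-valued fact, with the evidence that justifies it."""
    key: str                              # hypothetical key, e.g. "package_manager"
    value: str
    evidence_ids: list[str]               # ids of the raw messages behind this fact
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None means "still current"


@dataclass
class FactStore:
    facts: list[Fact] = field(default_factory=list)

    def assert_fact(self, key, value, evidence_ids, at):
        """Record a value; close the previous current value instead of deleting it."""
        for fact in self.facts:
            if fact.key == key and fact.valid_to is None:
                if fact.value == value:
                    # Same value restated: only attach the extra evidence.
                    fact.evidence_ids.extend(evidence_ids)
                    return fact
                fact.valid_to = at        # the old value becomes historical
        new_fact = Fact(key, value, list(evidence_ids), valid_from=at)
        self.facts.append(new_fact)
        return new_fact

    def value_at(self, key, at):
        """Return the fact valid at the given instant, or None."""
        for fact in self.facts:
            if fact.key != key:
                continue
            if fact.valid_from <= at and (fact.valid_to is None or at < fact.valid_to):
                return fact
        return None


store = FactStore()
jan = datetime(2026, 1, 10, tzinfo=timezone.utc)
apr = datetime(2026, 4, 28, tzinfo=timezone.utc)
store.assert_fact("package_manager", "npm", ["msg-1"], jan)
store.assert_fact("package_manager", "pnpm", ["msg-7"], apr)

feb = datetime(2026, 2, 1, tzinfo=timezone.utc)
may = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(store.value_at("package_manager", feb).value)  # historical value
print(store.value_at("package_manager", may).value)  # current value
```

Asking for the value at a past instant returns the historical fact together with its evidence ids, which is the kind of question a validAt-style lookup is meant to answer.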
Quipu currently supports:
- memory.remember: store raw sessions, turns, and messages.
- Deterministic extraction for package-manager facts, test commands, and response-style preferences.
- Temporal supersession for single-valued facts and preferences.
- memory.retrieve: return both a rendered prompt and structured context packet with scope filtering, needs filtering, token budgeting, event windows, validAt, warnings, and optional trace output.
- memory.answer: retrieve scoped evidence and answer with either the deterministic local fallback or a configured OpenRouter-compatible chat model.
- memory.search: lexical, vector, reciprocal-rank hybrid, and graph search modes.
- memory.inspect: inspect a node, its provenance, dependents, and stream-backed audit records.
- memory.forget: qid or query selectors, dry-run closure reports, tombstones, redaction state, and suppression of derived memories and contaminated core summaries backed by forgotten evidence.
- memory.feedback and memory.core.* for retrieval feedback and user-managed core memory blocks.
- Native LatticeDB-backed persistence with graph nodes/edges, FTS, configurable hash or OpenAI-compatible vector search, and durable streams.
- Durable stream events can be materialized into idempotent Job nodes for extraction, async entity resolution, consolidation, forgetting, retrieval logging, and feedback work.
- Python and TypeScript SDKs that validate JSON-RPC shape and talk to a local quipu serve-stdio process.
- A dependency-free MCP stdio adapter with tools, docs/schema resources, and workflow prompts for tool-calling hosts.
- Synthetic evals that check temporal truth, cross-scope leakage, evidence faithfulness, and forgetting leakage, and that emit run manifests.
- External benchmark readiness gates, LoCoMo smoke fixtures, a real LoCoMo locomo10.json normalizer, retrieval trace artifacts, and reproducible benchmark manifests.
- Deterministic external benchmark baselines (full_context, recent_only, bm25, vector_rag, hybrid_bm25_vector, summary_only, memory_cards_only, graph_only) plus Q0-Q13/full Quipu ablation runs.
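The reciprocal-rank hybrid search mode listed above combines several rankings into one. The sketch below shows the standard reciprocal rank fusion formula, where each list contributes 1 / (k + rank) to an item's score. The function name, the constant k = 60 (a common default in the literature), and the tie-break rule are assumptions; Quipu's own scoring and graph boosts may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical = ["m3", "m1", "m7"]   # hypothetical lexical (BM25-style) ranking
vector = ["m1", "m9", "m3"]    # hypothetical embedding ranking
fused = reciprocal_rank_fusion([lexical, vector])
print(fused)  # ['m1', 'm3', 'm9', 'm7']
```

Items that appear in both lists rise to the top even when neither list ranks them first, which is why fusion needs no score calibration between the lexical and vector sides.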
The current benchmark surface has two honest result classes:
- synthetic_smoke: deterministic CI fixtures for temporal truth, scope, evidence, preferences, and forgetting.
- external_smoke: tiny normalized fixtures shaped like external benchmarks.
One local publishable LoCoMo run has passed the readiness gate with the full
external dataset, LatticeDB storage, retrieval traces, deterministic answer and
grading, baselines, ablations, verification, and reproducible manifests. Treat
it as a Quipu reproducibility result, not an apples-to-apples leaderboard claim
against published systems that use different answer models, judges, retrieval
cutoffs, category subsets, or dataset slices.
Latest local synthetic run:
- Q0 raw-only fake baseline: 5/5 queries, 1/1 forgetting checks.
- Core in-memory baseline: 5/5 queries, 1/1 forgetting checks.
- Core LatticeDB 0.6.0 baseline: 5/5 queries, 1/1 forgetting checks.
See docs/benchmark-results.md for the generated report and caveats.
Current external benchmark status:
- LoCoMo mini fixture: covers single-hop, multi-hop, temporal, adversarial, event-summary, and forgetting paths. This validates the harness, not LoCoMo performance.
- Real LoCoMo adapter: normalizes the upstream SNAP locomo10.json shape into Quipu scenarios and can run local deterministic baseline, ablation, and core benchmark passes with artifacts.
- LongMemEval mini fixture and real adapter: normalizes the official cleaned LongMemEval oracle/S/M JSON files into timestamped session events and session-level evidence queries. This is wired for reproducible runs, not a published LongMemEval score.
- Full LoCoMo local raw-retrieval run: 1,986 questions, LatticeDB 0.6.0, core Lattice answer-containment exact match 28.55%, evidence recall 58.27%, with reports under artifacts/benchmarks/locomo-core-lattice-raw-fast-full/.
- Same-suite provider semantic baselines using OpenRouter openai/text-embedding-3-small at the provider default dimensionality: vector RAG exact 25.28%, evidence recall 57.90%; hybrid BM25/vector exact 26.64%, evidence recall 58.07%, with reports under artifacts/benchmarks/locomo-full-openrouter-semantic/.
- Lattice graph retrieval with deterministic entity resolution is wired through the benchmark harness and passes the LoCoMo graph smoke with verification and trace artifacts under artifacts/benchmarks/locomo-graph-smoke/. The full LoCoMo graph+ER run is blocked on LatticeDB 0.6.0 LatticeIo/close issues with large-message writes, stream rereads, and dense edge materialization.
- Local BM25 on the same normalized suite: exact 24.12%, evidence recall 50.42%. Full context remains the local upper bound at exact 42.80% and evidence recall 99.70%.
- External benchmark reports include published reference points from other memory systems where available. They are orientation only: answer models, judges, retrieval cutoffs, dataset slices, and methodologies differ.
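To make the two reported metrics concrete, here is one plausible reading of them as code. This is an assumption, not the grader shipped in evals/: "answer-containment exact match" is taken to mean that the normalized gold answer appears as a whole-word span inside the prediction, and "evidence recall" is taken to be the fraction of gold evidence ids that retrieval returned. The function names and normalization rules are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def answer_contains(prediction, gold):
    """True when the normalized gold answer appears as whole words in the prediction."""
    gold_norm = normalize(gold)
    if not gold_norm:
        return False
    # Pad with spaces so "pnpm" does not match inside a longer token.
    return f" {gold_norm} " in f" {normalize(prediction)} "


def evidence_recall(retrieved_ids, gold_ids):
    """Fraction of gold evidence ids that retrieval surfaced."""
    gold = set(gold_ids)
    if not gold:
        return 1.0  # nothing to find; treated as full recall in this sketch
    return len(gold & set(retrieved_ids)) / len(gold)


print(answer_contains("Run `just test` before committing.", "just test"))  # True
print(evidence_recall(["m1", "m2", "m4"], ["m1", "m3"]))                   # 0.5
```

Under these definitions a system can score high on evidence recall and still miss the answer, which matches the gap between the two columns in the numbers above.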
Start a local durable Quipu process:
quipu --db "$HOME/.quipu/memory.lattice" serve-stdio
Or send JSON-RPC directly:
printf '%s\n' '{"jsonrpc":"2.0","id":"1","method":"memory.remember","params":{"scope":{"projectId":"repo:quipu"},"messages":[{"role":"user","content":"This repo uses pnpm. Run just test before committing.","createdAt":"2026-04-28T10:15:00Z"}]}}' \
| quipu --db "$HOME/.quipu/memory.lattice" rpc-stdin
printf '%s\n' '{"jsonrpc":"2.0","id":"2","method":"memory.retrieve","params":{"query":"What should I run before committing?","scope":{"projectId":"repo:quipu"},"needs":["procedural"],"options":{"includeDebug":true}}}' \
| quipu --db "$HOME/.quipu/memory.lattice" rpc-stdin
printf '%s\n' '{"jsonrpc":"2.0","id":"3","method":"memory.answer","params":{"query":"What should I run before committing?","scope":{"projectId":"repo:quipu"},"needs":["procedural"],"options":{"includeDebug":true}}}' \
| quipu --db "$HOME/.quipu/memory.lattice" rpc-stdin
Check the store:
quipu --db "$HOME/.quipu/memory.lattice" health
quipu --db "$HOME/.quipu/memory.lattice" verify
Provider-backed vector search can be enabled in the Lattice adapter with an OpenAI-compatible embeddings endpoint:
export OPENROUTER_API_KEY=...
QUIPU_EMBEDDING_PROVIDER=openrouter \
quipu --db "$HOME/.quipu/memory.lattice" \
remember --project repo:quipu --text "The launch code is heliotrope."
With LatticeDB 0.6.0, Quipu requests 768-dimensional
openai/text-embedding-3-small vectors by default for the core Lattice adapter.
The installed LatticeDB header exposes configurable vector_dimensions, but
local smoke tests show commits fail with LatticeIo at 1017 dimensions and
above. Full 1536-dimensional OpenAI embeddings should be re-enabled after that
LatticeDB persistence issue is fixed.
Provider-backed answer generation and async entity resolution use the unified LLM provider surface:
export OPENAI_API_KEY=...
quipu --db "$HOME/.quipu/memory.lattice" \
--llm-provider openai \
--llm-model gpt-4o \
remember --project repo:quipu --text "Alice Smith owns the Lisbon rollout."
quipu --db "$HOME/.quipu/memory.lattice" jobs run entity-resolve
quipu --db "$HOME/.quipu/memory.lattice" \
--llm-provider openai \
answer --project repo:quipu --mode graph --query "Who owns the Lisbon rollout?"
OpenRouter, Anthropic, Google Gemini, Ollama, Groq, Together, Mistral,
DeepSeek, Kimi/Moonshot, Cohere, Azure OpenAI, and custom OpenAI-compatible
endpoints are documented in docs/providers.md.
For local smoke tests without model keys, use
--entity-provider deterministic and omit --answer-provider; memory.answer
falls back to deterministic prompt extraction.
Use the CLI directly:
quipu --db "$HOME/.quipu/memory.lattice" init
quipu --db "$HOME/.quipu/memory.lattice" remember \
--project repo:quipu \
--text "This repo uses pnpm. Run just test before committing."
quipu --db "$HOME/.quipu/memory.lattice" retrieve \
--project repo:quipu \
--query "test command" \
--need procedural \
--debug
quipu --db "$HOME/.quipu/memory.lattice" answer \
--project repo:quipu \
--query "What should I run before committing?" \
--need procedural
quipu --db "$HOME/.quipu/memory.lattice" consolidate --project repo:quipu
quipu --db "$HOME/.quipu/memory.lattice" forget \
--project repo:quipu \
--query "pnpm" \
--dry-run
Python:
from quipu import Quipu
with Quipu.local(db_path="/tmp/quipu.lattice") as q:
    remembered = q.remember(
        scope={"projectId": "repo:quipu"},
        messages=[{"role": "user", "content": "Use pnpm for this repo."}],
    )
    retrieved = q.retrieve(
        query="package manager",
        scope={"projectId": "repo:quipu"},
        needs=["current_facts"],
        options={"includeDebug": True},
    )
    answered = q.answer(
        query="What package manager does this repo use?",
        scope={"projectId": "repo:quipu"},
        needs=["current_facts"],
    )
    print(remembered["messageQids"])
    print(retrieved["prompt"])
    print(answered["answer"])
TypeScript:
import { Quipu } from "@quipu/memory";
const q = await Quipu.local({ dbPath: "/tmp/quipu.lattice" });
try {
  await q.remember({
    scope: { projectId: "repo:quipu" },
    messages: [{ role: "user", content: "Run just test before committing." }],
  });
  const retrieved = await q.retrieve({
    query: "test command",
    scope: { projectId: "repo:quipu" },
    needs: ["procedural"],
    options: { includeDebug: true },
  });
  const answered = await q.answer({
    query: "What should I run before committing?",
    scope: { projectId: "repo:quipu" },
    needs: ["procedural"],
  });
  console.log(retrieved.prompt);
  console.log(answered.answer);
} finally {
  q.close();
}
SDK packaging is still settling; from source, use sdk/python and
sdk/typescript directly.
Agent runtime / app
  -> Python SDK, TypeScript SDK, CLI, or MCP adapter
    -> JSON-RPC 2.0 protocol
      -> Zig core runtime
        -> storage adapter
          -> in-memory adapter or LatticeDB adapter
The Zig core owns memory semantics. Storage adapters provide graph records, full-text search, vector search, streams, transactions, and verification.
The LatticeDB adapter stores Quipu nodes as durable graph nodes with public qids,
creates provenance edges, indexes text with Lattice FTS, stores deterministic
lattice_hash_embed or OpenAI-compatible provider vectors for search, and
publishes events through Lattice native durable streams.
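Provenance edges are what let forgetting propagate from raw evidence to derived memories. The sketch below computes a dry-run style closure over such edges with a breadth-first walk. It is an illustration under assumptions: the `derived_from` mapping, the node id format, and the conservative policy (any node with at least one forgotten source is suppressed) are hypothetical and may not match how the Zig core decides what to tombstone.

```python
from collections import deque


def forget_closure(derived_from, forgotten_evidence):
    """Return every node id to suppress when the given evidence is forgotten.

    derived_from maps a derived node id to the set of ids it was derived from.
    """
    # Invert the provenance edges: source id -> nodes that depend on it.
    dependents = {}
    for node, sources in derived_from.items():
        for source in sources:
            dependents.setdefault(source, set()).add(node)

    closure = set(forgotten_evidence)
    queue = deque(forgotten_evidence)
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in closure:
                closure.add(node)
                queue.append(node)
    return closure


# Hypothetical provenance graph: messages -> facts -> a core summary.
derived_from = {
    "fact:pm": {"msg-1"},
    "fact:test": {"msg-1", "msg-2"},
    "summary:repo": {"fact:pm", "fact:test"},
    "fact:style": {"msg-3"},
}
report = forget_closure(derived_from, ["msg-1"])
print(sorted(report))  # ['fact:pm', 'fact:test', 'msg-1', 'summary:repo']
```

Printing the closure before deleting anything mirrors the purpose of a dry run: the caller sees which derived facts and summaries would be affected, while unrelated memories such as the style fact stay untouched.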
Quipu is a working core prototype, not a packaged production daemon yet.
Implemented:
- Contract-first JSON-RPC protocol schemas and conformance fixtures.
- Zig runtime for the public memory methods.
- LatticeDB-backed storage adapter for normal runtime and tests, plus an in-memory adapter for adapter-local/no-lattice builds only.
- Schema metadata, initial migration record, and schema-aware verification.
- Retrieval, inspection, feedback, core memory blocks, forgetting, and audit stream logging.
- Reciprocal-rank hybrid retrieval, graph expansion over resolved entities, and provider-backed memory.answer.
- Python/TypeScript SDK validators and stdio clients.
- Python/TypeScript local helpers that auto-start the core over stdio.
- MCP tools, resources, and prompts.
- Synthetic eval harness, strict core eval baseline, and run manifests.
- External benchmark readiness gate, LoCoMo mini smoke fixture, real LoCoMo dataset normalizer, retrieval trace artifacts, and just targets for LoCoMo smoke/full/download runs.
- Deterministic required baselines and Q0-Q13/full Quipu ablation manifests for benchmark reports.
Still in progress:
- Long-running socket or HTTP daemon.
- Release artifacts for Quipu itself.
- Full migration runner and compatibility checks.
- Learned reranking and scoring beyond deterministic RRF/graph boosts.
- LLM-backed extraction and consolidation workers.
- Judge scoring for LoCoMo publication.
- MemoryAgentBench adapter.
- Richer host integrations.
Common checks:
just test
just fmt
just ci
cd core && zig build test
cd sdk/typescript && npm test
PYTHONPATH=evals/src python3 -m quipu_evals.core_runner --strict
PYTHONPATH=evals/src python3 -m quipu_evals.benchmarks \
--include-lattice \
--markdown docs/benchmark-results.md
PYTHONPATH=evals/src python3 -m quipu_evals.benchmarks \
--external-benchmark locomo \
--include-baselines \
--include-ablations \
--allow-failures \
--markdown artifacts/benchmarks/locomo-smoke/report.md
PYTHONPATH=evals/src python3 -m quipu_evals.benchmarks \
/path/to/locomo10.json \
--external-benchmark locomo \
--result-class publishable \
--include-baselines \
--include-ablations \
--include-lattice \
--require-lattice \
--skip-core \
--core-retrieval-mode graph \
--core-entity-provider deterministic \
--allow-failures \
--markdown artifacts/benchmarks/locomo-full/report.md
PYTHONPATH=evals/src python3 -m quipu_evals.core_runner \
--strict \
--output artifacts/evals/core-results.json \
--manifest artifacts/evals/core-manifest.json
just benchmark-locomo-smoke
just benchmark-locomo-graph-smoke
just benchmark-locomo-full /path/to/locomo10.json
just benchmark-locomo-download
Build with an explicit system LatticeDB install:
cd core
zig build \
-Dlattice-include=/path/to/latticedb/include \
-Dlattice-lib=/path/to/latticedb/lib
- core/: Zig runtime, CLI entrypoint, storage adapters, deterministic extraction, retrieval, forgetting, audit streams, and tests.
- protocol/: JSON-RPC schemas and conformance fixtures.
- sdk/typescript/: thin TypeScript SDK and protocol tests.
- sdk/python/: thin Python SDK and protocol tests.
- mcp/: dependency-free MCP stdio adapter with tools, resources, and prompts.
- evals/: synthetic scenario schema, deterministic baselines/ablations, Zig core runner, graders, external smoke fixtures, real LoCoMo normalizer, run manifests, readiness gates, trace artifacts, and tests.
- docs/: implementation notes for API, algorithms, data model, evals, architecture, security, and publication.
- examples/: planned integration examples.
Public JSON-RPC methods:
- system.health
- memory.remember
- memory.retrieve
- memory.search
- memory.inspect
- memory.forget
- memory.feedback
- memory.core.get
- memory.core.update
See protocol/README.md, docs/api.md, and protocol/schemas/methods.schema.json for the implemented contract.
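For hosts that talk to the core without an SDK, a request is one line of JSON-RPC 2.0. The helper below is a minimal sketch using only the standard library. The `rpc_request` and `parse_response` names are hypothetical, the params shape is copied from the rpc-stdin examples earlier in this README, and the error handling follows the generic JSON-RPC 2.0 error object (code and message) rather than anything Quipu-specific.

```python
import itertools
import json

_ids = itertools.count(1)  # simple monotonically increasing request ids


def rpc_request(method, params=None):
    """Build one JSON-RPC 2.0 request as a single line of JSON."""
    request = {"jsonrpc": "2.0", "id": str(next(_ids)), "method": method}
    if params is not None:
        request["params"] = params
    return json.dumps(request, separators=(",", ":"))


def parse_response(line):
    """Return the result of a JSON-RPC 2.0 response line, or raise on an error object."""
    response = json.loads(line)
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return response["result"]


line = rpc_request(
    "memory.retrieve",
    {
        "query": "What should I run before committing?",
        "scope": {"projectId": "repo:quipu"},
        "needs": ["procedural"],
    },
)
print(line)
```

The printed line can be piped to the rpc-stdin command in the same way as the printf examples; the schema files named above remain the authority on which params each method accepts.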
MIT.