A plugin that gives OpenClaw agents both in-conversation context compression (hierarchical summary DAG) and cross-session long-term recall (vector store with tiered retrieval) — packaged as a single drop-in runtime extension.
🚀 Quick Start • 🌟 Overview • 🏗️ Architecture • 🧠 Retrieval Pipeline • 📈 Results • 🛠️ Tools • ⚙️ Configuration • 🔬 Benchmarks • 🇨🇳 中文文档
- [2026-04-12] 🈯 Chinese support & opt-in vector dedup shipped. CJK preference extraction (12 new regex patterns covering likes, habits, goals, biographical facts, and nostalgia), CJK-aware token estimation using empirically calibrated `cl100k_base` ratios (0.87 chars/token for CJK — fixes a ~4.6× underestimate on Chinese text that was causing compression to trigger far too late), and opt-in cosine-similarity deduplication at vector write time (`vectorDedupEnabled`, default off). English benchmark numbers are bit-identical on LongMemEval-50 — verified empirically, not just by construction.
- [2026-04-11] 🚀 v1.0.0 released! Temporal reasoning fix ships with the first public release: prepending `[Date: …]` headers to indexed session documents and passing `asOf` to the LLM judge lifted LoCoMo temporal from 42.4% → 80.4% (+38.0 pts) and LongMemEval-500 from 72.7% → 80.9% (+8.2 pts).
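The date-header part of that fix boils down to a one-line transform on each document before indexing. A minimal sketch — `formatSessionDoc` is a hypothetical helper name, not MemCC's actual API:

```typescript
// Hypothetical sketch of the temporal-reasoning fix: prepend a [Date: …]
// header so the embedder and the LLM judge both see *when* the content
// happened, not just what happened.
function formatSessionDoc(sessionDate: Date, text: string): string {
  const iso = sessionDate.toISOString().slice(0, 10); // YYYY-MM-DD
  return `[Date: ${iso}]\n${text}`;
}

const doc = formatSessionDoc(
  new Date("2026-03-01T12:00:00Z"),
  "User booked a trip to Kyoto.",
);
// doc === "[Date: 2026-03-01]\nUser booked a trip to Kyoto."
```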
- 🚀 Quick Start
- 🌟 Overview
- 🏗️ Architecture
- 🧠 Retrieval Pipeline
- 📈 Results
- 🛠️ Tools
- ⚙️ Configuration
- 📦 Installation
- 🔬 Benchmarks
- 🗺️ Roadmap
- 📄 License
MemCC is an OpenClaw plugin — it loads via OpenClaw's plugin discovery and registers itself as both a context engine and a memory capability. No standalone runtime, no separate server.
```bash
# 📥 Clone and install
git clone <repo> memcc
cd memcc
npm install

# ▶️ Point OpenClaw at the plugin
export OPENCLAW_PLUGINS=/path/to/memcc
# Or add the package path to your openclaw config

# 🔑 Set the OpenAI key used by embeddings + LLM rerank
export OPENAI_API_KEY=sk-...

# 🧪 Run the full test suite
npm test

# 📊 Run the benchmark suites (see §Benchmarks)
npm run bench
```

Once loaded, OpenClaw resolves the context engine `memcc` (also aliased as `default`) and exposes four tools to the agent: `memory_search`, `memory_get`, `lcm_grep`, `lcm_expand_query`. No additional setup is needed — the plugin migrates its own SQLite schemas on first boot and lazily initializes the LanceDB store.
LLM agents that live across many turns and many sessions face two fundamentally different problems:
- The current conversation is too long to fit in the context window. You need to compress old turns without losing the ability to reach back and recover exact details.
- Things learned in previous sessions are gone the moment the agent restarts. You need a persistent store that understands semantic recall, not just text matching.
Most memory systems pick one side. MemCC does both in one plugin, and crucially the two sides talk to each other: when the context engine compacts a chunk of conversation, the resulting summary is handed straight to the long-term memory runtime for embedding. When the context engine next assembles a prompt, it asks the memory runtime for relevant recall and splices it back into the same context that the compacted summaries live in.
| Short-term | Long-term |
|---|---|
| **Hierarchical in-session compaction** — SQLite-backed conversation store with a hierarchical summary DAG. Goal: stay under the token budget without losing recoverability. | **Verbatim vector recall across sessions** — LanceDB vector store (OpenAI embeddings). Goal: semantic recall that survives restarts. |
💡 The two subsystems are decoupled through a `MemoryRuntimeBridge` interface — the context engine never imports memory code directly. This means the memory runtime can initialize asynchronously (embeddings-client startup) without blocking the context engine from accepting turns.
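A minimal sketch of what that seam might look like. The method names `ingestSummary` and `retrieve` come from the component descriptions in this README; the exact signatures and the `RetrievalResult` shape are assumptions, not MemCC's real types:

```typescript
// Illustrative bridge interface (signatures are assumptions). The context
// engine depends only on this type, so the memory runtime can be swapped in
// after its embeddings client finishes starting up.
interface RetrievalResult {
  identity: string[];
  essential: string[];
  recall: string[];
}

interface MemoryRuntimeBridge {
  ingestSummary(conversationId: string, summary: string): Promise<void>;
  retrieve(query: string): Promise<RetrievalResult>;
}

// Until the real runtime is wired in, a null bridge keeps ingest flowing.
const nullBridge: MemoryRuntimeBridge = {
  async ingestSummary() {},
  async retrieve() {
    return { identity: [], essential: [], recall: [] };
  },
};
```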
Three lanes — Host/User on the left, Context Engine (DAG) in the middle, Long-Term Memory Runtime on the right. Blue edges are the primary ingest/assemble flow, green edges are memory writes, orange is control/bootstrap, and gray-dashed is tool wiring.
| Component | File | Role |
|---|---|---|
| Plugin shell | `src/plugin/index.ts` | Registers context engine, memory capability, and tools with OpenClaw; wires the `MemoryRuntimeBridge` after async init. |
| ContextEngine | `src/context/engine.ts` | Implements the OpenClaw `ContextEngine` interface: `ingest`, `ingestBatch`, `assemble`, `afterTurn`, `compact`, `bootstrap`. Owns compaction. |
| ConversationStore | `src/context/store/conversation-store.ts` | SQLite + optional FTS5 for raw messages. |
| SummaryStore | `src/context/store/summary-store.ts` | Stores the leaf/condensed summary DAG; tracks the per-conversation context item list. |
| CompactionEngine | `src/context/compaction.ts` | Decides when to compact (0.75 × budget), produces summaries via the configured model, preserves `freshTailCount=4` messages. |
| ContextAssembler | `src/context/assembler.ts` | Walks the DAG + fresh tail + memory-injected sections to emit a token-budgeted message list. |
| LargeFileHandler | `src/context/large-files.ts` | Externalizes oversize blobs and swaps in pointers before they pollute the summary tree. |
| Bootstrap | `src/context/bootstrap.ts` | Replays a session JSONL file on first hit so restarted sessions don't lose the DAG. |
| MemoryRuntime | `src/memory/runtime.ts` | Owns the memory side: `ingestSummary`, `retrieve`, `injectIntoContext`, `importLegacy`. |
| VectorStore | `src/memory/store/vector-store.ts` | LanceDB backend — embeds via the OpenAI client, upserts records with wing/room metadata. Opt-in cosine-similarity dedup at write time (see §Configuration). |
| MemoryStore | `src/memory/store/memory-store.ts` | SQLite metadata store: memory records, knowledge-graph triples, FTS index. |
| Retrieval layers | `src/memory/layers/*.ts` | `l0-identity.ts`, `l1-essential.ts`, `l2-on-demand.ts`, `l3-deep-search.ts` — see §Retrieval Pipeline. |
| Search | `src/memory/search/*.ts` | `hybrid-search.ts` (BM25-style + temporal boost), `llm-rerank.ts`, `knowledge-graph.ts`. |
| Classify | `src/memory/classify.ts` | Routes text to a (wing, room) namespace based on entity/topic detection. |
| Preference extractor | `src/memory/preference-extractor.ts` | Mines stated user preferences from compaction summaries (27 English + 12 Chinese regex patterns; CJK-aware length bounds) and stores them as synthetic memory docs for paraphrase recall. |
| Token estimator | `src/token-estimator.ts` | CJK-aware token-count estimation used by compaction, assembly, bootstrap, and layer budgeting. Empirical `cl100k_base` ratios: 0.87 chars/token for CJK, 4.0 for Latin. Non-CJK fast path is bit-identical to the pre-existing `Math.ceil(text.length / 4)` formula. |
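The estimator's stated ratios can be approximated in a few lines. This is a sketch, not the plugin's actual implementation — in particular, the CJK detection here only checks the main CJK Unified Ideographs block:

```typescript
// Sketch of a CJK-aware token estimator using the ratios stated above:
// 0.87 chars/token for CJK, 4.0 chars/token for Latin. Simplification:
// only U+4E00–U+9FFF counts as CJK; the real detector is presumably broader.
function estimateTokens(text: string): number {
  const cjkChars = [...text].filter(
    (ch) => ch >= "\u4e00" && ch <= "\u9fff",
  ).length;
  const latinChars = text.length - cjkChars;
  return Math.ceil(cjkChars / 0.87 + latinChars / 4);
}

estimateTokens("hello world"); // 11 chars, no CJK → ceil(11/4) = 3
estimateTokens("我喜欢喝咖啡"); // 6 CJK chars → ceil(6/0.87) = 7
```

Note how the pure-Latin path degenerates to `Math.ceil(text.length / 4)`, which matches the bit-identical fast-path claim above; without the CJK branch, the 6-character Chinese string would be estimated at 2 tokens instead of 7 — the ~4.6× underestimate mentioned in the changelog.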
- **Ingest.** OpenClaw passes each turn to `ContextEngine.ingest()`. Large files are externalized; the raw message is stored and a `raw` context item is appended to the conversation's item list.
- **Compaction check.** After the turn, `afterTurn()` asks the `CompactionEngine` whether the conversation is over 75% of its budget. If yes, a `leaf` summary is produced (optionally rolled into a `condensed` summary).
- **Summary → memory.** Each summary is pushed through `MemoryRuntimeBridge.ingestSummary()`. The `MemoryRuntime` classifies it, embeds it with OpenAI, upserts into LanceDB, and runs the preference extractor. Both the summary doc and any synthetic preference docs land in the vector store.
- **Assemble.** On the next turn, `assemble()` grabs the last 3 recent messages as a retrieval query, calls `memoryRuntime.retrieve(query)`, then `injectIntoContext(retrieval)` to produce budgeted `<memcc-identity>`, `<memcc-essential>`, `<memcc-context>`, `<memcc-recall>` sections. These are interleaved with the DAG assembly (summaries + fresh tail).
- **Hint.** If compressed history exists, a `systemPromptAddition` is emitted telling the agent it may call `lcm_grep`/`lcm_expand_query` to recover exact detail from a summary.
MemCC doesn't do a single "top-K vector search". Each retrieval fans out across four independent layers, each answering a different question, with its own token budget.
The query is broadcast to L0–L3 in parallel. L3 internally runs vector → hybrid rerank → KG merge → (optional) LLM rerank. Every layer writes into its own wrapped section at inject time, with a token budget that tightens when `contextOptimization=true`.
| Layer | Source | What it's for | Budget |
|---|---|---|---|
| L0 Identity | `agentDir/*` static files | Who this agent is — role, persona, baseline facts. Always injected verbatim. | unbounded |
| L1 Essential | `MemoryStore` pinned memories | Top identity fragments and high-weight pinned facts. Trimmed line-by-line to budget. | 800 tok (400 in opt mode) |
| L2 On-Demand | `MemoryStore` wing-filtered | Memories tagged with the wing detected from the query (e.g. a "coding" wing for code questions). Skipped entirely in optimization mode. | ≤ 500 tok or dropped |
| L3 Deep Search | `VectorStore` + `MemoryStore` triples | Semantic recall: vector top-20 → BM25 + temporal rerank → optional KG merge → optional LLM rerank. | top 10 (top 3 in opt mode) |
- **Vector search** — 20 candidates from LanceDB (`text-embedding-3-small`).
- **Hybrid rerank** (`hybrid-search.ts`) — BM25-style lexical scoring + temporal boost (newer memories win on ties).
- **KG merge** (`knowledge-graph.ts`) — if query entities match stored `(subject, predicate, object)` triples, a synthetic KG result is injected unless already covered.
- **LLM rerank** (`llm-rerank.ts`, gated by `llmRerank=true`, default: `gpt-4o-mini`) — final cross-encoder pass for the last mile of precision. Costs roughly $0.001/search.
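The hybrid-rerank stage (lexical score with a temporal tie-break) can be sketched as follows. The term-overlap scoring below is a toy stand-in for the real BM25-style function, and the types are illustrative:

```typescript
// Sketch of stage 2: re-order vector candidates by a lexical overlap score,
// breaking ties in favor of newer memories ("temporal boost").
interface Candidate {
  id: string;
  text: string;
  createdAt: number; // epoch ms
}

function hybridRerank(query: string, candidates: Candidate[]): Candidate[] {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  const lexScore = (c: Candidate) =>
    c.text.toLowerCase().split(/\s+/).filter((t) => terms.has(t)).length;
  // Higher lexical score first; on equal score, larger createdAt (newer) wins.
  return [...candidates].sort(
    (a, b) => lexScore(b) - lexScore(a) || b.createdAt - a.createdAt,
  );
}

const ranked = hybridRerank("coffee order", [
  { id: "a", text: "user ordered coffee", createdAt: 100 },
  { id: "b", text: "user ordered coffee", createdAt: 200 },
  { id: "c", text: "weather was sunny", createdAt: 300 },
]);
// ranked ids: "b" (lexical tie with "a", but newer), then "a", then "c"
```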
💡 The layer split isn't academic — it's what makes the budget shrinkable. `contextOptimization=true` drops L2 entirely, halves L1, and caps L3 at 3 hits. That's the difference between "inject ~2.5k tok per turn" and "inject ~800 tok per turn" on the same query.
MemCC passes every target on both LoCoMo and LongMemEval, and posts leading numbers in temporal reasoning — historically the hardest subset for memory systems.
🏆 **86.9%** LoCoMo overall (1,986 q) · 🏆 **80.4%** LoCoMo temporal (321 q) · 🏆 **88.1%** LoCoMo episodic (1,665 q) · 🏆 **80.9%** LongMemEval-500 (498 q)
| Benchmark | Score | Target | Status |
|---|---|---|---|
| LoCoMo overall | 86.9% | ≥ 50% | ✅ PASS |
| LoCoMo episodic | 88.1% | ≥ 60% | ✅ PASS |
| LoCoMo temporal | 80.4% | ≥ 50% | ✅ PASS |
| LongMemEval-100 | 92.1% | ≥ 88% | ✅ PASS |
| LongMemEval-500 | 80.9% | — | leading |
MemCC registers four tools with OpenClaw. Three of them operate on short-term context (the compacted DAG), one operates on long-term memory.
| Tool | Operates on | What it does |
|---|---|---|
| `memory_search` | Long-term (LanceDB) | Semantic search across all stored memories. Returns hits from the L3 deep-search layer with scores. Best for "what did we decide about X last month". |
| `memory_get` | Long-term (LanceDB) | Fetch a single memory by id — useful when `memory_search` gives an id and the agent wants the full verbatim text. |
| `lcm_grep` | Short-term (context DAG) | Full-text search inside the current session's messages and compacted summaries. Use this when the agent needs an exact phrase from compressed history. |
| `lcm_expand_query` | Short-term (context DAG) | Expand a compacted summary back to the underlying raw messages that produced it — effectively "undo this piece of the compression". |
When `assemble()` detects compressed history, it injects a system-prompt hint telling the agent exactly which tools to reach for:

> "Some earlier conversation has been compressed into summaries. If you need exact details from compressed history, use `lcm_grep` to search, then `lcm_expand_query` for full context."
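A typical escalation, shown as hypothetical tool-call payloads — the argument names here are illustrative, not the registered schemas:

```jsonc
// 1. Locate the compressed chunk containing the exact phrase
{ "tool": "lcm_grep", "arguments": { "query": "rotation schedule" } }

// 2. Expand that summary back into the raw messages that produced it
{ "tool": "lcm_expand_query", "arguments": { "summaryId": "<id from the lcm_grep hit>" } }
```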
All config flows through the OpenClaw plugin config schema defined in openclaw.plugin.json. Every option is optional — the defaults are already tuned.
| Option | Default | Effect |
|---|---|---|
| `contextOptimization` | `false` | `true` halves the L1 budget (800→400), drops L2 entirely, caps L3 at 3 hits, and tightens compaction targets (leaf 1500→1000, condensed 2000→1200). |
| `compactionModel` | `null` (inherit) | Model used to generate leaf/condensed summaries. Provider/model string like `openai/gpt-4o-mini`. |
| `embeddingModel` | `text-embedding-3-small` | OpenAI embedding model — anything compatible with the OpenAI Embeddings API. |
| `llmRerank` | `true` | Runs an LLM cross-encoder pass after hybrid rerank. Set `false` for pure vector + BM25. |
| `rerankModel` | `gpt-4o-mini` | Model used for `llmRerank`. Cheap defaults recommended. |
| `vectorDedupEnabled` | `false` | When `true`, `VectorStore.add()` computes cosine similarity against the nearest existing neighbor and skips writes whose similarity ≥ `vectorDedupThreshold`. Per-item against persisted state only (not intra-batch). Useful for long-running ingestion workloads where the same fact may be re-ingested; opt-in because LongMemEval-style benchmarks don't exercise the dedup path. |
| `vectorDedupThreshold` | `0.95` | Cosine-similarity threshold for write-time dedup. Valid range [0, 1] inclusive; out-of-range or NaN values fall back to the default. Tighter (0.98) suppresses fewer duplicates; looser (0.90) suppresses more. |
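The write-time check reduces to a few lines. A sketch under stated assumptions — `nearest` stands in for a hypothetical LanceDB top-1 neighbor lookup, and this is not MemCC's actual `VectorStore.add()` code:

```typescript
// Sketch of opt-in write-time dedup: skip a write when its embedding is
// near-identical to the closest vector already persisted.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function shouldSkipWrite(
  candidate: number[],
  nearest: number[] | null, // top-1 persisted neighbor, or null if store is empty
  threshold = 0.95,         // vectorDedupThreshold
): boolean {
  // Out-of-range or NaN thresholds fall back to the default (per the table above).
  if (!(threshold >= 0 && threshold <= 1)) threshold = 0.95;
  return nearest !== null && cosine(candidate, nearest) >= threshold;
}
```

An exact re-ingest (`cosine = 1.0`) is always skipped; an orthogonal vector (`cosine = 0`) is always written; an empty store never skips.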
```
$HOME/.openclaw/memcc/
├── context.db   # SQLite — conversations, messages, summaries
├── memory.db    # SQLite — memory records, triples, FTS
└── lance/       # LanceDB — vector store
```
These aren't in the config schema — they've been tuned by the benchmark runs.
| Parameter | Default (opt) | Notes |
|---|---|---|
| `contextThreshold` | 0.75 | Compaction fires at 75% of the token budget. |
| `freshTailCount` | 4 | Last 4 messages always kept verbatim. |
| `leafChunkTokens` | 4000 | Max tokens collapsed into one leaf. |
| `leafTargetTokens` | 1500 (1000) | Target size of a leaf summary. |
| `condensedTargetTokens` | 2000 (1200) | Target size of a condensed summary. |
| `leafMinFanout` | 3 | Minimum leaves before a condensed roll-up. |
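The interaction of `contextThreshold` and `freshTailCount` can be sketched as a simplified trigger check — illustrative only; real compaction also handles leaf chunking, target sizes, and condensed roll-ups:

```typescript
// Simplified compaction check: fire when estimated tokens exceed 75% of the
// budget, and only ever hand messages older than the 4-message fresh tail
// to the summarizer.
interface Msg {
  text: string;
  tokens: number;
}

// Returns the messages eligible for the next leaf summary ([] = no-op).
function planCompaction(
  messages: Msg[],
  budget: number,
  contextThreshold = 0.75,
  freshTailCount = 4,
): Msg[] {
  const used = messages.reduce((sum, m) => sum + m.tokens, 0);
  if (used <= budget * contextThreshold) return []; // under threshold
  return messages.slice(0, Math.max(0, messages.length - freshTailCount));
}
```

With five 200-token messages and a 1000-token budget, usage (1000) exceeds the 750-token threshold, so exactly one message (everything outside the 4-message tail) becomes compaction input; doubling the budget makes the same history a no-op.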
- 🟢 Node.js 22.x (for the built-in `node:sqlite` module)
- 🔑 OpenAI-compatible API for embeddings + rerank
- 🪶 OpenClaw ≥ 2026.4.2 (peer dependency)
```bash
# From the memCC directory
npm install

# Verify the plugin metadata is picked up
cat openclaw.plugin.json

# Run tests
npm test

# Run benchmarks (OPENAI_API_KEY required)
OPENAI_API_KEY=sk-... npm run bench
```

| Package | Purpose |
|---|---|
| `@lancedb/lancedb` | Vector store backend |
| `apache-arrow` | Columnar transport for LanceDB |
| `@sinclair/typebox` | Plugin config schema validation |
| `node:sqlite` (built-in) | Context + memory metadata stores |
The benchmark suites are standard Vitest specs under `test/benchmark/`. They pull datasets from a sibling `memory-lancedb-pro/bench/` directory (LongMemEval and LoCoMo).
```bash
# 🎯 LongMemEval-50 (smoke test, ~1 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-50.test.ts --testTimeout 600000

# 📏 LongMemEval-100 (~3 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-100.test.ts --testTimeout 600000

# 🏁 LongMemEval-500 full (~45 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-500.test.ts --testTimeout 3600000

# 🎢 LoCoMo full (~40-50 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/locomo.test.ts --testTimeout 3600000
```

- `test/benchmark/longmemeval.test.ts` — representative subset
- `test/benchmark/longmemeval-100.test.ts` — 100 questions + synthetic prefs + LLM rerank
- `test/benchmark/longmemeval-500.test.ts` — full 500
- `test/benchmark/longmemeval-tail.test.ts` — 420–500 slice (resumes after timeout)
- `test/benchmark/locomo.test.ts` — 10 conversations, 1,986 queries
```
OS: macOS 26.4 (arm64)
Node: 22.22.0
OpenClaw: 2026.4.9
MemCC: 0.1.0
Embedder: OpenAI text-embedding-3-small (1536-dim)
Judge: gpt-4o-mini
```
**Retrieval quality**
- Aggregation/counting path for multi-session "how many X" / "list all Y" queries (the remaining weak spot in the LongMemEval tail after date headers)
- Landmark caching — cheap per-document metadata (author, location, entity tags extracted once at ingest) to unlock gains similar to date headers for entity-centric queries
- Per-subset regression gates in CI so no benchmark subset can silently drop
**Infrastructure**
- Alternative embedding backends (local models, Qwen3-Embedding, Voyage)
- Configurable vector-store backend (LanceDB today; sqlite-vec as a zero-dep fallback)
- Export/import of the memory store for agent migration
**Integration**
- Multi-agent memory sharing with access control (grants already exist for sub-agents; expand to peer agents)
- Streaming ingestion API for real-time memory updates from external events
Contributions welcome.
MIT — see LICENSE (or package.json license field) for details.
```jsonc
{
  "plugins": {
    "memcc": {
      "contextOptimization": false,       // true → tighter budgets on L1-L3
      "compactionModel": null,            // null → inherit from OpenClaw
      "embeddingModel": "text-embedding-3-small",
      "llmRerank": true,                  // ~$0.001/search — default on
      "rerankModel": "gpt-4o-mini",       // null → gpt-4o-mini default
      "vectorDedupEnabled": false,        // opt-in write-time near-duplicate suppression
      "vectorDedupThreshold": 0.95        // cosine similarity in [0,1]; skip writes ≥ threshold
    }
  }
}
```