A plugin that gives OpenClaw agents both in-conversation context compression (hierarchical summary DAG) and cross-session long-term recall (vector store with tiered retrieval) — packaged as a single drop-in runtime extension.
🚀 Quick Start • 🌟 Overview • 🏗️ Architecture • 🧠 Retrieval Pipeline • 📈 Results • 🛠️ Tools • ⚙️ Configuration • 🔬 Benchmarks • 🇨🇳 中文文档
- [2026-04-12] 🈯 Chinese support & opt-in vector dedup shipped. CJK preference extraction (12 new regex patterns covering likes, habits, goals, biographical facts, and nostalgia), CJK-aware token estimation using empirically calibrated `cl100k_base` ratios (0.87 chars/token for CJK — fixes a ~4.6× underestimate on Chinese text that was causing compression to trigger far too late), and opt-in cosine-similarity deduplication at vector write time (`vectorDedupEnabled`, default off). English benchmark numbers are bit-identical on LongMemEval-50 — verified empirically, not just by construction.
- [2026-04-11] 🚀 v1.0.0 released! Temporal reasoning fix ships with the first public release: prepending `[Date: …]` headers to indexed session documents and passing `asOf` to the LLM judge lifted LoCoMo temporal from 42.4% → 80.4% (+38.0 pts) and LongMemEval-500 from 72.7% → 80.9% (+8.2 pts).
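The date-header part of that fix boils down to a one-line transform on each document before indexing. A minimal sketch — `formatSessionDoc` is a hypothetical helper name, not MemCC's actual API:

```typescript
// Hypothetical sketch of the temporal-reasoning fix: prepend a [Date: …]
// header so the embedder and the LLM judge both see *when* the content
// happened, not just what happened.
function formatSessionDoc(sessionDate: Date, text: string): string {
  const iso = sessionDate.toISOString().slice(0, 10); // YYYY-MM-DD
  return `[Date: ${iso}]\n${text}`;
}

const doc = formatSessionDoc(
  new Date("2026-03-01T12:00:00Z"),
  "User booked a trip to Kyoto.",
);
// doc === "[Date: 2026-03-01]\nUser booked a trip to Kyoto."
```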
- 🚀 Quick Start
- 🌟 Overview
- 🏗️ Architecture
- 🧠 Retrieval Pipeline
- 📈 Results
- 🛠️ Tools
- ⚙️ Configuration
- 📦 Installation
- 🔬 Benchmarks
- 🗺️ Roadmap
- 📄 License
MemCC is an OpenClaw plugin — it loads via OpenClaw's plugin discovery and registers itself as both a context engine and a memory capability. No standalone runtime, no separate server.
```bash
# 📥 Clone and install
git clone <repo> memcc
cd memcc
npm install

# ▶️ Point OpenClaw at the plugin
export OPENCLAW_PLUGINS=/path/to/memcc
# Or add the package path to your openclaw config

# 🔑 Set the OpenAI key used by embeddings + LLM rerank
export OPENAI_API_KEY=sk-...

# 🧪 Run the full test suite
npm test

# 📊 Run the benchmark suites (see §Benchmarks)
npm run bench
```

Once loaded, OpenClaw resolves the context engine `memcc` (also aliased as `default`) and exposes four tools to the agent: `memory_search`, `memory_get`, `lcm_grep`, `lcm_expand_query`. No additional setup is needed — the plugin migrates its own SQLite schemas on first boot and lazily initializes the LanceDB store.
LLM agents that live across many turns and many sessions face two fundamentally different problems:
- The current conversation is too long to fit in the context window. You need to compress old turns without losing the ability to reach back and recover exact details.
- Things learned in previous sessions are gone the moment the agent restarts. You need a persistent store that understands semantic recall, not just text matching.
Most memory systems pick one side. MemCC does both in one plugin, and crucially the two sides talk to each other: when the context engine compacts a chunk of conversation, the resulting summary is handed straight to the long-term memory runtime for embedding. When the context engine next assembles a prompt, it asks the memory runtime for relevant recall and splices it back into the same context that the compacted summaries live in.
| Short-term | Long-term |
|---|---|
| **Hierarchical in-session compaction** — SQLite-backed conversation store with a hierarchical summary DAG. Goal: stay under the token budget without losing recoverability. | **Verbatim vector recall across sessions** — LanceDB vector store (OpenAI embeddings). Goal: semantic recall that survives restarts. |
💡 The two subsystems are decoupled through a `MemoryRuntimeBridge` interface — the context engine never imports memory code directly. This means the memory runtime can initialize asynchronously (embeddings-client startup) without blocking the context engine from accepting turns.
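A minimal sketch of what that seam might look like. The method names `ingestSummary` and `retrieve` come from the component descriptions in this README; the exact signatures and the `RetrievalResult` shape are assumptions, not MemCC's real types:

```typescript
// Illustrative bridge interface (signatures are assumptions). The context
// engine depends only on this type, so the memory runtime can be swapped in
// after its embeddings client finishes starting up.
interface RetrievalResult {
  identity: string[];
  essential: string[];
  recall: string[];
}

interface MemoryRuntimeBridge {
  ingestSummary(conversationId: string, summary: string): Promise<void>;
  retrieve(query: string): Promise<RetrievalResult>;
}

// Until the real runtime is wired in, a null bridge keeps ingest flowing.
const nullBridge: MemoryRuntimeBridge = {
  async ingestSummary() {},
  async retrieve() {
    return { identity: [], essential: [], recall: [] };
  },
};
```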
Three lanes — Host/User on the left, Context Engine (DAG) in the middle, Long-Term Memory Runtime on the right. Blue edges are the primary ingest/assemble flow, green edges are memory writes, orange is control/bootstrap, and gray-dashed is tool wiring.
| Component | File | Role |
|---|---|---|
| Plugin shell | `src/plugin/index.ts` | Registers context engine, memory capability, and tools with OpenClaw; wires the `MemoryRuntimeBridge` after async init. |
| ContextEngine | `src/context/engine.ts` | Implements the OpenClaw `ContextEngine` interface: `ingest`, `ingestBatch`, `assemble`, `afterTurn`, `compact`, `bootstrap`. Owns compaction. |
| ConversationStore | `src/context/store/conversation-store.ts` | SQLite + optional FTS5 for raw messages. |
| SummaryStore | `src/context/store/summary-store.ts` | Stores the leaf/condensed summary DAG; tracks the per-conversation context item list. |
| CompactionEngine | `src/context/compaction.ts` | Decides when to compact (0.75 × budget), produces summaries via the configured model, preserves `freshTailCount=4` messages. |
| ContextAssembler | `src/context/assembler.ts` | Walks the DAG + fresh tail + memory-injected sections to emit a token-budgeted message list. |
| LargeFileHandler | `src/context/large-files.ts` | Externalizes oversize blobs and swaps in pointers before they pollute the summary tree. |
| Bootstrap | `src/context/bootstrap.ts` | Replays a session JSONL file on first hit so restarted sessions don't lose the DAG. |
| MemoryRuntime | `src/memory/runtime.ts` | Owns the memory side: `ingestSummary`, `retrieve`, `injectIntoContext`, `importLegacy`. |
| VectorStore | `src/memory/store/vector-store.ts` | LanceDB backend — embeds via the OpenAI client, upserts records with wing/room metadata. Opt-in cosine-similarity dedup at write time (see §Configuration). |
| MemoryStore | `src/memory/store/memory-store.ts` | SQLite metadata store: memory records, knowledge-graph triples, FTS index. |
| Retrieval layers | `src/memory/layers/*.ts` | `l0-identity.ts`, `l1-essential.ts`, `l2-on-demand.ts`, `l3-deep-search.ts` — see §Retrieval Pipeline. |
| Search | `src/memory/search/*.ts` | `hybrid-search.ts` (BM25-style + temporal boost), `llm-rerank.ts`, `knowledge-graph.ts`. |
| Classify | `src/memory/classify.ts` | Routes text to a (wing, room) namespace based on entity/topic detection. |
| Preference extractor | `src/memory/preference-extractor.ts` | Mines stated user preferences from compaction summaries (27 English + 12 Chinese regex patterns; CJK-aware length bounds) and stores them as synthetic memory docs for paraphrase recall. |
| Token estimator | `src/token-estimator.ts` | CJK-aware token-count estimation used by compaction, assembly, bootstrap, and layer budgeting. Empirical `cl100k_base` ratios: 0.87 chars/token for CJK, 4.0 for Latin. Non-CJK fast path is bit-identical to the pre-existing `Math.ceil(text.length / 4)` formula. |
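The estimator's stated ratios can be approximated in a few lines. This is a sketch, not the plugin's actual implementation — in particular, the CJK detection here only checks the main CJK Unified Ideographs block:

```typescript
// Sketch of a CJK-aware token estimator using the ratios stated above:
// 0.87 chars/token for CJK, 4.0 chars/token for Latin. Simplification:
// only U+4E00–U+9FFF counts as CJK; the real detector is presumably broader.
function estimateTokens(text: string): number {
  const cjkChars = [...text].filter(
    (ch) => ch >= "\u4e00" && ch <= "\u9fff",
  ).length;
  const latinChars = text.length - cjkChars;
  return Math.ceil(cjkChars / 0.87 + latinChars / 4);
}

estimateTokens("hello world"); // 11 chars, no CJK → ceil(11/4) = 3
estimateTokens("我喜欢喝咖啡"); // 6 CJK chars → ceil(6/0.87) = 7
```

Note how the pure-Latin path degenerates to `Math.ceil(text.length / 4)`, which matches the bit-identical fast-path claim above; without the CJK branch, the 6-character Chinese string would be estimated at 2 tokens instead of 7 — the ~4.6× underestimate mentioned in the changelog.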
- **Ingest.** OpenClaw passes each turn to `ContextEngine.ingest()`. Large files are externalized; the raw message is stored and a `raw` context item is appended to the conversation's item list.
- **Compaction check.** After the turn, `afterTurn()` asks the `CompactionEngine` whether the conversation is over 75% of its budget. If yes, a `leaf` summary is produced (optionally rolled into a `condensed` summary).
- **Summary → memory.** Each summary is pushed through `MemoryRuntimeBridge.ingestSummary()`. The `MemoryRuntime` classifies it, embeds it with OpenAI, upserts into LanceDB, and runs the preference extractor. Both the summary doc and any synthetic preference docs land in the vector store.
- **Assemble.** On the next turn, `assemble()` grabs the last 3 recent messages as a retrieval query, calls `memoryRuntime.retrieve(query)`, then `injectIntoContext(retrieval)` to produce budgeted `<memcc-identity>`, `<memcc-essential>`, `<memcc-context>`, `<memcc-recall>` sections. These are interleaved with the DAG assembly (summaries + fresh tail).
- **Hint.** If compressed history exists, a `systemPromptAddition` is emitted telling the agent it may call `lcm_grep`/`lcm_expand_query` to recover exact detail from a summary.
MemCC doesn't do a single "top-K vector search". Each retrieval fans out across four independent layers, each answering a different question, with its own token budget.
The query is broadcast to L0–L3 in parallel. L3 internally runs vector → hybrid rerank → KG merge → (optional) LLM rerank. Every layer writes into its own wrapped section at inject time, with a token budget that tightens when `contextOptimization=true`.
| Layer | Source | What it's for | Budget |
|---|---|---|---|
| L0 Identity | `agentDir/*` static files | Who this agent is — role, persona, baseline facts. Always injected verbatim. | unbounded |
| L1 Essential | `MemoryStore` pinned memories | Top identity fragments and high-weight pinned facts. Trimmed line-by-line to budget. | 800 tok (400 in opt mode) |
| L2 On-Demand | `MemoryStore` wing-filtered | Memories tagged with the wing detected from the query (e.g. a "coding" wing for code questions). Skipped entirely in optimization mode. | ≤ 500 tok or dropped |
| L3 Deep Search | `VectorStore` + `MemoryStore` triples | Semantic recall: vector top-20 → BM25 + temporal rerank → optional KG merge → optional LLM rerank. | top 10 (top 3 in opt mode) |
- **Vector search** — 20 candidates from LanceDB (`text-embedding-3-small`).
- **Hybrid rerank** (`hybrid-search.ts`) — BM25-style lexical scoring + temporal boost (newer memories win on ties).
- **KG merge** (`knowledge-graph.ts`) — if query entities match stored `(subject, predicate, object)` triples, a synthetic KG result is injected unless already covered.
- **LLM rerank** (`llm-rerank.ts`, gated by `llmRerank=true`, default: `gpt-4o-mini`) — final cross-encoder pass for the last mile of precision. Costs roughly $0.001/search.
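The hybrid-rerank stage (lexical score with a temporal tie-break) can be sketched as follows. The term-overlap scoring below is a toy stand-in for the real BM25-style function, and the types are illustrative:

```typescript
// Sketch of stage 2: re-order vector candidates by a lexical overlap score,
// breaking ties in favor of newer memories ("temporal boost").
interface Candidate {
  id: string;
  text: string;
  createdAt: number; // epoch ms
}

function hybridRerank(query: string, candidates: Candidate[]): Candidate[] {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  const lexScore = (c: Candidate) =>
    c.text.toLowerCase().split(/\s+/).filter((t) => terms.has(t)).length;
  // Higher lexical score first; on equal score, larger createdAt (newer) wins.
  return [...candidates].sort(
    (a, b) => lexScore(b) - lexScore(a) || b.createdAt - a.createdAt,
  );
}

const ranked = hybridRerank("coffee order", [
  { id: "a", text: "user ordered coffee", createdAt: 100 },
  { id: "b", text: "user ordered coffee", createdAt: 200 },
  { id: "c", text: "weather was sunny", createdAt: 300 },
]);
// ranked ids: "b" (lexical tie with "a", but newer), then "a", then "c"
```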
💡 The layer split isn't academic — it's what makes the budget shrinkable. `contextOptimization=true` drops L2 entirely, halves L1, and caps L3 at 3 hits. That's the difference between "inject ~2.5k tok per turn" and "inject ~800 tok per turn" on the same query.
MemCC passes every target on both LoCoMo and LongMemEval, and posts leading numbers in temporal reasoning — historically the hardest subset for memory systems.
🏆 **86.9%** LoCoMo overall (1,986 q) · 🏆 **80.4%** LoCoMo temporal (321 q) · 🏆 **88.1%** LoCoMo episodic (1,665 q) · 🏆 **80.9%** LongMemEval-500 (498 q)
| Benchmark | Score | Target | Status |
|---|---|---|---|
| LoCoMo overall | 86.9% | ≥ 50% | ✅ PASS |
| LoCoMo episodic | 88.1% | ≥ 60% | ✅ PASS |
| LoCoMo temporal | 80.4% | ≥ 50% | ✅ PASS |
| LongMemEval-100 | 92.1% | ≥ 88% | ✅ PASS |
| LongMemEval-500 | 80.9% | — | leading |
MemCC registers four tools with OpenClaw. Three of them operate on short-term context (the compacted DAG), one operates on long-term memory.
| Tool | Operates on | What it does |
|---|---|---|
| `memory_search` | Long-term (LanceDB) | Semantic search across all stored memories. Returns hits from the L3 deep-search layer with scores. Best for "what did we decide about X last month". |
| `memory_get` | Long-term (LanceDB) | Fetch a single memory by id — useful when `memory_search` gives an id and the agent wants the full verbatim text. |
| `lcm_grep` | Short-term (context DAG) | Full-text search inside the current session's messages and compacted summaries. Use this when the agent needs an exact phrase from compressed history. |
| `lcm_expand_query` | Short-term (context DAG) | Expand a compacted summary back to the underlying raw messages that produced it — effectively "undo this piece of the compression". |
When `assemble()` detects compressed history, it injects a system-prompt hint telling the agent exactly which tools to reach for:

> "Some earlier conversation has been compressed into summaries. If you need exact details from compressed history, use `lcm_grep` to search, then `lcm_expand_query` for full context."
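A typical escalation, shown as hypothetical tool-call payloads — the argument names here are illustrative, not the registered schemas:

```jsonc
// 1. Locate the compressed chunk containing the exact phrase
{ "tool": "lcm_grep", "arguments": { "query": "rotation schedule" } }

// 2. Expand that summary back into the raw messages that produced it
{ "tool": "lcm_expand_query", "arguments": { "summaryId": "<id from the lcm_grep hit>" } }
```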
All config flows through the OpenClaw plugin config schema defined in openclaw.plugin.json. Every option is optional — the defaults are already tuned.
| Option | Default | Effect |
|---|---|---|
| `contextOptimization` | `false` | `true` halves the L1 budget (800→400), drops L2 entirely, caps L3 at 3 hits, and tightens compaction targets (leaf 1500→1000, condensed 2000→1200). |
| `compactionModel` | `null` (inherit) | Model used to generate leaf/condensed summaries. Provider/model string like `openai/gpt-4o-mini`. |
| `embeddingModel` | `text-embedding-3-small` | OpenAI embedding model — anything compatible with the OpenAI Embeddings API. |
| `llmRerank` | `true` | Runs an LLM cross-encoder pass after hybrid rerank. Set `false` for pure vector + BM25. |
| `rerankModel` | `gpt-4o-mini` | Model used for `llmRerank`. Cheap defaults recommended. |
| `vectorDedupEnabled` | `false` | When `true`, `VectorStore.add()` computes cosine similarity against the nearest existing neighbor and skips writes whose similarity ≥ `vectorDedupThreshold`. Per-item against persisted state only (not intra-batch). Useful for long-running ingestion workloads where the same fact may be re-ingested; opt-in because LongMemEval-style benchmarks don't exercise the dedup path. |
| `vectorDedupThreshold` | `0.95` | Cosine-similarity threshold for write-time dedup. Valid range [0, 1] inclusive; out-of-range or NaN values fall back to the default. Tighter (0.98) suppresses fewer duplicates; looser (0.90) suppresses more. |
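The write-time check reduces to a few lines. A sketch under stated assumptions — `nearest` stands in for a hypothetical LanceDB top-1 neighbor lookup, and this is not MemCC's actual `VectorStore.add()` code:

```typescript
// Sketch of opt-in write-time dedup: skip a write when its embedding is
// near-identical to the closest vector already persisted.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function shouldSkipWrite(
  candidate: number[],
  nearest: number[] | null, // top-1 persisted neighbor, or null if store is empty
  threshold = 0.95,         // vectorDedupThreshold
): boolean {
  // Out-of-range or NaN thresholds fall back to the default (per the table above).
  if (!(threshold >= 0 && threshold <= 1)) threshold = 0.95;
  return nearest !== null && cosine(candidate, nearest) >= threshold;
}
```

An exact re-ingest (`cosine = 1.0`) is always skipped; an orthogonal vector (`cosine = 0`) is always written; an empty store never skips.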
```
$HOME/.openclaw/memcc/
├── context.db   # SQLite — conversations, messages, summaries
├── memory.db    # SQLite — memory records, triples, FTS
└── lance/       # LanceDB — vector store
```
These aren't in the config schema — they've been tuned by the benchmark runs.
| Parameter | Default (opt) | Notes |
|---|---|---|
| `contextThreshold` | 0.75 | Compaction fires at 75% of the token budget. |
| `freshTailCount` | 4 | Last 4 messages always kept verbatim. |
| `leafChunkTokens` | 4000 | Max tokens collapsed into one leaf. |
| `leafTargetTokens` | 1500 (1000) | Target size of a leaf summary. |
| `condensedTargetTokens` | 2000 (1200) | Target size of a condensed summary. |
| `leafMinFanout` | 3 | Minimum leaves before a condensed roll-up. |
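The interaction of `contextThreshold` and `freshTailCount` can be sketched as a simplified trigger check — illustrative only; real compaction also handles leaf chunking, target sizes, and condensed roll-ups:

```typescript
// Simplified compaction check: fire when estimated tokens exceed 75% of the
// budget, and only ever hand messages older than the 4-message fresh tail
// to the summarizer.
interface Msg {
  text: string;
  tokens: number;
}

// Returns the messages eligible for the next leaf summary ([] = no-op).
function planCompaction(
  messages: Msg[],
  budget: number,
  contextThreshold = 0.75,
  freshTailCount = 4,
): Msg[] {
  const used = messages.reduce((sum, m) => sum + m.tokens, 0);
  if (used <= budget * contextThreshold) return []; // under threshold
  return messages.slice(0, Math.max(0, messages.length - freshTailCount));
}
```

With five 200-token messages and a 1000-token budget, usage (1000) exceeds the 750-token threshold, so exactly one message (everything outside the 4-message tail) becomes compaction input; doubling the budget makes the same history a no-op.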
- 🟢 Node.js 22.x (for the built-in `node:sqlite` module)
- 🔑 OpenAI-compatible API for embeddings + rerank
- 🪶 OpenClaw ≥ 2026.4.2 (peer dependency)
```bash
# From the memCC directory
npm install

# Verify the plugin metadata is picked up
cat openclaw.plugin.json

# Run tests
npm test

# Run benchmarks (OPENAI_API_KEY required)
OPENAI_API_KEY=sk-... npm run bench
```

| Package | Purpose |
|---|---|
| `@lancedb/lancedb` | Vector store backend |
| `apache-arrow` | Columnar transport for LanceDB |
| `@sinclair/typebox` | Plugin config schema validation |
| `node:sqlite` (built-in) | Context + memory metadata stores |
The benchmark suites are standard Vitest specs under `test/benchmark/`. They pull datasets from a sibling `memory-lancedb-pro/bench/` directory (LongMemEval and LoCoMo).
```bash
# 🎯 LongMemEval-50 (smoke test, ~1 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-50.test.ts --testTimeout 600000

# 📏 LongMemEval-100 (~3 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-100.test.ts --testTimeout 600000

# 🏁 LongMemEval-500 full (~45 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-500.test.ts --testTimeout 3600000

# 🎢 LoCoMo full (~40-50 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/locomo.test.ts --testTimeout 3600000
```

- `test/benchmark/longmemeval.test.ts` — representative subset
- `test/benchmark/longmemeval-100.test.ts` — 100 questions + synthetic prefs + LLM rerank
- `test/benchmark/longmemeval-500.test.ts` — full 500
- `test/benchmark/longmemeval-tail.test.ts` — 420–500 slice (resumes after timeout)
- `test/benchmark/locomo.test.ts` — 10 conversations, 1,986 queries
```
OS: macOS 26.4 (arm64)
Node: 22.22.0
OpenClaw: 2026.4.9
MemCC: 0.1.0
Embedder: OpenAI text-embedding-3-small (1536-dim)
Judge: gpt-4o-mini
```
**Retrieval quality**
- Aggregation/counting path for multi-session "how many X" / "list all Y" queries (the remaining weak spot in the LongMemEval tail after date headers)
- Landmark caching — cheap per-document metadata (author, location, entity tags extracted once at ingest) to unlock gains similar to date headers for entity-centric queries
- Per-subset regression gates in CI so no benchmark subset can silently drop
**Infrastructure**
- Alternative embedding backends (local models, Qwen3-Embedding, Voyage)
- Configurable vector-store backend (LanceDB today; sqlite-vec as a zero-dep fallback)
- Export/import of the memory store for agent migration
**Integration**
- Multi-agent memory sharing with access control (grants already exist for sub-agents; expand to peer agents)
- Streaming ingestion API for real-time memory updates from external events
Contributions welcome.
MIT — see LICENSE (or package.json license field) for details.
```jsonc
{
  "plugins": {
    "memcc": {
      "contextOptimization": false,       // true → tighter budgets on L1-L3
      "compactionModel": null,            // null → inherit from OpenClaw
      "embeddingModel": "text-embedding-3-small",
      "llmRerank": true,                  // ~$0.001/search — default on
      "rerankModel": "gpt-4o-mini",       // null → gpt-4o-mini default
      "vectorDedupEnabled": false,        // opt-in write-time near-duplicate suppression
      "vectorDedupThreshold": 0.95        // cosine similarity in [0,1]; skip writes ≥ threshold
    }
  }
}
```