CortexReach/memCC


MemCC — Hybrid Memory for OpenClaw

DAG Context Engine · Verbatim Vector Long-Term Memory · 4-Layer Retrieval

A plugin that gives OpenClaw agents both in-conversation context compression (hierarchical summary DAG) and cross-session long-term recall (vector store with tiered retrieval) — packaged as a single drop-in runtime extension.


LoCoMo 86.9% · LoCoMo temporal 80.4% · LongMemEval-500 80.9% · LongMemEval-100 92.1%
OpenClaw ≥2026.4.2 · Node 22.x · TypeScript 5.7 · MIT License

🚀 Quick Start · 🌟 Overview · 🏗️ Architecture · 🧠 Retrieval Pipeline · 📈 Results · 🛠️ Tools · ⚙️ Configuration · 🔬 Benchmarks · 🇨🇳 中文文档 (Chinese docs)


🔥 News

  • [2026-04-12] 🈯 Chinese support & opt-in vector dedup shipped. CJK preference extraction (12 new regex patterns covering likes, habits, goals, biographical facts, and nostalgia), CJK-aware token estimation using empirically-calibrated cl100k_base ratios (0.87 chars/token for CJK — fixes a ~4.6× underestimate on Chinese text that was causing compression to trigger far too late), and opt-in cosine-similarity deduplication at vector write time (vectorDedupEnabled, default off). English benchmark numbers are bit-identical on LongMemEval-50 — verified empirically, not just by construction.
  • [2026-04-11] 🚀 v1.0.0 released! Temporal reasoning fix ships with the first public release: prepending [Date: …] headers to indexed session documents and passing asOf to the LLM judge lifted LoCoMo temporal from 42.4% → 80.4% (+38.0 pts) and LongMemEval-500 from 72.7% → 80.9% (+8.2 pts).


🚀 Quick Start

MemCC is an OpenClaw plugin — it loads via OpenClaw's plugin discovery and registers itself as both a context engine and a memory capability. No standalone runtime, no separate server.

# 📥 Clone and install
git clone <repo> memcc
cd memcc
npm install

# ▶️ Point OpenClaw at the plugin
export OPENCLAW_PLUGINS=/path/to/memcc
# Or add the package path to your openclaw config

# 🔑 Set the OpenAI key used by embeddings + LLM rerank
export OPENAI_API_KEY=sk-...

# 🧪 Run the full test suite
npm test

# 📊 Run the benchmark suites (see §Benchmarks)
npm run bench

Once loaded, OpenClaw resolves the context engine memcc (also aliased as default) and exposes four tools to the agent: memory_search, memory_get, lcm_grep, lcm_expand_query. No additional setup is needed — the plugin migrates its own SQLite schemas on first boot and lazily initializes the LanceDB store.


🌟 Overview

LLM agents that live across many turns and many sessions face two fundamentally different problems:

  1. The current conversation is too long to fit in the context window. You need to compress old turns without losing the ability to reach back and recover exact details.
  2. Things learned in previous sessions are gone the moment the agent restarts. You need a persistent store that understands semantic recall, not just text matching.

Most memory systems pick one side. MemCC does both in one plugin, and, crucially, the two sides talk to each other: when the context engine compacts a chunk of conversation, the resulting summary is handed straight to the long-term memory runtime for embedding. When the context engine next assembles a prompt, it asks the memory runtime for relevant recall and splices it back into the same context that the compacted summaries live in.

🗜️ Context Engine (DAG)

Hierarchical in-session compaction

SQLite-backed conversation store with leaf → condensed summary rollup. Fresh-tail messages are preserved verbatim; older turns collapse into a summary tree keyed to the session. Large files are externalized and replaced with pointers.

Goal: stay under token budget without losing recoverability.

🧠 Long-Term Memory Runtime

Verbatim vector recall across sessions

LanceDB vector store (OpenAI text-embedding-3-small, 1536-dim) + SQLite metadata, backed by a classifier that routes each memory into a wing/room namespace. Synthetic preference docs, knowledge-graph triples, and a 4-layer retrieval pipeline sit on top.

Goal: semantic recall that survives restarts.

💡 The two subsystems are decoupled through a MemoryRuntimeBridge interface — the context engine never imports memory code directly. This means the memory runtime can initialize asynchronously (embeddings client startup) without blocking the context engine from accepting turns.
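A minimal sketch of the bridge contract implied above — the context engine depends only on this interface, never on memory internals. Method names beyond ingestSummary/retrieve are assumptions, not the plugin's actual API:

```typescript
// Hypothetical shape of the bridge; ready() models the async embeddings init.
interface MemoryRuntimeBridge {
  ready(): boolean;                               // false while embeddings client boots
  ingestSummary(summary: string): Promise<void>;  // compaction summaries flow in here
  retrieve(query: string): Promise<string[]>;     // recall flows back out
}

// Because memory init is async, the engine can degrade gracefully instead of
// blocking turn ingestion: no recall is injected until the bridge is ready.
async function recallOrEmpty(bridge: MemoryRuntimeBridge, query: string): Promise<string[]> {
  return bridge.ready() ? bridge.retrieve(query) : [];
}
```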


🏗️ Architecture

MemCC Architecture

Three lanes — Host/User on the left, Context Engine (DAG) in the middle, Long-Term Memory Runtime on the right. Blue edges are the primary ingest/assemble flow, green edges are memory writes, orange is control/bootstrap, and gray-dashed is tool wiring.

Core components

Component File Role
Plugin shell src/plugin/index.ts Registers context engine, memory capability, and tools with OpenClaw; wires the MemoryRuntimeBridge after async init.
ContextEngine src/context/engine.ts Implements the OpenClaw ContextEngine interface: ingest, ingestBatch, assemble, afterTurn, compact, bootstrap. Owns compaction.
ConversationStore src/context/store/conversation-store.ts SQLite + optional FTS5 for raw messages.
SummaryStore src/context/store/summary-store.ts Stores the leaf / condensed summary DAG; tracks the per-conversation context item list.
CompactionEngine src/context/compaction.ts Decides when to compact (0.75 × budget), produces summaries via the configured model, preserves freshTailCount=4 messages.
ContextAssembler src/context/assembler.ts Walks the DAG + fresh tail + memory-injected sections to emit a token-budgeted message list.
LargeFileHandler src/context/large-files.ts Externalizes oversize blobs and swaps in pointers before they pollute the summary tree.
Bootstrap src/context/bootstrap.ts Replays a session JSONL file on first hit so restarted sessions don't lose the DAG.
MemoryRuntime src/memory/runtime.ts Owns the memory side: ingestSummary, retrieve, injectIntoContext, importLegacy.
VectorStore src/memory/store/vector-store.ts LanceDB backend — embeds via the OpenAI client, upserts records with wing/room metadata. Opt-in cosine-similarity dedup at write time (see §Configuration).
MemoryStore src/memory/store/memory-store.ts SQLite metadata store: memory records, knowledge-graph triples, FTS index.
Retrieval layers src/memory/layers/*.ts l0-identity.ts, l1-essential.ts, l2-on-demand.ts, l3-deep-search.ts — see §Retrieval Pipeline.
Search src/memory/search/*.ts hybrid-search.ts (BM25-style + temporal boost), llm-rerank.ts, knowledge-graph.ts.
Classify src/memory/classify.ts Routes text to a (wing, room) namespace based on entity/topic detection.
Preference extractor src/memory/preference-extractor.ts Mines stated user preferences from compaction summaries (27 English + 12 Chinese regex patterns; CJK-aware length bounds) and stores them as synthetic memory docs for paraphrase recall.
Token estimator src/token-estimator.ts CJK-aware token count estimation used by compaction, assembly, bootstrap, and layer budgeting. Empirical cl100k_base ratios: 0.87 chars/token for CJK, 4.0 for Latin. Non-CJK fast path is bit-identical to the pre-existing Math.ceil(text.length / 4) formula.
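The token estimator's behavior can be sketched as follows. Only the ratios (0.87 chars/token for CJK, 4 for Latin) and the bit-identical non-CJK fast path come from the table; the character ranges and function name are illustrative assumptions:

```typescript
// Simplified CJK detection: unified ideographs + compatibility ideographs only.
const CJK = /[\u3400-\u9FFF\uF900-\uFAFF]/;

function estimateTokens(text: string): number {
  let cjkChars = 0;
  for (const ch of text) if (CJK.test(ch)) cjkChars++;
  const otherChars = [...text].length - cjkChars;
  // Non-CJK fast path stays bit-identical to the old ceil(length / 4) formula.
  if (cjkChars === 0) return Math.ceil(otherChars / 4);
  // CJK runs at ~0.87 chars/token in cl100k_base, i.e. >1 token per character.
  return Math.ceil(cjkChars / 0.87 + otherChars / 4);
}
```

On pure Chinese text this yields roughly 4.6× more tokens than the old Latin-only formula, which is exactly the underestimate the 2026-04-12 fix targets.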

Data flow — one turn, end to end

  1. Ingest. OpenClaw passes each turn to ContextEngine.ingest(). Large files are externalized; the raw message is stored and a raw context item is appended to the conversation's item list.
  2. Compaction check. After the turn, afterTurn() asks the CompactionEngine whether the conversation is over 75% of its budget. If yes, a leaf summary is produced (optionally rolled into a condensed summary).
  3. Summary → memory. Each summary is pushed through MemoryRuntimeBridge.ingestSummary(). The MemoryRuntime classifies it, embeds it with OpenAI, upserts into LanceDB, and runs the preference extractor. Both the summary doc and any synthetic preference docs land in the vector store.
  4. Assemble. On the next turn, assemble() grabs the last 3 recent messages as a retrieval query, calls memoryRuntime.retrieve(query), then injectIntoContext(retrieval) to produce budgeted <memcc-identity>, <memcc-essential>, <memcc-context>, <memcc-recall> sections. These are interleaved with the DAG assembly (summaries + fresh tail).
  5. Hint. If compressed history exists, a systemPromptAddition is emitted telling the agent it may call lcm_grep / lcm_expand_query to recover exact detail from a summary.
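Steps 2–3 can be sketched with toy in-memory stand-ins (class and method names are illustrative, not the plugin's API):

```typescript
interface MemoryRuntimeBridge {
  ingestSummary(summary: string): Promise<void>;
}

class TinyEngine {
  private turns: string[] = [];
  constructor(private budgetTokens: number, private bridge: MemoryRuntimeBridge) {}

  // Returns true when this turn triggered compaction.
  async afterTurn(turn: string): Promise<boolean> {
    this.turns.push(turn);
    const used = this.turns.join(" ").length / 4;     // crude token estimate
    if (used < 0.75 * this.budgetTokens) return false; // step 2: under threshold
    const summary = `summary of ${this.turns.length} turns`;
    this.turns = this.turns.slice(-4);                 // keep freshTailCount=4 verbatim
    await this.bridge.ingestSummary(summary);          // step 3: hand off to memory
    return true;
  }
}
```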

🧠 Retrieval Pipeline

MemCC doesn't do a single "top-K vector search". Each retrieval fans out across four independent layers, each answering a different question, with its own token budget.

4-Layer Retrieval

The query is broadcast to L0–L3 in parallel. L3 internally runs vector → hybrid rerank → KG merge → (optional) LLM rerank. Every layer writes into its own wrapped section at inject time, with a token budget that tightens when contextOptimization=true.

Layer Source What it's for Budget
L0 Identity agentDir/* static files Who this agent is — role, persona, baseline facts. Always injected verbatim. unbounded
L1 Essential MemoryStore pinned memories Top identity fragments and high-weight pinned facts. Trimmed line-by-line to budget. 800 tok (400 in opt mode)
L2 On-Demand MemoryStore wing-filtered Memories tagged with the wing detected from the query (e.g. a "coding" wing for code questions). Skipped entirely in optimization mode. 500 tok or dropped
L3 Deep Search VectorStore + MemoryStore triples Semantic recall: vector top-20 → BM25 + temporal rerank → optional KG merge → optional LLM rerank. top 10 (top 3 in opt mode)

L3 detail — hybrid rerank

  1. Vector search — 20 candidates from LanceDB (text-embedding-3-small).
  2. Hybrid rerank (hybrid-search.ts) — BM25-style lexical scoring + temporal boost (newer memories win on ties).
  3. KG merge (knowledge-graph.ts) — if query entities match stored (subject, predicate, object) triples, a synthetic KG result is injected unless already covered.
  4. LLM rerank (llm-rerank.ts, gated by llmRerank=true, default: gpt-4o-mini) — final cross-encoder pass for the last mile of precision. Costs roughly $0.001/search.
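The rerank step (stage 2) can be sketched as a weighted blend of vector score, lexical overlap, and a recency boost. The weights, field names, and overlap measure here are assumptions — the real hybrid-search.ts uses BM25-style scoring:

```typescript
interface Candidate { text: string; createdAt: number; vectorScore: number; }

// Toy lexical score: fraction of query terms present in the candidate.
function lexicalOverlap(query: string, text: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const t = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const w of q) if (t.has(w)) hits++;
  return q.size ? hits / q.size : 0;
}

function hybridRerank(query: string, cands: Candidate[], now = Date.now()): Candidate[] {
  const score = (c: Candidate) => {
    const ageDays = (now - c.createdAt) / 86_400_000;
    const temporalBoost = 1 / (1 + ageDays); // newer memories win on ties
    return 0.6 * c.vectorScore + 0.3 * lexicalOverlap(query, c.text) + 0.1 * temporalBoost;
  };
  return [...cands].sort((a, b) => score(b) - score(a));
}
```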

💡 The layer split isn't academic — it's what makes the budget shrinkable. contextOptimization=true drops L2 entirely, halves L1, and caps L3 at 3 hits. That's the difference between "inject ~2.5k tok per turn" and "inject ~800 tok per turn" on the same query.
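The budget tightening can be expressed as a simple lookup over the numbers in the table above (the shape and function name are illustrative):

```typescript
interface LayerBudgets {
  l1Tokens: number;        // L1 Essential token budget
  l2Tokens: number | null; // null → L2 skipped entirely
  l3Hits: number;          // L3 Deep Search result cap
}

function resolveBudgets(contextOptimization: boolean): LayerBudgets {
  return contextOptimization
    ? { l1Tokens: 400, l2Tokens: null, l3Hits: 3 }  // opt mode: L2 dropped
    : { l1Tokens: 800, l2Tokens: 500, l3Hits: 10 };
}
```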


📈 Results

MemCC passes every target on both LoCoMo and LongMemEval, and posts leading numbers in temporal reasoning — historically the hardest subset for memory systems.

🏆 86.9%
LoCoMo overall (1,986 q)
🏆 80.4%
LoCoMo temporal (321 q)
🏆 88.1%
LoCoMo episodic (1,665 q)
🏆 80.9%
LongMemEval-500 (498 q)
Benchmark Score Target Status
LoCoMo overall 86.9% ≥ 50% ✅ PASS
LoCoMo episodic 88.1% ≥ 60% ✅ PASS
LoCoMo temporal 80.4% ≥ 50% ✅ PASS
LongMemEval-100 92.1% ≥ 88% ✅ PASS
LongMemEval-500 80.9% leading

🛠️ Tools

MemCC registers four tools with OpenClaw. Three of them operate on short-term context (the compacted DAG), one operates on long-term memory.

Tool Operates on What it does
memory_search Long-term (LanceDB) Semantic search across all stored memories. Returns hits from the L3 deep search layer with scores. Best for "what did we decide about X last month".
memory_get Long-term (LanceDB) Fetch a single memory by id — useful when memory_search gives an id and the agent wants the full verbatim text.
lcm_grep Short-term (context DAG) Full-text search inside the current session's messages and compacted summaries. Use this when the agent needs an exact phrase from compressed history.
lcm_expand_query Short-term (context DAG) Expand a compacted summary back to the underlying raw messages that produced it — effectively "undo this piece of the compression".

When assemble() detects compressed history, it injects a system-prompt hint telling the agent exactly which tools to reach for:

"Some earlier conversation has been compressed into summaries. If you need exact details from compressed history, use lcm_grep to search, then lcm_expand_query for full context."


⚙️ Configuration

All config flows through the OpenClaw plugin config schema defined in openclaw.plugin.json. Every option is optional — the defaults are already tuned.

{
  "plugins": {
    "memcc": {
      "contextOptimization": false,    // true → tighter budgets on L1-L3
      "compactionModel": null,          // null → inherit from OpenClaw
      "embeddingModel": "text-embedding-3-small",
      "llmRerank": true,                // ~$0.001/search — default on
      "rerankModel": "gpt-4o-mini",     // null → gpt-4o-mini default
      "vectorDedupEnabled": false,      // opt-in write-time near-duplicate suppression
      "vectorDedupThreshold": 0.95      // cosine similarity in [0,1]; skip writes ≥ threshold
    }
  }
}
Option Default Effect
contextOptimization false true halves L1 budget (800→400), drops L2 entirely, caps L3 at 3 hits, tightens compaction targets (leaf 1500→1000, condensed 2000→1200).
compactionModel null (inherit) Model used to generate leaf/condensed summaries. Provider/model string like openai/gpt-4o-mini.
embeddingModel text-embedding-3-small OpenAI embedding model — anything compatible with the OpenAI Embeddings API.
llmRerank true Runs an LLM cross-encoder pass after hybrid rerank. Set false for pure vector + BM25.
rerankModel gpt-4o-mini Model used for llmRerank. Cheap defaults recommended.
vectorDedupEnabled false When true, VectorStore.add() computes cosine similarity against the nearest existing neighbor and skips writes whose similarity ≥ vectorDedupThreshold. Per-item against persisted state only (not intra-batch). Useful for long-running ingestion workloads where the same fact may be re-ingested; opt-in because LongMemEval-style benchmarks don't exercise the dedup path.
vectorDedupThreshold 0.95 Cosine similarity threshold for write-time dedup. Valid range [0, 1] inclusive. Out-of-range or NaN values fall back to default. Tighter (0.98) suppresses fewer duplicates; looser (0.90) suppresses more.
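The write-time dedup check described above amounts to the following (an illustrative sketch — the real VectorStore.add() internals may differ; cosine() assumes equal-length vectors):

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// nearest is the embedding of the closest existing neighbor (undefined if the
// store is empty — per the table, dedup is per-item against persisted state only).
function shouldSkipWrite(newVec: number[], nearest: number[] | undefined, threshold: number): boolean {
  // Out-of-range or NaN thresholds fall back to the 0.95 default.
  const t = Number.isFinite(threshold) && threshold >= 0 && threshold <= 1 ? threshold : 0.95;
  return nearest !== undefined && cosine(newVec, nearest) >= t;
}
```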

Storage locations (auto-resolved)

$HOME/.openclaw/memcc/
├── context.db      # SQLite — conversations, messages, summaries
├── memory.db       # SQLite — memory records, triples, FTS
└── lance/          # LanceDB — vector store

Hard-coded optimal defaults

These aren't exposed in the config schema — they were tuned against the benchmark runs.

Parameter Default (opt) Notes
contextThreshold 0.75 Compaction fires at 75% of token budget.
freshTailCount 4 Last 4 messages always kept verbatim.
leafChunkTokens 4000 Max tokens collapsed into one leaf.
leafTargetTokens 1500 (1000) Target size of a leaf summary.
condensedTargetTokens 2000 (1200) Target size of a condensed summary.
leafMinFanout 3 Minimum leaves before a condensed roll-up.

📦 Installation

Requirements

  • 🟢 Node.js 22.x (for the built-in node:sqlite module)
  • 🔑 OpenAI-compatible API for embeddings + rerank
  • 🪶 OpenClaw ≥ 2026.4.2 (peer dependency)

Steps

# From the memCC directory
npm install

# Verify the plugin metadata is picked up
cat openclaw.plugin.json

# Run tests
npm test

# Run benchmarks (OPENAI_API_KEY required)
OPENAI_API_KEY=sk-... npm run bench

Runtime deps

Package Purpose
@lancedb/lancedb Vector store backend
apache-arrow Columnar transport for LanceDB
@sinclair/typebox Plugin config schema validation
node:sqlite (built-in) Context + memory metadata stores

🔬 Benchmarks

The benchmark suites are standard Vitest specs under test/benchmark/. They pull datasets from a sibling memory-lancedb-pro/bench/ directory (LongMemEval and LoCoMo).

# 🎯 LongMemEval-50 (smoke test, ~1 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-50.test.ts --testTimeout 600000

# 📏 LongMemEval-100 (~3 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-100.test.ts --testTimeout 600000

# 🏁 LongMemEval-500 full (~45 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/longmemeval-500.test.ts --testTimeout 3600000

# 🎢 LoCoMo full (~40-50 min)
OPENAI_API_KEY=<key> npx vitest run test/benchmark/locomo.test.ts --testTimeout 3600000

Files

  • test/benchmark/longmemeval.test.ts — representative subset
  • test/benchmark/longmemeval-100.test.ts — 100 questions + synthetic prefs + LLM rerank
  • test/benchmark/longmemeval-500.test.ts — full 500
  • test/benchmark/longmemeval-tail.test.ts — 420-500 slice (resumes after timeout)
  • test/benchmark/locomo.test.ts — 10 conversations, 1,986 queries

Environment (for reproducing the 2026-04-11 run)

OS:       macOS 26.4 (arm64)
Node:     22.22.0
OpenClaw: 2026.4.9
MemCC:    0.1.0
Embedder: OpenAI text-embedding-3-small (1536-dim)
Judge:    gpt-4o-mini

🗺️ Roadmap

Retrieval quality

  • Aggregation/counting path for multi-session "how many X" / "list all Y" queries (the remaining weak spot in the LongMemEval tail after date headers)
  • Landmark caching — cheap per-document metadata (author, location, entity tags extracted once at ingest) to unlock gains similar to date headers for entity-centric queries
  • Per-subset regression gates in CI so no benchmark subset can silently drop

Infrastructure

  • Alternative embedding backends (local models, Qwen3-Embedding, Voyage)
  • Configurable vector-store backend (LanceDB today; sqlite-vec as a zero-dep fallback)
  • Export/import of the memory store for agent migration

Integration

  • Multi-agent memory sharing with access control (grants already exist for sub-agents; expand to peer agents)
  • Streaming ingestion API for real-time memory updates from external events

Contributions welcome.


📄 License

MIT — see LICENSE (or package.json license field) for details.

