Episodic memory service for AI agents — automatic consolidation, neuroscience-inspired retrieval.
*(Screenshots: dashboard and graph view.)*
AI agents are stateless. When a bot conversation ends, every observation, preference, and decision it accumulated is gone. Naive solutions make this worse: storing raw messages and doing keyword search gives you a log, not a memory. Flat embeddings + cosine similarity retrieves what matches your query, not what's relevant to it.
Real memory isn't lookup. It's association: a question about Alice surfaces what you know about Alice's team, her preferences, a decision made last month that affects the answer. Getting there requires structure the agent didn't have to build manually.
Engram is grounded in the Synapse spreading activation model and the neuroscience of episodic memory. The key ideas:
- Episodes consolidate into engrams. Raw observations are transient. Repeated or semantically related episodes consolidate — via LLM summarization — into durable structured memories called engrams.
- New memories are labile. For 24 hours after formation, a memory can be updated by new related episodes. After that window closes, it freezes. This mimics the molecular biology of memory consolidation.
- Memory fades automatically. Engrams decay exponentially over time — handled by a background process, no client scheduling needed. Operational details (meeting reminders, deploy notes) decay faster than knowledge (facts, decisions, preferences). Access slows decay; reinforcement reverses it.
- Retrieval is activation, not lookup. A query seeds a spreading activation process — not a vector search — that propagates through the memory graph, surfacing relevant engrams even when they don't directly match the query.
Engram runs as a sidecar service. Any agent — Discord bot, Slack bot, Claude agent via MCP — posts raw observations to Engram, then queries it at retrieval time. The service handles everything else.
Three memory tiers + schemas:
| Tier | Type | Description |
|---|---|---|
| 1 | Episodes | Raw ingested messages, events, observations — lossless |
| 2 | Entities | Named entities (people, orgs, technologies) extracted by NER |
| 3 | Engrams | LLM-consolidated memory summaries, the primary retrieval target |
| — | Schemas | Recurring patterns extracted from L2+ engrams — behavioural generalizations surfaced at retrieval time |
Retrieval uses spreading activation. Three signals seed the activation in parallel — semantic vector similarity, lexical BM25 full-text search, and NER-matched entity lookup — then activation spreads across the engram graph through typed edges. Lateral inhibition sharpens results. A "feeling of knowing" gate returns empty rather than confabulating when memory confidence is too low.
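A toy sketch of the idea in Go: seed a few nodes, spread activation along weighted edges for a fixed number of hops, suppress below-average nodes (lateral inhibition), and return nothing when even the strongest activation is weak (the feeling-of-knowing gate). The graph shape, damping factor, and thresholds here are illustrative, not Engram's internals:

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps an engram ID to weighted neighbours, a stand-in for Engram's
// typed-edge memory graph.
type graph map[string]map[string]float64

// spread propagates seed activation through g, then filters and ranks.
func spread(g graph, seeds map[string]float64, hops int, damping, fok float64) []string {
	act := map[string]float64{}
	for id, a := range seeds {
		act[id] = a
	}
	for i := 0; i < hops; i++ {
		next := map[string]float64{}
		for id, a := range act {
			next[id] += a // retain current activation
			for nb, w := range g[id] {
				next[nb] += a * w * damping // spread along weighted edges
			}
		}
		act = next
	}
	var max, sum float64
	for _, a := range act {
		sum += a
		if a > max {
			max = a
		}
	}
	if max < fok { // feeling-of-knowing gate: better empty than confabulated
		return nil
	}
	mean := sum / float64(len(act))
	var out []string
	for id, a := range act {
		if a > mean { // crude lateral inhibition: drop below-average nodes
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return act[out[i]] > act[out[j]] })
	return out
}

func main() {
	g := graph{
		"alice":      {"alice-team": 0.8, "standup-pref": 0.6},
		"alice-team": {"q3-decision": 0.5},
	}
	fmt.Println(spread(g, map[string]float64{"alice": 1.0}, 2, 0.5, 0.1))
}
```

Note how `q3-decision` receives activation purely through the graph, even though nothing in the seed mentions it — that is the "relevant but not matching" behaviour the paragraph above describes.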
Consolidation is automatic. A background pipeline runs every 15 minutes: Claude (or Ollama) infers semantic relationships between recent episodes using a sliding window, clusters them, and summarizes each cluster into an engram. Engrams link back to their source episodes and extracted entities, building a traversable memory graph without any manual curation.
Multi-level compression. Every engram has five pre-computed pyramid summaries (4, 8, 16, 32, 64 words). Callers request the compression level that fits their token budget.
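Choosing a level is a one-liner on the caller's side. A hypothetical helper, assuming the five pre-computed sizes above and a simple word budget:

```go
package main

import "fmt"

// levels are the pre-computed pyramid summary sizes from the README.
var levels = []int{4, 8, 16, 32, 64}

// pickLevel returns the largest summary size that fits within budgetWords,
// falling back to the tersest level. A hypothetical helper, not Engram API.
func pickLevel(budgetWords int) int {
	best := levels[0]
	for _, l := range levels {
		if l <= budgetWords {
			best = l
		}
	}
	return best
}

func main() {
	fmt.Println(pickLevel(20), pickLevel(100), pickLevel(3)) // 16 64 4
}
```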
Schema induction. After enough memories accumulate, Engram extracts recurring behavioural schemas — generalizations about problem types, approaches, and what works. Schemas are pre-computed at all compression levels and surfaced automatically alongside recalled memories, letting agents apply learned patterns without storing everything in the prompt.
- Go 1.24+
- Ollama with `nomic-embed-text` pulled (for embeddings)
- One of: Anthropic API key · Claude Code CLI installed · Ollama (for consolidation)
```
go install github.com/vthunder/engram/cmd/engram@latest
```

Or build from source:
```
git clone https://github.com/vthunder/engram
cd engram
go build -o engram ./cmd/engram
```

```yaml
# engram.yaml
server:
  port: 8080
  # api_key: "your-secret-key"  # omit to disable auth (fine for local use)

storage:
  path: "./engram.db"

# Pyramid compression — fast local model is sufficient for word-count compression
compression_llm:
  provider: "ollama"
  model: "qwen2.5:7b"
  base_url: "http://localhost:11434"

# Engram summarization — Haiku produces coherent prose reliably
consolidation_llm:
  provider: "anthropic"
  model: "claude-haiku-4-5-20251001"

# Relationship/edge detection — structured JSON output
inference_llm:
  provider: "anthropic"
  model: "claude-haiku-4-5-20251001"

embedding:
  base_url: "http://localhost:11434"
  model: "nomic-embed-text"

ner:
  provider: "spacy"
  spacy_url: "http://localhost:8001"

consolidation:
  enabled: true
  interval: "15m"

decay:
  interval: "1h"  # run decay every hour (set to 0 to disable)
  lambda: 0.005   # exponential decay rate
  floor: 0.01     # minimum activation level
```

```
ANTHROPIC_API_KEY=sk-ant-... ./engram --config engram.yaml
```

No Anthropic API key? Set `consolidation_llm.provider: "claude-code"` and `inference_llm.provider: "claude-code"` to use an existing Claude Code subscription, or use `"ollama"` for a fully local setup. See Configuration for all options.
No spaCy sidecar? Set `ner.provider: "ollama"` for model-based NER, or omit the `ner` block to skip entity extraction entirely (retrieval still works via semantic + lexical seeding).
```
# Ingest an observation
curl -X POST http://localhost:8080/v1/episodes \
  -H "Content-Type: application/json" \
  -d '{"content": "Alice mentioned she prefers morning standups.", "source": "slack", "author": "alice"}'

# Query memory (spreading activation retrieval)
curl -X POST http://localhost:8080/v1/engrams/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Alice meeting preferences", "limit": 10}'

# Trigger consolidation manually
curl -X POST http://localhost:8080/v1/consolidate
```

A browser UI for browsing engrams, episodes, entities, and the memory graph ships in the `ui/` directory.
```
cd ui
npm install
npm run dev
```

Open http://localhost:5173. The dev server proxies `/v1` and `/health` to http://localhost:8080, so no CORS configuration is needed.
To point the UI at a different host (e.g. a remote server), edit `ui/public/config.json` — no rebuild required:

```json
{ "apiUrl": "http://your-server:8080", "apiKey": "your-secret-key" }
```

Engram can serve as an MCP server alongside the REST API, giving Claude agents direct memory access.
```
ENGRAM_MCP=1 ./engram --config engram.yaml
```

Add to `claude_desktop_config.json` or `.mcp.json`:
```json
{
  "mcpServers": {
    "engram": {
      "command": "/path/to/engram",
      "args": ["--config", "/path/to/engram.yaml"],
      "env": {
        "ENGRAM_MCP": "1",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```

MCP tools: `search_memory`, `list_engrams`, `get_engram`, `get_engram_context`, `query_episode`, `get_schema`.
```
┌──────────────┐      ┌──────────────────────────────────────────┐
│ Claude agent │─MCP─▶│ Engram Service                           │
│              │      │                                          │
└──────────────┘      │  ┌───────────┐   ┌─────────────────────┐ │
                      │  │ REST +    │   │ SQLite graph DB     │ │
┌──────────────┐      │  │ MCP API   │◀─▶│ sqlite-vec (KNN)    │ │
│ Bot / agent  │─────▶│  │           │   │ FTS5 (BM25)         │ │
└──────────────┘      │  └─────┬─────┘   └─────────────────────┘ │
                      │        │                                 │
                      │  ┌─────▼──────────────────────────────┐  │
                      │  │ Background pipeline                │  │
                      │  │  NER (spaCy/Ollama) · Embeddings   │  │
                      │  │  Consolidation (Claude/Ollama)     │  │
                      │  └────────────────────────────────────┘  │
                      └──────────────────────────────────────────┘
```
Engram stores everything in a single SQLite file — no external database. The sqlite-vec extension handles vector KNN; FTS5 handles lexical search. Both are bundled extensions, not separate services.
```yaml
# docker-compose.yml
services:
  bot:
    image: mybot
    environment:
      ENGRAM_URL: http://engram:8080
      ENGRAM_API_KEY: ${ENGRAM_API_KEY}

  engram:
    image: ghcr.io/vthunder/engram:latest
    environment:
      ENGRAM_SERVER_API_KEY: ${ENGRAM_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      ENGRAM_EMBEDDING_BASE_URL: http://ollama:11434
    volumes:
      - engram-data:/data
    command: ["--config", "/config/engram.yaml"]
```

Beyond spreading activation retrieval, Engram lets bots maintain a tiered conversation buffer — recent messages raw, older ones compressed, anything beyond the buffer retrievable as engrams. The `channel` field on episodes groups messages by conversation; `?before={id}` provides cursor-based pagination; `?level=N` applies pyramid compression in-band:
```
GET /v1/episodes?channel=guild:general&limit=10                                              # raw recent
GET /v1/episodes?channel=guild:general&limit=20&before={ep10_id}&level=8                     # compressed
GET /v1/episodes?channel=guild:general&limit=70&before={ep30_id}&unconsolidated=true&level=8 # buffer
```

Setting `consolidation.max_buffer` equal to the bot's fetch limit ensures older episodes are always in one place or the other — never in limbo. See API reference for the full pattern.
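The three-tier fetch can be wrapped in a small helper. The query parameters mirror the GET examples; the cursor values are placeholders the caller takes from earlier responses, and `bufferURLs` itself is a hypothetical name:

```go
package main

import (
	"fmt"
	"net/url"
)

// bufferURLs builds the three tiered-buffer queries: raw recent messages,
// compressed older ones, and the unconsolidated tail. rawCursor and
// tailCursor are episode IDs taken from prior responses.
func bufferURLs(base, channel, rawCursor, tailCursor string) []string {
	q := func(params url.Values) string {
		return base + "/v1/episodes?" + params.Encode()
	}
	return []string{
		q(url.Values{"channel": {channel}, "limit": {"10"}}),
		q(url.Values{"channel": {channel}, "limit": {"20"},
			"before": {rawCursor}, "level": {"8"}}),
		q(url.Values{"channel": {channel}, "limit": {"70"},
			"before": {tailCursor}, "unconsolidated": {"true"}, "level": {"8"}}),
	}
}

func main() {
	for _, u := range bufferURLs("http://localhost:8080", "guild:general", "ep10", "ep30") {
		fmt.Println(u)
	}
}
```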
- Conversational agents — persistent memory across sessions: preferences, decisions, relationship context
- Discord / Slack bots — remember what users said and decided, surface it when relevant
- Long-running research agents — accumulate findings over days; recall related prior work at query time
- Personal assistants — "what did I say I needed to follow up on?" answered from actual memory
- Configuration reference — all config keys, environment variable overrides
- REST API reference — all endpoints, request/response shapes
- MCP tools reference — tools available to Claude agents, usage patterns
- OpenAPI spec
Mozilla Public License 2.0. See LICENSE or https://mozilla.org/MPL/2.0/.

