_____
| ____|_ __ __ _ _ __ __ _ _ __ ___
| _| | '_ \ / _` | '__/ _` | '_ ` _ \
| |___| | | | (_| | | | (_| | | | | | |
|_____|_| |_|\__, |_| \__,_|_| |_| |_|
|___/
Package name: @proofofwork/engram
Author: danillo felix
Status: beta / work in progress. Engram is usable as a local-first single-user coding-agent memory runtime and is release-gated, but it is still pre-1.0. Treat the API, benchmark adapters, and packaging as evolving surfaces.
Engram is a local-first cognitive memory runtime for black-box LLM coding agents.
It is not a chat-history summarizer and it is not a codebase indexer. Engram stores the experience around work: failures, decisions, rejected approaches, fixes, proof tests, source tags, evidence, lifecycle state, and recall traces. Before the next agent action, it compiles a small working-memory packet so the LLM gets the right prior experience without resending the whole conversation.
Think of it as a small engineering brain in front of the model. The LLM still does the reasoning and code generation, but Engram handles the memory discipline that a stateless model does not have: what to write down, what needs review, what was already disproven, what should be recalled now, and what should stay out of the prompt.
The end goal is not "memory" as a feature. The end goal is more efficient coding agents: fewer repeated mistakes, less repeated explanation, less context resending, and smaller prompts that still carry the engineering experience that matters.
Coding agents repeat expensive mistakes because most LLM calls are stateless. The usual workaround is to resend long context, summarize chat logs, or index the entire codebase. Engram takes a narrower path: it remembers what happened while working and recalls only the experience that is relevant to the next task.
This makes Engram useful when you want an agent to remember:
- a prior test failure and the exact fix that solved it
- a rejected approach the user does not want repeated
- a file, symbol, command, or test associated with a lesson
- whether a memory came from the user, a tool, an outcome, or an agent inference
- weak inferred memories that should be reviewed before active recall
That means Engram is not trying to answer "what code exists in this repository?" It is trying to answer "what should the agent know before it acts, so it avoids burning tokens and time on a failure path we already proved?"
- No full-context resend required: Engram compiles bounded task packets from stored memories, lessons, and warnings.
- Context savings without provider APIs: Engram reports packet size, selected memories, dropped memories, and estimated savings for Claude Code, Codex, Cline, OpenCode, OpenAI, and generic callers. Savings percentages are shown only when Engram has an observed or caller-provided baseline.
- No source-code indexing: file paths, symbols, commands, and tests are memory tags only. This keeps Engram separate from code retrieval products.
- Anti-repeat by design: Lesson Cards store failure -> trigger -> wrong approach -> fix -> proof test, then rank above generic memories during coding work.
- Learns from outcomes: recall applies time decay and outcome-weighted reinforcement, so fresh and proven lessons earn attention while stale generic memories fade unless pinned.
- Evidence-aware recall: memories are labeled as user-stated, tool-observed, agent-inferred, test-proven, outcome-confirmed, or system-configured.
- Reviewable memory: weak inferred memories enter the Memory Inbox instead of active recall.
- Auditable lifecycle: memories can be raw, active, reinforced, superseded, contradicted, archived, pinned, and audited.
- Agent-ready integration: Engram exposes a REST API, Python CLI, global npm wrapper, MCP server, and hook adapter.
- Onboarding and fragility: `engram onboard` and `engram fragility` surface what tends to go wrong in a repo or file without indexing source code.
- Portable lessons: Lesson packs export/import durable anti-repeat knowledge across machines or projects.
- Smoke score plus evals: `scripts/live_score.py` runs a temp-database smoke scenario, while `engram eval run` and `engram bench ...` report precision, stale suppression, decoy contamination, token reduction, lesson recall, and anti-repeat benchmark metrics.
- Benchmark adapters: compare Engram against local baselines, Mem0, or any subprocess-based competitor adapter on the same anti-repeat dataset.
- Release-gated heavy local proof: the release gate covers unit/integration tests, live score, integrity checks, artifact hygiene, MCP initialization, package install smoke tests, a 10k benchmark, and a 100k-memory stress gate with drift cycles, concurrent hooks, repeated sleep/consolidation, and final integrity verification.
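The "learns from outcomes" bullet above can be pictured with a small sketch. This is an illustration only: the exponential half-life, the slower decay for pinned memories, the step size, and the function names are all assumptions, not Engram's actual ranking code.

```python
def decayed_score(base: float, age_days: float,
                  half_life_days: float = 30.0, pinned: bool = False) -> float:
    """Exponential time decay; pinned memories decay more slowly (assumed 4x half-life)."""
    half_life = half_life_days * 4 if pinned else half_life_days
    return base * 0.5 ** (age_days / half_life)


def apply_outcome(utility: float, signal: str, step: float = 0.1) -> float:
    """Nudge a memory's utility up or down based on a feedback signal, clamped to [0, 1]."""
    # Only the signal names visible in this README are handled; others are ignored.
    delta = {"used": step, "helpful": step, "ignored": -step}.get(signal, 0.0)
    return min(1.0, max(0.0, utility + delta))


fresh = decayed_score(1.0, age_days=0)
stale = decayed_score(1.0, age_days=30)
stale_pinned = decayed_score(1.0, age_days=30, pinned=True)
print(fresh, stale, round(stale_pinned, 3))
print(apply_outcome(0.5, "used"), apply_outcome(0.5, "ignored"))
```

In this sketch a generic memory loses half its rank after one half-life, while a pinned memory of the same age keeps most of it.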
Engram runs a write, manage, read loop around coding-agent work:
- Write: hooks, MCP tools, the CLI, or the REST API record engineering events such as task starts, failed commands, passing tests, rejected approaches, user corrections, and final outcomes.
- Extract: the runtime turns those events into typed memories, file tags, source tags, Lesson Cards, Error Book entries, and audit records.
- Gate: trusted evidence can enter active recall; weak agent inference is quarantined in the Memory Inbox until reviewed.
- Manage: duplicates reinforce existing memories, stale records are archived or superseded, FTS rows stay synchronized, and link inference builds a small associative graph between related memories.
- Meter: Engram estimates packet tokens locally and labels savings as measured, estimated, or unavailable depending on the available baseline.
- Read: before the next agent step, Engram activates only the relevant lessons, failures, warnings, decisions, files, symbols, commands, and proof tests for the current goal.
The output is a compact working-memory packet, not a transcript. That is the core tradeoff: Engram gives the agent remembered experience without turning the prompt into a full chat replay or a whole-repository index.
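A minimal sketch of the read step, assuming each memory already carries a relevance score and a local token estimate. `Memory`, `compile_packet`, and the greedy selection rule are hypothetical names and choices for illustration; Engram's real activation and ranking are not shown in this README.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    id: str
    content: str
    score: float  # hypothetical relevance score, higher is better
    tokens: int   # hypothetical local token estimate


def compile_packet(goal: str, memories: list[Memory], budget_tokens: int) -> dict:
    """Greedily keep the highest-scoring memories that fit within the token budget."""
    selected, dropped, used = [], [], 0
    for mem in sorted(memories, key=lambda m: m.score, reverse=True):
        if used + mem.tokens <= budget_tokens:
            selected.append(mem)
            used += mem.tokens
        else:
            dropped.append(mem)
    lines = [f"# Goal\n{goal}\n\n# Memories"] + [f"- {m.content}" for m in selected]
    return {
        "context_packet": "\n".join(lines),
        "selected_ids": [m.id for m in selected],
        "dropped_ids": [m.id for m in dropped],
        "tokens_used": used,
    }


packet = compile_packet(
    "Fix graph recall regression",
    [
        Memory("mem_1", "Prior fix: alias link_type separately from memory type.", 0.9, 40),
        Memory("mem_2", "Rejected: do not rename memory.type.", 0.8, 30),
        Memory("mem_3", "Generic note about formatting.", 0.1, 50),
    ],
    budget_tokens=80,
)
print(packet["selected_ids"], packet["dropped_ids"])
```

The point of the sketch is the shape of the result: a bounded packet plus a record of what was selected and what was left out.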
Most memory tools store conversations, summaries, vectors, or code chunks. Engram stores engineering experience with provenance and lifecycle controls.
- Experience, not chat logs: memories are organized around work outcomes, not raw conversation history.
- Lessons, not vague summaries: a Lesson Card captures the failure trigger, the wrong approach, the durable fix, and the proof test that verified it.
- Evidence before trust: each memory carries an evidence class, and low-trust inferences stay reviewable.
- Local auditability: mutation history, recall traces, inbox decisions, and replay fixtures remain in a local SQLite database.
- Agent-neutral surface: Claude Code, Codex, OpenCode, Cline, shell scripts, and custom tools can all talk to the same runtime through MCP, hooks, CLI, or HTTP.
- No code-index dependency: Engram can remember "last time this test failed, this fix worked" without pre-indexing the repository.
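As a rough picture of the Lesson Card primitive described above, the sketch below models it as a plain dataclass. The field names follow the prose (failure, trigger, wrong approach, fix, proof test, evidence, source tags), but the real schema may differ, and the `trigger` value is an invented example.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class LessonCard:
    # Hypothetical shape; mirrors the failure -> trigger -> wrong approach -> fix -> proof test chain.
    failure: str
    trigger: str
    wrong_approach: str
    fix: str
    proof_test: str
    evidence: str = "test-proven"
    source_tags: dict = field(default_factory=dict)


card = LessonCard(
    failure="Graph recall failed because link.type shadowed memory.type.",
    trigger="Expanding the graph during recall.",  # illustrative value
    wrong_approach="Renaming memory.type.",
    fix="Alias link_type separately from memory type.",
    proof_test="tests/test_runtime.py",
    source_tags={
        "files": ["server/engram/recall_service.py"],
        "symbols": ["RecallService._expand_graph"],
    },
)
print(asdict(card)["fix"])
```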
The memory-agent market is crowded, and Engram should be judged by its narrow mission rather than by generic memory-platform claims.
- Mem0 is a broad universal memory layer with open-source and hosted paths, SDKs, MCP/agent integrations, and published benchmark claims. Its paper reports lower latency and token cost versus full-context methods on long-dialogue memory tasks: arXiv:2504.19413.
- Zep / Graphiti focuses on temporal context graphs: evolving entities, facts, relationships, provenance episodes, and temporal validity windows. Zep's paper frames this as a temporal knowledge graph architecture for agent memory: arXiv:2501.13956.
- Letta Code is a memory-first agent runtime. It keeps durable agent memory, supports self-editing memory, and can launch sleep-time reflection subagents.
- AgentMemory is a direct coding-agent memory competitor with hooks, MCP, benchmark claims, and broad agent support.
- MCP server-memory is the simple reference-style knowledge-graph memory server many MCP users can reach for by default.
Engram's wedge is different: local, governed, anti-repeat engineering memory for black-box coding agents. It stores the work experience around failures, decisions, rejected approaches, proof tests, and outcomes. It deliberately does not index the codebase and does not ask users to adopt a full agent runtime.
What Engram can claim today:
- It is a beta local-first coding-agent memory runtime with REST, CLI, MCP, hooks, a TypeScript client, an inspector, and reproducible release gates.
- It reduces prompt bloat by sending bounded working-memory packets instead of replaying whole chat history, and reports savings when a baseline is available.
- It has a differentiated Lesson Card primitive: failure, trigger, wrong approach, fix, proof test, source tags, evidence, and lifecycle.
- It is regression-gated with `92` passing tests, live score `100 / 124`, anti-repeat score `1.0` on the committed smoke dataset, LongMemEval-style tiny-fixture recall, a 100k-memory stress gate, gitleaks, wheel/npm smoke installs, and MCP initialization.
- On the checked-in local raw Mem0 smoke comparison, Engram scores `1.0` and local raw Mem0 scores `0.9` for Engram's anti-repeat dataset because Engram preserves the rejected-approach signal.
What Engram should not claim yet:
- Not "better than Mem0/Zep/Letta/AgentMemory" broadly. Those products solve wider or different problems and publish their own benchmark surfaces.
- Not a hosted/team memory platform.
- Not a codebase indexer or replacement for RepoRecall, Cursor, Continue, Aider maps, or Sourcegraph-style code search.
- Not a learned neural memory controller. Engram uses deterministic lifecycle, decay, outcome weighting, and audit, not model-weight training.
Current verification target:
uv run pytest -q
uv run python scripts/live_score.py
uv run python scripts/stress_gate.py
uv run python scripts/release_gate.py
Current local evidence for v0.10: 92 passed, live score 100 / 124, anti-repeat
benchmark score 1.0, LongMemEval-style recall_at_5=1.0, production stress
gate passed, and release gate passed. Treat live_score.py as a release smoke
test plus eval sanity check, not a scientific benchmark against other memory
systems.
flowchart LR
Agent[Claude Code / Codex / OpenCode / Cline] --> Surface[Hooks / MCP / CLI / REST / TS Client]
Surface --> Runtime[MemoryRuntime Facade]
Runtime --> Services[Storage / Recall / Lifecycle / Lesson / Setup / Replay]
Services --> Extract[Typed Extraction Workers]
Extract --> Gate[Evidence + Admission Gate]
Gate --> Active[(Active / Reinforced Memories)]
Gate --> Inbox[(Memory Inbox)]
Active --> Links[(Associative Links)]
Services --> Lessons[(Lesson Cards)]
Services --> Feedback[(Outcome Feedback)]
Services --> Packs[(Lesson Packs)]
Services --> Onboard[Onboarding + Fragility]
Services --> ErrorBook[(Error Book)]
Services --> Audit[(Mutation Audit)]
Services --> Replay[(Replay + Eval Fixtures)]
Services --> Bench[Benchmark Adapter Registry]
Links --> Recall[Activation + Ranking]
Lessons --> Recall
ErrorBook --> Recall
Recall --> Meter[Context Meter]
Audit --> Integrity[Integrity Checks]
Replay --> Integrity
Bench --> Local[Engram / Full History / No Memory]
Bench --> Mem0[Mem0 Adapter]
Bench --> External[Subprocess Adapter]
Local --> Reports[Normalized Benchmark Reports]
Mem0 --> Reports
External --> Reports
Feedback --> Recall
Packs --> Lessons
Onboard --> Packet
Meter --> Stats[Context Stats]
Meter --> Packet[Bounded Working-Memory Packet]
Stats --> Reports
Packet --> Agent
Engram sits in front of the LLM. The model remains a black box; Engram manages what remembered experience should be visible for the next step.
Engram does not require Claude, Codex, Cline, OpenCode, or any model provider to
return token usage. It measures the packet it creates, estimates tokens with a
provider profile, and returns structured context_stats beside the packet:
- `packet_chars`, `packet_bytes`, and `packet_tokens_estimated`
- `selected_memory_count`, `dropped_low_roi_count`, and `abstained`
- `baseline_tokens_estimated` and `estimated_context_savings_percent` when a baseline is known
- `token_estimation_source`: `profile_estimate` or `char_fallback` unless a future integration supplies provider-reported usage
- `savings_confidence`: `measured`, `estimated`, or `unavailable`
The stats are metadata, not prompt content. Hook mode writes the packet to the
agent and logs the estimate outside the injected prompt; CLI, REST, MCP, and the
TypeScript client receive the same context_stats object.
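The savings arithmetic can be sketched as follows. This is a hedged illustration, not Engram's meter: the four-characters-per-token fallback ratio and the rule that a caller-provided baseline yields `estimated` confidence are assumptions, while the field names come from the list above.

```python
import math
from typing import Optional


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Character-based fallback; the 4.0 ratio is an assumption for illustration.
    return math.ceil(len(text) / chars_per_token)


def context_stats(packet_tokens: int, baseline_tokens: Optional[int]) -> dict:
    """Report packet size always; report a savings percentage only with a baseline."""
    if not baseline_tokens:
        return {
            "packet_tokens_estimated": packet_tokens,
            "estimated_context_savings_percent": None,
            "savings_confidence": "unavailable",
        }
    savings = round((1 - packet_tokens / baseline_tokens) * 100, 1)
    return {
        "packet_tokens_estimated": packet_tokens,
        "baseline_tokens_estimated": baseline_tokens,
        "estimated_context_savings_percent": savings,
        "savings_confidence": "estimated",  # assumed label for a caller-provided baseline
    }


with_baseline = context_stats(420, 9000)
without_baseline = context_stats(420, None)
print(with_baseline["estimated_context_savings_percent"])
print(without_baseline["savings_confidence"])
```

With a 420-token packet against a 9000-token baseline, the formula gives (1 - 420 / 9000) x 100, which rounds to 95.3.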
engram think "What should the agent avoid repeating?" \
--provider-profile claude \
--baseline-tokens 9000
engram work prepare "Fix recall regression" \
--provider-profile codex \
--baseline-text "$(cat /tmp/session-history.txt)"
For human-readable savings output in the terminal, add --format table to
engram think or engram work prepare:
engram work prepare "Fix recall regression" \
--provider-profile codex \
--baseline-tokens 9000 \
--format table
If no baseline exists, Engram still reports packet size but does not claim percentage savings:
{
"packet_tokens_estimated": 420,
"estimated_context_savings_percent": null,
"savings_confidence": "unavailable"
}
stateDiagram-v2
[*] --> Raw: weak agent inference
[*] --> Active: user/tool/test/outcome evidence
Raw --> Active: inbox approve
Raw --> Archived: inbox reject/archive
Active --> Reinforced: duplicate or repeated use
Active --> Superseded: safe supersession
Active --> Contradicted: explicit conflict
Active --> Archived: manual/archive policy
Reinforced --> Superseded
Reinforced --> Archived
Superseded --> Archived: retention policy
Contradicted --> Archived: retention policy
note right of Superseded
Supersession requires stronger lexical
and embedding agreement, is capped per
event, and writes an event-level audit row.
end note
The lifecycle avoids treating every generated statement as truth. User-stated, tool-observed, test-proven, outcome-confirmed, and system-configured memories can become active immediately. Low-confidence agent-inferred memory stays reviewable.
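The state diagram above can be read as a transition table. The sketch below encodes exactly those edges; the function names are hypothetical, and it simplifies the gate by sending every agent-inferred memory to `raw` regardless of confidence.

```python
TRUSTED_EVIDENCE = {
    "user-stated", "tool-observed", "test-proven",
    "outcome-confirmed", "system-configured",
}

# Allowed transitions, copied from the state diagram.
TRANSITIONS = {
    "raw": {"active", "archived"},
    "active": {"reinforced", "superseded", "contradicted", "archived"},
    "reinforced": {"superseded", "archived"},
    "superseded": {"archived"},
    "contradicted": {"archived"},
    "archived": set(),
}


def initial_state(evidence_class: str) -> str:
    """Trusted evidence starts active; anything else waits in the inbox as raw."""
    return "active" if evidence_class in TRUSTED_EVIDENCE else "raw"


def transition(state: str, target: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target


state = initial_state("agent-inferred")   # starts as raw
state = transition(state, "active")       # inbox approve
state = transition(state, "reinforced")   # duplicate or repeated use
print(state)
```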
sequenceDiagram
participant User
participant Agent
participant Engram
participant Tests
User->>Agent: Fix graph recall bug
Agent->>Engram: prepare goal context
Engram-->>Agent: prior lessons + warnings
Agent->>Engram: work start
Engram-->>Agent: session id
Agent->>Tests: run pytest
Tests-->>Agent: failure
Agent->>Engram: record test_failed + file/symbol/command/test
Agent->>Engram: record rejected approach or decision
Agent->>Tests: run proof test
Tests-->>Agent: pass
Agent->>Engram: finish with fix + proof test
Engram->>Engram: create Lesson Card + Error Book entry
Engram->>Engram: sync FTS, audit, replay/eval state
User->>Agent: Fix related recall bug
Agent->>Engram: work prepare
Engram-->>Agent: packet with prior failure, rejected approach, fix, proof
This is the core behavior Engram is built to protect: the next agent should not walk into a known failure loop if the system already saw the mistake and fix.
The consume side is explicit. engram_prepare and engram_think are the calls
that read Engram and return a bounded context packet. The other calls either
write new experience, inspect specific records, or close the feedback loop so
future recall gets better.
The direction labels below describe the agent workflow. Some consume calls still write runtime metadata internally, such as recall traces, compiled workspace records, memory use counts, and Lesson Card hit counts.
flowchart TD
A[Start task] --> B[READ engram_onboard]
B --> C[READ engram_lessons + engram_error_book]
C --> D[READ engram_prepare]
D --> E[Agent reads returned context_packet]
E --> F[WRITE engram_start]
F --> G{Work loop}
G --> H[WRITE engram_record]
G --> I[READ engram_think on scope pivot]
G --> J[READ engram_inspect / audit / inbox]
G --> K[WRITE engram_feedback for used or bad recalled memory]
H --> G
I --> G
J --> G
K --> G
G --> L[WRITE engram_finish]
L --> M[BOTH engram_sleep consolidation]
| Step | Agent direction | MCP tool | CLI equivalent | When to call it | What the agent consumes or writes |
|---|---|---|---|---|---|
| 1 | Read | `engram_onboard` | `engram onboard --scope <scope>` | Once at the start of a repo session or when entering an unfamiliar scope. | Reads a small orientation packet: lessons, recurring failures, fragile files, useful commands, open loops, and rejected approaches. |
| 2 | Read | `engram_lessons` | `engram lessons --scope <scope>` | Before touching an area that may have prior failure history. | Reads anti-repeat Lesson Cards: failure, trigger, wrong approach, fix, proof tests, and source tags. |
| 3 | Read | `engram_error_book` | `engram error-book --scope <scope> --mode structured` | Before retrying a known-fragile area or debugging a repeated failure. | Reads structured prior failures, warnings, rejected approaches, and user corrections. |
| 4 | Read | `engram_fragility` | `engram fragility --scope <scope> --file <path>` | Before editing a file that might have burned previous sessions. | Reads file-scoped experience tags; Engram does not index source code. |
| 5 | Read, main consume point | `engram_prepare` | `engram work prepare "<goal>" --scope <scope> --file <path> --symbol <symbol>` | Before non-trivial coding work. Pass known files, symbols, budget, intent, provider profile, baseline text/tokens, and session_id if a durable session already exists. | Returns workspace.context_packet, selected memory metadata, warnings, error_book, trace_id, and context_stats. This is the normal replacement for filling the prompt with full chat or broad docs. |
| 6 | Write | `engram_start` | `engram work start "<goal>" --scope <scope> --agent <agent>` | When the task becomes a durable implementation or debugging session. | Writes a work session and returns session_id; carry it through later work calls. |
| 7 | Write | `engram_record` | `engram work record <session_id> <kind> "<content>" ...` | Whenever something meaningful happens: plan, decision, rejected approach, failed command, passing test, user correction, referenced file, or open loop. | Writes event evidence, source tags, extracted memories, Lesson Card candidates, Error Book inputs, and audit records. Use kind="note" for uncertain agent inference so it goes through review. |
| 8 | Read | `engram_think` | `engram think "<goal>" --scope <scope>` | Mid-flight when the task pivots and you need fresh context without starting another work session. | Returns the same bounded workspace shape as prepare, but without attaching it to work-session lifecycle. |
| 9 | Read | `engram_memories`, `engram_inspect`, `engram_audit`, `engram_trace`, `engram_inbox` | `engram memories`, `engram memory <id>`, `engram audit`, `engram trace <id>`, `engram inbox` | When a recalled memory looks surprising, stale, too weak, or needs provenance review. | Reads the specific record, lifecycle, mutation history, recall trace, or pending review item instead of dumping the whole memory store into context. |
| 10 | Write | `engram_feedback` | `engram feedback <memory_id> <used | helpful | ignored
| 11 | Write | `engram_finish` | `engram work finish <session_id> "<outcome>" --file <path> --test <test>` | Before the agent final response. Include success/failure, files, tests, and remaining risk. | Writes the final outcome and applies outcome feedback to memories and Lesson Cards recalled for the session. |
| 12 | Both | `engram_sleep` | `engram sleep --scope <scope>` | Periodically, after substantial work or before sharing lesson packs. | Reads recent raw work, consolidates duplicates, updates Error Book/wiki state, and writes lifecycle changes. |
baseline_text and baseline_tokens matter because they tell Engram what the
agent already has in context. With a baseline, engram_prepare can report
estimated savings and avoid returning redundant memory. Without one, Engram
still returns a bounded packet and marks savings confidence as unavailable.
Concrete MCP trace:
{
"tool": "engram_prepare",
"arguments": {
"goal": "Fix graph recall regression",
"scope": "repo/engram",
"files": ["server/engram/recall_service.py"],
"symbols": ["RecallService._expand_graph"],
"budget_tokens": 1500,
"intent": "working_and_errors",
"provider_profile": "codex",
"baseline_tokens": 9000,
"show_context_stats": true
}
}
The agent reads only the returned packet fields it needs:
{
"workspace": {
"id": "wrk_...",
"context_packet": "# Goal\nFix graph recall regression\n\n# Memories\n...",
"memories": [{"id": "mem_...", "content": "Prior fix ..."}],
"trace_id": "trc_...",
"context_stats": {
"packet_tokens_estimated": 420,
"estimated_context_savings_percent": 95.3
}
},
"warnings": [],
"error_book": []
}
Then the write side starts and records events:
{"tool": "engram_start", "arguments": {"goal": "Fix graph recall regression", "scope": "repo/engram", "agent": "codex"}}
{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "plan_proposed", "content": "Alias link_type separately from memory type.", "files": ["server/engram/recall_service.py"]}}
{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "approach_rejected", "content": "Do not rename memory.type; it breaks memory extraction semantics.", "files": ["server/engram/recall_service.py"]}}
{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "test_failed", "content": "Graph recall regression reproduced.", "commands": ["uv run pytest tests/test_runtime.py -q"], "tests": ["tests/test_runtime.py"]}}
{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "test_passed", "content": "Regression test passes after link_type alias.", "commands": ["uv run pytest tests/test_runtime.py -q"], "tests": ["tests/test_runtime.py"]}}
If the prepared packet included a memory that mattered, close that loop:
{
"tool": "engram_feedback",
"arguments": {
"memory_id": "mem_...",
"signal": "used",
"scope": "repo/engram",
"actor": "codex",
"reason": "Selected the link_type alias fix from the prepared packet.",
"session_id": "ses_...",
"workspace_id": "wrk_...",
"trace_id": "trc_..."
}
}
Finish the work session:
{
"tool": "engram_finish",
"arguments": {
"session_id": "ses_...",
"outcome": "Fixed graph recall by aliasing link_type separately from memory type.",
"success": true,
"files": ["server/engram/recall_service.py", "tests/test_runtime.py"],
"tests": ["uv run pytest tests/test_runtime.py -q"],
"remaining_risk": null
}
}
For the next related task, the agent consumes memory again with
engram_prepare or engram_think. It should not replay the full previous
session unless the user explicitly asks for that transcript.
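The trace above uses a shorthand `{"tool": ..., "arguments": ...}` form. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 `tools/call` requests. The helper below is a minimal sketch of that wrapping; `to_jsonrpc` is a hypothetical name, and most MCP client libraries do this step for you.

```python
import json
from itertools import count

_request_ids = count(1)


def to_jsonrpc(call: dict) -> dict:
    """Wrap a shorthand tool call in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": call["tool"], "arguments": call["arguments"]},
    }


request = to_jsonrpc({
    "tool": "engram_prepare",
    "arguments": {
        "goal": "Fix graph recall regression",
        "scope": "repo/engram",
        "budget_tokens": 1500,
        "baseline_tokens": 9000,
    },
})
print(json.dumps(request, indent=2))
```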
erDiagram
EVENTS ||--o{ MEMORIES : extracts
EVENTS ||--o{ WORK_EVENTS : records
MEMORIES ||--o{ MEMORY_LINKS : relates
MEMORIES ||--o{ MEMORY_MUTATIONS : audits
MEMORIES ||--o{ MEMORY_INBOX : reviews
MEMORIES ||--o{ MEMORY_FEEDBACK : receives
LESSON_CARDS ||--o{ LESSON_FEEDBACK : receives
LESSON_CARDS ||--o{ LESSON_PACKS : exports
WORK_SESSIONS ||--o{ WORK_EVENTS : contains
WORK_EVENTS ||--o{ LESSON_CARDS : creates
RECALL_TRACES ||--o{ WORKSPACES : compiles
EVAL_RUNS ||--o{ RECALL_TRACES : measures
RUNTIME_LOCKS ||--o{ WORK_SESSIONS : serializes
The database is local SQLite. The default path is .engram/engram.sqlite3.
Embeddings and Lesson Card source IDs are stored on the memory and lesson rows
with provenance metadata. All major behavior is available through the CLI and
API.
- Admission gate: duplicate memories reinforce existing records instead of polluting active recall.
- Evidence classes: memories are labeled by source quality.
- Lifecycle states: raw, active, reinforced, superseded, contradicted, and archived records are preserved with audit history.
- Intent-aware recall: hooks and tools can request skip, working-only, working-and-errors, or full memory packets.
- Attention decay: stale generic memories lose rank over time; pinned and high-risk safety memories decay conservatively.
- Outcome-weighting: successful sessions boost the recalled memories and Lesson Cards that helped; harmful/ignored context is demoted.
- Error Book: prior failures can be returned as structured failure, cause, fix, proof-test, warning, and source-id entries.
- Lesson Cards: durable failure -> trigger -> wrong approach -> fix -> proof test cards that rank above generic memories during coding work.
- Memory Inbox: weak inferred memories stay reviewable instead of entering active recall.
- Proof Mode: `engram eval run` produces local metrics for lesson recall, stale suppression, token savings, and lesson count.
- Setup/Doctor: `engram setup` and `engram doctor` make agent wiring inspectable.
- Replay: export and replay real Engram-assisted sessions as fixtures.
- Inspector: open `inspector/index.html` against a local Engram server for lessons, inbox, audit, eval, and packet preview.
- Integrity: `engram integrity verify` checks FTS drift, orphan links, stale inbox items, embedding drift, and missing mutation audit records.
- Project onboarding: `engram onboard` returns top lessons, warnings, useful commands, fragile files, and ignored approaches for a scope.
- Fragility heatmap: `engram fragility` ranks files by prior failures, lessons, tests, and related memories using tags from work events.
- Lesson packs: `engram lessons export/import` moves reviewed Lesson Cards between projects without moving chat logs or source indexes.
- Explicit services: `MemoryRuntime` is now a delegating facade over service objects sharing a `RuntimeContext`, not a mixin stack.
- Embedding provenance: each memory records provider, model, and dimensions.
- Semantic opt-in: hash embeddings remain the deterministic default; sentence-transformers stay behind the `semantic` extra.
- Embedding rebuild: `engram embeddings rebuild` recomputes stored embeddings for the active provider.
- Supersession safety: corrections require stronger lexical and embedding agreement, are capped per event, and write an event-level audit row with affected memory IDs.
- Admission hardening: caller-supplied evidence metadata is ignored unless it comes from an internal trusted path; agent-inferred decisions, warnings, errors, corrections, and rejected approaches stay in RAW/Inbox.
- Cross-process safety: sleep and supersession use SQLite runtime locks in addition to in-process locks.
- Production stress proof: `scripts/stress_gate.py` verifies a 100k-memory local store, repeated lifecycle drift checks, concurrent hook writes, repeated sleep/consolidation cycles, and final global integrity.
- Release proof: `engram release check` runs the release gate, including strict doctor, 10k benchmark, production stress gate, Python wheel smoke, npm smoke, TypeScript build, and MCP initialize.
- Inspector triage: the HTML inspector can approve, reject, or archive Memory Inbox items.
- Schema registry: CLI argparse and MCP tool schemas are generated from one Pydantic-backed command registry.
- Post-commit graph work: bounded link inference runs after the write transaction and recall drains completed link jobs before packet assembly while continuing safely if background link inference is still pending under load.
- Eval hygiene: lesson/stale/decoy metrics are ID- and relevance-based instead of substring-based.
- Decision extraction: casual wording like "I decided to take a break" no longer becomes an active engineering decision.
- Learning core: recall-time decay, outcome-weighted lesson utility, onboarding, fragility reports, and lesson pack import/export are covered by tests.
- Context metering: provider-independent packet stats and baseline-aware savings estimates are returned out of band for Claude, Codex, Cline, OpenCode, OpenAI, MCP, CLI, REST, and TS clients.
- Production hardening: bearer auth, CORS allowlists, rate limits, request-size limits, liveness/readiness endpoints, no-store security headers, and artifact content checks are release-gated.
- Current proof: `92` pytest tests pass, `scripts/live_score.py` reports `100 / 124`, benchmark gates pass, and `scripts/release_gate.py` passes locally.
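To illustrate why a hash embedding can serve as a deterministic default, here is a sketch of the generic hashing trick: each token is hashed into a fixed-size vector, so the same text always produces the same embedding with no model download. This is not Engram's implementation; the SHA-256 choice, the 64 dimensions, and the provenance values are assumptions.

```python
import hashlib
import math


def hash_embedding(text: str, dimensions: int = 64) -> list[float]:
    """Map tokens to signed buckets via SHA-256, then L2-normalize."""
    vec = [0.0] * dimensions
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimensions
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


first = hash_embedding("graph recall regression")
second = hash_embedding("graph recall regression")
# Hypothetical provenance record, following the provider/model/dimensions bullet.
provenance = {"provider": "hash", "model": "sha256-hashing-trick", "dimensions": len(first)}
print(first == second, provenance)
```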
uv sync --extra dev
uv run engram init
uv run uvicorn engram.app:app --app-dir server --host 127.0.0.1 --port 8000 --reload
Write/update detected agent instruction files with an Engram-owned marker block:
uv run engram setup all --apply
`setup all` auto-detects existing instruction files in the current directory
such as AGENTS.md, CLAUDE.md, OPENCODE.md, or CLINE.md. To create a
specific agent file explicitly, pass that agent name, for example
uv run engram setup claude --apply.
Setup supports optional policy modes:
- `off`: no automatic hook activity.
- `advisory`: instruction-only reminders.
- `assist`: default; hooks inject context and record outcomes when available.
- `enforce`: hooks fail when lifecycle checkpoints are missing, such as tool/stop events without a prepared Engram session.
uv run engram setup claude --policy assist --apply
uv run engram-hook UserPromptSubmit --agent claude-code --policy enforce
Global CLI wrapper:
npm install -g @proofofwork/engram
# or, from this repository
npm link ./packages/cli
engram --help
engram lessons --scope demo
engram error-book --scope demo --mode structured
engram-mcp
engram-hook UserPromptSubmit --agent claude-code
uv run engram init
uv run engram observe "User prefers concise engineering plans" --scope demo
uv run engram observe "Maybe FragileIdea exists." --scope demo --kind assistant --actor assistant
uv run engram think "How should I answer this user?" --scope demo
uv run engram lessons --scope demo
uv run engram onboard --scope repo/engram
uv run engram fragility --scope repo/engram --file server/engram/recall_service.py
uv run engram lessons export --scope repo/engram --output lessons.engram.json
uv run engram lessons import lessons.engram.json --scope repo/other
uv run engram error-book --scope demo
uv run engram error-book --scope demo --mode structured
uv run engram inbox --scope demo
uv run engram eval run --scope demo
uv run engram setup claude
uv run engram setup all --apply
uv run engram doctor
uv run engram sleep --scope demo
`engram observe` defaults to --kind user --actor user. Use --kind assistant
or --kind tool when recording inferred agent context that should pass through
the Memory Inbox before it becomes active recall.
SESSION=$(uv run engram work start "Fix recall bug" --scope repo/engram --agent codex \
| uv run python -c 'import json,sys; print(json.load(sys.stdin)["session_id"])')
uv run engram work record "$SESSION" test_failed \
"Graph recall failed because link.type shadowed memory.type." \
--file server/engram/recall_service.py \
--symbol RecallService._expand_graph \
--command "uv run pytest -q" \
--test tests/test_runtime.py
uv run engram work finish "$SESSION" \
"Fixed recall by aliasing link_type separately from memory type." \
--file server/engram/recall_service.py \
--test tests/test_runtime.py
uv run engram work prepare "Fix related graph recall issue" \
--scope repo/engram \
--file server/engram/recall_service.py \
--symbol RecallService._expand_graph
The final command returns a working-memory packet containing the relevant lesson, warning, source tags, and proof context.
- `POST /v1/observe`: ingest raw events and extract typed memories.
- `POST /v1/think`: activate memory and compile a bounded workspace.
- `POST /v1/commit`: write LLM/tool outcomes back into memory.
- `POST /v1/sleep`: consolidate duplicates and repeated experience.
- `GET /v1/memories/{id}`: inspect evidence and lifecycle for one memory.
- `PATCH /v1/memories/{id}`: pin, unpin, archive, reinforce, supersede, or contradict a memory.
- `GET /v1/audit`: inspect memory mutation history.
- `GET /v1/lessons`: inspect anti-repeat Lesson Cards.
- `POST /v1/lessons/export`: export a portable Lesson Card pack.
- `POST /v1/lessons/import`: import a Lesson Card pack as raw or trusted lessons.
- `GET /v1/onboard`: compile project onboarding memory for a scope.
- `GET /v1/fragility`: rank fragile files from experience tags.
- `GET /v1/inbox`: inspect pending memory-review items.
- `PATCH /v1/inbox/{id}`: approve, reject, or archive review items.
- `POST /v1/evals/run`: run local Proof Mode metrics.
- `POST /v1/setup`: generate agent setup commands and config paths.
- `GET /v1/doctor`: check local Engram wiring.
- `POST /v1/fts/rebuild`: rebuild active-memory FTS rows.
- `POST /v1/embeddings/rebuild`: rebuild memory embeddings for the active provider.
- `GET /v1/integrity/verify`: verify FTS, audit, orphan, and embedding drift.
- `GET /v1/replay/export`: export a scope as a replay fixture.
- `POST /v1/replay/run`: replay a fixture into Engram and run eval.
- `POST /v1/work/start`: start an engineering work session.
- `POST /v1/work/event`: record a coding-agent work event.
- `POST /v1/work/prepare`: compile an engineering working-memory packet.
- `POST /v1/work/finish`: store the task outcome.
- `GET /v1/workspace/{id}`: inspect a compiled workspace.
- `GET /v1/trace/{id}`: inspect recall decisions.
- `GET /v1/memories`: inspect stored memories.
- `GET /v1/error-book`: read prior failures, warnings, and corrections. Pass `mode=structured` for Error Book v2 entries.
- `GET /v1/wiki`: list compiled wiki pages.
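For callers that prefer HTTP over the CLI, a request to one of these endpoints might be assembled as in the sketch below, using only the standard library. The JSON body fields (`goal`, `scope`) are assumptions borrowed from the MCP examples, and `build_think_request` is a hypothetical helper; check the server's own schema before relying on them.

```python
import json
import urllib.request
from typing import Optional


def build_think_request(base_url: str, goal: str, scope: str,
                        token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/think request."""
    body = json.dumps({"goal": goal, "scope": scope}).encode("utf-8")  # assumed field names
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/v1/think", data=body, headers=headers, method="POST"
    )


req = build_think_request(
    "http://127.0.0.1:8000", "What should the agent avoid repeating?", "demo"
)
print(req.get_method(), req.full_url)

# To send it against a running local server:
#   with urllib.request.urlopen(req) as resp:
#       workspace = json.load(resp)
```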
uv run engram-mcp
MCP tools:
- engram_observe
- engram_think
- engram_feedback
- engram_start
- engram_prepare
- engram_record
- engram_finish
- engram_sleep
- engram_memories
- engram_inspect
- engram_audit
- engram_trace
- engram_error_book
- engram_lessons
- engram_lessons_export
- engram_lessons_import
- engram_onboard
- engram_fragility
- engram_inbox
- engram_integrity
- engram_eval_summary
- engram_benchmark
- engram_setup
uv run engram-hook UserPromptSubmit --agent claude-code --policy assist
engram-hook reads hook JSON from stdin and normalizes prompt/tool/stop events into
Engram work sessions. See docs/integrations/ for Claude Code, Cline,
OpenCode, and Codex examples. Prompt hooks print only the prepared context
packet to stdout for injection; context savings are printed as a compact table
on stderr.
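The stdin/stdout/stderr split described above can be exercised from a script. The following is a minimal sketch, assuming `uv` and `engram-hook` are available on the path; the event dictionary passed in is a hypothetical shape, since the real hook JSON depends on the agent (see docs/integrations/).

```python
import json
import subprocess
from typing import Callable, Tuple

# The command shown above, split into arguments.
HOOK_COMMAND = [
    "uv", "run", "engram-hook", "UserPromptSubmit",
    "--agent", "claude-code", "--policy", "assist",
]


def run_hook(event: dict, runner: Callable = subprocess.run) -> Tuple[str, str]:
    """Pipe one hook event to engram-hook; return (context_packet, savings_table)."""
    completed = runner(
        HOOK_COMMAND,
        input=json.dumps(event),  # hook JSON goes in on stdin
        capture_output=True,
        text=True,
        check=True,               # raise if the hook exits non-zero
    )
    # stdout carries only the packet to inject; stderr carries the savings table.
    return completed.stdout, completed.stderr
```

The `runner` parameter exists only so the function can be tested without the CLI installed; in normal use the default `subprocess.run` is what executes the hook.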
uv run python scripts/live_score.py
The live score runs a temp-database smoke scenario through engram when it is
linked globally. It checks:
- Lesson Card creation
- Lesson recall in the prepared context packet
- prior failure recall
- rejected approach recall
- source tag preservation
- structured Error Book through the CLI
- eval metrics
- Memory Inbox quarantine
- doctor counts
- replay export/run
- integrity verification
Example shape:
{
  "score": 100,
  "earned": 124,
  "total": 124,
  "counts": {
    "lessons": 2,
    "inbox": 2,
    "replayed": 5,
    "error_book": 3
  }
}
live_score.py is a smoke gate. For benchmark claims, use the dedicated bench
commands:
uv run engram bench anti-repeat \
  --dataset benchmarks/datasets/smoke/anti_repeat.jsonl \
  --output /tmp/engram-anti-repeat
uv run engram bench longmemeval \
  --data benchmarks/datasets/smoke/longmemeval_tiny.jsonl \
  --output /tmp/engram-longmemeval
The anti-repeat benchmark is Engram-native: it measures whether a second agent
task recalls the prior failure, rejected approach, proof test, source tags, and
fix from an earlier work session. The LongMemEval adapter reports external
retrieval metrics such as recall_at_5, mrr, precision_at_k, abstention,
knowledge-update accuracy, latency, and context savings.
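For readers unfamiliar with the retrieval metrics named above, the sketch below shows the textbook definitions of recall@k, precision@k, and mean reciprocal rank. It is not taken from the LongMemEval adapter, which may handle ties, empty sets, or short result lists differently; the memory IDs are made up for the example.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k slots filled by relevant items (divides by k)."""
    if k <= 0:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked_results, relevant_set) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Made-up memory IDs: two relevant memories, found at ranks 2 and 4.
ranked = ["m3", "m1", "m7", "m2", "m9"]
relevant = {"m1", "m2"}
print(recall_at_k(ranked, relevant, 5))     # 1.0
print(precision_at_k(ranked, relevant, 5))  # 0.4
print(reciprocal_rank(ranked, relevant))    # 0.5
```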
Use engram bench compare --system all --dataset ... to compare Engram against
two local baselines: full-history and no-memory. Do not publish external
competitor numbers until that competitor is wired through the same adapter and
the generated results.json is checked in or attached to the release.
See docs/benchmarks.md for dataset formats, metrics, reporting rules, and the
market-proof checklist that separates usable claims from claims that still need
external adapters.
uv run engram bench compare \
  --system engram,mem0 \
  --dataset benchmarks/datasets/smoke/anti_repeat.jsonl
uv run engram bench compare \
  --system engram,external \
  --adapter-name reporecall \
  --adapter-command "uv run python adapters/reporecall_adapter.py" \
  --dataset benchmarks/datasets/smoke/anti_repeat.jsonl
Adapters are for benchmarking only. They do not change Engram recall and they do
not add code indexing. Engram sends the same dataset to each adapter, receives a
normalized packet per scenario, and scores every system with the same evaluator.
Mem0 is optional and only required when --system mem0 is selected. The generic
subprocess adapter accepts JSON on stdin and returns JSON on stdout, which makes
future Zep, AgentMemory, RepoRecall, or custom comparisons possible without
adding those products to Engram core.
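The JSON-on-stdin, JSON-on-stdout contract could be satisfied by a script along these lines. This is a sketch only: the scenario and packet field names (`id`, `scenario_id`, `recalled`) are hypothetical placeholders, and it assumes one JSON object per invocation. The actual schema is described in docs/benchmarks.md.

```python
import json
import sys
from typing import TextIO


def answer_scenario(scenario: dict) -> dict:
    """Turn one benchmark scenario into a packet (hypothetical field names)."""
    # A real adapter would query the external memory system here and
    # fill "recalled" with whatever that system returned.
    return {
        "scenario_id": scenario.get("id"),
        "recalled": [],
    }


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> int:
    """Read one JSON scenario from stdin and write one JSON packet to stdout."""
    scenario = json.load(stdin)
    json.dump(answer_scenario(scenario), stdout)
    stdout.write("\n")
    return 0
```

To use it as the target of `--adapter-command`, the file would end by calling `main()` under an `if __name__ == "__main__":` guard. Keeping the lookup in `answer_scenario` separates the external system's logic from the I/O wrapper, so it can be unit-tested on plain dictionaries.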
Install named adapter dependencies with:
uv sync --extra adapters
Current checked-in Mem0 smoke result:
benchmarks/results/mem0-smoke-local/results.json
- Engram anti-repeat score: 1.0
- local raw Mem0 anti-repeat score: 0.9
- measured gap: Engram recalls rejected approaches from Lesson Cards; the local raw Mem0 run does not
This is a narrow anti-repeat smoke result, not a broad claim against hosted Mem0 or Mem0 with an LLM extraction policy.
Engram is deliberately narrow.
It does not try to replace code search, embeddings over a repository, or a general long-term memory SaaS. It focuses on the layer coding agents are missing when they repeat mistakes: local, auditable, evidence-aware experience memory.
Use Engram when you want:
- a Claude Code, Codex, OpenCode, or Cline session to remember prior engineering outcomes
- anti-repeat protection before a risky edit
- a small working packet instead of a full transcript resend
- replayable memory-assisted sessions for local evaluation
- local-first storage you can inspect and delete
Do not use Engram as:
- a semantic code search engine
- a replacement for tests
- a source of truth for unreviewed agent guesses
- a benchmark claim against general memory platforms without running your own workload
uv run pytest -q
uv run python -m compileall -q server tests scripts
npm --prefix packages/ts-client install
npm --prefix packages/ts-client run build
cd packages/cli && npm pack --dry-run
uv run python scripts/live_score.py
uv run engram integrity verify
uv run python scripts/stress_gate.py
uv run python scripts/release_gate.py
The repository should be clean after removing generated caches and build artifacts.
See docs/audit-coverage.md for the audit synthesis coverage matrix and
docs/release.md for npm/Python release steps.