proofofwork-agency/engram

 _____
| ____|_ __   __ _ _ __ __ _ _ __ ___
|  _| | '_ \ / _` | '__/ _` | '_ ` _ \
| |___| | | | (_| | | | (_| | | | | | |
|_____|_| |_|\__, |_|  \__,_|_| |_| |_|
             |___/

Engram

Package name: @proofofwork/engram
Author: danillo felix

Status: beta / work in progress. Engram is usable as a local-first single-user coding-agent memory runtime and is release-gated, but it is still pre-1.0. Treat the API, benchmark adapters, and packaging as evolving surfaces.

Engram is a local-first cognitive memory runtime for black-box LLM coding agents.

It is not a chat-history summarizer and it is not a codebase indexer. Engram stores the experience around work: failures, decisions, rejected approaches, fixes, proof tests, source tags, evidence, lifecycle state, and recall traces. Before the next agent action, it compiles a small working-memory packet so the LLM gets the right prior experience without resending the whole conversation.

Think of it as a small engineering brain in front of the model. The LLM still does the reasoning and code generation, but Engram handles the memory discipline that a stateless model does not have: what to write down, what needs review, what was already disproven, what should be recalled now, and what should stay out of the prompt.

The end goal is not "memory" as a feature. The end goal is more efficient coding agents: fewer repeated mistakes, less repeated explanation, less context resending, and smaller prompts that still carry the engineering experience that matters.

Why It Exists

Coding agents repeat expensive mistakes because most LLM calls are stateless. The usual workaround is to resend long context, summarize chat logs, or index the entire codebase. Engram takes a narrower path: it remembers what happened while working and recalls only the experience that is relevant to the next task.

This makes Engram useful when you want an agent to remember:

  • a prior test failure and the exact fix that solved it
  • a rejected approach the user does not want repeated
  • a file, symbol, command, or test associated with a lesson
  • whether a memory came from the user, a tool, an outcome, or an agent inference
  • weak inferred memories that should be reviewed before active recall

That means Engram is not trying to answer "what code exists in this repository?" It is trying to answer "what should the agent know before it acts, so it avoids burning tokens and time on a failure path we already proved?"

Factual Sales Points

  • No full-context resend required: Engram compiles bounded task packets from stored memories, lessons, and warnings.
  • Context savings without provider APIs: Engram reports packet size, selected memories, dropped memories, and estimated savings for Claude Code, Codex, Cline, OpenCode, OpenAI, and generic callers. Savings percentages are shown only when Engram has an observed or caller-provided baseline.
  • No source-code indexing: file paths, symbols, commands, and tests are memory tags only. This keeps Engram separate from code retrieval products.
  • Anti-repeat by design: Lesson Cards store failure -> trigger -> wrong approach -> fix -> proof test, then rank above generic memories during coding work.
  • Learns from outcomes: recall applies time decay and outcome-weighted reinforcement, so fresh and proven lessons earn attention while stale generic memories fade unless pinned.
  • Evidence-aware recall: memories are labeled as user-stated, tool-observed, agent-inferred, test-proven, outcome-confirmed, or system-configured.
  • Reviewable memory: weak inferred memories enter the Memory Inbox instead of active recall.
  • Auditable lifecycle: memories can be raw, active, reinforced, superseded, contradicted, archived, pinned, and audited.
  • Agent-ready integration: Engram exposes a REST API, Python CLI, global npm wrapper, MCP server, and hook adapter.
  • Onboarding and fragility: engram onboard and engram fragility surface what tends to go wrong in a repo or file without indexing source code.
  • Portable lessons: Lesson packs export/import durable anti-repeat knowledge across machines or projects.
  • Smoke score plus evals: scripts/live_score.py runs a temp-database smoke scenario, while engram eval run and engram bench ... report precision, stale suppression, decoy contamination, token reduction, lesson recall, and anti-repeat benchmark metrics.
  • Benchmark adapters: compare Engram against local baselines, Mem0, or any subprocess-based competitor adapter on the same anti-repeat dataset.
  • Release-gated heavy local proof: the release gate covers unit/integration tests, live score, integrity checks, artifact hygiene, MCP initialization, package install smoke tests, a 10k benchmark, and a 100k-memory stress gate with drift cycles, concurrent hooks, repeated sleep/consolidation, and final integrity verification.

How It Works

Engram runs a write, manage, read loop around coding-agent work:

  1. Write: hooks, MCP tools, the CLI, or the REST API record engineering events such as task starts, failed commands, passing tests, rejected approaches, user corrections, and final outcomes.
  2. Extract: the runtime turns those events into typed memories, file tags, source tags, Lesson Cards, Error Book entries, and audit records.
  3. Gate: trusted evidence can enter active recall; weak agent inference is quarantined in the Memory Inbox until reviewed.
  4. Manage: duplicates reinforce existing memories, stale records are archived or superseded, FTS rows stay synchronized, and link inference builds a small associative graph between related memories.
  5. Meter: Engram estimates packet tokens locally and labels savings as measured, estimated, or unavailable depending on the available baseline.
  6. Read: before the next agent step, Engram activates only the relevant lessons, failures, warnings, decisions, files, symbols, commands, and proof tests for the current goal.

The output is a compact working-memory packet, not a transcript. That is the core tradeoff: Engram gives the agent remembered experience without turning the prompt into a full chat replay or a whole-repository index.

What Makes It Different

Most memory tools store conversations, summaries, vectors, or code chunks. Engram stores engineering experience with provenance and lifecycle controls.

  • Experience, not chat logs: memories are organized around work outcomes, not raw conversation history.
  • Lessons, not vague summaries: a Lesson Card captures the failure trigger, the wrong approach, the durable fix, and the proof test that verified it.
  • Evidence before trust: each memory carries an evidence class, and low-trust inferences stay reviewable.
  • Local auditability: mutation history, recall traces, inbox decisions, and replay fixtures remain in a local SQLite database.
  • Agent-neutral surface: Claude Code, Codex, OpenCode, Cline, shell scripts, and custom tools can all talk to the same runtime through MCP, hooks, CLI, or HTTP.
  • No code-index dependency: Engram can remember "last time this test failed, this fix worked" without pre-indexing the repository.
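
To make the Lesson Card idea concrete, here is a minimal sketch of the failure, trigger, wrong approach, fix, and proof-test shape described above. The class name, field names, and rendering are assumptions for illustration only; they are not Engram's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class LessonCardSketch:
    """Illustrative Lesson Card shape; field names are assumptions, not Engram's schema."""

    failure: str
    trigger: str
    wrong_approach: str
    fix: str
    proof_test: str
    evidence_class: str = "test-proven"
    source_tags: dict = field(default_factory=dict)

    def to_packet_lines(self) -> list:
        """Render the card as compact lines that could sit in a bounded packet."""
        return [
            f"Failure: {self.failure}",
            f"Trigger: {self.trigger}",
            f"Avoid: {self.wrong_approach}",
            f"Fix: {self.fix}",
            f"Proof test: {self.proof_test}",
        ]


card = LessonCardSketch(
    failure="Graph recall failed because link.type shadowed memory.type.",
    trigger="Editing RecallService._expand_graph",
    wrong_approach="Renaming memory.type",
    fix="Alias link_type separately from memory type.",
    proof_test="tests/test_runtime.py",
    source_tags={"files": ["server/engram/recall_service.py"]},
)
print("\n".join(card.to_packet_lines()))
```

The point of the structure is that the rejected approach and the proof test travel together with the fix, so a later packet can warn against the wrong path instead of only restating the right one.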

Market Context

The memory-agent market is crowded, and Engram should be judged by its narrow mission rather than by generic memory-platform claims.

  • Mem0 is a broad universal memory layer with open-source and hosted paths, SDKs, MCP/agent integrations, and published benchmark claims. Its paper reports lower latency and token cost versus full-context methods on long-dialogue memory tasks: arXiv:2504.19413.
  • Zep / Graphiti focuses on temporal context graphs: evolving entities, facts, relationships, provenance episodes, and temporal validity windows. Zep's paper frames this as a temporal knowledge graph architecture for agent memory: arXiv:2501.13956.
  • Letta Code is a memory-first agent runtime. It keeps durable agent memory, supports self-editing memory, and can launch sleep-time reflection subagents.
  • AgentMemory is a direct coding-agent memory competitor with hooks, MCP, benchmark claims, and broad agent support.
  • MCP server-memory is the simple reference-style knowledge-graph memory server many MCP users can reach for by default.

Engram's wedge is different: local, governed, anti-repeat engineering memory for black-box coding agents. It stores the work experience around failures, decisions, rejected approaches, proof tests, and outcomes. It deliberately does not index the codebase and does not ask users to adopt a full agent runtime.

Claim Boundary

What Engram can claim today:

  • It is a beta local-first coding-agent memory runtime with REST, CLI, MCP, hooks, a TypeScript client, an inspector, and reproducible release gates.
  • It reduces prompt bloat by sending bounded working-memory packets instead of replaying whole chat history; it quantifies the savings only when a baseline is available.
  • It has a differentiated Lesson Card primitive: failure, trigger, wrong approach, fix, proof test, source tags, evidence, and lifecycle.
  • It is regression-gated with 92 passing tests, live score 100 / 124, anti-repeat score 1.0 on the committed smoke dataset, LongMemEval-style tiny-fixture recall, a 100k-memory stress gate, gitleaks, wheel/npm smoke installs, and MCP initialization.
  • On the checked-in local raw Mem0 smoke comparison, Engram scores 1.0 and local raw Mem0 scores 0.9 for Engram's anti-repeat dataset because Engram preserves the rejected-approach signal.

What Engram should not claim yet:

  • Not "better than Mem0/Zep/Letta/AgentMemory" broadly. Those products solve wider or different problems and publish their own benchmark surfaces.
  • Not a hosted/team memory platform.
  • Not a codebase indexer or replacement for RepoRecall, Cursor, Continue, Aider maps, or Sourcegraph-style code search.
  • Not a learned neural memory controller. Engram uses deterministic lifecycle, decay, outcome weighting, and audit, not model-weight training.

Current verification target:

uv run pytest -q
uv run python scripts/live_score.py
uv run python scripts/stress_gate.py
uv run python scripts/release_gate.py

Current local evidence for v0.10: 92 passed, live score 100 / 124, anti-repeat benchmark score 1.0, LongMemEval-style recall_at_5=1.0, production stress gate passed, and release gate passed. Treat live_score.py as a release smoke test plus eval sanity check, not a scientific benchmark against other memory systems.

Architecture

flowchart LR
  Agent[Claude Code / Codex / OpenCode / Cline] --> Surface[Hooks / MCP / CLI / REST / TS Client]
  Surface --> Runtime[MemoryRuntime Facade]
  Runtime --> Services[Storage / Recall / Lifecycle / Lesson / Setup / Replay]
  Services --> Extract[Typed Extraction Workers]
  Extract --> Gate[Evidence + Admission Gate]
  Gate --> Active[(Active / Reinforced Memories)]
  Gate --> Inbox[(Memory Inbox)]
  Active --> Links[(Associative Links)]
  Services --> Lessons[(Lesson Cards)]
  Services --> Feedback[(Outcome Feedback)]
  Services --> Packs[(Lesson Packs)]
  Services --> Onboard[Onboarding + Fragility]
  Services --> ErrorBook[(Error Book)]
  Services --> Audit[(Mutation Audit)]
  Services --> Replay[(Replay + Eval Fixtures)]
  Services --> Bench[Benchmark Adapter Registry]
  Links --> Recall[Activation + Ranking]
  Lessons --> Recall
  ErrorBook --> Recall
  Recall --> Meter[Context Meter]
  Audit --> Integrity[Integrity Checks]
  Replay --> Integrity
  Bench --> Local[Engram / Full History / No Memory]
  Bench --> Mem0[Mem0 Adapter]
  Bench --> External[Subprocess Adapter]
  Local --> Reports[Normalized Benchmark Reports]
  Mem0 --> Reports
  External --> Reports
  Feedback --> Recall
  Packs --> Lessons
  Onboard --> Packet
  Meter --> Stats[Context Stats]
  Meter --> Packet[Bounded Working-Memory Packet]
  Stats --> Reports
  Packet --> Agent

Engram sits in front of the LLM. The model remains a black box; Engram manages what remembered experience should be visible for the next step.

Context Efficiency

Engram does not require Claude, Codex, Cline, OpenCode, or any model provider to return token usage. It measures the packet it creates, estimates tokens with a provider profile, and returns structured context_stats beside the packet:

  • packet_chars, packet_bytes, and packet_tokens_estimated
  • selected_memory_count, dropped_low_roi_count, and abstained
  • baseline_tokens_estimated and estimated_context_savings_percent when a baseline is known
  • token_estimation_source: profile_estimate or char_fallback unless a future integration supplies provider-reported usage
  • savings_confidence: measured, estimated, or unavailable

The stats are metadata, not prompt content. Hook mode writes the packet to the agent and logs the estimate outside the injected prompt; CLI, REST, MCP, and the TypeScript client receive the same context_stats object.

engram think "What should the agent avoid repeating?" \
  --provider-profile claude \
  --baseline-tokens 9000

engram work prepare "Fix recall regression" \
  --provider-profile codex \
  --baseline-text "$(cat /tmp/session-history.txt)"

For human-readable savings output in the terminal, add --format table to engram think or engram work prepare:

engram work prepare "Fix recall regression" \
  --provider-profile codex \
  --baseline-tokens 9000 \
  --format table

If no baseline exists, Engram still reports packet size but does not claim percentage savings:

{
  "packet_tokens_estimated": 420,
  "estimated_context_savings_percent": null,
  "savings_confidence": "unavailable"
}
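
The baseline rule above can be sketched in a few lines of Python. This is not Engram's implementation: the function names, the four-characters-per-token fallback, and the mapping of "observed baseline" to measured and "caller-provided baseline" to estimated are all assumptions made for illustration.

```python
import math
from typing import Optional


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Character-based fallback estimate (the 4.0 ratio is an assumed heuristic)."""
    return math.ceil(len(text) / chars_per_token)


def savings_stats(packet_tokens: int, baseline_tokens: Optional[int],
                  baseline_observed: bool = False) -> dict:
    """Report a savings percentage only when a baseline is known."""
    if baseline_tokens is None or baseline_tokens <= 0:
        return {
            "packet_tokens_estimated": packet_tokens,
            "estimated_context_savings_percent": None,
            "savings_confidence": "unavailable",
        }
    percent = round((1 - packet_tokens / baseline_tokens) * 100, 1)
    return {
        "packet_tokens_estimated": packet_tokens,
        "baseline_tokens_estimated": baseline_tokens,
        "estimated_context_savings_percent": percent,
        # Assumption: an observed baseline counts as "measured".
        "savings_confidence": "measured" if baseline_observed else "estimated",
    }


print(savings_stats(420, None))
print(savings_stats(420, 9000))
```

With a 420-token packet and a 9000-token baseline, the arithmetic is (1 - 420 / 9000) × 100, which rounds to 95.3 percent.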

Memory Lifecycle

stateDiagram-v2
  [*] --> Raw: weak agent inference
  [*] --> Active: user/tool/test/outcome evidence
  Raw --> Active: inbox approve
  Raw --> Archived: inbox reject/archive
  Active --> Reinforced: duplicate or repeated use
  Active --> Superseded: safe supersession
  Active --> Contradicted: explicit conflict
  Active --> Archived: manual/archive policy
  Reinforced --> Superseded
  Reinforced --> Archived
  Superseded --> Archived: retention policy
  Contradicted --> Archived: retention policy

  note right of Superseded
    Supersession requires stronger lexical
    and embedding agreement, is capped per
    event, and writes an event-level audit row.
  end note

The lifecycle avoids treating every generated statement as truth. User-stated, tool-observed, test-proven, outcome-confirmed, and system-configured memories can become active immediately. Low-confidence agent-inferred memory stays reviewable.
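
The diagram and admission rule can be read as a small state machine. The sketch below encodes only the transitions drawn in the diagram; the names are hypothetical, and it simplifies by sending every agent-inferred memory to the raw state, whereas the text only says that low-confidence inference is held for review.

```python
# Evidence classes that the text says can become active immediately.
TRUSTED_EVIDENCE = {
    "user-stated",
    "tool-observed",
    "test-proven",
    "outcome-confirmed",
    "system-configured",
}

# Allowed transitions, copied from the lifecycle diagram.
TRANSITIONS = {
    "raw": {"active", "archived"},
    "active": {"reinforced", "superseded", "contradicted", "archived"},
    "reinforced": {"superseded", "archived"},
    "superseded": {"archived"},
    "contradicted": {"archived"},
    "archived": set(),
}


def initial_state(evidence_class: str) -> str:
    """Trusted evidence starts active; anything else waits in the inbox as raw."""
    return "active" if evidence_class in TRUSTED_EVIDENCE else "raw"


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


state = initial_state("agent-inferred")   # "raw"
state = transition(state, "active")       # inbox approve
state = transition(state, "reinforced")   # duplicate or repeated use
print(state)
```

Keeping the edges in one table makes illegal moves, such as reviving an archived memory, fail loudly instead of silently changing recall.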

Coding-Agent Flow

sequenceDiagram
  participant User
  participant Agent
  participant Engram
  participant Tests

  User->>Agent: Fix graph recall bug
  Agent->>Engram: prepare goal context
  Engram-->>Agent: prior lessons + warnings
  Agent->>Engram: work start
  Engram-->>Agent: session id
  Agent->>Tests: run pytest
  Tests-->>Agent: failure
  Agent->>Engram: record test_failed + file/symbol/command/test
  Agent->>Engram: record rejected approach or decision
  Agent->>Tests: run proof test
  Tests-->>Agent: pass
  Agent->>Engram: finish with fix + proof test
  Engram->>Engram: create Lesson Card + Error Book entry
  Engram->>Engram: sync FTS, audit, replay/eval state
  User->>Agent: Fix related recall bug
  Agent->>Engram: work prepare
  Engram-->>Agent: packet with prior failure, rejected approach, fix, proof

This is the core behavior Engram is built to protect: the next agent should not walk into a known failure loop if the system already saw the mistake and fix.

Full Toolcall Workflow

The consume side is explicit. engram_prepare and engram_think are the calls that read Engram and return a bounded context packet. The other calls either write new experience, inspect specific records, or close the feedback loop so future recall gets better.

The direction labels below describe the agent workflow. Some consume calls still write runtime metadata internally, such as recall traces, compiled workspace records, memory use counts, and Lesson Card hit counts.

flowchart TD
  A[Start task] --> B[READ engram_onboard]
  B --> C[READ engram_lessons + engram_error_book]
  C --> D[READ engram_prepare]
  D --> E[Agent reads returned context_packet]
  E --> F[WRITE engram_start]
  F --> G{Work loop}
  G --> H[WRITE engram_record]
  G --> I[READ engram_think on scope pivot]
  G --> J[READ engram_inspect / audit / inbox]
  G --> K[WRITE engram_feedback for used or bad recalled memory]
  H --> G
  I --> G
  J --> G
  K --> G
  G --> L[WRITE engram_finish]
  L --> M[BOTH engram_sleep consolidation]
| Step | Agent direction | MCP tool | CLI equivalent | When to call it | What the agent consumes or writes |
| --- | --- | --- | --- | --- | --- |
| 1 | Read | engram_onboard | engram onboard --scope <scope> | Once at the start of a repo session or when entering an unfamiliar scope. | Reads a small orientation packet: lessons, recurring failures, fragile files, useful commands, open loops, and rejected approaches. |
| 2 | Read | engram_lessons | engram lessons --scope <scope> | Before touching an area that may have prior failure history. | Reads anti-repeat Lesson Cards: failure, trigger, wrong approach, fix, proof tests, and source tags. |
| 3 | Read | engram_error_book | engram error-book --scope <scope> --mode structured | Before retrying a known-fragile area or debugging a repeated failure. | Reads structured prior failures, warnings, rejected approaches, and user corrections. |
| 4 | Read | engram_fragility | engram fragility --scope <scope> --file <path> | Before editing a file that might have burned previous sessions. | Reads file-scoped experience tags; Engram does not index source code. |
| 5 | Read, main consume point | engram_prepare | engram work prepare "<goal>" --scope <scope> --file <path> --symbol <symbol> | Before non-trivial coding work. Pass known files, symbols, budget, intent, provider profile, baseline text/tokens, and session_id if a durable session already exists. | Returns workspace.context_packet, selected memory metadata, warnings, error_book, trace_id, and context_stats. This is the normal replacement for filling the prompt with full chat or broad docs. |
| 6 | Write | engram_start | engram work start "<goal>" --scope <scope> --agent <agent> | When the task becomes a durable implementation or debugging session. | Writes a work session and returns session_id; carry it through later work calls. |
| 7 | Write | engram_record | engram work record <session_id> <kind> "<content>" ... | Whenever something meaningful happens: plan, decision, rejected approach, failed command, passing test, user correction, referenced file, or open loop. | Writes event evidence, source tags, extracted memories, Lesson Card candidates, Error Book inputs, and audit records. Use kind="note" for uncertain agent inference so it goes through review. |
| 8 | Read | engram_think | engram think "<goal>" --scope <scope> | Mid-flight when the task pivots and you need fresh context without starting another work session. | Returns the same bounded workspace shape as prepare, but without attaching it to work-session lifecycle. |
| 9 | Read | engram_memories, engram_inspect, engram_audit, engram_trace, engram_inbox | engram memories, engram memory <id>, engram audit, engram trace <id>, engram inbox | When a recalled memory looks surprising, stale, too weak, or needs provenance review. | Reads the specific record, lifecycle, mutation history, recall trace, or pending review item instead of dumping the whole memory store into context. |
| 10 | Write | engram_feedback | `engram feedback <memory_id> <used helpful ignored | | |
| 11 | Write | engram_finish | engram work finish <session_id> "<outcome>" --file <path> --test <test> | Before the agent final response. Include success/failure, files, tests, and remaining risk. | Writes the final outcome and applies outcome feedback to memories and Lesson Cards recalled for the session. |
| 12 | Both | engram_sleep | engram sleep --scope <scope> | Periodically, after substantial work or before sharing lesson packs. | Reads recent raw work, consolidates duplicates, updates Error Book/wiki state, and writes lifecycle changes. |

baseline_text and baseline_tokens matter because they tell Engram what the agent already has in context. With a baseline, engram_prepare can report estimated savings and avoid returning redundant memory. Without one, Engram still returns a bounded packet and marks savings confidence as unavailable.

Concrete MCP trace:

{
  "tool": "engram_prepare",
  "arguments": {
    "goal": "Fix graph recall regression",
    "scope": "repo/engram",
    "files": ["server/engram/recall_service.py"],
    "symbols": ["RecallService._expand_graph"],
    "budget_tokens": 1500,
    "intent": "working_and_errors",
    "provider_profile": "codex",
    "baseline_tokens": 9000,
    "show_context_stats": true
  }
}

The agent reads only the returned packet fields it needs:

{
  "workspace": {
    "id": "wrk_...",
    "context_packet": "# Goal\nFix graph recall regression\n\n# Memories\n...",
    "memories": [{"id": "mem_...", "content": "Prior fix ..."}],
    "trace_id": "trc_...",
    "context_stats": {
      "packet_tokens_estimated": 420,
      "estimated_context_savings_percent": 95.3
    }
  },
  "warnings": [],
  "error_book": []
}
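
A caller might consume that response as in the sketch below. The helper name is hypothetical and the sample dictionary simply mirrors the trace above; the idea is that only context_packet reaches the prompt, while the IDs and stats stay as metadata for later feedback calls.

```python
def extract_prompt_context(response: dict) -> dict:
    """Pull out the prompt text and the identifiers needed for later feedback."""
    workspace = response.get("workspace", {})
    stats = workspace.get("context_stats", {})
    return {
        "prompt_text": workspace.get("context_packet", ""),
        "workspace_id": workspace.get("id"),
        "trace_id": workspace.get("trace_id"),
        "memory_ids": [m["id"] for m in workspace.get("memories", [])],
        "packet_tokens_estimated": stats.get("packet_tokens_estimated"),
    }


# Sample response shaped like the trace above (placeholder IDs kept as-is).
sample = {
    "workspace": {
        "id": "wrk_...",
        "context_packet": "# Goal\nFix graph recall regression\n\n# Memories\n...",
        "memories": [{"id": "mem_...", "content": "Prior fix ..."}],
        "trace_id": "trc_...",
        "context_stats": {
            "packet_tokens_estimated": 420,
            "estimated_context_savings_percent": 95.3,
        },
    },
    "warnings": [],
    "error_book": [],
}

context = extract_prompt_context(sample)
print(context["prompt_text"])
```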

Then the write side starts and records events:

{"tool": "engram_start", "arguments": {"goal": "Fix graph recall regression", "scope": "repo/engram", "agent": "codex"}}

{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "plan_proposed", "content": "Alias link_type separately from memory type.", "files": ["server/engram/recall_service.py"]}}

{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "approach_rejected", "content": "Do not rename memory.type; it breaks memory extraction semantics.", "files": ["server/engram/recall_service.py"]}}

{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "test_failed", "content": "Graph recall regression reproduced.", "commands": ["uv run pytest tests/test_runtime.py -q"], "tests": ["tests/test_runtime.py"]}}

{"tool": "engram_record", "arguments": {"session_id": "ses_...", "kind": "test_passed", "content": "Regression test passes after link_type alias.", "commands": ["uv run pytest tests/test_runtime.py -q"], "tests": ["tests/test_runtime.py"]}}

If the prepared packet included a memory that mattered, close that loop:

{
  "tool": "engram_feedback",
  "arguments": {
    "memory_id": "mem_...",
    "signal": "used",
    "scope": "repo/engram",
    "actor": "codex",
    "reason": "Selected the link_type alias fix from the prepared packet.",
    "session_id": "ses_...",
    "workspace_id": "wrk_...",
    "trace_id": "trc_..."
  }
}

Finish the work session:

{
  "tool": "engram_finish",
  "arguments": {
    "session_id": "ses_...",
    "outcome": "Fixed graph recall by aliasing link_type separately from memory type.",
    "success": true,
    "files": ["server/engram/recall_service.py", "tests/test_runtime.py"],
    "tests": ["uv run pytest tests/test_runtime.py -q"],
    "remaining_risk": null
  }
}

For the next related task, the agent consumes memory again with engram_prepare or engram_think. It should not replay the full previous session unless the user explicitly asks for that transcript.

What Engram Stores

erDiagram
  EVENTS ||--o{ MEMORIES : extracts
  EVENTS ||--o{ WORK_EVENTS : records
  MEMORIES ||--o{ MEMORY_LINKS : relates
  MEMORIES ||--o{ MEMORY_MUTATIONS : audits
  MEMORIES ||--o{ MEMORY_INBOX : reviews
  MEMORIES ||--o{ MEMORY_FEEDBACK : receives
  LESSON_CARDS ||--o{ LESSON_FEEDBACK : receives
  LESSON_CARDS ||--o{ LESSON_PACKS : exports
  WORK_SESSIONS ||--o{ WORK_EVENTS : contains
  WORK_EVENTS ||--o{ LESSON_CARDS : creates
  RECALL_TRACES ||--o{ WORKSPACES : compiles
  EVAL_RUNS ||--o{ RECALL_TRACES : measures
  RUNTIME_LOCKS ||--o{ WORK_SESSIONS : serializes

The database is local SQLite. The default path is .engram/engram.sqlite3. Embeddings and Lesson Card source IDs are stored on the memory and lesson rows with provenance metadata. All major behavior is available through the CLI and API.

Product Layers

Brain Layer

  • Admission gate: duplicate memories reinforce existing records instead of polluting active recall.
  • Evidence classes: memories are labeled by source quality.
  • Lifecycle states: raw, active, reinforced, superseded, contradicted, and archived records are preserved with audit history.
  • Intent-aware recall: hooks and tools can request skip, working-only, working-and-errors, or full memory packets.
  • Attention decay: stale generic memories lose rank over time; pinned and high-risk safety memories decay conservatively.
  • Outcome-weighting: successful sessions boost the recalled memories and Lesson Cards that helped; harmful/ignored context is demoted.
  • Error Book: prior failures can be returned as structured failure, cause, fix, proof-test, warning, and source-id entries.
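
Attention decay and outcome weighting can be pictured as multipliers on a relevance score. The sketch below is an assumption-heavy illustration, not Engram's ranking formula: the exponential half-life, the 4x slower decay for pinned memories, and the 0.1 step per feedback signal are all invented for the example.

```python
def recall_score(relevance: float, age_days: float, helpful: int = 0,
                 harmful: int = 0, pinned: bool = False,
                 half_life_days: float = 30.0) -> float:
    """Combine relevance with time decay and outcome feedback (illustrative only)."""
    # Assumption: pinned memories decay four times more slowly, not never.
    effective_half_life = half_life_days * 4 if pinned else half_life_days
    decay = 0.5 ** (age_days / effective_half_life)
    # Assumption: each helpful or harmful signal shifts the weight by 0.1,
    # clamped so feedback can at most double or halve the score.
    outcome = min(2.0, max(0.5, 1.0 + 0.1 * (helpful - harmful)))
    return relevance * decay * outcome


fresh = recall_score(1.0, age_days=0)
stale = recall_score(1.0, age_days=30)
pinned = recall_score(1.0, age_days=30, pinned=True)
proven = recall_score(1.0, age_days=30, helpful=3)
print(fresh, stale, pinned, proven)
```

Under these assumed constants, a 30-day-old generic memory scores half of a fresh one, while a pinned or repeatedly helpful memory of the same age keeps more of its rank.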

Anti-Repeat Product Layer

  • Lesson Cards: durable failure -> trigger -> wrong approach -> fix -> proof test cards that rank above generic memories during coding work.
  • Memory Inbox: weak inferred memories stay reviewable instead of entering active recall.
  • Proof Mode: engram eval run produces local metrics for lesson recall, stale suppression, token savings, and lesson count.
  • Setup/Doctor: engram setup and engram doctor make agent wiring inspectable.
  • Replay: export and replay real Engram-assisted sessions as fixtures.
  • Inspector: open inspector/index.html against a local Engram server for lessons, inbox, audit, eval, and packet preview.
  • Integrity: engram integrity verify checks FTS drift, orphan links, stale inbox items, embedding drift, and missing mutation audit records.
  • Project onboarding: engram onboard returns top lessons, warnings, useful commands, fragile files, and ignored approaches for a scope.
  • Fragility heatmap: engram fragility ranks files by prior failures, lessons, tests, and related memories using tags from work events.
  • Lesson packs: engram lessons export/import moves reviewed Lesson Cards between projects without moving chat logs or source indexes.
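
The fragility heatmap can be approximated by counting failure-related work events per tagged file. This sketch uses event kinds that appear elsewhere in this README (test_failed, approach_rejected, test_passed), but the weights and the function name are assumptions, and the real ranking also considers lessons, tests, and related memories.

```python
from collections import Counter

# Assumed weights: a failed test counts more than a rejected approach.
FAILURE_WEIGHTS = {"test_failed": 3, "approach_rejected": 2}


def rank_fragile_files(events, weights=FAILURE_WEIGHTS):
    """Rank file paths by weighted failure events, using only event tags."""
    scores = Counter()
    for event in events:
        weight = weights.get(event["kind"], 0)
        if weight == 0:
            continue  # passing tests and plans do not add fragility here
        for path in event.get("files", []):
            scores[path] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


events = [
    {"kind": "test_failed", "files": ["server/engram/recall_service.py"]},
    {"kind": "approach_rejected", "files": ["server/engram/recall_service.py"]},
    {"kind": "test_failed", "files": ["tests/test_runtime.py"]},
    {"kind": "test_passed", "files": ["tests/test_runtime.py"]},
]
print(rank_fragile_files(events))
```

No source code is read at any point; the ranking is built entirely from paths attached to recorded events.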

Local Core Runtime

  • Explicit services: MemoryRuntime is now a delegating facade over service objects sharing a RuntimeContext, not a mixin stack.
  • Embedding provenance: each memory records provider, model, and dimensions.
  • Semantic opt-in: hash embeddings remain the deterministic default; sentence-transformers stay behind the semantic extra.
  • Embedding rebuild: engram embeddings rebuild recomputes stored embeddings for the active provider.

Integrity Hardening

  • Supersession safety: corrections require stronger lexical and embedding agreement, are capped per event, and write an event-level audit row with affected memory IDs.
  • Admission hardening: caller-supplied evidence metadata is ignored unless it comes from an internal trusted path; agent-inferred decisions, warnings, errors, corrections, and rejected approaches stay in RAW/Inbox.
  • Cross-process safety: sleep and supersession use SQLite runtime locks in addition to in-process locks.
  • Production stress proof: scripts/stress_gate.py verifies a 100k-memory local store, repeated lifecycle drift checks, concurrent hook writes, repeated sleep/consolidation cycles, and final global integrity.
  • Release proof: engram release check runs the release gate, including strict doctor, 10k benchmark, production stress gate, Python wheel smoke, npm smoke, TypeScript build, and MCP initialize.
  • Inspector triage: the HTML inspector can approve, reject, or archive Memory Inbox items.
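
The supersession rule (both lexical and embedding agreement, capped per event) might look like the sketch below. Jaccard token overlap and cosine similarity are stand-ins chosen for the example, and the thresholds and cap are invented; the README does not state which measures or values Engram uses.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Token-set overlap as a simple lexical agreement measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def select_supersessions(new_text, new_vec, candidates,
                         lexical_min=0.5, embedding_min=0.8, cap=2):
    """Return at most `cap` memory IDs that agree on BOTH measures."""
    matches = []
    for cand in candidates:
        lex = jaccard(new_text, cand["content"])
        emb = cosine(new_vec, cand["embedding"])
        if lex >= lexical_min and emb >= embedding_min:
            matches.append((lex + emb, cand["id"]))
    matches.sort(reverse=True)  # strongest agreement first
    return [mem_id for _, mem_id in matches[:cap]]


candidates = [
    {"id": "mem_a", "content": "use link_type alias in recall service", "embedding": [1.0, 0.1]},
    {"id": "mem_b", "content": "rename memory type everywhere", "embedding": [1.0, 0.0]},
    {"id": "mem_c", "content": "use link_type alias in recall", "embedding": [0.0, 1.0]},
]
print(select_supersessions("use link_type alias in recall", [1.0, 0.0], candidates))
```

Requiring both signals means a memory that merely shares words (mem_c) or merely sits nearby in embedding space (mem_b) is left untouched, and the cap bounds how much one correction can rewrite.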

Pass-5 Closure

  • Schema registry: CLI argparse and MCP tool schemas are generated from one Pydantic-backed command registry.
  • Post-commit graph work: bounded link inference runs after the write transaction. Recall drains completed link jobs before packet assembly and continues safely if background link inference is still pending under load.
  • Eval hygiene: lesson/stale/decoy metrics are ID- and relevance-based instead of substring-based.
  • Decision extraction: casual wording like "I decided to take a break" no longer becomes an active engineering decision.
  • Learning core: recall-time decay, outcome-weighted lesson utility, onboarding, fragility reports, and lesson pack import/export are covered by tests.
  • Context metering: provider-independent packet stats and baseline-aware savings estimates are returned out of band for Claude, Codex, Cline, OpenCode, OpenAI, MCP, CLI, REST, and TS clients.
  • Production hardening: bearer auth, CORS allowlists, rate limits, request-size limits, liveness/readiness endpoints, no-store security headers, and artifact content checks are release-gated.
  • Current proof: 92 pytest tests pass, scripts/live_score.py reports 100 / 124, benchmark gates pass, and scripts/release_gate.py passes locally.
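
ID-based eval metrics, as opposed to substring matching, can be sketched as follows. The function names and the choice to return 0.0 for empty inputs are assumptions; the metric definitions are the standard recall-at-k and a simple decoy fraction, not necessarily the exact ones in Engram's eval code.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of relevant memory IDs that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


def decoy_contamination(ranked_ids, decoy_ids, k=5):
    """Fraction of the top k results that are known decoy IDs."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    decoys = set(decoy_ids)
    return sum(1 for mem_id in top if mem_id in decoys) / len(top)


ranked = ["mem_3", "mem_1", "mem_9", "mem_4"]
print(recall_at_k(ranked, {"mem_1", "mem_2"}))
print(decoy_contamination(ranked, {"mem_9"}))
```

Comparing IDs avoids false positives where a decoy memory happens to contain the same words as the expected lesson.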

Quick Start

uv sync --extra dev
uv run engram init
uv run uvicorn engram.app:app --app-dir server --host 127.0.0.1 --port 8000 --reload

Write/update detected agent instruction files with an Engram-owned marker block:

uv run engram setup all --apply

setup all auto-detects existing instruction files in the current directory such as AGENTS.md, CLAUDE.md, OPENCODE.md, or CLINE.md. To create a specific agent file explicitly, pass that agent name, for example uv run engram setup claude --apply.

Setup supports optional policy modes:

  • off: no automatic hook activity.
  • advisory: instruction-only reminders.
  • assist: default; hooks inject context and record outcomes when available.
  • enforce: hooks fail when a lifecycle checkpoint is missing, such as a tool or stop event without a prepared Engram session.

uv run engram setup claude --policy assist --apply
uv run engram-hook UserPromptSubmit --agent claude-code --policy enforce

Global CLI wrapper:

npm install -g @proofofwork/engram
# or, from this repository
npm link ./packages/cli
engram --help
engram lessons --scope demo
engram error-book --scope demo --mode structured
engram-mcp
engram-hook UserPromptSubmit --agent claude-code

Local CLI

uv run engram init
uv run engram observe "User prefers concise engineering plans" --scope demo
uv run engram observe "Maybe FragileIdea exists." --scope demo --kind assistant --actor assistant
uv run engram think "How should I answer this user?" --scope demo
uv run engram lessons --scope demo
uv run engram onboard --scope repo/engram
uv run engram fragility --scope repo/engram --file server/engram/recall_service.py
uv run engram lessons export --scope repo/engram --output lessons.engram.json
uv run engram lessons import lessons.engram.json --scope repo/other
uv run engram error-book --scope demo
uv run engram error-book --scope demo --mode structured
uv run engram inbox --scope demo
uv run engram eval run --scope demo
uv run engram setup claude
uv run engram setup all --apply
uv run engram doctor
uv run engram sleep --scope demo

engram observe defaults to --kind user --actor user. Use --kind assistant or --kind tool when recording inferred agent context that should pass through the Memory Inbox before it becomes active recall.

Work Session Example

SESSION=$(uv run engram work start "Fix recall bug" --scope repo/engram --agent codex \
  | uv run python -c 'import json,sys; print(json.load(sys.stdin)["session_id"])')

uv run engram work record "$SESSION" test_failed \
  "Graph recall failed because link.type shadowed memory.type." \
  --file server/engram/recall_service.py \
  --symbol RecallService._expand_graph \
  --command "uv run pytest -q" \
  --test tests/test_runtime.py

uv run engram work finish "$SESSION" \
  "Fixed recall by aliasing link_type separately from memory type." \
  --file server/engram/recall_service.py \
  --test tests/test_runtime.py

uv run engram work prepare "Fix related graph recall issue" \
  --scope repo/engram \
  --file server/engram/recall_service.py \
  --symbol RecallService._expand_graph

The final command returns a working-memory packet containing the relevant lesson, warning, source tags, and proof context.
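
The same session can be driven from Python. The sketch below only builds argument lists and parses the session ID; the helper names are hypothetical, and passing each flag once per value is an assumption about the CLI rather than documented behavior.

```python
import json


def parse_session_id(stdout: str) -> str:
    """Read session_id from the JSON printed by `engram work start`."""
    return json.loads(stdout)["session_id"]


def build_record_argv(session_id, kind, content,
                      files=(), symbols=(), commands=(), tests=()):
    """Build argv for `engram work record`, using the flags shown above."""
    argv = ["uv", "run", "engram", "work", "record", session_id, kind, content]
    tagged = (("--file", files), ("--symbol", symbols),
              ("--command", commands), ("--test", tests))
    for flag, values in tagged:
        for value in values:
            # Assumption: a flag may be repeated once per value.
            argv += [flag, value]
    return argv


session = parse_session_id('{"session_id": "ses_example"}')
argv = build_record_argv(
    session,
    "test_failed",
    "Graph recall failed because link.type shadowed memory.type.",
    files=["server/engram/recall_service.py"],
    symbols=["RecallService._expand_graph"],
    commands=["uv run pytest -q"],
    tests=["tests/test_runtime.py"],
)
print(argv)
```

The resulting list can be handed to subprocess.run(argv, check=True) in an environment where Engram is installed; passing a list avoids shell-quoting problems with the free-text content argument.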

Core API

  • POST /v1/observe: ingest raw events and extract typed memories.
  • POST /v1/think: activate memory and compile a bounded workspace.
  • POST /v1/commit: write LLM/tool outcomes back into memory.
  • POST /v1/sleep: consolidate duplicates and repeated experience.
  • GET /v1/memories/{id}: inspect evidence and lifecycle for one memory.
  • PATCH /v1/memories/{id}: pin, unpin, archive, reinforce, supersede, or contradict a memory.
  • GET /v1/audit: inspect memory mutation history.
  • GET /v1/lessons: inspect anti-repeat Lesson Cards.
  • POST /v1/lessons/export: export a portable Lesson Card pack.
  • POST /v1/lessons/import: import a Lesson Card pack as raw or trusted lessons.
  • GET /v1/onboard: compile project onboarding memory for a scope.
  • GET /v1/fragility: rank fragile files from experience tags.
  • GET /v1/inbox: inspect pending memory-review items.
  • PATCH /v1/inbox/{id}: approve, reject, or archive review items.
  • POST /v1/evals/run: run local Proof Mode metrics.
  • POST /v1/setup: generate agent setup commands and config paths.
  • GET /v1/doctor: check local Engram wiring.
  • POST /v1/fts/rebuild: rebuild active-memory FTS rows.
  • POST /v1/embeddings/rebuild: rebuild memory embeddings for the active provider.
  • GET /v1/integrity/verify: verify FTS, audit, orphan, and embedding drift.
  • GET /v1/replay/export: export a scope as a replay fixture.
  • POST /v1/replay/run: replay a fixture into Engram and run eval.
  • POST /v1/work/start: start an engineering work session.
  • POST /v1/work/event: record a coding-agent work event.
  • POST /v1/work/prepare: compile an engineering working-memory packet.
  • POST /v1/work/finish: store the task outcome.
  • GET /v1/workspace/{id}: inspect a compiled workspace.
  • GET /v1/trace/{id}: inspect recall decisions.
  • GET /v1/memories: inspect stored memories.
  • GET /v1/error-book: read prior failures, warnings, and corrections. Pass mode=structured for Error Book v2 entries.
  • GET /v1/wiki: list compiled wiki pages.
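As an illustration of calling one of these endpoints, the sketch below builds a `POST /v1/observe` request with the Python standard library. The base URL and the body field names (`content`, `scope`) are assumptions that mirror the CLI arguments; this README does not document the request schema, so check the server's own schema before relying on them.

```python
import json
import urllib.request

# Assumption: address of a locally running Engram server. Adjust to your setup.
BASE_URL = "http://127.0.0.1:8000"


def build_observe_request(content, scope):
    """Build (but do not send) a POST /v1/observe request."""
    # Assumption: field names mirror `engram observe <text> --scope <scope>`.
    body = json.dumps({"content": content, "scope": scope}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/observe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_observe_request("User prefers concise engineering plans", "demo")
print(req.get_method(), req.full_url)

# To send it once a server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```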

MCP

uv run engram-mcp

MCP tools:

  • engram_observe
  • engram_think
  • engram_feedback
  • engram_start
  • engram_prepare
  • engram_record
  • engram_finish
  • engram_sleep
  • engram_memories
  • engram_inspect
  • engram_audit
  • engram_trace
  • engram_error_book
  • engram_lessons
  • engram_lessons_export
  • engram_lessons_import
  • engram_onboard
  • engram_fragility
  • engram_inbox
  • engram_integrity
  • engram_eval_summary
  • engram_benchmark
  • engram_setup

Hooks

uv run engram-hook UserPromptSubmit --agent claude-code --policy assist

engram-hook reads hook JSON from stdin and normalizes prompt/tool/stop events into Engram work sessions. See docs/integrations/ for Claude Code, Cline, OpenCode, and Codex examples. Prompt hooks print only the prepared context packet to stdout for injection; context savings are printed as a compact table on stderr.
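Because the context packet goes to stdout and the savings table goes to stderr, a caller can capture the two streams separately. The sketch below shows that pattern with a stand-in command so it runs without Engram installed. The `prompt` payload field is hypothetical; the real hook JSON shape depends on the agent.

```python
import json
import subprocess
import sys


def run_hook(payload, command):
    """Send hook JSON on stdin and return (stdout, stderr) separately."""
    proc = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout, proc.stderr


# Stand-in that mimics the stream split described above: packet on stdout,
# diagnostics on stderr. It is NOT the real engram-hook.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys, json; e = json.load(sys.stdin); "
    "print('packet for: ' + e['prompt']); "
    "print('savings table', file=sys.stderr)",
]

packet, diagnostics = run_hook({"prompt": "Fix recall bug"}, STAND_IN)
print(packet.strip())

# With Engram installed, the command from this section would be:
# ["uv", "run", "engram-hook", "UserPromptSubmit",
#  "--agent", "claude-code", "--policy", "assist"]
```

Only `packet` should be injected into the prompt; `diagnostics` is for humans and logs.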

Live Score

uv run python scripts/live_score.py

The live score runs a smoke scenario against a temporary database through the engram command when it is linked globally. It checks:

  • Lesson Card creation
  • Lesson recall in the prepared context packet
  • prior failure recall
  • rejected approach recall
  • source tag preservation
  • structured Error Book through the CLI
  • eval metrics
  • Memory Inbox quarantine
  • doctor counts
  • replay export/run
  • integrity verification

Example shape:

{
  "score": 100,
  "earned": 124,
  "total": 124,
  "counts": {
    "lessons": 2,
    "inbox": 2,
    "replayed": 5,
    "error_book": 3
  }
}
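A report of this shape can be gated in CI with a few lines of Python. This is a sketch under assumptions: it takes the JSON as a string, the function name `check_live_score` is hypothetical, and the threshold of 100 is a choice made here rather than something the script enforces.

```python
import json


def check_live_score(raw, minimum_score=100):
    """Validate a live-score report and return its counts."""
    report = json.loads(raw)
    if report["earned"] > report["total"]:
        raise ValueError("earned exceeds total")
    if report["score"] < minimum_score:
        raise ValueError(f"score {report['score']} is below {minimum_score}")
    return report["counts"]


# The example shape from above, reproduced as a string.
EXAMPLE = """
{
  "score": 100,
  "earned": 124,
  "total": 124,
  "counts": {"lessons": 2, "inbox": 2, "replayed": 5, "error_book": 3}
}
"""

counts = check_live_score(EXAMPLE)
print(counts)
```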

Benchmarks

live_score.py is a smoke gate. For benchmark claims, use the dedicated bench commands:

uv run engram bench anti-repeat \
  --dataset benchmarks/datasets/smoke/anti_repeat.jsonl \
  --output /tmp/engram-anti-repeat

uv run engram bench longmemeval \
  --data benchmarks/datasets/smoke/longmemeval_tiny.jsonl \
  --output /tmp/engram-longmemeval

The anti-repeat benchmark is Engram-native: it measures whether a second agent task recalls the prior failure, rejected approach, proof test, source tags, and fix from an earlier work session. The LongMemEval adapter reports external retrieval metrics such as recall_at_5, mrr, precision_at_k, abstention, knowledge-update accuracy, latency, and context savings.
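For readers unfamiliar with the retrieval metrics named above, the sketch below computes recall at k and mean reciprocal rank using their standard textbook definitions. It is illustrative only; the LongMemEval adapter may compute or aggregate these values differently, and the memory ids are made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of relevant items that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Hypothetical ids: "m1" is retrieved at rank 3, "m4" falls outside the top 5.
ranked = ["m3", "m7", "m1", "m9", "m2", "m4"]
relevant = {"m1", "m4"}

print(recall_at_k(ranked, relevant, k=5))  # 0.5
print(reciprocal_rank(ranked, relevant))   # 1/3
```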

Use engram bench compare --system all --dataset ... to compare Engram against two local baselines: full-history and no-memory. Do not publish external competitor numbers until that competitor is wired through the same adapter and the generated results.json is checked in or attached to the release.

See docs/benchmarks.md for dataset formats, metrics, reporting rules, and the market-proof checklist that separates usable claims from claims that still need external adapters.

Benchmark Adapters

uv run engram bench compare \
  --system engram,mem0 \
  --dataset benchmarks/datasets/smoke/anti_repeat.jsonl

uv run engram bench compare \
  --system engram,external \
  --adapter-name reporecall \
  --adapter-command "uv run python adapters/reporecall_adapter.py" \
  --dataset benchmarks/datasets/smoke/anti_repeat.jsonl

Adapters are for benchmarking only. They do not change Engram recall and they do not add code indexing. Engram sends the same dataset to each adapter, receives a normalized packet per scenario, and scores every system with the same evaluator. Mem0 is optional and only required when --system mem0 is selected. The generic subprocess adapter accepts JSON on stdin and returns JSON on stdout, which makes future Zep, AgentMemory, RepoRecall, or custom comparisons possible without adding those products to Engram core.
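A subprocess adapter of the kind described above can be very small. The skeleton below shows only the JSON-on-stdin, JSON-on-stdout contract. The field names (`scenario_id`, `packet`, `items`) are placeholders invented for this sketch; the real request and response schema is defined by Engram's benchmark code and docs/benchmarks.md, not here.

```python
import io
import json
import sys


def handle(request):
    """Turn one scenario request into a normalized packet (placeholder schema)."""
    # Assumption: hypothetical field names. A real adapter would query the
    # external memory system here and fill the packet with what it recalled.
    return {"scenario_id": request.get("scenario_id"), "packet": {"items": []}}


def main(stdin, stdout):
    """Read one JSON request from stdin and write one JSON response to stdout."""
    request = json.load(stdin)
    json.dump(handle(request), stdout)
    stdout.write("\n")


# A real adapter script would end with: main(sys.stdin, sys.stdout)
# The round trip is shown here with in-memory streams instead.
out = io.StringIO()
main(io.StringIO('{"scenario_id": "s1"}'), out)
print(out.getvalue().strip())
```

Keeping `handle` free of I/O makes the adapter easy to unit test before wiring it into `--adapter-command`.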

Install named adapter dependencies with:

uv sync --extra adapters

Current checked-in Mem0 smoke result:

  • benchmarks/results/mem0-smoke-local/results.json
  • Engram anti-repeat score: 1.0
  • local raw Mem0 anti-repeat score: 0.9
  • measured gap: Engram recalls rejected approaches from Lesson Cards; the local raw Mem0 run does not

This is a narrow anti-repeat smoke result, not a broad claim against hosted Mem0 or Mem0 with an LLM extraction policy.

Positioning

Engram is deliberately narrow.

It does not try to replace code search, embeddings over a repository, or a general long-term memory SaaS. It focuses on the layer coding agents are missing when they repeat mistakes: local, auditable, evidence-aware experience memory.

Use Engram when you want:

  • a Claude Code, Codex, OpenCode, or Cline session to remember prior engineering outcomes
  • anti-repeat protection before a risky edit
  • a small working packet instead of a full transcript resend
  • replayable memory-assisted sessions for local evaluation
  • local-first storage you can inspect and delete

Do not use Engram as:

  • a semantic code search engine
  • a replacement for tests
  • a source of truth for unreviewed agent guesses
  • a benchmark claim against general memory platforms without running your own workload

Development Gate

uv run pytest -q
uv run python -m compileall -q server tests scripts
npm --prefix packages/ts-client install
npm --prefix packages/ts-client run build
cd packages/cli && npm pack --dry-run
uv run python scripts/live_score.py
uv run engram integrity verify
uv run python scripts/stress_gate.py
uv run python scripts/release_gate.py

After the gate runs and generated caches and build artifacts are removed, the repository should be clean, with no leftover changes.

See docs/audit-coverage.md for the audit synthesis coverage matrix and docs/release.md for npm/Python release steps.
