TSchonleber · Velamj · Apr 23, 2026
@@ -155,7 +155,9 @@ jobs:
               - 'src/agentmemory/rerank.py'
               - 'src/agentmemory/embeddings.py'
               - 'src/agentmemory/retrieval.py'
+              - 'src/agentmemory/retrieval/**'
               - 'bin/intent_classifier.py'
+              - 'benchmarks/**'
               - 'tests/bench/**'
 
       - name: Set up Python

diff --git a/.gitignore b/.gitignore
@@ -14,6 +14,10 @@ db/*.backup
 logs/
 blobs/
 backups/
+benchmarks/results/
+benchmarks/training_data/
+src/agentmemory/retrieval/models/*.json
+.vs/
 .DS_Store
 /tmp/
 *.swp

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -5,7 +5,8 @@
 brainctl is a persistent memory system for AI agents backed by a single SQLite
 database. No server process, no external dependencies beyond Python and SQLite.
 Multiple agents (or a single agent across sessions) share one `brain.db` file
-for memories, events, entities, decisions, and a knowledge graph.
+for episodic, semantic, and procedural memory plus events, entities,
+decisions, and a knowledge graph.
 
 ## Project Structure
 
@@ -18,7 +19,9 @@ src/agentmemory/
   cli.py               Entry point
   mcp_server.py        MCP server entry
   hippocampus.py       Consolidation engine (brainctl-consolidate entry point)
-  commands/            24 command modules
+  procedural.py        Canonical procedural memory service + heuristics
+  retrieval/           Query planner, candidate generation, evidence fusion, answerability
+  commands/            25 command modules
     agent.py           Agent registration and state
     memory.py          Memory CRUD and search
     event.py           Event logging and queries
@@ -55,6 +58,11 @@ All state lives in a single `brain.db` file (SQLite, WAL mode).
 | Table | Purpose |
 |-------|---------|
 | `memories` | Durable facts, preferences, lessons, conventions |
+| `procedures` | Canonical procedural memories linked 1:1 to bridge rows in `memories` |
+| `procedure_steps` | Ordered step projection for procedures |
+| `procedure_sources` | Provenance links from procedures back to memories/events/decisions/entities |
+| `procedure_runs` | Execution/application feedback history for procedures |
+| `procedure_candidates` | Repeat-pattern staging area before promotion to canonical procedures |
 | `events` | Timestamped event log (append-oriented) |
 | `entities` | Named entities (people, projects, tools, concepts) |
 | `knowledge_edges` | Typed, weighted edges between any two records |
@@ -70,6 +78,15 @@ All state lives in a single `brain.db` file (SQLite, WAL mode).
 
 See `db/init_schema.sql` for full column definitions and migrations.
 
+`memories.memory_type` is now a three-way core layer selector:
+- `episodic` — specific events, traces, and observations
+- `semantic` — distilled facts, preferences, and conventions
+- `procedural` — reusable workflows, runbooks, troubleshooting sequences, rollback plans
+
+The canonical structured procedure lives in `procedures`; the linked
+`memories` row keeps a human-readable synopsis so legacy memory search and
+older interfaces continue to see something useful.
+
 ### Vector Tables (optional, requires sqlite-vec)
 
 | Table | Purpose |
@@ -94,17 +111,31 @@ natural-language queries ("what does Alice prefer?") match memories that
 contain *any* meaningful term, not only memories that contain every word.
 Stopwords are dropped before OR expansion.
 
-### Hybrid Search + Reciprocal Rank Fusion
+### Retrieval Executive + Hybrid Search
+
+`cmd_search` now acts as a compatibility shell around a retrieval executive:
+
+1. `retrieval.query_planner` inspects the query and emits a structured plan
+   (`normalized_intent`, `answer_type`, target entities, temporal anchors,
+   preferred memory layers, candidate tables, abstain policy).
+2. `cmd_search` still performs the existing FTS5/sqlite-vec retrieval paths
+   for memories, events, and context.
+3. `retrieval.candidate_generation` adds a first-class procedural candidate
+   path using `procedures_fts` plus structured fallback search.
+4. `retrieval.evidence_graph` expands top procedures over
+   `procedure_sources` and `knowledge_edges` to gather supporting episodes,
+   decisions, events, tools, and rollback relations.
+5. `retrieval.late_reranker` deterministically fuses direct lexical match,
+   procedural structure match, validation recency, execution history, and
+   evidence support.
+6. `retrieval.answerability` decides whether to abstain instead of returning
+   ungrounded nearest-neighbor matches.
+
+The effective plan and answerability diagnostics surface in `_debug` /
+`metacognition` so benchmark misses remain explainable.
 
-`cmd_search` merges FTS5 and sqlite-vec results with Reciprocal Rank Fusion
-(`rrf_score = 1/(60 + rank)`), applies temporal decay, category half-life,
-and adaptive salience weighting, then runs a regex-based query intent
-classifier (`bin/intent_classifier.py`) whose output is normalized inside
-`cmd_search` onto six rerank profiles: `entity_lookup`, `event_lookup`,
-`decision_lookup`, `graph_traversal`, `procedural`, `general`. The
-classifier covers ~80% of real agent queries with zero latency; the
-rerank branch applied to the blend is reported in the
-`metacognition.rerank_branch` field of every response.
+The old hybrid core is preserved: memories/events/context still merge FTS5
+and sqlite-vec via Reciprocal Rank Fusion when vector search is available.
 
 ### Vector Search (optional)
 
@@ -123,14 +154,18 @@ Multi-hop neighbor queries across the knowledge graph via `brainctl graph`.
 
 ### Retrieval Regression Gate
 
-`tests/bench/` ships a deterministic search-quality harness: 30 synthetic
-memories + 8 events + 6 entities + 20 graded queries (3=primary, 2=related,
-1=tangential) across seven query classes (entity / procedural / decision /
-temporal / troubleshooting / negative / ambiguous). The runner reports
-P@1, P@5, Recall@5, MRR, nDCG@5 against a committed baseline at
+`tests/bench/` ships a deterministic search-quality harness: synthetic
+memories + procedures + events + entities with graded queries (3=primary,
+2=related, 1=tangential) across entity / procedural / decision / temporal /
+troubleshooting / negative / ambiguous classes. The runner reports
+P@1, P@5, Recall@5, MRR, nDCG@5 plus P@5 ceiling diagnostics
+(`p_at_5_ceiling`, `p_at_5_ratio_to_ceiling`) against a committed baseline at
 `tests/bench/baselines/search_quality.json`. Any >2% drop on a headline
 metric fails the `test_search_quality_bench.py` pytest regression test.
-Run with `python3 -m tests.bench.run` or `bin/brainctl-bench`.
+The harness also records failure modes (`retrieval_failure`,
+`utilization_failure`, `hallucination`, `correct_abstain`) and captures the
+retrieval executive debug payload. Run with `python3 -m tests.bench.run` or
+`bin/brainctl-bench`.
 
 ## Knowledge Graph
 
@@ -190,6 +225,7 @@ Runs as part of the nightly consolidation cycle; results surface in
 | **Compression** | Merges clusters of related low-value memories into summaries |
 | **Dream** | Synthesizes new hypotheses from loosely connected memories |
 | **Hebbian** | Strengthens edges between frequently co-accessed records |
+| **Procedural synthesis** | Promotes repeated successful action patterns into procedure candidates or canonical procedures |
 
 `bin/consolidation-cycle.sh` chains the hippocampus passes with:
 

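The ARCHITECTURE.md hunk above keeps the Reciprocal Rank Fusion core, with `rrf_score = 1/(60 + rank)` merging the FTS5 and sqlite-vec result lists. As a minimal sketch of that fusion step only: `rrf_fuse`, the sample ids, and the 1-based rank convention are assumptions for illustration, not code from `cmd_search`.

```python
from collections import defaultdict

RRF_K = 60  # constant taken from the formula rrf_score = 1/(60 + rank)


def rrf_fuse(rankings, k=RRF_K):
    """Fuse several ranked id lists into one list of (id, score) pairs.

    Assumption: ranks are 1-based, so the top hit of each list
    contributes 1/(k + 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so ties resolve deterministically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the lexical and vector paths.
fts_hits = ["m3", "m1", "m7"]
vec_hits = ["m1", "m9", "m3"]
fused = rrf_fuse([fts_hits, vec_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Ids that appear in both lists accumulate two contributions, which is why `m1` and `m3` outrank the single-list hits here.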
diff --git a/COGNITIVE_PROTOCOL.md b/COGNITIVE_PROTOCOL.md
@@ -15,6 +15,7 @@ Before starting any task, check what's already known:
 
 ```bash
 brainctl -a myagent search "task keywords" --limit 10
+brainctl -a myagent procedure search "task keywords" --limit 5
 brainctl event tail -n 10
 brainctl decision list
 ```
@@ -35,9 +36,24 @@ When you find something non-obvious, save it right away:
 brainctl -a myagent memory add "what you discovered" -c CATEGORY -s SCOPE
 ```
 
+If what you learned is reusable execution knowledge rather than a plain fact,
+store it as a procedure:
+
+```bash
+brainctl -a myagent procedure add \
+  --title "staging deploy runbook" \
+  --goal "deploy to staging safely" \
+  --step "run tests" \
+  --step "brainctl migrate" \
+  --step "deploy and verify health"
+```
+
 **Good memories:** "The API rate-limits at 100 req/15s with Retry-After header"
 **Bad memories:** "I ran npm install" (trivial) / "The build passed" (transient)
 
+**Good procedures:** rollback plans, troubleshooting sequences, migration
+runbooks, validated tool-use recipes.
+
 ### Categories
 
 | Category | Use for |

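The `procedure add` example above stores a goal plus ordered steps, and ARCHITECTURE.md describes a `procedures` row linked 1:1 to a bridge row in `memories`, with steps projected into `procedure_steps`. The sketch below illustrates that shape with an in-memory SQLite database. All column names and the `add_procedure` helper are hypothetical; the real definitions live in `db/init_schema.sql`.

```python
import sqlite3

# Hypothetical, simplified schema; not the project's actual column set.
SCHEMA = """
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    memory_type TEXT NOT NULL
        CHECK (memory_type IN ('episodic', 'semantic', 'procedural')),
    content TEXT NOT NULL
);
CREATE TABLE procedures (
    id INTEGER PRIMARY KEY,
    memory_id INTEGER NOT NULL UNIQUE REFERENCES memories(id),  -- 1:1 bridge
    title TEXT,
    goal TEXT NOT NULL
);
CREATE TABLE procedure_steps (
    procedure_id INTEGER NOT NULL REFERENCES procedures(id),
    position INTEGER NOT NULL,
    action TEXT NOT NULL,
    PRIMARY KEY (procedure_id, position)
);
"""


def add_procedure(conn, title, goal, steps):
    """Insert the bridge memory, the procedure, and its ordered steps."""
    synopsis = f"{title}: {goal} ({len(steps)} steps)"
    with conn:  # one transaction: commit on success, roll back on error
        memory_id = conn.execute(
            "INSERT INTO memories (memory_type, content) VALUES ('procedural', ?)",
            (synopsis,),
        ).lastrowid
        procedure_id = conn.execute(
            "INSERT INTO procedures (memory_id, title, goal) VALUES (?, ?, ?)",
            (memory_id, title, goal),
        ).lastrowid
        conn.executemany(
            "INSERT INTO procedure_steps (procedure_id, position, action) "
            "VALUES (?, ?, ?)",
            [(procedure_id, i, step) for i, step in enumerate(steps, start=1)],
        )
    return procedure_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
proc_id = add_procedure(
    conn,
    title="staging deploy runbook",
    goal="deploy to staging safely",
    steps=["run tests", "brainctl migrate", "deploy and verify health"],
)
ordered_steps = [
    row[0]
    for row in conn.execute(
        "SELECT action FROM procedure_steps WHERE procedure_id = ? ORDER BY position",
        (proc_id,),
    )
]
```

The `UNIQUE` constraint on `memory_id` is one way to enforce the 1:1 link, and the synopsis stored in `memories` is what would let plain memory search still surface the procedure.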
diff --git a/MCP_SERVER.md b/MCP_SERVER.md
@@ -50,7 +50,7 @@ docker run -v ~/.agentmemory:/data -e BRAIN_DB=/data/brain.db brainctl
 The `CMD` defaults to `brainctl-mcp`, so the container runs the MCP
 server over stdio.
 
-## Available Tools (201)
+## Available Tools (209)
 
 | Tool | Description |
 |------|-------------|
@@ -69,12 +69,20 @@ server over stdio.
 | `trigger_update` | Update fields on an existing trigger |
 | `trigger_delete` | Cancel/delete a trigger by ID |
 | `decision_add` | Record a decision with rationale |
+| `procedure_add` | Create a structured procedural memory with ordered steps |
+| `procedure_get` | Fetch a canonical procedure with steps and provenance |
+| `procedure_list` | List procedures with scope/status filters |
+| `procedure_search` | Search procedural memories and return structured matches |
+| `procedure_update` | Update a canonical procedure |
+| `procedure_feedback` | Record execution outcome / validation against a procedure |
+| `procedure_backfill` | Promote likely procedures from existing memories/events/decisions |
+| `procedure_stats` | Show canonical procedure and candidate counts |
 | `handoff_add` | Create a structured handoff packet |
 | `handoff_latest` | Fetch the latest matching handoff packet |
 | `handoff_consume` | Mark a handoff packet consumed |
 | `handoff_pin` | Pin a handoff packet for preservation |
 | `handoff_expire` | Mark a handoff packet expired |
-| `search` | Cross-table search (memories + events + entities) |
+| `search` | Cross-table search with retrieval planning across memories + procedures + events + entities |
 | `pagerank` | Compute PageRank centrality over knowledge graph |
 | `stats` | Database statistics and health summary |
 | `resolve_conflict` | AGM credibility-weighted belief conflict resolution |
@@ -114,13 +122,15 @@ server over stdio.
 
 **Store information:**
 - Durable fact/lesson/convention: `memory_add` (enforces W(m) write gate)
+- Durable workflow/runbook: `procedure_add` or `memory_add(memory_type="procedural")`
 - What just happened: `event_add` (timestamped, no gate)
 - Why a choice was made: `decision_add` (with rationale)
 - Working state for next session: `handoff_add`
 
 **Find information:**
 - Everything about a topic: `search` (memories + events + entities)
 - Just memories: `memory_search` (supports category, scope, pagerank_boost)
+- Just procedures: `procedure_search`
 - Just events: `event_search` (supports event_type, project)
 - A specific entity: `entity_get`
 - Entities matching a query: `entity_search`
@@ -144,6 +154,7 @@ server over stdio.
 
 | Category | Tools | When to use |
 |----------|-------|-------------|
+| Procedural memory | `procedure_add`, `procedure_search`, `procedure_feedback`, `procedure_backfill`, `procedure_stats` | Runbooks, rollback plans, troubleshooting routines, validated workflows |
 | Consolidation | `consolidation_run`, `replay_boost`, `replay_queue` | Memory maintenance |
 | Reconsolidation | `reconsolidation_check`, `reconsolidate` | Lability window mechanics |
 | Beliefs & Conflicts | `resolve_conflict`, `belief_collapse` | When memories contradict |
@@ -165,13 +176,15 @@ What do you need?
 |
 +-- Store something?
 |   +-- Durable fact ----------> memory_add
+|   +-- Durable runbook -------> procedure_add
 |   +-- What just happened ----> event_add
 |   +-- Why a choice was made -> decision_add
 |   +-- State for next session > handoff_add
 |
 +-- Find something?
 |   +-- Broad topic search ----> search
 |   +-- Memories only ---------> memory_search
+|   +-- Procedures only -------> procedure_search
 |   +-- Events only -----------> event_search
 |   +-- Entity by name --------> entity_get
 |

diff --git a/README.md b/README.md
@@ -2,14 +2,19 @@
 
 **Forgetful agents, fixed by a SQLite file.**
 
-One `brain.db` gives your agent durable memory across sessions — facts learned, decisions made, entities tracked, and state handed off. No server. No API keys. No LLM calls required.
+One `brain.db` gives your agent durable memory across sessions — episodic evidence, semantic facts, procedural runbooks, decisions made, entities tracked, and state handed off. No server. No API keys. No LLM calls required.
 
 ```python
 from agentmemory import Brain
 
 brain = Brain(agent_id="my-agent")
-ctx = brain.orient(project="api-v2")           # session start: handoff + events + triggers + memories
+ctx = brain.orient(project="api-v2")           # handoff + events + triggers + memories + procedures
 brain.remember("rate-limit: 100/15s", category="integration")
+brain.remember_procedure(
+    goal="Deploy to staging safely",
+    steps=["Run tests", "brainctl migrate", "Deploy", "Check health"],
+    title="Staging deploy runbook",
+)
 brain.decide("use Retry-After for backoff", "server controls timing", project="api-v2")
 brain.wrap_up("auth module complete", project="api-v2")  # session end: logs + handoff for next run
 ```
@@ -45,13 +50,15 @@ brain.relate("OpenAI", "provides", "GPT-4o")
 
 **Memory types**
 - `convention`, `decision`, `environment`, `identity`, `integration`, `lesson`, `preference`, `project`, `user`
+- Core memory layers: episodic, semantic, and procedural
 - Category controls natural half-life: identity decays over ~1 year; integration details over ~1 month
 - Hard cap: 10,000 memories per agent. Emergency compression retires lowest-confidence entries.
 
 **Retrieval modes**
 - FTS5 full-text search with stemming (default, zero dependencies)
 - Vector similarity via sqlite-vec + Ollama nomic-embed-text (`brainctl[vec]`)
 - Hybrid: Reciprocal Rank Fusion over FTS5 + vector results
+- Retrieval executive above memories/events/context/decisions/procedures: query planning, candidate fusion, procedural evidence expansion, deterministic late reranking, grounded abstention
 - Context profiles: named search presets scoped to task type (`--profile ops`, `--profile research`, etc.)
 - `--benchmark` preset: flattens recency/salience for synthetic evaluation runs
 
@@ -62,7 +69,7 @@ brain.relate("OpenAI", "provides", "GPT-4o")
 - Cross-encoder controls: `--rerank-top-n` and `--rerank-budget-ms` tune candidate window + strict latency budget
 - Top-heavy staged rollout controls (I6): `--rollout-mode`, `--rollout-canary-agents`, `--rollout-canary-percent`, `--rollback-top-heavy`
 - Env mirrors for rollout controls: `BRAINCTL_TOPHEAVY_ROLLOUT_MODE`, `BRAINCTL_TOPHEAVY_CANARY_AGENTS`, `BRAINCTL_TOPHEAVY_CANARY_PERCENT`, `BRAINCTL_TOPHEAVY_ROLLBACK`
-- Retrieval regression-gated in CI: >2% drop on P@1/P@5/MRR/nDCG@5 fails the build
+- Retrieval regression-gated in CI: >2% drop on P@1/P@5/MRR/nDCG@5 fails the build. Search-quality output also reports the fixture-specific P@5 ceiling and ratio-to-ceiling so sparse graded queries do not make raw P@5 look worse than it is.
 
 **Knowledge graph**
 - Typed entity nodes: `agent`, `concept`, `document`, `event`, `location`, `organization`, `person`, `project`, `service`, `tool`
@@ -112,7 +119,7 @@ Trading bots:
 | `plugins/octobot/` | OctoBot |
 | `plugins/coinbase-agentkit/` | Coinbase AgentKit |
 
-## MCP server (201 tools)
+## MCP server (209 tools)
 
 ```json
 {
@@ -130,7 +137,11 @@ Add to `~/.claude/claude_desktop_config.json`, `~/.cursor/mcp.json`, or equivale
 
 ```bash
 brainctl memory add "content" -c convention   # store a memory
+brainctl memory add "rollback steps..." -c convention --type procedural
 brainctl search "query"                       # FTS5 search
+brainctl procedure add --goal "Deploy to staging safely" --step "Run tests" --step "brainctl migrate"
+brainctl procedure search "how do I deploy to staging?"
+brainctl procedure feedback 12 --success --validated --outcome "deploy completed cleanly"
 brainctl vsearch "semantic query"             # vector search (requires [vec])
 brainctl entity create "Alice" -t person      # create entity
 brainctl entity relate Alice works_at Acme    # link entities
@@ -146,14 +157,17 @@ brainctl gaps scan                            # coverage + orphan + broken-edge
 brainctl consolidate cycle                    # full consolidation pass
 ```
 
-## Python API (22 methods)
+## Python API
 
 | Method | What it does |
 |--------|--------------|
 | `orient(project)` | One-call session start: handoff + events + triggers + memories |
 | `wrap_up(summary)` | One-call session end: logs event + creates handoff |
 | `remember(content, category)` | Store a durable fact through the W(m) write gate |
+| `remember(content, category, memory_type="procedural")` | Store free text and compile it into a structured procedure when appropriate |
+| `remember_procedure(goal, steps, ...)` | Create a canonical procedural memory with structured fields |
 | `search(query)` | FTS5 full-text search with stemming |
+| `search_procedures(query)` | Search structured procedures with deterministic procedural scoring |
 | `vsearch(query)` | Vector similarity search (optional) |
 | `think(query)` | Spreading-activation recall across the knowledge graph |
 | `forget(memory_id)` | Soft-delete a memory |
@@ -167,6 +181,8 @@ brainctl consolidate cycle                    # full consolidation pass
 | `resume()` | Fetch and consume latest handoff |
 | `doctor()` | Diagnostic health check |
 | `consolidate()` | Promote high-importance memories |
+| `procedure_feedback(procedure_id, ...)` | Record execution outcome, validation, and utility for a procedure |
+| `backfill_procedures()` | Synthesize candidate/canonical procedures from existing memories, events, and decisions |
 | `tier_stats()` | Write-tier distribution |
 | `stats()` | Database overview |
 | `affect(text)` | Classify emotional state |
@@ -177,17 +193,19 @@ brainctl consolidate cycle                    # full consolidation pass
 
 - **Write gate** (W(m)): surprise scoring rejects redundant writes. Bypass with `force=True`.
 - **Three-tier routing**: high-value memories get full indexing; low-value get lightweight storage.
+- **Procedural compilation**: explicit runbooks live in dedicated procedure tables; `memory_type="procedural"` free text is heuristically compiled without deleting the original evidence.
 - **Duplicate suppression**: near-duplicates reinforce existing memories instead of creating new rows.
 - **Half-life decay**: unused memories fade at a rate set by category. Recalled memories are reinforced.
-- **Consolidation**: Hebbian learning, temporal promotion, compression — runs on a cron schedule.
+- **Consolidation**: Hebbian learning, temporal promotion, compression, and procedural candidate synthesis — runs on a cron schedule.
 
 ## Retrieval benchmarks
 
 Tested with default settings, no tuning for benchmark data. Two harnesses
 ship in the tree:
 
 * `tests/bench/` — single-system retrieval baselines for `Brain.search`
-  and `cmd_search`, gated against regression in CI.
+  and `cmd_search`, now covering procedural lookup, rollback/troubleshooting,
+  ambiguity, and abstention, gated against regression in CI.
 * `tests/bench/competitor_runs/` — same-fixture head-to-head harness
   with adapters for Mem0, Letta, Zep, Cognee, MemPalace, OpenAI Memory.
   Skip-not-fabricate contract: missing SDK / API key raises

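The README hunks above mention a CI gate that fails on a >2% drop in a headline metric, plus a fixture-specific P@5 ceiling and ratio-to-ceiling. A rough sketch of how those numbers could be computed follows. It assumes the ceiling is `min(relevant, 5) / 5` and that the 2% threshold is a relative drop against the baseline; neither definition is confirmed by the diff, and the function names are hypothetical.

```python
def p_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k results that are graded relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def p_at_k_ceiling(relevant_ids, k=5):
    """Best achievable P@k when a query has fewer than k relevant items (assumed definition)."""
    return min(len(relevant_ids), k) / k


def ratio_to_ceiling(precision, ceiling):
    """Precision as a share of the ceiling; None when no relevant item exists."""
    return precision / ceiling if ceiling > 0 else None


def regressed_metrics(current, baseline, tolerance=0.02):
    """Names of metrics whose relative drop versus baseline exceeds tolerance."""
    failed = []
    for name, base in baseline.items():
        if base > 0 and (base - current.get(name, 0.0)) / base > tolerance:
            failed.append(name)
    return failed


# A query with only two graded-relevant items: raw P@5 cannot exceed 0.4.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c"}
precision = p_at_k(ranked, relevant)
ceiling = p_at_k_ceiling(relevant)
ratio = ratio_to_ceiling(precision, ceiling)

# Hypothetical metric snapshots, not real benchmark numbers.
failed = regressed_metrics(
    current={"p_at_1": 0.90, "mrr": 0.80},
    baseline={"p_at_1": 0.95, "mrr": 0.80},
)
```

In this example the raw P@5 of 0.4 looks weak on its own, but the ratio of 1.0 shows the ranking retrieved everything it could for a query with two relevant items.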
diff --git a/benchmarks/__init__.py b/benchmarks/__init__.py
@@ -0,0 +1,2 @@
+"""Legacy benchmark comparison helpers for brainctl vs MemPalace."""
+