2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -155,7 +155,9 @@ jobs:
- 'src/agentmemory/rerank.py'
- 'src/agentmemory/embeddings.py'
- 'src/agentmemory/retrieval.py'
- 'src/agentmemory/retrieval/**'
- 'bin/intent_classifier.py'
- 'benchmarks/**'
- 'tests/bench/**'

- name: Set up Python
4 changes: 4 additions & 0 deletions .gitignore
@@ -14,6 +14,10 @@ db/*.backup
logs/
blobs/
backups/
benchmarks/results/
benchmarks/training_data/
src/agentmemory/retrieval/models/*.json
.vs/
.DS_Store
/tmp/
*.swp
72 changes: 54 additions & 18 deletions ARCHITECTURE.md
@@ -5,7 +5,8 @@
brainctl is a persistent memory system for AI agents backed by a single SQLite
database. No server process, no external dependencies beyond Python and SQLite.
Multiple agents (or a single agent across sessions) share one `brain.db` file
for memories, events, entities, decisions, and a knowledge graph.
for episodic, semantic, and procedural memory plus events, entities,
decisions, and a knowledge graph.

## Project Structure

@@ -18,7 +19,9 @@ src/agentmemory/
cli.py Entry point
mcp_server.py MCP server entry
hippocampus.py Consolidation engine (brainctl-consolidate entry point)
commands/ 24 command modules
procedural.py Canonical procedural memory service + heuristics
retrieval/ Query planner, candidate generation, evidence fusion, answerability
commands/ 25 command modules
agent.py Agent registration and state
memory.py Memory CRUD and search
event.py Event logging and queries
@@ -55,6 +58,11 @@ All state lives in a single `brain.db` file (SQLite, WAL mode).
| Table | Purpose |
|-------|---------|
| `memories` | Durable facts, preferences, lessons, conventions |
| `procedures` | Canonical procedural memories linked 1:1 to bridge rows in `memories` |
| `procedure_steps` | Ordered step projection for procedures |
| `procedure_sources` | Provenance links from procedures back to memories/events/decisions/entities |
| `procedure_runs` | Execution/application feedback history for procedures |
| `procedure_candidates` | Repeat-pattern staging area before promotion to canonical procedures |
| `events` | Timestamped event log (append-oriented) |
| `entities` | Named entities (people, projects, tools, concepts) |
| `knowledge_edges` | Typed, weighted edges between any two records |
@@ -70,6 +78,15 @@ All state lives in a single `brain.db` file (SQLite, WAL mode).

See `db/init_schema.sql` for full column definitions and migrations.

`memories.memory_type` is now a three-way core layer selector:
- `episodic` — specific events, traces, and observations
- `semantic` — distilled facts, preferences, and conventions
- `procedural` — reusable workflows, runbooks, troubleshooting sequences, rollback plans

The canonical structured procedure lives in `procedures`; the linked
`memories` row keeps a human-readable synopsis so legacy memory search and
older interfaces continue to see something useful.
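
The 1:1 bridge between `procedures` and `memories` can be sketched roughly as follows. This is a minimal illustration with a hypothetical, heavily reduced schema: the column names, the `CHECK` constraint, and the `add_procedure` helper are assumptions made for the example, and the real definitions live in `db/init_schema.sql`.

```python
import sqlite3

# Hypothetical reduced schema for illustration only; see db/init_schema.sql
# for the real column definitions.
SCHEMA = """
CREATE TABLE memories (
    id          INTEGER PRIMARY KEY,
    memory_type TEXT NOT NULL
                CHECK (memory_type IN ('episodic', 'semantic', 'procedural')),
    content     TEXT NOT NULL  -- human-readable synopsis seen by legacy search
);
CREATE TABLE procedures (
    id        INTEGER PRIMARY KEY,
    memory_id INTEGER NOT NULL UNIQUE REFERENCES memories(id),  -- 1:1 bridge
    title     TEXT NOT NULL,
    goal      TEXT NOT NULL
);
CREATE TABLE procedure_steps (
    procedure_id INTEGER NOT NULL REFERENCES procedures(id),
    position     INTEGER NOT NULL,
    action       TEXT NOT NULL,
    PRIMARY KEY (procedure_id, position)
);
"""


def add_procedure(conn, title, goal, steps):
    """Insert the bridge memory row, the procedure, and its ordered steps."""
    synopsis = f"{title}: " + " -> ".join(steps)
    with conn:  # single transaction keeps the bridge row and procedure in sync
        mem_id = conn.execute(
            "INSERT INTO memories (memory_type, content) VALUES ('procedural', ?)",
            (synopsis,),
        ).lastrowid
        proc_id = conn.execute(
            "INSERT INTO procedures (memory_id, title, goal) VALUES (?, ?, ?)",
            (mem_id, title, goal),
        ).lastrowid
        conn.executemany(
            "INSERT INTO procedure_steps (procedure_id, position, action) "
            "VALUES (?, ?, ?)",
            [(proc_id, i, step) for i, step in enumerate(steps, start=1)],
        )
    return proc_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
proc_id = add_procedure(
    conn,
    "staging deploy runbook",
    "deploy to staging safely",
    ["run tests", "brainctl migrate", "deploy and verify health"],
)
# Legacy readers only touch `memories` and still get a usable synopsis.
synopsis = conn.execute(
    "SELECT m.content FROM memories m "
    "JOIN procedures p ON p.memory_id = m.id WHERE p.id = ?",
    (proc_id,),
).fetchone()[0]
step_count = conn.execute(
    "SELECT COUNT(*) FROM procedure_steps WHERE procedure_id = ?", (proc_id,)
).fetchone()[0]
```

The `UNIQUE` constraint on `memory_id` is one way to enforce the 1:1 link in this sketch; the real schema may enforce it differently.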

### Vector Tables (optional, requires sqlite-vec)

| Table | Purpose |
@@ -94,17 +111,31 @@ natural-language queries ("what does Alice prefer?") match memories that
contain *any* meaningful term, not only memories that contain every word.
Stopwords are dropped before OR expansion.

### Hybrid Search + Reciprocal Rank Fusion
### Retrieval Executive + Hybrid Search

`cmd_search` now acts as a compatibility shell around a retrieval executive:

1. `retrieval.query_planner` inspects the query and emits a structured plan
(`normalized_intent`, `answer_type`, target entities, temporal anchors,
preferred memory layers, candidate tables, abstain policy).
2. `cmd_search` still performs the existing FTS5/sqlite-vec retrieval paths
for memories, events, and context.
3. `retrieval.candidate_generation` adds a first-class procedural candidate
path using `procedures_fts` plus structured fallback search.
4. `retrieval.evidence_graph` expands top procedures over
`procedure_sources` and `knowledge_edges` to gather supporting episodes,
decisions, events, tools, and rollback relations.
5. `retrieval.late_reranker` deterministically fuses direct lexical match,
procedural structure match, validation recency, execution history, and
evidence support.
6. `retrieval.answerability` decides whether to abstain instead of returning
ungrounded nearest-neighbor junk.

The effective plan and answerability diagnostics surface in `_debug` /
`metacognition` so benchmark misses remain explainable.
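
Steps 5 and 6 above (deterministic late fusion, then an abstain decision) could look something like the sketch below. The feature names mirror the list in step 5, but the weights, the `ABSTAIN_BELOW` threshold, and the `Candidate` type are hypothetical values chosen for illustration, not the actual contents of `retrieval.late_reranker` or `retrieval.answerability`.

```python
from dataclasses import dataclass

# Hypothetical weights: the real reranker's weighting is not documented here.
WEIGHTS = {
    "lexical": 0.35,
    "structure": 0.25,
    "validation_recency": 0.15,
    "execution_history": 0.15,
    "evidence_support": 0.10,
}
ABSTAIN_BELOW = 0.40  # assumed threshold, for illustration only


@dataclass(frozen=True)
class Candidate:
    record_id: str
    features: dict  # feature name -> score in [0, 1]


def fused_score(candidate):
    """Weighted sum over a fixed feature set; missing features count as 0."""
    return sum(w * candidate.features.get(name, 0.0) for name, w in WEIGHTS.items())


def rerank(candidates):
    """Sort by fused score, breaking ties by record_id so output is stable."""
    return sorted(candidates, key=lambda c: (-fused_score(c), c.record_id))


def answer_or_abstain(candidates):
    """Return the ranked list, or an empty list when the best match is too weak."""
    ranked = rerank(candidates)
    if not ranked or fused_score(ranked[0]) < ABSTAIN_BELOW:
        return []
    return ranked


strong = Candidate("proc:12", {
    "lexical": 0.9, "structure": 0.8, "validation_recency": 0.7,
    "execution_history": 0.6, "evidence_support": 0.5,
})
weak = Candidate("mem:7", {"lexical": 0.3})
```

With these made-up numbers, a result set containing only `weak` is dropped entirely, while a set containing `strong` is returned with `strong` ranked first. No randomness or model call is involved, so the same inputs always produce the same order.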

`cmd_search` merges FTS5 and sqlite-vec results with Reciprocal Rank Fusion
(`rrf_score = 1/(60 + rank)`), applies temporal decay, category half-life,
and adaptive salience weighting, then runs a regex-based query intent
classifier (`bin/intent_classifier.py`) whose output is normalized inside
`cmd_search` onto six rerank profiles: `entity_lookup`, `event_lookup`,
`decision_lookup`, `graph_traversal`, `procedural`, `general`. The
classifier covers ~80% of real agent queries with zero latency; the
rerank branch applied to the blend is reported in the
`metacognition.rerank_branch` field of every response.
The old hybrid core is preserved: memories/events/context still merge FTS5
and sqlite-vec via Reciprocal Rank Fusion when vector search is available.
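
For reference, Reciprocal Rank Fusion over two ranked lists can be sketched as below, using the constant from the `rrf_score = 1/(60 + rank)` formula quoted in this section. The function name, the 1-based rank convention, and the sample ids are assumptions for illustration; this is not the code inside `cmd_search`.

```python
from collections import defaultdict

RRF_K = 60  # constant from rrf_score = 1/(60 + rank)


def rrf_merge(*rankings):
    """Fuse ranked id lists (best first); ranks are assumed to be 1-based."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (RRF_K + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fts_hits = ["m1", "m2", "m3"]  # hypothetical FTS5 result order
vec_hits = ["m3", "m1", "m4"]  # hypothetical sqlite-vec result order
fused = rrf_merge(fts_hits, vec_hits)
fused_ids = [doc_id for doc_id, _ in fused]
```

Records that appear in both lists accumulate two terms, so `m1` and `m3` outrank `m2` and `m4` here even though `m2` sits second in the FTS5 list.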

### Vector Search (optional)

@@ -123,14 +154,18 @@ Multi-hop neighbor queries across the knowledge graph via `brainctl graph`.

### Retrieval Regression Gate

`tests/bench/` ships a deterministic search-quality harness: 30 synthetic
memories + 8 events + 6 entities + 20 graded queries (3=primary, 2=related,
1=tangential) across seven query classes (entity / procedural / decision /
temporal / troubleshooting / negative / ambiguous). The runner reports
P@1, P@5, Recall@5, MRR, nDCG@5 against a committed baseline at
`tests/bench/` ships a deterministic search-quality harness: synthetic
memories + procedures + events + entities with graded queries (3=primary,
2=related, 1=tangential) across entity / procedural / decision / temporal /
troubleshooting / negative / ambiguous classes. The runner reports
P@1, P@5, Recall@5, MRR, nDCG@5 plus P@5 ceiling diagnostics
(`p_at_5_ceiling`, `p_at_5_ratio_to_ceiling`) against a committed baseline at
`tests/bench/baselines/search_quality.json`. Any >2% drop on a headline
metric fails the `test_search_quality_bench.py` pytest regression test.
Run with `python3 -m tests.bench.run` or `bin/brainctl-bench`.
The harness also records failure modes (`retrieval_failure`,
`utilization_failure`, `hallucination`, `correct_abstain`) and captures the
retrieval executive debug payload. Run with `python3 -m tests.bench.run` or
`bin/brainctl-bench`.
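
The headline metrics and the ceiling diagnostic can be illustrated with a small sketch. Several details here are assumptions rather than facts about the harness: the ceiling is taken to be `min(relevant, 5) / 5`, nDCG uses linear gain on the 3/2/1 grades, and the ">2% drop" is treated as a relative drop against the baseline. All function names are hypothetical.

```python
import math


def precision_at_k(ranked, grades, k=5):
    """Fraction of the top-k results that have any positive grade."""
    return sum(1 for doc in ranked[:k] if grades.get(doc, 0) > 0) / k


def p_at_k_ceiling(grades, k=5):
    """Best achievable P@k when fewer than k graded documents exist (assumed)."""
    relevant = sum(1 for grade in grades.values() if grade > 0)
    return min(relevant, k) / k


def reciprocal_rank(ranked, grades):
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if grades.get(doc, 0) > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, grades, k=5):
    """nDCG with linear gain (an assumption; 2**grade - 1 is also common)."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    actual = dcg([grades.get(doc, 0) for doc in ranked[:k]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0


def regressed(current, baseline, tolerance=0.02):
    """True when the metric fell more than `tolerance` relative to baseline."""
    return current < baseline * (1 - tolerance)


grades = {"a": 3, "b": 2}              # only two graded documents exist
ranked = ["a", "x", "b", "y", "z"]     # hypothetical search output
p5 = precision_at_k(ranked, grades)
ceiling = p_at_k_ceiling(grades)
ratio = p5 / ceiling if ceiling else 0.0
```

In this example raw P@5 is only 0.4, but the ceiling is also 0.4 because just two graded documents exist, so the ratio to ceiling is 1.0. That is the situation the ceiling diagnostics are meant to make visible.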

## Knowledge Graph

@@ -190,6 +225,7 @@ Runs as part of the nightly consolidation cycle; results surface in
| **Compression** | Merges clusters of related low-value memories into summaries |
| **Dream** | Synthesizes new hypotheses from loosely connected memories |
| **Hebbian** | Strengthens edges between frequently co-accessed records |
| **Procedural synthesis** | Promotes repeated successful action patterns into procedure candidates or canonical procedures |

`bin/consolidation-cycle.sh` chains the hippocampus passes with:

16 changes: 16 additions & 0 deletions COGNITIVE_PROTOCOL.md
@@ -15,6 +15,7 @@ Before starting any task, check what's already known:

```bash
brainctl -a myagent search "task keywords" --limit 10
brainctl -a myagent procedure search "task keywords" --limit 5
brainctl event tail -n 10
brainctl decision list
```
@@ -35,9 +36,24 @@ When you find something non-obvious, save it right away:
brainctl -a myagent memory add "what you discovered" -c CATEGORY -s SCOPE
```

If what you learned is reusable execution knowledge rather than a plain fact,
store it as a procedure:

```bash
brainctl -a myagent procedure add \
--title "staging deploy runbook" \
--goal "deploy to staging safely" \
--step "run tests" \
--step "brainctl migrate" \
--step "deploy and verify health"
```

**Good memories:** "The API rate-limits at 100 req/15s with Retry-After header"
**Bad memories:** "I ran npm install" (trivial) / "The build passed" (transient)

**Good procedures:** rollback plans, troubleshooting sequences, migration
runbooks, validated tool-use recipes.

### Categories

| Category | Use for |
17 changes: 15 additions & 2 deletions MCP_SERVER.md
@@ -50,7 +50,7 @@ docker run -v ~/.agentmemory:/data -e BRAIN_DB=/data/brain.db brainctl
The `CMD` defaults to `brainctl-mcp`, so the container runs the MCP
server over stdio.

## Available Tools (201)
## Available Tools (209)

| Tool | Description |
|------|-------------|
@@ -69,12 +69,20 @@ server over stdio.
| `trigger_update` | Update fields on an existing trigger |
| `trigger_delete` | Cancel/delete a trigger by ID |
| `decision_add` | Record a decision with rationale |
| `procedure_add` | Create a structured procedural memory with ordered steps |
| `procedure_get` | Fetch a canonical procedure with steps and provenance |
| `procedure_list` | List procedures with scope/status filters |
| `procedure_search` | Search procedural memories and return structured matches |
| `procedure_update` | Update a canonical procedure |
| `procedure_feedback` | Record execution outcome / validation against a procedure |
| `procedure_backfill` | Promote likely procedures from existing memories/events/decisions |
| `procedure_stats` | Show canonical procedure and candidate counts |
| `handoff_add` | Create a structured handoff packet |
| `handoff_latest` | Fetch the latest matching handoff packet |
| `handoff_consume` | Mark a handoff packet consumed |
| `handoff_pin` | Pin a handoff packet for preservation |
| `handoff_expire` | Mark a handoff packet expired |
| `search` | Cross-table search (memories + events + entities) |
| `search` | Cross-table search with retrieval planning across memories + procedures + events + entities |
| `pagerank` | Compute PageRank centrality over knowledge graph |
| `stats` | Database statistics and health summary |
| `resolve_conflict` | AGM credibility-weighted belief conflict resolution |
@@ -114,13 +122,15 @@ server over stdio.

**Store information:**
- Durable fact/lesson/convention: `memory_add` (enforces W(m) write gate)
- Durable workflow/runbook: `procedure_add` or `memory_add(memory_type="procedural")`
- What just happened: `event_add` (timestamped, no gate)
- Why a choice was made: `decision_add` (with rationale)
- Working state for next session: `handoff_add`

**Find information:**
- Everything about a topic: `search` (memories + events + entities)
- Just memories: `memory_search` (supports category, scope, pagerank_boost)
- Just procedures: `procedure_search`
- Just events: `event_search` (supports event_type, project)
- A specific entity: `entity_get`
- Entities matching a query: `entity_search`
@@ -144,6 +154,7 @@ server over stdio.

| Category | Tools | When to use |
|----------|-------|-------------|
| Procedural memory | `procedure_add`, `procedure_search`, `procedure_feedback`, `procedure_backfill`, `procedure_stats` | Runbooks, rollback plans, troubleshooting routines, validated workflows |
| Consolidation | `consolidation_run`, `replay_boost`, `replay_queue` | Memory maintenance |
| Reconsolidation | `reconsolidation_check`, `reconsolidate` | Lability window mechanics |
| Beliefs & Conflicts | `resolve_conflict`, `belief_collapse` | When memories contradict |
@@ -165,13 +176,15 @@ What do you need?
|
+-- Store something?
| +-- Durable fact ----------> memory_add
| +-- Durable runbook -------> procedure_add
| +-- What just happened ----> event_add
| +-- Why a choice was made -> decision_add
| +-- State for next session > handoff_add
|
+-- Find something?
| +-- Broad topic search ----> search
| +-- Memories only ---------> memory_search
| +-- Procedures only -------> procedure_search
| +-- Events only -----------> event_search
| +-- Entity by name --------> entity_get
|
32 changes: 25 additions & 7 deletions README.md
@@ -2,14 +2,19 @@

**Forgetful agents, fixed by a SQLite file.**

One `brain.db` gives your agent durable memory across sessions — facts learned, decisions made, entities tracked, and state handed off. No server. No API keys. No LLM calls required.
One `brain.db` gives your agent durable memory across sessions — episodic evidence, semantic facts, procedural runbooks, decisions made, entities tracked, and state handed off. No server. No API keys. No LLM calls required.

```python
from agentmemory import Brain

brain = Brain(agent_id="my-agent")
ctx = brain.orient(project="api-v2") # session start: handoff + events + triggers + memories
ctx = brain.orient(project="api-v2") # handoff + events + triggers + memories + procedures
brain.remember("rate-limit: 100/15s", category="integration")
brain.remember_procedure(
goal="Deploy to staging safely",
steps=["Run tests", "brainctl migrate", "Deploy", "Check health"],
title="Staging deploy runbook",
)
brain.decide("use Retry-After for backoff", "server controls timing", project="api-v2")
brain.wrap_up("auth module complete", project="api-v2") # session end: logs + handoff for next run
```
@@ -45,13 +50,15 @@ brain.relate("OpenAI", "provides", "GPT-4o")

**Memory types**
- `convention`, `decision`, `environment`, `identity`, `integration`, `lesson`, `preference`, `project`, `user`
- Core memory layers: episodic, semantic, and procedural
- Category controls natural half-life: identity decays over ~1 year; integration details over ~1 month
- Hard cap: 10,000 memories per agent. Emergency compression retires lowest-confidence entries.
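
As a rough illustration of category half-life, the sketch below applies plain exponential decay. The half-life values are approximations read off the "~1 year" and "~1 month" figures above, and the function and dictionary names are hypothetical; the actual decay code may add reinforcement or salience terms not shown here.

```python
# Approximate half-lives in days, assumed from "~1 year" and "~1 month".
HALF_LIFE_DAYS = {"identity": 365.0, "integration": 30.0}


def decayed_weight(category, age_days, base=1.0):
    """Exponential decay: the weight halves every `half_life` days."""
    half_life = HALF_LIFE_DAYS[category]
    return base * 0.5 ** (age_days / half_life)


integration_after_month = decayed_weight("integration", 30)
identity_after_month = decayed_weight("identity", 30)
```

After 30 days an `integration` memory has lost half its weight in this model, while an `identity` memory of the same age has barely moved.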

**Retrieval modes**
- FTS5 full-text search with stemming (default, zero dependencies)
- Vector similarity via sqlite-vec + Ollama nomic-embed-text (`brainctl[vec]`)
- Hybrid: Reciprocal Rank Fusion over FTS5 + vector results
- Retrieval executive above memories/events/context/decisions/procedures: query planning, candidate fusion, procedural evidence expansion, deterministic late reranking, grounded abstention
- Context profiles: named search presets scoped to task type (`--profile ops`, `--profile research`, etc.)
- `--benchmark` preset: flattens recency/salience for synthetic evaluation runs

@@ -62,7 +69,7 @@ brain.relate("OpenAI", "provides", "GPT-4o")
- Cross-encoder controls: `--rerank-top-n` and `--rerank-budget-ms` tune candidate window + strict latency budget
- Top-heavy staged rollout controls (I6): `--rollout-mode`, `--rollout-canary-agents`, `--rollout-canary-percent`, `--rollback-top-heavy`
- Env mirrors for rollout controls: `BRAINCTL_TOPHEAVY_ROLLOUT_MODE`, `BRAINCTL_TOPHEAVY_CANARY_AGENTS`, `BRAINCTL_TOPHEAVY_CANARY_PERCENT`, `BRAINCTL_TOPHEAVY_ROLLBACK`
- Retrieval regression-gated in CI: >2% drop on P@1/P@5/MRR/nDCG@5 fails the build
- Retrieval regression-gated in CI: >2% drop on P@1/P@5/MRR/nDCG@5 fails the build. Search-quality output also reports the fixture-specific P@5 ceiling and ratio-to-ceiling so sparse graded queries do not make raw P@5 look worse than it is.

**Knowledge graph**
- Typed entity nodes: `agent`, `concept`, `document`, `event`, `location`, `organization`, `person`, `project`, `service`, `tool`
@@ -112,7 +119,7 @@ Trading bots:
| `plugins/octobot/` | OctoBot |
| `plugins/coinbase-agentkit/` | Coinbase AgentKit |

## MCP server (201 tools)
## MCP server (209 tools)

```json
{
@@ -130,7 +137,11 @@ Add to `~/.claude/claude_desktop_config.json`, `~/.cursor/mcp.json`, or equivale

```bash
brainctl memory add "content" -c convention # store a memory
brainctl memory add "rollback steps..." -c convention --type procedural
brainctl search "query" # FTS5 search
brainctl procedure add --goal "Deploy to staging safely" --step "Run tests" --step "brainctl migrate"
brainctl procedure search "how do I deploy to staging?"
brainctl procedure feedback 12 --success --validated --outcome "deploy completed cleanly"
brainctl vsearch "semantic query" # vector search (requires [vec])
brainctl entity create "Alice" -t person # create entity
brainctl entity relate Alice works_at Acme # link entities
@@ -146,14 +157,17 @@ brainctl gaps scan # coverage + orphan + broken-edge
brainctl consolidate cycle # full consolidation pass
```

## Python API (22 methods)
## Python API

| Method | What it does |
|--------|--------------|
| `orient(project)` | One-call session start: handoff + events + triggers + memories |
| `wrap_up(summary)` | One-call session end: logs event + creates handoff |
| `remember(content, category)` | Store a durable fact through the W(m) write gate |
| `remember(content, category, memory_type="procedural")` | Store free text and compile it into a structured procedure when appropriate |
| `remember_procedure(goal, steps, ...)` | Create a canonical procedural memory with structured fields |
| `search(query)` | FTS5 full-text search with stemming |
| `search_procedures(query)` | Search structured procedures with deterministic procedural scoring |
| `vsearch(query)` | Vector similarity search (optional) |
| `think(query)` | Spreading-activation recall across the knowledge graph |
| `forget(memory_id)` | Soft-delete a memory |
@@ -167,6 +181,8 @@ brainctl consolidate cycle # full consolidation pass
| `resume()` | Fetch and consume latest handoff |
| `doctor()` | Diagnostic health check |
| `consolidate()` | Promote high-importance memories |
| `procedure_feedback(procedure_id, ...)` | Record execution outcome, validation, and utility for a procedure |
| `backfill_procedures()` | Synthesize candidate/canonical procedures from existing memories, events, and decisions |
| `tier_stats()` | Write-tier distribution |
| `stats()` | Database overview |
| `affect(text)` | Classify emotional state |
@@ -177,17 +193,19 @@ brainctl consolidate cycle # full consolidation pass

- **Write gate** (W(m)): surprise scoring rejects redundant writes. Bypass with `force=True`.
- **Three-tier routing**: high-value memories get full indexing; low-value get lightweight storage.
- **Procedural compilation**: explicit runbooks live in dedicated procedure tables; `memory_type="procedural"` free text is heuristically compiled without deleting the original evidence.
- **Duplicate suppression**: near-duplicates reinforce existing memories instead of creating new rows.
- **Half-life decay**: unused memories fade at a rate set by category. Recalled memories are reinforced.
- **Consolidation**: Hebbian learning, temporal promotion, compression — runs on a cron schedule.
- **Consolidation**: Hebbian learning, temporal promotion, compression, and procedural candidate synthesis — runs on a cron schedule.

## Retrieval benchmarks

Tested with default settings, no tuning for benchmark data. Two harnesses
ship in the tree:

* `tests/bench/` — single-system retrieval baselines for `Brain.search`
and `cmd_search`, gated against regression in CI.
and `cmd_search`, now covering procedural lookup, rollback/troubleshooting,
ambiguity, and abstention, gated against regression in CI.
* `tests/bench/competitor_runs/` — same-fixture head-to-head harness
with adapters for Mem0, Letta, Zep, Cognee, MemPalace, OpenAI Memory.
Skip-not-fabricate contract: missing SDK / API key raises
2 changes: 2 additions & 0 deletions benchmarks/__init__.py
@@ -0,0 +1,2 @@
"""Legacy benchmark comparison helpers for brainctl vs MemPalace."""
