AI Memory Protocol

Versioned, graph-based persistent memory for AI coding agents — powered by Sphinx-Needs.

AI agents lose context between sessions. This protocol gives them a structured way to remember, recall, and evolve knowledge — with full Git history, typed entries, graph links, and machine-readable output.

Features

  • Typed memories — observations, decisions, facts, preferences, risks, goals, open questions
  • Graph links — relates, supports, depends, supersedes, contradicts, example_of
  • Tag-based discovery — topic:api, repo:backend, tier:core
  • Context-optimized output — brief / compact / context / JSON formats with body toggling
  • Stale detection — auto-expire, review reminders, staleness checks
  • Auto-scaling — RST files split at 50 entries, transparent to queries
  • Git-native — every memory is an RST directive, fully diffable and versioned
  • MCP server — expose memory as tools for Claude Desktop, VS Code Copilot, and other MCP clients
  • Autonomous capture — extract memories from Git commits, CI logs, and discussion transcripts
  • Planning engine — analyze memory graph and propose maintenance actions
  • CLI-first — 16+ subcommands for full lifecycle management

Installation

git clone https://github.com/bburda/ai_memory_protocol.git
pipx install -e ai_memory_protocol/

# With MCP server support
pipx install -e 'ai_memory_protocol/[mcp]'

This installs the memory CLI command (and optionally memory-mcp-stdio) globally on PATH.
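
To sanity-check the install, something like the following should work (memory doctor is the health check listed in the CLI reference below):

memory --help     # confirms the CLI is on PATH
memory doctor     # verify installation health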

Quick Start

# 1. Create a memory workspace
memory init .memories --name "My Project" --install

# 2. Add your first memory
memory add fact "API runs on port 8080" \
  --tags "topic:api,repo:backend" \
  --confidence high \
  --body "Gateway listens on 0.0.0.0:8080 by default" \
  --rebuild

# 3. Search
memory recall api port
memory recall --tag topic:api --format brief

# 4. Get full details
memory get FACT_api_runs_on_port_8080

How It Works

RST files (memory/*.rst)          ← Human + AI editable, Git-tracked
    │
    ▼ memory rebuild (sphinx-build)
needs.json (_build/html/needs.json)   ← Machine-readable index
    │
    ▼ memory recall / get / list
Formatted output                  ← Optimized for LLM context windows

Memories are stored as Sphinx-Needs directives in RST files. The memory rebuild command runs Sphinx to produce needs.json, the single query layer for all search operations. This means memories are simultaneously human-readable documentation and machine-queryable data.
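
Because RST is the single source of truth, the same round trip applies whether a memory was added through the CLI or edited by hand:

# After editing any memory/*.rst file directly, re-index:
memory rebuild

# All queries read from the regenerated needs.json
memory recall --tag topic:api --format brief
memory get FACT_api_runs_on_port_8080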

CLI Reference

memory init <dir>                       # Create a new workspace
memory add <type> "<title>" [options]   # Record a memory
memory recall [query] [--tag ...] [--format brief|compact|context|json]
memory get <ID>                         # Full details of one memory
memory related <ID> [--hops N]          # Graph walk from a memory
memory list [--type TYPE] [--status S]  # Browse all memories
memory update <ID> [--confidence ...] [--add-tags ...]
memory deprecate <ID> [--by NEW_ID]     # Mark as deprecated
memory tags [--prefix PREFIX]           # Discover tags in use
memory stale                            # Find expired/overdue memories
memory review                           # Show memories needing review
memory rebuild                          # Rebuild needs.json
memory capture git                      # Extract memories from recent commits
memory capture ci --input <file|->      # Extract memories from CI/test logs
memory capture discussion --input <file|->  # Extract from conversation transcripts
memory plan [--auto-apply]              # Analyze graph and propose maintenance
memory apply <plan.json>                # Execute a generated plan
memory doctor                           # Verify installation health

Key flags for recall:

  • --format brief — ultra-compact, minimal tokens
  • --body — include body text (off by default)
  • --sort newest|oldest|confidence|updated
  • --limit N — cap results
  • --expand 0 — disable graph expansion
  • --stale — only expired/review-overdue
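
Combining several of these, a token-frugal exploratory query might look like:

# Broad topic scan: newest first, capped, no graph expansion, no body text
memory recall --tag topic:gateway --format brief --sort newest --limit 10 --expand 0

# Only memories that are expired or overdue for review
memory recall --tag topic:gateway --stale --format brief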

MCP Server

Expose memory tools to LLM clients via the Model Context Protocol.

Setup

Install with MCP extras:

pipx install -e 'ai_memory_protocol/[mcp]'

Claude Code

claude mcp add --transport stdio --env MEMORY_DIR=/path/to/.memories memory -- memory-mcp-stdio

Or add to .mcp.json in your project root (project scope):

{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "memory-mcp-stdio",
      "env": {
        "MEMORY_DIR": "/path/to/.memories"
      }
    }
  }
}

VS Code (GitHub Copilot)

Add to .vscode/mcp.json:

{
  "servers": {
    "memory": {
      "command": "memory-mcp-stdio",
      "env": {
        "MEMORY_DIR": "${workspaceFolder}/.memories"
      }
    }
  }
}

Available MCP Tools

Tool                         Description
memory_recall                Search memories by text/tags with formatting options
memory_get                   Get full details of a specific memory
memory_add                   Record a new memory with tags and metadata
memory_update                Update metadata (status, confidence, tags, etc.)
memory_deprecate             Mark a memory as deprecated
memory_tags                  List all tags with counts
memory_stale                 Find expired/overdue memories
memory_rebuild               Rebuild needs.json index
memory_capture_git           Extract memories from recent Git commits
memory_capture_ci            Extract memories from CI/test log output
memory_capture_discussion    Extract memories from conversation transcripts
memory_plan                  Analyze memory graph and propose maintenance actions
memory_apply                 Execute a generated maintenance plan

Memory Types

Type   Prefix   Use Case
mem    MEM_     Observation, note, or finding
dec    DEC_     Design or architectural decision
fact   FACT_    Verified, stable knowledge
pref   PREF_    Coding style or convention
risk   RISK_    Uncertainty or assumption
goal   GOAL_    Objective or target
q      Q_       Open question needing resolution
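
The prefix becomes the start of the memory ID; going by the Quick Start example (FACT_api_runs_on_port_8080), the rest appears to be a slug of the title. Recording a decision would presumably look like:

memory add dec "Handler context pattern" \
  --tags "topic:gateway,intent:decision" \
  --confidence high --rebuild
# Expected ID of the form DEC_handler_context_pattern (prefix + slugified title)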

Graph Links

Link          Meaning
relates       General association
supports      Evidence or justification
depends       Hard dependency
supersedes    Replaces older memory
contradicts   Conflict or tension
example_of    Concrete instance of concept
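
Links are traversed with memory related, which walks the graph outward from one entry up to a hop limit (the starting ID below is the one from the Quick Start):

# Follow relates/supports/depends/... edges up to two hops away
memory related FACT_api_runs_on_port_8080 --hops 2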

Metadata

Field          Values                       Purpose
confidence     low / medium / high          Trust level
scope          global, repo:X, product:X    Applicability
tags           prefix:value format          Categorization
source         URL, commit, description     Provenance
review_after   ISO date                     Staleness trigger
expires_at     ISO date                     Auto-expire date
created_at     ISO date                     Capture timestamp
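
Most of these fields can be adjusted later with memory update; for example, downgrading confidence and adding a tier tag to the Quick Start fact might look like:

memory update FACT_api_runs_on_port_8080 --confidence medium --add-tags "tier:core"
memory rebuild    # re-index so recall sees the change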

Tagging Conventions

Tags use prefix:value format for consistent discovery:

  • topic: — Subject area (topic:gateway, topic:auth)
  • repo: — Repository (repo:backend, repo:web-ui)
  • domain: — Knowledge domain (domain:robotics, domain:web)
  • tier: — Importance level (tier:core, tier:detail)
  • intent: — Purpose (intent:decision, intent:coding-style)
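
A typical discovery flow is to list the tags already in use for a prefix, then filter recall by one of them:

# Which topic: tags exist, and how often are they used?
memory tags --prefix topic:

# Narrow recall to one of them
memory recall --tag topic:gateway --format brief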

AI Agent Integration

Recommended Workflow

1. READ — Peek then Drill (two-phase recall)

Always use a two-phase approach. Never go straight to body text on broad queries.

Phase A — Peek (scan titles, zero body text):

memory recall --tag topic:gateway --format brief --expand 0

Returns [ID] Title (confidence) one-liners. Minimal tokens. Do this FIRST.

Phase B — Drill (read full body of specific memories):

memory get DEC_handler_context_pattern

Only after peeking — pick the 2-3 most relevant IDs and get them individually.

When to recall — recall is NOT just a session-start ritual. Recall at each of these moments:

Trigger                             What to recall
Session start                       recall --format brief --limit 20 --sort newest
New task or topic                   recall --tag topic:<X> --format brief
Entering unfamiliar code            recall --tag repo:<X> --type fact --format brief
Before a design decision            recall --tag topic:<X> --type dec
Encountering an error or failure    recall <error message keywords> — FIRST reaction before debugging; check if this problem was already solved
Stuck after initial attempts        recall --tag topic:<X> --type mem,fact — broaden search to related areas and past solutions
Before implementing a pattern       recall --tag intent:coding-style --type pref
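
As a concrete example of the error/failure trigger above: before reaching for a debugger, a short keyword recall checks whether the problem is already known (keywords illustrative):

# First reaction to a failing test that mentions EntityCache
memory recall EntityCache race --format brief --expand 0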

2. WRITE — Record at specific trigger points

Recording memories is NOT optional. Write at these concrete moments:

Trigger                        Type   Example
Chose approach A over B        dec    "Use tl::expected over exceptions"
Fixed a non-obvious bug        mem    "EntityCache race condition fix"
Discovered undocumented API    fact   "Routes match in registration order"
User stated a preference       pref   "Prefer Zustand over Redux"
Identified a risk              risk   "JWT secret hardcoded in tests"
Question remains unanswered    q      "Should synthetic components expose operations?"

End-of-task writes: summarize architecture learned (fact), record conventions (pref), note anything a future agent needs (mem), capture unfinished goals (goal).

Write quality rules:

  • --tags is mandatory — without tags, the memory is unfindable
  • --body must be self-contained with file paths and concrete details
  • Use --rebuild flag to make new memories immediately searchable
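
Put together, a write that follows all three rules looks much like the Quick Start example: tags for discovery, a self-contained body with concrete detail, and an immediate rebuild (the body text and file path below are illustrative):

memory add mem "EntityCache race condition fix" \
  --tags "topic:cache,repo:backend" \
  --confidence high \
  --body "Invalidation raced with reads in src/cache/entity_cache.cpp; fixed by taking the cache lock before lookup." \
  --rebuild
# (body text and path above are illustrative)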

3. SUPERSEDE, don't edit

When knowledge changes, add a new entry with --supersedes OLD_ID and deprecate the old one.
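
A minimal sketch of that flow, reusing the Quick Start fact (the new ID shown is assumed to follow the usual prefix-plus-slug pattern):

# 1. Record the replacement and link it back to the old entry
memory add fact "API runs on port 9090" \
  --tags "topic:api,repo:backend" \
  --supersedes FACT_api_runs_on_port_8080 \
  --rebuild

# 2. Mark the old entry as deprecated, pointing at its replacement
memory deprecate FACT_api_runs_on_port_8080 --by FACT_api_runs_on_port_9090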

4. CHECK STALENESS periodically

Run memory stale at the start of long sessions to keep the graph accurate.

Context Window Optimization

  • recall omits body by default — this is intentional, not a limitation
  • Peek with --format brief, then drill with get <ID> — this is the core pattern
  • Use --limit 10 and --expand 0 when exploring broad topics
  • Use --tag filters to narrow results instead of free-text
  • Use memory tags to discover available tag prefixes before filtering

Project Structure

ai_memory_protocol/
├── pyproject.toml           # Package definition, CLI + MCP entry points
├── README.md
├── LICENSE                  # Apache 2.0
├── CONTRIBUTING.md
├── .pre-commit-config.yaml
├── .github/workflows/ci.yml
└── src/
    └── ai_memory_protocol/
        ├── __init__.py
        ├── cli.py           # CLI (argparse, 16+ subcommands)
        ├── mcp_server.py    # MCP server (13 tools, stdio transport)
        ├── capture.py       # Knowledge extraction (git, CI, discussion)
        ├── planner.py       # Graph analysis and maintenance planning
        ├── executor.py      # Plan execution engine
        ├── config.py        # Type definitions, constants
        ├── engine.py        # Workspace detection, search, graph walk
        ├── formatter.py     # Output formatting (brief/compact/context/json)
        ├── rst.py           # RST generation, editing, file splitting
        └── scaffold.py      # Workspace scaffolding (init command)

Memory data lives in a separate workspace (e.g., .memories/), created with memory init.
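
For orientation, a freshly scaffolded workspace likely looks something like the sketch below; the exact layout may differ, but each piece is referenced elsewhere in this README:

.memories/
├── conf.py                   # Sphinx config, including needs_warnings (see Build-as-Guardian)
├── memory/
│   ├── *.rst                 # memory entries as Sphinx-Needs directives
│   └── dashboards.rst        # needtable / needlist / needflow views
└── _build/
    └── html/needs.json       # machine-readable index produced by memory rebuild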

Autonomous Workflow

The protocol supports a fully autonomous memory lifecycle — agents can capture, plan, and maintain knowledge without human intervention:

  capture (git / CI / discussion)
        │
        ▼
  plan  (analyze graph → propose actions)
        │
        ▼
  apply (execute plan → add/update/deprecate)
        │
        ▼
  rebuild (sphinx-build → needs.json)
        │
        ▼
  recall (search updated graph)

Capture sources:

  • memory capture git — scans recent commits, extracts decisions, bug fixes, refactors
  • memory capture ci --input <log> — parses test failures, compiler errors, deprecation warnings
  • memory capture discussion --input <transcript> — classifies conversation into decisions, facts, preferences, risks, goals, questions

Planning engine:

  • memory plan — analyzes the memory graph for staleness, missing links, contradictions, and proposes maintenance actions
  • memory plan --auto-apply — execute the plan immediately after analysis
  • memory apply plan.json — execute a previously saved plan

All captured candidates include provenance (--source) and are deduplicated against existing memories.
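
A concrete pass through that loop might look like this (the log file name is illustrative; --auto-apply is used rather than saving a plan file):

# Harvest candidates from recent commits and a CI log
memory capture git
memory capture ci --input ci_failures.log

# Analyze the graph and apply the proposed maintenance in one step
memory plan --auto-apply

# Re-index and check for anything stale
memory rebuild
memory stale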

Build-as-Guardian

The Sphinx build acts as a quality gate for the memory graph. needs_warnings in conf.py define constraints that fire during memory rebuild:

needs_warnings = {
    "missing_topic_tag": "type in ['mem','dec','fact',...] and not any(t.startswith('topic:') for t in tags)",
    "empty_body": "description == '' or description == 'TODO: Add description.'",
    "deprecated_without_supersede": "status == 'deprecated' and len(supersedes_back) == 0",
}

With sphinx-build -W (warnings as errors), the build fails if any memory violates these constraints. This means:

  • Every memory must have at least one topic: tag
  • No empty placeholders survive to the index
  • Deprecated memories must be superseded by a replacement

Agents learn to self-correct: if rebuild fails, they read the warning, fix the offending memory, and retry.
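
In practice that loop is just rebuild, fix, rebuild again. A hypothetical example, assuming missing_topic_tag fired on one entry (the ID is made up for illustration):

memory rebuild                            # fails: one memory has no topic: tag
memory update MEM_entitycache_race_condition_fix --add-tags "topic:cache"
memory rebuild                            # passes; needs.json is regenerated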

Human Role

Humans are observers and editors, not gatekeepers:

  • Dashboards — memory/dashboards.rst contains needtable, needlist, and needflow directives rendering the live state of the memory graph as HTML
  • RST editing — memories are plain RST, editable in any text editor or IDE with full diff/blame in Git
  • Override — humans can update status, confidence, or tags on any memory via CLI or direct RST edit
  • Review — memory review surfaces memories whose review_after date has passed, prompting human validation

The protocol is designed so that agents maintain knowledge autonomously while humans retain full visibility and override capability.

Contributing

See CONTRIBUTING.md for guidelines on how to contribute.

License

Apache 2.0
