Skip to content
This repository was archived by the owner on Apr 14, 2026. It is now read-only.

Latest commit

 

History

History
1187 lines (896 loc) · 41.5 KB

File metadata and controls

1187 lines (896 loc) · 41.5 KB

Admin Guide

ai-memory is an AI-agnostic memory management system. It works with any MCP-compatible AI client -- including Claude AI, OpenAI ChatGPT, xAI Grok, META Llama, and others. The HTTP API and CLI are completely platform-independent.

Key features for admins: Zero token cost until recall (replaces built-in auto-memory), TOON compact default response format (79% smaller than JSON), MCP prompts for proactive AI behavior (recall-first, memory-workflow), 4 feature tiers (keyword → autonomous with local LLMs via Ollama), 191 tests with 95%+ coverage across 15/15 modules.

Deployment Options

MCP Server (Recommended)

The simplest deployment is as an MCP tool server. No daemon process to manage -- your AI client spawns the process on demand. MCP (Model Context Protocol) is an open standard supported by multiple AI platforms.

Below is an example for Claude Code (user scope: merge mcpServers into ~/.claude.json; or project scope: .mcp.json in project root). Other MCP-compatible clients have their own configuration locations — consult your platform's documentation.

{
  "mcpServers": {
    "memory": {
      "command": "ai-memory",
      "args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
    }
  }
}

Claude Code note: MCP server configuration does not go in settings.json or settings.local.json -- those files do not support mcpServers.

The MCP server:

  • Starts when your AI client opens a session
  • Communicates over stdio (JSON-RPC) -- the standard MCP transport
  • Stops when the session ends
  • Uses the same SQLite database as the CLI and HTTP daemon
  • Correctly skips all JSON-RPC notifications (no response sent)
  • Works with any MCP-compatible client, not just Claude Code

Standalone (Development)

Run the HTTP daemon directly in the foreground:

ai-memory --db /path/to/ai-memory.db serve

The daemon listens on 127.0.0.1:9077 by default and exposes 24 HTTP endpoints.

Systemd (Production HTTP Daemon)

sudo tee /etc/systemd/system/ai-memory.service > /dev/null << 'EOF'
[Unit]
Description=AI Memory Daemon
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db serve
Restart=on-failure
RestartSec=5
Environment=RUST_LOG=ai_memory=info,tower_http=info

# Graceful shutdown: checkpoints WAL before exit
KillSignal=SIGINT
TimeoutStopSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo mkdir -p /var/lib/ai-memory
sudo systemctl daemon-reload
sudo systemctl enable --now ai-memory

Production Hardening: Add security directives to the [Service] section to restrict the daemon's privileges:

[Service]
User=ai-memory
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes
ReadWritePaths=/var/lib/ai-memory

Check status:

sudo systemctl status ai-memory
sudo journalctl -u ai-memory -f

Docker

Example Dockerfile:

FROM rust:1.75-slim AS builder
WORKDIR /src
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
COPY --from=builder /src/target/release/ai-memory /usr/local/bin/
VOLUME /data
EXPOSE 9077
CMD ["ai-memory", "--db", "/data/ai-memory.db", "serve"]

Build and run:

docker build -t ai-memory .
docker run -d -p 127.0.0.1:9077:9077 -v ai-memory-data:/data ai-memory

Configuration

CLI Flags

Flag Default Description
--db <path> ai-memory.db Path to SQLite database
--host <addr> 127.0.0.1 Bind address (serve only)
--port <port> 9077 Bind port (serve only)
--json false JSON output for CLI commands
--tier <tier> semantic Feature tier: keyword, semantic, smart, autonomous (mcp/serve only)

Feature Tiers

The --tier flag controls which features are enabled. Each tier builds on the previous one:

Tier Tools Embedding Model LLM Required Approx. Memory
keyword 21 No No Minimal
semantic (default) 21 Yes (HuggingFace) No ~256 MB
smart 21 Yes Yes (Ollama) ~1 GB
autonomous 21 Yes Yes (Ollama) ~4 GB

Set the tier when starting the MCP server or HTTP daemon:

ai-memory mcp --tier semantic        # default
ai-memory mcp --tier smart           # enables LLM-powered tools
ai-memory serve --tier autonomous    # full feature set

Ollama Setup (Smart & Autonomous Tiers)

The smart and autonomous tiers require a running Ollama instance for LLM inference (Gemma 4 models).

macOS

brew install ollama
# Or download from https://ollama.com/download/mac
ollama serve &
ollama pull gemma4:e2b    # Smart tier (~1GB)
ollama pull gemma4:e4b    # Autonomous tier (~2.3GB)

Linux

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
sudo systemctl start ollama
ollama pull gemma4:e2b    # Smart tier (~1GB)
ollama pull gemma4:e4b    # Autonomous tier (~2.3GB)

Windows

# Download from https://ollama.com/download/windows, or:
winget install Ollama.Ollama
ollama pull gemma4:e2b    # Smart tier (~1GB)
ollama pull gemma4:e4b    # Autonomous tier (~2.3GB)

Verify

curl http://localhost:11434/api/tags
ollama run gemma4:e2b "Hello, world"

ai-memory connects to Ollama at http://localhost:11434 by default. Set OLLAMA_HOST to override. If Ollama is not running, ai-memory gracefully falls back to the semantic tier.

Embedding Model (semantic tier and above)

At the semantic tier and above, ai-memory downloads a sentence-transformer model from HuggingFace on first startup. The model is cached in the HuggingFace cache directory (~/.cache/huggingface/ by default).

  • First startup may take 30-60 seconds while the model downloads (~100 MB)
  • Subsequent startups load from cache (2-5 seconds)
  • Set HF_HOME to override the cache directory
  • No HuggingFace account or API key is required

Memory Budget Guidance

Tier RAM Requirement Notes
keyword Minimal (~10 MB) SQLite + FTS5 only
semantic ~256 MB Embedding model loaded in memory
smart ~1 GB Embedding model + Ollama with smaller LLM
autonomous ~4 GB Embedding model + Ollama with larger LLM

Environment Variables

Variable Default Description
AI_MEMORY_DB ai-memory.db Database path (overridden by --db)
RUST_LOG (none) Logging filter (e.g., ai_memory=info,tower_http=debug)
AI_MEMORY_NO_CONFIG (none) Set to 1 to skip config file loading (useful for testing)

Configuration File (config.toml)

ai-memory supports an optional configuration file at ~/.config/ai-memory/config.toml. This file is read once at process startup and supports the following keys:

Note: Configuration is loaded once at process startup. Changes to config.toml require restarting the ai-memory process (MCP server, HTTP daemon, or CLI) to take effect.

Key Type Default Valid Values Description
tier String "semantic" "keyword", "semantic", "smart", "autonomous" Feature tier controlling which AI capabilities are active
db String "ai-memory.db" Any valid file path Path to the SQLite database file
ollama_url String "http://localhost:11434" Any URL Ollama base URL for LLM generation (smart/autonomous tiers)
embed_url String Value of ollama_url Any URL Separate URL for the embedding service; falls back to ollama_url if unset
embedding_model String Tier-dependent "mini_lm_l6_v2" (384-dim, ~90 MB), "nomic_embed_v15" (768-dim, ~270 MB) HuggingFace sentence-transformer model for semantic search
llm_model String Tier-dependent "gemma4:e2b" (~1 GB Q4), "gemma4:e4b" (~2.3 GB Q4) Ollama LLM model tag for smart/autonomous features
cross_encoder Bool false (true for autonomous tier) true, false Enable neural cross-encoder reranking (not a string -- must be bare true/false without quotes)
default_namespace String "global" Any valid namespace (max 128 bytes, no slashes/spaces/nulls) Default namespace applied to new memories
max_memory_mb Integer Tier-dependent Any positive integer Maximum memory budget in MB; used for automatic tier selection via from_memory_budget()
archive_on_gc Bool true true, false Archive expired memories instead of permanently deleting them during GC
[ttl] Section -- -- Per-tier TTL overrides (all sub-fields are integers in seconds)
ttl.short_ttl_secs Integer 21600 (6 hours) 0 = never expires, or positive integer TTL for short-tier memories in seconds
ttl.mid_ttl_secs Integer 604800 (7 days) 0 = never expires, or positive integer TTL for mid-tier memories in seconds
ttl.long_ttl_secs Integer 0 (never expires) 0 = never expires, or positive integer TTL for long-tier memories in seconds
ttl.short_extend_secs Integer 3600 (1 hour) Non-negative integer TTL extension on access for short-tier memories
ttl.mid_extend_secs Integer 86400 (1 day) Non-negative integer TTL extension on access for mid-tier memories

Note: Set any TTL to 0 to disable expiry for that tier. Values are clamped to a 10-year maximum (315,360,000 seconds). Negative extension values are clamped to 0.

Note: Restored memories have their expires_at cleared (set to NULL) and become permanent.

Complete Annotated config.toml

Below is a complete example showing every supported field with explanatory comments. Copy this to ~/.config/ai-memory/config.toml and uncomment the lines you want to customize.

# =============================================================================
# ai-memory configuration
# Location: ~/.config/ai-memory/config.toml
# Docs: https://github.com/alphaonedev/ai-memory-mcp
#
# All fields are optional. CLI flags and MCP args override these values.
# Changes require restarting the ai-memory process to take effect.
# =============================================================================

# ---------------------------------------------------------------------------
# Feature tier (controls which AI capabilities are active)
# ---------------------------------------------------------------------------
# Valid values: "keyword", "semantic", "smart", "autonomous"
#   keyword    — FTS5 keyword search only, no models, minimal RAM
#   semantic   — adds embedding-based hybrid recall (~256 MB)
#   smart      — adds query expansion, auto-tagging, contradiction detection (~1 GB, requires Ollama)
#   autonomous — full feature set with cross-encoder reranking (~4 GB, requires Ollama)
# Default: "semantic"
# tier = "semantic"

# ---------------------------------------------------------------------------
# Database path
# ---------------------------------------------------------------------------
# Path to the SQLite database file.
# Default: "ai-memory.db" (relative to working directory)
# db = "~/.claude/ai-memory.db"

# ---------------------------------------------------------------------------
# Ollama URLs (smart and autonomous tiers only)
# ---------------------------------------------------------------------------
# Base URL for Ollama LLM generation.
# Default: "http://localhost:11434"
# ollama_url = "http://localhost:11434"

# Separate URL for embedding requests. Falls back to ollama_url if unset.
# Default: same as ollama_url
# embed_url = "http://localhost:11434"

# ---------------------------------------------------------------------------
# Model selection
# ---------------------------------------------------------------------------
# Embedding model for semantic search (semantic tier and above).
# Valid values:
#   "mini_lm_l6_v2"   — sentence-transformers/all-MiniLM-L6-v2, 384-dim, ~90 MB
#   "nomic_embed_v15"  — nomic-ai/nomic-embed-text-v1.5, 768-dim, ~270 MB
# Default: tier-dependent (mini_lm_l6_v2 for semantic, nomic_embed_v15 for smart/autonomous)
# embedding_model = "mini_lm_l6_v2"

# LLM model served via Ollama (smart and autonomous tiers).
# Valid values:
#   "gemma4:e2b"  — Google Gemma 4 Effective 2B, ~1 GB Q4 (smart tier default)
#   "gemma4:e4b"  — Google Gemma 4 Effective 4B, ~2.3 GB Q4 (autonomous tier default)
# Default: tier-dependent (gemma4:e2b for smart, gemma4:e4b for autonomous)
# llm_model = "gemma4:e2b"

# ---------------------------------------------------------------------------
# Cross-encoder reranking
# ---------------------------------------------------------------------------
# Enable neural cross-encoder reranking for improved recall precision.
# NOTE: This is a boolean, NOT a string. Use bare true/false without quotes.
# Default: false (true for autonomous tier)
# cross_encoder = true

# ---------------------------------------------------------------------------
# Namespace and memory limits
# ---------------------------------------------------------------------------
# Default namespace applied to new memories when none is specified.
# Default: "global"
# default_namespace = "global"

# Maximum memory budget in MB. Used for automatic tier selection when tier
# is not explicitly set — the highest tier that fits within this budget is chosen.
# Default: tier-dependent (0/256/1024/4096 for keyword/semantic/smart/autonomous)
# max_memory_mb = 4096

# ---------------------------------------------------------------------------
# Garbage collection
# ---------------------------------------------------------------------------
# Archive expired memories before GC permanently deletes them.
# When true, expired memories are moved to the archive table and can be
# restored later. When false, GC permanently deletes expired memories.
# Default: true
# archive_on_gc = true

# ---------------------------------------------------------------------------
# Per-tier TTL overrides
# ---------------------------------------------------------------------------
# Customize time-to-live and access-extension durations per memory tier.
# Set any TTL to 0 to disable expiry for that tier.
# Values are clamped to a 10-year maximum (315,360,000 seconds).
# Negative extension values are clamped to 0.
# [ttl]
# short_ttl_secs = 21600        # 6 hours (default)
# mid_ttl_secs = 604800         # 7 days (default)
# long_ttl_secs = 0             # 0 = never expires (default)
# short_extend_secs = 3600      # +1 hour on access (default)
# mid_extend_secs = 86400       # +1 day on access (default)

Precedence: CLI flags and MCP args take precedence over config.toml values. When the MCP server is launched by an AI client, the --tier flag in the MCP args is used, not the config.toml tier setting.

Compile-Time Constants

These are set in the source code and require recompilation to change:

Constant Value Location
DEFAULT_PORT 9077 main.rs
GC_INTERVAL_SECS 1800 (30 min) main.rs
MAX_CONTENT_SIZE 65536 (64 KB) models.rs
PROMOTION_THRESHOLD 5 accesses models.rs
SHORT_TTL_EXTEND_SECS 3600 (1 hour) models.rs
MID_TTL_EXTEND_SECS 86400 (1 day) models.rs

Graceful Shutdown

The HTTP daemon handles SIGINT (Ctrl+C) gracefully:

  1. Stops accepting new connections
  2. Waits for in-flight requests to complete
  3. Checkpoints the WAL (PRAGMA wal_checkpoint(TRUNCATE))
  4. Exits cleanly

For systemd, use KillSignal=SIGINT and TimeoutStopSec=10 to ensure the checkpoint completes.

Note: The HTTP daemon handles SIGINT (Ctrl+C) gracefully with WAL checkpoint. Systemd sends SIGTERM by default -- the service file sets KillSignal=SIGINT to ensure clean shutdown.

The MCP server exits cleanly when stdin closes (AI client session ends).

Database Management

SQLite Settings

The database uses these pragmas (set automatically on open):

  • WAL mode -- write-ahead logging for concurrent reads
  • busy_timeout = 5000 -- 5 second wait on lock contention
  • synchronous = NORMAL -- balanced durability/performance
  • foreign_keys = ON -- enforced referential integrity (links cascade on delete)

Backup

Live backup (while daemon is running):

sqlite3 /path/to/ai-memory.db ".backup /path/to/backup.db"

JSON export (includes links):

ai-memory --db /path/to/ai-memory.db export > backup.json

File copy (daemon must be stopped or use WAL checkpoint first):

systemctl stop ai-memory
cp /path/to/ai-memory.db /path/to/backup.db
cp /path/to/ai-memory.db-wal /path/to/backup.db-wal 2>/dev/null
systemctl start ai-memory

Restore

From JSON (preserves links):

ai-memory --db /path/to/new.db import < backup.json

From SQLite backup:

systemctl stop ai-memory
cp /path/to/backup.db /var/lib/ai-memory/ai-memory.db
systemctl start ai-memory

Migration

The schema is auto-migrated on startup. The schema_version table tracks the current version (currently 4). Migrations are forward-only and non-destructive.

  • v1 -> v2: Added confidence (REAL) and source (TEXT) columns
  • v2 -> v3: Added embedding (BLOB) column for storing dense vector embeddings
  • v3 -> v4: Added archived_memories table for GC archival

Migration error handling: only expected errors (e.g., "duplicate column" when re-running a migration) are silently ignored. Real failures are propagated and will prevent startup, ensuring data integrity.

Upgrade Procedure

  1. Stop the service: sudo systemctl stop ai-memory
  2. Backup the database: sqlite3 /var/lib/ai-memory/ai-memory.db ".backup /var/lib/ai-memory/ai-memory-backup.db"
  3. Install the new binary (e.g., cargo install ai-memory or replace the binary at /usr/local/bin/ai-memory)
  4. Start the service: sudo systemctl start ai-memory

Schema migrations run automatically on startup. No manual migration steps are required.

Database Maintenance

Manually trigger garbage collection:

# Via CLI
ai-memory gc

# Via API
curl -X POST http://127.0.0.1:9077/api/v1/gc

By default, GC archives expired memories before deleting them. To disable archiving and permanently delete instead, set archive_on_gc = false in config.toml. Archived memories are moved to a separate archive table and can be listed, restored, or purged:

# List archived memories
curl http://127.0.0.1:9077/api/v1/archive

# Restore an archived memory
curl -X POST http://127.0.0.1:9077/api/v1/archive/<id>/restore

# Purge all archived memories permanently (optional: ?older_than_days=N)
curl -X DELETE http://127.0.0.1:9077/api/v1/archive

# View archive statistics
curl http://127.0.0.1:9077/api/v1/archive/stats

Disk space guidance: Approximate database growth: ~2KB per memory (keyword tier), ~3.5KB per memory (semantic tier, 384-dim embeddings), ~5KB per memory (768-dim embeddings). WAL file may grow up to ~50MB during heavy write bursts; checkpoint occurs on graceful shutdown. Archive table grows unboundedly -- use ai-memory archive purge periodically.

Compact the database (reduces file size after many deletions):

sqlite3 /path/to/ai-memory.db "VACUUM"

Rebuild the FTS index (if it becomes corrupt):

sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"

Security Hardening

Transaction Safety

Critical operations use BEGIN IMMEDIATE / COMMIT transactions to prevent data corruption under concurrent access:

  • touch() -- the read-modify-write cycle for access count, TTL extension, auto-promotion, and priority reinforcement is fully atomic
  • consolidate() -- the multi-step merge (create new memory, delete originals, aggregate tags) is fully atomic

This prevents race conditions where two concurrent recalls could cause incorrect access counts or missed auto-promotions.

FTS Query Injection Protection

All full-text search queries are sanitized before being passed to SQLite FTS5:

  • Special characters (*, ", (, ), :, +, -, ^, etc.) are stripped
  • Remaining tokens are individually double-quoted (e.g., auth flow becomes "auth" "flow")
  • This prevents FTS query syntax injection that could cause errors or unexpected results

The sanitization is applied in recall(), search(), and forget() operations.

Error Sanitization

The HTTP API never leaks internal database error details to clients. All rusqlite::Error and anyhow::Error responses are replaced with a generic "Internal server error" message. Detailed errors are logged server-side for debugging.

Bulk Input Limits

To prevent memory exhaustion and abuse:

  • Bulk create (POST /memories/bulk): Limited to 1,000 items per request
  • Import (POST /import): Limited to 1,000 memories per request

Requests exceeding these limits receive a 400 Bad Request response.

Path Parameter Validation

All ID path parameters (e.g., /memories/{id}, /links/{id}) are validated before database queries are executed. Invalid IDs (empty, too long, containing null bytes) are rejected with a 400 Bad Request response before any database access occurs.

Input Validation

All write paths go through the validation layer (validate.rs):

  • Title: max 512 bytes, no null bytes
  • Content: max 64KB, no null bytes
  • Namespace: max 128 bytes, no slashes/spaces/nulls
  • Source: whitelist (user, claude, hook, api, cli, import, consolidation, system)
  • Tags: max 50 tags, each max 128 bytes
  • Priority: 1-10
  • Confidence: 0.0-1.0, finite
  • Relations: whitelist (related_to, supersedes, contradicts, derived_from)
  • IDs: max 128 bytes, no null bytes
  • Timestamps: valid RFC3339
  • TTL: positive, max 1 year

Localhost Binding

By default, the HTTP daemon binds to 127.0.0.1 only. It is not accessible from the network. This is intentional -- ai-memory is a local-machine tool.

The MCP server communicates over stdio only -- no network exposure.

CORS

The HTTP server uses CorsLayer::permissive() -- any origin can make requests. For production, use a reverse proxy with restrictive CORS headers.

No Authentication

There is no authentication mechanism. This is by design -- the daemon is intended for localhost access only by your AI client (Claude AI, ChatGPT, Grok, Llama, or any other). If you expose it to a network, you are responsible for adding a reverse proxy with authentication.

Multi-User Warning

ai-memory is a single-user tool. Namespaces do not provide access control. If multiple users share a database, any user can read/write any namespace.

TLS / Reverse Proxy

ai-memory does not support TLS natively. For HTTPS, terminate TLS at a reverse proxy. Minimal nginx example:

server {
    listen 443 ssl;
    server_name memory.example.com;

    ssl_certificate     /etc/ssl/certs/memory.pem;
    ssl_certificate_key /etc/ssl/private/memory.key;

    location / {
        proxy_pass http://127.0.0.1:9077;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Data at Rest

The SQLite database is stored as a regular file. It is not encrypted. If you need encryption at rest, use filesystem-level encryption (LUKS, FileVault, BitLocker).

MCP Notification Handling

The MCP server correctly handles all JSON-RPC notifications (requests without an id field). Notifications are processed but no response is sent, per the JSON-RPC 2.0 specification. This prevents protocol errors when any MCP client sends notifications/initialized or other notification messages.

WAL Files

SQLite WAL mode creates two additional files alongside the database:

  • ai-memory.db-wal -- write-ahead log
  • ai-memory.db-shm -- shared memory file

Both are cleaned up on graceful shutdown (the daemon runs PRAGMA wal_checkpoint(TRUNCATE) on SIGINT). If the daemon crashes, these files persist but are automatically recovered on next open.

HTTP API Endpoints

Maximum request body size: 50 MB.

The HTTP daemon exposes 24 endpoints under /api/v1:

Method Path Description
GET /health Deep health check (DB + FTS integrity)
POST /memories Create a memory
POST /memories/bulk Bulk create (max 1,000)
GET /memories/{id} Get a memory by ID (includes links)
PUT /memories/{id} Update a memory
DELETE /memories/{id} Delete a memory
POST /memories/{id}/promote Promote a memory to long-term
GET /memories List memories with filters
GET /search AND search with 6-factor scoring
GET /recall OR recall with touch + auto-promote
POST /recall OR recall (POST body)
POST /forget Bulk delete by pattern/namespace/tier
POST /consolidate Consolidate 2-100 memories
POST /links Create a link between memories
GET /links/{id} Get links for a memory
GET /namespaces List namespaces with counts
GET /stats Aggregate statistics
POST /gc Trigger garbage collection
GET /export Export all memories and links
POST /import Import memories and links (max 1,000)
GET /archive List archived memories
POST /archive/{id}/restore Restore an archived memory
DELETE /archive Permanently delete archived memories (optional ?older_than_days=N)
GET /archive/stats Archive statistics

HTTP API Request/Response Examples

Below are curl examples showing the exact JSON request bodies and response formats for the most important endpoints. The base URL is http://127.0.0.1:9077/api/v1.

POST /memories (Store)

Create a new memory. Only title and content are required; all other fields have defaults.

curl -X POST http://127.0.0.1:9077/api/v1/memories \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Project uses PostgreSQL 16",
    "content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
    "tier": "long",
    "namespace": "infra",
    "tags": ["postgres", "database"],
    "priority": 9,
    "confidence": 1.0,
    "source": "user",
    "ttl_secs": 604800
  }'

Required fields:

Field Type Description
title string Memory title (max 512 bytes)
content string Memory content (max 64 KB)

Optional fields:

Field Type Default Description
tier string "mid" "short", "mid", or "long"
namespace string "global" Namespace for grouping (max 128 bytes, no slashes/spaces)
tags array [] String tags (max 50 tags, each max 128 bytes)
priority integer 5 1-10 (clamped)
confidence float 1.0 0.0-1.0 (clamped)
source string "api" One of: user, claude, hook, api, cli, import, consolidation, system
expires_at string (none) Explicit expiry timestamp (RFC3339)
ttl_secs integer (none) TTL in seconds (overrides tier default)

Response (201 Created):

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "tier": "long",
  "namespace": "infra",
  "title": "Project uses PostgreSQL 16"
}

If potential contradictions are found (memories with similar titles in the same namespace), the response includes:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "tier": "long",
  "namespace": "infra",
  "title": "Project uses PostgreSQL 16",
  "potential_contradictions": ["existing-id-1", "existing-id-2"]
}

Deduplication: if a memory with the same title+namespace already exists, it is upserted (tier never downgrades, priority keeps the maximum).

Minimal example (defaults applied):

curl -X POST http://127.0.0.1:9077/api/v1/memories \
  -H "Content-Type: application/json" \
  -d '{"title": "Quick note", "content": "Something to remember."}'

Response: {"id": "...", "tier": "mid", "namespace": "global", "title": "Quick note"}

GET /memories/{id} (Get)

Retrieve a single memory by ID, including its links to other memories.

curl http://127.0.0.1:9077/api/v1/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (200 OK):

{
  "memory": {
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "tier": "long",
    "namespace": "infra",
    "title": "Project uses PostgreSQL 16",
    "content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
    "tags": ["postgres", "database"],
    "priority": 9,
    "confidence": 1.0,
    "source": "user",
    "access_count": 3,
    "created_at": "2026-04-03T15:00:00+00:00",
    "updated_at": "2026-04-03T15:00:00+00:00",
    "last_accessed_at": "2026-04-10T09:30:00+00:00",
    "expires_at": null
  },
  "links": [
    {
      "source_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "target_id": "f7e8d9c0-b1a2-3456-7890-abcdef123456",
      "relation": "related_to",
      "created_at": "2026-04-05T12:00:00+00:00"
    }
  ]
}

Response (404 Not Found): {"error": "not found"}

Note: last_accessed_at and expires_at are omitted from the JSON when null.

GET /recall?context=... (Recall)

Fuzzy OR search with ranked results. Automatically bumps access count, extends TTL, and auto-promotes frequently accessed mid-tier memories to long-term.

curl "http://127.0.0.1:9077/api/v1/recall?context=database+migration+postgres&namespace=infra&limit=5"

Query parameters:

Parameter Type Default Description
context string (required) Search context / query text
namespace string (none) Filter by namespace
limit integer 10 Max results (capped at 50)
tags string (none) Comma-separated tag filter
since string (none) Only memories updated after this RFC3339 timestamp
until string (none) Only memories updated before this RFC3339 timestamp

Response (200 OK):

{
  "memories": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "tier": "long",
      "namespace": "infra",
      "title": "Project uses PostgreSQL 16",
      "content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
      "tags": ["postgres", "database"],
      "priority": 9,
      "confidence": 1.0,
      "source": "user",
      "access_count": 4,
      "created_at": "2026-04-03T15:00:00+00:00",
      "updated_at": "2026-04-03T15:00:00+00:00",
      "last_accessed_at": "2026-04-12T10:00:00+00:00",
      "score": 0.763
    }
  ],
  "count": 1
}

Each memory in the response includes a score field (float, rounded to 3 decimal places) representing the composite relevance score. Memories are returned sorted by score descending.

Recall is also available via POST for larger query bodies:

curl -X POST http://127.0.0.1:9077/api/v1/recall \
  -H "Content-Type: application/json" \
  -d '{
    "context": "database migration postgres",
    "namespace": "infra",
    "limit": 5,
    "tags": "postgres",
    "since": "2026-01-01T00:00:00Z"
  }'

PUT /memories/{id} (Update)

Partial update -- only provided fields are modified. All fields are optional.

curl -X PUT http://127.0.0.1:9077/api/v1/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
  -H "Content-Type: application/json" \
  -d '{
    "content": "PostgreSQL 16.2 with pgvector 0.7 for embeddings. Upgraded 2026-04-10.",
    "priority": 10,
    "tags": ["postgres", "database", "pgvector"]
  }'

Updatable fields:

Field Type Description
title string New title
content string New content
tier string New tier ("short", "mid", "long")
namespace string New namespace
tags array Replace tags entirely
priority integer New priority (1-10)
confidence float New confidence (0.0-1.0)
expires_at string New expiry (RFC3339)

Response (200 OK): Returns the full updated memory object:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "tier": "long",
  "namespace": "infra",
  "title": "Project uses PostgreSQL 16",
  "content": "PostgreSQL 16.2 with pgvector 0.7 for embeddings. Upgraded 2026-04-10.",
  "tags": ["postgres", "database", "pgvector"],
  "priority": 10,
  "confidence": 1.0,
  "source": "user",
  "access_count": 4,
  "created_at": "2026-04-03T15:00:00+00:00",
  "updated_at": "2026-04-12T10:05:00+00:00"
}

Response (404 Not Found): {"error": "not found"}

Response (409 Conflict): {"error": "title already exists in namespace ..."} (if updating the title to one that already exists in the same namespace)

GET /archive (List Archived)

List memories that were archived by garbage collection.

curl "http://127.0.0.1:9077/api/v1/archive?namespace=infra&limit=20&offset=0"

Query parameters:

Parameter Type Default Description
namespace string (none) Filter by namespace
limit integer 50 Max results (capped at 1000)
offset integer 0 Pagination offset

Response (200 OK):

{
  "archived": [
    {
      "id": "expired-memory-id",
      "tier": "short",
      "namespace": "infra",
      "title": "Temp debug session",
      "content": "Debugging connection pooling issue...",
      "tags": ["debug"],
      "priority": 3,
      "confidence": 1.0,
      "source": "claude",
      "access_count": 1,
      "created_at": "2026-04-01T10:00:00+00:00",
      "updated_at": "2026-04-01T10:00:00+00:00",
      "expires_at": "2026-04-01T16:00:00+00:00",
      "archived_at": "2026-04-02T00:30:00+00:00",
      "archive_reason": "gc"
    }
  ],
  "count": 1
}

POST /archive/{id}/restore (Restore)

Restore an archived memory back to the active memories table. The restored memory has its expires_at cleared (becomes permanent).

curl -X POST http://127.0.0.1:9077/api/v1/archive/expired-memory-id/restore

Response (200 OK):

{
  "restored": true,
  "id": "expired-memory-id"
}

Response (404 Not Found): {"error": "not found in archive"}

Monitoring

Health Endpoint (Deep Check)

curl http://127.0.0.1:9077/api/v1/health

The health check performs a deep verification:

  1. Database is readable (runs SELECT COUNT(*) FROM memories)
  2. FTS5 index integrity check (INSERT INTO memories_fts(memories_fts) VALUES('integrity-check'))

Returns 200 OK with {"status": "ok", "service": "ai-memory"} if healthy. Returns 503 Service Unavailable with {"status": "error", "service": "ai-memory"} if the database or FTS index is unhealthy.

Stats Endpoint

curl http://127.0.0.1:9077/api/v1/stats

Returns:

  • Total memory count
  • Breakdown by tier
  • Breakdown by namespace
  • Memories expiring within 1 hour
  • Total link count
  • Database file size in bytes

MCP Server Monitoring

The MCP server logs to stderr. Monitor via:

# If running via an AI client, check your client's MCP logs
# If running manually:
ai-memory mcp 2>mcp-server.log

Key log messages:

  • ai-memory MCP server started (stdio) -- server is ready
  • ai-memory MCP server stopped -- stdin closed (AI client session ended), server exiting

Logs

The HTTP daemon logs via tracing with configurable levels:

# Info level (default recommended)
RUST_LOG=ai_memory=info,tower_http=info ai-memory serve

# Debug level (verbose, includes all HTTP requests)
RUST_LOG=ai_memory=debug,tower_http=debug ai-memory serve

# Trace level (extremely verbose)
RUST_LOG=ai_memory=trace ai-memory serve

With systemd, logs go to the journal:

sudo journalctl -u ai-memory -f
sudo journalctl -u ai-memory --since "1 hour ago"

Monitoring Script Example

#!/bin/bash
HEALTH=$(curl -sf http://127.0.0.1:9077/api/v1/health | jq -r '.status')
if [ "$HEALTH" != "ok" ]; then
    echo "ai-memory health check failed"
    systemctl restart ai-memory
fi

CI/CD Pipeline

The project uses GitHub Actions for continuous integration and release automation.

CI (Every Push and PR)

Runs on ubuntu-latest and macos-latest:

  1. Formatting -- cargo fmt --check
  2. Linting -- cargo clippy -- -D warnings
  3. Tests -- cargo test (191 tests: 140 unit + 51 integration, 15/15 modules)
  4. Build -- cargo build --release

Uses Swatinem/rust-cache@v2 for build caching.

Release (On Tag Push)

Triggered by tags matching v* (e.g., v0.1.0):

  1. Builds release binaries for:
    • x86_64-unknown-linux-gnu (Ubuntu)
    • aarch64-apple-darwin (macOS ARM)
  2. Packages each as ai-memory-<target>.tar.gz
  3. Creates a GitHub Release with the artifacts

Running CI Locally

# Replicate the CI checks
cargo fmt --check
cargo clippy -- -D warnings
cargo test
cargo build --release

Multi-Node Sync

For multi-machine deployments (e.g., laptop + server, or multiple workstations), use the sync command to keep databases in sync.

Manual Sync

# Pull remote changes to local
ai-memory sync /mnt/shared/ai-memory.db --direction pull

# Push local changes to remote
ai-memory sync /mnt/shared/ai-memory.db --direction push

# Bidirectional merge (recommended)
ai-memory sync /mnt/shared/ai-memory.db --direction merge

Automated Sync via Cron

# Sync every 15 minutes (bidirectional merge)
*/15 * * * * /usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db sync /mnt/shared/remote-memory.db --direction merge --json >> /var/log/ai-memory-sync.log 2>&1

Sync uses the same dedup-safe upsert as regular stores:

  • Title+namespace conflicts are resolved by keeping the higher priority
  • Tier never downgrades
  • Links are synced alongside memories
  • Safe to run concurrently from multiple machines (SQLite WAL mode handles locking)

Sync via sshfs or rsync

If the remote database is on another machine, mount it or copy it first:

# Option 1: sshfs mount
mkdir -p /mnt/remote-memory
sshfs user@server:/var/lib/ai-memory /mnt/remote-memory
ai-memory sync /mnt/remote-memory/ai-memory.db --direction merge

# Option 2: rsync + sync + rsync
rsync -a server:/var/lib/ai-memory/ai-memory.db /tmp/remote.db
ai-memory sync /tmp/remote.db --direction merge
rsync -a /tmp/remote.db server:/var/lib/ai-memory/ai-memory.db

Auto-Consolidation (Maintenance)

Auto-consolidation groups memories by namespace and primary tag, then merges groups with enough members into a single long-term summary. This reduces memory count and improves recall relevance.

Manual Run

# Preview what would be consolidated
ai-memory auto-consolidate --dry-run

# Consolidate all namespaces (groups of 3+)
ai-memory auto-consolidate

# Only short-term memories, minimum 5 per group
ai-memory auto-consolidate --short-only --min-count 5

Cron Schedule

# Run auto-consolidation daily at 3am, short-term memories only
0 3 * * * /usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db auto-consolidate --short-only --json >> /var/log/ai-memory-consolidate.log 2>&1

Man Page

Install the man page for system-wide documentation:

ai-memory man | sudo tee /usr/local/share/man/man1/ai-memory.1 > /dev/null
sudo mandb
man ai-memory

Scaling Considerations

ai-memory is designed for single-machine use. It is not a distributed system.

  • Concurrency: The daemon uses Arc<Mutex<Connection>> -- one write at a time, but this is fine for a single-user tool. SQLite WAL mode allows concurrent reads.
  • MCP concurrency: The MCP server is single-threaded (synchronous stdio loop), one request at a time. This is by design -- MCP clients typically send one request at a time.
  • Database size: SQLite handles databases up to 281 TB. Practically, performance stays excellent up to millions of rows.
  • Memory usage: Minimal. The daemon holds only the connection and a path in memory. All data is on disk.
  • Multiple instances: You can run multiple daemons on different ports with different databases. Do not point two daemons at the same database file. The MCP server and CLI can share a database (both use WAL mode).

Troubleshooting

Daemon won't start

Port already in use:

ss -tlnp | grep 9077
# Kill the existing process or use a different port
ai-memory serve --port 9078

Database locked:

# Remove stale WAL files (only if daemon is not running)
rm -f ai-memory.db-wal ai-memory.db-shm

Permission denied:

# Check file permissions
ls -la /path/to/ai-memory.db
# Ensure the user running the daemon has read/write access

MCP server not connecting

Binary not found: Check that the path in your MCP configuration (e.g., ~/.claude.json for Claude Code user scope, or .mcp.json for project scope) is correct and the binary is executable.

Database path issues: The MCP server opens the database at the path specified by --db. Ensure the directory exists and is writable.

Protocol errors: Check stderr output. The MCP server logs parse errors and protocol issues to stderr.

Slow queries

If recall or search is slow:

# Rebuild the FTS index
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"

# Compact the database
sqlite3 /path/to/ai-memory.db "VACUUM"

FTS index corruption

Symptoms: search returns no results or errors.

# Check integrity
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('integrity-check')"

# Rebuild if corrupt
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"

Database is growing too large

# Check what's taking space
ai-memory stats

# Delete expired memories
ai-memory gc

# Delete all short-term memories in a namespace
ai-memory forget --tier short --namespace my-app

# Compact after deletion
sqlite3 /path/to/ai-memory.db "VACUUM"