ai-memory is an AI-agnostic memory management system. It works with any MCP-compatible AI client, including Claude AI, OpenAI ChatGPT, xAI Grok, Meta Llama, and others. The HTTP API and CLI are platform-independent.
Key features for admins:
- Zero token cost until recall (replaces built-in auto-memory)
- TOON compact default response format (79% smaller than JSON)
- MCP prompts for proactive AI behavior (recall-first, memory-workflow)
- 4 feature tiers (keyword → autonomous with local LLMs via Ollama)
- 191 tests with 95%+ coverage across 15/15 modules
The simplest deployment is as an MCP tool server. No daemon process to manage -- your AI client spawns the process on demand. MCP (Model Context Protocol) is an open standard supported by multiple AI platforms.
Below is an example for Claude Code (user scope: merge mcpServers into ~/.claude.json; or project scope: .mcp.json in project root). Other MCP-compatible clients have their own configuration locations — consult your platform's documentation.
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
}
}
}

Claude Code note: MCP server configuration does not go in settings.json or settings.local.json -- those files do not support mcpServers.
The MCP server:
- Starts when your AI client opens a session
- Communicates over stdio (JSON-RPC) -- the standard MCP transport
- Stops when the session ends
- Uses the same SQLite database as the CLI and HTTP daemon
- Sends no response to JSON-RPC notifications, as the JSON-RPC 2.0 specification requires
- Works with any MCP-compatible client, not just Claude Code
Run the HTTP daemon directly in the foreground:
ai-memory --db /path/to/ai-memory.db serve

The daemon listens on 127.0.0.1:9077 by default and exposes 24 HTTP endpoints.
sudo tee /etc/systemd/system/ai-memory.service > /dev/null << 'EOF'
[Unit]
Description=AI Memory Daemon
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db serve
Restart=on-failure
RestartSec=5
Environment=RUST_LOG=ai_memory=info,tower_http=info
# Graceful shutdown: checkpoints WAL before exit
KillSignal=SIGINT
TimeoutStopSec=10
[Install]
WantedBy=multi-user.target
EOF
sudo mkdir -p /var/lib/ai-memory
sudo systemctl daemon-reload
sudo systemctl enable --now ai-memory

Production Hardening: Add security directives to the [Service] section to restrict the daemon's privileges. The User= directive assumes a dedicated ai-memory system account that owns /var/lib/ai-memory:
[Service]
User=ai-memory
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes
ReadWritePaths=/var/lib/ai-memory

Check status:
sudo systemctl status ai-memory
sudo journalctl -u ai-memory -f

Example Dockerfile:
FROM rust:1.75-slim AS builder
WORKDIR /src
COPY . .
RUN cargo build --release
FROM debian:bookworm-slim
COPY --from=builder /src/target/release/ai-memory /usr/local/bin/
VOLUME /data
EXPOSE 9077
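# Bind to all interfaces inside the container; the default 127.0.0.1 bind is not reachable through a published port
CMD ["ai-memory", "--db", "/data/ai-memory.db", "serve", "--host", "0.0.0.0"]

Build and run: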
docker build -t ai-memory .
docker run -d -p 127.0.0.1:9077:9077 -v ai-memory-data:/data ai-memory

| Flag | Default | Description |
|---|---|---|
| --db <path> | ai-memory.db | Path to SQLite database |
| --host <addr> | 127.0.0.1 | Bind address (serve only) |
| --port <port> | 9077 | Bind port (serve only) |
| --json | false | JSON output for CLI commands |
| --tier <tier> | semantic | Feature tier: keyword, semantic, smart, autonomous (mcp/serve only) |

The --tier flag controls which features are enabled. Each tier builds on the previous one:
| Tier | Tools | Embedding Model | LLM Required | Approx. Memory |
|---|---|---|---|---|
| keyword | 21 | No | No | Minimal |
| semantic (default) | 21 | Yes (HuggingFace) | No | ~256 MB |
| smart | 21 | Yes | Yes (Ollama) | ~1 GB |
| autonomous | 21 | Yes | Yes (Ollama) | ~4 GB |

Set the tier when starting the MCP server or HTTP daemon:
ai-memory mcp --tier semantic # default
ai-memory mcp --tier smart # enables LLM-powered tools
ai-memory serve --tier autonomous # full feature set

The smart and autonomous tiers require a running Ollama instance for LLM inference (Gemma 4 models).
macOS:

brew install ollama
# Or download from https://ollama.com/download/mac
ollama serve &
ollama pull gemma4:e2b # Smart tier (~1GB)
ollama pull gemma4:e4b # Autonomous tier (~2.3GB)

Linux:

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama
sudo systemctl start ollama
ollama pull gemma4:e2b # Smart tier (~1GB)
ollama pull gemma4:e4b # Autonomous tier (~2.3GB)

Windows:

# Download from https://ollama.com/download/windows, or:
winget install Ollama.Ollama
ollama pull gemma4:e2b # Smart tier (~1GB)
ollama pull gemma4:e4b # Autonomous tier (~2.3GB)

Verify the installation:

curl http://localhost:11434/api/tags
ollama run gemma4:e2b "Hello, world"

ai-memory connects to Ollama at http://localhost:11434 by default. Set OLLAMA_HOST to override. If Ollama is not running, ai-memory gracefully falls back to the semantic tier.
At the semantic tier and above, ai-memory downloads a sentence-transformer model from HuggingFace on first startup. The model is cached in the HuggingFace cache directory (~/.cache/huggingface/ by default).
- First startup may take 30-60 seconds while the model downloads (~100 MB)
- Subsequent startups load from cache (2-5 seconds)
- Set HF_HOME to override the cache directory
- No HuggingFace account or API key is required

| Tier | RAM Requirement | Notes |
|---|---|---|
| keyword | Minimal (~10 MB) | SQLite + FTS5 only |
| semantic | ~256 MB | Embedding model loaded in memory |
| smart | ~1 GB | Embedding model + Ollama with smaller LLM |
| autonomous | ~4 GB | Embedding model + Ollama with larger LLM |

| Variable | Default | Description |
|---|---|---|
| AI_MEMORY_DB | ai-memory.db | Database path (overridden by --db) |
| RUST_LOG | (none) | Logging filter (e.g., ai_memory=info,tower_http=debug) |
| AI_MEMORY_NO_CONFIG | (none) | Set to 1 to skip config file loading (useful for testing) |

ai-memory supports an optional configuration file at ~/.config/ai-memory/config.toml. This file is read once at process startup and supports the following keys:
Note: Configuration is loaded once at process startup. Changes to config.toml require restarting the ai-memory process (MCP server, HTTP daemon, or CLI) to take effect.

| Key | Type | Default | Valid Values | Description |
|---|---|---|---|---|
| tier | String | "semantic" | "keyword", "semantic", "smart", "autonomous" | Feature tier controlling which AI capabilities are active |
| db | String | "ai-memory.db" | Any valid file path | Path to the SQLite database file |
| ollama_url | String | "http://localhost:11434" | Any URL | Ollama base URL for LLM generation (smart/autonomous tiers) |
| embed_url | String | Value of ollama_url | Any URL | Separate URL for the embedding service; falls back to ollama_url if unset |
| embedding_model | String | Tier-dependent | "mini_lm_l6_v2" (384-dim, ~90 MB), "nomic_embed_v15" (768-dim, ~270 MB) | HuggingFace sentence-transformer model for semantic search |
| llm_model | String | Tier-dependent | "gemma4:e2b" (~1 GB Q4), "gemma4:e4b" (~2.3 GB Q4) | Ollama LLM model tag for smart/autonomous features |
| cross_encoder | Bool | false (true for autonomous tier) | true, false | Enable neural cross-encoder reranking (not a string -- must be bare true/false without quotes) |
| default_namespace | String | "global" | Any valid namespace (max 128 bytes, no slashes/spaces/nulls) | Default namespace applied to new memories |
| max_memory_mb | Integer | Tier-dependent | Any positive integer | Maximum memory budget in MB; used for automatic tier selection via from_memory_budget() |
| archive_on_gc | Bool | true | true, false | Archive expired memories instead of permanently deleting them during GC |
| [ttl] | Section | -- | -- | Per-tier TTL overrides (all sub-fields are integers in seconds) |
| ttl.short_ttl_secs | Integer | 21600 (6 hours) | 0 = never expires, or positive integer | TTL for short-tier memories in seconds |
| ttl.mid_ttl_secs | Integer | 604800 (7 days) | 0 = never expires, or positive integer | TTL for mid-tier memories in seconds |
| ttl.long_ttl_secs | Integer | 0 (never expires) | 0 = never expires, or positive integer | TTL for long-tier memories in seconds |
| ttl.short_extend_secs | Integer | 3600 (1 hour) | Non-negative integer | TTL extension on access for short-tier memories |
| ttl.mid_extend_secs | Integer | 86400 (1 day) | Non-negative integer | TTL extension on access for mid-tier memories |

Note: Set any TTL to 0 to disable expiry for that tier. Values are clamped to a 10-year maximum (315,360,000 seconds). Negative extension values are clamped to 0.

Note: Restored memories have their expires_at cleared (set to NULL) and become permanent.

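The clamping rules in the notes above can be sketched in Python. This is only an illustration: the function names are hypothetical, and rejecting a negative TTL is an assumption, since the source describes clamping to 0 only for extension values.

```python
from typing import Optional

MAX_TTL_SECS = 315_360_000  # 10-year ceiling quoted in the note above


def effective_ttl(configured_secs: int) -> Optional[int]:
    """Return the TTL in seconds, or None when expiry is disabled (0)."""
    if configured_secs < 0:
        # Assumption: the source does not say how a negative TTL is treated.
        raise ValueError("TTL must be 0 or a positive integer")
    if configured_secs == 0:
        return None  # 0 means "never expires"
    return min(configured_secs, MAX_TTL_SECS)


def effective_extend(configured_secs: int) -> int:
    """Clamp an on-access extension: negatives become 0, large values hit the ceiling."""
    return min(max(configured_secs, 0), MAX_TTL_SECS)


print(effective_ttl(21600), effective_ttl(0), effective_extend(-5))
```

Under these assumptions, a `short_ttl_secs` of 0 yields no expiry at all, while an oversized value silently becomes ten years.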
Below is a complete example showing every supported field with explanatory comments. Copy this to ~/.config/ai-memory/config.toml and uncomment the lines you want to customize.
# =============================================================================
# ai-memory configuration
# Location: ~/.config/ai-memory/config.toml
# Docs: https://github.com/alphaonedev/ai-memory-mcp
#
# All fields are optional. CLI flags and MCP args override these values.
# Changes require restarting the ai-memory process to take effect.
# =============================================================================
# ---------------------------------------------------------------------------
# Feature tier (controls which AI capabilities are active)
# ---------------------------------------------------------------------------
# Valid values: "keyword", "semantic", "smart", "autonomous"
# keyword — FTS5 keyword search only, no models, minimal RAM
# semantic — adds embedding-based hybrid recall (~256 MB)
# smart — adds query expansion, auto-tagging, contradiction detection (~1 GB, requires Ollama)
# autonomous — full feature set with cross-encoder reranking (~4 GB, requires Ollama)
# Default: "semantic"
# tier = "semantic"
# ---------------------------------------------------------------------------
# Database path
# ---------------------------------------------------------------------------
# Path to the SQLite database file.
# Default: "ai-memory.db" (relative to working directory)
# db = "~/.claude/ai-memory.db"
# ---------------------------------------------------------------------------
# Ollama URLs (smart and autonomous tiers only)
# ---------------------------------------------------------------------------
# Base URL for Ollama LLM generation.
# Default: "http://localhost:11434"
# ollama_url = "http://localhost:11434"
# Separate URL for embedding requests. Falls back to ollama_url if unset.
# Default: same as ollama_url
# embed_url = "http://localhost:11434"
# ---------------------------------------------------------------------------
# Model selection
# ---------------------------------------------------------------------------
# Embedding model for semantic search (semantic tier and above).
# Valid values:
# "mini_lm_l6_v2" — sentence-transformers/all-MiniLM-L6-v2, 384-dim, ~90 MB
# "nomic_embed_v15" — nomic-ai/nomic-embed-text-v1.5, 768-dim, ~270 MB
# Default: tier-dependent (mini_lm_l6_v2 for semantic, nomic_embed_v15 for smart/autonomous)
# embedding_model = "mini_lm_l6_v2"
# LLM model served via Ollama (smart and autonomous tiers).
# Valid values:
# "gemma4:e2b" — Google Gemma 4 Effective 2B, ~1 GB Q4 (smart tier default)
# "gemma4:e4b" — Google Gemma 4 Effective 4B, ~2.3 GB Q4 (autonomous tier default)
# Default: tier-dependent (gemma4:e2b for smart, gemma4:e4b for autonomous)
# llm_model = "gemma4:e2b"
# ---------------------------------------------------------------------------
# Cross-encoder reranking
# ---------------------------------------------------------------------------
# Enable neural cross-encoder reranking for improved recall precision.
# NOTE: This is a boolean, NOT a string. Use bare true/false without quotes.
# Default: false (true for autonomous tier)
# cross_encoder = true
# ---------------------------------------------------------------------------
# Namespace and memory limits
# ---------------------------------------------------------------------------
# Default namespace applied to new memories when none is specified.
# Default: "global"
# default_namespace = "global"
# Maximum memory budget in MB. Used for automatic tier selection when tier
# is not explicitly set — the highest tier that fits within this budget is chosen.
# Default: tier-dependent (0/256/1024/4096 for keyword/semantic/smart/autonomous)
# max_memory_mb = 4096
# ---------------------------------------------------------------------------
# Garbage collection
# ---------------------------------------------------------------------------
# Archive expired memories before GC permanently deletes them.
# When true, expired memories are moved to the archive table and can be
# restored later. When false, GC permanently deletes expired memories.
# Default: true
# archive_on_gc = true
# ---------------------------------------------------------------------------
# Per-tier TTL overrides
# ---------------------------------------------------------------------------
# Customize time-to-live and access-extension durations per memory tier.
# Set any TTL to 0 to disable expiry for that tier.
# Values are clamped to a 10-year maximum (315,360,000 seconds).
# Negative extension values are clamped to 0.
# [ttl]
# short_ttl_secs = 21600 # 6 hours (default)
# mid_ttl_secs = 604800 # 7 days (default)
# long_ttl_secs = 0 # 0 = never expires (default)
# short_extend_secs = 3600 # +1 hour on access (default)
# mid_extend_secs = 86400 # +1 day on access (default)

Precedence: CLI flags and MCP args take precedence over config.toml values. When the MCP server is launched by an AI client, the --tier flag in the MCP args is used, not the config.toml tier setting.
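A minimal Python sketch of how the tier could be resolved under the precedence rule above. The function names are hypothetical, and the budget thresholds are the approximate figures from the max_memory_mb comment in the example file; the real from_memory_budget() logic may differ.

```python
from typing import Optional

# Approximate per-tier budgets in MB, as listed in the example config comments.
TIER_BUDGET_MB = [
    ("keyword", 0),
    ("semantic", 256),
    ("smart", 1024),
    ("autonomous", 4096),
]


def tier_from_budget(max_memory_mb: int) -> str:
    """Pick the highest tier whose budget fits within max_memory_mb."""
    chosen = "keyword"
    for name, needed_mb in TIER_BUDGET_MB:
        if max_memory_mb >= needed_mb:
            chosen = name
    return chosen


def resolve_tier(
    cli_tier: Optional[str] = None,
    config_tier: Optional[str] = None,
    max_memory_mb: Optional[int] = None,
) -> str:
    """CLI/MCP argument first, then config.toml tier, then budget, then the default."""
    if cli_tier is not None:
        return cli_tier
    if config_tier is not None:
        return config_tier
    if max_memory_mb is not None:
        return tier_from_budget(max_memory_mb)
    return "semantic"


print(resolve_tier(cli_tier="smart", config_tier="keyword"))
print(resolve_tier(max_memory_mb=512))
```

The budget is consulted only when no tier is set explicitly, which matches the comment that max_memory_mb applies "when tier is not explicitly set".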
These are set in the source code and require recompilation to change:
| Constant | Value | Location |
|---|---|---|
| DEFAULT_PORT | 9077 | main.rs |
| GC_INTERVAL_SECS | 1800 (30 min) | main.rs |
| MAX_CONTENT_SIZE | 65536 (64 KB) | models.rs |
| PROMOTION_THRESHOLD | 5 accesses | models.rs |
| SHORT_TTL_EXTEND_SECS | 3600 (1 hour) | models.rs |
| MID_TTL_EXTEND_SECS | 86400 (1 day) | models.rs |

The HTTP daemon handles SIGINT (Ctrl+C) gracefully:
- Stops accepting new connections
- Waits for in-flight requests to complete
- Checkpoints the WAL (PRAGMA wal_checkpoint(TRUNCATE))
- Exits cleanly

For systemd, use KillSignal=SIGINT and TimeoutStopSec=10 to ensure the checkpoint completes.
Note: The HTTP daemon handles SIGINT (Ctrl+C) gracefully with WAL checkpoint. Systemd sends SIGTERM by default -- the service file sets KillSignal=SIGINT to ensure clean shutdown.

The MCP server exits cleanly when stdin closes (AI client session ends).
The database uses these pragmas (set automatically on open):
- WAL mode -- write-ahead logging for concurrent reads
- busy_timeout = 5000 -- 5 second wait on lock contention
- synchronous = NORMAL -- balanced durability/performance
- foreign_keys = ON -- enforced referential integrity (links cascade on delete)
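To inspect the same settings from a script, the pragmas listed above can be applied and read back with Python's standard sqlite3 module. This is a sketch against a scratch database in a temporary directory, not the ai-memory code path; the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_with_pragmas(path: str) -> sqlite3.Connection:
    """Open a database and apply the four pragmas listed above."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA busy_timeout = 5000")
    conn.execute("PRAGMA synchronous = NORMAL")
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def read_pragmas(conn: sqlite3.Connection) -> dict:
    """Read the current value of each pragma back from the connection."""
    names = ("journal_mode", "busy_timeout", "synchronous", "foreign_keys")
    return {name: conn.execute(f"PRAGMA {name}").fetchone()[0] for name in names}


# WAL needs a real file, so use a scratch database rather than :memory:.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_with_pragmas(os.path.join(tmp, "scratch.db"))
    settings = read_pragmas(conn)
    conn.close()

print(settings)
```

SQLite reports synchronous as a number, where 1 corresponds to NORMAL. Only journal_mode is stored in the database file; the other three are per-connection, so a tool that opens the database itself has to set them again.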
Live backup (while daemon is running):
sqlite3 /path/to/ai-memory.db ".backup /path/to/backup.db"

JSON export (includes links):

ai-memory --db /path/to/ai-memory.db export > backup.json

File copy (daemon must be stopped or use WAL checkpoint first):

systemctl stop ai-memory
cp /path/to/ai-memory.db /path/to/backup.db
cp /path/to/ai-memory.db-wal /path/to/backup.db-wal 2>/dev/null
systemctl start ai-memory

From JSON (preserves links):

ai-memory --db /path/to/new.db import < backup.json

From SQLite backup:

systemctl stop ai-memory
cp /path/to/backup.db /var/lib/ai-memory/ai-memory.db
systemctl start ai-memory

The schema is auto-migrated on startup. The schema_version table tracks the current version (currently 4). Migrations are forward-only and non-destructive.
- v1 -> v2: Added confidence (REAL) and source (TEXT) columns
- v2 -> v3: Added embedding (BLOB) column for storing dense vector embeddings
- v3 -> v4: Added archived_memories table for GC archival

Migration error handling: only expected errors (e.g., "duplicate column" when re-running a migration) are silently ignored. Real failures are propagated and will prevent startup, ensuring data integrity.
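The "ignore only the expected error" policy described above can be illustrated with Python's sqlite3 module. This is a sketch of the idea rather than the project's Rust migration code: the helper name and the two-column starting schema are hypothetical.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column_def: str) -> None:
    """Run ALTER TABLE ... ADD COLUMN, tolerating only a duplicate-column error."""
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_def}")
    except sqlite3.OperationalError as exc:
        if "duplicate column" not in str(exc).lower():
            raise  # a real failure: propagate so startup stops


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, title TEXT)")

add_column_if_missing(conn, "memories", "confidence REAL")
add_column_if_missing(conn, "memories", "confidence REAL")  # re-run is a no-op

columns = [row[1] for row in conn.execute("PRAGMA table_info(memories)")]
print(columns)
```

Calling the helper against a table that does not exist raises `sqlite3.OperationalError`, which is the "real failure" branch.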
- Stop the service: sudo systemctl stop ai-memory
- Backup the database: sqlite3 /var/lib/ai-memory/ai-memory.db ".backup /var/lib/ai-memory/ai-memory-backup.db"
- Install the new binary (e.g., cargo install ai-memory or replace the binary at /usr/local/bin/ai-memory)
- Start the service: sudo systemctl start ai-memory

Schema migrations run automatically on startup. No manual migration steps are required.
Manually trigger garbage collection:
# Via CLI
ai-memory gc
# Via API
curl -X POST http://127.0.0.1:9077/api/v1/gc

By default, GC archives expired memories before deleting them. To disable archiving and permanently delete instead, set archive_on_gc = false in config.toml. Archived memories are moved to a separate archive table and can be listed, restored, or purged:
# List archived memories
curl http://127.0.0.1:9077/api/v1/archive
# Restore an archived memory
curl -X POST http://127.0.0.1:9077/api/v1/archive/<id>/restore
# Purge all archived memories permanently (optional: ?older_than_days=N)
curl -X DELETE http://127.0.0.1:9077/api/v1/archive
# View archive statistics
curl http://127.0.0.1:9077/api/v1/archive/stats

Disk space guidance: Approximate database growth: ~2KB per memory (keyword tier), ~3.5KB per memory (semantic tier, 384-dim embeddings), ~5KB per memory (768-dim embeddings). WAL file may grow up to ~50MB during heavy write bursts; checkpoint occurs on graceful shutdown. Archive table grows unboundedly -- use ai-memory archive purge periodically.
Compact the database (reduces file size after many deletions):
sqlite3 /path/to/ai-memory.db "VACUUM"

Rebuild the FTS index (if it becomes corrupt):

sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"

Critical operations use BEGIN IMMEDIATE / COMMIT transactions to prevent data corruption under concurrent access:
- touch() -- the read-modify-write cycle for access count, TTL extension, auto-promotion, and priority reinforcement is fully atomic
- consolidate() -- the multi-step merge (create new memory, delete originals, aggregate tags) is fully atomic

This prevents race conditions where two concurrent recalls could cause incorrect access counts or missed auto-promotions.
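A simplified Python sketch of the touch() pattern: the read and the write happen inside one BEGIN IMMEDIATE transaction, so a second writer waits instead of reading a stale count. The three-column table is a hypothetical stand-in for the real schema, TTL extension and priority reinforcement are left out, and promoting at exactly PROMOTION_THRESHOLD accesses is an assumption.

```python
import sqlite3

PROMOTION_THRESHOLD = 5  # accesses, from the compile-time constants table


def touch(conn: sqlite3.Connection, memory_id: str):
    """Atomically bump access_count and promote mid -> long at the threshold."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT access_count, tier FROM memories WHERE id = ?", (memory_id,)
        ).fetchone()
        if row is None:
            result = None
        else:
            count, tier = row[0] + 1, row[1]
            if tier == "mid" and count >= PROMOTION_THRESHOLD:
                tier = "long"
            conn.execute(
                "UPDATE memories SET access_count = ?, tier = ? WHERE id = ?",
                (count, tier, memory_id),
            )
            result = (count, tier)
        conn.execute("COMMIT")
        return result
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None leaves transaction control to the explicit statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, tier TEXT, access_count INTEGER)")
conn.execute("INSERT INTO memories VALUES ('m1', 'mid', 0)")

results = [touch(conn, "m1") for _ in range(5)]
print(results[-1])
```

With a plain deferred BEGIN, two connections could both read the same count before either writes; BEGIN IMMEDIATE closes that window by taking the write lock up front.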
All full-text search queries are sanitized before being passed to SQLite FTS5:
- Special characters (*, ", (, ), :, +, -, ^, etc.) are stripped
- Remaining tokens are individually double-quoted (e.g., auth flow becomes "auth" "flow")
- This prevents FTS query syntax injection that could cause errors or unexpected results

The sanitization is applied in recall(), search(), and forget() operations.
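The two sanitization steps could look roughly like this in Python. The character set below contains only the characters named above (the source's "etc." implies the real list is longer), and replacing a stripped character with a space rather than deleting it is an assumption; the function name is hypothetical.

```python
# Only the characters explicitly named in the list above; the real set may be larger.
FTS_SPECIAL = set('*"():+-^')


def sanitize_fts_query(raw: str) -> str:
    """Strip FTS5 syntax characters, then double-quote each remaining token."""
    cleaned = "".join(" " if ch in FTS_SPECIAL else ch for ch in raw)
    return " ".join(f'"{token}"' for token in cleaned.split())


print(sanitize_fts_query("auth flow"))
print(sanitize_fts_query("auth* (flow)"))
```

Quoting every token means words such as OR or NEAR reach FTS5 as literal search terms instead of operators.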
The HTTP API never leaks internal database error details to clients. All rusqlite::Error and anyhow::Error responses are replaced with a generic "Internal server error" message. Detailed errors are logged server-side for debugging.
To prevent memory exhaustion and abuse:
- Bulk create (POST /memories/bulk): Limited to 1,000 items per request
- Import (POST /import): Limited to 1,000 memories per request

Requests exceeding these limits receive a 400 Bad Request response.
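A client with more than 1,000 items has to split them across several requests. A minimal sketch of that batching, using a hypothetical helper and leaving the HTTP call itself out:

```python
from typing import Iterator, List

MAX_ITEMS_PER_REQUEST = 1000  # limit stated above for bulk create and import


def chunked(items: List[dict], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


memories = [{"title": f"note {i}", "content": "..."} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(memories)]
print(batch_sizes)
```

Each batch would then be sent as its own request body.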
All ID path parameters (e.g., /memories/{id}, /links/{id}) are validated before database queries are executed. Invalid IDs (empty, too long, containing null bytes) are rejected with a 400 Bad Request response before any database access occurs.
All write paths go through the validation layer (validate.rs):
- Title: max 512 bytes, no null bytes
- Content: max 64KB, no null bytes
- Namespace: max 128 bytes, no slashes/spaces/nulls
- Source: whitelist (user, claude, hook, api, cli, import, consolidation, system)
- Tags: max 50 tags, each max 128 bytes
- Priority: 1-10
- Confidence: 0.0-1.0, finite
- Relations: whitelist (related_to, supersedes, contradicts, derived_from)
- IDs: max 128 bytes, no null bytes
- Timestamps: valid RFC3339
- TTL: positive, max 1 year
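A subset of these rules expressed as a Python sketch, which can be handy for pre-checking payloads on the client side. It is not a port of validate.rs: the function name is hypothetical, it treats a missing or empty title or content as an error, and it reports out-of-range priority and confidence values rather than clamping them as the create endpoint is described as doing.

```python
import math

ALLOWED_SOURCES = {"user", "claude", "hook", "api", "cli", "import", "consolidation", "system"}


def validate_memory(fields: dict) -> list:
    """Return the names of fields that break a rule; an empty list means valid."""
    errors = []
    title = fields.get("title", "")
    content = fields.get("content", "")
    namespace = fields.get("namespace", "global")
    source = fields.get("source", "api")
    tags = fields.get("tags", [])
    priority = fields.get("priority", 5)
    confidence = fields.get("confidence", 1.0)

    if not 0 < len(title.encode("utf-8")) <= 512 or "\0" in title:
        errors.append("title")
    if not 0 < len(content.encode("utf-8")) <= 65536 or "\0" in content:
        errors.append("content")
    if len(namespace.encode("utf-8")) > 128 or any(c in namespace for c in "/ \0"):
        errors.append("namespace")
    if source not in ALLOWED_SOURCES:
        errors.append("source")
    if len(tags) > 50 or any(len(t.encode("utf-8")) > 128 for t in tags):
        errors.append("tags")
    if not (isinstance(priority, int) and 1 <= priority <= 10):
        errors.append("priority")
    if not (math.isfinite(confidence) and 0.0 <= confidence <= 1.0):
        errors.append("confidence")
    return errors


print(validate_memory({"title": "Quick note", "content": "Something to remember."}))
print(validate_memory({"title": "t", "content": "c", "namespace": "a/b"}))
```

Lengths are measured in UTF-8 bytes, not characters, because the limits above are stated in bytes.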
By default, the HTTP daemon binds to 127.0.0.1 only. It is not accessible from the network. This is intentional -- ai-memory is a local-machine tool.
The MCP server communicates over stdio only -- no network exposure.
The HTTP server uses CorsLayer::permissive() -- any origin can make requests. For production, use a reverse proxy with restrictive CORS headers.
There is no authentication mechanism. This is by design -- the daemon is intended for localhost access only by your AI client (Claude AI, ChatGPT, Grok, Llama, or any other). If you expose it to a network, you are responsible for adding a reverse proxy with authentication.
ai-memory is a single-user tool. Namespaces do not provide access control. If multiple users share a database, any user can read/write any namespace.
ai-memory does not support TLS natively. For HTTPS, terminate TLS at a reverse proxy. Minimal nginx example:
server {
listen 443 ssl;
server_name memory.example.com;
ssl_certificate /etc/ssl/certs/memory.pem;
ssl_certificate_key /etc/ssl/private/memory.key;
location / {
proxy_pass http://127.0.0.1:9077;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}

The SQLite database is stored as a regular file. It is not encrypted. If you need encryption at rest, use filesystem-level encryption (LUKS, FileVault, BitLocker).
The MCP server correctly handles all JSON-RPC notifications (requests without an id field). Notifications are processed but no response is sent, per the JSON-RPC 2.0 specification. This prevents protocol errors when any MCP client sends notifications/initialized or other notification messages.
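The rule can be sketched as a stdio message handler in Python. This is an illustration of the JSON-RPC 2.0 behaviour, not the server's Rust implementation; the function name is hypothetical and the empty result object is a placeholder for real method dispatch.

```python
import json
from typing import Optional


def handle_line(line: str) -> Optional[str]:
    """Return the JSON response to write to stdout, or None for a notification."""
    message = json.loads(line)
    if "id" not in message:
        # Notification (e.g. notifications/initialized): process it if needed,
        # but never write a response.
        return None
    # Placeholder result; a real server would dispatch on message["method"].
    return json.dumps({"jsonrpc": "2.0", "id": message["id"], "result": {}})


print(handle_line('{"jsonrpc": "2.0", "method": "notifications/initialized"}'))
print(handle_line('{"jsonrpc": "2.0", "id": 1, "method": "ping"}'))
```

The first call returns None, so nothing is written to stdout for the notification.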
SQLite WAL mode creates two additional files alongside the database:
- ai-memory.db-wal -- write-ahead log
- ai-memory.db-shm -- shared memory file

Both are cleaned up on graceful shutdown (the daemon runs PRAGMA wal_checkpoint(TRUNCATE) on SIGINT). If the daemon crashes, these files persist but are automatically recovered on next open.
Maximum request body size: 50 MB.
The HTTP daemon exposes 24 endpoints under /api/v1:
| Method | Path | Description |
|---|---|---|
| GET | /health | Deep health check (DB + FTS integrity) |
| POST | /memories | Create a memory |
| POST | /memories/bulk | Bulk create (max 1,000) |
| GET | /memories/{id} | Get a memory by ID (includes links) |
| PUT | /memories/{id} | Update a memory |
| DELETE | /memories/{id} | Delete a memory |
| POST | /memories/{id}/promote | Promote a memory to long-term |
| GET | /memories | List memories with filters |
| GET | /search | AND search with 6-factor scoring |
| GET | /recall | OR recall with touch + auto-promote |
| POST | /recall | OR recall (POST body) |
| POST | /forget | Bulk delete by pattern/namespace/tier |
| POST | /consolidate | Consolidate 2-100 memories |
| POST | /links | Create a link between memories |
| GET | /links/{id} | Get links for a memory |
| GET | /namespaces | List namespaces with counts |
| GET | /stats | Aggregate statistics |
| POST | /gc | Trigger garbage collection |
| GET | /export | Export all memories and links |
| POST | /import | Import memories and links (max 1,000) |
| GET | /archive | List archived memories |
| POST | /archive/{id}/restore | Restore an archived memory |
| DELETE | /archive | Permanently delete archived memories (optional ?older_than_days=N) |
| GET | /archive/stats | Archive statistics |

Below are curl examples showing the exact JSON request bodies and response formats for the most important endpoints. The base URL is http://127.0.0.1:9077/api/v1.
Create a new memory. Only title and content are required; all other fields have defaults.
curl -X POST http://127.0.0.1:9077/api/v1/memories \
-H "Content-Type: application/json" \
-d '{
"title": "Project uses PostgreSQL 16",
"content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
"tier": "long",
"namespace": "infra",
"tags": ["postgres", "database"],
"priority": 9,
"confidence": 1.0,
"source": "user",
"ttl_secs": 604800
}'

Required fields:
| Field | Type | Description |
|---|---|---|
| title | string | Memory title (max 512 bytes) |
| content | string | Memory content (max 64 KB) |

Optional fields:
| Field | Type | Default | Description |
|---|---|---|---|
| tier | string | "mid" | "short", "mid", or "long" |
| namespace | string | "global" | Namespace for grouping (max 128 bytes, no slashes/spaces) |
| tags | array | [] | String tags (max 50 tags, each max 128 bytes) |
| priority | integer | 5 | 1-10 (clamped) |
| confidence | float | 1.0 | 0.0-1.0 (clamped) |
| source | string | "api" | One of: user, claude, hook, api, cli, import, consolidation, system |
| expires_at | string | (none) | Explicit expiry timestamp (RFC3339) |
| ttl_secs | integer | (none) | TTL in seconds (overrides tier default) |

Response (201 Created):
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16"
}

If potential contradictions are found (memories with similar titles in the same namespace), the response includes:
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"potential_contradictions": ["existing-id-1", "existing-id-2"]
}

Deduplication: if a memory with the same title+namespace already exists, it is upserted (tier never downgrades, priority keeps the maximum).
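The two merge rules can be written down as a small Python function. Only the tier and priority behaviour comes from the text above; that the existing row keeps its ID and that every other field takes the incoming value are assumptions, and the function name is hypothetical.

```python
TIER_RANK = {"short": 0, "mid": 1, "long": 2}


def merge_on_upsert(existing: dict, incoming: dict) -> dict:
    """Merge an incoming memory into an existing one with the same title+namespace."""
    merged = dict(existing)
    merged.update(incoming)        # assumption: other fields take the incoming value
    merged["id"] = existing["id"]  # assumption: the stored row keeps its ID
    # Tier never downgrades: keep whichever tier ranks higher.
    merged["tier"] = max(existing["tier"], incoming["tier"], key=TIER_RANK.__getitem__)
    # Priority keeps the maximum of the two.
    merged["priority"] = max(existing["priority"], incoming["priority"])
    return merged


stored = {"id": "a1", "title": "DB", "namespace": "infra", "tier": "long", "priority": 9, "content": "old"}
update = {"title": "DB", "namespace": "infra", "tier": "mid", "priority": 3, "content": "new"}
merged = merge_on_upsert(stored, update)
print(merged["tier"], merged["priority"], merged["content"])
```

Re-sending a memory with a lower tier or priority therefore cannot weaken an existing entry.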
Minimal example (defaults applied):
curl -X POST http://127.0.0.1:9077/api/v1/memories \
-H "Content-Type: application/json" \
-d '{"title": "Quick note", "content": "Something to remember."}'

Response: {"id": "...", "tier": "mid", "namespace": "global", "title": "Quick note"}

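The same request can be built from Python with only the standard library. The sketch below constructs the request without sending it, so it runs without a daemon; the helper name is hypothetical, and the base URL is the default address documented above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9077/api/v1"  # default bind address and port


def build_create_request(title: str, content: str, **optional) -> urllib.request.Request:
    """Build a POST /memories request; optional fields are passed through as-is."""
    body = {"title": title, "content": content, **optional}
    return urllib.request.Request(
        f"{BASE_URL}/memories",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request("Quick note", "Something to remember.", tags=["demo"])
print(req.get_method(), req.full_url)

# To send it against a running daemon:
#   with urllib.request.urlopen(req) as resp:
#       created = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before anything is written to the database.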
Retrieve a single memory by ID, including its links to other memories.
curl http://127.0.0.1:9077/api/v1/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (200 OK):
{
"memory": {
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
"tags": ["postgres", "database"],
"priority": 9,
"confidence": 1.0,
"source": "user",
"access_count": 3,
"created_at": "2026-04-03T15:00:00+00:00",
"updated_at": "2026-04-03T15:00:00+00:00",
"last_accessed_at": "2026-04-10T09:30:00+00:00"
},
"links": [
{
"source_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"target_id": "f7e8d9c0-b1a2-3456-7890-abcdef123456",
"relation": "related_to",
"created_at": "2026-04-05T12:00:00+00:00"
}
]
}

Response (404 Not Found): {"error": "not found"}

Note: last_accessed_at and expires_at are omitted from the JSON when null.
Fuzzy OR search with ranked results. Automatically bumps access count, extends TTL, and auto-promotes frequently accessed mid-tier memories to long-term.
curl "http://127.0.0.1:9077/api/v1/recall?context=database+migration+postgres&namespace=infra&limit=5"

Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| context | string | (required) | Search context / query text |
| namespace | string | (none) | Filter by namespace |
| limit | integer | 10 | Max results (capped at 50) |
| tags | string | (none) | Comma-separated tag filter |
| since | string | (none) | Only memories updated after this RFC3339 timestamp |
| until | string | (none) | Only memories updated before this RFC3339 timestamp |

Response (200 OK):
{
"memories": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"content": "The production database runs PostgreSQL 16 with pgvector for embeddings.",
"tags": ["postgres", "database"],
"priority": 9,
"confidence": 1.0,
"source": "user",
"access_count": 4,
"created_at": "2026-04-03T15:00:00+00:00",
"updated_at": "2026-04-03T15:00:00+00:00",
"last_accessed_at": "2026-04-12T10:00:00+00:00",
"score": 0.763
}
],
"count": 1
}

Each memory in the response includes a score field (float, rounded to 3 decimal places) representing the composite relevance score. Memories are returned sorted by score descending.
Recall is also available via POST for larger query bodies:
curl -X POST http://127.0.0.1:9077/api/v1/recall \
-H "Content-Type: application/json" \
-d '{
"context": "database migration postgres",
"namespace": "infra",
"limit": 5,
"tags": "postgres",
"since": "2026-01-01T00:00:00Z"
}'

Partial update -- only provided fields are modified. All fields are optional.
curl -X PUT http://127.0.0.1:9077/api/v1/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
-H "Content-Type: application/json" \
-d '{
"content": "PostgreSQL 16.2 with pgvector 0.7 for embeddings. Upgraded 2026-04-10.",
"priority": 10,
"tags": ["postgres", "database", "pgvector"]
}'

Updatable fields:
| Field | Type | Description |
|---|---|---|
| title | string | New title |
| content | string | New content |
| tier | string | New tier ("short", "mid", "long") |
| namespace | string | New namespace |
| tags | array | Replace tags entirely |
| priority | integer | New priority (1-10) |
| confidence | float | New confidence (0.0-1.0) |
| expires_at | string | New expiry (RFC3339) |

Response (200 OK): Returns the full updated memory object:
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"tier": "long",
"namespace": "infra",
"title": "Project uses PostgreSQL 16",
"content": "PostgreSQL 16.2 with pgvector 0.7 for embeddings. Upgraded 2026-04-10.",
"tags": ["postgres", "database", "pgvector"],
"priority": 10,
"confidence": 1.0,
"source": "user",
"access_count": 4,
"created_at": "2026-04-03T15:00:00+00:00",
"updated_at": "2026-04-12T10:05:00+00:00"
}

Response (404 Not Found): {"error": "not found"}

Response (409 Conflict): {"error": "title already exists in namespace ..."} (if updating the title to one that already exists in the same namespace)
List memories that were archived by garbage collection.
curl "http://127.0.0.1:9077/api/v1/archive?namespace=infra&limit=20&offset=0"

Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| namespace | string | (none) | Filter by namespace |
| limit | integer | 50 | Max results (capped at 1000) |
| offset | integer | 0 | Pagination offset |

Response (200 OK):
{
"archived": [
{
"id": "expired-memory-id",
"tier": "short",
"namespace": "infra",
"title": "Temp debug session",
"content": "Debugging connection pooling issue...",
"tags": ["debug"],
"priority": 3,
"confidence": 1.0,
"source": "claude",
"access_count": 1,
"created_at": "2026-04-01T10:00:00+00:00",
"updated_at": "2026-04-01T10:00:00+00:00",
"expires_at": "2026-04-01T16:00:00+00:00",
"archived_at": "2026-04-02T00:30:00+00:00",
"archive_reason": "gc"
}
],
"count": 1
}

Restore an archived memory back to the active memories table. The restored memory has its expires_at cleared (becomes permanent).
curl -X POST http://127.0.0.1:9077/api/v1/archive/expired-memory-id/restore

Response (200 OK):
{
"restored": true,
"id": "expired-memory-id"
}

Response (404 Not Found): {"error": "not found in archive"}

curl http://127.0.0.1:9077/api/v1/health

The health check performs a deep verification:
- Database is readable (runs SELECT COUNT(*) FROM memories)
- FTS5 index integrity check (INSERT INTO memories_fts(memories_fts) VALUES('integrity-check'))

Returns 200 OK with {"status": "ok", "service": "ai-memory"} if healthy.
Returns 503 Service Unavailable with {"status": "error", "service": "ai-memory"} if the database or FTS index is unhealthy.
curl http://127.0.0.1:9077/api/v1/stats

Returns:
- Total memory count
- Breakdown by tier
- Breakdown by namespace
- Memories expiring within 1 hour
- Total link count
- Database file size in bytes
The MCP server logs to stderr. Monitor via:
# If running via an AI client, check your client's MCP logs
# If running manually:
ai-memory mcp 2>mcp-server.log

Key log messages:
- ai-memory MCP server started (stdio) -- server is ready
- ai-memory MCP server stopped -- stdin closed (AI client session ended), server exiting

The HTTP daemon logs via tracing with configurable levels:
# Info level (default recommended)
RUST_LOG=ai_memory=info,tower_http=info ai-memory serve
# Debug level (verbose, includes all HTTP requests)
RUST_LOG=ai_memory=debug,tower_http=debug ai-memory serve
# Trace level (extremely verbose)
RUST_LOG=ai_memory=trace ai-memory serve
With systemd, logs go to the journal:
sudo journalctl -u ai-memory -f
sudo journalctl -u ai-memory --since "1 hour ago"
#!/bin/bash
# Restart the daemon when the health endpoint does not report "ok".
# curl -f exits non-zero on HTTP errors such as 503, which leaves HEALTH empty.
HEALTH=$(curl -sf http://127.0.0.1:9077/api/v1/health | jq -r '.status')
if [ "$HEALTH" != "ok" ]; then
    echo "ai-memory health check failed"
    systemctl restart ai-memory
fi
The project uses GitHub Actions for continuous integration and release automation.
Runs on ubuntu-latest and macos-latest:
- Formatting -- cargo fmt --check
- Linting -- cargo clippy -- -D warnings
- Tests -- cargo test (191 tests: 140 unit + 51 integration, 15/15 modules)
- Build -- cargo build --release
Uses Swatinem/rust-cache@v2 for build caching.
Triggered by tags matching v* (e.g., v0.1.0):
- Builds release binaries for:
  - x86_64-unknown-linux-gnu (Ubuntu)
  - aarch64-apple-darwin (macOS ARM)
- Packages each as ai-memory-<target>.tar.gz
- Creates a GitHub Release with the artifacts
# Replicate the CI checks
cargo fmt --check
cargo clippy -- -D warnings
cargo test
cargo build --release
For multi-machine deployments (e.g., laptop + server, or multiple workstations), use the sync command to keep databases in sync.
# Pull remote changes to local
ai-memory sync /mnt/shared/ai-memory.db --direction pull
# Push local changes to remote
ai-memory sync /mnt/shared/ai-memory.db --direction push
# Bidirectional merge (recommended)
ai-memory sync /mnt/shared/ai-memory.db --direction merge
# Sync every 15 minutes (bidirectional merge)
*/15 * * * * /usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db sync /mnt/shared/remote-memory.db --direction merge --json >> /var/log/ai-memory-sync.log 2>&1
Sync uses the same dedup-safe upsert as regular stores:
- Title+namespace conflicts are resolved by keeping the higher priority
- Tier never downgrades
- Links are synced alongside memories
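- Safe to run concurrently from multiple processes on the same host (SQLite WAL mode handles locking). SQLite does not support WAL mode over network filesystems, so avoid concurrent syncs against a database on a network mount.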
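The conflict rules in the list above can be illustrated with a small model. This is a sketch of one reading of those rules, not ai-memory's actual implementation: the `Memory` class, `merge_pair`, `merge_all`, and the tier ordering in `TIER_RANK` (including the "mid" tier name) are assumptions made for the example. Here the record with the higher priority wins a title+namespace conflict, ties keep the local record, and the merged record takes the higher of the two tiers.

```python
from dataclasses import dataclass, replace

# Assumption: tiers are ordered short < mid < long; only "short" and
# long-term memories are named in this guide.
TIER_RANK = {"short": 0, "mid": 1, "long": 2}


@dataclass(frozen=True)
class Memory:
    title: str
    namespace: str
    tier: str
    priority: int
    content: str


def merge_pair(local: Memory, remote: Memory) -> Memory:
    """Resolve a title+namespace conflict between two records."""
    # Higher priority wins; on a tie the local record is kept.
    winner = remote if remote.priority > local.priority else local
    # The tier never downgrades: take the higher-ranked tier of the two.
    best_tier = max(local.tier, remote.tier, key=TIER_RANK.__getitem__)
    return replace(winner, tier=best_tier)


def merge_all(local: list[Memory], remote: list[Memory]) -> dict[tuple[str, str], Memory]:
    """Merge two memory lists keyed by (title, namespace)."""
    merged = {(m.title, m.namespace): m for m in local}
    for m in remote:
        key = (m.title, m.namespace)
        merged[key] = merge_pair(merged[key], m) if key in merged else m
    return merged
```

For example, merging a local long-term record of priority 3 with a remote short-term record of priority 7 under the same title and namespace keeps the remote content and priority while retaining the long tier.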
If the remote database is on another machine, mount it or copy it first:
# Option 1: sshfs mount
mkdir -p /mnt/remote-memory
sshfs user@server:/var/lib/ai-memory /mnt/remote-memory
ai-memory sync /mnt/remote-memory/ai-memory.db --direction merge
# Option 2: rsync + sync + rsync
rsync -a server:/var/lib/ai-memory/ai-memory.db /tmp/remote.db
ai-memory sync /tmp/remote.db --direction merge
rsync -a /tmp/remote.db server:/var/lib/ai-memory/ai-memory.db
Auto-consolidation groups memories by namespace and primary tag, then merges groups with enough members into a single long-term summary. This reduces memory count and improves recall relevance.
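The grouping step can be sketched as follows. This is an illustration only, roughly what a dry run would need to compute, and it does not show how the summary itself is produced. It assumes that the "primary tag" is the first tag in a memory's tag list and that untagged memories are left out; neither assumption is confirmed by this guide, and `consolidation_groups` is a hypothetical name.

```python
from collections import defaultdict


def consolidation_groups(memories: list[dict], min_count: int = 3) -> dict[tuple[str, str], list[str]]:
    """Group memory ids by (namespace, primary tag) and keep groups of min_count or more."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for mem in memories:
        tags = mem.get("tags") or []
        if not tags:
            continue  # assumption: untagged memories are not grouped
        # Assumption: the first tag is treated as the primary tag.
        groups[(mem["namespace"], tags[0])].append(mem["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_count}
```

With the default `min_count` of 3, three memories in namespace "infra" whose first tag is "debug" form one candidate group, while a single "infra"/"deploy" memory is left alone.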
# Preview what would be consolidated
ai-memory auto-consolidate --dry-run
# Consolidate all namespaces (groups of 3+)
ai-memory auto-consolidate
# Only short-term memories, minimum 5 per group
ai-memory auto-consolidate --short-only --min-count 5
# Run auto-consolidation daily at 3am, short-term memories only
0 3 * * * /usr/local/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db auto-consolidate --short-only --json >> /var/log/ai-memory-consolidate.log 2>&1
Install the man page for system-wide documentation:
ai-memory man | sudo tee /usr/local/share/man/man1/ai-memory.1 > /dev/null
sudo mandb
man ai-memory
ai-memory is designed for single-machine use. It is not a distributed system.
- Concurrency: The daemon uses Arc<Mutex<Connection>> -- one write at a time, but this is fine for a single-user tool. SQLite WAL mode allows concurrent reads.
- MCP concurrency: The MCP server is single-threaded (synchronous stdio loop), one request at a time. This is by design -- MCP clients typically send one request at a time.
- Database size: SQLite handles databases up to 281 TB. Practically, performance stays excellent up to millions of rows.
- Memory usage: Minimal. The daemon holds only the connection and a path in memory. All data is on disk.
- Multiple instances: You can run multiple daemons on different ports with different databases. Do not point two daemons at the same database file. The MCP server and CLI can share a database (both use WAL mode).
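The WAL behaviour mentioned in the list above can be observed with Python's standard sqlite3 module. The sketch below uses a throwaway database with a hypothetical `demo` table (not the ai-memory schema): a second connection keeps reading the last committed state while a writer holds an open transaction, and sees the new row only after the commit.

```python
import sqlite3


def wal_reader_sees_snapshot(db_path: str) -> tuple[str, int, int]:
    """Return (journal mode, rows seen during an open write, rows seen after commit)."""
    # isolation_level=None gives explicit control over BEGIN/COMMIT.
    writer = sqlite3.connect(db_path, isolation_level=None)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY, note TEXT)")
    writer.execute("INSERT INTO demo (note) VALUES ('committed')")

    reader = sqlite3.connect(db_path, isolation_level=None)

    # Open a write transaction and leave it uncommitted.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO demo (note) VALUES ('pending')")

    # The reader is not blocked and sees only committed rows.
    during = reader.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

    writer.execute("COMMIT")
    after = reader.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

    reader.close()
    writer.close()
    return mode, during, after
```

Run against a fresh file on a local disk, the reader count changes only once the writer commits. This relies on both connections being on the same host, which matches the single-machine design described above.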
Port already in use:
ss -tlnp | grep 9077
# Kill the existing process or use a different port
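ai-memory serve --port 9078
Database locked:
# Remove stale WAL files (only if no daemon, MCP server, or CLI process has the database open)
# Back up the files first: deleting a -wal file discards any changes not yet checkpointed into the main database
rm -f ai-memory.db-wal ai-memory.db-shm
Permission denied:
# Check file permissions
ls -la /path/to/ai-memory.db
# Ensure the user running the daemon has read/write access
Binary not found: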
Check that the path in your MCP configuration (e.g., ~/.claude.json for Claude Code user scope, or .mcp.json for project scope) is correct and the binary is executable.
Database path issues:
The MCP server opens the database at the path specified by --db. Ensure the directory exists and is writable.
Protocol errors: Check stderr output. The MCP server logs parse errors and protocol issues to stderr.
If recall or search is slow:
# Rebuild the FTS index
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"
# Compact the database
sqlite3 /path/to/ai-memory.db "VACUUM"
Symptoms: search returns no results or errors.
# Check integrity
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('integrity-check')"
# Rebuild if corrupt
sqlite3 /path/to/ai-memory.db "INSERT INTO memories_fts(memories_fts) VALUES('rebuild')"
# Check what's taking space
ai-memory stats
# Delete expired memories
ai-memory gc
# Delete all short-term memories in a namespace
ai-memory forget --tier short --namespace my-app
# Compact after deletion
sqlite3 /path/to/ai-memory.db "VACUUM"