This guide covers integrating Engram with MCP clients like Claude Desktop, Claude Code, and Cursor.
Engram uses a server-based architecture. `engram serve` is a persistent process that owns the database and handles all memory operations. MCP clients connect to the server in one of two ways:
- SSE (recommended) — Clients that support SSE connect directly to the server
- stdio proxy — Clients that only support stdio spawn `engram stdio`, a thin bridge that proxies to the running server
This architecture allows multiple MCP clients (Cursor, Claude Desktop, Claude Code) to share the same memory store simultaneously without database locking conflicts.
Engram needs to run as a background service so it's always available when your MCP clients connect. You only run it once -- all your MCP clients share the same server.
Create `~/Library/LaunchAgents/com.engram.server.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.engram.server</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/engram</string>
        <string>serve</string>
    </array>
    <key>EnvironmentVariables</key>
    <dict>
        <key>DUCKDB_PATH</key>
        <string>/Users/YOUR_USERNAME/Library/Application Support/Engram/memory.duckdb</string>
        <key>OLLAMA_URL</key>
        <string>http://localhost:11434</string>
    </dict>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>/usr/local/var/log/engram.log</string>
</dict>
</plist>
```

Then load it:
```shell
mkdir -p ~/Library/Application\ Support/Engram
launchctl load ~/Library/LaunchAgents/com.engram.server.plist
```

Engram will now start automatically on login. To stop: `launchctl unload ~/Library/LaunchAgents/com.engram.server.plist`
Create `~/.config/systemd/user/engram.service`:
```ini
[Unit]
Description=Engram Memory Server
After=network.target

[Service]
ExecStart=/usr/local/bin/engram serve
Environment=DUCKDB_PATH=%h/.local/share/engram/memory.duckdb
Environment=OLLAMA_URL=http://localhost:11434
Restart=on-failure

[Install]
WantedBy=default.target
```

Then enable it:
```shell
mkdir -p ~/.local/share/engram
systemctl --user daemon-reload
systemctl --user enable --now engram
```

Check status: `systemctl --user status engram`
See WINDOWS.md for detailed Windows setup including running as a startup task.
For testing, just run in a terminal:

```shell
engram serve
```

The server starts on port 3490 by default and prints:

```
MCP SSE endpoint: http://localhost:3490/mcp/sse
Health check: http://localhost:3490/health
```
Add to `.cursor/mcp.json` in your project root or `~/.cursor/mcp.json` for global access:
```json
{
  "mcpServers": {
    "engram": {
      "url": "http://localhost:3490/mcp/sse"
    }
  }
}
```

For Claude Code, run:

```shell
claude mcp add engram --transport sse http://localhost:3490/mcp/sse
```

Claude Desktop doesn't support SSE directly, so it uses `engram stdio` -- a thin proxy that bridges stdio to the running server:
```json
{
  "mcpServers": {
    "engram": {
      "command": "/absolute/path/to/engram",
      "args": ["stdio"]
    }
  }
}
```

Tip: On macOS, run `which engram` to get the full path. The `ENGRAM_SERVER_URL` env var defaults to `http://localhost:3490` -- only set it if your server runs on a different host or port.
Any client that supports SSE can connect to `http://localhost:3490/mcp/sse`. Clients that only support stdio can spawn `engram stdio` as a proxy to the running server.
| Variable | Default | Description |
|---|---|---|
| `DUCKDB_PATH` | `./engram.duckdb` | Path to the DuckDB database file |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server endpoint |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name |
| `ENGRAM_PORT` | `3490` | Server port |
| `ENGRAM_SERVER_URL` | `http://localhost:3490` | Server URL (used by stdio proxy) |
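The precedence in the table above (environment variable if set, built-in default otherwise) can be sketched as a small resolution helper. The function name and dict layout here are illustrative, not part of Engram's actual code:

```python
import os

def engram_config(env=None):
    """Resolve Engram settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "DUCKDB_PATH": env.get("DUCKDB_PATH", "./engram.duckdb"),
        "OLLAMA_URL": env.get("OLLAMA_URL", "http://localhost:11434"),
        "EMBEDDING_MODEL": env.get("EMBEDDING_MODEL", "nomic-embed-text"),
        "ENGRAM_PORT": int(env.get("ENGRAM_PORT", "3490")),
        "ENGRAM_SERVER_URL": env.get("ENGRAM_SERVER_URL", "http://localhost:3490"),
    }
```

Passing an explicit dict instead of reading `os.environ` makes the resolution easy to test in isolation.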
```shell
engram [serve]              # Start HTTP/SSE server (default)
engram serve --port=3490    # Explicit serve with port override
engram stdio                # Stdio proxy to running server
```
Once configured, restart your MCP client and try:
- Store a memory: "Remember that I prefer dark mode for all UIs"
- Retrieve it: "What are my UI preferences?"
To verify semantic search is working (not just returning recent results):
- Store a few memories about different topics
- Search for something semantically related to an older memory
- If the older, relevant memory ranks higher than recent unrelated ones, vector search is active
Check logs for `Generated query embedding with 768 dimensions`. If you see `Failed to generate query embedding`, Ollama may be down — Engram falls back to chronological ordering gracefully.
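The fallback behavior described above can be sketched as follows. The `cosine` helper and the episode field names are illustrative assumptions, not Engram's internal code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_episodes(episodes, query_embedding):
    """Rank semantically when a query embedding exists; newest-first otherwise."""
    if query_embedding is None:  # e.g. Ollama was unreachable
        return sorted(episodes, key=lambda e: e["created_at"], reverse=True)
    return sorted(episodes,
                  key=lambda e: cosine(query_embedding, e["embedding"]),
                  reverse=True)
```

This is why a relevant old memory outranking recent unrelated ones is a good signal that vector search is active: chronological fallback would put the newest memory first regardless of meaning.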
Store a new episode in memory.
| Parameter | Required | Description |
|---|---|---|
| `content` | Yes | Episode content |
| `source` | Yes | Source client (e.g., `claude-desktop`, `cursor`) |
| `name` | No | Human-readable label |
| `source_model` | No | Model identifier (e.g., `claude-4.6-sonnet`) |
| `source_description` | No | Freeform context about the episode |
| `group_id` | No | Multi-tenant group (default: `default`) |
| `tags` | No | Array of tags for categorization |
| `valid_at` | No | ISO 8601 timestamp — when the information became true |
| `metadata` | No | JSON string with additional data |
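A hypothetical set of arguments for this tool might look like the following; all values are illustrative, and only `content` and `source` are required:

```json
{
  "content": "User prefers dark mode for all UIs",
  "source": "claude-desktop",
  "name": "ui-preference",
  "tags": ["preferences", "ui"],
  "valid_at": "2024-05-01T12:00:00Z",
  "metadata": "{\"confidence\": \"high\"}"
}
```

Note that `metadata` is a JSON string (escaped), not a nested object.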
Search episodes using semantic similarity and filters.
| Parameter | Required | Description |
|---|---|---|
| `query` | No | Text to search for (embedded for semantic ranking) |
| `group_id` | No | Filter by group |
| `max_results` | No | Limit results (default: 10) |
| `before` | No | ISO 8601 timestamp upper bound |
| `after` | No | ISO 8601 timestamp lower bound |
| `tags` | No | Filter by tags (AND logic) |
| `source` | No | Filter by source client |
| `include_expired` | No | Include expired episodes (default: `false`) |
| `min_similarity` | No | Minimum similarity score to include (0.0–1.0). Only applies in vector mode. |
| `search_mode` | No | How to search: `vector` (by meaning, default), `keyword` (by exact words), or `hybrid` (both combined). The default will change to `hybrid` in the next major version. |
| `search_alpha` | No | In `hybrid` mode, how much to favor meaning vs. exact words. Higher = more meaning-based, lower = more word-based (default: 0.7). For pure word search, use `search_mode=keyword` instead. |
Which mode should I use?

- `vector` (default) — Best when you want conceptually similar results. "What are my deployment preferences?" will find memories about CI/CD pipelines, hosting, etc., even if they don't contain the word "deployment."
- `keyword` — Best when you know the exact words. Useful when Ollama is down, or when you need precise term matching.
- `hybrid` — Best of both worlds. Finds results that are both semantically relevant and contain the right words. Will become the default in a future version.
Search results include a similarity score (0.0–1.0) in vector and hybrid modes. Keyword mode does not return similarity scores.
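One common way to combine the two signals, and the role `search_alpha` and `min_similarity` play, can be sketched as a weighted blend. The exact formula Engram uses may differ, so treat this as illustrative:

```python
def hybrid_score(vector_score, keyword_score, alpha=0.7):
    """Blend semantic and keyword scores; higher alpha favors meaning."""
    return alpha * vector_score + (1 - alpha) * keyword_score

def filter_by_similarity(results, min_similarity=0.0):
    """Drop results whose similarity falls below the cutoff."""
    return [r for r in results if r["similarity"] >= min_similarity]
```

With the default `alpha=0.7`, a result that matches only semantically still scores 0.7, while a result that matches only on exact words scores 0.3.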
Retrieve episodes by time range, source, or group.
Modify episode metadata, tags, or expiration.
Health check — returns system status and version.
Safety: Episodes can be marked as expired but cannot be permanently deleted via MCP tools. This prevents accidental memory loss from LLM errors.
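The expire-not-delete model could look like the sketch below; the `expired_at` field name and record shape are hypothetical:

```python
from datetime import datetime, timezone

def expire_episode(episode, when=None):
    """Soft-delete: stamp an expiry time instead of removing the record."""
    when = when or datetime.now(timezone.utc)
    episode["expired_at"] = when.isoformat()
    return episode

def visible_episodes(episodes, include_expired=False):
    """Hide expired episodes unless the caller opts in (cf. include_expired)."""
    return [e for e in episodes
            if include_expired or e.get("expired_at") is None]
```

Because the row is never deleted, an erroneous "expire" from an LLM can be reversed by clearing the expiry, whereas a hard delete could not.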
If you're upgrading from v1.x:
- Start `engram serve` as a persistent process (launchd, systemd, or Docker)
- Update MCP client configs to use either the SSE URL directly or `"args": ["stdio"]`
- Remove `npx supergateway` from any configs — it's no longer needed
- Your existing DuckDB file is unchanged — point `DUCKDB_PATH` to the same file
The stdio proxy prints this when it can't reach the server. Make sure `engram serve` is running:

```shell
curl http://localhost:3490/health
```

Use absolute paths in MCP config, not `./engram` or `~/engram`.
- Verify Ollama is running: `ollama list`
- Check the URL is `http://localhost:11434` (not `https`)
Make sure the directory for `DUCKDB_PATH` is writable.