Native, zero-cloud memory for AI agents. SQLite-backed. Sub-millisecond. Fully private.
Mnemosyne is a local-first memory system for the Hermes Agent framework. It stores conversations, preferences, and knowledge in SQLite with native vector search (sqlite-vec) and full-text search (FTS5) -- no external databases, no API keys, no network calls.
Mnemosyne is evaluated on the BEAM long-context memory benchmark (Tavakoli et al., ICLR 2026) using the official end-to-end protocol: retrieved memories feed into an LLM, answers are scored by an LLM-as-judge against pre-written rubrics.
End-to-end results (48 questions per scale, 3 conversations each, 180 total):
| Scale | Mnemosyne | RAG (Llama-4) | LIGHT | Honcho | Hindsight |
|---|---|---|---|---|---|
| 100K | 35.4% | 32.3% | 35.8% | 63.0% | 73.4% |
| 500K | 19.3% | 33.0% | 35.9% | 64.9% | 71.1% |
| 1M | 19.2% | 30.7% | 33.6% | 63.1% | 73.9% |
What this says:
- At 100K (small conversations), Mnemosyne is competitive -- beats RAG, ties LIGHT
- At 500K+, performance degrades significantly below RAG. This is a known issue: the episodic consolidation pipeline is not producing entries during benchmark ingestion, so retrieval at scale loses information
- Published baselines use identical BEAM dataset and LLM-as-judge protocol
Per-ability highlight (100K): Information Extraction 80.5%, Abstention 50%, Summarization 41.7%. Multi-hop Reasoning (16.7%) and Event Ordering (13.3%) are weak -- these require fact linking across distant messages, which needs the episodic tier.
Full benchmark report: docs/beam-benchmark.md
pip install mnemosyne-memoryNote: The package name on PyPI is
mnemosyne-memory.
With all optional features (dense retrieval + local LLM consolidation):
pip install mnemosyne-memory[all]
⚠️ Ubuntu 24.04 / Debian 12 users: If you geterror: externally-managed-environment, your system Python is PEP 668-protected. Install Mnemosyne into the Hermes runtime venv (not a separate one) to avoid ABI mismatches with compiled dependencies:HERMES_PY="$HOME/.hermes/hermes-agent/venv/bin/python" "$HERMES_PY" -m pip install --upgrade --no-cache-dir "mnemosyne-memory[all]" "$HERMES_PY" -m mnemosyne.installInstalling into a separate venv can cause NumPy, sqlite-vec, or fastembed ABI crashes when Hermes tries to import them.
git clone https://github.com/AxDSan/mnemosyne.git
cd mnemosyne
pip install -e ".[all,dev]"If you only need Mnemosyne as a Hermes memory backend and want to skip pip entirely:
curl -sSL https://raw.githubusercontent.com/AxDSan/mnemosyne/main/deploy_hermes_provider.sh | bashThis symlinks the provider into ~/.hermes/plugins/mnemosyne and adds the repo to sys.path at runtime. No virtual environment required -- works out of the box on Ubuntu 24.04.
# 1. Install the plugin
python -m mnemosyne.install
# 2. Activate as your memory provider
hermes memory setup
# → Select "mnemosyne" and press EnterVerify:
hermes memory status # Should show "Provider: mnemosyne"
hermes mnemosyne stats # Shows working + episodic memory countsNote: The
hermes memory setuppicker defaults to "Built-in only" every time it opens. This is normal Hermes UI behavior -- your previous selection is saved. Just select Mnemosyne and press Enter.
| Feature | Mnemosyne | Honcho | Zep | Mem0 |
|---|---|---|---|---|
| Cost | Free forever | $$$ Paid (credit-based) | $$$ Paid (Flex/Enterprise) | Freemium ($0--$249/mo) |
| Hosting | Local -- your machine | Cloud only | Cloud / BYOC | Cloud only |
| Privacy | 100% local, zero data exfil | External API calls | External API calls | External API calls |
| Latency (read) | 0.076 ms | ~38 ms | ~62 ms | ~45 ms |
| Latency (write) | 0.81 ms | ~45 ms | ~85 ms | ~50 ms |
| Latency (search) | 1.2 ms | ~52 ms | ~78 ms | ~60 ms |
| Cold start | 0 ms (instant) | ~500 ms | ~800 ms | ~300 ms |
| Offline capable | Yes -- airplane mode works | No | No | No |
| Setup complexity | pip install mnemosyne-memory |
Docker + API keys + account | Docker + PostgreSQL + config | API key + signup |
| Vector store | sqlite-vec (built-in) | pgvector (external) | pgvector (external) | pgvector (external) |
| Full-text search | FTS5 (built-in) | Separate service | Separate service | Separate service |
| Auth required | None | Supabase auth | OAuth / API key | API key |
| Rate limits | None -- unlimited | Yes (plan-dependent) | Yes (credit-based) | Yes (plan-dependent) |
| Data ownership | You own the SQLite file | Vendor-hosted | Vendor-hosted | Vendor-hosted |
| Export / import | One JSON file, any machine | Limited | Limited | Limited |
| Dependencies | Python stdlib + optional ONNX | Docker, PostgreSQL, network | Docker, PostgreSQL, network | pip + API key + network |
| Integration | Native Hermes plugin | REST API SDK | REST API SDK | REST API SDK |
| Memory architecture | BEAM (3-tier: working + episodic + scratchpad) | Session + facts | Graph RAG + facts | Session + facts |
| Auto-consolidation | Sleep cycles built-in | Manual / paid add-on | Manual | Manual |
| Temporal knowledge graph | Native triples with validity | No | No | No |
| Import from other systems | 7 providers (Mem0, Letta, Zep, Cognee, Honcho, SuperMemory, Hindsight) | No | No | No |
| Benchmark (LongMemEval) | 98.9% Recall@All@5 | Not published | Not published | Not published |
| From | You Gain | You Lose |
|---|---|---|
| Honcho | 500x faster reads, zero monthly bill, 100% offline, no Docker, no credit system | Cloud-hosted dashboard, managed scaling, team sharing features |
| Zep | 43x faster search, no PostgreSQL to maintain, no deployment overhead, instant cold start | Graph RAG visualization, enterprise compliance certs (SOC 2), managed BYOC |
| Mem0 | Sub-millisecond everything, no API rate limits, no vendor lock-in, full data portability | Managed platform features, 90K+ developer community, YC-backed ecosystem |
| Hindsight | Zero dependency, no network calls, SQLite-native, BEAM architecture, can import FROM Hindsight | Cloud sync across devices, managed inference, web dashboard |
- If you care about speed: Mnemosyne is 43--500x faster than any cloud alternative because it runs in-process with SQLite -- no HTTP roundtrips, no network overhead.
- If you care about privacy: Your data never leaves your machine. No API calls. No telemetry. No vendor access.
- If you care about cost: Zero ongoing cost. No credits. No tiers. No "contact sales."
- If you care about simplicity:
pip install mnemosyne-memoryand it works. No Docker. No config files. No signup.
Trade-off: You manage your own backup/restore (one SQLite file, trivial). You don't get a web dashboard or team collaboration features -- Mnemosyne is built for individual developers and local agents, not enterprise teams.
Key capabilities:
- BEAM architecture -- Three tiers: hot working memory, long-term episodic memory, temporary scratchpad
- Hybrid search -- 50% vector similarity + 30% FTS5 rank + 20% importance, all inside SQLite
- Automatic consolidation -- Old working memories are summarized and moved to episodic memory via
mnemosyne_sleep() - Temporal triples -- Time-aware knowledge graph with automatic invalidation
- Entity extraction -- Regex + Levenshtein fuzzy matching (no spaCy, no PyTorch)
- LLM-driven fact extraction -- Structured facts from raw text with graceful fallback chain
- Host LLM adapter -- Route consolidation and fact extraction through Hermes' authenticated provider (e.g., OAuth-backed Codex) without managing credentials in Mnemosyne
- Memory banks -- Per-bank SQLite isolation for domain separation
- MCP server -- 6 tools, stdio + SSE transports
- Temporal recall -- Exponential decay scoring with configurable halflife
- Export / import -- Move your entire memory database to a new machine with one JSON file
- Cross-provider importers -- Migrate from Mem0, Letta, Zep, Cognee, Honcho, SuperMemory, Hindsight
- Cross-session scope --
remember(..., scope="global")makes facts visible everywhere - Configurable compression --
int8(default),float32, orbit(32x smaller) vectors - Binary vectors -- Information-theoretic binarization (MIB) for 32x memory reduction with deterministic Hamming-distance retrieval
- Streaming & DeltaSync -- Real-time memory event stream (push/pull) and checkpoint-based incremental sync between instances
- Configurable auto-sleep -- Automatically triggers consolidation when working memory exceeds
sleep_threshold; configurable viaconfig.yamlor env var - ignore_patterns -- Regex-based content filtering to exclude shell commands, stack traces, and boilerplate from memory storage
All numbers measured on CPU with sqlite-vec + FTS5 enabled.
| System | Score | Notes |
|---|---|---|
| Mnemosyne (dense) | 98.9% Recall@All@5 | Oracle subset, 100 instances, bge-small-en-v1.5 |
| Mempalace | 96.6% Recall@5 | AAAK + Palace architecture |
| Mastra Observational Memory | 84.23% (gpt-4o) | Three-date model |
| Full-context GPT-4o baseline | ~60.2% | No memory system |
| Operation | Honcho | Zep | MemGPT | Mnemosyne | Speedup |
|---|---|---|---|---|---|
| Write | 45ms | 85ms | 120ms | 0.81ms | 56x |
| Read | 38ms | 62ms | 95ms | 0.076ms | 500x |
| Search | 52ms | 78ms | 140ms | 1.2ms | 43x |
| Cold Start | 500ms | 800ms | 1200ms | 0ms | Instant |
Write throughput:
| Operation | Count | Total | Avg |
|---|---|---|---|
| Working memory writes | 500 | 8.7s | 17.4 ms |
| Episodic inserts (with embedding) | 500 | 10.7s | 21.3 ms |
| Sleep consolidation | 300 old items | 33 ms | -- |
Hybrid recall scaling (query latency stays flat as corpus grows):
| Corpus Size | Query | Avg Latency | p95 |
|---|---|---|---|
| 100 | "concept 42" | 5.1 ms | 6.9 ms |
| 500 | "concept 42" | 5.0 ms | 5.7 ms |
| 1,000 | "concept 42" | 5.3 ms | 6.5 ms |
| 2,000 | "concept 42" | 7.0 ms | 8.6 ms |
Working memory recall scaling (FTS5 fast path):
| WM Size | Query | Avg Latency | p95 |
|---|---|---|---|
| 1,000 | "concept 42" | 2.4 ms | 3.1 ms |
| 5,000 | "domain 7" | 3.2 ms | 3.8 ms |
| 10,000 | "concept 42" | 6.4 ms | 7.2 ms |
- Python 3.9+
- Hermes Agent (for plugin integration)
pip install mnemosyne-memory
# With all extras (dense retrieval + local LLM consolidation)
pip install mnemosyne-memory[all]git clone https://github.com/AxDSan/mnemosyne.git
cd mnemosyne
pip install -e ".[all,dev]"
python -m mnemosyne.install
⚠️ Ubuntu 24.04 / Debian 12 users: Ifpip installfails withexternally-managed-environment, see the Quick Start → Option A note about using a virtual environment.
# Dense retrieval (required for semantic search and the 98.9% LongMemEval score)
pip install fastembed>=0.3.0
# Local LLM consolidation (sleep cycle summarization)
pip install ctransformers>=0.2.27 huggingface-hub>=0.20Note: Without
fastembed, Mnemosyne falls back to keyword-only retrieval. It still works, but you won't get competitive semantic search or the benchmark scores above.
python -m mnemosyne.install --uninstallIf you installed from PyPI:
pip install --upgrade mnemosyne-memoryIf you installed from source:
cd mnemosyne
git pull
pip install -e ".[all,dev]"Always restart Hermes after updating so plugin changes take effect:
hermes gateway restartIf the update includes database schema changes, run the migration helper:
python scripts/migrate_from_legacy.pySee UPDATING.md for detailed troubleshooting and rollback instructions.
# Show memory statistics (current session only)
hermes mnemosyne stats
# Show memory statistics across ALL sessions
hermes mnemosyne stats --global
# Search memories
hermes mnemosyne inspect "dark mode preferences"
# Run consolidation (compress old working memory into episodic summaries)
hermes mnemosyne sleep
# Export all memories to a JSON file
hermes mnemosyne export --output mnemosyne_backup.json
# Import memories from a JSON file
hermes mnemosyne import --input mnemosyne_backup.json
# Import from Hindsight (JSON export or live API)
mnemosyne import-hindsight hindsight-export.json hermes
mnemosyne import-hindsight http://localhost:8888 hermes
# The same timestamp-preserving Hindsight importer is available inside Hermes
hermes mnemosyne import --from hindsight --file hindsight-export.json --bank hermes
hermes mnemosyne import --from hindsight --base-url http://localhost:8888 --bank hermes
# Clear scratchpad
hermes mnemosyne clearOptional MCP server: For external access or integration with MCP-compatible services, run the MCP server:
mnemosyne mcp # stdio transport mnemosyne mcp --transport sse --port 8080 # SSE transportMnemosyne does not currently expose a standalone REST API server.
from mnemosyne import remember, recall
# Store a fact
remember(
content="User prefers dark mode interfaces",
importance=0.9,
source="preference"
)
# Store a global preference (visible in every session)
remember(
content="User email is abdi.moya@gmail.com",
importance=0.95,
source="preference",
scope="global"
)
# Store a temporary credential with expiry
remember(
content="API key: sk-abc123",
importance=0.8,
source="credential",
valid_until="2026-12-31T00:00:00"
)
# Search memories
results = recall("interface preferences", top_k=3)
# Temporal knowledge graph
from mnemosyne.core.triples import TripleStore
kg = TripleStore()
kg.add("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
kg.query("Maya", as_of="2026-02-01")from mnemosyne.core.beam import BeamMemory
beam = BeamMemory(session_id="my_session")
# Working memory (auto-injected into prompts)
beam.remember("Important context", importance=0.9)
# Episodic memory (long-term, searchable)
beam.consolidate_to_episodic(
summary="User likes Neovim",
source_wm_ids=["wm1"],
importance=0.8
)
# Scratchpad (temporary reasoning)
beam.scratchpad_write("todo: fix auth bug")
# Search both tiers
results = beam.recall("editor preferences", top_k=5)Temporal recall adds an exponential decay boost so recent memories rank higher:
results = recall(
"deployments",
temporal_weight=0.5, # Enable temporal scoring
temporal_halflife=48.0, # 48-hour halflife
query_time="2026-04-29T12:00:00" # Reference point
)Regex-based entity extraction with Levenshtein fuzzy matching. No spaCy, no PyTorch.
remember(
"Met with Abdias J about the Mnemosyne v2 release",
extract_entities=True
)
# Extracts: "Abdias J", "Mnemosyne" -- stored as triples
# Fuzzy match: querying "Abdias" finds "Abdias J" (similarity: 0.925)Catches @mentions, #hashtags, "quoted phrases", and capitalized sequences (2-5 words). Misses pronouns and complex coreferences.
Extract structured facts from raw text using an LLM, with a graceful fallback chain:
- Host LLM adapter (if
MNEMOSYNE_HOST_LLM_ENABLED=trueand a backend is registered -- e.g. when running under Hermes) - Remote OpenAI-compatible API (if
MNEMOSYNE_LLM_BASE_URLis set) - Local ctransformers GGUF model
- Skip -- extraction fails silently, memory is still stored
remember(
"User said they prefer Python over JavaScript for backend work",
extract=True # Extracts 2-5 factual statements as triples
)Per-bank SQLite isolation for domain separation:
from mnemosyne.core.banks import BankManager
BankManager().create_bank("work")
BankManager().create_bank("personal")
work_mem = Mnemosyne(bank="work")
work_mem.remember("Sprint review scheduled for Friday")mnemosyne bank list
mnemosyne bank create research
mnemosyne mcp --bank work # MCP server scoped to a bank6 tools, 2 transports, for any MCP-compatible client:
# stdio -- for Claude Desktop, etc.
mnemosyne mcp
# SSE -- for web clients
mnemosyne mcp --transport sse --port 8080| Tool | Description |
|---|---|
mnemosyne_remember |
Store a memory |
mnemosyne_recall |
Search with hybrid scoring |
mnemosyne_sleep |
Run consolidation |
mnemosyne_scratchpad_read |
Read scratchpad |
mnemosyne_scratchpad_write |
Write to scratchpad |
mnemosyne_get_stats |
Memory statistics |
When registered as a Hermes plugin, Mnemosyne exposes 15 tools to the agent, providing full memory lifecycle management directly from the conversation:
| # | Tool | Description |
|---|---|---|
| 1 | mnemosyne_remember |
Store a memory with importance, source, expiry, and cross-session scope |
| 2 | mnemosyne_recall |
Hybrid vector + FTS5 search across working and episodic memory with configurable scoring weights |
| 3 | mnemosyne_stats |
Get BEAM tier statistics (working, episodic, scratchpad counts) |
| 4 | mnemosyne_triple_add |
Add a temporal triple to the knowledge graph with validity dates |
| 5 | mnemosyne_triple_query |
Query the temporal knowledge graph with as_of historical lookback |
| 6 | mnemosyne_sleep |
Run the consolidation cycle — summarize old working memories into episodic tier |
| 7 | mnemosyne_scratchpad_write |
Write a temporary note to the scratchpad reasoning workspace |
| 8 | mnemosyne_scratchpad_read |
Read all current scratchpad entries |
| 9 | mnemosyne_scratchpad_clear |
Clear all scratchpad entries |
| 10 | mnemosyne_invalidate |
Mark a memory as expired or superseded by a replacement |
| 11 | mnemosyne_export |
Export all memories to a portable JSON file for backup or migration |
| 12 | mnemosyne_update |
Update the content or importance of an existing memory by ID |
| 13 | mnemosyne_forget |
Permanently delete a memory by ID |
| 14 | mnemosyne_import |
Import from JSON file, Mem0, Letta, Zep, Cognee, Honcho, SuperMemory, or Hindsight |
| 15 | mnemosyne_diagnose |
Run PII-safe diagnostics (dependencies, DB state, vector readiness) — never exposes memory content |
The plugin also registers three lifecycle hooks (pre_llm_call, on_session_start, post_tool_call) for automatic context injection before every LLM call.
┌─────────────────────────────────────────────────────────────┐
│ HERMES AGENT │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ pre_llm │────▶│ Mnemosyne │────▶│ SQLite │ │
│ │ hook │ │ BEAM │ │ │ │
│ └─────────────┘ └──────────────┘ │ working_mem │ │
│ ▲ │ episodic_mem│ │
│ │ │ vec_episodes│ │
│ └──────── Auto-injected context ───│ fts_episodes│ │
│ │ scratchpad │ │
│ │ triples │ │
│ └─────────────┘ │
│ │
│ Core runs in-process. Optional MCP server available. │
└─────────────────────────────────────────────────────────────┘
BEAM (Bilevel Episodic-Associative Memory):
working_memory-- Hot context, auto-injected before LLM calls, TTL-based evictionepisodic_memory-- Long-term storage with sqlite-vec + FTS5 hybrid searchscratchpad-- Temporary agent reasoning workspace
Binary Vectors (MIB — Maximally Informative Binarization): Mnemosyne uses information-theoretic binarization (building on Moorcheh ITS, arXiv:2601.11557) to compress 384-dimensional float32 embeddings into 48-byte binary vectors — a 32× reduction. Retrieval uses Hamming distance (XOR + popcount) for deterministic, CPU-efficient ranking without ANN indices or external vector databases. This enables sub-millisecond search over millions of vectors entirely within SQLite.
SQLite is already in your stack. Hermes uses it for session persistence. Mnemosyne extends that same file -- no new dependencies, no Docker containers, no connection pooling.
| Feature | Honcho | Zep | Mnemosyne |
|---|---|---|---|
| Deployment | Docker + PostgreSQL | Docker + Postgres | pip install |
| Query Language | REST API | REST API | SELECT ... WHERE MATCH |
| Vector Store | pgvector | pgvector | sqlite-vec |
| Text Search | Separate API | Separate API | Built-in FTS5 |
| Auth Required | Yes (supabase) | Yes | No |
| Offline Mode | No | No | Yes |
| Cold Start Latency | 500-800ms | 800ms+ | 0ms |
By default, Mnemosyne stores its main database at ~/.hermes/mnemosyne/data/mnemosyne.db. Named memory banks live under ~/.hermes/mnemosyne/data/banks/<name>/, and standalone triple stores may create triples.db in the data directory.
# Simple backup of the full Mnemosyne data directory
cp -a ~/.hermes/mnemosyne/data ~/backups/mnemosyne_data_$(date +%Y%m%d)
# Export to JSON (portable across machines)
hermes mnemosyne export --output mnemosyne_backup.json
# Import on a new machine
hermes mnemosyne import --input mnemosyne_backup.jsonImport directly from supported providers into Mnemosyne:
# List all supported providers
hermes mnemosyne import --list-providers
# Mem0 → Mnemosyne
hermes mnemosyne import --from mem0 --api-key sk-xxx
# Letta → Mnemosyne
hermes mnemosyne import --from letta --api-key sk-xxx
# Zep → Mnemosyne
hermes mnemosyne import --from zep --api-key sk-xxx
# Hindsight → Mnemosyne (JSON export or live API)
mnemosyne import-hindsight hindsight-export.json hermes
mnemosyne import-hindsight http://localhost:8888 hermes
# Hindsight → Mnemosyne via Hermes CLI
hermes mnemosyne import --from hindsight --file hindsight-export.json --bank hermes
hermes mnemosyne import --from hindsight --base-url http://localhost:8888 --bank hermes
# Generate a migration script for any provider
hermes mnemosyne import --from mem0 --generate-script --output-script migrate.py
# Use AI agent extraction (no SDK needed)
hermes mnemosyne import --from zep --agenticSupported providers: Mem0, Letta (MemGPT), Zep, Cognee, Honcho, SuperMemory, Hindsight
The generic Hermes CLI exposes the common importer options. Provider-specific options are available through the Python importers; for example, offline Letta AgentFile imports can use LettaImporter(agent_file_path="./agent.af").
Importers preserve source metadata where available. HindsightImporter uses a dedicated episodic import path to preserve original timestamps; other importers may store source timestamps in metadata while assigning a new Mnemosyne write timestamp. Use --dry-run to validate without writing.
| Variable | Default | Description |
|---|---|---|
MNEMOSYNE_DATA_DIR |
~/.hermes/mnemosyne/data |
Database directory |
MNEMOSYNE_VEC_TYPE |
int8 |
Vector compression: float32, int8, or bit |
MNEMOSYNE_VEC_WEIGHT |
0.5 |
Vector similarity weight |
MNEMOSYNE_FTS_WEIGHT |
0.3 |
FTS5 keyword weight |
MNEMOSYNE_IMPORTANCE_WEIGHT |
0.2 |
Importance weight |
MNEMOSYNE_WM_MAX_ITEMS |
10000 |
Working memory item limit |
MNEMOSYNE_WM_TTL_HOURS |
24 |
Working memory TTL |
MNEMOSYNE_RECENCY_HALFLIFE |
168 |
Recency decay halflife in hours (1 week) |
MNEMOSYNE_EP_LIMIT |
50000 |
Episodic memory recall limit |
MNEMOSYNE_SLEEP_BATCH |
5000 |
Max working memories to fetch for consolidation |
| Variable | Default | Description |
|---|---|---|
MNEMOSYNE_LLM_ENABLED |
true |
Enable LLM summarization in sleep cycle |
MNEMOSYNE_LLM_N_CTX |
2048 |
Context window size for local model |
MNEMOSYNE_LLM_MAX_TOKENS |
2048 |
Max output tokens per summary |
MNEMOSYNE_LLM_N_THREADS |
4 |
CPU threads for local inference |
MNEMOSYNE_LLM_REPO |
TheBloke/TinyLlama... |
HuggingFace repo for GGUF download |
MNEMOSYNE_LLM_FILE |
tinyllama...Q4_K_M.gguf |
GGUF filename |
Use a remote model (llama.cpp server, vLLM, Ollama, etc.) instead of local TinyLlama:
| Variable | Default | Description |
|---|---|---|
MNEMOSYNE_LLM_BASE_URL |
(none) | OpenAI-compatible API base URL, e.g. http://localhost:8080/v1 |
MNEMOSYNE_LLM_API_KEY |
(none) | API key for authenticated endpoints |
MNEMOSYNE_LLM_MODEL |
(none) | Model identifier sent in requests |
When BASE_URL is set, Mnemosyne skips local ctransformers and uses your remote model for consolidation. Falls back to local if remote is unreachable, then to aaak encoding.
Route consolidation and fact extraction through a host-provided LLM (e.g.,
Hermes' authenticated agent.auxiliary_client.call_llm). Useful for
OAuth-backed providers like openai-codex that don't fit the URL+API-key
remote shape.
| Variable | Default | Description |
|---|---|---|
MNEMOSYNE_HOST_LLM_ENABLED |
false |
Opt in to host-adapter routing |
MNEMOSYNE_HOST_LLM_PROVIDER |
(none) | Optional provider override, e.g. openai-codex |
MNEMOSYNE_HOST_LLM_MODEL |
(none) | Optional model override, e.g. gpt-5.1-mini |
MNEMOSYNE_HOST_LLM_N_CTX |
32000 |
Prompt-budget when host is the chosen path (TinyLlama-calibrated LLM_N_CTX=2048 is too small for Codex/GPT-class) |
When the host call fails, the adapter falls back to the local GGUF model rather than the remote URL. See docs/hermes-llm-integration.md for the full behavior model and session-shutdown semantics.
In addition to environment variables, Mnemosyne supports configuration via Hermes' config.yaml file. This is the recommended approach for plugin-level settings, keeping all Hermes configuration in one place.
Place the memory.mnemosyne block under the top-level memory key:
memory:
mnemosyne:
# Enable automatic consolidation on session boundaries
auto_sleep: true
# Minimum working memory count before auto-sleep triggers.
# Prevents consolidation on trivial sessions. Default: 50
sleep_threshold: 50
# Vector storage type: float32 (full precision), int8 (default, good balance), or bit (32x smaller)
vector_type: int8
# Regex patterns to filter BEFORE memory storage.
# Content matching any pattern is silently skipped (not stored).
# Useful for excluding shell commands, stack traces, and boilerplate.
ignore_patterns:
- "^pip install"
- "^npm install"
- "^sudo "
- "^Traceback \\(most recent call last\\)"
- "^Error:"
- "^git "| Key | Type | Default | Description |
|---|---|---|---|
auto_sleep |
bool |
false |
Auto-run sleep() consolidation on session boundaries. Also settable via MNEMOSYNE_AUTO_SLEEP_ENABLED env var. |
sleep_threshold |
int |
50 |
Minimum working memory entries before auto-sleep triggers. Skips consolidation on sessions with few memories. |
vector_type |
str |
int8 |
Vector storage type: float32 (1,536 bytes/vector), int8 (384 bytes, default), bit (48 bytes, 32× smaller). |
ignore_patterns |
list[str] |
[] |
Regex patterns applied via Python re.search(). Memories matching any pattern are discarded before storage. |
ignore_patterns filters content before it enters memory storage. It uses Python's re.search() with re.IGNORECASE to match patterns against the content string. If any pattern matches, the memory is silently skipped — it will never appear in working memory or recall results.
Common use cases:
- Shell commands:
"^pip ","^npm ","^git ","^sudo ","^apt " - Stack traces:
"^Traceback \\(most recent call last\\)","^Error:","^\\s+at " - Boilerplate:
"^---BEGIN","^#include","^#!/" - System noise: Any pattern matching low-signal operational chatter
Patterns can be specified as a YAML list (one per line) or as a comma-separated string. They are applied at remember() time and logged at DEBUG level when matched.
Example:
memory:
mnemosyne:
ignore_patterns:
- "^pip "
- "^npm "
- "^Traceback \\(most recent call last\\)"
- "^Error:"
- "^\\s+at "Mnemosyne includes a real-time memory event system for reactive and distributed memory architectures.
An event-driven stream supporting both push (callbacks) and pull (iterator) patterns. Thread-safe, with a configurable buffer for late-connecting iterators.
from mnemosyne.core.streaming import MemoryStream, MemoryEvent, EventType
stream = MemoryStream(max_buffer=100)
# Push: register callbacks
stream.on(EventType.MEMORY_ADDED, lambda e: print(f"New: {e.memory_id}"))
# Pull: iterate over events as they occur
for event in stream.listen([EventType.MEMORY_ADDED, EventType.MEMORY_CONSOLIDATED]):
process(event)Supported event types: MEMORY_ADDED, MEMORY_RECALLED, MEMORY_INVALIDATED, MEMORY_CONSOLIDATED, MEMORY_UPDATED.
Checkpoint-based incremental synchronization between Mnemosyne instances. Computes diffs (only changed memories since last sync) and applies them with insert/update/skip tracking.
from mnemosyne.core.streaming import DeltaSync
ds = DeltaSync(mnemosyne)
# Push: compute and send delta since last sync
result = ds.sync_to("peer_b", "working_memory")
# result = {"delta": [...], "count": 3, "checkpoint": "..."}
# Pull: receive and apply remote delta
stats = ds.sync_from("peer_a", remote_delta, "working_memory")
# stats = {"inserted": 3, "updated": 0, "skipped": 0}Checkpoints are persisted per-peer, enabling incremental sync without re-sending already-synchronized memories. Ideal for multi-agent setups and backup/restore workflows.
# Run tests locally
python -m pytest tests/test_beam.py -v
# Run benchmarks
python tests/benchmark_beam_working_memory.pyAll changes are validated through GitHub Actions CI on Python 3.9--3.12 before merging.
Mnemosyne publishes GitHub Releases and PyPI packages automatically on every v* tag. See CONTRIBUTING.md for the release process.
Full documentation is in the docs/ directory:
- Getting Started -- Installation, quickstart, first memory
- Architecture -- BEAM tiers, SQLite backend, hybrid search
- API Reference -- Python API:
remember,recall,sleep, triples, importers - Hermes Integration -- Using as a Hermes memory backend
- Hermes LLM Integration -- Routing consolidation through Hermes' authenticated provider (Codex/OAuth)
- LLM Installation Guide -- Installation instructions for AI agents and LLMs
- Configuration -- Environment variables, vector compression, LLM setup
- Changelog -- Release history
Contributions are welcome. Areas of active interest:
- Encrypted cloud sync (optional, user-controlled)
- Browser extension for web context capture
- Additional embedding models
- Multi-language support
See CONTRIBUTING.md for guidelines.
- Discord: Join the Mnemosyne community for help, discussion, and announcements
- GitHub Issues: Report bugs or request features
- Documentation: docs.mnemosyne.site
- Email: abdi.moya@gmail.com
If this project saves you time or helps your agents remember, consider supporting it:
⭐ Star the repo if you find it useful!
MIT License -- See LICENSE
Copyright (c) 2026 Abdias J
- Hermes Agent Framework -- The ecosystem Mnemosyne was built for
- Honcho -- For defining the stateful memory space
- Mempalace -- For proving local-first memory can compete on benchmarks
- SQLite -- The world's most deployed database
"The faintest ink is more powerful than the strongest memory." -- Hermes Trismegistus
