Remembers things at scale and knows when it doesn't.
Claude's built-in memory works fine for 20 facts. When you have hundreds of memories across users and projects, you need semantic search, importance scoring, and confidence levels — not grep over a markdown file.
widemem gives Claude Code a real memory engine: vector embeddings, LLM-based fact extraction, temporal decay, and the honesty to say "I don't have that" instead of guessing.
Built on widemem-ai — the open-source memory layer for AI agents.
| Feature | Claude's MEMORY.md | widemem |
|---|---|---|
| Storage | Flat markdown | Vector embeddings (FAISS/Qdrant) |
| Search | Grep / full context load | Semantic similarity |
| Knows when it doesn't know | Limited | Yes — 4 confidence levels |
| Importance scoring | No | 1-10 with temporal decay |
| Dedup / conflict detection | No | LLM-based |
| Quality gates | No | 5-tier classification |
| Scales to 1000s of facts | Poorly (context window) | Yes (vector index) |
| Multi-user support | No | Built-in |
```
You: "What city do I live in?"
AI:  confidence: NONE — "I don't have that stored. Want to tell me?"

You: "Berlin."
AI:  [stores with importance 7, semantic embedding, never guesses again]

...500 memories later...

You: "What do you know about my work?"
AI:  confidence: HIGH — "You're a data scientist at Stripe.
      You prefer Python with type hints. You're on the platform team."
```
```
pip install widemem-ai[mcp,sentence-transformers]
```

For OpenAI embeddings instead of local:

```
pip install widemem-ai[mcp]
```

Option A — Plugin (recommended):

```
# In Claude Code
/plugin install widemem
```

Option B — Manual:
```
# Copy skill files to your project
git clone https://github.com/remete618/widemem-skill.git
mkdir -p .claude/skills
cp -r widemem-skill/skills/widemem .claude/skills/

# Add MCP config (or merge into your existing .mcp.json)
cp widemem-skill/.mcp.json .mcp.json
```

Add your OpenAI API key to `.mcp.json` (replace `your-key-here`). The default config uses gpt-4o-mini for fact extraction and conflict resolution (~$0.0001/operation; see the cost table below).
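For reference, a complete `.mcp.json` might look like the sketch below. The `command` and `args` values are assumptions for illustration — the shipped `.mcp.json` in the repo is the source of truth; only the `env` keys come from the configuration table in this README.

```json
{
  "mcpServers": {
    "widemem": {
      "command": "python",
      "args": ["-m", "widemem.mcp_server"],
      "env": {
        "WIDEMEM_LLM_PROVIDER": "openai",
        "WIDEMEM_LLM_MODEL": "gpt-4o-mini",
        "OPENAI_API_KEY": "your-key-here"
      }
    }
  }
}
```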
| Env Variable | Default | Options |
|---|---|---|
| `WIDEMEM_LLM_PROVIDER` | `openai` | `openai`, `anthropic`, `ollama` |
| `WIDEMEM_LLM_MODEL` | `gpt-4o-mini` | Any model your provider supports |
| `WIDEMEM_EMBEDDING_PROVIDER` | `sentence-transformers` | `openai`, `sentence-transformers` |
| `WIDEMEM_DATA_PATH` | `~/.widemem/data` | Any local path |
| `WIDEMEM_CONFIDENCE_HIGH` | `0.45` | Similarity threshold for HIGH confidence |
| `WIDEMEM_CONFIDENCE_MODERATE` | `0.25` | Similarity threshold for MODERATE confidence |
| `WIDEMEM_CONFIDENCE_LOW` | `0.12` | Similarity threshold for LOW confidence |
Local-only setup (advanced): To run without an API key, set `WIDEMEM_LLM_PROVIDER=ollama` and `WIDEMEM_LLM_MODEL=llama3` (8B+ recommended); 3B models produce JSON errors and unreliable conflict resolution. Remove the `OPENAI_API_KEY` line.
| Command | What it does |
|---|---|
| `/mem search <query>` | Semantic search across all memories |
| `/mem add <text>` | Store a fact (with quality gates — not everything gets in) |
| `/mem pin <text>` | Pin a critical fact at importance 9.0 (allergies, API keys, things that matter) |
| `/mem stats` | Memory count and health check |
| `/mem export` | Export all memories as JSON |
| `/mem import` | Import facts from Claude's MEMORY.md into widemem |
| `/mem reflect` | Full memory audit — find contradictions, duplicates, staleness |
Or just talk normally. The skill triggers on "remember this", "do you remember", "what do you know about me", and similar phrases.
Most memory tools store everything. You say "ok thanks" and now that's a memory forever. You sneeze and it stores your sneeze.
widemem has quality gates. Every incoming fact gets classified before storage:
| Classification | Example | What happens |
|---|---|---|
| CRITICAL | "Remember: I'm allergic to penicillin" | Pinned at importance 9.0 |
| HIGH | "Actually I moved to Berlin" | Deletes old location, stores new |
| MEDIUM | "I work at Stripe" | SDK handles dedup, then store |
| LOW | "I'm on the dev branch right now" | Stored only if you insist |
| SKIP | "ok", "thanks", "run the tests" | Silently ignored |
Your memory stays clean. Your vector store stays relevant. Your facts stay accurate.
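The real gate is LLM-based, so there is no fixed rulebook — but the tier decisions can be sketched with a toy heuristic. The keyword rules below are invented for this example and are not widemem's actual logic:

```python
# Illustrative sketch only: widemem's real quality gate is LLM-based.
# These keyword rules are invented to show the five tiers in action.
def classify(message: str) -> str:
    text = message.lower().strip()
    if text in {"ok", "thanks", "run the tests"}:   # chit-chat and commands
        return "SKIP"
    if text.startswith("remember:") or "allergic" in text:
        return "CRITICAL"                           # pinned at importance 9.0
    if text.startswith("actually"):                 # correction of a stored fact
        return "HIGH"
    if "right now" in text:                         # transient session state
        return "LOW"
    return "MEDIUM"                                 # ordinary personal fact

print(classify("Remember: I'm allergic to penicillin"))  # CRITICAL
print(classify("ok"))                                    # SKIP
```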
Every search tells you how sure it is:
| Level | Similarity | What the AI says |
|---|---|---|
| HIGH | >= 0.45 | Answers confidently |
| MODERATE | >= 0.25 | "I think so, but not 100% sure" |
| LOW | >= 0.12 | "I have a vague recollection..." |
| NONE | < 0.12 | "I don't have that stored." |
The skill never makes up memories. If confidence is NONE, it says so instead of guessing.
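With the default thresholds, mapping a similarity score to a confidence level is a plain comparison — a minimal sketch:

```python
# Map a cosine-similarity score to widemem's four confidence levels,
# using the default thresholds (0.45 / 0.25 / 0.12).
def confidence(similarity: float,
               high: float = 0.45,
               moderate: float = 0.25,
               low: float = 0.12) -> str:
    if similarity >= high:
        return "HIGH"
    if similarity >= moderate:
        return "MODERATE"
    if similarity >= low:
        return "LOW"
    return "NONE"

print(confidence(0.89))  # HIGH
print(confidence(0.05))  # NONE
```

Anything below the LOW threshold is treated as "no memory" rather than a weak match — that is what stops the skill from guessing.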
You: "I ALREADY told you my anniversary is March 15th"
The skill detects frustration, searches harder, and when you repeat the fact, pins it at high importance so it never misses it again. Turns out user annoyance is actually useful data.
`/mem reflect` runs a full audit:
- Duplicates — "Lives in SF" and "User's city is San Francisco" → keep one
- Contradictions — "Works at Google" vs "Works at Meta" → which is current?
- Stale entries — "Debugging auth module" from 3 weeks ago → still relevant?
- Consolidation — "Likes Python" + "Uses mypy" + "Prefers typed code" → merge
Reports everything before touching anything. Nothing gets deleted without your say-so.
Already have facts in Claude's built-in memory? Import them:
```
/mem import
```
The skill reads your MEMORY.md index, follows each linked file, extracts the facts, runs quality gates, and stores them in widemem. Duplicates are skipped. Your original MEMORY.md files are not deleted.
```
You: "I work at Stripe as a data scientist"
        |
        v
Quality Gate: MEDIUM (personal fact) --> call widemem_add
        |
        v
widemem_add --> LLM extracts fact, dedup check, conflict resolution
            --> store (importance: 7)
        |
        v
Stored in FAISS with embedding vector

        ...three weeks later...

You: "What do you know about my job?"
        |
        v
widemem_search --> confidence: HIGH, similarity: 0.89
        |
        v
"I recall you work at Stripe as a data scientist."
```
```
+----------------------------------+
|   Claude Code + /mem skill       |
|   (SKILL.md + references/)       |
+---------------+------------------+
                | MCP protocol (stdio)
+---------------v------------------+
|   widemem MCP Server             |
|   7 tools: add, search, pin,     |
|   delete, count, export, health  |
+---------------+------------------+
                |
+---------------v------------------+
|   widemem-ai SDK                 |
|   Fact extraction, conflict      |
|   resolution, confidence scoring |
+----------+-----------+-----------+
|  FAISS/  |  SQLite   |   LLM     |
|  Qdrant  |  history  |  (any)    |
+----------+-----------+-----------+
```
| Feature | widemem | claude-mem | Pure SKILL.md skills |
|---|---|---|---|
| Semantic search | FAISS/Qdrant | Chroma | grep |
| Confidence scoring | 4 levels | No | No |
| Quality gates | 5 tiers | Basic | Some |
| Frustration detection | Yes | No | No |
| YMYL protection | Yes | No | No |
| Conflict resolution | LLM-based | No | No |
| Memory pinning | Yes | No | No |
| Self-reflection | `/mem reflect` | No | `/memory prune` |
| Runs locally | Embeddings local, LLM cloud (or Ollama) | Requires Chroma + Bun | Filesystem |
| Install | pip + skill | npm + plugin | curl |
| Says "I don't know" | Yes (4 confidence levels) | Basic | No |
- Python 3.10+
- `widemem-ai[mcp,sentence-transformers]` (`pip install widemem-ai[mcp,sentence-transformers]`)
- OpenAI API key (default) or Anthropic API key
- For fully local operation (advanced): Ollama with an 8B+ model instead of an API key
widemem uses an LLM for fact extraction and conflict resolution. Search is free (local embeddings).
| Operation | What happens | Cost (gpt-4o-mini) |
|---|---|---|
| `widemem_add` | LLM extracts facts, checks for conflicts | ~$0.0001 |
| `widemem_pin` | Same as add, with elevated importance | ~$0.0001 |
| `widemem_search` | Local embeddings only, no LLM call | $0 |
| `/mem reflect` | LLM analyzes exported memories for issues | ~$0.001 |
| Usage level | Description | Monthly cost |
|---|---|---|
| Light | Casual coding, ~10 stores/day | ~$0.05 |
| Moderate | Regular coding, ~20 stores/day, weekly reflect | ~$0.15 |
| Heavy | All-day coding, ~50 stores/day, frequent reflects | ~$0.40 |
| Extreme | 200 stores/day, 10+ hours/day | ~$1.50 |
For context: a single Claude Code conversation with Opus costs more than a month of heavy widemem usage. The ceiling is low because gpt-4o-mini is $0.15/1M input tokens, each operation sends ~500 tokens, and search (the most frequent operation) is free.
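The per-operation figure checks out arithmetically (assuming ~500 input tokens per call and ignoring output tokens, which are few):

```python
# gpt-4o-mini input pricing: $0.15 per 1M tokens
price_per_token = 0.15 / 1_000_000
tokens_per_op = 500                      # approximate prompt size per store
cost_per_op = tokens_per_op * price_per_token
print(f"${cost_per_op:.6f} per store")   # $0.000075 -> rounds to ~$0.0001

# Light usage: ~10 stores/day for 30 days
monthly_light = 10 * 30 * cost_per_op
print(f"${monthly_light:.4f}/month")     # $0.0225, under the ~$0.05 estimate
```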
With Ollama, everything runs on your machine. No API calls, no cost, no data leaves your laptop. The tradeoff: local models (especially 3B) are less reliable at fact extraction and conflict resolution. Use 8B+ for decent results. See "Bring Your Own LLM" below.
widemem separates the embedding model (for search) from the LLM (for fact extraction and conflict resolution). You can mix and match.
| LLM | Extraction quality | Conflict resolution | JSON reliability | Speed |
|---|---|---|---|---|
| gpt-4o | Best | Works (replaces old facts) | Perfect | 19s/6 facts |
| gpt-4o-mini (default) | Good | Works (replaces old facts) | Perfect | 25s/6 facts |
| llama3 8B (Ollama) | Good | Fails (keeps both old and new) | Good | 89s/6 facts |
| llama3.2 3B (Ollama) | Poor (drops facts) | Fails | Unreliable (JSON errors) | 48s/4 facts |
Edit the `env` block in your `.mcp.json`:

OpenAI (default, recommended):

```json
"WIDEMEM_LLM_PROVIDER": "openai",
"WIDEMEM_LLM_MODEL": "gpt-4o-mini",
"OPENAI_API_KEY": "sk-..."
```

Anthropic:

```json
"WIDEMEM_LLM_PROVIDER": "anthropic",
"WIDEMEM_LLM_MODEL": "claude-haiku-4-5-20251001",
"ANTHROPIC_API_KEY": "sk-ant-..."
```

Ollama (local, free):

```json
"WIDEMEM_LLM_PROVIDER": "ollama",
"WIDEMEM_LLM_MODEL": "llama3"
```

Requires `ollama serve` running. Use `ollama pull llama3` first. Remove the API key line.
The LLM handles two jobs:
- Fact extraction -- turns "I work at Stripe as a backend engineer" into a clean fact with an importance score
- Conflict resolution -- decides whether a new fact contradicts, duplicates, or extends an existing memory
The LLM does NOT handle search. Search uses the embedding model (sentence-transformers by default), which runs locally and is free. So even with a cloud LLM, your searches never leave your machine and never cost money.
The default thresholds (0.45/0.25/0.12) are calibrated for `sentence-transformers/all-MiniLM-L6-v2`. If you switch to OpenAI embeddings (`text-embedding-3-small`), scores run higher and you should raise the thresholds:
| Embedding provider | HIGH | MODERATE | LOW |
|---|---|---|---|
| sentence-transformers (default) | 0.45 | 0.25 | 0.12 |
| OpenAI embeddings | 0.65 | 0.45 | 0.25 |
Set via `WIDEMEM_CONFIDENCE_HIGH`, `WIDEMEM_CONFIDENCE_MODERATE`, and `WIDEMEM_CONFIDENCE_LOW` in `.mcp.json`.
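For example, switching to OpenAI embeddings with the raised thresholds from the table might look like this in the `env` block (values taken from the table above; whether your config expects string or numeric values depends on the shipped `.mcp.json` — MCP `env` blocks use strings):

```json
"WIDEMEM_EMBEDDING_PROVIDER": "openai",
"WIDEMEM_CONFIDENCE_HIGH": "0.65",
"WIDEMEM_CONFIDENCE_MODERATE": "0.45",
"WIDEMEM_CONFIDENCE_LOW": "0.25"
```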
This skill is a taste of widemem-ai. The Python SDK can do more:
- Hierarchical memory (facts → summaries → themes)
- Retrieval modes (fast/balanced/deep)
- Topic weighting and custom extraction
- YMYL protection for health/financial/legal data
- Time-range queries and temporal decay functions
- Batch conflict resolution
- Export/import, history audit trails
If the skill makes you want more, the SDK is where it lives.
Apache 2.0 — same as widemem-ai.