Long-term memory & persistent context for AI agents.
agent-recall gives your AI agents memory that survives across sessions, reboots, and conversations. It stores, retrieves, and threads context so your agent remembers who you are, what you built, and where you left off — without token bloat or stale hallucinations.
Built for Hermes Agent and compatible with any agent framework that can call Python or REST.
| Feature | Description |
|---|---|
| Session Threading | Weave conversation threads into persistent, queryable memory |
| Multi-Backend Storage | SQLite (default), JSON files, or vector DB (Qdrant/Chroma) |
| Context Compression | Auto-summarize old context to stay within token limits |
| Structured Recall | Tag, filter, and search memories by topic, time, or session |
| Agent-Agnostic | Works with Hermes, LangChain, AutoGen, or raw OpenAI calls |
| CLI + Python API | Use from code or manage memory from the terminal |
| Zero-Config Defaults | Works out of the box; tune when you need to |
pip install agent-recall

Or install from source:
git clone https://github.com/Hemsagar00/agent-recall.git
cd agent-recall
pip install -e ".[dev]"

from agent_recall import Memory
mem = Memory() # uses SQLite in ~/.agent_recall/
# Store a memory
mem.remember(
    content="User prefers terse replies. No emoji. Tony Stark persona.",
    tags=["user_pref", "persona"],
    session_id="session_001",
)

# Semantic search
results = mem.recall("What does the user want me to sound like?", top_k=3)
# Filter by tag
results = mem.recall(tag="user_pref")
# Time-bounded recall
results = mem.recall(since="2026-05-01")

from agent_recall import ContextThread
thread = ContextThread(thread_id="friday_dev")
thread.add_turn(role="user", content="Build a landing page")
thread.add_turn(role="assistant", content="Done. Deployed to Vercel.")
# Later, in a new session:
history = thread.get_context(max_tokens=4000)
# -> returns compressed, relevant history ready to inject into prompt

┌─────────────────────────────────────────────┐
│ Agent (Hermes / OpenAI) │
├─────────────────────────────────────────────┤
│ ContextThread │ Memory.recall() │
│ (conversation) │ (semantic search) │
├─────────────────────────────────────────────┤
│ Core Engine (agent_recall.core) │
│ ├─ Summarizer (compress old context) │
│ ├─ Embedder (sentence-transformers) │
│ └─ Indexer (BM25 + vector hybrid) │
├─────────────────────────────────────────────┤
│ Storage Backends │
│ ├─ SQLite (default, local, file-backed) │
│ ├─ JSON (portable, human-readable) │
│ └─ VectorDB (Qdrant / Chroma / pgvector) │
└─────────────────────────────────────────────┘
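The diagram names an Indexer that combines BM25 with vector search, but it does not say how the two signals are merged. The sketch below shows one common option, reciprocal rank fusion, using a small hand-rolled BM25 and toy vectors in place of sentence-transformers embeddings. Every function name here (`bm25_scores`, `cosine`, `rrf`, `hybrid_rank`) is hypothetical and not part of the agent-recall API.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with plain Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(rankings, k=60):
    """Reciprocal rank fusion over lists of doc indices ordered best first."""
    fused = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document index.
    return [i for i, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


def hybrid_rank(query, query_vec, docs, doc_vecs):
    """Rank docs by fusing a lexical (BM25) and a semantic (cosine) ordering."""
    lexical = bm25_scores(query, docs)
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    by_lexical = sorted(range(len(docs)), key=lambda i: -lexical[i])
    by_semantic = sorted(range(len(docs)), key=lambda i: -semantic[i])
    return rrf([by_lexical, by_semantic])
```

Rank fusion avoids having to put BM25 scores and cosine similarities on a common scale, which is why it is a frequent default for hybrid search. A weighted sum of normalized scores would be an equally plausible design.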
| Module | Purpose |
|---|---|
| agent_recall.core | Memory engine, embedding, summarization, indexing |
| agent_recall.backends | Pluggable storage: SQLite, JSON, Qdrant, Chroma |
| agent_recall.utils | Token counting, compression, serialization helpers |
| agent_recall.cli | Terminal tool to inspect, search, and purge memory |
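To make the token counting and compression roles listed for agent_recall.utils more concrete, here is a rough sketch of enforcing a token budget over conversation turns. It keeps the newest turns that fit and folds older ones into a single placeholder line. The whitespace-based token count and the names `count_tokens` and `fit_to_budget` are assumptions for illustration; a real implementation would use a proper tokenizer and call a summarizer.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def fit_to_budget(turns, max_tokens):
    """Keep the newest turns that fit in max_tokens; fold the rest into one stub.

    `turns` is a list of (role, content) tuples, oldest first.
    The stub line itself is not counted against the budget in this sketch.
    """
    kept = []
    used = 0
    # Walk backwards so the most recent turns are considered first.
    for role, content in reversed(turns):
        cost = count_tokens(content)
        if used + cost > max_tokens:
            break
        kept.append((role, content))
        used += cost
    kept.reverse()
    dropped = len(turns) - len(kept)
    if dropped:
        # Placeholder standing in for a real summarizer call.
        kept.insert(0, ("system", f"[{dropped} earlier turn(s) summarized]"))
    return kept
```

Calling `fit_to_budget(turns, max_tokens=4000)` mirrors the shape of `thread.get_context(max_tokens=4000)`, though the library's actual compression strategy may differ.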
# Initialize a new memory store
agent-recall init --backend sqlite --path ~/.friday_memory
# Add a memory from terminal
agent-recall remember "User uses Tailscale, not Cloudflare Tunnel" --tags infra,networking
# Search memories
agent-recall recall "Tailscale" --top 5
# List all sessions
agent-recall sessions
# Export memory to JSON
agent-recall export --out backup_2026.json
# Purge old memories (with confirmation)
agent-recall purge --older-than 90d

agent-recall uses a YAML config file at ~/.agent_recall/config.yaml:
backend: sqlite
sqlite_path: ~/.agent_recall/memory.db
embedding_model: sentence-transformers/all-MiniLM-L6-v2
max_context_tokens: 4000
summarize_after_turns: 20
auto_compress: true

Override per-session:
mem = Memory(config_path="./project_memory.yaml")

| Backend | Best For | Persistence |
|---|---|---|
| SQLite (default) | Local agents, single-user, fast | File-based |
| JSON | Debugging, human-readable backups | File-based |
| Qdrant | Multi-agent, cloud-native, scale | Server or embedded |
| Chroma | RAG-heavy workloads, local vector search | File-based |
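The backends in this table are interchangeable because they sit behind a common storage interface. That interface is not shown in this README, so the following is only a minimal sketch of what a file-backed store with tag and time filtering could look like, built on Python's standard `sqlite3` module. The class name `SketchSQLiteBackend`, its `add` and `query` methods, and the table schema are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


class SketchSQLiteBackend:
    """Hypothetical minimal backend: stores memories, filters by tag or time."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, content TEXT, "
            "session_id TEXT, created_at TEXT)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tags (memory_id INTEGER, tag TEXT)"
        )

    def add(self, content, tags=(), session_id=None, created_at=None):
        """Insert one memory with its tags and return the new row id."""
        created_at = created_at or datetime.now(timezone.utc).isoformat()
        cur = self.conn.execute(
            "INSERT INTO memories (content, session_id, created_at) "
            "VALUES (?, ?, ?)",
            (content, session_id, created_at),
        )
        self.conn.executemany(
            "INSERT INTO tags (memory_id, tag) VALUES (?, ?)",
            [(cur.lastrowid, t) for t in tags],
        )
        self.conn.commit()
        return cur.lastrowid

    def query(self, tag=None, since=None):
        """Return memory contents, optionally filtered by tag and start time."""
        sql = "SELECT content FROM memories WHERE 1=1"
        params = []
        if tag is not None:
            sql += " AND id IN (SELECT memory_id FROM tags WHERE tag = ?)"
            params.append(tag)
        if since is not None:
            # ISO 8601 strings in one timezone compare correctly as text.
            sql += " AND created_at >= ?"
            params.append(since)
        sql += " ORDER BY id"
        return [row[0] for row in self.conn.execute(sql, params)]
```

Passing a file path instead of `":memory:"` gives the file-based persistence the table describes for SQLite. Semantic search would need an embedding column or a separate vector index on top of this.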
Switch backends:
from agent_recall import Memory
from agent_recall.backends import QdrantBackend
mem = Memory(backend=QdrantBackend(url="http://localhost:6333"))

Add to your Hermes skill or agent loop:
from agent_recall import Memory, ContextThread

# session_id, build_prompt, and llm are placeholders supplied by your own agent code.
mem = Memory()
thread = ContextThread(thread_id=session_id)

def hermes_turn(user_input: str) -> str:
    # 1. Retrieve relevant past context
    past = mem.recall(user_input, top_k=5)
    # 2. Build prompt with memory
    prompt = build_prompt(user_input, past, thread.get_context())
    # 3. Generate response
    response = llm.generate(prompt)
    # 4. Store this turn
    thread.add_turn("user", user_input)
    thread.add_turn("assistant", response)
    return response

- Multi-agent shared memory (team memory spaces)
- Auto-tagging with LLM-based classification
- Conflict resolution when memories contradict
- Web dashboard for browsing & editing memory
- Import/export from Obsidian / Notion
- Temporal decay (forget low-value memories over time)
git clone https://github.com/Hemsagar00/agent-recall.git
cd agent-recall
pip install -e ".[dev]"
pytest tests/

PRs welcome. Open an issue to discuss architecture changes before coding.
MIT © HemSagarKasi
"An agent without memory is just a prompt with amnesia."