Hybrid memory skill for assistants: SQLite + FTS5 + Vector DB (Chroma).
This repository provides a production-oriented SKILL.md for building persistent, time-aware conversational memory with hybrid retrieval and conflict arbitration.
Most assistant memory implementations are either:
- Too simple: keyword notes only.
- Too heavy: multi-service infrastructure from day one.
mem-skill targets a practical middle path:
- Keep source-of-truth in SQLite.
- Add semantic recall with Chroma.
- Use LLM arbitration for updates and contradictions.
- Persistent atomic user facts with confidence and validity windows.
- Hybrid retrieval: lexical (`FTS5`) plus vector similarity.
- Conflict handling: insert, update, supersede, expire, pending confirmation.
- Atomic and idempotent upsert flow.
- Profile materialization and budget-aware prompt injection.
- High-risk safety guardrails for sensitive fields.
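As an illustration of an atomic fact with a confidence score and a validity window, here is a minimal Python sketch. The `Fact` and `ConflictAction` names and all field names are assumptions made for this example; they are not taken from `SKILL.md`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ConflictAction(Enum):
    """The five conflict outcomes listed above (identifier names are hypothetical)."""
    INSERT = "insert"
    UPDATE = "update"
    SUPERSEDE = "supersede"
    EXPIRE = "expire"
    PENDING_CONFIRMATION = "pending_confirmation"


@dataclass(frozen=True)
class Fact:
    """One atomic user fact; the field layout is an assumption for this sketch."""
    subject: str
    attribute: str
    value: str
    confidence: float                    # 0.0 to 1.0
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still valid

    def is_valid_at(self, when: datetime) -> bool:
        """True if `when` falls inside the half-open window [valid_from, valid_to)."""
        if when < self.valid_from:
            return False
        return self.valid_to is None or when < self.valid_to


# Example: a fact that held during 2023 and was later superseded.
fact = Fact(
    "user", "city", "Berlin", 0.9,
    valid_from=datetime(2023, 1, 1, tzinfo=timezone.utc),
    valid_to=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Keeping `valid_to` open-ended for current facts lets a supersede action close the old window instead of deleting the row.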
- Main chat loop handles user response generation.
- Async memory observer processes each completed turn.
- SQLite stores `facts` and `fact_history`.
- SQLite `FTS5` handles lexical search.
- Chroma stores embeddings for semantic retrieval.
- Arbitration output is persisted and synced back to vector index.
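The SQLite side of this architecture could look roughly like the sketch below. Only the table names `facts`, `fact_history`, and `facts_fts` come from this README; every column, the `fact_key` idempotency key, and both helper functions are assumptions. It also assumes your SQLite build includes FTS5, and it leaves out the Chroma sync step.

```python
import sqlite3

# Hypothetical schema: only the three table names come from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    id         INTEGER PRIMARY KEY,
    fact_key   TEXT NOT NULL UNIQUE,  -- assumed idempotency key
    content    TEXT NOT NULL,
    confidence REAL NOT NULL,
    valid_from TEXT NOT NULL,
    valid_to   TEXT                   -- NULL means still valid
);
CREATE TABLE IF NOT EXISTS fact_history (
    id          INTEGER PRIMARY KEY,
    fact_id     INTEGER NOT NULL REFERENCES facts(id),
    old_content TEXT,
    new_content TEXT NOT NULL,
    action      TEXT NOT NULL,
    changed_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts USING fts5(content);
"""


def upsert_fact(conn, fact_key, content, confidence, valid_from):
    """Insert or update a fact, its FTS row, and its history in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute(
            "SELECT id, content FROM facts WHERE fact_key = ?", (fact_key,)
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO facts (fact_key, content, confidence, valid_from) "
                "VALUES (?, ?, ?, ?)",
                (fact_key, content, confidence, valid_from),
            )
            fact_id, old, action = cur.lastrowid, None, "insert"
            conn.execute(
                "INSERT INTO facts_fts (rowid, content) VALUES (?, ?)",
                (fact_id, content),
            )
        else:
            fact_id, old = row
            if old == content:
                return fact_id  # replaying the same write changes nothing
            action = "update"
            conn.execute(
                "UPDATE facts SET content = ?, confidence = ?, valid_from = ? "
                "WHERE id = ?",
                (content, confidence, valid_from, fact_id),
            )
            conn.execute(
                "UPDATE facts_fts SET content = ? WHERE rowid = ?",
                (content, fact_id),
            )
        conn.execute(
            "INSERT INTO fact_history (fact_id, old_content, new_content, action) "
            "VALUES (?, ?, ?, ?)",
            (fact_id, old, content, action),
        )
    return fact_id


def lexical_search(conn, query, limit=5):
    """Return (fact_id, content) pairs, best BM25 match first."""
    return conn.execute(
        "SELECT rowid, content FROM facts_fts WHERE facts_fts MATCH ? "
        "ORDER BY bm25(facts_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
```

Reusing the `facts.id` value as the FTS `rowid` keeps the lexical index joinable to the source-of-truth table without a separate mapping.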
- `SKILL.md`: executable skill specification and workflow.
- `agents/openai.yaml`: UI metadata and default skill prompt.
- Install or copy this skill folder into your Codex skills directory.
- Trigger with `$mem-skill` in prompts.
- Ask Codex to scaffold implementation from the workflow in `SKILL.md`.
Example prompt:
Use $mem-skill to build a memory observer with SQLite facts, FTS5 retrieval, Chroma vector search, and atomic conflict upserts.
- Start with SQLite schema (`facts`, `fact_history`, `facts_fts`).
- Add embedding and Chroma collection sync.
- Implement hybrid scoring and candidate fusion.
- Add LLM JSON arbitration and transactional upsert.
- Materialize profile and inject relevant slices per turn.
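For the candidate fusion step, one common option is reciprocal rank fusion (RRF). This README does not say which fusion method the skill prescribes, so treat the sketch below as one possible choice. The function name and the `k = 60` constant are assumptions, and the inputs are plain lists of fact IDs already ranked by each retriever.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; each list adds 1 / (k + rank) per item."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, fact_id in enumerate(ranking, start=1):
            scores[fact_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for deterministic output.
    return sorted(scores, key=lambda fact_id: (-scores[fact_id], fact_id))


lexical_hits = ["f1", "f2", "f3"]  # e.g. from an FTS5 query
vector_hits = ["f3", "f1", "f4"]   # e.g. from a Chroma query
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
# fused == ["f1", "f3", "f2", "f4"]
```

RRF uses only rank positions, so BM25 scores and vector distances never need to be normalized onto a shared scale.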
Track at minimum:
- Lexical retrieval hit rate.
- Vector retrieval hit rate.
- Hybrid top-k recall.
- Wrong-overwrite rate.
- Pending-confirmation resolution rate.
- Observer latency p50 and p95.
- Prompt injection token cost.
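Observer latency p50 and p95 can be computed from raw per-turn timings. The sketch below uses the nearest-rank percentile definition, which is an assumption; other definitions interpolate between samples and give slightly different values. The function name and the sample timings are made up for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative observer latencies in milliseconds (not measured data).
latencies_ms = [120, 80, 95, 300, 110, 105, 90, 100, 85, 1000]
p50 = percentile(latencies_ms, 50)  # 100
p95 = percentile(latencies_ms, 95)  # 1000
```

Reporting p95 next to p50 exposes slow outliers, such as the 1000 ms turn here, that a median alone would hide.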
Run ablation with:
- Lexical-only.
- Vector-only.
- Hybrid.
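The three-arm ablation can be scored with recall@k against a labeled set of relevant facts. This is a minimal sketch: the helper names are hypothetical, and the IDs are toy values for illustration, not benchmark results.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant facts that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def run_ablation(arms: Dict[str, List[str]], relevant: Set[str], k: int) -> Dict[str, float]:
    """Score each retrieval arm on the same query and relevance labels."""
    return {name: recall_at_k(ids, relevant, k) for name, ids in arms.items()}


# Toy data for a single query (illustrative only).
relevant_ids = {"f1", "f4"}
arms = {
    "lexical-only": ["f1", "f2", "f3"],
    "vector-only": ["f3", "f4", "f5"],
    "hybrid": ["f1", "f4", "f3"],
}
report = run_ablation(arms, relevant_ids, k=2)
# report == {"lexical-only": 0.5, "vector-only": 0.5, "hybrid": 1.0}
```

Averaging the per-query scores across an evaluation set would give the hybrid top-k recall figure listed in the metrics above.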
This repo currently ships the skill specification and metadata.
It does not yet include:
- Reference runtime implementation.
- Dataset and benchmark scripts.
- Demo UI or API server.
- Add minimal Python reference implementation.
- Add reproducible benchmark harness.
- Add sample datasets for contradiction and temporal updates.
- Publish evaluation reports for retrieval and overwrite safety.
This project is licensed under the MIT License.
See LICENSE for details.