License: MIT
This project demonstrates a local-first, schema-driven memory architecture for long-running AI collaboration.
It turns raw dialog into structured, reviewable memory artifacts: Facts → Topic Memory → Global Memory, with an audit log.
For full system design, see: docs/architecture.md.
This system is designed to:
- Persist long-running human ↔ AI collaboration state on disk
- Separate raw dialog from structured, reviewable artifacts
- Keep the model layer replaceable without breaking stored memory
- Support human curation of high-value long-term facts
It is explicitly not:
- a vector database
- a SaaS backend
- a real-time chat replacement
- a knowledge graph engine
LLMs are stateless by design. Long projects (writing, research, startup building) need memory that is:
- Local-first: your data stays on disk, not in a SaaS database
- Structured: facts + summaries you can query / diff / review
- Human-in-the-loop: important facts and scores can be curated
- Composable: outputs are plain JSON + Markdown
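Because artifacts are plain JSON, reviewing a memory change can be an ordinary text diff. A minimal sketch of that idea (the `facts` field and record shapes below are hypothetical, not the project's actual schema):

```python
import json
from difflib import unified_diff

# Hypothetical fact records; the real schemas are defined by the project.
before = {"schema_version": 1, "facts": [
    {"text": "Prefers local-first tools", "score": 7},
]}
after = {"schema_version": 1, "facts": [
    {"text": "Prefers local-first tools", "score": 7},
    {"text": "Interviewing in Q3", "score": 8},
]}

# Pretty-printed JSON diffs cleanly line by line, so a human review
# of a memory update is just a unified text diff.
diff = list(unified_diff(
    json.dumps(before, indent=2).splitlines(),
    json.dumps(after, indent=2).splitlines(),
    lineterm="",
))
print("\n".join(diff))
```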
```mermaid
flowchart LR
    U[User / Raw Dialog] --> API[Ingestion API<br/>CLI/GUI/SDK]
    API --> WL[Write Layer<br/>WritePipeline]
    subgraph Contracts[Contract-first Outputs]
        S1[summary.json<br/>schema-versioned]
        T1[topic_memory.json<br/>schema-versioned]
        G1[GLOBAL_MEMORY.json<br/>schema-versioned]
    end
    WL --> R[Append-only Persistence<br/>raw.md + meta.json]
    WL --> S1
    WL --> AUD[Audit Log<br/>append-only]
    subgraph Build[Deterministic Rebuild Path]
        TB[TopicMemoryBuilder<br/>aggregate + render]
        GB[GlobalMemoryBuilder<br/>rule-based selection]
    end
    R --> TB
    S1 --> TB
    TB --> T1
    T1 --> GB
    GB --> G1
    LLM[(LLM Provider<br/>Replaceable)] -. optional enrich .-> WL
    classDef strong fill:#111,stroke:#555,color:#fff;
    class WL,TB,GB strong;
```
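The append-only persistence step in the diagram can be sketched in a few lines. The `raw.md` file name follows the diagram; the JSON-lines audit-event format is an assumption for illustration, not the project's actual on-disk format:

```python
import json
import tempfile
import time
from pathlib import Path

def append_entry(root: Path, topic: str, raw_text: str) -> None:
    """Append-only: raw dialog and audit events are only ever appended, never rewritten."""
    topic_dir = root / topic
    topic_dir.mkdir(parents=True, exist_ok=True)
    # Raw dialog accumulates in raw.md; existing content is never modified.
    with (topic_dir / "raw.md").open("a", encoding="utf-8") as f:
        f.write(raw_text + "\n\n---\n\n")
    # One audit event per write, appended as a JSON line (illustrative format).
    event = {"ts": time.time(), "action": "save_entry", "topic": topic}
    with (root / "audit.log").open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

root = Path(tempfile.mkdtemp())
append_entry(root, "Career/Interviews", "First entry")
append_entry(root, "Career/Interviews", "Second entry")
raw = (root / "Career/Interviews" / "raw.md").read_text(encoding="utf-8")
audit_lines = (root / "audit.log").read_text(encoding="utf-8").splitlines()
```

Because nothing is rewritten in place, the higher layers (topic and global memory) can always be rebuilt deterministically from what is on disk.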
```shell
git clone <your-repo-url>
cd memory
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Optional (enable LLM enrichment):

```shell
export DEEPSEEK_API_KEY="your_key_here"
```

Without an API key, the system falls back to minimal summary generation (fallback summaries, no LLM facts).
This runs a minimal end-to-end flow:
- save one entry
- rebuild topic memory
- rebuild global memory
- print output location
```shell
python -m memory_app.demo
```

Expected output:

```
✔ entry saved
✔ topic rebuilt
✔ global memory updated
✔ output written to ./data
```

You should see:

```
./data/GLOBAL_MEMORY.json
./data/GLOBAL_MEMORY.md
```
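A quick way to confirm the demo produced its artifacts is to check for the expected files under the output directory. This helper is illustrative, not part of the package:

```python
import tempfile
from pathlib import Path

def missing_artifacts(memory_root: str) -> list:
    """Return the expected global-memory artifacts missing under memory_root."""
    expected = ("GLOBAL_MEMORY.json", "GLOBAL_MEMORY.md")
    root = Path(memory_root)
    return [name for name in expected if not (root / name).exists()]

# Demo against a scratch directory instead of a real ./data:
root = tempfile.mkdtemp()
missing_before = missing_artifacts(root)
Path(root, "GLOBAL_MEMORY.json").write_text("{}", encoding="utf-8")
Path(root, "GLOBAL_MEMORY.md").write_text("# Global Memory\n", encoding="utf-8")
missing_after = missing_artifacts(root)
```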
```python
from memory_app.core.settings import Settings
from memory_app.core.write_pipeline import SaveEntryRequest, WritePipeline
from memory_app.core.indexer import TopicMemoryBuilder, GlobalMemoryBuilder

settings = Settings.load()
pipeline = WritePipeline(settings)

pipeline.save_entry(
    SaveEntryRequest(
        topic_path="Career/Interviews",
        raw_text="Paste your dialog text here...",
        score=7,
        title=None,
        ts=None,
        source="manual_copy_paste",
    )
)

TopicMemoryBuilder(settings).build_for_topic("Career/Interviews")
GlobalMemoryBuilder(settings).build_global_memory()
print("Memory updated at:", settings.memory_root)
```

Run the GUI:

```shell
python -m memory_app.app.gui
```

In GUI mode:
- Enter `topic_path`
- Paste dialog
- Assign score
- Click Save
- Trigger topic/global rebuild
```shell
pytest
```

Project layout:

```
docs/
  architecture.md   system design doc (contracts, rebuild semantics)
src/memory_app/
  demo.py           minimal end-to-end smoke demo
  app/              GUI entrypoint (Tkinter)
  core/             pipeline, indexing, rendering, schemas
  prompts/          prompt versions (enrich-v1, enrich-v2)
tests/              pytest tests
settings.json       optional non-secret defaults (model, prompt, thresholds)
pyproject.toml      package metadata
.env                local secrets (gitignored)
.gitignore          ignores local data/ and secrets
```
- Secrets (API keys) come from environment variables (optionally loaded from a local `.env`). `settings.json` stores non-secret defaults only.
- `DEEPSEEK_API_KEY` (alias: `DEEPSEEK_KEY`): enable LLM enrichment.
- `MEMORY_ROOT` (alias: `LOCAL_MEMORY_ROOT`): output directory for memory artifacts (default: `./data`).
- `DEEPSEEK_MODEL`: model name (default: `deepseek-chat`).
- `PROMPT_VERSION`: prompt version (default: `enrich.v2`).
- `REQUEST_TIMEOUT_S`: request timeout in seconds (default: `300`).
- `MEMORY_SETTINGS_PATH`: explicit path to a `settings.json` file (optional).
Notes:
- `.env` is optional and is only loaded for local dev convenience (from the current working directory) when calling `Settings.load()`.
- Never put `DEEPSEEK_API_KEY` into `settings.json`.
Highest → lowest:
- Programmatic overrides (e.g. `Settings.load(memory_root_override=...)`)
- Environment variables (including variables loaded from `.env`)
- `settings.json`
- Defaults
`settings.json` discovery order:
- `MEMORY_SETTINGS_PATH` (if set)
- `<MEMORY_ROOT>/settings.json`
- `./settings.json`
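The precedence and discovery rules above can be sketched as a small resolver. This mirrors the documented order only; it is not the actual `Settings.load()` implementation:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

def discover_settings_path(memory_root: str) -> Optional[Path]:
    """First existing candidate wins, per the documented discovery order."""
    env_path = os.environ.get("MEMORY_SETTINGS_PATH")
    candidates = [Path(env_path)] if env_path else []
    candidates += [Path(memory_root) / "settings.json", Path("./settings.json")]
    for path in candidates:
        if path.exists():
            return path
    return None

def resolve(key, memory_root="./data", override=None, default=None):
    """Resolve one setting: override > environment > settings.json > default."""
    if override is not None:
        return override
    if key in os.environ:
        return os.environ[key]
    settings_path = discover_settings_path(memory_root)
    if settings_path is not None:
        value = json.loads(settings_path.read_text(encoding="utf-8")).get(key)
        if value is not None:
            return value
    return default

# Demo against a scratch directory; clear env vars so precedence is deterministic.
os.environ.pop("MEMORY_SETTINGS_PATH", None)
os.environ.pop("DEEPSEEK_MODEL", None)
os.environ.pop("REQUEST_TIMEOUT_S", None)
root = tempfile.mkdtemp()
Path(root, "settings.json").write_text(
    json.dumps({"DEEPSEEK_MODEL": "deepseek-chat"}), encoding="utf-8")
model = resolve("DEEPSEEK_MODEL", memory_root=root, default="fallback")      # from settings.json
timeout = resolve("REQUEST_TIMEOUT_S", memory_root=root, default=300)        # from defaults
```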
Minimal (enable enrichment, keep default ./data output):
```shell
# .env (optional)
DEEPSEEK_API_KEY="your_key_here"
```

Write outputs to a custom directory:

```shell
MEMORY_ROOT="/absolute/path/to/memory-data"
python -m memory_app.demo
```

The project includes a minimal Tkinter GUI for pasting dialog text and generating memory artifacts.
These are additive extensions that preserve the core invariant: append-only persisted artifacts + rebuildable higher layers + stable JSON contracts.
API & integration
- REST API wrapper
Scale & operations
- Background task queue
- Observability hooks
Multi-tenant evolution
- Multi-user isolation
Schema lifecycle
- Schema version migration support
