Context
Problem: NeuroStack has no mechanism for detecting that an agent-written memory's claim has drifted from the current state of the vault content it references. Memories live in memories; notes live in notes; neither is compared against the other.
Concrete failure mode: an agent writes a memory on day T0 ("X is an open blocker, must do Y"). On day T3 the referenced project note is updated to record that Y was done. On day T10 a new session retrieves both the stale memory and a summary of the (now updated) note via vault_context. The consuming LLM has no signal that the memory has been superseded; it trusts both and reports X as still-open work.
Why simpler approaches were rejected in design discussion:
- Flag potentially_stale in vault_context via note.updated_at > memory.created_at + LIKE '%link.md': too noisy (the watcher reindexes on any edit), broken SQL (path collisions on short names like [[index]] / [[inbox]]), wrong timestamp semantics (memory updates wouldn't clear the flag), and no consumer contract (a new JSON field nothing is told to honour).
- Session-end git-commit reconciliation hook — NeuroStack is not git-aware and shouldn't be; it serves users whose source of truth isn't a git repo. Hooks are personal/local and LLM-specific, which is the wrong layer for a core NeuroStack feature.
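To make the path-collision objection concrete, here is a small sketch using Python's sqlite3 module. The notes table and the vault paths are hypothetical, invented purely for illustration; only the LIKE '%link.md' pattern comes from the rejected proposal.

```python
import sqlite3


def like_matches(paths, link):
    """Return every path the rejected LIKE '%<link>.md' pattern would match."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in table; the real schema is not shown in this plan.
    conn.execute("CREATE TABLE notes (path TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO notes VALUES (?)", [(p,) for p in paths])
    rows = conn.execute(
        "SELECT path FROM notes WHERE path LIKE ? ORDER BY path",
        (f"%{link}.md",),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


# Hypothetical vault layout: three different notes end in "index.md".
VAULT = ["index.md", "work/index.md", "work/reindex.md", "inbox.md"]

if __name__ == "__main__":
    # A single [[index]] link matches three unrelated notes.
    print(like_matches(VAULT, "index"))
```

A suffix match cannot tell [[index]] apart from any other note whose filename merely ends in the same characters, which is why the design resolves links through resolve_wiki_link() instead.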
Philosophy-aligned framing: NeuroStack already models "retrieval observation diverged from prediction" as a prediction error — the prediction_errors table is populated at retrieval time when cosine distance exceeds threshold (error_type='low_overlap', 'contextual_mismatch') and consumed via vault_prediction_errors. Memory drift is the same concept applied to the agent-written memory layer. This is the neuroscience-grounded, signal-driven vault-maintenance pattern NeuroStack is built on. Reusing it unifies the signal surface rather than introducing a parallel one.
Intended outcome: when a retrieved memory's embedding has drifted from the current chunk-level embeddings of the notes it references, write a memory_drift row to prediction_errors exactly as low_overlap is written for notes. The existing vault_prediction_errors tool surfaces it on demand. Agents/users reconcile by updating or forgetting the memory via the already-existing vault_update_memory / vault_forget tools. Git and any external-system awareness stay out of core; a narrow optional primitive is provided for users who want to feed evidence from their own hooks.
Design
Primitive: memory_drift as a new prediction-error type
Extend prediction_errors to carry memory-centric rows alongside note-centric ones. Single table, single semantic, single consumer contract.
Schema migration v16 (additive):
    ALTER TABLE prediction_errors RENAME TO prediction_errors_old;

    CREATE TABLE prediction_errors (
        error_id        INTEGER PRIMARY KEY AUTOINCREMENT,
        note_path       TEXT,               -- NULL allowed for memory-centric rows
        memory_id       INTEGER REFERENCES memories(memory_id) ON DELETE CASCADE,
        query           TEXT NOT NULL,
        cosine_distance REAL NOT NULL,
        error_type      TEXT NOT NULL,      -- existing: 'low_overlap', 'contextual_mismatch'
                                            -- new: 'memory_drift', 'memory_external_signal'
        context         TEXT,
        detected_at     TEXT NOT NULL DEFAULT (datetime('now')),
        resolved_at     TEXT,
        CHECK (note_path IS NOT NULL OR memory_id IS NOT NULL)
    );

    INSERT INTO prediction_errors
        SELECT error_id, note_path, NULL, query, cosine_distance, error_type, context, detected_at, resolved_at
        FROM prediction_errors_old;

    DROP TABLE prediction_errors_old;

    CREATE INDEX idx_pred_errors_note ON prediction_errors(note_path);
    CREATE INDEX idx_pred_errors_memory ON prediction_errors(memory_id) WHERE memory_id IS NOT NULL;
    CREATE INDEX idx_pred_errors_type ON prediction_errors(error_type);
    CREATE INDEX idx_pred_errors_unresolved ON prediction_errors(resolved_at) WHERE resolved_at IS NULL;
One-time table rebuild at migration. Existing row shape preserved.
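One way to sanity-check the rebuild is to run it against an in-memory database. The sketch below is illustrative only: the pre-v16 table shape is an assumption inferred from the INSERT ... SELECT above, the memories table is a minimal stand-in, and the index statements are omitted for brevity.

```python
import sqlite3

# ASSUMPTION: pre-v16 shape, inferred from the column list the migration copies.
OLD_SCHEMA = """
CREATE TABLE memories (memory_id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE prediction_errors (
    error_id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_path TEXT NOT NULL,
    query TEXT NOT NULL,
    cosine_distance REAL NOT NULL,
    error_type TEXT NOT NULL,
    context TEXT,
    detected_at TEXT NOT NULL DEFAULT (datetime('now')),
    resolved_at TEXT
);
"""

# Table rebuild from the migration above (indexes left out of this sketch).
MIGRATION_V16 = """
ALTER TABLE prediction_errors RENAME TO prediction_errors_old;
CREATE TABLE prediction_errors (
    error_id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_path TEXT,
    memory_id INTEGER REFERENCES memories(memory_id) ON DELETE CASCADE,
    query TEXT NOT NULL,
    cosine_distance REAL NOT NULL,
    error_type TEXT NOT NULL,
    context TEXT,
    detected_at TEXT NOT NULL DEFAULT (datetime('now')),
    resolved_at TEXT,
    CHECK (note_path IS NOT NULL OR memory_id IS NOT NULL)
);
INSERT INTO prediction_errors
    SELECT error_id, note_path, NULL, query, cosine_distance,
           error_type, context, detected_at, resolved_at
    FROM prediction_errors_old;
DROP TABLE prediction_errors_old;
"""


def migrate(conn):
    """Apply the v16 rebuild; existing rows are copied with memory_id = NULL."""
    conn.executescript(MIGRATION_V16)


def rejects_orphan_row(conn):
    """True if the CHECK constraint refuses a row with neither note nor memory."""
    try:
        conn.execute(
            "INSERT INTO prediction_errors (query, cosine_distance, error_type) "
            "VALUES ('q', 0.9, 'memory_drift')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Running migrate() on a database seeded with a low_overlap row should leave that row intact with memory_id set to NULL, and rejects_orphan_row() should report that the new CHECK constraint is active.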
Detection: retrieval-time writeback (mirrors low_overlap detector)
Fires when memories are retrieved — same pattern as the existing note-centric detector in src/neurostack/search.py:61-68.
Per retrieved memory:
- extract_wiki_links(memory.content), reusing src/neurostack/chunker.py:99.
- Resolve each link to a note path via resolve_wiki_link() in src/neurostack/graph.py:49 (shortest-unique-path resolution via pre-built path_map/stem_map). This avoids the broken-LIKE substring-match pattern.
- If no links resolve, skip (no drift signal available; not an error, just no data).
- Fetch the resolved note's chunk embeddings from the existing chunks.embedding blobs.
- Compute max cosine similarity between the memory's embedding and the note's chunks. If 1 - max_sim > DRIFT_THRESHOLD, write a memory_drift row with memory_id, note_path, synthetic query='memory:{memory_id}', context=json.dumps({"wiki_link": ..., "max_sim": ...}).
- Debounce: before insert, look for an existing unresolved memory_drift row with the same (memory_id, note_path) within the last N hours (24h proposed). If present, update cosine_distance in place (keep strongest signal). Prevents row-flood on hot retrievals.
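The scoring step in the list above can be sketched in plain Python. This is a minimal illustration, not the project's detector: the float32 blob layout is an assumption, the function names are hypothetical, and a real implementation would more likely vectorise the comparison with numpy.

```python
import math
from array import array

DRIFT_THRESHOLD = 0.62  # the existing low_overlap constant this plan proposes reusing


def blob_to_vector(blob):
    """Decode an embedding blob. ASSUMPTION: packed native float32 values."""
    vec = array("f")
    vec.frombytes(blob)
    return list(vec)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drift_distance(memory_vec, chunk_vecs):
    """1 - max cosine similarity between a memory and the chunks of one note."""
    return 1.0 - max(cosine_similarity(memory_vec, c) for c in chunk_vecs)


def has_drifted(memory_vec, chunk_vecs, threshold=DRIFT_THRESHOLD):
    """True when even the best-matching chunk is further away than the threshold."""
    return drift_distance(memory_vec, chunk_vecs) > threshold
```

Taking the maximum over chunks means a memory counts as drifted only when no chunk of the referenced note still resembles it, which keeps long notes with one relevant section from raising the signal.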
Entry points:
- search_memories() in src/neurostack/memories.py
- build_vault_context() in src/neurostack/context.py (after the memories section is assembled)
Graceful degradation: memories without embeddings or without resolvable wiki-links are skipped silently.
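The debounce rule from the detection steps could be implemented as an update-or-insert helper along the following lines. It is a sketch under stated assumptions: record_drift is a hypothetical name, the schema is a trimmed stand-in without the foreign key, and "keep strongest signal" is read here as keeping the larger cosine distance.

```python
import json
import sqlite3

# Trimmed stand-in for the v16 table (foreign key omitted in this sketch).
SCHEMA = """
CREATE TABLE prediction_errors (
    error_id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_path TEXT,
    memory_id INTEGER,
    query TEXT NOT NULL,
    cosine_distance REAL NOT NULL,
    error_type TEXT NOT NULL,
    context TEXT,
    detected_at TEXT NOT NULL DEFAULT (datetime('now')),
    resolved_at TEXT
);
"""


def record_drift(conn, memory_id, note_path, distance, wiki_link, debounce_hours=24):
    """Insert a memory_drift row, or update a recent unresolved one in place."""
    recent = conn.execute(
        """SELECT error_id, cosine_distance FROM prediction_errors
           WHERE error_type = 'memory_drift'
             AND memory_id = ? AND note_path = ?
             AND resolved_at IS NULL
             AND detected_at >= datetime('now', ?)
           ORDER BY detected_at DESC LIMIT 1""",
        (memory_id, note_path, f"-{int(debounce_hours)} hours"),
    ).fetchone()
    if recent is not None:
        error_id, previous = recent
        # ASSUMPTION: the strongest signal is the largest distance seen so far.
        conn.execute(
            "UPDATE prediction_errors SET cosine_distance = ? WHERE error_id = ?",
            (max(previous, distance), error_id),
        )
        return error_id
    cur = conn.execute(
        """INSERT INTO prediction_errors
               (note_path, memory_id, query, cosine_distance, error_type, context)
           VALUES (?, ?, ?, ?, 'memory_drift', ?)""",
        (
            note_path,
            memory_id,
            f"memory:{memory_id}",
            distance,
            json.dumps({"wiki_link": wiki_link, "max_sim": 1.0 - distance}),
        ),
    )
    return cur.lastrowid
```

Repeated retrievals inside the window touch a single row, while a retrieval after the window has elapsed opens a fresh one, so the table records at most one open drift row per (memory, note) pair per window.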
Cost envelope: for vault_context returning ~10 memories with ~3 wiki-links each against notes with ~20 chunks, ~600 cosine ops per call, sub-100ms with SQLite-stored blobs + numpy. Cheaper than the LLM summary step already in vault_session_end.
Resolution: piggyback on existing memory lifecycle
A memory being updated or forgotten IS reconciliation. No new tool needed.
- In update_memory() (src/neurostack/memories.py, around the updated_at/revision_count bump): on successful update, run UPDATE prediction_errors SET resolved_at = datetime('now') WHERE memory_id = ? AND resolved_at IS NULL.
- In forget/delete: ON DELETE CASCADE on the FK cleans up automatically, provided foreign-key enforcement is enabled on the connection (SQLite leaves it off unless PRAGMA foreign_keys = ON is set).
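Both resolution paths can be exercised against a throwaway database. The sketch below uses trimmed stand-in tables and a hypothetical resolve_on_update helper; it does not reproduce the real update_memory(). Note that SQLite only honours ON DELETE CASCADE when foreign-key enforcement is switched on for the connection.

```python
import sqlite3

# Trimmed stand-in tables; only the columns this sketch needs.
SCHEMA = """
CREATE TABLE memories (memory_id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE prediction_errors (
    error_id INTEGER PRIMARY KEY AUTOINCREMENT,
    memory_id INTEGER REFERENCES memories(memory_id) ON DELETE CASCADE,
    error_type TEXT NOT NULL,
    resolved_at TEXT
);
"""


def connect():
    """Open an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # Without this pragma SQLite ignores the FK, so the cascade never fires.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def resolve_on_update(conn, memory_id, new_content):
    """Update a memory and, if it exists, close its open prediction errors."""
    cur = conn.execute(
        "UPDATE memories SET content = ? WHERE memory_id = ?",
        (new_content, memory_id),
    )
    if cur.rowcount == 0:
        return False  # unknown memory: nothing to resolve
    conn.execute(
        "UPDATE prediction_errors SET resolved_at = datetime('now') "
        "WHERE memory_id = ? AND resolved_at IS NULL",
        (memory_id,),
    )
    return True
```

Deleting a row from memories on a connection opened this way removes its prediction_errors rows as well, which is the behaviour the forget path relies on.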
Consumer surface: extend existing tool, don't add new one
vault_prediction_errors (src/neurostack/tools/search_tools.py:543) already accepts error_type and resolve. Minimal extension:
- Add memory_id: int | None = None filter.
- Rows with memory_id IS NOT NULL include a memory sub-object (content, tags, created_at) alongside note_path. Existing callers are unaffected.
Usage (LLM-agnostic, any MCP client):
vault_prediction_errors(error_type="memory_drift", resolved=False, limit=20)
# → memories whose content has drifted from current vault state
# Agent decides: update_memory, forget, or ignore (false positive).
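A possible shape for the extended query, shown as a plain function rather than the real tool: list_prediction_errors is a hypothetical name, the resolved flag mirrors the usage line above, and the memories columns (content, tags, created_at) are taken from the sub-object description. How tags are stored is not specified here, so the value is passed through unchanged.

```python
import sqlite3


def list_prediction_errors(conn, error_type=None, memory_id=None,
                           resolved=False, limit=20):
    """Return prediction errors, attaching a memory sub-object where one exists."""
    clauses = ["pe.resolved_at IS NOT NULL" if resolved else "pe.resolved_at IS NULL"]
    params = []
    if error_type is not None:
        clauses.append("pe.error_type = ?")
        params.append(error_type)
    if memory_id is not None:
        clauses.append("pe.memory_id = ?")
        params.append(memory_id)
    where = " AND ".join(clauses)  # fixed fragments only; values stay parameterised
    sql = f"""
        SELECT pe.error_id, pe.error_type, pe.note_path, pe.cosine_distance,
               pe.memory_id, m.content, m.tags, m.created_at
        FROM prediction_errors pe
        LEFT JOIN memories m ON m.memory_id = pe.memory_id
        WHERE {where}
        ORDER BY pe.detected_at DESC
        LIMIT ?
    """
    results = []
    for row in conn.execute(sql, (*params, limit)):
        item = {
            "error_id": row[0],
            "error_type": row[1],
            "note_path": row[2],
            "cosine_distance": row[3],
        }
        if row[4] is not None:
            # Memory-centric rows carry the memory alongside note_path.
            item["memory_id"] = row[4]
            item["memory"] = {"content": row[5], "tags": row[6], "created_at": row[7]}
        results.append(item)
    return results
```

Because note-centric rows simply lack the memory key, callers that only read the existing fields see the same row shape as before.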
Optional: vault_record_memory_signal for external evidence
A narrow primitive users can target from their own hooks (git commits, issue-tracker events, calendar integrations). Core NeuroStack stays unaware of external systems — the tool just stores what the caller hands it.
    vault_record_memory_signal(
        memory_id: int,
        source: str,            # "git", "linear", "calendar", "manual", ...
        evidence: dict,         # opaque JSON payload
        weight: float = 1.0,    # higher = more confident staleness
    ) -> dict
Writes a prediction_errors row with error_type='memory_external_signal'. Usable by any MCP client for any signal type. Not tied to git, not tied to any specific LLM.
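A rough sketch of what the tool body might do follows. Several details are assumptions the plan does not settle: the function name record_memory_signal, the synthetic query value 'signal:{source}', storing weight in cosine_distance to satisfy the NOT NULL column, the error payload for an unknown memory, and the return shape. The @registry.tool decorator is left out because its signature is not shown here.

```python
import json
import sqlite3


def record_memory_signal(conn, memory_id, source, evidence, weight=1.0):
    """Store caller-supplied staleness evidence against a memory."""
    exists = conn.execute(
        "SELECT 1 FROM memories WHERE memory_id = ?", (memory_id,)
    ).fetchone()
    if exists is None:
        # ASSUMPTION: unknown memories are reported, not silently ignored.
        return {"error": f"memory {memory_id} not found"}
    cur = conn.execute(
        """INSERT INTO prediction_errors
               (memory_id, query, cosine_distance, error_type, context)
           VALUES (?, ?, ?, 'memory_external_signal', ?)""",
        (
            memory_id,
            f"signal:{source}",   # ASSUMPTION: synthetic query, like 'memory:{id}'
            float(weight),        # ASSUMPTION: weight reuses the distance column
            json.dumps({"source": source, "evidence": evidence, "weight": weight}),
        ),
    )
    return {
        "error_id": cur.lastrowid,
        "memory_id": memory_id,
        "error_type": "memory_external_signal",
    }
```

The function stores the evidence dict verbatim and never interprets it, which matches the intent that core stays unaware of where the signal came from.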
Files to change
| File | Change |
| --- | --- |
| src/neurostack/schema.py | MIGRATION_V16, bump SCHEMA_VERSION, update dispatcher |
| src/neurostack/reconcile.py | new. detect_memory_drift(conn, memory_id, content, embedding, threshold, debounce_hours) -> list[dict]. Pure, no tool decorator. |
| src/neurostack/memories.py | Call detector after search_memories results assembled; in update_memory() mark open drift rows resolved |
| src/neurostack/context.py | Call detector after memories section in build_vault_context |
| src/neurostack/tools/search_tools.py | Extend vault_prediction_errors with memory_id filter + memory sub-object in rows |
| src/neurostack/tools/memory_tools.py | Add vault_record_memory_signal (~30 LOC) |
| tests/test_reconcile.py | new. Detection on synthetic memory+chunks, resolution on memory update, debounce |
| CHANGELOG.md | Version entry |
Reuse checklist
- extract_wiki_links (chunker.py:99)
- resolve_wiki_link (graph.py:49)
- prediction_errors table + resolved_at lifecycle (schema.py)
- Cosine threshold 0.62 (existing low_overlap constant)
- vault_prediction_errors consumer tool (tools/search_tools.py:543)
- Retrieval-time writeback pattern, identical to the existing low_overlap detector
- @registry.tool decorator, for vault_record_memory_signal
- Memory embedding already stored in the memories.embedding column
Verification
- Unit test (tests/test_reconcile.py):
  - Memory "X is a blocker in [[test-project]]" embedded; note work/test-project.md chunk "X is a blocker." embedded.
  - Retrieve memory → detector runs → cosine distance small → no row.
  - Update note chunk to "X was resolved.", reindex.
  - Retrieve memory → cosine distance exceeds threshold → memory_drift row.
  - vault_prediction_errors(error_type="memory_drift", resolved=False) returns the row with memory content + note path.
  - vault_update_memory(memory_id, content="X was resolved.") → the prior row has resolved_at set.
  - Debounce: repeated retrievals within 24h update cosine_distance in place, no duplicate rows.
- Cost check: measure vault_context wall time with and without detector enabled. Budget: <200ms added for a 10-memory response.
- False-positive rate: run against real vault data and sample manually. If the 0.62 threshold produces unacceptable noise when embeddings of short, focused memories are compared with embeddings of larger chunks, expose DRIFT_THRESHOLD via config.toml.
Out of scope
- Session-end hook scripts: personal/local, not core.
- Any Claude Code / specific-LLM integration.
- Git awareness inside NeuroStack: users feed git signals via vault_record_memory_signal if they want.
- Retrieval-time potentially_stale response field: reconciliation stays a deliberate read of vault_prediction_errors, not an always-on flag.
- Changes to memories schema (no resolved_at column on memories): the signal lives on prediction_errors; the memory itself is updated/forgotten via existing tools.
Open questions
- Threshold: reuse 0.62 or pick a dedicated MEMORY_DRIFT_THRESHOLD? Memories tend to be shorter and more focused than note chunks, so their embeddings may not compare like-for-like and 0.62 may be too lax. Proposal: ship with 0.62, expose via config, tune after running against live data.
- Debounce window: 24h proposed. Too short → row churn; too long → stale signal.
- Migration risk: v16 rebuilds prediction_errors. Users with thousands of rows will see a multi-second blocking migration at first run. Acceptable for a minor-version bump?