Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
147 changes: 147 additions & 0 deletions BACKLOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# ShrimPK Backlog

Tracked items for the ShrimPK kernel project. Updated after each sprint.
Source of truth for the ShrimPK kernel project.

## Status Legend
- **DONE** — shipped and tested
- **PLANNED** — scheduled for a specific sprint
- **BACKLOG** — accepted, not yet scheduled
- **RESEARCH** — needs investigation before scheduling

---

## Sprint Roadmap (KS73-KS80)

- [x] KS73: Entity unification — EntityFrame, UUID v5, alias store, entity supersession (PR #10)
- [ ] KS74: v0.8.0-beta — recall gap fixes (NR demotion, abstention threshold), TUI dashboard, README rewrite, installer testing
- [ ] KS75: Store-time contradiction detection
- [ ] KS76: Memory import — cold start solver, 4+ parsers (Claude Code, ChatGPT, Obsidian, Mem0)
- [ ] KS77: KU-3 fix + remaining recall fixes (90% gate)
- [ ] KS78: Public launch preparation
- [ ] KS79: Context compression — LLMLingua-2 ONNX at store time
- [ ] KS80: Retroactive link re-scoring + sleep replay

---

## HIGH — Retrieval Quality

*Components exist, need wiring. Validated by academic research.*

- [ ] PPR-weighted Hebbian traversal — Personalized PageRank seeded on echo hits, weighted by edge strength x ACT-R. +20% multi-hop QA (HippoRAG, NeurIPS 2024)
- [ ] Multi-resolution retrieval fallback — memory → label cluster → community summary cascade. All three layers exist, not connected as fallback chain (RAPTOR, ICLR 2024)
- [ ] Retrieval mode parameter — expose naive/local/global/hybrid on `echo` API (LightRAG, EMNLP 2025)
- [ ] Citation-weighted memory scoring — track which injected memories LLM actually cites in response, upweight high-utility memories. Proxy already intercepts responses (RMM, ACL 2025)

## HIGH — Memory Lifecycle

- [ ] Merge operation — explicit ADD/UPDATE/DELETE/NOOP diff during consolidation. All production systems converge on merge as required (Mem0, RMM, Think-in-Memory)
- [ ] Multi-granularity storage — tag memories by scale: utterance/turn/session. +10% LME accuracy (RMM paper)
- [ ] Write-path learned filtering — decide what NOT to store before embedding. Most underresearched area per 2026 survey (arXiv 2603.07670)
- [ ] Soft-deletion compaction — GC when FSRS strength drops below threshold. Currently decay only de-ranks, never removes (MemoryBank pattern)

## HIGH — Cortex Prerequisites (blocking v0.10.0)

- [ ] Inter-layer protocol design — Soul ↔ Brain ↔ Memory API surface. Direct Rust calls vs Tokio channels vs message types
- [ ] Security model for agentic stack — data safety layer, poisoned memory detection. Distinct from command-level Brainstem
- [ ] Alpha/Beta ARC competition model — formal design doc. Async parallelism, leader election, Adaptive Resonance Theory mapping

## MEDIUM — Model & Format Upgrades

- [ ] Nomic Embed Vision v1.5 — CLIP ViT-B/32 → Nomic, +7.8pp ImageNet zero-shot, 6x smaller ONNX. Breaking: 512→768 dim migration
- [ ] f16 quantization for vision/speech — SHRM v3, ~50% disk/RAM savings, f32 promotion at query time
- [ ] Band-limited resampling — replace resample_linear() with rubato crate. Correctness bug: aliasing at 48→16kHz
- [ ] BuiltinConsolidator — bundled extraction model, zero Ollama dependency for consolidation quality
- [ ] Configurable embedding provider — EmbeddingProvider trait, 10 fastembed models + OpenAI API (KS75 — DONE)

## MEDIUM — Graph & Entity

- [ ] Retroactive link invalidation — when A supersedes B, downweight ALL B-anchored Hebbian links, not just B itself (A-MEM/Zettelkasten pattern)
- [ ] Episodic anchoring — bidirectional indices linking Hebbian edges back to source episodes (Graphiti/Zep pattern)
- [ ] Entity-cluster summaries — entity-level community nodes, not just label-level (Graphiti temporal KG)

## MEDIUM — Viz & UI Polish

*Current state: Tauri 2 + Sigma.js 3.0 + ForceAtlas2, 3-level zoom (KS65-66). Functional but early MVP.*

**Graph Polish:**
- [ ] Smooth view transitions — animated node repositioning between galaxy/cluster/neighborhood (currently hard-resets layout)
- [ ] Louvain community visualization — color nodes by community, show boundaries (graphology-communities-louvain installed, unused)
- [ ] Edge labels on hover — show typed relationship (CoActivation, WorksAt, PrefersTool, etc.)
- [ ] Temporal slider — filter graph by time range, animate memory formation over time
- [ ] Custom node shapes per category — distinct shapes for Identity/Fact/Preference/ActiveProject/Conversation
- [ ] Entity super-nodes — render EntityFrame nodes at graph level, not just label clusters
- [ ] Node size by echo frequency — proportional to retrieval count, not just importance score

**Memory Curation:**
- [ ] Inline memory edit — edit content/labels from detail panel, PATCH endpoint on daemon
- [ ] Memory merge — select 2+ nodes, merge into one (new daemon endpoint)
- [ ] Manual link creation — create Hebbian edges from graph view (new daemon endpoint)
- [ ] Retag from graph — drag-drop between clusters or multi-select retag
- [ ] Bulk operations — multi-select for delete/retag/export

**Export Formats:**
- [ ] JSON export per memory — full metadata + embeddings + graph edges
- [ ] Graph export — GraphML/GEXF for external visualization tools

## MEDIUM — Quantization (v0.8.0)

- [ ] Int8 scalar quantization (4x compression, simsimd ready)
- [ ] TurboQuant integration (turbo-quant crate, 8-10x)
- [ ] Binary + float32 rescore pipeline

## MEDIUM — Intelligence Tuning

- [ ] Full ACT-R retrieval history (Vec<u32> ring buffer)
- [ ] ACT-R activation ON by default (after benchmarking)
- [ ] Three-tier store (hot/warm/cold)
- [ ] Importance retrieval boost (A/B test, then enable)

## MEDIUM — Product & Distribution

- [ ] Memory file export as .md sidecars — per-memory files with YAML frontmatter (distinct from bulk `shrimpk dump`)
- [ ] Cloud sync — encrypted cross-device memory, E2E encrypted, server sees only ciphertext
- [ ] Managed API planning
- [ ] Revenue model implementation

## MEDIUM — Benchmarks Not Yet Running

- [ ] LoCoMo benchmark
- [ ] MemoryAgentBench (ICLR 2026) — contradiction/conflict resolution focus
- [ ] EverMemBench (2025) — entity disambiguation focus

## LOW — Backlog

- [ ] Memory as weights prototype (PyTorch via shrimpk-python)
- [ ] Cluster summary tree (MemTree pattern)
- [ ] Custom fine-tuned embedding model
- [ ] crates.io publish (after API stabilizes)
- [ ] Code signing certificate
- [ ] PostToolUse async hook
- [ ] Predictive coding layer — surprise/prediction error signal (~300 lines Rust)
- [ ] Session-level dynamics tracking (COMEDY pattern — user-bot relationship)
- [ ] Emotion channel — Apache 2.0 ONNX model needed (slot reserved in SHRM)
- [ ] CAM++ speaker model upgrade — needs Apache 2.0 ONNX verification
- [ ] SigLIP 2 vision model — needs upstream ONNX availability

## RESEARCH — Long-horizon

- [ ] Causal retrieval — retrieve by causal relevance, not just similarity (2026 survey frontier)
- [ ] Model weight printing — cross-model knowledge transfer via externalized Hebbian weights
- [ ] PyTorch cross-attention memory module — ShrimPK as transformer memory (v1.0+ ML stage)
- [ ] GAAMA paper (arXiv 2603.27910) — concept-mediated KG with 4 node types, very close to ShrimPK architecture
- [ ] Reflexion pattern — self-improvement via failure memories (Shinn et al. 2023)
- [ ] Interleaved replay during sleep consolidation — novel-familiar mixing (neuroscience pattern)
- [ ] EWC (Elastic Weight Consolidation) — prevent catastrophic forgetting in Hebbian updates (Nature Comms 2025)

---

## Sync Issues (fix before next release)

- [ ] `docs/ROADMAP.md` stale at v0.5.0 — update to reflect v0.7.5 state
- [ ] `CHANGELOG.md` stops at v0.7.0 — missing v0.7.1 through v0.7.5
- [ ] MCP tool count inconsistent across docs (12 vs 14)

---

*Last updated: 2026-04-09*
29 changes: 22 additions & 7 deletions benchmarks/cross_model_smoke.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,18 +93,33 @@ def format_context(echo_results):
return "\n\n".join(kept) if kept else "No relevant memories found."


READER_SYSTEM_PROMPT = (
"You are extracting facts from conversation memories. "
"The answer is contained in the memories below. "
"Focus on what the USER said — user statements contain personal facts. "
"Extract the specific answer. Respond in one short sentence."
)

READER_USER_TEMPLATE = (
"Context:\n"
"-----\n"
"{context}\n"
"-----\n"
"\n"
"Given only the context above and not prior knowledge, extract the answer.\n"
"Question: {question}\n"
"Answer:"
)


def ask_ollama(question, context, model):
system = ("You are extracting facts from conversation memories. "
"The answer is contained in the memories below. "
"Focus on what the USER said -- user statements contain personal facts. "
"Extract the specific answer. Respond in one short sentence.")
user = f"{context}\n\nQuestion: {question}\nBased on the memories above, the answer is:"
t0 = time.time()
r = requests.post(f"{OLLAMA}/api/chat", json={
"model": model,
"messages": [
{"role": "system", "content": system},
{"role": "user", "content": user},
{"role": "system", "content": READER_SYSTEM_PROMPT},
{"role": "user", "content": READER_USER_TEMPLATE.format(
context=context, question=question)},
],
"stream": False,
"options": {"temperature": 0.0, "num_predict": 64},
Expand Down
41 changes: 23 additions & 18 deletions benchmarks/run_longmemeval.py
Original file line number Diff line number Diff line change
Expand Up @@ -146,33 +146,38 @@ def truncate_context(context_parts, max_total=16000, max_per_item=3000):
return truncated


def ask_ollama(question, context, model="gemma3:1b"):
"""Ask Ollama to answer based on retrieved context."""
system_prompt = (
"You are answering questions about past conversations. "
"Use ONLY the retrieved conversation memories below to answer. "
"If the information is not in the memories, say you don't have that information. "
"Be concise and direct. Give short factual answers, ideally one sentence."
)

user_prompt = f"""Retrieved memories:
READER_SYSTEM_PROMPT = (
"You are extracting facts from conversation memories. "
"The answer is contained in the memories below. "
"Focus on what the USER said — user statements contain personal facts. "
"Extract the specific answer. Respond in one short sentence."
)

READER_USER_TEMPLATE = (
"Context:\n"
"-----\n"
"{context}\n"
"-----\n"
"\n"
"Given only the context above and not prior knowledge, extract the answer.\n"
"Question: {question}\n"
"Answer:"
)

{context}

Question: {question}

Answer in one short sentence."""

def ask_ollama(question, context, model="gemma3:1b"):
"""Ask Ollama to answer based on retrieved context."""
r = requests.post(
f"{OLLAMA_URL}/api/chat",
json={
"model": model,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
{"role": "system", "content": READER_SYSTEM_PROMPT},
{"role": "user", "content": READER_USER_TEMPLATE.format(
context=context, question=question)},
],
"stream": False,
"options": {"temperature": 0.1, "num_predict": 128},
"options": {"temperature": 0.0, "num_predict": 64},
},
timeout=300,
)
Expand Down
36 changes: 22 additions & 14 deletions benchmarks/run_longmemeval_v2.py
Original file line number Diff line number Diff line change
Expand Up @@ -191,27 +191,35 @@ def format_context(echo_results):
# FIX 2: Extraction-focused prompt — no refusal, positive framing
# ---------------------------------------------------------------------------

def ask_ollama(question, context, model="gemma3:1b"):
"""Ask Ollama with extraction-focused prompt."""
system_prompt = (
"You are extracting facts from conversation memories. "
"The answer is contained in the memories below. "
"Focus on what the USER said — user statements contain personal facts. "
"Extract the specific answer. Respond in one short sentence."
)
READER_SYSTEM_PROMPT = (
"You are extracting facts from conversation memories. "
"The answer is contained in the memories below. "
"Focus on what the USER said — user statements contain personal facts. "
"Extract the specific answer. Respond in one short sentence."
)

READER_USER_TEMPLATE = (
"Context:\n"
"-----\n"
"{context}\n"
"-----\n"
"\n"
"Given only the context above and not prior knowledge, extract the answer.\n"
"Question: {question}\n"
"Answer:"
)

user_prompt = f"""{context}

Question: {question}
Based on the memories above, the answer is:"""

def ask_ollama(question, context, model="gemma3:1b"):
"""Ask Ollama with extraction-focused prompt."""
r = requests.post(
f"{OLLAMA_URL}/api/chat",
json={
"model": model,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
{"role": "system", "content": READER_SYSTEM_PROMPT},
{"role": "user", "content": READER_USER_TEMPLATE.format(
context=context, question=question)},
],
"stream": False,
"options": {"temperature": 0.0, "num_predict": 64},
Expand Down
31 changes: 30 additions & 1 deletion crates/shrimpk-core/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,35 @@ use std::path::PathBuf;
/// Default maximum disk usage: 2 GB.
const DEFAULT_MAX_DISK_BYTES: u64 = 2_147_483_648;

/// Temporal query keywords shared between echo scoring (`apply_temporal_boost`)
/// and memory reformulation (`detect_temporal_keyword`).
///
/// Single source of truth to prevent keyword vocabulary mismatch (KS76 Track 2).
pub const TEMPORAL_QUERY_KEYWORDS: &[&str] = &[
"yesterday",
"today",
"last week",
"last month",
"last year",
"recently",
"just now",
"this morning",
"this week",
"this month",
"days ago",
"weeks ago",
"months ago",
"deadline",
"upcoming",
"when",
"scheduled",
"date",
"due",
"plan",
"next week",
"next month",
];

/// Reranker backend for echo result reranking.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default)]
pub enum RerankerBackend {
Expand Down Expand Up @@ -458,7 +487,7 @@ impl Default for EchoConfig {
use_power_law_decay: default_true(),
use_importance: default_true(),
activation_weight: default_activation_weight(),
importance_weight: 0.0,
importance_weight: 0.25,
use_full_actr_history: false,
community_summaries_enabled: default_true(),
community_summary_threshold: default_community_summary_threshold(),
Expand Down
5 changes: 3 additions & 2 deletions crates/shrimpk-core/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,9 @@ pub mod traits;

// Re-export commonly used types at crate root
pub use config::{
EchoConfig, EmbeddingBackend, FileConfig, QuantizationMode, RerankerBackend, config_dir,
config_path, disk_usage, load_config_file, resolve_config, save_config_file,
EchoConfig, EmbeddingBackend, FileConfig, QuantizationMode, RerankerBackend,
TEMPORAL_QUERY_KEYWORDS, config_dir, config_path, disk_usage, load_config_file, resolve_config,
save_config_file,
};
pub use entity::{EntityFrame, EntityId, EntityKind};
pub use error::{Result, ShrimPKError};
Expand Down
Loading
Loading