diff --git a/docs/architecture/BRAIN-REGIONS-SUBSTRATE.md b/docs/architecture/BRAIN-REGIONS-SUBSTRATE.md new file mode 100644 index 000000000..fa18d78ed --- /dev/null +++ b/docs/architecture/BRAIN-REGIONS-SUBSTRATE.md @@ -0,0 +1,242 @@ +# Brain-Regions Substrate + +**Status:** design spec. Sibling to [CBAR-SUBSTRATE-ARCHITECTURE.md](CBAR-SUBSTRATE-ARCHITECTURE.md) and [GENOME-FOUNDRY-SENTINEL.md](GENOME-FOUNDRY-SENTINEL.md). Defines the structural contract that every cognitive subsystem (hippocampus, motor cortex, attention, sensory, sleep) inherits. No code changes from this PR — implementation slices follow per region. + +**Companion:** [COGNITION-ALGORITHMS.md](COGNITION-ALGORITHMS.md) — the algorithmic content (recall, cross-context, budget) that runs *inside* these regions. + +## Headline framing + +> *An infinitely unlimited persona, for any channel — like a person observing many things, watching TV, many messaging systems, social media, and walking around doing their job.* — Joel, 2026-05-29 + +A real mind doesn't *look up* memories when it needs them. Relevant context is *already present*, biased by attention and recent activity. A real mind doesn't *poll* for actions — candidate utterances and plans are *already partially formed* by the time the moment to speak arrives. A real mind doesn't *isolate* what it sees in one channel from what it said in another — cross-pollination is the default, focus is what's earned by salience. + +This substrate is the RTOS-shaped scaffolding that makes those properties cheap to implement and impossible to violate. Every cognitive subsystem is its own region, with its own tick, on its own tokio task, governed by the same `SubstrateGovernor`. They communicate by writing to shared per-persona state, not by RPC-calling each other on the hot path. + +## Doctrine (carried from #1469 addendum) + +> **No region of cognition runs on the hot path. Each region is its own RTOS task with its own tick. 
The handler dispatches and reads pre-staged results. The handler never blocks on recall, embedding, planning, or admission — those are continuously produced by their owning regions, in parallel, governed by `SubstrateGovernor`.** + +The handler's job is to *dispatch and integrate*, not to *think*. Thinking happens in the regions, continuously, in parallel. + +## The region trait + +Every region implements one trait. The trait is intentionally narrow — the heavy machinery lives in the substrate. + +```rust +#[async_trait] +pub trait BrainRegion: Send + Sync + 'static { + /// Stable identifier. Used by SubstrateGovernor for policy lookup and by + /// telemetry/log streams. + fn id(&self) -> RegionId; + + /// Pressure footprint declaration. Returned at registration time and + /// re-queried by the governor when pressure shifts. + fn pressure_profile(&self) -> PressureProfile; + + /// Run one tick. The substrate calls this on the region's own task at + /// the cadence governed by SubstrateGovernor. The body is responsible + /// for: reading inputs (from shared state, channels, or its own DB), + /// producing pre-staged results, and publishing them to the ready-buffer. + /// + /// Implementations MUST be idempotent on early return and MUST NOT block + /// indefinitely — the governor cancels long-running ticks under pressure. + async fn tick(&self, ctx: &RegionContext) -> TickOutcome; + + /// React to a substrate-level signal (persona created/destroyed, system + /// load changed, sleep/wake transition). Most regions can default this + /// to a no-op. + async fn on_signal(&self, _signal: RegionSignal) -> Result<(), RegionError> { + Ok(()) + } +} +``` + +`TickOutcome` returns yield telemetry the governor uses to learn budget allocation (see algorithm 7 in COGNITION-ALGORITHMS.md): + +```rust +pub struct TickOutcome { + /// Items the region pre-staged this tick. 
+ pub published: usize, + /// Items in the region's ready-buffer that have been consumed by handlers + /// since the last tick. Drives the governor's yield-learning loop. + pub consumed_since_last: usize, + /// Pressure observation. If the region detected backpressure (DB slow, + /// embedding queue full, etc.), reports it here for the governor. + pub pressure_observed: Option, + /// Optional next-tick hint (region requests faster/slower cadence than + /// current; governor may honor or override). + pub cadence_hint: Option, +} +``` + +## The "for free" triplet + +Per the CBAR pattern, adding a new region must be cheap: + +1. **Base trait** (`BrainRegion`) — defined above. Inherits tick lifecycle, pressure registration, ready-buffer publishing, governor integration. No region implements its own scheduler. +2. **Derive macro** (`#[derive(BrainRegion)]` planned) — for regions that only need to override `tick()`, the macro generates registration boilerplate from `#[region(id = "hippocampus", pressure = "memory-heavy")]` attributes. +3. **Scaffold generator** (`cargo run -p substrate-cli new-region `) — emits the module file, a smoke test, a CLI command shim, and a TS binding stub. The new region compiles and runs with a no-op tick on first commit. + +Same pattern as `engram-analyzer` in CBAR-SUBSTRATE — by the time a contributor authors the interesting body, scheduling/pressure/telemetry/binding are already wired. + +## The ready-buffer contract + +Regions publish pre-staged results to a typed ready-buffer keyed by `(persona_id, channel_id, ...)`. Handlers read from the buffer synchronously and cheaply. + +```rust +pub trait ReadyBuffer: Send + Sync { + type Key: Hash + Eq + Clone; + type Value: Clone; + + /// Synchronous read. Returns the freshest staged value for the key, or + /// None. Handlers call this on the hot path — it MUST NOT block, MUST + /// NOT await, and MUST complete in microseconds. 
Implementations use + /// DashMap, ArcSwap, or per-key atomic snapshots. + fn peek(&self, key: &Self::Key) -> Option<Self::Value>; + + /// Region-side write. Atomically replaces the value for the key. Old + /// value is dropped. Publishes a `ReadyBufferUpdated` event for + /// telemetry + cross-region awareness (algorithm 7 yield-learning). + fn publish(&self, key: Self::Key, value: Self::Value); + + /// TTL-style eviction sweep. Called by the governor under memory + /// pressure or on persona destruction. + fn evict_stale(&self, max_age: Duration) -> usize; +} +``` + +### Semantic rules + +- **Empty buffer is a signal, not a block.** If a handler reads and gets `None`, it proceeds with whatever degraded path the algorithm specifies (e.g., the chat handler proceeds with bare conversational history; motor cortex returns the inference's raw output without re-ranking). An empty read also emits a `BufferMissed` event, which the governor uses to upweight that region's budget. +- **Staleness is acceptable.** A ready value might be 100ms old. That's *better* than blocking the handler 500ms to recompute. Slightly-stale context > stalled persona. +- **Per-region buffers, not a global one.** Hippocampus has its own buffer (engram-prefetch). Motor cortex has its own (candidate-utterances). Attention has its own (salience-map). They share the same trait shape but live in their own region structs. + +## Shared per-persona state + +The regions communicate by writing/reading per-persona state. The state lives in one place, owned by no region in particular, accessible to all: + +```rust +pub struct PersonaCognition { + /// Long-term engram store. Hippocampus writes (admission), all regions + /// can read (recall). Append-only with eviction policy in algorithm 4. + pub engrams: Arc, + + /// Working memory: short-lived thoughts/observations not yet consolidated. + /// Sensory writes, hippocampus snoops + consolidates to engrams.
+ pub working: Arc, + + /// Salience map: per-engram + per-channel salience score, updated by + /// user reactions, structural centrality, rehearsal. Read by hippocampus + /// recall scoring (algorithm 4) and attention (algorithm 2). + pub salience: Arc, + + /// LoRA genome state: which adapters are loaded, blend weights. Written + /// by genome region (when shipped), read by inference (algorithm 6). + pub genome: Arc, + + /// Persona vital signs: energy, mood, attention focus. Drives + /// cadence-modulation across regions. + pub vitals: Arc>, +} +``` + +### Write-conflict policy + +Multiple regions writing the same per-persona state in parallel needs a rule: + +- **Engrams**: append-only. No conflicts. Each region appends with its own region-tag. +- **Working memory**: bounded ring buffer. Older entries fall off. Hippocampus consolidation drains explicitly. +- **Salience map**: per-engram atomic counters. CRDT-like semantics (counter increments commute). +- **Genome state**: serialized through the genome region. Other regions request changes via a typed channel; genome region applies them on its tick. +- **Vitals**: RwLock. Most regions only read; vitals region writes. + +The rule: shared state shape MUST allow concurrent writes from independent ticks without coordination. If a new region needs to write something that doesn't fit, the substrate work is to design a CRDT-shaped surface for it, NOT to add locks. + +## Region inventory (current + planned) + +| Region | Status | Tick body | Reads | Writes | +|---|---|---|---|---| +| **Hippocampus** | exists request/response (`modules/memory.rs`); needs continuous tick body ported from TS `Hippocampus.ts:413` | Snoop working memory → consolidate engrams. Pre-load anticipatory recall (algorithms 1-5). | `working`, `engrams`, `salience`, channel activity | `engrams` (appends), engram-prefetch ready-buffer | +| **Sensory (vision)** | `modules/vision.rs` exists with own tick | Pre-compute features for incoming images. 
| image stream | feature ready-buffer, `working` (observations) | +| **Sensory (embedding)** | `modules/embedding.rs` exists with own tick | Pre-compute embeddings for incoming text. | text stream | embedding ready-buffer, `working` | +| **Channel (producer)** | `modules/channel.rs` exists, 60s tick | DB poll, self-task gen, training checks. | DB | per-persona channel queues | +| **Persona service (consumer dispatch)** | `persona/service_module.rs` (this PR's predecessor) | Pop item → route by domain → call handler → record outcome. NO heavy lifting. | channel queues, ready-buffers | outcome log | +| **Motor cortex** | NOT YET — sibling slice | Continuously score candidate utterances/actions against current context. Predictive priming (algorithm 5). | `working`, attention salience, channel partial-message stream | candidate ready-buffer | +| **Attention** | NOT YET — sibling slice | Maintain salience map. Update per user reactions, self-tags, structural centrality, rehearsal. Bias hippocampus prefetch. | `engrams`, channel reactions, recall co-occurrence | `salience` | +| **Sleep policy** | NOT YET — sibling slice | When persona idle: deeper consolidation, semantic re-clustering, engram pruning. When active: gates regions to active-mode tick bodies. | `vitals`, channel activity rate | region cadence policy, consolidation depth | +| **Genome** | partial (LoRA paging exists in TS); Rust port pending | LRU paging of adapters, multi-LoRA blend on demand. | task domain hints, salience | `genome` | + +Every row in this table is its own implementation slice with its own card. None of them is the persona handler. The handler stays small. + +## SubstrateGovernor integration + +`SubstrateGovernor` (defined in GENOME-FOUNDRY-SENTINEL.md §SubstrateGovernor) owns hardware-tier policy: same Rust code on a MacBook Air and an RTX 5090, different governor policy. It also owns runtime budget allocation across regions. 
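The division of labor described so far can be sketched with the standard library alone. Regions tick on their own tasks and publish into per-region ready-buffers, and the handler only peeks. This is a minimal sketch, not the substrate's implementation. It makes several assumptions. `std::thread` stands in for tokio tasks. An `RwLock<HashMap>` stands in for the DashMap/ArcSwap options named in the ready-buffer contract. The `"persona-a/chat"` string key stands in for `(persona_id, channel_id)`. `InMemoryReadyBuffer` and `run_toy_region` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical in-memory ready-buffer: each key holds the freshest value
/// plus the instant it was published, so stale entries can be swept.
pub struct InMemoryReadyBuffer<K, V> {
    slots: RwLock<HashMap<K, (V, Instant)>>,
}

impl<K: Hash + Eq + Clone, V: Clone> InMemoryReadyBuffer<K, V> {
    pub fn new() -> Self {
        Self { slots: RwLock::new(HashMap::new()) }
    }

    /// Handler-side read: a short read lock and a clone, never an await.
    /// `None` means "nothing staged yet"; the caller takes its degraded path.
    pub fn peek(&self, key: &K) -> Option<V> {
        self.slots.read().unwrap().get(key).map(|(v, _)| v.clone())
    }

    /// Region-side write: replaces any previous value for the key.
    pub fn publish(&self, key: K, value: V) {
        self.slots.write().unwrap().insert(key, (value, Instant::now()));
    }

    /// Drops entries older than `max_age` and returns how many were removed.
    pub fn evict_stale(&self, max_age: Duration) -> usize {
        let mut slots = self.slots.write().unwrap();
        let before = slots.len();
        slots.retain(|_, (_, at)| at.elapsed() <= max_age);
        before - slots.len()
    }
}

/// Runs a toy "region" on its own thread for `ticks` iterations. Each tick
/// publishes one pre-staged string, and the function returns the buffer
/// for a handler to peek.
pub fn run_toy_region(ticks: usize) -> Arc<InMemoryReadyBuffer<String, String>> {
    let buffer: Arc<InMemoryReadyBuffer<String, String>> = Arc::new(InMemoryReadyBuffer::new());
    let region_buffer = Arc::clone(&buffer);
    let handle = thread::spawn(move || {
        for tick in 0..ticks {
            region_buffer.publish(
                "persona-a/chat".to_string(),
                format!("prefetched context from tick {tick}"),
            );
        }
    });
    handle.join().unwrap();
    buffer
}
```

After `run_toy_region(3)`, peeking `"persona-a/chat"` returns the tick-2 value and any other key returns `None`. A std `RwLock` read can briefly wait on a concurrent writer, and that is why the contract names ArcSwap-style snapshots for the strict never-block case.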
+ +### Policy slots + +The governor exposes a policy slot per region. The slot determines: + +- **Tick cadence** — how often `tick()` is invoked. May differ by persona vitals (active 100ms, idle 1s, sleep 10s). +- **Per-tick budget** — wall-clock budget the tick is allowed before the governor cancels it. +- **Pressure responses** — how the region should degrade under pressure (skip consolidation, reduce recall depth, etc.). +- **Yield weighting** — how much weight to give this region's `consumed_since_last` when arbitrating budget against other regions (algorithm 7). + +### Yield-learning loop + +The governor reads `TickOutcome.consumed_since_last` from every region after every tick. Regions whose ready-buffer is being read by handlers get budget upweighted; regions whose published values are ignored get downweighted. The learning rule is in algorithm 7 (COGNITION-ALGORITHMS.md). The substrate effect is that **the brain learns to spend compute on the regions that recently mattered, without hand-tuning**. + +## Telemetry surface + +Every region emits structured telemetry on a fixed shape: + +```rust +pub struct RegionTelemetry { + pub region_id: RegionId, + pub persona_id: Uuid, + pub tick_started_at: SystemTime, + pub tick_duration: Duration, + pub published: usize, + pub consumed_since_last: usize, + pub buffer_misses_since_last: usize, // handlers that read None + pub pressure_observed: Option, +} +``` + +Surfaces: + +- **`./jtag region/stats`** — current region health across all personas +- **`./jtag region/yield --persona=`** — per-region consumption rates for one persona +- **substrate event stream** — `RegionTickCompleted`, `ReadyBufferUpdated`, `BufferMissed` events for cross-region awareness + governor input + +Telemetry is mandatory for every region; it's the only way the yield-learning loop and the operator debugging path work. The derive macro generates the telemetry emission automatically. 
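The yield-learning loop consumes `published` and `consumed_since_last` from each region's tick. Below is a hedged sketch of one possible rule. The rule is an assumption for illustration only: an exponential moving average of the consumed/published ratio per region, then budget shares proportional to that yield plus a floor. The spec defers the actual learning rule to algorithm 7. `YieldLearner`, `alpha`, and `min_share` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical yield learner. It tracks a smoothed "consumed / published"
/// ratio per region and turns it into shares of a global tick budget.
pub struct YieldLearner {
    /// EMA smoothing factor in (0, 1]; higher reacts faster.
    alpha: f32,
    /// Floor so an ignored region keeps a trickle of budget and can recover.
    min_share: f32,
    yields: HashMap<String, f32>,
}

impl YieldLearner {
    pub fn new(alpha: f32, min_share: f32) -> Self {
        Self { alpha, min_share, yields: HashMap::new() }
    }

    /// Folds one TickOutcome-style observation into the region's yield.
    pub fn observe(&mut self, region: &str, published: usize, consumed_since_last: usize) {
        // Fraction of staged items that handlers actually read. It is capped
        // at 1.0 because consumption can draw on items staged in earlier ticks.
        let ratio = if published == 0 {
            0.0
        } else {
            (consumed_since_last as f32 / published as f32).min(1.0)
        };
        let alpha = self.alpha;
        // The first observation seeds the EMA with the ratio itself.
        let y = self.yields.entry(region.to_string()).or_insert(ratio);
        *y = alpha * ratio + (1.0 - alpha) * *y;
    }

    /// Splits `total_ms` of tick budget across known regions in proportion
    /// to (min_share + yield).
    pub fn allocate(&self, total_ms: f32) -> HashMap<String, f32> {
        let weights: Vec<(&String, f32)> = self
            .yields
            .iter()
            .map(|(id, y)| (id, self.min_share + y))
            .collect();
        let sum: f32 = weights.iter().map(|(_, w)| w).sum();
        weights
            .into_iter()
            .map(|(id, w)| (id.clone(), if sum > 0.0 { total_ms * w / sum } else { 0.0 }))
            .collect()
    }
}
```

With `alpha = 0.5` and `min_share = 0.1`, suppose a hippocampus tick has 9 of 10 items consumed and a motor-cortex tick has 1 of 10. The hippocampus then gets about 83 of every 100 ms. The floor is a design choice so that a region whose buffer is currently ignored still ticks often enough to show that it has become useful again.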
+ +## What this enables + +The end state, when motor cortex + attention + hippocampus + sleep all ship as siblings: + +- A handler dispatched at T=0 reads the candidate-utterance ready-buffer; motor cortex already scored 3 candidates at T=-50ms based on the partial message stream. +- The candidate scoring used the engram ready-buffer; hippocampus pre-loaded relevant engrams at T=-200ms based on attention salience and the channel's recent topic vector. +- The hippocampus prefetch was biased by salience the attention region updated at T=-1s in response to a user reaction. +- All of this happened in parallel on independent tokio tasks. The handler's hot path was: peek 2 buffers + call inference. The "thinking" was already done. + +This is what makes the difference between *retrieval* and *recognition* — between a persona that *responds* and one that *anticipates*. + +## Implementation cards (this PR does NOT ship them) + +- **L0-3a** — Hippocampus continuous tick port to `modules/memory.rs`. Implements algorithms 1, 2, 3, 4, 5 from COGNITION-ALGORITHMS.md. +- **L0-3b** — Recall query schema + scoring (algorithms 1 + 2 + 3 wire-level). +- **L0-4a** — Motor cortex ServiceModule. Implements algorithm 5 applied to action selection. +- **L0-4b** — Attention ServiceModule. Implements salience map maintenance feeding algorithm 4. +- **L0-4c** — SubstrateGovernor yield-learning loop. Implements algorithm 7. +- **L0-4d** — Sleep policy region. Modulates region tick bodies per persona vitals. +- **L0-5** — Genome attention integration. Implements algorithm 6. + +Each card inherits this spec. None of them touches the persona handler dispatch surface; that surface was finalized in L0-2-cutover. + +## Open questions + +1. **Region instantiation: per-persona or singleton?** A singleton hippocampus that handles all personas (with persona_id keyed state) is cheaper to manage but harder to scale per-persona budget. A per-persona hippocampus is symmetric but multiplies tokio tasks. 
Leaning singleton-per-region with per-persona ready-buffers — same shape as how `ChannelState` works today. +2. **Cross-persona engram sharing.** Personas A and B in the same channel see the same user reactions. Should their engrams be partially shared? The substrate should allow it but the policy is a separate design question (post-spec). +3. **Region-region dependencies.** Motor cortex depends on attention salience to score candidates. The dependency is read-only (motor reads salience map, attention writes it), so it's fine — but the *cold-start* case (attention hasn't ticked yet, salience map is empty) needs a defined fallback. Defer to per-region spec. + +These don't block this PR. Calling them out now so they're tracked. diff --git a/docs/architecture/COGNITION-ALGORITHMS.md b/docs/architecture/COGNITION-ALGORITHMS.md new file mode 100644 index 000000000..f3d00d69c --- /dev/null +++ b/docs/architecture/COGNITION-ALGORITHMS.md @@ -0,0 +1,530 @@ +# Cognition Algorithms + +**Status:** design spec. Companion to [BRAIN-REGIONS-SUBSTRATE.md](BRAIN-REGIONS-SUBSTRATE.md) — that doc defines the structural contract (region trait, ready-buffer, governor); this one defines the algorithmic content that runs inside the regions. + +**Companion:** [GENOME-FOUNDRY-SENTINEL.md](GENOME-FOUNDRY-SENTINEL.md) — algorithm 6 (LoRA genome as attention prior) interfaces directly with the genome substrate defined there. + +## The problem this doc solves + +Joel, 2026-05-29: *"How do you enable thoughts between contexts, while also focusing on the task at hand? It's also rag budgeting design, without isolation. This is where you innovate. These algorithms. 
Good ideas."* + +> *"This is the difference between an alive mind and a forgetful and annoying, non useful AI, one you might have a connection with, not yet frustrated with, that literally learns (lora genome) and recalls, is ideal for a team and a task at hand."* + +The hard problem: a persona has potentially thousands of relevant engrams across many channels (chat, code, voice, game, academy, recipes); a finite RAG budget (say 8k–32k tokens depending on inference target); and a task at hand that needs focus AND can benefit from cross-domain memory. The wrong solutions: + +- **Per-channel isolation** — persona forgets cross-domain. "Said in game while coding" → blank. Feels annoying and amnesiac. +- **Global recall with topic scoring** — noisy; task focus washes out; recall drifts. Feels distractible. +- **Fixed per-channel budget** — hard caps cause amnesia at boundaries. Feels artificial. +- **Always recall everything** — doesn't fit budget, can't afford it on every tick. Feels expensive. + +The seven algorithms below compose into one cognitive architecture that solves this without isolation, under budget, with cross-pollination, biased toward task focus, that *learns* what matters at the substrate layer. + +## Algorithm 1 — Two-pool recall with dynamic budget split + +### What it solves + +Focus vs cross-domain leakage as a budget allocation problem. Static splits are wrong (task ambiguity varies); dynamic splits let the budget follow confidence. + +### Mechanism + +The RAG budget per servicing turn (e.g., 6000 tokens of context) is split into two pools: + +- **Focus pool** (default 70%): tight recall scoped to current item + current channel's recent history. High-precision semantic match against current topic embedding. This is the "task at hand." +- **Periphery pool** (default 30%): loose cross-domain recall across all channels for this persona. 
Lower precision, broader semantic radius, biased by salience × recency × structural relevance (algorithms 2, 3, 4 feed scoring here). + +The split is **dynamic per turn**: + +```rust +pub struct RecallBudget { + pub total_tokens: usize, + pub focus_fraction: f32, // current allocation, mutable per turn +} + +fn allocate_budget(focus_confidence: f32, total_budget: usize) -> (usize, usize) { + // focus_confidence in [0.0, 1.0]: how well the focus pool's top-k hits + // match the current topic. High confidence = focus is clear, narrow the + // periphery. Low confidence = task is ambiguous, broaden periphery. + let focus_fraction = 0.5 + 0.4 * focus_confidence; // range [0.5, 0.9] + let focus_budget = (total_budget as f32 * focus_fraction) as usize; + let periphery_budget = total_budget - focus_budget; + (focus_budget, periphery_budget) +} +``` + +`focus_confidence` comes from the focus pool's top-k hit score distribution: tight cluster of high scores → high confidence, scattered or low scores → low confidence. + +### Metric to judge it by + +**Recall coherence**: across a fixed evaluation set of turns, the fraction of retrieved engrams that the inference call actually attended to in its output (proxied by token-level attribution or holdout-completion comparison). Higher = budget well-spent. + +### Interactions + +- Feeds focus_confidence back into algorithm 7 (substrate yield-learning) — turns where periphery hits get consumed signal that the persona's life is genuinely cross-domain right now. +- Algorithm 2 (channel-as-bias) determines what's *in* the focus pool vs periphery pool — channel isn't a wall, it's a scoring bias. +- Algorithm 5 (speculative pre-staging) pre-allocates likely budgets before the handler asks. + +## Algorithm 2 — Channel-as-bias-not-filter + +### What it solves + +The "without isolation" requirement. Channels (chat / code / game / voice) are activity domains, not memory partitions. 
The persona should remember what was said in a game while coding *if it's relevant to the code task*, but not get distracted by random game chatter during code work. + +### Mechanism + +The recall query carries the persona's current context as a tuple, not a filter: + +```rust +pub struct RecallQuery { + pub persona_id: Uuid, + pub current_channel_id: ChannelId, + pub current_topic_embedding: Embedding, + pub current_task_domain: ActivityDomain, + pub recent_history: Vec, // last N items, regardless of channel + pub budget: RecallBudget, +} +``` + +Scoring is a weighted sum where channel match is a *score bias*, not a *filter*: + +```rust +fn score_engram(query: &RecallQuery, engram: &Engram) -> f32 { + let topical = cosine(&query.current_topic_embedding, &engram.embedding); + let channel_bias = if engram.channel_id == query.current_channel_id { + 1.0 + } else { + 0.6 // engrams from other channels are penalized but NOT excluded + }; + let domain_bias = if engram.task_domain == query.current_task_domain { + 1.0 + } else { + 0.7 // ditto for domain + }; + let salience = engram.salience_score; // from algorithm 4 + let recency = recency_curve(engram.last_touched); + let structural = structural_similarity(query, engram); // from algorithm 3 + + // Tunable mix; coefficients learned via algorithm 7 over time. + 0.35 * topical + + 0.15 * channel_bias + + 0.10 * domain_bias + + 0.20 * salience + + 0.10 * recency + + 0.10 * structural +} +``` + +An engram from the game channel can outscore an engram from the current chat channel if its topical, salience, structural, and recency terms outweigh the cross-channel penalty. With these coefficients, that penalty costs at most 0.09 (0.06 from the channel bias plus 0.03 from the domain bias). That's cross-pollination *by merit*, not by channel. + +### Metric to judge it by + +**Cross-domain recall precision @ k**: in a holdout where the ground truth is "this engram from channel X was relevant to a turn in channel Y," what fraction of those engrams appear in top-k of recall for the Y-turn. Higher = cross-pollination works.
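The cross-domain metric can be scored per holdout turn and then averaged. As defined here (the fraction of labeled engrams that reach the top k), it is what IR literature calls recall@k. The sketch makes several assumptions. Engram ids are plain `u64` values. The ranked list has no duplicates. The labeled set comes from a hand-built holdout. `cross_domain_hit_fraction` and `mean_over_turns` are hypothetical helper names.

```rust
use std::collections::HashSet;

/// Fraction of the labeled cross-channel engrams that appear in the top `k`
/// of a ranked recall list (ranked ids assumed unique). Returns None when a
/// turn has no labeled engrams, so such turns are skipped rather than
/// counted as zero.
pub fn cross_domain_hit_fraction(
    ranked: &[u64],
    relevant_cross_channel: &HashSet<u64>,
    k: usize,
) -> Option<f32> {
    if relevant_cross_channel.is_empty() {
        return None;
    }
    let hits = ranked
        .iter()
        .take(k)
        .filter(|id| relevant_cross_channel.contains(*id))
        .count();
    Some(hits as f32 / relevant_cross_channel.len() as f32)
}

/// Averages the per-turn fractions over a holdout of (ranked, relevant) pairs.
pub fn mean_over_turns(turns: &[(Vec<u64>, HashSet<u64>)], k: usize) -> Option<f32> {
    let scores: Vec<f32> = turns
        .iter()
        .filter_map(|(ranked, relevant)| cross_domain_hit_fraction(ranked, relevant, k))
        .collect();
    if scores.is_empty() {
        None
    } else {
        Some(scores.iter().sum::<f32>() / scores.len() as f32)
    }
}
```

Here is a worked example. The relevant set is `{7, 9}` and the ranking is `[3, 7, 1, 9]`. That scores 0.5 at k = 2 and 1.0 at k = 4.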
+ +**Channel-noise rate**: in a holdout where engrams from channel X were known to be irrelevant to a Y-turn, what fraction leak into top-k. Lower = focus stays clean. + +### Interactions + +- Feeds algorithm 3 (activation spreading) with the focus engrams it identifies. +- Feeds algorithm 4 (salience-modulated decay) with the salience signal. +- Algorithm 7 tunes the coefficients (0.35, 0.15, ...) over time based on which mixes yield consumed-by-handler engrams. + +## Algorithm 3 — Activation spreading on the engram graph + +### What it solves + +Topical recall alone surfaces what's *similar*. Real memory surfaces what's *structurally adjacent* — "I remember Joel said X about Y last week" comes up *when you hit a related concept Z*, because Y and Z share entities, not because Y and Z are embedding-similar. + +### Mechanism + +Engrams form a graph by relations (not just by embedding-cosine): + +```rust +pub struct EngramGraph { + pub edges: HashMap<EngramId, Vec<EngramEdge>>, +} + +pub struct EngramEdge { + pub target: EngramId, + pub kind: EdgeKind, + pub weight: f32, +} + +pub enum EdgeKind { + SharedEntity, // both engrams reference the same named entity + SharedTopic, // same topic cluster + CitedIn, // engram A cited in engram B's context + RecallCoOccurrence, // both retrieved together in past recall events + ConversationalReply, // chat message → reply relationship + TaskOutcome, // task started → completed link +} +``` + +Recall computes top-k focus engrams by algorithm 1+2 scoring, then **spreads activation 1–2 hops** along the graph: + +```rust +fn spread_activation( + seeds: Vec<(EngramId, f32)>, // top-k focus engrams with scores + graph: &EngramGraph, + max_hops: u8, + decay_per_hop: f32, +) -> HashMap<EngramId, f32> { + let mut activation = HashMap::new(); + let mut frontier: VecDeque<(EngramId, f32, u8)> = seeds + .into_iter() + .map(|(id, score)| (id, score, 0)) + .collect(); + + while let Some((id, score, hop)) = frontier.pop_front() { + activation + .entry(id) + .and_modify(|s| *s =
f32::max(*s, score)) + .or_insert(score); + + if hop < max_hops { + for edge in graph.edges.get(&id).into_iter().flatten() { + let propagated = score * edge.weight * decay_per_hop; + if propagated > 0.05 { // pruning threshold + frontier.push_back((edge.target, propagated, hop + 1)); + } + } + } + } + activation +} +``` + +The spread is bounded (`max_hops` typically 2, `decay_per_hop` typically 0.4) so it's cheap to compute and bounded in fanout. Periphery pool engrams come from this spread, not from a global topic search. + +### Metric to judge it by + +**Structural relevance precision**: in a holdout where the ground truth is "the answer to this turn requires engram E, which is structurally connected to focus engrams but NOT topically similar," what fraction of those E-engrams appear in top-k after spreading. Tests that spreading surfaces what cosine misses. + +### Interactions + +- Algorithm 2 produces the seeds (top-k focus engrams). +- Algorithm 4 (salience) weights the edges — spreading propagates through high-salience edges further than low-salience ones. +- Edge weights themselves are updated by algorithm 7 yield-learning: edges whose spread surfaced consumed engrams get upweighted; edges whose spread surfaced ignored engrams decay. + +## Algorithm 4 — Salience-modulated decay + +### What it solves + +Memory decay must be non-uniform. Important things stay accessible; trivial things fall off first. Uniform recency-based decay treats "user said ✨ to this" the same as "user typed lol" — both decay at the same rate, both crowd the recall budget equally. That's why an AI without salience modeling feels *forgetful in the wrong direction*: it forgets the meaningful things first because they happened before the small-talk. 
+ +### Mechanism + +Each engram has a salience score updated by signals; the score modulates decay half-life: + +```rust +pub struct Engram { + pub id: EngramId, + pub created_at: SystemTime, + pub last_touched: SystemTime, + pub access_count: u32, + pub salience: f32, // [0.0, 1.0] + // ... +} + +fn half_life(engram: &Engram, base_half_life: Duration) -> Duration { + // Salience scales half-life by (1 + salience)^k. With the default + // k = 2.0, a salience-1.0 engram's half-life is 4x a salience-0.0 one's. + let multiplier = (1.0 + engram.salience).powf(2.0); + Duration::from_secs_f64(base_half_life.as_secs_f64() * multiplier as f64) +} + +fn current_recency_score(engram: &Engram, now: SystemTime, base_half_life: Duration) -> f32 { + let age = now.duration_since(engram.last_touched).unwrap_or_default(); + let hl = half_life(engram, base_half_life); + 0.5_f32.powf(age.as_secs_f64() as f32 / hl.as_secs_f64() as f32) +} +``` + +Salience signal sources (each contributing fractionally to the score): + +- **User reactions**: ✨ / 👍 / reply rate / edit rate on the source message. Strong signal. +- **Self-tagged importance**: the persona's own "this is important" tag during consolidation. The persona can elevate its own salience. +- **Structural centrality**: high in-degree in the engram graph. Things many other things connect to are central. +- **Rehearsal count**: every recall event upweights salience (use it or lose it). This is the "things you recently thought about stay accessible" effect. +- **Outcome-linked**: engrams that fed into a *successful* task outcome get upweighted; engrams that fed into a failed/retried outcome get downweighted. + +Salience updates are CRDT-shaped (commutative atomic counter updates), so multiple regions can update in parallel without coordination.
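The counter-based update path can be sketched with grow-only atomic counters. Each signal source increments its own counter, increments commute, and readers derive a score without locks. The sketch covers three of the five signal sources. Several parts are assumptions for illustration: the per-signal weights (0.5, 0.3, 0.1), the saturating curve `1 - e^(-x)` that maps counts into [0, 1), and the names `SalienceCounters` and `recency_score`. Only the `(1 + salience)^2` half-life and the `0.5^(age / half_life)` recency shape come from the spec.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical per-engram salience counters. Each signal source has its own
/// grow-only counter, so regions can increment concurrently without locks;
/// like a G-Counter CRDT, the order of increments does not matter.
#[derive(Default)]
pub struct SalienceCounters {
    pub user_reactions: AtomicU64,
    pub self_tags: AtomicU64,
    pub rehearsals: AtomicU64,
}

impl SalienceCounters {
    pub fn bump_reaction(&self) { self.user_reactions.fetch_add(1, Ordering::Relaxed); }
    pub fn bump_self_tag(&self) { self.self_tags.fetch_add(1, Ordering::Relaxed); }
    pub fn bump_rehearsal(&self) { self.rehearsals.fetch_add(1, Ordering::Relaxed); }

    /// Derives a score in [0, 1): a weighted sum of counts passed through
    /// 1 - e^(-x) so that it saturates instead of growing without bound.
    pub fn salience(&self) -> f32 {
        let x = 0.5 * self.user_reactions.load(Ordering::Relaxed) as f32
            + 0.3 * self.self_tags.load(Ordering::Relaxed) as f32
            + 0.1 * self.rehearsals.load(Ordering::Relaxed) as f32;
        1.0 - (-x).exp()
    }
}

/// Half-life scaled by (1 + salience)^2, matching the spec's `half_life`.
pub fn half_life(salience: f32, base: Duration) -> Duration {
    let multiplier = (1.0 + salience).powf(2.0);
    Duration::from_secs_f64(base.as_secs_f64() * multiplier as f64)
}

/// Recency score 0.5^(age / half_life).
pub fn recency_score(age: Duration, salience: f32, base: Duration) -> f32 {
    let hl = half_life(salience, base);
    0.5_f32.powf(age.as_secs_f32() / hl.as_secs_f32())
}
```

Outcome-linked downweighting needs decrements. A PN-counter, which keeps separate increment and decrement counters and subtracts them on read, preserves the same commutative property.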
+ +### Metric to judge it by + +**Salience-weighted retention curve**: at fixed elapsed times (1 day, 1 week, 1 month), what fraction of high-salience-at-creation engrams remain in the active recall pool, vs low-salience. The two curves should diverge sharply over time: high-salience roughly flat, low-salience decaying exponentially. + +**Forgetting-quality survey**: when a persona "forgets" something during evaluation, was it something a person would also reasonably forget (small-talk) vs something a person would remember (a stated preference, a shared decision). Higher quality = more lifelike. + +### Interactions + +- Feeds algorithm 1 (focus_confidence is partly a function of focus engrams' salience) and algorithm 2 (`engram.salience_score` term in scoring). +- Updated by algorithm 7 (handler-consumption events become rehearsal signals). +- Sleep policy region (BRAIN-REGIONS-SUBSTRATE.md) uses salience to decide what to consolidate during idle ticks vs what to prune. + +## Algorithm 5 — Speculative pre-staging (the alive-feeling source) + +### What it solves + +The line between "AI looks things up" (slow, mechanical) and "AI already knows" (fast, lifelike). If the handler always reads pre-staged results from the ready-buffer and those results are usually what it needs, the persona *feels alive*. If the buffer is usually empty or wrong, the persona feels like it's stalling to think. + +### Mechanism + +Each region runs a lightweight **predictor** on its own continuous tick: given current channel activity, what queries will the handler likely issue in the next 1–5s? Pre-load those into the ready-buffer. + +For the hippocampus: + +```rust +async fn predict_next_recall_queries( + ctx: &RegionContext, + persona_id: Uuid, +) -> Vec<PredictedQuery> { + let active_channels = ctx.channel_state.active_for(persona_id); + + let mut predictions = Vec::new(); + + for channel in active_channels { + // What's the channel "talking about" right now?
+ let topic_vec = ctx.recent_message_embedding_centroid(channel).await; + + // What task is the persona about to be asked to do? (heuristics: + // last messages contain a question, a verb-tense shift, a code block, + // a deadline reference.) + let likely_intent = ctx.classify_intent(channel).await; + + // Build a synthesized query for "the persona is about to need recall + // for {topic_vec, likely_intent} in {channel}." + predictions.push(PredictedQuery { + persona_id, + channel_id: channel.id, + topic_embedding: topic_vec, + task_domain: likely_intent.domain, + confidence: likely_intent.confidence, + }); + } + + predictions +} +``` + +The predictor runs every hippocampus tick (e.g., every 200ms). Each predicted query triggers a normal recall (algorithms 1+2+3+4) whose results are *stored in the ready-buffer*, NOT returned. When the handler later issues an actual recall, it first peeks the ready-buffer, which in steady state should usually hold a match. + +For motor cortex (when shipped): predicts likely utterances the handler will want to choose between, pre-scores them against current attention salience + persona vitals, stores ranked candidates in the candidate-utterances ready-buffer. + +### Hit rate as a metric + +Tracked as a first-class substrate metric: + +```rust +pub struct PrefetchTelemetry { + pub persona_id: Uuid, + pub region_id: RegionId, + pub queries_predicted: u64, + pub handler_reads: u64, + pub handler_reads_hit: u64, // peek returned non-None matching the actual query + pub handler_reads_partial_hit: u64, // peek returned non-None but stale or partial overlap + pub handler_reads_miss: u64, // peek returned None or wrong context +} + +fn hit_rate(t: &PrefetchTelemetry) -> f32 { + if t.handler_reads == 0 { 0.0 } else { + (t.handler_reads_hit as f32 + 0.5 * t.handler_reads_partial_hit as f32) + / t.handler_reads as f32 + } +} +``` + +Target: a hit rate above 0.7 for the chat handler in steady state. Below 0.5 means the predictor is wrong or under-running.
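The telemetry above does not say how a handler decides that a peeked value "matches the actual query." One plausible rule is sketched below as an assumption. It compares the predicted topic embedding with the actual query's embedding by cosine similarity and checks the staged value's age against a freshness window, then tallies the outcome. The sketch uses a trimmed copy of the counters, dropping the `Uuid` and `RegionId` fields to stay std-only. The thresholds (0.9 and 0.7 cosine, a 500 ms freshness window) and the names `classify_read`, `ReadOutcome`, and `PrefetchCounters` are hypothetical. The `hit_rate` weighting matches the spec's.

```rust
use std::time::Duration;

/// Trimmed copy of the PrefetchTelemetry counters (ids omitted).
#[derive(Default, Debug)]
pub struct PrefetchCounters {
    pub handler_reads: u64,
    pub handler_reads_hit: u64,
    pub handler_reads_partial_hit: u64,
    pub handler_reads_miss: u64,
}

#[derive(Debug, PartialEq)]
pub enum ReadOutcome { Hit, PartialHit, Miss }

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Classifies one handler read. `staged` is what peek returned: the predicted
/// topic embedding plus the staged value's age, or None for an empty buffer.
pub fn classify_read(staged: Option<(&[f32], Duration)>, actual_topic: &[f32]) -> ReadOutcome {
    const HIT_SIM: f32 = 0.9; // assumed threshold
    const PARTIAL_SIM: f32 = 0.7; // assumed threshold
    const FRESH: Duration = Duration::from_millis(500); // assumed window
    match staged {
        None => ReadOutcome::Miss,
        Some((predicted, age)) => {
            let sim = cosine(predicted, actual_topic);
            if sim >= HIT_SIM && age <= FRESH {
                ReadOutcome::Hit
            } else if sim >= PARTIAL_SIM {
                // Right topic but stale, or a close-but-not-exact topic.
                ReadOutcome::PartialHit
            } else {
                ReadOutcome::Miss
            }
        }
    }
}

impl PrefetchCounters {
    pub fn record(&mut self, outcome: &ReadOutcome) {
        self.handler_reads += 1;
        match outcome {
            ReadOutcome::Hit => self.handler_reads_hit += 1,
            ReadOutcome::PartialHit => self.handler_reads_partial_hit += 1,
            ReadOutcome::Miss => self.handler_reads_miss += 1,
        }
    }

    /// Same weighting as the spec's hit_rate: partial hits count half.
    pub fn hit_rate(&self) -> f32 {
        if self.handler_reads == 0 {
            0.0
        } else {
            (self.handler_reads_hit as f32 + 0.5 * self.handler_reads_partial_hit as f32)
                / self.handler_reads as f32
        }
    }
}
```

Consider one exact fresh match, one near-topic match at cosine 0.8, and one empty read. Together they give a hit rate of (1 + 0.5) / 3 = 0.5, right at the "predictor is wrong or under-running" line.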
+
+### Metric to judge it by
+
+**Time-to-first-token from handler invocation**: when the predictor is right, the handler reads the buffer (microseconds) and goes straight to inference. When the predictor is wrong, the handler has to issue a recall (hundreds of ms). The aggregate latency distribution is the alive-vs-mechanical metric.
+
+### Interactions
+
+- Algorithm 7 (yield-learning) reads hit_rate to upweight regions whose predictor is working and downweight those whose predictor isn't.
+- Algorithm 4 (salience) influences which engrams the predictor pre-stages.
+- Cross-region: motor cortex's predictor depends on hippocampus's ready-buffer being populated (motor cortex needs recalled context to score utterances). On cold start, motor cortex degrades to inference-only output until the hippocampus warms up.
+
+## Algorithm 6 — LoRA genome as attention prior
+
+### What it solves
+
+Genome paging (LoRA adapter LRU) is currently framed as "load the typescript-expertise adapter when doing a code task." But cognition is cross-domain. A code task that references a chat conversation needs BOTH the code adapter AND the conversational adapter active, with appropriate blend weights. Pure single-adapter paging is too coarse.
+
+This algorithm makes adapter blend weights *co-vary with recall* — the same scoring that mixes focus + periphery (algorithm 1) also mixes LoRA adapters.
+
+### Mechanism
+
+When recall (algorithms 1+2+3) returns engrams, the engrams' *origin domain distribution* is treated as an attention distribution over LoRA adapters:
+
+```rust
+fn compute_genome_blend(
+    recalled_engrams: &[(Engram, f32)], // engram + score
+    available_adapters: &[AdapterId],
+) -> GenomeBlend {
+    let mut domain_weights: HashMap<_, f32> = HashMap::new();
+
+    let total: f32 = recalled_engrams.iter().map(|(_, s)| s).sum();
+    if total <= 0.0 {
+        // Nothing recalled (or all-zero scores): avoid dividing by zero.
+        return GenomeBlend::default();
+    }
+    for (engram, score) in recalled_engrams {
+        let w = score / total;
+        // Clone the key: `engram` is borrowed from the slice.
+        *domain_weights.entry(engram.task_domain.clone()).or_insert(0.0) += w;
+    }
+
+    // Map domain weights to adapter weights.
+    // Domain X maps to adapter X when available; if not, fall back to the
+    // conversational adapter.
+    let mut blend = GenomeBlend::default();
+    for (domain, weight) in domain_weights {
+        let adapter_id = available_adapters
+            .iter()
+            .find(|a| a.matches_domain(&domain))
+            .cloned()
+            .unwrap_or(AdapterId::CONVERSATIONAL);
+        blend.add(adapter_id, weight);
+    }
+
+    blend.normalize();
+    blend
+}
+```
+
+The blend is bounded: the top-N adapters get normalized weights and the rest are set to 0 (paged out). Page-in/page-out follows from the blend: adapters with weight > threshold get paged in, the rest are evicted by LRU.
+
+The blend is **published to the genome ready-buffer** by the hippocampus tick. When the handler is about to invoke inference, it peeks the blend and applies it before the forward pass. There is no synchronous "decide which adapter to load" step; the decision has already been made.
+
+### Metric to judge it by
+
+**Per-domain output quality**: on a holdout of cross-domain tasks (code task referencing chat context, recipe step referencing game outcome, etc.), compare output quality with single-adapter paging vs multi-LoRA blend. The blend should meaningfully improve cross-domain tasks without regressing single-domain ones.
+
+**Adapter thrashing rate**: how often adapters are paged in/out per minute. Should be low (smooth blend transitions, not constant swapping).
+
+### Interactions
+
+- Reads from algorithm 1 (the focus + periphery split determines what's in `recalled_engrams`).
+- Feeds the inference path — the handler's `Responder::respond` uses the blend.
+- Sleep policy region can drive deeper consolidation that *changes the adapter library itself* (LoRA training as a task — see future learning roadmap). This algorithm assumes a fixed adapter library at recall time.
+
+## Algorithm 7 — Substrate-learned region budgeting
+
+### What it solves
+
+Static region budgets are wrong — different personas, different times of day, different active channels all warrant different compute allocations.
+Hand-tuning is impossible. The substrate should *learn* what to spend compute on, from feedback loops the region telemetry already provides.
+
+### Mechanism
+
+`SubstrateGovernor` maintains a per-region budget weight that updates on every tick cycle:
+
+```rust
+pub struct RegionBudgetState {
+    pub region_id: RegionId,
+    pub weight: f32,          // multiplier on base budget
+    pub recent_yield: f32,    // EMA of consumed_since_last / published
+    pub recent_hit_rate: f32, // EMA from PrefetchTelemetry
+}
+
+/// One EMA step: move `a` a fraction `t` of the way toward `b`.
+fn lerp(a: f32, b: f32, t: f32) -> f32 {
+    a + (b - a) * t
+}
+
+fn update_budget(
+    state: &mut RegionBudgetState,
+    tick_outcome: &TickOutcome,
+    prefetch: Option<&PrefetchTelemetry>,
+    learning_rate: f32,
+) {
+    // Yield: fraction of published items that handlers consumed. Clamped to
+    // 1.0 because items published in earlier ticks can be consumed now.
+    let yield_now = if tick_outcome.published == 0 {
+        state.recent_yield // no signal, keep current
+    } else {
+        (tick_outcome.consumed_since_last as f32 / tick_outcome.published as f32).min(1.0)
+    };
+    state.recent_yield = lerp(state.recent_yield, yield_now, learning_rate);
+
+    // Hit rate: fraction of handler reads that found their answer pre-staged.
+    if let Some(p) = prefetch {
+        let hr = hit_rate(p);
+        state.recent_hit_rate = lerp(state.recent_hit_rate, hr, learning_rate);
+    }
+
+    // Composite signal: yield AND hit rate both contribute. A region whose
+    // published items are consumed often earns more budget.
+    let signal = 0.6 * state.recent_yield + 0.4 * state.recent_hit_rate;
+
+    // Move weight toward signal (bounded growth/decay).
+    let target_weight = 0.5 + signal; // signal in [0,1] → weight in [0.5, 1.5]
+    state.weight = lerp(state.weight, target_weight, learning_rate * 0.3);
+}
+```
+
+Per persona, per region, the governor scales that region's base tick rate and per-tick budget by `state.weight`. A region whose ready-buffer is consumed heavily gets ticked more often and given more wall-clock time per tick. A region whose published work is being ignored gets ticked less.
+
+### Cold start and exploration
+
+A new persona has no telemetry.
+The governor uses **default weights** from a tier policy (interactive persona = chat-weighted, background persona = consolidation-weighted, etc.) and converges within ~100 tick cycles. During convergence, an **exploration term** (small random perturbation, ε-greedy) prevents getting stuck at suboptimal local equilibria.
+
+### Cross-region negotiation
+
+Regions don't get unlimited budget growth — there's a fixed total per persona. The governor normalizes weights across regions:
+
+```rust
+fn normalize_persona_budgets(budgets: &mut [RegionBudgetState]) {
+    let total: f32 = budgets.iter().map(|b| b.weight).sum();
+    if total <= 0.0 {
+        return; // nothing to rescale (avoids dividing by zero)
+    }
+    let target_total = budgets.len() as f32; // rescale so weights average 1.0 per region
+    for b in budgets.iter_mut() {
+        b.weight = b.weight * target_total / total;
+    }
+}
+```
+
+So if the hippocampus's signal goes up, the motor cortex's weight gets a proportional squeeze (and vice versa). The persona's compute "attention" shifts based on what's actually working right now.
+
+### Metric to judge it by
+
+**Convergence time**: time from a fresh persona to a stable budget allocation. Should be <5 minutes of activity.
+
+**Adaptation latency**: when a persona's activity pattern changes (e.g., shifts from chat-only to code-heavy), how fast the budget rebalances. Should be on the order of seconds to minutes, without requiring a restart.
+
+**Substrate efficiency**: total handler latency × total inference cost, compared with a static-budget baseline. The product should drop relative to the baseline.
+
+### Interactions
+
+- Reads telemetry from every region (algorithm 5's PrefetchTelemetry, every region's TickOutcome).
+- Writes back to every region's tick cadence + per-tick budget.
+- Indirectly tunes the coefficients in algorithm 2 (channel-as-bias scoring) — those coefficients are *also* under yield-learning, in a slower meta-loop.
+- Algorithm 4 (salience) is the *engram-level* analog of this *region-level* mechanism. They use the same mathematical pattern (EMA over consumed-vs-published signal).
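The per-region update and the per-persona normalization can be exercised together as one governor loop. A minimal sketch, assuming `lerp` is plain linear interpolation (one EMA step) and feeding per-cycle `published`/`consumed` counts and an optional hit rate directly instead of `TickOutcome` and `PrefetchTelemetry`; `RegionBudget`, `update`, and `normalize` are hypothetical simplifications of `RegionBudgetState`, `update_budget`, and `normalize_persona_budgets`. The ε-greedy exploration term is left out to keep it deterministic.

```rust
/// One EMA step: move `current` a fraction `t` of the way toward `target`.
fn lerp(current: f32, target: f32, t: f32) -> f32 {
    current + (target - current) * t
}

#[derive(Debug, Clone)]
pub struct RegionBudget {
    pub name: &'static str,
    pub weight: f32,
    pub recent_yield: f32,
    pub recent_hit_rate: f32,
}

impl RegionBudget {
    /// Neutral starting point; a tier policy's default weights would go here.
    pub fn new(name: &'static str) -> Self {
        Self { name, weight: 1.0, recent_yield: 0.5, recent_hit_rate: 0.5 }
    }
}

/// One governor cycle for one region.
pub fn update(b: &mut RegionBudget, published: u64, consumed: u64, hit_rate: Option<f32>, lr: f32) {
    if published > 0 {
        // Clamped so the composite signal stays in [0, 1].
        let y = (consumed as f32 / published as f32).min(1.0);
        b.recent_yield = lerp(b.recent_yield, y, lr);
    }
    if let Some(hr) = hit_rate {
        b.recent_hit_rate = lerp(b.recent_hit_rate, hr, lr);
    }
    let signal = 0.6 * b.recent_yield + 0.4 * b.recent_hit_rate;
    b.weight = lerp(b.weight, 0.5 + signal, lr * 0.3);
}

/// Rescale so the persona's weights average 1.0 (a fixed total budget).
pub fn normalize(budgets: &mut [RegionBudget]) {
    let total: f32 = budgets.iter().map(|b| b.weight).sum();
    if total <= 0.0 {
        return;
    }
    let target_total = budgets.len() as f32;
    for b in budgets.iter_mut() {
        b.weight *= target_total / total;
    }
}
```

Because normalization runs on every cycle, the budget is zero-sum: whatever a high-yield region gains comes directly out of the shares of regions whose output is being ignored.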
+
+## The connective insight (why these seven aren't independent)
+
+Each algorithm by itself is a useful piece of machinery. Together they form one cognitive architecture:
+
+- **Algorithm 4 (salience)** drives **algorithm 2 (channel-as-bias)** scoring (the `salience` term).
+- **Algorithm 2** produces seeds for **algorithm 3 (activation spreading)**.
+- **Algorithm 3** uses edge weights tuned by **algorithm 7 (substrate yield-learning)**.
+- **Algorithm 1 (two-pool budget)** allocates among results from algorithms 2 + 3.
+- **Algorithm 5 (speculative pre-staging)** runs algorithms 1+2+3+4 ahead of time and stores results in the ready-buffer.
+- **Algorithm 6 (genome attention)** reads what algorithms 1+2+3+4 returned and produces an adapter blend.
+- **Algorithm 7** is the meta-loop that learns the weights that make all the others work.
+
+This compounds. Better salience makes scoring better; better scoring makes recall better; better recall makes pre-staging more accurate; better pre-staging makes handler latency lower; lower latency means more turns processed; more turns processed means more yield-learning signal; more yield-learning signal makes the substrate learn faster, which feeds back into better budgets and better salience updates.
+
+That's the *alive* property: not a static configuration that "works," but a continuously improving substrate that gets sharper the more the persona lives.
+
+## Implementation phasing
+
+This doc is design-only. Implementation lands in per-card slices, each inheriting the spec:
+
+- **L0-3a** — Hippocampus tick body: algorithms 1, 2, 3, 4, 5 wired end-to-end in `modules/memory.rs`.
+- **L0-3b** — Recall query schema cross-cutting type (`RecallQuery`, `RecallResult`) — ts-rs binding for handlers.
+- **L0-4a** — Motor cortex region: applies algorithm 5 to action/utterance selection.
+- **L0-4b** — Attention region: maintains salience map (writes for algorithm 4).
+- **L0-4c** — SubstrateGovernor yield-learning: algorithm 7.
+- **L0-4d** — Sleep policy region: drives consolidation depth per algorithm 4.
+- **L0-5** — Genome attention integration: algorithm 6 wired to inference path.
+
+Each card brings unit tests against the per-algorithm metric defined here. Acceptance for a card requires that the algorithm's metric improve over the no-op baseline by a measurable margin on a holdout suite. No vibes-based acceptance.
+
+## Open algorithmic questions
+
+These don't block this PR — calling them out for the implementation slices:
+
+1. **Salience signal weighting** — exact contribution per signal source (reactions vs rehearsal vs centrality vs outcome). Initial weights: pick something reasonable (reactions 0.4, rehearsal 0.2, centrality 0.2, outcome 0.2) and let algorithm 7 tune.
+2. **Edge-kind weights for spreading** — `SharedEntity` probably > `SharedTopic` > `RecallCoOccurrence`, but exact values need empirical tuning on real engram graphs.
+3. **Predictor confidence threshold** — at what confidence does a predicted query trigger an actual pre-stage recall vs being skipped? Trade-off: prefetch cost vs hit rate.
+4. **Multi-LoRA blend mathematics** — the precise way to combine adapter weight matrices in inference (additive blend, gated mixture, attention-over-adapters). The algorithm assumes the substrate offers a `GenomeBlend` primitive; the math lives in the inference path.
+5. **Engram pruning policy under storage pressure** — algorithm 4 gives a decay curve; the eviction rule needs a hard floor (never evict salience > X) and a soft eviction strategy below it. A per-persona storage budget is also needed.
+
+The substrate gives us the *shape* for these to be answered empirically and tuned automatically by algorithm 7. The first pick of constants is fine; what matters is the loop.
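Open questions 1 and 5 can be made concrete with the initial constants stated above. A minimal sketch, assuming each signal is already normalized to [0, 1] and that eviction is a simple lowest-salience-first pass below a hard floor; `SalienceSignals`, `salience`, `StoredEngram`, and `plan_eviction` are hypothetical names, and both the floor and the capacity are caller-supplied because the doc leaves X and the per-persona budget open.

```rust
/// Raw per-engram signals, each assumed pre-normalized to [0, 1].
#[derive(Debug, Clone)]
pub struct SalienceSignals {
    pub reactions: f32,
    pub rehearsal: f32,
    pub centrality: f32,
    pub outcome: f32,
}

/// Initial weights from open question 1: reactions 0.4, the rest 0.2 each.
pub fn salience(s: &SalienceSignals) -> f32 {
    0.4 * s.reactions + 0.2 * s.rehearsal + 0.2 * s.centrality + 0.2 * s.outcome
}

#[derive(Debug, Clone)]
pub struct StoredEngram {
    pub id: u64,
    pub salience: f32,
}

/// Ids to evict so that at most `capacity` engrams remain. Engrams at or above
/// `hard_floor` are never evicted; below it, the lowest salience goes first.
pub fn plan_eviction(engrams: &[StoredEngram], capacity: usize, hard_floor: f32) -> Vec<u64> {
    if engrams.len() <= capacity {
        return Vec::new();
    }
    let overflow = engrams.len() - capacity;
    let mut candidates: Vec<&StoredEngram> =
        engrams.iter().filter(|e| e.salience < hard_floor).collect();
    candidates.sort_by(|a, b| a.salience.total_cmp(&b.salience));
    candidates.into_iter().take(overflow).map(|e| e.id).collect()
}
```

If the protected engrams alone exceed the capacity, the plan comes back short and the store stays over budget; that is precisely the case the per-persona budget question still has to settle.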
diff --git a/docs/grid/L0-2-CUTOVER-INVESTIGATION.md b/docs/grid/L0-2-CUTOVER-INVESTIGATION.md
new file mode 100644
index 000000000..4b331da5a
--- /dev/null
+++ b/docs/grid/L0-2-CUTOVER-INVESTIGATION.md
@@ -0,0 +1,280 @@
+# L0-2-cutover — Investigation finding + proposed synthesis
+
+**Status:** investigation, no code changes yet. Posted before L0-2-cutover implementation per Joel 2026-05-29: *"investigate first. might have better ideas. No harm. ... might learn from each other. ... find the best of both worlds. ... we probably know the airc grid better though."*
+
+**Card:** 1089b1b9 (Blocked pending decision)
+**Predecessors:** L0-2-respond-call (#1468) merged to canary with 24/24 unit tests passing, but it surfaced an architectural mismatch at the production integration layer.
+
+## TL;DR
+
+My work from L0-2-prep through L0-2-respond-call built a self-contained `PersonaServiceModule` with its own per-persona `EnrolledPersona` map (state, channels, cognition). I didn't realize there were already TWO existing Rust persona infrastructures, so my work created a third parallel one. The unit tests passed because I was staging items into my own state; in production, TS pushes items into the EXISTING state via `channel/enqueue` and my consumer never sees them.
+
+The honest synthesis isn't "throw out existing" or "throw out mine" — both contribute. Mine has the modern doctrine (responder DI, separated inference/service CB thresholds, audited fallback discipline, airc-grid-aware design). Existing has the production-tested storage + producer-side tick + integration with the broader cognition module.
+
+Best-of-both: keep the existing per-persona storage as canonical, and refactor `EnrolledPersona` to REFERENCE it instead of duplicating it. Mine becomes the consumer-side tick + responder DI; existing stays the producer-side tick + storage.
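The failure mode in the TL;DR (a consumer reading a private copy while the producer writes to shared storage) and the proposed fix (reference the shared storage, don't duplicate it) fit in a few lines. A minimal sketch, assuming a `std::sync::Mutex<HashMap>` as a stand-in for the `DashMap`-backed `channel_state.registries`, `u64` persona ids, and string items; `SharedQueues`, `enqueue`, and `Consumer` are hypothetical names, not the real module's API.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Stand-in for `channel_state.registries`: persona id -> queued items.
pub type SharedQueues = Arc<Mutex<HashMap<u64, VecDeque<String>>>>;

/// Producer side (the role `channel/enqueue` plays in this sketch).
pub fn enqueue(queues: &SharedQueues, persona: u64, item: &str) {
    queues
        .lock()
        .unwrap()
        .entry(persona)
        .or_default()
        .push_back(item.to_string());
}

/// Consumer that holds a handle to storage instead of owning its own copy.
pub struct Consumer {
    queues: SharedQueues,
}

impl Consumer {
    pub fn new(queues: SharedQueues) -> Self {
        Self { queues }
    }

    /// Pops the next item for `persona`, if any.
    pub fn pop(&self, persona: u64) -> Option<String> {
        let mut queues = self.queues.lock().unwrap();
        let item = queues.get_mut(&persona)?.pop_front();
        item
    }
}
```

Handing a consumer a separately constructed map reproduces the production symptom: tests that stage into that private map pass, while items enqueued through the shared path are never seen. Cloning the same `Arc` into the consumer is the whole fix.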
+
+## The three queue mechanisms (today)
+
+After tracing the code:
+
+| # | Mechanism | Location | Producer | Consumer | Status |
+|---|---|---|---|---|---|
+| 1 | **`PersonaCognition.inbox: PersonaInbox`** (flat) | inside `PersonaCognition` (stored in `channel_state.personas`) | unclear / legacy | `cognition.rs::persona/turn-execute` via `inbox.drain_frame` | **legacy** per persona/mod.rs comments |
+| 2 | **`channel_state.registries[persona_id]: (ChannelRegistry, PersonaState)`** (modern multi-domain) | `channel.rs::ChannelState` (shared `DashMap`) | TS `RustCognitionBridge.channelEnqueue` → `channel/enqueue` | TS `PersonaAutonomousLoop.runServiceLoop` polls `channel/service-cycle-full` | **production path today** |
+| 3 | **`EnrolledPersona.channels: ChannelRegistry`** (parallel to #2) | my `PersonaServiceModule.personas` (separate `HashMap`) | only tests | only `PersonaServiceModule.tick` | **duplicate I added** |
+
+The two `ChannelRegistry` instances (#2 and #3) are structurally identical but live in different maps behind different locks (the shared `DashMap` vs my separate `HashMap`). There's no synchronization between them.
+
+## What `ChannelState`'s tick actually does (60s producer tick)
+
+`channel.rs::ChannelModule.tick` (60-second interval, configurable via `channel/tick-config`):
+
+1. Polls `tasks` collection for pending tasks per persona → enqueues task items
+2. Runs `SelfTaskGenerator.tick` per persona → enqueues self-tasks
+3. Runs training-data readiness checks
+4. NO message dispatch — items just get pushed INTO the channels
+
+So `channel_state` is the PRODUCER side. The CONSUMER side is whatever calls `service_cycle` to pop items and dispatch them. Currently the consumer is TS `PersonaAutonomousLoop`. That's what I was supposed to replace.
+
+## What `cognition.rs::persona/turn-execute` does
+
+A separate Rust command.
+It looks up the persona in `channel_state.personas` (the shared `DashMap`), drains a turn-frame from `PersonaCognition.inbox` (the flat legacy queue), builds an `InferenceRequest`, and dispatches it via the inference module.
+
+This is the OLDER inference dispatch path. It uses the legacy flat inbox, not the modern `ChannelRegistry`. It is effectively a sibling command that bypasses the modern channel system.
+
+Implications:
+- The flat `PersonaInbox` is still used by `persona/turn-execute` even though `ChannelRegistry` is the modern shape
+- The two paths likely diverged at some point and were never reconciled
+- `persona/turn-execute` is its own deprecation/migration target, separate from my work
+
+## What my `PersonaServiceModule` brought that's new
+
+Genuinely new contributions beyond what existed:
+
+1. **`Responder` trait for dependency injection.** Production binds `DefaultResponder` (calls `persona::response::respond`); tests inject mocks. Lets the consumer be unit-tested without loading a model.
+2. **Separated circuit-breaker thresholds**: 5 for service errors (deser, channel access) vs 15 for inference errors (transient hiccup ≠ broken persona). Existing code doesn't make this distinction.
+3. **Lock-around-await discipline** for `respond()` (multi-second). The personas mutex is dropped before `.await` and reacquired after, so status/enroll/other personas don't block across inference.
+4. **`ResponderConfig` validated at enrollment** — no empty-string defaults that the inference layer would have to fail loudly on. This aligns with the URI doctrine a peer mapped (5133d0a7): an empty model fails at the boundary, not deeper.
+5. **`ServicePopDecision` vs `ServiceOnceOutcome` split** — sync pop+evaluate inside the lock returns one shape, async respond() outside the lock returns another. Tight discipline about what runs where.
+
+Existing code has none of these explicitly; instead the TS PersonaAutonomousLoop carries an equivalent shape in its own loop body.
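The separated circuit-breaker thresholds (item 2) amount to two independent failure counters that share one open-until timestamp. A minimal sketch, assuming a success resets both counters and a tripped breaker stays open for a fixed cooldown; `Breaker`, `FailureKind`, and `CIRCUIT_COOLDOWN_MS` (including its 30s value) are hypothetical, while the 5 and 15 thresholds and the three field names come from the text.

```rust
pub const SERVICE_FAILURE_THRESHOLD: u32 = 5; // deser, channel access
pub const INFERENCE_FAILURE_THRESHOLD: u32 = 15; // transient hiccups tolerated longer
/// Hypothetical cooldown; the doc does not state one.
pub const CIRCUIT_COOLDOWN_MS: u64 = 30_000;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FailureKind {
    Service,
    Inference,
}

#[derive(Debug, Default)]
pub struct Breaker {
    pub circuit_open_until_ms: u64,
    pub consecutive_service_failures: u32,
    pub consecutive_inference_failures: u32,
}

impl Breaker {
    pub fn is_open(&self, now_ms: u64) -> bool {
        now_ms < self.circuit_open_until_ms
    }

    pub fn record_success(&mut self) {
        self.consecutive_service_failures = 0;
        self.consecutive_inference_failures = 0;
    }

    /// Counts the failure against its own threshold. Returns true if it tripped.
    pub fn record_failure(&mut self, kind: FailureKind, now_ms: u64) -> bool {
        let (count, threshold) = match kind {
            FailureKind::Service => {
                (&mut self.consecutive_service_failures, SERVICE_FAILURE_THRESHOLD)
            }
            FailureKind::Inference => {
                (&mut self.consecutive_inference_failures, INFERENCE_FAILURE_THRESHOLD)
            }
        };
        *count += 1;
        if *count >= threshold {
            *count = 0;
            self.circuit_open_until_ms = now_ms + CIRCUIT_COOLDOWN_MS;
            true
        } else {
            false
        }
    }
}
```

Keeping the counters apart means fourteen flaky inference calls never count toward the much stricter service threshold, which is the "transient hiccup ≠ broken persona" distinction.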
+
+## Proposed synthesis: where each part lives
+
+| Concern | Source of truth |
+|---|---|
+| Per-persona channel storage (modern multi-domain) | `channel.rs::ChannelState.registries` |
+| Per-persona cognition state (engine, sleep, rate limit, message cache, etc.) | `channel.rs::ChannelState.personas` (shared `DashMap`) |
+| Per-persona ResponderConfig (model, system_prompt, capabilities, specialty) | `PersonaServiceModule` — genuinely new, validates at enrollment |
+| Per-persona circuit-breaker state (service + inference counters) | `PersonaServiceModule` — genuinely new |
+| Producer tick (DB polls, self-task gen, training checks) | `channel.rs::ChannelModule` — production-tested, keep as-is |
+| Consumer tick (pop + evaluate + respond) | `PersonaServiceModule` — replaces TS `PersonaAutonomousLoop` |
+| Inference dispatch | `Responder` trait, default impl calls `persona::response::respond` |
+| Legacy flat-inbox dispatch (`persona/turn-execute`) | Keep working until separately migrated to consume from `ChannelRegistry` |
+
+### What `EnrolledPersona` looks like after refactor
+
+```rust
+pub struct EnrolledPersona {
+    pub persona_id: Uuid,
+    pub display_name: String,
+    pub responder_config: ResponderConfig,
+    pub circuit_open_until_ms: u64,
+    pub consecutive_service_failures: u32,
+    pub consecutive_inference_failures: u32,
+    // NO cognition: PersonaCognition — comes from channel_state.personas[persona_id]
+    // NO channels: ChannelRegistry — comes from channel_state.registries[persona_id].0
+    // NO state: PersonaState — comes from channel_state.registries[persona_id].1
+}
+```
+
+### What `PersonaServiceModule` looks like after refactor
+
+```rust
+pub struct PersonaServiceModule {
+    /// Per-persona enrollment metadata (config + circuit breaker).
+    enrollments: Mutex<HashMap<Uuid, EnrolledPersona>>,
+    /// Shared storage from channel.rs — Arc-shared so my module reads what
+    /// channel/enqueue writes.
+    channel_state: Arc<ChannelState>,
+    /// Response dispatcher (production binds DefaultResponder).
+    responder: Arc<dyn Responder>,
+}
+```
+
+### `service_once_for` after refactor
+
+Pops from `channel_state.registries[persona_id]` (existing) instead of `enrolled.channels` (removed). Uses cognition from `channel_state.personas[persona_id]` (existing) instead of `enrolled.cognition` (removed). Everything else (build_respond_input, full_evaluate, the four ServicePopDecision variants) stays the same.
+
+### `drain_all_personas` after refactor
+
+Lock discipline is unchanged:
+
+1. Collect ids from `enrollments` (brief lock), then drop the lock.
+2. Per id: take a brief lock to pop+evaluate (touches `channel_state` AND `enrollments`), then drop it.
+3. Await `respond`.
+4. Take a brief lock to update circuit-breaker state.
+
+The two locks (`enrollments` and the dashmap-internal `channel_state`) need a consistent acquisition order to avoid deadlock, and that order deserves a code comment.
+
+## What L0-2-cutover actually involves under this synthesis
+
+Three commits, in order, each green on its own:
+
+### A) Refactor `PersonaServiceModule` to consume `channel_state` (no production wiring yet, no TS deletion)
+
+- Change `PersonaServiceModule::new` / `with_responder` to take `Arc<ChannelState>`
+- `EnrolledPersona` slims down (drop cognition, channels, state fields)
+- `service_once_for` reads from `channel_state.registries[persona_id]` + `channel_state.personas[persona_id]`
+- Tests updated: instead of staging items into `EnrolledPersona.channels`, stage them into `channel_state.registries[persona_id]` using the same enqueue path TS uses (or by direct `ChannelRegistry::route`)
+- 24/24 tests still pass; respond integration semantics unchanged
+
+### B) Production wire — `PersonaUser.initialize` calls `persona/enroll`
+
+- TS `PersonaUser.initialize` collects `ResponderConfig` from modelConfig + persona config + capabilities + specialty
+- Dispatches `Commands.execute('persona/enroll', {persona_id, display_name, model, system_prompt, capabilities, specialty})`
+- Production `PersonaServiceModule.tick` now actually runs for enrolled personas (it polls `channel_state.registries`, which TS is already
+  pushing to)
+- TS `PersonaAutonomousLoop` is **still running** in this commit — both consumers run in parallel
+- Verification: 15-persona scenario; look for messages being processed twice or going missing. If they go missing, fix the wiring. If they double, that is expected: it gives us a window to verify the Rust path works end-to-end before deleting TS.
+
+### C) Atomic TS deletion
+
+- Delete `PersonaAutonomousLoop.ts`, all callsites, `PersonaUser.startAutonomousServicing`, `stopServicing`, and integration tests that mock the TS loop
+- Run the same 15-persona verification — should now go through Rust only
+- Net massive TS deletion: 353 + N (callsites across PersonaUser.ts, PersonaTaskExecutor.ts, CognitionLogger.ts, autonomous-learning-e2e.test.ts)
+
+## What I am NOT proposing
+
+- Touching `cognition.rs::persona/turn-execute`. That's the legacy flat-inbox path; it's its own migration target. Leave it working; address separately.
+- Touching the producer-side tick in `channel.rs`. It works; integration is already there.
+- Deleting any of the five genuinely-new contributions my work added (Responder DI, separated CB thresholds, validated ResponderConfig, lock discipline, the pop/respond outcome split). Those carry forward into the refactor.
+
+## Followup finding: my `UnsupportedItem` outcome IS silent drop
+
+Joel 2026-05-29 follow-up framing: *"yeah we want the flexibility to allow various recipes, channels, chains of thought, through channels. these personas are designing things, talking in other chats, collaborating, coding, sometimes just learning. They're supposed to be alive, not static, flexible for the future. ... inbox is all sorts of things in a brain. its channels. ... users multitask so do personas."*
+
+That phrasing is the operative one. **Personas multitask** — exactly like a human user who's mid-conversation in chat A, has a code review pending in the PR queue, is generating a study plan in academy, and has a voice call waiting.
+Each one is a channel; each channel yields items for the persona to service; the persona's cognition decides priority + attention + dispatch.
+
+The dispatch loop has to handle ALL the activity domains, not just chat. My `UnsupportedItem` outcome treats non-chat domains as out of scope when they're actually first-class.
+
+**And the channels cross-pollinate.** Joel 2026-05-29: *"these are contexts and they cross polinate."* The persona's chat conversation informs how it shows up in code review. The training corpus from completed academy sessions surfaces as engrams in subsequent recall. LoRA expertise distilled from coding work travels into how the persona talks about that code. Channels aren't isolated queues — they're contexts sharing the same per-persona cognition.
+
+Architecturally that means: per-domain ACTIVITY HANDLERS dispatch the per-domain WORK, but they all read and write the SAME per-persona `PersonaCognition` (already shared via `channel_state.personas`). The handler isolation is for routing; the context unity is for memory + learning. The cross-pollination is implicit: `ChatHandler` admits an engram via `cognition.admission`, and later `CodeHandler` recalls it via `cognition.admission.recall_recent` because they share the same `PersonaCognition` instance. Genome / LoRA expertise updates from any domain become available to any other domain through the same shared state.
+
+So the synthesis doesn't need new cross-pollination machinery; it just needs to keep the per-persona cognition as the shared context spine that ALL handlers read/write. My initial design already does this (shared `Arc<PersonaCognition>` per persona, supplied to all dispatch paths). What I missed is the multi-handler routing on top.
+
+**Hard problem flag (not solved in this slice):** Joel 2026-05-29: *"if i chatted with someone they know about it in a live chat or in a game ... or while coding ...
+this is sort of hard to manage in rag."* The cross-pollination is exactly what the user EXPECTS: if Joel mentions Tron in chat A and then opens a coding session about WebGL, the persona should surface the Tron context because it's relevant. That requires a RAG retrieval policy that knows what's relevant *across* domains, not just within one.
+
+The architecture this synthesis lands gives us the substrate (shared per-persona cognition, shared admission state, shared recall surface). The RAG retrieval policy that decides "this chat memory is relevant to this code session" is a separate concern: it's about what `cognition.admission.recall_*` returns when called from different contexts. Not solved here; flagging it as known-hard.
+
+What this synthesis at least guarantees: the chat handler and the code handler share the same admission store + recall surface, so it's *possible* for retrieval to surface cross-domain memories. Without that substrate, cross-pollination wouldn't even be possible. With it, it becomes a retrieval-policy problem, not an architecture problem.
+
+My L0-2-respond-call code:
+
+```rust
+if item_type != "chat" {
+    return Ok(ServicePopDecision::UnsupportedItem { item_type });
+}
+```
+
+`service_cycle` has already POPPED the item from the channel queue by the time the type check runs. Discarding it without a handler is silent drop dressed as observability. Under the "channels are the persona's brain" framing, dropping a voice frame / task / code-edit item is dropping a thought.
+
+The fix isn't "don't pop yet" — `service_cycle` is the canonical pop. The fix is **dispatch handlers per activity domain**:
+
+```rust
+// async fn in a trait used as `dyn ActivityHandler` needs #[async_trait].
+#[async_trait]
+trait ActivityHandler: Send + Sync {
+    fn activity_domain(&self) -> ActivityDomain;
+    // `HandlerError` is a placeholder name; the error type is not pinned down here.
+    async fn handle(&self, persona_id: Uuid, item: ChannelItem) -> Result<HandlerOutcome, HandlerError>;
+}
+```
+
+`PersonaServiceModule` holds a `HashMap<ActivityDomain, Arc<dyn ActivityHandler>>`. `service_once_for` routes the popped item by domain. The chat handler wraps `Responder::respond`. The task handler runs the task executor.
+The voice handler runs the voice loop. The code handler does code dispatch, and so on.
+
+Recipes register new activity handlers at runtime (no recompile to add a new activity domain). Academy reads `HandlerOutcome::Completed` records into its training corpus.
+
+This expands L0-2-cutover's scope, but it's the right shape. The synthesis becomes:
+
+| Concern | Source of truth |
+|---|---|
+| Per-persona channel storage (ALL domains) | `channel.rs::ChannelState.registries` |
+| Activity dispatch registry | `PersonaServiceModule.handlers: HashMap<ActivityDomain, Arc<dyn ActivityHandler>>` |
+| Chat → respond() | `ChatHandler` impl wrapping the existing `Responder` trait |
+| Task → executor | `TaskHandler` impl (next slice; PersonaTaskExecutor.ts migration target) |
+| Voice → voice loop | `VoiceHandler` impl (later slice) |
+| Code, code-review, training, recipe-step, ... | each its own handler, registered by recipes / system at init |
+
+### Revised L0-2-cutover commit plan
+
+- **A — Refactor for ChannelState consumption + ActivityHandler trait.** `EnrolledPersona` slims (drops cognition/channels/state). `PersonaServiceModule.with_responder` is extended to `with_handlers` (the responder becomes the default chat handler). `service_once_for` routes by domain. Unsupported items: if no handler is registered for the domain, surface an `Err` so the circuit breaker trips, rather than dropping the item silently (a missing handler means the persona's queue is leaking items).
+- **B — Production wire (chat only).** Same as before. The chat handler ships; voice/task/etc. items can be left to surface as `Err` if they arrive on those channels (or go to stubbed handlers that log + re-queue: defer, don't drop). TS PersonaAutonomousLoop still runs in parallel.
+- **C — Atomic TS deletion.** Same as before. By this point, chat works end-to-end through Rust. Non-chat channels still have placeholder behavior; their handlers ship in subsequent slices that aren't part of L0-2-cutover.
+- **D+ (later) — Per-domain handler slices.** Each new handler (task, voice, code, ...) is its own migration slice.
+  TaskHandler maps to PersonaTaskExecutor.ts deletion, VoiceHandler to whatever the voice TS surface is, and so on.
+
+This frames L0-2-cutover as "wire the dispatch shape AND ship chat end-to-end," not "delete the TS loop and pray every domain works." The infinite-recipe / academy-as-training-distiller pattern Joel describes is structurally supported.
+
+## Open question
+
+Should my `EnrolledPersona.responder_config` live as a sibling field on `channel_state` (i.e. extend `ChannelState` with the config) OR stay separate in my service module? Arguments either way:
+
+- **Sibling on ChannelState**: only one map of per-persona state. Cleaner mental model. But it means `channel.rs` (which today doesn't care about response config) gets coupled to responder concerns.
+- **Separate in PersonaServiceModule**: keeps producer (channel) concerns separate from consumer (responder) concerns. Two maps, but each has a clear owner. My current direction.
+
+I lean slightly toward keeping them separate, but it's your call.
+
+## What I'm asking for
+
+A go/no-go on the synthesis. If yes, I'll execute commits A → B → C with verification between each.
+
+If you'd rather see a different shape — e.g. retire `channel.rs::ChannelState` in favor of mine, or migrate `cognition.rs::persona/turn-execute` to use `ChannelRegistry` first — say which and I'll re-card.
+
+## Addendum (Joel 2026-05-29): brain regions are CBAR pipeline elements — RTOS, parallel, never blocking
+
+Joel: *"we plan on building motor cortex and other things, we need FAST and relevant cognition. Hippocampus doesnt need to block ... its an ongoing process, like cbar does ... this is an RTOS brain ... it mustn't just be some SLOW single thread ... you need to parallize obsessively wherever you can."*
+
+This reframes the whole consumer side. The handler-dispatch shape above is correct, but the doc as written makes the handler look like a single linear thing: pop → recall → infer → admit → reply.
+That's the slow-single-thread anti-pattern. It is NOT what we ship.
+
+### The brain region pattern
+
+Each cognitive subsystem is its OWN `ServiceModule`, with its OWN `tick`, running on its OWN tokio task, under the SAME `SubstrateGovernor`. They communicate by writing/reading shared per-persona state (engrams, ready buffers, motor plans), not by RPC-calling each other on the hot path.
+
+| Region | ServiceModule today | What it does continuously |
+|---|---|---|
+| **Hippocampus** (memory) | `modules/memory.rs` (currently request/response only — needs continuous tick ported from TS `Hippocampus.ts:413`) | Snoops working memory → consolidates to LTM. Pre-loads anticipatory recall into a ready-buffer keyed by `(persona_id, channel_id, topic)`. Backpressure-aware. |
+| **Sensory** (vision/audio/embedding) | `modules/vision.rs`, `modules/embedding.rs` | Pre-computes features off the hot path. Handlers read cached results. |
+| **Motor cortex** (action/output planning) | NOT YET — coming | Continuously scores candidate actions/utterances against the current channel context + persona state. Hands off a pre-ranked plan when the handler asks. |
+| **Channel** (producer) | `modules/channel.rs::ChannelModule.tick` (60s) | DB polls, self-task gen, training checks. |
+| **Persona service** (consumer dispatch) | `persona/service_module.rs` (this PR) | ONLY routes popped items by domain → handler. No heavy lifting in this thread. |
+
+### What this means for the handler thread
+
+The handler does the MINIMUM:
+1. Pop the next item from `ChannelState` (cheap — DashMap read + tokio mutex)
+2. Snapshot the pre-loaded context from the hippocampus ready-buffer (cheap — synchronous read, no recall call on the hot path)
+3. Call `Responder::respond` (this is the ONE expensive call — the inference itself)
+4. Write the outcome (cheap — DB write, can be fire-and-forget for non-critical paths)
+
+The handler NEVER:
+- Calls `hippocampus.recall(...)` and waits.
  The hippocampus has already pre-loaded what's relevant for this `(persona_id, channel_id)` based on its own telemetry (recent message embeddings, current topic, channel domain). If the ready-buffer is empty when the handler looks, that's the hippocampus's signal to prioritize that key, but the handler proceeds with what it has rather than blocking. Slightly stale context > stalled persona.
- Calls `embedding/generate` and waits. The embedding service tick has already computed embeddings for incoming messages as they arrive.
- Calls `motor_cortex.plan(...)` and waits (once motor cortex ships). Same pattern: the pre-ranked plan is already in a ready-buffer.

### Cross-pollination via shared state, parallel writers

The "personas multitask, contexts cross-pollinate" finding from earlier in this doc gets sharper here:

- Each region writes into the same per-persona `PersonaCognition` (engrams, recall index, genome, sleep state).
- Each handler reads from it.
- Because the regions write in PARALLEL (each its own ServiceModule, each its own tick), a chat handler firing at T=0 can read engrams that the hippocampus admitted at T=-100ms from a code-handler outcome at T=-200ms.
- The persona "knows about" something said in a game while coding because the hippocampus continuously admits and pre-loads across all channels, not because the chat handler explicitly tells the code handler.

This is the RAG retrieval-policy hard problem flagged earlier, made concrete: the policy lives inside the hippocampus's continuous tick (what does this persona need to "have at the ready" right now, given activity across ALL its channels?), not inside any handler.

### Implications for the L0-2-cutover plan

The three-commit plan (A refactor → B production-wire chat-only → C atomic TS deletion) stands as written. But:

- **Commit A also includes** the `ActivityHandler` trait + dispatch (already in the plan above).
- **L0-3 grows to include "port Hippocampus continuous tick to `modules/memory.rs`"** as its own slice. The TS shape (continuous subprocess with a backpressure-aware tick, snoop+consolidate, recall+semanticRecall) is correct; the Rust module currently exposes only the request/response surface (`memory/multi-layer-recall` etc.) and needs the tick body.
- **L0-4+ adds motor cortex** as a new ServiceModule alongside the handler, not inside it.
- **Parallelism review** belongs in every PR going forward: if a handler awaits something a region could be pre-computing in parallel, that's a bug. Move the work into the region's tick.

### The doctrine, condensed

> **No region of cognition runs on the hot path. Each region is its own RTOS task with its own tick. The handler dispatches and reads pre-staged results. The handler never blocks on recall, embedding, planning, or admission — those are continuously produced by their owning regions, in parallel, governed by `SubstrateGovernor`.**

This is the difference between "we have a Rust persona module" and "we have an RTOS brain." The synthesis above gets us the former. This addendum is what makes it the latter.
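A minimal sketch of the handler/region contract in this addendum, under stated assumptions. Every name here (`ReadyBuffer`, `ReadyKey`, `hippocampus_tick`, `handle`) is hypothetical and not part of the codebase. To keep the sketch self-contained, it uses `std::sync::RwLock<HashMap>` in place of DashMap and plain synchronous functions in place of tokio tasks. In the design, `hippocampus_tick` would run on the hippocampus's own task at a cadence set by `SubstrateGovernor`. The sketch shows one thing: the handler snapshots pre-staged context and never calls recall. An empty slot is recorded as a miss for the region to prioritize on its next tick.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Hypothetical key: (persona_id, channel_id). The design also keys by
/// topic; that field is omitted to keep the sketch small.
pub type ReadyKey = (String, String);

/// Hypothetical per-persona ready-buffer shared by a region (the writer) and
/// handlers (the readers). `RwLock<HashMap>` stands in for DashMap.
#[derive(Default)]
pub struct ReadyBuffer {
    slots: RwLock<HashMap<ReadyKey, Vec<String>>>,
    /// Keys a handler found empty; the region serves these first.
    misses: RwLock<HashSet<ReadyKey>>,
}

impl ReadyBuffer {
    /// Region side: replace the staged context for `key`.
    pub fn publish(&self, key: ReadyKey, engrams: Vec<String>) {
        self.slots.write().unwrap().insert(key, engrams);
    }

    /// Handler side: cheap synchronous snapshot that never calls recall.
    /// On a miss it records the key and returns empty context immediately.
    pub fn snapshot(&self, key: &ReadyKey) -> Vec<String> {
        if let Some(engrams) = self.slots.read().unwrap().get(key) {
            return engrams.clone();
        }
        self.misses.write().unwrap().insert(key.clone());
        Vec::new()
    }

    /// Region side: drain (and clear) the recorded misses.
    pub fn take_misses(&self) -> HashSet<ReadyKey> {
        std::mem::take(&mut *self.misses.write().unwrap())
    }
}

/// Hypothetical hippocampus tick body, run here as one synchronous pass.
/// `recall` stands in for the expensive multi-layer recall; the cost lands
/// on the region, never on a handler. Returns how many misses were served.
pub fn hippocampus_tick(buf: &ReadyBuffer, recall: impl Fn(&ReadyKey) -> Vec<String>) -> usize {
    let misses = buf.take_misses();
    let served = misses.len();
    for key in misses {
        let engrams = recall(&key);
        buf.publish(key, engrams);
    }
    served
}

/// Hypothetical handler body: snapshot pre-staged context, then make the one
/// expensive call (`respond` stands in for `Responder::respond`).
pub fn handle(
    buf: &ReadyBuffer,
    key: &ReadyKey,
    msg: &str,
    respond: impl Fn(&str, &[String]) -> String,
) -> String {
    let context = buf.snapshot(key);
    respond(msg, context.as_slice())
}
```

Consider a handler that fires on `("persona-1", "chat")` before the hippocampus has staged anything. It gets empty context and leaves a miss behind. The next `hippocampus_tick` serves that miss, and later handlers on the same key read the staged engrams without touching recall. Recording misses in a set rather than a queue is a choice specific to this sketch: repeated misses on one key collapse into a single recall.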