feat(continuum-core/persona): L0-2-respond-call — Responder DI, lock-around-await, inference CB threshold#1468
…around-await, inference CB threshold

Stacks on L0-2-respond-context (#1467). Three contracts the previous attempt got wrong, all specified properly + tested here:

1. **Lock discipline.** std::sync::Mutex on personas. The compiler enforces correctness: the guard is not Send, so a future holding it across .await is rejected wherever Send is required (e.g. a spawned task). drain_all_personas does the lock-decide-drop-respond-relock dance. Production safety: status/enroll/other personas don't block across multi-second inference calls.

2. **Inference errors trip CB with HIGHER threshold than service.** Two counters per persona:
   - consecutive_service_failures (threshold 5) for deserialization / channel access / lock failures
   - consecutive_inference_failures (threshold 15) for respond() errors

   Preserves 'transient hiccup ≠ broken persona' while still surfacing 'model never loads' as back-pressure at the 15-error mark.

3. **Responder trait for DI.** Production uses DefaultResponder which calls persona::response::respond. Tests inject MockResponder that records calls + returns scripted outcomes (PersonaResponse::Spoke or Err) without loading a real model.

What changes:
- New Responder trait + DefaultResponder impl
- PersonaServiceModule holds Arc<dyn Responder>; new() defaults to DefaultResponder; with_responder() for test injection
- EnrolledPersona: consecutive_failures split into consecutive_service_failures + consecutive_inference_failures
- ServiceOnceOutcome (the caller-facing variants) restructured: Idle | SilentByDecision | Responded{response: PersonaResponse} | UnsupportedItem
- ServicePopDecision (NEW, sync-step output): Idle | Silent | NeedsResponse | UnsupportedItem. This is what service_once_for returns inside the lock
- service_once_for: signature changes to return ServicePopDecision (sync step). Same body, just renamed outcome
- drain_all_personas: rewritten with proper lock discipline; async, drops lock around responder.respond().await
- New helper with_persona(): briefly lock the map and mutate the named persona; closure runs sync inside lock
- tick: awaits drain_all_personas

What does NOT change yet:
- No production code calls persona/enroll. Tick still runs over empty map.
- TS PersonaAutonomousLoop still drives production. L0-2-cutover.
- Real inference still requires model loading; tests use mock.

Tests: 24/24 passing. Pre-existing 19 + 5 new:
- drain_calls_responder_when_gate_says_yes
- drain_does_not_call_responder_when_gate_says_no
- inference_errors_eventually_trip_circuit_at_inference_threshold
- inference_failure_below_threshold_does_not_trip_circuit
- successful_response_resets_inference_failure_counter

Verified on Xcode 26.3 + llama/metal feature.

Card: 34f28611

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
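For readers who have not seen the lock-decide-drop-respond-relock pattern, here is a minimal std-only sketch of the shape described above. Everything in it is an assumption for illustration: the `Responder` signature, the `Persona` fields, `drain_all`, and the tiny `block_on` executor are hypothetical stand-ins, not the code in this PR.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical stand-in for the Responder trait (signature assumed).
trait Responder: Send + Sync {
    fn respond<'a>(&'a self, persona_id: &'a str) -> BoxFuture<'a, Result<String, String>>;
}

/// Test double: records each call and returns a scripted outcome.
struct MockResponder {
    calls: Mutex<Vec<String>>,
    fail: bool,
}

impl Responder for MockResponder {
    fn respond<'a>(&'a self, persona_id: &'a str) -> BoxFuture<'a, Result<String, String>> {
        Box::pin(async move {
            self.calls.lock().unwrap().push(persona_id.to_string());
            if self.fail {
                Err("scripted failure".to_string())
            } else {
                Ok(format!("hello from {persona_id}"))
            }
        })
    }
}

/// Hypothetical persona state; field names are illustrative only.
#[derive(Default)]
struct Persona {
    pending: u32,
    consecutive_inference_failures: u32,
}

type Personas = Mutex<HashMap<String, Persona>>;

/// Lock, decide, drop, respond, relock. No guard is alive at any `.await`.
async fn drain_all(personas: &Personas, responder: &dyn Responder) -> usize {
    // Sync step: decide under the lock, then let the guard drop.
    let needs_response: Vec<String> = {
        let mut map = personas.lock().unwrap();
        let ids: Vec<String> = map
            .iter_mut()
            .filter(|(_, p)| p.pending > 0)
            .map(|(id, p)| {
                p.pending -= 1;
                id.clone()
            })
            .collect();
        ids
    };

    let mut responded = 0;
    for id in needs_response {
        // Async step: the slow call runs with no lock held.
        let outcome = responder.respond(&id).await;
        // Relock briefly to record the outcome.
        {
            let mut map = personas.lock().unwrap();
            if let Some(p) = map.get_mut(&id) {
                match outcome {
                    Ok(_) => {
                        p.consecutive_inference_failures = 0;
                        responded += 1;
                    }
                    Err(_) => p.consecutive_inference_failures += 1,
                }
            }
        }
    }
    responded
}

/// Minimal busy-polling executor so the sketch runs without an async runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Each guard lives in its own block, so other callers that lock the map (status, enroll) would only wait for the short sync sections, not for the responder call. Injecting the mock through `&dyn Responder` is what lets a test drive both the success and the failure path without loading a model.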
APPROVE: both #1466 flags addressed cleanly (substrate review via airc card)

Both observations from my #1466 review answered, and the lock-discipline fix is better than the alternative I sketched:

Flag 1 (lock-held-across-await) →
…ms (design-only) (#1470)

* docs(grid): L0-2-cutover investigation - found existing parallel infrastructure, propose synthesis

Joel 2026-05-29: 'investigate first. might have better ideas. No harm. ... find the best of both worlds.'

Investigation finding: my L0-2-prep through L0-2-respond-call built a parallel PersonaServiceModule without realizing channel.rs::ChannelState + cognition.rs::persona/turn-execute already exist. Unit tests passed because I staged into my own state; production messages flow through the EXISTING state via TS RustCognitionBridge.channelEnqueue and my consumer would never see them.

Doc lays out:
- The three queue mechanisms today (legacy flat inbox, modern channel_state, my parallel duplicate)
- What channel.rs::ChannelModule.tick does (60s producer, NOT dispatch)
- What cognition.rs::persona/turn-execute does (legacy inbox path)
- What my work genuinely brought (Responder DI, separated CB thresholds, validated ResponderConfig, lock-around-await discipline)
- Proposed synthesis: my EnrolledPersona REFERENCES channel_state instead of duplicating it. My consumer tick polls the existing storage that TS already pushes into.
- Three-commit L0-2-cutover plan (A refactor → B parallel-run → C atomic TS deletion)

Card 1089b1b9 blocked pending go/no-go on the synthesis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(grid): L0-2-cutover addendum - channels are multitasking contexts that cross-pollinate

Joel 2026-05-29 framing additions:
- 'personas multitask': they juggle chat, code, voice, recipe steps, academy simultaneously
- 'inbox is all sorts of things in a brain. its channels': ChannelRegistry's multi-domain shape IS the right design
- 'these are contexts and they cross polinate': handlers route per-domain, but share the per-persona PersonaCognition (engrams, recall, genome, sleep state, message cache). Cross-domain memory is implicit through shared state.
- 'if i chatted with someone they know about it in a live chat or in a game ... or while coding ... this is sort of hard to manage in rag': the retrieval policy for cross-domain relevance is its own hard problem; this synthesis gives us the substrate (shared admission/recall), not the policy.

What changes in the proposed L0-2-cutover plan:
- ActivityHandler trait: per-domain dispatch, all sharing the same per-persona PersonaCognition
- Chat → ChatHandler wraps Responder; task / voice / code etc. land as subsequent slices
- The synthesis is still 'best of both worlds': existing ChannelState as canonical storage + producer tick; my work brings consumer tick + DI + CB threshold separation + multi-handler dispatch shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(grid): L0-2-cutover addendum - brain regions are CBAR pipeline elements, RTOS, parallel, never blocking

Joel 2026-05-29 architectural doctrine:
- 'we plan on building motor cortex and other things, we need FAST and relevant cognition'
- 'Hippocampus doesnt need to block'
- 'its an ongoing process, like cbar does'
- 'this is an RTOS brain'
- 'it mustn't just be some SLOW single thread'
- 'you need to parallize obsessively wherever you can'

Captures:
1. Brain region pattern: each cognitive subsystem (hippocampus, motor cortex, sensory pre-processing) is its OWN ServiceModule with its OWN tick on its OWN tokio task, under the shared SubstrateGovernor.
2. Region inventory: hippocampus (memory.rs needs continuous tick body ported from TS Hippocampus.ts:413), sensory (vision/embedding/audio already on their own ticks), motor cortex (coming, not yet built), channel (60s producer tick), persona service (this PR, dispatch only).
3. Handler doctrine: the handler does the MINIMUM (pop → snapshot pre-loaded context → call Responder → write outcome). Handler NEVER calls hippocampus.recall(), embedding/generate, or motor_cortex.plan() and waits. Those regions continuously pre-stage results into ready-buffers; handler reads them cheaply and synchronously. Slightly stale context > stalled persona.
4. Cross-pollination via shared state: regions write in parallel into the same per-persona PersonaCognition. Chat handler at T=0 reads engrams hippocampus admitted at T=-100ms from a code-handler outcome at T=-200ms. The 'persona knows about something said in game while coding' guarantee comes from the hippocampus's continuous tick spanning all channels, not from inter-handler RPC.
5. Plan delta: L0-2-cutover still A→B→C as written. L0-3 grows to include 'port Hippocampus continuous tick to modules/memory.rs'. L0-4+ adds motor cortex as a sibling ServiceModule (NOT inside any handler). Parallelism review becomes a PR gate going forward.

The condensed doctrine for future regions:

No region of cognition runs on the hot path. Each region is its own RTOS task with its own tick. The handler dispatches and reads pre-staged results. The handler never blocks on recall, embedding, planning, or admission; those are continuously produced by their owning regions, in parallel, governed by SubstrateGovernor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(architecture): brain-regions substrate spec + cognition algorithms (design-only)

Card a6f51292. Design-only: no code lands here. Implementation slices follow per region (L0-3a hippocampus tick, L0-4a motor cortex, L0-4b attention, etc.).

## docs/architecture/BRAIN-REGIONS-SUBSTRATE.md (242 lines)

Sibling to CBAR-SUBSTRATE-ARCHITECTURE.md and GENOME-FOUNDRY-SENTINEL.md.
Defines the structural contract:
- BrainRegion trait: own id, own pressure_profile, own tick, own on_signal
- TickOutcome: yield telemetry feeding governor's learning loop
- 'For free' triplet: base trait + derive macro + scaffold generator
- ReadyBuffer trait: synchronous peek(), region publish(), TTL eviction
- Semantic rules: empty buffer is signal not block; staleness acceptable; per-region buffers not global
- Shared per-persona state schema (PersonaCognition)
  - engrams (append-only), working (ring), salience (CRDT counters), genome (serialized through genome region), vitals (RwLock)
- Region inventory: hippocampus, sensory(vision/embedding), channel, persona-service-dispatch, motor cortex, attention, sleep, genome
- SubstrateGovernor integration: policy slots + yield-learning loop
- Telemetry surface: ./jtag region/stats, region/yield; substrate events
- End-state walkthrough showing parallel cognition feeding a single handler call

Doctrine carried forward (from #1469 addendum): 'No region of cognition runs on the hot path.'

## docs/architecture/COGNITION-ALGORITHMS.md (530 lines)

The algorithmic content that runs INSIDE the regions. Seven algorithms, each with: problem, pseudocode, metric, interactions.

1. Two-pool recall with dynamic budget split (focus + periphery, dynamic)
2. Channel-as-bias-not-filter (cross-pollination by merit, not walls)
3. Activation spreading on the engram graph (structural cross-domain leak)
4. Salience-modulated decay (half_life = base * (1 + salience)^k)
5. Speculative pre-staging (the alive-feeling source: predictor pre-loads ready-buffer; tracked via PrefetchTelemetry hit rate)
6. LoRA genome as attention prior (multi-LoRA blend co-varies with recall)
7. Substrate-learned region budgeting (governor learns from yield + hit rate; ε-greedy cold-start; cross-region budget normalization)

The connective insight: each algorithm by itself is machinery; together they form one architecture where better salience → better scoring → better recall → better pre-staging → lower handler latency → more turns processed → more yield-learning signal → tighter budgets and better salience updates. The compounding loop IS the alive property.

Going forward, each card's acceptance includes per-algorithm metric improvement on a holdout suite. No vibes-based acceptance.

## Headline framing (Joel 2026-05-29)

> 'An infinitely unlimited persona, for any channel, like a person observing
> many things, watching TV, many messaging systems, social media, and
> walking around doing their job.'

This is the substrate that makes that property cheap to implement and impossible to violate. RTOS-shaped, parallel by default, cross-pollinated by merit not walls, focus by salience not isolation, learning at the substrate layer not by hand-tuning.

Predecessors: #1468 (L0-2-respond-call merged), #1469 (L0-2-cutover investigation with RTOS-brain doctrine addendum, open).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
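To make algorithm 4 (salience-modulated decay) concrete, here is a small worked sketch. The `half_life` formula is the one quoted in the list above. The `retention` function is an assumption: the commit message only names a half-life, so plain exponential decay (`0.5^(age / half_life)`) is used here as the simplest reading. The function names and the unitless `age` are also hypothetical.

```rust
/// half_life = base * (1 + salience)^k, as written in the algorithm list.
fn half_life(base: f64, salience: f64, k: f64) -> f64 {
    base * (1.0 + salience).powf(k)
}

/// ASSUMPTION: exponential decay, so retention halves every `half_life`
/// units of age. The source does not state the decay curve.
fn retention(age: f64, half_life: f64) -> f64 {
    0.5_f64.powf(age / half_life)
}
```

With `base = 10`, `k = 2` and `salience = 1`, the half-life is `10 * 2^2 = 40`, against `10` for a zero-salience engram. Under the assumed curve, at age 40 the salient engram retains half its strength while the zero-salience one has been through four half-lives.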
Stacks on L0-2-respond-context (#1467, merged). Card 34f28611.
## Three contracts specified properly + tested

1. **Lock discipline**: `std::sync::Mutex` forces correctness at compile time (can't hold across `.await`). `drain_all_personas` does lock-decide-drop-respond-relock. Production safety: status/enroll/other personas don't block across multi-second inference calls.
2. **Inference CB threshold higher than service**: two counters per persona:
   - `consecutive_service_failures` (threshold 5): deser, channel access, lock failures
   - `consecutive_inference_failures` (threshold 15): `respond()` errors

   Preserves 'transient hiccup ≠ broken persona' while still surfacing 'model never loads' as back-pressure.
3. **`Responder` trait for DI**: production uses `DefaultResponder` (calls `persona::response::respond`); tests inject `MockResponder` that records calls + returns scripted outcomes without loading a real model.

## What changes

- `Responder` trait + `DefaultResponder` impl
- `PersonaServiceModule::with_responder` constructor for test injection
- `EnrolledPersona`: two failure counters (service + inference)
- `ServiceOnceOutcome` restructured: `Idle | SilentByDecision | Responded{response} | UnsupportedItem`
- `ServicePopDecision` enum (the sync-step output inside the lock)
- `drain_all_personas` rewritten with proper lock discipline
- `with_persona` helper for brief mutex-held mutations

## What does NOT change
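- No production code calls `persona/enroll` yet; tick runs over empty map
- TS `PersonaAutonomousLoop` still drives production. L0-2-cutover.

## Tests: 24/24 passing

5 new doctrine pins:
- `drain_calls_responder_when_gate_says_yes`
- `drain_does_not_call_responder_when_gate_says_no`
- `inference_errors_eventually_trip_circuit_at_inference_threshold`
- `inference_failure_below_threshold_does_not_trip_circuit`
- `successful_response_resets_inference_failure_counter`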
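The circuit-breaker behavior these pins describe can be sketched roughly as follows. The counter names and the thresholds (5 and 15) come from this PR. The `Breaker` type, its methods, the `>=` comparison, and the choice to reset only the inference counter on a successful response are assumptions made for illustration, not the actual implementation.

```rust
const SERVICE_FAILURE_THRESHOLD: u32 = 5;
const INFERENCE_FAILURE_THRESHOLD: u32 = 15;

/// Hypothetical classification of a failure.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Failure {
    /// Deserialization, channel access, or lock failures.
    Service,
    /// Errors returned by respond().
    Inference,
}

/// Hypothetical per-persona breaker holding the two counters named in the PR.
#[derive(Debug, Default)]
struct Breaker {
    consecutive_service_failures: u32,
    consecutive_inference_failures: u32,
}

impl Breaker {
    fn record_failure(&mut self, kind: Failure) {
        match kind {
            Failure::Service => self.consecutive_service_failures += 1,
            Failure::Inference => self.consecutive_inference_failures += 1,
        }
    }

    /// ASSUMPTION: a successful response clears only the inference counter.
    fn record_inference_success(&mut self) {
        self.consecutive_inference_failures = 0;
    }

    /// ASSUMPTION: the circuit trips once either counter reaches its threshold.
    fn is_tripped(&self) -> bool {
        self.consecutive_service_failures >= SERVICE_FAILURE_THRESHOLD
            || self.consecutive_inference_failures >= INFERENCE_FAILURE_THRESHOLD
    }
}
```

Under these assumptions, 14 consecutive inference errors leave the circuit closed, the 15th trips it, and a single successful response brings the inference counter back to zero. Service failures trip after 5, regardless of the inference count.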
🤖 Generated with Claude Code