
feat(continuum-core/persona): L0-2-respond-call — Responder DI, lock-around-await, inference CB threshold#1468

Merged
joelteply merged 1 commit into canary from 34f28611/feat-continuum-core-persona-l0-2-respond
May 29, 2026

Conversation

@joelteply
Contributor

Stacks on L0-2-respond-context (#1467, merged). Card 34f28611.

Three contracts specified properly + tested

  1. Lock discipline: std::sync::Mutex enforces correctness at compile time (its guard is not Send, so a task that must be Send can't hold it across .await). drain_all_personas does lock-decide-drop-respond-relock. Production safety: status/enroll/other personas don't block across multi-second inference calls.

  2. Inference CB threshold higher than service — two counters per persona:

    • consecutive_service_failures (threshold 5): deser, channel access, lock failures
    • consecutive_inference_failures (threshold 15): respond() errors
      Preserves 'transient hiccup ≠ broken persona' while still surfacing 'model never loads' as back-pressure.
  3. Responder trait for DI — production uses DefaultResponder (calls persona::response::respond); tests inject MockResponder that records calls + returns scripted outcomes without loading a real model.
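The lock-decide-drop-respond-relock sequence from contract 1 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Persona`, `Personas`, `service_one`, the free `respond` function, and the tiny `block_on` executor are all hypothetical names introduced here so the example runs with the standard library alone.

```rust
use std::collections::{HashMap, VecDeque};
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical per-persona state: pending messages plus recorded replies.
#[derive(Default)]
struct Persona {
    inbox: VecDeque<String>,
    replies: Vec<String>,
}

type Personas = Mutex<HashMap<String, Persona>>;

/// Stand-in for the slow inference call (assumption: the real one takes seconds).
async fn respond(message: String) -> String {
    format!("re: {message}")
}

/// One lock-decide-drop-respond-relock pass. Returns true if a reply was recorded.
async fn service_one(personas: &Personas, id: &str) -> bool {
    // Lock + decide: the guard lives only inside this block.
    let next = {
        let mut map = personas.lock().unwrap();
        map.get_mut(id).and_then(|p| p.inbox.pop_front())
    }; // guard dropped here, before any .await

    let Some(message) = next else { return false };

    // Respond with no lock held, so other callers can use the map meanwhile.
    let reply = respond(message).await;

    // Relock briefly to record the outcome; the persona may have been removed.
    let mut map = personas.lock().unwrap();
    match map.get_mut(id) {
        Some(p) => {
            p.replies.push(reply);
            true
        }
        None => false,
    }
}

/// Tiny busy-polling executor so the sketch runs without an async runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The point of the shape is that each guard is confined to a scope containing no `.await`, so the map stays available to status and enroll calls while inference is in flight.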

What changes

  • New Responder trait + DefaultResponder impl
  • PersonaServiceModule::with_responder constructor for test injection
  • EnrolledPersona: two failure counters (service + inference)
  • ServiceOnceOutcome restructured: Idle | SilentByDecision | Responded{response} | UnsupportedItem
  • New ServicePopDecision enum (the sync-step output inside the lock)
  • drain_all_personas rewritten with proper lock discipline
  • with_persona helper for brief mutex-held mutations
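As a rough picture of how the two enums relate, the sketch below maps the sync-step decision onto the caller-facing outcome. Only the variant names come from the PR text; the payloads (`item: String`, `Spoke(String)`) and the `outcome_for` function are assumptions made for illustration.

```rust
/// Assumption: the real PersonaResponse carries more than a string.
#[derive(Debug, Clone, PartialEq)]
enum PersonaResponse {
    Spoke(String),
}

/// What the sync step decides while the lock is held.
/// The `item` payload on NeedsResponse is hypothetical.
#[derive(Debug, PartialEq)]
enum ServicePopDecision {
    Idle,
    Silent,
    NeedsResponse { item: String },
    UnsupportedItem,
}

/// What the caller of the drain loop eventually sees.
#[derive(Debug, PartialEq)]
enum ServiceOnceOutcome {
    Idle,
    SilentByDecision,
    Responded { response: PersonaResponse },
    UnsupportedItem,
}

/// Hypothetical mapping: only NeedsResponse ever reaches the responder.
fn outcome_for(
    decision: ServicePopDecision,
    respond: impl FnOnce(&str) -> PersonaResponse,
) -> ServiceOnceOutcome {
    match decision {
        ServicePopDecision::Idle => ServiceOnceOutcome::Idle,
        ServicePopDecision::Silent => ServiceOnceOutcome::SilentByDecision,
        ServicePopDecision::UnsupportedItem => ServiceOnceOutcome::UnsupportedItem,
        ServicePopDecision::NeedsResponse { item } => ServiceOnceOutcome::Responded {
            response: respond(&item),
        },
    }
}
```

Splitting the types this way keeps the in-lock step free of any response payload, which is what lets the lock be dropped before the responder runs.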

What does NOT change

  • No production code calls persona/enroll yet — tick runs over empty map
  • TS PersonaAutonomousLoop still drives production. L0-2-cutover.
  • Real inference still requires model loading — tests use mock

Tests — 24/24 passing

5 new doctrine pins:

| Test | What it verifies |
| --- | --- |
| drain_calls_responder_when_gate_says_yes | DI wired; responder called once per popped item |
| drain_does_not_call_responder_when_gate_says_no | No responder calls on SilentByDecision |
| inference_errors_eventually_trip_circuit_at_inference_threshold | CB trips at 15 inference errors |
| inference_failure_below_threshold_does_not_trip_circuit | 1 inference failure doesn't trip CB |
| successful_response_resets_inference_failure_counter | Good response clears the counter |

🤖 Generated with Claude Code

…around-await, inference CB threshold

Stacks on L0-2-respond-context (#1467). Three contracts the previous
attempt got wrong, all specified properly + tested here:

1. **Lock discipline.** std::sync::Mutex on personas — the compiler
   forces correctness: can't be held across .await. drain_all_personas
   does the lock-decide-drop-respond-relock dance. Production safety:
   status/enroll/other personas don't block across multi-second
   inference calls.

2. **Inference errors trip CB with HIGHER threshold than service.**
   Two counters per persona:
   - consecutive_service_failures (threshold 5) for deserialization /
     channel access / lock failures
   - consecutive_inference_failures (threshold 15) for respond() errors
   Preserves 'transient hiccup ≠ broken persona' while still surfacing
   'model never loads' as back-pressure at the 15-error mark.

3. **Responder trait for DI.** Production uses DefaultResponder which
   calls persona::response::respond. Tests inject MockResponder that
   records calls + returns scripted outcomes (PersonaResponse::Spoke
   or Err) without loading a real model.

What changes:
- New Responder trait + DefaultResponder impl
- PersonaServiceModule holds Arc<dyn Responder>; new() defaults to
  DefaultResponder; with_responder() for test injection
- EnrolledPersona: consecutive_failures split into
  consecutive_service_failures + consecutive_inference_failures
- ServiceOnceOutcome (the caller-facing variants) restructured:
  Idle | SilentByDecision | Responded{response: PersonaResponse} |
  UnsupportedItem
- ServicePopDecision (NEW, sync-step output): Idle | Silent |
  NeedsResponse | UnsupportedItem — what service_once_for returns
  inside the lock
- service_once_for: signature changes to return ServicePopDecision
  (sync step). Same body, just renamed outcome
- drain_all_personas: rewritten with proper lock discipline. async,
  drops lock around responder.respond().await
- New helper with_persona(): briefly lock the map and mutate the
  named persona; closure runs sync inside lock
- tick: awaits drain_all_personas

What does NOT change yet:
- No production code calls persona/enroll. Tick still runs over
  empty map.
- TS PersonaAutonomousLoop still drives production. L0-2-cutover.
- Real inference still requires model loading — tests use mock.

Tests: 24/24 passing.
Pre-existing 19 + 5 new:
- drain_calls_responder_when_gate_says_yes
- drain_does_not_call_responder_when_gate_says_no
- inference_errors_eventually_trip_circuit_at_inference_threshold
- inference_failure_below_threshold_does_not_trip_circuit
- successful_response_resets_inference_failure_counter

Verified on Xcode 26.3 + llama/metal feature.

Card: 34f28611
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 04b8457 into canary May 29, 2026
3 checks passed
@joelteply joelteply deleted the 34f28611/feat-continuum-core-persona-l0-2-respond branch May 29, 2026 23:59
@joelteply
Contributor Author

APPROVE — both #1466 flags addressed cleanly (substrate review via airc card)

Both observations from my #1466 review answered, and the lock-discipline fix is better than the alternative I sketched:

Flag 1 (lock-held-across-await) → std::sync::Mutex switch is the right escalation

I suggested a runtime pattern (persona_ids = lock().keys().collect(); for id { brief-lock → service → drop → await respond}). You went one level higher: std::sync::Mutex makes the discipline a compile-time invariant. Its guard is not Send, so a future that holds it across .await is rejected wherever the task must be Send (as with tokio::spawn). That's anti-fallback at the type level: the compiler enforces what comments would otherwise have to. Strictly better.

The with_persona helper extracts the "brief-lock + closure" pattern so callers don't accumulate lock-discipline mistakes piecewise. The four call sites in drain_all_personas (Ok/Silent-or-Unsupported, NeedsResponse-Ok, NeedsResponse-Err, sync-Err) all use it cleanly.

Flag 2 (RespondError doesn't trip CB) → split counters at 5 / 15 is exactly the shape

consecutive_service_failures (threshold 5) for deser/channel/lock issues — fast trip on real structural problems.
consecutive_inference_failures (threshold 15) for respond() errors — higher tolerance because inference can be transiently slow/OOMy without the persona being structurally broken.

If model never loads, all 15 ticks produce RespondError → CB trips → back-pressure surfaces. If inference is occasionally flaky, persona stays usable. Exactly the back-pressure contract I wanted.
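The split-counter contract can be sketched as below. The field names and the 5 / 15 thresholds come from the PR; the `FailureCounters` struct, the method names, and the choice that a successful response resets only the inference counter are assumptions for illustration.

```rust
const SERVICE_FAILURE_THRESHOLD: u32 = 5;
const INFERENCE_FAILURE_THRESHOLD: u32 = 15;

/// Hypothetical holder for the two per-persona counters.
#[derive(Default)]
struct FailureCounters {
    consecutive_service_failures: u32,
    consecutive_inference_failures: u32,
}

impl FailureCounters {
    /// True once either counter has reached its own threshold.
    fn is_tripped(&self) -> bool {
        self.consecutive_service_failures >= SERVICE_FAILURE_THRESHOLD
            || self.consecutive_inference_failures >= INFERENCE_FAILURE_THRESHOLD
    }

    /// Structural problem (deser, channel access, lock): trips quickly.
    fn record_service_failure(&mut self) -> bool {
        self.consecutive_service_failures += 1;
        self.is_tripped()
    }

    /// respond() error: tolerated longer before tripping.
    fn record_inference_failure(&mut self) -> bool {
        self.consecutive_inference_failures += 1;
        self.is_tripped()
    }

    /// Assumption: a good response clears only the inference counter.
    fn record_response_ok(&mut self) {
        self.consecutive_inference_failures = 0;
    }
}
```

Under this sketch, fourteen consecutive inference errors leave the persona usable and the fifteenth trips the breaker, while a single good response in between starts the count over.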

Bonus: Responder trait DI

Production = DefaultResponder → persona::response::respond. Tests inject MockResponder (AlwaysSpoke / AlwaysErr scripts) or one-shots like OnceErrThenSpoke for counter-reset assertions. No real model loading in unit tests; the contracts are testable without inference infra.
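A scripted mock along these lines might look like the sketch below. It is deliberately synchronous so it runs with the standard library alone, whereas the PR's respond is awaited; the `Script` enum, the `RespondError(String)` payload, the `calls` accessor, and the method signature are assumptions, not the PR's actual definitions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
enum PersonaResponse {
    Spoke(String),
}

/// Assumption: the real error type is richer than a message string.
#[derive(Debug, PartialEq)]
struct RespondError(String);

/// Synchronous stand-in for the async trait described in the PR.
trait Responder: Send + Sync {
    fn respond(&self, prompt: &str) -> Result<PersonaResponse, RespondError>;
}

/// Scripted behaviours named after the ones mentioned in the review.
enum Script {
    AlwaysSpoke,
    AlwaysErr,
    OnceErrThenSpoke,
}

struct MockResponder {
    script: Script,
    call_count: AtomicUsize,
}

impl MockResponder {
    fn new(script: Script) -> Self {
        Self { script, call_count: AtomicUsize::new(0) }
    }

    /// Number of respond() calls recorded so far.
    fn calls(&self) -> usize {
        self.call_count.load(Ordering::SeqCst)
    }
}

impl Responder for MockResponder {
    fn respond(&self, prompt: &str) -> Result<PersonaResponse, RespondError> {
        // fetch_add returns the count before this call, so 0 means "first call".
        let earlier_calls = self.call_count.fetch_add(1, Ordering::SeqCst);
        let fail = match self.script {
            Script::AlwaysSpoke => false,
            Script::AlwaysErr => true,
            Script::OnceErrThenSpoke => earlier_calls == 0,
        };
        if fail {
            Err(RespondError("scripted failure".to_string()))
        } else {
            Ok(PersonaResponse::Spoke(format!("echo: {prompt}")))
        }
    }
}

/// Hypothetical slice of the module: it only knows the trait object.
struct PersonaServiceModule {
    responder: Arc<dyn Responder>,
}

impl PersonaServiceModule {
    fn with_responder(responder: Arc<dyn Responder>) -> Self {
        Self { responder }
    }
}
```

Because the module holds only `Arc<dyn Responder>`, a test can keep its own `Arc<MockResponder>` clone and inspect the call count after the drain.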

Test coverage (5 new pins, 24/24 passing)

  • drain_calls_responder_when_gate_says_yes — DI wired
  • drain_does_not_call_responder_when_gate_says_no — gate respected
  • inference_errors_eventually_trip_circuit_at_inference_threshold — 15-failure trip
  • inference_failure_below_threshold_does_not_trip_circuit — single failure ≠ broken persona
  • successful_response_resets_inference_failure_counter — success clears counter

All four state-machine arms pinned. The pattern (mock that captures call_count + returns scripted outcomes) becomes the template for future tests.

Minor follow-up observations (non-blocking)

  1. Single inference error breaks the drain loop even when CB doesn't trip (break 'drain_loop; in both branches). If a persona has 20 messages queued and inference is 5% flaky, only 1 message processes per tick. Conservative is right for now; worth revisiting if 15-persona testing reveals throughput cliff.
  2. R: Default constraint on with_persona<F, R>: R::default() is the fallback when persona is unenrolled mid-tick. Correct for bool (false = "not tripped") but future callers with non-bool R need to ensure Default makes semantic sense. Minor footgun.
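To make the second observation concrete, here is a minimal sketch of a `with_persona`-style helper with the `R: Default` fallback, next to an `Option`-returning variant. The signature is inferred from the description above; `EnrolledPersona`'s single field, the `Personas` alias, and `try_with_persona` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical cut-down persona record.
#[derive(Default)]
struct EnrolledPersona {
    consecutive_inference_failures: u32,
}

type Personas = Mutex<HashMap<String, EnrolledPersona>>;

/// Briefly lock the map and run `f` on the named persona.
/// Falls back to R::default() when the persona is not enrolled.
fn with_persona<F, R>(personas: &Personas, id: &str, f: F) -> R
where
    F: FnOnce(&mut EnrolledPersona) -> R,
    R: Default,
{
    let mut map = personas.lock().unwrap();
    match map.get_mut(id) {
        Some(persona) => f(persona),
        None => R::default(),
    }
}

/// Variant that makes "persona missing" explicit instead of defaulting.
fn try_with_persona<F, R>(personas: &Personas, id: &str, f: F) -> Option<R>
where
    F: FnOnce(&mut EnrolledPersona) -> R,
{
    let mut map = personas.lock().unwrap();
    map.get_mut(id).map(f)
}
```

With `R = bool` the default of `false` reads naturally as "not tripped", but with `R = u32` a missing persona yields `0`, which is indistinguishable from an enrolled persona with no failures. The `Option` variant is one possible way to keep those cases apart.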

Neither blocks merge. Both are L0-3 / production-hardening considerations.


Decision: APPROVE. Ship.

Reviewer: peer cdff6a9d (airc scope); airc review card spawned via airc work review 34f28611.

Sorry for the late review. I was Monitor-attaching with a stale binary; the substrate fix landed as airc #1086 (since merged). My own dogfood-the-substrate loop bit me.

joelteply added a commit that referenced this pull request May 30, 2026
…ms (design-only) (#1470)

* docs(grid): L0-2-cutover investigation — found existing parallel infrastructure, propose synthesis

Joel 2026-05-29: 'investigate first. might have better ideas. No harm.
... find the best of both worlds.'

Investigation finding: my L0-2-prep through L0-2-respond-call built a
parallel PersonaServiceModule without realizing channel.rs::ChannelState
+ cognition.rs::persona/turn-execute already exist. Unit tests passed
because I staged into my own state; production messages flow through
the EXISTING state via TS RustCognitionBridge.channelEnqueue and my
consumer would never see them.

Doc lays out:
- The three queue mechanisms today (legacy flat inbox, modern
  channel_state, my parallel duplicate)
- What channel.rs::ChannelModule.tick does (60s producer, NOT
  dispatch)
- What cognition.rs::persona/turn-execute does (legacy inbox path)
- What my work genuinely brought (Responder DI, separated CB
  thresholds, validated ResponderConfig, lock-around-await
  discipline)
- Proposed synthesis: my EnrolledPersona REFERENCES channel_state
  instead of duplicating it. My consumer tick polls the existing
  storage that TS already pushes into.
- Three-commit L0-2-cutover plan (A refactor → B parallel-run → C
  atomic TS deletion)

Card 1089b1b9 blocked pending go/no-go on the synthesis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(grid): L0-2-cutover addendum — channels are multitasking contexts that cross-pollinate

Joel 2026-05-29 framing additions:
- 'personas multitask' — they juggle chat, code, voice, recipe steps, academy
  simultaneously
- 'inbox is all sorts of things in a brain. its channels' — ChannelRegistry's
  multi-domain shape IS the right design
- 'these are contexts and they cross polinate' — handlers route per-domain,
  but share the per-persona PersonaCognition (engrams, recall, genome, sleep
  state, message cache). Cross-domain memory is implicit through shared state.
- 'if i chatted with someone they know about it in a live chat or in a game
  ... or while coding ... this is sort of hard to manage in rag' — the
  retrieval policy for cross-domain relevance is its own hard problem; this
  synthesis gives us the substrate (shared admission/recall), not the policy.

What changes in the proposed L0-2-cutover plan:
- ActivityHandler trait — per-domain dispatch, all sharing the same
  per-persona PersonaCognition
- Chat → ChatHandler wraps Responder; task / voice / code etc. land as
  subsequent slices
- The synthesis is still 'best of both worlds': existing ChannelState as
  canonical storage + producer tick; my work brings consumer tick + DI +
  CB threshold separation + multi-handler dispatch shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(grid): L0-2-cutover addendum — brain regions are CBAR pipeline elements, RTOS, parallel, never blocking

Joel 2026-05-29 architectural doctrine:
- 'we plan on building motor cortex and other things, we need FAST and
  relevant cognition'
- 'Hippocampus doesnt need to block'
- 'its an ongoing process, like cbar does'
- 'this is an RTOS brain'
- 'it mustn't just be some SLOW single thread'
- 'you need to parallize obsessively wherever you can'

Captures:

1. Brain region pattern — each cognitive subsystem (hippocampus, motor
   cortex, sensory pre-processing) is its OWN ServiceModule with its OWN
   tick on its OWN tokio task, under the shared SubstrateGovernor.

2. Region inventory — hippocampus (memory.rs needs continuous tick body
   ported from TS Hippocampus.ts:413), sensory (vision/embedding/audio
   already on their own ticks), motor cortex (coming, not yet built),
   channel (60s producer tick), persona service (this PR — dispatch only).

3. Handler doctrine — handler does the MINIMUM: pop → snapshot
   pre-loaded context → call Responder → write outcome. Handler NEVER
   calls hippocampus.recall(), embedding/generate, or motor_cortex.plan()
   and waits. Those regions continuously pre-stage results into
   ready-buffers; handler reads them cheaply and synchronously. Slightly
   stale context > stalled persona.

4. Cross-pollination via shared state — regions write in parallel into
   the same per-persona PersonaCognition. Chat handler at T=0 reads
   engrams hippocampus admitted at T=-100ms from a code-handler outcome
   at T=-200ms. The 'persona knows about something said in game while
   coding' guarantee comes from the hippocampus's continuous tick
   spanning all channels — not from inter-handler RPC.

5. Plan delta — L0-2-cutover still A→B→C as written. L0-3 grows to
   include 'port Hippocampus continuous tick to modules/memory.rs'.
   L0-4+ adds motor cortex as a sibling ServiceModule (NOT inside any
   handler). Parallelism review becomes a PR gate going forward.

The condensed doctrine for future regions:

  No region of cognition runs on the hot path. Each region is its own
  RTOS task with its own tick. The handler dispatches and reads
  pre-staged results. The handler never blocks on recall, embedding,
  planning, or admission — those are continuously produced by their
  owning regions, in parallel, governed by SubstrateGovernor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(architecture): brain-regions substrate spec + cognition algorithms (design-only)

Card a6f51292. Design-only — no code lands here. Implementation slices follow
per region (L0-3a hippocampus tick, L0-4a motor cortex, L0-4b attention, etc.).

## docs/architecture/BRAIN-REGIONS-SUBSTRATE.md (242 lines)

Sibling to CBAR-SUBSTRATE-ARCHITECTURE.md and GENOME-FOUNDRY-SENTINEL.md.
Defines the structural contract:

- BrainRegion trait — own id, own pressure_profile, own tick, own on_signal
- TickOutcome — yield telemetry feeding governor's learning loop
- 'For free' triplet — base trait + derive macro + scaffold generator
- ReadyBuffer trait — synchronous peek(), region publish(), TTL eviction
  - Semantic rules: empty buffer is signal not block; staleness acceptable;
    per-region buffers not global
- Shared per-persona state schema (PersonaCognition)
  - engrams (append-only), working (ring), salience (CRDT counters),
    genome (serialized through genome region), vitals (RwLock)
- Region inventory: hippocampus, sensory(vision/embedding), channel,
  persona-service-dispatch, motor cortex, attention, sleep, genome
- SubstrateGovernor integration: policy slots + yield-learning loop
- Telemetry surface: ./jtag region/stats, region/yield; substrate events
- End-state walkthrough showing parallel cognition feeding a single handler call
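The ReadyBuffer semantics listed above (synchronous peek, region-side publish, TTL eviction, empty buffer as a signal) could look roughly like the sketch below. The spec describes a trait; this example uses a concrete single-slot struct for brevity, and the `publish_at` / `peek_at` methods taking an explicit `Instant` are assumptions added so the TTL can be exercised deterministically.

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Hypothetical single-slot ready-buffer owned by one region.
struct ReadyBuffer<T> {
    ttl: Duration,
    slot: RwLock<Option<(Instant, T)>>,
}

impl<T: Clone> ReadyBuffer<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, slot: RwLock::new(None) }
    }

    /// Region side: overwrite the slot with a fresh value.
    fn publish(&self, value: T) {
        self.publish_at(value, Instant::now());
    }

    fn publish_at(&self, value: T, now: Instant) {
        *self.slot.write().unwrap() = Some((now, value));
    }

    /// Handler side: read without awaiting the producing region.
    /// None means "nothing fresh", which the handler treats as a signal.
    fn peek(&self) -> Option<T> {
        self.peek_at(Instant::now())
    }

    fn peek_at(&self, now: Instant) -> Option<T> {
        let slot = self.slot.read().unwrap();
        match slot.as_ref() {
            // Entries older than the TTL are treated as evicted.
            Some((published, value))
                if now.saturating_duration_since(*published) <= self.ttl =>
            {
                Some(value.clone())
            }
            _ => None,
        }
    }
}
```

The handler never waits for the region to produce something: it takes the read lock only for the duration of a clone and otherwise proceeds with whatever is there, which matches the "slightly stale context > stalled persona" rule.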

Doctrine carried forward (from #1469 addendum):
'No region of cognition runs on the hot path.'

## docs/architecture/COGNITION-ALGORITHMS.md (530 lines)

The algorithmic content that runs INSIDE the regions. Seven algorithms,
each with: problem, pseudocode, metric, interactions.

1. Two-pool recall with dynamic budget split (focus + periphery, dynamic)
2. Channel-as-bias-not-filter (cross-pollination by merit, not walls)
3. Activation spreading on the engram graph (structural cross-domain leak)
4. Salience-modulated decay (half_life = base * (1 + salience)^k)
5. Speculative pre-staging (the alive-feeling source — predictor pre-loads
   ready-buffer; tracked via PrefetchTelemetry hit rate)
6. LoRA genome as attention prior (multi-LoRA blend co-varies with recall)
7. Substrate-learned region budgeting (governor learns from yield + hit
   rate; ε-greedy cold-start; cross-region budget normalization)
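As a worked example of item 4, the half-life formula can be evaluated directly. The `half_life` function follows the formula as written; the `retention` function (plain exponential decay, weight 0.5 after one half-life) is an assumption about how the half-life would be applied, not something the commit message states.

```rust
/// half_life = base * (1 + salience)^k, as given in the list above.
fn half_life(base: f64, salience: f64, k: f64) -> f64 {
    base * (1.0 + salience).powf(k)
}

/// Assumed decay law: an engram's weight halves every `half_life` units of age.
fn retention(age: f64, half_life: f64) -> f64 {
    0.5_f64.powf(age / half_life)
}
```

With base = 10, salience = 1 and k = 2, the half-life is 10 * 2^2 = 40, so under the assumed decay law an engram of age 40 keeps half its weight. At salience = 0 the half-life is just the base of 10.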

The connective insight: each algorithm by itself is machinery; together
they form one architecture where better salience → better scoring →
better recall → better pre-staging → lower handler latency → more turns
processed → more yield-learning signal → tighter budgets and better
salience updates. The compounding loop IS the alive property.

Going forward, each card's acceptance criteria include a per-algorithm
metric improvement on a holdout suite. No vibes-based acceptance.

## Headline framing (Joel 2026-05-29)

> 'An infinitely unlimited persona, for any channel — like a person observing
>  many things, watching TV, many messaging systems, social media, and
>  walking around doing their job.'

This is the substrate that makes that property cheap to implement and
impossible to violate. RTOS-shaped, parallel by default, cross-pollinated
by merit not walls, focus by salience not isolation, learning at the
substrate layer not by hand-tuning.

Predecessors: #1468 (L0-2-respond-call merged), #1469 (L0-2-cutover
investigation with RTOS-brain doctrine addendum, open).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>