airc-queue card
Coordinates work via the AIRC queue substrate (airc#562). Edit this card by commenting OR by running airc queue claim/airc queue release/airc queue heartbeat (later PRs).
{
"kind": "airc-queue-card-v1",
"id": "#1262",
"owner": "claude-tab-1",
"status": "in-progress",
"evidence": "Adopted existing GitHub issue into airc queue.",
"next_action": "Triage, claim, or close this adopted backlog card.",
"last_heartbeat": "2026-05-15T17:01Z @ c62d373a6"
}
Close this issue when the work is done (status=merged/abandoned).
Original issue body
Pre-adoption body
Joel 2026-05-15: "cpu based model fallbacks (dear god fix these) — alpha material."
Continuum's documented contract per project_continuum_alpha_product_bar_sensory_personas.md and docs/architecture/SENSORY-PERSONA-ALPHA-CONTRACT.md is NO silent CPU fallback. Standard personas use `SiliconResidencyRequirement::GpuOrUnifiedMemoryOnly` and the model resolver is supposed to refuse rather than fall through to CPU.
Joel's report indicates this is being violated somewhere on the actual runtime path. The resolver enforces it (continuum#1077, #1080) but there are still CPU-fallback code paths shipping.
Acceptance
- Audit every path that loads / invokes a model and document which call sites do an unenforced CPU fallback (today's behavior).
- Loud-fail each one — the path either runs on GPU/UnifiedMemory or returns a typed `SiliconResidencyViolated` error to the caller. No silent CPU continuation, no hidden `if metal_unavailable { use_cpu }` branch.
- Surface tests — at least one regression test per fallback path that asserts the loud-fail behavior on a host where the GPU is unavailable.
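The loud-fail criterion above could look roughly like the following. This is a minimal sketch, not Continuum's actual resolver: only `SiliconResidencyRequirement::GpuOrUnifiedMemoryOnly` and the `SiliconResidencyViolated` error name come from this issue. The `CpuAllowed` variant, the `Backend` enum, the error's fields, and `select_backend` are hypothetical names introduced for illustration.

```rust
use std::fmt;

/// Residency requirement. `GpuOrUnifiedMemoryOnly` is named in this issue;
/// `CpuAllowed` is a hypothetical variant for callers that opt in explicitly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SiliconResidencyRequirement {
    GpuOrUnifiedMemoryOnly,
    CpuAllowed,
}

/// Hypothetical description of where a model would end up running.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Gpu,
    UnifiedMemory,
    Cpu,
}

/// Typed error returned instead of silently continuing on CPU.
/// The fields are assumptions; the issue only names the error itself.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SiliconResidencyViolated {
    pub call_site: &'static str,
    pub requirement: SiliconResidencyRequirement,
}

impl fmt::Display for SiliconResidencyViolated {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}: no GPU/unified-memory backend available and {:?} forbids CPU fallback",
            self.call_site, self.requirement
        )
    }
}

impl std::error::Error for SiliconResidencyViolated {}

/// Hypothetical helper: pick a backend or fail loudly.
/// `accelerator` is whatever the host probe detected (None = nothing usable).
pub fn select_backend(
    call_site: &'static str,
    requirement: SiliconResidencyRequirement,
    accelerator: Option<Backend>,
) -> Result<Backend, SiliconResidencyViolated> {
    match (accelerator, requirement) {
        // An accelerator is present: use it regardless of the requirement.
        (Some(Backend::Gpu), _) => Ok(Backend::Gpu),
        (Some(Backend::UnifiedMemory), _) => Ok(Backend::UnifiedMemory),
        // No accelerator, but the caller explicitly allowed CPU.
        (_, SiliconResidencyRequirement::CpuAllowed) => Ok(Backend::Cpu),
        // No accelerator and CPU is forbidden: return the typed error.
        (_, SiliconResidencyRequirement::GpuOrUnifiedMemoryOnly) => {
            Err(SiliconResidencyViolated { call_site, requirement })
        }
    }
}
```

A per-path regression test would then call the audited path with the accelerator probe stubbed to `None` and assert that it returns `Err(SiliconResidencyViolated { .. })` rather than a CPU backend.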
Why this matters
A silent CPU fallback turns a sensory persona on a low-end host into a tar-pit at 700% CPU + 1 token/sec instead of failing fast with a "this model isn't loadable on this hardware tier — install a smaller multimodal-base" message. The user thinks chat is broken when actually the install just needs to ship a different forge tier.
Pairs with: #1227 (llama feature gate; today the llama crate is built unconditionally, which overlaps with this issue), #1085 (install tier-name alignment), #1261 (runtime bus CPU umbrella).
Status log
- 2026-05-15T17:01Z — claim by claude-tab-1 -> status=in-progress