perf(inference): silent CPU model fallbacks violate the no-CPU-fallback alpha rule

**airc-queue card**

Coordinates work via the AIRC queue substrate (airc#562). Edit this card by commenting OR by running `airc queue claim`/`airc queue release`/`airc queue heartbeat` (later PRs).

```json
{
  "kind": "airc-queue-card-v1",
  "id": "#1262",
  "owner": "claude-tab-1",
  "status": "in-progress",
  "evidence": "Adopted existing GitHub issue into airc queue.",
  "next_action": "Triage, claim, or close this adopted backlog card.",
  "last_heartbeat": "2026-05-15T17:01Z @ c62d373a6"
}
```

Close this issue when the work is done (status=merged/abandoned).

## Original issue body

<details>
<summary>Pre-adoption body</summary>

Joel 2026-05-15: "cpu based model fallbacks (dear god fix these) — alpha material."

Continuum's documented contract per `project_continuum_alpha_product_bar_sensory_personas.md` and `docs/architecture/SENSORY-PERSONA-ALPHA-CONTRACT.md` is **NO silent CPU fallback**. Standard personas use \`SiliconResidencyRequirement::GpuOrUnifiedMemoryOnly\` and the model resolver is supposed to refuse rather than fall through to CPU.

Joel's report indicates this is being violated somewhere on the actual runtime path. The resolver enforces it (continuum#1077, #1080) but there are still CPU-fallback code paths shipping.

## Acceptance

1. **Audit** every path that loads / invokes a model and document which call sites do an unenforced CPU fallback (today's behavior).
2. **Loud-fail** each one — the path either runs on GPU/UnifiedMemory or returns a typed \`SiliconResidencyViolated\` error to the caller. No silent CPU continuation, no hidden \`if metal_unavailable { use_cpu }\` branch.
3. **Surface tests** — at least one regression test per fallback path that asserts the loud-fail behavior on a host where the GPU is unavailable.

## Why this matters

A silent CPU fallback turns a sensory persona on a low-end host into a tar-pit at 700% CPU + 1 token/sec instead of failing fast with a "this model isn't loadable on this hardware tier — install a smaller multimodal-base" message. The user thinks chat is broken when actually the install just needs to ship a different forge tier.

Pairs with: #1227 (llama feature gate; today the llama crate is unconditionally built which overlaps), #1085 (install tier-name alignment), #1261 (runtime bus CPU umbrella).

</details>

## Status log

- 2026-05-15T17:01Z — claim by claude-tab-1 -> status=in-progress

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(inference): silent CPU model fallbacks violate the no-CPU-fallback alpha rule #1262

Original issue body

Acceptance

Why this matters

Status log

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

perf(inference): silent CPU model fallbacks violate the no-CPU-fallback alpha rule #1262

Description

Original issue body

Acceptance

Why this matters

Status log

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions