ModelMetadata refactor: declarative struct, no Option<>, adapter queries its own source

**airc-queue card**

Coordinates work via the AIRC queue substrate (airc#562). Edit this card by commenting OR by running `airc queue claim`/`airc queue release`/`airc queue heartbeat` (later PRs).

```json
{
  "kind": "airc-queue-card-v1",
  "id": "#917",
  "owner": "claude-tab-2",
  "status": "claimed",
  "evidence": "Adopted existing GitHub issue into airc queue.",
  "next_action": "Triage, claim, or close this adopted backlog card."
}
```

Close this issue when the work is done (status=merged/abandoned).

## Original issue body

<details>
<summary>Pre-adoption body</summary>

## Background

Joel called out a pattern in our inference layer (2026-04-17 chat session):

> "context window limits are defined BY the model as are features such as audio or vision. this is probably why those attempts also failed"
> "if you need a var require it / pass the entire struct around / has all the info you need / or grab it"
> "make it declarative"

Today we have parallel model-info plumbing:
- `system/shared/ModelContextWindows.ts` — TS lookup tables (`getContextWindow`, `getInferenceSpeed`, `isSlowLocalModel`, `getLatencyAwareTokenLimit`)
- `workers/continuum-core/src/ai/types.rs::ModelInfo` — Rust struct with `Option<>` on `max_output_tokens` and `cost_per_1k_tokens`
- 21 hardcoded `ModelInfo {…}` constructions across `openai_adapter.rs`, `candle_adapter.rs`, `anthropic_adapter.rs`, `embedding.rs` — each adapter maintains a *static* catalog instead of querying its source
- `system/core/src/models/mod.rs` — yet another parallel `Option<u32> max_output_tokens` definition

Symptoms this caused (visible on M5 PR #914 verification today):
- `ChatRAGBuilder` computed `totalBudget = floor(contextWindow × 0.75)`. For Qwen3.5-4b's 262k window = 196k tokens. RAG actually filled ~14k per request → llama-server allocated full 262k KV cache per persona slot → `com.docker.llama-server` 20.87 GB resident on M5, 44 GB total vs 32 GB physical = swap.
- Vision/audio attempts have failed silently when the hardcoded TS table claims a model supports a capability the actual model doesn't.
- `getInferenceSpeed` is a TS const — fundamentally can't reflect what's measured at runtime.

## Scope

Single coherent refactor, ~25 files, its own branch. Not to be sprinkled into other PRs.

### 1. `ModelMetadata` (replaces `ModelInfo`), all fields required

```rust
#[derive(Debug, Clone, Serialize, Deserialize, TS)]
#[ts(export, export_to = "../../../shared/generated/ai/ModelMetadata.ts")]
#[serde(rename_all = "camelCase")]
pub struct ModelMetadata {
    pub id: String,
    pub name: String,
    pub provider: String,
    pub capabilities: Vec<ModelCapability>,
    pub context_window: u32,
    pub max_output_tokens: u32,
    pub cost_per_1k_tokens: CostPer1kTokens,  // local = {0,0}
    pub tokens_per_second: f32,
    pub supports_streaming: bool,
    pub supports_tools: bool,
}
```

No `Option<>`. Local-cost = {0,0} is still a declaration, not an absence.

### 2. Adapters query their source, not hardcoded `vec![ModelInfo {…}]`

- **DMR**: `GET http://localhost:12434/engines/v1/models` returns the live catalog. `docker model inspect <id>` exposes GGUF metadata for fields the catalog doesn't.
- **OpenAI / Anthropic / DeepSeek / etc.**: their `/v1/models` endpoint. Cache at adapter `initialize()`.
- **Candle**: GGUF metadata directly from the loaded file.

Delete the 21 hardcoded literals.

### 3. `AIProviderAdapter::model_metadata(model_id)` returns the full struct

```rust
fn model_metadata(&self, model_id: &str) -> Option<ModelMetadata>;  // None ONLY when not in adapter's live catalog
```

### 4. Thread `ModelMetadata` through the chain

- `PersonaResponseGenerator` receives `ModelMetadata` at request entry.
- `ChatRAGBuilder.buildContext(model: ModelMetadata, …)` reads `model.context_window`, `model.tokens_per_second`, `model.capabilities` directly.
- Vision attachment, tool injection — gated by `model.capabilities` and `model.supports_tools`.

### 5. Delete the lookup-helper layer

- `system/shared/ModelContextWindows.ts` — fully deletable.
- `system/core/src/models/mod.rs` — collapse into `ai/types.rs`.

## Acceptance

- `grep -r "Option<u32>" workers/continuum-core/src/ai/` returns zero hits.
- `grep -rn "ModelInfo {" workers/continuum-core/src/` only matches `ai/types.rs` (the definition itself).
- `system/shared/ModelContextWindows.ts` deleted.
- `ChatRAGBuilder` and `PersonaResponseGenerator` take `ModelMetadata`; never reconstruct it from loose strings.
- Live test on M5: persona chat sends prompts that respect `model.context_window` AND the latency budget derived from `model.tokens_per_second`. KV cache pressure drops from 20+ GB to single-GB range.

## Why separate

Touching 21+ adapter sites + consumer chain + IPC export + TS plumbing has to land atomically. Half of it sprinkled into other PRs leaves the codebase worse than it started.

</details>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ModelMetadata refactor: declarative struct, no Option<>, adapter queries its own source #917

Original issue body

Background

Scope

1. `ModelMetadata` (replaces `ModelInfo`), all fields required

2. Adapters query their source, not hardcoded `vec![ModelInfo {…}]`

3. `AIProviderAdapter::model_metadata(model_id)` returns the full struct

4. Thread `ModelMetadata` through the chain

5. Delete the lookup-helper layer

Acceptance

Why separate

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

ModelMetadata refactor: declarative struct, no Option<>, adapter queries its own source #917

Description

Original issue body

Background

Scope

1. ModelMetadata (replaces ModelInfo), all fields required

2. Adapters query their source, not hardcoded vec![ModelInfo {…}]

3. AIProviderAdapter::model_metadata(model_id) returns the full struct

4. Thread ModelMetadata through the chain

5. Delete the lookup-helper layer

Acceptance

Why separate

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `ModelMetadata` (replaces `ModelInfo`), all fields required

2. Adapters query their source, not hardcoded `vec![ModelInfo {…}]`

3. `AIProviderAdapter::model_metadata(model_id)` returns the full struct

4. Thread `ModelMetadata` through the chain