152 changes: 152 additions & 0 deletions docs/dark-host-activation-runbook.md
# Dark-Host Activation Runbook (AIN-248)

> **Founder-gated.** This runbook depends on three founder authorizations:
>
> 1. **~$45 in provider credits** topped up across the 5 open-weight venues (DeepInfra $15 · Together $15 · Fireworks $10 · Groq $0 · Novita $5).
> 2. **Doppler keys** mirroring those credits into the api's environment.
> 3. **Disc#12 sign-off on the Model×Host ontology** (see [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md)).
>
> Until all three are in place, **do not run the activation migration**. The smoke harness in [`scripts/dark_host_smoke.py`](../scripts/dark_host_smoke.py) can be exercised independently; the catalog stays inactive.

## Why dark hosts matter

The catalog has **47 inactive models across 10 providers** as of 2026-05-24 (P5 §0 of SP-4). Five of those providers are open-weight venues that host the same Llama / Mistral / DeepSeek weights at different prices — Groq (cheap-fast), DeepInfra (cheap-stable), Together (broad), Fireworks (cheap-reasoning), Novita (frontier-coverage). Activating them moves Ainfera Inference from "5-host prime-broker" to "many-host prime-broker" — venue routing becomes a real prime-brokerage product.

Today the schema is **one-model-one-host** (verified: `select distinct(slug, provider) … having count(distinct provider) > 1` returns 0). The same model on Groq + DeepInfra + Together would need three distinct catalog rows. The Model×Host ontology proposal addresses this — see below.
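Below is a minimal sketch of a check that lists multi-host slugs. It assumes a simplified `models(slug, provider_id)` table in an in-memory SQLite database. The live catalog is Postgres with more columns, and this is not the exact query behind the verification above. The query groups by slug and keeps groups with more than one distinct provider, so it returns nothing for a one-model-one-host catalog.

```python
import sqlite3

# Slugs served by more than one provider. An empty result means the
# catalog is one-model-one-host.
MULTI_HOST_SLUGS = """
SELECT slug, COUNT(DISTINCT provider_id) AS n_providers
FROM models
GROUP BY slug
HAVING COUNT(DISTINCT provider_id) > 1
ORDER BY slug
"""


def multi_host_slugs(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    return conn.execute(MULTI_HOST_SLUGS).fetchall()


# Demo on a simplified stand-in for `models` (assumption: two columns only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (slug TEXT, provider_id TEXT)")
conn.executemany(
    "INSERT INTO models VALUES (?, ?)",
    [("llama-3.3-70b-groq", "groq"), ("llama-3.3-70b-deepinfra", "deepinfra")],
)
print(multi_host_slugs(conn))  # [] : one-model-one-host
```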

## Phase 1 — smoke (founder, no DB change)

For each of the 5 venues, validate the adapter+upstream wiring **before** the catalog row is enrolled. The harness is read-only.

```bash
# DeepInfra
export DEEPINFRA_API_KEY="$(doppler secrets get DEEPINFRA_API_KEY --plain)"
uv run python scripts/dark_host_smoke.py \
--provider deepinfra \
--upstream-model meta-llama/Llama-3.3-70B-Instruct \
> smoke/deepinfra-llama-3.3-70b.json

# Together
export TOGETHER_API_KEY="$(doppler secrets get TOGETHER_API_KEY --plain)"
uv run python scripts/dark_host_smoke.py \
--provider together \
--upstream-model meta-llama/Llama-3.3-70B-Instruct-Turbo \
> smoke/together-llama-3.3-70b.json

# Fireworks
export FIREWORKS_API_KEY="$(doppler secrets get FIREWORKS_API_KEY --plain)"
uv run python scripts/dark_host_smoke.py \
--provider fireworks \
--upstream-model accounts/fireworks/models/llama-v3p3-70b-instruct \
> smoke/fireworks-llama-3.3-70b.json

# Groq (free tier — no credit topup needed)
export GROQ_API_KEY="$(doppler secrets get GROQ_API_KEY --plain)"
uv run python scripts/dark_host_smoke.py \
--provider groq \
--upstream-model llama-3.3-70b-versatile \
> smoke/groq-llama-3.3-70b.json

# Novita
export NOVITA_API_KEY="$(doppler secrets get NOVITA_API_KEY --plain)"
uv run python scripts/dark_host_smoke.py \
--provider novita \
--upstream-model meta-llama/llama-3.3-70b-instruct \
> smoke/novita-llama-3.3-70b.json
```
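The five runs above differ only in venue, secret name, and upstream model id, so one table can drive all of them. The sketch below assumes `doppler` and `uv` are on `PATH` and uses only the harness flags shown above (`--provider`, `--upstream-model`). `VENUES`, `smoke_command`, and `run_all` are hypothetical helpers and are not part of `scripts/dark_host_smoke.py`.

```python
import os
import subprocess
from pathlib import Path

# venue -> (Doppler secret / env var name, upstream model id), copied from
# the bash commands in this runbook.
VENUES: dict[str, tuple[str, str]] = {
    "deepinfra": ("DEEPINFRA_API_KEY", "meta-llama/Llama-3.3-70B-Instruct"),
    "together": ("TOGETHER_API_KEY", "meta-llama/Llama-3.3-70B-Instruct-Turbo"),
    "fireworks": ("FIREWORKS_API_KEY", "accounts/fireworks/models/llama-v3p3-70b-instruct"),
    "groq": ("GROQ_API_KEY", "llama-3.3-70b-versatile"),
    "novita": ("NOVITA_API_KEY", "meta-llama/llama-3.3-70b-instruct"),
}


def smoke_command(provider: str) -> list[str]:
    """Argv for one harness run, using the same flags as the bash commands."""
    _, upstream = VENUES[provider]
    return ["uv", "run", "python", "scripts/dark_host_smoke.py",
            "--provider", provider, "--upstream-model", upstream]


def run_all(out_dir: Path = Path("smoke")) -> None:
    """Run all five smokes, writing one JSON report per venue."""
    out_dir.mkdir(exist_ok=True)
    for provider, (secret, _) in VENUES.items():
        # Equivalent of: export KEY="$(doppler secrets get KEY --plain)"
        key = subprocess.run(
            ["doppler", "secrets", "get", secret, "--plain"],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
        env = {**os.environ, secret: key}  # key is visible only to the child
        report = out_dir / f"{provider}-llama-3.3-70b.json"
        with report.open("w") as fh:
            subprocess.run(smoke_command(provider), env=env, stdout=fh, check=True)
```

Calling `run_all()` writes the same five files under `smoke/` as the bash commands. The key is passed only to each child process's environment, so it is never exported into the parent shell.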

Each report should show `"both_ok": true` and `latency_ms` in the 200–5000ms range. Stash the 5 JSON reports — they become evidence in the §16 audit ledger when the model rows enroll.
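The five reports can also be checked mechanically with a sketch like the one below. It assumes each report is a JSON object with top-level `both_ok` and `latency_ms` keys. The field names come from this runbook, but their position in the harness output is an assumption. `report_problems` and `check_dir` are hypothetical names.

```python
import json
from pathlib import Path

LATENCY_RANGE_MS = (200, 5000)  # acceptance band stated in this runbook


def report_problems(report: dict) -> list[str]:
    """Human-readable problems; an empty list means the report passes."""
    problems = []
    if report.get("both_ok") is not True:
        problems.append(f"both_ok is {report.get('both_ok')!r}, expected true")
    latency = report.get("latency_ms")
    lo, hi = LATENCY_RANGE_MS
    if not isinstance(latency, (int, float)) or not lo <= latency <= hi:
        problems.append(f"latency_ms={latency!r} outside {lo}-{hi} ms")
    return problems


def check_dir(smoke_dir: Path = Path("smoke")) -> dict[str, list[str]]:
    """Map each failing report's file name to its problems."""
    failures = {}
    for path in sorted(smoke_dir.glob("*.json")):
        problems = report_problems(json.loads(path.read_text()))
        if problems:
            failures[path.name] = problems
    return failures
```

If `check_dir()` returns `{}`, every report under `smoke/` passes and can be stashed as evidence.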

## Phase 2 — Model×Host ontology decision (Disc#12)

Before any DB change, the founder authorizes the schema shape from [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md). The proposal lays out three paths (A, B, C). The two that shape this runbook are:

- **Path A (minimal):** keep the existing `models` table and add one row per (logical model, venue). Each row gets a venue-suffixed slug (e.g. `llama-3.3-70b-deepinfra`, `llama-3.3-70b-groq`) and its own `provider_id`. Slugs stay globally unique, so no schema change is needed.
Path A slug uniqueness contradiction

Medium Severity

Phase 2 says Path A makes slug non-unique across providers, but `ModelORM` enforces a global `UniqueConstraint("slug", name="uq_models_slug")`. Path A needs distinct suffixed slugs per venue, not duplicate slugs on different `provider_id` values.

Reviewed by Cursor Bugbot for commit b93ec86.

- **Path B (M:N junction):** new `model_hosts` table; `models` rows stay slug-unique; per-host price/latency lives on the junction. Heavier migration, cleaner routing semantics.

The runbook below assumes **Path A** (the lighter migration). Path B requires a different migration template — see the proposal doc.

## Phase 3 — activation migration TEMPLATE (do NOT apply without §1 sign-off)

This template is **parametrized**. Filling in the values is the founder's tap, and the file lands in `alembic/versions/` only after authorization. Until then it is kept here as a doc snippet, so no unauthorized migration enters the revision history.

```python
# alembic/versions/<NEXT>_activate_dark_host_<VENUE>_<MODEL_SLUG>.py
# DO NOT COMMIT until §1 founder authorizations are signed.

from __future__ import annotations

from decimal import Decimal

import sqlalchemy as sa
from alembic import op

revision = "<NEXT>"
down_revision = "<HEAD>"

# --- founder fills in these per-venue values from the smoke report ---
_VENUE = "deepinfra"  # provider.slug
_MODEL_SLUG = "llama-3.3-70b-deepinfra"  # NEW canonical slug (Path A)
_UPSTREAM_NAME = "meta-llama/Llama-3.3-70B-Instruct"
_INPUT_COST_PER_M = Decimal("0.49")  # from venue pricing page
_OUTPUT_COST_PER_M = Decimal("0.79")
_Q_PRIOR = Decimal("0.78")  # from AA Index v4 ÷ 100
_BRAND_SLUG = "meta-llama"  # required for compliance gate
# ---------------------------------------------------------------------

# Bound parameters instead of f-string interpolation: the driver quotes the
# values, and upgrade/downgrade share one row filter.
_ROW_FILTER = """
    WHERE slug = :slug
      AND provider_id = (SELECT id FROM providers WHERE slug = :venue)
"""
_ROW_PARAMS = {"slug": _MODEL_SLUG, "venue": _VENUE}


def _expect_one_row(result: sa.engine.CursorResult, action: str) -> None:
    # An UPDATE that matches no row still "succeeds". Fail loudly instead:
    # under Path A the venue-suffixed row must already exist (inactive).
    if result.rowcount != 1:
        raise RuntimeError(
            f"{action}: expected 1 models row for slug={_MODEL_SLUG!r} "
            f"on venue={_VENUE!r}, matched {result.rowcount}"
        )


def upgrade() -> None:
    result = op.get_bind().execute(
        sa.text(
            """
            UPDATE models SET
              provider_model_name = :upstream,
              input_cost_per_million_usd = :input_cost,
              output_cost_per_million_usd = :output_cost,
              q_prior = :q_prior,
              aa_index_source = 'aa_v4_2026q2',
              active = TRUE
            """
            + _ROW_FILTER
        ),
        {
            **_ROW_PARAMS,
            "upstream": _UPSTREAM_NAME,
            "input_cost": _INPUT_COST_PER_M,
            "output_cost": _OUTPUT_COST_PER_M,
            "q_prior": _Q_PRIOR,
        },
    )
    _expect_one_row(result, "upgrade")


# `brand.active = TRUE` is asserted in a separate sub-migration so
# the M_allowed gate clears for `_BRAND_SLUG`.


def downgrade() -> None:
    result = op.get_bind().execute(
        sa.text("UPDATE models SET active = FALSE" + _ROW_FILTER), _ROW_PARAMS
    )
    _expect_one_row(result, "downgrade")
```

Activation template only updates rows

Medium Severity

The Phase 3 migration template only runs UPDATE models for a new Path A slug such as llama-3.3-70b-deepinfra. If no matching row exists yet, upgrade() succeeds with zero rows changed and the model stays inactive, which is easy to miss.

Reviewed by Cursor Bugbot for commit b93ec86.

## Phase 4 — verify (founder, post-deploy)

After the migration applies:

1. **Catalog row is active:**

```sql
SELECT slug, active, q_prior, aa_index_source,
input_cost_per_million_usd, output_cost_per_million_usd
FROM models WHERE slug = '<MODEL_SLUG>';
```

Expect `active=true`, `q_prior` populated, `aa_index_source='aa_v4_2026q2'`.

2. **Brain enrols it:** a routed call with the brain's default policy should now consider the new row. Read-only:

```sql
SELECT chosen_model_slug, candidates
FROM routing_outcomes
ORDER BY created_at DESC LIMIT 1;
```

The new slug should appear in `candidates[]` for a routed call. If the new model is cheaper than the previous winner and clears the floor, it becomes `chosen_model_slug`.

3. **Audit chain hash-link intact:** routine post-deploy.
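After the two queries have been run, steps 1 and 2 can be turned into assertions. The sketch works on values that have already been fetched, so it does not depend on a particular driver. This runbook does not document the shape of `routing_outcomes.candidates`, so `candidate_slugs` assumes each entry is either a slug string or an object with a `slug` key. Adjust it to match the real column. All function names are hypothetical.

```python
EXPECTED_AA_SOURCE = "aa_v4_2026q2"


def catalog_row_problems(row: dict) -> list[str]:
    """Step 1: `row` is the catalog SELECT result as a column -> value dict."""
    problems = []
    if row.get("active") is not True:
        problems.append("active is not true")
    if row.get("q_prior") is None:
        problems.append("q_prior is NULL")
    if row.get("aa_index_source") != EXPECTED_AA_SOURCE:
        problems.append(f"aa_index_source={row.get('aa_index_source')!r}")
    return problems


def candidate_slugs(candidates: list) -> set[str]:
    """Step 2: normalise `routing_outcomes.candidates` to a set of slugs.

    ASSUMPTION: each entry is a slug string or a dict with a "slug" key.
    """
    return {c if isinstance(c, str) else c["slug"] for c in candidates}


def enrolled(candidates: list, new_slug: str) -> bool:
    """True if the newly activated slug was considered by the latest decision."""
    return new_slug in candidate_slugs(candidates)
```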

## Rollback

`alembic downgrade -1` flips `active=false`. The brain stops enrolling the row on the next decision; existing inferences are unaffected (append-only audit chain). No data loss — the row stays in the catalog with the smoke-validated price/q_prior data for the next activation attempt.

## What this runbook does NOT do

- Change the routing engine (it's immutable per SP-4 §1).
- Re-score existing models — the new row enters the candidate set; old rows are untouched.
- Mutate any `routing_outcomes` row.
- Auto-enroll Anthropic / OpenAI / Google models — those are activated via the same template with `provider.slug ∈ {anthropic, openai, gemini, mistral, xai}`. The dark-host venues just use this runbook more often because they're the high-leverage open-weight catalog.
110 changes: 110 additions & 0 deletions docs/dark-host-ontology-proposal.md
# Model×Host Ontology Proposal (AIN-248, Disc#12)

> **Status: PROPOSAL.** This document does not change any schema. Aulë drafts; the founder authorizes the ontology before any migration ships. The activation runbook ([dark-host-activation-runbook.md](./dark-host-activation-runbook.md)) is parametrized on which path is chosen below.

## The problem

Today's `models` table is effectively one-model-one-host: each row binds one slug to one `provider_id`. The schema is:

```
models
id uuid (PK)
slug text -- canonical, UNIQUE (uq_models_slug)
provider_id uuid (FK → providers)
active bool
input_cost_per_million_usd numeric
output_cost_per_million_usd numeric
q_prior numeric(3,2)
aa_index_source text
brand_id uuid (FK → brands)
...
```

Verified live (2026-05-24, `select distinct (slug, provider) … having count(distinct provider) > 1` against `dftfpwzqxoebwzepygzl`): **0 model slugs appear across multiple providers**. The catalog is operationally one-model-one-host today.

The dark-host activation needs the SAME logical model — `llama-3.3-70b`, say — to live on **Groq + DeepInfra + Together** at three different prices, three different latencies, three different reliability profiles. That's the prime-brokerage deepening: routing chooses not just *which model* but *which venue is hosting the model right now*.

There are three principled ways to model this. The founder picks one.

## Path A — keep `models` flat; multi-row per logical model (lightest migration)

Each (logical model, host) pair gets its own `models` row. The slug encodes the host:

```
slug provider_id (FK → providers)
───────────────────────────────── ──────────────────────────
llama-3.3-70b-groq groq
llama-3.3-70b-deepinfra deepinfra
llama-3.3-70b-together together
llama-3.3-70b-fireworks fireworks
llama-3.3-70b-novita novita
```

**Pros:**
- Zero schema change. The existing `models` table already accepts this shape: each (model, venue) pair is a new row with its own venue-suffixed slug, so global slug uniqueness still holds.
- Routing engine (`ainfera_routing.decide()`) treats each row as a candidate without any code change. The cheapest-clearing-floor objective naturally picks the cheapest venue for a given logical-model fit.
- §16 capture surfaces the actual venue routed to (`chosen_model_slug = "llama-3.3-70b-groq"`) — full attribution.
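The cheapest-clearing-floor objective can be sketched in two steps: drop candidates whose `q_prior` is below the floor, then take the cheapest of the rest. This is a hypothetical stand-in, not `ainfera_routing.decide()`. In particular, the cost model (per-million input and output prices scaled by an assumed token mix) is only an illustration, and the real engine may weight cost differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Candidate:
    slug: str
    q_prior: Decimal
    input_cost_per_m: Decimal   # USD per million input tokens
    output_cost_per_m: Decimal  # USD per million output tokens


def expected_cost(c: Candidate, input_tokens: int, output_tokens: int) -> Decimal:
    """USD for one call: per-million prices scaled by token counts."""
    return (c.input_cost_per_m * input_tokens
            + c.output_cost_per_m * output_tokens) / 1_000_000


def pick(candidates: list[Candidate], floor: Decimal,
         input_tokens: int = 1_000, output_tokens: int = 500) -> Candidate | None:
    """Cheapest candidate whose q_prior clears the floor; None if none do."""
    eligible = [c for c in candidates if c.q_prior >= floor]
    if not eligible:
        return None
    return min(eligible, key=lambda c: expected_cost(c, input_tokens, output_tokens))
```

Under Path A, each venue-suffixed row is just another `Candidate`. This matches the claim that the engine needs no code change: the cheapest venue for a given logical model wins whenever it clears the floor.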

**Cons:**
- Slugs encode host info — `llama-3.3-70b-groq` reads as "a Groq-flavored Llama", which is slightly leaky to SDK consumers.
- The marketing claim "Ainfera picks the best venue for `llama-3.3-70b`" is harder to express: the caller has to ask for `ainfera-inference` (routed) — they can't ask for `llama-3.3-70b` and have Ainfera route across venues. They'd have to ask for ONE of the suffixed slugs.
- The "logical model" concept exists only implicitly (by inspecting the slug prefix). Cross-venue reports need a prefix match, e.g. `SELECT count(*) FROM models WHERE slug LIKE 'llama-3.3-70b-%'`.

**Migration burden:** 1 alembic migration per (model, venue) pair, applied via the runbook template. Linear effort, no schema delta.
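Under Path A, any report that needs the logical model has to recover it from the slug. The sketch below assumes the `<model>-<venue>` convention (Disc#12 question 2) and a hypothetical fixed list of venue suffixes. It matches a known suffix rather than splitting on the last hyphen, because not every slug carries a venue suffix and model slugs themselves contain hyphens.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical: the venues that use suffixed slugs under Path A.
VENUE_SUFFIXES = ("groq", "deepinfra", "together", "fireworks", "novita")


def split_slug(slug: str) -> tuple[str, str | None]:
    """Split "llama-3.3-70b-groq" into ("llama-3.3-70b", "groq").

    Slugs without a known venue suffix come back as (slug, None).
    """
    for venue in VENUE_SUFFIXES:
        suffix = f"-{venue}"
        if slug.endswith(suffix):
            return slug[: -len(suffix)], venue
    return slug, None


def hosts_per_logical_model(slugs: list[str]) -> Counter[str]:
    """Count catalog rows per logical model."""
    return Counter(split_slug(s)[0] for s in slugs)
```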

## Path B — `model_hosts` junction (M:N)

Keep `models` slug-unique (`llama-3.3-70b` is one row). Add a new `model_hosts` junction that carries the per-host price + q_prior + latency profile:

```
models
id, slug (UNIQUE), brand_id, capabilities, ...
-- NO provider_id, NO cost fields, NO q_prior on this row

model_hosts
id uuid (PK)
model_id uuid (FK → models)
provider_id uuid (FK → providers)
upstream_model_name text -- "meta-llama/Llama-3.3-70B-Instruct"
input_cost_per_million_usd numeric
output_cost_per_million_usd numeric
q_prior numeric(3,2) -- per-host (Groq's Llama vs Together's Llama)
aa_index_source text
active bool
UNIQUE (model_id, provider_id)
```

**Pros:**
- Slug stays clean. `model="llama-3.3-70b"` becomes a valid routed request — the brain picks the cheapest active host.
- Cross-host aggregates are trivial: `SELECT count(*) FROM model_hosts WHERE model_id = ?`.
- Per-host quality signal: Groq's faster-cheaper Llama and Together's slower-cheaper Llama can have different `q_prior` values if `q_empirical` later diverges.
- Cleaner public narrative: "Ainfera routes ONE model across many venues" is true at the schema level.

**Cons:**
- **Real schema migration**: add `model_hosts` + backfill from existing `models` rows + drop the per-host fields from `models` (cost / q_prior / aa_index_source / provider_id). The 5 currently-active models become 5 (model, host) junction rows. Audit chain replay needs to handle the old row shape during the migration window.
- Routing engine needs candidate-set shape change: a `Candidate` today is built from one `models` row; under Path B it's built from a `models ⋈ model_hosts` join. Brain code (`ainfera_routing.build_candidates`) gets a column rename. Engine logic is unchanged — but it's a code touch, which is a Disc#12 concern.
- `routing_outcomes.chosen_model_slug` is no longer enough to attribute a routed call to a venue; need `chosen_host_id` or `chosen_model_host` text. Schema change to the immutable §16 row — **violates SP-4 §1 routing_outcomes immutability** unless we add the column as `NULL`-default in a forward-compat migration.

**Migration burden:** 1 big alembic migration + a coordinated routing-engine field rename. Heavier; needs a backward-compat read window during deploy.
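Here is a runnable sketch of the Path B shape and its two headline queries: the cheapest active host for a logical model, and the number of hosts per model. It uses SQLite with integer ids in place of `uuid` and keeps only the columns needed here. Ranking hosts by the sum of input and output price is an illustrative assumption, not the brain's objective.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE models (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL);
CREATE TABLE providers (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL);
CREATE TABLE model_hosts (
    id INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES models(id),
    provider_id INTEGER NOT NULL REFERENCES providers(id),
    input_cost_per_million_usd NUMERIC NOT NULL,
    output_cost_per_million_usd NUMERIC NOT NULL,
    active BOOLEAN NOT NULL DEFAULT 0,
    UNIQUE (model_id, provider_id)
);
"""

# Cheapest active host for one logical model (illustrative price ranking).
CHEAPEST_HOST = """
SELECT p.slug
FROM model_hosts mh
JOIN models m ON m.id = mh.model_id
JOIN providers p ON p.id = mh.provider_id
WHERE m.slug = ? AND mh.active
ORDER BY mh.input_cost_per_million_usd + mh.output_cost_per_million_usd
LIMIT 1
"""


def cheapest_host(conn: sqlite3.Connection, model_slug: str) -> str | None:
    row = conn.execute(CHEAPEST_HOST, (model_slug,)).fetchone()
    return row[0] if row else None


def host_count(conn: sqlite3.Connection, model_slug: str) -> int:
    """Cross-host aggregate: all (model, host) rows for one logical model."""
    (n,) = conn.execute(
        "SELECT count(*) FROM model_hosts mh "
        "JOIN models m ON m.id = mh.model_id WHERE m.slug = ?",
        (model_slug,),
    ).fetchone()
    return n
```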

## Path C — same as Path A, but pin a "primary" host per logical model

Use Path A's slugs (`llama-3.3-70b-groq`), plus a new optional column `models.logical_slug` (default null, populated for multi-host rows). Reporting keys on `logical_slug` for cross-venue aggregates, and the caller-facing API stays unchanged (still slug-based).

**Pros:** lightest schema delta (one nullable column); cross-venue aggregates queryable; public narrative unchanged.
**Cons:** still requires the caller to know which suffixed slug to pin; same leak as Path A from the SDK consumer's perspective.
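Path C's reporting benefit can be shown in miniature. A single `ALTER TABLE` adds the nullable column, and cross-venue counts become a plain `GROUP BY` instead of a slug match. The sketch simplifies to SQLite and a two-column base table.

```python
import sqlite3

# Path C: one nullable column on the existing flat table.
ADD_LOGICAL_SLUG = "ALTER TABLE models ADD COLUMN logical_slug TEXT"

# Cross-venue aggregate: rows per logical model. Single-host rows keep
# logical_slug NULL and are skipped.
HOSTS_PER_LOGICAL = """
SELECT logical_slug, COUNT(*) AS n_hosts
FROM models
WHERE logical_slug IS NOT NULL
GROUP BY logical_slug
ORDER BY logical_slug
"""


def hosts_per_logical(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    return conn.execute(HOSTS_PER_LOGICAL).fetchall()
```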

## Recommendation

**Path A** for the SP-4 dark-host activation pass — it's the lightest migration and the routing engine needs zero code change. The brain just sees more candidates. Cross-venue aggregates are postponed (substring match) until the catalog reaches enough multi-host density to justify the model_hosts junction.

When the catalog hits ~20 multi-host models (Path A becomes painful to report on), migrate to Path B in a follow-up sprint with the §16-column-addition migration handled separately from the schema reshape.

## Disc#12 questions for the founder

1. **Which path?** A (light) / B (junction) / C (logical_slug column).
2. **Slug convention for Path A:** `<model>-<venue>` (e.g. `llama-3.3-70b-groq`) vs `<venue>/<model>` (e.g. `groq/llama-3.3-70b`). The first is greppable; the second matches the provider-prefixed `anthropic/claude-...` convention used by multi-provider gateways. Both work; pick one.
3. **q_prior sourcing per host:** the AA Index gives one number per logical model, so per-host quality can only diverge once traffic produces `q_empirical`. Until then, all hosts of the same logical model start at the same `q_prior` value. Confirm that's the intended starting point.
4. **Public claim:** "Ainfera Inference picks the cheapest venue for your floor" — is this in the public narrative for the activation announce, or does it stay internal until enough venues are active to demo it?
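For question 3, seeding could work as in the sketch below. It converts the AA Index score to `q_prior` by dividing by 100 (as the `_Q_PRIOR` comment in the runbook's migration template does), quantizes to the two decimals of `numeric(3,2)`, and gives every host of the logical model the same value. The half-up rounding mode and the 0 to 1 range check are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal


def seed_q_prior(aa_index_score: Decimal | int | str) -> Decimal:
    """AA Index score (0-100) -> q_prior with two decimals, as numeric(3,2)."""
    q = (Decimal(aa_index_score) / 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    if not Decimal("0") <= q <= Decimal("1"):
        raise ValueError(f"AA Index score out of range: {aa_index_score!r}")
    return q


def seed_hosts(aa_index_score: Decimal | int | str, hosts: list[str]) -> dict[str, Decimal]:
    """Same starting q_prior for every host of one logical model."""
    q = seed_q_prior(aa_index_score)
    return {host: q for host in hosts}
```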

Until the founder answers these, the activation runbook stays parked. The smoke harness can be exercised in parallel — it never touches the DB.