ainfera-ai · hizrianraz · May 31, 2026 · May 23, 2026 · cursor · May 24, 2026
@@ -0,0 +1,152 @@
+# Dark-Host Activation Runbook (AIN-248)
+
+> **Founder-gated.** This runbook depends on three founder authorizations:
+>
+> 1. **~$45 in provider credits** topped up across the 5 open-weight venues (DeepInfra $15 · Together $15 · Fireworks $10 · Groq $0 · Novita $5).
+> 2. **Doppler keys** mirroring those credits into the api's environment.
+> 3. **Disc#12 sign-off on the Model×Host ontology** (see [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md)).
+>
+> Until all three are in place, **do not run the activation migration**. The smoke harness in [`scripts/dark_host_smoke.py`](../scripts/dark_host_smoke.py) can be exercised independently; the catalog stays inactive.
+
+## Why dark hosts matter
+
+The catalog has **47 inactive models across 10 providers** as of 2026-05-24 (P5 §0 of SP-4). Five of those providers are open-weight venues that host the same Llama / Mistral / DeepSeek weights at different prices — Groq (cheap-fast), DeepInfra (cheap-stable), Together (broad), Fireworks (cheap-reasoning), Novita (frontier-coverage). Activating them moves Ainfera Inference from "5-host prime-broker" to "many-host prime-broker" — venue routing becomes a real prime-brokerage product.
+
+Today the schema is **one-model-one-host** (verified: `select distinct(slug, provider) … having count(distinct provider) > 1` returns 0). The same model on Groq + DeepInfra + Together would need three distinct catalog rows. The Model×Host ontology proposal addresses this — see below.
+
+## Phase 1 — smoke (founder, no DB change)
+
+For each of the 5 venues, validate the adapter+upstream wiring **before** the catalog row is enrolled. The harness is read-only.
+
+```bash
+# DeepInfra
+export DEEPINFRA_API_KEY="$(doppler secrets get DEEPINFRA_API_KEY --plain)"
+uv run python scripts/dark_host_smoke.py \
+    --provider deepinfra \
+    --upstream-model meta-llama/Llama-3.3-70B-Instruct \
+    > smoke/deepinfra-llama-3.3-70b.json
+
+# Together
+export TOGETHER_API_KEY="$(doppler secrets get TOGETHER_API_KEY --plain)"
+uv run python scripts/dark_host_smoke.py \
+    --provider together \
+    --upstream-model meta-llama/Llama-3.3-70B-Instruct-Turbo \
+    > smoke/together-llama-3.3-70b.json
+
+# Fireworks
+export FIREWORKS_API_KEY="$(doppler secrets get FIREWORKS_API_KEY --plain)"
+uv run python scripts/dark_host_smoke.py \
+    --provider fireworks \
+    --upstream-model accounts/fireworks/models/llama-v3p3-70b-instruct \
+    > smoke/fireworks-llama-3.3-70b.json
+
+# Groq (free tier — no credit topup needed)
+export GROQ_API_KEY="$(doppler secrets get GROQ_API_KEY --plain)"
+uv run python scripts/dark_host_smoke.py \
+    --provider groq \
+    --upstream-model llama-3.3-70b-versatile \
+    > smoke/groq-llama-3.3-70b.json
+
+# Novita
+export NOVITA_API_KEY="$(doppler secrets get NOVITA_API_KEY --plain)"
+uv run python scripts/dark_host_smoke.py \
+    --provider novita \
+    --upstream-model meta-llama/llama-3.3-70b-instruct \
+    > smoke/novita-llama-3.3-70b.json
+```
+
+Each report should show `"both_ok": true` and `latency_ms` in the 200–5000ms range. Stash the 5 JSON reports — they become evidence in the §16 audit ledger when the model rows enroll.
+
+## Phase 2 — Model×Host ontology decision (Disc#12)
+
+Before any DB change, the founder authorizes the schema shape from [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md). Two paths the proposal lays out:
+
+- **Path A (minimal):** keep the existing `models` table; add multiple rows for the same logical model (e.g. three rows for `llama-3.3-70b` differentiated by `provider_id`). Slug becomes non-unique → schema change.
+- **Path B (M:N junction):** new `model_hosts` table; `models` rows stay slug-unique; per-host price/latency lives on the junction. Heavier migration, cleaner routing semantics.
+
+The runbook below assumes **Path A** (the lighter migration). Path B requires a different migration template — see the proposal doc.
+
+## Phase 3 — activation migration TEMPLATE (do NOT apply without §1 sign-off)
+
+This template is **parametrized**. Filling in the values is the founder's tap; the file lives in `alembic/versions/` only after authorization. Saved here as a doc snippet to keep the schema clean.
+
+```python
+# alembic/versions/<NEXT>_activate_dark_host_<VENUE>_<MODEL_SLUG>.py
+# DO NOT COMMIT until §1 founder authorizations are signed.
+
+from __future__ import annotations
+from decimal import Decimal
+from alembic import op
+
+# revision = "<NEXT>"
+# down_revision = "<HEAD>"
+
+# --- founder fills in these per-venue values from the smoke report ---
+_VENUE = "deepinfra"                           # provider.slug
+_MODEL_SLUG = "llama-3.3-70b-deepinfra"        # NEW canonical slug (Path A)
+_UPSTREAM_NAME = "meta-llama/Llama-3.3-70B-Instruct"
+_INPUT_COST_PER_M = Decimal("0.49")            # from venue pricing page
+_OUTPUT_COST_PER_M = Decimal("0.79")
+_Q_PRIOR = Decimal("0.78")                     # from AA Index v4 ÷ 100
+_BRAND_SLUG = "meta-llama"                     # required for compliance gate
+# ---------------------------------------------------------------------
+
+def upgrade() -> None:
+    op.execute(f"""
+        UPDATE models SET
+            provider_model_name = '{_UPSTREAM_NAME}',
+            input_cost_per_million_usd = {_INPUT_COST_PER_M},
+            output_cost_per_million_usd = {_OUTPUT_COST_PER_M},
+            q_prior = {_Q_PRIOR},
+            aa_index_source = 'aa_v4_2026q2',
+            active = TRUE
+        WHERE slug = '{_MODEL_SLUG}'
+          AND provider_id = (SELECT id FROM providers WHERE slug = '{_VENUE}');
+    """)
+    # `brand.active = TRUE` is asserted in a separate sub-migration so
+    # the M_allowed gate clears for `_BRAND_SLUG`.
+
+def downgrade() -> None:
+    op.execute(f"""
+        UPDATE models SET active = FALSE
+        WHERE slug = '{_MODEL_SLUG}'
+          AND provider_id = (SELECT id FROM providers WHERE slug = '{_VENUE}');
+    """)
+```
+
+## Phase 4 — verify (founder, post-deploy)
+
+After the migration applies:
+
+1. **Catalog row is active:**
+
+   ```sql
+   SELECT slug, active, q_prior, aa_index_source,
+          input_cost_per_million_usd, output_cost_per_million_usd
+   FROM models WHERE slug = '<MODEL_SLUG>';
+   ```
+
+   Expect `active=true`, `q_prior` populated, `aa_index_source='aa_v4_2026q2'`.
+
+2. **Brain enrols it:** a routed call with the brain's default policy should now consider the new row. Read-only:
+
+   ```sql
+   SELECT chosen_model_slug, candidates
+   FROM routing_outcomes
+   ORDER BY created_at DESC LIMIT 1;
+   ```
+
+   The new slug should appear in `candidates[]` for a routed call. If the new model is cheaper than the previous winner and clears the floor, it becomes `chosen_model_slug`.
+
+3. **Audit chain hash-link intact:** routine post-deploy.
+
+## Rollback
+
+`alembic downgrade -1` flips `active=false`. The brain stops enrolling the row on the next decision; existing inferences are unaffected (append-only audit chain). No data loss — the row stays in the catalog with the smoke-validated price/q_prior data for the next activation attempt.
+
+## What this runbook does NOT do
+
+- Change the routing engine (it's immutable per SP-4 §1).
+- Re-score existing models — the new row enters the candidate set; old rows are untouched.
+- Mutate any `routing_outcomes` row.
+- Auto-enroll Anthropic / OpenAI / Google models — those are activated via the same template with `provider.slug ∈ {anthropic, openai, gemini, mistral, xai}`. The dark-host venues just use this runbook more often because they're the high-leverage open-weight catalog.
@@ -0,0 +1,110 @@
+# Model×Host Ontology Proposal (AIN-248, Disc#12)
+
+> **Status: PROPOSAL.** This document does not change any schema. Aulë drafts; the founder authorizes the ontology before any migration ships. The activation runbook ([dark-host-activation-runbook.md](./dark-host-activation-runbook.md)) is parametrized on which path is chosen below.
+
+## The problem
+
+Today's `models` table treats slug as effectively one-model-one-host. The schema is:
+
+```
+models
+  id           uuid (PK)
+  slug         text                          -- canonical, unique-ish
+  provider_id  uuid (FK → providers)
+  active       bool
+  input_cost_per_million_usd  numeric
+  output_cost_per_million_usd numeric
+  q_prior      numeric(3,2)
+  aa_index_source             text
+  brand_id     uuid (FK → brands)
+  ...
+```
+
+Verified live (2026-05-24, `select distinct (slug, provider) … having count(distinct provider) > 1` against `dftfpwzqxoebwzepygzl`): **0 model slugs appear across multiple providers**. The catalog is operationally one-model-one-host today.
+
+The dark-host activation needs the SAME logical model — `llama-3.3-70b`, say — to live on **Groq + DeepInfra + Together** at three different prices, three different latencies, three different reliability profiles. That's the prime-brokerage deepening: routing chooses not just *which model* but *which venue is hosting the model right now*.
+
+There are three principled ways to model this. The founder picks one.
+
+## Path A — keep `models` flat; multi-row per logical model (lightest migration)
+
+Each (logical model, host) pair gets its own `models` row. The slug encodes the host:
+
+```
+slug                              provider_id (FK → providers)
+─────────────────────────────────  ──────────────────────────
+llama-3.3-70b-groq                 groq
+llama-3.3-70b-deepinfra            deepinfra
+llama-3.3-70b-together             together
+llama-3.3-70b-fireworks            fireworks
+llama-3.3-70b-novita               novita
+```
+
+**Pros:**
+- Zero schema change. The existing `models` table already accepts this shape — the activation migration just inserts rows with venue-suffixed slugs.
+- Routing engine (`ainfera_routing.decide()`) treats each row as a candidate without any code change. The cheapest-clearing-floor objective naturally picks the cheapest venue for a given logical-model fit.
+- §16 capture surfaces the actual venue routed to (`chosen_model_slug = "llama-3.3-70b-groq"`) — full attribution.
+
+**Cons:**
+- Slugs encode host info — `llama-3.3-70b-groq` reads as "a Groq-flavored Llama", which is slightly leaky to SDK consumers.
+- The marketing claim "Ainfera picks the best venue for `llama-3.3-70b`" is harder to express: the caller has to ask for `ainfera-inference` (routed) — they can't ask for `llama-3.3-70b` and have Ainfera route across venues. They'd have to ask for ONE of the suffixed slugs.
+- The "logical model" concept exists only implicitly (by inspecting the slug prefix). Cross-venue reports (`SELECT count(*) WHERE logical_model = "llama-3.3-70b"`) require a substring match.
+
+**Migration burden:** 1 alembic migration per (model, venue) pair, applied via the runbook template. Linear effort, no schema delta.
+
+## Path B — `model_hosts` junction (M:N)
+
+Keep `models` slug-unique (`llama-3.3-70b` is one row). Add a new `model_hosts` junction that carries the per-host price + q_prior + latency profile:
+
+```
+models
+  id, slug (UNIQUE), brand_id, capabilities, ...
+  -- NO provider_id, NO cost fields, NO q_prior on this row
+
+model_hosts
+  id            uuid (PK)
+  model_id      uuid (FK → models)
+  provider_id   uuid (FK → providers)
+  upstream_model_name  text       -- "meta-llama/Llama-3.3-70B-Instruct"
+  input_cost_per_million_usd  numeric
+  output_cost_per_million_usd numeric
+  q_prior       numeric(3,2)      -- per-host (Groq's Llama vs Together's Llama)
+  aa_index_source              text
+  active        bool
+  UNIQUE (model_id, provider_id)
+```
+
+**Pros:**
+- Slug stays clean. `model="llama-3.3-70b"` becomes a valid routed request — the brain picks the cheapest active host.
+- Cross-host aggregates are trivial: `SELECT count(*) FROM model_hosts WHERE model_id = ?`.
+- Per-host quality signal: Groq's faster-cheaper Llama and Together's slower-cheaper Llama can have different `q_prior` values if `q_empirical` later diverges.
+- Cleaner public narrative: "Ainfera routes ONE model across many venues" is true at the schema level.
+
+**Cons:**
+- **Real schema migration**: add `model_hosts` + backfill from existing `models` rows + drop the per-host fields from `models` (cost / q_prior / aa_index_source / provider_id). The 5 currently-active models become 5 (model, host) junction rows. Audit chain replay needs to handle the old row shape during the migration window.
+- Routing engine needs candidate-set shape change: a `Candidate` today is built from one `models` row; under Path B it's built from a `models ⋈ model_hosts` join. Brain code (`ainfera_routing.build_candidates`) gets a column rename. Engine logic is unchanged — but it's a code touch, which is a Disc#12 concern.
+- `routing_outcomes.chosen_model_slug` is no longer enough to attribute a routed call to a venue; need `chosen_host_id` or `chosen_model_host` text. Schema change to the immutable §16 row — **violates SP-4 §1 routing_outcomes immutability** unless we add the column as `NULL`-default in a forward-compat migration.
+
+**Migration burden:** 1 big alembic migration + a coordinated routing-engine field rename. Heavier; needs a backward-compat read window during deploy.
+
+## Path C — same as Path A, but pin a "primary" host per logical model
+
+Path A's slugs (`llama-3.3-70b-groq`) but with a new optional column `models.logical_slug` (default null, populated for multi-host rows). Reporting joins on `logical_slug` for cross-venue aggregates; caller-facing API stays unchanged (still slug-based).
+
+**Pros:** lightest schema delta (one nullable column); cross-venue aggregates queryable; public narrative unchanged.
+**Cons:** still requires the caller to know which suffixed slug to pin; same leak as Path A from the SDK consumer's perspective.
+
+## Recommendation
+
+**Path A** for the SP-4 dark-host activation pass — it's the lightest migration and the routing engine needs zero code change. The brain just sees more candidates. Cross-venue aggregates are postponed (substring match) until the catalog reaches enough multi-host density to justify the model_hosts junction.
+
+When the catalog hits ~20 multi-host models (Path A becomes painful to report on), migrate to Path B in a follow-up sprint with the §16-column-addition migration handled separately from the schema reshape.
+
+## Disc#12 questions for the founder
+
+1. **Which path?** A (light) / B (junction) / C (logical_slug column).
+2. **Slug convention for Path A:** `<model>-<venue>` (e.g. `llama-3.3-70b-groq`) vs `<venue>/<model>` (e.g. `groq/llama-3.3-70b`). The first is greppable; the second matches Anthropic's `anthropic/claude-...` convention. Both work; pick one.
+3. **q_prior sourcing per host:** the AA Index gives one number per logical model. Per-host q_prior diverges from `q_empirical` once traffic lands; until then all hosts of the same logical model start at the same `q_prior` value. Confirm that's the intended starting point.
+4. **Public claim:** "Ainfera Inference picks the cheapest venue for your floor" — is this in the public narrative for the activation announce, or does it stay internal until enough venues are active to demo it?
+
+Until the founder answers these, the activation runbook stays parked. The smoke harness can be exercised in parallel — it never touches the DB.