chore(api): SP-4 PR-C · dark-host activation scaffold (AIN-248, founder-gated) #81
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,152 @@ | ||
| # Dark-Host Activation Runbook (AIN-248) | ||
|
|
||
| > **Founder-gated.** This runbook depends on three founder authorizations: | ||
| > | ||
| > 1. **~$45 in provider credits** topped up across the 5 open-weight venues (DeepInfra $15 · Together $15 · Fireworks $10 · Groq $0 · Novita $5). | ||
| > 2. **Doppler keys** mirroring those credits into the api's environment. | ||
| > 3. **Disc#12 sign-off on the Model×Host ontology** (see [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md)). | ||
| > | ||
| > Until all three are in place, **do not run the activation migration**. The smoke harness in [`scripts/dark_host_smoke.py`](../scripts/dark_host_smoke.py) can be exercised independently; the catalog stays inactive. | ||
|
|
||
| ## Why dark hosts matter | ||
|
|
||
| The catalog has **47 inactive models across 10 providers** as of 2026-05-24 (P5 §0 of SP-4). Five of those providers are open-weight venues that host the same Llama / Mistral / DeepSeek weights at different prices — Groq (cheap-fast), DeepInfra (cheap-stable), Together (broad), Fireworks (cheap-reasoning), Novita (frontier-coverage). Activating them moves Ainfera Inference from "5-host prime-broker" to "many-host prime-broker" — venue routing becomes a real prime-brokerage product. | ||
|
|
||
| Today the schema is **one-model-one-host** (verified: `select distinct(slug, provider) … having count(distinct provider) > 1` returns 0). The same model on Groq + DeepInfra + Together would need three distinct catalog rows. The Model×Host ontology proposal addresses this — see below. | ||
|
|
||
| ## Phase 1 — smoke (founder, no DB change) | ||
|
|
||
| For each of the 5 venues, validate the adapter+upstream wiring **before** the catalog row is enrolled. The harness is read-only. | ||
|
|
||
| ```bash | ||
| # DeepInfra | ||
| export DEEPINFRA_API_KEY="$(doppler secrets get DEEPINFRA_API_KEY --plain)" | ||
| uv run python scripts/dark_host_smoke.py \ | ||
| --provider deepinfra \ | ||
| --upstream-model meta-llama/Llama-3.3-70B-Instruct \ | ||
| > smoke/deepinfra-llama-3.3-70b.json | ||
|
|
||
| # Together | ||
| export TOGETHER_API_KEY="$(doppler secrets get TOGETHER_API_KEY --plain)" | ||
| uv run python scripts/dark_host_smoke.py \ | ||
| --provider together \ | ||
| --upstream-model meta-llama/Llama-3.3-70B-Instruct-Turbo \ | ||
| > smoke/together-llama-3.3-70b.json | ||
|
|
||
| # Fireworks | ||
| export FIREWORKS_API_KEY="$(doppler secrets get FIREWORKS_API_KEY --plain)" | ||
| uv run python scripts/dark_host_smoke.py \ | ||
| --provider fireworks \ | ||
| --upstream-model accounts/fireworks/models/llama-v3p3-70b-instruct \ | ||
| > smoke/fireworks-llama-3.3-70b.json | ||
|
|
||
| # Groq (free tier — no credit topup needed) | ||
| export GROQ_API_KEY="$(doppler secrets get GROQ_API_KEY --plain)" | ||
| uv run python scripts/dark_host_smoke.py \ | ||
| --provider groq \ | ||
| --upstream-model llama-3.3-70b-versatile \ | ||
| > smoke/groq-llama-3.3-70b.json | ||
|
|
||
| # Novita | ||
| export NOVITA_API_KEY="$(doppler secrets get NOVITA_API_KEY --plain)" | ||
| uv run python scripts/dark_host_smoke.py \ | ||
| --provider novita \ | ||
| --upstream-model meta-llama/llama-3.3-70b-instruct \ | ||
| > smoke/novita-llama-3.3-70b.json | ||
| ``` | ||
|
|
||
| Each report should show `"both_ok": true` and `latency_ms` in the 200–5000ms range. Stash the 5 JSON reports — they become evidence in the §16 audit ledger when the model rows enroll. | ||
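The acceptance check above can be automated before the reports go into the audit ledger. A minimal sketch, assuming each report is a JSON object with top-level `both_ok` and `latency_ms` keys. The two field names come from the runbook; the flat report shape and the `check_report` / `check_directory` helpers are assumptions, not the harness's documented output.

```python
import json
from pathlib import Path

# Bounds taken from the runbook's "200-5000ms" guidance.
LATENCY_MIN_MS = 200
LATENCY_MAX_MS = 5000


def check_report(report: dict) -> list[str]:
    """Return a list of problems found in one smoke report (empty means pass)."""
    problems = []
    if report.get("both_ok") is not True:
        problems.append("both_ok is not true")
    latency = report.get("latency_ms")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(latency, (int, float)) or isinstance(latency, bool):
        problems.append("latency_ms missing or not numeric")
    elif not LATENCY_MIN_MS <= latency <= LATENCY_MAX_MS:
        problems.append(
            f"latency_ms {latency} outside {LATENCY_MIN_MS}-{LATENCY_MAX_MS}"
        )
    return problems


def check_directory(directory: str = "smoke") -> dict[str, list[str]]:
    """Check every *.json report in `directory`; map file name -> problems."""
    return {
        path.name: check_report(json.loads(path.read_text()))
        for path in sorted(Path(directory).glob("*.json"))
    }
```

Calling `check_directory("smoke")` after the five runs would return an empty problem list for every report that passes; anything else is worth inspecting by hand before it is stashed as evidence.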
|
|
||
| ## Phase 2 — Model×Host ontology decision (Disc#12) | ||
|
|
||
| Before any DB change, the founder authorizes the schema shape from [dark-host-ontology-proposal.md](./dark-host-ontology-proposal.md). The proposal lays out three paths (Path C is a variant of Path A); the two main ones are: | ||
|
|
||
| - **Path A (minimal):** keep the existing `models` table; add multiple rows for the same logical model (e.g. three rows for `llama-3.3-70b` differentiated by `provider_id`). Slug becomes non-unique → schema change. | ||
| - **Path B (M:N junction):** new `model_hosts` table; `models` rows stay slug-unique; per-host price/latency lives on the junction. Heavier migration, cleaner routing semantics. | ||
|
|
||
| The runbook below assumes **Path A** (the lighter migration). Path B requires a different migration template — see the proposal doc. | ||
|
|
||
| ## Phase 3 — activation migration TEMPLATE (do NOT apply without §1 sign-off) | ||
|
|
||
| This template is **parametrized**. Filling in the values is the founder's tap; the file lives in `alembic/versions/` only after authorization. Saved here as a doc snippet to keep the schema clean. | ||
|
|
||
| ```python | ||
| # alembic/versions/<NEXT>_activate_dark_host_<VENUE>_<MODEL_SLUG>.py | ||
| # DO NOT COMMIT until §1 founder authorizations are signed. | ||
|
|
||
| from __future__ import annotations | ||
| from decimal import Decimal | ||
| from alembic import op | ||
|
|
||
| # revision = "<NEXT>" | ||
| # down_revision = "<HEAD>" | ||
|
|
||
| # --- founder fills in these per-venue values from the smoke report --- | ||
| _VENUE = "deepinfra" # provider.slug | ||
| _MODEL_SLUG = "llama-3.3-70b-deepinfra" # NEW canonical slug (Path A) | ||
| _UPSTREAM_NAME = "meta-llama/Llama-3.3-70B-Instruct" | ||
| _INPUT_COST_PER_M = Decimal("0.49") # from venue pricing page | ||
| _OUTPUT_COST_PER_M = Decimal("0.79") | ||
| _Q_PRIOR = Decimal("0.78") # from AA Index v4 ÷ 100 | ||
| _BRAND_SLUG = "meta-llama" # required for compliance gate | ||
| # --------------------------------------------------------------------- | ||
|
|
||
| def upgrade() -> None: | ||
|     op.execute(f""" | ||
|         UPDATE models SET | ||
|             provider_model_name = '{_UPSTREAM_NAME}', | ||
|             input_cost_per_million_usd = {_INPUT_COST_PER_M}, | ||
|             output_cost_per_million_usd = {_OUTPUT_COST_PER_M}, | ||
|             q_prior = {_Q_PRIOR}, | ||
|             aa_index_source = 'aa_v4_2026q2', | ||
|             active = TRUE | ||
|         WHERE slug = '{_MODEL_SLUG}' | ||
|           AND provider_id = (SELECT id FROM providers WHERE slug = '{_VENUE}'); | ||
|     """) | ||
|
| # Review comment (Cursor Bugbot, commit b93ec86), Medium Severity: "Activation template only updates rows". The Phase 3 migration template only runs […] | ||
||
| # `brand.active = TRUE` is asserted in a separate sub-migration so | ||
| # the M_allowed gate clears for `_BRAND_SLUG`. | ||
|
|
||
| def downgrade() -> None: | ||
|     op.execute(f""" | ||
|         UPDATE models SET active = FALSE | ||
|         WHERE slug = '{_MODEL_SLUG}' | ||
|           AND provider_id = (SELECT id FROM providers WHERE slug = '{_VENUE}'); | ||
|     """) | ||
| ``` | ||
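One thing the template leaves implicit: an `UPDATE` that matches no row succeeds without error, so the migration only activates a model if the `(slug, provider)` row already exists. A minimal sketch of a row-count guard, using the standard-library `sqlite3` module against a toy two-table schema. The schema, the `activate` helper and the sample data are illustrative assumptions, not the real catalog.

```python
import sqlite3


def activate(conn: sqlite3.Connection, slug: str, venue: str) -> None:
    """Flip active=1 for one (slug, venue) row; fail loudly if nothing matched."""
    cursor = conn.execute(
        """
        UPDATE models SET active = 1
        WHERE slug = ?
          AND provider_id = (SELECT id FROM providers WHERE slug = ?)
        """,
        (slug, venue),
    )
    # rowcount is the number of rows the UPDATE modified.
    if cursor.rowcount != 1:
        raise RuntimeError(
            f"expected to activate exactly 1 row for {slug!r} on {venue!r}, "
            f"matched {cursor.rowcount}"
        )


def demo() -> tuple[int, str]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE providers (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
        CREATE TABLE models (
            id INTEGER PRIMARY KEY,
            slug TEXT UNIQUE,
            provider_id INTEGER REFERENCES providers(id),
            active INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO providers (id, slug) VALUES (1, 'deepinfra'), (2, 'groq');
        INSERT INTO models (slug, provider_id)
            VALUES ('llama-3.3-70b-deepinfra', 1);
        """
    )
    activate(conn, "llama-3.3-70b-deepinfra", "deepinfra")  # row exists: OK
    active = conn.execute(
        "SELECT active FROM models WHERE slug = 'llama-3.3-70b-deepinfra'"
    ).fetchone()[0]
    try:
        activate(conn, "llama-3.3-70b-groq", "groq")  # row was never inserted
        outcome = "silently succeeded"
    except RuntimeError:
        outcome = "guard raised"
    return active, outcome
```

In an Alembic migration a comparable check could be made on the result of `op.get_bind().execute(...)`, whose `rowcount` attribute plays the same role. Whether the migration should raise or insert the missing row is a decision that belongs with the ontology sign-off.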
|
|
||
| ## Phase 4 — verify (founder, post-deploy) | ||
|
|
||
| After the migration applies: | ||
|
|
||
| 1. **Catalog row is active:** | ||
|
|
||
| ```sql | ||
| SELECT slug, active, q_prior, aa_index_source, | ||
| input_cost_per_million_usd, output_cost_per_million_usd | ||
| FROM models WHERE slug = '<MODEL_SLUG>'; | ||
| ``` | ||
|
|
||
| Expect `active=true`, `q_prior` populated, `aa_index_source='aa_v4_2026q2'`. | ||
|
|
||
| 2. **Brain enrolls it:** a routed call with the brain's default policy should now consider the new row. Read-only: | ||
|
|
||
| ```sql | ||
| SELECT chosen_model_slug, candidates | ||
| FROM routing_outcomes | ||
| ORDER BY created_at DESC LIMIT 1; | ||
| ``` | ||
|
|
||
| The new slug should appear in `candidates[]` for a routed call. If the new model is cheaper than the previous winner and clears the floor, it becomes `chosen_model_slug`. | ||
|
|
||
| 3. **Audit chain hash-link intact:** routine post-deploy. | ||
|
|
||
| ## Rollback | ||
|
|
||
| `alembic downgrade -1` flips `active=false`. The brain stops enrolling the row on the next decision; existing inferences are unaffected (append-only audit chain). No data loss — the row stays in the catalog with the smoke-validated price/q_prior data for the next activation attempt. | ||
|
|
||
| ## What this runbook does NOT do | ||
|
|
||
| - Change the routing engine (it's immutable per SP-4 §1). | ||
| - Re-score existing models — the new row enters the candidate set; old rows are untouched. | ||
| - Mutate any `routing_outcomes` row. | ||
| - Auto-enroll Anthropic / OpenAI / Google models — those are activated via the same template with `provider.slug ∈ {anthropic, openai, gemini, mistral, xai}`. The dark-host venues just use this runbook more often because they're the high-leverage open-weight catalog. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,110 @@ | ||
| # Model×Host Ontology Proposal (AIN-248, Disc#12) | ||
|
|
||
| > **Status: PROPOSAL.** This document does not change any schema. Aulë drafts; the founder authorizes the ontology before any migration ships. The activation runbook ([dark-host-activation-runbook.md](./dark-host-activation-runbook.md)) is parametrized on which path is chosen below. | ||
|
|
||
| ## The problem | ||
|
|
||
| Today's `models` table treats slug as effectively one-model-one-host. The schema is: | ||
|
|
||
| ``` | ||
| models | ||
| id uuid (PK) | ||
| slug text -- canonical, unique-ish | ||
| provider_id uuid (FK → providers) | ||
| active bool | ||
| input_cost_per_million_usd numeric | ||
| output_cost_per_million_usd numeric | ||
| q_prior numeric(3,2) | ||
| aa_index_source text | ||
| brand_id uuid (FK → brands) | ||
| ... | ||
| ``` | ||
|
|
||
| Verified live (2026-05-24, `select distinct (slug, provider) … having count(distinct provider) > 1` against `dftfpwzqxoebwzepygzl`): **0 model slugs appear across multiple providers**. The catalog is operationally one-model-one-host today. | ||
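The query quoted above is elided in the source, so its exact text is not reproduced here. As an illustration of the idea only, the sketch below runs a comparable `GROUP BY` / `HAVING` check with `sqlite3` on a toy `models` table. The table contents, the simplified text `provider_id` column and the `multi_host_slugs` helper are assumptions.

```python
import sqlite3

# Slugs that appear under more than one provider. An empty result means the
# catalog is operationally one-model-one-host.
QUERY = """
    SELECT slug
    FROM models
    GROUP BY slug
    HAVING COUNT(DISTINCT provider_id) > 1
    ORDER BY slug
"""


def multi_host_slugs(rows: list[tuple[str, str]]) -> list[str]:
    """rows are (slug, provider_id) pairs loaded into a throwaway table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE models (slug TEXT, provider_id TEXT)")
    conn.executemany("INSERT INTO models VALUES (?, ?)", rows)
    return [slug for (slug,) in conn.execute(QUERY)]
```

With venue-suffixed slugs every slug maps to exactly one provider, so the check keeps returning an empty list even after all five venues are enrolled.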
|
|
||
| The dark-host activation needs the SAME logical model — `llama-3.3-70b`, say — to live on **Groq + DeepInfra + Together** at three different prices, three different latencies, three different reliability profiles. That's the prime-brokerage deepening: routing chooses not just *which model* but *which venue is hosting the model right now*. | ||
|
|
||
| There are three principled ways to model this. The founder picks one. | ||
|
|
||
| ## Path A — keep `models` flat; multi-row per logical model (lightest migration) | ||
|
|
||
| Each (logical model, host) pair gets its own `models` row. The slug encodes the host: | ||
|
|
||
| ``` | ||
| slug provider_id (FK → providers) | ||
| ───────────────────────────────── ────────────────────────── | ||
| llama-3.3-70b-groq groq | ||
| llama-3.3-70b-deepinfra deepinfra | ||
| llama-3.3-70b-together together | ||
| llama-3.3-70b-fireworks fireworks | ||
| llama-3.3-70b-novita novita | ||
| ``` | ||
|
|
||
| **Pros:** | ||
| - Zero schema change. The existing `models` table already accepts this shape — the activation migration just inserts rows with venue-suffixed slugs. | ||
| - Routing engine (`ainfera_routing.decide()`) treats each row as a candidate without any code change. The cheapest-clearing-floor objective naturally picks the cheapest venue for a given logical-model fit. | ||
| - §16 capture surfaces the actual venue routed to (`chosen_model_slug = "llama-3.3-70b-groq"`) — full attribution. | ||
|
|
||
| **Cons:** | ||
| - Slugs encode host info — `llama-3.3-70b-groq` reads as "a Groq-flavored Llama", which is slightly leaky to SDK consumers. | ||
| - The marketing claim "Ainfera picks the best venue for `llama-3.3-70b`" is harder to express: the caller has to ask for `ainfera-inference` (routed) — they can't ask for `llama-3.3-70b` and have Ainfera route across venues. They'd have to ask for ONE of the suffixed slugs. | ||
| - The "logical model" concept exists only implicitly (by inspecting the slug prefix). Cross-venue reports (`SELECT count(*) WHERE logical_model = "llama-3.3-70b"`) require a substring match. | ||
|
|
||
| **Migration burden:** 1 alembic migration per (model, venue) pair, applied via the runbook template. Linear effort, no schema delta. | ||
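To make the reporting con concrete: under Path A the logical model has to be recovered from the slug. A minimal reporting-side sketch, assuming the `<model>-<venue>` suffix convention and the five venue slugs named in this proposal. The `logical_model` and `cheapest_per_logical_model` helpers are hypothetical, every price except DeepInfra's input cost (taken from the runbook template) is a placeholder, and this is not the routing engine's logic.

```python
from collections import defaultdict
from decimal import Decimal

VENUES = ("groq", "deepinfra", "together", "fireworks", "novita")

ROWS = [
    ("llama-3.3-70b-deepinfra", Decimal("0.49")),  # from the runbook template
    ("llama-3.3-70b-groq", Decimal("0.60")),       # placeholder price
    ("llama-3.3-70b-together", Decimal("0.90")),   # placeholder price
]


def logical_model(slug: str) -> str:
    """Strip a trailing '-<venue>' suffix; other slugs are returned as-is."""
    for venue in VENUES:
        suffix = f"-{venue}"
        if slug.endswith(suffix):
            return slug[: -len(suffix)]
    return slug


def cheapest_per_logical_model(rows) -> dict[str, str]:
    """rows: (slug, input_cost_per_million_usd) pairs -> {logical model: slug}."""
    by_logical = defaultdict(list)
    for slug, cost in rows:
        by_logical[logical_model(slug)].append((cost, slug))
    # min() compares the cost first, then the slug as a tie-breaker.
    return {logical: min(options)[1] for logical, options in by_logical.items()}
```

The suffix stripping is exactly the substring match the cons list warns about: it works only as long as every multi-host slug follows the convention, which is the gap a `logical_slug` column or a junction table would close.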
|
|
||
| ## Path B — `model_hosts` junction (M:N) | ||
|
|
||
| Keep `models` slug-unique (`llama-3.3-70b` is one row). Add a new `model_hosts` junction that carries the per-host price + q_prior + latency profile: | ||
|
|
||
| ``` | ||
| models | ||
| id, slug (UNIQUE), brand_id, capabilities, ... | ||
| -- NO provider_id, NO cost fields, NO q_prior on this row | ||
|
|
||
| model_hosts | ||
| id uuid (PK) | ||
| model_id uuid (FK → models) | ||
| provider_id uuid (FK → providers) | ||
| upstream_model_name text -- "meta-llama/Llama-3.3-70B-Instruct" | ||
| input_cost_per_million_usd numeric | ||
| output_cost_per_million_usd numeric | ||
| q_prior numeric(3,2) -- per-host (Groq's Llama vs Together's Llama) | ||
| aa_index_source text | ||
| active bool | ||
| UNIQUE (model_id, provider_id) | ||
| ``` | ||
|
|
||
| **Pros:** | ||
| - Slug stays clean. `model="llama-3.3-70b"` becomes a valid routed request — the brain picks the cheapest active host. | ||
| - Cross-host aggregates are trivial: `SELECT count(*) FROM model_hosts WHERE model_id = ?`. | ||
| - Per-host quality signal: Groq's faster-cheaper Llama and Together's slower-cheaper Llama can have different `q_prior` values if `q_empirical` later diverges. | ||
| - Cleaner public narrative: "Ainfera routes ONE model across many venues" is true at the schema level. | ||
|
|
||
| **Cons:** | ||
| - **Real schema migration**: add `model_hosts` + backfill from existing `models` rows + drop the per-host fields from `models` (cost / q_prior / aa_index_source / provider_id). The 5 currently-active models become 5 (model, host) junction rows. Audit chain replay needs to handle the old row shape during the migration window. | ||
| - Routing engine needs candidate-set shape change: a `Candidate` today is built from one `models` row; under Path B it's built from a `models ⋈ model_hosts` join. Brain code (`ainfera_routing.build_candidates`) gets a column rename. Engine logic is unchanged — but it's a code touch, which is a Disc#12 concern. | ||
| - `routing_outcomes.chosen_model_slug` is no longer enough to attribute a routed call to a venue; need `chosen_host_id` or `chosen_model_host` text. Schema change to the immutable §16 row — **violates SP-4 §1 routing_outcomes immutability** unless we add the column as `NULL`-default in a forward-compat migration. | ||
|
|
||
| **Migration burden:** 1 big alembic migration + a coordinated routing-engine field rename. Heavier; needs a backward-compat read window during deploy. | ||
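For comparison, a minimal sketch of what a Path B lookup might look like as a `models ⋈ model_hosts` join, again using `sqlite3` with a reduced version of the columns listed above. The data, the `cheapest_active_host` helper, the text `provider` column and all prices other than DeepInfra's are illustrative assumptions; the real `ainfera_routing.build_candidates` is not shown in this proposal and is not reproduced here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE models (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
CREATE TABLE model_hosts (
    id INTEGER PRIMARY KEY,
    model_id INTEGER REFERENCES models(id),
    provider TEXT,  -- simplified: provider slug instead of a provider_id FK
    input_cost_per_million_usd NUMERIC,
    active INTEGER,
    UNIQUE (model_id, provider)
);
"""


def demo_conn() -> sqlite3.Connection:
    """Build an in-memory catalog with one logical model on three hosts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO models (id, slug) VALUES (1, 'llama-3.3-70b')")
    conn.executemany(
        "INSERT INTO model_hosts "
        "(model_id, provider, input_cost_per_million_usd, active) "
        "VALUES (1, ?, ?, ?)",
        [
            ("deepinfra", 0.49, 1),  # price from the runbook template
            ("groq", 0.30, 0),       # placeholder price, inactive host
            ("together", 0.88, 1),   # placeholder price
        ],
    )
    return conn


def cheapest_active_host(conn: sqlite3.Connection, slug: str):
    """Return the provider of the cheapest active host for `slug`, or None."""
    row = conn.execute(
        """
        SELECT h.provider
        FROM models AS m
        JOIN model_hosts AS h ON h.model_id = m.id
        WHERE m.slug = ? AND h.active = 1
        ORDER BY h.input_cost_per_million_usd ASC
        LIMIT 1
        """,
        (slug,),
    ).fetchone()
    return row[0] if row else None
```

Here the caller asks for the clean slug `llama-3.3-70b` and the venue falls out of the join, which is the behavior Path A cannot offer without pinning a suffixed slug.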
|
|
||
| ## Path C — same as Path A, but pin a "primary" host per logical model | ||
|
|
||
| Path A's slugs (`llama-3.3-70b-groq`) but with a new optional column `models.logical_slug` (default null, populated for multi-host rows). Reporting joins on `logical_slug` for cross-venue aggregates; caller-facing API stays unchanged (still slug-based). | ||
|
|
||
| **Pros:** lightest schema delta (one nullable column); cross-venue aggregates queryable; public narrative unchanged. | ||
| **Cons:** still requires the caller to know which suffixed slug to pin; same leak as Path A from the SDK consumer's perspective. | ||
|
|
||
| ## Recommendation | ||
|
|
||
| **Path A** for the SP-4 dark-host activation pass — it's the lightest migration and the routing engine needs zero code change. The brain just sees more candidates. Cross-venue aggregates are postponed (substring match) until the catalog reaches enough multi-host density to justify the model_hosts junction. | ||
|
|
||
| When the catalog hits ~20 multi-host models (Path A becomes painful to report on), migrate to Path B in a follow-up sprint with the §16-column-addition migration handled separately from the schema reshape. | ||
|
|
||
| ## Disc#12 questions for the founder | ||
|
|
||
| 1. **Which path?** A (light) / B (junction) / C (logical_slug column). | ||
| 2. **Slug convention for Path A:** `<model>-<venue>` (e.g. `llama-3.3-70b-groq`) vs `<venue>/<model>` (e.g. `groq/llama-3.3-70b`). The first is greppable; the second matches Anthropic's `anthropic/claude-...` convention. Both work; pick one. | ||
| 3. **q_prior sourcing per host:** the AA Index gives one number per logical model. Per-host q_prior diverges from `q_empirical` once traffic lands; until then all hosts of the same logical model start at the same `q_prior` value. Confirm that's the intended starting point. | ||
| 4. **Public claim:** "Ainfera Inference picks the cheapest venue for your floor" — is this in the public narrative for the activation announce, or does it stay internal until enough venues are active to demo it? | ||
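Question 3 can be illustrated with the arithmetic the runbook template hints at (`q_prior` taken as the AA Index score divided by 100). A minimal sketch, assuming the index is a 0 to 100 score and that the result is rounded half-up to fit the `numeric(3,2)` column. The rounding mode and both helper names are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def q_prior_from_aa_index(aa_index) -> Decimal:
    """Scale a 0-100 index score to a two-decimal prior in [0, 1]."""
    score = Decimal(str(aa_index))
    if not Decimal("0") <= score <= Decimal("100"):
        raise ValueError(f"AA Index score out of range: {aa_index}")
    return (score / Decimal("100")).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


def initial_priors(aa_index, venues) -> dict[str, Decimal]:
    """Every host of one logical model starts from the same prior."""
    prior = q_prior_from_aa_index(aa_index)
    return {venue: prior for venue in venues}
```

For example, `initial_priors(78, ["groq", "deepinfra", "together"])` assigns `Decimal("0.78")` to all three hosts, matching the `_Q_PRIOR` value in the runbook template.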
|
|
||
| Until the founder answers these, the activation runbook stays parked. The smoke harness can be exercised in parallel — it never touches the DB. |


Path A slug uniqueness contradiction
Medium Severity
Phase 2 says Path A makes `slug` non-unique across providers, but `ModelORM` enforces global `UniqueConstraint("slug", name="uq_models_slug")`. Path A needs distinct suffixed slugs per venue, not duplicate slugs on different `provider_id` values.
Additional Locations (1): docs/dark-host-ontology-proposal.md#L28-L44
Reviewed by Cursor Bugbot for commit b93ec86.