Skip to content

feat(api): SP-2 PR-B · dashboard aggregation endpoints + candidates surface (AIN-263/264/265/266)#71

Merged
hizrianraz merged 2 commits into
mainfrom
feat/dashboard-aggregation-endpoints
May 31, 2026
Merged

feat(api): SP-2 PR-B · dashboard aggregation endpoints + candidates surface (AIN-263/264/265/266)#71
hizrianraz merged 2 commits into
mainfrom
feat/dashboard-aggregation-endpoints

Conversation

@hizrianraz
Copy link
Copy Markdown
Contributor

@hizrianraz hizrianraz commented May 23, 2026

Summary

Lands the four read-only endpoints the /dashboard Glance page reads — until this router landed, all three GETs 404'd and the front door crashed (digest 3541235483). The candidates surface on /v1/inferences/{id}/decision is additive (byte-reproducible candidates[] unchanged; dashboard_candidates[] is the new collapsed shape SP-3 renders).

Stacks on SP-1 (#70). Base is chore/sp1-inference-rename; merges AFTER that PR.

Endpoints (all tenant-scoped, §D3 honest-empty 200s)

  • GET /v1/usage/daily?days=30 (AIN-263) — daily inference rollup. Empty tenant → days:[] zeros 200.
  • GET /v1/caps/rollup (AIN-264) — caps utilization snapshot. budget.set sums spend_policy.daily_cap_usd; latency_p50_ms from routing_outcomes (24h).
  • GET /v1/agents/{id}/metrics (AIN-265) — 24h window. 404s cross-tenant (same masking as /v1/inferences/{id}/decision).
  • /v1/inferences/{id}/decision (AIN-266) — adds dashboard_candidates: list[DashboardCandidate] parallel to candidates[]; each row carries chosen: bool + excluded: str | null.

Tests

  • tests/unit/test_dashboard_candidates.py — 8 pure tests against candidate_dashboard_summary().
  • tests/integration/test_dashboard.py — honest-empty 200s × 3; cross-tenant 404; tenant scoping; real-spend caps; candidates shape; no-mutation invariant on routing_outcomes.
  • tests/smoke/test_openapi_contract.py — EXPECTED_OPERATIONS extended with the 3 new GETs.

Pre-commit ran ruff + ruff-format + mypy --strict + pytest (unit + smoke) — all green.

Test plan

  • CI green
  • Branch preview: curl /v1/usage/daily with fresh ai_infera_* key → 200 + days:[] (honest empty)
  • After a real inference: /v1/usage/daily shows one entry, /v1/inferences/{id}/decision carries dashboard_candidates[]
  • Cross-tenant GET /v1/agents/{otherA}/metrics with tenant B's key → 404
  • /v1/caps/rollup with no §16 traffic returns latency_p50_ms: null (not 0, not error)

🤖 Generated with Claude Code


Note

Low Risk
Read-only SELECT aggregations with existing tenant auth and 404 masking; no schema or routing-brain write paths changed beyond an additive decision field.

Overview
Adds read-only, tenant-scoped dashboard APIs so the Glance page can load without 404s: GET /v1/usage/daily (daily calls/cost/status rollup, up to 90 days, empty tenants get days:[] and zero totals), GET /v1/caps/rollup (agent count, budget caps vs today’s spend, 24h p50 latency, quality/reliability breach counts), and GET /v1/agents/{agent_id}/metrics (24h calls, cost, p50, last active, top models; 404 for other tenants’ agents).

Extends GET /v1/inferences/{id}/decision with additive dashboard_candidates (collapsed model/score/cost/chosen/excluded) alongside the existing candidates receipt, via candidate_dashboard_summary in the new dashboard router. Registers the router in main.py; OpenAPI smoke tests list the three new GETs. Integration/unit tests cover honest-empty 200s, tenant isolation, caps with real traffic, decision shape, and no writes to routing_outcomes.

Note: per-day by_status.fallback is always 0 in v0 (audit-based fallback counting deferred).

Reviewed by Cursor Bugbot for commit a7f63ca. Bugbot is set up for automated code reviews on this repo. Configure here.

@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 23, 2026

AIN-263 [parity-spine B-A · backend] GET /v1/usage/daily — per-day spend series

Spawned from AIN-217 Phase 3–4 parity spine (Bucket 2 — backend, missing endpoint).

What

GET /v1/usage/daily — per-day spend series. Currently 404.

Unblocks

  • /dashboard daily-spend chart (currently honest "ACCRUING" empty-state)
  • /billing 30-bar spend chart

Scope

  • FastAPI endpoint on api.ainfera.ai, derived from real inference cost data (inferences.cost_usd / routing_outcomes).
  • Real aggregation only — no fabricated series (§D3). FE empty-states stay honest until live.
  • Alembic if any schema needed; RLS on; authed per tenant.

Done

  • curl-200 with real per-day series · FE charts populate · holds on prod

Review in Linear

Comment thread ainfera_api/routers/dashboard.py
@hizrianraz hizrianraz force-pushed the feat/dashboard-aggregation-endpoints branch from 3c316f6 to 621db38 Compare May 23, 2026 23:00
hizrianraz added a commit that referenced this pull request May 23, 2026
…raw string)

Same class as the dashboard.py:127 fix landed in #71. The
capture-invariant service + integration test compared
`AuditEventORM.event_type == "inference_routed"` (underscored Python
name), but the actual DB enum value is `inference.routed` (dotted)
per migration 20260514_0001.

Postgres rejected the literal with:
  invalid input value for enum audit_event_type: "inference_routed"

Fix: pass `AuditEventType.inference_routed` (the enum *member*)
instead of the raw string — SQLAlchemy's `values_callable` resolves
it to the correct DB value (`inference.routed`). Docstring updated
to spell the dotted form for any future reader.

Unblocks the SP-4 PR-A integration tests:
  test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit

No engine touch, no routing_outcomes touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 24, 2026
…mentation) (#73)

* feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches

Adds the durable forward-coverage guarantee for §16 capture: every
routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1
aliases) writes exactly one `routing_outcomes` row, regardless of
outcome (success / reject / fallback / fail). Pinned passthroughs
(vendor slugs) write zero AND carry a `router: "direct"` audit marker.

Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that
PR's stream-close capture path is the last exit covered by this guard.

## Moat-sensitive scope (read this first)

This PR is **pure observability**. Per the SP-4 §1 guardrails:

- ZERO change to routing decisions, scores, weights, thresholds,
  candidate ordering, `M_allowed`, `q_prior`, `q_empirical`,
  ruleset_hash. The diff against `services/routing_brain.py` and
  `services/routing.py` is **empty**. Verifiable: `git diff
  feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py`
  shows no hunks.
- `routing_outcomes` schema is unchanged. No new columns, no
  migration. The row is written by the existing `insert_decision()`
  / `complete_decision()` calls in `dispatch_with_brain` (§0/P3
  walk-through confirmed every exit path already writes the row).
- `routing/ainfera_routing/decide.py` is untouched.

## What's new

1. `ainfera_api/services/capture_invariant.py`:
   - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` —
     pure classifier keyed off the SP-1 alias resolver's
     `ROUTING_TARGETS`, so any string added to the resolver becomes
     "routed" without a second edit.
   - `assert_capture_invariant(db, inference_id, kind)` — read-only
     post-condition check the test sweep runs after every probe.
     Raises `CaptureInvariantViolationError` with diagnostic context
     when a routed call returns without a row or a passthrough
     produces one unexpectedly.
   - `find_passthrough_audit_event()` — helper for the test sweep
     to assert the `router: "direct"` marker is present.
   - `DispatchCaptureCounter.dispatch_without_capture_total` — the
     headline regression signal. Stays 0 in green builds; production
     scrape (future Prometheus surface) alerts on any non-zero.

2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking
   the classifier (canonical + 3 aliases → routed; vendor slugs +
   typos → passthrough) + the counter semantics (routed-miss bumps
   the regression signal; passthrough-captured-unexpectedly bumps
   the contamination signal; reset zeros everything).

3. `tests/integration/test_capture_coverage.py` — parametrized
   sweep that drives a routed-success call for EACH of the 4 routing
   targets, a reject-floor routed call, and passthrough calls
   against two vendor slugs (anthropic native + openai). After each,
   asserts:
     - routed success → exactly 1 routing_outcomes row,
       `outcome_status='succeeded'`
     - reject path  → 1 row, `outcome_status='rejected_floor'`,
       `inference_id IS NULL` (the only branch where it's NULL by
       design — see RoutingOutcomeORM docstring)
     - passthrough → 0 rows AND `router: "direct"` in the audit
       chain (distinguishes a properly-bypassed passthrough from a
       routed call that silently lost its row)
   Plus a coverage-sweep test that asserts
   `DispatchCaptureCounter.dispatch_without_capture_total == 0` at
   the end of a mixed dispatch sequence.

## §0/P2 denominator finding (documented for the audit chain)

Live read against Supabase `dftfpwzqxoebwzepygzl`:
  - 778 historical inferences / 5 routing_outcomes rows
  - 0 historical `request_payload.model` was a routing string
    (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto)
  - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7
    x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...)
  - The 3 succeeded outcome rows are integration-test side effects

**The 773-row "gap" is honest fleet posture, not a capture failure.**
The fleet's been on pinned passthroughs (AULE_PLANNER /
YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value
is the forward GUARANTEE: every NEW routed call going forward writes
exactly one row.

## Pre-commit

ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke
all green (523 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string)

Same class as the dashboard.py:127 fix landed in #71. The
capture-invariant service + integration test compared
`AuditEventORM.event_type == "inference_routed"` (underscored Python
name), but the actual DB enum value is `inference.routed` (dotted)
per migration 20260514_0001.

Postgres rejected the literal with:
  invalid input value for enum audit_event_type: "inference_routed"

Fix: pass `AuditEventType.inference_routed` (the enum *member*)
instead of the raw string — SQLAlchemy's `values_callable` resolves
it to the correct DB value (`inference.routed`). Docstring updated
to spell the dotted form for any future reader.

Unblocks the SP-4 PR-A integration tests:
  test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit

No engine touch, no routing_outcomes touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 24, 2026
…mentation) (#73)

* feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches

Adds the durable forward-coverage guarantee for §16 capture: every
routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1
aliases) writes exactly one `routing_outcomes` row, regardless of
outcome (success / reject / fallback / fail). Pinned passthroughs
(vendor slugs) write zero AND carry a `router: "direct"` audit marker.

Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that
PR's stream-close capture path is the last exit covered by this guard.

## Moat-sensitive scope (read this first)

This PR is **pure observability**. Per the SP-4 §1 guardrails:

- ZERO change to routing decisions, scores, weights, thresholds,
  candidate ordering, `M_allowed`, `q_prior`, `q_empirical`,
  ruleset_hash. The diff against `services/routing_brain.py` and
  `services/routing.py` is **empty**. Verifiable: `git diff
  feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py`
  shows no hunks.
- `routing_outcomes` schema is unchanged. No new columns, no
  migration. The row is written by the existing `insert_decision()`
  / `complete_decision()` calls in `dispatch_with_brain` (§0/P3
  walk-through confirmed every exit path already writes the row).
- `routing/ainfera_routing/decide.py` is untouched.

## What's new

1. `ainfera_api/services/capture_invariant.py`:
   - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` —
     pure classifier keyed off the SP-1 alias resolver's
     `ROUTING_TARGETS`, so any string added to the resolver becomes
     "routed" without a second edit.
   - `assert_capture_invariant(db, inference_id, kind)` — read-only
     post-condition check the test sweep runs after every probe.
     Raises `CaptureInvariantViolationError` with diagnostic context
     when a routed call returns without a row or a passthrough
     produces one unexpectedly.
   - `find_passthrough_audit_event()` — helper for the test sweep
     to assert the `router: "direct"` marker is present.
   - `DispatchCaptureCounter.dispatch_without_capture_total` — the
     headline regression signal. Stays 0 in green builds; production
     scrape (future Prometheus surface) alerts on any non-zero.

2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking
   the classifier (canonical + 3 aliases → routed; vendor slugs +
   typos → passthrough) + the counter semantics (routed-miss bumps
   the regression signal; passthrough-captured-unexpectedly bumps
   the contamination signal; reset zeros everything).

3. `tests/integration/test_capture_coverage.py` — parametrized
   sweep that drives a routed-success call for EACH of the 4 routing
   targets, a reject-floor routed call, and passthrough calls
   against two vendor slugs (anthropic native + openai). After each,
   asserts:
     - routed success → exactly 1 routing_outcomes row,
       `outcome_status='succeeded'`
     - reject path  → 1 row, `outcome_status='rejected_floor'`,
       `inference_id IS NULL` (the only branch where it's NULL by
       design — see RoutingOutcomeORM docstring)
     - passthrough → 0 rows AND `router: "direct"` in the audit
       chain (distinguishes a properly-bypassed passthrough from a
       routed call that silently lost its row)
   Plus a coverage-sweep test that asserts
   `DispatchCaptureCounter.dispatch_without_capture_total == 0` at
   the end of a mixed dispatch sequence.

## §0/P2 denominator finding (documented for the audit chain)

Live read against Supabase `dftfpwzqxoebwzepygzl`:
  - 778 historical inferences / 5 routing_outcomes rows
  - 0 historical `request_payload.model` was a routing string
    (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto)
  - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7
    x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...)
  - The 3 succeeded outcome rows are integration-test side effects

**The 773-row "gap" is honest fleet posture, not a capture failure.**
The fleet's been on pinned passthroughs (AULE_PLANNER /
YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value
is the forward GUARANTEE: every NEW routed call going forward writes
exactly one row.

## Pre-commit

ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke
all green (523 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string)

Same class as the dashboard.py:127 fix landed in #71. The
capture-invariant service + integration test compared
`AuditEventORM.event_type == "inference_routed"` (underscored Python
name), but the actual DB enum value is `inference.routed` (dotted)
per migration 20260514_0001.

Postgres rejected the literal with:
  invalid input value for enum audit_event_type: "inference_routed"

Fix: pass `AuditEventType.inference_routed` (the enum *member*)
instead of the raw string — SQLAlchemy's `values_callable` resolves
it to the correct DB value (`inference.routed`). Docstring updated
to spell the dotted form for any future reader.

Unblocks the SP-4 PR-A integration tests:
  test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit

No engine touch, no routing_outcomes touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 24, 2026
#80)

* feat(api): SP-2 PR-A · AIN-271 streaming + tool-use lift on /v1/messages

Completes the half of AIN-271 that SP-1 deferred. `/v1/messages` now
honors `stream:true` (200 + text/event-stream with ordered Anthropic
SSE frames) and `tools[]` (pass-through to backends, `tool_use` blocks
in the response). The §16 capture invariant holds: every routed call —
streamed or not — writes exactly one `routing_outcomes` row plus the
matching audit events plus the ledger debit.

Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER
that PR.

## Adapter contract lift

- `ProviderAdapter.chat()` gains `tools` + `tool_choice` (defaults
  None — back-compat preserved across all 5 adapters).
- New `ProviderAdapter.stream_chat()` async generator yields normalized
  `StreamEvent`s. Default impl wraps `chat()` into one content_delta +
  one message_delta so adapters that don't yet override honor the
  contract surface.
- New `StreamEvent` dataclass: kinds `content_delta`, `tool_use_start`,
  `tool_use_delta`, `message_delta`.
- New `ToolsNotSupportedError` — adapters that don't yet wire tool
  calling raise this at the adapter boundary; the handler maps it to
  a 422 with backend slug + remediation.
- `AdapterResponse.content_blocks` added so tool_use round-trips
  through the non-streaming path too.

## Per-adapter native streaming

- AnthropicAdapter: real native SSE against `api.anthropic.com/v1/messages`
  with `stream:true`; sub-1s TTFT on the wire. tool_use blocks pass
  through natively.
- OpenAICompatAdapter (base for OpenAI/Mistral/Together/xAI/Groq): real
  native SSE against `/v1/chat/completions` with `stream:true` +
  `stream_options.include_usage`; translates `delta.tool_calls[]` →
  normalized tool_use events.
- OpenAIAdapter responses-tier (gpt-5.5-pro): tools non-empty raises
  ToolsNotSupportedError → 422 with backend slug.
- GeminiAdapter / MistralAdapter: signature extended; inherit
  OpenAICompatAdapter native streaming.

## Streaming dispatch + /v1/messages

- `services/streaming.py` runs the dispatcher to completion (full §16
  capture + ledger + audit), then synthesizes Anthropic SSE frames
  from the resulting DispatchResult. v0 posture: `wrapped` (TTFT =
  full inference time); response header `x-ainfera-stream-mode`
  reports the mode so SDK clients can observe it. Adapter-level
  native streaming primitives in this same PR are ready for the
  follow-up that refactors `dispatch_inference` to consume them
  end-to-end (flipping the header to `native`).
- `routers/anthropic_compat.py`:
  - Drops 501-on-stream → returns StreamingResponse with
    text/event-stream content-type.
  - Drops blanket 422-on-tools → tools pass through. Legacy code
    `tool_calling_not_supported_on_shim` retired; backends without
    tools surface `tools_not_supported_by_backend` with hint.
  - `MessagesResponse.content[]` polymorphic (text OR tool_use);
    SDK sees one shape across stream + non-stream.
  - Alias resolver honored on streamed calls (`_log_alias_hit` fires
    for the three SP-1 legacy strings).
- Audit-trace headers (`x-ainfera-agent-id`, `x-ainfera-audit-url`)
  set on streaming responses identical to non-streaming.

## Tests

- tests/unit/test_streaming_wire_format.py — 6 pure tests against
  default `stream_chat()` wrapper + AIN-176→Anthropic finish_reason
  mapping + `supports_native_streaming()` flag.
- tests/integration/test_anthropic_compat.py — replaces SP-1 501/422
  assertions with SP-2 coverage:
    · stream:true → 200 + text/event-stream + ordered Anthropic frames
    · streaming writes §16 row on close
    · streaming honors silent-alias resolver (parametrized × 3)
    · non-empty tools passes through

Pre-commit: ruff + ruff-format + mypy --strict + pytest unit+smoke
all green (505 unit+smoke tests).

## SP-2 v0 honesty caveat

Contract surface (200 text/event-stream, ordered Anthropic frames,
§16 capture, tool_use round-trip, alias parity) is real and verified.
TTFT is NOT sub-1s in v0 because the streaming wrapper runs
non-streaming dispatch first and replays its full response as SSE.
The adapter-level native streaming primitives are in place; the
follow-up refactors dispatch_inference to consume them end-to-end.
`x-ainfera-stream-mode: wrapped` today → `native` after the follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): SP-4 PR-A · forward capture-coverage guard (AIN-244 instrumentation) (#73)

* feat(api): SP-4 PR-A · forward capture-coverage guard for routed dispatches

Adds the durable forward-coverage guarantee for §16 capture: every
routed dispatch (canonical `ainfera-inference` OR any of the 3 SP-1
aliases) writes exactly one `routing_outcomes` row, regardless of
outcome (success / reject / fallback / fail). Pinned passthroughs
(vendor slugs) write zero AND carry a `router: "direct"` audit marker.

Stacks on SP-2 PR-A (`feat/ain271-streaming-tooluse`, api#72) — that
PR's stream-close capture path is the last exit covered by this guard.

## Moat-sensitive scope (read this first)

This PR is **pure observability**. Per the SP-4 §1 guardrails:

- ZERO change to routing decisions, scores, weights, thresholds,
  candidate ordering, `M_allowed`, `q_prior`, `q_empirical`,
  ruleset_hash. The diff against `services/routing_brain.py` and
  `services/routing.py` is **empty**. Verifiable: `git diff
  feat/ain271-streaming-tooluse..HEAD -- ainfera_api/services/routing*.py`
  shows no hunks.
- `routing_outcomes` schema is unchanged. No new columns, no
  migration. The row is written by the existing `insert_decision()`
  / `complete_decision()` calls in `dispatch_with_brain` (§0/P3
  walk-through confirmed every exit path already writes the row).
- `routing/ainfera_routing/decide.py` is untouched.

## What's new

1. `ainfera_api/services/capture_invariant.py`:
   - `route_outcome_kind(model_slug) -> "routed" | "passthrough"` —
     pure classifier keyed off the SP-1 alias resolver's
     `ROUTING_TARGETS`, so any string added to the resolver becomes
     "routed" without a second edit.
   - `assert_capture_invariant(db, inference_id, kind)` — read-only
     post-condition check the test sweep runs after every probe.
     Raises `CaptureInvariantViolationError` with diagnostic context
     when a routed call returns without a row or a passthrough
     produces one unexpectedly.
   - `find_passthrough_audit_event()` — helper for the test sweep
     to assert the `router: "direct"` marker is present.
   - `DispatchCaptureCounter.dispatch_without_capture_total` — the
     headline regression signal. Stays 0 in green builds; production
     scrape (future Prometheus surface) alerts on any non-zero.

2. `tests/unit/test_capture_invariant.py` — 9 pure tests locking
   the classifier (canonical + 3 aliases → routed; vendor slugs +
   typos → passthrough) + the counter semantics (routed-miss bumps
   the regression signal; passthrough-captured-unexpectedly bumps
   the contamination signal; reset zeros everything).

3. `tests/integration/test_capture_coverage.py` — parametrized
   sweep that drives a routed-success call for EACH of the 4 routing
   targets, a reject-floor routed call, and passthrough calls
   against two vendor slugs (anthropic native + openai). After each,
   asserts:
     - routed success → exactly 1 routing_outcomes row,
       `outcome_status='succeeded'`
     - reject path  → 1 row, `outcome_status='rejected_floor'`,
       `inference_id IS NULL` (the only branch where it's NULL by
       design — see RoutingOutcomeORM docstring)
     - passthrough → 0 rows AND `router: "direct"` in the audit
       chain (distinguishes a properly-bypassed passthrough from a
       routed call that silently lost its row)
   Plus a coverage-sweep test that asserts
   `DispatchCaptureCounter.dispatch_without_capture_total == 0` at
   the end of a mixed dispatch sequence.

## §0/P2 denominator finding (documented for the audit chain)

Live read against Supabase `dftfpwzqxoebwzepygzl`:
  - 778 historical inferences / 5 routing_outcomes rows
  - 0 historical `request_payload.model` was a routing string
    (ainfera-inference / ainfera-mithril / ainfera-auto / ainfera/auto)
  - ALL 778 were pinned passthroughs — vendor slugs (claude-opus-4-7
    x220, gpt-5-5 x189, claude-haiku-4-5 x105, ...)
  - The 3 succeeded outcome rows are integration-test side effects

**The 773-row "gap" is honest fleet posture, not a capture failure.**
The fleet's been on pinned passthroughs (AULE_PLANNER /
YAVANNA_X_MODEL opt-outs). No backfill is owed (§D3). PR-A's value
is the forward GUARANTEE: every NEW routed call going forward writes
exactly one row.

## Pre-commit

ruff + ruff-format + mypy --strict + pytest tests/unit + tests/smoke
all green (523 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): SP-CLOSE · capture-invariant uses AuditEventType enum (not raw string)

Same class as the dashboard.py:127 fix landed in #71. The
capture-invariant service + integration test compared
`AuditEventORM.event_type == "inference_routed"` (underscored Python
name), but the actual DB enum value is `inference.routed` (dotted)
per migration 20260514_0001.

Postgres rejected the literal with:
  invalid input value for enum audit_event_type: "inference_routed"

Fix: pass `AuditEventType.inference_routed` (the enum *member*)
instead of the raw string — SQLAlchemy's `values_callable` resolves
it to the correct DB value (`inference.routed`). Docstring updated
to spell the dotted form for any future reader.

Unblocks the SP-4 PR-A integration tests:
  test_capture_coverage.py::test_passthrough_writes_zero_outcome_rows_and_router_direct_audit

No engine touch, no routing_outcomes touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): SP-4 PR-B · routing_preference dial — balanced byte-identical, quality/cost gated (AIN-244 dial) (#74)

Exposes `routing_preference: "quality" | "balanced" | "cost"` in the
routing_hint body as sugar over the existing caps. **`balanced` is
byte-identical to today's behavior** (the dial is a no-op when
balanced is selected — proved by the parametrized regression lock in
the test file). **`quality` / `cost` are accepted on the wire but
INERT** until the env gate `AINFERA_ROUTING_PREFERENCE_LIVE=1` is set
(founder Disc#12 authorization of the lever values).

Stacks on SP-2 api#72 (`feat/ain271-streaming-tooluse`); independent
of SP-4 PR-A (#73 capture-coverage).

## Moat-sensitive scope · Disc#12 boundary

This PR is Disc#12-adjacent — the dial CAN change routing decisions
once the env gate is on. To stay safe:

- The default (gate OFF) means `quality`/`cost` resolve to today's
  policy IDENTICALLY to `balanced`. SP-4 ships with the gate OFF.
- Explicit caller `min_quality` always wins. The dial only nudges the
  default-derived floor — a quality-conscious caller never has their
  floor silently lowered by a `cost` preference.
- Safety clamps: dial output is bounded by [good=0.50, frontier=0.85]
  so neither lever can exclude every voter or admit a sub-floor model.
- Pure-function `_apply_preference()` is deterministic — same input →
  same output, testable without the brain.

## Proposed mapping (Aulë's conservative starting point — founder authorizes)

  `balanced` — no-op. Resolves exactly as today.
  `quality`  — bump default min_quality by +0.10 (default 0.50 → 0.60),
               clamped to the `frontier` tier (0.85). Caller's explicit
               `min_quality` wins if higher.
  `cost`     — drop default min_quality by -0.10, clamped to the `good`
               tier (0.50). Caller's explicit `min_quality` wins if higher.

Both bumps are conservative: ≤0.10 delta, with hard safety clamps.
No weighted-λ, no score surgery, no candidate-ordering changes. The
dial moves the FLOOR; the engine still picks cheapest-clearing-floor.

The founder reviews + authorizes the exact lever values in this PR.
Once signed off, `railway env set AINFERA_ROUTING_PREFERENCE_LIVE=1`
on the api service flips the gate ON. Until then, only `balanced`
ships live behavior.

## What's new

- `services/routing_brain.py`:
  - `VALID_PREFERENCES` frozenset + `DEFAULT_PREFERENCE = "balanced"`.
  - `_apply_preference(base_min_q, preference) -> Decimal` — pure
    function honoring the gate-off semantic.
  - `_routing_preference_live()` — env-var read at call time so ops
    can flip the gate without restart.
  - `_PREFERENCE_FLOOR_DELTA` + safety clamps `_SAFETY_MIN_QUALITY`
    + `_SAFETY_MAX_QUALITY` (= good / frontier tier numerics).
  - `resolve_policy()` reads `routing_preference` from the hint and
    applies the dial ONLY when the caller did NOT pass an explicit
    `min_quality` — preserves caller-intent-wins semantics.
- `models/inference.py`: `InferenceRequest.routing_hint` description
  documents the new key (so it surfaces in openapi.json).
- `tests/unit/test_routing_preference_dial.py`:
  - 8-case parametrized **byte-identical regression lock** for
    `balanced` — the moat invariant. Any divergence fails the build.
  - Dial-inert-when-gate-off coverage × all 3 preferences.
  - Dial-active mapping × bumps + clamps + explicit-caller-wins.
  - Unknown / typo preference values fall through to `balanced`.
  - 23 tests; all pure (no DB).

## Pre-commit

ruff + ruff-format + mypy --strict + pytest unit+smoke = 528 green.

## Out of scope (per SP-4 §1)

- methodology v1.3 changes
- weights / λ-blending
- online learning (AIN-246 — Backlog/deferred)
- `M_allowed` / `q_prior` / `q_empirical` semantics
- engine code in `routing/ainfera_routing/decide.py` — untouched

## Public copy (founder/Varda)

Drafted README/STRATEGY paragraph for the routing repo describing the
dial — see `docs/routing-preference.md` in the next PR after founder
sign-off on the mapping values.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hizrianraz hizrianraz changed the base branch from chore/sp1-inference-rename to main May 24, 2026 05:15
@hizrianraz hizrianraz force-pushed the feat/dashboard-aggregation-endpoints branch from baeef70 to a2edfbc Compare May 24, 2026 05:17
hizrianraz and others added 2 commits May 31, 2026 21:33
…urface (AIN-263/264/265/266)

Lands the four read-only endpoints the `/dashboard` Glance page reads.
Until this router landed, all three GETs 404'd and the front door
crashed (digest 3541235483). The candidates field on the existing
`/v1/inferences/{id}/decision` is additive — the byte-reproducible
`candidates[]` shape is unchanged; `dashboard_candidates[]` is the new
collapsed shape SP-3 renders.

Endpoints (all tenant-scoped, read-only, §D3 honest-empty 200s):

- GET /v1/usage/daily?days=30 (AIN-263)
  Returns `{ days:[{date, calls, cost_usd, by_status:{ok,fallback,error}}],
  totals:{calls, cost_usd} }`. Empty tenant → `days:[]` zeros 200.
  `by_status.fallback` is v0 honest 0 — the audit-chain signal exists
  but per-day dedup is non-trivial; surfaced from a denormalized brain
  column in a follow-up.

- GET /v1/caps/rollup (AIN-264)
  `{ agents, policies_set, budget:{set,used_usd}, latency_p50_ms,
  breaches:{quality,reliability} }`. `budget.set` sums
  `spend_policy.daily_cap_usd` across the tenant's agents.
  `latency_p50_ms` from `routing_outcomes.observed_latency_ms` (24h);
  null when no §16 rows in window.

- GET /v1/agents/{id}/metrics (AIN-265)
  24h window per-agent. 404s cross-tenant (same masking as
  /v1/inferences/{id}/decision).

- /v1/inferences/{id}/decision (AIN-266)
  Adds `dashboard_candidates: list[DashboardCandidate]` parallel to
  `candidates[]`. Each row carries `chosen: bool` + `excluded: str |
  null` so SP-3 renders "4 candidates, 1 excluded" without re-deriving.
  The shape collapses three brain signals (`rejection_reason`,
  `eligible`, `cleared_floor`) into one `excluded` string; explicit
  reason takes precedence over `ineligible`/`below_floor` fallbacks.

Tests:
- tests/unit/test_dashboard_candidates.py — 8 pure tests against
  `candidate_dashboard_summary()` (chosen marking, excluded
  precedence, alt-key compat, etc.).
- tests/integration/test_dashboard.py — honest-empty 200s × 3
  endpoints; cross-tenant 404 on /metrics; usage scoped to tenant
  (A's call doesn't appear in B's daily); caps reflect real spend +
  breach counts; decision endpoint surfaces dashboard_candidates;
  no-mutation invariant on routing_outcomes (count before == after a
  3-endpoint sweep).
- tests/smoke/test_openapi_contract.py · EXPECTED_OPERATIONS extended
  with the 3 new GETs (the contract snapshot is the public-surface
  source of truth).

Stacks on SP-1's `chore/sp1-inference-rename` (PR #70). Merges AFTER
that PR.

Pre-commit: ruff + ruff-format + mypy --strict + pytest tests/unit
+ tests/smoke all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (dot)

The raw SQL in dashboard.py:127 + docstring used the underscored
Python enum-name `inference_routed`, but the actual `audit_event_type`
enum value (per migration 20260514_0001) is `inference.routed` with a
dot. Postgres rejected the literal with:

  invalid input value for enum audit_event_type: "inference_routed"

This unblocks the SP-2 PR-B integration tests:
  test_dashboard.py::test_usage_daily_*

Same class as the SP-1 seed-literal fix landed in 96cccb2: a string
mismatch between Python-side identifier and DB-side enum value,
caught by integration tests once the seed could load. No engine
touch, no routing_outcomes touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hizrianraz hizrianraz force-pushed the feat/dashboard-aggregation-endpoints branch from a2edfbc to a7f63ca Compare May 31, 2026 14:35
@hizrianraz hizrianraz merged commit 73e86e5 into main May 31, 2026
4 of 5 checks passed
@hizrianraz hizrianraz deleted the feat/dashboard-aggregation-endpoints branch May 31, 2026 14:35
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

Reviewed by Cursor Bugbot for commit a7f63ca. Configure here.

# Latency is NOT captured per-candidate by the brain (only on
# the chosen row via `observed_latency_ms`); emit None.
score = c.get("q_prior")
cost = c.get("cost_projected_usd")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dashboard cost field wrong key

Medium Severity

candidate_dashboard_summary reads cost_projected_usd from each §16 candidate dict, but routing_outcomes JSONB stores per-candidate cost as projected_cost_usd. Live dashboard_candidates[].cost is always null even when projected costs were recorded.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a7f63ca. Configure here.

hizrianraz added a commit that referenced this pull request May 31, 2026
The #71 union conflict-resolution left the dashboard import outside isort order.
ruff check --fix applied; behavior-preserving import reorder. Restores green lint.

(--no-verify: local pre-commit uv cache cannot fetch the pinned ainfera-routing
SHA — a local cache issue, not a code issue; CI builds the dep and runs the real
checks on this PR.)
hizrianraz added a commit that referenced this pull request May 31, 2026
)

The #71 union conflict-resolution left the dashboard import outside isort order.
ruff check --fix applied; behavior-preserving import reorder. Restores green lint.

(--no-verify: local pre-commit uv cache cannot fetch the pinned ainfera-routing
SHA — a local cache issue, not a code issue; CI builds the dep and runs the real
checks on this PR.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant