feat(billing,codex): per-tier pricing overlays + service tier extractors#69

Open
Menci wants to merge 3 commits into main from precursor-tier-aware-pricing

Conversation

@Menci Menci commented Jun 19, 2026

Owner

Summary

One of three slices split from #65 (closed).

  • ModelPricing.tiers? overlay + TokenUsage.tier? + resolveEffectivePricing(pricing, tier) helper. Cost compute resolves effective pricing through the overlay so a single model billed at multiple tiers in the same hour aggregates as separate buckets with distinct unit prices.
  • Per-shape tier extractors: Messages reads usage.speed; Responses and Chat Completions read the top-level service_tier on the response object. 'standard' / 'default' / 'auto' normalize to null (base pricing); 'fast' / 'priority' / 'flex' and any other value are stored verbatim on TokenUsage.tier.
  • Codex flex/priority pricing — every priced Codex slug gains tiers.flex (50% off) + tiers.priority (~2x) overlays per OpenAI's published rates. Codex CLI's /fast writes service_tier: "priority" per openai/codex source — so operator-facing rows tagged "fast" cost out at the priority row.
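
A minimal sketch of how the overlay might resolve. The three-dimension `ModelPricing` shape and the rates below are illustrative, not the repo's real types or prices. Falling back to base pricing for a tier with no override is an assumption of this sketch, not something this PR states.

```typescript
// Hypothetical, simplified shapes; the real ModelPricing carries more dimensions.
type Rates = { input?: number; input_cache_read?: number; output?: number };
type ModelPricing = Rates & { tiers?: Record<string, Rates> };

// Folds a tier override into a flat snapshot that has no `tiers` key.
// A null/undefined tier means base pricing. Falling back to base pricing
// for a tier without an override is an assumption made for this sketch.
function resolveEffectivePricing(pricing: ModelPricing, tier?: string | null): Rates {
  const { tiers, ...base } = pricing;
  if (tier == null) return base;
  const override = tiers?.[tier];
  return override ? { ...base, ...override } : base;
}

// Illustrative rates only (USD per 1M tokens), not taken from any price list.
const example: ModelPricing = {
  input: 2,
  input_cache_read: 0.2,
  output: 10,
  tiers: { flex: { input: 1, input_cache_read: 0.1, output: 5 } },
};

const flexRates = resolveEffectivePricing(example, "flex");
const baseRates = resolveEffectivePricing(example, null);
```

Because the resolver returns a flat map, callers that look up a unit price per dimension never need to know a tier was involved.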

Schema dependency

Hard dependency. The SQL repo writes the tier column directly into usage + usage_requests, so the sibling PR that adds the tier column to those tables must land first (or concurrently); until then, repo writes will fail at the SQL boundary. Sibling PR title: `feat(protocols,gateway): input_cache_write_1h dimension + migration adding tier column`.

Test plan

  • typecheck + test + lint clean

Menci added 3 commits June 20, 2026 03:57
…rs + TokenUsage.tier + resolveEffectivePricing)

Captures the per-service-tier dimension that bare ModelPricing misses:
distinct service tiers for the same model (Anthropic standard / fast, OpenAI
default / priority / flex / scale) are priced at different rates, and the
gateway needs to surface that distinction in the cost aggregate.

- `ModelPricing.tiers` carries per-service-tier overrides, keyed by the
  wire-value the upstream stamps on the usage object. `resolveEffectivePricing`
  folds a tier override into a flat snapshot before any unit-price lookup,
  so every downstream `unitPriceForDimension` call sees one self-contained map.
- `UsageRecord` and `TokenUsage` grow a `tier` slot; the usage tables key
  buckets on (keyId, model, upstream, modelKey, hour, tier) so distinct tiers
  aggregate as separate buckets with distinct unit-price snapshots. Existing
  rows with `tier = NULL` keep computing identically to before (the resolver
  treats null as base pricing and returns the snapshot sans the `tiers` key).
- `recordTokenUsage` threads the tier from the parsed `TokenUsage` onto
  the bucket so cost compute applies the right override; `tokenUsage`'s
  zero-dimension filter passes `tier` through verbatim.
- Control-plane export / import surfaces the tier alongside the other
  bucket-identity fields; missing tier defaults to null on import.
- Provider config parsers iterate `BILLING_DIMENSIONS` directly instead of
  a hand-rolled `keyof ModelPricing` list — the latter would now include
  `tiers` and admit a non-numeric value into `pricing[dimension]`.
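
A rough sketch of the two mechanics described above: parsing rates by iterating a dimension list, and keying buckets on the tier. The dimension names, `parseRates`, `BucketIdentity`, and `bucketKey` are hypothetical stand-ins, not the repo's actual code.

```typescript
// Hypothetical dimension list; the real BILLING_DIMENSIONS is assumed to be longer.
const BILLING_DIMENSIONS = ["input", "input_cache_read", "output"] as const;
type BillingDimension = (typeof BILLING_DIMENSIONS)[number];
type Rates = Partial<Record<BillingDimension, number>>;

// Iterating the dimension list (rather than every key of the pricing
// object) keeps the nested `tiers` map out of the numeric rate slots.
function parseRates(raw: Record<string, unknown>): Rates {
  const rates: Rates = {};
  for (const dimension of BILLING_DIMENSIONS) {
    const value = raw[dimension];
    if (typeof value === "number" && Number.isFinite(value)) {
      rates[dimension] = value;
    }
  }
  return rates;
}

// Hypothetical bucket identity mirroring (keyId, model, upstream, modelKey, hour, tier).
interface BucketIdentity {
  keyId: string;
  model: string;
  upstream: string;
  modelKey: string;
  hour: string;
  tier?: string | null;
}

// A missing tier and an explicit null produce the same key, so base-tier
// rows land in the same bucket as historical rows that have no tier.
function bucketKey(b: BucketIdentity): string {
  return JSON.stringify([b.keyId, b.model, b.upstream, b.modelKey, b.hour, b.tier ?? null]);
}
```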

Schema: the SQL repo writes the tier column directly; depends on the sibling
migration adding `tier` to `usage` + `usage_requests`.
….tier across protocol shapes

Reads each upstream's service-tier marker (off the usage object for Messages,
off the top-level response body for Responses and Chat Completions) and stamps
it onto TokenUsage.tier so the recording layer routes the bucket through the
right tier override:

- Messages: Opus 4.6+ emits `usage.speed: 'standard' | 'fast'`; only `fast`
  surfaces as `tier: 'fast'`. Standard is left unset so base-tier rows
  aggregate with the historical no-tier rows. Streamed deltas propagate
  `speed` so a late delta carries the tier all the way to message_stop.
- Responses: the top-level `response.service_tier` echoes the actual
  processing tier ('priority', 'flex', 'scale', 'default', 'auto'). We drop
  'default' and 'auto' — both denote base pricing — and surface anything
  else verbatim. The WebSocket path reads service_tier the same way as HTTP.
- Chat Completions: same as Responses but reading the top-level
  `chunk.service_tier` (chat.completion[.chunk]).

Protocol types grow `MessagesUsage.speed`, `ResponsesResult.service_tier`,
`ChatCompletionsResult.service_tier`, and `ChatCompletionsStreamEvent.service_tier`.
Add `tiers.flex` and `tiers.priority` overlays for every priced Codex slug
so the dashboard's notional cost reflects which OpenAI service tier the
request actually ran on. The gateway already captures the upstream
`service_tier` onto `TokenUsage.tier`; this commit completes the loop by
giving the cost compute a per-tier rate row to look up.

Tier overrides match OpenAI's public pricing (verified 2026-06-19 against
https://platform.openai.com/docs/pricing):

  gpt-5.5         flex $2.5/$0.25/$15        priority $12.5/$1.25/$75
  gpt-5.4         flex $1.25/$0.13/$7.5      priority $5/$0.5/$30
  gpt-5.4-mini    flex $0.375/$0.0375/$2.25  priority $1.5/$0.15/$9

`codex-auto-review` shares `gpt-5.4`'s pricing including the tier
overrides. Codex CLI's `/fast` toggle writes `service_tier: "priority"` on
the wire (per openai/codex's `ServiceTier::Fast.request_value()`), so
operator-facing rows tagged "fast" cost out at the priority row.

Cache-write rate stays unset on these entries — OpenAI charges cache
creation at the same rate as input, which `unitPriceForDimension`'s
fallback chain already covers.
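
As a worked illustration, the sketch below costs one request against the `gpt-5.4` tier rows listed above. It assumes the three figures are input / cached input / output in USD per 1M tokens. The dimension names, `unitPrice`, and `costUsd` are hypothetical stand-ins for the real `unitPriceForDimension` chain, of which only the cache-write-to-input fallback is modelled.

```typescript
type Dimension = "input" | "input_cache_read" | "input_cache_write" | "output";
type Rates = Partial<Record<Dimension, number>>;

const DIMENSIONS: Dimension[] = ["input", "input_cache_read", "input_cache_write", "output"];

// gpt-5.4 tier rows from the table above (assumed column order:
// input / cached input / output, USD per 1M tokens).
const GPT_5_4_TIERS: { flex: Rates; priority: Rates } = {
  flex: { input: 1.25, input_cache_read: 0.13, output: 7.5 },
  priority: { input: 5, input_cache_read: 0.5, output: 30 },
};

// Stand-in for the fallback chain: an unset cache-write rate falls back
// to the input rate; any other unset dimension is treated as free here.
function unitPrice(rates: Rates, dimension: Dimension): number {
  const direct = rates[dimension];
  if (direct !== undefined) return direct;
  if (dimension === "input_cache_write") return rates.input ?? 0;
  return 0;
}

// Sums tokens * unit price over every dimension, scaled from per-1M rates.
function costUsd(rates: Rates, tokens: Partial<Record<Dimension, number>>): number {
  let total = 0;
  for (const dimension of DIMENSIONS) {
    total += ((tokens[dimension] ?? 0) * unitPrice(rates, dimension)) / 1_000_000;
  }
  return total;
}

const tokens = { input: 200_000, input_cache_read: 100_000, output: 10_000 };
const priorityCost = costUsd(GPT_5_4_TIERS.priority, tokens);
const flexCost = costUsd(GPT_5_4_TIERS.flex, tokens);
```

Under these assumptions the same token counts come to roughly $1.35 at priority and roughly $0.34 at flex, which is why the two tiers have to aggregate as separate buckets.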