feat(billing,codex): per-tier pricing overlays + service tier extractors #69
Menci wants to merge 3 commits into
…rs + TokenUsage.tier + resolveEffectivePricing)

Captures the per-service-tier dimension that bare ModelPricing misses: distinct service tiers for the same model (Anthropic standard / fast, OpenAI default / priority / flex / scale) are priced at different rates, and the gateway needs to surface that distinction in the cost aggregate.

- `ModelPricing.tiers` carries per-service-tier overrides, keyed by the wire value the upstream stamps on the usage object. `resolveEffectivePricing` folds a tier override into a flat snapshot before any unit-price lookup, so every downstream `unitPriceForDimension` call sees one self-contained map.
- `UsageRecord` and `TokenUsage` grow a `tier` slot; the usage tables key buckets on (keyId, model, upstream, modelKey, hour, tier), so distinct tiers aggregate as separate buckets with distinct unit-price snapshots. Existing rows with `tier = NULL` compute exactly as before (the resolver treats null as base pricing and returns the snapshot without the `tiers` key).
- `recordTokenUsage` threads the tier from the parsed `TokenUsage` onto the bucket so cost compute applies the right override; the zero-dimension filter in `tokenUsage` passes `tier` through verbatim.
- Control-plane export / import surfaces the tier alongside the other bucket-identity fields; a missing tier defaults to null on import.
- Provider config parsers iterate `BILLING_DIMENSIONS` directly instead of a hand-rolled `keyof ModelPricing` list. The latter would now include `tiers` and admit a non-numeric value into `pricing[dimension]`.

Schema: the SQL repo writes the tier column directly; depends on the sibling migration adding `tier` to `usage` + `usage_requests`.
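A minimal sketch of how the overlay, the resolver, and the widened bucket key might fit together. Only `ModelPricing.tiers`, `resolveEffectivePricing`, `BILLING_DIMENSIONS`, and the bucket-identity fields come from the description above; the concrete dimension names, the `parseFlatPricing` and `bucketKey` helpers, and the fallback for an unknown tier are assumptions for illustration.

```typescript
// Hypothetical dimension list; the project's real BILLING_DIMENSIONS may differ.
const BILLING_DIMENSIONS = ["input", "input_cache_read", "input_cache_write", "output"] as const;
type BillingDimension = (typeof BILLING_DIMENSIONS)[number];

// Flat per-dimension unit prices (assumed USD per 1M tokens).
type FlatPricing = Partial<Record<BillingDimension, number>>;

// Base rates plus optional per-tier overrides keyed by the upstream wire value.
interface ModelPricing extends FlatPricing {
  tiers?: Record<string, FlatPricing>;
}

// Hypothetical config parser: iterating BILLING_DIMENSIONS (not `keyof ModelPricing`)
// means `tiers` is never read as a price; only numeric values are kept.
function parseFlatPricing(raw: Record<string, unknown>): FlatPricing {
  const out: FlatPricing = {};
  for (const dim of BILLING_DIMENSIONS) {
    const v = raw[dim];
    if (typeof v === "number") out[dim] = v;
  }
  return out;
}

// Fold the tier override over the base rates and drop the `tiers` key, so every
// downstream lookup sees one self-contained map. Null tier = base pricing.
// Assumption: a tier with no override row also falls back to base pricing.
function resolveEffectivePricing(pricing: ModelPricing, tier: string | null): FlatPricing {
  const { tiers, ...base } = pricing;
  const override = tier === null ? undefined : tiers?.[tier];
  return { ...base, ...(override ?? {}) };
}

// Bucket identity from the commit message; tier is part of the key, so the same
// model in the same hour on two tiers lands in two buckets.
interface BucketIdentity {
  keyId: string;
  model: string;
  upstream: string;
  modelKey: string;
  hour: string;
  tier: string | null;
}

// Hypothetical string encoding of the composite key.
function bucketKey(b: BucketIdentity): string {
  return JSON.stringify([b.keyId, b.model, b.upstream, b.modelKey, b.hour, b.tier]);
}
```

Resolving before lookup keeps the per-dimension lookup ignorant of tiers: it only ever sees a flat map, which is why null-tier rows price identically to the pre-tier behavior.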
….tier across protocol shapes
Reads each upstream's service-tier marker off the usage object and stamps it
onto TokenUsage.tier so the recording layer routes the bucket through the
right tier override:
- Messages: Opus 4.6+ emits `usage.speed: 'standard' | 'fast'`; only `fast`
surfaces as `tier: 'fast'`. Standard is left unset so base-tier rows
aggregate with the historical no-tier rows. Streamed deltas propagate
`speed` so a late delta carries the tier all the way to message_stop.
- Responses: the top-level `response.service_tier` echoes the actual
processing tier ('priority', 'flex', 'scale', 'default', 'auto'). We drop
'default' and 'auto' — both denote base pricing — and surface anything
else verbatim. The WebSocket path reads service_tier the same way as HTTP.
- Chat Completions: same as Responses but reading the top-level
`chunk.service_tier` (chat.completion[.chunk]).
Protocol types grow `MessagesUsage.speed`, `ResponsesResult.service_tier`,
`ChatCompletionsResult.service_tier`, and `ChatCompletionsStreamEvent.service_tier`.
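The normalization rules above can be sketched as two small extractors plus a streaming merge. The trimmed-down wire shapes and the function names (`tierFromMessagesUsage`, `tierFromServiceTier`, `mergeUsageDelta`) are hypothetical; the mapping itself (only `fast` for Messages; drop `default` and `auto`, pass anything else through for Responses and Chat) follows the commit message.

```typescript
// Hypothetical, trimmed-down Messages usage: only the fields used here.
interface MessagesUsage {
  input_tokens: number;
  output_tokens: number;
  speed?: "standard" | "fast";
}

// Messages: only "fast" becomes a tier; "standard" or absent stays null so
// base-tier rows aggregate with the historical no-tier rows.
function tierFromMessagesUsage(usage: MessagesUsage): string | null {
  return usage.speed === "fast" ? "fast" : null;
}

// Responses and Chat Completions both carry a top-level service_tier.
// "default" and "auto" mean base pricing; anything else passes through verbatim.
function tierFromServiceTier(result: { service_tier?: string | null }): string | null {
  const t = result.service_tier;
  if (t == null || t === "default" || t === "auto") return null;
  return t;
}

// Streaming (assumed shape): fold each delta's usage into the running total,
// keeping an earlier `speed` unless a later delta supplies a new one, so the
// tier survives until message_stop.
function mergeUsageDelta(acc: MessagesUsage, delta: Partial<MessagesUsage>): MessagesUsage {
  return {
    input_tokens: delta.input_tokens ?? acc.input_tokens,
    output_tokens: delta.output_tokens ?? acc.output_tokens,
    speed: delta.speed ?? acc.speed,
  };
}
```

Mapping base-tier markers to null rather than to a literal such as `"standard"` is what keeps new base-tier rows in the same buckets as rows written before the `tier` column existed.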
Add `tiers.flex` and `tiers.priority` overlays for every priced Codex slug so the dashboard's notional cost reflects which OpenAI service tier the request actually ran on. The gateway already captures the upstream's top-level `service_tier` onto `TokenUsage.tier`; this commit completes the loop by giving the cost compute a per-tier rate row to look up.

Tier overrides match OpenAI's public pricing (verified 2026-06-19 against https://platform.openai.com/docs/pricing), given as input / cached input / output in USD per 1M tokens:

    slug            flex                        priority
    gpt-5.5         $2.5 / $0.25 / $15          $12.5 / $1.25 / $75
    gpt-5.4         $1.25 / $0.13 / $7.5        $5 / $0.5 / $30
    gpt-5.4-mini    $0.375 / $0.0375 / $2.25    $1.5 / $0.15 / $9

`codex-auto-review` shares `gpt-5.4`'s pricing, including the tier overrides.

Codex CLI's `/fast` toggle writes `service_tier: "priority"` on the wire (per openai/codex's `ServiceTier::Fast.request_value()`), so operator-facing rows tagged "fast" cost out at the priority row.

Cache-write rate stays unset on these entries: OpenAI charges cache creation at the same rate as input, which `unitPriceForDimension`'s fallback chain already covers.
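A sketch of the `gpt-5.4` rows as data, with a cost computation that shows the shared `codex-auto-review` entry and the cache-write fallback. The dimension names, the `PRICING` map, and the `unitPrice` / `costUsd` helpers are hypothetical stand-ins (the real fallback lives in `unitPriceForDimension`); the `gpt-5.4` base rates are not listed in this PR and are deliberately left out here, so only tiered requests can be priced in this sketch.

```typescript
type Dimension = "input" | "input_cache_read" | "input_cache_write" | "output";
const DIMENSIONS: Dimension[] = ["input", "input_cache_read", "input_cache_write", "output"];
type FlatPricing = Partial<Record<Dimension, number>>; // USD per 1M tokens
interface ModelPricing extends FlatPricing {
  tiers?: Record<string, FlatPricing>;
}

// gpt-5.4 tier rows from the table above; base rates elided in this sketch.
// No cache-write rate is set, matching the commit message.
const GPT_5_4: ModelPricing = {
  tiers: {
    flex: { input: 1.25, input_cache_read: 0.13, output: 7.5 },
    priority: { input: 5, input_cache_read: 0.5, output: 30 },
  },
};

// codex-auto-review points at the same object, tier overrides included.
const PRICING: Record<string, ModelPricing> = {
  "gpt-5.4": GPT_5_4,
  "codex-auto-review": GPT_5_4,
};

// Assumed fallback: an unset cache-write rate bills at the input rate.
function unitPrice(p: FlatPricing, d: Dimension): number {
  const rate = p[d] ?? (d === "input_cache_write" ? p.input : undefined);
  if (rate === undefined) throw new Error(`no rate for ${d}`);
  return rate;
}

function costUsd(model: string, tier: string | null, tokens: Partial<Record<Dimension, number>>): number {
  const entry = PRICING[model];
  if (entry === undefined) throw new Error(`unpriced model ${model}`);
  // Fold the tier override into a flat snapshot before any lookup.
  const { tiers, ...base } = entry;
  const override = tier === null ? undefined : tiers?.[tier];
  const eff: FlatPricing = { ...base, ...(override ?? {}) };
  let total = 0;
  for (const d of DIMENSIONS) {
    const n = tokens[d];
    if (n) total += (n / 1000000) * unitPrice(eff, d); // skip zero/absent dimensions
  }
  return total;
}
```

For example, a priority-tier `codex-auto-review` request with 200k input, 800k cached input, and 50k output tokens works out to 0.2 × $5 + 0.8 × $0.5 + 0.05 × $30 = $1.00 + $0.40 + $1.50 = $2.90, and 1M cache-write tokens on flex fall back to the $1.25 input rate.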
Summary
One of three slices split from #65 (closed).
- `ModelPricing.tiers?` overlay + `TokenUsage.tier?` + `resolveEffectivePricing(pricing, tier)` helper. Cost compute resolves effective pricing through the overlay, so a single model billed at multiple tiers in the same hour aggregates as separate buckets with distinct unit prices.
- Messages reads `usage.speed`; Responses + Chat read the top-level `service_tier`. `'standard'` / `'default'` / `'auto'` normalize to null (base pricing); `'fast'` / `'priority'` / `'flex'` / others are stored verbatim onto `TokenUsage.tier`.
- `tiers.flex` (50% off) + `tiers.priority` (~2x) overlays for every priced Codex slug, per OpenAI's published rates. Codex CLI's `/fast` writes `service_tier: "priority"` (per the `openai/codex` source), so operator-facing rows tagged "fast" cost out at the priority row.

Schema dependency
Hard dependency. The SQL repo writes the `tier` column directly into `usage` + `usage_requests`, so the sibling PR that adds the `tier` column to those tables must land first (or concurrently); until then, repo writes will fail at the SQL boundary. Sibling PR title: `feat(protocols,gateway): input_cache_write_1h dimension + migration adding tier column`.

Test plan