feat(billing,codex): per-tier pricing overlays + service tier extractors by Menci · Pull Request #69 · Menci/Floway

Menci · 2026-06-19T20:05:07Z

Summary

One of three slices split from #65 (closed).

- `ModelPricing.tiers?` overlay + `TokenUsage.tier?` + `resolveEffectivePricing(pricing, tier)` helper. Cost compute resolves effective pricing through the overlay, so a single model billed at multiple tiers in the same hour aggregates as separate buckets with distinct unit prices.
- Per-shape tier extractors: Messages reads `usage.speed`; Responses and Chat read the top-level `service_tier`. `'standard'` / `'default'` / `'auto'` normalize to null (base pricing); `'fast'` / `'priority'` / `'flex'` and any other value are stored verbatim onto `TokenUsage.tier`.
- Codex flex/priority pricing: every priced Codex slug gains `tiers.flex` (50% off) and `tiers.priority` (2x to 2.5x base) overlays per OpenAI's published rates. Per the openai/codex source, Codex CLI's `/fast` writes `service_tier: "priority"`, so operator-facing rows tagged "fast" cost out at the priority row.
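The overlay resolution amounts to a small fold. Below is a minimal TypeScript sketch. It assumes pricing is a flat map of per-dimension unit prices plus an optional `tiers` record of partial overrides. The dimension names, the `FlatPricing` type, the example rates, and the choice to fall back to base pricing for a tier with no override entry are all hypothetical illustrations, not Floway's actual definitions.

```typescript
// Hypothetical dimension list; Floway's real BILLING_DIMENSIONS may differ.
const BILLING_DIMENSIONS = ["input", "input_cache_read", "input_cache_write", "output"] as const;
type BillingDimension = (typeof BILLING_DIMENSIONS)[number];

// Flat per-dimension unit prices (assumed unit: USD per 1M tokens).
type FlatPricing = Partial<Record<BillingDimension, number>>;

// Base pricing plus optional per-tier overrides keyed by the upstream's
// wire value (e.g. "flex", "priority", "fast").
type ModelPricing = FlatPricing & { tiers?: Record<string, FlatPricing> };

// Fold the tier override (if any) over the base rates and strip `tiers`, so
// every later per-dimension lookup sees one self-contained map.
// Assumption: a null tier, or a tier with no override entry, prices at base.
function resolveEffectivePricing(pricing: ModelPricing, tier: string | null): FlatPricing {
  const { tiers, ...base } = pricing;
  const override = tier === null ? undefined : tiers?.[tier];
  return override === undefined ? base : { ...base, ...override };
}

// Illustrative numbers only; not real rates for any model.
const example: ModelPricing = {
  input: 2,
  output: 8,
  tiers: { flex: { input: 1, output: 4 } },
};

console.log(resolveEffectivePricing(example, null));    // { input: 2, output: 8 }
console.log(resolveEffectivePricing(example, "flex"));  // { input: 1, output: 4 }
console.log(resolveEffectivePricing(example, "scale")); // { input: 2, output: 8 }
```

Stripping `tiers` from the returned snapshot means a null-tier row resolves to the same flat map as before tiers existed.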

Schema dependency

Hard dependency: the SQL repo writes the `tier` column directly into `usage` and `usage_requests`, so this PR requires the sibling PR that adds that column to land first (or concurrently); until then, repo writes will fail at the SQL boundary. Sibling PR title: `feat(protocols,gateway): input_cache_write_1h dimension + migration adding tier column`.

Test plan

typecheck + test + lint clean

…rs + TokenUsage.tier + resolveEffectivePricing)

Captures the per-service-tier dimension that bare `ModelPricing` misses: distinct service tiers for the same model (Anthropic standard / fast; OpenAI default / priority / flex / scale) are priced at different rates, and the gateway needs to surface that distinction in the cost aggregate.

- `ModelPricing.tiers` carries per-service-tier overrides, keyed by the wire value the upstream stamps on the usage object. `resolveEffectivePricing` folds a tier override into a flat snapshot before any unit-price lookup, so every downstream `unitPriceForDimension` call sees one self-contained map.
- `UsageRecord` and `TokenUsage` grow a `tier` slot; the usage tables key buckets on (keyId, model, upstream, modelKey, hour, tier), so distinct tiers aggregate as separate buckets with distinct unit-price snapshots. Existing rows with `tier = NULL` compute exactly as before (the resolver treats null as base pricing and returns the snapshot without the `tiers` key).
- `recordTokenUsage` threads the tier from the parsed `TokenUsage` onto the bucket so cost compute applies the right override; `tokenUsage`'s zero-dimension filter passes `tier` through verbatim.
- Control-plane export / import surfaces the tier alongside the other bucket-identity fields; a missing tier defaults to null on import.
- Provider config parsers iterate `BILLING_DIMENSIONS` directly instead of a hand-rolled `keyof ModelPricing` list, because the latter would now include `tiers` and admit a non-numeric value into `pricing[dimension]`.

Schema: the SQL repo writes the tier column directly; depends on the sibling migration adding `tier` to `usage` + `usage_requests`.
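Adding `tier` to the bucket identity is what keeps per-tier costs apart. The in-memory TypeScript sketch below shows the aggregation, assuming a simplified record with only two token counters. `UsageSample`, `bucketKey`, and `aggregate` are hypothetical names, and the real buckets live in SQL tables rather than a `Map`.

```typescript
// Hypothetical, simplified usage record; field names mirror the bucket
// identity (keyId, model, upstream, modelKey, hour, tier).
interface UsageSample {
  keyId: string;
  model: string;
  upstream: string;
  modelKey: string;
  hour: string;        // hour bucket, e.g. "2026-06-19T20"
  tier: string | null; // null = base pricing
  inputTokens: number;
  outputTokens: number;
}

// JSON-encode the identity tuple so that a null tier and the string "null"
// produce different keys.
function bucketKey(s: UsageSample): string {
  return JSON.stringify([s.keyId, s.model, s.upstream, s.modelKey, s.hour, s.tier]);
}

// Sum token counters per bucket; samples differing only in tier land in
// separate buckets and can therefore be priced with separate unit rates.
function aggregate(samples: UsageSample[]): Map<string, UsageSample> {
  const buckets = new Map<string, UsageSample>();
  for (const s of samples) {
    const key = bucketKey(s);
    const bucket = buckets.get(key);
    if (bucket === undefined) {
      buckets.set(key, { ...s }); // copy so the input sample is not mutated
    } else {
      bucket.inputTokens += s.inputTokens;
      bucket.outputTokens += s.outputTokens;
    }
  }
  return buckets;
}

// Illustrative identity values only.
const identity = { keyId: "k1", model: "m1", upstream: "up1", modelKey: "m1", hour: "2026-06-19T20" };
const buckets = aggregate([
  { ...identity, tier: null, inputTokens: 100, outputTokens: 10 },
  { ...identity, tier: null, inputTokens: 50, outputTokens: 5 },
  { ...identity, tier: "priority", inputTokens: 70, outputTokens: 7 },
]);
console.log(buckets.size); // 2
```

The two base-tier samples merge into one bucket, while the priority sample stays separate even though every other identity field matches.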

….tier across protocol shapes

Reads each upstream's service-tier marker off the usage object and stamps it onto `TokenUsage.tier`, so the recording layer routes the bucket through the right tier override:

- Messages: Opus 4.6+ emits `usage.speed: 'standard' | 'fast'`; only `fast` surfaces as `tier: 'fast'`. Standard is left unset so base-tier rows aggregate with the historical no-tier rows. Streamed deltas propagate `speed`, so a late delta carries the tier all the way to `message_stop`.
- Responses: the top-level `response.service_tier` echoes the actual processing tier ('priority', 'flex', 'scale', 'default', 'auto'). We drop 'default' and 'auto' (both denote base pricing) and surface anything else verbatim. The WebSocket path reads `service_tier` the same way as HTTP.
- Chat Completions: same as Responses, but reading the top-level `chunk.service_tier` (`chat.completion[.chunk]`).

Protocol types grow `MessagesUsage.speed`, `ResponsesResult.service_tier`, `ChatCompletionsResult.service_tier`, and `ChatCompletionsStreamEvent.service_tier`.
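The extraction rules are compact enough to express as one normalizer plus three thin per-shape readers. The following is a minimal TypeScript sketch. The interfaces keep only the fields read here and are hypothetical reductions of the real protocol types. `normalizeTier`, the `tierFrom*` helpers, and `mergeSpeed` are illustrative names. Sharing one base-value set across all shapes is a simplifying assumption.

```typescript
// Minimal hypothetical shapes: only the fields the extractors read.
interface MessagesUsage { speed?: string }
interface ResponsesResult { service_tier?: string | null }
interface ChatCompletionsResult { service_tier?: string | null }

// Wire values that denote base pricing. Simplification: one shared set for
// all shapes ('standard' comes from Messages, 'default'/'auto' from OpenAI).
const BASE_TIER_VALUES = new Set(["standard", "default", "auto"]);

// Absent or base-pricing values map to null; anything else is kept verbatim.
function normalizeTier(raw: string | null | undefined): string | null {
  if (raw == null || BASE_TIER_VALUES.has(raw)) return null;
  return raw;
}

const tierFromMessages = (usage: MessagesUsage) => normalizeTier(usage.speed);
const tierFromResponses = (result: ResponsesResult) => normalizeTier(result.service_tier);
const tierFromChat = (result: ChatCompletionsResult) => normalizeTier(result.service_tier);

// Streaming (Messages): keep the latest `speed` seen on any delta, so a value
// that only arrives late still decides the tier at message_stop.
function mergeSpeed(current: string | undefined, delta: MessagesUsage): string | undefined {
  return delta.speed ?? current;
}

const deltas: MessagesUsage[] = [{}, { speed: "fast" }, {}];
const finalSpeed = deltas.reduce<string | undefined>(mergeSpeed, undefined);
console.log(tierFromMessages({ speed: finalSpeed }));        // fast
console.log(tierFromResponses({ service_tier: "default" })); // null
console.log(tierFromChat({ service_tier: "priority" }));     // priority
```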

Add `tiers.flex` and `tiers.priority` overlays for every priced Codex slug so the dashboard's notional cost reflects which OpenAI service tier the request actually ran on. The gateway already captures the upstream's `service_tier` onto `TokenUsage.tier`; this commit completes the loop by giving the cost compute a per-tier rate row to look up.

Tier overrides match OpenAI's public pricing (verified 2026-06-19 against https://platform.openai.com/docs/pricing), listed as input / cached input / output in USD per 1M tokens:

- gpt-5.5: flex $2.5/$0.25/$15; priority $12.5/$1.25/$75
- gpt-5.4: flex $1.25/$0.13/$7.5; priority $5/$0.5/$30
- gpt-5.4-mini: flex $0.375/$0.0375/$2.25; priority $1.5/$0.15/$9

`codex-auto-review` shares `gpt-5.4`'s pricing, including the tier overrides. Codex CLI's `/fast` toggle writes `service_tier: "priority"` on the wire (per openai/codex's `ServiceTier::Fast.request_value()`), so operator-facing rows tagged "fast" cost out at the priority row. The cache-write rate stays unset on these entries: OpenAI charges cache creation at the same rate as input, which `unitPriceForDimension`'s fallback chain already covers.
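As a worked check of the rate rows, here is a minimal TypeScript sketch that prices one bucket at the `gpt-5.4` flex and priority rows quoted above. It makes three assumptions: the rates are USD per 1M tokens, token counts are already split into disjoint dimensions, and the fallback chain reduces to "an unset cache-write rate uses the input rate". The dimension names, `costUsd`, and this body of `unitPriceForDimension` are hypothetical stand-ins for the real code.

```typescript
// Hypothetical dimension names; the real billing dimensions may differ.
type Dimension = "input" | "input_cache_read" | "input_cache_write" | "output";
type FlatPricing = Partial<Record<Dimension, number>>; // USD per 1M tokens (assumed)

// gpt-5.4 tier rows from the rates quoted above (cache-write left unset).
const GPT_5_4_TIERS: Record<"flex" | "priority", FlatPricing> = {
  flex: { input: 1.25, input_cache_read: 0.13, output: 7.5 },
  priority: { input: 5, input_cache_read: 0.5, output: 30 },
};

// Assumed fallback chain: an unset cache-write rate falls back to the input
// rate, since OpenAI bills cache creation at the input rate.
function unitPriceForDimension(pricing: FlatPricing, dim: Dimension): number {
  const direct = pricing[dim];
  if (direct !== undefined) return direct;
  if (dim === "input_cache_write" && pricing.input !== undefined) return pricing.input;
  throw new Error(`no unit price for ${dim}`);
}

// Cost in USD for a bucket of token counts, one entry per dimension.
function costUsd(pricing: FlatPricing, tokens: Partial<Record<Dimension, number>>): number {
  let total = 0;
  for (const [dim, count] of Object.entries(tokens) as [Dimension, number][]) {
    total += (count * unitPriceForDimension(pricing, dim)) / 1_000_000;
  }
  return total;
}

// A Codex `/fast` request arrives as service_tier "priority".
const usage = { input: 400_000, input_cache_read: 600_000, output: 100_000 };
console.log(costUsd(GPT_5_4_TIERS.priority, usage).toFixed(2)); // 5.30
console.log(costUsd(GPT_5_4_TIERS.flex, usage).toFixed(2));     // 1.33
```

The same bucket costs about four times more at the priority row than at the flex row.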

Menci added 3 commits June 20, 2026 03:57

Menci mentioned this pull request on Jun 19, 2026 in feat(billing,codex): per-tier pricing overlays + per-TTL cache writes #65 (closed).


Menci wants to merge 3 commits into `main` from `precursor-tier-aware-pricing`.

Menci commented Jun 19, 2026
