feat(provider): Claude Code subscription provider#60
Draft
Menci wants to merge 147 commits into
Draft
Conversation
Lays the foundation for a fifth UpstreamProviderKind. After this commit the workspace recognizes 'claude-code' as a valid upstream kind, the @floway-dev/provider-claude-code package exists with config + state asserters mirroring the codex provider's pattern, and the database schema carries a CHECK-constraint extension plus a claude_code_pkce_pending table for the OAuth handoff that lands in a follow-up commit. The data-plane factory entry is a placeholder thrower; the dashboard serializer redacts the new shape (refreshToken stays in state, surface the access-token expiry and quota snapshot summary). Both control-plane import routes and the data-plane fetch path are deliberately not wired here — those land alongside the OAuth helpers and the messages interceptors in subsequent commits.
PKCE S256 generator (32-byte verifier; produces 43-char base64url
verifier+challenge pair matching the real claude CLI). Authorize-URL
builder includes the literal `code=true` query param the CLI emits.
Token exchange + refresh against platform.claude.com/v1/oauth/token use
the JSON body shape (not form-urlencoded) and pin User-Agent: axios/1.13.6
so the wire shape is indistinguishable from a real CLI request. Both
endpoints map `invalid_grant` and `app_session_terminated` to a dedicated
ClaudeCodeOAuthSessionTerminatedError so the request path can surface
session-termination separately from a transient upstream message.
Identity is derived from `GET api.anthropic.com/api/oauth/profile` — the
profile endpoint lives on api.anthropic.com (not the OAuth host), nests
identity under `account` and `organization`, and exposes a granular
`rate_limit_tier` we combine with `organization_type` to produce a
canonical `subscriptionType` string (`pro`, `max_5x`, `max_20x`, `team`,
`enterprise`) — the same shape the CLI persists in its on-disk
credentials.json.
Two ingestion paths:
- OAuth callback: exchange `code` against `/v1/oauth/token`, then fetch
profile and build {config, state}. The profile call routes through
the per-upstream fetcher (proxy-aware); the token exchange uses direct
egress because at exchange time we don't have an upstream yet.
- credentials.json paste: parse `claudeAiOauth` wrapper, validate
`expiresAt` is milliseconds (rejects suspected seconds-mode files),
fetch profile but prefer the CLI's persisted `subscriptionType`
verbatim when present. The JSON's existing access token is reused
for the cached entry so the first request doesn't need a refresh.
Access-token cache mirrors codex's CAS semantics:
- 5-minute skew window — anything within 5 min of expiry counts as
stale to avoid racing the upstream clock.
- Refresh-token rotation MUST land via CAS. A CAS miss is fatal because
our fresh response holds a token a sibling rotation has already
superseded; surfacing the error lets the next request re-read the
just-persisted state.
- Terminal session-terminated errors flip `state` to `refresh_failed`,
clear the cached access token, persist via CAS (best-effort — sibling
may have already flipped), and rethrow.
- `invalidateClaudeCodeAccessToken` clears the cached slot without
touching the refresh token (for 401-retry).
eslint.config.ts: registered the package in projectList (missed in the
prior scaffolding commit).
Test coverage: 73 tests across pkce, oauth, import, access-token-cache.
…parser Adds the static-mimicry surface, request-detection predicate, and the `anthropic-ratelimit-unified-*` response-header parser for the Claude Code subscription provider. No fetch path is wired yet — that lands in the next change. - `headers.ts` pins the v2.1.181 header set Anthropic's third-party detector keys on, with separate Sonnet/Opus and Haiku `anthropic-beta` profiles selected via `pickClaudeCodeHeaders(modelId)`. - `system-blocks.ts` ships the three blocks we drop on the re-mimicry path: identity prefix, default-template boilerplate (`cache_control: ephemeral`), and `buildBillingBlock(version, fp)` which emits the literal `cch=00000` placeholder. `computeCcVersionFingerprint(version, body)` produces the 3-hex `cc_version` suffix via SHA-256 over salt + UTF-8 bytes[4,7,20] of the first user-message text + version, padding with `'0'` (0x30) past end-of-string. Salt and indices ported from sub2api's `gateway_billing_block.go`, which itself ports from Parrot's `cc_mimicry.py` reverse-engineered from CC packet captures. - `detection.ts` mirrors sub2api's `claude_code_validator.go` predicate: UA regex + non-messages/count_tokens/Haiku-probe short-circuits + a strict gate (x-app, anthropic-beta, anthropic-version, valid metadata.user_id in legacy or JSON form, plus a system-block billing fast path or Dice >= 0.5 against six identity templates). Dropping the Dice fallback false-negatives pre-v2.1.36 CC traffic — 2026-06-19 probes confirmed real old CC still bills correctly on the plan tier. - `quota.ts` parses the `unified-*` family captured live on 2026-06-19 into a typed snapshot with nested 5h/7d windows and an overage block, plus a raw passthrough map of every `anthropic-ratelimit-*` header for dashboard display. Missing fields read as `null` rather than throw. Default-template text extracted from `@anthropic-ai/claude-code@2.1.10`'s plain `cli.js` and resolved to a single string (the pinned v2.1.181 release ships a Bun-compiled native binary that resists clean string extraction; the spec records this fallback as acceptable). Tool-name placeholders inside the template substituted with the canonical CC tool identifiers (Bash, TodoWrite, Task, Read, Edit, Write, Grep, AskUserQuestion, WebFetch, general-purpose). Pinning gets bumped together with the CC version. Dep added: `@noble/hashes@^2.2.0` (already in the workspace via packages proxy and http). 147 new tests; full workspace test suite green (2829 tests).
Wires the Claude Code data plane on top of Task 2's OAuth + Task 3's detection/system-blocks/quota helpers. - Six-step messages interceptor chain (hoist-user-system / inject-billing / inject-identity / inject-default-template / pin-dated-model-id / synthesize-metadata-user-id) shaping non-CC traffic into the exact CC wire shape before reaching Anthropic. - callClaudeCodeMessages: pre-fetch gates (state, rate-limit), access-token cache, 401 invalidate-and-retry-once with terminal-state surfacing, best- effort quota persistence on 2xx and 429, recordUpstreamLatency contract observed on every code path including synthetic gates. - Provider factory advertises a static catalog (sonnet / opus / haiku dated ids) with current console.anthropic.com pricing baked in; messages is the only endpoint exposed, every other call surface throws not-implemented. - Detection runs at the boundary so shaped CC traffic bypasses the re-mimicry chain entirely (operator's own CC client passes through with Authorization swap only). 193 tests across the package; workspace typecheck + lint clean.
…CallOptions Add optional `clientRequestHeaders` and `clientRequestPathname` to `UpstreamCallOptions` and capture them once on GatewayCtx from the inbound Hono request. Each `attempt.ts` consumes them via a new `buildUpstreamCallOptions(candidate, ctx, recorder)` helper so the build is uniform across messages / responses / chat-completions and across the generate / count-tokens / compact entry points. Only the claude-code provider reads the new fields today — its shape detector keys on the inbound UA + pathname + system block to decide whether the request is already real Claude Code traffic that can pass through unmodified. Other providers receive the fields and ignore them. Synthetic ctxs (in-process tests, translation chains that never originated as a real HTTP request) leave both fields undefined; the claude-code detector treats that as "not CC-shaped" so the re-mimicry chain runs.
The Claude Code subscription endpoint reads the `x-anthropic-billing-header: …` block in the system prompt to bill the request against the user's plan tier — that block has to reach Anthropic intact. Every other upstream sees the same block as ordinary prompt text, and the per-turn `cch=` hash inside it kills prompt-cache hits on each turn even when the real conversation prefix hasn't changed. Add a `strip-billing-attribution` flag (default-on for copilot / azure / custom, default-off for claude-code) and gate the interceptor on it. The file is rewritten from scratch — the top-comment now explains both purposes and cites the flag id. Update strip-billing-attribution_test.ts to cover the new flag-off path (billing block survives intact when flag is off) alongside the existing strip-when-on assertions, and add gateway-level integration tests in serve_test.ts that pin the routing-side guarantee for both directions: claude-code preserves the block, copilot strips it. Also update flags_test.ts to enumerate the new defaults.
Replace the placeholder thrower with the real createClaudeCodeProvider factory and declare @floway-dev/provider-claude-code as a gateway dependency. The dispatch path that throws for unknown provider kinds now resolves to the real factory for claude-code rows. Compute effective per-binding flags in createClaudeCodeProvider — the factory layers `record.flagOverrides` on top of the catalog defaults for the kind and stamps the resulting set onto every model the catalog advertises. Without this, the flag-gated interceptor would see an empty `binding.enabledFlags` and the `strip-billing-attribution` default-off semantics for claude-code would be indistinguishable from flag-on behavior on a copilot binding. In the provider's callMessages, switch the shaped-request detection to read inbound `clientRequestHeaders` and `clientRequestPathname` from the per-call UpstreamCallOptions. Previously detection ran against the outbound wire-header bag (always empty for unshaped traffic), so the shaped-passthrough path was effectively dead in real traffic — a real Claude Code session would always have been re-mimicked. Update the data-transfer routes to accept claude-code as a valid upstream provider kind during backup/restore and round-trip ClaudeCodeUpstreamState through the runtime's shape assertion (codex already does this; claude-code now does too). Without this, an operator who ran `pnpm run export` on a deployment with a claude-code upstream would have been unable to import the dump back.
…refresh Adds four admin endpoints mirroring the codex naming: - POST /api/upstreams/claude-code-pkce-start - POST /api/upstreams/claude-code-import - POST /api/upstreams/:id/claude-code-reimport - POST /api/upstreams/:id/claude-code-refresh-now The import path accepts either a verbatim ~/.claude/.credentials.json paste or a PKCE callback (full redirect URL or just code+state). The verifier is stashed in a new claude_code_pkce_pending table whose schema was added in 0033 and whose repo wiring (memory + SQL) and scheduled sweep land here. The generic POST /api/upstreams and PATCH /api/upstreams/:id now route claude-code through the dedicated endpoints — config edits return 400 with the canonical "use claude-code-reimport" message, mirroring how codex is handled. State serialization (already in serialize.ts) exposes a slimmed summary without the refresh token; the routes themselves never leak it. Refresh-now CAS-rotates the refresh_token via the per-upstream proxy-aware fetcher, flips the row to refresh_failed on session-terminated, and returns 502 (not 401) so the dashboard's auth client doesn't log the operator out on a dead credential.
…count card Adds the operator-facing UI surface for the claude-code provider, mirroring the codex panels: - ProviderPicker grows a Claude Code tile (rose tone) so the create-mode upstream editor offers the provider. - ClaudeCodeConfigPanel orchestrates the PKCE + credentials.json paste flow, threaded through the typed Hono RPC client. The OAuth tab opens the authorize URL, accepts either the full redirect URL or the ?code=...&state=... fragment, and posts to claude-code-import; the credentials.json tab POSTs the raw paste verbatim. Re-import and Refresh-now buttons appear on edit mode. - ClaudeCodeAccountCard renders the identity + state badge + structured 5h / 7d / overage quota slices, plus access-token expiry as a relative time. The raw header map collapses under a debug disclosure so the detail page covers the operator's `claude-code-reverse`-style poke without leaving the dashboard. - UpstreamRow, ModelInfoBar, UpstreamPicker, and the route-loader guard in /dashboard/upstreams/new.vue learn the new provider kind. - api/types.ts gains the slimmed ClaudeCodeUpstreamState / quota snapshot shapes — the wire shape mirrors what serialize.ts already emits (refresh token never surfaced, only refreshTokenSet boolean).
… add credentials.json safety warning
…ead defaults, header-casing defense) Six small fixes from the final-review pass on the Claude Code provider branch: - system-blocks_test.ts: correct the byte-index annotations for 'hello world this is a test prompt' — bytes[4]='o', bytes[7]='o' (the second 'o' in "world"), bytes[20]='a'. The previous comment claimed 'r' at index 7 and 't' at index 20. Test passes either way (assertion is on hash output), but the comment misled readers. - fetch.ts persistQuotaSnapshot: drop the dishonest `as unknown as Record<string, unknown>` double-cast; the `data` field is typed `unknown`, so plain assignment is enough and correct. - provider.ts UpstreamCallOptions: document the lowercase-key invariant on `clientRequestHeaders`. The record is built via `headersToRecord` in gateway-ctx.ts, which uses `Headers.forEach` and yields lowercase keys per the WHATWG Fetch spec. - fetch.ts shaped passthrough: defensively `delete` both `authorization` and `Authorization` from the spread before adding our bearer token. The lowercase invariant should hold; the uppercase delete is one-line insurance, not a real expectation. - fetch.ts shaped passthrough: extend the `stream: true` coercion comment to record why it is safe in the shaped path — shaped detection requires CC client headers + system blocks + a valid metadata.user_id, and the real Claude Code client always sets `stream: true`. - ClaudeCodeAccountCard.vue: drop dead defensive defaults. The asserter proves `accountUuid` and `email` are non-empty strings on a non-null account, so `?? ''` and `?? 'Claude Code account'` are unreachable. Also drop the `id.length <= 18` short-circuit on `accountIdShort`: canonical UUIDs are always 36 chars, so the branch is dead.
…lable' literal Background. Anthropic returns `anthropic-ratelimit-unified-fallback: available` on every successful /v1/messages response. Claude Code v2.1.10's parser (`HoB` in cli.js, mirrored in extractUnifiedRateLimitInfo.js of the reverse-engineered source) reads it via `=== "available"`. Our parser accepted only the magic strings `y` / `yes` / `true`, so the parsed value was always null and the dashboard's "fallback active" badge never fired. The semantic also flips. The header reports whether a degraded-mode fallback service is reachable, not whether it is currently in use: `available` is the steady-state good signal, anything else is a warning that the fallback path is gone too. Renaming the field to `fallbackAvailable` reflects that, and the dashboard now surfaces a warning badge only when the gateway has lost the fallback signal. Adjustments. quota.ts switches the parse to literal-string equality against `available`; the snapshot field becomes `fallbackAvailable`. The SPA's `ClaudeCodeQuotaSnapshotData` type and the account card mirror the rename. Tests cover the steady-state `available` → true, non-`available` values → false, and absent header → null cases.
The codex catalog mapper hardcoded `enabledFlags: new Set<string>()` on
every UpstreamModel it produced, so flagOverrides configured on a codex
upstream survived the schema validation and SQL round-trip but were
silently dropped before any interceptor could read them.
The fix mirrors the claude-code provider's pattern: resolve the upstream's
effective flag set once at provider construction with
`resolveEffectiveFlags(defaultsForProvider('codex'), [record.flagOverrides])`,
then thread it into `codexRawToUpstreamModel` so every emitted model
carries the resolved set. A dedicated test asserts that turning
`responses-web-search-shim` on at the upstream layer reaches the
produced models.
… fallback) The block was tagged "pinned to @anthropic-ai/claude-code@2.1.181" but its body was the v2.1.10 cli.js text — v2.1.181 ships as a Bun-compiled binary with no plain JS source, and an earlier extraction fell back to v2.1.10 rather than reading the actual v2.1.181 wire shape. Captured v2.1.181's real `system[2]` by pointing the binary at a local 401 echo (ANTHROPIC_BASE_URL trick) and reading back the assembled prompt. v2.1.181 builds the prompt from many conditional sub-templates; we keep the always-on core (intro + URL warning + # System + # Doing tasks + # Executing actions with care + # Using your tools + # Tone and style) and drop the per-session # Environment / # Session-specific guidance / # Context management / # Text output paragraphs that depend on cwd, model, output style, and experiment flags. The captured `# Using your tools` carried the SDK-mode TaskCreate planner; cli mode falls through to TodoWrite (igm does [TaskCreate, TodoWrite].find), which is the token we ship. Detection's IDENTITY_TEMPLATES gains the v2.1.181 "interactive agent" prefix alongside the older "interactive CLI tool" string so Dice- template fallback recognises real CC traffic on either side of the 2.1.181 boundary.
Drop the # System / # Doing tasks / # Executing actions with care /
# Using your tools sections from the v2.1.181 wire shape and keep
only the opener line, the two IMPORTANT: safety lines, and the
# Tone and style bullets — the same trimmed expansion sub2api ships
in backend/internal/service/gateway_service.go.
Those four sections are CC-agent-action instructions ("prefer
dedicated tools over Bash", "use TodoWrite to plan", "default to
writing no comments") that would inappropriately steer the model
when a non-CC downstream is on the other end of the re-mimicry path.
Identity block + safety/URL pair + tone is enough structural
material for Anthropic's plan-billing detector — the detector keys
on shape and identity, not on the trimmed sections.
Block size drops from ~9.9 KB to ~1.5 KB. Detection still passes
both old and new identity-prefix variants (matching is against
IDENTITY_BLOCK, not against the template body).
…of swallowing invalidateClaudeCodeAccessToken previously warn-and-returned when the upstream row disappeared mid-401-retry. That is the silent-catch-and-return pattern the project explicitly forbids; flip it to throw so the operator sees the real condition instead of a downstream 401 mask. ensureClaudeCodeAccessToken previously synthesized a stateMessage fallback for non-active accounts via "account is in state 'X'". Tighten the wire contract: stateMessage is now required on terminal states and must be absent on active. The discriminated-union split keeps the access-token cache from inventing a message when it surfaces ClaudeCodeOAuthSessionTerminatedError, and the fetch.ts non-active synthetic-503 path drops its ternary now that the field is guaranteed.
…tead of silent defaults deriveSubscriptionType previously coerced (organization_type=null) to 'pro' and (organization_type=claude_max, unknown rate_limit_tier) to 'max', and passed through any unknown organization_type by stripping the 'claude_' prefix. All three paths silently invented subscription tiers from missing or unrecognized upstream data; replace them with throws so future Anthropic shape changes fail loud at ingest time instead of mislabeling accounts. claudeCodeTokenRequest previously caught JSON parse errors and smuggled the raw text under a synthetic '_nonJsonBody' key that downstream type checks ignore. Surface the parse failure directly with the response status and the raw-text excerpt; the original parse error rides on `cause`.
…inventing defaults The catalog mapper previously fell back to slug for a missing display_name and to 0 for missing context_window / max_context_window. A 0 is then indistinguishable from a real upstream value, and the silent slug substitution hides a contract change behind plausible-looking output. Throw on each missing field — mirroring the existing slug check — so an upstream shape change surfaces at the catalog refresh boundary rather than seeping into downstream pricing and limits. The fallback chain in codexRawToUpstreamModel still treats 0 as the "unset" sentinel for the per-request window, but the path is now rewritten with explicit comparisons so the null-vs-zero distinction is no longer wrapped in a ||-chain that conflated the two.
…explicitly
Round-1 cleanup: ClaudeCodeAccountCard previously fell back to raw.accounts[0]
when the configured account UUID did not match any state row, silently
showing credentials and quota for an unrelated account. Replace the
opportunistic find-or-fallback with a tagged CredentialLookup union
('present' | 'missing-state' | 'uuid-mismatch') and render an explicit
re-import prompt when the configured UUID is missing from state.
Timestamp parse failures returned the raw ISO string, which let malformed
upstream data render unchanged. Log a warning and substitute an
'(invalid timestamp)' sentinel instead so the operator can see the parse
failed in DevTools.
Replace the `value as Record<string, unknown>` cast in the catalog
asserter with an `isPlainRecord` type predicate; the call site no longer
asserts a shape the surrounding control flow has not already verified.
Drop the `?? {}` fallback in the callResponses / callResponsesCompact
boundary contexts. The provider contract types `headers` as
`Record<string, string> | undefined`, and spreading `undefined` is a
no-op — the fallback was defending against a case the type already
excludes.
… codex) A throw from readClaudeCodeUpstreamState means the row's state column was hand-edited or written by a buggier branch. Returning a 500 with the parse error stitched into the body leaks internal error structure to the dashboard; letting the throw propagate to the framework-level 500 handler stack-traces internally without leaking shape.
…erter contracts extractClaudeCodeCallbackParams previously fed any non-protocol-prefixed input to URLSearchParams as-is. A host-relative paste like `platform.claude.com/oauth/code/callback?code=…&state=…` would hit that path and surface a confusing "missing code" error because URLSearchParams treated the whole string as a single malformed key. Slice at the first `?` so the post-`?` segment parses as a query, and add a regression test. The state asserter previously accepted both `undefined` and `null` for `accessToken` / `quotaSnapshot` and the readClaudeCodeUpstreamState boundary normalized absent → null. Tighten the wire contract: the asserter now requires the keys to be present with explicit `null` (or a populated entry), so the boundary normalization becomes dead code and is dropped. Both writers (`auth/import.ts`, `access-token-cache.ts`) already always set both fields explicitly, so this matches the on-disk reality.
Drop four comments that paraphrase the immediately-following code: the CodexRawModel shape header repeats the field list, the codexRawToUpstreamModel describe-block preamble repeats what the four tests beneath it visibly do, and two provider-test comments restate a helper name and a `toHaveBeenCalledTimes(1)` assertion.
… drift The codex and claude-code refresh-now / OAuth-ingest handlers had diverged on three small dimensions: where the single-use PKCE-state guarantee was documented, whether the asserted state alias or the raw existing.state was passed to saveState, and what comments described the failure-state CAS write. Align both to a single voice and structure so future audits see one OAuth-ingest pattern instead of two near-identical ones. Also drop codex's manual 500-with-error-string wrapper around the state-shape assert. Mirroring claude-code (now letting the throw propagate), a malformed state column means the row was hand-edited; the framework-level 500 handler stack-traces internally without leaking the parse error to the dashboard.
…panels Round-1 cleanup: across UpstreamRow, UpstreamPicker, UpstreamConfigPanel, ModelInfoBar, UpstreamEditPage, and the new-upstream loader, the SPA carried defensive `??`/`||` fallbacks against fields the wire types already require. Drop the fakes and replace silent default arms with an exhaustive `assertNever` so a future widening of UpstreamProviderKind fails the build instead of silently routing through the amber/zinc catch-all. The new-upstream loader threw the loaded list/flags through `?? []` and silently rendered an empty page if the store failed to populate. Throw through the loader instead so the framework surfaces the error. UpstreamEditPage's autoForActive had a defensive `azure → []` early return that was unreachable in practice (the loader does not fetch for Azure, so `upstreamModels` is empty and the fall-through returns the same value); drop it. Add a shared `apps/web/src/utils/assert-never.ts` helper for the exhaustive checks.
…tion The Hono entry built a fresh Record<string, string> on every request via headersToRecord(c.req.raw.headers), then the consumer reconstructed a Headers instance (`new Headers(opts.clientRequestHeaders)`) for case- insensitive lookups. The detector reads four headers; building and re-wrapping the whole map per request was pure allocation churn on the hot path. Forward the runtime's own Headers instance through GatewayCtx and the UpstreamCallOptions wire so lookups stay native; the consumer's wrap still works because new Headers(...) accepts a Headers init. The wire type stays Headers | Record<string, string> so synthetic call sites in tests can keep passing a plain record literal. Also drop a stale citation in the GatewayCtx interface comment that referenced a Gemini WebSocket entry that does not exist (createGateway CtxForWs lives under responses/).
…uotaSnapshot state.ts previously typed `ClaudeCodeQuotaSnapshotEntry.data` as `unknown` and fetch.ts re-cast at the consumer (`as ClaudeCodeQuotaSnapshot | null`) plus non-null asserted afterward. Tighten the wire contract: data is now `ClaudeCodeQuotaSnapshot` directly, the asserter calls into a new `assertClaudeCodeQuotaSnapshot` in quota.ts that validates each field, and `isRateLimitedNow` becomes a type predicate so the resetIso line drops the `!`. Test fixtures that built the snapshot's `data` ad-hoc now use a `fullQuotaSnapshot` helper and a regression test pins the inner-shape rejection so future loose data from the wire fails loud.
…extraction
extractClaudeCodeCallbackParams throws on a malformed callback URL.
Previously the throw fell through the outer try/catch and surfaced
as an opaque 400, with the URL-parse step's role in the data flow
invisible from the read of the function. Wrap the call so the
malformed-URL early-return is visible alongside every other
explicit `{ ok: false }` exit in the helper.
… compute - New protocols/common/models_test.ts covers resolveEffectivePricing (override merge, strips `tiers`, unknown/null/absent tier behavior, null base) and the input_cache_write_1h → input_cache_write → input fallback chain. - New Messages stream usage tests cover the cache_creation per-TTL split (sub-object takes precedence over the rolled-up flat field), the fallback to the flat field when the sub-object is absent, and the speed=fast → tier=fast extraction (standard leaves tier unset). - New aggregate tests verify the per-tier override applies during cost compute (Opus 4.8 fast at 2× base costs twice as much) and that an unknown tier falls back to base pricing, plus pricing for the new input_cache_write_1h dimension via its dedicated rate. - Responses WebSocket usage extractor now reads service_tier from the response object (matching the HTTP path), since it ran a duplicate helper that had drifted before this change.
Add `tiers.flex` and `tiers.priority` overlays for every priced Codex slug so the dashboard's notional cost reflects which OpenAI service tier the request actually ran on. The gateway already captures `usage.service_tier` into `TokenUsage.tier`; this commit completes the loop by giving the cost compute a per-tier rate row to look up. Tier overrides match OpenAI's public pricing (verified 2026-06-19 against https://platform.openai.com/docs/pricing): gpt-5.5 flex $2.5/$0.25/$15 priority $12.5/$1.25/$75 gpt-5.4 flex $1.25/$0.13/$7.5 priority $5/$0.5/$30 gpt-5.4-mini flex $0.375/$0.0375/$2.25 priority $1.5/$0.15/$9 `codex-auto-review` shares `gpt-5.4`'s pricing including the tier overrides. Codex CLI's `/fast` toggle writes `service_tier: "priority"` on the wire (per openai/codex's `ServiceTier::Fast.request_value()`), so operator-facing rows tagged "fast" cost out at the priority row. Cache-write rate stays unset on these entries — OpenAI charges cache creation at the same rate as input, which `unitPriceForDimension`'s fallback chain already covers.
…provider # Conflicts: # apps/web/src/api/types.ts # apps/web/src/components/upstream-edit/UpstreamConfigPanel.vue # apps/web/src/components/upstream-edit/UpstreamEditPage.vue # apps/web/src/pages/dashboard/upstreams/new.vue # packages/gateway/src/control-plane/upstreams/routes.ts # packages/gateway/src/control-plane/upstreams/routes_test.ts # packages/gateway/src/data-plane/llm/responses/websocket.ts # packages/gateway/src/data-plane/llm/shared/gateway-ctx.ts # packages/gateway/src/data-plane/llm/shared/gateway-ctx_test.ts # packages/gateway/src/repo/sql.ts
Backport the Claude Code refresh-race recovery (commit f1efc9d) to Codex. Same architecture, same vulnerability: Under burst load, two workers can both observe a stale access token on the same Codex upstream and both attempt a refresh. OpenAI rotates the refresh_token on every successful /oauth/token call, so exactly one racer wins; the other's request is rejected with `invalid_grant` for trying to redeem the rotated-out copy. The previous flow treated every `invalid_grant` as a dead credential and let the caller flip the account to `refresh_failed` — destroying the working credential a sibling had just rotated, and forcing the operator to re-import. On `invalid_grant`, the access-token cache now re-reads upstream state for the same `chatgpt-account-id` slot and compares the refresh_token it tried against what is now stored. If they differ, a sibling rotated and we return their freshly-minted access token (the caller treats it as a normal cache hit and skips the terminal flip). If they match, we re-raise the original error so the data-plane / control-plane caller flips the row as before. The other refresh-terminal codes — `app_session_terminated`, `invalid_refresh_token`, `invalid_client`, `unauthorized_client`, `access_denied` — bypass recovery entirely; none of them are caused by a rotation race. `CodexOAuthSessionTerminatedError` now carries the raw OAuth `error` value as a `code` field alongside the existing `upstreamMessage` so the recovery branch can single out `invalid_grant` from the catch. `REFRESH_TERMINAL_OAUTH_CODES` is broadened to the audit-aligned set (`app_session_terminated`, `invalid_grant`, `invalid_refresh_token`, `invalid_client`, `unauthorized_client`, `access_denied`) — Codex is OpenAI OAuth, so the list matches sub2api's `isNonRetryableRefreshError` verbatim. Sub2api's tryRecoverFromRefreshRace (backend/internal/service/oauth_refresh_api.go:173-193) is the canonical pattern; we apply it to Codex's per-account credential here. The token rotation persistence hook stays awaited (matches Claude); the recovery branch reads from the just-persisted state via the upstream repo and returns the sibling's cached access token directly so no second mint fires from this call site.
…ope) Adds a third Claude Code credential class alongside the existing full-scope PKCE flow and credentials.json paste: the Setup Token, a long-lived (~1 year) inference-only bearer that Anthropic's "Create a Long-Lived Token" UI issues. Cross-checked against sub2api (BuildSetupTokenAuthorizeURL, ScopeInference, expires_in=31536000 on exchange) and claude-relay-service (SCOPES_SETUP, exchangeSetupTokenCode). Wire-shape differences from the regular OAuth flow: - Authorize URL: scope narrows to `user:inference` only (no org:create_api_key, no user:profile, ...). Host / client_id / redirect_uri / response_type are identical. - Token exchange: body adds expires_in=31536000 to request the 1-year bearer. Response has NO refresh_token — the access token IS the credential. - Profile endpoint: the bearer lacks user:profile scope, so /api/oauth/profile 403s. The existing degraded-identity fallback fires and surfaces a deterministic accountUuid + null email. State and lifecycle: - ClaudeCodeAccountCredential becomes a discriminated union on tokenKind: `oauth` carries a non-empty rotating refresh token; `setup-token` carries null. Old records (written before this field existed) are normalized to `oauth` on read so legacy state passes the strict asserter. - ensureClaudeCodeAccessToken short-circuits for setup-token: cached bearer is returned when fresh, or the row is flipped to refresh_failed with a "re-import to recover" message when expired. No upstream call. - The control-plane refresh-now route rejects setup-token credentials with a precise message rather than attempting a refresh that has no rotation counterpart. ClaudeOAuthTokenResponse.refresh_token becomes optional to reflect the setup-token shape; the OAuth-flow and refresh paths guard explicitly so shape drift surfaces a clear error instead of corrupting persisted state. Adds tests for: setup-token authorize URL scope, setup-token exchange body (expires_in=31536000), setup-token import end-to-end with degraded identity, setup-token cache short-circuit (fresh + expired paths), state asserter token-kind discrimination, and legacy default-fill.
Adds three parallel routes for the Setup-Token import flow alongside the existing OAuth ones: - POST /api/upstreams/claude-code-setup-token-pkce-start - POST /api/upstreams/claude-code-setup-token-import - POST /api/upstreams/:id/claude-code-setup-token-reimport The PKCE pending store is shared with the regular OAuth flow because only the authorize URL's `scope` differs server-side at Anthropic; a verifier issued by either start endpoint can be consumed by either import endpoint in principle, though the UI keeps them paired so the user-facing scope intent matches the persisted credential class. The setup-token import body has no `credentials_json` path (Anthropic's CLI never persists a setup token) — only the OAuth callback is accepted. The default display name falls back to the short account UUID when the degraded-identity path returns null email (the usual case). claude-code-refresh-now now rejects setup-token credentials with a precise 400 message; the existing route was previously typed against the old shape where `refreshToken` was always non-null. It also asserts the refresh-token presence on the upstream response so shape drift surfaces loudly instead of corrupting the persisted row. The control-plane serializer surfaces `tokenKind` on each account summary so the dashboard can label setup-token accounts distinctly. The SPA types declare the field as optional to tolerate the pre-deploy window before the gateway is upgraded. Adds tests covering: setup-token PKCE start, end-to-end setup-token import with degraded identity, refresh-now rejection on setup-token, and setup-token reimport replacing credentials in place.
Three-tab strip in the Claude Code import flow: - Sign in with Claude — existing full-scope PKCE OAuth. - Setup Token — new inference-only PKCE flow with a brief callout explaining the trade-off (cannot self-mint API keys, cannot be refreshed; safer for shared deployments). - Paste credentials.json — existing manual paste. Each PKCE-driven tab manages its own in-flight authorize URL session because the URL's `scope` is locked server-side; switching tabs lazy- fetches the URL for the newly active tab. The Account Card shows a "Setup Token" badge next to the subscription chip (violet tone, uppercase) so an operator scanning the dashboard immediately sees which credential class a row holds. The "Refresh token now" button is hidden for setup-token rows because that flow has no rotation counterpart — re-import is the only renewal path. Wire-shape: dashboards consume the new `tokenKind` field through the `ClaudeCodeAccountCredentialSummary` type declared in commit 2; the field is optional client-side so a dashboard running ahead of a deploy still parses pre-upgrade responses correctly.
…olate A cold-start fan-out of N concurrent requests against an empty access-token cache used to drive N parallel `/v1/oauth/token` POSTs. Anthropic rotates the refresh token on every call, so only one POST survives; the rest get `invalid_grant` and fall into `recoverFromRefreshRace` (commit f1efc9d) which re-reads state and returns the winner's freshly-rotated token. That is correct but burns N round-trips against the OAuth endpoint per cold start and adds the recovery re-read to every loser's latency. Add a module-level `Map<upstreamId, Promise<EnsuredAccessToken>>` that coalesces concurrent ensures on the same upstream into a single in-flight promise. Later callers await the existing promise and observe the first caller's result; the map entry is cleared on settle (both success and failure) so the next wave is free to mint when the cache expires again. Scope: per-isolate only. Cloudflare Workers isolates do not share memory, so siblings on other isolates still race; cross-isolate dedup remains the job of `recoverFromRefreshRace`. Sub2api gates the same path with a Redis SETNX lease over a 60s TTL window (gateway_service refresh path via `oauth_refresh_api.go:91-105`), making the whole cluster single-mint per refresh window. We trade that for zero coordination dependency at the cost of cross-isolate-only round-trip duplication — typically the rare case, since a single isolate usually serves multiple concurrent requests for the same credential. `freshlyMinted` semantics widen slightly: coalesced waiters now report `freshlyMinted: true` even though they did not drive the mint themselves (they share the minter's EnsuredAccessToken). The 401-retry path still behaves correctly — the slight semantic shift turns a previously-cheap "invalidate + re-mint on 401" branch into a "give up on 401" branch for coalesced peers, which is the safer of the two under a dying-credential scenario. Tests cover the 10-parallel-cold-start case (asserts exactly one fetch), the post-settle map cleanup case, and the rejected-mint propagation case (every waiter sees the same rejection, map cleared).
… boundary Anthropic /v1/messages requires `max_tokens` and 422s without it. Real CC also always emits `temperature: 1`; its absence is one of the CC-shape fingerprint failures the plan-billing detector keys on. Third-party clients (cline, aider, custom integrations) routinely omit one or both, expecting the gateway to fill defaults. Sub2api (gateway_service.go:1301-1314, rev 4a5665d) backfills both unconditionally: `max_tokens: 128000`, `temperature: 1`. We mirror the temperature default verbatim and pick a softer max_tokens policy — prefer the model's advertised `limits.max_output_tokens`, else fall back to the gateway-wide MESSAGES_FALLBACK_MAX_TOKENS (8192). Sub2api's hardcoded 128000 ignores per-model output caps and is not worth porting. The new interceptor sits at the head of `claudeCodeMessagesChain` so the rest of the re-mimicry steps (notably `inject-billing-block`, whose fingerprint depends on the post-chain wire shape) see a fully-formed payload.
…org 403 sentinels
Two credential-class body sentinels were leaking through as ordinary
4xxs, leaving the operator to chase phantom request errors while the
gateway burned an OAuth refresh on every retry against a permanently
dead account:
- 400 invalid_request_error / "organization has been disabled"
(sub2api ratelimit_service.go:208-214, CRS
claudeRelayService.js:_isOrganizationDisabledError)
- 403 permission_error / "OAuth authentication is currently not
allowed" (CRS claudeRelayService.js:_isOrganizationDisabledError)
Both signal a permanently disabled org that no retry, refresh, or
re-import can recover; the operator must contact Anthropic. The
match runs on the lowercased error.message substring against the
typed (invalid_request_error / permission_error) error shape.
Detection lives in performUpstreamCall's .then(response => ...) block,
alongside the existing quota-snapshot persist. The response body is
cloned before reading so the upstream response still streams to the
caller verbatim — passthrough discipline holds; the terminal flip is
purely additional dashboard signal (the same role persistTerminalState
plays inside access-token-cache's refresh-death path). The persist is
fire-and-forget via the same waitUntil shim the quota persist uses.
Body parse is deliberately defensive: any 400 / 403 whose body is not
JSON, or whose JSON shape does not match {error:{type,message}}, must
not trigger a terminal flip — those are the common case (max_tokens
validation, beta-feature gating, etc.). Skip-when-already-terminal
keeps the first terminal signal stable when a sibling write
(e.g. an OAuth refresh death) won the race.
…-stream `tokenUsageFromMessagesFrame` only returned the running usage figure on `message_stop`, leaving the per-frame observer's `state.rememberUsage` calls a no-op for the in-flight `message_start` and `message_delta` frames. When the downstream client aborted mid-stream — every Ctrl-C in the Claude Code CLI — the streaming finally block then recorded a null or stale usage, so billing telemetry under-counted every aborted session even though Anthropic already metered the output tokens against the operator's plan window. Return the running snapshot on `message_start` and `message_delta` too so `state.usage` checkpoints as the stream progresses; the finally block already records whatever was last observed when the client disconnects. Audited against sub2api (drains upstream to capture final usage even on client disconnect) — different remedy, same intent: keep recorded usage aligned with what the upstream actually metered.
A stuck upstream SSE read otherwise pins the request open until the underlying fetch read timeout (Workers HTTP cap ~100s) or the client disconnects — neither of which Hono's keepalive scheme catches, because the downstream ping timer fires whether or not upstream sent anything. Add a generic `withIdleTimeout` wrapper that races every iterator next() against a setTimeout, invokes an `onTimeout` callback (to abort the upstream fetch via the gateway's downstream AbortController), and throws `UpstreamIdleTimeoutError`. Each received frame resets the window; Anthropic's own `event: ping` survives parse-sse as a real frame, so the 30s upstream ping cadence keeps healthy streams alive forever. Wire it into the Messages stream path with a 60s window — 2× headroom above Anthropic's 30s ping cadence, comfortably below the Workers HTTP timeout. The thrown error flows through the existing `messagesSseFrames` catch, which emits a synthetic Messages `event: error` so downstream clients see a clean failure instead of a hung socket. Audited against sub2api's `StreamDataIntervalTimeout` (config-driven 30-300s, default off, synthetic error frame on fire); we default on because a stuck upstream cannot pin a request indefinitely.
…e breakpoint cap
Anthropic's prompt-caching API rejects requests that carry more than
four `cache_control` markers ("the API enforces a maximum of 4 cache
points per request",
https://platform.claude.com/docs/en/build-with-claude/prompt-caching).
The budget is shared across system blocks, tools, and message content.
`inject-default-template` previously appended DEFAULT_TEMPLATE_BLOCK at
`system[2]` with its `cache_control: { type: 'ephemeral', ttl: '5m' }`
unconditionally. Callers that already carry four breakpoints across
their tools / system / messages were therefore pushed to five on the
wire and the request 400'd upstream.
Count caller breakpoints before injection: if adding ours would exceed
the cap, inject the template text without `cache_control` (the three-
block system shape that real CC ships still needs to be there for the
Anthropic plan detector; only the cache anchor is dropped). Caller
overage past the cap is left for the caller to fix — we never synthesize
an error for it.
Mirrors sub2api's `maxCacheControlBlocks = 4` and the count-and-demote
pass in `enforceCacheControlLimit`
(`backend/internal/service/gateway_service.go:71,4581`).
…flake The control-plane route tests use `setupAppTest` to spin up the full Hono app + memory D1 mock + admin session per test. Real work is hundreds of milliseconds, but under workspace-parallel load (workers contending for CPU, GC pauses) several of them push past vitest's 10s default and flake intermittently. 30s gives headroom without masking actual hangs. Also folds in eslint --fix for the stylistic/function-paren-newline violations the access-token-cache_test agent left.
c69e72e to
ef2061f
Compare
…ignals Per the deep observability audit (sub2api parity gap 8 + gaps 1/2/4), the provider previously emitted three free-form `console.warn`/`console.info` lines for the entire surface — refresh-race recovery, quota-persist failure, terminal-sentinel persist failure, identity-degraded fallback, unknown organization_type — with no event name discipline, no field discipline, and zero signal for the moments operators most need to grep for: refresh-token rotation, quota-status transitions, terminal flips. Add a small `log.ts` helper (snake_case event name + KV fields, slog-shaped). KV chosen over JSON-line so the lines stay human-greppable on a worker tail without a JSON pretty-printer; both runtimes' `console.*` flush them verbatim. Format: `event_name field=value field2="quoted with spaces"`; numbers and booleans bare; null/undefined emitted as the literal `null` so the operator sees the field was considered. All event names prefixed `claude_code_` so a single grep separates them from anything else in the worker log. Converts every existing console.* in the provider to a structured event and adds the four high-value signals that were previously silent: - `claude_code_refresh_token_rotated` — emitted by the access-token cache on every successful refresh round-trip (was silent; only observable via after-the-fact `accessToken.refreshedAt` diffs). - `claude_code_refresh_race_recovered` — replaces the free-form console.info; now carries `upstream_id`, `account_uuid`, and the first 6 chars of the rotated refresh token. - `claude_code_account_state_flip` — emitted on every persistTerminalState call (access-token cache: setup-token expiry, OAuth refresh death) and on the terminal-sentinel persist in fetch.ts. Carries `from_state`, `to_state`, `reason`, `oauth_code`/`upstream_status`, and the operator-facing message. Previously the flip was silent — the only signal was a 503 surfacing to the client. - `claude_code_quota_state_transition` — emitted from persistQuotaSnapshot only when the persisted status differs from the prior snapshot, so the line count tracks transitions (allowed→rejected, rejected→allowed, initial population) rather than every request. Previously, the only way to observe the moment of rate-limit onset was to diff `state_json.quotaSnapshot.fetchedAt`. - `claude_code_terminal_sentinel_detected` — emitted when the org-disabled-400 or banned-org-403 body sentinel matches, separately from the subsequent state-flip event so the detection signal is visible even if the persist races and loses. Plus housekeeping conversions: - `claude_code_quota_persist_failed`, `claude_code_terminal_sentinel_persist_failed` replace the two warn-on-rejection paths in fetch.ts. - `claude_code_identity_degraded_fallback` and `claude_code_unknown_organization_type` replace the two console.warn calls in auth/identity.ts. The per-request "detection fallback" event (gap 9 in the audit) is out of scope here — it would require either accepting a high-volume per- request log line or adding a runtime env-getter dependency just for an opt-in flag. Left for a follow-up. Includes log_test.ts covering format, whitespace quoting, null/undefined handling, and the three log-level routes.
…e operator The control-plane claudeCodeRefreshNow handler does its own /v1/oauth/token mint + CAS-write, parallel to the data-plane access-token cache. Under a narrow interleaving, a data-plane request rotates the refresh token between our read and our mint, the upstream answers our (now stale) RT with invalid_grant, and the handler flipped the row to refresh_failed and returned a 400 "Re-import the credential to recover" toast — even though the credential was alive and freshly rotated by the sibling. Mirror the data-plane recovery from access-token-cache.ts (commit f1efc9d): on invalid_grant (or a losing CAS), re-read the row. If a sibling rotated successfully (state still active, refresh token differs, fresh access token in cache), surface 200 with the rotated state so the dashboard shows the refresh as the no-op success it actually was. If the row is unchanged or now terminal, fall through to the existing terminal- flip path. Bounded by a `recoveryAllowed` flag for one recovery attempt per request, matching the data-plane depth guard. Audit reference: sdd/sub2api-deep-concurrency.md axis #10.
This was referenced Jun 19, 2026
Open
1 task
a547c70 to
94d5363
Compare
Per the deep admin-routes audit (sub2api-deep-admin-routes.md item #5) plus the dedicated quota-probe decision (sub2api-deep-quota-probe-decision.md), real @anthropic-ai/claude-code@2.1.181 calls GET https://api.anthropic.com/api/oauth/usage as a standalone request to read the unified rate-limit snapshot without burning a model call. The binary's string table carries the literal "fetchUtilization: GET /api/oauth/usage" plus a response decoder that consumes {five_hour, seven_day, seven_day_sonnet, ...} each as {utilization, resets_at} plus optional overage fields. Mirror this in the gateway as POST /api/upstreams/:id/probe-quota: - Add fetchClaudeCodeUsageProbe(accessToken, fetcher) helper in packages/provider-claude-code/src/auth/usage-probe.ts. Sends the same oauth-2025-04-20 anthropic-beta header sub2api pins (without it the upstream 401s even on a valid bearer). Returns body verbatim — we do NOT shape-check the inner JSON because Anthropic adds fields (priorIsUsingOverage, hadPriorUtilizationData, ...) without warning, and a strict parser would reject a perfectly usable new field. - Add control-plane route claudeCodeProbeQuota in packages/gateway/src/control-plane/upstreams/routes.ts. Mints the access token via ensureClaudeCodeAccessToken (the same code path the data plane uses, so refresh races and terminal flips are handled identically). Routes through the per-upstream proxy fetcher so the probe lands through the same chain as data-plane requests. Returns the upstream body spread at the top level + a gateway-stamped fetched_at ISO timestamp. - Persist the probe response into a new ClaudeCodeAccountCredential field `usageProbeSnapshot` so the dashboard's quota card sees the freshest snapshot without re-probing. Distinct slot from the existing `quotaSnapshot` (which is header-derived from the data plane) because the wire shapes are completely different — header-derived ClaudeCodeQuotaSnapshot has a fixed schema, the live probe is free-form JSON that evolves on Anthropic's schedule. - state.ts adds the new slot with a legacy-default normalization in readClaudeCodeUpstreamState so records written before this commit still pass the strict-key gate; the asserter requires an explicit `null` from new writes. State persistence is best-effort: a CAS loss to a concurrent rotation is fine because the live response already returned to the operator regardless. - Audit log: emit the brief's `claude_code_admin_action` structured event (action: 'quota_probe') on both success and error. The event reuses the provider's existing slog-shaped logger (log.ts is now re-exported from the package index so the gateway can emit the same KV-formatted lines as the provider). Non-claude-code upstreams 404; the probe semantics are claude-code-specific.
The probe-quota route (added in the prior commit) lets an operator pull a live /api/oauth/usage snapshot without burning a model call. Surface it as a Refresh button on the account card and reconcile the two snapshot sources the credential now carries: - quotaSnapshot is header-derived; written by the data plane on every /v1/messages response. - usageProbeSnapshot is the verbatim probe body; written by the new operator-driven route. The two cover the same 5h/7d windows under different field names and shapes. Render whichever source is newer per window (probe right after the operator hits Refresh; headers right after any real model call). The probe-only 7-day Sonnet window shows up only when the probe has run. Each window chip labels its source so the operator knows which axis the percentage came from. Plumbing: ClaudeCodeAccountCard emits `refresh-quota`; ClaudeCodeConfigPanel handles the POST and emits `quota-refreshed` with a locally-merged record (the route returns the upstream body verbatim, not the persisted summary, so we mirror the gateway's persistence shape on the client). The new event is distinct from `imported` because the existing handler navigates / re-keys the loader, which is the wrong response for a non-mutating data refresh. UpstreamEditPage lands the updated record on `liveRecord` so the card re-renders without a list reload, and the panel's `:record` binding switches from `props.record` to `liveRecord` so quota-refresh mutations actually reach the card. Raw probe extras get their own disclosure for fields outside the three known windows (priorIsUsingOverage, hadPriorUtilizationData, ...), so a schema addition on Anthropic's side surfaces without a dashboard change.
94d5363 to
739434b
Compare
Per the "杜绝假鲁棒" rule: defensive checks that the type system already
guarantees, or that the only call site contractually rules out, are dead
weight and should be removed so real failures stay loud.
- detection.ts: the inbound-pathname short-circuit (`if (input.pathname !==
'/v1/messages') return true;`) is unreachable. `callMessages` is the only
caller and the catalog declares `endpoints: { messages: {} }`; every other
surface returns a synthetic 405 before reaching the detector. Drop the
branch, the `pathname` field on `ClaudeCodeShapedRequestInput`, and the
`clientRequestPathname` plumbing in `provider.ts:looksShaped`.
- access-token-cache.ts: the `if (account.refreshToken === null) throw …`
check inside the oauth branch is dead. The `setup-token` early-return
above narrows TS to the `oauth` arm of the discriminated union, whose
`refreshToken` is typed `string`. The state asserter enforces it on read.
- ClaudeCodeAccountCard.vue: `typeof raw !== 'object'` is dead — `raw` is
typed `Record<string, string>` from the schema, so the `!raw` guard
already covers the only possible falsy case (the `quota.value?.raw`
optional chain returning undefined).
Tests updated to drop the obsolete `pathname` / `clientRequestPathname`
inputs and the two short-circuit cases that exercised the deleted branch.
…provider # Conflicts: # apps/web/src/api/types.ts # apps/web/src/components/models/ModelInfoBar.vue # apps/web/src/components/settings/UpstreamRow.vue # apps/web/src/components/upstream-edit/CodexConfigPanel.vue # apps/web/src/components/upstream-edit/CopilotConfigPanel.vue # apps/web/src/components/upstream-edit/CopilotDeviceFlow.vue # apps/web/src/components/upstream-edit/ProviderPicker.vue # apps/web/src/components/upstream-edit/UpstreamConfigPanel.vue # apps/web/src/components/upstream-edit/UpstreamEditPage.vue # apps/web/src/components/upstreams/UpstreamPicker.vue # packages/gateway/src/control-plane/auth/github-device-flow.ts # packages/gateway/src/control-plane/data-transfer/routes.ts # packages/gateway/src/control-plane/schemas.ts # packages/gateway/src/control-plane/upstreams/proxy-resolution.ts # packages/gateway/src/control-plane/upstreams/routes.ts # packages/gateway/src/control-plane/upstreams/routes_test.ts # packages/gateway/src/data-plane/providers/registry.ts # packages/gateway/src/repo/sql.ts # packages/provider/src/model.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
New
provider-claude-codepackage — Claude.ai subscription as a gateway upstream. PKCE OAuth + credentials.json import, refresh-token rotation with concurrent-race recovery, plan-billing detection + re-mimicry, dynamic/v1/messages?beta=truecatalog discovery, dashboard surface (provider picker, import flow, account card with quota windows). Wire shape verified byte-for-byte againstWei-Shaw/sub2apiand a fresh capture of real@anthropic-ai/claude-code@2.1.181cli.js (sub2api itself only pins 2.1.161; we close the post-2.1.161 release-window gap onmid-conversation-system-2026-04-07).Cross-provider tier-aware pricing schema —
BillingDimensiongainsinput_cache_write_1hfor Anthropic'sextended-cache-ttl-2025-04-11;ModelPricinggains atiers?overlay;TokenUsage.tiercarries the request's actual tier extracted from response (usage.speedfor Anthropic fast mode,usage.service_tierfor OpenAI flex/priority). Cost compute resolves effective pricing throughresolveEffectivePricing(pricing, tier)so a single model billed at multiple tiers in the same hour aggregates as separate buckets with distinct unit prices. Pricing tables populated for Claude Opus 4.6/4.7/4.8 fast mode and every priced Codex slug's flex + priority overlays. Migration0034_usage_per_ttl_and_tier.sqladds the column + widens the dimension CHECK list and bucket-identity unique index — backfillstier = NULLso historical aggregations compute identically.Gateway plumbing improvements — anthropic-ratelimit-*, request-id, cf-ray forwarded to downstream clients on every LLM surface (Messages / Chat / Responses / Gemini);
UpstreamCallOptions.waitUntilplumbed so workerd doesn't cancel post-response persists (quota snapshot, token rotation);clientRequestHeaders+clientRequestPathnamethreaded through so providers can inspect inbound shape; control-plane proxy_fallback_list override applied pre-save (refresh-now, OAuth import, copilot poll) so dashboard operations honor unsaved proxy edits.Test plan
pnpm run typecheck— clean across all packagespnpm run test— 2974 tests pass across 268 filespnpm run lint— cleaninterleaved-thinkingonly matters in multi-turn agent loops, not single requests)59cf53e54c78+ indices[4, 7, 20]+ sha256 + slice(0, 3) algorithm unchanged from v2.1.10 → v2.1.181fine-grained-tool-streaming-2025-05-14removed from v2.1.181 OAuth-mimicry path (release notes: "GA on 4.6")mid-conversation-system-2026-04-07shipped;redact-thinking-2026-02-12+summarize-connector-text-2026-03-13deliberately omitted with rationale)pnpm run db:migrate:remote && pnpm run deploy— migration0034_usage_per_ttl_and_tier.sqlrebuildsusage+usage_requeststables (CHECK constraints can't be ALTERed in SQLite); standard SQLite "create new + copy + rename" pattern, data-preservingOutstanding / deferred
/v1/messagesendpoint per-TTL cache split is unknown — both Copilot accounts in production D1 have Claude disabled by enterprise policy; can verify when an account with Claude access is addedspeed: standardin usage despitefast-mode-2026-02-01inanthropic-beta); body field name TBC, captured by extractor when it does fire