feat(provider): Claude Code subscription provider by Menci · Pull Request #60 · Menci/Floway

Menci · 2026-06-19T11:32:01Z

Summary

New provider-claude-code package — Claude.ai subscription as a gateway upstream. PKCE OAuth + credentials.json import, refresh-token rotation with concurrent-race recovery, plan-billing detection + re-mimicry, dynamic /v1/messages?beta=true catalog discovery, dashboard surface (provider picker, import flow, account card with quota windows). Wire shape verified byte-for-byte against Wei-Shaw/sub2api and a fresh capture of real @anthropic-ai/claude-code@2.1.181 cli.js (sub2api itself only pins 2.1.161; we close the post-2.1.161 release-window gap on mid-conversation-system-2026-04-07).
Cross-provider tier-aware pricing schema — BillingDimension gains input_cache_write_1h for Anthropic's extended-cache-ttl-2025-04-11; ModelPricing gains a tiers? overlay; TokenUsage.tier carries the request's actual tier extracted from response (usage.speed for Anthropic fast mode, usage.service_tier for OpenAI flex/priority). Cost compute resolves effective pricing through resolveEffectivePricing(pricing, tier) so a single model billed at multiple tiers in the same hour aggregates as separate buckets with distinct unit prices. Pricing tables populated for Claude Opus 4.6/4.7/4.8 fast mode and every priced Codex slug's flex + priority overlays. Migration 0034_usage_per_ttl_and_tier.sql adds the column + widens the dimension CHECK list and bucket-identity unique index — backfills tier = NULL so historical aggregations compute identically.
Gateway plumbing improvements — anthropic-ratelimit-*, request-id, cf-ray forwarded to downstream clients on every LLM surface (Messages / Chat / Responses / Gemini); UpstreamCallOptions.waitUntil plumbed so workerd doesn't cancel post-response persists (quota snapshot, token rotation); clientRequestHeaders + clientRequestPathname threaded through so providers can inspect inbound shape; control-plane proxy_fallback_list override applied pre-save (refresh-now, OAuth import, copilot poll) so dashboard operations honor unsaved proxy edits.

Test plan

Outstanding / deferred

Copilot's Anthropic-compatible /v1/messages endpoint per-TTL cache split is unknown — both Copilot accounts in production D1 have Claude disabled by enterprise policy; can verify when an account with Claude access is added
Anthropic fast-mode body activation mechanism — beta header alone doesn't activate (speed: standard in usage despite fast-mode-2026-02-01 in anthropic-beta); body field name TBC, captured by extractor when it does fire

Lays the foundation for a fifth UpstreamProviderKind. After this commit the workspace recognizes 'claude-code' as a valid upstream kind, the @floway-dev/provider-claude-code package exists with config + state asserters mirroring the codex provider's pattern, and the database schema carries a CHECK-constraint extension plus a claude_code_pkce_pending table for the OAuth handoff that lands in a follow-up commit. The data-plane factory entry is a placeholder thrower; the dashboard serializer redacts the new shape (refreshToken stays in state, surface the access-token expiry and quota snapshot summary). Both control-plane import routes and the data-plane fetch path are deliberately not wired here — those land alongside the OAuth helpers and the messages interceptors in subsequent commits.

PKCE S256 generator (32-byte verifier; produces 43-char base64url verifier+challenge pair matching the real claude CLI). Authorize-URL builder includes the literal `code=true` query param the CLI emits. Token exchange + refresh against platform.claude.com/v1/oauth/token use the JSON body shape (not form-urlencoded) and pin User-Agent: axios/1.13.6 so the wire shape is indistinguishable from a real CLI request. Both endpoints map `invalid_grant` and `app_session_terminated` to a dedicated ClaudeCodeOAuthSessionTerminatedError so the request path can surface session-termination separately from a transient upstream message. Identity is derived from `GET api.anthropic.com/api/oauth/profile` — the profile endpoint lives on api.anthropic.com (not the OAuth host), nests identity under `account` and `organization`, and exposes a granular `rate_limit_tier` we combine with `organization_type` to produce a canonical `subscriptionType` string (`pro`, `max_5x`, `max_20x`, `team`, `enterprise`) — the same shape the CLI persists in its on-disk credentials.json. Two ingestion paths: - OAuth callback: exchange `code` against `/v1/oauth/token`, then fetch profile and build {config, state}. The profile call routes through the per-upstream fetcher (proxy-aware); the token exchange uses direct egress because at exchange time we don't have an upstream yet. - credentials.json paste: parse `claudeAiOauth` wrapper, validate `expiresAt` is milliseconds (rejects suspected seconds-mode files), fetch profile but prefer the CLI's persisted `subscriptionType` verbatim when present. The JSON's existing access token is reused for the cached entry so the first request doesn't need a refresh. Access-token cache mirrors codex's CAS semantics: - 5-minute skew window — anything within 5 min of expiry counts as stale to avoid racing the upstream clock. - Refresh-token rotation MUST land via CAS. A CAS miss is fatal because our fresh response holds a token a sibling rotation has already superseded; surfacing the error lets the next request re-read the just-persisted state. - Terminal session-terminated errors flip `state` to `refresh_failed`, clear the cached access token, persist via CAS (best-effort — sibling may have already flipped), and rethrow. - `invalidateClaudeCodeAccessToken` clears the cached slot without touching the refresh token (for 401-retry). eslint.config.ts: registered the package in projectList (missed in the prior scaffolding commit). Test coverage: 73 tests across pkce, oauth, import, access-token-cache.

…-policy comment

…parser Adds the static-mimicry surface, request-detection predicate, and the `anthropic-ratelimit-unified-*` response-header parser for the Claude Code subscription provider. No fetch path is wired yet — that lands in the next change. - `headers.ts` pins the v2.1.181 header set Anthropic's third-party detector keys on, with separate Sonnet/Opus and Haiku `anthropic-beta` profiles selected via `pickClaudeCodeHeaders(modelId)`. - `system-blocks.ts` ships the three blocks we drop on the re-mimicry path: identity prefix, default-template boilerplate (`cache_control: ephemeral`), and `buildBillingBlock(version, fp)` which emits the literal `cch=00000` placeholder. `computeCcVersionFingerprint(version, body)` produces the 3-hex `cc_version` suffix via SHA-256 over salt + UTF-8 bytes[4,7,20] of the first user-message text + version, padding with `'0'` (0x30) past end-of-string. Salt and indices ported from sub2api's `gateway_billing_block.go`, which itself ports from Parrot's `cc_mimicry.py` reverse-engineered from CC packet captures. - `detection.ts` mirrors sub2api's `claude_code_validator.go` predicate: UA regex + non-messages/count_tokens/Haiku-probe short-circuits + a strict gate (x-app, anthropic-beta, anthropic-version, valid metadata.user_id in legacy or JSON form, plus a system-block billing fast path or Dice >= 0.5 against six identity templates). Dropping the Dice fallback false-negatives pre-v2.1.36 CC traffic — 2026-06-19 probes confirmed real old CC still bills correctly on the plan tier. - `quota.ts` parses the `unified-*` family captured live on 2026-06-19 into a typed snapshot with nested 5h/7d windows and an overage block, plus a raw passthrough map of every `anthropic-ratelimit-*` header for dashboard display. Missing fields read as `null` rather than throw. Default-template text extracted from `@anthropic-ai/claude-code@2.1.10`'s plain `cli.js` and resolved to a single string (the pinned v2.1.181 release ships a Bun-compiled native binary that resists clean string extraction; the spec records this fallback as acceptable). Tool-name placeholders inside the template substituted with the canonical CC tool identifiers (Bash, TodoWrite, Task, Read, Edit, Write, Grep, AskUserQuestion, WebFetch, general-purpose). Pinning gets bumped together with the CC version. Dep added: `@noble/hashes@^2.2.0` (already in the workspace via packages proxy and http). 147 new tests; full workspace test suite green (2829 tests).

Wires the Claude Code data plane on top of Task 2's OAuth + Task 3's detection/system-blocks/quota helpers. - Six-step messages interceptor chain (hoist-user-system / inject-billing / inject-identity / inject-default-template / pin-dated-model-id / synthesize-metadata-user-id) shaping non-CC traffic into the exact CC wire shape before reaching Anthropic. - callClaudeCodeMessages: pre-fetch gates (state, rate-limit), access-token cache, 401 invalidate-and-retry-once with terminal-state surfacing, best- effort quota persistence on 2xx and 429, recordUpstreamLatency contract observed on every code path including synthetic gates. - Provider factory advertises a static catalog (sonnet / opus / haiku dated ids) with current console.anthropic.com pricing baked in; messages is the only endpoint exposed, every other call surface throws not-implemented. - Detection runs at the boundary so shaped CC traffic bypasses the re-mimicry chain entirely (operator's own CC client passes through with Authorization swap only). 193 tests across the package; workspace typecheck + lint clean.

…CallOptions Add optional `clientRequestHeaders` and `clientRequestPathname` to `UpstreamCallOptions` and capture them once on GatewayCtx from the inbound Hono request. Each `attempt.ts` consumes them via a new `buildUpstreamCallOptions(candidate, ctx, recorder)` helper so the build is uniform across messages / responses / chat-completions and across the generate / count-tokens / compact entry points. Only the claude-code provider reads the new fields today — its shape detector keys on the inbound UA + pathname + system block to decide whether the request is already real Claude Code traffic that can pass through unmodified. Other providers receive the fields and ignore them. Synthetic ctxs (in-process tests, translation chains that never originated as a real HTTP request) leave both fields undefined; the claude-code detector treats that as "not CC-shaped" so the re-mimicry chain runs.

The Claude Code subscription endpoint reads the `x-anthropic-billing-header: …` block in the system prompt to bill the request against the user's plan tier — that block has to reach Anthropic intact. Every other upstream sees the same block as ordinary prompt text, and the per-turn `cch=` hash inside it kills prompt-cache hits on each turn even when the real conversation prefix hasn't changed. Add a `strip-billing-attribution` flag (default-on for copilot / azure / custom, default-off for claude-code) and gate the interceptor on it. The file is rewritten from scratch — the top-comment now explains both purposes and cites the flag id. Update strip-billing-attribution_test.ts to cover the new flag-off path (billing block survives intact when flag is off) alongside the existing strip-when-on assertions, and add gateway-level integration tests in serve_test.ts that pin the routing-side guarantee for both directions: claude-code preserves the block, copilot strips it. Also update flags_test.ts to enumerate the new defaults.

Replace the placeholder thrower with the real createClaudeCodeProvider factory and declare @floway-dev/provider-claude-code as a gateway dependency. The dispatch path that throws for unknown provider kinds now resolves to the real factory for claude-code rows. Compute effective per-binding flags in createClaudeCodeProvider — the factory layers `record.flagOverrides` on top of the catalog defaults for the kind and stamps the resulting set onto every model the catalog advertises. Without this, the flag-gated interceptor would see an empty `binding.enabledFlags` and the `strip-billing-attribution` default-off semantics for claude-code would be indistinguishable from flag-on behavior on a copilot binding. In the provider's callMessages, switch the shaped-request detection to read inbound `clientRequestHeaders` and `clientRequestPathname` from the per-call UpstreamCallOptions. Previously detection ran against the outbound wire-header bag (always empty for unshaped traffic), so the shaped-passthrough path was effectively dead in real traffic — a real Claude Code session would always have been re-mimicked. Update the data-transfer routes to accept claude-code as a valid upstream provider kind during backup/restore and round-trip ClaudeCodeUpstreamState through the runtime's shape assertion (codex already does this; claude-code now does too). Without this, an operator who ran `pnpm run export` on a deployment with a claude-code upstream would have been unable to import the dump back.

…refresh Adds four admin endpoints mirroring the codex naming: - POST /api/upstreams/claude-code-pkce-start - POST /api/upstreams/claude-code-import - POST /api/upstreams/:id/claude-code-reimport - POST /api/upstreams/:id/claude-code-refresh-now The import path accepts either a verbatim ~/.claude/.credentials.json paste or a PKCE callback (full redirect URL or just code+state). The verifier is stashed in a new claude_code_pkce_pending table whose schema was added in 0033 and whose repo wiring (memory + SQL) and scheduled sweep land here. The generic POST /api/upstreams and PATCH /api/upstreams/:id now route claude-code through the dedicated endpoints — config edits return 400 with the canonical "use claude-code-reimport" message, mirroring how codex is handled. State serialization (already in serialize.ts) exposes a slimmed summary without the refresh token; the routes themselves never leak it. Refresh-now CAS-rotates the refresh_token via the per-upstream proxy-aware fetcher, flips the row to refresh_failed on session-terminated, and returns 502 (not 401) so the dashboard's auth client doesn't log the operator out on a dead credential.

…count card Adds the operator-facing UI surface for the claude-code provider, mirroring the codex panels: - ProviderPicker grows a Claude Code tile (rose tone) so the create-mode upstream editor offers the provider. - ClaudeCodeConfigPanel orchestrates the PKCE + credentials.json paste flow, threaded through the typed Hono RPC client. The OAuth tab opens the authorize URL, accepts either the full redirect URL or the ?code=...&state=... fragment, and posts to claude-code-import; the credentials.json tab POSTs the raw paste verbatim. Re-import and Refresh-now buttons appear on edit mode. - ClaudeCodeAccountCard renders the identity + state badge + structured 5h / 7d / overage quota slices, plus access-token expiry as a relative time. The raw header map collapses under a debug disclosure so the detail page covers the operator's `claude-code-reverse`-style poke without leaving the dashboard. - UpstreamRow, ModelInfoBar, UpstreamPicker, and the route-loader guard in /dashboard/upstreams/new.vue learn the new provider kind. - api/types.ts gains the slimmed ClaudeCodeUpstreamState / quota snapshot shapes — the wire shape mirrors what serialize.ts already emits (refresh token never surfaced, only refreshTokenSet boolean).

… add credentials.json safety warning

…ead defaults, header-casing defense) Six small fixes from the final-review pass on the Claude Code provider branch: - system-blocks_test.ts: correct the byte-index annotations for 'hello world this is a test prompt' — bytes[4]='o', bytes[7]='o' (the second 'o' in "world"), bytes[20]='a'. The previous comment claimed 'r' at index 7 and 't' at index 20. Test passes either way (assertion is on hash output), but the comment misled readers. - fetch.ts persistQuotaSnapshot: drop the dishonest `as unknown as Record<string, unknown>` double-cast; the `data` field is typed `unknown`, so plain assignment is enough and correct. - provider.ts UpstreamCallOptions: document the lowercase-key invariant on `clientRequestHeaders`. The record is built via `headersToRecord` in gateway-ctx.ts, which uses `Headers.forEach` and yields lowercase keys per the WHATWG Fetch spec. - fetch.ts shaped passthrough: defensively `delete` both `authorization` and `Authorization` from the spread before adding our bearer token. The lowercase invariant should hold; the uppercase delete is one-line insurance, not a real expectation. - fetch.ts shaped passthrough: extend the `stream: true` coercion comment to record why it is safe in the shaped path — shaped detection requires CC client headers + system blocks + a valid metadata.user_id, and the real Claude Code client always sets `stream: true`. - ClaudeCodeAccountCard.vue: drop dead defensive defaults. The asserter proves `accountUuid` and `email` are non-empty strings on a non-null account, so `?? ''` and `?? 'Claude Code account'` are unreachable. Also drop the `id.length <= 18` short-circuit on `accountIdShort`: canonical UUIDs are always 36 chars, so the branch is dead.

…lable' literal Background. Anthropic returns `anthropic-ratelimit-unified-fallback: available` on every successful /v1/messages response. Claude Code v2.1.10's parser (`HoB` in cli.js, mirrored in extractUnifiedRateLimitInfo.js of the reverse-engineered source) reads it via `=== "available"`. Our parser accepted only the magic strings `y` / `yes` / `true`, so the parsed value was always null and the dashboard's "fallback active" badge never fired. The semantic also flips. The header reports whether a degraded-mode fallback service is reachable, not whether it is currently in use: `available` is the steady-state good signal, anything else is a warning that the fallback path is gone too. Renaming the field to `fallbackAvailable` reflects that, and the dashboard now surfaces a warning badge only when the gateway has lost the fallback signal. Adjustments. quota.ts switches the parse to literal-string equality against `available`; the snapshot field becomes `fallbackAvailable`. The SPA's `ClaudeCodeQuotaSnapshotData` type and the account card mirror the rename. Tests cover the steady-state `available` → true, non-`available` values → false, and absent header → null cases.

The codex catalog mapper hardcoded `enabledFlags: new Set<string>()` on every UpstreamModel it produced, so flagOverrides configured on a codex upstream survived the schema validation and SQL round-trip but were silently dropped before any interceptor could read them. The fix mirrors the claude-code provider's pattern: resolve the upstream's effective flag set once at provider construction with `resolveEffectiveFlags(defaultsForProvider('codex'), [record.flagOverrides])`, then thread it into `codexRawToUpstreamModel` so every emitted model carries the resolved set. A dedicated test asserts that turning `responses-web-search-shim` on at the upstream layer reaches the produced models.

… fallback) The block was tagged "pinned to @anthropic-ai/claude-code@2.1.181" but its body was the v2.1.10 cli.js text — v2.1.181 ships as a Bun-compiled binary with no plain JS source, and an earlier extraction fell back to v2.1.10 rather than reading the actual v2.1.181 wire shape. Captured v2.1.181's real `system[2]` by pointing the binary at a local 401 echo (ANTHROPIC_BASE_URL trick) and reading back the assembled prompt. v2.1.181 builds the prompt from many conditional sub-templates; we keep the always-on core (intro + URL warning + # System + # Doing tasks + # Executing actions with care + # Using your tools + # Tone and style) and drop the per-session # Environment / # Session-specific guidance / # Context management / # Text output paragraphs that depend on cwd, model, output style, and experiment flags. The captured `# Using your tools` carried the SDK-mode TaskCreate planner; cli mode falls through to TodoWrite (igm does [TaskCreate, TodoWrite].find), which is the token we ship. Detection's IDENTITY_TEMPLATES gains the v2.1.181 "interactive agent" prefix alongside the older "interactive CLI tool" string so Dice- template fallback recognises real CC traffic on either side of the 2.1.181 boundary.

Drop the # System / # Doing tasks / # Executing actions with care / # Using your tools sections from the v2.1.181 wire shape and keep only the opener line, the two IMPORTANT: safety lines, and the # Tone and style bullets — the same trimmed expansion sub2api ships in backend/internal/service/gateway_service.go. Those four sections are CC-agent-action instructions ("prefer dedicated tools over Bash", "use TodoWrite to plan", "default to writing no comments") that would inappropriately steer the model when a non-CC downstream is on the other end of the re-mimicry path. Identity block + safety/URL pair + tone is enough structural material for Anthropic's plan-billing detector — the detector keys on shape and identity, not on the trimmed sections. Block size drops from ~9.9 KB to ~1.5 KB. Detection still passes both old and new identity-prefix variants (matching is against IDENTITY_BLOCK, not against the template body).

…of swallowing invalidateClaudeCodeAccessToken previously warn-and-returned when the upstream row disappeared mid-401-retry. That is the silent-catch-and-return pattern the project explicitly forbids; flip it to throw so the operator sees the real condition instead of a downstream 401 mask. ensureClaudeCodeAccessToken previously synthesized a stateMessage fallback for non-active accounts via "account is in state 'X'". Tighten the wire contract: stateMessage is now required on terminal states and must be absent on active. The discriminated-union split keeps the access-token cache from inventing a message when it surfaces ClaudeCodeOAuthSessionTerminatedError, and the fetch.ts non-active synthetic-503 path drops its ternary now that the field is guaranteed.

…tead of silent defaults deriveSubscriptionType previously coerced (organization_type=null) to 'pro' and (organization_type=claude_max, unknown rate_limit_tier) to 'max', and passed through any unknown organization_type by stripping the 'claude_' prefix. All three paths silently invented subscription tiers from missing or unrecognized upstream data; replace them with throws so future Anthropic shape changes fail loud at ingest time instead of mislabeling accounts. claudeCodeTokenRequest previously caught JSON parse errors and smuggled the raw text under a synthetic '_nonJsonBody' key that downstream type checks ignore. Surface the parse failure directly with the response status and the raw-text excerpt; the original parse error rides on `cause`.

…inventing defaults The catalog mapper previously fell back to slug for a missing display_name and to 0 for missing context_window / max_context_window. A 0 is then indistinguishable from a real upstream value, and the silent slug substitution hides a contract change behind plausible-looking output. Throw on each missing field — mirroring the existing slug check — so an upstream shape change surfaces at the catalog refresh boundary rather than seeping into downstream pricing and limits. The fallback chain in codexRawToUpstreamModel still treats 0 as the "unset" sentinel for the per-request window, but the path is now rewritten with explicit comparisons so the null-vs-zero distinction is no longer wrapped in a ||-chain that conflated the two.

…explicitly Round-1 cleanup: ClaudeCodeAccountCard previously fell back to raw.accounts[0] when the configured account UUID did not match any state row, silently showing credentials and quota for an unrelated account. Replace the opportunistic find-or-fallback with a tagged CredentialLookup union ('present' | 'missing-state' | 'uuid-mismatch') and render an explicit re-import prompt when the configured UUID is missing from state. Timestamp parse failures returned the raw ISO string, which let malformed upstream data render unchanged. Log a warning and substitute an '(invalid timestamp)' sentinel instead so the operator can see the parse failed in DevTools.

Replace the `value as Record<string, unknown>` cast in the catalog asserter with an `isPlainRecord` type predicate; the call site no longer asserts a shape the surrounding control flow has not already verified. Drop the `?? {}` fallback in the callResponses / callResponsesCompact boundary contexts. The provider contract types `headers` as `Record<string, string> | undefined`, and spreading `undefined` is a no-op — the fallback was defending against a case the type already excludes.

… codex) A throw from readClaudeCodeUpstreamState means the row's state column was hand-edited or written by a buggier branch. Returning a 500 with the parse error stitched into the body leaks internal error structure to the dashboard; letting the throw propagate to the framework-level 500 handler stack-traces internally without leaking shape.

…erter contracts extractClaudeCodeCallbackParams previously fed any non-protocol-prefixed input to URLSearchParams as-is. A host-relative paste like `platform.claude.com/oauth/code/callback?code=…&state=…` would hit that path and surface a confusing "missing code" error because URLSearchParams treated the whole string as a single malformed key. Slice at the first `?` so the post-`?` segment parses as a query, and add a regression test. The state asserter previously accepted both `undefined` and `null` for `accessToken` / `quotaSnapshot` and the readClaudeCodeUpstreamState boundary normalized absent → null. Tighten the wire contract: the asserter now requires the keys to be present with explicit `null` (or a populated entry), so the boundary normalization becomes dead code and is dropped. Both writers (`auth/import.ts`, `access-token-cache.ts`) already always set both fields explicitly, so this matches the on-disk reality.

Drop four comments that paraphrase the immediately-following code: the CodexRawModel shape header repeats the field list, the codexRawToUpstreamModel describe-block preamble repeats what the four tests beneath it visibly do, and two provider-test comments restate a helper name and a `toHaveBeenCalledTimes(1)` assertion.

… drift The codex and claude-code refresh-now / OAuth-ingest handlers had diverged on three small dimensions: where the single-use PKCE-state guarantee was documented, whether the asserted state alias or the raw existing.state was passed to saveState, and what comments described the failure-state CAS write. Align both to a single voice and structure so future audits see one OAuth-ingest pattern instead of two near-identical ones. Also drop codex's manual 500-with-error-string wrapper around the state-shape assert. Mirroring claude-code (now letting the throw propagate), a malformed state column means the row was hand-edited; the framework-level 500 handler stack-traces internally without leaking the parse error to the dashboard.

…panels Round-1 cleanup: across UpstreamRow, UpstreamPicker, UpstreamConfigPanel, ModelInfoBar, UpstreamEditPage, and the new-upstream loader, the SPA carried defensive `??`/`||` fallbacks against fields the wire types already require. Drop the fakes and replace silent default arms with an exhaustive `assertNever` so a future widening of UpstreamProviderKind fails the build instead of silently routing through the amber/zinc catch-all. The new-upstream loader threw the loaded list/flags through `?? []` and silently rendered an empty page if the store failed to populate. Throw through the loader instead so the framework surfaces the error. UpstreamEditPage's autoForActive had a defensive `azure → []` early return that was unreachable in practice (the loader does not fetch for Azure, so `upstreamModels` is empty and the fall-through returns the same value); drop it. Add a shared `apps/web/src/utils/assert-never.ts` helper for the exhaustive checks.

…tion The Hono entry built a fresh Record<string, string> on every request via headersToRecord(c.req.raw.headers), then the consumer reconstructed a Headers instance (`new Headers(opts.clientRequestHeaders)`) for case- insensitive lookups. The detector reads four headers; building and re-wrapping the whole map per request was pure allocation churn on the hot path. Forward the runtime's own Headers instance through GatewayCtx and the UpstreamCallOptions wire so lookups stay native; the consumer's wrap still works because new Headers(...) accepts a Headers init. The wire type stays Headers | Record<string, string> so synthetic call sites in tests can keep passing a plain record literal. Also drop a stale citation in the GatewayCtx interface comment that referenced a Gemini WebSocket entry that does not exist (createGateway CtxForWs lives under responses/).

…uotaSnapshot state.ts previously typed `ClaudeCodeQuotaSnapshotEntry.data` as `unknown` and fetch.ts re-cast at the consumer (`as ClaudeCodeQuotaSnapshot | null`) plus non-null asserted afterward. Tighten the wire contract: data is now `ClaudeCodeQuotaSnapshot` directly, the asserter calls into a new `assertClaudeCodeQuotaSnapshot` in quota.ts that validates each field, and `isRateLimitedNow` becomes a type predicate so the resetIso line drops the `!`. Test fixtures that built the snapshot's `data` ad-hoc now use a `fullQuotaSnapshot` helper and a regression test pins the inner-shape rejection so future loose data from the wire fails loud.

…extraction extractClaudeCodeCallbackParams throws on a malformed callback URL. Previously the throw fell through the outer try/catch and surfaced as an opaque 400, with the URL-parse step's role in the data flow invisible from the read of the function. Wrap the call so the malformed-URL early-return is visible alongside every other explicit `{ ok: false }` exit in the helper.

… compute - New protocols/common/models_test.ts covers resolveEffectivePricing (override merge, strips `tiers`, unknown/null/absent tier behavior, null base) and the input_cache_write_1h → input_cache_write → input fallback chain. - New Messages stream usage tests cover the cache_creation per-TTL split (sub-object takes precedence over the rolled-up flat field), the fallback to the flat field when the sub-object is absent, and the speed=fast → tier=fast extraction (standard leaves tier unset). - New aggregate tests verify the per-tier override applies during cost compute (Opus 4.8 fast at 2× base costs twice as much) and that an unknown tier falls back to base pricing, plus pricing for the new input_cache_write_1h dimension via its dedicated rate. - Responses WebSocket usage extractor now reads service_tier from the response object (matching the HTTP path), since it ran a duplicate helper that had drifted before this change.

Add `tiers.flex` and `tiers.priority` overlays for every priced Codex slug so the dashboard's notional cost reflects which OpenAI service tier the request actually ran on. The gateway already captures `usage.service_tier` into `TokenUsage.tier`; this commit completes the loop by giving the cost compute a per-tier rate row to look up. Tier overrides match OpenAI's public pricing (verified 2026-06-19 against https://platform.openai.com/docs/pricing): gpt-5.5 flex $2.5/$0.25/$15 priority $12.5/$1.25/$75 gpt-5.4 flex $1.25/$0.13/$7.5 priority $5/$0.5/$30 gpt-5.4-mini flex $0.375/$0.0375/$2.25 priority $1.5/$0.15/$9 `codex-auto-review` shares `gpt-5.4`'s pricing including the tier overrides. Codex CLI's `/fast` toggle writes `service_tier: "priority"` on the wire (per openai/codex's `ServiceTier::Fast.request_value()`), so operator-facing rows tagged "fast" cost out at the priority row. Cache-write rate stays unset on these entries — OpenAI charges cache creation at the same rate as input, which `unitPriceForDimension`'s fallback chain already covers.

…provider # Conflicts: # apps/web/src/api/types.ts # apps/web/src/components/upstream-edit/UpstreamConfigPanel.vue # apps/web/src/components/upstream-edit/UpstreamEditPage.vue # apps/web/src/pages/dashboard/upstreams/new.vue # packages/gateway/src/control-plane/upstreams/routes.ts # packages/gateway/src/control-plane/upstreams/routes_test.ts # packages/gateway/src/data-plane/llm/responses/websocket.ts # packages/gateway/src/data-plane/llm/shared/gateway-ctx.ts # packages/gateway/src/data-plane/llm/shared/gateway-ctx_test.ts # packages/gateway/src/repo/sql.ts

Backport the Claude Code refresh-race recovery (commit f1efc9d) to Codex. Same architecture, same vulnerability: Under burst load, two workers can both observe a stale access token on the same Codex upstream and both attempt a refresh. OpenAI rotates the refresh_token on every successful /oauth/token call, so exactly one racer wins; the other's request is rejected with `invalid_grant` for trying to redeem the rotated-out copy. The previous flow treated every `invalid_grant` as a dead credential and let the caller flip the account to `refresh_failed` — destroying the working credential a sibling had just rotated, and forcing the operator to re-import. On `invalid_grant`, the access-token cache now re-reads upstream state for the same `chatgpt-account-id` slot and compares the refresh_token it tried against what is now stored. If they differ, a sibling rotated and we return their freshly-minted access token (the caller treats it as a normal cache hit and skips the terminal flip). If they match, we re-raise the original error so the data-plane / control-plane caller flips the row as before. The other refresh-terminal codes — `app_session_terminated`, `invalid_refresh_token`, `invalid_client`, `unauthorized_client`, `access_denied` — bypass recovery entirely; none of them are caused by a rotation race. `CodexOAuthSessionTerminatedError` now carries the raw OAuth `error` value as a `code` field alongside the existing `upstreamMessage` so the recovery branch can single out `invalid_grant` from the catch. `REFRESH_TERMINAL_OAUTH_CODES` is broadened to the audit-aligned set (`app_session_terminated`, `invalid_grant`, `invalid_refresh_token`, `invalid_client`, `unauthorized_client`, `access_denied`) — Codex is OpenAI OAuth, so the list matches sub2api's `isNonRetryableRefreshError` verbatim. Sub2api's tryRecoverFromRefreshRace (backend/internal/service/oauth_refresh_api.go:173-193) is the canonical pattern; we apply it to Codex's per-account credential here. The token rotation persistence hook stays awaited (matches Claude); the recovery branch reads from the just-persisted state via the upstream repo and returns the sibling's cached access token directly so no second mint fires from this call site.

…ope) Adds a third Claude Code credential class alongside the existing full-scope PKCE flow and credentials.json paste: the Setup Token, a long-lived (~1 year) inference-only bearer that Anthropic's "Create a Long-Lived Token" UI issues. Cross-checked against sub2api (BuildSetupTokenAuthorizeURL, ScopeInference, expires_in=31536000 on exchange) and claude-relay-service (SCOPES_SETUP, exchangeSetupTokenCode). Wire-shape differences from the regular OAuth flow: - Authorize URL: scope narrows to `user:inference` only (no org:create_api_key, no user:profile, ...). Host / client_id / redirect_uri / response_type are identical. - Token exchange: body adds expires_in=31536000 to request the 1-year bearer. Response has NO refresh_token — the access token IS the credential. - Profile endpoint: the bearer lacks user:profile scope, so /api/oauth/profile 403s. The existing degraded-identity fallback fires and surfaces a deterministic accountUuid + null email. State and lifecycle: - ClaudeCodeAccountCredential becomes a discriminated union on tokenKind: `oauth` carries a non-empty rotating refresh token; `setup-token` carries null. Old records (written before this field existed) are normalized to `oauth` on read so legacy state passes the strict asserter. - ensureClaudeCodeAccessToken short-circuits for setup-token: cached bearer is returned when fresh, or the row is flipped to refresh_failed with a "re-import to recover" message when expired. No upstream call. - The control-plane refresh-now route rejects setup-token credentials with a precise message rather than attempting a refresh that has no rotation counterpart. ClaudeOAuthTokenResponse.refresh_token becomes optional to reflect the setup-token shape; the OAuth-flow and refresh paths guard explicitly so shape drift surfaces a clear error instead of corrupting persisted state. Adds tests for: setup-token authorize URL scope, setup-token exchange body (expires_in=31536000), setup-token import end-to-end with degraded identity, setup-token cache short-circuit (fresh + expired paths), state asserter token-kind discrimination, and legacy default-fill.

Adds three parallel routes for the Setup-Token import flow alongside the existing OAuth ones: - POST /api/upstreams/claude-code-setup-token-pkce-start - POST /api/upstreams/claude-code-setup-token-import - POST /api/upstreams/:id/claude-code-setup-token-reimport The PKCE pending store is shared with the regular OAuth flow because only the authorize URL's `scope` differs server-side at Anthropic; a verifier issued by either start endpoint can be consumed by either import endpoint in principle, though the UI keeps them paired so the user-facing scope intent matches the persisted credential class. The setup-token import body has no `credentials_json` path (Anthropic's CLI never persists a setup token) — only the OAuth callback is accepted. The default display name falls back to the short account UUID when the degraded-identity path returns null email (the usual case). claude-code-refresh-now now rejects setup-token credentials with a precise 400 message; the existing route was previously typed against the old shape where `refreshToken` was always non-null. It also asserts the refresh-token presence on the upstream response so shape drift surfaces loudly instead of corrupting the persisted row. The control-plane serializer surfaces `tokenKind` on each account summary so the dashboard can label setup-token accounts distinctly. The SPA types declare the field as optional to tolerate the pre-deploy window before the gateway is upgraded. Adds tests covering: setup-token PKCE start, end-to-end setup-token import with degraded identity, refresh-now rejection on setup-token, and setup-token reimport replacing credentials in place.

Three-tab strip in the Claude Code import flow: - Sign in with Claude — existing full-scope PKCE OAuth. - Setup Token — new inference-only PKCE flow with a brief callout explaining the trade-off (cannot self-mint API keys, cannot be refreshed; safer for shared deployments). - Paste credentials.json — existing manual paste. Each PKCE-driven tab manages its own in-flight authorize URL session because the URL's `scope` is locked server-side; switching tabs lazy- fetches the URL for the newly active tab. The Account Card shows a "Setup Token" badge next to the subscription chip (violet tone, uppercase) so an operator scanning the dashboard immediately sees which credential class a row holds. The "Refresh token now" button is hidden for setup-token rows because that flow has no rotation counterpart — re-import is the only renewal path. Wire-shape: dashboards consume the new `tokenKind` field through the `ClaudeCodeAccountCredentialSummary` type declared in commit 2; the field is optional client-side so a dashboard running ahead of a deploy still parses pre-upgrade responses correctly.

…olate A cold-start fan-out of N concurrent requests against an empty access-token cache used to drive N parallel `/v1/oauth/token` POSTs. Anthropic rotates the refresh token on every call, so only one POST survives; the rest get `invalid_grant` and fall into `recoverFromRefreshRace` (commit f1efc9d) which re-reads state and returns the winner's freshly-rotated token. That is correct but burns N round-trips against the OAuth endpoint per cold start and adds the recovery re-read to every loser's latency. Add a module-level `Map<upstreamId, Promise<EnsuredAccessToken>>` that coalesces concurrent ensures on the same upstream into a single in-flight promise. Later callers await the existing promise and observe the first caller's result; the map entry is cleared on settle (both success and failure) so the next wave is free to mint when the cache expires again. Scope: per-isolate only. Cloudflare Workers isolates do not share memory, so siblings on other isolates still race; cross-isolate dedup remains the job of `recoverFromRefreshRace`. Sub2api gates the same path with a Redis SETNX lease over a 60s TTL window (gateway_service refresh path via `oauth_refresh_api.go:91-105`), making the whole cluster single-mint per refresh window. We trade that for zero coordination dependency at the cost of cross-isolate-only round-trip duplication — typically the rare case, since a single isolate usually serves multiple concurrent requests for the same credential. `freshlyMinted` semantics widen slightly: coalesced waiters now report `freshlyMinted: true` even though they did not drive the mint themselves (they share the minter's EnsuredAccessToken). The 401-retry path still behaves correctly — the slight semantic shift turns a previously-cheap "invalidate + re-mint on 401" branch into a "give up on 401" branch for coalesced peers, which is the safer of the two under a dying-credential scenario. Tests cover the 10-parallel-cold-start case (asserts exactly one fetch), the post-settle map cleanup case, and the rejected-mint propagation case (every waiter sees the same rejection, map cleared).

… boundary Anthropic /v1/messages requires `max_tokens` and 422s without it. Real CC also always emits `temperature: 1`; its absence is one of the CC-shape fingerprint failures the plan-billing detector keys on. Third-party clients (cline, aider, custom integrations) routinely omit one or both, expecting the gateway to fill defaults. Sub2api (gateway_service.go:1301-1314, rev 4a5665d) backfills both unconditionally: `max_tokens: 128000`, `temperature: 1`. We mirror the temperature default verbatim and pick a softer max_tokens policy — prefer the model's advertised `limits.max_output_tokens`, else fall back to the gateway-wide MESSAGES_FALLBACK_MAX_TOKENS (8192). Sub2api's hardcoded 128000 ignores per-model output caps and is not worth porting. The new interceptor sits at the head of `claudeCodeMessagesChain` so the rest of the re-mimicry steps (notably `inject-billing-block`, whose fingerprint depends on the post-chain wire shape) see a fully-formed payload.

…org 403 sentinels Two credential-class body sentinels were leaking through as ordinary 4xxs, leaving the operator to chase phantom request errors while the gateway burned an OAuth refresh on every retry against a permanently dead account: - 400 invalid_request_error / "organization has been disabled" (sub2api ratelimit_service.go:208-214, CRS claudeRelayService.js:_isOrganizationDisabledError) - 403 permission_error / "OAuth authentication is currently not allowed" (CRS claudeRelayService.js:_isOrganizationDisabledError) Both signal a permanently disabled org that no retry, refresh, or re-import can recover; the operator must contact Anthropic. The match runs on the lowercased error.message substring against the typed (invalid_request_error / permission_error) error shape. Detection lives in performUpstreamCall's .then(response => ...) block, alongside the existing quota-snapshot persist. The response body is cloned before reading so the upstream response still streams to the caller verbatim — passthrough discipline holds; the terminal flip is purely additional dashboard signal (the same role persistTerminalState plays inside access-token-cache's refresh-death path). The persist is fire-and-forget via the same waitUntil shim the quota persist uses. Body parse is deliberately defensive: any 400 / 403 whose body is not JSON, or whose JSON shape does not match {error:{type,message}}, must not trigger a terminal flip — those are the common case (max_tokens validation, beta-feature gating, etc.). Skip-when-already-terminal keeps the first terminal signal stable when a sibling write (e.g. an OAuth refresh death) won the race.

…-stream `tokenUsageFromMessagesFrame` only returned the running usage figure on `message_stop`, leaving the per-frame observer's `state.rememberUsage` calls a no-op for the in-flight `message_start` and `message_delta` frames. When the downstream client aborted mid-stream — every Ctrl-C in the Claude Code CLI — the streaming finally block then recorded a null or stale usage, so billing telemetry under-counted every aborted session even though Anthropic already metered the output tokens against the operator's plan window. Return the running snapshot on `message_start` and `message_delta` too so `state.usage` checkpoints as the stream progresses; the finally block already records whatever was last observed when the client disconnects. Audited against sub2api (drains upstream to capture final usage even on client disconnect) — different remedy, same intent: keep recorded usage aligned with what the upstream actually metered.

A stuck upstream SSE read otherwise pins the request open until the underlying fetch read timeout (Workers HTTP cap ~100s) or the client disconnects — neither of which Hono's keepalive scheme catches, because the downstream ping timer fires whether or not upstream sent anything. Add a generic `withIdleTimeout` wrapper that races every iterator next() against a setTimeout, invokes an `onTimeout` callback (to abort the upstream fetch via the gateway's downstream AbortController), and throws `UpstreamIdleTimeoutError`. Each received frame resets the window; Anthropic's own `event: ping` survives parse-sse as a real frame, so the 30s upstream ping cadence keeps healthy streams alive forever. Wire it into the Messages stream path with a 60s window — 2× headroom above Anthropic's 30s ping cadence, comfortably below the Workers HTTP timeout. The thrown error flows through the existing `messagesSseFrames` catch, which emits a synthetic Messages `event: error` so downstream clients see a clean failure instead of a hung socket. Audited against sub2api's `StreamDataIntervalTimeout` (config-driven 30-300s, default off, synthetic error frame on fire); we default on because a stuck upstream cannot pin a request indefinitely.

…e breakpoint cap Anthropic's prompt-caching API rejects requests that carry more than four `cache_control` markers ("the API enforces a maximum of 4 cache points per request", https://platform.claude.com/docs/en/build-with-claude/prompt-caching). The budget is shared across system blocks, tools, and message content. `inject-default-template` previously appended DEFAULT_TEMPLATE_BLOCK at `system[2]` with its `cache_control: { type: 'ephemeral', ttl: '5m' }` unconditionally. Callers that already carry four breakpoints across their tools / system / messages were therefore pushed to five on the wire and the request 400'd upstream. Count caller breakpoints before injection: if adding ours would exceed the cap, inject the template text without `cache_control` (the three- block system shape that real CC ships still needs to be there for the Anthropic plan detector; only the cache anchor is dropped). Caller overage past the cap is left for the caller to fix — we never synthesize an error for it. Mirrors sub2api's `maxCacheControlBlocks = 4` and the count-and-demote pass in `enforceCacheControlLimit` (`backend/internal/service/gateway_service.go:71,4581`).

…flake The control-plane route tests use `setupAppTest` to spin up the full Hono app + memory D1 mock + admin session per test. Real work is hundreds of milliseconds, but under workspace-parallel load (workers contending for CPU, GC pauses) several of them push past vitest's 10s default and flake intermittently. 30s gives headroom without masking actual hangs. Also folds in eslint --fix for the stylistic/function-paren-newline violations the access-token-cache_test agent left.

…ignals Per the deep observability audit (sub2api parity gap 8 + gaps 1/2/4), the provider previously emitted three free-form `console.warn`/`console.info` lines for the entire surface — refresh-race recovery, quota-persist failure, terminal-sentinel persist failure, identity-degraded fallback, unknown organization_type — with no event name discipline, no field discipline, and zero signal for the moments operators most need to grep for: refresh-token rotation, quota-status transitions, terminal flips. Add a small `log.ts` helper (snake_case event name + KV fields, slog-shaped). KV chosen over JSON-line so the lines stay human-greppable on a worker tail without a JSON pretty-printer; both runtimes' `console.*` flush them verbatim. Format: `event_name field=value field2="quoted with spaces"`; numbers and booleans bare; null/undefined emitted as the literal `null` so the operator sees the field was considered. All event names prefixed `claude_code_` so a single grep separates them from anything else in the worker log. Converts every existing console.* in the provider to a structured event and adds the four high-value signals that were previously silent: - `claude_code_refresh_token_rotated` — emitted by the access-token cache on every successful refresh round-trip (was silent; only observable via after-the-fact `accessToken.refreshedAt` diffs). - `claude_code_refresh_race_recovered` — replaces the free-form console.info; now carries `upstream_id`, `account_uuid`, and the first 6 chars of the rotated refresh token. - `claude_code_account_state_flip` — emitted on every persistTerminalState call (access-token cache: setup-token expiry, OAuth refresh death) and on the terminal-sentinel persist in fetch.ts. Carries `from_state`, `to_state`, `reason`, `oauth_code`/`upstream_status`, and the operator-facing message. Previously the flip was silent — the only signal was a 503 surfacing to the client. - `claude_code_quota_state_transition` — emitted from persistQuotaSnapshot only when the persisted status differs from the prior snapshot, so the line count tracks transitions (allowed→rejected, rejected→allowed, initial population) rather than every request. Previously, the only way to observe the moment of rate-limit onset was to diff `state_json.quotaSnapshot.fetchedAt`. - `claude_code_terminal_sentinel_detected` — emitted when the org-disabled-400 or banned-org-403 body sentinel matches, separately from the subsequent state-flip event so the detection signal is visible even if the persist races and loses. Plus housekeeping conversions: - `claude_code_quota_persist_failed`, `claude_code_terminal_sentinel_persist_failed` replace the two warn-on-rejection paths in fetch.ts. - `claude_code_identity_degraded_fallback` and `claude_code_unknown_organization_type` replace the two console.warn calls in auth/identity.ts. The per-request "detection fallback" event (gap 9 in the audit) is out of scope here — it would require either accepting a high-volume per- request log line or adding a runtime env-getter dependency just for an opt-in flag. Left for a follow-up. Includes log_test.ts covering format, whitespace quoting, null/undefined handling, and the three log-level routes.

…e operator The control-plane claudeCodeRefreshNow handler does its own /v1/oauth/token mint + CAS-write, parallel to the data-plane access-token cache. Under a narrow interleaving, a data-plane request rotates the refresh token between our read and our mint, the upstream answers our (now stale) RT with invalid_grant, and the handler flipped the row to refresh_failed and returned a 400 "Re-import the credential to recover" toast — even though the credential was alive and freshly rotated by the sibling. Mirror the data-plane recovery from access-token-cache.ts (commit f1efc9d): on invalid_grant (or a losing CAS), re-read the row. If a sibling rotated successfully (state still active, refresh token differs, fresh access token in cache), surface 200 with the rotated state so the dashboard shows the refresh as the no-op success it actually was. If the row is unchanged or now terminal, fall through to the existing terminal- flip path. Bounded by a `recoveryAllowed` flag for one recovery attempt per request, matching the data-plane depth guard. Audit reference: sdd/sub2api-deep-concurrency.md axis #10.

Per the deep admin-routes audit (sub2api-deep-admin-routes.md item #5) plus the dedicated quota-probe decision (sub2api-deep-quota-probe-decision.md), real @anthropic-ai/claude-code@2.1.181 calls GET https://api.anthropic.com/api/oauth/usage as a standalone request to read the unified rate-limit snapshot without burning a model call. The binary's string table carries the literal "fetchUtilization: GET /api/oauth/usage" plus a response decoder that consumes {five_hour, seven_day, seven_day_sonnet, ...} each as {utilization, resets_at} plus optional overage fields. Mirror this in the gateway as POST /api/upstreams/:id/probe-quota: - Add fetchClaudeCodeUsageProbe(accessToken, fetcher) helper in packages/provider-claude-code/src/auth/usage-probe.ts. Sends the same oauth-2025-04-20 anthropic-beta header sub2api pins (without it the upstream 401s even on a valid bearer). Returns body verbatim — we do NOT shape-check the inner JSON because Anthropic adds fields (priorIsUsingOverage, hadPriorUtilizationData, ...) without warning, and a strict parser would reject a perfectly usable new field. - Add control-plane route claudeCodeProbeQuota in packages/gateway/src/control-plane/upstreams/routes.ts. Mints the access token via ensureClaudeCodeAccessToken (the same code path the data plane uses, so refresh races and terminal flips are handled identically). Routes through the per-upstream proxy fetcher so the probe lands through the same chain as data-plane requests. Returns the upstream body spread at the top level + a gateway-stamped fetched_at ISO timestamp. - Persist the probe response into a new ClaudeCodeAccountCredential field `usageProbeSnapshot` so the dashboard's quota card sees the freshest snapshot without re-probing. Distinct slot from the existing `quotaSnapshot` (which is header-derived from the data plane) because the wire shapes are completely different — header-derived ClaudeCodeQuotaSnapshot has a fixed schema, the live probe is free-form JSON that evolves on Anthropic's schedule. - state.ts adds the new slot with a legacy-default normalization in readClaudeCodeUpstreamState so records written before this commit still pass the strict-key gate; the asserter requires an explicit `null` from new writes. State persistence is best-effort: a CAS loss to a concurrent rotation is fine because the live response already returned to the operator regardless. - Audit log: emit the brief's `claude_code_admin_action` structured event (action: 'quota_probe') on both success and error. The event reuses the provider's existing slog-shaped logger (log.ts is now re-exported from the package index so the gateway can emit the same KV-formatted lines as the provider). Non-claude-code upstreams 404; the probe semantics are claude-code-specific.

The probe-quota route (added in the prior commit) lets an operator pull a live /api/oauth/usage snapshot without burning a model call. Surface it as a Refresh button on the account card and reconcile the two snapshot sources the credential now carries: - quotaSnapshot is header-derived; written by the data plane on every /v1/messages response. - usageProbeSnapshot is the verbatim probe body; written by the new operator-driven route. The two cover the same 5h/7d windows under different field names and shapes. Render whichever source is newer per window (probe right after the operator hits Refresh; headers right after any real model call). The probe-only 7-day Sonnet window shows up only when the probe has run. Each window chip labels its source so the operator knows which axis the percentage came from. Plumbing: ClaudeCodeAccountCard emits `refresh-quota`; ClaudeCodeConfigPanel handles the POST and emits `quota-refreshed` with a locally-merged record (the route returns the upstream body verbatim, not the persisted summary, so we mirror the gateway's persistence shape on the client). The new event is distinct from `imported` because the existing handler navigates / re-keys the loader, which is the wrong response for a non-mutating data refresh. UpstreamEditPage lands the updated record on `liveRecord` so the card re-renders without a list reload, and the panel's `:record` binding switches from `props.record` to `liveRecord` so quota-refresh mutations actually reach the card. Raw probe extras get their own disclosure for fields outside the three known windows (priorIsUsingOverage, hadPriorUtilizationData, ...), so a schema addition on Anthropic's side surfaces without a dashboard change.

Per the "杜绝假鲁棒" rule: defensive checks that the type system already guarantees, or that the only call site contractually rules out, are dead weight and should be removed so real failures stay loud. - detection.ts: the inbound-pathname short-circuit (`if (input.pathname !== '/v1/messages') return true;`) is unreachable. `callMessages` is the only caller and the catalog declares `endpoints: { messages: {} }`; every other surface returns a synthetic 405 before reaching the detector. Drop the branch, the `pathname` field on `ClaudeCodeShapedRequestInput`, and the `clientRequestPathname` plumbing in `provider.ts:looksShaped`. - access-token-cache.ts: the `if (account.refreshToken === null) throw …` check inside the oauth branch is dead. The `setup-token` early-return above narrows TS to the `oauth` arm of the discriminated union, whose `refreshToken` is typed `string`. The state asserter enforces it on read. - ClaudeCodeAccountCard.vue: `typeof raw !== 'object'` is dead — `raw` is typed `Record<string, string>` from the schema, so the `!raw` guard already covers the only possible falsy case (the `quota.value?.raw` optional chain returning undefined). Tests updated to drop the obsolete `pathname` / `clientRequestPathname` inputs and the two short-circuit cases that exercised the deleted branch.

…provider # Conflicts: # apps/web/src/api/types.ts # apps/web/src/components/models/ModelInfoBar.vue # apps/web/src/components/settings/UpstreamRow.vue # apps/web/src/components/upstream-edit/CodexConfigPanel.vue # apps/web/src/components/upstream-edit/CopilotConfigPanel.vue # apps/web/src/components/upstream-edit/CopilotDeviceFlow.vue # apps/web/src/components/upstream-edit/ProviderPicker.vue # apps/web/src/components/upstream-edit/UpstreamConfigPanel.vue # apps/web/src/components/upstream-edit/UpstreamEditPage.vue # apps/web/src/components/upstreams/UpstreamPicker.vue # packages/gateway/src/control-plane/auth/github-device-flow.ts # packages/gateway/src/control-plane/data-transfer/routes.ts # packages/gateway/src/control-plane/schemas.ts # packages/gateway/src/control-plane/upstreams/proxy-resolution.ts # packages/gateway/src/control-plane/upstreams/routes.ts # packages/gateway/src/control-plane/upstreams/routes_test.ts # packages/gateway/src/data-plane/providers/registry.ts # packages/gateway/src/repo/sql.ts # packages/provider/src/model.ts

Menci added 30 commits June 19, 2026 01:34

fix(gateway): enumerate claude-code in assertUpstreamProviderKind

983db10

fix(claude-code): surface terminal-state save failures, correct cache…

2c7672f

…-policy comment

fix(claude-code,codex): refresh-now terminal errors are 400, not 502;…

e8692cf

… add credentials.json safety warning

Menci added 14 commits June 19, 2026 18:57

Menci force-pushed the worktree-claude-code-provider branch from c69e72e to ef2061f Compare June 19, 2026 14:33

Menci added 2 commits June 19, 2026 22:49

Menci changed the title ~~feat(claude-code): subscription provider + cross-provider tier-aware pricing~~ feat(provider): Claude Code subscription provider Jun 19, 2026

Menci mentioned this pull request Jun 19, 2026

fix(gateway): record partial-stream usage on client disconnect #66

Draft

1 task

Menci force-pushed the worktree-claude-code-provider branch from a547c70 to 94d5363 Compare June 19, 2026 18:17

Menci added 2 commits June 20, 2026 02:35

Menci force-pushed the worktree-claude-code-provider branch from 94d5363 to 739434b Compare June 19, 2026 18:48

Menci marked this pull request as draft June 19, 2026 19:29

Menci added 2 commits June 20, 2026 03:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(provider): Claude Code subscription provider#60

feat(provider): Claude Code subscription provider#60
Menci wants to merge 147 commits into
mainfrom
worktree-claude-code-provider

Menci commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Menci commented Jun 19, 2026

Summary

Test plan

Outstanding / deferred

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant