Skip to content

feat(provider): Claude Code subscription provider#60

Draft
Menci wants to merge 147 commits into
mainfrom
worktree-claude-code-provider
Draft

feat(provider): Claude Code subscription provider#60
Menci wants to merge 147 commits into
mainfrom
worktree-claude-code-provider

Conversation

@Menci

@Menci Menci commented Jun 19, 2026

Copy link
Copy Markdown
Owner

Summary

  • New provider-claude-code package — Claude.ai subscription as a gateway upstream. PKCE OAuth + credentials.json import, refresh-token rotation with concurrent-race recovery, plan-billing detection + re-mimicry, dynamic /v1/messages?beta=true catalog discovery, dashboard surface (provider picker, import flow, account card with quota windows). Wire shape verified byte-for-byte against Wei-Shaw/sub2api and a fresh capture of real @anthropic-ai/claude-code@2.1.181 cli.js (sub2api itself only pins 2.1.161; we close the post-2.1.161 release-window gap on mid-conversation-system-2026-04-07).

  • Cross-provider tier-aware pricing schemaBillingDimension gains input_cache_write_1h for Anthropic's extended-cache-ttl-2025-04-11; ModelPricing gains a tiers? overlay; TokenUsage.tier carries the request's actual tier extracted from response (usage.speed for Anthropic fast mode, usage.service_tier for OpenAI flex/priority). Cost compute resolves effective pricing through resolveEffectivePricing(pricing, tier) so a single model billed at multiple tiers in the same hour aggregates as separate buckets with distinct unit prices. Pricing tables populated for Claude Opus 4.6/4.7/4.8 fast mode and every priced Codex slug's flex + priority overlays. Migration 0034_usage_per_ttl_and_tier.sql adds the column + widens the dimension CHECK list and bucket-identity unique index — backfills tier = NULL so historical aggregations compute identically.

  • Gateway plumbing improvements — anthropic-ratelimit-*, request-id, cf-ray forwarded to downstream clients on every LLM surface (Messages / Chat / Responses / Gemini); UpstreamCallOptions.waitUntil plumbed so workerd doesn't cancel post-response persists (quota snapshot, token rotation); clientRequestHeaders + clientRequestPathname threaded through so providers can inspect inbound shape; control-plane proxy_fallback_list override applied pre-save (refresh-now, OAuth import, copilot poll) so dashboard operations honor unsaved proxy edits.

Test plan

  • pnpm run typecheck — clean across all packages
  • pnpm run test — 2974 tests pass across 268 files
  • pnpm run lint — clean
  • Manual data-plane probe (JP IP through SOCKS5): plan-billing applied to Sonnet 4.5, Haiku 4.5, Opus 4.8 with re-mimicry; shaped passthrough preserves operator-CC fingerprint headers
  • Manual data-plane probe: Haiku thinking output identical between our and sub2api's beta sets (interleaved-thinking only matters in multi-turn agent loops, not single requests)
  • Fresh capture verified salt 59cf53e54c78 + indices [4, 7, 20] + sha256 + slice(0, 3) algorithm unchanged from v2.1.10 → v2.1.181
  • Fresh capture confirmed fine-grained-tool-streaming-2025-05-14 removed from v2.1.181 OAuth-mimicry path (release notes: "GA on 4.6")
  • Fresh capture identified post-sub2api-pin tokens (mid-conversation-system-2026-04-07 shipped; redact-thinking-2026-02-12 + summarize-connector-text-2026-03-13 deliberately omitted with rationale)
  • Deploy: pnpm run db:migrate:remote && pnpm run deploy — migration 0034_usage_per_ttl_and_tier.sql rebuilds usage + usage_requests tables (CHECK constraints can't be ALTERed in SQLite); standard SQLite "create new + copy + rename" pattern, data-preserving
  • Post-deploy smoke: dashboard upstream creation flow, refresh-now, OAuth import via PKCE, OAuth import via credentials.json paste

Outstanding / deferred

  • Copilot's Anthropic-compatible /v1/messages endpoint per-TTL cache split is unknown — both Copilot accounts in production D1 have Claude disabled by enterprise policy; can verify when an account with Claude access is added
  • Anthropic fast-mode body activation mechanism — beta header alone doesn't activate (speed: standard in usage despite fast-mode-2026-02-01 in anthropic-beta); body field name TBC, captured by extractor when it does fire

Menci added 30 commits June 19, 2026 01:34
Lays the foundation for a fifth UpstreamProviderKind. After this commit
the workspace recognizes 'claude-code' as a valid upstream kind, the
@floway-dev/provider-claude-code package exists with config + state
asserters mirroring the codex provider's pattern, and the database
schema carries a CHECK-constraint extension plus a claude_code_pkce_pending
table for the OAuth handoff that lands in a follow-up commit.

The data-plane factory entry is a placeholder thrower; the dashboard
serializer redacts the new shape (refreshToken stays in state, surface
the access-token expiry and quota snapshot summary). Both control-plane
import routes and the data-plane fetch path are deliberately not wired
here — those land alongside the OAuth helpers and the messages
interceptors in subsequent commits.
PKCE S256 generator (32-byte verifier; produces 43-char base64url
verifier+challenge pair matching the real claude CLI). Authorize-URL
builder includes the literal `code=true` query param the CLI emits.

Token exchange + refresh against platform.claude.com/v1/oauth/token use
the JSON body shape (not form-urlencoded) and pin User-Agent: axios/1.13.6
so the wire shape is indistinguishable from a real CLI request. Both
endpoints map `invalid_grant` and `app_session_terminated` to a dedicated
ClaudeCodeOAuthSessionTerminatedError so the request path can surface
session-termination separately from a transient upstream message.

Identity is derived from `GET api.anthropic.com/api/oauth/profile` — the
profile endpoint lives on api.anthropic.com (not the OAuth host), nests
identity under `account` and `organization`, and exposes a granular
`rate_limit_tier` we combine with `organization_type` to produce a
canonical `subscriptionType` string (`pro`, `max_5x`, `max_20x`, `team`,
`enterprise`) — the same shape the CLI persists in its on-disk
credentials.json.

Two ingestion paths:
- OAuth callback: exchange `code` against `/v1/oauth/token`, then fetch
  profile and build {config, state}. The profile call routes through
  the per-upstream fetcher (proxy-aware); the token exchange uses direct
  egress because at exchange time we don't have an upstream yet.
- credentials.json paste: parse `claudeAiOauth` wrapper, validate
  `expiresAt` is milliseconds (rejects suspected seconds-mode files),
  fetch profile but prefer the CLI's persisted `subscriptionType`
  verbatim when present. The JSON's existing access token is reused
  for the cached entry so the first request doesn't need a refresh.

Access-token cache mirrors codex's CAS semantics:
- 5-minute skew window — anything within 5 min of expiry counts as
  stale to avoid racing the upstream clock.
- Refresh-token rotation MUST land via CAS. A CAS miss is fatal because
  our fresh response holds a token a sibling rotation has already
  superseded; surfacing the error lets the next request re-read the
  just-persisted state.
- Terminal session-terminated errors flip `state` to `refresh_failed`,
  clear the cached access token, persist via CAS (best-effort — sibling
  may have already flipped), and rethrow.
- `invalidateClaudeCodeAccessToken` clears the cached slot without
  touching the refresh token (for 401-retry).

eslint.config.ts: registered the package in projectList (missed in the
prior scaffolding commit).

Test coverage: 73 tests across pkce, oauth, import, access-token-cache.
…parser

Adds the static-mimicry surface, request-detection predicate, and the
`anthropic-ratelimit-unified-*` response-header parser for the Claude Code
subscription provider. No fetch path is wired yet — that lands in the
next change.

- `headers.ts` pins the v2.1.181 header set Anthropic's third-party
  detector keys on, with separate Sonnet/Opus and Haiku `anthropic-beta`
  profiles selected via `pickClaudeCodeHeaders(modelId)`.
- `system-blocks.ts` ships the three blocks we drop on the re-mimicry
  path: identity prefix, default-template boilerplate (`cache_control:
  ephemeral`), and `buildBillingBlock(version, fp)` which emits the
  literal `cch=00000` placeholder. `computeCcVersionFingerprint(version,
  body)` produces the 3-hex `cc_version` suffix via SHA-256 over salt +
  UTF-8 bytes[4,7,20] of the first user-message text + version, padding
  with `'0'` (0x30) past end-of-string. Salt and indices ported from
  sub2api's `gateway_billing_block.go`, which itself ports from Parrot's
  `cc_mimicry.py` reverse-engineered from CC packet captures.
- `detection.ts` mirrors sub2api's `claude_code_validator.go` predicate:
  UA regex + non-messages/count_tokens/Haiku-probe short-circuits + a
  strict gate (x-app, anthropic-beta, anthropic-version, valid
  metadata.user_id in legacy or JSON form, plus a system-block billing
  fast path or Dice >= 0.5 against six identity templates). Dropping the
  Dice fallback false-negatives pre-v2.1.36 CC traffic — 2026-06-19
  probes confirmed real old CC still bills correctly on the plan tier.
- `quota.ts` parses the `unified-*` family captured live on 2026-06-19
  into a typed snapshot with nested 5h/7d windows and an overage block,
  plus a raw passthrough map of every `anthropic-ratelimit-*` header for
  dashboard display. Missing fields read as `null` rather than throw.

Default-template text extracted from `@anthropic-ai/claude-code@2.1.10`'s
plain `cli.js` and resolved to a single string (the pinned v2.1.181
release ships a Bun-compiled native binary that resists clean string
extraction; the spec records this fallback as acceptable). Tool-name
placeholders inside the template substituted with the canonical CC tool
identifiers (Bash, TodoWrite, Task, Read, Edit, Write, Grep,
AskUserQuestion, WebFetch, general-purpose). Pinning gets bumped
together with the CC version.

Dep added: `@noble/hashes@^2.2.0` (already in the workspace via packages
proxy and http). 147 new tests; full workspace test suite green (2829
tests).
Wires the Claude Code data plane on top of Task 2's OAuth + Task 3's
detection/system-blocks/quota helpers.

- Six-step messages interceptor chain (hoist-user-system / inject-billing /
  inject-identity / inject-default-template / pin-dated-model-id /
  synthesize-metadata-user-id) shaping non-CC traffic into the exact CC
  wire shape before reaching Anthropic.
- callClaudeCodeMessages: pre-fetch gates (state, rate-limit), access-token
  cache, 401 invalidate-and-retry-once with terminal-state surfacing, best-
  effort quota persistence on 2xx and 429, recordUpstreamLatency contract
  observed on every code path including synthetic gates.
- Provider factory advertises a static catalog (sonnet / opus / haiku dated
  ids) with current console.anthropic.com pricing baked in; messages is the
  only endpoint exposed, every other call surface throws not-implemented.
- Detection runs at the boundary so shaped CC traffic bypasses the
  re-mimicry chain entirely (operator's own CC client passes through with
  Authorization swap only).

193 tests across the package; workspace typecheck + lint clean.
…CallOptions

Add optional `clientRequestHeaders` and `clientRequestPathname` to
`UpstreamCallOptions` and capture them once on GatewayCtx from the inbound
Hono request. Each `attempt.ts` consumes them via a new
`buildUpstreamCallOptions(candidate, ctx, recorder)` helper so the build is
uniform across messages / responses / chat-completions and across the
generate / count-tokens / compact entry points.

Only the claude-code provider reads the new fields today — its shape
detector keys on the inbound UA + pathname + system block to decide
whether the request is already real Claude Code traffic that can pass
through unmodified. Other providers receive the fields and ignore them.

Synthetic ctxs (in-process tests, translation chains that never
originated as a real HTTP request) leave both fields undefined; the
claude-code detector treats that as "not CC-shaped" so the re-mimicry
chain runs.
The Claude Code subscription endpoint reads the
`x-anthropic-billing-header: …` block in the system prompt to bill the
request against the user's plan tier — that block has to reach Anthropic
intact. Every other upstream sees the same block as ordinary prompt text,
and the per-turn `cch=` hash inside it kills prompt-cache hits on each
turn even when the real conversation prefix hasn't changed.

Add a `strip-billing-attribution` flag (default-on for copilot / azure /
custom, default-off for claude-code) and gate the interceptor on it. The
file is rewritten from scratch — the top-comment now explains both
purposes and cites the flag id.

Update strip-billing-attribution_test.ts to cover the new flag-off path
(billing block survives intact when flag is off) alongside the existing
strip-when-on assertions, and add gateway-level integration tests in
serve_test.ts that pin the routing-side guarantee for both directions:
claude-code preserves the block, copilot strips it.

Also update flags_test.ts to enumerate the new defaults.
Replace the placeholder thrower with the real createClaudeCodeProvider
factory and declare @floway-dev/provider-claude-code as a gateway
dependency. The dispatch path that throws for unknown provider kinds
now resolves to the real factory for claude-code rows.

Compute effective per-binding flags in createClaudeCodeProvider — the
factory layers `record.flagOverrides` on top of the catalog defaults
for the kind and stamps the resulting set onto every model the
catalog advertises. Without this, the flag-gated interceptor would
see an empty `binding.enabledFlags` and the `strip-billing-attribution`
default-off semantics for claude-code would be indistinguishable from
flag-on behavior on a copilot binding.

In the provider's callMessages, switch the shaped-request detection
to read inbound `clientRequestHeaders` and `clientRequestPathname`
from the per-call UpstreamCallOptions. Previously detection ran
against the outbound wire-header bag (always empty for unshaped
traffic), so the shaped-passthrough path was effectively dead in real
traffic — a real Claude Code session would always have been re-mimicked.

Update the data-transfer routes to accept claude-code as a valid
upstream provider kind during backup/restore and round-trip
ClaudeCodeUpstreamState through the runtime's shape assertion (codex
already does this; claude-code now does too). Without this, an
operator who ran `pnpm run export` on a deployment with a claude-code
upstream would have been unable to import the dump back.
…refresh

Adds four admin endpoints mirroring the codex naming:

- POST /api/upstreams/claude-code-pkce-start
- POST /api/upstreams/claude-code-import
- POST /api/upstreams/:id/claude-code-reimport
- POST /api/upstreams/:id/claude-code-refresh-now

The import path accepts either a verbatim ~/.claude/.credentials.json paste
or a PKCE callback (full redirect URL or just code+state). The verifier is
stashed in a new claude_code_pkce_pending table whose schema was added in
0033 and whose repo wiring (memory + SQL) and scheduled sweep land here.

The generic POST /api/upstreams and PATCH /api/upstreams/:id now route
claude-code through the dedicated endpoints — config edits return 400 with
the canonical "use claude-code-reimport" message, mirroring how codex is
handled. State serialization (already in serialize.ts) exposes a slimmed
summary without the refresh token; the routes themselves never leak it.

Refresh-now CAS-rotates the refresh_token via the per-upstream proxy-aware
fetcher, flips the row to refresh_failed on session-terminated, and
returns 502 (not 401) so the dashboard's auth client doesn't log the
operator out on a dead credential.
…count card

Adds the operator-facing UI surface for the claude-code provider, mirroring
the codex panels:

- ProviderPicker grows a Claude Code tile (rose tone) so the create-mode
  upstream editor offers the provider.
- ClaudeCodeConfigPanel orchestrates the PKCE + credentials.json paste
  flow, threaded through the typed Hono RPC client. The OAuth tab opens
  the authorize URL, accepts either the full redirect URL or the
  ?code=...&state=... fragment, and posts to claude-code-import; the
  credentials.json tab POSTs the raw paste verbatim. Re-import and
  Refresh-now buttons appear on edit mode.
- ClaudeCodeAccountCard renders the identity + state badge + structured
  5h / 7d / overage quota slices, plus access-token expiry as a relative
  time. The raw header map collapses under a debug disclosure so the
  detail page covers the operator's `claude-code-reverse`-style poke
  without leaving the dashboard.
- UpstreamRow, ModelInfoBar, UpstreamPicker, and the route-loader guard
  in /dashboard/upstreams/new.vue learn the new provider kind.
- api/types.ts gains the slimmed ClaudeCodeUpstreamState / quota
  snapshot shapes — the wire shape mirrors what serialize.ts already
  emits (refresh token never surfaced, only refreshTokenSet boolean).
…ead defaults, header-casing defense)

Six small fixes from the final-review pass on the Claude Code provider
branch:

- system-blocks_test.ts: correct the byte-index annotations for
  'hello world this is a test prompt' — bytes[4]='o', bytes[7]='o' (the
  second 'o' in "world"), bytes[20]='a'. The previous comment claimed 'r'
  at index 7 and 't' at index 20. Test passes either way (assertion is on
  hash output), but the comment misled readers.
- fetch.ts persistQuotaSnapshot: drop the dishonest
  `as unknown as Record<string, unknown>` double-cast; the `data` field is
  typed `unknown`, so plain assignment is enough and correct.
- provider.ts UpstreamCallOptions: document the lowercase-key invariant on
  `clientRequestHeaders`. The record is built via `headersToRecord` in
  gateway-ctx.ts, which uses `Headers.forEach` and yields lowercase keys
  per the WHATWG Fetch spec.
- fetch.ts shaped passthrough: defensively `delete` both `authorization`
  and `Authorization` from the spread before adding our bearer token. The
  lowercase invariant should hold; the uppercase delete is one-line
  insurance, not a real expectation.
- fetch.ts shaped passthrough: extend the `stream: true` coercion comment
  to record why it is safe in the shaped path — shaped detection requires
  CC client headers + system blocks + a valid metadata.user_id, and the
  real Claude Code client always sets `stream: true`.
- ClaudeCodeAccountCard.vue: drop dead defensive defaults. The asserter
  proves `accountUuid` and `email` are non-empty strings on a non-null
  account, so `?? ''` and `?? 'Claude Code account'` are unreachable. Also
  drop the `id.length <= 18` short-circuit on `accountIdShort`: canonical
  UUIDs are always 36 chars, so the branch is dead.
…lable' literal

Background. Anthropic returns `anthropic-ratelimit-unified-fallback: available`
on every successful /v1/messages response. Claude Code v2.1.10's parser
(`HoB` in cli.js, mirrored in extractUnifiedRateLimitInfo.js of the
reverse-engineered source) reads it via `=== "available"`. Our parser
accepted only the magic strings `y` / `yes` / `true`, so the parsed
value was always null and the dashboard's "fallback active" badge never
fired.

The semantic also flips. The header reports whether a degraded-mode
fallback service is reachable, not whether it is currently in use:
`available` is the steady-state good signal, anything else is a warning
that the fallback path is gone too. Renaming the field to
`fallbackAvailable` reflects that, and the dashboard now surfaces a
warning badge only when the gateway has lost the fallback signal.

Adjustments. quota.ts switches the parse to literal-string equality
against `available`; the snapshot field becomes `fallbackAvailable`. The
SPA's `ClaudeCodeQuotaSnapshotData` type and the account card mirror
the rename. Tests cover the steady-state `available` → true,
non-`available` values → false, and absent header → null cases.
The codex catalog mapper hardcoded `enabledFlags: new Set<string>()` on
every UpstreamModel it produced, so flagOverrides configured on a codex
upstream survived the schema validation and SQL round-trip but were
silently dropped before any interceptor could read them.

The fix mirrors the claude-code provider's pattern: resolve the upstream's
effective flag set once at provider construction with
`resolveEffectiveFlags(defaultsForProvider('codex'), [record.flagOverrides])`,
then thread it into `codexRawToUpstreamModel` so every emitted model
carries the resolved set. A dedicated test asserts that turning
`responses-web-search-shim` on at the upstream layer reaches the
produced models.
… fallback)

The block was tagged "pinned to @anthropic-ai/claude-code@2.1.181" but
its body was the v2.1.10 cli.js text — v2.1.181 ships as a Bun-compiled
binary with no plain JS source, and an earlier extraction fell back to
v2.1.10 rather than reading the actual v2.1.181 wire shape.

Captured v2.1.181's real `system[2]` by pointing the binary at a local
401 echo (ANTHROPIC_BASE_URL trick) and reading back the assembled
prompt. v2.1.181 builds the prompt from many conditional sub-templates;
we keep the always-on core (intro + URL warning + # System +
# Doing tasks + # Executing actions with care + # Using your tools +
# Tone and style) and drop the per-session # Environment /
# Session-specific guidance / # Context management / # Text output
paragraphs that depend on cwd, model, output style, and experiment
flags. The captured `# Using your tools` carried the SDK-mode
TaskCreate planner; cli mode falls through to TodoWrite (igm does
[TaskCreate, TodoWrite].find), which is the token we ship.

Detection's IDENTITY_TEMPLATES gains the v2.1.181 "interactive agent"
prefix alongside the older "interactive CLI tool" string so Dice-
template fallback recognises real CC traffic on either side of the
2.1.181 boundary.
Drop the # System / # Doing tasks / # Executing actions with care /
# Using your tools sections from the v2.1.181 wire shape and keep
only the opener line, the two IMPORTANT: safety lines, and the
# Tone and style bullets — the same trimmed expansion sub2api ships
in backend/internal/service/gateway_service.go.

Those four sections are CC-agent-action instructions ("prefer
dedicated tools over Bash", "use TodoWrite to plan", "default to
writing no comments") that would inappropriately steer the model
when a non-CC downstream is on the other end of the re-mimicry path.
Identity block + safety/URL pair + tone is enough structural
material for Anthropic's plan-billing detector — the detector keys
on shape and identity, not on the trimmed sections.

Block size drops from ~9.9 KB to ~1.5 KB. Detection still passes
both old and new identity-prefix variants (matching is against
IDENTITY_BLOCK, not against the template body).
…of swallowing

invalidateClaudeCodeAccessToken previously warn-and-returned when the upstream
row disappeared mid-401-retry. That is the silent-catch-and-return pattern the
project explicitly forbids; flip it to throw so the operator sees the real
condition instead of a downstream 401 mask.

ensureClaudeCodeAccessToken previously synthesized a stateMessage fallback for
non-active accounts via "account is in state 'X'". Tighten the wire contract:
stateMessage is now required on terminal states and must be absent on active.
The discriminated-union split keeps the access-token cache from inventing a
message when it surfaces ClaudeCodeOAuthSessionTerminatedError, and the
fetch.ts non-active synthetic-503 path drops its ternary now that the field is
guaranteed.
…tead of silent defaults

deriveSubscriptionType previously coerced (organization_type=null) to 'pro' and
(organization_type=claude_max, unknown rate_limit_tier) to 'max', and passed
through any unknown organization_type by stripping the 'claude_' prefix. All
three paths silently invented subscription tiers from missing or unrecognized
upstream data; replace them with throws so future Anthropic shape changes fail
loud at ingest time instead of mislabeling accounts.

claudeCodeTokenRequest previously caught JSON parse errors and smuggled the
raw text under a synthetic '_nonJsonBody' key that downstream type checks
ignore. Surface the parse failure directly with the response status and the
raw-text excerpt; the original parse error rides on `cause`.
…inventing defaults

The catalog mapper previously fell back to slug for a missing display_name
and to 0 for missing context_window / max_context_window. A 0 is then
indistinguishable from a real upstream value, and the silent slug
substitution hides a contract change behind plausible-looking output.

Throw on each missing field — mirroring the existing slug check — so an
upstream shape change surfaces at the catalog refresh boundary rather
than seeping into downstream pricing and limits. The fallback chain in
codexRawToUpstreamModel still treats 0 as the "unset" sentinel for the
per-request window, but the path is now rewritten with explicit
comparisons so the null-vs-zero distinction is no longer wrapped in a
||-chain that conflated the two.
…explicitly

Round-1 cleanup: ClaudeCodeAccountCard previously fell back to raw.accounts[0]
when the configured account UUID did not match any state row, silently
showing credentials and quota for an unrelated account. Replace the
opportunistic find-or-fallback with a tagged CredentialLookup union
('present' | 'missing-state' | 'uuid-mismatch') and render an explicit
re-import prompt when the configured UUID is missing from state.

Timestamp parse failures returned the raw ISO string, which let malformed
upstream data render unchanged. Log a warning and substitute an
'(invalid timestamp)' sentinel instead so the operator can see the parse
failed in DevTools.
Replace the `value as Record<string, unknown>` cast in the catalog
asserter with an `isPlainRecord` type predicate; the call site no longer
asserts a shape the surrounding control flow has not already verified.

Drop the `?? {}` fallback in the callResponses / callResponsesCompact
boundary contexts. The provider contract types `headers` as
`Record<string, string> | undefined`, and spreading `undefined` is a
no-op — the fallback was defending against a case the type already
excludes.
… codex)

A throw from readClaudeCodeUpstreamState means the row's state column
was hand-edited or written by a buggier branch. Returning a 500 with
the parse error stitched into the body leaks internal error structure
to the dashboard; letting the throw propagate to the framework-level
500 handler stack-traces internally without leaking shape.
…erter contracts

extractClaudeCodeCallbackParams previously fed any non-protocol-prefixed input
to URLSearchParams as-is. A host-relative paste like
`platform.claude.com/oauth/code/callback?code=…&state=…` would hit that path
and surface a confusing "missing code" error because URLSearchParams treated
the whole string as a single malformed key. Slice at the first `?` so the
post-`?` segment parses as a query, and add a regression test.

The state asserter previously accepted both `undefined` and `null` for
`accessToken` / `quotaSnapshot` and the readClaudeCodeUpstreamState boundary
normalized absent → null. Tighten the wire contract: the asserter now requires
the keys to be present with explicit `null` (or a populated entry), so the
boundary normalization becomes dead code and is dropped. Both writers
(`auth/import.ts`, `access-token-cache.ts`) already always set both fields
explicitly, so this matches the on-disk reality.
Drop four comments that paraphrase the immediately-following code: the
CodexRawModel shape header repeats the field list, the
codexRawToUpstreamModel describe-block preamble repeats what the four
tests beneath it visibly do, and two provider-test comments restate a
helper name and a `toHaveBeenCalledTimes(1)` assertion.
… drift

The codex and claude-code refresh-now / OAuth-ingest handlers had
diverged on three small dimensions: where the single-use PKCE-state
guarantee was documented, whether the asserted state alias or the
raw existing.state was passed to saveState, and what comments
described the failure-state CAS write. Align both to a single
voice and structure so future audits see one OAuth-ingest pattern
instead of two near-identical ones.

Also drop codex's manual 500-with-error-string wrapper around the
state-shape assert. Mirroring claude-code (now letting the throw
propagate), a malformed state column means the row was hand-edited;
the framework-level 500 handler stack-traces internally without
leaking the parse error to the dashboard.
…panels

Round-1 cleanup: across UpstreamRow, UpstreamPicker, UpstreamConfigPanel,
ModelInfoBar, UpstreamEditPage, and the new-upstream loader, the SPA
carried defensive `??`/`||` fallbacks against fields the wire types
already require. Drop the fakes and replace silent default arms with an
exhaustive `assertNever` so a future widening of UpstreamProviderKind
fails the build instead of silently routing through the amber/zinc
catch-all.

The new-upstream loader threw the loaded list/flags through `?? []` and
silently rendered an empty page if the store failed to populate. Throw
through the loader instead so the framework surfaces the error.

UpstreamEditPage's autoForActive had a defensive `azure → []` early
return that was unreachable in practice (the loader does not fetch for
Azure, so `upstreamModels` is empty and the fall-through returns the
same value); drop it.

Add a shared `apps/web/src/utils/assert-never.ts` helper for the
exhaustive checks.
…tion

The Hono entry built a fresh Record<string, string> on every request via
headersToRecord(c.req.raw.headers), then the consumer reconstructed a
Headers instance (`new Headers(opts.clientRequestHeaders)`) for case-
insensitive lookups. The detector reads four headers; building and
re-wrapping the whole map per request was pure allocation churn on the
hot path.

Forward the runtime's own Headers instance through GatewayCtx and the
UpstreamCallOptions wire so lookups stay native; the consumer's wrap
still works because new Headers(...) accepts a Headers init. The wire
type stays Headers | Record<string, string> so synthetic call sites in
tests can keep passing a plain record literal.

Also drop a stale citation in the GatewayCtx interface comment that
referenced a Gemini WebSocket entry that does not exist (createGateway
CtxForWs lives under responses/).
…uotaSnapshot

state.ts previously typed `ClaudeCodeQuotaSnapshotEntry.data` as `unknown` and
fetch.ts re-cast at the consumer (`as ClaudeCodeQuotaSnapshot | null`) plus
non-null asserted afterward. Tighten the wire contract: data is now
`ClaudeCodeQuotaSnapshot` directly, the asserter calls into a new
`assertClaudeCodeQuotaSnapshot` in quota.ts that validates each field, and
`isRateLimitedNow` becomes a type predicate so the resetIso line drops the `!`.

Test fixtures that built the snapshot's `data` ad-hoc now use a `fullQuotaSnapshot`
helper and a regression test pins the inner-shape rejection so future loose
data from the wire fails loud.
…extraction

extractClaudeCodeCallbackParams throws on a malformed callback URL.
Previously the throw fell through the outer try/catch and surfaced
as an opaque 400, with the URL-parse step's role in the data flow
invisible from the read of the function. Wrap the call so the
malformed-URL early-return is visible alongside every other
explicit `{ ok: false }` exit in the helper.
Menci added 14 commits June 19, 2026 18:57
… compute

- New protocols/common/models_test.ts covers resolveEffectivePricing
  (override merge, strips `tiers`, unknown/null/absent tier behavior,
  null base) and the input_cache_write_1h → input_cache_write → input
  fallback chain.
- New Messages stream usage tests cover the cache_creation per-TTL split
  (sub-object takes precedence over the rolled-up flat field), the
  fallback to the flat field when the sub-object is absent, and the
  speed=fast → tier=fast extraction (standard leaves tier unset).
- New aggregate tests verify the per-tier override applies during cost
  compute (Opus 4.8 fast at 2× base costs twice as much) and that an
  unknown tier falls back to base pricing, plus pricing for the new
  input_cache_write_1h dimension via its dedicated rate.
- Responses WebSocket usage extractor now reads service_tier from the
  response object (matching the HTTP path), since it ran a duplicate
  helper that had drifted before this change.
Add `tiers.flex` and `tiers.priority` overlays for every priced Codex slug
so the dashboard's notional cost reflects which OpenAI service tier the
request actually ran on. The gateway already captures
`usage.service_tier` into `TokenUsage.tier`; this commit completes the
loop by giving the cost compute a per-tier rate row to look up.

Tier overrides match OpenAI's public pricing (verified 2026-06-19 against
https://platform.openai.com/docs/pricing):

  gpt-5.5         flex $2.5/$0.25/$15      priority $12.5/$1.25/$75
  gpt-5.4         flex $1.25/$0.13/$7.5    priority $5/$0.5/$30
  gpt-5.4-mini    flex $0.375/$0.0375/$2.25  priority $1.5/$0.15/$9

`codex-auto-review` shares `gpt-5.4`'s pricing including the tier
overrides. Codex CLI's `/fast` toggle writes `service_tier: "priority"`
on the wire (per openai/codex's `ServiceTier::Fast.request_value()`), so
operator-facing rows tagged "fast" cost out at the priority row.

Cache-write rate stays unset on these entries — OpenAI charges cache
creation at the same rate as input, which `unitPriceForDimension`'s
fallback chain already covers.
…provider

# Conflicts:
#	apps/web/src/api/types.ts
#	apps/web/src/components/upstream-edit/UpstreamConfigPanel.vue
#	apps/web/src/components/upstream-edit/UpstreamEditPage.vue
#	apps/web/src/pages/dashboard/upstreams/new.vue
#	packages/gateway/src/control-plane/upstreams/routes.ts
#	packages/gateway/src/control-plane/upstreams/routes_test.ts
#	packages/gateway/src/data-plane/llm/responses/websocket.ts
#	packages/gateway/src/data-plane/llm/shared/gateway-ctx.ts
#	packages/gateway/src/data-plane/llm/shared/gateway-ctx_test.ts
#	packages/gateway/src/repo/sql.ts
Backport the Claude Code refresh-race recovery (commit f1efc9d) to
Codex. Same architecture, same vulnerability:

Under burst load, two workers can both observe a stale access token on
the same Codex upstream and both attempt a refresh. OpenAI rotates the
refresh_token on every successful /oauth/token call, so exactly one
racer wins; the other's request is rejected with `invalid_grant` for
trying to redeem the rotated-out copy. The previous flow treated every
`invalid_grant` as a dead credential and let the caller flip the
account to `refresh_failed` — destroying the working credential a
sibling had just rotated, and forcing the operator to re-import.

On `invalid_grant`, the access-token cache now re-reads upstream state
for the same `chatgpt-account-id` slot and compares the refresh_token
it tried against what is now stored. If they differ, a sibling rotated
and we return their freshly-minted access token (the caller treats it
as a normal cache hit and skips the terminal flip). If they match, we
re-raise the original error so the data-plane / control-plane caller
flips the row as before. The other refresh-terminal codes —
`app_session_terminated`, `invalid_refresh_token`, `invalid_client`,
`unauthorized_client`, `access_denied` — bypass recovery entirely;
none of them are caused by a rotation race.

`CodexOAuthSessionTerminatedError` now carries the raw OAuth `error`
value as a `code` field alongside the existing `upstreamMessage` so the
recovery branch can single out `invalid_grant` from the catch.
`REFRESH_TERMINAL_OAUTH_CODES` is broadened to the audit-aligned set
(`app_session_terminated`, `invalid_grant`, `invalid_refresh_token`,
`invalid_client`, `unauthorized_client`, `access_denied`) — Codex is
OpenAI OAuth, so the list matches sub2api's `isNonRetryableRefreshError`
verbatim.

Sub2api's tryRecoverFromRefreshRace
(backend/internal/service/oauth_refresh_api.go:173-193) is the canonical
pattern; we apply it to Codex's per-account credential here. The token
rotation persistence hook stays awaited (matches Claude); the recovery
branch reads from the just-persisted state via the upstream repo and
returns the sibling's cached access token directly so no second mint
fires from this call site.
…ope)

Adds a third Claude Code credential class alongside the existing full-scope
PKCE flow and credentials.json paste: the Setup Token, a long-lived
(~1 year) inference-only bearer that Anthropic's "Create a Long-Lived
Token" UI issues. Cross-checked against sub2api (BuildSetupTokenAuthorizeURL,
ScopeInference, expires_in=31536000 on exchange) and claude-relay-service
(SCOPES_SETUP, exchangeSetupTokenCode).

Wire-shape differences from the regular OAuth flow:
- Authorize URL: scope narrows to `user:inference` only (no
  org:create_api_key, no user:profile, ...). Host / client_id / redirect_uri
  / response_type are identical.
- Token exchange: body adds expires_in=31536000 to request the 1-year
  bearer. Response has NO refresh_token — the access token IS the credential.
- Profile endpoint: the bearer lacks user:profile scope, so
  /api/oauth/profile 403s. The existing degraded-identity fallback fires
  and surfaces a deterministic accountUuid + null email.

State and lifecycle:
- ClaudeCodeAccountCredential becomes a discriminated union on tokenKind:
  `oauth` carries a non-empty rotating refresh token; `setup-token` carries
  null. Old records (written before this field existed) are normalized to
  `oauth` on read so legacy state passes the strict asserter.
- ensureClaudeCodeAccessToken short-circuits for setup-token: cached
  bearer is returned when fresh, or the row is flipped to refresh_failed
  with a "re-import to recover" message when expired. No upstream call.
- The control-plane refresh-now route rejects setup-token credentials
  with a precise message rather than attempting a refresh that has no
  rotation counterpart.

ClaudeOAuthTokenResponse.refresh_token becomes optional to reflect the
setup-token shape; the OAuth-flow and refresh paths guard explicitly so
shape drift surfaces a clear error instead of corrupting persisted state.

Adds tests for: setup-token authorize URL scope, setup-token exchange body
(expires_in=31536000), setup-token import end-to-end with degraded
identity, setup-token cache short-circuit (fresh + expired paths), state
asserter token-kind discrimination, and legacy default-fill.
Adds three parallel routes for the Setup-Token import flow alongside the
existing OAuth ones:

- POST /api/upstreams/claude-code-setup-token-pkce-start
- POST /api/upstreams/claude-code-setup-token-import
- POST /api/upstreams/:id/claude-code-setup-token-reimport

The PKCE pending store is shared with the regular OAuth flow because only
the authorize URL's `scope` differs server-side at Anthropic; a verifier
issued by either start endpoint can be consumed by either import endpoint
in principle, though the UI keeps them paired so the user-facing scope
intent matches the persisted credential class.

The setup-token import body has no `credentials_json` path (Anthropic's
CLI never persists a setup token) — only the OAuth callback is accepted.
The default display name falls back to the short account UUID when the
degraded-identity path returns null email (the usual case).

claude-code-refresh-now now rejects setup-token credentials with a
precise 400 message; the existing route was previously typed against the
old shape where `refreshToken` was always non-null. It also asserts the
refresh-token presence on the upstream response so shape drift surfaces
loudly instead of corrupting the persisted row.

The control-plane serializer surfaces `tokenKind` on each account
summary so the dashboard can label setup-token accounts distinctly. The
SPA types declare the field as optional to tolerate the pre-deploy
window before the gateway is upgraded.

Adds tests covering: setup-token PKCE start, end-to-end setup-token
import with degraded identity, refresh-now rejection on setup-token,
and setup-token reimport replacing credentials in place.
Three-tab strip in the Claude Code import flow:
- Sign in with Claude — existing full-scope PKCE OAuth.
- Setup Token — new inference-only PKCE flow with a brief callout
  explaining the trade-off (cannot self-mint API keys, cannot be
  refreshed; safer for shared deployments).
- Paste credentials.json — existing manual paste.

Each PKCE-driven tab manages its own in-flight authorize URL session
because the URL's `scope` is locked server-side; switching tabs lazy-
fetches the URL for the newly active tab.

The Account Card shows a "Setup Token" badge next to the subscription
chip (violet tone, uppercase) so an operator scanning the dashboard
immediately sees which credential class a row holds. The "Refresh
token now" button is hidden for setup-token rows because that flow
has no rotation counterpart — re-import is the only renewal path.

Wire-shape: dashboards consume the new `tokenKind` field through the
`ClaudeCodeAccountCredentialSummary` type declared in commit 2; the
field is optional client-side so a dashboard running ahead of a deploy
still parses pre-upgrade responses correctly.
…olate

A cold-start fan-out of N concurrent requests against an empty access-token
cache used to drive N parallel `/v1/oauth/token` POSTs. Anthropic rotates
the refresh token on every call, so only one POST survives; the rest get
`invalid_grant` and fall into `recoverFromRefreshRace` (commit f1efc9d)
which re-reads state and returns the winner's freshly-rotated token. That
is correct but burns N round-trips against the OAuth endpoint per cold
start and adds the recovery re-read to every loser's latency.

Add a module-level `Map<upstreamId, Promise<EnsuredAccessToken>>` that
coalesces concurrent ensures on the same upstream into a single in-flight
promise. Later callers await the existing promise and observe the first
caller's result; the map entry is cleared on settle (both success and
failure) so the next wave is free to mint when the cache expires again.

Scope: per-isolate only. Cloudflare Workers isolates do not share memory,
so siblings on other isolates still race; cross-isolate dedup remains the
job of `recoverFromRefreshRace`. Sub2api gates the same path with a Redis
SETNX lease over a 60s TTL window (gateway_service refresh path via
`oauth_refresh_api.go:91-105`), making the whole cluster single-mint per
refresh window. We trade that for zero coordination dependency at the cost
of cross-isolate-only round-trip duplication — typically the rare case,
since a single isolate usually serves multiple concurrent requests for the
same credential.

`freshlyMinted` semantics widen slightly: coalesced waiters now report
`freshlyMinted: true` even though they did not drive the mint themselves
(they share the minter's EnsuredAccessToken). The 401-retry path still
behaves correctly — the slight semantic shift turns a previously-cheap
"invalidate + re-mint on 401" branch into a "give up on 401" branch for
coalesced peers, which is the safer of the two under a dying-credential
scenario.

Tests cover the 10-parallel-cold-start case (asserts exactly one fetch),
the post-settle map cleanup case, and the rejected-mint propagation case
(every waiter sees the same rejection, map cleared).
… boundary

Anthropic /v1/messages requires `max_tokens` and 422s without it. Real CC
also always emits `temperature: 1`; its absence is one of the CC-shape
fingerprint failures the plan-billing detector keys on. Third-party
clients (cline, aider, custom integrations) routinely omit one or both,
expecting the gateway to fill defaults.

Sub2api (gateway_service.go:1301-1314, rev 4a5665d) backfills both
unconditionally: `max_tokens: 128000`, `temperature: 1`. We mirror the
temperature default verbatim and pick a softer max_tokens policy —
prefer the model's advertised `limits.max_output_tokens`, else fall back
to the gateway-wide MESSAGES_FALLBACK_MAX_TOKENS (8192). Sub2api's
hardcoded 128000 ignores per-model output caps and is not worth
porting.

The new interceptor sits at the head of `claudeCodeMessagesChain` so
the rest of the re-mimicry steps (notably `inject-billing-block`, whose
fingerprint depends on the post-chain wire shape) see a fully-formed
payload.
…org 403 sentinels

Two credential-class body sentinels were leaking through as ordinary
4xxs, leaving the operator to chase phantom request errors while the
gateway burned an OAuth refresh on every retry against a permanently
dead account:

  - 400 invalid_request_error / "organization has been disabled"
    (sub2api ratelimit_service.go:208-214, CRS
    claudeRelayService.js:_isOrganizationDisabledError)
  - 403 permission_error / "OAuth authentication is currently not
    allowed" (CRS claudeRelayService.js:_isOrganizationDisabledError)

Both signal a permanently disabled org that no retry, refresh, or
re-import can recover; the operator must contact Anthropic. The
match runs on the lowercased error.message substring against the
typed (invalid_request_error / permission_error) error shape.

Detection lives in performUpstreamCall's .then(response => ...) block,
alongside the existing quota-snapshot persist. The response body is
cloned before reading so the upstream response still streams to the
caller verbatim — passthrough discipline holds; the terminal flip is
purely additional dashboard signal (the same role persistTerminalState
plays inside access-token-cache's refresh-death path). The persist is
fire-and-forget via the same waitUntil shim the quota persist uses.

Body parse is deliberately defensive: any 400 / 403 whose body is not
JSON, or whose JSON shape does not match {error:{type,message}}, must
not trigger a terminal flip — those are the common case (max_tokens
validation, beta-feature gating, etc.). Skip-when-already-terminal
keeps the first terminal signal stable when a sibling write
(e.g. an OAuth refresh death) won the race.
…-stream

`tokenUsageFromMessagesFrame` only returned the running usage figure on
`message_stop`, leaving the per-frame observer's `state.rememberUsage`
calls a no-op for the in-flight `message_start` and `message_delta`
frames. When the downstream client aborted mid-stream — every Ctrl-C in
the Claude Code CLI — the streaming finally block then recorded a null
or stale usage, so billing telemetry under-counted every aborted session
even though Anthropic already metered the output tokens against the
operator's plan window.

Return the running snapshot on `message_start` and `message_delta` too
so `state.usage` checkpoints as the stream progresses; the finally block
already records whatever was last observed when the client disconnects.

Audited against sub2api (drains upstream to capture final usage even on
client disconnect) — different remedy, same intent: keep recorded usage
aligned with what the upstream actually metered.
A stuck upstream SSE read otherwise pins the request open until the
underlying fetch read timeout (Workers HTTP cap ~100s) or the client
disconnects — neither of which Hono's keepalive scheme catches, because
the downstream ping timer fires whether or not upstream sent anything.

Add a generic `withIdleTimeout` wrapper that races every iterator next()
against a setTimeout, invokes an `onTimeout` callback (to abort the
upstream fetch via the gateway's downstream AbortController), and throws
`UpstreamIdleTimeoutError`. Each received frame resets the window;
Anthropic's own `event: ping` survives parse-sse as a real frame, so the
30s upstream ping cadence keeps healthy streams alive forever.

Wire it into the Messages stream path with a 60s window — 2× headroom
above Anthropic's 30s ping cadence, comfortably below the Workers HTTP
timeout. The thrown error flows through the existing `messagesSseFrames`
catch, which emits a synthetic Messages `event: error` so downstream
clients see a clean failure instead of a hung socket.

Audited against sub2api's `StreamDataIntervalTimeout` (config-driven
30-300s, default off, synthetic error frame on fire); we default on
because a stuck upstream cannot pin a request indefinitely.
…e breakpoint cap

Anthropic's prompt-caching API rejects requests that carry more than
four `cache_control` markers ("the API enforces a maximum of 4 cache
points per request",
https://platform.claude.com/docs/en/build-with-claude/prompt-caching).
The budget is shared across system blocks, tools, and message content.

`inject-default-template` previously appended DEFAULT_TEMPLATE_BLOCK at
`system[2]` with its `cache_control: { type: 'ephemeral', ttl: '5m' }`
unconditionally. Callers that already carry four breakpoints across
their tools / system / messages were therefore pushed to five on the
wire and the request 400'd upstream.

Count caller breakpoints before injection: if adding ours would exceed
the cap, inject the template text without `cache_control` (the three-
block system shape that real CC ships still needs to be there for the
Anthropic plan detector; only the cache anchor is dropped). Caller
overage past the cap is left for the caller to fix — we never synthesize
an error for it.

Mirrors sub2api's `maxCacheControlBlocks = 4` and the count-and-demote
pass in `enforceCacheControlLimit`
(`backend/internal/service/gateway_service.go:71,4581`).
…flake

The control-plane route tests use `setupAppTest` to spin up the full
Hono app + memory D1 mock + admin session per test. Real work is
hundreds of milliseconds, but under workspace-parallel load (workers
contending for CPU, GC pauses) several of them push past vitest's 10s
default and flake intermittently. 30s gives headroom without masking
actual hangs.

Also folds in eslint --fix for the stylistic/function-paren-newline
violations the access-token-cache_test agent left.
@Menci Menci force-pushed the worktree-claude-code-provider branch from c69e72e to ef2061f Compare June 19, 2026 14:33
Menci added 2 commits June 19, 2026 22:49
…ignals

Per the deep observability audit (sub2api parity gap 8 + gaps 1/2/4), the
provider previously emitted three free-form `console.warn`/`console.info`
lines for the entire surface — refresh-race recovery, quota-persist
failure, terminal-sentinel persist failure, identity-degraded fallback,
unknown organization_type — with no event name discipline, no field
discipline, and zero signal for the moments operators most need to grep
for: refresh-token rotation, quota-status transitions, terminal flips.

Add a small `log.ts` helper (snake_case event name + KV fields,
slog-shaped). KV chosen over JSON-line so the lines stay
human-greppable on a worker tail without a JSON pretty-printer; both
runtimes' `console.*` flush them verbatim. Format: `event_name field=value
field2="quoted with spaces"`; numbers and booleans bare; null/undefined
emitted as the literal `null` so the operator sees the field was
considered. All event names prefixed `claude_code_` so a single grep
separates them from anything else in the worker log.

Converts every existing console.* in the provider to a structured event
and adds the four high-value signals that were previously silent:

- `claude_code_refresh_token_rotated` — emitted by the access-token cache
  on every successful refresh round-trip (was silent; only observable
  via after-the-fact `accessToken.refreshedAt` diffs).
- `claude_code_refresh_race_recovered` — replaces the free-form
  console.info; now carries `upstream_id`, `account_uuid`, and the first
  6 chars of the rotated refresh token.
- `claude_code_account_state_flip` — emitted on every persistTerminalState
  call (access-token cache: setup-token expiry, OAuth refresh death)
  and on the terminal-sentinel persist in fetch.ts. Carries `from_state`,
  `to_state`, `reason`, `oauth_code`/`upstream_status`, and the
  operator-facing message. Previously the flip was silent — the only
  signal was a 503 surfacing to the client.
- `claude_code_quota_state_transition` — emitted from persistQuotaSnapshot
  only when the persisted status differs from the prior snapshot, so
  the line count tracks transitions (allowed→rejected, rejected→allowed,
  initial population) rather than every request. Previously, the only
  way to observe the moment of rate-limit onset was to diff
  `state_json.quotaSnapshot.fetchedAt`.
- `claude_code_terminal_sentinel_detected` — emitted when the
  org-disabled-400 or banned-org-403 body sentinel matches, separately
  from the subsequent state-flip event so the detection signal is
  visible even if the persist races and loses.

Plus housekeeping conversions:
- `claude_code_quota_persist_failed`, `claude_code_terminal_sentinel_persist_failed`
  replace the two warn-on-rejection paths in fetch.ts.
- `claude_code_identity_degraded_fallback` and
  `claude_code_unknown_organization_type` replace the two console.warn
  calls in auth/identity.ts.

The per-request "detection fallback" event (gap 9 in the audit) is out
of scope here — it would require either accepting a high-volume per-
request log line or adding a runtime env-getter dependency just for an
opt-in flag. Left for a follow-up.

Includes log_test.ts covering format, whitespace quoting, null/undefined
handling, and the three log-level routes.
…e operator

The control-plane claudeCodeRefreshNow handler does its own /v1/oauth/token
mint + CAS-write, parallel to the data-plane access-token cache. Under a
narrow interleaving, a data-plane request rotates the refresh token between
our read and our mint, the upstream answers our (now stale) RT with
invalid_grant, and the handler flipped the row to refresh_failed and
returned a 400 "Re-import the credential to recover" toast — even though
the credential was alive and freshly rotated by the sibling.

Mirror the data-plane recovery from access-token-cache.ts (commit
f1efc9d): on invalid_grant (or a losing CAS), re-read the row. If a
sibling rotated successfully (state still active, refresh token differs,
fresh access token in cache), surface 200 with the rotated state so the
dashboard shows the refresh as the no-op success it actually was. If the
row is unchanged or now terminal, fall through to the existing terminal-
flip path. Bounded by a `recoveryAllowed` flag for one recovery attempt
per request, matching the data-plane depth guard.

Audit reference: sdd/sub2api-deep-concurrency.md axis #10.
Menci added 2 commits June 20, 2026 02:35
Per the deep admin-routes audit (sub2api-deep-admin-routes.md item #5)
plus the dedicated quota-probe decision (sub2api-deep-quota-probe-decision.md),
real @anthropic-ai/claude-code@2.1.181 calls
GET https://api.anthropic.com/api/oauth/usage as a standalone request to
read the unified rate-limit snapshot without burning a model call. The
binary's string table carries the literal "fetchUtilization: GET
/api/oauth/usage" plus a response decoder that consumes
{five_hour, seven_day, seven_day_sonnet, ...} each as
{utilization, resets_at} plus optional overage fields.

Mirror this in the gateway as POST /api/upstreams/:id/probe-quota:

- Add fetchClaudeCodeUsageProbe(accessToken, fetcher) helper in
  packages/provider-claude-code/src/auth/usage-probe.ts. Sends the same
  oauth-2025-04-20 anthropic-beta header sub2api pins (without it the
  upstream 401s even on a valid bearer). Returns body verbatim — we do
  NOT shape-check the inner JSON because Anthropic adds fields
  (priorIsUsingOverage, hadPriorUtilizationData, ...) without warning,
  and a strict parser would reject a perfectly usable new field.
- Add control-plane route claudeCodeProbeQuota in
  packages/gateway/src/control-plane/upstreams/routes.ts. Mints the
  access token via ensureClaudeCodeAccessToken (the same code path the
  data plane uses, so refresh races and terminal flips are handled
  identically). Routes through the per-upstream proxy fetcher so the
  probe lands through the same chain as data-plane requests. Returns
  the upstream body spread at the top level + a gateway-stamped
  fetched_at ISO timestamp.
- Persist the probe response into a new ClaudeCodeAccountCredential
  field `usageProbeSnapshot` so the dashboard's quota card sees the
  freshest snapshot without re-probing. Distinct slot from the existing
  `quotaSnapshot` (which is header-derived from the data plane) because
  the wire shapes are completely different — header-derived
  ClaudeCodeQuotaSnapshot has a fixed schema, the live probe is
  free-form JSON that evolves on Anthropic's schedule.
- state.ts adds the new slot with a legacy-default normalization in
  readClaudeCodeUpstreamState so records written before this commit
  still pass the strict-key gate; the asserter requires an explicit
  `null` from new writes. State persistence is best-effort: a CAS loss
  to a concurrent rotation is fine because the live response already
  returned to the operator regardless.
- Audit log: emit the brief's `claude_code_admin_action` structured
  event (action: 'quota_probe') on both success and error. The event
  reuses the provider's existing slog-shaped logger (log.ts is now
  re-exported from the package index so the gateway can emit the same
  KV-formatted lines as the provider).

Non-claude-code upstreams 404; the probe semantics are claude-code-specific.
The probe-quota route (added in the prior commit) lets an operator pull a
live /api/oauth/usage snapshot without burning a model call. Surface it as
a Refresh button on the account card and reconcile the two snapshot
sources the credential now carries:

- quotaSnapshot is header-derived; written by the data plane on every
  /v1/messages response.
- usageProbeSnapshot is the verbatim probe body; written by the new
  operator-driven route.

The two cover the same 5h/7d windows under different field names and
shapes. Render whichever source is newer per window (probe right after
the operator hits Refresh; headers right after any real model call). The
probe-only 7-day Sonnet window shows up only when the probe has run.
Each window chip labels its source so the operator knows which axis the
percentage came from.

Plumbing: ClaudeCodeAccountCard emits `refresh-quota`;
ClaudeCodeConfigPanel handles the POST and emits `quota-refreshed` with
a locally-merged record (the route returns the upstream body verbatim,
not the persisted summary, so we mirror the gateway's persistence shape
on the client). The new event is distinct from `imported` because the
existing handler navigates / re-keys the loader, which is the wrong
response for a non-mutating data refresh. UpstreamEditPage lands the
updated record on `liveRecord` so the card re-renders without a list
reload, and the panel's `:record` binding switches from `props.record`
to `liveRecord` so quota-refresh mutations actually reach the card.

Raw probe extras get their own disclosure for fields outside the three
known windows (priorIsUsingOverage, hadPriorUtilizationData, ...), so a
schema addition on Anthropic's side surfaces without a dashboard change.
@Menci Menci force-pushed the worktree-claude-code-provider branch from 94d5363 to 739434b Compare June 19, 2026 18:48
@Menci Menci marked this pull request as draft June 19, 2026 19:29
Menci added 2 commits June 20, 2026 03:43
Per the "杜绝假鲁棒" rule: defensive checks that the type system already
guarantees, or that the only call site contractually rules out, are dead
weight and should be removed so real failures stay loud.

- detection.ts: the inbound-pathname short-circuit (`if (input.pathname !==
  '/v1/messages') return true;`) is unreachable. `callMessages` is the only
  caller and the catalog declares `endpoints: { messages: {} }`; every other
  surface returns a synthetic 405 before reaching the detector. Drop the
  branch, the `pathname` field on `ClaudeCodeShapedRequestInput`, and the
  `clientRequestPathname` plumbing in `provider.ts:looksShaped`.

- access-token-cache.ts: the `if (account.refreshToken === null) throw …`
  check inside the oauth branch is dead. The `setup-token` early-return
  above narrows TS to the `oauth` arm of the discriminated union, whose
  `refreshToken` is typed `string`. The state asserter enforces it on read.

- ClaudeCodeAccountCard.vue: `typeof raw !== 'object'` is dead — `raw` is
  typed `Record<string, string>` from the schema, so the `!raw` guard
  already covers the only possible falsy case (the `quota.value?.raw`
  optional chain returning undefined).

Tests updated to drop the obsolete `pathname` / `clientRequestPathname`
inputs and the two short-circuit cases that exercised the deleted branch.
…provider

# Conflicts:
#	apps/web/src/api/types.ts
#	apps/web/src/components/models/ModelInfoBar.vue
#	apps/web/src/components/settings/UpstreamRow.vue
#	apps/web/src/components/upstream-edit/CodexConfigPanel.vue
#	apps/web/src/components/upstream-edit/CopilotConfigPanel.vue
#	apps/web/src/components/upstream-edit/CopilotDeviceFlow.vue
#	apps/web/src/components/upstream-edit/ProviderPicker.vue
#	apps/web/src/components/upstream-edit/UpstreamConfigPanel.vue
#	apps/web/src/components/upstream-edit/UpstreamEditPage.vue
#	apps/web/src/components/upstreams/UpstreamPicker.vue
#	packages/gateway/src/control-plane/auth/github-device-flow.ts
#	packages/gateway/src/control-plane/data-transfer/routes.ts
#	packages/gateway/src/control-plane/schemas.ts
#	packages/gateway/src/control-plane/upstreams/proxy-resolution.ts
#	packages/gateway/src/control-plane/upstreams/routes.ts
#	packages/gateway/src/control-plane/upstreams/routes_test.ts
#	packages/gateway/src/data-plane/providers/registry.ts
#	packages/gateway/src/repo/sql.ts
#	packages/provider/src/model.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant