Skip to content

feat(provider): plumb waitUntil + inbound request headers through UpstreamCallOptions#62

Open
Menci wants to merge 5 commits into
mainfrom
precursor-provider-interface-plumbing
Open

feat(provider): plumb waitUntil + inbound request headers through UpstreamCallOptions#62
Menci wants to merge 5 commits into
mainfrom
precursor-provider-interface-plumbing

Conversation

@Menci

@Menci Menci commented Jun 19, 2026

Copy link
Copy Markdown
Owner

Summary

One of four parallel precursor PRs splitting cross-cutting infrastructure out of #60 (claude-code subscription provider).

  • UpstreamCallOptions.waitUntil — fire-and-forget promise registration that survives past the response. Providers use it for post-response persistence (quota snapshots, token rotation). Maps to ExecutionContext.waitUntil on Workers; no-op on Node.
  • UpstreamCallOptions.clientRequestHeaders + clientRequestPathname — inbound HTTP request inspection so providers can detect shaped-passthrough patterns. Set only when the call originates from a real Hono request; absent on translated/synthetic paths.
  • Codex synthetic 405 for unsupported endpoints — /responses is the only surface; any other call (chat-completions / messages / embeddings / images) returns a clean 405 envelope instead of leaking a stack trace.

Test plan

  • typecheck clean
  • full workspace test pass
  • lint clean

Follow-ups in sibling precursors / claude-code PR

  • PR2 (response surface) consumes `waitUntil` for streaming + uses these UpstreamCallOptions fields verbatim.
  • PR3 (control-plane endpoints) consumes nothing from this PR but uses `clientRequestHeaders` indirectly via the providers it instantiates.
  • PR4 (tier-aware billing) is independent.
  • feat(provider): Claude Code subscription provider #60 (claude-code) is the heaviest consumer — uses `waitUntil` for quota snapshot / token rotation, `clientRequestHeaders` for shaped-passthrough detection.

Menci added 3 commits June 20, 2026 00:39
…CallOptions

Add optional `clientRequestHeaders` and `clientRequestPathname` to
`UpstreamCallOptions` and capture them once on GatewayCtx from the inbound
Hono request. Each `attempt.ts` consumes them via a new
`buildUpstreamCallOptions(candidate, ctx, recorder)` helper so the build is
uniform across messages / responses / chat-completions and across the
generate / count-tokens / compact entry points.

The captured Headers is the native `Headers` instance the runtime gave us,
not a snapshot — header lookups on the hot path stay native. The
UpstreamCallOptions field accepts either `Headers` or a plain
`Record<string, string>` so synthetic test sites can pass a record without
boxing.

Synthetic ctxs (in-process tests, translation chains that never originated
as a real HTTP request) leave both fields undefined. Consumers that key on
the inbound request shape (e.g. native-client shaped-passthrough detection)
must treat undefined as "no inbound context, do not shortcut".
…and-forget persistence

Add an optional `waitUntil` slot on `UpstreamCallOptions` so providers can
register fire-and-forget promises that need to outlive the response. Reuses
the existing per-request backgroundScheduler — they share the runtime
primitive (`ExecutionContext.waitUntil` on workerd, a no-op on Node) and the
per-call name reads more naturally at the provider boundary.

Wire it through the three places that build `UpstreamCallOptions`: the
attempt-layer helper (messages / responses / chat-completions), the
passthrough-serve path (embeddings / images / count_tokens), and the
inline construction inside the responses-side image-generation server tool.

Codex's transport is the first consumer. Three best-effort state writes —
the quota snapshot on 2xx, the quota snapshot on 429, and the freshly-minted
access token on the 401-retry path — used to rely on a bare
`.catch(() => {})` fire-and-forget. On workerd a fire-and-forget promise is
cancelled the instant the streaming response returns to the client, so the
write could be dropped mid-flight and the next request would re-mint or
re-race the upstream. Hand each write to the runtime's waitUntil slot via
a single `registerBackgroundWrite` helper. The error swallow stays (these
writes are recoverable noise — CAS losses to concurrent rotations,
transient storage failures — and the request must not crash on them), but
the swallow now happens inside a lifetime the runtime promises us.

The slot is optional on UpstreamCallOptions because synthetic call sites
(in-process tests, internal compaction) legitimately omit it; the optional
chain in `registerBackgroundWrite` degrades cleanly for those.
…king stack trace

The Codex upstream only exposes /responses; the catalog mapper advertises
that single endpoint and getProvidedModels is the only place the data
plane learns what surfaces are reachable. Even so, a stray dispatch from
an upstream routing bug would land in one of the stub `call*` methods on
the provider — and the stubs rejected with a bare `Error`, which escaped
the boundary as a 500 with a raw stack trace.

Replace each stub with a synthetic 405 carrying the gateway's standard
JSON error envelope (`{ error: { type, message } }`). Each call still
flows through the per-call `recordUpstreamLatency` recorder to honour
the wrap-once contract that every boundary-facing response must satisfy,
including synthetic ones that never touch the network.

The existing throw-asserting test is replaced with one that exercises
every unsupported surface (now including callMessages and
callChatCompletions, which the previous assertion missed) and verifies
the 405 envelope.
@Menci Menci changed the title Provider: thread waitUntil + clientRequestHeaders + path through UpstreamCallOptions; codex synthetic 405 feat(provider): plumb waitUntil + inbound request headers through UpstreamCallOptions Jun 19, 2026
Menci added 2 commits June 20, 2026 03:17
Reviewed the 3-commit precursor diff as a single final state and
removed the scaffolding that crept in:

- UpstreamCallOptions: drop the speculative `Headers | Record<string, string>`
  union on clientRequestHeaders (no caller passes a Record) and the matching
  JSDoc block; trim the per-field preamble to one short paragraph per field.
  Make `waitUntil` required — every production callsite supplies it via
  ctx.backgroundScheduler or its local scheduler, and the test stub already
  passes a no-op, so the optional chain in codex/fetch.ts defended a case
  that cannot occur.
- codex/fetch.ts: collapse the 9-line comment on registerBackgroundWrite to
  one line and drop the now-unneeded `?.()` on opts.call.waitUntil.
- codex/provider.ts: delete the duplicate rationale comment on synthetic405
  (the call-site cluster already explains the why).
- gateway-ctx.ts: trim the clientRequest* block and inline the one-shot
  `new URL(c.req.url).pathname` instead of allocating a URL twice.
- attempt-helpers.ts: trim the 9-line preamble on buildUpstreamCallOptions
  and drop the conditional spread — straight assignment of an optional
  source to an optional target is equivalent.
- test-utils/stubs.ts: drop the redundant `waitUntil discards the promise`
  sentence; the `() => {}` is self-explanatory.
- codex/fetch_test.ts: enforcingRecorder now supplies waitUntil to match
  the tightened UpstreamCallOptions shape.

Also fixes a preexisting import-order lint error in responses/websocket_test.ts
that was inherited from the branch base (origin/main has the fixed order
in a commit our base predates); reordering here keeps `pnpm run lint`
green for this PR.
The `clientRequestPathname` field on `UpstreamCallOptions` was added in
aacceae anticipating a consumer that keys on the inbound pathname (e.g.
native-client shaped-passthrough detection). No provider in this PR reads
it, and the would-be downstream consumer in the Claude Code provider is
out of scope here. Carrying a field with no consumer obscures the
interface and invites speculative wiring elsewhere — drop it now and add
it back the day a real callsite needs it.

Removes the field from `UpstreamCallOptions`, the matching `GatewayCtx`
mirror, both gateway-ctx constructors, and the projection in
`buildUpstreamCallOptions`. `clientRequestHeaders` stays because the
upstream provider transports do consult it (Copilot's
shaped-passthrough leg in particular).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant