feat(provider): plumb waitUntil + inbound request headers through UpstreamCallOptions#62
Open
Menci wants to merge 5 commits into
Open
feat(provider): plumb waitUntil + inbound request headers through UpstreamCallOptions#62Menci wants to merge 5 commits into
Menci wants to merge 5 commits into
Conversation
…CallOptions Add optional `clientRequestHeaders` and `clientRequestPathname` to `UpstreamCallOptions` and capture them once on GatewayCtx from the inbound Hono request. Each `attempt.ts` consumes them via a new `buildUpstreamCallOptions(candidate, ctx, recorder)` helper so the build is uniform across messages / responses / chat-completions and across the generate / count-tokens / compact entry points. The captured Headers is the native `Headers` instance the runtime gave us, not a snapshot — header lookups on the hot path stay native. The UpstreamCallOptions field accepts either `Headers` or a plain `Record<string, string>` so synthetic test sites can pass a record without boxing. Synthetic ctxs (in-process tests, translation chains that never originated as a real HTTP request) leave both fields undefined. Consumers that key on the inbound request shape (e.g. native-client shaped-passthrough detection) must treat undefined as "no inbound context, do not shortcut".
…and-forget persistence
Add an optional `waitUntil` slot on `UpstreamCallOptions` so providers can
register fire-and-forget promises that need to outlive the response. Reuses
the existing per-request backgroundScheduler — they share the runtime
primitive (`ExecutionContext.waitUntil` on workerd, a no-op on Node) and the
per-call name reads more naturally at the provider boundary.
Wire it through the three places that build `UpstreamCallOptions`: the
attempt-layer helper (messages / responses / chat-completions), the
passthrough-serve path (embeddings / images / count_tokens), and the
inline construction inside the responses-side image-generation server tool.
Codex's transport is the first consumer. Three best-effort state writes —
the quota snapshot on 2xx, the quota snapshot on 429, and the freshly-minted
access token on the 401-retry path — used to rely on a bare
`.catch(() => {})` fire-and-forget. On workerd a fire-and-forget promise is
cancelled the instant the streaming response returns to the client, so the
write could be dropped mid-flight and the next request would re-mint or
re-race the upstream. Hand each write to the runtime's waitUntil slot via
a single `registerBackgroundWrite` helper. The error swallow stays (these
writes are recoverable noise — CAS losses to concurrent rotations,
transient storage failures — and the request must not crash on them), but
the swallow now happens inside a lifetime the runtime promises us.
The slot is optional on UpstreamCallOptions because synthetic call sites
(in-process tests, internal compaction) legitimately omit it; the optional
chain in `registerBackgroundWrite` degrades cleanly for those.
…king stack trace
The Codex upstream only exposes /responses; the catalog mapper advertises
that single endpoint and getProvidedModels is the only place the data
plane learns what surfaces are reachable. Even so, a stray dispatch from
an upstream routing bug would land in one of the stub `call*` methods on
the provider — and the stubs rejected with a bare `Error`, which escaped
the boundary as a 500 with a raw stack trace.
Replace each stub with a synthetic 405 carrying the gateway's standard
JSON error envelope (`{ error: { type, message } }`). Each call still
flows through the per-call `recordUpstreamLatency` recorder to honour
the wrap-once contract that every boundary-facing response must satisfy,
including synthetic ones that never touch the network.
The existing throw-asserting test is replaced with one that exercises
every unsupported surface (now including callMessages and
callChatCompletions, which the previous assertion missed) and verifies
the 405 envelope.
Reviewed the 3-commit precursor diff as a single final state and
removed the scaffolding that crept in:
- UpstreamCallOptions: drop the speculative `Headers | Record<string, string>`
union on clientRequestHeaders (no caller passes a Record) and the matching
JSDoc block; trim the per-field preamble to one short paragraph per field.
Make `waitUntil` required — every production callsite supplies it via
ctx.backgroundScheduler or its local scheduler, and the test stub already
passes a no-op, so the optional chain in codex/fetch.ts defended a case
that cannot occur.
- codex/fetch.ts: collapse the 9-line comment on registerBackgroundWrite to
one line and drop the now-unneeded `?.()` on opts.call.waitUntil.
- codex/provider.ts: delete the duplicate rationale comment on synthetic405
(the call-site cluster already explains the why).
- gateway-ctx.ts: trim the clientRequest* block and inline the one-shot
`new URL(c.req.url).pathname` instead of allocating a URL twice.
- attempt-helpers.ts: trim the 9-line preamble on buildUpstreamCallOptions
and drop the conditional spread — straight assignment of an optional
source to an optional target is equivalent.
- test-utils/stubs.ts: drop the redundant `waitUntil discards the promise`
sentence; the `() => {}` is self-explanatory.
- codex/fetch_test.ts: enforcingRecorder now supplies waitUntil to match
the tightened UpstreamCallOptions shape.
Also fixes a preexisting import-order lint error in responses/websocket_test.ts
that was inherited from the branch base (origin/main has the fixed order
in a commit our base predates); reordering here keeps `pnpm run lint`
green for this PR.
The `clientRequestPathname` field on `UpstreamCallOptions` was added in aacceae anticipating a consumer that keys on the inbound pathname (e.g. native-client shaped-passthrough detection). No provider in this PR reads it, and the would-be downstream consumer in the Claude Code provider is out of scope here. Carrying a field with no consumer obscures the interface and invites speculative wiring elsewhere — drop it now and add it back the day a real callsite needs it. Removes the field from `UpstreamCallOptions`, the matching `GatewayCtx` mirror, both gateway-ctx constructors, and the projection in `buildUpstreamCallOptions`. `clientRequestHeaders` stays because the upstream provider transports do consult it (Copilot's shaped-passthrough leg in particular).
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
One of four parallel precursor PRs splitting cross-cutting infrastructure out of #60 (claude-code subscription provider).
UpstreamCallOptions.waitUntil— fire-and-forget promise registration that survives past the response. Providers use it for post-response persistence (quota snapshots, token rotation). Maps toExecutionContext.waitUntilon Workers; no-op on Node.UpstreamCallOptions.clientRequestHeaders+clientRequestPathname— inbound HTTP request inspection so providers can detect shaped-passthrough patterns. Set only when the call originates from a real Hono request; absent on translated/synthetic paths./responsesis the only surface; any other call (chat-completions / messages / embeddings / images) returns a clean 405 envelope instead of leaking a stack trace.Test plan
Follow-ups in sibling precursors / claude-code PR