feat(gateway): forward upstream response headers (anthropic-ratelimit-* + request-id + cf-ray) #64

Open
Menci wants to merge 2 commits into main from precursor-response-surface

Menci (Owner) commented Jun 19, 2026

Summary

Forward an allowlist of upstream response headers to downstream LLM-response clients across the Messages / Chat / Responses / Gemini surfaces: any header matching the anthropic-ratelimit-* prefix, plus the exact names request-id, x-request-id, and cf-ray.

Claude Code's /status indicator reads the anthropic-ratelimit-* headers to render quota state; operator support tickets need request-id and cf-ray to correlate upstream failures.

Extends the ProviderStreamResult.ok:true shape with a required responseHeaders: Headers field, so every provider must consciously populate it from the upstream Response. Roughly 36 test fixtures are updated workspace-wide.

Test plan

  • typecheck + test + lint clean

Menci added 2 commits June 20, 2026 00:38
…sponse headers across LLM surfaces

Real Claude Code's `/status` indicator reads anthropic-ratelimit-unified-*
headers off every /v1/messages response. When CC is pointed at our gateway,
those headers must reach the downstream client untouched or the status bar
shows nothing. Operator support tickets and live debugging additionally
need request-id / x-request-id (Anthropic / OpenAI vendor traces) and
cf-ray (Cloudflare edge trace) to flow through verbatim.

Adds a prefix allowlist (`anthropic-ratelimit-`) plus an exact-name
allowlist (`request-id`, `x-request-id`, `cf-ray`) on the response
composer; any ratelimit dimensions the upstream later introduces (e.g. a
hypothetical `anthropic-ratelimit-tier-*`) are forwarded automatically without
touching the composition logic.
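
A minimal sketch of the two-part allowlist check described above. The
allowlist values come from this commit message; the identifiers
(`FORWARDED_PREFIXES`, `FORWARDED_EXACT_NAMES`, `isForwardedHeader`) are
hypothetical and may not match the names used in the actual composer.

```typescript
// Allowlist values are from the commit message; all identifiers are hypothetical.
const FORWARDED_PREFIXES: readonly string[] = ["anthropic-ratelimit-"];
const FORWARDED_EXACT_NAMES: ReadonlySet<string> = new Set([
  "request-id",
  "x-request-id",
  "cf-ray",
]);

// Header names are case-insensitive, so compare in lowercase.
function isForwardedHeader(name: string): boolean {
  const lower = name.toLowerCase();
  return (
    FORWARDED_EXACT_NAMES.has(lower) ||
    FORWARDED_PREFIXES.some((prefix) => lower.startsWith(prefix))
  );
}
```

Because the ratelimit family is matched by prefix, a new
`anthropic-ratelimit-…` header passes the check with no code change, while
the exact-name set keeps trace headers from matching lookalikes such as
`x-request-id-extra`.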

Two helpers cover both response shapes: `forwardUpstreamHeaders` stages
the allowlisted entries onto the Hono context so `streamSSE` emits them;
`mergeForwardedUpstreamHeaders` builds a `HeadersInit` for the
non-streaming `Response.json` path. Both accept `undefined` so callers
can pass `result.headers` directly. Wired into all four LLM surfaces
(Messages / Chat Completions / Responses / Gemini).
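
The two helpers might look roughly like the sketch below. Only the helper
names and the allowlist come from this commit message; the signatures, the
`HeaderSink` interface (a stand-in for the part of the Hono context used
here, which Hono's `Context` is assumed to satisfy through its
`header(name, value)` method), the `base` parameter, and the `allowed`
predicate are assumptions made for illustration.

```typescript
// Hypothetical stand-in for the Hono context; only `header` is needed here.
interface HeaderSink {
  header(name: string, value: string): void;
}

// Hypothetical predicate for the allowlist from the commit message.
// `Headers` iteration yields lowercase names, so no extra normalization here.
const allowed = (name: string): boolean =>
  name.startsWith("anthropic-ratelimit-") ||
  ["request-id", "x-request-id", "cf-ray"].includes(name);

// Streaming path: stage allowlisted headers on the context before the
// SSE stream starts. `undefined` is accepted and is a no-op.
function forwardUpstreamHeaders(c: HeaderSink, upstream: Headers | undefined): void {
  upstream?.forEach((value, name) => {
    if (allowed(name)) c.header(name, value);
  });
}

// Non-streaming path: build a plain header record (usable as `HeadersInit`)
// that combines the caller's own headers with the allowlisted upstream ones.
function mergeForwardedUpstreamHeaders(
  upstream: Headers | undefined,
  base: Record<string, string> = {},
): Record<string, string> {
  const merged: Record<string, string> = { ...base };
  upstream?.forEach((value, name) => {
    if (allowed(name)) merged[name] = value;
  });
  return merged;
}
```

Under these assumptions a call site passes `result.headers` straight
through, for example
`Response.json(body, { headers: mergeForwardedUpstreamHeaders(result.headers) })`,
without checking for `undefined` first.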

`EventResult.headers` is the field providers populate from the upstream
Response so the source-side `respond` layer can read them; the
provider-side plumbing lands as part of the broader provider rework.

…t so respond layer forwards them

The header-forwarding helpers added in commit f3446be read
`result.headers` at 8 respond.ts call sites, but no provider was
populating the field — every streaming success funneled `undefined` to
`eventResult`. Without the wire, the allowlist (`anthropic-ratelimit-*`,
`request-id`, `x-request-id`, `cf-ray`) was dead code on the happy path.

Thread the upstream `Headers` through the streaming-success branch of
`ProviderStreamResult` and propagate it from `streamingProviderCall`
(populated for every provider that goes through it: Copilot, Custom,
Azure, Codex) all the way to the EventResult that `respond` reads.
The single shared `providerStreamResultToExecuteResult` helper is the
seam where every protocol's `attempt` converts ProviderStreamResult
into EventResult, so wiring it once covers Messages, Chat Completions,
Responses, and Gemini (which reaches its upstreams via translation
through the other three).

The field stays optional on `ProviderStreamResult.ok:true`, matching
the same shape on `EventResult`: synthesized streams that have no
upstream Response behind them (e.g. a future Copilot boundary
interceptor that constructs events from a non-wire source) genuinely
have nothing to forward, so the contract reflects that rather than
forcing producers to fabricate an empty `Headers`.
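
As an illustration of the optional-field contract described in this commit
message, the sketch below uses simplified, hypothetical shapes for
`ProviderStreamResult` and `EventResult`, with `toEventResult` standing in
for the shared conversion seam. The real types carry more fields and may be
structured differently.

```typescript
// Simplified, hypothetical shapes; only the `headers` field mirrors the text.
type ProviderStreamResult<E> =
  | { ok: true; events: AsyncIterable<E>; headers?: Headers }
  | { ok: false; status: number; message: string };

type EventResult<E> =
  | { kind: "events"; events: AsyncIterable<E>; headers?: Headers }
  | { kind: "error"; status: number; message: string };

// Stand-in for the single seam where each protocol's `attempt` converts a
// provider result into the result that `respond` reads.
function toEventResult<E>(result: ProviderStreamResult<E>): EventResult<E> {
  if (!result.ok) {
    return { kind: "error", status: result.status, message: result.message };
  }
  // Pass headers through as-is; a synthesized stream simply has none.
  return { kind: "events", events: result.events, headers: result.headers };
}
```

With the field optional, a producer that has no upstream Response omits
`headers`, and the respond-side helpers treat the resulting `undefined` as
nothing to forward.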

Also forwards `headers` through the two existing EventResult rebuild
sites that drop fields by default — the Responses
`canonicalize-encrypted-content` interceptor and the
`responsesAttempt.generate` wrap that mints the stored response id —
so a header that survives the provider boundary survives the inner
chain too.
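
The failure mode at a rebuild site can be sketched as follows. The
`EventResultSketch` shape and both functions are hypothetical and only
illustrate why a rebuild that lists its fields explicitly loses `headers`
unless it names them.

```typescript
// Hypothetical, simplified result shape for illustration only.
interface EventResultSketch {
  events: string[];
  headers?: Headers;
}

// A rebuild that constructs a fresh object from known fields silently
// drops anything it does not list, including `headers`.
function rebuildDropping(result: EventResultSketch): EventResultSketch {
  return { events: result.events.map((e) => e.trim()) };
}

// The fix: carry `headers` across explicitly when rebuilding.
function rebuildForwarding(result: EventResultSketch): EventResultSketch {
  return {
    events: result.events.map((e) => e.trim()),
    headers: result.headers,
  };
}
```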

Tests added:
- Per-protocol `attempt_test.ts`: stub the provider with a known
  Headers fixture and assert it lands on the resulting EventResult
  (Messages, Chat Completions, Responses, Gemini via Chat
  Completions).
- `messages/http_test.ts`: full provider → attempt → respond chain
  for both streaming and non-streaming, asserting allowlisted entries
  reach the outgoing HTTP response and non-allowlisted ones do not.

Menci changed the title from "Gateway response surface: ratelimit/request-id forwarding, partial-stream billing, 60s SSE idle timeout" to "feat(gateway): response header forwarding + SSE idle + partial-stream billing fix" on Jun 19, 2026
Menci force-pushed the precursor-response-surface branch from f5b44bb to 5f83545 on June 19, 2026 18:00
Menci changed the title from "feat(gateway): response header forwarding + SSE idle + partial-stream billing fix" to "feat(gateway): forward upstream response headers (anthropic-ratelimit-* + request-id + cf-ray)" on Jun 19, 2026
Menci force-pushed the precursor-response-surface branch 3 times, most recently from 7c9a7be to 4153be2 on June 19, 2026 20:11