feat(gateway): forward upstream response headers (anthropic-ratelimit-* + request-id + cf-ray)#64
Open
Menci wants to merge 2 commits into
…sponse headers across LLM surfaces

Real Claude Code's `/status` indicator reads anthropic-ratelimit-unified-* headers off every /v1/messages response. When CC is pointed at our gateway, those headers must reach the downstream client untouched or the status bar shows nothing. Operator support tickets and live debugging additionally need request-id / x-request-id (Anthropic / OpenAI vendor traces) and cf-ray (Cloudflare edge trace) to flow through verbatim.

Adds a prefix allowlist (`anthropic-ratelimit-`) plus an exact-name allowlist (`request-id`, `x-request-id`, `cf-ray`) on the response composer; future ratelimit dimensions the upstream introduces (e.g. a future `anthropic-ratelimit-tier-*`) are forwarded automatically without touching the composition logic.

Two helpers cover both response shapes: `forwardUpstreamHeaders` stages the allowlisted entries onto the Hono context so `streamSSE` emits them, `mergeForwardedUpstreamHeaders` builds a `HeadersInit` for the non-streaming `Response.json` path. Both accept `undefined` so callers can pass `result.headers` directly. Wired into all four LLM surfaces (Messages / Chat Completions / Responses / Gemini).

`EventResult.headers` is the field providers populate from the upstream Response so the source-side `respond` layer can read them; the provider-side plumbing lands as part of the broader provider rework.
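The allowlist rule described in this commit message can be sketched as below. This is only an illustration under stated assumptions, not the PR's code: `isForwardable` and `pickForwardedHeaders` are hypothetical names, and the real `forwardUpstreamHeaders` / `mergeForwardedUpstreamHeaders` helpers may be structured differently. Only the prefix and the three exact names come from the description above.

```typescript
// Hypothetical sketch of the prefix + exact-name allowlist.
// Only the allowlisted names are taken from the PR text; the helper
// names and return shape are assumptions made for illustration.
const FORWARDED_PREFIXES: readonly string[] = ["anthropic-ratelimit-"];
const FORWARDED_EXACT: ReadonlySet<string> = new Set([
  "request-id",
  "x-request-id",
  "cf-ray",
]);

// Header names are case-insensitive, so compare in lowercase.
function isForwardable(name: string): boolean {
  const lower = name.toLowerCase();
  return (
    FORWARDED_EXACT.has(lower) ||
    FORWARDED_PREFIXES.some((prefix) => lower.startsWith(prefix))
  );
}

// Accepts `undefined` so a caller can pass an optional headers field directly.
function pickForwardedHeaders(
  upstream: Headers | undefined,
): Record<string, string> {
  const out: Record<string, string> = {};
  if (!upstream) return out;
  // Headers iteration yields lowercase names.
  upstream.forEach((value, name) => {
    if (isForwardable(name)) out[name] = value;
  });
  return out;
}
```

Because the rule is prefix-based, a header such as the hypothetical future `anthropic-ratelimit-tier-*` would pass the filter without any change to the list, which is the behaviour the commit message describes.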
…t so respond layer forwards them

The header-forwarding helpers added in commit f3446be read `result.headers` at 8 respond.ts call sites, but no provider was populating the field: every streaming success funneled `undefined` to `eventResult`. Without the wire, the allowlist (`anthropic-ratelimit-*`, `request-id`, `x-request-id`, `cf-ray`) was dead code on the happy path.

Thread the upstream `Headers` through the streaming-success branch of `ProviderStreamResult` and propagate it from `streamingProviderCall` (populated for every provider that goes through it: Copilot, Custom, Azure, Codex) all the way to the EventResult that `respond` reads. The single shared `providerStreamResultToExecuteResult` helper is the seam where every protocol's `attempt` converts ProviderStreamResult into EventResult, so wiring it once covers Messages, Chat Completions, Responses, and Gemini (which reaches its upstreams via translation through the other three).

The field stays optional on `ProviderStreamResult.ok:true`, matching the same shape on `EventResult`: synthesized streams that have no upstream Response behind them (e.g. a future Copilot boundary interceptor that constructs events from a non-wire source) genuinely have nothing to forward, so the contract reflects that rather than forcing producers to fabricate an empty `Headers`.

Also forwards `headers` through the two existing EventResult rebuild sites that drop fields by default (the Responses `canonicalize-encrypted-content` interceptor and the `responsesAttempt.generate` wrap that mints the stored response id) so a header that survives the provider boundary survives the inner chain too.

Tests added:
- Per-protocol `attempt_test.ts`: stub the provider with a known Headers fixture and assert it lands on the resulting EventResult (Messages, Chat Completions, Responses, Gemini via Chat Completions).
- `messages/http_test.ts`: full provider → attempt → respond chain for both streaming and non-streaming, asserting allowlisted entries reach the outgoing HTTP response and non-allowlisted ones do not.
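The "optional field threaded through one shared seam" idea from this commit message might look roughly like the following. All type and function names here (`ProviderStreamResultSketch`, `EventResultSketch`, `toEventResultSketch`) are hypothetical stand-ins; the real `ProviderStreamResult`, `EventResult`, and `providerStreamResultToExecuteResult` are not shown in this PR page and almost certainly carry more fields.

```typescript
// Hypothetical shapes: `S` stands for whatever stream type the gateway uses.
// Only the optional `headers` field mirrors what the commit message describes.
type ProviderStreamResultSketch<S> =
  | { ok: true; events: S; headers?: Headers }
  | { ok: false; error: Error };

type EventResultSketch<S> =
  | { type: "events"; events: S; headers?: Headers }
  | { type: "error"; error: Error };

// The single conversion point: copy `headers` across when present,
// and leave the field absent for synthesized streams with no upstream Response.
function toEventResultSketch<S>(
  result: ProviderStreamResultSketch<S>,
): EventResultSketch<S> {
  if (!result.ok) return { type: "error", error: result.error };
  return {
    type: "events",
    events: result.events,
    ...(result.headers ? { headers: result.headers } : {}),
  };
}
```

Keeping the field optional means a producer without an upstream Response simply omits it, and any code that rebuilds a result object has to copy `headers` explicitly, which is why the commit also patches the two rebuild sites.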
Summary

Forward upstream response headers `anthropic-ratelimit-*` (prefix) + `request-id` / `x-request-id` / `cf-ray` (exact-name) to downstream LLM-response clients across Messages / Chat / Responses / Gemini surfaces.

Anthropic's CLI `/status` indicator reads these to render quota state; operator support tickets need `request-id` + `cf-ray` to correlate upstream failures.

Includes the `ProviderStreamResult.ok:true` shape extension (a required `responseHeaders: Headers` field) so every provider must consciously populate from the upstream Response. ~36 test fixtures updated workspace-wide.

Test plan