fix(session-server): strip stale hop-by-hop headers when re-emitting proxy response #4
Open
DavidBellamy wants to merge 33 commits into main from
Conversation
…pdating (radixark#890) Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
…adixark#654) Co-authored-by: GuanxingLu <gxlu02@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… CLI args in swe-agent-v2 (radixark#954) Co-authored-by: Shi Dong <shi.dong@radixark.ai>
…#952) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
…adixark#926) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmers >=5.0 (radixark#927) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#948) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…log dtype (radixark#975) Co-authored-by: yueming-yuan <yym022502@gmail.com>
…adixark#974) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…proxy response
build_proxy_response forwards upstream gateway headers into either JSONResponse
or Response. Both re-serialize / re-frame the body:
* JSONResponse runs json.dumps over the parsed content, so whitespace and
unicode-escape behavior may produce a different byte count than the upstream
body had.
* Response may be re-framed by Starlette with chunked transfer encoding.
Forwarding the upstream content-length, transfer-encoding, or content-encoding
in these cases causes a mismatch between declared framing and the bytes
Starlette actually writes. Clients (e.g. Miles's own http_utils.post) then
error with h11 LocalProtocolError 'Too much data for declared Content-Length'
or 'peer closed connection without sending complete message body' and retry.
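The JSONResponse byte-count drift is easy to reproduce with plain json.dumps; a minimal illustration (the literal body below is invented for the demo, not taken from a real gateway response):

```python
import json

# Compact, UTF-8 encoded body as an upstream gateway might emit it.
upstream_body = b'{"text":"caf\xc3\xa9"}'          # 16 bytes
parsed = json.loads(upstream_body)

# Default json.dumps adds a space after ':' and escapes non-ASCII,
# so the re-serialized body is a different length.
re_serialized = json.dumps(parsed).encode()        # b'{"text": "caf\\u00e9"}'

assert len(upstream_body) != len(re_serialized)    # a forwarded Content-Length would lie
```

Any forwarded content-length that described the 16-byte upstream body now disagrees with the 21 bytes actually written, which is exactly the h11 framing error above.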
Observed: on a mock-agent FAST_ITER run with PD disaggregation through a
gateway that serializes merged prefill+decode logprobs, ~200 of 332 chat
completions hit this error before mock retries salvaged training progress.
Strip the three hop-by-hop headers before building the outgoing Response;
Starlette / hyper then recompute the correct framing from the actual body.
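The stripping step can be sketched as a small helper; the name and dict-based header shape here are illustrative, not the actual Miles code:

```python
# Headers whose values describe the *upstream* body's framing. The outgoing
# server recomputes these from the bytes it actually writes, so forwarding
# stale values triggers the h11 mismatch described above.
STALE_FRAMING = {"content-length", "transfer-encoding", "content-encoding"}

def strip_framing_headers(headers: dict) -> dict:
    """Return a copy of `headers` without the three stale framing headers."""
    return {k: v for k, v in headers.items() if k.lower() not in STALE_FRAMING}
```

Matching is case-insensitive because HTTP header names are case-insensitive and upstreams vary in capitalization.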
Mirror of upstream radixark#1004 filed against LLM360/miles so the fix lands in LLM360:deploy via the 15-min auto-rebase without waiting on upstream review.

Problem
`SessionServer.build_proxy_response` in `miles/rollout/session/session_server.py` forwards upstream response headers verbatim into either `JSONResponse` or `Response`. Both re-serialize or re-frame the body:

* `JSONResponse` runs `json.dumps` over the parsed content. Whitespace, unicode escape behavior, or key ordering may produce a different byte count than what the upstream produced.
* `Response` may be re-framed by Starlette with chunked transfer encoding.

Forwarding the upstream `content-length`, `transfer-encoding`, or `content-encoding` in these cases causes a mismatch between the declared framing and the bytes Starlette actually writes. Clients (e.g. Miles's own `http_utils.post`) then error with `h11._util.LocalProtocolError: Too much data for declared Content-Length` or `peer closed connection without sending complete message body (received 0 bytes, expected N)` and retry.

Fix
One-hunk change in `build_proxy_response`: strip the three hop-by-hop headers from `result["headers"]` before passing them to the outgoing `Response`. Starlette/hyper then compute `content-length` from the actual body they write. Mirrors what `do_proxy` already does on the incoming request path.
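Putting the problem and the fix together, a before/after sketch (the `result` dict shape and values below are hypothetical, chosen only to make the mismatch concrete):

```python
import json

HOP_BY_HOP = {"content-length", "transfer-encoding", "content-encoding"}

# Hypothetical upstream result as build_proxy_response might see it:
# a compact 16-byte body, with the upstream's own Content-Length.
result = {
    "headers": {"Content-Type": "application/json", "Content-Length": "16"},
    "content": {"text": "café"},
}

# What a JSONResponse-style re-serialization actually writes (21 bytes:
# default json.dumps escapes 'é' and adds a space after ':').
body = json.dumps(result["content"]).encode()

# Before the fix: the forwarded header disagrees with the real body.
assert int(result["headers"]["Content-Length"]) != len(body)

# After the fix: strip the stale framing headers; the outgoing server
# (Starlette in the PR) recomputes content-length from the actual body.
headers = {k: v for k, v in result["headers"].items()
           if k.lower() not in HOP_BY_HOP}
headers["content-length"] = str(len(body))
assert int(headers["content-length"]) == len(body)
```

The non-framing headers (here `Content-Type`) pass through untouched, which is the behavior the PR preserves.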