fix(session-server): strip stale hop-by-hop headers when re-emitting proxy response#4

Open
DavidBellamy wants to merge 33 commits into main from
fix/session-server-strip-stale-content-length-clean

Conversation

@DavidBellamy
Collaborator

Mirror of upstream radixark#1004 filed against LLM360/miles so the fix lands in LLM360:deploy via the 15-min auto-rebase without waiting on upstream review.

Problem

SessionServer.build_proxy_response in miles/rollout/session/session_server.py forwards upstream response headers verbatim into either JSONResponse or Response. Both re-serialize or re-frame the body:

  • JSONResponse runs json.dumps over the parsed content. Whitespace, unicode escape behavior, or key ordering may produce a different byte count than what the upstream produced.
  • Response may be re-framed by Starlette with chunked transfer encoding.

Forwarding the upstream content-length, transfer-encoding, or content-encoding in these cases causes a mismatch between the declared framing and the bytes Starlette actually writes. Clients (e.g. Miles's own http_utils.post) then error with h11._util.LocalProtocolError ("Too much data for declared Content-Length") or "peer closed connection without sending complete message body (received 0 bytes, expected N)" and retry.
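The JSONResponse byte-count drift is easy to reproduce in isolation. A minimal sketch (the body and the dumps options here are illustrative, not what the gateway actually emits): a plain json.loads/json.dumps round-trip changes whitespace and unicode escaping, so the re-serialized length no longer matches the upstream bytes that content-length described.

```python
import json

# Illustrative upstream bytes: compact separators, non-ASCII kept raw in UTF-8.
upstream_body = '{"text":"héllo","logprob":-0.25}'.encode("utf-8")

# Re-serializing the parsed content (as JSONResponse does, though its exact
# dumps options differ) alters whitespace and escaping, so the byte count
# changes even though the JSON value is identical.
reframed = json.dumps(json.loads(upstream_body)).encode("utf-8")

print(len(upstream_body), len(reframed))  # the two lengths differ
```

If the upstream content-length is forwarded alongside the reframed body, the client's HTTP parser sees more (or fewer) bytes than declared, which is exactly the h11 failure mode described above.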

Fix

One-hunk change in build_proxy_response: strip the three hop-by-hop headers from result["headers"] before passing them to the outgoing Response. Starlette/hyper then compute content-length from the actual body they write. Mirrors what do_proxy already does on the incoming request path.
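The filtering step can be sketched as a small case-insensitive filter over the upstream header dict before it is handed to the outgoing Response. This is an illustrative helper, not the actual patch; the function name and the plain-dict header shape are assumptions:

```python
# Framing headers that go stale once the body is re-serialized or re-framed;
# the response layer should recompute them from the bytes it actually writes.
STALE_FRAMING_HEADERS = frozenset(
    {"content-length", "transfer-encoding", "content-encoding"}
)

def strip_stale_framing_headers(headers: dict) -> dict:
    """Return a copy of `headers` without the stale framing entries,
    matched case-insensitively (HTTP header names are case-insensitive)."""
    return {
        k: v for k, v in headers.items() if k.lower() not in STALE_FRAMING_HEADERS
    }

print(strip_stale_framing_headers(
    {"Content-Length": "128", "Content-Type": "application/json"}
))  # {'Content-Type': 'application/json'}
```

With the filtered dict passed to JSONResponse or Response, Starlette sets content-length itself from the body it serializes, so declared framing and written bytes agree by construction.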

JD-ETH and others added 30 commits April 5, 2026 13:51
…pdating (radixark#890)

Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
…adixark#654)

Co-authored-by: GuanxingLu <gxlu02@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… CLI args in swe-agent-v2 (radixark#954)

Co-authored-by: Shi Dong <shi.dong@radixark.ai>
…#952)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
…adixark#926)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmers >=5.0 (radixark#927)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#948)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…log dtype (radixark#975)

Co-authored-by: yueming-yuan <yym022502@gmail.com>
maocheng23 and others added 3 commits April 15, 2026 11:51
…adixark#974)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…proxy response

build_proxy_response forwards upstream gateway headers into either JSONResponse
or Response. Both re-serialize / re-frame the body:

  * JSONResponse runs json.dumps over the parsed content, so whitespace and
    unicode escape behavior may produce a different byte count than the upstream
    did.
  * Response may be re-framed by Starlette with chunked transfer encoding.

Forwarding the upstream content-length, transfer-encoding, or content-encoding
in these cases causes a mismatch between declared framing and the bytes
Starlette actually writes. Clients (e.g. Miles's own http_utils.post) then
error with h11 LocalProtocolError 'Too much data for declared Content-Length'
or 'peer closed connection without sending complete message body' and retry.

Observed: on a mock-agent FAST_ITER run with PD disaggregation through a
gateway that serializes merged prefill+decode logprobs, ~200 of 332 chat
completions hit this error before mock retries salvaged training progress.

Strip the three hop-by-hop headers before building the outgoing Response;
Starlette / hyper then recompute the correct framing from the actual body.