fix(session-server): strip stale hop-by-hop headers when re-emitting proxy response #4
Open
DavidBellamy wants to merge 33 commits into main from
Conversation
…pdating (radixark#890) Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
…adixark#654) Co-authored-by: GuanxingLu <gxlu02@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… CLI args in swe-agent-v2 (radixark#954) Co-authored-by: Shi Dong <shi.dong@radixark.ai>
…#952) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
…adixark#926) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmers >=5.0 (radixark#927) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#948) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…log dtype (radixark#975) Co-authored-by: yueming-yuan <yym022502@gmail.com>
…adixark#974) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…proxy response
build_proxy_response forwards upstream gateway headers into either JSONResponse
or Response. Both re-serialize / re-frame the body:
* JSONResponse runs json.dumps over the parsed content, so whitespace and
unicode-escape behavior may produce a different byte count than the upstream
body had.
* Response may be re-framed by Starlette with chunked transfer encoding.
Forwarding the upstream content-length, transfer-encoding, or content-encoding
in these cases causes a mismatch between declared framing and the bytes
Starlette actually writes. Clients (e.g. Miles's own http_utils.post) then
error with h11 LocalProtocolError 'Too much data for declared Content-Length'
or 'peer closed connection without sending complete message body' and retry.
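The JSONResponse byte-count drift is easy to reproduce with plain json.dumps; a minimal illustration (the literal body below is invented for the demo, not taken from a real gateway response):

```python
import json

# Compact, UTF-8 encoded body as an upstream gateway might emit it.
upstream_body = b'{"text":"caf\xc3\xa9"}'          # 16 bytes
parsed = json.loads(upstream_body)

# Default json.dumps adds a space after ':' and escapes non-ASCII,
# so the re-serialized body is a different length.
re_serialized = json.dumps(parsed).encode()        # b'{"text": "caf\\u00e9"}'

assert len(upstream_body) != len(re_serialized)    # a forwarded Content-Length would lie
```

Any forwarded content-length that described the 16-byte upstream body now disagrees with the 21 bytes actually written, which is exactly the h11 framing error above.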
Observed: on a mock-agent FAST_ITER run with PD disaggregation through a
gateway that serializes merged prefill+decode logprobs, ~200 of 332 chat
completions hit this error before mock retries salvaged training progress.
Strip the three hop-by-hop headers before building the outgoing Response;
Starlette / hyper then recompute the correct framing from the actual body.
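The stripping step can be sketched as a small helper; the name and dict-based header shape here are illustrative, not the actual Miles code:

```python
# Headers whose values describe the *upstream* body's framing. The outgoing
# server recomputes these from the bytes it actually writes, so forwarding
# stale values triggers the h11 mismatch described above.
STALE_FRAMING = {"content-length", "transfer-encoding", "content-encoding"}

def strip_framing_headers(headers: dict) -> dict:
    """Return a copy of `headers` without the three stale framing headers."""
    return {k: v for k, v in headers.items() if k.lower() not in STALE_FRAMING}
```

Matching is case-insensitive because HTTP header names are case-insensitive and upstreams vary in capitalization.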
Mirror of upstream radixark#1004 filed against LLM360/miles so the fix lands in LLM360:deploy via the 15-min auto-rebase without waiting on upstream review.

Problem
`SessionServer.build_proxy_response` in `miles/rollout/session/session_server.py` forwards upstream response headers verbatim into either `JSONResponse` or `Response`. Both re-serialize or re-frame the body:

* `JSONResponse` runs `json.dumps` over the parsed content. Whitespace, unicode escape behavior, or key ordering may produce a different byte count than what the upstream produced.
* `Response` may be re-framed by Starlette with chunked transfer encoding.

Forwarding the upstream `content-length`, `transfer-encoding`, or `content-encoding` in these cases causes a mismatch between the declared framing and the bytes Starlette actually writes. Clients (e.g. Miles's own `http_utils.post`) then error with `h11._util.LocalProtocolError: Too much data for declared Content-Length` or `peer closed connection without sending complete message body (received 0 bytes, expected N)` and retry.

Fix
One-hunk change in `build_proxy_response`: strip the three hop-by-hop headers from `result["headers"]` before passing them to the outgoing `Response`. Starlette/hyper then compute `content-length` from the actual body they write. Mirrors what `do_proxy` already does on the incoming request path.
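Putting the problem and the fix together, a before/after sketch (the `result` dict shape and values below are hypothetical, chosen only to make the mismatch concrete):

```python
import json

HOP_BY_HOP = {"content-length", "transfer-encoding", "content-encoding"}

# Hypothetical upstream result as build_proxy_response might see it:
# a compact 16-byte body, with the upstream's own Content-Length.
result = {
    "headers": {"Content-Type": "application/json", "Content-Length": "16"},
    "content": {"text": "café"},
}

# What a JSONResponse-style re-serialization actually writes (21 bytes:
# default json.dumps escapes 'é' and adds a space after ':').
body = json.dumps(result["content"]).encode()

# Before the fix: the forwarded header disagrees with the real body.
assert int(result["headers"]["Content-Length"]) != len(body)

# After the fix: strip the stale framing headers; the outgoing server
# (Starlette in the PR) recomputes content-length from the actual body.
headers = {k: v for k, v in result["headers"].items()
           if k.lower() not in HOP_BY_HOP}
headers["content-length"] = str(len(body))
assert int(headers["content-length"]) == len(body)
```

The non-framing headers (here `Content-Type`) pass through untouched, which is the behavior the PR preserves.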