Strip duplicate Server header in session-server proxy #13
Merged
When the session server proxies a response back from smg's main router, it now strips the upstream Server header before constructing the FastAPI response. Without this, FastAPI/uvicorn adds its own Server header on top of the one already there, producing duplicate Server headers, which aiohttp's strict HTTP parser (used by litellm in harbor) rejects as malformed (BadHttpMessage 400). Harbor retries the request ~10 times then gives up, breaking agent rollouts whenever --use-session-server is enabled. Mirrors the existing strip-list pattern for content-length and transfer-encoding (same category of bug: a header the local HTTP stack regenerates).
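The strip-list pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in session_server.py: the helper name strip_regenerated_headers, the STRIPPED_HEADERS constant, and the "upstream-router" header value are all hypothetical, and the real build_proxy_response may be structured differently.

```python
from typing import Iterable, List, Tuple

# Headers the local HTTP stack regenerates for the outgoing response.
# "server" is the entry this change adds; the other two were already stripped.
# (Constant name is hypothetical, chosen for this sketch.)
STRIPPED_HEADERS = frozenset({"content-length", "transfer-encoding", "server"})


def strip_regenerated_headers(
    upstream_headers: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the upstream headers minus those the local stack re-adds.

    Hypothetical helper for illustration. Names are compared in lowercase
    because HTTP field names are case-insensitive.
    """
    return [
        (name, value)
        for name, value in upstream_headers
        if name.lower() not in STRIPPED_HEADERS
    ]


# Example upstream response headers (values are placeholders).
upstream = [
    ("Content-Type", "application/json"),
    ("Content-Length", "42"),
    ("Server", "upstream-router"),
]
forwarded = strip_regenerated_headers(upstream)
print(forwarded)
```

Only the headers that survive the filter would be passed on to the proxy response, leaving the local server free to emit its own Server, Content-Length, and Transfer-Encoding values exactly once.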
Background
The agentic RL inference path runs two HTTP servers: smg's main router (the Rust inference request router that sglang uses) and a session server (a Python FastAPI proxy in miles/rollout/session/session_server.py). The session server wraps the main router and gives it per-session (aka rollout) URLs to hold onto state.
The high-level chain is: harbor agent ↔ session server ↔ smg main router ↔ sglang (forward direction = request, reverse direction = response).
The bug
In the response direction, the session server lets the upstream smg Server: header pass through to the FastAPI/uvicorn layer, which then adds its own Server: uvicorn header. As a result, the response has duplicate Server: headers, which is invalid HTTP per RFC 7230. Harbor (technically the litellm client harbor uses, which parses HTTP via aiohttp's strict llhttp parser) rejects the response as malformed. Harbor then retries the request up to its max (~10 times) and gives up.
The fix
One-line change in miles/rollout/session/session_server.py's build_proxy_response: add "server" to the strip-list alongside the existing "content-length" and "transfer-encoding" strips. The session server now drops the upstream Server: header so only uvicorn's own survives: single header, parsers happy.
Notes
--use-session-server. We want to use it so the harbor↔inference chain can be stateful within a trajectory (e.g. for TITO).
Server: header.
Discovered during
LLM360/RL360 PR radixark#285 smoke run (job 1600118). All four trials failed at the LLM call step with litellm.InternalServerError: Connection error. The actual underlying exception in the trial trajectory, raised against URL http://10.24.2.107:5896/sessions/<sid>/v1/chat/completions, was aiohttp.http_exceptions.BadHttpMessage: 400, message: Duplicate 'Server' header found. With this fix, harbor agents going through the session server should no longer hit the duplicate-header rejection.
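To confirm whether a response would trip the duplicate-header check, one option is to capture the raw response head and count header names. The sketch below is an illustrative diagnostic only, not part of this change: the function name duplicated_header_names and the sample bytes (including the "upstream-router" value) are made up for the example. The check is deliberately naive, since a few headers such as Set-Cookie may legitimately repeat.

```python
from collections import Counter
from typing import List


def duplicated_header_names(raw_head: bytes) -> List[str]:
    """Return the lowercased header names that occur more than once.

    Hypothetical diagnostic helper; expects the status line followed by
    CRLF-separated header lines, as they appear on the wire.
    """
    counts: Counter = Counter()
    for line in raw_head.split(b"\r\n")[1:]:  # skip the status line
        if not line:
            break  # a blank line ends the header block
        name, _, _ = line.partition(b":")
        counts[name.strip().lower().decode("latin-1")] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Sample response head shaped like the failure described above
# (header values are placeholders).
broken = (
    b"HTTP/1.1 200 OK\r\n"
    b"server: upstream-router\r\n"
    b"server: uvicorn\r\n"
    b"content-type: application/json\r\n"
    b"\r\n"
)
print(duplicated_header_names(broken))
```

For the sample above the helper reports the server header as duplicated; a response produced after the fix would be expected to yield an empty list.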