Strip duplicate Server header in session-server proxy#13

Merged
odp merged 1 commit into prod from fix-session-server-duplicate-server-header
May 6, 2026

@DavidBellamy (Collaborator)

Background

The agentic RL inference path runs two HTTP servers: smg's main router (the Rust inference request router that sglang uses) and a session server (a Python FastAPI proxy in miles/rollout/session/session_server.py). The session server wraps the main router and gives it per-session (aka rollout) URLs to hold onto state.

The high-level chain is: harbor agent ↔ session server ↔ smg main router ↔ sglang (forward direction = request, reverse direction = response).

The bug

In the response direction, the session server lets the upstream smg Server: header pass through to the FastAPI/uvicorn layer, which then adds its own Server: uvicorn header. As a result, the response carries duplicate Server: headers, which is invalid HTTP per RFC 7230 (Section 3.2.2 forbids a sender from repeating a header field unless its value is defined as a comma-separated list or it is a known exception such as Set-Cookie; Server is neither). Harbor (technically the litellm client harbor uses, which parses HTTP via aiohttp's strict llhttp parser) rejects the response as malformed. Harbor then retries the request up to its maximum (~10 times) and gives up.
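To make the failure mode concrete, here is a minimal sketch of how one might check a list of response headers for repeated names. The function name `duplicated_headers` and the header values are hypothetical and not taken from the miles codebase; the sketch only illustrates the condition the strict parser objects to.

```python
from collections import Counter


def duplicated_headers(raw_headers):
    """Return the lowercased header names that occur more than once.

    `raw_headers` is a list of (name, value) pairs, i.e. the headers as
    they appear on the wire before being merged into a dict.
    """
    counts = Counter(name.lower() for name, _ in raw_headers)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical wire headers: one Server header from the local stack and
# one forwarded from upstream (the values here are illustrative only).
wire_headers = [
    ("Server", "uvicorn"),
    ("Server", "upstream-router"),
    ("Content-Type", "application/json"),
]

print(duplicated_headers(wire_headers))
```

Header names are compared case-insensitively because HTTP field names are case-insensitive, so `Server` and `server` count as the same header.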

The fix

One-line change in miles/rollout/session/session_server.py's build_proxy_response: add "server" to the strip-list alongside the existing "content-length" and "transfer-encoding" strips. The session server now drops the upstream Server: header so only uvicorn's own survives — single header, parsers happy.
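The strip-list approach can be sketched roughly as follows. This is not the actual body of build_proxy_response; the names `STRIPPED_RESPONSE_HEADERS` and `filter_upstream_headers` are assumptions made for illustration, and only the three stripped header names come from the description above.

```python
# Headers the local HTTP stack regenerates itself, so the upstream copies
# must not be forwarded. "server" is the entry this PR adds.
# (Hypothetical constant name; the real code may structure this differently.)
STRIPPED_RESPONSE_HEADERS = frozenset(
    {"content-length", "transfer-encoding", "server"}
)


def filter_upstream_headers(upstream_headers):
    """Return a copy of the upstream headers without the stripped ones.

    Matching is case-insensitive because HTTP field names are.
    """
    return {
        name: value
        for name, value in upstream_headers.items()
        if name.lower() not in STRIPPED_RESPONSE_HEADERS
    }


# Example with illustrative values:
upstream = {
    "Server": "upstream-router",
    "Content-Length": "12",
    "Content-Type": "application/json",
}
print(filter_upstream_headers(upstream))
```

The filtered mapping would then be passed to whatever response object the proxy constructs, leaving the local server free to add its own Server, Content-Length, and Transfer-Encoding values.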

Notes

  • Whether the session server is used at all is a flag set in miles via --use-session-server. We want to use it so the harbor↔inference chain can be stateful within a trajectory (e.g. for TITO).
  • The pre-existing strip-list comment in this method already documents the same category of bug ("forwarding upstream's content-length verbatim breaks uvicorn h11..."). This fix extends that strip-list with the same logic, just for the Server: header.

Discovered during

LLM360/RL360 PR radixark#285 smoke run (job 1600118). All four trials failed at the LLM call step with litellm.InternalServerError: Connection error. The underlying exception in the trial trajectory was aiohttp.http_exceptions.BadHttpMessage: 400, message: Duplicate 'Server' header found. It was raised against the URL http://10.24.2.107:5896/sessions/<sid>/v1/chat/completions. With this fix, harbor agents going through the session server should no longer hit the duplicate-header rejection.

When the session server proxies a response back from smg's main router,
it now strips the upstream Server header before constructing the FastAPI
response. Without this, FastAPI/uvicorn adds its own Server header on
top of the one already there, producing duplicate Server headers, which
aiohttp's strict HTTP parser (used by litellm in harbor) rejects as
malformed (BadHttpMessage 400). Harbor retries the request ~10 times
then gives up, breaking agent rollouts whenever --use-session-server
is enabled.

Mirrors the existing strip-list pattern for content-length and
transfer-encoding (same category of bug: a header the local HTTP
stack regenerates).

@DavidBellamy requested a review from a team as a code owner on May 5, 2026, 16:14

@odp left a comment

PLGTM

@odp merged commit 1bc00c6 into prod on May 6, 2026
1 check failed