feat(sglang_engine): allow PD worker_type on /add_worker registration path #3
Closed
DavidBellamy wants to merge 1 commit into main
… path

The old sglang_router (<=0.2.1) and the miles-router both use the single-arg /add_worker?url=... endpoint for engine registration. Previously, the Miles engine asserted worker_type=='regular' before hitting that endpoint, so any attempt to stand up prefill/decode workers via the miles-router path (including the sgl-model-gateway that mirrors it) fails fast at engine init:

AssertionError: pd disaggregation is not supported in old router or miles router.

This blocks PD disagg throughput scaling in any deployment that uses the miles-router path, even when the receiving router (e.g. sgl-model-gateway with a PD-aware shim) can handle worker_type on /add_worker.

Relax the assertion: forward worker_type (and bootstrap_port for prefill) as extra query params. Routers that honor them get PD registration; routers that only accept the single-arg form ignore the extras and register as regular, with a warning logged so the fallback is visible.

The companion server-side change is on the receiving router:
- sgl-model-gateway must accept ?worker_type=&bootstrap_port= on /add_worker
- Or deployments can use the newer /workers endpoint (non-miles path).

Context: LLM360/RL360 radixark#76. Track G (job 1559336) showed full PD KV transfer via mooncake works with SGLang's own mini_lb; this unblocks the same flow through Miles-driven rollouts.
Re-opening against radixark/miles:main so the LLM360/miles deploy branch auto-builder picks it up. Same branch, same diff.
Problem
SGLangEngine._init_normal asserts worker_type == "regular" before POSTing to the old-style /add_worker?url=... endpoint. Any attempt to stand up prefill/decode workers via the miles-router path (or a gateway that mirrors it, e.g. sgl-model-gateway) fails fast at engine init with an AssertionError, even when the receiving router could handle PD if given the information.

This blocks PD disaggregation throughput scaling in Miles-driven rollouts. Observed during LLM360/RL360#76 overnight runs when attempting FAST_ITER_CONFIG=pd-disagg through the gateway's Miles-API shim.

Fix
Relax the assertion. Forward worker_type (and bootstrap_port for prefill workers) as extra query params on the /add_worker URL. Routers that honor them get correct PD registration; routers that only accept the single-arg form ignore the extras and register as regular, with a logger.warning so the fallback behavior is visible to operators.

Companion server-side change
This PR alone doesn't make PD routing functional through an old-style router that didn't previously understand worker_type; it just removes the fail-fast assert. Operators also need one of:
- sgl-model-gateway: extend the /add_worker handler to accept worker_type= and bootstrap_port= query params (small serde change; the current shim ignores unknown params)
- sglang_router (>0.2.1): switch to --no-use-miles-router so Miles takes the POST /workers JSON path that already supports PD

Test
Unit-checked URL construction for all four shapes:
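A minimal sketch of the kind of URL construction being checked, assuming the behavior described under Fix. The helper name build_add_worker_url, its parameters, the example hosts and ports, and the choice to omit worker_type for regular workers are all illustrative assumptions, not the PR's actual code or its exact test cases.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_add_worker_url(
    router_base: str,
    worker_url: str,
    worker_type: str = "regular",
    bootstrap_port: Optional[int] = None,
) -> str:
    """Build an old-style /add_worker URL (hypothetical helper).

    The single-arg form only carries ?url=...; PD workers add
    worker_type, and prefill workers additionally add bootstrap_port.
    """
    params = {"url": worker_url}
    if worker_type != "regular":
        # Assumption: regular workers keep the original single-arg form.
        params["worker_type"] = worker_type
        if worker_type == "prefill" and bootstrap_port is not None:
            params["bootstrap_port"] = str(bootstrap_port)
    # urlencode percent-encodes the worker URL so it survives as one value.
    return f"{router_base.rstrip('/')}/add_worker?{urlencode(params)}"


if __name__ == "__main__":
    # Print a few representative shapes and the query each one decodes to.
    for kwargs in (
        {},
        {"worker_type": "prefill", "bootstrap_port": 8998},
        {"worker_type": "prefill"},
        {"worker_type": "decode"},
    ):
        url = build_add_worker_url(
            "http://router:3000", "http://10.0.0.1:30000", **kwargs
        )
        print(url, parse_qs(urlsplit(url).query))
```

A router that only reads the url parameter would see the same value in every case, which is why the extra params degrade to a regular registration rather than an error.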
Context
LLM360/RL360#76 Track G (job 1559336) proved that full PD+Mooncake KV transfer works on M2's SR-IOV IB via SGLang's native sglang_router.launch_router --mini-lb. This PR unblocks the same flow through Miles-driven rollouts, so agentic RL can use PD scaling without hand-written sbatch test harnesses.