Fix missing disagg_request_id fallback in context responses (#12893)
Open
pich4ya wants to merge 1 commit into NVIDIA:main
Summary
This change makes the disaggregated proxy tolerant of context-phase responses that include `ctx_request_id` but omit `disagg_request_id`. Instead of failing hard in `_verify_ctx_response`, the proxy now falls back to `ctx_request_id` and logs a warning.

Why
On a live 2-node DGX Spark disaggregated setup (`gpt-oss-120b`, UCX KV-cache transfer over RoCE/RDMA), the context server returned:

- `disaggregated_params.ctx_request_id != None`
- `disaggregated_params.disagg_request_id == None`

The proxy then raised an error, which prevented an otherwise healthy context/generation/proxy stack from serving requests.
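A minimal sketch of the fallback behavior this PR describes. The class and function names below approximate the real code in `tensorrt_llm/serve/openai_disagg_service.py` and are assumptions for illustration, not the actual implementation:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


# Hypothetical simplified stand-in for the real disaggregated params type.
@dataclass
class DisaggregatedParams:
    ctx_request_id: Optional[str] = None
    disagg_request_id: Optional[str] = None


def verify_ctx_response(params: Optional[DisaggregatedParams]) -> DisaggregatedParams:
    # Still fail hard when the params object or ctx_request_id is missing.
    if params is None:
        raise ValueError("context response is missing disaggregated_params")
    if params.ctx_request_id is None:
        raise ValueError("context response is missing ctx_request_id")
    # Backfill only when disagg_request_id is the sole missing field,
    # logging a warning instead of aborting the request.
    if params.disagg_request_id is None:
        logger.warning(
            "disagg_request_id missing; falling back to ctx_request_id=%s",
            params.ctx_request_id,
        )
        params.disagg_request_id = params.ctx_request_id
    return params
```

With this shape, a context response carrying only `ctx_request_id` passes verification and keeps its original request identity.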
Rationale for the fallback
For the context-first disaggregated flow, `ctx_request_id` is already the request identifier that the generation side uses to continue the request. When `disagg_request_id` is absent but `ctx_request_id` is present, using `ctx_request_id` preserves the existing request identity instead of aborting the request.

This is intentionally conservative. The proxy still fails hard when:

- `disaggregated_params` is missing
- `ctx_request_id` is missing

and it only backfills `disagg_request_id` when it is the sole missing field.

Test
Added a regression test to verify that `_verify_ctx_response` accepts a context-phase response with:

- `ctx_request_id` set
- `disagg_request_id` unset

and backfills `disagg_request_id` from `ctx_request_id`.

Validation performed
Local validation:
python3 -m py_compile tensorrt_llm/serve/openai_disagg_service.py tests/unittest/disaggregated/test_openai_disagg_service.py

I could not run the full pytest target in this lightweight clone because the local environment does not include TensorRT-LLM's Python test dependencies (for example `transformers`).

Reproduction environment
Observed on: `1.3.0rc10`
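The regression test described in the Test section above can be sketched in a self-contained form roughly like this. `DisaggParams` and `verify_ctx_response` are simplified stand-ins for the real TensorRT-LLM types and helper, not the actual test code:

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the real disaggregated params type.
@dataclass
class DisaggParams:
    ctx_request_id: Optional[str] = None
    disagg_request_id: Optional[str] = None


def verify_ctx_response(p: DisaggParams) -> DisaggParams:
    # Simplified stand-in for the proxy-side verification under test.
    if p.ctx_request_id is None:
        raise ValueError("missing ctx_request_id")
    if p.disagg_request_id is None:
        p.disagg_request_id = p.ctx_request_id  # fallback under test
    return p


def test_backfills_disagg_request_id():
    # ctx_request_id set, disagg_request_id unset -> must be backfilled.
    out = verify_ctx_response(DisaggParams(ctx_request_id="ctx-7"))
    assert out.disagg_request_id == "ctx-7"


test_backfills_disagg_request_id()
```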