Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
2 similar comments
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
fix PR number
f32b1f9 to
6d9ff42
Compare
| --context-length $CONTEXT_LENGTH --disable-radix-cache \ | ||
| --attention-backend trtllm_mha --moe-runner-backend flashinfer_trtllm \ | ||
| --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \ | ||
| --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \ |
There was a problem hiding this comment.
update sglang cookbook?
There was a problem hiding this comment.
LGTM — straightforward config update switching from TP4/EP4 to TP8/EP1 with wider concurrency sweep and adding allreduce fusion flag.
Extended reasoning...
Overview
This PR modifies three files: .github/configs/nvidia-master.yaml (simplifies search space for qwen3.5-fp8-b200-sglang from mixed TP4/EP4 to unified TP8/EP1 with concurrency 4-128, bumps CUDA image from cu129 to cu130), benchmarks/single_node/qwen3.5_fp8_b200.sh (adds --enable-flashinfer-allreduce-fusion flag to the SGLang server launch command), and perf-changelog.yaml (adds a changelog entry documenting the changes).
Security risks
None. These are benchmark configuration and launch parameter changes with no security implications.
Level of scrutiny
Low scrutiny is appropriate. This is a mechanical benchmark config update following well-established patterns in the repository. The parallelism change (TP4/EP4 → TP8/EP1) and the addition of a single server flag are common types of tuning changes seen throughout the PR history.
Other factors
The only issue found is a placeholder pull/XXX in the perf-changelog entry instead of pull/918. This is a minor documentation nit — the repo already has multiple pre-existing pull/XXX entries from prior merged PRs, indicating this is a known low-priority issue. The inline comment on this nit is already attached to the review. No outstanding human reviewer comments exist on this PR.
| --context-length $CONTEXT_LENGTH --disable-radix-cache \ | ||
| --attention-backend trtllm_mha --moe-runner-backend flashinfer_trtllm \ | ||
| --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \ | ||
| --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \ |
No description provided.