
[NV] Update Qwen FP8 B200 Configs#918

Open
ankursingh-nv wants to merge 5 commits into main from nv/qwen-fp8-tp8

Conversation

@ankursingh-nv
Collaborator

No description provided.

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

2 similar comments

fix PR number
  --context-length $CONTEXT_LENGTH --disable-radix-cache \
  --attention-backend trtllm_mha --moe-runner-backend flashinfer_trtllm \
- --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
+ --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
Contributor

update sglang cookbook?

@claude bot left a comment

LGTM — straightforward config update switching from TP4/EP4 to TP8/EP1 with wider concurrency sweep and adding allreduce fusion flag.


Overview

This PR modifies three files:

- .github/configs/nvidia-master.yaml: simplifies the search space for qwen3.5-fp8-b200-sglang from mixed TP4/EP4 to unified TP8/EP1 with concurrency 4-128, and bumps the CUDA image from cu129 to cu130.
- benchmarks/single_node/qwen3.5_fp8_b200.sh: adds the --enable-flashinfer-allreduce-fusion flag to the SGLang server launch command.
- perf-changelog.yaml: adds a changelog entry documenting the changes.
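The flag addition can be sketched as part of the server launch command. This is a sketch only: the variable values, the model-path placeholder, and the --tp-size/--ep-size flags are assumptions for illustration, not values taken from this PR; only the flags shown in the diff hunk are from the source.

```shell
# Sketch of the updated SGLang launch, assuming illustrative values
# for the environment variables referenced in the diff.
CONTEXT_LENGTH=8192            # assumed value, not from this PR
SCHEDULER_RECV_INTERVAL=10     # assumed value, not from this PR

python3 -m sglang.launch_server \
  --model-path <qwen3.5-fp8-checkpoint> \
  --tp-size 8 --ep-size 1 \
  --context-length "$CONTEXT_LENGTH" --disable-radix-cache \
  --attention-backend trtllm_mha --moe-runner-backend flashinfer_trtllm \
  --enable-flashinfer-allreduce-fusion \
  --scheduler-recv-interval "$SCHEDULER_RECV_INTERVAL"
```

Note that the new --enable-flashinfer-allreduce-fusion flag is inserted before --scheduler-recv-interval, matching the ordering in the diff.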

Security risks

None. These are benchmark configuration and launch parameter changes with no security implications.

Level of scrutiny

Low scrutiny is appropriate. This is a mechanical benchmark config update following well-established patterns in the repository. The parallelism change (TP4/EP4 → TP8/EP1) and the addition of a single server flag are common types of tuning changes seen throughout the PR history.

Other factors

The only issue found is a placeholder pull/XXX in the perf-changelog entry instead of pull/918. This is a minor documentation nit — the repo already has multiple pre-existing pull/XXX entries from prior merged PRs, indicating this is a known low-priority issue. The inline comment on this nit is already attached to the review. No outstanding human reviewer comments exist on this PR.

