[NV] Update Qwen FP8 B200 Configs by ankursingh-nv · Pull Request #918 · SemiAnalysisAI/InferenceX

ankursingh-nv · 2026-03-19T02:08:55Z

No description provided.

github-actions · 2026-03-19T02:09:05Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a PR against the official recipes or cookbook first, before we merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

functionstackx · 2026-03-19T02:11:45Z

benchmarks/single_node/qwen3.5_fp8_b200.sh

 --context-length $CONTEXT_LENGTH --disable-radix-cache \
 --attention-backend trtllm_mha --moe-runner-backend flashinfer_trtllm \
- --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
+ --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \


update sglang cookbook?

claude

LGTM — straightforward config update switching from TP4/EP4 to TP8/EP1 with wider concurrency sweep and adding allreduce fusion flag.

Extended reasoning...

Overview

This PR modifies three files: .github/configs/nvidia-master.yaml (simplifies search space for qwen3.5-fp8-b200-sglang from mixed TP4/EP4 to unified TP8/EP1 with concurrency 4-128, bumps CUDA image from cu129 to cu130), benchmarks/single_node/qwen3.5_fp8_b200.sh (adds --enable-flashinfer-allreduce-fusion flag to the SGLang server launch command), and perf-changelog.yaml (adds a changelog entry documenting the changes).
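The simplified search space described above can be pictured as a single parallelism setting swept over a range of concurrencies. The following is only a rough sketch: the helper `concurrency_sweep` is hypothetical, and the doubling step is an assumption, since the review states only the 4-128 range and not the individual points.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the concurrency points between a start and an end value.
# Assumption: each point doubles the previous one; the PR only gives the range 4-128.
concurrency_sweep() {
  local conc=$1 end=$2
  while [ "$conc" -le "$end" ]; do
    echo "$conc"
    conc=$((conc * 2))
  done
}

# TP8/EP1 is the unified setting named in the review.
TP=8
EP=1
for conc in $(concurrency_sweep 4 128); do
  echo "tp=${TP} ep=${EP} conc=${conc}"
done
```

Under the doubling assumption this enumerates six benchmark points, all sharing one TP/EP setting rather than the earlier mix of TP4/EP4 entries.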

Security risks

None. These are benchmark configuration and launch parameter changes with no security implications.

Level of scrutiny

Low scrutiny is appropriate. This is a mechanical benchmark config update following well-established patterns in the repository. The parallelism change (TP4/EP4 → TP8/EP1) and the addition of a single server flag are common types of tuning changes seen throughout the PR history.

Other factors

The only issue found is a placeholder pull/XXX in the perf-changelog entry instead of pull/918. This is a minor documentation nit — the repo already has multiple pre-existing pull/XXX entries from prior merged PRs, indicating this is a known low-priority issue. The inline comment on this nit is already attached to the review. No outstanding human reviewer comments exist on this PR.
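A leftover placeholder of this kind could be caught before merge with a small check. This is a minimal sketch, not something the repository is known to run: the function name `find_placeholder_pr_links` is hypothetical, and it assumes the placeholder appears as the literal text `pull/XXX`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-merge check: print every line that still contains an
# unfilled "pull/XXX" placeholder, and return non-zero if any is found.
find_placeholder_pr_links() {
  local file=$1
  # -n prints line numbers; -F matches the literal string rather than a regex.
  if grep -nF 'pull/XXX' "$file"; then
    return 1
  fi
  return 0
}

# Run the check only when a file path is passed to the script.
if [ "$#" -gt 0 ]; then
  find_placeholder_pr_links "$1"
fi
```

Run against `perf-changelog.yaml`, this would list the offending lines with their line numbers, including the pre-existing placeholder entries mentioned in the review.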

perf-changelog.yaml

kedarpotdar-nv · 2026-03-19T06:16:56Z

benchmarks/single_node/qwen3.5_fp8_b200.sh

 --context-length $CONTEXT_LENGTH --disable-radix-cache \
 --attention-backend trtllm_mha --moe-runner-backend flashinfer_trtllm \
- --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
+ --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \


update fp8 config

ed1b0ba

ankursingh-nv requested a review from a team March 19, 2026 02:08

ankursingh-nv requested review from jgangani and kedarpotdar-nv as code owners March 19, 2026 02:08

github-project-automation bot added this to InferenceMAX Board Mar 19, 2026

update sgl image

6d9ff42

fix PR number

ankursingh-nv force-pushed the nv/qwen-fp8-tp8 branch from f32b1f9 to 6d9ff42 Compare March 19, 2026 02:11

functionstackx reviewed Mar 19, 2026

claude bot reviewed Mar 19, 2026

perf-changelog.yaml (comment resolved)

kedarpotdar-nv approved these changes Mar 19, 2026

Ankur-singh and others added 3 commits March 20, 2026 10:56

fix recipe

c8ac68f

Merge branch 'main' into nv/qwen-fp8-tp8

35edc3a

Merge branch 'main' into nv/qwen-fp8-tp8

172252b

ankursingh-nv added NVIDIA sweep-enabled labels Mar 23, 2026

ankursingh-nv wants to merge 5 commits into main from nv/qwen-fp8-tp8

4 participants