Add DSV4 B200 Dynamo vLLM disagg #1303
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Force-pushed d2d12d5 to 2191510 (Compare)
Claude finished @Oseltamivir's task in 4m 2s — View job

PR Review: Add DSV4 B200 Dynamo vLLM disagg

Summary: No blocking issues found. Two warnings posted as inline comments on the launch script. Validations passed.
```bash
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "dsv4" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR" || exit 1
    git checkout aflowers/vllm-gb200-v0.20.0
```
🟡 WARNING: This checks out a feature branch (aflowers/vllm-gb200-v0.20.0) on NVIDIA/srt-slurm. If that branch is deleted, force-pushed, or renamed upstream, all dsv4-fp4-b200-dynamo-vllm jobs will silently break.
Why it matters: Feature branches on external repos are ephemeral — this creates a fragile dependency.
Fix: Consider pinning to a specific commit SHA instead of a branch name for reproducibility:
```diff
- git checkout aflowers/vllm-gb200-v0.20.0
+ git checkout <specific-commit-sha>  # aflowers/vllm-gb200-v0.20.0 as of YYYY-MM-DD
```
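One way the pin could be enforced is sketched below. This is a suggestion only, not the launcher's actual code: the `checkout_pinned` helper and the `SRT_PIN_SHA` variable are hypothetical names, and the actual commit SHA must be recorded from the branch at merge time.

```shell
# checkout_pinned REPO_URL DEST_DIR COMMIT_SHA
# Clone REPO_URL into DEST_DIR and check out the exact commit, detached,
# failing loudly if the commit no longer exists upstream.
checkout_pinned() {
  local repo_url="$1" dest_dir="$2" sha="$3"
  git clone --quiet "$repo_url" "$dest_dir" || return 1
  git -C "$dest_dir" checkout --quiet --detach "$sha" \
    || { echo "pinned commit $sha not found in $repo_url" >&2; return 1; }
}

# In the launcher this would replace the branch checkout, e.g.:
#   checkout_pinned https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" "$SRT_PIN_SHA"
# where SRT_PIN_SHA holds the recorded SHA of aflowers/vllm-gb200-v0.20.0.
```

A detached checkout of a SHA is immutable with respect to upstream branch deletions or force-pushes, which is the failure mode the warning describes.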
```bash
if [[ "$IS_AGENTIC" == "1" ]]; then
    git clone --branch cam/sa-submission-q2-2026 --single-branch https://github.com/cquil11/srt-slurm-nv.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR" || exit 1
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "dsv4" ]]; then
```
🟡 WARNING: The existing sed at line 146 (`sed -i 's/^ max_attempts: [0-9]*/ max_attempts: 720/'`) will override all 5 new B200 recipes' `max_attempts: 1440` down to 720. The recipes intentionally set 1440 (≈ 4 h of health-check polling) with comments about absorbing cold-cache model loads on B200, but the blanket sed will halve that to ≈ 2 h.
Why it matters: If the DSV4-Pro model load on B200 genuinely needs the longer timeout, jobs could fail with spurious health-check timeouts before the model finishes loading.
Fix: Guard the sed so it only bumps recipes that ship with the old 360 default, or skip it for dynamo-vllm recipes:
```bash
if ! grep -q 'max_attempts: 1440' "${CONFIG_FILE%%:*}"; then
    sed -i 's/^ max_attempts: [0-9]*/ max_attempts: 720/' "${CONFIG_FILE%%:*}"
fi
```

```bash
elif [[ $MODEL_PREFIX == "dsv4" && $PRECISION == "fp4" && $FRAMEWORK == "dynamo-vllm" ]]; then
    SELECTED_MODEL_PATH=""
    if [[ -n "${MODEL_PATH:-}" && -d "${MODEL_PATH}" ]]; then
        SELECTED_MODEL_PATH="$MODEL_PATH"
    else
        for candidate in /lustre/fsw/models/deepseek-v4-pro /lustre/fsw/models/dsv4-pro /lustre/fsw/models/DeepSeek-V4-Pro; do
            if [[ -d "$candidate" ]]; then
                SELECTED_MODEL_PATH="$candidate"
                break
            fi
        done
    fi
    export MODEL_PATH="${SELECTED_MODEL_PATH:-/lustre/fsw/models/deepseek-v4-pro}"
    export SRT_SLURM_MODEL_PREFIX="deepseek-v4-pro"
```
🔴 The shared sed at `runners/launch_b200-dgxc.sh:146` (`sed -i 's/^ max_attempts: [0-9]*/ max_attempts: 720/'`) matches any value, so the new B200 vLLM recipes' deliberate `max_attempts: 1440` (documented in-file as needed for cold-cache DSV4-Pro loads) gets silently rewritten to 720, halving the intended 14400 s / 4 h health-check window to 7200 s / 2 h. Make the rewrite a floor (only set 720 when the current value is lower) so it stops downgrading recipes that already specify a higher value, e.g. `awk -i inplace '/^ max_attempts:/ {n=$2+0; if (n<720) $0=" max_attempts: 720"} 1' "${CONFIG_FILE%%:*}"`.
Extended reasoning...
What the bug is

The pre-existing line at `runners/launch_b200-dgxc.sh:146`:

```bash
sed -i 's/^ max_attempts: [0-9]*/ max_attempts: 720/' "${CONFIG_FILE%%:*}"
```

was introduced as a bump: the comment immediately above (lines 143–145) explains it raises DSR1-FP8's default `max_attempts: 360` to 720 (3600 s → 7200 s) so large-model loads off shared FS finish in time. The substitution pattern `[0-9]*`, however, matches any numeric value, so it is really a force-set, not a floor.
How this PR triggers it

This PR adds a new elif branch at lines 57–62 that routes `FRAMEWORK=dynamo-vllm` + `MODEL_PREFIX=dsv4` into the same code path that later runs the sed. The 5 new recipes (`disagg-b200-{low-latency,low-middle-curve,mid-curve-megamoe,high-tpt-megamoe,max-tpt-megamoe}.yaml`) all explicitly ship:

```yaml
slurm:
  time_limit: "8:00:00"
health_check:
  max_attempts: 1440
  interval_seconds: 10
```

with an in-file rationale: "slurm.time_limit + health_check set to 8h / 1440 attempts to absorb cold-cache model loads." DeepSeek-V4-Pro is a large MoE model and the recipe author deliberately picked the higher value to cover the 4 h cold-cache window. The launcher silently negates that choice.
Step-by-step proof

1. CI runs `dsv4-fp4-b200-dynamo-vllm` (see `.github/configs/nvidia-master.yaml`).
2. `runners/launch_b200-dgxc.sh` matches the new `dsv4`/`fp4`/`dynamo-vllm` branch (line 26) and clones srt-slurm + overlays the new recipes (lines 57–62).
3. Execution falls through to line 146 with `CONFIG_FILE=recipes/vllm/deepseek-v4/8k1k/disagg-b200-low-latency.yaml`.
4. The recipe currently contains `max_attempts: 1440`.
5. The sed pattern `^ max_attempts: [0-9]*` matches that line and replaces it with `max_attempts: 720`.
6. srtctl now polls health for 720 × 10 s = 7200 s (2 h) instead of the 1440 × 10 s = 14400 s (4 h) the author specified.
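The force-set behaviour is easy to reproduce in isolation. The snippet below stands in for a recipe with a throwaway file (GNU sed is assumed for `-i`; the two-space indentation is illustrative, not the actual recipe layout):

```shell
# Throwaway two-line "recipe" standing in for a real disagg YAML file.
recipe=$(mktemp)
printf '  max_attempts: 1440\n  interval_seconds: 10\n' > "$recipe"

# The launcher's pre-existing rewrite, adjusted to this indentation:
sed -i 's/^  max_attempts: [0-9]*/  max_attempts: 720/' "$recipe"

grep max_attempts "$recipe"   # prints "  max_attempts: 720" -- the 1440 is gone
```

Because `[0-9]*` matches any digits, the substitution fires regardless of whether the current value is below or above 720.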
Verified there is no later code that resets max_attempts after this sed (only launch_h200-dgxc-slurm.sh references it, and it appends to a fresh file, not the same path).
Why existing safeguards don't prevent it

`bash -n` only checks syntax; the `generate_sweep_configs.py` validation in the PR description doesn't exercise the launcher at all; and there is no test that asserts the recipe's `max_attempts` is preserved.
Impact
Cold-start runs of DSV4-Pro on the shared filesystem may report spurious health_check failures somewhere in the 2h–4h window where the recipe's 1440-attempt setting would have succeeded. Failures here cost a full multi-node B200 allocation per occurrence.
Fix

Make the rewrite a floor instead of a force-set so it stops downgrading recipes that already specify a higher value, for example:

```bash
awk -i inplace '/^ max_attempts:/ {n=$2+0; if (n<720) $0=" max_attempts: 720"} 1' "${CONFIG_FILE%%:*}"
```

or scope the sed to only the older paths that actually need the bump.
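The floor semantics can be sanity-checked without gawk's `-i inplace` extension by using a temp-file variant. The `floor_attempts` helper name is mine, and the two-space indentation is assumed for illustration:

```shell
# floor_attempts FILE: raise max_attempts to at least 720, never lower it.
# Portable plain-awk equivalent of the suggested gawk -i inplace one-liner.
floor_attempts() {
  awk '/^  max_attempts:/ { n = $2 + 0; if (n < 720) $0 = "  max_attempts: 720" } 1' \
    "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

low=$(mktemp);  printf '  max_attempts: 360\n'  > "$low"
high=$(mktemp); printf '  max_attempts: 1440\n' > "$high"

floor_attempts "$low"    # 360  -> raised to the 720 floor
floor_attempts "$high"   # 1440 -> left untouched
```

Unlike the sed, this only rewrites values below the floor, so the recipes' deliberate 1440 survives.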
Unofficial run visualizers:

- https://inferencex.semianalysis.com/inference?unofficialRun=25579129066
- https://inferencex.semianalysis.com/inference?unofficialRun=25579927968
- https://inferencex.semianalysis.com/inference?unofficialRun=25581086579
- https://inferencex.semianalysis.com/inference?unofficialRun=25583036657
- https://inferencex.semianalysis.com/inference?unofficialRun=25589968098
Summary
- `dsv4-fp4-b200-dynamo-vllm` multi-node disaggregated config.
- `dynamo-vllm` for `dsv4`/`fp4`.

Validation

```bash
python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml --config-keys dsv4-fp4-b200-dynamo-vllm
python utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml --model-prefix dsv4 --framework dynamo-vllm --runner-type b200-multinode --multi-node
python -m pytest utils/matrix_logic/ -v
bash -n runners/launch_b200-dgxc.sh
git diff --check
```