chore: Cuda13 Support #1786
base: main
Conversation
Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>
📝 Walkthrough
These changes upgrade the CUDA infrastructure from version 12.9 to 13.0, updating the Docker base image, adding CUDA header configuration, and replacing cu12-versioned dependencies with cu13-versioned packages (vLLM, transformer-engine, NVShmem, PyTorch).
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docker/Dockerfile`:
- Line 107: The Dockerfile currently runs the command "uv lock" during image
build which regenerates uv.lock and breaks reproducibility; remove that command
from the Dockerfile or replace it with "uv lock --check" so the build fails if
uv.lock is out-of-date (instead of silently rewriting it); if the intent was to
keep lockfile synced, instead enforce the check in CI (or document the CUDA13
workaround) and ensure any reference to "uv lock" in the Dockerfile is updated
accordingly.
In `@pyproject.toml`:
- Line 60: The vLLM direct URL dependency currently pins only the aarch64 wheel
(e.g., the entry with "vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_aarch64.whl");
update each vllm dependency occurrence so the package spec includes two
conditional markers selecting the correct wheel per architecture: one URL using
the x86_64 wheel filename
(vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_x86_64.whl) with a marker ;
platform_machine == 'x86_64' and the existing aarch64 URL with a marker ;
platform_machine == 'aarch64', ensuring all vllm entries (the ones matching the
current aarch64-only URL) are replaced with these architecture-conditional specs
so x86_64 builds install the correct wheel.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
docker/Dockerfile
pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
pyproject.toml
docker/Dockerfile
🧠 Learnings (2)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to Dockerfile.ngc_pytorch : Exception: Dockerfile.ngc_pytorch is exempt from the uv run rule
Applied to files:
docker/Dockerfile
📚 Learning: 2025-09-24T18:36:01.919Z
Learnt from: terrykong
Repo: NVIDIA-NeMo/RL PR: 1024
File: .pre-commit-config.yaml:80-81
Timestamp: 2025-09-24T18:36:01.919Z
Learning: In pre-commit hooks, avoid using `uv run` for standalone Python scripts like tools/config_cli.py because `uv run` triggers project synchronization, which adds unnecessary overhead. Use direct script execution (e.g., `./tools/config_cli.py`) instead when the script is designed to be standalone.
Applied to files:
docker/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (8)
docker/Dockerfile (2)
20-22: LGTM! Appropriate header path configuration for CUDA 13.
The CPLUS_INCLUDE_PATH configuration is a reasonable workaround for the CUDA 13 header reorganization where standard library headers moved under include/cccl. The comment clearly explains the purpose.
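For illustration only, such a workaround might look roughly like the following in a Dockerfile; the path and variable value here are assumptions, not the PR's actual lines 20-22, and would need to match the image's CUDA install location:

# CUDA 13 moved the CCCL (Thrust/CUB/libcu++) headers under include/cccl,
# so expose that directory to the host compiler's include search path.
ENV CPLUS_INCLUDE_PATH=/usr/local/cuda/include/cccl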
8-8: Base image tag is valid and confirmed available.
The base image nvcr.io/nvidia/cuda-dl-base:25.11-cuda13.0-devel-ubuntu24.04 follows NVIDIA's documented tag pattern for CUDA DL 25.11 with CUDA 13.0 on Ubuntu 24.04 and is publicly available on NGC.
pyproject.toml (6)
49-49: LGTM! NVShmem updated to CUDA 13 variant.
The dependency correctly switches from nvidia-nvshmem-cu12 to nvidia-nvshmem-cu13 with appropriate platform markers.
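Roughly, such an entry might read as follows in pyproject.toml; the marker shown is an assumption and the actual line 49 may differ:

# CUDA 13 build of NVSHMEM; wheels are Linux-only, so gate on platform
"nvidia-nvshmem-cu13 ; sys_platform == 'linux'",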
71-72: Clarify status of commented-out deep_ep and deep_gemm dependencies.
Multiple deep_ep and deep_gemm git references are commented out. The commit message mentions "still not working with flashinfer," suggesting these are temporarily disabled. Consider:
- Adding a comment explaining why these are disabled and linking to a tracking issue
- If this is expected to be resolved before merge, mark it as a TODO with the issue reference
This helps future maintainers understand the intent and track when these can be re-enabled.
Also applies to: 78-79, 82-83
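Illustrating the suggestion above, a commented-out entry with a tracking note might look like this; the issue reference and URLs are placeholders, not real values:

# TODO(<tracking-issue>): re-enable once deep_ep/deep_gemm work with flashinfer on CUDA 13
# "deep_ep @ git+https://github.com/<org>/<repo>.git@<ref>",
# "deep_gemm @ git+https://github.com/<org>/<repo>.git@<ref>",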
96-96: LGTM! Transformer-engine updated to 2.9.0 with CUDA 13 support.
The transformer-engine[pytorch,core_cu13]==2.9.0 specification correctly pins the version with CUDA 13 core support.
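For context, the pinned spec discussed above would sit among the project dependencies roughly like this (surrounding entries omitted, layout assumed):

dependencies = [
  "transformer-engine[pytorch,core_cu13]==2.9.0",
]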
161-171: LGTM! PyTorch sources correctly updated to cu130 index.
The torch, torchvision, and triton source mappings properly reference the new pytorch-cu130 index for Linux platforms while preserving PyPI for macOS.
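A hedged sketch of what such uv source mappings typically look like; the index name follows the description, but the exact markers in lines 161-171 are assumptions:

[tool.uv.sources]
torch = [{ index = "pytorch-cu130", marker = "sys_platform == 'linux'" }]
torchvision = [{ index = "pytorch-cu130", marker = "sys_platform == 'linux'" }]
triton = [{ index = "pytorch-cu130", marker = "sys_platform == 'linux'" }]
# Platforms that don't match the marker (e.g. macOS) fall back to the default index (PyPI).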
196-197: LGTM! Index URL updated for CUDA 13.
The PyTorch index definition correctly points to the cu130 wheel repository.
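The corresponding index definition would look roughly like this; the URL is assumed to follow PyTorch's standard wheel-index pattern rather than quoted from the diff:

[[tool.uv.index]]
name = "pytorch-cu130"
url = "https://download.pytorch.org/whl/cu130"
explicit = true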
226-227: Clever approach to disable cu12 transformer-engine.
Using sys_platform == 'never' as an impossible condition is an effective way to exclude the transformer-engine-cu12 package from resolution without removing it entirely from overrides. This prevents accidental installation of the CUDA 12 variant.
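A minimal sketch of that override trick, with the surrounding table layout assumed:

[tool.uv]
override-dependencies = [
  # The impossible marker means this requirement never applies,
  # so the cu12 variant is effectively dropped from resolution.
  "transformer-engine-cu12 ; sys_platform == 'never'",
]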
docker/Dockerfile (Outdated)
# Do everything in one layer to prevent large layers.
# The venv is symlinked to avoid bloating the layer size
uv lock
Reconsider running uv lock during the Docker build.
Running uv lock regenerates uv.lock from pyproject.toml, which can produce different results across builds if upstream dependency versions change. This undermines reproducibility—the lockfile committed to the repo should be the source of truth.
If the intent is to ensure the lockfile stays in sync with pyproject.toml, consider either:
- Remove uv lock and rely on CI to verify the lockfile is up-to-date
- Use uv lock --check to fail the build if the lockfile is stale, rather than silently regenerating it
Was this added intentionally to work around a specific issue with the CUDA 13 dependencies?
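For reference, the checking variant would be a one-line change in the Dockerfile; the RUN wrapper here is illustrative, and uv lock --check exits non-zero when uv.lock is out of date instead of rewriting it:

RUN uv lock --check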
🤖 Prompt for AI Agents
In `@docker/Dockerfile` at line 107, The Dockerfile currently runs the command "uv
lock" during image build which regenerates uv.lock and breaks reproducibility;
remove that command from the Dockerfile or replace it with "uv lock --check" so
the build fails if uv.lock is out-of-date (instead of silently rewriting it); if
the intent was to keep lockfile synced, instead enforce the check in CI (or
document the CUDA13 workaround) and ensure any reference to "uv lock" in the
Dockerfile is updated accordingly.
pyproject.toml (Outdated)
    "causal-conv1d",
    # Remove this once https://github.com/NVIDIA-NeMo/RL/issues/501 resolved
    "vllm==0.11.2",
    "vllm @ https://github.com/vllm-project/vllm/releases/download/v0.13.0/vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_aarch64.whl",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check available vLLM v0.13.0 wheels on GitHub releases
echo "=== Checking vLLM v0.13.0 release assets ==="
curl -s "https://api.github.com/repos/vllm-project/vllm/releases/tags/v0.13.0" | jq -r '.assets[].name' 2>/dev/null | grep -i "cu130.*\.whl" || echo "No cu130 wheels found or API failed"
Repository: NVIDIA-NeMo/RL
Length of output: 211
🏁 Script executed:
#!/bin/bash
# Examine pyproject.toml to see the dependencies and affected lines
echo "=== Checking pyproject.toml content around line 60, 74, 83, 104 ==="
cat -n pyproject.toml | sed -n '55,110p'
Repository: NVIDIA-NeMo/RL
Length of output: 3690
vLLM wheel URLs only specify aarch64 architecture—x86_64 builds will fail.
All vLLM direct URL dependencies (lines 60, 74, 83, 104) point to manylinux_2_35_aarch64.whl while x86_64 wheels are available for v0.13.0. This breaks x86_64 Linux builds, which are likely the primary target for most users.
Add conditional markers to select the appropriate wheel per architecture:
"vllm @ https://github.com/vllm-project/vllm/releases/download/v0.13.0/vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_x86_64.whl ; platform_machine == 'x86_64'",
"vllm @ https://github.com/vllm-project/vllm/releases/download/v0.13.0/vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_aarch64.whl ; platform_machine == 'aarch64'",🤖 Prompt for AI Agents
In `@pyproject.toml` at line 60, The vLLM direct URL dependency currently pins
only the aarch64 wheel (e.g., the entry with
"vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_aarch64.whl"); update each vllm
dependency occurrence so the package spec includes two conditional markers
selecting the correct wheel per architecture: one URL using the x86_64 wheel
filename (vllm-0.13.0+cu130-cp38-abi3-manylinux_2_35_x86_64.whl) with a marker ;
platform_machine == 'x86_64' and the existing aarch64 URL with a marker ;
platform_machine == 'aarch64', ensuring all vllm entries (the ones matching the
current aarch64-only URL) are replaced with these architecture-conditional specs
so x86_64 builds install the correct wheel.
Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>
@terrykong I think the GitHub CI machine is too old for running the unit tests. We can try to run those tests locally for now, but eventually we will need an H100 CI runner with a newer driver for testing this.
What does this PR do?
CUDA 13 in the base image and torch 2.9.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit