fix(actor_group): support cu13 TMS preload and enable NCCL CUMEM by d…#307
Open
aoshen02 wants to merge 2 commits into
Code Review
This pull request updates vime/ray/actor_group.py by changing the default value of the NCCL_CUMEM_ENABLE environment variable from '0' to '1'. Additionally, it adds support for CUDA 13 by including 'torch_memory_saver_hook_mode_preload_cu13.abi3.so' in the list of preloaded shared library paths for torch_memory_saver. As there are no review comments, I have no feedback to provide.
CalvinXKY previously approved these changes on Jul 1, 2026
…efault

- Add torch_memory_saver_hook_mode_preload_cu13.abi3.so to the TMS dynamic library search list. Without this, cu13 (CUDA 13) containers fail to find the preload hook and TMS memory management is disabled.
- Change NCCL_CUMEM_ENABLE default from "0" to "1". GB300 (sm103) requires CUMEM for NVLink/NVLS transports; disabling it causes NCCL init failures on Blackwell GPUs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
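The two changes in this commit can be illustrated with a minimal sketch. This is not the code in vime/ray/actor_group.py: the function names, the fallback file name, and the use of `LD_PRELOAD` are assumptions made for illustration. Only the cu13 file name and the `NCCL_CUMEM_ENABLE` default of "1" come from the commit message.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Candidate preload hooks, checked in order within each directory.
# The cu13 name is taken from the commit message; the second entry is an
# assumed non-cu13 fallback name used here only for illustration.
TMS_PRELOAD_CANDIDATES = (
    "torch_memory_saver_hook_mode_preload_cu13.abi3.so",
    "torch_memory_saver_hook_mode_preload.abi3.so",
)


def find_preload_lib(
    search_dirs: Iterable[str],
    candidates: Iterable[str] = TMS_PRELOAD_CANDIDATES,
) -> Optional[Path]:
    """Return the first candidate library that exists, or None."""
    names = tuple(candidates)
    for directory in search_dirs:
        for name in names:
            path = Path(directory) / name
            if path.is_file():
                return path
    return None


def build_actor_env(
    search_dirs: Iterable[str],
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build a hypothetical actor environment (illustrative helper)."""
    env = dict(base_env or {})
    # Default to "1" but respect a value the caller has already set.
    env.setdefault("NCCL_CUMEM_ENABLE", "1")
    lib = find_preload_lib(search_dirs)
    if lib is not None:
        # Assumption: the hook is activated through LD_PRELOAD.
        env["LD_PRELOAD"] = str(lib)
    return env
```

Using `setdefault` keeps an explicit `NCCL_CUMEM_ENABLE=0` from the caller intact, so the new default applies only when nothing was set. When no candidate file is found, the sketch leaves `LD_PRELOAD` unset, which mirrors the failure mode described above, where TMS memory management ends up disabled.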
…onfig

Docker:
- Dockerfile: add cu13 (CUDA 13) build target alongside cu12, with sm_100f family-compatible kernels for GB300 (sm103)
- justfile: add cu13 build/push targets
- Remove obsolete vllm.patch (fixes merged upstream)

Scripts:
- run-glm5.2-744B-A40B.sh: update parallel config for 64 GPU GB300 (PP=4, TP=8, CP=2, EP=16), DSA layer split (first=18, mid=20, last=20), workload sizing (rollout-batch-size=8, n-samples=8, global-batch-size=64, max-tokens-per-gpu=65536 for 128K support), unique log filenames with run ID

Docs:
- Update GLM-5.2 744B example for GB300 configuration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
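The parallel and workload figures in this commit can be sanity-checked with a little arithmetic. The sketch below assumes the common Megatron-style convention that the world size factors as TP × PP × CP × DP, and that one rollout yields rollout-batch-size × n-samples sequences. Neither convention is stated in the commit message, and the helper name is hypothetical. EP is not checked because its divisibility constraint depends on the framework.

```python
def data_parallel_size(world_size: int, tp: int, pp: int, cp: int) -> int:
    """Derive DP, assuming world_size = TP * PP * CP * DP."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel


# Values from the commit message: 64 GPUs, PP=4, TP=8, CP=2.
dp = data_parallel_size(64, tp=8, pp=4, cp=2)

# rollout-batch-size=8 prompts, n-samples=8 responses per prompt.
samples_per_rollout = 8 * 8

print(f"DP={dp}, samples per rollout={samples_per_rollout}")
```

Under these assumptions the 64 GPUs are consumed entirely by tensor, pipeline, and context parallelism, leaving a data-parallel size of 1, and the 64 samples from one rollout match the stated global-batch-size of 64.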
95f0602 to 339f599
aoshen02 added a commit to aoshen02/vime that referenced this pull request on Jul 3, 2026
Port the cu13 build support from vllm-project#307: ENABLE_CUDA_13 branches the apt dev headers, cublas header, TransformerEngine (source-built for cu13), TMS_CUDA_MAJOR auto-detect, and the cudnn pin. justfile gains a build-cu13 target and a VARIANT-prefixed manifest. actor_group preloads the cu13 TMS .so.

Also switch the vLLM patch apply to --allow-empty so the build survives once the patch is emptied upstream.

Excludes vllm-project#307's NCCL_CUMEM_ENABLE default flip (0->1) and the glm5.2 scripts by request.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: aoshen02 <aoshen@inferact.ai>