[Build] Link PYTORCH_ROCM_ARCH specified archs #15

Open
rjrock wants to merge 1 commit into ROCm:main from rjrock:fix/link_offload_archs
@rjrock rjrock commented Mar 23, 2026

Motivation

In the vLLM CI, using deep_ep fails with the error message

'Failed: CUDA error /app/DeepEP/csrc/kernels/launch_hip.cuh:71 'invalid kernel file''

Although gfx950 is specified in the PYTORCH_ROCM_ARCH environment variable, the gfx950 kernels are not linked into the shared object.

Technical Details

The offload architectures specified in PYTORCH_ROCM_ARCH are now explicitly linked into the shared object file. Previously, whatever architecture was discovered at runtime was linked.
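As an illustration of the idea only (this is not the diff in this PR), here is a minimal sketch of how a build script might turn PYTORCH_ROCM_ARCH into explicit link flags. The helper name offload_arch_flags is hypothetical, and the sketch assumes the variable holds a semicolon-separated list such as gfx942;gfx950 and that the linker is driven by a clang-based HIP compiler that accepts --offload-arch=<arch>.

```python
import os


def offload_arch_flags(arch_spec=None):
    """Turn a PYTORCH_ROCM_ARCH-style string into --offload-arch flags.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if arch_spec is None:
        # Fall back to the environment; an unset variable yields no flags.
        arch_spec = os.environ.get("PYTORCH_ROCM_ARCH", "")

    # Assumption: ';' is the usual separator. Commas and spaces are
    # tolerated here so that minor formatting differences do not drop archs.
    normalized = arch_spec.replace(",", ";").replace(" ", ";")

    archs = []
    for arch in normalized.split(";"):
        arch = arch.strip()
        # Skip empty entries and duplicates while preserving order.
        if arch and arch not in archs:
            archs.append(arch)

    return [f"--offload-arch={arch}" for arch in archs]


if __name__ == "__main__":
    print(offload_arch_flags("gfx942;gfx950"))
    # prints ['--offload-arch=gfx942', '--offload-arch=gfx950']
```

In a setuptools-based build, a list like this could be appended to the extension's extra_link_args so that every requested architecture is linked regardless of which GPU is present on the build machine. Whether the project wires it in exactly this way is an assumption.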

@amirakb89 amirakb89 requested review from amirakb89 March 23, 2026 21:08
Contributor

@amirakb89 amirakb89 left a comment

LGTM.

@amirakb89 amirakb89 requested review from itej89 and liligwu March 23, 2026 21:18
2 participants