[Build] Link PYTORCH_ROCM_ARCH specified archs by rjrock · Pull Request #15 · ROCm/DeepEP

rjrock · 2026-03-23T21:00:53Z

Motivation

In the vLLM CI we see the error message

'Failed: CUDA error /app/DeepEP/csrc/kernels/launch_hip.cuh:71 'invalid kernel file''

when using deep_ep. Although gfx950 is specified in the PYTORCH_ROCM_ARCH environment variable, the gfx950 kernels are not linked into the shared object.

With this change, the offload architectures specified in PYTORCH_ROCM_ARCH are explicitly linked into the shared object file. Previously, whatever architecture was discovered at runtime was linked.
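As a rough illustration of the idea (not the code in this PR), a build script could turn PYTORCH_ROCM_ARCH into explicit `--offload-arch` flags along these lines. The helper name `offload_arch_flags` and its `fallback` parameter are hypothetical, and the sketch assumes the variable holds a semicolon-separated list such as `gfx942;gfx950`:

```python
import os


def offload_arch_flags(env=None, fallback=()):
    """Build --offload-arch flags from PYTORCH_ROCM_ARCH.

    Hypothetical helper for illustration only. `env` is any mapping
    (defaults to os.environ); `fallback` is the arch list used when the
    variable is unset or empty.
    """
    env = os.environ if env is None else env
    raw = env.get("PYTORCH_ROCM_ARCH", "")

    # Assumption: architectures are separated by semicolons.
    archs = [a.strip() for a in raw.split(";") if a.strip()]
    if not archs:
        archs = list(fallback)

    # Drop duplicates while keeping the order the user specified.
    archs = list(dict.fromkeys(archs))

    # One flag per requested architecture, so every one of them is
    # compiled and linked regardless of which GPU the build host has.
    return [f"--offload-arch={arch}" for arch in archs]


if __name__ == "__main__":
    print(offload_arch_flags({"PYTORCH_ROCM_ARCH": "gfx942;gfx950"}))
```

The same flag list would need to reach both the compile and the link step for the requested code objects to end up in the shared object; how that is wired up depends on the project's build script.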

amirakb89

LGTM.

[Build] Link PYTORCH_ROCM_ARCH specified archs

f25d9b0

amirakb89 requested review from amirakb89 March 23, 2026 21:08

amirakb89 approved these changes Mar 23, 2026

amirakb89 requested review from itej89 and liligwu March 23, 2026 21:18