Fixes and patches for NVlabs/nvdiffrast to support AMD ROCm 7.1 and Wave64 architectures (gfx1100, gfx1201).
This repository provides a comprehensive patch to make NVlabs/nvdiffrast fully compatible with AMD GPUs using ROCm 7.1 and newer architectures like RDNA 3/4 (gfx1100, gfx1201).
The original nvdiffrast is built for NVIDIA CUDA. Standard "hipify" conversion fails on newer AMD cards because:
- Wavefront Size: Newer AMD GPUs use Wave64, requiring 64-bit lane masks, while CUDA code uses 32-bit.
- NVIDIA ASM: The code contains inline PTX assembly, which has no equivalent in ROCm.
- C++ Strictness: Modern ROCm compilers (Clang 20+) are stricter about narrowing conversions and namespaces.
- ✅ Wave64 Support: Converts all 32-bit lane masks (`0xffffffffu`) to 64-bit (`0xffffffffffffffffull`).
- ✅ ASM Porting: Replaces `vmin`, `vmax`, `slct`, `prmt`, and `bfind` with cross-platform HIP intrinsics.
- ✅ PyTorch Compatibility: Fixes `OptionalCUDAGuard` namespaces and narrowing errors in `torch_antialias.cpp`.
- ✅ Header Aliasing: Correctly links ROCm headers to expected CUDA paths.
Installing upstream directly (e.g. `pip install git+https://github.com/NVlabs/nvdiffrast.git --no-build-isolation`) is expected to fail on these GPUs for the reasons above, so build from a patched source tree:

```shell
# Install ROCm build dependencies
sudo apt update && sudo apt install -y hipsparse-dev hipblas-dev rocthrust-dev hipcub-dev

# Clone the original nvdiffrast
git clone https://github.com/NVlabs/nvdiffrast.git
cd nvdiffrast

# Download and run this patch
git clone https://github.com/tashibi/nvdiffrast-rocm-patch.git
chmod +x patch_rocm.sh
./patch_rocm.sh
```

Then build and install for your GPU architecture:

```shell
rm -rf build/
export PYTORCH_ROCM_ARCH=gfx1201  # Change to your arch (gfx1100 for 7900XTX, etc.)
export FORCE_CUDA=1
python3 setup.py install
```

Verify the installation:

```python
import torch
import nvdiffrast.torch as dr

ctx = dr.RasterizeCudaContext()
print("nvdiffrast successfully loaded on ROCm!")
```

If you use this for InstantMesh on a 16GB card:
- Use `grid_res: 96` or `128` in your config.
- Set `export PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:128` to avoid OOM.