Fix hardware FP atomics on Julia 1.13 (LLVM 20) #923
|
This is the best I could come up with, with help from Claude. Any feedback is welcome on whether this could be addressed in a better way. |
|
How is Clang handling this? I think we are emitting them here: https://github.com/JuliaConcurrent/UnsafeAtomics.jl/blob/master/ext/UnsafeAtomicsLLVM/atomics.jl, but there is currently no special handling for AMDGPU. I am also a bit worried about universal emission. Per https://rocm.docs.amd.com/en/docs-6.2.0/conceptual/gpu-memory.html#coherence, fine-grained has a specific meaning in terms of coherence, and I am not sure when/if we are getting fine-grained memory on all systems. |
|
Seems that Clang's The semantic contract is on the |
In LLVM 20, `shouldExpandAtomicRMWInIR` for AMDGPU gained a legality check (`globalMemoryFPAtomicIsLegal`) that gates whether FP `atomicrmw` instructions are lowered to native hardware instructions or expanded to CAS loops. For targets like gfx1100 (RDNA3) that lack `AgentScopeFineGrainedRemoteMemoryAtomics`, the check requires `!amdgpu.no.fine.grained.memory` per-instruction metadata to return `true`. Without it, LLVM expands every FP atomic to a `global_atomic_cmpswap_b32` CAS loop regardless of the `amdgpu-unsafe-fp-atomics` function attribute.

The fix adds an empty `!amdgpu.no.fine.grained.memory` metadata node to all FP `atomicrmw` instructions (`fadd`/`fsub`/`fmax`/`fmin`) in `finish_module!` when `unsafe_fp_atomics` is enabled. This runs before the `AtomicExpandPass`, so LLVM 20 selects native instructions (e.g. `global_atomic_add_f32`) as expected. Adding the metadata is a no-op on CDNA targets and on LLVM 18 (Julia 1.12), which don't consult it.
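
To make the effect on the IR concrete, here is a minimal text-level sketch in Python of what the annotation amounts to. It is not the code in this PR, which operates on the LLVM module inside `finish_module!`. The names `annotate_fp_atomics` and `EXAMPLE_IR` are hypothetical, and the sketch assumes one instruction per line, no trailing comments, and that metadata ID `!0` is unused in the module.

```python
import re

# FP atomicrmw operations named in the PR description.
FP_OPS = ("fadd", "fsub", "fmax", "fmin")
MD_NAME = "!amdgpu.no.fine.grained.memory"

# Matches a line containing an FP atomicrmw instruction (optionally volatile).
_FP_ATOMIC = re.compile(
    r"\batomicrmw\s+(?:volatile\s+)?(?:%s)\b" % "|".join(FP_OPS)
)


def annotate_fp_atomics(ir: str, md_id: int = 0) -> str:
    """Attach an empty metadata node to every FP atomicrmw line.

    Assumption: `md_id` does not clash with an existing metadata ID.
    """
    out = []
    touched = False
    for line in ir.splitlines():
        if _FP_ATOMIC.search(line) and MD_NAME not in line:
            line = f"{line.rstrip()}, {MD_NAME} !{md_id}"
            touched = True
        out.append(line)
    if touched:
        # Define the empty node once, at module level.
        out.append(f"!{md_id} = !{{}}")
    return "\n".join(out)


# Hypothetical kernel: one FP atomic and one integer atomic.
EXAMPLE_IR = """\
define void @k(ptr addrspace(1) %p, ptr addrspace(1) %q, float %v) {
  %old = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("agent") monotonic, align 4
  %n = atomicrmw add ptr addrspace(1) %q, i32 1 monotonic, align 4
  ret void
}"""

if __name__ == "__main__":
    print(annotate_fp_atomics(EXAMPLE_IR))
```

Only the `fadd` line gains the `!amdgpu.no.fine.grained.memory !0` attachment; the integer `atomicrmw add` is left alone, matching the description's restriction to FP operations.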