
Fix hardware FP atomics on Julia 1.13 (LLVM 20) #923

Open: luraess wants to merge 2 commits into main from lr/atomics-1.13

@luraess
Member

@luraess luraess commented May 26, 2026

In LLVM 20, shouldExpandAtomicRMWInIR for AMDGPU gained a legality check (globalMemoryFPAtomicIsLegal) that gates whether FP atomicrmw instructions are lowered to native hardware instructions or expanded to CAS loops. For targets like gfx1100 (RDNA3) that lack AgentScopeFineGrainedRemoteMemoryAtomics, the check requires !amdgpu.no.fine.grained.memory per-instruction metadata to return true. Without it, LLVM expands every FP atomic to a global_atomic_cmpswap_b32 CAS loop regardless of the amdgpu-unsafe-fp-atomics function attribute.
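The decision described above can be restated as a small model. This is only a sketch of the behaviour as summarized in this description, not LLVM's actual code: the function name `lowers_to_native` and its parameters are hypothetical, and the model deliberately ignores synchronization scope, address space, and the per-operation and per-type details the real check considers.

```python
# Simplified model of the legality check described above.
# Assumption: only the subtarget feature and the per-instruction
# metadata matter; the real LLVM check looks at more than this.

NO_FINE_GRAINED = "amdgpu.no.fine.grained.memory"


def lowers_to_native(has_agent_scope_fine_grained_remote_atomics: bool,
                     metadata: frozenset = frozenset()) -> bool:
    """Return True if the FP atomicrmw would stay a native instruction,
    False if it would be expanded to a CAS loop (in this toy model)."""
    if has_agent_scope_fine_grained_remote_atomics:
        # The target handles this case in hardware; metadata is not needed.
        return True
    # Otherwise the instruction itself must promise that the memory
    # it touches is not fine-grained.
    return NO_FINE_GRAINED in metadata


# A target without the feature (the gfx1100 case from the description).
without_md = lowers_to_native(False)
with_md = lowers_to_native(False, frozenset({NO_FINE_GRAINED}))
```

In this model the function attribute alone never changes the outcome for a target lacking the feature, which matches the regression reported here.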

The fix adds an empty !amdgpu.no.fine.grained.memory metadata node to all FP atomicrmw instructions (fadd/fsub/fmax/fmin) in finish_module! when unsafe_fp_atomics is enabled. This runs before the AtomicExpandPass, so LLVM 20 selects native instructions (e.g. global_atomic_add_f32) as expected. Adding the metadata is a no-op on CDNA targets and on LLVM 18 (Julia 1.12), which don't consult it.
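To make the effect on the IR concrete, here is a text-level illustration of the transformation. It is not the code in this PR, which works on the LLVM module inside finish_module!; the helper `annotate_fp_atomics`, the sample kernel, and the metadata ID `!0` are assumptions made for the example, and a real pass would have to pick an ID that does not collide with existing module metadata.

```python
import re

FP_OPS = ("fadd", "fsub", "fmax", "fmin")
MD_NAME = "!amdgpu.no.fine.grained.memory"

# Matches FP atomicrmw operations only; integer ops such as `add` are skipped.
_ATOMIC_RE = re.compile(
    r"\batomicrmw\s+(?:volatile\s+)?(" + "|".join(FP_OPS) + r")\b"
)


def annotate_fp_atomics(ir_text: str, md_id: int = 0) -> str:
    """Append an empty !amdgpu.no.fine.grained.memory attachment to every
    FP atomicrmw line of textual LLVM IR (toy, line-based rewrite)."""
    out = []
    changed = False
    for line in ir_text.splitlines():
        if _ATOMIC_RE.search(line) and MD_NAME not in line:
            line = f"{line.rstrip()}, {MD_NAME} !{md_id}"
            changed = True
        out.append(line)
    if changed:
        # Define the empty metadata node the attachments refer to.
        out.append(f"!{md_id} = !{{}}")
    return "\n".join(out)


example = '''define void @k(ptr addrspace(1) %p, ptr addrspace(1) %q, float %v) {
  %old = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("agent") monotonic, align 4
  %i = atomicrmw add ptr addrspace(1) %q, i32 1 monotonic, align 4
  ret void
}'''

annotated = annotate_fp_atomics(example)
print(annotated)
```

Only the `fadd` line gains the attachment; the integer `add` is left alone, and running the helper a second time on its own output changes nothing.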

@luraess luraess requested a review from vchuravy May 26, 2026 13:30
@luraess
Member Author

luraess commented May 26, 2026

This is the best I could come up with, with help from Claude. Any feedback is welcome on whether this could be addressed in a better way.

@vchuravy
Member

How is Clang handling this?

I think we are emitting them in https://github.com/JuliaConcurrent/UnsafeAtomics.jl/blob/master/ext/UnsafeAtomicsLLVM/atomics.jl, but there is currently no special handling for AMDGPU.

There is also !amdgpu.no.remote.memory.

I am a bit worried about emitting this universally.

https://rocm.docs.amd.com/en/docs-6.2.0/conceptual/gpu-memory.html#coherence

Fine-grained has a specific meaning in terms of coherence, and I am not sure when or if we are getting fine-grained memory on all systems.

@luraess
Member Author

luraess commented May 26, 2026

It seems that Clang's setTargetAtomicMetadata in AMDGPU.cpp applies !amdgpu.no.fine.grained.memory unconditionally to all FP atomics whenever allowAMDGPUUnsafeFPAtomics() is true.

The semantic contract is on the unsafe_fp_atomics opt-in: by enabling it, the caller asserts their memory is not fine-grained. Note that Clang would also add !amdgpu.ignore.denormal.mode for fadd float specifically.
