Fix hardware FP atomics on Julia 1.13 (LLVM 20) by luraess · Pull Request #923 · JuliaGPU/AMDGPU.jl

luraess · 2026-05-26T13:30:28Z

In LLVM 20, shouldExpandAtomicRMWInIR for AMDGPU gained a legality check (globalMemoryFPAtomicIsLegal) that decides whether FP atomicrmw instructions are lowered to native hardware instructions or expanded to CAS loops. For targets such as gfx1100 (RDNA3) that lack AgentScopeFineGrainedRemoteMemoryAtomics, the check returns true only if the instruction carries the per-instruction metadata !amdgpu.no.fine.grained.memory. Without it, LLVM expands every FP atomic into a global_atomic_cmpswap_b32 CAS loop, regardless of the amdgpu-unsafe-fp-atomics function attribute.
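To make the decision concrete, here is a simplified Python model of it, assuming only the behavior described above (agent-scope FP atomics on global memory). Target, AtomicRMW, fp_atomic_is_legal and lowering are hypothetical names; this is not LLVM's implementation, and the real check may consider more cases (for example the memory scope).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the LLVM 20 decision summarized above.
# It is NOT LLVM's code: class and function names are illustrative only,
# and only agent-scope FP atomics on global memory are modeled.

FP_OPS = {"fadd", "fsub", "fmax", "fmin"}


@dataclass
class Target:
    name: str
    # Stand-in for the AgentScopeFineGrainedRemoteMemoryAtomics subtarget feature.
    has_agent_scope_fine_grained_remote_atomics: bool


@dataclass
class AtomicRMW:
    op: str                                     # e.g. "fadd"
    metadata: set = field(default_factory=set)  # names of attached metadata kinds


def fp_atomic_is_legal(target: Target, rmw: AtomicRMW) -> bool:
    """Paraphrase of the legality check as summarized in this PR."""
    if target.has_agent_scope_fine_grained_remote_atomics:
        return True
    # Without the feature, only the per-instruction metadata makes it legal.
    return "amdgpu.no.fine.grained.memory" in rmw.metadata


def lowering(target: Target, rmw: AtomicRMW) -> str:
    """Return "native" or "cas-loop" for an FP atomicrmw.

    The amdgpu-unsafe-fp-atomics function attribute is deliberately not an
    input here, reflecting the observation that it no longer avoids the CAS loop.
    """
    if rmw.op not in FP_OPS:
        raise ValueError(f"not an FP atomicrmw: {rmw.op}")
    return "native" if fp_atomic_is_legal(target, rmw) else "cas-loop"


gfx1100 = Target("gfx1100", has_agent_scope_fine_grained_remote_atomics=False)
print(lowering(gfx1100, AtomicRMW("fadd")))  # cas-loop
print(lowering(gfx1100, AtomicRMW("fadd", {"amdgpu.no.fine.grained.memory"})))  # native
```

In this model the metadata is the only lever on an RDNA3-like target, which is why the fix below targets the instructions rather than the function attribute.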

The fix adds an empty !amdgpu.no.fine.grained.memory metadata node to all FP atomicrmw instructions (fadd/fsub/fmax/fmin) in finish_module! when unsafe_fp_atomics is enabled. This runs before the AtomicExpandPass, so LLVM 20 selects native instructions (e.g. global_atomic_add_f32) as expected. Adding the metadata is a no-op on CDNA targets and on LLVM 18 (Julia 1.12), which don't consult it.
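As a text-level illustration only (the PR does this in Julia inside finish_module!, not with regular expressions), the Python sketch below appends an empty !amdgpu.no.fine.grained.memory node to every fadd/fsub/fmax/fmin atomicrmw in textual LLVM IR. It assumes one instruction per line with no trailing comment; add_no_fine_grained_md and next_md_id are hypothetical names, and the caller must choose a metadata ID that is not already in use.

```python
import re

# Matches FP atomicrmw instructions in textual LLVM IR, optionally volatile.
# The trailing \b keeps e.g. "fmaximum" from matching "fmax".
FP_ATOMICRMW = re.compile(r"\batomicrmw\s+(?:volatile\s+)?(fadd|fsub|fmax|fmin)\b")


def add_no_fine_grained_md(ir: str, next_md_id: int) -> str:
    """Attach an empty !amdgpu.no.fine.grained.memory node to every FP atomicrmw."""
    out = []
    tagged = False
    for line in ir.splitlines():
        if FP_ATOMICRMW.search(line) and "!amdgpu.no.fine.grained.memory" not in line:
            # Metadata attachments go at the end of the instruction, comma-separated.
            line += f", !amdgpu.no.fine.grained.memory !{next_md_id}"
            tagged = True
        out.append(line)
    if tagged:
        out.append(f"!{next_md_id} = !{{}}")  # the empty metadata node: !N = !{}
    return "\n".join(out) + "\n"


ir = """define void @k(ptr addrspace(1) %p, float %v) {
  %old = atomicrmw fadd ptr addrspace(1) %p, float %v syncscope("agent") monotonic, align 4
  ret void
}"""
print(add_no_fine_grained_md(ir, next_md_id=0))
```

Integer atomics such as atomicrmw add are left untouched, matching the PR's restriction to FP operations.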

luraess · 2026-05-26T13:31:36Z

This is the best I could come up with, with the help of Claude. Any feedback on whether this could be addressed in a better way is welcome.

vchuravy · 2026-05-26T13:42:37Z

How is Clang handling this?

I think we are emitting them here: https://github.com/JuliaConcurrent/UnsafeAtomics.jl/blob/master/ext/UnsafeAtomicsLLVM/atomics.jl. There is currently no special handling for AMDGPU, though.

There is also ‘amdgpu.no.remote.memory’.

I am a bit worried about emitting this metadata universally.

https://rocm.docs.amd.com/en/docs-6.2.0/conceptual/gpu-memory.html#coherence

Fine-grained has a specific meaning in terms of coherence, and I am not sure when, or if, we get fine-grained memory on all systems.

luraess · 2026-05-26T14:16:45Z

It seems that Clang's setTargetAtomicMetadata (in AMDGPU.cpp) unconditionally applies !amdgpu.no.fine.grained.memory to all FP atomics whenever allowAMDGPUUnsafeFPAtomics() is true.

The semantic contract lies in the unsafe_fp_atomics opt-in: by enabling it, the caller asserts that their memory is not fine-grained. Note that Clang also adds !amdgpu.ignore.denormal.mode, specifically for fadd on float.
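For comparison, the Clang rule described in this comment can be paraphrased in Python as follows. clang_style_metadata is a hypothetical helper, not Clang's API, and FP_OPS lists only the four operations this PR handles; LLVM's own notion of an FP atomicrmw operation may be broader.

```python
# Paraphrase of the rule attributed to Clang's setTargetAtomicMetadata (AMDGPU.cpp).
# Hypothetical helper for illustration; not Clang's actual source or API.
FP_OPS = {"fadd", "fsub", "fmax", "fmin"}


def clang_style_metadata(op: str, value_type: str, unsafe_fp_atomics: bool) -> set[str]:
    """Metadata kinds attached to an atomicrmw, given its op and value type."""
    if not unsafe_fp_atomics or op not in FP_OPS:
        return set()  # nothing is attached without the opt-in, or for integer ops
    md = {"amdgpu.no.fine.grained.memory"}  # applied to every FP atomic
    if op == "fadd" and value_type == "float":
        md.add("amdgpu.ignore.denormal.mode")  # only fadd on 32-bit float
    return md


for op, ty in [("fadd", "float"), ("fadd", "double"), ("fmax", "float")]:
    print(op, ty, sorted(clang_style_metadata(op, ty, unsafe_fp_atomics=True)))
```

Unlike this rule, the change as described in the PR attaches only !amdgpu.no.fine.grained.memory and not !amdgpu.ignore.denormal.mode.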

Fix atomic for 1.13

76d1cf3

luraess requested a review from vchuravy May 26, 2026 13:30

Fixup

a106753

luraess wants to merge 2 commits into main from lr/atomics-1.13