Replace ExceptionInfo with lightweight packed UInt64 atomic #894

Merged
vchuravy merged 5 commits into master from gb/lightweight-exceptions on Mar 19, 2026

Conversation

@gbaraldi
Member

Summary

  • Replace the 56-byte ExceptionInfo struct with a single packed UInt64, written via one atomic CAS
  • Each error site (bounds check, div-by-zero, etc.) previously inlined ~20 flat_store_byte instructions writing the struct byte-by-byte, consuming ~15 VGPRs on never-taken error paths
  • New approach: pack workgroup IDs (3x16 bits) + error code (8 bits) into 64 bits, single global_atomic_cmpswap, ~3 VGPRs per error site
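
The packing described above can be sketched on the host as follows. The bit layout (x in the low 16 bits, then y, then z, then the 8-bit error code) and the names `pack_exception` / `unpack_exception` are assumptions made for illustration; the PR text only states the field widths, not their order.

```julia
# Hypothetical layout (an assumption, not taken from the PR diff):
#   bits  0-15  workgroup x
#   bits 16-31  workgroup y
#   bits 32-47  workgroup z
#   bits 48-55  error code (assumed nonzero, so 0 can mean "no exception")
#   bits 56-63  unused
function pack_exception(wg_x::Integer, wg_y::Integer, wg_z::Integer, code::Integer)
    return (UInt64(wg_x) & 0xffff) |
           ((UInt64(wg_y) & 0xffff) << 16) |
           ((UInt64(wg_z) & 0xffff) << 32) |
           ((UInt64(code) & 0xff) << 48)
end

# Inverse of pack_exception: split the 64-bit word back into its fields.
function unpack_exception(packed::UInt64)
    return (wg_x = Int(packed & 0xffff),
            wg_y = Int((packed >> 16) & 0xffff),
            wg_z = Int((packed >> 32) & 0xffff),
            code = Int((packed >> 48) & 0xff))
end

packed = pack_exception(1, 2, 3, 4)
info = unpack_exception(packed)
```

With 16 bits per dimension, workgroup IDs above 65535 would be truncated under this layout.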

What's lost

  • Per-workitem thread IDs (workgroup IDs are kept)
  • Human-readable reason strings stored on device (replaced with error codes mapped to strings on host)
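
A minimal sketch of what a host-side mapping from error code to reason string could look like. The numeric codes, the table name `EXCEPTION_REASONS`, and the helper `reason_string` are hypothetical; only the category names come from this PR.

```julia
# Hypothetical code assignments; the actual values used by the PR are not shown here.
const EXCEPTION_REASONS = Dict{UInt8,String}(
    0x01 => "BoundsError",
    0x02 => "DomainError",
    0x03 => "OverflowError",
    0x04 => "DivideError",
)

# Look up the reason on the host, falling back to a generic message
# for codes the table does not know about.
function reason_string(code::Integer)
    key = code % UInt8
    return get(EXCEPTION_REASONS, key, "unknown exception (code $(Int(key)))")
end
```

Keeping the strings on the host means the device only ever has to materialize an 8-bit constant at each error site.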

What's kept

  • Workgroup identification (x, y, z)
  • Error categorization (BoundsError, DomainError, OverflowError, DivideError, etc.)
  • First-writer-wins semantics (atomic CAS)
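
First-writer-wins can be illustrated on the CPU with Julia's `Threads.Atomic`. This is only an analogy for the device-side `global_atomic_cmpswap`; the sentinel `NO_EXCEPTION` and the helper `signal_exception!` are names invented for this sketch.

```julia
using Base.Threads: Atomic, atomic_cas!

# Assumed sentinel: an all-zero slot means no exception has been recorded yet.
const NO_EXCEPTION = UInt64(0)

# Try to record `packed` in the slot. The compare-and-swap only succeeds
# if the slot still holds the sentinel, so later writers leave it untouched.
# Returns true if this caller was the first writer.
function signal_exception!(slot::Atomic{UInt64}, packed::UInt64)
    old = atomic_cas!(slot, NO_EXCEPTION, packed)
    return old == NO_EXCEPTION
end

slot = Atomic{UInt64}(NO_EXCEPTION)
first_won  = signal_exception!(slot, UInt64(0x11))  # slot was empty
second_won = signal_exception!(slot, UInt64(0x22))  # slot already taken
```

After both calls the slot still holds the first value, which is the property the host relies on when it reports a single exception per launch.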

Why this matters

In the Oceananigans benchmark, partial_mapreduce_device kernels had 300-362 flat_store_byte instructions from inlined exception handling. The mapreduce kernel variants scaled from 59 to 105 VGPRs as dimension count increased, with ~15 VGPRs per dimension attributable to inlined exception paths. At 105 VGPRs the kernel is limited to 4 waves/SIMD; without the exception overhead the actual reduction logic needs ~40 VGPRs (8 waves/SIMD), doubling occupancy.
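
The occupancy figures above are consistent with a simple register-budget calculation. The hardware parameters below (512 VGPRs per lane per SIMD, at most 8 waves per SIMD, allocation in multiples of 8 registers) are assumptions chosen because they reproduce the quoted numbers; they are not stated in this PR and differ between GPU generations.

```julia
# Estimate waves per SIMD from a kernel's VGPR count.
# All keyword defaults are assumed hardware parameters, not values from the PR.
function waves_per_simd(vgprs::Integer; budget::Integer = 512,
                        max_waves::Integer = 8, granule::Integer = 8)
    allocated = cld(vgprs, granule) * granule  # round up to the allocation granule
    return min(max_waves, budget ÷ allocated)
end

waves_per_simd(105)  # 105 rounds up to 112; 512 ÷ 112 = 4 waves
waves_per_simd(40)   # 512 ÷ 40 = 12, capped at 8 waves
```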

Test plan

  • Normal kernel execution (no exceptions) still works
  • Bounds errors are detected and reported with workgroup info
  • Division-by-zero errors are detected
  • Exception info is correctly unpacked on host side
  • Verify reduced flat_store_byte count in assembly output
  • Verify reduced VGPR count for kernels with multiple error paths

🤖 Generated with Claude Code

gbaraldi and others added 4 commits March 18, 2026 15:27
The old exception handling inlined ~20 flat_store_byte instructions at
every error site (bounds checks, div-by-zero, etc.), writing a 56-byte
ExceptionInfo struct byte-by-byte through flat memory. This bloated
register usage by ~15 VGPRs per error site, reducing occupancy even
though the error paths are never taken at runtime.

Replace with a single UInt64 packed with workgroup IDs (16 bits each)
and an error code (8 bits), written via one atomic CAS. Each error
site now needs ~3 VGPRs instead of ~15.

Trade-offs:
- Lost: per-workitem IDs, human-readable reason strings on device
- Kept: workgroup IDs, error category (BoundsError, DomainError, etc.)
- Gained: significantly lower register pressure on error paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify that the new packed UInt64 exception path does not generate
flat_store_byte instructions (from the old byte-by-byte ExceptionInfo
writes) and uses atomic cmpswap instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify that bounds-checked array access also generates zero
flat_store_byte and uses atomic cmpswap for exception signaling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Without kernel=true, code_native compiles as a device function and
the AddKernelStatePass is skipped, leaving julia.gpu.state_getter
as a real function call. This masks the true codegen improvement
(10 VGPRs vs 41) and causes false test results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
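
The kind of check these commits describe can be sketched as a plain string scan over the generated assembly. The helper `count_instr` and the sample listing are made up for illustration; in the real tests the text would come from `code_native` with `kernel=true`, as the last commit message notes.

```julia
# Count the lines of an assembly listing that mention a given mnemonic.
function count_instr(asm::AbstractString, mnemonic::AbstractString)
    return count(line -> occursin(mnemonic, line), eachline(IOBuffer(asm)))
end

# Invented listing standing in for real compiler output.
sample_asm = """
    s_load_dwordx2 s[0:1], s[4:5], 0x0
    global_atomic_cmpswap_x2 v[0:1], v2, v[3:6], s[0:1] glc
    s_endpgm
"""

n_byte_stores = count_instr(sample_asm, "flat_store_byte")
n_cmpswap     = count_instr(sample_asm, "cmpswap")
```

A test built this way would assert that `n_byte_stores` is zero and `n_cmpswap` is at least one for a kernel containing an error path.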
# TODO check specific exception type
end

@testset "Exception codegen" begin
Member

We should start using FileCheck.jl for these.

Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
@gbaraldi
Member Author

@luraess this is a breaking release I guess?

@luraess
Member

luraess commented Mar 19, 2026

Would it break the higher level api?

@vchuravy
Member

I don't think this qualifies for a breaking release

@gbaraldi
Member Author

I don't know if we considered the exception info struct as API. The direct errors should be the same

@luraess
Member

luraess commented Mar 19, 2026

We could tag a minor if this impacts some user code, or a patch otherwise.

@vchuravy vchuravy merged commit c02ef8d into master Mar 19, 2026
3 checks passed
3 participants