batch mem2reg to process all variables in a single pass#547
batch mem2reg to process all variables in a single pass#547LegNeato wants to merge 3 commits intoRust-GPU:mainfrom
Conversation
|
@eddyb can you review this one? |
mem2reg processed each variable independently. This O(V*N) complexity caused multi-minute hangs on large post-inline functions. Batch all operations into single passes over the instructions. Fixes Rust-GPU#546. Measured on the proof-of-space-gpu shader: mem2reg drops from 228s to 18s (13x), total link from 236s to 26s. Also adds unit tests.
Firestar99
left a comment
There was a problem hiding this comment.
Looks sane to me, then again, I never looked at our mem2reg pass before. What's really interesting to me are the access patters of this pass, and how it reads and writes instructions, surely useful info for implementing some more efficient data structures for storing a module.
| Op::Constant if u32_type.is_some() && inst.result_type == u32_type => { | ||
| let value = inst.operands[0].unwrap_literal_bit32(); | ||
| constants.insert(inst.result_id.unwrap(), value); | ||
| } |
There was a problem hiding this comment.
You're separating out u32 constants, yet a lot of your tests are using f32 constants?
There was a problem hiding this comment.
The map only collects u32 constants because it's used for resolving OpAccessChain indices. The tests use the f32 store/load values. From https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpAccessChain, access chain indices must be scalar integers, and AFAICT rustc always emits u32 for these (and the previous code assumed u32). I added a doc comment.
The SPIR-V spec allows OpLine/OpNoLine between OpPhi instructions at the start of a block. Account for this in the phi search so it doesn't stop early at a debug instruction. Adds two tests for the phi search boundary behavior.
Add doc comment to construct_access_chain_info explaining that access chain indices must be scalar integers per the SPIR-V spec, and that the constants map only tracks u32 (matching what rustc emits). Add test for a u64 constant index that is valid SPIR-V but not resolved by mem2reg, verifying the variable is not promoted.
mem2reg processed each variable independently.
This O(V*N) complexity caused multi-minute hangs on large post-inline functions.
Batch all operations into single passes over the instructions.
Fixes #546. Measured on the proof-of-space-gpu shader: mem2reg drops from 228s to 18s, total link from 236s to 26s.
Also adds unit tests.
Disclosure: largely AI.