[LLVMGPU] Handle decomposed masks in ROCDLBufferInstructionsOptimization #23859
Draft
Max191 wants to merge 1 commit into iree-org:main from
Conversation
Contributor (Author)
I still need to properly review this myself, so probably don't review yet, but I'm making this a draft to provide context in another PR.
krzysz00 reviewed Mar 19, 2026
Force-pushed from 187d635 to 4fefc05
The tile-and-fuse pipeline now decomposes masks (`decomposeMasks=true`), producing step+broadcast+cmpi+andi IR instead of create_mask ops. This commit adapts the ROCDL buffer optimization passes to work with the new IR shape.

**ROCDLBufferInstructionsOptimization** is simplified to pattern-match `vector.broadcast(%scalar_i1)` as the mask on `vector.transfer_read` and `vector.maskedload`, replacing each with an unmasked operation + `arith.select` (or just the unmasked operation if the mask is constant true). The pass is moved from post-bufferization to after vector lowering in the pipeline.

**OptimizeComparisonOps** is a new pass that simplifies a vector `arith.cmpi` where one operand is a broadcast of a scalar with known divisibility. It uses IREE's `IntegerDivisibilityAnalysis` to determine whether the comparison result is uniform across all vector lanes, and rewrites it to a scalar comparison + broadcast. It handles all signed ordered predicates (slt, sle, sgt, sge) via a single-bucket condition (`floor(vecMin/sdiv) == floor(vecMax/sdiv)`, with an adjustment for non-strict predicates), and folds eq/ne to constants when no multiple of sdiv falls within the vector range.

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
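The single-bucket condition can be sanity-checked with plain integer arithmetic. Below is a minimal Python sketch of the idea (the function name, the `pred` encoding, and the exact bucket shift are assumptions for illustration, not the pass's actual code), assuming the vector lanes span the signed range `[vec_min, vec_max]` and the broadcast scalar is only known to be a multiple of a positive `sdiv`:

```python
def cmpi_uniform_over_lanes(vec_min, vec_max, sdiv, pred):
    """Return True iff `x <pred> t` has the same answer for every lane value
    x in [vec_min, vec_max], for every t that is a multiple of sdiv.

    Key identity (t = k*sdiv, sdiv > 0):
        x slt t  <=>  floor(x / sdiv) < k        (sge is its negation)
        x sle t  <=>  floor((x - 1) / sdiv) < k  (sgt is its negation)
    So each lane's result depends only on the lane's floor-bucket, and the
    comparison is lane-uniform exactly when vec_min and vec_max land in the
    same bucket. (Names and the `pred` encoding are illustrative only.)
    """
    shift = 0 if pred in ("slt", "sge") else 1  # "sle"/"sgt" shift the bucket
    # Python's // is floor division, which is what the bucket test needs for
    # negative values as well.
    return (vec_min - shift) // sdiv == (vec_max - shift) // sdiv
```

For example, with lanes spanning 4..7 and `sdiv = 8`, an `slt` against any multiple of 8 gives the same answer on every lane (`cmpi_uniform_over_lanes(4, 7, 8, "slt")` is `True`), so the vector compare can be rewritten as a scalar compare plus broadcast; widening the range to 4..8 crosses the bucket boundary at 8 and the rewrite no longer applies.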
Force-pushed from 4fefc05 to 8ee02af
Contributor (Author)
Converted to draft because this is causing some performance regressions that are difficult to deal with. I will see about coming back to this in the near future.
Max191 added a commit that referenced this pull request on Mar 27, 2026
…uctionsOptimization (#23947)

The pass was bailing out on `vector.transfer_read` ops with non-identity permutation maps (e.g., 1D reads from a 4D memref). After #23855, we will frequently see 1D reads, which need to be supported here. Ideally, we will do something like what is done in #23859, but that approach is causing performance regressions that are difficult to deal with. For now, this provides a solution for the new mask types we will be seeing.

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>