[DMA][Swizzle] Enable LDS DMA with swizzling #23807
Open
lialan wants to merge 13 commits into users/lialan/lower_dma_when_scaled
c42cd2c to
aed9f22
Compare
Contributor
Author
Initial benchmark results on MI355X without tuning:
Contributor
Author
Looking into the cause of the regression.
6ab1f22 to
5c175cf
Compare
3f861f5 to
cefbf84
Compare
* For now, remove the blanket guard that disabled DMA for all scaled matmuls.
* When DMA is manually enabled, XOR swizzle is disabled (for now).
* Use DMA (UseGlobalLoadDMAAttr) for LHS/RHS operands.
* Fix lowering of DMA copy.
Revert destination indices from divergent (srcLinearOffset) back to subgroup-uniform (linearOffsetVal). The gather_to_lds op contract specifies that only lane 0's dstIndices are used, so the dst base must be uniform. Also add a TODO in the scaled matmul DMA pipeline test noting that gather_to_lds is not yet produced for scaled operands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
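The uniform-destination contract described above can be sketched with a toy model. This is a plain-Python illustration, not the real lowering: `model_gather_to_lds` is a hypothetical stand-in for the op, and it assumes one element per lane written contiguously in lane order from lane 0's destination index.

```python
def model_gather_to_lds(src, src_indices, dst_indices, lds):
    """Toy model (hypothetical helper): one element per lane.

    src_indices may differ per lane (divergent).
    dst_indices is per lane too, but only lane 0's value is used.
    """
    base = dst_indices[0]  # destination indices of other lanes are ignored
    for lane, src_index in enumerate(src_indices):
        # Lanes write contiguously in lane order starting at the base.
        lds[base + lane] = src[src_index]


src = list(range(100, 116))
uniform = [0] * 8
divergent = [0] * 8

# Same source indices, different per-lane destination indices.
model_gather_to_lds(src, [3, 1, 2, 0], [4, 4, 4, 4], uniform)
model_gather_to_lds(src, [3, 1, 2, 0], [4, 0, 7, 2], divergent)

# Under this model the divergent destination indices have no effect,
# so any per-lane permutation has to be expressed on the source side.
print(uniform == divergent)
```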
MX tests require static shapes. The "small" shape set includes dynamic shapes by default, so explicitly pass --mnk_dynamicities=static,static,static. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a SwizzleHintOp is used as the destination of a gather_to_lds via reshape or subview ops (expand_shape/collapse_shape/subview), the swizzle is applied on the source side in the DMA lowering pass. These reshape ops just pass through the swizzled allocation and should be treated as transparent users rather than unsupported ones. This fixes a compiler crash in the scaled matmul DMA path where: alloc -> swizzle_hint -> expand_shape -> gather_to_lds.dst Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
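The pass-through treatment of reshape and subview ops can be sketched as a walk up the def chain. The snippet below is a minimal Python model under stated assumptions: the `Op` class, the op-name strings, and `find_swizzle_hint` are all hypothetical and only mirror the chain named in the commit message; the actual pass operates on MLIR operations.

```python
from dataclasses import dataclass
from typing import Optional

# Ops treated as transparent users of the swizzled allocation (assumed names).
TRANSPARENT_OPS = {"expand_shape", "collapse_shape", "subview"}


@dataclass
class Op:
    """Hypothetical stand-in for an operation with a single memref source."""
    name: str
    source: Optional["Op"] = None


def find_swizzle_hint(dst: Optional[Op]) -> Optional[Op]:
    """Walk from a gather_to_lds destination back to a swizzle hint, if any."""
    op = dst
    while op is not None:
        if op.name == "swizzle_hint":
            return op
        if op.name not in TRANSPARENT_OPS:
            return None  # unsupported producer: stop tracing
        op = op.source
    return None


# alloc -> swizzle_hint -> expand_shape -> gather_to_lds.dst
alloc = Op("alloc")
hint = Op("swizzle_hint", alloc)
expanded = Op("expand_shape", hint)
print(find_swizzle_hint(expanded) is hint)
```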
Replace LLVMGPUTileAndFuse with #iree_gpu.pipeline<TileAndFuse> to match the migration in #23816. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4518445 to
b027b1e
Compare
…led destinations

* Add swizzle detection helper that traces destination memref through expand_shape/collapse_shape/subview to find SwizzleHintOp.
* Apply the swizzle attribute's offset transformation to source linear offsets in the `gather_to_lds` lowering.
* XOR swizzle is self-inverse, so applying it to source addresses produces the correct swizzled layout in LDS without violating gather_to_lds's uniform-destination constraint.
* Add pipeline tests and E2E tests to make sure it works.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
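The self-inverse argument can be checked with a small simulation. This is a sketch under assumptions, not the swizzle attribute used in the PR: `xor_swizzle`, its `row_width` and `access_width` parameters, and the subgroup size of 64 are illustrative choices, and each lane moves a single element.

```python
def xor_swizzle(offset, row_width=64, access_width=8):
    """Illustrative XOR swizzle: XOR the access-group index with the row index."""
    row, col = divmod(offset, row_width)
    group, within = divmod(col, access_width)
    num_groups = row_width // access_width  # power of two in this sketch
    swizzled_group = group ^ (row % num_groups)
    return row * row_width + swizzled_group * access_width + within


def store_with_swizzled_dst(src):
    """Reference layout: element k lands at the swizzled destination offset."""
    lds = [None] * len(src)
    for k, value in enumerate(src):
        lds[xor_swizzle(k)] = value
    return lds


def dma_with_swizzled_src(src, subgroup_size=64):
    """DMA-style copy: linear destinations, swizzle applied to source offsets."""
    lds = [None] * len(src)
    for base in range(0, len(src), subgroup_size):  # uniform base per subgroup
        for lane in range(subgroup_size):
            dst = base + lane  # contiguous in lane order
            lds[dst] = src[xor_swizzle(dst)]
    return lds


src = list(range(512))  # 8 rows of 64 elements
# Applying the swizzle twice returns the original offset.
print(all(xor_swizzle(xor_swizzle(i)) == i for i in range(len(src))))
# Both copies produce the same LDS contents.
print(store_with_swizzled_dst(src) == dma_with_swizzled_src(src))
```

Because the swizzle is its own inverse, reading from the swizzled source offset and writing linearly gives the same result as writing to the swizzled destination offset, which is why the destination can stay uniform.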
The base branch had plain DMA CHECK-DIRECT-LOAD checks that were superseded by the swizzle_dma branch's swizzled format checks. Remove the duplicate stale checks that fail with the new swizzle-enabled config.
56b7c30 to
c4f0eee
Compare
ae31340 to
3d21fd2
Compare
Yu-Zhewen
reviewed
Mar 21, 2026
Contributor
For BF16, I noticed that multi-buffering does not work with swizzling (#23919). This might be the same issue here.
krzysz00
reviewed
Mar 24, 2026
Contributor
krzysz00
left a comment
I don't have correctness concerns, but I don't think I'd want to land this with those perf regressions. Maybe we should investigate that multibuffering issue?
Contributor
Agreed. I’m currently looking into it.
16 tasks
Contributor
Author
The plan is to split this PR into smaller ones.
b027b1e to
11e2260
Compare
cb09fe7 to
4c416a0
Compare