[FEAT-DSL] Update FlyToROCDL Conversion to support full dsl types#273
[FEAT-DSL] Update FlyToROCDL Conversion to support full dsl types#273
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the Fly→ROCDL lowering pipeline to better support the full Fly DSL type system (notably swizzles, composed layouts, and BufferDesc “fat pointer” handling), and aligns MLIR tests + Python DSL wrappers with the new printing/lowering behavior.
Changes:
- Add
BufferFatPtrhelper and refactor BufferDesc pointer lowering/offsetting/copy to use it. - Extend layout lowering/type inference to handle
SwizzleTypeincrd2idxand introducefly.decompositionplumbing for copy expansion. - Update MLIR conversion/transform tests for formatting and i32-based GEP indices; reorganize/extend Python primitive DSL helpers.
Reviewed changes
Copilot reviewed 12 out of 15 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/mlir/Transforms/rewrite_func_signature.mlir | Adjusts expected type printing spacing in signatures. |
| tests/mlir/Transforms/layout_lowering.mlir | Adjusts expected ptr type printing spacing. |
| tests/mlir/Conversion/pointer_ops.mlir | Updates expected GEP index types/casts (i64 → i32) to match new lowering. |
| tests/mlir/Conversion/memref_ops.mlir | Updates expected GEP index types/casts (i64 → i32) to match new lowering. |
| tests/mlir/Conversion/dyn_shared.mlir | Updates expected dynamic shared GEP index type (i64 → i32). |
| python/flydsl/expr/primitive.py | Reorganizes and expands DSL op wrappers; adds/relocates ptr/memref helpers and “Deprecated” wrappers. |
| lib/Dialect/FlyROCDL/CDNA3/MmaAtom.cpp | Adjusts CDNA3 MFMA C thread/value layout construction. |
| lib/Dialect/Fly/Transforms/LayoutLowering.cpp | Adds SwizzleType handling in crd2idx lowering; inserts fly.decomposition before certain copy atom calls; improves GEMM shape handling. |
| lib/Dialect/Fly/IR/FlyTypeDefs.cpp | Improves printing spacing for types/attrs (commas). |
| lib/Dialect/Fly/IR/FlyOps.cpp | Extends crd2idx type inference for swizzles; adds DecompositionOp type inference; refines MemRefLoadOp inference for CoordTensorType. |
| lib/Dialect/Fly/IR/FlyAttrDefs.cpp | Adjusts parsing of leaf/basis attrs to more selectively parse E<mode> suffixes. |
| lib/Conversion/FlyToROCDL/FlyToROCDL.cpp | Refactors BufferDesc lowering through BufferFatPtr; adds ptr swizzle application at load/store/memcpy sites; simplifies iterator ops lowerings. |
| lib/Conversion/FlyToROCDL/BufferFatPtr.h | New helper implementing BufferDesc fat-pointer packing/unpacking and swizzled byte-offset computation. |
| include/flydsl/Dialect/Fly/Transforms/MemrefLowering.td | Adds pattern rewrites for fly.decomposition on memref/coord_tensor. |
| include/flydsl/Dialect/Fly/IR/FlyOps.td | Broadens crd2idx layout operand type (allows swizzle), adds fly.decomposition op, and tweaks assembly formats. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
coderfeli
left a comment
There was a problem hiding this comment.
Review Comments
Critical
-
i32 GEP overflow:
AddOffsetOpLoweringuses i32 offsets for all address spaces including Global. Offsets >2GB will overflow. Should keep i64 for non-BufferDesc pointers. -
Null deref in
AddOffsetOpLowering:offset.getDefiningOp()returns nullptr for block arguments, thendefOp->getOperand(0)crashes. Need a null check. -
Buffer-to-buffer copy dropped:
lowerCDNA3BufferCopynow rejects both-sides-BufferDesc and both-sides-non-BufferDesc. Old code handled these cases. Potential regression if any kernel uses buffer-to-buffer copy. -
CDNA3 MmaAtom C layout: Removed
ValM1from accumulator layout. For mfma with M>16, this changes how accumulator values distribute. Needs correctness verification across all MFMA variants.
Medium
-
Swizzle logic duplicated between
BufferFatPtr::swizzleByteOffsetandapplySwizzleOnPtr— should extract a shared helper. -
lowerUniversalCopymemcpy length changed from i64 to i32. ConfirmLLVM::MemcpyOpaccepts i32 length. -
No MLIR tests for
fly.decompositionor swizzle incrd2idx. -
MakeViewOpLoweringrelies onuse_empty()for CoordTensor — fragile pattern, add a comment explaining why.
|
a little conflict @sjfeng1999 after I changed the crd2idx to use in gemm kernel. |
|
Made-with: Cursor
f2f651c to
a226da8
Compare
|
@sjfeng1999 rebased on main and fixed a minor run error in the moe and gemm. take a look. |
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist