Skip to content

[FEAT-DSL] Update FlyToROCDL Conversion to support full dsl types#273

Merged
coderfeli merged 8 commits intomainfrom
pr/yodate-to-rocdl-conversion
Mar 25, 2026
Merged

[FEAT-DSL] Update FlyToROCDL Conversion to support full dsl types#273
coderfeli merged 8 commits intomainfrom
pr/yodate-to-rocdl-conversion

Conversation

@sjfeng1999
Copy link
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Fly→ROCDL lowering pipeline to better support the full Fly DSL type system (notably swizzles, composed layouts, and BufferDesc “fat pointer” handling), and aligns MLIR tests + Python DSL wrappers with the new printing/lowering behavior.

Changes:

  • Add BufferFatPtr helper and refactor BufferDesc pointer lowering/offsetting/copy to use it.
  • Extend layout lowering/type inference to handle SwizzleType in crd2idx and introduce fly.decomposition plumbing for copy expansion.
  • Update MLIR conversion/transform tests for formatting and i32-based GEP indices; reorganize/extend Python primitive DSL helpers.

Reviewed changes

Copilot reviewed 12 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/mlir/Transforms/rewrite_func_signature.mlir Adjusts expected type printing spacing in signatures.
tests/mlir/Transforms/layout_lowering.mlir Adjusts expected ptr type printing spacing.
tests/mlir/Conversion/pointer_ops.mlir Updates expected GEP index types/casts (i64 → i32) to match new lowering.
tests/mlir/Conversion/memref_ops.mlir Updates expected GEP index types/casts (i64 → i32) to match new lowering.
tests/mlir/Conversion/dyn_shared.mlir Updates expected dynamic shared GEP index type (i64 → i32).
python/flydsl/expr/primitive.py Reorganizes and expands DSL op wrappers; adds/relocates ptr/memref helpers and “Deprecated” wrappers.
lib/Dialect/FlyROCDL/CDNA3/MmaAtom.cpp Adjusts CDNA3 MFMA C thread/value layout construction.
lib/Dialect/Fly/Transforms/LayoutLowering.cpp Adds SwizzleType handling in crd2idx lowering; inserts fly.decomposition before certain copy atom calls; improves GEMM shape handling.
lib/Dialect/Fly/IR/FlyTypeDefs.cpp Improves printing spacing for types/attrs (commas).
lib/Dialect/Fly/IR/FlyOps.cpp Extends crd2idx type inference for swizzles; adds DecompositionOp type inference; refines MemRefLoadOp inference for CoordTensorType.
lib/Dialect/Fly/IR/FlyAttrDefs.cpp Adjusts parsing of leaf/basis attrs to more selectively parse E<mode> suffixes.
lib/Conversion/FlyToROCDL/FlyToROCDL.cpp Refactors BufferDesc lowering through BufferFatPtr; adds ptr swizzle application at load/store/memcpy sites; simplifies iterator ops lowerings.
lib/Conversion/FlyToROCDL/BufferFatPtr.h New helper implementing BufferDesc fat-pointer packing/unpacking and swizzled byte-offset computation.
include/flydsl/Dialect/Fly/Transforms/MemrefLowering.td Adds pattern rewrites for fly.decomposition on memref/coord_tensor.
include/flydsl/Dialect/Fly/IR/FlyOps.td Broadens crd2idx layout operand type (allows swizzle), adds fly.decomposition op, and tweaks assembly formats.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@coderfeli coderfeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Comments

Critical

  1. i32 GEP overflow: AddOffsetOpLowering uses i32 offsets for all address spaces including Global. Offsets >2GB will overflow. Should keep i64 for non-BufferDesc pointers.

  2. Null deref in AddOffsetOpLowering: offset.getDefiningOp() returns nullptr for block arguments, then defOp->getOperand(0) crashes. Need a null check.

  3. Buffer-to-buffer copy dropped: lowerCDNA3BufferCopy now rejects both-sides-BufferDesc and both-sides-non-BufferDesc. Old code handled these cases. Potential regression if any kernel uses buffer-to-buffer copy.

  4. CDNA3 MmaAtom C layout: Removed ValM1 from accumulator layout. For mfma with M>16, this changes how accumulator values distribute. Needs correctness verification across all MFMA variants.

Medium

  1. Swizzle logic duplicated between BufferFatPtr::swizzleByteOffset and applySwizzleOnPtr — should extract a shared helper.

  2. lowerUniversalCopy memcpy length changed from i64 to i32. Confirm LLVM::MemcpyOp accepts i32 length.

  3. No MLIR tests for fly.decomposition or swizzle in crd2idx.

  4. MakeViewOpLowering relies on use_empty() for CoordTensor — fragile pattern, add a comment explaining why.

@coderfeli
Copy link
Collaborator

a little conflict @sjfeng1999 after I changed the crd2idx to use in gemm kernel.

@sjfeng1999
Copy link
Contributor Author

Review Comments

Critical

  1. i32 GEP overflow: AddOffsetOpLowering uses i32 offsets for all address spaces including Global. Offsets >2GB will overflow. Should keep i64 for non-BufferDesc pointers.
  2. Null deref in AddOffsetOpLowering: offset.getDefiningOp() returns nullptr for block arguments, then defOp->getOperand(0) crashes. Need a null check.
  3. Buffer-to-buffer copy dropped: lowerCDNA3BufferCopy now rejects both-sides-BufferDesc and both-sides-non-BufferDesc. Old code handled these cases. Potential regression if any kernel uses buffer-to-buffer copy.
  4. CDNA3 MmaAtom C layout: Removed ValM1 from accumulator layout. For mfma with M>16, this changes how accumulator values distribute. Needs correctness verification across all MFMA variants.

Medium

  1. Swizzle logic duplicated between BufferFatPtr::swizzleByteOffset and applySwizzleOnPtr — should extract a shared helper.
  2. lowerUniversalCopy memcpy length changed from i64 to i32. Confirm LLVM::MemcpyOp accepts i32 length.
  3. No MLIR tests for fly.decomposition or swizzle in crd2idx.
  4. MakeViewOpLowering relies on use_empty() for CoordTensor — fragile pattern, add a comment explaining why.
  1. The type of offset in the GEP is coherent with the integer width in the int_tuple offset. It's not always i32.
  2. After rewrite-func-signature pass, int_tuple value can't appear at the block_arguemnts.
  3. It's unproper case anyway. CI will check if there is a regression.
  4. Done, it's a true problem.

@coderfeli coderfeli force-pushed the pr/yodate-to-rocdl-conversion branch from f2f651c to a226da8 Compare March 24, 2026 15:01
@coderfeli
Copy link
Collaborator

@sjfeng1999 rebased on main and fixed a minor run error in the moe and gemm. take a look.

@coderfeli coderfeli merged commit cf3ab62 into main Mar 25, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants