
fix(codegen): materialize transpose scratch tile #1406

Open
high-cloud wants to merge 1 commit into hw-native-sys:main from high-cloud:fix/tile-transpose-scratch

Conversation

@high-cloud
Contributor

Summary

  • Lower 3-arg tile.transpose(src, axis0, axis1) into a 4-arg form with an explicit scratch tile so memory allocation assigns a UB address before PTO codegen.
  • Preserve transposed valid shape metadata and keep Python IR/language wrappers in sync with the internal 4-arg transpose form.
  • Add regression coverage for lower-composite lowering and generated PTO alloc metadata (addr, valid_row, valid_col).
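The lowering in the first bullet can be pictured with a small stand-alone model. This is a minimal sketch, not the pass itself: `Tile`, `Call`, and `lower_transpose` are hypothetical stand-ins for the C++ IR classes and the lowering rule, and giving the scratch tile the transposed shape and the source tile's memory space is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tile:
    """Hypothetical stand-in for an IR tile value."""
    name: str
    shape: Tuple[int, ...]
    memory: str = "UB"


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for an IR call node."""
    op: str
    args: tuple


def lower_transpose(call: Call) -> Call:
    """Rewrite 3-arg tile.transpose into the 4-arg form with a scratch tile."""
    if call.op != "tile.transpose":
        return call
    if len(call.args) == 4:
        # Already carries an explicit scratch tile: pass through unchanged.
        return call
    if len(call.args) != 3:
        raise ValueError("tile.transpose expects 3 or 4 arguments")

    src, axis0, axis1 = call.args
    # Assumption: the scratch tile takes the transposed shape and lives in
    # the same memory space as the source tile.
    scratch_shape = list(src.shape)
    scratch_shape[axis0], scratch_shape[axis1] = (
        scratch_shape[axis1],
        scratch_shape[axis0],
    )
    scratch = Tile(f"{src.name}_scratch", tuple(scratch_shape), src.memory)
    return Call("tile.transpose", (src, scratch, axis0, axis1))
```

Because the scratch tile becomes an ordinary operand after this rewrite, a later memory-allocation step sees it like any other tile, which is the property the summary relies on.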

Testing

  • cmake --build build --parallel
  • python -m pytest tests/ut/ir/transforms/test_lower_composite_ops.py tests/ut/codegen/test_pto_codegen_ops.py::TestTileTransposeCodegen -v
  • python tests/lint/clang_tidy.py --diff-base HEAD (passes; local clang-tidy is 18.1.8 while project expects 21.1.0)
  • git diff --check
  • NPU repro demo passed via task-submit: task_20260519_115111_278459217976

Fixes #1402


@coderabbitai

coderabbitai Bot commented May 19, 2026


Walkthrough

This PR extends pl.tile.transpose to support both 3-argument (tile, axis1, axis2) and 4-argument (tile, scratch_tile, axis1, axis2) forms. The public API maintains 3-argument form only; the 4-argument form is internal, used by composite-op lowering to auto-allocate and manage scratch tiles. Type deduction now preserves valid_shape metadata and handles both forms, while codegen produces valid_row/valid_col operands for dynamically-shaped scratch allocations.

Changes

Tile Transpose Scratch Allocation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Private language API helper for 4-arg transpose | python/pypto/language/op/tile_ops.py | Added the _transpose_with_tmp(tile, tmp, axis1, axis2) helper, which normalizes the axes to ConstInt, emits an IR call to tile.transpose with the scratch tile operand, and wraps the result as a Tile. |
| Type deduction with valid_shape preservation and argument flexibility | src/ir/op/tile_ops/transform.cpp | Updated DeduceTileTransposeType to accept both 3 and 4 arguments, using an axis_base offset to locate the axis operands and validating tmp as a TileType when present. Reworked valid_shape derivation to start from GetValidShape and to swap the axes only when the ranks match. Simplified tile_view initialization in tile.slice, tile.assemble, tile.scatter_update, and tile.set_validshape via a value_or() helper. |
| Composite op lowering with auto-scratch allocation | src/ir/transforms/lower_composite_ops_pass.cpp | Added the LoweringBuilder::CreateTile helper and LowerTransposeRule to lower composite tile.transpose. The rule validates that there are 3 or 4 arguments, passes the 4-arg form through unchanged, and for the 3-arg form derives the target memory from the input tile and creates an intermediate scratch tile before emitting tile.transpose. Registered transpose in the composite-op dispatch table. |
| Python printer support for 4-arg transpose form | src/ir/transforms/python_printer.cpp | Updated IRPythonPrinter to print tile.transpose with 4 arguments as tile._transpose_with_tmp, mapping the internal IR form back to the private helper name. |
| Test coverage for transpose scratch and valid_shape | tests/ut/ir/operators/test_tile_ops.py, tests/ut/ir/transforms/test_lower_composite_ops.py, tests/ut/codegen/test_pto_codegen_ops.py | Added a test that the public API rejects the 4-arg form, an IR visitor helper _CallCollector that gathers calls, a composite-lowering test that validates the 4-arg form with a preallocated scratch tile, and a codegen test that validates pto.ttrans emission and scratch allocation with addr/valid_row/valid_col attributes. |
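The valid_shape handling described for DeduceTileTransposeType can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the C++ implementation: `deduce_transpose_shapes` is a hypothetical name, a missing valid shape is assumed to default to the full shape, and the fallback to the transposed shape on a rank mismatch follows the review comment's description.

```python
from typing import Optional, Sequence, Tuple


def deduce_transpose_shapes(
    shape: Sequence[int],
    valid_shape: Optional[Sequence[int]],
    axis1: int,
    axis2: int,
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (new_shape, new_valid_shape) for a transpose of two axes."""
    rank = len(shape)
    for axis in (axis1, axis2):
        if not 0 <= axis < rank:
            raise ValueError(f"axis {axis} out of range for rank {rank}")

    # Swap the two axes of the static shape.
    new_shape = list(shape)
    new_shape[axis1], new_shape[axis2] = new_shape[axis2], new_shape[axis1]

    # Assumption: with no explicit valid shape, the whole tile is valid.
    valid = list(shape) if valid_shape is None else list(valid_shape)
    if len(valid) == rank:
        # Ranks match, so the same axis swap applies to the valid shape.
        valid[axis1], valid[axis2] = valid[axis2], valid[axis1]
        new_valid = valid
    else:
        # Rank mismatch: fall back to the transposed static shape.
        new_valid = list(new_shape)
    return tuple(new_shape), tuple(new_valid)
```

For a (16, 32) tile whose valid region is (10, 20), swapping axes 0 and 1 gives a (32, 16) result with valid region (20, 10).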

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hw-native-sys/pypto#1395: Both PRs modify tile-type deduction to use the shared GetValidShape(...) helper for correct TileView.valid_shape propagation.

Suggested labels

bug

Suggested reviewers

  • lyfne123


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: materializing a transpose scratch tile for codegen. It clearly identifies the primary fix without excessive detail. |
| Description check | ✅ Passed | The description is directly related to the changeset, detailing the approach (3-arg to 4-arg lowering), preservation of metadata, and testing performed. It provides meaningful context for the fix. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue #1402 by implementing the expected behavior: lowering 3-arg tile.transpose to 4-arg form with explicit scratch tile so pto.alloc_tile receives proper valid_row/valid_col operands, preventing ptoas parsing errors. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the transpose scratch tile materialization: Python API wrapper, IR transform lowering, type deduction, Python printer support, and comprehensive regression tests. No unrelated modifications detected. |


Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so the review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer, or run @coderabbitai review after the pipeline has finished.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ir/op/tile_ops/transform.cpp`:
- Around line 344-349: The CHECK messages for the axis constants are using stale
ordinal names; in the block where axis_base is used to fetch axis1_const and
axis2_const (via As<ConstInt>(args[axis_base]) and As<ConstInt>(args[axis_base +
1])), update the error strings to correctly refer to the 3rd and 4th arguments
respectively (e.g., "tile.transpose requires third argument (axis1) to be a
ConstInt" for axis1_const and "tile.transpose requires fourth argument (axis2)
to be a ConstInt" for axis2_const) so diagnostics reflect the 4-arg transpose
ordering.
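Since the op accepts both a 3-arg and a 4-arg form, the ordinal in each message depends on which form is being checked. The sketch below shows one way to derive the ordinal from axis_base instead of hard-coding it. It is a hypothetical Python model, not the C++ CHECK code; `axis_arg_messages` and `ORDINALS` are invented names, and only the message wording is taken from the comment above.

```python
ORDINALS = ("first", "second", "third", "fourth")


def axis_arg_messages(num_args: int) -> tuple:
    """Build the ConstInt diagnostics for axis1 and axis2.

    axis_base is the index of the first axis operand: 1 for the 3-arg
    form (tile, axis1, axis2), 2 for the 4-arg form
    (tile, tmp, axis1, axis2).
    """
    if num_args not in (3, 4):
        raise ValueError("tile.transpose expects 3 or 4 arguments")
    axis_base = num_args - 2
    return tuple(
        f"tile.transpose requires {ORDINALS[axis_base + offset]} argument "
        f"({name}) to be a ConstInt"
        for offset, name in enumerate(("axis1", "axis2"))
    )
```

With this approach the 4-arg form reports axis1 as the third argument and axis2 as the fourth, while the 3-arg form reports them as the second and third.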

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 68a9cd3e-4e91-41ea-ac55-4d59cef62bcb

📥 Commits

Reviewing files that changed from the base of the PR and between ffbb5ae and 975e770.

📒 Files selected for processing (7)
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/language/op/tile_ops.py
  • src/backend/common/pto_ops_common.cpp
  • src/ir/op/tile_ops/transform.cpp
  • src/ir/transforms/lower_composite_ops_pass.cpp
  • tests/ut/codegen/test_pto_codegen_ops.py
  • tests/ut/ir/transforms/test_lower_composite_ops.py

Comment thread src/ir/op/tile_ops/transform.cpp Outdated
@high-cloud high-cloud force-pushed the fix/tile-transpose-scratch branch 2 times, most recently from 72706b9 to 9f12d8a Compare May 19, 2026 07:55

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ir/op/tile_ops/transform.cpp (1)

337-372: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate tmp against the inferred transpose result.

The 4-arg branch only checks that tmp is a TileType. A scratch tile with a mismatched shape, dtype, or valid shape will still type-check, because the result type is derived entirely from the input. Codegen could then take allocation metadata from tmp that disagrees with the transposed result this op claims to produce.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/op/tile_ops/transform.cpp` around lines 337 - 372, The 4-arg branch
currently only ensures args[1] is a TileType (tmp_type) but must also validate
that this scratch tile matches the transpose result: after computing
axis1/axis2, build the expected new_shape (swap input_shape[axis1] and [axis2])
and expected new_valid_shape (use GetValidShape(tile_type) and swap when sizes
match, otherwise use new_shape), then check tmp_type->dtype_ ==
tile_type->dtype_, tmp_type->shape == new_shape and that tmp_type's valid region
equals or is compatible with new_valid_shape; if any check fails, emit a CHECK
with a clear message referencing tile.transpose 4-arg tmp mismatch so codegen
won't get allocation metadata that disagrees with the derived result.
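The check requested above can be sketched as follows. This is a minimal Python model under stated assumptions, not the project's C++ code: `TileType` here is a hypothetical dataclass, `check_scratch` is an invented name, and "compatible" is interpreted strictly as equality of the valid shapes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TileType:
    """Hypothetical stand-in for the IR tile type."""
    dtype: str
    shape: Tuple[int, ...]
    valid_shape: Optional[Tuple[int, ...]] = None


def _valid(t: TileType) -> Tuple[int, ...]:
    # Assumption: a tile with no explicit valid shape is fully valid.
    return t.shape if t.valid_shape is None else t.valid_shape


def check_scratch(src: TileType, tmp: TileType, axis1: int, axis2: int) -> None:
    """Raise ValueError if tmp disagrees with the transposed result of src."""
    new_shape = list(src.shape)
    new_shape[axis1], new_shape[axis2] = new_shape[axis2], new_shape[axis1]

    valid = list(_valid(src))
    if len(valid) == len(src.shape):
        valid[axis1], valid[axis2] = valid[axis2], valid[axis1]
    else:
        valid = list(new_shape)

    prefix = "tile.transpose 4-arg tmp mismatch: "
    if tmp.dtype != src.dtype:
        raise ValueError(prefix + f"dtype {tmp.dtype} != {src.dtype}")
    if tmp.shape != tuple(new_shape):
        raise ValueError(prefix + f"shape {tmp.shape} != {tuple(new_shape)}")
    if _valid(tmp) != tuple(valid):
        raise ValueError(prefix + f"valid shape {_valid(tmp)} != {tuple(valid)}")
```

A stricter or looser notion of compatibility (for example, allowing a scratch tile whose valid region is larger) would change only the last comparison.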
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a488c1a4-b94c-45ce-a598-e61b05a175bf

📥 Commits

Reviewing files that changed from the base of the PR and between 975e770 and 9f12d8a.

📒 Files selected for processing (7)
  • python/pypto/language/op/tile_ops.py
  • src/ir/op/tile_ops/transform.cpp
  • src/ir/transforms/lower_composite_ops_pass.cpp
  • src/ir/transforms/python_printer.cpp
  • tests/ut/codegen/test_pto_codegen_ops.py
  • tests/ut/ir/operators/test_tile_ops.py
  • tests/ut/ir/transforms/test_lower_composite_ops.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/ut/ir/transforms/test_lower_composite_ops.py
  • src/ir/transforms/lower_composite_ops_pass.cpp
  • tests/ut/codegen/test_pto_codegen_ops.py

@high-cloud high-cloud force-pushed the fix/tile-transpose-scratch branch from 9f12d8a to 1f8a6a0 Compare May 19, 2026 08:07


Development

Successfully merging this pull request may close these issues.

[Bug] Incore pl.transpose emits pto.alloc_tile with dynamic valid shape but no valid_row operand

1 participant