Skip to content

fix(codegen): require row_major layout for tile.cast (pto.tcvt)#1559

Merged
lyfne123 merged 3 commits into
hw-native-sys:mainfrom
Little-oil:issue-1549-tcvt-colmajor-narrowing
May 29, 2026
Merged

fix(codegen): require row_major layout for tile.cast (pto.tcvt)#1559
lyfne123 merged 3 commits into
hw-native-sys:mainfrom
Little-oil:issue-1549-tcvt-colmajor-narrowing

Conversation

@Little-oil
Copy link
Copy Markdown
Contributor

@Little-oil Little-oil commented May 27, 2026

Summary

pto.tcvt (the lowering of tile.cast) silently mis-orders elements when its source tile is col_major — e.g. a reshaped [n, 1] index vector narrowed i32 -> i16. The same cast on a row_major source is correct, so the failure is silent wrong output with no diagnostic. This is what produced reversed scatter rows in the FP16 tensor.scatter_update lowering (issue #1549).

PyPTO already drives this exact class of ISA constraint through the ResolveBackendOpLayouts pass, which reshapes a [n, 1] col_major vector to [1, n] row_major around a constrained op and restores the layout afterwards. tile.cast simply had no layout spec, so it was never repaired.

Changes

  • src/backend/common/pto_ops_common.cpp: register tile.cast with set_input_layout(0, row_major) + set_output_layout(row_major), mirroring tile.rsqrt / tile.cmps / tile.sort32. ResolveBackendOpLayouts now repairs every col_major caller generically. Row-major callers are unaffected (no repair, zero overhead).
  • tests/ut/ir/transforms/test_resolve_backend_op_layouts_pass.py: pass-level regression for a col_major [16, 1] i32 -> i16 cast being repaired through a [1, 16] row_major reshape.
  • tests/st/runtime/ops/test_cast.py: new end-to-end ST — a col_major [N, 1] i32 -> i16 narrow (the [Bug] tile.cast (pto.tcvt) narrowing mis-orders elements when the source tile is col_major #1549 regression, must preserve element order) plus a row_major [1, N] control case.
  • docs/en + docs/zh-cn 20-resolve_backend_op_layouts.md: list tile.cast among the constrained ops.
  • src/ir/transforms/op_conversion_registry.cpp: comment-only trim of the scatter_update lowering — its i32-compute / narrow-at-the-end design is kept for the alignment benefit. No behavior change: this path narrows only the row-major [n, d] flat index, so it never feeds a col_major source to tile.cast, and the new layout spec is a no-op for it.

Testing

  • New + existing ResolveBackendOpLayouts UTs pass (5/5)
  • tests/ut/ir/transforms/ + tests/ut/codegen/: 1687 passed, 26 skipped
  • tests/ut/ir/operators/test_tile_ops.py: 237 passed
  • Pre-commit hooks (clang-format, cpplint, ruff, pyright, markdownlint) pass
  • On-device ST tests/st/runtime/ops/test_cast.py::TestCast::test_tile_cast_col_major_narrow (hardware, to be confirmed by reviewer) — the direct col_major-cast regression for this fix.

Note: test_scatter_update.py::...::test_tile_scatter_update_fp16 is not a target of this PR. That path was already worked around in #1537 (narrow only the row-major [n, d] flat index), so this PR does not change its codegen and it needs no re-confirmation here.

Related Issues

Fixes #1549

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR constrains tile.cast narrowing to require row-major layout on both input and output, registers the constraint in backend op initialization, adds a test verifying the ResolveBackendOpLayouts pass repairs column-major sources via reshape, and updates documentation to reflect the new constraint across English and Chinese developer guides.

Changes

tile.cast row-major layout constraint and repair verification

Layer / File(s) Summary
Backend tile.cast registration with row-major constraints
src/backend/common/pto_ops_common.cpp
tile.cast registration now guards against exclude_ops, explicitly sets input operand 0 and output layout to row_major, and adds rationale comments about pto.tcvt mis-ordering on col_major sources and expected caller-side reshape repair.
Pass test and narrowing behavior documentation
tests/ut/ir/transforms/test_resolve_backend_op_layouts_pass.py, src/ir/transforms/op_conversion_registry.cpp
New test test_rewrites_column_vector_cast_through_row_major_reshape verifies the pass transforms a [N, 1] column-major cast by reshaping to [1, N] row-major, casting, and reshaping back. Comment in tensor.scatter_update narrowing clarifies row-major layout handling of flat-index i32 → i16 narrowing.
Documentation updates
docs/en/dev/passes/20-resolve_backend_op_layouts.md, docs/zh-cn/dev/passes/20-resolve_backend_op_layouts.md
Backend-registered layout constraints documentation now includes tile.cast in the row-major-required ops example list for both English and Chinese guides.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • hw-native-sys/pypto#729: Modifies RegisterPTOOps to register/constrain backend tile ops with ir::TileLayout::row_major layout expectations, directly feeding the same ResolveBackendOpLayouts rewrite mechanism.
  • hw-native-sys/pypto#1231: Extends the ResolveBackendOpLayouts pass to coerce inputs/outputs into required row-major layouts via reshape, the same repair mechanism exercised by the new test.
  • hw-native-sys/pypto#588: Introduces the core ResolveBackendOpLayouts reshape-based repair mechanism that this PR now leverages for tile.cast column-major source handling.

Suggested labels

bug

Poem

🐰 A rabbit hops through column-major woes,
Reshaping cast-ops where the mis-order grows.
Row-major now guards where chaos dwelt,
While the pass stitches tiles as they're felt.
Hopping toward order, one layout at a time! 🏃‍♂️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing tile.cast to require row_major layout in the PTO codegen layer.
Linked Issues check ✅ Passed The PR fully addresses issue #1549 by registering tile.cast with row_major layout constraints, adding a regression test, updating documentation, and including a comment refresh on the scatter_update lowering.
Out of Scope Changes check ✅ Passed All changes are in scope: layout registration, regression test, documentation updates, and a comment-only refresh of related lowering code—all directly supporting the #1549 fix.
Description check ✅ Passed The pull request description comprehensively explains the bug fix addressing mis-ordering in tile.cast for col_major sources, details all changes across multiple files, and directly relates to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request resolves an issue where tile.cast (implemented via pto.tcvt) mis-orders elements when its source tile is column-major (e.g., a reshaped [n, 1] index vector). It registers row-major layout constraints for both input and output of tile.cast in pto_ops_common.cpp, allowing the ResolveBackendOpLayouts pass to automatically repair column-major callers by reshaping them to row-major before the cast and back to column-major afterward. The PR also updates relevant documentation, refactors code comments in op_conversion_registry.cpp, and adds a regression unit test to verify the fix. There are no review comments to address.

Youhezhen added 2 commits May 29, 2026 09:31
Fixes hw-native-sys#1549

pto.tcvt silently mis-orders elements when its source tile is col_major
(e.g. a reshaped [n, 1] index vector narrowed i32 -> i16): the same cast
on a row_major source is correct. This produced reversed scatter rows in
the FP16 tensor.scatter_update lowering.

Register tile.cast with set_input_layout(0, row_major) and
set_output_layout(row_major), mirroring tile.rsqrt / tile.cmps / tile.sort32.
ResolveBackendOpLayouts then repairs every col_major caller generically by
reshaping [n, 1] -> [1, n] row_major around the cast and restoring the
original layout afterwards. Row-major callers are unaffected (no repair).

Add a ResolveBackendOpLayouts regression test for a col_major [16, 1]
i32 -> i16 cast, and list tile.cast among the constrained ops in the pass
docs (en + zh-cn).

Also refresh the scatter_update lowering comment: its i32-compute /
narrow-at-the-end design is retained for the alignment and canonical-layout
benefit, while the col_major mis-ordering it used to dodge is now covered by
the general tile.cast layout spec.
… sources

Covers hw-native-sys#1549: narrows i32 -> i16 on a col_major [N, 1] view (reshaped from
[1, N]) and, as a control, on a row_major [1, N] tile. The col_major case is
the regression — element order must be preserved.

NOTE: still verifying that the tile.cast row_major layout spec actually engages
for this reshape-sourced col_major path in the full pipeline; the repair was not
observed to fire in a device-free compile, which needs follow-up before relying
on this ST as a passing gate.
@Little-oil Little-oil force-pushed the issue-1549-tcvt-colmajor-narrowing branch from a96c187 to dc183de Compare May 29, 2026 01:31
Condense the tile.cast row_major rationale in pto_ops_common.cpp to match
the neighboring tile.rsqrt comment style, and drop the historical narration
from the scatter_update flat-index comment in op_conversion_registry.cpp,
keeping only the alignment rationale that still describes the code.
@lyfne123 lyfne123 merged commit 5a2aaec into hw-native-sys:main May 29, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug] tile.cast (pto.tcvt) narrowing mis-orders elements when the source tile is col_major

2 participants