fix(scatter): tscatter mask-form PTOAS syntax + cmp blayout roundtrip (#1498) #1513

Open
Little-oil wants to merge 8 commits into hw-native-sys:main from Little-oil:feat/scatter-review-fixes

Conversation

@Little-oil
Contributor

@Little-oil Little-oil commented May 26, 2026

Post-merge fixes for the scatter operators added in #1426, plus a fix for the
cmp/cmps TileView round-trip gap (#1498) that had forced a related test to be skipped.

1. pto.tscatter mask-form PTOAS syntax (functional bug)

The mask-form codegen emitted the maskPattern attribute as a trailing dict
after outs(...). PTOAS rejects that with "expected ',' after src operand in ins(...)".
The attribute must sit inside ins() after the src operand,
exactly like pto.tgather's mask form:

pto.tscatter ins(%src, {maskPattern = #pto.mask_pattern<P0101>} : src_ty) outs(%dst : dst_ty)

The codegen UT now asserts maskPattern appears inside ins() (before outs())
so the layout cannot regress.
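
To make the operand layout concrete, here is a minimal Python sketch of the line the codegen is expected to emit and of the kind of position check described for the UT. `format_tscatter_mask` and `mask_inside_ins` are hypothetical helpers written for illustration, not functions from the repository, and the `bad` string is only a plausible reconstruction of the rejected trailing-dict layout.

```python
def format_tscatter_mask(src: str, dst: str, pattern: str, src_ty: str, dst_ty: str) -> str:
    """Build a mask-form pto.tscatter line with maskPattern inside ins()."""
    mask_attr = f"{{maskPattern = #pto.mask_pattern<{pattern}>}}"
    return f"pto.tscatter ins({src}, {mask_attr} : {src_ty}) outs({dst} : {dst_ty})"


def mask_inside_ins(line: str) -> bool:
    """Return True when maskPattern appears after 'ins(' and before 'outs('."""
    ins_pos = line.find("ins(")
    mask_pos = line.find("maskPattern")
    outs_pos = line.find("outs(")
    if -1 in (ins_pos, mask_pos, outs_pos):
        return False
    return ins_pos < mask_pos < outs_pos


good = format_tscatter_mask("%src", "%dst", "P0101", "src_ty", "dst_ty")
# Illustrative reconstruction of the rejected layout (attribute dict after outs).
bad = "pto.tscatter ins(%src : src_ty) outs(%dst : dst_ty) {maskPattern = #pto.mask_pattern<P0101>}"

print(good)
print(mask_inside_ins(good), mask_inside_ins(bad))
```

A substring-position check like this is deliberately loose: it pins the ordering of `ins(`, `maskPattern`, and `outs(` without depending on the exact type spelling.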

2. cmp/cmps packed-mask TileView blayout roundtrip — Closes #1498

The python printer elides a TileView field when it matches
GetImplicitTileView(tile_type.shape_, ...) (implicit view from the physical
tile shape). The text parser instead recomputed the implicit blayout from
valid_shape when present, desynchronising the two for packed-mask tiles: a
cmp/cmps result with physical shape [16, 8] but valid_shape [16, 1] had its
row_major blayout omitted by the printer, while the parser inferred
col_major from valid_shape's cols==1, so print->parse failed with
"TileView blayout mismatch".

Fix: the parser now infers the implicit defaults from the physical tile shape
(falling back to valid_shape only when the shape is unavailable), matching the
printer. This un-skips TestConvertScatterOp::test_scatter_conversion, which
exercises the path through the scatter DPS-preserve blend.
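
The desynchronisation can be reproduced with a toy model. The sketch below assumes a simplified default rule (a shape with a single column defaults to col_major, anything else to row_major), which is consistent with the example above but is not the real GetImplicitTileView; all function names are hypothetical.

```python
from typing import Optional, Sequence


def implicit_blayout(shape: Sequence[int]) -> str:
    """Toy default rule (assumption): single-column shapes default to col_major."""
    return "col_major" if shape[-1] == 1 else "row_major"


def printed_blayout(tile_shape: Sequence[int], blayout: str) -> Optional[str]:
    """Printer: elide the field when it equals the default for the physical shape."""
    return None if blayout == implicit_blayout(tile_shape) else blayout


def parse_blayout_old(tile_shape, valid_shape, printed: Optional[str]) -> str:
    """Old parser: the default is recomputed from valid_shape when it is present."""
    if printed is not None:
        return printed
    return implicit_blayout(valid_shape if valid_shape is not None else tile_shape)


def parse_blayout_new(tile_shape, valid_shape, printed: Optional[str]) -> str:
    """Fixed parser: default from the physical shape, valid_shape only as fallback."""
    if printed is not None:
        return printed
    return implicit_blayout(tile_shape if tile_shape is not None else valid_shape)


tile_shape, valid_shape = [16, 8], [16, 1]
printed = printed_blayout(tile_shape, "row_major")  # elided, so None

print(printed)
print(parse_blayout_old(tile_shape, valid_shape, printed))  # disagrees with the printer
print(parse_blayout_new(tile_shape, valid_shape, printed))  # agrees with the printer
```

The point of the model is only that printer and parser must consult the same shape when deciding what "implicit" means; which shape is chosen matters less than both sides agreeing.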

3. Index-form review fixes

  • Guard the INT16 flat-index range in the tensor.scatter lowering (n*cols <= 32768)
    so an oversized 2-byte tile fails loudly instead of overflowing to wrong addresses.
  • INTERNAL_CHECK that the tscatter src/indexes type annotations are both present
    or both absent (a one-sided annotation is a codegen bug).
  • Document the duplicate-index ascending last-wins ordering as a pto.tscatter
    ABI guarantee the lowering and ST reference rely on.
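
The duplicate-index ordering in the last bullet can be stated as a small host-side reference. This is a sketch over flat Python lists, assuming flattened destination indexes; `scatter_last_wins` is a hypothetical name, not the ST reference from the test suite.

```python
def scatter_last_wins(dst, src, indexes):
    """Write src[k] to dst[indexes[k]] in ascending k; later writes overwrite earlier ones."""
    out = list(dst)
    for k, flat in enumerate(indexes):
        out[flat] = src[k]
    return out


# Index 2 is targeted twice; the element with the higher source position wins.
result = scatter_last_wins([0, 0, 0, 0], [10, 20, 30], [2, 0, 2])
print(result)  # [20, 0, 30, 0]
```

Iterating in ascending element order is what makes the result deterministic when indexes collide, which is why both the lowering and the reference can rely on it.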

Tests

  • Index-form ST (TestScatterIndexForm) across the dst/src + indexes dtype matrix
    (fp32 / int32 / fp16 / bf16 / int16), plus the repeated-index last-wins case and
    the single-row regression for #1586 ([Bug] pl.tensor.scatter on 1-row src tile triggers tile.ci cols!=1 ISA check).
  • Mask-form ST (TestScatterMaskForm, P0101/P1010, plus a chained P0101→P1010
    reassembly case) on the A2/A3 backend (Ascend910B); A5/Ascend950 rejects the
    mask form. dst is zero-init so the expected unselected columns hold regardless
    of zero-vs-preserve semantics.
  • test_scatter_conversion un-skipped, now passing under the autouse roundtrip instrument.
  • Full tests/ut/ir/transforms suite (all under the roundtrip instrument): 1330 passed,
    25 skipped; parser/printer/type_resolver suites green.

⚠️ Temporarily skipped (tracked, not blockers for this PR)

  • Index-form 2-byte path (test_scatter_fp16 / test_scatter_bf16 / test_scatter_int16):
    currently failing on device due to a pto-isa bug in the 2-byte (fp16/bf16/int16)
    lowering path, not in this PR's codegen. Skipped via @pytest.mark.skip pending a
    pto-isa fix; the fp32/int32 (4-byte) index-form cases pass. Will be re-enabled once
    pto-isa lands the fix.
  • Mask-form chain — test_scatter_mask_chain: the chained P0101→P1010 scatter
    into a single dst (RoPE even/odd reassembly) has a failure still being root-caused;
    skipped for now. The single-pattern P0101/P1010 mask cases pass.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35e914cb-b279-4d0f-b2d6-6e93460e86a1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Fixes packed-mask TileView default inference to use physical tile_shape, adjusts pto.tscatter MLIR emission (maskPattern placement and operand typing), adds an INT16 overflow check during tensor.scatter lowering, and adds/updates runtime and unit tests for mask-form scatter and conversion roundtrip.

Changes

Scatter operations: parser, codegen, and testing stack

| Layer / File(s) | Summary |
| --- | --- |
| Parser type resolution: implicit TileView defaults from physical tile_shape (python/pypto/language/parser/type_resolver.py) | TypeResolver._resolve_tileview now infers implicit TileView defaults from physical tile_shape when available, falling back to valid_shape only if tile_shape is missing. |
| Scatter semantics documentation and collision ordering (src/ir/op/tile_ops/scatter.cpp) | Expands tile.scatter/pto.tscatter header to declare duplicate-index collision ordering: ascending element order with last/higher-index write wins (ABI guarantee). |
| PTO MLIR codegen: scatter typing and mask operand placement (src/backend/common/pto_ops_common.cpp) | Enforce that src and indexes type annotations are either both present or both absent for tile.scatter emission; for tile.scatter_mask emit {maskPattern = #pto.mask_pattern<...>} inside ins(...) immediately after src, with optional : <src_type> annotation inside ins(...). |
| Scatter lowering: cols semantics and INT16 overflow guard (src/ir/transforms/op_conversion_registry.cpp) | Document cols as the flattened destination column count shared across related tiles; add runtime CHECK that n * cols <= 32768 when index tile dtype is INT16. |
| Mask-form scatter runtime tests (P0101, P1010) (tests/st/runtime/test_scatter.py) | Add mask-form scatter tests covering P0101 (even columns) and P1010 (odd columns), including spec builder, two @pl.program implementations, test-case classes computing expected outputs, and a pytest suite pinned to Ascend910B. |
| Unit test updates: MLIR operand validation and roundtrip re-enable (tests/ut/codegen/test_pto_codegen_ops.py, tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py) | Strengthen test_tile_scatter_mask_form_codegen to assert maskPattern appears inside ins(...) before outs(...). Re-enable previously skipped test_scatter_conversion and add comments documenting the roundtrip fix. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hw-native-sys/pypto#1426: Related scatter-operator implementation and prior MLIR codegen/mask-form handling this PR builds upon.

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 A scatter of masks, a shape that's true,
Print and parse in perfect symmetry grew,
With tile_shape now guiding the way,
Packed predicates roundtrip, hooray hooray!
And P0101, P1010 dance in the test,
Overflow guards put scatter to rest. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately and specifically describes the main changes: fixing tscatter mask-form PTOAS syntax and addressing the cmp/cmps TileView blayout roundtrip issue (#1498). |
| Linked Issues check | ✅ Passed | The PR successfully implements the fix for issue #1498 by updating the parser to infer implicit TileView defaults from physical tile shape instead of valid_shape, and includes all related index-form improvements and test coverage as described. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly scoped to the stated objectives: PTOAS syntax fix, blayout roundtrip fix, INT16 range guard, type-annotation symmetry check, and comprehensive test additions for mask-form scatter and roundtrip verification. |
| Description check | ✅ Passed | The pull request description comprehensively documents the changes made, including three functional fixes with detailed explanations of the issues and solutions. |

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request fixes a print-to-parse roundtrip failure for packed-mask tiles by resolving implicit tile view defaults using the physical tile shape instead of the valid shape. It also implements the mask-form scatter (pto.tscatter) for A2/A3 backends, updates the PTO codegen to place the mask pattern inside the input arguments list, and adds comprehensive tests. The feedback suggests using the INTERNAL_CHECK_SPAN macro to automatically include source location information in the error message and dynamically printing the destination data type in the INT16 index range guard check to keep the error message accurate.

Comment thread src/backend/common/pto_ops_common.cpp Outdated
Comment thread src/ir/transforms/op_conversion_registry.cpp Outdated
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 26, 2026
- tscatter mask form: emit maskPattern inside ins() after src, before the type
  annotation (ins(%src, {maskPattern...} : src_ty) outs(%dst)) — device-verified.
- INTERNAL_CHECK -> INTERNAL_CHECK_SPAN(op->span_) for the scatter type-annotation
  symmetry check (gemini).
- INT16 flat-index guard: print the actual element dtype instead of hardcoding
  "2-byte" (INT16 covers 1- and 2-byte; gemini).

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ir/transforms/op_conversion_registry.cpp`:
- Around line 1415-1420: Replace the unsafe multiplication CHECK(n * cols <=
32768) with a division-based bound to avoid signed int64_t overflow: define a
constant like kMaxFlat = 32767 (or 32768 per desired semantics), compute int64_t
max_rows = (cols == 0 ? kMaxFlat : kMaxFlat / cols), and then CHECK(n <=
max_rows). In the CHECK error message (the same CHECK site in
op_conversion_registry.cpp) do not recompute n*cols; instead report n, cols and
the computed max_rows (or kMaxFlat) to explain the limit and suggest splitting
the scatter.
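
The arithmetic behind the division-based bound can be sketched as follows. Python integers do not overflow, so this only illustrates why the two checks agree for positive cols; the overflow concern itself applies to the C++ int64_t product. `K_MAX_FLAT`, `max_rows_for`, and `check_int16_flat_range` are hypothetical names, and 32768 is taken from the PR description (the review leaves 32767 vs 32768 open).

```python
K_MAX_FLAT = 32768  # assumption: limit from the PR description


def max_rows_for(cols: int, k_max_flat: int = K_MAX_FLAT) -> int:
    """For cols > 0, the largest n with n * cols <= k_max_flat, computed without the product.

    cols == 0 falls back to k_max_flat, mirroring the review suggestion.
    """
    return k_max_flat if cols == 0 else k_max_flat // cols


def check_int16_flat_range(n: int, cols: int) -> None:
    """Raise when the flat index range would exceed the limit; report n, cols, max_rows."""
    max_rows = max_rows_for(cols)
    if n > max_rows:
        raise ValueError(
            f"INT16 flat index out of range: n={n}, cols={cols}, "
            f"max rows for this cols={max_rows}; consider splitting the scatter"
        )


# The division form accepts exactly the same (n, cols) pairs as n * cols <= K_MAX_FLAT.
for cols in range(1, 65):
    for n in (1, 511, 512, 513, 4096, 32768, 32769):
        assert (n <= max_rows_for(cols)) == (n * cols <= K_MAX_FLAT)

check_int16_flat_range(4096, 8)  # 4096 * 8 == 32768, accepted
```

Floor division is safe here because, for positive integers, n <= K // cols holds exactly when n * cols <= K.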

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 57ac265f-2584-4af3-bf24-4f84ecbbba35

📥 Commits

Reviewing files that changed from the base of the PR and between 9e0e3b9 and 5bec213.

📒 Files selected for processing (2)
  • src/backend/common/pto_ops_common.cpp
  • src/ir/transforms/op_conversion_registry.cpp

Comment thread src/ir/transforms/op_conversion_registry.cpp Outdated
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 26, 2026
- INT16 scatter flat-index guard: bound rows via division (kMaxFlat/cols)
  instead of n*cols to avoid signed int64 overflow; handle cols==0
@Little-oil Little-oil force-pushed the feat/scatter-review-fixes branch from 668210d to 96ca32c Compare May 26, 2026 07:29
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 27, 2026
- tscatter mask form: emit maskPattern inside ins() after src, before the type
  annotation (ins(%src, {maskPattern...} : src_ty) outs(%dst)) — device-verified.
- INTERNAL_CHECK -> INTERNAL_CHECK_SPAN(op->span_) for the scatter type-annotation
  symmetry check (gemini).
- INT16 flat-index guard: print the actual element dtype instead of hardcoding
  "2-byte" (INT16 covers 1- and 2-byte; gemini).
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 27, 2026
- INT16 scatter flat-index guard: bound rows via division (kMaxFlat/cols)
  instead of n*cols to avoid signed int64 overflow; handle cols==0
@Little-oil Little-oil force-pushed the feat/scatter-review-fixes branch from 96ca32c to 8de941a Compare May 27, 2026 08:18
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 28, 2026
- tscatter mask form: emit maskPattern inside ins() after src, before the type
  annotation (ins(%src, {maskPattern...} : src_ty) outs(%dst)) — device-verified.
- INTERNAL_CHECK -> INTERNAL_CHECK_SPAN(op->span_) for the scatter type-annotation
  symmetry check (gemini).
- INT16 flat-index guard: print the actual element dtype instead of hardcoding
  "2-byte" (INT16 covers 1- and 2-byte; gemini).
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 28, 2026
- INT16 scatter flat-index guard: bound rows via division (kMaxFlat/cols)
  instead of n*cols to avoid signed int64 overflow; handle cols==0
@Little-oil Little-oil force-pushed the feat/scatter-review-fixes branch from 8de941a to 3198967 Compare May 28, 2026 07:35
@Little-oil
Contributor Author

blocked by PTO-ISA

@Little-oil Little-oil moved this to Blocked in pto project May 29, 2026
Youhezhen added 7 commits May 30, 2026 16:02
…hw-native-sys#1498)

The python printer elides a TileView field when it matches
GetImplicitTileView(tile_type.shape_, ...) — i.e. the implicit view derived
from the *physical* tile shape. The text parser, however, recomputed the
implicit blayout/slayout/fractal from `valid_shape` when it was given,
desynchronising the two for packed-mask tiles.

A cmp/cmps result has physical shape e.g. [16, 8] but valid_shape [16, 1]:
the printer omits its (row_major) blayout, while the parser saw valid_shape's
cols==1 and filled col_major, so print->parse failed structural equality with
"TileView blayout mismatch". Infer the implicit defaults from the physical
tile shape (falling back to valid_shape only when the shape is unavailable),
matching the printer.

Un-skips TestConvertScatterOp::test_scatter_conversion, which exercises this
path via the scatter DPS-preserve blend and now round-trips cleanly.
…view fixes

Mask-form codegen: pto.tscatter requires the maskPattern attribute *inside*
ins() after the src operand (same shape as pto.tgather's mask form), e.g.
`ins(%src, {maskPattern = #pto.mask_pattern<P0101>} : src_ty) outs(%dst)`.
The previous trailing-attribute form made PTOAS fail with "expected ',' after
src operand in ins(...)". The codegen UT now asserts maskPattern appears inside
ins() (before outs()) so the layout can't regress.

Also addresses review feedback on the index form:
- Guard the INT16 flat-index range in the tensor.scatter lowering (n*cols must
  stay <= 32768) so an oversized 2-byte tile fails loudly instead of silently
  overflowing to wrong destination addresses.
- Add an INTERNAL_CHECK that the src/indexes type annotations are both present
  or both absent (a one-sided annotation is a codegen bug, not valid input).
- Document the duplicate-index ascending last-wins ordering as a pto.tscatter
  ABI guarantee that the lowering and ST reference both rely on.

Tests: add mask-form ST (TestScatterMaskForm, P0101/P1010) on the A2/A3
backend (Ascend910B); A5/Ascend950 rejects the mask form. dst is zero-init so
the expected unselected columns are correct regardless of whether tscatter
zeros or preserves them.
- tscatter mask form: emit maskPattern inside ins() after src, before the type
  annotation (ins(%src, {maskPattern...} : src_ty) outs(%dst)) — device-verified.
- INTERNAL_CHECK -> INTERNAL_CHECK_SPAN(op->span_) for the scatter type-annotation
  symmetry check (gemini).
- INT16 flat-index guard: print the actual element dtype instead of hardcoding
  "2-byte" (INT16 covers 1- and 2-byte; gemini).
- INT16 scatter flat-index guard: bound rows via division (kMaxFlat/cols)
  instead of n*cols to avoid signed int64 overflow; handle cols==0
Fold the RoPE even/odd reassembly repro into TestScatterMaskForm: write
two compact inputs into one dst by chaining P0101 then P1010 mask
scatters, pinning that the second scatter preserves the first's writes
(dst[:, 0::2] = even, dst[:, 1::2] = odd). even/odd use disjoint positive
ranges so a swapped pattern or clobbered column is caught.
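
A host-side reference for the chained case might look like the sketch below. It assumes preserve semantics for unselected columns (the behaviour the chain test is meant to pin, and which is still being root-caused on device) and the P0101 = even / P1010 = odd column mapping stated in the walkthrough; `scatter_mask` is a hypothetical helper over nested Python lists.

```python
def scatter_mask(dst, src, pattern):
    """Write compact src columns into even (P0101) or odd (P1010) dst columns.

    Unselected dst columns are preserved (assumption for this reference).
    """
    start = {"P0101": 0, "P1010": 1}[pattern]
    out = [row[:] for row in dst]
    for r, src_row in enumerate(src):
        out[r][start::2] = src_row  # strided slice assignment, lengths must match
    return out


# Disjoint positive ranges so a swapped pattern or a clobbered column is visible.
even = [[1, 2], [3, 4]]
odd = [[101, 102], [103, 104]]
dst = [[0] * 4 for _ in range(2)]

step1 = scatter_mask(dst, even, "P0101")
step2 = scatter_mask(step1, odd, "P1010")
print(step1)  # [[1, 0, 2, 0], [3, 0, 4, 0]]
print(step2)  # [[1, 101, 2, 102], [3, 103, 4, 104]]
```

With a zero-initialised dst the first step is correct under either zero or preserve semantics; only the second step distinguishes them, which is what makes the chained case a stricter test than the single-pattern ones.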
@Little-oil Little-oil force-pushed the feat/scatter-review-fixes branch from fc39366 to 24a053a Compare May 30, 2026 08:12
… / WIP)

Skip test_scatter_fp16/bf16/int16: the index-form 2-byte path currently
fails on device due to a pto-isa bug (not this PR's codegen); re-enable
once pto-isa lands the fix. The fp32/int32 4-byte cases still run.

Skip test_scatter_mask_chain: the chained P0101->P1010 reassembly into a
single dst is still being root-caused; the single-pattern P0101/P1010
mask cases still run.
Labels

None yet

Projects

Status: Blocked

Development

Successfully merging this pull request may close these issues.

[Bug] cmp/cmps packed-mask result TileView loses its blayout on print->parse roundtrip

1 participant