test(runtime): Add runtime regression test for #1352 Path 1 by luohuan19 · Pull Request #1355 · hw-native-sys/pypto

luohuan19 · 2026-05-13T03:51:13Z

Summary

Add a runtime regression test reproducing the gate_up_silu pattern from the Qwen3-32B decode kernel which triggered the acc→acc pto.tmov bug on Ascend 910B (#1352).

The test uses the exact prolog-then-pipeline pattern with two independent accumulators (gate_acc, up_acc) under pl.at(CORE_GROUP, split=UP_DOWN), which forces the AIC/AIV split and exposes the MemRef-base mismatch on the AIC function — the root condition that produces invalid acc→acc pto.tmov ops.

This is the runtime-level regression test for Path 1 (codegen-level elision) of issue #1352. The codegen-level fix and a codegen-only regression test are bundled in this branch (the fix had not yet been pushed upstream).

Changes

tests/st/runtime/test_pipeline_matmul_acc.py — new runtime regression test reproducing the gate/up dual-accumulator pipeline kernel that triggered ptoas error: 'pto.tmov' op expects a supported tmov address-space pair.

Testing

Without the Path 1 fix: kernel fails to compile (ptoas rejects acc→acc pto.tmov).
With the Path 1 fix in src/backend/common/pto_ops_common.cpp: kernel compiles and produces numerically correct results (rtol=1e-3, atol=1e-3).

Related Issues

Refs #1352 (Path 1 — short-term codegen-level fix).

…w-native-sys#1352) Ascend 910B has no hardware path to copy data between disjoint L0C (accumulator) addresses; ptoas rejects any such pto.tmov with: 'pto.tmov' op expects a supported tmov address-space pair for this target These acc→acc tile.move ops are always semantic no-ops: they arise when MemoryReuse fails to unify MemRef bases across nested pipelined matmul_acc loops — YieldFixupMutator inserts them as fixups, but AllocateMemoryAddr then assigns different physical addresses to the un-unified bases, producing an acc@0 → acc@16384 → acc@0 round-trip. The prior fix (e4ddc66, hw-native-sys#1310) only elided same-space same-address moves. Extend the condition to also elide when both src and dst are MemorySpace::Acc, regardless of byte_offset. Co-authored-by: Cursor <cursoragent@cursor.com>

…peline loops Adds test_pipeline_matmul_acc_no_acc_to_acc_tmov to TestTileMoveAccNoopElision, covering the pattern where pl.pipeline(stage=2) + Mat-resident matmul_acc causes AutoTileMatmulL0 to insert inner K-loops whose _l0_c IterArg bases may not be unified by MemoryReuse, exposing acc→acc tile.move that ptoas rejects (hw-native-sys#1352). Co-authored-by: Cursor <cursoragent@cursor.com>

Add a runtime regression test that reproduces the gate_up_silu pattern from the Qwen3-32B decode kernel which triggered the acc→acc pto.tmov bug (hw-native-sys#1352). The test exercises Path 1 of the fix (codegen-level elision of acc→acc pto.tmov in src/backend/common/pto_ops_common.cpp). The test uses the exact prolog-then-pipeline pattern with two independent accumulators (gate_acc, up_acc) under pl.at(CORE_GROUP, split=UP_DOWN), which forces the AIC/AIV split and exposes the MemRef-base mismatch on the AIC function. Without the Path 1 fix, ptoas rejects the generated kernel with "'pto.tmov' op expects a supported tmov address-space pair". With the fix, the kernel compiles and produces numerically correct results. Refs hw-native-sys#1352

coderabbitai · 2026-05-13T03:51:26Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

The PR elides pto.tmov in tile.move when both source and destination are in the Acc memory space (in addition to same-address elision), aliases the destination SSA to the source via SetCurrentResultBuf(), and adds unit and runtime regression tests exercising pipeline matmul_acc scenarios.

Changes

Acc→Acc tile.move no-op elision

Layer / File(s)	Summary
tile.move acc→acc elision implementation `src/backend/common/pto_ops_common.cpp`	Expanded inline documentation and updated codegen to compute `both_acc`; elides `pto.tmov` when `same_address
Runtime dual-accumulator pipeline test `tests/st/runtime/test_pipeline_matmul_acc.py`	New `TestPipelineMatmulAccGateUp` runtime test building gate/up `pl.matmul_acc` pipelines, enforcing consumption ordering with `pl.add(..., 0.0)`, multiplying accumulators, casting to BF16, and validating output against FP32 reference.
Codegen unit test for acc→acc tmov elision `tests/ut/codegen/test_pto_codegen_ops.py`	`_generate_mlir` updated to concatenate all InCore variants' MLIR; added `test_pipeline_matmul_acc_no_acc_to_acc_tmov()` which codegens a pipeline matmul_acc and asserts no acc→acc `pto.tmov` appears.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

[Pass Bug] MemoryReuse fails to unify acc MemRef bases across nested pipelined matmul_acc loops, producing invalid pto.tmov acc→acc #1352: Directly related; implements the acc→acc pto.tmov elision described in the issue.

Possibly related PRs

hw-native-sys/pypto#1312: Related work refining tile.move codegen and adding codegen tests for acc→acc elision.
hw-native-sys/pypto#537: Related matmul_acc/pto.tmov buffering logic affecting accumulator moves.
hw-native-sys/pypto#347: Introduced buffer remapping APIs (SetCurrentResultBuf usage) that this change relies on.

Suggested reviewers

lyfne123

Poem

🐰 I hopped through code with eager cheer,

Found phantom moves that disappeared,
Now acc-to-acc need not be stirred,
Aliased bytes sing without a word,
Pipelines hum, no extra copy feared.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title clearly and specifically describes the main change: adding a runtime regression test for issue `#1352` Path 1, which is the primary change across the files modified.
Description check	✅ Passed	The pull request description is well-detailed and directly related to the changeset, explaining the context of the gate_up_silu pattern, the test structure, and how it addresses issue `#1352` Path 1.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request addresses an issue where unsupported accumulator-to-accumulator moves were being generated in the backend for certain pipeline matmul patterns. The fix involves eliding these moves in the PTO operations common logic when both source and destination are in the accumulator memory space. The PR also includes a new runtime regression test and a codegen unit test to verify the fix. Feedback was provided regarding the use of the PyPTO DSL in the new runtime test, specifically pointing out that pl.create_tile should be used for in-core scratch space instead of pl.create_tensor, and that global tensors should be explicitly loaded into tiles before being used in pl.matmul operations within device-side blocks.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/backend/common/pto_ops_common.cpp`:
- Around line 2197-2204: The current aliasing branch uses both_acc and silently
aliases all Acc→Acc tile.move, which breaks distinct-buffer copy semantics;
change the condition so aliasing only occurs when the move is provably a no-op
(e.g., retain the same_address check and replace the both_acc check with a call
to an explicit predicate like IsProvenNoOpTileMove(op) or inspect a specific op
attribute), and for other Acc→Acc cases do not call
codegen.SetCurrentResultBuf(codegen.GetExprAsCode(op->args_[0]))—instead emit
the normal copy lowering or return an explicit rejection/error for unsupported
distinct-buffer Acc→Acc copies; refer to same_address, both_acc,
ir::MemorySpace::Acc, tile.move, codegen.SetCurrentResultBuf, and op->args_[0]
when making the change.

In `@tests/ut/codegen/test_pto_codegen_ops.py`:
- Around line 1723-1784: The test only generates MLIR for the first InCore
variant, so when split=UP_DOWN it may miss the AIC kernel that contains the
acc→acc pto.tmov; update the test (test_pipeline_matmul_acc_no_acc_to_acc_tmov)
to iterate and codegen all split InCore variants produced by _generate_mlir (or
explicitly select the AIC variant) by checking ir.is_incore_type() for each
function and asserting _has_acc_to_acc_tmov on the AIC InCore output instead of
relying on the first function only.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d59032ee-5de7-4a54-8f94-4c745f9741d6

📥 Commits

Reviewing files that changed from the base of the PR and between 7da3b83 and a5741f4.

📒 Files selected for processing (3)

src/backend/common/pto_ops_common.cpp
tests/st/runtime/test_pipeline_matmul_acc.py
tests/ut/codegen/test_pto_codegen_ops.py

- ruff-format tests/st/runtime/test_pipeline_matmul_acc.py - Clarify out_tile comment: it is a DDR staging tensor, not an InCore tile - Harden TestTileMoveAccNoopElision._generate_mlir to codegen all InCore functions (not just the first), so split-kernel variants are covered - Drop out_dtype=FP32 from pl.matmul in test_pipeline_matmul_acc_no_acc_to_acc_tmov so the Tile overload matches and pyright accepts the Mat-tile inputs

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/st/runtime/test_pipeline_matmul_acc.py`:
- Around line 57-58: Update the docstrings to match the implemented computation
and output dtype: replace references that say "gate_result + up_result" and
"FP32 output" with wording that the operation is elementwise multiplication
"(gate * up)" and that the result is stored as BF16 (e.g., "gate_result (BF16) *
up_result (BF16) stored as BF16"); apply the same change to both docstring
locations referenced (the earlier block around the first occurrence and the
second block at lines ~80-84) so the documentation accurately reflects the math
and dtype used in the test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0ada6f57-b84e-4f0e-b3ff-aeb9a7a0846e

📥 Commits

Reviewing files that changed from the base of the PR and between a5741f4 and d7b687b.

📒 Files selected for processing (2)

tests/st/runtime/test_pipeline_matmul_acc.py
tests/ut/codegen/test_pto_codegen_ops.py

…w-native-sys#1355)

luohuan19 and others added 3 commits May 13, 2026 11:49

github-project-automation Bot added this to pto project May 13, 2026

gemini-code-assist Bot reviewed May 13, 2026

View reviewed changes

Comment thread tests/st/runtime/test_pipeline_matmul_acc.py

Comment thread tests/st/runtime/test_pipeline_matmul_acc.py

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread src/backend/common/pto_ops_common.cpp

Comment thread tests/ut/codegen/test_pto_codegen_ops.py

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread tests/st/runtime/test_pipeline_matmul_acc.py Outdated

fix(pr): align test_pipeline_matmul_acc docstring with actual math (h…

aeba8f0

…w-native-sys#1355)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(runtime): Add runtime regression test for #1352 Path 1#1355

test(runtime): Add runtime regression test for #1352 Path 1#1355
luohuan19 wants to merge 5 commits into
hw-native-sys:mainfrom
luohuan19:test/pipeline-matmul-acc-1352

luohuan19 commented May 13, 2026

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

luohuan19 commented May 13, 2026

Summary

Changes

Testing

Related Issues

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 13, 2026 •

edited

Loading