fix(ir): chain Acc -> Mat -> Left/Right tile.move in InferTileMemoryS… by ChristosMatzoros · Pull Request #1460 · hw-native-sys/pypto

ChristosMatzoros · 2026-05-21T13:54:48Z

Summary

InsertMoveStmt in InferTileMemorySpace did not consult the SoC memory graph
when inserting a tile.move between a producer and its constrained consumer.
When a cube op's output (placed in Acc per the matmul op's
set_output_memory(MemorySpace::Acc)) was consumed by another cube op that
required the value in Left or Right, the pass emitted a single direct
tile.move(producer, target=Left/Right). PTOAS's TMovOp::verify then
rejects the resulting pto.tmov at the codegen verifier:

error: 'pto.tmov' op expects a supported tmov address-space pair for this target

This PR teaches InsertMoveStmt to emit the legal two-hop sequence
Acc -> Mat -> Left/Right for the rejected pair, so each individual hop is
in mem_graph and accepted by PTOAS.

Cause

src/backend/common/soc.cpp declares for a2a3:

mem_graph[MemorySpace::Acc] = {MemorySpace::Mat, MemorySpace::DDR};
mem_graph[MemorySpace::Mat] = {MemorySpace::Left, MemorySpace::Right};

i.e. Acc has no direct edge to Left or Right. PTOAS enforces the same set in
its okPair table (AccToMat, AccToVec, MatToTile, VecToVec). The two
declarations agree on the hardware constraint; the pass simply ignored both.

Any pattern where a cube op's output feeds a subsequent cube op as Left or
Right triggers this. The minimal repro is the same one captured in the
test:

ab = pl.matmul(a, b)    # ab is placed in Acc per matmul's set_output_memory
abc = pl.matmul(ab, c)  # second matmul needs `ab` in Left — direct Acc->Left
                        # is not a legal tmov pair on a2a3

Fix

In InsertMoveStmt (src/ir/transforms/infer_tile_memory_space_pass.cpp),
detect the Acc -> {Left, Right} pair and emit two tile.move ops chained
through Mat. Intermediate hops carry no blayout/slayout override; only
the final hop carries the consumer's required layout so consumers see the
expected tile shape/layout.

Why narrow, not universal

The natural-looking alternative is "call Backend::FindMemPath for every
move and emit the path." That CHECK-fails on benign pairs that aren't in the
graph but are accepted by codegen (e.g. Mat -> Vec for pre-pop staging).
mem_graph models physical TMOV edges only — it's necessary but not
sufficient as a model of all permitted moves. This PR therefore handles only
the specific pair PTOAS actually rejects today.

Alternative locations considered

I'd be happy to relocate the rewrite if you prefer a different layer:

a dedicated LegalizeMemoryMoves pass running after InferTileMemorySpace,
an op-level input-memory constraint on tile.move to force the IR not to
produce the bad pair in the first place,
making mem_graph a complete model of codegen-supported pairs and using
Backend::FindMemPath universally.

These are bigger architectural changes; I picked the minimal-delta fix at the
site where the offending tile.move is emitted.

Related work

The pypto3 team has been steadily fixing the family of "pypto3 emits tmov
pairs that PTOAS rejects":

[Pass Bug] MemoryReuse fails to unify acc MemRef bases across nested pipelined matmul_acc loops, producing invalid pto.tmov acc→acc #1352 documents the acc→acc case (different MemRef bases under nested
pl.matmul_acc) and proposes a two-step plan: short-term codegen elide,
long-term IR fix.
fix(pass): correct Acc→Vec tile.move layout and same-dtype tile.cast fold #1431 fixed the Acc→Vec layout case in this same pass
(InferTileMemorySpace) by overriding required_blayout/required_slayout
for that producer→target combination.

This PR continues that pattern at the same IR layer for the remaining
Acc→Left/Right case — the team's stated "Path 2" preference in #1352.

Test plan

Added TestAutoMoveInsertion::test_matmul_auto_moves_acc_through_mat_to_left
in tests/ut/ir/transforms/test_infer_tile_memory_space.py. Uses the
Before/Expected @pl.program pattern and ir.assert_structural_equal per
the conventions in testing-and-examples.md and the existing tests in this
file. Asserts that the inserted moves form the Acc -> Mat -> Left chain.

Verified locally:

tests/ut/ir/transforms/test_infer_tile_memory_space.py — all tests in
this file pass (29 pre-existing + 1 new).
Full tests/ut/ run vs the same suite on a clean upstream/main checkout
in my local environment: identical pre-existing failure/skip sets. No
failure introduced by this patch; +1 test pass for the new regression
case. (My local env has a handful of environment-specific failures unrelated
to this pass — they reproduce on upstream/main without this patch and are
green on CI.)

Files changed

src/ir/transforms/infer_tile_memory_space_pass.cpp — refactor
InsertMoveStmt to factor out an emit_one_move helper, add the
Acc → Mat → {Left, Right} chain when needed.
tests/ut/ir/transforms/test_infer_tile_memory_space.py — new test.

No header / ODS / Python-binding changes. No documentation changes (the pass
behaviour described in docs/en/dev/passes/17-infer_tile_memory_space.md
remains accurate — this PR is a constraint enforcement, not a new feature).

…pace `InsertMoveStmt` in `InferTileMemorySpace` did not consult the SoC memory graph when inserting a `tile.move` between a producer and its constrained consumer. When a cube op's output (placed in `Acc` per the matmul op's `set_output_memory(MemorySpace::Acc)`) was consumed by another cube op that required the value in `Left` or `Right`, the pass emitted a single direct `tile.move(producer, target=Left/Right)`. PTOAS's `TMovOp::verify` then rejects the resulting `pto.tmov` at the codegen verifier with: 'pto.tmov' op expects a supported tmov address-space pair for this target `backend/common/soc.cpp` declares for a2a3: mem_graph[Acc] = {Mat, DDR} mem_graph[Mat] = {Left, Right} i.e. Acc has no direct edge to Left or Right. PTOAS enforces the same set in its `okPair` table. The two declarations agreed; the pass simply ignored them. This patch detects the `Acc -> Left/Right` pair in `InsertMoveStmt` and emits two `tile.move` ops chained through `Mat`, so each individual hop is in `mem_graph` and accepted by PTOAS. Intermediate hops carry no blayout/slayout override; only the final hop carries the consumer's required layout. The rewrite is narrow on purpose. `mem_graph` models only physical TMOV edges, but codegen accepts several pairs not in the graph (e.g. `Mat -> Vec` for pre-pop staging). A universal `Backend::FindMemPath` BFS would CHECK-fail on those benign pairs, so we only insert the intermediate hop for the specific pair PTOAS actually rejects. Test: new `TestAutoMoveInsertion::test_matmul_auto_moves_acc_through_mat_to_left` in `tests/ut/ir/transforms/test_infer_tile_memory_space.py` uses the Before/Expected pattern with `ir.assert_structural_equal` to verify the inserted moves form the `Acc -> Mat -> Left` chain.

coderabbitai · 2026-05-21T13:54:59Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5b089517-4116-4182-ac17-20d2ce3519ab

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request updates the TileMemorySpaceMutator to handle hardware-constrained memory moves by inserting intermediate hops, such as routing Acc through Mat before reaching Left or Right spaces. The implementation uses a helper lambda to emit move operations and includes a new unit test to verify the multi-hop logic. Feedback suggests simplifying the move emission logic by consolidating the multi-hop and direct move paths into a single loop and final call for better maintainability.

gemini-code-assist · 2026-05-21T14:00:07Z

+    VarPtr final_moved_var;
+    if (path.size() > 2) {
+      // Multi-hop: emit one intermediate tile.move per hop.  Intermediate hops
+      // get no blayout/slayout overrides (use default for the intermediate
+      // memory space); only the FINAL hop carries the consumer's required
+      // layout so that the consumer sees the expected tile shape/layout.
+      ExprPtr cur = mutated_producer;
+      // path[0] == src_space, path.back() == target; iterate hops [1..size).
+      for (size_t i = 1; i + 1 < path.size(); ++i) {
+        auto inter_var = emit_one_move(cur, path[i], std::nullopt, std::nullopt);
+        cur = inter_var;
+      }
+      final_moved_var = emit_one_move(cur, target, required_blayout, required_slayout);
+    } else {
+      // Direct move (path.empty() means direct edge or src == target; treat
+      // both as single-hop fall-through to the original behaviour).
+      final_moved_var = emit_one_move(mutated_producer, target, required_blayout, required_slayout);
+    }


The logic for handling multi-hop and direct moves can be simplified by removing the explicit if/else block. Since cur is initialized to mutated_producer and the loop condition i + 1 < path.size() naturally handles empty or short paths, a single sequence of operations is sufficient and more maintainable.

ExprPtr cur = mutated_producer; // path[0] == src_space, path.back() == target; iterate hops [1..size-1). for (size_t i = 1; i + 1 < path.size(); ++i) { cur = emit_one_move(cur, path[i], std::nullopt, std::nullopt); } VarPtr final_moved_var = emit_one_move(cur, target, required_blayout, required_slayout);

github-project-automation Bot added this to pto project May 21, 2026

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ir): chain Acc -> Mat -> Left/Right tile.move in InferTileMemoryS…#1460

fix(ir): chain Acc -> Mat -> Left/Right tile.move in InferTileMemoryS…#1460
ChristosMatzoros wants to merge 1 commit into
hw-native-sys:mainfrom
ChristosMatzoros:fix/infer-tile-memory-space-acc-mat-hop

ChristosMatzoros commented May 21, 2026

Uh oh!

coderabbitai Bot commented May 21, 2026

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ChristosMatzoros commented May 21, 2026

Summary

Cause

Fix

Why narrow, not universal

Alternative locations considered

Related work

Test plan

Files changed

Uh oh!

coderabbitai Bot commented May 21, 2026

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant