fix(ir): chain Acc -> Mat -> Left/Right tile.move in InferTileMemorySpace #1460
`InsertMoveStmt` in `InferTileMemorySpace` did not consult the SoC memory
graph when inserting a `tile.move` between a producer and its constrained
consumer. When a cube op's output (placed in `Acc` per the matmul op's
`set_output_memory(MemorySpace::Acc)`) was consumed by another cube op
that required the value in `Left` or `Right`, the pass emitted a single
direct `tile.move(producer, target=Left/Right)`. PTOAS's `TMovOp::verify`
then rejected the resulting `pto.tmov` at the codegen verifier with:
'pto.tmov' op expects a supported tmov address-space pair for this target
`backend/common/soc.cpp` declares for a2a3:
mem_graph[Acc] = {Mat, DDR}
mem_graph[Mat] = {Left, Right}
i.e. Acc has no direct edge to Left or Right. PTOAS enforces the same set
in its `okPair` table. The two declarations agreed; the pass simply
ignored them.
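As a rough illustration of the constraint described above, the two quoted edges can be modelled as an adjacency map with a direct-edge check. This is only a sketch: the `MemorySpace` enum, `MemGraph`, `MakeA2A3Subgraph`, `HasDirectEdge`, and `CheckQuotedClaim` are hypothetical stand-ins, and the real table in `soc.cpp` may contain more entries than the two shown here.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical stand-in for the project's MemorySpace enum.
enum class MemorySpace { DDR, Vec, Mat, Left, Right, Acc };

// Adjacency map: source space -> spaces reachable with one move.
using MemGraph = std::map<MemorySpace, std::set<MemorySpace>>;

// Only the two a2a3 entries quoted in the text (assumption: the real
// graph may declare additional edges that are not shown here).
inline MemGraph MakeA2A3Subgraph() {
  MemGraph g;
  g[MemorySpace::Acc] = {MemorySpace::Mat, MemorySpace::DDR};
  g[MemorySpace::Mat] = {MemorySpace::Left, MemorySpace::Right};
  return g;
}

// True when `src -> dst` is a single edge in the graph.
inline bool HasDirectEdge(const MemGraph& g, MemorySpace src, MemorySpace dst) {
  auto it = g.find(src);
  return it != g.end() && it->second.count(dst) > 0;
}

// Restates the claim from the text: Acc reaches Mat directly, and Mat
// reaches Left/Right directly, but Acc has no direct edge to either.
inline void CheckQuotedClaim() {
  const MemGraph g = MakeA2A3Subgraph();
  assert(HasDirectEdge(g, MemorySpace::Acc, MemorySpace::Mat));
  assert(HasDirectEdge(g, MemorySpace::Mat, MemorySpace::Left));
  assert(!HasDirectEdge(g, MemorySpace::Acc, MemorySpace::Left));
  assert(!HasDirectEdge(g, MemorySpace::Acc, MemorySpace::Right));
}
```

Calling `CheckQuotedClaim()` from any test or `main` exercises the four checks.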
This patch detects the `Acc -> Left/Right` pair in `InsertMoveStmt` and
emits two `tile.move` ops chained through `Mat`, so each individual hop
is in `mem_graph` and accepted by PTOAS. Intermediate hops carry no
blayout/slayout override; only the final hop carries the consumer's
required layout.
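The rule in the paragraph above can be sketched as a small planning function. This is not the code in the patch: `PlannedMove`, `NeedsMatHop`, and `PlanMoves` are hypothetical names, layouts are modelled as plain strings, and the `src != target` precondition is an assumption of the sketch.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's MemorySpace enum.
enum class MemorySpace { DDR, Vec, Mat, Left, Right, Acc };

// Hypothetical description of one planned tile.move.
struct PlannedMove {
  MemorySpace target;
  std::optional<std::string> blayout;  // empty means "no override"
  std::optional<std::string> slayout;  // empty means "no override"
};

// The only pair this sketch reroutes: Acc -> Left or Acc -> Right.
inline bool NeedsMatHop(MemorySpace src, MemorySpace target) {
  return src == MemorySpace::Acc &&
         (target == MemorySpace::Left || target == MemorySpace::Right);
}

// Returns the moves to emit, in order. Only the final move carries the
// consumer's required layouts; the intermediate Mat hop carries none.
inline std::vector<PlannedMove> PlanMoves(
    MemorySpace src, MemorySpace target,
    std::optional<std::string> required_blayout,
    std::optional<std::string> required_slayout) {
  assert(src != target);  // assumption: callers only ask when a move is needed
  std::vector<PlannedMove> moves;
  if (NeedsMatHop(src, target)) {
    moves.push_back({MemorySpace::Mat, std::nullopt, std::nullopt});
  }
  moves.push_back({target, std::move(required_blayout), std::move(required_slayout)});
  return moves;
}
```

For example, `PlanMoves(MemorySpace::Acc, MemorySpace::Left, ...)` yields two entries (`Mat`, then `Left`), while any other pair yields a single direct move.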
The rewrite is narrow on purpose. `mem_graph` models only physical TMOV
edges, but codegen accepts several pairs not in the graph (e.g.
`Mat -> Vec` for pre-pop staging). A universal `Backend::FindMemPath`
BFS would CHECK-fail on those benign pairs, so we only insert the
intermediate hop for the specific pair PTOAS actually rejects.
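To make the trade-off above concrete, here is a minimal breadth-first search over the two quoted edges. It is a sketch under stated assumptions: `FindPathBfs` is a hypothetical function, not the real `Backend::FindMemPath` (whose signature is not shown in this PR), and it returns an empty vector where the text says the real helper would CHECK-fail.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the project's MemorySpace enum.
enum class MemorySpace { DDR, Vec, Mat, Left, Right, Acc };
using MemGraph = std::map<MemorySpace, std::set<MemorySpace>>;

// Only the two a2a3 entries quoted in the commit message (assumption:
// the real graph may contain more).
inline MemGraph MakeA2A3Subgraph() {
  MemGraph g;
  g[MemorySpace::Acc] = {MemorySpace::Mat, MemorySpace::DDR};
  g[MemorySpace::Mat] = {MemorySpace::Left, MemorySpace::Right};
  return g;
}

// Breadth-first search. Returns {src, ..., dst}, or an empty vector when
// dst is unreachable from src.
inline std::vector<MemorySpace> FindPathBfs(const MemGraph& g, MemorySpace src,
                                            MemorySpace dst) {
  std::map<MemorySpace, MemorySpace> parent;  // visited node -> predecessor
  std::deque<MemorySpace> queue{src};
  parent[src] = src;
  while (!queue.empty()) {
    MemorySpace cur = queue.front();
    queue.pop_front();
    if (cur == dst) {
      // Walk predecessors back to src to rebuild the path.
      std::vector<MemorySpace> path{cur};
      while (cur != src) {
        cur = parent.at(cur);
        path.insert(path.begin(), cur);
      }
      assert(path.front() == src);
      return path;
    }
    auto it = g.find(cur);
    if (it == g.end()) continue;  // no outgoing edges declared
    for (MemorySpace next : it->second) {
      // emplace succeeds only for nodes not seen before.
      if (parent.emplace(next, cur).second) queue.push_back(next);
    }
  }
  return {};
}
```

On this subgraph the search finds `Acc -> Mat -> Left`, but it finds no path for `Mat -> Vec`, which is the kind of benign pair a universal rewrite would trip over.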
Test: new `TestAutoMoveInsertion::test_matmul_auto_moves_acc_through_mat_to_left`
in `tests/ut/ir/transforms/test_infer_tile_memory_space.py` uses the
Before/Expected pattern with `ir.assert_structural_equal` to verify the
inserted moves form the `Acc -> Mat -> Left` chain.
Code Review
This pull request updates the TileMemorySpaceMutator to handle hardware-constrained memory moves by inserting intermediate hops, such as routing Acc through Mat before reaching Left or Right spaces. The implementation uses a helper lambda to emit move operations and includes a new unit test to verify the multi-hop logic. Feedback suggests simplifying the move emission logic by consolidating the multi-hop and direct move paths into a single loop and final call for better maintainability.
VarPtr final_moved_var;
if (path.size() > 2) {
  // Multi-hop: emit one intermediate tile.move per hop. Intermediate hops
  // get no blayout/slayout overrides (use default for the intermediate
  // memory space); only the FINAL hop carries the consumer's required
  // layout so that the consumer sees the expected tile shape/layout.
  ExprPtr cur = mutated_producer;
  // path[0] == src_space, path.back() == target; iterate hops [1..size).
  for (size_t i = 1; i + 1 < path.size(); ++i) {
    auto inter_var = emit_one_move(cur, path[i], std::nullopt, std::nullopt);
    cur = inter_var;
  }
  final_moved_var = emit_one_move(cur, target, required_blayout, required_slayout);
} else {
  // Direct move (path.empty() means direct edge or src == target; treat
  // both as single-hop fall-through to the original behaviour).
  final_moved_var = emit_one_move(mutated_producer, target, required_blayout, required_slayout);
}
The logic for handling multi-hop and direct moves can be simplified by removing the explicit `if`/`else` block. Since `cur` is initialized to `mutated_producer` and the loop condition `i + 1 < path.size()` naturally handles empty or short paths, a single sequence of operations is sufficient and more maintainable.
ExprPtr cur = mutated_producer;
// path[0] == src_space, path.back() == target; iterate hops [1..size-1).
for (size_t i = 1; i + 1 < path.size(); ++i) {
  cur = emit_one_move(cur, path[i], std::nullopt, std::nullopt);
}
VarPtr final_moved_var = emit_one_move(cur, target, required_blayout, required_slayout);
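The reviewer's claim that the branch is redundant can be checked in isolation. The sketch below is an assumption-laden model, not the pass itself: it records only the target of each emitted move in place of calling `emit_one_move`, drops the layout arguments, and uses hypothetical names (`EmitBranching`, `EmitUnified`, `CheckEquivalent`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's MemorySpace enum.
enum class MemorySpace { DDR, Vec, Mat, Left, Right, Acc };

// Targets of the emitted moves, in order; stands in for emit_one_move calls.
using Emitted = std::vector<MemorySpace>;

// Shape of the original code: explicit branch on path.size() > 2.
inline Emitted EmitBranching(const std::vector<MemorySpace>& path, MemorySpace target) {
  Emitted out;
  if (path.size() > 2) {
    for (std::size_t i = 1; i + 1 < path.size(); ++i) out.push_back(path[i]);
    out.push_back(target);
  } else {
    out.push_back(target);
  }
  return out;
}

// Shape of the suggestion: one loop plus one final move. Writing the bound
// as `i + 1 < path.size()` keeps the loop empty for an empty path, whereas
// `i < path.size() - 1` would wrap around on unsigned arithmetic.
inline Emitted EmitUnified(const std::vector<MemorySpace>& path, MemorySpace target) {
  Emitted out;
  for (std::size_t i = 1; i + 1 < path.size(); ++i) out.push_back(path[i]);
  out.push_back(target);
  return out;
}

// Asserts that both shapes emit the same sequence for a given path.
inline void CheckEquivalent(const std::vector<MemorySpace>& path, MemorySpace target) {
  assert(EmitBranching(path, target) == EmitUnified(path, target));
}
```

Running `CheckEquivalent` on an empty path, a two-element path, and a three-element path covers the cases the review comment mentions.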
Summary
`InsertMoveStmt` in `InferTileMemorySpace` did not consult the SoC memory graph when inserting a `tile.move` between a producer and its constrained consumer.
When a cube op's output (placed in `Acc` per the matmul op's `set_output_memory(MemorySpace::Acc)`) was consumed by another cube op that required the value in `Left` or `Right`, the pass emitted a single direct `tile.move(producer, target=Left/Right)`. PTOAS's `TMovOp::verify` then rejected the resulting `pto.tmov` at the codegen verifier:
'pto.tmov' op expects a supported tmov address-space pair for this target
This PR teaches `InsertMoveStmt` to emit the legal two-hop sequence `Acc -> Mat -> Left/Right` for the rejected pair, so each individual hop is in `mem_graph` and accepted by PTOAS.
Cause
`src/backend/common/soc.cpp` declares for `a2a3`:
mem_graph[MemorySpace::Acc] = {MemorySpace::Mat, MemorySpace::DDR};
mem_graph[MemorySpace::Mat] = {MemorySpace::Left, MemorySpace::Right};
i.e. `Acc` has no direct edge to `Left` or `Right`. PTOAS enforces the same set in its `okPair` table (`AccToMat`, `AccToVec`, `MatToTile`, `VecToVec`). The two declarations agree on the hardware constraint; the pass simply ignored both.
Any pattern where a cube op's output feeds a subsequent cube op as `Left` or `Right` triggers this. The minimal repro is the same one captured in the test.
Fix
In `InsertMoveStmt` (`src/ir/transforms/infer_tile_memory_space_pass.cpp`), detect the `Acc -> {Left, Right}` pair and emit two `tile.move` ops chained through `Mat`. Intermediate hops carry no `blayout`/`slayout` override; only the final hop carries the consumer's required layout so consumers see the expected tile shape/layout.
Why narrow, not universal
The natural-looking alternative is "call `Backend::FindMemPath` for every move and emit the path." That CHECK-fails on benign pairs that aren't in the graph but are accepted by codegen (e.g. `Mat -> Vec` for pre-pop staging). `mem_graph` models physical TMOV edges only; it is necessary but not sufficient as a model of all permitted moves. This PR therefore handles only the specific pair PTOAS actually rejects today.
Alternative locations considered
I'd be happy to relocate the rewrite if you prefer a different layer:
- a `LegalizeMemoryMoves` pass running after `InferTileMemorySpace`,
- `tile.move` itself, to force the IR not to produce the bad pair in the first place,
- making `mem_graph` a complete model of codegen-supported pairs and using `Backend::FindMemPath` universally.
These are bigger architectural changes; I picked the minimal-delta fix at the site where the offending `tile.move` is emitted.
Related work
The pypto3 team has been steadily fixing the family of "pypto3 emits `tmov` pairs that PTOAS rejects":
- the `acc→acc` case (different MemRef bases under nested `pl.matmul_acc`), with a proposed two-step plan: short-term codegen elide, long-term IR fix.
- the `Acc→Vec` layout case, handled in this same pass (`InferTileMemorySpace`) by overriding `required_blayout`/`required_slayout` for that producer→target combination.
This PR continues that pattern at the same IR layer for the remaining `Acc→Left/Right` case, the team's stated "Path 2" preference in #1352.
Test plan
Added `TestAutoMoveInsertion::test_matmul_auto_moves_acc_through_mat_to_left` in `tests/ut/ir/transforms/test_infer_tile_memory_space.py`. Uses the Before/Expected `@pl.program` pattern and `ir.assert_structural_equal` per the conventions in `testing-and-examples.md` and the existing tests in this file. Asserts that the inserted moves form the `Acc -> Mat -> Left` chain.
Verified locally:
- `tests/ut/ir/transforms/test_infer_tile_memory_space.py`: all tests in this file pass (29 pre-existing + 1 new).
- `tests/ut/` run vs the same suite on a clean `upstream/main` checkout in my local environment: identical pre-existing failure/skip sets. No failure introduced by this patch; `+1` test pass for the new regression case. (My local env has a handful of environment-specific failures unrelated to this pass; they reproduce on `upstream/main` without this patch and are green on CI.)
Files changed
- `src/ir/transforms/infer_tile_memory_space_pass.cpp`: refactor `InsertMoveStmt` to factor out an `emit_one_move` helper, add the `Acc → Mat → {Left, Right}` chain when needed.
- `tests/ut/ir/transforms/test_infer_tile_memory_space.py`: new test.
No header / ODS / Python-binding changes. No documentation changes (the pass behaviour described in `docs/en/dev/passes/17-infer_tile_memory_space.md` remains accurate; this PR is a constraint enforcement, not a new feature).