feat(pass): compress manual_scope phase-fence deps with ExpandManualPhaseFence by Leaf-Salix · Pull Request #1545 · hw-native-sys/pypto

Leaf-Salix · 2026-05-26T15:34:11Z

Summary

Adds a dedicated ExpandManualPhaseFence IR pass that compresses profitable
explicit manual_scope full-array TaskId dependencies before backend
lowering.
Rewrites profitable phase-fence fanout from
```
N producers -> M consumers
```
into
```
N producers -> dummy barrier -> M consumers
```
by inserting one dependency-only system.task_dummy.
Keeps loop-entry dependency snapshots aligned with the previous phase, so
current-phase parallel branches do not accidentally observe TaskId slots
already updated by earlier branches in the same phase.
Preserves the existing direct dependency lowering for low-benefit, mixed,
partial-slot, unresolved, pure range-only, non-orchestration, and update-hazard
dependency shapes.
Updates runtime swimlane witnesses under the current scheduling test layout:
tests/st/runtime/scheduling/.

ExpandManualPhaseFence

This PR moves manual phase-fence dependency compression into a dedicated IR pass.

The pass recognizes profitable orchestration manual_scope consumer sites whose
manual dependency is exactly one full Array[TASK_ID]. For those sites, it
inserts a marked system.task_dummy barrier carrying the original dependency
array, then rewrites covered consumers to depend on the barrier TaskId.

That changes the runtime dependency shape from direct all-to-all fanout to a
single dependency-only barrier:

previous phase tids[N] -> dummy barrier -> current phase consumers[M]

The main correctness point is the phase snapshot boundary. A loop-carried
TaskId array should represent the completed previous phase when current-phase
consumers are submitted. The pass therefore creates the barrier before the
current profitable fanout rather than letting each consumer branch expand
deps=[tids] after earlier branches may have already written updated task ids
back into the same carrier.

The pass is intentionally narrow. It is not a general DAG optimizer; it only
compresses clear local full-array phase-fence fanout patterns where the barrier
is both profitable and safe.

Fallback Boundaries

The pass keeps direct dependency lowering when the shape is not clearly safe or
profitable.

Current behavior:

profitable full-array fanout compresses;
minimum-profitable 3 -> 3 fanout compresses;
low-benefit N -> 1 and 2 -> 2 fanout stay direct;
pure range -> range repeated consumers stay direct;
scalar deps, mixed deps, multiple-array deps, unresolved arrays, and
auto-scope deps stay direct;
partial-slot deps such as prev = tids[i]; deps=[prev] stay guarded scalar
direct deps;
same-body update hazards stay direct;
non-orchestration functions are ignored.

In practice, if the pass cannot recognize one clear profitable full-array manual
phase-fence pattern, it leaves the existing dependency path unchanged.

One caveat worth calling out: a shared outer tids carrier is not implicitly
lifted into per-lane ownership. For example,

tids -> parallel { range { parallel { deps=[tids] } } }

does not become

parallel { lane-local tids -> range { parallel { deps=[tids] } } }

Independent outer lanes still need the carrier itself to be scoped or indexed
per lane.

Runtime Test Sync Notes

This branch also updates older manual_scope and pl.at scheduling witnesses
to match the post-compression dependency shape and the current test directory
layout.

The runtime ST coverage now lives under:

tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
tests/st/runtime/scheduling/test_manual_scope_pipeline.py
tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py

The swimlane assertions intentionally avoid depending on a stable visible
dummy-task marker. The externally required runtime contract is strict phase
ordering: tasks in flattened stage k + 1 must not start before all tasks in
flattened stage k finish.

The updated witnesses also avoid stale pre-compression assumptions that required
direct all-to-all fanout observability in the swimlane artifact.

Tests

Coverage added or updated in this branch:

pass behavior and fallback boundaries:
tests/ut/ir/transforms/test_expand_manual_phase_fence.py
orchestration codegen shape:
tests/ut/codegen/test_phase_fence_dep_compression.py
existing manual-scope codegen coverage:
tests/ut/codegen/test_orchestration_codegen.py
runtime correctness and phase-order witnesses:
tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
scheduling swimlane expectation sync:
tests/st/runtime/scheduling/test_manual_scope_pipeline.py
tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py

Representative validation commands:

pytest tests/ut/ir/transforms/test_expand_manual_phase_fence.py
pytest tests/ut/codegen/test_phase_fence_dep_compression.py
pytest tests/ut/codegen/test_orchestration_codegen.py -k "phase_fence or manual_scope"

pytest tests/st/runtime/scheduling/test_phase_fence_dep_compression.py -q --platform=a2a3
pytest tests/st/runtime/scheduling/test_manual_scope_pipeline.py -q --platform=a2a3
pytest tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py -q --platform=a2a3

PYPTO_PHASE_FENCE_EXTRA_SWIMLANE=1 pytest \
  tests/st/runtime/scheduling/test_phase_fence_dep_compression.py::TestPhaseFenceDepCompressionSwimlane::test_multiloop_chain_default \
  --enable-l2-swimlane -q --platform=a2a3

PYPTO_PHASE_FENCE_EXTRA_SWIMLANE=1 pytest \
  tests/st/runtime/scheduling/test_phase_fence_dep_compression.py::TestPhaseFenceDepCompressionSwimlane::test_submit_three_level_strict \
  --enable-l2-swimlane -q --platform=a2a3

PYPTO_PHASE_FENCE_EXTRA_SWIMLANE=1 pytest \
  tests/st/runtime/scheduling/test_phase_fence_dep_compression.py::TestPhaseFenceDepCompressionSwimlane::test_pl_at_three_level_strict \
  --enable-l2-swimlane -q --platform=a2a3

These cover profitable full-array compression, loop-entry snapshot correctness,
low-benefit and mixed-dep fallback, partial-slot scalar fallback, codegen barrier
shape, runtime correctness, and strict phase ordering after dummy-barrier
insertion.

… behavior

coderabbitai · 2026-05-26T15:34:28Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 23b3c258-20d4-4bb9-8a67-e3e18b807d09

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR implements a new IR transform pass that optimizes manual-scope task dependencies by inserting explicit dummy barrier tasks to compress large all-to-all dependency arrays. The pass analyzes manual-dependency patterns, inserts system.task_dummy assignments when profitable, and rewrites consumer calls to depend on the barriers instead of full dependency arrays. Codegen is updated to lower dummy barriers to rt_submit_dummy_task runtime submissions with proper fanin wiring.

Changes

Manual Phase Fence Dependency Compression

Layer / File(s)	Summary
IR Contract: Dummy Task Barrier Attribute and Operator `include/pypto/ir/expr.h`, `src/ir/op/sync_ops/task.cpp`	New `kAttrDummyTask` attribute marks dependency-only `system.task_dummy` calls. The `system.task_dummy` operator takes no arguments and produces `Scalar[TASK_ID]` for use as internal barriers.
Pass Infrastructure: Declaration, Properties, and Pipeline Integration `CMakeLists.txt`, `include/pypto/ir/transforms/passes.h`, `include/pypto/ir/transforms/pass_properties.h`, `python/bindings/modules/passes.cpp`, `python/pypto/pypto_core/passes.pyi`, `python/pypto/ir/pass_manager.py`	Pass factory declaration, IR properties specification, Python bindings/stubs, and insertion into default/debug-tile optimization pipelines after `DeriveCallDirections`.
IR Transform Implementation: Barrier Analysis and Insertion `src/ir/transforms/expand_manual_phase_fence_pass.cpp`	Core pass: utility/detection helpers for task-id arrays and manual-dep edges; IR traversal to identify profitable compression opportunities; barrier emission logic that creates `system.task_dummy` assignments with appropriate manual-dependency attributes; `ManualPhaseFenceMutator` that traverses loops within manual scopes, rewrites covered consumer calls, and inserts barriers.
Codegen Lowering: System Task Dummy and Barrier Submission `src/codegen/orchestration/orchestration_codegen.cpp`, `python/bindings/modules/ir.cpp`	`OrchestrationStmtCodegen` recognizes `system.task_dummy` ops and validates their dummy-task attribute. New `EmitDummyTask` helper initializes barrier task IDs (invalid by default), fills dependency arrays when manual deps are present, submits `rt_submit_dummy_task`, and manages phase-fence barrier counter state. `array.get_element` in manual scope records extracted task ID scalars for dependency targeting. Python binding docstring updated to document reserved attribute keys.
Unit Tests: IR Transform Validation `tests/ut/ir/operators/test_task_ops.py`, `tests/ut/ir/transforms/test_expand_manual_phase_fence.py`	IR helper builders for constants, consumer call sites, array slot extraction, and orchestration programs with manual scopes. Tests verify profitable parallel array-dependency insertion and consumer rewriting; non-insertion for scalar-only, mixed array+scalar, partial-slot, low-benefit, and range-consumer patterns. Also verifies non-orchestration functions are ignored and IterArg barriers use visible initialization values.
Unit Tests: Orchestration Codegen Enhancements `tests/ut/codegen/test_orchestration_codegen.py`	Updated `test_manual_scope_parallel_array_carry_above_legacy_16_cap` to assert dummy barrier emission and compressed downstream dependency arrays for `pl.parallel(N > 16)`. Added three new tests validating non-emission for scalar deps, mixed deps, and auto-scope array deps.
Unit Tests: Comprehensive Phase Fence Codegen Test Suite `tests/ut/codegen/test_phase_fence_dep_compression.py`	Helper functions to generate and compile programs, assert barrier shapes, and validate code ordering. 15 test cases covering flattened/`pl.at`/nested phases, sibling producer-consumer loops, multiloop chains, dense mixed graphs, partial reduces, and fallback behaviors (scalar fallback, low-benefit, partial slots, updated arrays). Validates `rt_submit_dummy_task` emission, dependency array sizes, and loop ordering via substring assertions.
Integration Tests: Swimlane and System-Level Validation `tests/st/runtime/scheduling/test_manual_scope_pipeline.py`, `tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py`, `tests/st/runtime/scheduling/test_phase_fence_dep_compression.py`	Updated swimlane tests to tolerate dummy barrier task observation and replaced strict fanout/chain assertions with weaker task-count checks and reconstructed chain helpers. New 900-line comprehensive phase-fence test module with program builders for 9 orchestration patterns, swimlane timing helpers, phase-strictness validators, and both correctness and profiling witness test suites. Updated pass manager test expectations to include new pass.

Sequence Diagram

sequenceDiagram
  participant OrchestrationFunc as Orchestration<br/>Function Body
  participant ExpandPass as ExpandManualPhaseFence<br/>IR Pass
  participant ManualScope as RuntimeScopeStmt<br/>(manual=true)
  participant ForLoop as ForStmt<br/>(parallel)
  participant Barrier as system.task_dummy<br/>Assignment
  participant Consumer as Consumer<br/>Call
  
  OrchestrationFunc->>ExpandPass: Transform program
  ExpandPass->>ManualScope: Analyze within manual scope
  ManualScope->>ForLoop: Check loop for profitable deps
  ForLoop->>ExpandPass: Identify manual_dep_edges array
  ExpandPass->>ExpandPass: Count producers/consumers,<br/>estimate edge savings
  alt Profitable
    ExpandPass->>Barrier: Insert phase_fence_barrier_*_tid<br/>assignment with manual_dep_edges
    Barrier->>Consumer: Rewrite consumer manual_dep_edges<br/>to reference barrier variable
  else Not Profitable
    ExpandPass->>ForLoop: Keep original dependency arrays
  end
  ExpandPass-->>OrchestrationFunc: Return transformed program

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related Issues

[Bug] (ST Flake) Swimlane DAG observability is not stable enough for exact fanout and branch-chain assertions #1483: Addresses same swimlane observability challenge by relaxing strict fanout assertions and rewriting branch-chain validation in updated runtime scheduling tests to tolerate runtime-inserted dummy barrier tasks.

Possibly Related PRs

hw-native-sys/pypto#671: Modifies PassManager pipeline strategy definitions; the new ExpandManualPhaseFence pass insertion depends on the retrieved PR's refactored pass ordering infrastructure.

Suggested Labels

enhancement

Suggested Reviewers

lyfne123
Hzfengsy

Poem

🐰 A barrier stands tall in the phase-fenced land,
Compressing deps with careful hand,
No more arrays bloat with fanout's sprawl—
One dummy task holds it all! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.61% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(pass): compress manual_scope phase-fence deps with ExpandManualPhaseFence' clearly summarizes the main change: adding a new IR pass that compresses manual_scope phase-fence dependencies.
Description check	✅ Passed	The description comprehensively explains the PR objectives, implementation details, fallback boundaries, test coverage, and validation commands—all directly related to the ExpandManualPhaseFence pass addition.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a new compiler pass, ExpandManualPhaseFence, designed to optimize manual phase fences by compressing multi-dependency fanouts into synthetic dummy barrier tasks (system.task_dummy). It updates the orchestration codegen to lower these dummy tasks to rt_submit_dummy_task, registers the pass in the Python bindings and pass manager, and adds comprehensive correctness and swimlane tests. The review feedback identifies two main improvement opportunities: removing the unused helper function WithBoolAttr in the new pass file, and optimizing sequential loops by placing barriers for loop-invariant dependencies outside the loop body to avoid redundant dummy task submissions on every iteration.

gemini-code-assist · 2026-05-26T15:36:18Z

+static std::vector<std::pair<std::string, std::any>> WithBoolAttr(
+    std::vector<std::pair<std::string, std::any>> attrs, const std::string& key, bool value) {
+  for (auto& [k, v] : attrs) {
+    if (k == key) {
+      v = value;
+      return attrs;
+    }
+  }
+  attrs.emplace_back(key, value);
+  return attrs;
+}


The helper function WithBoolAttr is defined but never used anywhere in this file or the rest of the codebase. Removing this dead code improves maintainability and keeps the codebase clean.

gemini-code-assist · 2026-05-26T15:36:18Z

+    if (!decisions.empty() && op->kind_ != ForKind::Parallel) {
+      auto body_stmts = FlattenToVector(body_with_current_rewrites);
+      std::vector<StmtPtr> with_barriers;
+      with_barriers.reserve(decisions.size() + body_stmts.size());
+      for (const auto& decision : decisions) {
+        with_barriers.push_back(decision.barrier_stmt);
+      }
+      with_barriers.insert(with_barriers.end(), body_stmts.begin(), body_stmts.end());
+      body_with_current_rewrites = MakeSeqOrStmt(std::move(with_barriers), op->span_);
+    }


For sequential loops (op->kind_ != ForKind::Parallel), placing the barrier inside the loop body causes a dummy task to be submitted on every single iteration of the loop. While this is necessary for loop-carried dependencies (which change per iteration), it is highly inefficient for loop-invariant dependencies (which are constant across all iterations). For loop-invariant dependencies, the barrier should be placed outside the sequential loop, just like it is for parallel loops. This would reduce the number of dummy task submissions from O(trip_count) to O(1) and significantly lower runtime overhead.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef45dc562d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T15:40:40Z

+    for (const auto& dep_array : CollectDirectManualDepArrays(body)) {
+      if (!dep_array || BodyUpdatesArray(body, dep_array.get())) continue;
+      int64_t consumers = CountManualDepConsumersOnArray(body, dep_array.get());
+      if (is_parallel) consumers *= trip_count;
+      try_add(dep_array, dep_array, consumers);


Guard barrier hoisting against loop-local dep arrays

When BuildDecisions scans CollectDirectManualDepArrays(body), it treats every matching array as hoistable in parallel loops, but VisitStmt_(ForStmtPtr) later always places the barrier before the loop. If a manual_dep_edges array is introduced inside the loop body (for example via an alias/rebind var used by consumers), the synthesized system.task_dummy will reference a value that is not defined at the hoist point. In that case codegen can compute zero barrier deps and leave the barrier TaskId invalid, so rewritten consumers lose their intended dependency edges. Please restrict this path to dep-array vars that dominate the loop header (or keep such cases uncompressed).

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

tests/ut/ir/transforms/test_expand_manual_phase_fence.py (1)

292-298: 💤 Low value

Missing pytest.main entry point.

The file is missing the standard if __name__ == "__main__" block for direct test execution that other test files in the repository have.

Based on learnings, prefer invoking pytest programmatically via pytest.main([__file__, "-v"]) inside the if __name__ == "__main__" guard for consistent test execution across test files.

🔧 Suggested addition at end of file

 def test_pure_range_consumer_does_not_insert_dummy():
     tids = ir.Var("tids", ir.ArrayType(DataType.TASK_ID, 4), S)
     before = _program_with_loop(_consumer("a", [tids]), kind=ir.ForKind.Sequential)

     after = _run(before)
     first_call = cast(ir.Call, cast(ir.AssignStmt, _loop_body_stmts(after)[0]).value)
     assert first_call.op.name != "system.task_dummy"
+
+
+if __name__ == "__main__":
+    import pytest
+    pytest.main([__file__, "-v"])

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/transforms/test_expand_manual_phase_fence.py` around lines 292 -
298, Add the standard "__main__" pytest entry so this test file can be executed
directly: add an if __name__ == "__main__" guard and call pytest.main([__file__,
"-v"]). Ensure you import pytest at top if missing; reference the existing
test_pure_range_consumer_does_not_insert_dummy test and other tests in this file
so running the module executes pytest for this file.

tests/ut/codegen/test_phase_fence_dep_compression.py (1)

22-37: ⚡ Quick win

Avoid re-running phase-fence transforms after the default pipeline.

_compile_program() already runs the default pass strategy, then _generate_orch_code() applies derive_call_directions + expand_manual_phase_fence again. That can hide pipeline-wiring regressions and makes this suite assert a double-pass flow.

Suggested helper split

-def _generate_orch_code(program) -> str:
-    program = passes.derive_call_directions()(program)
-    program = passes.expand_manual_phase_fence()(program)
+def _generate_orch_code(program, *, apply_phase_fence_passes: bool = False) -> str:
+    if apply_phase_fence_passes:
+        program = passes.derive_call_directions()(program)
+        program = passes.expand_manual_phase_fence()(program)
     for func in program.functions.values():
         if func.func_type == ir.FunctionType.Orchestration:
             return codegen.generate_orchestration(program, func).code
     raise ValueError("No orchestration function found in program")

 def _compile_program(program_cls) -> str:
@@
     transformed = pm.run_passes(program_cls)
-    return _generate_orch_code(transformed)
+    return _generate_orch_code(transformed)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_phase_fence_dep_compression.py` around lines 22 - 37,
The test helper double-runs phase-fence transforms: _compile_program runs the
default pass pipeline (PassManager.get_strategy/pm.run_passes) and then
_generate_orch_code re-applies derive_call_directions() and
expand_manual_phase_fence(), masking regressions that depend on a single-pass
flow. Fix by removing the redundant transforms from _generate_orch_code — have
_compile_program produce a fully transformed program (using pm.run_passes) and
make _generate_orch_code simply locate the orchestration function and call
codegen.generate_orchestration(program, func). Update references to
derive_call_directions and expand_manual_phase_fence accordingly so they are
only invoked via the default pipeline, not again in _generate_orch_code.

tests/ut/codegen/test_orchestration_codegen.py (1)

3936-3941: ⚡ Quick win

Make task-deps assertions resilient to task index renumbering.

These checks pin params_t0/params_t1 directly, which is brittle when non-semantic codegen changes renumber tasks. Prefer params_t\d+ regex here, like other tests in this file.

Suggested adjustment

-        assert "PTO2TaskId params_t0_deps[1];" in code, code
+        assert re.search(r"PTO2TaskId params_t\d+_deps\[1\];", code), code
...
-        assert "PTO2TaskId params_t1_deps[5];" in code, code
+        assert re.search(r"PTO2TaskId params_t\d+_deps\[5\];", code), code
...
-        assert "PTO2TaskId params_t0_deps[4];" in code, code
+        assert re.search(r"PTO2TaskId params_t\d+_deps\[4\];", code), code

Also applies to: 4026-4027, 4071-4072

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 3936 - 3941, The
assertions hard-code task numbers (e.g., "params_t0_deps" / "params_t1_deps")
which breaks when task indices are renumbered; change them to use regex-based
checks that match params_t\d+ (for example use re.search on patterns like
r'PTO2TaskId\s+params_t\d+_deps\[\d+\];' and
r'if\s+\(phase_fence_barrier_0_tid\.is_valid\(\)\)\s+params_t\d+\_deps\[params_t\d+_deps_count\+\+\]\s*='
) so the tests verify the same semantics without binding to a specific task
index (also update the similar assertions at the other occurrences noted around
lines 4026-4027 and 4071-4072); keep the negative check for ABOVE_LEGACY_CAP
as-is but apply the params_t\d+ pattern where task-specific names are asserted.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/st/runtime/scheduling/test_manual_scope_pipeline.py`:
- Around line 237-244: The test test_intra_iteration_dep_present was left as a
no-op because the runtime checks were commented out; re-enable the runtime
fanout assertions by uncommenting the manual_scope_swimlane_data checks (the
tasks extraction, total_fanout computation and assert) and ensure the assertion
verifies total_fanout >= _M * _N (or another documented, conservative
lower-bound if you expect runtime variance) so the test fails on missing
intra-iteration dependency edges; reference the variables
manual_scope_swimlane_data, tasks, total_fanout, and the constants _M and _N
when restoring the assertion and keeping the original failure message for
diagnostic clarity.

In `@tests/st/runtime/scheduling/test_phase_fence_dep_compression.py`:
- Around line 52-57: The helper _assert_flattened_stage_strict currently calls
pytest.skip when the observed task count is less than expected (in the block
around the tasks length check) which masks regressions; change that behavior to
fail the test instead by replacing pytest.skip(...) with pytest.fail(...) (or an
explicit assert/raise) and update the similar helper block that also skips on
insufficient tasks (the second occurrence around the other helper/branch at
lines 67-70) so both helpers (including the one referenced near the other
task-count check) raise a failure with a clear message like "need >= {expected}
tasks for phase-fence check, got {len(tasks)}".

In `@tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py`:
- Around line 228-232: The current conditional uses pytest.skip when
total_fanout < _M * _N which can hide real intra-iteration dependency
regressions; change the behavior so the test fails (or at minimum xfails)
instead of silently skipping: replace the pytest.skip call with pytest.fail (or
an assert) that includes the same diagnostic message and values for total_fanout
and _M * _N so failures are visible in runtime witness mode; keep the same
message text but call pytest.fail(...) (or assert total_fanout >= _M * _N,
f"...") referencing total_fanout, _M and _N and remove the silent skip.

---

Nitpick comments:
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 3936-3941: The assertions hard-code task numbers (e.g.,
"params_t0_deps" / "params_t1_deps") which breaks when task indices are
renumbered; change them to use regex-based checks that match params_t\d+ (for
example use re.search on patterns like r'PTO2TaskId\s+params_t\d+_deps\[\d+\];'
and
r'if\s+\(phase_fence_barrier_0_tid\.is_valid\(\)\)\s+params_t\d+\_deps\[params_t\d+_deps_count\+\+\]\s*='
) so the tests verify the same semantics without binding to a specific task
index (also update the similar assertions at the other occurrences noted around
lines 4026-4027 and 4071-4072); keep the negative check for ABOVE_LEGACY_CAP
as-is but apply the params_t\d+ pattern where task-specific names are asserted.

In `@tests/ut/codegen/test_phase_fence_dep_compression.py`:
- Around line 22-37: The test helper double-runs phase-fence transforms:
_compile_program runs the default pass pipeline
(PassManager.get_strategy/pm.run_passes) and then _generate_orch_code re-applies
derive_call_directions() and expand_manual_phase_fence(), masking regressions
that depend on a single-pass flow. Fix by removing the redundant transforms from
_generate_orch_code — have _compile_program produce a fully transformed program
(using pm.run_passes) and make _generate_orch_code simply locate the
orchestration function and call codegen.generate_orchestration(program, func).
Update references to derive_call_directions and expand_manual_phase_fence
accordingly so they are only invoked via the default pipeline, not again in
_generate_orch_code.

In `@tests/ut/ir/transforms/test_expand_manual_phase_fence.py`:
- Around line 292-298: Add the standard "__main__" pytest entry so this test
file can be executed directly: add an if __name__ == "__main__" guard and call
pytest.main([__file__, "-v"]). Ensure you import pytest at top if missing;
reference the existing test_pure_range_consumer_does_not_insert_dummy test and
other tests in this file so running the module executes pytest for this file.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0eee7a2f-df8f-49c6-a1e6-3ad12d27e870

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb34f4 and ef45dc5.

📒 Files selected for processing (19)

CMakeLists.txt
include/pypto/ir/expr.h
include/pypto/ir/transforms/pass_properties.h
include/pypto/ir/transforms/passes.h
python/bindings/modules/ir.cpp
python/bindings/modules/passes.cpp
python/pypto/ir/pass_manager.py
python/pypto/pypto_core/passes.pyi
src/codegen/orchestration/orchestration_codegen.cpp
src/ir/op/sync_ops/task.cpp
src/ir/transforms/expand_manual_phase_fence_pass.cpp
tests/st/runtime/scheduling/test_manual_scope_pipeline.py
tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py
tests/ut/codegen/test_orchestration_codegen.py
tests/ut/codegen/test_phase_fence_dep_compression.py
tests/ut/ir/operators/test_task_ops.py
tests/ut/ir/transforms/test_expand_manual_phase_fence.py
tests/ut/ir/transforms/test_pass_manager.py

coderabbitai · 2026-05-26T15:45:47Z

+        # Swimlane fanout is runtime-dependent on this witness and can
+        # under-report the same underlying dependency shape. Do not hard-assert
+        # the aggregate fanout count here.
+        # tasks = manual_scope_swimlane_data["tasks"]
+        # total_fanout = sum(t["fanout_count"] for t in tasks)
+        # assert total_fanout >= _M * _N, (
+        #     f"expected at least {_M * _N} fan-out edges (one per stage1->stage2 pair), got {total_fanout}"
+        # )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test_intra_iteration_dep_present is currently a no-op.

Line [237]–Line [244] comment out all runtime checks, so this test always passes and cannot detect dep-edge regressions.

Suggested fix

def test_intra_iteration_dep_present(self, manual_scope_swimlane_data: dict): @@ - # Swimlane fanout is runtime-dependent on this witness and can - # under-report the same underlying dependency shape. Do not hard-assert - # the aggregate fanout count here. - # tasks = manual_scope_swimlane_data["tasks"] - # total_fanout = sum(t["fanout_count"] for t in tasks) - # assert total_fanout >= _M * _N, ( - # f"expected at least {_M * _N} fan-out edges (one per stage1->stage2 pair), got {total_fanout}" - # ) + tasks = manual_scope_swimlane_data["tasks"] + total_fanout = sum(t["fanout_count"] for t in tasks) + if total_fanout == 0: + pytest.fail("no observable stage1->stage2 fanout edges in swimlane") + if total_fanout < _M * _N: + pytest.skip( + f"swimlane under-reports fanout edges ({total_fanout} < {_M * _N}); " + "strict dep wiring is covered by codegen UT" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Swimlane fanout is runtime-dependent on this witness and can

# under-report the same underlying dependency shape. Do not hard-assert

# the aggregate fanout count here.

# tasks = manual_scope_swimlane_data["tasks"]

# total_fanout = sum(t["fanout_count"] for t in tasks)

# assert total_fanout >= _M * _N, (

# f"expected at least {_M * _N} fan-out edges (one per stage1->stage2 pair), got {total_fanout}"

# )

tasks = manual_scope_swimlane_data["tasks"]

total_fanout = sum(t["fanout_count"] for t in tasks)

if total_fanout == 0:

pytest.fail("no observable stage1->stage2 fanout edges in swimlane")

if total_fanout < _M * _N:

pytest.skip(

f"swimlane under-reports fanout edges ({total_fanout} < {_M * _N}); "

"strict dep wiring is covered by codegen UT"

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/st/runtime/scheduling/test_manual_scope_pipeline.py` around lines 237 - 244, The test test_intra_iteration_dep_present was left as a no-op because the runtime checks were commented out; re-enable the runtime fanout assertions by uncommenting the manual_scope_swimlane_data checks (the tasks extraction, total_fanout computation and assert) and ensure the assertion verifies total_fanout >= _M * _N (or another documented, conservative lower-bound if you expect runtime variance) so the test fails on missing intra-iteration dependency edges; reference the variables manual_scope_swimlane_data, tasks, total_fanout, and the constants _M and _N when restoring the assertion and keeping the original failure message for diagnostic clarity.

coderabbitai · 2026-05-26T15:45:48Z

+def _assert_flattened_stage_strict(swimlane_data: dict, *, stages: int, branches: int) -> None:
+    expected = stages * branches
+    tasks = swimlane_data["tasks"]
+    if len(tasks) < expected:
+        pytest.skip(f"need >= {expected} tasks for phase-fence check, got {len(tasks)}")
+    tasks = sorted(tasks, key=lambda t: t["start_time_us"])[:expected]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Task-count undershoots should fail, not skip, in witness helpers.

Line [55]-Line [56] and Line [69]-Line [70] currently skip when too few tasks are observed, which can hide real scheduling regressions across many callers.

Suggested fix

def _assert_flattened_stage_strict(swimlane_data: dict, *, stages: int, branches: int) -> None: expected = stages * branches tasks = swimlane_data["tasks"] - if len(tasks) < expected: - pytest.skip(f"need >= {expected} tasks for phase-fence check, got {len(tasks)}") + assert len(tasks) >= expected, ( + f"need >= {expected} tasks for phase-fence check, got {len(tasks)}" + ) @@ def _assert_min_task_count(swimlane_data: dict, *, expected: int) -> None: tasks = swimlane_data["tasks"] - if len(tasks) < expected: - pytest.skip(f"need >= {expected} tasks for swimlane check, got {len(tasks)}") + assert len(tasks) >= expected, ( + f"need >= {expected} tasks for swimlane check, got {len(tasks)}" + )

Also applies to: 67-70

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/st/runtime/scheduling/test_phase_fence_dep_compression.py` around lines 52 - 57, The helper _assert_flattened_stage_strict currently calls pytest.skip when the observed task count is less than expected (in the block around the tasks length check) which masks regressions; change that behavior to fail the test instead by replacing pytest.skip(...) with pytest.fail(...) (or an explicit assert/raise) and update the similar helper block that also skips on insufficient tasks (the second occurrence around the other helper/branch at lines 67-70) so both helpers (including the one referenced near the other task-count check) raise a failure with a clear message like "need >= {expected} tasks for phase-fence check, got {len(tasks)}".

coderabbitai · 2026-05-26T15:45:48Z

+        if total_fanout < _M * _N:
+            pytest.skip(
+                f"pl.at swimlane under-reports outlined fanout edges ({total_fanout} < {_M * _N}); "
+                "strict dep wiring is covered by codegen UT"
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip-only fanout handling can hide real intra-iteration dep regressions.

At Line [228]–Line [232], low observed fanout always skips; a broken dependency shape can silently pass in runtime witness mode.

Suggested fix

tasks = pl_at_deps_swimlane_data["tasks"] total_fanout = sum(t["fanout_count"] for t in tasks) + if total_fanout == 0: + pytest.fail("no observable stage1->stage2 fanout edges in pl.at swimlane") if total_fanout < _M * _N: pytest.skip( f"pl.at swimlane under-reports outlined fanout edges ({total_fanout} < {_M * _N}); " "strict dep wiring is covered by codegen UT" )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py` around lines 228 - 232, The current conditional uses pytest.skip when total_fanout < _M * _N which can hide real intra-iteration dependency regressions; change the behavior so the test fails (or at minimum xfails) instead of silently skipping: replace the pytest.skip call with pytest.fail (or an assert) that includes the same diagnostic message and values for total_fanout and _M * _N so failures are visible in runtime witness mode; keep the same message text but call pytest.fail(...) (or assert total_fanout >= _M * _N, f"...") referencing total_fanout, _M and _N and remove the silent skip.

Leaf-Salix and others added 7 commits May 26, 2026 07:43

feat(phase-fence): add ExpandManualPhaseFence and dummy-task plumbing

891d874

fix(phase-fence): tighten barrier placement and fallback semantics

012d905

test(ut): add and stabilize phase-fence transform/codegen coverage

b2e348f

test(st): add phase-fence runtime witness matrix

5848f10

test(swimlane): align observability-sensitive assertions with runtime…

e7a2c9f

… behavior

chore: apply pre-commit cleanup after history restack

0bcfb18

test(st): move phase-fence runtime witness into scheduling

ef45dc5

github-project-automation Bot added this to pto project May 26, 2026

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

fix(pass): address phase-fence review follow-ups

934eca7

Leaf-Salix marked this pull request as draft May 27, 2026 09:10

Conversation

Leaf-Salix commented May 26, 2026

Summary

ExpandManualPhaseFence

Fallback Boundaries

Runtime Test Sync Notes

Tests

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related Issues

Possibly Related PRs

Suggested Labels

Suggested Reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 26, 2026 •

edited

Loading