Skip to content

feat(pass): compress manual_scope phase-fence deps with ExpandManualPhaseFence#1545

Draft
Leaf-Salix wants to merge 8 commits into
hw-native-sys:mainfrom
Leaf-Salix:codex/phase-fence-dep-compression-passVer
Draft

feat(pass): compress manual_scope phase-fence deps with ExpandManualPhaseFence#1545
Leaf-Salix wants to merge 8 commits into
hw-native-sys:mainfrom
Leaf-Salix:codex/phase-fence-dep-compression-passVer

Conversation

@Leaf-Salix
Copy link
Copy Markdown

Summary

  • Adds a dedicated ExpandManualPhaseFence IR pass that compresses profitable
    explicit manual_scope full-array TaskId dependencies before backend
    lowering.

  • Rewrites profitable phase-fence fanout from

    N producers -> M consumers
    

    into

    N producers -> dummy barrier -> M consumers
    

    by inserting one dependency-only system.task_dummy.

  • Keeps loop-entry dependency snapshots aligned with the previous phase, so
    current-phase parallel branches do not accidentally observe TaskId slots
    already updated by earlier branches in the same phase.

  • Preserves the existing direct dependency lowering for low-benefit, mixed,
    partial-slot, unresolved, pure range-only, non-orchestration, and update-hazard
    dependency shapes.

  • Updates runtime swimlane witnesses under the current scheduling test layout:
    tests/st/runtime/scheduling/.

ExpandManualPhaseFence

This PR moves manual phase-fence dependency compression into a dedicated IR pass.

The pass recognizes profitable orchestration manual_scope consumer sites whose
manual dependency is exactly one full Array[TASK_ID]. For those sites, it
inserts a marked system.task_dummy barrier carrying the original dependency
array, then rewrites covered consumers to depend on the barrier TaskId.

That changes the runtime dependency shape from direct all-to-all fanout to a
single dependency-only barrier:

previous phase tids[N] -> dummy barrier -> current phase consumers[M]

The main correctness point is the phase snapshot boundary. A loop-carried
TaskId array should represent the completed previous phase when current-phase
consumers are submitted. The pass therefore creates the barrier before the
current profitable fanout rather than letting each consumer branch expand
deps=[tids] after earlier branches may have already written updated task ids
back into the same carrier.

The pass is intentionally narrow. It is not a general DAG optimizer; it only
compresses clear local full-array phase-fence fanout patterns where the barrier
is both profitable and safe.

Fallback Boundaries

The pass keeps direct dependency lowering when the shape is not clearly safe or
profitable.

Current behavior:

  • profitable full-array fanout compresses;
  • minimum-profitable 3 -> 3 fanout compresses;
  • low-benefit N -> 1 and 2 -> 2 fanout stay direct;
  • pure range -> range repeated consumers stay direct;
  • scalar deps, mixed deps, multiple-array deps, unresolved arrays, and
    auto-scope deps stay direct;
  • partial-slot deps such as prev = tids[i]; deps=[prev] stay guarded scalar
    direct deps;
  • same-body update hazards stay direct;
  • non-orchestration functions are ignored.

In practice, if the pass cannot recognize one clear profitable full-array manual
phase-fence pattern, it leaves the existing dependency path unchanged.

One caveat worth calling out: a shared outer tids carrier is not implicitly
lifted into per-lane ownership. For example,

tids -> parallel { range { parallel { deps=[tids] } } }

does not become

parallel { lane-local tids -> range { parallel { deps=[tids] } } }

Independent outer lanes still need the carrier itself to be scoped or indexed
per lane.

Runtime Test Sync Notes

This branch also updates older manual_scope and pl.at scheduling witnesses
to match the post-compression dependency shape and the current test directory
layout.

The runtime ST coverage now lives under:

  • tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
  • tests/st/runtime/scheduling/test_manual_scope_pipeline.py
  • tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py

The swimlane assertions intentionally avoid depending on a stable visible
dummy-task marker. The externally required runtime contract is strict phase
ordering: tasks in flattened stage k + 1 must not start before all tasks in
flattened stage k finish.

The updated witnesses also avoid stale pre-compression assumptions that required
direct all-to-all fanout observability in the swimlane artifact.

Tests

Coverage added or updated in this branch:

  • pass behavior and fallback boundaries:
    tests/ut/ir/transforms/test_expand_manual_phase_fence.py
  • orchestration codegen shape:
    tests/ut/codegen/test_phase_fence_dep_compression.py
  • existing manual-scope codegen coverage:
    tests/ut/codegen/test_orchestration_codegen.py
  • runtime correctness and phase-order witnesses:
    tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
  • scheduling swimlane expectation sync:
    tests/st/runtime/scheduling/test_manual_scope_pipeline.py
    tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py

Representative validation commands:

pytest tests/ut/ir/transforms/test_expand_manual_phase_fence.py
pytest tests/ut/codegen/test_phase_fence_dep_compression.py
pytest tests/ut/codegen/test_orchestration_codegen.py -k "phase_fence or manual_scope"

pytest tests/st/runtime/scheduling/test_phase_fence_dep_compression.py -q --platform=a2a3
pytest tests/st/runtime/scheduling/test_manual_scope_pipeline.py -q --platform=a2a3
pytest tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py -q --platform=a2a3

PYPTO_PHASE_FENCE_EXTRA_SWIMLANE=1 pytest \
  tests/st/runtime/scheduling/test_phase_fence_dep_compression.py::TestPhaseFenceDepCompressionSwimlane::test_multiloop_chain_default \
  --enable-l2-swimlane -q --platform=a2a3

PYPTO_PHASE_FENCE_EXTRA_SWIMLANE=1 pytest \
  tests/st/runtime/scheduling/test_phase_fence_dep_compression.py::TestPhaseFenceDepCompressionSwimlane::test_submit_three_level_strict \
  --enable-l2-swimlane -q --platform=a2a3

PYPTO_PHASE_FENCE_EXTRA_SWIMLANE=1 pytest \
  tests/st/runtime/scheduling/test_phase_fence_dep_compression.py::TestPhaseFenceDepCompressionSwimlane::test_pl_at_three_level_strict \
  --enable-l2-swimlane -q --platform=a2a3

These cover profitable full-array compression, loop-entry snapshot correctness,
low-benefit and mixed-dep fallback, partial-slot scalar fallback, codegen barrier
shape, runtime correctness, and strict phase ordering after dummy-barrier
insertion.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 23b3c258-20d4-4bb9-8a67-e3e18b807d09

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements a new IR transform pass that optimizes manual-scope task dependencies by inserting explicit dummy barrier tasks to compress large all-to-all dependency arrays. The pass analyzes manual-dependency patterns, inserts system.task_dummy assignments when profitable, and rewrites consumer calls to depend on the barriers instead of full dependency arrays. Codegen is updated to lower dummy barriers to rt_submit_dummy_task runtime submissions with proper fanin wiring.

Changes

Manual Phase Fence Dependency Compression

Layer / File(s) Summary
IR Contract: Dummy Task Barrier Attribute and Operator
include/pypto/ir/expr.h, src/ir/op/sync_ops/task.cpp
New kAttrDummyTask attribute marks dependency-only system.task_dummy calls. The system.task_dummy operator takes no arguments and produces Scalar[TASK_ID] for use as internal barriers.
Pass Infrastructure: Declaration, Properties, and Pipeline Integration
CMakeLists.txt, include/pypto/ir/transforms/passes.h, include/pypto/ir/transforms/pass_properties.h, python/bindings/modules/passes.cpp, python/pypto/pypto_core/passes.pyi, python/pypto/ir/pass_manager.py
Pass factory declaration, IR properties specification, Python bindings/stubs, and insertion into default/debug-tile optimization pipelines after DeriveCallDirections.
IR Transform Implementation: Barrier Analysis and Insertion
src/ir/transforms/expand_manual_phase_fence_pass.cpp
Core pass: utility/detection helpers for task-id arrays and manual-dep edges; IR traversal to identify profitable compression opportunities; barrier emission logic that creates system.task_dummy assignments with appropriate manual-dependency attributes; ManualPhaseFenceMutator that traverses loops within manual scopes, rewrites covered consumer calls, and inserts barriers.
Codegen Lowering: System Task Dummy and Barrier Submission
src/codegen/orchestration/orchestration_codegen.cpp, python/bindings/modules/ir.cpp
OrchestrationStmtCodegen recognizes system.task_dummy ops and validates their dummy-task attribute. New EmitDummyTask helper initializes barrier task IDs (invalid by default), fills dependency arrays when manual deps are present, submits rt_submit_dummy_task, and manages phase-fence barrier counter state. array.get_element in manual scope records extracted task ID scalars for dependency targeting. Python binding docstring updated to document reserved attribute keys.
Unit Tests: IR Transform Validation
tests/ut/ir/operators/test_task_ops.py, tests/ut/ir/transforms/test_expand_manual_phase_fence.py
IR helper builders for constants, consumer call sites, array slot extraction, and orchestration programs with manual scopes. Tests verify profitable parallel array-dependency insertion and consumer rewriting; non-insertion for scalar-only, mixed array+scalar, partial-slot, low-benefit, and range-consumer patterns. Also verifies non-orchestration functions are ignored and IterArg barriers use visible initialization values.
Unit Tests: Orchestration Codegen Enhancements
tests/ut/codegen/test_orchestration_codegen.py
Updated test_manual_scope_parallel_array_carry_above_legacy_16_cap to assert dummy barrier emission and compressed downstream dependency arrays for pl.parallel(N > 16). Added three new tests validating non-emission for scalar deps, mixed deps, and auto-scope array deps.
Unit Tests: Comprehensive Phase Fence Codegen Test Suite
tests/ut/codegen/test_phase_fence_dep_compression.py
Helper functions to generate and compile programs, assert barrier shapes, and validate code ordering. 15 test cases covering flattened/pl.at/nested phases, sibling producer-consumer loops, multiloop chains, dense mixed graphs, partial reduces, and fallback behaviors (scalar fallback, low-benefit, partial slots, updated arrays). Validates rt_submit_dummy_task emission, dependency array sizes, and loop ordering via substring assertions.
Integration Tests: Swimlane and System-Level Validation
tests/st/runtime/scheduling/test_manual_scope_pipeline.py, tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py, tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
Updated swimlane tests to tolerate dummy barrier task observation and replaced strict fanout/chain assertions with weaker task-count checks and reconstructed chain helpers. New 900-line comprehensive phase-fence test module with program builders for 9 orchestration patterns, swimlane timing helpers, phase-strictness validators, and both correctness and profiling witness test suites. Updated pass manager test expectations to include new pass.

Sequence Diagram

sequenceDiagram
  participant OrchestrationFunc as Orchestration<br/>Function Body
  participant ExpandPass as ExpandManualPhaseFence<br/>IR Pass
  participant ManualScope as RuntimeScopeStmt<br/>(manual=true)
  participant ForLoop as ForStmt<br/>(parallel)
  participant Barrier as system.task_dummy<br/>Assignment
  participant Consumer as Consumer<br/>Call
  
  OrchestrationFunc->>ExpandPass: Transform program
  ExpandPass->>ManualScope: Analyze within manual scope
  ManualScope->>ForLoop: Check loop for profitable deps
  ForLoop->>ExpandPass: Identify manual_dep_edges array
  ExpandPass->>ExpandPass: Count producers/consumers,<br/>estimate edge savings
  alt Profitable
    ExpandPass->>Barrier: Insert phase_fence_barrier_*_tid<br/>assignment with manual_dep_edges
    Barrier->>Consumer: Rewrite consumer manual_dep_edges<br/>to reference barrier variable
  else Not Profitable
    ExpandPass->>ForLoop: Keep original dependency arrays
  end
  ExpandPass-->>OrchestrationFunc: Return transformed program
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related Issues

Possibly Related PRs

  • hw-native-sys/pypto#671: Modifies PassManager pipeline strategy definitions; the new ExpandManualPhaseFence pass insertion depends on the retrieved PR's refactored pass ordering infrastructure.

Suggested Labels

enhancement

Suggested Reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 A barrier stands tall in the phase-fenced land,
Compressing deps with careful hand,
No more arrays bloat with fanout's sprawl—
One dummy task holds it all! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.61% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(pass): compress manual_scope phase-fence deps with ExpandManualPhaseFence' clearly summarizes the main change: adding a new IR pass that compresses manual_scope phase-fence dependencies.
Description check ✅ Passed The description comprehensively explains the PR objectives, implementation details, fallback boundaries, test coverage, and validation commands—all directly related to the ExpandManualPhaseFence pass addition.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new compiler pass, ExpandManualPhaseFence, designed to optimize manual phase fences by compressing multi-dependency fanouts into synthetic dummy barrier tasks (system.task_dummy). It updates the orchestration codegen to lower these dummy tasks to rt_submit_dummy_task, registers the pass in the Python bindings and pass manager, and adds comprehensive correctness and swimlane tests. The review feedback identifies two main improvement opportunities: removing the unused helper function WithBoolAttr in the new pass file, and optimizing sequential loops by placing barriers for loop-invariant dependencies outside the loop body to avoid redundant dummy task submissions on every iteration.

Comment on lines +227 to +237
static std::vector<std::pair<std::string, std::any>> WithBoolAttr(
std::vector<std::pair<std::string, std::any>> attrs, const std::string& key, bool value) {
for (auto& [k, v] : attrs) {
if (k == key) {
v = value;
return attrs;
}
}
attrs.emplace_back(key, value);
return attrs;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The helper function WithBoolAttr is defined but never used anywhere in this file or the rest of the codebase. Removing this dead code improves maintainability and keeps the codebase clean.

Comment on lines +287 to +296
if (!decisions.empty() && op->kind_ != ForKind::Parallel) {
auto body_stmts = FlattenToVector(body_with_current_rewrites);
std::vector<StmtPtr> with_barriers;
with_barriers.reserve(decisions.size() + body_stmts.size());
for (const auto& decision : decisions) {
with_barriers.push_back(decision.barrier_stmt);
}
with_barriers.insert(with_barriers.end(), body_stmts.begin(), body_stmts.end());
body_with_current_rewrites = MakeSeqOrStmt(std::move(with_barriers), op->span_);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For sequential loops (op->kind_ != ForKind::Parallel), placing the barrier inside the loop body causes a dummy task to be submitted on every single iteration of the loop. While this is necessary for loop-carried dependencies (which change per iteration), it is highly inefficient for loop-invariant dependencies (which are constant across all iterations). For loop-invariant dependencies, the barrier should be placed outside the sequential loop, just like it is for parallel loops. This would reduce the number of dummy task submissions from O(trip_count) to O(1) and significantly lower runtime overhead.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef45dc562d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +362 to +366
for (const auto& dep_array : CollectDirectManualDepArrays(body)) {
if (!dep_array || BodyUpdatesArray(body, dep_array.get())) continue;
int64_t consumers = CountManualDepConsumersOnArray(body, dep_array.get());
if (is_parallel) consumers *= trip_count;
try_add(dep_array, dep_array, consumers);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard barrier hoisting against loop-local dep arrays

When BuildDecisions scans CollectDirectManualDepArrays(body), it treats every matching array as hoistable in parallel loops, but VisitStmt_(ForStmtPtr) later always places the barrier before the loop. If a manual_dep_edges array is introduced inside the loop body (for example via an alias/rebind var used by consumers), the synthesized system.task_dummy will reference a value that is not defined at the hoist point. In that case codegen can compute zero barrier deps and leave the barrier TaskId invalid, so rewritten consumers lose their intended dependency edges. Please restrict this path to dep-array vars that dominate the loop header (or keep such cases uncompressed).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
tests/ut/ir/transforms/test_expand_manual_phase_fence.py (1)

292-298: 💤 Low value

Missing pytest.main entry point.

The file is missing the standard if __name__ == "__main__" block for direct test execution that other test files in the repository have.

Based on learnings, prefer invoking pytest programmatically via pytest.main([__file__, "-v"]) inside the if __name__ == "__main__" guard for consistent test execution across test files.

🔧 Suggested addition at end of file
 def test_pure_range_consumer_does_not_insert_dummy():
     tids = ir.Var("tids", ir.ArrayType(DataType.TASK_ID, 4), S)
     before = _program_with_loop(_consumer("a", [tids]), kind=ir.ForKind.Sequential)

     after = _run(before)
     first_call = cast(ir.Call, cast(ir.AssignStmt, _loop_body_stmts(after)[0]).value)
     assert first_call.op.name != "system.task_dummy"
+
+
+if __name__ == "__main__":
+    import pytest
+    pytest.main([__file__, "-v"])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/transforms/test_expand_manual_phase_fence.py` around lines 292 -
298, Add the standard "__main__" pytest entry so this test file can be executed
directly: add an if __name__ == "__main__" guard and call pytest.main([__file__,
"-v"]). Ensure you import pytest at top if missing; reference the existing
test_pure_range_consumer_does_not_insert_dummy test and other tests in this file
so running the module executes pytest for this file.
tests/ut/codegen/test_phase_fence_dep_compression.py (1)

22-37: ⚡ Quick win

Avoid re-running phase-fence transforms after the default pipeline.

_compile_program() already runs the default pass strategy, then _generate_orch_code() applies derive_call_directions + expand_manual_phase_fence again. That can hide pipeline-wiring regressions and makes this suite assert a double-pass flow.

Suggested helper split
-def _generate_orch_code(program) -> str:
-    program = passes.derive_call_directions()(program)
-    program = passes.expand_manual_phase_fence()(program)
+def _generate_orch_code(program, *, apply_phase_fence_passes: bool = False) -> str:
+    if apply_phase_fence_passes:
+        program = passes.derive_call_directions()(program)
+        program = passes.expand_manual_phase_fence()(program)
     for func in program.functions.values():
         if func.func_type == ir.FunctionType.Orchestration:
             return codegen.generate_orchestration(program, func).code
     raise ValueError("No orchestration function found in program")
 def _compile_program(program_cls) -> str:
@@
     transformed = pm.run_passes(program_cls)
-    return _generate_orch_code(transformed)
+    return _generate_orch_code(transformed)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_phase_fence_dep_compression.py` around lines 22 - 37,
The test helper double-runs phase-fence transforms: _compile_program runs the
default pass pipeline (PassManager.get_strategy/pm.run_passes) and then
_generate_orch_code re-applies derive_call_directions() and
expand_manual_phase_fence(), masking regressions that depend on a single-pass
flow. Fix by removing the redundant transforms from _generate_orch_code — have
_compile_program produce a fully transformed program (using pm.run_passes) and
make _generate_orch_code simply locate the orchestration function and call
codegen.generate_orchestration(program, func). Update references to
derive_call_directions and expand_manual_phase_fence accordingly so they are
only invoked via the default pipeline, not again in _generate_orch_code.
tests/ut/codegen/test_orchestration_codegen.py (1)

3936-3941: ⚡ Quick win

Make task-deps assertions resilient to task index renumbering.

These checks pin params_t0/params_t1 directly, which is brittle when non-semantic codegen changes renumber tasks. Prefer params_t\d+ regex here, like other tests in this file.

Suggested adjustment
-        assert "PTO2TaskId params_t0_deps[1];" in code, code
+        assert re.search(r"PTO2TaskId params_t\d+_deps\[1\];", code), code
...
-        assert "PTO2TaskId params_t1_deps[5];" in code, code
+        assert re.search(r"PTO2TaskId params_t\d+_deps\[5\];", code), code
...
-        assert "PTO2TaskId params_t0_deps[4];" in code, code
+        assert re.search(r"PTO2TaskId params_t\d+_deps\[4\];", code), code

Also applies to: 4026-4027, 4071-4072

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 3936 - 3941, The
assertions hard-code task numbers (e.g., "params_t0_deps" / "params_t1_deps")
which breaks when task indices are renumbered; change them to use regex-based
checks that match params_t\d+ (for example use re.search on patterns like
r'PTO2TaskId\s+params_t\d+_deps\[\d+\];' and
r'if\s+\(phase_fence_barrier_0_tid\.is_valid\(\)\)\s+params_t\d+\_deps\[params_t\d+_deps_count\+\+\]\s*='
) so the tests verify the same semantics without binding to a specific task
index (also update the similar assertions at the other occurrences noted around
lines 4026-4027 and 4071-4072); keep the negative check for ABOVE_LEGACY_CAP
as-is but apply the params_t\d+ pattern where task-specific names are asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/st/runtime/scheduling/test_manual_scope_pipeline.py`:
- Around line 237-244: The test test_intra_iteration_dep_present was left as a
no-op because the runtime checks were commented out; re-enable the runtime
fanout assertions by uncommenting the manual_scope_swimlane_data checks (the
tasks extraction, total_fanout computation and assert) and ensure the assertion
verifies total_fanout >= _M * _N (or another documented, conservative
lower-bound if you expect runtime variance) so the test fails on missing
intra-iteration dependency edges; reference the variables
manual_scope_swimlane_data, tasks, total_fanout, and the constants _M and _N
when restoring the assertion and keeping the original failure message for
diagnostic clarity.

In `@tests/st/runtime/scheduling/test_phase_fence_dep_compression.py`:
- Around line 52-57: The helper _assert_flattened_stage_strict currently calls
pytest.skip when the observed task count is less than expected (in the block
around the tasks length check) which masks regressions; change that behavior to
fail the test instead by replacing pytest.skip(...) with pytest.fail(...) (or an
explicit assert/raise) and update the similar helper block that also skips on
insufficient tasks (the second occurrence around the other helper/branch at
lines 67-70) so both helpers (including the one referenced near the other
task-count check) raise a failure with a clear message like "need >= {expected}
tasks for phase-fence check, got {len(tasks)}".

In `@tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py`:
- Around line 228-232: The current conditional uses pytest.skip when
total_fanout < _M * _N which can hide real intra-iteration dependency
regressions; change the behavior so the test fails (or at minimum xfails)
instead of silently skipping: replace the pytest.skip call with pytest.fail (or
an assert) that includes the same diagnostic message and values for total_fanout
and _M * _N so failures are visible in runtime witness mode; keep the same
message text but call pytest.fail(...) (or assert total_fanout >= _M * _N,
f"...") referencing total_fanout, _M and _N and remove the silent skip.

---

Nitpick comments:
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 3936-3941: The assertions hard-code task numbers (e.g.,
"params_t0_deps" / "params_t1_deps") which breaks when task indices are
renumbered; change them to use regex-based checks that match params_t\d+ (for
example use re.search on patterns like r'PTO2TaskId\s+params_t\d+_deps\[\d+\];'
and
r'if\s+\(phase_fence_barrier_0_tid\.is_valid\(\)\)\s+params_t\d+\_deps\[params_t\d+_deps_count\+\+\]\s*='
) so the tests verify the same semantics without binding to a specific task
index (also update the similar assertions at the other occurrences noted around
lines 4026-4027 and 4071-4072); keep the negative check for ABOVE_LEGACY_CAP
as-is but apply the params_t\d+ pattern where task-specific names are asserted.

In `@tests/ut/codegen/test_phase_fence_dep_compression.py`:
- Around line 22-37: The test helper double-runs phase-fence transforms:
_compile_program runs the default pass pipeline
(PassManager.get_strategy/pm.run_passes) and then _generate_orch_code re-applies
derive_call_directions() and expand_manual_phase_fence(), masking regressions
that depend on a single-pass flow. Fix by removing the redundant transforms from
_generate_orch_code — have _compile_program produce a fully transformed program
(using pm.run_passes) and make _generate_orch_code simply locate the
orchestration function and call codegen.generate_orchestration(program, func).
Update references to derive_call_directions and expand_manual_phase_fence
accordingly so they are only invoked via the default pipeline, not again in
_generate_orch_code.

In `@tests/ut/ir/transforms/test_expand_manual_phase_fence.py`:
- Around line 292-298: Add the standard "__main__" pytest entry so this test
file can be executed directly: add an if __name__ == "__main__" guard and call
pytest.main([__file__, "-v"]). Ensure you import pytest at top if missing;
reference the existing test_pure_range_consumer_does_not_insert_dummy test and
other tests in this file so running the module executes pytest for this file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0eee7a2f-df8f-49c6-a1e6-3ad12d27e870

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb34f4 and ef45dc5.

📒 Files selected for processing (19)
  • CMakeLists.txt
  • include/pypto/ir/expr.h
  • include/pypto/ir/transforms/pass_properties.h
  • include/pypto/ir/transforms/passes.h
  • python/bindings/modules/ir.cpp
  • python/bindings/modules/passes.cpp
  • python/pypto/ir/pass_manager.py
  • python/pypto/pypto_core/passes.pyi
  • src/codegen/orchestration/orchestration_codegen.cpp
  • src/ir/op/sync_ops/task.cpp
  • src/ir/transforms/expand_manual_phase_fence_pass.cpp
  • tests/st/runtime/scheduling/test_manual_scope_pipeline.py
  • tests/st/runtime/scheduling/test_phase_fence_dep_compression.py
  • tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py
  • tests/ut/codegen/test_orchestration_codegen.py
  • tests/ut/codegen/test_phase_fence_dep_compression.py
  • tests/ut/ir/operators/test_task_ops.py
  • tests/ut/ir/transforms/test_expand_manual_phase_fence.py
  • tests/ut/ir/transforms/test_pass_manager.py

Comment on lines +237 to +244
# Swimlane fanout is runtime-dependent on this witness and can
# under-report the same underlying dependency shape. Do not hard-assert
# the aggregate fanout count here.
# tasks = manual_scope_swimlane_data["tasks"]
# total_fanout = sum(t["fanout_count"] for t in tasks)
# assert total_fanout >= _M * _N, (
# f"expected at least {_M * _N} fan-out edges (one per stage1->stage2 pair), got {total_fanout}"
# )
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test_intra_iteration_dep_present is currently a no-op.

Line [237]–Line [244] comment out all runtime checks, so this test always passes and cannot detect dep-edge regressions.

Suggested fix
     def test_intra_iteration_dep_present(self, manual_scope_swimlane_data: dict):
@@
-        # Swimlane fanout is runtime-dependent on this witness and can
-        # under-report the same underlying dependency shape. Do not hard-assert
-        # the aggregate fanout count here.
-        # tasks = manual_scope_swimlane_data["tasks"]
-        # total_fanout = sum(t["fanout_count"] for t in tasks)
-        # assert total_fanout >= _M * _N, (
-        #     f"expected at least {_M * _N} fan-out edges (one per stage1->stage2 pair), got {total_fanout}"
-        # )
+        tasks = manual_scope_swimlane_data["tasks"]
+        total_fanout = sum(t["fanout_count"] for t in tasks)
+        if total_fanout == 0:
+            pytest.fail("no observable stage1->stage2 fanout edges in swimlane")
+        if total_fanout < _M * _N:
+            pytest.skip(
+                f"swimlane under-reports fanout edges ({total_fanout} < {_M * _N}); "
+                "strict dep wiring is covered by codegen UT"
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Swimlane fanout is runtime-dependent on this witness and can
# under-report the same underlying dependency shape. Do not hard-assert
# the aggregate fanout count here.
# tasks = manual_scope_swimlane_data["tasks"]
# total_fanout = sum(t["fanout_count"] for t in tasks)
# assert total_fanout >= _M * _N, (
# f"expected at least {_M * _N} fan-out edges (one per stage1->stage2 pair), got {total_fanout}"
# )
tasks = manual_scope_swimlane_data["tasks"]
total_fanout = sum(t["fanout_count"] for t in tasks)
if total_fanout == 0:
pytest.fail("no observable stage1->stage2 fanout edges in swimlane")
if total_fanout < _M * _N:
pytest.skip(
f"swimlane under-reports fanout edges ({total_fanout} < {_M * _N}); "
"strict dep wiring is covered by codegen UT"
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/st/runtime/scheduling/test_manual_scope_pipeline.py` around lines 237 -
244, The test test_intra_iteration_dep_present was left as a no-op because the
runtime checks were commented out; re-enable the runtime fanout assertions by
uncommenting the manual_scope_swimlane_data checks (the tasks extraction,
total_fanout computation and assert) and ensure the assertion verifies
total_fanout >= _M * _N (or another documented, conservative lower-bound if you
expect runtime variance) so the test fails on missing intra-iteration dependency
edges; reference the variables manual_scope_swimlane_data, tasks, total_fanout,
and the constants _M and _N when restoring the assertion and keeping the
original failure message for diagnostic clarity.

Comment on lines +52 to +57
def _assert_flattened_stage_strict(swimlane_data: dict, *, stages: int, branches: int) -> None:
expected = stages * branches
tasks = swimlane_data["tasks"]
if len(tasks) < expected:
pytest.skip(f"need >= {expected} tasks for phase-fence check, got {len(tasks)}")
tasks = sorted(tasks, key=lambda t: t["start_time_us"])[:expected]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Task-count undershoots should fail, not skip, in witness helpers.

Line [55]-Line [56] and Line [69]-Line [70] currently skip when too few tasks are observed, which can hide real scheduling regressions across many callers.

Suggested fix
 def _assert_flattened_stage_strict(swimlane_data: dict, *, stages: int, branches: int) -> None:
     expected = stages * branches
     tasks = swimlane_data["tasks"]
-    if len(tasks) < expected:
-        pytest.skip(f"need >= {expected} tasks for phase-fence check, got {len(tasks)}")
+    assert len(tasks) >= expected, (
+        f"need >= {expected} tasks for phase-fence check, got {len(tasks)}"
+    )
@@
 def _assert_min_task_count(swimlane_data: dict, *, expected: int) -> None:
     tasks = swimlane_data["tasks"]
-    if len(tasks) < expected:
-        pytest.skip(f"need >= {expected} tasks for swimlane check, got {len(tasks)}")
+    assert len(tasks) >= expected, (
+        f"need >= {expected} tasks for swimlane check, got {len(tasks)}"
+    )

Also applies to: 67-70

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/st/runtime/scheduling/test_phase_fence_dep_compression.py` around lines
52 - 57, The helper _assert_flattened_stage_strict currently calls pytest.skip
when the observed task count is less than expected (in the block around the
tasks length check) which masks regressions; change that behavior to fail the
test instead by replacing pytest.skip(...) with pytest.fail(...) (or an explicit
assert/raise) and update the similar helper block that also skips on
insufficient tasks (the second occurrence around the other helper/branch at
lines 67-70) so both helpers (including the one referenced near the other
task-count check) raise a failure with a clear message like "need >= {expected}
tasks for phase-fence check, got {len(tasks)}".

Comment on lines +228 to +232
if total_fanout < _M * _N:
pytest.skip(
f"pl.at swimlane under-reports outlined fanout edges ({total_fanout} < {_M * _N}); "
"strict dep wiring is covered by codegen UT"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip-only fanout handling can hide real intra-iteration dep regressions.

At Line [228]–Line [232], low observed fanout always skips; a broken dependency shape can silently pass in runtime witness mode.

Suggested fix
         tasks = pl_at_deps_swimlane_data["tasks"]
         total_fanout = sum(t["fanout_count"] for t in tasks)
+        if total_fanout == 0:
+            pytest.fail("no observable stage1->stage2 fanout edges in pl.at swimlane")
         if total_fanout < _M * _N:
             pytest.skip(
                 f"pl.at swimlane under-reports outlined fanout edges ({total_fanout} < {_M * _N}); "
                 "strict dep wiring is covered by codegen UT"
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/st/runtime/scheduling/test_pl_at_deps_pipeline.py` around lines 228 -
232, The current conditional uses pytest.skip when total_fanout < _M * _N which
can hide real intra-iteration dependency regressions; change the behavior so the
test fails (or at minimum xfails) instead of silently skipping: replace the
pytest.skip call with pytest.fail (or an assert) that includes the same
diagnostic message and values for total_fanout and _M * _N so failures are
visible in runtime witness mode; keep the same message text but call
pytest.fail(...) (or assert total_fanout >= _M * _N, f"...") referencing
total_fanout, _M and _N and remove the silent skip.

@Leaf-Salix Leaf-Salix marked this pull request as draft May 27, 2026 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant