
refactor: migrate pl.at(optimization=) to optimizations=[pl.auto_chunk] #373

Merged
zhangqi-chen merged 3 commits into hw-native-sys:main from lyfne123:refactor/auto-chunk-kwarg on May 26, 2026


@lyfne123 (Contributor)

Summary

pypto#1504 removed the deprecated pl.at(optimization=, split=) kwargs and the chunked_loop_optimizer sentinel. This migrates every callsite in pypto-lib to the supported optimizations=[pl.auto_chunk] form and refreshes stale comments.

  • 18 files: examples + qwen3/deepseek/kimi/milm kernels
  • No split= usage existed, so all become a plain optimizations=[pl.auto_chunk]
  • pl.auto_chunk itself emits a deprecation warning but is still functional; it is kept so the examples stay runnable
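For a mechanical migration like this one, a small codemod can do the bulk of the rewrite. The sketch below is a hypothetical helper, not the tooling used in this PR. It assumes pypto is imported as `pl`, that every deprecated callsite spells the kwarg exactly as `optimization=pl.chunked_loop_optimizer`, and it ignores `split=` because no callsite used it. The `migrate_source` name and the `pl.CORE_GROUP` argument in the example are illustrative placeholders.

```python
import re

# Hypothetical migration helper (not part of pypto or pypto-lib).
# Assumes pypto is imported as `pl` and the removed kwarg is spelled
# `optimization=pl.chunked_loop_optimizer`; `split=` is not handled.
_DEPRECATED_HINT = re.compile(r"\boptimization\s*=\s*pl\.chunked_loop_optimizer\b")
_REPLACEMENT = "optimizations=[pl.auto_chunk]"


def migrate_source(src: str) -> tuple[str, int]:
    """Rewrite the removed kwarg to the list-based form.

    Returns the new source text and the number of callsites rewritten.
    Already-migrated code (`optimizations=[...]`) does not match the pattern,
    so running the helper twice is harmless.
    """
    return _DEPRECATED_HINT.subn(_REPLACEMENT, src)


if __name__ == "__main__":
    # `pl.CORE_GROUP` is a placeholder for whatever the real callsite passes.
    before = "with pl.at(pl.CORE_GROUP, optimization=pl.chunked_loop_optimizer):"
    after, count = migrate_source(before)
    print(after)  # with pl.at(pl.CORE_GROUP, optimizations=[pl.auto_chunk]):
    print(count)  # 1
```

A blind rewrite like this would also touch rms_norm and decode_attention_hca/swa, which the later commits in this PR exclude from the auto_chunk migration because they are already manually chunked, so each hit still needs review.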

Test plan

  • lint: English-only + headers pass
  • golden harness on examples/models compiles & runs
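Alongside the lint step, a check like the following could confirm that no `pl.at` call still passes one of the kwargs removed in pypto#1504; it is also a direct way to verify that no `split=` usage existed. This is a hypothetical sketch built only on the standard-library `ast` module. The `find_removed_kwargs` and `check_tree` names are illustrative, and the check matches only the literal spelling `pl.at(...)`, so calls made through another alias would be missed.

```python
import ast
from pathlib import Path

# Keyword arguments removed from pl.at by pypto#1504 (per the PR description).
REMOVED_PL_AT_KWARGS = {"optimization", "split"}


def find_removed_kwargs(src: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, kwarg) pairs for pl.at(...) calls using a removed kwarg."""
    hits = []
    for node in ast.walk(ast.parse(src, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only the literal spelling `pl.at(...)` is matched; other aliases are missed.
        is_pl_at = (
            isinstance(func, ast.Attribute)
            and func.attr == "at"
            and isinstance(func.value, ast.Name)
            and func.value.id == "pl"
        )
        if is_pl_at:
            # kw.arg is None for `**kwargs`, which never matches the set.
            hits.extend(
                (node.lineno, kw.arg)
                for kw in node.keywords
                if kw.arg in REMOVED_PL_AT_KWARGS
            )
    return sorted(hits)


def check_tree(root: str) -> list[str]:
    """Return `path:line` messages for every leftover removed kwarg under `root`."""
    messages = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for line, kwarg in find_removed_kwargs(source, str(path)):
            messages.append(f"{path}:{line}: pl.at({kwarg}=...) was removed in pypto#1504")
    return messages
```

Running `check_tree("examples")` and `check_tree("models")` and failing on any non-empty result would turn the migration into a regression guard.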

coderabbitai Bot commented May 25, 2026

Warning

Review limit reached

@lyfne123, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e33f0916-ef65-4cd1-ab28-410ab04ddf9c

📥 Commits

Reviewing files that changed from the base of the PR and between 09a33f0 and 5a2b095.

📒 Files selected for processing (15)
  • examples/advanced/gemm_eltwise.py
  • examples/beginner/hello_world.py
  • examples/beginner/matmul.py
  • examples/intermediate/gemm.py
  • examples/intermediate/layer_norm.py
  • examples/intermediate/rope.py
  • examples/intermediate/softmax.py
  • models/deepseek/v3_2/deepseek_v3_2_prefill_front_draft.py
  • models/deepseek/v4/decode_indexer.py
  • models/deepseek/v4/hc_post.py
  • models/deepseek/v4/qkv_proj_rope.py
  • models/kimi/kimi_k2_decode_draft.py
  • models/milm/milm_decode_draft.py
  • models/qwen3/14b/qwen3_14b_l3_generate.py
  • models/qwen3/32b/qwen3_32b_prefill_draft.py
📝 Walkthrough


This PR systematically replaces PyPTO's singular pl.at(..., optimization=pl.chunked_loop_optimizer) parameter with the list-based pl.at(..., optimizations=[pl.auto_chunk]) parameter across the examples and model kernels, moving the loop optimization hint from the removed chunked_loop_optimizer sentinel to auto_chunk.

Changes

Loop optimization hint unification

  • Documentation and docstring updates (examples/advanced/gemm_eltwise.py): Module-level comments and function docstrings describing the optimizer configuration are updated to reference auto_chunk instead of chunked_loop_optimizer.
  • Beginner and intermediate example programs (examples/beginner/hello_world.py, examples/beginner/matmul.py, examples/intermediate/gemm.py, examples/intermediate/layer_norm.py, examples/intermediate/rope.py, examples/intermediate/softmax.py): Six example programs switch their core-group pl.at contexts to the new optimizer hint format with single-line parameter changes.
  • Single-region model kernel updates (models/deepseek/v3_2/deepseek_v3_2_prefill_front_draft.py, models/deepseek/v4/decode_attention_hca.py, models/deepseek/v4/hc_post.py, models/kimi/kimi_k2_decode_draft.py, models/milm/milm_decode_draft.py): The Deepseek v3_2 prefill, Deepseek v4 (hca, hc_post), Kimi decode, and MiLM decode programs each update their single core-group scheduling region with a one-line parameter change.
  • Multi-region models with updated comments (models/deepseek/v4/decode_attention_swa.py, models/deepseek/v4/qkv_proj_rope.py, models/deepseek/v4/decode_indexer.py): Deepseek v4 attention SWA (three KV cache and assembly regions), qkv_proj_rope (RMSNorm partial-sum paths), and decode_indexer (score_quant path) each switch multiple pl.at contexts and update the related inline documentation to reflect auto_chunk usage.
  • Large-scale Qwen3 model refactoring (models/qwen3/14b/qwen3_14b_l3_generate.py, models/qwen3/32b/qwen3_32b_prefill_draft.py): The Qwen3 14B and 32B models update all core-group pl.at directives across the prefill and decode paths (Q/K/V projection, normalization, padding, RoPE, attention matmul/softmax, and MLP stages) to the new optimizations=[pl.auto_chunk] format.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • hw-native-sys/pypto-lib#332: Both PRs modify models/deepseek/v4/qkv_proj_rope.py; this PR replaces its pl.at(..., optimization=pl.chunked_loop_optimizer) hints with optimizations=[pl.auto_chunk], while the related PR refactors the qkv_proj_rope scope structure.
  • hw-native-sys/pypto-lib#350: Both PRs modify the score_quant path of models/deepseek/v4/decode_indexer.py; this PR updates the CORE_GROUP loop scheduling hint to pl.auto_chunk, while the related PR refactors the scoring computation logic.
  • hw-native-sys/pypto-lib#276: Both PRs update Qwen3-14B pl.at directives across the same core-group regions in prefill and decode paths; this PR switches the optimizer strategy while the related PR adds name_hint labeling.


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 20.69%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title clearly and concisely describes the main change: migrating deprecated pl.at(optimization=) syntax to the new optimizations=[pl.auto_chunk] form across the codebase.
  • Description check (✅ Passed): The description is directly related to the changeset, explaining the migration rationale, scope (18 files), technical details, and test plan.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


gemini-code-assist Bot left a comment

Code Review

This pull request performs a widespread refactor to replace the chunked_loop_optimizer with auto_chunk across multiple example scripts and model implementations. The changes primarily involve updating the pl.at context manager to use the optimizations list parameter instead of the single optimization parameter, along with corresponding updates to docstrings and comments. I have no feedback to provide as there were no review comments.

lyfne123 and others added 2 commits May 26, 2026 10:57
…izations=[pl.auto_chunk]

pypto#1504 removed the pl.at(optimization=, split=) kwargs and the
chunked_loop_optimizer sentinel. Switch all callsites to the supported
optimizations=[pl.auto_chunk] form and update stale comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auto_chunk requests a ~27GB static arena for rms_norm's two-pass
manually-chunked kernel under the pinned pto-isa, failing CI runtime.
softmax/layer_norm migrate cleanly (single full-hidden tile); rms_norm
is the only example already manually chunked, so revert it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lyfne123 force-pushed the refactor/auto-chunk-kwarg branch from 09a33f0 to 2ea0c63 on May 26, 2026 02:57
…mizer

decode_attention_hca/swa wrap pl.at inside an outer pl.range with
explicit chunk= args; auto_chunk re-chunks and gives ~34% sim mismatch
on x_out (device passes). Same already-manually-chunked case as rms_norm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
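Both reverts share one pattern: a pl.at region that is already manually chunked. For decode_attention_hca/swa that means pl.at nested inside an outer pl.range with an explicit chunk= argument, which auto_chunk then re-chunks. A hypothetical check built on the standard-library `ast` module (not part of pypto) could flag that nesting before CI does. The helper names and the `pl.range`/`pl.at` argument lists in the test are placeholders, and the check only recognizes the `pl.range(..., chunk=...)` form described above, so other manual-chunking patterns would not be flagged.

```python
import ast


def _is_pl_call(node: ast.AST, name: str) -> bool:
    """True if `node` is a call spelled `pl.<name>(...)`."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == name
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "pl"
    )


def _requests_auto_chunk(call: ast.Call) -> bool:
    """True if the call passes `optimizations=[..., pl.auto_chunk, ...]`."""
    for kw in call.keywords:
        if kw.arg == "optimizations" and isinstance(kw.value, (ast.List, ast.Tuple)):
            return any(
                isinstance(elt, ast.Attribute) and elt.attr == "auto_chunk"
                for elt in kw.value.elts
            )
    return False


class AutoChunkInManualChunkFinder(ast.NodeVisitor):
    """Record lines where an auto_chunk pl.at sits inside a pl.range(..., chunk=...) loop."""

    def __init__(self) -> None:
        self.chunked_depth = 0  # number of enclosing pl.range loops that pass chunk=
        self.hits: list[int] = []

    def visit_For(self, node: ast.For) -> None:
        chunked = _is_pl_call(node.iter, "range") and any(
            kw.arg == "chunk" for kw in node.iter.keywords
        )
        self.chunked_depth += int(chunked)
        self.generic_visit(node)
        self.chunked_depth -= int(chunked)

    def visit_With(self, node: ast.With) -> None:
        if self.chunked_depth:
            for item in node.items:
                expr = item.context_expr
                if _is_pl_call(expr, "at") and _requests_auto_chunk(expr):
                    self.hits.append(node.lineno)
        self.generic_visit(node)


def find_auto_chunk_in_manual_chunks(src: str) -> list[int]:
    """Return the line numbers of suspicious `with pl.at(...)` statements."""
    finder = AutoChunkInManualChunkFinder()
    finder.visit(ast.parse(src))
    return finder.hits
```

Tracking a depth counter rather than a boolean keeps the check correct when chunked loops are nested, and top-level pl.at regions (the cases that migrated cleanly) are never reported.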

Labels

None yet

Projects

None yet


2 participants