refactor: migrate rms_norm + dsv4 attention to auto_chunk#386
Conversation
sim failures were a known infra issue, unrelated to auto_chunk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 26 minutes and 34 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughFive ChangesLoop Optimizer Parameter Migration
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request replaces the deprecated optimization=pl.chunked_loop_optimizer with optimizations=[pl.auto_chunk] across several model files and examples. The feedback points out that in several locations (decode_attention_hca.py and decode_attention_swa.py), the pl.auto_chunk optimization is redundant because there are no loops inside the with pl.at blocks, and suggests removing the optimizations argument in those cases.
These pl.at blocks have no inner loop (chunk loop is outside), so auto_chunk is a no-op; using it caused ~35% sim mismatch. Bare pl.at per gemini review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bare pl.at: hca worsens to 80% mismatch, swa errors (scatter has chunk= inner loop needing auto_chunk). only chunked_loop works for these. 386 net effect = rms_norm migration only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Finishes #373 migration: rms_norm and dsv4 decode_attention_hca/swa were kept on chunked_loop_optimizer during #373 due to suspected sim mismatch; root cause was a known sim infra issue, unrelated to auto_chunk. Migrate the remaining 5 sites so the repo is fully on pl.auto_chunk.
🤖 Generated with Claude Code