Warn when FSDP auto-wrap policy splits tied weights #21613
Merged
ethanwharris merged 5 commits into Lightning-AI:master on Apr 13, 2026
Conversation
Detect shared parameters that would be placed in separate FSDP units by the auto-wrap policy and emit a warning before wrapping. This turns a cryptic RuntimeError (size mismatch) into an actionable message. Applies to both Fabric and PyTorch Lightning FSDP strategies. Closes Lightning-AI#21403
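As background, weight tying makes two modules share one parameter tensor, which is why sharding them into separate FSDP units breaks: a minimal sketch of the setup that triggers the mismatch (module names are illustrative, not from the PR):

```python
from torch import nn

# Illustrative tied embedding / output head, as in Llama- or GPT-2-style models
emb = nn.Embedding(100, 16)
head = nn.Linear(16, 100, bias=False)
head.weight = emb.weight  # weight tying: one tensor, two modules

# Both modules reference the same storage. If an auto-wrap policy puts
# `emb` in its own FSDP unit, that unit flat-shards the tensor while
# `head`, left in the root unit, still expects the full 2D weight.
print(head.weight is emb.weight)                          # True
print(head.weight.data_ptr() == emb.weight.data_ptr())    # True
```

This shared-storage invariant is what the new check inspects before wrapping.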
Tests cover four scenarios: tied weights across units (warns), tied weights in the same unit (no warning), no shared parameters (no warning), and no policy (no warning).
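The four scenarios above can be sketched with a small detection helper. This is a hedged illustration, not the PR's actual implementation: the helper name `splits_tied_weights` and the toy models are hypothetical; the idea is to assign each parameter occurrence to the FSDP unit it would land in (the nearest policy-wrapped ancestor, else the root) and flag tensors that span two units.

```python
from torch import nn

def splits_tied_weights(model, policy):
    """Hypothetical sketch: report whether any shared parameter tensor
    would land in more than one FSDP unit under the given policy."""
    units = {}  # id(param) -> set of unit paths the tensor appears in

    def visit(module, prefix, unit):
        for name, child in module.named_children():
            path = f"{prefix}.{name}" if prefix else name
            # A policy-wrapped child starts its own unit; others inherit
            child_unit = path if policy is not None and policy(child) else unit
            for p in child.parameters(recurse=False):
                units.setdefault(id(p), set()).add(child_unit)
            visit(child, path, child_unit)

    visit(model, "", "root")
    return any(len(u) > 1 for u in units.values())

class Tied(nn.Module):
    """Toy model with tied embedding / output head (illustrative)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)
        self.head = nn.Linear(4, 10, bias=False)
        self.head.weight = self.emb.weight  # weight tying

class Block(nn.Module):
    """Both tied modules nested inside one wrappable block."""
    def __init__(self):
        super().__init__()
        self.inner = Tied()

# 1. Tied weights split across units -> should warn
assert splits_tied_weights(Tied(), lambda m: isinstance(m, nn.Embedding))
# 2. Tied weights kept in the same unit -> no warning
assert not splits_tied_weights(Block(), lambda m: isinstance(m, Tied))
# 3. No shared parameters -> no warning
plain = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 10))
assert not splits_tied_weights(plain, lambda m: isinstance(m, nn.Embedding))
# 4. No auto-wrap policy -> no warning
assert not splits_tied_weights(Tied(), None)
```

The `id(param)`-based bookkeeping is what makes tying visible: a tied tensor is one object reachable under two module paths, so it collects two unit assignments when the policy splits them.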
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

@@           Coverage Diff           @@
##           master   #21613    +/-  ##
=======================================
  Coverage      87%      87%
=======================================
  Files         270      270
  Lines       23934    23974     +40
=======================================
+ Hits        20713    20749     +36
- Misses       3221     3225      +4
justusschock approved these changes on Mar 26, 2026
ethanwharris approved these changes on Apr 13, 2026
Summary
- Detect shared parameters that the `auto_wrap_policy` would place in separate FSDP units and emit a `rank_zero_warn` before wrapping, turning a cryptic `RuntimeError: size mismatch` into an actionable message
- Applies to both the Fabric (`setup_module`) and PyTorch Lightning (`_setup_model`) FSDP strategies

Motivation
Models like Llama, GPT-2, and Mistral tie their input embedding and output head weights. When users include `torch.nn.Embedding` in their FSDP auto-wrap policy, the embedding gets its own FSDP unit while the tied `lm_head` stays in the root unit. FSDP shards each unit independently, so `lm_head` sees a flat/sharded tensor instead of the expected 2D weight, causing a size mismatch deep in torch with no indication of the real cause.

Test plan
Closes #21403
🤖 Generated with Claude Code
📚 Documentation preview 📚: https://pytorch-lightning--21613.org.readthedocs.build/en/21613/