Warn when FSDP auto-wrap policy splits tied weights #21613
Merged
ethanwharris merged 5 commits into Lightning-AI:master on Apr 13, 2026
Conversation
Detect shared parameters that would be placed in separate FSDP units by the auto-wrap policy and emit a warning before wrapping. This turns a cryptic RuntimeError (size mismatch) into an actionable message. Applies to both Fabric and PyTorch Lightning FSDP strategies. Closes Lightning-AI#21403
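As background, weight tying makes two modules share one parameter tensor, which is why sharding them into separate FSDP units breaks: a minimal sketch of the setup that triggers the mismatch (module names are illustrative, not from the PR):

```python
from torch import nn

# Illustrative tied embedding / output head, as in Llama- or GPT-2-style models
emb = nn.Embedding(100, 16)
head = nn.Linear(16, 100, bias=False)
head.weight = emb.weight  # weight tying: one tensor, two modules

# Both modules reference the same storage. If an auto-wrap policy puts
# `emb` in its own FSDP unit, that unit flat-shards the tensor while
# `head`, left in the root unit, still expects the full 2D weight.
print(head.weight is emb.weight)                          # True
print(head.weight.data_ptr() == emb.weight.data_ptr())    # True
```

This shared-storage invariant is what the new check inspects before wrapping.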
Tests cover four scenarios: tied weights across units (warns), tied weights in the same unit (no warning), no shared parameters (no warning), and no policy (no warning).
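The four scenarios above can be sketched with a small detection helper. This is a hedged illustration, not the PR's actual implementation: the helper name `splits_tied_weights` and the toy models are hypothetical; the idea is to assign each parameter occurrence to the FSDP unit it would land in (the nearest policy-wrapped ancestor, else the root) and flag tensors that span two units.

```python
from torch import nn

def splits_tied_weights(model, policy):
    """Hypothetical sketch: report whether any shared parameter tensor
    would land in more than one FSDP unit under the given policy."""
    units = {}  # id(param) -> set of unit paths the tensor appears in

    def visit(module, prefix, unit):
        for name, child in module.named_children():
            path = f"{prefix}.{name}" if prefix else name
            # A policy-wrapped child starts its own unit; others inherit
            child_unit = path if policy is not None and policy(child) else unit
            for p in child.parameters(recurse=False):
                units.setdefault(id(p), set()).add(child_unit)
            visit(child, path, child_unit)

    visit(model, "", "root")
    return any(len(u) > 1 for u in units.values())

class Tied(nn.Module):
    """Toy model with tied embedding / output head (illustrative)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)
        self.head = nn.Linear(4, 10, bias=False)
        self.head.weight = self.emb.weight  # weight tying

class Block(nn.Module):
    """Both tied modules nested inside one wrappable block."""
    def __init__(self):
        super().__init__()
        self.inner = Tied()

# 1. Tied weights split across units -> should warn
assert splits_tied_weights(Tied(), lambda m: isinstance(m, nn.Embedding))
# 2. Tied weights kept in the same unit -> no warning
assert not splits_tied_weights(Block(), lambda m: isinstance(m, Tied))
# 3. No shared parameters -> no warning
plain = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 10))
assert not splits_tied_weights(plain, lambda m: isinstance(m, nn.Embedding))
# 4. No auto-wrap policy -> no warning
assert not splits_tied_weights(Tied(), None)
```

The `id(param)`-based bookkeeping is what makes tying visible: a tied tensor is one object reachable under two module paths, so it collects two unit assignments when the policy splits them.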
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

@@           Coverage Diff           @@
##           master   #21613    +/-  ##
=======================================
  Coverage      87%      87%
=======================================
  Files         270      270
  Lines       23934    23974     +40
=======================================
+ Hits        20713    20749     +36
- Misses       3221     3225      +4
justusschock approved these changes on Mar 26, 2026
ethanwharris approved these changes on Apr 13, 2026
Summary
- Detect shared parameters that the `auto_wrap_policy` would place in separate FSDP units and emit a `rank_zero_warn` before wrapping, turning a cryptic `RuntimeError: size mismatch` into an actionable message
- Applies to both the Fabric (`setup_module`) and PyTorch Lightning (`_setup_model`) FSDP strategies

Motivation
Models like Llama, GPT-2, and Mistral tie their input embedding and output head weights. When users include `torch.nn.Embedding` in their FSDP auto-wrap policy, the embedding gets its own FSDP unit while the tied `lm_head` stays in the root unit. FSDP shards each unit independently, so `lm_head` sees a flat/sharded tensor instead of the expected 2D weight, causing a size mismatch deep in torch with no indication of the real cause.

Test plan
Closes #21403
🤖 Generated with Claude Code
📚 Documentation preview 📚: https://pytorch-lightning--21613.org.readthedocs.build/en/21613/