fix(merger): handle non-sharded tensors in FSDP2 checkpoint merging #155

Merged: kcz358 merged 1 commit into main from fix/fsdp2-merger-non-sharded-tensors on Apr 15, 2026

kcz358 (Collaborator) commented Apr 15, 2026

Summary

Non-sharded buffers (e.g. time_embedding.inv_freq) are stored as plain Tensor objects rather than DTensor objects in FSDP2 checkpoints, so the merge fails with AttributeError: 'Tensor' object has no attribute '_local_tensor'.

  • Check for _local_tensor attribute before accessing it; fall back to using the tensor directly for non-sharded buffers
  • Deduplicate identical copies across ranks (take first) instead of blindly concatenating, which would produce incorrect shapes for non-sharded parameters
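
The two changes above can be illustrated with a minimal sketch. It is not the code from this PR: `FakeDTensor`, `local_view`, and `merge_ranks` are hypothetical names, `FakeDTensor` only mimics the `_local_tensor` attribute so the example runs without a distributed setup, and concatenating along dim 0 is an assumption about how the shards are laid out.

```python
import torch


class FakeDTensor:
    """Hypothetical stand-in for a DTensor: it only exposes `_local_tensor`."""

    def __init__(self, local: torch.Tensor):
        self._local_tensor = local


def local_view(value):
    # DTensor-like objects carry their shard in `_local_tensor`;
    # plain tensors (non-sharded buffers) are used directly.
    return getattr(value, "_local_tensor", value)


def merge_ranks(per_rank_values, dim=0):
    """Merge one checkpoint entry collected from every rank (illustrative only)."""
    local_tensors = [local_view(v) for v in per_rank_values]
    first = local_tensors[0]
    # If every rank holds an identical copy, keep the first one
    # instead of concatenating, which would inflate the shape.
    if all(t.shape == first.shape and torch.equal(t, first) for t in local_tensors[1:]):
        return first
    # Otherwise treat the entries as shards and concatenate them (assumed dim).
    return torch.cat(local_tensors, dim=dim)


# Two ranks: a sharded weight and a replicated buffer such as inv_freq.
weight = [
    FakeDTensor(torch.arange(0, 4.0).reshape(2, 2)),
    FakeDTensor(torch.arange(4, 8.0).reshape(2, 2)),
]
inv_freq = [torch.tensor([1.0, 0.5]), torch.tensor([1.0, 0.5])]

merged_weight = merge_ranks(weight)      # shards concatenated along dim 0
merged_inv_freq = merge_ranks(inv_freq)  # a single copy is kept
```

One limitation of this sketch: sharded tensors whose shards happen to be identical on every rank (for example, all zeros) would also be collapsed to a single shard, so a real merger may need an additional signal to tell the two cases apart.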

Non-sharded buffers (e.g. inv_freq) are stored as plain Tensors rather
than DTensors, causing AttributeError on _local_tensor access. Now
falls back to using the tensor directly, and deduplicates identical
copies across ranks instead of concatenating them.
kcz358 merged commit d1ac27a into main on Apr 15, 2026
1 of 3 checks passed