
_create_transpose tensor accumulating during FSDP2 with quantized_model_init #2717

@pstjohn

Description


With quantized_model_init (DelayedScaling recipe), FusedAdam, and FSDP2, a transpose tensor is allocated at transformer_engine/pytorch/tensor/storage/float8_tensor_storage.py:204 (_create_transpose) during the backward pass. These allocations accumulate across all of the model's layers and are only released at the subsequent forward pass.


https://nvidia.slack.com/archives/C038G319G6R/p1771868674747129?thread_ts=1771868398.818749&cid=C038G319G6R

Metadata

Assignees: none

Labels: bug (Something isn't working)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
