With quantized_model_init (DelayedScaling), FusedAdam, and FSDP2, we allocate a transpose tensor in _create_transpose (transformer_engine/pytorch/tensor/storage/float8_tensor_storage.py:204) during the backward pass; these transposes accumulate across all model layers and are only freed at the subsequent forward pass.
Slack thread: https://nvidia.slack.com/archives/C038G319G6R/p1771868674747129?thread_ts=1771868398.818749&cid=C038G319G6R
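A minimal repro sketch of the setup described above, not a verified reproducer: the quantized_model_init signature, the TE FusedAdam import path, and the fully_shard import location (torch >= 2.6) are assumptions; model shape, layer count, and the memory print are illustrative only.

```python
import os
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling
from transformer_engine.pytorch.optimizers import FusedAdam  # assumed import path
from torch.distributed.fsdp import fully_shard  # FSDP2; torch >= 2.6 assumed

# Launch with torchrun; one GPU per rank.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

recipe = DelayedScaling()
num_layers, hidden = 12, 4096

# FP8 primary weights; exact quantized_model_init signature is an assumption.
with te.quantized_model_init(recipe=recipe):
    model = torch.nn.Sequential(
        *[te.Linear(hidden, hidden) for _ in range(num_layers)]
    ).cuda()

# FSDP2: shard each layer, then the root module.
for layer in model:
    fully_shard(layer)
fully_shard(model)

opt = FusedAdam(model.parameters(), lr=1e-4)

x = torch.randn(8, hidden, device="cuda")
for step in range(3):
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        y = model(x)
    # During this backward, each layer's Float8 weight allocates a transpose
    # (_create_transpose); the transposes pile up across all layers and are
    # only released around the next forward.
    y.float().sum().backward()
    opt.step()
    opt.zero_grad()
    torch.cuda.synchronize()
    print(step, torch.cuda.memory_allocated() // 2**20, "MiB")

dist.destroy_process_group()
```

The per-step memory_allocated print is just a way to watch the peak grow with num_layers; a memory snapshot (torch.cuda.memory._record_memory_history) taken across backward would show the individual _create_transpose allocations.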