
TrainingArguments does not process fsdp arguments correctly #42664

@quic-meetkuma

Description


System Info

transformers version: 5.0.0.dev0 (added backend-specific code only)
Platform: Linux-6.8.0-41-generic-x86_64-with-glibc2.39
Python version: 3.10.19
Huggingface_hub version: 1.0.0.rc6
Safetensors version: 0.6.2
Accelerate version: 1.10.1
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.7.0+cpu (NA)
Using distributed or parallel set-up in script?: Not needed

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code references are given in the Expected behavior section below.

Expected behavior

In the _process_fsdp_args() function of the TrainingArguments class, the parameters intended for accelerate's FullyShardedDataParallelPlugin are not mapped correctly, and some parameters are missing entirely.

The current implementation does not pass the necessary values from self.fsdp_config into fsdp_plugin_args. The parameters should be mapped along the lines of the example below, but this mapping is incomplete in the existing code.

Example: https://github.com/huggingface/accelerate/blob/b9ca0de682f25f15357a3f9f1a4d94374a1d451d/tests/tp/fsdp2_tp_preparation.py#L72
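For illustration, here is a minimal sketch of the kind of fsdp_config -> FullyShardedDataParallelPlugin mapping that is expected. The config keys and plugin keyword names used here are assumptions based on recent accelerate versions, not the exact names used inside transformers.

```python
# Illustrative sketch only: the fsdp_config keys and the plugin keyword names
# below are assumptions, not the authoritative transformers implementation.
from accelerate import FullyShardedDataParallelPlugin


def build_fsdp_plugin(fsdp_config: dict) -> FullyShardedDataParallelPlugin:
    """Translate a TrainingArguments-style fsdp_config dict into plugin kwargs."""
    fsdp_plugin_args = {}

    # Flags assumed to share the same name on both sides.
    for key in (
        "limit_all_gathers",
        "use_orig_params",
        "sync_module_states",
        "forward_prefetch",
        "activation_checkpointing",
    ):
        if key in fsdp_config:
            fsdp_plugin_args[key] = fsdp_config[key]

    # Keys whose names differ between fsdp_config and the plugin (assumed mapping).
    if "transformer_layer_cls_to_wrap" in fsdp_config:
        fsdp_plugin_args["transformer_cls_names_to_wrap"] = fsdp_config[
            "transformer_layer_cls_to_wrap"
        ]
    if "min_num_params" in fsdp_config:
        fsdp_plugin_args["min_num_params"] = fsdp_config["min_num_params"]

    return FullyShardedDataParallelPlugin(**fsdp_plugin_args)
```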
