Description
System Info
transformers version: 5.0.0.dev0 (Added backend specific code only)
Platform: Linux-6.8.0-41-generic-x86_64-with-glibc2.39
Python version: 3.10.19
Huggingface_hub version: 1.0.0.rc6
Safetensors version: 0.6.2
Accelerate version: 1.10.1
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.7.0+cpu (NA)
Using distributed or parallel set-up in script?: Not needed
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Code references are given.
Expected behavior
In the _process_fsdp_args() function of the TrainingArguments class, the parameters intended for the FullyShardedDataParallelPlugin from the accelerate module are not mapped correctly, and some parameters are missing entirely.
The current implementation does not pass the necessary parameters from self.fsdp_config into fsdp_plugin_args as expected. The parameters should be mapped along the lines of the example below, but the mapping in the existing code is incomplete.
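For reference, here is a minimal sketch of the kind of mapping that would be expected. It assumes commonly used fsdp_config keys and recent FullyShardedDataParallelPlugin arguments; the helper name build_fsdp_plugin_args, the exact key set, and the example values are illustrative only, not the actual transformers implementation:

```python
# Sketch (assumptions, not the real _process_fsdp_args) of mapping
# TrainingArguments.fsdp_config entries to keyword arguments of
# accelerate's FullyShardedDataParallelPlugin. Names may differ by version.
from accelerate import FullyShardedDataParallelPlugin


def build_fsdp_plugin_args(fsdp_config: dict) -> dict:
    fsdp_plugin_args = {}

    # Keys whose names match the plugin arguments can be forwarded directly.
    for key in (
        "backward_prefetch",
        "forward_prefetch",
        "limit_all_gathers",
        "use_orig_params",
        "sync_module_states",
        "cpu_ram_efficient_loading",
        "activation_checkpointing",
    ):
        if key in fsdp_config:
            fsdp_plugin_args[key] = fsdp_config[key]

    # Auto-wrap settings use different names on the plugin side.
    if "min_num_params" in fsdp_config:
        fsdp_plugin_args["min_num_params"] = fsdp_config["min_num_params"]
    if "transformer_layer_cls_to_wrap" in fsdp_config:
        fsdp_plugin_args["transformer_cls_names_to_wrap"] = fsdp_config[
            "transformer_layer_cls_to_wrap"
        ]

    return fsdp_plugin_args


# Example usage: the resulting kwargs are then used to construct the plugin.
fsdp_plugin = FullyShardedDataParallelPlugin(
    **build_fsdp_plugin_args({"use_orig_params": True, "limit_all_gathers": True})
)
```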