
Error: "Attempting to unscale FP16 gradients." #7

@brian6091

Description


Description: Ubuntu 18.04.6 LTS
diffusers @ git+https://github.com/huggingface/diffusers@326de4191578dfb55cb968880d40d703075e331e
torchvision @ https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl
transformers==4.25.1
xformers @ https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl


  • Accelerate version: 0.14.0
  • Platform: Linux-5.10.133+-x86_64-with-glibc2.27
  • Python version: 3.8.15
  • Numpy version: 1.21.6
  • PyTorch version (GPU?): 1.13.0+cu116 (True)
  • Accelerate default config:
    Not found

Steps: 0% 0/699 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/Dreambooth/train_dreambooth.py", line 854, in <module>
    main(args)
  File "/content/Dreambooth/train_dreambooth.py", line 810, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1247, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1210, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 282, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
Steps: 0% 0/699 [00:12<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1069, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 551, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/Dreambooth/train_dreambooth.py', '--revision=fp16', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--instance_data_dir=/content/gdrive/MyDrive/InstanceImages/caetmurxb/', '--class_data_dir=/content/gdrive/MyDrive/RegularizationImages/person/', '--output_dir=/content/models/', '--logging_dir=/content/logs/', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of caetmux person', '--class_prompt=a photo of a person', '--seed=1275017', '--resolution=512', '--train_batch_size=4', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--mixed_precision=fp16', '--use_8bit_adam', '--adam_beta1=0.9', '--adam_beta2=0.999', '--adam_weight_decay=0.01', '--adam_epsilon=1e-08', '--learning_rate=6e-06', '--lr_scheduler=cosine', '--lr_warmup_steps=25', '--lr_cosine_num_cycles=5', '--ema_inv_gamma=1.0', '--ema_power=0.5', '--ema_min_value=0', '--ema_max_value=0.999', '--max_train_steps=699', '--num_class_images=1500', '--sample_batch_size=4', '--save_min_steps=100', '--save_interval=100', '--n_save_sample=4', '--save_sample_prompt=a photo of caetmux person', '--save_sample_negative_prompt=']' returned non-zero exit status 1.
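The traceback shows the failure path: `accelerator.clip_grad_norm_` calls `unscale_gradients`, which calls `GradScaler.unscale_`, and that method raises as soon as it meets a gradient stored in float16. Under PyTorch AMP (`--mixed_precision=fp16`), the trainable weights are expected to stay in float32 while autocast runs the forward pass in half precision, and a parameter's gradient has the same dtype as the parameter. The sketch below is a minimal diagnostic under the assumption that the trainable weights ended up in float16 (for example, because they were loaded from the `--revision=fp16` checkpoint). The helper name `fp16_trainable_params` is hypothetical and not part of Accelerate, diffusers, or PyTorch; `nn.Linear` stands in for the real model.

```python
import torch


def fp16_trainable_params(optimizer: torch.optim.Optimizer) -> list:
    """Hypothetical helper: list (group_index, shape) for optimizer params stored in float16.

    GradScaler.unscale_ rejects float16 gradients, and a gradient has the same
    dtype as its parameter, so any entry returned here would trigger
    "Attempting to unscale FP16 gradients." once it receives a gradient.
    """
    offending = []
    for i, group in enumerate(optimizer.param_groups):
        for p in group["params"]:
            if p.dtype == torch.float16:
                offending.append((i, tuple(p.shape)))
    return offending


# Stand-in for a model whose weights were loaded in half precision.
model = torch.nn.Linear(4, 2).half()
opt = torch.optim.AdamW(model.parameters(), lr=6e-6)
print(fp16_trainable_params(opt))  # [(0, (2, 4)), (0, (2,))]

# Upcast the trainable weights to float32 *before* building the optimizer;
# autocast can still run the forward pass in float16.
model.float()
opt = torch.optim.AdamW(model.parameters(), lr=6e-6)
print(fp16_trainable_params(opt))  # []
```

Running this check right after the optimizer is created, before the first `accelerator.clip_grad_norm_` call, would show whether fp16 weights are the trigger. It only inspects dtypes and does not modify the training loop.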
