Noise residuals during Text-to-Image Fine-tuning #11232

@Yanlin2001

Describe the bug

The training loss keeps decreasing, but the U-Net's denoising gets worse over the course of fine-tuning stable-diffusion-2-1-base on a self-built dataset.

train_loss
Image

Validation (step 11,616)
Image

Validation (step 28,208)
Image

Validation (step 45,408)
Image

These artifacts resemble an illustration in a paper I read, so I'll call them “residual noise” for now.
Image
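One way to see how a falling loss can coexist with visible residual noise, offered as a sketch rather than a diagnosis: under epsilon-prediction, the clean-image estimate implied by a noise prediction is x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t), so an error in eps_hat is scaled by sqrt((1 - alpha_bar_t) / alpha_bar_t) in x0_hat. The snippet below tabulates that factor. The schedule values (1000 steps, beta_start=0.00085, beta_end=0.012, "scaled_linear") and the epsilon-prediction objective are assumptions about the base model's scheduler config, not something stated in this report, and the helper names are hypothetical.

```python
import math


def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Return alpha_bar_t for every timestep of an assumed scaled_linear schedule."""
    # Assumption: betas are linearly spaced in sqrt-space, then squared.
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = (s0 + (s1 - s0) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out


def x0_error_gain(alpha_bar):
    """Factor by which an error in the predicted noise is scaled in the implied x0."""
    return math.sqrt((1.0 - alpha_bar) / alpha_bar)


abar = alphas_cumprod()
for t in (0, 250, 500, 750, 999):
    print(f"t={t:4d}  alpha_bar={abar[t]:.4f}  gain={x0_error_gain(abar[t]):.2f}")
```

The factor grows with the timestep, so the same noise-prediction error matters much more at high-noise timesteps than the averaged MSE suggests. Logging the loss per timestep bucket, instead of only the mean, might show whether the improvement is concentrated at low-noise timesteps.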

Overfitting on a single batch, using these settings:

--resolution="320" \
--train_batch_size="16" \
--gradient_accumulation_steps="4" \
--gradient_checkpointing \
--learning_rate="5e-05" \

Image
Image

Reproduction

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
--train_data_dir="clean_data/train-good" \
--resolution="320" \
--train_batch_size="16" \
--gradient_accumulation_steps="4" \
--gradient_checkpointing \
--max_train_steps="100000" \
--learning_rate="5e-05" \
--max_grad_norm="1" \
--lr_scheduler="constant" \
--lr_warmup_steps="0" \
--output_dir="experiments/demo" \
--mixed_precision="fp16" \
--validation_prompts \
"Brain, AXT1PRE, Slice1" \
"Brain, AXT1PRE, Slice2, FieldStrength:2.8936, Flash, TR:264, TE:2.88, TI:300, flipAngle:70" \
"Brain, AXT1PRE, Slice8, FieldStrength:2.8936, Flash, TR:264, TE:2.88, TI:300, flipAngle:70" \
"Brain, AXT1PRE, Slice11, FieldStrength:2.8936, Flash, TR:250, TE:2.64, TI:300, flipAngle:70" \
"Brain, AXT2, Slice1" \
"Brain, AXT2, Slice1, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:9.74, TR:5120, TE:107, TI:100, flipAngle:150" \
"Brain, AXT2, Slice6, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:10.28, TR:5120, TE:103, TI:100, flipAngle:150" \
"Brain, AXT2, Slice10, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:10.16, TR:5460, TE:102, TI:100, flipAngle:150" \
"Brain, AXT1POST, Slice1" \
"Brain, AXT1POST, Slice1, FieldStrength:2.8936, Flash, TR:250, TE:2.64, TI:300, flipAngle:70" \
"Brain, AXT1POST, Slice8, FieldStrength:2.8936, Flash, TR:264, TE:2.88, TI:300, flipAngle:70" \
"Brain, AXT1POST, Slice11, FieldStrength:2.8936, Flash, TR:264, TE:2.88, TI:300, flipAngle:70" \
"Brain, AXT1, Slice1" \
"Brain, AXT1, Slice1, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:9.4, TR:419, TE:9.4, TI:100, flipAngle:140" \
"Brain, AXT1, Slice9, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:9.4, TR:446, TE:9.4, TI:100, flipAngle:140" \
"Brain, AXT1, Slice11, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:9.4, TR:461, TE:9.4, TI:100, flipAngle:145" \
"Brain, AXFLAIR, Slice1" \
"Brain, AXFLAIR, Slice1, FieldStrength:2.8936, TurboSpinEcho with EchoSpacing:9.02, TR:9000, TE:81, TI:2500, flipAngle:150" \
"Brain, AXFLAIR, Slice9, FieldStrength:2.8936, TurboSpinEcho with EchoSpacing:9.02, TR:9000, TE:81, TI:2500, flipAngle:150" \
"Brain, AXFLAIR, Slice11, FieldStrength:2.8936, TurboSpinEcho with EchoSpacing:9.02, TR:9000, TE:81, TI:2500, flipAngle:150" \
"yoda" \
"Brain, Slice1, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:9.4, TR:419, TE:9.4, TI:100, flipAngle:140" \
"Brain, Slice9, FieldStrength:1.494, TurboSpinEcho with EchoSpacing:9.4, TR:446, TE:9.4, TI:100, flipAngle:140" \
"''" \
--validation_epochs="1" \
--checkpointing_steps="1500"
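For reference, the flags above imply an effective batch of 16 x 4 = 64 samples per optimizer step. The short sketch below converts the validation steps shown earlier into samples seen. It assumes a single GPU process (the distributed field in System Info is empty) and that the reported step counts are optimizer steps rather than micro-batches; both are assumptions, and `samples_seen` is a hypothetical helper.

```python
# Values copied from the reproduction command above.
train_batch_size = 16
gradient_accumulation_steps = 4
num_processes = 1  # assumption: single-GPU run

effective_batch = train_batch_size * gradient_accumulation_steps * num_processes


def samples_seen(optimizer_step):
    """Training samples consumed after the given number of optimizer steps."""
    return optimizer_step * effective_batch


for step in (11_616, 28_208, 45_408):
    print(f"step {step:>6,}: {samples_seen(step):,} samples")
```

Dividing these counts by the number of images in clean_data/train-good gives the number of passes over the dataset at each validation point, which helps judge whether the later checkpoints are deep into overfitting territory.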

Logs

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Windows-10-10.0.19044-SP0
  • Running on Google Colab?: No
  • Python version: 3.8.20
  • PyTorch version (GPU?): 2.4.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.29.3
  • Transformers version: 4.46.3
  • Accelerate version: 1.0.1
  • PEFT version: 0.7.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: NVIDIA RTX A5000, 24564 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@sayakpaul

    Labels

    bug (Something isn't working)
