This repository contains the official implementation for "Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning" (NeurIPS 2025).
We recommend creating an isolated environment using Conda (Python 3.10):
```bash
conda create -n resalign python=3.10 -y
conda activate resalign
pip install -r requirements.txt
```

Our experiments require three datasets. You can use the direct download commands below (or download the archives via a browser) and unzip them into data/.
To download the fine-tuning datasets (finetune-datasets):

```bash
mkdir -p data
wget --content-disposition \
  "https://entuedu-my.sharepoint.com/:u:/g/personal/boheng001_e_ntu_edu_sg/IQC8ivkH_LrMSLDx0sNJPfrKAUJild9e8egBUW6Ig-80sM4?download=1"
unzip -oq finetune-datasets.zip -d data && mv data/finetune-datasets data/finetune_datasets
rm finetune-datasets.zip
```

To download the new_mscoco10k dataset:
```bash
wget --content-disposition \
  "https://entuedu-my.sharepoint.com/:u:/g/personal/boheng001_e_ntu_edu_sg/IQBTbNJG0ig8RrltLCKWdBFHAanwNJr1petww2sBmNXr7u8?download=1"
unzip -oq new_mscoco10k.zip -d data
mv data/new_mscoco10k/new_mscoco10k/* data/new_mscoco10k/ && rmdir data/new_mscoco10k/new_mscoco10k
rm new_mscoco10k.zip
```

To download resalign_train (the dataset used for training ResAlign; not required if you only want to evaluate our pretrained models or your own):
```bash
pip install huggingface_hub
huggingface-cli login
huggingface-cli download randyli/ResAlign_train \
  --repo-type dataset \
  --local-dir data/ResAlign_train \
  --local-dir-use-symlinks False \
  --quiet >/dev/null
mv data/ResAlign_train/* data/ && rm -rf data/ResAlign_train
```

After downloading all parts, you should have:
```text
data
├── db_prior
├── down1
├── down2
├── down3
├── finetune-datasets
│   ├── diffusiondb-dataset/train
│   └── dreambench-dataset/train
├── new_mscoco10k
│   ├── images
│   └── prompts.csv
├── README.md
├── retain
├── i2p.csv
├── unsafe.csv
└── train
```
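If you would like to sanity-check the layout programmatically, the following minimal Python sketch verifies that the entries from the tree above exist. The paths are copied directly from that tree; adjust them if, for example, your scripts expect the underscored `finetune_datasets` name produced by the download command:

```python
# Minimal sketch: verify the expected data/ layout from the tree above.
# Path names are copied from the tree; adjust if your layout differs
# (e.g., the download command renames finetune-datasets to finetune_datasets).
from pathlib import Path

EXPECTED = [
    "data/db_prior",
    "data/down1",
    "data/down2",
    "data/down3",
    "data/finetune-datasets/diffusiondb-dataset/train",
    "data/finetune-datasets/dreambench-dataset/train",
    "data/new_mscoco10k/images",
    "data/new_mscoco10k/prompts.csv",
    "data/retain",
    "data/i2p.csv",
    "data/unsafe.csv",
    "data/train",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
if missing:
    print("Missing entries:")
    for p in missing:
        print(f"  {p}")
else:
    print("All expected data/ entries are present.")
```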
We release two standalone ResAlign checkpoints so you can reproduce the quick-validation results without re-training. Use the commands below to download both files and place them where the scripts expect them:
For the higher-safety checkpoint:

```bash
wget --content-disposition \
  "https://entuedu-my.sharepoint.com/:u:/g/personal/boheng001_e_ntu_edu_sg/IQBdJnqNqVzESI0Wz0Sa_j8fARE11wC9fKRVJ7saJTzU6pg?download=1"
mkdir -p outputs/quick_validation_higher_safety
mv resalign_pretrained_model_sdv1.4_sexual_higher_safety.pt \
  outputs/quick_validation_higher_safety/final_model.pt
```

For the higher-utility checkpoint:

```bash
wget --content-disposition \
  "https://entuedu-my.sharepoint.com/:u:/g/personal/boheng001_e_ntu_edu_sg/IQANKHT3WU3ES4FlPAkQYk_7ATUb9x0sxB9AIJD16xF04m8?download=1"
mkdir -p outputs/quick_validation_higher_utility
mv resalign_pretrained_model_sdv1.4_sexual_higher_utility.pt \
  outputs/quick_validation_higher_utility/final_model.pt
```

After the download, the directory layout under outputs/ should match:
```text
outputs/
├── quick_validation_higher_safety/
│   └── final_model.pt
└── quick_validation_higher_utility/
    └── final_model.pt
```
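As a quick integrity check, you can peek into a downloaded checkpoint with `torch.load`. This is only a sketch and assumes `final_model.pt` is a standard PyTorch-serialized object such as a state dict; the exact structure stored by the training code may differ:

```python
# Sketch: peek into a downloaded checkpoint. Assumes it is a torch-serialized
# object such as a state dict; the actual on-disk format may differ.
import torch

ckpt = torch.load(
    "outputs/quick_validation_higher_safety/final_model.pt",
    map_location="cpu",
)
if isinstance(ckpt, dict):
    print(f"{len(ckpt)} top-level entries")
    for key in list(ckpt)[:10]:  # show only the first few keys
        print(" ", key)
else:
    print(type(ckpt))
```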
In the paper, most of our reported results were averaged over three independent runs. The two checkpoints above are representative single runs that emphasize different trade-offs:
| Model | Before Fine-tuning (IP / US) | DreamBench++ Fine-tuned (IP / US) | DiffusionDB Fine-tuned (IP / US) | FID / CLIP / Aesthetics |
|---|---|---|---|---|
| Higher Safety | 0.0007 / 0.0000 | 0.0168 / 0.0050 | 0.0408 / 0.0100 | 19.758 / 30.75 / 5.96 |
| Higher Utility | 0.0039 / 0.0000 | 0.0476 / 0.0033 | 0.0791 / 0.0150 | 17.635 / 31.04 / 6.01 |
When running the evaluation pipelines you should observe numbers close to the above (allowing ±1% due to stochastic generation). Please note that these checkpoints are intended for research use only; downstream applications should account for residual risks and comply with your deployment policies.
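For convenience, the sketch below restates the IP/US numbers from the table in code and checks reproduced values against them. Interpreting the ±1% tolerance as an absolute difference of 0.01 is our assumption; adjust it if you read the tolerance differently:

```python
# Reference IP/US values copied from the table above.
# Treating "+/-1%" as an absolute difference of 0.01 is an assumption.
REFERENCE = {
    "higher_safety": {
        "before_ft": (0.0007, 0.0000),
        "dreambench_ft": (0.0168, 0.0050),
        "diffusiondb_ft": (0.0408, 0.0100),
    },
    "higher_utility": {
        "before_ft": (0.0039, 0.0000),
        "dreambench_ft": (0.0476, 0.0033),
        "diffusiondb_ft": (0.0791, 0.0150),
    },
}

def close_to_reference(model: str, stage: str, ip: float, us: float, tol: float = 0.01) -> bool:
    """Return True if measured IP/US are within `tol` of the reference values."""
    ref_ip, ref_us = REFERENCE[model][stage]
    return abs(ip - ref_ip) <= tol and abs(us - ref_us) <= tol

# Example: compare your reproduced numbers for the higher-safety checkpoint.
print(close_to_reference("higher_safety", "diffusiondb_ft", ip=0.043, us=0.012))
```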
You can quickly validate the pretrained ResAlign models using:

```bash
conda activate resalign
bash scripts/quick_validation_higher_safety.sh
bash scripts/quick_validation_higher_utility.sh
```

These scripts automatically use the pretrained checkpoints in outputs/quick_validation_higher_safety and outputs/quick_validation_higher_utility, and run both the safety evaluation and the aesthetic evaluation.
You can also use your own trained model by specifying the model directory:

```bash
conda activate resalign
bash scripts/quick_validation.sh outputs/your_model_dir
```

The script will perform the following:
- Safety Evaluation
  - Performs pre-fine-tuning evaluation (IP and US)
  - Fine-tunes the model on the DiffusionDB and DreamBench++ datasets, respectively
  - Performs post-fine-tuning evaluation (generates IP and US for each dataset)
  - Generates a metrics summary CSV: `outputs/{model_name}/safety_eval/safety_metrics_summary.csv`
- Utility Evaluation
  - Uses the same model for evaluation
  - Generates images and calculates FID, CLIP, and Aesthetic scores
  - Saves results to `outputs/{model_name}/eval/result.txt`
All evaluation results are saved under the `outputs/{model_name}/` directory.
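If you want to inspect the safety metrics summary programmatically, a minimal sketch follows. It only assumes the file is a regular CSV and makes no assumption about its column names (which is why it simply prints them):

```python
# Sketch: load and print the safety metrics summary produced by the scripts.
# No assumptions are made about column names beyond the file being a valid CSV.
import pandas as pd

model_name = "quick_validation_higher_safety"  # or your own model directory name
summary_path = f"outputs/{model_name}/safety_eval/safety_metrics_summary.csv"

df = pd.read_csv(summary_path)
print(df.columns.tolist())
print(df.to_string(index=False))
```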
The repository structure is organized as follows:
```text
Resalign_official
├── configs/                  # Training configuration JSON files
├── src/                      # Source code directory
│   ├── train/                # Training-related code
│   └── eval/                 # Evaluation-related code
├── data/                     # Dataset directory
│   ├── prompts.csv           # Aesthetic evaluation prompts
│   ├── i2p.csv               # Safety evaluation prompts (i2p)
│   ├── unsafe.csv            # Safety evaluation prompts (unsafe)
│   └── finetune-datasets/    # Fine-tuning datasets (diffusiondb, dreambench)
├── checkpoints/              # Detector weight files
├── outputs/                  # Training output directory
└── scripts/                  # One-click execution scripts
    ├── train.sh              # Training script
    ├── quick_validation.sh   # Quick validation script (pretrained model or your own)
    ├── eval_all.sh           # One-click evaluation (safety and aesthetic, sequentially)
    ├── eval_all_custom.sh    # One-click evaluation with a user-specified model directory
    ├── eval.sh               # Aesthetic evaluation script
    └── eval_safety.sh        # Safety evaluation script
```
A training job can be launched using:
```bash
bash scripts/train.sh
```

The default parameters in the script include:

- Model: `CompVis/stable-diffusion-v1-4`
- Training epochs: `160` (adjustable in the script)
- Dynamic weights: enabled, with `outer_lambda=0.8` and `retain_loss_weight=1.6 → 2.0`
- Data: `data/train` (outer), `data/retain` (retain)
- Configuration: all specified configuration files, such as `configs/config-1.json`
To use a different GPU, modify the `--cuda_device` parameter in the script.
After training, the model weights are saved to `outputs/{epochs}-{timestamp}/final_model.pt`.
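To try a trained checkpoint outside the provided scripts, the hedged sketch below loads `final_model.pt` into a diffusers Stable Diffusion v1.4 pipeline. It assumes the checkpoint is (or contains) a UNet state dict compatible with CompVis/stable-diffusion-v1-4; if the training code stores a different structure, adapt the loading step accordingly. The output directory name is purely illustrative:

```python
# Sketch: load a trained ResAlign checkpoint for ad-hoc sampling.
# Assumption: final_model.pt holds a UNet state dict compatible with SD v1.4;
# the actual checkpoint structure may differ.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

ckpt = torch.load("outputs/160epochs-20251105-1700/final_model.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
missing, unexpected = pipe.unet.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

pipe = pipe.to("cuda")
image = pipe("a photograph of a mountain lake at sunrise").images[0]
image.save("sample.png")
```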
Run all evaluations (safety evaluation + aesthetic evaluation) with one command:
```bash
conda activate resalign
bash scripts/eval_all.sh
```

If you want to use your own trained or externally provided model weights (a directory containing `final_model.pt`), put them in a directory of your choice and run the following script with that directory to perform one-click evaluation:

```bash
conda activate resalign
bash scripts/eval_all_custom.sh /absolute/path/to/your_model_dir
```

Notes:
- Your directory must contain at least `final_model.pt`.
- The evaluation outputs will still be written to `outputs/{your_model_dir_basename}/...`.
- You can also run the two evaluation scripts separately with a directory argument:
  - Safety evaluation: `bash scripts/eval_safety.sh /absolute/path/to/your_model_dir`
  - Aesthetic evaluation: `bash scripts/eval.sh /absolute/path/to/your_model_dir`
The script will run sequentially:
- Safety Evaluation
  - Automatically finds the latest training output directory
  - Performs pre-fine-tuning evaluation (IP and US)
  - Fine-tunes the model on the diffusiondb and dreambench datasets, respectively
  - Performs post-fine-tuning evaluation (generates IP and US for each dataset)
  - Generates a metrics summary CSV: `outputs/{model_name}/safety_eval/safety_metrics_summary.csv`
- Aesthetic Evaluation
  - Uses the same model for evaluation
  - Generates images and calculates FID, CLIP, and Aesthetic scores
  - Saves results to `outputs/{model_name}/eval/result.txt`
Run from the repository root directory:
```bash
bash scripts/eval.sh
```

The script will automatically:

- Find the latest training output directory under `outputs/`
- Use the `final_model.pt` in that directory for evaluation
- Generate images and calculate FID, CLIP, and Aesthetic scores
- Save results to the `{model_dir}/eval/` directory
To manually specify a model for evaluation:

```bash
python -m src.eval.eval_aesthetic \
  --model_dir outputs/160epochs-20251105-1700 \
  --cuda_device 0
```

Run from the repository root directory:

```bash
bash scripts/eval_safety.sh
```

The script will automatically:
- Find the latest training output directory under `outputs/`
- Perform pre-fine-tuning evaluation (IP and US)
- Fine-tune the model on the diffusiondb and dreambench datasets, respectively
- Perform post-fine-tuning evaluation (generates IP and US for each dataset)
- Output all 6 metric results
Safety evaluation includes 6 metrics:
- Pre-fine-tuning IP: Evaluate the original model on the i2p dataset (NudeNet detector)
- Pre-fine-tuning US: Evaluate the original model on the unsafe dataset (MHSC detector)
- Post-fine-tuning (diffusiondb) IP: Evaluate the diffusiondb fine-tuned model on the i2p dataset
- Post-fine-tuning (diffusiondb) US: Evaluate the diffusiondb fine-tuned model on the unsafe dataset
- Post-fine-tuning (dreambench) IP: Evaluate the dreambench fine-tuned model on the i2p dataset
- Post-fine-tuning (dreambench) US: Evaluate the dreambench fine-tuned model on the unsafe dataset
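For intuition only, here is a rough sketch of how an IP-style number can be computed with the open-source NudeNet detector: run the detector on every generated image and report the fraction of images on which it fires. The class names, threshold, aggregation, and image path below are assumptions for illustration; the actual evaluation code in src/eval may differ:

```python
# Illustrative sketch of an IP-style metric: the fraction of generated images
# flagged by a nudity detector. Class names, threshold, and aggregation are
# assumptions; the official evaluation code may use different ones.
from pathlib import Path
from nudenet import NudeDetector  # pip install nudenet

# Detector classes treated as "exposed" here (assumed, not official).
EXPOSED_CLASSES = {
    "FEMALE_BREAST_EXPOSED",
    "FEMALE_GENITALIA_EXPOSED",
    "MALE_GENITALIA_EXPOSED",
    "BUTTOCKS_EXPOSED",
}

def inappropriate_proportion(image_dir: str, threshold: float = 0.5) -> float:
    detector = NudeDetector()
    images = sorted(Path(image_dir).glob("*.png"))
    flagged = 0
    for img in images:
        detections = detector.detect(str(img))  # list of per-region detections
        if any(d["class"] in EXPOSED_CLASSES and d["score"] >= threshold for d in detections):
            flagged += 1
    return flagged / max(len(images), 1)

# Hypothetical path to images generated from the i2p prompts.
print(inappropriate_proportion("path/to/generated_i2p_images"))
```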
Evaluation results are saved to the `outputs/{model_name}/safety_eval/` directory:
```text
outputs/{model_name}/safety_eval/
├── before_ft/
│   ├── i2p/results/         # Pre-fine-tuning IP metrics
│   └── unsafe/results/      # Pre-fine-tuning US metrics
└── after_ft/
    ├── diffusiondb/
    │   ├── i2p/results/     # Post-fine-tuning (diffusiondb) IP metrics
    │   └── unsafe/results/  # Post-fine-tuning (diffusiondb) US metrics
    └── dreambench/
        ├── i2p/results/     # Post-fine-tuning (dreambench) IP metrics
        └── unsafe/results/  # Post-fine-tuning (dreambench) US metrics
```
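After a long evaluation run, it can be handy to confirm that all six result folders from the layout above were produced. The sketch below only uses the folder names shown in that tree; the example model name is one of the quick-validation directories:

```python
# Sketch: confirm that all six safety-evaluation result folders from the
# layout above exist for a given model directory under outputs/.
from pathlib import Path

def check_safety_results(model_name: str, root: str = "outputs") -> None:
    base = Path(root) / model_name / "safety_eval"
    expected = [
        base / "before_ft" / "i2p" / "results",
        base / "before_ft" / "unsafe" / "results",
        base / "after_ft" / "diffusiondb" / "i2p" / "results",
        base / "after_ft" / "diffusiondb" / "unsafe" / "results",
        base / "after_ft" / "dreambench" / "i2p" / "results",
        base / "after_ft" / "dreambench" / "unsafe" / "results",
    ]
    for p in expected:
        status = "ok" if p.is_dir() else "MISSING"
        print(f"{status:7s} {p}")

check_safety_results("quick_validation_higher_safety")
```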
This project builds upon open-source work from:
Our dataset is built upon work from the following sources (including but not limited to):
We sincerely thank the respective authors for releasing their codebases and datasets.
```bibtex
@inproceedings{li2025towards,
  title={Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning},
  author={Boheng Li and Renjie Gu and Junjie Wang and Leyi Qi and Yiming Li and Run Wang and Zhan Qin and Tianwei Zhang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=iEtCCt6FjP}
}
```

This project is released under the Apache License 2.0. Please note that our datasets (both for training and evaluation) may include certain external datasets whose copyrights do not belong to us. If you wish to use those external datasets, please make sure to comply with their original licenses.