mograev/OptDif

Latent Space Optimization using Diffusion Models

Thesis Overview

This thesis investigates latent space optimization (LSO) using diffusion models, aiming to scale black-box optimization to larger generators and high-resolution images. We explore how to maximize a quantifiable attribute (e.g., smiling) while preserving realism and faithfulness. We compare three search spaces: (i) Stable Diffusion autoencoder latents, (ii) a compact learned latent (LatentVQVAE over SD latents), and (iii) LoRA space optimization (LoRASO), which optimizes low-dimensional adapter embeddings. Under matched settings, LoRASO achieves the strongest attribute gains and realism, highlighting the effectiveness of conditioning space optimization over traditional latent space methods.

Repository Description

Official code repository for the Master's thesis Latent Space Optimization using Diffusion Models, submitted to the Data and Web Science Group (Prof. Dr.-Ing. Margret Keuper) at the University of Mannheim.

Setup Guide

Environment

conda create -n optdif1 python=3.12
conda activate optdif1
pip install -r requirements.txt

Data Import

The FFHQ dataset is too large to be included in this repository. The images1024x1024 version of the FFHQ dataset can be downloaded from here. Copy the Google Drive folder data to the root directory of the workspace, and unzip the image archive. The directory structure should look like this:

├── data
│   └── ffhq
│       ├── images1024x1024
│       │   ├── 00000.png
│       │   ├── ...
│       │   ├── 69999.png
│       ├── ffhq-dataset-v2.json
│       ├── smile_scores.json
│       └── smile_scores_scaled.json
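With the data in place, a quick sanity check can confirm the layout matches the tree above. The helper below is a hypothetical sketch, not part of the repository, and it assumes `smile_scores.json` maps image identifiers to scalar scores:

```python
import json
from pathlib import Path

# Expected FFHQ layout under data/, mirroring the directory tree above.
EXPECTED = [
    "ffhq/images1024x1024",
    "ffhq/ffhq-dataset-v2.json",
    "ffhq/smile_scores.json",
    "ffhq/smile_scores_scaled.json",
]

def check_ffhq_layout(data_root: str) -> list[str]:
    """Return the expected paths that are missing under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]

def load_smile_scores(data_root: str) -> dict:
    """Load the per-image smile scores (assumed: image id -> score)."""
    with open(Path(data_root) / "ffhq" / "smile_scores.json") as f:
        return json.load(f)
```

Running `check_ffhq_layout("data")` after the import should return an empty list; any paths it reports indicate a misplaced folder or an archive that was not unzipped.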

Classifier Import

The CelebA classifier can be downloaded from here. Copy the Google Drive folder models to the root directory of the workspace.

Model Training

This repository provides scripts for training various autoencoder models and finetuning the Stable Diffusion VAE or other components. The model implementations can be found under src/models/, and their training scripts are located in src/run/. Training can be started by adapting and running the appropriate Slurm scripts under slurm/train/.

Optimization

The core of this thesis is optimization within autoencoder latent spaces or alternative embedding spaces. For optimization, we rely on direct gradient-based optimization (src/gbo/) or a Bayesian optimization framework (src/bo/). We consider three main approaches; each is implemented in src/lso_<approach>.py and launched via the corresponding Slurm script in slurm/lso/:

  1. Optimization in Stable Diffusion Latent Space: Direct optimization in a Stable Diffusion autoencoder latent space.
  2. Optimization in LatentVQVAE Latent Space: Optimization in a compact learned latent space, exemplified by a LatentVQVAE that further encodes Stable Diffusion (SD) latents.
  3. Optimization in LoRA Conditioning Space: This approach builds on the CTRLorALTer paper and exploits intermediate representations of a LoRAdapter for optimization.
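Independent of which search space is chosen, the gradient-based route amounts to ascending a differentiable attribute objective while penalizing drift from the starting point. The following toy sketch (plain Python, hand-derived gradients, hypothetical names; the actual implementation lives in src/gbo/) illustrates the idea on a linear attribute score with a quadratic faithfulness penalty:

```python
def optimize_latent(z0, attr_dir, steps=100, lr=0.1, lam=0.5):
    """Gradient ascent on f(z) = <attr_dir, z> - lam * ||z - z0||^2.

    The inner product rewards the target attribute (e.g. smiling),
    while the quadratic penalty keeps z close to the starting latent
    z0, preserving faithfulness to the original image.
    Gradients are analytic: df/dz = attr_dir - 2 * lam * (z - z0).
    """
    z = list(z0)
    for _ in range(steps):
        grad = [a - 2.0 * lam * (zi - z0i)
                for a, zi, z0i in zip(attr_dir, z, z0)]
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z
```

For this concave objective the optimum is z0 + attr_dir / (2 * lam), so the trade-off parameter `lam` directly controls how far the latent may move along the attribute direction. In the thesis setting the linear score is replaced by a classifier (e.g. the CelebA smile classifier above) and z by an SD latent, a LatentVQVAE code, or LoRA adapter embeddings.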

Credits
