
🎭 CelebA Conditional Diffusion Model

=====================================================

🚀 A PyTorch implementation of a DDPM-based conditional diffusion model for controllable face generation on the CelebA dataset (64×64 resolution).

Developed for the Generative AI (MSc in Computer Engineering) course at the University of Salerno.

Demonstrates practical experience in diffusion models, conditional generative modeling, UNet architectures, attention mechanisms, EMA stabilization, and end-to-end training pipelines.


📌 Overview

This project implements a conditional Denoising Diffusion Probabilistic Model (DDPM) capable of generating face images conditioned on three semantic attributes:

  • 👨 Male / Female
  • 😊 Smiling / Not Smiling
  • 👶 Young / Not Young

The model learns to generate realistic 64×64 face images from pure Gaussian noise, guided by attribute conditioning.


🧠 Model Architecture

The generative backbone is a Conditional U-Net trained to predict the noise εₜ.

🔹 Core Components

  • Sinusoidal Time Embedding
  • Learnable Condition Embedding
  • FiLM modulation inside residual blocks
  • Multi-scale Self-Attention (16×16 and 8×8)
  • Linear beta schedule (1000 diffusion steps)
  • EMA (Exponential Moving Average) stabilization
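
The sinusoidal time embedding listed above can be sketched as follows; the function name and embedding dimension are illustrative, not the exact code in architecture.py:

```python
import math
import torch

def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Map integer timesteps t (shape [B]) to [B, dim] sinusoidal embeddings,
    using geometrically spaced frequencies as in the Transformer encoding."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]  # [B, half]
    # First half holds sines, second half cosines.
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```

The embedding is typically passed through a small MLP before conditioning the residual blocks.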

🔄 Diffusion Process

Forward Process:

x₀ → xₜ (progressive noise injection)

Reverse Process (learned):

xₜ → xₜ₋₁ → ... → x₀

The model predicts noise at each timestep and reconstructs the clean image via iterative denoising.
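
Under the linear beta schedule, the forward process has a closed form that lets xₜ be sampled directly from x₀; a minimal sketch (variable names are illustrative):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear beta schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # abar_t

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
```

Training only ever needs this one-shot jump to a random timestep, never a step-by-step forward simulation.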


πŸ— Architecture Details

Input Resolution: 64×64
Base Channels: 128
Timesteps: 1000
Optimizer: AdamW
Loss: MSE (noise prediction objective)

The U-Net includes:

  • Downsampling: 64 → 32 → 16 → 8
  • Bottleneck at 8×8
  • Symmetric decoder with skip connections
  • Attention layers at 16×16 and 8×8 resolutions
  • FiLM-based conditioning (time + attributes)
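
FiLM-based conditioning can be sketched as a residual block whose normalized activations are scaled and shifted by parameters predicted from the time/attribute embedding. The class below is a minimal illustration, not the exact block in architecture.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FiLMResBlock(nn.Module):
    """Residual block with FiLM conditioning: the embedding predicts a
    per-channel scale (gamma) and shift (beta) for the normalized features."""
    def __init__(self, channels: int, emb_dim: int):
        super().__init__()
        self.norm = nn.GroupNorm(8, channels)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.film = nn.Linear(emb_dim, 2 * channels)  # -> (gamma, beta)

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.film(emb).chunk(2, dim=-1)
        h = self.norm(x)
        # (1 + gamma) keeps the block near identity at initialization.
        h = (1 + gamma[:, :, None, None]) * h + beta[:, :, None, None]
        return x + self.conv(F.silu(h))
```

In practice the same embedding carries both the time and the attribute information, so one FiLM layer conditions on both at once.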

📊 Conditioning Strategy

Each sample is conditioned on a 3-dimensional binary vector:

[Male, Smiling, Young]

Example:

[1, 1, 0] → Male, Smiling, Not Young

All 8 possible attribute combinations are supported during sampling.
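
Enumerating all 8 combinations and passing them through a learnable condition embedding might look like this; the linear projection and its size are assumptions about how the embedding is realized:

```python
import itertools
import torch
import torch.nn as nn

# All 8 binary combinations in [Male, Smiling, Young] order.
combos = torch.tensor(list(itertools.product([0, 1], repeat=3)),
                      dtype=torch.float32)

# One possible learnable condition embedding (dimension 128 is an assumption):
cond_embed = nn.Linear(3, 128)
emb = cond_embed(combos)  # shape [8, 128]
```

The embedding output can then be summed with (or concatenated to) the time embedding before entering the FiLM layers.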


📂 Repository Structure

📦 celeba-conditional-diffusion
├── 📁 scripts
│   ├── architecture.py      # Conditional UNet (FiLM + Attention)
│   ├── training_lite.py     # DDPM scheduler + training loop + EMA
│   └── inference.py         # Conditional sampling script
│
├── 📁 weights
│   └── latest.pt            # Latest trained checkpoint
│
├── .gitattributes
├── LICENSE
└── README.md

🧪 Training Pipeline

1️⃣ Dataset

Dataset used: CelebA

Attributes extracted (0-based indices into the 40-entry CelebA attribute list):

  • #20 → Male
  • #31 → Smiling
  • #39 → Young

Images are:

  • Resized to 64×64
  • Center cropped
  • Normalized to [-1, 1]

2️⃣ Training

Run:

python training_lite.py

Features:

  • Random timestep sampling
  • Forward diffusion noise injection
  • Noise prediction objective
  • Gradient clipping
  • EMA model tracking
  • Periodic sample generation
  • Automatic checkpoint saving
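
A single training step combining these features might look like the following sketch; the model signature model(x_t, t, cond), the clip norm, and the EMA decay are assumptions, not values read from training_lite.py:

```python
import torch
import torch.nn.functional as F

def training_step(model, ema_model, opt, x0, cond, alphas_cumprod,
                  ema_decay=0.999):
    """One DDPM step: random t, forward-noise x0, predict the noise,
    MSE loss, clipped gradients, then an EMA weight update."""
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

    loss = F.mse_loss(model(x_t, t, cond), noise)  # noise prediction objective
    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()

    with torch.no_grad():  # EMA tracking of the online weights
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)
    return loss.item()
```

Samples and checkpoints are then produced from the EMA weights rather than the raw online model.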

Checkpoints saved in:

weights/latest.pt

3️⃣ Inference

Edit the conditioning vector inside:

inference.py

Then run:

python inference.py

The script:

  • Loads the trained checkpoint
  • Restores the scheduler
  • Generates N samples with the same conditioning
  • Saves a grid image in /generated
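
The reverse (ancestral) DDPM sampling loop behind inference.py can be sketched as follows; function and argument names are illustrative:

```python
import torch

@torch.no_grad()
def ddpm_sample(model, cond, betas, shape=(8, 3, 64, 64), device="cpu"):
    """Start from Gaussian noise and apply the learned reverse step
    x_t -> x_{t-1} for t = T-1 .. 0, re-noising at every step but the last."""
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for t in reversed(range(betas.shape[0])):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch, cond)
        # Posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / (1.0 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```

All samples in one call share the same conditioning vector, which is what produces the attribute-consistent grids.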

🔬 Technologies Used

  • PyTorch
  • Torchvision
  • CelebA Dataset
  • UNet Architecture
  • Self-Attention
  • DDPM Scheduler (custom implementation)
  • EMA (Stabilized Training)
  • AdamW Optimizer

🎓 Academic Context

Developed as a Project Work for:

Generative AI – MSc in Computer Engineering
University of Salerno
Academic Year 2025/2026

The original assignment required multiple generative approaches; this repository contains the Diffusion Model implementation.

The other two implementations can be found here.


💡 Key Challenges Addressed

  • Stable diffusion training from scratch
  • Conditioning injection inside UNet blocks
  • Attention integration at low resolutions
  • Reverse diffusion numerical stability
  • EMA-based sampling stabilization
  • Memory-efficient training at 64×64

⭐ Final Note

This project highlights:

  • Deep understanding of diffusion models
  • Conditional generative modeling
  • Architectural design of UNet with attention
  • Training stabilization techniques
  • Full generative pipeline implementation from scratch

If you find it interesting, feel free to ⭐ the repository.


📈 SEO Tags

Diffusion Model, DDPM, Conditional Diffusion, CelebA Diffusion, Face Generation AI, Conditional UNet, Generative AI MSc Project, PyTorch Diffusion Implementation, Noise Prediction Model, Attribute Conditioned Generation, Denoising Diffusion Probabilistic Model, Self Attention UNet, EMA Diffusion Training, Computer Vision Generative Models

📄 License

This project is licensed under the MIT License.

Use it, build on it, experiment with it; just don't blame the diffusion process if it generates something unexpected 😄
