=====================================================
A PyTorch implementation of a DDPM-based conditional diffusion model for controllable face generation on the CelebA dataset (64×64 resolution).
Developed for the Generative AI (MSc in Computer Engineering) course at the University of Salerno.
Demonstrates practical experience in diffusion models, conditional generative modeling, UNet architectures, attention mechanisms, EMA stabilization and end-to-end training pipelines.
This project implements a conditional Denoising Diffusion Probabilistic Model (DDPM) capable of generating face images conditioned on three semantic attributes:
- Male / Female
- Smiling / Not Smiling
- Young / Not Young
The model learns to generate realistic 64×64 face images from pure Gaussian noise, guided by attribute conditioning.
The generative backbone is a Conditional U-Net trained to predict the noise ε_θ.
- Sinusoidal Time Embedding
- Learnable Condition Embedding
- FiLM modulation inside residual blocks
- Multi-scale Self-Attention (16×16 and 8×8)
- Linear beta schedule (1000 diffusion steps)
- EMA (Exponential Moving Average) stabilization
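As a concrete reference for the first component, here is a minimal sketch of a Transformer-style sinusoidal time embedding; the function name and dimension are illustrative, and the actual implementation in architecture.py may differ:

```python
import math
import torch

def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Map integer timesteps t (shape [B]) to [B, dim] embeddings using
    sin/cos waves at geometrically spaced frequencies (dim assumed even)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                     # [B, half]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)   # [B, dim]
```

The embedding is then typically passed through a small MLP before being injected into the residual blocks.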
Forward Process:
x_0 → x_T (progressive noise injection)
Reverse Process (learned):
x_T → x_(T-1) → ... → x_0
The model predicts noise at each timestep and reconstructs the clean image via iterative denoising.
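The forward process has a closed form, so x_t can be sampled directly from x_0 without iterating. A minimal sketch with the linear beta schedule (the 1e-4 to 0.02 endpoints are the standard DDPM choice; the repo's values may differ):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of alphas

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * noise
```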
Input Resolution: 64×64
Base Channels: 128
Timesteps: 1000
Optimizer: AdamW
Loss: MSE (noise prediction objective)
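The noise-prediction objective combines these pieces in a few lines. A hedged sketch of one training step's loss (`ddpm_loss` is an illustrative name, not necessarily the function in training_lite.py):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0: torch.Tensor, cond, alphas_bar: torch.Tensor) -> torch.Tensor:
    """Sample a random timestep per image, noise the image in closed form,
    and regress the injected noise with MSE."""
    b = x0.size(0)
    t = torch.randint(0, alphas_bar.numel(), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    x_t = a * x0 + s * noise
    return F.mse_loss(model(x_t, t, cond), noise)
```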
The U-Net includes:
- Downsampling: 64 → 32 → 16 → 8
- Bottleneck at 8×8
- Symmetric decoder with skip connections
- Attention layers at 16×16 and 8×8 resolutions
- FiLM-based conditioning (time + attributes)
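FiLM conditioning means each residual block predicts a per-channel scale and shift from the combined time + attribute embedding. A minimal sketch (block structure and names are illustrative, not the exact layers in architecture.py):

```python
import torch
import torch.nn as nn

class FiLMResBlock(nn.Module):
    """Residual block whose normalized features are modulated by a
    (gamma, beta) pair predicted from the conditioning embedding."""
    def __init__(self, channels: int, emb_dim: int):
        super().__init__()
        self.norm = nn.GroupNorm(8, channels)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_film = nn.Linear(emb_dim, 2 * channels)   # predicts gamma and beta

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_film(emb).chunk(2, dim=-1)
        h = self.norm(x) * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]
        return x + self.conv(torch.relu(h))
```

The `1 + gamma` form keeps the modulation close to identity at initialization, a common stabilization trick.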
Each sample is conditioned on a 3-dimensional binary vector:
[Male, Smiling, Young]
Example:
[1, 1, 0] → Male, Smiling, Not Young
All 8 possible attribute combinations are supported during sampling.
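Enumerating the 8 combinations for a sampling sweep is straightforward; a small helper sketch (names are illustrative):

```python
import itertools
import torch

# All 2^3 = 8 conditioning vectors in [Male, Smiling, Young] order.
combos = torch.tensor(list(itertools.product([0, 1], repeat=3)), dtype=torch.float32)

def describe(vec: torch.Tensor) -> str:
    """Human-readable label for a single [Male, Smiling, Young] vector."""
    m, s, y = vec.int().tolist()
    return ", ".join([
        "Male" if m else "Female",
        "Smiling" if s else "Not Smiling",
        "Young" if y else "Not Young",
    ])
```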
celeba-conditional-diffusion
├── scripts
│   ├── architecture.py     # Conditional UNet (FiLM + Attention)
│   ├── training_lite.py    # DDPM scheduler + training loop + EMA
│   └── inference.py        # Conditional sampling script
├── weights
│   └── latest.pt           # Latest trained checkpoint
├── .gitattributes
├── LICENSE
└── README.md
Dataset used: CelebA
Attributes extracted:
- #20 → Male
- #31 → Smiling
- #39 → Young
Images are:
- Resized to 64×64
- Center cropped
- Normalized to [-1, 1]
Run:
python training_lite.py
Features:
- Random timestep sampling
- Forward diffusion noise injection
- Noise prediction objective
- Gradient clipping
- EMA model tracking
- Periodic sample generation
- Automatic checkpoint saving
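Of these, EMA tracking is the least standard PyTorch idiom, so here is a minimal sketch of how a shadow copy of the weights can be maintained (class name and decay value are illustrative; training_lite.py may implement it differently):

```python
import copy
import torch

class EMA:
    """Keep an exponential moving average of model weights; sampling from
    the EMA copy is typically smoother than from the raw training weights."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> None:
        # shadow <- decay * shadow + (1 - decay) * current weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

After each optimizer step, call `ema.update(model)`; at sampling time, use `ema.shadow` instead of `model`.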
Checkpoints saved in:
weights/latest.pt
Edit the conditioning vector inside:
inference.py
Then run:
python inference.py
The script:
- Loads the trained checkpoint
- Restores the scheduler
- Generates N samples with the same conditioning
- Saves a grid image in
/generated
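The core of the sampling loop is the reverse DDPM update. A hedged sketch of a single step, using the posterior variance choice σ_t² = β_t (one of the two standard DDPM options; the repo's scheduler may use the other):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def p_sample(model, x_t: torch.Tensor, t: int, cond) -> torch.Tensor:
    """One reverse step x_t -> x_{t-1}: subtract the predicted noise
    contribution, rescale, and (for t > 0) add fresh Gaussian noise."""
    t_batch = torch.full((x_t.size(0),), t, dtype=torch.long)
    eps = model(x_t, t_batch, cond)
    coef = betas[t] / (1.0 - alphas_bar[t]).sqrt()
    mean = (x_t - coef * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                     # final step is deterministic
    return mean + betas[t].sqrt() * torch.randn_like(x_t)
```

Sampling then iterates `for t in reversed(range(T)): x = p_sample(model, x, t, cond)` starting from pure noise.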
- PyTorch
- Torchvision
- CelebA Dataset
- UNet Architecture
- Self-Attention
- DDPM Scheduler (custom implementation)
- EMA (Stabilized Training)
- AdamW Optimizer
Developed as a Project Work for:
Generative AI, MSc in Computer Engineering
University of Salerno
Academic Year 2025/2026
The original assignment required multiple generative approaches; this repository contains the Diffusion Model implementation.
The other two implementations can be found here.
- Stable training of a diffusion model from scratch
- Conditioning injection inside UNet blocks
- Attention integration at low resolutions
- Reverse diffusion numerical stability
- EMA-based sampling stabilization
- Memory-efficient training at 64×64
This project highlights:
- Deep understanding of diffusion models
- Conditional generative modeling
- Architectural design of UNet with attention
- Training stabilization techniques
- Full generative pipeline implementation from scratch
If you find it interesting, feel free to star the repository.
Diffusion Model, DDPM, Conditional Diffusion, CelebA Diffusion, Face Generation AI, Conditional UNet, Generative AI MSc Project, PyTorch Diffusion Implementation, Noise Prediction Model, Attribute Conditioned Generation, Denoising Diffusion Probabilistic Model, Self Attention UNet, EMA Diffusion Training, Computer Vision Generative Models
This project is licensed under the MIT License.
Use it, build on it, experiment with it; just don't blame the diffusion process if it generates something unexpected.