A casual implementation of diffusion, the generative image AI method. This code implements a U-Net that is trained to predict the noise in an image. At inference time, the model's noise predictions can be incrementally subtracted from an image of pure noise, slowly revealing the "signal" underneath. The result is a completely new signal: a new flower/face image. (A minimal sketch of this idea follows the feature list below.)
- all in basic pytorch
- single- and multi-GPU training with pytorch's distributed data parallel (DDP) (TODO: use Lightning)
- image generation
- this has not been fine-tuned or hyperparameter-optimized; it is a quick and dirty implementation for teaching/learning.
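To make the idea concrete, here is a minimal sketch of the noise-prediction training objective and the reverse sampling loop in plain PyTorch. The linear beta schedule, the `model(x_t, t)` signature, and all names below are illustrative assumptions, not necessarily what this repo uses.

```python
import torch
import torch.nn.functional as F

# Illustrative DDPM-style schedule (assumed; the repo's schedule may differ).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)


def training_loss(model, x0):
    """One training step: corrupt clean images x0 with noise at a random
    timestep t, then ask the model to predict that noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alpha_bars.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return F.mse_loss(model(x_t, t), noise)


@torch.no_grad()
def sample(model, shape=(1, 3, 64, 64), device="cpu"):
    """Start from pure noise and iteratively subtract the predicted noise."""
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)                       # predicted noise in x_t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean                                  # final step: no extra noise
    return x
```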
- install PyTorch as appropriate for your system
- install other modules
python -m pip install numpy matplotlib tensorboard pytest
- optionally run the (very basic) unit tests
pytest
Currently, two datasets are set up; see src/dataset/README.md for details on data download and preparation:
- Oxford Flowers: 64 x 64 pixel images
- CelebA: 109 x 89 pixel images
- Install uv from Astral:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv
uv pip install -e .
uv run pytest
Once the data is ready, the /scripts/ folder has all the entry points to train the model and generate images:
- `folders.py` contains model checkpoint and logging folder names for scripts to import
- `run_trainer.py` (RECOMMENDED) trains the U-Net on a single GPU
- `run_trainer_ddp.py` trains the U-Net on multiple GPUs via DDP (see the sketch below)
- `run_generator.py` restores a model checkpoint, generates new images on the CPU and saves them as a PNG
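For reference, the multi-GPU path follows the standard DDP pattern in plain PyTorch. The sketch below is a generic illustration of that pattern, not the actual contents of `run_trainer_ddp.py`; a tiny conv layer stands in for the U-Net, the loss is a placeholder, and launching via torchrun is an assumption.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic DDP training skeleton (illustrative only).
# Assumed launch: torchrun --nproc_per_node=2 this_script.py
# torchrun sets the RANK / LOCAL_RANK / WORLD_SIZE environment variables.

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A tiny conv layer stands in for the repo's U-Net here.
    model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1).to(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    # A real trainer would pull batches from a DataLoader with a
    # DistributedSampler; random batches keep this sketch self-contained.
    for _ in range(10):
        x = torch.randn(8, 3, 64, 64, device=local_rank)
        loss = model(x).pow(2).mean()   # placeholder loss, not the diffusion loss
        opt.zero_grad()
        loss.backward()                 # DDP synchronises gradients across GPUs
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Check `run_trainer_ddp.py` itself for how the repo's script is actually structured and launched.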
All experiments were run on a PC with a Ryzen 9 5950X, 64 GB RAM and 2 x Nvidia RTX 3090 GPUs; recognisable images can be generated even within ~1 hour of training.
Image generation only takes a few seconds (yes, even on CPU).
