Final-Year-University-Project

Class-conditioned Audio Diffusion from scratch

Prerequisites:

Download the GTZAN dataset from https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification and place it in the "GTZAN_Genre_Collection" folder. The dataset is required to train the model.
Download the PANNs model from https://github.com/qiuqiangkong/audioset_tagging_cnn and place it in the "PANNs" folder. PANNs provides the embeddings for the FAD (Fréchet Audio Distance) score, which is used to evaluate the model. The specific checkpoint "Cnn14_mAP=0.431.pth" can be downloaded from https://zenodo.org/records/3987831. This checkpoint is used because it matches the resolution of the 256x256 samples generated by the models.
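
Before training or evaluating, it can help to confirm that both downloads ended up where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name missing_prerequisites is hypothetical, and it assumes the checkpoint file sits directly inside the "PANNs" folder and that the folders are located relative to the directory you pass in.

```python
from pathlib import Path

# Folder and file names are taken from the prerequisites above.
# Assumption: the checkpoint is stored directly inside the "PANNs" folder.
DATASET_DIR = "GTZAN_Genre_Collection"
PANNS_DIR = "PANNs"
PANNS_CHECKPOINT = "Cnn14_mAP=0.431.pth"


def missing_prerequisites(root="."):
    """Return a list of problems found under root; an empty list means all good."""
    root = Path(root)
    problems = []

    # The dataset folder must exist and contain at least one entry.
    dataset = root / DATASET_DIR
    if not dataset.is_dir():
        problems.append(f"missing folder: {dataset}")
    elif not any(dataset.iterdir()):
        problems.append(f"folder is empty: {dataset}")

    # The PANNs checkpoint must exist as a regular file.
    checkpoint = root / PANNS_DIR / PANNS_CHECKPOINT
    if not checkpoint.is_file():
        problems.append(f"missing file: {checkpoint}")

    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print(issue)
    if not issues:
        print("All prerequisites found.")
```

Run it from the repository root. If your layout differs (for example, the checkpoint lives in a subfolder), adjust the constants at the top accordingly.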

