To put into practice what we learned in the university course Deep Learning for AI, my colleague Fabrizio Niro (@fabrizio-n) and I decided to implement a neural network that recognizes the genre of a song given to it as input.
The dataset we used is the well-known GTZAN dataset. It contains 1000 audio files, each 30 seconds long.
The most common way to apply deep learning to audio data is to run convolutional architectures on spectrograms of the audio. In this project, however, we model the data as time-dependent sequences, so we use a Recurrent Neural Network (RNN), specifically a Gated Recurrent Unit (GRU).
Note: the neural network is implemented in PyTorch.
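To give a rough idea of the sequence-based approach, here is a minimal sketch of a GRU classifier in PyTorch. It is not our actual model: the class name `GenreGRU`, the hidden size, the 13 features per frame, and the 130 time steps are all assumptions chosen for illustration. The only number taken from the dataset is the 10 output classes, since GTZAN has 10 genre labels.

```python
import torch
from torch import nn


class GenreGRU(nn.Module):
    """Hypothetical GRU classifier over sequences of per-frame audio features."""

    def __init__(self, n_features=13, hidden_size=64, num_layers=1, n_genres=10):
        super().__init__()
        # batch_first=True means inputs have shape (batch, time, features)
        self.gru = nn.GRU(n_features, hidden_size, num_layers=num_layers, batch_first=True)
        # Linear layer mapping the final hidden state to one score per genre
        self.head = nn.Linear(hidden_size, n_genres)

    def forward(self, x):
        # h_n has shape (num_layers, batch, hidden_size)
        _, h_n = self.gru(x)
        # Classify from the final hidden state of the last GRU layer
        return self.head(h_n[-1])


model = GenreGRU()
# Random stand-in for 4 clips, each 130 frames of 13 features (assumed sizes)
dummy_batch = torch.randn(4, 130, 13)
logits = model(dummy_batch)        # shape: (4, 10), one score per genre
predicted = logits.argmax(dim=1)   # shape: (4,), index of the top-scoring genre
```

The sketch classifies from the last hidden state only, which is one common choice for sequence classification; pooling over all time steps would be another.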
Below are the slides we used to present our work and the results we achieved.









