This repository is about comparing music to languages such as Latin and then trying to develop a music generator using NLP techniques. Based on this project I created a presentation that was given during GHOST Day: Applied Machine Learning Conference
Below you can find a quick summary of what has been done. You can look through the slides or dig into the code for more details.
I really wanted to compare different genres and artists with one another, but unfortunately I couldn't find any good dataset of MIDI songs with such a classification. So I decided to create one myself. The ideal structure for me was:
```
└── genres/
    ├── rock/
    │   ├── artist_1/
    │   │   ├── song_1.mid
    │   │   └── song_2.mid
    │   └── artist_2/
    │       └── song_1.mid
    ├── jazz/
    │   └── artist_1/
    ├── blues/
    └── pop/
```
To build it I used web scraping to traverse pages with MIDI files and download songs from here. One problem was that some artists were labeled with many genres, and some of the labels I didn't like much. As a result I filtered out a lot of files and eventually ended up with around 8 thousand songs. In the scryper folder you can see the scripts I used.
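The idea behind the scraping scripts is simple; here is a minimal sketch of the approach, assuming a requests + BeautifulSoup setup (the URL and page structure below are placeholders, not the real site — the actual scripts are in the scryper folder):

```python
# Minimal web-scraping sketch: grab every .mid link from an artist page
# and save it into the genres/<genre>/<artist>/ layout shown above.
# BASE_URL and the page structure are hypothetical placeholders.
import os
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/midi"  # placeholder, not the real source site

def download_artist_midis(genre: str, artist: str, artist_url: str) -> None:
    """Download every .mid link on an artist page into genres/<genre>/<artist>/."""
    target_dir = os.path.join("genres", genre, artist)
    os.makedirs(target_dir, exist_ok=True)

    page = requests.get(artist_url, timeout=30)
    soup = BeautifulSoup(page.text, "html.parser")

    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.endswith(".mid"):
            file_url = requests.compat.urljoin(artist_url, href)
            midi = requests.get(file_url, timeout=30)
            with open(os.path.join(target_dir, os.path.basename(href)), "wb") as f:
                f.write(midi.content)
```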
If you are interested in getting the dataset, please contact me. It is not that straightforward: a lot of the songs are copyrighted, and under the law, when we are speaking about generative models, the dataset can only be used for scientific purposes. Concerning discriminative models, you can use the dataset for whatever you want. But I can't make it publicly available. Click here for more information about the law on using copyrighted datasets
Since I wanted to compare music with languages, I had to create a common representation for them both. I decided to represent music as text. You can find the implementations of the converters inside midiToTxt.
- In the first representation I simply mapped every possible piano key to one ASCII character and represented each timestep as a vector of characters (notes) to be played.
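To make the mapping concrete, here is a rough sketch of that first representation (my own minimal reconstruction, not the exact code from midiToTxt; the character offset is an assumption):

```python
# Sketch of the key-to-character mapping: each of the 88 piano keys gets one
# printable ASCII character, and a timestep becomes the string of sounding keys.
PIANO_LOW, PIANO_HIGH = 21, 108   # MIDI pitches of A0..C8 (the 88 piano keys)
OFFSET = 33                       # '!' — first printable ASCII char (assumed choice)

def pitch_to_char(pitch: int) -> str:
    """Map a MIDI pitch (21..108) to a single printable ASCII character."""
    assert PIANO_LOW <= pitch <= PIANO_HIGH
    return chr(pitch - PIANO_LOW + OFFSET)

def timestep_to_text(active_pitches: list[int]) -> str:
    """One timestep -> concatenation of the characters of all sounding notes."""
    return "".join(pitch_to_char(p) for p in sorted(active_pitches))

# Example: a C major triad (C4, E4, G4) held during one timestep
print(timestep_to_text([60, 64, 67]))  # -> "HLO"
```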
Now that we have the data represented in the way we wanted, we can start our experiments. In the first one I compared how conditional entropy behaves for music versus the Latin language. In a nutshell, conditional entropy H(X|Y) describes our uncertainty about event X given that we know what happened in event Y. In our case Y is the previous word we know was spoken, based on which we predict X, the next word to be spoken. You can find more about conditional entropy in the slides. In general we expect the entropy to decrease as more words are known. The results I present here all show that this musical text behaves very similarly to Latin, and Latin behaves very similarly to all other languages. You can find the conditional entropy implementation in entropy
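For reference, here is a minimal sketch of estimating H(X|Y) from a token sequence, where Y is the previous token and X the next one (the real implementation is in the entropy folder; this is just the textbook estimator):

```python
# Conditional entropy from bigram counts:
#   H(X|Y) = -sum over (x, y) of p(x, y) * log2 p(x | y)
import math
from collections import Counter

def conditional_entropy(tokens: list[str]) -> float:
    pair_counts = Counter(zip(tokens, tokens[1:]))  # counts of (y, x) bigrams
    prev_counts = Counter(tokens[:-1])              # counts of the context y
    total_pairs = sum(pair_counts.values())

    h = 0.0
    for (y, x), c in pair_counts.items():
        p_xy = c / total_pairs           # joint probability p(x, y)
        p_x_given_y = c / prev_counts[y] # conditional probability p(x | y)
        h -= p_xy * math.log2(p_x_given_y)
    return h

print(conditional_entropy("a b a b a c".split()))  # ~0.55 bits
```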
After seeing that this musical text behaves like a language, I was more confident that typical NLP methods would work. You can find all the code for this part in notebooks. First, 3 baselines were trained (all based on LSTMs):
- The first on the uncompressed representation, just using multilabel classification at each step (see the sketch after this list). Baseline 1 sample
- The second on the lossless compression representation. At each step, multilabel classification + Poisson regression. Baseline 2 sample
- The third on the event-based representation. A dictionary of words was created, and then at each step we perform typical classification. Baseline 3 sample
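Here is a minimal PyTorch sketch of the first baseline (layer sizes and names are my assumptions for illustration; the actual models live in the notebooks): an LSTM reads the piano-roll timesteps and a sigmoid head performs multilabel classification over the 88 keys at every step.

```python
import torch
import torch.nn as nn

N_KEYS = 88  # one binary output per piano key

class Baseline1(nn.Module):
    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(N_KEYS, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, N_KEYS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, N_KEYS) binary piano roll
        out, _ = self.lstm(x)
        return self.head(out)  # per-key logits at every timestep

model = Baseline1()
criterion = nn.BCEWithLogitsLoss()  # multilabel: independent sigmoid per key

x = torch.randint(0, 2, (4, 32, N_KEYS)).float()  # dummy batch
target = torch.roll(x, shifts=-1, dims=1)         # train to predict the next timestep
loss = criterion(model(x), target)
loss.backward()
```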
I got the most promising results from the third representation, which is why I used it to train a GPT transformer, hoping for the best results. Transformer sample
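A hedged sketch of what training a small GPT on the event-based word tokens can look like, using the Hugging Face transformers library (the vocabulary size and all hyperparameters below are illustrative assumptions, not the values actually used):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

VOCAB_SIZE = 5000  # size of the event-word dictionary (assumed)

config = GPT2Config(
    vocab_size=VOCAB_SIZE,
    n_positions=512,  # context length
    n_embd=256,
    n_layer=4,
    n_head=4,
)
model = GPT2LMHeadModel(config)

# Language-model training step: the model shifts the labels internally.
input_ids = torch.randint(0, VOCAB_SIZE, (2, 128))  # dummy token batch
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()

# Generation: sample a continuation of a musical prompt.
prompt = torch.randint(0, VOCAB_SIZE, (1, 16))
sample = model.generate(prompt, max_length=128, do_sample=True, top_k=50)
```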