
Notes

For Windows:

http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

Get the scipy whl and the numpy blas whl from the page above.

Spectrograms

TODO:

NOW

make init script nicer for someone else to use
tests for learning.py - very important

LATER

produce a graph at the end for the test data
Dockerise so others can easily develop (at the moment there is no real way of testing scientific Python packages on Travis unless Docker is used)
Turn TODO list into GitHub issues

Would be Cool

make the fpython stuff nicer to use (it currently lives in the bin folder of the env, and setup needs to add it to that folder)
think of ways of making the experiments even more lightweight

The Non-vocal baseline:

Two methods: spectrograms and raw waveform.

Papers and misc. Information

http://cs231n.stanford.edu/reports/Cs_231n_paper.pdf
https://papers.nips.cc/paper/3674-unsupervised-feature-learning-for-audio-classification-using-convolutional-deep-belief-networks.pdf
- 20 ms window size, 10 ms overlap
- PCA whitening (80 components)
http://research.microsoft.com/pubs/230136/IS140441.PDF
http://benanne.github.io/2014/08/05/spotify-cnns.html
- used mel scale for frequency and logarithmic amplitude scaling
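
A minimal sketch of how the settings noted above (20 ms windows with 10 ms overlap, mel frequency scale, logarithmic amplitude) could fit together, using only NumPy. The function names, the Hann window, the 40 mel bands and the HTK-style mel formula are assumptions made for illustration; they are not taken from the papers or from this repo.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(signal, sample_rate, window_ms=20.0, overlap_ms=10.0, n_mels=40):
    """Log-amplitude mel spectrogram, shape (n_frames, n_mels)."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = win - int(round(sample_rate * overlap_ms / 1000.0))
    n_frames = 1 + (len(signal) - win) // hop
    # Index matrix: one row of sample indices per overlapping frame.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(win)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(n_mels, win, sample_rate).T
    # Small offset avoids log(0) on silent frames.
    return np.log(mel_power + 1e-10)
```

For example, one second of audio at 16 kHz gives 320-sample windows with a 160-sample hop, so `log_mel_spectrogram(x, 16000)` returns one row per 10 ms step.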

https://www.youtube.com/watch?v=qv6UVOQ0F44
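
The PCA whitening step noted above (80 components) could be sketched roughly as follows, again with NumPy only. The function names and the small `eps` regulariser are hypothetical choices for illustration; this is not the exact procedure from the paper.

```python
import numpy as np


def fit_pca_whitening(features, n_components=80, eps=1e-5):
    """Fit PCA whitening on features of shape (n_samples, n_dims).

    Returns (mean, projection) such that (x - mean) @ projection has
    roughly unit variance and uncorrelated components.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    # Scale each kept eigenvector by 1 / sqrt(eigenvalue) to whiten.
    projection = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, projection


def apply_pca_whitening(features, mean, projection):
    return (np.asarray(features, dtype=float) - mean) @ projection
```

The whitening would normally be fitted on training frames only and then applied unchanged to the test frames.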

Datasets That Look Useful:

RML emotion database: promising, although maybe a bit small; could go well with another corpus.
Montreal Affective Voices corpus: emotional vocalizations (laughing, crying, etc.)
Italian affected voices corpus
Toronto Emotional Speech Set
Berlin Emotional Speech Database
RAVDESS: Currently in use
Surrey Audio-Visual Expressed Emotion (SAVEE) Database
Vera am Mittag German Audio-Visual Spontaneous Speech Database.
Estonian Emotional Speech Corpus http://peeter.eki.ee:5000/

I looked at the ones below, but they were not useful:

Buckeye corpus: long conversations in American English, not much metadata.
VoxForge: lots of recordings, but nothing about the speakers' emotions/stress levels, so it is unlikely to have much range.
Santa Barbara Corpus of Spoken American English: detailed but with no stress/emotion.
Spoken Language Corpora at the Research Center on Multilingualism: same as the Santa Barbara corpus.
The Spoken Turkish Corpus at METU Ankara: also an ordinary corpus.
Spoken Corpus Klient with the Corp-Oral Corpus at ILTEC Lisbon
OLAC: Open Language Archives Community
BAS Bavarian Archive for Speech Signals
Simmortel Speech Recognition Corpus for Indian English and Hindi
ELRA: the European Language Resources Association
The PELCRA Conversational Corpus of Polish