Sarcastically is a machine learning algorithm for sarcasm detection that combines a text model and an audio model, developed using TensorFlow and Keras on top of the MUStARD dataset. It was built for the Bachelor of Computer Science degree at the University of Westminster.
The project's dependencies are defined in the environment.yml file in the root of the project.
Sarcastically was built to improve sarcasm detection by using better audio models focused on the natural human voice. These models provide richer features that can improve detection accuracy, and combining them with the text model supports the overall goal of the project.
You will need to download the following files:
- The WikiWord Vectors Dataset from FastText
- The MUStARD Dataset Files
After you download the above, do the following:
- Make sure that the audio files are stored in a folder named `mmsd_raw_data` in the root of the project.
- Run the cells in the `audio-model-preprocessing.ipynb` file in order to convert the `mp4` videos into `.wav` files.
- Run the cells in `mustard_normalizer.ipynb` to normalize the data into a CSV file named `normalized_mustard_dataset.csv`.
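
For readers who want to see what the video-to-audio conversion step involves outside the notebook, here is a minimal sketch. It is not the code from `audio-model-preprocessing.ipynb`. It assumes the `ffmpeg` command-line tool is installed and on your PATH, and the function names, the 16 kHz mono output format, and the output folder are all hypothetical choices made for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: Path, wav_path: Path, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that extracts mono 16-bit PCM audio from a video.

    The sample rate and channel count are assumptions, not values taken
    from the project notebooks.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM, the usual .wav payload
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # downmix to a single channel
        str(wav_path),
    ]


def convert_folder(source_dir: Path, target_dir: Path) -> List[Path]:
    """Convert every .mp4 under source_dir into a .wav file in target_dir."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for video_path in sorted(source_dir.rglob("*.mp4")):
        wav_path = target_dir / (video_path.stem + ".wav")
        subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
        written.append(wav_path)
    return written
```

A call such as `convert_folder(Path("mmsd_raw_data"), Path("mmsd_raw_data"))` would write each `.wav` next to its source video. Adjust the target folder to whatever location the notebooks expect.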
Finally,
- Create a conda environment based on the environment.yml file as below:
  `conda env create -n <your_env_name> -f environment.yml`
- Activate the conda environment:
  `source activate <your_env_name>` (or `conda activate <your_env_name>` on conda 4.4 and later)
- Open the sarcastically.ipynb file and run all the cells. (Make sure you've downloaded the files in the previous step.)
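
Before running the notebook, it can help to confirm that the outputs of the earlier steps are in place. The following is a small sketch of such a check, not part of the project itself. The function name is hypothetical, and it assumes that the `.wav` files live somewhere under `mmsd_raw_data` and that `normalized_mustard_dataset.csv` is written to the project root. Neither location is confirmed by the steps above.

```python
from pathlib import Path
from typing import List


def missing_prerequisites(project_root: Path) -> List[str]:
    """Return one message per expected input that is absent (empty list if all present)."""
    problems = []

    # Assumed location of the converted audio files.
    audio_dir = project_root / "mmsd_raw_data"
    if not audio_dir.is_dir():
        problems.append(f"missing folder: {audio_dir.name}")
    elif not any(audio_dir.rglob("*.wav")):
        problems.append(f"no .wav files found in {audio_dir.name}")

    # Assumed location of the normalized dataset.
    csv_path = project_root / "normalized_mustard_dataset.csv"
    if not csv_path.is_file():
        problems.append(f"missing file: {csv_path.name}")

    return problems
```

Calling `missing_prerequisites(Path("."))` from the project root returns an empty list when both inputs are found. Otherwise, each returned message names the item to fix.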
Contact me at ryanjk.kuruppu@gmail.com
Check out my final report at docs/sarcastically.pdf