jockharkness/Movie-Genre-Classification---COMP90049-Introduction-to-Machine-Learning-Assignment-2
Below is a description of each submitted file:

get_data.py: extracts the data from the TSV files.

baseline.py: performs the Zero-R baseline.

feature_extraction.py: this is where the preprocessing occurs. The datasets are concatenated and processed jointly so that the vectorisation is consistent across them. The tags and titles features are lemmatised and stop words are removed. There is a getter for each feature used in the analysis.

classifiers.py: this file was used in the preliminary testing of the features. The program iterates through four classifiers, outputting results for each of them.

decisiontree.py: this file contains the code to implement the decision tree and its testing. It also contains the code for the pruning component. The figures plot the effect of pruning on accuracy.

neuralnet_gridsearch.py: this file was used to test which parameter settings might increase accuracy for the MLP classifier. The GridSearchCV functionality from the sklearn package was used to iterate through different parameter settings.
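
For reference, the Zero-R baseline performed by baseline.py always predicts the most frequent class seen in the training labels. The following is a minimal sketch of that idea, not the repository's code; the function names and the toy genre labels are assumptions made for illustration.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def zero_r_accuracy(majority_label, test_labels):
    """Fraction of test labels equal to the majority label."""
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Toy labels, made up for illustration only.
train = ["Drama", "Comedy", "Drama", "Thriller", "Drama"]
test = ["Drama", "Comedy", "Drama", "Drama"]

majority = zero_r_fit(train)
accuracy = zero_r_accuracy(majority, test)
```

Any classifier worth keeping should beat this accuracy, which is why it serves as the baseline.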
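
The reason feature_extraction.py concatenates the datasets before vectorising can be illustrated with a small bag-of-words sketch: building one vocabulary from the joint data gives every split the same columns in the same order. This is plain Python for illustration, not the repository's code; the helper names, the tiny stop-word list, and the example titles are assumptions, and lemmatisation is omitted.

```python
STOP_WORDS = {"the", "a", "of", "and"}  # tiny illustrative list, not the real one


def tokenise(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct token to a fixed column index (sorted for stability)."""
    tokens = sorted({tok for doc in documents for tok in tokenise(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorise(document, vocabulary):
    """Build the count vector for one document over the shared vocabulary."""
    vector = [0] * len(vocabulary)
    for tok in tokenise(document):
        vector[vocabulary[tok]] += 1
    return vector


# Hypothetical titles standing in for the real datasets.
train_titles = ["The Return of the King", "King Kong"]
test_titles = ["The Lion King", "Return of the Jedi"]

# One vocabulary built from the joint data, then applied to each split.
vocab = build_vocabulary(train_titles + test_titles)
train_vectors = [vectorise(t, vocab) for t in train_titles]
test_vectors = [vectorise(t, vocab) for t in test_titles]
```

Every vector has the same length and column order because the vocabulary is built once from the concatenated data, so no token in either split falls outside it.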
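
The grid search used in neuralnet_gridsearch.py evaluates every combination of the candidate parameter values under cross-validation. The sketch below only enumerates such combinations in plain Python to show how quickly the number of model fits grows; it does not call sklearn. The keys are named after MLPClassifier arguments, but the candidate values and the fold count are assumptions, not the settings the repository actually searched.

```python
from itertools import product

# Hypothetical candidate values; the searched settings are not stated in the README.
param_grid = {
    "hidden_layer_sizes": [(50,), (100,), (50, 50)],
    "alpha": [0.0001, 0.001],
    "activation": ["relu", "tanh"],
}


def expand_grid(grid):
    """Return one dict per combination of the candidate values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
n_folds = 5  # assumed number of cross-validation folds
n_fits = len(combinations) * n_folds  # one model fit per combination per fold
```

Adding one more candidate value to any parameter multiplies the total number of fits, which is worth keeping in mind when choosing the grid.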