This directory contains the code for the CIS 5300 Term Project, including baseline models, extensions, and their corresponding markdown documentation.
- Short description of each file in this folder.
- General description of model output generation.
- `simple-baseline.py`: Contains the code for the simple baseline implementation from Milestone #2. A more detailed description and instructions on how to run it are in `simple-baseline.md`.
- `simple-baseline.md`: Documentation of the simple baseline from Milestone #2. Contains a more detailed description of the simple baseline and instructions on how to run `simple-baseline.py`.
- `strong-baseline.py`: Contains the code for the strong baseline implementation from Milestone #2. A more detailed description and instructions on how to run it are in `strong-baseline.md`.
- `strong-baseline.md`: Documentation of the strong baseline from Milestone #2. Contains a more detailed description of the strong baseline and instructions on how to run `strong-baseline.py`.
- `score.py`: Utility used to compute evaluation metrics (accuracy, F1, precision, recall) by comparing prediction files against the ground truth. Typical usage from the command line: `python score.py --true_labels_file test_labels.npy --pred_labels_file prediction_file.npy`. Execution details are available in `scoring.md`. We recommend executing this script from the output folder; please refer to `README.md` for details.
- `scoring.md`: Markdown file providing detailed documentation on how to use `score.py`, including explanations of the evaluation metrics, example commands, and guidelines for interpreting the results.
- `generate_outputs.py`: Script for generating final model outputs on the test dataset as a `.npy` file. Run it after training the models and saving their weights. Typical usage from the command line: `python generate_outputs.py --ensemble_method="BERT"`. More details are provided below.
- `BERT_CNN.py`: Script used to train BERT + CNN and to generate evaluation metrics on the dev and test datasets. Please refer to `BERT_CNN.md` for more detailed documentation and instructions on running this script.
- `BERT_CNN.md`: Markdown file providing detailed documentation for the BERT + CNN model from Milestone #3.
- `BERT_LSTM.py`: Script used to train BERT + LSTM and to generate evaluation metrics on the dev and test datasets. Please refer to `BERT_LSTM.md` for more detailed documentation and instructions on running this script.
- `BERT_LSTM.md`: Markdown file providing detailed documentation for the BERT + LSTM model from Milestone #3.
- `Soft_Voting.py`: Script implementing a soft voting ensemble of BERT, BERT + CNN, and BERT + LSTM. Please refer to `Soft_Voting.md` for detailed documentation and instructions on running this script.
- `Hard_Voting.py`: Script implementing a hard voting ensemble of BERT, BERT + CNN, and BERT + LSTM. Please refer to `Hard_Voting.md` for detailed documentation and instructions on running this script.
- `Stacking.py`: Script implementing a stacking ensemble of BERT, BERT + CNN, and BERT + LSTM with a Logistic Regression meta-learner. Please refer to `Stacking.md` for detailed documentation and instructions on running this script.
- `Max_Voting.py`: Script implementing a max voting ensemble of BERT, BERT + CNN, and BERT + LSTM. Please refer to `Max_Voting.md` for detailed documentation and instructions on running this script.
- `Milestone #2 (Prepare Data).ipynb`: Notebook used to prepare and explore the dataset for Milestone 2. It samples 3% of the original dataset and performs exploratory data analysis (EDA), including computing split statistics across the training, development, and test sets and investigating class imbalance.
- `Error_Analysis_Dhruv.ipynb`: Notebook for performing error analysis on the best-performing system, categorizing both false positives and false negatives to understand the types and prevalence of errors.
- `error_analysis.ipynb`: Notebook for analyzing and comparing error types between the strong baseline and the best-performing system.
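To illustrate the stacking idea behind `Stacking.py`: the base models' predicted probabilities become the input features for a Logistic Regression meta-learner. Below is a minimal sketch with scikit-learn, using synthetic stand-in probabilities; the real script uses the actual outputs of the three fine-tuned models, and all variable names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)  # synthetic ground-truth labels

# Synthetic positive-class probabilities standing in for BERT, BERT + CNN,
# and BERT + LSTM: each is the true label plus noise, clipped to [0, 1].
base_probs = np.stack(
    [np.clip(y + rng.normal(0, 0.4, n), 0, 1) for _ in range(3)], axis=1
)  # shape (n, 3): one column per base model

# The meta-learner is fit on the base models' probabilities.
meta = LogisticRegression().fit(base_probs, y)
preds = meta.predict(base_probs)
print("train accuracy:", (preds == y).mean())
```

In practice the meta-learner should be fit on held-out (e.g. dev-set) predictions rather than the base models' training outputs, to avoid overfitting to base-model confidence.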
- Inputs: CSV files in the `data/` folder (`train_data.csv`, `dev_data.csv`, `test_data.csv`).
- Saved model weights: In addition to the data, you will also need the saved weights from the Milestone #2 fine-tune of the BERT model and the saved weights from the Milestone #3 training of BERT + CNN and of BERT + LSTM. To access the weights, use the Google Drive link to the shared folder called Model Weights. Within this folder, you will see the following sub-folders:
  - `Milestone2-Baseline-BERT-FinalModel` (saved weights from the Milestone #2 fine-tune of the BERT model)
  - `Milestone3-BERT-CNN-FinalModel` (saved weights from the Milestone #3 training of BERT + CNN)
  - `Milestone3-BERT-BiLSTM-FinalModel` (saved weights from the Milestone #3 training of BERT + LSTM)
Please download the above folders and structure your local repository as follows:

Required folder structure:

```
├── data/
│   ├── train_data.csv
│   ├── dev_data.csv
│   └── test_data.csv
├── Milestone #2/
│   └── Milestone2-Baseline-BERT-FinalModel    # Saved weights from Milestone #2 fine-tune of BERT model
├── Milestone #3/
│   ├── Milestone3-BERT-CNN-FinalModel         # Saved weights from Milestone #3 training of BERT + CNN
│   └── Milestone3-BERT-BiLSTM-FinalModel      # Saved weights from Milestone #3 training of BERT + LSTM
```
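Before running the scripts, it can help to check that the layout above is in place. The helper below is a hypothetical convenience, not part of the project; the paths are taken from the required structure.

```python
from pathlib import Path

# Paths required by the scripts, per the folder structure above.
REQUIRED = [
    "data/train_data.csv",
    "data/dev_data.csv",
    "data/test_data.csv",
    "Milestone #2/Milestone2-Baseline-BERT-FinalModel",
    "Milestone #3/Milestone3-BERT-CNN-FinalModel",
    "Milestone #3/Milestone3-BERT-BiLSTM-FinalModel",
]

def missing_paths(root="."):
    """Return the required paths that do not exist under `root`."""
    return [p for p in REQUIRED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Folder structure looks good.")
```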
- Outputs: prediction `.npy` files, `*-results.csv` files containing the evaluation results, and model checkpoints.
Final model predictions on the test set are generated using the script `generate_outputs.py`. This script loads the saved model weights from Milestone #2 and Milestone #3, runs inference on the test dataset, and saves the predictions as `.npy` files for evaluation. Using these `.npy` files and the ground truth file, `score.py` can then be used to produce evaluation metrics.
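The metrics themselves are simple functions of the two label arrays. Here is a hedged numpy sketch of what a `score.py`-style comparison computes for binary labels; `binary_metrics` is a hypothetical helper, not the script's actual code.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 label arrays."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# In the project these arrays would come from the saved files, e.g.:
# y_true = np.load("test_labels.npy"); y_pred = np.load("prediction_file.npy")
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1])
print(binary_metrics(y_true, y_pred))
```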
The script supports generating outputs for the following single models and ensemble methods:

- Single models: BERT, BERT-CNN, BERT-LSTM
- Ensemble methods: Hard Vote, Soft Vote, Max Vote, Stacking (with Logistic Regression meta-learner)
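The three voting ensembles differ only in how the per-model outputs are combined. A minimal numpy sketch of the three rules, using hypothetical probability arrays rather than the scripts' actual interfaces:

```python
import numpy as np

# Hypothetical class-probability arrays, shape (models, examples, classes),
# standing in for BERT, BERT + CNN, and BERT + LSTM on two examples.
probs = np.stack([
    np.array([[0.8, 0.2], [0.4, 0.6]]),    # BERT
    np.array([[0.6, 0.4], [0.3, 0.7]]),    # BERT + CNN
    np.array([[0.7, 0.3], [0.55, 0.45]]),  # BERT + LSTM
])

# Soft voting: average the probabilities across models, then take the argmax.
soft = probs.mean(axis=0).argmax(axis=1)

# Hard voting: each model votes with its own argmax; the majority class wins.
votes = probs.argmax(axis=2)                   # shape (models, examples)
hard = np.array([np.bincount(v, minlength=2).argmax() for v in votes.T])

# Max voting: take the prediction of the single most confident model.
max_conf = probs.max(axis=2)                   # each model's top confidence
best_model = max_conf.argmax(axis=0)           # most confident model per example
maxv = votes[best_model, np.arange(votes.shape[1])]

print(soft, hard, maxv)
```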
Note to Grader: The simple baseline script generates its own output upon execution!
Run the script from the directory containing `generate_outputs.py`:

`python generate_outputs.py --ensemble_method "<METHOD_NAME>"`

where `<METHOD_NAME>` is any one of: `BERT`, `BERT-CNN`, `BERT-LSTM`, `Hard Vote`, `Soft Vote`, `Max Vote`, `Stacking`.
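A minimal sketch of the argument handling this usage implies, assuming a standard argparse setup (the real `generate_outputs.py` additionally loads the saved weights and runs inference; `parse_method` is a hypothetical name):

```python
import argparse

# Method names accepted above: the first three run single models,
# the rest run ensembles.
METHODS = ["BERT", "BERT-CNN", "BERT-LSTM",
           "Hard Vote", "Soft Vote", "Max Vote", "Stacking"]

def parse_method(argv):
    """Parse --ensemble_method and reject anything outside METHODS."""
    parser = argparse.ArgumentParser(description="Generate test-set outputs.")
    parser.add_argument("--ensemble_method", choices=METHODS, required=True)
    return parser.parse_args(argv).ensemble_method
```

Both `--ensemble_method "Soft Vote"` and `--ensemble_method="BERT"` forms parse identically under argparse.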