DistilBERT Hyperparameter Exploration - Source Code

Student: Martynas Prascevicius
Student ID: 001263199
Course: COMP1818 Artificial Intelligence Applications
Academic Year: 2025-26


Project Overview

This project systematically explores DistilBERT hyperparameters for sentiment analysis through 11 controlled experiments across 4 phases:

  • Phase 1: Baseline (1 experiment)
  • Phase 2: Learning rates - 1e-5, 2e-5, 3e-5, 5e-5 (4 experiments)
  • Phase 3: Batch sizes - 8, 16, 32 (3 experiments)
  • Phase 4: Training duration - 3, 4, 5 epochs (3 experiments)
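The four phases above can be sketched as a small config generator. This is a hypothetical reconstruction, not the actual code: the real definitions live in src/experiment_configs.py, and the baseline values and config names here are illustrative assumptions.

```python
# Hypothetical sketch of how the 11 experiment configs could be generated.
# The real definitions live in src/experiment_configs.py; the baseline
# values and naming below are illustrative assumptions.
BASE = {"learning_rate": 2e-5, "batch_size": 16, "num_epochs": 2, "max_length": 256}

def make_configs():
    configs = [{"name": "baseline_default", "phase": 1, **BASE}]
    for lr in (1e-5, 2e-5, 3e-5, 5e-5):            # Phase 2: learning rates
        configs.append({**BASE, "name": f"lr_{lr:g}", "phase": 2, "learning_rate": lr})
    for bs in (8, 16, 32):                         # Phase 3: batch sizes
        configs.append({**BASE, "name": f"batch_{bs}", "phase": 3, "batch_size": bs})
    for ep in (3, 4, 5):                           # Phase 4: training duration
        configs.append({**BASE, "name": f"epochs_{ep}", "phase": 4, "num_epochs": ep})
    return configs                                  # 1 + 4 + 3 + 3 = 11 experiments
```

Varying one hyperparameter per phase while holding the rest at baseline is what makes the comparison controlled.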

Best Result: 91.04% accuracy (learning rate 1e-5)


What's Included

├── src/
│   ├── run_all.py              # Master script - runs everything
│   ├── experiment_runner.py    # Main training script
│   ├── experiment_configs.py   # All 11 experiment configs
│   ├── data_loader.py          # IMDB dataset loader
│   ├── enhanced_model.py       # DistilBERT model class
│   ├── generate_figures.py     # Creates all 5 figures
│   └── results_analyzer.py     # Optional analysis tool
│
├── results/                    # All 11 experiment results (JSON)
│   ├── baseline_default.json   # 90.77%
│   ├── lr_1e5.json            # 91.04% (BEST)
│   ├── lr_2e5.json            # 90.96%
│   ├── lr_3e5.json            # 90.83%
│   ├── lr_5e5.json            # 90.06%
│   ├── batch_8.json           # 90.86%
│   ├── batch_16.json          # 90.91%
│   ├── batch_32.json          # 90.40%
│   ├── epochs_3.json          # 91.02%
│   ├── epochs_4.json          # 91.00%
│   └── epochs_5.json          # 90.28%
│
├── requirements.txt            # Python dependencies
└── README_CODE_SUBMISSION.md   # This file

Dataset

IMDB Movie Reviews (Maas et al., 2011)

Download: https://ai.stanford.edu/~amaas/data/sentiment/

Extract to create this structure:

CW2/data/aclImdb/
├── train/pos/  (12,500 positive reviews)
├── train/neg/  (12,500 negative reviews)
├── test/pos/   (12,500 positive reviews)
└── test/neg/   (12,500 negative reviews)

Total: 50,000 reviews (25k train, 25k test)
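A minimal loader for this directory layout might look like the following. This is a sketch of what src/data_loader.py does; the function name and return format are assumptions.

```python
from pathlib import Path

def load_split(root, split):
    """Read one IMDB split ("train" or "test") from the extracted aclImdb
    folder. Returns parallel lists of review texts and integer labels
    (1 = positive, 0 = negative). 'root' points at CW2/data/aclImdb."""
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        for f in sorted((Path(root) / split / label_name).glob("*.txt")):
            texts.append(f.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

Reading the files directly like this is why the project has no HuggingFace datasets dependency.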


Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Download Dataset

Download IMDB from: https://ai.stanford.edu/~amaas/data/sentiment/
Extract to: CW2/data/aclImdb/

3. Run Everything (Recommended)

python3 src/run_all.py

This will:

  • Run all 11 experiments (Phases 1-4)
  • Generate all 5 figures automatically
  • Save everything to results/ and figures/

Time: ~20 hours on Mac M4 (varies by hardware)


Advanced Usage

Run Individual Phases

# Run one phase at a time
python3 src/experiment_runner.py --phase 1  # Baseline
python3 src/experiment_runner.py --phase 2  # Learning rates
python3 src/experiment_runner.py --phase 3  # Batch sizes
python3 src/experiment_runner.py --phase 4  # Training duration

Generate Figures Only

# If you already have results/*.json files
python3 src/generate_figures.py

Analyze Results

# Optional: compare all experiments
python3 src/results_analyzer.py --compare-all
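The comparison boils down to loading every results JSON and ranking by accuracy. A minimal sketch of that step follows; the "accuracy" key is an assumption about the JSON schema, and the real tool (src/results_analyzer.py) also handles LaTeX export.

```python
import json
from pathlib import Path

def summarize(results_dir="results"):
    """Load every experiment JSON in results_dir and return
    (experiment_name, accuracy) pairs sorted best-first.
    Assumes each file stores its test accuracy under an "accuracy" key."""
    rows = []
    for f in sorted(Path(results_dir).glob("*.json")):
        data = json.loads(f.read_text())
        rows.append((f.stem, data["accuracy"]))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```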

Requirements

Hardware

  • Minimum: 8GB RAM, CPU
  • Recommended: 16GB RAM, GPU (CUDA/MPS)
  • Used for this project: Mac mini M4, 24GB RAM, MPS GPU

Software

  • Python 3.10+
  • PyTorch 2.0+
  • Transformers 4.30+
  • See requirements.txt for complete list

Results Summary

Key Findings

Learning Rate (Phase 2):

  • 1e-5: 91.04% ← BEST (below the 2e-5 to 5e-5 range typically recommended for BERT fine-tuning)
  • 2e-5: 90.96% (standard baseline)
  • 3e-5: 90.83%
  • 5e-5: 90.06% (unstable training)

Batch Size (Phase 3):

  • 8: 90.86% (slow: 170 min)
  • 16: 90.91% (optimal: 153 min)
  • 32: 90.40% (fast: 139 min, but worse generalization)

Training Duration (Phase 4):

  • 3 epochs: 91.02% (optimal - early stopping)
  • 4 epochs: 91.00% (minimal gain)
  • 5 epochs: 90.28% (severe overfitting)

Optimal Configuration

learning_rate = 1e-5   # Conservative (not 2e-5!)
batch_size = 16        # Medium (not 32)
num_epochs = 3         # Early stopping
max_length = 256
optimizer = AdamW
weight_decay = 0.01
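
Wired into PyTorch, the configuration above might look like this. It is a self-contained sketch, not the project's actual training code: the Linear layer is a stand-in for the DistilBERT classification head so the snippet runs without downloading model weights.

```python
import torch

# Optimal configuration from the experiments above.
config = dict(learning_rate=1e-5, batch_size=16, num_epochs=3,
              max_length=256, weight_decay=0.01)

# Stand-in for the DistilBERT classifier head (768-dim hidden state -> 2 classes);
# the real model is built in src/enhanced_model.py.
model = torch.nn.Linear(768, 2)

optimizer = torch.optim.AdamW(model.parameters(),
                              lr=config["learning_rate"],
                              weight_decay=config["weight_decay"])
```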

File Descriptions

src/run_all.py - Master script that runs all experiments + generates figures

src/experiment_runner.py - Main training script (loads data, trains model, saves results)

src/experiment_configs.py - Defines all 11 experiments with hyperparameters

src/data_loader.py - Loads IMDB from local directory (no HuggingFace dependency)

src/enhanced_model.py - DistilBERT model (66M parameters, pre-trained + classification head)

src/generate_figures.py - Creates 5 publication-quality figures (PDF + PNG)

src/results_analyzer.py - Optional tool for analysis and LaTeX table export


Reproducibility

All experiments use:

  • Random seed: 42
  • Deterministic algorithms: Enabled
  • Data splits: 90% train, 10% validation (from 25k training set)
  • Test set: Fixed 25k reviews (never seen during training)

Note: runs on the MPS GPU vary by about ±0.02% accuracy (~5 predictions out of 25k), as some MPS operations are not fully deterministic.
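A typical seeding helper matching these settings might look like the following; this is a sketch, and the exact setup lives in the training scripts.

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    """Seed every RNG the pipeline touches and request deterministic
    kernels (warn_only=True, since some MPS ops have no deterministic
    implementation)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True, warn_only=True)
```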


Training Time

Per experiment (Mac M4 with MPS GPU):

  • Baseline: ~140 min
  • Learning rate experiments: ~160 min each
  • Batch size experiments: 139-170 min each
  • Training duration: 140-228 min each

Total: ~20 hours for all 11 experiments


Citation

Dataset:

@inproceedings{maas2011learning,
  title={Learning word vectors for sentiment analysis},
  author={Maas, Andrew L and Daly, Raymond E and Pham, Peter T and
          Huang, Dan and Ng, Andrew Y and Potts, Christopher},
  booktitle={ACL},
  year={2011}
}

DistilBERT:

@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}

Contact

Student: Martynas Prascevicius
Student ID: 001263199
Email: mpcode@icloud.com
University: University of Greenwich


Academic Integrity

This code is submitted as coursework for COMP1818.

AI assistance (ChatGPT) was used for:

  • Code debugging and structure
  • Documentation and comments
  • LaTeX formatting

All experimental design, analysis, and conclusions are my own work.


Last Updated: November 16, 2025