Student: Martynas Prascevicius Student ID: 001263199 Course: COMP1818 Artificial Intelligence Applications Academic Year: 2025-26
This project systematically explores DistilBERT hyperparameters for sentiment analysis through 11 controlled experiments across 4 phases:
- Phase 1: Baseline (1 experiment)
- Phase 2: Learning rates - 1e-5, 2e-5, 3e-5, 5e-5 (4 experiments)
- Phase 3: Batch sizes - 8, 16, 32 (3 experiments)
- Phase 4: Training duration - 3, 4, 5 epochs (3 experiments)
Best Result: 91.04% accuracy (learning rate 1e-5)
├── src/
│ ├── run_all.py # Master script - runs everything
│ ├── experiment_runner.py # Main training script
│ ├── experiment_configs.py # All 11 experiment configs
│ ├── data_loader.py # IMDB dataset loader
│ ├── enhanced_model.py # DistilBERT model class
│ ├── generate_figures.py # Creates all 5 figures
│ └── results_analyzer.py # Optional analysis tool
│
├── results/ # All 11 experiment results (JSON)
│ ├── baseline_default.json # 90.77%
│ ├── lr_1e5.json # 91.04% (BEST)
│ ├── lr_2e5.json # 90.96%
│ ├── lr_3e5.json # 90.83%
│ ├── lr_5e5.json # 90.06%
│ ├── batch_8.json # 90.86%
│ ├── batch_16.json # 90.91%
│ ├── batch_32.json # 90.40%
│ ├── epochs_3.json # 91.02%
│ ├── epochs_4.json # 91.00%
│ └── epochs_5.json # 90.28%
│
├── requirements.txt # Python dependencies
└── README_CODE_SUBMISSION.md # This file
IMDB Movie Reviews (Maas et al., 2011)
Download: https://ai.stanford.edu/~amaas/data/sentiment/
Extract to create this structure:
CW2/data/aclImdb/
├── train/pos/ (12,500 positive reviews)
├── train/neg/ (12,500 negative reviews)
├── test/pos/ (12,500 positive reviews)
└── test/neg/ (12,500 negative reviews)
Total: 50,000 reviews (25k train, 25k test)
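The project's loader reads this directory layout directly. Below is a minimal sketch of the idea, assuming a simple text-file walk (the function name and return format are illustrative, not the actual src/data_loader.py API):

```python
from pathlib import Path

def load_split(data_dir, split):
    """Read one IMDB split ("train" or "test") from the aclImdb layout.

    Returns parallel lists of review texts and labels (1 = positive, 0 = negative).
    """
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        # Each review is a standalone .txt file under e.g. train/pos/
        for path in sorted(Path(data_dir, split, label_name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

In the real pipeline the 25k training reviews are then split 90/10 into train and validation sets.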
pip install -r requirements.txt

Download IMDB from: https://ai.stanford.edu/~amaas/data/sentiment/
Extract to: CW2/data/aclImdb/
python3 src/run_all.py

This will:
- Run all 11 experiments (Phases 1-4)
- Generate all 5 figures automatically
- Save everything to results/ and figures/
Time: ~20 hours on Mac M4 (varies by hardware)
# Run one phase at a time
python3 src/experiment_runner.py --phase 1 # Baseline
python3 src/experiment_runner.py --phase 2 # Learning rates
python3 src/experiment_runner.py --phase 3 # Batch sizes
python3 src/experiment_runner.py --phase 4 # Training duration

# If you already have results/*.json files
python3 src/generate_figures.py

# Optional: compare all experiments
python3 src/results_analyzer.py --compare-all

- Minimum: 8GB RAM, CPU
- Recommended: 16GB RAM, GPU (CUDA/MPS)
- Used for this project: Mac mini M4, 24GB RAM, MPS GPU
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- See requirements.txt for the complete list
Learning Rate (Phase 2):
- 1e-5: 91.04% ← BEST (challenges BERT recommendations)
- 2e-5: 90.96% (standard baseline)
- 3e-5: 90.83%
- 5e-5: 90.06% (unstable training)
Batch Size (Phase 3):
- 8: 90.86% (slow: 170 min)
- 16: 90.91% (optimal: 153 min)
- 32: 90.40% (fast: 139 min, but worse generalization)
Training Duration (Phase 4):
- 3 epochs: 91.02% (optimal - early stopping)
- 4 epochs: 91.00% (minimal gain)
- 5 epochs: 90.28% (severe overfitting)
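src/results_analyzer.py aggregates runs like these. A toy sketch of the comparison step, assuming each results JSON exposes an "accuracy" field (the project's actual schema may differ):

```python
import json
from pathlib import Path

def best_run(results_dir):
    """Return the (filename, accuracy) pair with the highest test accuracy.

    Assumes each JSON file contains an "accuracy" field; the real
    results/*.json schema may store more metrics per experiment.
    """
    best = None
    for path in Path(results_dir).glob("*.json"):
        acc = json.loads(path.read_text())["accuracy"]
        if best is None or acc > best[1]:
            best = (path.name, acc)
    return best
```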
Best configuration:
learning_rate = 1e-5 # Conservative (not 2e-5!)
batch_size = 16 # Medium (not 32)
num_epochs = 3 # Early stopping
max_length = 256
optimizer = AdamW
weight_decay = 0.01

src/run_all.py - Master script that runs all experiments + generates figures
src/experiment_runner.py - Main training script (loads data, trains model, saves results)
src/experiment_configs.py - Defines all 11 experiments with hyperparameters
src/data_loader.py - Loads IMDB from local directory (no HuggingFace dependency)
src/enhanced_model.py - DistilBERT model (66M parameters, pre-trained + classification head)
src/generate_figures.py - Creates 5 publication-quality figures (PDF + PNG)
src/results_analyzer.py - Optional tool for analysis and LaTeX table export
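The best hyperparameters above translate directly into the optimizer setup. A self-contained sketch, using a stand-in linear head in place of the full DistilBERT model from enhanced_model.py:

```python
from torch import nn
from torch.optim import AdamW

def make_optimizer(model: nn.Module) -> AdamW:
    """Build AdamW with the best configuration found in Phases 2-4."""
    return AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

# Stand-in for the 66M-parameter DistilBERT + classification head.
head = nn.Linear(768, 2)
optimizer = make_optimizer(head)
```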
All experiments use:
- Random seed: 42
- Deterministic algorithms: Enabled
- Data splits: 90% train, 10% validation (from 25k training set)
- Test set: Fixed 25k reviews (never seen during training)
Note: MPS GPU has ±0.02% variance (~5 predictions out of 25k)
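A minimal sketch of this reproducibility setup (the exact helper in the project may differ):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy and PyTorch RNGs and request deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # warn_only avoids hard errors for ops without deterministic implementations
    torch.use_deterministic_algorithms(True, warn_only=True)
```

Even with this setup, MPS kernels are not fully deterministic, hence the ±0.02% variance noted above.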
Per experiment (Mac M4 with MPS GPU):
- Baseline: ~140 min
- Learning rate experiments: ~160 min each
- Batch size experiments: 139-170 min each
- Training duration: 140-228 min each
Total: ~20 hours for all 11 experiments
Dataset:
@inproceedings{maas2011learning,
title={Learning word vectors for sentiment analysis},
author={Maas, Andrew L and Daly, Raymond E and Pham, Peter T and
Huang, Dan and Ng, Andrew Y and Potts, Christopher},
booktitle={ACL},
year={2011}
}
DistilBERT:
@article{sanh2019distilbert,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
journal={arXiv preprint arXiv:1910.01108},
year={2019}
}
Student: Martynas Prascevicius Student ID: 001263199 Email: mpcode@icloud.com University: University of Greenwich
This code is submitted as coursework for COMP1818.
AI assistance (ChatGPT) was used for:
- Code debugging and structure
- Documentation and comments
- LaTeX formatting
All experimental design, analysis, and conclusions are my own work.
Last Updated: November 16, 2025