Student: Martynas Prascevicius
Student ID: 001263199
Course: COMP1818 Artificial Intelligence Applications
Academic Year: 2025-26
This project systematically explores DistilBERT hyperparameters for sentiment analysis through 11 controlled experiments across 4 phases:
- Phase 1: Baseline (1 experiment)
- Phase 2: Learning rates (4 experiments)
- Phase 3: Batch sizes (3 experiments)
- Phase 4: Training duration (3 experiments)
Best Result: 90.84% accuracy (5 epochs of training)
IMDB Movie Reviews (Maas et al., 2011)
Note: The dataset is included in this submission (in the data/aclImdb/ folder) for convenience and reproducibility.
Alternative: If you prefer to download it separately: https://ai.stanford.edu/~amaas/data/sentiment/
Structure:
data/aclImdb/
├── train/pos/ (12,500 positive reviews)
├── train/neg/ (12,500 negative reviews)
├── test/pos/ (12,500 positive reviews)
└── test/neg/ (12,500 negative reviews)
Total: 50,000 reviews (25k train, 25k test)
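For reference, a minimal sketch of reading this layout into memory (the helper name load_split and the use of pathlib are illustrative, not necessarily how distilbert_experiment.py does it):

```python
from pathlib import Path

def load_split(root: str, split: str):
    """Read one split (train or test) into parallel lists of texts and labels."""
    texts, labels = [], []
    for label, folder in enumerate(["neg", "pos"]):  # 0 = negative, 1 = positive
        for path in sorted((Path(root) / split / folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels

train_texts, train_labels = load_split("data/aclImdb", "train")  # 25,000 reviews
test_texts, test_labels = load_split("data/aclImdb", "test")     # 25,000 reviews
```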
CW2/
├── distilbert_experiment.py # Complete implementation (run this!)
├── requirements.txt # Python dependencies
├── README.md # This file
│
├── results/ # Experiment results (JSON)
│ ├── baseline_default.json
│ ├── lr_1e5.json
│ ├── lr_2e5.json
│ └── ... # 11 total
│
└── figures/ # Publication-quality figures
├── figure1_learning_rate.pdf
├── figure2_batch_size.pdf
├── figure3_overfitting.pdf
├── figure4_training_history.pdf
└── figure5_all_experiments.pdf
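To compare runs without re-running anything, the result files can be summarised in a few lines. A sketch, assuming each JSON stores its test accuracy under a "test_accuracy" key (check an actual file in results/ for the real schema):

```python
import json
from pathlib import Path

# Gather every experiment's accuracy, then print best-first.
results = {}
for path in sorted(Path("results").glob("*.json")):
    with open(path) as f:
        results[path.stem] = json.load(f).get("test_accuracy")  # assumed key

for name, acc in sorted(results.items(), key=lambda kv: kv[1] or 0, reverse=True):
    print(f"{name:25s} {acc}")
```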
```
pip install -r requirements.txt
```
The dataset is already included in this submission, so the download step below is NOT needed:
```
# Download IMDB from: https://ai.stanford.edu/~amaas/data/sentiment/
# Extract to: CW2/data/aclImdb/
```
```
python3 distilbert_experiment.py
```
This will:
- Run all 11 experiments (Phases 1-4)
- Generate all 5 figures automatically
- Save results to the results/ directory
- Save figures to the figures/ directory
Runtime: ~10 hours on a Mac mini M4 (varies by hardware)
Learning Rate (Phase 2):
- 1e-5: 90.45% (too conservative)
- 2e-5: 90.62% (BEST - validates BERT recommendations)
- 3e-5: 90.37% (moderate performance)
- 5e-5: 90.21% (unstable training)
Batch Size (Phase 3):
- 8: 90.30% (slow: 54 min)
- 16: 90.60% (optimal: 49 min)
- 32: 90.12% (fast: 47 min, worse generalisation)
Training Duration (Phase 4):
- 3 epochs: 90.59% (baseline)
- 4 epochs: 90.64% (slight improvement)
- 5 epochs: 90.84% (BEST, but shows signs of overfitting)
```python
learning_rate = 2e-5  # BERT-recommended (validated)
batch_size = 16       # medium (not 32)
num_epochs = 3        # balanced (3-4 for robustness)
max_length = 256
optimizer = AdamW
weight_decay = 0.01
```
(A minimal code sketch of this configuration follows the requirements list below.)
- Minimum: 8GB RAM, CPU
- Recommended: 16GB RAM, GPU (CUDA/MPS)
- Used for this project: Mac mini M4, 24GB RAM, MPS GPU
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- See requirements.txt for the complete list
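As promised above, a minimal sketch of the recommended configuration with Hugging Face Transformers (model setup and a toy forward pass only; the training loop, scheduler, and evaluation live in distilbert_experiment.py):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # binary sentiment
)

# Recommended hyperparameters from the experiments above; the optimizer
# would drive the (omitted) training loop over batches of 16.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Tokenise with the chosen maximum sequence length.
batch = tokenizer(
    ["An absolute masterpiece.", "Two hours I will never get back."],
    truncation=True, padding=True, max_length=256, return_tensors="pt",
)
outputs = model(**batch, labels=torch.tensor([1, 0]))  # 1 = positive, 0 = negative
print(outputs.loss)  # cross-entropy loss for this toy batch
```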
All experiments use:
- Random seed: 42
- Deterministic algorithms: Enabled
- Data splits: 90% train, 10% validation (from 25k training set)
- Test set: Fixed 25k reviews (never seen during training)
Note: The MPS GPU introduces ±0.02% run-to-run variance (~5 predictions out of 25k)
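A sketch of how this reproducibility setup is typically achieved in PyTorch (illustrative; the exact calls in distilbert_experiment.py may differ):

```python
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)  # seeds CPU and device (CUDA/MPS) generators
torch.use_deterministic_algorithms(True)  # error on non-deterministic ops

# 90/10 train/validation split of the 25k training reviews;
# the 25k test reviews are never touched during training.
n = 25_000
perm = torch.randperm(n)
train_idx, val_idx = perm[: int(0.9 * n)], perm[int(0.9 * n):]
```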