- Student: Martynas Prascevicius
- Student ID: 001263199
- Course: COMP1818 Artificial Intelligence Applications
- Completion Date: November 15, 2025
- Baseline experiment (90.77% accuracy)
- Phase 2: Learning rates (4 experiments)
- Phase 3: Batch sizes (3 experiments)
- Phase 4: Training duration (3 experiments)
- Total: 11 experiments, ~20 hours compute time
- All 5 publication-quality figures generated (PDF + PNG)
- LaTeX results table created
- Results analysis completed
- 4-page LaTeX report written
- Bibliography file updated (11 references)
- Demo inference script created
- 5-minute presentation guide written
- Overleaf upload package created (104 KB)
- All files packaged and ready for submission
CW2/
├── Prascevicius_Martynas_DistilBERT.tex # Main LaTeX report
├── demo_inference.py # Demo script for presentation
├── PRESENTATION_GUIDE.md # 5-minute presentation guide
├── CW2_Overleaf_Package.zip # Ready to upload (104 KB)
│
├── src/
│ ├── experiment_configs.py # 11 experiment configurations
│ ├── experiment_runner.py # Automated training pipeline
│ ├── enhanced_model.py # DistilBERT model class
│ ├── data_loader.py # Local IMDB loader
│ ├── generate_figures.py # Visualization script
│ └── results_analyzer.py # Results analysis tools
│
├── results/ # 11 experiment JSON files
│ ├── baseline_default.json # 90.77%
│ ├── lr_1e5.json # 91.04% ⭐ BEST
│ ├── lr_2e5.json # 90.96%
│ ├── lr_3e5.json # 90.83%
│ ├── lr_5e5.json # 90.06%
│ ├── batch_8.json # 90.86%
│ ├── batch_16.json # 90.91%
│ ├── batch_32.json # 90.40%
│ ├── epochs_3.json # 91.02%
│ ├── epochs_4.json # 91.00%
│ └── epochs_5.json # 90.28% (overfitting)
│
├── figures/ # All visualizations
│ ├── figure1_learning_rate.pdf/.png
│ ├── figure2_batch_size.pdf/.png
│ ├── figure3_overfitting.pdf/.png
│ ├── figure4_training_history.pdf/.png
│ ├── figure5_all_experiments.pdf/.png
│ └── table_all_results.tex
│
├── literature/
│ └── references_distilbert.bib # 11 academic references
│
├── Overleaf_Upload/ # Ready for Overleaf
│ ├── Prascevicius_Martynas_DistilBERT.tex
│ ├── references_distilbert.bib
│ ├── COMPXXXX.cls
│ ├── COMPXXXX.bst
│ ├── figure1_learning_rate.pdf
│ ├── figure2_batch_size.pdf
│ ├── figure3_overfitting.pdf
│ ├── figure4_training_history.pdf
│ ├── figure5_all_experiments.pdf
│ └── README.txt
│
└── documentation/
├── EXPLORATION_PLAN.md # Original plan document
├── OPTION_B_PLAN.md # Focused 11-experiment plan
└── PROJECT_COMPLETE.md # This file
Result: LR=1e-5 achieved 91.04% (vs 90.77% baseline)
| Learning Rate | Accuracy | vs Baseline |
|---|---|---|
| 1e-5 | 91.04% | +0.27% ⭐ |
| 2e-5 | 90.96% | +0.19% |
| 3e-5 | 90.83% | +0.06% |
| 5e-5 | 90.06% | -0.71% ❌ |
Insight: This challenges the BERT paper's recommended 2e-5 to 5e-5 range; for this sentiment task, the slower learning rate appears to prevent overshooting.
Result: Batch 16 optimal, Batch 32 degrades despite speed
| Batch Size | Accuracy | Training Time | Efficiency |
|---|---|---|---|
| 8 | 90.86% | 170 min | Slow |
| 16 | 90.91% | 153 min | Optimal ⭐ |
| 32 | 90.40% | 139 min | Fast but worse |
Insight: Batch 32 reached 91.24% validation accuracy but only 90.40% test accuracy, a clear generalization gap consistent with Smith et al. (2017).
Result: Training beyond 3 epochs degrades performance
| Epochs | Test Acc | Train Acc | Gap | Status |
|---|---|---|---|---|
| 3 | 91.02% | 97.00% | 6.0% | ✅ Optimal |
| 4 | 91.00% | 98.15% | 7.2% | |
| 5 | 90.28% | 98.95% | 8.67% | ❌ Severe |
Insight: At 5 epochs the model fits the training set almost perfectly (98.95%) yet test accuracy drops to 90.28%: it is memorizing the training data rather than generalizing.
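This failure mode is what the early-stopping criterion guards against. The check can be sketched in plain Python; the loss values below are hypothetical, chosen only to mirror the shape of the 5-epoch run:

```python
def should_stop(val_losses, patience=1):
    """Stop when validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_so_far = min(val_losses[:-patience])
    return all(loss >= best_so_far for loss in val_losses[-patience:])

# Hypothetical per-epoch validation losses mirroring the epochs_5 run:
losses = [0.28, 0.24, 0.23, 0.26, 0.31]
print(should_stop(losses[:3]))  # → False (still improving at epoch 3)
print(should_stop(losses))      # → True (no improvement since epoch 3)
```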
Based on all 11 experiments, the recommended configuration is:

```python
optimal_config = {
    'learning_rate': 1e-5,    # Conservative (not the usual 2e-5)
    'batch_size': 16,         # Medium (32 is faster but generalizes worse)
    'num_epochs': 3,          # Early stopping is crucial
    'max_length': 256,
    'optimizer': 'AdamW',
    'weight_decay': 0.01,
}
```

- Expected Performance: 91.0-91.1% accuracy on IMDB
- Training Time: ~160 minutes (Mac M4)
- Improvement: +0.27% absolute (68 fewer errors per 25k reviews)
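As a quick sanity check, the error-reduction figure follows directly from the accuracies reported above (a sketch; `Fraction` is used only to keep the arithmetic exact):

```python
from fractions import Fraction

baseline_acc = Fraction("0.9077")   # baseline_default
best_acc = Fraction("0.9104")       # lr_1e5
n_test = 25_000                     # IMDB test set size

fewer_errors = (best_acc - baseline_acc) * n_test
print(float(fewer_errors))  # → 67.5, i.e. roughly 68 fewer misclassified reviews
```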
- File: Prascevicius_Martynas_DistilBERT.tex
- Pages: ~10-12 (including references)
- Figures: 5 publication-quality PDFs
- References: 11 academic papers
- Format: A4, double-column, 10pt (COMPXXXX template)
- File: CW2_Overleaf_Package.zip (104 KB)
- Contents: LaTeX source + figures + template + bibliography
- Status: Ready to upload to Overleaf and compile
- All experiment code: src/ directory
- Results: 11 JSON files with complete metrics
- Visualization: generate_figures.py script
- Data loading: Local IMDB loader (no dependencies)
- Demo script: demo_inference.py
  - Analyzes 5 sample reviews
  - Shows confidence scores
  - Runs in <1 minute
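The confidence scores the demo reports come from a softmax over the model's two class logits. A minimal, model-free sketch (the label order 0 = negative, 1 = positive is an assumption about the demo's mapping):

```python
import math

def confidence(logits):
    """Convert raw class logits into (label, probability) via softmax."""
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    probs = [e / sum(exps) for e in exps]
    best = probs.index(max(probs))
    label = "positive" if best == 1 else "negative"
    return label, probs[best]

label, p = confidence([-1.2, 2.3])   # hypothetical logits
print(label, round(p, 3))            # → positive 0.971
```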
- Presentation guide: PRESENTATION_GUIDE.md
  - 5-minute timing breakdown
  - Slide-by-slide scripts
  - Common Q&A with answers
  - Backup plans
Option A - Overleaf (Recommended):
1. Go to overleaf.com
2. Create a new project
3. Upload CW2_Overleaf_Package.zip
4. Extract all files
5. Set the main document to Prascevicius_Martynas_DistilBERT.tex
6. Click "Recompile"
7. Download the PDF
Option B - Local:

```bash
cd Overleaf_Upload
pdflatex Prascevicius_Martynas_DistilBERT.tex
bibtex Prascevicius_Martynas_DistilBERT
pdflatex Prascevicius_Martynas_DistilBERT.tex
pdflatex Prascevicius_Martynas_DistilBERT.tex
```

To package the source code:

```bash
cd /Users/m2000uk/Desktop/coding/AI/CW2
zip -r Prascevicius_Martynas_CW2_Code.zip \
    src/ \
    results/ \
    figures/ \
    data/ \
    models/ \
    demo_inference.py \
    requirements.txt \
    README.md
```

To run the demo:

```bash
cd /Users/m2000uk/Desktop/coding/AI/CW2
source ../venv/bin/activate
python3 demo_inference.py
```

Presentation preparation:
- Read PRESENTATION_GUIDE.md
- Create PowerPoint/Keynote slides
- Practice timing (aim for 4:45-5:00)
- Test the demo if using it
- LaTeX PDF: Prascevicius_Martynas_DistilBERT.pdf
- Source Code ZIP: Prascevicius_Martynas_CW2_Code.zip
- Presentation (if required): 5-minute video or slides
- Deadline: November 19, 2025, 5pm UK
- Grace Period: Until November 21, 2025, 5pm UK
- Total experiments: 11
- Total compute time: ~20 hours
- Best accuracy: 91.04% (lr_1e5)
- Worst accuracy: 90.06% (lr_5e5)
- Accuracy range: 0.98%
- Training samples: 25,000 (IMDB)
- Test samples: 25,000 (IMDB)
- Total lines of code: ~2,500
- Python files: 8
- JSON result files: 11
- Figures generated: 5 (10 files: PDF + PNG)
- LaTeX report: ~4,000 words
- Presentation guide: ~3,500 words
- Code comments: ~500 lines
- README files: 4 documents
- Device: Mac mini M4
- RAM: 24 GB unified memory
- GPU: Metal Performance Shaders (MPS)
- OS: macOS 26.0.1
- Avg. GPU utilization: 98.36%
- Vaswani et al. (2017) - Attention is All You Need (Transformers)
- Devlin et al. (2019) - BERT: Pre-training of Deep Bidirectional Transformers
- Sanh et al. (2019) - DistilBERT: Smaller, Faster, Cheaper, Lighter
- Liu et al. (2019) - RoBERTa: Robustly Optimized BERT
- Howard & Ruder (2018) - ULMFiT: Universal Language Model Fine-tuning
- Sun et al. (2019) - How to Fine-Tune BERT for Text Classification
- Smith et al. (2017) - Don't Decay the Learning Rate, Increase the Batch Size
- Loshchilov & Hutter (2017) - Decoupled Weight Decay Regularization (AdamW)
- Masters & Luschi (2018) - Revisiting Small Batch Training
- Maas et al. (2011) - Learning Word Vectors for Sentiment Analysis (IMDB)
- Reimers & Gurevych (2019) - Sentence-BERT: Sentence Embeddings
- Base model: DistilBERT-base-uncased
- Parameters: 66,364,418 (all trainable)
- Layers: 6 transformer layers
- Hidden size: 768
- Attention heads: 12
- Vocabulary: 30,522 tokens
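The 66,364,418 figure is consistent with the standard distilbert-base-uncased dimensions plus a single 768→2 linear classification head; this breakdown is a sketch under that assumption:

```python
vocab, max_pos, hidden, layers, classes = 30_522, 512, 768, 6, 2
ffn = 4 * hidden  # feed-forward inner dimension (3072)

embeddings = vocab * hidden + max_pos * hidden + 2 * hidden  # token + position + LayerNorm
attention = 4 * (hidden * hidden + hidden)                   # Q, K, V, output projections
feed_forward = hidden * ffn + ffn + ffn * hidden + hidden    # two linear layers with biases
per_layer = attention + feed_forward + 2 * 2 * hidden        # plus two LayerNorms
head = hidden * classes + classes                            # linear classifier

total = embeddings + layers * per_layer + head
print(f"{total:,}")  # → 66,364,418
```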
- Optimizer: AdamW (weight decay 0.01)
- Gradient clipping: Max norm 1.0
- Scheduler: None (constant LR)
- Loss function: Cross-entropy
- Evaluation: Every epoch
- Early stopping: Based on validation loss
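Gradient clipping to max norm 1.0 rescales the whole gradient vector whenever its L2 norm exceeds the threshold; PyTorch's `clip_grad_norm_` applies the same rule across all parameters. A plain-Python sketch of the rule itself:

```python
import math

def clip_by_norm(grads, max_norm=1.0):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads

# Norm of [3, 4] is 5, which exceeds 1.0, so the vector is scaled down:
print(clip_by_norm([3.0, 4.0]))
```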
- Tokenizer: DistilBERT WordPiece
- Max sequence length: 256 tokens
- Padding: Right-side padding
- Truncation: Enabled
- Special tokens: [CLS], [SEP]
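The truncation and right-padding rules above amount to the following shape logic (a sketch with made-up token ids; in the actual uncased WordPiece vocabulary, [CLS] is id 101, [SEP] is 102, and [PAD] is 0):

```python
def prepare(token_ids, max_length=256, pad_id=0, cls_id=101, sep_id=102):
    """Truncate to max_length (reserving room for [CLS]/[SEP]), then right-pad."""
    body = token_ids[: max_length - 2]                   # truncation
    seq = [cls_id] + body + [sep_id]                     # add special tokens
    attn = [1] * len(seq) + [0] * (max_length - len(seq))
    seq = seq + [pad_id] * (max_length - len(seq))       # right-side padding
    return seq, attn

seq, attn = prepare(list(range(1, 1001)))  # a long "review" gets truncated
print(len(seq), seq[0], seq[-1], sum(attn))  # → 256 101 102 256
```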
- Conservative LRs outperform BERT recommendations for sentiment tasks
- Batch size-generalization gap confirmed and quantified
- Overfitting onset precisely identified (between epochs 3-4)
- Systematic single-model exploration vs. shallow multi-model comparison
- Controlled experiments with one variable per phase
- Literature-grounded hypotheses tested empirically
- Optimal configuration for DistilBERT sentiment analysis
- Trade-off analysis (speed vs. accuracy vs. overfitting)
- Actionable recommendations for similar tasks
- LaTeX report compiled successfully
- All 5 figures appear in PDF
- All 11 references numbered correctly
- No compilation errors or warnings
- Source code ZIP created
- Demo script tested
- Presentation guide reviewed
- No "Made by mpcode" in LaTeX (university submission)
- Student name and ID on all documents
- AI Use Declaration included in report
- All figures have captions and are referenced
- Equations generated by LaTeX (not screenshots)
- Bibliography formatted correctly
- PDF report (4 pages + references)
- Source code ZIP with all experiments
- Overleaf package (104 KB)
- Presentation materials (demo + guide)
- Complete documentation
- Systematic approach - Controlled experiments yielded clear insights
- Automation - Scripts enabled overnight experiment runs
- Visualization - Publication-quality figures tell the story
- Documentation - Comprehensive guides for future reference
- MPS non-determinism - Documented variance (±0.02%)
- Long training times - Automated sequential execution
- Batch size impact - Discovered generalization gap empirically
- Overfitting detection - Identified precise onset timing
- Learning rate scheduling - Test linear, cosine, polynomial
- Warmup steps - Explore 0, 100, 500, 1000 steps
- Other datasets - Validate on SST-2, Yelp, Amazon
- Model variants - Compare with BERT-base, RoBERTa
- Student: Martynas Prascevicius
- Student ID: 001263199
- Email: mpcode@icloud.com
- University: University of Greenwich
- Course: COMP1818 Artificial Intelligence Applications
- Academic Year: 2025-26
- All tasks finished: November 15, 2025
- Total time invested: ~30 hours (experiments + analysis + writing)
- Ready for submission: Yes ✅
Good luck with the presentation! 🎯
This document was generated as part of CW2: DistilBERT Hyperparameter Exploration Last updated: November 15, 2025