
Presentation Speaking Script

Student: Martynas Prascevicius (001263199) Duration: 5 minutes


SLIDE 1: Title (15 seconds)

Hello, my name is Martynas Prascevicius.

This is my project on optimizing DistilBERT for sentiment analysis.

I focused on ONE model and tested different settings to find the best ones.


SLIDE 2: The Problem (25 seconds)

DistilBERT comes with recommended settings from the BERT paper: learning rate 2e-5 to 5e-5, batch size 16, and 3 epochs.

But are these REALLY the best settings for sentiment analysis?

My research question is: What are the best settings for DistilBERT when analyzing movie reviews?


SLIDE 3: Methodology (40 seconds)

I ran 11 experiments in 4 phases. Each phase tests ONE setting while keeping everything else the same.

Phase 1: Baseline with recommended settings - 90.59% accuracy on IMDB reviews.

Phase 2: Tested 4 learning rates - 1e-5, 2e-5, 3e-5, and 5e-5.

Phase 3: Tested 3 batch sizes - 8, 16, and 32.

Phase 4: Tested 3, 4, and 5 epochs.

I used the IMDB dataset with 25,000 reviews for training and 25,000 for testing. Total training time was 20 hours on a Mac mini M4.
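The one-factor-at-a-time design above can be written down as a small grid builder. This is an illustrative sketch, not the project's actual code; the field names are assumptions:

```python
# One-factor-at-a-time experiment grid, as described in the methodology:
# a baseline run, then sweeps over learning rate, batch size, and epochs,
# varying exactly one setting per phase while the rest stay at baseline.

BASELINE = {"lr": 2e-5, "batch_size": 16, "epochs": 3}

def build_experiments():
    experiments = [dict(BASELINE, phase="baseline")]        # Phase 1
    for lr in [1e-5, 2e-5, 3e-5, 5e-5]:                     # Phase 2
        experiments.append(dict(BASELINE, lr=lr, phase="learning_rate"))
    for bs in [8, 16, 32]:                                  # Phase 3
        experiments.append(dict(BASELINE, batch_size=bs, phase="batch_size"))
    for ep in [3, 4, 5]:                                    # Phase 4
        experiments.append(dict(BASELINE, epochs=ep, phase="epochs"))
    return experiments

print(len(build_experiments()))  # 11 runs in total
```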


SLIDE 4: Learning Rate Results (40 seconds)

My first finding is about learning rate.

2e-5 got the BEST performance at 90.62% accuracy. The slower rate 1e-5 got only 90.45%. The faster rates got worse - 5e-5 reached only 90.21%.

This validates the BERT paper recommendations. The baseline rate 2e-5 is optimal. Slower rates do not learn enough. Faster rates learn too quickly and miss the best solution.

This confirms that the original BERT guidance works well for sentiment analysis.
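For reference, the winning configuration corresponds to a standard fine-tuning setup. This is a minimal configuration sketch assuming the Hugging Face `transformers` library (the model checkpoint name and output directory are illustrative, and dataset preparation is omitted):

```python
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Fine-tuning configuration matching the best settings found:
# learning rate 2e-5, batch size 16, 3 epochs.
args = TrainingArguments(
    output_dir="distilbert-imdb",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# With tokenized train/test splits prepared separately:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=..., eval_dataset=...)
# trainer.train()
```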


SLIDE 5: Batch Size Results (30 seconds)

My second finding is about batch size.

Batch size 16 got the BEST accuracy at 90.60%. Batch 8 got 90.30%. Batch 32 got the WORST at 90.12%.

Larger batches fit the training data too tightly, which hurts generalization to new reviews. Medium-sized batches learn more general patterns. Batch 16 is 0.48 percentage points better than batch 32.


SLIDE 6: Training Duration Results (30 seconds)

My third finding is about training duration.

At 3 epochs: 90.59% test, 96.73% training - the baseline.

At 4 epochs: 90.64% test, 98.11% training - gap growing.

At 5 epochs: 90.84% test, 98.95% training - an 8.1-point gap.

Test accuracy keeps improving even as the model memorizes training data. But the growing gap signals reduced robustness. For production, 3 to 4 epochs is safer.
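The train-test gap quoted per epoch follows directly from the accuracies above. A quick check of the arithmetic (values taken from the slides, gaps in percentage points):

```python
# Train/test accuracy (%) per epoch count, from the slides.
results = {3: (96.73, 90.59), 4: (98.11, 90.64), 5: (98.95, 90.84)}

# The gap widens with longer training, even as test accuracy improves.
for epochs, (train_acc, test_acc) in results.items():
    gap = round(train_acc - test_acc, 2)
    print(f"{epochs} epochs: gap = {gap} points")
```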


SLIDE 7: Results Summary (30 seconds)

Across all 11 experiments, clear patterns emerge.

Highest accuracy: 90.84% with 5 epochs, but shows overfitting. Best learning rate: 90.62% with 2e-5. Worst: 90.12% with batch 32. Baseline: 90.59%.

The optimal configuration: learning rate 2e-5, batch size 16, and 3 to 4 epochs.

This achieves 90.6 to 90.8% accuracy - 120 to 180 fewer misclassified reviews out of 25,000 compared with the worst settings.
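The error counts quoted above follow from the accuracy differences on the 25,000-review test set. A small helper to verify (the function name is illustrative):

```python
# Convert accuracy differences (in %) into misclassified-review counts
# on the 25,000-review IMDB test set.
TEST_SIZE = 25_000

def extra_errors(better_acc, worse_acc):
    """Additional misclassified reviews at the worse accuracy."""
    return round((better_acc - worse_acc) / 100 * TEST_SIZE)

print(extra_errors(90.60, 90.12))  # batch 16 vs batch 32: 120
print(extra_errors(90.84, 90.12))  # best run vs worst run: 180
```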


SLIDE 8: Key Findings (25 seconds)

Three key contributions.

First, the BERT-recommended learning rate of 2e-5 is optimal. The original guidance is correct.

Second, larger batch sizes hurt performance. Batch 16 is 0.48 points better than batch 32.

Third, overfitting is nuanced. Test accuracy keeps improving even as the model memorizes. The 8.1-point gap at 5 epochs signals limited robustness.


SLIDE 9: Conclusion (15 seconds)

In conclusion, I ran 11 experiments testing different settings.

BERT's recommended settings work well. Careful tuning gives a 0.25-point gain over the baseline.

Thank you.


Total Duration: Approximately 4 minutes 10 seconds. Recommended pace: natural, clear speaking without rushing.


TIMING BREAKDOWN

Slide  Topic              Time  Cumulative
1      Title              15s   0:15
2      Problem            25s   0:40
3      Methodology        40s   1:20
4      Learning Rate      40s   2:00
5      Batch Size         30s   2:30
6      Training Duration  30s   3:00
7      Results Summary    30s   3:30
8      Key Findings       25s   3:55
9      Conclusion         15s   4:10

Total speaking time: 4 minutes 10 seconds With natural pauses: ~4 minutes 30 seconds Target: Under 5 minutes ✓