Our project fine-tunes a pretrained ResNet-152 model on an emotion image dataset using augmentation, regularization, and optimization strategies to maximize accuracy.
At the end, we run inference on the test set and write the predictions to a CSV file (a minimal sketch appears at the end of this section).
- Stratified train/val split (20% val) with a fixed seed for reproducibility; the data pipeline is sketched after this list.
- Image size 224×224, batch size 32, DataLoader with workers and pinned memory for speed.
- Heavy train-time augmentations (crops, flips, color jitter, rotations, blur, erasing).
- Simple val transforms for consistent evaluation.
- Optional TTA (3 flipped views) for more stable validation.
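
A minimal sketch of the data pipeline described above, assuming hypothetical names `labels` (per-image class ids) and `EmotionDataset` (a dataset built from indices and a transform); the exact augmentation parameters and worker count are illustrative, not the project's tuned values.

```python
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader
from torchvision import transforms

IMG_SIZE, BATCH_SIZE, VAL_FRACTION, SEED = 224, 32, 0.2, 42

# Stratified split with a fixed seed for reproducibility.
# `labels` is a hypothetical list of class ids, one per image.
train_idx, val_idx = train_test_split(
    list(range(len(labels))), test_size=VAL_FRACTION, stratify=labels, random_state=SEED
)

# Heavy train-time augmentations; the specific magnitudes here are illustrative.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(IMG_SIZE, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.RandomRotation(15),
    transforms.GaussianBlur(3),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),
])

# Simple val transforms for consistent evaluation.
val_tf = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Multiple workers and pinned memory speed up host-to-GPU transfer.
# `EmotionDataset` is a hypothetical dataset class wrapping indices + transform.
train_loader = DataLoader(EmotionDataset(train_idx, train_tf), batch_size=BATCH_SIZE,
                          shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(EmotionDataset(val_idx, val_tf), batch_size=BATCH_SIZE,
                        shuffle=False, num_workers=4, pin_memory=True)
```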
- Pretrained backbone: ResNet-152 (ImageNet weights).
- AdamW optimizer with discriminative learning rates (lower LR for the backbone, higher LR for the head); see the training sketch after this list.
- CosineAnnealingWarmRestarts scheduler with warmup for smooth LR transitions.
- Label smoothing loss (smoothing=0.1) to reduce overconfidence.
- Mixup (p=0.5, alpha=0.2) for better calibration and generalization.
- Gradient clipping (max_norm=1.0) and AMP for stability and efficiency.
- Resume from checkpoint if available (model, optimizer, scheduler states).
- Early stopping and best model saving based on val accuracy.
- Metrics logged each epoch for monitoring performance.
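
A condensed sketch of the optimization setup above, assuming a hypothetical `NUM_CLASSES` constant and using one reading of the LR/WD values listed below (backbone 1e-5 / 1e-4, head 1e-3 / 1e-3). Warmup, checkpoint resume, early stopping, and per-epoch metric logging are omitted for brevity, and the mixup helper is a simplified illustration rather than the project's exact code.

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained ResNet-152 backbone, head replaced for the emotion classes.
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # NUM_CLASSES is hypothetical
model.to(device)

# Discriminative LRs: lower LR/WD for the backbone, higher for the head.
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
optimizer = torch.optim.AdamW([
    {"params": backbone_params, "lr": 1e-5, "weight_decay": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3, "weight_decay": 1e-3},
])

# Cosine annealing with warm restarts (warmup layered on top is omitted here).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

# Label smoothing to reduce overconfidence.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def mixup(x, y, alpha=0.2):
    """Mix a batch with a shuffled copy of itself; return mixed inputs and both label sets."""
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")

def train_step(images, targets):
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad(set_to_none=True)
    use_mixup = np.random.rand() < 0.5                    # mixup applied with p = 0.5
    with torch.cuda.amp.autocast(enabled=device == "cuda"):
        if use_mixup:
            mixed, y_a, y_b, lam = mixup(images, targets, alpha=0.2)
            logits = model(mixed)
            loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)
        else:
            loss = criterion(model(images), targets)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                            # unscale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```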
- IMG_SIZE=224, BATCH_SIZE=32, VAL_FRACTION=0.2, SEED=42
- LR: backbone 1e-5/1e-4, head 1e-3, WD 1e-4/1e-3
- TTA every 5 epochs, AMP on CUDA (TTA sketched below)
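
A small illustration of the flip-based TTA used for the periodic validation pass: predictions are averaged over three views of each batch. The exact view composition (original, horizontal flip, vertical flip) is an assumption, not confirmed by the project code.

```python
import torch

@torch.no_grad()
def tta_predict(model, images):
    """Average softmax probabilities over three views; the view choice is an assumption."""
    model.eval()
    views = [
        images,                         # original
        torch.flip(images, dims=[3]),   # horizontal flip
        torch.flip(images, dims=[2]),   # vertical flip
    ]
    probs = torch.stack([model(v).softmax(dim=1) for v in views]).mean(dim=0)
    return probs.argmax(dim=1)
```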
- Strong pretrained backbone accelerates learning with limited data.
- Heavy augmentation, label smoothing, and mixup reduce overfitting.
- Discriminative LRs + cosine schedule improve fine-tuning stability.
- AMP and clipping keep training efficient and stable on hackathon hardware.
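
Finally, a minimal sketch of the test-set inference and CSV export mentioned at the top; a `test_loader` yielding `(image_batch, image_ids)`, the `class_names` mapping, and the CSV column names are all hypothetical and depend on the dataset format.

```python
import csv
import torch

@torch.no_grad()
def predict_to_csv(model, test_loader, class_names, out_path="predictions.csv"):
    """Run the fine-tuned model on the test set and write one (id, label) row per image."""
    model.eval()
    device = next(model.parameters()).device
    rows = []
    for images, image_ids in test_loader:                 # hypothetical loader format
        preds = model(images.to(device)).argmax(dim=1).cpu().tolist()
        rows.extend(zip(image_ids, (class_names[p] for p in preds)))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])                  # hypothetical column names
        writer.writerows(rows)
```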