Regime B: lr=5e-3, temp clamp 0.15 at ep40 (aggressive exploration)#1244
… exploration) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review: Closed (Too Aggressive)
Higher LR (5e-3) with earlier/sharper temperature clamping (0.15 at ep40) hurt generalization across all splits. mean3=24.9 vs baseline 23.2 (+7.3% worse). Tandem took the biggest hit (+8.5%). Forcing committed routing at ep40 before representations mature damages OOD transfer.
Hypothesis
Higher LR for faster exploration, sharper temperature earlier for committed routing.
Instructions
Change: lr=5e-3 (both groups scale proportionally), temp clamp to 0.15 (not 0.25) starting at ep40 (not ep50). Run with --wandb_group regime-b.
Baseline (verified frontier, 4 consecutive plateau rounds)
Results
W&B run: 8e70fbc6 (fern/regime-b-lr5e-3-temp0.15-ep40)
Epochs completed: 57/100 (hit 30-min timeout)
Peak VRAM: 14.7 GB
Surface MAE pressure (primary metric)
Surface MAE (Ux, Uy, p)
Volume MAE (Ux, Uy, p)
val/loss: 0.9134
What happened
Regime B did not improve on baseline. mean3 regressed from 23.2 to 24.9 (+7.3%), with tandem the worst affected (+8.5%). All splits got worse.
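As a quick sanity check on the headline number, the relative regression can be recomputed from the two mean3 values reported here. This is only an illustrative sketch: `relative_change` is a hypothetical helper, not part of the training code, and only the mean3 figures (23.2 and 24.9) come from this report.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline (positive = worse for MAE)."""
    return (candidate - baseline) / baseline * 100.0


# mean3 values taken from this report: baseline 23.2, Regime B 24.9
mean3_change = relative_change(23.2, 24.9)
print(f"mean3 change: {mean3_change:+.1f}%")  # prints "mean3 change: +7.3%"
```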
The combination of a higher LR (5e-3) and an earlier, sharper temperature clamp (0.15 at ep40) appears to be too aggressive. At epoch 40, when temperature clamping kicks in, the model is still actively learning (val_in_dist loss ~1.1); forcing committed routing that early, before representations have matured, seems to hurt generalization. The higher LR may also contribute by causing the model to overshoot during the early rapid-learning phase.
The training loss was still declining smoothly at timeout (vol=0.154, surf=0.025), so this is not a divergence issue but a ceiling/generalization problem.
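To make the "committed routing" effect concrete, here is a minimal sketch of how an epoch-gated temperature clamp sharpens a softmax over routing logits. It assumes the clamp is an upper bound on the temperature applied from a given epoch onward; the actual implementation in this repository is not shown in the PR and may differ. The function names, the example logits, and the unclamped temperature of 1.0 are all hypothetical.

```python
import math


def clamp_temperature(temp, epoch, clamp_value=0.15, start_epoch=40):
    """Cap the temperature at clamp_value once epoch >= start_epoch.

    Assumption: the clamp is an upper bound; the real training code may differ.
    """
    if epoch >= start_epoch:
        return min(temp, clamp_value)
    return temp


def routing_weights(logits, temp):
    """Temperature-scaled softmax; lower temp gives a more peaked distribution."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 0.8, 0.5]  # hypothetical routing logits
soft = routing_weights(logits, clamp_temperature(1.0, epoch=39))   # before the clamp
sharp = routing_weights(logits, clamp_temperature(1.0, epoch=40))  # clamp active

print("ep39:", [round(w, 3) for w in soft])
print("ep40:", [round(w, 3) for w in sharp])
```

Under these assumptions the same logits produce a much more peaked distribution once the clamp is active, which is the sense in which early clamping commits the router before the underlying representations have settled.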
Suggested follow-ups