
Regime B: lr=5e-3, temp clamp 0.15 at ep40 (aggressive exploration)#1244

Closed
tcapelle wants to merge 2 commits into noam from exp-noam/regime-b


Conversation

Contributor

@tcapelle tcapelle commented Mar 19, 2026

Hypothesis

Higher LR for faster exploration, sharper temperature earlier for committed routing.

Instructions

Change: lr=5e-3 (both groups scale proportionally), temp clamp to 0.15 (not 0.25) starting at ep40 (not ep50). Run with --wandb_group regime-b.
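The delta against the baseline regime can be sketched as a config diff. This is a minimal sketch with hypothetical field names (not the repo's actual config schema); the baseline values of lr=3e-3 and temp 0.25 at ep50 come from the follow-up notes below.

```python
# Hypothetical config delta for Regime B. Field names are illustrative,
# not the repo's actual schema; values are taken from the PR description.
baseline = {
    "lr": 3e-3,            # both param groups scale proportionally from this
    "temp_clamp": 0.25,    # router temperature floor
    "temp_clamp_epoch": 50,
}

# Regime B: higher LR, sharper temperature floor, clamped 10 epochs earlier.
regime_b = dict(baseline, lr=5e-3, temp_clamp=0.15, temp_clamp_epoch=40)
```

The run itself is launched with `--wandb_group regime-b`, as noted above.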

Baseline (verified frontier, 4 consecutive plateau rounds)

  • mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
  • 50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: 8e70fbc6 (fern/regime-b-lr5e-3-temp0.15-ep40)
Epochs completed: 57/100 (hit 30-min timeout)
Peak VRAM: 14.7 GB

Surface MAE pressure (primary metric)

| Split    | This run | Baseline | Delta |
|----------|---------:|---------:|------:|
| in_dist  | 18.9     | 17.5     | +1.4  |
| ood_cond | 14.9     | 14.3     | +0.6  |
| ood_re   | 28.4     | 27.7     | +0.7  |
| tandem   | 40.9     | 37.7     | +3.2  |
| mean3    | 24.9     | 23.2     | +1.7  |
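The mean3 row is consistent with averaging the in_dist, ood_cond, and tandem splits. That definition is inferred from the reported numbers, not stated in the PR:

```python
# Sanity check: mean3 appears to average in_dist, ood_cond, and tandem
# (inferred from the reported values; the PR does not define mean3).
this_run = {"in_dist": 18.9, "ood_cond": 14.9, "ood_re": 28.4, "tandem": 40.9}
baseline = {"in_dist": 17.5, "ood_cond": 14.3, "ood_re": 27.7, "tandem": 37.7}

def mean3(m):
    return round((m["in_dist"] + m["ood_cond"] + m["tandem"]) / 3, 1)

print(mean3(this_run))  # 24.9, matching the table
print(mean3(baseline))  # 23.2
```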

Surface MAE (Ux, Uy, p)

| Split    |   Ux |   Uy |    p |
|----------|-----:|-----:|-----:|
| in_dist  | 5.64 | 1.82 | 18.9 |
| ood_cond | 3.12 | 1.10 | 14.9 |
| ood_re   | 2.55 | 0.90 | 28.4 |
| tandem   | 6.35 | 2.48 | 40.9 |

Volume MAE (Ux, Uy, p)

| Split    |   Ux |   Uy |    p |
|----------|-----:|-----:|-----:|
| in_dist  | 1.06 | 0.38 | 19.9 |
| ood_cond | 0.71 | 0.29 | 12.8 |
| ood_re   | 0.82 | 0.37 | 47.3 |
| tandem   | 1.97 | 0.92 | 40.1 |

val/loss: 0.9134

What happened

Regime B did not improve on the baseline. mean3 regressed from 23.2 → 24.9 (+7.3%), with tandem the worst affected (+8.5%); every split got worse.

The combination of higher LR (5e-3) and sharper temperature clamped earlier (0.15 at ep40) appears to be too aggressive. At epoch 40, when temperature clamping kicks in, the model is still in active learning (val_in_dist loss ~1.1); forcing committed routing that early before representations have matured seems to hurt generalization. The higher LR may also contribute by causing the model to overshoot during the early rapid-learning phase.
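The clamp schedule described above can be sketched as follows. The annealing shape (linear decay to a floor) is an assumption for illustration; the repo's actual schedule code is not shown in the PR:

```python
# Illustrative router-temperature schedule (shape assumed, not the repo's
# code). Temperature anneals from t0 down to a floor; Regime B lowers the
# floor from 0.25 to 0.15 and moves the clamp epoch forward from 50 to 40,
# forcing committed routing while the model is still in active learning.
def router_temp(epoch, t0=1.0, floor=0.15, clamp_epoch=40):
    if epoch >= clamp_epoch:
        return floor
    # linear anneal over the pre-clamp epochs
    frac = epoch / clamp_epoch
    return t0 + frac * (floor - t0)
```

Under this sketch the floor is already in force at epoch 40, roughly where val_in_dist loss is still ~1.1 in this run.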

The training loss was still declining smoothly at timeout (vol=0.154, surf=0.025), so this is not a divergence issue — it's a ceiling/generalization problem.

Suggested follow-ups

  1. Isolate LR effect: keep lr=5e-3 but use original temp clamp (0.25 at ep50) — test if higher LR alone is beneficial
  2. Isolate temp effect: keep lr=3e-3 but use temp 0.15 at ep40 — test if earlier/sharper temp alone is beneficial
  3. Softer temp earlier: try temp 0.20 at ep45 — less aggressive than regime B
  4. Later temp clamp: try 0.15 at ep55 or ep60 — let representations mature before committing routing
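The four follow-ups form a small ablation grid around the current regime. A minimal sketch, with hypothetical keys and the "later clamp" variant pinned at ep55 (the suggestion leaves ep55 vs. ep60 open):

```python
# Hypothetical ablation grid for the four follow-ups above. Keys and names
# are illustrative; lr/temp/epoch values come from the suggestion list,
# with baseline lr=3e-3.
followups = [
    {"name": "lr-only",     "lr": 5e-3, "temp": 0.25, "clamp_ep": 50},  # (1)
    {"name": "temp-only",   "lr": 3e-3, "temp": 0.15, "clamp_ep": 40},  # (2)
    {"name": "softer-temp", "lr": 3e-3, "temp": 0.20, "clamp_ep": 45},  # (3)
    {"name": "later-clamp", "lr": 3e-3, "temp": 0.15, "clamp_ep": 55},  # (4)
]
```

Runs (1) and (2) isolate the two variables; (3) and (4) probe gentler points on the same axes.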

@tcapelle tcapelle added status:wip Student is working on it student:fern Assigned to fern noam Noam advisor branch experiments labels Mar 19, 2026
@github-actions

github-actions Bot commented Mar 19, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-fern
senpai-advisor and senpai-fern do not appear to be GitHub users. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

… exploration)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tcapelle tcapelle marked this pull request as ready for review March 19, 2026 10:01
@tcapelle tcapelle added status:review Ready for advisor review and removed status:wip Student is working on it labels Mar 19, 2026
@tcapelle
Contributor Author

Review: Closed — Too Aggressive

Higher LR (5e-3) with earlier/sharper temperature clamping (0.15 at ep40) hurt generalization across all splits. mean3=24.9 vs baseline 23.2 (+7.3% worse). Tandem took the biggest hit (+8.5%). Forcing committed routing at ep40 before representations mature damages OOD transfer.

@tcapelle tcapelle closed this Mar 19, 2026
@tcapelle tcapelle deleted the exp-noam/regime-b branch March 19, 2026 10:10
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 19, 2026