
Regime C: Remove Lookahead, lr=4e-3 (pure AdamW)#1245

Closed
tcapelle wants to merge 2 commits into noam from exp-noam/regime-c


@tcapelle tcapelle commented Mar 19, 2026

Hypothesis

Lookahead may be constraining late-training convergence. Pure AdamW with slightly higher LR.

Instructions

Replace Lookahead with raw AdamW. lr=4e-3. Keep everything else. Run with --wandb_group regime-c.

Baseline (verified frontier, 4 consecutive plateau rounds)

  • mean3=23.2 surface pressure MAE (in_dist=17.5, ood_cond=14.3, ood_re=27.7, tandem=37.7)
  • 50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: 2t5k2nku
Epochs completed: 57/100 (hit 30-min wall-clock limit; killed mid-epoch 58)
Peak memory: 14.7 GB
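
As a rough check on the time budget, the figures above give an approximate per-epoch cost and the wall-clock time a full 100-epoch run would need. This is a back-of-the-envelope sketch: it assumes every epoch takes about the same time and that the whole 30 minutes went to training, neither of which is confirmed here.

```python
# Figures taken from the run summary above.
WALL_CLOCK_S = 30 * 60      # 30-minute wall-clock limit, in seconds
EPOCHS_DONE = 57            # full epochs completed before the kill
EPOCHS_PLANNED = 100

# Upper bound on per-epoch time: part of the 30 minutes was spent in the
# unfinished epoch 58, so 57 full epochs took slightly less than this.
sec_per_epoch = WALL_CLOCK_S / EPOCHS_DONE

# Assumption: epoch time stays constant for the rest of training.
full_run_min = sec_per_epoch * EPOCHS_PLANNED / 60

print(f"~{sec_per_epoch:.1f} s/epoch, ~{full_run_min:.0f} min for {EPOCHS_PLANNED} epochs")
```

Under those assumptions, a full run needs a bit under twice the current limit.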

Val loss @ epoch 57

| Split | val/loss |
| --- | --- |
| in_dist | 0.6525 |
| ood_cond | 0.7647 |
| ood_re | 0.5694 |
| tandem | 1.7149 |

Surface MAE @ epoch 57

| Split | Ux | Uy | p |
| --- | --- | --- | --- |
| in_dist | 6.39 | 1.85 | 19.31 |
| ood_cond | 3.95 | 1.25 | 15.33 |
| ood_re | 3.21 | 1.05 | 28.48 |
| tandem | 6.60 | 2.40 | 40.43 |

Volume MAE @ epoch 57

| Split | Ux | Uy | p |
| --- | --- | --- | --- |
| in_dist | 1.13 | 0.40 | 20.76 |
| ood_cond | 0.74 | 0.29 | 12.92 |
| ood_re | 0.82 | 0.37 | 47.40 |
| tandem | 2.02 | 0.92 | 40.06 |

mean3 @ epoch 57: (19.31 + 15.33 + 40.43) / 3 = 25.0 vs baseline 23.2 (worse)
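
The mean3 arithmetic can be reproduced for both this run and the baseline. The sketch below assumes mean3 is the unweighted mean of surface pressure MAE over in_dist, ood_cond, and tandem (ood_re excluded), which is what the numbers in this PR imply; the helper name `mean3` and the dict layout are illustrative, not taken from the training code.

```python
# Surface pressure MAE per split, copied from this PR.
run = {"in_dist": 19.31, "ood_cond": 15.33, "ood_re": 28.48, "tandem": 40.43}
baseline = {"in_dist": 17.5, "ood_cond": 14.3, "ood_re": 27.7, "tandem": 37.7}

# Assumption: mean3 averages these three splits and leaves out ood_re.
MEAN3_SPLITS = ("in_dist", "ood_cond", "tandem")


def mean3(surface_p_mae):
    """Unweighted mean of surface pressure MAE over the three mean3 splits."""
    return sum(surface_p_mae[s] for s in MEAN3_SPLITS) / len(MEAN3_SPLITS)


run_m3 = mean3(run)
base_m3 = mean3(baseline)
print(f"run mean3={run_m3:.1f}, baseline mean3={base_m3:.1f}")
```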

What happened

The run was cut off at epoch 57/100 due to the 30-minute timeout, making this an incomplete evaluation. At epoch 57, surface pressure MAE is worse than baseline across all splits — tandem (40.43 vs 37.7), in_dist (19.31 vs 17.5), ood_cond (15.33 vs 14.3), ood_re (28.48 vs 27.7). All val losses were still declining at epoch 57, so the model had not converged.

Two interpretations are possible: (1) pure AdamW at lr=4e-3 converges more slowly in early-to-mid training and would close the gap given all 100 epochs; (2) Lookahead's slow-weight stabilization genuinely helps and removing it leads to a worse optimum. The incomplete run can't distinguish these. What is clear: at the 30-min mark, removing Lookahead is not beneficial.
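
For readers unfamiliar with the slow-weight mechanism, here is a minimal sketch of the Lookahead update rule in plain Python. It wraps plain gradient descent rather than AdamW to stay dependency-free, and the values of `lr`, `k`, and `alpha` are illustrative, not the baseline's settings. It is not the implementation used in this repo.

```python
def lookahead_gd(grad_fn, w0, lr=0.1, k=5, alpha=0.5, steps=20):
    """Lookahead wrapped around plain gradient descent (toy illustration)."""
    fast = list(w0)  # fast weights: updated every step by the inner optimizer
    slow = list(w0)  # slow weights: updated once every k inner steps
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        fast = [w - lr * g for w, g in zip(fast, grads)]
        if step % k == 0:
            # Move the slow weights a fraction alpha toward the fast weights,
            # then restart the fast weights from the new slow weights.
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return slow


# Toy objective f(w) = sum(w_i ** 2), whose gradient is 2 * w.
final = lookahead_gd(lambda w: [2.0 * wi for wi in w], [1.0, -2.0])
print(final)
```

Removing Lookahead, as this PR does, amounts to dropping the `if step % k == 0` branch and using the fast weights directly.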

Suggested follow-ups

  1. Compare at a fixed epoch budget (e.g., 57 epochs): Run baseline (Lookahead + lr=3e-3) and pure AdamW side-by-side capped at the same number of epochs for a fair comparison.
  2. Try lr=3e-3 pure AdamW: This run changed both the optimizer and the LR, so running pure AdamW at the baseline lr=3e-3 isolates the effect of removing Lookahead.
  3. Keep Lookahead, increase lr to 4e-3: Decouples LR increase from Lookahead removal — tests whether the higher LR alone can improve over the baseline.
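
Follow-ups 2 and 3 fill in the two missing cells of a 2x2 design over optimizer and learning rate. The sketch below enumerates that grid; the config keys (`lookahead`, `lr`) are hypothetical names for illustration and do not correspond to known flags of the training script.

```python
from itertools import product

# The two cells that already have results in this PR.
BASELINE = {"lookahead": True, "lr": 3e-3}
THIS_PR = {"lookahead": False, "lr": 4e-3}


def describe(cfg):
    """Label a grid cell by its role in the comparison."""
    if cfg == BASELINE:
        return "baseline"
    if cfg == THIS_PR:
        return "this PR (regime-c)"
    if not cfg["lookahead"]:
        return "follow-up 2: pure AdamW at the baseline LR"
    return "follow-up 3: Lookahead at the higher LR"


# Hypothetical config dicts: one per (optimizer, LR) combination.
grid = [{"lookahead": la, "lr": lr} for la, lr in product((True, False), (3e-3, 4e-3))]

for cfg in grid:
    print(cfg, "->", describe(cfg))
```

Running all four cells under the same epoch cap (follow-up 1) would separate the optimizer effect from the LR effect.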

@tcapelle tcapelle added the status:wip (Student is working on it), student:tanjiro (Assigned to tanjiro), and noam (Noam advisor branch experiments) labels Mar 19, 2026

github-actions Bot commented Mar 19, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-tanjiro
senpai-advisor, senpai-tanjiro seem not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@tcapelle tcapelle marked this pull request as ready for review March 19, 2026 10:01
@tcapelle tcapelle added the status:review (Ready for advisor review) label and removed the status:wip (Student is working on it) label Mar 19, 2026
@tcapelle
Contributor Author

Review: Closed — Removing Lookahead Unhelpful

Pure AdamW at lr=4e-3 didn't improve over Lookahead+AdamW at lr=3e-3. mean3=25.0 vs 23.2 (+7.8%). The hypothesis that Lookahead constrains late-training convergence is not supported — Lookahead's slow-weight stabilization appears genuinely beneficial.

@tcapelle tcapelle closed this Mar 19, 2026
@tcapelle tcapelle deleted the exp-noam/regime-c branch March 19, 2026 10:10
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 19, 2026

Labels

noam (Noam advisor branch experiments), status:review (Ready for advisor review), student:tanjiro (Assigned to tanjiro)
