Regime C: Remove Lookahead, lr=4e-3 (pure AdamW) by tcapelle · Pull Request #1245 · wandb/senpai

tcapelle · 2026-03-19T08:46:15Z

Hypothesis

Lookahead may be constraining late-training convergence. Pure AdamW with slightly higher LR.

Instructions

Replace Lookahead with raw AdamW. lr=4e-3. Keep everything else. Run with --wandb_group regime-c.

Baseline (verified frontier, 4 consecutive plateau rounds)

mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: 2t5k2nku
Epochs completed: 57/100 (hit 30-min wall-clock limit; killed mid-epoch 58)
Peak memory: 14.7 GB

Val loss @ epoch 57

Split	val/loss
in_dist	0.6525
ood_cond	0.7647
ood_re	0.5694
tandem	1.7149

Surface MAE @ epoch 57

Split	Ux	Uy	p
in_dist	6.39	1.85	19.31
ood_cond	3.95	1.25	15.33
ood_re	3.21	1.05	28.48
tandem	6.60	2.40	40.43

Volume MAE @ epoch 57

Split	Ux	Uy	p
in_dist	1.13	0.40	20.76
ood_cond	0.74	0.29	12.92
ood_re	0.82	0.37	47.40
tandem	2.02	0.92	40.06

mean3 @ epoch 57: (19.31 + 15.33 + 40.43) / 3 = 25.0 vs baseline 23.2 (worse)

What happened

The run was cut off at epoch 57/100 due to the 30-minute timeout, making this an incomplete evaluation. At epoch 57, surface pressure MAE is worse than baseline across all splits — tandem (40.43 vs 37.7), in_dist (19.31 vs 17.5), ood_cond (15.33 vs 14.3), ood_re (28.48 vs 27.7). All val losses were still declining at epoch 57, so the model had not converged.

Two interpretations are possible: (1) pure AdamW at lr=4e-3 converges more slowly in early-to-mid training and would close the gap given all 100 epochs; (2) Lookahead's slow-weight stabilization genuinely helps and removing it leads to a worse optimum. The incomplete run can't distinguish these. What is clear: at the 30-min mark, removing Lookahead is not beneficial.

Suggested follow-ups

Compare at a fixed epoch budget (e.g., 57 epochs): Run baseline (Lookahead + lr=3e-3) and pure AdamW side-by-side capped at the same number of epochs for a fair comparison.
Try lr=3e-3 pure AdamW: If lr=4e-3 is converging more slowly, testing the same LR without Lookahead isolates the optimizer effect.
Keep Lookahead, increase lr to 4e-3: Decouples LR increase from Lookahead removal — tests whether the higher LR alone can improve over the baseline.

github-actions · 2026-03-19T08:46:27Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-tanjiro
senpai-advisor, senpai-tanjiro seem not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You can retrigger this bot by commenting recheck in this Pull Request.}_{Posted by the CLA Assistant Lite bot.}

tcapelle · 2026-03-19T10:10:28Z

Review: Closed — Removing Lookahead Unhelpful

Pure AdamW at lr=4e-3 didn't improve over Lookahead+AdamW at lr=3e-3. mean3=25.0 vs 23.2 (+7.8%). The hypothesis that Lookahead constrains late-training convergence is not supported — Lookahead's slow-weight stabilization appears genuinely beneficial.

Experiment placeholder

5792bc7

tcapelle added status:wip Student is working on it student:tanjiro Assigned to tanjiro noam Noam advisor branch experiments labels Mar 19, 2026

Regime C: replace Lookahead with pure AdamW, lr=4e-3

9bd890a

tcapelle marked this pull request as ready for review March 19, 2026 10:01

tcapelle added status:review Ready for advisor review and removed status:wip Student is working on it labels Mar 19, 2026

tcapelle closed this Mar 19, 2026

tcapelle deleted the exp-noam/regime-c branch March 19, 2026 10:10

github-actions Bot locked and limited conversation to collaborators Mar 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regime C: Remove Lookahead, lr=4e-3 (pure AdamW)#1245

Regime C: Remove Lookahead, lr=4e-3 (pure AdamW)#1245
tcapelle wants to merge 2 commits intonoamfrom
exp-noam/regime-c

tcapelle commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

tcapelle commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tcapelle commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Hypothesis

Instructions

Baseline (verified frontier, 4 consecutive plateau rounds)

Results

Val loss @ epoch 57

Surface MAE @ epoch 57

Volume MAE @ epoch 57

What happened

Suggested follow-ups

Uh oh!

github-actions Bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tcapelle commented Mar 19, 2026

Review: Closed — Removing Lookahead Unhelpful

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tcapelle commented Mar 19, 2026 •

edited

Loading

github-actions Bot commented Mar 19, 2026 •

edited

Loading