
Regime E: EMA from ep25 + decay=0.997 + T_max=72 (longer EMA window)#1247

Closed
tcapelle wants to merge 2 commits into noam from exp-noam/regime-e

Conversation

Contributor

@tcapelle tcapelle commented Mar 19, 2026

Hypothesis

Earlier EMA start + slightly faster decay + longer LR schedule = more epochs of productive averaging.

Instructions

Change: ema_start_epoch=25, ema_decay=0.997, T_max=72. Run with --wandb_group regime-e.
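
The EMA schedule described above can be sketched as a plain weight-space moving average (a minimal illustration, not the repository's training code; the parameter names mirror the config, everything else is a hypothetical stand-in):

```python
# Sketch of the Regime E EMA update: averaging starts at ema_start_epoch=25
# with ema_decay=0.997. Weights are modeled as a plain name -> float dict.

def ema_update(ema_weights, weights, epoch, ema_start_epoch=25, ema_decay=0.997):
    """Blend current weights into the EMA copy once averaging has started."""
    if epoch < ema_start_epoch:
        # Before the start epoch, the EMA copy simply tracks the raw weights.
        return dict(weights)
    return {
        name: ema_decay * ema_weights[name] + (1.0 - ema_decay) * w
        for name, w in weights.items()
    }
```

With decay=0.997 the effective averaging window is roughly 1/(1-0.997) ≈ 333 updates, so starting the EMA at epoch 25 rather than later is what buys the extra epochs of productive averaging the hypothesis is after.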

Baseline (verified frontier, 4 consecutive plateau rounds)

  • mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
  • 50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: inhta48l
Status: Timed out at epoch 55/100 (30-min cap)

Metrics at epoch 55 (EMA model, mid-run)

| Split | val/loss | surf Ux | surf Uy | surf p | vol MAE (Ux/Uy/p) |
| --- | --- | --- | --- | --- | --- |
| in_dist | 0.6440 | 6.09 | 1.58 | 19.6 | 1.13 / 0.37 / 20.3 |
| ood_cond | 0.7364 | 3.45 | 1.02 | 14.6 | 0.75 / 0.28 / 12.5 |
| ood_re | 0.5606 | 3.12 | 0.85 | 28.3 | 0.83 / 0.36 / 47.2 |
| tandem | 1.6969 | 6.59 | 2.15 | 40.5 | 1.97 / 0.90 / 39.2 |

mean3 (surf p averaged over in_dist, ood_cond, tandem): 24.9 vs baseline 23.2, worse (+7.3%)
in=19.6 (+2.1), ood=14.6 (+0.3), re=28.3 (+0.6), tan=40.5 (+2.8)

Peak memory: ~87.8 GB (of 96 GB)
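
As a sanity check, the headline number follows directly from the surf p column above (plain arithmetic over the reported values, no other assumptions):

```python
# Recompute mean3: surf p averaged over in_dist, ood_cond, tandem.
surf_p = {"in_dist": 19.6, "ood_cond": 14.6, "tandem": 40.5}
mean3 = sum(surf_p.values()) / 3
baseline = 23.2
delta_pct = (mean3 / baseline - 1) * 100

print(round(mean3, 1))      # 24.9
print(round(delta_pct, 1))  # 7.3
```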

What happened

The run hit the 30-minute timeout at epoch 55/100, so these results are mid-training, not final, which makes a fair comparison to the baseline impossible. The core problem: with T_max=72 (vs 62 baseline), the cosine LR schedule at epoch 55 is further from its minimum (45/72 schedule steps complete, vs ~45/62 for the baseline at the same epoch). The model is operating in a higher-LR regime with more active weight updates, so it has not yet converged to its best checkpoint. The EMA model (active since epoch 25, i.e. 30 epochs of averaging) is therefore averaging over a noisier trajectory than the baseline EMA model was.
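
The LR gap can be made concrete with standard cosine annealing (a sketch; values are normalized to lr_max, and the schedule-start epoch of 10 is an inference from the schedule completing near epoch 82, i.e. 82 - 72):

```python
import math

def cosine_lr(step, t_max, lr_max=1.0, lr_min=0.0):
    """Standard cosine annealing: lr_max at step 0, lr_min at step t_max."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / t_max))

schedule_start = 10          # assumption: 72-epoch schedule ends near epoch 82
step = 55 - schedule_start   # 45 schedule steps completed at the timeout

print(f"Regime E (T_max=72): lr = {cosine_lr(step, 72):.3f} * lr_max")
print(f"Baseline (T_max=62): lr = {cosine_lr(step, 62):.3f} * lr_max")
```

Under these assumptions Regime E was still running at roughly 0.31 of lr_max at the timeout, versus about 0.17 for the baseline at the same epoch, nearly twice the learning rate, which is consistent with the noisier EMA trajectory described above.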

All 4 splits show worse performance at epoch 55 vs baseline. But since T_max=72 is designed to shift productive learning later in training, the final-epoch comparison would be more meaningful — which we can't see due to the timeout.

Verdict: Inconclusive. The T_max=72 change is structurally incompatible with the 30-minute training cap — the schedule only completes at epoch ~82, well past the timeout.

Suggested follow-ups

  • Isolate ema_start_epoch=25: test earlier EMA alone (keep T_max=62) — compatible with timeout and gives more EMA averaging time at the same convergence point
  • Isolate ema_decay=0.997: slightly faster decay alone may help or hurt independently
  • Shorten T_max to fit the cap: if ~55 epochs fit in 30 min, try T_max=45 or T_max=50 to ensure the schedule completes within the budget
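
The budget arithmetic behind the last suggestion (a back-of-the-envelope sketch; the schedule-start epoch of 10 is an inference from the T_max=72 schedule ending near epoch 82):

```python
# Which T_max values let the cosine schedule finish within the epochs
# that fit in the 30-minute cap?
epochs_in_budget = 55   # observed: this run reached epoch 55 before timing out
schedule_start = 10     # assumption: 82 (schedule end) minus 72 (T_max)

fits = {}
for t_max in (45, 50, 72):
    finish_epoch = schedule_start + t_max
    fits[t_max] = finish_epoch <= epochs_in_budget
    print(f"T_max={t_max}: schedule ends at epoch {finish_epoch}, fits: {fits[t_max]}")
```

Under this assumption only T_max=45 completes inside the budget; T_max=50 would also require the schedule start (or per-epoch cost) to shrink, which makes 45 the safer first try.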

@tcapelle tcapelle added status:wip Student is working on it student:alphonse Assigned to alphonse noam Noam advisor branch experiments labels Mar 19, 2026
@github-actions

github-actions Bot commented Mar 19, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the following format.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-alphonse
senpai-advisor and senpai-alphonse do not appear to be GitHub users. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@tcapelle tcapelle added status:review Ready for advisor review and removed status:wip Student is working on it labels Mar 19, 2026
@tcapelle tcapelle marked this pull request as ready for review March 19, 2026 10:05
@tcapelle
Contributor Author

Review: Closed — Schedule Incompatible with Budget

T_max=72 shifts productive learning too late — the cosine schedule at epoch 55 was only 63% complete, operating at a higher LR than baseline at the same epoch. The EMA model averaged over a noisier trajectory. mean3=24.9 vs 23.2 (+7.3%). Like Regime A, this approach needs more epochs than the 30-min cap allows.

@tcapelle tcapelle closed this Mar 19, 2026
@tcapelle tcapelle deleted the exp-noam/regime-e branch March 19, 2026 10:10
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 19, 2026
