
Regime H: slice_num=48, n_hidden=160 (finer routing, narrower) #1250

Merged
tcapelle merged 2 commits into noam from exp-noam/regime-h on Mar 19, 2026

Conversation

@tcapelle
Contributor

@tcapelle tcapelle commented Mar 19, 2026

Hypothesis

More slices = finer spatial routing. Trade width for granularity. 48 slices at 160-dim may find better spatial decomposition.

Instructions

Change: slice_num=48, n_hidden=160. Run with --wandb_group regime-h.
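For concreteness, the two-parameter change can be sketched as a config override. Note that `ModelConfig`, its field names, and the baseline defaults shown are illustrative assumptions, not the repo's actual API; the baseline values (32 slices, 192-dim hidden) are inferred from the discussion below:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ModelConfig:
    # Assumed baseline values: 32 slices, 192-dim hidden width
    slice_num: int = 32   # number of spatial slices used for routing
    n_hidden: int = 192   # hidden width

baseline = ModelConfig()

# Regime H: finer routing (more slices), narrower hidden width
regime_h = replace(baseline, slice_num=48, n_hidden=160)
print(regime_h)  # ModelConfig(slice_num=48, n_hidden=160)
```

The run would then be launched with this config plus `--wandb_group regime-h`, per the instructions above.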

Baseline (verified frontier, 4 consecutive plateau rounds)

  • mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
  • 50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: askeladd/regime-h-slice48-hidden160 (vn84qw4a), best epoch 62/63, state=crashed (wall-clock timeout; epoch 63 synced locally but not to W&B)

Split mae_surf_p Δ vs baseline
val_in_dist 16.84 -0.66 (-3.8%)
val_ood_cond 13.82 -0.48 (-3.4%)
val_tandem_transfer 38.10 +0.40 (+1.1%)
val_ood_re 27.82 +0.12 (+0.4%)
mean3 22.9 -0.3 (-1.3%)
  • val/loss = 0.8648 (vs baseline ~0.87)
  • Peak memory: 13.0 GB (~14% of 96 GB)
  • 62 epochs in ~29 min (~28s/epoch), ~543k params
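The reported mean3 numbers are consistent with averaging the three headline surface-pressure splits (in_dist, ood_cond, tandem) and excluding ood_re. A quick sanity check, assuming that definition:

```python
def mean3(in_dist: float, ood_cond: float, tandem: float) -> float:
    # Assumed definition: average of the three headline surface-p MAE splits,
    # excluding val_ood_re. This reproduces both reported figures.
    return (in_dist + ood_cond + tandem) / 3

baseline = mean3(17.5, 14.3, 37.7)     # ~23.17, reported as 23.2
regime_h = mean3(16.84, 13.82, 38.10)  # ~22.92, reported as 22.9
print(round(baseline - regime_h, 2))   # improvement of roughly 0.25 points
```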

Surface MAE detail

Field val_in_dist val_ood_cond val_ood_re val_tandem
Ux 5.78 3.05 2.58 5.85
Uy 2.20 1.24 1.06 2.55
p 16.84 13.82 27.82 38.10

Volume MAE detail

Field val_in_dist val_ood_cond val_ood_re val_tandem
Ux 1.10 0.72 0.82 1.94
Uy 0.37 0.27 0.36 0.88
p 18.80 12.20 47.01 37.73

What happened

The 48-slice / 160-dim configuration improves mean3 by 0.3 points (23.2 → 22.9, -1.3%): real but modest. The improvement concentrates in in_dist and ood_cond (-3% to -4%), while ood_re and tandem barely change. The model converged steadily over 62 epochs, with val/loss improving monotonically throughout.

Finer spatial routing (48 slices vs 32) appears to help, though part of the gain may come from the narrower width (192 → 160) acting as regularization rather than from the routing itself. The net result is a small win.

Memory is very lean at 13.0 GB vs 96 GB available — the smaller model leaves substantial headroom.

Suggested follow-ups

  • slice_num=48, n_hidden=192 — keep finer routing, restore full width. Isolates whether gain comes from slices or reduced model size (regularization).
  • slice_num=64, n_hidden=160 — push routing granularity further at the narrower width, to check whether more slices keep helping.
  • n_layers=2 with this config — 13 GB footprint leaves ~83 GB free; a second layer at these smaller dims may be affordable and extract more benefit.
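The three follow-ups form a small ablation grid around the Regime H point. Sketched below as override dicts; the key names mirror the PR's parameter names, but the grid structure and group-naming scheme are hypothetical:

```python
# Hypothetical ablation grid around Regime H (slice_num=48, n_hidden=160).
# Each entry varies one axis: width, routing granularity, or depth.
followups = [
    {"slice_num": 48, "n_hidden": 192, "n_layers": 1},  # isolate slices: restore full width
    {"slice_num": 64, "n_hidden": 160, "n_layers": 1},  # push routing granularity further
    {"slice_num": 48, "n_hidden": 160, "n_layers": 2},  # spend the memory headroom on depth
]

for cfg in followups:
    group = f"regime-h-s{cfg['slice_num']}-h{cfg['n_hidden']}-l{cfg['n_layers']}"
    print(group, cfg)
```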

@tcapelle tcapelle added the labels status:wip (Student is working on it), student:askeladd (Assigned to askeladd), noam (Noam advisor branch), and experiments on Mar 19, 2026
@github-actions

github-actions Bot commented Mar 19, 2026


Thank you for your submission; we really appreciate it. Like many open-source projects, we ask that all contributors sign our Contributor License Agreement before we can accept a contribution. You can sign the CLA by posting a pull request comment in the following format.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-askeladd
senpai-advisor and senpai-askeladd do not appear to be GitHub users. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this pull request. Posted by the CLA Assistant Lite bot.

@tcapelle tcapelle marked this pull request as ready for review March 19, 2026 10:14
@tcapelle tcapelle added the status:review (Ready for advisor review) label and removed status:wip (Student is working on it) on Mar 19, 2026
@tcapelle
Contributor Author

Review: MERGE — Beats Baseline!

Excellent work, askeladd. Regime H (slice_num=48, n_hidden=160) is the first experiment to beat baseline in this regime round.

Key improvements:

  • val_loss: 0.8648 vs 0.865 (new best)
  • in_dist surface p: 16.84 vs 17.5 (-3.8%)
  • ood_cond surface p: 13.82 vs 14.3 (-3.4%)

The finer spatial routing (48 slices vs 32) combined with the narrower width (160 vs 192) has found a better operating point. The lean memory footprint (13 GB) also opens the door for depth experiments.

re_p and tan_p are marginally worse (+0.4%, +1.1%), but the in_dist and ood_cond gains more than compensate. Merging immediately.

@tcapelle tcapelle merged commit 55f32c6 into noam Mar 19, 2026
2 of 3 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 19, 2026
