Regime H: slice_num=48, n_hidden=160 (finer routing, narrower) by tcapelle · Pull Request #1250 · wandb/senpai

tcapelle · 2026-03-19T08:46:29Z

Hypothesis

More slices = finer spatial routing. Trade width for granularity. 48 slices at 160-dim may find better spatial decomposition.
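The PR does not show the model code, so the following is only a toy sketch of what "routing points to slices" could look like. It assumes a slice-attention design in which every mesh point is softly assigned to one of `slice_num` slices and each slice pools the features of its points. The function `route_to_slices`, the random projection `proj`, and the toy sizes are all hypothetical and are not taken from the repository.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_to_slices(points, proj, slice_num):
    """Softly assign each point to slices and pool features per slice.

    points: list of feature vectors, each of length n_hidden
    proj:   slice_num weight vectors of length n_hidden (assumed, illustrative)
    Returns (weights, tokens): per-point slice weights and per-slice pooled features.
    """
    n_hidden = len(points[0])
    weights = []
    for x in points:
        logits = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in proj]
        weights.append(softmax(logits))
    tokens = []
    for s in range(slice_num):
        mass = sum(w[s] for w in weights)  # always > 0 because softmax is positive
        tokens.append([
            sum(w[s] * x[d] for w, x in zip(weights, points)) / mass
            for d in range(n_hidden)
        ])
    return weights, tokens


random.seed(0)
n_points, n_hidden, slice_num = 200, 8, 4  # toy sizes, not the PR's 160 / 48
points = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_points)]
proj = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(slice_num)]
weights, tokens = route_to_slices(points, proj, slice_num)
print(len(tokens), "slice tokens of width", len(tokens[0]))
```

Under this assumption, raising `slice_num` gives the model more, smaller groups of points to reason over, while `n_hidden` sets the width of each pooled token. That is the granularity-for-width trade the hypothesis describes.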

Instructions

Change: slice_num=48, n_hidden=160. Run with --wandb_group regime-h.

Baseline (verified frontier, 4 consecutive plateau rounds)

mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: askeladd/regime-h-slice48-hidden160 (vn84qw4a), best epoch 62/63, state=crashed (wall-clock timeout; epoch 63 synced locally but not to W&B)

Split	mae_surf_p	vs baseline
val_in_dist	16.84	-0.66 (-3.8%)
val_ood_cond	13.82	-0.48 (-3.4%)
val_tandem_transfer	38.10	+0.40 (+1.1%)
val_ood_re	27.82	+0.12 (+0.4%)
mean3	22.9	-0.3 (-1.3%)
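The "vs baseline" column can be recomputed from the numbers above. The PR never defines mean3; the sketch below assumes it is the mean of the in_dist, ood_cond, and tandem surface-pressure MAEs (leaving out ood_re), which reproduces both the 23.2 and the 22.9 reported here. The helper names are hypothetical.

```python
# Surface-pressure MAE per split, copied from the baseline line and the table.
baseline = {"val_in_dist": 17.5, "val_ood_cond": 14.3,
            "val_tandem_transfer": 37.7, "val_ood_re": 27.7}
regime_h = {"val_in_dist": 16.84, "val_ood_cond": 13.82,
            "val_tandem_transfer": 38.10, "val_ood_re": 27.82}

# Assumption: mean3 averages these three splits and excludes val_ood_re.
MEAN3_SPLITS = ("val_in_dist", "val_ood_cond", "val_tandem_transfer")


def mean3(scores):
    """Average MAE over the three splits assumed to define mean3."""
    return sum(scores[s] for s in MEAN3_SPLITS) / len(MEAN3_SPLITS)


def delta(new, old):
    """Return (absolute change, percent change) of new relative to old."""
    diff = new - old
    return diff, 100.0 * diff / old


for split in baseline:
    diff, pct = delta(regime_h[split], baseline[split])
    print(f"{split:22s} {regime_h[split]:6.2f}  {diff:+.2f} ({pct:+.1f}%)")

print(f"mean3: baseline={mean3(baseline):.1f}, regime_h={mean3(regime_h):.1f}")
```

The baseline values are only given to one decimal place, so the mean3 delta carries a comparable rounding uncertainty.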

val/loss = 0.8648 (vs baseline ~0.87)
Peak memory: 13.0 GB (~14% of 96 GB)
62 epochs in ~29 min (~28s/epoch), ~543k params
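The per-epoch time and memory fraction follow directly from the figures quoted above; a quick check, using only those numbers:

```python
epochs = 62           # epochs completed before the wall-clock timeout
wall_minutes = 29     # approximate training time reported above
peak_gb = 13.0        # peak memory reported above
total_gb = 96.0       # device memory reported above

sec_per_epoch = wall_minutes * 60 / epochs
mem_fraction = peak_gb / total_gb
headroom_gb = total_gb - peak_gb

print(f"{sec_per_epoch:.1f} s/epoch, {mem_fraction:.1%} of memory used, "
      f"{headroom_gb:.0f} GB free")
```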

Surface MAE detail

Field	val_in_dist	val_ood_cond	val_ood_re	val_tandem
Ux	5.78	3.05	2.58	5.85
Uy	2.20	1.24	1.06	2.55
p	16.84	13.82	27.82	38.10

Volume MAE detail

Field	val_in_dist	val_ood_cond	val_ood_re	val_tandem
Ux	1.10	0.72	0.82	1.94
Uy	0.37	0.27	0.36	0.88
p	18.80	12.20	47.01	37.73

What happened

The 48-slice / 160-dim configuration improves mean3 by 0.3 points (23.2 → 22.9, -1.3%). The gain is real but modest. It is concentrated in val_in_dist and val_ood_cond (-3.8% and -3.4%), while val_ood_re and tandem are marginally worse (+0.4% and +1.1%). The model converged steadily over 62 epochs, with val/loss improving monotonically throughout.

Finer spatial routing (48 slices vs 32) appears to help, but width was narrowed from 192 to 160 dims in the same run, so part of the gain may come from the regularizing effect of a smaller model rather than from the routing itself. The net result is a small win.

Memory is very lean at 13.0 GB vs 96 GB available — the smaller model leaves substantial headroom.

Suggested follow-ups

slice_num=48, n_hidden=192 — keep finer routing, restore full width. Isolates whether gain comes from slices or reduced model size (regularization).
slice_num=32, n_hidden=160 — test narrower width alone, without extra slices.
n_layers=2 with this config — 13 GB footprint leaves ~83 GB free; a second layer at these smaller dims may be affordable and extract more benefit.
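Separating the slice effect from the width effect amounts to filling in a 2x2 grid over the two settings this PR changed. The sketch below enumerates that grid; the baseline values (32 slices, 192 dims) come from the discussion above, while `factorial_grid` is a hypothetical helper and not part of the training code.

```python
from itertools import product

BASELINE = {"slice_num": 32, "n_hidden": 192}
REGIME_H = {"slice_num": 48, "n_hidden": 160}


def factorial_grid(base, variant):
    """Enumerate every mix of base/variant values, noting which keys changed."""
    keys = sorted(base)
    grid = []
    for values in product(*[(base[k], variant[k]) for k in keys]):
        cfg = dict(zip(keys, values))
        changed = [k for k in keys if cfg[k] != base[k]]
        grid.append((cfg, changed))
    return grid


for cfg, changed in factorial_grid(BASELINE, REGIME_H):
    label = "baseline" if not changed else "changes " + " + ".join(changed)
    print(cfg, "->", label)
```

Two of the four cells (the baseline and Regime H) already have results; the remaining two are the single-factor runs that would show how much each change contributes on its own.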

github-actions · 2026-03-19T08:46:41Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.

I have read the CLA Document and I hereby sign the CLA

0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-askeladd
senpai-advisor and senpai-askeladd do not seem to be GitHub users. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tcapelle · 2026-03-19T10:24:33Z

Review: MERGE — Beats Baseline!

Excellent work, askeladd. Regime H (slice_num=48, n_hidden=160) is the first experiment to beat baseline in this regime round.

Key improvements:

val_loss: 0.8648 vs 0.865 (new best)
in_dist surface p: 16.84 vs 17.5 (-3.8%)
ood_cond surface p: 13.82 vs 14.3 (-3.4%)

The finer spatial routing (48 slices vs 32) combined with the narrower width (160 vs 192) has found a better operating point. The lean memory footprint (13 GB) also opens the door for depth experiments.

re_p and tan_p are marginally worse (+0.4%, +1.1%), but the in_dist and ood_cond gains more than compensate. Merging immediately.

Experiment placeholder

6a53b19

tcapelle added the status:wip (Student is working on it), student:askeladd (Assigned to askeladd), and noam (Noam advisor branch experiments) labels Mar 19, 2026

Regime H: slice_num=48, n_hidden=160 (finer routing, narrower)

206b17d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tcapelle marked this pull request as ready for review March 19, 2026 10:14

tcapelle added the status:review (Ready for advisor review) label and removed the status:wip (Student is working on it) label Mar 19, 2026

tcapelle merged commit 55f32c6 into noam Mar 19, 2026
2 of 3 checks passed

github-actions Bot locked and limited conversation to collaborators Mar 19, 2026

tcapelle merged 2 commits into noam from exp-noam/regime-h
