Regime H: slice_num=48, n_hidden=160 (finer routing, narrower)#1250
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review: MERGE - Beats Baseline!
Excellent work, askeladd. Regime H (slice_num=48, n_hidden=160) is the first experiment to beat baseline in this regime round. Key improvements:
The finer spatial routing (48 slices vs 32) combined with the narrower width (160 vs 192) has found a better operating point. The lean memory footprint (13 GB) also opens the door for depth experiments. re_p and tan_p are marginally worse (+0.4%, +1.1%), but the in_dist and ood_cond gains more than compensate. Merging immediately.
Hypothesis
More slices = finer spatial routing. Trade width for granularity. 48 slices at 160-dim may find better spatial decomposition.
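To put numbers on the trade, here is a minimal sketch using the values mentioned in this PR (32 to 48 slices, 192 to 160 hidden dims). `RoutingConfig` is a hypothetical container for illustration, not the project's actual config class. The width-squared figure applies only to square n_hidden × n_hidden dense layers, which the model is assumed, not confirmed, to contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingConfig:
    """Hypothetical holder for the two knobs changed in this regime."""
    slice_num: int
    n_hidden: int


baseline = RoutingConfig(slice_num=32, n_hidden=192)
regime_h = RoutingConfig(slice_num=48, n_hidden=160)

# Relative change in routing granularity and in width.
slice_ratio = regime_h.slice_num / baseline.slice_num
width_ratio = regime_h.n_hidden / baseline.n_hidden

# Assumption: a square dense layer has n_hidden * n_hidden weights,
# so its weight count scales with the square of the width ratio.
square_layer_ratio = width_ratio ** 2

print(
    f"slices x{slice_ratio:.2f}, width x{width_ratio:.2f}, "
    f"square-layer weights x{square_layer_ratio:.2f}"
)
```

Under that assumption, each such layer keeps about 69% of its weights while the number of slices grows by half.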
Instructions
Change: slice_num=48, n_hidden=160. Run with --wandb_group regime-h.
Baseline (verified frontier, 4 consecutive plateau rounds)
Results
W&B run: askeladd/regime-h-slice48-hidden160 (vn84qw4a), best epoch 62/63, state=crashed (wall-clock timeout; epoch 63 synced locally but not to W&B)
Surface MAE detail
Volume MAE detail
What happened
The 48-slice / 160-dim configuration improves mean3 by 0.3 points (23.2 → 22.9, -1.3%), a real but modest gain. The improvement is concentrated in the in-distribution and ood_cond splits (roughly 3 to 4% lower), while ood_re and tandem barely change. The model converged steadily over 62 epochs, with val/loss improving monotonically throughout.
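The headline figure can be checked with a few lines of arithmetic. This sketch uses only the two mean3 values quoted above. The helper name `relative_change` is illustrative, not part of the training code.

```python
def relative_change(baseline: float, new: float) -> float:
    """Signed percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


mean3_baseline = 23.2
mean3_regime_h = 22.9

delta_points = mean3_regime_h - mean3_baseline
delta_pct = relative_change(mean3_baseline, mean3_regime_h)

# Lower MAE is better, so a negative change is an improvement.
print(f"{delta_points:+.1f} points, {delta_pct:+.1f}%")  # -0.3 points, -1.3%
```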
Finer spatial routing (48 slices vs 32) appears to help, but part of the gain may come from narrowing from 192 to 160 dims, which acts as regularization rather than a pure capacity improvement. The net result is a small win.
Memory is very lean at 13.0 GB vs 96 GB available — the smaller model leaves substantial headroom.
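As a rough sense of scale for the headroom, here is a small sketch using only the two memory figures reported above. It does not predict how memory would grow with depth.

```python
used_gb = 13.0
available_gb = 96.0

utilization = used_gb / available_gb    # fraction of device memory in use
headroom_gb = available_gb - used_gb    # memory left for larger variants

print(f"{utilization:.1%} used, {headroom_gb:.1f} GB free")  # 13.5% used, 83.0 GB free
```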
Suggested follow-ups