Regime J: mlp_ratio=4 (wider FFN at same hidden dim) by tcapelle · Pull Request #1252 · wandb/senpai

tcapelle · 2026-03-19T08:46:35Z

Hypothesis

The FFN hidden layer is 2x n_hidden=384. At 4x it becomes 768, giving significantly more nonlinear capacity. The epoch time increases but the model has more expressive power per layer.

Instructions

Change: mlp_ratio=4 in model_config. Run with --wandb_group regime-j.

Baseline (verified frontier, 4 consecutive plateau rounds)

mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

Iteration 1: mlp_ratio=4

W&B run: ta02xdo7 | Epochs: 54 | Memory: 16.2 GB

Split	mae_surf_p	Baseline	Δ
val_in_dist	18.8	17.5	+1.3 ↑
val_ood_cond	15.3	14.3	+1.0 ↑
val_ood_re	28.0	27.7	+0.3 ↑
val_tandem_transfer	38.5	37.7	+0.8 ↑
mean3	24.2	23.2	+1.0 ↑

Iteration 2: mlp_ratio=3 (revised per review)

W&B run: e6d3cbls | Epochs: 57 | Memory: 15.5 GB

Split	mae_surf_p	Baseline	Δ
val_in_dist	18.9	17.5	+1.4 ↑
val_ood_cond	14.3	14.3	~0 (tied)
val_ood_re	27.9	27.7	+0.2 ↑
val_tandem_transfer	40.7	37.7	+3.0 ↑
mean3	24.6	23.2	+1.4 ↑

Full Surface MAE (mlp_ratio=3)

Split	Ux	Uy	p
val_in_dist	7.24	1.93	18.92
val_ood_cond	4.15	1.28	14.26
val_ood_re	3.66	1.14	27.87
val_tandem_transfer	6.77	2.38	40.71

Volume MAE (mlp_ratio=3)

Split	Ux	Uy	p
val_in_dist	1.12	0.37	20.17
val_ood_cond	0.73	0.28	12.22
val_ood_re	0.82	0.36	46.86
val_tandem_transfer	1.95	0.89	39.13

Val loss (mlp_ratio=3, epoch 57)

val/loss: 0.893 | val_in_dist: 0.628, val_ood_cond: 0.718, val_ood_re: 0.543, val_tandem_transfer: 1.685

What happened

Negative result for both ratios. Neither mlp_ratio=3 nor mlp_ratio=4 beats the baseline (ratio=2) within the 30-minute budget.

mlp_ratio=3 ran 57 epochs vs 54 for ratio=4 (as expected — ~30s/epoch vs ~32s). However, quality was actually slightly worse: mean3=24.6 vs 24.2 for ratio=4, and both are above the baseline 23.2. The tandem transfer split is particularly hurt (40.7 vs 37.7 baseline), suggesting the wider FFN doesn't generalize better to the hardest OOD case.

One positive signal: val_ood_cond mae_surf_p with ratio=3 ties the baseline exactly (14.3). But this single split matching doesn't offset the tandem regression.

The fundamental problem: a wider FFN adds more capacity per step but doesn't provide enough improvement per gradient step to compensate for the same (or fewer) training epochs within 30 minutes. The baseline ratio=2 remains the most epoch-efficient configuration.

Suggested follow-ups

Give up on wider FFN: Both ratio=3 and ratio=4 fail within the time budget. The capacity bottleneck appears to be elsewhere.
Wider model (n_hidden increase): If more capacity is the goal, increasing n_hidden might be more uniformly beneficial than wider FFN at fixed n_hidden.
n_layers=2 + reduced slice_num: More depth (currently n_layers=1) could add capacity with different per-step compute profile than width.
Investigate tandem transfer regression: The tandem split consistently gets worse with wider FFN. This might hint that wider FFN overfits to single-foil patterns in the training set.

github-actions · 2026-03-19T08:46:45Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-gilbert
senpai-advisor, senpai-gilbert seem not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You can retrigger this bot by commenting recheck in this Pull Request.}_{Posted by the CLA Assistant Lite bot.}

tcapelle · 2026-03-19T10:10:09Z

Review: Request Changes — Try mlp_ratio=3

Good execution, gilbert. The wider FFN (mlp_ratio=4) shows an interesting signal: re_p (28.0 vs 27.7) and tan_p (38.5 vs 37.7) are the closest to baseline among all regime experiments. val/loss=0.885 is also the second-closest to baseline (0.865).

However, the model only completed 54 epochs due to the slower epoch time (~33s vs ~28s baseline). With fewer optimization steps, it's hard to know if the wider FFN would have converged to a better minimum given the same number of epochs.

Next step: try mlp_ratio=3

This should give ~30s/epoch, fitting ~60 epochs in 30 min (vs 54 at ratio=4)
Still provides 50% more FFN capacity than baseline (ratio=2)
Keep everything else identical
Run with --wandb_group regime-j

The goal is to find the sweet spot between FFN capacity and epoch throughput within the 30-min budget.

tcapelle · 2026-03-19T10:52:55Z

Review: Closed — Both mlp_ratio iterations failed

Neither mlp_ratio=4 nor mlp_ratio=3 beat baseline. The wider FFN adds compute but not enough quality within the 30-min budget. Both iterations were on old code (pre-Regime H); the new baseline (val_loss=0.8648) makes these results even further behind. Two iterations is enough — closing. I'll assign you a fresh experiment on the updated codebase.

Experiment placeholder

99b7bb3

tcapelle added status:wip Student is working on it student:gilbert Assigned to gilbert noam Noam advisor branch experiments labels Mar 19, 2026

mlp_ratio=4: wider FFN (4x vs 2x hidden dim)

9be4b7e

tcapelle marked this pull request as ready for review March 19, 2026 10:00

tcapelle added status:review Ready for advisor review and removed status:wip Student is working on it labels Mar 19, 2026

tcapelle marked this pull request as draft March 19, 2026 10:10

tcapelle added status:wip Student is working on it and removed status:review Ready for advisor review labels Mar 19, 2026

mlp_ratio=3: intermediate FFN width (3x vs baseline 2x)

d11cb71

tcapelle marked this pull request as ready for review March 19, 2026 10:44

tcapelle added status:review Ready for advisor review and removed status:wip Student is working on it labels Mar 19, 2026

tcapelle closed this Mar 19, 2026

tcapelle deleted the exp-noam/regime-j branch March 19, 2026 10:52

github-actions Bot locked and limited conversation to collaborators Mar 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regime J: mlp_ratio=4 (wider FFN at same hidden dim)#1252

Regime J: mlp_ratio=4 (wider FFN at same hidden dim)#1252
tcapelle wants to merge 3 commits intonoamfrom
exp-noam/regime-j

tcapelle commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

tcapelle commented Mar 19, 2026

Uh oh!

tcapelle commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tcapelle commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Hypothesis

Instructions

Baseline (verified frontier, 4 consecutive plateau rounds)

Results

Iteration 1: mlp_ratio=4

Iteration 2: mlp_ratio=3 (revised per review)

Full Surface MAE (mlp_ratio=3)

Volume MAE (mlp_ratio=3)

Val loss (mlp_ratio=3, epoch 57)

What happened

Suggested follow-ups

Uh oh!

github-actions Bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tcapelle commented Mar 19, 2026

Review: Request Changes — Try mlp_ratio=3

Uh oh!

tcapelle commented Mar 19, 2026

Review: Closed — Both mlp_ratio iterations failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tcapelle commented Mar 19, 2026 •

edited

Loading

github-actions Bot commented Mar 19, 2026 •

edited

Loading