Regime K: n_head=8, slice_num=64 (more attention heads + slices)#1253
Review: Closed — n_head=8 + slice_num=64 is too slow. val_loss=0.9056 vs new baseline 0.8648 (+4.7%); all metrics are worse. At ~35 s/epoch the budget allowed only 50 epochs — not enough to converge. Meanwhile Regime H (slice_num=48, n_hidden=160) has merged and achieves better results with less overhead. Your suggestion to try n_head=8 alone (keeping slice_num=48) is interesting — the extra attention diversity without the 64-slice overhead could work on the new codebase. I'll assign you a fresh experiment on the updated code.
Hypothesis
More heads (8 vs 4) combined with more slices (64 vs 32) gives the attention a much finer-grained spatial decomposition. This was the original Transolver configuration before we traded it for speed. With torch.compile, the throughput penalty may be recoverable.
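For intuition on where n_head and slice_num enter the decomposition, here is a minimal numpy sketch of Transolver-style slice attention. This is illustrative only — random projections instead of learned weights, and function/variable names are mine, not the repo's:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def physics_attention(x, n_head=8, slice_num=64, rng=np.random.default_rng(0)):
    """Illustrative slice attention. x: (N, C) point features."""
    N, C = x.shape
    d = C // n_head
    # Per-head slice-assignment logits: (n_head, N, slice_num)
    W_slice = rng.standard_normal((n_head, C, slice_num)) / np.sqrt(C)
    w = softmax(np.einsum('nc,hcm->hnm', x, W_slice), axis=-1)
    # Per-head value projection: (n_head, N, d)
    W_v = rng.standard_normal((n_head, C, d)) / np.sqrt(C)
    v = np.einsum('nc,hcd->hnd', x, W_v)
    # Aggregate N points into slice_num slice tokens: (n_head, slice_num, d)
    tokens = np.einsum('hnm,hnd->hmd', w, v) / (w.sum(axis=1)[:, :, None] + 1e-6)
    # Self-attention among slice tokens: cost ~ slice_num^2, not N^2
    att = softmax(np.einsum('hmd,hkd->hmk', tokens, tokens) / np.sqrt(d), axis=-1)
    tokens = np.einsum('hmk,hkd->hmd', att, tokens)
    # Broadcast slice tokens back to points and merge heads: (N, n_head*d)
    out = np.einsum('hnm,hmd->hnd', w, tokens)
    return out.transpose(1, 0, 2).reshape(N, n_head * d)

x = np.random.default_rng(1).standard_normal((1000, 128)).astype(np.float32)
y = physics_attention(x)  # shape (1000, 128)
```

The point is that the (n_head, N, slice_num) assignment tensor and the slice_num^2 token attention both grow with this change, which is where the throughput and memory costs below come from.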
Instructions
Change: n_head=8, slice_num=64. Run with --wandb_group regime-k.
Baseline (verified frontier, 4 consecutive plateau rounds)
Results
W&B run: 1pv5ks15 | Epochs: 50 (killed at timeout mid-epoch 51) | Group: regime-k
Validation losses (last/best checkpoint, epoch 50)
mean3 surf_p (in+ood+re)/3 = 21.0 (baseline ~19.8) — worse than baseline
val/loss = 0.9056 (baseline 0.8648) — worse than baseline
Volume MAE
Peak memory: ~16.3 GB (vs ~13 GB baseline — +25% overhead)
What happened
Negative result. n_head=8 + slice_num=64 was worse than baseline on every metric and every split.
The primary issue is throughput: each epoch took ~35s (vs ~32s at baseline settings), yielding only 50 epochs in 30 minutes instead of the ~57 the baseline gets. The additional heads and slices add meaningful overhead that torch.compile does not fully recover.
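The epoch-budget arithmetic can be checked directly (a quick sketch assuming a fixed 30-minute wall clock; the per-epoch timings are approximate, so the exact counts can shift by an epoch or so):

```python
BUDGET_S = 30 * 60  # 30-minute training budget in seconds

def epochs_in_budget(sec_per_epoch):
    """Number of full epochs that fit in the budget."""
    return BUDGET_S // sec_per_epoch

baseline = epochs_in_budget(32)  # ~56 full epochs at baseline speed
regime_k = epochs_in_budget(35)  # ~51 at regime-K speed (run was killed mid-epoch 51)
```

A ~9% slowdown per epoch compounds into roughly 5-6 fewer epochs, which matters when the model is still improving at the cutoff.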
The hypothesis that finer-grained spatial decomposition helps was not validated in the 30-min budget. At epoch 50 the model was still improving (every epoch saved a new best checkpoint), suggesting it may eventually converge to similar or better performance given more epochs — but that exceeds the 30-min budget constraint.
Memory increase (+25%, 16.3 GB vs ~13 GB) is a secondary cost: more slices and heads mean larger intermediate slice-weight tensors.
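A back-of-envelope for that scaling, using the per-point slice-assignment tensor as an example (fp32 assumed; batch size and point count are illustrative, and the observed +25% is smaller than this one tensor's ratio because most other activations do not scale with heads × slices):

```python
def slice_weight_bytes(batch, n_points, n_head, slice_num, bytes_per_el=4):
    # (B, H, N, M) fp32 tensor: each point's distribution over slices, per head
    return batch * n_head * n_points * slice_num * bytes_per_el

base = slice_weight_bytes(1, 100_000, n_head=4, slice_num=32)  # baseline config
new = slice_weight_bytes(1, 100_000, n_head=8, slice_num=64)   # regime-K config
ratio = new / base  # 4.0x for this tensor alone
```

Doubling both n_head and slice_num quadruples this particular intermediate, so the overall +25% implies it is a modest fraction of total peak memory.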
Suggested follow-ups