Regime G: Remove hard-mining, no noise, surf_weight fixed at 30 (clean training) #1249

Closed
tcapelle wants to merge 2 commits into noam from exp-noam/regime-g


@tcapelle tcapelle commented Mar 19, 2026

Hypothesis

Ablations showed that each component helps individually, but the combination of hard-mining, noise annealing, and adaptive surf_weight may create an unnecessarily complex training landscape. Simplify: remove hard-mining, turn off noise entirely, and use a fixed high surf_weight=30.

Instructions

Remove vectorized hard-mining block. Set noise to zero always. Replace adaptive surf_weight with fixed 30.0. Run with --wandb_group regime-g.
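For orientation, here is a minimal sketch of what the three changes amount to. It assumes the training loss is a volume MAE term plus a surf_weight-scaled surface MAE term; that loss form and every name in the block (`noise_std`, `mean_abs_error`, `clean_loss`) are hypothetical and not taken from the training script.

```python
from typing import Sequence

# Hypothetical sketch of the "clean" regime; not the actual training code.
SURF_WEIGHT = 30.0  # fixed value, replaces the adaptive schedule
NOISE_STD = 0.0     # noise disabled for every epoch


def noise_std(epoch: int) -> float:
    """Noise schedule for regime G: always zero, regardless of epoch."""
    return NOISE_STD


def mean_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Plain mean absolute error over all nodes, with no hard-mining reweighting."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def clean_loss(vol_pred, vol_true, surf_pred, surf_true) -> float:
    """Assumed loss form: volume MAE + SURF_WEIGHT * surface MAE."""
    volume_term = mean_abs_error(vol_pred, vol_true)
    surface_term = mean_abs_error(surf_pred, surf_true)
    return volume_term + SURF_WEIGHT * surface_term
```

Every node contributes equally to each term here, which is the point of the regime: no per-node selection, no input perturbation, and a single constant weight.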

Baseline (verified frontier, 4 consecutive plateau rounds)

  • mean3=23.2 (in=17.5, ood=14.3, re=27.7, tan=37.7)
  • 50 single-variable experiments failed to improve. This round tests MULTI-VARIABLE regime changes.

Results

W&B run: 1uk74twg (thorfinn/regime-g-clean, group: regime-g)
Peak memory: 14.7 GB
Training: 61 epochs, 30.1 min

Surface MAE (mae_surf_p, primary metric)

| Split | This run | Baseline | Delta |
| --- | --- | --- | --- |
| val_in_dist | 17.7 | 17.5 | +0.2 |
| val_ood_cond | 14.7 | 14.3 | +0.4 |
| val_ood_re | 28.2 | 27.7 | +0.5 |
| val_tandem_transfer | 40.9 | 37.7 | +3.2 |
| mean3 = (in + ood + tan) / 3 | 24.4 | 23.2 | +1.2 |
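The mean3 row can be reproduced from the per-split numbers. The snippet below is a small check using only the values in the table above; the helper name `mean3` and the dictionary layout are illustrative.

```python
# Surface-pressure MAE per split, copied from the table above.
run = {"val_in_dist": 17.7, "val_ood_cond": 14.7,
       "val_ood_re": 28.2, "val_tandem_transfer": 40.9}
baseline = {"val_in_dist": 17.5, "val_ood_cond": 14.3,
            "val_ood_re": 27.7, "val_tandem_transfer": 37.7}

# mean3 averages in-dist, ood-cond and tandem; val_ood_re is excluded.
MEAN3_SPLITS = ("val_in_dist", "val_ood_cond", "val_tandem_transfer")


def mean3(scores: dict) -> float:
    """Average of the three splits that make up mean3."""
    return sum(scores[split] for split in MEAN3_SPLITS) / len(MEAN3_SPLITS)


# Per-split absolute deltas, rounded to the table's precision.
deltas = {split: round(run[split] - baseline[split], 1) for split in run}

print(round(mean3(run), 1), round(mean3(baseline), 1))
print(deltas)
```

Note that the reported +1.2 is the difference of the two rounded means; the unrounded difference is closer to 1.27.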

Full Surface MAE breakdown (best checkpoint, epoch 61)

| Split | Ux | Uy | p | val/loss |
| --- | --- | --- | --- | --- |
| val_in_dist | 10.0 | 2.6 | 17.7 | 0.6534 |
| val_ood_cond | 6.2 | 1.6 | 14.7 | 0.7779 |
| val_ood_re | 5.8 | 1.3 | 28.2 | 0.6027 |
| val_tandem_transfer | 8.5 | 2.9 | 40.9 | 1.7640 |

val/loss (4-split avg): 0.9495

What happened

The simplified regime is worse on all four splits (+0.2 to +3.2 mae_surf_p), with tandem taking the largest hit (+3.2, ~8.5% relative regression). Overall mean3 regresses from 23.2 to 24.4.
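The relative figures follow directly from the reported values. A quick check, using only numbers already stated in this PR (the helper name `relative_change` is illustrative):

```python
def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


# Tandem split: 37.7 -> 40.9 mae_surf_p
tandem_pct = relative_change(40.9, 37.7)

# mean3, using the rounded values as reported: 23.2 -> 24.4
mean3_pct = relative_change(24.4, 23.2)

print(f"tandem: {tandem_pct:+.1f}%  mean3: {mean3_pct:+.1f}%")
```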

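Removing hard-mining is the most likely cause of the tandem regression: it boosted the pressure-gradient signal on difficult non-tandem nodes, which appears to have sharpened training in ways that aided generalization. Removing noise annealing also removed a source of regularization. The fixed surf_weight=30 falls within the adaptive range, so that change alone is unlikely to explain the regression. Because all three changes were made at once, this run cannot attribute the regression to any single one of them.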

The hypothesis that these components add unnecessary complexity is not supported: removing them together hurts every split, consistent with the earlier ablations in which each component helped individually.

Suggested follow-ups

  • Ablate surf_weight alone (fixed 30, keep noise + hard-mining) to test whether the adaptive weight is expendable.
  • Ablate noise only (remove noise, keep hard-mining + adaptive weight) to isolate the noise contribution.
  • If hard-mining is the key contributor, simplify it to a fixed top-50% fraction without tandem asymmetry: same concept, less code.
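The third follow-up could look roughly like the sketch below. It assumes "hard-mining" means averaging the loss over the highest-error fraction of nodes; the existing vectorized block is not shown in this PR, so the function name, signature, and selection rule here are hypothetical.

```python
import math
from typing import Sequence


def top_fraction_loss(per_node_errors: Sequence[float], fraction: float = 0.5) -> float:
    """Mean error over the hardest `fraction` of nodes (always at least one node).

    Hypothetical simplification: one fixed fraction for every sample,
    with no tandem-specific asymmetry.
    """
    if not per_node_errors:
        raise ValueError("per_node_errors must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, math.ceil(fraction * len(per_node_errors)))
    # Keep the k largest errors and average them.
    hardest = sorted(per_node_errors, reverse=True)[:k]
    return sum(hardest) / k


# With four nodes and the default 50% fraction, only the two
# largest errors (4.0 and 3.0) contribute.
example = top_fraction_loss([1.0, 4.0, 2.0, 3.0])
```

Setting `fraction=1.0` recovers the plain mean, so the same code path would cover both the mined and the un-mined regime.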

@tcapelle added the labels status:wip (Student is working on it), student:thorfinn (Assigned to thorfinn), and noam (Noam advisor branch experiments) on Mar 19, 2026
github-actions Bot commented Mar 19, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-thorfinn
senpai-advisor, senpai-thorfinn seem not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@tcapelle tcapelle marked this pull request as ready for review March 19, 2026 10:00
@tcapelle added the label status:review (Ready for advisor review) and removed status:wip (Student is working on it) on Mar 19, 2026
@tcapelle (Contributor, Author) commented

Review: Closed — Ablation Confirms Components Help

Clean training (no hard-mining, no noise, fixed surf_weight=30) regressed mean3 from 23.2 to 24.4 (+1.2, +5.2%). The tandem split was hit hardest (+3.2). This confirms that the merged components (hard-mining, noise annealing, adaptive surf_weight) carry meaningful signal as a set, in line with the earlier per-component ablations. The 'complex landscape' hypothesis is not supported: the complexity is earned.

Good experiment — the negative result is informative.

@tcapelle tcapelle closed this Mar 19, 2026
@tcapelle tcapelle deleted the exp-noam/regime-g branch March 19, 2026 10:10
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 19, 2026
