Regime G: Remove hard-mining, no noise, surf_weight fixed at 30 (clean training) #1249
Review: Closed. Ablation Confirms Components Help
Clean training (no hard-mining, no noise, fixed surf_weight=30) regressed mean3 from 23.2 → 24.4 (+5.2%). The tandem split was hit hardest (+3.2). This confirms the merged components (hard-mining, noise annealing, adaptive surf_weight) each carry meaningful signal. The 'complex landscape' hypothesis is not supported: the complexity is earned. Good experiment; the negative result is informative.
Hypothesis
Ablations showed each component helps individually. But the combination of hard-mining + noise annealing + adaptive surf_weight creates a complex training landscape. Simplify: remove hard-mining, turn off noise entirely, use fixed high surf_weight=30.
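The simplification proposed above can be sketched as a small configuration change. This is a minimal illustration, not the repository's actual code: `RegimeConfig`, `regime_g`, and `total_loss` are hypothetical names, and the loss form (volume loss plus a weighted surface loss) is an assumption about how surf_weight enters training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegimeConfig:
    """Hypothetical container for the three knobs the hypothesis touches."""
    use_hard_mining: bool        # whether the hard-mining block runs
    noise_std: float             # input noise level (0.0 disables noise)
    adaptive_surf_weight: bool   # whether surf_weight is adjusted during training
    surf_weight: float           # weight applied to the surface loss term


def regime_g() -> RegimeConfig:
    """Clean-training regime: no hard-mining, no noise, fixed surf_weight=30."""
    return RegimeConfig(
        use_hard_mining=False,
        noise_std=0.0,
        adaptive_surf_weight=False,
        surf_weight=30.0,
    )


def total_loss(vol_loss: float, surf_loss: float, cfg: RegimeConfig) -> float:
    """Assumed loss form: volume term plus surf_weight times surface term."""
    return vol_loss + cfg.surf_weight * surf_loss


cfg = regime_g()
print(cfg)
print(total_loss(vol_loss=1.0, surf_loss=0.5, cfg=cfg))
```

Keeping the three switches in one frozen object makes an ablation like this a single-line change per component, which is the spirit of the regime even if the real training script is organized differently.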
Instructions
Remove the vectorized hard-mining block. Set noise to zero always. Replace the adaptive surf_weight with a fixed 30.0. Run with --wandb_group regime-g.
Baseline (verified frontier, 4 consecutive plateau rounds)
Results
W&B run: 1uk74twg (thorfinn/regime-g-clean, group: regime-g)
Peak memory: 14.7 GB
Training: 61 epochs, 30.1 min
Surface MAE (mae_surf_p, primary metric)
Full Surface MAE breakdown (best checkpoint, epoch 61)
val/loss (4-split avg): 0.9495
What happened
The simplified regime is consistently slightly worse across all splits, with tandem taking the largest hit (+3.2 mae_surf_p, ~8.5% relative regression). Overall mean3 regresses 23.2 → 24.4.
Removing hard-mining likely hurt tandem most: hard-mining boosted the pressure-gradient signal on difficult non-tandem nodes, sharpening training in ways that aided generalization. Removing noise annealing reduced regularization. Fixed surf_weight=30 falls within the adaptive range, so that change alone is unlikely to explain the regression.
The hypothesis that these components add unnecessary complexity is not supported — each carries meaningful signal.
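The overall regression quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two mean3 values stated in this report (23.2 for the baseline, 24.4 for Regime G); `relative_change` is a hypothetical helper, not a function from the training code.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline (positive = worse for MAE)."""
    return (candidate - baseline) / baseline


baseline_mean3 = 23.2   # mean3 of the baseline, as reported above
regime_g_mean3 = 24.4   # mean3 of the clean-training run, as reported above

rel = relative_change(baseline_mean3, regime_g_mean3)
print(f"absolute: {regime_g_mean3 - baseline_mean3:+.1f}")  # absolute: +1.2
print(f"relative: {rel:+.1%}")                              # relative: +5.2%
```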
Suggested follow-ups