Non-record: FP16 embed + WD20k + seq2048 + doc-isolated sliding window (val_bpb=1.2045) by mrdavtan · Pull Request #151 · openai/parameter-golf

mrdavtan · 2026-03-20T01:47:08Z

Summary

Composition of proven community techniques on the 9L×512d baseline:

FP16 tied embedding export (from #42, "fp16 tied embedding + warmdown/LR tuning (val_bpb 1.2197)", and #65, "Record: Mixed Quant Int6/FP16 + SmearGate + OrthoInit + MLP 3x + Sliding Window, val_bpb=1.1556")
Aggressive warmdown, WARMDOWN_ITERS=20000 (from #65)
Seq2048 training (from #49, "SOTA attempt (val_bpb=1.2064)")
Tuned LRs and optimizer settings (from #65 and #128, "Record: Int6 MLP3x + STE QAT + Sliding Window (val_bpb=1.1594)")
Sliding window eval, stride=64, with doc-isolated scoring (from #50, "Record: Sliding Window Eval (stride=64), val_bpb=1.1925", and #77, "[record bpb=1.195] sliding window + LoRA TTT")
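
To illustrate the last item, here is a minimal sketch of how stride-based sliding-window scoring can be kept document-isolated. It is not the code from this PR: the function name, the per-document length input, and the choice to score the whole first window of each document are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def doc_isolated_windows(
    doc_lengths: List[int], seq_len: int = 2048, stride: int = 64
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (doc_index, ctx_start, score_start, end) for each eval window.

    Tokens in [score_start, end) are scored; tokens in [ctx_start, score_start)
    are context only. All offsets are relative to the start of the document,
    so a window never reaches into a neighbouring document.
    """
    for doc_idx, n in enumerate(doc_lengths):
        score_start = 0
        while score_start < n:
            if score_start == 0:
                # First window of a document: score everything it covers.
                end = min(seq_len, n)
            else:
                # Later windows: advance by `stride` newly scored tokens.
                end = min(score_start + stride, n)
            ctx_start = max(0, end - seq_len)
            yield doc_idx, ctx_start, score_start, end
            score_start = end


# Toy example: two documents of 10 and 3 tokens, window 4, stride 2.
windows = list(doc_isolated_windows([10, 3], seq_len=4, stride=2))
```

Under these assumptions every token is scored exactly once, and after the first window of a document each scored token sees at least `seq_len - stride` tokens of context from its own document only.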

val_bpb: 1.2045, beating the naive baseline (1.2244) by 0.020 BPB.
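
For readers unfamiliar with the metric, the sketch below shows the usual bits-per-byte definition and reproduces the margin over the baseline. It assumes val_bpb is the summed token negative log-likelihood (in nats) converted to bits and divided by the byte count of the validation text; the function name is hypothetical, and the challenge's own evaluation code may differ in detail.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


# Margin over the naive baseline, using the two figures quoted above.
BASELINE_BPB = 1.2244
THIS_RUN_BPB = 1.2045
margin = round(BASELINE_BPB - THIS_RUN_BPB, 4)  # 0.0199, i.e. about 0.020
```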

Hardware limitation

This result is significantly constrained by RunPod node speed: 70ms/step vs the typical 44ms reported by other entries. This yielded only 8,528 training steps vs the ~13,600 that standard hardware would produce. We estimate ~1.185 on standard hardware with the same configuration.
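
The step counts above follow roughly from the 600s wallclock cap listed in the test plan. The sketch below is only that back-of-the-envelope arithmetic, with a hypothetical helper name; it ignores warmup, logging, and validation overhead, which is presumably why the reported 8,528 steps falls slightly short of the ideal figure.

```python
def steps_in_budget(budget_s: int, ms_per_step: int) -> int:
    """Upper bound on training steps that fit in a wallclock budget."""
    return (budget_s * 1000) // ms_per_step


WALLCLOCK_CAP_S = 600  # training cap from the test plan

steps_this_node = steps_in_budget(WALLCLOCK_CAP_S, 70)  # 8571
steps_typical = steps_in_budget(WALLCLOCK_CAP_S, 44)    # 13636
```

The ~1.185 estimate is the author's extrapolation and does not follow from this arithmetic alone.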

Key metrics

Metric	Value
val_bpb (sliding window, doc-isolated)	1.2045
Pre-quant val_bpb	1.2154
Artifact size	15,912,648 bytes
Steps	8,528 (wallclock-limited at 70ms/step)
Eval time	43s

Acknowledgments

All techniques from community entries: @SamuelLarson (#65), @mattqlf (#50), @samacquaviva (#77), @spokane-way (#49), @rsavitt (#128).

Built with Claude Code

Test plan

Artifact under 16,000,000 bytes (15,912,648)
Training completes within 600s wallclock cap
Eval completes within 600s eval budget (43s)
Training log included
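
The first three checks can be expressed as a small script. This is a hedged sketch, not part of the submission: the function and constant names are hypothetical, the limits are the ones quoted in the list above, and the exact training time is not reported here, so the example passes the 600s cap itself as a placeholder.

```python
ARTIFACT_CAP_BYTES = 16_000_000  # artifact must be strictly under this
TRAIN_CAP_S = 600                # training wallclock cap
EVAL_CAP_S = 600                 # eval budget


def check_submission(artifact_bytes: int, train_s: float, eval_s: float) -> dict:
    """Return pass/fail flags for the size and time limits, plus size headroom."""
    return {
        "artifact_ok": artifact_bytes < ARTIFACT_CAP_BYTES,
        "train_ok": train_s <= TRAIN_CAP_S,
        "eval_ok": eval_s <= EVAL_CAP_S,
        "headroom_bytes": ARTIFACT_CAP_BYTES - artifact_bytes,
    }


# Figures from this PR: 15,912,648-byte artifact and 43s eval.
# train_s=600.0 is a placeholder for a wallclock-limited run.
report = check_submission(15_912_648, train_s=600.0, eval_s=43.0)
```

With these figures the artifact leaves 87,352 bytes of headroom under the size limit.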


mrdavtan wants to merge 1 commit into openai:main from mrdavtan:kitchen-sink-9L-pr
