Record: Long Context + All Optimizations submission #166
Open
chinesepowered wants to merge 1 commit into openai:main from
Conversation
Combines best training techniques (seq_len=2048, Muon momentum 0.99, conservative LRs, longer warmdown) with all SOTA eval/quantization tricks (sliding window, FP16 embed, Overtone init, Muon WD, 10 layers). https://claude.ai/code/session_01D1CQCz3TCExUmWTVDivu3R
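The "sliding window" eval trick is not spelled out in this description. One common reading is strided scoring: overlapping windows are evaluated and only the trailing tokens of each window are counted, so every scored token keeps long left context instead of sitting near a hard context boundary. A minimal sketch under that assumption — `model`, `val_tokens`, and `total_bytes` are hypothetical placeholders, not names from this repo:

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, val_tokens, total_bytes, window=2048, stride=512, device="cuda"):
    """Strided sliding-window scoring (assumed interpretation): each counted token
    gets up to `window - stride` tokens of left context."""
    model.eval()
    nll_bits = 0.0
    scored_to = 0  # index of the last target position already counted
    for start in range(0, len(val_tokens) - 1, stride):
        end = min(start + window, len(val_tokens) - 1)
        ids = torch.tensor(val_tokens[start:end + 1], dtype=torch.long, device=device).unsqueeze(0)
        logits = model(ids[:, :-1])  # assumed output shape: (1, T, vocab)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1), reduction="none"
        )
        n_new = end - max(scored_to, start)          # target positions not yet counted
        nll_bits += loss[-n_new:].sum().item() / math.log(2)  # nats -> bits
        scored_to = end
        if end == len(val_tokens) - 1:
            break
    return nll_bits / total_bytes  # bits per byte of the underlying validation text
```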
Submission:
records/track_10min_16mb/2026-03-20_LongCtx_SlidingWindow_FP16Emb_10L_AllOpts/

Strategy: Merge the two strongest approaches that haven't been combined yet: the Seq4096-style long-context training recipe and the current SOTA's eval/quantization tricks.
Key changes from current SOTA:
- train_seq_len: 1024 → 2048 (2x more context per training token)
- train_batch_tokens: 524K → 393K (more optimizer steps per wallclock)
- matrix_lr / scalar_lr: 0.04 → 0.02 (reduces the quantization gap)
- muon_momentum: 0.95 → 0.99 (stronger gradient smoothing)
- warmdown_iters: 2500 → 3600 (smoother weights for quantization)
- tied_embed_lr: 0.10 → 0.06 (more stable embedding training)

The Seq4096 submission showed ~0.02 BPB improvement from training alone (without any eval tricks), and the SOTA's sliding-window eval adds ~0.03 BPB. Combined, this could target ~1.155-1.165 BPB, pending validation on 8xH100 hardware.
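For readability, the same deltas expressed as a hypothetical Python dict. Key names mirror the labels above, not necessarily the record's actual config fields, and 524K / 393K are assumed here to mean 2^19 and 3·2^17 tokens:

```python
# Hypothetical summary of the hyperparameter deltas listed above (old -> new).
SOTA_BASELINE = {
    "train_seq_len": 1024,
    "train_batch_tokens": 524_288,   # "524K", assumed to be 2**19
    "matrix_lr": 0.04,
    "scalar_lr": 0.04,
    "muon_momentum": 0.95,
    "warmdown_iters": 2500,
    "tied_embed_lr": 0.10,
}

THIS_SUBMISSION = {
    **SOTA_BASELINE,
    "train_seq_len": 2048,           # 2x more context per training token
    "train_batch_tokens": 393_216,   # "393K", assumed 3 * 2**17; more optimizer steps per wallclock
    "matrix_lr": 0.02,               # conservative LRs reduce the quantization gap
    "scalar_lr": 0.02,
    "muon_momentum": 0.99,           # stronger gradient smoothing
    "warmdown_iters": 3600,          # smoother weights going into quantization
    "tied_embed_lr": 0.06,           # more stable embedding training
}
```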