
Sliding Window Eval + Muon6 (val_bpb 1.1973) #169

Open
beee003 wants to merge 2 commits into openai:main from beee003:submission-sliding-window-muon6

Conversation

@beee003 commented Mar 20, 2026

Summary

  • Mean val_bpb: 1.1973 (3 seeds: 1337→1.1968, 42→1.1974, 7→1.1978, p<0.001)
  • Artifact: ~15.9 MB (under 16 MB)
  • Training: ~13,688 steps in 600s on 8xH100 SXM
  • Eval: ~126s sliding window (within 10 min budget)

Key Techniques

  1. Sliding Window Evaluation (stride=256): Each token is scored with at least 768 tokens of preceding context, instead of the 0-1023 tokens it gets under non-overlapping chunked eval. Added a forward_logits() method for efficient inference.
  2. Muon 6-step Newton-Schulz: More accurate gradient orthogonalization (MUON_BACKEND_STEPS=6).
  3. Extended Momentum Warmup: MUON_MOMENTUM_WARMUP_STEPS=1000 (up from 500) stabilizes early training.
  4. Longer Warmdown: WARMDOWN_ITERS=1500 (up from 1200) for smoother LR decay.
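To make the stride-256 bookkeeping concrete, here is a minimal sketch of how the scoring windows tile the token stream. The function name and the 1024-token window are illustrative assumptions (the window size is inferred from the "0-1023" range above); every token is scored exactly once, and any token past the first window sees at least 1024 - 256 = 768 tokens of context.

```python
def sliding_windows(n_tokens, window=1024, stride=256):
    """Enumerate (win_start, score_start, score_end) triples.

    Each window covers [win_start, win_start + window); only the last
    `stride` positions of it, [score_start, score_end), are scored, so
    every token is scored exactly once with long left context.
    Illustrative sketch, not code from the PR.
    """
    spans = []
    pos = 0
    while pos < n_tokens:
        # Slide the window so the scored region sits at its right edge.
        win_start = max(0, pos + stride - window)
        score_end = min(pos + stride, n_tokens)
        spans.append((win_start, pos, score_end))
        pos = score_end
    return spans
```

Note that each full forward pass of 1024 tokens yields only 256 scored positions, which is why the sliding eval costs roughly 4x a chunked eval and needs the ~126 s budget noted above.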

Reproduction

MUON_BACKEND_STEPS=6 MUON_MOMENTUM_WARMUP_STEPS=1000 WARMDOWN_ITERS=1500 EVAL_STRIDE=256 \
SEED=1337 RUN_ID=submission VAL_LOSS_EVERY=0 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
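MUON_BACKEND_STEPS=6 controls how many Newton-Schulz iterations Muon runs to orthogonalize each gradient (up from the usual 5). A sketch of the quintic iteration, using the coefficients from the public Muon reference implementation, written in NumPy for illustration (the actual optimizer runs this in low precision on GPU):

```python
import numpy as np

def newton_schulz(G, steps=6):
    """Approximately orthogonalize G via the quintic Newton-Schulz
    iteration used by Muon. Coefficients follow the public reference
    implementation; this NumPy version is an illustrative sketch.
    More steps pushes the singular values closer to 1."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius normalization bounds the spectral norm by 1.
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # keep the Gram matrix X @ X.T small
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

The coefficients are tuned for speed rather than exact orthogonality, so the singular values converge to a band around 1 rather than to 1 exactly; the 6th step tightens that band, which is presumably the accuracy gain claimed above.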

Results

| Seed | val_loss | val_bpb | Steps  |
|------|----------|---------|--------|
| 1337 | 2.0208   | 1.1968  | 13,688 |
| 42   | 2.0217   | 1.1974  | 13,688 |
| 7    | 2.0225   | 1.1978  | 13,688 |
| Mean | 2.0217   | 1.1973  |        |

Mean val_bpb: 1.1973 across 3 seeds (1337, 42, 7).
Key techniques: sliding window eval (stride=256), Muon 6-step Newton-Schulz,
extended momentum warmup (1000 steps), longer warmdown (1500 iters).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
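For readers checking the numbers: val_bpb is the mean cross-entropy loss (nats per token) converted to bits per byte. Back-solving from the loss/bpb pairs in the table gives roughly 2.436 bytes per token on this validation set; that ratio is inferred from the table, not stated in the PR.

```python
import math

def loss_to_bpb(val_loss_nats, bytes_per_token):
    """Convert mean per-token cross-entropy (nats) to bits per byte:
    divide by ln(2) to get bits/token, then by bytes/token."""
    return val_loss_nats / (math.log(2) * bytes_per_token)

# bytes_per_token ~= 2.436 is back-solved from the table above,
# not a documented constant of the benchmark.
```

This is why bpb is the headline metric: it is tokenizer-independent, so a change to tokenization cannot game it the way raw loss could.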

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccf183ec4e


Comment on lines +903 to +905
# Only score tokens in the scoring window [score_start, score_end)
score_start = pos - win_start
score_end = min(score_start + stride, actual_len)


P1: Prevent double-counting boundary tokens in sliding eval

On multi-GPU runs, eval_val_sliding partitions the validation set with my_start/my_end, but this scoring window still extends each rank by a full stride because score_end is only clamped to actual_len. In the 8xH100 path used for leaderboard submissions, that means the last window on rank r can score past my_end while rank r+1 starts again at its own my_start, so the boundary tokens are counted twice in the all-reduced val_loss/val_bpb. Please clamp the scored range to the local shard (or distribute work by windows instead of raw tokens) so the final metric matches the true dataset average.


Clamp score_end to the local shard boundary (my_end) so tokens at
rank boundaries aren't counted by both adjacent ranks during
all_reduce. Score may change slightly on multi-GPU runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
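The clamp described in the fix commit can be sketched as follows. `my_start`/`my_end` follow the names used in the review; the function name and the even token split across ranks are illustrative assumptions, not code from the PR.

```python
def shard_score_ranges(n_tokens, world_size, stride=256):
    """Partition validation tokens across ranks, then clamp every
    scored span to the local shard [my_start, my_end) so that tokens
    at rank boundaries are never scored by two adjacent ranks before
    the all_reduce. Illustrative sketch of the fix, not PR code."""
    all_spans = []
    per_rank = n_tokens // world_size
    for rank in range(world_size):
        my_start = rank * per_rank
        my_end = n_tokens if rank == world_size - 1 else my_start + per_rank
        pos = my_start
        while pos < my_end:
            # The fix: clamp to my_end, not just to the dataset end.
            score_end = min(pos + stride, my_end)
            all_spans.append((rank, pos, score_end))
            pos = score_end
    return all_spans
```

Without the clamp, the last window on rank r would run a full stride past my_end into rank r+1's shard, inflating the token count in the all-reduced sum, which is exactly the double-counting the review flags.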