Add strong-submission eval pipeline and ablation tooling by RogueTex · Pull Request #153 · openai/parameter-golf

RogueTex · 2026-03-20T01:53:51Z

Summary

add configurable final eval modes: FINAL_EVAL_MODE=standard|sliding|ttt
add sliding-window eval path (EVAL_SEQ_LEN, EVAL_STRIDE, EVAL_BATCH_SEQS) with compiled forward_logits
add decoupled Muon weight decay via MUON_WEIGHT_DECAY
add export passthrough control INT8_ALWAYS_KEEP_FLOAT_NAME_PATTERNS (default keeps tok_emb.weight in fp16)
add experiment runbook and scripts under experiments/parameter_golf/
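
As an illustration of the sliding-window eval mode listed above, here is a minimal sketch of how the environment variables might be read and how overlapping windows could be laid out so that every token is scored exactly once. It is not the code in train_gpt.py: the helper names (read_eval_config, sliding_windows) and the default values are hypothetical, and the assumption that each window scores only its newest positions is the common sliding-window convention, not something this PR description confirms.

```python
import os

# Modes named in the PR summary.
VALID_MODES = ("standard", "sliding", "ttt")


def read_eval_config(env=None):
    """Read eval settings from environment variables.

    The variable names come from the PR summary; the defaults below are
    hypothetical placeholders, not the values used by train_gpt.py.
    """
    env = os.environ if env is None else env
    mode = env.get("FINAL_EVAL_MODE", "standard")
    if mode not in VALID_MODES:
        raise ValueError(f"FINAL_EVAL_MODE must be one of {VALID_MODES}, got {mode!r}")
    seq_len = int(env.get("EVAL_SEQ_LEN", "1024"))  # hypothetical default
    stride = int(env.get("EVAL_STRIDE", "64"))      # hypothetical default
    if not 0 < stride <= seq_len:
        raise ValueError("EVAL_STRIDE must satisfy 0 < stride <= EVAL_SEQ_LEN")
    return mode, seq_len, stride


def sliding_windows(num_tokens, seq_len, stride):
    """Return (start, end, score_from) triples covering num_tokens positions.

    Each window spans positions [start, end) and has at most seq_len tokens.
    Only positions [score_from, end) contribute to the loss, so every
    position is scored exactly once while later positions see extra context.
    """
    windows = []
    scored_up_to = 0
    end = min(seq_len, num_tokens)
    while scored_up_to < num_tokens:
        start = max(0, end - seq_len)
        windows.append((start, end, scored_up_to))
        scored_up_to = end
        end = min(end + stride, num_tokens)
    return windows


if __name__ == "__main__":
    mode, seq_len, stride = read_eval_config(
        {"FINAL_EVAL_MODE": "sliding", "EVAL_SEQ_LEN": "4", "EVAL_STRIDE": "2"}
    )
    for window in sliding_windows(10, seq_len, stride):
        print(mode, window)
```

A smaller stride gives each scored token more left context at the cost of more forward passes; setting the stride equal to the window length yields non-overlapping chunks.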

Why

This makes it possible to run controlled ablations and to push for stronger 10min/16MB submissions by combining training-side robustness with evaluation-time improvements.

Validation

python3 -m py_compile train_gpt.py
bash -n experiments/parameter_golf/run_ablation.sh
bash -n experiments/parameter_golf/run_top3.sh
python3 experiments/parameter_golf/summarize_runs.py --help

Notes

GPU training/eval sweeps were not run in this local environment (no CUDA tooling present).

Commit e25989d: Add strong-submission eval pipeline and ablation tooling

RogueTex wants to merge 1 commit into openai:main from RogueTex:feat-strong-submission-eval-pipeline
