
Add strong-submission eval pipeline and ablation tooling #153

Open
RogueTex wants to merge 1 commit into openai:main from RogueTex:feat-strong-submission-eval-pipeline

RogueTex commented Mar 20, 2026

Summary

  • add configurable final eval modes: FINAL_EVAL_MODE=standard|sliding|ttt
  • add sliding-window eval path (EVAL_SEQ_LEN, EVAL_STRIDE, EVAL_BATCH_SEQS) with compiled forward_logits
  • add decoupled Muon weight decay via MUON_WEIGHT_DECAY
  • add export passthrough control INT8_ALWAYS_KEEP_FLOAT_NAME_PATTERNS (default keeps tok_emb.weight in fp16)
  • add experiment runbook and scripts under experiments/parameter_golf/
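
To illustrate the sliding-window eval bullet above, here is a minimal sketch of how overlapping windows are commonly scheduled so that each token is scored exactly once with as much left context as the window allows. It is not taken from `train_gpt.py`: the `Window` type and `sliding_windows` helper are hypothetical names, and only `EVAL_SEQ_LEN` and `EVAL_STRIDE` come from this PR (here they are passed as plain arguments).

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    start: int       # first token position fed to the model in this window
    end: int         # one past the last token position in this window
    score_from: int  # first position whose loss is counted (earlier ones are context only)


def sliding_windows(num_tokens: int, seq_len: int, stride: int) -> Iterator[Window]:
    """Yield overlapping windows that score every position exactly once.

    Hypothetical helper: seq_len plays the role of EVAL_SEQ_LEN and
    stride the role of EVAL_STRIDE; the real eval path may differ.
    """
    if not 0 < stride <= seq_len:
        # A stride larger than the window would leave unscored gaps.
        raise ValueError("stride must satisfy 0 < stride <= seq_len")
    prev_end = 0
    for start in range(0, num_tokens, stride):
        end = min(start + seq_len, num_tokens)
        # Positions before prev_end were already scored by an earlier window.
        yield Window(start, end, prev_end)
        prev_end = end
        if end == num_tokens:
            break


if __name__ == "__main__":
    for w in sliding_windows(num_tokens=10, seq_len=4, stride=2):
        print(w)
```

The first window scores all of its positions; each later window reuses the preceding `seq_len - stride` positions as context and scores only the new ones. A smaller stride gives each scored token more context at the cost of more forward passes.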

Why

This enables controlled ablations and a push for stronger 10min/16MB submissions by combining training-side robustness with evaluation-time improvements.

Validation

  • python3 -m py_compile train_gpt.py
  • bash -n experiments/parameter_golf/run_ablation.sh
  • bash -n experiments/parameter_golf/run_top3.sh
  • python3 experiments/parameter_golf/summarize_runs.py --help

Notes

GPU training/eval sweeps were not run in this local environment (no CUDA tooling present).
