Makes grpo.py support checkpointing, and adds a fix for mason.py#1666
Conversation
…t; add grpo.py to OPEN_INSTRUCT_COMMANDS Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
758e360 to
e78d601
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 758e36034f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request adds open_instruct/grpo.py to the supported commands and resumables and introduces a replace_or_append_flag utility to handle idempotent flag overrides for --output_dir and --checkpoint_state_dir. Review feedback points out a logic bug in the new utility when dealing with adjacent flags and suggests a more robust implementation by reusing existing command-building logic. It also recommends expanding the test suite to cover these edge cases.
…; wire OLMo-core checkpointing into grpo.py Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
grpo.py support checkpointing, and adds a fix for mason.py
…d_flag to delegate Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nd at call sites Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…and_without_args Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… for grpo.py Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…of nonexistent restore_state Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
replace_or_append_flag(command, flag, value)helper inmason.pyso the auto-overrides for--output_dirand--checkpoint_state_dirreplace any existing occurrence of the flag in the user's command instead of blindly appending. Previously, passing the flag explicitly produced a duplicated entry, and argparse picked one of the two values inconsistently.build_command_without_argsflag-aware: a value-bearing flag now consumes the following token only if it doesn't itself start with--.replace_or_append_flagis a one-line delegation on top of it.open_instruct/grpo.pytoOPEN_INSTRUCT_COMMANDSandOPEN_INSTRUCT_RESUMABLESso mason recognizes it the same way it doesgrpo_fast.py, and wire OLMo-core checkpoint save/resume intogrpo.py(CheckpointerCallback+DataPreparationActorCheckpointCallback+LoadStrategy.if_available) so resumable Beaker jobs actually resume instead of silently restarting from step 0.build_command_without_argscovering the same edge cases.Carved out of #1642 to keep the change small and reviewable.
Example fixed by the
build_command_without_argschangeGiven a command where
--checkpoint_state_diris immediately followed by another flag (a flag with no value, e.g. an interrupted/edited launch line):["python", "grpo.py", "--output", "out"]—--with_trackinggot silently eaten as if it were the checkpoint dir's value.["python", "grpo.py", "--with_tracking", "--output", "out"]—--with_trackingis preserved.The same fix means
replace_or_append_flag(["--output_dir", "--output_dir", "/tmp/z"], "--output_dir", "/weka/x")now returns["--output_dir", "/weka/x"]instead of leaking the orphaned/tmp/ztoken.Test plan
uv run pytest test_mason.pymake style && make qualityscripts/train/debug/grpo_checkpoint_integration_test.sh): run 1 trains 6 steps and writes checkpoints; run 2 resumes fromstep6and trains to step 12. Run 2 logs confirmLoading checkpoint from '.../step6'andWill resume training from step 6, epoch 1.Runs:
GPU_TESTS=bypass