Makes `grpo.py` support checkpointing, and adds a fix for `mason.py` by finbarrtimbers · Pull Request #1666 · allenai/open-instruct

finbarrtimbers · 2026-05-08T16:42:42Z

Summary

Add replace_or_append_flag(command, flag, value) helper in mason.py so the auto-overrides for --output_dir and --checkpoint_state_dir replace any existing occurrence of the flag in the user's command instead of blindly appending. Previously, passing the flag explicitly produced a duplicated entry, and argparse picked one of the two values inconsistently.
Collapse repeated occurrences of the same flag down to a single one with the new value, including the trailing-flag-without-value edge case.
Make build_command_without_args flag-aware: a value-bearing flag now consumes the following token only if that token does not itself start with --. replace_or_append_flag is a one-line delegation on top of it.
Add open_instruct/grpo.py to OPEN_INSTRUCT_COMMANDS and OPEN_INSTRUCT_RESUMABLES so mason recognizes it the same way it does grpo_fast.py, and wire OLMo-core checkpoint save/resume into grpo.py (CheckpointerCallback + DataPreparationActorCheckpointCallback + LoadStrategy.if_available) so resumable Beaker jobs actually resume instead of silently restarting from step 0.
Add parameterized unit tests for the helper covering the empty, absent, present-once, present-twice, trailing-flag, adjacent-flags, and flag-followed-by-flag cases, plus two new cases for build_command_without_args covering the same edge cases.

Carved out of #1642 to keep the change small and reviewable.

Example fixed by the `build_command_without_args` change

Given a command where --checkpoint_state_dir has no value and is immediately followed by another flag (for example, from an interrupted or hand-edited launch line):

build_command_without_args(
    ["python", "grpo.py", "--checkpoint_state_dir", "--with_tracking", "--output", "out"],
    {"--checkpoint_state_dir": True},
)

Before: ["python", "grpo.py", "--output", "out"] — --with_tracking got silently eaten as if it were the checkpoint dir's value.
After: ["python", "grpo.py", "--with_tracking", "--output", "out"] — --with_tracking is preserved.

The same fix means replace_or_append_flag(["--output_dir", "--output_dir", "/tmp/z"], "--output_dir", "/weka/x") now returns ["--output_dir", "/weka/x"] instead of leaking the orphaned /tmp/z token.
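To make the two behaviours above concrete, here is a minimal sketch of a flag-aware stripper and a replace-or-append helper built on top of it. It is an illustrative re-implementation written to match the examples in this description, not the code from mason.py; the signatures are inferred from the calls shown above, and the real functions may differ in details.

```python
from typing import Dict, List


def build_command_without_args(command: List[str], args: Dict[str, bool]) -> List[str]:
    """Return `command` with every flag listed in `args` removed.

    Illustrative sketch only. `args` maps a flag to True when the flag
    takes a value and to False when it is a bare switch.
    """
    result: List[str] = []
    i = 0
    while i < len(command):
        token = command[i]
        if token in args:
            i += 1  # drop the flag itself
            # Drop the next token only when it looks like a value,
            # i.e. it exists and does not start with "--".
            if args[token] and i < len(command) and not command[i].startswith("--"):
                i += 1
            continue
        result.append(token)
        i += 1
    return result


def replace_or_append_flag(command: List[str], flag: str, value: str) -> List[str]:
    """Strip every occurrence of `flag`, then append it once with `value`."""
    return build_command_without_args(command, {flag: True}) + [flag, value]
```

Under these assumptions, the first call in the example keeps `--with_tracking`, and the repeated `--output_dir` case collapses to a single `["--output_dir", "/weka/x"]`. One limitation of the sketch: a value that itself begins with `--`, or a flag written as `--flag=value`, would not be recognized.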

Test plan

uv run pytest test_mason.py
make style && make quality
End-to-end checkpoint-resume integration test on Beaker (scripts/train/debug/grpo_checkpoint_integration_test.sh): run 1 trains 6 steps and writes checkpoints; run 2 resumes from step 6 and trains to step 12. The run 2 logs confirm the resume with "Loading checkpoint from '.../step6'" and "Will resume training from step 6, epoch 1".

Runs:

Run 1 (train 6 steps + checkpoint): Beaker
Run 2 (resume from step 6 → step 12): Beaker
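The "load if available" behaviour that the two runs exercise can be sketched in plain Python: look for the newest checkpoint under the state directory and resume from it, otherwise start from step 0. This is not how OLMo-core implements LoadStrategy.if_available; the `stepN` directory layout is only inferred from the log line quoted above, and both function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed layout: one sub-directory per checkpoint, named "step<N>".
_STEP_DIR = re.compile(r"^step(\d+)$")


def find_latest_checkpoint(checkpoint_dir: Path) -> Optional[Tuple[int, Path]]:
    """Return (step, path) for the highest-numbered checkpoint, or None."""
    if not checkpoint_dir.is_dir():
        return None
    found = []
    for child in checkpoint_dir.iterdir():
        match = _STEP_DIR.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    # Compare step numbers numerically so that step12 ranks above step6.
    return max(found, key=lambda item: item[0]) if found else None


def starting_step(checkpoint_dir: Path) -> int:
    """Resume from the latest checkpoint if one exists, else start at 0."""
    latest = find_latest_checkpoint(checkpoint_dir)
    return latest[0] if latest else 0
```

With an empty state directory `starting_step` returns 0, which corresponds to run 1; once a `step6` directory exists it returns 6, which corresponds to run 2.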

GPU_TESTS=bypass

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 758e36034f

gemini-code-assist

Code Review

This pull request adds open_instruct/grpo.py to the supported commands and resumables and introduces a replace_or_append_flag utility to handle idempotent flag overrides for --output_dir and --checkpoint_state_dir. Review feedback points out a logic bug in the new utility when dealing with adjacent flags and suggests a more robust implementation by reusing existing command-building logic. It also recommends expanding the test suite to cover these edge cases.

Make mason.py --output_dir/--checkpoint_state_dir overrides idempotent; add grpo.py to OPEN_INSTRUCT_COMMANDS

e78d601

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

finbarrtimbers force-pushed the finbarr/mason-replace-or-append-flag branch from 758e360 to e78d601 May 8, 2026 16:43

chatgpt-codex-connector Bot reviewed May 8, 2026

Comment thread mason.py

gemini-code-assist Bot reviewed May 8, 2026

Comment thread mason.py Outdated

Comment thread test_mason.py

Address PR #1666 review: fix replace_or_append_flag adjacent-flag bug; wire OLMo-core checkpointing into grpo.py

878b5cf

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

finbarrtimbers changed the title ~~Make mason.py output_dir/checkpoint_state_dir overrides idempotent~~ Makes grpo.py support checkpointing, and adds a fix for mason.py May 8, 2026

finbarrtimbers added 6 commits May 8, 2026 12:15

Make build_command_without_args flag-aware; simplify replace_or_append_flag to delegate

0c220d4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Drop replace_or_append_flag; inline build_command_without_args + extend at call sites

13bf0fd

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add test for repeated occurrences of value-bearing flag in build_command_without_args

d6b9f38

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add grpo_checkpoint_integration_test.sh: two-pass resume verification for grpo.py

77c90f8

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Fix DataPreparationActorCheckpointCallback to call set_state instead of nonexistent restore_state

77d2154

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cleaned up pr

69fc709

finbarrtimbers enabled auto-merge May 8, 2026 19:29

Merge branch 'main' into finbarr/mason-replace-or-append-flag

83ea6f3

finbarrtimbers requested a review from hamishivi May 8, 2026 19:29

hamishivi approved these changes May 8, 2026

Comment thread open_instruct/grpo_callbacks.py

finbarrtimbers added this pull request to the merge queue May 8, 2026

hamishivi removed this pull request from the merge queue due to a manual request May 8, 2026

Merge branch 'main' into finbarr/mason-replace-or-append-flag

c1c74f8

finbarrtimbers enabled auto-merge May 11, 2026 15:29

finbarrtimbers added this pull request to the merge queue May 11, 2026

Merged via the queue into main with commit 308c411 May 11, 2026
7 checks passed

finbarrtimbers deleted the finbarr/mason-replace-or-append-flag branch May 11, 2026 15:41

finbarrtimbers merged 10 commits into main from finbarr/mason-replace-or-append-flag
