GRPO OLMo-core feature parity: eval, checkpointer, schedulers by finbarrtimbers · Pull Request #1672 · allenai/open-instruct

finbarrtimbers · 2026-05-08T17:06:03Z

Brings the OLMo-core GRPO trainer (grpo.py) up to feature parity with grpo_fast.py:

Eval: new EvalCallback that pushes eval prompts onto prompt_Q on cadence and drains results via grpo_utils.maybe_evaluate; new setup_eval actor RPC and m.setup_eval.remote(...) call from grpo.py main; rank-0-only eval data loader.
Checkpointing: add an OLMo-core CheckpointerCallback to fit() driven by --checkpoint_state_freq and pruning to --keep_last_n_checkpoints. Warn if --save_freq differs (it's a no-op on the olmo-core path).
Scheduler: add explicit cosine / constant / linear branches for --lr_scheduler_type (raise on anything else).
StepTimingCallback: lower priority so its post_step runs after vLLM sync, and switch to _last_step_end-based timing so time/total is end-to-end.
Scripts: qwen3_4b_dapo_math.sh / qwen3_4b_dapo_math_oc.sh — accept BEAKER_IMAGE env var, route checkpoints to /tmp-3m/$RUN_NAME, add --use_rho_correction defaults and bump --activation_memory_budget for the OC variant.
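
The scheduler item above describes one explicit branch per --lr_scheduler_type value, with an error for anything else. A minimal sketch of that dispatch pattern in plain Python follows. The helper name `build_lr_multiplier`, the warmup-free schedules, and their formulas are illustrative assumptions; the PR itself presumably wires up OLMo-core scheduler classes, which are not shown on this page.

```python
import math
from typing import Callable


def build_lr_multiplier(scheduler_type: str, total_steps: int) -> Callable[[int], float]:
    """Return a function mapping a step index to an LR multiplier in [0, 1].

    Hypothetical helper: it mirrors the "explicit branch per type, raise
    otherwise" structure described in the PR, not its actual implementation.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    if scheduler_type == "constant":
        return lambda step: 1.0
    if scheduler_type == "linear":
        # Linear decay from 1.0 at step 0 to 0.0 at total_steps.
        return lambda step: max(0.0, 1.0 - step / total_steps)
    if scheduler_type == "cosine":
        # Half-cosine decay from 1.0 at step 0 to 0.0 at total_steps.
        return lambda step: 0.5 * (1.0 + math.cos(math.pi * min(step, total_steps) / total_steps))
    # Fail loudly instead of silently falling back to a default schedule.
    raise ValueError(f"Unsupported lr_scheduler_type: {scheduler_type!r}")
```

Raising on unknown values means a typo in the flag surfaces at startup rather than as a silently different training run.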

gemini-code-assist

Code Review

This pull request brings grpo.py to feature parity with grpo_fast.py by implementing new callbacks for evaluation and timing, refactoring shared logic into grpo_utils.py, and updating the Hugging Face export process. Key improvements include a pruning checkpointer and startup verification for model saving. The review feedback identifies off-by-one errors in the evaluation scheduling logic and recommends a safety check for the tokenizer's pad token to prevent potential runtime errors.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: facc8f71bc

chatgpt-codex-connector · 2026-05-08T17:10:28Z

+        trainer_callbacks["checkpointer"] = olmo_core_utils.build_checkpointer_callback(
+            checkpointing_steps=self.grpo_config.checkpoint_state_freq,
+            ephemeral_save_interval=None,


Handle disabled checkpoint_state_freq before building checkpointer

When an OLMo-core GRPO run uses --checkpoint_state_freq -1 (or 0) to disable periodic state checkpoints, this forwards that value directly as the OLMo-core save_interval. The installed CheckpointerCallback rejects save_interval < 1 during construction, so the run aborts in TrainerConfig.build() instead of disabling checkpointing like the GRPO config/fast path allows. Skip registering the checkpointer or pass None when the configured frequency is non-positive.
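
One way to apply this suggestion is to check the configured frequency before any callback is constructed. A minimal sketch, assuming a hypothetical `maybe_add_checkpointer` helper and a `build_callback` factory supplied by the caller; the real `olmo_core_utils.build_checkpointer_callback` signature is only partially visible in the diff above, so it is not called here.

```python
from typing import Any, Callable, Dict


def maybe_add_checkpointer(
    callbacks: Dict[str, Any],
    checkpoint_state_freq: int,
    build_callback: Callable[[int], Any],
) -> bool:
    """Register a checkpointer only when the frequency is a positive interval.

    Hypothetical helper: `build_callback` stands in for whatever factory
    constructs the real callback. Returns True if a callback was registered.
    """
    if checkpoint_state_freq <= 0:
        # -1 and 0 both mean "disabled": skip construction entirely so a
        # constructor that rejects intervals below 1 is never reached.
        return False
    callbacks["checkpointer"] = build_callback(checkpoint_state_freq)
    return True
```

With this shape, a disabled frequency leaves the callbacks mapping untouched instead of aborting while the trainer is being built.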

chatgpt-codex-connector · 2026-05-08T17:10:28Z

 )
-from open_instruct.grpo_fast import create_generation_configs, maybe_evaluate
+from open_instruct.grpo_fast import create_generation_configs
+from open_instruct.grpo_utils import maybe_evaluate


Retarget maybe_evaluate mocks to grpo_utils

After moving maybe_evaluate into grpo_utils, the tests still patch open_instruct.grpo_fast.accumulate_inference_batches and the old grpo_fast print helpers. Calls through this imported function now resolve data_loader_lib.accumulate_inference_batches and model_utils.print_* in grpo_utils, so the final-step/metrics tests no longer intercept the dependencies and will exercise the real queue path instead of the mocks. Update the patch targets to the new module dependencies.
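
The underlying rule is that a patch must target the namespace where the name is looked up at call time. A self-contained sketch with synthetic modules follows; `dep_lib`, `old_mod`, and `accumulate` are hypothetical stand-ins for the real modules and helpers, not names from the repository.

```python
import types
from unittest import mock

# Stand-in for the dependency module that the moved function calls into.
dep_lib = types.ModuleType("dep_lib")
dep_lib.accumulate = lambda: "real"

# Stand-in for the function's old home, which holds its own reference to the
# dependency (as `from dep_lib import accumulate` would create).
old_mod = types.ModuleType("old_mod")
old_mod.accumulate = dep_lib.accumulate


def maybe_evaluate() -> str:
    # After the move, the function resolves `dep_lib.accumulate` at call time.
    return dep_lib.accumulate()


# Patching the old module's attribute does not affect the new lookup path.
with mock.patch.object(old_mod, "accumulate", return_value="mocked"):
    stale_result = maybe_evaluate()

# Patching the attribute that is actually looked up intercepts the call.
with mock.patch.object(dep_lib, "accumulate", return_value="mocked"):
    retargeted_result = maybe_evaluate()
```

Here `stale_result` is still produced by the real function, while `retargeted_result` comes from the mock, which is the failure mode and the fix the comment describes.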

farhatkevin · 2026-05-15T19:41:41Z

-        if self.checkpoint_state_dir is not None and self.checkpoint_state_freq == -1:
+        if self.checkpoint_state_dir is not None and self.checkpoint_state_freq <= 0:
            raise ValueError("`checkpoint_state_freq` must be greater than 0 if `checkpoint_state_dir` is provided!")
+        if self.save_freq != self.checkpoint_state_freq:


Should this warning live in grpo.py instead? GRPOExperimentConfig is shared with grpo_fast.py, and grpo_fast.py still uses save_freq for periodic model saves. Putting the warning here means non-OLMo-core runs can see an OLMo-core-specific warning. I know the message says "on the olmo-core training path...", but would it be better to move it?
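
For context, the checks under discussion can be sketched as a standalone dataclass. `ExampleConfig`, the default values, and the warning text are illustrative assumptions; only the field names and the ValueError message come from the diff above.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for a config shared by two trainers."""

    save_freq: int = 200  # assumed default, for illustration only
    checkpoint_state_freq: int = -1
    checkpoint_state_dir: Optional[str] = None

    def __post_init__(self) -> None:
        if self.checkpoint_state_dir is not None and self.checkpoint_state_freq <= 0:
            raise ValueError(
                "`checkpoint_state_freq` must be greater than 0 if `checkpoint_state_dir` is provided!"
            )
        if self.save_freq != self.checkpoint_state_freq:
            # Assumed wording. Because the config is shared, this fires for
            # every trainer that builds it, which is the concern raised above.
            warnings.warn("save_freq differs from checkpoint_state_freq")
```

Moving the second check into the trainer-specific entry point would restrict the warning to runs where it applies, at the cost of splitting validation across two places.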

finbarrtimbers added a commit that referenced this pull request May 8, 2026

Update CHANGELOG with PR #1672 link Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

67716a3

gemini-code-assist Bot reviewed May 8, 2026

Comment thread open_instruct/grpo_callbacks.py

Comment thread open_instruct/grpo_callbacks.py Outdated

Comment thread open_instruct/grpo_utils.py

chatgpt-codex-connector Bot reviewed May 8, 2026

finbarrtimbers added 6 commits May 11, 2026 12:31

Move maybe_evaluate to grpo_utils; dedupe calculate_token_counts Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

46576fd

Verify HF export at startup; rewrite save_state_dict_as_hf via convert_state_to_hf; prune permanent checkpoints Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

307ab05

Update CHANGELOG with PR #1671 link Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

370c2f1

Merge branch 'finbarr/extract-maybe-evaluate' into finbarr/grpo-oc-feature-parity

abc80d3

GRPO OLMo-core feature parity: EvalCallback, setup_eval, checkpointer, scheduler types Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

5f2170b

Update CHANGELOG with PR #1672 link Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

f322515

finbarrtimbers force-pushed the finbarr/grpo-oc-feature-parity branch from 67716a3 to f322515 Compare May 11, 2026 18:32

This was referenced May 12, 2026

Fix CSV header handling in benchmark_generators; pathlib throughout #1684 (Open)

Fix _get_batch_logps NaN on fully-masked sequences (DPO) #1685 (Merged)

Fix gpt-4o output pricing; restate judge prices per 1M tokens #1686 (Merged)

Merge remote-tracking branch 'origin/main' into finbarr/grpo-oc-feature-parity # Conflicts: # CHANGELOG.md

b12efb4

finbarrtimbers changed the base branch from main to finbarr/hf-export-verify May 14, 2026 17:19

finbarrtimbers changed the base branch from finbarr/hf-export-verify to main May 14, 2026 17:20

finbarrtimbers and others added 7 commits May 15, 2026 07:20

Merge remote-tracking branch 'origin/main' into finbarr/grpo-oc-feature-parity # Conflicts: # CHANGELOG.md # open_instruct/grpo.py # open_instruct/olmo_core_utils.py

92a99dc

Drop pre-norm Qwen3/Llama OLMo-core->HF override shim; upstream olmo-core now provides these mappings Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

607f67a

Minimize diff: drop PruningCheckpointerCallback, ty:ignore, and unrelated script tweaks Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

58c264c

Unify checkpoint_state_freq disable sentinel: <=0 disables on both grpo_fast and olmo_core paths Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

6c3e3fc

Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

68ff60d

Update grpo_callbacks.py

223b360

Use keyword args in EvalCallback.post_step's maybe_evaluate call Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

b201e02

finbarrtimbers requested a review from farhatkevin May 15, 2026 16:44

Move save_freq/checkpoint_state_freq divergence warning into GRPOExperimentConfig.__post_init__ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

17be77f

finbarrtimbers enabled auto-merge May 15, 2026 18:32

farhatkevin reviewed May 15, 2026

farhatkevin approved these changes May 15, 2026

finbarrtimbers added this pull request to the merge queue May 15, 2026

Merged via the queue into main with commit e91ada4 May 15, 2026
7 checks passed

finbarrtimbers deleted the finbarr/grpo-oc-feature-parity branch May 15, 2026 19:50

finbarrtimbers merged 15 commits into main from finbarr/grpo-oc-feature-parity

2 participants
