Move maybe_evaluate to grpo_utils; dedupe calculate_token_counts by finbarrtimbers · Pull Request #1669 · allenai/open-instruct

finbarrtimbers · 2026-05-08T16:51:32Z

Pure extraction, no behavior change. Sets up OLMo-core GRPO (grpo.py) to share the same eval flow as grpo_fast.py.

gemini-code-assist

Code Review

This pull request refactors the GRPO implementation by moving the maybe_evaluate function and the calculate_token_counts logic from grpo_fast.py to grpo_utils.py to eliminate code duplication. It also updates the relevant imports and call sites across the codebase. The review feedback identifies typos in type-ignore comments within the moved code that should be corrected to ensure compatibility with static analysis tools.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 452cc4a4ed



farhatkevin · 2026-05-11T20:42:03Z

+import wandb
+from datasets import Dataset

+from open_instruct import data_loader as data_loader_lib


Do we expect grpo_utils importers to always have vllm installed? This PR makes grpo_utils import data_loader at module load time, which pulls in vllm; I think that changes behavior for non-vllm paths that only use shared helpers like some of the tests. Could a lazy import inside maybe_evaluate avoid that?

finbarrtimbers · May 13, 2026

I'd rather not use a lazy import: lazy imports make the code harder to reason about and I find them a bit messy. My goal is to keep all imports at the top of the file whenever possible.

I think this is fine because none of the CPU tests fail, and they would fail if this import were a problem, since they run without vllm installed. You're right to flag this, though! We should be careful about the imports.
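
For reference, the trade-off discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the PR (which keeps the import at the top of grpo_utils.py): `load_optional` and `maybe_evaluate_sketch` are hypothetical names, and the stdlib `json` module stands in for a heavy optional dependency such as vllm.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> ModuleType:
    """Import a module by name at call time instead of at module load time."""
    return importlib.import_module(module_name)


def maybe_evaluate_sketch(run_eval: bool, backend_module: str = "json") -> Optional[str]:
    """Hypothetical stand-in for an eval helper that defers its heavy import.

    The backend is imported only when evaluation actually runs, so callers
    that never evaluate do not need the backend installed.
    """
    if not run_eval:
        # No evaluation requested: the backend is never imported.
        return None
    backend = load_optional(backend_module)
    return backend.__name__
```

With a deferred import, a missing dependency only surfaces on the code path that needs it. With a top-level import, it surfaces as soon as the module is imported, which is easier to reason about and is the behavior the PR accepts.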

finbarrtimbers force-pushed the finbarr/extract-maybe-evaluate branch from 452cc4a to b6a2af8 on May 8, 2026 16:51

gemini-code-assist Bot reviewed May 8, 2026

Comment thread open_instruct/grpo_utils.py

Comment thread open_instruct/grpo_utils.py

chatgpt-codex-connector Bot reviewed May 8, 2026

Comment thread open_instruct/test_grpo_fast_eval.py

finbarrtimbers mentioned this pull request May 8, 2026

GRPO OLMo-core feature parity: eval, checkpointer, schedulers #1672

Merged

Move maybe_evaluate to grpo_utils; dedupe calculate_token_counts
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

46576fd

finbarrtimbers force-pushed the finbarr/extract-maybe-evaluate branch from b6a2af8 to 46576fd on May 11, 2026 18:31

Merge remote-tracking branch 'origin/main' into finbarr/extract-maybe-evaluate

11c47c9

finbarrtimbers requested a review from farhatkevin May 11, 2026 20:10

finbarrtimbers enabled auto-merge May 11, 2026 20:10

finbarrtimbers added 3 commits May 11, 2026 14:12

Retarget maybe_evaluate test mocks to grpo_utils
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

7bc12c1

Merge branch 'main' into finbarr/extract-maybe-evaluate

0fe6b3c

Merge branch 'main' into finbarr/extract-maybe-evaluate

b5816ed

farhatkevin approved these changes May 13, 2026

Merge branch 'main' into finbarr/extract-maybe-evaluate

95a4bdd

finbarrtimbers added this pull request to the merge queue May 13, 2026

Merged via the queue into main with commit b774f1a May 13, 2026
7 checks passed

finbarrtimbers deleted the finbarr/extract-maybe-evaluate branch May 13, 2026 18:44

finbarrtimbers merged 6 commits into main from finbarr/extract-maybe-evaluate
