Move maybe_evaluate to grpo_utils; dedupe calculate_token_counts #1669
Force-pushed from 452cc4a to b6a2af8.
Code Review
This pull request refactors the GRPO implementation by moving the maybe_evaluate function and the calculate_token_counts logic from grpo_fast.py to grpo_utils.py to eliminate code duplication. It also updates the relevant imports and call sites across the codebase. The review feedback identifies typos in type-ignore comments within the moved code that should be corrected to ensure compatibility with static analysis tools.
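The review does not quote the exact typos, but the failure mode is easy to miss. mypy and pyright only treat a comment as a suppression when the word after `type:` is exactly `ignore` (optionally followed by `[codes]`). A misspelling such as `# type: ingore` is read as an ordinary comment, so the suppression silently stops applying. A minimal sketch of a checker that flags such near-misses follows. The function name `find_misspelled_type_ignores` and the 0.6 similarity cutoff are illustrative assumptions, not part of open_instruct or of either type checker.

```python
from __future__ import annotations

import difflib
import io
import re
import tokenize

# Captures the word right after a "type:" pragma, e.g.
# "# type: ignore[attr-defined]" -> "ignore", "# type: ingore" -> "ingore".
_TYPE_PRAGMA = re.compile(r"#\s*type\s*:\s*([A-Za-z_]+)")


def find_misspelled_type_ignores(source: str) -> list[tuple[int, str]]:
    """Return (line, comment) pairs whose "type:" pragma looks like a typo of "ignore"."""
    problems = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue  # only real comments, never "#" inside string literals
        match = _TYPE_PRAGMA.search(tok.string)
        if match is None:
            continue
        word = match.group(1)
        # Near "ignore" but not equal to it: probably a typo. Real type comments
        # such as "# type: int" fall well below the similarity cutoff.
        if word != "ignore" and difflib.get_close_matches(word, ["ignore"], n=1, cutoff=0.6):
            problems.append((tok.start[0], tok.string))
    return problems


sample = (
    "x = foo()  # type: ingore[attr-defined]\n"
    "y = bar()  # type: ignore[attr-defined]\n"
    "z = 1  # type: int\n"
)
print(find_misspelled_type_ignores(sample))
# [(1, '# type: ingore[attr-defined]')]
```

Using `tokenize` instead of a plain line regex means a `#` inside a string literal is never mistaken for a comment.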
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 452cc4a4ed
…uthored-By: Claude Opus 4.7 <noreply@anthropic.com>
Force-pushed from b6a2af8 to 46576fd.
 import wandb
 from datasets import Dataset

+from open_instruct import data_loader as data_loader_lib
Do we expect everything that imports grpo_utils to have vllm installed? This PR makes grpo_utils import data_loader at module load time, which pulls in vllm. I think that changes behavior for non-vllm code paths that only use shared helpers, such as some of the tests. Could a lazy import inside maybe_evaluate avoid that?
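The concern comes down to when Python executes an import: a module-level import runs as soon as the shared module is imported, while an import inside a function runs only when that function is called. A self-contained sketch of the difference, assuming hypothetical stand-ins (`heavy_dep` plays the role of vllm; `eager_utils` and `lazy_utils` play the role of a shared helper module). None of these names come from open_instruct.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins: heavy_dep ~ vllm, *_utils ~ a shared helper module.
MODULES = {
    "heavy_dep.py": "LOADED = True\n",
    "eager_utils.py": textwrap.dedent("""
        import heavy_dep  # runs as soon as eager_utils is imported

        def evaluate():
            return heavy_dep.LOADED
    """),
    "lazy_utils.py": textwrap.dedent("""
        def evaluate():
            import heavy_dep  # runs only when evaluate() is called
            return heavy_dep.LOADED
    """),
}


def imports_dependency(module_name: str, dep_name: str, search_dir: str) -> bool:
    """Import module_name from search_dir and report whether dep_name got loaded."""
    # Drop cached copies so each check starts from a clean slate.
    for name in (module_name, dep_name):
        sys.modules.pop(name, None)
    sys.path.insert(0, search_dir)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return dep_name in sys.modules
    finally:
        sys.path.remove(search_dir)


with tempfile.TemporaryDirectory() as tmp:
    for filename, body in MODULES.items():
        Path(tmp, filename).write_text(body)
    eager = imports_dependency("eager_utils", "heavy_dep", tmp)
    lazy = imports_dependency("lazy_utils", "heavy_dep", tmp)

print(f"eager import loads heavy_dep: {eager}")  # -> True
print(f"lazy import loads heavy_dep: {lazy}")    # -> False
```

The lazy variant keeps the dependency optional for callers that never evaluate, at the cost raised in the reply: the module's dependencies are no longer all visible at the top of the file.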
I'd rather not use a lazy import: lazy imports make the code harder to reason about, and I find them a bit messy. My goal is to keep all imports at the top of the file whenever possible.
I think this is fine, since it doesn't cause any of the CPU tests to fail, and they would fail if it were a problem because they run without vllm installed. You're right to flag this, though! We should be careful about the imports.
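This argument relies on the CPU test jobs implicitly acting as an import check. One possible way to make that check explicit (not something this PR adds) is to block the dependency and confirm the module still imports. The sketch relies on the documented import-system behavior that a `None` entry in `sys.modules` makes the corresponding `import` raise `ModuleNotFoundError`. `imports_without`, `cpu_safe_helpers`, and `gpu_helpers` are hypothetical names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def imports_without(module_name: str, blocked: str, search_dir: str) -> bool:
    """Return True if module_name imports cleanly while `blocked` is unavailable."""
    saved = sys.modules.get(blocked)
    sys.modules.pop(module_name, None)
    # A None entry makes `import <blocked>` raise ModuleNotFoundError,
    # mimicking a CPU-only environment where the package is not installed.
    sys.modules[blocked] = None
    sys.path.insert(0, search_dir)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        # Undo every change so the check leaves the interpreter as it found it.
        sys.path.remove(search_dir)
        sys.modules.pop(module_name, None)
        if saved is None:
            sys.modules.pop(blocked, None)
        else:
            sys.modules[blocked] = saved


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "cpu_safe_helpers.py").write_text("def add(a, b):\n    return a + b\n")
    Path(tmp, "gpu_helpers.py").write_text("import vllm\n\ndef backend():\n    return vllm\n")
    cpu_ok = imports_without("cpu_safe_helpers", "vllm", tmp)
    gpu_ok = imports_without("gpu_helpers", "vllm", tmp)

print(cpu_ok, gpu_ok)  # -> True False
```

Because the blocking is done in-process, the check behaves the same whether or not vllm happens to be installed on the machine running it.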
Pure extraction, no behavior change. Sets up OLMo-core GRPO (grpo.py) to share the same eval flow as grpo_fast.py.
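A minimal sketch of the deduplication pattern, assuming a hypothetical layout: one shared module defines the helper, and two entrypoints (standing in for grpo_fast.py and grpo.py) import it rather than each keeping its own copy. `should_run_eval` and its arguments are an illustrative stand-in, not the real `maybe_evaluate` signature.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a single shared definition, imported by two entrypoints.
FILES = {
    "shared_utils.py": (
        "def should_run_eval(step, eval_every):\n"
        "    # Placeholder logic: evaluate on every `eval_every`-th step.\n"
        "    return eval_every > 0 and step % eval_every == 0\n"
    ),
    "trainer_fast.py": "from shared_utils import should_run_eval\n",
    "trainer_core.py": "from shared_utils import should_run_eval\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for filename, body in FILES.items():
        Path(tmp, filename).write_text(body)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        for mod in ("shared_utils", "trainer_fast", "trainer_core"):
            sys.modules.pop(mod, None)
        fast = importlib.import_module("trainer_fast")
        core = importlib.import_module("trainer_core")
    finally:
        sys.path.remove(tmp)

# Both entrypoints resolve to the one shared function object.
same_object = fast.should_run_eval is core.should_run_eval
print(same_object)  # -> True
print(fast.should_run_eval(10, 5), core.should_run_eval(7, 5))  # -> True False
```

Because both entrypoints bind to the same function object, a fix made in the shared module reaches both, which is the property a pure extraction aims for.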