[Bugfix][Rollout] Validate batched RM reward lengths by BreezyB1n · Pull Request #313 · vllm-project/vime

BreezyB1n · 2026-07-02T06:56:51Z

Fixes #312

Summary

Validate that batched reward model outputs contain exactly one reward per input sample before assignment.
Reject deceptive top-level batched RM return types such as dict, str, and bytes before materializing rewards.
Keep the rollout assignment loops strict after the boundary check.
Add CPU unit coverage for short, long, and invalid-type batched RM results.

Root Cause

The rollout code assigned batched rewards with zip(..., strict=False). When a custom batched RM returned too few rewards, later samples kept reward=None; when it returned too many rewards, the extra values were silently dropped. Also, blindly calling list(rewards) could convert a top-level dict, str, or bytes result into keys, characters, or integers whose length happened to match the sample count.

User Impact

Custom batched RM plugins now fail fast with a clear error instead of producing partially rewarded rollout data or assigning invalid values to sample.reward. Successful batched RM paths keep the same API and ordering behavior.

Testing

/Users/bytedance/miniconda3/bin/python3.13 - <<'PY' ... pytest.main(['tests/test_vllm_rollout.py', '-q', '-k', 'batched_rm']) ... PY
/Users/bytedance/miniconda3/bin/python3.13 -m pytest tests/plugin_contracts/test_plugin_path_loading_contracts.py -q -k 'custom_rm'
.venv/bin/python -m black --check vime/rollout/rm_hub/__init__.py vime/rollout/vllm_rollout.py tests/test_vllm_rollout.py
.venv/bin/python -m ruff check vime/rollout/rm_hub/__init__.py vime/rollout/vllm_rollout.py tests/test_vllm_rollout.py
git diff --check

Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: BreezyB1n <160214624+BreezyB1n@users.noreply.github.com>

read-the-docs-community · 2026-07-02T06:57:53Z

Documentation build overview

📚 vime | 🛠️ Build #33405701 | 📁 Comparing d529325 against latest (5532763)

🔍 Preview build

41 files changed · ± 41 modified

± Modified

gemini-code-assist

Code Review

This pull request introduces validation to ensure that the number of rewards returned by a batched reward model matches the number of samples. It updates batched_async_rm to raise a ValueError on mismatch, sets strict=True in zip calls within vllm_rollout.py, and adds corresponding unit tests. Feedback suggests adding explicit type validation for the returned rewards to prevent potential silent correctness bugs or generic errors when non-sequence types (like None, str, or dict) are returned.

Signed-off-by: BreezyB1n <160214624+BreezyB1n@users.noreply.github.com>

[Bugfix][Rollout] Validate batched RM reward lengths

007c134

Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: BreezyB1n <160214624+BreezyB1n@users.noreply.github.com>

gemini-code-assist Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread vime/rollout/rm_hub/__init__.py Outdated

[Bugfix][Rollout] Reject invalid batched RM result types

d529325

Signed-off-by: BreezyB1n <160214624+BreezyB1n@users.noreply.github.com>

BreezyB1n marked this pull request as ready for review July 2, 2026 07:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bugfix][Rollout] Validate batched RM reward lengths#313

[Bugfix][Rollout] Validate batched RM reward lengths#313
BreezyB1n wants to merge 2 commits into
vllm-project:mainfrom
BreezyB1n:batched-rm-length-plan

BreezyB1n commented Jul 2, 2026 •

edited

Loading

Uh oh!

read-the-docs-community Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

BreezyB1n commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

User Impact

Testing

Uh oh!

read-the-docs-community Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Documentation build overview

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

BreezyB1n commented Jul 2, 2026 •

edited

Loading

read-the-docs-community Bot commented Jul 2, 2026 •

edited

Loading