Skip to content

[Bugfix][Rollout] Validate batched RM reward lengths#313

Open
BreezyB1n wants to merge 2 commits into
vllm-project:mainfrom
BreezyB1n:batched-rm-length-plan
Open

[Bugfix][Rollout] Validate batched RM reward lengths#313
BreezyB1n wants to merge 2 commits into
vllm-project:mainfrom
BreezyB1n:batched-rm-length-plan

Conversation

@BreezyB1n

@BreezyB1n BreezyB1n commented Jul 2, 2026

Copy link
Copy Markdown

Fixes #312

Summary

  • Validate that batched reward model outputs contain exactly one reward per input sample before assignment.
  • Reject deceptive top-level batched RM return types such as dict, str, and bytes before materializing rewards.
  • Keep the rollout assignment loops strict after the boundary check.
  • Add CPU unit coverage for short, long, and invalid-type batched RM results.

Root Cause

The rollout code assigned batched rewards with zip(..., strict=False). When a custom batched RM returned too few rewards, later samples kept reward=None; when it returned too many rewards, the extra values were silently dropped. Also, blindly calling list(rewards) could convert a top-level dict, str, or bytes result into keys, characters, or integers whose length happened to match the sample count.

User Impact

Custom batched RM plugins now fail fast with a clear error instead of producing partially rewarded rollout data or assigning invalid values to sample.reward. Successful batched RM paths keep the same API and ordering behavior.

Testing

  • /Users/bytedance/miniconda3/bin/python3.13 - <<'PY' ... pytest.main(['tests/test_vllm_rollout.py', '-q', '-k', 'batched_rm']) ... PY
  • /Users/bytedance/miniconda3/bin/python3.13 -m pytest tests/plugin_contracts/test_plugin_path_loading_contracts.py -q -k 'custom_rm'
  • .venv/bin/python -m black --check vime/rollout/rm_hub/__init__.py vime/rollout/vllm_rollout.py tests/test_vllm_rollout.py
  • .venv/bin/python -m ruff check vime/rollout/rm_hub/__init__.py vime/rollout/vllm_rollout.py tests/test_vllm_rollout.py
  • git diff --check

Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: BreezyB1n <160214624+BreezyB1n@users.noreply.github.com>
@read-the-docs-community

read-the-docs-community Bot commented Jul 2, 2026

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces validation to ensure that the number of rewards returned by a batched reward model matches the number of samples. It updates batched_async_rm to raise a ValueError on mismatch, sets strict=True in zip calls within vllm_rollout.py, and adds corresponding unit tests. Feedback suggests adding explicit type validation for the returned rewards to prevent potential silent correctness bugs or generic errors when non-sequence types (like None, str, or dict) are returned.

Comment thread vime/rollout/rm_hub/__init__.py Outdated
Signed-off-by: BreezyB1n <160214624+BreezyB1n@users.noreply.github.com>
@BreezyB1n BreezyB1n marked this pull request as ready for review July 2, 2026 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Batched custom RM reward length mismatches are silently ignored

1 participant