Fix #941: Correctly define choices for --reward-model as a tuple#974
Merged
hwchen2017 merged 1 commit intodeepspeedai:masterfrom Jun 9, 2025
Merged
Conversation
72b76d2 to
ce5a1e8
Compare
Contributor
|
Hi @Flink-ddd, can you fix the formatting error? |
…deepspeedai#941 Signed-off-by: Vensenmu <vensenmu@gmail.com>
ce5a1e8 to
5addcd9
Compare
Contributor
Author
|
Hi @hwchen2017, I just fixed the formatting error and submitted again, Thank you very much for reminding me. |
hwchen2017
approved these changes
Jun 9, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #941
Problem Description:
In the e2e_rlhf.py script, the choices option for the --reward-model parameter was incorrectly defined as a string ("350m") instead of a single-element tuple ("350m",). This caused argparse to interpret it as a sequence of individual characters ('(', '3', '5', '0', 'm', ')') rather than a valid list of options.
Solution:
As suggested in issue #941, this commit changes the definition of choices to ("350m",) by adding a trailing comma after the string. This ensures it is correctly parsed as a tuple containing the single element "350m".
Testing:
The script correctly accepts the parameter when run with the valid option --reward-model 350m.
When run with an invalid option (e.g., --reward-model 123m), argparse correctly throws an "invalid choice" error.
The script correctly uses the default value when the --reward-model parameter is not provided.
Additional Notes (Optional):
Thanks to @ChenDaiwei-99 (the original reporter of the issue) for identifying the problem and suggesting the solution.