[LLM] Simplify IFEval reward aggregator by vmoens · Pull Request #3543 · pytorch/rl

vmoens · 2026-03-05T11:01:17Z

Stack from ghstack (oldest at bottom):

Replace the complex tiered multiplicative reward (structure multiplier,
quality bonus thresholds, complexity scaling) with a simple weighted
average of IFEval metrics plus a small additive format bonus.

The new reward is: weighted_avg(strict/loose metrics) + format_bonus,
where format_bonus adds 0.1 when the response contains a single answer
block and another 0.05 when it contains a single think block, so the
reward range is ~[0, 1.15].
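A minimal sketch of the aggregation described above. It assumes four IFEval-style metrics (strict/loose, prompt-level/instruction-level), each already in [0, 1], and that answer and think blocks are delimited by `<answer>...</answer>` and `<think>...</think>` tags. The metric names, the weights, and the tag syntax are hypothetical placeholders, not the values used in this PR. Only the weighted-average-plus-bonus shape and the 0.1/0.05 bonus values come from the description.

```python
import re

# Hypothetical weights over four IFEval-style metrics. The PR does not list
# its actual weights; these are placeholders that favor strict metrics.
DEFAULT_WEIGHTS: dict[str, float] = {
    "prompt_level_strict": 0.4,
    "inst_level_strict": 0.3,
    "prompt_level_loose": 0.2,
    "inst_level_loose": 0.1,
}

# Assumed tag syntax for "answer" and "think" blocks (non-greedy, spans newlines).
ANSWER_RE = re.compile(r"<answer>.*?</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def format_bonus(response: str) -> float:
    """Additive bonus: +0.1 for exactly one answer block, +0.05 for exactly one think block."""
    bonus = 0.0
    if len(ANSWER_RE.findall(response)) == 1:
        bonus += 0.1
    if len(THINK_RE.findall(response)) == 1:
        bonus += 0.05
    return bonus


def ifeval_reward(
    metrics: dict[str, float],
    response: str,
    weights: dict[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted average of metrics (each in [0, 1]) plus the format bonus."""
    total = sum(weights.values())
    # Normalizing by the weight sum keeps the average in [0, 1].
    weighted_avg = sum(w * metrics[name] for name, w in weights.items()) / total
    return weighted_avg + format_bonus(response)
```

Because the weights are normalized by their sum, the weighted average stays in [0, 1] whenever every metric does, and the bonus adds at most 0.1 + 0.05 = 0.15, which matches the stated range of roughly [0, 1.15]. Under this reading of "a single answer block", a response with two answer blocks earns no answer bonus.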

Made-with: Cursor

[ghstack-poisoned]

pytorch-bot · 2026-03-05T11:01:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3543

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 2 Cancelled Jobs, 6 Unrelated Failures

As of commit 2bcfff7 with merge base 491ed0f:

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Generate documentation / build-docs (3.12, 12.8) / linux-job (gh)
RuntimeError: Command docker exec -t 681ff9f2e857f52119afe0a4d931942ede6648c2768c706639e9dff65edccfa4 /exec failed with exit code 2
PR Label / add-label (gh)
Process completed with exit code 1.
Unit-tests on Windows / unittests-cpu (3.10, windows.4xlarge, cpu) / windows-job (gh)
test/test_custom_envs.py::TestPendulum::test_pendulum_env[device1]

CANCELLED JOBS - The following jobs were cancelled. Please retry:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
##[error]The operation was canceled.
Unit-tests on Linux / tests-cpu (3.14) / linux-job (gh)
##[error]The operation was canceled.

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Unit-tests on Linux / tests-cpu (3.10) / linux-job (gh) (trunk failure)
test/test_rb.py::TestStorages::test__rand_given_ndim_recompile
Unit-tests on Linux / tests-cpu (3.11) / linux-job (gh) (trunk failure)
test/test_rb.py::TestStorages::test__rand_given_ndim_recompile
Unit-tests on Linux / tests-cpu (3.12) / linux-job (gh) (trunk failure)
test/test_rb.py::TestStorages::test__rand_given_ndim_recompile
Unit-tests on Linux / tests-cpu (3.13) / linux-job (gh) (trunk failure)
test/test_rb.py::TestStorages::test__rand_given_ndim_recompile
Unit-tests on Linux / tests-olddeps (3.10, 11.8) / linux-job (gh) (trunk failure)
test/test_loggers.py::TestCSVLogger::test_log_video[mp4-steps1]
Unit-tests on Linux / tests-optdeps (3.12, 13.0) / linux-job (gh) (trunk failure)
test/test_rb.py::TestStorages::test__rand_given_ndim_recompile

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-03-05T11:01:33Z

⚠️ PR Title Label Error

Unknown or invalid prefix [LLM].

Current title: [LLM] Simplify IFEval reward aggregator

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

Prefix	Label Applied	Example
`[BugFix]`	BugFix	`[BugFix] Fix memory leak in collector`
`[Feature]`	Feature	`[Feature] Add new optimizer`
`[Doc]` or `[Docs]`	Documentation	`[Doc] Update installation guide`
`[Refactor]`	Refactoring	`[Refactor] Clean up module imports`
`[CI]`	CI	`[CI] Fix workflow permissions`
`[Test]` or `[Tests]`	Tests	`[Tests] Add unit tests for buffer`
`[Environment]` or `[Environments]`	Environments	`[Environments] Add Gymnasium support`
`[Data]`	Data	`[Data] Fix replay buffer sampling`
`[Performance]` or `[Perf]`	Performance	`[Performance] Optimize tensor ops`
`[BC-Breaking]`	bc breaking	`[BC-Breaking] Remove deprecated API`
`[Deprecation]`	Deprecation	`[Deprecation] Mark old function`
`[Quality]`	Quality	`[Quality] Fix typos and add codespell`

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

Update

2bcfff7

[ghstack-poisoned]

meta-cla bot added the CLA Signed label Mar 5, 2026

This was referenced Mar 5, 2026

[LLM] Rewrite GSM8K reward function to follow standard GRPO conventions #3542

Open

[LLM] Add MATH (competition mathematics) environment #3544

Open

github-actions bot added the llm/ label (LLM-related PR, triggers LLM CI tests) Mar 5, 2026

This was referenced Mar 5, 2026

[LLM] Add Countdown numbers-game environment #3545

Open

[LLM] Wire MATH and Countdown into GRPO and Expert Iteration scripts #3546

Open

vmoens wants to merge 1 commit into gh/vmoens/235/base from gh/vmoens/235/head
