docs: on policy training #613

cmunley1 · 2026-01-28T01:46:48Z

Add docs for how Gym and RL enforce monotonicity and perform on-policy token ID corrections. Add hypothetical docs on how to disable these checks for non-monotonic trajectories, e.g. Qwen3 thinking or agents with context management.

Disabling the checks would be done by NVIDIA-NeMo/RL#1812, potentially in NVIDIA-NeMo/RL#1779.
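The description refers to monotonicity checks and on-policy token ID corrections. The following is a minimal Python sketch of those two ideas, not the NeMo Gym or NeMo RL implementation. The names `is_monotonic` and `correct_prompt_ids` and the `boundary_id` parameter (an end-of-turn token ID) are hypothetical. In this sketch, a trajectory counts as monotonic when the next turn's re-tokenized prompt begins with the exact token IDs of the previous turn (prompt plus generation). The correction splices the model's actual token IDs back in when re-tokenization changed them.

```python
from typing import List, Sequence


def is_monotonic(prev_turn_ids: Sequence[int], next_prompt_ids: Sequence[int]) -> bool:
    """Return True if next_prompt_ids starts with every token of prev_turn_ids.

    prev_turn_ids: the previous turn's prompt plus generated tokens, exactly
    as the model saw and produced them.
    next_prompt_ids: the next turn's prompt after re-applying the chat
    template and re-tokenizing the conversation.
    """
    n = len(prev_turn_ids)
    return len(next_prompt_ids) >= n and list(next_prompt_ids[:n]) == list(prev_turn_ids)


def correct_prompt_ids(
    prev_turn_ids: Sequence[int],
    next_prompt_ids: Sequence[int],
    boundary_id: int,
) -> List[int]:
    """Replace the re-tokenized prefix with the previous turn's actual token IDs.

    Hypothetical assumptions: the previous turn ends with boundary_id (an
    end-of-turn token), and the re-tokenized prompt contains the same number
    of boundary tokens up to the end of that turn. Everything after that
    point (tool output, the next user message, ...) is kept as re-tokenized.
    """
    if not prev_turn_ids or prev_turn_ids[-1] != boundary_id:
        raise ValueError("previous turn must end with the boundary token")
    k = list(prev_turn_ids).count(boundary_id)
    seen = 0
    for i, tok in enumerate(next_prompt_ids):
        if tok == boundary_id:
            seen += 1
            if seen == k:
                # Position i closes the previous turn in the re-tokenized prompt.
                return list(prev_turn_ids) + list(next_prompt_ids[i + 1:])
    raise ValueError("re-tokenized prompt has fewer boundary tokens than the previous turn")


# Usage: token 2 stands in for an end-of-turn ID; re-tokenization turned 21 into 99.
prev = [10, 11, 2, 20, 21, 2]
nxt = [10, 11, 2, 20, 99, 2, 30, 31]
print(is_monotonic(prev, nxt))    # False
fixed = correct_prompt_ids(prev, nxt, boundary_id=2)
print(fixed)                      # [10, 11, 2, 20, 21, 2, 30, 31]
print(is_monotonic(prev, fixed))  # True
```

In this sketch the splice is only meaningful when the chat history is carried forward unchanged. If earlier content is dropped or rewritten (for example, reasoning removed from previous turns, or context trimmed by an agent), the previous turn's tokens legitimately do not appear in the next prompt. That is the non-monotonic case the description proposes to handle by disabling the checks.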

Signed-off-by: Christian Munley <cmunley@nvidia.com>

- Moved the article to concepts (it's mostly conceptual in its current form)
- Reformatted the content
- Added crosslinks to/from the doc

Please double-check that I did not distort any meaning as part of this change.

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>

Signed-off-by: Lawrence Lane <llane@nvidia.com>

cmunley1 · 2026-01-29T04:07:33Z

RL PR for off-policy training: NVIDIA-NeMo/RL#1840

cmunley1 and others added 5 commits January 27, 2026 17:45

document on policy training

8bfd0cb

Signed-off-by: Christian Munley <cmunley@nvidia.com>

small fix

11bc5ea

Signed-off-by: Christian Munley <cmunley@nvidia.com>

move location

9ad3812

Signed-off-by: Christian Munley <cmunley@nvidia.com>

doc build fix

e029fad

Signed-off-by: Lawrence Lane <llane@nvidia.com>

cmunley1 changed the title ~~document on policy training~~ docs: on policy training Jan 29, 2026
