@cmunley1
Contributor

@cmunley1 cmunley1 commented Jan 28, 2026

Add docs on how Gym and RL enforce monotonicity and perform on-policy token ID corrections. Add hypothetical docs on how to disable these checks for non-monotonic trajectories, e.g. Qwen3 thinking or agents with context management.

Disabling would be done by NVIDIA-NeMo/RL#1812, potentially in NVIDIA-NeMo/RL#1779.
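
For readers new to the term, here is a rough sketch of what a monotonicity check could look like. It assumes that "monotonic" means each model call's token IDs begin with the complete token sequence of the previous call, so the trajectory only ever grows by appending. The function name `is_monotonic` and the sample token IDs are hypothetical and are not taken from the NeMo Gym or NeMo RL code.

```python
from typing import Sequence


def is_monotonic(turns: Sequence[Sequence[int]]) -> bool:
    """Return True if every turn's token IDs extend the previous turn's as a prefix.

    Each entry in `turns` is assumed to hold the full token sequence
    (prompt plus generation) observed at that model call.
    """
    for prev, curr in zip(turns, turns[1:]):
        # A shorter sequence, or one that differs inside the shared prefix,
        # means earlier tokens were dropped or rewritten.
        if len(curr) < len(prev) or list(curr[: len(prev)]) != list(prev):
            return False
    return True


# Hypothetical token IDs for illustration only.
append_only = [[1, 2, 3], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]]
rewritten = [[1, 2, 3, 9, 9], [1, 2, 3, 4, 5]]  # earlier tokens replaced

print(is_monotonic(append_only))
print(is_monotonic(rewritten))
```

Under this assumption, a trajectory where earlier reasoning tokens are stripped from the history, or where an agent truncates its context, would fail the check, which is the case the disable option is meant to cover.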

cmunley1 and others added 5 commits January 27, 2026 17:45
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
- Moved the article to concepts (it's mostly conceptual in its current
form)
- Reformatted the content
- Added crosslinks to/from doc

Please double-check that I did not distort any meaning as part of
this change.

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
@cmunley1 cmunley1 changed the title from "document on policy training" to "docs: on policy training" Jan 29, 2026
@cmunley1
Contributor Author

RL PR for off-policy training: NVIDIA-NeMo/RL#1840

3 participants