feat: Adding quantization aware training support with Model-Optimizer #1756

mxinO · 2026-01-10T06:45:42Z

What does this PR do ?

Adds quantization-aware (QA) training support, e.g. QA-GRPO and QA on-policy distillation.

Usage

Run quantization-aware RL (QARL) by enabling simulated quantization with ModelOpt. Add the following config overrides:

++policy.generation.quant_cfg="NVFP4_DEFAULT_CFG" \
++policy.quant_cfg="NVFP4_DEFAULT_CFG" \

Other quantization formats are available, but currently only tensor-wise quantization formats are supported.
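For readers unfamiliar with the term, here is a minimal sketch of what "simulated" tensor-wise quantization means: values are rounded onto a low-precision grid and immediately mapped back to floats, with a single scale shared by the whole tensor. This is an illustration only. The function name `fake_quantize_per_tensor` is hypothetical, the symmetric integer grid is an assumption chosen for simplicity, and none of it reflects how ModelOpt implements NVFP4 or how this PR wires it in.

```python
# Illustrative only: per-tensor (tensor-wise) simulated quantization in pure Python.
# This is NOT ModelOpt's implementation and does not model the NVFP4 format; it
# uses a symmetric signed-integer grid to show the quantize-then-dequantize idea.

def fake_quantize_per_tensor(values, num_bits=4):
    """Round `values` onto a symmetric integer grid and map them back to floats.

    A single scale is shared by the whole tensor, which is what "tensor-wise"
    means here (as opposed to one scale per channel or per block).
    Returns (dequantized_values, scale).
    """
    qmax = 2 ** (num_bits - 1) - 1  # largest grid level, e.g. 7 for 4 bits
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        # Empty or all-zero tensor: nothing to scale.
        return list(values), 1.0
    scale = amax / qmax
    # Round to the nearest grid level; the clamp is a safety net against
    # floating-point overshoot at the extremes.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantized = [q * scale for q in quantized]
    return dequantized, scale


if __name__ == "__main__":
    weights = [0.7, -0.33, 0.12, 0.02]
    simulated, scale = fake_quantize_per_tensor(weights)
    print("scale:", scale)
    print("simulated weights:", simulated)
```

During quantization-aware training the forward pass would see the dequantized values, so the model learns to tolerate the rounding error. Gradient handling through the rounding step (commonly a straight-through estimator) is left out of this sketch.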

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Signed-off-by: Meng Xin <mxin@nvidia.com>

github-actions · 2026-01-10T06:46:04Z

⚠️ File Consistency Check

Check based on commit: 4f14851 (PR #1756 from mxin/qarl-3)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.


realAsma

@mxinO Is it possible to consolidate all ModelOpt-related modifications into one folder (following the patch-based integration ModelOpt adopted for MCore, Hugging Face, etc.)?


adding quantization aware RL

4f14851

Signed-off-by: Meng Xin <mxin@nvidia.com>

mxinO added 4 commits January 13, 2026 23:04

clean up

cb2813a

Signed-off-by: Meng Xin <mxin@nvidia.com>

convert to HF with modelopt weights

142ccc9

Signed-off-by: Meng Xin <mxin@nvidia.com>

remove dtensor impl

a569f76

Signed-off-by: Meng Xin <mxin@nvidia.com>

clean up

36794d4

Signed-off-by: Meng Xin <mxin@nvidia.com>

guyueh1 self-requested a review January 14, 2026 17:28

realAsma reviewed Jan 14, 2026


mxinO added 3 commits January 14, 2026 23:49

fix quantization layer spec

45f7871

Signed-off-by: Meng Xin <mxin@nvidia.com>

fix layer spec

a556640

Signed-off-by: Meng Xin <mxin@nvidia.com>

fix calibration

db4ca84

Signed-off-by: Meng Xin <mxin@nvidia.com>
