feat: Support lora in dtensor grpo workflow[3/3]: async vllm #1752
base: ruit/lora_grpo_sync_non_colocated
What does this PR do?
Adds support for the async vLLM engine in the DTensor LoRA GRPO workflow.
TODOs
Issues
[3/3] of #1597
closes #1597
Usage
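A minimal sketch of how this might be enabled, assuming the usual YAML-override style of the GRPO example configs; the key names and values below (`policy.dtensor_cfg.lora_cfg`, `policy.generation.vllm_cfg.async_engine`, the example ranks) are illustrative assumptions, not confirmed by this PR.

```yaml
# Hypothetical GRPO config overrides -- key names are illustrative.
policy:
  dtensor_cfg:
    enabled: true
    lora_cfg:             # LoRA support from parts [1/3] and [2/3] of #1597 (assumed key)
      enabled: true
      lora_rank: 32       # example rank
      lora_alpha: 64      # example scaling factor
  generation:
    backend: vllm
    vllm_cfg:
      async_engine: true  # this PR: route generation through the async vLLM engine
```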
Result
Async mode verified with:
- Qwen/Qwen3-0.6B
- Llama-3.2-3B-Instruct
- Llama-3.1-8B
Before your PR is "Ready for review"
Pre checks:
Additional Information