
Conversation

@guyueh1 (Contributor) commented Jan 18, 2026

What does this PR do?

Adds an interface to configure DeepEP usage in the Megatron backend.

closes #1396

Dup of #1645

Issues

  • Closes #1396

Usage

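A minimal sketch, assuming the key names, defaults, and nesting under policy.megatron_cfg described in the review summary below; the exact layout of a given recipe file may differ:

```yaml
policy:
  megatron_cfg:
    # Enable DeepEP for MoE token dispatch (default: false)
    moe_enable_deepep: false
    # Token dispatcher backend: "allgather", "alltoall", or "flex"
    moe_token_dispatcher_type: "allgather"
    # Overlap shared-expert compute with token dispatch (default: false)
    moe_shared_expert_overlap: false
```

To actually use DeepEP, flip moe_enable_deepep to true and pair it with the flex dispatcher, as some of the GRPO recipes in this change do.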

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • New Features

    • Added new Mixture-of-Experts (MOE) configuration options: enable/disable DeepEP optimization, select token dispatcher type (allgather, alltoall, flex), and control shared expert overlap behavior for enhanced training flexibility.
  • Chores

    • Added DeepEP as an optional dependency to support advanced MOE optimizations.


parthmannan and others added 30 commits January 15, 2026 10:36
@guyueh1 guyueh1 self-assigned this Jan 18, 2026
@guyueh1 guyueh1 requested a review from a team as a code owner January 18, 2026 19:44
@guyueh1 guyueh1 added the Performance Related to improving performance label Jan 18, 2026
@guyueh1 guyueh1 requested review from a team as code owners January 18, 2026 19:44
@guyueh1 guyueh1 added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Jan 18, 2026
@coderabbitai bot (Contributor) commented Jan 18, 2026

📝 Walkthrough

Adds three new MOE-related configuration fields (moe_enable_deepep, moe_token_dispatcher_type, moe_shared_expert_overlap) across multiple configuration files and extends the MegatronConfig TypedDict schema. Also wires these fields into the Megatron policy worker and adds DeepEP as an optional dependency.

Changes

  • Configuration Files - Distillation & Math Recipes
    Files: examples/configs/distillation_math.yaml, examples/configs/distillation_math_megatron.yaml, examples/configs/dpo.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml
    Summary: Added the three new MOE config fields with default values (moe_enable_deepep: false, moe_token_dispatcher_type: "allgather", moe_shared_expert_overlap: false) under policy.megatron_cfg.

  • Configuration Files - GRPO Recipes
    Files: examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml, examples/configs/recipes/llm/grpo-*.yaml
    Summary: Added two to three of the new MOE config fields under policy.megatron_cfg; some recipes use moe_enable_deepep: true and moe_token_dispatcher_type: flex instead of the defaults (see the sketch after this list).

  • Configuration Files - VLM & Assistant Recipes
    Files: examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml, examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml
    Summary: Added the three new MOE config fields with default values under policy.megatron_cfg.

  • Schema Definition
    Files: nemo_rl/models/policy/__init__.py
    Summary: Extended the MegatronConfig TypedDict with three new fields: moe_enable_deepep (bool), moe_token_dispatcher_type (str with options "allgather"/"alltoall"/"flex"), and moe_shared_expert_overlap (bool), including detailed docstring comments.

  • Policy Worker Implementation
    Files: nemo_rl/models/policy/workers/megatron_policy_worker.py
    Summary: Reads the three new MOE config fields from config.megatron_cfg and assigns them to modelCfg during initialization.

  • Dependency Management
    Files: pyproject.toml
    Summary: Added a DeepEP git dependency to both the fsdp and mcore optional dependency groups.

  • Test Configuration
    Files: tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/policy/test_megatron_worker.py
    Summary: Added the three new MOE config fields to test configurations with default values.
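For illustration, a DeepEP-enabled override as used by some of the GRPO recipes above might look like the following; this is a sketch based on the values listed in that row, not copied from any specific recipe file:

```yaml
policy:
  megatron_cfg:
    # DeepEP-enabled variant: enable DeepEP and switch the dispatcher to "flex"
    moe_enable_deepep: true
    moe_token_dispatcher_type: "flex"
```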

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • yaoyu-33
  • ashors1
🚥 Pre-merge checks: 4 passed, 2 failed

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Test Results For Major Changes (⚠️ Warning): The PR adds the DeepEP support feature but lacks test results, performance benchmarks, and convergence validation, and the new keys are missing from some exemplar configs, raising runtime stability concerns. Resolution: include test results demonstrating feature correctness, provide performance numbers for DeepSeekv3 training, mark the new config keys as NotRequired, use .get() with defaults, and ensure all exemplar configs include these keys.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title "perf: DeepEP interface in megatron backend" accurately summarizes the main change: adding a DeepEP configuration interface to the Megatron backend.
  • Linked Issues Check (✅ Passed): The PR addresses issue #1396 by adding DeepEP configuration support to the Megatron backend through config keys, TypedDict updates, and policy worker integration.
  • Out of Scope Changes Check (✅ Passed): All changes are directly related to the DeepEP interface implementation: configuration keys, TypedDict schema updates, policy worker integration, and the dependency addition.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@nemo_rl/models/policy/__init__.py`:
- Around line 186-195: Update the TypedDict in nemo_rl/models/policy/__init__.py
to mark moe_enable_deepep, moe_token_dispatcher_type, and
moe_shared_expert_overlap as NotRequired and document recommended defaults
(e.g., False, 'allgather', False); then change the access in
megatron_policy_worker.py (around the logic at lines ~661–667) to use
config.get('moe_enable_deepep', False), config.get('moe_token_dispatcher_type',
'allgather'), and config.get('moe_shared_expert_overlap', False) so missing keys
won’t KeyError; finally add those three keys with the recommended default values
to the exemplar YAMLs (grpo_math_70B_megatron.yaml, grpo_math_8B_megatron.yaml,
grpo_math_qwen30ba3b_megatron.yaml).

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 18, 2026
@guyueh1 guyueh1 requested a review from terrykong January 18, 2026 21:55
@guyueh1 (Contributor, Author) commented Jan 19, 2026

@terrykong could you review? This is needed urgently for a recent DeepSeek performance study.

@terrykong (Contributor) left a comment

Lgtm. Can you please resolve the two comments?

Fyi: @yuki-97

@terrykong terrykong merged commit c1f12d4 into main Jan 20, 2026
68 of 74 checks passed
@terrykong terrykong deleted the guyueh/deepep_mcore_training branch January 20, 2026 06:55

Labels

CI:L2 (Run doctests, unit tests, functional tests, and convergence tests), Performance (Related to improving performance)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support DeepEP usage in MCore path