
Conversation

@guyueh1 (Contributor) commented Jan 18, 2026

What does this PR do?

Adds an interface to configure DeepEP usage in the Megatron backend.

closes #1396

Dup of #1645

Issues

  • Closes #1396

Usage

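A minimal sketch, assuming the key names, defaults, and nesting under policy.megatron_cfg described in the review summary below; the exact layout of a given recipe file may differ:

```yaml
policy:
  megatron_cfg:
    # Enable DeepEP for MoE token dispatch (default: false)
    moe_enable_deepep: false
    # Token dispatcher backend: "allgather", "alltoall", or "flex"
    moe_token_dispatcher_type: "allgather"
    # Overlap shared-expert compute with token dispatch (default: false)
    moe_shared_expert_overlap: false
```

To actually use DeepEP, flip moe_enable_deepep to true and pair it with the flex dispatcher, as some of the GRPO recipes in this change do.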

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • New Features

    • Added new Mixture-of-Experts (MOE) configuration options: enable/disable DeepEP optimization, select token dispatcher type (allgather, alltoall, flex), and control shared expert overlap behavior for enhanced training flexibility.
  • Chores

    • Added DeepEP as an optional dependency to support advanced MOE optimizations.


parthmannan and others added 30 commits January 15, 2026 10:36
@guyueh1 guyueh1 self-assigned this Jan 18, 2026
@guyueh1 guyueh1 requested a review from a team as a code owner January 18, 2026 19:44
@guyueh1 guyueh1 added the Performance Related to improving performance label Jan 18, 2026
@guyueh1 guyueh1 requested review from a team as code owners January 18, 2026 19:44
@guyueh1 guyueh1 added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Jan 18, 2026
@coderabbitai bot (Contributor) commented Jan 18, 2026

📝 Walkthrough

Adds three new MOE-related configuration fields (moe_enable_deepep, moe_token_dispatcher_type, moe_shared_expert_overlap) across multiple configuration files and extends the MegatronConfig TypedDict schema. Also wires these fields into the Megatron policy worker and adds DeepEP as an optional dependency.

Changes

  • Configuration Files - Distillation & Math Recipes
    Files: examples/configs/distillation_math.yaml, examples/configs/distillation_math_megatron.yaml, examples/configs/dpo.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml
    Summary: Added the three new MOE config fields with default values (moe_enable_deepep: false, moe_token_dispatcher_type: "allgather", moe_shared_expert_overlap: false) under policy.megatron_cfg.

  • Configuration Files - GRPO Recipes
    Files: examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml, examples/configs/recipes/llm/grpo-*.yaml
    Summary: Added two to three of the new MOE config fields under policy.megatron_cfg; some recipes use moe_enable_deepep: true and moe_token_dispatcher_type: flex instead of the defaults (see the sketch after this list).

  • Configuration Files - VLM & Assistant Recipes
    Files: examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml, examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml
    Summary: Added the three new MOE config fields with default values under policy.megatron_cfg.

  • Schema Definition
    Files: nemo_rl/models/policy/__init__.py
    Summary: Extended the MegatronConfig TypedDict with three new fields: moe_enable_deepep (bool), moe_token_dispatcher_type (str with options "allgather"/"alltoall"/"flex"), and moe_shared_expert_overlap (bool), including detailed docstring comments.

  • Policy Worker Implementation
    Files: nemo_rl/models/policy/workers/megatron_policy_worker.py
    Summary: Reads the three new MOE config fields from config.megatron_cfg and assigns them to modelCfg during initialization.

  • Dependency Management
    Files: pyproject.toml
    Summary: Added a DeepEP git dependency to both the fsdp and mcore optional dependency groups.

  • Test Configuration
    Files: tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/policy/test_megatron_worker.py
    Summary: Added the three new MOE config fields to test configurations with default values.
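For illustration, a DeepEP-enabled override as used by some of the GRPO recipes above might look like the following; this is a sketch based on the values listed in that row, not copied from any specific recipe file:

```yaml
policy:
  megatron_cfg:
    # DeepEP-enabled variant: enable DeepEP and switch the dispatcher to "flex"
    moe_enable_deepep: true
    moe_token_dispatcher_type: "flex"
```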

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • yaoyu-33
  • ashors1
🚥 Pre-merge checks: 4 passed, 2 failed

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Test Results For Major Changes (⚠️ Warning): The PR adds the DeepEP support feature but lacks test results, performance benchmarks, and convergence validation, and the new keys are missing from some exemplar configs, raising runtime stability concerns. Resolution: include test results demonstrating feature correctness, provide performance numbers for DeepSeekv3 training, mark the new config keys as NotRequired, use .get() with defaults, and ensure all exemplar configs include these keys.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title "perf: DeepEP interface in megatron backend" accurately summarizes the main change: adding a DeepEP configuration interface to the Megatron backend.
  • Linked Issues Check (✅ Passed): The PR addresses issue #1396 by adding DeepEP configuration support to the Megatron backend through config keys, TypedDict updates, and policy worker integration.
  • Out of Scope Changes Check (✅ Passed): All changes are directly related to the DeepEP interface implementation: configuration keys, TypedDict schema updates, policy worker integration, and the dependency addition.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@nemo_rl/models/policy/__init__.py`:
- Around line 186-195: Update the TypedDict in nemo_rl/models/policy/__init__.py
to mark moe_enable_deepep, moe_token_dispatcher_type, and
moe_shared_expert_overlap as NotRequired and document recommended defaults
(e.g., False, 'allgather', False); then change the access in
megatron_policy_worker.py (around the logic at lines ~661–667) to use
config.get('moe_enable_deepep', False), config.get('moe_token_dispatcher_type',
'allgather'), and config.get('moe_shared_expert_overlap', False) so missing keys
won’t KeyError; finally add those three keys with the recommended default values
to the exemplar YAMLs (grpo_math_70B_megatron.yaml, grpo_math_8B_megatron.yaml,
grpo_math_qwen30ba3b_megatron.yaml).

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 18, 2026
@guyueh1 guyueh1 requested a review from terrykong January 18, 2026 21:55
@guyueh1 (Contributor, Author) commented Jan 19, 2026

@terrykong could you review? This is needed urgently for a recent DeepSeek performance study.

@terrykong (Contributor) left a comment

Lgtm. Can you please resolve the two comments?

Fyi: @yuki-97

@terrykong terrykong merged commit c1f12d4 into main Jan 20, 2026
68 of 74 checks passed
@terrykong terrykong deleted the guyueh/deepep_mcore_training branch January 20, 2026 06:55

Labels

CI:L2 (Run doctests, unit tests, functional tests, and convergence tests), Performance (Related to improving performance)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support DeepEP usage in MCore path