[None][fix] Update moe hidden_size in communicator for nemotron-h #12890

Open
Wanli-Jiang wants to merge 1 commit into NVIDIA:main from Wanli-Jiang:user/williamj/fix-nemotronh-workspace

Conversation

Collaborator

@Wanli-Jiang Wanli-Jiang commented Apr 9, 2026

Summary by CodeRabbit

  • Refactor
    • Enhanced Mixture of Experts (MOE) communication strategy configuration to support flexible hidden size parameter handling, enabling more efficient strategy selection and improved configuration options for MOE-based models.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
@Wanli-Jiang Wanli-Jiang requested a review from a team as a code owner April 9, 2026 10:30
@Wanli-Jiang Wanli-Jiang requested a review from QiJune April 9, 2026 10:30
@Wanli-Jiang
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

This PR adds an optional hidden_size parameter to communication strategy factory methods, enabling MoE-specific hidden size overrides when creating communication strategies. The configurable MoE module now passes its own hidden_size to the factory during strategy creation.

Changes

Cohort: Communication Strategy Hidden Size Override
File(s): tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py, tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py
Summary: Added an optional hidden_size: Optional[int] parameter to create_strategy() and _create_forced_method() in the factory. Parameter extraction now uses the new argument when provided and otherwise falls back to model_config.pretrained_config.hidden_size. The resolved hidden_size is passed through strategy construction for NVLinkOneSided, DeepEP, and DeepEPLowLatency. The configurable MoE module was modified to pass its own hidden_size to factory calls.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description is empty except for the template structure; it lacks the required 'Description' and 'Test Coverage' sections with substantive content. Resolution: add a description explaining what the hidden_size parameter change fixes for nemotron-h and list relevant test cases that validate the changes.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title clearly indicates the main change: updating MoE hidden_size in the communicator for the nemotron-h model, which aligns with the code changes adding hidden_size parameters.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; the check was skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update copyright year for this modified file.

This file was modified in 2026, but the SPDX copyright line still says 2025.

Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: “All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header with the year of latest meaningful modification” and “Include NVIDIA copyright header on ALL new files (update year on modified files)”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py`
at line 1, Update the SPDX copyright header in
tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py to
reflect the latest modification year (change 2025 to 2026) so the NVIDIA
copyright line matches the file's most recent meaningful modification.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cd568bca-cd12-4207-8da8-b826e7b874ce

📥 Commits

Reviewing files that changed from the base of the PR and between 26a6151 and 4ddfae0.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py
  • tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py

@tensorrt-cicd
Collaborator

PR_Github #42528 [ run ] triggered by Bot. Commit: 4ddfae0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42528 [ run ] completed with state SUCCESS. Commit: 4ddfae0
/LLM/main/L0_MergeRequest_PR pipeline #33268 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Wanli-Jiang
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42602 [ run ] triggered by Bot. Commit: 4ddfae0 Link to invocation
