Npu megatron adapt #129

Open

addsubmuldiv wants to merge 6 commits into modelscope:main from addsubmuldiv:npu_adapt_doc

Conversation


@addsubmuldiv addsubmuldiv commented Mar 26, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Summary

  • Adapt Twinkle's Megatron backend to run on Ascend NPU via MindSpeed
  • Fix a correctness bug in MoE training that affects megatron-core<0.13 (including 0.12.1): when etp_size defaulted to 1 while tp_size > 1, GPTBridge sharded expert weights by etp=1 while the model had already built expert parameters sharded by tp_size, causing a reshape crash on every MoE layer weight load
  • Add verified NPU cookbook examples for Dense full-parameter, LoRA, and MoE LoRA training

Changes

Bug fix: ETP shape mismatch on megatron-core<0.13

In megatron-core<0.13, there is no independent ETP group — expert ColumnParallelLinear internally reuses the dense TP group, so expert weights are always sharded by tp_size. TwinkleMegatronArgs.expert_tensor_parallel_size previously returned etp_size from DeviceMesh directly. The fix detects the megatron-core version and forces expert_tensor_parallel_size to return tp_size on affected versions, with a warning logged when the values diverge.
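The version gate described above can be sketched as a standalone helper (function and helper names here are hypothetical; the actual fix lives in `TwinkleMegatronArgs.expert_tensor_parallel_size`):

```python
def _mcore_minor(v: str) -> tuple:
    # Minimal parse: "0.12.1" -> (0, 12). Enough to gate on the 0.13 boundary.
    major, minor = v.split(".")[:2]
    return (int(major), int(minor))


def effective_expert_tp(mcore_version: str, tp_size: int, etp_size: int) -> int:
    """Sketch of the fix: on megatron-core<0.13 there is no independent ETP
    group, so expert weights are always sharded by tp_size and the bridge
    must shard the same way."""
    if _mcore_minor(mcore_version) < (0, 13):
        if etp_size != tp_size:
            # The real fix logs a warning when the values diverge.
            print(f"etp_size={etp_size} ignored on megatron-core "
                  f"{mcore_version}; using tp_size={tp_size}")
        return tp_size
    return etp_size
```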

NPU Megatron adaptation

  • MindSpeed early import (megatron/__init__.py): On NPU, mindspeed.megatron_adaptor is imported at package init time, before any megatron.core symbol gets bound by value. MindSpeed is an import-time patch system — it replaces torch.compile, TE ops, and Megatron internals and must run before those modules are loaded
  • MindSpeed runtime args (_mindspeed_args.py, mindspeed_bootstrap.py): Builds the argparse.Namespace MindSpeed expects at runtime by merging MindSpeed defaults, HF config–derived values, and Twinkle's TP/PP/CP/EP settings, then installs it into MindSpeed's internal state. Re-runs repatch() when the config signature changes
  • Gloo group for NPU object gather (framework.py): HCCL all_gather_object hangs on 8-card NPU. Metric gather on NPU now uses the Megatron-created Gloo DP group instead of the default HCCL group
  • Disabled masked_softmax_fusion on NPU (args.py): MindSpeed's TBE-backed fused softmax fails when the TBE kernel is not compiled. NPU runs fall back to unfused softmax while keeping Flash Attention
  • None-safe embedding all-reduce for LoRA (args.py): LoRA freezes the base embedding weight, so shared_embedding_or_output_weight().grad is None in PP+LoRA runs. Native _allreduce_word_embedding_grads would call all_reduce(None) and crash. A None-safe wrapper is temporarily monkey-patched around the native finalize_model_grads call and restored in a finally block. The wrapper also handles _get_main_grad_attr signature drift between megatron-core 0.12.x and 0.16.x
  • Default PG bound_device_id on NPU (megatron.py): Gloo subgroups inherit bound_device_id from the default process group, which breaks on NPU. The default PG binding is cleared before initialize_model_parallel(). Single-card runs also get a rank=0/world_size=1 PG so the same init path works without torchrun
  • Lazy Megatron model registry (model/register.py): Model type detection no longer imports megatron.core before MindSpeed bootstrap
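Several of these workarounds share the same temporary monkey-patch shape: swap in a wrapper, run the native call path, restore in a `finally` block. A minimal generic sketch of that pattern (all names hypothetical; the real `_allreduce_word_embedding_grads` wrapper takes Megatron's model/config arguments rather than a bare gradient):

```python
from contextlib import contextmanager


@contextmanager
def temporarily_patched(owner, attr, make_wrapper):
    """Replace owner.attr with make_wrapper(native) for the duration of the
    block, restoring the native attribute even if the body raises."""
    native = getattr(owner, attr)
    setattr(owner, attr, make_wrapper(native))
    try:
        yield
    finally:
        setattr(owner, attr, native)


def none_safe(native_fn):
    """Wrapper factory: skip the call when the gradient is None (LoRA froze
    the base weight) instead of crashing inside all_reduce(None)."""
    def wrapped(grad):
        if grad is None:
            return None
        return native_fn(grad)
    return wrapped
```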

Verification

All changes verified with real 8-card NPU smoke runs (10 steps, continuous loss output, checkpoint saved):

| Script | Topology |
| --- | --- |
| cookbook/megatron/npu/tp_full_npu.py | Dense full-parameter, dp=2 tp=2 pp=2 |
| cookbook/megatron/npu/tp_lora_npu.py | Dense LoRA, dp=2 tp=2 pp=2 |



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Megatron-based parallel training (TP, PP, EP) on NPU by integrating with MindSpeed. Key changes include new example scripts for LoRA and MoE LoRA training, comprehensive documentation updates for NPU setup, and several compatibility fixes such as using torch.where for masking and Gloo groups for object collectives on NPU. The review feedback identifies redundant environment variable assignments in shell scripts, incorrect script paths and PYTHONPATH inconsistencies in the documentation, and suggests clarifying the specific constraints for MoE LoRA support.

Comment on lines +8 to +9

```shell
ASCEND_RT_VISIBLE_DEVICES="${ASCEND_RT_VISIBLE_DEVICES}" \
torchrun --nproc_per_node=8 cookbook/megatron/npu/tp_lora_npu.py
```

medium

The ASCEND_RT_VISIBLE_DEVICES variable is already set on line 4. Setting it again here for the torchrun command is redundant and can be removed to improve script clarity.

Suggested change:

```diff
-ASCEND_RT_VISIBLE_DEVICES="${ASCEND_RT_VISIBLE_DEVICES}" \
-torchrun --nproc_per_node=8 cookbook/megatron/npu/tp_lora_npu.py
+torchrun --nproc_per_node=8 cookbook/megatron/npu/tp_lora_npu.py
```

Comment on lines +8 to +9

```shell
ASCEND_RT_VISIBLE_DEVICES="${ASCEND_RT_VISIBLE_DEVICES}" \
torchrun --nproc_per_node=8 cookbook/megatron/npu/tp_moe_lora_npu.py
```

medium

The ASCEND_RT_VISIBLE_DEVICES variable is already set on line 4. Setting it again here for the torchrun command is redundant and can be removed to improve script clarity.

Suggested change:

```diff
-ASCEND_RT_VISIBLE_DEVICES="${ASCEND_RT_VISIBLE_DEVICES}" \
-torchrun --nproc_per_node=8 cookbook/megatron/npu/tp_moe_lora_npu.py
+torchrun --nproc_per_node=8 cookbook/megatron/npu/tp_moe_lora_npu.py
```

Comment on lines +102 to +104

```shell
export TWINKLE_SRC_PATH=/path/to/twinkle/src
export MEGATRON_LM_PATH=/path/to/Megatron-LM
export PYTHONPATH=${TWINKLE_SRC_PATH}:${MEGATRON_LM_PATH}:${PYTHONPATH}
```

medium

The documentation regarding PYTHONPATH setup is inconsistent with the provided example scripts. Here, and in other places (lines 223-225, 332-337), TWINKLE_SRC_PATH is included; however, the example shell scripts (tp_lora_npu.sh, tp_moe_lora_npu.sh) do not use TWINKLE_SRC_PATH, assuming twinkle is installed in editable mode. To avoid confusion, the documentation should be consistent with the example scripts.

Suggested change:

```diff
-export TWINKLE_SRC_PATH=/path/to/twinkle/src
 export MEGATRON_LM_PATH=/path/to/Megatron-LM
-export PYTHONPATH=${TWINKLE_SRC_PATH}:${MEGATRON_LM_PATH}:${PYTHONPATH}
+export PYTHONPATH=${MEGATRON_LM_PATH}:${PYTHONPATH}
```

```shell
export PYTHONPATH=${TWINKLE_SRC_PATH}:${MEGATRON_LM_PATH}:${PYTHONPATH}
```

> **Tip**: `cookbook/megatron/tp.sh` and `cookbook/megatron/tp_moe.sh` already include automatic PYTHONPATH configuration. You can use these scripts directly to launch training without manual setup. Default paths can be overridden via the `TWINKLE_SRC_PATH` and `MEGATRON_LM_PATH` environment variables.
medium

The script paths mentioned here (cookbook/megatron/tp.sh and cookbook/megatron/tp_moe.sh) are incorrect. The scripts added in this pull request live in cookbook/megatron/npu/ and are named tp_lora_npu.sh and tp_moe_lora_npu.sh. Please update the paths to refer to the correct example scripts.

Suggested change:

```diff
-> **Tip**: `cookbook/megatron/tp.sh` and `cookbook/megatron/tp_moe.sh` already include automatic PYTHONPATH configuration. You can use these scripts directly to launch training without manual setup. Default paths can be overridden via the `TWINKLE_SRC_PATH` and `MEGATRON_LM_PATH` environment variables.
+> **Tip**: The example scripts in `cookbook/megatron/npu/` (e.g., `tp_lora_npu.sh`) already include automatic PYTHONPATH configuration for Megatron-LM. You can use these scripts directly to launch training without manual setup. The default path can be overridden via the `MEGATRON_LM_PATH` environment variable.
```

- ✅ Megatron backend (DP=2, TP=2, PP=2)
- ✅ 10-step continuous metric printing + checkpoint saving

**Note**: MoE models do not currently support LoRA fine-tuning (Expert LoRA is not available when ETP>1).
medium

This note is confusing, as it seems to contradict the preceding section "Megatron MoE LoRA Fine-tuning". It should be rephrased to clarify that the limitation applies specifically when ETP > 1.

Suggested change:

```diff
-**Note**: MoE models do not currently support LoRA fine-tuning (Expert LoRA is not available when ETP>1).
+**Note**: When using LoRA with MoE models, Expert LoRA is only supported when ETP=1. The provided MoE LoRA example uses a verified topology that respects this constraint.
```

Comment on lines +489 to +493

```python
# Avoid bool advanced indexing here. On NPU this lowers to
# aclnnNonzeroV2 inside AdvancedIndex and can crash during
# end-to-end training; torch.where preserves the same masking
# semantics without going through that path.
masked_labels = torch.where(loss_mask, labels, torch.zeros_like(labels))
```

medium

Using boolean indexing with ~loss_mask can lead to issues on NPU backends, as it may lower to unsupported operations. Replacing it with torch.where is a safer and more compatible way to achieve the same masking behavior, as noted in the comment.

Comment on lines +785 to +815

```python
# Some Megatron-LM versions (e.g. 0.12.1) only accept
# overlap_param_gather_with_optimizer_step here.
# overlap_param_gather still exists on ddp_config / distributed
# optimizer paths, but passing it directly into OptimizerConfig
# raises TypeError on this branch.
config_sig = inspect.signature(OptimizerConfig).parameters
config_kwargs = {
    'optimizer': 'adam',
    'lr': lr,
    'min_lr': kwargs.get('min_lr', 0.0),
    'weight_decay': kwargs.get('weight_decay', 0.01),
    'adam_beta1': kwargs.get('adam_beta1', 0.9),
    'adam_beta2': kwargs.get('adam_beta2', 0.999),
    'adam_eps': kwargs.get('adam_eps', 1e-8),
    'clip_grad': kwargs.get('clip_grad', 1.0),
    'bf16': kwargs.get('bf16', True),
    'use_distributed_optimizer': use_distributed_optimizer,
    'log_num_zeros_in_grad': kwargs.get('log_num_zeros_in_grad', False),
}
# Keep the old knob only if this Megatron version still exposes it.
# Some branches wire it through ddp_config instead of OptimizerConfig.
if 'overlap_param_gather' in config_sig:
    config_kwargs['overlap_param_gather'] = kwargs.get('overlap_param_gather', False)
if 'overlap_param_gather_with_optimizer_step' in config_sig:
    config_kwargs['overlap_param_gather_with_optimizer_step'] = kwargs.get(
        'overlap_param_gather_with_optimizer_step', kwargs.get('overlap_param_gather', False))
for key, value in kwargs.items():
    if key in config_sig and key not in config_kwargs:
        config_kwargs[key] = value

opt_config = OptimizerConfig(**config_kwargs)
```

medium

The logic for constructing OptimizerConfig has been refactored to dynamically build the keyword arguments based on the signature of OptimizerConfig. This is a great improvement for ensuring compatibility across different versions of megatron-core, which may have different constructor arguments.

@addsubmuldiv addsubmuldiv changed the title Npu adapt doc Npu megatron adapt Mar 26, 2026
@addsubmuldiv addsubmuldiv marked this pull request as ready for review March 27, 2026 02:22
Copilot AI review requested due to automatic review settings March 27, 2026 02:22
Copilot AI left a comment


Pull request overview

Adapts Twinkle’s Megatron backend to run on Ascend NPU via MindSpeed, fixes a MoE sharding correctness issue for older megatron-core, and adds NPU Megatron cookbook + documentation.

Changes:

  • Bootstrap MindSpeed early on NPU and propagate a MindSpeed-compatible runtime Namespace.
  • Fix MoE expert tensor-parallel shape mismatch for megatron-core<0.13 by tying ETP to TP.
  • Add NPU-specific distributed/ops workarounds (Gloo object gather, unfused softmax) plus verified cookbook scripts and docs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

Summary per file:

| File | Description |
| --- | --- |
| src/twinkle/utils/framework.py | Switch metric/object gather to Megatron's DP Gloo group on NPU to avoid HCCL hangs. |
| src/twinkle/model/megatron/model/register.py | Lazily import/register Megatron model metadata to avoid early megatron.core imports. |
| src/twinkle/model/megatron/model/__init__.py | Stop eager registry imports; re-export lazy registry helper. |
| src/twinkle/model/megatron/mindspeed_bootstrap.py | Build/install MindSpeed runtime args and manage (re)patching on config signature changes. |
| src/twinkle/model/megatron/megatron.py | Invoke MindSpeed bootstrap before Megatron init; add NPU-safe label masking; enable Gloo groups; adjust optimizer wiring. |
| src/twinkle/model/megatron/args.py | Add megatron-core<0.13 ETP fix; NPU softmax fusion fallback; LoRA finalize monkey-patch for None grads. |
| src/twinkle/model/megatron/_mindspeed_args.py | Construct merged MindSpeed Namespace from defaults + HF config + Twinkle topology. |
| src/twinkle/model/megatron/__init__.py | Import MindSpeed adaptor early on NPU so patches apply before megatron.core binds symbols. |
| docs/source_en/Usage Guide/NPU-Support.md | Document MindSpeed/Megatron setup and add verified Megatron NPU examples. |
| cookbook/megatron/npu/tp_moe_lora_npu.sh | Add torchrun launcher for the NPU Megatron MoE LoRA smoke run. |
| cookbook/megatron/npu/tp_moe_lora_npu.py | Add verified 8-card NPU Megatron MoE LoRA smoke script. |
| cookbook/megatron/npu/tp_lora_npu.sh | Add torchrun launcher for the NPU Megatron LoRA smoke run. |
| cookbook/megatron/npu/tp_lora_npu.py | Add verified 8-card NPU Megatron LoRA smoke script. |

Comment on lines +77 to +78

```python
if _LAST_REPATCH_SIGNATURE is not None:
    adaptor.repatch(vars(runtime_args).copy())
```

Copilot AI Mar 27, 2026

The current repatch logic skips calling adaptor.repatch(...) on the first bootstrap (_LAST_REPATCH_SIGNATURE is None). This becomes a correctness issue because src/twinkle/model/megatron/__init__.py imports mindspeed.megatron_adaptor before Twinkle installs runtime args, so MindSpeed can patch using defaults and never get updated on the first real runtime config. Fix by ensuring repatch() is invoked at least once after _set_runtime_args(...) when the adaptor may already be imported (e.g., call repatch() whenever signature != _LAST_REPATCH_SIGNATURE, including the None case, or explicitly detect pre-import via sys.modules and repatch once).

Suggested change:

```diff
-if _LAST_REPATCH_SIGNATURE is not None:
-    adaptor.repatch(vars(runtime_args).copy())
+adaptor.repatch(vars(runtime_args).copy())
```
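The suggestion amounts to keying repatch on the signature change alone, treating the first bootstrap (signature still None) like any other change. A minimal sketch under that reading (function and attribute names hypothetical):

```python
_LAST_REPATCH_SIGNATURE = None


def maybe_repatch(adaptor, runtime_args: dict):
    """Repatch whenever the config signature differs from the last one
    applied, including the very first call where the stored signature
    is still None."""
    global _LAST_REPATCH_SIGNATURE
    signature = tuple(sorted(runtime_args.items()))
    if signature != _LAST_REPATCH_SIGNATURE:
        adaptor.repatch(dict(runtime_args))
        _LAST_REPATCH_SIGNATURE = signature
```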
Comment on lines 838 to 842

```python
optimizer = get_megatron_optimizer(
    config=opt_config,
    model_chunks=model_chunks,
    use_gloo_process_groups=use_gloo_process_groups,
)
```

Copilot AI Mar 27, 2026

use_gloo_process_groups is passed unconditionally into get_megatron_optimizer(...). On older/newer Megatron-Core versions where this kwarg is absent or renamed, this will raise TypeError at runtime. Consider guarding this with a signature check (similar to the OptimizerConfig handling above) and only passing the kwarg when supported, otherwise fall back to the default call.

Suggested change:

```diff
-optimizer = get_megatron_optimizer(
-    config=opt_config,
-    model_chunks=model_chunks,
-    use_gloo_process_groups=use_gloo_process_groups,
-)
+optimizer_sig = inspect.signature(get_megatron_optimizer).parameters
+if 'use_gloo_process_groups' in optimizer_sig:
+    optimizer = get_megatron_optimizer(
+        config=opt_config,
+        model_chunks=model_chunks,
+        use_gloo_process_groups=use_gloo_process_groups,
+    )
+else:
+    optimizer = get_megatron_optimizer(
+        config=opt_config,
+        model_chunks=model_chunks,
+    )
```
```python
# The registry import chain can pull in megatron.core, which must stay
# behind the MindSpeed bootstrap on NPU.
from .model.constant import MLLMModelType
is_multimodal = model_type in {value for key, value in vars(MLLMModelType).items() if not key.startswith('_')}
```

Copilot AI Mar 27, 2026

This membership test is likely incorrect if MLLMModelType is an Enum (common for *ModelType constants): vars(MLLMModelType) yields enum members as values (e.g., MLLMModelType.X), so comparing a str (model_type) to enum instances will always be false. Prefer iterating members and comparing against their .value (or, if it's a plain constants container, explicitly compare against string constants only). This impacts is_multimodal and can misconfigure multimodal vs. text-only model construction.

Suggested change:

```diff
-is_multimodal = model_type in {value for key, value in vars(MLLMModelType).items() if not key.startswith('_')}
+# Support both Enum-based and constant-based definitions of MLLMModelType.
+if hasattr(MLLMModelType, "__members__"):
+    # Enum: compare against the .value of each enum member.
+    _multimodal_model_types = {member.value for member in MLLMModelType.__members__.values()}
+else:
+    # Fallback: compare against string constants defined on MLLMModelType.
+    _multimodal_model_types = {
+        value
+        for key, value in vars(MLLMModelType).items()
+        if not key.startswith('_') and isinstance(value, str)
+    }
+is_multimodal = model_type in _multimodal_model_types
```
Comment on lines +585 to +588

```python
from megatron.core import parallel_state
from megatron.core.distributed import DistributedDataParallel as MegatronDDP
from megatron.core.distributed.finalize_model_grads import (_get_main_grad_attr, _reshard_if_dtensor,
                                                            _unshard_if_dtensor, get_attr_wrapped_model)
```

Copilot AI Mar 27, 2026

parallel_state and the imports from megatron.core.distributed.finalize_model_grads are not used in finalize_model_grads_for_lora (the function later imports the module via importlib and only swaps _allreduce_word_embedding_grads). Removing unused imports reduces import-time side effects and makes the intent of the monkey-patch clearer.

Suggested change:

```diff
-from megatron.core import parallel_state
 from megatron.core.distributed import DistributedDataParallel as MegatronDDP
-from megatron.core.distributed.finalize_model_grads import (_get_main_grad_attr, _reshard_if_dtensor,
-                                                            _unshard_if_dtensor, get_attr_wrapped_model)
```
Comment on lines +56 to +63

```python
if Platform.device_prefix() == 'npu':
    try:
        from megatron.core import parallel_state as mpu

        process_group = mpu.get_data_parallel_group_gloo(
            with_context_parallel=getattr(device_mesh, 'cp_world_size', 1) > 1)
    except Exception:
        pass
```

Copilot AI Mar 27, 2026

The broad except Exception: pass will silently fall back to the default process group on NPU, which (per the comment) can hang in dist.all_gather_object(...). To avoid hard-to-debug metric hangs, narrow the exception types (e.g., ImportError, AttributeError) and emit a warning when the fallback happens so operators can see why the safer Gloo path was not used.
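One way to act on this review point, with the import injected as a callable so the fallback path is explicit and testable (function name and structure are hypothetical, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)


def resolve_gather_group(import_mpu):
    """Return Megatron's DP Gloo group, or None with a logged warning when
    megatron is unavailable. Only ImportError/AttributeError trigger the
    fallback, so real HCCL/Gloo errors still surface instead of silently
    degrading into a hang-prone default-group gather."""
    try:
        mpu = import_mpu()
        return mpu.get_data_parallel_group_gloo(with_context_parallel=False)
    except (ImportError, AttributeError) as exc:
        logger.warning("Gloo DP group unavailable; falling back to the "
                       "default process group for object gather: %s", exc)
        return None
```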
Comment on lines +5 to +6

```shell
export PYTHONPATH="${MEGATRON_LM_PATH}:${PYTHONPATH:-}"
```

Copilot AI Mar 27, 2026

The launcher only prepends MEGATRON_LM_PATH to PYTHONPATH, but the docs also instruct adding Twinkle's src path when running from source. Either update the script to optionally include TWINKLE_SRC_PATH (when provided) or adjust the docs around these scripts to clarify that Twinkle must already be installed in the environment.

Suggested change:

```diff
-export PYTHONPATH="${MEGATRON_LM_PATH}:${PYTHONPATH:-}"
+TWINKLE_SRC_PATH=${TWINKLE_SRC_PATH:-}
+if [ -n "${TWINKLE_SRC_PATH}" ]; then
+    export PYTHONPATH="${TWINKLE_SRC_PATH}:${MEGATRON_LM_PATH}:${PYTHONPATH:-}"
+else
+    export PYTHONPATH="${MEGATRON_LM_PATH}:${PYTHONPATH:-}"
+fi
```
```python
model = MegatronModel(model_id=MODEL_ID)
lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')
model.add_adapter_to_model('default', lora_config)
model.set_optimizer(optimizer_cls='default', lr=1e-4)
```

Copilot AI Mar 27, 2026

This hardcodes lr=1e-4 while the MoE smoke script makes LR configurable via env var. For consistency across cookbook examples (and easier reproduction/tuning), consider reading LR from an environment variable (similar to the other script) or defining a module-level LR constant used here.
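A minimal way to apply the suggestion, with the env var name assumed (the MoE script's actual variable name is not shown here):

```python
import os


def resolve_lr(default: str = "1e-4") -> float:
    # Hypothetical env knob mirroring the MoE smoke script's pattern;
    # falls back to the currently hardcoded value when unset.
    return float(os.environ.get("TWINKLE_LR", default))
```

The script would then call `model.set_optimizer(optimizer_cls='default', lr=resolve_lr())`.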