Skip to content

Conversation

@wenxie-amd
Copy link
Contributor

@wenxie-amd wenxie-amd commented Jan 6, 2026

Pull request overview

Overview

This pull request addresses several issues related to running Primus Megatron on the NVIDIA platform.

Key changes:

  • Added null safety checks before using args.use_rocm_mem_info_iters in iteration membership tests
  • Fixed PYTHONPATH construction by removing an erroneous $: token and reordering path components

Required Parameter Adjustments

To run Primus-Megatron on NVIDIA GPUs, please ensure to set the following parameters when running Primus Megatron:

  • --use_rocm_mem_info False
  • --use_rocm_mem_info_iters None

Copilot AI review requested due to automatic review settings January 6, 2026 09:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a training issue on NVIDIA GPUs by adding null checks for the args.use_rocm_mem_info_iters parameter and correcting a malformed PYTHONPATH configuration.

Key changes:

  • Added null safety checks before using args.use_rocm_mem_info_iters in iteration membership tests
  • Fixed PYTHONPATH construction by removing an erroneous $: token and reordering path components

Reviewed changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated no comments.

File Description
primus/modules/trainer/megatron/trainer.py Added null checks for use_rocm_mem_info_iters parameter in three locations to prevent errors when the parameter is not set
examples/run_pretrain.sh Fixed PYTHONPATH construction by removing invalid $: syntax and reordering paths

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Xiaoming-AMD Xiaoming-AMD merged commit 7fc709b into main Jan 6, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants