Rescale the load balancing loss. #14

Merged
MasterJH5574 merged 1 commit into mlc-ai:main from haok1402:0404-rescale-lb-loss
Apr 5, 2026

Conversation

Collaborator
@haok1402 haok1402 commented Apr 5, 2026

Consistent with Megatron-LM, when we report the load balancing loss, we rescale it so that a value of 1.0 represents perfect balance, which allows interpretable comparison across different training runs.
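A minimal sketch of the rescaling convention this PR adopts. This is not the actual `pretrain_language_model.py` code; the function name, argument shapes, and the coefficient default are assumptions for illustration. The idea is Switch-Transformer-style auxiliary loss: the raw balance term equals 1.0 under perfect balance, the coefficient-scaled term is added to the training objective, and dividing the tracked loss back by the coefficient recovers the interpretable metric:

```python
def load_balancing_loss(tokens_per_expert, probs_per_expert, coeff=1e-2):
    """Hypothetical MoE aux-loss sketch with Megatron-LM-style logging.

    tokens_per_expert: fraction of tokens routed to each expert (f_i)
    probs_per_expert:  mean router probability per expert (P_i)
    coeff:             load-balance coefficient (default is an assumption)
    """
    num_experts = len(tokens_per_expert)
    # Raw balance term: equals 1.0 when routing is perfectly uniform.
    raw = num_experts * sum(f * p for f, p in zip(tokens_per_expert, probs_per_expert))
    loss = coeff * raw      # term actually added to the training objective
    logged = loss / coeff   # rescaled metric reported in logs: 1.0 = perfect balance
    return loss, logged
```

For example, with 4 experts each receiving 25% of tokens at 0.25 mean router probability, the logged value is exactly 1.0 regardless of the coefficient, which is what makes runs with different coefficients comparable.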


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the MoE load balance loss calculation in pretrain_language_model.py by dividing the tracked loss by the load balance coefficient. This change ensures the logged metric follows the Megatron-LM convention, where a value of 1.0 represents perfect balance. I have no feedback to provide.

@haok1402 haok1402 force-pushed the 0404-rescale-lb-loss branch from e230769 to 67a0a65 on April 5, 2026 at 15:47
Member

@MasterJH5574 MasterJH5574 left a comment

LGTM.

@MasterJH5574 MasterJH5574 merged commit db5c7cc into mlc-ai:main Apr 5, 2026
1 check passed
@haok1402 haok1402 deleted the 0404-rescale-lb-loss branch April 7, 2026 16:45
2 participants