Is your feature request related to a problem? Please describe.
#3058 and #3715 introduce MuP into Megatron-LM with support for Muon.
MuP allows efficient and reliable hyperparameter (esp. LR) transfer from narrow to wide networks at the same depth.
Other papers in this field extend transfer to further settings, such as from shallow to deeper networks, which is essential for a good pretraining scaling recipe.
A new paper from Microsoft, HyperP, claims to be "the first framework for transferring optimal learning rates across model width, depth, training tokens, and Mixture-of-Experts (MoE) granularity under the Frobenius-sphere constraint with the Muon optimizer."
I suggest we integrate HyperP into Megatron-LM.
Link: Rethinking Language Model Scaling under Transferable Hypersphere Optimization, Ren et al., 2026.
Tag the @mcore-oncall to get oncall's attention to this issue.