Open
Description
According to the following paper, replacing the Newton-Schulz iteration in Muon with Polar Express improves validation loss.
https://arxiv.org/abs/2505.16932
Polar Express was also used in Karpathy's nanochat repo to train a GPT-2-equivalent model for less than $100 (karpathy/nanochat#481).
It might be worthwhile to add it here.
Official implementation: https://github.com/NoahAmsel/PolarExpress/blob/main/polar_express.py
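
For context, here is a rough NumPy sketch of where such a change would plug in. As I understand the paper, Polar Express keeps the same odd-polynomial update that Muon's Newton-Schulz step uses, but varies the coefficients from one iteration to the next. The function name `orthogonalize` and the `coeffs` schedule argument are made up for illustration and are not taken from this repo or from the official implementation. The coefficients below are the classic cubic Newton-Schulz values `(1.5, -0.5, 0.0)`, used only as a placeholder; the actual Polar Express coefficients would have to come from the official implementation linked above.

```python
import numpy as np

def orthogonalize(G, coeffs):
    """Approximate the polar factor of G with an odd-polynomial iteration.

    `coeffs` is a sequence of (a, b, c) tuples, one per iteration.
    A fixed tuple repeated gives a Newton-Schulz-style iteration;
    a per-step schedule is the slot Polar Express would fill.
    """
    # Scale by the Frobenius norm so every singular value is at most 1.
    X = G / (np.linalg.norm(G) + 1e-7)
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for a, b, c in coeffs:
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # X <- a*X + b*(XX^T)X + c*(XX^T)^2 X
    return X.T if transposed else X

# Placeholder schedule: classic cubic Newton-Schulz, NOT the Polar Express values.
placeholder_coeffs = [(1.5, -0.5, 0.0)] * 40

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 5))
Q = orthogonalize(G, placeholder_coeffs)
```

If the optimizer here already has a Newton-Schulz helper shaped like this, supporting Polar Express might mostly come down to accepting a per-iteration coefficient list instead of a single fixed tuple.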