28 commits
1a4a873
muon torch vanilla DP
Niccolo-Ajroldi Sep 30, 2025
4462202
add muon dampening hyperparam
Niccolo-Ajroldi Oct 3, 2025
87c64dd
cleanup: split muon-adam
Niccolo-Ajroldi Oct 3, 2025
81f66b7
rename MuonVanilla, add MuonKJ, add MuonBucketed, custom reduce scatter
Niccolo-Ajroldi Oct 6, 2025
b2326ff
add tests
Niccolo-Ajroldi Oct 6, 2025
5e914e0
add diagrams
Niccolo-Ajroldi Oct 7, 2025
98c5585
Add files via upload
Niccolo-Ajroldi Oct 7, 2025
2dd75e1
Add files via upload
Niccolo-Ajroldi Oct 7, 2025
cccffe0
add utils, polished, add param aplitting
Niccolo-Ajroldi Oct 7, 2025
9ff8184
moved diagrams
Niccolo-Ajroldi Oct 8, 2025
3926947
enable AdamW fused optim
Niccolo-Ajroldi Oct 8, 2025
ec5d1a7
Add files via upload
Niccolo-Ajroldi Oct 9, 2025
3798e2e
Add files via upload
Niccolo-Ajroldi Oct 9, 2025
8895419
Add files via upload
Niccolo-Ajroldi Oct 9, 2025
4b291ed
cleaned diagrams
Niccolo-Ajroldi Oct 9, 2025
66c0385
Add files via upload
Niccolo-Ajroldi Oct 12, 2025
231c5dd
fix ReduceScatter
Niccolo-Ajroldi Oct 12, 2025
9c865e5
Merge branch 'muon_torch' of github.com:Niccolo-Ajroldi/submissions_a…
Niccolo-Ajroldi Oct 12, 2025
6c61acc
Add files via upload
Niccolo-Ajroldi Oct 16, 2025
4618f38
removed dampening, use adam style ema
Niccolo-Ajroldi Oct 22, 2025
eb0f779
separate lr and wd for muon adam
Niccolo-Ajroldi Oct 22, 2025
3ad574f
separate lr and wd for muon adam
Niccolo-Ajroldi Oct 22, 2025
8649960
format
Niccolo-Ajroldi Oct 22, 2025
d28b9b6
Merge branch 'muon_torch' of github.com:Niccolo-Ajroldi/submissions_a…
Niccolo-Ajroldi Oct 22, 2025
1d5ff98
clean
Niccolo-Ajroldi Oct 22, 2025
23b198b
cleanup, def muon submission
Niccolo-Ajroldi Dec 11, 2025
fd6584d
cleanup
Niccolo-Ajroldi Dec 11, 2025
b87c93d
Add finewebedu_lm.txt with model parameters
Niccolo-Ajroldi Feb 9, 2026
Empty file.
Empty file.
23 changes: 23 additions & 0 deletions submissions/external_tuning/muon/pytorch/docs/criteo1tb.txt
@@ -0,0 +1,23 @@
I1007 11:06:18.108439 23099690366016 utils.py:33] Muon params:
module.bot_mlp.0.weight (ndim=2)
module.bot_mlp.2.weight (ndim=2)
module.bot_mlp.4.weight (ndim=2)
module.top_mlp.0.weight (ndim=2)
module.top_mlp.2.weight (ndim=2)
module.top_mlp.4.weight (ndim=2)
module.top_mlp.6.weight (ndim=2)
module.top_mlp.9.weight (ndim=2)
I1007 11:06:18.108455 22916307100736 submission_runner.py:339] Initializing checkpoint and logger.
I1007 11:06:18.108510 23099690366016 utils.py:34] Adam params:
module.embedding_chunk_0 (ndim=2)
module.embedding_chunk_1 (ndim=2)
module.embedding_chunk_2 (ndim=2)
module.embedding_chunk_3 (ndim=2)
module.bot_mlp.0.bias (ndim=1)
module.bot_mlp.2.bias (ndim=1)
module.bot_mlp.4.bias (ndim=1)
module.top_mlp.0.bias (ndim=1)
module.top_mlp.2.bias (ndim=1)
module.top_mlp.4.bias (ndim=1)
module.top_mlp.6.bias (ndim=1)
module.top_mlp.9.bias (ndim=1)
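Note on the split logged above: weight matrices go to Muon, while biases and the `embedding_chunk_*` tables go to Adam even though the embeddings have `ndim=2`. A minimal sketch of one rule that reproduces this grouping follows. It is an assumption inferred from the log output, not the submission's actual code; `split_params` and `adam_name_markers` are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple


def split_params(
    named_ndims: Iterable[Tuple[str, int]],
    adam_name_markers: Sequence[str] = ("embedding",),
) -> Tuple[List[str], List[str]]:
    """Split parameter names into (muon, adam) groups.

    Assumed rule (inferred from the logs, not confirmed by the source):
    a parameter goes to Muon when it has at least 2 dimensions and its
    name contains none of the markers; everything else goes to Adam.
    """
    muon: List[str] = []
    adam: List[str] = []
    for name, ndim in named_ndims:
        is_matrix_like = ndim >= 2
        is_excluded = any(marker in name for marker in adam_name_markers)
        if is_matrix_like and not is_excluded:
            muon.append(name)
        else:
            adam.append(name)
    return muon, adam


# A few (name, ndim) pairs taken from the logs in this PR.
params = [
    ("module.bot_mlp.0.weight", 2),
    ("module.bot_mlp.0.bias", 1),
    ("module.embedding_chunk_0", 2),
    ("_orig_mod.module.up_conv.3.1.weight", 4),
    ("_orig_mod.module.up_conv.3.1.bias", 1),
]
muon_names, adam_names = split_params(params)
```

With a real model the pairs could come from `(name, p.ndim) for name, p in model.named_parameters()`. Matching on the substring `"embedding"` is only one way to exclude the embedding tables; the submission may select them differently.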
27 changes: 27 additions & 0 deletions submissions/external_tuning/muon/pytorch/docs/fastmri.txt
@@ -0,0 +1,27 @@
I1007 11:06:19.234473 22756046701632 utils.py:33] Muon params:
_orig_mod.module.down_sample_layers.0.conv_layers.0.weight (ndim=4)
_orig_mod.module.down_sample_layers.0.conv_layers.4.weight (ndim=4)
_orig_mod.module.down_sample_layers.1.conv_layers.0.weight (ndim=4)
_orig_mod.module.down_sample_layers.1.conv_layers.4.weight (ndim=4)
_orig_mod.module.down_sample_layers.2.conv_layers.0.weight (ndim=4)
_orig_mod.module.down_sample_layers.2.conv_layers.4.weight (ndim=4)
_orig_mod.module.down_sample_layers.3.conv_layers.0.weight (ndim=4)
_orig_mod.module.down_sample_layers.3.conv_layers.4.weight (ndim=4)
_orig_mod.module.conv.conv_layers.0.weight (ndim=4)
_orig_mod.module.conv.conv_layers.4.weight (ndim=4)
_orig_mod.module.up_conv.0.conv_layers.0.weight (ndim=4)
_orig_mod.module.up_conv.0.conv_layers.4.weight (ndim=4)
_orig_mod.module.up_conv.1.conv_layers.0.weight (ndim=4)
_orig_mod.module.up_conv.1.conv_layers.4.weight (ndim=4)
_orig_mod.module.up_conv.2.conv_layers.0.weight (ndim=4)
_orig_mod.module.up_conv.2.conv_layers.4.weight (ndim=4)
_orig_mod.module.up_conv.3.0.conv_layers.0.weight (ndim=4)
_orig_mod.module.up_conv.3.0.conv_layers.4.weight (ndim=4)
_orig_mod.module.up_conv.3.1.weight (ndim=4)
_orig_mod.module.up_transpose_conv.0.layers.0.weight (ndim=4)
_orig_mod.module.up_transpose_conv.1.layers.0.weight (ndim=4)
_orig_mod.module.up_transpose_conv.2.layers.0.weight (ndim=4)
_orig_mod.module.up_transpose_conv.3.layers.0.weight (ndim=4)
I1007 11:06:19.234533 23057129976896 utils.py:34] Adam params:
_orig_mod.module.up_conv.3.1.bias (ndim=1)
I1007 11:06:19.234585 22438544032832 submission_runner.py:339] Initializing checkpoint and logger.
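Note on the fastMRI split above: every Muon parameter here is a convolution kernel with `ndim=4`, whereas Muon's orthogonalization step operates on 2-D matrices. A common convention in public Muon implementations is to flatten all trailing dimensions into one, so a kernel of shape `(out, in, kh, kw)` is treated as an `(out, in*kh*kw)` matrix. Whether this submission does the same is an assumption; the sketch below only computes the resulting shape, and `as_matrix_shape` is a hypothetical helper name.

```python
from math import prod
from typing import Sequence, Tuple


def as_matrix_shape(shape: Sequence[int]) -> Tuple[int, int]:
    """Return the 2-D shape obtained by keeping the first dimension
    and flattening all remaining dimensions into a single one.

    Assumed convention, not confirmed by this PR.
    """
    if len(shape) < 2:
        raise ValueError("expected a parameter with at least 2 dimensions")
    return (shape[0], prod(shape[1:]))


# Illustrative kernel shape; the real fastMRI shapes are not shown in the logs.
conv_shape = (32, 16, 3, 3)
matrix_shape = as_matrix_shape(conv_shape)
```

For a tensor `w`, the equivalent reshape in PyTorch would be `w.view(w.shape[0], -1)`, applied to the update before orthogonalization and undone afterwards.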