Record: sliding eval, FP16 tied embeddings, 10 layers, Muon WD 0.02, overtone init, and phase-transition residual mixing. (val_bpb 1.1876) #155

Open
peytontolbert wants to merge 2 commits into openai:main from peytontolbert:submission/top_recipe_10l_8xh100

Conversation

@peytontolbert

Summary

Six techniques stacked on the upgraded trainer recipe, achieving a final_int8_zlib_roundtrip_exact val_bpb of 1.18762449 in a single 8xH100 record-track run:

  • Sliding-window final evaluation with stride=64
  • FP16 tied embedding export
  • 10 transformer layers
  • Muon weight decay 0.02
  • Overtone spectral embedding initialization with power 0.5
  • Phase-transition residual-mix initialization
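The sliding-window final evaluation is the most mechanical of these techniques, so here is a minimal sketch of how a stride-64 sliding eval typically works. The function names and the `logprob_fn` interface are illustrative assumptions, not the PR's actual code: the window advances by `stride` tokens, and only the trailing `stride` targets of each window are scored, so every scored token sees up to `window - 1` tokens of left context instead of the truncated context a plain chunked eval gives tokens near chunk boundaries.

```python
import math

def sliding_window_bpb(logprob_fn, tokens, n_bytes, window=1024, stride=64):
    """Illustrative sketch of sliding-window evaluation (assumed API).

    logprob_fn(context, token) -> natural-log probability of `token`
    given `context`. In a real trainer each window would be one batched
    forward pass; here it is spelled out token by token for clarity.
    """
    nll = 0.0
    pos = 1  # token 0 has no left context to condition on
    while pos < len(tokens):
        # Place the window so its tail covers the next `stride` targets;
        # every scored token then gets up to window-1 tokens of context.
        start = max(0, pos + stride - window)
        for i in range(pos, min(pos + stride, len(tokens))):
            nll += -logprob_fn(tokens[start:i], tokens[i])
        pos += stride
    # bits per byte: convert nats to bits, normalize by raw byte count
    return nll / math.log(2) / n_bytes
```

With a uniform model over 256 byte values, this returns exactly 8.0 bits per byte, which is a handy sanity check for the windowing logic.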

Results

| Seed | Steps | val_bpb (pre-quant) | val_bpb (post-quant) | Artifact size |
| --- | --- | --- | --- | --- |
| 1337 | 8,260 | 1.2202 | 1.18762449 | 15,842,628 bytes |
  • Pre-quant eval at stop: val_loss 2.0602, val_bpb 1.2202
  • Post-quant eval: val_loss 2.00525163, val_bpb 1.18762449
  • Train wallclock: 599,846 ms
  • Final eval time: 118,402 ms
  • Hardware: 8x H100 80GB
  • Code size: 128,619 bytes
  • Trainer used: the upgraded trainer, copied into the record folder as train_gpt.py
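The FP16 tied-embedding export bears directly on the artifact size above: with tied weights, the input embedding and the LM head share one matrix, so the export can store it once in float16. A minimal sketch, assuming hypothetical tensor names (`embed.weight`, `lm_head.weight`) that are not taken from the PR's code:

```python
import numpy as np

def export_tied_embeddings_fp16(state):
    """Sketch of an FP16 tied-embedding export (tensor names assumed).

    Because the LM head is tied to the input embedding, its weight is
    dropped from the export and the shared matrix is cast to float16,
    roughly halving that tensor's contribution to the artifact size.
    """
    out = {}
    for name, w in state.items():
        if name == "lm_head.weight":
            continue  # tied to embed.weight; reconstructed at load time
        if name == "embed.weight":
            out[name] = w.astype(np.float16)
        else:
            out[name] = w
    return out
```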

Test plan

  • 1 record-track run on 8x H100 80GB
  • Training completed under the 600 s wallclock limit: 599,846 ms
  • Final artifact stayed under the 16 MB limit: 15,842,628 bytes total
  • Post-quant roundtrip validation matched the printed exact metric
  • Final sliding-window evaluation completed in 118,402 ms
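The metric name final_int8_zlib_roundtrip_exact suggests the validation pipeline quantizes weights to int8, zlib-compresses the bytes, then decompresses and verifies the roundtrip is bit-exact before running the post-quant eval. A hedged sketch of one plausible reading (symmetric per-tensor quantization; the function and its details are assumptions, not the record track's actual checker):

```python
import zlib
import numpy as np

def int8_zlib_roundtrip(weights):
    """Sketch of an int8 + zlib roundtrip check (details assumed).

    Symmetric per-tensor int8 quantization, zlib compression of the raw
    int8 bytes, then decompression and exact comparison. The post-quant
    eval runs on the dequantized weights, so the roundtrip must be
    bit-exact for the printed metric to be reproducible.
    """
    scale = float(np.abs(weights).max()) / 127.0 or 1.0  # avoid scale=0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    blob = zlib.compress(q.tobytes(), level=9)
    q2 = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(q.shape)
    exact = bool(np.array_equal(q, q2))
    dequant = q2.astype(np.float32) * scale
    return dequant, len(blob), exact
```

Since zlib is lossless, `exact` only fails if the serialization itself is buggy, which is exactly what a roundtrip check is meant to catch.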
