Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669) by baudrillardsgh0st · Pull Request #170 · openai/parameter-golf

baudrillardsgh0st · 2026-03-20T04:59:14Z

Summary

val_bpb: 1.1669 — beats current SOTA (1.1748) by 0.0079
Artifact: 14.77 MB — well under 16MB cap
9L/512dim/8H/4KV, 21.8M params, trained 9706 steps @ 61.8ms/step on 8×H100 SXM

Key Techniques

Int6 QAT (Quantization-Aware Training): STE fake int6 quantization during forward pass with per-row symmetric scaling. Eliminates post-quant degradation without needing fp16 late-K layer passthrough.
Int6-in-Int8 zstd22 compression: Store int6 values (-32 to 31) in int8 containers — zstd-22 compresses the restricted value range ~35%. Achieves 14.77MB from 21.8M params. (Bit-packing int6 values destroys byte alignment and defeats compressors.)
SmearGate: ~513-param learned gate blending current + previous token embedding. Zero-initialized, very low LR. Provides cheap bigram context at the embedding layer.
Decoupled Muon weight decay (0.01): Applied in the Muon optimizer step for improved generalization and quantization robustness.
Sliding window evaluation (stride=64, batch=32 seqs): Full-context scoring at every token position.
FP16 tied embedding passthrough: Avoids compounding int6 errors through both input/output paths.

Results

Seed	val_loss	val_bpb	Steps	ms/step
1337	1.9703	1.1669	9706	61.80

Artifact Size

Component	Bytes
Model (int6+zstd22)	14,696,046
Code	71,909
Total	14,767,955

Test plan

Trained on 8×H100 SXM in under 10 minutes (600s wallclock)
Artifact under 16MB (14.77MB)
val_bpb improvement over SOTA exceeds 0.005 threshold (0.0079)
train.log included with full training output
train_gpt.py runs standalone within the records folder

9L 512dim int6 QAT with STE, SmearGate, Muon weight decay 0.01, int6-in-int8 zstd22 compression. 14.77MB artifact, 9706 steps @ 61.8ms/step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

11-layer GPT with int6 QAT, SmearGate, and decoupled Muon weight decay 0.038. Artifact: 15.50MB (int6+zstd-22). Single seed, 7723 steps at 77ms/step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669)

164befc

9L 512dim int6 QAT with STE, SmearGate, Muon weight decay 0.01, int6-in-int8 zstd22 compression. 14.77MB artifact, 9706 steps @ 61.8ms/step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

notapplica mentioned this pull request Mar 20, 2026

Parameter Golf Live AI Commentary + Analysis / Ideas | every 10 minutes #140

Open

Record: 11L Int6 QAT + SmearGate + WD 0.038 (val_bpb=1.1502)

dcac9b5

11-layer GPT with int6 QAT, SmearGate, and decoupled Muon weight decay 0.038. Artifact: 15.50MB (int6+zstd-22). Single seed, 7723 steps at 77ms/step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669)#170

Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669)#170
baudrillardsgh0st wants to merge 2 commits intoopenai:mainfrom
baudrillardsgh0st:submit/int6-qat-smeargate-muonwd

baudrillardsgh0st commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

baudrillardsgh0st commented Mar 20, 2026

Summary

Key Techniques

Results

Artifact Size

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant