Non-record: FP16 embed + WD20k + seq2048 + doc-isolated sliding window (val_bpb=1.2045)#151

Open
mrdavtan wants to merge 1 commit into openai:main from mrdavtan:kitchen-sink-9L-pr

@mrdavtan

Summary

Composition of proven community techniques on the 9L×512d baseline. Per the title, these are FP16 embed, WD20k, seq2048, and doc-isolated sliding-window evaluation.

val_bpb: 1.2045, which beats the naive baseline (1.2244) by about 0.020 BPB.
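For readers less familiar with the metric, the sketch below shows how bits per byte is commonly derived from a mean token cross-entropy, and reproduces the improvement quoted above. It assumes the loss is measured in nats per token; the function name `bits_per_byte` and its parameters are illustrative and not taken from this PR's code.

```python
import math


def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) to bits per byte.

    Assumption: n_tokens tokens were scored and they decode to n_bytes
    bytes of raw text. Names are hypothetical, not from the PR.
    """
    bits_per_token = mean_loss_nats / math.log(2)  # nats -> bits
    return bits_per_token * n_tokens / n_bytes     # per token -> per byte


# Improvement over the naive baseline, using the two figures from the summary.
baseline_bpb = 1.2244
this_run_bpb = 1.2045
improvement = baseline_bpb - this_run_bpb  # about 0.0199, i.e. ~0.020 BPB
```

Because the score is normalized by bytes rather than tokens, it stays comparable across tokenizers.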

Hardware limitation

This result is significantly constrained by RunPod node speed: 70 ms/step versus the typical 44 ms/step reported by other entries. Under the 600 s wallclock cap, this yielded only 8,528 training steps versus the ~13,600 that standard hardware would produce. We estimate a val_bpb of ~1.185 on standard hardware with the same configuration.
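The step counts follow from the 600 s wallclock cap listed in the test plan. A minimal sketch of that arithmetic, assuming a constant per-step time and no other overhead (the helper name `steps_within_budget` is hypothetical):

```python
def steps_within_budget(wallclock_s: float, ms_per_step: float) -> int:
    """Upper bound on training steps that fit in a wallclock budget.

    Assumes every step takes exactly ms_per_step and ignores startup,
    logging, and validation overhead.
    """
    return int(wallclock_s * 1000 // ms_per_step)


slow_node_steps = steps_within_budget(600, 70)  # this run's node speed
typical_steps = steps_within_budget(600, 44)    # speed reported by other entries
```

This gives an upper bound of 8,571 steps at 70 ms/step, slightly above the 8,528 actually completed, and 13,636 steps at 44 ms/step, matching the ~13,600 figure. The ~1.185 estimate itself is the author's extrapolation and is not derived by this calculation.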

Key metrics

| Metric | Value |
| --- | --- |
| val_bpb (sliding window, doc-isolated) | 1.2045 |
| Pre-quant val_bpb | 1.2154 |
| Artifact size | 15,912,648 bytes |
| Steps | 8,528 (wallclock-limited at 70 ms/step) |
| Eval time | 43 s |
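To illustrate what "sliding window, doc-isolated" can mean in practice, here is a minimal sketch of one way to lay out evaluation windows so that each token is scored exactly once and no window's context crosses a document boundary. It is not the PR's implementation: the function name, the `stride` value of 512, and the reuse of 2048 as the evaluation context length are all assumptions.

```python
def doc_isolated_windows(doc_lengths, seq_len=2048, stride=512):
    """Return (start, end, score_from) triples over a concatenated token stream.

    Each window feeds tokens [start, end) to the model and scores only
    tokens [score_from, end). The context never reaches back past the
    start of the current document. seq_len and stride are assumed values.
    """
    windows = []
    offset = 0
    for n in doc_lengths:
        doc_start, doc_end = offset, offset + n
        scored_upto = doc_start  # first token of this document not yet scored
        while scored_upto < doc_end:
            end = min(scored_upto + stride, doc_end)
            start = max(doc_start, end - seq_len)  # clamp context to this document
            windows.append((start, end, scored_upto))
            scored_upto = end
        offset = doc_end
    return windows


# Toy example: two documents of 5 and 3 tokens, context 4, stride 2.
toy = doc_isolated_windows([5, 3], seq_len=4, stride=2)
```

In the toy example the second document's windows start at index 5, so none of the first document's tokens leak into its context. A smaller stride gives each scored token more context at the cost of more forward passes.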

Acknowledgments

All techniques come from community entries: @SamuelLarson (#65), @mattqlf (#50), @samacquaviva (#77), @spokane-way (#49), @rsavitt (#128).

Built with Claude Code

Test plan

  • Artifact under 16,000,000 bytes (15,912,648)
  • Training completes within 600s wallclock cap
  • Eval completes within 600s eval budget (43s)
  • Training log included
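The numeric items in the test plan can be checked mechanically. The sketch below is a hypothetical helper, not part of the submission; the limits come from the list above, and the training time is an estimate derived from 8,528 steps at 70 ms/step rather than a logged value.

```python
ARTIFACT_LIMIT_BYTES = 16_000_000  # from the test plan
TIME_LIMIT_S = 600.0               # training cap and eval budget, from the test plan


def check_submission(artifact_bytes: int, train_seconds: float, eval_seconds: float) -> dict:
    """Return a pass/fail flag for each numeric test-plan item."""
    return {
        "artifact_under_limit": artifact_bytes < ARTIFACT_LIMIT_BYTES,
        "training_within_cap": train_seconds <= TIME_LIMIT_S,
        "eval_within_budget": eval_seconds <= TIME_LIMIT_S,
    }


# Estimated training time: 8,528 steps at 0.070 s/step (an assumption, not a log value).
estimated_train_seconds = 8528 * 0.070
result = check_submission(15_912_648, estimated_train_seconds, 43.0)
```

With the figures reported in this PR, all three flags come out true, with 87,352 bytes of headroom on the artifact size.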

…ted eval (val_bpb=1.2045)

Composition of proven community techniques on 9L baseline.
Score limited by RunPod node speed (70ms/step vs typical 44ms,
yielding 8528 steps vs ~13600). Expect ~1.185 on standard hardware.