Unofficial: Interactive dashboard visualizing all 352 submissions #747
6 comments · 6 replies
-
Dashboard Update — v4 (March 31, 2026)

Major update to the dashboard:

Data:
New sections:
Chart fixes:
Live dashboard: https://nathanmaine.github.io/parameter-golf-experiment-lab/
PRs for all 7 research directions: #1191, #1192, #1193, #1194, #1195, #1196, #1197
-
@Ribin545 Glad the dashboard helped! I've needed it many times myself! The RTX 3090 run hitting 1.84 is solid. What are you getting now without it?
-
Dashboard Update - v5 (April 2, 2026)

Major update focused on data completeness and TTT legality filtering.

Data:
New: TTT Legality Filtering

Following the TTT legality discussion in issue #402 and the rulings from @0hq and @valerio-oai, the leaderboard now includes TTT compliance classification:
The leaderboard now defaults to "Legal Only" so the realistic competition state is visible immediately. All submissions are still accessible via the filter dropdown or search bar. A disclaimer block under the leaderboard heading explains that this classification is our best interpretation of the current rules and may not be 100% accurate. @0hq @valerio-oai - if any of these classifications are off, happy to adjust. If you think your submission is miscategorized, open an issue on the dashboard repo or let me know here.

Why this matters: Without filtering, the top ~33 submissions are dominated by n-gram cache approaches scoring below 0.5 BPB. Many of these have been closed by organizers. The "Legal Only" view shows the actual state of the neural modeling competition, where the real innovation is happening in the 1.05-1.12 BPB range.

Other updates:
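For anyone curious how a "Legal Only" default view can work, here is a minimal sketch. The record fields (`legality`, `bpb`, `pr`) and the constants are illustrative assumptions, not the dashboard's actual schema or code:

```python
# Hypothetical sketch of a "Legal Only" default leaderboard view.
# Field names and values are assumptions, not the real dashboard schema.
from typing import Dict, List

LEGAL, SUSPECT, ILLEGAL = "legal", "suspect", "illegal"

def leaderboard(submissions: List[Dict], view: str = "legal_only") -> List[Dict]:
    """Sort ascending by BPB; hide non-legal entries unless view == 'all'."""
    rows = submissions if view == "all" else [
        s for s in submissions if s["legality"] == LEGAL
    ]
    return sorted(rows, key=lambda s: s["bpb"])

subs = [
    {"pr": 1700, "bpb": 1.072, "legality": LEGAL},
    {"pr": 9999, "bpb": 0.41,  "legality": ILLEGAL},   # n-gram cache class
    {"pr": 1336, "bpb": 1.05,  "legality": SUSPECT},   # pending SLOT ruling
]
print([s["pr"] for s in leaderboard(subs)])         # -> [1700]
print([s["pr"] for s in leaderboard(subs, "all")])  # -> [9999, 1336, 1700]
```

Defaulting `view` to the filtered state while keeping an explicit "all" escape hatch mirrors the dropdown behavior described above.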
Live dashboard: https://nathanmaine.github.io/parameter-golf-experiment-lab/
-
Dashboard Update - v8 (April 3, 2026)

This is NOT an official OpenAI resource. This dashboard is an independent, unofficial project by one participant. All classifications (Legal/Illegal/Suspect) are our best interpretation of the current rules based on issue #402 and the illegal submissions megathread #677. They may not be 100% accurate.

Data:
New filters (per community feedback from @samquiring):
Updated sections:
New record attempts submitted:
Both are pure neural submissions with no n-gram cache and no multi-epoch TTT. Full details and reproduction commands are in the PRs.

Corrections welcome: If your submission's size, BPB, or legality status is showing incorrectly, open an issue at https://github.com/NathanMaine/parameter-golf-experiment-lab/issues or comment here.

Live dashboard: https://nathanmaine.github.io/parameter-golf-experiment-lab/
-
Dashboard Update - v9 (April 6, 2026)

This is NOT an official OpenAI resource. Independent, unofficial project by one participant.

Data
Bug Fixes
New: DGX Spark PROTEUS Ablation Data
New: SLOT Legality Context
All Sections Updated
New Key Discoveries (Section 7)
Competition Landscape
Feedback welcome - open an issue on the dashboard repo.
-
Dashboard Update - v11 (April 20, 2026) - Correcting v10

This is NOT an official OpenAI resource. Independent, unofficial project by one participant.

Why this is a correction

I missed the byte-LUT bug discovery that broke between April 15-19 when I posted v10 earlier this morning.

The byte-LUT bug, in one paragraph

The GDN-family submissions inherited a helper that double-counts the leading-space byte:

```python
# build_sentencepiece_luts
if piece.startswith("▁"):
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1  # the +1 double-counts
```

What closed because of the bug
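Where exactly the second copy of the space byte enters depends on the surrounding pipeline, which isn't shown in the thread. Still, there is a cheap invariant that catches this whole class of off-by-one, assuming the LUT is meant to track the UTF-8 byte length of the decoded text: the per-piece counts must sum to the byte length of the reconstruction. A sketch with hypothetical helper names (not taken from any submission):

```python
# Sanity check for per-piece byte LUTs. Helper names are illustrative,
# not from any actual submission: per-piece byte counts must sum to the
# byte length of the decoded text, so a double-counted leading-space
# byte fails immediately.
SPACE_MARK = "\u2581"  # SentencePiece's "▁" leading-space marker

def decoded_byte_len(piece: str) -> int:
    # Byte length of the text this piece decodes to ("▁" -> one space byte).
    return len(piece.replace(SPACE_MARK, " ").encode("utf-8"))

def buggy_byte_len(piece: str) -> int:
    # Mimics the off-by-one: one extra byte per leading-space piece.
    extra = 1 if piece.startswith(SPACE_MARK) else 0
    return decoded_byte_len(piece) + extra

pieces = [SPACE_MARK + "Hello", SPACE_MARK + "world", "!"]
# " Hello world!" is 13 UTF-8 bytes.
true_total = len("".join(pieces).replace(SPACE_MARK, " ").encode("utf-8"))

assert sum(decoded_byte_len(p) for p in pieces) == true_total       # 13 == 13
assert sum(buggy_byte_len(p) for p in pieces) == true_total + 2     # one extra byte per "▁" piece
```

The same comparison against a full decoded validation set would have flagged the inherited helper before any BPB numbers were published.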
yahya010 published the canonical check on #1734 and wrote: "canonical would not beat the merged-SOTA threshold. #1727 (val_bpb 1.07217 on PR #1700's non-buggy base) remains."

#1698 arsenis-cmd (1.00995) was flagged by @dexhunter on April 19 for the same bug and is still open pending the author's response.

Data
The actual open legal frontier

After filtering out byte-buggy inheritance, pending SLOT submissions (Issue #1336), and non-records, these are the top open records under 1.08:
Pending-legality submissions that would rank higher if SLOT is ruled legal:
The canonical winning recipe right now

Custom casefold-style tokenizer (Casefold V4 / CaseOps) + Multi-Phase Global SGD TTT (legal single-pass) + Parallel Residuals + 3-Layer Depth Recurrence + Muon 0.97 + GPTQ int6 + Brotli-11 compression. Landing consistently in the 1.057-1.072 range across independent teams.

Tokenizer experimentation is doing more work than most people expected. Casefold V4 (dexhunter), CaseOps (romeerp), and SP8192 variants are producing the tightest numbers on canonical scoring.

My own position, honestly

Among open records with BPB between 1.00 and 1.1048, my submissions rank:
There are roughly 60-80 open legal-looking submissions between mine and the real frontier. The "2nd place" framing I floated earlier today was wrong - it only held if you count merged PRs (only 2 record-track merges exist) and ignore the large population of open submissions. The real leaderboard includes open submissions and I am well back of it.

My compliance cleanup (unchanged from v10)

On April 11-13 @MatoTeziTanka flagged PR #1193 (Universal Transformer) and PR #406 (SDTTT) for multi-epoch TTT on val_tokens. I audited my other submissions and found the same pattern in PR #1127. Retracted three:
Compliance notes added to #948 and #968 (Dirichlet n-gram) for the hashed-ngram-cache class dispute on Issue #402.

7 research architecture PRs independently cleared

On April 12 @MatoTeziTanka gave a "LOOKS CLEAN - MERGE" community review to #1191, #1192, #1193, #1194, #1195, #1196, #1197 plus the clean resubmission #1554. Tagged to maintainers. The historical merge rate on this repo is ~3%, so I'm not expecting action, but the code and data are there.

Archive

The full-fidelity archive covers #1 through #1758 at 2.3 GB local with a NAS mirror: 1,683 PR directories, each with metadata, body, all comments, the file diff list, and every file from the PR. Scripts are documented on the dashboard repo for anyone who wants to build on it.

Dashboard changes since v10
New Key Discoveries (Section 7)
Corrections welcome. If your submission's BPB or status is miscategorized, or if I have framed any of the above wrong, open an issue at https://github.com/NathanMaine/parameter-golf-experiment-lab/issues or comment here.

Live dashboard: https://nathanmaine.github.io/parameter-golf-experiment-lab/

Good luck in the final 10 days.
-
I built an interactive dashboard that visualizes data from all 352 submissions with BPB scores:
Live Dashboard →
What you can do:
Data includes:
Also includes a technique effectiveness matrix showing what worked and what didn't across 46+ experiments, plus cost analysis for anyone budgeting their RunPod spend.
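For readers new to the metric that all of these leaderboards sort on: BPB (bits per byte) is the model's total negative log-likelihood, converted to bits, divided by the UTF-8 byte length of the evaluation text, which makes scores comparable across tokenizers. A quick sketch (the function is illustrative, not dashboard code):

```python
# Illustrative BPB computation, not dashboard code: summed NLL in nats
# divided by ln(2) gives bits, then normalize by UTF-8 byte count.
import math

def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Summed negative log-likelihood (nats) -> bits per UTF-8 byte."""
    total_bits = total_nll_nats / math.log(2)
    return total_bits / len(text.encode("utf-8"))

# Sanity check: a uniform distribution over the 256 byte values costs
# ln(256) nats per byte, which is exactly 8 bits per byte.
text = "hello"
nll = len(text.encode("utf-8")) * math.log(256)
assert abs(bits_per_byte(nll, text) - 8.0) < 1e-9
```

On this scale, random bytes score 8.0, so the 1.05-1.12 range discussed in the updates above represents roughly an 8x compression of the evaluation text.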
Source: github.com/NathanMaine/parameter-golf-experiment-lab
If your submission data looks wrong, let me know — happy to fix it. The data was pulled from submission.json files as of March 24.
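For anyone who wants to rebuild the table locally, aggregation from per-PR submission.json files can be sketched like this. The directory layout and the `val_bpb` field name are assumptions for illustration, not the competition's actual format:

```python
# Sketch: aggregate per-PR submission.json files into a BPB-sorted table.
# The <pr>/submission.json layout and "val_bpb" field are assumptions,
# not the repo's documented schema.
import json
import tempfile
from pathlib import Path

def load_submissions(root: Path):
    """Collect <pr>/submission.json files and sort ascending by BPB."""
    rows = []
    for f in sorted(root.glob("*/submission.json")):
        data = json.loads(f.read_text())
        rows.append({"pr": f.parent.name, "bpb": float(data["val_bpb"])})
    return sorted(rows, key=lambda r: r["bpb"])

# Demo with a throwaway directory tree mimicking the assumed layout,
# using two BPB values quoted elsewhere in this thread.
root = Path(tempfile.mkdtemp())
for pr, bpb in [("1727", 1.07217), ("1698", 1.00995)]:
    d = root / pr
    d.mkdir()
    (d / "submission.json").write_text(json.dumps({"val_bpb": bpb}))

print(load_submissions(root))  # lowest BPB first
```

Sorting ascending puts the best (lowest-BPB) submission first, matching the leaderboard convention.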
Good luck everyone! 🏌️