Merged
21 commits
fd07d19 refactor(leaderboard): scaffold v2 UI with module split and Home page (JuhaoLiang1997, May 15, 2026)
17f4eaf chore: ignore local .handoff-*.md notes (JuhaoLiang1997, May 15, 2026)
453f6a2 feat(leaderboard): redesign Home in OpenCompass-style cards + theme (JuhaoLiang1997, May 15, 2026)
fd72df0 style(leaderboard): editorial palette — per-suite color, serif h1, wa… (JuhaoLiang1997, May 15, 2026)
3f206b7 style(leaderboard): center hero, 4+3 grid, fw version + submitter, ti… (JuhaoLiang1997, May 15, 2026)
3aab55e feat(leaderboard): new hero, richer suite headers, vendor coverage se… (JuhaoLiang1997, May 15, 2026)
b324452 feat(leaderboard): home v2 — chip cloud, submit CTA, formal metric la… (JuhaoLiang1997, May 15, 2026)
105f8a4 feat(suites): full explainer — roofline argument, scenarios catalog, … (JuhaoLiang1997, May 15, 2026)
24b8228 feat(leaderboard): rankings + compare + chip-detail v2 with run-detai… (JuhaoLiang1997, May 15, 2026)
e8e97c8 feat(leaderboard): chip-detail merge, vendor data-driven, anchor pill… (JuhaoLiang1997, May 15, 2026)
19238a1 feat(leaderboard): suites disclosure, compare share link, home recent… (JuhaoLiang1997, May 15, 2026)
2f37c9e feat(leaderboard): chip-detail share button, shared copy helpers, key… (JuhaoLiang1997, May 15, 2026)
efdb9f2 feat(chip-detail): performance fingerprint radar + suiteFingerprint h… (JuhaoLiang1997, May 15, 2026)
93962c0 docs(contributing): how your result appears on the leaderboard (JuhaoLiang1997, May 15, 2026)
e2f4b44 feat(rankings): hide-empty-columns toggle + dual-layer focus halo (JuhaoLiang1997, May 15, 2026)
c1be6b4 feat(chip-detail): chip-count scaling section + chipCountScaling helper (JuhaoLiang1997, May 15, 2026)
728a5fe feat(leaderboard): download every Chart.js chart as PNG (JuhaoLiang1997, May 15, 2026)
562433b fix(chip-detail): resolve chart slug from URL hash, not stale closure (JuhaoLiang1997, May 15, 2026)
7f9cdd2 chore(leaderboard): drop dead exports, dead CSS, and unused dataset (JuhaoLiang1997, May 15, 2026)
20063bd fix(leaderboard): defensive guards for malformed viz + stale compare … (JuhaoLiang1997, May 15, 2026)
2aff794 test(leaderboard): make clipboard test pass on Node 20 (CI runtime) (JuhaoLiang1997, May 15, 2026)
22 changes: 21 additions & 1 deletion .github/workflows/validate_pr.yml
@@ -11,6 +11,7 @@ on:
      - 'schema/platforms.json'
      - 'tools/generate_platforms_matrix.py'
      - 'README.md'
      - 'leaderboard/site/**'

jobs:
  validate-runners:
@@ -205,4 +206,23 @@ jobs:
issue_number: context.issue.number,
body,
});
}
}

  frontend-tests:
    name: Frontend unit tests (modal viz)
    runs-on: ubuntu-latest
    # Frontend tests don't need full git history; a shallow checkout
    # plus Node 20 (which ships node:test) is enough.
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      # No npm install: the suite intentionally has zero runtime deps
      # (a hand-written DOM stub stands in for jsdom / happy-dom). Add
      # extra files to leaderboard/site/test/ to widen coverage; the
      # glob below picks them up automatically.
      - name: Run leaderboard frontend tests
        run: node --test leaderboard/site/test/*.test.mjs
6 changes: 6 additions & 0 deletions .gitignore
@@ -48,3 +48,9 @@ mini_result/
*_backup/
backup/
/tmp/

# ── Local-only handoff notes (across-session continuity) ────────────────────
.handoff-*.md

# ── Local-only design QA screenshots ────────────────────────────────────────
.screenshots/
63 changes: 63 additions & 0 deletions CONTRIBUTING.md
@@ -346,6 +346,69 @@ verified result is itself reproducible by definition.

---

## How your result appears on the leaderboard

The frontend treats every submission as data — there's nothing per-vendor or per-chip hand-coded on the UI side. A few conventions are worth knowing if you want your result to look its best:

### Chip identity vs. chip count

The chip-detail page (`#/chip/<slug>`) is keyed on the **chip model** alone, not on `chip_count`. That means a single chip page aggregates every fan-out you've ever submitted (×1, ×4, ×8, ×16) into one overview, with the runs table sorted by `chip_count` ascending and the per-suite KPI card flagging the deployment behind the best score (e.g. `×8` badge next to the metric).
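
To make the aggregation rule concrete, here is a minimal Python sketch of grouping runs by chip model. It assumes each run is a dict carrying the `chip` and `chip_count` fields described above plus hypothetical `suite` and `score` fields, and that a higher score is better; the real logic lives in the JavaScript frontend and may differ in detail.

```python
from collections import defaultdict
from typing import Optional, Tuple


def group_runs_by_chip(runs: list[dict]) -> dict[str, list[dict]]:
    """Bucket runs on the chip model alone; chip_count never splits a page."""
    pages: dict[str, list[dict]] = defaultdict(list)
    for run in runs:
        pages[run["chip"]].append(run)
    # Runs table order: smallest fan-out first.
    return {chip: sorted(rs, key=lambda r: r["chip_count"]) for chip, rs in pages.items()}


def best_deployment(runs: list[dict], suite: str) -> Optional[Tuple[float, int]]:
    """Return (best score, chip_count behind it) for one suite, or None if no runs."""
    scored = [r for r in runs if r["suite"] == suite]
    if not scored:
        return None
    best = max(scored, key=lambda r: r["score"])
    return best["score"], best["chip_count"]
```

The second value returned by `best_deployment` is what a KPI card would render as the `×N` badge.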

Implication: if your runner emits one `result.json` per chip count, you don't need to invent distinct chip names to keep them apart; submit them all with the same `chip` field and they will merge cleanly. Old `…-x<N>` URLs from before this change redirect automatically to the bare-model slug, so existing shared links keep working.
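
The legacy redirect amounts to stripping a trailing `-x<N>` from the slug. A minimal Python sketch, assuming the slug is the part of the hash after `#/chip/` and that the suffix is always `-x` followed by digits (the slug `acme-npu-1` and both function names are hypothetical):

```python
import re

# Matches a trailing "-x<N>" fan-out suffix such as "-x8" or "-x16".
_LEGACY_FANOUT = re.compile(r"-x\d+$")
_CHIP_PREFIX = "#/chip/"


def canonical_chip_slug(slug: str) -> str:
    """Drop a legacy fan-out suffix, leaving the bare-model slug."""
    return _LEGACY_FANOUT.sub("", slug)


def resolve_chip_route(hash_fragment: str) -> tuple[str, bool]:
    """Return (route, needs_redirect) for a location hash."""
    if not hash_fragment.startswith(_CHIP_PREFIX):
        return hash_fragment, False  # not a chip page; leave untouched
    slug = hash_fragment[len(_CHIP_PREFIX):]
    canonical = canonical_chip_slug(slug)
    return _CHIP_PREFIX + canonical, canonical != slug
```

A model whose real name ends in `-x<digits>` would be mangled by this rule, so a production router would likely check the stripped slug against known chips before redirecting.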

### Vendor colours

Vendor accents (chip dot, vendor pill, peer card border) are driven entirely by `assets/js/data.js`'s `VENDOR_COLORS` map. New vendors get a deterministic fallback colour from a 9-entry palette the first time they appear in the dataset — no frontend change required to ship the result.
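
One way to get a deterministic fallback is to hash the vendor name into the palette. The sketch below assumes a CRC32 hash and uses placeholder palette values; the actual hash and colours in `data.js` are not shown here and may differ.

```python
import zlib

# Placeholder 9-entry fallback palette (hypothetical values).
FALLBACK_PALETTE = [
    "#4e79a7", "#f28e2b", "#e15759", "#76b7b2", "#59a14f",
    "#edc948", "#b07aa1", "#ff9da7", "#9c755f",
]

VENDOR_COLORS = {"Cerebras": "#ff6f3c"}


def vendor_color(vendor: str) -> str:
    """Brand colour if the vendor is mapped, else a stable palette pick."""
    if vendor in VENDOR_COLORS:
        return VENDOR_COLORS[vendor]
    # crc32 gives the same value on every run and machine; Python's built-in
    # hash() is salted per process, so it would not be deterministic.
    index = zlib.crc32(vendor.encode("utf-8")) % len(FALLBACK_PALETTE)
    return FALLBACK_PALETTE[index]
```

Because the colour depends only on the name, a new vendor keeps the same accent across page loads and regenerations until someone adds it to `VENDOR_COLORS`.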

If you want a brand-accurate accent for a new vendor (e.g. your accelerator's official colour), open a one-line PR to `VENDOR_COLORS`:

```js
export const VENDOR_COLORS = {
// …
Cerebras: "#ff6f3c", // ← add yours
};
```

`VENDOR_ORDER` (used to lay out the rankings facet pill row) is derived from `Object.keys(VENDOR_COLORS)`, so the same edit also pins your vendor's position in the brand list. Vendors not in the map are appended alphabetically after the brand-named ones.
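
The ordering rule can be sketched in Python, since dicts preserve insertion order much like `Object.keys` does for string keys. The vendor names and colours here are placeholders, and `vendor_order` is a hypothetical name:

```python
def vendor_order(vendor_colors: dict[str, str], vendors_in_data: set[str]) -> list[str]:
    """Brand-mapped vendors in map order, then unmapped vendors alphabetically."""
    brand = list(vendor_colors)  # insertion order, like Object.keys(VENDOR_COLORS)
    extras = sorted(v for v in vendors_in_data if v not in vendor_colors)
    return brand + extras
```

For example, with a map of `Acme`, `Globex`, `Cerebras` and data vendors `{"Zeta", "Globex", "Alpha"}`, the result is `["Acme", "Globex", "Cerebras", "Alpha", "Zeta"]`. Python's `sorted` compares code points, so the frontend could order mixed-case names differently if it uses a locale-aware sort.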

### Optional: viz fields for richer modal charts

The run-detail modal's **Visualize** tab is hidden when `result.viz` is absent. Populate it to surface scenario-specific charts:

```jsonc
{
"viz": {
"type": "decode", // bandwidth-bound suite (A/F/G default)
"offline": {
"labels": [1, 2, 4, 8, 16, 32],
"throughput": [120, 230, 410, 760, 1100, 1380]
},
"interactive": { // optional, suite_D-style
"ttft_p50": 78, "ttft_p90": 110, "ttft_p99": 135,
"tpot_p50": 9, "tpot_p90": 11, "tpot_p99": 14
}
}
}
```

Suite-specific shapes the frontend understands today:

| `viz.type` | Suites | Required keys |
|---|---|---|
| `decode` | A · F · G | `offline.labels[]`, `offline.throughput[]` |
| `multichip` | B | `offline.labels[]`, `offline.throughput[]`, `offline.throughput_per_chip[]` |
| `quant` | C | `precisions[]`, `throughput[]`, optional `accuracy[]` |
| `longctx` | D | `offline.labels[]`, `offline.throughput[]`, `interactive.{ttft_p50,…,tpot_p99}` |
| `scaling` | E | `chip_counts[]`, `throughput[]`, `efficiency_pct[]` |
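
If your runner is written in Python, a pre-submit check against the table above might look like the sketch below. It only tests for presence of the required keys, not array lengths or value types, and it assumes the `interactive.{ttft_p50,…,tpot_p99}` shorthand means the six percentile keys shown in the JSON example. The names `REQUIRED_VIZ_KEYS` and `missing_viz_keys` are hypothetical; the frontend's own checks are in JavaScript.

```python
from typing import Any

# ttft_p50, ttft_p90, ttft_p99, tpot_p50, tpot_p90, tpot_p99
_INTERACTIVE_KEYS = [f"{m}_{p}" for m in ("ttft", "tpot") for p in ("p50", "p90", "p99")]

REQUIRED_VIZ_KEYS: dict[str, list[str]] = {
    "decode": ["offline.labels", "offline.throughput"],
    "multichip": ["offline.labels", "offline.throughput", "offline.throughput_per_chip"],
    "quant": ["precisions", "throughput"],  # accuracy[] is optional
    "longctx": ["offline.labels", "offline.throughput"]
    + [f"interactive.{k}" for k in _INTERACTIVE_KEYS],
    "scaling": ["chip_counts", "throughput", "efficiency_pct"],
}


def _lookup(obj: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; None if any hop is missing."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def missing_viz_keys(viz: dict) -> list[str]:
    """Required dotted paths absent from viz; an empty list means it looks complete."""
    vtype = viz.get("type")
    if vtype not in REQUIRED_VIZ_KEYS:
        return ["type"]
    return [k for k in REQUIRED_VIZ_KEYS[vtype] if _lookup(viz, k) is None]
```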

`viz` is **fully optional** — runners that don't emit it still get a clean Details / Implementation modal. When present, the same fields drive the per-suite head-to-head charts on the Compare page (so two basket runs render directly comparable visualisations instead of falling back to a metric-table-only view).

### Submitter handle

The leaderboard shows the value of `meta.submitted_by` as `@<handle>` next to your result in every list (home recent, suite cards, chip-detail submissions table). A bare GitHub login is used directly, while an email address or a `Name <email>` string is reduced to the email's local part; see `submitterHandle` in `assets/js/utils.js`.
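
To preview how your `submitted_by` value will render, here is a rough Python approximation of that reduction. It is a sketch under assumptions (for instance, that a leading `@` is tolerated), not a port of `submitterHandle`, whose exact rules may differ.

```python
import re

# Captures the address inside "Name <email>".
_ANGLE_EMAIL = re.compile(r"<([^<>]+)>")


def submitter_handle(submitted_by: str) -> str:
    """Reduce a meta.submitted_by value to an '@handle' display string."""
    value = submitted_by.strip()
    if not value:
        return ""
    match = _ANGLE_EMAIL.search(value)
    if match:  # "Jane Doe <jane@example.com>" -> "jane@example.com"
        value = match.group(1).strip()
    value = value.lstrip("@")  # tolerate a leading "@" on a GitHub login
    if "@" in value:  # email address -> local part
        value = value.split("@", 1)[0]
    return "@" + value
```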

---

## Using local or air-gapped models

AccelMark separates the **model identifier** (used for leaderboard comparisons)
102 changes: 101 additions & 1 deletion leaderboard/generate.py
@@ -6,6 +6,7 @@
python leaderboard/generate.py
"""

import hashlib
import json
import re
import statistics
@@ -48,6 +49,65 @@ def _get_suite_precision_required(suite_id: str) -> str:
return "BF16"


def _collect_suite_specs() -> dict:
    """Collect UI-relevant per-suite spec from suites/suite_*/suite.json.

    Baked into the generated leaderboard.js as ``window.SUITE_SPECS`` so
    the static leaderboard UI auto-syncs whenever a maintainer edits a
    suite contract — model id, dataset, prompt distribution, scenarios
    default/extra split, online SLA, etc. Editorial UI content (titles,
    taglines, descriptions) stays in assets/js/data.js since it isn't a
    property of the suite contract.

    Returns a ``{ suite_id: spec }`` mapping with only the fields the UI
    consumes. Missing fields are omitted (the JS-side merge keeps the
    hardcoded fallback when a key is absent).
    """
    out: dict = {}
    suites_dir = Path("suites")
    if not suites_dir.exists():
        return out
    for sd in sorted(suites_dir.iterdir()):
        if not sd.is_dir():
            continue
        sf = sd / "suite.json"
        if not sf.exists():
            continue
        try:
            with open(sf) as f:
                data = json.load(f)
        except Exception:
            continue
        sid = data.get("suite_id") or sd.name
        rd = data.get("request_distribution") or {}
        scn = data.get("scenarios") or {}
        spec: dict = {}
        # Fields the UI displays in suite cards / specs / compare headers.
        for k in (
            "model_id",
            "model_revision",
            "dataset",
            "precision_required",
            "allowed_precisions",
            "max_model_len",
            "concurrency_levels",
            "online_qps_levels",
            "online_sla_ttft_ms",
        ):
            if k in data and data[k] is not None:
                spec[k] = data[k]
        if rd.get("input_tokens_p50") is not None:
            spec["input_tokens_p50"] = rd["input_tokens_p50"]
        if rd.get("output_tokens_p50") is not None:
            spec["output_tokens_p50"] = rd["output_tokens_p50"]
        if scn.get("default"):
            spec["scenarios_default"] = list(scn["default"])
        if scn.get("extra"):
            spec["scenarios_extra"] = list(scn["extra"])
        out[sid] = spec
    return out


# ── Data loading ──────────────────────────────────────────────────────────────

def load_results() -> list[dict]:
@@ -1080,15 +1140,55 @@ def main():

rows = _deduped

suite_specs = _collect_suite_specs()

SITE_DIR.mkdir(parents=True, exist_ok=True)
out_path = SITE_DIR / "leaderboard.js"
with open(out_path, "w") as f:
f.write("// Auto-generated by leaderboard/generate.py. Do not edit manually.\n")
# window.LEADERBOARD_DATA so ES modules (assets/js/data.js) can read it.
# Also exposed as bare LEADERBOARD_DATA for any legacy classic-script consumers.
f.write(f"const LEADERBOARD_DATA = {json.dumps(rows, indent=2)};\n")
f.write("window.LEADERBOARD_DATA = LEADERBOARD_DATA;\n")
# window.SUITE_SPECS — canonical per-suite spec from suites/suite_*/suite.json.
# data.js merges these into SUITE_META at init() so UI facts auto-sync
# with a suite contract edit (no JS to keep in step manually).
f.write(f"const SUITE_SPECS = {json.dumps(suite_specs, indent=2)};\n")
f.write("window.SUITE_SPECS = SUITE_SPECS;\n")

print(f"Leaderboard data written to {out_path} "
f"({len(rows)} rows, {len(suite_specs)} suite specs).")

# Cache-bust leaderboard.js in index.html so a stale CDN / browser
# cached copy never out-survives the data refresh. GitHub Pages
# serves the static files with a 10-minute Cache-Control by default
# and *no* ETag-aware revalidation on cross-domain `<script src>`
# loads, so without a versioned query users routinely see "old"
# submissions for hours after a merge. We hash the bytes we just
# wrote and rewrite the existing `?v=…` (or insert one) in place.
_bust_index_cache(out_path, SITE_DIR / "index.html")

print(f"Leaderboard data written to {out_path} ({len(rows)} rows).")
generate_api(results, SITE_DIR)


def _bust_index_cache(data_path: Path, index_path: Path) -> None:
    """Rewrite ``<script src="leaderboard.js?v=<sha8>">`` to match the
    short SHA-256 of the file we just wrote. No-op if the index file
    is missing (e.g. someone is running the generator outside the
    repo)."""
    if not index_path.exists():
        return
    sha8 = hashlib.sha256(data_path.read_bytes()).hexdigest()[:8]
    html = index_path.read_text()
    pattern = re.compile(
        r'(<script\s+src="leaderboard\.js)(?:\?v=[0-9a-f]+)?(")',
        re.IGNORECASE,
    )
    new_html, n = pattern.subn(rf'\1?v={sha8}\2', html)
    if n and new_html != html:
        index_path.write_text(new_html)
        print(f"  cache-busted leaderboard.js → ?v={sha8}")


if __name__ == "__main__":
    main()
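
The generator comments say `data.js` merges `window.SUITE_SPECS` into `SUITE_META` at `init()`, keeping the hardcoded value whenever a key is absent from the spec. That merge is JavaScript; the Python sketch below only illustrates the assumed precedence (generated spec wins, hardcoded fallback otherwise). `merge_suite_meta` and the sample fields are hypothetical, and suites present only in the specs are ignored here, which may not match the real behaviour.

```python
def merge_suite_meta(suite_meta: dict[str, dict], suite_specs: dict[str, dict]) -> dict[str, dict]:
    """Overlay generated SUITE_SPECS onto hardcoded SUITE_META, per suite."""
    merged: dict[str, dict] = {}
    for sid, meta in suite_meta.items():
        # Keys present in the spec win; keys absent from it keep the hardcoded value.
        merged[sid] = {**meta, **suite_specs.get(sid, {})}
    return merged
```

Building fresh dicts instead of mutating `suite_meta` keeps the hardcoded defaults intact if the merge runs more than once.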