FreedomIntelligence · JuhaoLiang1997 · May 15, 2026
diff --git a/.github/workflows/validate_pr.yml b/.github/workflows/validate_pr.yml
@@ -11,6 +11,7 @@ on:
       - 'schema/platforms.json'
       - 'tools/generate_platforms_matrix.py'
       - 'README.md'
+      - 'leaderboard/site/**'
 
 jobs:
   validate-runners:
@@ -205,4 +206,23 @@ jobs:
                 issue_number: context.issue.number,
                 body,
               });
-            }
+            }
+
+  frontend-tests:
+    name: Frontend unit tests (modal viz)
+    runs-on: ubuntu-latest
+    # Frontend tests don't need full git history; a shallow checkout
+    # plus Node 20 (which ships node:test) is enough.
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+
+      # No npm install — the suite intentionally has zero runtime deps
+      # (a hand-written DOM stub stands in for jsdom / happy-dom).  Add
+      # extra files to leaderboard/site/test/ to widen coverage; the
+      # glob below picks them up automatically.
+      - name: Run leaderboard frontend tests
+        run: node --test leaderboard/site/test/*.test.mjs
diff --git a/.gitignore b/.gitignore
@@ -48,3 +48,9 @@ mini_result/
 *_backup/
 backup/
 /tmp/
+
+# ── Local-only handoff notes (across-session continuity) ────────────────────
+.handoff-*.md
+
+# ── Local-only design QA screenshots ────────────────────────────────────────
+.screenshots/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -346,6 +346,69 @@ verified result is itself reproducible by definition.
 
 ---
 
+## How your result appears on the leaderboard
+
+The frontend treats every submission as data — there's nothing per-vendor or per-chip hand-coded on the UI side. A few conventions are worth knowing if you want your result to look its best:
+
+### Chip identity vs. chip count
+
+The chip-detail page (`#/chip/<slug>`) is keyed on the **chip model** alone, not on `chip_count`. A single chip page therefore aggregates every fan-out you've ever submitted (×1, ×4, ×8, ×16) into one overview. The runs table is sorted by `chip_count` ascending, and the per-suite KPI card flags the deployment behind the best score (e.g. a `×8` badge next to the metric).
+
+Implication: if your runner emits one `result.json` per chip-count, you don't need to invent fake chip names to keep them apart — submit them with the same `chip` field and they'll merge cleanly. Old `…-x<N>` URLs from before this change auto-redirect to the bare-model slug, so existing shared links keep working.
+
+### Vendor colours
+
+Vendor accents (chip dot, vendor pill, peer card border) are driven entirely by the `VENDOR_COLORS` map in `assets/js/data.js`. A new vendor gets a deterministic fallback colour from a 9-entry palette the first time it appears in the dataset, so no frontend change is required to ship the result.
+
+If you want a brand-accurate accent for a new vendor (e.g. your accelerator's official colour), open a one-line PR to `VENDOR_COLORS`:
+
+```js
+export const VENDOR_COLORS = {
+  // …
+  Cerebras: "#ff6f3c",   // ← add yours
+};
+```
+
+`VENDOR_ORDER` (used to lay out the rankings facet pill row) is derived from `Object.keys(VENDOR_COLORS)`, so the same edit also pins your vendor's position in the brand list. Vendors not in the map are appended alphabetically after the brand-named ones.
+
+### Optional: viz fields for richer modal charts
+
+The run-detail modal's **Visualize** tab is hidden when `result.viz` is absent. Populate it to surface scenario-specific charts:
+
+```jsonc
+{
+  "viz": {
+    "type": "decode",            // bandwidth-bound suite (A/F/G default)
+    "offline": {
+      "labels":     [1, 2, 4, 8, 16, 32],
+      "throughput": [120, 230, 410, 760, 1100, 1380]
+    },
+    "interactive": {             // optional, suite_D-style
+      "ttft_p50": 78,  "ttft_p90": 110, "ttft_p99": 135,
+      "tpot_p50":  9,  "tpot_p90":  11, "tpot_p99":  14
+    }
+  }
+}
+```
+
+Suite-specific shapes the frontend understands today:
+
+| `viz.type` | Suites | Required keys |
+|---|---|---|
+| `decode`    | A · F · G  | `offline.labels[]`, `offline.throughput[]` |
+| `multichip` | B          | `offline.labels[]`, `offline.throughput[]`, `offline.throughput_per_chip[]` |
+| `quant`     | C          | `precisions[]`, `throughput[]`, optional `accuracy[]` |
+| `longctx`   | D          | `offline.labels[]`, `offline.throughput[]`, `interactive.{ttft_p50,…,tpot_p99}` |
+| `scaling`   | E          | `chip_counts[]`, `throughput[]`, `efficiency_pct[]` |
+
+`viz` is **fully optional** — runners that don't emit it still get a clean Details / Implementation modal. When present, the same fields drive the per-suite head-to-head charts on the Compare page (so two basket runs render directly comparable visualisations instead of falling back to a metric-table-only view).
+
+### Submitter handle
+
+The leaderboard surfaces the value of `meta.submitted_by` as `@<handle>` next to your result on every list (home recent, suite cards, chip-detail submissions table). Anything that looks like a GitHub login, an email, or a `Name <email>` form is reduced to a bare handle (for the email forms, the local part); see `submitterHandle` in `assets/js/utils.js`.
+
+---
+
 ## Using local or air-gapped models
 
 AccelMark separates the **model identifier** (used for leaderboard comparisons)

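Review note, not part of the patch: the `viz.type` table above reads naturally as a lookup from type to required keys, so a runner could check its own `result.json` before submitting. The following is a minimal sketch under that assumption. `REQUIRED_VIZ_KEYS` and `missing_viz_keys` are hypothetical names that do not exist in the repository, and the dotted-path notation is only a convenience for this example. For `longctx`, only the two `interactive` keys the table names explicitly are checked, because the keys in between are elided there.

```python
# Hypothetical pre-submission check; names and structure are assumptions,
# transcribed from the `viz.type` table in CONTRIBUTING.md.
REQUIRED_VIZ_KEYS = {
    "decode": ["offline.labels", "offline.throughput"],
    "multichip": [
        "offline.labels",
        "offline.throughput",
        "offline.throughput_per_chip",
    ],
    "quant": ["precisions", "throughput"],  # `accuracy` is optional
    # The table elides the keys between ttft_p50 and tpot_p99, so only
    # the two endpoints it names are checked here.
    "longctx": [
        "offline.labels",
        "offline.throughput",
        "interactive.ttft_p50",
        "interactive.tpot_p99",
    ],
    "scaling": ["chip_counts", "throughput", "efficiency_pct"],
}


def missing_viz_keys(viz: dict) -> list[str]:
    """Return the dotted paths required for viz["type"] that are absent.

    An unknown or missing type is reported as ["type"].
    """
    required = REQUIRED_VIZ_KEYS.get(viz.get("type"))
    if required is None:
        return ["type"]
    missing = []
    for path in required:
        node = viz
        for part in path.split("."):
            # Walk one level down; stop at the first absent segment.
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing
```

An empty list means every key the table marks as required is present. The sketch does not check that paired arrays such as `labels` and `throughput` have equal length.
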
diff --git a/leaderboard/generate.py b/leaderboard/generate.py
@@ -6,6 +6,7 @@
     python leaderboard/generate.py
 """
 
+import hashlib
 import json
 import re
 import statistics
@@ -48,6 +49,65 @@ def _get_suite_precision_required(suite_id: str) -> str:
         return "BF16"
 
 
+def _collect_suite_specs() -> dict:
+    """Collect UI-relevant per-suite spec from suites/suite_*/suite.json.
+
+    Baked into the generated leaderboard.js as ``window.SUITE_SPECS`` so
+    the static leaderboard UI auto-syncs whenever a maintainer edits a
+    suite contract — model id, dataset, prompt distribution, scenarios
+    default/extra split, online SLA, etc.  Editorial UI content (titles,
+    taglines, descriptions) stays in assets/js/data.js since it isn't a
+    property of the suite contract.
+
+    Returns a ``{ suite_id: spec }`` mapping with only the fields the UI
+    consumes.  Missing fields are omitted (the JS-side merge keeps the
+    hardcoded fallback when a key is absent).
+    """
+    out: dict = {}
+    suites_dir = Path("suites")
+    if not suites_dir.exists():
+        return out
+    for sd in sorted(suites_dir.iterdir()):
+        if not sd.is_dir():
+            continue
+        sf = sd / "suite.json"
+        if not sf.exists():
+            continue
+        try:
+            with open(sf, encoding="utf-8") as f:
+                data = json.load(f)
+        except (OSError, ValueError):  # unreadable or malformed suite.json: skip it
+            continue
+        sid = data.get("suite_id") or sd.name
+        rd = data.get("request_distribution") or {}
+        scn = data.get("scenarios") or {}
+        spec: dict = {}
+        # Fields the UI displays in suite cards / specs / compare headers.
+        for k in (
+            "model_id",
+            "model_revision",
+            "dataset",
+            "precision_required",
+            "allowed_precisions",
+            "max_model_len",
+            "concurrency_levels",
+            "online_qps_levels",
+            "online_sla_ttft_ms",
+        ):
+            if k in data and data[k] is not None:
+                spec[k] = data[k]
+        if rd.get("input_tokens_p50") is not None:
+            spec["input_tokens_p50"] = rd["input_tokens_p50"]
+        if rd.get("output_tokens_p50") is not None:
+            spec["output_tokens_p50"] = rd["output_tokens_p50"]
+        if scn.get("default"):
+            spec["scenarios_default"] = list(scn["default"])
+        if scn.get("extra"):
+            spec["scenarios_extra"] = list(scn["extra"])
+        out[sid] = spec
+    return out
+
+
 # ── Data loading ──────────────────────────────────────────────────────────────
 
 def load_results() -> list[dict]:
@@ -1080,15 +1140,55 @@ def main():
 
     rows = _deduped
 
+    suite_specs = _collect_suite_specs()
+
     SITE_DIR.mkdir(parents=True, exist_ok=True)
     out_path = SITE_DIR / "leaderboard.js"
     with open(out_path, "w") as f:
         f.write("// Auto-generated by leaderboard/generate.py. Do not edit manually.\n")
+        # window.LEADERBOARD_DATA so ES modules (assets/js/data.js) can read it.
+        # Also exposed as bare LEADERBOARD_DATA for any legacy classic-script consumers.
         f.write(f"const LEADERBOARD_DATA = {json.dumps(rows, indent=2)};\n")
+        f.write("window.LEADERBOARD_DATA = LEADERBOARD_DATA;\n")
+        # window.SUITE_SPECS — canonical per-suite spec from suites/suite_*/suite.json.
+        # data.js merges these into SUITE_META at init() so UI facts auto-sync
+        # with a suite contract edit (no JS to keep in step manually).
+        f.write(f"const SUITE_SPECS = {json.dumps(suite_specs, indent=2)};\n")
+        f.write("window.SUITE_SPECS = SUITE_SPECS;\n")
+
+    print(f"Leaderboard data written to {out_path} "
+          f"({len(rows)} rows, {len(suite_specs)} suite specs).")
+
+    # Cache-bust leaderboard.js in index.html so a stale CDN or browser
+    # copy never outlives the data refresh.  GitHub Pages serves static
+    # files with a 10-minute Cache-Control max-age by default and *no*
+    # ETag-aware revalidation on cross-domain `<script src>` loads, so
+    # without a versioned query users routinely see "old" submissions
+    # for hours after a merge.  We hash the bytes we just wrote and
+    # rewrite the existing `?v=…` (or insert one) in place.
+    _bust_index_cache(out_path, SITE_DIR / "index.html")
 
-    print(f"Leaderboard data written to {out_path} ({len(rows)} rows).")
     generate_api(results, SITE_DIR)
 
 
+def _bust_index_cache(data_path: Path, index_path: Path) -> None:
+    """Rewrite ``<script src="leaderboard.js?v=<sha8>">`` to match the
+    short SHA-256 of the file we just wrote.  No-op if the index file
+    is missing (e.g. someone is running the generator outside the
+    repo)."""
+    if not index_path.exists():
+        return
+    sha8 = hashlib.sha256(data_path.read_bytes()).hexdigest()[:8]
+    html = index_path.read_text(encoding="utf-8")
+    pattern = re.compile(
+        r'(<script\s+src="leaderboard\.js)(?:\?v=[0-9a-f]+)?(")',
+        re.IGNORECASE,
+    )
+    new_html, n = pattern.subn(rf'\1?v={sha8}\2', html)
+    if n and new_html != html:
+        index_path.write_text(new_html, encoding="utf-8")
+        print(f"  cache-busted leaderboard.js → ?v={sha8}")
+
+
 if __name__ == "__main__":
     main()
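
Review note, not part of the patch: the rewrite in `_bust_index_cache` can be exercised on plain strings without touching the filesystem. The sketch below reuses the patch's regular expression verbatim. The `bust` helper is a hypothetical stand-in for this note only, and it writes the replacement with `\g<1>` / `\g<2>` group references, which behave the same as `\1` / `\2` here.

```python
import hashlib
import re

# Same pattern as _bust_index_cache in the patch.
PATTERN = re.compile(
    r'(<script\s+src="leaderboard\.js)(?:\?v=[0-9a-f]+)?(")',
    re.IGNORECASE,
)


def bust(html: str, data: bytes) -> str:
    """Return html with leaderboard.js pinned to the short SHA-256 of data."""
    sha8 = hashlib.sha256(data).hexdigest()[:8]
    return PATTERN.sub(rf"\g<1>?v={sha8}\g<2>", html)


if __name__ == "__main__":
    fresh = '<script src="leaderboard.js"></script>'
    once = bust(fresh, b"payload")
    print(once)                             # a ?v=<sha8> query was inserted
    print(bust(once, b"payload") == once)   # same bytes: nothing changes
    print(bust(once, b"changed") == once)   # new bytes: the query is replaced
```

One thing this makes visible: the pattern matches only when `src` is the first attribute after `<script`. A tag such as `<script defer src="leaderboard.js">` passes through unchanged, so the cache-bust would silently stop applying if `index.html` were ever edited that way.
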