Skip to content

Feature/leaderboard UI v2#48

Merged
JuhaoLiang1997 merged 21 commits into
mainfrom
feature/leaderboard-ui-v2
May 15, 2026
Merged

Feature/leaderboard UI v2#48
JuhaoLiang1997 merged 21 commits into
mainfrom
feature/leaderboard-ui-v2

Conversation

@JuhaoLiang1997

Copy link
Copy Markdown
Collaborator

Summary

Type of change

  • New platform support
  • Bug fix (runner, validator, leaderboard, or tooling)
  • Suite definition change
  • Schema change
  • Leaderboard / UI improvement
  • Documentation
  • Other:

Testing

# Commands used to verify

Checklist

  • I have read CONTRIBUTING.md
  • My change does not break existing result.json files (or I have explained the migration path)
  • If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
  • If changing the schema: validate_submission.py updated and all existing results still validate
  • If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
  • I have updated relevant documentation

Related issues

JuhaoLiang1997 and others added 20 commits May 16, 2026 01:04
Splits the legacy 2300-line index.html into ES modules
(assets/css/* + assets/js/*) and introduces a hash router so each view
lives in its own file.

This commit only realises the Home page (suite overview grid + top-3
podiums + recent submissions).  Rankings, Compare, Chip detail and
Suites views are scaffolded with informative placeholders so the
router never errors; full implementations land in follow-ups
(commit 2: rankings + compare, 3: chip detail, 4: polish).

Notable bits:
- assets/css/{base,layout,components,home}.css design-token layer.
- assets/js/data.js single source of truth for SUITE_META, primary-
  metric direction/scale/unit rules, and row indexing.
- formatPrimary(value, suiteId) centralises display rules so suite
  quirks (e.g. Suite E ratios 0-1 rendered as %) only live in data.js.
- assets/js/router.js minimal hash router with compare-basket holder.
- leaderboard/generate.py exposes window.LEADERBOARD_DATA so ES
  modules can read the generator output (a classic-script `const` is
  not visible to modules).
- Old style.css removed (unused since the current dark-theme UI
  inlined styles into index.html).

Co-authored-by: Cursor <cursoragent@cursor.com>
Cross-session handoff drafts are session-local; they shouldn't land in
the repo. Match by glob so any feature handoff stays untracked.

Co-authored-by: Cursor <cursoragent@cursor.com>
Per first-round Home feedback ("乱", "字小", "bar 喧宾夺主", "97.0 怪"),
rebuild the Home page around small standalone leaderboards instead of a
podium+bar visual.

UI changes
- Suite card now has a filled accent header (letter + title + metric
  tag) followed by tagline, top-5 row list, and a centered "View full
  ranking →" CTA. Replaces the 3-entry podium with relative bars.
- New .lb-row primitive: rank circle (gold/silver/bronze for #1–#3,
  neutral for #4#5), chip name, vendor color dot + framework, and a
  right-aligned mono score with muted unit.
- Recent submissions reuse .lb-row; suite letter takes the rank slot
  and a small "Suite X" pill plus date sit in the sub line.
- Vendor identity is now visible at a glance via the colored dot.

Theming
- Auto light/dark via prefers-color-scheme; no manual toggle, no
  separate stylesheet. Tokens redeclared in @media block in base.css.
- Badges, primary button, compare pill now use color-mix on the
  current accent/good/bad tokens so they adapt to both themes.

Layout polish
- Hero: bigger h1 (2.1rem), more breathing tagline, KPI value bumped
  to 1.7rem with uppercased label kerning.
- Main padding loosened to 2rem 1.75rem 3rem; section margin to 3rem
  so 7-card grid no longer feels cramped.
- --max-w narrowed 1400 → 1200 for more comfortable line lengths.

Misc
- fmtNum integer fast-path: 97 / 32 / 4 render without ".0".
- formatPrimary unit is split from the number in views so the unit
  can render muted next to the value.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rm parchment

Feedback was "颜色和布局有点单调". Premium UI in 2026 isn't about more
chrome — it's restraint plus craft (Vercel, Linear, Anthropic). Reworked
the visual system in that direction.

Palette
- Drop pure black / pure white. Dark uses an off-black (#0b0c10), light
  uses a warm parchment (#faf8f3) borrowed from editorial sites.
- Each suite (A–G) now has its own perceptually-distinct category color
  (cobalt, violet, teal, terracotta, emerald, rose, indigo). Surfaced as
  the suite card's header bar, the letter chip, the #1 row tint, the
  CTA color, and the small suite-tag pill on recent rows.
- Color reads consistently across the page — same color tells the same
  story everywhere.

Typography
- Pair editorial serif (ui-serif → Iowan / Charter / Georgia) with the
  existing sans body. Serif now drives the hero h1 (with italic accent
  on the lede), section h2s, KPI numerals, and the score-val numbers.
- Hero h1 bumped to 2.7rem with -0.02em tracking; eyebrow micro-cap
  label sits above for editorial pacing.

Layout polish
- Hero: subtle multi-color radial-gradient backdrop (3 stops at low
  alpha) clipped to a softly-rounded billboard. Adds depth without
  noise.
- KPI strip: thin vertical dividers between values, bracketed by hair-
  line top/bottom rules.
- Section headers: editorial eyebrow ("01 · Workloads", "02 · Latest
  activity") + serif h2, separated from the body by a faint rule.
- Suite card #1 row visually featured: faint color tint background +
  left accent bar + slightly larger score.
- Soft card lift on hover (translate + bigger shadow + suite-colored
  border).
- Theme-color meta now follows prefers-color-scheme so the iOS chrome
  matches.

/suites placeholder updated to the same visual language so it doesn't
look stranded.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ghter type

Round-3 feedback fixes:

Hero
- "标题挤在左边" — center the entire hero stack (eyebrow, h1, tagline,
  KPI strip, CTAs). KPI tiles centered between two hairline rules,
  with vertical dividers between values.
- Drop the leading "—" on the hero eyebrow so a centered single line
  reads cleanly.

Row sub line
- Framework version now displayed next to the framework name
  ("vLLM 0.7.3", "SGLang 0.5.6"). Long dev versions (e.g.
  "0.19.1rc1.dev339+gedc364896") are normalized via shortVersion()
  which trims at "+" and ".dev" plus a 12-char hard cap.
- Submitter handle ("@username") rendered after precision on the
  wider row-2 cards and on every recent-submissions row. Row-1
  compact cards skip it to keep the 25%-width slot readable.
- New helpers: utils.shortVersion(), utils.submitterHandle().

Typography
- "字体不统一" — reduce serif to display only (h1, h2, hero KPI
  numerals). Score values, suite letters, and the suite-stats counts
  now use sans-bold with tabular-nums so column alignment is crisp
  but the visual stays consistent.
- Suite letter chip dropped from serif → sans-bold for a tighter
  pairing with the sans card title beside it.

Layout
- 7 suite cards split into 4+3 on a 12-col grid (row 1: A B C D
  span 3 each; row 2: E F G span 4 each). Row 1 cards use a
  stacked head (letter + title row → metric pill below right-
  aligned) so the title gets the full card width — no more
  "Single-chip throu…" ellipsis.
- Row 1 cards show top 4 entries; row 2 cards show top 5 + submitter.
- Mobile <=1100px collapses to 2-col, <=640px to single column.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ction

Round-4 feedback fixes:

Hero
- Two-line title: "AccelMark Leaderboard" + "A Reproducible
  Multi-Regime AI Accelerator Benchmark" subtitle. The em-dash
  formulation is gone.
- All sans, no serif anywhere ("字体统一用现在的结果的字体").

Background
- The hero radial-gradient backdrop felt boxed-in because it was
  clipped to a rounded rectangle. Promoted it to body::before with
  absolute positioning, a 720-px height, and a linear mask that
  fades the bottom to transparent — the wash now blends into the
  page bg with no visible boundary and scrolls away with the hero.

Suite cards
- Uniform 3-col grid again (3 / 3 / 1 instead of the 4 + 3 split).
  All cards same size, all show top 6 entries.
- Card header now carries the tagline (moved out of the body) plus
  a new meta line with the suite's model, baseline precision, total
  results, and chip-config count — e.g.
    "Llama 3 · 8B · BF16 baseline · 18 results · 17 chips".
- Suite title bumped to 1.15rem 700.
- Long titles ("Quantization efficiency") now wrap instead of
  ellipsing — the head row uses flex 1 1 auto.

Data helpers
- data.suiteFacts(id) returns the mode model, mode precision, and
  per-suite counts.
- data.vendorBreakdown() returns one entry per vendor with chips,
  submissions, suite letters, and the vendor's best chip in its
  most-populated suite.
- utils.shortModel() abbreviates HF-style names ("Meta-Llama-3-8B-
  Instruct" → "Llama 3 · 8B", "Mixtral-8x7B-Instruct-v0.1" → keeps
  the × variant).

New 02 · Coverage section
- "Submissions by vendor" — auto-fit grid of vendor cards. Each
  card shows the vendor's color stripe and dot, chip + submission
  counts, the headline top chip with its score, and small letter
  pills tinted by suite color showing which suites the vendor
  appears in. Pairs with section 01 to give a second lens on the
  same dataset.

Typography
- Drop --font-serif token entirely. Sans only for h1/h2/KPI/score/
  card title/suite letter. Mono is retained behind the .mono utility
  but not used anywhere on home.

/suites updated to the same suite-card shape so the two pages share
visual language.

Co-authored-by: Cursor <cursoragent@cursor.com>
…bels

- Home page restyled: 8-row leaderboard cards, "Chips on the leaderboard"
  cloud (tile size = submission count, color = vendor) replacing the old
  per-vendor grid, and a new "Submit your result" contributor section.
- byline on each lb-row now carries framework + version + submitter handle
  for richer attribution at a glance.
- Suite headers use hardcoded model/protocol tokens (not data-derived) so
  the page does not jitter on submission changes.
- Metric labels rolled out to formal names across data.js (tokens/sec,
  queries/sec, 2x/4x scaling efficiency, quality efficiency, sustained
  throughput) — these flow into both the home view and any future view.
- Hero subtitle pinned with white-space: nowrap and responsive font ramp
  so it stops awkwardly wrapping on common widths.
- Em dashes in number/date fallbacks replaced with hyphens to play nicer
  with the editorial type and copy-paste.
- .screenshots/ ignored (local-only design QA captures).

Co-authored-by: Cursor <cursoragent@cursor.com>
…per-suite specs

Replaces the placeholder suites view with a long-form explainer that
makes the per-suite design argument concrete and scannable.

Page structure
- Hero: "Workload Suites" + a single, full-width subtitle (no eyebrow,
  no third tagline line).
- 01 Methodology: 2-column layout. Prose on the left explains why a
  single score collapses information across heterogeneous regimes; an
  inline SVG roofline diagram on the right plots each suite at its
  bottleneck region (memory-bound vs compute-bound), with knee marker
  and per-suite dots colored by suite.
- 02 Scenarios catalog (new): the seven protocols (accuracy / offline
  / online / interactive / sustained / speculative / burst) factored
  out into a single catalog. Each card has an SVG icon, role tagline,
  prose description, mini-spec table (metric, direction, threshold,
  cost), and clickable "Used by" suite-letter chips.
- 03 Specifications: per-suite cards now carry a description paragraph
  paired with a "Concrete finding" sidebar, a spec strip, compact
  protocol pills, and a horizontal mini-card grid of current leaders.
- 04 Datasets and 05 Propose-a-suite CTA round out the page.

Interactions
- Clicking a "Used by" chip in a scenario card scrolls smoothly to the
  matching suite section and flashes the card briefly. Uses event
  delegation with an attach-once guard to survive SPA re-renders.
- scroll-margin-top on suite cards clears the sticky top nav so anchor
  jumps don't tuck the target underneath.

Other
- index.html loads the new suites.css alongside the existing
  layout/components/home stylesheets.

Co-authored-by: Cursor <cursoragent@cursor.com>
…l modal

Rankings, Compare and Chip-detail get rebuilt on top of the v1 scaffold;
single-run detail moves out of a dedicated page into a global modal so
the leaderboard reads as one continuous browse → compare → drill flow.

Views
- rankings: suite pills + multi-select vendor/precision/framework facet
  pills, sticky toolbar, sortable table, URL-synced state, compare
  basket keyed by run_id, in-basket banner, full chip column linkable
  to chip-detail, soft `?chip=<slug>` filter with focus banner.
- compare: basket strip + per-suite resolver that surfaces the same
  hardware in any suite, per-metric table with winner stars, head-to-
  head Chart.js panels per suite (grouped bars, scaling lines, precision
  matrix, latency horizontal bars), chip-cloud quick-add (empty state
  + populated "Add another chip" affordance), empty-suite banner with
  "Try Suite X instead" suggestions, sharable `?runs=<rid,…>` seed.
- chip-detail: full rewrite — hero with vendor/memory eyebrow + facts +
  Compare/Browse CTAs, 01 per-suite KPI grid with #N/M rank badges
  (gold/silver/bronze top-3, unique-chip rank), 02 every-submission
  table, 03 Compare-with-similar-chips peer grid using shared-suite
  overlap with same-vendor tie-break.
- home: chip cloud + lb-rows now use nested anchors so chip names route
  to /chip/<slug> while the rest of the row pops the modal.

Run-detail modal
- New assets/js/modal.js: global singleton overlay with Details /
  Visualize / Implementation tabs, per-suite Chart.js renderings, deep
  link via `?run=<rid>&tab=viz`, history.replaceState so router never
  re-dispatches mid-modal, nested-anchor escape so inner <a> tags keep
  native navigation, stale-rid auto-strip.

Polish
- generate.py hashes leaderboard.js after writing and rewrites the
  <script src="leaderboard.js?v=<sha8>"> tag in index.html so CDN /
  browser cache invalidates on every data refresh.
- chipHref defends against missing slugs, chip-detail not-found mirrors
  .rk-empty / .cmp-empty rather than the loading state.
- Mobile audit ≤480/420: topnav drops wordmark + GH link, hero h1
  shrinks and wraps, CTA stacks vertically, chip-tile collapses to
  single compact size with text wrapping.

Data
- new helpers: rankChipInSuite, similarChipsTo, bestRowForRunInSuite,
  representativeRunForChip, rowByRunId, chipCloudData; bestPerSuiteForChip
  and bestPerChipForSuite formalised; `_chip_slug` / `_chip_label`
  attached at init so views never recompute.

Removed
- Top nav compare counter pill (duplicated the rankings toolbar status
  per user feedback).

Co-authored-by: Cursor <cursoragent@cursor.com>
…s + tests

Three independent improvements shipped together — designed in
.handoff-leaderboard-ui-v2.md as commits 5/6/7.

* Chip-detail page is now per chip-model, not per (chip × chip_count)
  fan-out.  4090D, 4090D ×4, 4090D ×8 share one page; the hero shows
  the bare chip name, the per-suite KPI cards add a "×N" badge when
  the best score came from a multi-card deployment, and the every-
  submission table grows a Chips column grouped by fan-out.  utils:
  chipSlug() drops the -x<N> suffix; new normalizeChipSlug() lets the
  router auto-redirect legacy "/chip/<slug>-x<N>" URLs so old shared
  links keep working.  Real data: 32 chip-detail pages collapse to 21.

* Vendor colour binding is 100% data-driven.  Single VENDOR_COLORS
  map in data.js plus an injectVendorStyles() that writes one
  [data-vendor="X"]{--vendor-color:#hex} rule per vendor seen in the
  loaded dataset into a singleton <style> tag at init() time.  Unknown
  vendors get a deterministic fallback colour from a 9-shade palette.
  Deletes ~70 lines of per-vendor CSS spread across 6 files
  (base/components/rankings/chip-detail/home/suites).  Adding a new
  vendor with brand colour is now one line in JS instead of seven CSS
  blocks; adding one without a brand colour is zero lines.

* Rankings facet pills (vendor / precision / framework / clear-all /
  empty-state-clear) switched from <button> to <a href> with a
  precomputed toggled-URL.  Middle / Cmd / Ctrl / Shift-click open in
  a new tab via native anchor semantics; plain left-click still
  preventDefault()s through the SPA toggle.  Suite pill stays a button
  because a suite switch resets sort + filters, so "open same suite
  with different filters in a new tab" doesn't make sense.

* Chip-detail KPI cards expose a hover hint footer ("Open run · ⌘+click
  for all in suite") plus a full title attribute so the dual click
  affordance is discoverable for keyboard / screen-reader users too.
  Footer collapses on touch widths (<=640px) since modifier-click is
  moot.

* New node:test coverage with zero npm deps:
    - leaderboard/site/test/dom_stub.mjs — minimal DOM + Chart.js
      stand-in (head, getElementById, FakeEl id getter/setter).
    - modal_viz.test.mjs — 11 cases: happy-path full suite_A row,
      no-data placeholder, unknown viz.type fallback, partial blocks
      for every per-suite renderer, vizHasAnyData boundary.
    - chip_slug.test.mjs — 6 cases pinning the chipSlug + normalize
      contract so a refactor can't accidentally re-fork chip-detail.
    - vendor_color.test.mjs — 5 cases: brand hits, deterministic
      fallback, null safety, distinct shades for unknown vendors,
      VENDOR_ORDER stays in sync with VENDOR_COLORS keys.
  Total 22 pass / 0 fail / ~110ms.  modal.js exports a deliberately
  underscored `_test` ESM hatch (vizHasAnyData / renderViz /
  destroyCharts / resetColorCache) so tests can drive the panel
  without booting the full modal.

* CI: validate_pr.yml gains paths trigger leaderboard/site/** and a
  new frontend-tests job (actions/setup-node@v4 → node --test
  leaderboard/site/test/*.test.mjs).

Co-authored-by: Cursor <cursoragent@cursor.com>
… ribbon

Three small backlog items shipped together because they touch
disjoint views; each is a single-purpose UX improvement.

* Suites 01 Methodology: wrap the second + third paragraph in a
  <details class="why-prose-more"> so mobile readers see the
  opening argument plus an opt-in expander instead of three full
  prose paragraphs above the fold.  Mount-time matchMedia decides
  desktop default-open / mobile default-closed (we deliberately
  don't react to viewport changes — toggling open mid-read would
  yank the user's scroll).  Summary chrome strips the native
  triangle marker for a pill that matches the eyebrow palette.

* Compare basket: add a Copy share link button next to Clear all.
  bindClicks intercepts data-basket-share, _shareUrlForBasket
  recomposes the canonical URL from the in-memory basket + active
  suite (the seed ?runs= is stripped from the address bar after
  consumption, so location.href alone wouldn't restore the basket).
  navigator.clipboard.writeText is the happy path; falls back to a
  hidden <textarea> + execCommand("copy") on insecure contexts and
  to window.prompt as a last resort, so the button never silently
  no-ops.  Success / failure each flash the label for 1.6s / 3.5s
  with is-copied / is-copy-failed accent palettes.

* Home hero: add a "this week" KPI tile + accent ribbon below the
  KPI strip showing recentSince(7d) submissions, linking to
  #/rankings?suite=suite_A&sort=date:desc.  When count is zero the
  whole ribbon + KPI are suppressed so quiet weeks don't read as a
  regression.  Pulsing accent dot mirrors a status-LED convention;
  prefers-reduced-motion turns it static.

* data.js: new recentSince(days, now=new Date()).  Compares
  lexicographically against a UTC YYYY-MM-DD cutoff (rows are
  calendar-dates with no time component; UTC avoids midnight
  rollover ambiguity).  Returns 0 for non-positive / NaN inputs
  rather than throwing, so the hero still renders if the data
  shape ever drifts.

* Tests: new test/recent_since.test.mjs (5 cases, 27 total pass /
  0 fail) covering 7d / 1d / 365d / negative+NaN / default-now and
  the no-date-row skip.  Re-uses dom_stub.mjs with a hand-built
  fixture so the suite stays npm-free.

Co-authored-by: Cursor <cursoragent@cursor.com>
…board a11y

Two related improvements that touch every view:

* Hoist the compare basket's share-link logic into reusable utils:
  copyToClipboard(text) and flashButtonLabel(btn, msg, opts).  Both
  are pure helpers with full progressive fallback (clipboard API →
  textarea+execCommand → window.prompt) and never throw, so views
  stay one-liners.  Add a Copy link CTA to the chip-detail hero
  that uses them; rewrite compare.js to consume the same helpers.
  Promote success / failure flash chrome onto a shared `.copy-btn`
  class so any future share button picks up the green / red accents
  without per-view CSS.

* Keyboard a11y sweep across every clickable surface:
  - Global :focus-visible ring (2px accent + offset, inherits radius)
    on the body so tab navigation always shows where focus is.
  - lb-row trigger divs (home suite cards + recent activity) get
    role="button" + tabindex="0" + aria-label; modal.js gains a
    keydown delegate that turns Enter / Space on any [data-open-run]
    surface into the same openModal call a click would make.  Same
    nested-anchor escape as the click delegate so focus on an inner
    chip-name link still navigates to the chip page.
  - Rankings + chip-detail table rows get tabindex="0" + aria-label;
    we keep the native <tr> role so screen-reader column-header
    pairing still works (role="button" would clobber it).
  - Compare chip-cloud tiles get role="button" + aria-pressed mirroring
    the in-basket state, with an extended title that documents both
    the toggle and the modifier-click navigation paths.
  - Modal tabs / panels are now linked via id + aria-controls +
    aria-labelledby, with a focusable panel container so keyboard
    users land in the active panel after switching tabs.
  - dom_stub: querySelector / querySelectorAll stubs on FakeEl so
    helper smoke tests can compose without monkey-patching jsdom.

Tests: new test/share_helpers.test.mjs (5 cases) covers the no-clipboard
fallback, navigator.clipboard.writeText happy path, null safety, and the
flashButtonLabel re-entrancy invariant (rapid clicks must not lock the
button into the flash label).  32 total pass / 0 fail / ~150ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
…elper

The KPI grid told users which suites a chip submitted to and what
its primary metric was; what it never showed was *where the chip
sits across the spectrum* — is this a bandwidth-bound winner that
loses on long context, a balanced mid-stack chip, or a one-suite
specialist?  The radar makes that question answerable in one
glance: each axis is a suite letter, each tick is % of the global
best primary metric for that suite, asc-direction metrics inverted
so the leader still reads as 100 %.

Section anatomy: full-width card with a square radar canvas on the
left, a sortable per-suite legend (suite letter chip + name + raw
value + normalised %) on the right.  Suites the chip never
submitted to come back as missing entries — the polygon collapses
to centre on those axes, the legend annotates them as "no data" /
"—" so the gap is explicit rather than implied.

The radar also surfaces a dashed reference ring at 100 % so users
can read each axis as "how close is this chip to the best chip
ever submitted on this workload" without doing the division
themselves.  Chip's brand colour fills the polygon — same colour
the rest of the site uses for the chip's vendor.

Mounted post-render via setTimeout(0) so Chart.js sees an attached
canvas; previous chart instance is destroyed on every re-mount so
quick chip switching doesn't leak canvases.  Skipped entirely when
the chip has data on fewer than 2 suites (a single-point polygon
adds no signal), and the section's structured legend stays the
fallback when window.Chart isn't loaded.

Renumber sections: 02 fingerprint, 03 every submission, 04 peers.

* data.js: new `suiteFingerprint(slug)` returns
  Map<suiteId, { value, normalized, best, missing }> for every
  suite in SUITE_ORDER.  Reuses rowsForSuite for global-best lookup.
  Asc-direction metrics get `bestValue / value` so TTFT/latency
  metrics rank correctly on the radar.

* views/chip-detail.js: new `renderFingerprintSection(slug, sample)`
  + `_mountFingerprintChart(el, slug, sample)`; renumbered the
  every-submission section eyebrow to 03 and peers to 04.

* css/chip-detail.css: full styling for `.chip-fp-section`,
  `.chip-fp-wrap` (responsive 2-col on desktop, stacked on mobile),
  `.chip-fp-canvas` (square aspect-ratio wrapper for Chart.js),
  and `.chip-fp-legend` (suite-tinted letter chips + per-row tnum).

* test/suite_fingerprint.test.mjs (5 cases): leader normalised to
  1.0, mid-pack to 0.5, missing suites flagged + zero, unknown
  slugs return a complete Map without throwing, cross-suite
  ordering inversions surface as expected.  37 total pass / 0 fail
  / ~230ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add a section between "Leaderboard tiers" and "Using local or
air-gapped models" so contributors know which fields the v2
frontend reads (and which are optional polish vs. required schema).
README intentionally stays as-is.

* Chip identity vs. chip count — chip-detail aggregates every
  fan-out (×1/×4/×8) onto one page keyed on chip model alone, so
  per-fan-out result.json files don't need invented chip names.
  Old `…-x<N>` URLs auto-redirect.

* Vendor colours — explain that `VENDOR_COLORS` in data.js is the
  one source of truth: a one-line PR pins both the brand colour
  and the vendor's position in `VENDOR_ORDER`; vendors not in the
  map get a deterministic fallback from a 9-entry palette so a new
  result is never colour-less on first appearance.

* Optional viz fields — table of `viz.type` shapes the modal +
  compare charts understand today (decode / multichip / quant /
  longctx / scaling) with the required keys for each. Stresses
  that viz is fully optional — runners that don't emit it still
  get a clean Details / Implementation modal.

* Submitter handle — clarify that `meta.submitted_by` becomes
  `@<handle>` next to every result (utils.js `submitterHandle`).

Co-authored-by: Cursor <cursoragent@cursor.com>
Two complementary polish items, both small and table-focused.

* Rankings: hide metric columns whose values are entirely null in
  the current post-filter slice.  suite_C's quant_quality and
  suite_E's scaling_efficiency are null on most chips, so the
  default rendering used to show four "—" columns at every breakpoint
  — visual noise that read as "this chip is broken" rather than
  "this metric is suite-specific".  ?cols=auto (default) elides
  those columns; ?cols=all forces every declared column.  Primary
  and active-sort columns are always pinned visible (hiding either
  would break the page logic).

  The toggle lives in the existing .rk-status row as a pill anchor
  ("3 columns hidden — show all" / "Hide empty columns") so middle-
  click + Cmd-click open the alternate state in a new tab — same
  affordance every other rankings filter uses.  rebuildHash drops
  the ?cols= param when toggling back to auto so URLs stay tidy.

* Focus ring polish: extend the global :focus-visible from a single
  2px outline to a two-layer halo — the crisp 2px accent outline plus
  a 4px translucent box-shadow ring underneath using
  color-mix(--accent 28%).  The original ring got swallowed on the
  hero radial gradient and on suite-color-tinted rows; the halo
  traces the element's border-radius via box-shadow, so it survives
  busy backgrounds without per-element tuning.

  Pills (rk-suite-pill / rk-facet-pill / scn-suite-letter) get a
  smaller 3px halo so it doesn't dwarf them.  Table rows opt out of
  the halo entirely — `border-collapse: collapse` clips box-shadow
  mid-row, and the crisp outline still gives keyboard users an
  unambiguous focus indicator.

Tests: 37 pass / 0 fail unchanged (no new test surface — the toggle
is a render-time slice over existing data; the focus ring is CSS).

Co-authored-by: Cursor <cursoragent@cursor.com>
The fingerprint radar (commit 10) answered "where on the spectrum
does this chip sit?".  This new 03 · Scaling section answers the
follow-up question: "and how much does going wide actually help?"

Only meaningful for chips deployed at >1 fan-out (A100, H100,
H200, 4090D in the current dataset).  Single-variant chips
(consumer cards, Apple silicon) suppress the section entirely so
the page renumbers downward — no empty layout slot, no stretched
chart of one bar.

* Section anatomy: full-width card with a grouped bar chart on the
  left (one cluster per active suite, one bar per chip-count) and
  a vendor-tinted swatch legend on the right.  Below the chart, a
  per-suite breakdown grid surfaces the absolute primary-metric
  value for every chip-count cell — bars are normalised %, the
  breakdown carries the raw tok/s / ms / % so the user sees both.

* Y-axis is "% of this chip's best on this suite" (intra-cluster
  normalisation).  asc-direction primary metrics inverted so the
  lowest-latency variant still reads as 100 %.  Cross-suite mixing
  is intentionally NOT done here — that's what the fingerprint
  radar is for; this section answers the *intra*-suite scaling
  question instead, which is what users go to chip-detail to debug.

* Per-chip-count colour: bars within a cluster fade from accent (×1)
  to a lighter accent (×N) so the gradient reads as "more chips =
  more". Cells with no submission collapse to null bars + an "—"
  placeholder in the breakdown so the gap is explicit rather than
  silently re-aligning the cluster.

* Section numbering renumbers downward when scaling is shown:
  03 scaling → 04 every submission → 05 peers; without scaling the
  numbering stays 03 every submission → 04 peers.  renderSimilarChips
  now takes a sectionNum arg so the eyebrow stays in lock-step.

* data.js: new chipCountScaling(slug) returns
  { chipCounts: [1, 4, 8],
    suites: [{ sid, letter, title, perCount }] }
  where perCount is a Map<chip_count, { value, normalized, run_id }>
  with null-cell guarantees.  Suites with no submission at any
  chip-count are dropped from `suites` so the chart doesn't render
  empty clusters.

* test/chip_count_scaling.test.mjs (5 cases): single-fan-out → empty
  suites; multi-fan-out → intra-cluster normalisation; missing
  cells null/zero (NOT absent — chart relies on full Map); fully
  absent suites dropped; unknown slug returns stable shape without
  throwing.  42 total pass / 0 fail / ~180ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
Path-1 share: each Chart.js render target gets a small "↓ PNG" pill
in its top-right corner that exports the chart to a downloadable
image.  Zero deps, zero library — just `canvas.toBlob` + an
ephemeral `<a download>`.  Covers four chart families:

  * modal Visualize tab — every per-suite chart (suite_A throughput
    bar, suite_C precision matrix, suite_D long-context, suite_E
    scaling, suite_G MoE, sustained over time, etc.)
  * chip-detail · 02 fingerprint radar
  * chip-detail · 03 chip-count scaling bar
  * compare · per-suite head-to-head charts

The PNG is composited onto an opaque background read from
`--bg-elev` before export — Chart.js renders with a transparent
fill, so a naive toDataURL on a dark-mode page produces an image
with white text on a transparent background that looks like
nothing once pasted into Slack / X / GitHub Discussions.

Filenames are self-describing so downloads stack cleanly in the
user's Downloads folder:
  * compare:      `compare-suite-a-throughput-by-concurrency.png`
  * chip-detail:  `<chip-slug>-fingerprint.png`,
                  `<chip-slug>-scaling.png`
  * modal viz:    `<run_id>-<section-title-slug>.png` (the section
                  title is read from the nearest preceding
                  `.viz-section-title` so each `_mkCanvas` caller
                  doesn't need to re-declare its name)

* utils.js: new `downloadCanvasAsPng(canvas, opts)` —
  three-step robustness: composite + opaque fill → toBlob → URL.
  Falls back to `toDataURL` when toBlob isn't available (older
  Safari, jsdom).  Returns Promise<boolean>; never throws.

* components.css: shared `.chart-dl-btn` pill (top-right absolute,
  fades in on hover / :focus-visible, green `is-saved` / red
  `is-failed` flash via flashButtonLabel).  Containers everywhere
  switched to `position: relative` so the pill anchors against the
  canvas surface, not the page.

* modal.js: `_mkCanvas(height, name?)` now ships the download
  button automatically — every existing renderer (8+ call sites)
  picks up the affordance for free.  Click delegate added to the
  modal-shell handler, slug-fies the nearest `.viz-section-title`
  for the filename so adding a new chart needs no naming work.

* views/chip-detail.js: download button HTML threaded into the
  fingerprint + scaling canvas wrappers; bindClicks routes
  `[data-chart-dl]` to a new `_downloadChipChart`.

* views/compare.js: same affordance per head-to-head chart card.
  Click delegate inside the existing compare bindClicks.

* test/share_helpers.test.mjs (4 new cases): null-safety, missing
  2d context fall-through, full toBlob happy path with composite
  ordering assertion (background fill MUST happen before drawImage
  — otherwise the export ignores the bg), toDataURL fallback
  exercised by removing the toBlob method on the fake canvas.
  Plus dom_stub gets `removeChild` / `click` so the helper's
  `document.body.appendChild(a) → a.click() → removeChild` flow
  doesn't throw under node-test.

46 total pass / 0 fail / ~180ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
The PNG download click handler tried to read `slug` from
`bindClicks`'s outer closure, but bindClicks attaches once per
mounted view (via the `__chipDetailClicksAttached` guard) and never
captured slug as a parameter — so clicking Download PNG on the
fingerprint or scaling chart threw `ReferenceError: slug is not
defined` and the button silently did nothing.

Read the active chip slug from `location.hash` at click time
instead, matching the pattern already used by `_chipShareUrl()`.
This also makes the handler immune to stale-closure issues if the
user navigates between chips without remounting the view.

Co-authored-by: Cursor <cursoragent@cursor.com>
Pure-subtraction cleanup after the UI v2 stabilises.  Nothing here
changes runtime behaviour; every removal was verified to have zero
references across `assets/js/**` and `index.html`.

- `data.js`
  · Drop unused `maxBy` import (utils.js export now removed below).
  · Remove 5 zero-importer exports (`ready`, `rows`, `uniqueChips`,
    `rankWithinSuite`, `suiteLeader`).  The internal `_ready` flag
    is still consumed by every public lazy-init guard, so module
    state is unaffected.

- `utils.js`
  · Remove zero-importer exports `fmtPct`, `fmtMs`, `maxBy`, `el`.
    Helpers that nobody calls quietly accumulate divergent expectations
    of how they should behave; deleting them keeps the public surface
    honest.

- `components.css`
  · Remove `.vendor-pill` (+ ::before), `.suite-chip` family,
    `.rel-bar` family, `.badge.flagged`, and `.rank` medals — none of
    them appear in any view or template.  The current DOM uses
    `lb-row-rank` (home.css) and `chip-suite-rank` (chip-detail.css)
    for the medal styling, so the colour mapping survives intact.

- `rankings.js`
  · Drop `data-cols-toggle` from the column-visibility toggle anchor.
    It's a stringly-typed mirror of the `?cols=` value that already
    lives in the same anchor's `href`; navigation goes through the
    native hashchange path, so no click delegate ever read it.

All 46 unit tests still pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
…basket

Two narrow defensive fixes for failure modes that don't bite today
but are one bad row / one stale shared link away from a broken UI.

modal.js — `_renderSuiteC` online_by_precision branch
  Per-format records fed into the "TTFT p99 by QPS across formats"
  line chart had two unguarded deep accesses: `viz.online_by_precision[0].qps_labels`
  and `fp.sla_met.map(...)`.  generate.py emits both today, but a
  hand-edited result.json or a future schema change that drops
  `sla_met` would throw `Cannot read properties of undefined
  (reading 'map')` and white out the entire Visualize tab.  Default
  qps_labels to `[]`, gate `sla_met.map` behind `Array.isArray`, and
  default `ttft_p99` to `[]` for symmetry.

compare.js — basket entries all resolve to null (stale shared link)
  When every `?runs=` id maps back to a `null` row (re-uploaded
  under new ids, or pruned), `chips` was empty but the
  `suiteEmpty` guard required `chips.length > 0`, so we fell into
  the normal-render branch and produced a header-only table plus
  an empty chart grid with no explanation — reads like a broken
  page.  Add a dedicated short-circuit that explains what
  happened, shows a "Clear & start over" button (so the basket
  isn't permanently wedged), and keeps the chip cloud visible so
  the user can immediately rebuild a comparison.  rankings.css
  gets a small `.cmp-basket--empty .cmp-basket-stale` rule for
  the muted italic count label.

All 46 unit tests still pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 15, 2026

Copy link
Copy Markdown

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

The Node ≤20 runtime that CI pins (`node-version: '20'` in
validate_pr.yml) doesn't expose `globalThis.navigator` at all —
that web-standard global only landed in Node 21+.  The existing
test reached for `Object.getOwnPropertyDescriptor(globalThis.navigator, "clipboard")`,
which throws `Cannot convert undefined or null to object` on Node
20.  Locally (Node 25) the test passed and we shipped the
regression in commit 9.

Replace the whole `navigator` slot via defineProperty on
globalThis instead of monkey-patching its `clipboard` property.
On Node ≤20 the slot didn't exist (defineProperty creates it,
finally branch deletes it); on Node ≥21 the existing slot is a
configurable getter (defineProperty replaces it, finally branch
restores the original descriptor).  Behaviour is identical on
either runtime.

Verified locally on Node 25 (46/46) and Node 20 (46/46).

Co-authored-by: Cursor <cursoragent@cursor.com>
@JuhaoLiang1997 JuhaoLiang1997 merged commit 6a038ca into main May 15, 2026
3 checks passed
@JuhaoLiang1997 JuhaoLiang1997 deleted the feature/leaderboard-ui-v2 branch May 15, 2026 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant