Feature/leaderboard UI v2#48
Merged
Merged
Conversation
Splits the legacy 2300-line index.html into ES modules
(assets/css/* + assets/js/*) and introduces a hash router so each view
lives in its own file.
This commit only realises the Home page (suite overview grid + top-3
podiums + recent submissions). Rankings, Compare, Chip detail and
Suites views are scaffolded with informative placeholders so the
router never errors; full implementations land in follow-ups
(commit 2: rankings + compare, 3: chip detail, 4: polish).
Notable bits:
- assets/css/{base,layout,components,home}.css design-token layer.
- assets/js/data.js single source of truth for SUITE_META, primary-
metric direction/scale/unit rules, and row indexing.
- formatPrimary(value, suiteId) centralises display rules so suite
quirks (e.g. Suite E ratios 0-1 rendered as %) only live in data.js.
- assets/js/router.js minimal hash router with compare-basket holder.
- leaderboard/generate.py exposes window.LEADERBOARD_DATA so ES
modules can read the generator output (a classic-script `const` is
not visible to modules).
- Old style.css removed (unused since the current dark-theme UI
inlined styles into index.html).
Co-authored-by: Cursor <cursoragent@cursor.com>
Cross-session handoff drafts are session-local; they shouldn't land in the repo. Match by glob so any feature handoff stays untracked. Co-authored-by: Cursor <cursoragent@cursor.com>
Per first-round Home feedback ("乱", "字小", "bar 喧宾夺主", "97.0 怪"),
rebuild the Home page around small standalone leaderboards instead of a
podium+bar visual.
UI changes
- Suite card now has a filled accent header (letter + title + metric
tag) followed by tagline, top-5 row list, and a centered "View full
ranking →" CTA. Replaces the 3-entry podium with relative bars.
- New .lb-row primitive: rank circle (gold/silver/bronze for #1–#3,
neutral for #4–#5), chip name, vendor color dot + framework, and a
right-aligned mono score with muted unit.
- Recent submissions reuse .lb-row; suite letter takes the rank slot
and a small "Suite X" pill plus date sit in the sub line.
- Vendor identity is now visible at a glance via the colored dot.
Theming
- Auto light/dark via prefers-color-scheme; no manual toggle, no
separate stylesheet. Tokens redeclared in @media block in base.css.
- Badges, primary button, compare pill now use color-mix on the
current accent/good/bad tokens so they adapt to both themes.
Layout polish
- Hero: bigger h1 (2.1rem), more breathing tagline, KPI value bumped
to 1.7rem with uppercased label kerning.
- Main padding loosened to 2rem 1.75rem 3rem; section margin to 3rem
so 7-card grid no longer feels cramped.
- --max-w narrowed 1400 → 1200 for more comfortable line lengths.
Misc
- fmtNum integer fast-path: 97 / 32 / 4 render without ".0".
- formatPrimary unit is split from the number in views so the unit
can render muted next to the value.
Co-authored-by: Cursor <cursoragent@cursor.com>
…rm parchment
Feedback was "颜色和布局有点单调". Premium UI in 2026 isn't about more
chrome — it's restraint plus craft (Vercel, Linear, Anthropic). Reworked
the visual system in that direction.
Palette
- Drop pure black / pure white. Dark uses an off-black (#0b0c10), light
uses a warm parchment (#faf8f3) borrowed from editorial sites.
- Each suite (A–G) now has its own perceptually-distinct category color
(cobalt, violet, teal, terracotta, emerald, rose, indigo). Surfaced as
the suite card's header bar, the letter chip, the #1 row tint, the
CTA color, and the small suite-tag pill on recent rows.
- Color reads consistently across the page — same color tells the same
story everywhere.
Typography
- Pair editorial serif (ui-serif → Iowan / Charter / Georgia) with the
existing sans body. Serif now drives the hero h1 (with italic accent
on the lede), section h2s, KPI numerals, and the score-val numbers.
- Hero h1 bumped to 2.7rem with -0.02em tracking; eyebrow micro-cap
label sits above for editorial pacing.
Layout polish
- Hero: subtle multi-color radial-gradient backdrop (3 stops at low
alpha) clipped to a softly-rounded billboard. Adds depth without
noise.
- KPI strip: thin vertical dividers between values, bracketed by hair-
line top/bottom rules.
- Section headers: editorial eyebrow ("01 · Workloads", "02 · Latest
activity") + serif h2, separated from the body by a faint rule.
- Suite card #1 row visually featured: faint color tint background +
left accent bar + slightly larger score.
- Soft card lift on hover (translate + bigger shadow + suite-colored
border).
- Theme-color meta now follows prefers-color-scheme so the iOS chrome
matches.
/suites placeholder updated to the same visual language so it doesn't
look stranded.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ghter type
Round-3 feedback fixes:
Hero
- "标题挤在左边" — center the entire hero stack (eyebrow, h1, tagline,
KPI strip, CTAs). KPI tiles centered between two hairline rules,
with vertical dividers between values.
- Drop the leading "—" on the hero eyebrow so a centered single line
reads cleanly.
Row sub line
- Framework version now displayed next to the framework name
("vLLM 0.7.3", "SGLang 0.5.6"). Long dev versions (e.g.
"0.19.1rc1.dev339+gedc364896") are normalized via shortVersion()
which trims at "+" and ".dev" plus a 12-char hard cap.
- Submitter handle ("@username") rendered after precision on the
wider row-2 cards and on every recent-submissions row. Row-1
compact cards skip it to keep the 25%-width slot readable.
- New helpers: utils.shortVersion(), utils.submitterHandle().
Typography
- "字体不统一" — reduce serif to display only (h1, h2, hero KPI
numerals). Score values, suite letters, and the suite-stats counts
now use sans-bold with tabular-nums so column alignment is crisp
but the visual stays consistent.
- Suite letter chip dropped from serif → sans-bold for a tighter
pairing with the sans card title beside it.
Layout
- 7 suite cards split into 4+3 on a 12-col grid (row 1: A B C D
span 3 each; row 2: E F G span 4 each). Row 1 cards use a
stacked head (letter + title row → metric pill below right-
aligned) so the title gets the full card width — no more
"Single-chip throu…" ellipsis.
- Row 1 cards show top 4 entries; row 2 cards show top 5 + submitter.
- Mobile <=1100px collapses to 2-col, <=640px to single column.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ction
Round-4 feedback fixes:
Hero
- Two-line title: "AccelMark Leaderboard" + "A Reproducible
Multi-Regime AI Accelerator Benchmark" subtitle. The em-dash
formulation is gone.
- All sans, no serif anywhere ("字体统一用现在的结果的字体").
Background
- The hero radial-gradient backdrop felt boxed-in because it was
clipped to a rounded rectangle. Promoted it to body::before with
absolute positioning, a 720-px height, and a linear mask that
fades the bottom to transparent — the wash now blends into the
page bg with no visible boundary and scrolls away with the hero.
Suite cards
- Uniform 3-col grid again (3 / 3 / 1 instead of the 4 + 3 split).
All cards same size, all show top 6 entries.
- Card header now carries the tagline (moved out of the body) plus
a new meta line with the suite's model, baseline precision, total
results, and chip-config count — e.g.
"Llama 3 · 8B · BF16 baseline · 18 results · 17 chips".
- Suite title bumped to 1.15rem 700.
- Long titles ("Quantization efficiency") now wrap instead of
ellipsing — the head row uses flex 1 1 auto.
Data helpers
- data.suiteFacts(id) returns the mode model, mode precision, and
per-suite counts.
- data.vendorBreakdown() returns one entry per vendor with chips,
submissions, suite letters, and the vendor's best chip in its
most-populated suite.
- utils.shortModel() abbreviates HF-style names ("Meta-Llama-3-8B-
Instruct" → "Llama 3 · 8B", "Mixtral-8x7B-Instruct-v0.1" → keeps
the × variant).
New 02 · Coverage section
- "Submissions by vendor" — auto-fit grid of vendor cards. Each
card shows the vendor's color stripe and dot, chip + submission
counts, the headline top chip with its score, and small letter
pills tinted by suite color showing which suites the vendor
appears in. Pairs with section 01 to give a second lens on the
same dataset.
Typography
- Drop --font-serif token entirely. Sans only for h1/h2/KPI/score/
card title/suite letter. Mono is retained behind the .mono utility
but not used anywhere on home.
/suites updated to the same suite-card shape so the two pages share
visual language.
Co-authored-by: Cursor <cursoragent@cursor.com>
…bels - Home page restyled: 8-row leaderboard cards, "Chips on the leaderboard" cloud (tile size = submission count, color = vendor) replacing the old per-vendor grid, and a new "Submit your result" contributor section. - byline on each lb-row now carries framework + version + submitter handle for richer attribution at a glance. - Suite headers use hardcoded model/protocol tokens (not data-derived) so the page does not jitter on submission changes. - Metric labels rolled out to formal names across data.js (tokens/sec, queries/sec, 2x/4x scaling efficiency, quality efficiency, sustained throughput) — these flow into both the home view and any future view. - Hero subtitle pinned with white-space: nowrap and responsive font ramp so it stops awkwardly wrapping on common widths. - Em dashes in number/date fallbacks replaced with hyphens to play nicer with the editorial type and copy-paste. - .screenshots/ ignored (local-only design QA captures). Co-authored-by: Cursor <cursoragent@cursor.com>
…per-suite specs Replaces the placeholder suites view with a long-form explainer that makes the per-suite design argument concrete and scannable. Page structure - Hero: "Workload Suites" + a single, full-width subtitle (no eyebrow, no third tagline line). - 01 Methodology: 2-column layout. Prose on the left explains why a single score collapses information across heterogeneous regimes; an inline SVG roofline diagram on the right plots each suite at its bottleneck region (memory-bound vs compute-bound), with knee marker and per-suite dots colored by suite. - 02 Scenarios catalog (new): the seven protocols (accuracy / offline / online / interactive / sustained / speculative / burst) factored out into a single catalog. Each card has an SVG icon, role tagline, prose description, mini-spec table (metric, direction, threshold, cost), and clickable "Used by" suite-letter chips. - 03 Specifications: per-suite cards now carry a description paragraph paired with a "Concrete finding" sidebar, a spec strip, compact protocol pills, and a horizontal mini-card grid of current leaders. - 04 Datasets and 05 Propose-a-suite CTA round out the page. Interactions - Clicking a "Used by" chip in a scenario card scrolls smoothly to the matching suite section and flashes the card briefly. Uses event delegation with an attach-once guard to survive SPA re-renders. - scroll-margin-top on suite cards clears the sticky top nav so anchor jumps don't tuck the target underneath. Other - index.html loads the new suites.css alongside the existing layout/components/home stylesheets. Co-authored-by: Cursor <cursoragent@cursor.com>
…l modal Rankings, Compare and Chip-detail get rebuilt on top of the v1 scaffold; single-run detail moves out of a dedicated page into a global modal so the leaderboard reads as one continuous browse → compare → drill flow. Views - rankings: suite pills + multi-select vendor/precision/framework facet pills, sticky toolbar, sortable table, URL-synced state, compare basket keyed by run_id, in-basket banner, full chip column linkable to chip-detail, soft `?chip=<slug>` filter with focus banner. - compare: basket strip + per-suite resolver that surfaces the same hardware in any suite, per-metric table with winner stars, head-to- head Chart.js panels per suite (grouped bars, scaling lines, precision matrix, latency horizontal bars), chip-cloud quick-add (empty state + populated "Add another chip" affordance), empty-suite banner with "Try Suite X instead" suggestions, sharable `?runs=<rid,…>` seed. - chip-detail: full rewrite — hero with vendor/memory eyebrow + facts + Compare/Browse CTAs, 01 per-suite KPI grid with #N/M rank badges (gold/silver/bronze top-3, unique-chip rank), 02 every-submission table, 03 Compare-with-similar-chips peer grid using shared-suite overlap with same-vendor tie-break. - home: chip cloud + lb-rows now use nested anchors so chip names route to /chip/<slug> while the rest of the row pops the modal. Run-detail modal - New assets/js/modal.js: global singleton overlay with Details / Visualize / Implementation tabs, per-suite Chart.js renderings, deep link via `?run=<rid>&tab=viz`, history.replaceState so router never re-dispatches mid-modal, nested-anchor escape so inner <a> tags keep native navigation, stale-rid auto-strip. Polish - generate.py hashes leaderboard.js after writing and rewrites the <script src="leaderboard.js?v=<sha8>"> tag in index.html so CDN / browser cache invalidates on every data refresh. - chipHref defends against missing slugs, chip-detail not-found mirrors .rk-empty / .cmp-empty rather than the loading state. - Mobile audit ≤480/420: topnav drops wordmark + GH link, hero h1 shrinks and wraps, CTA stacks vertically, chip-tile collapses to single compact size with text wrapping. Data - new helpers: rankChipInSuite, similarChipsTo, bestRowForRunInSuite, representativeRunForChip, rowByRunId, chipCloudData; bestPerSuiteForChip and bestPerChipForSuite formalised; `_chip_slug` / `_chip_label` attached at init so views never recompute. Removed - Top nav compare counter pill (duplicated the rankings toolbar status per user feedback). Co-authored-by: Cursor <cursoragent@cursor.com>
…s + tests
Three independent improvements shipped together — designed in
.handoff-leaderboard-ui-v2.md as commits 5/6/7.
* Chip-detail page is now per chip-model, not per (chip × chip_count)
fan-out. 4090D, 4090D ×4, 4090D ×8 share one page; the hero shows
the bare chip name, the per-suite KPI cards add a "×N" badge when
the best score came from a multi-card deployment, and the every-
submission table grows a Chips column grouped by fan-out. utils:
chipSlug() drops the -x<N> suffix; new normalizeChipSlug() lets the
router auto-redirect legacy "/chip/<slug>-x<N>" URLs so old shared
links keep working. Real data: 32 chip-detail pages collapse to 21.
* Vendor colour binding is 100% data-driven. Single VENDOR_COLORS
map in data.js plus an injectVendorStyles() that writes one
[data-vendor="X"]{--vendor-color:#hex} rule per vendor seen in the
loaded dataset into a singleton <style> tag at init() time. Unknown
vendors get a deterministic fallback colour from a 9-shade palette.
Deletes ~70 lines of per-vendor CSS spread across 6 files
(base/components/rankings/chip-detail/home/suites). Adding a new
vendor with brand colour is now one line in JS instead of seven CSS
blocks; adding one without a brand colour is zero lines.
* Rankings facet pills (vendor / precision / framework / clear-all /
empty-state-clear) switched from <button> to <a href> with a
precomputed toggled-URL. Middle / Cmd / Ctrl / Shift-click open in
a new tab via native anchor semantics; plain left-click still
preventDefault()s through the SPA toggle. Suite pill stays a button
because a suite switch resets sort + filters, so "open same suite
with different filters in a new tab" doesn't make sense.
* Chip-detail KPI cards expose a hover hint footer ("Open run · ⌘+click
for all in suite") plus a full title attribute so the dual click
affordance is discoverable for keyboard / screen-reader users too.
Footer collapses on touch widths (<=640px) since modifier-click is
moot.
* New node:test coverage with zero npm deps:
- leaderboard/site/test/dom_stub.mjs — minimal DOM + Chart.js
stand-in (head, getElementById, FakeEl id getter/setter).
- modal_viz.test.mjs — 11 cases: happy-path full suite_A row,
no-data placeholder, unknown viz.type fallback, partial blocks
for every per-suite renderer, vizHasAnyData boundary.
- chip_slug.test.mjs — 6 cases pinning the chipSlug + normalize
contract so a refactor can't accidentally re-fork chip-detail.
- vendor_color.test.mjs — 5 cases: brand hits, deterministic
fallback, null safety, distinct shades for unknown vendors,
VENDOR_ORDER stays in sync with VENDOR_COLORS keys.
Total 22 pass / 0 fail / ~110ms. modal.js exports a deliberately
underscored `_test` ESM hatch (vizHasAnyData / renderViz /
destroyCharts / resetColorCache) so tests can drive the panel
without booting the full modal.
* CI: validate_pr.yml gains paths trigger leaderboard/site/** and a
new frontend-tests job (actions/setup-node@v4 → node --test
leaderboard/site/test/*.test.mjs).
Co-authored-by: Cursor <cursoragent@cursor.com>
… ribbon
Three small backlog items shipped together because they touch
disjoint views; each is a single-purpose UX improvement.
* Suites 01 Methodology: wrap the second + third paragraph in a
<details class="why-prose-more"> so mobile readers see the
opening argument plus an opt-in expander instead of three full
prose paragraphs above the fold. Mount-time matchMedia decides
desktop default-open / mobile default-closed (we deliberately
don't react to viewport changes — toggling open mid-read would
yank the user's scroll). Summary chrome strips the native
triangle marker for a pill that matches the eyebrow palette.
* Compare basket: add a Copy share link button next to Clear all.
bindClicks intercepts data-basket-share, _shareUrlForBasket
recomposes the canonical URL from the in-memory basket + active
suite (the seed ?runs= is stripped from the address bar after
consumption, so location.href alone wouldn't restore the basket).
navigator.clipboard.writeText is the happy path; falls back to a
hidden <textarea> + execCommand("copy") on insecure contexts and
to window.prompt as a last resort, so the button never silently
no-ops. Success / failure each flash the label for 1.6s / 3.5s
with is-copied / is-copy-failed accent palettes.
* Home hero: add a "this week" KPI tile + accent ribbon below the
KPI strip showing recentSince(7d) submissions, linking to
#/rankings?suite=suite_A&sort=date:desc. When count is zero the
whole ribbon + KPI are suppressed so quiet weeks don't read as a
regression. Pulsing accent dot mirrors a status-LED convention;
prefers-reduced-motion turns it static.
* data.js: new recentSince(days, now=new Date()). Compares
lexicographically against a UTC YYYY-MM-DD cutoff (rows are
calendar-dates with no time component; UTC avoids midnight
rollover ambiguity). Returns 0 for non-positive / NaN inputs
rather than throwing, so the hero still renders if the data
shape ever drifts.
* Tests: new test/recent_since.test.mjs (5 cases, 27 total pass /
0 fail) covering 7d / 1d / 365d / negative+NaN / default-now and
the no-date-row skip. Re-uses dom_stub.mjs with a hand-built
fixture so the suite stays npm-free.
Co-authored-by: Cursor <cursoragent@cursor.com>
…board a11y
Two related improvements that touch every view:
* Hoist the compare basket's share-link logic into reusable utils:
copyToClipboard(text) and flashButtonLabel(btn, msg, opts). Both
are pure helpers with full progressive fallback (clipboard API →
textarea+execCommand → window.prompt) and never throw, so views
stay one-liners. Add a Copy link CTA to the chip-detail hero
that uses them; rewrite compare.js to consume the same helpers.
Promote success / failure flash chrome onto a shared `.copy-btn`
class so any future share button picks up the green / red accents
without per-view CSS.
* Keyboard a11y sweep across every clickable surface:
- Global :focus-visible ring (2px accent + offset, inherits radius)
on the body so tab navigation always shows where focus is.
- lb-row trigger divs (home suite cards + recent activity) get
role="button" + tabindex="0" + aria-label; modal.js gains a
keydown delegate that turns Enter / Space on any [data-open-run]
surface into the same openModal call a click would make. Same
nested-anchor escape as the click delegate so focus on an inner
chip-name link still navigates to the chip page.
- Rankings + chip-detail table rows get tabindex="0" + aria-label;
we keep the native <tr> role so screen-reader column-header
pairing still works (role="button" would clobber it).
- Compare chip-cloud tiles get role="button" + aria-pressed mirroring
the in-basket state, with an extended title that documents both
the toggle and the modifier-click navigation paths.
- Modal tabs / panels are now linked via id + aria-controls +
aria-labelledby, with a focusable panel container so keyboard
users land in the active panel after switching tabs.
- dom_stub: querySelector / querySelectorAll stubs on FakeEl so
helper smoke tests can compose without monkey-patching jsdom.
Tests: new test/share_helpers.test.mjs (5 cases) covers the no-clipboard
fallback, navigator.clipboard.writeText happy path, null safety, and the
flashButtonLabel re-entrancy invariant (rapid clicks must not lock the
button into the flash label). 32 total pass / 0 fail / ~150ms.
Co-authored-by: Cursor <cursoragent@cursor.com>
…elper
The KPI grid told users which suites a chip submitted to and what
its primary metric was; what it never showed was *where the chip
sits across the spectrum* — is this a bandwidth-bound winner that
loses on long context, a balanced mid-stack chip, or a one-suite
specialist? The radar makes that question answerable in one
glance: each axis is a suite letter, each tick is % of the global
best primary metric for that suite, asc-direction metrics inverted
so the leader still reads as 100 %.
Section anatomy: full-width card with a square radar canvas on the
left, a sortable per-suite legend (suite letter chip + name + raw
value + normalised %) on the right. Suites the chip never
submitted to come back as missing entries — the polygon collapses
to centre on those axes, the legend annotates them as "no data" /
"—" so the gap is explicit rather than implied.
The radar also surfaces a dashed reference ring at 100 % so users
can read each axis as "how close is this chip to the best chip
ever submitted on this workload" without doing the division
themselves. Chip's brand colour fills the polygon — same colour
the rest of the site uses for the chip's vendor.
Mounted post-render via setTimeout(0) so Chart.js sees an attached
canvas; previous chart instance is destroyed on every re-mount so
quick chip switching doesn't leak canvases. Skipped entirely when
the chip has data on fewer than 2 suites (a single-point polygon
adds no signal), and the section's structured legend stays the
fallback when window.Chart isn't loaded.
Renumber sections: 02 fingerprint, 03 every submission, 04 peers.
* data.js: new `suiteFingerprint(slug)` returns
Map<suiteId, { value, normalized, best, missing }> for every
suite in SUITE_ORDER. Reuses rowsForSuite for global-best lookup.
Asc-direction metrics get `bestValue / value` so TTFT/latency
metrics rank correctly on the radar.
* views/chip-detail.js: new `renderFingerprintSection(slug, sample)`
+ `_mountFingerprintChart(el, slug, sample)`; renumbered the
every-submission section eyebrow to 03 and peers to 04.
* css/chip-detail.css: full styling for `.chip-fp-section`,
`.chip-fp-wrap` (responsive 2-col on desktop, stacked on mobile),
`.chip-fp-canvas` (square aspect-ratio wrapper for Chart.js),
and `.chip-fp-legend` (suite-tinted letter chips + per-row tnum).
* test/suite_fingerprint.test.mjs (5 cases): leader normalised to
1.0, mid-pack to 0.5, missing suites flagged + zero, unknown
slugs return a complete Map without throwing, cross-suite
ordering inversions surface as expected. 37 total pass / 0 fail
/ ~230ms.
Co-authored-by: Cursor <cursoragent@cursor.com>
Add a section between "Leaderboard tiers" and "Using local or air-gapped models" so contributors know which fields the v2 frontend reads (and which are optional polish vs. required schema). README intentionally stays as-is. * Chip identity vs. chip count — chip-detail aggregates every fan-out (×1/×4/×8) onto one page keyed on chip model alone, so per-fan-out result.json files don't need invented chip names. Old `…-x<N>` URLs auto-redirect. * Vendor colours — explain that `VENDOR_COLORS` in data.js is the one source of truth: a one-line PR pins both the brand colour and the vendor's position in `VENDOR_ORDER`; vendors not in the map get a deterministic fallback from a 9-entry palette so a new result is never colour-less on first appearance. * Optional viz fields — table of `viz.type` shapes the modal + compare charts understand today (decode / multichip / quant / longctx / scaling) with the required keys for each. Stresses that viz is fully optional — runners that don't emit it still get a clean Details / Implementation modal. * Submitter handle — clarify that `meta.submitted_by` becomes `@<handle>` next to every result (utils.js `submitterHandle`). Co-authored-by: Cursor <cursoragent@cursor.com>
Two complementary polish items, both small and table-focused.
* Rankings: hide metric columns whose values are entirely null in
the current post-filter slice. suite_C's quant_quality and
suite_E's scaling_efficiency are null on most chips, so the
default rendering used to show four "—" columns at every breakpoint
— visual noise that read as "this chip is broken" rather than
"this metric is suite-specific". ?cols=auto (default) elides
those columns; ?cols=all forces every declared column. Primary
and active-sort columns are always pinned visible (hiding either
would break the page logic).
The toggle lives in the existing .rk-status row as a pill anchor
("3 columns hidden — show all" / "Hide empty columns") so middle-
click + Cmd-click open the alternate state in a new tab — same
affordance every other rankings filter uses. rebuildHash drops
the ?cols= param when toggling back to auto so URLs stay tidy.
* Focus ring polish: extend the global :focus-visible from a single
2px outline to a two-layer halo — the crisp 2px accent outline plus
a 4px translucent box-shadow ring underneath using
color-mix(--accent 28%). The original ring got swallowed on the
hero radial gradient and on suite-color-tinted rows; the halo
traces the element's border-radius via box-shadow, so it survives
busy backgrounds without per-element tuning.
Pills (rk-suite-pill / rk-facet-pill / scn-suite-letter) get a
smaller 3px halo so it doesn't dwarf them. Table rows opt out of
the halo entirely — `border-collapse: collapse` clips box-shadow
mid-row, and the crisp outline still gives keyboard users an
unambiguous focus indicator.
Tests: 37 pass / 0 fail unchanged (no new test surface — the toggle
is a render-time slice over existing data; the focus ring is CSS).
Co-authored-by: Cursor <cursoragent@cursor.com>
The fingerprint radar (commit 10) answered "where on the spectrum
does this chip sit?". This new 03 · Scaling section answers the
follow-up question: "and how much does going wide actually help?"
Only meaningful for chips deployed at >1 fan-out (A100, H100,
H200, 4090D in the current dataset). Single-variant chips
(consumer cards, Apple silicon) suppress the section entirely so
the page renumbers downward — no empty layout slot, no stretched
chart of one bar.
* Section anatomy: full-width card with a grouped bar chart on the
left (one cluster per active suite, one bar per chip-count) and
a vendor-tinted swatch legend on the right. Below the chart, a
per-suite breakdown grid surfaces the absolute primary-metric
value for every chip-count cell — bars are normalised %, the
breakdown carries the raw tok/s / ms / % so the user sees both.
* Y-axis is "% of this chip's best on this suite" (intra-cluster
normalisation). asc-direction primary metrics inverted so the
lowest-latency variant still reads as 100 %. Cross-suite mixing
is intentionally NOT done here — that's what the fingerprint
radar is for; this section answers the *intra*-suite scaling
question instead, which is what users go to chip-detail to debug.
* Per-chip-count colour: bars within a cluster fade from accent (×1)
to a lighter accent (×N) so the gradient reads as "more chips =
more". Cells with no submission collapse to null bars + an "—"
placeholder in the breakdown so the gap is explicit rather than
silently re-aligning the cluster.
* Section numbering renumbers downward when scaling is shown:
03 scaling → 04 every submission → 05 peers; without scaling the
numbering stays 03 every submission → 04 peers. renderSimilarChips
now takes a sectionNum arg so the eyebrow stays in lock-step.
* data.js: new chipCountScaling(slug) returns
{ chipCounts: [1, 4, 8],
suites: [{ sid, letter, title, perCount }] }
where perCount is a Map<chip_count, { value, normalized, run_id }>
with null-cell guarantees. Suites with no submission at any
chip-count are dropped from `suites` so the chart doesn't render
empty clusters.
* test/chip_count_scaling.test.mjs (5 cases): single-fan-out → empty
suites; multi-fan-out → intra-cluster normalisation; missing
cells null/zero (NOT absent — chart relies on full Map); fully
absent suites dropped; unknown slug returns stable shape without
throwing. 42 total pass / 0 fail / ~180ms.
Co-authored-by: Cursor <cursoragent@cursor.com>
Path-1 share: each Chart.js render target gets a small "↓ PNG" pill
in its top-right corner that exports the chart to a downloadable
image. Zero deps, zero library — just `canvas.toBlob` + an
ephemeral `<a download>`. Covers four chart families:
* modal Visualize tab — every per-suite chart (suite_A throughput
bar, suite_C precision matrix, suite_D long-context, suite_E
scaling, suite_G MoE, sustained over time, etc.)
* chip-detail · 02 fingerprint radar
* chip-detail · 03 chip-count scaling bar
* compare · per-suite head-to-head charts
The PNG is composited onto an opaque background read from
`--bg-elev` before export — Chart.js renders with a transparent
fill, so a naive toDataURL on a dark-mode page produces an image
with white text on a transparent background that looks like
nothing once pasted into Slack / X / GitHub Discussions.
Filenames are self-describing so downloads stack cleanly in the
user's Downloads folder:
* compare: `compare-suite-a-throughput-by-concurrency.png`
* chip-detail: `<chip-slug>-fingerprint.png`,
`<chip-slug>-scaling.png`
* modal viz: `<run_id>-<section-title-slug>.png` (the section
title is read from the nearest preceding
`.viz-section-title` so each `_mkCanvas` caller
doesn't need to re-declare its name)
* utils.js: new `downloadCanvasAsPng(canvas, opts)` —
three-step robustness: composite + opaque fill → toBlob → URL.
Falls back to `toDataURL` when toBlob isn't available (older
Safari, jsdom). Returns Promise<boolean>; never throws.
* components.css: shared `.chart-dl-btn` pill (top-right absolute,
fades in on hover / :focus-visible, green `is-saved` / red
`is-failed` flash via flashButtonLabel). Containers everywhere
switched to `position: relative` so the pill anchors against the
canvas surface, not the page.
* modal.js: `_mkCanvas(height, name?)` now ships the download
button automatically — every existing renderer (8+ call sites)
picks up the affordance for free. Click delegate added to the
modal-shell handler, slug-fies the nearest `.viz-section-title`
for the filename so adding a new chart needs no naming work.
* views/chip-detail.js: download button HTML threaded into the
fingerprint + scaling canvas wrappers; bindClicks routes
`[data-chart-dl]` to a new `_downloadChipChart`.
* views/compare.js: same affordance per head-to-head chart card.
Click delegate inside the existing compare bindClicks.
* test/share_helpers.test.mjs (4 new cases): null-safety, missing
2d context fall-through, full toBlob happy path with composite
ordering assertion (background fill MUST happen before drawImage
— otherwise the export ignores the bg), toDataURL fallback
exercised by removing the toBlob method on the fake canvas.
Plus dom_stub gets `removeChild` / `click` so the helper's
`document.body.appendChild(a) → a.click() → removeChild` flow
doesn't throw under node-test.
46 total pass / 0 fail / ~180ms.
Co-authored-by: Cursor <cursoragent@cursor.com>
The PNG download click handler tried to read `slug` from `bindClicks`'s outer closure, but bindClicks attaches once per mounted view (via the `__chipDetailClicksAttached` guard) and never captured slug as a parameter — so clicking Download PNG on the fingerprint or scaling chart threw `ReferenceError: slug is not defined` and the button silently did nothing. Read the active chip slug from `location.hash` at click time instead, matching the pattern already used by `_chipShareUrl()`. This also makes the handler immune to stale-closure issues if the user navigates between chips without remounting the view. Co-authored-by: Cursor <cursoragent@cursor.com>
Pure-subtraction cleanup after the UI v2 stabilises. Nothing here
changes runtime behaviour; every removal was verified to have zero
references across `assets/js/**` and `index.html`.
- `data.js`
· Drop unused `maxBy` import (utils.js export now removed below).
· Remove 5 zero-importer exports (`ready`, `rows`, `uniqueChips`,
`rankWithinSuite`, `suiteLeader`). The internal `_ready` flag
is still consumed by every public lazy-init guard, so module
state is unaffected.
- `utils.js`
· Remove zero-importer exports `fmtPct`, `fmtMs`, `maxBy`, `el`.
Helpers that nobody calls quietly accumulate divergent expectations
of how they should behave; deleting them keeps the public surface
honest.
- `components.css`
· Remove `.vendor-pill` (+ ::before), `.suite-chip` family,
`.rel-bar` family, `.badge.flagged`, and `.rank` medals — none of
them appear in any view or template. The current DOM uses
`lb-row-rank` (home.css) and `chip-suite-rank` (chip-detail.css)
for the medal styling, so the colour mapping survives intact.
- `rankings.js`
· Drop `data-cols-toggle` from the column-visibility toggle anchor.
It's a stringly-typed mirror of the `?cols=` value that already
lives in the same anchor's `href`; navigation goes through the
native hashchange path, so no click delegate ever read it.
All 46 unit tests still pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
…basket Two narrow defensive fixes for failure modes that don't bite today but are one bad row / one stale shared link away from a broken UI. modal.js — `_renderSuiteC` online_by_precision branch Per-format records fed into the "TTFT p99 by QPS across formats" line chart had two unguarded deep accesses: `viz.online_by_precision[0].qps_labels` and `fp.sla_met.map(...)`. generate.py emits both today, but a hand-edited result.json or a future schema change that drops `sla_met` would throw `Cannot read properties of undefined (reading 'map')` and white out the entire Visualize tab. Default qps_labels to `[]`, gate `sla_met.map` behind `Array.isArray`, and default `ttft_p99` to `[]` for symmetry. compare.js — basket entries all resolve to null (stale shared link) When every `?runs=` id maps back to a `null` row (re-uploaded under new ids, or pruned), `chips` was empty but the `suiteEmpty` guard required `chips.length > 0`, so we fell into the normal-render branch and produced a header-only table plus an empty chart grid with no explanation — reads like a broken page. Add a dedicated short-circuit that explains what happened, shows a "Clear & start over" button (so the basket isn't permanently wedged), and keeps the chip cloud visible so the user can immediately rebuild a comparison. rankings.css gets a small `.cmp-basket--empty .cmp-basket-stale` rule for the muted italic count label. All 46 unit tests still pass. Co-authored-by: Cursor <cursoragent@cursor.com>
✅ AccelMark Validation: All submissions validSee the workflow run for details. |
The Node ≤20 runtime that CI pins (`node-version: '20'` in validate_pr.yml) doesn't expose `globalThis.navigator` at all — that web-standard global only landed in Node 21+. The existing test reached for `Object.getOwnPropertyDescriptor(globalThis.navigator, "clipboard")`, which throws `Cannot convert undefined or null to object` on Node 20. Locally (Node 25) the test passed and we shipped the regression in commit 9. Replace the whole `navigator` slot via defineProperty on globalThis instead of monkey-patching its `clipboard` property. On Node ≤20 the slot didn't exist (defineProperty creates it, finally branch deletes it); on Node ≥21 the existing slot is a configurable getter (defineProperty replaces it, finally branch restores the original descriptor). Behaviour is identical on either runtime. Verified locally on Node 25 (46/46) and Node 20 (46/46). Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Type of change
Testing
# Commands used to verifyChecklist
result.jsonfiles (or I have explained the migration path)BenchmarkRunner, produces validresult.json, includes a reference resultvalidate_submission.pyupdated and all existing results still validateleaderboard/generate.pyproduces correct output on existing resultsRelated issues