Feature/leaderboard UI v2 by JuhaoLiang1997 · Pull Request #48 · FreedomIntelligence/AccelMark

JuhaoLiang1997 · 2026-05-15T19:16:40Z

Summary

Type of change

Testing

# Commands used to verify

Checklist

I have read CONTRIBUTING.md
My change does not break existing result.json files (or I have explained the migration path)
If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
If changing the schema: validate_submission.py updated and all existing results still validate
If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
I have updated relevant documentation

Related issues

Splits the legacy 2300-line index.html into ES modules (assets/css/* + assets/js/*) and introduces a hash router so each view lives in its own file. This commit only implements the Home page (suite overview grid + top-3 podiums + recent submissions). The Rankings, Compare, Chip detail and Suites views are scaffolded with informative placeholders so the router never errors; full implementations land in follow-ups (commit 2: rankings + compare, 3: chip detail, 4: polish).

Notable bits:
- assets/css/{base,layout,components,home}.css: design-token layer.
- assets/js/data.js: single source of truth for SUITE_META, primary-metric direction/scale/unit rules, and row indexing.
- formatPrimary(value, suiteId) centralises display rules so suite quirks (e.g. Suite E ratios 0-1 rendered as %) only live in data.js.
- assets/js/router.js: minimal hash router with compare-basket holder.
- leaderboard/generate.py exposes window.LEADERBOARD_DATA so ES modules can read the generator output (a top-level `const` in a classic script does not become a property of `window`, so modules reading `window.*` would not see it).
- Old style.css removed (unused since the current dark-theme UI inlined styles into index.html).

Co-authored-by: Cursor <cursoragent@cursor.com>

Cross-session handoff drafts are session-local; they shouldn't land in the repo. Match by glob so any feature handoff stays untracked.

Co-authored-by: Cursor <cursoragent@cursor.com>

Per first-round Home feedback ("cluttered", "text too small", "the bars upstage the content", "97.0 looks odd"), rebuild the Home page around small standalone leaderboards instead of a podium+bar visual.

UI changes
- Suite card now has a filled accent header (letter + title + metric tag) followed by tagline, top-5 row list, and a centered "View full ranking →" CTA. This replaces the old 3-entry podium and its relative bars.
- New .lb-row primitive: rank circle (gold/silver/bronze for #1–#3, neutral for #4–#5), chip name, vendor color dot + framework, and a right-aligned mono score with muted unit.
- Recent submissions reuse .lb-row; suite letter takes the rank slot and a small "Suite X" pill plus date sit in the sub line.
- Vendor identity is now visible at a glance via the colored dot.

Theming
- Auto light/dark via prefers-color-scheme; no manual toggle, no separate stylesheet. Tokens redeclared in an @media block in base.css.
- Badges, primary button, compare pill now use color-mix on the current accent/good/bad tokens so they adapt to both themes.

Layout polish
- Hero: bigger h1 (2.1rem), more breathing room around the tagline, KPI value bumped to 1.7rem with uppercased label kerning.
- Main padding loosened to 2rem 1.75rem 3rem; section margin to 3rem so the 7-card grid no longer feels cramped.
- --max-w narrowed 1400 → 1200 for more comfortable line lengths.

Misc
- fmtNum integer fast-path: 97 / 32 / 4 render without ".0".
- formatPrimary unit is split from the number in views so the unit can render muted next to the value.

Co-authored-by: Cursor <cursoragent@cursor.com>
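The integer fast-path mentioned under Misc can be illustrated with a minimal sketch. The real fmtNum in utils.js is not shown in this PR text, so the `digits` parameter and the "-" fallback here are assumptions made for illustration only.

```javascript
// Hypothetical sketch of fmtNum with an integer fast-path.
// Assumptions: `digits` parameter and "-" fallback are not from the source.
function fmtNum(value, digits = 1) {
  // Anything that is not a finite number gets a placeholder.
  if (typeof value !== "number" || !Number.isFinite(value)) return "-";
  // Fast path: whole numbers render without a trailing ".0".
  if (Number.isInteger(value)) return String(value);
  // Everything else keeps a fixed number of decimals.
  return value.toFixed(digits);
}

console.log(fmtNum(97));    // "97"
console.log(fmtNum(97.46)); // "97.5"
```

Keeping the check inside the formatter means every view gets the same behaviour without each caller stripping ".0" on its own.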

…rm parchment

Feedback was "the colors and layout are a bit monotonous". Premium UI in 2026 isn't about more chrome; it's restraint plus craft (Vercel, Linear, Anthropic). Reworked the visual system in that direction.

Palette
- Drop pure black / pure white. Dark uses an off-black (#0b0c10), light uses a warm parchment (#faf8f3) borrowed from editorial sites.
- Each suite (A–G) now has its own perceptually-distinct category color (cobalt, violet, teal, terracotta, emerald, rose, indigo). Surfaced as the suite card's header bar, the letter chip, the #1 row tint, the CTA color, and the small suite-tag pill on recent rows.
- Color reads consistently across the page: the same color tells the same story everywhere.

Typography
- Pair editorial serif (ui-serif → Iowan / Charter / Georgia) with the existing sans body. Serif now drives the hero h1 (with italic accent on the lede), section h2s, KPI numerals, and the score-val numbers.
- Hero h1 bumped to 2.7rem with -0.02em tracking; eyebrow micro-cap label sits above for editorial pacing.

Layout polish
- Hero: subtle multi-color radial-gradient backdrop (3 stops at low alpha) clipped to a softly-rounded billboard. Adds depth without noise.
- KPI strip: thin vertical dividers between values, bracketed by hairline top/bottom rules.
- Section headers: editorial eyebrow ("01 · Workloads", "02 · Latest activity") + serif h2, separated from the body by a faint rule.
- Suite card #1 row visually featured: faint color tint background + left accent bar + slightly larger score.
- Soft card lift on hover (translate + bigger shadow + suite-colored border).
- Theme-color meta now follows prefers-color-scheme so the iOS chrome matches.

/suites placeholder updated to the same visual language so it doesn't look stranded.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ghter type

Round-3 feedback fixes:

Hero
- "The title is squeezed to the left": center the entire hero stack (eyebrow, h1, tagline, KPI strip, CTAs). KPI tiles centered between two hairline rules, with vertical dividers between values.
- Drop the leading "—" on the hero eyebrow so a centered single line reads cleanly.

Row sub line
- Framework version now displayed next to the framework name ("vLLM 0.7.3", "SGLang 0.5.6"). Long dev versions (e.g. "0.19.1rc1.dev339+gedc364896") are normalized via shortVersion(), which trims at "+" and ".dev" and applies a 12-char hard cap.
- Submitter handle ("@username") rendered after precision on the wider row-2 cards and on every recent-submissions row. Row-1 compact cards skip it to keep the 25%-width slot readable.
- New helpers: utils.shortVersion(), utils.submitterHandle().

Typography
- "The fonts are inconsistent": reduce serif to display only (h1, h2, hero KPI numerals). Score values, suite letters, and the suite-stats counts now use sans-bold with tabular-nums so column alignment is crisp but the visual stays consistent.
- Suite letter chip dropped from serif → sans-bold for a tighter pairing with the sans card title beside it.

Layout
- 7 suite cards split into 4+3 on a 12-col grid (row 1: A B C D span 3 each; row 2: E F G span 4 each). Row 1 cards use a stacked head (letter + title row → metric pill below, right-aligned) so the title gets the full card width; no more "Single-chip throu…" ellipsis.
- Row 1 cards show top 4 entries; row 2 cards show top 5 + submitter.
- Mobile <=1100px collapses to 2-col, <=640px to single column.

Co-authored-by: Cursor <cursoragent@cursor.com>
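The version-trimming rule described under "Row sub line" (cut at "+", cut at ".dev", then cap at 12 characters) could look roughly like the sketch below. Only the rule itself comes from the commit message; the `maxLen` parameter and the empty-string fallback for non-string input are assumptions.

```javascript
// Sketch of the shortVersion() rule: trim local and dev suffixes, then hard-cap.
// Assumptions: `maxLen` parameter and "" for non-string input.
function shortVersion(version, maxLen = 12) {
  if (typeof version !== "string") return "";
  let v = version.trim();
  // Drop the local build tag, e.g. "+gedc364896".
  const plus = v.indexOf("+");
  if (plus !== -1) v = v.slice(0, plus);
  // Drop the dev segment, e.g. ".dev339".
  const dev = v.indexOf(".dev");
  if (dev !== -1) v = v.slice(0, dev);
  // Hard cap so a pathological version cannot break the row layout.
  return v.slice(0, maxLen);
}

console.log(shortVersion("0.19.1rc1.dev339+gedc364896")); // "0.19.1rc1"
console.log(shortVersion("0.7.3"));                       // "0.7.3"
```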

…ction

Round-4 feedback fixes:

Hero
- Two-line title: "AccelMark Leaderboard" + "A Reproducible Multi-Regime AI Accelerator Benchmark" subtitle. The em-dash formulation is gone.
- All sans, no serif anywhere ("use the font the results currently use, everywhere").

Background
- The hero radial-gradient backdrop felt boxed-in because it was clipped to a rounded rectangle. Promoted it to body::before with absolute positioning, a 720-px height, and a linear mask that fades the bottom to transparent. The wash now blends into the page bg with no visible boundary and scrolls away with the hero.

Suite cards
- Uniform 3-col grid again (3 / 3 / 1 instead of the 4 + 3 split). All cards same size, all show top 6 entries.
- Card header now carries the tagline (moved out of the body) plus a new meta line with the suite's model, baseline precision, total results, and chip-config count, e.g. "Llama 3 · 8B · BF16 baseline · 18 results · 17 chips".
- Suite title bumped to 1.15rem 700.
- Long titles ("Quantization efficiency") now wrap instead of ellipsing: the head row uses flex 1 1 auto.

Data helpers
- data.suiteFacts(id) returns the most common (mode) model, the most common precision, and per-suite counts.
- data.vendorBreakdown() returns one entry per vendor with chips, submissions, suite letters, and the vendor's best chip in its most-populated suite.
- utils.shortModel() abbreviates HF-style names ("Meta-Llama-3-8B-Instruct" → "Llama 3 · 8B", "Mixtral-8x7B-Instruct-v0.1" → keeps the × variant).

New 02 · Coverage section
- "Submissions by vendor": auto-fit grid of vendor cards. Each card shows the vendor's color stripe and dot, chip + submission counts, the headline top chip with its score, and small letter pills tinted by suite color showing which suites the vendor appears in. Pairs with section 01 to give a second lens on the same dataset.

Typography
- Drop --font-serif token entirely. Sans only for h1/h2/KPI/score/card title/suite letter. Mono is retained behind the .mono utility but not used anywhere on home.

/suites updated to the same suite-card shape so the two pages share visual language.

Co-authored-by: Cursor <cursoragent@cursor.com>

…bels

- Home page restyled: 8-row leaderboard cards, "Chips on the leaderboard" cloud (tile size = submission count, color = vendor) replacing the old per-vendor grid, and a new "Submit your result" contributor section.
- Byline on each lb-row now carries framework + version + submitter handle for richer attribution at a glance.
- Suite headers use hardcoded model/protocol tokens (not data-derived) so the page does not jitter on submission changes.
- Metric labels rolled out to formal names across data.js (tokens/sec, queries/sec, 2x/4x scaling efficiency, quality efficiency, sustained throughput); these flow into both the home view and any future view.
- Hero subtitle pinned with white-space: nowrap and a responsive font ramp so it stops awkwardly wrapping on common widths.
- Em dashes in number/date fallbacks replaced with hyphens to play nicer with the editorial type and copy-paste.
- .screenshots/ ignored (local-only design QA captures).

Co-authored-by: Cursor <cursoragent@cursor.com>

…per-suite specs

Replaces the placeholder suites view with a long-form explainer that makes the per-suite design argument concrete and scannable.

Page structure
- Hero: "Workload Suites" + a single, full-width subtitle (no eyebrow, no third tagline line).
- 01 Methodology: 2-column layout. Prose on the left explains why a single score collapses information across heterogeneous regimes; an inline SVG roofline diagram on the right plots each suite at its bottleneck region (memory-bound vs compute-bound), with knee marker and per-suite dots colored by suite.
- 02 Scenarios catalog (new): the seven protocols (accuracy / offline / online / interactive / sustained / speculative / burst) factored out into a single catalog. Each card has an SVG icon, role tagline, prose description, mini-spec table (metric, direction, threshold, cost), and clickable "Used by" suite-letter chips.
- 03 Specifications: per-suite cards now carry a description paragraph paired with a "Concrete finding" sidebar, a spec strip, compact protocol pills, and a horizontal mini-card grid of current leaders.
- 04 Datasets and 05 Propose-a-suite CTA round out the page.

Interactions
- Clicking a "Used by" chip in a scenario card scrolls smoothly to the matching suite section and flashes the card briefly. Uses event delegation with an attach-once guard to survive SPA re-renders.
- scroll-margin-top on suite cards clears the sticky top nav so anchor jumps don't tuck the target underneath.

Other
- index.html loads the new suites.css alongside the existing layout/components/home stylesheets.

Co-authored-by: Cursor <cursoragent@cursor.com>

…l modal

Rankings, Compare and Chip-detail get rebuilt on top of the v1 scaffold; single-run detail moves out of a dedicated page into a global modal so the leaderboard reads as one continuous browse → compare → drill flow.

Views
- rankings: suite pills + multi-select vendor/precision/framework facet pills, sticky toolbar, sortable table, URL-synced state, compare basket keyed by run_id, in-basket banner, full chip column linkable to chip-detail, soft `?chip=<slug>` filter with focus banner.
- compare: basket strip + per-suite resolver that surfaces the same hardware in any suite, per-metric table with winner stars, head-to-head Chart.js panels per suite (grouped bars, scaling lines, precision matrix, latency horizontal bars), chip-cloud quick-add (empty state + populated "Add another chip" affordance), empty-suite banner with "Try Suite X instead" suggestions, sharable `?runs=<rid,…>` seed.
- chip-detail: full rewrite. Hero with vendor/memory eyebrow + facts + Compare/Browse CTAs, 01 per-suite KPI grid with #N/M rank badges (gold/silver/bronze top-3, unique-chip rank), 02 every-submission table, 03 Compare-with-similar-chips peer grid using shared-suite overlap with same-vendor tie-break.
- home: chip cloud + lb-rows now use nested anchors so chip names route to /chip/<slug> while the rest of the row pops the modal.

Run-detail modal
- New assets/js/modal.js: global singleton overlay with Details / Visualize / Implementation tabs, per-suite Chart.js renderings, deep link via `?run=<rid>&tab=viz`, history.replaceState so the router never re-dispatches mid-modal, nested-anchor escape so inner <a> tags keep native navigation, stale-rid auto-strip.

Polish
- generate.py hashes leaderboard.js after writing and rewrites the <script src="leaderboard.js?v=<sha8>"> tag in index.html so the CDN / browser cache invalidates on every data refresh.
- chipHref defends against missing slugs; chip-detail not-found mirrors .rk-empty / .cmp-empty rather than the loading state.
- Mobile audit ≤480/420: topnav drops wordmark + GH link, hero h1 shrinks and wraps, CTA stacks vertically, chip-tile collapses to single compact size with text wrapping.

Data
- New helpers: rankChipInSuite, similarChipsTo, bestRowForRunInSuite, representativeRunForChip, rowByRunId, chipCloudData; bestPerSuiteForChip and bestPerChipForSuite formalised; `_chip_slug` / `_chip_label` attached at init so views never recompute.

Removed
- Top nav compare counter pill (duplicated the rankings toolbar status per user feedback).

Co-authored-by: Cursor <cursoragent@cursor.com>

…s + tests

Three independent improvements shipped together, designed in .handoff-leaderboard-ui-v2.md as commits 5/6/7.

* Chip-detail page is now per chip-model, not per (chip × chip_count) fan-out. 4090D, 4090D ×4, 4090D ×8 share one page; the hero shows the bare chip name, the per-suite KPI cards add a "×N" badge when the best score came from a multi-card deployment, and the every-submission table grows a Chips column grouped by fan-out. utils: chipSlug() drops the -x<N> suffix; new normalizeChipSlug() lets the router auto-redirect legacy "/chip/<slug>-x<N>" URLs so old shared links keep working. Real data: 32 chip-detail pages collapse to 21.
* Vendor colour binding is 100% data-driven. Single VENDOR_COLORS map in data.js plus an injectVendorStyles() that writes one [data-vendor="X"]{--vendor-color:#hex} rule per vendor seen in the loaded dataset into a singleton <style> tag at init() time. Unknown vendors get a deterministic fallback colour from a 9-shade palette. Deletes ~70 lines of per-vendor CSS spread across 6 files (base/components/rankings/chip-detail/home/suites). Adding a new vendor with brand colour is now one line in JS instead of seven CSS blocks; adding one without a brand colour is zero lines.
* Rankings facet pills (vendor / precision / framework / clear-all / empty-state-clear) switched from <button> to <a href> with a precomputed toggled-URL. Middle / Cmd / Ctrl / Shift-click open in a new tab via native anchor semantics; plain left-click still preventDefault()s through the SPA toggle. Suite pill stays a button because a suite switch resets sort + filters, so "open same suite with different filters in a new tab" doesn't make sense.
* Chip-detail KPI cards expose a hover hint footer ("Open run · ⌘+click for all in suite") plus a full title attribute so the dual click affordance is discoverable for keyboard / screen-reader users too. Footer collapses on touch widths (<=640px) since modifier-click is moot.
* New node:test coverage with zero npm deps:
  - leaderboard/site/test/dom_stub.mjs: minimal DOM + Chart.js stand-in (head, getElementById, FakeEl id getter/setter).
  - modal_viz.test.mjs, 11 cases: happy-path full suite_A row, no-data placeholder, unknown viz.type fallback, partial blocks for every per-suite renderer, vizHasAnyData boundary.
  - chip_slug.test.mjs, 6 cases pinning the chipSlug + normalize contract so a refactor can't accidentally re-fork chip-detail.
  - vendor_color.test.mjs, 5 cases: brand hits, deterministic fallback, null safety, distinct shades for unknown vendors, VENDOR_ORDER stays in sync with VENDOR_COLORS keys.
  Total 22 pass / 0 fail / ~110ms. modal.js exports a deliberately underscored `_test` ESM hatch (vizHasAnyData / renderViz / destroyCharts / resetColorCache) so tests can drive the panel without booting the full modal.
* CI: validate_pr.yml gains paths trigger leaderboard/site/** and a new frontend-tests job (actions/setup-node@v4 → node --test leaderboard/site/test/*.test.mjs).

Co-authored-by: Cursor <cursoragent@cursor.com>
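The chipSlug + normalizeChipSlug contract described in the first bullet can be sketched as follows. The slugging rule (lowercase, non-alphanumerics collapsed to "-") is an assumption; the commit message only confirms that chipSlug() no longer carries a "-x<N>" suffix and that normalizeChipSlug() maps legacy slugs onto the per-model one.

```javascript
// Sketch of the per-chip-model slug contract.
// Assumption: the exact slugging rule below is illustrative, not from the source.
function chipSlug(chipName) {
  return String(chipName ?? "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
}

// Strip a legacy "-x<N>" fan-out suffix so old links resolve to the model page.
function normalizeChipSlug(slug) {
  return String(slug ?? "").replace(/-x\d+$/i, "");
}

console.log(chipSlug("RTX 4090D"));             // "rtx-4090d"
console.log(normalizeChipSlug("rtx-4090d-x8")); // "rtx-4090d"
```

A router could compare `normalizeChipSlug(slug)` with the incoming slug and redirect when they differ. One caveat of this sketch: a chip whose real name ends in "-x<digits>" would be stripped too, so a production version would likely need an allow-list or a lookup against known slugs.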

… ribbon

Three small backlog items shipped together because they touch disjoint views; each is a single-purpose UX improvement.

* Suites 01 Methodology: wrap the second + third paragraph in a <details class="why-prose-more"> so mobile readers see the opening argument plus an opt-in expander instead of three full prose paragraphs above the fold. Mount-time matchMedia decides desktop default-open / mobile default-closed (we deliberately don't react to viewport changes; toggling open mid-read would yank the user's scroll). Summary chrome strips the native triangle marker for a pill that matches the eyebrow palette.
* Compare basket: add a Copy share link button next to Clear all. bindClicks intercepts data-basket-share, and _shareUrlForBasket recomposes the canonical URL from the in-memory basket + active suite (the seed ?runs= is stripped from the address bar after consumption, so location.href alone wouldn't restore the basket). navigator.clipboard.writeText is the happy path; it falls back to a hidden <textarea> + execCommand("copy") on insecure contexts and to window.prompt as a last resort, so the button never silently no-ops. Success / failure each flash the label for 1.6s / 3.5s with is-copied / is-copy-failed accent palettes.
* Home hero: add a "this week" KPI tile + accent ribbon below the KPI strip showing recentSince(7d) submissions, linking to #/rankings?suite=suite_A&sort=date:desc. When count is zero the whole ribbon + KPI are suppressed so quiet weeks don't read as a regression. Pulsing accent dot mirrors a status-LED convention; prefers-reduced-motion turns it static.
* data.js: new recentSince(days, now=new Date()). Compares lexicographically against a UTC YYYY-MM-DD cutoff (rows are calendar-dates with no time component; UTC avoids midnight rollover ambiguity). Returns 0 for non-positive / NaN inputs rather than throwing, so the hero still renders if the data shape ever drifts.
* Tests: new test/recent_since.test.mjs (5 cases, 27 total pass / 0 fail) covering 7d / 1d / 365d / negative+NaN / default-now and the no-date-row skip. Re-uses dom_stub.mjs with a hand-built fixture so the suite stays npm-free.

Co-authored-by: Cursor <cursoragent@cursor.com>
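The lexicographic UTC comparison behind recentSince can be sketched like this. The real helper takes `(days, now)` and reads rows from module state; this illustration takes `rows` as an explicit first argument and assumes each row carries a `date` string in YYYY-MM-DD form, both of which are assumptions made to keep the sketch self-contained.

```javascript
// Sketch: count rows dated on or after a UTC cutoff, `days` before `now`.
// Assumptions: explicit `rows` argument and a `date` field (YYYY-MM-DD).
function recentSince(rows, days, now = new Date()) {
  if (!Number.isFinite(days) || days <= 0) return 0;
  const cutoffMs = now.getTime() - days * 24 * 60 * 60 * 1000;
  if (Number.isNaN(cutoffMs)) return 0; // invalid `now`
  // toISOString() is always UTC, so the first 10 chars are a UTC calendar date.
  const cutoff = new Date(cutoffMs).toISOString().slice(0, 10);
  let count = 0;
  for (const row of rows) {
    const date = row && row.date;
    if (typeof date !== "string") continue; // skip rows without a date
    // Zero-padded ISO dates sort the same way as strings and as dates.
    if (date >= cutoff) count++;
  }
  return count;
}

const now = new Date("2026-05-15T12:00:00Z");
const rows = [{ date: "2026-05-15" }, { date: "2026-05-08" }, { date: "2026-05-07" }, {}];
console.log(recentSince(rows, 7, now)); // 2
```

String comparison works here only because the dates are zero-padded and share one format; mixed formats would need real date parsing.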

…board a11y

Two related improvements that touch every view:

* Hoist the compare basket's share-link logic into reusable utils: copyToClipboard(text) and flashButtonLabel(btn, msg, opts). Both are pure helpers with full progressive fallback (clipboard API → textarea+execCommand → window.prompt) and never throw, so views stay one-liners. Add a Copy link CTA to the chip-detail hero that uses them; rewrite compare.js to consume the same helpers. Promote success / failure flash chrome onto a shared `.copy-btn` class so any future share button picks up the green / red accents without per-view CSS.
* Keyboard a11y sweep across every clickable surface:
  - Global :focus-visible ring (2px accent + offset, inherits radius) on the body so tab navigation always shows where focus is.
  - lb-row trigger divs (home suite cards + recent activity) get role="button" + tabindex="0" + aria-label; modal.js gains a keydown delegate that turns Enter / Space on any [data-open-run] surface into the same openModal call a click would make. Same nested-anchor escape as the click delegate so focus on an inner chip-name link still navigates to the chip page.
  - Rankings + chip-detail table rows get tabindex="0" + aria-label; we keep the native <tr> role so screen-reader column-header pairing still works (role="button" would clobber it).
  - Compare chip-cloud tiles get role="button" + aria-pressed mirroring the in-basket state, with an extended title that documents both the toggle and the modifier-click navigation paths.
  - Modal tabs / panels are now linked via id + aria-controls + aria-labelledby, with a focusable panel container so keyboard users land in the active panel after switching tabs.
  - dom_stub: querySelector / querySelectorAll stubs on FakeEl so helper smoke tests can compose without monkey-patching jsdom.

Tests: new test/share_helpers.test.mjs (5 cases) covers the no-clipboard fallback, navigator.clipboard.writeText happy path, null safety, and the flashButtonLabel re-entrancy invariant (rapid clicks must not lock the button into the flash label). 32 total pass / 0 fail / ~150ms.

Co-authored-by: Cursor <cursoragent@cursor.com>

…elper

The KPI grid told users which suites a chip submitted to and what its primary metric was; what it never showed was *where the chip sits across the spectrum*: is this a bandwidth-bound winner that loses on long context, a balanced mid-stack chip, or a one-suite specialist? The radar makes that question answerable in one glance: each axis is a suite letter, each tick is % of the global best primary metric for that suite, with asc-direction metrics inverted so the leader still reads as 100 %.

Section anatomy: full-width card with a square radar canvas on the left, a sortable per-suite legend (suite letter chip + name + raw value + normalised %) on the right. Suites the chip never submitted to come back as missing entries: the polygon collapses to centre on those axes, and the legend annotates them as "no data" / "—" so the gap is explicit rather than implied.

The radar also surfaces a dashed reference ring at 100 % so users can read each axis as "how close is this chip to the best chip ever submitted on this workload" without doing the division themselves. The chip's brand colour fills the polygon, the same colour the rest of the site uses for the chip's vendor.

Mounted post-render via setTimeout(0) so Chart.js sees an attached canvas; the previous chart instance is destroyed on every re-mount so quick chip switching doesn't leak canvases. Skipped entirely when the chip has data on fewer than 2 suites (a single-point polygon adds no signal), and the section's structured legend stays the fallback when window.Chart isn't loaded.

Renumber sections: 02 fingerprint, 03 every submission, 04 peers.

* data.js: new `suiteFingerprint(slug)` returns Map<suiteId, { value, normalized, best, missing }> for every suite in SUITE_ORDER. Reuses rowsForSuite for global-best lookup. Asc-direction metrics get `bestValue / value` so TTFT/latency metrics rank correctly on the radar.
* views/chip-detail.js: new `renderFingerprintSection(slug, sample)` + `_mountFingerprintChart(el, slug, sample)`; renumbered the every-submission section eyebrow to 03 and peers to 04.
* css/chip-detail.css: full styling for `.chip-fp-section`, `.chip-fp-wrap` (responsive 2-col on desktop, stacked on mobile), `.chip-fp-canvas` (square aspect-ratio wrapper for Chart.js), and `.chip-fp-legend` (suite-tinted letter chips + per-row tnum).
* test/suite_fingerprint.test.mjs (5 cases): leader normalised to 1.0, mid-pack to 0.5, missing suites flagged + zero, unknown slugs return a complete Map without throwing, cross-suite ordering inversions surface as expected. 37 total pass / 0 fail / ~230ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
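The normalisation behind the fingerprint (value / best for higher-is-better metrics, best / value for asc-direction ones) can be sketched as below. The real `suiteFingerprint(slug)` reads SUITE_ORDER and rowsForSuite from data.js; this illustration passes `rows` and `suites` explicitly, and the row fields `slug`, `suite` and `value` are assumed names.

```javascript
// Sketch of per-suite normalisation against the global best.
// Assumptions: explicit `rows`/`suites` arguments and the row field names.
function suiteFingerprint(slug, rows, suites) {
  const out = new Map();
  for (const { id, direction } of suites) {
    // "asc" means lower is better (e.g. latency), so the best value is the minimum.
    const pick = direction === "asc" ? Math.min : Math.max;
    const inSuite = rows.filter((r) => r.suite === id && r.value > 0);
    const own = inSuite.filter((r) => r.slug === slug).map((r) => r.value);
    const best = inSuite.length ? pick(...inSuite.map((r) => r.value)) : null;
    if (own.length === 0) {
      // Missing suites stay in the Map so every axis is always present.
      out.set(id, { value: null, normalized: 0, best, missing: true });
      continue;
    }
    const value = pick(...own);
    const normalized = direction === "asc" ? best / value : value / best;
    out.set(id, { value, normalized, best, missing: false });
  }
  return out;
}

const suites = [
  { id: "suite_A", direction: "desc" }, // higher is better
  { id: "suite_B", direction: "asc" },  // lower is better
];
const rows = [
  { slug: "x", suite: "suite_A", value: 100 },
  { slug: "y", suite: "suite_A", value: 50 },
  { slug: "x", suite: "suite_B", value: 20 },
  { slug: "y", suite: "suite_B", value: 10 },
];
console.log(suiteFingerprint("y", rows, suites).get("suite_A").normalized); // 0.5
console.log(suiteFingerprint("y", rows, suites).get("suite_B").normalized); // 1
```

Inverting the ratio for asc-direction metrics keeps every axis on the same "1.0 is the leader" scale, which is what lets a single radar mix throughput and latency suites.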

Add a section between "Leaderboard tiers" and "Using local or air-gapped models" so contributors know which fields the v2 frontend reads (and which are optional polish vs. required schema). README intentionally stays as-is.

* Chip identity vs. chip count: chip-detail aggregates every fan-out (×1/×4/×8) onto one page keyed on chip model alone, so per-fan-out result.json files don't need invented chip names. Old `…-x<N>` URLs auto-redirect.
* Vendor colours: explain that `VENDOR_COLORS` in data.js is the one source of truth. A one-line PR pins both the brand colour and the vendor's position in `VENDOR_ORDER`; vendors not in the map get a deterministic fallback from a 9-entry palette so a new result is never colour-less on first appearance.
* Optional viz fields: table of `viz.type` shapes the modal + compare charts understand today (decode / multichip / quant / longctx / scaling) with the required keys for each. Stresses that viz is fully optional; runners that don't emit it still get a clean Details / Implementation modal.
* Submitter handle: clarify that `meta.submitted_by` becomes `@<handle>` next to every result (utils.js `submitterHandle`).

Co-authored-by: Cursor <cursoragent@cursor.com>
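A minimal sketch of the vendor-colour lookup with a deterministic fallback follows. The hex values, the palette contents and the string-hash scheme are all illustrative assumptions; the source only states that a VENDOR_COLORS map exists, that unknown vendors get a deterministic colour from a 9-entry palette, and that one `[data-vendor="X"]{--vendor-color:#hex}` rule is written per vendor.

```javascript
// Illustrative values only: not the project's real colours or palette.
const VENDOR_COLORS = { NVIDIA: "#76b900", AMD: "#ed1c24" };
const FALLBACK_PALETTE = [
  "#64748b", "#0ea5e9", "#14b8a6", "#84cc16", "#eab308",
  "#f97316", "#ec4899", "#a855f7", "#6366f1",
];

function vendorColor(vendor) {
  if (vendor == null || vendor === "") return FALLBACK_PALETTE[0];
  if (Object.prototype.hasOwnProperty.call(VENDOR_COLORS, vendor)) {
    return VENDOR_COLORS[vendor];
  }
  // Assumed scheme: a simple string hash, so the same name always maps
  // to the same shade regardless of load order.
  let hash = 0;
  for (const ch of String(vendor)) hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  return FALLBACK_PALETTE[hash % FALLBACK_PALETTE.length];
}

// Build one CSS rule per distinct vendor seen in the dataset.
function vendorStyleRules(vendors) {
  return [...new Set(vendors)]
    .map((v) => `[data-vendor="${v}"]{--vendor-color:${vendorColor(v)}}`)
    .join("\n");
}

console.log(vendorStyleRules(["NVIDIA", "NVIDIA"]));
// [data-vendor="NVIDIA"]{--vendor-color:#76b900}
```

A plain hash is deterministic but can map two unknown vendors to the same shade. Since the test list mentions distinct shades for unknown vendors, the real helper presumably uses a different assignment rule than this one.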

Two complementary polish items, both small and table-focused.

* Rankings: hide metric columns whose values are entirely null in the current post-filter slice. suite_C's quant_quality and suite_E's scaling_efficiency are null on most chips, so the default rendering used to show four "—" columns at every breakpoint: visual noise that read as "this chip is broken" rather than "this metric is suite-specific". ?cols=auto (default) elides those columns; ?cols=all forces every declared column. Primary and active-sort columns are always pinned visible (hiding either would break the page logic). The toggle lives in the existing .rk-status row as a pill anchor ("3 columns hidden — show all" / "Hide empty columns") so middle-click + Cmd-click open the alternate state in a new tab, the same affordance every other rankings filter uses. rebuildHash drops the ?cols= param when toggling back to auto so URLs stay tidy.
* Focus ring polish: extend the global :focus-visible from a single 2px outline to a two-layer halo: the crisp 2px accent outline plus a 4px translucent box-shadow ring underneath using color-mix(--accent 28%). The original ring got swallowed on the hero radial gradient and on suite-color-tinted rows; the halo traces the element's border-radius via box-shadow, so it survives busy backgrounds without per-element tuning. Pills (rk-suite-pill / rk-facet-pill / scn-suite-letter) get a smaller 3px halo so it doesn't dwarf them. Table rows opt out of the halo entirely: `border-collapse: collapse` clips box-shadow mid-row, and the crisp outline still gives keyboard users an unambiguous focus indicator.

Tests: 37 pass / 0 fail unchanged (no new test surface; the toggle is a render-time slice over existing data, and the focus ring is CSS).

Co-authored-by: Cursor <cursoragent@cursor.com>
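The column-elision rule from the Rankings bullet (drop columns that are entirely null in the filtered slice, but always keep the primary and active-sort columns) can be sketched as a pure function. The function name, the options object and the row shape are hypothetical; only the rule and the `auto` / `all` modes come from the commit message.

```javascript
// Sketch: decide which metric columns to render for the current slice.
// Assumptions: function name, option names and row shape are illustrative.
function visibleColumns(columns, rows, { mode = "auto", primary, sortKey } = {}) {
  if (mode === "all") return columns.slice(); // ?cols=all forces everything
  return columns.filter(
    (col) =>
      col === primary ||                 // primary metric is always pinned
      col === sortKey ||                 // so is the active sort column
      rows.some((row) => row[col] != null) // keep if any row has a value
  );
}

const cols = ["tps", "quant_quality", "ttft"];
const slice = [
  { tps: 1200, quant_quality: null, ttft: null },
  { tps: 900 },
];
console.log(visibleColumns(cols, slice, { primary: "tps", sortKey: "ttft" }));
// [ 'tps', 'ttft' ]
```

The hidden count for a label such as "3 columns hidden" would then be `columns.length - visibleColumns(...).length`.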

The fingerprint radar (commit 10) answered "where on the spectrum does this chip sit?". This new 03 · Scaling section answers the follow-up question: "and how much does going wide actually help?"

Only meaningful for chips deployed at >1 fan-out (A100, H100, H200, 4090D in the current dataset). Single-variant chips (consumer cards, Apple silicon) suppress the section entirely so the page renumbers downward: no empty layout slot, no stretched chart of one bar.

* Section anatomy: full-width card with a grouped bar chart on the left (one cluster per active suite, one bar per chip-count) and a vendor-tinted swatch legend on the right. Below the chart, a per-suite breakdown grid surfaces the absolute primary-metric value for every chip-count cell: bars are normalised %, the breakdown carries the raw tok/s / ms / % so the user sees both.
* Y-axis is "% of this chip's best on this suite" (intra-cluster normalisation). Asc-direction primary metrics are inverted so the lowest-latency variant still reads as 100 %. Cross-suite mixing is intentionally NOT done here; that's what the fingerprint radar is for. This section answers the *intra*-suite scaling question instead, which is what users go to chip-detail to debug.
* Per-chip-count colour: bars within a cluster fade from accent (×1) to a lighter accent (×N) so the gradient reads as "more chips = more". Cells with no submission collapse to null bars + an "—" placeholder in the breakdown so the gap is explicit rather than silently re-aligning the cluster.
* Section numbering renumbers downward when scaling is shown: 03 scaling → 04 every submission → 05 peers; without scaling the numbering stays 03 every submission → 04 peers. renderSimilarChips now takes a sectionNum arg so the eyebrow stays in lock-step.
* data.js: new chipCountScaling(slug) returns { chipCounts: [1, 4, 8], suites: [{ sid, letter, title, perCount }] } where perCount is a Map<chip_count, { value, normalized, run_id }> with null-cell guarantees. Suites with no submission at any chip-count are dropped from `suites` so the chart doesn't render empty clusters.
* test/chip_count_scaling.test.mjs (5 cases): single-fan-out → empty suites; multi-fan-out → intra-cluster normalisation; missing cells null/zero (NOT absent, since the chart relies on a full Map); fully absent suites dropped; unknown slug returns stable shape without throwing. 42 total pass / 0 fail / ~180ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
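The intra-cluster normalisation and the null-cell guarantee can be illustrated for a single suite as follows. The real `chipCountScaling(slug)` builds this per suite from data.js; the helper name `perCountForSuite`, its arguments and the row fields are assumptions for the sketch.

```javascript
// Sketch: build the perCount Map for one suite of one chip.
// Assumptions: helper name, arguments and row fields are illustrative.
function perCountForSuite(rows, chipCounts, direction) {
  // "asc" means lower is better, so the best variant has the minimum value.
  const pick = direction === "asc" ? Math.min : Math.max;
  const bestByCount = new Map();
  for (const row of rows) {
    if (!(row.value > 0)) continue;
    const prev = bestByCount.get(row.chip_count);
    if (!prev || pick(prev.value, row.value) === row.value) {
      bestByCount.set(row.chip_count, row);
    }
  }
  const values = [...bestByCount.values()].map((r) => r.value);
  const best = values.length ? pick(...values) : null;
  const perCount = new Map();
  for (const n of chipCounts) {
    const row = bestByCount.get(n);
    perCount.set(
      n,
      row
        ? {
            value: row.value,
            normalized: direction === "asc" ? best / row.value : row.value / best,
            run_id: row.run_id,
          }
        : { value: null, normalized: 0, run_id: null } // explicit null cell
    );
  }
  return perCount;
}

const cells = perCountForSuite(
  [
    { chip_count: 1, value: 100, run_id: "a" },
    { chip_count: 4, value: 350, run_id: "b" },
    { chip_count: 4, value: 400, run_id: "c" },
  ],
  [1, 4, 8],
  "desc"
);
console.log(cells.get(1).normalized); // 0.25
console.log(cells.get(8).value);      // null
```

Returning a cell for every chip-count, even when empty, keeps the bars in each cluster aligned with the shared legend.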

Path-1 share: each Chart.js render target gets a small "↓ PNG" pill in its top-right corner that exports the chart to a downloadable image. Zero deps, zero library: just `canvas.toBlob` + an ephemeral `<a download>`.

Covers four chart families:
* modal Visualize tab: every per-suite chart (suite_A throughput bar, suite_C precision matrix, suite_D long-context, suite_E scaling, suite_G MoE, sustained over time, etc.)
* chip-detail · 02 fingerprint radar
* chip-detail · 03 chip-count scaling bar
* compare · per-suite head-to-head charts

The PNG is composited onto an opaque background read from `--bg-elev` before export. Chart.js renders with a transparent fill, so a naive toDataURL on a dark-mode page produces an image with white text on a transparent background that looks like nothing once pasted into Slack / X / GitHub Discussions.

Filenames are self-describing so downloads stack cleanly in the user's Downloads folder:
* compare: `compare-suite-a-throughput-by-concurrency.png`
* chip-detail: `<chip-slug>-fingerprint.png`, `<chip-slug>-scaling.png`
* modal viz: `<run_id>-<section-title-slug>.png` (the section title is read from the nearest preceding `.viz-section-title` so each `_mkCanvas` caller doesn't need to re-declare its name)

Changes:
* utils.js: new `downloadCanvasAsPng(canvas, opts)` with three-step robustness: composite + opaque fill → toBlob → URL. Falls back to `toDataURL` when toBlob isn't available (older Safari, jsdom). Returns Promise<boolean>; never throws.
* components.css: shared `.chart-dl-btn` pill (top-right absolute, fades in on hover / :focus-visible, green `is-saved` / red `is-failed` flash via flashButtonLabel). Containers everywhere switched to `position: relative` so the pill anchors against the canvas surface, not the page.
* modal.js: `_mkCanvas(height, name?)` now ships the download button automatically, so every existing renderer (8+ call sites) picks up the affordance for free. Click delegate added to the modal-shell handler; it slugifies the nearest `.viz-section-title` for the filename so adding a new chart needs no naming work.
* views/chip-detail.js: download button HTML threaded into the fingerprint + scaling canvas wrappers; bindClicks routes `[data-chart-dl]` to a new `_downloadChipChart`.
* views/compare.js: same affordance per head-to-head chart card. Click delegate inside the existing compare bindClicks.
* test/share_helpers.test.mjs (4 new cases): null-safety, missing 2d context fall-through, full toBlob happy path with composite ordering assertion (background fill MUST happen before drawImage, otherwise the export ignores the bg), toDataURL fallback exercised by removing the toBlob method on the fake canvas. Plus dom_stub gets `removeChild` / `click` so the helper's `document.body.appendChild(a) → a.click() → removeChild` flow doesn't throw under node-test. 46 total pass / 0 fail / ~180ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
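The compositing step (opaque fill first, then the chart on top) is the part the ordering test pins down, and it can be sketched in isolation. The helper name `compositeOnBackground` and the injected `createCanvas` factory are assumptions that keep the sketch runnable outside a browser; the canvas calls themselves (`getContext("2d")`, `fillStyle`, `fillRect`, `drawImage`) are standard Canvas 2D API.

```javascript
// Sketch: paint an opaque background, then draw the source chart over it.
// Assumptions: helper name and the `createCanvas(width, height)` factory.
function compositeOnBackground(source, background, createCanvas) {
  const out = createCanvas(source.width, source.height);
  const ctx = out.getContext("2d");
  if (!ctx) return source; // no 2d context: fall through to the raw canvas
  // Order matters: a fill applied after drawImage would cover the chart.
  ctx.fillStyle = background;
  ctx.fillRect(0, 0, out.width, out.height);
  ctx.drawImage(source, 0, 0);
  return out;
}

// In a browser the factory could be:
// (w, h) => Object.assign(document.createElement("canvas"), { width: w, height: h })
```

The returned canvas can then be passed to `toBlob` (or `toDataURL` as the fallback) to produce the PNG.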

The PNG download click handler tried to read `slug` from `bindClicks`'s outer closure, but bindClicks attaches once per mounted view (via the `__chipDetailClicksAttached` guard) and never captured slug as a parameter — so clicking Download PNG on the fingerprint or scaling chart threw `ReferenceError: slug is not defined` and the button silently did nothing. Read the active chip slug from `location.hash` at click time instead, matching the pattern already used by `_chipShareUrl()`. This also makes the handler immune to stale-closure issues if the user navigates between chips without remounting the view. Co-authored-by: Cursor <cursoragent@cursor.com>
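Reading the slug at click time rather than from a closure might look like the sketch below. The route shape `#/chip/<slug>` and the helper name `chipSlugFromHash` are assumptions for illustration; the actual router format and `_chipShareUrl()` implementation are not shown in this PR text.

```javascript
// Hypothetical parser: assumes chip-detail routes look like "#/chip/<slug>?query".
// Returns null when the hash is not a chip route or the slug is malformed.
function chipSlugFromHash(hash) {
  const match = /^#\/chip\/([^/?#]+)/.exec(hash || "");
  if (!match) return null;
  try {
    return decodeURIComponent(match[1]);
  } catch (err) {
    return null; // malformed percent-encoding
  }
}

// In a click delegate the value would be read when the click happens, e.g.:
// const slug = chipSlugFromHash(location.hash);
// if (slug) { /* build "<slug>-fingerprint.png" and export */ }
```

Because nothing is captured when the handler is attached, a delegate bound once per mounted view still sees the chip the user is currently looking at.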

Pure-subtraction cleanup after the UI v2 stabilises. Nothing here changes runtime behaviour; every removal was verified to have zero references across `assets/js/**` and `index.html`.

- `data.js`
  · Drop unused `maxBy` import (utils.js export now removed below).
  · Remove 5 zero-importer exports (`ready`, `rows`, `uniqueChips`, `rankWithinSuite`, `suiteLeader`). The internal `_ready` flag is still consumed by every public lazy-init guard, so module state is unaffected.
- `utils.js`
  · Remove zero-importer exports `fmtPct`, `fmtMs`, `maxBy`, `el`. Helpers that nobody calls quietly accumulate divergent expectations of how they should behave; deleting them keeps the public surface honest.
- `components.css`
  · Remove `.vendor-pill` (+ ::before), `.suite-chip` family, `.rel-bar` family, `.badge.flagged`, and `.rank` medals; none of them appear in any view or template. The current DOM uses `lb-row-rank` (home.css) and `chip-suite-rank` (chip-detail.css) for the medal styling, so the colour mapping survives intact.
- `rankings.js`
  · Drop `data-cols-toggle` from the column-visibility toggle anchor. It's a stringly-typed mirror of the `?cols=` value that already lives in the same anchor's `href`; navigation goes through the native hashchange path, so no click delegate ever read it.

All 46 unit tests still pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

…basket

Two narrow defensive fixes for failure modes that don't bite today but are one bad row / one stale shared link away from a broken UI.

modal.js: `_renderSuiteC` online_by_precision branch

Per-format records fed into the "TTFT p99 by QPS across formats" line chart had two unguarded deep accesses: `viz.online_by_precision[0].qps_labels` and `fp.sla_met.map(...)`. generate.py emits both today, but a hand-edited result.json or a future schema change that drops `sla_met` would throw `Cannot read properties of undefined (reading 'map')` and white out the entire Visualize tab. Default qps_labels to `[]`, gate `sla_met.map` behind `Array.isArray`, and default `ttft_p99` to `[]` for symmetry.

compare.js: basket entries all resolve to null (stale shared link)

When every `?runs=` id maps back to a `null` row (re-uploaded under new ids, or pruned), `chips` was empty but the `suiteEmpty` guard required `chips.length > 0`, so we fell into the normal-render branch and produced a header-only table plus an empty chart grid with no explanation, which reads like a broken page. Add a dedicated short-circuit that explains what happened, shows a "Clear & start over" button (so the basket isn't permanently wedged), and keeps the chip cloud visible so the user can immediately rebuild a comparison.

rankings.css gets a small `.cmp-basket--empty .cmp-basket-stale` rule for the muted italic count label.

All 46 unit tests still pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
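The two guards could be approximated as below. This is a sketch under assumptions: the field names (`online_by_precision`, `qps_labels`, `ttft_p99`, `sla_met`) come from the commit message, while the function names, the returned shape, and the `map(Boolean)` projection are hypothetical and not taken from `modal.js` or `compare.js`.

```javascript
// Hypothetical normaliser: tolerate missing arrays instead of throwing on `.map`.
function normalizePrecisionSeries(viz) {
  const formats = Array.isArray(viz && viz.online_by_precision)
    ? viz.online_by_precision
    : [];
  const first = formats[0] || {};
  const labels = Array.isArray(first.qps_labels) ? first.qps_labels : [];
  const series = formats.map((fp) => ({
    ttft_p99: Array.isArray(fp && fp.ttft_p99) ? fp.ttft_p99 : [],
    // Only map when sla_met is really an array (a hand-edited file may drop it).
    sla_met: Array.isArray(fp && fp.sla_met) ? fp.sla_met.map(Boolean) : [],
  }));
  return { labels, series };
}

// Hypothetical stale-basket check: ids were requested, but every lookup
// came back null or undefined.
function allEntriesStale(resolvedRows) {
  return resolvedRows.length > 0 && resolvedRows.every((row) => row == null);
}
```

With checks like these, a record without `sla_met` yields an empty series rather than an exception, and a basket whose ids all resolve to `null` can be routed to an explanatory empty state before the normal render path runs.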

github-actions · 2026-05-15T19:16:55Z

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

The Node ≤20 runtime that CI pins (`node-version: '20'` in validate_pr.yml) doesn't expose `globalThis.navigator` at all — that web-standard global only landed in Node 21+. The existing test reached for `Object.getOwnPropertyDescriptor(globalThis.navigator, "clipboard")`, which throws `Cannot convert undefined or null to object` on Node 20. Locally (Node 25) the test passed and we shipped the regression in commit 9. Replace the whole `navigator` slot via defineProperty on globalThis instead of monkey-patching its `clipboard` property. On Node ≤20 the slot didn't exist (defineProperty creates it, finally branch deletes it); on Node ≥21 the existing slot is a configurable getter (defineProperty replaces it, finally branch restores the original descriptor). Behaviour is identical on either runtime. Verified locally on Node 25 (46/46) and Node 20 (46/46). Co-authored-by: Cursor <cursoragent@cursor.com>
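The slot-replacement approach can be sketched as follows. The helper name `withStubbedNavigator` is hypothetical and the actual test code may be structured differently; the sketch only shows the define/restore pattern the commit describes, using the standard `Object.getOwnPropertyDescriptor` and `Object.defineProperty` calls.

```javascript
// Hypothetical test helper: swap the whole `navigator` slot on globalThis for the
// duration of `fn`, then put back whatever was there (or delete the slot if it
// did not exist, as on Node 20 and earlier).
// Note: this version is synchronous. For an async `fn`, the restore in `finally`
// would have to wait for the returned promise before running.
function withStubbedNavigator(stub, fn) {
  const original = Object.getOwnPropertyDescriptor(globalThis, "navigator");
  Object.defineProperty(globalThis, "navigator", {
    value: stub,
    configurable: true, // must stay configurable so it can be restored or deleted
    writable: true,
    enumerable: true,
  });
  try {
    return fn();
  } finally {
    if (original) {
      Object.defineProperty(globalThis, "navigator", original);
    } else {
      delete globalThis.navigator;
    }
  }
}
```

Reading the descriptor from `globalThis` rather than from `globalThis.navigator` is what avoids the `Cannot convert undefined or null to object` failure when the global is absent.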

JuhaoLiang1997 and others added 20 commits May 16, 2026 01:04

chore: ignore local .handoff-*.md notes

17f4eaf

Cross-session handoff drafts are session-local; they shouldn't land in the repo. Match by glob so any feature handoff stays untracked. Co-authored-by: Cursor <cursoragent@cursor.com>

JuhaoLiang1997 merged commit 6a038ca into main May 15, 2026
3 checks passed

JuhaoLiang1997 deleted the feature/leaderboard-ui-v2 branch May 15, 2026 19:21

JuhaoLiang1997 merged 21 commits into main from feature/leaderboard-ui-v2
