chore(benchmark): refresh CI baseline to latest gateway main #65

Merged: bburda merged 2 commits into main from chore/update-benchmark-baseline on Jun 23, 2026

bburda (Contributor) commented on Jun 23, 2026

Description

Refresh the benchmark CI baseline (benchmark/baseline/ci.json) to gateway main bdcc4ed. The footprint and scaling lanes were re-measured on the same host as the previous baseline (Intel i7-10700K, 16 cores, glibc), so host-keyed compare stays valid.
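The host-keyed condition can be sketched as below. This is a minimal illustration, not the benchmark tool's actual code: the `HostKey` fields and the `baseline_applies` helper are hypothetical names, chosen only to mirror the host details quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostKey:
    """Hypothetical host fingerprint; the field names are assumptions."""
    cpu_model: str
    cores: int
    libc: str


def baseline_applies(baseline_host: HostKey, current_host: HostKey) -> bool:
    """Treat a baseline as comparable only when the host fingerprint matches."""
    return baseline_host == current_host


baseline_host = HostKey("Intel i7-10700K", 16, "glibc")
same_host = HostKey("Intel i7-10700K", 16, "glibc")
other_host = HostKey("Intel i7-10700K", 16, "musl")

print(baseline_applies(baseline_host, same_host))   # True
print(baseline_applies(baseline_host, other_host))  # False
```

Re-measuring on the same host keeps the fingerprint identical, which is why the refreshed baseline can replace the old one without invalidating the compare step.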

Numbers (old baseline 8569f21 -> new bdcc4ed):

| metric | old | new | delta |
| --- | --- | --- | --- |
| footprint USS median | 90.8 MiB | 53.0 MiB | -42% |
| footprint threads | 53 | 31 | -42% |
| footprint CPU-cores | 0.195 | 0.246 | +26% |
| scaling exponent | 0.456 | 0.293 (CI [0.03, 0.55]) | lower, slower growth |
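For reference, the delta column can be reproduced from the old and new values with a plain relative-change calculation. The helper name `percent_delta` is illustrative and is not taken from the benchmark code.

```python
def percent_delta(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Old and new values copied from the numbers above.
rows = {
    "footprint USS median (MiB)": (90.8, 53.0),
    "footprint threads": (53, 31),
    "footprint CPU-cores": (0.195, 0.246),
}

for name, (old, new) in rows.items():
    print(f"{name}: {percent_delta(old, new):+.0f}%")
# footprint USS median (MiB): -42%
# footprint threads: -42%
# footprint CPU-cores: +26%
```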

Notes:

  • The new baseline records a real gateway_sha for every lane including footprint. The old footprint entry had gateway_sha = unknown because the demo image predated SHA capture.
  • compare against the new baseline passes with no regressions. Footprint CPU is up ~26% and shows as WARN, not a regression. The config sweep confirms refresh_interval_ms is the dominant CPU factor, and the demo runs at a 10s refresh, so this is a small same-config difference, not a refresh spike.
  • The full run also covered the sweep, heap (no leak) and fault lanes; those are not part of the baseline file.
  • The README example results are illustrative one-host snapshots and are left unchanged.
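The WARN-versus-regression distinction in the notes above can be pictured as a two-threshold check on a lower-is-better metric. This is only a sketch: the `classify` function and both threshold values are hypothetical placeholders, and the real compare step may use different limits or a different rule entirely.

```python
def classify(old: float, new: float, warn_pct: float, fail_pct: float) -> str:
    """Classify the change in a lower-is-better metric.

    warn_pct and fail_pct are hypothetical thresholds, not the tool's real ones.
    """
    change_pct = (new - old) / old * 100.0
    if change_pct >= fail_pct:
        return "REGRESSION"
    if change_pct >= warn_pct:
        return "WARN"
    return "OK"


# Placeholder thresholds, picked only so the example mirrors the outcome
# described in the notes (CPU up ~26% is a WARN, not a regression).
print(classify(0.195, 0.246, warn_pct=20.0, fail_pct=50.0))  # WARN
print(classify(90.8, 53.0, warn_pct=20.0, fail_pct=50.0))    # OK
```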

Related Issue

closes #67

Checklist

  • Tested locally (benchmark unit tests pass; compare against the new baseline passes)
  • README updated (not needed; example results are illustrative)

Copilot AI review requested due to automatic review settings June 23, 2026 12:35

Copilot AI left a comment

Pull request overview

Updates the committed benchmark CI baseline JSON to reflect the latest measurements from gateway main (bdcc4ed), keeping CI’s compare step aligned with current performance characteristics on the same host fingerprint.

Changes:

  • Refresh benchmark/baseline/ci.json with new footprint and scaling metrics from run 20260622-220119.
  • Update the baseline gateway_sha and host memory total to match the new run metadata.

Comment thread benchmark/baseline/ci.json Outdated
@bburda bburda self-assigned this Jun 23, 2026
bburda added 2 commits June 23, 2026 15:35
Re-measured the footprint and scaling lanes on the same host as the
previous baseline (Intel i7-10700K, 16 cores, glibc) against gateway
main bdcc4ed, and stored the result as the ci baseline.

- footprint USS median: 90.8 -> 53.0 MiB
- footprint threads: 53 -> 31
- footprint CPU-cores: 0.195 -> 0.246
- scaling exponent: 0.456 -> 0.293 (95% CI [0.03, 0.55])
- gateway_sha now recorded for all lanes including footprint
  (the old footprint entry was 'unknown')

compare passes against the new baseline (no regressions).

update-baseline wrote a fixed comment claiming the footprint gateway_sha
may be unknown, regardless of the actual run. Collect gateway_sha from
every lane's run_metadata and write a comment that states the real
source: one measured commit when all lanes agree, or the per-lane values
otherwise. Take the top-level gateway_sha from the first non-unknown lane.

Regenerate ci.json: the comment now names the measured commit; the
metrics are unchanged.
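
The selection rule described in the second commit message can be sketched as follows. This is an illustration under assumptions, not the actual update-baseline code: the function name `summarize_gateway_sha`, the flat lane-to-SHA mapping used as input, and the comment wording are all hypothetical.

```python
from typing import Dict, Tuple

UNKNOWN = "unknown"


def summarize_gateway_sha(lane_shas: Dict[str, str]) -> Tuple[str, str]:
    """Return (top_level_sha, comment) from per-lane gateway_sha values."""
    # Top-level value: the first lane whose SHA is not 'unknown'.
    top_level = next(
        (sha for sha in lane_shas.values() if sha != UNKNOWN), UNKNOWN
    )

    distinct = set(lane_shas.values())
    if len(distinct) == 1 and top_level != UNKNOWN:
        # All lanes agree on one real commit: name it directly.
        comment = f"All lanes measured at gateway {top_level}."
    else:
        # Lanes disagree (or none is known): list the per-lane values.
        per_lane = ", ".join(f"{lane}={sha}" for lane, sha in lane_shas.items())
        comment = f"Per-lane gateway_sha: {per_lane}."
    return top_level, comment


print(summarize_gateway_sha({"footprint": "bdcc4ed", "scaling": "bdcc4ed"}))
# ('bdcc4ed', 'All lanes measured at gateway bdcc4ed.')
print(summarize_gateway_sha({"footprint": "unknown", "scaling": "bdcc4ed"}))
# ('bdcc4ed', 'Per-lane gateway_sha: footprint=unknown, scaling=bdcc4ed.')
```

The sketch relies on Python dictionaries preserving insertion order, so "first non-unknown lane" means first in the order the lanes were recorded.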
@bburda bburda force-pushed the chore/update-benchmark-baseline branch from a7bb73b to 16bf727 Compare June 23, 2026 13:35
@bburda bburda merged commit 7cdaccb into main Jun 23, 2026
5 checks passed

Development

Successfully merging this pull request may close these issues.

Refresh benchmark CI baseline to latest gateway main
