chore(benchmark): refresh CI baseline to latest gateway main #65
Merged
Pull request overview
Updates the committed benchmark CI baseline JSON to reflect the latest measurements from gateway main (bdcc4ed), keeping CI’s compare step aligned with current performance characteristics on the same host fingerprint.
Changes:
- Refresh `benchmark/baseline/ci.json` with new footprint and scaling metrics from run `20260622-220119`.
- Update the baseline `gateway_sha` and host memory total to match the new run metadata.
Re-measured the footprint and scaling lanes on the same host as the previous baseline (Intel i7-10700K, 16 cores, glibc) against gateway main bdcc4ed, and stored the result as the ci baseline.
- footprint USS median: 90.8 -> 53.0 MiB
- footprint threads: 53 -> 31
- footprint CPU-cores: 0.195 -> 0.246
- scaling exponent: 0.456 -> 0.293 (95% CI [0.03, 0.55])
- gateway_sha now recorded for all lanes including footprint (the old footprint entry was 'unknown')

compare passes against the new baseline (no regressions).
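The old -> new values in this commit message can be turned into relative changes with a few lines of Python. This is only a sketch for checking the arithmetic: the dictionary keys and function names are made up for illustration and are not the benchmark tool's actual field names, and no WARN or regression thresholds are modelled.

```python
# Old -> new values copied from the commit message.
# The key names are hypothetical labels, not fields from ci.json.
BASELINE_DELTAS = {
    "footprint_uss_median_mib": (90.8, 53.0),
    "footprint_threads": (53, 31),
    "footprint_cpu_cores": (0.195, 0.246),
    "scaling_exponent": (0.456, 0.293),
}


def relative_change_pct(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def summarize(deltas):
    """Map each metric name to its relative change, rounded to one decimal."""
    return {
        name: round(relative_change_pct(old, new), 1)
        for name, (old, new) in deltas.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(BASELINE_DELTAS).items():
        print(f"{name}: {pct:+.1f}%")
```

Run as a script, it prints one signed percentage per metric; negative values mean the new baseline is lower than the old one.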
update-baseline wrote a fixed comment claiming the footprint gateway_sha may be unknown, regardless of the actual run. Collect gateway_sha from every lane's run_metadata and write a comment that states the real source: one measured commit when all lanes agree, or the per-lane values otherwise. Take the top-level gateway_sha from the first non-unknown lane. Regenerate ci.json: the comment now names the measured commit; the metrics are unchanged.
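The behaviour described in this commit message could look roughly like the sketch below. It assumes a hypothetical in-memory shape where each lane is a dict with a `run_metadata` dict inside it; the function names, the lane layout, and the comment wording are all illustrative and are not taken from the actual `update-baseline` code.

```python
def collect_lane_shas(lanes):
    """Map lane name -> gateway_sha from its run_metadata ('unknown' if absent).

    The lanes layout ({name: {"run_metadata": {...}}}) is an assumption.
    """
    return {
        name: lane.get("run_metadata", {}).get("gateway_sha", "unknown")
        for name, lane in lanes.items()
    }


def top_level_sha(lane_shas):
    """Return the first lane SHA that is not 'unknown', else 'unknown'."""
    for sha in lane_shas.values():
        if sha != "unknown":
            return sha
    return "unknown"


def baseline_comment(lane_shas):
    """State one measured commit when all lanes agree, else per-lane values."""
    distinct = set(lane_shas.values())
    if len(distinct) == 1:
        return f"all lanes measured at gateway_sha {next(iter(distinct))}"
    per_lane = ", ".join(f"{name}={sha}" for name, sha in lane_shas.items())
    return f"lanes measured at different gateway_sha values: {per_lane}"


if __name__ == "__main__":
    lanes = {
        "footprint": {"run_metadata": {"gateway_sha": "bdcc4ed"}},
        "scaling": {"run_metadata": {"gateway_sha": "bdcc4ed"}},
    }
    shas = collect_lane_shas(lanes)
    print(top_level_sha(shas))
    print(baseline_comment(shas))
```

Deriving the comment from the collected values, rather than writing a fixed string, is what keeps it from going stale when a later run does record a SHA for every lane.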
a7bb73b to 16bf727
mfaferek93 approved these changes on Jun 23, 2026
Description
Refresh the benchmark CI baseline (`benchmark/baseline/ci.json`) to gateway main `bdcc4ed`. The footprint and scaling lanes were re-measured on the same host as the previous baseline (Intel i7-10700K, 16 cores, glibc), so host-keyed `compare` stays valid.

Numbers (old baseline `8569f21` -> new `bdcc4ed`):
- footprint USS median: 90.8 -> 53.0 MiB
- footprint threads: 53 -> 31
- footprint CPU-cores: 0.195 -> 0.246
- scaling exponent: 0.456 -> 0.293 (95% CI [0.03, 0.55])

Notes:
- `gateway_sha` is now recorded for every lane, including footprint. The old footprint entry had `gateway_sha = unknown` because the demo image predated SHA capture.
- `compare` against the new baseline passes with no regressions. Footprint CPU is up ~26% and shows as WARN, not a regression. The config sweep confirms `refresh_interval_ms` is the dominant CPU factor, and the demo runs at a 10s refresh, so this is a small same-config difference, not a refresh spike.

Related Issue
closes #67
Checklist
`compare` against the new baseline passes)