fix(bench-arena): measure memory churn by allocation sampler, not an uncollected snapshot #407
Merged
…uncollected snapshot

The memory dimension read churn as the post-burst usedSize delta with no GC before the snapshot, unlike retained and leak which both collect first. So churn measured whatever transient garbage V8 had not yet swept at that instant: GC scheduling, not allocation. A cell could read tens of MB uncollected while the burst's truly-retained memory was under 100KB (collect first and it is), so the figure swung on snapshot timing, not on cost.

Wrap the burst in the CDP allocation sampler (HeapProfiler.startSampling / stopSampling) and sum the sampled bytes instead. The sampler records allocations as they happen, independent of when they are collected, so churn reflects what a typing burst allocates, GC-timing-independent. retained and leak are untouched (they already collect before every snapshot).

Re-bases every library's churn in results.json on the next CI bench refresh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
The benchmark's memory churn metric read an uncollected heap snapshot, so it measured V8's GC scheduling rather than allocation. This measures it with the CDP allocation sampler instead. retained and leak are unchanged.
The bug
In the memory dimension, each cycle reads heap snapshots to derive three figures. retained and
leak run a double-GC before their snapshots; churn did not, reading usedSize immediately after
the keystroke burst (s2 - s1). So churn captured whatever transient garbage V8 had not swept yet at
that instant.
That is GC-scheduling noise, not a memory cost. On a single grid cell, churn could read tens of MB
uncollected, yet collecting before the snapshot showed the burst truly retains under 100KB (and
retained/leak, which already collect first, confirmed the live cost is at parity). Same burst, same
real memory, orders of magnitude apart on a snapshot-timing accident. It quietly skewed the churn
column for the whole cohort.
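To make the failure mode concrete, here is a toy model of the two ways of reading the burst. It is not the benchmark's code: the ToyHeap type, the function names, and all byte figures are hypothetical, chosen only to echo the "tens of MB versus under 100KB" gap described above.

```typescript
// Toy model only: a heap split into reachable bytes and garbage not yet swept.
interface ToyHeap {
  live: number;    // bytes still reachable
  unswept: number; // dead bytes the collector has not reclaimed yet
}

const MB = 1000000;
const KB = 1000;

// What a usedSize-style reading reports: everything still occupying the heap.
function usedSize(heap: ToyHeap): number {
  return heap.live + heap.unswept;
}

// Model of forcing a full collection before reading: unswept garbage disappears.
function collect(heap: ToyHeap): ToyHeap {
  return { live: heap.live, unswept: 0 };
}

// A burst allocates `allocated` bytes and keeps `kept` of them reachable.
// `sweptFraction` models how much of the garbage happened to be swept
// by the time the snapshot is taken (0 = none, 1 = all).
function runBurst(
  heap: ToyHeap,
  allocated: number,
  kept: number,
  sweptFraction: number,
): ToyHeap {
  const garbage = allocated - kept;
  return {
    live: heap.live + kept,
    unswept: heap.unswept + garbage * (1 - sweptFraction),
  };
}

const before: ToyHeap = { live: 10 * MB, unswept: 0 };

// Same burst, two snapshot timings (hypothetical figures).
const unluckyTiming = runBurst(before, 20 * MB, 80 * KB, 0);
const luckyTiming = runBurst(before, 20 * MB, 80 * KB, 1);

// Old churn reading: s2 - s1 with no collection before s2.
const uncollectedDeltas = [unluckyTiming, luckyTiming].map(
  (after) => usedSize(after) - usedSize(before),
);

// retained/leak-style reading: collect before each snapshot.
const collectedDeltas = [unluckyTiming, luckyTiming].map(
  (after) => usedSize(collect(after)) - usedSize(collect(before)),
);
```

In this model the uncollected delta swings between 20MB and 80KB for an identical burst, while the collected delta is 80KB in both cases, which is the snapshot-timing accident the paragraph describes.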
The fix
Wrap the burst in HeapProfiler.startSampling/stopSampling and sum the sampled allocation. The
sampler records allocations as they happen, independent of when they are collected, so churn now
reflects the bytes a typing burst allocates, GC-timing-independent. retained and leak are untouched
(they already collect before every snapshot).
This was the only metric with the issue: it was the lone uncollected snapshot, and the timed
dimensions measure elapsed time, not heap.
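A minimal sketch of the sampler approach follows; it is not the code in this PR. The CdpSession interface, measureChurn, and sumSampledBytes are hypothetical names, and the session type is a stand-in to be adapted to whichever CDP client the bench uses. The node shape (selfSize, children) follows the CDP SamplingHeapProfileNode type. The sketch also assumes the includeObjectsCollectedByMajorGC and includeObjectsCollectedByMinorGC options of HeapProfiler.startSampling are available and are what keeps already-collected allocations in the profile; whether the PR passes them is not stated here.

```typescript
// Subset of the CDP SamplingHeapProfileNode shape used by this sketch.
interface SamplingHeapProfileNode {
  selfSize: number; // sampled allocation bytes attributed to this node, excluding children
  children: SamplingHeapProfileNode[];
}

interface SamplingHeapProfile {
  head: SamplingHeapProfileNode;
}

// Hypothetical minimal session type; adapt to the real CDP client in use.
interface CdpSession {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

// Walk the call tree iteratively and add up every node's selfSize.
function sumSampledBytes(head: SamplingHeapProfileNode): number {
  let total = 0;
  const stack: SamplingHeapProfileNode[] = [head];
  while (stack.length > 0) {
    const node = stack.pop()!;
    total += node.selfSize;
    stack.push(...node.children);
  }
  return total;
}

// Wrap a burst in the allocation sampler and return the sampled byte total.
async function measureChurn(
  session: CdpSession,
  runBurst: () => Promise<void>,
): Promise<number> {
  await session.send("HeapProfiler.enable");
  await session.send("HeapProfiler.startSampling", {
    // Assumption: keep allocations that were collected before stopSampling.
    includeObjectsCollectedByMajorGC: true,
    includeObjectsCollectedByMinorGC: true,
  });
  await runBurst();
  const result = (await session.send("HeapProfiler.stopSampling")) as {
    profile: SamplingHeapProfile;
  };
  return sumSampledBytes(result.profile.head);
}
```

Because the profiler samples rather than records every allocation, the total is a statistical estimate of bytes allocated during the burst, not an exact count.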
Validation
A local grid N100M8 memory run across the full cohort now reports stable, sane churn proportional to
each library's typing allocation (roughly 50 to 330KB), with no snapshot-timing spikes. Attaform's
churn reads about 327KB, down from one to two MB of uncollected noise before. Bench typecheck
(vue-tsc), eslint, and prettier all pass.
Sequencing
Changing how churn is measured re-bases every library's churn in results.json. The committed numbers
update on the next CI bench refresh (the monthly sharded workflow); the docs render the current numbers
until then.
🤖 Generated with Claude Code