stream: reduce allocations on WHATWG streams hot paths#63876
Conversation
MattiasBuelens
left a comment
There was a problem hiding this comment.
Looks pretty sensible to me! 👍
| // No read is in flight. Mirror the buffered fast path of | ||
| // ReadableStreamDefaultReader.read(): when data is already queued | ||
| // in a default controller, resolve immediately without allocating | ||
| // a read request. The result settles synchronously, so leaving | ||
| // state.current undefined matches the state the slow path reaches | ||
| // once its read request callbacks have settled. |
There was a problem hiding this comment.
I wonder if we always have to inline this... 🤔
I ported some of your previous optimizations to web-streams-polyfill, but instead of copying the code around I made defaultReader.read() create a different kind of ReadRequest if it knows that it will be resolved synchronously.
Perhaps we can do the same here, and have nextSteps create a different kind of AsyncIteratorReadRequest if it can be resolved synchronously? (I forgot to do that in my polyfill, it seems. 😅 ) Or would that risk turning readableStreamDefaultReaderRead megamorphic? (Or maybe it already is?)
| queueMicrotask(() => { | ||
| controller[kState].started = true; | ||
| assert(!controller[kState].pulling); | ||
| assert(!controller[kState].pullAgain); | ||
| readableStreamDefaultControllerCallPullIfNeeded(controller); | ||
| }); |
There was a problem hiding this comment.
Nit-pick: maybe pull this callback into a const, so we can reuse it for the PromisePrototypeThen below?
|
Updated benchmarks after the regression fix: |
| // each known call-site arity gets its own wrapper. The exact number of | ||
| // arguments passed through to the user callback is observable and must be | ||
| // preserved. | ||
| function createPromiseCallback0(name, fn, thisArg) { |
There was a problem hiding this comment.
Can we give them more readable names? e.g. createPromiseCallbackNoParams, createPromiseCallback1Param, createPromiseCallback2Params
It would probably makes sense to split this into its own PR
There was a problem hiding this comment.
I would actually prefer not to. This is relatively small, that fighting CI only once would save a significant amount of time.
There was a problem hiding this comment.
I've opened #63909. The time you try to save will be tech debts for backporters, I'd much rather pay it now
There was a problem hiding this comment.
I kinda prefer the better names now also
There was a problem hiding this comment.
I'm ok with changing the names. I don't think doing that change without the rest of this PR is useful.
6d140aa to
92194f3
Compare
Pure-JavaScript optimizations to lib/internal/webstreams/* that reduce per-chunk and per-construction allocations on hot paths without observable behavior change. Per-chunk: reuse promise-reaction closures per controller, add buffered fast path for async iterator, specialize callback wrappers by arity, and share immutable nil records for writable stream resets. Per-construction: use queueMicrotask for non-object start results, materialize reader/writer .closed and .ready records lazily, and remove dead allocations. Assisted-by: Claude Fable 5 Signed-off-by: Matteo Collina <hello@matteocollina.com>
92194f3 to
da5b0ee
Compare
| // Arity-specialized variants of the promise-callback wrapper. The generic | ||
| // rest-parameter + ReflectApply form allocated an arguments array on every | ||
| // invocation; these run on per-chunk hot paths (pull/write/transform), so | ||
| // each known call-site arity gets its own wrapper. The exact number of | ||
| // arguments passed through to the user callback is observable and must be | ||
| // preserved. | ||
| function createPromiseCallbackNoParams(name, fn, thisArg) { |
There was a problem hiding this comment.
Given that it seems to not have a significant effect on the benchmark, consider reverting that so this PR doesn't conflict with #63909
There was a problem hiding this comment.
It does after these other optimizations are applied.
There was a problem hiding this comment.
I'll rebase this after that land
Pure-JavaScript allocation reductions on the WHATWG streams hot paths partially based on the findings of #63872 : reused promise-reaction closures per controller (pull/write), a buffered fast path in the async iterator,
queueMicrotask()for non-thenable start results, arity-specialized algorithm wrappers, shared nil state records, and removal of several dead per-instance allocations. No observable behavior change: WPT streams/compression/encoding results are identical tomain(same subtests passing, same 8 expected failures by name).Benchmark results (
benchmark/compare.js --runs 10, both binaries built the same day from the same toolchain):The
creation.jsrows at the stockn=50000measure a 20-40ms window and are unreliable; re-run at--set n=500000: