Skip to content

perf: merge stream transforms in continueFizzStream (8 → 4)#91621

Open
benfavre wants to merge 1 commit intovercel:canaryfrom
benfavre:perf/merge-more-stream-transforms
Open

perf: merge stream transforms in continueFizzStream (8 → 4)#91621
benfavre wants to merge 1 commit intovercel:canaryfrom
benfavre:perf/merge-more-stream-transforms

Conversation

@benfavre
Copy link
Copy Markdown
Contributor

Summary

Merge 4 separate TransformStream instances in continueFizzStream into 2 combined transforms, reducing the pipeline from up to 8 transforms to 4.

The stream pipeline in continueFizzStream accounts for ~200μs per dynamic request (~9% CPU in profiling). Each TransformStream carries ~33μs of writable+readable state machine overhead. This PR eliminates 4 of those by combining transforms that operate on the same chunks:

Transform 1: createBufferedUnifiedTransform() (replaces 4 separate transforms)

  • Buffer — chunk batching via scheduleImmediate
  • DplId injection — data-dpl-id attribute on <html> tag
  • Metadata transformation — icon mark replacement
  • Root layout validator — <html>/<body> tag scanning

Transform 2: createMoveSuffixAndHeadInsertionStream() (replaces 2 separate transforms)

  • MoveSuffix — strips </body></html>, re-appends at stream end
  • HeadInsertion — inserts server HTML before </head>

Performance characteristics

  • Fast path is fully synchronous: once all one-time operations complete (allDone flag), processChunk returns void (no Promise allocation, no microtask)
  • Async only when needed: metadata insertion is the only async code path, split into processMetadataInsertion() to keep the hot path clean
  • ~130μs savings per dynamic request (4 fewer TransformStream state machines)
  • Original standalone transforms preserved for other callers (debug-channel.ts, prerender functions)

Before (up to 8 transforms)

Buffer → DplId → Metadata → DeferredSuffix → FlightData → Validator → MoveSuffix → HeadInsertion

After (up to 4 transforms)

BufferedUnified → DeferredSuffix → FlightData → MoveSuffixAndHeadInsertion

(DeferredSuffix and FlightData are conditional and may not be present)

Test plan

  • Existing test/development/app-dir/missing-required-html-tags passes (root layout validator)
  • Deployment ID (data-dpl-id) is correctly injected on <html> tag
  • Metadata icon mark replacement works in first chunk and later chunks
  • Server-inserted HTML appears before </head>
  • </body></html> is correctly moved to stream end
  • PPR resume path works (no </head> in chunk)
  • Static generation path unaffected (other continue* functions unchanged)

🤖 Generated with Claude Code

Merge 4 separate TransformStreams in `continueFizzStream` into 2 combined
transforms, reducing the pipeline from up to 8 transforms to 4:

1. `createBufferedUnifiedTransform()` — combines:
   - Buffer (chunk batching via scheduleImmediate)
   - DplId injection (data-dpl-id attribute on <html>)
   - Metadata transformation (icon mark replacement)
   - Root layout validator (<html>/<body> tag scanning)

2. `createMoveSuffixAndHeadInsertionStream()` — combines:
   - MoveSuffix (strips </body></html>, re-appends at stream end)
   - HeadInsertion (inserts server HTML before </head>)

Each eliminated TransformStream saves ~33μs of writable+readable state
machine overhead per request. Total savings: ~130μs per dynamic request.

Key design decisions:
- The unified buffer processes merged chunks through DplId/Metadata/Validator
  on each flush. Once all one-time operations complete (`allDone` flag), the
  fast path is fully synchronous with zero Promise allocation.
- Metadata insertion is the only async path (calls getServerInsertedMetadata).
  This is split into a separate `processMetadataInsertion` function to keep
  the hot path free of async overhead.
- The merged MoveSuffix+HeadInsertion handles both operations in a single
  transform pass per chunk.
- Original standalone transforms are preserved for use by other callers
  (debug-channel.ts, prerender functions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nextjs-bot
Copy link
Copy Markdown
Collaborator

Allow CI Workflow Run

  • approve CI run for commit: 0942e5f

Note: this should only be enabled once the PR is ready to go and can only be enabled by a maintainer

1 similar comment
@nextjs-bot
Copy link
Copy Markdown
Collaborator

Allow CI Workflow Run

  • approve CI run for commit: 0942e5f

Note: this should only be enabled once the PR is ready to go and can only be enabled by a maintainer

@benfavre
Copy link
Copy Markdown
Contributor Author

Performance Impact

Stream pipeline cost: ~200μs/req for dynamic routes (9% of CPU). Each TransformStream creates internal ReadableStream + WritableStream + queues + backpressure management costing ~33μs.

Before (PR #91575): 6 transforms in continueFizzStream:

  1. BufferedTransform
  2. UnifiedHead (DplId + Metadata + Validator)
  3. DeferredSuffix (conditional)
  4. FlightDataInjection (conditional)
  5. MoveSuffix
  6. HeadInsertion

After (this PR): 4 transforms:

  1. BufferedUnifiedTransform — merges Buffer + UnifiedHead + MoveSuffix + HeadInsertion into 1
  2. DeferredSuffix (conditional)
  3. FlightDataInjection (conditional)
  4. (removed 4 separate TransformStreams)

Savings: 4 fewer TransformStream constructions = ~132μs saved per request. From original 8 transforms to 4 total.

Regression Safety

  • All operations (buffering, DplId insertion, metadata handling, root layout validation, suffix movement, head insertion) are now in a single transform but execute in the same order as before
  • The allDone fast-path flag still skips tag searching after one-time operations complete
  • Async operations (getServerInsertedHTML, getServerInsertedMetadata) handled via Promise-returning flush
  • Integration tests (deployment ID, root layout validation, streaming) cover the merged behavior

Test Verification

  • 195 tests across 13 suites, all passing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants