
feat(http3): optimize stream sending to avoid blocked streams #3015

Draft

larseggert wants to merge 5 commits into mozilla:main from larseggert:fix-261
Conversation

@larseggert
Collaborator

@larseggert commented Sep 26, 2025

Implements optimizations to address wasteful attempts to send data on flow-control blocked streams and fixes iteration patterns that could lose streams during re-addition.

Key improvements:

  • Add blocked_streams HashSet to track flow-control blocked streams separately
  • Modify stream_has_pending_data() to skip adding blocked streams to pending queue
  • Add check_blocked_streams() method to automatically move unblocked streams back

Benefits:

  • Reduces CPU usage by eliminating repeated attempts on blocked streams
  • Improves responsiveness by automatically detecting when streams become unblocked

Closes #261
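The tracking described above can be sketched as a minimal model. The type and method names below are illustrative, not neqo's actual API; `avail` stands in for a `stream_avail_send_space`-style lookup on the connection.

```rust
use std::collections::HashSet;

// Minimal model of the PR's idea: keep flow-control blocked streams in a
// separate set so the pending-data queue only holds streams that can make
// progress. Illustrative names only.
#[derive(Default)]
struct SendQueues {
    streams_with_pending_data: HashSet<u64>,
    blocked_streams: HashSet<u64>,
}

impl SendQueues {
    /// Queue a stream for sending, unless it is known to be blocked.
    fn mark_pending(&mut self, stream_id: u64) {
        if !self.blocked_streams.contains(&stream_id) {
            self.streams_with_pending_data.insert(stream_id);
        }
    }

    /// Record that a send attempt hit flow control.
    fn mark_blocked(&mut self, stream_id: u64) {
        self.streams_with_pending_data.remove(&stream_id);
        self.blocked_streams.insert(stream_id);
    }

    /// Move streams that now have send space back to the pending set.
    /// (The real code also checks whether the stream still has data.)
    fn check_blocked(&mut self, avail: impl Fn(u64) -> usize) {
        let unblocked: Vec<u64> = self
            .blocked_streams
            .iter()
            .copied()
            .filter(|&id| avail(id) > 0)
            .collect();
        for id in unblocked {
            self.blocked_streams.remove(&id);
            self.streams_with_pending_data.insert(id);
        }
    }
}
```

The point of the split is that a blocked stream is never re-queued by `mark_pending`, so the send loop stops burning cycles on streams that cannot make progress.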

…ed streams

Implements optimizations to address wasteful attempts to send data on
flow-control blocked streams and fixes iteration patterns that could lose
streams during re-addition.

Key improvements:
- Add blocked_streams HashSet to track flow-control blocked streams separately
- Modify stream_has_pending_data() to skip adding blocked streams to pending queue
- Replace mem::take() with single-stream iteration to handle re-additions properly
- Add check_blocked_streams() method to automatically move unblocked streams back
- Use HashSet::retain() for zero-allocation in-place stream filtering
- Clean up both pending and blocked sets when streams are removed

Benefits:
- Reduces CPU usage by eliminating repeated attempts on blocked streams
- Improves responsiveness by automatically detecting when streams become unblocked

Closes mozilla#261
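The `HashSet::retain()` point above amounts to filtering the set in place, with no temporary collection allocated. A minimal sketch; `still_open` is a hypothetical predicate standing in for a `send_streams` lookup:

```rust
use std::collections::HashSet;

// Drop entries for streams that no longer exist, in place.
// HashSet::retain visits each element once and removes those for
// which the predicate returns false.
fn prune_closed(blocked: &mut HashSet<u64>, still_open: impl Fn(u64) -> bool) {
    blocked.retain(|&id| still_open(id));
}
```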
Copilot AI review requested due to automatic review settings September 26, 2025 10:19
@larseggert changed the title from "feat(http3): optimize stream sending to avoid repeatedly trying block…" to "feat(http3): optimize stream sending to avoid blocked streams" Sep 26, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR optimizes HTTP/3 stream sending by introducing flow-control aware stream management to avoid repeatedly attempting to send data on blocked streams.

  • Adds a separate blocked_streams HashSet to track flow-control blocked streams
  • Implements check_blocked_streams() method to automatically detect when blocked streams become unblocked
  • Refactors stream iteration from mem::take() to single-stream processing to handle re-additions properly


@codecov

codecov bot commented Sep 26, 2025

Codecov Report

❌ Patch coverage is 81.74603% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.34%. Comparing base (b9c32c7) to head (95245b9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3015      +/-   ##
==========================================
- Coverage   93.41%   93.34%   -0.07%     
==========================================
  Files         124      124              
  Lines       36234    36350     +116     
  Branches    36234    36350     +116     
==========================================
+ Hits        33847    33930      +83     
- Misses       1540     1575      +35     
+ Partials      847      845       -2     
| Component | Coverage | Δ |
|---|---|---|
| neqo-common | 97.32% <ø> | (ø) |
| neqo-crypto | 83.25% <ø> | (-0.48%) ⬇️ |
| neqo-http3 | 93.17% <81.74%> | (-0.12%) ⬇️ |
| neqo-qpack | 94.18% <ø> | (ø) |
| neqo-transport | 94.48% <ø> | (-0.01%) ⬇️ |
| neqo-udp | 78.94% <ø> | (-0.48%) ⬇️ |
| mtu | 85.76% <ø> | (ø) |

…²) performance

## Summary
Implements optimizations to eliminate wasteful attempts to send data on flow-control blocked streams and addresses performance issues identified during code review.

## Key Improvements

### 1. Blocked Stream Tracking
- Add `blocked_streams` HashSet to track flow-control blocked streams separately
- Modify `stream_has_pending_data()` to skip adding blocked streams to pending queue
- Add `check_blocked_streams()` method to automatically move unblocked streams back to pending
- Clean up both pending and blocked sets when streams are removed

### 2. Performance Optimizations
- Use `mem::take()` pattern in `send_non_control_streams()` to avoid O(n²) iteration
- Implement batching in `check_blocked_streams()` to avoid expensive per-stream calls during iteration
- Properly handle streams that get re-added during processing
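The `mem::take()` pattern above swaps the pending set out, iterates over that snapshot, and lets streams re-add themselves to the (now empty) live set without being lost or re-processed in the same pass. A minimal sketch with illustrative names:

```rust
use std::collections::HashSet;

// Process a snapshot of the pending set. `send_one` stands in for the
// per-stream send attempt and may re-insert a stream into `pending`
// (e.g. if it still has data after a partial send).
fn process_pending(
    pending: &mut HashSet<u64>,
    mut send_one: impl FnMut(u64, &mut HashSet<u64>),
) {
    // Take ownership of the current contents, leaving `pending` empty,
    // so re-additions during the loop land in the fresh set and are
    // picked up on the next pass instead of being iterated twice.
    let snapshot = std::mem::take(pending);
    for id in snapshot {
        send_one(id, pending);
    }
}
```

Compared with iterating the live set while mutating it (which Rust's borrow rules forbid anyway), this is a single O(n) pass per call.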

## Benefits
- **Reduces CPU usage** by eliminating repeated attempts on blocked streams
- **Improves responsiveness** by automatically detecting when streams become unblocked
- **Fixes algorithmic performance issues** in stream processing hot paths
- **Maintains backward compatibility** with all existing tests passing

Closes mozilla#261
@larseggert larseggert marked this pull request as ready for review September 26, 2025 13:28
Copilot AI review requested due to automatic review settings September 26, 2025 13:28
@larseggert larseggert requested a review from mxinden as a code owner September 26, 2025 13:28
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.



Comment on lines 401 to 429
```rust
let mut unblocked = Vec::new();
#[expect(
    clippy::iter_over_hash_type,
    reason = "OK to loop over streams in an undefined order."
)]
for &stream_id in &self.blocked_streams {
    match conn.stream_avail_send_space(stream_id) {
        Ok(0) => {
            // Still blocked, do nothing.
        }
        Ok(_) => {
            // Unblocked, collect for removal and move to pending data if needed.
            unblocked.push(stream_id);
        }
        Err(_) => {
            // Stream no longer exists, collect for removal.
            unblocked.push(stream_id);
        }
    }
}
// Remove all unblocked streams from blocked_streams.
for stream_id in unblocked {
    self.blocked_streams.remove(&stream_id);
    if let Some(stream) = self.send_streams.get(&stream_id) {
        if stream.has_data_to_send() {
            self.streams_with_pending_data.insert(stream_id);
        }
    }
}
```

Copilot AI Sep 26, 2025


The method iterates over all blocked streams on every call, which could be expensive with many blocked streams. Consider optimizing by only checking blocked streams when flow control updates are received, or implementing a more efficient unblocking mechanism.
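One way to realize this suggestion is to react to per-stream writable notifications rather than rescanning the whole blocked set on every send pass. This is a hypothetical sketch, not code from the PR; `on_stream_writable` is an invented handler name:

```rust
use std::collections::HashSet;

// React to a "this stream has send space again" notification:
// O(1) per event instead of O(|blocked|) per send pass.
fn on_stream_writable(
    stream_id: u64,
    blocked: &mut HashSet<u64>,
    pending: &mut HashSet<u64>,
) {
    // Only streams we previously marked blocked are moved; notifications
    // for other streams are ignored.
    if blocked.remove(&stream_id) {
        pending.insert(stream_id);
    }
}
```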

Member

@mxinden mxinden left a comment


This increases the amount of state tracking, i.e., it does not come for free, both in terms of complexity and potentially in terms of compute. Unless we have a benchmark showing this improves performance, i.e., showing that the additional tracking is cheaper than the continuous attempts to send on a blocked stream, I don't think we should merge.

Copilot AI review requested due to automatic review settings September 29, 2025 12:15
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.



```rust
for stream_id in unblocked {
    self.blocked_streams.remove(&stream_id);
    if let Some(stream) = self.send_streams.get(&stream_id) {
        if stream.has_data_to_send() {
```

Copilot AI Sep 29, 2025


This creates a potential infinite loop. When mark_stream_for_sending() is called, it adds the stream to streams_with_pending_data, which will cause it to be processed again in send_streams_with_pending_data(). If the stream becomes blocked again, it will be moved back to blocked_streams, then check_blocked_streams() will move it back to pending, creating a cycle.

Suggested change:

```diff
-if stream.has_data_to_send() {
+if stream.has_data_to_send() && !self.streams_with_pending_data.contains(&stream_id) {
```

Comment on lines +451 to +459
```rust
match conn.stream_avail_send_space(stream_id) {
    Ok(0) => {
        // No space available, stream is likely blocked
        self.blocked_streams.insert(stream_id);
    }
    Ok(_) => {
        // Space available, stream can continue sending
        self.streams_with_pending_data.insert(stream_id);
    }
```

Copilot AI Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic will re-add a stream that was just processed back to streams_with_pending_data in the same iteration, potentially causing it to be processed again immediately. This could lead to inefficient repeated processing of the same stream within a single send_streams_with_pending_data() call.

@github-actions
Contributor

github-actions bot commented Sep 29, 2025

🐰 Bencher Report

Branch: fix-261
Testbed: On-prem

| Benchmark | Result (ns) | Δ% | Baseline (ns) | Upper Boundary (ns) | Limit % |
|---|---|---|---|---|---|
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 208,660,000.00 | +0.92% | 206,766,840.58 | 216,469,067.65 | 96.39% |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 204,780,000.00 | +1.93% | 200,912,115.94 | 211,182,472.97 | 96.97% |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 38,595,000.00 | +21.90% | 31,661,324.64 | 42,922,891.79 | 89.92% |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 290,440,000.00 | -0.32% | 291,370,840.58 | 303,634,212.32 | 95.65% |
| 1-streams/each-1000-bytes/simulated-time | 118,970,000.00 | +0.25% | 118,674,550.72 | 120,712,253.92 | 98.56% |
| 1-streams/each-1000-bytes/wallclock-time | 586,310.00 | -1.07% | 592,642.00 | 615,476.24 | 95.26% |
| 1000-streams/each-1-bytes/simulated-time | 2,333,300,000.00 | -82.70% | 13,490,980,579.71 | 23,068,540,082.26 | 10.11% |
| 1000-streams/each-1-bytes/wallclock-time | 12,511,000.00 | -9.61% | 13,841,753.62 | 15,084,963.56 | 82.94% |
| 1000-streams/each-1000-bytes/simulated-time | 16,476,000,000.00 | -11.70% | 18,659,620,289.86 | 20,618,474,175.45 | 79.91% |
| 1000-streams/each-1000-bytes/wallclock-time | 49,854,000.00 | -1.61% | 50,669,866.67 | 57,072,130.62 | 87.35% |
| RxStreamOrderer::inbound_frame() | 110,340,000.00 | +0.59% | 109,696,956.52 | 111,590,965.33 | 98.88% |
| coalesce_acked_from_zero 1+1 entries | 89.64 | +0.87% | 88.87 | 90.13 | 99.45% |
| coalesce_acked_from_zero 10+1 entries | 105.74 | -0.32% | 106.07 | 107.22 | 98.62% |
| coalesce_acked_from_zero 1000+1 entries | 92.08 | +1.94% | 90.32 | 94.94 | 96.98% |
| coalesce_acked_from_zero 3+1 entries | 106.54 | -0.05% | 106.59 | 107.68 | 98.95% |
| decode 1048576 bytes, mask 3f | 1,760,900.00 | +7.70% | 1,634,959.71 | 1,809,989.32 | 97.29% |
| decode 1048576 bytes, mask 7f | 5,048,300.00 | -0.36% | 5,066,487.54 | 5,113,021.54 | 98.73% |
| decode 1048576 bytes, mask ff | 3,003,000.00 | -0.87% | 3,029,363.77 | 3,053,866.08 | 98.33% |
| decode 4096 bytes, mask 3f | 6,252.20 | -15.02% | 7,357.02 | 10,374.04 | 60.27% |
| decode 4096 bytes, mask 7f | 19,633.00 | -0.86% | 19,802.48 | 20,464.83 | 95.94% |
| decode 4096 bytes, mask ff | 11,341.00 | -0.21% | 11,365.41 | 12,521.98 | 90.57% |
| sent::Packets::take_ranges | 4,560.80 | -3.38% | 4,720.32 | 4,959.89 | 91.95% |
| transfer/pacing-false/same-seed/simulated-time/run | 25,234,000,000.00 | -0.69% | 25,410,201,166.18 | 26,029,155,392.72 | 96.95% |
| transfer/pacing-false/same-seed/wallclock-time/run | 24,777,000.00 | -3.53% | 25,684,667.64 | 27,106,786.37 | 91.41% |
| transfer/pacing-false/varying-seeds/simulated-time/run | 25,190,000,000.00 | +0.06% | 25,175,405,247.81 | 25,224,533,916.06 | 99.86% |
| transfer/pacing-false/varying-seeds/wallclock-time/run | 25,040,000.00 | -2.83% | 25,768,239.07 | 27,412,870.51 | 91.34% |
| transfer/pacing-true/same-seed/simulated-time/run | 25,301,000,000.00 | -1.08% | 25,577,483,965.01 | 25,883,944,847.15 | 97.75% |
| transfer/pacing-true/same-seed/wallclock-time/run | 26,401,000.00 | -2.33% | 27,030,889.21 | 28,602,554.35 | 92.30% |
| transfer/pacing-true/varying-seeds/simulated-time/run | 25,016,000,000.00 | +0.08% | 24,995,355,685.13 | 25,044,029,996.46 | 99.89% |
| transfer/pacing-true/varying-seeds/wallclock-time/run | 25,322,000.00 | -3.61% | 26,269,545.19 | 27,994,757.32 | 90.45% |
🐰 View full continuous benchmarking report in Bencher

Signed-off-by: Lars Eggert <lars@eggert.org>
@codspeed-hq

codspeed-hq bot commented Nov 12, 2025

CodSpeed Performance Report

Merging #3015 will improve performance by 11.16%

Comparing larseggert:fix-261 (95245b9) with main (b9c32c7)

Summary

⚡ 1 improvement
✅ 22 untouched

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|
| Simulation | client | 852.3 ms | 766.7 ms | +11.16% |

@github-actions
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to b9c32c7.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@github-actions
Contributor

Benchmark results

Performance differences relative to b9c32c7.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💔 Performance has regressed.
       time:   [204.47 ms 204.78 ms 205.09 ms]
       thrpt:  [487.58 MiB/s 488.32 MiB/s 489.06 MiB/s]
change:
       time:   [+1.6604% +1.8996% +2.1422%] (p = 0.00 < 0.05)
       thrpt:  [-2.0972% -1.8642% -1.6333%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💔 Performance has regressed.
       time:   [288.57 ms 290.44 ms 292.34 ms]
       thrpt:  [34.207 Kelem/s 34.431 Kelem/s 34.654 Kelem/s]
change:
       time:   [+1.0774% +1.9952% +2.9143%] (p = 0.00 < 0.05)
       thrpt:  [-2.8318% -1.9561% -1.0659%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [38.443 ms 38.595 ms 38.770 ms]
       thrpt:  [25.793   B/s 25.910   B/s 26.013   B/s]
change:
       time:   [-0.7411% -0.1602% +0.4553%] (p = 0.60 > 0.05)
       thrpt:  [-0.4532% +0.1604% +0.7467%]

Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: Change within noise threshold.
       time:   [208.37 ms 208.66 ms 208.99 ms]
       thrpt:  [478.50 MiB/s 479.25 MiB/s 479.91 MiB/s]
change:
       time:   [-0.7270% -0.5355% -0.3313%] (p = 0.00 < 0.05)
       thrpt:  [+0.3324% +0.5384% +0.7323%]

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.308 µs 11.341 µs 11.380 µs]
       change: [-1.0832% -0.4440% +0.0548%] (p = 0.14 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [2.9936 ms 3.0030 ms 3.0141 ms]
       change: [-0.3209% +0.1246% +0.5597%] (p = 0.61 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.582 µs 19.633 µs 19.691 µs]
       change: [-0.1588% +0.2884% +0.8085%] (p = 0.25 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low mild
2 (2.00%) high mild
14 (14.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.0337 ms 5.0483 ms 5.0657 ms]
       change: [-0.2913% +0.1011% +0.5009%] (p = 0.63 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
15 (15.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.2161 µs 6.2522 µs 6.2960 µs]
       change: [-0.0048% +0.4681% +1.0177%] (p = 0.08 > 0.05)

Found 18 outliers among 100 measurements (18.00%)
8 (8.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask 3f: Change within noise threshold.
       time:   [1.7581 ms 1.7609 ms 1.7664 ms]
       change: [-1.9950% -1.0072% -0.2316%] (p = 0.02 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

1-streams/each-1000-bytes/wallclock-time: No change in performance detected.
       time:   [585.12 µs 586.31 µs 587.71 µs]
       change: [-0.4693% +0.0600% +0.5569%] (p = 0.82 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
1-streams/each-1000-bytes/simulated-time: No change in performance detected.
       time:   [118.76 ms 118.97 ms 119.17 ms]
       thrpt:  [8.1944 KiB/s 8.2087 KiB/s 8.2232 KiB/s]
change:
       time:   [-0.5300% -0.2576% +0.0046%] (p = 0.05 > 0.05)
       thrpt:  [-0.0046% +0.2583% +0.5328%]

1000-streams/each-1-bytes/wallclock-time: 💚 Performance has improved.
       time:   [12.476 ms 12.511 ms 12.546 ms]
       change: [-2.5823% -2.1537% -1.7459%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
1000-streams/each-1-bytes/simulated-time: No change in performance detected.
       time:   [2.3292 s 2.3333 s 2.3374 s]
       thrpt:  [427.82 B/s 428.57 B/s 429.33 B/s]
change:
       time:   [-0.3136% -0.0696% +0.1533%] (p = 0.57 > 0.05)
       thrpt:  [-0.1530% +0.0697% +0.3146%]

1000-streams/each-1000-bytes/wallclock-time: 💚 Performance has improved.
       time:   [49.733 ms 49.854 ms 49.977 ms]
       change: [-1.9723% -1.4463% -1.0104%] (p = 0.00 < 0.05)
1000-streams/each-1000-bytes/simulated-time: No change in performance detected.
       time:   [16.214 s 16.476 s 16.737 s]
       thrpt:  [58.346 KiB/s 59.271 KiB/s 60.229 KiB/s]
change:
       time:   [-1.8031% +0.3896% +2.6943%] (p = 0.72 > 0.05)
       thrpt:  [-2.6236% -0.3881% +1.8362%]
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [89.258 ns 89.640 ns 90.018 ns]
       change: [-0.1880% +0.2282% +0.6918%] (p = 0.32 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
11 (11.00%) high mild
3 (3.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [106.18 ns 106.54 ns 106.93 ns]
       change: [-0.5614% +0.0503% +0.6192%] (p = 0.87 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) high mild
13 (13.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [105.41 ns 105.74 ns 106.15 ns]
       change: [-1.1771% -0.3829% +0.3972%] (p = 0.35 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [91.929 ns 92.075 ns 92.238 ns]
       change: [-0.0523% +0.6052% +1.3694%] (p = 0.09 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [110.14 ms 110.34 ms 110.66 ms]
       change: [+0.5859% +0.7739% +1.0999%] (p = 0.00 < 0.05)

Found 19 outliers among 100 measurements (19.00%)
2 (2.00%) low severe
13 (13.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe

sent::Packets::take_ranges: No change in performance detected.
       time:   [4.4608 µs 4.5608 µs 4.6527 µs]
       change: [-2.3333% +0.9104% +4.5284%] (p = 0.62 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

transfer/pacing-false/varying-seeds/wallclock-time/run: Change within noise threshold.
       time:   [24.988 ms 25.040 ms 25.101 ms]
       change: [+0.9575% +1.2453% +1.5561%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-false/varying-seeds/simulated-time/run: No change in performance detected.
       time:   [25.155 s 25.190 s 25.225 s]
       thrpt:  [162.38 KiB/s 162.61 KiB/s 162.83 KiB/s]
change:
       time:   [-0.1498% +0.0425% +0.2387%] (p = 0.67 > 0.05)
       thrpt:  [-0.2381% -0.0424% +0.1500%]
transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold.
       time:   [25.262 ms 25.322 ms 25.385 ms]
       change: [-1.0386% -0.7009% -0.3595%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/varying-seeds/simulated-time/run: No change in performance detected.
       time:   [24.980 s 25.016 s 25.052 s]
       thrpt:  [163.50 KiB/s 163.74 KiB/s 163.97 KiB/s]
change:
       time:   [-0.0809% +0.1287% +0.3384%] (p = 0.21 > 0.05)
       thrpt:  [-0.3373% -0.1285% +0.0810%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold.
       time:   [24.746 ms 24.777 ms 24.823 ms]
       change: [+1.5039% +1.7092% +1.9363%] (p = 0.00 < 0.05)

Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/same-seed/simulated-time/run: No change in performance detected.
       time:   [25.234 s 25.234 s 25.234 s]
       thrpt:  [162.32 KiB/s 162.32 KiB/s 162.32 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000%]
transfer/pacing-true/same-seed/wallclock-time/run: No change in performance detected.
       time:   [26.386 ms 26.401 ms 26.417 ms]
       change: [-0.0512% +0.1611% +0.3147%] (p = 0.08 > 0.05)
transfer/pacing-true/same-seed/simulated-time/run: No change in performance detected.
       time:   [25.301 s 25.301 s 25.301 s]
       thrpt:  [161.89 KiB/s 161.89 KiB/s 161.89 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000%]

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Contributor

Client/server transfer results

Performance differences relative to b9c32c7.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main | Δ main % |
|---|---|---|---|---|---|---|
| google vs. google | 455.0 ± 3.8 | 449.1 | 465.6 | 70.3 ± 8.4 | | |
| google vs. neqo (cubic, paced) | 279.8 ± 4.7 | 271.6 | 288.9 | 114.4 ± 6.8 | -0.6 | -0.2% |
| msquic vs. msquic | 197.1 ± 67.6 | 145.5 | 457.5 | 162.4 ± 0.5 | | |
| msquic vs. neqo (cubic, paced) | 217.5 ± 64.9 | 143.7 | 466.5 | 147.1 ± 0.5 | -4.2 | -1.9% |
| neqo vs. google (cubic, paced) | 758.1 ± 8.4 | 748.9 | 828.2 | 42.2 ± 3.8 | 1.1 | 0.1% |
| neqo vs. msquic (cubic, paced) | 156.8 ± 4.2 | 148.9 | 165.2 | 204.1 ± 7.6 | 💚 -1.5 | -0.9% |
| neqo vs. neqo (cubic) | 94.7 ± 4.4 | 86.9 | 104.5 | 338.0 ± 7.3 | 💔 2.3 | 2.5% |
| neqo vs. neqo (cubic, paced) | 94.6 ± 4.2 | 87.7 | 105.4 | 338.3 ± 7.6 | -0.5 | -0.5% |
| neqo vs. neqo (reno) | 95.6 ± 4.1 | 85.7 | 106.4 | 334.6 ± 7.8 | 💔 2.2 | 2.3% |
| neqo vs. neqo (reno, paced) | 96.2 ± 4.0 | 89.1 | 106.4 | 332.5 ± 8.0 | 1.0 | 1.1% |
| neqo vs. quiche (cubic, paced) | 193.3 ± 4.4 | 186.7 | 202.5 | 165.5 ± 7.3 | -0.9 | -0.5% |
| neqo vs. s2n (cubic, paced) | 218.8 ± 3.7 | 213.3 | 226.3 | 146.3 ± 8.6 | 💚 -4.0 | -1.8% |
| quiche vs. neqo (cubic, paced) | 155.8 ± 4.6 | 142.0 | 164.4 | 205.4 ± 7.0 | 💚 -2.0 | -1.3% |
| quiche vs. quiche | 146.2 ± 4.5 | 138.5 | 158.7 | 218.9 ± 7.1 | | |
| s2n vs. neqo (cubic, paced) | 174.2 ± 4.8 | 161.8 | 185.7 | 183.7 ± 6.7 | 💔 3.5 | 2.0% |
| s2n vs. s2n | 244.6 ± 21.3 | 231.4 | 342.0 | 130.8 ± 1.5 | | |

Download data for profiler.firefox.com or download performance comparison data.

@larseggert larseggert marked this pull request as draft February 12, 2026 13:18


Development

Successfully merging this pull request may close these issues.

Optimise sending data in http3 code

3 participants