chore: Stabilize benchmarks for better signal#3370

Draft
larseggert wants to merge 1 commit into mozilla:main from larseggert:chore-bench-stabilization

@larseggert
Collaborator

More runs, tighter intervals, joint config.

@codspeed-hq

codspeed-hq bot commented Jan 28, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 5.04%

Comparing larseggert:chore-bench-stabilization (fbbcbbd) with main (cc8bf07)

Summary

❌ 2 regressed benchmarks
✅ 32 untouched benchmarks
🆕 2 new benchmarks
⏩ 12 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| | WallTime | walltime/1-streams/each-1000-bytes | 1.4 ms | 1.4 ms | -4.16% |
| 🆕 | WallTime | walltime/pacing-false/fixed-seed | N/A | 95.2 ms | N/A |
| 🆕 | WallTime | walltime/pacing-true/fixed-seed | N/A | 93.1 ms | N/A |
| | Memory | walltime/1000-streams/each-1000-bytes | 809.5 KB | 852.4 KB | -5.04% |

Footnotes

  1. 12 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.14%. Comparing base (d1180a3) to head (fbbcbbd).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3370      +/-   ##
==========================================
- Coverage   94.20%   94.14%   -0.06%     
==========================================
  Files         125      129       +4     
  Lines       37754    38060     +306     
  Branches    37754    38060     +306     
==========================================
+ Hits        35566    35832     +266     
- Misses       1346     1381      +35     
- Partials      842      847       +5     
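
The figures in the diff above can be cross-checked from the raw counts. The following is a minimal sketch, assuming coverage is simply hits divided by lines and that every line is counted as exactly one of hit, miss, or partial. The function name `coverage_pct` is illustrative and not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as a percentage (assumed: hits / lines)."""
    return 100.0 * hits / lines


# Counts copied from the coverage diff (main vs. #3370).
base = {"lines": 37754, "hits": 35566, "misses": 1346, "partials": 842}
head = {"lines": 38060, "hits": 35832, "misses": 1381, "partials": 847}

# Sanity check: the three categories should add up to the line count.
for report in (base, head):
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]

base_pct = coverage_pct(base["hits"], base["lines"])
head_pct = coverage_pct(head["hits"], head["lines"])
delta = head_pct - base_pct

print(f"base {base_pct:.3f}%  head {head_pct:.3f}%  delta {delta:+.3f}%")
```

The computed values agree with the reported 94.20%, 94.14%, and -0.06% to within display rounding.
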
| Flag | Coverage Δ | |
|---|---|---|
| freebsd | 93.16% <ø> (-0.07%) | ⬇️ |
| linux | 94.22% <ø> (+0.02%) | ⬆️ |
| macos | 94.11% <ø> (+0.02%) | ⬆️ |
| windows | 94.24% <ø> (+0.04%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ | |
|---|---|---|
| neqo-common | 98.49% <ø> (ø) | |
| neqo-crypto | 86.90% <ø> (-0.04%) | ⬇️ |
| neqo-http3 | 93.90% <70.00%> (+0.02%) | ⬆️ |
| neqo-qpack | 94.79% <100.00%> (-0.01%) | ⬇️ |
| neqo-transport | 95.19% <97.00%> (+0.08%) | ⬆️ |
| neqo-udp | 82.47% <ø> (-0.43%) | ⬇️ |
| mtu | 86.61% <ø> (ø) | |

@larseggert larseggert force-pushed the chore-bench-stabilization branch from d38618d to e6a3222 Compare January 28, 2026 14:38
More runs, tighter intervals, joint config.
@larseggert larseggert force-pushed the chore-bench-stabilization branch from e6a3222 to fbbcbbd Compare January 28, 2026 14:56
@github-actions
Contributor

Client/server transfer results

Performance differences relative to cc8bf07.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
|---|---|---|---|---|---|---|
| neqo-neqo-cubic-nopacing | 94.8 ± 4.0 | 86.6 | 106.0 | 337.4 ± 8.0 | 💚 -1.2 | -1.3% |
| neqo-neqo-newreno | 97.8 ± 4.7 | 89.0 | 108.6 | 327.2 ± 6.8 | 💔 2.0 | 2.1% |
| neqo-quiche-cubic | 190.8 ± 3.7 | 184.1 | 200.0 | 167.8 ± 8.6 | 💚 -1.2 | -0.6% |

Table above only shows statistically significant changes. See all results below.
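
The MiB/s column can be roughly reproduced from the transfer size and the mean time. This is a minimal sketch, assuming throughput is derived from the mean duration. The report may instead average per-run throughput, which would explain small differences. The helper name `throughput_mib_s` is illustrative.

```python
TRANSFER_BYTES = 33_554_432  # transfer size stated in the report (32 MiB)
MIB = 1024 * 1024


def throughput_mib_s(mean_ms: float, size_bytes: int = TRANSFER_BYTES) -> float:
    """Convert a mean transfer time in milliseconds to MiB/s."""
    return (size_bytes / MIB) / (mean_ms / 1000.0)


# Mean times (ms) copied from the table of significant changes.
means_ms = {
    "neqo-neqo-cubic-nopacing": 94.8,
    "neqo-neqo-newreno": 97.8,
    "neqo-quiche-cubic": 190.8,
}

for name, mean_ms in means_ms.items():
    print(f"{name}: {throughput_mib_s(mean_ms):.1f} MiB/s")
```

The results land within about 0.2 MiB/s of the reported 337.4, 327.2, and 167.8 MiB/s.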

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
|---|---|---|---|---|---|---|
| google-google-nopacing | 454.5 ± 4.3 | 449.0 | 475.8 | 70.4 ± 7.4 | | |
| google-neqo-cubic | 272.4 ± 4.5 | 266.5 | 291.1 | 117.5 ± 7.1 | 0.6 | 0.2% |
| msquic-msquic-nopacing | 176.7 ± 54.1 | 118.5 | 457.9 | 181.1 ± 0.6 | | |
| msquic-neqo-cubic | 192.0 ± 46.3 | 146.2 | 417.0 | 166.7 ± 0.7 | -13.7 | -6.7% |
| neqo-google-cubic | 755.4 ± 4.7 | 747.4 | 769.0 | 42.4 ± 6.8 | 1.1 | 0.1% |
| neqo-msquic-cubic | 158.8 ± 4.1 | 150.7 | 167.3 | 201.6 ± 7.8 | -0.3 | -0.2% |
| neqo-neqo-cubic | 94.7 ± 4.0 | 87.8 | 103.7 | 337.8 ± 8.0 | -1.1 | -1.2% |
| neqo-neqo-cubic-nopacing | 94.8 ± 4.0 | 86.6 | 106.0 | 337.4 ± 8.0 | 💚 -1.2 | -1.3% |
| neqo-neqo-newreno | 97.8 ± 4.7 | 89.0 | 108.6 | 327.2 ± 6.8 | 💔 2.0 | 2.1% |
| neqo-neqo-newreno-nopacing | 95.9 ± 4.2 | 85.7 | 105.2 | 333.5 ± 7.6 | -0.2 | -0.2% |
| neqo-quiche-cubic | 190.8 ± 3.7 | 184.1 | 200.0 | 167.8 ± 8.6 | 💚 -1.2 | -0.6% |
| neqo-s2n-cubic | 219.9 ± 4.4 | 211.9 | 232.9 | 145.5 ± 7.3 | 1.2 | 0.6% |
| quiche-neqo-cubic | 157.6 ± 9.4 | 143.6 | 204.0 | 203.1 ± 3.4 | 1.2 | 0.8% |
| quiche-quiche-nopacing | 140.0 ± 2.9 | 136.7 | 154.1 | 228.6 ± 11.0 | | |
| s2n-neqo-cubic | 174.0 ± 4.9 | 165.0 | 185.6 | 183.9 ± 6.5 | -0.9 | -0.5% |
| s2n-s2n-nopacing | 248.5 ± 22.2 | 231.4 | 343.2 | 128.8 ± 1.4 | | |

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at cc8bf07.

neqo-pr as client

neqo-pr vs. aioquic: A L1
neqo-pr vs. go-x-net: A BP BA
neqo-pr vs. haproxy: A 🚀C1 BP BA
neqo-pr vs. kwik: BP BA
neqo-pr vs. linuxquic: A L1 C1
neqo-pr vs. lsquic: baseline result missing
neqo-pr vs. msquic: A L1 C1
neqo-pr vs. mvfst: A 🚀L1 ⚠️BA
neqo-pr vs. neqo: Z
neqo-pr vs. nginx: A 🚀L1 C1 BP BA
neqo-pr vs. ngtcp2: A 🚀C1 CM
neqo-pr vs. picoquic: A 🚀L1 C1
neqo-pr vs. quic-go: A
neqo-pr vs. quiche: A 🚀L1 C1 BP BA
neqo-pr vs. quinn: A L1 C1
neqo-pr vs. s2n-quic: A 🚀C1 BP BA CM
neqo-pr vs. tquic: S A BP BA
neqo-pr vs. xquic: A ⚠️L1 C1

neqo-pr as server

aioquic vs. neqo-pr: Z 🚀C1 CM
go-x-net vs. neqo-pr: CM
kwik vs. neqo-pr: Z BP BA CM
lsquic vs. neqo-pr: Z ⚠️L1
msquic vs. neqo-pr: Z 🚀BP CM
mvfst vs. neqo-pr: Z A L1 C1 CM
neqo vs. neqo-pr: Z
openssl vs. neqo-pr: LR M A CM
picoquic vs. neqo-pr: Z
quic-go vs. neqo-pr: 🚀BA CM
quiche vs. neqo-pr: Z 🚀L1 CM
quinn vs. neqo-pr: Z ⚠️C1 V2 CM
s2n-quic vs. neqo-pr: ⚠️B CM
tquic vs. neqo-pr: Z ⚠️C1 CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@github-actions
Contributor

Benchmark results

Significant performance differences relative to cc8bf07.

decode 4096 bytes, mask ff: 💔 Performance has regressed by +16.450%.
       time:   [6.2204 µs 6.2311 µs 6.2428 µs]
       change: [+15.736% +16.450% +17.023] (p = 0.00 < 0.01)
       Performance has regressed.
Found 195 outliers among 1000 measurements (19.50%)
47 (4.70%) low severe
21 (2.10%) low mild
57 (5.70%) high mild
70 (7.00%) high severe
coalesce_acked_from_zero 1000+1 entries: 💔 Performance has regressed by +2.7831%.
       time:   [98.580 ns 98.739 ns 98.928 ns]
       change: [+2.0419% +2.7831% +3.2802] (p = 0.00 < 0.01)
       Performance has regressed.
Found 26 outliers among 1000 measurements (2.60%)
11 (1.10%) high mild
15 (1.50%) high severe
RxStreamOrderer::inbound_frame(): 💚 Performance has improved by -2.1149%.
       time:   [106.77 ms 106.84 ms 106.93 ms]
       change: [-2.5158% -2.1149% -1.8691] (p = 0.00 < 0.01)
       Performance has improved.
Found 39 outliers among 1000 measurements (3.90%)
31 (3.10%) high mild
8 (0.80%) high severe
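
The verdict lines in this report look like Criterion.rs-style output, where a change is judged first by its p-value and then by whether the whole confidence interval clears a noise threshold. The sketch below reproduces that two-step decision. The significance level of 0.01 is taken from the `p = ... > 0.01` comparisons in the report. The 1% noise threshold is an assumption that happens to be consistent with the results shown, and `classify` is a hypothetical helper, not a real API.

```python
def classify(lower: float, upper: float, p: float,
             significance: float = 0.01, noise_pct: float = 1.0) -> str:
    """Classify a relative change from its confidence interval (in percent).

    lower/upper: bounds of the confidence interval for the time change.
    p: p-value of the comparison against the baseline.
    noise_pct: assumed noise threshold in percent (not stated in the report).
    """
    if p >= significance:
        return "no change"          # difference is not statistically significant
    if lower > noise_pct and upper > noise_pct:
        return "regressed"          # whole interval is above the noise band
    if lower < -noise_pct and upper < -noise_pct:
        return "improved"           # whole interval is below the noise band
    return "within noise"           # significant, but too small to matter


# Intervals copied from the report.
print(classify(15.736, 17.023, 0.00))    # decode 4096 bytes, mask ff
print(classify(-2.5158, -1.8691, 0.00))  # RxStreamOrderer::inbound_frame()
print(classify(-1.7417, -0.9595, 0.00))  # Download, mtu-1504
print(classify(-1.2204, 0.6972, 0.46))   # RPS, mtu-1504
```

Under these assumed thresholds, the four calls yield the same verdicts as the report: regressed, improved, within noise, and no change.
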
All results
transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: Change within noise threshold.
       time:   [197.81 ms 198.04 ms 198.30 ms]
       thrpt:  [504.28 MiB/s 504.94 MiB/s 505.54 MiB/s]
change:
       time:   [-1.7417% -1.3046% -0.9595] (p = 0.00 < 0.01)
       thrpt:  [+0.9688% +1.3219% +1.7725]
       Change within noise threshold.
Found 7 outliers among 500 measurements (1.40%)
4 (0.80%) high mild
3 (0.60%) high severe
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: No change in performance detected.
       time:   [282.48 ms 283.68 ms 284.89 ms]
       thrpt:  [35.102 Kelem/s 35.251 Kelem/s 35.401 Kelem/s]
change:
       time:   [-1.2204% -0.2714% +0.6972] (p = 0.46 > 0.01)
       thrpt:  [-0.6924% +0.2722% +1.2355]
       No change in performance detected.
Found 3 outliers among 500 measurements (0.60%)
3 (0.60%) high mild
transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected.
       time:   [38.620 ms 38.684 ms 38.753 ms]
       thrpt:  [25.804   B/s 25.850   B/s 25.893   B/s]
change:
       time:   [-0.2011% +0.3568% +0.8583] (p = 0.13 > 0.01)
       thrpt:  [-0.8510% -0.3555% +0.2015]
       No change in performance detected.
Found 43 outliers among 500 measurements (8.60%)
34 (6.80%) high mild
9 (1.80%) high severe
transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: Change within noise threshold.
       time:   [201.49 ms 201.73 ms 201.98 ms]
       thrpt:  [495.09 MiB/s 495.72 MiB/s 496.31 MiB/s]
change:
       time:   [-1.5672% -1.2500% -0.9811] (p = 0.00 < 0.01)
       thrpt:  [+0.9909% +1.2659% +1.5921]
       Change within noise threshold.
Found 8 outliers among 500 measurements (1.60%)
1 (0.20%) high mild
7 (1.40%) high severe
decode 4096 bytes, mask ff: 💔 Performance has regressed by +16.450%.
       time:   [6.2204 µs 6.2311 µs 6.2428 µs]
       change: [+15.736% +16.450% +17.023] (p = 0.00 < 0.01)
       Performance has regressed.
Found 195 outliers among 1000 measurements (19.50%)
47 (4.70%) low severe
21 (2.10%) low mild
57 (5.70%) high mild
70 (7.00%) high severe
decode 1048576 bytes, mask ff: Change within noise threshold.
       time:   [1.3396 ms 1.3449 ms 1.3518 ms]
       change: [-2.0845% -1.5302% -0.8057] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 79 outliers among 1000 measurements (7.90%)
7 (0.70%) low mild
20 (2.00%) high mild
52 (5.20%) high severe
decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [7.2053 µs 7.2212 µs 7.2393 µs]
       change: [-1.1142% -0.4024% +0.0577] (p = 0.02 > 0.01)
       No change in performance detected.
Found 140 outliers among 1000 measurements (14.00%)
59 (5.90%) low severe
12 (1.20%) low mild
19 (1.90%) high mild
50 (5.00%) high severe
decode 1048576 bytes, mask 7f: Change within noise threshold.
       time:   [1.8302 ms 1.8343 ms 1.8392 ms]
       change: [-1.5556% -1.2513% -0.9126] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 110 outliers among 1000 measurements (11.00%)
46 (4.60%) low mild
22 (2.20%) high mild
42 (4.20%) high severe
decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.9103 µs 6.9251 µs 6.9439 µs]
       change: [-0.1012% +0.2850% +0.7984] (p = 0.43 > 0.01)
       No change in performance detected.
Found 117 outliers among 1000 measurements (11.70%)
4 (0.40%) low severe
24 (2.40%) low mild
13 (1.30%) high mild
76 (7.60%) high severe
decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.7640 ms 1.7675 ms 1.7716 ms]
       change: [-0.2649% +0.0250% +0.3295] (p = 0.93 > 0.01)
       No change in performance detected.
Found 69 outliers among 1000 measurements (6.90%)
15 (1.50%) high mild
54 (5.40%) high severe
streams/simulated/1-streams/each-1000-bytes: No change in performance detected.
       time:   [129.68 ms 129.68 ms 129.68 ms]
       thrpt:  [7.5304 KiB/s 7.5305 KiB/s 7.5306 KiB/s]
change:
       time:   [-0.0053% -0.0017% +0.0020] (p = 0.23 > 0.01)
       thrpt:  [-0.0020% +0.0017% +0.0053]
       No change in performance detected.
Found 6 outliers among 1000 measurements (0.60%)
1 (0.10%) low mild
5 (0.50%) high mild
streams/simulated/1000-streams/each-1-bytes: No change in performance detected.
       time:   [2.5363 s 2.5364 s 2.5366 s]
       thrpt:  [394.22   B/s 394.25   B/s 394.28   B/s]
change:
       time:   [-0.0243% -0.0086% +0.0068] (p = 0.14 > 0.01)
       thrpt:  [-0.0068% +0.0086% +0.0243]
       No change in performance detected.
streams/simulated/1000-streams/each-1000-bytes: No change in performance detected.
       time:   [6.5905 s 6.5959 s 6.6018 s]
       thrpt:  [147.92 KiB/s 148.06 KiB/s 148.18 KiB/s]
change:
       time:   [-0.2524% -0.0308% +0.1641] (p = 0.65 > 0.01)
       thrpt:  [-0.1639% +0.0308% +0.2531]
       No change in performance detected.
Found 30 outliers among 1000 measurements (3.00%)
4 (0.40%) high mild
26 (2.60%) high severe
streams/walltime/1-streams/each-1000-bytes: Change within noise threshold.
       time:   [595.34 µs 596.23 µs 597.17 µs]
       change: [+0.3548% +0.8370% +1.2172] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 90 outliers among 500 measurements (18.00%)
83 (16.60%) high mild
7 (1.40%) high severe
streams/walltime/1000-streams/each-1-bytes: Change within noise threshold.
       time:   [12.279 ms 12.291 ms 12.306 ms]
       change: [-1.8610% -1.2572% -0.9174] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 43 outliers among 500 measurements (8.60%)
2 (0.40%) low mild
37 (7.40%) high mild
4 (0.80%) high severe
streams/walltime/1000-streams/each-1000-bytes: No change in performance detected.
       time:   [45.194 ms 45.230 ms 45.273 ms]
       change: [-0.2373% -0.0658% +0.1069] (p = 0.28 > 0.01)
       No change in performance detected.
Found 10 outliers among 500 measurements (2.00%)
3 (0.60%) low mild
4 (0.80%) high mild
3 (0.60%) high severe
coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [91.118 ns 91.490 ns 91.954 ns]
       change: [-1.2286% -0.5962% +0.0777] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 35 outliers among 1000 measurements (3.50%)
9 (0.90%) low mild
3 (0.30%) high mild
23 (2.30%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [109.55 ns 109.94 ns 110.43 ns]
       change: [-0.7931% -0.2486% +0.2823] (p = 0.09 > 0.01)
       No change in performance detected.
Found 32 outliers among 1000 measurements (3.20%)
4 (0.40%) low mild
6 (0.60%) high mild
22 (2.20%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [109.11 ns 109.48 ns 110.05 ns]
       change: [-0.9626% -0.2868% +0.2805] (p = 0.06 > 0.01)
       No change in performance detected.
Found 20 outliers among 1000 measurements (2.00%)
9 (0.90%) high mild
11 (1.10%) high severe
coalesce_acked_from_zero 1000+1 entries: 💔 Performance has regressed by +2.7831%.
       time:   [98.580 ns 98.739 ns 98.928 ns]
       change: [+2.0419% +2.7831% +3.2802] (p = 0.00 < 0.01)
       Performance has regressed.
Found 26 outliers among 1000 measurements (2.60%)
11 (1.10%) high mild
15 (1.50%) high severe
RxStreamOrderer::inbound_frame(): 💚 Performance has improved by -2.1149%.
       time:   [106.77 ms 106.84 ms 106.93 ms]
       change: [-2.5158% -2.1149% -1.8691] (p = 0.00 < 0.01)
       Performance has improved.
Found 39 outliers among 1000 measurements (3.90%)
31 (3.10%) high mild
8 (0.80%) high severe
sent::Packets::take_ranges: No change in performance detected.
       time:   [4.3725 µs 4.4076 µs 4.5002 µs]
       change: [-2.8068% +0.9927% +4.8098] (p = 0.95 > 0.01)
       No change in performance detected.
Found 49 outliers among 1000 measurements (4.90%)
44 (4.40%) high mild
5 (0.50%) high severe
transfer/simulated/pacing-false/fixed-seed
       time:   [23.941 s 23.941 s 23.941 s]
       thrpt:  [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
transfer/simulated/pacing-true/fixed-seed
       time:   [23.676 s 23.676 s 23.676 s]
       thrpt:  [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
transfer/walltime/pacing-false/fixed-seed
       time:   [23.388 ms 23.406 ms 23.429 ms]
Found 8 outliers among 500 measurements (1.60%)
1 (0.20%) low mild
4 (0.80%) high mild
3 (0.60%) high severe
transfer/walltime/pacing-true/fixed-seed
       time:   [23.785 ms 23.803 ms 23.829 ms]
Found 21 outliers among 500 measurements (4.20%)
5 (1.00%) low mild
9 (1.80%) high mild
7 (1.40%) high severe
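
The mild and severe outlier counts above are the kind of breakdown produced by Tukey's fences, which Criterion.rs documents as its outlier method: a sample is a mild outlier beyond 1.5 times the interquartile range from the quartiles, and a severe one beyond 3 times. The following is a minimal sketch of that classification. The quartile method used by `statistics.quantiles` may differ from the one in the benchmark harness, and the sample data is made up for illustration.

```python
from statistics import quantiles


def classify_outliers(samples: list[float]) -> dict[str, int]:
    """Count outliers using Tukey's fences (1.5x and 3x the IQR)."""
    q1, _, q3 = quantiles(samples, n=4)
    iqr = q3 - q1
    counts = {"low severe": 0, "low mild": 0, "high mild": 0, "high severe": 0}
    for x in samples:
        if x < q1 - 3.0 * iqr:
            counts["low severe"] += 1
        elif x < q1 - 1.5 * iqr:
            counts["low mild"] += 1
        elif x > q3 + 3.0 * iqr:
            counts["high severe"] += 1
        elif x > q3 + 1.5 * iqr:
            counts["high mild"] += 1
    return counts


# Hypothetical measurements: a tight cluster plus three stragglers.
samples = [float(v) for v in range(100, 120)] + [50.0, 140.0, 200.0]
counts = classify_outliers(samples)
print(counts)
```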

Download data for profiler.firefox.com or download performance comparison data.