feat: Use a BinaryHeap for the RxStreamOrderer #3054

Draft
larseggert wants to merge 3 commits into mozilla:main from larseggert:feat-RxStreamOrderer-heap

Conversation

@larseggert
Collaborator

@larseggert larseggert commented Oct 16, 2025

Claude thinks this is faster.
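
For readers who have not seen the idea before, here is a minimal sketch of a heap-based reorder buffer. It is not the code in this PR: the `Reorderer` type, its fields, and its overlap handling are assumptions made for illustration, and only the method names `inbound_frame` and `read_to_end` echo the benchmark labels further down. The point it shows is that a min-heap keyed on stream offset makes every insert O(log n) whatever the arrival order, with all ordering work deferred to the read side.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical reorder buffer (illustration only, not neqo's `RxStreamOrderer`).
struct Reorderer {
    /// `Reverse` turns the std max-heap into a min-heap on (offset, data).
    pending: BinaryHeap<Reverse<(u64, Vec<u8>)>>,
    /// Next stream offset the reader expects.
    next: u64,
}

impl Reorderer {
    fn new() -> Self {
        Self { pending: BinaryHeap::new(), next: 0 }
    }

    /// Buffer a frame unless all of it has already been read.
    fn inbound_frame(&mut self, offset: u64, data: &[u8]) {
        let end = offset + data.len() as u64;
        if end > self.next {
            self.pending.push(Reverse((offset, data.to_vec())));
        }
    }

    /// Append every byte that is now contiguous with `next` to `out`.
    fn read_to_end(&mut self, out: &mut Vec<u8>) {
        while let Some(Reverse((offset, _))) = self.pending.peek() {
            if *offset > self.next {
                break; // gap: the lowest buffered offset is not readable yet
            }
            let Reverse((offset, data)) = self.pending.pop().expect("peeked above");
            let end = offset + data.len() as u64;
            if end > self.next {
                // Skip the prefix that overlaps data already delivered.
                let skip = (self.next - offset) as usize;
                out.extend_from_slice(&data[skip..]);
                self.next = end;
            }
        }
    }
}
```

One design consequence of this sketch: overlapping or duplicate frames are not merged on insert, so they occupy memory until they are popped and trimmed. A real implementation would need to bound that.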

@codecov

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 95.36082% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.69%. Comparing base (b9c32c7) to head (615fd33).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3054      +/-   ##
==========================================
- Coverage   93.41%   92.69%   -0.72%     
==========================================
  Files         124      125       +1     
  Lines       36234    36401     +167     
  Branches    36234    36401     +167     
==========================================
- Hits        33847    33742     -105     
- Misses       1540     1812     +272     
  Partials      847      847              
Components Coverage Δ
neqo-common 97.32% <ø> (ø)
neqo-crypto 83.25% <ø> (-0.48%) ⬇️
neqo-http3 93.29% <ø> (ø)
neqo-qpack 94.18% <ø> (ø)
neqo-transport 93.06% <95.36%> (-1.43%) ⬇️
neqo-udp 78.94% <ø> (-0.48%) ⬇️
mtu 85.76% <ø> (ø)

@github-actions
Contributor

Benchmark results

Performance differences relative to e94d8c6.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [198.69 ms 198.96 ms 199.25 ms]
       thrpt:  [501.88 MiB/s 502.61 MiB/s 503.30 MiB/s]
change:
       time:   [−0.3215% −0.1337% +0.0589%] (p = 0.18 > 0.05)
       thrpt:  [−0.0589% +0.1338% +0.3225%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved.
       time:   [275.93 ms 277.73 ms 279.57 ms]
       thrpt:  [35.769 Kelem/s 36.006 Kelem/s 36.241 Kelem/s]
change:
       time:   [−3.4078% −2.6205% −1.7568%] (p = 0.00 < 0.05)
       thrpt:  [+1.7882% +2.6910% +3.5280%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
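
A note on reading the paired `time`/`thrpt` change lines: the numbers are consistent with throughput being computed as fixed work divided by time, so a relative time change `c` implies a relative throughput change of `1 / (1 + c) - 1`. The helper below is a hypothetical illustration of that relationship, not part of the benchmark tooling.

```rust
/// Relative throughput change implied by a relative time change,
/// assuming the amount of work per iteration is fixed.
/// Both values are fractions, e.g. -0.026205 for -2.6205%.
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

Applied to the RPS result above, a time change of −2.6205% maps to a throughput change of about +2.691%, which matches the reported middle estimate.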

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [28.391 ms 28.466 ms 28.568 ms]
       thrpt:  [35.005   B/s 35.130   B/s 35.223   B/s]
change:
       time:   [−0.7924% −0.2954% +0.1849%] (p = 0.24 > 0.05)
       thrpt:  [−0.1845% +0.2963% +0.7988%]

Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
8 (8.00%) high severe

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: Change within noise threshold.
       time:   [200.33 ms 200.51 ms 200.69 ms]
       thrpt:  [498.27 MiB/s 498.74 MiB/s 499.19 MiB/s]
change:
       time:   [−1.2257% −0.9978% −0.8160%] (p = 0.00 < 0.05)
       thrpt:  [+0.8227% +1.0078% +1.2409%]

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.598 µs 11.634 µs 11.678 µs]
       change: [−0.0902% +0.3590% +0.8599%] (p = 0.14 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
6 (6.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0214 ms 3.0306 ms 3.0415 ms]
       change: [−0.4590% +0.0184% +0.4968%] (p = 0.91 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.963 µs 20.031 µs 20.114 µs]
       change: [−0.1766% +0.1347% +0.4714%] (p = 0.44 > 0.05)

Found 19 outliers among 100 measurements (19.00%)
3 (3.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
13 (13.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.0437 ms 5.0585 ms 5.0766 ms]
       change: [−0.6053% −0.1050% +0.3628%] (p = 0.68 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
11 (11.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [8.2578 µs 8.2928 µs 8.3415 µs]
       change: [−1.0254% −0.2539% +0.4810%] (p = 0.54 > 0.05)

Found 18 outliers among 100 measurements (18.00%)
10 (10.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.5865 ms 1.5921 ms 1.5991 ms]
       change: [−0.9581% −0.2435% +0.4217%] (p = 0.51 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe

1-streams/each-1000-bytes/wallclock-time: Change within noise threshold.
       time:   [585.30 µs 588.12 µs 591.20 µs]
       change: [+0.2303% +0.8014% +1.4096%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
14 (14.00%) high severe

1-streams/each-1000-bytes/simulated-time: No change in performance detected.
       time:   [118.81 ms 119.03 ms 119.26 ms]
       thrpt:  [8.1888 KiB/s 8.2041 KiB/s 8.2192 KiB/s]
change:
       time:   [−0.2633% −0.0158% +0.2496%] (p = 0.90 > 0.05)
       thrpt:  [−0.2490% +0.0158% +0.2640%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1000-streams/each-1-bytes/wallclock-time: 💚 Performance has improved.
       time:   [13.014 ms 13.054 ms 13.114 ms]
       change: [−3.7565% −3.3931% −2.9509%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

1000-streams/each-1-bytes/simulated-time: No change in performance detected.
       time:   [14.991 s 15.004 s 15.017 s]
       thrpt:  [66.589 B/s 66.647 B/s 66.705 B/s]
change:
       time:   [−0.0202% +0.1063% +0.2401%] (p = 0.10 > 0.05)
       thrpt:  [−0.2395% −0.1062% +0.0202%]

1000-streams/each-1000-bytes/wallclock-time: 💚 Performance has improved.
       time:   [45.996 ms 46.120 ms 46.247 ms]
       change: [−3.7940% −3.3373% −2.8882%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

1000-streams/each-1000-bytes/simulated-time: No change in performance detected.
       time:   [18.883 s 19.006 s 19.129 s]
       thrpt:  [51.052 KiB/s 51.382 KiB/s 51.715 KiB/s]
change:
       time:   [−2.1118% −0.9633% +0.1113%] (p = 0.10 > 0.05)
       thrpt:  [−0.1112% +0.9727% +2.1574%]

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [88.177 ns 88.543 ns 88.901 ns]
       change: [−0.9551% −0.2383% +0.3745%] (p = 0.50 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [105.88 ns 106.40 ns 107.02 ns]
       change: [−0.1757% +0.5561% +1.6875%] (p = 0.32 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) high mild
11 (11.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [105.08 ns 105.39 ns 105.78 ns]
       change: [−1.7450% −0.7185% +0.0000%] (p = 0.11 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [89.361 ns 90.527 ns 93.125 ns]
       change: [−5.4906% +6.4286% +25.002%] (p = 0.63 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): 💚 Performance has improved.
       time:   [99.570 ms 99.634 ms 99.702 ms]
       change: [−8.0734% −7.9411% −7.8197%] (p = 0.00 < 0.05)

Found 36 outliers among 100 measurements (36.00%)
12 (12.00%) low severe
5 (5.00%) low mild
8 (8.00%) high mild
11 (11.00%) high severe

BTreeMap: in-order frames:
       time:   [920.99 µs 923.19 µs 927.32 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
BinaryHeap: in-order frames:
       time:   [96.227 µs 96.318 µs 96.409 µs]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
BTreeMap: reverse-order frames:
       time:   [187.10 µs 187.27 µs 187.45 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
BinaryHeap: reverse-order frames:
       time:   [140.21 µs 140.37 µs 140.52 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
BTreeMap: random-order frames:
       time:   [341.23 µs 341.59 µs 341.96 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
BinaryHeap: random-order frames:
       time:   [131.21 µs 131.39 µs 131.64 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
BTreeMap: frames with gaps:
       time:   [113.75 µs 113.86 µs 113.98 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
BinaryHeap: frames with gaps:
       time:   [57.041 µs 57.225 µs 57.502 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
BTreeMap: overlapping frames:
       time:   [169.49 µs 169.70 µs 169.91 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
BinaryHeap: overlapping frames:
       time:   [115.80 µs 115.99 µs 116.20 µs]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
BTreeMap: read_to_end after in-order insert:
       time:   [301.84 µs 302.09 µs 302.34 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
BinaryHeap: read_to_end after in-order insert:
       time:   [256.56 µs 256.80 µs 257.05 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
varying_frame_counts/BTreeMap/100:
       time:   [12.383 µs 12.396 µs 12.408 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
varying_frame_counts/BinaryHeap/100
       time:   [10.120 µs 10.129 µs 10.139 µs]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
varying_frame_counts/BTreeMap/500
       time:   [96.575 µs 96.647 µs 96.715 µs]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
varying_frame_counts/BinaryHeap/500
       time:   [57.504 µs 57.561 µs 57.618 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
varying_frame_counts/BTreeMap/1000
       time:   [203.39 µs 204.17 µs 205.58 µs]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
varying_frame_counts/BinaryHeap/1000
       time:   [114.71 µs 114.83 µs 114.97 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
varying_frame_counts/BTreeMap/5000
       time:   [4.2644 ms 4.2704 ms 4.2766 ms]
varying_frame_counts/BinaryHeap/5000
       time:   [3.8359 ms 3.8412 ms 3.8469 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
varying_frame_counts/BTreeMap/10000
       time:   [9.6113 ms 9.6247 ms 9.6390 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
varying_frame_counts/BinaryHeap/10000
       time:   [8.9493 ms 8.9783 ms 9.0254 ms]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
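
To make the BTreeMap/BinaryHeap pairs above easier to compare, the following sketch divides the reported BTreeMap median by the BinaryHeap median for each benchmark. The medians are copied from the output above; the names `MEDIANS` and `speedups` are made up for this illustration, and ratios from a single CI run say nothing about run-to-run variance.

```rust
/// (benchmark, BTreeMap median, BinaryHeap median).
/// Both medians in a row use the same unit, so the ratio is unit-free.
const MEDIANS: [(&str, f64, f64); 11] = [
    ("in-order frames (µs)", 923.19, 96.318),
    ("reverse-order frames (µs)", 187.27, 140.37),
    ("random-order frames (µs)", 341.59, 131.39),
    ("frames with gaps (µs)", 113.86, 57.225),
    ("overlapping frames (µs)", 169.70, 115.99),
    ("read_to_end after in-order insert (µs)", 302.09, 256.80),
    ("varying_frame_counts/100 (µs)", 12.396, 10.129),
    ("varying_frame_counts/500 (µs)", 96.647, 57.561),
    ("varying_frame_counts/1000 (µs)", 204.17, 114.83),
    ("varying_frame_counts/5000 (ms)", 4.2704, 3.8412),
    ("varying_frame_counts/10000 (ms)", 9.6247, 8.9783),
];

/// Speedup of BinaryHeap over BTreeMap; values above 1.0 favor BinaryHeap.
fn speedups() -> Vec<(&'static str, f64)> {
    MEDIANS
        .iter()
        .map(|&(name, btree, heap)| (name, btree / heap))
        .collect()
}
```

On these numbers the ratio ranges from roughly 9.6x for in-order frames down to about 1.07x at 10,000 frames, so the advantage in the microbenchmarks narrows considerably as the frame count grows.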
sent::Packets::take_ranges: No change in performance detected.
       time:   [4.6060 µs 4.7374 µs 4.8735 µs]
       change: [−3.9980% +0.0194% +4.1328%] (p = 0.99 > 0.05)

Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds/wallclock-time/run: 💚 Performance has improved.
       time:   [23.813 ms 23.859 ms 23.913 ms]
       change: [−5.7948% −5.5582% −5.3053%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds/simulated-time/run: No change in performance detected.
       time:   [25.183 s 25.218 s 25.254 s]
       thrpt:  [162.19 KiB/s 162.42 KiB/s 162.65 KiB/s]
change:
       time:   [−0.1044% +0.0844% +0.2695%] (p = 0.38 > 0.05)
       thrpt:  [−0.2688% −0.0844% +0.1045%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold.
       time:   [24.760 ms 24.819 ms 24.880 ms]
       change: [−3.4095% −3.0389% −2.7060%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-true/varying-seeds/simulated-time/run: Change within noise threshold.
       time:   [24.897 s 24.936 s 24.976 s]
       thrpt:  [164.00 KiB/s 164.26 KiB/s 164.51 KiB/s]
change:
       time:   [−0.5599% −0.3266% −0.0848%] (p = 0.01 < 0.05)
       thrpt:  [+0.0848% +0.3277% +0.5630%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold.
       time:   [24.728 ms 24.754 ms 24.792 ms]
       change: [−3.0443% −2.8605% −2.6565%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/same-seed/simulated-time/run: No change in performance detected.
       time:   [25.710 s 25.710 s 25.710 s]
       thrpt:  [159.31 KiB/s 159.31 KiB/s 159.31 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000%]
transfer/pacing-true/same-seed/wallclock-time/run: Change within noise threshold.
       time:   [25.735 ms 25.770 ms 25.822 ms]
       change: [−2.7261% −2.5468% −2.3496%] (p = 0.00 < 0.05)

Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

transfer/pacing-true/same-seed/simulated-time/run: No change in performance detected.
       time:   [25.675 s 25.675 s 25.675 s]
       thrpt:  [159.53 KiB/s 159.53 KiB/s 159.53 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000%]


group.finish();
}

criterion_group!(
Member

That's an awful lot of benchmarking.

Collaborator Author

@larseggert larseggert Oct 16, 2025

Yeah, no intention of leaving this in if there is a signal that the core change is at all positive.

Signed-off-by: Lars Eggert <lars@eggert.org>
@github-actions
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to b9c32c7.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@github-actions
Contributor

Client/server transfer results

Performance differences relative to b9c32c7.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
|---|---|---|---|---|---|---|
| google vs. google | 463.4 ± 4.2 | 457.7 | 480.1 | 69.1 ± 7.6 | | |
| google vs. neqo (cubic, paced) | 281.2 ± 3.7 | 274.7 | 289.7 | 113.8 ± 8.6 | -0.9 | -0.3% |
| msquic vs. msquic | 198.2 ± 80.2 | 136.7 | 595.7 | 161.5 ± 0.4 | | |
| msquic vs. neqo (cubic, paced) | 215.6 ± 73.9 | 141.4 | 668.3 | 148.4 ± 0.4 | 1.1 | 0.5% |
| neqo vs. google (cubic, paced) | 770.0 ± 4.2 | 764.5 | 784.0 | 41.6 ± 7.6 | -1.0 | -0.1% |
| neqo vs. msquic (cubic, paced) | 156.2 ± 4.1 | 150.6 | 164.3 | 204.8 ± 7.8 | -0.8 | -0.5% |
| neqo vs. neqo (cubic) | 93.4 ± 4.0 | 86.6 | 102.6 | 342.7 ± 8.0 | 0.7 | 0.8% |
| neqo vs. neqo (cubic, paced) | 93.4 ± 4.1 | 84.7 | 101.9 | 342.5 ± 7.8 | 💚 -2.1 | -2.2% |
| neqo vs. neqo (reno) | 93.6 ± 4.3 | 85.9 | 102.2 | 341.8 ± 7.4 | 0.1 | 0.1% |
| neqo vs. neqo (reno, paced) | 94.3 ± 3.9 | 88.4 | 102.8 | 339.4 ± 8.2 | 0.5 | 0.6% |
| neqo vs. quiche (cubic, paced) | 193.4 ± 4.3 | 187.7 | 205.4 | 165.5 ± 7.4 | -1.2 | -0.6% |
| neqo vs. s2n (cubic, paced) | 221.6 ± 5.1 | 213.3 | 236.5 | 144.4 ± 6.3 | -0.9 | -0.4% |
| quiche vs. neqo (cubic, paced) | 157.0 ± 4.8 | 146.9 | 167.6 | 203.8 ± 6.7 | 0.8 | 0.5% |
| quiche vs. quiche | 146.0 ± 4.5 | 137.9 | 158.7 | 219.2 ± 7.1 | | |
| s2n vs. neqo (cubic, paced) | 175.1 ± 5.9 | 165.5 | 206.3 | 182.7 ± 5.4 | 0.2 | 0.1% |
| s2n vs. s2n | 245.4 ± 20.0 | 232.0 | 343.4 | 130.4 ± 1.6 | | |


@martinthomson
Member

It's not clear that this is faster.

(If you want to compare the benchmarks, don't you have to run those on main? How does that work?)
