feat: Reduce reallocations in RxStreamOrderer #3003
larseggert wants to merge 3 commits into mozilla:main
Let's see if this helps performance.
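For readers unfamiliar with the general idea named in the title, here is a minimal sketch of how preallocation avoids repeated buffer growth. It is not the diff in this PR and does not use `RxStreamOrderer`; the helper `count_capacity_changes` and the 1200-byte chunk size are hypothetical, chosen only for illustration.

```rust
/// Hypothetical helper: appends every chunk to `buf` and counts how often
/// the Vec's capacity changes, i.e. how often it had to (re)allocate.
fn count_capacity_changes(mut buf: Vec<u8>, chunks: &[&[u8]]) -> usize {
    let mut changes = 0;
    let mut cap = buf.capacity();
    for chunk in chunks {
        buf.extend_from_slice(chunk);
        if buf.capacity() != cap {
            changes += 1;
            cap = buf.capacity();
        }
    }
    changes
}

fn main() {
    // 64 chunks of 1200 bytes each (an assumed, illustrative size).
    let chunk = [0u8; 1200];
    let chunks: Vec<&[u8]> = std::iter::repeat(&chunk[..]).take(64).collect();
    let total: usize = chunks.iter().map(|c| c.len()).sum();

    // Growing on demand allocates several times as the buffer fills.
    let grown = count_capacity_changes(Vec::new(), &chunks);
    // Reserving the full size up front means the appends never reallocate.
    let preallocated = count_capacity_changes(Vec::with_capacity(total), &chunks);

    println!("grow on demand: {grown} capacity changes");
    println!("preallocated:   {preallocated} capacity changes");
}
```

The trade-off is that reserving up front can cost memory or time when the final size is overestimated, so whether it helps depends on the workload, which is what the benchmarks below are meant to show.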
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #3003 +/- ##
==========================================
- Coverage 95.66% 93.36% -2.31%
==========================================
Files 123 123
Lines 35702 35712 +10
Branches 35702 35712 +10
==========================================
- Hits 34156 33342 -814
- Misses 1506 1528 +22
- Partials 40 842 +802
| Branch | feat-inbound_frame-prealloc |
| Testbed | On-prem |
| Benchmark | Latency result, ms (Result Δ%) | Baseline, ms | Upper boundary, ms (Limit %) |
|---|---|---|---|
| google vs. neqo (cubic, paced) | 278.12 ms (-0.08%) | 278.34 ms | 282.73 ms (98.37%) |
| msquic vs. neqo (cubic, paced) | 224.72 ms (+12.76%) | 199.30 ms | 236.94 ms (94.84%) |
| neqo vs. google (cubic, paced) | 756.26 ms (-0.45%) | 759.69 ms | 774.82 ms (97.61%) |
| neqo vs. msquic (cubic, paced) | 156.46 ms (-0.83%) | 157.78 ms | 160.59 ms (97.43%) |
| neqo vs. neqo (cubic) | 94.69 ms (+3.42%) | 91.56 ms | 96.88 ms (97.74%) |
| neqo vs. neqo (cubic, paced) | 94.16 ms (+1.35%) | 92.90 ms | 98.09 ms (95.99%) |
| neqo vs. neqo (reno) | 93.24 ms (+1.86%) | 91.54 ms | 96.70 ms (96.43%) |
| neqo vs. neqo (reno, paced) | 95.04 ms (+2.42%) | 92.79 ms | 97.78 ms (97.19%) |
| neqo vs. quiche (cubic, paced) | 191.75 ms (-0.97%) | 193.64 ms | 196.97 ms (97.35%) |
| neqo vs. s2n (cubic, paced) | 221.72 ms (+0.26%) | 221.14 ms | 224.10 ms (98.94%) |
| quiche vs. neqo (cubic, paced) | 157.33 ms (+2.74%) | 153.14 ms | 158.50 ms (99.26%) |
| s2n vs. neqo (cubic, paced) | 173.28 ms (-0.28%) | 173.77 ms | 178.00 ms (97.35%) |
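The percentages in the table above appear to follow from the raw latencies: "Result Δ%" matches the relative change against the baseline, and "Limit %" matches the result as a share of the upper boundary. The sketch below reproduces both for the msquic vs. neqo row. The function names are hypothetical, and the formulas are inferred from the reported numbers rather than taken from the benchmarking tool's documentation.

```rust
/// Relative change of `result` against `baseline`, in percent.
fn delta_percent(result: f64, baseline: f64) -> f64 {
    (result - baseline) / baseline * 100.0
}

/// Share of the upper boundary that `result` consumes, in percent.
fn limit_percent(result: f64, upper_boundary: f64) -> f64 {
    result / upper_boundary * 100.0
}

fn main() {
    // Values from the "msquic vs. neqo (cubic, paced)" row, in milliseconds.
    let (result, baseline, upper) = (224.72, 199.30, 236.94);
    println!("delta: {:+.2}%", delta_percent(result, baseline));
    println!("limit: {:.2}%", limit_percent(result, upper));
}
```

Because the table values are already rounded to two decimals, the recomputed delta can differ from the reported one in the last digit.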
Benchmark results
Performance differences relative to b3d8f0d.
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected. time: [200.11 ms 200.47 ms 200.96 ms]
thrpt: [497.61 MiB/s 498.82 MiB/s 499.72 MiB/s]
change:
time: [−0.0850% +0.1485% +0.4403%] (p = 0.28 > 0.05)
thrpt: [−0.4384% −0.1483% +0.0851%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected. time: [299.66 ms 301.36 ms 303.08 ms]
thrpt: [32.994 Kelem/s 33.183 Kelem/s 33.371 Kelem/s]
change:
time: [−0.3116% +0.4839% +1.2104%] (p = 0.21 > 0.05)
thrpt: [−1.1959% −0.4816% +0.3126%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [28.416 ms 28.512 ms 28.630 ms]
thrpt: [34.928 B/s 35.073 B/s 35.191 B/s]
change:
time: [−0.3750% +0.0906% +0.5737%] (p = 0.71 > 0.05)
thrpt: [−0.5704% −0.0905% +0.3764%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved. time: [202.29 ms 202.62 ms 203.01 ms]
thrpt: [492.58 MiB/s 493.52 MiB/s 494.33 MiB/s]
change:
time: [−3.7342% −3.4882% −3.2606%] (p = 0.00 < 0.05)
thrpt: [+3.3705% +3.6142% +3.8791%]
decode 4096 bytes, mask ff: No change in performance detected. time: [11.613 µs 11.651 µs 11.694 µs]
change: [−0.8027% −0.1804% +0.3326%] (p = 0.57 > 0.05)
decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0185 ms 3.0278 ms 3.0387 ms]
change: [−0.8118% −0.1895% +0.3544%] (p = 0.54 > 0.05)
decode 4096 bytes, mask 7f: No change in performance detected. time: [19.948 µs 19.998 µs 20.056 µs]
change: [−0.3160% +0.1413% +0.5881%] (p = 0.57 > 0.05)
decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.0328 ms 5.0426 ms 5.0540 ms]
change: [−1.2024% −0.5035% +0.0438%] (p = 0.12 > 0.05)
decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2789 µs 8.3159 µs 8.3580 µs]
change: [+0.0483% +0.5635% +1.1728%] (p = 0.06 > 0.05)
decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.5881 ms 1.5949 ms 1.6035 ms]
change: [−2.0388% −0.4004% +0.7757%] (p = 0.67 > 0.05)
1-streams/each-1000-bytes/wallclock-time: Change within noise threshold. time: [589.89 µs 591.70 µs 593.79 µs]
change: [−1.1506% −0.6516% −0.1492%] (p = 0.01 < 0.05)
1000-streams/each-1-bytes/wallclock-time: Change within noise threshold. time: [14.032 ms 14.061 ms 14.092 ms]
change: [−0.9199% −0.6459% −0.3527%] (p = 0.00 < 0.05)
1000-streams/each-1000-bytes/wallclock-time: No change in performance detected. time: [50.776 ms 50.939 ms 51.101 ms]
change: [−0.1426% +0.5306% +1.1027%] (p = 0.10 > 0.05)
1000-streams/each-1000-bytes/simulated-time: No change in performance detected. time: [18.741 s 18.912 s 19.085 s]
thrpt: [51.170 KiB/s 51.638 KiB/s 52.110 KiB/s]
change:
time: [−0.9411% +0.3276% +1.5536%] (p = 0.62 > 0.05)
thrpt: [−1.5298% −0.3265% +0.9501%]
coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [88.191 ns 88.538 ns 88.895 ns]
change: [−0.1959% +0.3684% +1.1270%] (p = 0.28 > 0.05)
coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [106.03 ns 106.55 ns 107.30 ns]
change: [−0.2849% +0.3210% +1.0704%] (p = 0.39 > 0.05)
coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [105.58 ns 106.07 ns 106.64 ns]
change: [−0.4911% −0.0224% +0.4220%] (p = 0.92 > 0.05)
coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.723 ns 91.556 ns 98.100 ns]
change: [−1.4101% +2.7728% +10.065%] (p = 0.60 > 0.05)
RxStreamOrderer::inbound_frame(): 💚 Performance has improved. time: [102.63 ms 102.79 ms 103.07 ms]
change: [−7.6514% −7.3735% −7.0581%] (p = 0.00 < 0.05)
sent::Packets::take_ranges: No change in performance detected. time: [4.5298 µs 4.6645 µs 4.8068 µs]
change: [−2.0069% +1.3391% +5.2122%] (p = 0.52 > 0.05)
transfer/pacing-false/varying-seeds/wallclock-time/run: Change within noise threshold. time: [26.951 ms 27.000 ms 27.052 ms]
change: [+0.9397% +1.2158% +1.4901%] (p = 0.00 < 0.05)
transfer/pacing-false/varying-seeds/simulated-time/run: No change in performance detected. time: [25.129 s 25.166 s 25.204 s]
thrpt: [162.52 KiB/s 162.76 KiB/s 163.00 KiB/s]
change:
time: [−0.2319% −0.0321% +0.1692%] (p = 0.76 > 0.05)
thrpt: [−0.1689% +0.0321% +0.2325%]
transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold. time: [27.309 ms 27.379 ms 27.452 ms]
change: [+0.4884% +0.8607% +1.2416%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds/simulated-time/run: Change within noise threshold. time: [24.987 s 25.033 s 25.079 s]
thrpt: [163.32 KiB/s 163.62 KiB/s 163.92 KiB/s]
change:
time: [+0.0800% +0.3075% +0.5500%] (p = 0.01 < 0.05)
thrpt: [−0.5470% −0.3065% −0.0799%]
transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold. time: [26.419 ms 26.434 ms 26.449 ms]
change: [+0.9941% +1.1013% +1.2043%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed/simulated-time/run: No change in performance detected. time: [25.152 s 25.152 s 25.152 s]
thrpt: [162.85 KiB/s 162.85 KiB/s 162.85 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000%]
transfer/pacing-true/same-seed/wallclock-time/run: Change within noise threshold. time: [28.142 ms 28.160 ms 28.179 ms]
change: [+0.0679% +0.1823% +0.2954%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed/simulated-time/run: No change in performance detected. time: [25.588 s 25.588 s 25.588 s]
thrpt: [160.07 KiB/s 160.07 KiB/s 160.07 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000%]
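In the results above, each `thrpt` change is consistent with being derived from the corresponding `time` change: throughput is work divided by time, so a relative time change of Δt gives a throughput change of 1/(1 + Δt) − 1. The sketch below checks this against the Upload benchmark. The function name is hypothetical and the relation is inferred from the reported figures.

```rust
/// Converts a relative time change (in percent) into the implied relative
/// throughput change (in percent), assuming the amount of work is fixed.
fn throughput_change_percent(time_change_percent: f64) -> f64 {
    (1.0 / (1.0 + time_change_percent / 100.0) - 1.0) * 100.0
}

fn main() {
    // Upload benchmark: reported time change [-3.7342% -3.4882% -3.2606%].
    // The lower time bound maps to the upper throughput bound and vice versa.
    for time_change in [-3.7342, -3.4882, -3.2606] {
        println!(
            "time {:+.4}% -> thrpt {:+.4}%",
            time_change,
            throughput_change_percent(time_change)
        );
    }
}
```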
Hm. Transfer test regression, but one bench shows an improvement. Time to look at flamegraphs...

Hm. Simplifying

So what we save in
CodSpeed Performance Report
Merging #3003 will improve performance by 17.7%
Comparing
Summary
Benchmarks breakdown
Failed Interop Tests
QUIC Interop Runner, client vs. server, differences relative to b9c32c7.
neqo-latest as client
neqo-latest as server
All results
Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
neqo-latest as server
Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
neqo-latest as server
Client/server transfer results
Performance differences relative to b9c32c7. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.