feat(transport/cubic): simplify byte tracking across acks #3043
omansfeld wants to merge 1 commit into mozilla:main from simplify_byte_tracking
Conversation
We used to only increase `w_est` by multiples of `max_datagram_size`, thus not using up all of the `new_acked_bytes`. Therefore, we had to calculate how many bytes were actually used, subtract those, and track the leftover bytes across calls to `bytes_for_cwnd_increase`. This patch gets rid of that logic, always applying all of `new_bytes_acked`. The congestion window will still only increase in `max_datagram_size`-sized steps through the logic in `ClassicCongestionControl::on_packets_acked`, but we no longer need the leftover byte tracking.
Pull Request Overview
This PR simplifies the byte tracking logic in the Cubic congestion control algorithm by removing leftover byte accounting across acknowledgments. The change eliminates the complex logic that tracked unused bytes when increasing w_est in discrete max_datagram_size steps.
- Removes the `reno_acked_bytes` field and associated leftover byte tracking
- Simplifies the `w_est` calculation to always apply all acknowledged bytes
- Updates the `start_epoch` method signature to remove unused parameters
Failed Interop Tests (QUIC Interop Runner, client vs. server)
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests (QUIC Interop Runner, client vs. server)
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests (QUIC Interop Runner, client vs. server)
- neqo-latest as client
- neqo-latest as server
Client/server transfer results
Performance differences relative to 3d2948d. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Benchmark results
Performance differences relative to 3d2948d.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected. time: [200.10 ms 200.44 ms 200.78 ms]
thrpt: [498.05 MiB/s 498.91 MiB/s 499.74 MiB/s]
change:
time: [−0.4354% −0.0993% +0.2096%] (p = 0.56 > 0.05)
thrpt: [−0.2092% +0.0994% +0.4373%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected. time: [283.33 ms 285.04 ms 286.73 ms]
thrpt: [34.876 Kelem/s 35.083 Kelem/s 35.294 Kelem/s]
change:
time: [−0.8581% −0.0201% +0.8322%] (p = 0.97 > 0.05)
thrpt: [−0.8254% +0.0201% +0.8655%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [28.485 ms 28.552 ms 28.635 ms]
thrpt: [34.923 B/s 35.024 B/s 35.106 B/s]
change:
time: [+0.0020% +0.2560% +0.5447%] (p = 0.08 > 0.05)
thrpt: [−0.5418% −0.2553% −0.0020%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: Change within noise threshold. time: [202.75 ms 203.10 ms 203.54 ms]
thrpt: [491.30 MiB/s 492.37 MiB/s 493.21 MiB/s]
change:
time: [−1.4531% −1.2138% −0.9722%] (p = 0.00 < 0.05)
thrpt: [+0.9818% +1.2287% +1.4745%]
decode 4096 bytes, mask ff: No change in performance detected. time: [11.610 µs 11.654 µs 11.703 µs]
change: [−0.3157% +0.1136% +0.5898%] (p = 0.63 > 0.05)
decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0236 ms 3.0444 ms 3.0769 ms]
change: [−0.3433% +0.4510% +1.6505%] (p = 0.43 > 0.05)
decode 4096 bytes, mask 7f: No change in performance detected. time: [19.933 µs 19.977 µs 20.029 µs]
change: [−0.2619% +0.0746% +0.4171%] (p = 0.69 > 0.05)
decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.0417 ms 5.0514 ms 5.0627 ms]
change: [−0.5101% −0.1719% +0.1508%] (p = 0.32 > 0.05)
decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2552 µs 8.2812 µs 8.3147 µs]
change: [−1.4609% −0.6251% −0.0075%] (p = 0.09 > 0.05)
decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.5869 ms 1.5937 ms 1.6008 ms]
change: [−0.5145% +0.0899% +0.6983%] (p = 0.77 > 0.05)
1-streams/each-1000-bytes/wallclock-time: No change in performance detected. time: [584.23 µs 585.14 µs 586.22 µs]
change: [−0.1366% +0.2323% +0.5765%] (p = 0.21 > 0.05)
1000-streams/each-1-bytes/wallclock-time: No change in performance detected. time: [13.733 ms 13.757 ms 13.781 ms]
change: [−0.3427% −0.0705% +0.1944%] (p = 0.61 > 0.05)
1000-streams/each-1000-bytes/wallclock-time: No change in performance detected. time: [49.495 ms 49.654 ms 49.811 ms]
change: [−0.4862% −0.0411% +0.4140%] (p = 0.86 > 0.05)
1000-streams/each-1000-bytes/simulated-time: 💚 Performance has improved. time: [18.327 s 18.494 s 18.664 s]
thrpt: [52.324 KiB/s 52.803 KiB/s 53.286 KiB/s]
change:
time: [−3.9222% −2.7166% −1.4744%] (p = 0.00 < 0.05)
thrpt: [+1.4964% +2.7925% +4.0823%]
coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [87.919 ns 88.213 ns 88.504 ns]
change: [−0.4815% −0.0624% +0.3642%] (p = 0.78 > 0.05)
coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [105.95 ns 106.31 ns 106.68 ns]
change: [−0.3207% +0.1870% +0.7937%] (p = 0.52 > 0.05)
coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [105.29 ns 105.70 ns 106.19 ns]
change: [−0.3559% +0.1534% +0.7735%] (p = 0.59 > 0.05)
coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.972 ns 89.089 ns 89.218 ns]
change: [−0.8392% −0.0277% +0.7337%] (p = 0.95 > 0.05)
RxStreamOrderer::inbound_frame(): No change in performance detected. time: [107.85 ms 108.01 ms 108.28 ms]
change: [−0.6770% −0.3155% +0.0171%] (p = 0.06 > 0.05)
sent::Packets::take_ranges: No change in performance detected. time: [4.5245 µs 4.5888 µs 4.6415 µs]
change: [−3.1125% −0.4276% +2.3765%] (p = 0.77 > 0.05)
transfer/pacing-false/varying-seeds/wallclock-time/run: Change within noise threshold. time: [25.210 ms 25.249 ms 25.288 ms]
change: [+1.0093% +1.2297% +1.4401%] (p = 0.00 < 0.05)
transfer/pacing-false/varying-seeds/simulated-time/run: Change within noise threshold. time: [25.067 s 25.098 s 25.129 s]
thrpt: [163.00 KiB/s 163.20 KiB/s 163.41 KiB/s]
change:
time: [−0.4680% −0.2891% −0.1010%] (p = 0.00 < 0.05)
thrpt: [+0.1011% +0.2899% +0.4702%]
transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold. time: [25.717 ms 25.776 ms 25.836 ms]
change: [+0.7246% +1.1115% +1.4825%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds/simulated-time/run: No change in performance detected. time: [24.922 s 24.957 s 24.991 s]
thrpt: [163.90 KiB/s 164.12 KiB/s 164.36 KiB/s]
change:
time: [−0.4044% −0.1997% +0.0016%] (p = 0.05 > 0.05)
thrpt: [−0.0016% +0.2001% +0.4060%]
transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold. time: [25.074 ms 25.093 ms 25.112 ms]
change: [−0.7422% −0.5707% −0.4276%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed/simulated-time/run: Change within noise threshold. time: [25.492 s 25.492 s 25.492 s]
thrpt: [160.68 KiB/s 160.68 KiB/s 160.68 KiB/s]
change:
time: [−0.8478% −0.8478% −0.8478%] (p = 0.00 < 0.05)
thrpt: [+0.8551% +0.8551% +0.8551%]
transfer/pacing-true/same-seed/wallclock-time/run: No change in performance detected. time: [26.121 ms 26.147 ms 26.186 ms]
change: [−0.0375% +0.1950% +0.3844%] (p = 0.07 > 0.05)
transfer/pacing-true/same-seed/simulated-time/run: 💚 Performance has improved. time: [25.131 s 25.131 s 25.131 s]
thrpt: [162.99 KiB/s 162.99 KiB/s 162.99 KiB/s]
change:
time: [−2.1210% −2.1210% −2.1210%] (p = 0.00 < 0.05)
thrpt: [+2.1669% +2.1669% +2.1669%]
| Branch | simplify_byte_tracking |
| Testbed | On-prem |
🚨 1 Alert
| Benchmark | Measure Units | View | Benchmark Result (Result Δ%) | Upper Boundary (Limit %) |
|---|---|---|---|---|
| decode 1048576 bytes, mask ff | Latency milliseconds (ms) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 3.04 ms(+0.42%)Baseline: 3.03 ms | 3.04 ms (100.01%) |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %) |
|---|---|---|---|
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 📈 view plot 🚷 view threshold | 203,100,000.00 ns(-2.93%)Baseline: 209,237,251.46 ns | 217,763,767.26 ns (93.27%) |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 📈 view plot 🚷 view threshold | 200,440,000.00 ns(-1.39%)Baseline: 203,266,140.35 ns | 212,732,889.41 ns (94.22%) |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 📈 view plot 🚷 view threshold | 28,552,000.00 ns(+0.49%)Baseline: 28,413,152.05 ns | 28,876,403.00 ns (98.88%) |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 📈 view plot 🚷 view threshold | 285,040,000.00 ns(-3.38%)Baseline: 295,007,192.98 ns | 306,049,858.23 ns (93.14%) |
| 1-streams/each-1000-bytes/simulated-time | 📈 view plot 🚷 view threshold | 118,850,000.00 ns(+0.48%)Baseline: 118,287,894.74 ns | 120,885,411.72 ns (98.32%) |
| 1-streams/each-1000-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 585,140.00 ns(-2.26%)Baseline: 598,650.64 ns | 623,493.03 ns (93.85%) |
| 1000-streams/each-1-bytes/simulated-time | 📈 view plot 🚷 view threshold | 14,990,000,000.00 ns(-0.01%)Baseline: 14,991,888,888.89 ns | 15,010,601,829.65 ns (99.86%) |
| 1000-streams/each-1-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 13,757,000.00 ns(-3.36%)Baseline: 14,235,181.29 ns | 14,984,963.90 ns (91.81%) |
| 1000-streams/each-1000-bytes/simulated-time | 📈 view plot 🚷 view threshold | 18,494,000,000.00 ns(-2.16%)Baseline: 18,903,029,239.77 ns | 19,156,891,572.29 ns (96.54%) |
| 1000-streams/each-1000-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 49,654,000.00 ns(-5.27%)Baseline: 52,418,532.16 ns | 58,797,744.06 ns (84.45%) |
| RxStreamOrderer::inbound_frame() | 📈 view plot 🚷 view threshold | 108,010,000.00 ns(-1.69%)Baseline: 109,867,543.86 ns | 112,039,815.51 ns (96.40%) |
| coalesce_acked_from_zero 1+1 entries | 📈 view plot 🚷 view threshold | 88.21 ns(-0.48%)Baseline: 88.64 ns | 89.32 ns (98.76%) |
| coalesce_acked_from_zero 10+1 entries | 📈 view plot 🚷 view threshold | 105.70 ns(-0.39%)Baseline: 106.11 ns | 107.10 ns (98.69%) |
| coalesce_acked_from_zero 1000+1 entries | 📈 view plot 🚷 view threshold | 89.09 ns(-0.86%)Baseline: 89.86 ns | 94.57 ns (94.21%) |
| coalesce_acked_from_zero 3+1 entries | 📈 view plot 🚷 view threshold | 106.31 ns(-0.30%)Baseline: 106.63 ns | 107.59 ns (98.81%) |
| decode 1048576 bytes, mask 3f | 📈 view plot 🚷 view threshold | 1,593,700.00 ns(+0.07%)Baseline: 1,592,559.65 ns | 1,599,660.71 ns (99.63%) |
| decode 1048576 bytes, mask 7f | 📈 view plot 🚷 view threshold | 5,051,400.00 ns(-0.12%)Baseline: 5,057,223.98 ns | 5,077,326.61 ns (99.49%) |
| decode 1048576 bytes, mask ff | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 3,044,400.00 ns(+0.42%)Baseline: 3,031,724.56 ns | 3,043,954.93 ns (100.01%) |
| decode 4096 bytes, mask 3f | 📈 view plot 🚷 view threshold | 8,281.20 ns(-0.19%)Baseline: 8,296.88 ns | 8,345.07 ns (99.23%) |
| decode 4096 bytes, mask 7f | 📈 view plot 🚷 view threshold | 19,977.00 ns(-0.15%)Baseline: 20,007.11 ns | 20,086.25 ns (99.46%) |
| decode 4096 bytes, mask ff | 📈 view plot 🚷 view threshold | 11,654.00 ns(-0.70%)Baseline: 11,735.81 ns | 11,980.18 ns (97.28%) |
| sent::Packets::take_ranges | 📈 view plot 🚷 view threshold | 4,588.80 ns(-3.37%)Baseline: 4,748.60 ns | 4,992.88 ns (91.91%) |
| transfer/pacing-false/same-seed/simulated-time/run | 📈 view plot 🚷 view threshold | 25,492,000,000.00 ns(+1.04%)Baseline: 25,229,952,662.72 ns | 25,681,635,982.41 ns (99.26%) |
| transfer/pacing-false/same-seed/wallclock-time/run | 📈 view plot 🚷 view threshold | 25,093,000.00 ns(-3.67%)Baseline: 26,048,130.18 ns | 27,075,939.86 ns (92.68%) |
| transfer/pacing-false/varying-seeds/simulated-time/run | 📈 view plot 🚷 view threshold | 25,098,000,000.00 ns(-0.27%)Baseline: 25,166,094,674.56 ns | 25,212,596,997.08 ns (99.55%) |
| transfer/pacing-false/varying-seeds/wallclock-time/run | 📈 view plot 🚷 view threshold | 25,249,000.00 ns(-4.02%)Baseline: 26,305,633.14 ns | 27,566,364.23 ns (91.59%) |
| transfer/pacing-true/same-seed/simulated-time/run | 📈 view plot 🚷 view threshold | 25,131,000,000.00 ns(-1.82%)Baseline: 25,597,136,094.67 ns | 25,706,861,265.59 ns (97.76%) |
| transfer/pacing-true/same-seed/wallclock-time/run | 📈 view plot 🚷 view threshold | 26,147,000.00 ns(-4.88%)Baseline: 27,489,781.07 ns | 28,754,577.59 ns (90.93%) |
| transfer/pacing-true/varying-seeds/simulated-time/run | 📈 view plot 🚷 view threshold | 24,957,000,000.00 ns(-0.15%)Baseline: 24,993,822,485.21 ns | 25,044,357,554.26 ns (99.65%) |
| transfer/pacing-true/varying-seeds/wallclock-time/run | 📈 view plot 🚷 view threshold | 25,776,000.00 ns(-3.96%)Baseline: 26,838,065.09 ns | 28,130,574.71 ns (91.63%) |
As visible in CI below, this patch does change some behavior: we have 2 tests failing, so the congestion window growth is apparently different.

Because `w_est` is no longer floored to an `mss`-sized value, it is now potentially larger than before. And since capping the `cwnd` increase to `mss`-sized steps only happens in `ClassicCongestionControl::on_packets_acked`, the return value of `bytes_for_cwnd_increase` changes, too. That means we now need fewer acked bytes to increase by `1 * mss` in `on_packets_acked`.

neqo/neqo-transport/src/cc/classic_cc.rs
Lines 238 to 265 in 161476a
While this should in theory only lead to faster growth of the congestion window, when combined with the double increase we are sometimes doing, it leads to odd behavior in our tests. Similar behavior was first seen and described in this comment: #2535 (comment)
I think this patch is the correct approach, but per all of the above it needs more time and testing to verify it doesn't lead to unwanted behavior.
I'm uploading this draft PR with failing tests as a next step/reminder/reference per @mxinden's ask in #2973 (review).
Part of #3053