feat(transport/cubic): cubic region RFC9438 updates #2983

omansfeld merged 5 commits into mozilla:main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##             main    #2983     +/-  ##
==========================================
- Coverage   93.41%   93.38%   -0.04%
==========================================
  Files         124      124
  Lines       35969    35970       +1
  Branches    35969    35970       +1
==========================================
- Hits        33602    33591      -11
- Misses       1523     1535      +12
  Partials      844      844
Pull Request Overview
This PR updates the CUBIC congestion control implementation with significant refinements to target calculation and documentation improvements. The changes enhance the algorithm's adherence to RFC 9438 specifications while removing obsolete constants.
- Updated documentation and comments to reference RFC 9438 instead of RFC 8312
- Implemented the new target formula with clamping between `cwnd` and `cwnd * 1.5`
- Removed the `EXPONENTIAL_GROWTH_REDUCTION` constant in favor of the new target clamping approach
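The clamping described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the actual neqo API; the cubic function `W_cubic(t) = C * (t - K)^3 + W_max` and the constant `C = 0.4` follow RFC 9438.

```rust
// Illustrative sketch of the new target clamping (names are assumptions,
// not neqo's real functions). Windows are in bytes, times in seconds.
const CUBIC_C: f64 = 0.4; // C constant from RFC 9438

/// W_cubic(t) = C * (t - K)^3 + W_max (RFC 9438, Section 4.2).
fn w_cubic(t: f64, k: f64, w_max: f64) -> f64 {
    CUBIC_C * (t - k).powi(3) + w_max
}

/// Target window, clamped so that cwnd <= target <= 1.5 * cwnd.
/// The upper bound caps growth below slow start's rate, which is what
/// the removed EXPONENTIAL_GROWTH_REDUCTION constant approximated.
fn target(t: f64, k: f64, w_max: f64, cwnd: f64) -> f64 {
    w_cubic(t, k, w_max).clamp(cwnd, 1.5 * cwnd)
}
```

Before the plateau (`t < K`) the raw cubic value can even be below `cwnd`, so the lower bound matters too; well past the plateau the raw value explodes and the `1.5 * cwnd` cap takes over.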
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Once the latest congestion control changes have landed in Nightly, we should closely monitor packet loss.

Worth adding a
| Branch | cubic_region_updates |
| Testbed | On-prem |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %) |
|---|---|---|---|
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 📈 view plot 🚷 view threshold | 205,030,000.00 ns(-1.74%)Baseline: 208,655,425.53 ns | 218,012,106.18 ns (94.05%) |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 📈 view plot 🚷 view threshold | 199,230,000.00 ns(-1.69%)Baseline: 202,660,797.87 ns | 212,889,465.14 ns (93.58%) |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 📈 view plot 🚷 view threshold | 28,498,000.00 ns(+0.29%)Baseline: 28,415,441.49 ns | 28,858,220.30 ns (98.75%) |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 📈 view plot 🚷 view threshold | 282,720,000.00 ns(-3.92%)Baseline: 294,259,521.28 ns | 306,283,616.21 ns (92.31%) |
| 1-streams/each-1000-bytes/simulated-time | 📈 view plot 🚷 view threshold | 119,050,000.00 ns(+0.58%)Baseline: 118,357,978.72 ns | 120,888,615.11 ns (98.48%) |
| 1-streams/each-1000-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 591,170.00 ns(-1.09%)Baseline: 597,663.46 ns | 622,601.76 ns (94.95%) |
| 1000-streams/each-1-bytes/simulated-time | 📈 view plot 🚷 view threshold | 14,996,000,000.00 ns(+0.03%)Baseline: 14,992,058,510.64 ns | 15,010,807,004.61 ns (99.90%) |
| 1000-streams/each-1-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 13,803,000.00 ns(-2.76%)Baseline: 14,194,686.17 ns | 14,973,867.89 ns (92.18%) |
| 1000-streams/each-1000-bytes/simulated-time | 📈 view plot 🚷 view threshold | 19,061,000,000.00 ns(+0.74%)Baseline: 18,921,611,702.13 ns | 19,189,763,596.48 ns (99.33%) |
| 1000-streams/each-1000-bytes/wallclock-time | 📈 view plot 🚷 view threshold | 48,754,000.00 ns(-6.45%)Baseline: 52,117,143.62 ns | 58,599,715.44 ns (83.20%) |
| RxStreamOrderer::inbound_frame() | 📈 view plot 🚷 view threshold | 107,120,000.00 ns(-2.43%)Baseline: 109,791,595.74 ns | 111,981,697.30 ns (95.66%) |
| coalesce_acked_from_zero 1+1 entries | 📈 view plot 🚷 view threshold | 88.66 ns(+0.04%)Baseline: 88.62 ns | 89.29 ns (99.29%) |
| coalesce_acked_from_zero 10+1 entries | 📈 view plot 🚷 view threshold | 105.63 ns(-0.42%)Baseline: 106.08 ns | 107.08 ns (98.65%) |
| coalesce_acked_from_zero 1000+1 entries | 📈 view plot 🚷 view threshold | 88.96 ns(-0.95%)Baseline: 89.82 ns | 94.42 ns (94.21%) |
| coalesce_acked_from_zero 3+1 entries | 📈 view plot 🚷 view threshold | 106.39 ns(-0.19%)Baseline: 106.59 ns | 107.55 ns (98.92%) |
| decode 1048576 bytes, mask 3f | 📈 view plot 🚷 view threshold | 1,590,600.00 ns(-0.12%)Baseline: 1,592,520.21 ns | 1,599,563.62 ns (99.44%) |
| decode 1048576 bytes, mask 7f | 📈 view plot 🚷 view threshold | 5,059,500.00 ns(+0.04%)Baseline: 5,057,326.06 ns | 5,076,842.39 ns (99.66%) |
| decode 1048576 bytes, mask ff | 📈 view plot 🚷 view threshold | 3,033,100.00 ns(+0.05%)Baseline: 3,031,698.94 ns | 3,043,449.39 ns (99.66%) |
| decode 4096 bytes, mask 3f | 📈 view plot 🚷 view threshold | 8,317.10 ns(+0.25%)Baseline: 8,296.37 ns | 8,343.26 ns (99.69%) |
| decode 4096 bytes, mask 7f | 📈 view plot 🚷 view threshold | 20,047.00 ns(+0.19%)Baseline: 20,008.22 ns | 20,087.14 ns (99.80%) |
| decode 4096 bytes, mask ff | 📈 view plot 🚷 view threshold | 11,656.00 ns(-0.60%)Baseline: 11,726.27 ns | 11,970.14 ns (97.38%) |
| sent::Packets::take_ranges | 📈 view plot 🚷 view threshold | 4,668.30 ns(-1.67%)Baseline: 4,747.42 ns | 4,986.14 ns (93.63%) |
| transfer/pacing-false/same-seed/simulated-time/run | 📈 view plot 🚷 view threshold | 25,710,000,000.00 ns(+1.72%)Baseline: 25,275,000,000.00 ns | 25,817,812,451.62 ns (99.58%) |
| transfer/pacing-false/same-seed/wallclock-time/run | 📈 view plot 🚷 view threshold | 26,156,000.00 ns(+0.63%)Baseline: 25,993,139.78 ns | 27,070,349.68 ns (96.62%) |
| transfer/pacing-false/varying-seeds/simulated-time/run | 📈 view plot 🚷 view threshold | 25,176,000,000.00 ns(+0.03%)Baseline: 25,168,268,817.20 ns | 25,215,941,018.90 ns (99.84%) |
| transfer/pacing-false/varying-seeds/wallclock-time/run | 📈 view plot 🚷 view threshold | 25,500,000.00 ns(-2.66%)Baseline: 26,197,709.68 ns | 27,647,732.39 ns (92.23%) |
| transfer/pacing-true/same-seed/simulated-time/run | 📈 view plot 🚷 view threshold | 25,675,000,000.00 ns(+0.26%)Baseline: 25,607,177,419.35 ns | 25,691,809,468.26 ns (99.93%) |
| transfer/pacing-true/same-seed/wallclock-time/run | 📈 view plot 🚷 view threshold | 27,187,000.00 ns(-0.76%)Baseline: 27,395,930.11 ns | 28,804,606.05 ns (94.38%) |
| transfer/pacing-true/varying-seeds/simulated-time/run | 📈 view plot 🚷 view threshold | 24,959,000,000.00 ns(-0.14%)Baseline: 24,994,043,010.75 ns | 25,044,262,140.71 ns (99.66%) |
| transfer/pacing-true/varying-seeds/wallclock-time/run | 📈 view plot 🚷 view threshold | 26,007,000.00 ns(-2.68%)Baseline: 26,724,134.41 ns | 28,219,820.17 ns (92.16%) |
I don't think it's necessary, as it mathematically cannot happen (except with unrealistically small cwnd values). Just as an example, if we had a cwnd of

That said, I'm not opposed to putting in a
Just one more line of code. No other downside.
| Branch | cubic_region_updates |
| Testbed | On-prem |

🚨 1 Alert

| Iteration | Benchmark | Measure Units | View | Benchmark Result (Result Δ%) | Upper Boundary (Limit %) |
|---|---|---|---|---|---|
| 9 | neqo vs. s2n (cubic, paced) | Latency milliseconds (ms) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 226.24 ms (+2.44%) Baseline: 220.85 ms | 223.80 ms (101.09%) |
Click to view all benchmark results

| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| google vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 275.35 ms (-0.84%) Baseline: 277.68 ms | 280.33 ms (98.23%) |
| msquic vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 196.83 ms (+1.33%) Baseline: 194.24 ms | 227.67 ms (86.45%) |
| neqo vs. google (cubic, paced) | 📈 view plot 🚷 view threshold | 757.64 ms (+0.00%) Baseline: 757.63 ms | 764.85 ms (99.06%) |
| neqo vs. msquic (cubic, paced) | 📈 view plot 🚷 view threshold | 159.64 ms (+1.50%) Baseline: 157.27 ms | 159.70 ms (99.96%) |
| neqo vs. neqo (cubic) | 📈 view plot 🚷 view threshold | 89.32 ms (-1.44%) Baseline: 90.63 ms | 94.43 ms (94.59%) |
| neqo vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 90.10 ms (-1.90%) Baseline: 91.84 ms | 95.56 ms (94.29%) |
| neqo vs. neqo (reno) | 📈 view plot 🚷 view threshold | 89.47 ms (-1.23%) Baseline: 90.59 ms | 93.95 ms (95.24%) |
| neqo vs. neqo (reno, paced) | 📈 view plot 🚷 view threshold | 93.08 ms (+1.37%) Baseline: 91.82 ms | 95.42 ms (97.54%) |
| neqo vs. quiche (cubic, paced) | 📈 view plot 🚷 view threshold | 193.10 ms (-0.35%) Baseline: 193.77 ms | 197.23 ms (97.91%) |
| neqo vs. s2n (cubic, paced) | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 226.24 ms (+2.44%) Baseline: 220.85 ms | 223.80 ms (101.09%) |
| quiche vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 155.72 ms (+2.05%) Baseline: 152.59 ms | 158.25 ms (98.41%) |
| s2n vs. neqo (cubic, paced) | 📈 view plot 🚷 view threshold | 175.94 ms (+1.15%) Baseline: 173.94 ms | 178.08 ms (98.80%) |
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)

neqo-transport/src/cc/cubic.rs:
- The phrase should be 'no more than' instead of 'less than' to accurately reflect the RFC constraint that CUBIC's increase rate should not exceed slow start's rate.
- comments and docs
- new `target` formula clamping to min and max value (`cwnd <= target <= cwnd * 1.5`)
- removing `EXPONENTIAL_GROWTH_REDUCTION` since its use is now covered by the max value mentioned above
- linux reference links
- renaming variables
- `debug_assert!` to check exponential growth of `w_est`
- add predicate for small cwnd values
- move assert and change the message
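The `debug_assert!` on `w_est` growth from the list above could look roughly like this. The names and the update form are assumptions, loosely following RFC 9438's Reno-friendly region (`α_cubic = 3 * (1 - β_cubic) / (1 + β_cubic)` with `β_cubic = 0.7`); this is not the actual neqo code.

```rust
// Hypothetical sketch: Reno-friendly w_est update with a debug assertion
// that growth stays below slow start's doubling-per-RTT rate.
const BETA_CUBIC: f64 = 0.7; // β_cubic from RFC 9438

/// Advance w_est on an ACK: w_est += α_cubic * (acked_bytes / cwnd) * mss.
fn update_w_est(w_est: f64, cwnd: f64, acked_bytes: f64, mss: f64) -> f64 {
    let alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC); // ~0.53
    let next = w_est + alpha * (acked_bytes / cwnd) * mss;
    // Additive increase can never double the window within one update,
    // i.e. it must grow more slowly than slow start would.
    debug_assert!(next < 2.0 * w_est, "w_est grew at slow-start rate or faster");
    next
}
```

As the later review note says, this assertion was ultimately dropped, since the RFC does not require such a limit in the Reno region.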
mxinden left a comment
For others: the debug_assert was removed after an out-of-band discussion. The RFC does not require the limit in the Reno region. The assumption is that it is fine to follow just the RFC.
Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 791fd40.

neqo-latest as client
neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client
neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client
neqo-latest as server
Client/server transfer results

Performance differences relative to b90ded6. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Download data for
Benchmark results

Performance differences relative to b90ded6.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold. time: [198.86 ms 199.23 ms 199.67 ms]
thrpt: [500.83 MiB/s 501.94 MiB/s 502.88 MiB/s]
change:
time: [−0.5566% −0.2927% −0.0386%] (p = 0.03 < 0.05)
thrpt: [+0.0387% +0.2936% +0.5597%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold. time: [280.98 ms 282.72 ms 284.52 ms]
thrpt: [35.147 Kelem/s 35.370 Kelem/s 35.590 Kelem/s]
change:
time: [−2.5094% −1.6343% −0.6886%] (p = 0.00 < 0.05)
thrpt: [+0.6934% +1.6614% +2.5740%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [28.445 ms 28.498 ms 28.571 ms]
thrpt: [35.000 B/s 35.090 B/s 35.155 B/s]
change:
time: [−0.3734% −0.0046% +0.3304%] (p = 0.98 > 0.05)
thrpt: [−0.3293% +0.0046% +0.3748%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: Change within noise threshold. time: [204.78 ms 205.03 ms 205.28 ms]
thrpt: [487.14 MiB/s 487.73 MiB/s 488.34 MiB/s]
change:
time: [+0.6909% +0.8881% +1.0673%] (p = 0.00 < 0.05)
thrpt: [−1.0560% −0.8803% −0.6862%]
decode 4096 bytes, mask ff: No change in performance detected. time: [11.605 µs 11.656 µs 11.721 µs]
change: [−0.0692% +0.4745% +1.1337%] (p = 0.14 > 0.05)
decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0212 ms 3.0331 ms 3.0475 ms]
change: [−0.4890% +0.0817% +0.6473%] (p = 0.79 > 0.05)
decode 4096 bytes, mask 7f: No change in performance detected. time: [19.984 µs 20.047 µs 20.112 µs]
change: [−0.1467% +0.3306% +0.8997%] (p = 0.24 > 0.05)
decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.0479 ms 5.0595 ms 5.0728 ms]
change: [−0.2863% +0.0679% +0.4199%] (p = 0.70 > 0.05)
decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2820 µs 8.3171 µs 8.3584 µs]
change: [−3.3451% −0.7110% +1.0209%] (p = 0.66 > 0.05)
decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.5854 ms 1.5906 ms 1.5969 ms]
change: [−0.4174% +0.0858% +0.5841%] (p = 0.74 > 0.05)
1-streams/each-1000-bytes/wallclock-time: No change in performance detected. time: [589.03 µs 591.17 µs 593.62 µs]
change: [−1.1062% −0.2596% +0.4791%] (p = 0.54 > 0.05)
1000-streams/each-1-bytes/wallclock-time: Change within noise threshold. time: [13.779 ms 13.803 ms 13.827 ms]
change: [−1.4949% −0.8003% −0.2708%] (p = 0.01 < 0.05)
1000-streams/each-1000-bytes/wallclock-time: Change within noise threshold. time: [48.591 ms 48.754 ms 48.923 ms]
change: [−1.2857% −0.7961% −0.3212%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [88.300 ns 88.657 ns 88.998 ns]
change: [−0.7076% −0.0900% +0.4401%] (p = 0.76 > 0.05)
coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [106.04 ns 106.39 ns 106.76 ns]
change: [−3.3446% −0.7650% +0.8209%] (p = 0.68 > 0.05)
coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [105.29 ns 105.63 ns 106.07 ns]
change: [−0.3605% +0.2551% +0.8561%] (p = 0.45 > 0.05)
coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.850 ns 88.961 ns 89.097 ns]
change: [−1.1690% −0.2602% +0.6253%] (p = 0.58 > 0.05)
RxStreamOrderer::inbound_frame(): 💚 Performance has improved. time: [107.05 ms 107.12 ms 107.19 ms]
change: [−1.3466% −1.2436% −1.1448%] (p = 0.00 < 0.05)
sent::Packets::take_ranges: No change in performance detected. time: [4.5999 µs 4.6683 µs 4.7373 µs]
change: [−3.0742% +0.1722% +3.6012%] (p = 0.92 > 0.05)
transfer/pacing-false/varying-seeds/wallclock-time/run: Change within noise threshold. time: [25.450 ms 25.500 ms 25.558 ms]
change: [−0.7068% −0.4516% −0.1821%] (p = 0.00 < 0.05)
transfer/pacing-false/varying-seeds/simulated-time/run: No change in performance detected. time: [25.139 s 25.176 s 25.214 s]
thrpt: [162.45 KiB/s 162.69 KiB/s 162.94 KiB/s]
change:
time: [−0.3739% −0.1767% +0.0336%] (p = 0.09 > 0.05)
thrpt: [−0.0335% +0.1770% +0.3753%]
transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold. time: [25.936 ms 26.007 ms 26.083 ms]
change: [+1.5088% +1.8631% +2.2585%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds/simulated-time/run: No change in performance detected. time: [24.921 s 24.959 s 25.000 s]
thrpt: [163.84 KiB/s 164.11 KiB/s 164.36 KiB/s]
change:
time: [−0.3303% −0.1258% +0.0979%] (p = 0.27 > 0.05)
thrpt: [−0.0978% +0.1259% +0.3314%]
transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold. time: [26.137 ms 26.156 ms 26.175 ms]
change: [+2.1238% +2.3229% +2.4737%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed/simulated-time/run: No change in performance detected. time: [25.710 s 25.710 s 25.710 s]
thrpt: [159.31 KiB/s 159.31 KiB/s 159.31 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000%]
transfer/pacing-true/same-seed/wallclock-time/run: Change within noise threshold. time: [27.163 ms 27.187 ms 27.212 ms]
change: [+0.8877% +1.1122% +1.2727%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed/simulated-time/run: No change in performance detected. time: [25.675 s 25.675 s 25.675 s]
thrpt: [159.53 KiB/s 159.53 KiB/s 159.53 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000%]
Download data for
- use `max_datagram_size_f64` across functions (#2994)
- use `saturating_duration_since` to get time since epoch start (#2997)
- new `target` formula clamping to min and max value (`cwnd <= target <= cwnd * 1.5`)
- remove `EXPONENTIAL_GROWTH_REDUCTION` since its use is now covered by the max value mentioned above

Split off of #2535.
Part of #2967.
EDIT 2025-10-13:
I rebased after parts of this have already landed (see above checklist). This PR now contains the last updates to the cubic region and should be ready for review.
In the old code we'd also apply the clamping to `1.5 * curr_cwnd` to the reno region, since it was applied to the return value, not to the cubic region calculation result. Since this clamping is supposed to prevent exponential growth ("less than the increase rate of slow start", RFC 9438), it doesn't need to be applied to the reno region, which grows additively by design.
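The control flow this paragraph describes might be sketched as follows; names are hypothetical and this is a sketch of the region split, not the actual implementation.

```rust
// Sketch: the clamp now applies only to the cubic-region result. The
// Reno-friendly region returns w_est unclamped, because its additive
// increase cannot grow exponentially by construction.
fn next_cwnd(w_cubic: f64, w_est: f64, cwnd: f64) -> f64 {
    if w_cubic < w_est {
        // Reno-friendly region: additive growth, no clamp needed.
        w_est
    } else {
        // Cubic region: enforce cwnd <= target <= 1.5 * cwnd.
        w_cubic.clamp(cwnd, 1.5 * cwnd)
    }
}
```

In the old code the equivalent of the `clamp` was applied after this `if`/`else`, so it also capped the Reno-friendly branch.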