feat(transport/cc): make slow start pluggable and add hystart++#3278
feat(transport/cc): make slow start pluggable and add hystart++#3278larseggert merged 19 commits intomozilla:mainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3278 +/- ##
==========================================
- Coverage 94.12% 94.00% -0.13%
==========================================
Files 125 131 +6
Lines 38339 38811 +472
Branches 38339 38811 +472
==========================================
+ Hits 36086 36483 +397
- Misses 1412 1481 +69
- Partials 841 847 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
mxinden
left a comment
There was a problem hiding this comment.
In favor of the design itself.
I suggest only merging here once we have more than one slow start algorithm. Each pull request should improve the state of main. Merging here now only adds a dead abstraction, i.e. SlowStart, only implemented once. (Similar to trait CongestionControl today.)
I very much get that point. That said, what would you say is the best point to move forward with this work then? Having a bunch of dependent PRs seems like it could easily become a rebasing nightmare and one big PR seems like it could become very cumbersome to review. I would also very much like to have the along-the-way feedback of incremental PRs. Should this be reviewed and merged onto a feature-branch that then eventually gets merged onto main? |
As per @mxinden's comment I made this PR a draft again and will also add HyStart++ as part of it. I might still ask for review on single commits along the way to have some incremental feedback. |
ba2537b to
36308ef
Compare
Merging this PR will degrade performance by 12.71%
Performance Changes
Comparing |
|
I added another commit to make slow start configurable via connection parameters and the CLI. Would welcome another review once you have some time @mxinden! Also a question, just to make sure I understood the parameter code structure correctly:
neqo/neqo-transport/src/sender.rs Lines 36 to 44 in adcd4bd But Lines 234 to 243 in adcd4bd But from Firefox we directly set our |
correct
correct
|
mxinden
left a comment
There was a problem hiding this comment.
Changes here thus far look good to me.
beb57b7 to
86b496d
Compare
|
The HyStart++ implementation is done now, so I'm opening up for review! I updated the main PR description to resemble the current state. |
- there was a bug where we would exit css, but the rtt was still above the threshold for entering css, so we re-entered on the same ack - now we either are in css and can exit, or we are not in css and can enter, but not go back and forth in a single ack - this is in line with the RFC and with what other implementations are doing (msquic, linux, freeBSD) - this results in slightly more aggressive growth, since now we have non-css growth on the ack that exits css and only go back into css on the next ack if the css exit was truly spurious
4fde4f7 to
6d0022b
Compare
mxinden
left a comment
There was a problem hiding this comment.
Thanks for the changes. I will take another look.
Can you document this decision in the code? Sorry in case it is already documented and I missed it. |
- also fixing some nits
mxinden
left a comment
There was a problem hiding this comment.
Ready to merge from my end.
I will take another look tomorrow with a fresh mind, but that doesn't need to block here.
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
Client/server transfer resultsPerformance differences relative to 70a0ba6. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
Benchmark resultsSignificant performance differences relative to 70a0ba6. transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: 💔 Performance has regressed by +1.7538%. time: [201.36 ms 201.96 ms 202.67 ms]
thrpt: [493.42 MiB/s 495.15 MiB/s 496.63 MiB/s]
change:
time: [+1.3976% +1.7538% +2.1063] (p = 0.00 < 0.05)
thrpt: [-2.0629% -1.7236% -1.3784]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💔 Performance has regressed by +1.9825%. time: [205.70 ms 206.09 ms 206.52 ms]
thrpt: [484.22 MiB/s 485.22 MiB/s 486.14 MiB/s]
change:
time: [+1.6768% +1.9825% +2.2961] (p = 0.00 < 0.05)
thrpt: [-2.2445% -1.9439% -1.6492]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severestreams/walltime/1000-streams/each-1000-bytes: 💔 Performance has regressed by +4.0151%. time: [46.323 ms 46.386 ms 46.459 ms]
change: [+3.8356% +4.0151% +4.2065] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severeAll resultstransfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: 💔 Performance has regressed by +1.7538%. time: [201.36 ms 201.96 ms 202.67 ms]
thrpt: [493.42 MiB/s 495.15 MiB/s 496.63 MiB/s]
change:
time: [+1.3976% +1.7538% +2.1063] (p = 0.00 < 0.05)
thrpt: [-2.0629% -1.7236% -1.3784]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severetransfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: Change within noise threshold. time: [280.19 ms 282.00 ms 283.81 ms]
thrpt: [35.235 Kelem/s 35.462 Kelem/s 35.690 Kelem/s]
change:
time: [-1.9902% -1.0605% -0.1034] (p = 0.02 < 0.05)
thrpt: [+0.1035% +1.0719% +2.0306]
Change within noise threshold.transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected. time: [38.585 ms 38.745 ms 38.923 ms]
thrpt: [25.692 B/s 25.810 B/s 25.917 B/s]
change:
time: [-0.3016% +0.2723% +0.8239] (p = 0.37 > 0.05)
thrpt: [-0.8171% -0.2715% +0.3025]
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💔 Performance has regressed by +1.9825%. time: [205.70 ms 206.09 ms 206.52 ms]
thrpt: [484.22 MiB/s 485.22 MiB/s 486.14 MiB/s]
change:
time: [+1.6768% +1.9825% +2.2961] (p = 0.00 < 0.05)
thrpt: [-2.2445% -1.9439% -1.6492]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severedecode 4096 bytes, mask ff: No change in performance detected. time: [5.5696 µs 5.5794 µs 5.5903 µs]
change: [-0.7636% -0.1624% +0.3988] (p = 0.63 > 0.05)
No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
9 (9.00%) high mild
9 (9.00%) high severedecode 1048576 bytes, mask ff: No change in performance detected. time: [1.4285 ms 1.4339 ms 1.4426 ms]
change: [+0.0206% +0.4687% +1.1248] (p = 0.08 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severedecode 4096 bytes, mask 7f: No change in performance detected. time: [8.4654 µs 8.4788 µs 8.4933 µs]
change: [-0.3905% -0.0249% +0.2978] (p = 0.89 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severedecode 1048576 bytes, mask 7f: No change in performance detected. time: [2.1760 ms 2.1837 ms 2.1962 ms]
change: [-0.2251% +0.1621% +0.8014] (p = 0.62 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severedecode 4096 bytes, mask 3f: No change in performance detected. time: [6.9147 µs 6.9251 µs 6.9363 µs]
change: [-0.1618% +0.1666% +0.4920] (p = 0.33 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severedecode 1048576 bytes, mask 3f: No change in performance detected. time: [1.7668 ms 1.7741 ms 1.7859 ms]
change: [-0.0929% +0.3650% +1.0351] (p = 0.26 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severestreams/simulated/1-streams/each-1000-bytes: No change in performance detected. time: [129.68 ms 129.68 ms 129.69 ms]
thrpt: [7.5302 KiB/s 7.5304 KiB/s 7.5306 KiB/s]
change:
time: [-0.0047% -0.0010% +0.0029] (p = 0.63 > 0.05)
thrpt: [-0.0029% +0.0010% +0.0047]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildstreams/simulated/1000-streams/each-1-bytes: Change within noise threshold. time: [2.5364 s 2.5366 s 2.5369 s]
thrpt: [394.18 B/s 394.22 B/s 394.27 B/s]
change:
time: [+0.0013% +0.0174% +0.0334] (p = 0.04 < 0.05)
thrpt: [-0.0334% -0.0174% -0.0013]
Change within noise threshold.streams/simulated/1000-streams/each-1000-bytes: No change in performance detected. time: [6.5817 s 6.5877 s 6.5946 s]
thrpt: [148.08 KiB/s 148.24 KiB/s 148.38 KiB/s]
change:
time: [-0.3384% -0.1509% +0.0312] (p = 0.12 > 0.05)
thrpt: [-0.0312% +0.1511% +0.3395]
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severestreams/walltime/1-streams/each-1000-bytes: Change within noise threshold. time: [587.68 µs 589.96 µs 592.79 µs]
change: [+0.5995% +1.1178% +1.6935] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severestreams/walltime/1000-streams/each-1-bytes: Change within noise threshold. time: [12.406 ms 12.423 ms 12.441 ms]
change: [-0.6380% -0.4222% -0.2021] (p = 0.00 < 0.05)
Change within noise threshold.streams/walltime/1000-streams/each-1000-bytes: 💔 Performance has regressed by +4.0151%. time: [46.323 ms 46.386 ms 46.459 ms]
change: [+3.8356% +4.0151% +4.2065] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severecoalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [92.552 ns 92.830 ns 93.104 ns]
change: [-1.2295% -0.2166% +0.5790] (p = 0.69 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severecoalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [110.35 ns 110.68 ns 111.02 ns]
change: [-3.4883% -1.0878% +0.3176] (p = 0.47 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severecoalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [109.93 ns 110.39 ns 110.91 ns]
change: [-0.3048% +0.3136% +0.9892] (p = 0.36 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) low mild
7 (7.00%) high severecoalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [94.568 ns 94.947 ns 95.587 ns]
change: [-0.1114% +0.4768% +1.0125] (p = 0.11 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severeRxStreamOrderer::inbound_frame(): Change within noise threshold. time: [108.09 ms 108.19 ms 108.29 ms]
change: [-0.6652% -0.5509% -0.4337] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildsent::Packets::take_ranges: No change in performance detected. time: [4.5181 µs 4.6285 µs 4.7448 µs]
change: [-3.0163% +0.3531% +3.8585] (p = 0.85 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mildtransfer/simulated/pacing-false/varying-seeds: No change in performance detected. time: [23.941 s 23.941 s 23.941 s]
thrpt: [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/simulated/pacing-true/varying-seeds: No change in performance detected. time: [23.676 s 23.676 s 23.676 s]
thrpt: [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/simulated/pacing-false/same-seed: No change in performance detected. time: [23.941 s 23.941 s 23.941 s]
thrpt: [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/simulated/pacing-true/same-seed: No change in performance detected. time: [23.676 s 23.676 s 23.676 s]
thrpt: [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/walltime/pacing-false/varying-seeds: Change within noise threshold. time: [23.213 ms 23.251 ms 23.303 ms]
change: [+1.1236% +1.3350% +1.5728] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/varying-seeds: Change within noise threshold. time: [23.305 ms 23.326 ms 23.350 ms]
change: [+0.4256% +0.5767% +0.7186] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severetransfer/walltime/pacing-false/same-seed: Change within noise threshold. time: [23.395 ms 23.418 ms 23.442 ms]
change: [+0.9303% +1.0634% +1.1991] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mildtransfer/walltime/pacing-true/same-seed: Change within noise threshold. time: [23.814 ms 23.843 ms 23.886 ms]
change: [+0.7849% +0.9701% +1.1969] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severeDownload data for |
|
@larseggert do you want to take another look or can I merge? The plan is to land this pref'd off in firefox with a new release next week and then run experiments on it (classic is still the default in neqo). |
larseggert
left a comment
There was a problem hiding this comment.
LGTM. Minor comments.
fdc7597 introduces the scaffolding to make slow start configurable by introducing a
SlowStarttrait and using a generic parameter with that trait bound in theClassicCongestionControlstruct. It also rewrites the current slow start implementation inclassic_slow_start.rs.df81bdb makes the slow start algorithm configurable at run-time via the CLI/ConnectionParams, similar to how it works for Cubic/NewReno.
As a reminder: fdc7597..df81bdb had already had a review cycle before I started working on HyStart++ and made the PR a draft again.
41e8c04 implements HyStart++ as per RFC 9406 with extensive tests added in 86b496d.
With the latest commit ae92b97 I re-architected the
SlowStarttrait and the order in which slow start functionality is called from the congestion controller. Hystart functionality is unchanged between that commit and the former implementation, this just improves architecture and makes it more extensible for future work.It now separates functionality into the correct order of operations for slow start (
check for exit->do the exit->apply growth if not exiting->do congestion avoidance if not in slow start anymore) which also makes it easier to add things like SUSS or rapid-start that accelerate slow start growth (but not touch the exit heuristics) in the future.To maybe make review easier, the main parts are:
neqo-transport/src/cc/hystart.rscontains the HyStart++ implementation.neqo-transport/src/cc/tests/hystart.rscontains tests for the HyStart++ implementation.neqo-transport/src/cc/classic_slow_start.rshas the standard slow start implementation that was previously hardcoded intoclassic_cc.rs.neqo-transport/src/cc/classic_cc.rshas the coreon_packets_ackedflow that calls everything else and the trait definition forSlowStart. There is also a small change toon_packet_sentfor round-tracking.I think it might make it the easiest to look at
classic_cc.rsfirst and then follow the code flow from there into the HyStart++ implementation. I'm not sure it makes a lot of sense to look at commits one-by-one because of the re-architecture in the latest commit.I also have some qlogs from validating the implementation against
h3.speed.cloudflare.comwhere one can see HyStart++ working as intended. See this qvis screenshot where it exits at the same window size that is later converged on during congestion avoidance: