fix: Allow Initial CRYPTO retransmission during Handshake PTO#3350
fix: Allow Initial CRYPTO retransmission during Handshake PTO#3350larseggert wants to merge 6 commits intomozilla:mainfrom
Conversation
45e0eee to
3c947d3
Compare
Merging this PR will degrade performance by 11.55%
Performance Changes
Comparing Footnotes
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3350 +/- ##
==========================================
- Coverage 94.26% 94.14% -0.12%
==========================================
Files 125 129 +4
Lines 37973 38341 +368
Branches 37973 38341 +368
==========================================
+ Hits 35795 36097 +302
- Misses 1339 1397 +58
- Partials 839 847 +8
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical bug where Initial CRYPTO frames could not be retransmitted when Handshake PTO fires, causing handshake timeouts. The fix simplifies the SendProfile::ack_only() logic and ensures Initial packets are marked for retransmission when Handshake PTO fires.
Changes:
- Removed the redundant
SendProfile::ptofield and simplifiedack_only()to only check congestion window limits - Modified
maybe_fire_pto()to mark Initial packets for retransmission when Handshake PTO fires, ensuring lost CRYPTO data is resent - Added comprehensive unit and integration tests to verify Initial CRYPTO retransmission during Handshake PTO
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| neqo-transport/src/recovery/mod.rs | Removed SendProfile::pto field, simplified ack_only(), added logic to mark Initial packets for retransmission when Handshake PTO fires, updated tests to use should_probe() instead of pto field |
| neqo-transport/src/connection/tests/idle.rs | Updated test expectations to allow CRYPTO frames in Initial packets during PTO |
| neqo-transport/src/connection/tests/handshake.rs | Fixed reorder_handshake test flakiness with deterministic packet numbers, added new integration test initial_crypto_retransmit_during_handshake_pto |
| neqo-transport/src/connection/mod.rs | Updated call to ack_only() to remove space parameter |
When the client has lost Initial CRYPTO data and PTO fires for Handshake space (primed to prevent deadlocks), the client must still be able to retransmit the lost Initial CRYPTO frames per RFC 9002 Section 6.2.4. Previously, `SendProfile::ack_only()` incorrectly returned `true` for Initial space when PTO was for Handshake space, blocking CRYPTO frame transmission. Additionally, Initial packets were not being marked for retransmission when Handshake PTO fired. This fix: - Simplifies `ack_only()` to only check congestion window limits, removing the incorrect space-based restriction - Marks Initial packets for retransmission in `maybe_fire_pto()` when Handshake PTO fires, ensuring lost Initial CRYPTO data is resent - Removes the redundant `SendProfile::pto` field, using `should_probe()` for equivalent assertions in tests - Adds unit and integration tests that verify Initial CRYPTO can be retransmitted when Handshake PTO fires - Fixes `reorder_handshake` test flakiness by disabling packet number randomization for deterministic behavior Bug scenario from QNS L1/C1 test failures: 1. Client sends `ClientHello` split across Initial packets 2. Server receives first packet but second is lost/corrupted 3. Server ACKs first packet; client detects second as lost 4. PTO fires for Handshake (primed to prevent deadlocks) 5. BUG: `ack_only(Initial)` returned true, blocking CRYPTO retransmission 6. Client times out waiting for handshake completion
163408a to
40e1893
Compare
mxinden
left a comment
There was a problem hiding this comment.
This makes sense to me. Thanks for the elaborate test. I think we can simplify maybe_fire_pto. What do you think?
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
Benchmark resultsNo significant performance differences relative to 26711da. All resultstransfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: Change within noise threshold. time: [199.46 ms 199.84 ms 200.26 ms]
thrpt: [499.35 MiB/s 500.40 MiB/s 501.35 MiB/s]
change:
time: [-1.2556% -0.8285% -0.4341] (p = 0.00 < 0.05)
thrpt: [+0.4360% +0.8354% +1.2715]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severetransfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: Change within noise threshold. time: [283.46 ms 285.19 ms 286.94 ms]
thrpt: [34.850 Kelem/s 35.064 Kelem/s 35.279 Kelem/s]
change:
time: [+0.6331% +1.6121% +2.5102] (p = 0.00 < 0.05)
thrpt: [-2.4487% -1.5866% -0.6291]
Change within noise threshold.transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected. time: [38.636 ms 38.796 ms 38.978 ms]
thrpt: [25.656 B/s 25.776 B/s 25.883 B/s]
change:
time: [-0.7333% -0.0675% +0.5512] (p = 0.84 > 0.05)
thrpt: [-0.5482% +0.0676% +0.7388]
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: Change within noise threshold. time: [201.74 ms 202.09 ms 202.48 ms]
thrpt: [493.88 MiB/s 494.84 MiB/s 495.68 MiB/s]
change:
time: [-1.5459% -1.2487% -0.9809] (p = 0.00 < 0.05)
thrpt: [+0.9906% +1.2645% +1.5702]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severedecode 4096 bytes, mask ff: No change in performance detected. time: [4.5112 µs 4.5283 µs 4.5536 µs]
change: [-0.4534% -0.0775% +0.2904] (p = 0.70 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severedecode 1048576 bytes, mask ff: No change in performance detected. time: [1.1568 ms 1.1584 ms 1.1601 ms]
change: [-2.0270% -0.9085% -0.0613] (p = 0.07 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
10 (10.00%) low severe
1 (1.00%) high mild
2 (2.00%) high severedecode 4096 bytes, mask 7f: No change in performance detected. time: [5.7943 µs 5.8176 µs 5.8569 µs]
change: [-0.0678% +0.3857% +1.0411] (p = 0.15 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severedecode 1048576 bytes, mask 7f: No change in performance detected. time: [1.4868 ms 1.4894 ms 1.4924 ms]
change: [-0.1515% +0.1052% +0.3412] (p = 0.41 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high milddecode 4096 bytes, mask 3f: No change in performance detected. time: [5.5369 µs 5.5461 µs 5.5566 µs]
change: [-0.2015% +0.1292% +0.4803] (p = 0.47 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severedecode 1048576 bytes, mask 3f: No change in performance detected. time: [1.4139 ms 1.4160 ms 1.4181 ms]
change: [-0.8344% -0.2784% +0.1078] (p = 0.29 > 0.05)
No change in performance detected.streams/simulated/1-streams/each-1000-bytes: No change in performance detected. time: [129.68 ms 129.68 ms 129.68 ms]
thrpt: [7.5303 KiB/s 7.5305 KiB/s 7.5307 KiB/s]
change:
time: [-0.0053% -0.0014% +0.0025] (p = 0.49 > 0.05)
thrpt: [-0.0025% +0.0014% +0.0053]
No change in performance detected.streams/simulated/1000-streams/each-1-bytes: No change in performance detected. time: [2.5362 s 2.5366 s 2.5369 s]
thrpt: [394.18 B/s 394.23 B/s 394.28 B/s]
change:
time: [-0.0258% -0.0081% +0.0090] (p = 0.37 > 0.05)
thrpt: [-0.0090% +0.0081% +0.0258]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mildstreams/simulated/1000-streams/each-1000-bytes: No change in performance detected. time: [6.5913 s 6.6033 s 6.6171 s]
thrpt: [147.58 KiB/s 147.89 KiB/s 148.16 KiB/s]
change:
time: [-0.1644% +0.0753% +0.3354] (p = 0.56 > 0.05)
thrpt: [-0.3343% -0.0753% +0.1646]
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severestreams/walltime/1-streams/each-1000-bytes: Change within noise threshold. time: [584.03 µs 586.09 µs 588.49 µs]
change: [-1.3723% -0.8418% -0.2801] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severestreams/walltime/1000-streams/each-1-bytes: Change within noise threshold. time: [12.384 ms 12.402 ms 12.421 ms]
change: [-1.1874% -0.9710% -0.7512] (p = 0.00 < 0.05)
Change within noise threshold.streams/walltime/1000-streams/each-1000-bytes: No change in performance detected. time: [45.298 ms 45.366 ms 45.443 ms]
change: [-0.5041% -0.1658% +0.1036] (p = 0.32 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severecoalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [92.012 ns 92.322 ns 92.633 ns]
change: [-0.2251% +0.3388% +0.9621] (p = 0.33 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severecoalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [109.90 ns 110.22 ns 110.56 ns]
change: [-0.6092% -0.0702% +0.4087] (p = 0.80 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
11 (11.00%) high severecoalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [109.27 ns 109.65 ns 110.12 ns]
change: [-0.1132% +0.4533% +1.1606] (p = 0.17 > 0.05)
No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
6 (6.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
10 (10.00%) high severecoalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [94.468 ns 94.629 ns 94.820 ns]
change: [-1.3011% -0.4667% +0.2635] (p = 0.26 > 0.05)
No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
8 (8.00%) high mild
10 (10.00%) high severeRxStreamOrderer::inbound_frame(): Change within noise threshold. time: [109.21 ms 109.38 ms 109.66 ms]
change: [+0.7462% +0.9274% +1.2044] (p = 0.00 < 0.05)
Change within noise threshold.
Found 23 outliers among 100 measurements (23.00%)
2 (2.00%) low severe
9 (9.00%) low mild
7 (7.00%) high mild
5 (5.00%) high severesent::Packets::take_ranges: No change in performance detected. time: [4.4358 µs 4.5327 µs 4.6307 µs]
change: [-4.4984% -1.3848% +1.6944] (p = 0.40 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mildtransfer/simulated/pacing-false/varying-seeds: No change in performance detected. time: [23.941 s 23.941 s 23.941 s]
thrpt: [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/simulated/pacing-true/varying-seeds: No change in performance detected. time: [23.676 s 23.676 s 23.676 s]
thrpt: [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/simulated/pacing-false/same-seed: No change in performance detected. time: [23.941 s 23.941 s 23.941 s]
thrpt: [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/simulated/pacing-true/same-seed: No change in performance detected. time: [23.676 s 23.676 s 23.676 s]
thrpt: [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
change:
time: [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
thrpt: [+0.0000% +0.0000% +0.0000]
No change in performance detected.transfer/walltime/pacing-false/varying-seeds: Change within noise threshold. time: [22.790 ms 22.818 ms 22.860 ms]
change: [-1.4655% -1.2081% -0.9679] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severetransfer/walltime/pacing-true/varying-seeds: No change in performance detected. time: [23.175 ms 23.203 ms 23.244 ms]
change: [-0.0431% +0.1909% +0.4273] (p = 0.10 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severetransfer/walltime/pacing-false/same-seed: Change within noise threshold. time: [23.072 ms 23.101 ms 23.147 ms]
change: [+0.2829% +0.5386% +0.7825] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/same-seed: Change within noise threshold. time: [23.551 ms 23.577 ms 23.616 ms]
change: [+0.5279% +0.7007% +0.9249] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severeDownload data for |
Client/server transfer resultsPerformance differences relative to 26711da. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
When the client has lost Initial CRYPTO data and PTO fires for Handshake space (primed to prevent deadlocks), the client must still be able to retransmit the lost Initial CRYPTO frames per RFC 9002 Section 6.2.4.
Previously,
SendProfile::ack_only()incorrectly returnedtruefor Initial space when PTO was for Handshake space, blocking CRYPTO frame transmission. Additionally, Initial packets were not being marked for retransmission when Handshake PTO fired.This fix:
ack_only()to only check congestion window limits, removing the incorrect space-based restrictionmaybe_fire_pto()when Handshake PTO fires, ensuring lost Initial CRYPTO data is resentSendProfile::ptofield, usingshould_probe()for equivalent assertions in testsreorder_handshaketest flakiness by disabling packet number randomization for deterministic behaviorBug scenario from QNS L1/C1 test failures:
ClientHellosplit across Initial packetsack_only(Initial)returned true, blocking CRYPTO retransmission