feat: Add neqo_udp::Socket::send_buffer() #3389

Draft
larseggert wants to merge 3 commits into mozilla:main from larseggert:feat-udp-send_buffer

@larseggert
Collaborator

This allows sending from a `&[u8]` without copying into a `Vec<u8>`. (I have a pending Gecko patch to use `neqo-udp` for WebRTC where this avoids copying.)
Copilot AI review requested due to automatic review settings February 9, 2026 16:14
@larseggert larseggert requested a review from mxinden as a code owner February 9, 2026 16:14
Contributor

Copilot AI left a comment

Pull request overview

Adds a new Socket::send_buffer() API to allow sending directly from a borrowed &[u8] (optionally using GSO), avoiding Vec<u8> allocations/copies.

Changes:

  • Introduce Socket::send_buffer() for zero-copy sends from a borrowed buffer.
  • Refactor common send logic into try_send_transmit() and reuse it from send_inner().
  • Add tests (and a shared helper) covering send_buffer() for both single datagrams and GSO.
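
To make the borrowed-buffer idea concrete, here is a minimal sketch using only the standard library. It is not the `neqo-udp` API: `send_segments` and `send_segments_udp` are hypothetical names, and the user-space loop only mimics the datagram layout that GSO would produce when the kernel splits one buffer by segment size. The point is that the payload stays a `&[u8]` all the way to the socket call, so nothing is copied into a `Vec<u8>`.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical helper (not part of neqo-udp): walks a borrowed buffer in
/// `segment_size` steps and hands each sub-slice to `send_one`.
/// No segment is copied; every `&[u8]` passed on points into `buf`.
/// Returns the number of datagrams handed to `send_one`.
fn send_segments<F>(buf: &[u8], segment_size: usize, mut send_one: F) -> io::Result<usize>
where
    F: FnMut(&[u8]) -> io::Result<usize>,
{
    if segment_size == 0 {
        // `chunks(0)` would panic, so reject it up front.
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "segment_size must be non-zero",
        ));
    }
    let mut datagrams = 0;
    for segment in buf.chunks(segment_size) {
        send_one(segment)?;
        datagrams += 1;
    }
    Ok(datagrams)
}

/// Hypothetical convenience wrapper that sends each segment over a
/// standard-library UDP socket. Real GSO would issue a single system call
/// for the whole buffer; this loop issues one call per segment.
fn send_segments_udp(
    socket: &UdpSocket,
    dst: SocketAddr,
    buf: &[u8],
    segment_size: usize,
) -> io::Result<usize> {
    send_segments(buf, segment_size, |segment| socket.send_to(segment, dst))
}
```

With a 2500-byte buffer and a segment size of 1200, the closure is called with slices of 1200, 1200, and 100 bytes. Taking the send step as a closure keeps the splitting logic testable without a network.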

@mxinden
Member

mxinden commented Feb 10, 2026

> I have a pending Gecko patch to use neqo-udp for WebRTC where this avoids copying.

Can you share the patch?

I suggest not merging here until the Firefox patch is in a reviewable state to reduce churn on the Neqo side.

@larseggert larseggert marked this pull request as draft February 10, 2026 11:23
@larseggert
Collaborator Author

I'll leave this as draft per @mxinden's suggestion to first stabilize the Gecko parts.

@codecov

codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 88.46154% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.14%. Comparing base (96e9859) to head (0877cdc).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3389      +/-   ##
==========================================
- Coverage   94.24%   94.14%   -0.10%     
==========================================
  Files         125      129       +4     
  Lines       37973    38352     +379     
  Branches    37973    38352     +379     
==========================================
+ Hits        35787    36107     +320     
- Misses       1349     1390      +41     
- Partials      837      855      +18     
| Flag | Coverage Δ |
| --- | --- |
| freebsd | 93.20% <71.79%> (-0.07%) ⬇️ |
| linux | 94.23% <88.46%> (-0.02%) ⬇️ |
| macos | 94.10% <71.79%> (-0.02%) ⬇️ |
| windows | 94.22% <88.46%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
| --- | --- |
| neqo-common | 98.49% <ø> (ø) |
| neqo-crypto | 86.90% <ø> (ø) |
| neqo-http3 | 93.88% <ø> (ø) |
| neqo-qpack | 94.79% <ø> (ø) |
| neqo-transport | 95.22% <100.00%> (+0.04%) ⬆️ |
| neqo-udp | 83.03% <88.46%> (+0.13%) ⬆️ |
| mtu | 86.61% <ø> (ø) |

@codspeed-hq

codspeed-hq bot commented Feb 13, 2026

Merging this PR will degrade performance by 8.85%

⚡ 2 improved benchmarks
❌ 2 regressed benchmarks
✅ 47 untouched benchmarks
⏩ 11 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | quiche-quiche | 399.7 ms | 438.4 ms | -8.85% |
| WallTime | quiche-neqo | 359.8 ms | 372 ms | -3.28% |
| WallTime | msquic-msquic | 468.7 ms | 455 ms | +3% |
| WallTime | walltime/1-streams/each-1000-bytes | 1.4 ms | 1.3 ms | +3.78% |
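
The Efficiency column is not defined in this report. As a rough cross-check, the figures above are consistent with `BASE / HEAD - 1`; this formula is inferred from the numbers shown, not taken from CodSpeed documentation, and `efficiency_percent` is a hypothetical name. A minimal sketch:

```rust
/// Relative change inferred from the table: BASE / HEAD - 1, in percent.
/// Negative values mean HEAD took longer than BASE.
/// This is an assumption that matches the reported rows, not a documented formula.
fn efficiency_percent(base_ms: f64, head_ms: f64) -> f64 {
    (base_ms / head_ms - 1.0) * 100.0
}
```

For quiche-neqo this gives about -3.28% and for msquic-msquic about +3.01%. The quiche-quiche row comes out near -8.83% rather than -8.85%, and the last row differs more, presumably because the displayed BASE and HEAD values are rounded.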

Comparing larseggert:feat-udp-send_buffer (0877cdc) with main (378c365)

Footnotes

  1. 11 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at 378c365.

neqo-pr as client
neqo-pr vs. aioquic: A ⚠️C1
neqo-pr vs. go-x-net: A BP BA
neqo-pr vs. haproxy: 🚀M A BP BA
neqo-pr vs. kwik: ⚠️R BP BA
neqo-pr vs. linuxquic: A L1 ⚠️C1
neqo-pr vs. lsquic: L1 C1
neqo-pr vs. msquic: A L1 ⚠️L2 C1
neqo-pr vs. mvfst: A 🚀L1 C1 ⚠️BA
neqo-pr vs. neqo: Z
neqo-pr vs. nginx: A L1 ⚠️C1 BP BA
neqo-pr vs. ngtcp2: A 🚀L1 C1 CM
neqo-pr vs. picoquic: A
neqo-pr vs. quic-go: A ⚠️C1
neqo-pr vs. quiche: A L1 ⚠️C1 BP BA
neqo-pr vs. quinn: A 🚀C1
neqo-pr vs. s2n-quic: A L1 ⚠️C1 BP BA CM
neqo-pr vs. tquic: S A BP BA
neqo-pr vs. xquic: A

neqo-pr as server
aioquic vs. neqo-pr: Z ⚠️L1 CM
go-x-net vs. neqo-pr: CM
kwik vs. neqo-pr: Z BP BA CM
lsquic vs. neqo-pr: Z ⚠️C1
msquic vs. neqo-pr: Z CM
mvfst vs. neqo-pr: Z A L1 C1 CM
neqo vs. neqo-pr: Z 🚀BP
openssl vs. neqo-pr: LR M A CM
picoquic vs. neqo-pr: Z
quic-go vs. neqo-pr: CM
quiche vs. neqo-pr: Z CM
quinn vs. neqo-pr: Z 🚀C1 V2 CM
s2n-quic vs. neqo-pr: CM
tquic vs. neqo-pr: Z CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@codspeed-hq

codspeed-hq bot commented Feb 13, 2026

Unable to generate the flame graphs

The performance report has correctly been generated, but there was an internal error while generating the flame graphs for this run. We're working on fixing the issue. Feel free to contact us on Discord or at support@codspeed.io if the issue persists.

@github-actions
Contributor

Benchmark results

No significant performance differences relative to 378c365.

All results
transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: No change in performance detected.
       time:   [202.27 ms 202.68 ms 203.14 ms]
       thrpt:  [492.27 MiB/s 493.38 MiB/s 494.39 MiB/s]
change:
       time:   [-0.0418% +0.2813% +0.6153] (p = 0.09 > 0.05)
       thrpt:  [-0.6115% -0.2805% +0.0418]
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: Change within noise threshold.
       time:   [284.22 ms 286.30 ms 288.40 ms]
       thrpt:  [34.674 Kelem/s 34.929 Kelem/s 35.185 Kelem/s]
change:
       time:   [-2.3828% -1.4224% -0.5537] (p = 0.00 < 0.05)
       thrpt:  [+0.5568% +1.4429% +2.4409]
       Change within noise threshold.
transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected.
       time:   [38.458 ms 38.616 ms 38.796 ms]
       thrpt:  [25.776   B/s 25.896   B/s 26.002   B/s]
change:
       time:   [-0.6236% -0.0683% +0.5339] (p = 0.82 > 0.05)
       thrpt:  [-0.5310% +0.0683% +0.6275]
       No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: No change in performance detected.
       time:   [204.49 ms 205.01 ms 205.76 ms]
       thrpt:  [486.00 MiB/s 487.78 MiB/s 489.02 MiB/s]
change:
       time:   [-0.6883% -0.3301% +0.0873] (p = 0.10 > 0.05)
       thrpt:  [-0.0872% +0.3312% +0.6931]
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
decode 4096 bytes, mask ff: No change in performance detected.
       time:   [4.5032 µs 4.5222 µs 4.5533 µs]
       change: [-0.5059% +0.0253% +0.5572] (p = 0.93 > 0.05)
       No change in performance detected.
Found 27 outliers among 100 measurements (27.00%)
1 (1.00%) low severe
5 (5.00%) low mild
8 (8.00%) high mild
13 (13.00%) high severe
decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [1.1582 ms 1.1595 ms 1.1608 ms]
       change: [-0.7521% -0.2021% +0.3459] (p = 0.47 > 0.05)
       No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
10 (10.00%) low severe
4 (4.00%) high mild
2 (2.00%) high severe
decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [5.7907 µs 5.8151 µs 5.8577 µs]
       change: [-0.3733% +0.0625% +0.6873] (p = 0.84 > 0.05)
       No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [1.4866 ms 1.4887 ms 1.4910 ms]
       change: [-0.0804% +0.1552% +0.3827] (p = 0.20 > 0.05)
       No change in performance detected.
decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [5.5363 µs 5.5439 µs 5.5518 µs]
       change: [-0.2793% -0.0081% +0.2438] (p = 0.95 > 0.05)
       No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.4143 ms 1.4164 ms 1.4185 ms]
       change: [-0.9743% -0.3836% +0.0324] (p = 0.15 > 0.05)
       No change in performance detected.
streams/simulated/1-streams/each-1000-bytes: No change in performance detected.
       time:   [129.68 ms 129.68 ms 129.69 ms]
       thrpt:  [7.5302 KiB/s 7.5304 KiB/s 7.5306 KiB/s]
change:
       time:   [-0.0026% +0.0011% +0.0048] (p = 0.56 > 0.05)
       thrpt:  [-0.0048% -0.0011% +0.0026]
       No change in performance detected.
streams/simulated/1000-streams/each-1-bytes: No change in performance detected.
       time:   [2.5363 s 2.5366 s 2.5369 s]
       thrpt:  [394.18   B/s 394.23   B/s 394.27   B/s]
change:
       time:   [-0.0146% +0.0014% +0.0183] (p = 0.87 > 0.05)
       thrpt:  [-0.0183% -0.0014% +0.0146]
       No change in performance detected.
streams/simulated/1000-streams/each-1000-bytes: No change in performance detected.
       time:   [6.5837 s 6.5899 s 6.5973 s]
       thrpt:  [148.02 KiB/s 148.19 KiB/s 148.33 KiB/s]
change:
       time:   [-0.1588% +0.0037% +0.1669] (p = 0.96 > 0.05)
       thrpt:  [-0.1666% -0.0037% +0.1591]
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
streams/walltime/1-streams/each-1000-bytes: No change in performance detected.
       time:   [587.40 µs 590.07 µs 593.03 µs]
       change: [-0.1180% +0.5209% +1.1472] (p = 0.11 > 0.05)
       No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high severe
streams/walltime/1000-streams/each-1-bytes: Change within noise threshold.
       time:   [12.396 ms 12.415 ms 12.436 ms]
       change: [-0.4607% -0.2415% -0.0077] (p = 0.04 < 0.05)
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
streams/walltime/1000-streams/each-1000-bytes: Change within noise threshold.
       time:   [45.164 ms 45.216 ms 45.272 ms]
       change: [-1.4562% -1.1203% -0.8833] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [92.121 ns 92.513 ns 92.938 ns]
       change: [-0.6918% -0.1398% +0.4273] (p = 0.64 > 0.05)
       No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
9 (9.00%) high mild
3 (3.00%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [109.75 ns 110.05 ns 110.38 ns]
       change: [-3.5864% -0.9570% +0.8453] (p = 0.55 > 0.05)
       No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
12 (12.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [109.51 ns 110.19 ns 110.96 ns]
       change: [-1.2250% -0.2052% +0.6894] (p = 0.70 > 0.05)
       No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
5 (5.00%) low severe
4 (4.00%) low mild
9 (9.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [94.525 ns 94.666 ns 94.824 ns]
       change: [-6.4226% -2.1470% +0.3715] (p = 0.39 > 0.05)
       No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [109.10 ms 109.25 ms 109.44 ms]
       change: [+0.8274% +1.1397% +1.3990] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
sent::Packets::take_ranges: No change in performance detected.
       time:   [4.4145 µs 4.4906 µs 4.5548 µs]
       change: [-5.1414% -2.4734% +0.3416] (p = 0.08 > 0.05)
       No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/simulated/pacing-false/varying-seeds: No change in performance detected.
       time:   [23.941 s 23.941 s 23.941 s]
       thrpt:  [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000]
       No change in performance detected.
transfer/simulated/pacing-true/varying-seeds: No change in performance detected.
       time:   [23.676 s 23.676 s 23.676 s]
       thrpt:  [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000]
       No change in performance detected.
transfer/simulated/pacing-false/same-seed: No change in performance detected.
       time:   [23.941 s 23.941 s 23.941 s]
       thrpt:  [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000]
       No change in performance detected.
transfer/simulated/pacing-true/same-seed: No change in performance detected.
       time:   [23.676 s 23.676 s 23.676 s]
       thrpt:  [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
change:
       time:   [+0.0000% +0.0000% +0.0000] (p = NaN > 0.05)
       thrpt:  [+0.0000% +0.0000% +0.0000]
       No change in performance detected.
transfer/walltime/pacing-false/varying-seeds: Change within noise threshold.
       time:   [23.219 ms 23.236 ms 23.253 ms]
       change: [-0.5258% -0.4257% -0.3273] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
transfer/walltime/pacing-true/varying-seeds: Change within noise threshold.
       time:   [24.072 ms 24.089 ms 24.106 ms]
       change: [+1.0423% +1.2498% +1.4033] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/walltime/pacing-false/same-seed: Change within noise threshold.
       time:   [23.525 ms 23.544 ms 23.563 ms]
       change: [+0.5217% +0.6665% +0.7991] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
transfer/walltime/pacing-true/same-seed: Change within noise threshold.
       time:   [24.106 ms 24.122 ms 24.140 ms]
       change: [-0.6114% -0.3892% -0.2316] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Contributor

Client/server transfer results

Performance differences relative to 378c365.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
| --- | --- | --- | --- | --- | --- | --- |
| neqo-neqo-newreno-nopacing | 95.4 ± 4.5 | 88.0 | 118.9 | 335.5 ± 7.1 | 💚 -1.9 | -2.0% |
| neqo-quiche-cubic | 190.4 ± 3.8 | 185.5 | 202.2 | 168.1 ± 8.4 | 💔 1.3 | 0.7% |
| neqo-s2n-cubic | 219.3 ± 3.8 | 211.5 | 231.0 | 145.9 ± 8.4 | 💚 -1.2 | -0.5% |

Table above only shows statistically significant changes. See all results below.
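
As a sanity check on the MiB/s column, the sketch below converts the fixed transfer size (33554432 bytes, which is exactly 32 MiB) and a mean time in milliseconds into a throughput. The function name is hypothetical, and the report may instead average per-run throughputs, so small differences from the table are expected.

```rust
/// Size of each transfer in the report, in bytes (exactly 32 MiB).
const TRANSFER_BYTES: f64 = 33_554_432.0;

/// Converts a byte count and a duration in milliseconds into MiB/s.
/// Hypothetical helper for cross-checking the table, not the tool's own code.
fn throughput_mib_per_s(bytes: f64, mean_ms: f64) -> f64 {
    let mib = bytes / (1024.0 * 1024.0);
    let seconds = mean_ms / 1000.0;
    mib / seconds
}
```

For example, `throughput_mib_per_s(TRANSFER_BYTES, 190.4)` is roughly 168.1, which agrees with the neqo-quiche-cubic row.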

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
| --- | --- | --- | --- | --- | --- | --- |
| google-google-nopacing | 454.8 ± 3.8 | 446.3 | 464.9 | 70.4 ± 8.4 | | |
| google-neqo-cubic | 274.0 ± 4.6 | 266.2 | 290.2 | 116.8 ± 7.0 | 0.2 | 0.1% |
| msquic-msquic-nopacing | 173.4 ± 44.2 | 137.7 | 385.1 | 184.6 ± 0.7 | | |
| msquic-neqo-cubic | 192.8 ± 53.1 | 146.0 | 414.0 | 166.0 ± 0.6 | -11.4 | -5.6% |
| neqo-google-cubic | 754.6 ± 4.9 | 746.8 | 769.8 | 42.4 ± 6.5 | -0.3 | -0.0% |
| neqo-msquic-cubic | 159.6 ± 4.1 | 153.6 | 167.9 | 200.5 ± 7.8 | -0.8 | -0.5% |
| neqo-neqo-cubic | 95.8 ± 4.3 | 88.3 | 104.9 | 334.1 ± 7.4 | -1.0 | -1.0% |
| neqo-neqo-cubic-nopacing | 96.1 ± 3.9 | 88.9 | 104.6 | 333.0 ± 8.2 | 0.2 | 0.2% |
| neqo-neqo-newreno | 97.9 ± 4.4 | 85.3 | 108.8 | 327.0 ± 7.3 | 1.0 | 1.1% |
| neqo-neqo-newreno-nopacing | 95.4 ± 4.5 | 88.0 | 118.9 | 335.5 ± 7.1 | 💚 -1.9 | -2.0% |
| neqo-quiche-cubic | 190.4 ± 3.8 | 185.5 | 202.2 | 168.1 ± 8.4 | 💔 1.3 | 0.7% |
| neqo-s2n-cubic | 219.3 ± 3.8 | 211.5 | 231.0 | 145.9 ± 8.4 | 💚 -1.2 | -0.5% |
| quiche-neqo-cubic | 153.5 ± 4.8 | 141.8 | 165.8 | 208.5 ± 6.7 | -0.0 | -0.0% |
| quiche-quiche-nopacing | 142.4 ± 4.4 | 136.2 | 165.2 | 224.8 ± 7.3 | | |
| s2n-neqo-cubic | 176.2 ± 5.2 | 165.9 | 189.6 | 181.6 ± 6.2 | 0.6 | 0.3% |
| s2n-s2n-nopacing | 249.6 ± 27.4 | 231.6 | 348.0 | 128.2 ± 1.2 | | |

Download data for profiler.firefox.com or download performance comparison data.

4 participants