Avoid io_context contention in high-throughput SSL stream reads by filling buffers fully #1712
hofst wants to merge 1 commit into chriskohlhoff:master
Conversation
@chriskohlhoff Could you have a look at this PR? I have summarized the background in the PR message, and there is also a related discussion here: boostorg/beast#3062
Hi @hofst, IMHO, in some cases a user of Asio would like the handler of async_read_some / async_write_some to be called as soon as at least some data arrives, e.g. when using timeouts where the handler of the async operation resets a timer to implement a lower limit on transfer speed. Doesn't this change affect that use case? I believe there is a chance it might. FYI: https://konradzemek.com/2015/08/16/asio-ssl-and-scalability/ Thank you.
The current implementation of read_some in asio::ssl decodes only one TLS segment per operation. The TLS maximum segment size is 16KB. This leads to small reads per io_context operation. High-throughput real-world scenarios observe as little as 9KB buffer utilization per operation, causing significant overhead and thread overscheduling when high throughput is required. This is relevant because it is hard to get much more than ~600k operations per second from a single io_context. While multiple io_contexts are possible, the overhead also applies to multiple io_contexts. Moreover, implementations get significantly more complex when involving multiple io_contexts, which in turn require dedicated load-balancing and scheduling for heterogeneous loads.
On our production machines (e.g., `r8i.48xlarge` with a 75Gb/s interface or `x2idn.32xlarge` with a 100Gb/s interface), we see significant contention and a maximum network throughput of ~25Gb/s at very high (up to 100%) CPU utilization due to contention for concurrent S3 downloads. With this PR, the system CPU utilization drops to ~10% while throughput increases to 70Gb/s and 92Gb/s respectively.

This PR modifies the read operation to loop multiple reads until either:

1. There is no more data in the system buffer (would block).
2. The user-provided buffer is full.
Additionally, the internal buffer sizes are increased from 17KB to 128KB. This part is open for suggestions - should the buffer sizes be configurable for high-throughput scenarios, e.g., at runtime or via compile-time macros?
Happy to get feedback on the approach!