Skip to content

SwiftQUIC: PERF: acknowledgedSendDataInner is using too much CPU#17

Open
agnosticdev wants to merge 2 commits into
mainfrom
agnosticdev/PerfAcknowledgedSendData
Open

SwiftQUIC: PERF: acknowledgedSendDataInner is using too much CPU#17
agnosticdev wants to merge 2 commits into
mainfrom
agnosticdev/PerfAcknowledgedSendData

Conversation

@agnosticdev

Copy link
Copy Markdown
Collaborator

In StreamSendBuffer acknowledgedSendDataInner does a insert and removal operation for almost all ACKs that are processed. This is very costly based on the CPU metrics below.
The new change uses a custom struct that holds an array of Range<StreamOffset> to not pay the overhead of dealing with Set operations on each ACK. acknowledgedSendDataInner now has two branches, a in-order branch that only does a remove operation, and an out of order path that still falls back to using both an insert and remove operation. The result of this is about a 7.3 gigacycle reduction in CPU for QUICTransfer:

Baseline:

7.46 G  7.4%	-	         StreamSendBuffer.acknowledgedSendData(offset:length:log:)
7.42 G  7.3%	52.13 M	        StreamSendBuffer.acknowledgedSendDataInner(offset:length:log:)
2.91 G  2.9%	31.01 M	           specialized RangeSet.insert(contentsOf:)
2.80 G  2.8%	135.26 M	       RangeSet.Ranges._remove(contentsOf:)
639.99 M  0.6%	33.00 M	           RangeSet.Ranges.subscript.getter
567.07 M  0.6%	9.00 M	           RangeSet.remove(contentsOf:)
132.83 M  0.1%	-	               FrameArrayQueue.claim(fromStart:)
61.00 M  0.1%	4.00 M	           swift_release

With this change:

161.49 M  96.2%	1.87 M	        StreamSendBuffer.acknowledgedSendData(offset:length:log:)
159.62 M  95.1%	2.00 M	          StreamSendBuffer.acknowledgedSendDataInner(offset:length:log:)
151.62 M  90.3%	22.00 M	          specialized StreamSendBuffer.acknowledgedSendDataInner(offset:length:log:)
129.62 M  77.2%	-	              FrameArrayQueue.claim(fromStart:)
97.76 M  58.2%	20.00 M	          Frame.claim(fromStart:fromEnd:adjustSingleIPAggregate:)
31.86 M  19.0%	-	              FrameArray._claim(fromStart:existingLength:removeClaimedFrames:)
6.00 M   3.6%	6.00 M	  Frame.claim(fromStart:fromEnd:adjustSingleIPAggregate:)

@agnosticdev agnosticdev requested a review from rnro June 14, 2026 21:56
}

var acknowledgedDataRanges = RangeSet<StreamOffset>()
var acknowledgedDataRanges = AcknowledgedRanges()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change here is doing two things:

  1. Making sure we don't insert and remove in the happy path
  2. Changing the storage type from RangeSet to a custom type

Can we see a perf difference if we do (1) but keep (2) using the standard type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants