
Add RTCEngine sendLossyBytes explicit queue to better handle sending large data track packets #1876

Open
1egoman wants to merge 7 commits into main from data-track-engine-explicit-queue

Conversation

Contributor

@1egoman 1egoman commented Apr 8, 2026

Previously, sendLossyBytes ran `await this.waitForBufferStatusLow(kind);` to pause before enqueuing new data track packets onto the `_data_tracks` RTC data channel. Locally this worked fairly well, and the promises seemed to resolve in the order they were triggered.

However, when running the e2e tests, @boks1971 noticed an excessive number of out-of-order packets; in some cases the e2e tests reported delivery rates as low as 20% for the low-frequency large-payload test.

I had a suspicion this `await this.waitForBufferStatusLow(kind);` call was the problem, so I swapped it for an explicit queuing implementation, and when I did, the packet delivery rate went up to 89-90%. In hindsight, relying on promise resolution order like this was sloppy, so I'm glad to be fixing it now.
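The explicit-queue idea described above can be sketched roughly like this. This is a minimal illustration, not the actual RTCEngine code; the names (`waitBuffer`, `onBufferLow`, `markBufferHigh`) are stand-ins I made up for this sketch:

```typescript
// Minimal sketch of an explicit FIFO queue for lossy sends. All names here
// are illustrative, not taken from livekit-client.
type PendingPacket = { payload: Uint8Array; send: (p: Uint8Array) => void };

class LossyQueue {
  private waitBuffer: PendingPacket[] = [];
  private bufferLow = true;

  enqueue(item: PendingPacket) {
    if (this.bufferLow && this.waitBuffer.length === 0) {
      // Fast path: nothing is queued and the channel has room.
      item.send(item.payload);
    } else {
      // Otherwise queue it. FIFO order is preserved explicitly here,
      // rather than relying on promise resolution order.
      this.waitBuffer.push(item);
    }
  }

  // Would be driven by the data channel's bufferedamountlow event.
  onBufferLow() {
    this.bufferLow = true;
    while (this.waitBuffer.length > 0 && this.bufferLow) {
      const next = this.waitBuffer.shift()!;
      next.send(next.payload);
    }
  }

  markBufferHigh() {
    this.bufferLow = false;
  }
}
```

The key difference from the awaited-promise version is that ordering is a property of the array, not of which promise happens to resolve first.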

1egoman added 2 commits April 7, 2026 15:01
Previously, I was relying on `await this.waitForBufferStatusLow(...)` always resolving in the same order the promises were awaited, which in practice seemed to be the case, but I wasn't 100% sure it would always hold.

So just to verify, switch this over to use an explicit queue and see if
that changes any of the e2e test results.
@1egoman 1egoman requested review from boks1971 and lukasIO April 8, 2026 13:16
@changeset-bot

changeset-bot bot commented Apr 8, 2026

🦋 Changeset detected

Latest commit: d1858ba

The changes in this PR will be included in the next version bump.


@github-actions
Contributor

github-actions bot commented Apr 8, 2026

size-limit report 📦

Path                          Size
dist/livekit-client.esm.mjs   95.56 KB (+0.13% 🔺)
dist/livekit-client.umd.js    104.4 KB (+0.15% 🔺)

Contributor


thought(non-blocking-just-context): an alternative would be to wrap this function (its implementation) in a kind-dependent Mutex and buffer all calls that way. This would create back pressure inherently.
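The suggestion above could look roughly like this: a promise-chain mutex keyed by data-channel kind, so concurrent `sendLossyBytes` calls queue up behind one another. This is a hypothetical sketch; `Mutex`, `mutexByKind`, and `withKindLock` are names I invented for illustration, not code from the repo:

```typescript
// Hypothetical sketch: serialize sends per data-channel kind with a
// promise-chain mutex, so callers queue and back pressure falls out
// naturally. Not the actual livekit-client implementation.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  async lock(): Promise<() => void> {
    const prev = this.tail;
    let release!: () => void;
    this.tail = new Promise<void>((resolve) => (release = resolve));
    await prev; // wait for every earlier holder to release
    return release;
  }
}

const mutexByKind = new Map<string, Mutex>();

async function withKindLock(kind: string, fn: () => Promise<void>): Promise<void> {
  let mutex = mutexByKind.get(kind);
  if (!mutex) mutexByKind.set(kind, (mutex = new Mutex()));
  const release = await mutex.lock();
  try {
    await fn(); // e.g. wait for buffer-low, then dc.send(...)
  } finally {
    release();
  }
}
```

Because each `lock()` awaits the previous tail, the resolution order is strict even if the inner waits complete at unpredictable times.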

Contributor Author


FWIW - this is largely what I ended up doing. That said, there still needs to be an individual queue of futures, so I couldn't do a wholesale replacement: the draining code needs to "call back" into sendLossyBytes via the future to actually call dc.send(...). But the mutex addition should mean that packets won't be interleaved during a buffer drain, at least.
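The shape being described (mutex around the wait-then-send path, plus a queue of futures the drain resolves) might look something like this simplified sketch. `Deferred`, `chain`, and `LossySender` are illustrative stand-ins I made up, not the real implementation:

```typescript
// Simplified sketch: a promise chain acts as the per-kind mutex, and the
// drain code "calls back" into sendLossyBytes by resolving queued futures
// in order. All names are illustrative, not from livekit-client.
type Deferred = { promise: Promise<void>; resolve: () => void };

function deferred(): Deferred {
  let resolve!: () => void;
  const promise = new Promise<void>((r) => (resolve = r));
  return { promise, resolve };
}

class LossySender {
  private waitBuffer: Deferred[] = [];
  private chain: Promise<void> = Promise.resolve(); // acts as the mutex
  sent: number[] = [];
  bufferLow = true;

  sendLossyBytes(payload: Uint8Array): Promise<void> {
    // Serializing the whole wait-then-send path means a drain can't be
    // interleaved with a newly arriving packet.
    const run = this.chain.then(async () => {
      if (!this.bufferLow) {
        const fut = deferred();
        this.waitBuffer.push(fut);
        await fut.promise; // resolved, in order, by drain()
      }
      this.sent.push(payload[0]); // stands in for dc.send(payload)
    });
    this.chain = run.catch(() => undefined);
    return run;
  }

  // Would be driven by the bufferedamountlow event.
  drain() {
    this.bufferLow = true;
    while (this.waitBuffer.length > 0) {
      this.waitBuffer.shift()!.resolve(); // hand control back to the sender
    }
  }
}
```

Note that in this simplified version the chain means at most one future is ever waiting at a time; the real code presumably needs the full queue for its fast/slow paths.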

switch (bufferStatusLowBehavior) {
  case 'wait':
    await this.waitForBufferStatusLow(kind);
    if (this.isBufferStatusLow(kind)) {
Contributor

@lukasIO lukasIO Apr 8, 2026


you're checking whether the status is low here, and in that case buffering the message 👀

I think we either want to put all msgs in the wait case into that buffer, or negate that condition?

If we negate it, I think there's still the potential for this to race, since we check the buffer status both here and when emptying the lossyBytesBuffer. I can imagine situations where a packet sneaks through out of order that way
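One way to read the suggested fix: only fall through to an immediate send when the buffer is actually low and nothing is already queued, so a new packet can never jump ahead of queued ones. A hypothetical helper (not code from the PR) capturing that gating:

```typescript
// Hypothetical gating helper for the 'wait' path: send immediately only
// when the channel buffer is low AND the wait buffer is already empty;
// otherwise queue. The parameter names are illustrative.
function shouldSendImmediately(bufferStatusLow: boolean, queuedCount: number): boolean {
  return bufferStatusLow && queuedCount === 0;
}
```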

Contributor Author


Huh, very interesting... I'm realizing that because of this incorrect negation, my e2e test results were invalid. Let me fix that and retest to make sure it still has the desired effect.

Putting that aside... @lukasIO I'm not sure I understand where the race could occur. There's only one JS thread involved, so sendLossyBytes and updateAndEmitDCBufferStatus should never actually run in parallel. If lossyBytesWaitBuffer is halfway through being cleared when continueSendingFuture.promise resolves, then new entries will get pushed onto the end (it's a queue, though now I wonder if it should be renamed to make that clearer). It's possible I'm missing something, though. Can you give me a step-by-step scenario outlining how a packet would end up out of order?

Contributor Author

@1egoman 1egoman Apr 8, 2026


Chatted with lukas out of band - this is the problem he is identifying, rephrased:

The problem isn't that packets could get enqueued into lossyBytesWaitBuffer out of order, the problem is that packets could be halfway through draining from lossyBytesWaitBuffer and ANOTHER unrelated packet could come in and get interleaved in the middle while that "drain" is occurring

I think it's a good point, and I hadn't considered that case!

Contributor Author

@1egoman 1egoman Apr 8, 2026


I believe I have addressed this in 2d30ccd.

I should also say that I'm still not 100% sure buffering packets like this to avoid out-of-order delivery is actually a good tradeoff for data tracks, given the focus on timely delivery over reliability. @ladvoc I'm curious if you have any thoughts here!

Contributor

@lukasIO lukasIO Apr 9, 2026


What I meant with the mutex approach was minimising the required changes in this PR and simply ensuring that the resolve order of that promise chain is strict. I opened #1877 as an alternative to the mutex+buffer approach taken here; let me know what you think!

Contributor Author


Posted some rough numbers for your alternate solution here: #1877 (comment). TL;DR: it performs roughly the same time-wise but results in a much more variable number of dropped packets run to run.

1egoman added 4 commits April 8, 2026 10:52
…wait case

Lukas raised this problem:

The problem isn't that packets could get enqueued into
lossyBytesWaitBuffer out of order, the problem is that packets could be
halfway through draining from lossyBytesWaitBuffer and ANOTHER unrelated
packet could come in and get interleaved in the middle while that
"drain" is occurring

So to try to fix it, wrap the "wait" path in a mutex to ensure that
unrelated packets can't get interleaved in the midst of draining lossyBytesWaitBuffer.
@1egoman 1egoman requested a review from lukasIO April 8, 2026 15:46
