Add RTCEngine sendLossyBytes explicit queue to better handle sending large data track packets#1876
Conversation
Previously, I was relying on `await this.waitForBufferStatusLow(...)` to always resolve in the same order as the promises were awaited. In practice that seemed to be the case, but I wasn't 100% sure it would always be true. So, just to verify, this PR switches it over to an explicit queue to see if that changes any of the e2e test results.
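For context, the explicit-queue idea can be sketched roughly like this (the name `LossySendQueue` is hypothetical, not the actual RTCEngine implementation): each caller gets its own future, and the drain releases them strictly front-to-back, so FIFO ordering holds by construction rather than by assumption about promise resolution order.

```typescript
// Hypothetical sketch of an explicit FIFO queue of per-send futures.
class LossySendQueue {
  // Resolvers for callers waiting to send, in arrival (FIFO) order.
  private waiters: Array<() => void> = [];

  // Each caller enqueues its own future instead of sharing one promise,
  // so release order is the enqueue order by construction.
  wait(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Called when the data channel signals bufferedamountlow:
  // release waiters strictly front-to-back.
  drain(): void {
    while (this.waiters.length > 0) {
      this.waiters.shift()!();
    }
  }
}
```

Because each waiter has a distinct resolver, the drain order no longer depends on how the JS engine schedules multiple continuations of a single shared promise.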
🦋 Changeset detected. Latest commit: d1858ba. The changes in this PR will be included in the next version bump.
thought(non-blocking-just-context): an alternative would be to wrap this function (its implementation) in a kind-dependent Mutex and buffer all calls that way. This would create back pressure inherently.
FWIW - this is largely what I ended up doing. That said, there still needs to be an individual queue of futures, so I couldn't do a wholesale replacement: the draining code needs to "call back" into sendLossyBytes via the future to actually call `dc.send(...)`. But the mutex addition should mean that packets won't be interleaved during a buffer drain, at least.
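A minimal sketch of the kind-dependent mutex idea (this `Mutex` class is a generic promise-chain mutex assumed for illustration, not a specific library): calls are serialized by chaining each one onto the previous tail, which inherently creates back pressure.

```typescript
// Hypothetical promise-chain mutex: tasks run one at a time, in call order.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  async runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const prev = this.tail;
    let release!: () => void;
    // Swap in a new tail synchronously so later callers queue behind us.
    this.tail = new Promise<void>((r) => { release = r; });
    await prev; // wait for all earlier holders, in call order
    try {
      return await task();
    } finally {
      release(); // let the next queued caller proceed
    }
  }
}
```

One such mutex per data-channel kind (e.g. lossy vs. reliable) would serialize sends of that kind without blocking the other kind.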
src/room/RTCEngine.ts
```ts
switch (bufferStatusLowBehavior) {
  case 'wait':
    await this.waitForBufferStatusLow(kind);
    if (this.isBufferStatusLow(kind)) {
```
you're checking whether the status is low here and, in that case, buffering the message 👀
I think either we want to put all msgs in the wait case into that buffer, or negate that condition?
If we negate it, I think there's still the potential for this to race, since we're checking the buffer status here and also when emptying the lossyBytesBuffer. I can imagine situations where a packet sneaks through out of order that way.
Huh, very interesting... I am realizing that because of this incorrect negation when I ran the e2e tests my test results were invalid. Let me fix that and retest this and make sure it still has the desired effect.
Putting that aside... @lukasIO I'm not sure I understand where the race could occur. There's only one js thread involved, so sendLossyBytes and updateAndEmitDCBufferStatus should never actually be running in parallel. If lossyBytesWaitBuffer is halfway through being cleared when continueSendingFuture.promise resolves, then new entries will get pushed onto the end (it's a queue, though now I wonder if maybe it should be renamed to make that more clear). It's possible I'm missing something though - can you give me a step by step situation outlining how a packet would end up out of order?
Chatted with lukas out of band - this is the problem he is identifying, rephrased:
The problem isn't that packets could get enqueued into lossyBytesWaitBuffer out of order, the problem is that packets could be halfway through draining from lossyBytesWaitBuffer and ANOTHER unrelated packet could come in and get interleaved in the middle while that "drain" is occurring
I think it's a good point, and I hadn't considered that case!
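To make the interleaving concrete, here's a toy illustration (all names hypothetical) of how an unrelated send can land in the middle of a drain: every `await` inside the drain loop is a point where other code gets to run.

```typescript
// Toy illustration of the interleaving hazard during a buffer drain.
const sent: string[] = [];

// Stand-in for dc.send(...); async so the drain loop awaits each send.
const send = async (pkt: string): Promise<void> => {
  sent.push(pkt);
};

// Drains the wait buffer WITHOUT any mutex protecting it.
async function drainUnguarded(buffer: string[]): Promise<void> {
  while (buffer.length > 0) {
    // Each await yields to the microtask queue, so unrelated code can
    // run between items and interleave its own send.
    await send(buffer.shift()!);
  }
}

const waitBuffer = ["queued-1", "queued-2", "queued-3"];
const drain = drainUnguarded(waitBuffer);
// An unrelated caller sends while the drain is mid-flight:
send("unrelated");
// sent ends up as ["queued-1", "unrelated", "queued-2", "queued-3"]
drain.then(() => console.log(sent));
```

Single-threadedness prevents true parallelism, but not this kind of interleaving across await points, which is what routing every send through the mutex is meant to close off.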
What I meant with the mutex approach was minimising the required changes in this PR and simply ensuring that the resolve order of that promise chain is strict. I opened #1877 as an alternative to the mutex+buffer approach taken here - let me know what you think!
Posted some rough numbers here for your alternate solution: #1877 (comment). TL;DR: it performs roughly the same time-wise but results in a much more variable number of dropped packets run to run.
…wait case

Lukas raised this problem: the issue isn't that packets could get enqueued into lossyBytesWaitBuffer out of order; it's that packets could be halfway through draining from lossyBytesWaitBuffer and ANOTHER unrelated packet could come in and get interleaved in the middle while that "drain" is occurring. So, to try to fix it, wrap the "wait" path in a mutex to ensure that unrelated packets can't get interleaved in the midst of draining lossyBytesWaitBuffer.
Previously, `sendLossyBytes` ran `await this.waitForBufferStatusLow(kind);` to pause before enqueuing new data track packets onto the `_data_tracks` RTC data channel. In local cases this worked fairly well, and the promises seemed to resolve in the order they were triggered. However, when running the e2e tests, @boks1971 noticed an excessive number of out of order packets - in some cases the e2e tests returned as low as a 20% delivery rate for the low frequency large payload test.
I had a suspicion this `await this.waitForBufferStatusLow(kind);` bit was the problem, so I swapped it over to an explicit queuing implementation, and when I did, the packet delivery rate went up to 89-90%. In hindsight, relying on the promise resolution order like this was sloppy, so I'm glad I'm getting this fixed now.