From b6de63b6a958e86deb0f8e1f5a45cf8519eab35e Mon Sep 17 00:00:00 2001 From: Luke Curley Date: Wed, 15 Jan 2025 23:08:29 -0800 Subject: [PATCH 01/19] Some quick blog ideas. --- src/pages/blog/async-warts.md | 19 +++++++++++++++++++ src/pages/blog/hacking-quic-datagrams.md | 13 +++++++++++++ 2 files changed, 32 insertions(+) create mode 100644 src/pages/blog/async-warts.md create mode 100644 src/pages/blog/hacking-quic-datagrams.md diff --git a/src/pages/blog/async-warts.md b/src/pages/blog/async-warts.md new file mode 100644 index 0000000..127236b --- /dev/null +++ b/src/pages/blog/async-warts.md @@ -0,0 +1,19 @@ +# Async Warts +An idea for a blog post about Rust async. + +## Cancel Safe +https://github.com/kixelated/moq-rs/blob/9707899bc13212e42b4bccfbe5d0522b2e18b57d/moq-transfork/src/model/group.rs#L136 + +## 'static and Arc> + +## Wtf is Pin + +## Send + Sync +And async traits + +## Cleanup is Cool + +## Runtime Agnostic is Cool + +## No locks across await is Cool +*but buggy diff --git a/src/pages/blog/hacking-quic-datagrams.md b/src/pages/blog/hacking-quic-datagrams.md new file mode 100644 index 0000000..45a1614 --- /dev/null +++ b/src/pages/blog/hacking-quic-datagrams.md @@ -0,0 +1,13 @@ +# Hacking QUIC Datagrams +An idea for a blog post about proper datagrams. + +## QUIC Datagrams + +## DTLS Datagrams + +## Ack-eliciting + +## Congestion Controlled + +## Hack the Library +https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 From c5adf30ec0126b81bb42339913bd03016ccae2a9 Mon Sep 17 00:00:00 2001 From: kixelated Date: Tue, 18 Feb 2025 10:57:00 -0800 Subject: [PATCH 02/19] Update and rename hacking-quic-datagrams.md to hacking-quic.md --- src/pages/blog/hacking-quic-datagrams.md | 13 --- src/pages/blog/hacking-quic.md | 102 +++++++++++++++++++++++ 2 files changed, 102 insertions(+), 13 deletions(-) delete mode 100644 src/pages/blog/hacking-quic-datagrams.md create mode 100644 src/pages/blog/hacking-quic.md diff --git a/src/pages/blog/hacking-quic-datagrams.md b/src/pages/blog/hacking-quic-datagrams.md deleted file mode 100644 index 45a1614..0000000 --- a/src/pages/blog/hacking-quic-datagrams.md +++ /dev/null @@ -1,13 +0,0 @@ -# Hacking QUIC Datagrams -An idea for a blog post about proper datagrams. - -## QUIC Datagrams - -## DTLS Datagrams - -## Ack-eliciting - -## Congestion Controlled - -## Hack the Library -https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 diff --git a/src/pages/blog/hacking-quic.md b/src/pages/blog/hacking-quic.md new file mode 100644 index 0000000..98bf11f --- /dev/null +++ b/src/pages/blog/hacking-quic.md @@ -0,0 +1,102 @@ +# Hacking QUIC +QUIC is pretty cool. +Dope even. +Let's bend it to our will. + +We're going to hack QUIC. +"Hack" like a ROM-hack, not "hack" like a prison sentence (unless Nintendo is involved). +We're not trying to be malicious, but rather unlock new functionality while maintaining specification compliance. + +We can do this easily because unlike TCP, QUIC is implemented in *userspace*. +That means we can take a QUIC library, tweak a few lines of code, and unlock new functionality while still being compliant with the spec. +We can ship our modified library as part of our (non-browser) client or server; nobody will suspect a thing. + +The one disclaimer is that we can't modify web clients, as the browser safe-guards their previous UDP sockets like it's Fort Knox. 
+We have to use their QUIC library (via the WebTransport API), although that doesn't stop us from using a modified server. + +## Proper QUIC Datagrams +I've been quite critical in the past about QUIC datagrams. +They are bait. +Developers want their beloved UDP datagrams are left disappointed if/when they learn about the underlying implementation. + +QUIC datagrams are: +1. Congestion controlled. +2. Trigger acknowledgements. +3. Acknowledgements can't be surfaced to the application. +4. May be batched. + +We can fix *some* of these short-comings by modifying a standard QUIC library. +Be warned though, you're entering dingus territory. + +### Congestion Control +We've already established that you're a dingus. +QUIC libraries expect developers like you to show up, act like you know how networks behave, and send unlimited packets. +It was an explicit goal to not let you do that. + +So let's do that. + +There's nothing stopping a library from sending an unlimited number of QUIC datagrams. +You'll run into flow control limits with QUIC streams, but not with QUIC datagrams. +All we need to do is comment out one check. + +But you do *need* congestion control if you're sending data over the Internet. +Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms that sound like a doctor's diagnosis. +But nobody said you have to use QUIC's congestion control; disable it and implement your own if you dare. + +And note that this is not specific to QUIC datagrams. +You can use a custom congestion controller for QUIC streams too! + +### Acknowledgements +This might sound bizarre if you're used to using UDP, but QUIC will explicitly acknowledge each datagram. +This is **not** used for retransmissions, instead it's only for congestion control. + +So if we implement our own congestion control like above, we'll still get bombarded with potentially useless acknowledgements. +They are batched and quite efficient so it's not the end of the world, but so many dinguses see this as an affront. + +If you control the receiver, you can tweak the `max_ack_delay`. +This is a parameter exchanged during the handshake that indicates how long the implementation can wait before sending an acknowledgement. +Crank it up to 1000ms (the default is 25ms) and the number of acknowledgement packets should slow to a trickle, acting almost as a keep-alive. + +Be warned that this will (somewhat) impact other QUIC frames, especially STREAM retransmissions. +It may also throw a wrench into congestion controllers that expect timely feedback. +IMO only hack the ack delay if you've already hacked the sender to not care about acknowledgements, otherwise its not worth it. + +### Batching +Most QUIC libraries will automatically fill a UDP packet with as much data as it can. +This is dope, but as we established, you're a dingus and can't have nice things. + +Let's say you want to send 100 byte datagrams and don't want QUIC to coalesce them into a single UDP packet. +Maybe you're making a custom FEC scheme or something and you want the packets to be fully independent. + +This is a terrible idea. +Your packets will still get coalesced at a lower level (ex. Ethernet, WiFi) that may even be using it's own FEC scheme. +More UDP packets means more context switching means worse performance. + +But I'm here to pretend not to judge. +You can disable this coalescing on the sender side. + + +## Rapid Retransmit +I was inspired to write this blog post because someone joined my (dope) Discord server. 
+They asked if they could do all of the above so they could implement their own acknowledgements and retransmissions. +...but why not use QUIC streams? + +### Current State +One thing that doesn't handle well is real-time latency. + +Let's say a packet gets lost over the network. +How does a QUIC library know? +The RFC outlines an algorithm that I'll attempt to simplify: + +- The sender increments a sequence number for each packet. +- Upon receiving a packet, the receiver will schedule an ACK up to `max_ack_delay` in the future. +- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (potentially an empty PING). +- After receiving an ACK, the sender will consider + +Before considering it lost, a typical QUIC library will not consider it lost until 3 newer packets have been received first, or a multiple of the RTT has elapsed. +A sender waits until an ACK has been + +## Application Limited + +## Hack the Library +https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 From d53153490331af0dbefc72f21db4d49e04daa9c0 Mon Sep 17 00:00:00 2001 From: kixelated Date: Tue, 18 Feb 2025 21:50:50 -0800 Subject: [PATCH 03/19] Update hacking-quic.md --- src/pages/blog/hacking-quic.md | 37 ++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/src/pages/blog/hacking-quic.md b/src/pages/blog/hacking-quic.md index 98bf11f..a2d9bbe 100644 --- a/src/pages/blog/hacking-quic.md +++ b/src/pages/blog/hacking-quic.md @@ -5,7 +5,7 @@ Let's bend it to our will. We're going to hack QUIC. "Hack" like a ROM-hack, not "hack" like a prison sentence (unless Nintendo is involved). -We're not trying to be malicious, but rather unlock new functionality while maintaining specification compliance. +We're not trying to be malicious, but rather unlock new functionality while remaining compliant. We can do this easily because unlike TCP, QUIC is implemented in *userspace*. That means we can take a QUIC library, tweak a few lines of code, and unlock new functionality while still being compliant with the spec. @@ -89,12 +89,37 @@ How does a QUIC library know? The RFC outlines an algorithm that I'll attempt to simplify: - The sender increments a sequence number for each packet. -- Upon receiving a packet, the receiver will schedule an ACK up to `max_ack_delay` in the future. -- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (potentially an empty PING). -- After receiving an ACK, the sender will consider +- Upon receiving a packet, the receiver will start a timer to ACK the sequence number, batching with any others that arrive within `max_ack_delay`. +- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver. +- After finally receiving an ACK, the sender *may* decide that a packet was lost if: + - 3 newer sequences were ACKed. + - or a multiple of the RTT has elapsed. +- As the congestion controller allows, retransmit any lost packets and repeat. -Before considering it lost, a typical QUIC library will not consider it lost until 3 newer packets have been received first, or a multiple of the RTT has elapsed. -A sender waits until an ACK has been +You don't need to understand the algorithm: I'll help. +If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit. +It's particularly bad for the last few packets of a burst. 
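+
+If you want to see the shape of it, here's a simplified sketch of those two heuristics (the packet threshold and the time threshold from RFC 9002).
+The names and the single smoothed `rtt` are illustrative, not code lifted from any particular library:
+
+```rust
+// Illustrative only: the loss thresholds described in RFC 9002, not real library code.
+const PACKET_THRESHOLD: u64 = 3; // lost if 3 newer packet numbers were ACKed first
+const TIME_THRESHOLD: f64 = 9.0 / 8.0; // or if a small multiple of the RTT has elapsed
+
+struct SentPacket {
+    number: u64,
+    time_sent: std::time::Instant,
+}
+
+fn is_lost(p: &SentPacket, largest_acked: u64, rtt: std::time::Duration, now: std::time::Instant) -> bool {
+    // Either enough newer packets were acknowledged first, or the packet has been
+    // outstanding for longer than the reordering window since it was sent.
+    largest_acked >= p.number + PACKET_THRESHOLD
+        || now.duration_since(p.time_sent) > rtt.mul_f64(TIME_THRESHOLD)
+}
+```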
+ +### We Can Do Better +So how can we make QUIC better support real-time applications that can't wait multiple round trips? + +The trick is that a QUIC receiver MUST be prepared to accept duplicate or redundant packets. +This can happen naturally if a packet is reordered or excessively queued over the network. +Nothing is stopping us from sending a boatload of packets. + +Instead of sitting around doing nothing, our QUIC library could pre-emptively retransmit data even before it's considered lost. +Maybe we only enable this above a certain RTT where retransmissions cause unacceptable delay. +But sending redundant copies of data is nothing new; let's go a step further and embrace QUIC streams. + +At the end of the day, a QUIC STREAM frame is a byte offset and payload. +Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit any deltas as STREAM 230-250. +If the original STREAM frame is lost, well we can't actually decode the delta and suffer from HEAD-OF-LINE blocking. + +If latency is critical, you could instead modify the QUIC library to transmit STREAM 0-250 as the second packet. +There's no need to wait a fixed amount before retransmitting dependencies. + +And this is exactly what the game dev was doing but using a custom UDP protocol, complete with acknowledgements and all sorts of stuff QUIC provides for free. +Forking a library and changing a few lines feels *so wrong* but it can a valid solution. ## Application Limited From 00116fbffa66d40693cf453141f3f5b9c7e3f649 Mon Sep 17 00:00:00 2001 From: kixelated Date: Fri, 21 Feb 2025 10:31:01 -0800 Subject: [PATCH 04/19] Update hacking-quic.md --- src/pages/blog/hacking-quic.md | 119 ++++++++++++++++++++++++--------- 1 file changed, 87 insertions(+), 32 deletions(-) diff --git a/src/pages/blog/hacking-quic.md b/src/pages/blog/hacking-quic.md index a2d9bbe..5248375 100644 --- a/src/pages/blog/hacking-quic.md +++ b/src/pages/blog/hacking-quic.md @@ -1,25 +1,23 @@ # Hacking QUIC -QUIC is pretty cool. -Dope even. -Let's bend it to our will. - We're going to hack QUIC. "Hack" like a ROM-hack, not "hack" like a prison sentence (unless Nintendo is involved). -We're not trying to be malicious, but rather unlock new functionality while remaining compliant. +We're not trying to be malicious, but rather unlock new functionality while remaining compliant with the specification. +That's a top 10 nerd pickup line right there. We can do this easily because unlike TCP, QUIC is implemented in *userspace*. That means we can take a QUIC library, tweak a few lines of code, and unlock new functionality while still being compliant with the spec. -We can ship our modified library as part of our (non-browser) client or server; nobody will suspect a thing. +We can ship our modified library as part of our application; nobody will suspect a thing. The one disclaimer is that we can't modify web clients, as the browser safe-guards their previous UDP sockets like it's Fort Knox. -We have to use their QUIC library (via the WebTransport API), although that doesn't stop us from using a modified server. +We have to use the WebTransport API which uses the browser's built in QUIC library. +Although that doesn't stop us from using a modified server so many of these are still possible. ## Proper QUIC Datagrams I've been quite critical in the past about QUIC datagrams. -They are bait. -Developers want their beloved UDP datagrams are left disappointed if/when they learn about the underlying implementation. 
+Bold statements like "they are bait" and "never use datagrams". +But some developers don't want to know the truth, and just want their beloved UDP datagrams. -QUIC datagrams are: +The problem is that QUIC datagrams are not UDP datagrams, they are: 1. Congestion controlled. 2. Trigger acknowledgements. 3. Acknowledgements can't be surfaced to the application. @@ -30,36 +28,47 @@ Be warned though, you're entering dingus territory. ### Congestion Control We've already established that you're a dingus. -QUIC libraries expect developers like you to show up, act like you know how networks behave, and send unlimited packets. +QUIC libraries expect developers like to act like you understand how networks behave and send unlimited packets. It was an explicit goal to not let you do that. -So let's do that. +So let's do it anyway. -There's nothing stopping a library from sending an unlimited number of QUIC datagrams. +The "truth" is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. +The specification is the equivalent of a pinky promise because there's no mechanism to enforce a limit on the receiving end. You'll run into flow control limits with QUIC streams, but not with QUIC datagrams. All we need to do is comment out one check. But you do *need* congestion control if you're sending data over the Internet. -Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms that sound like a doctor's diagnosis. -But nobody said you have to use QUIC's congestion control; disable it and implement your own if you dare. +Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms +These are not symptoms of the latest disease kept under wraps by the Trump administration, but rather a reality of the internet being a shared resource. +But nobody said you *have* to use QUIC's congestion control; disable it and implement your own if you dare. + +percentile meme -And note that this is not specific to QUIC datagrams. +And note that congestion control is not specific to QUIC datagrams. You can use a custom congestion controller for QUIC streams too! ### Acknowledgements This might sound bizarre if you're used to using UDP, but QUIC will explicitly acknowledge each datagram. -This is **not** used for retransmissions, instead it's only for congestion control. +However, these are **not** used for retransmissions, so what are they for? -So if we implement our own congestion control like above, we'll still get bombarded with potentially useless acknowledgements. -They are batched and quite efficient so it's not the end of the world, but so many dinguses see this as an affront. +...they're only for congestion control. +But we just disabled QUIC's congestion control! +We'll just get bombarded with useless acknowledgements. +They are batched and quite efficient so it's not the end of the world, but I can already feel your angst. +The most cleverest of dinguses amongst us (amogus?) may think you could leverage these ACKs for your own application. +Unfortunately, there's an edge case discovered by yours truely where QUIC may acknowledge a datagram but never deliver it to the application. +So if you're using QUIC datagrams, yes you do have to implement your own ACK/NACK protocol in the application and yes, it does feel terrible. + +But let's get rid of these useless ACKs. If you control the receiver, you can tweak the `max_ack_delay`. 
This is a parameter exchanged during the handshake that indicates how long the implementation can wait before sending an acknowledgement. Crank it up to 1000ms (the default is 25ms) and the number of acknowledgement packets should slow to a trickle, acting almost as a keep-alive. Be warned that this will (somewhat) impact other QUIC frames, especially STREAM retransmissions. It may also throw a wrench into congestion controllers that expect timely feedback. -IMO only hack the ack delay if you've already hacked the sender to not care about acknowledgements, otherwise its not worth it. +IMO only hack the ack delay if you've already hacked the sender to not care about acknowledgements, otherwise its not worth it, and even then it's borderline. ### Batching Most QUIC libraries will automatically fill a UDP packet with as much data as it can. @@ -68,24 +77,32 @@ This is dope, but as we established, you're a dingus and can't have nice things. Let's say you want to send 100 byte datagrams and don't want QUIC to coalesce them into a single UDP packet. Maybe you're making a custom FEC scheme or something and you want the packets to be fully independent. -This is a terrible idea. +I interupt this example to proclaim that this is a terrible idea. Your packets will still get coalesced at a lower level (ex. Ethernet, WiFi) that may even be using it's own FEC scheme. More UDP packets means more context switching means worse performance. But I'm here to pretend not to judge. You can disable this coalescing on the sender side. +Tweak a few lines of code and boop, you're sending a proper UDP packet for each QUIC datagram. ## Rapid Retransmit I was inspired to write this blog post because someone joined my (dope) Discord server. They asked if they could do all of the above so they could implement their own acknowledgements and retransmissions. -...but why not use QUIC streams? -### Current State -One thing that doesn't handle well is real-time latency. +So I asked them... why not use QUIC streams? +They already provide reliability, ordering, and can be cancelled. +What more could you want? + +### What More Could We Want? +QUIC is pretty poor for real-time latency. +It's not designed for small payloads that need to arrive ASAP, even if it means worse efficiency. Let's say a packet gets lost over the network. How does a QUIC library know? + +The unfortunate reality (for now) is that there's no explicit signal. +A QUIC library has to use maths and logic to make an educated guess that a packet is lost and needs to be retransmitted. The RFC outlines an algorithm that I'll attempt to simplify: - The sender increments a sequence number for each packet. @@ -96,30 +113,68 @@ The RFC outlines an algorithm that I'll attempt to simplify: - or a multiple of the RTT has elapsed. - As the congestion controller allows, retransmit any lost packets and repeat. -You don't need to understand the algorithm: I'll help. +Skipped that boring wall of text? +I don't blame you. +You're just here for the funny blog and *maaaaybe* learn something along the way. + +I'll help. If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit. It's particularly bad for the last few packets of a burst. +That means if you're trying to send data cross-continent, some data will randomly take 100ms to 200ms longer to deliver. ### We Can Do Better So how can we make QUIC better support real-time applications that can't wait multiple round trips? 
The trick is that a QUIC receiver MUST be prepared to accept duplicate or redundant packets. This can happen naturally if a packet is reordered or excessively queued over the network. -Nothing is stopping us from sending a boatload of packets. +You might see where this is going: nothing can stop us from abusing this behavior and sending a boatload of packets. Instead of sitting around doing nothing, our QUIC library could pre-emptively retransmit data even before it's considered lost. Maybe we only enable this above a certain RTT where retransmissions cause unacceptable delay. But sending redundant copies of data is nothing new; let's go a step further and embrace QUIC streams. At the end of the day, a QUIC STREAM frame is a byte offset and payload. -Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit any deltas as STREAM 230-250. -If the original STREAM frame is lost, well we can't actually decode the delta and suffer from HEAD-OF-LINE blocking. +Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit 20 bytes of deltas as STREAM 230-250. +If the original STREAM frame is lost, well even if we receive those 20 bytes, we can't actually decode them and suffer from HEAD-OF-LINE blocking. + +My game dev friend thinks this is unacceptable and made his own ACK-based algorithm on top of QUIC datagrams instead. +The sender ticks every 30ms and sends a delta from the last acknowledged state, even if that data might be in-flight already. +Pretty cool right? +Why doesn't QUIC do this? + +It does. + +(mind blown) + +QUIC will retransmit any unacknowledged fragments of a stream. +But like I said above, only when a packet is considered lost. +But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. +For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. + +So to continue our example above, we can modify QUIC to send byte offsets 0-250 instead of just 230-250. +And now we can accomplish the exact* same behavior as the game dev but without custom acknowledgements, retransmissions, deltas, and reassembly buffers. + +Forking a library feels *so dirty* but it magically works. + + +### Some Caveats +Okay it's not the same as the game dev solution; it's actually better. + +Retransmitting data can quickly balloon out of control. +Congestion can cause bufferbloat, which is when routers queue packets instead of dropping them. +If you retransmit every 30ms, but let's say congestion causes the RTT to (temporarily) increase to 500ms... well now you're transmitting 15x the data and further aggravating any congestion. +It's a vicious loop and you've basically built your own DDoS agent. + +This is yet another reason why you should never disable congestion control. +Yes, I'm still scarred by a Q&A "question" after one of my talks. +Your home grown live video protocol without congestion control is not novel or smart. -If latency is critical, you could instead modify the QUIC library to transmit STREAM 0-250 as the second packet. -There's no need to wait a fixed amount before retransmitting dependencies. +QUIC retransmission are gated by congestion control, so while your real-time application may be clammoring for MORE PACKETS, fortunately QUIC is smart enough to ignore you. +If the network is fully saturated, you need to send fewer packets to drain any queues, not more. 
-And this is exactly what the game dev was doing but using a custom UDP protocol, complete with acknowledgements and all sorts of stuff QUIC provides for free. -Forking a library and changing a few lines feels *so wrong* but it can a valid solution. +And if the network is fully saturated, or the receiver just drove through a tunnel with no internet access (increasingly rare), you can start over. +Cancel the previous QUIC stream and make a new one once the deltas become larger than a snapshot. +It's that easy. ## Application Limited From fda9c36cc57a0d25a9b879b1cc7dea0f3dcdf6d1 Mon Sep 17 00:00:00 2001 From: kixelated Date: Wed, 26 Feb 2025 09:42:09 -0800 Subject: [PATCH 05/19] Update and rename hacking-quic.md to abusing-quic.md --- .../blog/{hacking-quic.md => abusing-quic.md} | 93 ++++++++++++------- 1 file changed, 62 insertions(+), 31 deletions(-) rename src/pages/blog/{hacking-quic.md => abusing-quic.md} (70%) diff --git a/src/pages/blog/hacking-quic.md b/src/pages/blog/abusing-quic.md similarity index 70% rename from src/pages/blog/hacking-quic.md rename to src/pages/blog/abusing-quic.md index 5248375..3e32a9e 100644 --- a/src/pages/blog/hacking-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -1,37 +1,52 @@ -# Hacking QUIC +# Abusing QUIC We're going to hack QUIC. -"Hack" like a ROM-hack, not "hack" like a prison sentence (unless Nintendo is involved). -We're not trying to be malicious, but rather unlock new functionality while remaining compliant with the specification. -That's a top 10 nerd pickup line right there. +"Hack" like a ROM-hack, not "hack" like a prison sentence. +Unless Nintendo is involved. +We're not trying to be malicious, but rather unlock new functionality while remaining compliant with the specification. We can do this easily because unlike TCP, QUIC is implemented in *userspace*. -That means we can take a QUIC library, tweak a few lines of code, and unlock new functionality while still being compliant with the spec. +That means we can take a QUIC library, tweak a few lines of code, and unlock *new* functionality that the greybeards wanted to keep from us. We can ship our modified library as part of our application; nobody will suspect a thing. -The one disclaimer is that we can't modify web clients, as the browser safe-guards their previous UDP sockets like it's Fort Knox. +The one disclaimer is that we can't modify web clients; the browser safe-guards their precious UDP sockets like it's Fort Knox. We have to use the WebTransport API which uses the browser's built in QUIC library. -Although that doesn't stop us from using a modified server so many of these are still possible. +Our server-side modifications will still work, but short of wasting a zero-day exploit, we can't modify the client. + ## Proper QUIC Datagrams I've been quite critical in the past about QUIC datagrams. Bold statements like "they are bait" and "never use datagrams". -But some developers don't want to know the truth, and just want their beloved UDP datagrams. +But some developers don't want to know the truth and just want their beloved UDP datagrams. -The problem is that QUIC datagrams are not UDP datagrams, they are: -1. Congestion controlled. +The problem is that QUIC datagrams are *not* UDP datagrams. +That's would be something closer to DTLS. +Instead, QUIC datagrams: +1. Are congestion controlled. 2. Trigger acknowledgements. -3. Acknowledgements can't be surfaced to the application. +3. Avoid surfacing these acknowledgements. 4. May be batched. 
-We can fix *some* of these short-comings by modifying a standard QUIC library. +We're going to try to fix *some* of these short-comings by modifying a standard QUIC library. Be warned though, you're entering dingus territory. +### Dingus Alert +QUIC libraries expect developers like *you* so they baby-proof the shotgun. +They don't want web developers, sporting their favorite "I 💕 node_modules" T-shirt, to access the raw power of UDP and aim it straight at their foot. + +That's why there is no UDP socket Web API. +Heck, there's not even a TCP socket Web API as WebSockets force a HTTP handshake for *reasons*. +You can get kiiiind of close with WebRTC data channels but they are broken for so many reasons. + +QUIC (via WebTransport) doesn't change that mentality. + + +Nor is there a way to do + ### Congestion Control -We've already established that you're a dingus. -QUIC libraries expect developers like to act like you understand how networks behave and send unlimited packets. -It was an explicit goal to not let you do that. -So let's do it anyway. + +But let's suspend reality for a second. + The "truth" is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. The specification is the equivalent of a pinky promise because there's no mechanism to enforce a limit on the receiving end. @@ -158,23 +173,39 @@ Forking a library feels *so dirty* but it magically works. ### Some Caveats -Okay it's not the same as the game dev solution; it's actually better. - -Retransmitting data can quickly balloon out of control. -Congestion can cause bufferbloat, which is when routers queue packets instead of dropping them. -If you retransmit every 30ms, but let's say congestion causes the RTT to (temporarily) increase to 500ms... well now you're transmitting 15x the data and further aggravating any congestion. +Okay it's not the same as the game dev solution; it's actually better because of **congestion control**. +And once again, you do ~need~ want congestion control. + +Otherwise, retransmitting data can quickly balloon out of control. +Congestion can cause bufferbloat, which is when routers queue packets for an unknown amount of time (potentially for seconds). +Surprise! +It turns out that a router doesn't have to drop a packet when overloaded, but instead it can queue it in RAM. + +Let's say you retransmit every 30ms and everything works great on your PC. +A user from Brazil or India downloads your application and it initially works great too. +But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. +...well now you're transmitting 15x the data and further aggravating any congestion. It's a vicious loop and you've basically built your own DDoS agent. -This is yet another reason why you should never disable congestion control. -Yes, I'm still scarred by a Q&A "question" after one of my talks. -Your home grown live video protocol without congestion control is not novel or smart. - -QUIC retransmission are gated by congestion control, so while your real-time application may be clammoring for MORE PACKETS, fortunately QUIC is smart enough to ignore you. -If the network is fully saturated, you need to send fewer packets to drain any queues, not more. - -And if the network is fully saturated, or the receiver just drove through a tunnel with no internet access (increasingly rare), you can start over. -Cancel the previous QUIC stream and make a new one once the deltas become larger than a snapshot. -It's that easy. 
+But QUIC can avoid this issue because retransmissions are gated by congestion control. +Even when a packet is considered lost, or my hypothetical `stream.retransmit()` is called, a QUIC library won't immediately retransmit. +Instead, retransmissions are queued up until the congestion controller deems it appropriate. +Note that a late acknowledgement or stream reset will cancel a queued retransmission (unless your QUIC library sucks). + +Why? +If the network is fully saturated, you need to send fewer packets to drain any network queues, not more. +Even ignoring bufferbloat, networks are finite resources and blind retransmissions are the easiest way to join the UDP Wall of Shame. +In this instance,.the QUIC greybeards will stop you from doing bad thing. +The children yearn for the mines, but the adults yearn for child protection laws. + +Under extreme congestion, or when temporarily offline, the backlog of queued data will keep growing and growing. +Once the size of queued delta updates grows larger than the size of a new snapshot, cut your losses and start over. +Reset the stream with deltas to prevent new transmissions and create a new stream with the snapshot. +Repeat as needed; it's that easy! + +I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control. +Packets are queued locally so they can be cancelled instantaneously. +Otherwise they would be queued on some intermediate router (ex. for 500ms). ## Application Limited From 8be478a20d0922eff385b7389b060e1cbefc305e Mon Sep 17 00:00:00 2001 From: kixelated Date: Wed, 26 Feb 2025 18:45:53 -0800 Subject: [PATCH 06/19] Update abusing-quic.md --- src/pages/blog/abusing-quic.md | 132 +++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 39 deletions(-) diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md index 3e32a9e..abb2ed6 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -12,6 +12,37 @@ The one disclaimer is that we can't modify web clients; the browser safe-guards We have to use the WebTransport API which uses the browser's built in QUIC library. Our server-side modifications will still work, but short of wasting a zero-day exploit, we can't modify the client. +But first, a disclaimer: + +## Dingus Territory +QUIC was designed with developers like *you* in mind. +Yes *you*, wearing your favorite "I 💕 node_modules" T-shirt about to rewrite your website again using the Next-est framework released literally seconds ago. + +*You are a dingus*. + +The greybeards that designed QUIC, the QUIC libraries, and the related web APIs do not respect you. +They think that given a shotgun, the first thing you're going to do is blow your own foot off. +And they're right of course. + +That's why there is no UDP socket Web API. +WebRTC data channels claim to have "unreliable" messages but don't even get me started. +Heck, there's not even a TCP socket Web API; WebSockets force a HTTP handshake for *reasons*. + +QUIC (via WebTransport) doesn't change that mentality. +You can't even disable encryption aka TLS aka HTTPS. +Because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more bitcoins. +It's not a good look to provide users with unsafe APIs. + +But let's suspend reality for a second. +Let's say that *You* are a savant who fully understands QUIC and networking. 
+You're here because you understand the ramifications of your actions and want to push the boundaries of QUIC. +That's great because I have a blog post for you. + +For everyone else, heed my warnings. +Friends don't let friends design UDP protocols. +(And we're friends?) +You should start simple and use the intended QUIC API before reaching for that shotgun. + ## Proper QUIC Datagrams I've been quite critical in the past about QUIC datagrams. @@ -19,76 +50,98 @@ Bold statements like "they are bait" and "never use datagrams". But some developers don't want to know the truth and just want their beloved UDP datagrams. The problem is that QUIC datagrams are *not* UDP datagrams. -That's would be something closer to DTLS. +That would be something closer to DTLS. Instead, QUIC datagrams: 1. Are congestion controlled. 2. Trigger acknowledgements. -3. Avoid surfacing these acknowledgements. +3. Do not expose these acknowledgements. 4. May be batched. We're going to try to fix *some* of these short-comings by modifying a standard QUIC library. Be warned though, you're entering dingus territory. -### Dingus Alert -QUIC libraries expect developers like *you* so they baby-proof the shotgun. -They don't want web developers, sporting their favorite "I 💕 node_modules" T-shirt, to access the raw power of UDP and aim it straight at their foot. - -That's why there is no UDP socket Web API. -Heck, there's not even a TCP socket Web API as WebSockets force a HTTP handshake for *reasons*. -You can get kiiiind of close with WebRTC data channels but they are broken for so many reasons. - -QUIC (via WebTransport) doesn't change that mentality. - - -Nor is there a way to do - ### Congestion Control +The "truth" is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. +There's only some black pixels on a text document forming the word SHOULD and sometimes SHOULD NOT. +"Congestion Control" is that SHOULD NOT preventing you from flooding the network. +There's many algorithms out there, but basically they guess if the network can handle more traffic. +The simplest form of congestion control is to send less data when packet loss is high. -But let's suspend reality for a second. +But to the dingus, this looks like an artifical limit. +And it's true, many networks could sustain a higher throughput if this pesky congestion control is disabled. +All we need to do is comment out one check and bam, we can send QUIC datagrams at an unlimited rate. +I joke but PLEASE do not do this. -The "truth" is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. -The specification is the equivalent of a pinky promise because there's no mechanism to enforce a limit on the receiving end. -You'll run into flow control limits with QUIC streams, but not with QUIC datagrams. -All we need to do is comment out one check. - -But you do *need* congestion control if you're sending data over the Internet. -Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms +You *need* some form of congestion control if you're sending data over the internet. +Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms. These are not symptoms of the latest disease kept under wraps by the Trump administration, but rather a reality of the internet being a shared resource. -But nobody said you *have* to use QUIC's congestion control; disable it and implement your own if you dare. + +But it's no fun starting a blog with back to back lectures. 
+We're here to abuse QUIC damnit, not the readers. + +But nobody said that you have to use *QUIC's congestion control*. +It's pluggable, implement your own! +Unlike TCP, which buries the congestion controller in the kernel, QUIC libraries often expose it as an interace. +Found a startup where you pipe each ACK to ChatGPT and let the funding roll in. +Or do something boring and write a master's thesis. percentile meme -And note that congestion control is not specific to QUIC datagrams. +And note that custom congestion control is not specific to QUIC datagrams. You can use a custom congestion controller for QUIC streams too! +Or completely disable it, you do you. + ### Acknowledgements -This might sound bizarre if you're used to using UDP, but QUIC will explicitly acknowledge each datagram. -However, these are **not** used for retransmissions, so what are they for? +QUIC will reply with an acknowledgement packet are receiving each datagram. +This might sound absolutely bonkers if you're used to UDP. +These are **not** used for retransmissions, so what are they for? ...they're only for congestion control. -But we just disabled QUIC's congestion control! -We'll just get bombarded with useless acknowledgements. -They are batched and quite efficient so it's not the end of the world, but I can already feel your angst. +But what if we just disabled QUIC's congestion control! +Now we're going to get bombarded with useless acknowledgements! + +The good news is that QUIC acknowledgements are batched, potentially at the end of your data packets, and aee quite efficient. +It's only a few extra bytes/packsts and not the end of the world, but I can already feel your angst. + +The most cleverest of dinguses amongst us (amogus?) may think you could leverage these ACKs. +What if we used these otherwise "useless" ACKs to tell our application if a packet was received? +That way we won't have to implement our own ACK/NACK mechanism for reliability. -The most cleverest of dinguses amongst us (amogus?) may think you could leverage these ACKs for your own application. -Unfortunately, there's an edge case discovered by yours truely where QUIC may acknowledge a datagram but never deliver it to the application. -So if you're using QUIC datagrams, yes you do have to implement your own ACK/NACK protocol in the application and yes, it does feel terrible. +Unfortunately, there's an edge case discovered by yours truely where QUIC may acknowledge a datagram but not deliver it to the application. +This will happen if a QUIC library processes a packet but the application (ex. Javascript web page) is too slow. -But let's get rid of these useless ACKs. +So yes, if you're using QUIC datagrams, then you will have to implement your own ACK/NACK protocol in your application dor any reliable data. +One datagram will trigger a QUIC ACK, your custom ACK, and a QUIC ACK for your custom ACK. +Yes, it does feel terrible; like a mud shower on Wednesday. + +So let's get rid of these useless ACKs. If you control the receiver, you can tweak the `max_ack_delay`. This is a parameter exchanged during the handshake that indicates how long the implementation can wait before sending an acknowledgement. Crank it up to 1000ms (the default is 25ms) and the number of acknowledgement packets should slow to a trickle, acting almost as a keep-alive. -Be warned that this will (somewhat) impact other QUIC frames, especially STREAM retransmissions. -It may also throw a wrench into congestion controllers that expect timely feedback. 
-IMO only hack the ack delay if you've already hacked the sender to not care about acknowledgements, otherwise its not worth it, and even then it's borderline. +Be warned that this will impact all QUIC frames, *especially* STREAM retransmissions. +It may also throw a wrench into the congestion controller too as they expect timely feedback. +The chaos you've sown will be legendary. + +So only consider this route if you've gone full dingus and completely disabled congestion control and retransmissions. +I'm sure it couldn't get worse. ### Batching Most QUIC libraries will automatically fill a UDP packet with as much data as it can. This is dope, but as we established, you're a dingus and can't have nice things. +Maybe you're high on the thrill of sending unlimited packets and now need to figure out why so many of them are getting dropped now. +What if we sent 100 byte packets along with extra some parity bits to fix this pesky "random" packet loss. +Somebody call Nobel, I've got a dynamite idea. + + + + + + Let's say you want to send 100 byte datagrams and don't want QUIC to coalesce them into a single UDP packet. Maybe you're making a custom FEC scheme or something and you want the packets to be fully independent. @@ -103,14 +156,15 @@ Tweak a few lines of code and boop, you're sending a proper UDP packet for each ## Rapid Retransmit I was inspired to write this blog post because someone joined my (dope) Discord server. -They asked if they could do all of the above so they could implement their own acknowledgements and retransmissions. +They asked if they could do all of the above so they would have proper QUIC datagrams. +Then they could implement their own acknowledgements and retransmissions. So I asked them... why not use QUIC streams? They already provide reliability, ordering, and can be cancelled. What more could you want? ### What More Could We Want? -QUIC is pretty poor for real-time latency. +Unfortunately, QUIC is pretty poor for real-time latency. It's not designed for small payloads that need to arrive ASAP, even if it means worse efficiency. Let's say a packet gets lost over the network. From e2927f725c79d1b29028632463be52cab89747f6 Mon Sep 17 00:00:00 2001 From: kixelated Date: Tue, 4 Mar 2025 09:51:05 -0800 Subject: [PATCH 07/19] Update abusing-quic.md --- src/pages/blog/abusing-quic.md | 35 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 19 deletions(-) diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md index abb2ed6..5c5cc7e 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -5,14 +5,10 @@ Unless Nintendo is involved. We're not trying to be malicious, but rather unlock new functionality while remaining compliant with the specification. We can do this easily because unlike TCP, QUIC is implemented in *userspace*. -That means we can take a QUIC library, tweak a few lines of code, and unlock *new* functionality that the greybeards wanted to keep from us. -We can ship our modified library as part of our application; nobody will suspect a thing. +That means we can take a QUIC library, tweak a few lines of code, and unlock new functionality that the greybeards *attempted* to keep from us. +We then ship our modified library as part of our application and nobody will suspect a thing. -The one disclaimer is that we can't modify web clients; the browser safe-guards their precious UDP sockets like it's Fort Knox. -We have to use the WebTransport API which uses the browser's built in QUIC library. 
-Our server-side modifications will still work, but short of wasting a zero-day exploit, we can't modify the client. - -But first, a disclaimer: +But before we continue, a disclaimer: ## Dingus Territory QUIC was designed with developers like *you* in mind. @@ -26,20 +22,21 @@ And they're right of course. That's why there is no UDP socket Web API. WebRTC data channels claim to have "unreliable" messages but don't even get me started. -Heck, there's not even a TCP socket Web API; WebSockets force a HTTP handshake for *reasons*. +Heck, there's not even a native TCP socket Web API. +WebSockets are a close approximation but force a HTTP handshake and additional framing for *reasons*. -QUIC (via WebTransport) doesn't change that mentality. -You can't even disable encryption aka TLS aka HTTPS. -Because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more bitcoins. -It's not a good look to provide users with unsafe APIs. +QUIC doesn't change that mentality. +Short of wasting a zero-day exploit, we have to use the browser's built-in QUIC library via the WebTransport API. +Browser vendors like Google don't want *you*, the commoner, doing any of the stuff mentioned in this article and ruining their cultivated garden. +But that's not going to stop us from modifying the server or the clients we control (ex. native app). -But let's suspend reality for a second. -Let's say that *You* are a savant who fully understands QUIC and networking. -You're here because you understand the ramifications of your actions and want to push the boundaries of QUIC. -That's great because I have a blog post for you. +However, in doing so, you must constantly evaluate if you are the *dingus*. +QUIC famously does not let you disable encryption because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more meme coins. +Almost everyone believes that encryption is slow, but once you actually benchmark AES-GCM, it turns out that almost everyone is the dingus. +Yes, there are legitimate use-cases where a full TLS handshake is not worth it, but when the safe API is the best API 99% of the time, then it becomes the only API. -For everyone else, heed my warnings. -Friends don't let friends design UDP protocols. +This article is the equivalent of using the `unsafe` keyword in Rust. +Read it, soak up the power, but heed my warnings, as friends don't let friends design UDP protocols. (And we're friends?) You should start simple and use the intended QUIC API before reaching for that shotgun. @@ -51,7 +48,7 @@ But some developers don't want to know the truth and just want their beloved UDP The problem is that QUIC datagrams are *not* UDP datagrams. That would be something closer to DTLS. -Instead, QUIC datagrams: +Unlike UDP datagrams, QUIC datagrams: 1. Are congestion controlled. 2. Trigger acknowledgements. 3. Do not expose these acknowledgements. 
From 11fc526606be6b416f24b70a1d6f012ae2edf833 Mon Sep 17 00:00:00 2001 From: kixelated Date: Tue, 4 Mar 2025 18:42:39 -0800 Subject: [PATCH 08/19] Update abusing-quic.md --- src/pages/blog/abusing-quic.md | 87 +++++++++++++++++++++------------- 1 file changed, 54 insertions(+), 33 deletions(-) diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md index 5c5cc7e..6519e37 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -58,18 +58,20 @@ We're going to try to fix *some* of these short-comings by modifying a standard Be warned though, you're entering dingus territory. ### Congestion Control -The "truth" is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. -There's only some black pixels on a text document forming the word SHOULD and sometimes SHOULD NOT. +The truth is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. +There's only a few pixels on a document somewhere saying you SHOULD NOT do this. -"Congestion Control" is that SHOULD NOT preventing you from flooding the network. -There's many algorithms out there, but basically they guess if the network can handle more traffic. -The simplest form of congestion control is to send less data when packet loss is high. +"Congestion Control" is that SHOULD NOT. +There's many congestion control algorithms out there and they are is an educated guess on if the network can handle more traffic. +The simplest form of congestion control is to send less data when packet loss is high and send more data when packet loss is low. -But to the dingus, this looks like an artifical limit. -And it's true, many networks could sustain a higher throughput if this pesky congestion control is disabled. +But this is an artifical limit that is begging to be broken. +It's often a hiderance as many networks could sustain a higher throughput without this pesky congestion control. All we need to do is comment out one check and bam, we can send QUIC datagrams at an unlimited rate. -I joke but PLEASE do not do this. +But PLEASE do not do this unless you know what you are doing... or are convinced that you know what you are doing. + +[ percentile meme ] You *need* some form of congestion control if you're sending data over the internet. Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms. @@ -77,19 +79,17 @@ These are not symptoms of the latest disease kept under wraps by the Trump admin But it's no fun starting a blog with back to back lectures. We're here to abuse QUIC damnit, not the readers. +Nobody said that you have to use *QUIC's congestion control*. -But nobody said that you have to use *QUIC's congestion control*. -It's pluggable, implement your own! -Unlike TCP, which buries the congestion controller in the kernel, QUIC libraries often expose it as an interace. -Found a startup where you pipe each ACK to ChatGPT and let the funding roll in. -Or do something boring and write a master's thesis. - -percentile meme - +In fact, QUIC is designed with pluggable congestion control in mind. +The good libraries expose an interface s you can ship your latest and greatest congestion controller alongside with your application. +You can't do this with TCP as it's buried inside the kernel, so +1 points to QUIC. And note that custom congestion control is not specific to QUIC datagrams. You can use a custom congestion controller for QUIC streams too! -Or completely disable it, you do you. 
+So found a startup where you pipe each ACK to ChatGPT and let the funding roll in. +Or do something boring and write a master's thesis about curve fitting or something. +Or completely disable congestion control altogether, you do you. ### Acknowledgements QUIC will reply with an acknowledgement packet are receiving each datagram. @@ -97,22 +97,28 @@ This might sound absolutely bonkers if you're used to UDP. These are **not** used for retransmissions, so what are they for? ...they're only for congestion control. -But what if we just disabled QUIC's congestion control! +But what if we just disabled QUIC's congestion control? Now we're going to get bombarded with useless acknowledgements! -The good news is that QUIC acknowledgements are batched, potentially at the end of your data packets, and aee quite efficient. -It's only a few extra bytes/packsts and not the end of the world, but I can already feel your angst. +The good news is that QUIC acknowledgements are batched, potentially appended to your data packets, and are quite efficient. +It's only a few extra bytes/packets so it's not the end of the world. +But I can already feel your angst; your uncontrollable urge to optimize this *wasted bandwidth*. -The most cleverest of dinguses amongst us (amogus?) may think you could leverage these ACKs. +The most cleverest of dinguses amongst us (amogus?) might try to leverage these ACKs. What if we used these otherwise "useless" ACKs to tell our application if a packet was received? That way we won't have to implement our own ACK/NACK mechanism for reliability. +Somebody call Nobel, that's a dynamite idea. + +You can absolutely hack the QUIC library to expose which datagrams were acknowledged by the remote. -Unfortunately, there's an edge case discovered by yours truely where QUIC may acknowledge a datagram but not deliver it to the application. -This will happen if a QUIC library processes a packet but the application (ex. Javascript web page) is too slow. +...but unfortunately, there's an edge case discovered by yours truely where QUIC may acknowledge a datagram but not deliver it to the application. +This will happen if a QUIC library processes a packet but the application (ex. Javascript web page) is too slow to process it. +It's not a deal breaker especially if your application is only semi-reliable, but it's quite unfortunate. +Note: QUIC streams don't suffer from this because of flow control. -So yes, if you're using QUIC datagrams, then you will have to implement your own ACK/NACK protocol in your application dor any reliable data. -One datagram will trigger a QUIC ACK, your custom ACK, and a QUIC ACK for your custom ACK. -Yes, it does feel terrible; like a mud shower on Wednesday. +If you're using QUIC datagrams and want more reliability... then you should to implement your own ACK/NACK protocol. +This is gross because one datagram will trigger a QUIC ACK, your custom ACK, and a QUIC ACK for your custom ACK. +The angst is overwhelming now. So let's get rid of these useless ACKs. If you control the receiver, you can tweak the `max_ack_delay`. @@ -130,20 +136,35 @@ I'm sure it couldn't get worse. Most QUIC libraries will automatically fill a UDP packet with as much data as it can. This is dope, but as we established, you're a dingus and can't have nice things. -Maybe you're high on the thrill of sending unlimited packets and now need to figure out why so many of them are getting dropped now. 
-What if we sent 100 byte packets along with extra some parity bits to fix this pesky "random" packet loss. -Somebody call Nobel, I've got a dynamite idea. +Let's you're high on the thrill of sending unlimited packets after disabling congestion control. +However, sometimes a bunch of packets get lost and you need to figure out why. +Surely it can't be the consequences of your actions? + +"No! +It's the network's fault! +I'm going to shotgun additional copies to ensure at least one arrives..." + +I cringed a little bit writing that because I've sat through presentations by staff (video) engineers claiming the same thing. +You've cringed reading this blog instead. + +Now there are multiple things wrong with this line of thinking. +It turns out there's no secret cheat code to the internet: sending more packets will cause proportially *more* packet loss as devices get fully saturated. +But we're going to save that for the finale and instead focus on **atomicity**. +Packet loss instinctively feels like an independent event, like a coin toss on a router. +Sending the same packet back-to-back means you get to toss the coin again, right? +In reality, a "packet" is a high level abstraction. +Even the humble UDP packet will get coalesced at a lower level (ex. Ethernet, WiFi) that may even be using it's own recovery scheme. +Your "independent" packets may actually be fate-bound and dropped together. +QUIC takes this a step further and batches everything, including datagrams. +Ten 100 byte datagrams may appear disjoint in your application but secretly get combined into one UDP datagram under the covers. +Your brain used to be smooth but now it's wrinkly af. -Let's say you want to send 100 byte datagrams and don't want QUIC to coalesce them into a single UDP packet. -Maybe you're making a custom FEC scheme or something and you want the packets to be fully independent. -I interupt this example to proclaim that this is a terrible idea. -Your packets will still get coalesced at a lower level (ex. Ethernet, WiFi) that may even be using it's own FEC scheme. More UDP packets means more context switching means worse performance. But I'm here to pretend not to judge. From 8e1e32c1f5259223fbc7bb6f13510ddb80d91b77 Mon Sep 17 00:00:00 2001 From: kixelated Date: Wed, 5 Mar 2025 09:46:42 -0800 Subject: [PATCH 09/19] Update abusing-quic.md --- src/pages/blog/abusing-quic.md | 158 +++++++++++++++++++-------------- 1 file changed, 91 insertions(+), 67 deletions(-) diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md index 6519e37..9f2d7e8 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -22,28 +22,31 @@ And they're right of course. That's why there is no UDP socket Web API. WebRTC data channels claim to have "unreliable" messages but don't even get me started. -Heck, there's not even a native TCP socket Web API. -WebSockets are a close approximation but force a HTTP handshake and additional framing for *reasons*. +Heck, there's not even a native TCP API; WebSockets are a close approximation but force a HTTP handshake and additional framing for *reasons*. QUIC doesn't change that mentality. Short of wasting a zero-day exploit, we have to use the browser's built-in QUIC library via the WebTransport API. -Browser vendors like Google don't want *you*, the commoner, doing any of the stuff mentioned in this article and ruining their cultivated garden. 
+Browser vendors like Google don't want *you*, the `node_modules` enjoyer, doing any of the stuff mentioned in this article on *their* web clients. But that's not going to stop us from modifying the server or the clients we control (ex. native app). -However, in doing so, you must constantly evaluate if you are the *dingus*. -QUIC famously does not let you disable encryption because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more meme coins. -Almost everyone believes that encryption is slow, but once you actually benchmark AES-GCM, it turns out that almost everyone is the dingus. -Yes, there are legitimate use-cases where a full TLS handshake is not worth it, but when the safe API is the best API 99% of the time, then it becomes the only API. +However, in doing so, you must constantly evaluate if you are the *dingus* in the exchange. +QUIC famously does not let you disable encryption because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more bitcoins. +So many people believe that encryption is slow, but once you actually benchmark AES-GCM, it turns out that so many people are *the dingus*. +Yes, there are legitimate use-cases where a full TLS handshake is not worth it. +But when the safe API is the best API 99% of the time, then it becomes the only API. This article is the equivalent of using the `unsafe` keyword in Rust. -Read it, soak up the power, but heed my warnings, as friends don't let friends design UDP protocols. +You can do these things and you'll super feel smart, but are you smart? + +So heed my warnings. +Friends don't let friends design UDP protocols. (And we're friends?) You should start simple and use the intended QUIC API before reaching for that shotgun. ## Proper QUIC Datagrams I've been quite critical in the past about QUIC datagrams. -Bold statements like "they are bait" and "never use datagrams". +Bold statements like "they are bait", "never use datagrams", and "try using QUIC streams first ffs". But some developers don't want to know the truth and just want their beloved UDP datagrams. The problem is that QUIC datagrams are *not* UDP datagrams. @@ -54,41 +57,45 @@ Unlike UDP datagrams, QUIC datagrams: 3. Do not expose these acknowledgements. 4. May be batched. -We're going to try to fix *some* of these short-comings by modifying a standard QUIC library. +We're going to try to fix some of these short-comings in the standard by modifying a standard QUIC library. Be warned though, you're entering dingus territory. ### Congestion Control -The truth is that there's nothing stopping a library from sending an unlimited number of QUIC datagrams. -There's only a few pixels on a document somewhere saying you SHOULD NOT do this. +The truth is that there's nothing stopping a QUIC library from sending an unlimited number of QUIC datagrams. +There's only a few pixels in the standard that say you SHOULD NOT do this. -"Congestion Control" is that SHOULD NOT. -There's many congestion control algorithms out there and they are is an educated guess on if the network can handle more traffic. +"Congestion Control" is what a library SHOULD do. +There's many congestion control algorithms out there, and put simply they are little more than an educated guess on if the network can handle more traffic. 
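To make that "educated guess" concrete, here is a toy additive-increase/multiplicative-decrease (AIMD) window sketch in Rust. The struct, names, and numbers are mine for illustration only; this is not quinn's congestion interface or any other library's actual API.

```rust
// A toy AIMD controller: grow the in-flight budget slowly on ACKs,
// cut it in half on any sign of loss. Purely illustrative.
struct ToyAimd {
    window: u64, // bytes we allow in flight
    mss: u64,    // nominal packet size
}

impl ToyAimd {
    fn new(mss: u64) -> Self {
        Self { window: 10 * mss, mss }
    }

    // Each acknowledged byte buys a little more budget (~1 MSS per RTT).
    fn on_ack(&mut self, acked_bytes: u64) {
        self.window += self.mss * acked_bytes / self.window;
    }

    // Loss is treated as "the network is full": halve the guess.
    fn on_loss(&mut self) {
        self.window = (self.window / 2).max(2 * self.mss);
    }

    // The sender consults this before writing another packet.
    fn can_send(&self, bytes_in_flight: u64) -> bool {
        bytes_in_flight < self.window
    }
}
```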
The simplest form of congestion control is to send less data when packet loss is high and send more data when packet loss is low. But this is an artifical limit that is begging to be broken. It's often a hiderance as many networks could sustain a higher throughput without this pesky congestion control. All we need to do is comment out one check and bam, we can send QUIC datagrams at an unlimited rate. -But PLEASE do not do this unless you know what you are doing... or are convinced that you know what you are doing. +But please do not do this unless you know what you are doing... or are convinced that you know what you are doing. [ percentile meme ] You *need* some form of congestion control if you're sending data over the internet. Otherwise you'll suffer from congestion, bufferbloat, high loss, and other symptoms. These are not symptoms of the latest disease kept under wraps by the Trump administration, but rather a reality of the internet being a shared resource. +Routers will queue and eventually drop excess packets, wrecking any algorithm that treats the internet like an unlimited pipe. But it's no fun starting a blog with back to back lectures. We're here to abuse QUIC damnit, not the readers. -Nobody said that you have to use *QUIC's congestion control*. -In fact, QUIC is designed with pluggable congestion control in mind. -The good libraries expose an interface s you can ship your latest and greatest congestion controller alongside with your application. +But I did not say that you should use the *default* congestion control. +The QUIC RFC is based on the dated TCP New Reno algorithm which performs poorly when latency is important or bufferbloat rampant. +That's because QUIC is designed with pluggable congestion control in mind. +Most libraries expose an interface so you choose the congestion controller or make your own. You can't do this with TCP as it's buried inside the kernel, so +1 points to QUIC. + And note that custom congestion control is not specific to QUIC datagrams. You can use a custom congestion controller for QUIC streams too! -So found a startup where you pipe each ACK to ChatGPT and let the funding roll in. -Or do something boring and write a master's thesis about curve fitting or something. +So use a battle tested algorithm like BBR instead of the default. +Or make your own by piping each ACK to ChatGPT and let the funding roll in. +Or be extra boring and write a master's thesis about curve fitting or something Or completely disable congestion control altogether, you do you. ### Acknowledgements @@ -109,33 +116,37 @@ What if we used these otherwise "useless" ACKs to tell our application if a pack That way we won't have to implement our own ACK/NACK mechanism for reliability. Somebody call Nobel, that's a dynamite idea. -You can absolutely hack the QUIC library to expose which datagrams were acknowledged by the remote. +You can absolutely hack a QUIC library to expose which datagrams were acknowledged by the remote. +More information is always better, right? +Why don't QUIC libraries provide this? -...but unfortunately, there's an edge case discovered by yours truely where QUIC may acknowledge a datagram but not deliver it to the application. -This will happen if a QUIC library processes a packet but the application (ex. Javascript web page) is too slow to process it. -It's not a deal breaker especially if your application is only semi-reliable, but it's quite unfortunate. -Note: QUIC streams don't suffer from this because of flow control. 
+...unfortunately there's an edge case discovered by yours truly.
+The QUIC library may acknowledge a datagram, but it gets dropped before being delivered to the application.
+This will happen if a QUIC library processes a packet, sends an ACK, but the application (ex. Javascript web page) is too slow and we run out of memory to queue it.
+**Note:** QUIC streams don't suffer from this issue because they use flow control.
 
-If you're using QUIC datagrams and want more reliability... then you should to implement your own ACK/NACK protocol.
+This might not be a deal breaker depending on your application.
+However, it's quite an unfortunate footgun if you're expecting QUIC ACKs to be the definitive signal that your application has received a packet.
+If you want that reliability... then you should implement your own ACK/NACK protocol on top of QUIC datagrams.
 This is gross because one datagram will trigger a QUIC ACK, your custom ACK, and a QUIC ACK for your custom ACK.
-The angst is overwhelming now.
+I bet the angst is overwhelming now.
 
-So let's get rid of these useless ACKs.
-If you control the receiver, you can tweak the `max_ack_delay`.
+So let's get rid of these useless ACKs instead.
+If you control the receiver, you can tweak the `max_ack_delay` parameter.
 This is a parameter exchanged during the handshake that indicates how long the implementation can wait before sending an acknowledgement.
-Crank it up to 1000ms (the default is 25ms) and the number of acknowledgement packets should slow to a trickle, acting almost as a keep-alive.
+Crank it up to 1000ms (the default is 25ms) and the number of acknowledgement packets should slow to a trickle.
 
 Be warned that this will impact all QUIC frames, *especially* STREAM retransmissions.
-It may also throw a wrench into the congestion controller too as they expect timely feedback.
+It may also throw a wrench into the congestion controller, as it expects timely feedback.
+So only consider this route if you've gone *full dingus* and completely disabled congestion control and streams.
 The chaos you've sown will be legendary.
 
-So only consider this route if you've gone full dingus and completely disabled congestion control and retransmissions.
-I'm sure it couldn't get worse.
-
 ### Batching
 Most QUIC libraries will automatically fill a UDP packet with as much data as it can.
-This is dope, but as we established, you're a dingus and can't have nice things.
+This is dope, but as we established, you can't settle for nice things.
+We're here to strip a transport protocol to its core.
+But why would you do this?
 
 Let's say you're high on the thrill of sending unlimited packets after disabling congestion control.
 However, sometimes a bunch of packets get lost and you need to figure out why.
 Surely it can't be the consequences of your actions?
@@ -144,36 +155,37 @@ Surely it can't be the consequences of your actions?
 It's the network's fault!
 I'm going to shotgun additional copies to ensure at least one arrives..."
 
-I cringed a little bit writing that because I've sat through presentations by staff (video) engineers claiming the same thing.
-You've cringed reading this blog instead.
-
-Now there are multiple things wrong with this line of thinking.
+I cringed a bit writing that (while you cringed reading this blog).
+See, I've sat through too many presentations by staff (video) engineers claiming the same thing.
+FEC is the solution to a problem that they don't understand.
It turns out there's no secret cheat code to the internet: sending more packets will cause proportially *more* packet loss as devices get fully saturated. -But we're going to save that for the finale and instead focus on **atomicity**. -Packet loss instinctively feels like an independent event, like a coin toss on a router. +But we're going to save that for the finale rant and instead focus on **atomicity**. +Packet loss instinctively feels like an independent event: a coin toss on a router somewhere. Sending the same packet back-to-back means you get to toss the coin again, right? -In reality, a "packet" is a high level abstraction. -Even the humble UDP packet will get coalesced at a lower level (ex. Ethernet, WiFi) that may even be using it's own recovery scheme. -Your "independent" packets may actually be fate-bound and dropped together. - -QUIC takes this a step further and batches everything, including datagrams. -Ten 100 byte datagrams may appear disjoint in your application but secretly get combined into one UDP datagram under the covers. +Not really, because a "packet" is a high level abstraction. +Even the humble UDP packet will get coalesced at a lower level with it's own recovery scheme. +For example, 7 UDP packets (1.2KB MTU*) can fit snug into a jumbo Ethernet frame. +If your protocol depends on "independent" packets, then you may be distraught to learn that they are actually somewhat fate-bound and may be dropped in batches. +QUIC takes this a step further and batches everything, including datagrams. +You may send ten, 100 byte datagrams that appear disjoint in your application but may secretly get combined into one UDP datagram under the covers. +You're at the mercy of the QUIC library, which is at the mercy of the lower level transport. -Your brain used to be smooth but now it's wrinkly af. +Fortunately, QUIC cannot split a datagram across multiple UDP packets. +If your datagrams are large enough (>600 bytes*) then you can sleep easy knowing they won't get combined with other datagrams. +But just like everything else thus far, we can disable this behavior entirely. +Tweak a few lines of code and boop, you're sending a proper UDP packet for each QUIC datagram. +I'm not sure why you would, because it can only worsen performance, but I'm here (to pretend) not to judge. +Your brain used to be smooth but now it's wrinkly af 🧠🔥. -More UDP packets means more context switching means worse performance. -But I'm here to pretend not to judge. -You can disable this coalescing on the sender side. -Tweak a few lines of code and boop, you're sending a proper UDP packet for each QUIC datagram. +## Improper QUIC Streams +Okay so we've hacked QUIC datagrams to pieces, but why? - -## Rapid Retransmit -I was inspired to write this blog post because someone joined my (dope) Discord server. +I was actually inspired to write this blog post because someone joined my (dope) Discord server. They asked if they could do all of the above so they would have proper QUIC datagrams. Then they could implement their own acknowledgements and retransmissions. @@ -182,19 +194,25 @@ They already provide reliability, ordering, and can be cancelled. What more could you want? ### What More Could We Want? -Unfortunately, QUIC is pretty poor for real-time latency. -It's not designed for small payloads that need to arrive ASAP, even if it means worse efficiency. +Look, I may be one of the biggest QUIC fanboys, but I've got to admit that QUIC is pretty poor for real-time latency. 
+It's not designed for small payloads that need to arrive ASAP, like voice calls.
+
+But don't take my wrinkle brain statements as fact.
+Let's dive deeper into the conceptual abyss.
+How does a QUIC library know when a packet is lost?
 
-Let's say a packet gets lost over the network.
-How does a QUIC library know?
+The unfortunate reality is that it doesn't.
+There's no explicit signal (yet?) from routers.
+A QUIC library has to instead use FACTS and LOGIC like Ben Shapiro losing a debate against a university student.
+Yes I did make a second political joke in a nerd blog about networking, but that shouldn't be a surprise because I use Rust.
+🏳️‍🌈🏳️‍⚧️:usa:
 
-The unfortunate reality (for now) is that there's no explicit signal.
-A QUIC library has to use maths and logic to make an educated guess that a packet is lost and needs to be retransmitted.
-The RFC outlines an algorithm that I'll attempt to simplify:
+Anyway, QUIC works by making an educated guess that a packet is lost and needs to be retransmitted.
+The RFC outlines an algorithm and some *recommended* behavior that I'll attempt to simplify:
 
 - The sender increments a sequence number for each packet.
-- Upon receiving a packet, the receiver will start a timer to ACK the sequence number, batching with any others that arrive within `max_ack_delay`.
-- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver.
+- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`.
+- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer.
 - After finally receiving an ACK, the sender *may* decide that a packet was lost if:
   - 3 newer sequences were ACKed.
   - or a multiple of the RTT has elapsed.
@@ -202,12 +220,18 @@ The RFC outlines an algorithm that I'll attempt to simplify:
 
 Skipped that boring wall of text?
 I don't blame you.
-You're just here for the funny blog and *maaaaybe* learn something along the way.
+You're just here for the funny (political?) blog and *maaaaybe* learn something along the way.
 
 I'll help.
 If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit.
-It's particularly bad for the last few packets of a burst.
-That means if you're trying to send data cross-continent, some data will randomly take 100ms to 200ms longer to deliver.
+It's particularly bad for the last few packets of a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke (via the PTO timer).
+
+But wait, what's an RTT?
+I just completely glossed over that acronym and expected Google to hallucinate an explanation.
+The Round Trip Time is pretty self-explanatory: it's the time it takes for a packet to complete a circuit to and then from a remote (aka "ping").
+So if you're playing Counter Strike cross-continent, you're already at a disadvantage because it takes 150ms for your packets to register.
+Throw QUIC into the mix and some packets will take 300ms-450ms because of the conservative retransmissions.
+*cyka bylat*
 
 ### We Can Do Better
 So how can we make QUIC better support real-time applications that can't wait multiple round trips?
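Since this revision keeps circling congestion control, here is roughly what swapping in BBR looks like with quinn, sketched against its API as I understand it (around 0.10/0.11). Treat the exact method and type names as assumptions and check the docs for your version.

```rust
use std::sync::Arc;
use quinn::{congestion::BbrConfig, TransportConfig};

// Build a TransportConfig that replaces the default congestion controller
// with BBR. This is a sketch, not a drop-in recipe for every quinn version.
fn bbr_transport() -> TransportConfig {
    let mut transport = TransportConfig::default();
    transport.congestion_controller_factory(Arc::new(BbrConfig::default()));
    // max_ack_delay is a handshake transport parameter (RFC 9000); whether
    // your library exposes it as a knob is version-dependent.
    transport
}
```

You would then hang this off your client or server config (quinn exposes a `transport_config` setter for that), and the same factory hook is where a fully custom controller would plug in.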
From 0ebeb0507d3e52c05bd36f2e90e500c93d950492 Mon Sep 17 00:00:00 2001
From: kixelated
Date: Wed, 5 Mar 2025 19:01:26 -0800
Subject: [PATCH 10/19] Update abusing-quic.md

---
 src/pages/blog/abusing-quic.md | 87 +++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md
index 9f2d7e8..52d5133 100644
--- a/src/pages/blog/abusing-quic.md
+++ b/src/pages/blog/abusing-quic.md
@@ -187,10 +187,15 @@ Okay so we've hacked QUIC datagrams to pieces, but why?
 I was actually inspired to write this blog post because someone joined my (dope) Discord server.
 They asked if they could do all of the above so they would have proper QUIC datagrams.
-Then they could implement their own acknowledgements and retransmissions.
+Only then they could implement their own acknowledgements and retransmissions.
 
-So I asked them... why not use QUIC streams?
-They already provide reliability, ordering, and can be cancelled.
+The use-case is vidya games.
+The most common approach for video games is to process game state at a constant "tick" rate.
+VALORANT, for example, uses a tick rate of [128 Hz](https://playvalorant.com/en-us/news/dev/how-we-got-to-the-best-performing-valorant-servers-since-launch/) meaning each update covers a 7.8ms period.
+It's really not too different from frame rate (my vidya background) but latency is more crucial otherwise nerds get mad.
+
+But why disassemble QUIC only to reassemble parts of it?
+QUIC streams provide reliability, ordering, and can be cancelled.
 What more could you want?
 
 ### What More Could We Want?
@@ -198,17 +203,14 @@ Look, I may be one of the biggest QUIC fanboys, but I've got to admit that QUIC
 It's not designed for small payloads that need to arrive ASAP, like voice calls.
 
 But don't take my wrinkle brain statements as fact.
-Let's dive deeper into the conceptual abyss.
-How does a QUIC library know when a packet is lost?
+Let's dive deeper.
 
-The unfortunate reality is that it doesn't.
-There's no explicit signal (yet?) from routers.
-A QUIC library has to instead use FACTS and LOGIC like Ben Shapiro losing a debate against a debate against a university student.
-Yes I did make a second political joke in a nerd blog about networking, but that shouldn't be a surprise because I use Rust.
-🏳️‍🌈🏳️‍⚧️:usa:
+*How does a QUIC library know when a packet is lost?*
 
-Anyway, QUIC works by making an educated guess that a packet is lost and needs to be retransmitted.
-The RFC outlines an algorithm and some *recommended* behavior that I'll attempt to simplify:
+It doesn't.
+There's no explicit signal from routers (yet?) when a packet is lost.
+A QUIC library has to instead use FACTS and LOGIC to make an educated guess.
+The RFC outlines a *recommended* algorithm that I'll attempt to simplify:
 
 - The sender increments a sequence number for each packet.
 - Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`.
@@ -220,29 +222,53 @@ The RFC outlines a *recommended* algorithm that I'll attempt to simplify:
 
 Skipped that boring wall of text?
 I don't blame you.
-You're just here for the funny (political?) blog and *maaaaybe* learn something along the way.
+You're just here for the funny blog and *maaaaybe* learn something along the way.
 
 I'll help.
 If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit.
-It's particularly bad for the last few packets of a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke (via the PTO timer). +It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke. +"You still alive over there?" -But wait what's an RTT? -I just completely glossed over that acronym and expected Google to hallucination an explanation. -The Round Trip Time, is pretty self-explanatory: it's the time it takes for a packet to complete a circuit to and then from a remote (aka "ping"). -So if you're playing Counter Strike cross-continent, you're already at a disadvantage because it takes 150ms for your packets to register. -Throw QUIC into the mix and some packets will take 300ms-450ms because of the conservative retransmissions. +And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping". +So if you're playing Counter Strike cross-continent with a ping of 150ms, you're already at a disadvantage. +Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions. *cyka bylat* -### We Can Do Better +### We Can Do Better? So how can we make QUIC better support real-time applications that can't wait multiple round trips? +Should we give up and admit that networking peaked with UDP? + +Of course not what a dumb rhetorical question. + +We can use QUIC streams! +A QUIC stream is nothing more than a byte slice. +The stream is arbitrarily split into STREAM frames that consists of an offset and a payload. +The QUIC reviever reassembles these frames in order before flushing to the application, although some QUIC libraries allow flushing out of order. + +We can abuse the fact that a QUIC receiver must be prepared to accept duplicate or redundant STREAM frames. +This can happen naturally if a packet is lost or arrives out of order. +You might see where this is going: nothing can stop us from sending a boatload of packets. -The trick is that a QUIC receiver MUST be prepared to accept duplicate or redundant packets. -This can happen naturally if a packet is reordered or excessively queued over the network. -You might see where this is going: nothing can stop us from abusing this behavior and sending a boatload of packets. +Our QUIC library does not need to wait for a (negative) acknowledgement before retransmitting a stream chunk. +We could just send it again, and again, and again every 50ms. +If it's a duplicate, then QUIC will silently ignore it. + +But there's a pretty major issue with this approach: +**BUFFERBLOAT**. +Surprise! +It turns out that some routers may queue packets for an undisclosed amount of time when overloaded. + +Let's say you retransmit every 50ms and everything works great on your PC. +A user from Brazil or India downloads your application and it initially works great too. +But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. +...well now you're transmitting 10x the data and further aggravating any congestion. +It's a vicious loop and you've basically built your own DDoS agent. + +Either way, sending redundant copies of data is nothing new. +Let's go a step further and embrace QUIC streams. + +### How I Learned to Embrace the Stream -Instead of sitting around doing nothing, our QUIC library could pre-emptively retransmit data even before it's considered lost. 
-Maybe we only enable this above a certain RTT where retransmissions cause unacceptable delay. -But sending redundant copies of data is nothing new; let's go a step further and embrace QUIC streams. At the end of the day, a QUIC STREAM frame is a byte offset and payload. Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit 20 bytes of deltas as STREAM 230-250. @@ -272,10 +298,7 @@ Forking a library feels *so dirty* but it magically works. Okay it's not the same as the game dev solution; it's actually better because of **congestion control**. And once again, you do ~need~ want congestion control. -Otherwise, retransmitting data can quickly balloon out of control. -Congestion can cause bufferbloat, which is when routers queue packets for an unknown amount of time (potentially for seconds). -Surprise! -It turns out that a router doesn't have to drop a packet when overloaded, but instead it can queue it in RAM. + Let's say you retransmit every 30ms and everything works great on your PC. A user from Brazil or India downloads your application and it initially works great too. @@ -301,9 +324,7 @@ Repeat as needed; it's that easy! I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control. Packets are queued locally so they can be cancelled instantaneously. -Otherwise they would be queued on some intermediate router (ex. for 500ms). - -## Application Limited +Otherwise they would be queued on some intermediate router (ex. for 500ms). ## Hack the Library https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 From c9280f23df37ab2a5cd55f1738d25ca9ea65a9dd Mon Sep 17 00:00:00 2001 From: kixelated Date: Thu, 6 Mar 2025 09:12:21 -0800 Subject: [PATCH 11/19] Update abusing-quic.md --- src/pages/blog/abusing-quic.md | 36 ++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md index 52d5133..cb03704 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -187,16 +187,20 @@ Okay so we've hacked QUIC datagrams to pieces, but why? I was actually inspired to write this blog post because someone joined my (dope) Discord server. They asked if they could do all of the above so they would have proper QUIC datagrams. -Only then they could implement their own acknowledgements and retransmissions. The use-game is vidya games. The most common approach for video games is to process game state at a constant "tick" rate. VALORANT, for example, uses a tick rate of [128 Hz](https://playvalorant.com/en-us/news/dev/how-we-got-to-the-best-performing-valorant-servers-since-launch/) meaning each update covers a 7.8ms period. It's really not too difficult from frame rate (my vidya background) but latency is more crucial otherwise nerds get mad. -But why disassemble QUIC only to reassemble parts of it? -QUIC srreams provide reliability, ordering, and can be cancelled. -What more could you want? +So the idea is to transmit each game tick as a QUIC datagram. +However, that would involve transmitting a lot of redundant information, as two game ticks may be very similar to each other. +So the idea is to implement custom acknowledgements and (re)transmit only the unacknowledged deltas. + +If you've had the pleasure of implementing QUIC before, this might sound very similar to how QUIC streams work internally. 
+In fact, this line of thinking is what lead me to ditch RTP over QUIC (datagrams) and embrace Media over QUIC (streams). +So why disassemble QUIC only to reassemble parts of it? +If we used QUIC streams instead, what more could you want? ### What More Could We Want? Look, I may be one of the biggest QUIC fanboys, but I've got to admit that QUIC is pretty poor for real-time latency. @@ -243,16 +247,33 @@ Of course not what a dumb rhetorical question. We can use QUIC streams! A QUIC stream is nothing more than a byte slice. The stream is arbitrarily split into STREAM frames that consists of an offset and a payload. -The QUIC reviever reassembles these frames in order before flushing to the application, although some QUIC libraries allow flushing out of order. +The QUIC reviever reassembles these frames in order before flushing to the application. +Some QUIC libraries even allow the application to read stream chunks out of order. + +How do we use QUIC streams for vidya games? +Let's suppose we start with a base game state of 1000 bytes and each tick there's an update of 100 bytes. +We make a new stream, serialize the base game state, and constantly append each update. +QUIC will ensure that the update and deltas arrive in the intended order so it's super easy to parse. + +But not so fast, there's a **huge** issue. +We just implemented head-of-line blocking and our protocol is suddenly no better than TCP! +I was promised that QUIC was supposed to fix this... + + We can abuse the fact that a QUIC receiver must be prepared to accept duplicate or redundant STREAM frames. This can happen naturally if a packet is lost or arrives out of order. You might see where this is going: nothing can stop us from sending a boatload of packets. + + + Our QUIC library does not need to wait for a (negative) acknowledgement before retransmitting a stream chunk. We could just send it again, and again, and again every 50ms. If it's a duplicate, then QUIC will silently ignore it. +## A Problem + But there's a pretty major issue with this approach: **BUFFERBLOAT**. Surprise! @@ -261,9 +282,12 @@ It turns out that some routers may queue packets for an undisclosed amount of ti Let's say you retransmit every 50ms and everything works great on your PC. A user from Brazil or India downloads your application and it initially works great too. But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. -...well now you're transmitting 10x the data and further aggravating any congestion. +...well now you're transmitting 10x the data, potentially aggravating any congestion and preventing recovery. It's a vicious loop and you've basically built your own DDoS agent. +For the distributed engineers amogus, this is the networking equivalent of an F5 refresh storm. + + Either way, sending redundant copies of data is nothing new. Let's go a step further and embrace QUIC streams. 
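Before the next revision of the post, here is a minimal sketch of that "snapshot plus appended deltas on one stream" idea in Rust. The kind byte and length prefix are my own framing, and a `Vec<u8>` stands in for the QUIC send stream; none of this is mandated by QUIC or Media over QUIC.

```rust
use std::io::{self, Write};

// Frame a snapshot followed by per-tick deltas onto any ordered byte sink.
// In practice the sink would be a QUIC send stream; io::Write keeps it generic.
fn write_frame(out: &mut impl Write, kind: u8, payload: &[u8]) -> io::Result<()> {
    out.write_all(&[kind])?; // 0 = snapshot, 1 = delta (my convention)
    out.write_all(&(payload.len() as u32).to_be_bytes())?;
    out.write_all(payload)
}

fn main() -> io::Result<()> {
    let mut stream = Vec::new(); // stand-in for a QUIC stream

    // One ~1000 byte snapshot, then a 100 byte delta per tick.
    write_frame(&mut stream, 0, &vec![0u8; 1000])?;
    for tick in 0..3u8 {
        write_frame(&mut stream, 1, &vec![tick; 100])?;
    }

    // QUIC delivers the bytes in order, so the receiver just reads
    // kind + length + payload until the stream ends.
    println!("queued {} bytes on the stream", stream.len());
    Ok(())
}
```

The appeal is that ordering and retransmission are QUIC's problem; the cost, as the text above points out, is head-of-line blocking on that one stream.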
From 3a271e6485bee1b9334aa4a278f7b2212cf927bf Mon Sep 17 00:00:00 2001 From: kixelated Date: Thu, 6 Mar 2025 10:02:08 -0800 Subject: [PATCH 12/19] Update abusing-quic.md --- src/pages/blog/abusing-quic.md | 88 ++++++++++++++++++++++++++++++++++ 1 file changed, 88 insertions(+) diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic.md index cb03704..c873e41 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic.md @@ -181,6 +181,94 @@ Tweak a few lines of code and boop, you're sending a proper UDP packet for each I'm not sure why you would, because it can only worsen performance, but I'm here (to pretend) not to judge. Your brain used to be smooth but now it's wrinkly af 🧠🔥. +## Real-time Streams +Look, I may be one of the biggest QUIC fanboys on the planet, but I've got to admit that QUIC streams are pretty poor for real-time latency. +They're designed for not designed for bulk delivery, not small payloads that need to arrive ASAP like voice calls. +It's the reason why the dinguses reach for datagrams. + +But don't take my wrinkle brain statements as fact. +Let's dive deeper and FIX IT. + +### Detecting Loss +``` +| |i +|| |_ +``` + +*How does a QUIC library know when a packet is lost?* + +It doesn't. +There's no explicit signal from routers (yet?) when a packet is lost. +A QUIC library has to instead use FACTS and LOGIC to make an educated guess. +The RFC outlines a *recommended* algorithm that I'll attempt to simplify: + +- The sender increments a sequence number for each packet. +- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`. +- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer. +- After finally receiving an ACK, the sender *may* decide that a packet was lost if: + - 3 newer sequences were ACKed. + - or a multiple of the RTT has elapsed. +- As the congestion controller allows, retransmit any lost packets and repeat. + +Skipped that boring, "simplified" wall of text? +I don't blame you. +You're just here for the funny blog and *maaaaybe* learn something along the way. + +I'll help. +If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit. +It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke. +"You still alive over there?". +The tail of our stream will take longer (on average) to arrive unless there's other data in flight to perform this poking. + +And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping". +So if you're playing Counter Strike cross-continent with a ping of 150ms, you're already at a disadvantage. +Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions. +*cyka bylat* + + +### Head-of-line Blocking +We're not done yet. +QUIC streams are also poor for real-time because they introduce head-of-line blocking. + +Let's suppose we want to stream real-time chat over QUIC. +But we're super latency sensitive, like it's a bad rash, and need the latest sentence as soon as possible. +We can't settle for random words; we need the full thing in order baby. +The itch is absolutely unbearable and we're okay being a little bit wasteful. + +If the broadcaster types "hello" followed shortly by "world", we have a few options. 
+Pop quiz, which approach is subjectively the best: + +Option A: Create a stream, write the word "hello", then later write "world". +Option B: Create a stream and write "0hello". Later, create another stream and write "5world". The number at the start is the offset. +Option C: Create a stream and write "hello". Later, create another stream and write "helloworld". +Option D: Abuse QUIC (no spoilers) + +If you answered D then you're correct. +Let's use a red pen and explain why the other students are failing the exam. +No cushy software engineering gig for you. + +#### Option A +*Create a stream, write the word "hello", then later write "world".* + +This is classic head-of-line blocking. If the packet containing "hello" gets lost over the network, then we can't actually use the "world" message if it arrives first. +But that's okay in this scenario because of my arbitrary rules + +The real problem is that when the "hello" packet is lost, it won't arrive for *at least* an RTT after "world" because of the affirmationed retransmission logic. +That's no good. + +#### Option B +*Create a stream and write "0hello". Later, create another stream and write "5world". The number at the start is the offset.* + +I didn't explain how multiple streams work because a bad teacher blames their students. +And I wanted to blame you. + + +QUIC will retransmit any unacknowledged fragments of a stream. +But like I said above, only when a packet is considered lost. +But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. +For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. + +### BBR ## Improper QUIC Streams Okay so we've hacked QUIC datagrams to pieces, but why? From 321e70cc216d1af6935e007351971da1dfca08c6 Mon Sep 17 00:00:00 2001 From: Luke Curley Date: Thu, 6 Mar 2025 13:01:41 -0800 Subject: [PATCH 13/19] Split into three. --- ...sing-quic.md => abusing-quic-datagrams.md} | 23 +- src/pages/blog/abusing-quic-streams.md | 247 ++++++++++++++++++ src/pages/blog/using-quic-streams.md | 165 ++++++++++++ 3 files changed, 425 insertions(+), 10 deletions(-) rename src/pages/blog/{abusing-quic.md => abusing-quic-datagrams.md} (98%) create mode 100644 src/pages/blog/abusing-quic-streams.md create mode 100644 src/pages/blog/using-quic-streams.md diff --git a/src/pages/blog/abusing-quic.md b/src/pages/blog/abusing-quic-datagrams.md similarity index 98% rename from src/pages/blog/abusing-quic.md rename to src/pages/blog/abusing-quic-datagrams.md index c873e41..0a6b7e8 100644 --- a/src/pages/blog/abusing-quic.md +++ b/src/pages/blog/abusing-quic-datagrams.md @@ -1,4 +1,7 @@ -# Abusing QUIC +# Abusing QUIC Datagrams +This is the first part of our QUIC hackathon. +Read [Abusing QUIC Streams](/blog/abusing-quic-streams) if you like ordered data like a normal human being. + We're going to hack QUIC. "Hack" like a ROM-hack, not "hack" like a prison sentence. Unless Nintendo is involved. @@ -10,7 +13,7 @@ We then ship our modified library as part of our application and nobody will sus But before we continue, a disclaimer: -## Dingus Territory +## Dingus Territory QUIC was designed with developers like *you* in mind. Yes *you*, wearing your favorite "I 💕 node_modules" T-shirt about to rewrite your website again using the Next-est framework released literally seconds ago. 
@@ -181,7 +184,7 @@ Tweak a few lines of code and boop, you're sending a proper UDP packet for each I'm not sure why you would, because it can only worsen performance, but I'm here (to pretend) not to judge. Your brain used to be smooth but now it's wrinkly af 🧠🔥. -## Real-time Streams +## Real-time Streams Look, I may be one of the biggest QUIC fanboys on the planet, but I've got to admit that QUIC streams are pretty poor for real-time latency. They're designed for not designed for bulk delivery, not small payloads that need to arrive ASAP like voice calls. It's the reason why the dinguses reach for datagrams. @@ -208,7 +211,7 @@ The RFC outlines a *recommended* algorithm that I'll attempt to simplify: - After finally receiving an ACK, the sender *may* decide that a packet was lost if: - 3 newer sequences were ACKed. - or a multiple of the RTT has elapsed. -- As the congestion controller allows, retransmit any lost packets and repeat. +- As the congestion controller allows, retransmit any lost packets and repeat. Skipped that boring, "simplified" wall of text? I don't blame you. @@ -226,7 +229,7 @@ Throw QUIC into the mix and some packets will take 300ms to 450ms of conservativ *cyka bylat* -### Head-of-line Blocking +### Head-of-line Blocking We're not done yet. QUIC streams are also poor for real-time because they introduce head-of-line blocking. @@ -251,7 +254,7 @@ No cushy software engineering gig for you. *Create a stream, write the word "hello", then later write "world".* This is classic head-of-line blocking. If the packet containing "hello" gets lost over the network, then we can't actually use the "world" message if it arrives first. -But that's okay in this scenario because of my arbitrary rules +But that's okay in this scenario because of my arbitrary rules The real problem is that when the "hello" packet is lost, it won't arrive for *at least* an RTT after "world" because of the affirmationed retransmission logic. That's no good. @@ -270,7 +273,7 @@ For you library maintainers out there, consider adding this as a `stream.retrans ### BBR -## Improper QUIC Streams +## Improper QUIC Streams Okay so we've hacked QUIC datagrams to pieces, but why? I was actually inspired to write this blog post because someone joined my (dope) Discord server. @@ -310,7 +313,7 @@ The RFC outlines a *recommended* algorithm that I'll attempt to simplify: - After finally receiving an ACK, the sender *may* decide that a packet was lost if: - 3 newer sequences were ACKed. - or a multiple of the RTT has elapsed. -- As the congestion controller allows, retransmit any lost packets and repeat. +- As the congestion controller allows, retransmit any lost packets and repeat. Skipped that boring wall of text? I don't blame you. @@ -379,7 +382,7 @@ For the distributed engineers amogus, this is the networking equivalent of an F5 Either way, sending redundant copies of data is nothing new. Let's go a step further and embrace QUIC streams. -### How I Learned to Embrace the Stream +### How I Learned to Embrace the Stream At the end of the day, a QUIC STREAM frame is a byte offset and payload. @@ -436,7 +439,7 @@ Repeat as needed; it's that easy! I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control. Packets are queued locally so they can be cancelled instantaneously. -Otherwise they would be queued on some intermediate router (ex. for 500ms). +Otherwise they would be queued on some intermediate router (ex. for 500ms). 
## Hack the Library https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 diff --git a/src/pages/blog/abusing-quic-streams.md b/src/pages/blog/abusing-quic-streams.md new file mode 100644 index 0000000..cd6d0bd --- /dev/null +++ b/src/pages/blog/abusing-quic-streams.md @@ -0,0 +1,247 @@ +# Abusing QUIC Streams +This is the second part of our QUIC hackathon. +Read [Abusing QUIC Datagrams](/blog/abusing-quic-datagrams) if byte streams confuse you. + +Look, I may be one of the biggest QUIC fanboys on the planet, but I've got to admit that QUIC streams are pretty poor for real-time latency. +They're designed for not designed for bulk delivery, not small payloads that need to arrive ASAP like voice calls. +It's the reason why the dinguses reach for datagrams. + +But don't take my wrinkle brain statements as fact. +Let's dive deeper and FIX IT. + +## Detecting Loss +``` +| |i +|| |_ +``` + +*How does a QUIC library know when a packet is lost?* + +It doesn't. +There's no explicit signal from routers (yet?) when a packet is lost. +A QUIC library has to instead use FACTS and LOGIC to make an educated guess. +The RFC outlines a *recommended* algorithm that I'll attempt to simplify: + +- The sender increments a sequence number for each packet. +- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`. +- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer. +- After finally receiving an ACK, the sender *may* decide that a packet was lost if: + - 3 newer sequences were ACKed. + - or a multiple of the RTT has elapsed. +- As the congestion controller allows, retransmit any lost packets and repeat. + +Skipped that boring, "simplified" wall of text? +I don't blame you. +You're just here for the funny blog and *maaaaybe* learn something along the way. + +I'll help. +If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit. +It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke. +"You still alive over there?". +The tail of our stream will take longer (on average) to arrive unless there's other data in flight to perform this poking. + +And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping". +So if you're playing Counter Strike cross-continent with a ping of 150ms, you're already at a disadvantage. +Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions. +*cyka bylat* + + +## Head-of-line Blocking +We're not done with packet loss yet, but let's put an `!Unpin` in it. + +QUIC streams are also poor for real-time because they introduce head-of-line blocking. + +Let's suppose we want to stream real-time chat over QUIC. +But we're super latency sensitive, like it's a bad rash, and need the latest sentence as soon as possible. +We can't settle for random words; we need the full thing in order baby. +The itch is absolutely unbearable and we're okay being a little bit wasteful. + +If the broadcaster types "hello" followed shortly by "world", we have a few options. +Pop quiz, which approach is subjectively the best: + +Option A: Create a stream, write the word "hello", then later write "world". +Option B: Create a stream and write "0hello". 
Later, create another stream and write "5world". The number at the start is the offset.
+Option C: Create a stream and write "hello". Later, create another stream and write "helloworld".
+Option D: Abuse QUIC (no spoilers)
+
+If you answered D then you're correct.
+Let's use a red pen and explain why the other students are failing the exam.
+No cushy software engineering gig for you.
+
+### Option A
+*Create a stream, write the word "hello", then later write "world".*
+
+This is classic head-of-line blocking. If the packet containing "hello" gets lost over the network, then we can't actually use the "world" message if it arrives first.
+But that's okay in this scenario because of my arbitrary rules.
+
+The real problem is that when the "hello" packet is lost, it won't arrive for *at least* an RTT after "world" because of the aforementioned retransmission logic.
+That's no good.
+
+### Option B
+*Create a stream and write "0hello". Later, create another stream and write "5world". The number at the start is the offset.*
+
+I didn't explain how multiple streams work because a bad teacher blames their students.
+And I wanted to blame you.
+
+QUIC streams share a connection but are otherwise independent.
+You can create as many streams *as the remote peer allows* with no\* overhead.
+In fact, if you were considering retransmitting QUIC datagrams, you could totally use a QUIC stream per datagram instead.
+
+In this example, both "hello" and "world" will be (re)transmitted independently over separate streams and it's up to the receiver to reassemble them.
+That's why we had to include the offset, otherwise the receiver would have no idea that "world" comes after "hello" (duh).
+At some point we would also need to include a "sentence ID" if we wanted to support multiple sentences.
+The receiver receives "helloworld" and voila, our itch is scratched.
+
+But this approach sucks.
+Major suckage.
+
+But don't feel bad if this seemed like a good idea.
+You literally just reimplemented QUIC streams and this is identical to **Option A**.
+We did all of this extra work for nothing.
+
+Despite the fact that they're using separate streams, "hello" still won't be retransmitted until after "world" is acknowledged.
+The fundamental problem is that QUIC retransmissions occur at the *packet level*.
+We need to dive deeper, not just create an independent stream.
+
+### Option C
+*Create a stream and write "hello". Later, create another stream and write "helloworld".*
+
+Now we're getting somewhere.
+We're finally wasting bytes.
+
+This approach is better than Option A/B (for real-time latency) because we removed a dependency.
+If "hello" is lost, well it doesn't matter because "helloworld" contains a redundant copy.
+
+But this approach is still not ideal.
+What if "hello" is acknowledged *before* we write "world"?
+We don't want to waste bytes unnecessarily and shouldn't retransmit "hello".
+Most QUIC libraries don't expose these details to the application.
+And even if they did, we would have to implement **Option B** and send "5world".
+And what if there's a gap, like "world" is acknowledged but not "hello" or a trailing "!"?
+
+We're delving back into *reimplementing QUIC streams* territory.
+The wheel has been invented already.
+If only there was a way to hack a QUIC library to do what we want...
+
+### Option D
+*Abuse QUIC (no spoilers)*
+
+A QUIC stream is broken into STREAM frames that consist of an offset and payload.
+The QUIC sender keeps track of which STREAM frames were stuffed into which UDP packets so it knows what to retransmit if a packet is lost. +The QUIC receiver reassembles these STREAM frames based on the offset then flushes it to the application. + +The magic trick depends on an important fact: +A QUIC receiver must be prepared to accept duplicate, overlapping, or otherwise redundant STREAM frames. + +See, there's no requirement that a STREAM frame is retransmitted verbatim. +If STREAM 10-20 is lost, we could retransmit it as STREAM 10-15, STREAM 17-20, and STREAM 15-17 if we wanted to. +This is actually super useful because we can cram a UDP packet full of miscallenous STREAM frames without worrying about overrunning the MTU. + +Grab your favorite sleeveless shirt because we are *abusive*. + +We're going to use a single stream like **Option A**. +Normally, "hello" is sent as STREAM 0-5 and "world" is sent as STREAM 5-10. +However, we can modify our QUIC library to actually transmit STREAM 0-10 instead, effectively sending "helloworld" in one packet. +More generally, we can retransmit any unacknowledged fragments of a stream. + +The easiest way to implement this is to have the QUIC library *assume* the in-flight fragments of a stream are lost and need to be retransmitted. +This won't impact congestion control because we don't consider the packets as lost... just some of their contents. +For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. + +## Revisiting Retransmissions +Remember the part where I said: + +> We're not done with packet loss yet, but let's put an `!Unpin` in it. + +We're back baby. +That's because as covered in the previous section, it's totally legal to retransmit a stream chunk without waiting for an acknowledgement. +There's nothing actually stopping us flooding the network with duplicate copies. +We could just send it again, and again, and again, and again, and again every 50ms. +If it's a duplicate, then QUIC will silently ignore it. + + + + + + + +## A Problem + +But there's a pretty major issue with this approach: +**BUFFERBLOAT**. +Surprise! +It turns out that some routers may queue packets for an undisclosed amount of time when overloaded. + +Let's say you retransmit every 50ms and everything works great on your PC. +A user from Brazil or India downloads your application and it initially works great too. +But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. +...well now you're transmitting 10x the data, potentially aggravating any congestion and preventing recovery. +It's a vicious loop and you've basically built your own DDoS agent. + +For the distributed engineers amogus, this is the networking equivalent of an F5 refresh storm. + + +Either way, sending redundant copies of data is nothing new. +Let's go a step further and embrace QUIC streams. + +### How I Learned to Embrace the Stream + + +At the end of the day, a QUIC STREAM frame is a byte offset and payload. +Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit 20 bytes of deltas as STREAM 230-250. +If the original STREAM frame is lost, well even if we receive those 20 bytes, we can't actually decode them and suffer from HEAD-OF-LINE blocking. + +My game dev friend thinks this is unacceptable and made his own ACK-based algorithm on top of QUIC datagrams instead. 
+The sender ticks every 30ms and sends a delta from the last acknowledged state, even if that data might be in-flight already. +Pretty cool right? +Why doesn't QUIC do this? + +It does. + +(mind blown) + +QUIC will retransmit any unacknowledged fragments of a stream. +But like I said above, only when a packet is considered lost. +But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. +For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. + +So to continue our example above, we can modify QUIC to send byte offsets 0-250 instead of just 230-250. +And now we can accomplish the exact* same behavior as the game dev but without custom acknowledgements, retransmissions, deltas, and reassembly buffers. + +Forking a library feels *so dirty* but it magically works. + + +### Some Caveats +Okay it's not the same as the game dev solution; it's actually better because of **congestion control**. +And once again, you do ~need~ want congestion control. + + + +Let's say you retransmit every 30ms and everything works great on your PC. +A user from Brazil or India downloads your application and it initially works great too. +But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. +...well now you're transmitting 15x the data and further aggravating any congestion. +It's a vicious loop and you've basically built your own DDoS agent. + +But QUIC can avoid this issue because retransmissions are gated by congestion control. +Even when a packet is considered lost, or my hypothetical `stream.retransmit()` is called, a QUIC library won't immediately retransmit. +Instead, retransmissions are queued up until the congestion controller deems it appropriate. +Note that a late acknowledgement or stream reset will cancel a queued retransmission (unless your QUIC library sucks). + +Why? +If the network is fully saturated, you need to send fewer packets to drain any network queues, not more. +Even ignoring bufferbloat, networks are finite resources and blind retransmissions are the easiest way to join the UDP Wall of Shame. +In this instance,.the QUIC greybeards will stop you from doing bad thing. +The children yearn for the mines, but the adults yearn for child protection laws. + +Under extreme congestion, or when temporarily offline, the backlog of queued data will keep growing and growing. +Once the size of queued delta updates grows larger than the size of a new snapshot, cut your losses and start over. +Reset the stream with deltas to prevent new transmissions and create a new stream with the snapshot. +Repeat as needed; it's that easy! + +I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control. +Packets are queued locally so they can be cancelled instantaneously. +Otherwise they would be queued on some intermediate router (ex. for 500ms). + +## Hack the Library +https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 diff --git a/src/pages/blog/using-quic-streams.md b/src/pages/blog/using-quic-streams.md new file mode 100644 index 0000000..bf56ebe --- /dev/null +++ b/src/pages/blog/using-quic-streams.md @@ -0,0 +1,165 @@ +# Using QUIC Streams +I was actually inspired to write this blog post because someone joined my (dope) Discord server. 
+They asked if they could do all of the above so they would have proper QUIC datagrams.
+
+The use-case is vidya games.
+The most common approach for video games is to process game state at a constant "tick" rate.
+VALORANT, for example, uses a tick rate of [128 Hz](https://playvalorant.com/en-us/news/dev/how-we-got-to-the-best-performing-valorant-servers-since-launch/) meaning each update covers a 7.8ms period.
+It's really not too different from frame rate (my vidya background) but latency is more crucial otherwise nerds get mad.
+
+So the idea is to transmit each game tick as a QUIC datagram.
+However, that would involve transmitting a lot of redundant information, as two game ticks may be very similar to each other.
+So the idea is to implement custom acknowledgements and (re)transmit only the unacknowledged deltas.
+
+If you've had the pleasure of implementing QUIC before, this might sound very similar to how QUIC streams work internally.
+In fact, this line of thinking is what led me to ditch RTP over QUIC (datagrams) and embrace Media over QUIC (streams).
+So why disassemble QUIC only to reassemble parts of it?
+If we used QUIC streams instead, what more could you want?
+
+### What More Could We Want?
+Look, I may be one of the biggest QUIC fanboys, but I've got to admit that QUIC is pretty poor for real-time latency.
+It's not designed for small payloads that need to arrive ASAP, like voice calls.
+
+But don't take my wrinkle brain statements as fact.
+Let's dive deeper.
+
+*How does a QUIC library know when a packet is lost?*
+
+It doesn't.
+There's no explicit signal from routers (yet?) when a packet is lost.
+A QUIC library has to instead use FACTS and LOGIC to make an educated guess.
+The RFC outlines a *recommended* algorithm that I'll attempt to simplify:
+
+- The sender increments a sequence number for each packet.
+- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`.
+- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer.
+- After finally receiving an ACK, the sender *may* decide that a packet was lost if:
+  - 3 newer sequences were ACKed.
+  - or a multiple of the RTT has elapsed.
+- As the congestion controller allows, retransmit any lost packets and repeat.
+
+Skipped that boring wall of text?
+I don't blame you.
+You're just here for the funny blog and *maaaaybe* learn something along the way.
+
+I'll help.
+If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit.
+It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke.
+"You still alive over there?"
+
+And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping".
+So if you're playing Counter Strike cross-continent with a ping of 150ms, you're already at a disadvantage.
+Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions.
+*cyka bylat*
+
+### We Can Do Better?
+So how can we make QUIC better support real-time applications that can't wait multiple round trips?
+Should we give up and admit that networking peaked with UDP?
+
+Of course not, what a dumb rhetorical question.
+
+We can use QUIC streams!
+A QUIC stream is nothing more than a byte slice.
+The stream is arbitrarily split into STREAM frames that consists of an offset and a payload. +The QUIC reviever reassembles these frames in order before flushing to the application. +Some QUIC libraries even allow the application to read stream chunks out of order. + +How do we use QUIC streams for vidya games? +Let's suppose we start with a base game state of 1000 bytes and each tick there's an update of 100 bytes. +We make a new stream, serialize the base game state, and constantly append each update. +QUIC will ensure that the update and deltas arrive in the intended order so it's super easy to parse. + +But not so fast, there's a **huge** issue. +We just implemented head-of-line blocking and our protocol is suddenly no better than TCP! +I was promised that QUIC was supposed to fix this... + + + +We can abuse the fact that a QUIC receiver must be prepared to accept duplicate or redundant STREAM frames. +This can happen naturally if a packet is lost or arrives out of order. +You might see where this is going: nothing can stop us from sending a boatload of packets. + + + + +Our QUIC library does not need to wait for a (negative) acknowledgement before retransmitting a stream chunk. +We could just send it again, and again, and again every 50ms. +If it's a duplicate, then QUIC will silently ignore it. + +## A Problem + +But there's a pretty major issue with this approach: +**BUFFERBLOAT**. +Surprise! +It turns out that some routers may queue packets for an undisclosed amount of time when overloaded. + +Let's say you retransmit every 50ms and everything works great on your PC. +A user from Brazil or India downloads your application and it initially works great too. +But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. +...well now you're transmitting 10x the data, potentially aggravating any congestion and preventing recovery. +It's a vicious loop and you've basically built your own DDoS agent. + +For the distributed engineers amogus, this is the networking equivalent of an F5 refresh storm. + + +Either way, sending redundant copies of data is nothing new. +Let's go a step further and embrace QUIC streams. + +### How I Learned to Embrace the Stream + + +At the end of the day, a QUIC STREAM frame is a byte offset and payload. +Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit 20 bytes of deltas as STREAM 230-250. +If the original STREAM frame is lost, well even if we receive those 20 bytes, we can't actually decode them and suffer from HEAD-OF-LINE blocking. + +My game dev friend thinks this is unacceptable and made his own ACK-based algorithm on top of QUIC datagrams instead. +The sender ticks every 30ms and sends a delta from the last acknowledged state, even if that data might be in-flight already. +Pretty cool right? +Why doesn't QUIC do this? + +It does. + +(mind blown) + +QUIC will retransmit any unacknowledged fragments of a stream. +But like I said above, only when a packet is considered lost. +But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. +For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. + +So to continue our example above, we can modify QUIC to send byte offsets 0-250 instead of just 230-250. 
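+
+Here's roughly what that bookkeeping looks like. To be clear, this is a made-up sketch for this post: `EagerStream` and its `retransmit()` are stand-ins, not any real library's API. It just tracks what the peer has acknowledged and, every tick, re-sends the entire unacknowledged suffix.
+
+```rust
+// Hypothetical sketch: a sender that re-sends the whole unacknowledged
+// suffix of a stream every tick, instead of waiting for loss detection.
+// `Frame` stands in for a QUIC STREAM frame (offset + payload).
+struct Frame {
+    offset: u64,
+    payload: Vec<u8>,
+}
+
+struct EagerStream {
+    buffer: Vec<u8>, // everything written so far (base state + deltas)
+    acked: u64,      // largest contiguous offset the peer has acknowledged
+}
+
+impl EagerStream {
+    fn write(&mut self, data: &[u8]) {
+        self.buffer.extend_from_slice(data);
+    }
+
+    fn on_ack(&mut self, up_to: u64) {
+        self.acked = self.acked.max(up_to);
+    }
+
+    // Called every tick (e.g. 30ms): assume everything unacked was lost.
+    fn retransmit(&self) -> Option<Frame> {
+        let unacked = &self.buffer[self.acked as usize..];
+        if unacked.is_empty() {
+            return None;
+        }
+        Some(Frame {
+            offset: self.acked,        // e.g. offset 0 instead of 230
+            payload: unacked.to_vec(), // e.g. bytes 0-250 instead of 230-250
+        })
+    }
+}
+
+fn main() {
+    let mut stream = EagerStream { buffer: Vec::new(), acked: 0 };
+    stream.write(&[0u8; 230]); // base game state
+    stream.write(&[1u8; 20]);  // this tick's delta
+
+    // Nothing acknowledged yet, so the whole 0..250 range goes out again.
+    let frame = stream.retransmit().unwrap();
+    assert_eq!(frame.offset, 0);
+    assert_eq!(frame.payload.len(), 250);
+
+    // Once the base is acknowledged, only the delta gets re-sent.
+    stream.on_ack(230);
+    let frame = stream.retransmit().unwrap();
+    assert_eq!(frame.offset, 230);
+    assert_eq!(frame.payload.len(), 20);
+}
+```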
+And now we can accomplish the exact* same behavior as the game dev but without custom acknowledgements, retransmissions, deltas, and reassembly buffers. + +Forking a library feels *so dirty* but it magically works. + + +### Some Caveats +Okay it's not the same as the game dev solution; it's actually better because of **congestion control**. +And once again, you do ~need~ want congestion control. + + + +Let's say you retransmit every 30ms and everything works great on your PC. +A user from Brazil or India downloads your application and it initially works great too. +But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. +...well now you're transmitting 15x the data and further aggravating any congestion. +It's a vicious loop and you've basically built your own DDoS agent. + +But QUIC can avoid this issue because retransmissions are gated by congestion control. +Even when a packet is considered lost, or my hypothetical `stream.retransmit()` is called, a QUIC library won't immediately retransmit. +Instead, retransmissions are queued up until the congestion controller deems it appropriate. +Note that a late acknowledgement or stream reset will cancel a queued retransmission (unless your QUIC library sucks). + +Why? +If the network is fully saturated, you need to send fewer packets to drain any network queues, not more. +Even ignoring bufferbloat, networks are finite resources and blind retransmissions are the easiest way to join the UDP Wall of Shame. +In this instance,.the QUIC greybeards will stop you from doing bad thing. +The children yearn for the mines, but the adults yearn for child protection laws. + +Under extreme congestion, or when temporarily offline, the backlog of queued data will keep growing and growing. +Once the size of queued delta updates grows larger than the size of a new snapshot, cut your losses and start over. +Reset the stream with deltas to prevent new transmissions and create a new stream with the snapshot. +Repeat as needed; it's that easy! + +I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control. +Packets are queued locally so they can be cancelled instantaneously. +Otherwise they would be queued on some intermediate router (ex. for 500ms). From 1c933a74526ccf37e964fbf9464ce05b5ab1e7a0 Mon Sep 17 00:00:00 2001 From: kixelated Date: Mon, 10 Mar 2025 09:34:53 -0700 Subject: [PATCH 14/19] Update abusing-quic-streams.md --- src/pages/blog/abusing-quic-streams.md | 97 ++++++++------------------ 1 file changed, 29 insertions(+), 68 deletions(-) diff --git a/src/pages/blog/abusing-quic-streams.md b/src/pages/blog/abusing-quic-streams.md index cd6d0bd..de3e1d3 100644 --- a/src/pages/blog/abusing-quic-streams.md +++ b/src/pages/blog/abusing-quic-streams.md @@ -133,9 +133,9 @@ The QUIC receiver reassembles these STREAM frames based on the offset then flush The magic trick depends on an important fact: A QUIC receiver must be prepared to accept duplicate, overlapping, or otherwise redundant STREAM frames. -See, there's no requirement that a STREAM frame is retransmitted verbatim. +See, there's no requirement that the *same* STREAM frame is retransmitted. If STREAM 10-20 is lost, we could retransmit it as STREAM 10-15, STREAM 17-20, and STREAM 15-17 if we wanted to. -This is actually super useful because we can cram a UDP packet full of miscallenous STREAM frames without worrying about overrunning the MTU. 
+This is on purpose and super useful because we can cram a UDP packet full of miscallenous STREAM frames without worrying about overrunning the MTU. Grab your favorite sleeveless shirt because we are *abusive*. @@ -155,17 +155,11 @@ Remember the part where I said: We're back baby. That's because as covered in the previous section, it's totally legal to retransmit a stream chunk without waiting for an acknowledgement. -There's nothing actually stopping us flooding the network with duplicate copies. -We could just send it again, and again, and again, and again, and again every 50ms. -If it's a duplicate, then QUIC will silently ignore it. +The specification says don't do it, but there's nothing *actually* stopping us flooding the network with duplicate copies. +If QUIC receives a duplicate stream chunk, it will silently ignore it. - - - - - - -## A Problem +So rather than wait 450ms for a (worst-case) acknowledgement, what if we just... don't? +We could just transmit the same stream chunk again, and again, and again, and again, and again every 50ms. But there's a pretty major issue with this approach: **BUFFERBLOAT**. @@ -179,69 +173,36 @@ But eventually their ISP gets overwhelmed and congestion causes the RTT to (temp It's a vicious loop and you've basically built your own DDoS agent. For the distributed engineers amogus, this is the networking equivalent of an F5 refresh storm. +Blind retransmissions are the easiest way to join the UDP Wall of Shame. -Either way, sending redundant copies of data is nothing new. -Let's go a step further and embrace QUIC streams. - -### How I Learned to Embrace the Stream - - -At the end of the day, a QUIC STREAM frame is a byte offset and payload. -Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit 20 bytes of deltas as STREAM 230-250. -If the original STREAM frame is lost, well even if we receive those 20 bytes, we can't actually decode them and suffer from HEAD-OF-LINE blocking. - -My game dev friend thinks this is unacceptable and made his own ACK-based algorithm on top of QUIC datagrams instead. -The sender ticks every 30ms and sends a delta from the last acknowledged state, even if that data might be in-flight already. -Pretty cool right? -Why doesn't QUIC do this? - -It does. - -(mind blown) - -QUIC will retransmit any unacknowledged fragments of a stream. -But like I said above, only when a packet is considered lost. -But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. -For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. - -So to continue our example above, we can modify QUIC to send byte offsets 0-250 instead of just 230-250. -And now we can accomplish the exact* same behavior as the game dev but without custom acknowledgements, retransmissions, deltas, and reassembly buffers. - -Forking a library feels *so dirty* but it magically works. - - -### Some Caveats -Okay it's not the same as the game dev solution; it's actually better because of **congestion control**. -And once again, you do ~need~ want congestion control. +### Congestion Control to the Rescue +Actually I just lied. +This infinite loop of pain, suffering, and bloat is what would happen if you retransmitted using UDP datagrams. +But not with QUIC streams (and datagrams). - -Let's say you retransmit every 30ms and everything works great on your PC. 
-A user from Brazil or India downloads your application and it initially works great too.
-But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms.
-...well now you're transmitting 15x the data and further aggravating any congestion.
-It's a vicious loop and you've basically built your own DDoS agent.
-
-But QUIC can avoid this issue because retransmissions are gated by congestion control.
+Retransmissions are gated by congestion control.
 Even when a packet is considered lost, or my hypothetical `stream.retransmit()` is called, a QUIC library won't immediately retransmit.
-Instead, retransmissions are queued up until the congestion controller deems it appropriate.
-Note that a late acknowledgement or stream reset will cancel a queued retransmission (unless your QUIC library sucks).
+Instead, stream retransmissions are queued up until the congestion controller allows more packets to be sent.
 
-Why?
-If the network is fully saturated, you need to send fewer packets to drain any network queues, not more.
-Even ignoring bufferbloat, networks are finite resources and blind retransmissions are the easiest way to join the UDP Wall of Shame.
-In this instance,.the QUIC greybeards will stop you from doing bad thing.
+The QUIC greybeards will stop you from doing bad thing.
 The children yearn for the mines, but the adults yearn for child protection laws.
 
-Under extreme congestion, or when temporarily offline, the backlog of queued data will keep growing and growing.
-Once the size of queued delta updates grows larger than the size of a new snapshot, cut your losses and start over.
-Reset the stream with deltas to prevent new transmissions and create a new stream with the snapshot.
-Repeat as needed; it's that easy!
+But now we have a new problem.
+Our precious data is getting queued locally and latency starts to climb.
+If we do nothing, then nothing gets dropped and oh no, we just reimplemented TCP.
+
+At some point we have to take the L and wipe the buffer clean.
+QUIC lets you do this by resetting a stream, notifying the receiver and cancelling any queued (re)transmissions.
+We can then make a new stream (for free*) and start over with a new base.
+For those more media-inclined, this would mean resetting the current GoP and encoding a new I-frame on a new stream.
 
-I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control.
-Packets are queued locally so they can be cancelled instantaneously.
-Otherwise they would be queued on some intermediate router (ex. for 500ms).
+To recap:
+- Using UDP directly: our data gets queued and arbitrarily dropped by some router in the void.
+- Using QUIC datagrams: our data gets dropped locally (congestion control) and arbitrarily dropped by the void, although less often.
+- Using QUIC streams: our data gets queued locally (congestion control) and explicitly dropped when we choose.
 
-## Hack the Library
-https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211
+I can't believe there aren't more QUIC stream fanboys.
+It's a great abstraction because *you do not understand networking*, nor should your application care how stuff gets split into IP packets.
+Your application should deal with queues that get drained at an unpredictable rate.
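+
+If the reset-and-resnapshot dance sounds abstract, here's a minimal sketch of the rule. `MockStream` is a stand-in I made up for illustration; your QUIC library (quinn et al.) has its own write/reset calls and its own way of observing how much data is still queued.
+
+```rust
+// Minimal sketch of "cut your losses": once the backlog of queued deltas
+// would cost more than a fresh snapshot, reset the stream and start over.
+struct MockStream {
+    id: u64,
+    queued: usize, // bytes written but not yet sent (congestion control backlog)
+}
+
+impl MockStream {
+    fn write(&mut self, data: &[u8]) {
+        self.queued += data.len();
+    }
+    fn reset(self) {
+        println!("stream {} reset, {} queued bytes cancelled", self.id, self.queued);
+    }
+}
+
+struct StateSync {
+    stream: MockStream,
+    next_id: u64,
+}
+
+impl StateSync {
+    // Called once per tick with the newest delta and a full snapshot.
+    fn send_tick(&mut self, delta: &[u8], snapshot: &[u8]) {
+        if self.stream.queued > snapshot.len() {
+            // Starting over is now cheaper than draining the backlog:
+            // reset the old stream and send a fresh base on a new one.
+            let old = std::mem::replace(
+                &mut self.stream,
+                MockStream { id: self.next_id, queued: 0 },
+            );
+            self.next_id += 1;
+            old.reset();
+            self.stream.write(snapshot); // the new "I-frame"
+        } else {
+            self.stream.write(delta);
+        }
+    }
+}
+
+fn main() {
+    let mut sync = StateSync {
+        stream: MockStream { id: 0, queued: 0 },
+        next_id: 1,
+    };
+    let snapshot = vec![0u8; 1000];
+    // Pretend the network is stalled so nothing drains: deltas pile up
+    // until a reset + snapshot becomes the cheaper option.
+    for _ in 0..20 {
+        sync.send_tick(&[0u8; 100], &snapshot);
+    }
+}
+```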
From e1aab0b67560f546e6c1efc79cd9291f51211b88 Mon Sep 17 00:00:00 2001 From: kixelated Date: Wed, 2 Apr 2025 09:36:16 -0700 Subject: [PATCH 15/19] Update abusing-quic-datagrams.md --- src/pages/blog/abusing-quic-datagrams.md | 97 ++++++++++++++---------- 1 file changed, 56 insertions(+), 41 deletions(-) diff --git a/src/pages/blog/abusing-quic-datagrams.md b/src/pages/blog/abusing-quic-datagrams.md index 0a6b7e8..7dd4a65 100644 --- a/src/pages/blog/abusing-quic-datagrams.md +++ b/src/pages/blog/abusing-quic-datagrams.md @@ -9,37 +9,46 @@ Unless Nintendo is involved. We're not trying to be malicious, but rather unlock new functionality while remaining compliant with the specification. We can do this easily because unlike TCP, QUIC is implemented in *userspace*. That means we can take a QUIC library, tweak a few lines of code, and unlock new functionality that the greybeards *attempted* to keep from us. -We then ship our modified library as part of our application and nobody will suspect a thing. +We ship our modified library as part of our application and nobody will suspect a thing. But before we continue, a disclaimer: ## Dingus Territory QUIC was designed with developers like *you* in mind. -Yes *you*, wearing your favorite "I 💕 node_modules" T-shirt about to rewrite your website again using the Next-est framework released literally seconds ago. +Yes *you*, wearing your favorite "I 💕 node_modules" T-shirt. +I know you're busy, about to rewrite your website (again) using the Next-est framework released literally seconds ago, but hear me out: *You are a dingus*. -The greybeards that designed QUIC, the QUIC libraries, and the related web APIs do not respect you. +The greybeards that designed QUIC, the QUIC libraries, and the corresponding web APIs do not respect you. They think that given a shotgun, the first thing you're going to do is blow your own foot off. And they're right of course. -That's why there is no UDP socket Web API. -WebRTC data channels claim to have "unreliable" messages but don't even get me started. -Heck, there's not even a native TCP API; WebSockets are a close approximation but force a HTTP handshake and additional framing for *reasons*. +At first glace, UDP datagrams appear to be quantum: a superposition of delivered and lost. +We hear somebody say "5% packet loss" and our monkey brain visualizes a coin flip or dice roll. +But the reality is that congestion causes (most) packet loss at *our* level in the network stack. +Sending more packets does NOT let you reroll the dice. +Instead it compounds the packet loss, impacting other flows on the network and Google's ability to make money. -QUIC doesn't change that mentality. +If that was a mind-blowing revelation... that's why there is no UDP API on the web. +Fresh out of a coding bootcamp and you already managed to DDoS an ISP with your poorly written website, huh. +And before you "umh actually" me, WebRTC data channels (SCTP) are congestion controlled and quite flawed, hence why QUIC is a big deal. + +QUIC doesn't change this mentality either. Short of wasting a zero-day exploit, we have to use the browser's built-in QUIC library via the WebTransport API. Browser vendors like Google don't want *you*, the `node_modules` enjoyer, doing any of the stuff mentioned in this article on *their* web clients. But that's not going to stop us from modifying the server or the clients we control (ex. native app). However, in doing so, you must constantly evaluate if you are the *dingus* in the exchange. 
-QUIC famously does not let you disable encryption because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more bitcoins. -So many people believe that encryption is slow, but once you actually benchmark AES-GCM, it turns out that so many people are *the dingus*. +QUIC infamously does not let you disable encryption because otherwise *some dingus* would disable it because they think it make computer go slow (it doesn't) and oh no now North Korea has some more bitcoins. +Is this a case of the nanny state preventing me from using lead pipes? Yes. +Is AES-GCM slow and worth disabling? Absolutely not, profile your application and you'll find everything else, including sending UDP packets, takes significantly more CPU cycles. -Yes, there are legitimate use-cases where a full TLS handshake is not worth it. -But when the safe API is the best API 99% of the time, then it becomes the only API. +When the safe API is the best API 99% of the time, then it becomes the only API. This article is the equivalent of using the `unsafe` keyword in Rust. -You can do these things and you'll super feel smart, but are you smart? +If you know what you're doing, then you can make the world a slightly better place. +And feel really smart, let's be honest that's why you're here. +But if you mess up, you wasted a ton of time for a worse product. So heed my warnings. Friends don't let friends design UDP protocols. @@ -53,22 +62,21 @@ Bold statements like "they are bait", "never use datagrams", and "try using QUI But some developers don't want to know the truth and just want their beloved UDP datagrams. The problem is that QUIC datagrams are *not* UDP datagrams. -That would be something closer to DTLS. -Unlike UDP datagrams, QUIC datagrams: +QUIC datagrams: 1. Are congestion controlled. 2. Trigger acknowledgements. 3. Do not expose these acknowledgements. 4. May be batched. We're going to try to fix some of these short-comings in the standard by modifying a standard QUIC library. -Be warned though, you're entering dingus territory. + ### Congestion Control The truth is that there's nothing stopping a QUIC library from sending an unlimited number of QUIC datagrams. There's only a few pixels in the standard that say you SHOULD NOT do this. -"Congestion Control" is what a library SHOULD do. -There's many congestion control algorithms out there, and put simply they are little more than an educated guess on if the network can handle more traffic. +"Congestion Control" is what a library SHOULD do instead. +There's many congestion control algorithms out there, and put simply they are little more than an educated guess if the network can handle more traffic. The simplest form of congestion control is to send less data when packet loss is high and send more data when packet loss is low. But this is an artifical limit that is begging to be broken. @@ -87,31 +95,37 @@ Routers will queue and eventually drop excess packets, wrecking any algorithm th But it's no fun starting a blog with back to back lectures. We're here to abuse QUIC damnit, not the readers. -But I did not say that you should use the *default* congestion control. -The QUIC RFC is based on the dated TCP New Reno algorithm which performs poorly when latency is important or bufferbloat rampant. -That's because QUIC is designed with pluggable congestion control in mind. +But I did not say that you need to use the *default* congestion control. 
+The QUIC RFC outlines the dated TCP New Reno algorithm which performs poorly when latency is important or bufferbloat is rampant.
+But that's not set in stone: QUIC expects pluggable congestion control and is descriptive, not prescriptive.
 Most libraries expose an interface so you choose the congestion controller or make your own.
-You can't do this with TCP as it's buried inside the kernel, so +1 points to QUIC.
+You can't do this with TCP as it's buried inside the kernel, so +1 points to Quicendor (good job 'Arry).
 
 And note that custom congestion control is not specific to QUIC datagrams.
 You can use a custom congestion controller for QUIC streams too!
+They share the same connection so you can even prioritize/reserve any available bandwidth.
 
-So use a battle tested algorithm like BBR instead of the default.
+If the default Reno congestion controller is giving you hives, get that checked out and then give BBR a try.
+It works much better in bufferbloat scenarios and powers a huge chunk of HTTP traffic at this point.
 Or make your own by piping each ACK to ChatGPT and let the funding roll in.
 Or be extra boring and write a master's thesis about curve fitting or something.
-Or completely disable congestion control altogether, you do you.
+Or completely disable congestion control altogether, I can't stop you.
 
 ### Acknowledgements
-QUIC will reply with an acknowledgement packet are receiving each datagram.
+QUIC will reply with an acknowledgement packet after receiving each datagram.
 This might sound absolutely bonkers if you're used to UDP.
-These are **not** used for retransmissions, so what are they for?
+Why is my unreliable protocol telling me when it's unreliable?
+The sacrilege!
+
+Can you take a guess why?
+Why go through the trouble of designing an API that looks like UDP only to twist the knife?
+These acknowledgements are **not** used for retransmissions... they're only for congestion control.
 
-...they're only for congestion control.
 But what if we just disabled QUIC's congestion control?
 Now we're going to get bombarded with useless acknowledgements!
 The good news is that QUIC acknowledgements are batched, potentially appended to your data packets, and are quite efficient.
-It's only a few extra bytes/packets so it's not the end of the world.
+It's only a few extra bytes/packets so step 1: get over it.
 
 But I can already feel your angst; your uncontrollable urge to optimize this *wasted bandwidth*.
 The most cleverest of dinguses amongst us (amogus?) might try to leverage these ACKs.
@@ -121,16 +135,17 @@ Somebody call Nobel, that's a dynamite idea.
 
 You can absolutely hack a QUIC library to expose which datagrams were acknowledged by the remote.
 More information is always better, right?
-Why don't QUIC libraries provide this?
 
-...unfortunately there's an edge case discovered by yours truely.
+...unfortunately there's an edge case [discovered by yours truly](https://github.com/quicwg/datagram/issues/15).
 A QUIC library may acknowledge a datagram that then gets dropped before being delivered to the application.
-This will happen if a QUIC library processes a packet, sends an ACK, but the application (ex. Javascript web page) is too slow and we run out of memory to queue it.
+This will happen if a QUIC library processes a packet, sends an ACK, but the application (ex. JavaScript web page) is too slow and we run out of memory before processing it.
 
 **Note:** QUIC streams don't suffer from this issue because they use flow control.
 
-This might not be a deal breaker depending on your application.
-However, it's a quite unfortunate footgun if you're expecting QUIC ACKs to be the definitive signal that your application has received a packet.
-If you want that reliability... then you should implement your own ACK/NACK protocol on top of QUIC datagrams.
+This might not be a deal breaker depending on your application, but it does introduce false positives.
+They can't be treated as a definitive signal that a packet was processed.
+If you need this reassurance, then switch to QUIC streams or (*gasp*) implement your own ACKs/NACKs on top of QUIC datagrams.
+
+But not only is it more work to implement your own ACKs, the underlying QUIC ACKs will still occur.
 This is gross because one datagram will trigger a QUIC ACK, your custom ACK, and a QUIC ACK for your custom ACK.
 I bet the angst is overwhelming now.
 
@@ -140,31 +155,31 @@ This is a parameter exchanged during the handshake that indicates how long the i
 Crank it up to 1000ms (the default is 25ms) and the number of acknowledgement packets should slow to a trickle.
 
 Be warned that this will impact all QUIC frames, *especially* STREAM retransmissions.
-It may also throw a wrench into the congestion controller too as they expect timely feedback..
+It may also throw a wrench into the congestion controller, as it expects timely feedback.
 So only consider this route if you've gone *full dingus* and completely disabled congestion control and streams.
 The chaos you've sown will be legendary.
 
 ### Batching
 Most QUIC libraries will automatically fill a UDP packet with as much data as it can.
-This is dope, but as we established, you can't settle for nice things.
-We're here to strip a transport protocol to its core.
+This is dope, but as we established, you can't settle for *dope*.
+You don't get out of bed in the morning for *dope*; it needs to be at least *rad* or 🚀.
 
-But why would you do this?
+But why disable batching?
 Let's say you're high on the thrill of sending unlimited packets after disabling congestion control.
 However, sometimes a bunch of packets get lost and you need to figure out why.
 Surely it can't be the consequences of your actions?
 
 "No! It's the network's fault!
-I'm going to shotgun additional copies to ensure at least one arrives..."
+I'm going to send additional copies to ensure at least one arrives..."
 
 I cringed a bit writing that (while you cringed reading this blog).
-See, I've sat through too presentations by staff (video) engineers claiming the same thing.
-FEC is the solution to a problem that they don't understand.
+See, I've sat through too many presentations by principal engineers claiming the same thing.
+FEC is the solution to a problem, but a different problem.
 It turns out there's no secret cheat code to the internet: sending more packets will cause proportionally *more* packet loss as devices get fully saturated.
 
 But we're going to save that for the finale rant and instead focus on **atomicity**.
-Packet loss instinctively feels like an independent event: a coin toss on a router somewhere.
+Like I said earlier, packet loss instinctively feels like an independent event: a coin toss on a router somewhere.
 Sending the same packet back-to-back means you get to toss the coin again, right?
 
 Not really, because a "packet" is a high level abstraction.
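+
+Here's a toy packer that makes the point (pure illustration, not any library's actual code): small payloads get greedily coalesced into MTU-sized packets, so a bunch of "independent" datagrams live or die together.
+
+```rust
+const MTU: usize = 1200;
+
+// Greedily pack small datagrams into MTU-sized packets, the way a QUIC
+// library (or Ethernet framing, or WiFi aggregation) batches traffic anyway.
+fn coalesce(datagrams: &[Vec<u8>]) -> Vec<Vec<u8>> {
+    let mut packets: Vec<Vec<u8>> = Vec::new();
+    let mut current: Vec<u8> = Vec::new();
+
+    for dgram in datagrams {
+        if !current.is_empty() && current.len() + dgram.len() > MTU {
+            packets.push(std::mem::take(&mut current));
+        }
+        current.extend_from_slice(dgram); // fate-shared with everything already here
+    }
+    if !current.is_empty() {
+        packets.push(current);
+    }
+    packets
+}
+
+fn main() {
+    // Ten "independent" 100-byte datagrams...
+    let datagrams = vec![vec![0u8; 100]; 10];
+    let packets = coalesce(&datagrams);
+    // ...fit in a single packet: lose it and you lose all ten.
+    println!("{} datagrams -> {} packet(s)", datagrams.len(), packets.len());
+}
+```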
From ef10cf2ef9e675dc17bc51e602471c422730d1b7 Mon Sep 17 00:00:00 2001 From: kixelated Date: Tue, 8 Apr 2025 10:23:47 -0700 Subject: [PATCH 16/19] Update abusing-quic-datagrams.md --- src/pages/blog/abusing-quic-datagrams.md | 306 +++-------------------- 1 file changed, 35 insertions(+), 271 deletions(-) diff --git a/src/pages/blog/abusing-quic-datagrams.md b/src/pages/blog/abusing-quic-datagrams.md index 7dd4a65..65069a0 100644 --- a/src/pages/blog/abusing-quic-datagrams.md +++ b/src/pages/blog/abusing-quic-datagrams.md @@ -46,8 +46,7 @@ Is AES-GCM slow and worth disabling? Absolutely not, profile your application an When the safe API is the best API 99% of the time, then it becomes the only API. This article is the equivalent of using the `unsafe` keyword in Rust. -If you know what you're doing, then you can make the world a slightly better place. -And feel really smart, let's be honest that's why you're here. +If you know what you're doing, then you can make the world a slightly better place (but mostly feel really smart). But if you mess up, you wasted a ton of time for a worse product. So heed my warnings. @@ -173,288 +172,53 @@ Surely it can't be the consequences of your actions? It's the network's fault! I'm going to send additional copies to ensure at least one arrives..." -I cringed a bit writing that (while you cringed reading this blog). +I cringed a bit writing that. +Not as much as you've cringed while reading this blog, but a close 🥈. See, I've sat through too presentations by principal engineers claiming the same thing. -FEC is the solution to a problem, but a different problem. It turns out there's no secret cheat code to the internet: sending more packets will cause proportially *more* packet loss as devices get fully saturated. -But we're going to save that for the finale rant and instead focus on **atomicity**. -Like I said earlier, packet loss instinctively feels like an independent event: a coin toss on a router somewhere. -Sending the same packet back-to-back means you get to toss the coin again, right? - -Not really, because a "packet" is a high level abstraction. -Even the humble UDP packet will get coalesced at a lower level with it's own recovery scheme. -For example, 7 UDP packets (1.2KB MTU*) can fit snug into a jumbo Ethernet frame. -If your protocol depends on "independent" packets, then you may be distraught to learn that they are actually somewhat fate-bound and may be dropped in batches. - -QUIC takes this a step further and batches everything, including datagrams. -You may send ten, 100 byte datagrams that appear disjoint in your application but may secretly get combined into one UDP datagram under the covers. -You're at the mercy of the QUIC library, which is at the mercy of the lower level transport. - -Fortunately, QUIC cannot split a datagram across multiple UDP packets. -If your datagrams are large enough (>600 bytes*) then you can sleep easy knowing they won't get combined with other datagrams. -But just like everything else thus far, we can disable this behavior entirely. -Tweak a few lines of code and boop, you're sending a proper UDP packet for each QUIC datagram. - -I'm not sure why you would, because it can only worsen performance, but I'm here (to pretend) not to judge. -Your brain used to be smooth but now it's wrinkly af 🧠🔥. - -## Real-time Streams -Look, I may be one of the biggest QUIC fanboys on the planet, but I've got to admit that QUIC streams are pretty poor for real-time latency. 
-They're designed for not designed for bulk delivery, not small payloads that need to arrive ASAP like voice calls. -It's the reason why the dinguses reach for datagrams. - -But don't take my wrinkle brain statements as fact. -Let's dive deeper and FIX IT. - -### Detecting Loss -``` -| |i -|| |_ -``` - -*How does a QUIC library know when a packet is lost?* - -It doesn't. -There's no explicit signal from routers (yet?) when a packet is lost. -A QUIC library has to instead use FACTS and LOGIC to make an educated guess. -The RFC outlines a *recommended* algorithm that I'll attempt to simplify: - -- The sender increments a sequence number for each packet. -- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`. -- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer. -- After finally receiving an ACK, the sender *may* decide that a packet was lost if: - - 3 newer sequences were ACKed. - - or a multiple of the RTT has elapsed. -- As the congestion controller allows, retransmit any lost packets and repeat. - -Skipped that boring, "simplified" wall of text? -I don't blame you. -You're just here for the funny blog and *maaaaybe* learn something along the way. - -I'll help. -If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit. -It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke. -"You still alive over there?". -The tail of our stream will take longer (on average) to arrive unless there's other data in flight to perform this poking. - -And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping". -So if you're playing Counter Strike cross-continent with a ping of 150ms, you're already at a disadvantage. -Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions. -*cyka bylat* - - -### Head-of-line Blocking -We're not done yet. -QUIC streams are also poor for real-time because they introduce head-of-line blocking. - -Let's suppose we want to stream real-time chat over QUIC. -But we're super latency sensitive, like it's a bad rash, and need the latest sentence as soon as possible. -We can't settle for random words; we need the full thing in order baby. -The itch is absolutely unbearable and we're okay being a little bit wasteful. - -If the broadcaster types "hello" followed shortly by "world", we have a few options. -Pop quiz, which approach is subjectively the best: - -Option A: Create a stream, write the word "hello", then later write "world". -Option B: Create a stream and write "0hello". Later, create another stream and write "5world". The number at the start is the offset. -Option C: Create a stream and write "hello". Later, create another stream and write "helloworld". -Option D: Abuse QUIC (no spoilers) - -If you answered D then you're correct. -Let's use a red pen and explain why the other students are failing the exam. -No cushy software engineering gig for you. - -#### Option A -*Create a stream, write the word "hello", then later write "world".* - -This is classic head-of-line blocking. If the packet containing "hello" gets lost over the network, then we can't actually use the "world" message if it arrives first. 
-But that's okay in this scenario because of my arbitrary rules - -The real problem is that when the "hello" packet is lost, it won't arrive for *at least* an RTT after "world" because of the affirmationed retransmission logic. -That's no good. - -#### Option B -*Create a stream and write "0hello". Later, create another stream and write "5world". The number at the start is the offset.* - -I didn't explain how multiple streams work because a bad teacher blames their students. -And I wanted to blame you. - - -QUIC will retransmit any unacknowledged fragments of a stream. -But like I said above, only when a packet is considered lost. -But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. -For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. - -### BBR - -## Improper QUIC Streams -Okay so we've hacked QUIC datagrams to pieces, but why? - -I was actually inspired to write this blog post because someone joined my (dope) Discord server. -They asked if they could do all of the above so they would have proper QUIC datagrams. - -The use-game is vidya games. -The most common approach for video games is to process game state at a constant "tick" rate. -VALORANT, for example, uses a tick rate of [128 Hz](https://playvalorant.com/en-us/news/dev/how-we-got-to-the-best-performing-valorant-servers-since-launch/) meaning each update covers a 7.8ms period. -It's really not too difficult from frame rate (my vidya background) but latency is more crucial otherwise nerds get mad. - -So the idea is to transmit each game tick as a QUIC datagram. -However, that would involve transmitting a lot of redundant information, as two game ticks may be very similar to each other. -So the idea is to implement custom acknowledgements and (re)transmit only the unacknowledged deltas. - -If you've had the pleasure of implementing QUIC before, this might sound very similar to how QUIC streams work internally. -In fact, this line of thinking is what lead me to ditch RTP over QUIC (datagrams) and embrace Media over QUIC (streams). -So why disassemble QUIC only to reassemble parts of it? -If we used QUIC streams instead, what more could you want? - -### What More Could We Want? -Look, I may be one of the biggest QUIC fanboys, but I've got to admit that QUIC is pretty poor for real-time latency. -It's not designed for small payloads that need to arrive ASAP, like voice calls. - -But don't take my wrinkle brain statements as fact. -Let's dive deeper. - -*How does a QUIC library know when a packet is lost?* - -It doesn't. -There's no explicit signal from routers (yet?) when a packet is lost. -A QUIC library has to instead use FACTS and LOGIC to make an educated guess. -The RFC outlines a *recommended* algorithm that I'll attempt to simplify: - -- The sender increments a sequence number for each packet. -- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`. -- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer. -- After finally receiving an ACK, the sender *may* decide that a packet was lost if: - - 3 newer sequences were ACKed. - - or a multiple of the RTT has elapsed. -- As the congestion controller allows, retransmit any lost packets and repeat. 
- -Skipped that boring wall of text? -I don't blame you. -You're just here for the funny blog and *maaaaybe* learn something along the way. - -I'll help. -If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit. -It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke. -"You still alive over there?" - -And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping". -So if you're playing Counter Strike cross-continent with a ping of 150ms, you're already at a disadvantage. -Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions. -*cyka bylat* - -### We Can Do Better? -So how can we make QUIC better support real-time applications that can't wait multiple round trips? -Should we give up and admit that networking peaked with UDP? - -Of course not what a dumb rhetorical question. - -We can use QUIC streams! -A QUIC stream is nothing more than a byte slice. -The stream is arbitrarily split into STREAM frames that consists of an offset and a payload. -The QUIC reviever reassembles these frames in order before flushing to the application. -Some QUIC libraries even allow the application to read stream chunks out of order. - -How do we use QUIC streams for vidya games? -Let's suppose we start with a base game state of 1000 bytes and each tick there's an update of 100 bytes. -We make a new stream, serialize the base game state, and constantly append each update. -QUIC will ensure that the update and deltas arrive in the intended order so it's super easy to parse. - -But not so fast, there's a **huge** issue. -We just implemented head-of-line blocking and our protocol is suddenly no better than TCP! -I was promised that QUIC was supposed to fix this... - - - -We can abuse the fact that a QUIC receiver must be prepared to accept duplicate or redundant STREAM frames. -This can happen naturally if a packet is lost or arrives out of order. -You might see where this is going: nothing can stop us from sending a boatload of packets. - - - - -Our QUIC library does not need to wait for a (negative) acknowledgement before retransmitting a stream chunk. -We could just send it again, and again, and again every 50ms. -If it's a duplicate, then QUIC will silently ignore it. - -## A Problem - -But there's a pretty major issue with this approach: -**BUFFERBLOAT**. -Surprise! -It turns out that some routers may queue packets for an undisclosed amount of time when overloaded. - -Let's say you retransmit every 50ms and everything works great on your PC. -A user from Brazil or India downloads your application and it initially works great too. -But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. -...well now you're transmitting 10x the data, potentially aggravating any congestion and preventing recovery. -It's a vicious loop and you've basically built your own DDoS agent. - -For the distributed engineers amogus, this is the networking equivalent of an F5 refresh storm. - - -Either way, sending redundant copies of data is nothing new. -Let's go a step further and embrace QUIC streams. - -### How I Learned to Embrace the Stream - - -At the end of the day, a QUIC STREAM frame is a byte offset and payload. -Let's say we transmit our game state as STREAM 0-230 and 33ms later we transmit 20 bytes of deltas as STREAM 230-250. 
-If the original STREAM frame is lost, well even if we receive those 20 bytes, we can't actually decode them and suffer from HEAD-OF-LINE blocking. - -My game dev friend thinks this is unacceptable and made his own ACK-based algorithm on top of QUIC datagrams instead. -The sender ticks every 30ms and sends a delta from the last acknowledged state, even if that data might be in-flight already. -Pretty cool right? -Why doesn't QUIC do this? - -It does. - -(mind blown) - -QUIC will retransmit any unacknowledged fragments of a stream. -But like I said above, only when a packet is considered lost. -But with the power of h4cks, we could have the QUIC library *assume* the rest of the stream is lost and needs to be retransmitted. -For you library maintainers out there, consider adding this as a `stream.retransmit()` method and feel free to forge my username into the git commit. - -So to continue our example above, we can modify QUIC to send byte offsets 0-250 instead of just 230-250. -And now we can accomplish the exact* same behavior as the game dev but without custom acknowledgements, retransmissions, deltas, and reassembly buffers. +FEC is the solution to a problem, but a different problem. +I already wrote Never* Use Datagrams and you should read that. +Instead, we're going to focus on **atomicity**. -Forking a library feels *so dirty* but it magically works. +Like I said earlier, packet loss instinctively feels like an independent event: a coin toss on a router somewhere. +But sending the same packet back-to-back does *not* mean you get a second flip of the coin. +An IP packet is actually quite a high level abstraction. +Our payload of data has to somehow get serialized into a physical transmission and that's the job of a lower level protocol. +For example, 7 IP packets (1.2KB MTU*) can fit snug into a jumbo Ethernet frame. +These frames then get sliced into different dimensions, be it time or frequency or whatever, as they traverse an underlying medium. +A protocol like Wifi will automatically apply redundancy and even retransmissions based on the properties of the medium. +And let's not forget intermediate routers because they will batch packets too, it's just more efficient. -### Some Caveats -Okay it's not the same as the game dev solution; it's actually better because of **congestion control**. -And once again, you do ~need~ want congestion control. +So if your protocol depends on "independent" packets, then you will be distraught to learn that no such thing exists. +Packets can (and will) be dropped in batches despite your best efforts to avoid batching. +That's why QUIC goes the other direction and batches everything, including datagrams. +An application may appear to send ten disjoint datagrams but under the hood, they may get secretly combined into one UDP datagram to avoid redundant headers. +If not QUIC, then another layer would perform (less efficient) batching. +The ratio of lectures to hacks is approaching dangerous levels. +Fuck it, lets disable batching. -Let's say you retransmit every 30ms and everything works great on your PC. -A user from Brazil or India downloads your application and it initially works great too. -But eventually their ISP gets overwhelmed and congestion causes the RTT to (temporarily) increase to 500ms. -...well now you're transmitting 15x the data and further aggravating any congestion. -It's a vicious loop and you've basically built your own DDoS agent. +If you control the QUIC library, one snip and you can short-circuit the batching. 
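+Something like this, except I'm inventing the structure; real packet builders (quinn, quiche, etc.) are shaped differently, so treat it as a sketch of the idea rather than a patch you can apply.
+
+```rust
+// Hypothetical packet builder with the coalescing check we want to snip.
+struct PacketBuilder {
+    buf: Vec<u8>,
+    mtu: usize,
+    coalesce_datagrams: bool, // our new flag; flip it to false in the fork
+}
+
+impl PacketBuilder {
+    fn try_add_datagram(&mut self, dgram: &[u8]) -> bool {
+        // The "snip": refuse to share a packet once coalescing is disabled.
+        if !self.buf.is_empty() && !self.coalesce_datagrams {
+            return false;
+        }
+        if self.buf.len() + dgram.len() > self.mtu {
+            return false; // wouldn't fit anyway
+        }
+        self.buf.extend_from_slice(dgram);
+        true
+    }
+}
+
+fn main() {
+    let mut packet = PacketBuilder { buf: Vec::new(), mtu: 1200, coalesce_datagrams: false };
+    assert!(packet.try_add_datagram(&[0u8; 100]));  // first datagram gets the packet
+    assert!(!packet.try_add_datagram(&[0u8; 100])); // second one must wait for its own
+}
+```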
+Each QUIC datagram is now a UDP packet, hazzah! +The library should still perform *some* batching and append stuff like ACKs to packets. +Please have mercy and don't require separate UDP packets for our ill-fated ACK friends. -But QUIC can avoid this issue because retransmissions are gated by congestion control. -Even when a packet is considered lost, or my hypothetical `stream.retransmit()` is called, a QUIC library won't immediately retransmit. -Instead, retransmissions are queued up until the congestion controller deems it appropriate. -Note that a late acknowledgement or stream reset will cancel a queued retransmission (unless your QUIC library sucks). +But even if you don't control the QUIC library (ex. browser), you can abuse the fact that QUIC cannot split a datagram across multiple packets. +If your datagrams are large enough (>600 bytes*) then you can sleep easy knowing they won't get combined. +...unless the QUIC library supports MTU discovery, because while the minimum MTU is 1.2KB, the maximum is 64KB. -Why? -If the network is fully saturated, you need to send fewer packets to drain any network queues, not more. -Even ignoring bufferbloat, networks are finite resources and blind retransmissions are the easiest way to join the UDP Wall of Shame. -In this instance,.the QUIC greybeards will stop you from doing bad thing. -The children yearn for the mines, but the adults yearn for child protection laws. +I'm not sure why you would disable batching because it can only worsen performance, but I'm here (to pretend) not to judge. +Your brain used to be smooth but now it's wrinkly af 🧠🔥. -Under extreme congestion, or when temporarily offline, the backlog of queued data will keep growing and growing. -Once the size of queued delta updates grows larger than the size of a new snapshot, cut your losses and start over. -Reset the stream with deltas to prevent new transmissions and create a new stream with the snapshot. -Repeat as needed; it's that easy! +## Conclusion +I know you just want your precious UDP datagrams but they're kept in a locked drawer lest you hurt yourself. +But I've given you the key and it's your turn to prove me right. -I know this horse has already been beaten, battered, and deep fried, but this is yet another benefit of congestion control. -Packets are queued locally so they can be cancelled instantaneously. -Otherwise they would be queued on some intermediate router (ex. for 500ms). +If you want to "hack" QUIC for more constructive purposes, check out my next blog about QUIC streams. +There's actually some changes you could make without incurring self-harm. ## Hack the Library https://github.com/quinn-rs/quinn/blob/6bfd24861e65649a7b00a9a8345273fe1d853a90/quinn-proto/src/frame.rs#L211 From 9b935d23e4b67338853388376c4a7e0efbdfce24 Mon Sep 17 00:00:00 2001 From: kixelated Date: Fri, 18 Apr 2025 17:59:50 -0700 Subject: [PATCH 17/19] Update abusing-quic-streams.md --- src/pages/blog/abusing-quic-streams.md | 46 ++++++++++++++++++++++---- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/src/pages/blog/abusing-quic-streams.md b/src/pages/blog/abusing-quic-streams.md index de3e1d3..104b4bc 100644 --- a/src/pages/blog/abusing-quic-streams.md +++ b/src/pages/blog/abusing-quic-streams.md @@ -2,9 +2,30 @@ This is the second part of our QUIC hackathon. Read [Abusing QUIC Datagrams](/blog/abusing-quic-datagrams) if byte streams confuse you. 
-Look, I may be one of the biggest QUIC fanboys on the planet, but I've got to admit that QUIC streams are pretty poor for real-time latency. -They're designed for not designed for bulk delivery, not small payloads that need to arrive ASAP like voice calls. -It's the reason why the dinguses reach for datagrams. +Look, I may be one of the biggest QUIC fanboys on the planet. +I'm ashamed to admit that QUIC streams are meh for real-time latency. + +I know, I know, I just spent the last blog post chastising you, the `I <3 node_modules` developer, for daring to dream. +For daring to vibe code. + + +## QUIC 101 +It's by design: QUIC streams trickle. + + + + + +Thi +ou could reach for datagrams like a dingus + +QUIC streams are designed to trickle, relying on retransmissions to *eventually* patch any holes caused by packet loss. +The key + +QUIC datagrams are the intended alternative, but as I outlined in my last blog post, they're bait for dinguses. + +But what if I told you that we can abuse QUIC streams to better achieve real-time latency? + But don't take my wrinkle brain statements as fact. Let's dive deeper and FIX IT. @@ -15,11 +36,22 @@ Let's dive deeper and FIX IT. || |_ ``` -*How does a QUIC library know when a packet is lost?* +QUIC streams are continuous byte streams that rely on retransmissions to *eventually* patch any holes caused by packet loss. +The key word being *eventually*, as QUIC won't waste bandwidth on retransmissions unless theyre needed. + +**Pop quiz:** +*How does a QUIC library know when a packet is lost and needs to be retransmitted?* + +**Answer**: +Trick question, it doesn't. + +A pop quiz this early into a blog post? +AND a trick question? +That's not fair. -It doesn't. -There's no explicit signal from routers (yet?) when a packet is lost. -A QUIC library has to instead use FACTS and LOGIC to make an educated guess. +There's no explicit signal from routers when a packet is lost. +L4S might change that on some networks but I wouldn't get your hopes up. +Instead, a QUIC library has to instead use FACTS and LOGIC to make an educated guess. The RFC outlines a *recommended* algorithm that I'll attempt to simplify: - The sender increments a sequence number for each packet. From 30944faf9ba421f022d54c91d4b3eae63dfed232 Mon Sep 17 00:00:00 2001 From: kixelated Date: Wed, 23 Apr 2025 10:02:45 -0700 Subject: [PATCH 18/19] Update abusing-quic-streams.md --- src/pages/blog/abusing-quic-streams.md | 47 ++++++++++++++++++++------ 1 file changed, 37 insertions(+), 10 deletions(-) diff --git a/src/pages/blog/abusing-quic-streams.md b/src/pages/blog/abusing-quic-streams.md index 104b4bc..9e8c4d2 100644 --- a/src/pages/blog/abusing-quic-streams.md +++ b/src/pages/blog/abusing-quic-streams.md @@ -6,29 +6,52 @@ Look, I may be one of the biggest QUIC fanboys on the planet. I'm ashamed to admit that QUIC streams are meh for real-time latency. I know, I know, I just spent the last blog post chastising you, the `I <3 node_modules` developer, for daring to dream. +For daring to send individual IP packets without a higher level abstraction. For daring to vibe code. +Fret not because we can "fix" QUIC streams with some clever library tweaks. +It's more work and more wasted bytes but that's basically our job as programmers, right? +A small price to pay. + ## QUIC 101 -It's by design: QUIC streams trickle. +QUIC streams trickle. +Just like TCP, all of the data written to a QUIC steam will eventually arrive (unless cancelled). 
+If there's packet loss, retransmissions will *eventually* patch any holes. +If there's poor network conditions, congestion control will slow down the send rate until it *eventually* recovers. +If the receiver is CPU starved, flow control will pause transmissions until they *eventually* recover. +Like the DMV, QUIC steams are little more than queues. +You write data to the end while the QUIC library transmits packet-sized chunks from the front. +In the business we call this a FIFO. +But what happens when you have royal data that must arrive ASAP? +If we write to the end of a stream, then it might get queued behind peasant data that is blocked for whatever reason. +This called "head-of-line" blocking and it has plagued TCP since its inception. +Enter Robespierre. +QUIC is a huge advancement over TCP because it offers off-with-the-head-of-line blocking. +Instead of appending to an existing stream, we can open a new stream and (optionally) reset the existing stream. +Our royal stream can be marked highest priority while the peasant stream can be sent to the guillotine. -Thi -ou could reach for datagrams like a dingus +Yes I know the royals were actually guillotined first so the French Revolution wasn't the best analogy. +But look eventually every Frenchman with a history got the choppy choppy so you can treat it as a FILO queue. +But unlike the French Revolution, there's no cost for creating or cancelling a QUIC stream. -QUIC streams are designed to trickle, relying on retransmissions to *eventually* patch any holes caused by packet loss. -The key +So if you don't care about ordering, make a new QUIC stream for each message. +If newer messages are more important, then deprioritize or cancel older streams. +This sounds easy, so what's the problem? -QUIC datagrams are the intended alternative, but as I outlined in my last blog post, they're bait for dinguses. +Three words. +Twenty seven words. +Seventy eight* pixels of whitespace: -But what if I told you that we can abuse QUIC streams to better achieve real-time latency? +- Retransmissions +- Delta Encoding +/* I didn't actually count, direct your hatemail to @kixelated on Discord. -But don't take my wrinkle brain statements as fact. -Let's dive deeper and FIX IT. ## Detecting Loss ``` @@ -37,7 +60,11 @@ Let's dive deeper and FIX IT. ``` QUIC streams are continuous byte streams that rely on retransmissions to *eventually* patch any holes caused by packet loss. -The key word being *eventually*, as QUIC won't waste bandwidth on retransmissions unless theyre needed. +I keep putting *eventually* in *italics* but have yet to explain why. + +QUIC was primarily designed for HTTP/3 and bulk data transfers. +The average transfer speed matters the most when you're downloading `porn.zip`. +In order to achieve that, QUIC and TCP won't waste bandwidth on retransmitting the same data again unless it's absolutely needed. 
**Pop quiz:** *How does a QUIC library know when a packet is lost and needs to be retransmitted?* From c13c038873af54e9488b0552734a96bca1c827c4 Mon Sep 17 00:00:00 2001 From: kixelated Date: Tue, 29 Apr 2025 10:34:04 -0700 Subject: [PATCH 19/19] Update abusing-quic-streams.md --- src/pages/blog/abusing-quic-streams.md | 78 ++++++++++++++++++-------- 1 file changed, 55 insertions(+), 23 deletions(-) diff --git a/src/pages/blog/abusing-quic-streams.md b/src/pages/blog/abusing-quic-streams.md index 9e8c4d2..22a7259 100644 --- a/src/pages/blog/abusing-quic-streams.md +++ b/src/pages/blog/abusing-quic-streams.md @@ -43,14 +43,8 @@ So if you don't care about ordering, make a new QUIC stream for each message. If newer messages are more important, then deprioritize or cancel older streams. This sounds easy, so what's the problem? -Three words. -Twenty seven words. -Seventy eight* pixels of whitespace: - -- Retransmissions -- Delta Encoding - -/* I didn't actually count, direct your hatemail to @kixelated on Discord. +Come with me, little one. +We're going on an adventure. ## Detecting Loss @@ -60,10 +54,10 @@ Seventy eight* pixels of whitespace: ``` QUIC streams are continuous byte streams that rely on retransmissions to *eventually* patch any holes caused by packet loss. -I keep putting *eventually* in *italics* but have yet to explain why. +I keep putting *eventually* in *italics* and now it's finally time to explain why. QUIC was primarily designed for HTTP/3 and bulk data transfers. -The average transfer speed matters the most when you're downloading `porn.zip`. +The average transfer speed matters the most when you're downloading `pron.zip`. In order to achieve that, QUIC and TCP won't waste bandwidth on retransmitting the same data again unless it's absolutely needed. **Pop quiz:** @@ -73,7 +67,7 @@ In order to achieve that, QUIC and TCP won't waste bandwidth on retransmitting t Trick question, it doesn't. A pop quiz this early into a blog post? -AND a trick question? +AND it's a trick question? That's not fair. There's no explicit signal from routers when a packet is lost. @@ -82,33 +76,71 @@ Instead, a QUIC library has to instead use FACTS and LOGIC to make an educated g The RFC outlines a *recommended* algorithm that I'll attempt to simplify: - The sender increments a sequence number for each packet. -- Upon receiving a packet, the receiver will start a timer to ACK that sequence number, batching with any others that arrive within `max_ack_delay`. -- If the sender does not receive an ACK after waiting multiple RTTs, it will send another packet (like a PING) to poke the receiver and hopefully start the ACK timer. -- After finally receiving an ACK, the sender *may* decide that a packet was lost if: +- Upon receiving a packet, the receiver will start a timer. Once this timer expires, it will ACK that sequence number and any others that arrive in the meantime. +- If the sender does not receive an ACK after waiting multiple RTTs, either the original packet or the ACK probably got lost. +- The sender will poke the receiver by sending another packet (potentially a 1-byte PING) to have them send another ACK. +- Eventually the poke works and the sender receives an ACK indicating which packets were received. +- **FINALLY** the sender *may* decide that a packet was lost if: - 3 newer sequences were ACKed. - or a multiple of the RTT has elapsed. - As the congestion controller allows, retransmit any lost packets and repeat. Skipped that boring, "simplified" wall of text? I don't blame you. 
-You're just here for the funny blog and *maaaaybe* learn something along the way.
+You're just here for the funny blog.
 I'll help.

-If a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit.
-It's particularly bad for the last few packets in a burst because if they're lost, nothing starts the acknowledgement timer and the sender will have to poke.
-"You still alive over there?".
-The tail of our stream will take longer (on average) to arrive unless there's other data in flight to perform this poking.
+What this means is that if a packet is lost, it takes anywhere from 1-3 RTTs to detect the loss and retransmit.
+If you're sending a lot of data, then it's closer to 1 RTT because new packets indirectly trigger ACKs for the lost packets.
+But if you're sending a tiiiiny amount of data, or it's the last packet in a burst, then it takes closer to 3 RTTs to recover.

 And just in case I lost you in the acronym soup, RTT is just another way of saying "your ping".
 So if you're playing Counter-Strike cross-continent with a ping of 150ms, you're already at a disadvantage.
-Throw QUIC into the mix and some packets will take 300ms to 450ms of conservative retransmissions.
+Throw QUIC into the mix and some packets will take 300ms to 450ms.

 *cyka blyat*

-## Head-of-line Blocking
-We're not done with packet loss yet, but let's put an `!Unpin` in it.
+## Delta Encoding
+We're not done with retransmissions yet, but let's put an `!Unpin` in it.
+
+If you're sending time series data over the Internet, there's a good chance that it can benefit from delta encoding.
+The most common example is, *checks notes*, video encoding.
+Wow, what a coincidence.
+
+Video encoding works by creating a base image called a "keyframe" (or I-frame) that is not too dissimilar from a PNG or JPEG.
+Doing this for every frame requires a lot of data, which is why animated "GIFs" used to look so ass.
+Video encoding instead abuses the fact that most frames of a video are very similar and primarily encodes frames as deltas of previous (and sometimes future!) frames.
+
+Delta encoding significantly lowers the bitrate because it removes redundancies.
+However, it introduces dependencies: packets now depend on previous packets, and if one goes missing, the data is corrupt and causes trippy effects.
+Some encodings are self-healing, like Opus (audio), while other encodings have to start over with a new base (video).
+
+Streams (TCP, QUIC, Unix, etc.) are great for delta encoding because they guarantee data arrives, and arrives in order.
+The application can reference a previous byte range without worrying about pesky holes.
+It's like a new pair of underwear: you can just put it on.
+
+But new underwear is not the most efficient.
+If you're in a hurry, you grab whatever you can find and hope the holes aren't in unfortunate places.
+There's no time to order new underwear; you rock it.
+The user experience will suffer but Sonic gotta go fast.
+
+If you're rich, you can buy extra pairs to make your underwear redundant.
+Maybe you get unlucky and grab a holey pair, but there's a fresh pair stapled to it just for such an unfortunate occasion.
+Yeah it's more expensive because of tariffs but Scrooge gotta McDuck.
+
+QUIC streams take the slow approach.
+If QUIC finds a hole in the metaphorical underwear, it will order a hole-sized patch and sew it on.
+It gives you a good experience while wasting the least amount of pricey cotton.
+
+But we're smarter than *default behavior*.
+We can modify a QUIC sender so there's some urgency.
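+
+If "delta encoding" is still a little abstract, here's the one-dimensional toy version.
+Integers instead of video frames, and purely for illustration; no real codec lays out its bitstream like this:
+
+```rust
+/// Toy delta encoder: instead of sending each sample, send the difference
+/// from the previous one. The numbers get smaller (good for compression),
+/// but every value now depends on the one before it.
+fn delta_encode(samples: &[i64]) -> Vec<i64> {
+    let mut prev = 0;
+    samples
+        .iter()
+        .map(|&s| {
+            let delta = s - prev;
+            prev = s;
+            delta
+        })
+        .collect()
+}
+
+/// Decoding walks the chain back up. Lose one delta and everything after it
+/// is reconstructed wrong: those are the trippy effects.
+fn delta_decode(deltas: &[i64]) -> Vec<i64> {
+    let mut prev = 0;
+    deltas
+        .iter()
+        .map(|&d| {
+            prev += d;
+            prev
+        })
+        .collect()
+}
+
+fn main() {
+    let samples = [1000, 1003, 1001, 1008];
+    let deltas = delta_encode(&samples);
+    assert_eq!(deltas, vec![1000, 3, -2, 7]);
+    assert_eq!(delta_decode(&deltas), samples);
+}
+```
+
+Anyway, back to the underwear.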
+The next shipment of underwear comes with a bag of patches for the previous shipment of underwear.
+
+QUIC streams are absolutely built for delta encoding as they deliver data reliably and in order.
+There are no holes in this pristine data stream.

-QUIC streams are also poor for real-time because they introduce head-of-line blocking.
+However, this combination causes a problem for real-time latency, as packets become dependent on each other.

Let's suppose we want to stream real-time chat over QUIC.
But we're super latency sensitive, like it's a bad rash, and need the latest sentence as soon as possible.
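+
+To make that concrete, here's roughly what the "new stream per message" idea from earlier looks like for our chat app.
+This is a sketch, not gospel: the quinn calls (`open_uni`, `set_priority`, `reset`, `write_all`, `finish`) are from memory and their exact signatures vary between quinn versions, so double-check before copy-pasting.
+
+```rust
+use quinn::{Connection, SendStream, VarInt};
+
+/// Send the latest chat message on a brand new unidirectional stream.
+/// If the previous message is still dribbling out, it gets the guillotine.
+async fn send_latest(
+    conn: &Connection,
+    previous: Option<SendStream>,
+    priority: i32,
+    message: &str,
+) -> Result<SendStream, Box<dyn std::error::Error>> {
+    // Cancel the older stream: the receiver sees a RESET_STREAM instead of
+    // waiting around for retransmissions of stale text.
+    if let Some(mut old) = previous {
+        // The error code (0 here) is application-defined; pick your own.
+        // If the old message already arrived in full, this is a no-op or an
+        // error that we deliberately ignore.
+        let _ = old.reset(VarInt::from_u32(0));
+    }
+
+    let mut stream = conn.open_uni().await?;
+    stream.set_priority(priority)?; // newer messages outrank older ones
+    stream.write_all(message.as_bytes()).await?;
+    stream.finish()?; // this was async in older quinn versions
+    Ok(stream)
+}
+```
+
+Prioritization only matters when streams are actually competing for bandwidth, and cancelling only saves anything when there's unacknowledged data left to retransmit.
+But that's exactly the situation we're worried about, so it's worth the two extra lines.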