fix(noq): do a best-effort transmit on connection close#625
Conversation
matheus23
left a comment
There was a problem hiding this comment.
Thank you for looking into this @colinmarc :)
Before we put in more work, I need to check back with @flub whether our ideas in n0-computer/iroh#4201 are sound & desirable.
Other than that:
- This needs a test! Ideally one that sets up a
Runtimesimilar to the one we're using in iroh: Most importantly one that uses aTaskTrackerand fairly aggressively closes all tasks (in this case those will be the connection and endpoint drivers). - Rather than driving the transmit in
close, I'd like to drive the transmit only when theConnectionDriveris dropped. Otherwise I worry thedrive_transmitfrom the connection driver and theclosecall can get in each others way. I think there's an invariant that the connection driver is the only thing that callsdrive_transmitand it never misses the consecutive calls to that function. The right place for callingdrive_transmitmight be in theimpl Drop for ConnectionRefright after theconn.implicit_close(..).
No worries, I just wanted to have something concrete to discuss :) This is only half of a workaround for that issue anyway.
That makes sense! Moved it to the drop for |
If we're shutting down ungracefully, we can make a best-effort attempt to write the CONNECTION_CLOSE to the socket and notify the peer. If we're in the middle of graceful shutdown, it doesn't hurt, either.
| // ungracefully. | ||
| let waker = std::task::Waker::noop(); | ||
| let mut cx = std::task::Context::from_waker(waker); | ||
| let _ = conn.drive_transmit(&mut cx); |
There was a problem hiding this comment.
It is kind of tempting to do this. But also, there's very little guarantee. E.g. even if a CONNECTION_CLOSE was scheduled for transmit, it is supposed to respect congestion control and might well not be sent (that could be buggy currently, I recall spotting some CONNECTION_CLOSE packets that did not respect congestion control).
I think at the end of the day when ungraceful things happen the remote has to deal with things just timing out. I think the edge-cases for which this would help are not really worth this extra complexity. I'd rather people discover these problems faster in tests if their code is genuinely misbehaving.
So I'm tempted to close this without merging.
There was a problem hiding this comment.
Yeah - I also ran some tests to see if this would actually help.
The idea was to make graceful close work when you just drop(endpoint), but keep the process alive a bit.
I didn't dive into the details of why it didn't quite work, but I didn't get the expected result. At least not when the AsyncUdpSocket implementation is the one it is in iroh (I haven't tested it with a normal UdpSocket in noq).
So yeah - I think this might need a bit more thorough work and looking into how this would actually work.
W.r.t. congestion control I think it is completely fine to not send the CONNECTION_CLOSE if CC doesn't allow it. I expect that in most cases CC will actually allow the close frame, though I admit relying on that behavior is a bad idea.
There was a problem hiding this comment.
To be clear, I wouldn't really expect it to work with anything other than a normal udp socket. One thing I'm not clear on - with iroh, after migration off the relay, isn't that the case?
There was a problem hiding this comment.
yes, once iroh selected the direct path it should behave the same as noq with a normal udp socket as far as this PR is concerned.
Description
If we're shutting down ungracefully, we can make a best-effort attempt to write the CONNECTION_CLOSE to the socket and notify the peer. If we're in the middle of graceful shutdown, it doesn't hurt, either.
This can be taken advantage of by callers to do a quicker shutdown if they know the drain period isn't necessary. Discussion here: n0-computer/iroh#4201