Skip to content

JsonRpcConnection: don't write new messages on shutdown#10221

Merged
Al2Klimov merged 1 commit intomasterfrom
Al2Klimov-patch-7
Nov 29, 2024
Merged

JsonRpcConnection: don't write new messages on shutdown#10221
Al2Klimov merged 1 commit intomasterfrom
Al2Klimov-patch-7

Conversation

@Al2Klimov
Copy link
Member

In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until write error (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.

closes #10216
closes #10219
closes #10220

In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until write error (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.
@Al2Klimov Al2Klimov added enhancement New feature or request area/distributed Distributed monitoring (master, satellites, clients) labels Nov 7, 2024
@Al2Klimov Al2Klimov added this to the 2.15.0 milestone Nov 7, 2024
@Al2Klimov Al2Klimov requested a review from julianbrost November 7, 2024 16:35
@cla-bot cla-bot bot added the cla/signed label Nov 7, 2024
@Al2Klimov
Copy link
Member Author

@yhabteab
Copy link
Member

You should carefully read the PR description. That PR has nothing to do with this! That PR fixes this bug #10210 (comment) with the last commit.

@Al2Klimov
Copy link
Member Author

This PR indeed doesn't block anything. But yours is yet another argument for this.

@julianbrost
Copy link
Member

this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk)

For me, that part is the most relevant part of the reasoning for this PR. When we say disconnect, it currently stops at an arbitrary point in the still queued messages (i.e. the random position where m_OutgoingMessagesQueue happened to be read last time by the write coroutine). If we say that calling disconnect does so early and discards messages, finishing the current message and then stopping makes sense.

But yours is yet another argument for this.

I don't really see how this relates to #10210. The remaining problem of that PR was that it waited for m_WriterDone without a timeout, so if the write routine (already) blocked in write or flush, it wait for a long time.

try {
for (auto& message : queue) {
if (m_ShuttingDown) {
break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queue can get quite big, can't it? And your connection could be relatively slow. Fast enough that messages make it through, but slow enough to notice it. Now we have the choice:

  • Stop early, so that no message/TLS is half-written (this PR)
  • Try to get something through, risking being interrupted in the middle of a message

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is trying to tell me. There's always the possibility that a connection dies somewhere in the middle of a message, so that can always happen. The one thing that would be strange would be to stop somewhere in the middle of a message but then send a TLS close notify (i.e. boost::asio:ssl::stream::async_shutdown() or SSL_shutdown()).

@Al2Klimov Al2Klimov merged commit 8f51f54 into master Nov 29, 2024
@Al2Klimov Al2Klimov deleted the Al2Klimov-patch-7 branch November 29, 2024 08:24
@Al2Klimov Al2Klimov added the backported Fix was included in a bugfix release label Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/distributed Distributed monitoring (master, satellites, clients) backported Fix was included in a bugfix release cla/signed enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants