fix(basic_host): stream.Close() blocks indefinitely on unresponsive peers #3448
Merged
This was referenced Jan 7, 2026
gammazero approved these changes on Jan 7, 2026.
MarcoPolo approved these changes on Jan 7, 2026.
Collaborator: I made some small changes to move this to a simnet + synctest environment so we don't have to wait 10s for the test.
…t blocking

streamWrapper.Close() can block indefinitely when the remote peer is slow or unresponsive while the multistream-select handshake is being completed. Lazy multistream protocol negotiation defers reading the handshake response until Close() is called. If the remote peer doesn't respond, that read blocks forever and the calling goroutine leaks. This is particularly problematic for bitswap servers, where taskWorkers can get stuck trying to close streams after sending blocks.

The fix sets a read deadline (using DefaultNegotiationTimeout) before calling the multistream Close(), so the operation times out instead of blocking indefinitely.

Related: multiformats/go-multistream#47
Related: multiformats/go-multistream#48
Contributor: Is the use of …
Collaborator: No. The test is in a new file with a build tag, so it only runs when testing with Go 1.25.
Summary
stream.Close() can block forever when the remote peer is slow or unresponsive, causing goroutine leaks. This particularly affects WebRTC (with its SCTP state machines) and bitswap servers, and can eventually prevent nodes from serving blocks.

Note: this PR aims to be a surgical fix; perhaps there is a better way?
Affected Versions
Symptoms
vole bitswap check returns Responded: false

Likely the Root Cause
When the protocol is already known from identify, host.NewStream() returns a lazy multistream wrapper that defers handshake completion until Close():

ReadNextToken() reads from the stream with no deadline. If the peer doesn't respond, the goroutine blocks forever.

Why This Happens?
- … Close()
- Close() must complete the multistream handshake before closing

This affects all transports, but WebRTC is more prone to it because of NAT traversal issues and its complex SCTP state machines.
Evidence
Goroutine profile from production node (35 days uptime):
sendBlocks → Close() path

Proposed Fix (this PR)
Set a read deadline before calling multistream Close():

This uses the existing 10-second DefaultNegotiationTimeout.

Why this location?
Why not fix in go-multistream?
Fix Verification
- s.Stream and lazyClientConn.con are the same object
- SetReadDeadline affects in-progress reads (per the net.Conn documentation, not the Go language spec)
- … lazyClientConn.Close() regardless

Related
… collab-cluster-am6-1)

Workaround (without this PR)
Restart affected nodes to clear stuck goroutines.