-
Notifications
You must be signed in to change notification settings - Fork 111
fix: resolve indefinite hang in ztls handshake #894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughReworks tlsHandshakeWithTimeout to perform the TLS handshake asynchronously in a separate goroutine and wait on an error channel; returns nil when error equals tls.ErrCertsOnly and returns a timeout-specific error when the context deadline elapses. Adds a test validating client-side handshake timeout behavior. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Handshake as Handshake Goroutine
participant ErrChan as Error Channel
participant CTX as Context
Client->>+Handshake: start goroutine to run tlsConn.Handshake()
Client->>CTX: set deadline / timeout
Handshake->>Handshake: execute Handshake()
alt Handshake completes before timeout
Handshake->>ErrChan: send error (or nil)
Client->>ErrChan: receive result
alt result == tls.ErrCertsOnly
Client-->>Client: treat as nil (success)
else other error or nil
Client-->>Client: return received error (or nil)
end
else Context deadline exceeded first
CTX-->>Client: Done (timeout)
Client-->>Client: return timeout-specific error
end
Handshake-->>-Client: goroutine ends
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/tlsx/ztls/ztls_timeout_test.go`:
- Around line 26-65: The test leaks a goroutine because hangingConn.Read/Write
block forever (they use a bare select) and the server side net.Pipe() was never
closed; update the test and hangingConn so reads/writes are cancellable: create
the pipe as clientConn, serverConn := net.Pipe() and defer serverConn.Close(),
make hangingConn hold a done channel (or use a sync.Once-closeable signal) that
is closed in hangingConn.Close() (and call
clientConn.Close()/hangingConn.Close() in defer), and change hangingConn.Read
and hangingConn.Write to select on the done channel (return net.ErrClosed or a
timeout-like error) as well as the real underlying Conn so the goroutine spawned
by Client.tlsHandshakeWithTimeout can unblock and exit when the test cancels the
context or closes the conn.
🧹 Nitpick comments (2)
pkg/tlsx/ztls/ztls.go (1)
324-339: The fix correctly resolves the indefinite hang — goroutine + buffered channel is the right approach.Two observations:
Context parameter ordering: Go convention (
contextpackage docs) is forctx context.Contextto be the first parameter. This is a pre-existing issue and changing it would touch all call sites, so it can be deferred.Goroutine lifecycle on timeout: When the context fires, this function returns immediately while the goroutine remains blocked in
Handshake(). This is fine because callers (ConnectWithOptionsat line 130,EnumerateCiphersat line 264) close the underlying connection via defers, which will unblock the stuckHandshake()goroutine. The buffered channel (make(chan error, 1)) ensures the goroutine can write its result without blocking even if nobody is listening. Worth adding a brief comment to document this contract for future maintainers.Suggested comment
// tlsHandshakeWithCtx attempts tls handshake with given timeout func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error { + // Buffered so the goroutine can always send even if we return on timeout. + // Callers must close tlsConn (or the underlying net.Conn) after a timeout + // to unblock the goroutine. errChan := make(chan error, 1)pkg/tlsx/ztls/ztls_timeout_test.go (1)
8-8: Nit:"strings"import should be grouped with standard library imports.Move
"strings"before"testing"to maintain alphabetical order within the stdlib import block (or at least keep it adjacent to the other stdlib imports).import ( "context" "net" + "strings" "testing" "time" - "strings"
| func TestTLSHandshakeHang(t *testing.T) { | ||
| // create a pipe to simulate a connection | ||
| // we wrap client side to ensure it hangs if it tries to read/write | ||
| clientConn, _ := net.Pipe() | ||
| defer clientConn.Close() | ||
|
|
||
| hanging := hangingConn{Conn: clientConn} | ||
|
|
||
| // create a tls connection using the hanging connection | ||
| // we don't need a real server because we want to test the client-side timeout | ||
| // when the "network" hangs during handshake hello or similar. | ||
| config := &tls.Config{ | ||
| InsecureSkipVerify: true, | ||
| } | ||
| tlsConn := tls.Client(hanging, config) | ||
|
|
||
| // Create a dummy client just to call the method | ||
| client := &Client{} | ||
|
|
||
| // context with short timeout | ||
| ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond) | ||
| defer cancel() | ||
|
|
||
| start := time.Now() | ||
| err := client.tlsHandshakeWithTimeout(tlsConn, ctx) | ||
| duration := time.Since(start) | ||
|
|
||
| // Check if it respected timeout | ||
| if duration > 1*time.Second { | ||
| t.Errorf("Handshake took too long: %v, expected ~200ms", duration) | ||
| } | ||
|
|
||
| if err == nil { | ||
| t.Error("Expected timeout error, got nil") | ||
| } else { | ||
| // verify it's the right error | ||
| if !strings.Contains(err.Error(), "timeout") { | ||
| t.Errorf("Expected timeout error, got: %v", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goroutine leak in test: the spawned handshake goroutine can never be unblocked.
hangingConn.Read/Write use a bare select {} which blocks forever — even after clientConn.Close() is called. This means the goroutine spawned inside tlsHandshakeWithTimeout will permanently leak for each test run. In production code the caller closes the real net.Conn, which unblocks the zcrypto handshake; the test should simulate this too.
Also, the server side of net.Pipe() (line 29, assigned to _) is never closed.
A simple fix: use a channel to make hangingConn cancellable so the goroutine can exit after the test.
Suggested fix
-type hangingConn struct {
- net.Conn
+type hangingConn struct {
+ net.Conn
+ done chan struct{}
}
-func (h hangingConn) Read(b []byte) (n int, err error) {
- select {}
+func (h hangingConn) Read(b []byte) (n int, err error) {
+ <-h.done
+ return 0, net.ErrClosed
}
-func (h hangingConn) Write(b []byte) (n int, err error) {
- select {}
+func (h hangingConn) Write(b []byte) (n int, err error) {
+ <-h.done
+ return 0, net.ErrClosed
}
func TestTLSHandshakeHang(t *testing.T) {
- clientConn, _ := net.Pipe()
+ clientConn, serverConn := net.Pipe()
defer clientConn.Close()
+ defer serverConn.Close()
- hanging := hangingConn{Conn: clientConn}
+ done := make(chan struct{})
+ defer close(done) // unblocks the goroutine after test assertions
+ hanging := hangingConn{Conn: clientConn, done: done}🧰 Tools
🪛 ast-grep (0.40.5)
[warning] 36-38: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
InsecureSkipVerify: true,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🤖 Prompt for AI Agents
In `@pkg/tlsx/ztls/ztls_timeout_test.go` around lines 26 - 65, The test leaks a
goroutine because hangingConn.Read/Write block forever (they use a bare select)
and the server side net.Pipe() was never closed; update the test and hangingConn
so reads/writes are cancellable: create the pipe as clientConn, serverConn :=
net.Pipe() and defer serverConn.Close(), make hangingConn hold a done channel
(or use a sync.Once-closeable signal) that is closed in hangingConn.Close() (and
call clientConn.Close()/hangingConn.Close() in defer), and change
hangingConn.Read and hangingConn.Write to select on the done channel (return
net.ErrClosed or a timeout-like error) as well as the real underlying Conn so
the goroutine spawned by Client.tlsHandshakeWithTimeout can unblock and exit
when the test cancels the context or closes the conn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dogancanbakir I have refactored the test case (ztls_timeout_test.go) to address the goroutine leak noted by CodeRabbit. The test still passes successfully. Ready for final review.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Description
This PR fixes the indefinite hang reported in #819.
The
tlsHandshakeWithTimeoutfunction inpkg/tlsx/ztls/ztls.gowas previously blocking. I have refactored it to run in a separate goroutine with a buffered channel, ensuring the context timeout is strictly respected.Proof
Verified with a new test case

TestTLSHandshakeHangthat simulates a hanging connection.Fixes #819
/claim #819
Summary by CodeRabbit
Bug Fixes
Tests