Skip to content

Conversation

@Scatheus
Copy link

@Scatheus Scatheus commented Feb 9, 2026

Description

This PR fixes the indefinite hang reported in #819.

The tlsHandshakeWithTimeout function in pkg/tlsx/ztls/ztls.go was previously blocking. I have refactored it to run in a separate goroutine with a buffered channel, ensuring the context timeout is strictly respected.

Proof

Verified with a new test case TestTLSHandshakeHang that simulates a hanging connection.
proof41

Fixes #819
/claim #819

Summary by CodeRabbit

  • Bug Fixes

    • Improved TLS handshake timeout handling and error normalization so certificate-only responses are treated as successful and timeouts are reliably reported.
  • Tests

    • Added test coverage validating TLS client handshake timeout behavior to ensure timely timeout detection.

@coderabbitai
Copy link

coderabbitai bot commented Feb 9, 2026

Walkthrough

Reworks tlsHandshakeWithTimeout to perform the TLS handshake asynchronously in a separate goroutine and wait on an error channel; returns nil when error equals tls.ErrCertsOnly and returns a timeout-specific error when the context deadline elapses. Adds a test validating client-side handshake timeout behavior.

Changes

Cohort / File(s) Summary
TLS handshake refactor
pkg/tlsx/ztls/ztls.go
Run TLS handshake in a dedicated goroutine and select on an error channel; remove synchronous Handshake call inside select; treat tls.ErrCertsOnly as success (nil); return context deadline timeout error when signaled.
Timeout test added
pkg/tlsx/ztls/ztls_timeout_test.go
New test exercising client-side handshake timeout using a hanging simulated connection and a 200ms context deadline; asserts timeout error and bounded duration.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handshake as Handshake Goroutine
    participant ErrChan as Error Channel
    participant CTX as Context

    Client->>+Handshake: start goroutine to run tlsConn.Handshake()
    Client->>CTX: set deadline / timeout
    Handshake->>Handshake: execute Handshake()
    alt Handshake completes before timeout
        Handshake->>ErrChan: send error (or nil)
        Client->>ErrChan: receive result
        alt result == tls.ErrCertsOnly
            Client-->>Client: treat as nil (success)
        else other error or nil
            Client-->>Client: return received error (or nil)
        end
    else Context deadline exceeded first
        CTX-->>Client: Done (timeout)
        Client-->>Client: return timeout-specific error
    end
    Handshake-->>-Client: goroutine ends
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped a goroutine to end the freeze,
Channels hum softly beneath the trees,
A timeout bell rings, the handshake wakes,
No more hangs, just tidy stakes. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing an indefinite hang in the ztls handshake by refactoring tlsHandshakeWithTimeout to use asynchronous goroutine-based logic.
Linked Issues check ✅ Passed The changes directly address issue #819 by refactoring tlsHandshakeWithTimeout to run the handshake in a goroutine and properly respect context timeouts, preventing indefinite hangs.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the indefinite hang: refactoring ztls.go handshake logic and adding regression test in ztls_timeout_test.go, with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/tlsx/ztls/ztls_timeout_test.go`:
- Around line 26-65: The test leaks a goroutine because hangingConn.Read/Write
block forever (they use a bare select) and the server side net.Pipe() was never
closed; update the test and hangingConn so reads/writes are cancellable: create
the pipe as clientConn, serverConn := net.Pipe() and defer serverConn.Close(),
make hangingConn hold a done channel (or use a sync.Once-closeable signal) that
is closed in hangingConn.Close() (and call
clientConn.Close()/hangingConn.Close() in defer), and change hangingConn.Read
and hangingConn.Write to select on the done channel (return net.ErrClosed or a
timeout-like error) as well as the real underlying Conn so the goroutine spawned
by Client.tlsHandshakeWithTimeout can unblock and exit when the test cancels the
context or closes the conn.
🧹 Nitpick comments (2)
pkg/tlsx/ztls/ztls.go (1)

324-339: The fix correctly resolves the indefinite hang — goroutine + buffered channel is the right approach.

Two observations:

  1. Context parameter ordering: Go convention (context package docs) is for ctx context.Context to be the first parameter. This is a pre-existing issue and changing it would touch all call sites, so it can be deferred.

  2. Goroutine lifecycle on timeout: When the context fires, this function returns immediately while the goroutine remains blocked in Handshake(). This is fine because callers (ConnectWithOptions at line 130, EnumerateCiphers at line 264) close the underlying connection via defers, which will unblock the stuck Handshake() goroutine. The buffered channel (make(chan error, 1)) ensures the goroutine can write its result without blocking even if nobody is listening. Worth adding a brief comment to document this contract for future maintainers.

Suggested comment
 // tlsHandshakeWithCtx attempts tls handshake with given timeout
 func (c *Client) tlsHandshakeWithTimeout(tlsConn *tls.Conn, ctx context.Context) error {
+	// Buffered so the goroutine can always send even if we return on timeout.
+	// Callers must close tlsConn (or the underlying net.Conn) after a timeout
+	// to unblock the goroutine.
 	errChan := make(chan error, 1)
pkg/tlsx/ztls/ztls_timeout_test.go (1)

8-8: Nit: "strings" import should be grouped with standard library imports.

Move "strings" before "testing" to maintain alphabetical order within the stdlib import block (or at least keep it adjacent to the other stdlib imports).

 import (
 	"context"
 	"net"
+	"strings"
 	"testing"
 	"time"
-	"strings"

Comment on lines 26 to 65
func TestTLSHandshakeHang(t *testing.T) {
// create a pipe to simulate a connection
// we wrap client side to ensure it hangs if it tries to read/write
clientConn, _ := net.Pipe()
defer clientConn.Close()

hanging := hangingConn{Conn: clientConn}

// create a tls connection using the hanging connection
// we don't need a real server because we want to test the client-side timeout
// when the "network" hangs during handshake hello or similar.
config := &tls.Config{
InsecureSkipVerify: true,
}
tlsConn := tls.Client(hanging, config)

// Create a dummy client just to call the method
client := &Client{}

// context with short timeout
ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
defer cancel()

start := time.Now()
err := client.tlsHandshakeWithTimeout(tlsConn, ctx)
duration := time.Since(start)

// Check if it respected timeout
if duration > 1*time.Second {
t.Errorf("Handshake took too long: %v, expected ~200ms", duration)
}

if err == nil {
t.Error("Expected timeout error, got nil")
} else {
// verify it's the right error
if !strings.Contains(err.Error(), "timeout") {
t.Errorf("Expected timeout error, got: %v", err)
}
}
Copy link

@coderabbitai coderabbitai bot Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Goroutine leak in test: the spawned handshake goroutine can never be unblocked.

hangingConn.Read/Write use a bare select {} which blocks forever — even after clientConn.Close() is called. This means the goroutine spawned inside tlsHandshakeWithTimeout will permanently leak for each test run. In production code the caller closes the real net.Conn, which unblocks the zcrypto handshake; the test should simulate this too.

Also, the server side of net.Pipe() (line 29, assigned to _) is never closed.

A simple fix: use a channel to make hangingConn cancellable so the goroutine can exit after the test.

Suggested fix
-type hangingConn struct {
-	net.Conn
+type hangingConn struct {
+	net.Conn
+	done chan struct{}
 }
 
-func (h hangingConn) Read(b []byte) (n int, err error) {
-	select {}
+func (h hangingConn) Read(b []byte) (n int, err error) {
+	<-h.done
+	return 0, net.ErrClosed
 }
 
-func (h hangingConn) Write(b []byte) (n int, err error) {
-	select {}
+func (h hangingConn) Write(b []byte) (n int, err error) {
+	<-h.done
+	return 0, net.ErrClosed
 }
 
 func TestTLSHandshakeHang(t *testing.T) {
-	clientConn, _ := net.Pipe()
+	clientConn, serverConn := net.Pipe()
 	defer clientConn.Close()
+	defer serverConn.Close()
 
-	hanging := hangingConn{Conn: clientConn}
+	done := make(chan struct{})
+	defer close(done) // unblocks the goroutine after test assertions
+	hanging := hangingConn{Conn: clientConn, done: done}
🧰 Tools
🪛 ast-grep (0.40.5)

[warning] 36-38: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
InsecureSkipVerify: true,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

🤖 Prompt for AI Agents
In `@pkg/tlsx/ztls/ztls_timeout_test.go` around lines 26 - 65, The test leaks a
goroutine because hangingConn.Read/Write block forever (they use a bare select)
and the server side net.Pipe() was never closed; update the test and hangingConn
so reads/writes are cancellable: create the pipe as clientConn, serverConn :=
net.Pipe() and defer serverConn.Close(), make hangingConn hold a done channel
(or use a sync.Once-closeable signal) that is closed in hangingConn.Close() (and
call clientConn.Close()/hangingConn.Close() in defer), and change
hangingConn.Read and hangingConn.Write to select on the done channel (return
net.ErrClosed or a timeout-like error) as well as the real underlying Conn so
the goroutine spawned by Client.tlsHandshakeWithTimeout can unblock and exit
when the test cancels the context or closes the conn.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dogancanbakir I have refactored the test case (ztls_timeout_test.go) to address the goroutine leak noted by CodeRabbit. The test still passes successfully. Ready for final review.
proof43

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tlsx hangs indefinitely for some hosts

1 participant