feat(go): introduce context.Context to client methods#2964

Open
chengxilo wants to merge 18 commits into
apache:masterfrom
chengxilo:introduce-context

Conversation

@chengxilo
Contributor

@chengxilo chengxilo commented Mar 18, 2026

Which issue does this PR close?

N/A

Rationale

This PR standardizes the Go SDK on context.Context across all client methods, which lets users handle timeouts and cancellations.

What changed?

The iggcon.Client interface and its TCP implementation have been updated to include context.Context as the first parameter for all API methods.
Correspondingly, all internal command-handling logic now respects context deadlines and cancellations.
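
As a rough illustration of the context-first method shape described above, here is a minimal sketch. The `Pinger` interface, `slowServer` type, and helper names are hypothetical stand-ins and are not taken from the SDK; only the `context` calls are real standard-library API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Pinger is a hypothetical, cut-down stand-in for the SDK client interface;
// it only illustrates the context-first method shape.
type Pinger interface {
	Ping(ctx context.Context) error
}

// slowServer simulates a server that needs `delay` to answer a request.
type slowServer struct{ delay time.Duration }

// Ping returns nil once the simulated reply arrives, or ctx.Err() if the
// context is cancelled or times out first.
func (s slowServer) Ping(ctx context.Context) error {
	timer := time.NewTimer(s.delay)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// pingWithin bounds a single call with its own timeout.
func pingWithin(p Pinger, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return p.Ping(ctx)
}

// demoTimeout reports whether a slow call is cut short by the deadline.
func demoTimeout() bool {
	err := pingWithin(slowServer{delay: 2 * time.Second}, 20*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

// demoSuccess reports whether a fast call completes within its deadline.
func demoSuccess() bool {
	return pingWithin(slowServer{delay: time.Millisecond}, 2*time.Second) == nil
}

func main() {
	fmt.Println("slow call timed out:", demoTimeout())
	fmt.Println("fast call succeeded:", demoSuccess())
}
```

Callers that do not need a deadline can keep passing `context.Background()`, so the extra parameter costs them one argument per call.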

Local Execution

  • Passed
  • Pre-commit hooks ran

AI Usage

  1. Claude Sonnet 4.6
  2. Add context parameters, initial implementation of test (modified manually)
  3. I read the code and test
  4. yes

@chengxilo chengxilo marked this pull request as draft March 18, 2026 05:24
@codecov

codecov Bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 75.30120% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.16%. Comparing base (54fe2bb) to head (c5c7cb9).
⚠️ Report is 54 commits behind head on master.

Files with missing lines                         Patch %  Lines
foreign/go/samples/consumer/consumer.go          0.00%    12 Missing ⚠️
foreign/go/samples/producer/producer.go          0.00%    12 Missing ⚠️
foreign/go/client/tcp/tcp_core.go                80.00%   8 Missing and 1 partial ⚠️
foreign/go/client/iggy_client.go                 63.63%   4 Missing ⚠️
foreign/go/client/tcp/tcp_clients_management.go  50.00%   2 Missing ⚠️
foreign/go/client/tcp/tcp_session_management.go  83.33%   2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2964      +/-   ##
============================================
- Coverage     73.29%   73.16%   -0.14%     
  Complexity      943      943              
============================================
  Files          1126     1126              
  Lines         98435    99069     +634     
  Branches      75608    75611       +3     
============================================
+ Hits          72148    72479     +331     
- Misses        23683    23984     +301     
- Partials       2604     2606       +2     
Components  Coverage Δ
Rust Core   74.15% <ø> (-0.03%) ⬇️
Java SDK    62.30% <ø> (ø)
C# SDK      69.40% <ø> (-0.04%) ⬇️
Python SDK  81.43% <ø> (ø)
Node SDK    91.53% <ø> (+0.09%) ⬆️
Go SDK      41.19% <75.30%> (+1.78%) ⬆️
Files with missing lines                               Coverage Δ
foreign/go/client/tcp/cluster.go                       45.45% <100.00%> (-10.11%) ⬇️
...reign/go/client/tcp/tcp_access_token_management.go  100.00% <100.00%> (ø)
...ign/go/client/tcp/tcp_consumer_group_management.go  95.58% <100.00%> (-1.24%) ⬇️
foreign/go/client/tcp/tcp_messaging.go                 80.00% <100.00%> (-5.19%) ⬇️
foreign/go/client/tcp/tcp_offset_management.go         100.00% <100.00%> (ø)
foreign/go/client/tcp/tcp_partition_management.go      100.00% <100.00%> (ø)
foreign/go/client/tcp/tcp_stream_management.go         79.06% <100.00%> (-3.79%) ⬇️
foreign/go/client/tcp/tcp_topic_management.go          76.19% <100.00%> (-4.95%) ⬇️
foreign/go/client/tcp/tcp_user_management.go           89.09% <100.00%> (-2.91%) ⬇️
foreign/go/client/tcp/tcp_utilities.go                 75.00% <100.00%> (-5.00%) ⬇️
... and 7 more

... and 39 files with indirect coverage changes

@chengxilo chengxilo marked this pull request as ready for review March 18, 2026 06:03
Comment thread foreign/go/client/tcp/tcp_core.go Outdated
Comment thread foreign/go/client/tcp/tcp_core.go Outdated
Comment thread foreign/go/client/tcp/tcp_utilities.go
Comment thread foreign/go/client/tcp/tcp_core.go
Comment thread foreign/go/contracts/client.go
Comment thread foreign/go/client/iggy_client.go Outdated
Comment thread foreign/go/contracts/client.go Outdated
Comment thread bdd/go/tests/leader_redirection.go Outdated
@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added the stale Inactive issue or pull request label Mar 26, 2026
@github-actions github-actions Bot removed the stale Inactive issue or pull request label Mar 27, 2026
@chengxilo chengxilo requested a review from hubcio March 31, 2026 19:15
Member

@ryankert01 ryankert01 left a comment

Lg, like the idea.

@github-actions github-actions Bot added stale Inactive issue or pull request and removed stale Inactive issue or pull request labels Apr 19, 2026
@github-actions github-actions Bot added stale Inactive issue or pull request and removed stale Inactive issue or pull request labels Apr 30, 2026
@github-actions

github-actions Bot commented May 9, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added stale Inactive issue or pull request and removed stale Inactive issue or pull request labels May 9, 2026
@hubcio
Contributor

hubcio commented May 14, 2026

/ready

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label May 14, 2026
@hubcio
Contributor

hubcio commented May 14, 2026

looks like CI is failing.

/author

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 14, 2026
Contributor

@hubcio hubcio left a comment

the PR doesn't compile. go build ./... fails:

client/tcp/tcp_segment_management.go:31:17: not enough arguments in call to c.do

DeleteSegments wasn't migrated - the impl in tcp_segment_management.go still calls c.do(&command.DeleteSegments{...}) without the new ctx arg, and the interface method in contracts/client.go still lacks ctx. that file isn't part of this PR's diff so it can't be commented on inline, but ctx needs threading through both the interface signature and the impl, same as every other client method. go vet flags the same.

the rest of the findings are inline on the Files changed tab.

// clear the deadline after the operation is done.
var deadlineMu sync.Mutex

stop := context.AfterFunc(ctx, func() {
Contributor

the context.AfterFunc callback reads c.conn without holding c.mtx, and stop() does not wait for an already-started callback - it can return false while the callback goroutine is still running. once sendAndFetchResponse returns and the mutex is released, connect() can reassign c.conn at line 459, so the callback's read races that write. worse, the callback then calls SetDeadline(time.Now()) on whatever c.conn points to, which can be a fresh healthy connection, poisoning it into an immediate timeout. fix: snapshot conn := c.conn under the mutex before registering the callback and have the callback use the snapshot. a ctx.Done() == nil fast-path also closes the window for context.Background() callers.
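
A minimal sketch of the snapshot fix suggested above, assuming a cut-down `client` type with only `mtx` and `conn`. The method name `watchCancelLocked` and the demo harness are hypothetical and not the PR's code. The demo uses `net.Pipe` to stand in for real TCP connections and swaps `c.conn` the way a reconnect would.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"sync"
	"time"
)

// client is a hypothetical stand-in for the TCP client under review.
type client struct {
	mtx  sync.Mutex
	conn net.Conn
}

// watchCancelLocked must be called with c.mtx held. It captures the current
// connection so the callback never reads c.conn after the lock is released.
func (c *client) watchCancelLocked(ctx context.Context) (stop func() bool) {
	if ctx.Done() == nil {
		// The context can never be cancelled: nothing to watch.
		return func() bool { return true }
	}
	conn := c.conn // snapshot taken under the mutex
	return context.AfterFunc(ctx, func() {
		// Interrupt blocked I/O on the snapshot only.
		_ = conn.SetDeadline(time.Now())
	})
}

// demoSnapshot registers a callback, swaps c.conn as a reconnect would, then
// cancels. It reports whether the old connection was interrupted and whether
// the replacement stayed usable.
func demoSnapshot() (oldInterrupted, newHealthy bool) {
	oldConn, oldPeer := net.Pipe()
	newConn, newPeer := net.Pipe()
	for _, cn := range []net.Conn{oldConn, oldPeer, newConn, newPeer} {
		defer cn.Close()
	}

	c := &client{conn: oldConn}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	c.mtx.Lock()
	stop := c.watchCancelLocked(ctx)
	c.mtx.Unlock()
	defer stop()

	c.mtx.Lock()
	c.conn = newConn // simulated reconnect
	c.mtx.Unlock()

	cancel() // fires the callback in its own goroutine

	buf := make([]byte, 2)
	_, err := oldConn.Read(buf) // unblocks once the callback sets the deadline
	oldInterrupted = errors.Is(err, os.ErrDeadlineExceeded)

	go func() { _, _ = newPeer.Write([]byte("ok")) }()
	n, err := newConn.Read(buf)
	newHealthy = err == nil && n == 2
	return oldInterrupted, newHealthy
}

func main() {
	oldInterrupted, newHealthy := demoSnapshot()
	fmt.Println("old connection interrupted:", oldInterrupted)
	fmt.Println("new connection healthy:", newHealthy)
}
```

Because the callback closes over the local `conn` rather than the struct field, a late-running callback can only ever touch the connection that was current when the operation started.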

return nil, err
}

// deadlineMu makes sure that the deadline won't be set to now by the
Contributor

the comment claims deadlineMu stops the callback from setting the deadline to now right after clearDeadline clears it, but a mutex only gives mutual exclusion, not ordering. if clearDeadline acquires deadlineMu first it sets the deadline to zero, then the callback runs and sets it to time.Now(), leaving the connection with a past deadline. needs a cleared flag set under deadlineMu that the callback checks, not just the mutex.
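
One way the cleared flag could look, as a sketch only. The `deadlineGuard` type and its method names are hypothetical. The demo forces the problematic ordering (cleanup first, callback second) and checks that the connection is still readable afterwards.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// deadlineGuard is a hypothetical helper that pairs the mutex with a flag, so
// the ordering between cleanup and a late cancellation callback stops mattering.
type deadlineGuard struct {
	mu      sync.Mutex
	cleared bool
	conn    net.Conn
}

// interrupt is what the AfterFunc callback would call on cancellation.
func (g *deadlineGuard) interrupt() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.cleared {
		return // the operation already finished; leave the connection alone
	}
	_ = g.conn.SetDeadline(time.Now())
}

// clear is called once the operation is done. It resets the deadline and
// disarms any callback that has not taken the lock yet.
func (g *deadlineGuard) clear() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.cleared = true
	_ = g.conn.SetDeadline(time.Time{})
}

// demoLateInterrupt runs clear before interrupt and reports whether the
// connection can still complete a read.
func demoLateInterrupt() bool {
	conn, peer := net.Pipe()
	defer conn.Close()
	defer peer.Close()

	g := &deadlineGuard{conn: conn}
	g.clear()     // operation finished first
	g.interrupt() // cancellation callback arrives late

	go func() { _, _ = peer.Write([]byte("ok")) }()
	buf := make([]byte, 2)
	n, err := conn.Read(buf)
	return err == nil && n == 2
}

func main() {
	fmt.Println("connection usable after late callback:", demoLateInterrupt())
}
```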

-func (c *IggyTcpClient) sendAndFetchResponse(message []byte, command command.Code) ([]byte, error) {
+func (c *IggyTcpClient) sendAndFetchResponse(ctx context.Context, message []byte, command command.Code) ([]byte, error) {
+	if ctx == nil {
+		return nil, errors.New("nil context")
Contributor

errors.New("nil context") is an untyped error - callers can't errors.Is it. every other error path here returns typed values (context.Canceled, context.DeadlineExceeded, ierror.*). make this a sentinel like ierror.ErrNilContext for consistency at the SDK boundary.
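
A short sketch of the sentinel approach. `ErrNilContext` is declared locally here for illustration; the review suggests it would live in the `ierror` package. The `send` and `sendWrapped` functions are hypothetical stand-ins that only validate the context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrNilContext is a hypothetical sentinel, declared locally for this sketch.
var ErrNilContext = errors.New("nil context")

// send stands in for sendAndFetchResponse and only validates the context.
func send(ctx context.Context, payload []byte) error {
	if ctx == nil {
		return ErrNilContext
	}
	return ctx.Err()
}

// sendWrapped adds call-site detail with %w, so the sentinel stays matchable
// through the wrapping.
func sendWrapped(ctx context.Context, payload []byte) error {
	if err := send(ctx, payload); err != nil {
		return fmt.Errorf("send %d bytes: %w", len(payload), err)
	}
	return nil
}

// demoNilContext reports whether a caller can detect the nil-context case
// with errors.Is.
func demoNilContext() bool {
	var ctx context.Context // deliberately nil
	return errors.Is(sendWrapped(ctx, []byte("hi")), ErrNilContext)
}

func main() {
	fmt.Println("nil context detectable with errors.Is:", demoNilContext())
}
```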

}

// invalidateConnLocked closes the connection and marks it as disconnected
func (c *IggyTcpClient) invalidateConnLocked() {
Contributor

invalidateConnLocked closes the conn and sets StateDisconnected on every i/o error, but nothing on the operation path ever reconnects - connect() is only called from NewIggyTcpClient and the login-after-redirect path, and the heartbeat goroutine just logs ping failures. so after the first transient i/o error the client is permanently dead: every later call writes to the closed conn and fails forever, and the reconnection config (maxRetries/interval) becomes dead code on this path. pre-PR an i/o error left the conn open and state unchanged. either wire up a reconnect trigger here or document this as a hard behavior change.

Contributor Author

Yeah, I almost forgot about this. I was planning to solve this in another PR. Maybe I can create a PR to implement the reconnect logic first.
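
For discussion, one possible shape of a reconnect trigger on the operation path. This is a sketch under assumptions: the `reconnector` type, its fields, and `ensureConnected` are hypothetical, and the real client would need to run this with its mutex held and reuse its existing reconnection config.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// reconnector is a hypothetical stand-in holding only what the sketch needs.
// It is not safe for concurrent use; a real client would guard it with its mutex.
type reconnector struct {
	connected  bool
	maxRetries int
	interval   time.Duration
	dial       func(ctx context.Context) error
}

// ensureConnected redials lazily before an operation, making up to
// maxRetries+1 attempts and honouring ctx while it waits between attempts.
func (r *reconnector) ensureConnected(ctx context.Context) error {
	if r.connected {
		return nil
	}
	var lastErr error
	for attempt := 0; attempt <= r.maxRetries; attempt++ {
		if attempt > 0 {
			timer := time.NewTimer(r.interval)
			select {
			case <-timer.C:
			case <-ctx.Done():
				timer.Stop()
				return ctx.Err()
			}
		}
		if lastErr = r.dial(ctx); lastErr == nil {
			r.connected = true
			return nil
		}
	}
	return lastErr
}

// demoReconnect uses a dialer that fails twice and then succeeds.
func demoReconnect() (ok bool, dials int) {
	r := &reconnector{maxRetries: 3, interval: time.Millisecond}
	r.dial = func(context.Context) error {
		dials++
		if dials < 3 {
			return errors.New("connection refused")
		}
		return nil
	}
	err := r.ensureConnected(context.Background())
	return err == nil && r.connected, dials
}

func main() {
	ok, dials := demoReconnect()
	fmt.Println("reconnected:", ok, "after dials:", dials)
}
```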

// deadlineMu makes sure that the deadline won't be set to now by the
// AfterFunc callback right after we call SetDeadline(time.Time{}) to
// clear the deadline after the operation is done.
var deadlineMu sync.Mutex
Contributor

var deadlineMu sync.Mutex plus the context.AfterFunc registration and the clearDeadline/ioError closures are allocated on every sendAndFetchResponse call - including the hot SendMessages/PollMessages path and context.Background() callers whose ctx.Done() is nil and can never fire. a ctx.Done() == nil fast-path that skips the deadline machinery entirely avoids the waste, and as noted on the AfterFunc line also removes the c.conn race for non-cancellable contexts.
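
The fast-path check relies on documented `context` behaviour: `Done` may return nil for a context that can never be cancelled. A small sketch, where the `cancellable` helper is a hypothetical name:

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a private key type for the WithValue example.
type ctxKey struct{}

// cancellable reports whether ctx can ever be cancelled. A nil Done channel
// means no deadline or cancellation can fire, so watchers can be skipped.
func cancellable(ctx context.Context) bool {
	return ctx.Done() != nil
}

// demoFastPath evaluates the check for three common kinds of context.
func demoFastPath() (background, withValue, withCancel bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return cancellable(context.Background()),
		cancellable(context.WithValue(context.Background(), ctxKey{}, "v")),
		cancellable(ctx)
}

func main() {
	background, withValue, withCancel := demoFastPath()
	fmt.Println("Background cancellable:", background)
	fmt.Println("WithValue(Background) cancellable:", withValue)
	fmt.Println("WithCancel cancellable:", withCancel)
}
```

Only the `WithCancel` context reports true here, so a guard of this kind would let `context.Background()` callers skip the mutex, closures, and `AfterFunc` registration altogether.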

return nil, err
}
-return c.LoginUser(username, password)
+return c.LoginUser(ctx, username, password)
Contributor

also at line 71 in LoginWithPersonalAccessToken. the recursion re-passes the same ctx, and the first c.do's AfterFunc callback can still be live when connect() (line 46 / 68) reassigns c.conn - same c.conn race flagged in tcp_core.go, reachable single-threaded here. fixing the callback to use a conn snapshot covers this too.

log.Println("Checking cluster metadata for leader detection")

-meta, err := c.GetClusterMetadata()
+meta, err := c.GetClusterMetadata(ctx)
Contributor

if ctx is cancelled during this GetClusterMetadata call, sendAndFetchResponse closes the connection via invalidateConnLocked and returns ctx.Err() - but the if err != nil block just below logs it and returns ("", nil), swallowing the cancellation. HandleLeaderRedirection then returns (false, nil) and LoginUser returns identity, nil: a successful login on a dead connection. the swallow is intentional for non-clustered servers, so don't propagate all errors blindly - check ctx.Err() specifically and return it before the swallow.
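
A sketch of the suggested check, assuming a simplified helper. `leaderAddress` and its `getMetadata` parameter are hypothetical and do not mirror the real function's signature. Context errors are surfaced first; every other metadata error is still swallowed as "no redirect".

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// leaderAddress is a hypothetical stand-in for the leader-detection helper.
// getMetadata plays the role of GetClusterMetadata.
func leaderAddress(ctx context.Context, getMetadata func(context.Context) (string, error)) (string, error) {
	addr, err := getMetadata(ctx)
	if err != nil {
		// Surface cancellation and deadline errors before the swallow.
		if ctxErr := ctx.Err(); ctxErr != nil {
			return "", ctxErr
		}
		// Any other failure is treated as "no redirect", for example on a
		// non-clustered server.
		return "", nil
	}
	return addr, nil
}

// demoSwallow reports whether cancellation surfaces and whether an unrelated
// metadata error is still swallowed.
func demoSwallow() (cancelledSurfaces, otherSwallowed bool) {
	failing := func(context.Context) (string, error) {
		return "", errors.New("metadata unavailable")
	}

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := leaderAddress(ctx, failing)
	cancelledSurfaces = errors.Is(err, context.Canceled)

	addr, err := leaderAddress(context.Background(), failing)
	otherSwallowed = addr == "" && err == nil
	return cancelledSurfaces, otherSwallowed
}

func main() {
	cancelledSurfaces, otherSwallowed := demoSwallow()
	fmt.Println("cancellation surfaces:", cancelledSurfaces)
	fmt.Println("other errors swallowed:", otherSwallowed)
}
```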

t.Errorf("got %v, want context.DeadlineExceeded", err)
}
// After a timeout, the connection should be invalidated.
if c.state != iggcon.StateDisconnected {
Contributor

this asserts c.state == StateDisconnected after a timeout, which codifies the sticky-failure behavior (closed conn, no reconnect path) as expected - if that gets fixed, this test changes. separately, every test here is single-operation: none exercise connect() reassigning c.conn while an AfterFunc callback from a prior op is still live, which is exactly the race this PR introduces. go test -race stays green only because of that gap. worth adding a concurrent-reassignment test.

@chengxilo
Contributor Author

chengxilo commented May 14, 2026

OK, it seems that something happened after merging(). Will start working on this on April 20.

Labels

S-waiting-on-author PR is waiting on author response

3 participants