Skip to content

fix: use fresh access token for SSE reconnection attempts#78

Open
devin-ai-integration[bot] wants to merge 1 commit into
mainfrom
devin/1778677087-fix-stale-token-sse-reconnect
Open

fix: use fresh access token for SSE reconnection attempts#78
devin-ai-integration[bot] wants to merge 1 commit into
mainfrom
devin/1778677087-fix-stale-token-sse-reconnect

Conversation

@devin-ai-integration

Copy link
Copy Markdown
Contributor

Summary

Fixes repeated SSE errors reported by customers during secrets reconciliation (OpenInstantUpdatesStream.func2).

Root cause: The requestFn closure passed to SubscribeWithParams captured the access token by value once at SSE setup time (line 861). When the SSE connection later dropped (network blip, server restart, or the backend's 1-hour CONNECTION_TIMEOUT_MS), the reconnect loop reused the original — now expired — token, causing all 5 retry attempts to fail with auth errors. Each failure was logged via the onError callback (func2), producing the error spike customers observed.

Fix: Replace the captured token variable with a live call to infisicalClient.Auth().GetAccessToken() inside the closure. The Go SDK manages token auto-refresh internally, so calling it at reconnection time returns a valid token.

Review & Testing Checklist for Human

  • Verify that infisicalClient.Auth().GetAccessToken() is safe to call from the reconnection goroutine (thread-safety of the Go SDK auth module)
  • Deploy to a staging cluster with instantUpdates: true, wait for the SSE connection to drop (or force-kill it), and confirm reconnection succeeds with a fresh token instead of producing auth errors
  • Confirm that the initial SSE connection still establishes correctly (no regression from removing the token variable)

Notes

The removed token variable on line 861 was only consumed by the closure — the slug resolution on line 866 already called GetAccessToken() independently.

Link to Devin session: https://app.devin.ai/sessions/e138b75048354fd2aa1f7f9f72cfd8ce
Requested by: @0xArshdeep

The requestFn closure passed to SubscribeWithParams captured the access
token by value at SSE setup time. When the connection dropped and the
reconnect loop fired, it reused the original (potentially expired) token,
causing repeated auth failures logged via the onError callback.

Now the closure calls infisicalClient.Auth().GetAccessToken() on each
connection/reconnection attempt, so the Go SDK's auto-refresh provides
a valid token.

Co-Authored-By: arsh <arshsb1998@gmail.com>
@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@0xArshdeep 0xArshdeep closed this May 18, 2026
@varonix0 varonix0 reopened this May 28, 2026
@greptile-apps

greptile-apps Bot commented May 28, 2026

Copy link
Copy Markdown

Greptile Summary

Fixes stale-token SSE reconnection errors by replacing a once-captured token variable with a live infisicalClient.Auth().GetAccessToken() call inside the requestFn closure, so every reconnection attempt (triggered by reconnectLoop or the health monitor) fetches a valid token instead of reusing one that expired after the backend's 1-hour CONNECTION_TIMEOUT_MS.

  • Stale token fix: Removes the token := infisicalClient.Auth().GetAccessToken() capture at setup time (line 861 of the old code) and calls GetAccessToken() directly inside the closure, which is already the pattern used elsewhere in the reconciler (e.g. slug-to-project-ID resolution at line 864).
  • Unchanged retry semantics: The SSE registry's reconnectLoop and checkConnectionHealth logic in sse.go are untouched; only the token source changes.

Confidence Score: 4/5

The change is minimal and correctly addresses the root cause; the main open question is whether the Go SDK's auth module is goroutine-safe when called from the background reconnect goroutine.

The two-line change is logically sound: calling GetAccessToken() inside the closure matches the existing pattern used on line 864 and eliminates the stale-token capture. The only unresolved concern is thread-safety of GetAccessToken() from a background goroutine — something the PR itself flags as requiring manual verification before merge.

internal/services/infisicalsecret/reconciler.go — confirm that Auth().GetAccessToken() is safe to call concurrently from the reconnect goroutine spawned by sse.go.

Important Files Changed

Filename Overview
internal/services/infisicalsecret/reconciler.go Removes a pre-captured token variable so the SSE reconnect closure fetches a fresh token via GetAccessToken() on every attempt; thread-safety of that call from background goroutines is noted as unverified in the PR checklist.

Reviews (1): Last reviewed commit: "fix: use fresh access token for SSE reco..." | Re-trigger Greptile

// even when reconnecting long after the original token expired.
httpClient, err := util.CreateRestyClient(model.CreateRestyClientOptions{
AccessToken: token,
AccessToken: infisicalClient.Auth().GetAccessToken(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unverified thread-safety of GetAccessToken() from reconnect goroutine

The closure is invoked from reconnectLoop and checkConnectionHealth, both of which run in background goroutines (see sse.go lines 370 and 551). If the Go SDK's auth module is not goroutine-safe — i.e., if a concurrent token refresh races with the read in GetAccessToken() — this could return a partially-written or empty token, producing the same 401-auth errors the PR aims to fix. The PR's own review checklist marks this as unverified.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch from the bot — I investigated the Go SDK source (go-sdk@v0.7.1).

TL;DR: The SDK's token refresh is goroutine-safe; GetAccessToken() has a pre-existing minor race but this change is still strictly better than the stale-token capture.

Here's what the SDK does internally:

  1. Writes are mutex-protected: setAccessToken() (called by both doTokenRenewal and doReAuthentication) holds c.mu.Lock() when writing tokenDetails, credential, and authMethod. A separate refreshMu serializes concurrent refresh attempts via TryLock.

  2. GetAccessToken() does NOT hold c.mu.RLock() — it reads c.client.tokenDetails.AccessToken directly. This is a pre-existing race in the SDK (not introduced by this PR). In practice, Go string assignment is pointer+length and the worst realistic outcome is reading a briefly stale (but previously valid) token, which would succeed or trigger a retry.

  3. Net effect of this PR: Before — guaranteed expired token on every reconnect. After — reads the latest token (possibly racing with a concurrent refresh, but the value will be either the current valid token or the previous one that was valid moments ago). This is a strict improvement.

The proper fix for the read race belongs in the SDK's GetAccessToken() method (adding c.client.mu.RLock()), not in this operator code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants