Skip to content

Add force-refresh path for cached U2M tokens#1552

Open
mihaimitrea-db wants to merge 6 commits intomainfrom
mihaimitrea-db/token-force-refresh
Open

Add force-refresh path for cached U2M tokens#1552
mihaimitrea-db wants to merge 6 commits intomainfrom
mihaimitrea-db/token-force-refresh

Conversation

@mihaimitrea-db
Copy link
Contributor

@mihaimitrea-db mihaimitrea-db commented Mar 17, 2026

Summary

This PR adds an explicit PersistentAuth.ForceRefreshToken() path for cached U2M OAuth tokens so the CLI can implement databricks auth token --force-refresh on top of the 5-minute proactive refresh buffer introduced in #1535.

For maintainability reasons it also splits token loading from refresh decisions and documents a cross-process refresh race for a follow-up fix.

Why

#1535 changed the default U2M token path to proactively refresh tokens that are within 5 minutes of expiry so callers of databricks auth token are less likely to receive a token that expires before it can be used. That addresses the "give me a useful token" case, which came out of the auth token discussion in cli issue #4564.

That proactive refresh is intentionally best-effort, though. PersistentAuth.Token() still returns the existing access token when it is valid and a proactive refresh fails, because callers did not explicitly ask to block on a fresh token. That behavior is appropriate for the default path, but it is not strong enough for integrations that want to treat the CLI as a token minter or that want to manage their own cache/TTL policy. In those cases, "return a still-usable token" is different from "refresh now and give me a newly minted token or fail."

This PR adds that stricter SDK primitive. It gives the CLI a clean way to implement --force-refresh without changing the semantics of the existing Token() flow.

The raw oauth2 error text for forced refresh failures can be misleading. In particular, when a forced refresh is requested for a cached access token that is still valid but does not include a refresh token, the default oauth2 error suggests that the token is expired even though the real problem is that the cached token is not refresh-capable.

This PR surfaces explicit refresh-token errors instead of relying on the default oauth2 wording, so callers can distinguish "the current access token is expired" from "a new token could not be minted because the refresh token is missing or was rejected by the IdP".

What changed

Interface changes

  • Added (*PersistentAuth).ForceRefreshToken() (*oauth2.Token, error).
    This always attempts to refresh the cached U2M token, regardless of its current expiry, and redacts the refresh token from the returned value just like Token() does.

Behavioral changes

  • No default behavior changed for existing callers of PersistentAuth.Token().
  • PersistentAuth.ForceRefreshToken() always tries to refresh, even when the cached access token is still valid.
  • If forced refresh fails, ForceRefreshToken() returns an error instead of falling back to the existing token.
  • If the cached token does not contain a refresh token, ForceRefreshToken() fails explicitly.

Internal changes

  • Extracted cache lookup into loadToken() so token loading and refresh policy are separated.
  • Reused the existing refresh path so forced refresh still goes through the same endpoint resolution, token exchange, cache update, and refresh-token redaction logic as the normal path.
  • Added dedicated tests for force-refresh success, missing refresh token, transient refresh failure, invalid refresh token handling, and the profile-to-host fallback path.
  • Documented the remaining cross-process refresh race in refresh() with a follow-up TODO rather than expanding the scope of this PR to add inter-process coordination.
  • Added ErrMissingRefreshToken for the "cached token has no refresh token" case.
  • Added missing-refresh-token check into the shared refresh() path so both Token() and ForceRefreshToken() wrap the same semantic error.

How is this tested?

  • go test ./credentials/u2m
  • Added unit tests covering:
    • refreshing a still-valid cached token via ForceRefreshToken()
    • failing ForceRefreshToken() when the cached token has no refresh token
    • surfacing the exact wrapped error on refresh failure
    • preserving InvalidRefreshTokenError behavior
    • exercising forced refresh through the existing profile-key to host-key fallback path

Split token loading from refresh decisions, add dedicated force-refresh coverage, and document the remaining cross-process refresh race for a follow-up fix.

Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
Document the new PersistentAuth.ForceRefreshToken method in NEXT_CHANGELOG so the release notes cover the new explicit cached-token refresh behavior.

Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
Move the missing refresh token check into the shared refresh path, expose it as ErrMissingRefreshToken, and cover the wrapped error behavior from both Token and ForceRefreshToken.

Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
Copy link
Member

@simonfaltum simonfaltum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review (automated, 2 agents)

Verdict: Approved

0 Critical | 0 Major | 1 Gap | 2 Nit | 1 Suggestion

See inline comments for details.

…rMissingRefreshToken to changelog

Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
mihaimitrea-db

This comment was marked as outdated.

mihaimitrea-db

This comment was marked as outdated.

Copy link
Contributor Author

@mihaimitrea-db mihaimitrea-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Review by 2 Agents

Verdict: Approve with comments

  • 🔴 0 Critical
  • 🟠 0 Major
  • 🟡 3 Minor
  • 🔵 3 Nit

The core logic changes look sound; the main concerns are a misclassified changelog entry, a missing test for the new ErrMissingRefreshToken graceful-degradation path, and a misleading doc comment on ForceRefreshToken().

See inline comments for details.

- NEXT_CHANGELOG: Move ErrMissingRefreshToken to API Changes, remove
  proactive-refresh and ParentPath (rebase artifacts / unrelated)
- error.go: Fix InvalidRefreshTokenError doc (Load -> Token/ForceRefreshToken)
- persistent_auth.go: Clarify ForceRefreshToken dual-write vs Token();
  unify error prefix to 'token refresh:'; simplify loadToken doc
- persistent_auth_test.go: Add TestToken_ReturnsExistingTokenWhenNearExpiryAndNoRefreshToken;
  add defensive store t.Fatalf to ForceRefreshToken tests

Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
@github-actions
Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-go

Inputs:

  • PR number: 1552
  • Commit SHA: 08dab3c6feb0c3716bc1eda8b00bc4bad636db53

Checks will be approved automatically on success.

Copy link
Member

@simonfaltum simonfaltum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review (automated, 3 agents: Claude Code + Isaac + Cursor, 2-round deep swarm)

Verdict: Approved (with suggestions)

0 Critical | 1 Major | 0 Gap | 2 Nit | 0 Suggestion

See inline comments for details. Overall the PR is well-designed: the loadToken() extraction is clean, ForceRefreshToken() has the right strict-error semantics, ErrMissingRefreshToken is idiomatic Go, and test coverage is thorough across all key paths.

Comment on lines +283 to 297
// When the token was loaded via host-key fallback (profile key not found), the
// refreshed token is dual-written to both the profile key and host key,
// effectively migrating the token to the profile. Token() migrates only when
// it needs to refresh (expired or nearly expired); ForceRefreshToken always
// refreshes, so it always migrates on success.
func (a *PersistentAuth) ForceRefreshToken() (*oauth2.Token, error) {
t, err := a.loadToken()
if err != nil {
return nil, err
}
t, err = a.refresh(t)
if err != nil {
return nil, fmt.Errorf("forced token refresh: %w", err)
}
t.RefreshToken = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Major] ForceRefreshToken migrates host-key fallback tokens to the profile key, widening a pre-existing risk

loadToken() deliberately avoids migrating host-key fallback tokens because they "may contain a token from a different profile with different scopes." However, ForceRefreshToken() always calls refresh() which calls dualWrite(), writing the refreshed token to the profile key even when it was loaded via fallback. This can make a wrong-scoped token sticky for future lookups.

Nuance: The base branch's Token() already migrates fallback tokens when they need refresh (expired or near-expiry). ForceRefreshToken() widens the migration to include still-valid tokens, but doesn't introduce a new category of bug. Practical risk is limited (requires two profiles with different scopes pointing to the same host).

Suggestion: Consider threading provenance out of loadToken() in a follow-up, so callers know whether the token came from the primary key or host-key fallback. Then ForceRefreshToken() could refuse to force-refresh a fallback-loaded token (directing the user to databricks auth login --profile <name>), or refresh without writing to the profile key.

Found by: Cursor + Claude Code (R1), nuanced by Isaac (R2)

Comment on lines +9 to +11
// InvalidRefreshTokenError is returned from PersistentAuth's Token() and
// ForceRefreshToken() methods if the access token has expired and the refresh
// token in the token cache is invalid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] InvalidRefreshTokenError doc comment overstates expiry precondition

The comment says this error is returned "if the access token has expired", but ForceRefreshToken() calls refresh() even for still-valid cached access tokens. If that forced refresh gets invalid_grant, callers still receive InvalidRefreshTokenError, so the doc comment overstates the expiry precondition.

Suggestion: Reword in terms of a refresh attempt, e.g.: "returned when a token refresh is attempted and the cached refresh token is invalid."

Found by: Cursor (R2, cross-review)

@simonfaltum
Copy link
Member

[Nit] PR description claims "no default behavior changed" for Token(), but it did

The PR description states "No default behavior changed for existing callers of PersistentAuth.Token()." The new ErrMissingRefreshToken check in refresh() changes the error returned when Token() is called with an expired token that has no refresh token: from a misleading oauth2 library error to "token refresh: cached token has no refresh token". Callers can now use errors.Is(err, ErrMissingRefreshToken). This is a net improvement, but should be acknowledged as a behavioral change.

Suggestion: Update the PR description, e.g.: "For Token(), the only behavioral change is that expired tokens with no refresh token now return ErrMissingRefreshToken instead of a misleading oauth2 library error."

Found by: Isaac (R1) + Claude Code (R1), confirmed by Cursor (R2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants