fix(auth/m2m): remove double-caching to enable proactive token refresh #1550
Open
aausch wants to merge 1 commit into databricks:main from
clientcredentials.Config.TokenSource returns an oauth2.ReuseTokenSource, which caches the token internally with a 10s expiryDelta. Wrapping this in cachedTokenSource creates a double-caching stack where async refresh calls return the inner-cached token instead of making a real HTTP request. As a result, the proactive 20-min async refresh window is wasted: the underlying token endpoint is not reached until ~10s before expiry. Any request that holds the about-to-expire token and whose HTTP round-trip to Databricks completes after the expiry time receives HTTP 401.

Replace clientcredentials.Config.TokenSource (ReuseTokenSource) with a direct TokenSourceFn that always calls ccfg.Token(ctx). cachedTokenSource becomes the sole cache layer and async refresh proactively fetches a fresh token at T-20min as intended.

Fixes databricks#1549.

Tests:
- TestCachedTokenSource_AsyncRefreshBlockedByInnerCache: documents that inner ReuseTokenSource delays the real fetch to near T-10s
- TestCachedTokenSource_AsyncRefreshWithDirectSource: verifies that a direct source causes the fetch at T-20min as intended
- Existing TestM2mHappyFlow / TestM2mHappyFlowForAccount: still pass

Signed-off-by: Alex Ausch <alex@ausch.name>
Summary
Fixes a double-caching bug in M2M OAuth that caused the proactive async token refresh to have no effect, resulting in bursts of HTTP 401 errors at each token rotation boundary (~every hour).
Closes #1549.
Why
M2mCredentials.Configure previously called clientcredentials.Config.TokenSource(ctx), which returns an oauth2.ReuseTokenSource. That source was then passed to refreshableVisitor, which wraps it in a cachedTokenSource. The resulting stack is:

cachedTokenSource → oauth2.ReuseTokenSource → token endpoint (HTTP)

When cachedTokenSource triggers its async refresh at T−20 min, it calls through to ReuseTokenSource.Token(). Because the token still has 20 minutes of life, ReuseTokenSource considers it valid and returns the cached token without an HTTP call. The async refresh fires repeatedly (with an accelerating schedule), but each call hits the same inner cache until only ~10 s remain, at which point ReuseTokenSource's expiryDelta window is crossed and a real network call is finally made.

Any request that receives the about-to-expire token and whose round-trip to Databricks completes after the token's expiry time gets HTTP 401. In production this manifests as a burst of 401s at precisely one-hour intervals, correlated with pod startup times.
What changed
Interface changes
None.
Behavioral changes
Internal changes
- auth_m2m.go: replaced clientcredentials.Config.TokenSource(ctx) (which returns a ReuseTokenSource) with a TokenSourceFn closure that calls ccfg.Token(ctx) directly. cachedTokenSource is now the sole caching layer.
- cache_test.go: added reuseTokenSource test helper and two new tests:
  - TestCachedTokenSource_AsyncRefreshBlockedByInnerCache: documents the double-caching behaviour using a mock clock.
  - TestCachedTokenSource_AsyncRefreshWithDirectSource: verifies that a direct (non-caching) inner source triggers an HTTP fetch at the proactive window.
- NEXT_CHANGELOG.md: updated.

How is this tested?
- TestCachedTokenSource_AsyncRefreshBlockedByInnerCache: uses a fake clock and a controlled reuseTokenSource helper to confirm that inner caching delays the HTTP fetch to T−10 s rather than T−20 min.
- TestCachedTokenSource_AsyncRefreshWithDirectSource: uses the same fake clock with a direct token source to confirm the fetch occurs at T−20 min after the fix.
- TestM2mHappyFlow and TestM2mHappyFlowForAccount continue to pass.