Skip to content

Reproduce registry auth stale-cache bug#4951

Draft
lorr1 wants to merge 1 commit intomainfrom
laurel/repro-registry-auth-stale-cache
Draft

Reproduce registry auth stale-cache bug#4951
lorr1 wants to merge 1 commit intomainfrom
laurel/repro-registry-auth-stale-cache

Conversation

@lorr1
Copy link
Copy Markdown
Contributor

@lorr1 lorr1 commented Apr 21, 2026

Summary

  • Why: CachedAPIRegistryProvider.refreshCache() silently falls back to stale cached registry data when the upstream API call fails — including when the failure is an auth error. As a result, thv registry list against a registry behind OAuth prints previously-cached servers with no error even after the user's OAuth flow times out or their credentials are revoked. This masks the auth failure completely and is the bug reported in thv registry list serves stale cache when OAuth flow fails, masking auth failure #4950.
  • What: adds two failing unit tests in pkg/registry/provider_cached_authbug_test.go that exercise the two distinct error-propagation paths (server-side 401/403 and OAuth-browser-flow timeout) that both hit the same buggy fallback in refreshCache().

This PR is intentionally a draft and does not include a fix — it's a reproducer for #4950. The fix direction is under discussion in the issue because a naive sentinel-check fix is insufficient for the browser-flow path (the error carries neither auth.ErrRegistryAuthRequired nor api.ErrRegistryUnauthorized).

Refs #4950

Type of change

  • Bug fix (non-breaking change that fixes an issue) — reproducer only; fix to follow

Test plan

  • New unit tests added in pkg/registry/provider_cached_authbug_test.go.
  • Verified both tests fail on current main:
    --- FAIL: TestCachedProvider_AuthErrorNotMaskedByStaleCache (0.00s)
          An error is expected but got nil.
          expected auth error to propagate on refresh; got nil —
          stale cache is masking the auth failure (bug)
    
    --- FAIL: TestCachedProvider_OAuthFlowFailureNotMaskedByStaleCache (0.00s)
          An error is expected but got nil.
          expected OAuth flow failure to propagate on refresh; got nil —
          stale cache is masking the OAuth failure (bug)
    
  • task lint-fix passes (0 issues).

Special notes for reviewers

  • The second test uses a stubbed auth.TokenSource (failingTokenSource) that mirrors the exact error wrapping oauthTokenSource.Token() applies when its browser flow fails — see pkg/registry/auth/oauth_token_source.go:61-67 and :110-113. This is important because a sentinel-check-only fix in refreshCache would catch test 1 but miss test 2, which is the user-reported scenario.
  • The first test uses tokenSource=nil so the construction-time validation probe (pkg/registry/provider_api.go:57-74) runs against the healthy server; the failure is introduced later by flipping the server's response.
  • Scope beyond the CLI: CachedAPIRegistryProvider is also used by pkg/api/v1/registry*.go and pkg/mcp/server/search_registry.go, so the behavior change implied by a fix will affect headless contexts too. Flagging for reviewer awareness.

CachedAPIRegistryProvider.refreshCache falls back to stale in-memory
cache data on any upstream error, including auth failures. This masks
a timed-out OAuth flow (or revoked credentials) as a successful list,
leaving the user no signal that they are unauthenticated.

Adds two failing reproducers that both hit the same swallow site:

  TestCachedProvider_AuthErrorNotMaskedByStaleCache
    server-side 401/403 rejection path (token was sent and refused)

  TestCachedProvider_OAuthFlowFailureNotMaskedByStaleCache
    browser-flow timeout path — the error carries neither
    auth.ErrRegistryAuthRequired nor api.ErrRegistryUnauthorized,
    so a naive sentinel-only fix would miss it

Refs #4950
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant