Wire identityFromToken into the OAuth2 upstream provider #5222
Open
Wire identityFromToken into the embedded auth server's OAuth2 upstream
provider. Extension point: the existing tokenResponseRewriter (which
already reads and parses every successful token-endpoint response to
normalise non-standard envelopes) gains a parallel responsibility —
extract user identity claims from the same body when the operator
configures IdentityFromTokenConfig with gjson dot-notation paths.
Identity extraction runs on the RAW pre-rewrite body, so paths are
resolved against the original provider response even when
TokenResponseMapping is also configured. The rewriter passes the
extracted *partialIdentity back to exchangeCodeForTokens via a
returned reference. RefreshTokens passes nil, so the rewriter is
either omitted entirely or runs with identityCfg=nil: providers like
Snowflake omit username on refresh, and identity is cached in session
storage at auth-code time.
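As a rough illustration of resolving a dot-notation path against the raw token response, here is a minimal sketch. It does not use gjson: lookupString is a hypothetical stand-in built on encoding/json that handles only nested object keys, and the sample bodies are invented. The error wording follows the "subjectPath %q %s" form described in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookupString is a hypothetical stand-in for a gjson path lookup. It walks a
// dot-separated path through nested JSON objects and returns the string found
// there. It supports object keys only, not gjson arrays or modifiers.
func lookupString(raw []byte, path string) (string, error) {
	var node any
	if err := json.Unmarshal(raw, &node); err != nil {
		// Deliberately omit the body and the parser detail from the error.
		return "", fmt.Errorf("token response is not valid JSON")
	}
	for _, key := range strings.Split(path, ".") {
		obj, ok := node.(map[string]any)
		if !ok {
			return "", fmt.Errorf("subjectPath %q path not found in token response", path)
		}
		node, ok = obj[key]
		if !ok {
			return "", fmt.Errorf("subjectPath %q path not found in token response", path)
		}
	}
	s, ok := node.(string)
	if !ok || s == "" {
		return "", fmt.Errorf("subjectPath %q is not a non-empty string", path)
	}
	return s, nil
}

func main() {
	// Invented bodies shaped like the two upstreams named in this PR.
	slack := []byte(`{"ok":true,"access_token":"example","authed_user":{"id":"U123"}}`)
	snowflake := []byte(`{"access_token":"example","username":"ALICE"}`)

	fmt.Println(lookupString(slack, "authed_user.id"))
	fmt.Println(lookupString(snowflake, "username"))
	fmt.Println(lookupString(snowflake, "authed_user.id")) // path does not resolve
}
```

The lookup reads the body exactly as the provider sent it, which is why a path such as authed_user.id keeps working even if a later rewrite step relocates other fields.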
The new priority chain in BaseOAuth2Provider.ExchangeCodeForIdentity:
1. IdentityFromToken: when configured, return the extracted
identity. If extraction failed (the path didn't resolve), return
ErrIdentityResolutionFailed without consulting userInfo or
synthesising. The operator's "identity is in the token" claim
is explicit, so we surface its failure rather than silently
falling through.
2. UserInfo — existing fetchUserInfo path, unchanged.
3. Synthesis — existing synthesizeIdentity path (PR 5094),
unchanged.
OIDC providers always have ID-token-derived identity, so the OIDC
provider's ExchangeCodeForIdentity discards the rewriter's
identityFromToken return value and logs a defensive WARN if a future
config-loader bug ever sets IdentityFromToken on an OIDC base config
(the field is structurally absent from the OIDC CRD type today).
The tripwire test asserts userinfo HTTP is never contacted when
identityFromToken is configured, including on extraction failure.
Other new tests cover the happy path, userInfo-only regression, the
refresh path (no identity extraction), and information disclosure
(raw body never appears in error messages or logs above DEBUG).
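A tripwire of this kind can be sketched with a counting httptest server. The code below is an illustration only: resolveSubject and userInfoHits are hypothetical helpers, not functions from this PR, and the real test is written against the provider itself.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// resolveSubject is a hypothetical resolver: when identityFromToken is
// configured it never contacts the userinfo endpoint, even on failure.
func resolveSubject(tokenSubject string, fromTokenConfigured bool, userInfoURL string) (string, error) {
	if fromTokenConfigured {
		if tokenSubject == "" {
			return "", fmt.Errorf("identity resolution failed")
		}
		return tokenSubject, nil
	}
	resp, err := http.Get(userInfoURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return "from-userinfo", nil
}

// userInfoHits runs resolveSubject against a counting userinfo server and
// reports how many requests that server received.
func userInfoHits(tokenSubject string, fromTokenConfigured bool) int64 {
	var hits atomic.Int64
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		hits.Add(1)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	_, _ = resolveSubject(tokenSubject, fromTokenConfigured, srv.URL)
	return hits.Load()
}

func main() {
	fmt.Println("configured, extraction ok:    ", userInfoHits("ALICE", true))
	fmt.Println("configured, extraction failed:", userInfoHits("", true))
	fmt.Println("not configured:               ", userInfoHits("", false))
}
```

Counting requests on the server side, rather than inspecting the resolver's return value, is what makes the check a tripwire: any future code path that reaches userinfo trips it regardless of the result it returns.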
Two existing slog.Info calls in exchangeCodeForTokens are downgraded
to slog.Debug to comply with the silent-success-at-INFO rule.
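The effect of the downgrade can be seen with a handler whose threshold is INFO. This is a small sketch using the standard log/slog package; the log messages and the helper names are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// logAtInfoThreshold writes one DEBUG and one WARN record through a text
// handler whose minimum level is INFO, and returns what was emitted.
func logAtInfoThreshold() string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: slog.LevelInfo}))

	// Hypothetical success message: suppressed at the INFO threshold.
	logger.Debug("token exchange succeeded", "provider", "example")
	// Hypothetical warning: still emitted.
	logger.Warn("identityFromToken ignored", "provider", "example")

	return buf.String()
}

// logged reports whether the captured output contains the given text.
func logged(out, text string) bool {
	return strings.Contains(out, text)
}

func main() {
	out := logAtInfoThreshold()
	fmt.Print(out)
	fmt.Println("success message visible:", logged(out, "token exchange succeeded"))
}
```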
Closes: #5156
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5222      +/-   ##
==========================================
+ Coverage   67.78%   67.84%   +0.05%
==========================================
  Files         608      610       +2
  Lines       62224    62421     +197
==========================================
+ Hits        42180    42347     +167
- Misses      16872    16898      +26
- Partials     3172     3176       +4
Summary
- Before this PR, an OAuth2 upstream resolved identity through userInfo when userInfo was configured, otherwise through a synthesized tk-... subject (PR #5094, Allow OAuth2 upstreams to omit userInfo config). Upstreams that expose identity directly in the token-endpoint response (Snowflake's username, Slack v2's authed_user.id) had no good answer: they would either go through synthesis (losing real subject identity for users row creation) or require a fake userInfo URL.
- This PR wires IdentityFromToken into BaseOAuth2Provider.ExchangeCodeForIdentity as a third resolution mode that runs ahead of userInfo. The existing tokenResponseRewriter already reads and parses every successful token-endpoint response to normalise non-standard envelopes; it now extracts identity claims from the same body using gjson dot-notation paths when an operator configures it.
- Priority chain: (1) IdentityFromToken extracts identity from the raw pre-rewrite body, (2) UserInfo fetches via HTTP (existing), (3) synthesizeIdentity falls back (existing). When extraction is configured but fails, the operator-actionable extractor error (path name + type description, never body content) is surfaced through ErrIdentityResolutionFailed rather than silently falling through to userinfo or synthesis: "identityFromToken set" is an explicit operator claim.
- RefreshTokens deliberately passes nil for the identity config: providers like Snowflake omit username on refresh; identity is cached at auth-code time and read from session storage on subsequent requests.
- The OIDC provider discards the rewriter's identityFromToken return value and logs a defensive WARN if a future config-loader bug ever sets IdentityFromToken on an OIDC base config (structurally absent on the OIDC CRD type today).
- Two existing slog.Info calls in exchangeCodeForTokens were downgraded to slog.Debug for silent-success-at-INFO compliance.
Closes #5156
Type of change
Test plan
- Unit tests (task test)
- Linting (task lint-fix)
New tests cover the happy path, the userinfo-bypass tripwire (asserts no userinfo HTTP call when identityFromToken is configured), the userInfo-only regression, the synthesis fallback, the refresh path (no identity extraction), the @upstreamjwt modifier path through the provider, and an information-disclosure assertion that the raw token body never appears in error messages.
API Compatibility
- This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.
This PR adds an additive optional field to the runtime OAuth2Config only; the matching CRD type lands in #5155.
Does this introduce a user-facing change?
No direct CRD-level user-facing change in this PR. End-to-end functionality is gated on the runtime translation in #5157 (the next PR in the stack), which wires the v1beta1 CRD field into this runtime config.
Implementation plan
Approved implementation plan
This is phase 3 of the Snowflake / identityFromToken story (#5150):
1. extractIdentityFromTokenResponse and the @upstreamjwt gjson modifier.
2. IdentityFromTokenConfig CRD type on MCPExternalAuthConfig v1beta1 (#5155).
3. This PR: wire identity extraction into BaseOAuth2Provider.ExchangeCodeForIdentity via the existing tokenResponseRewriter. New priority chain. Identity-source set on the rewriter, consumed by the caller after Exchange returns.
4. CRD-to-runtime translation (#5157), plus the RegisterModifiers() call so @upstreamjwt paths actually work in production.
5. #5158 (thv llm setup doesn't append provider path prefix for Envoy AI Gateway) and #5159 (Recognise identityFromToken in synthesis-mode detection): Examples + synthesis-detection refinement so the IdentitySynthesized controller condition no longer fires when IdentityFromToken is configured.
Design constraints surfaced and addressed in this PR:
- Extraction runs on the raw pre-rewrite body. rewriteTokenResponse only relocates standard OAuth fields today and never touches identity-shaped fields, but extracting first makes the ordering invariant independent of that assumption.
- The rewriter is per-call (wrapHTTPClientForTokenExchange creates a fresh one per call); no state shared across goroutines.
- Extraction failure does not fall through. Operators who configure IdentityFromToken are claiming identity is in the token response; we surface the failure rather than mask it.
- Extractor errors have the form subjectPath %q %s, where %s is a static validator description ("path not found in token response", etc.). Never body bytes. The ExchangeCodeForIdentity caller wraps this directly so operators see the misconfigured path name in the returned error.
Special notes for reviewers
- pkg/authserver/upstream/oauth2.go:160-164 and pkg/authserver/upstream/identity_from_token.go:31-32 reference cmd/thv-operator/api/v1alpha1.IdentityFromTokenConfig (the v1beta1 CRD type). That type lands in #5155 (Add identityFromToken to MCPExternalAuthConfig CRD). Until #5155 merges, those references dangle on this branch in isolation. End-to-end functionality requires #5157 (Translate identityFromToken from CRD to runtime config) too, which adds the operator-to-runner translation and the RegisterModifiers() bootstrap call.
- The docs/arch/11-auth-server-storage.md update reframes the previous "Synthesis-mode subjects" section as a 3-priority chain. It also flags a known gap: the controller predicate SyntheticIdentityUpstreams() checks only userInfo == nil and does not yet account for IdentityFromToken, so an upstream with only IdentityFromToken configured will still trigger IdentitySynthesizedActive until #5159 (Recognise identityFromToken in synthesis-mode detection) lands.
- The synthesize fallback subtest in oauth2_test.go was rewritten to use NewOAuth2Provider instead of hand-building a struct literal, aligning it with the rest of the suite.
🤖 Generated with Claude Code