
Wire identityFromToken into the OAuth2 upstream provider #5222

Open
jhrozek wants to merge 1 commit into main from token_claim_mapping-3-provider-swap
Conversation

Contributor

@jhrozek jhrozek commented May 7, 2026

Summary

  • The embedded auth server's pure-OAuth2 path resolved identity in one of two ways: a UserInfo HTTP fetch when userInfo was configured, otherwise a synthesized tk-... subject (PR #5094, Allow OAuth2 upstreams to omit userInfo config). Upstreams that expose identity directly in the token-endpoint response (Snowflake's username, Slack v2's authed_user.id) had no good option: they either went through synthesis (losing real subject identity for users row creation) or required a fake userInfo URL.
  • This PR wires IdentityFromToken into BaseOAuth2Provider.ExchangeCodeForIdentity as a third resolution mode that runs ahead of userInfo. The existing tokenResponseRewriter already reads and parses every successful token-endpoint response to normalise non-standard envelopes; it now extracts identity claims from the same body using gjson dot-notation paths when an operator configures it.
  • New priority chain: (1) IdentityFromToken extracts identity from the raw pre-rewrite body, (2) UserInfo fetches via HTTP (existing), (3) synthesizeIdentity falls back (existing). When extraction is configured but fails, the operator-actionable extractor error (path name + type description, never body content) is surfaced through ErrIdentityResolutionFailed rather than silently falling through to userinfo or synthesis — "identityFromToken set" is an explicit operator claim.
  • RefreshTokens deliberately passes nil for the identity config: providers like Snowflake omit username on refresh; identity is cached at auth-code time and read from session storage on subsequent requests.
  • The OIDC provider discards the rewriter's identity return value with a defensive WARN if a future config-loader bug ever sets IdentityFromToken on an OIDC base config (structurally absent on the OIDC CRD type today).
  • Two slog.Info calls in exchangeCodeForTokens were downgraded to slog.Debug for silent-success-at-INFO compliance.
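
To make the dot-notation extraction concrete, here is a minimal sketch of resolving a path such as authed_user.id against a token-endpoint body. It is not the PR's code: the PR uses gjson, while this sketch uses only encoding/json so it stays self-contained, and it handles nested objects only. The function name lookupString and the sample bodies are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookupString walks a dot-separated path through a decoded JSON object and
// returns the string found there. It is a stdlib stand-in for a gjson lookup
// (hypothetical helper; no arrays, modifiers, or escaped dots).
// Error messages name the path only, never the body.
func lookupString(body []byte, path string) (string, error) {
	var node any
	if err := json.Unmarshal(body, &node); err != nil {
		return "", fmt.Errorf("token response is not valid JSON")
	}
	for _, key := range strings.Split(path, ".") {
		obj, ok := node.(map[string]any)
		if !ok {
			return "", fmt.Errorf("path %q not found in token response", path)
		}
		if node, ok = obj[key]; !ok {
			return "", fmt.Errorf("path %q not found in token response", path)
		}
	}
	s, ok := node.(string)
	if !ok || s == "" {
		return "", fmt.Errorf("path %q is not a non-empty string", path)
	}
	return s, nil
}

func main() {
	// Hypothetical bodies shaped like the two upstreams named in the summary.
	snowflake := []byte(`{"access_token":"a","token_type":"Bearer","username":"ALICE"}`)
	slack := []byte(`{"ok":true,"authed_user":{"id":"U123","access_token":"x"}}`)

	sub, err := lookupString(snowflake, "username")
	fmt.Println(sub, err)
	sub, err = lookupString(slack, "authed_user.id")
	fmt.Println(sub, err)
}
```

A flat path covers the Snowflake-style response and a nested path covers the Slack-style one; a missing path yields an error rather than an empty subject.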

Closes #5156

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

New tests cover the happy path, the userinfo-bypass tripwire (asserts no userinfo HTTP call when identityFromToken is configured), the userInfo-only regression, the synthesis fallback, the refresh path (no identity extraction), the @upstreamjwt modifier path through the provider, and an information-disclosure assertion that the raw token body never appears in error messages.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

This PR adds an additive optional field to the runtime OAuth2Config only; the matching CRD type lands in #5155.

Does this introduce a user-facing change?

No direct CRD-level user-facing change in this PR. End-to-end functionality is gated on the runtime translation in #5157 (the next PR in the stack), which wires the v1beta1 CRD field into this runtime config.

Implementation plan

Approved implementation plan

This is phase 3 of the Snowflake / identityFromToken story (#5150):

Design constraints surfaced and addressed in this PR:

  • Identity extraction runs on the raw pre-rewrite body. rewriteTokenResponse only relocates standard OAuth fields today and never touches identity-shaped fields, but extracting first makes the ordering invariant independent of that assumption.
  • The rewriter is single-use by construction (wrapHTTPClientForTokenExchange creates a fresh one per call); no state shared across goroutines.
  • Extraction failure does NOT fall through to userinfo. Operators who configure IdentityFromToken are claiming identity is in the token response; we surface the failure rather than mask it.
  • The extractor's error format is subjectPath %q %s where %s is a static validator description ("path not found in token response", etc.). Never body bytes. The ExchangeCodeForIdentity caller wraps this directly so operators see the misconfigured path name in the returned error.
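
The error-format constraint can be illustrated with a short sketch. Only the format string subjectPath %q %s and the name ErrIdentityResolutionFailed come from this PR; the helper names, the sentinel's message text, and the wrapping style are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrIdentityResolutionFailed mirrors the sentinel named in the PR; its
// message text here is an assumption.
var ErrIdentityResolutionFailed = errors.New("identity resolution failed")

// extractionError (hypothetical helper) builds the operator-facing message
// from the configured path and a static reason. The response body is
// deliberately not a parameter, so it cannot leak into the message.
func extractionError(path, reason string) error {
	return fmt.Errorf("subjectPath %q %s", path, reason)
}

// wrapResolutionFailure (hypothetical helper) shows one way a caller could
// wrap the extractor error so the path name stays visible to operators.
func wrapResolutionFailure(err error) error {
	return fmt.Errorf("%w: %v", ErrIdentityResolutionFailed, err)
}

func main() {
	body := `{"access_token":"secret-token","user":"alice"}`
	err := wrapResolutionFailure(
		extractionError("authed_user.id", "path not found in token response"))

	fmt.Println(err)
	// The body never reaches the error text.
	fmt.Println(strings.Contains(err.Error(), "secret-token"), len(body) > 0)
}
```

Because the extractor only ever formats the path and a fixed description, an information-disclosure test can assert that no substring of the body appears in the returned error.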

Special notes for reviewers

  • Stack ordering: The doc comments at pkg/authserver/upstream/oauth2.go:160-164 and pkg/authserver/upstream/identity_from_token.go:31-32 reference cmd/thv-operator/api/v1alpha1.IdentityFromTokenConfig (the v1beta1 CRD type). That type lands in #5155 (Add identityFromToken to MCPExternalAuthConfig CRD); until #5155 merges, those references dangle when this branch is viewed in isolation. End-to-end functionality also requires #5157 (Translate identityFromToken from CRD to runtime config), which adds the operator-to-runner translation and the RegisterModifiers() bootstrap call.
  • The docs/arch/11-auth-server-storage.md update reframes the previous "Synthesis-mode subjects" section as a 3-priority chain. It also flags a known gap: the controller predicate SyntheticIdentityUpstreams() checks only userInfo == nil and does not yet account for IdentityFromToken, so an upstream with only IdentityFromToken configured will still trigger IdentitySynthesizedActive until #5159 (Recognise identityFromToken in synthesis-mode detection) lands.
  • The synthesize fallback subtest in oauth2_test.go was rewritten to use NewOAuth2Provider instead of hand-building a struct literal, aligning it with the rest of the suite.

🤖 Generated with Claude Code

Wire identityFromToken into the embedded auth server's OAuth2 upstream
provider. Extension point: the existing tokenResponseRewriter (which
already reads and parses every successful token-endpoint response to
normalise non-standard envelopes) gains a parallel responsibility —
extract user identity claims from the same body when the operator
configures IdentityFromTokenConfig with gjson dot-notation paths.

Identity extraction runs on the RAW pre-rewrite body, so paths are
resolved against the original provider response even when
TokenResponseMapping is also configured. The rewriter passes the
extracted *partialIdentity back to exchangeCodeForTokens via a
returned reference; RefreshTokens passes nil and the rewriter is
either omitted entirely or runs with identityCfg=nil because
providers like Snowflake omit username on refresh and identity is
cached at auth-code time in session storage.
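
As a rough illustration of "extract first, then rewrite", the sketch below wraps an http.RoundTripper so that an optional extractor sees the raw body before a rewrite step replaces it. All type and function names are hypothetical, and the rewrite (renaming a "token" key) is a placeholder rather than the PR's TokenResponseMapping logic.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a function to http.RoundTripper, used to stub the upstream.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// rewritingTransport is a hypothetical stand-in for the rewriter in the PR.
type rewritingTransport struct {
	next    http.RoundTripper
	extract func(raw []byte)        // nil on the refresh path: no identity extraction
	rewrite func(raw []byte) []byte // envelope normalisation
}

func (t *rewritingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil || resp.StatusCode != http.StatusOK {
		return resp, err
	}
	raw, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return nil, err
	}
	if t.extract != nil {
		t.extract(raw) // runs on the original provider body
	}
	out := t.rewrite(raw)
	resp.Body = io.NopCloser(bytes.NewReader(out))
	resp.ContentLength = int64(len(out))
	return resp, nil
}

// demo runs one exchange against a stubbed upstream and reports what the
// extractor saw and what the caller received.
func demo(upstreamBody string, withExtract bool) (seen, delivered string) {
	t := &rewritingTransport{
		next: roundTripFunc(func(*http.Request) (*http.Response, error) {
			return &http.Response{
				StatusCode: http.StatusOK,
				Header:     http.Header{},
				Body:       io.NopCloser(strings.NewReader(upstreamBody)),
			}, nil
		}),
		// Placeholder rewrite: rename a non-standard "token" key.
		rewrite: func(raw []byte) []byte {
			return bytes.Replace(raw, []byte(`"token"`), []byte(`"access_token"`), 1)
		},
	}
	if withExtract {
		t.extract = func(raw []byte) { seen = string(raw) }
	}
	req, _ := http.NewRequest(http.MethodPost, "https://upstream.example/token", nil)
	resp, err := t.RoundTrip(req)
	if err != nil {
		return seen, ""
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return seen, string(body)
}

func main() {
	seen, delivered := demo(`{"token":"t1","username":"ALICE"}`, true)
	fmt.Println("extractor saw:", seen)
	fmt.Println("caller got:   ", delivered)
}
```

Passing withExtract=false models the refresh path: the body is still rewritten, but nothing is extracted from it.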

The new priority chain in BaseOAuth2Provider.ExchangeCodeForIdentity:

  1. IdentityFromToken — when configured, return the extracted
     identity. If extraction failed (path didn't resolve), return
     ErrIdentityResolutionFailed without consulting userInfo or
     synthesising — the operator's "identity is in the token" claim
     is explicit and we surface its failure rather than silently
     fall through.
  2. UserInfo — existing fetchUserInfo path, unchanged.
  3. Synthesis — existing synthesizeIdentity path (PR 5094),
     unchanged.
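
The three-step chain can be sketched as follows. This is an illustration under assumptions, not the body of ExchangeCodeForIdentity: the resolver struct, its fields, and the Identity type are hypothetical, and only ErrIdentityResolutionFailed is a name taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIdentityResolutionFailed mirrors the sentinel named in the PR (message text assumed).
var ErrIdentityResolutionFailed = errors.New("identity resolution failed")

// errPathNotFound is a sample extractor failure for the demo.
var errPathNotFound = errors.New(`subjectPath "username" path not found in token response`)

// Identity and resolver are hypothetical types for this sketch.
type Identity struct{ Subject string }

type resolver struct {
	tokenConfigured bool                      // identityFromToken is set
	extracted       *Identity                 // result of extraction, if any
	extractErr      error                     // extractor failure, if any
	fetchUserInfo   func() (*Identity, error) // nil when userInfo is not configured
	synthesize      func() *Identity
}

func (r resolver) resolve() (*Identity, error) {
	// 1. IdentityFromToken: an explicit operator claim, so failure is terminal.
	if r.tokenConfigured {
		if r.extractErr != nil {
			return nil, fmt.Errorf("%w: %v", ErrIdentityResolutionFailed, r.extractErr)
		}
		if r.extracted == nil {
			return nil, ErrIdentityResolutionFailed
		}
		return r.extracted, nil
	}
	// 2. UserInfo: HTTP fetch when configured.
	if r.fetchUserInfo != nil {
		return r.fetchUserInfo()
	}
	// 3. Synthesis: last resort.
	return r.synthesize(), nil
}

func main() {
	userInfoCalls := 0
	r := resolver{
		tokenConfigured: true,
		extractErr:      errPathNotFound,
		fetchUserInfo: func() (*Identity, error) {
			userInfoCalls++
			return &Identity{Subject: "from-userinfo"}, nil
		},
		synthesize: func() *Identity { return &Identity{Subject: "tk-synth"} },
	}
	_, err := r.resolve()
	fmt.Println(err)
	fmt.Println("userinfo calls:", userInfoCalls)
}
```

The counter in main plays the role of the tripwire: when extraction is configured and fails, the userinfo function is never invoked.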

OIDC providers always have ID-token-derived identity, so the OIDC
provider's ExchangeCodeForIdentity discards the rewriter's
identityFromToken return value with a defensive WARN if a future
config-loader bug ever sets IdentityFromToken on an OIDC base config
(structurally absent on the OIDC CRD type today).

The tripwire test asserts userinfo HTTP is never contacted when
identityFromToken is configured, including on extraction failure.
Other new tests cover the happy path, userInfo-only regression, the
refresh path (no identity extraction), and information disclosure
(raw body never appears in error messages or logs above DEBUG).

Two existing slog.Info calls in exchangeCodeForTokens are downgraded
to slog.Debug to comply with the silent-success-at-INFO rule.

Closes: #5156
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 7, 2026
codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 89.04110% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.84%. Comparing base (2b50a8d) to head (9d9f68b).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/authserver/upstream/oauth2.go | 85.71% | 5 Missing and 1 partial ⚠️ |
| pkg/authserver/upstream/oidc.go | 77.77% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5222      +/-   ##
==========================================
+ Coverage   67.78%   67.84%   +0.05%     
==========================================
  Files         608      610       +2     
  Lines       62224    62421     +197     
==========================================
+ Hits        42180    42347     +167     
- Misses      16872    16898      +26     
- Partials     3172     3176       +4     

