Wire persistent DCRCredentialStore into EmbeddedAuthServer #5196

Draft
tgrunnagle wants to merge 2 commits into dcr-3b_issue_5184 from dcr-3c_issue_5185

Conversation

Contributor

@tgrunnagle tgrunnagle commented May 5, 2026

DRAFT - not ready for review

Summary

  • Why: Phase 2 of the DCR story shipped an in-memory stub for the DCRCredentialStore that lived only in the runner package. Restarting an authserver, or scaling out to a new replica, discarded every RFC 7591 client registration and re-registered against the upstream on every boot, which is unworkable for the Datadog-style upstream demo and for any multi-replica deployment. This PR wires the persistent DCRCredentialStore introduced in earlier sub-issues (in-memory and Redis backends) into EmbeddedAuthServer so that a Redis-backed authserver reuses already-registered clients across replicas and restarts.
  • What:
    • EmbeddedAuthServer.dcrStore is now typed against storage.DCRCredentialStore and is derived from the same storage.Storage value returned by createStorage, so a single storage_type: redis config toggles DCR persistence alongside the rest of authserver state. The storage.Storage interface embeds storage.DCRCredentialStore, promoting the previously-needed runtime type assertion to a compile-time guarantee.
    • The Phase 2 standalone in-memory DCRCredentialStore in pkg/authserver/runner/dcr_store.go is collapsed into a thin storageBackedStore adapter that delegates to storage.DCRCredentialStore and translates DCRResolution <-> DCRCredentials at the boundary. There is now exactly one persistence implementation per backend.
    • The constructor is split into the public NewEmbeddedAuthServer (creates storage) and an unexported newEmbeddedAuthServerWithStorage (owns the cleanup contract). Any error after entry closes the storage backend via a deferred cleanup gated on a named return error, so a crash-looping caller no longer leaks the Redis client connection pool / MemoryStorage cleanup goroutine on every restart.
    • buildPureOAuth2Config remains pure (unchanged signature, no ctx, no I/O); buildUpstreamConfigs is the boundary that consumes the resolver and overlays DCR-resolved credentials onto each upstream.
    • Adds DCRStore() accessor on EmbeddedAuthServer mirroring IDPTokenStorage / UpstreamTokenRefresher, used by integration tests to verify the resolver and the authserver write through the same backend.
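
For readers unfamiliar with the cleanup pattern described in the constructor bullet above, here is a minimal sketch of a deferred Close gated on a named return error. It is not the code in this PR: closer, trackingStorage, server, newServerWithStorage and the failSetup flag are all hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// closer stands in for the storage backend. The real storage.Storage
// interface is larger; this name is illustrative only.
type closer interface {
	Close() error
}

// trackingStorage records whether Close was called, similar in spirit to
// the closeTrackingStorage wrapper mentioned in the test plan.
type trackingStorage struct{ closed bool }

func (s *trackingStorage) Close() error {
	s.closed = true
	return nil
}

// server is a hypothetical stand-in for the constructed auth server.
type server struct{ store closer }

// newServerWithStorage takes ownership of store on entry. If any later
// step fails, the deferred function sees the non-nil named return value
// and closes the backend so nothing leaks.
func newServerWithStorage(store closer, failSetup bool) (srv *server, err error) {
	defer func() {
		if err != nil {
			// Join the close error (if any) so neither failure is lost.
			err = errors.Join(err, store.Close())
		}
	}()

	if failSetup {
		// Simulates any error after entry (config parsing, DCR, ...).
		return nil, errors.New("setup failed")
	}
	return &server{store: store}, nil
}

func main() {
	failed := &trackingStorage{}
	_, err := newServerWithStorage(failed, true)
	fmt.Println("error path: err set =", err != nil, "closed =", failed.closed)

	ok := &trackingStorage{}
	_, err = newServerWithStorage(ok, false)
	fmt.Println("success path: err set =", err != nil, "closed =", ok.closed)
}
```

Gating on the named return keeps the success path free of cleanup: the caller only becomes responsible for Close once a server is actually returned.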

This PR also lands the dependency stack that #5185 builds on (the persistent DCRCredentialStore types + memory backend, the Redis backend, the operator CRD surface for DCR, and the runner-side DCR resolver wiring). Each layer was developed and reviewed as a separate commit on this branch; commits are sequenced so each one builds and tests cleanly.

Closes #5185

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Notable test coverage added by this PR:

  • pkg/authserver/integration_dcr_restart_test.go (new) — TestEmbeddedAuthServer_DCRSurvivesRestart boots an EmbeddedAuthServer against a mock AS, captures the DCR store via the new DCRStore() accessor, closes the server, and asserts the persisted DCR row survives the first server's Close. Lives in package authserver_test to avoid the runner -> authserver import cycle. The full "boot, close, boot again, observe zero /register" scenario across a fresh constructor is documented as a gap (the production Redis path requires Sentinel, which miniredis does not speak); test docstring records the conditions under which it can be closed.
  • pkg/authserver/runner/embeddedauthserver_test.go: TestBuildUpstreamConfigs_DCR exercises first-call registration + cache-hit on the second call (zero additional HTTP requests) and asserts the caller's RunConfig.Upstreams slice is never mutated. TestNewEmbeddedAuthServer_ClosesStorageOnError uses a closeTrackingStorage wrapper to verify the deferred-cleanup contract.
  • pkg/authserver/storage/memory_test.go, redis_test.go, redis_integration_test.go — coverage for the persistent DCRCredentialStore operations on both backends, including ScopesHash canonicalisation (sort + dedupe + newline join).
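
The ScopesHash canonicalisation named in the last bullet (sort + dedupe + newline join) can be sketched roughly as follows. The function names are hypothetical, and hashing the canonical string with SHA-256 and hex-encoding it is an assumption; the PR description does not say which hash the real implementation uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"slices"
	"strings"
)

// canonicalScopes sorts, de-duplicates and newline-joins the scopes so
// that ordering and repeats in the input do not affect the result.
// It works on a copy, leaving the caller's slice untouched.
func canonicalScopes(scopes []string) string {
	sorted := slices.Clone(scopes)
	slices.Sort(sorted)
	sorted = slices.Compact(sorted) // drops adjacent duplicates after sorting
	return strings.Join(sorted, "\n")
}

// scopesHash hashes the canonical form. SHA-256 + hex is an assumption
// made for this sketch, not something the PR description confirms.
func scopesHash(scopes []string) string {
	sum := sha256.Sum256([]byte(canonicalScopes(scopes)))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := scopesHash([]string{"openid", "email", "openid"})
	b := scopesHash([]string{"email", "openid"})
	fmt.Println("hashes match:", a == b)
}
```

Canonicalising before hashing means two upstream configs that request the same scope set in a different order resolve to the same stored registration.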

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

The CRD changes are additive: OAuth2UpstreamConfig.clientId becomes optional with a CEL constraint requiring exactly one of clientId or dcrConfig, and a new dcrConfig field is added. Existing MCPExternalAuthConfig / VirtualMCPServer resources that set clientId continue to validate unchanged.
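
As a rough illustration of what the "exactly one of clientId or dcrConfig" rule means for a resource, the sketch below expresses the same check in plain Go. It is not the CEL expression from the CRD, and upstreamSketch, dcrConfigSketch and validateUpstream are hypothetical names used only here.

```go
package main

import (
	"errors"
	"fmt"
)

// dcrConfigSketch mirrors two of the dcrConfig fields named in this PR
// description; it is a simplified, hypothetical type.
type dcrConfigSketch struct {
	DiscoveryURL         string
	RegistrationEndpoint string
}

// upstreamSketch is a simplified stand-in for OAuth2UpstreamConfig.
type upstreamSketch struct {
	ClientID  string
	DCRConfig *dcrConfigSketch
}

var errExactlyOne = errors.New("exactly one of clientId or dcrConfig must be set")

// validateUpstream rejects configs that set both fields or neither.
func validateUpstream(u upstreamSketch) error {
	hasClientID := u.ClientID != ""
	hasDCR := u.DCRConfig != nil
	if hasClientID == hasDCR {
		return errExactlyOne
	}
	return nil
}

func main() {
	static := upstreamSketch{ClientID: "my-client"}
	dynamic := upstreamSketch{DCRConfig: &dcrConfigSketch{DiscoveryURL: "https://idp.example/.well-known/openid-configuration"}}
	both := upstreamSketch{ClientID: "my-client", DCRConfig: &dcrConfigSketch{}}

	fmt.Println("static:", validateUpstream(static))
	fmt.Println("dynamic:", validateUpstream(dynamic))
	fmt.Println("both:", validateUpstream(both))
	fmt.Println("neither:", validateUpstream(upstreamSketch{}))
}
```

Under this rule, an existing resource that sets only clientId falls in the first case and still validates, which is why the change is additive.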

Does this introduce a user-facing change?

Yes. Operators of OAuth2 upstreams can now configure RFC 7591 Dynamic Client Registration in the operator CRD via dcrConfig (with discoveryUrl or registrationEndpoint, plus optional initialAccessTokenRef, softwareId, softwareStatement) instead of statically configuring clientId + clientSecret. When the authserver is configured with storage_type: redis, DCR registrations persist across restarts and are shared across replicas; in single-replica memory mode, registrations live for the process lifetime as before.

Special notes for reviewers

  • This PR is the terminal task in the Phase 3 DCR DAG and pulls along the dependency stack from sub-issues 1 and 2 (persistent DCRCredentialStore types + memory + Redis backends), the operator CRD surface, and the Phase 2 resolver wiring. The size is above the usual 400-line / 10-file limit; each commit is self-contained and the stack reads top-to-bottom in commit order. Reviewers may prefer to walk the per-commit diffs.
  • The full "boot, close, boot again, zero /register" cross-constructor restart scenario is not exercised; closing it requires either miniredis-Sentinel emulation or a Docker-based Redis Sentinel cluster in the test harness. The wiring that the second boot would consume — the type of dcrStore being the same storage.DCRCredentialStore that authserver.New writes through — is verified at compile time by storage.Storage embedding storage.DCRCredentialStore and by TestEmbeddedAuthServer_DCRSurvivesRestart asserting the persistence boundary.
  • buildPureOAuth2Config was kept intentionally pure (no ctx, no I/O) to preserve the architectural gate established in Authserver DCR integration (Phase 2, Steps 2a-2g) #4978; the wiring change swaps the implementation passed into the resolver, not the call shape.
  • No secrets (client_secret, registration_access_token, initial_access_token, refresh tokens) appear as arguments to slog.* calls; the grep assertion from Authserver DCR integration (Phase 2, Steps 2a-2g) #4978 still applies.
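
For reviewers who want the compile-time argument spelled out, here is a minimal sketch of interface embedding replacing a runtime type assertion. The interfaces are trimmed stand-ins: PutClient, GetClient, memoryStorage and dcrStoreFrom are hypothetical names, not the real storage package API.

```go
package main

import "fmt"

// DCRCredentialStore is a trimmed stand-in for storage.DCRCredentialStore.
// The method names are hypothetical.
type DCRCredentialStore interface {
	PutClient(key, clientID string)
	GetClient(key string) (string, bool)
}

// Storage embeds DCRCredentialStore, so every Storage value is also a
// DCRCredentialStore and no runtime type assertion is needed.
type Storage interface {
	DCRCredentialStore
	Close() error
}

// memoryStorage is a toy backend used only for this sketch.
type memoryStorage struct{ clients map[string]string }

func newMemoryStorage() *memoryStorage {
	return &memoryStorage{clients: make(map[string]string)}
}

func (m *memoryStorage) PutClient(key, clientID string) { m.clients[key] = clientID }

func (m *memoryStorage) GetClient(key string) (string, bool) {
	id, ok := m.clients[key]
	return id, ok
}

func (m *memoryStorage) Close() error { return nil }

// Compile-time check: the build fails if memoryStorage ever stops
// satisfying Storage (and therefore DCRCredentialStore).
var _ Storage = (*memoryStorage)(nil)

// dcrStoreFrom needs no assertion and has no error branch: the
// conversion from Storage to DCRCredentialStore is always valid.
func dcrStoreFrom(s Storage) DCRCredentialStore { return s }

func main() {
	var backend Storage = newMemoryStorage()
	defer backend.Close()

	// A write through the derived DCR store is visible through the
	// original storage value because both refer to the same backend.
	dcrStoreFrom(backend).PutClient("https://idp.example", "client-123")
	id, ok := backend.GetClient("https://idp.example")
	fmt.Println(id, ok)
}
```

With the embedding in place, a backend that forgets the DCR methods fails to build instead of failing at boot.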

@tgrunnagle tgrunnagle changed the base branch from main to dcr-3b_issue_5184 May 5, 2026 15:48
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 5, 2026
@tgrunnagle tgrunnagle force-pushed the dcr-3b_issue_5184 branch 2 times, most recently from b0bf320 to 1736a6e Compare May 7, 2026 15:41
tgrunnagle added 2 commits May 7, 2026 09:28
Type EmbeddedAuthServer.dcrStore against storage.DCRCredentialStore and
derive it from the same storage.Storage value returned by createStorage
via a single type assertion, so a Redis-backed authserver reuses
already-registered RFC 7591 clients across replicas and restarts
instead of re-registering at every boot.

Phase 2 left two parallel DCR stores: a runner-side in-memory map in
dcr_store.go and the storage-level interface added in sub-issue 1. This
collapses the runner-side implementation into a thin storageBackedStore
adapter that delegates to storage.DCRCredentialStore, leaving exactly
one persistence implementation per backend (storage.MemoryStorage and
storage.RedisStorage).
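
A thin adapter of the kind described above might look roughly like the sketch below. The type names DCRResolution, DCRCredentials and storageBackedStore come from this PR, but every field and method shown (ClientID, ClientSecret, Save, Load, Put, Get) and the mapBackend type are assumptions made for illustration.

```go
package main

import "fmt"

// DCRCredentials is the storage-side record. Fields are illustrative.
type DCRCredentials struct {
	ClientID     string
	ClientSecret string
}

// DCRResolution is the runner-side view. Fields are illustrative.
type DCRResolution struct {
	ClientID     string
	ClientSecret string
}

// credentialBackend stands in for storage.DCRCredentialStore.
// The method names are hypothetical.
type credentialBackend interface {
	Save(key string, c DCRCredentials)
	Load(key string) (DCRCredentials, bool)
}

// mapBackend is a toy in-memory backend for this sketch.
type mapBackend map[string]DCRCredentials

func (m mapBackend) Save(key string, c DCRCredentials) { m[key] = c }

func (m mapBackend) Load(key string) (DCRCredentials, bool) {
	c, ok := m[key]
	return c, ok
}

// storageBackedStore holds no state of its own: it only delegates to
// the backend and translates between the two types at the boundary.
type storageBackedStore struct{ backend credentialBackend }

func (s storageBackedStore) Put(key string, r DCRResolution) {
	s.backend.Save(key, DCRCredentials{ClientID: r.ClientID, ClientSecret: r.ClientSecret})
}

func (s storageBackedStore) Get(key string) (DCRResolution, bool) {
	c, ok := s.backend.Load(key)
	if !ok {
		return DCRResolution{}, false
	}
	return DCRResolution{ClientID: c.ClientID, ClientSecret: c.ClientSecret}, true
}

func main() {
	backend := mapBackend{}
	store := storageBackedStore{backend: backend}

	store.Put("https://idp.example", DCRResolution{ClientID: "client-123", ClientSecret: "s3cret"})
	got, ok := store.Get("https://idp.example")
	// Print only the client ID; secrets stay out of the output.
	fmt.Println(got.ClientID, ok)
}
```

Because the adapter owns no map or mutex, swapping the backend for a Redis-backed one changes persistence behaviour without touching the resolver's call shape.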

NewInMemoryDCRCredentialStore is preserved as a test helper that wraps
storage.NewMemoryStorage so existing resolver tests compile unchanged;
the standalone inMemoryDCRCredentialStore type and its map / RWMutex
are deleted. buildPureOAuth2Config is unchanged — the wiring change
swaps the implementation passed to the resolver, not the call shape.

Add TestEmbeddedAuthServer_DCRSurvivesRestart in
embeddedauthserver_test.go (next to TestNewEmbeddedAuthServer_DCRBoot)
covering the durable-restart case: boot, close, rebuild against the
same storage.MemoryStorage instance, assert the second resolve makes
zero AS requests. The integration_test.go file under pkg/authserver
would otherwise be the natural home, but it is in package authserver
and importing runner from there would cycle (runner already imports
authserver); the test docstring records this constraint.

Fixed issues from code review of #5185 wiring change:

- HIGH: Storage backend leaked on NewEmbeddedAuthServer error paths.
  Split the constructor into a public NewEmbeddedAuthServer that calls
  createStorage and an unexported newEmbeddedAuthServerWithStorage that
  owns the cleanup contract via a deferred Close gated on a named
  return error. Verified by TestNewEmbeddedAuthServer_ClosesStorageOnError
  using a closeTrackingStorage wrapper.

- MEDIUM: Comment claimed interface embedding that did not exist.
  Embed storage.DCRCredentialStore in the storage.Storage interface
  instead, promoting the runtime type assertion to a compile-time
  guarantee (the AC's explicitly preferred outcome). The dead error
  branch and its outdated comment are gone; mocks regenerated via
  task gen.

- MEDIUM: Test placement deviated from AC instruction. Moved
  TestEmbeddedAuthServer_DCRSurvivesRestart out of the runner package
  and into a new pkg/authserver/integration_dcr_restart_test.go in
  package authserver_test, so the test lives next to the other
  pkg/authserver integration tests without inducing the runner ->
  authserver import cycle. Added a small public DCRStore() accessor
  on EmbeddedAuthServer mirroring existing IDPTokenStorage /
  UpstreamTokenRefresher accessors.

- MEDIUM: Durable-restart not exercised end-to-end. Strengthened the
  restart test to go through NewEmbeddedAuthServer for the first boot
  (full constructor path with DCR), capture the storage via the new
  DCRStore() accessor, and assert the DCR row survives the first
  server's Close. The full "boot, close, boot again, observe zero
  /register" scenario remains a documented gap (the production Redis
  path requires Sentinel which miniredis does not speak); the gap and
  the conditions under which it can be closed are recorded in the test
  docstring per the review's accept-the-gap branch.
@tgrunnagle tgrunnagle force-pushed the dcr-3c_issue_5185 branch from 781a0b9 to 565fade Compare May 7, 2026 16:33
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 7, 2026
codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 78.94737% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.89%. Comparing base (df0ad8f) to head (565fade).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/authserver/runner/embeddedauthserver.go | 56.25% | 5 Missing and 2 partials ⚠️ |
| pkg/authserver/runner/dcr_store.go | 87.80% | 3 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@                  Coverage Diff                  @@
##           dcr-3b_issue_5184    #5196      +/-   ##
=====================================================
+ Coverage              67.81%   67.89%   +0.07%     
=====================================================
  Files                    610      610              
  Lines                  62379    62414      +35     
=====================================================
+ Hits                   42302    42374      +72     
+ Misses                 16902    16858      -44     
- Partials                3175     3182       +7     


Development

Successfully merging this pull request may close these issues.

Persistent DCRCredentialStore: wire into EmbeddedAuthServer
