Add CRD-runtime drift detection test framework #5209

Open
ChrisJBurns wants to merge 8 commits into main from drift-detection-walker-and-telemetry-test

@ChrisJBurns
Collaborator

Summary

  • Silent drift between CRD types and their runtime config counterparts has caused user-facing bugs (e.g. Composite Tools: Fix plumbing for on-error-continue config #3118, Simplify VMCP Configuration #3125) where new fields appeared to work in unit tests but were not wired through the CRD conversion path. The only existing safety nets — code review and end-to-end tests — do not scale.
  • Adds a reusable reflection-based field walker and a bidirectional drift detection pattern. The pattern requires every leaf JSON field on either side of a CRD↔runtime boundary to be either explicitly mapped to the other side, or explicitly declared as intentionally unmapped with a justification string. Drift is allowed; unannounced drift is not.
  • Applies the pattern to MCPTelemetryConfig ↔ telemetry.Config as the first real-world exercise. The drift test passes against the current codebase.
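
The classification rule in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: `fieldMapping`, `unclassified`, `checkDrift`, and every field path shown are hypothetical names chosen for the example, and the leaf lists are hard-coded rather than produced by the walker.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldMapping pairs a CRD leaf path with its runtime counterpart.
// The type and all paths in this file are hypothetical.
type fieldMapping struct {
	CRD     string
	Runtime string
}

// unclassified returns the leaves that are neither mapped nor explicitly
// ignored, i.e. the unannounced drift on one side of the boundary.
func unclassified(leaves []string, mapped map[string]bool, ignored map[string]string) []string {
	var out []string
	for _, leaf := range leaves {
		if mapped[leaf] {
			continue
		}
		if _, ok := ignored[leaf]; ok {
			continue
		}
		out = append(out, leaf)
	}
	sort.Strings(out)
	return out
}

// checkDrift runs the check in both directions from one shared mapping table.
func checkDrift(crdLeaves, runtimeLeaves []string, mappings []fieldMapping,
	ignoredCRD, ignoredRuntime map[string]string) (crdDrift, runtimeDrift []string) {
	mappedCRD := map[string]bool{}
	mappedRuntime := map[string]bool{}
	for _, m := range mappings {
		mappedCRD[m.CRD] = true
		mappedRuntime[m.Runtime] = true
	}
	return unclassified(crdLeaves, mappedCRD, ignoredCRD),
		unclassified(runtimeLeaves, mappedRuntime, ignoredRuntime)
}

func main() {
	crdLeaves := []string{"openTelemetry.endpoint", "openTelemetry.insecure", "openTelemetry.newField"}
	runtimeLeaves := []string{"endpoint", "insecure", "serviceVersion"}
	mappings := []fieldMapping{
		{CRD: "openTelemetry.endpoint", Runtime: "endpoint"},
		{CRD: "openTelemetry.insecure", Runtime: "insecure"},
	}
	ignoredCRD := map[string]string{}
	ignoredRuntime := map[string]string{"serviceVersion": "hypothetical: set at build time, not user-configurable"}

	crdDrift, runtimeDrift := checkDrift(crdLeaves, runtimeLeaves, mappings, ignoredCRD, ignoredRuntime)
	fmt.Println("unclassified CRD leaves:", crdDrift)
	fmt.Println("unclassified runtime leaves:", runtimeDrift)
}
```

In a real test each non-empty result would fail the test with a message telling the author to add either a mapping or an ignore entry with a justification.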

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Tests added (all passing): walker unit tests in cmd/thv-operator/internal/testutil/reflect_test.go; drift tests in cmd/thv-operator/pkg/spectoconfig/telemetry_drift_test.go (TestTelemetryConfigDrift_CRDFieldsCovered, TestTelemetryConfigDrift_RuntimeFieldsCovered, TestTelemetryConfigDrift_MappingTableSanity).

task lint-fix could not be run locally — the installed golangci-lint v2.12.1 is built with go1.25 but the project targets go1.26. go vet and gofmt are clean on the new files. CI will run the project's pinned linter.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

Test-only change. No production code, CRD schemas, or APIs are modified.

Changes

| File | Change |
| --- | --- |
| cmd/thv-operator/internal/testutil/reflect.go | New FlattenJSONLeafFields helper that returns sorted dot-delimited JSON leaf paths for a struct type. Recurses into nested structs, deref'd pointers, and slice/map element types. Stops at primitives and a small allowlist (time.Duration, metav1.Duration, json.RawMessage, thvjson.Map/Any, vmcpconfig.Duration). Skips metav1.TypeMeta/ObjectMeta/ListMeta and unexported / json:"-" fields. Honors ,inline and Go's anonymous-field promotion. Cycle-protected. |
| cmd/thv-operator/internal/testutil/reflect_test.go | Table-driven walker tests: pointers, slice/array/map elements, embedded structs (with and without ,inline), json:"-", missing tags, omitempty, unexported fields, leaf allowlist, sorted output, nil/non-struct input. |
| cmd/thv-operator/pkg/spectoconfig/telemetry_drift_test.go | Bidirectional drift test for MCPTelemetryConfigSpec ↔ telemetry.Config. Two coverage tests (one per direction) sharing a single telemetryFieldMappings source-of-truth table, plus per-side IgnoredOnCRDOnly / IgnoredOnRuntimeOnly maps with justification strings. A third test asserts mapping-table sanity (no duplicates, no overlap with ignore lists, non-empty justifications). |

Does this introduce a user-facing change?

No.

Special notes for reviewers

The pattern is bidirectional by design — it does not mandate parity between CRD and runtime types. Either side may have fields the other lacks; the test simply forces every divergence to be explicitly classified. New entries in IgnoredOnCRDOnly / IgnoredOnRuntimeOnly require a justification string, which makes intentional decoupling visible in code review.

The walker uncovered one CRD leaf that prior manual analysis missed: openTelemetry.caBundleRef.configMapRef.optional, a *bool promoted from corev1.ConfigMapKeySelector via LocalObjectReference. It is now explicitly declared in telemetryIgnoredOnCRDOnly with a justification — exactly the pattern working as intended.

This is the first PR in a series. Follow-ups will (1) introduce a CRD-owned v1beta1.VirtualMCPConfig mirror to decouple VirtualMCPServerSpec.Config from pkg/vmcp/config.Config without changing the user-facing CRD schema, and (2) extend the drift pattern to other converter boundaries (OIDC, external auth strategies, embedded auth server config).

Implementation plan

Approved drift detection plan

PR 1 (this PR): build the reusable reflection walker + bidirectional drift test machinery, exercise it against an already-decoupled boundary (telemetry) to validate the pattern.

PR 2 (next): mirror vmcpconfig.Config top-level fields into v1beta1.VirtualMCPConfig, with nested fields still pointing at runtime types. CRD JSON schema unchanged. Replace vmcp.Spec.Config.DeepCopy() in the converter with explicit field-by-field copy at the top level.

PR 3+: add bidirectional drift tests for VirtualMCPConfig and incrementally mirror nested types (AggregationConfig, OperationalConfig, …), one per PR, with the drift test guiding each mirror.

🤖 Generated with Claude Code

Silent drift between CRD types and their runtime config counterparts has
caused user-facing bugs (e.g. PR #3118, issue #3125) where new fields
appeared to work in tests but were not wired through the CRD conversion
path. The only existing safety nets — code review and end-to-end tests —
do not scale.

Add a reusable reflection-based field walker and a bidirectional drift
test pattern. The pattern requires every leaf JSON field on either side
of a CRD/runtime boundary to be either explicitly mapped to the other
side or explicitly declared as intentionally unmapped, with a
justification string. Adding a field to either type without classifying
it fails the test with a clear action-required message.

Apply the pattern to MCPTelemetryConfig <-> telemetry.Config as the
first real-world exercise. The drift test passes against the current
codebase and surfaced one previously-undocumented leaf
(openTelemetry.caBundleRef.configMapRef.optional) which is now
explicitly declared.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 6, 2026
@codecov
codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 93.54839% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.82%. Comparing base (6b39256) to head (75ce1dd).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cmd/thv-operator/internal/testutil/reflect.go | 93.54% | 3 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5209      +/-   ##
==========================================
- Coverage   67.83%   67.82%   -0.01%     
==========================================
  Files         610      611       +1     
  Lines       62303    62396      +93     
==========================================
+ Hits        42262    42322      +60     
- Misses      16875    16896      +21     
- Partials     3166     3178      +12     


ChrisJBurns and others added 2 commits May 7, 2026 01:13
Fix lint failures from CI: rename reflect.Ptr to reflect.Pointer,
suppress exhaustive on the kind switch with rationale, and tag the
,inline json fixtures as a known revive false positive.

Replace the explicit leafTypes allowlist with a json.Marshaler interface
check. Every entry in the previous map shared one property — a custom
MarshalJSON whose output bears no relation to the Go field layout — so
asking the type how it serializes is more general than maintaining a
hand-rolled list of K8s and project types. The walker is now
self-maintaining for any future custom-marshaled type.

Address reviewer findings on telemetry_drift_test.go:
  - delete dead sortedKeys helper and its misleading comment
  - upgrade per-entry empty-field checks to require to avoid empty-string
    pollution of the duplicate-detection maps
  - assert no path appears in both ignore maps (cross-pollination)
  - assert every mapping/ignore entry is still a live leaf on its type
    so renames and deletions surface instead of leaving stale entries
  - rewrite caCertPath, sensitiveHeaders, and serviceName justifications
    to name the actual wiring symbols a reviewer can grep for
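
The sanity assertions listed above could look roughly like this. It is a sketch under stated assumptions: `fieldMapping`, `sanityProblems`, `checkIgnored`, and all paths are hypothetical, the sketch collects problems instead of stopping at the first one as require would, and it treats duplicates on either side of a mapping as an error.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldMapping and every path used below are hypothetical.
type fieldMapping struct{ CRD, Runtime string }

func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, it := range items {
		set[it] = true
	}
	return set
}

// checkIgnored validates one ignore map against its live leaves and,
// when other is non-nil, against the opposite side's ignore map.
func checkIgnored(side string, ignored, other map[string]string, live map[string]bool) []string {
	var problems []string
	for path, why := range ignored {
		if why == "" {
			problems = append(problems, side+" ignore entry lacks a justification: "+path)
		}
		if !live[path] {
			problems = append(problems, side+" ignore entry is not a live leaf: "+path)
		}
		if _, both := other[path]; both {
			problems = append(problems, "path appears in both ignore maps: "+path)
		}
	}
	return problems
}

func sanityProblems(mappings []fieldMapping, ignoredCRD, ignoredRuntime map[string]string,
	crdLeaves, runtimeLeaves []string) []string {
	crdLive, rtLive := toSet(crdLeaves), toSet(runtimeLeaves)
	seenCRD, seenRT := map[string]bool{}, map[string]bool{}
	var problems []string
	for i, m := range mappings {
		if m.CRD == "" || m.Runtime == "" {
			problems = append(problems, fmt.Sprintf("mapping %d has an empty side", i))
			continue // keep "" out of the duplicate-detection maps
		}
		if seenCRD[m.CRD] || seenRT[m.Runtime] {
			problems = append(problems, fmt.Sprintf("mapping %d duplicates an earlier entry", i))
		}
		seenCRD[m.CRD], seenRT[m.Runtime] = true, true
		_, crdIgnored := ignoredCRD[m.CRD]
		_, rtIgnored := ignoredRuntime[m.Runtime]
		if crdIgnored || rtIgnored {
			problems = append(problems, fmt.Sprintf("mapping %d is also listed as ignored", i))
		}
		if !crdLive[m.CRD] || !rtLive[m.Runtime] {
			problems = append(problems, fmt.Sprintf("mapping %d names a path that is no longer a leaf", i))
		}
	}
	problems = append(problems, checkIgnored("CRD", ignoredCRD, ignoredRuntime, crdLive)...)
	// Pass nil so a path in both ignore maps is reported only once.
	problems = append(problems, checkIgnored("runtime", ignoredRuntime, nil, rtLive)...)
	sort.Strings(problems)
	return problems
}

func main() {
	problems := sanityProblems(
		[]fieldMapping{{"openTelemetry.endpoint", "endpoint"}, {"openTelemetry.renamed", "insecure"}},
		map[string]string{"openTelemetry.caBundleRef.optional": ""},
		map[string]string{"serviceVersion": "hypothetical: derived at build time"},
		[]string{"openTelemetry.endpoint", "openTelemetry.insecure", "openTelemetry.caBundleRef.optional"},
		[]string{"endpoint", "insecure", "serviceVersion"},
	)
	for _, p := range problems {
		fmt.Println(p)
	}
}
```

The live-leaf check is what makes renames and deletions surface: a stale path in the table no longer matches anything the walker reports.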

Tighten the FlattenJSONLeafFields godoc to describe the one-level
self-reference expansion that maxStructRevisits actually permits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Simplify cycle detection: stop on first revisit instead of allowing one
self-referential expansion. The "expand once" gold-plating produced a
"next.<field>" entry on linked-list-shaped types, but no real CRD or
runtime config has cyclic shape, and stop-on-revisit is the simpler
contract for drift detection. The visited tracker becomes a
map[reflect.Type]struct{} and maxStructRevisits goes away.

Compress the FlattenJSONLeafFields godoc from a 9-bullet semantics list
into one paragraph. Most of the bullets are now subsumed by the single
"json.Marshaler => leaf, otherwise recurse on Struct/Slice/Array/Map"
rule. The detailed encoding/json reference belongs in the standard
library, not here.

Drop redundant subtests in TestFlattenJSONLeafFields. After the
json.Marshaler refactor, the json.RawMessage / thvjson.Map+Any /
vmcpconfig.Duration cases all exercise the same short-circuit branch as
metav1.Duration, so one Marshaler example is enough. Drop slice-of-
pointer-to-struct (covered by combining the pointer-deref and slice-of-
struct cases). Fold the dedicated PointerInputDereferenced test into the
table. Remove TestFlattenJSONLeafFields_OutputIsSorted: the table-
driven test already pins exact sorted-string equality on every case.

Update the recursive-type expectation to ["name"] to lock in the new
stop-on-revisit semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 7, 2026
golangci-lint v2 doesn't honor the //exhaustive:ignore source directive
in this configuration; the project-wide nolint convention works. The
switch genuinely falls through to a default leaf branch for every Kind
not listed (primitives, interfaces, channels, etc.) — the suppression
is a confirmed false positive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>