Background
Issue #145 introduced deactivation pruning at the HTTPRoute/TLSRoute `parent == nil` branches (commits `1a2d934`, `38e7f39`). The prune deletes all previously emitted `CloudflareDNSRecord` CRs labelled with the source's identity.
`findTunnelTargetedParentRef` (`internal/controller/tunnel/attach.go:516-551`) iterates parentRefs and `continue`s past parents that either don't exist or fail to Get for any other reason: a single `continue` handles both `apierrors.IsNotFound` and transient errors (apiserver glitch, cache resync hole, etc.). If every candidate parent fails, the function returns `(nil parent, nil err)`, and the reconcile falls into the new deactivation-prune branch.
Before #145's fix this was harmless: orphans would just persist for a reconcile. After the fix, a transient Get error on the parent Gateway can spuriously delete every CR the source ever emitted, triggering a delete-and-reemit churn that briefly tears down the Cloudflare-side DNS + TXT records.
Why this isn't urgent
- controller-runtime's cache normally returns `NotFound` (not network errors) for transient apiserver hiccups, so the spurious-prune path is largely theoretical.
- `client.IgnoreNotFound` in the pruner makes the prune itself race-safe.
- The CR's finalizer chain rebuilds state on the next reconcile.
The risk is observability churn (events, briefly missing Cloudflare-side records), not data loss. Documented as design open question §2 in `docs/plans/2026-05-28-source-controller-orphan-gc-design.md`.
Proposed fix
In `attach.go:529` and `:541` (the two `continue` sites in `findTunnelTargetedParentRef`), branch on the error type:
- `apierrors.IsNotFound(err)` → `continue` (parent definitively doesn't exist).
- Any other error → return the error from `findTunnelTargetedParentRef` so the reconcile can requeue and try again, instead of reporting "no tunnel-targeted parent" to the call site.
Then update the HTTPRoute and TLSRoute reconcile paths (`httproute_source_controller.go:120-122` and `tlsroute_source_controller.go:118-120`) to surface the non-NotFound error case as a requeue (`return reconcile.Result{}, err`) instead of falling into the deactivation prune.
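A minimal, self-contained sketch of the proposed split follows. It is not the code in `attach.go`: `errNotFound` and `errTransient` are hypothetical stand-ins for an `apierrors.IsNotFound` error and a transient Get failure, `getter` stands in for the client Get on a parent, and `reconcileAction` is an illustrative reduction of the HTTPRoute/TLSRoute call sites to the three outcomes that matter here.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins: errNotFound models an error for which
// apierrors.IsNotFound would be true; errTransient models any other Get failure.
var (
	errNotFound  = errors.New("not found")
	errTransient = errors.New("transient apiserver error")
)

// getter is a hypothetical stand-in for fetching a parent and checking
// whether it targets the tunnel.
type getter func(name string) (tunnelTargeted bool, err error)

// findTunnelTargetedParent skips parents that definitively do not exist and
// returns any other error to the caller instead of swallowing it.
func findTunnelTargetedParent(parents []string, get getter) (string, error) {
	for _, p := range parents {
		targeted, err := get(p)
		if errors.Is(err, errNotFound) {
			continue // parent definitively doesn't exist
		}
		if err != nil {
			return "", fmt.Errorf("get parent %q: %w", p, err)
		}
		if targeted {
			return p, nil
		}
	}
	return "", nil // no tunnel-targeted parent
}

// reconcileAction reduces the call site to its three outcomes.
func reconcileAction(parents []string, get getter) string {
	parent, err := findTunnelTargetedParent(parents, get)
	switch {
	case err != nil:
		return "requeue" // corresponds to: return reconcile.Result{}, err
	case parent == "":
		return "prune" // deactivation prune only when no parent exists
	default:
		return "emit"
	}
}

// demoGet is a fake getter used only for illustration.
func demoGet(name string) (bool, error) {
	switch name {
	case "missing":
		return false, errNotFound
	case "flaky":
		return false, errTransient
	case "tunnel":
		return true, nil
	}
	return false, nil
}

func main() {
	fmt.Println(reconcileAction([]string{"missing"}, demoGet))           // prune
	fmt.Println(reconcileAction([]string{"flaky", "tunnel"}, demoGet))   // requeue
	fmt.Println(reconcileAction([]string{"missing", "tunnel"}, demoGet)) // emit
}
```

Note that in this sketch a transient error on an earlier parentRef causes a requeue even if a later parentRef would have matched; that follows the proposal as written and trades one extra reconcile for never pruning on incomplete information.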
When to revisit
Field evidence: an operator reports a CR briefly disappearing without a deliberate annotation change. Logs from that window would not contain the `orphan-prune failed during deactivation sweep` message (because the prune call succeeded) and would show the CR being re-emitted a moment later.
Surfaced by
Independent comprehensive review of the #145 fix branch (`feature/source-controller-orphan-gc`).