fix: cleanup orphaned components when parent cluster is missing #29

Open
wallyxjh wants to merge 3 commits into labring:fix/v0.9.3 from wallyxjh:fix/orphaned-component-cleanup

@wallyxjh

Collaborator

Problem

PR #18 fixed orphaned Component cleanup for the Terminating case (Components with deletionTimestamp), but did not cover the Running orphan case.

When a Cluster is deleted but its owned Components still exist without a deletionTimestamp, the componentLoadResourcesTransformer enters an infinite 1-second requeue loop:

build error: requeue after: 1s as: failed to get cluster test-db: clusters.apps.kubeblocks.io "test-db" not found

This happens when Kubernetes GC fails to complete cascade deletion (for example, because of timing issues or a force delete). The orphaned Components are then reconciled once per second indefinitely, which wastes controller resources and causes log spam across all affected namespaces.

Root Cause

In transformer_component_load_resources.go, when the Cluster is not found, the code unconditionally returns newRequeueError(requeueDuration, err.Error()) without checking whether the Component is an orphan:

cluster := &appsv1alpha1.Cluster{}
err = transCtx.Client.Get(..., cluster)
if err != nil {
    return newRequeueError(requeueDuration, err.Error()) // infinite 1s retry
}

The componentDeletionTransformer (fixed in PR #18) is skipped entirely because these Components have no deletionTimestamp.

Solution

Added orphan detection in componentLoadResourcesTransformer:

  1. When Cluster returns NotFound, check if the Component has an ownerReference pointing to that Cluster (confirming it is an orphan).
  2. If orphaned, trigger deletion via client.Delete() to set deletionTimestamp.
  3. On the next reconcile, componentDeletionTransformer (with the fix from PR #18, "fix: handle component deletion when parent cluster is missing") handles the actual cleanup.
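The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `ownerRef`, `componentMeta`, `store`, `isOrphaned`, and `loadCluster` are hypothetical names, and the function-valued `store` fields stand in for the Kubernetes client's Get and Delete calls. The actual isOrphanedComponent() and handleOrphanedComponent() may check more than is shown here.

```go
package main

import "errors"

// ownerRef and componentMeta are simplified, hypothetical stand-ins for
// metav1.OwnerReference and the Component's object metadata.
type ownerRef struct {
	Kind string
	Name string
}

type componentMeta struct {
	Name      string
	OwnerRefs []ownerRef
}

// errNotFound stands in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// store is a hypothetical stand-in for the client injected into the transformer.
type store struct {
	getCluster      func(name string) error
	deleteComponent func(c *componentMeta) error
}

// isOrphaned reports whether c declares the named Cluster as an owner.
// Combined with a NotFound on that Cluster, this marks c as an orphan
// rather than a misconfigured Component.
func isOrphaned(c *componentMeta, clusterName string) bool {
	for _, ref := range c.OwnerRefs {
		if ref.Kind == "Cluster" && ref.Name == clusterName {
			return true
		}
	}
	return false
}

// loadCluster returns true when it requested deletion of an orphaned
// Component. Any other failure is returned so the caller can requeue as before.
func loadCluster(s store, c *componentMeta, clusterName string) (bool, error) {
	err := s.getCluster(clusterName)
	if err == nil {
		return false, nil
	}
	if errors.Is(err, errNotFound) && isOrphaned(c, clusterName) {
		// Deleting only sets the deletion timestamp; the deletion path
		// performs the actual cleanup on the next reconcile.
		if delErr := s.deleteComponent(c); delErr != nil {
			return false, delErr
		}
		return true, nil
	}
	return false, err
}
```

Requiring the owner reference before deleting is the conservative choice: a Component that merely names a nonexistent Cluster keeps the old requeue behavior instead of being removed.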

Changes

  • controllers/apps/transformer_component_load_resources.go: Add isOrphanedComponent() and handleOrphanedComponent(), inject client.Client for direct API calls.
  • controllers/apps/component_controller.go: Pass Client to componentLoadResourcesTransformer.

When a Cluster is deleted but its owned Components still exist without a
deletionTimestamp (Running orphans), the componentLoadResourcesTransformer
enters an infinite 1s requeue loop because it cannot find the Cluster.

PR labring#18 fixed the Terminating orphan case (Components with deletionTimestamp)
by handling NotFound in componentDeletionTransformer. However, it did not
cover the Running orphan case where the Component has no deletionTimestamp
set - this happens when Kubernetes GC fails to complete the cascade deletion.

This fix adds orphan detection in componentLoadResourcesTransformer:
1. When Cluster is NotFound, check if the Component has an ownerReference
   pointing to that Cluster (confirming it's an orphan, not a misconfigured
   Component).
2. If orphaned, trigger deletion via client.Delete() to set deletionTimestamp.
3. On the next reconcile, componentDeletionTransformer (with PR labring#18 fix)
   handles the actual cleanup.

This eliminates the infinite requeue loop that wastes controller resources
and causes log spam across all affected namespaces.
@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 14, 2026