Skip to content

e2e framework: AssertResourceAbsent silently passes when resource still exists #1082

@DavidRajnoha

Description

@DavidRajnoha

Bug

Framework.AssertResourceAbsent and Framework.AssertStatefulSetContainerHasArg in test/e2e/framework/assertions.go silently pass when they should fail. Any e2e test relying on these assertions may be producing false positives.

Root cause

Both functions share the same bug pattern: in their wait.PollUntilContextTimeout condition function, they return (false, fmt.Errorf(...)) for the "not ready yet" case. Per the PollUntilContextCancel contract, returning a non-nil error terminates polling immediately and that error is returned to the caller. The caller then only checks wait.Interrupted(err), which returns true only for context cancellation/deadline — not for a condition error. So the error is silently discarded and t.Fatal is never called.

Affected functions

1. AssertResourceAbsent (line 107)

// When resource IS present (should keep polling):
return false, fmt.Errorf("resource %s/%s should not be present", namespace, name)

On the first poll, if the resource exists, the error terminates the loop. wait.Interrupted returns false, so the test silently passes in ~0.1s without ever waiting for deletion.

Callers affected:

  • monitoring_stack_controller_test.go:466 — PodDisruptionBudget absence
  • monitoring_stack_controller_test.go:490 — PodDisruptionBudget absence
  • monitoring_stack_controller_test.go:615 — Pod absence
  • monitoring_stack_controller_test.go:1510 — Prometheus ClusterRoleBinding absence
  • monitoring_stack_controller_test.go:1511 — Alertmanager ClusterRoleBinding absence
  • AssertClusterRoleBindingAbsent (line 753) — delegates to AssertResourceAbsent

2. AssertStatefulSetContainerHasArg (line 194)

// When container is not found in StatefulSet template:
if container == nil {
    return false, fmt.Errorf("container %q not found in StatefulSet template", containerName)
}

If the container doesn't exist yet (e.g. StatefulSet hasn't been updated), the error terminates polling immediately. wait.Interrupted returns false on line 205, so t.Fatalf is skipped and the test silently passes.

Fix

For both functions, change the "not ready yet" returns from (false, fmt.Errorf(...)) to (false, nil):

AssertResourceAbsent line 107:

// Before:
return false, fmt.Errorf("resource %s/%s should not be present", namespace, name)
// After:
return false, nil

AssertStatefulSetContainerHasArg line 194:

// Before:
return false, fmt.Errorf("container %q not found in StatefulSet template", containerName)
// After:
return false, nil

Returning (false, nil) tells the poller "condition not met, keep trying", so it continues polling until success or timeout. On timeout, wait.Interrupted correctly returns true and t.Fatal fires.

Full audit of assertions.go

Function Poll return pattern Error check Correct?
AssertResourceNeverExists (line 58) (true, err) when found !wait.Interrupted Yes (inverted logic)
AssertResourceAbsent (line 87) (false, err) when present wait.Interrupted BUG
AssertResourceEventuallyExists (line 114) (false, nil) when missing wait.Interrupted Yes
AssertStatefulsetReady (line 141) (..., nil) always err != nil Yes
AssertStatefulSetContainerHasArg (line 163) (false, err) when no container wait.Interrupted BUG
AssertDeploymentReady (line 212) (..., nil) always err != nil Yes
AssertDeploymentReadyAndStable (line 234) (..., nil) always err != nil Yes
GetResourceWithRetry (line 261) (false, nil) when missing wait.Interrupted Yes
AssertAlertmanagerAbsent (line 690) (false, nil) when present wait.Interrupted Yes
AssertPrometheusReplicaStatus (line 718) (false, nil) always wait.Interrupted Yes
AssertClusterRoleBindingAbsent (line 753) delegates to AssertResourceAbsent BUG (inherited)

Note on AssertResourceNeverExists

This function works correctly but relies on a subtle interaction: line 78 returns (true, fmt.Errorf(...)) when the resource is found. With PollUntilContextCancel, the true/false is ignored when an error is returned — the error alone causes the loop to stop. It works because the outer check uses !wait.Interrupted(err) (negated), so any non-interrupted error triggers t.Fatal. But it would be clearer to return (false, fmt.Errorf(...)) since the boolean is meaningless when an error is present.

Discovered while

Writing the TestUIPluginUninstallCleanup e2e test for UIPlugin uninstall cleanup (COO-1404). All 8 resource-absence subtests falsely passed in ~0.11s each while oc get confirmed every resource was still alive.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions