Bug
Framework.AssertResourceAbsent and Framework.AssertStatefulSetContainerHasArg in test/e2e/framework/assertions.go silently pass when they should fail. Any e2e test relying on these assertions may be producing false positives.
Root cause
Both functions share the same bug pattern: in their wait.PollUntilContextTimeout condition function, they return (false, fmt.Errorf(...)) for the "not ready yet" case. Per the PollUntilContextCancel contract, returning a non-nil error terminates polling immediately and that error is returned to the caller. The caller then only checks wait.Interrupted(err), which returns true only for context cancellation/deadline — not for a condition error. So the error is silently discarded and t.Fatal is never called.
Affected functions
1. AssertResourceAbsent (line 107)
// When resource IS present (should keep polling):
return false, fmt.Errorf("resource %s/%s should not be present", namespace, name)
On the first poll, if the resource exists, the error terminates the loop. wait.Interrupted returns false, so the test silently passes in ~0.1s without ever waiting for deletion.
Callers affected:
monitoring_stack_controller_test.go:466 — PodDisruptionBudget absence
monitoring_stack_controller_test.go:490 — PodDisruptionBudget absence
monitoring_stack_controller_test.go:615 — Pod absence
monitoring_stack_controller_test.go:1510 — Prometheus ClusterRoleBinding absence
monitoring_stack_controller_test.go:1511 — Alertmanager ClusterRoleBinding absence
AssertClusterRoleBindingAbsent (line 753) — delegates to AssertResourceAbsent
2. AssertStatefulSetContainerHasArg (line 194)
// When container is not found in StatefulSet template:
if container == nil {
return false, fmt.Errorf("container %q not found in StatefulSet template", containerName)
}
If the container doesn't exist yet (e.g. StatefulSet hasn't been updated), the error terminates polling immediately. wait.Interrupted returns false on line 205, so t.Fatalf is skipped and the test silently passes.
Fix
For both functions, change the "not ready yet" returns from (false, fmt.Errorf(...)) to (false, nil):
AssertResourceAbsent line 107:
// Before:
return false, fmt.Errorf("resource %s/%s should not be present", namespace, name)
// After:
return false, nil
AssertStatefulSetContainerHasArg line 194:
// Before:
return false, fmt.Errorf("container %q not found in StatefulSet template", containerName)
// After:
return false, nil
Returning (false, nil) tells the poller "condition not met, keep trying", so it continues polling until success or timeout. On timeout, wait.Interrupted correctly returns true and t.Fatal fires.
Full audit of assertions.go
| Function |
Poll return pattern |
Error check |
Correct? |
AssertResourceNeverExists (line 58) |
(true, err) when found |
!wait.Interrupted |
Yes (inverted logic) |
AssertResourceAbsent (line 87) |
(false, err) when present |
wait.Interrupted |
BUG |
AssertResourceEventuallyExists (line 114) |
(false, nil) when missing |
wait.Interrupted |
Yes |
AssertStatefulsetReady (line 141) |
(..., nil) always |
err != nil |
Yes |
AssertStatefulSetContainerHasArg (line 163) |
(false, err) when no container |
wait.Interrupted |
BUG |
AssertDeploymentReady (line 212) |
(..., nil) always |
err != nil |
Yes |
AssertDeploymentReadyAndStable (line 234) |
(..., nil) always |
err != nil |
Yes |
GetResourceWithRetry (line 261) |
(false, nil) when missing |
wait.Interrupted |
Yes |
AssertAlertmanagerAbsent (line 690) |
(false, nil) when present |
wait.Interrupted |
Yes |
AssertPrometheusReplicaStatus (line 718) |
(false, nil) always |
wait.Interrupted |
Yes |
AssertClusterRoleBindingAbsent (line 753) |
delegates to AssertResourceAbsent |
— |
BUG (inherited) |
Note on AssertResourceNeverExists
This function works correctly but relies on a subtle interaction: line 78 returns (true, fmt.Errorf(...)) when the resource is found. With PollUntilContextCancel, the true/false is ignored when an error is returned — the error alone causes the loop to stop. It works because the outer check uses !wait.Interrupted(err) (negated), so any non-interrupted error triggers t.Fatal. But it would be clearer to return (false, fmt.Errorf(...)) since the boolean is meaningless when an error is present.
Discovered while
Writing the TestUIPluginUninstallCleanup e2e test for UIPlugin uninstall cleanup (COO-1404). All 8 resource-absence subtests falsely passed in ~0.11s each while oc get confirmed every resource was still alive.
Bug
Framework.AssertResourceAbsentandFramework.AssertStatefulSetContainerHasArgintest/e2e/framework/assertions.gosilently pass when they should fail. Any e2e test relying on these assertions may be producing false positives.Root cause
Both functions share the same bug pattern: in their
wait.PollUntilContextTimeoutcondition function, they return(false, fmt.Errorf(...))for the "not ready yet" case. Per thePollUntilContextCancelcontract, returning a non-nil error terminates polling immediately and that error is returned to the caller. The caller then only checkswait.Interrupted(err), which returnstrueonly for context cancellation/deadline — not for a condition error. So the error is silently discarded andt.Fatalis never called.Affected functions
1.
AssertResourceAbsent(line 107)On the first poll, if the resource exists, the error terminates the loop.
wait.Interruptedreturnsfalse, so the test silently passes in ~0.1s without ever waiting for deletion.Callers affected:
monitoring_stack_controller_test.go:466— PodDisruptionBudget absencemonitoring_stack_controller_test.go:490— PodDisruptionBudget absencemonitoring_stack_controller_test.go:615— Pod absencemonitoring_stack_controller_test.go:1510— Prometheus ClusterRoleBinding absencemonitoring_stack_controller_test.go:1511— Alertmanager ClusterRoleBinding absenceAssertClusterRoleBindingAbsent(line 753) — delegates toAssertResourceAbsent2.
AssertStatefulSetContainerHasArg(line 194)If the container doesn't exist yet (e.g. StatefulSet hasn't been updated), the error terminates polling immediately.
wait.Interruptedreturnsfalseon line 205, sot.Fatalfis skipped and the test silently passes.Fix
For both functions, change the "not ready yet" returns from
(false, fmt.Errorf(...))to(false, nil):AssertResourceAbsentline 107:AssertStatefulSetContainerHasArgline 194:Returning
(false, nil)tells the poller "condition not met, keep trying", so it continues polling until success or timeout. On timeout,wait.Interruptedcorrectly returnstrueandt.Fatalfires.Full audit of assertions.go
AssertResourceNeverExists(line 58)(true, err)when found!wait.InterruptedAssertResourceAbsent(line 87)(false, err)when presentwait.InterruptedAssertResourceEventuallyExists(line 114)(false, nil)when missingwait.InterruptedAssertStatefulsetReady(line 141)(..., nil)alwayserr != nilAssertStatefulSetContainerHasArg(line 163)(false, err)when no containerwait.InterruptedAssertDeploymentReady(line 212)(..., nil)alwayserr != nilAssertDeploymentReadyAndStable(line 234)(..., nil)alwayserr != nilGetResourceWithRetry(line 261)(false, nil)when missingwait.InterruptedAssertAlertmanagerAbsent(line 690)(false, nil)when presentwait.InterruptedAssertPrometheusReplicaStatus(line 718)(false, nil)alwayswait.InterruptedAssertClusterRoleBindingAbsent(line 753)AssertResourceAbsentNote on
AssertResourceNeverExistsThis function works correctly but relies on a subtle interaction: line 78 returns
(true, fmt.Errorf(...))when the resource is found. WithPollUntilContextCancel, thetrue/falseis ignored when an error is returned — the error alone causes the loop to stop. It works because the outer check uses!wait.Interrupted(err)(negated), so any non-interrupted error triggerst.Fatal. But it would be clearer to return(false, fmt.Errorf(...))since the boolean is meaningless when an error is present.Discovered while
Writing the
TestUIPluginUninstallCleanupe2e test for UIPlugin uninstall cleanup (COO-1404). All 8 resource-absence subtests falsely passed in ~0.11s each whileoc getconfirmed every resource was still alive.