Use localhost image reference in PodObservedGenerationTracking test#2600
Chandan9112 wants to merge 1 commit into openshift:master
The test uses an invalid image to induce a pull error. On bare-metal environments the previous image name (some-image-that-doesnt-exist) triggers slow DNS/registry resolution, so the test hits its 30s timeout. Prefixing the name with localhost/ makes the pull fail immediately, which avoids the flaky timeouts.
@Chandan9112: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Chandan9112.
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6 10
@Chandan9112: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/137040c0-1393-11f1-9f5a-64e05975c424-0
@Chandan9112: The following tests failed, say
Upstream PR: kubernetes#137252
This is the downstream counterpart to validate the fix on bare-metal environments.
What this PR does / why we need it:
The PodObservedGenerationTracking e2e test creates a pod with an invalid image
(some-image-that-doesnt-exist) to induce a pull error, then updates the image to a
valid one and verifies that the pod conditions' observedGeneration field increments
from 1 to 2.
On bare metal, the kubelet/CRI-O spends too long trying to resolve the
non-existent image reference against external registries. DNS resolution and registry
lookups often take longer than the test's 30-second timeout, so the observedGeneration
update is not observed in time, causing flaky failures in
periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6.
Changes:
Changed the invalid image reference from some-image-that-doesnt-exist to
localhost/some-image-that-doesnt-exist, which fails instantly because there is no
container registry running on localhost. This eliminates the dependency on external
DNS/registry resolution timing.
File changed: test/e2e/node/pods.go
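To illustrate why the prefix matters, here is a simplified model of how an unqualified image name may be expanded into several registry lookups while a host-qualified one is not. It is a sketch only, not the code changed in this PR and not CRI-O's implementation: the function names and the search registries (registry.example.com, mirror.example.org) are hypothetical, and real behavior depends on the node's registry configuration. The host check follows the common container-reference convention that a first path component containing "." or ":", or equal to "localhost", is treated as a registry host.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRegistryHost reports whether the first path component of ref looks
// like a registry host (contains "." or ":", or is exactly "localhost").
func hasRegistryHost(ref string) bool {
	first, _, found := strings.Cut(ref, "/")
	if !found {
		return false // no "/" at all: a bare short name
	}
	return first == "localhost" || strings.ContainsAny(first, ".:")
}

// pullCandidates (hypothetical) lists the fully qualified references a
// runtime might try for ref, given a list of search registries.
func pullCandidates(ref string, searchRegistries []string) []string {
	if hasRegistryHost(ref) {
		return []string{ref} // already qualified: exactly one target
	}
	out := make([]string, 0, len(searchRegistries))
	for _, r := range searchRegistries {
		out = append(out, r+"/"+ref)
	}
	return out
}

func main() {
	// Assumption: example search registries, not taken from any real cluster.
	search := []string{"registry.example.com", "mirror.example.org"}

	fmt.Println(pullCandidates("some-image-that-doesnt-exist", search))
	fmt.Println(pullCandidates("localhost/some-image-that-doesnt-exist", search))
}
```

In this model the old name fans out to one external lookup per search registry, each subject to DNS and network latency, whereas the localhost/ form has a single local target.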
Testing:
Tested against a live cluster — the test passed in ~8 seconds (previously timing out at 30s).
Triggering bare-metal test via /payload-aggregate to confirm the fix on metal.
Test output (from GCP cluster):
[sig-node] Pods Extended (pod generation) Pod Generation
pod observedGeneration field set in pod conditions
STEP: creating the pod
STEP: submitting the pod to kubernetes
STEP: verifying the pod conditions have observedGeneration values
STEP: updating pod to have a valid image
STEP: verifying the pod conditions have updated observedGeneration values
STEP: deleting the pod
[8.041 seconds]
Ran 1 of 7432 Specs in 11.278 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 7431 Skipped
--- PASS: TestE2E (11.37s)
PASS
cc: @bitoku