OCPBUGS-61828: refactor FeatureGate status check#6862
OCPBUGS-61828: refactor FeatureGate status check#6862openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
WalkthroughCentralizes feature-gate verification into a new helper EnsureFeatureGateStatus and integrates it in two e2e tests. Removes inline FeatureGate/ClusterVersion checks from the control-plane upgrade test, captures guestClient in cluster creation test, and introduces the helper implementation in util. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@sjenning: This pull request references Jira Issue OCPBUGS-61828, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The existing check does not work with the pruning behavior of kas-bootstrapper. The only version assured to be in the FeatureGate status is the most recent Completed version in the ClusterVersion history i.e. the currently running version.
788c33c to
d264903
Compare
|
/test verify |
|
/jira refresh |
|
@sjenning: This pull request references Jira Issue OCPBUGS-61828, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
test/e2e/util/util.go (2)
953-987: Harden against eventual consistency: poll until FeatureGate reflects current Completed versionDirect Get+Expect can flake shortly after rollout; wrap the cross‑resource check in a retry loop and only fail after a reasonable timeout.
Apply this diff:
func EnsureFeatureGateStatus(t *testing.T, ctx context.Context, guestClient crclient.Client) { t.Run("EnsureFeatureGateStatus", func(t *testing.T) { AtLeast(t, Version419) - g := NewWithT(t) - - clusterVersion := &configv1.ClusterVersion{} - err := guestClient.Get(ctx, crclient.ObjectKey{Name: "version"}, clusterVersion) - g.Expect(err).NotTo(HaveOccurred(), "failed to get ClusterVersion resource") - - featureGate := &configv1.FeatureGate{} - err = guestClient.Get(ctx, crclient.ObjectKey{Name: "cluster"}, featureGate) - g.Expect(err).NotTo(HaveOccurred(), "failed to get FeatureGate resource") - - // Expect at least one entry in ClusterVersion history - g.Expect(len(clusterVersion.Status.History)).To(BeNumerically(">", 0), "ClusterVersion history is empty") - currentVersion := clusterVersion.Status.History[0].Version - - // Expect current version to be in Completed state - g.Expect(clusterVersion.Status.History[0].State).To(Equal(configv1.CompletedUpdate), "most recent ClusterVersion history entry is not in Completed state") - - // Ensure that the current version in ClusterVersion is also present in FeatureGate status - versionFound := false - for _, details := range featureGate.Status.FeatureGates { - if details.Version == currentVersion { - versionFound = true - break - } - } - g.Expect(versionFound).To(BeTrue(), "current version %s from ClusterVersion not found in FeatureGate status", currentVersion) + g := NewWithT(t) + err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) { + cv := &configv1.ClusterVersion{} + if getErr := guestClient.Get(ctx, crclient.ObjectKey{Name: "version"}, cv); getErr != nil { + t.Logf("retrying ClusterVersion get: %v", getErr) + return false, nil + } + if len(cv.Status.History) == 0 { + return false, nil + } + if cv.Status.History[0].State != configv1.CompletedUpdate { + // Do not assert mid‑rollout; keep polling until Completed. + return false, nil + } + currentVersion := cv.Status.History[0].Version + + fg := &configv1.FeatureGate{} + if getErr := guestClient.Get(ctx, crclient.ObjectKey{Name: "cluster"}, fg); getErr != nil { + t.Logf("retrying FeatureGate get: %v", getErr) + return false, nil + } + for _, details := range fg.Status.FeatureGates { + if details.Version == currentVersion { + return true, nil + } + } + return false, nil + }) + g.Expect(err).NotTo(HaveOccurred(), "current ClusterVersion not reflected in FeatureGate status") }) }
953-987: Improve failure diagnostics (optional)On final failure, consider logging the FeatureGate.Status.FeatureGates slice to aid triage.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
test/e2e/control_plane_upgrade_test.go(1 hunks)test/e2e/create_cluster_test.go(2 hunks)test/e2e/util/util.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
- GitHub Check: Cursor Bugbot
🔇 Additional comments (2)
test/e2e/create_cluster_test.go (1)
1848-1849: LGTM: capture guest client and use centralized feature‑gate checkThis reduces duplication and aligns with the pruning behavior constraints.
Also applies to: 1874-1875
test/e2e/control_plane_upgrade_test.go (1)
71-71: LGTM: replace inline verification with helperCentralizing the FeatureGate vs ClusterVersion validation keeps upgrade tests consistent and future‑proof.
wking
left a comment
There was a problem hiding this comment.
Looks good to me; thanks!
Eventually it would be nice to have a monitor watching/polling FeatureGates to show that we see the mid-update behavior we expect (both vA and vB in FeatureGate status while A->B was in progress, with A being pruned shortly after the update completed). But having reliable tests in the short term is worth deferring a refactor of that size.
And having the tests have a clearer idea of when the post-update kas-bootstrap would come around and prune the outgoing version, once the update completed, would also be nice, to confirm that that pruning was functioning. But again, not a blocker, and we want this change to firm up CI reliability now.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sjenning, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified later by observing e2e-aws for 4.21 and 4.20 post-merge and ensure the flake does not occur |
|
@sjenning: Only users can be targets for the DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified later @sjenning |
|
@sjenning: This PR has been marked to be verified later by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Test Resultse2e-aks
e2e-aws
|
|
@bryan-cox: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test verify-deps |
|
/retest-required |
|
flaked twice |
|
@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aws-4-20 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override ci/prow/e2e-aws-4-20 |
|
@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aws-4-20 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@sjenning: Jira Issue OCPBUGS-61828: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged:
All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-61828 has not been moved to the MODIFIED state. This PR is marked as verified-later. Jira issue(s) in the title of this PR will require post-merge verification. After testing, it must be manually moved to the DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sjenning: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/jira refresh |
|
@wking: Jira Issue OCPBUGS-61828: All pull requests linked via external trackers have merged: This pull request has the DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The existing check does not work with the pruning behavior of kas-bootstrapper. The only version assured to be in the FeatureGate status is the most recent Completed version in the ClusterVersion history i.e. the currently running version.
FeatureGate status pruning by
kas-bootstraphypershift/kas-bootstrap/kas_boostrap.go
Lines 127 to 131 in b8db9ed