
OCPBUGS-61828: refactor FeatureGate status check#6862

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from sjenning:fix-ensure-feature-gate
Sep 19, 2025

Conversation

@sjenning
Contributor

The existing check does not work with the pruning behavior of kas-bootstrapper. The only version assured to be in the FeatureGate status is the most recent Completed version in the ClusterVersion history, i.e., the currently running version.

FeatureGate status pruning by kas-bootstrap

// Once we hit the first Completed entry and insert that into knownVersions
// we can break, because there shouldn't be anything left on the cluster that cares about those ancient releases anymore.
if cvoVersion.State == configv1.CompletedUpdate {
	break
}
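
The effect of that early break can be illustrated with a minimal, self-contained sketch. It is not the kas-bootstrap implementation: `UpdateHistory`, `knownVersions`, and `pruneFeatureGateVersions` are hypothetical stand-ins that model only the fields needed here, and the history is assumed to be ordered newest-first.

```go
package main

import "fmt"

// UpdateState and UpdateHistory are hypothetical, simplified stand-ins for
// the configv1 types; only the fields needed for this illustration exist.
type UpdateState string

const (
	CompletedUpdate UpdateState = "Completed"
	PartialUpdate   UpdateState = "Partial"
)

type UpdateHistory struct {
	State   UpdateState
	Version string
}

// knownVersions walks the history newest-first and stops right after the
// first Completed entry, mirroring the break in the quoted snippet.
func knownVersions(history []UpdateHistory) map[string]bool {
	known := map[string]bool{}
	for _, h := range history {
		known[h.Version] = true
		if h.State == CompletedUpdate {
			break
		}
	}
	return known
}

// pruneFeatureGateVersions keeps only the status versions that are still known.
func pruneFeatureGateVersions(statusVersions []string, known map[string]bool) []string {
	kept := []string{}
	for _, v := range statusVersions {
		if known[v] {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	history := []UpdateHistory{
		{State: CompletedUpdate, Version: "4.21.0"},
		{State: CompletedUpdate, Version: "4.20.3"},
	}
	kept := pruneFeatureGateVersions([]string{"4.21.0", "4.20.3"}, knownVersions(history))
	fmt.Println(kept) // prints [4.21.0]
}
```

Under these assumptions, any version older than the newest Completed entry drops out of the FeatureGate status, which is why a check that expects older versions to be present can fail.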

@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 18, 2025
@coderabbitai
Contributor

coderabbitai bot commented Sep 18, 2025

Walkthrough

Centralizes feature-gate verification into a new helper EnsureFeatureGateStatus and integrates it in two e2e tests. Removes inline FeatureGate/ClusterVersion checks from the control-plane upgrade test, captures guestClient in cluster creation test, and introduces the helper implementation in util.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Feature gate verification centralization: test/e2e/control_plane_upgrade_test.go | Removes inline ClusterVersion/FeatureGate validation and its imports; invokes e2eutil.EnsureFeatureGateStatus(t, ctx, guestClient) instead; retains remaining upgrade checks. |
| Create cluster test integration: test/e2e/create_cluster_test.go | Stores WaitForGuestClient return in guestClient and calls e2eutil.EnsureFeatureGateStatus(t, ctx, guestClient); existing flow otherwise unchanged. |
| E2E util addition: test/e2e/util/util.go | Adds EnsureFeatureGateStatus(t, ctx, guestClient): requires Version419, fetches ClusterVersion "version" and FeatureGate "cluster", asserts completed latest history entry, and verifies current version is present in FeatureGate status. |
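
The core of the check described for the helper can be sketched as a pure function, separated from the cluster client and test framework. This is an illustrative sketch only: `historyEntry` and `checkFeatureGateStatus` are hypothetical names, and the inputs are assumed to have been fetched already from ClusterVersion "version" and FeatureGate "cluster".

```go
package main

import (
	"errors"
	"fmt"
)

// historyEntry is a hypothetical, trimmed-down stand-in for one
// ClusterVersion status history entry (newest entry first).
type historyEntry struct {
	State   string
	Version string
}

// checkFeatureGateStatus returns nil when the newest history entry is
// Completed and its version appears among the FeatureGate status versions.
func checkFeatureGateStatus(history []historyEntry, featureGateVersions []string) error {
	if len(history) == 0 {
		return errors.New("ClusterVersion history is empty")
	}
	current := history[0]
	if current.State != "Completed" {
		return fmt.Errorf("most recent ClusterVersion history entry is %q, not Completed", current.State)
	}
	for _, v := range featureGateVersions {
		if v == current.Version {
			return nil
		}
	}
	return fmt.Errorf("current version %s from ClusterVersion not found in FeatureGate status", current.Version)
}

func main() {
	history := []historyEntry{
		{State: "Completed", Version: "4.21.0"},
		{State: "Completed", Version: "4.20.3"},
	}
	// Only the current version survived pruning, and that is all the check needs.
	fmt.Println(checkFeatureGateStatus(history, []string{"4.21.0"})) // prints <nil>
}
```

Checking only the newest entry keeps the assertion compatible with pruning, since older versions are not assured to remain in the FeatureGate status.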

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title "OCPBUGS-61828: refactor FeatureGate status check" is concise, names the tracked bug, and clearly summarizes the primary change (refactoring how FeatureGate status is checked). It maps directly to the changes in the diff which centralize FeatureGate verification into EnsureFeatureGateStatus and remove inline checks, so it is appropriate for a teammate scanning history. |
| Description Check | ✅ Passed | The PR description explains why the change is necessary (FeatureGate status pruning by kas-bootstrapper leaving only the most recent Completed version) and links to the relevant kas-bootstrap code, which directly relates to the refactor in the changeset. This is on-topic and adequately justifies the test modification for the purposes of this lenient description check. |

@sjenning sjenning changed the title fix(OCPBUGS-61828): refactor FeatureGate status check OCPBUGS-61828: refactor FeatureGate status check Sep 18, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 18, 2025
@openshift-ci-robot

@sjenning: This pull request references Jira Issue OCPBUGS-61828, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added area/testing Indicates the PR includes changes for e2e testing approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Sep 18, 2025
The existing check does not work with the pruning behavior
of kas-bootstrapper.  The only version assured to be in the
FeatureGate status is the most recent Completed version
in the ClusterVersion history i.e. the currently running
version.
@sjenning sjenning force-pushed the fix-ensure-feature-gate branch from 788c33c to d264903 Compare September 18, 2025 21:14
@sjenning
Contributor Author

/test verify

@sjenning
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 18, 2025
@openshift-ci-robot

@sjenning: This pull request references Jira Issue OCPBUGS-61828, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

@sjenning sjenning marked this pull request as ready for review September 18, 2025 21:37
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 18, 2025
@openshift-ci openshift-ci bot requested review from csrwng and jparrill September 18, 2025 21:38
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
test/e2e/util/util.go (2)

953-987: Harden against eventual consistency: poll until FeatureGate reflects current Completed version

Direct Get+Expect can flake shortly after rollout; wrap the cross‑resource check in a retry loop and only fail after a reasonable timeout.

Apply this diff:

 func EnsureFeatureGateStatus(t *testing.T, ctx context.Context, guestClient crclient.Client) {
 	t.Run("EnsureFeatureGateStatus", func(t *testing.T) {
 		AtLeast(t, Version419)
 
-		g := NewWithT(t)
-
-		clusterVersion := &configv1.ClusterVersion{}
-		err := guestClient.Get(ctx, crclient.ObjectKey{Name: "version"}, clusterVersion)
-		g.Expect(err).NotTo(HaveOccurred(), "failed to get ClusterVersion resource")
-
-		featureGate := &configv1.FeatureGate{}
-		err = guestClient.Get(ctx, crclient.ObjectKey{Name: "cluster"}, featureGate)
-		g.Expect(err).NotTo(HaveOccurred(), "failed to get FeatureGate resource")
-
-		// Expect at least one entry in ClusterVersion history
-		g.Expect(len(clusterVersion.Status.History)).To(BeNumerically(">", 0), "ClusterVersion history is empty")
-		currentVersion := clusterVersion.Status.History[0].Version
-
-		// Expect current version to be in Completed state
-		g.Expect(clusterVersion.Status.History[0].State).To(Equal(configv1.CompletedUpdate), "most recent ClusterVersion history entry is not in Completed state")
-
-		// Ensure that the current version in ClusterVersion is also present in FeatureGate status
-		versionFound := false
-		for _, details := range featureGate.Status.FeatureGates {
-			if details.Version == currentVersion {
-				versionFound = true
-				break
-			}
-		}
-		g.Expect(versionFound).To(BeTrue(), "current version %s from ClusterVersion not found in FeatureGate status", currentVersion)
+		g := NewWithT(t)
+		err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) {
+			cv := &configv1.ClusterVersion{}
+			if getErr := guestClient.Get(ctx, crclient.ObjectKey{Name: "version"}, cv); getErr != nil {
+				t.Logf("retrying ClusterVersion get: %v", getErr)
+				return false, nil
+			}
+			if len(cv.Status.History) == 0 {
+				return false, nil
+			}
+			if cv.Status.History[0].State != configv1.CompletedUpdate {
+				// Do not assert mid‑rollout; keep polling until Completed.
+				return false, nil
+			}
+			currentVersion := cv.Status.History[0].Version
+
+			fg := &configv1.FeatureGate{}
+			if getErr := guestClient.Get(ctx, crclient.ObjectKey{Name: "cluster"}, fg); getErr != nil {
+				t.Logf("retrying FeatureGate get: %v", getErr)
+				return false, nil
+			}
+			for _, details := range fg.Status.FeatureGates {
+				if details.Version == currentVersion {
+					return true, nil
+				}
+			}
+			return false, nil
+		})
+		g.Expect(err).NotTo(HaveOccurred(), "current ClusterVersion not reflected in FeatureGate status")
 	})
 }

953-987: Improve failure diagnostics (optional)

On final failure, consider logging the FeatureGate.Status.FeatureGates slice to aid triage.
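
One way to do that, sketched under the assumption that the versions have already been extracted from FeatureGate.Status.FeatureGates into a string slice, is to fold them into the failure message. The `missingVersionMessage` function is a hypothetical helper, not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// missingVersionMessage is a hypothetical helper that builds a failure
// message listing every version found in the FeatureGate status, so a
// failed run shows what was present instead of only what was missing.
func missingVersionMessage(current string, featureGateVersions []string) string {
	return fmt.Sprintf(
		"current version %s from ClusterVersion not found in FeatureGate status; versions present: [%s]",
		current, strings.Join(featureGateVersions, ", "),
	)
}

func main() {
	fmt.Println(missingVersionMessage("4.21.0", []string{"4.20.3"}))
}
```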

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b4844c3 and d264903.

📒 Files selected for processing (3)
  • test/e2e/control_plane_upgrade_test.go (1 hunks)
  • test/e2e/create_cluster_test.go (2 hunks)
  • test/e2e/util/util.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (2)
test/e2e/create_cluster_test.go (1)

1848-1849: LGTM: capture guest client and use centralized feature‑gate check

This reduces duplication and aligns with the pruning behavior constraints.

Also applies to: 1874-1875

test/e2e/control_plane_upgrade_test.go (1)

71-71: LGTM: replace inline verification with helper

Centralizing the FeatureGate vs ClusterVersion validation keeps upgrade tests consistent and future‑proof.

Member

@wking wking left a comment


Looks good to me; thanks!

Eventually it would be nice to have a monitor watching/polling FeatureGates to show that we see the mid-update behavior we expect (both vA and vB in FeatureGate status while A->B was in progress, with A being pruned shortly after the update completed). But having reliable tests in the short term is worth deferring a refactor of that size.

It would also be nice for the tests to have a clearer idea of when the post-update kas-bootstrap run would come around and prune the outgoing version once the update completed, to confirm that the pruning is functioning. But again, that is not a blocker, and we want this change to firm up CI reliability now.
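
The mid-update expectation described above could be expressed as a per-observation invariant. The sketch below is purely illustrative and not part of this PR: `observation`, `contains`, and `checkObservation` are hypothetical names, and a real monitor would fill in the observations by polling or watching the cluster.

```go
package main

import "fmt"

// observation is a hypothetical record of one poll taken by a monitor
// during or after an A->B update.
type observation struct {
	InProgress          bool     // true while the A->B update has not completed
	From, To            string   // outgoing (A) and incoming (B) versions
	FeatureGateVersions []string // versions listed in FeatureGate status
}

func contains(versions []string, want string) bool {
	for _, v := range versions {
		if v == want {
			return true
		}
	}
	return false
}

// checkObservation enforces two rules: the incoming version must always be
// present, and the outgoing version must still be present while the update
// is in progress. It does not require the outgoing version to be gone right
// after completion, because pruning happens some time later.
func checkObservation(o observation) error {
	if !contains(o.FeatureGateVersions, o.To) {
		return fmt.Errorf("incoming version %s missing from FeatureGate status", o.To)
	}
	if o.InProgress && !contains(o.FeatureGateVersions, o.From) {
		return fmt.Errorf("outgoing version %s pruned while update still in progress", o.From)
	}
	return nil
}

func main() {
	timeline := []observation{
		{InProgress: true, From: "4.20.3", To: "4.21.0", FeatureGateVersions: []string{"4.21.0", "4.20.3"}},
		{InProgress: false, From: "4.20.3", To: "4.21.0", FeatureGateVersions: []string{"4.21.0", "4.20.3"}},
		{InProgress: false, From: "4.20.3", To: "4.21.0", FeatureGateVersions: []string{"4.21.0"}},
	}
	for i, o := range timeline {
		fmt.Println(i, checkObservation(o))
	}
}
```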

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 18, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sjenning
Contributor Author

/verified later

by observing e2e-aws for 4.21 and 4.20 post-merge and ensuring the flake does not occur

@openshift-ci-robot

@sjenning: Only users can be targets for the /verified later command.


@sjenning
Contributor Author

/verified later @sjenning

@openshift-ci-robot openshift-ci-robot added verified-later verified Signifies that the PR passed pre-merge verification criteria labels Sep 18, 2025
@openshift-ci-robot

@sjenning: This PR has been marked to be verified later by @sjenning.


@cwbotbot

cwbotbot commented Sep 18, 2025

Test Results

e2e-aks

e2e-aws

@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2025

@bryan-cox: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aks
/test e2e-aks-4-20
/test e2e-aws
/test e2e-aws-4-20
/test e2e-aws-override
/test e2e-aws-upgrade-hypershift-operator
/test e2e-kubevirt-aws-ovn-reduced
/test images
/test okd-scos-images
/test security
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-autonode
/test e2e-aws-metrics
/test e2e-aws-minimal
/test e2e-aws-techpreview
/test e2e-azure-aks-ovn-conformance
/test e2e-conformance
/test e2e-kubevirt-aws-ovn
/test e2e-kubevirt-azure-ovn
/test e2e-kubevirt-metal-conformance
/test e2e-openstack-aws
/test e2e-openstack-aws-conformance
/test e2e-openstack-aws-csi-cinder
/test e2e-openstack-aws-csi-manila
/test e2e-openstack-aws-nfv
/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-hypershift-main-e2e-aks
pull-ci-openshift-hypershift-main-e2e-aks-4-20
pull-ci-openshift-hypershift-main-e2e-aws
pull-ci-openshift-hypershift-main-e2e-aws-4-20
pull-ci-openshift-hypershift-main-e2e-aws-upgrade-hypershift-operator
pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn-reduced
pull-ci-openshift-hypershift-main-images
pull-ci-openshift-hypershift-main-okd-scos-e2e-aws-ovn
pull-ci-openshift-hypershift-main-okd-scos-images
pull-ci-openshift-hypershift-main-security
pull-ci-openshift-hypershift-main-unit
pull-ci-openshift-hypershift-main-verify
Details

In response to this:

/test verify-does

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bryan-cox
Member

/test verify-deps

@bryan-cox
Member

/retest-required

@sjenning
Contributor Author

flaked twice
/override ci/prow/e2e-aws-4-20

@openshift-ci
Contributor

openshift-ci bot commented Sep 19, 2025

@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aws-4-20


@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 7e1b07f and 2 for PR HEAD d264903 in total

@sjenning
Contributor Author

/override ci/prow/e2e-aws-4-20

@openshift-ci
Contributor

openshift-ci bot commented Sep 19, 2025

@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aws-4-20


@openshift-merge-bot openshift-merge-bot bot merged commit 2c234b1 into openshift:main Sep 19, 2025
19 checks passed
@openshift-ci-robot

@sjenning: Jira Issue OCPBUGS-61828: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-61828 has not been moved to the MODIFIED state.

This PR is marked as verified-later. Jira issue(s) in the title of this PR will require post-merge verification. After testing, it must be manually moved to the VERIFIED state.


@openshift-ci
Contributor

openshift-ci bot commented Sep 19, 2025

@sjenning: all tests passed!


@wking
Member

wking commented Sep 19, 2025

/jira refresh

@openshift-ci-robot

@wking: Jira Issue OCPBUGS-61828: All pull requests linked via external trackers have merged:

This pull request has the verified-later tag and will need to be manually moved to VERIFIED after testing. Jira Issue OCPBUGS-61828 has been moved to the MODIFIED state.



Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria verified-later


5 participants