OCPBUGS-61218: test(e2e): add control plane component rollout validation to upgrade test#6767
Conversation
|
Skipping CI for Draft Pull Request. |
|
@csrwng: This pull request references Jira Issue OCPBUGS-61218, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds a control plane component rollout wait to the upgrade e2e test, deriving the starting version from HostedCluster status. Introduces a util to wait for ControlPlaneComponent rollout (with optional starting-version gate) and extends condition extraction to ControlPlaneComponent objects. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Test as e2e Upgrade Test
participant API as K8s API Server
participant Util as util.WaitForControlPlaneComponentRollout
Note over Test: Extract startingVersion from HostedCluster.Status.Version.History[0]
Test->>API: Update release image / HostedCluster
Test->>Util: WaitForControlPlaneComponentRollout(startingVersion)
Util->>API: List ControlPlaneComponent in HCP namespace (poll)
alt Components count > 10
loop Until rollout complete for all
Util->>API: Get each ControlPlaneComponent
Util->>Util: Check RolloutComplete condition
opt startingVersion provided
Util->>Util: Verify reported version != startingVersion
end
end
Util-->>Test: Success
else Too few components
Util->>API: Retry (polling)
end
Test->>API: Wait for overall image rollout (existing flow)
Test-->>Test: Upgrade test proceeds
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.2.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: csrwng The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
/test verify |
998e913 to
4d53181
Compare
|
/test verify |
4d53181 to
3ca5e53
Compare
|
/test verify |
…test Add WaitForControlPlaneComponentRollout function to ensure control plane components complete rollout before proceeding with upgrade validation. This includes: - New test step to wait for control plane component rollout completion - Support for ControlPlaneComponent conditions in the eventually utility - 30-minute timeout with 3-second polling interval for efficient validation Signed-off-by: Cesar Wong <cewong@redhat.com> Assisted-by: Claude Sonnet 4 (via Cursor)
3ca5e53 to
c10bcee
Compare
|
/test verify |
|
/test verify |
Test Resultse2e-aws
e2e-aks
|
|
e2e failure is a flake, ready for review |
|
/lgtm |
|
/jira refresh |
|
@sjenning: This pull request references Jira Issue OCPBUGS-61218, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jiezhao@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by e2e-test |
|
@csrwng: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@csrwng: This pull request references Jira Issue OCPBUGS-61218, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jiezhao@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/e2e/util/util.go (1)
584-617: Harden rollout wait: add in-function version gate, clearer count check, and focused condition dump
- Add an in-function AtLeast(Version420) guard to avoid 30m timeouts if used from older suites.
- Make the component-count predicate explicit and message include the observed count.
- Filter condition dumps to the rollout condition to reduce log noise on failures.
func WaitForControlPlaneComponentRollout(t *testing.T, ctx context.Context, client crclient.Client, hostedCluster *hyperv1.HostedCluster, initialVersion string) { + // Guard misuse on pre-4.20 where ControlPlaneComponent CRD doesn't exist. + AtLeast(t, Version420) controlPlaneComponents := &hyperv1.ControlPlaneComponentList{} controlPlaneNamespace := manifests.HostedControlPlaneNamespace(hostedCluster.Namespace, hostedCluster.Name) EventuallyObjects(t, ctx, "control plane components to complete rollout", func(ctx context.Context) ([]*hyperv1.ControlPlaneComponent, error) { err := client.List(ctx, controlPlaneComponents, crclient.InNamespace(controlPlaneNamespace)) items := make([]*hyperv1.ControlPlaneComponent, len(controlPlaneComponents.Items)) for i := range controlPlaneComponents.Items { items[i] = &controlPlaneComponents.Items[i] } return items, err }, []Predicate[[]*hyperv1.ControlPlaneComponent]{ func(cpComponents []*hyperv1.ControlPlaneComponent) (done bool, reasons string, err error) { - return len(cpComponents) > 10, "expecting more than 10 control plane components", nil + const minComponents = 10 + if len(cpComponents) >= minComponents { + return true, "", nil + } + return false, fmt.Sprintf("expecting at least %d control plane components, got %d", minComponents, len(cpComponents)), nil }, }, []Predicate[*hyperv1.ControlPlaneComponent]{ ConditionPredicate[*hyperv1.ControlPlaneComponent](Condition{ Type: string(hyperv1.ControlPlaneComponentRolloutComplete), Status: metav1.ConditionTrue, }), func(cpComponent *hyperv1.ControlPlaneComponent) (done bool, reasons string, err error) { if initialVersion != "" && cpComponent.Status.Version == initialVersion { return false, fmt.Sprintf("component %s is still on version %s", cpComponent.Name, cpComponent.Status.Version), nil } return true, fmt.Sprintf("component %s has version: %s", cpComponent.Name, cpComponent.Status.Version), nil }, }, WithTimeout(30*time.Minute), WithInterval(10*time.Second), + WithFilteredConditionDump(Condition{Type: string(hyperv1.ControlPlaneComponentRolloutComplete)}), ) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (3)
test/e2e/control_plane_upgrade_test.go(2 hunks)test/e2e/util/eventually.go(1 hunks)test/e2e/util/util.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
🔇 Additional comments (2)
test/e2e/util/eventually.go (1)
479-480: Add ControlPlaneComponent to Conditions: LGTMThe adaptation follows the existing pattern and enables predicates on ControlPlaneComponent. No issues.
test/e2e/control_plane_upgrade_test.go (1)
62-66: Rollout wait subtest: LGTMGood placement and gating; aligns with upgrade flow.
| var startingVersion string | ||
| if len(hostedCluster.Status.Version.History) > 0 { | ||
| startingVersion = hostedCluster.Status.Version.History[0].Version | ||
| } | ||
|
|
There was a problem hiding this comment.
Fix potential nil-pointer dereference on HostedCluster.Status.Version
Status.Version is a pointer; if nil, accessing History panics. Guard it like elsewhere (e.g., WaitForImageRollout).
- var startingVersion string
- if len(hostedCluster.Status.Version.History) > 0 {
- startingVersion = hostedCluster.Status.Version.History[0].Version
- }
+ var startingVersion string
+ if hostedCluster.Status.Version != nil && len(hostedCluster.Status.Version.History) > 0 {
+ startingVersion = hostedCluster.Status.Version.History[0].Version
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var startingVersion string | |
| if len(hostedCluster.Status.Version.History) > 0 { | |
| startingVersion = hostedCluster.Status.Version.History[0].Version | |
| } | |
| var startingVersion string | |
| if hostedCluster.Status.Version != nil && len(hostedCluster.Status.Version.History) > 0 { | |
| startingVersion = hostedCluster.Status.Version.History[0].Version | |
| } |
🤖 Prompt for AI Agents
In test/e2e/control_plane_upgrade_test.go around lines 37 to 41, the code
accesses hostedCluster.Status.Version.History without checking if Status.Version
is nil which can cause a nil-pointer panic; update the logic to first check if
hostedCluster.Status.Version != nil and only then inspect Version.History (and
its length) to set startingVersion, otherwise leave startingVersion empty or
handle the absent version path the same way other tests do (e.g.,
WaitForImageRollout) to avoid dereferencing a nil pointer.
| }, | ||
| []Predicate[*hyperv1.ControlPlaneComponent]{ | ||
| ConditionPredicate[*hyperv1.ControlPlaneComponent](Condition{ | ||
| Type: string(hyperv1.ControlPlaneComponentRolloutComplete), |
There was a problem hiding this comment.
fwiw a component can only be at version if roll out has completed for it.
we'll want to check controlPlaneUptoda once this merges #6300
|
/test e2e-aks |
|
@csrwng: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@csrwng: Jira Issue Verification Checks: Jira Issue OCPBUGS-61218 Jira Issue OCPBUGS-61218 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
PR openshift#6767 added a check for component rollout only in TestUpgradeControlPlane after the upgrade had been triggered. However, we need to check this on _all_ rollouts in all tests. WaitForImageRollout() is the function used for that. This moves the new component rollout check into WaitForImageRollout() so that we do it for all HC rollouts in all tests.
What this PR does / why we need it:
Add WaitForControlPlaneComponentRollout function to ensure control plane components complete rollout before proceeding with upgrade validation. This includes:
Which issue(s) this PR fixes (optional, use
fixes #<issue_number>(, fixes #<issue_number>, ...)format, where issue_number might be a GitHub issue, or a Jira story:Fixes #OCPBUGS-61218
Checklist
Summary by CodeRabbit