INTEROP-9230,INTEROP-9231: Add OPP GA-to-nightly upgrade step #81418
amp-rh wants to merge 5 commits into
New step registry ref (interop-opp-upgrade) and ci-operator config variant for OPP upgrade testing. Provisions at GA, installs OPP operators, upgrades to nightly, validates platform and operator health. Cron disabled; to be enabled after manual validation.
@amp-rh: This pull request references INTEROP-9230 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.
This pull request references INTEROP-9231 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Note: Currently processing new changes in this PR. This may take a few minutes, please wait...
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (3)
Walkthrough
This PR adds a new OCP upgrade CI test for the stolostron policy collection and registers a new interop OPP upgrade step. The step script handles upgrade prechecks, upgrade execution, progress monitoring, cluster stabilization, and post-upgrade validation of platform and operator health.
Changes
Interop OPP upgrade flow
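The post-upgrade platform health check mentioned in the walkthrough can be sketched as follows. This is a hypothetical helper, not the PR's code: it assumes the script treats a ClusterOperator as healthy when `Available=True`, `Progressing=False`, and `Degraded=False`, and that it reads those condition statuses with an `oc` jsonpath query. The name `unhealthy_cluster_operators` is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: flag ClusterOperators that are not healthy after an upgrade.
# "Healthy" is assumed to mean Available=True, Progressing=False, Degraded=False.
set -o nounset -o errexit -o pipefail

# Reads lines of "NAME AVAILABLE PROGRESSING DEGRADED" on stdin, prints the
# names of unhealthy operators, and returns 1 if any operator is unhealthy.
unhealthy_cluster_operators() {
    local name available progressing degraded bad=0
    while read -r name available progressing degraded; do
        if [[ -z "${name}" ]]; then
            continue
        fi
        # A missing condition yields an empty field, which also counts as unhealthy.
        if [[ "${available}" != "True" || "${progressing}" != "False" || "${degraded}" != "False" ]]; then
            echo "${name}"
            bad=1
        fi
    done
    return "${bad}"
}

# Illustrative live usage (requires a cluster and a logged-in oc):
#   oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Available")].status}{" "}{.status.conditions[?(@.type=="Progressing")].status}{" "}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' \
#     | unhealthy_cluster_operators
```

Keeping the parsing separate from the `oc` call makes the health rule testable without a cluster.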
Estimated code review effort: 4 (Complex) | ~45 minutes
Sequence Diagram(s)
sequenceDiagram
participant CIConfig as policy-collection config
participant StepRef as interop-opp-upgrade-ref.yaml
participant Script as interop-opp-upgrade-commands.sh
participant OC as oc CLI
participant Cluster as OpenShift cluster
CIConfig->>StepRef: schedule interop-opp-upgrade
StepRef->>Script: run command script
Script->>OC: registry login
Script->>OC: read target release and upgrade status
Script->>Cluster: apply admin-ack / CCO annotation updates
Script->>OC: start oc adm upgrade
loop monitor_upgrade
Script->>OC: poll upgrade status
Script->>Cluster: snapshot ClusterVersion
end
Script->>OC: wait-for-stable-cluster
Script->>Cluster: validate platform and OPP operator health
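In OpenShift, the "apply admin-ack" step in the diagram usually means acknowledging the gates published in the `admin-gates` ConfigMap (namespace `openshift-config-managed`) by setting the same keys to `"true"` in the `admin-acks` ConfigMap (namespace `openshift-config`). The sketch below assumes the script follows that pattern; `build_admin_ack_patch` is a hypothetical helper that only builds the merge patch, and the `oc` calls are shown as commented usage.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build a merge patch that acknowledges admin gates.
set -o nounset -o errexit -o pipefail

# build_admin_ack_patch KEY... -> prints {"data":{"KEY":"true",...}}
# Gate keys (e.g. "ack-<ver>-kube-<ver>-api-removals-in-<ver>") are assumed
# not to need JSON escaping.
build_admin_ack_patch() {
    local key sep="" body=""
    for key in "$@"; do
        body+="${sep}\"${key}\":\"true\""
        sep=","
    done
    printf '{"data":{%s}}\n' "${body}"
}

# Illustrative live usage (requires a logged-in oc; gate keys vary by release):
#   mapfile -t gates < <(oc -n openshift-config-managed get configmap admin-gates \
#       -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')
#   if (( ${#gates[@]} > 0 )); then
#       oc -n openshift-config patch configmap admin-acks --type=merge \
#           -p "$(build_admin_ack_patch "${gates[@]}")"
#   fi
```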
Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (12 passed)
@amp-rh, Interacting with pj-rehearse
Comment: Once you are satisfied with the results of the rehearsals, comment:
- Add grace_period: 10m to ref.yaml (required when script uses trap)
- Add OWNERS files at interop/ and interop/opp/ parent directories
- Add generated metadata JSON for step registry
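The trap ordering that `grace_period` relates to can be sketched as follows. This is a minimal illustration, not the PR's script: the body of `debug_on_exit` is a placeholder, and the `exit 143` TERM handler assumes ci-operator sends SIGTERM on cancel or timeout and kills the step only after `grace_period` expires.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: define the exit handler first, then install traps, so a
# failure in any later setup command always calls a defined function.
set -o nounset -o errexit -o pipefail

debug_on_exit() {
    local rc=$?
    if (( rc != 0 )); then
        # Placeholder for whatever diagnostics the real script gathers.
        echo "step failed with exit code ${rc}; collecting debug info" >&2
    fi
    return "${rc}"
}

# Installed only after debug_on_exit exists.
trap debug_on_exit EXIT
# Assumed behavior: SIGTERM arrives first and a hard kill follows after
# grace_period, so exiting here lets the EXIT trap run inside that window.
trap 'exit 143' TERM

# Setup commands (registry login, reading the target release, ...) follow here.
```

Exiting from the TERM handler, instead of pointing both signals at `debug_on_exit`, keeps the handler from running twice.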
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/interop/opp/upgrade/interop-opp-upgrade-commands.sh`:
- Around lines 12-21: The EXIT/TERM trap is being installed before debug_on_exit
is defined, so failures in the setup commands can trigger an undefined function
and hide the real error. Move the trap setup in interop-opp-upgrade-commands.sh
to after the debug_on_exit function definition, or define debug_on_exit first
and then install the trap, so the trap always resolves to a valid function.
- Around lines 299-347: The local IFS setting in the OPP upgrade check is leaking
past the operator parsing and breaking the namespace loop in the same function.
Limit the comma IFS change to the `read -ra operators` call in
`interop-opp-upgrade-commands.sh` and restore normal splitting before the `for
ns in ${opp_namespaces}` loop so `opp_namespaces` iterates correctly over each
namespace for the pod readiness check.
- Around lines 187-209: The upgrade timeout logic in monitor_upgrade is tied to
poll iterations instead of real elapsed time, so changing POLL_INTERVAL changes
the effective timeout and command runtime is not counted. Use the existing
start_time in monitor_upgrade to compute elapsed wall-clock time on each loop
iteration and stop when elapsed time reaches UPGRADE_TIMEOUT in minutes, rather
than decrementing a counter once per sleep. Keep the remaining/polling logic and
status collection in monitor_upgrade, but base the timeout check on actual time
so the advertised timeout matches behavior regardless of POLL_INTERVAL or oc
command duration.
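The IFS fix requested in the second comment can be illustrated with a small function. This is a hypothetical sketch: `list_opp_targets` and the example operator and namespace names are illustrative, not taken from the PR. It assumes operators arrive comma-separated and namespaces whitespace-separated.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: scope IFS=',' to a single read so that later word
# splitting in the same function still uses the default whitespace IFS.
set -o nounset -o errexit -o pipefail

# list_opp_targets CSV_OPERATORS SPACE_SEPARATED_NAMESPACES
list_opp_targets() {
    local csv_operators="$1" opp_namespaces="$2"
    local -a operators
    local op ns
    # The IFS assignment prefixes only this read command.
    IFS=',' read -ra operators <<< "${csv_operators}"
    for op in "${operators[@]}"; do
        echo "operator: ${op}"
    done
    # Unquoted on purpose: default IFS splits the namespaces on whitespace.
    for ns in ${opp_namespaces}; do
        echo "namespace: ${ns}"
    done
}
```

With `local IFS=','` at the top of the function instead, the second loop would see `"open-cluster-management stackrox openshift-storage"` as a single word, because the string contains no commas. That is the failure the review describes.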
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: fe747f1d-5afa-4e07-ab19-3ecfbeb1e661
📒 Files selected for processing (4)
ci-operator/config/stolostron/policy-collection/stolostron-policy-collection-main__ocp-upgrade.yaml
ci-operator/step-registry/interop/opp/upgrade/OWNERS
ci-operator/step-registry/interop/opp/upgrade/interop-opp-upgrade-commands.sh
ci-operator/step-registry/interop/opp/upgrade/interop-opp-upgrade-ref.yaml
- Fix ref.yaml dependencies to use correct name/env mapping (name=image stream tag, env=variable name)
- Remove ODF_VERSION_MAJOR_MINOR from config (not declared in any step)
- Remove OPENSHIFT_UPGRADE_RELEASE_IMAGE_OVERRIDE from config deps (ref.yaml already declares it correctly)
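With the dependency declared in ref.yaml, ci-operator exposes the resolved release pull spec to the script under the variable named by `env`. The sketch below assumes the script passes `OPENSHIFT_UPGRADE_RELEASE_IMAGE_OVERRIDE` to `oc adm upgrade --to-image`. `--allow-explicit-upgrade` and `--force` are existing `oc adm upgrade` flags often needed for unsigned nightly payloads, but whether the PR uses exactly these flags is an assumption. `build_upgrade_args` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn the ref's dependency env var into oc adm upgrade args.
set -o nounset -o errexit -o pipefail

# build_upgrade_args IMAGE -> fills the global UPGRADE_ARGS array.
# Fails (and exits the current shell) if IMAGE is empty.
build_upgrade_args() {
    local image="${1:?target release image is required}"
    UPGRADE_ARGS=(adm upgrade "--to-image=${image}" --allow-explicit-upgrade --force)
}

# Illustrative live usage (inside the step, with a logged-in oc):
#   build_upgrade_args "${OPENSHIFT_UPGRADE_RELEASE_IMAGE_OVERRIDE}"
#   oc "${UPGRADE_ARGS[@]}"
```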
Generated periodic job definition for the new ocp-upgrade config variant, matching the format of existing periodics.
@amp-rh: you cannot LGTM your own PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- Move trap installation after debug_on_exit definition so early failures in setup commands invoke a defined function
- Use wall-clock deadline instead of iteration counter for upgrade timeout so behavior is correct regardless of POLL_INTERVAL value
- Scope IFS=',' to the read call only so the namespace loop in validate_opp_operators splits correctly on whitespace
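The wall-clock deadline from the second commit item can be sketched as a generic poll loop. This is an illustration, not the PR's `monitor_upgrade`. It assumes `UPGRADE_TIMEOUT` is in minutes and `POLL_INTERVAL` in seconds, as the review describes. The defaults of 180 and 60 and the name `poll_until_deadline` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: wall-clock timeout for an upgrade poll loop, so the
# effective timeout does not depend on POLL_INTERVAL or on how long checks take.
set -o nounset -o errexit -o pipefail

UPGRADE_TIMEOUT="${UPGRADE_TIMEOUT:-180}"   # minutes (assumed default)
POLL_INTERVAL="${POLL_INTERVAL:-60}"        # seconds (assumed default)

# poll_until_deadline CMD [ARGS...]
# Re-runs CMD until it succeeds or UPGRADE_TIMEOUT minutes of real time pass.
poll_until_deadline() {
    local start_time deadline now remaining
    start_time=$(date +%s)
    deadline=$(( start_time + UPGRADE_TIMEOUT * 60 ))
    while true; do
        if "$@"; then
            echo "succeeded after $(( $(date +%s) - start_time ))s"
            return 0
        fi
        now=$(date +%s)
        if (( now >= deadline )); then
            echo "timed out after $(( now - start_time ))s" >&2
            return 1
        fi
        # Never sleep past the deadline.
        remaining=$(( deadline - now ))
        sleep $(( remaining < POLL_INTERVAL ? remaining : POLL_INTERVAL ))
    done
}
```

Because the deadline is computed once from `start_time`, time spent inside slow `oc` calls counts against the timeout, which an iteration counter would miss.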
/pj-rehearse ack
@amp-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: amp-rh
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
[REHEARSALNOTIFIER]
/pj-rehearse periodic-ci-stolostron-policy-collection-main-ocp-upgrade-interop-opp-upgrade-aws
@amp-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@amp-rh: The following test failed, say
Full PR test history. Your PR dashboard.
INTEROP-9230 + INTEROP-9231: OPP GA-to-Nightly Upgrade Step
Adds an OCP upgrade automation step for OPP (OpenShift Platform Plus)
interop testing. The upgrade step provisions at the latest GA release,
installs OPP operators, then upgrades to a nightly build and validates
that operators survive the version transition.
New files
- `ci-operator/step-registry/interop/opp/upgrade/`: step registry ref that performs a GA-to-nightly upgrade with stall detection, admin ack, platform health checks, and OPP operator CSV validation
- `ci-operator/config/stolostron/policy-collection/stolostron-policy-collection-main__ocp-upgrade.yaml`: periodic job config (cron disabled) for the upgrade variant
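Stall detection, meaning no upgrade progress within a window, can be sketched as a check that records when the progress snapshot last changed. This is a hypothetical illustration. `STALL_TIMEOUT`, its default of 1800 seconds, and `check_stall` are assumptions. The commented usage assumes the snapshot is the ClusterVersion `Progressing` condition message.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of no-progress stall detection for an upgrade monitor.
set -o nounset -o errexit -o pipefail

STALL_TIMEOUT="${STALL_TIMEOUT:-1800}"  # seconds without progress (assumed)
last_progress=""
last_change_epoch=0

# check_stall PROGRESS NOW_EPOCH
# Records the latest progress snapshot and returns 0 (stalled) only when the
# snapshot has not changed for at least STALL_TIMEOUT seconds.
check_stall() {
    local progress="$1" now="$2"
    if [[ "${progress}" != "${last_progress}" ]]; then
        last_progress="${progress}"
        last_change_epoch="${now}"
        return 1
    fi
    (( now - last_change_epoch >= STALL_TIMEOUT ))
}

# Illustrative live usage inside a poll loop (requires a logged-in oc):
#   progress=$(oc get clusterversion version \
#       -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}')
#   if check_stall "${progress}" "$(date +%s)"; then echo "upgrade stalled" >&2; fi
```

Passing the timestamp in explicitly keeps the function deterministic and easy to test.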
Testing
/cc @mpruitt-rh
Summary by CodeRabbit
This PR extends the `stolostron/policy-collection` CI infrastructure (via `ci-operator`) with a new OCP GA-to-nightly upgrade interop scenario focused on OPP (OpenShift Platform Plus) operator persistence.
It adds a new periodic `ocp-upgrade` variant config (`ci-operator/config/stolostron/policy-collection/stolostron-policy-collection-main__ocp-upgrade.yaml`) that defines an `interop-opp-upgrade-aws` workflow. The job is intentionally disabled until manual validation: its cron is set to a Feb 31 schedule, which never fires. The job includes Slack state reporting and wires a staged workflow (pre/post/setup plus test steps). This workflow provisions an AWS cluster using the GA baseline images, applies upgrade/install release image overrides, installs the targeted OPP operator set, and performs the GA→nightly upgrade. Collection, deprovisioning, and issue reporting follow.
In support of the workflow, it introduces a new interop step-registry command (`ci-operator/step-registry/interop/opp/upgrade/`) with:
- a step ref (`interop-opp-upgrade-ref.yaml`) supporting upgrade timeouts, polling, stall detection, and `OPENSHIFT_UPGRADE_RELEASE_IMAGE_OVERRIDE`, plus a 10m grace period so the script's trap handlers can run when the step is terminated;
- a command script (`interop-opp-upgrade-commands.sh`) that runs `oc adm upgrade` to the nightly and detects upgrade stalls (no progress within a configured window). It also verifies CVO and cluster health, and validates OPP operator survival by checking that the expected operator CSVs reach `Succeeded` and that their pods are ready.
Overall, the PR adds automated GA-to-nightly upgrade coverage to ensure OPP operators remain healthy across the version transition, with safeguards against upgrade stalls and post-upgrade health checks.
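The CSV part of the operator-survival check can be sketched as follows. This is a hypothetical illustration, not the PR's implementation. It assumes each OPP namespace's ClusterServiceVersions are listed with `oc get csv` and that every phase must be `Succeeded`. The function names are illustrative, and the pod-readiness half of the check is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: check that every ClusterServiceVersion in an OPP
# operator namespace has reached phase Succeeded after the upgrade.
set -o nounset -o errexit -o pipefail

# all_succeeded PHASE... -> 0 only if at least one phase is given and all are
# "Succeeded". An empty list counts as a failure (operator missing).
all_succeeded() {
    (( $# > 0 )) || return 1
    local phase
    for phase in "$@"; do
        [[ "${phase}" == "Succeeded" ]] || return 1
    done
}

# validate_namespace_csvs NAMESPACE (requires a logged-in oc).
validate_namespace_csvs() {
    local ns="$1" out
    local -a phases
    out=$(oc get csv -n "${ns}" -o jsonpath='{.items[*].status.phase}') || return 1
    read -ra phases <<< "${out}"
    # Safe expansion of a possibly empty array under nounset.
    all_succeeded ${phases[@]+"${phases[@]}"}
}
```

Treating an empty CSV list as a failure catches the case where an operator was removed entirely during the upgrade, which a check that only looks for non-`Succeeded` phases would miss.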