feat(cpo): use HO image for CPO on 4.20+ clusters#7446

Draft
muraee wants to merge 1 commit into openshift:main from muraee:use-ho-image-for-cpo-420

Conversation

@muraee
Contributor

@muraee muraee commented Jan 8, 2026

Summary

For cluster versions 4.20 and above, use the HyperShift Operator image directly for the Control Plane Operator instead of extracting it from the OCP release payload.

Benefits

  • Faster feature delivery: CPO ships with HO releases instead of being tied to OCP payload
  • Simplified hotfix process: Single HO image bump fixes all 4.20+ clusters (no per-cluster annotation overrides needed)
  • Consistent deployment model: Same approach for both managed services and self-managed

Changes

  1. support/util/util.go: Modified GetControlPlaneOperatorImage() to use HO image for 4.20+ if the CPO binary exists
  2. support/util/util_test.go: Added comprehensive unit tests for the new behavior
  3. Dockerfile & Containerfile.operator:
    • Build and include control-plane-operator and control-plane-pki-operator binaries
    • Add symlinks for ignition-server, konnectivity-socks5-proxy, availability-prober, token-minter
    • Add missing label io.openshift.hypershift.control-plane-operator-supports-kas-custom-kubeconfig=true

Safety Mechanism

The code includes a safety check that verifies /usr/bin/control-plane-operator exists in the HO image before using it. This ensures:

  • Older HO images (without the CPO binary) continue to use the release payload CPO
  • Self-managed users running pre-change HO versions are not affected
  • Graceful fallback to payload CPO if binary check fails

Behavior Matrix

| Cluster Version | HO Has CPO Binary | Result |
| --- | --- | --- |
| 4.20+ | Yes | Uses HO image |
| 4.20+ | No | Uses payload CPO (graceful fallback) |
| < 4.20 | Any | Uses payload CPO |
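
The matrix can be read as a small decision function. What follows is a minimal sketch, not the code from support/util/util.go: `selectCPOImage`, `atLeast420`, the integer major/minor parameters, and the image strings are hypothetical names introduced for illustration, and the annotation override that the real GetControlPlaneOperatorImage also honors is left out.

```go
package main

import "fmt"

// atLeast420 reports whether major.minor is 4.20 or newer.
// The major version is compared first so that a version such as 5.0
// is not rejected because its minor number is below 20.
func atLeast420(major, minor int) bool {
	return major > 4 || (major == 4 && minor >= 20)
}

// selectCPOImage is a hypothetical helper mirroring the behavior matrix:
// the HO image is chosen only for 4.20+ clusters whose HO image ships the
// CPO binary; every other combination falls back to the payload CPO image.
func selectCPOImage(major, minor int, hoHasCPOBinary bool, hoImage, payloadImage string) string {
	if atLeast420(major, minor) && hoHasCPOBinary {
		return hoImage
	}
	return payloadImage
}

func main() {
	// One call per row of the matrix.
	fmt.Println(selectCPOImage(4, 20, true, "ho-image", "payload-cpo"))
	fmt.Println(selectCPOImage(4, 20, false, "ho-image", "payload-cpo"))
	fmt.Println(selectCPOImage(4, 19, true, "ho-image", "payload-cpo"))
}
```

Keeping the version test in its own function makes the threshold easy to cover with table-driven tests, including versions with a major number above 4.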

Test plan

  • Unit tests pass for GetControlPlaneOperatorImage
  • E2E test with 4.20+ cluster to verify CPO uses HO image
  • E2E test with pre-change HO image to verify fallback to payload CPO
  • Verify CPO starts correctly with the new image

🤖 Generated with Claude Code

@coderabbitai
Contributor

coderabbitai Bot commented Jan 8, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Changes update container build files to include control-plane-operator and control-plane-pki-operator binaries with symlink configuration and new metadata label. The image selection logic adds CPO binary detection with caching to check HO image availability before consulting release payload. Tests validate the updated precedence logic across multiple scenarios.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Container build configuration: Containerfile.operator, Dockerfile | Added karpenter-operator, control-plane-operator, and control-plane-pki-operator build targets; copy control-plane-operator and control-plane-pki-operator binaries to /usr/bin/; create symlinks for ignition-server, konnectivity-socks5-proxy, availability-prober, and token-minter pointing to control-plane-operator; add new LABEL annotation io.openshift.hypershift.control-plane-operator-supports-kas-custom-kubeconfig=true. |
| CPO image selection logic: support/util/util.go | Introduced cpoBinaryPath constant and cpoBinaryExists cache variable; added cpoBinaryExistsInHOImage() function for cached binary presence detection; refactored GetControlPlaneOperatorImage precedence to check for CPO binary in HO image (4.20+) before consulting release payload. |
| Image selection test coverage: support/util/util_test.go | Added TestGetControlPlaneOperatorImage with multiple test cases covering CPO annotation override, HO image availability, hypershift payload presence, and CPO binary existence scenarios; introduced testReleaseProvider for mocking release lookups. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label Jan 8, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Jan 8, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/needs-area and area/control-plane-operator labels Jan 8, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Jan 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the area/hypershift-operator and approved labels and removed the do-not-merge/needs-area label Jan 8, 2026
@muraee
Contributor Author

muraee commented Jan 8, 2026

/test verify
/test e2e-aws
/test e2e-aws-4-21

@muraee
Contributor Author

muraee commented Jan 8, 2026

/test e2e-aws-4-20

@openshift-ci
Contributor

openshift-ci Bot commented Jan 8, 2026

@muraee: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aks
/test e2e-aks-4-21
/test e2e-aks-override
/test e2e-aws
/test e2e-aws-4-21
/test e2e-aws-override
/test e2e-aws-upgrade-hypershift-operator
/test e2e-kubevirt-aws-ovn-reduced
/test images
/test okd-scos-images
/test security
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-autonode
/test e2e-aws-metrics
/test e2e-aws-minimal
/test e2e-aws-techpreview
/test e2e-azure-aks-ovn-conformance
/test e2e-conformance
/test e2e-kubevirt-aws-ovn
/test e2e-kubevirt-azure-ovn
/test e2e-kubevirt-metal-conformance
/test e2e-openstack-aws
/test e2e-openstack-aws-conformance
/test e2e-openstack-aws-csi-cinder
/test e2e-openstack-aws-csi-manila
/test e2e-openstack-aws-nfv
/test okd-scos-e2e-aws-ovn
/test reqserving-e2e-aws

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-hypershift-main-e2e-aks
pull-ci-openshift-hypershift-main-e2e-aks-4-21
pull-ci-openshift-hypershift-main-e2e-aws
pull-ci-openshift-hypershift-main-e2e-aws-upgrade-hypershift-operator
pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn-reduced
pull-ci-openshift-hypershift-main-images
pull-ci-openshift-hypershift-main-okd-scos-images
pull-ci-openshift-hypershift-main-security
pull-ci-openshift-hypershift-main-unit
pull-ci-openshift-hypershift-main-verify
Details

In response to this:

/test e2e-aws-4-20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@muraee
Contributor Author

muraee commented Jan 8, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jan 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @support/util/util.go:
- Line 686: The version check using "if version.Major >= 4 && version.Minor >=
20 && cpoBinaryExistsInHOImage()" is incorrect for majors > 4 (for example,
5.0 fails the Minor >= 20 test); update the condition in util.go to correctly
test "version >= 4.20" by either constructing a semver threshold (e.g., create
minVersionForHOImage := semver.Version{Major: 4, Minor: 20, Patch: 0} and use
version.GTE(minVersionForHOImage)) or using an equivalent comparison like "if
(version.Major > 4 || (version.Major == 4 && version.Minor >= 20)) &&
cpoBinaryExistsInHOImage()" (the outer parentheses are required because &&
binds tighter than ||), keeping the check against cpoBinaryExistsInHOImage()
unchanged and referencing the existing version variable and
cpoBinaryExistsInHOImage() call.
- Around line 633-647: The package-level cpoBinaryExists is racy; replace the
manual nil-check/write in cpoBinaryExistsInHOImage with a thread-safe
initialization using sync.Once (add a package-level sync.Once, e.g.
cpoBinaryOnce) or a sync.Mutex; inside cpoBinaryExistsInHOImage, call
cpoBinaryOnce.Do with a function that runs os.Stat(cpoBinaryPath) and sets
cpoBinaryExists = &exists, then return *cpoBinaryExists so reads/writes are
synchronized and the value is computed exactly once.
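
The sync.Once suggestion can be sketched as follows. This is an illustration under assumptions, not the change that landed in util.go: `newCachedExistsCheck` is a hypothetical helper, the cache is held in a closure instead of package-level variables, and any os.Stat error (not only "file not found") is treated as "binary absent".

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// cpoBinaryPath is the path the PR description says is checked in the HO image.
const cpoBinaryPath = "/usr/bin/control-plane-operator"

// newCachedExistsCheck is a hypothetical helper: it returns a function that
// stats path on its first call and serves the cached answer afterwards.
// sync.Once guarantees the stat runs exactly once, even when several
// goroutines make the first call at the same time.
func newCachedExistsCheck(path string) func() bool {
	var (
		once   sync.Once
		exists bool
	)
	return func() bool {
		once.Do(func() {
			// Assumption: any stat error counts as "not present".
			_, err := os.Stat(path)
			exists = err == nil
		})
		return exists
	}
}

// cpoBinaryExistsInHOImage keeps the name used in the review comment.
var cpoBinaryExistsInHOImage = newCachedExistsCheck(cpoBinaryPath)

func main() {
	fmt.Println("CPO binary present:", cpoBinaryExistsInHOImage())
}
```

Because the cached value lives inside the closure, a test can build a fresh check for any path without resetting shared state.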
🧹 Nitpick comments (2)
support/util/util_test.go (2)

1128-1134: Test setup directly mutates package-level state.

The test sets cpoBinaryExists directly (line 1133), which works with the current pointer-based caching but would break if sync.Once is adopted per the suggestion in util.go. Consider abstracting the binary existence check via a function variable or interface to improve testability.

♻️ Suggestion for improved testability

In util.go, use a function variable that can be replaced in tests:

// cpoBinaryExistsFunc is the function used to check CPO binary existence.
// It can be replaced in tests.
var cpoBinaryExistsFunc = cpoBinaryExistsInHOImage

Then in tests:

cpoBinaryExistsFunc = func() bool { return tc.cpoBinaryExists }
defer func() { cpoBinaryExistsFunc = cpoBinaryExistsInHOImage }()
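
A self-contained sketch of the same idea, with the swap and the restore wrapped in one helper so a test can hand the restore function to defer or t.Cleanup. The names `realCPOBinaryCheck` and `stubCPOBinaryExists` are hypothetical, and the real check is replaced by a placeholder that always returns false.

```go
package main

import "fmt"

// realCPOBinaryCheck is a placeholder for cpoBinaryExistsInHOImage.
// In this sketch it always reports false.
func realCPOBinaryCheck() bool { return false }

// cpoBinaryExistsFunc is the indirection point that tests replace.
var cpoBinaryExistsFunc = realCPOBinaryCheck

// stubCPOBinaryExists swaps in a fixed answer and returns a function
// that restores whatever check was installed before the call.
func stubCPOBinaryExists(exists bool) (restore func()) {
	prev := cpoBinaryExistsFunc
	cpoBinaryExistsFunc = func() bool { return exists }
	return func() { cpoBinaryExistsFunc = prev }
}

func main() {
	restore := stubCPOBinaryExists(true)
	fmt.Println("stubbed:", cpoBinaryExistsFunc())
	restore()
	fmt.Println("restored:", cpoBinaryExistsFunc())
}
```

Note that swapping a package-level variable is not safe for tests that run in parallel; such tests would need to inject the check through a parameter instead.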

1045-1126: Good test coverage, but consider adding a test case for major version > 4.

The test cases comprehensively cover 4.x versions, but given the version comparison bug noted in util.go, adding a test case for version 5.0+ would help catch regressions and validate the fix.

{
    name:                 "When version is 5.0 and CPO binary exists it should use HO image",
    version:              "5.0.0",
    payloadHasHypershift: true,
    cpoBinaryExists:      true,
    expectedImage:        hoImage,
},
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 032f041 and bac4dde.

📒 Files selected for processing (4)
  • Containerfile.operator
  • Dockerfile
  • support/util/util.go
  • support/util/util_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • Containerfile.operator
  • support/util/util.go
  • Dockerfile
  • support/util/util_test.go
🧬 Code graph analysis (1)
support/util/util_test.go (3)
support/releaseinfo/releaseinfo.go (1)
  • ReleaseImage (39-42)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • ControlPlaneOperatorImageAnnotation (59-59)
support/util/util.go (1)
  • GetControlPlaneOperatorImage (663-698)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
  • GitHub Check: Red Hat Konflux / control-plane-operator-main-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-gomaxprocs-webhook-on-pull-request
🔇 Additional comments (6)
Dockerfile (2)

7-29: LGTM! Build stage and binary packaging changes are well-structured.

The additions correctly build and package the control-plane-operator and control-plane-pki-operator binaries, with appropriate symlinks for the multi-call binary pattern where ignition-server, konnectivity-socks5-proxy, availability-prober, and token-minter all resolve to control-plane-operator.
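
For readers unfamiliar with the multi-call binary pattern, a minimal sketch follows. It shows the general technique of dispatching on the name the binary was invoked as; the source does not state how control-plane-operator itself selects its subcommand, so `commandFor` and its behavior are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// commandFor maps the name a binary was invoked as (argv[0]) to the
// subcommand it should run. When the binary is started through a symlink,
// argv[0] carries the symlink's name, which is what makes the pattern work.
func commandFor(argv0 string) string {
	switch name := filepath.Base(argv0); name {
	case "ignition-server", "konnectivity-socks5-proxy", "availability-prober", "token-minter":
		return name
	default:
		// Any other name runs the operator itself.
		return "control-plane-operator"
	}
}

func main() {
	fmt.Println("dispatching to:", commandFor(os.Args[0]))
}
```

With this layout one binary serves several entry points, so the image only needs the symlinks listed above, not separate executables.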


45-45: LGTM! New capability label added.

The label correctly signals that this image supports the kas-custom-kubeconfig feature.

Containerfile.operator (2)

7-29: LGTM! Changes mirror Dockerfile appropriately.

The build stage and binary packaging changes are consistent with the Dockerfile, ensuring both container build paths produce equivalent images.


54-54: LGTM! Label added consistently with Dockerfile.

support/util/util.go (1)

649-661: LGTM! Documentation accurately reflects the updated precedence logic.

The docstring clearly explains the five-level precedence hierarchy for CPO image resolution.

support/util/util_test.go (1)

1016-1036: LGTM! Clean fake release provider implementation.

The testReleaseProvider correctly constructs a ReleaseImage with the version in the ImageStream name and component images in tags.

Comment thread support/util/util.go Outdated
Comment thread support/util/util.go Outdated
@muraee
Contributor Author

muraee commented Jan 9, 2026

/test e2e-aws
/test e2e-aws-4-21

@rtheis
Contributor

rtheis commented Jan 9, 2026

/cc @rtheis

@openshift-ci openshift-ci Bot requested a review from rtheis January 9, 2026 12:34
@muraee
Contributor Author

muraee commented Jan 12, 2026

/test e2e-aws

For cluster versions 4.20 and above, use the HyperShift Operator image
directly for the Control Plane Operator instead of extracting it from
the OCP release payload. This enables:

- Faster feature delivery for CPO (ships with HO releases)
- Simplified hotfix process (single HO image bump fixes all clusters)
- Consistent deployment model between managed and self-managed

The change includes a safety check that verifies the control-plane-operator
binary exists in the HO image before using it. This ensures backward
compatibility with older HO images that don't include the CPO binary -
they will continue to use the release payload CPO.

Dockerfiles are updated to:
- Build and include control-plane-operator and control-plane-pki-operator
- Add symlinks for ignition-server, konnectivity-socks5-proxy, etc.
- Add missing CPO feature discovery labels

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@muraee muraee force-pushed the use-ho-image-for-cpo-420 branch from bac4dde to 1141357 Compare January 12, 2026 17:21
@openshift-ci
Contributor

openshift-ci Bot commented Apr 17, 2026

@muraee: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify-workflows | 1141357 | link | true | /test verify-workflows |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci openshift-ci Bot added the needs-rebase label Apr 17, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 17, 2026

PR needs rebase.


@hypershift-jira-solve-ci

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-verify-workflows
  • Build ID: 2045127073764216832
  • PR: #7446 feat(cpo): use HO image for CPO on 4.20+ clusters by @muraee
  • PR Branch: use-ho-image-for-cpo-420 (created 2026-01-08, single commit from 2026-01-12)
  • Base SHA: 1180bcaf557d7df5da61fc97197c69c68c32c4e2
  • PR SHA: 1141357a860aaa988158f9841199fcc1b718a52d

Test Failure Analysis

Error

CONFLICT (content): Merge conflict in Dockerfile
Auto-merging support/util/util_test.go
CONFLICT (content): Merge conflict in support/util/util_test.go
Automatic merge failed; fix conflicts and then commit the result.
# Error: exit status 1

Summary

All five CI failures on PR #7446 share a single root cause: the PR branch is 3+ months stale (last commit January 12, 2026) and has unresolved merge conflicts with the main branch. The verify-workflows Prow job failed during the git checkout phase before any test code ran, because Prow could not merge the PR's commit into the current main. The tide error is a direct consequence — Tide reports mergeStateStatus: DIRTY and mergeable: CONFLICTING, so it will not attempt to merge the PR. The three Konflux enterprise-contract failures (2 policy violations each) ran on January 12, 2026 against an outdated MCE-2.11 policy; current PRs on the repo now run MCE-2.17 EC checks which show as "skipping" — these EC failures are stale artifacts from an obsolete pipeline configuration and will not reappear after rebasing.

Root Cause

The PR branch use-ho-image-for-cpo-420 has not been rebased since its last force-push on January 12, 2026. In the intervening 3+ months, the main branch received changes to two files that this PR also modifies:

  1. Dockerfile — The PR modifies the Dockerfile to build and include control-plane-operator and control-plane-pki-operator binaries, add symlinks, and add CPO feature discovery labels. The main branch has since received independent modifications to the Dockerfile that conflict with these changes.

  2. support/util/util_test.go — The PR adds 167 lines of test changes. The main branch has since modified the same file in overlapping regions.

The Prow CI infrastructure cannot automatically resolve these conflicts. When ci-operator attempts to merge PR commit 1141357a860a into the current main (1180bcaf557d), git reports CONFLICT (content) for both files and exits with status 1. This causes an immediate job failure before any CI step (including verify-workflows) can execute.

Breakdown of all 5 failures:

| Check | Root Cause | Status |
| --- | --- | --- |
| ci/prow/verify-workflows | Git merge conflict during checkout | Blocked |
| tide | PR not mergeable (CONFLICTING) | Blocked |
| Konflux EC / hypershift-operator-main | Stale MCE-2.11 EC policy (2 failures) | Outdated |
| Konflux EC / hypershift-operator | Stale MCE-2.11 EC policy (2 failures) | Outdated |
| Konflux EC / hypershift-cli-mce-211 | Stale MCE-2.11 EC policy (2 failures) | Outdated |

The Konflux EC failures are independent of the merge conflicts — the image builds succeeded, but the enterprise-contract verification (a post-build policy check) failed with 2 policy violations. However, these checks ran against the MCE-2.11 pipeline, while current PRs on the repo use MCE-2.17. On all recently merged PRs (e.g., #8261, #8258), the EC checks show as "skipping," indicating the pipeline configuration has been updated since January. These EC failures will not recur after rebasing and re-triggering CI.

Recommendations
  1. Rebase the PR branch onto current main — This is the only required action. Resolve the merge conflicts in Dockerfile and support/util/util_test.go, then force-push the updated branch. All Prow jobs and Konflux checks will automatically re-trigger.

  2. Verify Dockerfile changes still apply — Given 3+ months of drift, review whether the Dockerfile modifications (adding CPO/CPKO binaries, symlinks, labels) are still compatible with the current Dockerfile structure on main.

  3. Verify support/util/util_test.go changes still apply — The test changes may need updating if the support/util package APIs have changed on main.

  4. No action needed for Konflux EC failures — After rebasing, Konflux will run the current MCE-2.17 pipeline (which shows "skipping" on recent PRs). The stale MCE-2.11 EC failures will be replaced.

  5. Consider closing the PR if the feature direction has changed — This PR has been open for over 3 months without update. Confirm with the team that the approach (using HO image for CPO on 4.20+ clusters) is still the desired direction before investing time in conflict resolution.

Evidence
| Evidence | Detail |
| --- | --- |
| Merge conflict files | Dockerfile, support/util/util_test.go |
| PR age | Created 2026-01-08, last commit 2026-01-12 (3+ months stale) |
| PR mergeable status | CONFLICTING / DIRTY (from GitHub API) |
| verify-workflows exit | exit status 1 during git merge before any CI step ran |
| tide error message | Not mergeable. PR has a merge conflict. |
| Konflux EC run date | 2026-01-12 (stale) |
| Konflux EC policy version | MCE-2.11 (current PRs use MCE-2.17) |
| Konflux EC on recent PRs | All show "skipping" (e.g., PR #8261, #8264) |
| EC build status | Builds passed (on-pull-request pipelines succeeded); only verify task failed |
| EC failure count | 2 policy failures per check (consistent across all 3 EC checks) |
| Files changed in PR | Dockerfile, Containerfile.operator, support/util/util.go, support/util/util_test.go |
| Prow base SHA | 1180bcaf557d7df5da61fc97197c69c68c32c4e2 |

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.

2 participants