
OCPBUGS-70320: enable scale-up from zero with InPlace upgrade#7427

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:main from LiangquanLi930:from-zero-adjust
Jan 12, 2026

Conversation

@LiangquanLi930
Contributor

@LiangquanLi930 LiangquanLi930 commented Jan 5, 2026

What this PR does / why we need it:

Enable scale-up from zero for NodePools using the InPlace upgrade type. Remove the machinedeployment controller and refactor scale-from-zero support around an instancetype provider abstraction.

Which issue(s) this PR fixes:

Fixes: OCPBUGS-70320

Special notes for your reviewer:

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@coderabbitai
Contributor

coderabbitai Bot commented Jan 5, 2026

Walkthrough

Removed the MachineDeployment scale-from-zero controller and EC2 cache/utilities; introduced a cloud-agnostic instancetype.Provider with an AWS provider, moved scale-from-zero annotation logic into the NodePool reconciler, and updated startup flags, installer wiring, and tests.

Changes

Cohort / File(s) Summary
MachineDeployment controller & tests removed
hypershift-operator/controllers/machinedeployment/controller.go, hypershift-operator/controllers/machinedeployment/controller_test.go
Deleted the MachineDeployment scale-from-zero reconciler implementation and its entire unit test suite.
EC2 instance-types cache & tests removed
hypershift-operator/controllers/machinedeployment/ec2_instance_types.go, hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
Removed EC2 instance-type cache, fetch/transform/pagination logic and related tests; exported types/interfaces deleted.
Provider-spec utilities removed
hypershift-operator/controllers/machinedeployment/utils/providerspec.go
Removed helpers for resolving AWSMachineTemplate, extracting instance type and region, and the RegionAnnotation constant.
NodePool controller: reconcile & wiring updates
hypershift-operator/controllers/nodepool/nodepool_controller.go
Added InstanceTypeProvider field; integrated reconcileScaleFromZeroAnnotations call into reconcile flow when autoscaling + provider + supported platform; added platform guard.
Scale-from-zero core logic (NodePool)
hypershift-operator/controllers/nodepool/scale_from_zero.go
New deterministic taintsToAnnotation and setScaleFromZeroAnnotationsOnObject to apply/clear vCPU, memory, GPU, labels, taints, and architecture annotations using provider lookups and honoring template Status.Capacity.
New instancetype API & AWS provider
hypershift-operator/controllers/nodepool/instancetype/instancetype.go, hypershift-operator/controllers/nodepool/instancetype/aws/provider.go, hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go
Added instancetype.Provider interface and InstanceTypeInfo; implemented AWS provider using DescribeInstanceTypes, transform/GPU helpers, and unit tests covering transform/lookup/error cases.
Main startup flags & wiring
hypershift-operator/main.go
Replaced AWS-only flags with --scale-from-zero-provider and --scale-from-zero-creds; added validation; initialize/inject InstanceTypeProvider into NodePool reconciler; removed MachineDeployment controller wiring for scale-from-zero.
Installer assets & install CLI
cmd/install/assets/hypershift_operator.go, cmd/install/install.go, cmd/install/install_test.go
Renamed AWS-specific secret/types to provider-aware variants (ScaleFromZeroCredentialsSecret, ScaleFromZeroProvider); updated deployment volume/mount names, CLI flags, validation, and tests.
NodePool scale-from-zero unit tests
hypershift-operator/controllers/nodepool/scale_from_zero_test.go
Added tests validating deterministic taintsToAnnotation formatting and ordering.
E2E test update
test/e2e/autoscaling_test.go
Updated secret lookup and skip logic to use new hypershift-operator-scale-from-zero-credentials secret name.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing touches
  • 📝 Generate docstrings

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Command failed


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from csrwng and muraee January 5, 2026 17:10
@openshift-ci openshift-ci Bot added the labels area/cli (Indicates the PR includes changes for CLI) and area/hypershift-operator (Indicates the PR includes changes for the hypershift operator and API - outside an OCP release), and removed the do-not-merge/needs-area label on Jan 5, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @hypershift-operator/controllers/nodepool/scale_from_zero.go:
- Around line 104-128: getInstanceTypeInfo currently blocks on the EC2 call
without cancellation; change its signature to accept a context.Context (e.g.,
getInstanceTypeInfo(ctx context.Context, ec2Client ec2iface.EC2API, instanceType
string)) and use the context-aware EC2 call
(ec2Client.DescribeInstanceTypesWithContext or the SDK's Context variant) when
invoking DescribeInstanceTypes, returning errors as before; update the caller
setScaleFromZeroAnnotationsOnObject to pass through the caller's context into
getInstanceTypeInfo so the EC2 request honors timeouts and cancellations.
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

1169-1177: Consider returning early if no changes are needed.

The current implementation always calls r.Update(ctx, obj) even if setScaleFromZeroAnnotationsOnObject didn't modify the annotations (e.g., when CAPA already provides Status.Capacity). This could result in unnecessary API calls.

🔎 Suggested improvement

Consider capturing whether annotations were modified and skipping the update if no changes occurred:

+	// Capture original annotation count to detect changes
+	originalLen := len(annotations)
+
 	// Set annotations on the object (MachineSet or MachineDeployment)
 	if err := setScaleFromZeroAnnotationsOnObject(ctx, r.Client, r.EC2Client, nodePool, annotations, awsMachineTemplate); err != nil {
 		return fmt.Errorf("failed to set scale-from-zero annotations: %w", err)
 	}
 
+	// Skip update if no annotations were added/modified
+	if len(annotations) == originalLen {
+		return nil
+	}
+
 	// Update the object

Note: This is a simplified check. A more robust solution would compare annotation values, but given the context this optimization may not be necessary.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between c1d28e3 and 11f7b1f.

📒 Files selected for processing (10)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/main.go
💤 Files with no reviewable changes (6)
  • hypershift-operator/main.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • cmd/install/assets/hypershift_operator.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • cmd/install/install.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
🧬 Code graph analysis (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • AWSPlatform (1211-1211)
api/hypershift/v1beta1/nodepool_types.go (3)
  • NodePool (83-97)
  • UpgradeType (280-280)
  • UpgradeTypeInPlace (289-289)
support/awsclient/client.go (1)
  • Client (81-83)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
support/awsclient/client.go (1)
  • Client (81-83)
🔇 Additional comments (10)
cmd/install/assets/hypershift_operator.go (1)

100-104: Clean removal of scale-from-zero credentials infrastructure.

The consolidation of secret-name constants from four to three and removal of the scale-from-zero AWS credentials constant aligns with the PR objective of moving scale-from-zero support into the NodePool controller.

hypershift-operator/controllers/nodepool/nodepool_controller.go (2)

409-417: Non-blocking error handling for scale-from-zero annotations.

The error from reconcileScaleFromZeroAnnotations is logged but does not abort reconciliation. This is appropriate since scale-from-zero annotations are a non-critical optimization for the cluster autoscaler, and failures shouldn't block the core NodePool reconciliation.


1108-1152: Well-structured handling of both upgrade types.

The method correctly handles both InPlace (MachineSet) and Replace (MachineDeployment) upgrade types, with proper nil checks for annotations map initialization and appropriate early returns for not-found cases.

cmd/install/install.go (3)

590-615: Clean removal of scale-from-zero credentials from install flow.

The updated function signatures and call sites correctly remove the scale-from-zero secret handling. The setupAuth and setupOperatorResources functions are now streamlined without the previously required scale-from-zero credentials.


836-876: Operator deployment configuration updated correctly.

The setupOperatorResources function signature and the HyperShiftOperatorDeployment struct initialization have been properly updated to remove the ScaleFromZeroAWSSecret and ScaleFromZeroAWSSecretKey fields.


1107-1184: Auth setup correctly returns fewer secrets.

The setupAuth function no longer handles scale-from-zero credentials, simplifying the authentication setup flow while maintaining the existing OIDC and platform-specific credential handling.

hypershift-operator/controllers/nodepool/scale_from_zero.go (4)

77-91: Architecture detection takes only the first supported architecture.

The code takes only the first architecture from SupportedArchitectures (line 78). For instance types that support multiple architectures (e.g., Graviton instances that might show both arm64 and x86_64 in some configurations), this may not accurately reflect the intended architecture for the NodePool.

Consider validating this against the NodePool's Spec.Arch if available.


150-161: Good check for native CAPA Status.Capacity support.

The early return when awsMachineTemplate.Status.Capacity is populated ensures that when CAPA natively supports scale-from-zero, the workaround annotations are cleaned up. This provides a clean migration path.


180-192: Labels annotation includes architecture and NodePool labels.

The implementation correctly includes both the instance architecture and user-specified node labels, with deterministic sorting for consistent output. This aligns with the cluster-autoscaler scale-from-zero specification.


130-145: Taints conversion follows CAPI specification.

The taintsToAnnotation function correctly formats taints according to the CAPI scale-from-zero proposal with deterministic ordering via sorting.
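A plausible shape for such a conversion, shown here as an illustrative sketch rather than the PR's actual implementation (the `Taint` struct and formatting are assumptions based on the cluster-autoscaler's comma-separated `key=value:Effect` convention):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Taint mirrors the core fields of a Kubernetes taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// taintsToAnnotation renders taints as "key=value:Effect" entries, sorted
// and comma-joined, so repeated reconciles produce an identical annotation
// value (the deterministic ordering the unit tests validate).
func taintsToAnnotation(taints []Taint) string {
	entries := make([]string, 0, len(taints))
	for _, t := range taints {
		entries = append(entries, fmt.Sprintf("%s=%s:%s", t.Key, t.Value, t.Effect))
	}
	sort.Strings(entries) // stable output regardless of input order
	return strings.Join(entries, ",")
}

func main() {
	fmt.Println(taintsToAnnotation([]Taint{
		{Key: "b", Value: "2", Effect: "NoSchedule"},
		{Key: "a", Value: "1", Effect: "NoExecute"},
	}))
	// → a=1:NoExecute,b=2:NoSchedule
}
```

Determinism matters here because a non-deterministic annotation value would cause spurious diffs and update calls on every reconcile.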

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/nodepool_controller.go:
- Around line 1136-1142: The code builds templateKey using
ms.Spec.Template.Spec.InfrastructureRef.Namespace directly, which can be empty;
change it to default to the MachineSet's namespace when empty (e.g., use ns :=
ms.Namespace; if ms.Spec.Template.Spec.InfrastructureRef.Namespace != "" { ns =
ms.Spec.Template.Spec.InfrastructureRef.Namespace } and construct templateKey
with that ns) before calling r.Get(ctx, templateKey, awsMachineTemplate). Apply
the identical change to the MachineDeployment branch (the equivalent
variables/fields around the MachineDeployment lookup) so both branches use the
resource's namespace when InfrastructureRef.Namespace is empty.
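The namespace-defaulting fix described above is small enough to show in isolation. Go 1.22's standard-library `cmp.Or` returns its first non-zero-value argument, which makes the fallback a one-liner (the helper name below is illustrative):

```go
package main

import (
	"cmp"
	"fmt"
)

// resolveTemplateNamespace falls back to the owning MachineSet's (or
// MachineDeployment's) namespace when the InfrastructureRef namespace is
// empty, which is what the review comment asks for.
func resolveTemplateNamespace(infraRefNS, ownerNS string) string {
	return cmp.Or(infraRefNS, ownerNS) // first non-empty string wins
}

func main() {
	fmt.Println(resolveTemplateNamespace("", "clusters-example"))          // empty ref: owner namespace
	fmt.Println(resolveTemplateNamespace("explicit-ns", "clusters-example")) // explicit ref wins
}
```

A later review pass in this thread notes the fix was indeed applied with `cmp.Or`.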
🧹 Nitpick comments (4)
cmd/install/assets/hypershift_operator.go (1)

101-103: Document the scale-from-zero credentials removal and provide migration guidance.

The removal of scaleFromZeroAWSCredsSecretName and related scale-from-zero credential handling is a significant change. Consider:

  1. Add a comment explaining that scale-from-zero functionality has moved to the NodePool controller
  2. Document the migration path for existing installations that may have scale-from-zero credentials configured
  3. Reference the bug OCPBUGS-70320 in the code to provide context for future maintainers

Would you like me to generate documentation or migration notes for this change?

cmd/install/install.go (1)

590-590: LGTM! Signature changes are internally consistent.

The removal of scale-from-zero secret handling from setupAuth and setupOperatorResources is implemented correctly:

  • Function signatures updated consistently
  • All call sites properly adjusted
  • Return statements correctly match new signatures
  • Error handling preserved

However, consider adding a comment in setupAuth or the Options struct explaining that scale-from-zero functionality has been relocated to the NodePool controller, which would help future maintainers understand this architectural change.

Also applies to: 615-615, 836-836, 1107-1107, 1184-1184

hypershift-operator/controllers/nodepool/scale_from_zero.go (2)

64-67: Add nil check for rawInstanceType to prevent panic.

If rawInstanceType is nil, dereferencing *rawInstanceType.InstanceType on line 66 will cause a panic. While the current caller (getInstanceTypeInfo) validates the response, defensive coding here would prevent issues if the function is called from other contexts.

🔎 Suggested improvement
 func transformInstanceType(rawInstanceType *ec2.InstanceTypeInfo) InstanceType {
+	if rawInstanceType == nil || rawInstanceType.InstanceType == nil {
+		return InstanceType{}
+	}
 	instanceType := InstanceType{
 		InstanceType: *rawInstanceType.InstanceType,
 	}

150-150: Unused parameters: ctx and c are never used in the function body.

The ctx parameter should be passed to getInstanceTypeInfo (once that function accepts context). The c client.Client parameter appears to be completely unused and can be removed.

🔎 Suggested fix
-func setScaleFromZeroAnnotationsOnObject(ctx context.Context, c client.Client, ec2Client ec2iface.EC2API, nodePool *hyperv1.NodePool, annotations map[string]string, awsMachineTemplate *infrav1.AWSMachineTemplate) error {
+func setScaleFromZeroAnnotationsOnObject(ctx context.Context, ec2Client ec2iface.EC2API, nodePool *hyperv1.NodePool, annotations map[string]string, awsMachineTemplate *infrav1.AWSMachineTemplate) error {

Then update callers in nodepool_controller.go to remove the r.Client argument.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 11f7b1f and 3832780.

📒 Files selected for processing (10)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/main.go
💤 Files with no reviewable changes (6)
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/main.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/controller.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • cmd/install/install.go
  • cmd/install/assets/hypershift_operator.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
🧬 Code graph analysis (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • AWSPlatform (1211-1211)
api/hypershift/v1beta1/nodepool_types.go (2)
  • UpgradeType (280-280)
  • UpgradeTypeInPlace (289-289)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
support/awsclient/client.go (1)
  • Client (81-83)
🔇 Additional comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)

409-416: LGTM!

The integration point is well-placed after capi.Reconcile() succeeds. Non-blocking error handling is appropriate here since scale-from-zero annotations are an enhancement for autoscaling, not critical to NodePool operation. The nil check on r.EC2Client prevents issues when the client isn't configured.


1115-1151: The MachineSet/MachineDeployment is already created and fully populated by capi.Reconcile() before this code runs, so CreateOrPatch will always fetch the existing object.

The execution order is guaranteed: capi.Reconcile(ctx) (line 402) completes first and creates the MachineSet/MachineDeployment with all Spec fields properly populated via reconcileMachineSet() or reconcileMachineDeployment(). The reconcileScaleFromZeroAnnotations() call (line 413) only executes if capi.Reconcile() succeeds with no early return. When CreateOrPatch is called, it will retrieve the already-existing object and populate it before the mutate function runs, so the InfrastructureRef check will operate on valid data. The pattern of creating an empty object for CreateOrPatch is unconventional but not incorrect given this ordering.

@LiangquanLi930
Contributor Author

/retest

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/nodepool_controller.go:
- Around line 409-417: The call to reconcileScaleFromZeroAnnotations logs errors
but continues reconciliation, so failures against the EC2 API are not surfaced;
modify the nodepool reconciliation to set a NodePool condition (e.g.,
ScaleFromZeroAnnotationsReady) to False with the error details when
reconcileScaleFromZeroAnnotations(ctx, nodePool, controlPlaneNamespace) returns
an error, and update the NodePool status accordingly, or alternatively return
that error from Reconcile to stop the reconciliation; ensure you reference
reconcileScaleFromZeroAnnotations and update the NodePool status conditions/path
where other conditions are set so the failure is visible to users.
- Around line 1108-1192: The reconcileScaleFromZeroAnnotations function
mishandles AWSMachineTemplate lookups by always using controlPlaneNamespace and
it silently ignores NotFound errors; fix both: when building templateKey for
both the MachineSet path (ms) and MachineDeployment path (md) use the
InfrastructureRef.Namespace if non-empty (i.e., templateNS :=
spec.Template.Spec.InfrastructureRef.Namespace; if templateNS=="" use
controlPlaneNamespace) and set templateKey.Namespace = templateNS; also stop
swallowing not-found errors in the CreateOrPatch error handling—do not return
nil on apierrors.IsNotFound(err); instead wrap and return the error (or at
minimum log a warning and return it) so missing MachineSet/MachineDeployment or
AWSMachineTemplate lookup failures surface rather than being silently ignored;
reference reconcileScaleFromZeroAnnotations, ms/md, awsMachineTemplate,
templateKey, and setScaleFromZeroAnnotationsOnObject when applying the changes.

In @hypershift-operator/controllers/nodepool/scale_from_zero.go:
- Around line 64-91: The transformInstanceType function currently defaults
CPUArchitecture to ArchitectureAmd64 on unknown or missing ProcessorInfo which
can mislabel ARM instances; change transformInstanceType to return
(InstanceType, error) instead of just InstanceType, detect when ProcessorInfo is
nil, SupportedArchitectures is empty, or the architecture value is unrecognized
and return a descriptive error instead of defaulting, and update all callers of
transformInstanceType to handle the error (propagate or log and skip the
instance); alternatively, if you must keep a default for compatibility, emit a
warning via the operator logger (not silently default) before assigning
ArchitectureAmd64 and document the assumption.
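The error-returning variant the comment suggests might look like the sketch below. Types and names here are hypothetical stand-ins (the real function inspects the EC2 `ProcessorInfo` response), but the shape shows why failing loudly beats silently defaulting to amd64 for Graviton instances:

```go
package main

import "fmt"

// processorInfo is a hypothetical stand-in for the relevant slice of the
// EC2 DescribeInstanceTypes response.
type processorInfo struct {
	SupportedArchitectures []string
}

// transformArchitecture returns an error for missing or unrecognized
// architecture data rather than silently defaulting, which could mislabel
// ARM (Graviton) instance types as amd64.
func transformArchitecture(pi *processorInfo) (string, error) {
	if pi == nil || len(pi.SupportedArchitectures) == 0 {
		return "", fmt.Errorf("instance type reports no supported architectures")
	}
	switch arch := pi.SupportedArchitectures[0]; arch {
	case "x86_64":
		return "amd64", nil
	case "arm64":
		return "arm64", nil
	default:
		return "", fmt.Errorf("unrecognized architecture %q", arch)
	}
}

func main() {
	arch, err := transformArchitecture(&processorInfo{SupportedArchitectures: []string{"arm64"}})
	fmt.Println(arch, err)
	_, err = transformArchitecture(nil)
	fmt.Println(err) // missing ProcessorInfo surfaces as an error, not a default
}
```

Callers then decide whether to propagate the error or log and skip the instance type.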
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3832780 and 4375af3.

📒 Files selected for processing (10)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/main.go
💤 Files with no reviewable changes (6)
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/main.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/install/install.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • cmd/install/assets/hypershift_operator.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
🧬 Code graph analysis (1)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
support/awsclient/client.go (1)
  • Client (81-83)
🔇 Additional comments (3)
cmd/install/assets/hypershift_operator.go (1)

100-103: LGTM! Clean removal of deprecated scale-from-zero credentials.

The removal of scaleFromZeroAWSCredsSecretName and related infrastructure aligns with the PR objective to move scale-from-zero annotation logic into the NodePool controller, eliminating the need for separate credentials.

hypershift-operator/controllers/nodepool/scale_from_zero.go (2)

104-128: LGTM! Context handling implemented correctly.

The function now uses DescribeInstanceTypesWithContext (line 115), properly propagating the caller's context for timeout and cancellation support. This addresses the previous review concern about blocking EC2 calls.


147-204: LGTM! Well-structured scale-from-zero annotation logic.

The function correctly:

  • Checks for CAPA native support first (lines 152-161) and cleans up workaround annotations when not needed
  • Validates input before fetching EC2 data (lines 164-167)
  • Propagates errors appropriately (line 173)
  • Ensures deterministic output by sorting labels and taints (lines 191, 143)
  • Handles optional taints with proper cleanup (lines 194-201)

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)

409-417: Consider surfacing scale-from-zero annotation failures in NodePool status.

When reconcileScaleFromZeroAnnotations fails (e.g., EC2 API unavailable), the error is logged but reconciliation continues. Users won't see this failure reflected in the NodePool status, which could lead to silent autoscaling issues.

Consider either:

  • Adding a dedicated status condition (e.g., ScaleFromZeroAnnotationsReady) to surface failures
  • Returning the error to block reconciliation until resolved

1108-1186: LGTM! Implementation correctly handles both upgrade types.

The namespace defaulting using cmp.Or (lines 1137, 1171) properly addresses the previous review concern. The CreateOrPatch pattern is appropriate here since CAPI reconciliation ensures the MachineSet/MachineDeployment exists before this code runs.

One minor observation: if the MachineSet/MachineDeployment doesn't exist yet (e.g., during initial reconciliation), CreateOrPatch could theoretically create an empty resource. However, this is unlikely to occur in practice given the reconciliation ordering (CAPI runs first at line 402-408), and the Kind validation provides a safeguard by skipping annotation setup for non-AWS templates.

Optional: More defensive pattern

If you want to be more explicit about requiring the resource to exist first, consider using Get followed by Update instead of CreateOrPatch:

// For MachineSet
ms := &capiv1.MachineSet{}
if err := r.Get(ctx, key, ms); err != nil {
    if apierrors.IsNotFound(err) {
        // Resource not created yet by CAPI, skip for now
        log.V(1).Info("MachineSet not found yet, skipping scale-from-zero annotations")
        return nil
    }
    return fmt.Errorf("failed to get MachineSet: %w", err)
}
// ... rest of annotation logic with Update instead of CreateOrPatch

This makes the assumption explicit that the resource should exist, and gracefully handles the case where it doesn't yet.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4375af3 and 381f36f.

📒 Files selected for processing (3)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
💤 Files with no reviewable changes (1)
  • cmd/install/assets/hypershift_operator.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • cmd/install/install.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (1)
cmd/install/install.go (2)
karpenter-operator/controllers/karpenter/assets/assets.go (1)
  • ReadFile (23-25)
cmd/install/assets/hypershift_operator.go (1)
  • ScaleFromZeroAWSCredentialsSecret (157-161)
🔇 Additional comments (6)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

4-14: LGTM!

The new imports are appropriate for the scale-from-zero functionality. The cmp package is used for namespace defaulting, and the config/agent imports support the expanded reconciliation logic.

cmd/install/install.go (5)

135-138: LGTM! Consistent field additions.

The new scale-from-zero credential fields follow the established pattern used for AWS private credentials and OIDC storage, providing both file-based and secret-reference options.


191-208: LGTM! Thorough validation logic.

The validation correctly enforces mutual exclusivity and provides helpful error messages for file access issues. The approach is consistent with existing credential validation patterns in this file.


312-314: LGTM! Clear flag definitions.

The CLI flags are well-documented and follow the naming conventions used for other credential options.


1134-1135: LGTM! Accurate comment and signature update.

The function comment correctly describes all return values, and the signature change is properly reflected at all call sites.


1213-1233: LGTM! Implementation follows established patterns and is properly integrated.

The scale-from-zero secret creation logic correctly mirrors the approach used for OIDC and AWS private credentials. Both the file-based case (reading credentials and creating a managed secret) and the secret reference case (creating a minimal reference object for existing secrets) are properly implemented. Error handling uses correct %w wrapping. The secret is properly mounted in the operator deployment, with appropriate nil and value checks before use.

@LiangquanLi930 LiangquanLi930 force-pushed the from-zero-adjust branch 2 times, most recently from 1632d1c to a78e835 on January 6, 2026 14:40
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

1115-1148: Consider GET-first approach to avoid creation attempts.

When using CreateOrPatch on a potentially non-existent MachineSet (lines 1117-1148) or MachineDeployment (lines 1151-1183), the mutation function only sets annotations without populating the required Spec fields. If the resource doesn't exist, CreateOrPatch will attempt to create it with an incomplete Spec, causing validation errors and unnecessary reconciliation loops.

Since the MachineSet/MachineDeployment should already exist after CAPI reconciliation (line 402-408), a GET-first approach would be cleaner and avoid spurious errors during timing edge cases.

🔎 Suggested refactor for MachineSet path
 	if nodePool.Spec.Management.UpgradeType == hyperv1.UpgradeTypeInPlace {
 		// InPlace mode uses MachineSet
 		ms := &capiv1.MachineSet{
 			ObjectMeta: metav1.ObjectMeta{
 				Namespace: key.Namespace,
 				Name:      key.Name,
 			},
 		}
+		// Get the MachineSet first to ensure it exists
+		if err := r.Get(ctx, key, ms); err != nil {
+			if apierrors.IsNotFound(err) {
+				log.V(1).Info("MachineSet not found yet, skipping scale-from-zero annotations")
+				return nil
+			}
+			return fmt.Errorf("failed to get MachineSet: %w", err)
+		}
+
+		// Check if it's an AWS machine template
+		if ms.Spec.Template.Spec.InfrastructureRef.Kind != "AWSMachineTemplate" {
+			return nil
+		}
+
+		// Fetch the AWSMachineTemplate
+		awsMachineTemplate := &capiaws.AWSMachineTemplate{}
+		templateKey := client.ObjectKey{
+			Namespace: cmp.Or(ms.Spec.Template.Spec.InfrastructureRef.Namespace, controlPlaneNamespace),
+			Name:      ms.Spec.Template.Spec.InfrastructureRef.Name,
+		}
+		if err := r.Get(ctx, templateKey, awsMachineTemplate); err != nil {
+			return fmt.Errorf("failed to get AWSMachineTemplate: %w", err)
+		}
+
+		// Patch annotations on the MachineSet
 		if _, err := controllerutil.CreateOrPatch(ctx, r.Client, ms, func() error {
-			// Ensure annotations map is initialized
 			if ms.Annotations == nil {
 				ms.Annotations = make(map[string]string)
 			}
-
-			// Check if it's an AWS machine template
-			if ms.Spec.Template.Spec.InfrastructureRef.Kind != "AWSMachineTemplate" {
-				return nil // Not an AWS machine template, skip
-			}
-
-			// Fetch the AWSMachineTemplate
-			awsMachineTemplate := &capiaws.AWSMachineTemplate{}
-			templateKey := client.ObjectKey{
-				Namespace: cmp.Or(ms.Spec.Template.Spec.InfrastructureRef.Namespace, controlPlaneNamespace),
-				Name:      ms.Spec.Template.Spec.InfrastructureRef.Name,
-			}
-			if err := r.Get(ctx, templateKey, awsMachineTemplate); err != nil {
-				return fmt.Errorf("failed to get AWSMachineTemplate: %w", err)
-			}
-
-			// Set annotations on the MachineSet
 			return setScaleFromZeroAnnotationsOnObject(ctx, r.Client, r.EC2Client, nodePool, ms.Annotations, awsMachineTemplate)
 		}); err != nil {
-			return fmt.Errorf("failed to reconcile MachineSet with scale-from-zero annotations: %w", err)
+			return fmt.Errorf("failed to patch MachineSet annotations: %w", err)
 		}

Apply the same pattern to the MachineDeployment path (lines 1149-1183).

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 381f36f and a78e835.

📒 Files selected for processing (1)
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • AWSPlatform (1211-1211)
api/hypershift/v1beta1/nodepool_types.go (3)
  • NodePool (83-97)
  • UpgradeType (280-280)
  • UpgradeTypeInPlace (289-289)
support/awsclient/client.go (1)
  • Client (81-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)

4-4: LGTM: Standard library cmp import for namespace defaulting.

The import correctly uses the Go 1.22+ standard library cmp package for the cmp.Or function used in lines 1137 and 1171.


1137-1138: LGTM: Correct namespace resolution with cmp.Or.

The use of cmp.Or properly defaults to controlPlaneNamespace when InfrastructureRef.Namespace is empty, correctly addressing the namespace handling issue from previous reviews.

Also applies to: 1171-1172

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/nodepool_controller.go:
- Around line 1115-1148: The current code uses controllerutil.CreateOrPatch on a
MachineSet which can create an empty MachineSet; change this to only act on
existing MachineSets by performing a Get for the MachineSet (ms) based on
key.Namespace/key.Name, return nil if NotFound, then ensure ms.Annotations is
initialized, check ms.Spec.Template.Spec.InfrastructureRef.Kind ==
"AWSMachineTemplate" and fetch the referenced AWSMachineTemplate (using r.Get
into awsMachineTemplate), call setScaleFromZeroAnnotationsOnObject to mutate the
annotations, and persist the change with r.Update (or a Patch) instead of
CreateOrPatch; keep the same error wrapping messages on Get/Update and preserve
use of r.EC2Client and nodePool in the call to
setScaleFromZeroAnnotationsOnObject.
- Around line 1149-1182: The current code uses controllerutil.CreateOrPatch on
the MachineDeployment (md) which will create an empty MachineDeployment when
none exists; instead, use r.Get to fetch the existing MachineDeployment (using
the same ObjectMeta key), return early if NotFound, then modify md.Annotations
and call r.Update to persist changes; keep the same checks for
md.Spec.Template.Spec.InfrastructureRef.Kind == "AWSMachineTemplate", fetch the
AWSMachineTemplate into awsMachineTemplate via r.Get, and call
setScaleFromZeroAnnotationsOnObject to populate annotations before calling
r.Update on md (do not call CreateOrPatch or create new MDs).
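The agent instructions above reduce to one rule: annotate only resources that already exist, never create them. A dependency-free sketch of that Get-then-Update flow follows; `store`, `object`, and `annotateExisting` are illustrative stand-ins for the controller-runtime client, the CAPI resource, and `setScaleFromZeroAnnotationsOnObject`, not the real types:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound stands in for the apierrors.IsNotFound check.
var ErrNotFound = errors.New("not found")

// object is a minimal stand-in for a MachineSet/MachineDeployment.
type object struct {
	Name        string
	Annotations map[string]string
}

// store is a toy stand-in for the Kubernetes client.
type store map[string]*object

func (s store) Get(name string) (*object, error) {
	o, ok := s[name]
	if !ok {
		return nil, ErrNotFound
	}
	return o, nil
}

func (s store) Update(o *object) error {
	s[o.Name] = o
	return nil
}

// annotateExisting applies annotations only to an existing object:
// NotFound is treated as "skip"; every other error is surfaced.
func annotateExisting(s store, name string, annotations map[string]string) error {
	o, err := s.Get(name)
	if errors.Is(err, ErrNotFound) {
		return nil // resource not reconciled yet; skip instead of creating an empty one
	}
	if err != nil {
		return fmt.Errorf("failed to get %s: %w", name, err)
	}
	if o.Annotations == nil {
		o.Annotations = map[string]string{}
	}
	for k, v := range annotations {
		o.Annotations[k] = v
	}
	return s.Update(o)
}

func main() {
	s := store{"ms-1": {Name: "ms-1"}}
	_ = annotateExisting(s, "ms-1", map[string]string{"capacity.cluster-autoscaler.kubernetes.io/memory": "16G"})
	_ = annotateExisting(s, "ms-missing", nil) // no-op: nothing is created
	fmt.Println(len(s), s["ms-1"].Annotations["capacity.cluster-autoscaler.kubernetes.io/memory"])
}
```

Unlike `CreateOrPatch`, this shape can never produce a spec-less MachineSet during the timing window before CAPI reconciliation.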
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

1130-1132: Consider logging when skipping non-AWS templates.

The code silently skips non-AWS templates by returning nil. Adding a log message would improve debuggability when troubleshooting why scale-from-zero annotations weren't applied.

🔎 Proposed addition
 			// Check if it's an AWS machine template
 			if ms.Spec.Template.Spec.InfrastructureRef.Kind != "AWSMachineTemplate" {
+				log.V(1).Info("Skipping scale-from-zero annotations for non-AWS template", "kind", ms.Spec.Template.Spec.InfrastructureRef.Kind)
 				return nil // Not an AWS machine template, skip
 			}

Apply the same change to the MachineDeployment branch at Line 1164.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a78e835 and 2e100a3.

📒 Files selected for processing (2)
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/main.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/main.go
🔇 Additional comments (3)
hypershift-operator/main.go (3)

127-127: LGTM!

Field definition is clear and consistent with other credential path fields in the struct.


164-164: LGTM!

Flag definition follows the existing pattern for credential path flags.


402-414: LGTM!

The priority-based initialization is well-structured:

  1. Explicit scale-from-zero credentials take precedence when configured
  2. Falls back to platform credentials for AWS private platform

The clear comments and distinct session names ("hypershift-operator-scale-from-zero" vs "hypershift-operator") aid debugging and traceability.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @hypershift-operator/main.go:
- Around line 402-412: The current if-else-if makes PrivatePlatform AWS take
precedence and silently ignores opts.ScaleFromZeroAWSCreds; change the branching
so scale-from-zero creds are honored when present (or explicitly document
precedence). Replace the if-else-if with explicit checks: first check
opts.ScaleFromZeroAWSCreds != "" and initialize ec2Client using
awsutil.NewSession("hypershift-operator-scale-from-zero",
opts.ScaleFromZeroAWSCreds, ...), then separately check
hyperv1.PlatformType(opts.PrivatePlatform) == hyperv1.AWSPlatform and initialize
the PrivatePlatform client only if ec2Client is still nil (or vice versa if you
prefer documented precedence). Ensure you reference the variables
opts.ScaleFromZeroAWSCreds, opts.PrivatePlatform, hyperv1.PlatformType, and
ec2Client and either remove the else or add a clear comment explaining the
chosen precedence.
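The precedence the agent instruction asks for can be isolated in a small pure function, which also makes it unit-testable. This is a sketch only; the parameter names are simplified stand-ins for `opts.ScaleFromZeroAWSCreds`, `opts.PrivatePlatform`, and the private-platform credentials path, not the actual option fields:

```go
package main

import "fmt"

const awsPlatform = "AWS"

// credsSource reports which credentials file the scale-from-zero EC2
// client should be built from, making precedence explicit: dedicated
// scale-from-zero credentials win; AWS private-platform credentials
// are only a fallback.
func credsSource(scaleFromZeroCreds, privatePlatform, privatePlatformCreds string) (string, bool) {
	if scaleFromZeroCreds != "" {
		return scaleFromZeroCreds, true
	}
	if privatePlatform == awsPlatform && privatePlatformCreds != "" {
		return privatePlatformCreds, true
	}
	return "", false // no EC2 client; scale-from-zero annotations are skipped
}

func main() {
	src, ok := credsSource("/etc/sfz/creds", "AWS", "/etc/aws/creds")
	fmt.Println(src, ok) // dedicated creds take precedence
	src, ok = credsSource("", "AWS", "/etc/aws/creds")
	fmt.Println(src, ok) // falls back to the private-platform creds
}
```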
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2e100a3 and 46086d7.

📒 Files selected for processing (2)
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/main.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • hypershift-operator/main.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (2)
hypershift-operator/main.go (1)
cmd/infra/aws/util/util.go (2)
  • NewSession (89-106)
  • NewConfig (120-129)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • AWSPlatform (1211-1211)
api/hypershift/v1beta1/nodepool_types.go (3)
  • NodePool (83-97)
  • UpgradeType (280-280)
  • UpgradeTypeInPlace (289-289)
support/awsclient/client.go (1)
  • Client (81-83)
🔇 Additional comments (2)
hypershift-operator/main.go (1)

202-210: LGTM: Early validation prevents runtime failures.

The early validation of scale-from-zero credentials ensures the operator fails fast on startup if the credentials file is missing or inaccessible, rather than encountering errors during reconciliation.

hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

4-4: LGTM: Appropriate use of cmp.Or for namespace defaulting.

The cmp import is correctly used in the new reconcileScaleFromZeroAnnotations function to provide default namespace values.

@openshift-ci openshift-ci Bot added the area/platform/aws PR/issue for AWS (AWSPlatform) platform label Jan 6, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/instancetype/aws/provider.go:
- Around line 19-26: The doc for NewProvider says it panics if ec2Client is nil
but the function does not check; update NewProvider (and its comment if you
prefer the other option) to match behavior: either add a nil-check in
NewProvider that panics with a clear message (e.g., if ec2Client == nil {
panic("NewProvider: ec2Client is nil") }) so callers get an immediate, explicit
failure instead of later nil dereferences in methods like GetInstanceTypeInfo,
or change the function comment to remove the “Panics if ec2Client is nil”
statement if you choose not to enforce it here; update the Provider struct
reference (ec2Client field) and ensure callers are aware of the chosen behavior.
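The "make the doc comment and behavior agree" option looks roughly like the sketch below. The `ec2API` interface and method are hypothetical stand-ins for the real `*ec2.EC2` client, shown only to demonstrate the nil-check-in-constructor pattern:

```go
package main

import "fmt"

// ec2API is a hypothetical stand-in for the EC2 client dependency.
type ec2API interface {
	DescribeInstanceTypes(name string) (string, error)
}

// Provider resolves instance type information via an EC2 client.
type Provider struct{ client ec2API }

// NewProvider panics immediately on a nil client so the misconfiguration
// surfaces at wiring time in main(), not as a nil dereference deep inside
// GetInstanceTypeInfo during reconciliation.
func NewProvider(client ec2API) *Provider {
	if client == nil {
		panic("NewProvider: ec2 client is nil")
	}
	return &Provider{client: client}
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	NewProvider(nil)
}
```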
🧹 Nitpick comments (3)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (2)

96-111: Consider logging unknown CPU architectures instead of silently defaulting.

Lines 105-106 silently default to AMD64 for unknown architectures returned by AWS. This could mask AWS API changes or unexpected instance types. For scale-from-zero functionality, an incorrect architecture could lead to scheduling issues. Consider logging a warning when an unknown architecture is encountered so operators can investigate.

🔎 Proposed enhancement
 		switch architecture {
 		case ec2.ArchitectureTypeX8664:
 			info.CPUArchitecture = instancetype.ArchAMD64
 		case ec2.ArchitectureTypeArm64:
 			info.CPUArchitecture = instancetype.ArchARM64
 		default:
-			// Default to amd64 for unknown architectures
+			// Default to amd64 for unknown architectures, but log a warning
+			log.Printf("WARNING: Unknown architecture %q for instance type %q, defaulting to amd64", architecture, instanceTypeName)
 			info.CPUArchitecture = instancetype.ArchAMD64
 		}

Note: You'll need to pass a logger through the function signature or use a package-level logger.


28-53: Consider caching instance type information to reduce AWS API calls.

The provider queries the EC2 API on every call to GetInstanceTypeInfo. Since instance type specifications are static and rarely change, repeated calls for the same instance type are wasteful and could hit AWS rate limits under load. Consider adding an in-memory cache with a reasonable TTL (e.g., 1 hour) to improve performance and reduce API usage.

hypershift-operator/main.go (1)

414-420: Consider documenting the dual AWS client initialization.

When PrivatePlatform is AWS and ScaleFromZeroAWSCreds is provided, the code initializes two separate AWS sessions and EC2 clients. This is intentional and correct—ec2Client (line 407) is used for private platform functionality, while scaleFromZeroEC2Client (line 418) is specifically for the instance type provider. However, this dual initialization might not be immediately obvious to future maintainers. Consider adding a brief comment explaining why separate clients are needed.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 46086d7 and 8499cd0.

📒 Files selected for processing (5)
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
  • hypershift-operator/controllers/nodepool/instancetype/instancetype.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/main.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
  • hypershift-operator/controllers/nodepool/instancetype/instancetype.go
  • hypershift-operator/main.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (3)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (1)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (3)
  • Provider (20-25)
  • InstanceTypeInfo (30-45)
  • ArchAMD64 (12-12)
hypershift-operator/main.go (3)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (2)
  • Provider (15-17)
  • NewProvider (22-26)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • Provider (20-25)
cmd/infra/aws/util/util.go (2)
  • NewSession (89-106)
  • NewConfig (120-129)
hypershift-operator/controllers/nodepool/nodepool_controller.go (4)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (1)
  • Provider (15-17)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • Provider (20-25)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • AWSPlatform (1211-1211)
support/awsclient/client.go (1)
  • Client (81-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (6)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)

1-45: LGTM! Clean interface design for instance type providers.

The interface and types are well-designed and clearly documented. The Provider interface provides a clean abstraction for fetching instance type information across different cloud providers, and the InstanceTypeInfo struct includes all necessary fields for scale-from-zero functionality.

hypershift-operator/main.go (2)

203-212: LGTM! Good early validation of configuration.

The early validation of the scale-from-zero credentials file is good defensive programming. It catches configuration errors at startup rather than during reconciliation, providing clear error messages to operators.


422-433: LGTM! Instance type provider is correctly wired.

The InstanceTypeProvider is properly integrated into the NodePoolReconciler. The nil check in the reconciler (line 414 of nodepool_controller.go) ensures the provider is only used when available, making it an optional enhancement rather than a required dependency.

hypershift-operator/controllers/nodepool/nodepool_controller.go (3)

412-418: Scale-from-zero errors are logged but not surfaced to users.

When reconcileScaleFromZeroAnnotations fails, the error is logged but reconciliation continues without updating the NodePool status or failing the reconcile. This means EC2 API failures or configuration issues won't be visible to users, potentially causing silent autoscaling problems.

Consider either:

  1. Adding a dedicated status condition (e.g., ScaleFromZeroReady) to surface failures, or
  2. Returning the error to halt reconciliation when annotation setup fails

Based on past review comments.

Likely an incorrect or invalid review comment.


1151-1184: Use Get + Update instead of CreateOrPatch to avoid creating empty resources.

Same issue as with the MachineSet path: CreateOrPatch will create the MachineDeployment if it doesn't exist. For scale-from-zero annotations, you should only annotate existing resources.

Based on past review comments, this was identified as a critical issue that should be addressed.

Likely an incorrect or invalid review comment.


1117-1150: Use Get + Update instead of CreateOrPatch to avoid creating empty resources.

CreateOrPatch will create the MachineSet if it doesn't exist. For scale-from-zero annotations, you should only annotate existing resources, not create new ones. Creating a MachineSet with only annotations and no required spec fields will result in an incomplete resource.

Based on past review comments, this was identified as a critical issue that should be addressed.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

411-419: Consider adding a status condition for scale-from-zero failures.

Errors from reconcileScaleFromZeroAnnotations are logged but don't block reconciliation or update NodePool status. While this is acceptable for a non-critical feature, users won't see annotation failures reflected in the NodePool status.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8499cd0 and e943b79.

📒 Files selected for processing (12)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
  • hypershift-operator/controllers/nodepool/instancetype/instancetype.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/main.go
💤 Files with no reviewable changes (5)
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • hypershift-operator/controllers/nodepool/instancetype/instancetype.go
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
  • hypershift-operator/main.go
  • cmd/install/install.go
  • cmd/install/assets/hypershift_operator.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (2)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (1)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (4)
  • Provider (20-25)
  • InstanceTypeInfo (30-45)
  • ArchAMD64 (12-12)
  • ArchARM64 (14-14)
hypershift-operator/main.go (3)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (2)
  • Provider (15-17)
  • NewProvider (22-26)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • Provider (20-25)
cmd/infra/aws/util/util.go (2)
  • NewSession (89-106)
  • NewConfig (120-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
🔇 Additional comments (20)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (3)

28-53: LGTM!

The GetInstanceTypeInfo implementation correctly queries the EC2 API, validates the response, and transforms the data. Error handling is appropriate.


55-114: LGTM!

The transformInstanceTypeInfo function has thorough validation for required fields (vCPU, memory) and handles optional fields (GPU, architecture) with sensible defaults. The architecture mapping and fallback to amd64 are reasonable.


116-126: LGTM!

The GPU counting logic correctly handles the nil check on individual GPU counts and sums across all GPU devices.

cmd/install/assets/hypershift_operator.go (4)

100-104: LGTM!

The constant renaming from AWS-specific to generic naming aligns with the provider-agnostic scale-from-zero approach.


157-179: LGTM!

The ScaleFromZeroCredentialsSecret struct and Build method are properly refactored to support multiple providers. The Provider field is stored for future use.


450-453: LGTM!

The deployment struct fields are properly renamed to be provider-agnostic.


573-591: LGTM!

The conditional wiring correctly checks all required fields before adding the scale-from-zero volume mount and command-line arguments. The provider flag is now passed to the operator.

hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)

1-45: LGTM!

Clean interface design that provides a cloud-agnostic abstraction for instance type information. The Provider interface and InstanceTypeInfo struct are well-documented and support the scale-from-zero functionality across different cloud providers.

hypershift-operator/controllers/nodepool/nodepool_controller.go (2)

4-4: LGTM!

The new imports and InstanceTypeProvider field on NodePoolReconciler properly integrate the provider-based scale-from-zero functionality.

Also applies to: 16-16, 102-103


423-426: LGTM!

The supportedScaleFromZeroPlatform helper correctly restricts the feature to AWS platform only.

hypershift-operator/main.go (4)

130-131: LGTM!

The new ScaleFromZeroProvider and ScaleFromZeroCreds fields properly support the provider-based scale-from-zero configuration.


206-221: LGTM!

The early validation ensures configuration consistency: provider is required when credentials are provided, and only supported providers are accepted. File existence check prevents runtime failures.


440-449: LGTM!

The NodePoolReconciler is correctly wired with the InstanceTypeProvider, enabling the scale-from-zero annotation flow.


421-438: Add AWS region configuration or validation for scale-from-zero provider.

The AWS session is created with an empty region, which relies on the AWS_REGION environment variable being set at runtime. Unlike other AWS session creations in the codebase, this code path lacks:

  1. An explicit comment documenting the AWS_REGION dependency
  2. Validation that AWS_REGION is actually set before attempting EC2 API calls

Consider either passing an explicit region parameter to awsutil.NewSession() or adding a validation check (similar to the check in control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:271-276) to ensure the environment variable is set, along with a comment explaining the dependency.

cmd/install/install.go (6)

136-140: LGTM!

The new Options fields properly support the provider-agnostic scale-from-zero configuration, replacing the AWS-specific fields.


194-217: LGTM!

The validation logic is comprehensive: enforces mutual exclusivity between file-based and secret-based credentials, requires provider when credentials are provided, validates the provider value, and checks file accessibility.


321-325: LGTM!

The new CLI flags are properly documented and aligned with the Options struct fields.


383-383: LGTM!

The default value for ScaleFromZeroCredentialsSecretKey is set to "credentials", consistent with other credential secret key defaults.


912-914: LGTM!

The deployment wiring correctly passes all three scale-from-zero fields to the operator deployment.


1224-1245: LGTM!

The secret setup correctly handles both file-based credentials (building a new secret with provider info) and pre-existing secrets. The Provider field is properly included in the secret construction.

@LiangquanLi930 changed the title from "fix(OCPBUGS-70320): enable scale-up from zero with InPlace upgrade" to "OCPBUGS-70320: enable scale-up from zero with InPlace upgrade" on Jan 7, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 7, 2026
@openshift-ci-robot

@LiangquanLi930: This pull request references Jira Issue OCPBUGS-70320, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

What this PR does / why we need it:

Enable scale-up from zero with InPlace upgrade
Remove scale-from-zero credentials workaround that prevented autoscaling.

Which issue(s) this PR fixes:

Fixes: OCPBUGS-70320

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Move architecture constants to hyperv1, improve error handling, and
refactor for platform abstraction.
@LiangquanLi930
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 12, 2026
@openshift-ci-robot

@LiangquanLi930: This pull request references Jira Issue OCPBUGS-70320, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (lgao@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @hypershift-operator/main.go:
- Around lines 426-438: the EC2 client for scale-from-zero is created via awsutil.NewSession with an empty region, causing unpredictable region resolution. Add a new option (e.g., opts.ScaleFromZeroRegion) or require AWS_REGION, and pass that explicit region into awsutil.NewSession when creating the session for scale-from-zero (the block that calls awsutil.NewSession, awsutil.NewConfig, ec2.New, and awsinstancetype.NewProvider). Validate that the flag is set (or fail with a clear error), similar to how OIDCStorageProviderS3Region is handled, so DescribeInstanceTypes always uses an explicit region.
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)

91-109: Minor: Duplicate step numbering in comments.

Steps are numbered "2" twice (lines 91 and 105). This doesn't affect functionality but could cause confusion when referencing the code.

📝 Suggested fix for comment numbering
 	// 2. Check if Status.Capacity is already provided (prefer native support)
 	if len(statusCapacity) > 0 {
 		// ...
 	}
 
-	// 2. Skip if provider is not configured
+	// 3. Skip if provider is not configured
 	if provider == nil {
 		// ...
 	}
 
-	// 3. Validate instance type is not empty
+	// 4. Validate instance type is not empty

And renumber subsequent steps accordingly (4→5, 5→6, etc.).

cmd/install/assets/hypershift_operator.go (1)

157-179: Unused Provider field in ScaleFromZeroCredentialsSecret.

The Provider field (line 161) is defined in the struct but not used in the Build() method. If this is intentional for future use (e.g., adding provider metadata to the secret), consider documenting that. Otherwise, it could be removed to avoid confusion.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 992f478 and b8c3530.

📒 Files selected for processing (16)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • cmd/install/install_test.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go
  • hypershift-operator/controllers/nodepool/instancetype/instancetype.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/controllers/nodepool/scale_from_zero_test.go
  • hypershift-operator/main.go
  • test/e2e/autoscaling_test.go
💤 Files with no reviewable changes (5)
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
  • hypershift-operator/controllers/machinedeployment/controller.go
  • hypershift-operator/controllers/machinedeployment/controller_test.go
  • hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
  • hypershift-operator/controllers/machinedeployment/utils/providerspec.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • hypershift-operator/controllers/nodepool/instancetype/instancetype.go
  • test/e2e/autoscaling_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • cmd/install/install_test.go
  • hypershift-operator/controllers/nodepool/scale_from_zero_test.go
  • hypershift-operator/controllers/nodepool/scale_from_zero.go
  • hypershift-operator/main.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go
  • cmd/install/install.go
  • cmd/install/assets/hypershift_operator.go
  • hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
🧬 Code graph analysis (4)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • Provider (10-15)
hypershift-operator/main.go (2)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • Provider (10-15)
cmd/infra/aws/util/util.go (2)
  • NewSession (89-106)
  • NewConfig (120-129)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • Provider (10-15)
api/hypershift/v1beta1/hostedcluster_types.go (2)
  • PlatformType (1213-1213)
  • AWSPlatform (1217-1217)
api/hypershift/v1beta1/nodepool_types.go (3)
  • NodePool (83-97)
  • UpgradeType (280-280)
  • UpgradeTypeInPlace (289-289)
hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go (2)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
  • InstanceTypeInfo (20-35)
support/supportedversion/version.go (1)
  • String (254-256)
🔇 Additional comments (24)
cmd/install/install_test.go (1)

143-193: LGTM! Comprehensive test coverage for scale-from-zero validation.

Test cases appropriately cover error paths (missing provider, invalid provider, conflicting creds) and success paths (AWS with creds file, AWS with secret reference). Using /dev/null as a placeholder for existing file paths is a practical approach.

hypershift-operator/controllers/nodepool/scale_from_zero_test.go (1)

13-62: LGTM! Well-structured tests for taint annotation formatting.

Tests cover essential edge cases including empty taints, empty values, and deterministic sorting. The expected formats align with the CAPI scale-from-zero specification.

hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go (1)

1-401: LGTM! Excellent test coverage for the AWS instance type provider.

The tests comprehensively cover:

  • GPU counting with various nil/empty scenarios
  • Input validation for all required fields (name, vCPU, memory, architecture)
  • Architecture mapping (x86_64 → AMD64, arm64 → ARM64)
  • Provider integration including found, not-found, and API error cases

The configurable mock pattern is clean and enables targeted testing of edge cases.
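The architecture mapping the tests exercise (x86_64 → AMD64, arm64 → ARM64) can be sketched as a simple switch; the function name and the lowercase return values here are illustrative stand-ins, not the actual hyperv1 constants:

```go
package main

import (
	"fmt"
)

// mapArchitecture converts an EC2 architecture string to the arch value
// used for node labels. Names and return values are illustrative.
func mapArchitecture(ec2Arch string) (string, error) {
	switch ec2Arch {
	case "x86_64":
		return "amd64", nil
	case "arm64":
		return "arm64", nil
	default:
		// Unsupported architectures are rejected rather than guessed.
		return "", fmt.Errorf("unsupported architecture %q", ec2Arch)
	}
}

func main() {
	arch, _ := mapArchitecture("x86_64")
	fmt.Println(arch) // amd64
}
```

Rejecting unknown architectures explicitly matches the review's note that all required fields are validated before an InstanceTypeInfo is returned.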

hypershift-operator/main.go (1)

130-131: LGTM! Clean addition of scale-from-zero configuration options.

The new StartOptions fields and corresponding command-line flags follow the existing patterns in the codebase. The flag names are clear and consistent with other credential/provider patterns in the operator.

Also applies to: 168-169

hypershift-operator/controllers/nodepool/scale_from_zero.go (2)

44-63: LGTM! Clean implementation of taint formatting.

The function correctly implements the CAPI scale-from-zero annotation format with deterministic sorting for stable output.


134-147: Good design: Architecture enforcement from instance type.

The explicit override of kubernetes.io/arch with the actual instance architecture (line 140) is the correct approach. This prevents misconfigurations where user-specified labels could lie about the node's architecture, which would cause pod scheduling failures.
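The taint formatting described above can be sketched as follows — a minimal, self-contained version of the idea, with an illustrative taint struct rather than the actual Kubernetes types. The cluster-autoscaler's scale-from-zero taints annotation expects comma-separated `key=value:Effect` entries, and sorting them gives the deterministic output the tests check:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// taint mirrors the fields the annotation needs; this type is illustrative.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// formatTaints renders taints as "key=value:Effect" entries, sorted so the
// resulting annotation value is stable across reconciles.
func formatTaints(taints []taint) string {
	entries := make([]string, 0, len(taints))
	for _, t := range taints {
		entries = append(entries, fmt.Sprintf("%s=%s:%s", t.Key, t.Value, t.Effect))
	}
	sort.Strings(entries)
	return strings.Join(entries, ",")
}

func main() {
	fmt.Println(formatTaints([]taint{
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
		{Key: "a", Value: "", Effect: "NoExecute"},
	}))
	// → a=:NoExecute,dedicated=gpu:NoSchedule
}
```

Note that a taint with an empty value still renders as `key=:Effect`, one of the edge cases the test file covers.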

hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (4)

1-28: LGTM - Clean provider initialization.

The Provider struct and NewProvider constructor follow good patterns - the interface dependency (ec2iface.EC2API) enables easy testing via mocks.


30-55: LGTM - Proper API call and validation.

The GetInstanceTypeInfo method correctly:

  • Uses context for cancellation
  • Validates empty response
  • Wraps errors with context

57-114: LGTM - Thorough validation in transformation logic.

The transformInstanceTypeInfo function has comprehensive nil checks and validates all required fields (vCPU, memory, architecture) before returning. The architecture switch handles the supported AWS architectures appropriately.


116-131: LGTM - Safe GPU counting with nil guards.

The getGpuCount helper properly handles nil gpuInfo, nil Gpus slice, and nil individual GPU entries before accessing the Count field.
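The nil-guarding pattern praised here can be sketched with stand-in types whose field names mirror the AWS SDK's GpuInfo/GpuDeviceInfo shapes (the types themselves are illustrative, not the SDK's):

```go
package main

import "fmt"

// gpuDevice and gpuInfo are illustrative stand-ins for the SDK shapes.
type gpuDevice struct {
	Count *int64
}

type gpuInfo struct {
	Gpus []*gpuDevice
}

// gpuCount sums device counts, guarding against a nil info struct, a nil
// Gpus slice, nil slice entries, and nil Count pointers.
func gpuCount(info *gpuInfo) int64 {
	if info == nil || info.Gpus == nil {
		return 0
	}
	var total int64
	for _, g := range info.Gpus {
		if g == nil || g.Count == nil {
			continue
		}
		total += *g.Count
	}
	return total
}

func main() {
	one := int64(1)
	fmt.Println(gpuCount(&gpuInfo{Gpus: []*gpuDevice{{Count: &one}, nil, {Count: nil}}}))
	// → 1
}
```

Because the SDK returns pointers throughout, every dereference needs a guard; skipping malformed entries rather than failing keeps a partial response usable.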

cmd/install/assets/hypershift_operator.go (3)

100-104: LGTM - Consistent naming convention for new secret.

The new scaleFromZeroCredsSecretName constant follows the established naming pattern alongside other credential secrets.


450-453: LGTM - New deployment fields for scale-from-zero configuration.

The three new fields (ScaleFromZeroSecret, ScaleFromZeroSecretKey, ScaleFromZeroProvider) follow the existing pattern used for OIDC and AWS private credentials.


573-591: LGTM - Consistent secret mounting pattern.

The scale-from-zero credentials mounting mirrors the OIDC storage provider pattern (lines 552-571), ensuring consistency. The ReadOnly: true flag is a good security practice for credential mounts.

hypershift-operator/controllers/nodepool/nodepool_controller.go (4)

15-15: LGTM - Clean integration of instance type provider.

The new InstanceTypeProvider field on NodePoolReconciler follows the existing pattern of injecting dependencies (similar to ReleaseProvider, ImageMetadataProvider).

Also applies to: 101-102


410-418: LGTM - Appropriate error handling for non-critical annotation updates.

The decision to log the error and requeue after 30 seconds instead of failing the reconciliation is sensible. Scale-from-zero annotations are not critical for cluster operation, so a transient failure here shouldn't block other reconciliation.


423-426: LGTM - Extensible platform check.

The supportedScaleFromZeroPlatform function provides a clear extension point for adding future platform support (Azure, GCP, etc.) as noted in the commented code in reconcileScaleFromZeroAnnotations.


1115-1187: LGTM - Well-structured reconciliation with proper error handling.

The reconcileScaleFromZeroAnnotations function:

  • Handles not-found gracefully for both machine templates and CAPI objects
  • Correctly distinguishes between InPlace (MachineSet) and Replace (MachineDeployment) modes
  • Uses client.MergeFrom patch to minimize unnecessary API updates
  • Has clear comments for future platform additions
cmd/install/install.go (7)

135-139: LGTM - Clean field additions for scale-from-zero configuration.

The new fields follow the established naming convention used for AWS private credentials and OIDC storage provider credentials.


144-147: LGTM - Good UX improvement with normalization.

Trimming whitespace and lowercasing the provider value prevents common user input errors and ensures consistent validation.


198-222: LGTM - Comprehensive validation logic.

The validation correctly enforces:

  1. Mutual exclusivity between --scale-from-zero-creds and --scale-from-zero-secret
  2. Provider is required when credentials are specified
  3. Provider must be in the supported list
  4. Credentials file must exist and be accessible
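The four checks above can be sketched as a single validation function; the function and parameter names are illustrative, not the actual install options:

```go
package main

import (
	"fmt"
	"os"
)

// validateScaleFromZero mirrors the checks listed in the review: the
// file-based and secret-based credential flags are mutually exclusive, a
// provider is required once credentials are given, the provider must be in
// the supported set, and a credentials file must exist on disk.
func validateScaleFromZero(credsFile, secretName, provider string) error {
	if credsFile != "" && secretName != "" {
		return fmt.Errorf("only one of --scale-from-zero-creds and --scale-from-zero-secret may be set")
	}
	if credsFile == "" && secretName == "" {
		return nil // scale-from-zero not configured; nothing to validate
	}
	if provider == "" {
		return fmt.Errorf("--scale-from-zero-provider is required when credentials are specified")
	}
	supported := map[string]bool{"aws": true}
	if !supported[provider] {
		return fmt.Errorf("unsupported scale-from-zero provider %q", provider)
	}
	if credsFile != "" {
		if _, err := os.Stat(credsFile); err != nil {
			return fmt.Errorf("credentials file: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateScaleFromZero("/dev/null", "", "aws"))
}
```

As in the install tests, `/dev/null` serves as a path that always exists, which keeps the happy-path check hermetic.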

326-329: LGTM - Well-documented CLI flags.

The flag descriptions clearly indicate purpose and alternatives (file vs. secret).


388-388: LGTM - Sensible default for credentials secret key.

Defaulting to "credentials" is consistent with other credential secrets in the codebase (e.g., AWSPrivateCredentialsSecretKey, OIDCStorageProviderS3CredentialsSecretKey).


917-919: LGTM - Proper wiring to operator deployment.

The scale-from-zero configuration is correctly passed to the HyperShiftOperatorDeployment builder.


1229-1250: LGTM - Consistent secret setup pattern.

The scale-from-zero credentials setup follows the same pattern as other credentials (OIDC, AWS private), supporting both file-based and pre-existing secret references.

Comment thread hypershift-operator/main.go
@LiangquanLi930
Contributor Author

/test e2e-aws

@bryan-cox
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Jan 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox, LiangquanLi930, muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@LiangquanLi930
Contributor Author

/verified by e2e test

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jan 12, 2026
@openshift-ci-robot

@LiangquanLi930: This PR has been marked as verified by e2e test.

Details

In response to this:

/verified by e2e test


@openshift-ci
Contributor

openshift-ci Bot commented Jan 12, 2026

@LiangquanLi930: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit aa6f394 into openshift:main Jan 12, 2026
20 checks passed
@openshift-ci-robot

@LiangquanLi930: Jira Issue Verification Checks: Jira Issue OCPBUGS-70320
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-70320 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Details

In response to this:

What this PR does / why we need it:

Enable scale-up from zero with InPlace upgrade. Remove machinedeployment controller and refactor scale-from-zero using instancetype provider abstraction.

Which issue(s) this PR fixes:

Fixes: OCPBUGS-70320

Special notes for your reviewer:

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.


@LiangquanLi930
Contributor Author

/jira backport release-4.21

LiangquanLi930 added a commit to LiangquanLi930/hypershift that referenced this pull request Jan 13, 2026
OCPBUGS-70320: enable scale-up from zero with InPlace upgrade
@openshift-ci-robot

@LiangquanLi930: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.21

Details

In response to this:

/jira backport release-4.21


@openshift-cherrypick-robot

@openshift-ci-robot: #7427 failed to apply on top of branch "release-4.21":

Applying: fix(nodepool): enable scale-up from zero with InPlace upgrade
Using index info to reconstruct a base tree...
M	cmd/install/assets/hypershift_operator.go
M	cmd/install/install.go
M	cmd/install/install_test.go
A	hypershift-operator/controllers/machinedeployment/controller.go
A	hypershift-operator/controllers/machinedeployment/controller_test.go
A	hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
A	hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
A	hypershift-operator/controllers/machinedeployment/utils/providerspec.go
M	hypershift-operator/main.go
M	test/e2e/autoscaling_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/autoscaling_test.go
CONFLICT (content): Merge conflict in test/e2e/autoscaling_test.go
Auto-merging hypershift-operator/main.go
CONFLICT (content): Merge conflict in hypershift-operator/main.go
Auto-merging cmd/install/install_test.go
Auto-merging cmd/install/install.go
CONFLICT (content): Merge conflict in cmd/install/install.go
Auto-merging cmd/install/assets/hypershift_operator.go
CONFLICT (content): Merge conflict in cmd/install/assets/hypershift_operator.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 fix(nodepool): enable scale-up from zero with InPlace upgrade

Details

In response to this:

@LiangquanLi930: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.21

In response to this:

/jira backport release-4.21


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

LiangquanLi930 added a commit to LiangquanLi930/hypershift that referenced this pull request Jan 13, 2026
OCPBUGS-70320: enable scale-up from zero with InPlace upgrade
@openshift-merge-robot
Contributor

Fix included in accepted release 4.22.0-0.nightly-2026-01-19-085729

wangke19 added a commit to wangke19/release that referenced this pull request Jan 27, 2026
The scale-from-zero flags were refactored in hypershift PR openshift#7427 from
AWS-specific to platform-agnostic flags.

Changes:
- Added: --scale-from-zero-provider=aws
- Added: --scale-from-zero-creds=/etc/hypershift-pool-aws-credentials/credentials

The old --scale-from-zero-aws-creds flag no longer exists and has been
replaced with the combination of --scale-from-zero-provider and
--scale-from-zero-creds flags.

Related: openshift/hypershift#7427

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cli Indicates the PR includes changes for CLI area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
