OCPBUGS-70320: enable scale-up from zero with InPlace upgrade#7427
Conversation
Walkthrough: Removed the MachineDeployment scale-from-zero controller and EC2 cache/utilities; introduced a cloud-agnostic instancetype.Provider with an AWS provider, moved scale-from-zero annotation logic into the NodePool reconciler, and updated startup flags, installer wiring, and tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.5.0): Command failed
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @hypershift-operator/controllers/nodepool/scale_from_zero.go:
- Around line 104-128: getInstanceTypeInfo currently blocks on the EC2 call
without cancellation; change its signature to accept a context.Context (e.g.,
getInstanceTypeInfo(ctx context.Context, ec2Client ec2iface.EC2API, instanceType
string)) and use the context-aware EC2 call
(ec2Client.DescribeInstanceTypesWithContext or the SDK's Context variant) when
invoking DescribeInstanceTypes, returning errors as before; update the caller
setScaleFromZeroAnnotationsOnObject to pass through the caller's context into
getInstanceTypeInfo so the EC2 request honors timeouts and cancellations.
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
1169-1177: Consider returning early if no changes are needed.

The current implementation always calls `r.Update(ctx, obj)` even if `setScaleFromZeroAnnotationsOnObject` didn't modify the annotations (e.g., when CAPA already provides Status.Capacity). This could result in unnecessary API calls.

🔎 Suggested improvement

Consider capturing whether annotations were modified and skipping the update if no changes occurred:

```diff
+	// Capture original annotation count to detect changes
+	originalLen := len(annotations)
+
 	// Set annotations on the object (MachineSet or MachineDeployment)
 	if err := setScaleFromZeroAnnotationsOnObject(ctx, r.Client, r.EC2Client, nodePool, annotations, awsMachineTemplate); err != nil {
 		return fmt.Errorf("failed to set scale-from-zero annotations: %w", err)
 	}
+
+	// Skip update if no annotations were added/modified
+	if len(annotations) == originalLen {
+		return nil
+	}
+
 	// Update the object
```

Note: This is a simplified check. A more robust solution would compare annotation values, but given the context this optimization may not be necessary.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (10)
- cmd/install/assets/hypershift_operator.go
- cmd/install/install.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/main.go
💤 Files with no reviewable changes (6)
- hypershift-operator/main.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- cmd/install/assets/hypershift_operator.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- cmd/install/install.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
🧬 Code graph analysis (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
- api/hypershift/v1beta1/hostedcluster_types.go (1): AWSPlatform (1211-1211)
- api/hypershift/v1beta1/nodepool_types.go (3): NodePool (83-97), UpgradeType (280-280), UpgradeTypeInPlace (289-289)
- support/awsclient/client.go (1): Client (81-83)

hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
- support/awsclient/client.go (1): Client (81-83)
🔇 Additional comments (10)
cmd/install/assets/hypershift_operator.go (1)
100-104: Clean removal of scale-from-zero credentials infrastructure.

The consolidation of secret-name constants from four to three and removal of the scale-from-zero AWS credentials constant aligns with the PR objective of moving scale-from-zero support into the NodePool controller.
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
409-417: Non-blocking error handling for scale-from-zero annotations.

The error from `reconcileScaleFromZeroAnnotations` is logged but does not abort reconciliation. This is appropriate since scale-from-zero annotations are a non-critical optimization for the cluster autoscaler, and failures shouldn't block the core NodePool reconciliation.
1108-1152: Well-structured handling of both upgrade types.

The method correctly handles both InPlace (MachineSet) and Replace (MachineDeployment) upgrade types, with proper nil checks for annotations map initialization and appropriate early returns for not-found cases.
cmd/install/install.go (3)
590-615: Clean removal of scale-from-zero credentials from install flow.

The updated function signatures and call sites correctly remove the scale-from-zero secret handling. The `setupAuth` and `setupOperatorResources` functions are now streamlined without the previously required scale-from-zero credentials.
836-876: Operator deployment configuration updated correctly.

The `setupOperatorResources` function signature and the `HyperShiftOperatorDeployment` struct initialization have been properly updated to remove the `ScaleFromZeroAWSSecret` and `ScaleFromZeroAWSSecretKey` fields.
1107-1184: Auth setup correctly returns fewer secrets.

The `setupAuth` function no longer handles scale-from-zero credentials, simplifying the authentication setup flow while maintaining the existing OIDC and platform-specific credential handling.

hypershift-operator/controllers/nodepool/scale_from_zero.go (4)
77-91: Architecture detection takes only the first supported architecture.

The code takes only the first architecture from `SupportedArchitectures` (line 78). For instance types that support multiple architectures (e.g., Graviton instances that might show both arm64 and x86_64 in some configurations), this may not accurately reflect the intended architecture for the NodePool. Consider validating this against the NodePool's `Spec.Arch` if available.
150-161: Good check for native CAPA Status.Capacity support.

The early return when `awsMachineTemplate.Status.Capacity` is populated ensures that when CAPA natively supports scale-from-zero, the workaround annotations are cleaned up. This provides a clean migration path.
180-192: Labels annotation includes architecture and NodePool labels.

The implementation correctly includes both the instance architecture and user-specified node labels, with deterministic sorting for consistent output. This aligns with the cluster-autoscaler scale-from-zero specification.
130-145: Taints conversion follows CAPI specification.

The `taintsToAnnotation` function correctly formats taints according to the CAPI scale-from-zero proposal with deterministic ordering via sorting.
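A deterministic taints-to-annotation conversion of the kind the review praises can be sketched as follows; the `taint` struct and the exact `key=value:effect` serialization are simplified illustrations of the CAPI scale-from-zero proposal's format, not the repository's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// taint mirrors the shape of a Kubernetes taint, for illustration only.
type taint struct {
	Key, Value, Effect string
}

// taintsToAnnotation serializes taints as key=value:effect, comma-joined
// and sorted so repeated reconciles produce an identical annotation value.
func taintsToAnnotation(taints []taint) string {
	parts := make([]string, 0, len(taints))
	for _, t := range taints {
		parts = append(parts, fmt.Sprintf("%s=%s:%s", t.Key, t.Value, t.Effect))
	}
	sort.Strings(parts) // deterministic ordering regardless of input order
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(taintsToAnnotation([]taint{
		{Key: "b", Value: "2", Effect: "NoSchedule"},
		{Key: "a", Value: "1", Effect: "NoExecute"},
	}))
}
```

The sort is what makes the annotation value stable: without it, map or slice iteration order could churn the object on every reconcile.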
11f7b1f to 3832780 — Compare
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/nodepool_controller.go:
- Around line 1136-1142: The code builds templateKey using
ms.Spec.Template.Spec.InfrastructureRef.Namespace directly, which can be empty;
change it to default to the MachineSet's namespace when empty (e.g., use ns :=
ms.Namespace; if ms.Spec.Template.Spec.InfrastructureRef.Namespace != "" { ns =
ms.Spec.Template.Spec.InfrastructureRef.Namespace } and construct templateKey
with that ns) before calling r.Get(ctx, templateKey, awsMachineTemplate). Apply
the identical change to the MachineDeployment branch (the equivalent
variables/fields around the MachineDeployment lookup) so both branches use the
resource's namespace when InfrastructureRef.Namespace is empty.
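The namespace-defaulting fix described above boils down to "use the ref's namespace if set, else fall back to the owner's". With Go 1.22+ this is a one-liner via the standard library's `cmp.Or`; the function and field names here are illustrative stand-ins for the real CAPI types:

```go
package main

import (
	"cmp"
	"fmt"
)

// resolveTemplateNamespace defaults to the owning resource's namespace when
// the InfrastructureRef namespace is empty, as the review suggests.
// cmp.Or returns the first of its arguments that is not the zero value.
func resolveTemplateNamespace(refNamespace, ownerNamespace string) string {
	return cmp.Or(refNamespace, ownerNamespace)
}

func main() {
	fmt.Println(resolveTemplateNamespace("", "clusters-example"))          // falls back to owner
	fmt.Println(resolveTemplateNamespace("custom-ns", "clusters-example")) // explicit ref wins
}
```

This is exactly the pattern a later revision of the PR adopts (the review below notes `cmp.Or` at lines 1137 and 1171).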
🧹 Nitpick comments (4)
cmd/install/assets/hypershift_operator.go (1)
101-103: Document the scale-from-zero credentials removal and provide migration guidance.

The removal of `scaleFromZeroAWSCredsSecretName` and related scale-from-zero credential handling is a significant change. Consider:
- Add a comment explaining that scale-from-zero functionality has moved to the NodePool controller
- Document the migration path for existing installations that may have scale-from-zero credentials configured
- Reference the bug OCPBUGS-70320 in the code to provide context for future maintainers
Would you like me to generate documentation or migration notes for this change?
cmd/install/install.go (1)
590-590: LGTM! Signature changes are internally consistent.

The removal of scale-from-zero secret handling from `setupAuth` and `setupOperatorResources` is implemented correctly:
- Function signatures updated consistently
- All call sites properly adjusted
- Return statements correctly match new signatures
- Error handling preserved
However, consider adding a comment in `setupAuth` or the Options struct explaining that scale-from-zero functionality has been relocated to the NodePool controller, which would help future maintainers understand this architectural change.

Also applies to: 615-615, 836-836, 1107-1107, 1184-1184
hypershift-operator/controllers/nodepool/scale_from_zero.go (2)
64-67: Add nil check for `rawInstanceType` to prevent panic.

If `rawInstanceType` is nil, dereferencing `*rawInstanceType.InstanceType` on line 66 will cause a panic. While the current caller (`getInstanceTypeInfo`) validates the response, defensive coding here would prevent issues if the function is called from other contexts.

🔎 Suggested improvement

```diff
 func transformInstanceType(rawInstanceType *ec2.InstanceTypeInfo) InstanceType {
+	if rawInstanceType == nil || rawInstanceType.InstanceType == nil {
+		return InstanceType{}
+	}
 	instanceType := InstanceType{
 		InstanceType: *rawInstanceType.InstanceType,
 	}
```
150-150: Unused parameters: `ctx` and `c` are never used in the function body.

The `ctx` parameter should be passed to `getInstanceTypeInfo` (once that function accepts context). The `c client.Client` parameter appears to be completely unused and can be removed.

🔎 Suggested fix

```diff
-func setScaleFromZeroAnnotationsOnObject(ctx context.Context, c client.Client, ec2Client ec2iface.EC2API, nodePool *hyperv1.NodePool, annotations map[string]string, awsMachineTemplate *infrav1.AWSMachineTemplate) error {
+func setScaleFromZeroAnnotationsOnObject(ctx context.Context, ec2Client ec2iface.EC2API, nodePool *hyperv1.NodePool, annotations map[string]string, awsMachineTemplate *infrav1.AWSMachineTemplate) error {
```

Then update callers in `nodepool_controller.go` to remove the `r.Client` argument.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (10)
- cmd/install/assets/hypershift_operator.go
- cmd/install/install.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/main.go
💤 Files with no reviewable changes (6)
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/main.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/controller.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- cmd/install/install.go
- cmd/install/assets/hypershift_operator.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
🧬 Code graph analysis (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
- api/hypershift/v1beta1/hostedcluster_types.go (1): AWSPlatform (1211-1211)
- api/hypershift/v1beta1/nodepool_types.go (2): UpgradeType (280-280), UpgradeTypeInPlace (289-289)

hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
- support/awsclient/client.go (1): Client (81-83)
🔇 Additional comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
409-416: LGTM!

The integration point is well-placed after `capi.Reconcile()` succeeds. Non-blocking error handling is appropriate here since scale-from-zero annotations are an enhancement for autoscaling, not critical to NodePool operation. The nil check on `r.EC2Client` prevents issues when the client isn't configured.
1115-1151: The MachineSet/MachineDeployment is already created and fully populated by `capi.Reconcile()` before this code runs, so `CreateOrPatch` will always fetch the existing object.

The execution order is guaranteed: `capi.Reconcile(ctx)` (line 402) completes first and creates the MachineSet/MachineDeployment with all Spec fields properly populated via `reconcileMachineSet()` or `reconcileMachineDeployment()`. The `reconcileScaleFromZeroAnnotations()` call (line 413) only executes if `capi.Reconcile()` succeeds with no early return. When `CreateOrPatch` is called, it will retrieve the already-existing object and populate it before the mutate function runs, so the `InfrastructureRef` check will operate on valid data. The pattern of creating an empty object for `CreateOrPatch` is unconventional but not incorrect given this ordering.
/retest
3832780 to 4375af3 — Compare
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/nodepool_controller.go:
- Around line 409-417: The call to reconcileScaleFromZeroAnnotations logs errors
but continues reconciliation, so failures against the EC2 API are not surfaced;
modify the nodepool reconciliation to set a NodePool condition (e.g.,
ScaleFromZeroAnnotationsReady) to False with the error details when
reconcileScaleFromZeroAnnotations(ctx, nodePool, controlPlaneNamespace) returns
an error, and update the NodePool status accordingly, or alternatively return
that error from Reconcile to stop the reconciliation; ensure you reference
reconcileScaleFromZeroAnnotations and update the NodePool status conditions/path
where other conditions are set so the failure is visible to users.
- Around line 1108-1192: The reconcileScaleFromZeroAnnotations function
mishandles AWSMachineTemplate lookups by always using controlPlaneNamespace and
it silently ignores NotFound errors; fix both: when building templateKey for
both the MachineSet path (ms) and MachineDeployment path (md) use the
InfrastructureRef.Namespace if non-empty (i.e., templateNS :=
spec.Template.Spec.InfrastructureRef.Namespace; if templateNS=="" use
controlPlaneNamespace) and set templateKey.Namespace = templateNS; also stop
swallowing not-found errors in the CreateOrPatch error handling—do not return
nil on apierrors.IsNotFound(err); instead wrap and return the error (or at
minimum log a warning and return it) so missing MachineSet/MachineDeployment or
AWSMachineTemplate lookup failures surface rather than being silently ignored;
reference reconcileScaleFromZeroAnnotations, ms/md, awsMachineTemplate,
templateKey, and setScaleFromZeroAnnotationsOnObject when applying the changes.
In @hypershift-operator/controllers/nodepool/scale_from_zero.go:
- Around line 64-91: The transformInstanceType function currently defaults
CPUArchitecture to ArchitectureAmd64 on unknown or missing ProcessorInfo which
can mislabel ARM instances; change transformInstanceType to return
(InstanceType, error) instead of just InstanceType, detect when ProcessorInfo is
nil, SupportedArchitectures is empty, or the architecture value is unrecognized
and return a descriptive error instead of defaulting, and update all callers of
transformInstanceType to handle the error (propagate or log and skip the
instance); alternatively, if you must keep a default for compatibility, emit a
warning via the operator logger (not silently default) before assigning
ArchitectureAmd64 and document the assumption.
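The error-returning variant the agent prompt asks for can be sketched like this; the architecture constants and the `[]string` input are simplified stand-ins for the hypershift API constants and EC2's `ProcessorInfo.SupportedArchitectures`:

```go
package main

import "fmt"

// Illustrative constants; the real values live in the hypershift API package.
const (
	archAmd64 = "amd64"
	archArm64 = "arm64"
)

// archFromSupported returns an error instead of silently defaulting to amd64
// when ProcessorInfo is missing or reports an unrecognized architecture,
// so ARM instance types can never be mislabeled.
func archFromSupported(supported []string) (string, error) {
	if len(supported) == 0 {
		return "", fmt.Errorf("instance type reports no supported architectures")
	}
	switch supported[0] {
	case "x86_64":
		return archAmd64, nil
	case "arm64":
		return archArm64, nil
	default:
		return "", fmt.Errorf("unrecognized architecture %q", supported[0])
	}
}

func main() {
	arch, err := archFromSupported([]string{"arm64"})
	fmt.Println(arch, err)
	_, err = archFromSupported(nil)
	fmt.Println(err != nil)
}
```

Callers of `transformInstanceType` would then propagate the error (or log and skip the instance) rather than silently applying a default.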
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (10)
- cmd/install/assets/hypershift_operator.go
- cmd/install/install.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/main.go
💤 Files with no reviewable changes (6)
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/main.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/install/install.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- cmd/install/assets/hypershift_operator.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
🧬 Code graph analysis (1)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
- support/awsclient/client.go (1): Client (81-83)
🔇 Additional comments (3)
cmd/install/assets/hypershift_operator.go (1)
100-103: LGTM! Clean removal of deprecated scale-from-zero credentials.

The removal of `scaleFromZeroAWSCredsSecretName` and related infrastructure aligns with the PR objective to move scale-from-zero annotation logic into the NodePool controller, eliminating the need for separate credentials.

hypershift-operator/controllers/nodepool/scale_from_zero.go (2)
104-128: LGTM! Context handling implemented correctly.

The function now uses `DescribeInstanceTypesWithContext` (line 115), properly propagating the caller's context for timeout and cancellation support. This addresses the previous review concern about blocking EC2 calls.
147-204: LGTM! Well-structured scale-from-zero annotation logic.

The function correctly:
- Checks for CAPA native support first (lines 152-161) and cleans up workaround annotations when not needed
- Validates input before fetching EC2 data (lines 164-167)
- Propagates errors appropriately (line 173)
- Ensures deterministic output by sorting labels and taints (lines 191, 143)
- Handles optional taints with proper cleanup (lines 194-201)
Actionable comments posted: 0
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
409-417: Consider surfacing scale-from-zero annotation failures in NodePool status.

When `reconcileScaleFromZeroAnnotations` fails (e.g., EC2 API unavailable), the error is logged but reconciliation continues. Users won't see this failure reflected in the NodePool status, which could lead to silent autoscaling issues. Consider either:
- Adding a dedicated status condition (e.g., `ScaleFromZeroAnnotationsReady`) to surface failures
- Returning the error to block reconciliation until resolved
1108-1186: LGTM! Implementation correctly handles both upgrade types.

The namespace defaulting using `cmp.Or` (lines 1137, 1171) properly addresses the previous review concern. The CreateOrPatch pattern is appropriate here since CAPI reconciliation ensures the MachineSet/MachineDeployment exists before this code runs.

One minor observation: if the MachineSet/MachineDeployment doesn't exist yet (e.g., during initial reconciliation), CreateOrPatch could theoretically create an empty resource. However, this is unlikely to occur in practice given the reconciliation ordering (CAPI runs first at lines 402-408), and the Kind validation provides a safeguard by skipping annotation setup for non-AWS templates.
Optional: More defensive pattern
If you want to be more explicit about requiring the resource to exist first, consider using Get followed by Update instead of CreateOrPatch:
```go
// For MachineSet
ms := &capiv1.MachineSet{}
if err := r.Get(ctx, key, ms); err != nil {
	if apierrors.IsNotFound(err) {
		// Resource not created yet by CAPI, skip for now
		log.V(1).Info("MachineSet not found yet, skipping scale-from-zero annotations")
		return nil
	}
	return fmt.Errorf("failed to get MachineSet: %w", err)
}
// ... rest of annotation logic with Update instead of CreateOrPatch
```

This makes the assumption explicit that the resource should exist, and gracefully handles the case where it doesn't yet.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
- cmd/install/assets/hypershift_operator.go
- cmd/install/install.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
💤 Files with no reviewable changes (1)
- cmd/install/assets/hypershift_operator.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
cmd/install/install.gohypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (1)
cmd/install/install.go (2)
- karpenter-operator/controllers/karpenter/assets/assets.go (1): ReadFile (23-25)
- cmd/install/assets/hypershift_operator.go (1): ScaleFromZeroAWSCredentialsSecret (157-161)
🔇 Additional comments (6)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
4-14: LGTM!

The new imports are appropriate for the scale-from-zero functionality. The `cmp` package is used for namespace defaulting, and the config/agent imports support the expanded reconciliation logic.

cmd/install/install.go (5)
135-138: LGTM! Consistent field additions.The new scale-from-zero credential fields follow the established pattern used for AWS private credentials and OIDC storage, providing both file-based and secret-reference options.
191-208: LGTM! Thorough validation logic.The validation correctly enforces mutual exclusivity and provides helpful error messages for file access issues. The approach is consistent with existing credential validation patterns in this file.
312-314: LGTM! Clear flag definitions.The CLI flags are well-documented and follow the naming conventions used for other credential options.
1134-1135: LGTM! Accurate comment and signature update.The function comment correctly describes all return values, and the signature change is properly reflected at all call sites.
1213-1233: LGTM! Implementation follows established patterns and is properly integrated.

The scale-from-zero secret creation logic correctly mirrors the approach used for OIDC and AWS private credentials. Both the file-based case (reading credentials and creating a managed secret) and the secret reference case (creating a minimal reference object for existing secrets) are properly implemented. Error handling uses correct `%w` wrapping. The secret is properly mounted in the operator deployment, with appropriate nil and value checks before use.
1632d1c to a78e835 — Compare
Actionable comments posted: 0
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
1115-1148: Consider GET-first approach to avoid creation attempts.

When using `CreateOrPatch` on a potentially non-existent MachineSet (lines 1117-1148) or MachineDeployment (lines 1151-1183), the mutation function only sets annotations without populating the required Spec fields. If the resource doesn't exist, `CreateOrPatch` will attempt to create it with an incomplete Spec, causing validation errors and unnecessary reconciliation loops.

Since the MachineSet/MachineDeployment should already exist after CAPI reconciliation (lines 402-408), a GET-first approach would be cleaner and avoid spurious errors during timing edge cases.
🔎 Suggested refactor for MachineSet path
```diff
 if nodePool.Spec.Management.UpgradeType == hyperv1.UpgradeTypeInPlace {
 	// InPlace mode uses MachineSet
 	ms := &capiv1.MachineSet{
 		ObjectMeta: metav1.ObjectMeta{
 			Namespace: key.Namespace,
 			Name:      key.Name,
 		},
 	}
+	// Get the MachineSet first to ensure it exists
+	if err := r.Get(ctx, key, ms); err != nil {
+		if apierrors.IsNotFound(err) {
+			log.V(1).Info("MachineSet not found yet, skipping scale-from-zero annotations")
+			return nil
+		}
+		return fmt.Errorf("failed to get MachineSet: %w", err)
+	}
+
+	// Check if it's an AWS machine template
+	if ms.Spec.Template.Spec.InfrastructureRef.Kind != "AWSMachineTemplate" {
+		return nil
+	}
+
+	// Fetch the AWSMachineTemplate
+	awsMachineTemplate := &capiaws.AWSMachineTemplate{}
+	templateKey := client.ObjectKey{
+		Namespace: cmp.Or(ms.Spec.Template.Spec.InfrastructureRef.Namespace, controlPlaneNamespace),
+		Name:      ms.Spec.Template.Spec.InfrastructureRef.Name,
+	}
+	if err := r.Get(ctx, templateKey, awsMachineTemplate); err != nil {
+		return fmt.Errorf("failed to get AWSMachineTemplate: %w", err)
+	}
+
+	// Patch annotations on the MachineSet
 	if _, err := controllerutil.CreateOrPatch(ctx, r.Client, ms, func() error {
-		// Ensure annotations map is initialized
 		if ms.Annotations == nil {
 			ms.Annotations = make(map[string]string)
 		}
-
-		// Check if it's an AWS machine template
-		if ms.Spec.Template.Spec.InfrastructureRef.Kind != "AWSMachineTemplate" {
-			return nil // Not an AWS machine template, skip
-		}
-
-		// Fetch the AWSMachineTemplate
-		awsMachineTemplate := &capiaws.AWSMachineTemplate{}
-		templateKey := client.ObjectKey{
-			Namespace: cmp.Or(ms.Spec.Template.Spec.InfrastructureRef.Namespace, controlPlaneNamespace),
-			Name:      ms.Spec.Template.Spec.InfrastructureRef.Name,
-		}
-		if err := r.Get(ctx, templateKey, awsMachineTemplate); err != nil {
-			return fmt.Errorf("failed to get AWSMachineTemplate: %w", err)
-		}
-
-		// Set annotations on the MachineSet
 		return setScaleFromZeroAnnotationsOnObject(ctx, r.Client, r.EC2Client, nodePool, ms.Annotations, awsMachineTemplate)
 	}); err != nil {
-		return fmt.Errorf("failed to reconcile MachineSet with scale-from-zero annotations: %w", err)
+		return fmt.Errorf("failed to patch MachineSet annotations: %w", err)
 	}
```

Apply the same pattern to the MachineDeployment path (lines 1149-1183).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
- api/hypershift/v1beta1/hostedcluster_types.go (1): AWSPlatform (1211-1211)
- api/hypershift/v1beta1/nodepool_types.go (3): NodePool (83-97), UpgradeType (280-280), UpgradeTypeInPlace (289-289)
- support/awsclient/client.go (1): Client (81-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
4-4: LGTM: Standard library `cmp` import for namespace defaulting.

The import correctly uses the Go 1.22+ standard library `cmp` package for the `cmp.Or` function used in lines 1137 and 1171.
1137-1138: LGTM: Correct namespace resolution with `cmp.Or`.

The use of `cmp.Or` properly defaults to `controlPlaneNamespace` when `InfrastructureRef.Namespace` is empty, correctly addressing the namespace handling issue from previous reviews.

Also applies to: 1171-1172
a78e835 to 2e100a3 — Compare
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/nodepool_controller.go:
- Around line 1115-1148: The current code uses controllerutil.CreateOrPatch on a
MachineSet which can create an empty MachineSet; change this to only act on
existing MachineSets by performing a Get for the MachineSet (ms) based on
key.Namespace/key.Name, return nil if NotFound, then ensure ms.Annotations is
initialized, check ms.Spec.Template.Spec.InfrastructureRef.Kind ==
"AWSMachineTemplate" and fetch the referenced AWSMachineTemplate (using r.Get
into awsMachineTemplate), call setScaleFromZeroAnnotationsOnObject to mutate the
annotations, and persist the change with r.Update (or a Patch) instead of
CreateOrPatch; keep the same error wrapping messages on Get/Update and preserve
use of r.EC2Client and nodePool in the call to
setScaleFromZeroAnnotationsOnObject.
- Around line 1149-1182: The current code uses controllerutil.CreateOrPatch on
the MachineDeployment (md) which will create an empty MachineDeployment when
none exists; instead, use r.Get to fetch the existing MachineDeployment (using
the same ObjectMeta key), return early if NotFound, then modify md.Annotations
and call r.Update to persist changes; keep the same checks for
md.Spec.Template.Spec.InfrastructureRef.Kind == "AWSMachineTemplate", fetch the
AWSMachineTemplate into awsMachineTemplate via r.Get, and call
setScaleFromZeroAnnotationsOnObject to populate annotations before calling
r.Update on md (do not call CreateOrPatch or create new MDs).
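The Get-then-Update pattern the agent prompt describes can be sketched with a tiny in-memory stand-in for controller-runtime's client (the `client` type, `errNotFound`, and the annotation key below are illustrative, not the real `client.Client` API):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// client is a minimal stand-in mapping resource name -> annotations,
// enough to show Get-then-Update versus CreateOrPatch.
type client struct {
	store map[string]map[string]string
}

func (c *client) Get(name string) (map[string]string, error) {
	ann, ok := c.store[name]
	if !ok {
		return nil, errNotFound
	}
	return ann, nil
}

func (c *client) Update(name string, ann map[string]string) {
	c.store[name] = ann
}

// annotateExisting only patches resources that already exist: a missing
// MachineSet/MachineDeployment is skipped rather than created empty.
func annotateExisting(c *client, name string) error {
	ann, err := c.Get(name)
	if errors.Is(err, errNotFound) {
		return nil // not created yet by CAPI; retry on a later reconcile
	} else if err != nil {
		return err
	}
	ann["capacity.cluster-autoscaler.kubernetes.io/cpu"] = "4"
	c.Update(name, ann)
	return nil
}

func main() {
	c := &client{store: map[string]map[string]string{"existing-ms": {}}}
	fmt.Println(annotateExisting(c, "existing-ms"))
	fmt.Println(c.store["existing-ms"]["capacity.cluster-autoscaler.kubernetes.io/cpu"])
	fmt.Println(annotateExisting(c, "missing-ms"), len(c.store))
}
```

The key property, which `CreateOrPatch` on an empty object lacks, is that the missing-resource path is a no-op: nothing is created with an incomplete Spec.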
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
1130-1132: Consider logging when skipping non-AWS templates.

The code silently skips non-AWS templates by returning nil. Adding a log message would improve debuggability when troubleshooting why scale-from-zero annotations weren't applied.

🔎 Proposed addition

```diff
 // Check if it's an AWS machine template
 if ms.Spec.Template.Spec.InfrastructureRef.Kind != "AWSMachineTemplate" {
+	log.V(1).Info("Skipping scale-from-zero annotations for non-AWS template", "kind", ms.Spec.Template.Spec.InfrastructureRef.Kind)
 	return nil // Not an AWS machine template, skip
 }
```

Apply the same change to the MachineDeployment branch at Line 1164.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/main.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/main.go
🔇 Additional comments (3)
hypershift-operator/main.go (3)
127-127: LGTM!
Field definition is clear and consistent with other credential path fields in the struct.

164-164: LGTM!
Flag definition follows the existing pattern for credential path flags.

402-414: LGTM!
The priority-based initialization is well-structured:
- Explicit scale-from-zero credentials take precedence when configured
- Falls back to platform credentials for AWS private platform
The clear comments and distinct session names ("hypershift-operator-scale-from-zero" vs "hypershift-operator") aid debugging and traceability.
Force-pushed 2e100a3 to 46086d7
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @hypershift-operator/main.go:
- Around line 402-412: The current if-else-if makes PrivatePlatform AWS take
precedence and silently ignores opts.ScaleFromZeroAWSCreds; change the branching
so scale-from-zero creds are honored when present (or explicitly document
precedence). Replace the if-else-if with explicit checks: first check
opts.ScaleFromZeroAWSCreds != "" and initialize ec2Client using
awsutil.NewSession("hypershift-operator-scale-from-zero",
opts.ScaleFromZeroAWSCreds, ...), then separately check
hyperv1.PlatformType(opts.PrivatePlatform) == hyperv1.AWSPlatform and initialize
the PrivatePlatform client only if ec2Client is still nil (or vice versa if you
prefer documented precedence). Ensure you reference the variables
opts.ScaleFromZeroAWSCreds, opts.PrivatePlatform, hyperv1.PlatformType, and
ec2Client and either remove the else or add a clear comment explaining the
chosen precedence.
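A sketch of the explicit precedence suggested above, with the branching factored into a hypothetical pure helper so it is easy to test (the session names mirror the ones mentioned in the review; the function itself is illustrative, not the repository's API):

```go
package main

import "fmt"

// credsSource decides which AWS credentials should initialize the
// scale-from-zero EC2 client: an explicit --scale-from-zero-aws-creds path
// wins; otherwise fall back to the private-platform credentials on AWS.
func credsSource(scaleFromZeroCreds, privatePlatform string) string {
	if scaleFromZeroCreds != "" {
		return "hypershift-operator-scale-from-zero"
	}
	if privatePlatform == "AWS" {
		return "hypershift-operator"
	}
	return "" // no EC2 client is created
}

func main() {
	fmt.Println(credsSource("/etc/creds", "AWS")) // hypershift-operator-scale-from-zero
	fmt.Println(credsSource("", "AWS"))           // hypershift-operator
}
```

Making the precedence a standalone function keeps the wiring in `main.go` free of nested if-else-if chains that can silently ignore one of the inputs.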
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/main.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- hypershift-operator/main.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (2)
hypershift-operator/main.go (1)
  cmd/infra/aws/util/util.go (2)
    NewSession (89-106), NewConfig (120-129)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
  api/hypershift/v1beta1/hostedcluster_types.go (1)
    AWSPlatform (1211-1211)
  api/hypershift/v1beta1/nodepool_types.go (3)
    NodePool (83-97), UpgradeType (280-280), UpgradeTypeInPlace (289-289)
  support/awsclient/client.go (1)
    Client (81-83)
🔇 Additional comments (2)
hypershift-operator/main.go (1)
202-210: LGTM: Early validation prevents runtime failures.
The early validation of the scale-from-zero credentials ensures the operator fails fast on startup if the credentials file is missing or inaccessible, rather than encountering errors during reconciliation.
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
4-4: LGTM: Appropriate use of `cmp.Or` for namespace defaulting.
The `cmp` import is correctly used in the new `reconcileScaleFromZeroAnnotations` function to provide default namespace values.
Force-pushed 07008ba to 8499cd0
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @hypershift-operator/controllers/nodepool/instancetype/aws/provider.go:
- Around line 19-26: The doc for NewProvider says it panics if ec2Client is nil
but the function does not check; update NewProvider (and its comment if you
prefer the other option) to match behavior: either add a nil-check in
NewProvider that panics with a clear message (e.g., if ec2Client == nil {
panic("NewProvider: ec2Client is nil") }) so callers get an immediate, explicit
failure instead of later nil dereferences in methods like GetInstanceTypeInfo,
or change the function comment to remove the “Panics if ec2Client is nil”
statement if you choose not to enforce it here; update the Provider struct
reference (ec2Client field) and ensure callers are aware of the chosen behavior.
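A minimal sketch of the fail-fast variant, with `EC2API` standing in for the SDK's `ec2iface.EC2API` interface:

```go
package main

import "fmt"

// EC2API stands in for ec2iface.EC2API; only the shape matters here.
type EC2API interface{ DescribeSomething() string }

type Provider struct{ ec2Client EC2API }

// NewProvider fails fast on a nil client so the misconfiguration surfaces at
// construction time rather than as a nil dereference during reconciliation.
func NewProvider(c EC2API) *Provider {
	if c == nil {
		panic("NewProvider: ec2Client is nil")
	}
	return &Provider{ec2Client: c}
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked:", r) // panicked: NewProvider: ec2Client is nil
		}
	}()
	NewProvider(nil)
}
```

Whichever option is chosen, the doc comment and the behavior should agree; an advertised panic that never fires is worse than either alternative.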
🧹 Nitpick comments (3)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (2)
96-111: Consider logging unknown CPU architectures instead of silently defaulting.
Lines 105-106 silently default to AMD64 for unknown architectures returned by AWS. This could mask AWS API changes or unexpected instance types. For scale-from-zero functionality, an incorrect architecture could lead to scheduling issues. Consider logging a warning when an unknown architecture is encountered so operators can investigate.
🔎 Proposed enhancement
```diff
 switch architecture {
 case ec2.ArchitectureTypeX8664:
 	info.CPUArchitecture = instancetype.ArchAMD64
 case ec2.ArchitectureTypeArm64:
 	info.CPUArchitecture = instancetype.ArchARM64
 default:
-	// Default to amd64 for unknown architectures
+	// Default to amd64 for unknown architectures, but log a warning
+	log.Printf("WARNING: Unknown architecture %q for instance type %q, defaulting to amd64", architecture, instanceTypeName)
 	info.CPUArchitecture = instancetype.ArchAMD64
 }
```
Note: You'll need to pass a logger through the function signature or use a package-level logger.
28-53: Consider caching instance type information to reduce AWS API calls.
The provider queries the EC2 API on every call to `GetInstanceTypeInfo`. Since instance type specifications are static and rarely change, repeated calls for the same instance type are wasteful and could hit AWS rate limits under load. Consider adding an in-memory cache with a reasonable TTL (e.g., 1 hour) to improve performance and reduce API usage.

hypershift-operator/main.go (1)
414-420: Consider documenting the dual AWS client initialization.
When `PrivatePlatform` is AWS and `ScaleFromZeroAWSCreds` is provided, the code initializes two separate AWS sessions and EC2 clients. This is intentional and correct: `ec2Client` (line 407) is used for private platform functionality, while `scaleFromZeroEC2Client` (line 418) is specifically for the instance type provider. However, this dual initialization might not be immediately obvious to future maintainers. Consider adding a brief comment explaining why separate clients are needed.
📒 Files selected for processing (5)
- hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
- hypershift-operator/controllers/nodepool/instancetype/instancetype.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/main.go
🚧 Files skipped from review as they are similar to previous changes (1)
- hypershift-operator/controllers/nodepool/scale_from_zero.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
- hypershift-operator/controllers/nodepool/instancetype/instancetype.go
- hypershift-operator/main.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (3)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (1)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (3)
    Provider (20-25), InstanceTypeInfo (30-45), ArchAMD64 (12-12)
hypershift-operator/main.go (3)
  hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (2)
    Provider (15-17), NewProvider (22-26)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    Provider (20-25)
  cmd/infra/aws/util/util.go (2)
    NewSession (89-106), NewConfig (120-129)
hypershift-operator/controllers/nodepool/nodepool_controller.go (4)
  hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (1)
    Provider (15-17)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    Provider (20-25)
  api/hypershift/v1beta1/hostedcluster_types.go (1)
    AWSPlatform (1211-1211)
  support/awsclient/client.go (1)
    Client (81-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (6)
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
1-45: LGTM! Clean interface design for instance type providers.
The interface and types are well-designed and clearly documented. The `Provider` interface provides a clean abstraction for fetching instance type information across different cloud providers, and the `InstanceTypeInfo` struct includes all necessary fields for scale-from-zero functionality.

hypershift-operator/main.go (2)
203-212: LGTM! Good early validation of configuration.
The early validation of the scale-from-zero credentials file is good defensive programming. It catches configuration errors at startup rather than during reconciliation, providing clear error messages to operators.

422-433: LGTM! Instance type provider is correctly wired.
The `InstanceTypeProvider` is properly integrated into the `NodePoolReconciler`. The nil check in the reconciler (line 414 of nodepool_controller.go) ensures the provider is only used when available, making it an optional enhancement rather than a required dependency.

hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
412-418: Scale-from-zero errors are logged but not surfaced to users.
When `reconcileScaleFromZeroAnnotations` fails, the error is logged but reconciliation continues without updating the NodePool status or failing the reconcile. This means EC2 API failures or configuration issues won't be visible to users, potentially causing silent autoscaling problems. Consider either:
- Adding a dedicated status condition (e.g., `ScaleFromZeroReady`) to surface failures, or
- Returning the error to halt reconciliation when annotation setup fails
Based on past review comments.
Likely an incorrect or invalid review comment.
1151-1184: Use Get + Update instead of CreateOrPatch to avoid creating empty resources.
Same issue as with the MachineSet path: `CreateOrPatch` will create the MachineDeployment if it doesn't exist. For scale-from-zero annotations, you should only annotate existing resources.
Based on past review comments, this was identified as a critical issue that should be addressed.
Likely an incorrect or invalid review comment.
1117-1150: Use Get + Update instead of CreateOrPatch to avoid creating empty resources.
`CreateOrPatch` will create the MachineSet if it doesn't exist. For scale-from-zero annotations, you should only annotate existing resources, not create new ones. Creating a MachineSet with only annotations and no required spec fields will result in an incomplete resource.
Based on past review comments, this was identified as a critical issue that should be addressed.
Likely an incorrect or invalid review comment.
Force-pushed ae2bfba to e943b79
Actionable comments posted: 0
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)
411-419: Consider adding a status condition for scale-from-zero failures.
Errors from `reconcileScaleFromZeroAnnotations` are logged but don't block reconciliation or update NodePool status. While this is acceptable for a non-critical feature, users won't see annotation failures reflected in the NodePool status.
📒 Files selected for processing (12)
- cmd/install/assets/hypershift_operator.go
- cmd/install/install.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
- hypershift-operator/controllers/nodepool/instancetype/instancetype.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/main.go
💤 Files with no reviewable changes (5)
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
🚧 Files skipped from review as they are similar to previous changes (1)
- hypershift-operator/controllers/nodepool/scale_from_zero.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- hypershift-operator/controllers/nodepool/instancetype/instancetype.go
- hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
- hypershift-operator/main.go
- cmd/install/install.go
- cmd/install/assets/hypershift_operator.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
🧬 Code graph analysis (2)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (1)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (4)
    Provider (20-25), InstanceTypeInfo (30-45), ArchAMD64 (12-12), ArchARM64 (14-14)
hypershift-operator/main.go (3)
  hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (2)
    Provider (15-17), NewProvider (22-26)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    Provider (20-25)
  cmd/infra/aws/util/util.go (2)
    NewSession (89-106), NewConfig (120-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
🔇 Additional comments (20)
hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (3)
28-53: LGTM!
The `GetInstanceTypeInfo` implementation correctly queries the EC2 API, validates the response, and transforms the data. Error handling is appropriate.

55-114: LGTM!
The `transformInstanceTypeInfo` function has thorough validation for required fields (vCPU, memory) and handles optional fields (GPU, architecture) with sensible defaults. The architecture mapping and fallback to amd64 are reasonable.

116-131: LGTM!
The GPU counting logic correctly handles the nil check on individual GPU counts and sums across all GPU devices.
cmd/install/assets/hypershift_operator.go (4)
100-104: LGTM!
The constant renaming from AWS-specific to generic naming aligns with the provider-agnostic scale-from-zero approach.

157-179: LGTM!
The `ScaleFromZeroCredentialsSecret` struct and `Build` method are properly refactored to support multiple providers. The `Provider` field is stored for future use.

450-453: LGTM!
The deployment struct fields are properly renamed to be provider-agnostic.

573-591: LGTM!
The conditional wiring correctly checks all required fields before adding the scale-from-zero volume mount and command-line arguments. The provider flag is now passed to the operator.
hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
1-45: LGTM!
Clean interface design that provides a cloud-agnostic abstraction for instance type information. The `Provider` interface and `InstanceTypeInfo` struct are well-documented and support the scale-from-zero functionality across different cloud providers.

hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
4-4: LGTM!
The new imports and `InstanceTypeProvider` field on `NodePoolReconciler` properly integrate the provider-based scale-from-zero functionality.
Also applies to: 16-16, 102-103

423-426: LGTM!
The `supportedScaleFromZeroPlatform` helper correctly restricts the feature to AWS platform only.

hypershift-operator/main.go (4)
130-131: LGTM!
The new `ScaleFromZeroProvider` and `ScaleFromZeroCreds` fields properly support the provider-based scale-from-zero configuration.

206-221: LGTM!
The early validation ensures configuration consistency: provider is required when credentials are provided, and only supported providers are accepted. File existence check prevents runtime failures.

440-449: LGTM!
The `NodePoolReconciler` is correctly wired with the `InstanceTypeProvider`, enabling the scale-from-zero annotation flow.
421-438: Add AWS region configuration or validation for scale-from-zero provider.
The AWS session is created with an empty region, which relies on the `AWS_REGION` environment variable being set at runtime. Unlike other AWS session creations in the codebase, this code path lacks:
- An explicit comment documenting the `AWS_REGION` dependency
- Validation that `AWS_REGION` is actually set before attempting EC2 API calls

Consider either passing an explicit region parameter to `awsutil.NewSession()` or adding a validation check (similar to the check in control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:271-276) to ensure the environment variable is set, along with a comment explaining the dependency.

cmd/install/install.go (6)
136-140: LGTM!The new Options fields properly support the provider-agnostic scale-from-zero configuration, replacing the AWS-specific fields.
194-217: LGTM!The validation logic is comprehensive: enforces mutual exclusivity between file-based and secret-based credentials, requires provider when credentials are provided, validates the provider value, and checks file accessibility.
321-325: LGTM!The new CLI flags are properly documented and aligned with the Options struct fields.
383-383: LGTM!
The default value for `ScaleFromZeroCredentialsSecretKey` is set to "credentials", consistent with other credential secret key defaults.

912-914: LGTM!
The deployment wiring correctly passes all three scale-from-zero fields to the operator deployment.

1224-1245: LGTM!
The secret setup correctly handles both file-based credentials (building a new secret with provider info) and pre-existing secrets. The `Provider` field is properly included in the secret construction.
|
@LiangquanLi930: This pull request references Jira Issue OCPBUGS-70320, which is invalid:
Comment
The bug has been updated to refer to the pull request using the external bug tracker.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Move architecture constants to hyperv1, improve error handling, and refactor for platform abstraction.
Force-pushed 992f478 to b8c3530
|
/jira refresh |
|
@LiangquanLi930: This pull request references Jira Issue OCPBUGS-70320, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (lgao@redhat.com), skipping review request.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @hypershift-operator/main.go:
- Around line 426-438: The EC2 client for scale-from-zero is created with
awsutil.NewSession using an empty region, causing unpredictable region
resolution; add a new option (e.g., opts.ScaleFromZeroRegion) or require
AWS_REGION and pass that explicit region into awsutil.NewSession when creating
the session for scale-from-zero (the block that calls awsutil.NewSession,
awsutil.NewConfig, ec2.New and awsinstancetype.NewProvider) and validate the
flag is set (or fallback with a clear error) similar to how
OIDCStorageProviderS3Region is handled so DescribeInstanceTypes always uses an
explicit region.
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
91-109: Minor: Duplicate step numbering in comments.
Steps are numbered "2" twice (lines 91 and 105). This doesn't affect functionality but could cause confusion when referencing the code.
📝 Suggested fix for comment numbering
```diff
 // 2. Check if Status.Capacity is already provided (prefer native support)
 if len(statusCapacity) > 0 {
 	// ...
 }

-// 2. Skip if provider is not configured
+// 3. Skip if provider is not configured
 if provider == nil {
 	// ...
 }

-// 3. Validate instance type is not empty
+// 4. Validate instance type is not empty
```
And renumber subsequent steps accordingly (4→5, 5→6, etc.).
cmd/install/assets/hypershift_operator.go (1)
157-179: Unused `Provider` field in `ScaleFromZeroCredentialsSecret`.
The `Provider` field (line 161) is defined in the struct but not used in the `Build()` method. If this is intentional for future use (e.g., adding provider metadata to the secret), consider documenting that. Otherwise, it could be removed to avoid confusion.
📒 Files selected for processing (16)
- cmd/install/assets/hypershift_operator.go
- cmd/install/install.go
- cmd/install/install_test.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
- hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
- hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go
- hypershift-operator/controllers/nodepool/instancetype/instancetype.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/controllers/nodepool/scale_from_zero_test.go
- hypershift-operator/main.go
- test/e2e/autoscaling_test.go
💤 Files with no reviewable changes (5)
- hypershift-operator/controllers/machinedeployment/ec2_instance_types_test.go
- hypershift-operator/controllers/machinedeployment/controller.go
- hypershift-operator/controllers/machinedeployment/controller_test.go
- hypershift-operator/controllers/machinedeployment/ec2_instance_types.go
- hypershift-operator/controllers/machinedeployment/utils/providerspec.go
🚧 Files skipped from review as they are similar to previous changes (2)
- hypershift-operator/controllers/nodepool/instancetype/instancetype.go
- test/e2e/autoscaling_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- cmd/install/install_test.go
- hypershift-operator/controllers/nodepool/scale_from_zero_test.go
- hypershift-operator/controllers/nodepool/scale_from_zero.go
- hypershift-operator/main.go
- hypershift-operator/controllers/nodepool/nodepool_controller.go
- hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go
- cmd/install/install.go
- cmd/install/assets/hypershift_operator.go
- hypershift-operator/controllers/nodepool/instancetype/aws/provider.go
🧬 Code graph analysis (4)
hypershift-operator/controllers/nodepool/scale_from_zero.go (1)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    Provider (10-15)
hypershift-operator/main.go (2)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    Provider (10-15)
  cmd/infra/aws/util/util.go (2)
    NewSession (89-106), NewConfig (120-129)
hypershift-operator/controllers/nodepool/nodepool_controller.go (3)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    Provider (10-15)
  api/hypershift/v1beta1/hostedcluster_types.go (2)
    PlatformType (1213-1213), AWSPlatform (1217-1217)
  api/hypershift/v1beta1/nodepool_types.go (3)
    NodePool (83-97), UpgradeType (280-280), UpgradeTypeInPlace (289-289)
hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go (2)
  hypershift-operator/controllers/nodepool/instancetype/instancetype.go (1)
    InstanceTypeInfo (20-35)
  support/supportedversion/version.go (1)
    String (254-256)
🔇 Additional comments (24)
cmd/install/install_test.go (1)
143-193: LGTM! Comprehensive test coverage for scale-from-zero validation.
Test cases appropriately cover error paths (missing provider, invalid provider, conflicting creds) and success paths (AWS with creds file, AWS with secret reference). Using `/dev/null` as a placeholder for existing file paths is a practical approach.

hypershift-operator/controllers/nodepool/scale_from_zero_test.go (1)
13-62: LGTM! Well-structured tests for taint annotation formatting.
Tests cover essential edge cases including empty taints, empty values, and deterministic sorting. The expected formats align with the CAPI scale-from-zero specification.
hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go (1)
1-401: LGTM! Excellent test coverage for the AWS instance type provider.
The tests comprehensively cover:
- GPU counting with various nil/empty scenarios
- Input validation for all required fields (name, vCPU, memory, architecture)
- Architecture mapping (x86_64 → AMD64, arm64 → ARM64)
- Provider integration including found, not-found, and API error cases
The configurable mock pattern is clean and enables targeted testing of edge cases.
hypershift-operator/main.go (1)
130-131: LGTM! Clean addition of scale-from-zero configuration options.
The new `StartOptions` fields and corresponding command-line flags follow the existing patterns in the codebase. The flag names are clear and consistent with other credential/provider patterns in the operator.
Also applies to: 168-169
hypershift-operator/controllers/nodepool/scale_from_zero.go (2)
44-63: LGTM! Clean implementation of taint formatting.
The function correctly implements the CAPI scale-from-zero annotation format with deterministic sorting for stable output.
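For illustration, the deterministic formatting that tests like these exercise can be sketched as follows (the `key=value:Effect` shape follows the cluster-autoscaler taints annotation; the `taint` type here is a stand-in, not the CAPI type):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type taint struct{ Key, Value, Effect string }

// formatTaints renders taints as comma-separated "key=value:Effect" entries;
// sorting keeps the resulting annotation stable across reconciles so the
// object is not rewritten on every pass.
func formatTaints(taints []taint) string {
	parts := make([]string, 0, len(taints))
	for _, t := range taints {
		parts = append(parts, fmt.Sprintf("%s=%s:%s", t.Key, t.Value, t.Effect))
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatTaints([]taint{
		{Key: "b", Value: "2", Effect: "NoSchedule"},
		{Key: "a", Value: "", Effect: "NoExecute"},
	})) // a=:NoExecute,b=2:NoSchedule
}
```

An empty-value taint deliberately renders as `key=:Effect`, matching the "empty values" edge case the tests call out.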
134-147: Good design: Architecture enforcement from instance type.
The explicit override of `kubernetes.io/arch` with the actual instance architecture (line 140) is the correct approach. This prevents misconfigurations where user-specified labels could lie about the node's architecture, which would cause pod scheduling failures.

hypershift-operator/controllers/nodepool/instancetype/aws/provider.go (4)
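A small sketch of the enforcement described above, with hypothetical helper and parameter names:

```go
package main

import "fmt"

// mergeNodeLabels copies user labels but force-sets kubernetes.io/arch from
// the instance type's actual architecture, so a mislabeled NodePool cannot
// advertise the wrong CPU architecture to the autoscaler.
func mergeNodeLabels(userLabels map[string]string, actualArch string) map[string]string {
	out := make(map[string]string, len(userLabels)+1)
	for k, v := range userLabels {
		out[k] = v
	}
	out["kubernetes.io/arch"] = actualArch // always wins over user input
	return out
}

func main() {
	l := mergeNodeLabels(map[string]string{"kubernetes.io/arch": "amd64", "team": "x"}, "arm64")
	fmt.Println(l["kubernetes.io/arch"], l["team"]) // arm64 x
}
```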
1-28: LGTM - Clean provider initialization.
The `Provider` struct and `NewProvider` constructor follow good patterns: the interface dependency (`ec2iface.EC2API`) enables easy testing via mocks.

30-55: LGTM - Proper API call and validation.
The `GetInstanceTypeInfo` method correctly:
- Uses context for cancellation
- Validates empty response
- Wraps errors with context

57-114: LGTM - Thorough validation in transformation logic.
The `transformInstanceTypeInfo` function has comprehensive nil checks and validates all required fields (vCPU, memory, architecture) before returning. The architecture switch handles the supported AWS architectures appropriately.

116-131: LGTM - Safe GPU counting with nil guards.
The `getGpuCount` helper properly handles nil `gpuInfo`, nil `Gpus` slice, and nil individual GPU entries before accessing the `Count` field.

cmd/install/assets/hypershift_operator.go (3)
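The nil-guarded summation can be sketched with stand-in types mirroring the SDK's pointer-heavy shapes (`gpuDevice`/`gpuInfo` here are illustrative, not the `ec2` package structs):

```go
package main

import "fmt"

// gpuDevice mirrors the shape of a GPU device entry; Count is a pointer
// because the SDK leaves it nil when the field is unreported.
type gpuDevice struct{ Count *int64 }

type gpuInfo struct{ Gpus []*gpuDevice }

// gpuCount sums device counts, guarding every nil along the way:
// nil info, nil slice entries, and nil Count fields all contribute zero.
func gpuCount(info *gpuInfo) int64 {
	if info == nil {
		return 0
	}
	var total int64
	for _, g := range info.Gpus {
		if g == nil || g.Count == nil {
			continue
		}
		total += *g.Count
	}
	return total
}

func main() {
	one, four := int64(1), int64(4)
	fmt.Println(gpuCount(nil))                                                            // 0
	fmt.Println(gpuCount(&gpuInfo{Gpus: []*gpuDevice{nil, {Count: &one}, {Count: &four}}})) // 5
}
```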
100-104: LGTM - Consistent naming convention for new secret.
The new `scaleFromZeroCredsSecretName` constant follows the established naming pattern alongside other credential secrets.

450-453: LGTM - New deployment fields for scale-from-zero configuration.
The three new fields (`ScaleFromZeroSecret`, `ScaleFromZeroSecretKey`, `ScaleFromZeroProvider`) follow the existing pattern used for OIDC and AWS private credentials.

573-591: LGTM - Consistent secret mounting pattern.
The scale-from-zero credentials mounting mirrors the OIDC storage provider pattern (lines 552-571), ensuring consistency. The `ReadOnly: true` flag is a good security practice for credential mounts.

hypershift-operator/controllers/nodepool/nodepool_controller.go (4)
15-15: LGTM - Clean integration of instance type provider.
The new `InstanceTypeProvider` field on `NodePoolReconciler` follows the existing pattern of injecting dependencies (similar to `ReleaseProvider`, `ImageMetadataProvider`).
Also applies to: 101-102

410-418: LGTM - Appropriate error handling for non-critical annotation updates.
The decision to log the error and requeue after 30 seconds instead of failing the reconciliation is sensible. Scale-from-zero annotations are not critical for cluster operation, so a transient failure here shouldn't block other reconciliation.

423-426: LGTM - Extensible platform check.
The `supportedScaleFromZeroPlatform` function provides a clear extension point for adding future platform support (Azure, GCP, etc.) as noted in the commented code in `reconcileScaleFromZeroAnnotations`.

1115-1187: LGTM - Well-structured reconciliation with proper error handling.
The `reconcileScaleFromZeroAnnotations` function:
- Handles not-found gracefully for both machine templates and CAPI objects
- Correctly distinguishes between InPlace (MachineSet) and Replace (MachineDeployment) modes
- Uses `client.MergeFrom` patch to minimize unnecessary API updates
- Has clear comments for future platform additions
cmd/install/install.go (7)
135-139: LGTM - Clean field additions for scale-from-zero configuration.The new fields follow the established naming convention used for AWS private credentials and OIDC storage provider credentials.
144-147: LGTM - Good UX improvement with normalization.Trimming whitespace and lowercasing the provider value prevents common user input errors and ensures consistent validation.
198-222: LGTM - Comprehensive validation logic.
The validation correctly enforces:
- Mutual exclusivity between `--scale-from-zero-creds` and `--scale-from-zero-secret`
- Provider is required when credentials are specified
- Provider must be in the supported list
- Credentials file must exist and be accessible

326-329: LGTM - Well-documented CLI flags.
The flag descriptions clearly indicate purpose and alternatives (file vs. secret).

388-388: LGTM - Sensible default for credentials secret key.
Defaulting to `"credentials"` is consistent with other credential secrets in the codebase (e.g., `AWSPrivateCredentialsSecretKey`, `OIDCStorageProviderS3CredentialsSecretKey`).

917-919: LGTM - Proper wiring to operator deployment.
The scale-from-zero configuration is correctly passed to the `HyperShiftOperatorDeployment` builder.
1229-1250: LGTM - Consistent secret setup pattern. The scale-from-zero credentials setup follows the same pattern as other credentials (OIDC, AWS private), supporting both file-based and pre-existing secret references.
|
/test e2e-aws |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bryan-cox, LiangquanLi930, muraee. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment |
|
/verified by e2e test |
|
@LiangquanLi930: This PR has been marked as verified by e2e test. In response to this: |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@LiangquanLi930: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@LiangquanLi930: Jira Issue Verification Checks: Jira Issue OCPBUGS-70320 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. In response to this: |
|
/jira backport release-4.21 |
OCPBUGS-70320: enable scale-up from zero with InPlace upgrade
|
@LiangquanLi930: The following backport issues have been created:
Queuing cherrypicks to the requested branches to be created after this PR merges. In response to this: |
|
@openshift-ci-robot: #7427 failed to apply on top of branch "release-4.21". In response to this: |
|
Fix included in accepted release 4.22.0-0.nightly-2026-01-19-085729 |
The scale-from-zero flags were refactored in hypershift PR openshift#7427 from AWS-specific to platform-agnostic flags.

Changes:
- Added: --scale-from-zero-provider=aws
- Added: --scale-from-zero-creds=/etc/hypershift-pool-aws-credentials/credentials

The old --scale-from-zero-aws-creds flag no longer exists and has been replaced with the combination of --scale-from-zero-provider and --scale-from-zero-creds flags.

Related: openshift/hypershift#7427
What this PR does / why we need it:
Enable scale-up from zero with InPlace upgrade. Remove the MachineDeployment scale-from-zero controller and refactor the scale-from-zero logic around an instancetype provider abstraction.
Which issue(s) this PR fixes:
Fixes: OCPBUGS-70320
Special notes for your reviewer:
Checklist: