ROSAENG-132: fix(karpenter): Set Ready condition to False when version resolution fails#7866
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@joshbranham: This pull request references ROSAENG-132 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughAdds a new exported readiness constant and implements atomic Ready condition computation for OpenshiftEC2NodeClass in the EC2 nodeclass reconciler; includes unit tests for computeReadyCondition and minor whitespace adjustments in the ignition controller files. Changes
Sequence Diagram(s)sequenceDiagram
participant Reconciler as EC2NodeClassReconciler
participant Resource as OpenshiftEC2NodeClass
participant Status as StatusPatch/Setter
participant K8sAPI as Kubernetes API
Reconciler->>Resource: fetch current status.conditions
Reconciler->>Reconciler: computeReadyCondition(VersionResolved, EC2Readiness)
alt VersionResolved == False
Reconciler->>Status: set Ready = False, Reason = VersionResolvedFailed
else VersionResolved == True or absent
Reconciler->>Status: preserve or set Ready based on EC2 readiness
end
Status->>K8sAPI: patch status with updated Ready condition
K8sAPI-->>Status: status patch response
Status-->>Reconciler: updated status acknowledged
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
|
/test e2e-aws-techpreview |
|
@joshbranham: This pull request references ROSAENG-132 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go (1)
943-974: Add a recovery-path test to guard against stale Ready=False.Please add a case that calls
updateVersionStatuswithresolveErr != nilfirst, then with success, and verifies Ready is no longer stuck onResolutionFailed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go` around lines 943 - 974, Add a new test that exercises the recovery path in updateVersionStatus: create a NodeClass (use newNodeClass like "recover-version"), initialize a fakeGuestClient and KarpenterIgnitionReconciler, call r.updateVersionStatus(ctx, nc, "", "", resolveErr) with a non-nil error to set the ResolutionFailed/Ready=False state, then call r.updateVersionStatus(ctx, nc, "quay.io/openshift/release:latest", "4.17.0", nil) to simulate a successful resolution; fetch the updated OpenshiftEC2NodeClass via fakeGuestClient.Get and assert that Status.ReleaseImage is set, VersionResolved condition is metav1.ConditionTrue (not ResolutionFailed), and the Ready condition is not stuck on Reason "ResolutionFailed" and is set to True (or cleared of the ResolutionFailed state) to confirm recovery.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 535-547: The current code only sets ConditionTypeReady to False
when resolveErr != nil, so a previous Ready=False with
Reason=ConditionReasonResolutionFailed can persist after a later successful
resolution; update the success path (where resolveErr == nil / after version
resolution completes) to explicitly set or replace the Ready condition to
Status=True (ObservedGeneration = openshiftEC2NodeClass.Generation,
LastTransitionTime = metav1.Now(), Reason = "" or a success reason) using
meta.SetStatusCondition on openshiftEC2NodeClass.Status.Conditions (or remove
the ResolutionFailed condition) so the Ready condition is cleared/updated when
resolution succeeds; locate resolveErr, the existing readyCondition block, and
the use of meta.SetStatusCondition to apply the fix.
---
Nitpick comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`:
- Around line 943-974: Add a new test that exercises the recovery path in
updateVersionStatus: create a NodeClass (use newNodeClass like
"recover-version"), initialize a fakeGuestClient and
KarpenterIgnitionReconciler, call r.updateVersionStatus(ctx, nc, "", "",
resolveErr) with a non-nil error to set the ResolutionFailed/Ready=False state,
then call r.updateVersionStatus(ctx, nc, "quay.io/openshift/release:latest",
"4.17.0", nil) to simulate a successful resolution; fetch the updated
OpenshiftEC2NodeClass via fakeGuestClient.Get and assert that
Status.ReleaseImage is set, VersionResolved condition is metav1.ConditionTrue
(not ResolutionFailed), and the Ready condition is not stuck on Reason
"ResolutionFailed" and is set to True (or cleared of the ResolutionFailed state)
to confirm recovery.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f619de0b-cf1a-4fcb-b80c-1dcd7efaa41d
⛔ Files ignored due to path filters (1)
vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.gois excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (3)
api/karpenter/v1beta1/karpenter_types.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
|
/test e2e-aws-techpreview |
|
if we want to include ConditionReasonResolutionFailed into the upper level ready semantic, I think what ever is bubbling up the ready condition (ec2 node class controller atm) should compute that atomically, i.e. upper level ready = (ec2 ready && ConditionReasonResolutionFailed) |
d24b773 to
476587a
Compare
|
@joshbranham: This pull request references ROSAENG-132 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws-techpreview |
|
@joshbranham: This pull request references ROSAENG-132 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller_test.go (1)
551-638: Good test coverage forcomputeReadyCondition.The test cases cover the key scenarios: VersionResolved=False overriding Ready, VersionResolved=True preserving Ready, and absent VersionResolved condition.
Minor note: The
readyShouldChangefield (line 557) is defined but never used in the assertions. Consider removing it or adding an assertion if the intent was to verify condition mutation behavior.♻️ Remove unused field or add assertion
testCases := []struct { name string conditions []metav1.Condition expectedReadyStatus metav1.ConditionStatus expectedReadyReason string - readyShouldChange bool }{Or alternatively, add a test that uses this field to verify transition behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller_test.go` around lines 551 - 638, The test defines an unused field readyShouldChange in the testCases struct; remove the readyShouldChange field and any references to it (the struct field on the anonymous test case and any related variables) to eliminate dead test data in TestComputeReadyCondition, locating the struct declaration in TestComputeReadyCondition and the per-case initializers that include readyShouldChange.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller_test.go`:
- Around line 551-638: The test defines an unused field readyShouldChange in the
testCases struct; remove the readyShouldChange field and any references to it
(the struct field on the anonymous test case and any related variables) to
eliminate dead test data in TestComputeReadyCondition, locating the struct
declaration in TestComputeReadyCondition and the per-case initializers that
include readyShouldChange.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5a2d27b5-e5dd-448c-8acf-e0de0e3f3c5f
⛔ Files ignored due to path filters (1)
vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.gois excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (5)
api/karpenter/v1beta1/karpenter_types.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller_test.go
✅ Files skipped from review due to trivial changes (1)
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/karpenter/v1beta1/karpenter_types.go
476587a to
7e9e260
Compare
|
@joshbranham: This pull request references ROSAENG-132 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws-techpreview |
|
/retest |
…fails Compute the Ready condition atomically in the EC2 node class controller by combining the upstream EC2NodeClass Ready status with the VersionResolved condition. When VersionResolved is False, Ready is overridden to False regardless of EC2 readiness. Previously the ignition controller set Ready directly, which could race with the ec2nodeclass controller that also patches status. Moving the computation into reconcileStatus ensures a single controller owns the Ready semantic and both conditions are evaluated in a single status patch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7e9e260 to
090e738
Compare
|
@joshbranham: This pull request references ROSAENG-132 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws-techpreview |
|
/test e2e-azure-self-managed |
|
Let's see if I can do this :-) /approve EDIT: guess not |
|
/lgtm |
|
Scheduling tests matching the |
Test Resultse2e-aws
e2e-aks
|
|
/test e2e-aks |
|
@joshbranham: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/approve |
|
@enxebre: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre, joshbranham, maxcao13 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Compute the Ready condition atomically in the EC2 node class controller
by combining the upstream EC2NodeClass Ready status with the
VersionResolved condition. When VersionResolved is False, Ready is
overridden to False regardless of EC2 readiness.
Which issue(s) this PR fixes:
Fixes ROSAENG-132
Special notes for your reviewer:
Checklist:
Summary by CodeRabbit