CNTRLPLANE-2219: feat(aws): migrate ec2 to AWS SDK v2#7871
CNTRLPLANE-2219: feat(aws): migrate ec2 to AWS SDK v2#7871openshift-merge-bot[bot] merged 3 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@LiangquanLi930: This pull request references CNTRLPLANE-2219 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMigrates the codebase from AWS SDK v1 to AWS SDK v2: adds a generated Changes
sequenceDiagram
participant Component as "Component (controller/CLI)"
participant Builder as "clientBuilder / GetEC2Client"
participant Config as "config.LoadDefaultConfig"
participant EC2 as "ec2.Client (v2)"
rect rgba(200,200,255,0.5)
Component->>Builder: request EC2 client (ctx)
Builder->>Config: LoadDefaultConfig(ctx, with user-agent)
Config-->>Builder: awsv2.Config (with Retryer)
Builder->>EC2: ec2.NewFromConfig(cfg, opts...)
EC2-->>Builder: ec2.Client (implements awsapi.EC2API)
Builder-->>Component: return awsapi.EC2API (EC2 client)
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
|
@LiangquanLi930: This pull request references CNTRLPLANE-2219 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/infra/aws/delegatingclientgenerator/main.go (1)
495-517: Deduplicate merged API lists to prevent duplicate generated methods.With the expanded EC2 extended set, Line 584 currently concatenates API lists without deduplication. If an API appears in both sources, generated interfaces will duplicate method declarations.
♻️ Proposed hardening in
generateInterfaceFilefunc generateInterfaceFile(service string, apis []string, extended []string) error { - // Merge extended APIs into the main list for a single interface - allAPIs := append(append([]string{}, apis...), extended...) - slices.Sort(allAPIs) + // Merge extended APIs into the main list for a single interface. + // Deduplicate to avoid duplicate method declarations in generated output. + apiSet := sets.New[string]() + apiSet.Insert(apis...) + apiSet.Insert(extended...) + allAPIs := apiSet.UnsortedList() + slices.Sort(allAPIs)Also applies to: 583-586
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/infra/aws/delegatingclientgenerator/main.go` around lines 495 - 517, The merged EC2 API slice used in generateInterfaceFile is concatenated without deduplication, causing duplicate method declarations when APIs overlap; update generateInterfaceFile to merge the two API sources into a deduplicated set (e.g., build a map[string]struct{} or set, iterate both source slices adding names once, then produce a sorted slice) before rendering interfaces so duplicated API names in the EC2 list (and any other concatenations around the same code path) are removed; reference generateInterfaceFile and the variables that hold the API lists where concatenation currently occurs and replace that concatenation with a set-based merge and rebuild of the final slice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/infra/aws/delegatingclientgenerator/main.go`:
- Around line 495-517: The merged EC2 API slice used in generateInterfaceFile is
concatenated without deduplication, causing duplicate method declarations when
APIs overlap; update generateInterfaceFile to merge the two API sources into a
deduplicated set (e.g., build a map[string]struct{} or set, iterate both source
slices adding names once, then produce a sorted slice) before rendering
interfaces so duplicated API names in the EC2 list (and any other concatenations
around the same code path) are removed; reference generateInterfaceFile and the
variables that hold the API lists where concatenation currently occurs and
replace that concatenation with a set-based merge and rebuild of the final
slice.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e6056f3f-c2e0-45cd-82bf-d6e8b7886ec0
⛔ Files ignored due to path filters (10)
vendor/github.com/aws/aws-sdk-go/private/protocol/ec2query/build.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/private/protocol/ec2query/unmarshal.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/api.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/customizations.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/doc.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/ec2iface/interface.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/errors.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/service.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/aws/aws-sdk-go/service/ec2/waiters.gois excluded by!**/vendor/**,!vendor/**vendor/modules.txtis excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (3)
cmd/infra/aws/delegating_client.gocmd/infra/aws/delegatingclientgenerator/main.gosupport/awsapi/ec2.go
ae92b5a to
24d69d2
Compare
24d69d2 to
c4fbe28
Compare
|
@LiangquanLi930: This pull request references CNTRLPLANE-2219 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)
2412-2416:⚠️ Potential issue | 🔴 CriticalHandle the not-found case from
GetSecurityGroupById.Per
support/awsutil/sg.go, this helper returns(nil, nil)when the SG no longer exists. Line 2416 then dereferencessg.Tagsand panics on stale status or out-of-band deletion instead of letting reconciliation recover.💡 Minimal guard
sg, err := supportawsutil.GetSecurityGroupById(ctx, r.ec2Client, hcp.Status.Platform.AWS.DefaultWorkerSecurityGroupID) if err != nil { return fmt.Errorf("failed to get default security group: %w", err) } + if sg == nil { + return fmt.Errorf("default security group %s was not found", hcp.Status.Platform.AWS.DefaultWorkerSecurityGroupID) + } lastAppliedTags = supportawsutil.EC2TagsToMap(sg.Tags)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` around lines 2412 - 2416, GetSecurityGroupById can return (nil, nil) when the security group is gone; add a nil guard after calling supportawsutil.GetSecurityGroupById (the call that assigns sg, err) so you don't dereference sg.Tags. If sg == nil, set lastAppliedTags to an empty map (or nil) and continue reconciliation instead of accessing sg.Tags; otherwise, keep the existing lastAppliedTags = supportawsutil.EC2TagsToMap(sg.Tags) assignment. This prevents a panic on out-of-band deletion and allows reconciliation to recover.control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go (2)
806-817:⚠️ Potential issue | 🟠 MajorAuthorize only the missing ingress rules.
Line 809 computes
missingPermissions, but the request still sendsingressPermissions. On a partially-correct security group, AWS can reject the batch for duplicate rules and the missing rules never get added.Proposed fix
ingressPermissions := supportawsutil.VPCEndpointSecurityGroupRules(machineCIDRs, vpcEndpointPort(awsEndpointService)) missingPermissions := diffPermissions(sg.IpPermissions, ingressPermissions) if len(missingPermissions) > 0 { if _, err = ec2Client.AuthorizeSecurityGroupIngress(ctx, &ec2v2.AuthorizeSecurityGroupIngressInput{ GroupId: awsv2.String(sgID), - IpPermissions: ingressPermissions, + IpPermissions: missingPermissions, }); err != nil { if supportawsutil.AWSErrorCode(err) != "InvalidPermission.Duplicate" { log.Error(err, "failed to set security group ingress rules", "id", sgID) return fmt.Errorf("failed to set security group ingress rules, code: %s", supportawsutil.AWSErrorCode(err)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go` around lines 806 - 817, The code computes missingPermissions but still calls ec2Client.AuthorizeSecurityGroupIngress with ingressPermissions; change the request to send only missingPermissions (IpPermissions: missingPermissions) and skip the Authorize call entirely if len(missingPermissions)==0 to avoid duplicate-rule rejects. Update the block around AuthorizeSecurityGroupIngress (the call in the awsprivatelink_controller.go that follows computing ingressPermissions and missingPermissions) to use missingPermissions and keep the existing error handling (supportawsutil.AWSErrorCode check) unchanged.
547-583:⚠️ Potential issue | 🔴 CriticalCheck for an empty
VpcEndpointsresult before indexing it.Line 565 reads
output.VpcEndpoints[0]before Line 578 handles the empty-slice case. If AWS returns no endpoints here, the controller panics instead of clearing status and retrying.Proposed fix
output, err := ec2Client.DescribeVpcEndpoints(ctx, &ec2v2.DescribeVpcEndpointsInput{ VpcEndpointIds: []string{endpointID}, }) if err != nil { log.Error(err, "failed to describe vpc endpoint", "endpointID", endpointID) var apiErr smithy.APIError if errors.As(err, &apiErr) { if apiErr.ErrorCode() == "InvalidVpcEndpointId.NotFound" { // clear the EndpointID so a new Endpoint is created on the requeue awsEndpointService.Status.EndpointID = "" return fmt.Errorf("endpoint with id %s not found, resetting status", endpointID) } else { return errors.New(apiErr.ErrorCode()) } } return err } + if len(output.VpcEndpoints) == 0 { + // This should not happen but just in case + // clear the EndpointID so a new Endpoint is created on the requeue + awsEndpointService.Status.EndpointID = "" + return fmt.Errorf("endpoint with id %s not found, resetting status", endpointID) + } + if awsv2.ToString(output.VpcEndpoints[0].ServiceName) != awsEndpointService.Status.EndpointServiceName { log.Info("endpoint links to wrong endpointservice, deleting...", "LinkedVPCEndpointServiceName", awsv2.ToString(output.VpcEndpoints[0].ServiceName), "WantedVPCEndpointService", awsEndpointService.Status.EndpointServiceName) if _, err := ec2Client.DeleteVpcEndpoints(ctx, &ec2v2.DeleteVpcEndpointsInput{ VpcEndpointIds: []string{awsv2.ToString(output.VpcEndpoints[0].VpcEndpointId)}, }); err != nil { log.Error(err, "failed to delete vpc endpoint", "id", awsv2.ToString(output.VpcEndpoints[0].VpcEndpointId)) return fmt.Errorf("error deleting AWSEndpoint: %w", err) } // Once the VPC Endpoint is deleted, we need to send an error in order to reexecute the reconcilliation return fmt.Errorf("current endpoint %s is not pointing to the existing .Status.EndpointServiceName, reconciling by deleting endpoint", awsv2.ToString(output.VpcEndpoints[0].ServiceName)) } - - if len(output.VpcEndpoints) == 0 { - // This should not happen but just in case - // clear the EndpointID so a new Endpoint is created on the requeue - awsEndpointService.Status.EndpointID = "" - return fmt.Errorf("endpoint with id %s not found, resetting status", endpointID) - }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go` around lines 547 - 583, The code indexes output.VpcEndpoints[0] before verifying the slice is non-empty, which can panic; after the DescribeVpcEndpoints call in the DescribeVpcEndpoints handling block (around ec2Client.DescribeVpcEndpoints and output.VpcEndpoints), immediately check if len(output.VpcEndpoints) == 0 and if so clear awsEndpointService.Status.EndpointID and return an error (the same "endpoint with id %s not found, resetting status" used later) before any access to output.VpcEndpoints[0]; relocate the existing empty-slice handling (the block that sets awsEndpointService.Status.EndpointID = "" and returns) to directly after the nil-error check so subsequent uses of output.VpcEndpoints[0] (e.g., comparing ServiceName or deleting the endpoint) are safe.
🧹 Nitpick comments (4)
cmd/bastion/aws/create.go (1)
188-188: Consider usingawsapi.EC2APIinterface for consistency.Similar to
destroy.go, the bastion create functions use concrete*ec2.Clienttypes. Consider usingawsapi.EC2APIinterface for consistency withcmd/infra/aws/files and better testability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/bastion/aws/create.go` at line 188, The function ensureBastionSecurityGroup currently accepts a concrete *ec2.Client; change its signature to accept the awsapi.EC2API interface instead (replace parameter type *ec2.Client with awsapi.EC2API) and update any callers to pass the interface-typed client; also update any other bastion create functions in this file that use *ec2.Client to use awsapi.EC2API for consistency with destroy.go and cmd/infra/aws/, and adjust imports to include the awsapi package; ensure all ec2 method calls used inside ensureBastionSecurityGroup remain compatible with the awsapi.EC2API interface.cmd/consolelogs/aws/getlogs.go (1)
126-126: Consider direct comparison instead of string conversion.
ec2types.InstanceStateNameis a string type alias, so you can compare directly without thestring()cast:♻️ Suggested simplification
- if string(instance.State.Name) == "running" { + if instance.State.Name == ec2types.InstanceStateNameRunning {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/consolelogs/aws/getlogs.go` at line 126, Replace the unnecessary string conversion in the instance state check: change the condition using string(instance.State.Name) to compare the EC2 state alias directly (e.g., instance.State.Name == "running") or preferably to the SDK constant ec2types.InstanceStateNameRunning; update the conditional that references instance.State.Name accordingly.cmd/bastion/aws/destroy.go (1)
133-144: Consider usingawsapi.EC2APIinterface for testability.The destroy functions use the concrete
*ec2.Clienttype, while other files in this PR (e.g.,cmd/infra/aws/destroy.go,cmd/infra/aws/ec2.go) use theawsapi.EC2APIinterface. Using the interface would improve testability and maintain consistency across the codebase.♻️ Suggested change
-func destroyBastion(ctx context.Context, logger logr.Logger, ec2Client *ec2.Client, infraID string) error { +func destroyBastion(ctx context.Context, logger logr.Logger, ec2Client awsapi.EC2API, infraID string) error {Apply similar changes to
destroyEC2Instance,destroySecurityGroup, anddestroyKeyPair.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/bastion/aws/destroy.go` around lines 133 - 144, The destroyBastion function and its callees currently accept *ec2.Client which hinders testing; change the signature of destroyBastion(ctx, logger, ec2Client, infraID) to accept the awsapi.EC2API interface instead of *ec2.Client, and update the calls to destroyEC2Instance, destroySecurityGroup, and destroyKeyPair to also take awsapi.EC2API (and adjust their implementations and any helper functions they call accordingly) so they match the interface used elsewhere (e.g., in destroyEC2Instance, destroySecurityGroup, destroyKeyPair); ensure types compile and tests can mock EC2 by replacing all concrete *ec2.Client parameter types with awsapi.EC2API in these functions and their call sites.control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go (1)
717-718: Require at least oneDescribeVpcEndpointscall here.
AnyTimes()lets these tests pass even if the reconciler stops issuing the v2 EC2 lookup entirely, so this migration path is no longer asserted.MinTimes(1)keeps the tests flexible without dropping the coverage.💡 Tighten both expectations
- mockEC2.EXPECT().DescribeVpcEndpoints(gomock.Any(), gomock.Any()).Return(&ec2.DescribeVpcEndpointsOutput{}, fmt.Errorf("not ready")).AnyTimes() + mockEC2.EXPECT().DescribeVpcEndpoints(gomock.Any(), gomock.Any()).Return(&ec2.DescribeVpcEndpointsOutput{}, fmt.Errorf("not ready")).MinTimes(1)Also applies to: 798-799
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go` around lines 717 - 718, The test currently sets up mockEC2 to expect DescribeVpcEndpoints with AnyTimes(), which allows zero calls; change the expectation on mockEC2.EXPECT().DescribeVpcEndpoints(gomock.Any(), gomock.Any()).Return(...).AnyTimes() to require at least one call by using MinTimes(1) instead of AnyTimes(), and apply the same change to the other occurrence around lines referencing mockEC2 (the second DescribeVpcEndpoints expectation at 798-799) so the reconciler must perform the v2 EC2 lookup at least once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/infra/aws/ec2.go`:
- Around line 627-634: The hasAssociatedSubnet function is incorrectly comparing
assoc.RouteTableId to subnetID; update the comparison inside hasAssociatedSubnet
to check aws.ToString(assoc.SubnetId) == subnetID instead of RouteTableId so it
correctly detects an association by subnet; keep the loop and return semantics
unchanged.
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Around line 2498-2508: The discovery filters in awsSecurityGroupFilters
currently rely on mutable tags ("Name" and "kubernetes.io/cluster/<infraID>")
that callers can override; change the destructive-path discovery
(create/delete/cleanup) to not rely on these overridable tag filters — instead
use the stored security group ID from status when available or a dedicated
immutable discovery tag, and ensure awsSecurityGroupFilters (and any call sites
that currently call it in create/delete/cleanup paths) are only used for
non-destructive listing; update callers that perform deletion or idempotent
create to prefer the stored SG ID (or the dedicated tag) over
awsSecurityGroupFilters/awsSecurityGroupName to avoid leaking or duplicating the
default worker SG.
---
Outside diff comments:
In
`@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go`:
- Around line 806-817: The code computes missingPermissions but still calls
ec2Client.AuthorizeSecurityGroupIngress with ingressPermissions; change the
request to send only missingPermissions (IpPermissions: missingPermissions) and
skip the Authorize call entirely if len(missingPermissions)==0 to avoid
duplicate-rule rejects. Update the block around AuthorizeSecurityGroupIngress
(the call in the awsprivatelink_controller.go that follows computing
ingressPermissions and missingPermissions) to use missingPermissions and keep
the existing error handling (supportawsutil.AWSErrorCode check) unchanged.
- Around line 547-583: The code indexes output.VpcEndpoints[0] before verifying
the slice is non-empty, which can panic; after the DescribeVpcEndpoints call in
the DescribeVpcEndpoints handling block (around ec2Client.DescribeVpcEndpoints
and output.VpcEndpoints), immediately check if len(output.VpcEndpoints) == 0 and
if so clear awsEndpointService.Status.EndpointID and return an error (the same
"endpoint with id %s not found, resetting status" used later) before any access
to output.VpcEndpoints[0]; relocate the existing empty-slice handling (the block
that sets awsEndpointService.Status.EndpointID = "" and returns) to directly
after the nil-error check so subsequent uses of output.VpcEndpoints[0] (e.g.,
comparing ServiceName or deleting the endpoint) are safe.
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Around line 2412-2416: GetSecurityGroupById can return (nil, nil) when the
security group is gone; add a nil guard after calling
supportawsutil.GetSecurityGroupById (the call that assigns sg, err) so you don't
dereference sg.Tags. If sg == nil, set lastAppliedTags to an empty map (or nil)
and continue reconciliation instead of accessing sg.Tags; otherwise, keep the
existing lastAppliedTags = supportawsutil.EC2TagsToMap(sg.Tags) assignment. This
prevents a panic on out-of-band deletion and allows reconciliation to recover.
---
Nitpick comments:
In `@cmd/bastion/aws/create.go`:
- Line 188: The function ensureBastionSecurityGroup currently accepts a concrete
*ec2.Client; change its signature to accept the awsapi.EC2API interface instead
(replace parameter type *ec2.Client with awsapi.EC2API) and update any callers
to pass the interface-typed client; also update any other bastion create
functions in this file that use *ec2.Client to use awsapi.EC2API for consistency
with destroy.go and cmd/infra/aws/, and adjust imports to include the awsapi
package; ensure all ec2 method calls used inside ensureBastionSecurityGroup
remain compatible with the awsapi.EC2API interface.
In `@cmd/bastion/aws/destroy.go`:
- Around line 133-144: The destroyBastion function and its callees currently
accept *ec2.Client which hinders testing; change the signature of
destroyBastion(ctx, logger, ec2Client, infraID) to accept the awsapi.EC2API
interface instead of *ec2.Client, and update the calls to destroyEC2Instance,
destroySecurityGroup, and destroyKeyPair to also take awsapi.EC2API (and adjust
their implementations and any helper functions they call accordingly) so they
match the interface used elsewhere (e.g., in destroyEC2Instance,
destroySecurityGroup, destroyKeyPair); ensure types compile and tests can mock
EC2 by replacing all concrete *ec2.Client parameter types with awsapi.EC2API in
these functions and their call sites.
In `@cmd/consolelogs/aws/getlogs.go`:
- Line 126: Replace the unnecessary string conversion in the instance state
check: change the condition using string(instance.State.Name) to compare the EC2
state alias directly (e.g., instance.State.Name == "running") or preferably to
the SDK constant ec2types.InstanceStateNameRunning; update the conditional that
references instance.State.Name accordingly.
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go`:
- Around line 717-718: The test currently sets up mockEC2 to expect
DescribeVpcEndpoints with AnyTimes(), which allows zero calls; change the
expectation on mockEC2.EXPECT().DescribeVpcEndpoints(gomock.Any(),
gomock.Any()).Return(...).AnyTimes() to require at least one call by using
MinTimes(1) instead of AnyTimes(), and apply the same change to the other
occurrence around lines referencing mockEC2 (the second DescribeVpcEndpoints
expectation at 798-799) so the reconciler must perform the v2 EC2 lookup at
least once.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bca4a07a-9052-4b29-9e36-5528c44052b7
📒 Files selected for processing (14)
cmd/bastion/aws/create.gocmd/bastion/aws/destroy.gocmd/consolelogs/aws/getlogs.gocmd/infra/aws/create.gocmd/infra/aws/delegating_client.gocmd/infra/aws/delegatingclientgenerator/main.gocmd/infra/aws/destroy.gocmd/infra/aws/ec2.gocmd/infra/aws/route53.gocontrol-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.gocontrol-plane-operator/controllers/awsprivatelink/awsprivatelink_controller_test.gocontrol-plane-operator/controllers/healthcheck/aws.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go
c4fbe28 to
a9eb9ed
Compare
|
@LiangquanLi930: This pull request references CNTRLPLANE-2219 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
test/e2e/nodepool_day2_tags_test.go (1)
86-93:⚠️ Potential issue | 🟡 MinorUse the current
awsMachinewhen resolving the EC2 instance.Line 89 still reads
awsMachines.Items[0].Spec.InstanceID, so each loop iteration describes the first instance only. If this list ever contains multiple machines, the test can pass without checking tags on the others.Suggested fix
- instanceID := awsMachines.Items[0].Spec.InstanceID + instanceID := awsMachine.Spec.InstanceID // Fetch the EC2 instance to verify the tag instance, err := ec2client.DescribeInstances(ar.ctx, &ec2.DescribeInstancesInput{ InstanceIds: []string{aws.ToString(instanceID)}, })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/nodepool_day2_tags_test.go` around lines 86 - 93, The loop is always describing the first EC2 instance because it uses awsMachines.Items[0].Spec.InstanceID; change the call to use the current loop variable awsMachine.Spec.InstanceID so each iteration calls ec2client.DescribeInstances with the right InstanceID (keep using DescribeInstances and InstanceIds as before) and validate tags for each awsMachine rather than always for the first item.control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go (2)
547-583:⚠️ Potential issue | 🔴 CriticalCheck for zero endpoints before reading
output.VpcEndpoints[0].Line 565 indexes
output.VpcEndpoints[0], but the empty-result fallback is on Line 578. IfDescribeVpcEndpointssucceeds with no matches, this path panics instead of resettingStatus.EndpointID.🐛 Proposed fix
- if awsv2.ToString(output.VpcEndpoints[0].ServiceName) != awsEndpointService.Status.EndpointServiceName { + if len(output.VpcEndpoints) == 0 { + awsEndpointService.Status.EndpointID = "" + return fmt.Errorf("endpoint with id %s not found, resetting status", endpointID) + } + + if awsv2.ToString(output.VpcEndpoints[0].ServiceName) != awsEndpointService.Status.EndpointServiceName { log.Info("endpoint links to wrong endpointservice, deleting...", "LinkedVPCEndpointServiceName", awsv2.ToString(output.VpcEndpoints[0].ServiceName), "WantedVPCEndpointService", awsEndpointService.Status.EndpointServiceName) if _, err := ec2Client.DeleteVpcEndpoints(ctx, &ec2v2.DeleteVpcEndpointsInput{ VpcEndpointIds: []string{awsv2.ToString(output.VpcEndpoints[0].VpcEndpointId)}, }); err != nil { log.Error(err, "failed to delete vpc endpoint", "id", awsv2.ToString(output.VpcEndpoints[0].VpcEndpointId)) return fmt.Errorf("error deleting AWSEndpoint: %w", err) } @@ - if len(output.VpcEndpoints) == 0 { - // This should not happen but just in case - // clear the EndpointID so a new Endpoint is created on the requeue - awsEndpointService.Status.EndpointID = "" - return fmt.Errorf("endpoint with id %s not found, resetting status", endpointID) - } log.Info("endpoint exists", "endpointID", endpointID)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go` around lines 547 - 583, DescribeVpcEndpoints result is being indexed at output.VpcEndpoints[0] before checking that any endpoints exist; move the len(output.VpcEndpoints) == 0 check immediately after the DescribeVpcEndpoints success and before any access to output.VpcEndpoints[0], reset awsEndpointService.Status.EndpointID and return the reset error there, and then perform comparisons/deletion logic (the code that references output.VpcEndpoints[0].ServiceName and VpcEndpointId) only after confirming len(output.VpcEndpoints) > 0 so no out-of-range panic occurs.
222-257:⚠️ Potential issue | 🔴 CriticalBuild AWS clients from an HCP snapshot, not shared mutable state.
initializeWithHCP()mutates the builder state andgetClients()reads it later under a different lock. Because this controller reconciles concurrently, another request can swap the assume-role ARNs between those calls and this reconcile can act with the wrong AWS credentials/account.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go` around lines 222 - 257, The getClients method reads mutable builder fields (assumeSharedVPCEndpointRoleARN, assumeSharedVPCRoute53RoleARN, awsConfig) that initializeWithHCP may mutate concurrently; capture the HCP-derived values into local variables (or accept the HCP snapshot as a parameter) while holding the builder lock (or copy them atomically) and then use those locals to build sts clients and credentials providers in getClients (i.e., do not reference b.assumeSharedVPCEndpointRoleARN or b.assumeSharedVPCRoute53RoleARN after releasing the lock); ensure awsConfig(ctx) is called with the correct per-snapshot configuration rather than relying on shared mutable state so each reconcile builds AWS clients from the immutable snapshot values.hypershift-operator/controllers/platform/aws/controller.go (1)
546-566:⚠️ Potential issue | 🟡 MinorMissing pagination handling in
findExistingVpcEndpointService.
DescribeVpcEndpointServiceConfigurationsmay return paginated results. If there are many VPC endpoint services, only the first page will be searched and the function may incorrectly report "no endpoint service found" when one exists on a subsequent page.🔧 Proposed fix to handle pagination
func findExistingVpcEndpointService(ctx context.Context, ec2Client awsapi.EC2API, lbARN string) (string, string, error) { - output, err := ec2Client.DescribeVpcEndpointServiceConfigurations(ctx, &ec2v2.DescribeVpcEndpointServiceConfigurationsInput{}) - if err != nil { - var apiErr smithy.APIError - if errors.As(err, &apiErr) { - return "", "", errors.New(apiErr.ErrorCode()) + var nextToken *string + for { + output, err := ec2Client.DescribeVpcEndpointServiceConfigurations(ctx, &ec2v2.DescribeVpcEndpointServiceConfigurationsInput{ + NextToken: nextToken, + }) + if err != nil { + var apiErr smithy.APIError + if errors.As(err, &apiErr) { + return "", "", errors.New(apiErr.ErrorCode()) + } + return "", "", err } - return "", "", err - } - if len(output.ServiceConfigurations) == 0 { - return "", "", fmt.Errorf("no endpoint services found") - } - for _, svc := range output.ServiceConfigurations { - for _, lbArn := range svc.NetworkLoadBalancerArns { - if lbArn == lbARN { - return awsv2.ToString(svc.ServiceName), awsv2.ToString(svc.ServiceId), nil + for _, svc := range output.ServiceConfigurations { + for _, lbArn := range svc.NetworkLoadBalancerArns { + if lbArn == lbARN { + return awsv2.ToString(svc.ServiceName), awsv2.ToString(svc.ServiceId), nil + } } } + if output.NextToken == nil { + break + } + nextToken = output.NextToken } return "", "", fmt.Errorf("no endpoint service found with load balancer ARN %s", lbARN) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/platform/aws/controller.go` around lines 546 - 566, The function findExistingVpcEndpointService currently only inspects the first page returned by DescribeVpcEndpointServiceConfigurations; update it to handle pagination by iterating through all pages (using the SDK paginator or looping with NextToken) and accumulate or check ServiceConfigurations across pages, preserving the existing error handling (including the smithy.APIError branch) and returning the matching ServiceName and ServiceId as soon as a NetworkLoadBalancerArn equals lbARN; ensure you still return the same "no endpoint service found..." error if no match is found after all pages are processed.cmd/infra/aws/destroy.go (1)
253-275:⚠️ Potential issue | 🔴 CriticalInitialize
vpcOwnerEC2Clientin the delegated-client path.This branch leaves
vpcOwnerEC2Clientnil, but the rest ofDestroyInfraalways uses it for owner-side cleanup. The first paginator over that nil interface will panic instead of destroying anything.🛠️ Proposed fix
} else { if o.VPCOwnerCredentialsOpts.AWSCredentialsFile != "" { return fmt.Errorf("delegating client is not supported for shared vpc infrastructure") } delegatingClent, err := NewDelegatingClient( ctx, o.AWSEbsCsiDriverControllerCredentialsFile, o.CloudControllerCredentialsFile, o.CloudNetworkConfigControllerCredentialsFile, o.ControlPlaneOperatorCredentialsFile, o.NodePoolCredentialsFile, o.OpenshiftImageRegistryCredentialsFile, ) if err != nil { return fmt.Errorf("failed to create delegating client: %w", err) } ec2Client = delegatingClent.EC2Client + vpcOwnerEC2Client = ec2Client elbClient = delegatingClent.ELBClient elbv2Client = delegatingClent.ELBV2Client listRoute53Client = delegatingClent.ROUTE53Client recordsRoute53Client = delegatingClent.ROUTE53Client s3Client = delegatingClent.S3Client }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/infra/aws/destroy.go` around lines 253 - 275, The delegated-client branch fails to set vpcOwnerEC2Client causing downstream panics; after creating delegatingClent in the else branch (where ec2Client, elbClient, elbv2Client, listRoute53Client, recordsRoute53Client, s3Client are assigned) also initialize vpcOwnerEC2Client from the delegating client (e.g., vpcOwnerEC2Client = delegatingClent.VPCOwnerEC2Client) so the owner-side cleanup paginators have a non-nil client.
♻️ Duplicate comments (1)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)
2498-2508:⚠️ Potential issue | 🟠 MajorDon't discover the worker security group via overridable tags.
awsSecurityGroupTags()lets callers override bothNameandkubernetes.io/cluster/<infraID>, so these filters can stop matching the SG on create/delete paths. That can leak the SG or recreate duplicates; destructive lookup should use the stored SG ID or an immutable discovery tag instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` around lines 2498 - 2508, The awsSecurityGroupFilters function is using overridable tags (Name and tag:kubernetes.io/cluster/<infraID>) which callers can override via awsSecurityGroupTags and thus can break destructive lookups; update the discovery logic so awsSecurityGroupFilters no longer relies on those mutable tags—use the stored security-group ID when available (the create/delete code paths should prefer the persisted SG ID), or switch to a stable/immutable discovery tag (add/read an immutable tag key and use that in awsSecurityGroupFilters) and stop matching on awsSecurityGroupName and the cluster tag; update calls that currently depend on awsSecurityGroupFilters to fall back to the stored ID or the new immutable tag lookup.
🧹 Nitpick comments (4)
hypershift-operator/main.go (1)
419-425: Consider improving error handling for AWS configuration loading.The
NewSessionV2function incmd/infra/aws/util/util.go(line 161) silently discards errors fromconfigv2.LoadDefaultConfigwithcfg, _ :=, which means credential resolution failures produce a zero-value config and cause unclear errors later. While the operator environment is expected to have AWS credentials available via the default credential chain (IAM role or environment variables), adding explicit error handling inNewSessionV2to fail fast with a clear message would improve robustness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/main.go` around lines 419 - 425, The call to awsutil.NewSessionV2 is currently ignoring credential/config load errors downstream; update NewSessionV2 to return (aws.Config, error) and change the call site in main.go (where awsSession := awsutil.NewSessionV2(...)) to check the error, log a clear message and exit or propagate the error when config/credentials fail to load; specifically modify the NewSessionV2 function to surface the error returned by configv2.LoadDefaultConfig and then update the usage in main.go (the awsutil.NewSessionV2 invocation and subsequent ec2.NewFromConfig call) to handle that error before creating ec2Client.test/e2e/nodepool_autorepair_test.go (1)
99-104: Fail fast on AWS config setup here.Line 100 uses
context.Background(), and the helper it calls incmd/infra/aws/util/util.go:144-162currently discards theLoadDefaultConfigerror. That lets this test build an EC2 client even when AWS config loading has already failed, so the real setup problem can be deferred until the firstTerminateInstancescall. Since the same helper also routescredentialsFilethroughWithSharedConfigFiles, please passar.ctxthrough, return the config-load error, and verify that the explicit credentials file is being wired through the intended AWS SDK v2 load option. (pkg.go.dev)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/nodepool_autorepair_test.go` around lines 99 - 104, The test should fail fast when AWS config loading fails: change ec2Client to accept and pass a context (use ar.ctx) into awsutil.NewSessionV2 instead of context.Background(), and update awsutil.NewSessionV2 to return any config-load error rather than discarding it so ec2Client can propagate that error to the caller; also ensure the explicit credentialsFile is passed through awsutil.NewSessionV2 into the AWS SDK v2 option WithSharedConfigFiles so the provided credentials file is actually used.cmd/consolelogs/aws/getlogs.go (1)
145-163:defer cancel()inside loop may delay context cancellation.The
defer cancel()at line 147 will not execute until the function returns, not at each loop iteration. This means timeout contexts accumulate. While not strictly a bug (they'll be cleaned up eventually), consider callingcancel()explicitly after each iteration for better resource hygiene.♻️ Proposed fix
for name, instanceID := range instances { ctxWithTimeout, cancel := context.WithTimeout(ctx, 2*time.Minute) - defer cancel() output, err := ec2Client.GetConsoleOutput(ctxWithTimeout, &ec2.GetConsoleOutputInput{ InstanceId: aws.String(instanceID), }) + cancel() // Cancel immediately after use to free resources if err != nil { errs = append(errs, err) continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/consolelogs/aws/getlogs.go` around lines 145 - 163, The loop creates a per-iteration context (ctxWithTimeout) but uses defer cancel(), which defers cancellation until the surrounding function returns and lets contexts accumulate; replace the defer with an explicit cancel() at the end of each iteration (after writing the file / appending errs) or wrap the loop body in an anonymous function and keep defer cancel() inside that closure so ctxWithTimeout is cancelled immediately each iteration; look for the for name, instanceID := range instances loop, ctxWithTimeout/cancel, ec2Client.GetConsoleOutput, base64.StdEncoding.DecodeString and os.WriteFile to apply the change.hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go (1)
332-336: Assert the requested instance type in the mock expectation.Using
gomock.Any()for theDescribeInstanceTypesinput means this test still passes ifGetInstanceTypeInfostops scoping the request tott.instanceType. MatchingDescribeInstanceTypesInput.InstanceTypeshere would keep that contract under test.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go` around lines 332 - 336, The test currently uses gomock.Any() for DescribeInstanceTypes input so it doesn't assert the requested instance type; update the mockEC2 expectation for DescribeInstanceTypes to verify the DescribeInstanceTypesInput.InstanceTypes contains the expected tt.instanceType (the input arg to GetInstanceTypeInfo) instead of gomock.Any(). Locate mockEC2.EXPECT().DescribeInstanceTypes and replace the second matcher with a gomock matcher (AssignableToTypeOf + Matches or a custom matcher) that casts the input to *ec2.DescribeInstanceTypesInput and returns true only when InstanceTypes includes exactly tt.instanceType, keeping the return values (&ec2.DescribeInstanceTypesOutput{InstanceTypes: tt.instanceTypes}, tt.describeErr) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go`:
- Around line 807-812: The code computes missingPermissions :=
diffPermissions(sg.IpPermissions, ingressPermissions) but then calls
ec2Client.AuthorizeSecurityGroupIngress with the full ingressPermissions, which
can cause InvalidPermission.Duplicate errors and fail to add only the missing
rules; change the AuthorizeSecurityGroupIngress call in the branch that checks
len(missingPermissions) > 0 to use missingPermissions as the IpPermissions value
(i.e., pass missingPermissions to AuthorizeSecurityGroupIngressInput instead of
ingressPermissions), keeping GroupId as awsv2.String(sgID) and preserving the
same ctx and error handling.
- Around line 999-1014: The DeleteVpcEndpoints call currently returns an error
when the endpoint is already gone; update the error handling around
ec2Client.DeleteVpcEndpoints (the block that checks endpointID != "" and calls
ec2Client.DeleteVpcEndpoints) to detect smithy.APIError with ErrorCode() ==
"InvalidVpcEndpointId.NotFound" and treat that as success (do not return an
error), mirroring the DescribeVpcEndpoints handling that ignores that specific
API error; ensure other errors are still returned unchanged.
In `@control-plane-operator/controllers/healthcheck/aws.go`:
- Around line 51-56: The code currently checks apiErr.ErrorCode() ==
"WebIdentityErr" which never matches in AWS SDK v2; update the condition around
the errors.As(err, &apiErr) block to check for the actual STS error codes
returned by AssumeRoleWithWebIdentity (InvalidIdentityToken, IDPRejectedClaim,
IDPCommunicationError) instead of "WebIdentityErr" so the metav1.Condition (the
ValidAWSIdentityProvider condition) is set correctly; modify the if that
compares apiErr.ErrorCode() to use a logical OR of those three strings and leave
the rest of the condition construction (metav1.Condition, status false,
reason/message handling) unchanged.
In `@karpenter-operator/controllers/karpenter/machine_approver.go`:
- Around line 259-264: NewSessionV2 currently swallows the LoadDefaultConfig
error, causing getEC2Client to return a usable ec2.Client even when config
loading failed; change awsutil.NewSessionV2 to return (aws.Config, error) (or at
least include an error return) and propagate the error from
config.LoadDefaultConfig instead of assigning it to the blank identifier, then
update getEC2Client to call awsutil.NewSessionV2, check the returned error and
return nil,err when config loading fails; ensure callers reference the new
signature (awsutil.NewSessionV2, getEC2Client) and handle the error path so
client construction fails fast on invalid AWS config.
In `@test/e2e/nodepool_kms_root_volume_test.go`:
- Line 111: The assertion using BeElementOf() with no arguments is
invalid/meaningless; remove or replace it. Locate the test assertion that calls
BeElementOf() on resp.KmsKeyId and either delete that line if redundant given
the surrounding checks (e.g., other assertions around resp.KmsKeyId) or call
BeElementOf() with the intended expected values (provide the allowed KMS key
ID(s)) so the matcher has arguments; ensure the test uses the corrected
assertion for resp.KmsKeyId.
In `@test/e2e/util/aws.go`:
- Around line 53-63: GetDefaultSecurityGroup currently creates its own
context.Background() which prevents callers from canceling
DescribeSecurityGroups; change the function signature to accept a context (e.g.,
func GetDefaultSecurityGroup(ctx context.Context, awsCreds, awsRegion, sgID
string) (*ec2types.SecurityGroup, error)), remove context.Background() usage,
and pass the received ctx into awsutil.NewSessionV2, ec2.NewFromConfig calls,
and the ec2Client.DescribeSecurityGroups call so the outer
Eventually(...).WithContext(ctx) can cancel AWS requests.
---
Outside diff comments:
In `@cmd/infra/aws/destroy.go`:
- Around line 253-275: The delegated-client branch fails to set
vpcOwnerEC2Client causing downstream panics; after creating delegatingClent in
the else branch (where ec2Client, elbClient, elbv2Client, listRoute53Client,
recordsRoute53Client, s3Client are assigned) also initialize vpcOwnerEC2Client
from the delegating client (e.g., vpcOwnerEC2Client =
delegatingClent.VPCOwnerEC2Client) so the owner-side cleanup paginators have a
non-nil client.
In
`@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go`:
- Around line 547-583: DescribeVpcEndpoints result is being indexed at
output.VpcEndpoints[0] before checking that any endpoints exist; move the
len(output.VpcEndpoints) == 0 check immediately after the DescribeVpcEndpoints
success and before any access to output.VpcEndpoints[0], reset
awsEndpointService.Status.EndpointID and return the reset error there, and then
perform comparisons/deletion logic (the code that references
output.VpcEndpoints[0].ServiceName and VpcEndpointId) only after confirming
len(output.VpcEndpoints) > 0 so no out-of-range panic occurs.
- Around line 222-257: The getClients method reads mutable builder fields
(assumeSharedVPCEndpointRoleARN, assumeSharedVPCRoute53RoleARN, awsConfig) that
initializeWithHCP may mutate concurrently; capture the HCP-derived values into
local variables (or accept the HCP snapshot as a parameter) while holding the
builder lock (or copy them atomically) and then use those locals to build sts
clients and credentials providers in getClients (i.e., do not reference
b.assumeSharedVPCEndpointRoleARN or b.assumeSharedVPCRoute53RoleARN after
releasing the lock); ensure awsConfig(ctx) is called with the correct
per-snapshot configuration rather than relying on shared mutable state so each
reconcile builds AWS clients from the immutable snapshot values.
In `@hypershift-operator/controllers/platform/aws/controller.go`:
- Around line 546-566: The function findExistingVpcEndpointService currently
only inspects the first page returned by
DescribeVpcEndpointServiceConfigurations; update it to handle pagination by
iterating through all pages (using the SDK paginator or looping with NextToken)
and accumulate or check ServiceConfigurations across pages, preserving the
existing error handling (including the smithy.APIError branch) and returning the
matching ServiceName and ServiceId as soon as a NetworkLoadBalancerArn equals
lbARN; ensure you still return the same "no endpoint service found..." error if
no match is found after all pages are processed.
In `@test/e2e/nodepool_day2_tags_test.go`:
- Around line 86-93: The loop is always describing the first EC2 instance
because it uses awsMachines.Items[0].Spec.InstanceID; change the call to use the
current loop variable awsMachine.Spec.InstanceID so each iteration calls
ec2client.DescribeInstances with the right InstanceID (keep using
DescribeInstances and InstanceIds as before) and validate tags for each
awsMachine rather than always for the first item.
---
Duplicate comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Around line 2498-2508: The awsSecurityGroupFilters function is using
overridable tags (Name and tag:kubernetes.io/cluster/<infraID>) which callers
can override via awsSecurityGroupTags and thus can break destructive lookups;
update the discovery logic so awsSecurityGroupFilters no longer relies on those
mutable tags—use the stored security-group ID when available (the create/delete
code paths should prefer the persisted SG ID), or switch to a stable/immutable
discovery tag (add/read an immutable tag key and use that in
awsSecurityGroupFilters) and stop matching on awsSecurityGroupName and the
cluster tag; update calls that currently depend on awsSecurityGroupFilters to
fall back to the stored ID or the new immutable tag lookup.
---
Nitpick comments:
In `@cmd/consolelogs/aws/getlogs.go`:
- Around line 145-163: The loop creates a per-iteration context (ctxWithTimeout)
but uses defer cancel(), which defers cancellation until the surrounding
function returns and lets contexts accumulate; replace the defer with an
explicit cancel() at the end of each iteration (after writing the file /
appending errs) or wrap the loop body in an anonymous function and keep defer
cancel() inside that closure so ctxWithTimeout is cancelled immediately each
iteration; look for the for name, instanceID := range instances loop,
ctxWithTimeout/cancel, ec2Client.GetConsoleOutput,
base64.StdEncoding.DecodeString and os.WriteFile to apply the change.
In `@hypershift-operator/controllers/nodepool/instancetype/aws/provider_test.go`:
- Around line 332-336: The test currently uses gomock.Any() for
DescribeInstanceTypes input so it doesn't assert the requested instance type;
update the mockEC2 expectation for DescribeInstanceTypes to verify the
DescribeInstanceTypesInput.InstanceTypes contains the expected tt.instanceType
(the input arg to GetInstanceTypeInfo) instead of gomock.Any(). Locate
mockEC2.EXPECT().DescribeInstanceTypes and replace the second matcher with a
gomock matcher (AssignableToTypeOf + Matches or a custom matcher) that casts the
input to *ec2.DescribeInstanceTypesInput and returns true only when
InstanceTypes includes exactly tt.instanceType, keeping the return values
(&ec2.DescribeInstanceTypesOutput{InstanceTypes: tt.instanceTypes},
tt.describeErr) unchanged.
In `@hypershift-operator/main.go`:
- Around line 419-425: The call to awsutil.NewSessionV2 is currently ignoring
credential/config load errors downstream; update NewSessionV2 to return
(aws.Config, error) and change the call site in main.go (where awsSession :=
awsutil.NewSessionV2(...)) to check the error, log a clear message and exit or
propagate the error when config/credentials fail to load; specifically modify
the NewSessionV2 function to surface the error returned by
configv2.LoadDefaultConfig and then update the usage in main.go (the
awsutil.NewSessionV2 invocation and subsequent ec2.NewFromConfig call) to handle
that error before creating ec2Client.
In `@test/e2e/nodepool_autorepair_test.go`:
- Around line 99-104: The test should fail fast when AWS config loading fails:
change ec2Client to accept and pass a context (use ar.ctx) into
awsutil.NewSessionV2 instead of context.Background(), and update
awsutil.NewSessionV2 to return any config-load error rather than discarding it
so ec2Client can propagate that error to the caller; also ensure the explicit
credentialsFile is passed through awsutil.NewSessionV2 into the AWS SDK v2
option WithSharedConfigFiles so the provided credentials file is actually used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 73cde95b-7d4e-4ec0-9c35-f6f05934ddee
📒 Files selected for processing (33)
cmd/bastion/aws/create.gocmd/bastion/aws/destroy.gocmd/consolelogs/aws/getlogs.gocmd/infra/aws/create.gocmd/infra/aws/destroy.gocmd/infra/aws/ec2.gocmd/infra/aws/route53.gocontrol-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.gocontrol-plane-operator/controllers/awsprivatelink/awsprivatelink_controller_test.gocontrol-plane-operator/controllers/healthcheck/aws.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.gohypershift-operator/controllers/nodepool/instancetype/aws/provider.gohypershift-operator/controllers/nodepool/instancetype/aws/provider_test.gohypershift-operator/controllers/nodepool/metrics/metrics.gohypershift-operator/controllers/nodepool/metrics/metrics_test.gohypershift-operator/controllers/nodepool/nodepool_controller.gohypershift-operator/controllers/platform/aws/controller.gohypershift-operator/controllers/platform/aws/controller_test.gohypershift-operator/main.gokarpenter-operator/controllers/karpenter/machine_approver.gokarpenter-operator/controllers/karpenter/machine_approver_test.gosupport/awsclient/client.gosupport/awsclient/fake/fake.gosupport/awsutil/sg.gotest/e2e/karpenter_test.gotest/e2e/nodepool_autorepair_test.gotest/e2e/nodepool_day2_tags_test.gotest/e2e/nodepool_imagetype_test.gotest/e2e/nodepool_kms_root_volume_test.gotest/e2e/util/aws.gotest/e2e/util/dump/journals.gotest/e2e/util/util.go
💤 Files with no reviewable changes (2)
- support/awsclient/fake/fake.go
- support/awsclient/client.go
|
/test unit |
0fe7f4e to
5e9e1ca
Compare
|
/test e2e-aws |
| state := aws.StringValue(conn.VpcEndpointState) | ||
| state := string(conn.VpcEndpointState) | ||
| switch state { | ||
| case "pendingAcceptance", "pending", "available": |
There was a problem hiding this comment.
if (and only if) you remodify the PR there's no constant for this? Sometimes AWS-SDK surprises me :)
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
/lgtm |
5e9e1ca to
cc73f8a
Compare
|
/pipeline required |
|
Scheduling tests matching the |
Signed-off-by: Liangquan Li <liangli@redhat.com>
cc73f8a to
2a7fdb5
Compare
|
/pipeline required |
|
Scheduling tests matching the |
|
/retest |
|
/test e2e-aws |
|
@LiangquanLi930: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
| log.Info("existing endpoint service not found, adoption failed", "err", err) | ||
| return errors.New(awsErr.Code()) | ||
| return errors.New(apiErr.ErrorCode()) |
There was a problem hiding this comment.
I know this is just a migration without refactors, should keep those redundant messages and tackle them in a follow up PR? (multiple times in this PR)
There was a problem hiding this comment.
Yes, keeping as-is since this is a migration-only PR. The inconsistency between the logged error and the returned error code is a pre-existing issue — will tackle it in a follow-up tracked in CNTRLPLANE-2078.
jparrill
left a comment
There was a problem hiding this comment.
Dropped a question. Let me know for tagging
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jparrill, LiangquanLi930, sdminonne The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by e2e test |
|
@LiangquanLi930: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes
Special notes for your reviewer:
Checklist:
Summary by CodeRabbit
Refactor
Tests