Conversation
|
@tthvo: This pull request references CORS-4072 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177 openshift/machine-api-provider-aws#171 |
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177 openshift/machine-api-provider-aws#171 |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-aws-usgov-ipi-private-ep-fips-f7 |
|
@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/70bcb810-1773-11f1-831b-7034f4f09cdc-0 |
|
/label platform/aws |
|
/test golint |
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177 |
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177 |
patrickdillon
left a comment
This LGTM, but I had a few questions, particularly about why we need the node port changes.
The code changes for that and SSH look good, but it's unfortunate we need to do that. I suppose, particularly for SSH, it does not make sense to use Security Group Roles? i.e. we instruct users to add the role to the installer host
I'm less sure about node port... if we do need it, seems like Security Group Role might be ok there?
|
Also, I don't really get how install is completing in the test with MAPI PR: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-installer-10353-openshift-machine-api-provider-aws-177-e2e-aws-ovn-dualstack-ipv4-primary-techpreview/2029358234887786496 I thought we were blocked by ingress on this. I know previously you were using the node port hack to get around the ingress issue, and there are still node port changes in this PR, but I thought those were just security group rules. Seems like I'm missing something |
The security group already has rules to allow node ports (30000-32767) to However, we also need to allow traffic from the load balancers created by CCM for
Right, unfortunately, the roles are specifically for cluster components (i.e. nodes, load balancers), so we can't really add one to the installer host. Also, a role corresponds to a security group, which means the installer host may be allowed to access more ports than intended (for example, 30000-32767). |
Right, we are still blocked by ingress and CCM as of today. However, in dualstack IPv4 primary, the following happens:
This means IPv4 traffic will flow to routers just fine even though IPv6 is not supported. To the cluster operators, the setup is the same as a regular singlestack IPv4 install since they only care whether the router is reachable (regardless of IP family). Thus, the install completed. For dualstack IPv6 primary, the That being said, it's a "false positive" that the install succeeded :D |
I see! Thinking about this more, as I mentioned above, this should actually be handled by the ingress component... Technically we should not have to deal with this. In the case that we actually do need to do something like this, I do wonder if we could pre-provision, or use a deterministic name for, a security group, so that ingress creates the LB in a security group we have pre-created rules for. |
Ahh yess, the installer can pre-create an additional security group with only allowed node port ingress rule (30000-32767) and attach it to cluster machines, similar to AdditionalSecurityGroupIDs feature. The only issue is that 30000-32767 is still open to all IPs if |
|
Regarding ingress services/nodeport, I realize now I was assuming that aws would have the same behavior as cloud-provider-gcp, but it does not. So sorry for confusing things. In gcp, the cloud provider will handle opening all the ports for an ingress service, but that does not happen in |
Right, I was thinking we would leave That said, the current approach looks good and ingress isn't even sorted out, so maybe that is just something we want to consider once ingress is in place, as it might help simplify our code a bit. |
I guess if we can use an openshift fork of CAPA, we can diverge from upstream a bit to restrict the node port service ingress to the VPC CIDRs (i.e. instead of all IPs) by default if left unset. That would address all the concerns and make things so much simpler 😅 For now, I simplified the changes a bit to only apply the patch to node port ingress. For SSH, we only need to allow VPC CIDRs in a private install; however, a private install requires BYO subnets, so those CIDRs are already allowed in the rule. See 1802f78 |
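The defaulting behavior proposed above (restrict node-port ingress to the VPC CIDRs when the user left the field unset) can be sketched as a tiny self-contained helper. This is a sketch only: the function and parameter names are illustrative, not the actual CAPA `NodePortIngressRuleCidrBlocks` plumbing.

```go
package main

import "fmt"

// nodePortIngressCIDRs mirrors the defaulting idea discussed above: if the
// node-port ingress CIDR list was left unset, fall back to the VPC CIDRs
// instead of opening 30000-32767 to all IPs. Names are illustrative, not
// the actual CAPA API.
func nodePortIngressCIDRs(configured, vpcCIDRs []string) []string {
	if len(configured) > 0 {
		return configured // an explicit setting wins
	}
	return vpcCIDRs // restrict node ports to in-VPC sources by default
}

func main() {
	fmt.Println(nodePortIngressCIDRs(nil, []string{"10.0.0.0/16", "fd00:10::/56"}))
	fmt.Println(nodePortIngressCIDRs([]string{"0.0.0.0/0"}, []string{"10.0.0.0/16"}))
}
```

The point of the sketch is only the precedence rule: explicit configuration is honored verbatim, and the safer VPC-scoped default applies otherwise.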
|
/verified by @tthvo |
|
@tthvo: This PR has been marked as verified by @tthvo. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)
589-600: ⚠️ Potential issue | 🟠 Major: Match the actual bootstrap SSH rule instead of only `0.0.0.0/0`.
`pkg/asset/manifests/aws/cluster.go:37-45,156-161` now builds `BootstrapSSHDescription` from `sshRuleCidrs`, so private clusters use machine/VPC CIDRs and dual-stack can add IPv6 ranges. This verifier still waits only on an IPv4 `0.0.0.0/0` source, which means `DestroyBootstrap` can return before the bootstrap SSH rule is actually gone.
🔐 More robust rule matching

```diff
 for _, rule := range sg.IpPermissions {
 	if ptr.Deref(rule.ToPort, 0) != 22 {
 		continue
 	}
 	for _, source := range rule.IpRanges {
-		if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
+		if ptr.Deref(source.Description, "") == awsmanifest.BootstrapSSHDescription {
 			ruleDesc := ptr.Deref(source.Description, "[no description]")
 			logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
 			return false, nil
 		}
 	}
+	for _, source := range rule.Ipv6Ranges {
+		if ptr.Deref(source.Description, "") == awsmanifest.BootstrapSSHDescription {
+			ruleDesc := ptr.Deref(source.Description, "[no description]")
+			logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+			return false, nil
+		}
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 589 - 600, The verifier currently only looks for an IPv4 0.0.0.0/0 CIDR when scanning sg.IpPermissions and may miss the actual bootstrap SSH rule; update the check so it matches the real bootstrap SSH rule by (1) comparing the security group rule description (source.Description) to the BootstrapSSHDescription generated in pkg/asset/manifests/aws/cluster.go and/or (2) checking both IpRanges and Ipv6Ranges against the configured sshRuleCidrs (not just "0.0.0.0/0"); keep the existing guard on rule.ToPort == 22 and return false while a matching rule (by description or any matching IPv4/IPv6 CIDR) still exists.
🧹 Nitpick comments (2)
pkg/asset/manifests/capiutils/helpers.go (1)
12-15: Prefer immutable accessors for the all-address CIDR constants. These are exported `*ipnet.IPNet` pointers, making them mutable and reassignable by any caller, which is a risk for shared security-rule defaults. If intended to be constant sentinels, keep the values unexported and expose accessors that return copies. While refactoring, rename `Cidr` to `CIDR` for idiomatic Go naming.
♻️ Proposed refactor

```diff
 var (
 	defaultCIDR = ipnet.MustParseCIDR("10.0.0.0/16")
-	// AnyIPv4CidrBlock is the CIDR block to match all IPv4 addresses.
-	AnyIPv4CidrBlock = ipnet.MustParseCIDR("0.0.0.0/0")
-	// AnyIPv6CidrBlock is the CIDR block to match all IPv6 addresses.
-	AnyIPv6CidrBlock = ipnet.MustParseCIDR("::/0")
+	anyIPv4CIDRBlock = *ipnet.MustParseCIDR("0.0.0.0/0")
+	anyIPv6CIDRBlock = *ipnet.MustParseCIDR("::/0")
 )
+
+// AnyIPv4CIDRBlock returns the CIDR block that matches all IPv4 addresses.
+func AnyIPv4CIDRBlock() ipnet.IPNet {
+	return anyIPv4CIDRBlock
+}
+
+// AnyIPv6CIDRBlock returns the CIDR block that matches all IPv6 addresses.
+func AnyIPv6CIDRBlock() ipnet.IPNet {
+	return anyIPv6CIDRBlock
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/manifests/capiutils/helpers.go` around lines 12 - 15, The exported pointer constants AnyIPv4CidrBlock and AnyIPv6CidrBlock are mutable; make them unexported package-level variables (e.g., anyIPv4CIDR, anyIPv6CIDR) and add exported accessor functions AnyIPv4CIDR() and AnyIPv6CIDR() that return copies (value or cloned *ipnet.IPNet) so callers cannot mutate shared state; also rename any identifier suffix "Cidr" to "CIDR" for idiomatic Go (update AnyIPv4CidrBlock/AnyIPv6CidrBlock references to the new names and use the new accessor functions throughout the codebase).
pkg/infrastructure/clusterapi/types.go (1)
92-105: Normalize partial `IgnitionOutput`s before threading them forward.
`ApplyIgnitionEdits` currently assumes every `IgnitionEditFunc` re-populates all three blobs. A future bootstrap-only edit that returns just `UpdatedBootstrapIgn` will drop master/worker data from the final output, and `nil` is treated as a valid next state. Filling unset fields from the current input here would make the new exported API much harder to misuse.
♻️ Possible hardening

```diff
 for _, edit := range edits {
 	result, err := edit(ctx, in)
 	if err != nil {
 		return nil, err
 	}
+	if result == nil {
+		result = output
+	}
+	if result.UpdatedBootstrapIgn == nil {
+		result.UpdatedBootstrapIgn = in.BootstrapIgnData
+	}
+	if result.UpdatedMasterIgn == nil {
+		result.UpdatedMasterIgn = in.MasterIgnData
+	}
+	if result.UpdatedWorkerIgn == nil {
+		result.UpdatedWorkerIgn = in.WorkerIgnData
+	}
 	output = result
 	in = in.WithOutput(result)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/clusterapi/types.go` around lines 92 - 105, ApplyIgnitionEdits assumes each IgnitionEditFunc returns a fully-populated *IgnitionOutput; instead, normalize any partial results by copying missing fields from the current output before using them as the next state. Inside ApplyIgnitionEdits (working with types IgnitionOutput, IgnitionEditFunc and the in.WithOutput call), after receiving result, set any zero/empty UpdatedBootstrapIgn, UpdatedMasterIgn or UpdatedWorkerIgn fields on result from the current output variable (or from in if appropriate), then assign output = result and update in = in.WithOutput(result) so partial outputs don’t drop existing blobs.
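The normalization idea in the finding above can be illustrated with a self-contained toy. The type and field names here are assumed from the review's description (`IgnitionInput`, `IgnitionOutput`, the three ignition blobs), not the installer's real API; the point is only the fill-from-input pattern.

```go
package main

import "fmt"

// Hypothetical shapes loosely following the review's description of the
// ignition-edit pipeline (not the real installer types).
type IgnitionInput struct {
	BootstrapIgnData, MasterIgnData, WorkerIgnData []byte
}
type IgnitionOutput struct {
	UpdatedBootstrapIgn, UpdatedMasterIgn, UpdatedWorkerIgn []byte
}
type EditFunc func(in IgnitionInput) *IgnitionOutput

// applyEdits chains edits, filling any blob a partial edit left nil from the
// current input so later edits and the final result never lose data.
func applyEdits(in IgnitionInput, edits []EditFunc) *IgnitionOutput {
	out := &IgnitionOutput{in.BootstrapIgnData, in.MasterIgnData, in.WorkerIgnData}
	for _, edit := range edits {
		result := edit(in)
		if result == nil {
			result = out
		}
		if result.UpdatedBootstrapIgn == nil {
			result.UpdatedBootstrapIgn = in.BootstrapIgnData
		}
		if result.UpdatedMasterIgn == nil {
			result.UpdatedMasterIgn = in.MasterIgnData
		}
		if result.UpdatedWorkerIgn == nil {
			result.UpdatedWorkerIgn = in.WorkerIgnData
		}
		out = result
		in = IgnitionInput{result.UpdatedBootstrapIgn, result.UpdatedMasterIgn, result.UpdatedWorkerIgn}
	}
	return out
}

func main() {
	in := IgnitionInput{[]byte("b"), []byte("m"), []byte("w")}
	// A bootstrap-only edit: without normalization this would nil out master/worker.
	bootstrapOnly := func(in IgnitionInput) *IgnitionOutput {
		return &IgnitionOutput{UpdatedBootstrapIgn: []byte("b-edited")}
	}
	out := applyEdits(in, []EditFunc{bootstrapOnly})
	fmt.Println(string(out.UpdatedBootstrapIgn), string(out.UpdatedMasterIgn), string(out.UpdatedWorkerIgn))
}
```

With the normalization in place, a partial edit leaves the untouched blobs flowing through unchanged instead of dropping them.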
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 371-385: The polling condition inside wait.PollUntilContextTimeout
in aws.go should not return an error when CAPA fields are not yet populated;
change the branches that now return fmt.Errorf for missing
awsCluster.Spec.NetworkSpec.VPC.IPv6.CidrBlock and empty
awsCluster.Status.Network.SecurityGroups[sgRole].ID to return (false, nil) so
the poller keeps retrying until values appear (keep returning an error only for
real failures like cl.Get errors). Also update the isSSHRuleGone verifier to
detect bootstrap SSH rules using the actual CIDRs used at bootstrap (check both
private IPv4 and IPv6 CIDRs from sshRuleCidrs defined in cluster.go) instead of
only 0.0.0.0/0 IPv4; ensure it checks presence/absence for both IPv4 and IPv6
rules on port 22 so it does not prematurely report success.
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-372: The function updateMachineNetworks should fail fast when
the embedded cluster-config payload or its key is missing: after locating the
file with Path == clusterConfigDataFile and decoding/raw-unmarshalling into
configCM, check that configCM.Data is non-nil and that the clusterConfigCMKey
entry exists and is non-empty before calling yaml.Unmarshal into
types.InstallConfig; if the key is missing or empty, return a descriptive error
(e.g., "cluster-config ConfigMap missing key clusterConfigCMKey") instead of
proceeding to marshal an empty installConfig; ensure any early returns include
context about updateMachineNetworks and reference clusterConfigDataFile and
clusterConfigCMKey so callers can detect the failure.
---
Outside diff comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 589-600: The verifier currently only looks for an IPv4 0.0.0.0/0
CIDR when scanning sg.IpPermissions and may miss the actual bootstrap SSH rule;
update the check so it matches the real bootstrap SSH rule by (1) comparing the
security group rule description (source.Description) to the
BootstrapSSHDescription generated in pkg/asset/manifests/aws/cluster.go and/or
(2) checking both IpRanges and Ipv6Ranges against the configured sshRuleCidrs
(not just "0.0.0.0/0"); keep the existing guard on rule.ToPort == 22 and return
false while a matching rule (by description or any matching IPv4/IPv6 CIDR)
still exists.
---
Nitpick comments:
In `@pkg/asset/manifests/capiutils/helpers.go`:
- Around line 12-15: The exported pointer constants AnyIPv4CidrBlock and
AnyIPv6CidrBlock are mutable; make them unexported package-level variables
(e.g., anyIPv4CIDR, anyIPv6CIDR) and add exported accessor functions
AnyIPv4CIDR() and AnyIPv6CIDR() that return copies (value or cloned
*ipnet.IPNet) so callers cannot mutate shared state; also rename any identifier
suffix "Cidr" to "CIDR" for idiomatic Go (update
AnyIPv4CidrBlock/AnyIPv6CidrBlock references to the new names and use the new
accessor functions throughout the codebase).
In `@pkg/infrastructure/clusterapi/types.go`:
- Around line 92-105: ApplyIgnitionEdits assumes each IgnitionEditFunc returns a
fully-populated *IgnitionOutput; instead, normalize any partial results by
copying missing fields from the current output before using them as the next
state. Inside ApplyIgnitionEdits (working with types IgnitionOutput,
IgnitionEditFunc and the in.WithOutput call), after receiving result, set any
zero/empty UpdatedBootstrapIgn, UpdatedMasterIgn or UpdatedWorkerIgn fields on
result from the current output variable (or from in if appropriate), then assign
output = result and update in = in.WithOutput(result) so partial outputs don’t
drop existing blobs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e06a4ef9-6b2e-40cc-b5c4-23efa935ab5b
📒 Files selected for processing (8)
pkg/asset/manifests/aws/cluster.go, pkg/asset/manifests/capiutils/helpers.go, pkg/infrastructure/aws/clusterapi/aws.go, pkg/infrastructure/aws/clusterapi/ignition.go, pkg/infrastructure/azure/ignition.go, pkg/infrastructure/clusterapi/ignition.go, pkg/infrastructure/clusterapi/types.go, pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/infrastructure/azure/ignition.go
|
/test golint |
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)
584-598: ⚠️ Potential issue | 🟠 Major: Match the bootstrap SSH rule by description, not only by open CIDRs.
`pkg/asset/manifests/aws/cluster.go` lines 156-161 now build the bootstrap SSH rule from `sshRuleCidrs`, which is machine-network CIDRs on private installs. This check can return success while the actual bootstrap SSH rule is still present, so bootstrap destroy stops polling too early and can leave the stale port-22 rule behind.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 584 - 598, The current ingress-check loops in aws.go that iterate rule.IpRanges and rule.Ipv6Ranges only look for fully open CIDRs (0.0.0.0/0 and ::/0) and can falsely consider the bootstrap SSH rule gone when the rule uses machine-network CIDRs; update the logic in the loops that inspect rule.IpRanges and rule.Ipv6Ranges to also detect the bootstrap SSH rule by its description (compare ptr.Deref(source.Description, "") against the same description/string used when creating the bootstrap SSH rule in pkg/asset/manifests/aws/cluster.go), and treat a matching description as “still present” (return false, nil) even if the CIDR differs; ensure both IPv4 and IPv6 branches check source.Description and preserve the existing CIDR checks.
🧹 Nitpick comments (1)
pkg/infrastructure/clusterapi/ignition.go (1)
301-326: Remove unused `platform` parameter.
The `platform` parameter is declared but never used in the function body, unlike the similar `EditIgnitionForCustomDNS` function which actively uses it. Remove it or add intended platform-specific logic.
♻️ Suggested removal

```diff
-func EditIgnitionForDualStack(in IgnitionInput, platform string, machineNetworks []types.MachineNetworkEntry) (*IgnitionOutput, error) {
+func EditIgnitionForDualStack(in IgnitionInput, machineNetworks []types.MachineNetworkEntry) (*IgnitionOutput, error) {
```

This requires updating the call site in `pkg/infrastructure/aws/clusterapi/ignition.go:132` to remove the `awstypes.Name` argument.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/clusterapi/ignition.go` around lines 301 - 326, The EditIgnitionForDualStack function declares an unused parameter platform; remove the platform parameter from the function signature and update all call sites to stop passing a platform value (e.g., the AWS-specific caller that currently passes an awstypes.Name); adjust any imports/usages accordingly, recompile and run tests to ensure there are no remaining references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 245-253: The IPv6 node-port SG patch (the check using enableIPv6,
capiutils.MachineCIDRsFromInstallConfig and capiutils.GetIPv6CIDRs that calls
updateNodePortIngressRules) must be executed regardless of the
UserProvisionedDNS/DNS-only early return; move that if-block so it runs before
the UserProvisionedDNS check (i.e., evaluate if enableIPv6 &&
len(capiutils.GetIPv6CIDRs(capiutils.MachineCIDRsFromInstallConfig(in.InstallConfig)))
== 0 and call updateNodePortIngressRules(ctx, in.Client, in.InstallConfig,
in.InfraID) returning any error) and then keep the existing UserProvisionedDNS
branch (in.InstallConfig.Config.AWS.UserProvisionedDNS) afterwards so
InfraReady’s early return does not prevent the IPv6 SG patch from running.
In `@pkg/infrastructure/aws/clusterapi/ignition.go`:
- Around line 119-123: The code currently calls ipnet.MustParseCIDR on
vpcSpec.IPv6.CidrBlock which will panic on malformed controller-populated CIDRs;
change this to use ipnet.ParseCIDR (or the library's non-panicking parser) when
building ipv6Entry and handle the returned error instead of panicking—e.g.,
validate vpcSpec.IPv6.CidrBlock before appending to ipv6Entry and return or
propagate an error from the surrounding function (the code that constructs
machineNetworks / ipv6Entry in ignition.go) so callers can handle invalid CIDRs
safely.
---
Duplicate comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 584-598: The current ingress-check loops in aws.go that iterate
rule.IpRanges and rule.Ipv6Ranges only look for fully open CIDRs (0.0.0.0/0 and
::/0) and can falsely consider the bootstrap SSH rule gone when the rule uses
machine-network CIDRs; update the logic in the loops that inspect rule.IpRanges
and rule.Ipv6Ranges to also detect the bootstrap SSH rule by its description
(compare ptr.Deref(source.Description, "") against the same description/string
used when creating the bootstrap SSH rule in
pkg/asset/manifests/aws/cluster.go), and treat a matching description as “still
present” (return false, nil) even if the CIDR differs; ensure both IPv4 and IPv6
branches check source.Description and preserve the existing CIDR checks.
---
Nitpick comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 301-326: The EditIgnitionForDualStack function declares an unused
parameter platform; remove the platform parameter from the function signature
and update all call sites to stop passing a platform value (e.g., the
AWS-specific caller that currently passes an awstypes.Name); adjust any
imports/usages accordingly, recompile and run tests to ensure there are no
remaining references.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 18403ca0-136a-4343-855d-b85b8809d252
📒 Files selected for processing (13)
pkg/asset/installconfig/aws/route53.go, pkg/asset/machines/aws/awsmachines.go, pkg/asset/machines/clusterapi.go, pkg/asset/manifests/aws/cluster.go, pkg/asset/manifests/aws/zones.go, pkg/asset/manifests/aws/zones_test.go, pkg/asset/manifests/capiutils/helpers.go, pkg/infrastructure/aws/clusterapi/aws.go, pkg/infrastructure/aws/clusterapi/ignition.go, pkg/infrastructure/azure/ignition.go, pkg/infrastructure/clusterapi/ignition.go, pkg/infrastructure/clusterapi/types.go, pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/infrastructure/gcp/clusterapi/ignition.go
- pkg/asset/manifests/capiutils/helpers.go
- pkg/infrastructure/azure/ignition.go
- pkg/asset/manifests/aws/zones_test.go
|
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)
564-603: ⚠️ Potential issue | 🟠 Major: `isSSHRuleGone` may report success prematurely for private installs.
The function checks for hardcoded `0.0.0.0/0` and `::/0` CIDRs, but `removeSSHRule` identifies bootstrap SSH rules by their description (`BootstrapSSHDescription`). For private installs, bootstrap SSH rules use VPC/subnet CIDRs (from `sshRuleCidrs`), not `0.0.0.0/0`. This means `isSSHRuleGone` will return `true` before the actual bootstrap SSH rules are deleted.
Consider matching by `BootstrapSSHDescription` (consistent with `removeSSHRule`) instead of hardcoded CIDRs:
🛠️ Proposed fix to match rules by description

```diff
 func isSSHRuleGone(ctx context.Context, client *ec2.Client, sgID string) (bool, error) {
 	// ... existing SG fetch code ...
 	sg := sgs[0]
 	for _, rule := range sg.IpPermissions {
 		if ptr.Deref(rule.ToPort, 0) != 22 {
 			continue
 		}
-		// Check IPv4 rules
+		// Check IPv4 rules for bootstrap SSH by description
 		for _, source := range rule.IpRanges {
-			if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with IPv4 CIDR %s. Still waiting for deletion...", ptr.Deref(source.CidrIp, ""))
 				return false, nil
 			}
 		}
-		// Check IPv6 rules
+		// Check IPv6 rules for bootstrap SSH by description
 		for _, source := range rule.Ipv6Ranges {
-			if source.CidrIpv6 != nil && *source.CidrIpv6 == "::/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with IPv6 CIDR %s. Still waiting for deletion...", ptr.Deref(source.CidrIpv6, ""))
 				return false, nil
 			}
 		}
 	}
 	return true, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 564 - 603, isSSHRuleGone currently looks for hardcoded CIDRs (0.0.0.0/0 and ::/0) which misses bootstrap SSH rules that are identified by description in removeSSHRule; update isSSHRuleGone to detect any SSH ingress rules (ToPort == 22) that have the BootstrapSSHDescription (use the same BootstrapSSHDescription symbol) in their Description fields for both IpRanges and Ipv6Ranges (and any other permission fields you use for bootstrap rules), and treat presence of those described rules as "not gone" (return false), otherwise return true; keep the existing error handling and logging (use ptr.Deref on Description like existing code) so behavior matches removeSSHRule's identification logic.
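The description-based matching this finding asks for can be shown with a small stand-alone sketch. The struct shapes only loosely mimic the AWS SDK's security group types, and `bootstrapSSHDescription` is an assumed label, not the installer's real constant.

```go
package main

import (
	"fmt"
	"strings"
)

const bootstrapSSHDescription = "Bootstrap SSH access" // assumed label, not the real constant

// Minimal stand-ins for the AWS SDK's security group rule types.
type ipRange struct{ CIDR, Description string }
type permission struct {
	ToPort               int
	IPRanges, IPv6Ranges []ipRange
}

// sshRuleGone reports whether no port-22 rule tagged with the bootstrap SSH
// description remains, regardless of which CIDR the rule uses. A CIDR-only
// check for 0.0.0.0/0 would miss private-install rules scoped to the VPC.
func sshRuleGone(perms []permission) bool {
	for _, rule := range perms {
		if rule.ToPort != 22 {
			continue
		}
		sources := append(append([]ipRange{}, rule.IPRanges...), rule.IPv6Ranges...)
		for _, src := range sources {
			if strings.EqualFold(src.Description, bootstrapSSHDescription) {
				return false // bootstrap rule still present
			}
		}
	}
	return true
}

func main() {
	// A private-install style rule: port 22 open to the machine network only.
	perms := []permission{{
		ToPort:   22,
		IPRanges: []ipRange{{CIDR: "10.0.0.0/16", Description: bootstrapSSHDescription}},
	}}
	fmt.Println(sshRuleGone(perms))
	fmt.Println(sshRuleGone(nil))
}
```

Matching on the description keeps the destroy-time verifier symmetric with the creation path, so the polling loop cannot report success while a VPC-scoped bootstrap rule is still attached.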
♻️ Duplicate comments (1)
pkg/infrastructure/clusterapi/ignition.go (1)
328-378: ⚠️ Potential issue | 🟠 Major: Silent success when `clusterConfigDataFile` is not found may mask issues.
If `clusterConfigDataFile` ("/opt/openshift/manifests/cluster-config.yaml") is not present in `config.Storage.Files`, the function returns `nil` without an error. Given that `updateMachineNetworks` is called specifically to update machine networks for dual-stack, a missing cluster-config file is likely an unexpected state that should fail the install rather than silently succeed.
🛠️ Proposed fix to fail when cluster-config is not found

```diff
 func updateMachineNetworks(in IgnitionInput, config *igntypes.Config, machineNetworks []types.MachineNetworkEntry) error {
+	found := false
 	for i, fileData := range config.Storage.Files {
 		if fileData.Path != clusterConfigDataFile {
 			continue
 		}
+		found = true
 		contents := strings.Split(*config.Storage.Files[i].Contents.Source, ",")
 		// ... existing logic ...
 		encoded := fmt.Sprintf("%s%s", header, base64.StdEncoding.EncodeToString(configCMContents))
 		// replace the contents with the edited information
 		config.Storage.Files[i].Contents.Source = &encoded
-		break
+		return nil
 	}
-	return nil
+	if !found {
+		return fmt.Errorf("failed to find %s in bootstrap ignition", clusterConfigDataFile)
+	}
+	return nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/clusterapi/ignition.go` around lines 328 - 378, The function updateMachineNetworks currently iterates config.Storage.Files for clusterConfigDataFile and silently returns nil if not found; change it to return an explicit error when the loop finishes without locating clusterConfigDataFile so the caller knows the expected cluster-config manifest was missing. Locate updateMachineNetworks and the loop that checks fileData.Path != clusterConfigDataFile, and after the loop (or by tracking a found boolean) return a descriptive fmt.Errorf mentioning clusterConfigDataFile when it was not found instead of nil; keep all existing decoding/marshalling behavior unchanged when the file is present.
🧹 Nitpick comments (1)
pkg/asset/manifests/capiutils/helpers.go (1)
57-77: Consider renaming loop variable to avoid shadowing the `ipnet` package.
The loop variable `ipnet` shadows the imported package name. While this works correctly here (the package isn't accessed inside the loop), it can cause confusion during maintenance.
♻️ Suggested refactor

```diff
 // GetIPv4CIDRs returns only IPNets of IPv4 family.
 func GetIPv4CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv4Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv4CIDR(&ipnet.IPNet) {
-			ipv4Nets = append(ipv4Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv4CIDR(&cidr.IPNet) {
+			ipv4Nets = append(ipv4Nets, cidr)
 		}
 	}
 	return ipv4Nets
 }

 // GetIPv6CIDRs returns only IPNets of IPv6 family.
 func GetIPv6CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv6Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv6CIDR(&ipnet.IPNet) {
-			ipv6Nets = append(ipv6Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv6CIDR(&cidr.IPNet) {
+			ipv6Nets = append(ipv6Nets, cidr)
 		}
 	}
 	return ipv6Nets
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/manifests/capiutils/helpers.go` around lines 57 - 77, The loop variable name `ipnet` shadows the imported package `ipnet`, causing confusion; in both GetIPv4CIDRs and GetIPv6CIDRs, rename the loop variable (e.g., use `n`, `netItem`, or `cidr`) and update references inside the loop (e.g., netutils.IsIPv4CIDR(&netItem.IPNet) and ipv4Nets = append(ipv4Nets, netItem)) so the code no longer shadows the package while preserving use of the ipnet.IPNet type and behavior.
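The same family-splitting helpers can be sketched self-contained with the standard library's `net/netip` instead of the installer's `ipnet` wrapper; function and variable names here are illustrative.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitCIDRsByFamily partitions CIDR strings into IPv4 and IPv6 groups,
// mirroring the GetIPv4CIDRs/GetIPv6CIDRs idea but on netip.Prefix.
// Invalid entries are skipped here; real code should surface the parse error.
func splitCIDRsByFamily(cidrs []string) (v4, v6 []string) {
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			continue
		}
		if p.Addr().Is4() {
			v4 = append(v4, c)
		} else {
			v6 = append(v6, c)
		}
	}
	return v4, v6
}

func main() {
	v4, v6 := splitCIDRsByFamily([]string{"10.0.0.0/16", "fd00:10::/56", "0.0.0.0/0"})
	fmt.Println(v4)
	fmt.Println(v6)
}
```

Using a loop variable like `c` also sidesteps the package-shadowing concern the nitpick raises.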
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 564-603: isSSHRuleGone currently looks for hardcoded CIDRs
(0.0.0.0/0 and ::/0) which misses bootstrap SSH rules that are identified by
description in removeSSHRule; update isSSHRuleGone to detect any SSH ingress
rules (ToPort == 22) that have the BootstrapSSHDescription (use the same
BootstrapSSHDescription symbol) in their Description fields for both IpRanges
and Ipv6Ranges (and any other permission fields you use for bootstrap rules),
and treat presence of those described rules as "not gone" (return false),
otherwise return true; keep the existing error handling and logging (use
ptr.Deref on Description like existing code) so behavior matches removeSSHRule's
identification logic.
---
Duplicate comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-378: The function updateMachineNetworks currently iterates
config.Storage.Files for clusterConfigDataFile and silently returns nil if not
found; change it to return an explicit error when the loop finishes without
locating clusterConfigDataFile so the caller knows the expected cluster-config
manifest was missing. Locate updateMachineNetworks and the loop that checks
fileData.Path != clusterConfigDataFile, and after the loop (or by tracking a
found boolean) return a descriptive fmt.Errorf mentioning clusterConfigDataFile
when it was not found instead of nil; keep all existing decoding/marshalling
behavior unchanged when the file is present.
---
Nitpick comments:
In `@pkg/asset/manifests/capiutils/helpers.go`:
- Around line 57-77: The loop variable name `ipnet` shadows the imported package
`ipnet`, causing confusion; in both GetIPv4CIDRs and GetIPv6CIDRs, rename the
loop variable (e.g., use `n`, `netItem`, or `cidr`) and update references inside
the loop (e.g., netutils.IsIPv4CIDR(&netItem.IPNet) and ipv4Nets =
append(ipv4Nets, netItem)) so the code no longer shadows the package while
preserving use of the ipnet.IPNet type and behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bd03415a-daaa-4907-8723-6655733350bb
📒 Files selected for processing (8)
- pkg/asset/manifests/aws/cluster.go
- pkg/asset/manifests/capiutils/helpers.go
- pkg/infrastructure/aws/clusterapi/aws.go
- pkg/infrastructure/aws/clusterapi/ignition.go
- pkg/infrastructure/azure/ignition.go
- pkg/infrastructure/clusterapi/ignition.go
- pkg/infrastructure/clusterapi/types.go
- pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/infrastructure/azure/ignition.go
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177 |
|
Phew, I think all CodeRabbit reviews are now addressed and the PR is rebased on top of latest. In case anyone misses it due to the long bot review, the latest comments/discussions were posted here: #10353 (comment) |
|
/test e2e-aws-ovn e2e-aws-ovn-edge-zones e2e-aws-ovn-shared-vpc-edge-zones |
```go
func(awsCluster *capa.AWSCluster, vpcIPv6CIDR string) bool {
	if slices.Contains(awsCluster.Spec.NetworkSpec.NodePortIngressRuleCidrBlocks, vpcIPv6CIDR) {
		logrus.Debugf("VPC IPv6 CIDR %s already in node port ingress rules", vpcIPv6CIDR)
		return false
```
Shouldn't this return true also since the vpcIPv6CIDR is already present in awsCluster.Spec.NetworkSpec.NodePortIngressRuleCidrBlocks?
Or is it supposed to indicate if NodePortIngressRuleCidrBlocks needs to be updated?
Right, the bool result indicates whether a change was made to the AWSCluster. If false, it skips straight to checking with the AWS API instead of calling Update against the local envtest.
On line 377:

```go
// Update the spec using the provided function
if updateSpecFn(awsCluster, vpcIPv6CIDR) {
	// Update the AWSCluster resource
	if err := cl.Update(ctx, awsCluster); err != nil {
```
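The pattern under discussion, where the update function reports whether it mutated the object so the caller can skip a no-op `Update` call, can be sketched in isolation. The type and function names below are illustrative stand-ins, not the installer's actual code:

```go
package main

import (
	"fmt"
	"slices"
)

// clusterSpec stands in for the AWSCluster network spec in this sketch.
type clusterSpec struct {
	NodePortIngressRuleCidrBlocks []string
}

// addNodePortCIDR appends cidr to the spec and reports whether anything
// changed, so the caller only issues an Update call when needed.
func addNodePortCIDR(spec *clusterSpec, cidr string) bool {
	if slices.Contains(spec.NodePortIngressRuleCidrBlocks, cidr) {
		return false // already present; nothing to persist
	}
	spec.NodePortIngressRuleCidrBlocks = append(spec.NodePortIngressRuleCidrBlocks, cidr)
	return true
}

func main() {
	spec := &clusterSpec{NodePortIngressRuleCidrBlocks: []string{"10.0.0.0/16"}}
	fmt.Println(addNodePortCIDR(spec, "fd00::/56")) // first add mutates the spec
	fmt.Println(addNodePortCIDR(spec, "fd00::/56")) // repeat is a no-op
}
```

This keeps the local envtest object and the remote state from drifting while avoiding redundant API writes.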
@tthvo: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Note: Defining a non-nil (but empty) `vpc.ipv6` block will configure CAPA to provision the network infrastructure with IPv6 capabilities.
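For reference, a minimal sketch of what that looks like on the generated AWSCluster manifest; the field path follows CAPA's v1beta2 API, and every unrelated field is elided:

```yaml
# Sketch only: field path per CAPA's v1beta2 AWSCluster API.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
spec:
  network:
    vpc:
      # Present-but-empty: asks CAPA to allocate an Amazon-provided
      # IPv6 CIDR and provision the VPC with IPv6 capabilities.
      ipv6: {}
```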
The machines need to be configured to:
- Use the resource-name hostname type in order to have A and AAAA records for each instance's hostname.
- Assign a primary IPv6 address in order to register with the IPv6 target group when the dualstack cluster is IPv6 primary.
- Enable the IPv6 IMDS HTTP endpoint.
The VPC IPv6 CIDR is not available at install time (when the installer provisions the VPC). Thus, it must be added to the node port ingress rules later on, and we must verify that the rules are in place.
The etcd operator requires IPv6 machine networks in dualstack networking. Since the VPC IPv6 CIDR may not be known at install time, it must be added to the machine networks later, once the infrastructure is ready.
|
Actionable comments posted: 1
♻️ Duplicate comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)
584-599: ⚠️ Potential issue | 🟠 Major

Match bootstrap SSH rules by description, not by 0.0.0.0/0 / ::/0.

`removeSSHRule` removes entries by `awsmanifest.BootstrapSSHDescription`, but this verifier only waits on world-open CIDRs. In private or BYO-subnet installs the bootstrap SSH rule can be scoped to cluster/VPC CIDRs instead, so this can return `true` before the bootstrap SSH rule has actually been removed. Check both `IpRanges` and `Ipv6Ranges` by description to keep the verifier aligned with the removal logic.

Proposed fix

```diff
 for _, rule := range sg.IpPermissions {
 	if ptr.Deref(rule.ToPort, 0) != 22 {
 		continue
 	}
 	// Check IPv4 rules
 	for _, source := range rule.IpRanges {
-		if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
-			ruleDesc := ptr.Deref(source.Description, "[no description]")
-			logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+		if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+			logrus.Debugf("Found bootstrap SSH ingress rule with source cidr %s. Still waiting for deletion...", ptr.Deref(source.CidrIp, ""))
 			return false, nil
 		}
 	}
 	// Check IPv6 rules
 	for _, source := range rule.Ipv6Ranges {
-		if source.CidrIpv6 != nil && *source.CidrIpv6 == "::/0" {
-			ruleDesc := ptr.Deref(source.Description, "[no description]")
-			logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+		if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+			logrus.Debugf("Found bootstrap SSH ingress rule with source cidr %s. Still waiting for deletion...", ptr.Deref(source.CidrIpv6, ""))
 			return false, nil
 		}
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 584 - 599, The verifier currently only checks for world-open CIDRs when scanning rule.IpRanges and rule.Ipv6Ranges, which can miss bootstrap SSH rules that are scoped to non-world CIDRs; update the checks in the loops that iterate rule.IpRanges and rule.Ipv6Ranges to look for source.Description matching awsmanifest.BootstrapSSHDescription (use ptr.Deref or nil-safe compare) and return false if such a description is found (so the verifier aligns with removeSSHRule which deletes by awsmanifest.BootstrapSSHDescription); preserve or remove the CIDR checks as desired, but ensure the description-based check is present for both IpRanges and Ipv6Ranges.
🧹 Nitpick comments (1)
pkg/asset/manifests/capiutils/helpers.go (1)
58-76: Minor: Variable shadows imported package name.The loop variable
ipnetshadows the importedgithub.com/openshift/installer/pkg/ipnetpackage. While this doesn't cause functional issues since the package isn't used inside the loop body, renaming tocidrwould improve clarity.♻️ Suggested improvement
func GetIPv4CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet { var ipv4Nets []ipnet.IPNet - for _, ipnet := range cidrs { - if netutils.IsIPv4CIDR(&ipnet.IPNet) { - ipv4Nets = append(ipv4Nets, ipnet) + for _, cidr := range cidrs { + if netutils.IsIPv4CIDR(&cidr.IPNet) { + ipv4Nets = append(ipv4Nets, cidr) } } return ipv4Nets } func GetIPv6CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet { var ipv6Nets []ipnet.IPNet - for _, ipnet := range cidrs { - if netutils.IsIPv6CIDR(&ipnet.IPNet) { - ipv6Nets = append(ipv6Nets, ipnet) + for _, cidr := range cidrs { + if netutils.IsIPv6CIDR(&cidr.IPNet) { + ipv6Nets = append(ipv6Nets, cidr) } } return ipv6Nets }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/manifests/capiutils/helpers.go` around lines 58 - 76, The loop variable name `ipnet` in GetIPv4CIDRs and GetIPv6CIDRs shadows the imported ipnet package; rename the loop variable (e.g., to `cidr`) in both functions and update all uses inside the loops (references to `ipnet.IPNet` -> `cidr.IPNet`) to avoid shadowing and improve clarity.
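A self-contained version of the corrected filtering, using only the standard library (`net.IPNet` stands in for the installer's `ipnet`/`netutils` helpers, so this is a sketch rather than the repository's code):

```go
package main

import (
	"fmt"
	"net"
)

// isIPv4CIDR reports whether the parsed network is an IPv4 network,
// mirroring what netutils.IsIPv4CIDR checks in the review above.
func isIPv4CIDR(n *net.IPNet) bool {
	return n.IP.To4() != nil
}

// getIPv4CIDRs filters a CIDR list down to IPv4 networks. The loop
// variable is named cidr, so nothing shadows an imported package.
func getIPv4CIDRs(cidrs []string) []string {
	var ipv4 []string
	for _, cidr := range cidrs {
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			continue // skip malformed entries in this sketch
		}
		if isIPv4CIDR(n) {
			ipv4 = append(ipv4, cidr)
		}
	}
	return ipv4
}

func main() {
	fmt.Println(getIPv4CIDRs([]string{"10.0.0.0/16", "fd00::/48", "192.168.0.0/24"}))
}
```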
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-377: The updateMachineNetworks function currently returns nil
when no file with path clusterConfigDataFile is found in config.Storage.Files,
causing silent success; modify updateMachineNetworks to detect when the loop
does not find clusterConfigDataFile (e.g., set a found flag or return after
handling) and return a descriptive error if the file is missing so callers know
the ignition lacked the expected cluster-config file; reference the loop over
config.Storage.Files, the clusterConfigDataFile constant and the break after
updating config.Storage.Files[i].Contents.Source to locate where to add this
check.
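The found-flag shape that the prompt describes can be sketched generically; the file struct and `clusterConfigDataFile` path below are illustrative stand-ins for the ignition types, not the installer's definitions:

```go
package main

import "fmt"

// clusterConfigDataFile is an illustrative path, not necessarily the
// one the installer uses.
const clusterConfigDataFile = "/opt/openshift/manifests/cluster-config.yaml"

type file struct {
	Path     string
	Contents string
}

// updateMachineNetworks rewrites the cluster-config file in place and
// returns an explicit error when the expected file is absent, instead
// of silently succeeding.
func updateMachineNetworks(files []file, newContents string) error {
	found := false
	for i := range files {
		if files[i].Path != clusterConfigDataFile {
			continue
		}
		files[i].Contents = newContents
		found = true
		break
	}
	if !found {
		return fmt.Errorf("ignition config has no file %q", clusterConfigDataFile)
	}
	return nil
}

func main() {
	err := updateMachineNetworks([]file{{Path: "/etc/hostname"}}, "new-data")
	fmt.Println(err)
}
```

With the flag in place a missing cluster-config manifest surfaces as an error to the caller rather than a silent nil.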
---
Duplicate comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 584-599: The verifier currently only checks for world-open CIDRs
when scanning rule.IpRanges and rule.Ipv6Ranges, which can miss bootstrap SSH
rules that are scoped to non-world CIDRs; update the checks in the loops that
iterate rule.IpRanges and rule.Ipv6Ranges to look for source.Description
matching awsmanifest.BootstrapSSHDescription (use ptr.Deref or nil-safe compare)
and return false if such a description is found (so the verifier aligns with
removeSSHRule which deletes by awsmanifest.BootstrapSSHDescription); preserve or
remove the CIDR checks as desired, but ensure the description-based check is
present for both IpRanges and Ipv6Ranges.
---
Nitpick comments:
In `@pkg/asset/manifests/capiutils/helpers.go`:
- Around line 58-76: The loop variable name `ipnet` in GetIPv4CIDRs and
GetIPv6CIDRs shadows the imported ipnet package; rename the loop variable (e.g.,
to `cidr`) in both functions and update all uses inside the loops (references to
`ipnet.IPNet` -> `cidr.IPNet`) to avoid shadowing and improve clarity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 31d01973-45b0-409a-b123-1fcf7db806c4
📒 Files selected for processing (13)
- pkg/asset/installconfig/aws/route53.go
- pkg/asset/machines/aws/awsmachines.go
- pkg/asset/machines/clusterapi.go
- pkg/asset/manifests/aws/cluster.go
- pkg/asset/manifests/aws/zones.go
- pkg/asset/manifests/aws/zones_test.go
- pkg/asset/manifests/capiutils/helpers.go
- pkg/infrastructure/aws/clusterapi/aws.go
- pkg/infrastructure/aws/clusterapi/ignition.go
- pkg/infrastructure/azure/ignition.go
- pkg/infrastructure/clusterapi/ignition.go
- pkg/infrastructure/clusterapi/types.go
- pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/infrastructure/azure/ignition.go
- pkg/asset/machines/clusterapi.go
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177 |
Description
This PR adds dualstack (IPv4 + IPv6) networking support for OCP on the AWS platform. In this configuration, the network infrastructure (i.e. VPC, subnets, load balancers, etc.) and the pod/service internal networking are dualstack.
There are 2 dualstack variants: dualstack IPv4 primary and dualstack IPv6 primary. See below for the install-config snippet to install each case.
This addresses: CORS-4074, CORS-4076, CORS-4077, CORS-4167, CORS-4168, and CORS-4243
Setup Instructions
Use the below `install-config` snippet to configure networking and the AWS platform.

Note: `machineNetwork` does not contain an IPv6 CIDR, as it is unknown at install time (i.e. it will be patched in later when the infrastructure is ready). The cluster network and service network must be provided by the user with valid ULA IPv6 CIDRs.

Dualstack IPv4 Primary:
Dualstack IPv6 Primary:
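The collapsed snippets above are not reproduced here, but a plausible shape for the dualstack IPv4-primary networking section looks like the following (CIDR values are illustrative defaults, not taken from this PR; for IPv6 primary, the IPv6 entries would be listed first in each list):

```yaml
# Hypothetical install-config networking fragment for dualstack
# IPv4 primary; values are illustrative.
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112
  machineNetwork:
  - cidr: 10.0.0.0/16  # no IPv6 entry; patched in once the VPC CIDR is known
```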
Summary by CodeRabbit
New Features
Refactor
Tests