
CORS-4072: dualstack support for AWS #10353

Open

tthvo wants to merge 5 commits into openshift:main from tthvo:CORS-4076

Conversation

@tthvo (Member) commented Mar 4, 2026

Description

This PR adds dualstack (IPv4 + IPv6) networking support for OCP on the AWS platform. In this configuration, the network infrastructure (i.e. VPC, subnets, load balancers, etc.) and pod/service internal networking are dualstack.

There are two dualstack variants: dualstack IPv4-primary and dualstack IPv6-primary. See below for the install-config snippet for each case.

This addresses: CORS-4074, CORS-4076, CORS-4077, CORS-4167, CORS-4168, and CORS-4243

Setup Instructions

Use the below install-config snippet to configure networking and AWS platform.

Note: machineNetwork does not contain an IPv6 CIDR because it is unknown at install time (it will be patched later, once the infrastructure is ready). The cluster network and service network must be provided by the user with valid ULA IPv6 CIDRs.

Dualstack IPv4 Primary:

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv4Primary
featureSet: TechPreviewNoUpgrade

Dualstack IPv6 Primary:

networking:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv6Primary
featureSet: TechPreviewNoUpgrade

Summary by CodeRabbit

  • New Features

    • Optional AAAA (IPv6) DNS records and batching of multiple DNS record changes
    • IPv6/dual‑stack support across machines, VPCs, subnets, load balancers, and API target group IP type
    • Helpers to extract and filter machine CIDRs by IP family
  • Refactor

    • Centralized, chainable ignition editing with improved dual‑stack and custom‑DNS handling
    • IPv6-aware updates for security groups, SSH rules, and node‑port ingress rules
  • Tests

    • Added dual‑stack VPC and subnet test coverage

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 4, 2026
@openshift-ci-robot (Contributor) commented Mar 4, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.

Details

In response to this:

Description

This PR added dualstack (IPv4 + IPv6) networking support for OCP on AWS platform. In this configurations, network infrastructure (i.e. vpc, subnets, load balancers, etc) and pod/service internal networking will be dualstack.

There are 2 dualstack variants: dualstack IPv4 primary and dualstack IPv6 primary. See below for the install-config snippet to install each case.

This addresses: CORS-4074, CORS-4076, CORS-4077, CORS-4167, CORS-4168, and CORS-4243

Install-config Snippet

Use the below install-config snippet to configure networking and AWS platform.

Note: machineNetwork does not contain IPv6 CIDR as it is unknown at install time (i.e. will be patched later when infra is ready). The cluster network and service network must be provided by the user with valid ULA IPv6 CIDRs.

Dualstack IPv4 Primary:

networking:
 clusterNetwork:
 - cidr: 10.128.0.0/14
   hostPrefix: 23
 - cidr: fd01::/48
   hostPrefix: 64
 machineNetwork:
 - cidr: 10.0.0.0/16
 networkType: OVNKubernetes
 serviceNetwork:
 - 172.30.0.0/16
 - fd02::/112
platform:
 aws:
   region: us-east-1
   ipFamily: DualStackIPv4Primary
featureSet: TechPreviewNoUpgrade

Dualstack IPv6 Primary:

networking:
 clusterNetwork:
 - cidr: fd01::/48
   hostPrefix: 64
 - cidr: 10.128.0.0/14
   hostPrefix: 23
 machineNetwork:
 - cidr: 10.0.0.0/16
 networkType: OVNKubernetes
 serviceNetwork:
 - fd02::/112
 - 172.30.0.0/16
platform:
 aws:
   region: us-east-1
   ipFamily: DualStackIPv6Primary
featureSet: TechPreviewNoUpgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tthvo (Member Author) commented Mar 4, 2026

/cc @sadasu @nrb @patrickdillon

@openshift-ci openshift-ci bot requested review from nrb, patrickdillon and sadasu March 4, 2026 02:18


@tthvo (Member Author) commented Mar 4, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177 openshift/machine-api-provider-aws#171

@tthvo (Member Author) commented Mar 4, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177 openshift/machine-api-provider-aws#171

@tthvo (Member Author) commented Mar 4, 2026

/payload-job periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-aws-usgov-ipi-private-ep-fips-f7

@openshift-ci bot (Contributor) commented Mar 4, 2026

@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-aws-usgov-ipi-private-ep-fips-f7

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/70bcb810-1773-11f1-831b-7034f4f09cdc-0

@tthvo (Member Author) commented Mar 4, 2026

/label platform/aws

@tthvo (Member Author) commented Mar 5, 2026

/test golint

@tthvo (Member Author) commented Mar 5, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177

@tthvo (Member Author) commented Mar 5, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

@patrickdillon (Contributor) left a comment

This LGTM, but I had a few questions, particularly about why we need the node port changes.

The code changes for that and SSH look good, but it's unfortunate we need to do that. I suppose, particularly for SSH, it does not make sense to use Security Group Roles? i.e. we instruct users to add the role to the installer host

I'm less sure about node port... if we do need it, seems like Security Group Role might be ok there?

@patrickdillon (Contributor) commented Mar 5, 2026

Also, I don't really get how the install is completing in the test with the MAPI PR: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-installer-10353-openshift-machine-api-provider-aws-177-e2e-aws-ovn-dualstack-ipv4-primary-techpreview/2029358234887786496

I thought we were blocked by ingress on this. I know previously you were using the node-port hack to get around the ingress issue, and there are still node-port changes in this PR, but I thought those were just security group rules. It seems like I'm missing something.

@tthvo (Member Author) commented Mar 6, 2026

I'm less sure about node port... if we do need it, seems like Security Group Role might be ok there?

The security group already has rules allowing the node-port range (30000-32767) for the node and controlplane roles, because we define them in the CNI rules. This means traffic is allowed between nodes.

However, we also need to allow traffic from the load balancers created by the CCM for LoadBalancer-type Services, for example, the default ingress router. These load balancers do not have a "role", so all we can do is allow traffic from the VPC/subnet CIDRs (or all IPs)...

I suppose, particularly for SSH, it does not make sense to use Security Group Roles? i.e. we instruct users to add the role to the installer host

Right, unfortunately, the roles are specifically for cluster components (i.e. nodes, load balancers), so we can't really add one to the installer host. Also, a role corresponds to a security group, which means the installer host might be allowed to access more ports (for example, 30000-32767) than intended.

@tthvo (Member Author) commented Mar 6, 2026

I thought we were blocked by ingress on this. I know previously you were using the node port hack to get around the ingress issue; and there's still nodeport changes in this pr, but I thought those were just security group rules. Seems like I'm missing something

Right, we are still blocked by ingress and CCM as of today. However, in dualstack IPv4 primary, the following happens:

  • The openshift-ingress/router-default Service is created as ipFamily: IPv4 (by default) due to no dualstack support yet from ingress and CCM. There is only an IPv4 ClusterIP address assigned to the Service.
  • The ingress Load Balancer is created as NLB and is IPv4 ip type.
  • There is an A record for the ingress LB in route53 hosted zone.

This means IPv4 traffic flows to the routers just fine even though IPv6 is not supported. To the cluster operators, the setup looks the same as a regular singlestack IPv4 install, since they only care whether the router is reachable (regardless of IP family). Thus, the install completed.

For dualstack IPv6 primary, the openshift-ingress/router-default Service has ipFamily: IPv6 by default, while the NLB is IPv4. Because there is no conversion from IPv4 LB traffic to an IPv6 clusterIP, the routers are unreachable, causing the install to fail.

That being said, it's a "false positive" that the install succeeded :D

@patrickdillon (Contributor)

However, we also need to allow traffic from the load balancers created by CCM for LoadBalancer type Services, for example, default ingress router. These load balancers do not have "role", so all we can do is to allow traffic from VPC/subnet CIDRs (or all IPs)...

I see! Thinking about this more, as I mentioned above, this should actually be handled by the ingress component... Technically we should not have to deal with this.

In the case that we actually do need to do something like this, I do wonder if we could pre-provision, or use a deterministic name for, a security group, so that ingress creates the LB in a security group we have pre-created rules for.

@tthvo (Member Author) commented Mar 6, 2026

In the case that we actually do need to do something like this, I do wonder if we could pre-provision, or use a determnistic name for a security group, so that ingress creates the LB in a security group we have pre-created rules for.

Ahh yes, the installer can pre-create an additional security group with only the allowed node-port ingress rule (30000-32767) and attach it to cluster machines, similar to the AdditionalSecurityGroupIDs feature.

The only issue is that 30000-32767 is still open to all IPs if NodePortIngressRuleCidrBlocks is left unset. From #10353 (comment), I suppose this behaviour won't go away soon, as it's needed for other supported scenarios that non-OCP projects/products rely on.

@patrickdillon (Contributor) commented Mar 9, 2026

Regarding ingress services/node-port, I realize now I was assuming that AWS would have the same behavior as cloud-provider-gcp, but it does not. Sorry for confusing things. In GCP, the cloud provider handles opening all the ports for an ingress service, but that does not happen in AWS.

@patrickdillon (Contributor)

The only issue is that 30000-32767 is still open to all IPs if NodePortIngressRuleCidrBlocks is left unset. From #10353 (comment), I suppose this behaviour won't go away soon as it's needed for other supported scenarios that other non-ocp projects/products are doing.

Right, I was thinking we would leave NodePortIngressRuleCidrBlocks in place for IPv4, and for IPv6 we could use the security group. The only advantage of my suggestion is that we don't have to jump through hoops during provisioning to determine the CIDR and update the manifests on the fly.

That said, the current approach looks good and ingress isn't even sorted out, so maybe that is just something we want to consider once ingress is in place, as it might help simplify our code a bit.

@tthvo (Member Author) commented Mar 10, 2026

Ahh sorry, the CAPA implementation auto-adds ::/0 if NodePortIngressRuleCidrBlocks doesn't include any IPv6 entry and IPv6 is enabled (done to be consistent with the existing IPv4 handling). So, in this case, ::/0 is still open because we don't pass in an IPv6 CIDR. See #10353 (comment).

I guess if we can use an openshift fork of CAPA, we can diverge from upstream a bit to restrict the node port service ingress to the VPC CIDRs (i.e instead of all IPs) by default if left unset. That would solve all the concern and make things so much simpler 😅

For now, I simplified the changes a bit to only apply the patch to node-port ingress. For SSH, we only need to allow VPC CIDRs in a private install; however, a private install requires BYO subnets, so those CIDRs are already allowed in the rule. See 1802f78

@tthvo (Member Author) commented Mar 10, 2026

/verified by @tthvo

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 10, 2026
@openshift-ci-robot (Contributor)

@tthvo: This PR has been marked as verified by @tthvo.

Details

In response to this:

/verified by @tthvo


@coderabbitai (bot) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)

589-600: ⚠️ Potential issue | 🟠 Major

Match the actual bootstrap SSH rule instead of only 0.0.0.0/0.

pkg/asset/manifests/aws/cluster.go:37-45,156-161 now builds BootstrapSSHDescription from sshRuleCidrs, so private clusters use machine/VPC CIDRs and dual-stack can add IPv6 ranges. This verifier still waits only on an IPv4 0.0.0.0/0 source, which means DestroyBootstrap can return before the bootstrap SSH rule is actually gone.

🔐 More robust rule matching
for _, rule := range sg.IpPermissions {
    if ptr.Deref(rule.ToPort, 0) != 22 {
        continue
    }
    for _, source := range rule.IpRanges {
-       if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
+       if ptr.Deref(source.Description, "") == awsmanifest.BootstrapSSHDescription {
            ruleDesc := ptr.Deref(source.Description, "[no description]")
-           logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+           logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
            return false, nil
        }
    }
+   for _, source := range rule.Ipv6Ranges {
+       if ptr.Deref(source.Description, "") == awsmanifest.BootstrapSSHDescription {
+           ruleDesc := ptr.Deref(source.Description, "[no description]")
+           logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+           return false, nil
+       }
+   }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 589 - 600, The
verifier currently only looks for an IPv4 0.0.0.0/0 CIDR when scanning
sg.IpPermissions and may miss the actual bootstrap SSH rule; update the check so
it matches the real bootstrap SSH rule by (1) comparing the security group rule
description (source.Description) to the BootstrapSSHDescription generated in
pkg/asset/manifests/aws/cluster.go and/or (2) checking both IpRanges and
Ipv6Ranges against the configured sshRuleCidrs (not just "0.0.0.0/0"); keep the
existing guard on rule.ToPort == 22 and return false while a matching rule (by
description or any matching IPv4/IPv6 CIDR) still exists.
🧹 Nitpick comments (2)
pkg/asset/manifests/capiutils/helpers.go (1)

12-15: Prefer immutable accessors for the all-address CIDR constants.

These are exported *ipnet.IPNet pointers, making them mutable and reassignable by any caller—a risk for shared security-rule defaults. If intended to be constant sentinels, keep the values unexported and expose accessors that return copies. While refactoring, rename Cidr to CIDR for idiomatic Go naming.

♻️ Proposed refactor
 var (
 	defaultCIDR = ipnet.MustParseCIDR("10.0.0.0/16")
-	// AnyIPv4CidrBlock is the CIDR block to match all IPv4 addresses.
-	AnyIPv4CidrBlock = ipnet.MustParseCIDR("0.0.0.0/0")
-	// AnyIPv6CidrBlock is the CIDR block to match all IPv6 addresses.
-	AnyIPv6CidrBlock = ipnet.MustParseCIDR("::/0")
+	anyIPv4CIDRBlock = *ipnet.MustParseCIDR("0.0.0.0/0")
+	anyIPv6CIDRBlock = *ipnet.MustParseCIDR("::/0")
 )
+
+// AnyIPv4CIDRBlock returns the CIDR block that matches all IPv4 addresses.
+func AnyIPv4CIDRBlock() ipnet.IPNet {
+	return anyIPv4CIDRBlock
+}
+
+// AnyIPv6CIDRBlock returns the CIDR block that matches all IPv6 addresses.
+func AnyIPv6CIDRBlock() ipnet.IPNet {
+	return anyIPv6CIDRBlock
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/manifests/capiutils/helpers.go` around lines 12 - 15, The exported
pointer constants AnyIPv4CidrBlock and AnyIPv6CidrBlock are mutable; make them
unexported package-level variables (e.g., anyIPv4CIDR, anyIPv6CIDR) and add
exported accessor functions AnyIPv4CIDR() and AnyIPv6CIDR() that return copies
(value or cloned *ipnet.IPNet) so callers cannot mutate shared state; also
rename any identifier suffix "Cidr" to "CIDR" for idiomatic Go (update
AnyIPv4CidrBlock/AnyIPv6CidrBlock references to the new names and use the new
accessor functions throughout the codebase).
pkg/infrastructure/clusterapi/types.go (1)

92-105: Normalize partial IgnitionOutputs before threading them forward.

ApplyIgnitionEdits currently assumes every IgnitionEditFunc re-populates all three blobs. A future bootstrap-only edit that returns just UpdatedBootstrapIgn will drop master/worker data from the final output, and nil is treated as a valid next state. Filling unset fields from the current input here would make the new exported API much harder to misuse.

♻️ Possible hardening
for _, edit := range edits {
    result, err := edit(ctx, in)
    if err != nil {
        return nil, err
    }
+   if result == nil {
+       result = output
+   }
+   if result.UpdatedBootstrapIgn == nil {
+       result.UpdatedBootstrapIgn = in.BootstrapIgnData
+   }
+   if result.UpdatedMasterIgn == nil {
+       result.UpdatedMasterIgn = in.MasterIgnData
+   }
+   if result.UpdatedWorkerIgn == nil {
+       result.UpdatedWorkerIgn = in.WorkerIgnData
+   }
    output = result
    in = in.WithOutput(result)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/clusterapi/types.go` around lines 92 - 105,
ApplyIgnitionEdits assumes each IgnitionEditFunc returns a fully-populated
*IgnitionOutput; instead, normalize any partial results by copying missing
fields from the current output before using them as the next state. Inside
ApplyIgnitionEdits (working with types IgnitionOutput, IgnitionEditFunc and the
in.WithOutput call), after receiving result, set any zero/empty
UpdatedBootstrapIgn, UpdatedMasterIgn or UpdatedWorkerIgn fields on result from
the current output variable (or from in if appropriate), then assign output =
result and update in = in.WithOutput(result) so partial outputs don’t drop
existing blobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 371-385: The polling condition inside wait.PollUntilContextTimeout
in aws.go should not return an error when CAPA fields are not yet populated;
change the branches that now return fmt.Errorf for missing
awsCluster.Spec.NetworkSpec.VPC.IPv6.CidrBlock and empty
awsCluster.Status.Network.SecurityGroups[sgRole].ID to return (false, nil) so
the poller keeps retrying until values appear (keep returning an error only for
real failures like cl.Get errors). Also update the isSSHRuleGone verifier to
detect bootstrap SSH rules using the actual CIDRs used at bootstrap (check both
private IPv4 and IPv6 CIDRs from sshRuleCidrs defined in cluster.go) instead of
only 0.0.0.0/0 IPv4; ensure it checks presence/absence for both IPv4 and IPv6
rules on port 22 so it does not prematurely report success.

In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-372: The function updateMachineNetworks should fail fast when
the embedded cluster-config payload or its key is missing: after locating the
file with Path == clusterConfigDataFile and decoding/raw-unmarshalling into
configCM, check that configCM.Data is non-nil and that the clusterConfigCMKey
entry exists and is non-empty before calling yaml.Unmarshal into
types.InstallConfig; if the key is missing or empty, return a descriptive error
(e.g., "cluster-config ConfigMap missing key clusterConfigCMKey") instead of
proceeding to marshal an empty installConfig; ensure any early returns include
context about updateMachineNetworks and reference clusterConfigDataFile and
clusterConfigCMKey so callers can detect the failure.

---

Outside diff comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 589-600: The verifier currently only looks for an IPv4 0.0.0.0/0
CIDR when scanning sg.IpPermissions and may miss the actual bootstrap SSH rule;
update the check so it matches the real bootstrap SSH rule by (1) comparing the
security group rule description (source.Description) to the
BootstrapSSHDescription generated in pkg/asset/manifests/aws/cluster.go and/or
(2) checking both IpRanges and Ipv6Ranges against the configured sshRuleCidrs
(not just "0.0.0.0/0"); keep the existing guard on rule.ToPort == 22 and return
false while a matching rule (by description or any matching IPv4/IPv6 CIDR)
still exists.

---

Nitpick comments:
In `@pkg/asset/manifests/capiutils/helpers.go`:
- Around line 12-15: The exported pointer constants AnyIPv4CidrBlock and
AnyIPv6CidrBlock are mutable; make them unexported package-level variables
(e.g., anyIPv4CIDR, anyIPv6CIDR) and add exported accessor functions
AnyIPv4CIDR() and AnyIPv6CIDR() that return copies (value or cloned
*ipnet.IPNet) so callers cannot mutate shared state; also rename any identifier
suffix "Cidr" to "CIDR" for idiomatic Go (update
AnyIPv4CidrBlock/AnyIPv6CidrBlock references to the new names and use the new
accessor functions throughout the codebase).

In `@pkg/infrastructure/clusterapi/types.go`:
- Around line 92-105: ApplyIgnitionEdits assumes each IgnitionEditFunc returns a
fully-populated *IgnitionOutput; instead, normalize any partial results by
copying missing fields from the current output before using them as the next
state. Inside ApplyIgnitionEdits (working with types IgnitionOutput,
IgnitionEditFunc and the in.WithOutput call), after receiving result, set any
zero/empty UpdatedBootstrapIgn, UpdatedMasterIgn or UpdatedWorkerIgn fields on
result from the current output variable (or from in if appropriate), then assign
output = result and update in = in.WithOutput(result) so partial outputs don’t
drop existing blobs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e06a4ef9-6b2e-40cc-b5c4-23efa935ab5b

📥 Commits

Reviewing files that changed from the base of the PR and between f1e729c and 1dd313c.

📒 Files selected for processing (8)
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/aws/clusterapi/ignition.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/infrastructure/clusterapi/ignition.go
  • pkg/infrastructure/clusterapi/types.go
  • pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/infrastructure/azure/ignition.go

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Mar 10, 2026
@openshift-ci-robot (Contributor) commented Mar 10, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.


@tthvo (Member Author) commented Mar 10, 2026

/test golint

@coderabbitai (bot) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)

584-598: ⚠️ Potential issue | 🟠 Major

Match the bootstrap SSH rule by description, not only by open CIDRs.

pkg/asset/manifests/aws/cluster.go Lines 156-161 now build the bootstrap SSH rule from sshRuleCidrs, which is machine-network CIDRs on private installs. This check can return success while the actual bootstrap SSH rule is still present, so bootstrap destroy stops polling too early and can leave the stale port-22 rule behind.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 584 - 598, The current
ingress-check loops in aws.go that iterate rule.IpRanges and rule.Ipv6Ranges
only look for fully open CIDRs (0.0.0.0/0 and ::/0) and can falsely consider the
bootstrap SSH rule gone when the rule uses machine-network CIDRs; update the
logic in the loops that inspect rule.IpRanges and rule.Ipv6Ranges to also detect
the bootstrap SSH rule by its description (compare ptr.Deref(source.Description,
"") against the same description/string used when creating the bootstrap SSH
rule in pkg/asset/manifests/aws/cluster.go), and treat a matching description as
“still present” (return false, nil) even if the CIDR differs; ensure both IPv4
and IPv6 branches check source.Description and preserve the existing CIDR
checks.
🧹 Nitpick comments (1)
pkg/infrastructure/clusterapi/ignition.go (1)

301-326: Remove unused platform parameter.

The platform parameter is declared but never used in the function body, unlike the similar EditIgnitionForCustomDNS function which actively uses it. Remove it or add intended platform-specific logic.

♻️ Suggested removal
-func EditIgnitionForDualStack(in IgnitionInput, platform string, machineNetworks []types.MachineNetworkEntry) (*IgnitionOutput, error) {
+func EditIgnitionForDualStack(in IgnitionInput, machineNetworks []types.MachineNetworkEntry) (*IgnitionOutput, error) {

This requires updating the call site in pkg/infrastructure/aws/clusterapi/ignition.go:132 to remove the awstypes.Name argument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/clusterapi/ignition.go` around lines 301 - 326, The
EditIgnitionForDualStack function declares an unused parameter platform; remove
the platform parameter from the function signature and update all call sites to
stop passing a platform value (e.g., the AWS-specific caller that currently
passes an awstypes.Name); adjust any imports/usages accordingly, recompile and
run tests to ensure there are no remaining references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 245-253: The IPv6 node-port SG patch (the check using enableIPv6,
capiutils.MachineCIDRsFromInstallConfig and capiutils.GetIPv6CIDRs that calls
updateNodePortIngressRules) must be executed regardless of the
UserProvisionedDNS/DNS-only early return; move that if-block so it runs before
the UserProvisionedDNS check (i.e., evaluate if enableIPv6 &&
len(capiutils.GetIPv6CIDRs(capiutils.MachineCIDRsFromInstallConfig(in.InstallConfig)))
== 0 and call updateNodePortIngressRules(ctx, in.Client, in.InstallConfig,
in.InfraID) returning any error) and then keep the existing UserProvisionedDNS
branch (in.InstallConfig.Config.AWS.UserProvisionedDNS) afterwards so
InfraReady’s early return does not prevent the IPv6 SG patch from running.

In `@pkg/infrastructure/aws/clusterapi/ignition.go`:
- Around line 119-123: The code currently calls ipnet.MustParseCIDR on
vpcSpec.IPv6.CidrBlock which will panic on malformed controller-populated CIDRs;
change this to use ipnet.ParseCIDR (or the library's non-panicking parser) when
building ipv6Entry and handle the returned error instead of panicking—e.g.,
validate vpcSpec.IPv6.CidrBlock before appending to ipv6Entry and return or
propagate an error from the surrounding function (the code that constructs
machineNetworks / ipv6Entry in ignition.go) so callers can handle invalid CIDRs
safely.
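The panic-vs-error distinction the comment asks for can be shown with the stdlib equivalent: `net.ParseCIDR` returns an error the caller can propagate instead of panicking on malformed input. `parseVPCIPv6CIDR` below is a hypothetical helper for illustration, not code from this PR, and the installer's own `ipnet.ParseCIDR` is assumed to follow the same error-returning pattern.

```go
package main

import (
	"fmt"
	"net"
)

// parseVPCIPv6CIDR validates a controller-populated CIDR string without
// panicking, wrapping the parse error so callers can propagate it.
func parseVPCIPv6CIDR(s string) (*net.IPNet, error) {
	_, ipNet, err := net.ParseCIDR(s)
	if err != nil {
		return nil, fmt.Errorf("invalid VPC IPv6 CIDR %q: %w", s, err)
	}
	return ipNet, nil
}

func main() {
	if n, err := parseVPCIPv6CIDR("fd00:ec2::/56"); err == nil {
		fmt.Println("parsed:", n.String())
	}
	if _, err := parseVPCIPv6CIDR("not-a-cidr"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```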

---

Duplicate comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 584-598: The current ingress-check loops in aws.go that iterate
rule.IpRanges and rule.Ipv6Ranges only look for fully open CIDRs (0.0.0.0/0 and
::/0) and can falsely consider the bootstrap SSH rule gone when the rule uses
machine-network CIDRs; update the logic in the loops that inspect rule.IpRanges
and rule.Ipv6Ranges to also detect the bootstrap SSH rule by its description
(compare ptr.Deref(source.Description, "") against the same description/string
used when creating the bootstrap SSH rule in
pkg/asset/manifests/aws/cluster.go), and treat a matching description as “still
present” (return false, nil) even if the CIDR differs; ensure both IPv4 and IPv6
branches check source.Description and preserve the existing CIDR checks.

---

Nitpick comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 301-326: The EditIgnitionForDualStack function declares an unused
parameter platform; remove the platform parameter from the function signature
and update all call sites to stop passing a platform value (e.g., the
AWS-specific caller that currently passes an awstypes.Name); adjust any
imports/usages accordingly, recompile and run tests to ensure there are no
remaining references.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 18403ca0-136a-4343-855d-b85b8809d252

📥 Commits

Reviewing files that changed from the base of the PR and between 1dd313c and e47210e.

📒 Files selected for processing (13)
  • pkg/asset/installconfig/aws/route53.go
  • pkg/asset/machines/aws/awsmachines.go
  • pkg/asset/machines/clusterapi.go
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/aws/zones.go
  • pkg/asset/manifests/aws/zones_test.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/aws/clusterapi/ignition.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/infrastructure/clusterapi/ignition.go
  • pkg/infrastructure/clusterapi/types.go
  • pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/infrastructure/gcp/clusterapi/ignition.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/asset/manifests/aws/zones_test.go

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)

564-603: ⚠️ Potential issue | 🟠 Major

isSSHRuleGone may report success prematurely for private installs.

The function checks for hardcoded 0.0.0.0/0 and ::/0 CIDRs, but removeSSHRule identifies bootstrap SSH rules by their description (BootstrapSSHDescription). For private installs, bootstrap SSH rules use VPC/subnet CIDRs (from sshRuleCidrs), not 0.0.0.0/0. This means isSSHRuleGone will return true before the actual bootstrap SSH rules are deleted.

Consider matching by BootstrapSSHDescription (consistent with removeSSHRule) instead of hardcoded CIDRs:

🛠️ Proposed fix to match rules by description
 func isSSHRuleGone(ctx context.Context, client *ec2.Client, sgID string) (bool, error) {
 	// ... existing SG fetch code ...

 	sg := sgs[0]
 	for _, rule := range sg.IpPermissions {
 		if ptr.Deref(rule.ToPort, 0) != 22 {
 			continue
 		}
-		// Check IPv4 rules
+		// Check IPv4 rules for bootstrap SSH by description
 		for _, source := range rule.IpRanges {
-			if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with IPv4 CIDR %s. Still waiting for deletion...", ptr.Deref(source.CidrIp, ""))
 				return false, nil
 			}
 		}
-		// Check IPv6 rules
+		// Check IPv6 rules for bootstrap SSH by description
 		for _, source := range rule.Ipv6Ranges {
-			if source.CidrIpv6 != nil && *source.CidrIpv6 == "::/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with IPv6 CIDR %s. Still waiting for deletion...", ptr.Deref(source.CidrIpv6, ""))
 				return false, nil
 			}
 		}
 	}

 	return true, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 564 - 603,
isSSHRuleGone currently looks for hardcoded CIDRs (0.0.0.0/0 and ::/0) which
misses bootstrap SSH rules that are identified by description in removeSSHRule;
update isSSHRuleGone to detect any SSH ingress rules (ToPort == 22) that have
the BootstrapSSHDescription (use the same BootstrapSSHDescription symbol) in
their Description fields for both IpRanges and Ipv6Ranges (and any other
permission fields you use for bootstrap rules), and treat presence of those
described rules as "not gone" (return false), otherwise return true; keep the
existing error handling and logging (use ptr.Deref on Description like existing
code) so behavior matches removeSSHRule's identification logic.
♻️ Duplicate comments (1)
pkg/infrastructure/clusterapi/ignition.go (1)

328-378: ⚠️ Potential issue | 🟠 Major

Silent success when clusterConfigDataFile is not found may mask issues.

If clusterConfigDataFile ("/opt/openshift/manifests/cluster-config.yaml") is not present in config.Storage.Files, the function returns nil without an error. Given that updateMachineNetworks is called specifically to update machine networks for dual-stack, a missing cluster-config file is likely an unexpected state that should fail the install rather than silently succeed.

🛠️ Proposed fix to fail when cluster-config is not found
 func updateMachineNetworks(in IgnitionInput, config *igntypes.Config, machineNetworks []types.MachineNetworkEntry) error {
+	found := false
 	for i, fileData := range config.Storage.Files {
 		if fileData.Path != clusterConfigDataFile {
 			continue
 		}
+		found = true

 		contents := strings.Split(*config.Storage.Files[i].Contents.Source, ",")
 		// ... existing logic ...

 		encoded := fmt.Sprintf("%s%s", header, base64.StdEncoding.EncodeToString(configCMContents))
 		// replace the contents with the edited information
 		config.Storage.Files[i].Contents.Source = &encoded

-		break
+		return nil
 	}
-	return nil
+	if !found {
+		return fmt.Errorf("failed to find %s in bootstrap ignition", clusterConfigDataFile)
+	}
+	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/clusterapi/ignition.go` around lines 328 - 378, The
function updateMachineNetworks currently iterates config.Storage.Files for
clusterConfigDataFile and silently returns nil if not found; change it to return
an explicit error when the loop finishes without locating clusterConfigDataFile
so the caller knows the expected cluster-config manifest was missing. Locate
updateMachineNetworks and the loop that checks fileData.Path !=
clusterConfigDataFile, and after the loop (or by tracking a found boolean)
return a descriptive fmt.Errorf mentioning clusterConfigDataFile when it was not
found instead of nil; keep all existing decoding/marshalling behavior unchanged
when the file is present.
🧹 Nitpick comments (1)
pkg/asset/manifests/capiutils/helpers.go (1)

57-77: Consider renaming loop variable to avoid shadowing the ipnet package.

The loop variable ipnet shadows the imported package name. While this works correctly here (the package isn't accessed inside the loop), it can cause confusion during maintenance.

♻️ Suggested refactor
 // GetIPv4CIDRs returns only IPNets of IPv4 family.
 func GetIPv4CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv4Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv4CIDR(&ipnet.IPNet) {
-			ipv4Nets = append(ipv4Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv4CIDR(&cidr.IPNet) {
+			ipv4Nets = append(ipv4Nets, cidr)
 		}
 	}
 	return ipv4Nets
 }

 // GetIPv6CIDRs returns only IPNets of IPv6 family.
 func GetIPv6CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv6Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv6CIDR(&ipnet.IPNet) {
-			ipv6Nets = append(ipv6Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv6CIDR(&cidr.IPNet) {
+			ipv6Nets = append(ipv6Nets, cidr)
 		}
 	}
 	return ipv6Nets
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/manifests/capiutils/helpers.go` around lines 57 - 77, The loop
variable name `ipnet` shadows the imported package `ipnet`, causing confusion;
in both GetIPv4CIDRs and GetIPv6CIDRs, rename the loop variable (e.g., use `n`,
`netItem`, or `cidr`) and update references inside the loop (e.g.,
netutils.IsIPv4CIDR(&netItem.IPNet) and ipv4Nets = append(ipv4Nets, netItem)) so
the code no longer shadows the package while preserving use of the ipnet.IPNet
type and behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 564-603: isSSHRuleGone currently looks for hardcoded CIDRs
(0.0.0.0/0 and ::/0) which misses bootstrap SSH rules that are identified by
description in removeSSHRule; update isSSHRuleGone to detect any SSH ingress
rules (ToPort == 22) that have the BootstrapSSHDescription (use the same
BootstrapSSHDescription symbol) in their Description fields for both IpRanges
and Ipv6Ranges (and any other permission fields you use for bootstrap rules),
and treat presence of those described rules as "not gone" (return false),
otherwise return true; keep the existing error handling and logging (use
ptr.Deref on Description like existing code) so behavior matches removeSSHRule's
identification logic.

---

Duplicate comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-378: The function updateMachineNetworks currently iterates
config.Storage.Files for clusterConfigDataFile and silently returns nil if not
found; change it to return an explicit error when the loop finishes without
locating clusterConfigDataFile so the caller knows the expected cluster-config
manifest was missing. Locate updateMachineNetworks and the loop that checks
fileData.Path != clusterConfigDataFile, and after the loop (or by tracking a
found boolean) return a descriptive fmt.Errorf mentioning clusterConfigDataFile
when it was not found instead of nil; keep all existing decoding/marshalling
behavior unchanged when the file is present.

---

Nitpick comments:
In `@pkg/asset/manifests/capiutils/helpers.go`:
- Around line 57-77: The loop variable name `ipnet` shadows the imported package
`ipnet`, causing confusion; in both GetIPv4CIDRs and GetIPv6CIDRs, rename the
loop variable (e.g., use `n`, `netItem`, or `cidr`) and update references inside
the loop (e.g., netutils.IsIPv4CIDR(&netItem.IPNet) and ipv4Nets =
append(ipv4Nets, netItem)) so the code no longer shadows the package while
preserving use of the ipnet.IPNet type and behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd03415a-daaa-4907-8723-6655733350bb

📥 Commits

Reviewing files that changed from the base of the PR and between e47210e and ba7f608.

📒 Files selected for processing (8)
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/aws/clusterapi/ignition.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/infrastructure/clusterapi/ignition.go
  • pkg/infrastructure/clusterapi/types.go
  • pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/infrastructure/azure/ignition.go

@tthvo
Member Author

tthvo commented Mar 10, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177

@tthvo
Member Author

tthvo commented Mar 10, 2026

Phew, I think all CodeRabbit reviews are now addressed and the PR is rebased on top of latest main. PTAL 🙏

In case anyone misses it due to the long bot review, the latest comments/discussions were posted here: #10353 (comment)

@tthvo
Member Author

tthvo commented Mar 10, 2026

/test e2e-aws-ovn e2e-aws-ovn-edge-zones e2e-aws-ovn-shared-vpc-edge-zones

1 similar comment
@tthvo
Member Author

tthvo commented Mar 10, 2026

/test e2e-aws-ovn e2e-aws-ovn-edge-zones e2e-aws-ovn-shared-vpc-edge-zones

func(awsCluster *capa.AWSCluster, vpcIPv6CIDR string) bool {
if slices.Contains(awsCluster.Spec.NetworkSpec.NodePortIngressRuleCidrBlocks, vpcIPv6CIDR) {
logrus.Debugf("VPC IPv6 CIDR %s already in node port ingress rules", vpcIPv6CIDR)
return false
Contributor

@sadasu sadasu Mar 10, 2026

Shouldn't this return true also since the vpcIPv6CIDR is already present in awsCluster.Spec.NetworkSpec.NodePortIngressRuleCidrBlocks?

Contributor

Or is it supposed to indicate if NodePortIngressRuleCidrBlocks needs to be updated?

Member Author

Right, the bool result indicates whether a change was made to the awscluster. If false, it skips calling Update on the local envtest and goes straight to checking with the AWS API.

Member Author

On line 377:

// Update the spec using the provided function
if updateSpecFn(awsCluster, vpcIPv6CIDR) {
    // Update the AWSCluster resource
    if err := cl.Update(ctx, awsCluster); err != nil {

@openshift-ci
Contributor

openshift-ci bot commented Mar 10, 2026

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/e2e-aws-custom-dns-techpreview | 82cff60 | link | false | /test e2e-aws-custom-dns-techpreview
ci/prow/e2e-azure-ovn-shared-vpc | ba7f608 | link | false | /test e2e-azure-ovn-shared-vpc
ci/prow/e2e-gcp-custom-dns | ba7f608 | link | false | /test e2e-gcp-custom-dns
ci/prow/e2e-azurestack | ba7f608 | link | false | /test e2e-azurestack
ci/prow/gcp-custom-endpoints-proxy-wif | ba7f608 | link | false | /test gcp-custom-endpoints-proxy-wif
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones | ba7f608 | link | false | /test e2e-aws-ovn-shared-vpc-edge-zones
ci/prow/e2e-aws-ovn-edge-zones | ba7f608 | link | false | /test e2e-aws-ovn-edge-zones
ci/prow/e2e-aws-ovn-heterogeneous | ba7f608 | link | false | /test e2e-aws-ovn-heterogeneous
ci/prow/e2e-gcp-xpn-custom-dns | ba7f608 | link | false | /test e2e-gcp-xpn-custom-dns


Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

tthvo added 5 commits March 10, 2026 11:24
Note: Defining a non-nil (but empty) vpc.ipv6 block will configure cAPA
to provision the network infrastructure with IPv6 capabilities.
The machines need to be configured to:
- Use the resource-name hostname type in order to have A and AAAA records
  for their hostnames.
- Assign a primary IPv6 address in order to register with the IPv6 target
  group when dualstack is IPv6 primary.
- Enable the IPv6 IMDS HTTP endpoint.
The VPC IPv6 CIDR is not available at install time (when the installer
provisions the VPC). Thus, we need to add it to the node port rules
later on and ensure the rules are available.
The etcd operator requires IPv6 machine networks in dualstack
networking. Since we may not know the VPC IPv6 CIDR at install time, we
need to add it later once the infrastructure is ready.
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 10, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)

584-599: ⚠️ Potential issue | 🟠 Major

Match bootstrap SSH rules by description, not by 0.0.0.0/0 / ::/0.

removeSSHRule removes entries by awsmanifest.BootstrapSSHDescription, but this verifier only waits on world-open CIDRs. In private or BYO-subnet installs the bootstrap SSH rule can be scoped to cluster/VPC CIDRs instead, so this can return true before the bootstrap SSH rule has actually been removed. Check both IpRanges and Ipv6Ranges by description to keep the verifier aligned with the removal logic.

Proposed fix
 	for _, rule := range sg.IpPermissions {
 		if ptr.Deref(rule.ToPort, 0) != 22 {
 			continue
 		}
 		// Check IPv4 rules
 		for _, source := range rule.IpRanges {
-			if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with source cidr %s. Still waiting for deletion...", ptr.Deref(source.CidrIp, ""))
 				return false, nil
 			}
 		}
 		// Check IPv6 rules
 		for _, source := range rule.Ipv6Ranges {
-			if source.CidrIpv6 != nil && *source.CidrIpv6 == "::/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with source cidr %s. Still waiting for deletion...", ptr.Deref(source.CidrIpv6, ""))
 				return false, nil
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 584 - 599, The
verifier currently only checks for world-open CIDRs when scanning rule.IpRanges
and rule.Ipv6Ranges, which can miss bootstrap SSH rules that are scoped to
non-world CIDRs; update the checks in the loops that iterate rule.IpRanges and
rule.Ipv6Ranges to look for source.Description matching
awsmanifest.BootstrapSSHDescription (use ptr.Deref or nil-safe compare) and
return false if such a description is found (so the verifier aligns with
removeSSHRule which deletes by awsmanifest.BootstrapSSHDescription); preserve or
remove the CIDR checks as desired, but ensure the description-based check is
present for both IpRanges and Ipv6Ranges.
🧹 Nitpick comments (1)
pkg/asset/manifests/capiutils/helpers.go (1)

58-76: Minor: Variable shadows imported package name.

The loop variable ipnet shadows the imported github.com/openshift/installer/pkg/ipnet package. While this doesn't cause functional issues since the package isn't used inside the loop body, renaming to cidr would improve clarity.

♻️ Suggested improvement
 func GetIPv4CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv4Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv4CIDR(&ipnet.IPNet) {
-			ipv4Nets = append(ipv4Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv4CIDR(&cidr.IPNet) {
+			ipv4Nets = append(ipv4Nets, cidr)
 		}
 	}
 	return ipv4Nets
 }

 func GetIPv6CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv6Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv6CIDR(&ipnet.IPNet) {
-			ipv6Nets = append(ipv6Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv6CIDR(&cidr.IPNet) {
+			ipv6Nets = append(ipv6Nets, cidr)
 		}
 	}
 	return ipv6Nets
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/manifests/capiutils/helpers.go` around lines 58 - 76, The loop
variable name `ipnet` in GetIPv4CIDRs and GetIPv6CIDRs shadows the imported
ipnet package; rename the loop variable (e.g., to `cidr`) in both functions and
update all uses inside the loops (references to `ipnet.IPNet` -> `cidr.IPNet`)
to avoid shadowing and improve clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-377: The updateMachineNetworks function currently returns nil
when no file with path clusterConfigDataFile is found in config.Storage.Files,
causing silent success; modify updateMachineNetworks to detect when the loop
does not find clusterConfigDataFile (e.g., set a found flag or return after
handling) and return a descriptive error if the file is missing so callers know
the ignition lacked the expected cluster-config file; reference the loop over
config.Storage.Files, the clusterConfigDataFile constant and the break after
updating config.Storage.Files[i].Contents.Source to locate where to add this
check.

---

Duplicate comments:
In `@pkg/infrastructure/aws/clusterapi/aws.go`:
- Around line 584-599: The verifier currently only checks for world-open CIDRs
when scanning rule.IpRanges and rule.Ipv6Ranges, which can miss bootstrap SSH
rules that are scoped to non-world CIDRs; update the checks in the loops that
iterate rule.IpRanges and rule.Ipv6Ranges to look for source.Description
matching awsmanifest.BootstrapSSHDescription (use ptr.Deref or nil-safe compare)
and return false if such a description is found (so the verifier aligns with
removeSSHRule which deletes by awsmanifest.BootstrapSSHDescription); preserve or
remove the CIDR checks as desired, but ensure the description-based check is
present for both IpRanges and Ipv6Ranges.

---

Nitpick comments:
In `@pkg/asset/manifests/capiutils/helpers.go`:
- Around line 58-76: The loop variable name `ipnet` in GetIPv4CIDRs and
GetIPv6CIDRs shadows the imported ipnet package; rename the loop variable (e.g.,
to `cidr`) in both functions and update all uses inside the loops (references to
`ipnet.IPNet` -> `cidr.IPNet`) to avoid shadowing and improve clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31d01973-45b0-409a-b123-1fcf7db806c4

📥 Commits

Reviewing files that changed from the base of the PR and between ba7f608 and 52243fa.

📒 Files selected for processing (13)
  • pkg/asset/installconfig/aws/route53.go
  • pkg/asset/machines/aws/awsmachines.go
  • pkg/asset/machines/clusterapi.go
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/aws/zones.go
  • pkg/asset/manifests/aws/zones_test.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/aws/clusterapi/ignition.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/infrastructure/clusterapi/ignition.go
  • pkg/infrastructure/clusterapi/types.go
  • pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/infrastructure/azure/ignition.go
  • pkg/asset/machines/clusterapi.go

@tthvo
Member Author

tthvo commented Mar 10, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. platform/aws
