USHIFT-7187: C2CC Dual Stack support & tests #6954

Draft
pmtk wants to merge 7 commits into openshift:main from pmtk:c2cc/dual-stack

Conversation

@pmtk

@pmtk pmtk commented Jun 26, 2026

Member

Summary by CodeRabbit

  • New Features

    • Remote cluster routing now supports separate next-hop addresses for IPv4 and IPv6, improving dual-stack handling.
    • Added dual-stack test coverage for connectivity, infrastructure, cleanup, and recovery scenarios.
  • Bug Fixes

    • Routing and rule checks now select the correct next hop for each address family.
    • Validation now catches missing, duplicate, or invalid next-hop entries more consistently.
  • Documentation

    • Updated configuration examples and defaults to show next-hop values as lists.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 26, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 26, 2026

@pmtk: This pull request references USHIFT-7187 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026
@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Contributor

Walkthrough

The PR changes C2CC remote next-hop handling from single strings to per-family arrays, updates controller route and RemoteCluster naming logic to use family-specific next hops, and expands the dual-stack test harness, scenarios, and Robot suites to cover IPv4/IPv6 flows.

Changes

Dual-stack C2CC routing

| Layer | File(s) | Summary |
|---|---|---|
| Schema and examples | cmd/generate-config/config/config-openapi-spec.json, packaging/microshift/config.yaml, docs/user/howto_config.md | Remote next-hop fields are represented as arrays in the OpenAPI spec, packaging config, and user-facing examples. |
| Config parsing and validation | pkg/config/c2cc.go | RemoteCluster and ResolvedRemoteCluster store next hops by IP family, parse slice inputs, and enforce per-family coverage and duplicate checks. |
| Config tests | pkg/config/c2cc_test.go | Config validation, probe, DNS, rendering, and user-settings tests use slice-based next hops and family-keyed resolved hops. |
| RemoteCluster naming and fixtures | pkg/controllers/c2cc/healthcheck.go, pkg/controllers/c2cc/healthcheck_test.go, pkg/controllers/c2cc/helpers_test.go | RemoteCluster CR names derive from the primary next hop, and controller test helpers build resolved next-hop maps from parsed hop lists. |
| Route selection | pkg/controllers/c2cc/ovn.go, pkg/controllers/c2cc/ovn_test.go, pkg/controllers/c2cc/routes.go | OVN route generation and Linux route reconciliation select gateways per CIDR family and use precomputed destination keys. |
| Dual-stack harness and keywords | test/bin/c2cc_common.sh, test/resources/c2cc.resource, test/assets/c2cc/hello-microshift.yaml | Shell helpers, shared Robot keywords, and the service fixture add dual-stack CIDRs, IPv6-aware host discovery, family-aware IP commands, and dual-stack verification helpers. |
| Bootc scenario scripts | test/scenarios-bootc/c2cc/*dual-stack*.sh | The bootc scenario scripts add dual-stack VM creation, teardown, and test execution entry points for the supported images. |
| Dual-stack suite coverage | test/suites/c2cc/*.robot | Robot suites add dual-stack cleanup, infrastructure, connectivity, probe, reconciliation, and disruption cases. |
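
The per-family parsing and validation summarized above can be sketched roughly as follows. This is a minimal illustration and not the code in pkg/config/c2cc.go: the names `familyV4`, `familyV6`, `familyOf`, and `parseNextHops` are hypothetical, the integer keys merely mirror the Linux AF_INET/AF_INET6 values, and the error wording only approximates the messages quoted elsewhere in this review.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Hypothetical stand-ins for the address-family keys (AF_INET and AF_INET6
// on Linux). The key type used by the PR may differ.
const (
	familyV4 = 2
	familyV6 = 10
)

// familyOf reports the address family of ip.
func familyOf(ip net.IP) int {
	if ip.To4() != nil {
		return familyV4
	}
	return familyV6
}

// parseNextHops turns a nextHop list into a family-keyed map. It rejects an
// empty list, unparsable entries, and more than one hop per family.
func parseNextHops(raw []string) (map[int]net.IP, error) {
	if len(raw) == 0 {
		return nil, errors.New("nextHop must not be empty")
	}
	hops := make(map[int]net.IP, 2)
	for _, s := range raw {
		ip := net.ParseIP(s)
		if ip == nil {
			return nil, fmt.Errorf("nextHop %q is not a valid IP address", s)
		}
		fam := familyOf(ip)
		if _, dup := hops[fam]; dup {
			return nil, fmt.Errorf("nextHop %q: multiple addresses for one family (max 1 per family)", s)
		}
		hops[fam] = ip
	}
	return hops, nil
}

func main() {
	inputs := [][]string{
		{"10.100.0.2", "fd00::2"},    // dual-stack: one hop per family
		{"10.100.0.2", "10.100.0.3"}, // two IPv4 hops: rejected
		{},                           // empty list: rejected
	}
	for _, in := range inputs {
		hops, err := parseNextHops(in)
		fmt.Println(hops, err)
	}
}
```

A family-keyed map makes the "max 1 per family" rule a simple key-collision check. The per-family coverage check mentioned in the table (every configured CIDR family has a matching hop) is left out of this sketch.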

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • openshift/microshift#6930: Extends the same C2CC disruption path by passing disrupted_ipv6 into Verify RemoteCluster Unhealthy On Observers.

Suggested reviewers

  • copejon
  • jerpeter1
  • jogeo
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.70% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the PR’s main change: C2CC dual-stack support and related tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Changed test files use standard Test* functions only; no Ginkgo It/Describe/Context/When titles or dynamic values were added. |
| Test Structure And Quality | ✅ Passed | The changed Go tests are table-driven unit tests, not Ginkgo; no cluster resources, waits, or cleanup risks were introduced, and shared setup uses t.Cleanup. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; changed Go tests are plain t.Run/testing, with no MicroShift-incompatible OpenShift APIs or guards needed. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the changed Go tests use testing.T, and the rest are Robot/Bash scenarios, so this SNO check doesn't apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Modified C2CC controllers only change route/health reconciliation; I found no affinity, nodeSelector, topology spread, replica, or PDB logic. |
| Ote Binary Stdout Contract | ✅ Passed | No PR-scope main/TestMain/RunSpecs/init code or stdout prints were added; touched files are controllers, unit tests, docs, and scripts only. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the changed Robot/bash tests use cluster-internal DNS/IPs and IPv6-aware helpers, with no public internet dependencies. |
| No-Weak-Crypto | ✅ Passed | Changed logic is C2CC routing/tests only; I found no added MD5/SHA1/DES/RC4/custom crypto or secret/token comparisons. |
| Container-Privileges | ✅ Passed | Touched manifest(s) keep allowPrivilegeEscalation: false, runAsNonRoot: true, runAsUser: 1001; no privileged, host namespace, or SYS_ADMIN settings were added. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No new Log/echo/printf calls were added in touched files; the dual-stack changes only compute/compare IPs and CIDRs, not secrets or PII. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2."
level=warning msg="Suggested new configuration:\nlinters:\n enable:\n - gomodguard_v2\n"
level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/apparentlymart/go-cidr@v1.1.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/coreos/go-systemd@v0.0.0-20190321100706-95778dfbb74e: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/go-cmp@v0.7.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/miekg/dns@v1.1.63: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/openshift/api@v0.0.0-20260511191110-9b69e5fa27e9: is

... [truncated 31032 characters] ...

elet: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/metrics: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/mount-utils: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/pod-security-admission: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-apiserver: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-cli-plugin: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-controller: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands.

@pmtk

pmtk commented Jun 26, 2026

Member Author

/test ?

@pmtk

pmtk commented Jun 26, 2026

Member Author

/test e2e-aws-tests-bootc-c2cc e2e-aws-tests-bootc-c2cc-arm

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/suites/c2cc/disruptive.robot (1)

58-62: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Populate NIC teardown state before the outage step.

Both NIC-outage flows still assign ${DISABLED_VM} / @{DISABLED_IFACES} only after Disable All NICs For VM returns. If that keyword fails mid-way, teardown has no interface list and Restore NICs And Reconnect cannot recover the VM. Please move the state capture ahead of the disruptive call or make the keyword set teardown state on failure paths too.

Based on learnings: "setting ${DISABLED_VM} before calling Disable All NICs For VM is not sufficient for recovery... @{DISABLED_IFACES} will remain empty... populate reliably even when Disable All NICs For VM errors."

Also applies to: 97-101

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/suites/c2cc/disruptive.robot` around lines 58 - 62, Populate the
teardown state before calling Disable All NICs For VM so recovery can still run
if that keyword fails. In the disruptive flow using ${DISABLED_VM},
@{DISABLED_IFACES}, and Restore NICs And Reconnect, make sure the VM name and
interface list are captured even on failure paths, either by assigning them
before the NIC-disable step or by having Disable All NICs For VM set the
teardown variables itself. Apply the same fix to the other NIC-outage flow
mentioned in the comment.

Source: Learnings

🧹 Nitpick comments (3)
pkg/config/c2cc_test.go (1)

149-209: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add coverage for the two new parseRemoteClusters branches.

The migration is thorough, but two new validation paths in c2cc.go have no test:

  • empty NextHop → "nextHop must not be empty" (c2cc.go:186-188)
  • two same-family hops → "multiple IPv4/IPv6 addresses (max 1 per family)" (c2cc.go:200-202)
💚 Suggested cases
{
	name: "empty NextHop",
	cfg: mkC2CCConfig(C2CC{
		RemoteClusters: []RemoteCluster{{
			NextHop:        []string{},
			ClusterNetwork: []string{"10.45.0.0/16"},
			ServiceNetwork: []string{"10.46.0.0/16"},
		}},
	}),
	expectErr: true,
	errMsg:    "nextHop must not be empty",
},
{
	name: "two IPv4 nextHops",
	cfg: mkC2CCConfig(C2CC{
		RemoteClusters: []RemoteCluster{{
			NextHop:        []string{"10.100.0.2", "10.100.0.3"},
			ClusterNetwork: []string{"10.45.0.0/16"},
			ServiceNetwork: []string{"10.46.0.0/16"},
		}},
	}),
	expectErr: true,
	errMsg:    "max 1 per family",
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/config/c2cc_test.go` around lines 149 - 209, Add test coverage in the
c2cc validation table for the new parseRemoteClusters branches in c2cc.go by
adding cases in c2cc_test.go for an empty NextHop and for duplicate same-family
hops. Use mkC2CCConfig with RemoteClusters to verify the empty NextHop returns
the “nextHop must not be empty” error, and a case like two IPv4 next hops to
assert the “multiple IPv4/IPv6 addresses (max 1 per family)” message.
pkg/controllers/c2cc/helpers_test.go (1)

43-56: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use netlink family constants instead of magic numbers.

The 2/10 literals must stay in lockstep with whatever ipFamilyOf/PrimaryNextHop key the map by. Referencing netlink.FAMILY_V4/netlink.FAMILY_V6 (already a dependency) makes the test self-documenting and immune to constant drift.

♻️ Proposed change
-		family := 2 // FAMILY_V4
-		if ip.To4() == nil {
-			family = 10 // FAMILY_V6
-		}
+		family := netlink.FAMILY_V4
+		if ip.To4() == nil {
+			family = netlink.FAMILY_V6
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/helpers_test.go` around lines 43 - 56, The helper
parseNextHops currently uses hardcoded family values, which should be replaced
with the netlink family constants so the test stays aligned with
ipFamilyOf/PrimaryNextHop behavior. Update parseNextHops in helpers_test.go to
key the map using netlink.FAMILY_V4 and netlink.FAMILY_V6 instead of 2 and 10,
keeping the logic the same but making the test self-documenting and resilient to
constant drift.
pkg/controllers/c2cc/routes.go (1)

22-24: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Parallel slices + map are redundant and risk a duplicate-CIDR collision.

desiredDstKeys[i] is always desiredDsts[i].String(), so desiredGWs (keyed by the same string) just maps back to a gateway you could store as a parallel []net.IP. More importantly, if two resolved entries contribute the same CIDR string, the map collapses to a single gateway while both slice entries survive — both routes then resolve to the last-written gateway. A small struct ({dst, key, gw}) avoids the desync and the collision class entirely.

Also applies to: 37-47, 56-59, 88-89

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/routes.go` around lines 22 - 24, The route state in c2cc
routes is split across parallel slices plus a gateway map, which can
desynchronize and collapse duplicate CIDRs. Refactor the data model around the
existing route-handling logic in routes.go so each entry keeps its destination,
derived key, and gateway together in a single struct instead of relying on
desiredDsts, desiredDstKeys, and desiredGWs separately. Update the code paths
that populate, compare, and consume these values to use the new struct
consistently so duplicate CIDR resolutions remain distinct and each route
retains its correct gateway.
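
One way to picture the single-struct model suggested in this comment is the sketch below. It is an illustration under stated assumptions, not the code in routes.go: `remote`, `routeEntry`, `isV4`, and `buildRoutes` are hypothetical names, and the per-remote `gw4`/`gw6` string fields stand in for whatever family-keyed next-hop structure the PR actually uses.

```go
package main

import (
	"fmt"
	"net"
)

// remote is a hypothetical, simplified view of one resolved remote cluster.
type remote struct {
	cidrs    []string
	gw4, gw6 string // empty when that family is not configured
}

// routeEntry keeps a destination, its comparison key, and its gateway
// together so they cannot drift apart.
type routeEntry struct {
	dst *net.IPNet
	key string
	gw  net.IP
}

func isV4(ip net.IP) bool { return ip.To4() != nil }

// buildRoutes pairs every CIDR with the next hop of the same family from
// the same remote. Duplicate CIDRs stay as separate entries, each with the
// gateway of the remote that contributed it.
func buildRoutes(remotes []remote) ([]routeEntry, error) {
	var routes []routeEntry
	for _, r := range remotes {
		for _, c := range r.cidrs {
			_, dst, err := net.ParseCIDR(c)
			if err != nil {
				return nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
			}
			raw := r.gw6
			if isV4(dst.IP) {
				raw = r.gw4
			}
			gw := net.ParseIP(raw)
			if gw == nil || isV4(gw) != isV4(dst.IP) {
				return nil, fmt.Errorf("no usable next hop for %s", dst)
			}
			routes = append(routes, routeEntry{dst: dst, key: dst.String(), gw: gw})
		}
	}
	return routes, nil
}

func main() {
	routes, err := buildRoutes([]remote{
		{cidrs: []string{"10.45.0.0/16", "fd01::/48"}, gw4: "10.100.0.2", gw6: "fd00::2"},
		{cidrs: []string{"10.45.0.0/16"}, gw4: "10.100.0.3"}, // same CIDR, different gateway
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range routes {
		fmt.Printf("%s via %s\n", r.key, r.gw)
	}
}
```

Because each entry carries its own gateway, two remotes that contribute the same CIDR remain visible as two entries instead of silently sharing the last-written map value. Whether such a conflict should then be rejected or reconciled is a separate decision that this sketch does not make.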
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/suites/c2cc/disruptive.robot`:
- Around line 58-62: Populate the teardown state before calling Disable All NICs
For VM so recovery can still run if that keyword fails. In the disruptive flow
using ${DISABLED_VM}, @{DISABLED_IFACES}, and Restore NICs And Reconnect, make
sure the VM name and interface list are captured even on failure paths, either
by assigning them before the NIC-disable step or by having Disable All NICs For
VM set the teardown variables itself. Apply the same fix to the other NIC-outage
flow mentioned in the comment.

---

Nitpick comments:
In `@pkg/config/c2cc_test.go`:
- Around line 149-209: Add test coverage in the c2cc validation table for the
new parseRemoteClusters branches in c2cc.go by adding cases in c2cc_test.go for
an empty NextHop and for duplicate same-family hops. Use mkC2CCConfig with
RemoteClusters to verify the empty NextHop returns the “nextHop must not be
empty” error, and a case like two IPv4 next hops to assert the “multiple
IPv4/IPv6 addresses (max 1 per family)” message.

In `@pkg/controllers/c2cc/helpers_test.go`:
- Around line 43-56: The helper parseNextHops currently uses hardcoded family
values, which should be replaced with the netlink family constants so the test
stays aligned with ipFamilyOf/PrimaryNextHop behavior. Update parseNextHops in
helpers_test.go to key the map using netlink.FAMILY_V4 and netlink.FAMILY_V6
instead of 2 and 10, keeping the logic the same but making the test
self-documenting and resilient to constant drift.

In `@pkg/controllers/c2cc/routes.go`:
- Around line 22-24: The route state in c2cc routes is split across parallel
slices plus a gateway map, which can desynchronize and collapse duplicate CIDRs.
Refactor the data model around the existing route-handling logic in routes.go so
each entry keeps its destination, derived key, and gateway together in a single
struct instead of relying on desiredDsts, desiredDstKeys, and desiredGWs
separately. Update the code paths that populate, compare, and consume these
values to use the new struct consistently so duplicate CIDR resolutions remain
distinct and each route retains its correct gateway.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 47cfa155-ce92-47b9-ba43-bf984328fd88

📥 Commits

Reviewing files that changed from the base of the PR and between cf4e066 and 2f5de13.

⛔ Files ignored due to path filters (1)
  • etcd/vendor/github.com/openshift/microshift/pkg/config/c2cc.go is excluded by !**/vendor/**
📒 Files selected for processing (22)
  • cmd/generate-config/config/config-openapi-spec.json
  • docs/user/howto_config.md
  • packaging/microshift/config.yaml
  • pkg/config/c2cc.go
  • pkg/config/c2cc_test.go
  • pkg/controllers/c2cc/healthcheck.go
  • pkg/controllers/c2cc/healthcheck_test.go
  • pkg/controllers/c2cc/helpers_test.go
  • pkg/controllers/c2cc/ovn.go
  • pkg/controllers/c2cc/ovn_test.go
  • pkg/controllers/c2cc/routes.go
  • test/assets/c2cc/hello-microshift.yaml
  • test/bin/c2cc_common.sh
  • test/resources/c2cc.resource
  • test/scenarios-bootc/c2cc/el102-src@c2cc-dual-stack.sh
  • test/scenarios-bootc/c2cc/el98-src@c2cc-dual-stack-v6.sh
  • test/suites/c2cc/cleanup.robot
  • test/suites/c2cc/connectivity.robot
  • test/suites/c2cc/disruptive.robot
  • test/suites/c2cc/infrastructure.robot
  • test/suites/c2cc/probe.robot
  • test/suites/c2cc/reconciliation.robot

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Contributor

@pmtk: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-tests-bootc-c2cc | 2f5de13 | link | true | /test e2e-aws-tests-bootc-c2cc |
| ci/prow/e2e-aws-tests-bootc-c2cc-arm | 2f5de13 | link | true | /test e2e-aws-tests-bootc-c2cc-arm |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants