USHIFT-7187: C2CC Dual Stack support & tests#6954
Conversation
|
@pmtk: This pull request references USHIFT-7187 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughThe PR changes C2CC remote next-hop handling from single strings to per-family arrays, updates controller route and RemoteCluster naming logic to use family-specific next hops, and expands the dual-stack test harness, scenarios, and Robot suites to cover IPv4/IPv6 flows. ChangesDual-stack C2CC routing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2." ... [truncated 31032 characters] ... elet: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/metrics: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/mount-utils: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/pod-security-admission: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-apiserver: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-cli-plugin: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\tk8s.io/sample-controller: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
/test ? |
|
/test e2e-aws-tests-bootc-c2cc e2e-aws-tests-bootc-c2cc-arm |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/suites/c2cc/disruptive.robot (1)
58-62: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftPopulate NIC teardown state before the outage step.
Both NIC-outage flows still assign
${DISABLED_VM}/@{DISABLED_IFACES}only afterDisable All NICs For VMreturns. If that keyword fails mid-way, teardown has no interface list andRestore NICs And Reconnectcannot recover the VM. Please move the state capture ahead of the disruptive call or make the keyword set teardown state on failure paths too.Based on learnings: "setting
${DISABLED_VM}before callingDisable All NICs For VMis not sufficient for recovery...@{DISABLED_IFACES}will remain empty... populate reliably even whenDisable All NICs For VMerrors."Also applies to: 97-101
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/suites/c2cc/disruptive.robot` around lines 58 - 62, Populate the teardown state before calling Disable All NICs For VM so recovery can still run if that keyword fails. In the disruptive flow using ${DISABLED_VM}, @{DISABLED_IFACES}, and Restore NICs And Reconnect, make sure the VM name and interface list are captured even on failure paths, either by assigning them before the NIC-disable step or by having Disable All NICs For VM set the teardown variables itself. Apply the same fix to the other NIC-outage flow mentioned in the comment.Source: Learnings
🧹 Nitpick comments (3)
pkg/config/c2cc_test.go (1)
149-209: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd coverage for the two new
parseRemoteClustersbranches.The migration is thorough, but two new validation paths in
c2cc.gohave no test:
- empty
NextHop→"nextHop must not be empty"(c2cc.go:186-188)- two same-family hops →
"multiple IPv4/IPv6 addresses (max 1 per family)"(c2cc.go:200-202)💚 Suggested cases
{ name: "empty NextHop", cfg: mkC2CCConfig(C2CC{ RemoteClusters: []RemoteCluster{{ NextHop: []string{}, ClusterNetwork: []string{"10.45.0.0/16"}, ServiceNetwork: []string{"10.46.0.0/16"}, }}, }), expectErr: true, errMsg: "nextHop must not be empty", }, { name: "two IPv4 nextHops", cfg: mkC2CCConfig(C2CC{ RemoteClusters: []RemoteCluster{{ NextHop: []string{"10.100.0.2", "10.100.0.3"}, ClusterNetwork: []string{"10.45.0.0/16"}, ServiceNetwork: []string{"10.46.0.0/16"}, }}, }), expectErr: true, errMsg: "max 1 per family", },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/config/c2cc_test.go` around lines 149 - 209, Add test coverage in the c2cc validation table for the new parseRemoteClusters branches in c2cc.go by adding cases in c2cc_test.go for an empty NextHop and for duplicate same-family hops. Use mkC2CCConfig with RemoteClusters to verify the empty NextHop returns the “nextHop must not be empty” error, and a case like two IPv4 next hops to assert the “multiple IPv4/IPv6 addresses (max 1 per family)” message.pkg/controllers/c2cc/helpers_test.go (1)
43-56: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse netlink family constants instead of magic numbers.
The
2/10literals must stay in lockstep with whateveripFamilyOf/PrimaryNextHopkey the map by. Referencingnetlink.FAMILY_V4/netlink.FAMILY_V6(already a dependency) makes the test self-documenting and immune to constant drift.♻️ Proposed change
- family := 2 // FAMILY_V4 - if ip.To4() == nil { - family = 10 // FAMILY_V6 - } + family := netlink.FAMILY_V4 + if ip.To4() == nil { + family = netlink.FAMILY_V6 + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/helpers_test.go` around lines 43 - 56, The helper parseNextHops currently uses hardcoded family values, which should be replaced with the netlink family constants so the test stays aligned with ipFamilyOf/PrimaryNextHop behavior. Update parseNextHops in helpers_test.go to key the map using netlink.FAMILY_V4 and netlink.FAMILY_V6 instead of 2 and 10, keeping the logic the same but making the test self-documenting and resilient to constant drift.pkg/controllers/c2cc/routes.go (1)
22-24: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueParallel slices + map are redundant and risk a duplicate-CIDR collision.
desiredDstKeys[i]is alwaysdesiredDsts[i].String(), sodesiredGWs(keyed by the same string) just maps back to a gateway you could store as a parallel[]net.IP. More importantly, if two resolved entries contribute the same CIDR string, the map collapses to a single gateway while both slice entries survive — both routes then resolve to the last-written gateway. A small struct ({dst, key, gw}) avoids the desync and the collision class entirely.Also applies to: 37-47, 56-59, 88-89
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/routes.go` around lines 22 - 24, The route state in c2cc routes is split across parallel slices plus a gateway map, which can desynchronize and collapse duplicate CIDRs. Refactor the data model around the existing route-handling logic in routes.go so each entry keeps its destination, derived key, and gateway together in a single struct instead of relying on desiredDsts, desiredDstKeys, and desiredGWs separately. Update the code paths that populate, compare, and consume these values to use the new struct consistently so duplicate CIDR resolutions remain distinct and each route retains its correct gateway.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/suites/c2cc/disruptive.robot`:
- Around line 58-62: Populate the teardown state before calling Disable All NICs
For VM so recovery can still run if that keyword fails. In the disruptive flow
using ${DISABLED_VM}, @{DISABLED_IFACES}, and Restore NICs And Reconnect, make
sure the VM name and interface list are captured even on failure paths, either
by assigning them before the NIC-disable step or by having Disable All NICs For
VM set the teardown variables itself. Apply the same fix to the other NIC-outage
flow mentioned in the comment.
---
Nitpick comments:
In `@pkg/config/c2cc_test.go`:
- Around line 149-209: Add test coverage in the c2cc validation table for the
new parseRemoteClusters branches in c2cc.go by adding cases in c2cc_test.go for
an empty NextHop and for duplicate same-family hops. Use mkC2CCConfig with
RemoteClusters to verify the empty NextHop returns the “nextHop must not be
empty” error, and a case like two IPv4 next hops to assert the “multiple
IPv4/IPv6 addresses (max 1 per family)” message.
In `@pkg/controllers/c2cc/helpers_test.go`:
- Around line 43-56: The helper parseNextHops currently uses hardcoded family
values, which should be replaced with the netlink family constants so the test
stays aligned with ipFamilyOf/PrimaryNextHop behavior. Update parseNextHops in
helpers_test.go to key the map using netlink.FAMILY_V4 and netlink.FAMILY_V6
instead of 2 and 10, keeping the logic the same but making the test
self-documenting and resilient to constant drift.
In `@pkg/controllers/c2cc/routes.go`:
- Around line 22-24: The route state in c2cc routes is split across parallel
slices plus a gateway map, which can desynchronize and collapse duplicate CIDRs.
Refactor the data model around the existing route-handling logic in routes.go so
each entry keeps its destination, derived key, and gateway together in a single
struct instead of relying on desiredDsts, desiredDstKeys, and desiredGWs
separately. Update the code paths that populate, compare, and consume these
values to use the new struct consistently so duplicate CIDR resolutions remain
distinct and each route retains its correct gateway.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 47cfa155-ce92-47b9-ba43-bf984328fd88
⛔ Files ignored due to path filters (1)
etcd/vendor/github.com/openshift/microshift/pkg/config/c2cc.gois excluded by!**/vendor/**
📒 Files selected for processing (22)
cmd/generate-config/config/config-openapi-spec.jsondocs/user/howto_config.mdpackaging/microshift/config.yamlpkg/config/c2cc.gopkg/config/c2cc_test.gopkg/controllers/c2cc/healthcheck.gopkg/controllers/c2cc/healthcheck_test.gopkg/controllers/c2cc/helpers_test.gopkg/controllers/c2cc/ovn.gopkg/controllers/c2cc/ovn_test.gopkg/controllers/c2cc/routes.gotest/assets/c2cc/hello-microshift.yamltest/bin/c2cc_common.shtest/resources/c2cc.resourcetest/scenarios-bootc/c2cc/el102-src@c2cc-dual-stack.shtest/scenarios-bootc/c2cc/el98-src@c2cc-dual-stack-v6.shtest/suites/c2cc/cleanup.robottest/suites/c2cc/connectivity.robottest/suites/c2cc/disruptive.robottest/suites/c2cc/infrastructure.robottest/suites/c2cc/probe.robottest/suites/c2cc/reconciliation.robot
|
@pmtk: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit
New Features
Bug Fixes
Documentation