USHIFT-6800: Add c2cc reboot tests #6943

Open
vimauro wants to merge 7 commits into openshift:main from vimauro:reboot-tests

vimauro commented Jun 25, 2026

Contributor

Summary by CodeRabbit

  • Tests
    • Added a C2CC reboot test suite covering single-cluster, dual, and three-cluster simultaneous reboot scenarios.
    • Improved reboot recovery by capturing per-cluster boot IDs, rebooting all targets concurrently, then reconnecting each cluster and confirming boot ID changes and Greenboot healthcheck completion.
    • Expanded post-reboot validation to include workloads readiness, C2CC connectivity (including source IP preservation), networking/routing and nftables/OVN behaviors, RemoteCluster health/probe timestamps, and cross-cluster DNS checks.
    • Generalized “remote clusters healthy” waiting to cover all configured remote clusters (and broadened disruptive test execution to the full C2CC suite).
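
The reboot flow summarized above (capture boot IDs, reboot all targets concurrently, reconnect, confirm the boot ID changed) can be sketched roughly as follows. This is a minimal Python illustration, not the Robot Framework keywords from this PR: `reboot_and_verify` and its three callables are hypothetical names. On Linux, the boot ID itself is exposed at `/proc/sys/kernel/random/boot_id`, so a real `read_boot_id` might read that file over SSH.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def reboot_and_verify(
    aliases: Iterable[str],
    read_boot_id: Callable[[str], str],
    reboot: Callable[[str], None],
    wait_healthy: Callable[[str], None],
) -> Dict[str, str]:
    """Reboot every cluster concurrently and return the new boot IDs.

    All three callables are hypothetical stand-ins for the real
    per-cluster operations (SSH command, reconnect, healthcheck wait).
    """
    targets = list(aliases)

    # 1. Capture the boot ID of each cluster before rebooting anything.
    before = {alias: read_boot_id(alias) for alias in targets}

    # 2. Trigger all reboots at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        list(pool.map(reboot, targets))

    # 3. Wait for each cluster, then confirm the boot ID really changed.
    after: Dict[str, str] = {}
    for alias in targets:
        wait_healthy(alias)
        after[alias] = read_boot_id(alias)
        if after[alias] == before[alias]:
            raise AssertionError(f"{alias}: boot ID unchanged, host did not reboot")
    return after
```

Comparing boot IDs rather than uptime avoids a false pass when the connection merely dropped and came back without the host restarting.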

openshift-ci-robot added the jira/valid-reference label on Jun 25, 2026

openshift-ci-robot commented Jun 25, 2026

@vimauro: This pull request references USHIFT-6800 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

vimauro commented Jun 25, 2026

Contributor Author

/label tide/merge-method-squash

coderabbitai Bot commented Jun 25, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 47994f33-3b26-44dc-a123-321cb7feeecd

📥 Commits

Reviewing files that changed from the base of the PR and between 15cc38a and 7d833eb.

📒 Files selected for processing (2)
  • test/scenarios-bootc/c2cc/el102-src@c2cc-disruptive.sh
  • test/scenarios-bootc/c2cc/el98-src@c2cc-disruptive.sh
✅ Files skipped from review due to trivial changes (1)

Walkthrough

Adds C2CC reboot coverage with new helper keywords, a reboot-focused Robot suite, expanded post-reboot verification, and scenario entrypoints that run the broader C2CC suite directory.

Changes

C2CC reboot scenarios

| Layer / File(s) | Summary |
|---|---|
| Reboot helpers: test/resources/c2cc.resource | Updates healthy-cluster iteration and adds reboot, reconnect, and reboot-state validation keywords for cluster aliases. |
| Reboot suite cases: test/suites/c2cc/reboot.robot | Adds suite metadata, setup and teardown flow, reboot test cases, readiness gating, and initial full-stack verification. |
| Post-reboot checks: test/suites/c2cc/reboot.robot | Adds retrying verification for connectivity, infrastructure state, health probes, and DNS-based service access across clusters. |
| Scenario entrypoints: test/scenarios-bootc/c2cc/el102-src@c2cc-disruptive.sh, test/scenarios-bootc/c2cc/el98-src@c2cc-disruptive.sh | Changes the C2CC scenario scripts to run the suite directory instead of a single disruptive file. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

ready-for-human-review

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (14 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding C2CC reboot tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | No Ginkgo tests were added in the touched files; the new Robot test case titles are static strings with no interpolated dynamic values. |
| Test Structure And Quality | ✅ Passed | PASS: PR only adds Robot Framework suites/resources and shell-script wiring; no Ginkgo tests are touched, so this Ginkgo-specific check doesn’t apply. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; this PR only adds Robot Framework reboot tests and shell-script tweaks, so the MicroShift Ginkgo compatibility check is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No multi-node/HA assumption found; the new C2CC reboot tests use per-cluster pods/services and reboot only the cluster VMs, which is compatible with SNO. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR only changes Robot test resources/scripts; no deployment manifests, controllers, or topology-sensitive scheduling logic were added. |
| Ote Binary Stdout Contract | ✅ Passed | Only Robot resource/suite and shell scenario files changed; no process-level Go setup or stdout writes (fmt/klog/log to stdout) were added. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | New reboot/DNS tests use IPv6-aware URL formatting and cluster-internal DNS only; no hardcoded IPv4 or public internet deps found. |
| No-Weak-Crypto | ✅ Passed | Changed files only add reboot/test orchestration; search found no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons. |
| Container-Privileges | ✅ Passed | Touched files are Robot tests and shell wrappers; no container/K8s manifests or privilege flags (privileged, hostPID/Network/IPC, SYS_ADMIN, allowPrivilegeEscalation) were added. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No new logging of secrets, PII, hostnames, or customer data appears in the added reboot/reconnect flow. |

Comment @coderabbitai help to get the list of available commands.

openshift-ci Bot added the tide/merge-method-squash label on Jun 25, 2026
openshift-ci Bot requested review from jerpeter1 and pmtk on June 25, 2026 13:56

openshift-ci Bot commented Jun 25, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vimauro
Once this PR has been reviewed and has the lgtm label, please assign kasturinarra for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vimauro commented Jun 25, 2026

Contributor Author

/jira refresh

openshift-ci-robot commented Jun 25, 2026

@vimauro: This pull request references USHIFT-6800 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/resources/c2cc.resource`:
- Around line 456-458: The re-registration flow in the remote cluster setup is
mutating `${C2CC_REMOTE_ALIASES}` before `Register Remote Cluster` succeeds,
which can leave teardown state inconsistent if that keyword fails. Update the
logic around `Remove Values From List` and `Register Remote Cluster` so the
alias is only removed after a successful re-registration, or use `TRY/FINALLY`
to restore/reconcile `${C2CC_REMOTE_ALIASES}` on failure. Keep the
teardown-tracked alias list in sync in the re-registration path used by `Wait
Until Keyword Succeeds`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 52322569-b28b-466c-bc7a-ba41b45857c4

📥 Commits

Reviewing files that changed from the base of the PR and between 8d0593e and 5296d02.

📒 Files selected for processing (2)
  • test/resources/c2cc.resource
  • test/suites/c2cc/reboot.robot

Comment thread test/resources/c2cc.resource Outdated
Comment on lines +456 to +458
${kubeconfig}=    Get From Dictionary    ${C2CC_KUBECONFIGS}    ${alias}
Remove Values From List    ${C2CC_REMOTE_ALIASES}    ${alias}
Register Remote Cluster    ${alias}    ${host}    ${port}    ${kubeconfig}

Contributor

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Re-registration is not failure-safe for teardown.

Remove Values From List drops the alias from ${C2CC_REMOTE_ALIASES} before Register Remote Cluster re-adds it. If Register Remote Cluster errors (host still down), the alias is gone from the tracked list. Within Wait Until Keyword Succeeds retries this self-heals, but if all retries exhaust, Teardown All Remote Clusters will never switch to / close that connection, leaking it and leaving teardown state inconsistent.

Consider only mutating the tracking list after a successful re-registration, or guarding with TRY/FINALLY so the list is reconciled even on the failure path.

Based on learnings: teardown state (the alias/interface list consumed by teardown keywords) must be populated reliably even when the mutating keyword errors before completing.
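
One way to picture the suggested fix is the Python sketch below. It is only a model of the pattern, not the Robot Framework code in `c2cc.resource`: `reregister` and its `register` callable are hypothetical, and it assumes (as the comment above describes) that a successful registration appends the alias to the tracked list itself.

```python
from typing import Callable, List


def reregister(alias: str, tracked: List[str], register: Callable[[str], None]) -> None:
    """Re-register `alias` without losing it from the teardown list on failure.

    Assumption: `register` appends the alias to `tracked` when it succeeds,
    so the old entry is dropped first to avoid a duplicate.
    """
    was_tracked = alias in tracked
    # Drop every existing occurrence of the alias, in place.
    tracked[:] = [entry for entry in tracked if entry != alias]
    try:
        register(alias)
    except Exception:
        # Registration failed: restore the alias so teardown still sees it,
        # then let the error propagate to the retry loop.
        if was_tracked and alias not in tracked:
            tracked.append(alias)
        raise
```

In Robot Framework terms this corresponds to wrapping the registration in `TRY`/`EXCEPT` and re-adding the alias in the failure branch; the list then stays consistent whether or not the retries eventually succeed.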

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/resources/c2cc.resource` around lines 456 - 458, The re-registration
flow in the remote cluster setup is mutating `${C2CC_REMOTE_ALIASES}` before
`Register Remote Cluster` succeeds, which can leave teardown state inconsistent
if that keyword fails. Update the logic around `Remove Values From List` and
`Register Remote Cluster` so the alias is only removed after a successful
re-registration, or use `TRY/FINALLY` to restore/reconcile
`${C2CC_REMOTE_ALIASES}` on failure. Keep the teardown-tracked alias list in
sync in the re-registration path used by `Wait Until Keyword Succeeds`.

Source: Learnings

Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot
Comment thread test/resources/c2cc.resource
Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot Outdated

agullon commented Jun 26, 2026

Contributor

/retest

1 similar comment

vimauro commented Jun 26, 2026

Contributor Author

/retest

agullon commented Jun 26, 2026

Contributor

@vimauro unfortunately the tests you added were not executed.
The reason is that we only trigger disruptive.robot in https://github.com/openshift/microshift/blob/main/test/scenarios-bootc/c2cc/el102-src%40c2cc-disruptive.sh#L23 and https://github.com/openshift/microshift/blob/main/test/scenarios-bootc/c2cc/el98-src%40c2cc-disruptive.sh#L23

To fix this, remove disruptive.robot from both of the lines linked above.

coderabbitai Bot added the ready-for-human-review label on Jun 27, 2026

openshift-ci Bot commented Jun 27, 2026

Contributor

@vimauro: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • ready-for-human-review: Indicates a PR has been reviewed by automated tools and is ready for human review.
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.

4 participants