OCPEDGE-2591: feat(two-node): add verify-rhel-bugfix skill#118

Open
lucaconsalvi wants to merge 11 commits into openshift-eng:main from lucaconsalvi:feat/two-node-bugfix-verify

@lucaconsalvi lucaconsalvi commented May 5, 2026

Summary

  • Add /two-node:verify-rhel-bugfix skill to automate RHEL resource-agents bug fix verification on TNF clusters
  • Fetches bug context from Jira via MCP (RHEL ticket, upstream OCPBUGS bug, OCPEDGE tracking ticket), checks cluster state, patches nodes, runs verification tests, and generates a JIRA comment report
  • Includes three deterministic scripts: verify-cluster.sh, patch-nodes.sh, collect-logs.sh
  • Bumps two-node plugin version to 1.1.0

Test plan

  • Run ./marketplace validate two-node — passes
  • Run bash plugins/tests/marketplace_smoke_test.sh — 23/23 pass
  • Run npx markdownlint-cli2 on changed files — 0 errors
  • Invoke /two-node:verify-rhel-bugfix RHEL-XXXXX on a session with a running TNF cluster

Context

Ported from the tnf-dev-env project's bugfix-verify skill as a lightweight port, without the project workspace scaffolding. It fills a gap in the two-node plugin lifecycle: create-rhel-stories creates the Jira tickets, verify-rhel-bugfix performs the actual verification, and bug-reproducer covers a separate use case (reproducing bugs).

Tracked by OCPEDGE-2591.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • RHEL bugfix verification workflow for two-node clusters with guided steps and optional Jira reporting
    • New scripts to patch nodes, verify cluster health, and collect troubleshooting logs
  • Documentation

    • Expanded and reformatted verification workflow documentation and READMEs
  • Chores

    • Updated two-node plugin to version 1.1.0 and reorganized marketplace entries

lucaconsalvi and others added 5 commits April 21, 2026 14:33
Port the RHEL bug fix verification workflow from tnf-dev-env into the
two-node plugin. The skill fetches bug context from Jira via MCP,
checks cluster state, patches nodes with the fixed RPM, runs
verification tests, and generates a JIRA comment report.

Includes three scripts for deterministic operations:
- verify-cluster.sh: cluster health check
- patch-nodes.sh: RPM patching with reboot and verification
- collect-logs.sh: pacemaker/etcd log collection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add missing blank lines around fenced code blocks and lists,
add language specifiers to fenced code blocks, and remove
inline HTML that triggered MD033.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New skill added (rhel-bugfix-verify), bump minor version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lucaconsalvi

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2026

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a two-node RHEL resource-agents verification workflow: three new Bash helper scripts, a detailed SKILL.md describing the verification flow and Jira/reporting steps, README formatting updates, and plugin metadata/marketplace registry version and ordering adjustments.

Changes

RHEL Bugfix Verification Workflow

| Layer | File(s) | Summary |
| --- | --- | --- |
| Version & Registry | `plugins/two-node/.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json` | Bump two-node plugin version 1.0.0 → 1.1.0. Move/add edge-ic plugin entry to a new location in `.claude-plugin/marketplace.json` and remove the old duplicate. |
| Documentation / Discovery | `.claude/skills/two-node-verify-rhel-bugfix`, `plugins/two-node/README.md` | Register skill pointer and reformat README sections: `/two-node:create-rhel-stories` features, `/two-node:verify-rhel-bugfix` workflow/scripts/prereqs, and `/two-node:bug-reproducer` output/topologies formatting. |
| Skill Spec / Workflow | `plugins/two-node/skills/verify-rhel-bugfix/SKILL.md` | Add complete `two-node:verify-rhel-bugfix` skill: front matter, synopsis, prerequisites, scripts description, Jira ticket gathering and linked-ticket traversal, user prompts, cluster-state checks, guarded patching workflow, code-only vs functional test paths, log collection, Markdown report generation, and optional Jira posting rules. |
| Helper Scripts: Implementation | `plugins/two-node/scripts/collect-logs.sh` | New strict Bash script creating a timestamped output dir (default `/tmp/bugfix-verify-logs`), auto-detecting `HYPERVISOR`, SSHing via the hypervisor to two hard-coded masters (192.168.111.20, 192.168.111.21), and collecting pacemaker/etcd logs and journal snippets into per-master files. |
| Helper Scripts: Implementation | `plugins/two-node/scripts/patch-nodes.sh` | New strict Bash script to copy an RPM to the hypervisor and masters, run `rpm-ostree override replace /tmp/<rpm> -C` on each master, reboot nodes concurrently, poll for availability (~10 min), verify installed resource-agents, and optionally grep a provided pattern in the on-node resource-agent file. |
| Helper Scripts: Implementation | `plugins/two-node/scripts/verify-cluster.sh` | New strict Bash script to auto-detect `HYPERVISOR`, run `oc` checks (clusterversion, nodes, operators), SSH to masters for OS/RPM/pacemaker status, and gather etcd member/endpoint health via `podman exec`, delimiting sections and suppressing SSH noise. |
| Tests / Docs (end state) | `plugins/two-node/README.md`, `plugins/two-node/skills/verify-rhel-bugfix/SKILL.md` | README formatted for clarity; SKILL.md documents end-to-end verification, prerequisites, scripts usage, and reporting rules. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding a new verify-rhel-bugfix skill to the two-node plugin, with the Jira ticket reference providing context. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
plugins/two-node/skills/rhel-bugfix-verify/SKILL.md (1)

410-411: 💤 Low value

Consider removing or generalizing the hardcoded user-specific path.

The reference to ~/Documents/claude_test_file/ is a user-specific path that likely won't exist for other contributors. Consider removing this reference or making it more generic.

Proposed fix
-- Reports from past verifications in `~/Documents/claude_test_file/` can be
-  used as format reference if available.
+- Reports from past verifications can be used as format reference if available.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/two-node/skills/rhel-bugfix-verify/SKILL.md` around lines 410 - 411,
The SKILL.md contains a hardcoded, user-specific path string
"~/Documents/claude_test_file/" that should be removed or generalized; update
the sentence that currently references that path to either remove the example
entirely or replace it with a generic placeholder (e.g.,
"~/Documents/<report_directory>" or "$HOME/Documents/<report_directory>") and
add a brief note saying "if available, use your local report directory" so other
contributors can adapt it; ensure you update the exact text matching the string
"~/Documents/claude_test_file/" in SKILL.md.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/two-node/scripts/collect-logs.sh`:
- Line 1: The script collect-logs.sh currently uses the shebang line
"#!/bin/bash"; update the topmost shebang to "#!/usr/bin/bash" to conform to
CONTRIBUTING.md; simply replace the existing shebang in collect-logs.sh so the
script invokes /usr/bin/bash instead of /bin/bash.

In `@plugins/two-node/scripts/patch-nodes.sh`:
- Line 1: The script's shebang currently uses /bin/bash but CONTRIBUTING.md
mandates /usr/bin/bash; update the shebang line in the patch-nodes.sh script to
use #!/usr/bin/bash so the script follows repository conventions (look for the
file's shebang at the top of patch-nodes.sh).
- Around line 79-91: The wait loop in patch-nodes.sh (the for i in $(seq 1 60)
loop that sets m0 and m1) lacks failure handling if masters never become "up";
modify the script after the loop to check whether both m0 and m1 reached "up"
and if not print a clear error including last-known values of m0 and m1 and exit
with a non-zero status (e.g., exit 1) so downstream verification does not
proceed; reference the variables m0, m1 and the loop using seq 1 60 when
locating where to add this check and error/exit behavior.

In `@plugins/two-node/scripts/verify-cluster.sh`:
- Line 1: The script uses the wrong shebang; replace the current shebang line
"#!/bin/bash" with the required "#!/usr/bin/bash" per CONTRIBUTING.md so the
script consistently uses /usr/bin/bash; update the first line of the file (the
shebang) in the verify-cluster.sh script to the new path.

---

Nitpick comments:
In `@plugins/two-node/skills/rhel-bugfix-verify/SKILL.md`:
- Around line 410-411: The SKILL.md contains a hardcoded, user-specific path
string "~/Documents/claude_test_file/" that should be removed or generalized;
update the sentence that currently references that path to either remove the
example entirely or replace it with a generic placeholder (e.g.,
"~/Documents/<report_directory>" or "$HOME/Documents/<report_directory>") and
add a brief note saying "if available, use your local report directory" so other
contributors can adapt it; ensure you update the exact text matching the string
"~/Documents/claude_test_file/" in SKILL.md.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 21d91ebd-8a38-44a7-be2b-662b8a7cc607

📥 Commits

Reviewing files that changed from the base of the PR and between 45227fe and 20958b9.

📒 Files selected for processing (7)
  • .claude/skills/two-node-rhel-bugfix-verify
  • plugins/two-node/.claude-plugin/plugin.json
  • plugins/two-node/README.md
  • plugins/two-node/scripts/collect-logs.sh
  • plugins/two-node/scripts/patch-nodes.sh
  • plugins/two-node/scripts/verify-cluster.sh
  • plugins/two-node/skills/rhel-bugfix-verify/SKILL.md

Comment thread plugins/two-node/scripts/collect-logs.sh Outdated
Comment thread plugins/two-node/scripts/patch-nodes.sh Outdated
Comment thread plugins/two-node/scripts/patch-nodes.sh
Comment thread plugins/two-node/scripts/verify-cluster.sh Outdated
lucaconsalvi and others added 3 commits May 5, 2026 13:13
Bump two-node version to 1.1.0 in marketplace catalog and alphabetize
plugin entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use #!/usr/bin/bash shebang per CONTRIBUTING.md in all three scripts
- Add timeout failure handling in patch-nodes.sh reboot wait loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
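A minimal sketch of the timeout handling described in the commit above, assuming a 10-second poll interval so that 60 attempts cover roughly the ~10 minutes mentioned in the walkthrough. `wait_for_masters` and its probe callback are hypothetical names, not code from the PR; in `patch-nodes.sh` the probe would be whatever SSH reachability check currently sets `m0`/`m1`.

```shell
# Hypothetical helper: poll both masters until each reports "up".
# Arguments: PROBE (called as "PROBE 0" / "PROBE 1", must print "up" when that
# master is reachable), ATTEMPTS (default 60), INTERVAL in seconds (default 10,
# an assumption: 60 x 10 s is roughly 10 minutes).
wait_for_masters() {
    local probe="$1" attempts="${2:-60}" interval="${3:-10}"
    local m0="unknown" m1="unknown" i
    for i in $(seq 1 "${attempts}"); do
        m0=$("${probe}" 0)
        m1=$("${probe}" 1)
        if [[ "${m0}" == "up" && "${m1}" == "up" ]]; then
            echo "Both masters up after ${i} attempt(s)"
            return 0
        fi
        sleep "${interval}"
    done
    # Timed out: report the last-known state so the caller can exit non-zero
    # instead of running verification against nodes that never came back.
    echo "ERROR: masters not up after ${attempts} attempts" \
         "(master-0=${m0}, master-1=${m1})" >&2
    return 1
}
```

In the script itself the call would be followed by `|| exit 1`, for example `wait_for_masters check_master || exit 1`, where `check_master` stands in for the real (hypothetical here) probe.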
Align skill name with the verb-noun convention used across all plugins
(e.g. create-rhel-stories, generate-tests, check-release-readiness).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lucaconsalvi lucaconsalvi changed the title OCPEDGE-2591: feat(two-node): add rhel-bugfix-verify skill OCPEDGE-2591: feat(two-node): add verify-rhel-bugfix skill May 5, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/two-node/scripts/collect-logs.sh`:
- Around line 22-23: Add strict shell flags and validate MINUTES_AGO is a
non-negative integer before using it: enable set -euo pipefail at the top of the
script, check the MINUTES_AGO variable (from its assignment
MINUTES_AGO="${1:-30}") with a numeric regex or integer test and exit with a
clear error if invalid, and then use the validated variable when building the
remote command (the interpolation near where MINUTES_AGO is used on the remote
ssh/exec line) while quoting variables like "${MINUTES_AGO}" and
"${OUTPUT_BASE}" to avoid word-splitting; ensure the remote command string is
constructed safely (prefer passing the integer as a separate argument or
explicit sanitized value) and run shellcheck to confirm no remaining issues.

In `@plugins/two-node/scripts/patch-nodes.sh`:
- Around line 61-63: The current inline SSH pipeline pipes "sudo rpm-ostree
override replace /tmp/${RPM_FILE} -C 2>&1 | grep -v '^Warning:'" which loses
rpm-ostree's exit code when grep runs; update the remote command in
scripts/patch-nodes.sh (the ssh "ec2-user@${HYPERVISOR}" -> ssh ${SSH_OPTS}
core@${node} invocation that runs "sudo rpm-ostree override replace
/tmp/${RPM_FILE} -C") to capture rpm-ostree's exit status and still filter
warnings—e.g. run the command so stdout/stderr are captured (with tee or a temp
file), record the rpm-ostree exit code via PIPESTATUS[0] or $? immediately after
the command, run grep -v '^Warning:' against the captured output, then exit with
the saved rpm-ostree status so pipeline returns the real result.
- Around line 109-113: The nested SSH call interpolates user-controlled
FIX_GREP_PATTERN directly into a remote shell command, allowing command
injection; fix it by sanitizing/passing the pattern as a safe argument instead
of embedding it unescaped: in patch-nodes.sh update the verification block that
uses FIX_GREP_PATTERN so you quote variables everywhere, call grep with -- and a
quoted argument (e.g. ssh ... core@${MASTER_1} 'grep -n -- "$1"
/usr/lib/ocf/resource.d/heartbeat/podman-etcd' -- "$FIX_GREP_PATTERN" or escape
the pattern with printf '%q' before embedding), add set -euo pipefail at the
script top, and run shellcheck to ensure no remaining quoting issues; reference
FIX_GREP_PATTERN and the nested ssh line when making the change.

In `@plugins/two-node/scripts/verify-cluster.sh`:
- Around line 50-51: The current pipeline using "oc get co | awk 'NR==1 ||
/False/'" matches any row with "False" (e.g., Progressing=False) and must be
changed to only flag operators whose AVAILABLE column is False; modify the awk
expression after "oc get co" to detect the header index for the AVAILABLE column
and then print the header plus only rows where that AVAILABLE field == "False"
(for example, replace the simple /False/ filter with an awk snippet that finds
the AVAILABLE column in NR==1 and for subsequent lines tests that column ==
"False").
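A sketch of the header-index approach the comment asks for. It reads the positions of `AVAILABLE`, `PROGRESSING` and `DEGRADED` from the header row and prints the header plus any operator that is not Available or is Progressing/Degraded. Flagging the last two columns follows the later fix commit rather than this comment, which only asks about `AVAILABLE`. The function name is hypothetical, and plain whitespace splitting assumes no column before `MESSAGE` is empty.

```shell
# Hypothetical filter for `oc get co` output read on stdin.
# Prints the header row plus only the operators that look unhealthy.
filter_unhealthy_operators() {
    awk '
        NR == 1 {
            # Locate the status columns by name instead of hard-coding them.
            for (i = 1; i <= NF; i++) {
                if ($i == "AVAILABLE")   a = i
                if ($i == "PROGRESSING") p = i
                if ($i == "DEGRADED")    d = i
            }
            print
            next
        }
        $a != "True" || $p == "True" || $d == "True"
    '
}

# Usage (on a host with cluster access):
#   oc get co | filter_unhealthy_operators
```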

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2f9d93b7-a3fc-431f-846e-1102abbcce81

📥 Commits

Reviewing files that changed from the base of the PR and between 5af18f1 and 4e0ebc2.

📒 Files selected for processing (3)
  • plugins/two-node/scripts/collect-logs.sh
  • plugins/two-node/scripts/patch-nodes.sh
  • plugins/two-node/scripts/verify-cluster.sh

Comment thread plugins/two-node/scripts/collect-logs.sh
Comment thread plugins/two-node/scripts/patch-nodes.sh Outdated
Comment thread plugins/two-node/scripts/patch-nodes.sh
Comment thread plugins/two-node/scripts/verify-cluster.sh Outdated
lucaconsalvi and others added 3 commits May 5, 2026 13:27
Client-side variable expansion in ssh commands is intentional — we
resolve HYPERVISOR, MASTER_0/1, SSH_OPTS, etc. locally before sending
the command string to the remote host.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
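The commit above keeps client-side expansion for trusted values such as `HYPERVISOR`, `MASTER_1` and `SSH_OPTS`; the user-supplied grep pattern is the one value that still needs escaping. Because `ssh` joins its arguments into a single command string for the remote shell, the `"$1"` form suggested in the review would not receive the pattern, so this sketch takes the `printf '%q'` route instead and escapes once per shell hop (client to hypervisor, hypervisor to master). It assumes both remote login shells are bash, which can parse `%q` output; `build_remote_grep` is a hypothetical helper.

```shell
# Hypothetical helper: print a single word that, after two rounds of shell
# parsing (hypervisor, then master), runs: grep -n -- PATTERN FILE
build_remote_grep() {
    local pattern="$1" file="$2" inner
    # Command for the master's shell: pattern and path escaped as literals.
    inner="grep -n -- $(printf '%q' "${pattern}") $(printf '%q' "${file}")"
    # Escape once more so the hypervisor's shell hands "inner" to ssh as one argument.
    printf '%q' "${inner}"
}

# Usage (trusted variables still expand on the client, as the commit intends):
#   ssh "ec2-user@${HYPERVISOR}" \
#     "ssh ${SSH_OPTS} core@${MASTER_1} $(build_remote_grep "${FIX_GREP_PATTERN}" /usr/lib/ocf/resource.d/heartbeat/podman-etcd)"
```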
- Validate minutes-ago input is numeric in collect-logs.sh
- Fix cluster operator health filter in verify-cluster.sh to check
  Available/Progressing/Degraded columns instead of matching any "False"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
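A minimal sketch of the numeric check from the commit above, keeping the 30-minute default that `collect-logs.sh` uses for `MINUTES_AGO`. `validate_minutes` is a hypothetical name. Accepting digits only means the value carries no shell metacharacters by the time it is interpolated into the remote command.

```shell
# Hypothetical helper: echo a validated minute count or fail.
# An empty or missing argument falls back to the 30-minute default.
validate_minutes() {
    local value="${1:-30}"
    if [[ ! "${value}" =~ ^[0-9]+$ ]]; then
        echo "ERROR: minutes-ago must be a non-negative integer, got '${value}'" >&2
        return 1
    fi
    # Force base 10 so a leading zero such as "08" is not treated as octal.
    echo "$((10#${value}))"
}

# Usage at the top of collect-logs.sh:
#   MINUTES_AGO=$(validate_minutes "${1:-}") || exit 1
```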
The grep -v pipeline runs on the hypervisor's remote shell which has no
pipefail, so rpm-ostree failures could be masked by grep's exit code.
Capture output first, then filter warnings locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
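One way to read "capture output first, then filter warnings locally": run the remote command without a pipe so `ssh` returns `rpm-ostree`'s own exit status, then drop the `Warning:` lines on the client. `run_and_filter` is a hypothetical helper, and the usage line assumes the same nested-ssh layout the review quotes for `patch-nodes.sh`.

```shell
# Hypothetical helper: run a command, print its combined output minus
# "Warning:" lines, and return the command's own exit status.
run_and_filter() {
    local output status=0
    # "|| status=$?" records the failure without tripping set -e in the caller.
    output=$("$@" 2>&1) || status=$?
    if [[ -n "${output}" ]]; then
        # grep -v exits 1 when it filters out every line; that is not a failure here.
        printf '%s\n' "${output}" | grep -v '^Warning:' || true
    fi
    return "${status}"
}

# Usage (variables resolved client-side, as in patch-nodes.sh):
#   run_and_filter ssh "ec2-user@${HYPERVISOR}" \
#       "ssh ${SSH_OPTS} core@${node} sudo rpm-ostree override replace /tmp/${RPM_FILE} -C" \
#       || { echo "rpm-ostree failed on ${node}" >&2; exit 1; }
```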