Skip to content

Add rc-testing plugin for OCP edge topology RC validation#107

Open
dhensel-rh wants to merge 3 commits intoopenshift-eng:mainfrom
dhensel-rh:agentic_rc_testing
Open

Add rc-testing plugin for OCP edge topology RC validation#107
dhensel-rh wants to merge 3 commits intoopenshift-eng:mainfrom
dhensel-rh:agentic_rc_testing

Conversation

@dhensel-rh
Copy link
Copy Markdown
Contributor

@dhensel-rh dhensel-rh commented Apr 29, 2026

Summary

Adds a new rc-testing plugin that provides an end-to-end workflow for validating OCP release candidate builds across edge topologies (TNF, TNA, SNO 4vCPU).

  • Launch Prow CI jobs via gangway-cli against RC builds
  • Monitor job status by polling GCS artifacts
  • Extract failure root causes from junit XML
  • Generate Jira-ready markdown reports
  • Claude Code /rc-test skill for conversational RC testing

What it does

scripts/launch.sh — Unified job launcher wrapping gangway-cli

  • Pre-flight checks: verifies release image on quay.io, validates app.ci token
  • Job selection by number, list, pattern, or all
  • Three job files per topology (regular, z-stream, y-stream) with --initial controlling upgrades
  • --relaunch-failed to re-launch failures from the latest run
  • Stale version warning when job files don't match requested version
  • --dry-run mode for preview

scripts/status.sh — Status checker with multiple output modes

  • Table view (default), structured JSON (--json), Jira markdown (--report)
  • Failure-only filtering (--failed)
  • Root cause extraction from junit_operator.xml (--logs)
  • Failure classification against Sippy nightly pass rates (--classify)
  • Watch mode (--watch) with single-fetch-per-cycle
  • Job numbering consistent with launch.sh --list
  • Exit code 1 on failures for scripting

skills/rc-test/SKILL.md — Claude Code skill for conversational workflow

  • Natural language mapping: "launch TNF against rc.0", "what failed?", "update the Jira ticket"
  • Integrates with Jira MCP for posting reports

Supersedes #93 (accidentally closed).

Summary by CodeRabbit

  • New Features

    • Added CLI tooling to list, refresh, launch, relaunch, and monitor RC jobs across topologies, including selective/job-number selectors, dry-run, and relaunch-failed flows.
    • Added result inspection with classification, log extraction, watch polling, JSON/table/Jira output, and run tracking directories.
    • Added curated nightly job lists for TNF/TNA/SNO topologies.
  • Documentation

    • Added comprehensive README and skill docs explaining setup, usage, flows, and reporting.
  • Chores

    • Ignored runs/ directory via .gitignore.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhensel-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds the edge-ocp-rc plugin: job list files for TNF/TNA/SNO (regular, z-/y-streams), two new Bash CLIs (launch.sh, status.sh), a README and a Claude Code rc-test skill doc, and a .gitignore entry to exclude runs/.

Changes

Cohort / File(s) Summary
Project config
plugins/edge-ocp-rc/.gitignore
Add runs/ to ignore list.
Documentation
plugins/edge-ocp-rc/README.md, plugins/edge-ocp-rc/skills/rc-test/SKILL.md
Add README explaining build/config/use and a Claude Code skill doc mapping workflows to launch.sh/status.sh.
Job definitions
plugins/edge-ocp-rc/jobs/tnf.txt, plugins/edge-ocp-rc/jobs/tna.txt, plugins/edge-ocp-rc/jobs/sno.txt, plugins/edge-ocp-rc/jobs/tnf-z-stream.txt, plugins/edge-ocp-rc/jobs/tna-z-stream.txt, plugins/edge-ocp-rc/jobs/tna-y-stream.txt, plugins/edge-ocp-rc/jobs/sno-z-stream.txt
Add/appended multiple periodic CI job identifier entries for nightly OCP 4.22 across TNF/TNA/SNO and z-/y-stream upgrade lists (many new job names).
Orchestration scripts
plugins/edge-ocp-rc/scripts/launch.sh, plugins/edge-ocp-rc/scripts/status.sh
Add launch.sh (job discovery, Sippy queries, gangway launches, relaunch-failed handling, run dir management) and status.sh (run inspection, GCS finished.json retrieval, failure extraction, Sippy-based classification, table/JSON/Jira outputs, watch mode).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the PR—adding a new plugin for release-candidate testing across OCP edge topologies.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
plugins/rc-testing/scripts/status.sh (2)

115-125: Consider using find instead of ls for robustness.

While run names are validated to alphanumeric characters, using find is more robust for handling edge cases with directory listing.

Proposed fix
 # Find run directory
 if [[ -n "$RUN_NAME" ]]; then
     RUN_DIR="$SCRIPT_DIR/runs/$RUN_NAME"
 else
-    RUN_DIR=$(ls -td "$SCRIPT_DIR/runs"/*/ 2>/dev/null | head -1 || true)
+    RUN_DIR=$(find "$SCRIPT_DIR/runs" -maxdepth 1 -mindepth 1 -type d -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | cut -d' ' -f2- || true)
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/rc-testing/scripts/status.sh` around lines 115 - 125, The current
RUN_DIR discovery uses ls which can break on weird filenames; update the
fallback branch that assigns RUN_DIR (when RUN_NAME is empty) to use find to
locate the most recent subdirectory under $SCRIPT_DIR/runs (e.g., use find
"$SCRIPT_DIR/runs" -mindepth 1 -maxdepth 1 -type d -print0 piped to xargs -0
stat/ls or sort by modification time) so it robustly handles spaces and special
characters; keep the existing validation that checks RUN_DIR is non-empty and a
directory, and still respect the explicit RUN_NAME branch that sets
RUN_DIR="$SCRIPT_DIR/runs/$RUN_NAME".

1-2: Shebang should use direct path per CONTRIBUTING.md.

The coding guidelines specify using #!/usr/bin/bash rather than #!/usr/bin/env bash.

Per CONTRIBUTING.md: "Shell scripts should use #!/usr/bin/bash"

Proposed fix
-#!/usr/bin/env bash
+#!/usr/bin/bash
 set -euo pipefail
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/rc-testing/scripts/status.sh` around lines 1 - 2, Replace the current
shebang line that uses /usr/bin/env (#!/usr/bin/env bash) at the top of the
status.sh script with the required direct path shebang (#!/usr/bin/bash) per
CONTRIBUTING.md so the script uses /usr/bin/bash instead of invoking env.
plugins/rc-testing/scripts/launch.sh (2)

301-352: Consider using find instead of ls for robustness.

Static analysis (SC2012) flags line 302. While run directories use controlled date-based naming (making special character issues unlikely), using find would be more robust:

Proposed fix
-    PREV_RUN_DIR=$(ls -td "$SCRIPT_DIR/runs"/*/"$TOPOLOGY" 2>/dev/null | head -1 || true)
+    PREV_RUN_DIR=$(find "$SCRIPT_DIR/runs" -mindepth 2 -maxdepth 2 -type d -name "$TOPOLOGY" -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | cut -d' ' -f2- || true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/rc-testing/scripts/launch.sh` around lines 301 - 352, Replace the
fragile ls-based discovery of PREV_RUN_DIR with a find-based search: when
RELAUNCH_FAILED is true, use find rooted at "$SCRIPT_DIR/runs" to locate
directories named "$TOPOLOGY", sort by newest modification time and pick the
newest path to set PREV_RUN_DIR (this replaces the ls -td ... | head -1 usage);
keep the same subsequent checks that PREV_RUN_DIR is non-empty and a directory
and leave the rest of the RELAUNCH_FAILED handling (COMBINED_JOBS, FAILED_NUMS,
JOB_FILTER) unchanged.

429-450: Minor: A && B || C pattern could have unexpected behavior.

Static analysis (SC2015) notes that in cmd && { success } || { failure }, the failure block runs if any command in the success block fails, not just cmd. Here, if echo "launched" or sleep failed (unlikely), the error block would execute.

This is low-risk in practice, but an explicit if/else would be more robust:

Proposed fix
-            GANGWAY_OUTPUT=$("$GANGWAY_BIN" \
-                --api-url="$GANGWAY_API" \
-                --initial "$job_initial" \
-                --latest "$RELEASE_IMAGE" \
-                --job-name "$JOB" \
-                --jobs-file-path="$RUN_DIR" 2>&1) && {
-                echo "  launched"
-                sleep "$DELAY"
-            } || {
-                if echo "$GANGWAY_OUTPUT" | grep -q "500 Internal Server Error"; then
-                    echo "  SKIPPED — job not found in Prow (HTTP 500). Remove from job file or run --refresh."
-                else
-                    echo "  FAILED to launch: $GANGWAY_OUTPUT"
-                    FAILED=$((FAILED + 1))
-                fi
-            }
+            if GANGWAY_OUTPUT=$("$GANGWAY_BIN" \
+                --api-url="$GANGWAY_API" \
+                --initial "$job_initial" \
+                --latest "$RELEASE_IMAGE" \
+                --job-name "$JOB" \
+                --jobs-file-path="$RUN_DIR" 2>&1); then
+                echo "  launched"
+                sleep "$DELAY"
+            else
+                if echo "$GANGWAY_OUTPUT" | grep -q "500 Internal Server Error"; then
+                    echo "  SKIPPED — job not found in Prow (HTTP 500). Remove from job file or run --refresh."
+                else
+                    echo "  FAILED to launch: $GANGWAY_OUTPUT"
+                    FAILED=$((FAILED + 1))
+                fi
+            fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/rc-testing/scripts/launch.sh` around lines 429 - 450, The current
launch block uses the fragile "cmd && { success } || { failure }" pattern when
running "$GANGWAY_BIN" and assigning GANGWAY_OUTPUT, which can invoke the
failure branch if any command in the success block (like echo or sleep) fails;
change it to an explicit if/then/else that runs the command, captures its output
into GANGWAY_OUTPUT, checks the command's exit status (if [ $? -eq 0 ] or use if
"$GANGWAY_BIN" ... >/dev/null 2>&1; then), and then runs the success path (echo
"launched" and sleep "$DELAY") in the then branch and the error handling
(inspect GANGWAY_OUTPUT for "500 Internal Server Error" or increment FAILED) in
the else branch, referencing the existing variables GANGWAY_BIN, GANGWAY_OUTPUT,
JOB, RELEASE_IMAGE, RUN_DIR and DELAY so the logic and messages remain
unchanged.
plugins/rc-testing/README.md (1)

177-289: Add trailing newline.

The file appears to be missing a trailing newline at the end (line 289). Per markdownlint (MD047), files should end with a single newline character.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/rc-testing/README.md` around lines 177 - 289, The README's final
section under the "## status.sh" header is missing a trailing newline (violates
MD047); open the file containing the "## status.sh" block
(plugins/rc-testing/README.md) and ensure the file ends with a single newline
character by adding one newline at EOF so the last line (the end of the Agentic
workflow list) is followed by exactly one newline.
plugins/rc-testing/skills/rc-test/SKILL.md (1)

122-140: Add trailing newline.

The file is missing a trailing newline at the end (line 140). Per markdownlint (MD047), files should end with a single newline character.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/rc-testing/skills/rc-test/SKILL.md` around lines 122 - 140, The
Markdown file SKILL.md is missing a trailing newline (MD047); open the file (end
of the "Important Notes" section / after the final bullet), add a single newline
character at EOF so the file ends with exactly one trailing newline, and save
the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/rc-testing/scripts/launch.sh`:
- Around line 1-2: The script shebang must be changed from the portable form to
the project standard: replace the current "#!/usr/bin/env bash" with
"#!/usr/bin/bash" at the top of the launch.sh file so the script uses
/usr/bin/bash per CONTRIBUTING.md; ensure the first line is updated only and
retains the existing "set -euo pipefail" behavior immediately after.

---

Nitpick comments:
In `@plugins/rc-testing/README.md`:
- Around line 177-289: The README's final section under the "## status.sh"
header is missing a trailing newline (violates MD047); open the file containing
the "## status.sh" block (plugins/rc-testing/README.md) and ensure the file ends
with a single newline character by adding one newline at EOF so the last line
(the end of the Agentic workflow list) is followed by exactly one newline.

In `@plugins/rc-testing/scripts/launch.sh`:
- Around line 301-352: Replace the fragile ls-based discovery of PREV_RUN_DIR
with a find-based search: when RELAUNCH_FAILED is true, use find rooted at
"$SCRIPT_DIR/runs" to locate directories named "$TOPOLOGY", sort by newest
modification time and pick the newest path to set PREV_RUN_DIR (this replaces
the ls -td ... | head -1 usage); keep the same subsequent checks that
PREV_RUN_DIR is non-empty and a directory and leave the rest of the
RELAUNCH_FAILED handling (COMBINED_JOBS, FAILED_NUMS, JOB_FILTER) unchanged.
- Around line 429-450: The current launch block uses the fragile "cmd && {
success } || { failure }" pattern when running "$GANGWAY_BIN" and assigning
GANGWAY_OUTPUT, which can invoke the failure branch if any command in the
success block (like echo or sleep) fails; change it to an explicit if/then/else
that runs the command, captures its output into GANGWAY_OUTPUT, checks the
command's exit status (if [ $? -eq 0 ] or use if "$GANGWAY_BIN" ... >/dev/null
2>&1; then), and then runs the success path (echo "launched" and sleep "$DELAY")
in the then branch and the error handling (inspect GANGWAY_OUTPUT for "500
Internal Server Error" or increment FAILED) in the else branch, referencing the
existing variables GANGWAY_BIN, GANGWAY_OUTPUT, JOB, RELEASE_IMAGE, RUN_DIR and
DELAY so the logic and messages remain unchanged.

In `@plugins/rc-testing/scripts/status.sh`:
- Around line 115-125: The current RUN_DIR discovery uses ls which can break on
weird filenames; update the fallback branch that assigns RUN_DIR (when RUN_NAME
is empty) to use find to locate the most recent subdirectory under
$SCRIPT_DIR/runs (e.g., use find "$SCRIPT_DIR/runs" -mindepth 1 -maxdepth 1
-type d -print0 piped to xargs -0 stat/ls or sort by modification time) so it
robustly handles spaces and special characters; keep the existing validation
that checks RUN_DIR is non-empty and a directory, and still respect the explicit
RUN_NAME branch that sets RUN_DIR="$SCRIPT_DIR/runs/$RUN_NAME".
- Around line 1-2: Replace the current shebang line that uses /usr/bin/env
(#!/usr/bin/env bash) at the top of the status.sh script with the required
direct path shebang (#!/usr/bin/bash) per CONTRIBUTING.md so the script uses
/usr/bin/bash instead of invoking env.

In `@plugins/rc-testing/skills/rc-test/SKILL.md`:
- Around line 122-140: The Markdown file SKILL.md is missing a trailing newline
(MD047); open the file (end of the "Important Notes" section / after the final
bullet), add a single newline character at EOF so the file ends with exactly one
trailing newline, and save the file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4beeb929-2298-4a60-97db-01931b677f1a

📥 Commits

Reviewing files that changed from the base of the PR and between ca69b0f and 9b85705.

📒 Files selected for processing (11)
  • plugins/rc-testing/README.md
  • plugins/rc-testing/jobs/sno-z-stream.txt
  • plugins/rc-testing/jobs/sno.txt
  • plugins/rc-testing/jobs/tna-y-stream.txt
  • plugins/rc-testing/jobs/tna-z-stream.txt
  • plugins/rc-testing/jobs/tna.txt
  • plugins/rc-testing/jobs/tnf-z-stream.txt
  • plugins/rc-testing/jobs/tnf.txt
  • plugins/rc-testing/scripts/launch.sh
  • plugins/rc-testing/scripts/status.sh
  • plugins/rc-testing/skills/rc-test/SKILL.md

Comment thread plugins/edge-ocp-rc/scripts/launch.sh
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
plugins/edge-ocp-rc/scripts/launch.sh (1)

1-1: ⚠️ Potential issue | 🟠 Major

Use the repository-standard shebang.

Line 1 should use /usr/bin/bash to match project policy.

Proposed fix
-#!/usr/bin/env bash
+#!/usr/bin/bash

As per coding guidelines, "Shell scripts should use: #!/usr/bin/bash, set -euo pipefail, quote all variables, and pass shellcheck."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/edge-ocp-rc/scripts/launch.sh` at line 1, Update the script launch.sh
to use the repository-standard shebang and strict shell safety: change the first
line from #!/usr/bin/env bash to #!/usr/bin/bash, add set -euo pipefail as the
next line, and audit the script to quote all variable expansions (e.g., "$var")
and fix any shellcheck warnings; after changes, run shellcheck on launch.sh and
address reported issues before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Line 100: The RUN_NAME currently set as RUN_NAME="$(date +%Y-%m-%d)" causes
same-day collisions when artifacts are written to runs/${RUN_NAME}/${TOPOLOGY};
change RUN_NAME to include a more granular unique suffix (e.g., timestamp with
hours/minutes/seconds or epoch milliseconds, or append $$/$(uuidgen)/$(mktemp
-u)) so each launch creates a distinct directory; update any references that
write to runs/${RUN_NAME}/${TOPOLOGY} (and related cleanup/relaunch logic) to
use the new RUN_NAME value so --relaunch-failed/status behavior is preserved.

In `@plugins/edge-ocp-rc/scripts/status.sh`:
- Line 1: Replace the current shebang with the repository-standard one
(/usr/bin/bash), add strict mode by enabling set -euo pipefail at the top of
plugins/edge-ocp-rc/scripts/status.sh, ensure all variable references in the
script are properly quoted, and run shellcheck to fix any reported issues;
locate the file by its script name (status.sh) and update the header and
variable usages accordingly.
- Around line 100-113: The loop currently discards the child command's exit code
and treats bc failures as pending=0 which can prematurely end watch mode; change
the invocation of "$0" "${PASS_ARGS[@]}" to capture its exit code (e.g., rc=$?)
immediately after running and if rc is non-zero echo the output and exit with
that rc instead of continuing, and compute pending in a way that doesn't rely on
bc returning success (e.g., replace the bc pipeline with a safe sum using awk or
check the bc exit status and if it fails set pending to a non-zero sentinel) so
that variables output, pending, PASS_ARGS and WATCH_INTERVAL behave correctly
and the script doesn't exit 0 on hidden failures.

In `@plugins/edge-ocp-rc/skills/rc-test/SKILL.md`:
- Line 4: The skill metadata's allowed-tools list is missing the Jira MCP tool
referenced by the "update Jira" workflow (documented at line ~98); update the
allowed-tools entry on the header (the line that currently reads "allowed-tools:
Bash(bash *) Bash(cd *) Read") to include the Jira MCP tool identifier used by
your tool registry (e.g., add the exact token/name for the Jira MCP tool such as
"JiraMCP(...)" or the registry-specific identifier) so the documented Jira
posting action can be executed at runtime.

---

Duplicate comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Line 1: Update the script launch.sh to use the repository-standard shebang and
strict shell safety: change the first line from #!/usr/bin/env bash to
#!/usr/bin/bash, add set -euo pipefail as the next line, and audit the script to
quote all variable expansions (e.g., "$var") and fix any shellcheck warnings;
after changes, run shellcheck on launch.sh and address reported issues before
committing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 766c7b19-f7fd-4e19-84dd-cf755cb51995

📥 Commits

Reviewing files that changed from the base of the PR and between 9b85705 and 42e0f99.

📒 Files selected for processing (12)
  • plugins/edge-ocp-rc/.gitignore
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/jobs/sno-z-stream.txt
  • plugins/edge-ocp-rc/jobs/sno.txt
  • plugins/edge-ocp-rc/jobs/tna-y-stream.txt
  • plugins/edge-ocp-rc/jobs/tna-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tna.txt
  • plugins/edge-ocp-rc/jobs/tnf-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tnf.txt
  • plugins/edge-ocp-rc/scripts/launch.sh
  • plugins/edge-ocp-rc/scripts/status.sh
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md
✅ Files skipped from review due to trivial changes (9)
  • plugins/edge-ocp-rc/jobs/tna-z-stream.txt
  • plugins/edge-ocp-rc/jobs/sno.txt
  • plugins/edge-ocp-rc/jobs/tna-y-stream.txt
  • plugins/edge-ocp-rc/jobs/tnf.txt
  • plugins/edge-ocp-rc/jobs/tnf-z-stream.txt
  • plugins/edge-ocp-rc/.gitignore
  • plugins/edge-ocp-rc/jobs/tna.txt
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/jobs/sno-z-stream.txt

Comment thread plugins/edge-ocp-rc/scripts/launch.sh
Comment thread plugins/edge-ocp-rc/scripts/status.sh
Comment thread plugins/edge-ocp-rc/scripts/status.sh
Comment thread plugins/edge-ocp-rc/scripts/status.sh
Comment thread plugins/edge-ocp-rc/skills/rc-test/SKILL.md Outdated
@lucaconsalvi
Copy link
Copy Markdown
Contributor

Thanks Doug for addressing all the feedback. Just one remaining thing:
grep -oP (Perl regex with \K) is used in load_config, detect_release, and collect_topology — macOS BSD grep doesn't support -P. Since the shebang was changed to #!/usr/bin/env bash for macOS compatibility, these should be replaced with sed or grep -oE + cut.

@brandisher
Copy link
Copy Markdown
Contributor

PR is not yet ready for human review. PR needs to pass the markdownlint check and have major/critical coderabbit findings addressed. Consider making this PR a Draft until its ready for human review.

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 30, 2026
@dhensel-rh dhensel-rh marked this pull request as draft April 30, 2026 20:09
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
plugins/edge-ocp-rc/scripts/launch.sh (1)

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Repository shell shebang standard is not followed.

Per CONTRIBUTING.md, this script should use #!/usr/bin/bash for shell standard compliance.

As per coding guidelines, "For this PR’s Bash scripts/docs under plugins/edge-ocp-rc: use the required shell standards (shebang #!/usr/bin/bash, set -euo pipefail, quote variables, and pass shellcheck)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/edge-ocp-rc/scripts/launch.sh` at line 1, The script launch.sh
currently uses an incorrect shebang; update the shebang to #!/usr/bin/bash, add
strict mode (set -euo pipefail) at the top of the script, ensure all variable
usages in functions and commands are properly quoted (e.g., "$var"), and run/fix
issues reported by shellcheck to satisfy the repository's Bash standards; search
for launch.sh and modify the shebang and top-of-file initialization and then
address quoting warnings in the script.
plugins/edge-ocp-rc/scripts/status.sh (3)

184-226: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add positive/negative tests for parsing and classification paths.

This file introduces non-trivial parsing/classification behavior (junit_operator.xml extraction, nightly release parsing, Sippy classification), but no matching validation tests are included.

As per coding guidelines, "Shell regex/parsing changes should be portable (avoid non-portable GNU/Pcre grep features); add/adjust tests where validation/regex/parsing logic exists (positive + negative cases)."

Also applies to: 228-317


100-117: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Watch mode reports success even when failures exist.

At Line 116, watch mode always exits 0 after loop completion. If the child run returned 1 (failures/aborts) and pending reached 0, this masks failures in automation.

Suggested fix
-    while true; do
+    last_rc=0
+    while true; do
         rc=0
         output=$("$0" "${PASS_ARGS[@]}" 2>&1) || rc=$?
+        last_rc="$rc"
         echo "$output"
         if [[ "$rc" -ne 0 && "$rc" -ne 1 ]]; then
             exit "$rc"
@@
         if [[ "$pending" -eq 0 ]]; then
             echo "All jobs complete."
             break
         fi
@@
-    exit 0
+    exit "$last_rc"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/edge-ocp-rc/scripts/status.sh` around lines 100 - 117, The watch loop
currently always exits 0 after all jobs complete, masking prior failures; update
the logic around the pending==0 branch to propagate the last child return code
(rc) instead of unconditionally exiting 0 — use the existing rc variable set
from running "$0" "${PASS_ARGS[@]}" and when pending reaches 0, exit with rc (or
0 if rc is empty/zero) rather than breaking and falling through to exit 0;
adjust handling near the pending check and the final exit so that
failures/aborts (rc==1) are returned to callers while still printing "All jobs
complete." and respecting WATCH_INTERVAL and PASS_ARGS.

1-1: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Repository shell shebang standard is not followed.

Per CONTRIBUTING.md, this script should use #!/usr/bin/bash for shell standard compliance.

As per coding guidelines, "For this PR’s Bash scripts/docs under plugins/edge-ocp-rc: use the required shell standards (shebang #!/usr/bin/bash, set -euo pipefail, quote variables, and pass shellcheck)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/edge-ocp-rc/scripts/status.sh` at line 1, Update the script header in
status.sh to use the repository standard shebang and strict options: replace the
current shebang with #!/usr/bin/bash and add "set -euo pipefail" immediately
after it; then audit the script for unquoted variable expansions and quote all
variable uses, and run/fix issues reported by shellcheck so the file passes
linting under the repo guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Around line 32-52: Add positive and negative shell tests that exercise the
detect_release function and the requested-release parsing logic: create test
cases that (1) provide JOB_FILE, JOB_FILE_Z, and JOB_FILE_Y content containing
nightly-<version> lines to verify awk extracts the correct major.minor (e.g.,
"nightly-1.2.3"), (2) ensure empty or missing job files return empty string, (3)
pass a version_tag matching ^([0-9]+\.[0-9]+) to confirm BASH_REMATCH[1] is
echoed, and (4) include malformed tags and edge cases (e.g., non-numeric, extra
prefixes/suffixes) to assert the function returns "" and does not mis-select;
run these tests in a POSIX-compatible shell to avoid GNU/Pcre-only features and
assert outputs for both positive and negative cases so the detect_release
function and the requested-release checks are fully covered.

---

Duplicate comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Line 1: The script launch.sh currently uses an incorrect shebang; update the
shebang to #!/usr/bin/bash, add strict mode (set -euo pipefail) at the top of
the script, ensure all variable usages in functions and commands are properly
quoted (e.g., "$var"), and run/fix issues reported by shellcheck to satisfy the
repository's Bash standards; search for launch.sh and modify the shebang and
top-of-file initialization and then address quoting warnings in the script.

In `@plugins/edge-ocp-rc/scripts/status.sh`:
- Around line 100-117: The watch loop currently always exits 0 after all jobs
complete, masking prior failures; update the logic around the pending==0 branch
to propagate the last child return code (rc) instead of unconditionally exiting
0 — use the existing rc variable set from running "$0" "${PASS_ARGS[@]}" and
when pending reaches 0, exit with rc (or 0 if rc is empty/zero) rather than
breaking and falling through to exit 0; adjust handling near the pending check
and the final exit so that failures/aborts (rc==1) are returned to callers while
still printing "All jobs complete." and respecting WATCH_INTERVAL and PASS_ARGS.
- Line 1: Update the script header in status.sh to use the repository standard
shebang and strict options: replace the current shebang with #!/usr/bin/bash and
add "set -euo pipefail" immediately after it; then audit the script for unquoted
variable expansions and quote all variable uses, and run/fix issues reported by
shellcheck so the file passes linting under the repo guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1dbfdf65-ef80-44e3-a092-389b8cbc5272

📥 Commits

Reviewing files that changed from the base of the PR and between 42e0f99 and 258df1c.

📒 Files selected for processing (4)
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/scripts/launch.sh
  • plugins/edge-ocp-rc/scripts/status.sh
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md
✅ Files skipped from review due to trivial changes (2)
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md

Comment thread plugins/edge-ocp-rc/scripts/launch.sh
@dhensel-rh
Copy link
Copy Markdown
Contributor Author

/test markdownlint

@dhensel-rh
Copy link
Copy Markdown
Contributor Author

/test markdownlint

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
plugins/edge-ocp-rc/scripts/launch.sh (1)

357-362: ⚠️ Potential issue | 🟠 Major

Update shebang and use portable date format.

Line 361's date -Iseconds is GNU-specific and fails on BSD/macOS; use date '+%Y-%m-%dT%H:%M:%S%z' instead.

Additionally, the script shebang should be #!/usr/bin/bash per CONTRIBUTING.md Code Standards (currently #!/usr/bin/env bash).

Portable replacements
-#!/usr/bin/env bash
+#!/usr/bin/bash
-LAUNCHED=$(date -Iseconds)
+LAUNCHED=$(date '+%Y-%m-%dT%H:%M:%S%z')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/edge-ocp-rc/scripts/launch.sh` around lines 357 - 362, Change the
script shebang from #!/usr/bin/env bash to #!/usr/bin/bash per CONTRIBUTING.md
and update the timestamp generation in the block that writes
"$RUN_DIR/config.env" (the LAUNCHED assignment) to use a portable date format;
replace the GNU-only date -Iseconds with date '+%Y-%m-%dT%H:%M:%S%z' so LAUNCHED
is produced portably across BSD/macOS and Linux.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Around line 213-215: The grep patterns using the non-portable escape sequence
'\s' (used when computing JOB_COUNT, Z_COUNT, Y_COUNT against JOB_FILE,
JOB_FILE_Z, JOB_FILE_Y) must be replaced with the POSIX character class form;
update the three grep invocations to use '^[[:space:]]*$' instead of '^\s*$' and
also update any other occurrences of the same pattern elsewhere in the script
(the other grep lines reported in the review) to ensure POSIX-compliant behavior
on BSD/portable grep implementations.

In `@plugins/edge-ocp-rc/scripts/status.sh`:
- Around line 489-496: The per-topology loop calls load_config which mutates
run-level variables (RELEASE_IMAGE, LAUNCHED) causing the run header/JSON root
to reflect only the last topology; fix by preserving run-level metadata before
the loop and restoring it after each per-topology load (or change load_config to
only load into a local/topology-scoped map), i.e., capture the original
RELEASE_IMAGE and LAUNCHED (and any other run-level keys) before iterating
TOPO_LIST, call load_config and run save_topo_config/collect_topology, then
restore RELEASE_IMAGE and LAUNCHED so the top-level run metadata remains stable
(apply same change to the other loop at lines 522-529).

---

Duplicate comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Around line 357-362: Change the script shebang from #!/usr/bin/env bash to
#!/usr/bin/bash per CONTRIBUTING.md and update the timestamp generation in the
block that writes "$RUN_DIR/config.env" (the LAUNCHED assignment) to use a
portable date format; replace the GNU-only date -Iseconds with date
'+%Y-%m-%dT%H:%M:%S%z' so LAUNCHED is produced portably across BSD/macOS and
Linux.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 88b9db29-3f8a-4764-85ec-40f7adc0e05c

📥 Commits

Reviewing files that changed from the base of the PR and between 258df1c and f84bc6e.

📒 Files selected for processing (12)
  • plugins/edge-ocp-rc/.gitignore
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/jobs/sno-z-stream.txt
  • plugins/edge-ocp-rc/jobs/sno.txt
  • plugins/edge-ocp-rc/jobs/tna-y-stream.txt
  • plugins/edge-ocp-rc/jobs/tna-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tna.txt
  • plugins/edge-ocp-rc/jobs/tnf-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tnf.txt
  • plugins/edge-ocp-rc/scripts/launch.sh
  • plugins/edge-ocp-rc/scripts/status.sh
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md
✅ Files skipped from review due to trivial changes (8)
  • plugins/edge-ocp-rc/.gitignore
  • plugins/edge-ocp-rc/jobs/tnf-z-stream.txt
  • plugins/edge-ocp-rc/jobs/sno.txt
  • plugins/edge-ocp-rc/jobs/tna.txt
  • plugins/edge-ocp-rc/jobs/tnf.txt
  • plugins/edge-ocp-rc/jobs/tna-z-stream.txt
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/edge-ocp-rc/jobs/sno-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tna-y-stream.txt

Comment thread plugins/edge-ocp-rc/scripts/launch.sh Outdated
Comment thread plugins/edge-ocp-rc/scripts/status.sh
Launch Prow CI jobs via gangway-cli, track results, investigate failures,
and report to Jira for TNF, TNA, and SNO topologies.

launch.sh:
- Pre-flight checks (quay.io tag, gangway token)
- Three job files per topology (regular, z-stream, y-stream)
- Job selection by number, list, pattern, or all
- Stale version warning when job files don't match requested version
- --relaunch-failed to re-launch failures from latest run
- --dry-run mode

status.sh:
- Table, JSON, and Jira markdown output modes
- Failure classification against Sippy nightly pass rates (--classify)
- Root cause extraction from junit XML (--logs)
- Watch mode with single-fetch-per-cycle and exit code propagation
- Job numbering consistent with launch.sh --list
- Per-topology config isolation

skills/rc-test/SKILL.md:
- Claude Code skill for conversational RC testing workflow
- Jira MCP integration for posting reports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhensel-rh dhensel-rh force-pushed the agentic_rc_testing branch from f84bc6e to 2ea4314 Compare May 1, 2026 00:57
@dhensel-rh
Copy link
Copy Markdown
Contributor Author

/test markdownlint

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
plugins/edge-ocp-rc/scripts/status.sh (1)

447-455: ⚡ Quick win

Escape markdown table cell content in report output.

reason (and occasionally job text) may include | or newlines, which breaks Jira markdown table rendering.

Proposed fix
+escape_md_cell() {
+    local s="$1"
+    s="${s//$'\n'/'<br/>'}"
+    s="${s//|/\\|}"
+    printf '%s' "$s"
+}
+
 ...
         while IFS=$'\t' read -r num status job_name url reason classification pass_pct; do
-            local job_link="[${job_name}](${url})"
-            local note="${reason:-}"
+            local job_link="[${job_name}](${url})"
+            local note
+            note="$(escape_md_cell "${reason:-}")"
             if $CLASSIFY && [[ -n "$classification" ]]; then
                 local pct_display
                 pct_display=$(printf '%.0f' "$pass_pct" 2>/dev/null || echo "?")
                 echo "| ${num} | ${status} | ${classification} (${pct_display}%) | ${job_link} | ${note} |"
             else
                 echo "| ${num} | ${status} | ${job_link} | ${note} |"
             fi
         done < "$display_file"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/edge-ocp-rc/scripts/status.sh` around lines 447 - 455, The table
output can break when cell values contain '|' or newlines; add a sanitizer
(e.g., sanitize_cell) and call it on reason, job_name/job_link and
classification before printing so pipe characters are escaped (replace '|' with
'\|') and newlines are converted to spaces or Jira-friendly breaks; update the
while-loop where variables are read (the block that builds job_link, note,
pct_display and echoes the table rows) to use the sanitized values instead of
raw ${reason}, ${job_link}, and ${classification}.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/edge-ocp-rc/scripts/launch.sh`:
- Around line 374-385: The current preflight in launch.sh accepts any
non-000/401/403 response as success; update the HTTP_CODE check (the curl call
that sets HTTP_CODE and the following token validation block) to treat any
non-2xx response as failure: after obtaining HTTP_CODE from the curl to
"${GANGWAY_API}/v1/executions/" using MY_APPCI_TOKEN, if HTTP_CODE does not
begin with "2" (i.e., not a 2xx), print an error including the actual HTTP_CODE
and exit non‑zero; retain the existing special handling for "000" and 401/403
but ensure all other codes (404, 429, 5xx, etc.) are treated as token/API
preflight failure.

In `@plugins/edge-ocp-rc/scripts/status.sh`:
- Around line 492-497: Before processing each topology, reset the run-level
defaults so a missing per-topology config.env cannot inherit previous topology
values; specifically, in the for loop before checking
"$RUN_DIR/$topo_name/config.env" call the run-level loader or clear the
per-topology keys (e.g. invoke load_config on the run-level config or explicitly
unset RELEASE_IMAGE and LAUNCHED) so load_config
"$RUN_DIR/$topo_name/config.env" only overrides fresh defaults and then call
save_topo_config "$topo_name".

---

Nitpick comments:
In `@plugins/edge-ocp-rc/scripts/status.sh`:
- Around line 447-455: The table output can break when cell values contain '|'
or newlines; add a sanitizer (e.g., sanitize_cell) and call it on reason,
job_name/job_link and classification before printing so pipe characters are
escaped (replace '|' with '\|') and newlines are converted to spaces or
Jira-friendly breaks; update the while-loop where variables are read (the block
that builds job_link, note, pct_display and echoes the table rows) to use the
sanitized values instead of raw ${reason}, ${job_link}, and ${classification}.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b2aa2406-2cd5-412e-916d-4238be62277b

📥 Commits

Reviewing files that changed from the base of the PR and between f84bc6e and 2ea4314.

📒 Files selected for processing (12)
  • plugins/edge-ocp-rc/.gitignore
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/jobs/sno-z-stream.txt
  • plugins/edge-ocp-rc/jobs/sno.txt
  • plugins/edge-ocp-rc/jobs/tna-y-stream.txt
  • plugins/edge-ocp-rc/jobs/tna-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tna.txt
  • plugins/edge-ocp-rc/jobs/tnf-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tnf.txt
  • plugins/edge-ocp-rc/scripts/launch.sh
  • plugins/edge-ocp-rc/scripts/status.sh
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md
✅ Files skipped from review due to trivial changes (10)
  • plugins/edge-ocp-rc/.gitignore
  • plugins/edge-ocp-rc/jobs/tnf-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tna-y-stream.txt
  • plugins/edge-ocp-rc/jobs/tna.txt
  • plugins/edge-ocp-rc/jobs/tna-z-stream.txt
  • plugins/edge-ocp-rc/jobs/sno.txt
  • plugins/edge-ocp-rc/jobs/sno-z-stream.txt
  • plugins/edge-ocp-rc/jobs/tnf.txt
  • plugins/edge-ocp-rc/README.md
  • plugins/edge-ocp-rc/skills/rc-test/SKILL.md

Comment thread plugins/edge-ocp-rc/scripts/launch.sh
Comment thread plugins/edge-ocp-rc/scripts/status.sh
Reset RELEASE_IMAGE and LAUNCHED to run-level defaults before each
topology iteration so a topology-specific config.env doesn't bleed
into subsequent topologies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhensel-rh
Copy link
Copy Markdown
Contributor Author

/test markdownlint

@dhensel-rh dhensel-rh marked this pull request as ready for review May 1, 2026 01:36
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 1, 2026
- SC2012: suppress ls-for-directory-listing (run dirs are date-named)
- SC2015: replace && { } || { } with proper if/then/else
- SC2016: use escaped dollar signs in double quotes for jq variables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dhensel-rh
Copy link
Copy Markdown
Contributor Author

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants