10 changes: 10 additions & 0 deletions plugins/lvms-ci/.claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"name": "lvms-ci",
"description": "LVMS CI Release Manager - Analyze LVMS periodic job failures and generate HTML reports",
"version": "1.0.0",
"author": {
"name": "kasturinarra"
},
"homepage": "https://github.com/openshift-eng/edge-tooling",
"license": "Apache-2.0"
}
71 changes: 71 additions & 0 deletions plugins/lvms-ci/README.md
@@ -0,0 +1,71 @@
# lvms-ci

Analyze LVMS CI periodic job failures and generate HTML release manager reports.

## Installation

```text
/plugin marketplace add openshift-eng/edge-tooling
/plugin install lvms-ci
```

## Skills

| Skill | Description |
|---|---|
| `/lvms-ci:doctor` | Analyze CI for multiple releases and produce an HTML summary |
| `/lvms-ci:analyze-release` | Analyze all failed LVMS periodic jobs for a single release |
| `/lvms-ci:generate-html-report` | Re-generate HTML report from existing analysis files |

## Usage

### Full pipeline
```text
/lvms-ci:doctor 4.20,4.21,4.22
```

### Single release analysis
```text
/lvms-ci:analyze-release 4.22
```

### Re-generate report
```text
/lvms-ci:generate-html-report 4.20,4.21,4.22
```

## Architecture

The pipeline follows the same pattern as `microshift-ci` and reuses shared scripts where possible:

1. **Prepare** (`doctor.sh prepare`) -- collects failed jobs and downloads artifacts
2. **Analyze** -- LLM agents analyze each job in parallel via `/ci:prow-job-analyze-test-failure`
3. **Finalize** (`doctor.sh finalize`) -- aggregates results and generates HTML
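The three phases can be sketched as a thin wrapper script (illustrative only: the flags mirror the Scripts table, and the analyze phase is performed by LLM agents in practice, so every command is merely printed rather than executed):

```shell
#!/usr/bin/env bash
# Illustrative orchestration sketch -- the real pipeline is driven by the
# skills, and the analyze phase runs as parallel LLM agents, so commands
# are printed (dry-run) instead of executed.
set -euo pipefail

SHARED_SCRIPTS=plugins/shared/scripts
WORKDIR=/tmp/lvms-ci-claude-workdir.$(date +%y%m%d)
RELEASES="${1:-4.22}"

run() { echo "+ $*"; }  # dry-run: print instead of execute

run bash "${SHARED_SCRIPTS}/doctor.sh" prepare --product lvms --filter lvm --workdir "${WORKDIR}" "${RELEASES}"
run "# one agent per failed job: /ci:prow-job-analyze-test-failure <artifacts-dir>"
run bash "${SHARED_SCRIPTS}/doctor.sh" finalize --product lvms --workdir "${WORKDIR}" "${RELEASES}"
```

Running it prints the three command lines in order, which makes the phase sequencing easy to see at a glance.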

### Scripts

All scripts are shared across plugins in `plugins/shared/scripts/`:

| Script | Purpose |
|---|---|
| `doctor.sh` | Orchestrator with prepare/finalize phases (`--product lvms --filter lvm`) |
| `prow-jobs-for-release.sh` | Fetch failed periodic jobs from Prow API (`--filter lvm`) |
| `download-jobs.sh` | Download job artifacts in parallel |
| `aggregate.py` | Aggregate per-job reports into release summary JSON |
| `create-report.py` | Generate HTML report (`--product lvms` enables index image section) |

### LVMS-Specific Features

- **Index image extraction**: Per-job analysis extracts the LVMS catalog index image reference along with its metadata (digest, build date, source commit) and displays them in the HTML report
- **Prow API**: Uses the standard Prow `data.js` API to discover LVMS periodic jobs
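The discovery step (implemented by `prow-jobs-for-release.sh --filter lvm`) can be sketched as a filter over the downloaded job list. The payload below is a simplified, invented sample — real `data.js` responses carry more fields — and `jobs.json` stands in for a downloaded copy:

```shell
# Sketch: keep only failed periodic jobs whose name contains "lvm".
# jobs.json and its field names ("job", "state", "type") are a simplified
# assumption standing in for a real Prow data.js payload.
cat > jobs.json <<'EOF'
[
  {"job": "periodic-ci-lvm-operator-4.22-e2e",     "state": "failure", "type": "periodic"},
  {"job": "periodic-ci-microshift-4.22-e2e",       "state": "failure", "type": "periodic"},
  {"job": "periodic-ci-lvm-operator-4.22-upgrade", "state": "success", "type": "periodic"}
]
EOF

python3 - <<'EOF'
import json
jobs = json.load(open("jobs.json"))
failed = [j["job"] for j in jobs
          if j["type"] == "periodic" and j["state"] == "failure" and "lvm" in j["job"]]
print("\n".join(failed))
EOF
```

With this sample, only `periodic-ci-lvm-operator-4.22-e2e` survives the filter: the microshift job matches on state but not name, and the upgrade job matches on name but not state.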

## Requirements

- `gcloud` CLI (for downloading artifacts from public GCS buckets)
- `skopeo` (for index image inspection)
- Python 3
- **Category:** ci-cd

## Author

kasturinarra
78 changes: 78 additions & 0 deletions plugins/lvms-ci/skills/analyze-release/SKILL.md
@@ -0,0 +1,78 @@
---
name: lvms-ci:analyze-release
argument-hint: <release-version>
description: Analyze all failed LVMS periodic jobs for a single release
user-invocable: true
allowed-tools: Skill, Bash, Read, Write, Glob, Grep, Agent
---

# lvms-ci:analyze-release

## Synopsis
```bash
/lvms-ci:analyze-release <release-version>
```

## Description
Fetches failed LVMS periodic jobs for a release, downloads artifacts, analyzes each job via `/ci:prow-job-analyze-test-failure`, and produces an aggregated summary. This is a standalone version of what `/lvms-ci:doctor` does for a single release.

## Arguments
- `<release-version>` (required): a single release version, e.g., `4.22`

## Scripts Directory

Shared scripts are in:
```bash
SHARED_SCRIPTS=plugins/shared/scripts
```

## Work Directory
```bash
WORKDIR=/tmp/lvms-ci-claude-workdir.$(date +%y%m%d)
```

## Steps

### Step 1: Prepare -- Collect and Download Artifacts
1. `WORKDIR=/tmp/lvms-ci-claude-workdir.$(date +%y%m%d)`
2. Run:
```bash
bash ${SHARED_SCRIPTS}/doctor.sh prepare --product lvms --filter lvm --workdir ${WORKDIR} <release>
```
3. Read the JSON output. If no failed jobs, report success and stop.

### Step 2: Analyze Each Job
For each failed job, launch a separate **Agent** with `run_in_background: true`:

```
This is an LVMS job. Artifacts are in gs://test-platform-results/.
Some build-log.txt files are gzip-compressed -- pipe through zcat if binary.

Before analyzing test failures, check artifacts/<TEST_NAME>/lvms-catalogsource/finished.json -- if "passed":false, that is the root cause. Report it and skip test analysis.

## Extract Index Image Info
Before running test analysis, extract the LVMS catalog index image from the job artifacts:
1. Fetch artifacts/<TEST_NAME>/lvms-catalogsource/build-log.txt (may be gzip-compressed)
2. Look for the line containing "LVM_INDEX_IMAGE is set to:" and extract the image reference
3. If found, run skopeo inspect --no-tags "docker://<INDEX_IMAGE>" to get:
- Digest, Build date, Source commit
4. Include this in the report under an "## Index Image" section

Run /ci:prow-job-analyze-test-failure <ARTIFACTS_DIR>

Save the full report to: <WORKDIR>/analyze-ci-release-<RELEASE>-job-<N>-<JOB_ID>.txt
```

Launch ALL agents in parallel. Wait for all to complete.
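The index-image extraction described in the agent prompt can be sketched as follows. `inspect.json` is a canned sample standing in for real `skopeo inspect --no-tags docker://<INDEX_IMAGE>` output, and its values are invented for illustration:

```shell
# Sketch: pull digest, build date, and source commit out of skopeo
# inspect output. inspect.json is a canned sample with invented values;
# in the real flow it would come from `skopeo inspect --no-tags ...`.
cat > inspect.json <<'EOF'
{
  "Digest": "sha256:0123abcd",
  "Labels": {
    "org.opencontainers.image.created": "2025-01-15T10:00:00Z",
    "vcs-ref": "deadbeef"
  }
}
EOF

python3 - <<'EOF'
import json
meta = json.load(open("inspect.json"))
labels = meta.get("Labels") or {}
print("Digest:       ", meta["Digest"])
print("Build date:   ", labels.get("org.opencontainers.image.created", "unknown"))
print("Source commit:", labels.get("vcs-ref") or labels.get("org.opencontainers.image.revision", "unknown"))
EOF
```

Falling back from `vcs-ref` to `org.opencontainers.image.revision` matches the label precedence described in the doctor skill's prompt.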

### Step 3: Finalize
1. Run:
```bash
bash ${SHARED_SCRIPTS}/doctor.sh finalize --product lvms --workdir ${WORKDIR} <release>
```
2. Display the summary and path to the generated HTML report.

## Prerequisites
- `gcloud` CLI installed (for downloading artifacts from public GCS buckets)
- `skopeo` for index image inspection
- Python 3
149 changes: 149 additions & 0 deletions plugins/lvms-ci/skills/doctor/SKILL.md
@@ -0,0 +1,149 @@
---
name: lvms-ci:doctor
argument-hint: <release1,release2,...>
description: Analyze CI for multiple LVMS releases and produce an HTML summary
user-invocable: true
allowed-tools: Skill, Bash, Read, Write, Glob, Grep, Agent
---

# lvms-ci:doctor

## Synopsis
```bash
/lvms-ci:doctor <release1,release2,...>
```

## Description
Accepts a comma-separated list of release versions, runs analysis for each release, and produces a single HTML summary file consolidating all results. Uses deterministic scripts for data collection, artifact download, aggregation, and HTML generation. LLM agents are used only for per-job root cause analysis.

## Arguments
- `$ARGUMENTS` (required): Comma-separated list of release versions (e.g., `4.20,4.21,4.22`)

## Scripts Directory

Shared scripts are in:
```bash
SHARED_SCRIPTS=plugins/shared/scripts
```

## Work Directory

Set once at the start and reference throughout:
```bash
WORKDIR=/tmp/lvms-ci-claude-workdir.$(date +%y%m%d)
```

## Implementation Steps

### Step 1: Prepare -- Collect and Download All Artifacts

**Goal**: Deterministically collect all failed jobs and download their artifacts before any LLM analysis.

**Actions**:
1. Run `WORKDIR=/tmp/lvms-ci-claude-workdir.$(date +%y%m%d)` using the `Bash` tool
2. Run the prepare script:
```bash
bash ${SHARED_SCRIPTS}/doctor.sh prepare --product lvms --filter lvm --workdir ${WORKDIR} $ARGUMENTS
```
3. The script deterministically:
- For each release: fetches failed periodic jobs, downloads artifacts, writes `${WORKDIR}/analyze-ci-release-<version>-jobs.json`
- Outputs a JSON summary listing all releases, job counts, and file paths
4. Read the JSON output to know which releases have jobs to analyze and how many

**Error Handling**:
- If `$ARGUMENTS` is empty, show usage and stop
- If a release has no failed jobs, its jobs JSON will be an empty array -- skip analysis for that release
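The empty-array guard can be sketched like this; the temporary directory and the `4.21` file are stand-ins for the real workdir contents, and `python3` is used because it is already a prerequisite:

```shell
# Sketch: skip a release whose jobs JSON is an empty array.
# The temp dir and the 4.21 file stand in for real prepare output.
workdir=$(mktemp -d)
echo '[]' > "${workdir}/analyze-ci-release-4.21-jobs.json"

jobs_file="${workdir}/analyze-ci-release-4.21-jobs.json"
count=$(python3 -c "import json,sys; print(len(json.load(open(sys.argv[1]))))" "$jobs_file")
if [ "$count" -eq 0 ]; then
  echo "release 4.21: no failed jobs, skipping analysis"
fi
```

Counting via a JSON parse rather than a string comparison against `[]` keeps the check robust to whitespace in the file.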

### Step 2: Analyze Each Job Using /lvms-ci:analyze-release

**Goal**: Get detailed root cause analysis for each failed job using pre-downloaded artifacts.

**Actions**:
1. Use the JSON summary output from Step 1 to build agent prompts. Do NOT read the job JSON files into the main conversation -- the prepare script already printed all job details (artifacts_dir, build_id, job name) and agents receive artifacts_dir directly in their prompt.
2. For **every** failed job across all releases, launch a separate **Agent** (using the `Agent` tool, NOT the `Skill` tool).

```text
Agent: subagent_type=general_purpose, prompt="Analyze this LVMS Prow job and save the report:

This is an LVMS job. Artifacts are in gs://test-platform-results/.
Some build-log.txt files are gzip-compressed -- pipe through zcat if binary.

Before analyzing test failures, check artifacts/<TEST_NAME>/lvms-catalogsource/finished.json -- if 'passed':false, that is the root cause. Report it and skip test analysis.

## Extract Index Image Info
Before running test analysis, extract the LVMS catalog index image from the job artifacts:
1. Fetch artifacts/<TEST_NAME>/lvms-catalogsource/build-log.txt (may be gzip-compressed)
2. Look for the line containing 'LVM_INDEX_IMAGE is set to:' and extract the image reference
3. If found, run skopeo inspect --no-tags 'docker://<INDEX_IMAGE>' to get:
- Digest (sha256)
- Build date (from org.opencontainers.image.created label)
- Source commit (from vcs-ref or org.opencontainers.image.revision label)
4. Include this in the report under an '## Index Image' section

Run /ci:prow-job-analyze-test-failure <ARTIFACTS_DIR>

Save the full report to: ${WORKDIR}/analyze-ci-release-<RELEASE>-job-<N>-<BUILD_ID>.txt"
```

3. Launch **ALL** agents in a single message using `run_in_background: true`
4. After launching, say "Analyzing N jobs in parallel..." and STOP.
5. As agent completion notifications arrive, respond with only "." (a single period).
6. Only after ALL agents are confirmed complete, proceed to Step 3.
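The "pipe through zcat if binary" instruction in the agent prompt boils down to checking the gzip magic bytes (`1f 8b`) before reading a log. A helper along these lines, demonstrated on invented files:

```shell
# Helper: print a build log whether or not it is gzip-compressed,
# by checking for the gzip magic bytes (1f 8b) first.
maybe_zcat() {
  local f=$1
  if [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' ')" = "1f8b" ]; then
    zcat "$f"
  else
    cat "$f"
  fi
}

# Demo with invented files:
printf 'plain log line\n' > build-log.txt
printf 'compressed log line\n' | gzip > build-log.gz.txt
maybe_zcat build-log.txt
maybe_zcat build-log.gz.txt
```

Checking the magic bytes is more reliable than trusting the file extension, since GCS artifacts are sometimes stored gzip-compressed under their original `.txt` name.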

### Step 3: Finalize -- Aggregate and Generate HTML Report

**Goal**: Deterministically aggregate results and generate the HTML report.

**Actions**:
1. Run the finalize script:
```bash
bash ${SHARED_SCRIPTS}/doctor.sh finalize --product lvms --workdir ${WORKDIR} $ARGUMENTS
```
2. The script deterministically:
- Runs `aggregate.py` for each release -> `summary.json` files
- Runs `create-report.py` -> `lvms-ci-doctor-report.html`
3. Report the script's output to the user

### Step 4: Report Completion

**Actions**:
1. Display the path to the generated HTML file
2. Summarize: failed job counts per release

**Example Output**:
```text
Summary:
Release 4.20: 3 failed periodic jobs
Release 4.21: 0 failed periodic jobs
Release 4.22: 7 failed periodic jobs

HTML report generated: ${WORKDIR}/lvms-ci-doctor-report.html
```

## Examples

### Example 1: Analyze Multiple Releases
```bash
/lvms-ci:doctor 4.20,4.21,4.22
```

### Example 2: Single Release
```bash
/lvms-ci:doctor 4.22
```

## Prerequisites

- `gcloud` CLI installed (for downloading artifacts from public GCS buckets)
- `skopeo` for index image inspection
- Python 3
- Bash shell

## Notes
- **Deterministic scripts** handle: data collection, artifact download, aggregation, HTML generation
- **LLM agents** handle: per-job root cause analysis (Step 2)
- All agents are launched in a single parallel wave
- The `prepare` script downloads all artifacts upfront so prow-job agents use local paths
- The `finalize` script runs aggregation and HTML generation in one call
- All intermediate files use prescribed filenames in `${WORKDIR}`
- The HTML report is self-contained (no external CSS/JS dependencies)
44 changes: 44 additions & 0 deletions plugins/lvms-ci/skills/generate-html-report/SKILL.md
@@ -0,0 +1,44 @@
---
name: lvms-ci:generate-html-report
argument-hint: <release1,release2,...>
description: Generate an HTML report from existing LVMS CI analysis files
user-invocable: true
allowed-tools: Bash, Read, Glob, Grep
---

# lvms-ci:generate-html-report

## Synopsis
```bash
/lvms-ci:generate-html-report <release1,release2,...>
```

## Description
Generates an HTML report from existing analysis files in the work directory. This is useful for re-generating the report after analysis has already been completed by `/lvms-ci:doctor` or `/lvms-ci:analyze-release`.

## Arguments
- `$ARGUMENTS` (required): Comma-separated release versions (e.g., `4.20,4.21,4.22`)

## Scripts Directory
```bash
SHARED_SCRIPTS=plugins/shared/scripts
```

## Work Directory
```bash
WORKDIR=/tmp/lvms-ci-claude-workdir.$(date +%y%m%d)
```

## Steps

### Step 1: Run Finalize
```bash
bash ${SHARED_SCRIPTS}/doctor.sh finalize --product lvms --workdir ${WORKDIR} $ARGUMENTS
```

### Step 2: Report Completion
Display the path to the generated HTML file.

## Prerequisites
- Analysis files must already exist in `${WORKDIR}` (produced by `/lvms-ci:doctor` or `/lvms-ci:analyze-release`)
- Python 3
8 changes: 4 additions & 4 deletions plugins/microshift-ci/skills/doctor/SKILL.md
@@ -21,9 +21,9 @@ Accepts a comma-separated list of MicroShift release versions, runs analysis for

## Scripts Directory

All scripts are run relative to the repository root:
Shared scripts are in:
```bash
SCRIPTS_DIR=plugins/microshift-ci/scripts
SHARED_SCRIPTS=plugins/shared/scripts
```

## Work Directory
@@ -43,7 +43,7 @@ WORKDIR=/tmp/microshift-ci-claude-workdir.$(date +%y%m%d)
1. Determine today's WORKDIR path by running `date +%y%m%d` and substituting into `/tmp/microshift-ci-claude-workdir.YYMMDD`. Use this value in all subsequent `--workdir` arguments.
2. Run the prepare script:
```bash
bash ${SCRIPTS_DIR}/doctor.sh prepare --workdir ${WORKDIR} $ARGUMENTS --rebase
bash ${SHARED_SCRIPTS}/doctor.sh prepare --product microshift --filter microshift --workdir ${WORKDIR} $ARGUMENTS --rebase
```
3. The script deterministically:
- For each release: fetches failed periodic jobs, downloads artifacts, writes `${WORKDIR}/analyze-ci-release-<version>-jobs.json`
@@ -122,7 +122,7 @@ WORKDIR=/tmp/microshift-ci-claude-workdir.$(date +%y%m%d)
**Actions**:
1. Run the finalize script:
```bash
bash ${SCRIPTS_DIR}/doctor.sh finalize --workdir ${WORKDIR} $ARGUMENTS
bash ${SHARED_SCRIPTS}/doctor.sh finalize --product microshift --workdir ${WORKDIR} $ARGUMENTS
```
2. The script deterministically:
- Runs `aggregate.py` for each release and for PRs → `summary.json` files