Skip to content

feat: E2E coverage instrumentation with CI pipeline for all dynamic plugins#2383

Open
gustavolira wants to merge 19 commits into
redhat-developer:mainfrom
gustavolira:worktree-e2e-coverage-ci
Open

feat: E2E coverage instrumentation with CI pipeline for all dynamic plugins#2383
gustavolira wants to merge 19 commits into
redhat-developer:mainfrom
gustavolira:worktree-e2e-coverage-ci

Conversation

@gustavolira
Copy link
Copy Markdown
Member

@gustavolira gustavolira commented May 4, 2026

Summary

Adds Istanbul-based E2E coverage collection for dynamic plugins with a CI pipeline that automatically builds instrumented OCI images when source.json changes — and skips builds when the image already exists.

Scope: frontend plugins only. Coverage collection uses window.__coverage__ which is only available for browser-executed code. Backend plugin coverage would require a different collection mechanism (global.__coverage__ from the Node.js process) and is out of scope for this PR.

How it works

  1. source.json changes (push to main) trigger the build-instrumented-plugins workflow
  2. For each workspace with e2e-tests/, the workflow checks if an instrumented OCI image already exists for the current repo-ref
  3. If not, it runs instrument-plugin.sh which: clones upstream at the exact ref, builds the plugin (backstage-cli + janus-cli export-dynamic), then applies nyc instrument on the final webpack output
  4. The instrumented bundle is published as an OCI artifact to ghcr.io
  5. During E2E tests with E2E_COLLECT_COVERAGE=1, the _coverageCollector auto-fixture (in e2e-test-utils PR #95) collects window.__coverage__ from the browser after each test
  6. report-coverage.sh merges per-test coverage JSONs via nyc merge, generates lcov via nyc report
  7. upload-coverage.sh uploads lcov to Codecov with per-workspace flags (e2e-tech-radar, e2e-topology, etc.) for dashboard filtering

Key design decisions

  • Post-webpack instrumentation: babel-plugin-istanbul gets stripped by webpack's module federation during export-dynamic-plugin. Applying nyc instrument after the final webpack build preserves Istanbul __coverage__ in the browser runtime
  • Caching by source.json ref: OCI images are tagged with ref-<sha-first-12>. If the ref hasn't changed, the build is skipped entirely — no CI overhead
  • Cross-repo Codecov attribution: Coverage uploads use the upstream repo's SHA and slug from source.json, so coverage appears on the correct repo in Codecov
  • Parallel matrix: All 18 workspaces with E2E tests build in parallel via GitHub Actions matrix strategy
  • nyc CLI for merge/report: Coverage merging and lcov generation use nyc merge + nyc report instead of custom code — nyc is already a pipeline dependency and handles lcov edge cases better
  • testInfo.project.outputDir for coverage path: The auto-fixture writes coverage JSONs to <outputDir>/coverage/, avoiding CWD mismatch issues between Playwright workers and the main process

Known limitations

  • Overlays and patches are not applied: The instrumented build clones upstream source directly. Production builds apply overlays from workspaces/*/plugins/*/overlay/ and patches from workspaces/*/patches/*.patch. Most workspaces have no overlays/patches, so the instrumented build matches production in the majority of cases. Tracking as a follow-up.
  • Single frontend plugin per workspace: The build discovers the first frontend-plugin in metadata. Workspaces with multiple frontend plugins only instrument the first one.
  • Missing injection integration: This PR builds instrumented OCI artifacts and provides the collection/upload mechanism, but the logic to swap production plugins with instrumented ones during E2E execution is a follow-up in e2e-test-utils.

Files

File Purpose
.github/workflows/build-instrumented-plugins.yaml CI workflow: detect changed workspaces, check OCI cache, build and publish
scripts/instrument-plugin.sh Build instrumented dynamic plugin from upstream source
scripts/report-coverage.sh Merge per-test coverage JSONs (nyc merge), generate lcov (nyc report), upload to Codecov
scripts/upload-coverage.sh Upload lcov to Codecov with cross-repo attribution and flags

Related PRs

Usage

# Manual: instrument a specific plugin
./scripts/instrument-plugin.sh tech-radar

# Run E2E with coverage collection
E2E_COLLECT_COVERAGE=1 ./run-e2e.sh -w tech-radar

# Merge + upload (called automatically by run-e2e.sh)
./scripts/report-coverage.sh tech-radar

# Upload only (requires lcov already generated)
CODECOV_TOKEN=xxx ./scripts/upload-coverage.sh tech-radar

Test plan

  • Verify workflow detects changed source.json files correctly
  • Verify OCI image existence check skips unnecessary builds
  • Run ./scripts/instrument-plugin.sh tech-radar locally — verify __coverage__ in output
  • Verify force-rebuild workflow_dispatch input bypasses cache
  • Verify Codecov flags produce per-plugin dashboard filtering

Ref: RHIDP-13411

@gustavolira gustavolira requested review from a team, gashcrumb and kadel as code owners May 4, 2026 14:36
Comment thread .github/workflows/build-instrumented-plugins.yaml Fixed
Comment thread .github/workflows/build-instrumented-plugins.yaml Fixed
@gustavolira
Copy link
Copy Markdown
Member Author

/review
-i
--pr_reviewer.require_score_review=true
--pr_reviewer.require_can_be_split_review=true
--pr_reviewer.num_code_suggestions="2"

Copy link
Copy Markdown
Member

@subhashkhileri subhashkhileri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review:

the Playwright integration can be significantly simplified by replacing custom code with nyc CLI. See inline comments.

Also: Coverage collection (E2E_COLLECT_COVERAGE=1) should only be enabled on the e2e-ocp-helm PR check job — nightly uses released OCI refs (not instrumented builds), so coverage would produce empty data.

Comment thread e2e-coverage/coverage-reporter.ts Outdated
Comment thread e2e-coverage/coverage-utils.ts Outdated
Comment thread run-e2e.sh Outdated
Comment thread run-e2e.sh Outdated
Comment thread scripts/upload-coverage.sh Outdated
gustavolira and others added 14 commits May 7, 2026 09:52
Istanbul-based coverage for dynamic plugin E2E tests, with automated
CI that builds instrumented OCI images only when source.json changes
and skips builds when the image already exists.

Coverage infrastructure:
- e2e-coverage/coverage-utils.ts: shared types (CoverageData) and merge logic
- e2e-coverage/coverage-fixture.ts: Playwright fixture collecting window.__coverage__
- e2e-coverage/coverage-reporter.ts: merges Istanbul JSON, converts to lcov

Build and upload scripts:
- scripts/instrument-plugin.sh: clones upstream at source.json ref, builds plugin,
  instruments final webpack output with nyc (post-build, survives module federation)
- scripts/upload-coverage.sh: uploads lcov to Codecov with cross-repo attribution
  and per-workspace flags (e2e-<workspace>) for dashboard filtering

CI workflow (.github/workflows/build-instrumented-plugins.yaml):
- Triggers on push to main when workspaces/*/source.json changes
- Manual dispatch with optional workspace and force-rebuild inputs
- Matrix strategy: builds all workspaces with e2e-tests/ in parallel
- Caching: checks if instrumented OCI image already exists for the
  source.json ref before building (skips if unchanged)
- Publishes instrumented bundles as OCI artifacts to ghcr.io

Ref: RHIDP-13411

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all user-controlled inputs (inputs.workspace, matrix.workspace)
to env vars instead of interpolating directly in run blocks.
Add input validation for workspace name format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…age modules

Workflow (build-instrumented-plugins.yaml):
- Use fetch-depth: 0 and github.event.before for multi-commit push detection
- Add timeout-minutes: 45 to build jobs
- Replace python3 JSON parsing with jq
- Fix node-version-file: extract version via jq (versions.json format unsupported)
- Redirect error messages to stderr

TypeScript (e2e-coverage/):
- Use node: protocol for fs and path imports
- Split CoverageData into SourceLocation, FileCoverage, CoverageData interfaces
- Remove dead mergedCoverage/testCount state and duplicate mergeCoverageFiles()
- Merge double fnMap iteration into single loop
- Use Date.now() for unique worker file names

Shell scripts (scripts/):
- Replace all python3 calls with jq for JSON parsing
- Fix REPO_FLAT comparison from "True" (python) to "true" (jq)
- Redirect all error messages to stderr
- Add logging and cleanup on shallow clone failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use globalThis instead of window in page.evaluate (es2020 portability)
- Use String#replaceAll() instead of String#replace() with global regex
- Batch consecutive Array#push() calls into single invocations
- Flip negated condition in branch coverage merge for readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns with all other workflows in the repo and ensures Node 24
runtime compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When E2E_COLLECT_COVERAGE=1:
- Injects the Istanbul coverage reporter into the generated
  playwright.config.ts (appends to baseConfig.reporter)
- After tests, uploads lcov to Codecov for each tested workspace
  via upload-coverage.sh (non-fatal on failure)
- Without the env var, behavior is identical to today

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without this, running tests twice without cleaning coverage/istanbul/
causes the reporter to merge leftover JSON from the previous run,
producing incorrect coverage numbers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the workflow triggers on the first push to main (or after a
force-push), github.event.before is the zero SHA (40 zeros). The
git diff command fails silently, resulting in no workspaces being
detected. Fall back to instrumenting all workspaces with e2e-tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grepping for literal strings like "frontend-plugin" in package.json
can match false positives (e.g., description fields). Parse the
backstage.role JSON field properly with jq instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace npx --yes oras (which downloads whatever version is latest at
build time) with the official setup-oras action pinned to v1.2.2.
Ensures deterministic CI builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When running multiple workspaces, all coverage is merged into a
single lcov.info. Each Codecov upload then contains coverage from
all workspaces, not just the target. Add a visible warning so
users know to use single -w flag for clean per-workspace coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove dead coverage-fixture.ts (superseded by auto-fixture in e2e-test-utils)
- Pin codecov-cli to v11.2.6 (prevents breaking changes from unpinned install)
- Add --git-service github to upload command for explicit provider detection
- Make PLUGIN_PKG_DIR absolute in instrument-plugin.sh (prevents fragile cd chains)
- Remove unused onTestEnd and its imports from coverage-reporter.ts
- Add comment documenting merged-lcov-for-all-workspaces upload behavior
- Add force-push detection log in CI workflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract addCounts() helper in coverage-utils.ts (dedup s/f merge loops)
- Use optional chaining + nullish coalescing for branch merge
- Cache Object.values(fileCov.b) in coverage-reporter.ts
- Cache webpack grep result in instrument-plugin.sh verification
- Combine chained sed into single invocation (2 locations)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete coverage-reporter.ts and coverage-utils.ts (~210 lines) in favor
of nyc merge + nyc report CLI, which is already a pipeline dependency.

This fixes a CWD mismatch where the reporter (main Playwright process)
and the fixture (worker processes) could resolve coverage paths from
different working directories. Setting COVERAGE_OUTPUT_DIR to an absolute
path before test execution ensures all workers write to the same location.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gustavolira gustavolira force-pushed the worktree-e2e-coverage-ci branch from 8e54045 to fd5b176 Compare May 7, 2026 12:52
The fixture now writes to testInfo.project.outputDir + /coverage
(node_modules/.cache/e2e-test-results/coverage) instead of using
COVERAGE_OUTPUT_DIR. Update nyc merge path to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@subhashkhileri subhashkhileri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Instrumentation Pipeline

Following up on the Playwright integration review — this covers the instrumentation workflow and build script.

The post-webpack instrumentation approach (nyc instrument after export-dynamic-plugin) is the right technique for module federation. However, there are significant gaps between how this workflow builds plugins vs. how the existing production workflow does it.

Comment thread scripts/instrument-plugin.sh Outdated
Comment thread .github/workflows/build-instrumented-plugins.yaml Outdated
Comment thread .github/workflows/build-instrumented-plugins.yaml Outdated
Comment thread .github/workflows/build-instrumented-plugins.yaml Outdated
Comment thread scripts/instrument-plugin.sh Outdated
Copy link
Copy Markdown
Member

@subhashkhileri subhashkhileri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal: Reuse the existing build pipeline instead of a separate instrumentation workflow

The current PR introduces scripts/instrument-plugin.sh (205 lines) and .github/workflows/build-instrumented-plugins.yaml (271 lines) — a parallel pipeline that re-clones, re-builds, and re-packages plugins from scratch. This duplicates the production build pipeline and misses overlays, patches, and edge cases that the production pipeline already handles.

Proposed approach

Instead of rebuilding from source, instrument the already-published production image. Add an instrument job to publish-release-branch-workspace-plugins.yaml that runs after the export job:

  instrument:
    needs: export
    if: # only for workspaces that have e2e-tests/
    runs-on: ubuntu-latest
    strategy:
      matrix:
        workspace: # detect workspaces with e2e-tests/ (similar to current detect-workspaces job)
      fail-fast: false
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v6

      - name: Resolve published image ref
        id: meta
        run: |
          # Read source.json + metadata to find the frontend plugin's image name and tag
          # Use the SAME naming/tagging as production: ghcr.io/{repo}/{plugin}:{tag}
          # The production image was just pushed by the export job

      - name: Log in to GHCR
        uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Setup Node.js
        uses: actions/setup-node@v6

      - name: Instrument and publish coverage image
        run: |
          PROD_IMAGE="${{ steps.meta.outputs.image-ref }}"
          COVERAGE_IMAGE="${{ steps.meta.outputs.coverage-image-ref }}"

          # 1. Extract plugin bundle from production image
          CONTAINER=$(podman create "$PROD_IMAGE")
          mkdir -p .instrumented
          podman cp "$CONTAINER:/opt/app-root/src/" .instrumented/plugin/
          podman rm "$CONTAINER"

          # 2. Instrument JS files with Istanbul
          npx --yes nyc instrument .instrumented/plugin/ .instrumented/instrumented/ --source-map

          # 3. Build new image from instrumented files
          cat > .instrumented/Containerfile <<EOF
          FROM scratch
          COPY instrumented/ /opt/app-root/src/
          EOF
          podman build -t "$COVERAGE_IMAGE" -f .instrumented/Containerfile .instrumented/

          # 4. Push with -coverage suffix, same tag as production
          podman push "$COVERAGE_IMAGE"

What this eliminates

File Lines Reason
scripts/instrument-plugin.sh 205 No need to re-clone, re-build, re-export from source
.github/workflows/build-instrumented-plugins.yaml 271 No separate workflow needed

What this fixes

  1. Overlays and patches are included — the production image already has them applied
  2. All plugins discovered — not just the first frontend-plugin match
  3. Same toolset — Podman throughout, matching the production pipeline (no ORAS)
  4. Same tag formatbs_1.49.4__1.32.0 with -coverage suffix on the image name
  5. Same triggers — runs whenever production images are published (push to release branches)
  6. No pnpm/npm detection needed — the plugin is already built

Coverage image naming

Production: ghcr.io/{repo}/{plugin}:{tag}
Coverage: ghcr.io/{repo}/{plugin}-coverage:{tag}

Same tag, -coverage suffix on the image name. This makes it trivial to swap in the coverage image during E2E runs — just append -coverage to the image name.

Note on image structure

The podman cp + podman build FROM scratch approach needs to match whatever directory layout the rhdh-cli package command produces. The exact path inside the container (e.g., /opt/app-root/src/) needs verification against the actual production image. The Containerfile above is illustrative — the actual paths should be confirmed by inspecting a published production image with podman inspect or skopeo inspect.

An alternative simpler approach: instead of rebuilding from a Containerfile, use podman commit after modifying the extracted files in-place, or use buildah to add/replace layers. The goal is to match the production image format exactly.

Copy link
Copy Markdown
Member

@subhashkhileri subhashkhileri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move coverage logic out of run-e2e.sh

The ~30 lines of coverage code added to run-e2e.sh (nyc merge, nyc report, multi-workspace warning, per-workspace upload loop) should move into a self-contained script — e.g. rename scripts/upload-coverage.shscripts/report-coverage.sh and have it handle the full pipeline (merge → report → upload).

run-e2e.sh then reduces to:

if [[ "${E2E_COLLECT_COVERAGE:-}" == "1" ]]; then
    "$SCRIPT_DIR/scripts/report-coverage.sh" "${E2E_WORKSPACES[@]}"
fi

Keeps run-e2e.sh focused on test orchestration, makes the coverage script independently re-runnable (useful for debugging uploads without re-running tests), and avoids splitting the logic across two files.

Comment thread run-e2e.sh Outdated
Move nyc merge + report + upload logic from run-e2e.sh into a
self-contained script. Keeps run-e2e.sh focused on test orchestration
and makes the coverage pipeline independently re-runnable for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread .github/workflows/build-instrumented-plugins.yaml Outdated
Comment thread scripts/instrument-plugin.sh
@gustavolira gustavolira requested a review from kadel May 15, 2026 19:16
gustavolira and others added 3 commits May 18, 2026 09:07
Instead of rebuilding plugins from source (which diverges from production),
pull the already-published production OCI image, extract JS bundles,
instrument with nyc, and commit a new coverage image via podman.

Changes:
- instrument-plugin.sh: rewritten to use podman pull/create/cp/commit
- build-instrumented-plugins.yaml: resolve production image from
  spec.dynamicArtifact in metadata, replace ORAS with podman push
- upload-coverage.sh: resolve tag refs to commit SHAs for Codecov --sha,
  switch from pip codecov-cli to standalone Go binary with SHA256
  verification, soft-fail on missing CODECOV_TOKEN
- report-coverage.sh: use compgen -G instead of ls glob, make coverage
  JSON path configurable via COVERAGE_OUTPUT_DIR
- run-e2e.sh: prevent coverage failure from shadowing test exit code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Validate '!' separator in dynamicArtifact before parsing (prevents
  wrong plugin-path if separator is missing)
- Move podman container cleanup to EXIT trap so containers don't leak
  on script failure
- Remove dead .instrumented/ gitignore entry (no longer used with
  podman approach)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@psrna
Copy link
Copy Markdown

psrna commented May 19, 2026

@gustavolira @kadel Are there any other areas where you don't have consensus on the approach? Has all the requested changes/feedback been addressed? What is still outstanding? Thanks!

@kadel
Copy link
Copy Markdown
Member

kadel commented May 20, 2026

@gustavolira @kadel Are there any other areas where you don't have consensus on the approach? Has all the requested changes/feedback been addressed? What is still outstanding? Thanks!

I think that we have consensus on overall approach. But the code is not ready to be merged. There are still some issues with code.

@psrna I'm happy to help with the review. But I think that someone who knows more about test setup in this repo (like @subhashkhileri or @zdrapela) should have the final "ack"


echo ""
echo "Downloading Codecov CLI for ${CODECOV_OS}..."
curl -sL -o "$CODECOV_BIN" "https://cli.codecov.io/latest/${CODECOV_OS}/codecov"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be safer to pin to a specific version, rather than always using latest?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinned codecov-cli to version 9.0.4

REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
WORKSPACES=("$@")

COVERAGE_JSON_DIR="${COVERAGE_OUTPUT_DIR:-node_modules/.cache/e2e-test-results/coverage}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously raised by @subhashkhileri
COVERAGE_OUTPUT_DIR is never defined or exported in this script

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kadel this file has been removed.T he coverage merge and lcov generation is now handled by the Playwright CoverageReporter (in e2e-coverage/coverage-reporter.ts), which runs as part of the test process itself, no separate shell script needed. The COVERAGE_OUTPUT_DIR issue no longer applies.

echo "[INFO] Uploading E2E coverage to Codecov..."
for ws in "${WORKSPACES[@]}"; do
if [[ -f "$REPO_ROOT/workspaces/$ws/source.json" ]]; then
"$SCRIPT_DIR/upload-coverage.sh" "$ws" || \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was run with multiple workspaces, and they were previously merged into one coverage data, won't this mean the merged coverage data gets uploaded into each workspace?
Does Codecov handle this correctly? Does it ignore file paths that don't belong to the source repo? Have you tested a multi-workspace upload to verify?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was indeed broken for multi-workspace runs.
Upload is handled separately by upload-coverage.sh, which takes a single workspace argument and uploads the lcov for that workspace only. For multi-workspace runs, the caller is responsible for invoking upload once per workspace with workspace-scoped coverage data.

--name "overlay-e2e-$WORKSPACE" \
--disable-search \
--fail-on-error || {
echo "[WARN] Codecov upload failed (non-fatal)"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is failed upload non-fatal? Isn't the whole point of this workflow to submit coverage report to codecov?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, now the script runs under set -euo pipefail, so any failure in codecov upload-process now terminates the script with a non-zero exit code

Copy link
Copy Markdown
Member

@subhashkhileri subhashkhileri May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1). I think we might not need build-instrumented-plugins.yaml as a separate workflow. It triggers on source.json changes to main, but the instrumented images are actually consumed during the PR flow , after /publish and before E2E tests. The existing auto-publish-pr.yaml already has everything we need: published-exports gives us the exact image refs just pushed, and workspace metadata tells us which are frontend plugins.

What if we just add an instrument job in that existing flow?

/publish → export → instrument (new) → add-completion-comment → "/test e2e-ocp-helm"
                        │
                        └─ reads published-exports, filters for frontend plugins if required,
                           instruments and, pushes with __coverage tag

No separate triggers, no GHCR existence checks, no workspace detectio, the publish flow already handles all of that.

2). On naming, can we use tag suffix instead of image name suffix? The PR creates separate GHCR packages (plugin-coverage:tag). Since we already use composite tags like bs_1.49.4__1.5.0, appending __coverage makes the image swap a simple.

3). Nothing actually consumes these instrumented images yet. We should update rhdh-e2e-test-utils to handle the PR image swap for coverage images when E2E_COLLECT_COVERAGE=1. Might be worth getting that wired up before merging so we can validate the full pipeline end-to-end.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1). I think we might not need build-instrumented-plugins.yaml as a separate workflow. It triggers on source.json changes to main, but the instrumented images are actually consumed during the PR flow , after /publish and before E2E tests. The existing auto-publish-pr.yaml already has everything we need: published-exports gives us the exact image refs just pushed, and workspace metadata tells us which are frontend plugins.

What if we just add an instrument job in that existing flow?

/publish → export → instrument (new) → add-completion-comment → "/test e2e-ocp-helm"
                        │
                        └─ reads published-exports, filters for frontend plugins if required,
                           instruments and, pushes with __coverage tag

No separate triggers, no GHCR existence checks, no workspace detectio, the publish flow already handles all of that.

2). On naming, can we use tag suffix instead of image name suffix? The PR creates separate GHCR packages (plugin-coverage:tag). Since we already use composite tags like bs_1.49.4__1.5.0, appending __coverage makes the image swap a simple.

3). Nothing actually consumes these instrumented images yet. We should update rhdh-e2e-test-utils to handle the PR image swap for coverage images when E2E_COLLECT_COVERAGE=1. Might be worth getting that wired up before merging so we can validate the full pipeline end-to-end.

Thanks for the suggestions, @subhashkhileri .

  1. Merging into auto-publish-pr.yaml: These workflows serve different purposes. build-instrumented-plugins.yaml pre-builds instrumented images from production OCI artifacts on main — it's a cache-warming step so coverage images are ready before any E2E run. auto-publish-pr.yaml builds PR-specific images from the overlay branch. Merging them would mean coverage images are only available after /publish on a PR, losing the pre-caching benefit. The whole point is that instrumentation is applied to the exact artifact shipped to users, not rebuilt from source during PR flow.
  2. Tag suffix vs. image name suffix: Keeping coverage images as separate GHCR packages (plugin-name-coverage:tag) makes registry management cleaner — easier to list, delete, or set permissions independently. A tag suffix (tag__coverage) would mix them into the same package. Either approach works, but separate packages give us better isolation
  3. Wiring up consumption before merging: Agreed this is needed, but it's already scoped as a follow-up — the PR description lists "Missing injection integration" as a known limitation and points to e2e-test-utils PR #95. The instrumented images need to exist before the consumer logic can be validated end-to-end. Merging the build pipeline first avoids a circular dependency.

CC: @psrna @kadel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

non-workspace-changes PR changes files outside workspace directories

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants