MicroShift CI Doctor: Deterministic prepare by pmtk · Pull Request #81454 · openshift/release

pmtk · 2026-07-03T15:15:43Z

Summary by CodeRabbit

This update makes the MicroShift CI Doctor workflow more deterministic in OpenShift CI. The pre-processing and report-generation phases are now run as scripted, non-Claude steps, while Claude is reserved for the interactive analysis and bug-triage portions of the job. That reduces reliance on session-specific refresh logic, simplifies log handling, and makes the HTML/report refresh step reproducible by deriving ignore rules from closed stale-bug results.

The step definition was also adjusted to reflect the new execution model and given more runtime headroom by increasing the job timeout.

The doctor step previously ran artifact downloads, PCP graph generation, and evidence extraction inside the doctor Claude session — burning its 45-minute timeout and turn budget while the model waited — and booted an entire 10-minute Claude session (doctor-refresh) to run one deterministic script plus a JSON check. Now the bash step owns the deterministic pipeline: - prepare/graphs/evidence/fetch-previous run before the doctor session; the session is invoked with --prepared and spends its (reduced, 40m) budget purely on root cause analysis - finalize (aggregation, cross-run history, HTML generation) runs right after the session, so the report no longer depends on the model ending its session gracefully - the doctor-refresh session is replaced by a direct doctor.sh refresh call with the --ignore keys derived from closed-bugs.json Step timeout goes to 1h45m: the preparation time that was previously hidden inside the doctor session budget is now additive, partially offset by the removed refresh session.

coderabbitai · 2026-07-03T15:16:13Z

Walkthrough

The microshift-ci doctor step script was restructured to separate deterministic phases (prepare, graphs, evidence, finalize, refresh) from Claude-based analysis. The Claude doctor-refresh log tracking was removed, refresh now runs deterministically with an ignore list, and the step reference's timeout and documentation were updated accordingly.

Changes

Doctor Workflow Determinism

Layer / File(s)	Summary
Two-phase doctor invocation `ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-commands.sh`	Runs `doctor.sh prepare/graphs/evidence` deterministically before Claude, adds `--prepared` flag and adjusted timeout checks, then runs `doctor.sh finalize` deterministically.
Deterministic HTML refresh `ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-commands.sh`	Replaces Claude-based refresh with `doctor.sh refresh`, reading `closed-bugs.json` to build a `--ignore` list.
Log tracking cleanup `ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-commands.sh`	Removes `CLAUDE_DOCTOR_REFRESH_LOG` and updates exit-time log verification to check only the remaining four Claude session logs.
Step reference update `ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-ref.yaml`	Increases timeout to `1h45m0s` and updates documentation to reflect deterministic script phases vs. Claude's constrained role.

Estimated code review effort: 3 (Moderate) | ~20 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Script as doctor-commands.sh
    participant DoctorSh as doctor.sh
    participant Claude as Claude plugin
    participant Bugs as closed-bugs.json

    Script->>DoctorSh: prepare
    Script->>DoctorSh: graphs
    Script->>DoctorSh: evidence
    Script->>Claude: microshift-ci:doctor --prepared
    Claude-->>Script: analysis result
    Script->>DoctorSh: finalize
    Script->>Bugs: read closed entries
    Bugs-->>Script: closed bug keys
    Script->>DoctorSh: refresh --ignore <keys>

Suggested reviewers: agullon

🚥 Pre-merge checks | ✅ 15

✅ Passed checks (15 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title matches the main theme of making the MicroShift CI Doctor flow deterministic, though it undersells the broader workflow changes.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	The PR only changes shell/YAML doctor workflow files; no Ginkgo test titles were added or modified, and no It/Describe/Context/When strings exist in the touched subtree.
Test Structure And Quality	✅ Passed	PR changes only one shell script; no _test.go/Ginkgo files were modified, so the Ginkgo test-quality check is not applicable.
Microshift Test Compatibility	✅ Passed	The PR only changes ci-operator doctor shell config; no Ginkgo e2e tests or MicroShift-unsupported APIs/features were added.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR only changes a ci-operator doctor shell script; no Ginkgo e2e tests or Describe/It additions were modified, so SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility	✅ Passed	Only a CI doctor shell script changed; no deployment manifests, controllers, affinities, node selectors, PDBs, or scheduling constraints were added.
Ote Binary Stdout Contract	✅ Passed	PASS: The only changed file is a CI shell script; no OTE binary/main/TestMain/process-level test code was modified, so the stdout contract is unaffected.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Only a doctor shell script changed; no new Ginkgo/e2e tests or network/IP assumptions were added.
No-Weak-Crypto	✅ Passed	No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret-comparison code appears in the changed lines.
Container-Privileges	✅ Passed	No changed manifest adds privileged settings; searches found no privileged/hostPID/hostNetwork/hostIPC/allowPrivilegeEscalation/SYS_ADMIN in the doctor subtree or generated job configs.
No-Sensitive-Data-In-Logs	✅ Passed	The patch only removes fetch-previous and tweaks comments; it adds no new logging of tokens, PII, hostnames, or customer data.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

openshift-ci · 2026-07-03T15:16:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/OWNERS~~ [pmtk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot · 2026-07-03T15:18:45Z

[REHEARSALNOTIFIER]
@pmtk: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name	Repo	Type	Reason
periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor	N/A	periodic	Registry content changed

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

coderabbitai

🧹 Nitpick comments (1)

ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml (1)
45-51: 🧹 Nitpick | 🔵 Trivial

Documentation accurately reflects the new deterministic/Claude split.

Confirm the new 1h45m0s budget comfortably covers prepare+graphs+evidence+the 40-minute Claude doctor session+finalize+refresh, plus the other Claude sessions (create-bugs, close-stale-bugs, fix-test-bugs) referenced by atexit_handler, which aren't visible in this diff.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml`
around lines 45 - 51, The doctor step timeout needs to account for all work
launched by the pipeline, not just the deterministic phases and the main
40-minute Claude doctor session. Review the timeout and flow in
openshift-edge-tooling-microshift-ci-doctor-ref.yaml alongside the
atexit_handler-driven sessions (create-bugs, close-stale-bugs, fix-test-bugs)
and make the budget explicit by increasing the timeout if needed or documenting
the full set of Claude sessions covered by the current 1h45m0s window.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml`:
- Around line 45-51: The doctor step timeout needs to account for all work
launched by the pipeline, not just the deterministic phases and the main
40-minute Claude doctor session. Review the timeout and flow in
openshift-edge-tooling-microshift-ci-doctor-ref.yaml alongside the
atexit_handler-driven sessions (create-bugs, close-stale-bugs, fix-test-bugs)
and make the budget explicit by increasing the timeout if needed or documenting
the full set of Claude sessions covered by the current 1h45m0s window.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1837abb2-452d-4641-95e5-f5b7b4b47d6b

📥 Commits

Reviewing files that changed from the base of the PR and between 5c158fe and 575380b.

📒 Files selected for processing (2)

ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml

openshift-ci · 2026-07-03T15:22:29Z

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

pmtk added 2 commits July 3, 2026 17:13

Remove fetch-previous

575380b

openshift-ci Bot requested review from ggiguash and jeff-roche July 3, 2026 15:16

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2026

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

pmtk marked this pull request as draft July 3, 2026 17:17

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MicroShift CI Doctor: Deterministic prepare#81454

MicroShift CI Doctor: Deterministic prepare#81454
pmtk wants to merge 2 commits into
openshift:mainfrom
pmtk:ci-doc-deterministic-prepare

pmtk commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

openshift-merge-bot Bot commented Jul 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

pmtk commented Jul 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

openshift-merge-bot Bot commented Jul 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pmtk commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading