Skip to content

MicroShift CI Doctor: Deterministic prepare#81454

Draft
pmtk wants to merge 2 commits into
openshift:mainfrom
pmtk:ci-doc-deterministic-prepare
Draft

MicroShift CI Doctor: Deterministic prepare#81454
pmtk wants to merge 2 commits into
openshift:mainfrom
pmtk:ci-doc-deterministic-prepare

Conversation

@pmtk

@pmtk pmtk commented Jul 3, 2026

Copy link
Copy Markdown
Member

Summary by CodeRabbit

This update makes the MicroShift CI Doctor workflow more deterministic in OpenShift CI. The pre-processing and report-generation phases are now run as scripted, non-Claude steps, while Claude is reserved for the interactive analysis and bug-triage portions of the job. That reduces reliance on session-specific refresh logic, simplifies log handling, and makes the HTML/report refresh step reproducible by deriving ignore rules from closed stale-bug results.

The step definition was also adjusted to reflect the new execution model and given more runtime headroom by increasing the job timeout.

pmtk added 2 commits July 3, 2026 17:13
The doctor step previously ran artifact downloads, PCP graph
generation, and evidence extraction inside the doctor Claude session —
burning its 45-minute timeout and turn budget while the model waited —
and booted an entire 10-minute Claude session (doctor-refresh) to run
one deterministic script plus a JSON check.

Now the bash step owns the deterministic pipeline:
- prepare/graphs/evidence/fetch-previous run before the doctor session;
  the session is invoked with --prepared and spends its (reduced, 40m)
  budget purely on root cause analysis
- finalize (aggregation, cross-run history, HTML generation) runs right
  after the session, so the report no longer depends on the model
  ending its session gracefully
- the doctor-refresh session is replaced by a direct doctor.sh refresh
  call with the --ignore keys derived from closed-bugs.json

Step timeout goes to 1h45m: the preparation time that was previously
hidden inside the doctor session budget is now additive, partially
offset by the removed refresh session.
@openshift-ci openshift-ci Bot requested review from ggiguash and jeff-roche July 3, 2026 15:16
@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

The microshift-ci doctor step script was restructured to separate deterministic phases (prepare, graphs, evidence, finalize, refresh) from Claude-based analysis. The Claude doctor-refresh log tracking was removed, refresh now runs deterministically with an ignore list, and the step reference's timeout and documentation were updated accordingly.

Changes

Doctor Workflow Determinism

Layer / File(s) Summary
Two-phase doctor invocation
ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-commands.sh
Runs doctor.sh prepare/graphs/evidence deterministically before Claude, adds --prepared flag and adjusted timeout checks, then runs doctor.sh finalize deterministically.
Deterministic HTML refresh
ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-commands.sh
Replaces Claude-based refresh with doctor.sh refresh, reading closed-bugs.json to build a --ignore list.
Log tracking cleanup
ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-commands.sh
Removes CLAUDE_DOCTOR_REFRESH_LOG and updates exit-time log verification to check only the remaining four Claude session logs.
Step reference update
ci-operator/.../openshift-edge-tooling-microshift-ci-doctor-ref.yaml
Increases timeout to 1h45m0s and updates documentation to reflect deterministic script phases vs. Claude's constrained role.

Estimated code review effort: 3 (Moderate) | ~20 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Script as doctor-commands.sh
    participant DoctorSh as doctor.sh
    participant Claude as Claude plugin
    participant Bugs as closed-bugs.json

    Script->>DoctorSh: prepare
    Script->>DoctorSh: graphs
    Script->>DoctorSh: evidence
    Script->>Claude: microshift-ci:doctor --prepared
    Claude-->>Script: analysis result
    Script->>DoctorSh: finalize
    Script->>Bugs: read closed entries
    Bugs-->>Script: closed bug keys
    Script->>DoctorSh: refresh --ignore <keys>
Loading

Suggested reviewers: agullon

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title matches the main theme of making the MicroShift CI Doctor flow deterministic, though it undersells the broader workflow changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR only changes shell/YAML doctor workflow files; no Ginkgo test titles were added or modified, and no It/Describe/Context/When strings exist in the touched subtree.
Test Structure And Quality ✅ Passed PR changes only one shell script; no _test.go/Ginkgo files were modified, so the Ginkgo test-quality check is not applicable.
Microshift Test Compatibility ✅ Passed The PR only changes ci-operator doctor shell config; no Ginkgo e2e tests or MicroShift-unsupported APIs/features were added.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR only changes a ci-operator doctor shell script; no Ginkgo e2e tests or Describe/It additions were modified, so SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed Only a CI doctor shell script changed; no deployment manifests, controllers, affinities, node selectors, PDBs, or scheduling constraints were added.
Ote Binary Stdout Contract ✅ Passed PASS: The only changed file is a CI shell script; no OTE binary/main/TestMain/process-level test code was modified, so the stdout contract is unaffected.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Only a doctor shell script changed; no new Ginkgo/e2e tests or network/IP assumptions were added.
No-Weak-Crypto ✅ Passed No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret-comparison code appears in the changed lines.
Container-Privileges ✅ Passed No changed manifest adds privileged settings; searches found no privileged/hostPID/hostNetwork/hostIPC/allowPrivilegeEscalation/SYS_ADMIN in the doctor subtree or generated job configs.
No-Sensitive-Data-In-Logs ✅ Passed The patch only removes fetch-previous and tweaks comments; it adds no new logging of tokens, PII, hostnames, or customer data.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci

openshift-ci Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@pmtk: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor N/A periodic Registry content changed

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml (1)

45-51: 🧹 Nitpick | 🔵 Trivial

Documentation accurately reflects the new deterministic/Claude split.

Confirm the new 1h45m0s budget comfortably covers prepare+graphs+evidence+the 40-minute Claude doctor session+finalize+refresh, plus the other Claude sessions (create-bugs, close-stale-bugs, fix-test-bugs) referenced by atexit_handler, which aren't visible in this diff.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml`
around lines 45 - 51, The doctor step timeout needs to account for all work
launched by the pipeline, not just the deterministic phases and the main
40-minute Claude doctor session. Review the timeout and flow in
openshift-edge-tooling-microshift-ci-doctor-ref.yaml alongside the
atexit_handler-driven sessions (create-bugs, close-stale-bugs, fix-test-bugs)
and make the budget explicit by increasing the timeout if needed or documenting
the full set of Claude sessions covered by the current 1h45m0s window.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml`:
- Around line 45-51: The doctor step timeout needs to account for all work
launched by the pipeline, not just the deterministic phases and the main
40-minute Claude doctor session. Review the timeout and flow in
openshift-edge-tooling-microshift-ci-doctor-ref.yaml alongside the
atexit_handler-driven sessions (create-bugs, close-stale-bugs, fix-test-bugs)
and make the budget explicit by increasing the timeout if needed or documenting
the full set of Claude sessions covered by the current 1h45m0s window.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1837abb2-452d-4641-95e5-f5b7b4b47d6b

📥 Commits

Reviewing files that changed from the base of the PR and between 5c158fe and 575380b.

📒 Files selected for processing (2)
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml

@openshift-ci

openshift-ci Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@pmtk pmtk marked this pull request as draft July 3, 2026 17:17
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant