Skip to content

feat(agentic-ci): decision-ready triage and daily PR fixes#600

Open
andreatgretel wants to merge 11 commits intomainfrom
andreatgretel/feat/agentic-ci-improvements
Open

feat(agentic-ci): decision-ready triage and daily PR fixes#600
andreatgretel wants to merge 11 commits intomainfrom
andreatgretel/feat/agentic-ci-improvements

Conversation

@andreatgretel
Copy link
Copy Markdown
Contributor

📋 Summary

Reorganize the weekly issue-triage report around recommended actions so each flagged item is decision-ready, and flip four of the five daily audit suites from read-only reports to opening one PR per run for the most important localized fix. The shared procedure (selection, ranking, allowlists, attempted-fixes lifecycle, two-strike escalation, branch/PR conventions) lives in a single _fix-policy.md; suite recipes declare only their eligible categories.

🔗 Related Issue

N/A — extends the agentic-CI work tracked in plan 472. We can link a follow-up tracking issue once one is opened.

🔄 Changes

  • New .agents/recipes/_fix-policy.md — universal localized-fix bar (≤3 files, ≤50 LOC, reversible, self-evident, test-safe, single-concern), per-suite path/command allowlists, finding-hash spec for stable cross-run identification, fix_backlog / attempted_fixes schema, ranking criteria (confidence > severity > impact > recency), draft-PR mode, two-strike escalation, standard fix procedure, direct gh pr create --body-file PR-creation pattern.
  • New .agents/recipes/_phase-audit.md and .agents/recipes/_phase-fix.md — phase directives prepended to each claude invocation so each call knows which phase it executes.
  • Rewrite .agents/recipes/issue-triage/recipe.md: action-organized buckets (Close as resolved, Close as duplicate, Needs maintainer decision, Ready for assignment, Stuck PR, Duplicate PRs, Stale, consider closing), per-row Action / Evidence / Rationale columns, healthy items collapsed to count + <details>, multi-comment split with :i/N markers and orphan reconciliation, plus a Repeatedly-failed fix attempts section that surfaces two-strike findings from the daily suites.
  • Add a fix phase to four suite recipes (eligibility tables only — procedure inherited from _fix-policy.md):
    • docs-and-references: broken-link, docstring-drift (signature-driven), arch-ref-rename. Non-draft.
    • structure: missing-future, lazy-import. Non-draft. Dead exports stay report-only.
    • dependencies: transitive-gap, unused. Non-draft.
    • code-quality: bare-except narrowing only. Draft PRs until draft_until_proven is flipped after two non-draft PRs land clean. TODO-line deletion explicitly forbidden.
  • test-health: explicit "no fix phase, all categories report-only" with a future-candidate note for test-isolation violations.
  • Update .github/workflows/agentic-ci-daily.yml: split the recipe step into two claude invocations (audit, then conditionally fix), each with the existing --max-turns 50. Fix step gated on audit success AND matrix.suite != 'test-health' AND non-empty fix_backlog. Adds a git-identity step, expands artifact upload to include the fix log + PR body, and reports both phase outcomes in the job summary. No branch auto-deletion.
  • Minor _runner.md updates: generalized branch prefix beyond chore, documented why CI uses gh pr create --body-file instead of /create-pr.

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • .agents/recipes/_fix-policy.md — load-bearing contract. Allowlists, ranking, two-strike escalation, and the standard fix procedure all live here. Worth reviewing in full.
  • .github/workflows/agentic-ci-daily.yml — the audit/fix split, the backlog gate (fromJSON(steps.backlog.outputs.size || '0') > 0), and the matrix.suite != 'test-health' guard.
  • One-time manual setup before merge: create labels agentic-ci, agentic-ci/docs-and-references, agentic-ci/structure, agentic-ci/dependencies, agentic-ci/code-quality. The recipes apply these via gh pr edit --add-label; missing labels won't block the PR opening but will surface a gh warning.

🧪 Testing

  • make test — N/A. No Python or other source code changed; the diff is recipes (markdown), workflow YAML, and one new policy doc.
  • Workflow YAML validated (python3 -c "import yaml; yaml.safe_load(...)")
  • Pre-commit hooks pass (trim whitespace, EOF, yaml, large files, merge conflicts, line endings)
  • Production validation plan — enable each suite via workflow_dispatch one at a time and read the first 1–2 actual runs before promoting to the next, per the validation section of the plan. Two-strike escalation, allowlist enforcement, and re-attempt blocking are all observable from the runner state and PR list.

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A; this extends agentic-CI infrastructure (originally in plan 472), and _fix-policy.md itself is the architecture doc for the fix phase.

@andreatgretel andreatgretel requested a review from a team as a code owner May 4, 2026 16:55
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR #600 Review — feat(agentic-ci): decision-ready triage and daily PR fixes

Summary

Reorganizes the weekly issue-triage report around recommended actions (seven exclusive buckets, per-row Action/Evidence/Rationale) and extends four of five daily audit suites (docs-and-references, structure, dependencies, code-quality) to open one PR per run for eligible, narrowly-scoped mechanical fixes. Shared rules live in a new .agents/recipes/_fix-policy.md; suite recipes declare only their eligible categories. test-health stays explicitly report-only. The workflow .github/workflows/agentic-ci-daily.yml splits the recipe step into audit + fix invocations, gated on audit success, non-test-health suite, and fix_backlog non-empty.

Diff is 607 adds / 118 dels, entirely in recipe markdown + one workflow YAML — no Python changes. Scope is consistent with the PR description.

Findings

Correctness / potential issues

  • make test-<package> name mapping is ambiguous. _fix-policy.md:26 and the fix procedure at _fix-policy.md:171-173 tell the fix phase to run make test-<package> for the affected package. The Makefile only defines test-config, test-engine, test-interface — but a recipe selecting e.g. packages/data-designer-config/pyproject.toml under the dependencies allowlist may plausibly infer make test-data-designer-config (which doesn't exist). Recommend adding an explicit mapping table in _fix-policy.md (path prefix → make target) so the recipe can't synthesize a non-existent target. This is the single most likely silent-abandon source on day one.

  • Check fix backlog step has no id: audit dependency, just if: success(). That's fine when Run audit recipe fails (whole job flips to failure), but if: success() does not distinguish between "audit ran and passed" and "audit was skipped". There is no current path to skip it, so not a bug today, but worth tightening to if: steps.audit.outcome == 'success' && matrix.suite != 'test-health' for robustness if a pre-audit gate is ever added.

  • fromJSON(steps.backlog.outputs.size || '0') when the step is skipped. If backlog is skipped (test-health), steps.backlog.outputs.size is empty string and the || '0' is required to keep fromJSON from erroring. Good. Consider applying the same pattern to AUDIT_OUTCOME / FIX_OUTCOME in the summary step — the default skipped/unknown render is currently fine, but a skipped backlog will show Fix backlog size: \`rather thann/a`. Minor.

  • Single-part vs numbered triage filename inconsistency. issue-triage/recipe.md:204-206 says each part is /tmp/issue-triage-report-1.md, ...-2.md, etc. The fallback note at recipe.md:265 says "the workflow's fallback step will post /tmp/issue-triage-report.md" (no suffix). In the single-part case, the file must exist under both names or the fallback loses the report. Either (a) instruct the recipe to write both, or (b) change the workflow fallback to prefer -1.md.

  • attempted_fixes reconciliation is underspecified. _fix-policy.md:158-159 says reconcile against open PRs at the start of each fix run; step 6 adds a hidden <!-- agentic-ci finding=<id> suite=<suite> --> marker in the PR body. But the reconciliation algorithm (how to match PRs to attempted_fixes entries by id) isn't described. On a crash after gh pr create but before the state write, the next run has no way to recover the entry without parsing that marker. Recommend one explicit sentence: "reconcile by scanning open PR bodies for the agentic-ci finding= marker and back-filling missing attempted_fixes entries".

  • fix_backlog.data schema is freeform. Each category describes its data shape in prose (e.g. docs-drift "signature-vs-Args delta", bare-except "proposed replacement type and grep evidence"). If two runs use slightly different key names, downstream reconciliation is brittle. A small schema table per category would harden this.

  • set -o pipefail is set in both the audit and fix shell steps — good. claude ... | tee will propagate non-zero correctly. No issue.

Style / conventions

  • Markdown tables in _fix-policy.md are dense but readable; good use of section headers. No issues.
  • Branch-naming pattern update in _runner.md is consistent with _fix-policy.md:167.
  • The explicit "TODO line deletion is forbidden" in code-quality (both in fix phase and Constraints) is a welcome belt-and-suspenders after the prior incident pattern.
  • Issue-triage multi-comment split logic (PATCH existing, post surplus, delete extras) is clean and idempotent.

Risk / operational

  • Cost. Cache is disabled (DISABLE_PROMPT_CACHING: "1") and the fix phase re-loads _phase-fix.md + _runner.md + _fix-policy.md + the recipe body per run. This roughly doubles per-suite prompt bytes on fix-eligible days. Acceptable, but worth a note in the rollout plan that cost per daily run approximately doubles when a backlog exists.
  • Draft-PR gate is governed by a runner-state boolean (draft_until_proven). Flipping it requires manually editing the cached state. That's fine as designed, but please document the flip procedure in the rollout note (the PR body already says "a maintainer flips the flag after two non-draft PRs land clean" — just spell out the exact file/key path).
  • Label pre-creation is a documented manual step. Missing labels only surface a gh warning; the PR still opens. Acceptable.
  • Two-strike surfacing in the triage report depends on access to the daily-suites' runner-state, which lives in per-suite caches on a different workflow. The recipe acknowledges this with a "may be empty" caveat, but that effectively neuters the escalation visibility until a shared-state mechanism is added. Worth tracking as a follow-up rather than blocking this PR.

Tests

  • N/A. No Python source changed. YAML is syntactically valid per the PR checklist; the changes are behavioral and only exercisable in production runs.

Verdict

Approve-with-comments. The design is coherent, the audit/fix split is well-isolated, and the escape hatches (localized-fix bar, allowlists, two-strike escalation, draft mode for the highest-risk suite) are appropriate. The one thing I'd resolve before merging is the make test-<package> mapping ambiguity — leaving it implicit invites day-one silent abandons. The other findings are nice-to-have tightening that can land as follow-ups.

Suggested pre-merge:

  1. Add an explicit path-prefix → make-target table to _fix-policy.md.
  2. Clarify the attempted_fixes PR-marker reconciliation algorithm in one sentence.
  3. Resolve the single-part triage filename (/tmp/issue-triage-report.md vs -1.md) inconsistency.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR evolves the agentic-CI infrastructure from read-only daily audits into a two-phase audit-then-fix pipeline: four of five suites now open one PR per run for the highest-ranked localized finding, with a shared policy document (_fix-policy.md) governing allowlists, ranking, two-strike escalation, and a workflow-level scope gate that closes violating PRs and flips their state to abandoned.

  • New _fix-policy.md is the single source of truth for the fix-phase contract (localized-fix bar, path/command allowlists, finding-hash spec, fix_backlog/attempted_fixes schema, draft-PR promotion, atomicity).
  • The daily workflow (agentic-ci-daily.yml) splits into two sequential claude invocations (audit, then conditionally fix), adds a snapshot-based scope gate and a dependencies-specific lockfile gate, uploads richer artifacts always, and increases the job timeout to 40 min to accommodate the second invocation.
  • issue-triage/recipe.md is rewritten around action buckets so each flagged item is decision-ready; multi-part split, orphan reconciliation, and test() guard on capture() address all previously raised threading/fallback bugs.

Confidence Score: 4/5

Safe to merge with one known gap in the scope gate's orphan-PR coverage — a PR from a prior crashed run could escape all allowlist enforcement if reconciliation and a new fix happen to coexist in the same run.

The load-bearing scope gate uses | last // empty to identify the PR opened in this run. When the fix step's reconciliation back-fills entries for orphaned PRs from crashed prior runs, those entries also satisfy the grew-vs-snapshot predicate. | last picks the last entry (the newly opened PR), and any reconciled orphan entries escape scope validation permanently — a PR pushed in a crashed run that never passed the gate will remain open without allowlist or LOC checks ever being applied by the workflow.

.github/workflows/agentic-ci-daily.yml — both scope_gate and lockfile_gate use the | last // empty jq selector that validates only one of potentially multiple entries that grew vs the pre-fix snapshot.

Important Files Changed

Filename Overview
.github/workflows/agentic-ci-daily.yml Adds audit/fix split with scope gate, lockfile gate, and snapshot-based entry identification. The `
.agents/recipes/_fix-policy.md New load-bearing policy file defining the localized-fix bar, allowlists, finding-hash spec, fix_backlog/attempted_fixes schema, ranking, two-strike escalation, PR conventions, atomicity guarantees, and the workflow-level scope gate contract. Policy is internally consistent and previously flagged issues resolved.
.agents/recipes/issue-triage/recipe.md Major rewrite to action-bucket format. Multi-part split, reconciliation against existing bot comments, numbered file output, and orphan-part detection are all logically sound. Previous threading/fallback bugs addressed in prior commits.
.github/workflows/agentic-ci-issue-triage.yml Fallback posting now identity-based (per-part marker matching), capture() guarded by test() to prevent jq errors on unrelated bot comments, and loop correctly iterates only missing parts. All previously-flagged issues resolved.
.agents/recipes/code-quality/recipe.md Adds fix phase for bare-except narrowing only; TODO-line deletion explicitly forbidden; draft mode enforced until landing rate is proven. Regex correctly extends to cover except BaseException:.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions (matrix suite)
    participant Agent as claude (audit)
    participant AgentF as claude (fix)
    participant SG as scope_gate
    participant LG as lockfile_gate (deps only)
    participant GH as GitHub API

    GHA->>Agent: "prompt = phase-audit + runner + fix-policy + recipe"
    Agent->>GHA: writes runner-state.json (fix_backlog populated)
    GHA->>GHA: backlog step (check fix_backlog size)
    GHA->>GHA: snapshot step (capture attempted_fixes lengths)
    GHA->>AgentF: "prompt = phase-fix + runner + fix-policy + recipe"
    AgentF->>AgentF: reconcile attempted_fixes vs open PRs
    AgentF->>AgentF: pick highest-ranked candidate, apply fix
    AgentF->>GH: git push + gh pr create --body-file
    AgentF->>GHA: writes attempted_fixes entry (outcome: open)
    GHA->>SG: compare diff vs allowlist + LOC cap
    alt violation
        SG->>GH: gh pr close + delete branch
        SG->>GHA: flip attempted_fixes to abandoned
        GHA->>GHA: Save rejected gate state + exit 1
    else pass
        SG->>GHA: scope gate passed
        alt "suite == dependencies"
            GHA->>LG: make install-dev against pushed branch
            alt lockfile fails
                LG->>GH: gh pr close
                LG->>GHA: flip attempted_fixes to abandoned + exit 1
            end
        end
    end
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
.github/workflows/agentic-ci-daily.yml:284-292
**Orphaned-PR scope gate bypass via `| last` when reconciliation grows multiple entries**

The fix step runs reconciliation first (policy §Standard fix procedure step 1), which back-fills `attempted_fixes` entries for any PRs opened in crashed prior runs. Each back-filled entry increments from 0 to 1 relative to the pre-fix snapshot — the same "grew" signal as the new entry opened in this run. When at least one orphaned PR is reconciled AND a new PR is opened, two or more entries satisfy the `(length > $p[.id] // 0)` predicate. `| last // empty` picks only the last entry (the newly opened one), and the reconciled orphan — which was never scope-gated in its original crashed run — escapes validation permanently. A PR that violates the path allowlist or LOC cap could stay open indefinitely.

The same issue is duplicated in `lockfile_gate` at line 1331.

Reviews (10): Last reviewed commit: "fix(agentic-ci): harden daily fix gates" | Re-trigger Greptile

Comment thread .github/workflows/agentic-ci-daily.yml Outdated
Comment thread .agents/recipes/_fix-policy.md
Comment thread .agents/recipes/issue-triage/recipe.md Outdated
Comment thread .github/workflows/agentic-ci-issue-triage.yml Outdated
Comment thread .github/workflows/agentic-ci-issue-triage.yml
Comment thread .github/workflows/agentic-ci-issue-triage.yml
@johnnygreco
Copy link
Copy Markdown
Contributor

Thanks for putting this together, @andreatgretel — the iteration is visible, and the factoring of _fix-policy.md as the shared contract is the right call.

Summary

This reorganizes the weekly triage report around action buckets and flips four daily suites from report-only to opening one PR per run, with a shared policy doc covering selection, ranking, allowlists, two-strike escalation, and atomicity. The implementation matches the stated intent and reflects multiple rounds of review feedback (the partial-post detection and capture() guard were tightened well across commits 19db106591e87496).

I read this through the lens you asked for — humans-out-of-the-loop autonomous PR generation — and the bulk of my comments are about the trust boundary between policy and enforcement. Most of the safety bar lives in _fix-policy.md as instructions to the agent, but the workflow doesn't independently verify that the agent followed them before the PR is opened. That's a reasonable v1 stance, but it's worth being explicit about which guarantees are invariants and which are agent-best-effort.

Findings

Warnings — Worth addressing

_fix-policy.md:38-49 — Path allowlists and forbidden commands are policy, not gates

  • What: The path allowlists, ≤3 files / ≤50 LOC bound, and the shared forbidden-command list (git push --force, gh pr merge, gh pr close, gh pr review, pip install) are spelled out as constraints the agent must follow. Nothing in agentic-ci-daily.yml validates the diff before the agent pushes/opens the PR — the agent runs with GH_TOKEN: ${{ github.token }} and pull-requests: write, so a misbehaving or confused agent could in principle push to a forbidden path or self-merge.
  • Why: For "humans out of the loop", this is the trust boundary. The agent's compliance with _fix-policy.md is load-bearing. A single bad run that touches .github/workflows/** (allowed by the token) or merges its own PR (allowed if branch protection isn't strict enough) is hard to undo. The two-strike escalation handles "agent picked the wrong fix"; it doesn't handle "agent escaped the allowlist."
  • Suggestion: Add a workflow-level gate between the agent's commit and the PR open. After the fix phase finishes, run git diff --name-only HEAD~1 HEAD against the suite's allowlist and wc -l against the LOC cap; abort + delete the local branch (without force-push) if either fails. Same idea for the docstring-only constraint on docs-and-references (the path allowlist permits packages/*/src/**/*.py but the policy says "docstring-only" — a git diff filtered to non-docstring AST changes would catch escapes). Even a coarse check ("no lines outside docstrings, comments, or whitespace") is much stronger than relying on the prompt.

agentic-ci-daily.yml:71-73cancel-in-progress: true on the matrix job can interrupt mid-fix

  • What: The matrix-level concurrency group cancels in-progress runs when a new run for the same suite arrives (e.g., a workflow_dispatch while the cron run is still going). If cancellation lands between the local commit and the PR-marker comment that drives reconciliation, the orphaned branch exists but no attempted_fixes record does.
  • Why: The _fix-policy.md "Atomicity" section promises "no half-states." Reconciliation via the <!-- agentic-ci finding=<id> --> marker recovers state for opened PRs, but a cancellation between git push and gh pr create leaves a pushed branch with no PR — the next run won't see it via gh pr list and may attempt the same finding again. With humans out of the loop, this drifts silently.
  • Suggestion: Either flip the matrix job to cancel-in-progress: false (the worst case is a queued duplicate run, not an interrupted one), or extend the reconciliation pass to also enumerate agentic-ci/* branches without an open PR and either delete them or open the deferred PR. Branch deletion is forbidden by policy elsewhere — leaving them for pr-stale.yml to reap is fine, but document the failure mode.

_phase-audit.md:12 — "Do NOT modify any files outside {{memory_path}}/" contradicts the recipes

  • What: The phase directive forbids writing outside {{memory_path}}/, but every audit recipe instructs the agent to write the report to /tmp/audit-{{suite}}.md. The workflow then reads that exact path for the job summary and artifact upload (agentic-ci-daily.yml:268, 288).
  • Why: A literal-minded agent could refuse to write the report file, which would break the entire daily pipeline silently (job summary empty, no artifact, fix phase has no audit context).
  • Suggestion: Change the bullet to something like "Do NOT modify any files outside {{memory_path}}/ and /tmp/audit-{{suite}}.md." Same nit applies to _phase-fix.md if it grows similar wording — the fix phase legitimately writes /tmp/pr-body-{{suite}}.md.

agentic-ci-daily.yml:260-271 — Agent log only uploaded on failure

  • What: The upload-artifact step is gated on if: failure(). Successful runs leave no trace of the agent's full reasoning stream beyond what made it into the job summary.
  • Why: For autonomous PR generation, the most interesting failure is "the workflow succeeded but the PR was wrong" — a bad fix that lands, an over-eager allowlist interpretation, the agent doing something subtly off. Without the stream-json log on success, there's nothing to look back at when a maintainer notices the issue days later.
  • Suggestion: Drop the if: failure() (or change to if: always()). Disk + retention cost is small; debuggability for autonomous mode is the bigger win. Bonus: include runner-state.json in the artifact list so the post-run state is recoverable if the cache gets evicted.

dependencies/recipe.md:178-181make install-dev after pyproject.toml edits is policy, not enforced

  • What: The recipe says "Before running the per-package test target [...], run make install-dev to confirm the lockfile resolves cleanly." Like the allowlist, this is an instruction to the agent. The workflow doesn't validate that make install-dev re-ran after the pyproject change before the test target.
  • Why: A transitive-gap fix that adds an unresolvable specifier (e.g., a typo in a version) would pass the per-package test target if the venv is still using the pre-change lockfile. The PR opens; CI on the PR would catch it eventually, but for a fully autonomous flow the in-recipe verification is the first line of defense.
  • Suggestion: For the dependencies suite specifically, the workflow could re-run make install-dev and a quick python -c "import <new-dep>" in a post-fix step before letting gh pr create run. Cheap, deterministic, and exactly what "test-safe" was meant to guarantee.

Suggestions — Take it or leave it

agentic-ci-daily.yml:91-99 — Unused cache step id

  • What: id: cache is set but steps.cache.outputs.cache-hit is never referenced.
  • Suggestion: Drop the id for cleanliness, or use it to skip the "Initialize runner memory" step when there's a cache hit (saves a couple of lines of redundant work).

_fix-policy.md:206 — Draft mode toggle is manual-only

  • What: draft_until_proven flips from truefalse only by maintainer edit of runner-state.json after "at least two non-draft PRs have landed clean." For a humans-out-of-the-loop system, the flip still requires a human.
  • Why: Not a bug — this is intentional and probably right for v1. But worth an explicit note in the policy that this is a deliberate human-gated promotion step, so future-you (or the next agent reading the policy) doesn't decide to automate it without thinking through the implications.
  • Suggestion: Add a sentence to the "Draft PRs" bullet: "This flip is intentionally manual; it is the sole human-gated promotion step in the fix policy."

code-quality/recipe.md:260 — Bare-except eligibility leans on agent judgment of "exactly one plausible exception type"

  • What: Eligibility requires "grep across the try-block confirms exactly one exception type is plausibly raised, verified by inspecting the called functions or imported library docs." That's the most interpretive of all the eligibility rules — and you've already correctly gated it behind draft mode.
  • Suggestion: Consider tightening further: only mark eligible when the try-block calls a single function (not a chain), and that function's signature/docstring lists Raises: in Google style. That makes the rule mechanical (grep for Raises: in the called function's docstring) rather than inferential. Could narrow the eligible population, but every fix that lands becomes evidence the rule works.

agentic-ci-issue-triage.yml:117mapfile -t PARTS < <(...) overwrites the array we just collected

  • What: PARTS=(...) is built first via the glob, then mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) rebuilds it sorted. This works but reads a little awkwardly because PARTS is being reassigned from itself through a subshell.
  • Suggestion: A direct sort would be clearer: IFS=$'\n' read -r -d '' -a PARTS < <(printf '%s\n' /tmp/issue-triage-report-*.md | sort -V && printf '\0'). Take it or leave it — current form is correct.

Minor — JSON formatting consistency

  • What: agentic-ci-daily.yml:257 writes runner-state.json with indent=2. The agent likely writes it compact (no instruction either way). Each handover flips the format, which produces noisy cache diffs.
  • Suggestion: Either standardize on indent=2 (add a one-line note in _fix-policy.md's "Runner-state schema" section) or leave compact. Pick one.

What Looks Good

  • _fix-policy.md as the shared contract. Pulling allowlists, ranking, the standard fix procedure, and the attempted_fixes lifecycle into one document is exactly the right factoring — each suite recipe shrinks to the parts that actually vary (eligible categories, branch type, test_required). It's a high-signal doc to read end-to-end; reviewers can verify the contract once.
  • Phase split via separate claude invocations. Two distinct invocations with _phase-audit.md / _phase-fix.md directives is structurally cleaner than a single agent that has to choose its own phase. It's easier to reason about what each invocation is allowed to do, easier to gate (the workflow's if: steps.audit.outcome == 'success' && ... && fix_backlog > 0 is dead simple), and matches the "no half-states" atomicity goal.
  • Identity-based partial-post detection in triage fallback. The evolution of this through the review (count → identity-based with test() guard before capture(), then iterating only MISSING[] instead of PARTS[]) is a small piece of bash but a meaningful one — it's exactly the kind of "duplicate post of one part shouldn't mask a missing other" failure mode that's hard to test for and easy to get wrong. Worth keeping the explanatory comment block at lines 120-126.
  • Two-strike escalation surfaced in weekly triage. Routing repeatedly-failed fix attempts into the human-facing weekly report (rather than just dropping them) is exactly the right escape valve for autonomous mode — the system gracefully tells maintainers "I gave up here, you decide."
  • test-health opting out of the fix phase. Acknowledging that hollow-test rewrites and missing-test authoring require inferring intent, and explicitly leaving them report-only with a documented future-candidate note for test-isolation, is good restraint.

Verdict

Needs changes — Five Warnings worth addressing before this runs autonomously:

  1. Add a workflow-level diff gate against the suite's path allowlist + LOC cap before PR open (_fix-policy.md enforcement).
  2. Reconsider cancel-in-progress: true on the matrix job, or extend reconciliation to cover orphaned branches.
  3. Reword the _phase-audit.md "do not modify outside {{memory_path}}/" line to allow /tmp/audit-{{suite}}.md (and the equivalent for fix phase).
  4. Always upload the agent log artifact, not only on failure.
  5. Either gate the dependencies suite on a post-fix make install-dev + import smoke check, or document explicitly that the per-package test target is the only verification.

The Suggestions are honestly nits — none of them block.


This review was generated by an AI assistant.

Reorganize the weekly issue-triage report around recommended actions
(close as resolved, close as duplicate, needs maintainer decision,
ready for assignment, stuck PR, duplicate PRs, stale) so each flagged
item carries action + evidence + rationale and can be resolved without
opening it. Multi-comment split with i/N markers and orphan
reconciliation when the report grows or shrinks.

Flip the four daily audit suites with mechanical fix categories from
read-only reports to opening one PR per run:

- docs-and-references: broken-link, docstring-drift, arch-ref-rename
- structure: missing-future, lazy-import
- dependencies: transitive-gap, unused
- code-quality: bare-except (draft until landing rate proven)

test-health stays report-only (all candidates require inferring intent).

The shared procedure - fix_backlog selection, finding-hash spec for
stable cross-run identification, attempted_fixes lifecycle with
two-strike escalation, allowlists, ranking, branch/PR conventions -
lives in .agents/recipes/_fix-policy.md. Each suite recipe declares
only its eligible categories, branch types, and test requirements.

Workflow runs claude twice per suite (audit, then conditionally fix),
each capped at the existing --max-turns 50. Fix call is gated on
non-empty fix_backlog and skipped entirely for test-health.
- Map per-package test targets explicitly in _fix-policy.md (Makefile
  exposes test-config/test-engine/test-interface, not test-<package>).
- Use github-actions[bot] noreply identity for commits the recipes
  produce.
- Refresh fix_backlog.data when an id already exists so the fix phase
  cannot drive a PR from stale data after the underlying file changed.
- Stop time-pruning closed/abandoned attempted_fixes entries — pruning
  before the two-strike threshold erases the history needed to
  escalate. Single-strike entries now age out only via the 200-entry
  cap.
- Disambiguate bare-except findings within the same function by
  including a try-body hash in the finding id.
- Audit grep for code-quality now matches both `except:` and
  `except BaseException:`, in parity with the fix eligibility.
- Restrict transitive-gap fix eligibility to cases where a sibling
  package already declares the dep (avoids inventing version
  specifiers from scratch).
- Issue-triage workflow handles multi-part reports in both the fallback
  post step and the job summary; recipe always writes numbered parts.
- Replace remaining `make test-<package>` references with pointers to
  the mapping table; only the table itself uses that placeholder now.
- Fix `gh api --paginate | jq | length` returning per-page counts: slurp
  with `jq -s 'add // 0'` to get a single total.
- Compare posted-comment count to expected part count so a partial post
  (agent posted part 1 but not 2/3) triggers the fallback instead of
  being silently treated as success.
- Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're
  not at the mercy of the runner's default shell.
- Disambiguate bare-except findings whose try-body hashes collide by
  adding a per-function ordinal to the canonical_key.
- Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at`
  (the schema has no `first_seen` field).
…back

Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an
identity-based check that extracts every i/N marker seen in
today-dated bot comments and verifies each expected i is present.
A duplicate post of one part can no longer mask a missing other.
- Exempt two-strike attempted_fixes entries from the 200-entry cap
  eviction. Cap now evicts non-two-strike oldest-first by
  attempts[0].at; two-strike entries are silently-forgotten only in
  the pathological all-200-are-two-strike case (itself a signal).
- Specify the attempted_fixes PR-marker reconciliation algorithm:
  scan open PR bodies for the `<!-- agentic-ci finding=<id> -->`
  marker and back-fill missing entries.
- Tighten the daily workflow conditionals to gate on explicit step
  outcomes (steps.audit.outcome == 'success' rather than success())
  so a future pre-audit gate cannot accidentally trip the fix step.
…ording)

- Bump daily-suite job timeout from 20 to 40 minutes. The split into
  two sequential `claude --max-turns 50` invocations can saturate a
  20-minute budget; a mid-fix SIGTERM would leave an orphaned branch
  and inconsistent runner-state.
- Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids
  rebuilding fix_backlog from scratch but does NOT override the
  per-candidate re-verification step required by _fix-policy.md
  step 4.1 (re-grep / re-read the specific file the candidate points
  at). Single-candidate re-verification is required; whole-codebase
  re-scanning is forbidden.
- Guard `jq capture()` with a `test()` select. `capture()` errors on
  non-match instead of returning empty, which would truncate
  SEEN_PARTS if any unrelated today-dated bot comment lacks the
  triage marker (e.g. from a sibling workflow). Adding the test()
  guard ensures capture() only runs on bodies that already match.
- Iterate the MISSING[] array when posting fallback parts, not the
  full PARTS[] array. Posting all parts when only some were missing
  was creating duplicate comments for the parts the agent already
  successfully posted.
Address the five Warnings from the 2026-05-07 review focused on the
trust boundary for autonomous PR generation. Five workflow/policy
adjustments shrink the surface where agent compliance is load-bearing:

- Workflow-level scope gate. After the fix step, re-derive the diff
  against `origin/main` and validate against the per-suite path
  allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and
  the 3-file cap. On violation, close the PR with `--delete-branch`
  and flip the `attempted_fixes` entry from `open` to `abandoned` so
  two-strike logic still sees the failure. The recipe alone could not
  bind the agent's path choices; the workflow now does.
- Dependencies install-dev verification. For the dependencies suite
  only, re-run `make install-dev` after the scope gate so the agent's
  pyproject edit is exercised against the lockfile resolver. Closes
  the PR if `install-dev` fails — catches the failure mode where the
  per-package test target passed against the old cached lockfile.
- Flip matrix-job `cancel-in-progress` from true to false. A
  cancellation between the agent's git push and `gh pr create` would
  leave an orphaned branch with no `attempted_fixes` record;
  reconciliation only covers PRs that were opened. Queueing a
  duplicate run is the lesser evil. `_fix-policy.md` Atomicity
  section now documents the trade-off.
- Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not
  modify outside `{{memory_path}}/`" directive. A literal-minded
  agent could refuse to write the report file, which would break the
  job summary, artifact upload, and the fix phase's audit context.
- Always upload the agent log artifact (was `if: failure()` only) and
  include `runner-state.json`. For autonomous mode, the most
  interesting failure is "the workflow succeeded but the PR was
  wrong"; the stream-json log is the only way to look back days
  later.

Also takes johnnygreco's Suggestion 2: spell out in the policy doc
that the `draft_until_proven` flip is the sole human-gated
promotion step in the fix policy and must not be automated.

Greptile and the github-actions auto-reviewer's findings were
already closed in the prior pass-2/pass-3 commits; no action needed
on those.
@andreatgretel andreatgretel force-pushed the andreatgretel/feat/agentic-ci-improvements branch from 91e8749 to 23829fb Compare May 7, 2026 23:40
Codex flagged five issues in the prior commit's scope/lockfile gates.
This commit closes all five:

- HIGH: Wrong-PR targeting. Both gates selected the last globally-open
  attempted_fixes entry, which could match a stale orphan from a
  prior crashed run rather than the PR opened by *this* run. Adds a
  pre-fix snapshot step that captures `(id, attempts-length)` pairs
  before the fix runs, and changes the post-fix selectors to require
  that the entry's attempts count grew during this run.
- HIGH: Docstring-only enforcement gap on the docs-and-references
  suite. The .py path allowlist was at workflow level but the
  docstring-only caveat was still policy-only. Adds an AST-based
  check: for each .py file changed, parse the post-change tree,
  collect docstring line ranges (module/class/function), then verify
  every added line in the diff is either inside a docstring, a
  comment, or whitespace. Verified locally with both pass and fail
  fixtures.
- MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather
  than `origin/main...origin/$BRANCH`, so a misbehaving agent that
  left HEAD pointing elsewhere would have validated the wrong tree.
  Now fetches `origin/$BRANCH` first and prefers that ref. Falls
  back to HEAD only if fetch fails (with a warning).
- MEDIUM: FILE_COUNT bug. `grep -c '.' || echo 0` produced "0\n0" on
  empty diff, breaking the downstream integer comparison. Replaces
  with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct
  for any input including empty.
- LOW: Non-atomic JSON writes. The runner-state mutations could leave
  the file half-written if the workflow was cancelled mid-write.
  Switches both gates to the temp-file + os.replace pattern.

Also: dependencies-lockfile gate now does an explicit
`git checkout --detach origin/$BRANCH` before re-running install-dev,
so verification runs against what was actually pushed rather than
relying on local working-tree state.
Comment thread .github/workflows/agentic-ci-daily.yml Outdated
Greptile review on 872d561 flagged that the fix step's custom `if:`
expression bypasses GitHub Actions' implicit success() check. Without
explicitly referencing steps.snapshot.outcome, a snapshot failure
(corrupt runner-state, disk error) would let the fix step run anyway.
The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json`
would then exit non-zero on the missing file, leave OPEN empty, and
hit the "nothing to validate" early-exit — silently approving whatever
the agent pushed.

Adds steps.snapshot.outcome == 'success' to both the fix step's
condition (the actual fix) and the scope_gate step's condition
(belt-and-suspenders against future refactors).
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
Comment on lines +284 to +292
OPEN=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json '
(($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p
| .attempted_fixes // []
| map(select(
((.attempts | last | .outcome) == "open")
and ((.attempts | length) > ($p[.id] // 0))
))
| last // empty
' .agentic-ci-state/runner-state.json)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Orphaned-PR scope gate bypass via | last when reconciliation grows multiple entries

The fix step runs reconciliation first (policy §Standard fix procedure step 1), which back-fills attempted_fixes entries for any PRs opened in crashed prior runs. Each back-filled entry increments from 0 to 1 relative to the pre-fix snapshot — the same "grew" signal as the new entry opened in this run. When at least one orphaned PR is reconciled AND a new PR is opened, two or more entries satisfy the (length > $p[.id] // 0) predicate. | last // empty picks only the last entry (the newly opened one), and the reconciled orphan — which was never scope-gated in its original crashed run — escapes validation permanently. A PR that violates the path allowlist or LOC cap could stay open indefinitely.

The same issue is duplicated in lockfile_gate at line 1331.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/agentic-ci-daily.yml
Line: 284-292

Comment:
**Orphaned-PR scope gate bypass via `| last` when reconciliation grows multiple entries**

The fix step runs reconciliation first (policy §Standard fix procedure step 1), which back-fills `attempted_fixes` entries for any PRs opened in crashed prior runs. Each back-filled entry increments from 0 to 1 relative to the pre-fix snapshot — the same "grew" signal as the new entry opened in this run. When at least one orphaned PR is reconciled AND a new PR is opened, two or more entries satisfy the `(length > $p[.id] // 0)` predicate. `| last // empty` picks only the last entry (the newly opened one), and the reconciled orphan — which was never scope-gated in its original crashed run — escapes validation permanently. A PR that violates the path allowlist or LOC cap could stay open indefinitely.

The same issue is duplicated in `lockfile_gate` at line 1331.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants