feat(agentic-ci): decision-ready triage and daily PR fixes #600
andreatgretel wants to merge 11 commits into main from
PR #600 Review —
Greptile Summary
This PR evolves the agentic-CI infrastructure from read-only daily audits into a two-phase audit-then-fix pipeline: four of five suites now open one PR per run for the highest-ranked localized finding, with a shared policy document (
| Filename | Overview |
|---|---|
| .github/workflows/agentic-ci-daily.yml | Adds audit/fix split with scope gate, lockfile gate, and snapshot-based entry identification. The ` |
| .agents/recipes/_fix-policy.md | New load-bearing policy file defining the localized-fix bar, allowlists, finding-hash spec, fix_backlog/attempted_fixes schema, ranking, two-strike escalation, PR conventions, atomicity guarantees, and the workflow-level scope gate contract. Policy is internally consistent and previously flagged issues resolved. |
| .agents/recipes/issue-triage/recipe.md | Major rewrite to action-bucket format. Multi-part split, reconciliation against existing bot comments, numbered file output, and orphan-part detection are all logically sound. Previous threading/fallback bugs addressed in prior commits. |
| .github/workflows/agentic-ci-issue-triage.yml | Fallback posting now identity-based (per-part marker matching), capture() guarded by test() to prevent jq errors on unrelated bot comments, and loop correctly iterates only missing parts. All previously-flagged issues resolved. |
| .agents/recipes/code-quality/recipe.md | Adds fix phase for bare-except narrowing only; TODO-line deletion explicitly forbidden; draft mode enforced until landing rate is proven. Regex correctly extends to cover except BaseException:. |
Sequence Diagram
sequenceDiagram
participant GHA as GitHub Actions (matrix suite)
participant Agent as claude (audit)
participant AgentF as claude (fix)
participant SG as scope_gate
participant LG as lockfile_gate (deps only)
participant GH as GitHub API
GHA->>Agent: "prompt = phase-audit + runner + fix-policy + recipe"
Agent->>GHA: writes runner-state.json (fix_backlog populated)
GHA->>GHA: backlog step (check fix_backlog size)
GHA->>GHA: snapshot step (capture attempted_fixes lengths)
GHA->>AgentF: "prompt = phase-fix + runner + fix-policy + recipe"
AgentF->>AgentF: reconcile attempted_fixes vs open PRs
AgentF->>AgentF: pick highest-ranked candidate, apply fix
AgentF->>GH: git push + gh pr create --body-file
AgentF->>GHA: writes attempted_fixes entry (outcome: open)
GHA->>SG: compare diff vs allowlist + LOC cap
alt violation
SG->>GH: gh pr close + delete branch
SG->>GHA: flip attempted_fixes to abandoned
GHA->>GHA: Save rejected gate state + exit 1
else pass
SG->>GHA: scope gate passed
alt "suite == dependencies"
GHA->>LG: make install-dev against pushed branch
alt lockfile fails
LG->>GH: gh pr close
LG->>GHA: flip attempted_fixes to abandoned + exit 1
end
end
end
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
.github/workflows/agentic-ci-daily.yml:284-292
**Orphaned-PR scope gate bypass via `| last` when reconciliation grows multiple entries**
The fix step runs reconciliation first (policy §Standard fix procedure step 1), which back-fills `attempted_fixes` entries for any PRs opened in crashed prior runs. Each back-filled entry increments from 0 to 1 relative to the pre-fix snapshot — the same "grew" signal as the new entry opened in this run. When at least one orphaned PR is reconciled AND a new PR is opened, two or more entries satisfy the `(length > $p[.id] // 0)` predicate. `| last // empty` picks only the last entry (the newly opened one), and the reconciled orphan — which was never scope-gated in its original crashed run — escapes validation permanently. A PR that violates the path allowlist or LOC cap could stay open indefinitely.
The same issue is duplicated in `lockfile_gate` at line 1331.
Reviews (10): Last reviewed commit: "fix(agentic-ci): harden daily fix gates" | Re-trigger Greptile
Thanks for putting this together, @andreatgretel. The iteration is visible, and the factoring of
Summary
This reorganizes the weekly triage report around action buckets and flips four daily suites from report-only to opening one PR per run, with a shared policy doc covering selection, ranking, allowlists, two-strike escalation, and atomicity. The implementation matches the stated intent and reflects multiple rounds of review feedback (the partial-post detection and
I read this through the lens you asked for (humans-out-of-the-loop autonomous PR generation), and the bulk of my comments are about the trust boundary between policy and enforcement. Most of the safety bar lives in
Findings
Warnings: Worth addressing
Suggestions — Take it or leave it
Minor — JSON formatting consistency
What Looks Good
Verdict
Needs changes. Five Warnings worth addressing before this runs autonomously:
The Suggestions are honestly nits; none of them block.
This review was generated by an AI assistant.
Reorganize the weekly issue-triage report around recommended actions
(close as resolved, close as duplicate, needs maintainer decision,
ready for assignment, stuck PR, duplicate PRs, stale) so each flagged
item carries action + evidence + rationale and can be resolved without
opening it. Multi-comment split with i/N markers and orphan
reconciliation when the report grows or shrinks.
Flip the four daily audit suites with mechanical fix categories from
read-only reports to opening one PR per run:
- docs-and-references: broken-link, docstring-drift, arch-ref-rename
- structure: missing-future, lazy-import
- dependencies: transitive-gap, unused
- code-quality: bare-except (draft until landing rate proven)
test-health stays report-only (all candidates require inferring intent).
The shared procedure (fix_backlog selection, finding-hash spec for
stable cross-run identification, attempted_fixes lifecycle with
two-strike escalation, allowlists, ranking, branch/PR conventions)
lives in .agents/recipes/_fix-policy.md. Each suite recipe declares
only its eligible categories, branch types, and test requirements.
Workflow runs claude twice per suite (audit, then conditionally fix),
each capped at the existing --max-turns 50. Fix call is gated on
non-empty fix_backlog and skipped entirely for test-health.
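The gating described above (fix only after a successful audit, never for test-health, and only with a non-empty fix_backlog) can be sketched as a small predicate. This is an illustration of the decision, not the workflow's actual `if:` expression; the function and parameter names are hypothetical.

```python
def should_run_fix(suite: str, audit_succeeded: bool, backlog_size: int) -> bool:
    """Decide whether the second (fix) claude invocation should run.

    Hypothetical helper: the real gate lives in the workflow YAML.
    """
    if not audit_succeeded:
        # No trustworthy fix_backlog exists if the audit phase failed.
        return False
    if suite == "test-health":
        # test-health stays report-only, so it never gets a fix call.
        return False
    # An empty backlog means there is nothing eligible to fix today.
    return backlog_size > 0


if __name__ == "__main__":
    cases = [
        ("structure", True, 2),
        ("structure", True, 0),
        ("test-health", True, 5),
        ("dependencies", False, 3),
    ]
    for suite, ok, size in cases:
        print(suite, ok, size, "->", should_run_fix(suite, ok, size))
```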
- Map per-package test targets explicitly in _fix-policy.md (Makefile
exposes test-config/test-engine/test-interface, not test-<package>).
- Use github-actions[bot] noreply identity for commits the recipes
produce.
- Refresh fix_backlog.data when an id already exists so the fix phase
cannot drive a PR from stale data after the underlying file changed.
- Stop time-pruning closed/abandoned attempted_fixes entries: pruning
before the two-strike threshold erases the history needed to
escalate. Single-strike entries now age out only via the 200-entry
cap.
- Disambiguate bare-except findings within the same function by
including a try-body hash in the finding id.
- Audit grep for code-quality now matches both `except:` and
`except BaseException:`, in parity with the fix eligibility.
- Restrict transitive-gap fix eligibility to cases where a sibling
package already declares the dep (avoids inventing version
specifiers from scratch).
- Issue-triage workflow handles multi-part reports in both the
fallback post step and the job summary; recipe always writes
numbered parts.
- Replace remaining `make test-<package>` references with pointers to
the mapping table; only the table itself uses that placeholder now.
- Fix `gh api --paginate | jq | length` returning per-page counts:
slurp with `jq -s 'add // 0'` to get a single total.
- Compare posted-comment count to expected part count so a partial
post (agent posted part 1 but not 2/3) triggers the fallback instead
of being silently treated as success.
- Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're
not at the mercy of the runner's default shell.
- Disambiguate bare-except findings whose try-body hashes collide by
adding a per-function ordinal to the canonical_key.
- Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at`
(the schema has no `first_seen` field).
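As an illustration of the last bullet, here is a minimal sketch of oldest-first eviction keyed on `attempts[0].at`. It assumes each entry has a unique `id` and that `at` is an ISO-8601 UTC timestamp string (so string order matches time order); neither detail is confirmed by the commit message, and a later commit in this PR additionally exempts two-strike entries, which this sketch does not model.

```python
from typing import Any

CAP = 200  # cap named in the commit message


def evict_to_cap(entries: list[dict[str, Any]], cap: int = CAP) -> list[dict[str, Any]]:
    """Return at most `cap` entries, dropping those whose first attempt is oldest.

    Assumption: entry["attempts"][0]["at"] is an ISO-8601 UTC string.
    """
    if len(entries) <= cap:
        return list(entries)
    # Sort a copy by the timestamp of the FIRST attempt (there is no first_seen).
    by_age = sorted(entries, key=lambda e: e["attempts"][0]["at"])
    # Keep the `cap` youngest entries.
    keep_ids = {e["id"] for e in by_age[len(by_age) - cap:]}
    # Preserve the original ordering of the surviving entries.
    return [e for e in entries if e["id"] in keep_ids]
```

Sorting a copy and filtering the original list keeps the file's entry order stable, which makes diffs of the state file easier to read.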
…back
Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an
identity-based check that extracts every i/N marker seen in
today-dated bot comments and verifies each expected i is present. A
duplicate post of one part can no longer mask a missing other.
- Exempt two-strike attempted_fixes entries from the 200-entry cap
eviction. Cap now evicts non-two-strike entries oldest-first by
attempts[0].at; two-strike entries are silently forgotten only in
the pathological all-200-are-two-strike case (itself a signal).
- Specify the attempted_fixes PR-marker reconciliation algorithm: scan
open PR bodies for the `<!-- agentic-ci finding=<id> -->` marker and
back-fill missing entries.
- Tighten the daily workflow conditionals to gate on explicit step
outcomes (steps.audit.outcome == 'success' rather than success()) so
a future pre-audit gate cannot accidentally trip the fix step.
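The marker-based reconciliation in the second bullet can be sketched as follows. The marker text comes from the commit message; everything else is an assumption for illustration: the function name, the entry shape (`id`, `attempts` with `outcome`, `pr`, `at`), and the input shape, which is modeled on what `gh pr list --json number,body` returns.

```python
import re
from typing import Any

# Marker format taken from the commit message: <!-- agentic-ci finding=<id> -->
MARKER_RE = re.compile(r"<!--\s*agentic-ci finding=(\S+?)\s*-->")


def backfill_from_open_prs(
    attempted_fixes: list[dict[str, Any]],
    open_prs: list[dict[str, Any]],
    now: str,
) -> list[dict[str, Any]]:
    """Return attempted_fixes plus one entry per open PR that has no record.

    Hypothetical shapes: open_prs items carry "number" and "body";
    new entries get a single attempt with outcome "open".
    """
    result = list(attempted_fixes)
    known_ids = {entry["id"] for entry in result}
    for pr in open_prs:
        match = MARKER_RE.search(pr.get("body") or "")
        if match is None:
            continue  # not an agentic-ci PR, nothing to reconcile
        finding_id = match.group(1)
        if finding_id in known_ids:
            continue  # already tracked
        result.append(
            {
                "id": finding_id,
                "attempts": [{"outcome": "open", "pr": pr["number"], "at": now}],
            }
        )
        known_ids.add(finding_id)
    return result
```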
…ording)
- Bump daily-suite job timeout from 20 to 40 minutes. The split into
two sequential `claude --max-turns 50` invocations can saturate a
20-minute budget; a mid-fix SIGTERM would leave an orphaned branch
and inconsistent runner-state.
- Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids
rebuilding fix_backlog from scratch but does NOT override the
per-candidate re-verification step required by _fix-policy.md step
4.1 (re-grep / re-read the specific file the candidate points at).
Single-candidate re-verification is required; whole-codebase
re-scanning is forbidden.
- Guard `jq capture()` with a `test()` select. `capture()` errors on
non-match instead of returning empty, which would truncate
SEEN_PARTS if any unrelated today-dated bot comment lacks the triage
marker (e.g. from a sibling workflow). Adding the test() guard
ensures capture() only runs on bodies that already match.
- Iterate the MISSING[] array when posting fallback parts, not the
full PARTS[] array. Posting all parts when only some were missing
was creating duplicate comments for the parts the agent already
successfully posted.
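The identity-based check and the "only post what is missing" loop can be illustrated in Python. The exact marker syntax is not shown in this PR text, so the `<!-- agentic-ci-triage part=i/N -->` pattern below is a hypothetical stand-in, as are the function and variable names. Skipping a non-matching body plays the role of the `test()` guard in front of `capture()`.

```python
import re

# Hypothetical marker; the real one is defined in the triage recipe.
PART_RE = re.compile(r"<!--\s*agentic-ci-triage part=(\d+)/(\d+)\s*-->")


def missing_parts(comment_bodies: list[str], expected_parts: int) -> list[int]:
    """Return the part numbers (1-based) that no comment has posted yet."""
    seen: set[int] = set()
    for body in comment_bodies:
        match = PART_RE.search(body)
        if match is None:
            # Unrelated bot comment: skip it rather than failing the scan.
            continue
        index, total = int(match.group(1)), int(match.group(2))
        if total == expected_parts and 1 <= index <= total:
            seen.add(index)
    # A duplicate of part 1 adds nothing to `seen`, so it cannot hide part 2.
    return [i for i in range(1, expected_parts + 1) if i not in seen]
```

A fallback step would then post only the returned part numbers, which avoids duplicating parts the agent already posted.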
Address the five Warnings from the 2026-05-07 review focused on the
trust boundary for autonomous PR generation. Five workflow/policy
adjustments shrink the surface where agent compliance is load-bearing:
- Workflow-level scope gate. After the fix step, re-derive the diff
against `origin/main` and validate against the per-suite path
allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and
the 3-file cap. On violation, close the PR with `--delete-branch`
and flip the `attempted_fixes` entry from `open` to `abandoned` so
two-strike logic still sees the failure. The recipe alone could not
bind the agent's path choices; the workflow now does.
- Dependencies install-dev verification. For the dependencies suite
only, re-run `make install-dev` after the scope gate so the agent's
pyproject edit is exercised against the lockfile resolver. Closes
the PR if `install-dev` fails — catches the failure mode where the
per-package test target passed against the old cached lockfile.
- Flip matrix-job `cancel-in-progress` from true to false. A
cancellation between the agent's git push and `gh pr create` would
leave an orphaned branch with no `attempted_fixes` record;
reconciliation only covers PRs that were opened. Queueing a
duplicate run is the lesser evil. `_fix-policy.md` Atomicity
section now documents the trade-off.
- Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not
modify outside `{{memory_path}}/`" directive. A literal-minded
agent could refuse to write the report file, which would break the
job summary, artifact upload, and the fix phase's audit context.
- Always upload the agent log artifact (was `if: failure()` only) and
include `runner-state.json`. For autonomous mode, the most
interesting failure is "the workflow succeeded but the PR was
wrong"; the stream-json log is the only way to look back days
later.
Also takes johnnygreco's Suggestion 2: spell out in the policy doc
that the `draft_until_proven` flip is the sole human-gated
promotion step in the fix policy and must not be automated.
Greptile and the github-actions auto-reviewer's findings were
already closed in the prior pass-2/pass-3 commits; no action needed
on those.
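The workflow-level scope gate from the first bullet of this commit can be sketched as a pure function over the diff summary. The 50-LOC and 3-file caps come from the commit message; the allowlist regex in the usage note, the `FileChange` type, and counting LOC as added plus deleted lines are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

MAX_FILES = 3  # file cap from the commit message
MAX_LOC = 50   # LOC cap from the commit message


@dataclass(frozen=True)
class FileChange:
    """One changed file, e.g. parsed from `git diff --numstat` output."""
    path: str
    added: int
    deleted: int


def scope_violations(
    changes: list[FileChange],
    allowlist: str,
    max_files: int = MAX_FILES,
    max_loc: int = MAX_LOC,
) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    pattern = re.compile(allowlist)
    violations = [
        f"path outside allowlist: {change.path}"
        for change in changes
        if not pattern.fullmatch(change.path)
    ]
    if len(changes) > max_files:
        violations.append(f"{len(changes)} files changed (cap {max_files})")
    # Assumption: the LOC cap counts added and deleted lines together.
    loc = sum(change.added + change.deleted for change in changes)
    if loc > max_loc:
        violations.append(f"{loc} lines changed (cap {max_loc})")
    return violations
```

For example, with a hypothetical allowlist of `docs/.+\.md|README\.md`, a diff touching `src/app.py` would be reported as a path violation, and the caller would close the PR and flip the entry to `abandoned`.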
91e8749 to
23829fb
Codex flagged five issues in the prior commit's scope/lockfile gates.
This commit closes all five:
- HIGH: Wrong-PR targeting. Both gates selected the last globally-open
attempted_fixes entry, which could match a stale orphan from a
prior crashed run rather than the PR opened by *this* run. Adds a
pre-fix snapshot step that captures `(id, attempts-length)` pairs
before the fix runs, and changes the post-fix selectors to require
that the entry's attempts count grew during this run.
- HIGH: Docstring-only enforcement gap on the docs-and-references
suite. The .py path allowlist was at workflow level but the
docstring-only caveat was still policy-only. Adds an AST-based
check: for each .py file changed, parse the post-change tree,
collect docstring line ranges (module/class/function), then verify
every added line in the diff is either inside a docstring, a
comment, or whitespace. Verified locally with both pass and fail
fixtures.
- MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather
than `origin/main...origin/$BRANCH`, so a misbehaving agent that
left HEAD pointing elsewhere would have validated the wrong tree.
Now fetches `origin/$BRANCH` first and prefers that ref. Falls
back to HEAD only if fetch fails (with a warning).
- MEDIUM: FILE_COUNT bug. `grep -c '.' || echo 0` produced "0\n0" on
empty diff, breaking the downstream integer comparison. Replaces
with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct
for any input including empty.
- LOW: Non-atomic JSON writes. The runner-state mutations could leave
the file half-written if the workflow was cancelled mid-write.
Switches both gates to the temp-file + os.replace pattern.
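The temp-file + `os.replace` pattern named in the last bullet looks roughly like this. It is a generic sketch of the technique, not the code in the workflow; the function name is hypothetical.

```python
import json
import os
import tempfile
from typing import Any


def write_json_atomic(path: str, data: Any) -> None:
    """Write JSON so readers see either the old file or the new one, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so os.replace stays
    # a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.write("\n")
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap of the directory entry
    except BaseException:
        # Do not leave a stray temp file behind on failure or cancellation.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process is killed before `os.replace`, the original `runner-state.json` is untouched; if it is killed after, the new content is fully in place.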
Also: dependencies-lockfile gate now does an explicit
`git checkout --detach origin/$BRANCH` before re-running install-dev,
so verification runs against what was actually pushed rather than
relying on local working-tree state.
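The AST-based docstring-only check from this commit (collect docstring line ranges, then require every added line to be docstring, comment, or whitespace) can be sketched with the standard `ast` module. Function names are hypothetical, and the caller is assumed to supply the post-change source plus the 1-based line numbers the diff added; only whole-line comments count as comments here.

```python
import ast
from typing import Iterable

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source: str) -> set[int]:
    """Return the 1-based line numbers covered by module/class/function docstrings."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, _DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string constant as the first statement.
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


def non_docstring_additions(source: str, added: Iterable[int]) -> list[int]:
    """Return added line numbers that are neither docstring, comment, nor blank."""
    allowed = docstring_lines(source)
    text = source.splitlines()
    offending = []
    for number in added:
        stripped = text[number - 1].strip()
        if number in allowed or not stripped or stripped.startswith("#"):
            continue
        offending.append(number)
    return offending
```

A gate built on this would fail the run whenever `non_docstring_additions` returns a non-empty list for any changed `.py` file.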
Greptile review on 872d561 flagged that the fix step's custom `if:` expression bypasses GitHub Actions' implicit success() check. Without explicitly referencing steps.snapshot.outcome, a snapshot failure (corrupt runner-state, disk error) would let the fix step run anyway. The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json` would then exit non-zero on the missing file, leave OPEN empty, and hit the "nothing to validate" early-exit — silently approving whatever the agent pushed. Adds steps.snapshot.outcome == 'success' to both the fix step's condition (the actual fix) and the scope_gate step's condition (belt-and-suspenders against future refactors).
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
    OPEN=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json '
      (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p
      | .attempted_fixes // []
      | map(select(
          ((.attempts | last | .outcome) == "open")
          and ((.attempts | length) > ($p[.id] // 0))
        ))
      | last // empty
    ' .agentic-ci-state/runner-state.json)
**Orphaned-PR scope gate bypass via `| last` when reconciliation grows multiple entries**
The fix step runs reconciliation first (policy §Standard fix procedure step 1), which back-fills `attempted_fixes` entries for any PRs opened in crashed prior runs. Each back-filled entry increments from 0 to 1 relative to the pre-fix snapshot, the same "grew" signal as the new entry opened in this run. When at least one orphaned PR is reconciled AND a new PR is opened, two or more entries satisfy the `(length > $p[.id] // 0)` predicate. `| last // empty` picks only the last entry (the newly opened one), and the reconciled orphan, which was never scope-gated in its original crashed run, escapes validation permanently. A PR that violates the path allowlist or LOC cap could stay open indefinitely.
The same issue is duplicated in `lockfile_gate` at line 1331.
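One way to address this comment is to validate every entry that grew during the run instead of only the last one. The sketch below mirrors the jq predicate in Python to show the idea; it is not the author's fix, and the function name is hypothetical. The `id`/`n` snapshot shape and the `attempts[].outcome` field follow the quoted jq expression.

```python
from typing import Any


def entries_grown_this_run(
    state: dict[str, Any], prior: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return ALL open entries whose attempts count grew since the snapshot.

    `prior` is the pre-fix snapshot: a list of {"id": ..., "n": ...} pairs.
    """
    prior_counts = {item["id"]: item["n"] for item in prior}
    grown = []
    for entry in state.get("attempted_fixes") or []:
        attempts = entry.get("attempts") or []
        if not attempts or attempts[-1].get("outcome") != "open":
            continue
        if len(attempts) > prior_counts.get(entry["id"], 0):
            # Includes both the newly opened PR and any reconciled orphans.
            grown.append(entry)
    return grown
```

The gate would then loop over the returned list and run the allowlist and LOC checks for each entry, so a back-filled orphan gets validated in the same run that records it.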
📋 Summary
Reorganize the weekly issue-triage report around recommended actions so each flagged item is decision-ready, and flip four of the five daily audit suites from read-only reports to opening one PR per run for the most important localized fix. The shared procedure (selection, ranking, allowlists, attempted-fixes lifecycle, two-strike escalation, branch/PR conventions) lives in a single `_fix-policy.md`; suite recipes declare only their eligible categories.
🔗 Related Issue
N/A — extends the agentic-CI work tracked in plan 472. We can link a follow-up tracking issue once one is opened.
🔄 Changes
- `.agents/recipes/_fix-policy.md`: universal localized-fix bar (≤3 files, ≤50 LOC, reversible, self-evident, test-safe, single-concern), per-suite path/command allowlists, finding-hash spec for stable cross-run identification, `fix_backlog`/`attempted_fixes` schema, ranking criteria (confidence > severity > impact > recency), draft-PR mode, two-strike escalation, standard fix procedure, direct `gh pr create --body-file` PR-creation pattern.
- `.agents/recipes/_phase-audit.md` and `.agents/recipes/_phase-fix.md`: phase directives prepended to each `claude` invocation so each call knows which phase it executes.
- `.agents/recipes/issue-triage/recipe.md`: action-organized buckets (`Close as resolved`, `Close as duplicate`, `Needs maintainer decision`, `Ready for assignment`, `Stuck PR`, `Duplicate PRs`, `Stale, consider closing`), per-row Action / Evidence / Rationale columns, healthy items collapsed to count + `<details>`, multi-comment split with `:i/N` markers and orphan reconciliation, plus a `Repeatedly-failed fix attempts` section that surfaces two-strike findings from the daily suites.
- `_fix-policy.md`):
  - `docs-and-references`: broken-link, docstring-drift (signature-driven), arch-ref-rename. Non-draft.
  - `structure`: missing-future, lazy-import. Non-draft. Dead exports stay report-only.
  - `dependencies`: transitive-gap, unused. Non-draft.
  - `code-quality`: bare-except narrowing only. Draft PRs until `draft_until_proven` is flipped after two non-draft PRs land clean. TODO-line deletion explicitly forbidden.
  - `test-health`: explicit "no fix phase, all categories report-only" with a future-candidate note for test-isolation violations.
- `.github/workflows/agentic-ci-daily.yml`: split the recipe step into two `claude` invocations (audit, then conditionally fix), each with the existing `--max-turns 50`. Fix step gated on audit success AND `matrix.suite != 'test-health'` AND non-empty `fix_backlog`. Adds a git-identity step, expands artifact upload to include the fix log + PR body, and reports both phase outcomes in the job summary. No branch auto-deletion.
- `_runner.md` updates: generalized branch prefix beyond `chore`, documented why CI uses `gh pr create --body-file` instead of `/create-pr`.
🔍 Attention Areas
- `.agents/recipes/_fix-policy.md`: load-bearing contract. Allowlists, ranking, two-strike escalation, and the standard fix procedure all live here. Worth reviewing in full.
- `.github/workflows/agentic-ci-daily.yml`: the audit/fix split, the backlog gate (`fromJSON(steps.backlog.outputs.size || '0') > 0`), and the `matrix.suite != 'test-health'` guard.
- `agentic-ci`, `agentic-ci/docs-and-references`, `agentic-ci/structure`, `agentic-ci/dependencies`, `agentic-ci/code-quality`. The recipes apply these via `gh pr edit --add-label`; missing labels won't block the PR opening but will surface a `gh` warning.
🧪 Testing
- `make test`: N/A. No Python or other source code changed; the diff is recipes (markdown), workflow YAML, and one new policy doc.
- `python3 -c "import yaml; yaml.safe_load(...)"`)
- `workflow_dispatch` one at a time and read the first 1–2 actual runs before promoting to the next, per the validation section of the plan. Two-strike escalation, allowlist enforcement, and re-attempt blocking are all observable from the runner state and PR list.
✅ Checklist
- `_fix-policy.md` itself is the architecture doc for the fix phase.