From e8bf5891b5a01ca53ff685d51a87a67bbf12c7a8 Mon Sep 17 00:00:00 2001 From: Andre Manoel Date: Mon, 4 May 2026 16:32:52 +0000 Subject: [PATCH 01/12] feat(agentic-ci): decision-ready triage and daily PR fixes Reorganize the weekly issue-triage report around recommended actions (close as resolved, close as duplicate, needs maintainer decision, ready for assignment, stuck PR, duplicate PRs, stale) so each flagged item carries action + evidence + rationale and can be resolved without opening it. Multi-comment split with i/N markers and orphan reconciliation when the report grows or shrinks. Flip the four daily audit suites with mechanical fix categories from read-only reports to opening one PR per run: - docs-and-references: broken-link, docstring-drift, arch-ref-rename - structure: missing-future, lazy-import - dependencies: transitive-gap, unused - code-quality: bare-except (draft until landing rate proven) test-health stays report-only (all candidates require inferring intent). The shared procedure - fix_backlog selection, finding-hash spec for stable cross-run identification, attempted_fixes lifecycle with two-strike escalation, allowlists, ranking, branch/PR conventions - lives in .agents/recipes/_fix-policy.md. Each suite recipe declares only its eligible categories, branch types, and test requirements. Workflow runs claude twice per suite (audit, then conditionally fix), each capped at the existing --max-turns 50. Fix call is gated on non-empty fix_backlog and skipped entirely for test-health. 
---
 .agents/recipes/_fix-policy.md                | 204 ++++++++++++++++
 .agents/recipes/_phase-audit.md               |  17 ++
 .agents/recipes/_phase-fix.md                 |  23 ++
 .agents/recipes/_runner.md                    |  14 +-
 .agents/recipes/code-quality/recipe.md        |  53 +++-
 .agents/recipes/dependencies/recipe.md        |  38 ++-
 .agents/recipes/docs-and-references/recipe.md |  28 ++-
 .agents/recipes/issue-triage/recipe.md        | 231 ++++++++++--------
 .agents/recipes/structure/recipe.md           |  28 ++-
 .agents/recipes/test-health/recipe.md         |  22 +-
 .github/workflows/agentic-ci-daily.yml        |  67 ++++-
 11 files changed, 607 insertions(+), 118 deletions(-)
 create mode 100644 .agents/recipes/_fix-policy.md
 create mode 100644 .agents/recipes/_phase-audit.md
 create mode 100644 .agents/recipes/_phase-fix.md

diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md
new file mode 100644
index 000000000..963824cab
--- /dev/null
+++ b/.agents/recipes/_fix-policy.md
@@ -0,0 +1,204 @@
+# Agentic CI Fix Policy
+
+Prepended to every daily-suite recipe alongside `_runner.md`. Defines what
+"open a PR" means for these recipes and the rules that apply across all of
+them. Each suite recipe declares only its eligible finding categories, its
+branch types, and any risk-specific notes — everything else is here.
+
+When in doubt, fall back to report-only.
+
+## Localized fix bar
+
+A finding may be converted to a fix only if all hold:
+
+- **Bounded scope**: ≤3 files, ≤50 LOC net.
+- **Reversible**: no public API changes, no `__all__` deletions, no version
+  bumps (Dependabot owns those), no schema changes, no migrations.
+- **Self-evident**: the audit established both the problem *and* the unique
+  correct fix. Mechanical, not interpretive.
+- **Test-safe**: when the recipe declares `test_required`, run
+  `make test-<package>` for the affected package and abort on failure.
+- **Single concern**: one finding per PR.
+- **Allowlisted paths**: matches the suite's path allowlist.
+
+If the top-ranked candidate fails the bar, try the next.
If none of the top
+5 qualify, skip the fix step and emit report-only.
+
+## Allowlists
+
+### Per-suite path allowlist
+
+| Suite | Paths the recipe MAY modify |
+|-------|-----------------------------|
+| docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) |
+| dependencies | `packages/*/pyproject.toml` |
+| structure | `packages/*/src/**/*.py` |
+| code-quality | `packages/*/src/**/*.py` |
+| test-health | (no fix phase) |
+
+### Shared forbidden paths (all suites)
+
+- `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`,
+  `.git/**`, anything in `.gitignore`.
+
+### Shared forbidden commands
+
+- `git push --force` (any variant), `git rebase`, `git reset --hard`,
+  `git branch -D`/`-d`/`--delete`.
+- `gh pr merge`, `gh pr close`, `gh pr review`.
+- `pip install`, `uv pip install` (use `make install-dev` only).
+
+## Runner-state schema
+
+Each daily recipe maintains two arrays in
+`{{memory_path}}/runner-state.json` beyond the existing `known_issues` /
+`baselines`:
+
+```json
+{
+  "fix_backlog": [
+    { "id": "<finding-hash>", "category": "...", "first_seen": "YYYY-MM-DD",
+      "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } }
+  ],
+  "attempted_fixes": [
+    { "id": "<finding-hash>", "attempts": [
+      { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD",
+        "branch": "agentic-ci/..." }
+    ] }
+  ]
+}
+```
+
+Also: `draft_until_proven` (boolean, per-suite, default `true` for
+code-quality and unset elsewhere) controls draft-PR mode.
+
+### `fix_backlog` rules (audit phase populates this)
+
+- Append every detected finding in an eligible category. Update `last_seen`
+  if `id` already present.
+- Drop entries with `last_seen` older than 30 days.
+- Cap at 200 entries (drop oldest by `first_seen`).
+- Populated **before** the `known_issues` filter so fixable findings persist
+  even when their report row is suppressed for being unchanged.
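The `fix_backlog` rules above reduce to a small state update. A sketch under stated assumptions — entries are plain dicts matching the schema, and `update_fix_backlog` is a hypothetical helper name (the real recipe edits `{{memory_path}}/runner-state.json` in place):

```python
from datetime import date, timedelta

def update_fix_backlog(backlog, findings, today):
    """Apply the fix_backlog rules: refresh last_seen on re-detected
    findings, append new ones, drop entries unseen for 30 days, and
    cap the array at 200 entries (oldest by first_seen dropped)."""
    by_id = {entry["id"]: entry for entry in backlog}
    for finding in findings:  # each carries id, category, data
        entry = by_id.get(finding["id"])
        if entry is not None:
            entry["last_seen"] = today.isoformat()  # keep first_seen
        else:
            by_id[finding["id"]] = {**finding,
                                    "first_seen": today.isoformat(),
                                    "last_seen": today.isoformat()}
    # ISO dates compare correctly as strings.
    cutoff = (today - timedelta(days=30)).isoformat()
    kept = [e for e in by_id.values() if e["last_seen"] >= cutoff]
    kept.sort(key=lambda e: e["first_seen"])
    return kept[-200:]  # cap at 200, dropping the oldest by first_seen
```

Because the update is keyed on the finding hash, re-running it with the same findings is idempotent.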
+
+### `attempted_fixes` rules
+
+`outcome` ∈ `{open, merged, closed, abandoned}`.
+
+- `abandoned` means the recipe could not produce a PR (tests failed,
+  conflict, lint failed, allowlist rejected, etc.).
+- Reconcile against open PRs (`gh pr list`) at the start of each fix run
+  to recover from crashes that left state un-updated.
+- Prune: drop `merged` >90d, drop single `closed`/`abandoned` >30d.
+- Two-strike entries (≥2 `closed`/`abandoned`) are NOT pruned; they
+  surface in the report under `Repeatedly-failed fix attempts`.
+
+## Finding hash
+
+`finding_id = sha1(suite + ":" + canonical_key)[:12]`, where
+`canonical_key` uses durable identifiers only — never line numbers or free
+text:
+
+| Suite (category) | canonical_key |
+|------------------|---------------|
+| docs (broken-link) | `<file>:<link-target>` |
+| docs (docstring-drift) | `<file>:<symbol>:<section>:<param>` |
+| docs (arch-ref-rename) | `<file>:<old-symbol>` |
+| dependencies (transitive-gap) | `<package>:<module>:transitive` |
+| dependencies (unused) | `<package>:<dep>:unused` |
+| structure (missing-future) | `<file>:missing-future` |
+| structure (lazy-import) | `<file>:lazy-import:<module>` |
+| code-quality (bare-except) | `<file>:<symbol>:bare-except` |
+
+Symbols use fully-qualified Python names.
+
+## Ranking
+
+Earlier criteria override later ones:
+
+1. **Fix confidence** (per-category):
+
+   | Category | Confidence |
+   |----------|-----------|
+   | structure / missing-future | 1.0 |
+   | structure / lazy-import | 0.9 |
+   | docs / broken-link | 0.9 |
+   | dependencies / transitive-gap | 0.85 |
+   | docs / arch-ref-rename | 0.8 |
+   | dependencies / unused | 0.75 |
+   | docs / docstring-drift | 0.75 |
+   | code-quality / bare-except | 0.6 |
+
+2. **Defect severity**:
+
+   | Severity | Examples |
+   |----------|----------|
+   | high | missing transitive dep, heavy import bypassing lazy system |
+   | medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API |
+   | low | broken link in dev-notes, missing `from __future__ import annotations`, unused dep |
+
+3.
**User-facing impact** — visible to docs-site readers or plugin
+   consumers vs internal-only.
+
+4. **Recency** — newer findings rank above long-standing ones.
+
+Record the chosen finding's id, scores, and rationale at the top of
+`/tmp/audit-{{suite}}.md`.
+
+## Standard fix procedure
+
+The fix phase of every eligible recipe follows these steps. Suite recipes
+declare only the parts that vary (eligible categories, branch type,
+`test_required`, suite-specific quirks).
+
+1. Reconcile `attempted_fixes` against open PRs (`gh pr list`) to recover
+   any state lost to a prior crash.
+2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or
+   `merged`; surface two-strike entries in the report's
+   `Repeatedly-failed fix attempts` section and drop them from selection.
+3. Rank the remainder per the Ranking section.
+4. For each candidate (top 5 max):
+   1. Re-verify the finding still applies (re-grep / re-read). If not,
+      remove from `fix_backlog` and continue.
+   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
+      a non-allowlisted path, abandon and continue.
+   3. If the category sets `test_required: true`, run
+      `make test-<package>` for the package containing the change. On
+      failure: abandon and continue.
+   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
+      `<type>(agentic-ci): <subject>`. Push.
+   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
+      hidden metadata block:
+      ``
+   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
+      iff `draft_until_proven` is true for the suite.
+   7. `gh pr edit --add-label agentic-ci --add-label agentic-ci/<suite>`.
+   8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
+5. If all 5 candidates were abandoned, append a one-line note to the
+   report and exit cleanly. The state already reflects the abandonments.
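The finding-hash spec and the filter/rank steps above can be sketched together. A minimal illustration, not the recipe's implementation: `finding_id` and `select_candidates` are hypothetical names, and ranking is reduced to the confidence-then-recency criteria (severity and impact omitted):

```python
import hashlib

def finding_id(suite, canonical_key):
    """finding_id = sha1(suite + ":" + canonical_key)[:12]."""
    return hashlib.sha1(f"{suite}:{canonical_key}".encode()).hexdigest()[:12]

def select_candidates(fix_backlog, attempted_fixes, confidence, limit=5):
    """Drop open/merged and two-strike findings, then rank by
    per-category fix confidence with recency as the tie-breaker."""
    attempts = {a["id"]: a["attempts"] for a in attempted_fixes}

    def latest_outcome(fid):
        prior = attempts.get(fid)
        return prior[-1]["outcome"] if prior else None

    def strikes(fid):
        return sum(1 for a in attempts.get(fid, ())
                   if a["outcome"] in ("closed", "abandoned"))

    eligible = [f for f in fix_backlog
                if latest_outcome(f["id"]) not in ("open", "merged")
                and strikes(f["id"]) < 2]  # two-strikers go to the report
    eligible.sort(key=lambda f: (confidence.get(f["category"], 0.0),
                                 f["last_seen"]),
                  reverse=True)
    return eligible[:limit]
```

Because the hash is derived from durable identifiers only, the same finding gets the same id on every run, which is what lets `attempted_fixes` accumulate history across days.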
+
+On any failure mid-flow: record `outcome: "abandoned"` for the chosen
+finding (with `pr_number: null`), leave any pushed branch in place
+(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
+to the next candidate.
+
+## PR conventions
+
+- **Use `gh pr create --body-file`**, not `/create-pr`. The skill is
+  interactive-only and shells the body inline; CI needs determinism.
+- **Title**: conventional, `<type>(agentic-ci): <subject>`.
+- **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
+- **Draft PRs**: `code-quality` opens draft until a maintainer flips
+  `draft_until_proven` to `false` in runner-state, after at least two
+  non-draft PRs from that suite have landed clean.
+
+## Atomicity
+
+Each fix-phase invocation produces exactly one of:
+
+- **Report-only** — runner-state updated; no branch, commit, or PR.
+- **Report + PR** — same, plus a pushed branch, a commit, and a PR. The
+  `attempted_fixes` entry is recorded *before* the recipe exits.
+
+No half-states. The runner state is the source of truth for what the
+recipe has tried; never silently drop a failed attempt.
diff --git a/.agents/recipes/_phase-audit.md b/.agents/recipes/_phase-audit.md
new file mode 100644
index 000000000..5caefa0c8
--- /dev/null
+++ b/.agents/recipes/_phase-audit.md
@@ -0,0 +1,17 @@
+## Phase directive
+
+This invocation runs the **AUDIT** phase only.
+
+- Execute the audit steps from the recipe and write the report to
+  `/tmp/audit-{{suite}}.md`.
+- Update `{{memory_path}}/runner-state.json` with detected findings,
+  including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE
+  applying the `known_issues` filter to the report, so fixable findings
+  persist across runs even when their report row is suppressed).
+- Do NOT attempt any fix. Do NOT create any branches, commits, or PRs.
+- Do NOT modify any files outside `{{memory_path}}/`.
+- A separate invocation will run the FIX phase if `fix_backlog` has
+  eligible candidates and the suite has a fix phase.
+- Read the recipe in full for context; the "Fix phase" section informs + which finding categories should populate `fix_backlog`, but you must + not act on them in this invocation. diff --git a/.agents/recipes/_phase-fix.md b/.agents/recipes/_phase-fix.md new file mode 100644 index 000000000..48bf77a09 --- /dev/null +++ b/.agents/recipes/_phase-fix.md @@ -0,0 +1,23 @@ +## Phase directive + +This invocation runs the **FIX** phase only. + +- The audit phase has already completed in a previous invocation. Its + report is at `/tmp/audit-{{suite}}.md` and + `{{memory_path}}/runner-state.json` has the populated `fix_backlog`. +- Execute only the recipe's "Fix phase" section per `_fix-policy.md`. + Do NOT redo audit work; do NOT re-scan the codebase to rebuild + findings. +- Pick the highest-ranked eligible candidate from `fix_backlog`, apply + the fix, run the package's tests if applicable, commit, push, and open + the PR using `gh pr create --body-file`. +- Record the attempt in `attempted_fixes` (whether successful, abandoned, + or failed through the top-5 fallback) before exiting. +- If no candidate qualifies after trying up to 5 of them, exit cleanly, + append a short note to `/tmp/audit-{{suite}}.md` describing what was + tried, and update `attempted_fixes` accordingly. Do NOT open a PR. +- Do NOT delete branches, even on failure (per `_runner.md` and + `_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow + to reap over time. +- Read the recipe in full for context, but treat the audit phase as + already done. diff --git a/.agents/recipes/_runner.md b/.agents/recipes/_runner.md index 98081279b..9705633d5 100644 --- a/.agents/recipes/_runner.md +++ b/.agents/recipes/_runner.md @@ -76,6 +76,14 @@ Write all output to a temp file (e.g., `/tmp/recipe-output.md`). The workflow will handle posting it. Do not post directly to GitHub - the workflow controls output routing. 
-If your recipe produces code changes, commit them on a new branch and use
-`/create-pr` to open a pull request. The branch name should follow the
-pattern `agentic-ci/chore/{suite}-YYYYMMDD`.
+If your recipe produces code changes, commit them on a new branch following
+the pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}` where `{type}`
+matches the change kind (`chore`/`docs`/`fix`/`refactor`).
+
+For PR creation in CI, use `gh pr create --body-file /tmp/pr-body-<suite>.md`
+directly rather than the `/create-pr` skill. The skill assumes an interactive
+session (it can prompt about uncommitted changes, base branch, etc.) and
+shells the body inline, which breaks on backticks and special characters.
+Daily-suite recipes that open PRs are governed by `_fix-policy.md` — read it
+for the full PR contract (allowlists, draft mode, hidden metadata, branch
+naming, atomicity).
diff --git a/.agents/recipes/code-quality/recipe.md b/.agents/recipes/code-quality/recipe.md
index ab8fcb793..302a8f5ad 100644
--- a/.agents/recipes/code-quality/recipe.md
+++ b/.agents/recipes/code-quality/recipe.md
@@ -35,6 +35,15 @@ Read `{{memory_path}}/runner-state.json` for baselines from previous runs
 re-reporting known issues. Flag metrics that are trending in the wrong
 direction compared to the previous baseline.
 
+This recipe also maintains `fix_backlog` and `attempted_fixes` per
+`_fix-policy.md`. Update `fix_backlog` for every detected bare-except
+finding *before* the `known_issues` filter applies. (Other categories
+remain report-only and do not enter `fix_backlog`.)
+
+The `draft_until_proven` flag in runner-state controls whether this
+suite's PRs are opened as draft. Default `true` until a maintainer flips
+it to `false`.
+
 ## Instructions
 
 ### 1. Complexity hotspots
@@ -238,9 +247,51 @@ Write the report to `/tmp/audit-{{suite}}.md`:
 
 If no findings in any category, write `NO_FINDINGS` on the first line
 instead.
 
+## Fix phase
+
+Follow the standard fix procedure in `_fix-policy.md`.
Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| bare-except | `refactor` | yes | Replace `except:` / `except BaseException:` with the specific exception type. Eligible only when grep across the try-block confirms **exactly one** exception type is plausibly raised, verified by inspecting the called functions or imported library docs. Multiple plausible types → ineligible. Test files are excluded (different exception-handling standards). | + +`fix_backlog.data` should record the proposed replacement exception type +and the grep evidence used to determine it. Within bare-except findings, +prefer ones in user-facing modules (`packages/data-designer/src/`) over +internal helpers (the ranking impact criterion handles this once +`data.user_facing` is set). + +The PR body should include the before/after of the try-block plus the +grep evidence that justified the chosen exception type, and a note that +the PR is draft until landing rate is proven (ask reviewers to mark +ready-for-review if the change is correct). + +**Draft mode**: this suite opens PRs as draft until a maintainer flips +`draft_until_proven` to `false` in runner-state, after at least two +non-draft PRs have landed clean. Bare-except narrowing is the most +inference-heavy fix in any suite (confidence 0.6); recipe judgement has +to be earned before promotion. Two-strike findings here are an +especially important signal — they suggest the detector is producing +false positives in an already-cautious category. + +**Not eligible** — stays report-only: + +- Complexity refactors, type annotation additions, exception hierarchy + normalization (judgement-heavy). +- **TODO line deletion** — the audit's "looks done" judgement is not + mechanical enough to delete code on. Deletion is forbidden. + ## Constraints -- Do not modify any files. This is a read-only audit. 
+- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify paths in the suite's path allowlist + (`packages/*/src/**/*.py`). Test files are excluded. +- **TODO line deletion is forbidden.** The audit phase still inventories + TODOs, but the fix phase does not act on them. +- Bare-except narrowing is only eligible when the exception type is + unambiguous. When in doubt, skip. - Do not flag test files for type coverage or exception hygiene. Tests have different standards. - Do not duplicate ruff checks (W, F, I, ICN, PIE, TID, UP*). Those are diff --git a/.agents/recipes/dependencies/recipe.md b/.agents/recipes/dependencies/recipe.md index 074e23857..deedf742b 100644 --- a/.agents/recipes/dependencies/recipe.md +++ b/.agents/recipes/dependencies/recipe.md @@ -26,6 +26,10 @@ dependency versions. After the audit, update `known_issues` and `baselines.dependency_versions` with the current state. Skip reporting issues that already appear in `known_issues`. +This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected finding *before* +the `known_issues` filter applies. + ## Instructions ### 1. Inventory current dependencies @@ -154,12 +158,40 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| transitive-gap | `chore` | yes | Add the imported module to `[project.dependencies]` of the package that imports it. Use the version specifier from a package that already declares it; otherwise the latest stable specifier. Insert in alphabetical order; match existing quote/specifier style. | +| unused | `chore` | yes | Remove the declaration. 
Eligible only when grep across the package's `src/`, lazy-import system, plugin entry points, and tests turns up zero references. |
+
+`fix_backlog.data` should record: for transitive-gap, the importing source
+files and proposed version specifier; for unused, which other packages
+also declare the dep.
+
+Before running `make test-<package>`, run `make install-dev` to confirm
+the lockfile resolves cleanly. `make install-dev` is the only sanctioned
+install command (no direct `pip install` or `uv pip install`).
+
+**Not eligible** — stays report-only:
+
+- Cross-package version reconciliation, version pinning concerns
+  (judgement-heavy).
+- CVE response (Dependabot's job).
+
 ## Constraints
 
-- Do not modify any files. This is a read-only audit.
-- Do not install packages or run `pip install`. Only inspect `pyproject.toml`
-  and source files.
+- Outside the fix phase, this recipe is read-only — do not modify files.
+- Within the fix phase, only modify `packages/*/pyproject.toml`. The
+  repo-root `pyproject.toml` is forbidden.
+- `make install-dev` is the only sanctioned install command. Do not
+  invoke `pip install` or `uv pip install` directly.
 - Do not run `pip audit` (may not be available on the runner). Focus on
   structural dependency analysis, not CVE scanning (Dependabot handles that).
 - Do not recommend changes to dependencies you haven't verified are actually
   problematic. False positives erode trust in the audit.
+- Version pinning changes are explicitly out of scope for the fix phase.
diff --git a/.agents/recipes/docs-and-references/recipe.md b/.agents/recipes/docs-and-references/recipe.md
index f8a7073dc..45a54b324 100644
--- a/.agents/recipes/docs-and-references/recipe.md
+++ b/.agents/recipes/docs-and-references/recipe.md
@@ -26,6 +26,11 @@ After completing the audit, update the file with any new
 findings (add to `known_issues` array with a short hash of the finding).
 Skip reporting issues that already appear in `known_issues`.
+This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected finding *before* +the `known_issues` filter applies, so fixable findings persist across runs +even when their report row is suppressed for being unchanged. + ## Instructions ### 1. Docstring vs signature drift @@ -147,9 +152,30 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| broken-link | `docs` | no | Only when the corrected target is unambiguous (exact-match file at a different path, or a single similar anchor). Multiple candidates → ineligible. | +| docstring-drift | `docs` | yes | Purely signature-driven `Args:`/`Returns:`/`Raises:` updates. Rename a param to its current name, drop entries for removed params, add placeholder entries for added params (note the signature; do not invent semantic descriptions). | +| arch-ref-rename | `docs` | no | Only when grep confirms the old symbol is gone and exactly one similarly-named new symbol exists at the same role. | + +`fix_backlog.data` should carry whatever the fix step needs without +re-scanning: the proposed target for broken-link, the signature-vs-Args +delta for docstring-drift, the new symbol name for arch-ref-rename. + +All other audit categories (docs-site rewrites, dev-note edits, external +URL breakage) stay report-only. + ## Constraints -- Do not modify any files. This is a read-only audit. +- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify paths in the suite's path allowlist. + See `_fix-policy.md` for the shared command/path baseline. 
- Do not read file contents unless needed to verify a specific reference. Use `grep` and `head` for targeted checks rather than reading entire files. - Skip vendored or generated files. diff --git a/.agents/recipes/issue-triage/recipe.md b/.agents/recipes/issue-triage/recipe.md index 396b0667a..3edc061ae 100644 --- a/.agents/recipes/issue-triage/recipe.md +++ b/.agents/recipes/issue-triage/recipe.md @@ -1,6 +1,6 @@ --- name: issue-triage -description: Weekly triage of open issues and PRs - classify, verify, detect staleness, duplicates, and cross-reference +description: Weekly triage of open issues and PRs - decision-ready report organized by recommended action trigger: schedule tool: claude-code timeout_minutes: 15 @@ -14,7 +14,9 @@ permissions: # Repository Triage Triage all open issues and pull requests in this repository, then post a -combined report to the tracking issue. +decision-ready report to the tracking issue. The report is organized by +**recommended action** so a maintainer can resolve flagged items without +opening each one. ## Instructions @@ -23,164 +25,185 @@ combined report to the tracking issue. 
Collect all open issues, open PRs, and recent merge activity: ```bash -# All open issues with metadata gh issue list --state open --limit 200 \ --json number,title,state,createdAt,updatedAt,labels,assignees,author,body -# All open PRs with metadata gh pr list --state open --limit 200 \ --json number,title,state,createdAt,updatedAt,labels,author,headRefName,body -# Recently merged PRs (last 60 days) to cross-reference gh pr list --state merged --limit 100 \ --json number,title,headRefName,body,mergedAt -# PR check status for open PRs +# Failing-check counts for open PRs for pr in $(gh pr list --state open --json number --jq '.[].number'); do - echo "=== PR #${pr} ===" - gh pr checks "$pr" --json name,state --jq '[.[] | select(.state == "FAILURE" or .state == "ERROR")] | length' + FAILING=$(gh pr checks "$pr" --json name,state \ + --jq '[.[] | select(.state == "FAILURE" or .state == "ERROR")] | length') + echo "${pr} ${FAILING}" done ``` -### 2. Triage issues +### 2. Decide an action for every flagged item -For each open issue, determine: +For each open issue and PR, decide whether it needs maintainer action and, if +so, which **action bucket** it belongs in. Buckets are exclusive — every +flagged item appears under exactly one heading. -**Classification** (pick one): -- `bug` - something is broken -- `feature` - new capability or enhancement -- `chore` - maintenance, CI, docs, refactoring -- `discussion` - needs design input or decision before work starts +Buckets and the criteria for each: -**Staleness** (based on last update, today's date, and activity): -- `active` - updated within the last 14 days -- `aging` - updated 14-30 days ago -- `stale` - no update for 30+ days +| Bucket | Apply when | +|--------|-----------| +| `Close as resolved` | A merged PR closes the issue via `Fixes/Closes/Resolves #N`, OR a merged PR's title/branch/body strongly indicates it addressed the issue. The issue is still open. 
| +| `Close as duplicate` | An older open issue covers the same scope. Pick the older issue as the canonical one. | +| `Needs maintainer decision` | Issue labeled `discussion`, design-input items with no clear scope, or items labeled `needs-attention` (flagged by the stale-PR workflow because their linked PR was auto-closed). | +| `Ready for assignment` | Well-scoped issue, no assignee, no linked open PR, not stale (updated within 30 days). Brief enough that someone could pick it up today. | +| `Stuck PR` | Open PR with one or more failing checks, OR no author activity (push/comment) for 14+ days. | +| `Duplicate PRs` | Two or more open PRs reference the same issue (`Fixes/Closes/Resolves #N`). | +| `Stale, consider closing` | 60+ days since last activity, no assignee, no linked open PR. Older than `Ready for assignment` and without traction. | -**Verification** - check if the issue has been addressed: -- Search merged PRs for closing keywords (`Fixes #N`, `Closes #N`, `Resolves #N`) - referencing this issue -- Search merged PR titles and branches for keywords matching the issue -- If a merged PR appears to fix the issue, flag it as `potentially resolved` -- If there is an open PR linked to the issue, note the PR number +Items that don't fit any of the above are **healthy** — count them but do +not list them in the action sections. -**Labels as signals** - issues with `needs-attention` were flagged by the stale -PR workflow because their linked PR was auto-closed. Always include these in the -"Action needed" section. +Also check `attempted_fixes` in any daily-suite runner-state files (under +`.agentic-ci-state/` if accessible) — findings with two `closed` or +`abandoned` attempts are surfaced in their own section so the maintainer +sees them alongside other action items. (This section may be empty if +those state files are not available in the triage run; that's fine.) -**Duplicates / related** - flag issues that overlap in scope or description. +### 3. 
Build the report -### 3. Triage PRs +Write the report to `/tmp/issue-triage-report.md`. Each part of a +multi-comment report is a separate file: `/tmp/issue-triage-report-1.md`, +`/tmp/issue-triage-report-2.md`, etc. -For each open PR, determine: +Format: -**Health flags** (check all that apply): -- `no-issue` - PR body has no `Fixes/Closes/Resolves #N` reference (external - contributors only - collaborators are exempt) -- `issue-closed` - PR links to an issue that is already closed (by another PR - or manually) -- `checks-failing` - PR has failing CI checks -- `stale` - no author activity (push or comment) for 14+ days with failing - checks -- `duplicate-fix` - another open PR references the same issue +````markdown + +## Repository Triage Report — YYYY-MM-DD -**Cross-reference** - for each PR that references an issue: -- Verify the linked issue exists and is open -- Check if another open or merged PR also references the same issue -- If two open PRs fix the same issue, flag both as `duplicate-fix` +**Open issues:** N | **Open PRs:** N | **Healthy (no action needed):** N -### 4. Build the report +--- -Write the combined report to `/tmp/issue-triage-report.md` using this format: +### Close as resolved (M) -```markdown - -## Repository Triage Report +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #123 | ... | Close | Merged in #456 | PR title says "fix ..." matching issue scope | -**Run date:** YYYY-MM-DD -**Open issues:** N | **Open PRs:** N +### Close as duplicate (M) ---- +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #234 | ... | Close, point to #200 | Overlaps #200 (older) | Both describe the same crash on empty config | -### Issues: action needed +### Needs maintainer decision (M) -Issues that need maintainer attention (potentially resolved, stale with no -assignee, possible duplicates, needs-attention label). 
+| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #345 | ... | Decide direction | `discussion` label, no consensus | Two competing approaches in comments | -| # | Title | Category | Staleness | Flag | Notes | -|---|-------|----------|-----------|------|-------| +### Ready for assignment (M) -### Issues: active work +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #456 | ... | Assign | Scope clear, no assignee | One-line repro, fix likely <50 LOC | -Issues with assignees or linked open PRs. +### Stuck PR (M) -| # | Title | Category | Assignee | PR | Last updated | -|---|-------|----------|----------|-----|-------------| +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #567 | ... | Nudge author or close | 3 failing checks, 21d since push | DCO + lint failing, author hasn't responded | -### Issues: backlog +### Duplicate PRs (M) -Remaining open issues, ordered by staleness (most stale first). +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #678 / #679 | both fix #500 | Pick one, close other | Both reference #500 | #678 has tests, #679 is simpler | -| # | Title | Category | Staleness | Last updated | -|---|-------|----------|-----------|-------------| +### Stale, consider closing (M) ---- +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #789 | ... | Close with note | 87d no activity, no assignee | No traction; linked design discussion went silent | -### PRs: action needed +### Repeatedly-failed fix attempts (M) -PRs with health flags that need maintainer attention. +(Only emit this section if any items qualify. See `_fix-policy.md` — +two-strike escalation.) -| # | Title | Author | Flags | Notes | -|---|-------|--------|-------|-------| +| Finding | Suite | Attempts | Notes | +|---------|-------|----------|-------| +| ... 
| docs-and-references | 2 closed | Detector may be flagging a false positive | -### PRs: healthy +--- -Open PRs with no flags. +
+<details> +<summary>Healthy items (M issues, M PRs)</summary> -| # | Title | Author | Linked issue | Last updated | -|---|-------|--------|-------------|-------------| +(One-line summary of each: `#N — <author> — <last update>`. No +action needed; this block is for completeness.) ---- +</details> ### Summary -**Issues:** -- N triaged, N flagged for action, N active, N backlog -- Flags: X potentially resolved, Y stale, Z duplicates +- N items flagged for action across 7 buckets +- M PRs flagged (X stuck, Y duplicate) +- K healthy items (collapsed above) ```` -**PRs:** -- N triaged, N flagged -- Flags: X no linked issue, Y checks failing, Z stale, W duplicate fixes -``` +The marker on the first line (`<!-- agentic-ci-issue-triage:1/N -->`) is +required. If the report fits in one comment, set N = 1. + +### 4. Multi-comment split + +GitHub issue comments cap at 65,536 characters. Use a 60,000-char per-part +budget to leave room for body manipulations. + +Build the parts: + +1. Render the full report. If `len(body) <= 60000`, you have one part. Use + marker `<!-- agentic-ci-issue-triage:1/1 -->`. +2. Otherwise, split on **action-bucket boundaries** (never split a table + mid-row). Each part starts with its own marker + `<!-- agentic-ci-issue-triage:i/N -->` and a heading + `### Triage Report — Part i of N`. +3. Place the summary and `Healthy items` `<details>` block at the end of + the last part. ### 5. Post the report -Find the tracking issue number from the `ISSUE_TRIAGE_TRACKING_ISSUE` -environment variable. Find the last comment by `github-actions[bot]` that -contains `<!-- agentic-ci-issue-triage -->` and note its ID. +Tracking issue number is in `ISSUE_TRIAGE_TRACKING_ISSUE`. List all +existing bot comments containing `agentic-ci-issue-triage:` (in id +order) and reconcile against the new parts: -- If a previous comment exists, **edit it in place** using - `gh api -X PATCH repos/{owner}/{repo}/issues/comments/{id}`.
-- If no previous comment exists, post a new comment using `gh issue comment`. +- For `i in 0..min(len(existing), len(parts))`: PATCH `existing[i]` + with `parts[i]` (`gh api -X PATCH .../comments/<id> -f body=...`). +- Surplus parts: post via `gh issue comment --body-file`. +- Surplus existing comments: delete via + `gh api -X DELETE .../comments/<id>`. -```bash -# Edit existing comment -gh api -X PATCH "repos/${GITHUB_REPOSITORY}/issues/comments/${COMMENT_ID}" \ - -f body="$(cat /tmp/issue-triage-report.md)" +This keeps the report a coherent set across runs whether it grows, +shrinks, or stays stable. -# Or post new comment -gh issue comment "$TRACKING_ISSUE" --body-file /tmp/issue-triage-report.md -``` +### 6. Fallback + +If you cannot find the tracking issue or the API calls fail repeatedly, +write the report parts to `/tmp/issue-triage-report-*.md` and stop. The +workflow's fallback step will post `/tmp/issue-triage-report.md` (which, +for a single-part report, is identical to part 1; the workflow falls back +only on single-part output). The workflow does not handle multi-part +fallback — make sure the report fits in one part if you cannot post it +yourself. ## Constraints -- **Read-only triage.** Do not close, label, or modify any issues or PRs. The - report is for maintainers to act on. -- **Do not post the report yourself if you cannot find the tracking issue.** - Write the report to `/tmp/issue-triage-report.md` and stop. The workflow - will handle fallback posting. -- **Stay concise.** Notes columns should be one sentence max. Link to the - relevant PR, issue, or duplicate - don't explain the fix. +- **Read-only triage.** Do not close, label, or modify any issues or PRs. + The report is for maintainers to act on. +- **Stay concise.** Rationale columns should be one sentence max. +- **No fix authority.** This recipe never opens PRs or commits code. It + reads, classifies, and posts a report. 
- **Cost awareness.** Do not read full issue/PR bodies unless needed to - determine duplicates or verify cross-references. The metadata from - `gh issue list` and `gh pr list` is enough for most checks. + determine duplicates, verify cross-references, or decide an action. The + metadata from `gh issue list` / `gh pr list` is enough for most checks. diff --git a/.agents/recipes/structure/recipe.md b/.agents/recipes/structure/recipe.md index 2dc793d54..3df885ecd 100644 --- a/.agents/recipes/structure/recipe.md +++ b/.agents/recipes/structure/recipe.md @@ -51,6 +51,10 @@ regardless of test coverage. Read `{{memory_path}}/runner-state.json` for known issues from previous runs. Update after the audit. Skip re-reporting known issues. +This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected finding *before* +the `known_issues` filter applies. + ## Instructions ### 1. Import boundary violations @@ -208,15 +212,35 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| missing-future | `chore` | yes | Insert `from __future__ import annotations` after the SPDX header block, before other imports. Fully deterministic. Tests required because `__future__` annotations can affect introspection-heavy code paths. | +| lazy-import | `refactor` | yes | Move a top-level heavy import (pandas/numpy/polars/torch/duckdb/sqlfluff/faker) to the `data_designer.lazy_heavy_imports` accessor pattern. Eligible only when (a) file is under `packages/*/src/`, (b) the module is already wired in the lazy system, (c) the heavy module is used only inside function bodies. 
| + +**Not eligible** — stays report-only: + +- Import boundary violations (architectural judgement). +- Dead exports (audit labels them "potentially dead"; external plugin + consumers may use them). + ## Constraints -- Do not modify any files. This is a read-only audit. +- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify paths in the suite's path allowlist. + See `_fix-policy.md` for the shared command/path baseline. - Imports inside `if TYPE_CHECKING:` blocks are allowed and should not be flagged for any check. - Lazy imports in `__init__.py` (via `__getattr__`) are deferred and should not be treated as violations. - Dead export detection has false positives. Mark uncertain cases as - "potentially dead" rather than definitively dead. + "potentially dead" rather than definitively dead. **Do not auto-remove + them** — they are not in the fix-eligible list. - Always cite which rule or doc is violated so maintainers can verify. - Import boundaries are currently clean. No findings in that section is normal and expected. diff --git a/.agents/recipes/test-health/recipe.md b/.agents/recipes/test-health/recipe.md index 1fe82c4a8..2224684ff 100644 --- a/.agents/recipes/test-health/recipe.md +++ b/.agents/recipes/test-health/recipe.md @@ -317,9 +317,29 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +**This suite has no fix phase.** All categories — coverage gaps, hollow +tests, import perf regressions, smoke check failures, test isolation +violations — stay report-only. + +Rationale: the categories that look mechanical (rewriting hollow +`assert ... is not None` checks, adding missing test files, fixing +test-isolation violations) all require inferring intent or authoring new +code. 
The audit phase only commits to *flagging* these conservatively; +turning that into authored test changes is beyond the "self-evident" +bar in `_fix-policy.md`. + +A future revision may add **test-isolation violations** (config tests +importing engine, engine tests importing interface) as an eligible fix +category — those are mechanical (replace the cross-boundary import with +a test-local equivalent or fixture). Add only after `code-quality` (the +other inference-heavy suite) has its draft-PR landing rate proven and +`draft_until_proven` flipped to `false`. + ## Constraints -- Do not modify any test files. This is a read-only audit. +- This recipe is fully read-only — do not modify any files at all. - Do not run the full test suite or coverage tool. Analysis is based on file structure and static inspection, not execution. - Be conservative with hollow test detection. Only flag tests you've read diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index d04a5d839..481ac937d 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -127,6 +127,11 @@ jobs: python -m venv /tmp/graphify-venv /tmp/graphify-venv/bin/python -m pip install graphifyy==0.4.23 --quiet 2>&1 | tail -3 + - name: Configure git identity + run: | + git config user.email "agentic-ci@datadesigner" + git config user.name "Agentic CI" + - name: Pre-flight checks env: ANTHROPIC_BASE_URL: ${{ secrets.AGENTIC_CI_API_BASE_URL }} @@ -173,11 +178,13 @@ jobs: exit 1 fi - # Build prompt: _runner.md + recipe body (strip YAML frontmatter) + # Build prompt: phase directive + _runner.md + _fix-policy.md + recipe body (strip YAML frontmatter) + PHASE_DIRECTIVE=$(cat .agents/recipes/_phase-audit.md) RUNNER_CTX=$(cat .agents/recipes/_runner.md) + FIX_POLICY=$(cat .agents/recipes/_fix-policy.md) RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md") - PROMPT=$(printf '%s\n\n%s\n' "${RUNNER_CTX}" "${RECIPE_BODY}" \ + 
PROMPT=$(printf '%s\n\n%s\n\n%s\n\n%s\n' "${PHASE_DIRECTIVE}" "${RUNNER_CTX}" "${FIX_POLICY}" "${RECIPE_BODY}" \ | sed "s|{{suite}}|${SUITE}|g" \ | sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \ | sed "s|{{memory_path}}|.agentic-ci-state|g") @@ -190,6 +197,49 @@ jobs: --verbose \ 2>&1 | tee /tmp/claude-audit-log.txt + - name: Check fix backlog + id: backlog + if: success() && matrix.suite != 'test-health' + run: | + BACKLOG_SIZE=$(jq '.fix_backlog // [] | length' .agentic-ci-state/runner-state.json 2>/dev/null || echo 0) + echo "size=${BACKLOG_SIZE}" >> "$GITHUB_OUTPUT" + echo "fix_backlog has ${BACKLOG_SIZE} entries" + + - name: Run fix recipe + id: fix + if: success() && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 + env: + ANTHROPIC_BASE_URL: ${{ secrets.AGENTIC_CI_API_BASE_URL }} + ANTHROPIC_API_KEY: ${{ secrets.AGENTIC_CI_API_KEY }} + AGENTIC_CI_MODEL: ${{ vars.AGENTIC_CI_MODEL }} + DISABLE_PROMPT_CACHING: "1" + GH_TOKEN: ${{ github.token }} + GITHUB_REPOSITORY: ${{ github.repository }} + SUITE: ${{ matrix.suite }} + run: | + set -o pipefail + + RECIPE_DIR=".agents/recipes/${SUITE}" + + # Build prompt: phase directive + _runner.md + _fix-policy.md + recipe body (strip YAML frontmatter) + PHASE_DIRECTIVE=$(cat .agents/recipes/_phase-fix.md) + RUNNER_CTX=$(cat .agents/recipes/_runner.md) + FIX_POLICY=$(cat .agents/recipes/_fix-policy.md) + RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md") + + PROMPT=$(printf '%s\n\n%s\n\n%s\n\n%s\n' "${PHASE_DIRECTIVE}" "${RUNNER_CTX}" "${FIX_POLICY}" "${RECIPE_BODY}" \ + | sed "s|{{suite}}|${SUITE}|g" \ + | sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \ + | sed "s|{{memory_path}}|.agentic-ci-state|g") + + stdbuf -oL -eL claude \ + --model "$AGENTIC_CI_MODEL" \ + -p "$PROMPT" \ + --max-turns 50 \ + --output-format stream-json \ + --verbose \ + 2>&1 | tee /tmp/claude-fix-log.txt + - name: Update runner memory if: always() env: @@ -205,7 +255,9 @@ jobs: except 
(json.JSONDecodeError, FileNotFoundError) as e: print(f'::warning::runner-state.json is invalid ({e}), resetting') state = {'suite': os.environ['SUITE'], 'known_issues': [], 'baselines': {}} - # Only stamp last_run if the audit actually succeeded + # Only stamp last_run if the audit actually succeeded. + # Fix phase manages its own state via attempted_fixes; its outcome + # does not gate last_run. if os.environ.get('AUDIT_OUTCOME') == 'success': state['last_run'] = datetime.datetime.now(datetime.timezone.utc).isoformat() state['suite'] = os.environ['SUITE'] @@ -220,7 +272,9 @@ jobs: name: claude-audit-log-${{ matrix.suite }}-${{ github.run_id }}-${{ github.run_attempt }} path: | /tmp/claude-audit-log.txt + /tmp/claude-fix-log.txt /tmp/audit-${{ matrix.suite }}.md + /tmp/pr-body-${{ matrix.suite }}.md retention-days: 14 if-no-files-found: ignore @@ -228,9 +282,16 @@ jobs: if: always() env: SUITE: ${{ matrix.suite }} + AUDIT_OUTCOME: ${{ steps.audit.outcome }} + FIX_OUTCOME: ${{ steps.fix.outcome }} + BACKLOG_SIZE: ${{ steps.backlog.outputs.size }} run: | echo "## Daily Audit: ${SUITE}" >> "$GITHUB_STEP_SUMMARY" echo "" >> "$GITHUB_STEP_SUMMARY" + echo "- Audit: \`${AUDIT_OUTCOME:-unknown}\`" >> "$GITHUB_STEP_SUMMARY" + echo "- Fix backlog size: \`${BACKLOG_SIZE:-n/a}\`" >> "$GITHUB_STEP_SUMMARY" + echo "- Fix: \`${FIX_OUTCOME:-skipped}\`" >> "$GITHUB_STEP_SUMMARY" + echo "" >> "$GITHUB_STEP_SUMMARY" if [ -s "/tmp/audit-${SUITE}.md" ]; then cat "/tmp/audit-${SUITE}.md" >> "$GITHUB_STEP_SUMMARY" From 91c0015d2ed2bc4c8676b8015277386f5f6390d4 Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Mon, 4 May 2026 17:13:55 +0000 Subject: [PATCH 02/12] fix(agentic-ci): address review findings before merge MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Map per-package test targets explicitly in _fix-policy.md (Makefile exposes test-config/test-engine/test-interface, not test-<package>). 
- Use github-actions[bot] noreply identity for commits the recipes produce. - Refresh fix_backlog.data when an id already exists so the fix phase cannot drive a PR from stale data after the underlying file changed. - Stop time-pruning closed/abandoned attempted_fixes entries — pruning before the two-strike threshold erases the history needed to escalate. Single-strike entries now age out only via the 200-entry cap. - Disambiguate bare-except findings within the same function by including a try-body hash in the finding id. - Audit grep for code-quality now matches both `except:` and `except BaseException:`, in parity with the fix eligibility. - Restrict transitive-gap fix eligibility to cases where a sibling package already declares the dep (avoids inventing version specifiers from scratch). - Issue-triage workflow handles multi-part reports in both the fallback post step and the job summary; recipe always writes numbered parts. --- .agents/recipes/_fix-policy.md | 35 ++++++++--- .agents/recipes/code-quality/recipe.md | 6 +- .agents/recipes/dependencies/recipe.md | 8 ++- .agents/recipes/issue-triage/recipe.md | 15 +++-- .github/workflows/agentic-ci-daily.yml | 4 +- .github/workflows/agentic-ci-issue-triage.yml | 62 +++++++++++-------- 6 files changed, 81 insertions(+), 49 deletions(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index 963824cab..065842704 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -16,8 +16,15 @@ A finding may be converted to a fix only if all hold: bumps (Dependabot owns those), no schema changes, no migrations. - **Self-evident**: the audit established both the problem *and* the unique correct fix. Mechanical, not interpretive. -- **Test-safe**: when the recipe declares `test_required`, run - `make test-<package>` for the affected package and abort on failure. 
+- **Test-safe**: when the recipe declares `test_required`, run the + per-package test target for the affected package and abort on failure. + Mapping (the Makefile does not expose `test-<package>` directly): + + | Package directory | Test target | + |-------------------|-------------| + | `packages/data-designer-config` | `make test-config` | + | `packages/data-designer-engine` | `make test-engine` | + | `packages/data-designer` | `make test-interface` | - **Single concern**: one finding per PR. - **Allowlisted paths**: matches the suite's path allowlist. @@ -74,8 +81,11 @@ code-quality and unset elsewhere) controls draft-PR mode. ### `fix_backlog` rules (audit phase populates this) -- Append every detected finding in an eligible category. Update `last_seen` - if `id` already present. +- Append every detected finding in an eligible category. If `id` is already + present, **refresh both `last_seen` and `data`** with the current scan's + values. The `data` field is used by the fix phase to apply the change + without re-scanning, so stale `data` would let an old plan drive a new + PR after the underlying file moved or changed. - Drop entries with `last_seen` older than 30 days. - Cap at 200 entries (drop oldest by `first_seen`). - Populated **before** the `known_issues` filter so fixable findings persist @@ -89,9 +99,14 @@ code-quality and unset elsewhere) controls draft-PR mode. conflict, lint failed, allowlist rejected, etc.). - Reconcile against open PRs (`gh pr list`) at the start of each fix run to recover from crashes that left state un-updated. -- Prune: drop `merged` >90d, drop single `closed`/`abandoned` >30d. -- Two-strike entries (≥2 `closed`/`abandoned`) are NOT pruned; they - surface in the report under `Repeatedly-failed fix attempts`. +- Prune: drop `merged` entries older than 90 days. Do **not** prune + `closed` or `abandoned` entries by age — pruning a single-strike entry + would erase the history needed to ever reach the two-strike threshold. 
+ The 200-entry cap (with oldest-first eviction by `first_seen`) handles + long-tail cleanup. +- Two-strike entries (≥2 `closed`/`abandoned`) surface in the report + under `Repeatedly-failed fix attempts` and are filtered from selection + permanently. ## Finding hash @@ -108,9 +123,13 @@ text: | dependencies (unused) | `<package>:<dep>:unused` | | structure (missing-future) | `<source-file>:missing-future` | | structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` | -| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:bare-except` | +| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:<try-body-hash>:bare-except` | Symbols use fully-qualified Python names. +`try-body-hash` is `sha1(<try-block body, leading/trailing whitespace +stripped, internal lines preserved>)[:8]` — needed because a function +can contain multiple `try:` blocks with bare excepts that would +otherwise collide on the same finding id. ## Ranking diff --git a/.agents/recipes/code-quality/recipe.md b/.agents/recipes/code-quality/recipe.md index 302a8f5ad..268e0adff 100644 --- a/.agents/recipes/code-quality/recipe.md +++ b/.agents/recipes/code-quality/recipe.md @@ -73,8 +73,10 @@ Check for patterns that violate the project's "errors normalize at boundaries" principle (AGENTS.md): ```bash -# Bare except clauses (should use specific exception types) -grep -rn "except:" packages/*/src/ --include='*.py' | grep -v "# noqa" +# Bare except clauses (should use specific exception types). +# Catches both `except:` and `except BaseException:` — both swallow +# everything including KeyboardInterrupt and SystemExit. 
+grep -rnE "except\s*:|except\s+BaseException\b" packages/*/src/ --include='*.py' | grep -v "# noqa" # Swallowed exceptions (except + pass/continue with no logging) grep -rn -A1 "except" packages/*/src/ --include='*.py' | grep -B1 "pass$\|continue$" diff --git a/.agents/recipes/dependencies/recipe.md index deedf742b..13580393b 100644 --- a/.agents/recipes/dependencies/recipe.md +++ b/.agents/recipes/dependencies/recipe.md @@ -166,12 +166,14 @@ Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: | Category | Branch type | test_required | Eligibility note | |----------|-------------|---------------|------------------| -| transitive-gap | `chore` | yes | Add the imported module to `[project.dependencies]` of the package that imports it. Use the version specifier from a package that already declares it; otherwise the latest stable specifier. Insert in alphabetical order; match existing quote/specifier style. | +| transitive-gap | `chore` | yes | Add the imported module to `[project.dependencies]` of the package that imports it, copying the version specifier from a sibling package that already declares it. Insert in alphabetical order; match existing quote/specifier style. **Ineligible** when no sibling package declares the dep — choosing a specifier from scratch is interpretive, not mechanical. Those findings stay report-only and surface for maintainer judgement. | | unused | `chore` | yes | Remove the declaration. Eligible only when grep across the package's `src/`, lazy-import system, plugin entry points, and tests turns up zero references. | `fix_backlog.data` should record: for transitive-gap, the importing source -files and proposed version specifier; for unused, which other packages -also declare the dep. +files and the sibling package whose specifier was copied (the recipe +must record this *during the audit*; the fix phase rejects entries with +no sibling source).
For unused, record which other packages also declare the +dep. Before running `make test-<package>`, run `make install-dev` to confirm the lockfile resolves cleanly. `make install-dev` is the only sanctioned install command (no direct `pip install` or `uv pip install`). diff --git a/.agents/recipes/issue-triage/recipe.md index 3edc061ae..3536c4b82 100644 --- a/.agents/recipes/issue-triage/recipe.md +++ b/.agents/recipes/issue-triage/recipe.md @@ -71,9 +71,10 @@ those state files are not available in the triage run; that's fine.) ### 3. Build the report -Write the report to `/tmp/issue-triage-report.md`. Each part of a -multi-comment report is a separate file: `/tmp/issue-triage-report-1.md`, -`/tmp/issue-triage-report-2.md`, etc. +Write each part to a numbered file: `/tmp/issue-triage-report-1.md`, +`/tmp/issue-triage-report-2.md`, etc. Single-part reports use +`/tmp/issue-triage-report-1.md`. The workflow's fallback step looks for +numbered files first. Format: @@ -191,11 +192,9 @@ shrinks, or stays stable. If you cannot find the tracking issue or the API calls fail repeatedly, write the report parts to `/tmp/issue-triage-report-*.md` and stop. The -workflow's fallback step will post `/tmp/issue-triage-report.md` (which, -for a single-part report, is identical to part 1; the workflow falls back -only on single-part output). The workflow does not handle multi-part -fallback — make sure the report fits in one part if you cannot post it -yourself. +workflow's fallback step posts every numbered part in order if no +agent-authored comments containing today's date already exist on the +tracking issue.
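The numbered-part contract above (split on action-bucket boundaries under a 60,000-char budget, stamp each part with an `i/N` marker, write `/tmp/issue-triage-report-<i>.md`) can be sketched as follows. This is a minimal illustration, not the recipe's literal implementation; the `buckets` input (one pre-rendered markdown section per action bucket) is an assumption:

```python
BUDGET = 60_000  # per-part budget; GitHub's hard cap on comments is 65,536 chars

def split_report(buckets, date):
    """Pack pre-rendered bucket sections into marker-stamped parts."""
    parts = [[]]
    size = 0
    for bucket in buckets:
        # Only break between buckets, so a table is never cut mid-row.
        if parts[-1] and size + len(bucket) > BUDGET:
            parts.append([])
            size = 0
        parts[-1].append(bucket)
        size += len(bucket)
    n = len(parts)
    rendered = []
    for i, chunk in enumerate(parts, start=1):
        marker = f"<!-- agentic-ci-issue-triage:{i}/{n} -->"
        head = (f"## Repository Triage Report — {date}" if n == 1
                else f"### Triage Report — Part {i} of {n}")
        rendered.append("\n\n".join([marker, head, *chunk]))
    return rendered

def write_parts(rendered):
    # Numbered files are what the workflow's fallback step globs for.
    for i, body in enumerate(rendered, start=1):
        with open(f"/tmp/issue-triage-report-{i}.md", "w") as f:
            f.write(body)
```

Marker and heading overhead can push a part slightly past 60,000 characters; the headroom below GitHub's 65,536-char cap absorbs it.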
## Constraints diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 481ac937d..40c57ae5c 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -129,8 +129,8 @@ jobs: - name: Configure git identity run: | - git config user.email "agentic-ci@datadesigner" - git config user.name "Agentic CI" + git config user.email "41898282+github-actions[bot]@users.noreply.github.com" + git config user.name "github-actions[bot]" - name: Pre-flight checks env: diff --git a/.github/workflows/agentic-ci-issue-triage.yml b/.github/workflows/agentic-ci-issue-triage.yml index 7a2dcdbc6..796c4b669 100644 --- a/.github/workflows/agentic-ci-issue-triage.yml +++ b/.github/workflows/agentic-ci-issue-triage.yml @@ -101,42 +101,52 @@ jobs: GH_TOKEN: ${{ github.token }} TRACKING_ISSUE: ${{ vars.ISSUE_TRIAGE_TRACKING_ISSUE }} run: | - if [ ! -s "/tmp/issue-triage-report.md" ]; then + # Collect report parts. Numbered files (multi-part) take precedence; + # fall back to the unsuffixed file for single-part runs. + shopt -s nullglob + PARTS=(/tmp/issue-triage-report-*.md) + shopt -u nullglob + if [ ${#PARTS[@]} -eq 0 ] && [ -s /tmp/issue-triage-report.md ]; then + PARTS=(/tmp/issue-triage-report.md) + fi + if [ ${#PARTS[@]} -eq 0 ]; then echo "::warning::Triage report not created by agent." exit 0 fi + mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) - # Check if the agent already posted/updated the comment. - MARKER="<!-- agentic-ci-issue-triage -->" - EXISTING=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ - --jq "[.[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${MARKER}\"))] | last | .id" \ - 2>/dev/null || echo "") - - REPORT=$(cat /tmp/issue-triage-report.md) - - # Only post if the report marker is not already in a recent comment - # with today's date (agent already posted). 
+ # Skip the fallback if the agent already posted today (any number of parts). + MARKER_PREFIX="agentic-ci-issue-triage:" TODAY=$(date -u +%Y-%m-%d) - if [ -n "$EXISTING" ] && [ "$EXISTING" != "null" ]; then - EXISTING_BODY=$(gh api "repos/${{ github.repository }}/issues/comments/${EXISTING}" --jq '.body') - if echo "$EXISTING_BODY" | grep -q "$TODAY"; then - echo "Agent already posted today's report. Skipping fallback." - exit 0 - fi - # Update existing comment. - gh api -X PATCH "repos/${{ github.repository }}/issues/comments/${EXISTING}" \ - -f body="$REPORT" - echo "Updated existing triage comment." - else - gh issue comment "$TRACKING_ISSUE" --body-file /tmp/issue-triage-report.md - echo "Posted new triage comment." + ALREADY_POSTED=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ + --paginate \ + --jq "[.[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${MARKER_PREFIX}\")) | select(.body | contains(\"${TODAY}\"))] | length" \ + 2>/dev/null || echo 0) + if [ "${ALREADY_POSTED}" != "0" ]; then + echo "Agent already posted today's report (${ALREADY_POSTED} part(s)). Skipping fallback." + exit 0 fi + for PART in "${PARTS[@]}"; do + gh issue comment "$TRACKING_ISSUE" --body-file "$PART" + echo "Posted ${PART}" + done + - name: Write job summary if: always() run: | - if [ -s "/tmp/issue-triage-report.md" ]; then - cat /tmp/issue-triage-report.md >> "$GITHUB_STEP_SUMMARY" + shopt -s nullglob + PARTS=(/tmp/issue-triage-report-*.md) + shopt -u nullglob + if [ ${#PARTS[@]} -eq 0 ] && [ -s /tmp/issue-triage-report.md ]; then + PARTS=(/tmp/issue-triage-report.md) + fi + if [ ${#PARTS[@]} -gt 0 ]; then + mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) + for PART in "${PARTS[@]}"; do + cat "${PART}" >> "$GITHUB_STEP_SUMMARY" + echo "" >> "$GITHUB_STEP_SUMMARY" + done else echo "No triage report was generated." 
>> "$GITHUB_STEP_SUMMARY" fi From 141a33b260942c6df7afbefe4aa9d5bc69900adc Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Mon, 4 May 2026 17:23:29 +0000 Subject: [PATCH 03/12] fix(agentic-ci): close residuals from review pass 2 - Replace remaining `make test-<package>` references with pointers to the mapping table; only the table itself uses that placeholder now. - Fix `gh api --paginate | jq | length` returning per-page counts: slurp with `jq -s 'add // 0'` to get a single total. - Compare posted-comment count to expected part count so a partial post (agent posted part 1 but not 2/3) triggers the fallback instead of being silently treated as success. - Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're not at the mercy of the runner's default shell. - Disambiguate bare-except findings whose try-body hashes collide by adding a per-function ordinal to the canonical_key. - Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at` (the schema has no `first_seen` field). --- .agents/recipes/_fix-policy.md | 22 +++++++++++-------- .agents/recipes/dependencies/recipe.md | 7 +++--- .github/workflows/agentic-ci-issue-triage.yml | 20 ++++++++++++----- 3 files changed, 32 insertions(+), 17 deletions(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index 065842704..bece0226e 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -102,8 +102,9 @@ code-quality and unset elsewhere) controls draft-PR mode. - Prune: drop `merged` entries older than 90 days. Do **not** prune `closed` or `abandoned` entries by age — pruning a single-strike entry would erase the history needed to ever reach the two-strike threshold. - The 200-entry cap (with oldest-first eviction by `first_seen`) handles - long-tail cleanup. + The 200-entry cap handles long-tail cleanup; eviction order is oldest + first by the `at` date of the entry's first attempt + (`attempts[0].at`). 
- Two-strike entries (≥2 `closed`/`abandoned`) surface in the report under `Repeatedly-failed fix attempts` and are filtered from selection permanently. @@ -123,13 +124,15 @@ text: | dependencies (unused) | `<package>:<dep>:unused` | | structure (missing-future) | `<source-file>:missing-future` | | structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` | -| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:<try-body-hash>:bare-except` | +| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:<try-body-hash>:<ordinal>:bare-except` | Symbols use fully-qualified Python names. `try-body-hash` is `sha1(<try-block body, leading/trailing whitespace -stripped, internal lines preserved>)[:8]` — needed because a function -can contain multiple `try:` blocks with bare excepts that would -otherwise collide on the same finding id. +stripped, internal lines preserved>)[:8]`. +`ordinal` is the 1-based position of this bare-except among bare-excepts +in the same enclosing symbol, in source order. Both are needed: the body +hash distinguishes most cases, and the ordinal disambiguates the rare +case of two bare-except blocks with byte-identical try bodies. ## Ranking @@ -181,9 +184,10 @@ declare only the parts that vary (eligible categories, branch type, remove from `fix_backlog` and continue. 2. Apply the fix. If the diff exceeds the localized-fix bar or touches a non-allowlisted path, abandon and continue. - 3. If the category sets `test_required: true`, run - `make test-<package>` for the package containing the change. On - failure: abandon and continue. + 3. If the category sets `test_required: true`, run the per-package + test target (see the mapping table in "Localized fix bar" above) + for the package containing the change. On failure: abandon and + continue. 4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit: `<type>(agentic-ci): <one-line>`. Push. 5. 
Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the diff --git a/.agents/recipes/dependencies/recipe.md b/.agents/recipes/dependencies/recipe.md index 13580393b..bc72ee805 100644 --- a/.agents/recipes/dependencies/recipe.md +++ b/.agents/recipes/dependencies/recipe.md @@ -175,9 +175,10 @@ must record this *during the audit*; the fix phase rejects entries with no sibling source). For unused, which other packages also declare the dep. -Before running `make test-<package>`, run `make install-dev` to confirm -the lockfile resolves cleanly. `make install-dev` is the only sanctioned -install command (no direct `pip install` or `uv pip install`). +Before running the per-package test target (see `_fix-policy.md` for the +mapping), run `make install-dev` to confirm the lockfile resolves cleanly. +`make install-dev` is the only sanctioned install command (no direct +`pip install` or `uv pip install`). **Not eligible** — stays report-only: diff --git a/.github/workflows/agentic-ci-issue-triage.yml b/.github/workflows/agentic-ci-issue-triage.yml index 796c4b669..841410d11 100644 --- a/.github/workflows/agentic-ci-issue-triage.yml +++ b/.github/workflows/agentic-ci-issue-triage.yml @@ -97,6 +97,7 @@ jobs: continue-on-error: true - name: Fallback post if agent did not post + shell: bash env: GH_TOKEN: ${{ github.token }} TRACKING_ISSUE: ${{ vars.ISSUE_TRIAGE_TRACKING_ISSUE }} @@ -114,19 +115,27 @@ jobs: exit 0 fi mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) + EXPECTED_PARTS=${#PARTS[@]} - # Skip the fallback if the agent already posted today (any number of parts). + # Count how many bot comments today already match the marker. + # `gh api --paginate` emits one JSON document per page, so we slurp + # them with `jq -s 'add'` to get a single count. 
MARKER_PREFIX="agentic-ci-issue-triage:" TODAY=$(date -u +%Y-%m-%d) - ALREADY_POSTED=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ + POSTED_COUNT=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ --paginate \ --jq "[.[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${MARKER_PREFIX}\")) | select(.body | contains(\"${TODAY}\"))] | length" \ - 2>/dev/null || echo 0) - if [ "${ALREADY_POSTED}" != "0" ]; then - echo "Agent already posted today's report (${ALREADY_POSTED} part(s)). Skipping fallback." + 2>/dev/null | jq -s 'add // 0') + + if [ "${POSTED_COUNT}" -ge "${EXPECTED_PARTS}" ]; then + echo "Agent already posted today's report (${POSTED_COUNT}/${EXPECTED_PARTS} part(s)). Skipping fallback." exit 0 fi + if [ "${POSTED_COUNT}" -gt 0 ]; then + echo "::warning::Agent posted ${POSTED_COUNT} of ${EXPECTED_PARTS} expected parts. Posting all parts now to ensure coverage; this may produce duplicate comments." + fi + for PART in "${PARTS[@]}"; do gh issue comment "$TRACKING_ISSUE" --body-file "$PART" echo "Posted ${PART}" @@ -134,6 +143,7 @@ jobs: - name: Write job summary if: always() + shell: bash run: | shopt -s nullglob PARTS=(/tmp/issue-triage-report-*.md) From 15cefc2707a685af5b96bb4eeba342ff6209562a Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Mon, 4 May 2026 17:28:18 +0000 Subject: [PATCH 04/12] fix(agentic-ci): identity-based partial-post detection in triage fallback Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an identity-based check that extracts every i/N marker seen in today-dated bot comments and verifies each expected i is present. A duplicate post of one part can no longer mask a missing other. 
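The identity-based coverage check can be sketched as follows — a minimal Python model of the jq/grep pipeline, not the workflow's actual code; `missing_parts` and its arguments are hypothetical names for illustration:

```python
import re

def missing_parts(comment_bodies, expected_parts, today):
    # Collect every i seen in an "agentic-ci-issue-triage:i/N" marker
    # inside a today-dated bot comment, then report the gaps. A duplicate
    # post of one part cannot mask a missing other, because coverage is
    # checked per identity, not by count.
    pattern = re.compile(r"agentic-ci-issue-triage:(\d+)/%d\b" % expected_parts)
    seen = set()
    for body in comment_bodies:
        if today in body:
            seen.update(int(m) for m in pattern.findall(body))
    return [i for i in range(1, expected_parts + 1) if i not in seen]
```

With two copies of part 1 and no part 2, the count-based check would have skipped the fallback; the identity-based check still reports part 2 as missing.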
--- .github/workflows/agentic-ci-issue-triage.yml | 30 ++++++++++++------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/.github/workflows/agentic-ci-issue-triage.yml b/.github/workflows/agentic-ci-issue-triage.yml index 841410d11..9a83787a9 100644 --- a/.github/workflows/agentic-ci-issue-triage.yml +++ b/.github/workflows/agentic-ci-issue-triage.yml @@ -117,23 +117,31 @@ jobs: mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) EXPECTED_PARTS=${#PARTS[@]} - # Count how many bot comments today already match the marker. - # `gh api --paginate` emits one JSON document per page, so we slurp - # them with `jq -s 'add'` to get a single count. - MARKER_PREFIX="agentic-ci-issue-triage:" + # Skip fallback only if every expected part identity (i/N) appears + # in a today-dated bot comment. Identity-based, not count-based: + # a duplicate post of one part should not mask a missing other. TODAY=$(date -u +%Y-%m-%d) - POSTED_COUNT=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ + SEEN_PARTS=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ --paginate \ - --jq "[.[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${MARKER_PREFIX}\")) | select(.body | contains(\"${TODAY}\"))] | length" \ - 2>/dev/null | jq -s 'add // 0') + --jq ".[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${TODAY}\")) | .body | capture(\"agentic-ci-issue-triage:(?<i>[0-9]+)/${EXPECTED_PARTS}\") | .i" \ + 2>/dev/null | sort -u) + + ALL_COVERED=1 + MISSING=() + for i in $(seq 1 "${EXPECTED_PARTS}"); do + if ! echo "${SEEN_PARTS}" | grep -qx "${i}"; then + ALL_COVERED=0 + MISSING+=("${i}") + fi + done - if [ "${POSTED_COUNT}" -ge "${EXPECTED_PARTS}" ]; then - echo "Agent already posted today's report (${POSTED_COUNT}/${EXPECTED_PARTS} part(s)). Skipping fallback." 
+ if [ "${ALL_COVERED}" -eq 1 ]; then + echo "All ${EXPECTED_PARTS} parts already posted today (identities: $(echo "${SEEN_PARTS}" | tr '\n' ',' | sed 's/,$//')). Skipping fallback." exit 0 fi - if [ "${POSTED_COUNT}" -gt 0 ]; then - echo "::warning::Agent posted ${POSTED_COUNT} of ${EXPECTED_PARTS} expected parts. Posting all parts now to ensure coverage; this may produce duplicate comments." + if [ -n "${SEEN_PARTS}" ]; then + echo "::warning::Missing parts ${MISSING[*]} of ${EXPECTED_PARTS}; posting all parts. Duplicates may result for parts that were already posted." fi for PART in "${PARTS[@]}"; do From 8786b8de22884b5e5e098b59a978bcd632949225 Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Mon, 4 May 2026 17:30:41 +0000 Subject: [PATCH 05/12] fix(agentic-ci): close remaining bot-review findings - Exempt two-strike attempted_fixes entries from the 200-entry cap eviction. Cap now evicts non-two-strike oldest-first by attempts[0].at; two-strike entries are silently-forgotten only in the pathological all-200-are-two-strike case (itself a signal). - Specify the attempted_fixes PR-marker reconciliation algorithm: scan open PR bodies for the `<!-- agentic-ci finding=<id> -->` marker and back-fill missing entries. - Tighten the daily workflow conditionals to gate on explicit step outcomes (steps.audit.outcome == 'success' rather than success()) so a future pre-audit gate cannot accidentally trip the fix step. --- .agents/recipes/_fix-policy.md | 23 ++++++++++++++++------- .github/workflows/agentic-ci-daily.yml | 4 ++-- 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index bece0226e..7dbaf7dee 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -97,16 +97,25 @@ code-quality and unset elsewhere) controls draft-PR mode. - `abandoned` means the recipe could not produce a PR (tests failed, conflict, lint failed, allowlist rejected, etc.). 
-- Reconcile against open PRs (`gh pr list`) at the start of each fix run - to recover from crashes that left state un-updated. +- Reconcile against open PRs (`gh pr list`) at the start of each fix + run to recover from crashes that left state un-updated. The + reconciliation algorithm: list open PRs whose bodies contain the + `<!-- agentic-ci finding=<id> suite=<suite> -->` marker, parse out + each `<id>`, and back-fill any missing `attempted_fixes` entries with + `outcome: "open"` and the parsed `pr_number` and `branch`. - Prune: drop `merged` entries older than 90 days. Do **not** prune `closed` or `abandoned` entries by age — pruning a single-strike entry would erase the history needed to ever reach the two-strike threshold. - The 200-entry cap handles long-tail cleanup; eviction order is oldest - first by the `at` date of the entry's first attempt - (`attempts[0].at`). -- Two-strike entries (≥2 `closed`/`abandoned`) surface in the report - under `Repeatedly-failed fix attempts` and are filtered from selection +- The 200-entry cap handles long-tail cleanup. Eviction order: + non-two-strike entries first, oldest-first by `attempts[0].at`. + Two-strike entries (≥2 `closed`/`abandoned`) are exempt from cap + eviction unless every other entry has already been evicted — they + represent maintainer-action signals and must not be silently + forgotten. If two-strike entries alone exceed 200, that's itself a + signal worth surfacing; in that pathological case, evict oldest-first + by `attempts[0].at`. +- Two-strike entries surface in the report under + `Repeatedly-failed fix attempts` and are filtered from selection permanently. 
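The eviction order can be sketched as a sort key — a hedged model of the policy above, not the runner's actual implementation; the entry shape (`attempts`, `at`, `outcome`) mirrors the schema described here, and `cap_evictions` is a hypothetical helper:

```python
def cap_evictions(entries, cap=200):
    # Entries to evict when the attempted_fixes list exceeds the cap.
    def strikes(entry):
        return sum(1 for a in entry["attempts"]
                   if a["outcome"] in ("closed", "abandoned"))
    # Two-strike entries sort last, so they are reached only in the
    # pathological all-two-strike overflow case; within each tier,
    # oldest first by the first attempt's date.
    order = sorted(entries,
                   key=lambda e: (strikes(e) >= 2, e["attempts"][0]["at"]))
    return order[: max(0, len(entries) - cap)]
```

With a cap of 2 and one two-strike entry, the oldest non-two-strike entry is evicted first even though the two-strike entry is older than some survivors.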
## Finding hash diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 40c57ae5c..e5ca1e067 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -199,7 +199,7 @@ jobs: - name: Check fix backlog id: backlog - if: success() && matrix.suite != 'test-health' + if: steps.audit.outcome == 'success' && matrix.suite != 'test-health' run: | BACKLOG_SIZE=$(jq '.fix_backlog // [] | length' .agentic-ci-state/runner-state.json 2>/dev/null || echo 0) echo "size=${BACKLOG_SIZE}" >> "$GITHUB_OUTPUT" @@ -207,7 +207,7 @@ jobs: - name: Run fix recipe id: fix - if: success() && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 + if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 env: ANTHROPIC_BASE_URL: ${{ secrets.AGENTIC_CI_API_BASE_URL }} ANTHROPIC_API_KEY: ${{ secrets.AGENTIC_CI_API_KEY }} From e27792b78872b1875ea025fba11f9929ee632548 Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Mon, 4 May 2026 19:36:35 +0000 Subject: [PATCH 06/12] fix(agentic-ci): close Greptile pass-2 findings (timeout, re-verify wording) - Bump daily-suite job timeout from 20 to 40 minutes. The split into two sequential `claude --max-turns 50` invocations can saturate a 20-minute budget; a mid-fix SIGTERM would leave an orphaned branch and inconsistent runner-state. - Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids rebuilding fix_backlog from scratch but does NOT override the per-candidate re-verification step required by _fix-policy.md step 4.1 (re-grep / re-read the specific file the candidate points at). Single-candidate re-verification is required; whole-codebase re-scanning is forbidden. 
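The distinction drawn above — re-verify the one candidate, never re-scan — can be sketched as a check scoped to a single file; this is an illustrative stand-in, and the `file`/`pattern` fields are assumed per-category lookup data, not part of the real runner-state schema:

```python
import re
from pathlib import Path

def still_applies(candidate, repo_root="."):
    # Re-read only the file this candidate points at; never walk the
    # tree or rebuild fix_backlog. If the finding is gone, the caller
    # abandons this candidate and moves to the next.
    path = Path(repo_root) / candidate["file"]
    if not path.is_file():
        return False
    return re.search(candidate["pattern"], path.read_text()) is not None
```

A bare-except candidate, for example, passes while the `except:` is still present and fails once an intervening commit has narrowed the handler.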
--- .agents/recipes/_phase-fix.md | 10 ++++++++-- .github/workflows/agentic-ci-daily.yml | 2 +- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/.agents/recipes/_phase-fix.md b/.agents/recipes/_phase-fix.md index 48bf77a09..44449dd8e 100644 --- a/.agents/recipes/_phase-fix.md +++ b/.agents/recipes/_phase-fix.md @@ -6,8 +6,14 @@ This invocation runs the **FIX** phase only. report is at `/tmp/audit-{{suite}}.md` and `{{memory_path}}/runner-state.json` has the populated `fix_backlog`. - Execute only the recipe's "Fix phase" section per `_fix-policy.md`. - Do NOT redo audit work; do NOT re-scan the codebase to rebuild - findings. + Do NOT redo audit work — that is, do NOT re-scan whole packages or + rebuild `fix_backlog` from scratch. The "no re-scan" rule does NOT + override the per-candidate re-verification step required by + `_fix-policy.md` §"Standard fix procedure" step 4.1: when you pick a + candidate, you MUST re-grep / re-read the specific file or symbol it + points at to confirm the finding still applies before editing. + Re-verification of a single candidate is required; re-scanning the + codebase to discover new findings is forbidden. - Pick the highest-ranked eligible candidate from `fix_backlog`, apply the fix, run the package's tests if applicable, commit, push, and open the PR using `gh pr create --body-file`. 
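The "abandon and continue" candidate walk reduces to a short loop — a sketch under stated assumptions: lower `rank` means higher priority, and `try_fix` is a hypothetical callable standing in for the verify → edit → test → push → PR sequence:

```python
def run_fix_phase(fix_backlog, try_fix):
    # Highest-ranked candidate first; a failed attempt is abandoned and
    # control falls through to the next candidate, so the run produces
    # at most one PR.
    for candidate in sorted(fix_backlog, key=lambda c: c["rank"]):
        if try_fix(candidate):
            return candidate  # exactly one fix landed this run
    return None  # nothing eligible survived verification; no-op run
```

This keeps the single-concern invariant (one finding per PR) structural: success short-circuits the loop rather than relying on the agent to stop.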
diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index e5ca1e067..7bf3fe5e5 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -62,7 +62,7 @@ jobs: needs: determine-suite if: needs.determine-suite.outputs.suites != '[]' runs-on: [self-hosted, agentic-ci] - timeout-minutes: 20 + timeout-minutes: 40 strategy: fail-fast: false matrix: From b3e07f3dca9d1e97ac1c2dc93bbe988e17b18957 Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Mon, 4 May 2026 20:00:29 +0000 Subject: [PATCH 07/12] fix(agentic-ci): close Greptile pass-3 P1s in triage fallback - Guard `jq capture()` with a `test()` select. `capture()` errors on non-match instead of returning empty, which would truncate SEEN_PARTS if any unrelated today-dated bot comment lacks the triage marker (e.g. from a sibling workflow). Adding the test() guard ensures capture() only runs on bodies that already match. - Iterate the MISSING[] array when posting fallback parts, not the full PARTS[] array. Posting all parts when only some were missing was creating duplicate comments for the parts the agent already successfully posted. --- .github/workflows/agentic-ci-issue-triage.yml | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/.github/workflows/agentic-ci-issue-triage.yml b/.github/workflows/agentic-ci-issue-triage.yml index 9a83787a9..080ace5b7 100644 --- a/.github/workflows/agentic-ci-issue-triage.yml +++ b/.github/workflows/agentic-ci-issue-triage.yml @@ -120,33 +120,36 @@ jobs: # Skip fallback only if every expected part identity (i/N) appears # in a today-dated bot comment. Identity-based, not count-based: # a duplicate post of one part should not mask a missing other. 
+ # The pre-`capture()` `test()` guard is required: jq `capture()` + # raises an error on non-matching input, which would truncate the + # stream if any unrelated today-dated bot comment lacks the + # triage marker (e.g. from another automated workflow). TODAY=$(date -u +%Y-%m-%d) SEEN_PARTS=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ --paginate \ - --jq ".[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${TODAY}\")) | .body | capture(\"agentic-ci-issue-triage:(?<i>[0-9]+)/${EXPECTED_PARTS}\") | .i" \ + --jq ".[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${TODAY}\")) | select(.body | test(\"agentic-ci-issue-triage:[0-9]+/${EXPECTED_PARTS}\")) | .body | capture(\"agentic-ci-issue-triage:(?<i>[0-9]+)/${EXPECTED_PARTS}\") | .i" \ 2>/dev/null | sort -u) - ALL_COVERED=1 MISSING=() for i in $(seq 1 "${EXPECTED_PARTS}"); do if ! echo "${SEEN_PARTS}" | grep -qx "${i}"; then - ALL_COVERED=0 MISSING+=("${i}") fi done - if [ "${ALL_COVERED}" -eq 1 ]; then + if [ ${#MISSING[@]} -eq 0 ]; then echo "All ${EXPECTED_PARTS} parts already posted today (identities: $(echo "${SEEN_PARTS}" | tr '\n' ',' | sed 's/,$//')). Skipping fallback." exit 0 fi if [ -n "${SEEN_PARTS}" ]; then - echo "::warning::Missing parts ${MISSING[*]} of ${EXPECTED_PARTS}; posting all parts. Duplicates may result for parts that were already posted." + echo "::warning::Posting only missing parts ${MISSING[*]} of ${EXPECTED_PARTS}; the agent already posted parts $(echo "${SEEN_PARTS}" | tr '\n' ',' | sed 's/,$//')." 
fi - for PART in "${PARTS[@]}"; do + for i in "${MISSING[@]}"; do + PART="${PARTS[$((i-1))]}" gh issue comment "$TRACKING_ISSUE" --body-file "$PART" - echo "Posted ${PART}" + echo "Posted part ${i}: ${PART}" done - name: Write job summary From 23829fb0412b4d40e090e16a271772a9712a025c Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Thu, 7 May 2026 23:40:00 +0000 Subject: [PATCH 08/12] fix(agentic-ci): close johnnygreco review-pass warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address the five Warnings from the 2026-05-07 review focused on the trust boundary for autonomous PR generation. Five workflow/policy adjustments shrink the surface where agent compliance is load-bearing: - Workflow-level scope gate. After the fix step, re-derive the diff against `origin/main` and validate against the per-suite path allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and the 3-file cap. On violation, close the PR with `--delete-branch` and flip the `attempted_fixes` entry from `open` to `abandoned` so two-strike logic still sees the failure. The recipe alone could not bind the agent's path choices; the workflow now does. - Dependencies install-dev verification. For the dependencies suite only, re-run `make install-dev` after the scope gate so the agent's pyproject edit is exercised against the lockfile resolver. Closes the PR if `install-dev` fails — catches the failure mode where the per-package test target passed against the old cached lockfile. - Flip matrix-job `cancel-in-progress` from true to false. A cancellation between the agent's git push and `gh pr create` would leave an orphaned branch with no `attempted_fixes` record; reconciliation only covers PRs that were opened. Queueing a duplicate run is the lesser evil. `_fix-policy.md` Atomicity section now documents the trade-off. 
- Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not modify outside `{{memory_path}}/`" directive. A literal-minded agent could refuse to write the report file, which would break the job summary, artifact upload, and the fix phase's audit context. - Always upload the agent log artifact (was `if: failure()` only) and include `runner-state.json`. For autonomous mode, the most interesting failure is "the workflow succeeded but the PR was wrong"; the stream-json log is the only way to look back days later. Also takes johnnygreco's Suggestion 2: spell out in the policy doc that the `draft_until_proven` flip is the sole human-gated promotion step in the fix policy and must not be automated. Greptile and the github-actions auto-reviewer's findings were already closed in the prior pass-2/pass-3 commits; no action needed on those. --- .agents/recipes/_fix-policy.md | 23 +++- .agents/recipes/_phase-audit.md | 3 +- .github/workflows/agentic-ci-daily.yml | 173 ++++++++++++++++++++++++- 3 files changed, 195 insertions(+), 4 deletions(-) diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md index 7dbaf7dee..210fd5435 100644 --- a/.agents/recipes/_fix-policy.md +++ b/.agents/recipes/_fix-policy.md @@ -222,7 +222,9 @@ to the next candidate. - **Labels**: `agentic-ci`, `agentic-ci/<suite>`. - **Draft PRs**: `code-quality` opens draft until a maintainer flips `draft_until_proven` to `false` in runner-state, after at least two - non-draft PRs from that suite have landed clean. + non-draft PRs from that suite have landed clean. This flip is + intentionally manual — it is the sole human-gated promotion step in + the fix policy and must not be automated. ## Atomicity @@ -234,3 +236,22 @@ Each fix-phase invocation produces exactly one of: No half-states. The runner state is the source of truth for what the recipe has tried; never silently drop a failed attempt. 
+ +The matrix-level concurrency for the daily workflow uses +`cancel-in-progress: false` so a fix in flight cannot be cancelled +between push and PR open. The trade-off is a queued duplicate run if a +manual dispatch arrives while cron is still going; that's preferable to +orphaned branches with no `attempted_fixes` record. + +## Workflow-level scope gate + +The agent's compliance with the path allowlists and the localized-fix +bar is load-bearing for autonomous PR generation, but the recipe alone +cannot enforce them. The daily workflow runs a post-fix scope gate that +re-derives the per-suite allowlist (mirrored from the table above) and +the diff stats from the pushed branch, then closes the PR and deletes +the remote branch on violation. The gate also flips the +`attempted_fixes` entry from `open` to `abandoned` so two-strike logic +sees the failure. Keep the workflow's allowlist regexes in sync with the +table above; the workflow is the enforcement, the table is the +specification. diff --git a/.agents/recipes/_phase-audit.md b/.agents/recipes/_phase-audit.md index 5caefa0c8..c2cef775c 100644 --- a/.agents/recipes/_phase-audit.md +++ b/.agents/recipes/_phase-audit.md @@ -9,7 +9,8 @@ This invocation runs the **AUDIT** phase only. applying the `known_issues` filter to the report, so fixable findings persist across runs even when their report row is suppressed). - Do NOT attempt any fix. Do NOT create any branches, commits, or PRs. -- Do NOT modify any files outside `{{memory_path}}/`. +- Do NOT modify any files outside `{{memory_path}}/` and the report file + `/tmp/audit-{{suite}}.md` itself. - A separate invocation will run the FIX phase if `fix_backlog` has eligible candidates and the suite has a fix phase. 
- Read the recipe in full for context; the "Fix phase" section informs diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 7bf3fe5e5..2a76f4c7e 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -69,8 +69,12 @@ jobs: suite: ${{ fromJSON(needs.determine-suite.outputs.suites) }} concurrency: + # cancel-in-progress is intentionally false: a cancellation between + # the agent's git push and gh pr create would leave an orphaned + # branch with no attempted_fixes record. Queueing a duplicate run is + # the lesser evil. See _fix-policy.md "Atomicity". group: agentic-ci-daily-${{ matrix.suite }} - cancel-in-progress: true + cancel-in-progress: false steps: - name: Check required config @@ -240,6 +244,166 @@ jobs: --verbose \ 2>&1 | tee /tmp/claude-fix-log.txt + - name: Validate fix scope (allowlist + LOC + file cap) + # Workflow-level enforcement of the localized-fix bar from + # _fix-policy.md. Recipe instructions alone cannot bind the agent; + # this gate re-derives the diff and closes the PR if the agent + # escaped the allowlist or the LOC/file caps. + # Note: the docstring-only constraint on docs-and-references .py + # paths remains policy-level; AST-based enforcement is a follow-up. + id: scope_gate + if: steps.fix.outcome == 'success' + env: + SUITE: ${{ matrix.suite }} + GH_TOKEN: ${{ github.token }} + run: | + set -o pipefail + + OPEN=$(jq -c ' + .attempted_fixes // [] + | map(select((.attempts | last | .outcome) == "open")) + | last // empty + ' .agentic-ci-state/runner-state.json) + if [ -z "$OPEN" ] || [ "$OPEN" = "null" ]; then + echo "No open attempted_fix recorded; nothing to validate." 
+ exit 0 + fi + + BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') + PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') + FINDING_ID=$(echo "$OPEN" | jq -r '.id') + if [ -z "$BRANCH" ]; then + echo "::warning::Open attempt has no branch; skipping gate." + exit 0 + fi + + case "$SUITE" in + docs-and-references) + ALLOW='^(architecture/|docs/|README\.md$|CONTRIBUTING\.md$|DEVELOPMENT\.md$|STYLEGUIDE\.md$|packages/[^/]+/src/.*\.py$)' + ;; + dependencies) + ALLOW='^packages/[^/]+/pyproject\.toml$' + ;; + structure|code-quality) + ALLOW='^packages/[^/]+/src/.*\.py$' + ;; + *) + echo "::error::No allowlist defined for suite: $SUITE" + exit 1 + ;; + esac + + FILES=$(git diff --name-only origin/main...HEAD) + BAD=$(printf '%s\n' "$FILES" | grep -vE "$ALLOW" | grep -v '^$' || true) + FILE_COUNT=$(printf '%s\n' "$FILES" | grep -c '.' || echo 0) + LOC_DELTA=$(git diff --shortstat origin/main...HEAD \ + | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ /deletion/) d=$(i-1) } print a+d }') + : "${LOC_DELTA:=0}" + + REASONS="" + if [ -n "$BAD" ]; then + REASONS="${REASONS}- files outside allowlist:\n$(echo "$BAD" | sed 's/^/ - /')\n" + fi + if [ "$FILE_COUNT" -gt 3 ]; then + REASONS="${REASONS}- file count ($FILE_COUNT) exceeds 3-file cap\n" + fi + if [ "$LOC_DELTA" -gt 50 ]; then + REASONS="${REASONS}- LOC delta ($LOC_DELTA) exceeds 50-line cap\n" + fi + + if [ -z "$REASONS" ]; then + echo "Scope gate passed: ${FILE_COUNT} file(s), ${LOC_DELTA} LOC, all within allowlist." + exit 0 + fi + + echo "::error::Scope gate violation" + printf '%b' "$REASONS" + + if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + MSG=$(printf 'Closed by workflow scope gate. The pushed diff violated the localized-fix bar (see `.agents/recipes/_fix-policy.md`):\n\n%b\nThe `attempted_fixes` entry has been flipped to `abandoned`.' 
"$REASONS") + gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ + echo "::warning::gh pr close failed; branch may need manual cleanup" + else + git push origin --delete "$BRANCH" || \ + echo "::warning::Could not delete remote branch $BRANCH" + fi + + FINDING_ID="$FINDING_ID" python3 -c " + import json, os + finding_id = os.environ['FINDING_ID'] + with open('.agentic-ci-state/runner-state.json') as f: + state = json.load(f) + for entry in state.get('attempted_fixes', []): + if entry.get('id') != finding_id: + continue + attempts = entry.get('attempts') or [] + if attempts and attempts[-1].get('outcome') == 'open': + attempts[-1]['outcome'] = 'abandoned' + attempts[-1]['gate_violation'] = True + with open('.agentic-ci-state/runner-state.json', 'w') as f: + json.dump(state, f, indent=2) + " + + exit 1 + + - name: Verify dependencies lockfile + # Dependencies suite only: re-run make install-dev against the + # agent's pyproject.toml changes. This catches the failure mode + # where the per-package test target passed against the *old* + # lockfile but the proposed dep does not actually resolve. + if: matrix.suite == 'dependencies' && steps.fix.outcome == 'success' && steps.scope_gate.outcome == 'success' + env: + GH_TOKEN: ${{ github.token }} + run: | + set -o pipefail + + OPEN=$(jq -c ' + .attempted_fixes // [] + | map(select((.attempts | last | .outcome) == "open")) + | last // empty + ' .agentic-ci-state/runner-state.json) + if [ -z "$OPEN" ] || [ "$OPEN" = "null" ]; then + echo "No open attempted_fix; skipping lockfile verification." + exit 0 + fi + + PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') + BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') + FINDING_ID=$(echo "$OPEN" | jq -r '.id') + + if make install-dev 2>&1 | tee /tmp/install-dev-verify.log; then + echo "Lockfile resolves cleanly against the agent's pyproject changes." 
+ exit 0 + fi + + echo "::error::make install-dev failed against the agent's pyproject changes" + + if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + MSG="Closed by workflow lockfile verification. \`make install-dev\` failed against the agent's \`pyproject.toml\` changes — the dependency edit does not resolve cleanly. See \`/tmp/install-dev-verify.log\` in the workflow artifact." + gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ + echo "::warning::gh pr close failed" + else + git push origin --delete "$BRANCH" || true + fi + + FINDING_ID="$FINDING_ID" python3 -c " + import json, os + finding_id = os.environ['FINDING_ID'] + with open('.agentic-ci-state/runner-state.json') as f: + state = json.load(f) + for entry in state.get('attempted_fixes', []): + if entry.get('id') != finding_id: + continue + attempts = entry.get('attempts') or [] + if attempts and attempts[-1].get('outcome') == 'open': + attempts[-1]['outcome'] = 'abandoned' + attempts[-1]['lockfile_verification_failed'] = True + with open('.agentic-ci-state/runner-state.json', 'w') as f: + json.dump(state, f, indent=2) + " + + exit 1 + - name: Update runner memory if: always() env: @@ -266,7 +430,10 @@ jobs: " - name: Upload agent log - if: failure() + # Always upload: for autonomous PR generation, the most interesting + # failure mode is "the workflow succeeded but the PR was wrong". + # The full event stream is the only way to look back days later. 
+ if: always() uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: name: claude-audit-log-${{ matrix.suite }}-${{ github.run_id }}-${{ github.run_attempt }} @@ -275,6 +442,8 @@ jobs: /tmp/claude-fix-log.txt /tmp/audit-${{ matrix.suite }}.md /tmp/pr-body-${{ matrix.suite }}.md + /tmp/install-dev-verify.log + .agentic-ci-state/runner-state.json retention-days: 14 if-no-files-found: ignore From 872d56170d0e7443c91709c40f74ebf6f183943d Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Thu, 7 May 2026 23:51:41 +0000 Subject: [PATCH 09/12] fix(agentic-ci): close Codex review-pass-2 findings on workflow gates Codex flagged five issues in the prior commit's scope/lockfile gates. This commit closes all five: - HIGH: Wrong-PR targeting. Both gates selected the last globally-open attempted_fixes entry, which could match a stale orphan from a prior crashed run rather than the PR opened by *this* run. Adds a pre-fix snapshot step that captures `(id, attempts-length)` pairs before the fix runs, and changes the post-fix selectors to require that the entry's attempts count grew during this run. - HIGH: Docstring-only enforcement gap on the docs-and-references suite. The .py path allowlist was at workflow level but the docstring-only caveat was still policy-only. Adds an AST-based check: for each .py file changed, parse the post-change tree, collect docstring line ranges (module/class/function), then verify every added line in the diff is either inside a docstring, a comment, or whitespace. Verified locally with both pass and fail fixtures. - MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather than `origin/main...origin/$BRANCH`, so a misbehaving agent that left HEAD pointing elsewhere would have validated the wrong tree. Now fetches `origin/$BRANCH` first and prefers that ref. Falls back to HEAD only if fetch fails (with a warning). - MEDIUM: FILE_COUNT bug. `grep -c '.' 
|| echo 0` produced "0\n0" on empty diff, breaking the downstream integer comparison. Replaces with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct for any input including empty. - LOW: Non-atomic JSON writes. The runner-state mutations could leave the file half-written if the workflow was cancelled mid-write. Switches both gates to the temp-file + os.replace pattern. Also: dependencies-lockfile gate now does an explicit `git checkout --detach origin/$BRANCH` before re-running install-dev, so verification runs against what was actually pushed rather than relying on local working-tree state. --- .github/workflows/agentic-ci-daily.yml | 165 ++++++++++++++++++++++--- 1 file changed, 146 insertions(+), 19 deletions(-) diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 2a76f4c7e..5ed05768e 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -209,6 +209,18 @@ jobs: echo "size=${BACKLOG_SIZE}" >> "$GITHUB_OUTPUT" echo "fix_backlog has ${BACKLOG_SIZE} entries" + - name: Snapshot pre-fix attempted_fixes + # Captures (id, attempts-length) pairs before the fix step runs so + # the post-fix gates can identify which entry grew during *this* + # run, instead of grabbing the last globally-open entry (which + # might be a stale orphan from a prior crashed run). 
+ id: snapshot + if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 + run: | + jq -c '.attempted_fixes // [] | map({id, n: (.attempts | length)})' \ + .agentic-ci-state/runner-state.json > /tmp/prior-attempted-fixes.json + echo "Snapshot: $(cat /tmp/prior-attempted-fixes.json)" + - name: Run fix recipe id: fix if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 @@ -248,9 +260,9 @@ jobs: # Workflow-level enforcement of the localized-fix bar from # _fix-policy.md. Recipe instructions alone cannot bind the agent; # this gate re-derives the diff and closes the PR if the agent - # escaped the allowlist or the LOC/file caps. - # Note: the docstring-only constraint on docs-and-references .py - # paths remains policy-level; AST-based enforcement is a follow-up. + # escaped the allowlist or the LOC/file caps. The docs-and-references + # suite additionally gets AST-based docstring-only enforcement on + # .py edits (no non-docstring/non-comment lines may change). id: scope_gate if: steps.fix.outcome == 'success' env: @@ -259,13 +271,19 @@ jobs: run: | set -o pipefail - OPEN=$(jq -c ' - .attempted_fixes // [] - | map(select((.attempts | last | .outcome) == "open")) + # Identify the attempted_fixes entry that grew during *this* run + # (vs the pre-fix snapshot), not the last globally-open entry. + OPEN=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' + (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p + | .attempted_fixes // [] + | map(select( + ((.attempts | last | .outcome) == "open") + and ((.attempts | length) > ($p[.id] // 0)) + )) | last // empty ' .agentic-ci-state/runner-state.json) if [ -z "$OPEN" ] || [ "$OPEN" = "null" ]; then - echo "No open attempted_fix recorded; nothing to validate." 
+ echo "No new open attempted_fix recorded by this run; nothing to validate." exit 0 fi @@ -277,6 +295,17 @@ jobs: exit 0 fi + # Diff against the actual pushed branch (origin/$BRANCH), not + # local HEAD — HEAD may not match what was pushed if the agent + # left the working tree in an unexpected state. + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + if git rev-parse --verify "refs/remotes/origin/$BRANCH" >/dev/null 2>&1; then + DIFF_REF="origin/$BRANCH" + else + echo "::warning::origin/$BRANCH not fetchable; falling back to HEAD" + DIFF_REF="HEAD" + fi + case "$SUITE" in docs-and-references) ALLOW='^(architecture/|docs/|README\.md$|CONTRIBUTING\.md$|DEVELOPMENT\.md$|STYLEGUIDE\.md$|packages/[^/]+/src/.*\.py$)' @@ -293,10 +322,14 @@ jobs: ;; esac - FILES=$(git diff --name-only origin/main...HEAD) - BAD=$(printf '%s\n' "$FILES" | grep -vE "$ALLOW" | grep -v '^$' || true) - FILE_COUNT=$(printf '%s\n' "$FILES" | grep -c '.' || echo 0) - LOC_DELTA=$(git diff --shortstat origin/main...HEAD \ + mapfile -t FILE_ARR < <(git diff --name-only "origin/main...$DIFF_REF") + FILE_COUNT=${#FILE_ARR[@]} + if [ "$FILE_COUNT" -gt 0 ]; then + BAD=$(printf '%s\n' "${FILE_ARR[@]}" | grep -vE "$ALLOW" | grep -v '^$' || true) + else + BAD="" + fi + LOC_DELTA=$(git diff --shortstat "origin/main...$DIFF_REF" \ | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ /deletion/) d=$(i-1) } print a+d }') : "${LOC_DELTA:=0}" @@ -311,6 +344,81 @@ jobs: REASONS="${REASONS}- LOC delta ($LOC_DELTA) exceeds 50-line cap\n" fi + # Docs suite: AST-enforce the docstring-only caveat on .py edits. 
+ if [ "$SUITE" = "docs-and-references" ] && [ "$FILE_COUNT" -gt 0 ]; then + PY_FILES=$(printf '%s\n' "${FILE_ARR[@]}" | grep -E '^packages/[^/]+/src/.*\.py$' || true) + if [ -n "$PY_FILES" ]; then + NON_DOCSTRING=$(PY_FILES="$PY_FILES" DIFF_REF="$DIFF_REF" python3 - <<'PY' + import ast, os, subprocess, sys + + files = [p for p in os.environ['PY_FILES'].splitlines() if p] + diff_ref = os.environ['DIFF_REF'] + violations = [] + + for path in files: + try: + content = subprocess.check_output( + ['git', 'show', f'{diff_ref}:{path}'], text=True, errors='replace' + ) + except subprocess.CalledProcessError: + violations.append(f'{path}: file deleted (deletion not allowed in docs suite)') + continue + try: + tree = ast.parse(content) + except SyntaxError as e: + violations.append(f'{path}: parse error ({e})') + continue + docstring_lines = set() + for node in ast.walk(tree): + if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)): + body = getattr(node, 'body', None) or [] + if (body and isinstance(body[0], ast.Expr) + and isinstance(body[0].value, ast.Constant) + and isinstance(body[0].value.value, str)): + start, end = body[0].lineno, body[0].end_lineno + if start and end: + docstring_lines.update(range(start, end + 1)) + try: + hunks = subprocess.check_output( + ['git', 'diff', '-U0', f'origin/main...{diff_ref}', '--', path], + text=True, errors='replace', + ) + except subprocess.CalledProcessError: + violations.append(f'{path}: could not compute diff') + continue + cur = None + for line in hunks.splitlines(): + if line.startswith('@@'): + try: + plus = line.split('+', 1)[1].split(' ', 1)[0] + cur = int(plus.split(',')[0]) if ',' in plus else int(plus) + except (ValueError, IndexError): + cur = None + continue + if cur is None: + continue + if line.startswith('+') and not line.startswith('+++'): + stripped = line[1:].strip() + ln = cur + cur += 1 + if not stripped or stripped.startswith('#'): + continue + if ln not in docstring_lines: + 
violations.append(f'{path}:{ln}') + elif line.startswith(' '): + cur += 1 + + if violations: + print('\n'.join(violations)) + sys.exit(1) + PY + ) || true + if [ -n "$NON_DOCSTRING" ]; then + REASONS="${REASONS}- non-docstring/non-comment .py edits in docs suite:\n$(echo "$NON_DOCSTRING" | sed 's/^/ - /')\n" + fi + fi + fi + if [ -z "$REASONS" ]; then echo "Scope gate passed: ${FILE_COUNT} file(s), ${LOC_DELTA} LOC, all within allowlist." exit 0 @@ -331,7 +439,8 @@ jobs: FINDING_ID="$FINDING_ID" python3 -c " import json, os finding_id = os.environ['FINDING_ID'] - with open('.agentic-ci-state/runner-state.json') as f: + path = '.agentic-ci-state/runner-state.json' + with open(path) as f: state = json.load(f) for entry in state.get('attempted_fixes', []): if entry.get('id') != finding_id: @@ -340,8 +449,10 @@ jobs: if attempts and attempts[-1].get('outcome') == 'open': attempts[-1]['outcome'] = 'abandoned' attempts[-1]['gate_violation'] = True - with open('.agentic-ci-state/runner-state.json', 'w') as f: + tmp = path + '.tmp' + with open(tmp, 'w') as f: json.dump(state, f, indent=2) + os.replace(tmp, path) " exit 1 @@ -357,13 +468,19 @@ jobs: run: | set -o pipefail - OPEN=$(jq -c ' - .attempted_fixes // [] - | map(select((.attempts | last | .outcome) == "open")) + # Same snapshot-based selector as the scope gate: target only + # the entry whose attempts grew during this run. + OPEN=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' + (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p + | .attempted_fixes // [] + | map(select( + ((.attempts | last | .outcome) == "open") + and ((.attempts | length) > ($p[.id] // 0)) + )) | last // empty ' .agentic-ci-state/runner-state.json) if [ -z "$OPEN" ] || [ "$OPEN" = "null" ]; then - echo "No open attempted_fix; skipping lockfile verification." + echo "No new open attempted_fix; skipping lockfile verification." 
exit 0 fi @@ -371,6 +488,13 @@ jobs: BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') FINDING_ID=$(echo "$OPEN" | jq -r '.id') + # Verify against the actually-pushed branch, not local HEAD. + if [ -n "$BRANCH" ]; then + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + git checkout --detach "refs/remotes/origin/$BRANCH" 2>/dev/null \ + || echo "::warning::Could not checkout origin/$BRANCH; verifying current tree" + fi + if make install-dev 2>&1 | tee /tmp/install-dev-verify.log; then echo "Lockfile resolves cleanly against the agent's pyproject changes." exit 0 @@ -389,7 +513,8 @@ jobs: FINDING_ID="$FINDING_ID" python3 -c " import json, os finding_id = os.environ['FINDING_ID'] - with open('.agentic-ci-state/runner-state.json') as f: + path = '.agentic-ci-state/runner-state.json' + with open(path) as f: state = json.load(f) for entry in state.get('attempted_fixes', []): if entry.get('id') != finding_id: @@ -398,8 +523,10 @@ jobs: if attempts and attempts[-1].get('outcome') == 'open': attempts[-1]['outcome'] = 'abandoned' attempts[-1]['lockfile_verification_failed'] = True - with open('.agentic-ci-state/runner-state.json', 'w') as f: + tmp = path + '.tmp' + with open(tmp, 'w') as f: json.dump(state, f, indent=2) + os.replace(tmp, path) " exit 1 From 68827c156f82a6bd457e748b81f838d08e3140d0 Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Fri, 8 May 2026 00:07:08 +0000 Subject: [PATCH 10/12] fix(agentic-ci): gate fix + scope_gate steps on snapshot.outcome MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Greptile review on 872d5617 flagged that the fix step's custom `if:` expression bypasses GitHub Actions' implicit success() check. Without explicitly referencing steps.snapshot.outcome, a snapshot failure (corrupt runner-state, disk error) would let the fix step run anyway. 
The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json` would then exit non-zero on the missing file, leave OPEN empty, and hit the "nothing to validate" early-exit — silently approving whatever the agent pushed. Adds steps.snapshot.outcome == 'success' to both the fix step's condition (the actual fix) and the scope_gate step's condition (belt-and-suspenders against future refactors). --- .github/workflows/agentic-ci-daily.yml | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 5ed05768e..93832c0e8 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -223,7 +223,12 @@ jobs: - name: Run fix recipe id: fix - if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 + # Custom if: bypasses implicit success(), so snapshot.outcome must + # be checked explicitly. Without it, a snapshot failure (corrupt + # runner-state, disk error) would leave /tmp/prior-attempted-fixes.json + # missing, the scope gate's jq --slurpfile would short-circuit, and + # the gate would exit 0 — silently approving the agent's PR. + if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && steps.snapshot.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 env: ANTHROPIC_BASE_URL: ${{ secrets.AGENTIC_CI_API_BASE_URL }} ANTHROPIC_API_KEY: ${{ secrets.AGENTIC_CI_API_KEY }} @@ -264,7 +269,10 @@ jobs: # suite additionally gets AST-based docstring-only enforcement on # .py edits (no non-docstring/non-comment lines may change). id: scope_gate - if: steps.fix.outcome == 'success' + # Belt-and-suspenders: also check snapshot succeeded. 
The fix step + # already gates on snapshot, so if we got here the snapshot ran, + # but be explicit so a future condition refactor cannot regress. + if: steps.fix.outcome == 'success' && steps.snapshot.outcome == 'success' env: SUITE: ${{ matrix.suite }} GH_TOKEN: ${{ github.token }} From 3a0bb95c583324cb92f00b69f0d2018575780ff1 Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Fri, 8 May 2026 00:38:44 +0000 Subject: [PATCH 11/12] fix(agentic-ci): harden daily fix gates Signed-off-by: Andre Manoel <amanoel@nvidia.com> --- .github/workflows/agentic-ci-daily.yml | 175 +++++++++++++++++++------ 1 file changed, 132 insertions(+), 43 deletions(-) diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 93832c0e8..7db78d141 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -298,20 +298,29 @@ jobs: BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') FINDING_ID=$(echo "$OPEN" | jq -r '.id') + + REASONS="" + if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + if [ -n "$BRANCH" ]; then + echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + fi + fi if [ -z "$BRANCH" ]; then - echo "::warning::Open attempt has no branch; skipping gate." - exit 0 + REASONS="${REASONS}- open attempt has no branch and could not be recovered from PR ${PR_NUMBER:-unknown}\n" fi # Diff against the actual pushed branch (origin/$BRANCH), not # local HEAD — HEAD may not match what was pushed if the agent # left the working tree in an unexpected state. 
- git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true - if git rev-parse --verify "refs/remotes/origin/$BRANCH" >/dev/null 2>&1; then - DIFF_REF="origin/$BRANCH" - else - echo "::warning::origin/$BRANCH not fetchable; falling back to HEAD" - DIFF_REF="HEAD" + if [ -n "$BRANCH" ]; then + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + if git rev-parse --verify "refs/remotes/origin/$BRANCH" >/dev/null 2>&1; then + DIFF_REF="origin/$BRANCH" + else + echo "::warning::origin/$BRANCH not fetchable; falling back to HEAD" + DIFF_REF="HEAD" + fi fi case "$SUITE" in @@ -330,18 +339,24 @@ jobs: ;; esac - mapfile -t FILE_ARR < <(git diff --name-only "origin/main...$DIFF_REF") - FILE_COUNT=${#FILE_ARR[@]} - if [ "$FILE_COUNT" -gt 0 ]; then - BAD=$(printf '%s\n' "${FILE_ARR[@]}" | grep -vE "$ALLOW" | grep -v '^$' || true) + if [ -n "$BRANCH" ]; then + mapfile -t FILE_ARR < <(git diff --name-only "origin/main...$DIFF_REF") + FILE_COUNT=${#FILE_ARR[@]} + if [ "$FILE_COUNT" -gt 0 ]; then + BAD=$(printf '%s\n' "${FILE_ARR[@]}" | grep -vE "$ALLOW" | grep -v '^$' || true) + else + BAD="" + fi + LOC_DELTA=$(git diff --shortstat "origin/main...$DIFF_REF" \ + | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ /deletion/) d=$(i-1) } print a+d }') + : "${LOC_DELTA:=0}" else + FILE_ARR=() + FILE_COUNT=0 BAD="" + LOC_DELTA=0 fi - LOC_DELTA=$(git diff --shortstat "origin/main...$DIFF_REF" \ - | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ /deletion/) d=$(i-1) } print a+d }') - : "${LOC_DELTA:=0}" - REASONS="" if [ -n "$BAD" ]; then REASONS="${REASONS}- files outside allowlist:\n$(echo "$BAD" | sed 's/^/ - /')\n" fi @@ -357,26 +372,42 @@ jobs: PY_FILES=$(printf '%s\n' "${FILE_ARR[@]}" | grep -E '^packages/[^/]+/src/.*\.py$' || true) if [ -n "$PY_FILES" ]; then NON_DOCSTRING=$(PY_FILES="$PY_FILES" DIFF_REF="$DIFF_REF" python3 - <<'PY' - import ast, os, subprocess, sys + import ast + import os + import re + 
import subprocess + import sys files = [p for p in os.environ['PY_FILES'].splitlines() if p] diff_ref = os.environ['DIFF_REF'] violations = [] + hunk_re = re.compile(r'@@ -(?P<old>\d+)(?:,\d+)? \+(?P<new>\d+)(?:,\d+)? @@') - for path in files: + try: + base_ref = subprocess.check_output( + ['git', 'merge-base', 'origin/main', diff_ref], text=True + ).strip() + except subprocess.CalledProcessError: + violations.append(f'could not compute merge base for origin/main...{diff_ref}') + base_ref = 'origin/main' + + def collect_docstring_lines(ref, path, *, missing_ok=False): try: content = subprocess.check_output( - ['git', 'show', f'{diff_ref}:{path}'], text=True, errors='replace' + ['git', 'show', f'{ref}:{path}'], text=True, errors='replace' ) except subprocess.CalledProcessError: - violations.append(f'{path}: file deleted (deletion not allowed in docs suite)') - continue + if missing_ok: + return set() + violations.append(f'{path}: file missing at {ref}') + return None try: tree = ast.parse(content) except SyntaxError as e: - violations.append(f'{path}: parse error ({e})') - continue - docstring_lines = set() + violations.append(f'{path}: parse error at {ref} ({e})') + return None + + lines = set() for node in ast.walk(tree): if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)): body = getattr(node, 'body', None) or [] @@ -385,7 +416,17 @@ jobs: and isinstance(body[0].value.value, str)): start, end = body[0].lineno, body[0].end_lineno if start and end: - docstring_lines.update(range(start, end + 1)) + lines.update(range(start, end + 1)) + return lines + + for path in files: + old_docstring_lines = collect_docstring_lines(base_ref, path, missing_ok=True) + new_docstring_lines = collect_docstring_lines(diff_ref, path) + if new_docstring_lines is None: + violations.append(f'{path}: file deleted or unreadable at {diff_ref}') + continue + if old_docstring_lines is None: + continue try: hunks = subprocess.check_output( ['git', 'diff', 
'-U0', f'origin/main...{diff_ref}', '--', path], @@ -394,27 +435,39 @@ jobs: except subprocess.CalledProcessError: violations.append(f'{path}: could not compute diff') continue - cur = None + old_cur = None + new_cur = None for line in hunks.splitlines(): if line.startswith('@@'): - try: - plus = line.split('+', 1)[1].split(' ', 1)[0] - cur = int(plus.split(',')[0]) if ',' in plus else int(plus) - except (ValueError, IndexError): - cur = None + match = hunk_re.match(line) + if match: + old_cur = int(match.group('old')) + new_cur = int(match.group('new')) + else: + old_cur = None + new_cur = None continue - if cur is None: + if old_cur is None or new_cur is None: continue if line.startswith('+') and not line.startswith('+++'): stripped = line[1:].strip() - ln = cur - cur += 1 + ln = new_cur + new_cur += 1 if not stripped or stripped.startswith('#'): continue - if ln not in docstring_lines: - violations.append(f'{path}:{ln}') + if ln not in new_docstring_lines: + violations.append(f'{path}:{ln} added outside docstring') + elif line.startswith('-') and not line.startswith('---'): + stripped = line[1:].strip() + ln = old_cur + old_cur += 1 + if not stripped or stripped.startswith('#'): + continue + if ln not in old_docstring_lines: + violations.append(f'{path}:{ln} removed outside docstring') elif line.startswith(' '): - cur += 1 + old_cur += 1 + new_cur += 1 if violations: print('\n'.join(violations)) @@ -439,9 +492,11 @@ jobs: MSG=$(printf 'Closed by workflow scope gate. The pushed diff violated the localized-fix bar (see `.agents/recipes/_fix-policy.md`):\n\n%b\nThe `attempted_fixes` entry has been flipped to `abandoned`.' 
"$REASONS") gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ echo "::warning::gh pr close failed; branch may need manual cleanup" - else + elif [ -n "$BRANCH" ]; then git push origin --delete "$BRANCH" || \ echo "::warning::Could not delete remote branch $BRANCH" + else + echo "::warning::No PR number or branch available for cleanup" fi FINDING_ID="$FINDING_ID" python3 -c " @@ -463,14 +518,16 @@ jobs: os.replace(tmp, path) " - exit 1 + echo "rejected=true" >> "$GITHUB_OUTPUT" + exit 0 - name: Verify dependencies lockfile # Dependencies suite only: re-run make install-dev against the # agent's pyproject.toml changes. This catches the failure mode # where the per-package test target passed against the *old* # lockfile but the proposed dep does not actually resolve. - if: matrix.suite == 'dependencies' && steps.fix.outcome == 'success' && steps.scope_gate.outcome == 'success' + id: lockfile_gate + if: matrix.suite == 'dependencies' && steps.fix.outcome == 'success' && steps.scope_gate.outcome == 'success' && steps.scope_gate.outputs.rejected != 'true' env: GH_TOKEN: ${{ github.token }} run: | @@ -497,6 +554,12 @@ jobs: FINDING_ID=$(echo "$OPEN" | jq -r '.id') # Verify against the actually-pushed branch, not local HEAD. + if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + if [ -n "$BRANCH" ]; then + echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + fi + fi if [ -n "$BRANCH" ]; then git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true git checkout --detach "refs/remotes/origin/$BRANCH" 2>/dev/null \ @@ -514,8 +577,10 @@ jobs: MSG="Closed by workflow lockfile verification. \`make install-dev\` failed against the agent's \`pyproject.toml\` changes — the dependency edit does not resolve cleanly. See \`/tmp/install-dev-verify.log\` in the workflow artifact." 
gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ echo "::warning::gh pr close failed" - else + elif [ -n "$BRANCH" ]; then git push origin --delete "$BRANCH" || true + else + echo "::warning::No PR number or branch available for cleanup" fi FINDING_ID="$FINDING_ID" python3 -c " @@ -537,7 +602,8 @@ jobs: os.replace(tmp, path) " - exit 1 + echo "rejected=true" >> "$GITHUB_OUTPUT" + exit 0 - name: Update runner memory if: always() @@ -545,7 +611,7 @@ jobs: SUITE: ${{ matrix.suite }} AUDIT_OUTCOME: ${{ steps.audit.outcome }} run: | - # Always validate state (cache saves regardless of outcome) + # Always validate state before cache save/post-job handling. python3 -c " import json, datetime, os try: @@ -589,12 +655,20 @@ jobs: AUDIT_OUTCOME: ${{ steps.audit.outcome }} FIX_OUTCOME: ${{ steps.fix.outcome }} BACKLOG_SIZE: ${{ steps.backlog.outputs.size }} + SCOPE_REJECTED: ${{ steps.scope_gate.outputs.rejected }} + LOCKFILE_REJECTED: ${{ steps.lockfile_gate.outputs.rejected }} run: | echo "## Daily Audit: ${SUITE}" >> "$GITHUB_STEP_SUMMARY" echo "" >> "$GITHUB_STEP_SUMMARY" echo "- Audit: \`${AUDIT_OUTCOME:-unknown}\`" >> "$GITHUB_STEP_SUMMARY" echo "- Fix backlog size: \`${BACKLOG_SIZE:-n/a}\`" >> "$GITHUB_STEP_SUMMARY" echo "- Fix: \`${FIX_OUTCOME:-skipped}\`" >> "$GITHUB_STEP_SUMMARY" + if [ "${SCOPE_REJECTED}" = "true" ]; then + echo "- Scope gate: \`rejected and closed PR\`" >> "$GITHUB_STEP_SUMMARY" + fi + if [ "${LOCKFILE_REJECTED}" = "true" ]; then + echo "- Lockfile gate: \`rejected and closed PR\`" >> "$GITHUB_STEP_SUMMARY" + fi echo "" >> "$GITHUB_STEP_SUMMARY" if [ -s "/tmp/audit-${SUITE}.md" ]; then @@ -602,3 +676,18 @@ jobs: else echo "No report generated. See the \`claude-audit-log-*\` artifact on failures for the full event stream." 
>> "$GITHUB_STEP_SUMMARY" fi + + - name: Save rejected gate state + if: always() && (steps.scope_gate.outputs.rejected == 'true' || steps.lockfile_gate.outputs.rejected == 'true') + uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5 + with: + path: | + .agentic-ci-state + graphify-out + key: agentic-ci-state-${{ matrix.suite }}-${{ github.run_id }}-${{ github.run_attempt }}-rejected + + - name: Fail rejected fix gates + if: always() && (steps.scope_gate.outputs.rejected == 'true' || steps.lockfile_gate.outputs.rejected == 'true') + run: | + echo "::error::A post-fix gate rejected and closed the agent PR. Runner memory was saved before failing." + exit 1 From c1d55708c36963fc986aa70f0685bb812001bfea Mon Sep 17 00:00:00 2001 From: Andre Manoel <amanoel@nvidia.com> Date: Fri, 8 May 2026 20:23:52 +0000 Subject: [PATCH 12/12] fix(agentic-ci): validate all grown fix attempts --- .github/workflows/agentic-ci-daily.yml | 295 ++++++++++++++----------- 1 file changed, 160 insertions(+), 135 deletions(-) diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index 7db78d141..247a13122 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -279,99 +279,107 @@ jobs: run: | set -o pipefail - # Identify the attempted_fixes entry that grew during *this* run - # (vs the pre-fix snapshot), not the last globally-open entry. - OPEN=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' + # Identify every attempted_fixes entry that grew during *this* run + # (vs the pre-fix snapshot), not just the last globally-open entry. 
+ OPEN_ENTRIES=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p - | .attempted_fixes // [] - | map(select( + | [ + .attempted_fixes // [] + | .[] + | select( ((.attempts | last | .outcome) == "open") and ((.attempts | length) > ($p[.id] // 0)) - )) - | last // empty + ) + ] ' .agentic-ci-state/runner-state.json) - if [ -z "$OPEN" ] || [ "$OPEN" = "null" ]; then + OPEN_COUNT=$(echo "$OPEN_ENTRIES" | jq 'length') + if [ "$OPEN_COUNT" -eq 0 ]; then echo "No new open attempted_fix recorded by this run; nothing to validate." exit 0 fi - BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') - PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') - FINDING_ID=$(echo "$OPEN" | jq -r '.id') + echo "Validating ${OPEN_COUNT} new open attempted_fix entries." + REJECTED=0 - REASONS="" - if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then - BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) - if [ -n "$BRANCH" ]; then - echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + while IFS= read -r OPEN; do + BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') + PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') + FINDING_ID=$(echo "$OPEN" | jq -r '.id') + DIFF_REF="" + REASONS="" + + if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + if [ -n "$BRANCH" ]; then + echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." 
+ fi + fi + if [ -z "$BRANCH" ]; then + REASONS="${REASONS}- open attempt has no branch and could not be recovered from PR ${PR_NUMBER:-unknown}\n" fi - fi - if [ -z "$BRANCH" ]; then - REASONS="${REASONS}- open attempt has no branch and could not be recovered from PR ${PR_NUMBER:-unknown}\n" - fi - # Diff against the actual pushed branch (origin/$BRANCH), not - # local HEAD — HEAD may not match what was pushed if the agent - # left the working tree in an unexpected state. - if [ -n "$BRANCH" ]; then - git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true - if git rev-parse --verify "refs/remotes/origin/$BRANCH" >/dev/null 2>&1; then - DIFF_REF="origin/$BRANCH" - else - echo "::warning::origin/$BRANCH not fetchable; falling back to HEAD" - DIFF_REF="HEAD" + # Diff against the actual pushed branch (origin/$BRANCH), not + # local HEAD — HEAD may not match what was pushed if the agent + # left the working tree in an unexpected state. + if [ -n "$BRANCH" ]; then + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + if git rev-parse --verify "refs/remotes/origin/$BRANCH" >/dev/null 2>&1; then + DIFF_REF="origin/$BRANCH" + else + echo "::warning::origin/$BRANCH not fetchable; falling back to HEAD" + DIFF_REF="HEAD" + fi fi - fi - case "$SUITE" in - docs-and-references) - ALLOW='^(architecture/|docs/|README\.md$|CONTRIBUTING\.md$|DEVELOPMENT\.md$|STYLEGUIDE\.md$|packages/[^/]+/src/.*\.py$)' - ;; - dependencies) - ALLOW='^packages/[^/]+/pyproject\.toml$' - ;; - structure|code-quality) - ALLOW='^packages/[^/]+/src/.*\.py$' - ;; - *) - echo "::error::No allowlist defined for suite: $SUITE" - exit 1 - ;; - esac + case "$SUITE" in + docs-and-references) + ALLOW='^(architecture/|docs/|README\.md$|CONTRIBUTING\.md$|DEVELOPMENT\.md$|STYLEGUIDE\.md$|packages/[^/]+/src/.*\.py$)' + ;; + dependencies) + ALLOW='^packages/[^/]+/pyproject\.toml$' + ;; + structure|code-quality) + ALLOW='^packages/[^/]+/src/.*\.py$' + ;; + *) + echo "::error::No allowlist defined for suite: 
$SUITE" + exit 1 + ;; + esac - if [ -n "$BRANCH" ]; then - mapfile -t FILE_ARR < <(git diff --name-only "origin/main...$DIFF_REF") - FILE_COUNT=${#FILE_ARR[@]} - if [ "$FILE_COUNT" -gt 0 ]; then - BAD=$(printf '%s\n' "${FILE_ARR[@]}" | grep -vE "$ALLOW" | grep -v '^$' || true) + if [ -n "$BRANCH" ]; then + mapfile -t FILE_ARR < <(git diff --name-only "origin/main...$DIFF_REF") + FILE_COUNT=${#FILE_ARR[@]} + if [ "$FILE_COUNT" -gt 0 ]; then + BAD=$(printf '%s\n' "${FILE_ARR[@]}" | grep -vE "$ALLOW" | grep -v '^$' || true) + else + BAD="" + fi + LOC_DELTA=$(git diff --shortstat "origin/main...$DIFF_REF" \ + | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ /deletion/) d=$(i-1) } print a+d }') + : "${LOC_DELTA:=0}" else + FILE_ARR=() + FILE_COUNT=0 BAD="" + LOC_DELTA=0 fi - LOC_DELTA=$(git diff --shortstat "origin/main...$DIFF_REF" \ - | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ /deletion/) d=$(i-1) } print a+d }') - : "${LOC_DELTA:=0}" - else - FILE_ARR=() - FILE_COUNT=0 - BAD="" - LOC_DELTA=0 - fi - if [ -n "$BAD" ]; then - REASONS="${REASONS}- files outside allowlist:\n$(echo "$BAD" | sed 's/^/ - /')\n" - fi - if [ "$FILE_COUNT" -gt 3 ]; then - REASONS="${REASONS}- file count ($FILE_COUNT) exceeds 3-file cap\n" - fi - if [ "$LOC_DELTA" -gt 50 ]; then - REASONS="${REASONS}- LOC delta ($LOC_DELTA) exceeds 50-line cap\n" - fi + if [ -n "$BAD" ]; then + REASONS="${REASONS}- files outside allowlist:\n$(echo "$BAD" | sed 's/^/ - /')\n" + fi + if [ "$FILE_COUNT" -gt 3 ]; then + REASONS="${REASONS}- file count ($FILE_COUNT) exceeds 3-file cap\n" + fi + if [ "$LOC_DELTA" -gt 50 ]; then + REASONS="${REASONS}- LOC delta ($LOC_DELTA) exceeds 50-line cap\n" + fi - # Docs suite: AST-enforce the docstring-only caveat on .py edits. 
- if [ "$SUITE" = "docs-and-references" ] && [ "$FILE_COUNT" -gt 0 ]; then - PY_FILES=$(printf '%s\n' "${FILE_ARR[@]}" | grep -E '^packages/[^/]+/src/.*\.py$' || true) - if [ -n "$PY_FILES" ]; then - NON_DOCSTRING=$(PY_FILES="$PY_FILES" DIFF_REF="$DIFF_REF" python3 - <<'PY' + # Docs suite: AST-enforce the docstring-only caveat on .py edits. + if [ "$SUITE" = "docs-and-references" ] && [ "$FILE_COUNT" -gt 0 ]; then + PY_FILES=$(printf '%s\n' "${FILE_ARR[@]}" | grep -E '^packages/[^/]+/src/.*\.py$' || true) + if [ -n "$PY_FILES" ]; then + NON_DOCSTRING=$(PY_FILES="$PY_FILES" DIFF_REF="$DIFF_REF" python3 - <<'PY' import ast import os import re @@ -473,33 +481,34 @@ jobs: print('\n'.join(violations)) sys.exit(1) PY - ) || true - if [ -n "$NON_DOCSTRING" ]; then - REASONS="${REASONS}- non-docstring/non-comment .py edits in docs suite:\n$(echo "$NON_DOCSTRING" | sed 's/^/ - /')\n" + ) || true + if [ -n "$NON_DOCSTRING" ]; then + REASONS="${REASONS}- non-docstring/non-comment .py edits in docs suite:\n$(echo "$NON_DOCSTRING" | sed 's/^/ - /')\n" + fi fi fi - fi - if [ -z "$REASONS" ]; then - echo "Scope gate passed: ${FILE_COUNT} file(s), ${LOC_DELTA} LOC, all within allowlist." - exit 0 - fi - - echo "::error::Scope gate violation" - printf '%b' "$REASONS" + if [ -z "$REASONS" ]; then + echo "Scope gate passed for ${FINDING_ID}: ${FILE_COUNT} file(s), ${LOC_DELTA} LOC, all within allowlist." + continue + fi - if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then - MSG=$(printf 'Closed by workflow scope gate. The pushed diff violated the localized-fix bar (see `.agents/recipes/_fix-policy.md`):\n\n%b\nThe `attempted_fixes` entry has been flipped to `abandoned`.' 
"$REASONS") - gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ - echo "::warning::gh pr close failed; branch may need manual cleanup" - elif [ -n "$BRANCH" ]; then - git push origin --delete "$BRANCH" || \ - echo "::warning::Could not delete remote branch $BRANCH" - else - echo "::warning::No PR number or branch available for cleanup" - fi + REJECTED=1 + echo "::error::Scope gate violation for ${FINDING_ID}" + printf '%b' "$REASONS" + + if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + MSG=$(printf 'Closed by workflow scope gate. The pushed diff violated the localized-fix bar (see `.agents/recipes/_fix-policy.md`):\n\n%b\nThe `attempted_fixes` entry has been flipped to `abandoned`.' "$REASONS") + gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ + echo "::warning::gh pr close failed; branch may need manual cleanup" + elif [ -n "$BRANCH" ]; then + git push origin --delete "$BRANCH" || \ + echo "::warning::Could not delete remote branch $BRANCH" + else + echo "::warning::No PR number or branch available for cleanup" + fi - FINDING_ID="$FINDING_ID" python3 -c " + FINDING_ID="$FINDING_ID" python3 -c " import json, os finding_id = os.environ['FINDING_ID'] path = '.agentic-ci-state/runner-state.json' @@ -518,7 +527,11 @@ jobs: os.replace(tmp, path) " - echo "rejected=true" >> "$GITHUB_OUTPUT" + done < <(echo "$OPEN_ENTRIES" | jq -c '.[]') + + if [ "$REJECTED" -eq 1 ]; then + echo "rejected=true" >> "$GITHUB_OUTPUT" + fi exit 0 - name: Verify dependencies lockfile @@ -527,63 +540,71 @@ jobs: # where the per-package test target passed against the *old* # lockfile but the proposed dep does not actually resolve. 
id: lockfile_gate - if: matrix.suite == 'dependencies' && steps.fix.outcome == 'success' && steps.scope_gate.outcome == 'success' && steps.scope_gate.outputs.rejected != 'true' + if: matrix.suite == 'dependencies' && steps.fix.outcome == 'success' && steps.scope_gate.outcome == 'success' env: GH_TOKEN: ${{ github.token }} run: | set -o pipefail - # Same snapshot-based selector as the scope gate: target only - # the entry whose attempts grew during this run. - OPEN=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' + # Same snapshot-based selector as the scope gate: target every + # entry whose attempts grew during this run. + OPEN_ENTRIES=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p - | .attempted_fixes // [] - | map(select( + | [ + .attempted_fixes // [] + | .[] + | select( ((.attempts | last | .outcome) == "open") and ((.attempts | length) > ($p[.id] // 0)) - )) - | last // empty + ) + ] ' .agentic-ci-state/runner-state.json) - if [ -z "$OPEN" ] || [ "$OPEN" = "null" ]; then + OPEN_COUNT=$(echo "$OPEN_ENTRIES" | jq 'length') + if [ "$OPEN_COUNT" -eq 0 ]; then echo "No new open attempted_fix; skipping lockfile verification." exit 0 fi - PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') - BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') - FINDING_ID=$(echo "$OPEN" | jq -r '.id') + echo "Verifying dependencies lockfile for ${OPEN_COUNT} new open attempted_fix entries." + REJECTED=0 - # Verify against the actually-pushed branch, not local HEAD. 
- if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then - BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + while IFS= read -r OPEN; do + PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') + BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') + FINDING_ID=$(echo "$OPEN" | jq -r '.id') + + # Verify against the actually-pushed branch, not local HEAD. + if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + if [ -n "$BRANCH" ]; then + echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + fi + fi if [ -n "$BRANCH" ]; then - echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + git checkout --force --detach "refs/remotes/origin/$BRANCH" 2>/dev/null \ + || echo "::warning::Could not checkout origin/$BRANCH; verifying current tree" fi - fi - if [ -n "$BRANCH" ]; then - git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true - git checkout --detach "refs/remotes/origin/$BRANCH" 2>/dev/null \ - || echo "::warning::Could not checkout origin/$BRANCH; verifying current tree" - fi - if make install-dev 2>&1 | tee /tmp/install-dev-verify.log; then - echo "Lockfile resolves cleanly against the agent's pyproject changes." - exit 0 - fi + if make install-dev 2>&1 | tee /tmp/install-dev-verify.log; then + echo "Lockfile resolves cleanly for ${FINDING_ID}." + continue + fi - echo "::error::make install-dev failed against the agent's pyproject changes" + REJECTED=1 + echo "::error::make install-dev failed against the agent's pyproject changes for ${FINDING_ID}" - if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then - MSG="Closed by workflow lockfile verification. 
\`make install-dev\` failed against the agent's \`pyproject.toml\` changes — the dependency edit does not resolve cleanly. See \`/tmp/install-dev-verify.log\` in the workflow artifact." - gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ - echo "::warning::gh pr close failed" - elif [ -n "$BRANCH" ]; then - git push origin --delete "$BRANCH" || true - else - echo "::warning::No PR number or branch available for cleanup" - fi + if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + MSG="Closed by workflow lockfile verification. \`make install-dev\` failed against the agent's \`pyproject.toml\` changes — the dependency edit does not resolve cleanly. See \`/tmp/install-dev-verify.log\` in the workflow artifact." + gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ + echo "::warning::gh pr close failed" + elif [ -n "$BRANCH" ]; then + git push origin --delete "$BRANCH" || true + else + echo "::warning::No PR number or branch available for cleanup" + fi - FINDING_ID="$FINDING_ID" python3 -c " + FINDING_ID="$FINDING_ID" python3 -c " import json, os finding_id = os.environ['FINDING_ID'] path = '.agentic-ci-state/runner-state.json' @@ -602,7 +623,11 @@ jobs: os.replace(tmp, path) " - echo "rejected=true" >> "$GITHUB_OUTPUT" + done < <(echo "$OPEN_ENTRIES" | jq -c '.[]') + + if [ "$REJECTED" -eq 1 ]; then + echo "rejected=true" >> "$GITHUB_OUTPUT" + fi exit 0 - name: Update runner memory
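
For reference outside the patch context, the snapshot-based "grew during this run" selection that both the scope gate and the lockfile gate share (patch 12's jq selector) can be sketched in plain Python. This is a hedged illustration, not part of the workflow: the field names (`id`, `attempts`, `outcome`) mirror the runner-state schema visible in the diffs, and the sample data is invented.

```python
def select_new_open_fixes(prior_snapshot, attempted_fixes):
    """Mirror of the jq selector: keep entries whose last attempt is
    'open' AND whose attempt count grew versus the pre-fix snapshot,
    i.e. attempts opened by *this* run rather than a previous one."""
    # prior_snapshot is the /tmp/prior-attempted-fixes.json shape:
    # [{"id": ..., "n": <attempt count before the fix step ran>}]
    prior = {e["id"]: e["n"] for e in (prior_snapshot or [])}
    selected = []
    for entry in attempted_fixes or []:
        attempts = entry.get("attempts", [])
        if not attempts or attempts[-1].get("outcome") != "open":
            continue
        if len(attempts) <= prior.get(entry["id"], 0):
            continue  # was already open before this run; not ours to gate
        selected.append(entry)
    return selected

# Invented sample: "abc" was open before the run and did not grow;
# "def" gained an attempt during this run, so only "def" is selected.
prior = [{"id": "abc", "n": 1}]
state = [
    {"id": "abc", "attempts": [{"outcome": "open"}]},
    {"id": "def", "attempts": [{"outcome": "abandoned"}, {"outcome": "open"}]},
]
print([e["id"] for e in select_new_open_fixes(prior, state)])  # ['def']
```

This also shows why patch 12 iterates over the full selected list instead of taking `last`: a single fix run may record more than one grown open entry, and validating only the last would let the others bypass the gates.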