diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md new file mode 100644 index 000000000..210fd5435 --- /dev/null +++ b/.agents/recipes/_fix-policy.md @@ -0,0 +1,257 @@ +# Agentic CI Fix Policy + +Prepended to every daily-suite recipe alongside `_runner.md`. Defines what +"open a PR" means for these recipes and the rules that apply across all of +them. Each suite recipe declares only its eligible finding categories, its +branch types, and any risk-specific notes — everything else is here. + +When in doubt, fall back to report-only. + +## Localized fix bar + +A finding may be converted to a fix only if all hold: + +- **Bounded scope**: ≤3 files, ≤50 LOC net. +- **Reversible**: no public API changes, no `__all__` deletions, no version + bumps (Dependabot owns those), no schema changes, no migrations. +- **Self-evident**: the audit established both the problem *and* the unique + correct fix. Mechanical, not interpretive. +- **Test-safe**: when the recipe declares `test_required`, run the + per-package test target for the affected package and abort on failure. + Mapping (the Makefile does not expose `test-` directly): + + | Package directory | Test target | + |-------------------|-------------| + | `packages/data-designer-config` | `make test-config` | + | `packages/data-designer-engine` | `make test-engine` | + | `packages/data-designer` | `make test-interface` | +- **Single concern**: one finding per PR. +- **Allowlisted paths**: matches the suite's path allowlist. + +If the top-ranked candidate fails the bar, try the next. If none of the top +5 qualify, skip the fix step and emit report-only. 
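The bounded-scope part of the bar above lends itself to a mechanical check. The sketch below is illustrative only: `FileChange` and `within_bounded_scope` are hypothetical names, and reading "≤50 LOC net" as the absolute value of additions minus deletions across the whole diff is an assumption, not something the policy states.

```python
from dataclasses import dataclass

MAX_FILES = 3     # "<=3 files" from the bounded-scope rule
MAX_NET_LOC = 50  # "<=50 LOC net" from the bounded-scope rule


@dataclass(frozen=True)
class FileChange:
    """Hypothetical per-file diff stat, e.g. parsed from `git diff --numstat`."""

    path: str
    added: int
    deleted: int


def within_bounded_scope(changes: list[FileChange]) -> bool:
    """Return True when the diff stays within the file and net-LOC limits.

    Assumption: "net" means |added - deleted| summed over the whole diff.
    """
    net_loc = abs(sum(c.added - c.deleted for c in changes))
    return len(changes) <= MAX_FILES and net_loc <= MAX_NET_LOC
```

A check like this covers only the first bullet; the reversibility and self-evidence criteria still need judgement from the recipe.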
+ +## Allowlists + +### Per-suite path allowlist + +| Suite | Paths the recipe MAY modify | +|-------|-----------------------------| +| docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) | +| dependencies | `packages/*/pyproject.toml` | +| structure | `packages/*/src/**/*.py` | +| code-quality | `packages/*/src/**/*.py` | +| test-health | (no fix phase) | + +### Shared forbidden paths (all suites) + +- `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`, + `.git/**`, anything in `.gitignore`. + +### Shared forbidden commands + +- `git push --force` (any variant), `git rebase`, `git reset --hard`, + `git branch -D`/`-d`/`--delete`. +- `gh pr merge`, `gh pr close`, `gh pr review`. +- `pip install`, `uv pip install` (use `make install-dev` only). + +## Runner-state schema + +Each daily recipe maintains two arrays in +`{{memory_path}}/runner-state.json` beyond the existing `known_issues` / +`baselines`: + +```json +{ + "fix_backlog": [ + { "id": "", "category": "...", "first_seen": "YYYY-MM-DD", + "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } } + ], + "attempted_fixes": [ + { "id": "", "attempts": [ + { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD", + "branch": "agentic-ci/..." } + ] } + ] +} +``` + +Also: `draft_until_proven` (boolean, per-suite, default `true` for +code-quality and unset elsewhere) controls draft-PR mode. + +### `fix_backlog` rules (audit phase populates this) + +- Append every detected finding in an eligible category. If `id` is already + present, **refresh both `last_seen` and `data`** with the current scan's + values. The `data` field is used by the fix phase to apply the change + without re-scanning, so stale `data` would let an old plan drive a new + PR after the underlying file moved or changed. +- Drop entries with `last_seen` older than 30 days. 
+- Cap at 200 entries (drop oldest by `first_seen`). +- Populated **before** the `known_issues` filter so fixable findings persist + even when their report row is suppressed for being unchanged. + +### `attempted_fixes` rules + +`outcome` ∈ `{open, merged, closed, abandoned}`. + +- `abandoned` means the recipe could not produce a PR (tests failed, + conflict, lint failed, allowlist rejected, etc.). +- Reconcile against open PRs (`gh pr list`) at the start of each fix + run to recover from crashes that left state un-updated. The + reconciliation algorithm: list open PRs whose bodies contain the + `` marker, parse out + each ``, and back-fill any missing `attempted_fixes` entries with + `outcome: "open"` and the parsed `pr_number` and `branch`. +- Prune: drop `merged` entries older than 90 days. Do **not** prune + `closed` or `abandoned` entries by age — pruning a single-strike entry + would erase the history needed to ever reach the two-strike threshold. +- The 200-entry cap handles long-tail cleanup. Eviction order: + non-two-strike entries first, oldest-first by `attempts[0].at`. + Two-strike entries (≥2 `closed`/`abandoned`) are exempt from cap + eviction unless every other entry has already been evicted — they + represent maintainer-action signals and must not be silently + forgotten. If two-strike entries alone exceed 200, that's itself a + signal worth surfacing; in that pathological case, evict oldest-first + by `attempts[0].at`. +- Two-strike entries surface in the report under + `Repeatedly-failed fix attempts` and are filtered from selection + permanently. 
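The eviction order for `attempted_fixes` can be sketched as a single sort. This is a minimal illustration, assuming entries follow the schema shown earlier, every entry has at least one attempt, ids are unique, and `at` is an ISO `YYYY-MM-DD` string (so string order matches date order). The function names are hypothetical.

```python
CAP = 200
FAILED_OUTCOMES = {"closed", "abandoned"}


def is_two_strike(entry: dict) -> bool:
    """True when the entry has at least two closed/abandoned attempts."""
    failed = sum(1 for a in entry["attempts"] if a["outcome"] in FAILED_OUTCOMES)
    return failed >= 2


def apply_cap(entries: list[dict], cap: int = CAP) -> list[dict]:
    """Return the entries that survive the cap, preserving their original order."""
    # Eviction order: non-two-strike entries first (False sorts before True),
    # then oldest-first by the date of the first attempt.
    eviction_order = sorted(
        entries, key=lambda e: (is_two_strike(e), e["attempts"][0]["at"])
    )
    excess = max(0, len(entries) - cap)
    evicted_ids = {e["id"] for e in eviction_order[:excess]}
    return [e for e in entries if e["id"] not in evicted_ids]
```

Because two-strike entries sort last, they are evicted only once every other entry is gone, and then oldest-first, which matches the pathological case described above.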
+ +## Finding hash + +`finding_id = sha1(suite + ":" + canonical_key)[:12]`, where +`canonical_key` uses durable identifiers only — never line numbers or free +text: + +| Suite (category) | canonical_key | +|------------------|---------------| +| docs (broken-link) | `:` | +| docs (docstring-drift) | `:::` | +| docs (arch-ref-rename) | `:` | +| dependencies (transitive-gap) | `::transitive` | +| dependencies (unused) | `::unused` | +| structure (missing-future) | `:missing-future` | +| structure (lazy-import) | `:lazy-import:` | +| code-quality (bare-except) | `::::bare-except` | + +Symbols use fully-qualified Python names. +`try-body-hash` is `sha1()[:8]`. +`ordinal` is the 1-based position of this bare-except among bare-excepts +in the same enclosing symbol, in source order. Both are needed: the body +hash distinguishes most cases, and the ordinal disambiguates the rare +case of two bare-except blocks with byte-identical try bodies. + +## Ranking + +Earlier criteria override later ones: + +1. **Fix confidence** (per-category): + + | Category | Confidence | + |----------|-----------| + | structure / missing-future | 1.0 | + | structure / lazy-import | 0.9 | + | docs / broken-link | 0.9 | + | dependencies / transitive-gap | 0.85 | + | docs / arch-ref-rename | 0.8 | + | dependencies / unused | 0.75 | + | docs / docstring-drift | 0.75 | + | code-quality / bare-except | 0.6 | + +2. **Defect severity**: + + | Severity | Examples | + |----------|----------| + | high | missing transitive dep, heavy import bypassing lazy system | + | medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API | + | low | broken link in dev-notes, missing `__future__ import annotations`, unused dep | + +3. **User-facing impact** — visible to docs-site readers or plugin + consumers vs internal-only. + +4. **Recency** — newer findings rank above long-standing ones. 
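Since earlier criteria override later ones, the ranking amounts to a lexicographic sort. The sketch below assumes each finding is a dict with hypothetical `suite`, `category`, `severity`, `user_facing`, and `first_seen` keys, and that recency is measured by `first_seen`; none of those field choices are fixed by this policy. The confidence values are copied from the table above.

```python
from datetime import date

CONFIDENCE = {
    ("structure", "missing-future"): 1.0,
    ("structure", "lazy-import"): 0.9,
    ("docs", "broken-link"): 0.9,
    ("dependencies", "transitive-gap"): 0.85,
    ("docs", "arch-ref-rename"): 0.8,
    ("dependencies", "unused"): 0.75,
    ("docs", "docstring-drift"): 0.75,
    ("code-quality", "bare-except"): 0.6,
}
SEVERITY_RANK = {"high": 2, "medium": 1, "low": 0}


def rank_key(finding: dict) -> tuple:
    """Sort key: smaller tuples rank higher; earlier fields dominate later ones."""
    return (
        -CONFIDENCE[(finding["suite"], finding["category"])],
        -SEVERITY_RANK[finding["severity"]],
        not finding["user_facing"],  # user-facing findings (False) sort first
        -date.fromisoformat(finding["first_seen"]).toordinal(),  # newer first
    )


def rank(findings: list[dict]) -> list[dict]:
    """Return findings ordered best candidate first."""
    return sorted(findings, key=rank_key)
```

With this key, a tie on confidence (for example lazy-import and broken-link, both 0.9) falls through to severity, then to impact, then to recency.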
+ +Record the chosen finding's id, scores, and rationale at the top of +`/tmp/audit-{{suite}}.md`. + +## Standard fix procedure + +The fix phase of every eligible recipe follows these steps. Suite recipes +declare only the parts that vary (eligible categories, branch type, +`test_required`, suite-specific quirks). + +1. Reconcile `attempted_fixes` against open PRs (`gh pr list`) to recover + any state lost to a prior crash. +2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or + `merged`; surface two-strike entries in the report's + `Repeatedly-failed fix attempts` section and drop them from selection. +3. Rank the remainder per the Ranking section. +4. For each candidate, top 5 max: + 1. Re-verify the finding still applies (re-grep / re-read). If not, + remove from `fix_backlog` and continue. + 2. Apply the fix. If the diff exceeds the localized-fix bar or touches + a non-allowlisted path, abandon and continue. + 3. If the category sets `test_required: true`, run the per-package + test target (see the mapping table in "Localized fix bar" above) + for the package containing the change. On failure: abandon and + continue. + 4. Branch: `agentic-ci//-YYYYMMDD-`. Commit: + `(agentic-ci): `. Push. + 5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the + hidden metadata block: + `` + 6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft` + iff `draft_until_proven` is true for the suite. + 7. `gh pr edit --add-label agentic-ci --add-label agentic-ci/`. + 8. Record `attempted_fixes` entry with `outcome: "open"` and exit. +5. If all 5 candidates were abandoned, append a one-line note to the + report and exit cleanly. The state already reflects the abandonments. + +On any failure mid-flow: record `outcome: "abandoned"` for the chosen +finding (with `pr_number: null`), leave any pushed branch in place +(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue +to the next candidate. 
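Step 2 of the procedure above (filtering `fix_backlog` against `attempted_fixes`) might look roughly like this. It is a sketch under assumptions: attempts are stored in chronological order so the last element is the latest, and a two-strike finding is surfaced even if its latest attempt happens to be `open`. `select_candidates` is a hypothetical name.

```python
FAILED_OUTCOMES = {"closed", "abandoned"}
IN_FLIGHT_OR_DONE = {"open", "merged"}


def select_candidates(
    fix_backlog: list[dict], attempted_fixes: list[dict]
) -> tuple[list[dict], list[dict]]:
    """Split the backlog into (selectable candidates, two-strike findings)."""
    attempts_by_id = {e["id"]: e["attempts"] for e in attempted_fixes}
    candidates: list[dict] = []
    two_strike: list[dict] = []
    for finding in fix_backlog:
        attempts = attempts_by_id.get(finding["id"], [])
        failed = sum(1 for a in attempts if a["outcome"] in FAILED_OUTCOMES)
        if failed >= 2:
            # Surfaced under "Repeatedly-failed fix attempts", never selected.
            two_strike.append(finding)
        elif attempts and attempts[-1]["outcome"] in IN_FLIGHT_OR_DONE:
            # A PR is already open or merged for this finding: skip it.
            continue
        else:
            candidates.append(finding)
    return candidates, two_strike
```

The returned candidates would then be ranked and walked top-5 as in steps 3 and 4; the two-strike list feeds the report section.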
+ +## PR conventions + +- **Use `gh pr create --body-file`**, not `/create-pr`. The skill is + interactive-only and shells the body inline; CI needs determinism. +- **Title**: conventional, `(agentic-ci): `. +- **Labels**: `agentic-ci`, `agentic-ci/`. +- **Draft PRs**: `code-quality` opens draft until a maintainer flips + `draft_until_proven` to `false` in runner-state, after at least two + non-draft PRs from that suite have landed clean. This flip is + intentionally manual — it is the sole human-gated promotion step in + the fix policy and must not be automated. + +## Atomicity + +Each fix-phase invocation produces exactly one of: + +- **Report-only** — runner-state updated; no branch, commit, or PR. +- **Report + PR** — same, plus a pushed branch, a commit, and a PR. The + `attempted_fixes` entry is recorded *before* the recipe exits. + +No half-states. The runner state is the source of truth for what the +recipe has tried; never silently drop a failed attempt. + +The matrix-level concurrency for the daily workflow uses +`cancel-in-progress: false` so a fix in flight cannot be cancelled +between push and PR open. The trade-off is a queued duplicate run if a +manual dispatch arrives while cron is still going; that's preferable to +orphaned branches with no `attempted_fixes` record. + +## Workflow-level scope gate + +The agent's compliance with the path allowlists and the localized-fix +bar is load-bearing for autonomous PR generation, but the recipe alone +cannot enforce them. The daily workflow runs a post-fix scope gate that +re-derives the per-suite allowlist (mirrored from the table above) and +the diff stats from the pushed branch, then closes the PR and deletes +the remote branch on violation. The gate also flips the +`attempted_fixes` entry from `open` to `abandoned` so two-strike logic +sees the failure. Keep the workflow's allowlist regexes in sync with the +table above; the workflow is the enforcement, the table is the +specification. 
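To illustrate what mirroring the allowlist table as regexes could look like, here is a minimal sketch. The patterns are one possible translation of the globs in the per-suite table and are not the workflow's actual regexes; `violations` is a hypothetical helper. The "docstring-only edits" condition for docs-and-references cannot be checked by path alone and is out of scope here.

```python
import re

# Assumed regex equivalent of the glob `packages/*/src/**/*.py`.
SRC_PY = r"packages/[^/]+/src/(?:.+/)?[^/]+\.py"

# Hypothetical regex mirror of the per-suite path allowlist table.
ALLOWLIST = {
    "docs-and-references": [
        r"(?:architecture|docs)/.+",
        r"(?:README|CONTRIBUTING|DEVELOPMENT|STYLEGUIDE)\.md",
        SRC_PY,
    ],
    "dependencies": [r"packages/[^/]+/pyproject\.toml"],
    "structure": [SRC_PY],
    "code-quality": [SRC_PY],
}


def violations(suite: str, changed_paths: list[str]) -> list[str]:
    """Return the changed paths that fall outside the suite's allowlist.

    A suite with no entry (e.g. one with no fix phase) allows nothing.
    """
    patterns = [re.compile(p) for p in ALLOWLIST.get(suite, [])]
    return [
        path
        for path in changed_paths
        if not any(rx.fullmatch(path) for rx in patterns)
    ]
```

Using `fullmatch` keeps the repo-root `pyproject.toml` from slipping through the `dependencies` pattern, since only paths under `packages/` match.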
diff --git a/.agents/recipes/_phase-audit.md b/.agents/recipes/_phase-audit.md new file mode 100644 index 000000000..c2cef775c --- /dev/null +++ b/.agents/recipes/_phase-audit.md @@ -0,0 +1,18 @@ +## Phase directive + +This invocation runs the **AUDIT** phase only. + +- Execute the audit steps from the recipe and write the report to + `/tmp/audit-{{suite}}.md`. +- Update `{{memory_path}}/runner-state.json` with detected findings, + including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE + applying the `known_issues` filter to the report, so fixable findings + persist across runs even when their report row is suppressed). +- Do NOT attempt any fix. Do NOT create any branches, commits, or PRs. +- Do NOT modify any files outside `{{memory_path}}/` and the report file + `/tmp/audit-{{suite}}.md` itself. +- A separate invocation will run the FIX phase if `fix_backlog` has + eligible candidates and the suite has a fix phase. +- Read the recipe in full for context; the "Fix phase" section informs + which finding categories should populate `fix_backlog`, but you must + not act on them in this invocation. diff --git a/.agents/recipes/_phase-fix.md b/.agents/recipes/_phase-fix.md new file mode 100644 index 000000000..44449dd8e --- /dev/null +++ b/.agents/recipes/_phase-fix.md @@ -0,0 +1,29 @@ +## Phase directive + +This invocation runs the **FIX** phase only. + +- The audit phase has already completed in a previous invocation. Its + report is at `/tmp/audit-{{suite}}.md` and + `{{memory_path}}/runner-state.json` has the populated `fix_backlog`. +- Execute only the recipe's "Fix phase" section per `_fix-policy.md`. + Do NOT redo audit work — that is, do NOT re-scan whole packages or + rebuild `fix_backlog` from scratch. 
The "no re-scan" rule does NOT + override the per-candidate re-verification step required by + `_fix-policy.md` §"Standard fix procedure" step 4.1: when you pick a + candidate, you MUST re-grep / re-read the specific file or symbol it + points at to confirm the finding still applies before editing. + Re-verification of a single candidate is required; re-scanning the + codebase to discover new findings is forbidden. +- Pick the highest-ranked eligible candidate from `fix_backlog`, apply + the fix, run the package's tests if applicable, commit, push, and open + the PR using `gh pr create --body-file`. +- Record the attempt in `attempted_fixes` (whether successful, abandoned, + or failed through the top-5 fallback) before exiting. +- If no candidate qualifies after trying up to 5 of them, exit cleanly, + append a short note to `/tmp/audit-{{suite}}.md` describing what was + tried, and update `attempted_fixes` accordingly. Do NOT open a PR. +- Do NOT delete branches, even on failure (per `_runner.md` and + `_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow + to reap over time. +- Read the recipe in full for context, but treat the audit phase as + already done. diff --git a/.agents/recipes/_runner.md b/.agents/recipes/_runner.md index 98081279b..9705633d5 100644 --- a/.agents/recipes/_runner.md +++ b/.agents/recipes/_runner.md @@ -76,6 +76,14 @@ Write all output to a temp file (e.g., `/tmp/recipe-output.md`). The workflow will handle posting it. Do not post directly to GitHub - the workflow controls output routing. -If your recipe produces code changes, commit them on a new branch and use -`/create-pr` to open a pull request. The branch name should follow the -pattern `agentic-ci/chore/{suite}-YYYYMMDD`. +If your recipe produces code changes, commit them on a new branch following +the pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}` where `{type}` +matches the change kind (`chore`/`docs`/`fix`/`refactor`). 
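As an illustration of the branch pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}`, a name could be assembled as follows. This is a sketch only: `slugify`, `branch_name`, and the 40-character slug limit are hypothetical and not part of the runner.

```python
import re
from datetime import date

ALLOWED_TYPES = {"chore", "docs", "fix", "refactor"}


def slugify(text: str, max_len: int = 40) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def branch_name(change_type: str, suite: str, summary: str, today: date) -> str:
    """Build a branch name following the agentic-ci naming pattern."""
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported change type: {change_type!r}")
    return f"agentic-ci/{change_type}/{suite}-{today:%Y%m%d}-{slugify(summary)}"
```

Passing the date in explicitly, rather than calling `date.today()` inside the function, keeps the result deterministic for a given run.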
+ +For PR creation in CI, use `gh pr create --body-file /tmp/pr-body-.md` +directly rather than the `/create-pr` skill. The skill assumes an interactive +session (it can prompt about uncommitted changes, base branch, etc.) and +shells the body inline, which breaks on backticks and special characters. +Daily-suite recipes that open PRs are governed by `_fix-policy.md` — read it +for the full PR contract (allowlists, draft mode, hidden metadata, branch +naming, atomicity). diff --git a/.agents/recipes/code-quality/recipe.md b/.agents/recipes/code-quality/recipe.md index ab8fcb793..268e0adff 100644 --- a/.agents/recipes/code-quality/recipe.md +++ b/.agents/recipes/code-quality/recipe.md @@ -35,6 +35,15 @@ Read `{{memory_path}}/runner-state.json` for baselines from previous runs re-reporting known issues. Flag metrics that are trending in the wrong direction compared to the previous baseline. +This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected bare-except +finding *before* the `known_issues` filter applies. (Other categories +remain report-only and do not enter `fix_backlog`.) + +The `draft_until_proven` flag in runner-state controls whether this +suite's PRs are opened as draft. Default `true` until a maintainer flips +it to `false`. + ## Instructions ### 1. Complexity hotspots @@ -64,8 +73,10 @@ Check for patterns that violate the project's "errors normalize at boundaries" principle (AGENTS.md): ```bash -# Bare except clauses (should use specific exception types) -grep -rn "except:" packages/*/src/ --include='*.py' | grep -v "# noqa" +# Bare except clauses (should use specific exception types). +# Catches both `except:` and `except BaseException:` — both swallow +# everything including KeyboardInterrupt and SystemExit. 
+grep -rnE "except\s*:|except\s+BaseException" packages/*/src/ --include='*.py' | grep -v "# noqa" # Swallowed exceptions (except + pass/continue with no logging) grep -rn -A1 "except" packages/*/src/ --include='*.py' | grep -B1 "pass$\|continue$" @@ -238,9 +249,51 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| bare-except | `refactor` | yes | Replace `except:` / `except BaseException:` with the specific exception type. Eligible only when grep across the try-block confirms **exactly one** exception type is plausibly raised, verified by inspecting the called functions or imported library docs. Multiple plausible types → ineligible. Test files are excluded (different exception-handling standards). | + +`fix_backlog.data` should record the proposed replacement exception type +and the grep evidence used to determine it. Within bare-except findings, +prefer ones in user-facing modules (`packages/data-designer/src/`) over +internal helpers (the ranking impact criterion handles this once +`data.user_facing` is set). + +The PR body should include the before/after of the try-block plus the +grep evidence that justified the chosen exception type, and a note that +the PR is draft until landing rate is proven (ask reviewers to mark +ready-for-review if the change is correct). + +**Draft mode**: this suite opens PRs as draft until a maintainer flips +`draft_until_proven` to `false` in runner-state, after at least two +non-draft PRs have landed clean. Bare-except narrowing is the most +inference-heavy fix in any suite (confidence 0.6); recipe judgement has +to be earned before promotion. 
Two-strike findings here are an +especially important signal — they suggest the detector is producing +false positives in an already-cautious category. + +**Not eligible** — stays report-only: + +- Complexity refactors, type annotation additions, exception hierarchy + normalization (judgement-heavy). +- **TODO line deletion** — the audit's "looks done" judgement is not + mechanical enough to delete code on. Deletion is forbidden. + ## Constraints -- Do not modify any files. This is a read-only audit. +- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify paths in the suite's path allowlist + (`packages/*/src/**/*.py`). Test files are excluded. +- **TODO line deletion is forbidden.** The audit phase still inventories + TODOs, but the fix phase does not act on them. +- Bare-except narrowing is only eligible when the exception type is + unambiguous. When in doubt, skip. - Do not flag test files for type coverage or exception hygiene. Tests have different standards. - Do not duplicate ruff checks (W, F, I, ICN, PIE, TID, UP*). Those are diff --git a/.agents/recipes/dependencies/recipe.md b/.agents/recipes/dependencies/recipe.md index 074e23857..bc72ee805 100644 --- a/.agents/recipes/dependencies/recipe.md +++ b/.agents/recipes/dependencies/recipe.md @@ -26,6 +26,10 @@ dependency versions. After the audit, update `known_issues` and `baselines.dependency_versions` with the current state. Skip reporting issues that already appear in `known_issues`. +This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected finding *before* +the `known_issues` filter applies. + ## Instructions ### 1. Inventory current dependencies @@ -154,12 +158,43 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. 
Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| transitive-gap | `chore` | yes | Add the imported module to `[project.dependencies]` of the package that imports it, copying the version specifier from a sibling package that already declares it. Insert in alphabetical order; match existing quote/specifier style. **Ineligible** when no sibling package declares the dep — choosing a specifier from scratch is interpretive, not mechanical. Those findings stay report-only and surface for maintainer judgement. | +| unused | `chore` | yes | Remove the declaration. Eligible only when grep across the package's `src/`, lazy-import system, plugin entry points, and tests turns up zero references. | + +`fix_backlog.data` should record: for transitive-gap, the importing source +files and the sibling package whose specifier was copied (the recipe +must record this *during the audit*; the fix phase rejects entries with +no sibling source). For unused, which other packages also declare the +dep. + +Before running the per-package test target (see `_fix-policy.md` for the +mapping), run `make install-dev` to confirm the lockfile resolves cleanly. +`make install-dev` is the only sanctioned install command (no direct +`pip install` or `uv pip install`). + +**Not eligible** — stays report-only: + +- Cross-package version reconciliation, version pinning concerns + (judgement-heavy). +- CVE response (Dependabot's job). + ## Constraints -- Do not modify any files. This is a read-only audit. -- Do not install packages or run `pip install`. Only inspect `pyproject.toml` - and source files. +- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify `packages/*/pyproject.toml`. The + repo-root `pyproject.toml` is forbidden. +- `make install-dev` is the only sanctioned install command. 
Do not + invoke `pip install` or `uv pip install` directly. - Do not run `pip audit` (may not be available on the runner). Focus on structural dependency analysis, not CVE scanning (Dependabot handles that). - Do not recommend changes to dependencies you haven't verified are actually problematic. False positives erode trust in the audit. +- Version pinning changes are explicitly out of scope for the fix phase. diff --git a/.agents/recipes/docs-and-references/recipe.md b/.agents/recipes/docs-and-references/recipe.md index f8a7073dc..45a54b324 100644 --- a/.agents/recipes/docs-and-references/recipe.md +++ b/.agents/recipes/docs-and-references/recipe.md @@ -26,6 +26,11 @@ After completing the audit, update the file with any new findings (add to `known_issues` array with a short hash of the finding). Skip reporting issues that already appear in `known_issues`. +This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected finding *before* +the `known_issues` filter applies, so fixable findings persist across runs +even when their report row is suppressed for being unchanged. + ## Instructions ### 1. Docstring vs signature drift @@ -147,9 +152,30 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| broken-link | `docs` | no | Only when the corrected target is unambiguous (exact-match file at a different path, or a single similar anchor). Multiple candidates → ineligible. | +| docstring-drift | `docs` | yes | Purely signature-driven `Args:`/`Returns:`/`Raises:` updates. 
Rename a param to its current name, drop entries for removed params, add placeholder entries for added params (note the signature; do not invent semantic descriptions). | +| arch-ref-rename | `docs` | no | Only when grep confirms the old symbol is gone and exactly one similarly-named new symbol exists at the same role. | + +`fix_backlog.data` should carry whatever the fix step needs without +re-scanning: the proposed target for broken-link, the signature-vs-Args +delta for docstring-drift, the new symbol name for arch-ref-rename. + +All other audit categories (docs-site rewrites, dev-note edits, external +URL breakage) stay report-only. + ## Constraints -- Do not modify any files. This is a read-only audit. +- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify paths in the suite's path allowlist. + See `_fix-policy.md` for the shared command/path baseline. - Do not read file contents unless needed to verify a specific reference. Use `grep` and `head` for targeted checks rather than reading entire files. - Skip vendored or generated files. diff --git a/.agents/recipes/issue-triage/recipe.md b/.agents/recipes/issue-triage/recipe.md index 396b0667a..3536c4b82 100644 --- a/.agents/recipes/issue-triage/recipe.md +++ b/.agents/recipes/issue-triage/recipe.md @@ -1,6 +1,6 @@ --- name: issue-triage -description: Weekly triage of open issues and PRs - classify, verify, detect staleness, duplicates, and cross-reference +description: Weekly triage of open issues and PRs - decision-ready report organized by recommended action trigger: schedule tool: claude-code timeout_minutes: 15 @@ -14,7 +14,9 @@ permissions: # Repository Triage Triage all open issues and pull requests in this repository, then post a -combined report to the tracking issue. +decision-ready report to the tracking issue. The report is organized by +**recommended action** so a maintainer can resolve flagged items without +opening each one. 
## Instructions @@ -23,164 +25,184 @@ combined report to the tracking issue. Collect all open issues, open PRs, and recent merge activity: ```bash -# All open issues with metadata gh issue list --state open --limit 200 \ --json number,title,state,createdAt,updatedAt,labels,assignees,author,body -# All open PRs with metadata gh pr list --state open --limit 200 \ --json number,title,state,createdAt,updatedAt,labels,author,headRefName,body -# Recently merged PRs (last 60 days) to cross-reference gh pr list --state merged --limit 100 \ --json number,title,headRefName,body,mergedAt -# PR check status for open PRs +# Failing-check counts for open PRs for pr in $(gh pr list --state open --json number --jq '.[].number'); do - echo "=== PR #${pr} ===" - gh pr checks "$pr" --json name,state --jq '[.[] | select(.state == "FAILURE" or .state == "ERROR")] | length' + FAILING=$(gh pr checks "$pr" --json name,state \ + --jq '[.[] | select(.state == "FAILURE" or .state == "ERROR")] | length') + echo "${pr} ${FAILING}" done ``` -### 2. Triage issues +### 2. Decide an action for every flagged item -For each open issue, determine: +For each open issue and PR, decide whether it needs maintainer action and, if +so, which **action bucket** it belongs in. Buckets are exclusive — every +flagged item appears under exactly one heading. -**Classification** (pick one): -- `bug` - something is broken -- `feature` - new capability or enhancement -- `chore` - maintenance, CI, docs, refactoring -- `discussion` - needs design input or decision before work starts +Buckets and the criteria for each: -**Staleness** (based on last update, today's date, and activity): -- `active` - updated within the last 14 days -- `aging` - updated 14-30 days ago -- `stale` - no update for 30+ days +| Bucket | Apply when | +|--------|-----------| +| `Close as resolved` | A merged PR closes the issue via `Fixes/Closes/Resolves #N`, OR a merged PR's title/branch/body strongly indicates it addressed the issue. 
The issue is still open. | +| `Close as duplicate` | An older open issue covers the same scope. Pick the older issue as the canonical one. | +| `Needs maintainer decision` | Issue labeled `discussion`, design-input items with no clear scope, or items labeled `needs-attention` (flagged by the stale-PR workflow because their linked PR was auto-closed). | +| `Ready for assignment` | Well-scoped issue, no assignee, no linked open PR, not stale (updated within 30 days). Brief enough that someone could pick it up today. | +| `Stuck PR` | Open PR with one or more failing checks, OR no author activity (push/comment) for 14+ days. | +| `Duplicate PRs` | Two or more open PRs reference the same issue (`Fixes/Closes/Resolves #N`). | +| `Stale, consider closing` | 60+ days since last activity, no assignee, no linked open PR. Older than `Ready for assignment` and without traction. | -**Verification** - check if the issue has been addressed: -- Search merged PRs for closing keywords (`Fixes #N`, `Closes #N`, `Resolves #N`) - referencing this issue -- Search merged PR titles and branches for keywords matching the issue -- If a merged PR appears to fix the issue, flag it as `potentially resolved` -- If there is an open PR linked to the issue, note the PR number +Items that don't fit any of the above are **healthy** — count them but do +not list them in the action sections. -**Labels as signals** - issues with `needs-attention` were flagged by the stale -PR workflow because their linked PR was auto-closed. Always include these in the -"Action needed" section. +Also check `attempted_fixes` in any daily-suite runner-state files (under +`.agentic-ci-state/` if accessible) — findings with two `closed` or +`abandoned` attempts are surfaced in their own section so the maintainer +sees them alongside other action items. (This section may be empty if +those state files are not available in the triage run; that's fine.) 
-**Duplicates / related** - flag issues that overlap in scope or description. +### 3. Build the report -### 3. Triage PRs +Write each part to a numbered file: `/tmp/issue-triage-report-1.md`, +`/tmp/issue-triage-report-2.md`, etc. Single-part reports use +`/tmp/issue-triage-report-1.md`. The workflow's fallback step looks for +numbered files first. -For each open PR, determine: +Format: -**Health flags** (check all that apply): -- `no-issue` - PR body has no `Fixes/Closes/Resolves #N` reference (external - contributors only - collaborators are exempt) -- `issue-closed` - PR links to an issue that is already closed (by another PR - or manually) -- `checks-failing` - PR has failing CI checks -- `stale` - no author activity (push or comment) for 14+ days with failing - checks -- `duplicate-fix` - another open PR references the same issue +````markdown + +## Repository Triage Report — YYYY-MM-DD -**Cross-reference** - for each PR that references an issue: -- Verify the linked issue exists and is open -- Check if another open or merged PR also references the same issue -- If two open PRs fix the same issue, flag both as `duplicate-fix` +**Open issues:** N | **Open PRs:** N | **Healthy (no action needed):** N -### 4. Build the report +--- -Write the combined report to `/tmp/issue-triage-report.md` using this format: +### Close as resolved (M) -```markdown - -## Repository Triage Report +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #123 | ... | Close | Merged in #456 | PR title says "fix ..." matching issue scope | -**Run date:** YYYY-MM-DD -**Open issues:** N | **Open PRs:** N +### Close as duplicate (M) ---- +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #234 | ... 
| Close, point to #200 | Overlaps #200 (older) | Both describe the same crash on empty config | -### Issues: action needed +### Needs maintainer decision (M) -Issues that need maintainer attention (potentially resolved, stale with no -assignee, possible duplicates, needs-attention label). +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #345 | ... | Decide direction | `discussion` label, no consensus | Two competing approaches in comments | -| # | Title | Category | Staleness | Flag | Notes | -|---|-------|----------|-----------|------|-------| +### Ready for assignment (M) -### Issues: active work +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #456 | ... | Assign | Scope clear, no assignee | One-line repro, fix likely <50 LOC | -Issues with assignees or linked open PRs. +### Stuck PR (M) -| # | Title | Category | Assignee | PR | Last updated | -|---|-------|----------|----------|-----|-------------| +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #567 | ... | Nudge author or close | 3 failing checks, 21d since push | DCO + lint failing, author hasn't responded | -### Issues: backlog +### Duplicate PRs (M) -Remaining open issues, ordered by staleness (most stale first). +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #678 / #679 | both fix #500 | Pick one, close other | Both reference #500 | #678 has tests, #679 is simpler | -| # | Title | Category | Staleness | Last updated | -|---|-------|----------|-----------|-------------| +### Stale, consider closing (M) ---- +| # | Title | Action | Evidence | Rationale | +|---|-------|--------|----------|-----------| +| #789 | ... 
| Close with note | 87d no activity, no assignee | No traction; linked design discussion went silent | -### PRs: action needed +### Repeatedly-failed fix attempts (M) -PRs with health flags that need maintainer attention. +(Only emit this section if any items qualify. See `_fix-policy.md` — +two-strike escalation.) -| # | Title | Author | Flags | Notes | -|---|-------|--------|-------|-------| +| Finding | Suite | Attempts | Notes | +|---------|-------|----------|-------| +| ... | docs-and-references | 2 closed | Detector may be flagging a false positive | -### PRs: healthy +--- -Open PRs with no flags. +
+Healthy items (M issues, M PRs) -| # | Title | Author | Linked issue | Last updated | -|---|-------|--------|-------------|-------------| +(One-line summary of each: `#N — <author> — <last update>`. No +action needed; this block is for completeness.) ---- +</details> ### Summary -**Issues:** -- N triaged, N flagged for action, N active, N backlog -- Flags: X potentially resolved, Y stale, Z duplicates +- N items flagged for action across 7 buckets +- M PRs flagged (X stuck, Y duplicate) +- K healthy items (collapsed above) +```` -**PRs:** -- N triaged, N flagged -- Flags: X no linked issue, Y checks failing, Z stale, W duplicate fixes -``` +The marker on the first line (`<!-- agentic-ci-issue-triage:1/N -->`) is +required. If the report fits in one comment, set N = 1. + +### 4. Multi-comment split + +GitHub issue comments cap at 65,536 characters. Use a 60,000-char per-part +budget to leave room for body manipulations. + +Build the parts: + +1. Render the full report. If `len(body) <= 60000`, you have one part. Use + marker `<!-- agentic-ci-issue-triage:1/1 -->`. +2. Otherwise, split on **action-bucket boundaries** (never split a table + mid-row). Each part starts with its own marker + `<!-- agentic-ci-issue-triage:i/N -->` and a heading + `### Triage Report — Part i of N`. +3. Place the summary and `Healthy items` `<details>` block at the end of + the last part. ### 5. Post the report -Find the tracking issue number from the `ISSUE_TRIAGE_TRACKING_ISSUE` -environment variable. Find the last comment by `github-actions[bot]` that -contains `<!-- agentic-ci-issue-triage -->` and note its ID. +Tracking issue number is in `ISSUE_TRIAGE_TRACKING_ISSUE`. List all +existing bot comments containing `agentic-ci-issue-triage:` (in id +order) and reconcile against the new parts: -- If a previous comment exists, **edit it in place** using - `gh api -X PATCH repos/{owner}/{repo}/issues/comments/{id}`. 
-- If no previous comment exists, post a new comment using `gh issue comment`. +- For `i in 0..min(len(existing), len(parts))`: PATCH `existing[i]` + with `parts[i]` (`gh api -X PATCH .../comments/<id> -f body=...`). +- Surplus parts: post via `gh issue comment --body-file`. +- Surplus existing comments: delete via + `gh api -X DELETE .../comments/<id>`. -```bash -# Edit existing comment -gh api -X PATCH "repos/${GITHUB_REPOSITORY}/issues/comments/${COMMENT_ID}" \ - -f body="$(cat /tmp/issue-triage-report.md)" +This keeps the report a coherent set across runs whether it grows, +shrinks, or stays stable. -# Or post new comment -gh issue comment "$TRACKING_ISSUE" --body-file /tmp/issue-triage-report.md -``` +### 6. Fallback + +If you cannot find the tracking issue or the API calls fail repeatedly, +write the report parts to `/tmp/issue-triage-report-*.md` and stop. The +workflow's fallback step posts every numbered part in order if no +agent-authored comments containing today's date already exist on the +tracking issue. ## Constraints -- **Read-only triage.** Do not close, label, or modify any issues or PRs. The - report is for maintainers to act on. -- **Do not post the report yourself if you cannot find the tracking issue.** - Write the report to `/tmp/issue-triage-report.md` and stop. The workflow - will handle fallback posting. -- **Stay concise.** Notes columns should be one sentence max. Link to the - relevant PR, issue, or duplicate - don't explain the fix. +- **Read-only triage.** Do not close, label, or modify any issues or PRs. + The report is for maintainers to act on. +- **Stay concise.** Rationale columns should be one sentence max. +- **No fix authority.** This recipe never opens PRs or commits code. It + reads, classifies, and posts a report. - **Cost awareness.** Do not read full issue/PR bodies unless needed to - determine duplicates or verify cross-references. The metadata from - `gh issue list` and `gh pr list` is enough for most checks. 
+ determine duplicates, verify cross-references, or decide an action. The + metadata from `gh issue list` / `gh pr list` is enough for most checks. diff --git a/.agents/recipes/structure/recipe.md b/.agents/recipes/structure/recipe.md index 2dc793d54..3df885ecd 100644 --- a/.agents/recipes/structure/recipe.md +++ b/.agents/recipes/structure/recipe.md @@ -51,6 +51,10 @@ regardless of test coverage. Read `{{memory_path}}/runner-state.json` for known issues from previous runs. Update after the audit. Skip re-reporting known issues. +This recipe also maintains `fix_backlog` and `attempted_fixes` per +`_fix-policy.md`. Update `fix_backlog` for every detected finding *before* +the `known_issues` filter applies. + ## Instructions ### 1. Import boundary violations @@ -208,15 +212,35 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits: + +### Eligible categories + +| Category | Branch type | test_required | Eligibility note | +|----------|-------------|---------------|------------------| +| missing-future | `chore` | yes | Insert `from __future__ import annotations` after the SPDX header block, before other imports. Fully deterministic. Tests required because `__future__` annotations can affect introspection-heavy code paths. | +| lazy-import | `refactor` | yes | Move a top-level heavy import (pandas/numpy/polars/torch/duckdb/sqlfluff/faker) to the `data_designer.lazy_heavy_imports` accessor pattern. Eligible only when (a) file is under `packages/*/src/`, (b) the module is already wired in the lazy system, (c) the heavy module is used only inside function bodies. | + +**Not eligible** — stays report-only: + +- Import boundary violations (architectural judgement). +- Dead exports (audit labels them "potentially dead"; external plugin + consumers may use them). + ## Constraints -- Do not modify any files. 
This is a read-only audit. +- Outside the fix phase, this recipe is read-only — do not modify files. +- Within the fix phase, only modify paths in the suite's path allowlist. + See `_fix-policy.md` for the shared command/path baseline. - Imports inside `if TYPE_CHECKING:` blocks are allowed and should not be flagged for any check. - Lazy imports in `__init__.py` (via `__getattr__`) are deferred and should not be treated as violations. - Dead export detection has false positives. Mark uncertain cases as - "potentially dead" rather than definitively dead. + "potentially dead" rather than definitively dead. **Do not auto-remove + them** — they are not in the fix-eligible list. - Always cite which rule or doc is violated so maintainers can verify. - Import boundaries are currently clean. No findings in that section is normal and expected. diff --git a/.agents/recipes/test-health/recipe.md b/.agents/recipes/test-health/recipe.md index 1fe82c4a8..2224684ff 100644 --- a/.agents/recipes/test-health/recipe.md +++ b/.agents/recipes/test-health/recipe.md @@ -317,9 +317,29 @@ Write the report to `/tmp/audit-{{suite}}.md`: If no findings in any category, write `NO_FINDINGS` on the first line instead. +## Fix phase + +**This suite has no fix phase.** All categories — coverage gaps, hollow +tests, import perf regressions, smoke check failures, test isolation +violations — stay report-only. + +Rationale: the categories that look mechanical (rewriting hollow +`assert ... is not None` checks, adding missing test files, fixing +test-isolation violations) all require inferring intent or authoring new +code. The audit phase only commits to *flagging* these conservatively; +turning that into authored test changes is beyond the "self-evident" +bar in `_fix-policy.md`. 
+ +A future revision may add **test-isolation violations** (config tests +importing engine, engine tests importing interface) as an eligible fix +category — those are mechanical (replace the cross-boundary import with +a test-local equivalent or fixture). Add only after `code-quality` (the +other inference-heavy suite) has its draft-PR landing rate proven and +`draft_until_proven` flipped to `false`. + ## Constraints -- Do not modify any test files. This is a read-only audit. +- This recipe is fully read-only — do not modify any files at all. - Do not run the full test suite or coverage tool. Analysis is based on file structure and static inspection, not execution. - Be conservative with hollow test detection. Only flag tests you've read diff --git a/.github/workflows/agentic-ci-daily.yml b/.github/workflows/agentic-ci-daily.yml index d04a5d839..247a13122 100644 --- a/.github/workflows/agentic-ci-daily.yml +++ b/.github/workflows/agentic-ci-daily.yml @@ -62,15 +62,19 @@ jobs: needs: determine-suite if: needs.determine-suite.outputs.suites != '[]' runs-on: [self-hosted, agentic-ci] - timeout-minutes: 20 + timeout-minutes: 40 strategy: fail-fast: false matrix: suite: ${{ fromJSON(needs.determine-suite.outputs.suites) }} concurrency: + # cancel-in-progress is intentionally false: a cancellation between + # the agent's git push and gh pr create would leave an orphaned + # branch with no attempted_fixes record. Queueing a duplicate run is + # the lesser evil. See _fix-policy.md "Atomicity". 
group: agentic-ci-daily-${{ matrix.suite }} - cancel-in-progress: true + cancel-in-progress: false steps: - name: Check required config @@ -127,6 +131,11 @@ jobs: python -m venv /tmp/graphify-venv /tmp/graphify-venv/bin/python -m pip install graphifyy==0.4.23 --quiet 2>&1 | tail -3 + - name: Configure git identity + run: | + git config user.email "41898282+github-actions[bot]@users.noreply.github.com" + git config user.name "github-actions[bot]" + - name: Pre-flight checks env: ANTHROPIC_BASE_URL: ${{ secrets.AGENTIC_CI_API_BASE_URL }} @@ -173,11 +182,13 @@ jobs: exit 1 fi - # Build prompt: _runner.md + recipe body (strip YAML frontmatter) + # Build prompt: phase directive + _runner.md + _fix-policy.md + recipe body (strip YAML frontmatter) + PHASE_DIRECTIVE=$(cat .agents/recipes/_phase-audit.md) RUNNER_CTX=$(cat .agents/recipes/_runner.md) + FIX_POLICY=$(cat .agents/recipes/_fix-policy.md) RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md") - PROMPT=$(printf '%s\n\n%s\n' "${RUNNER_CTX}" "${RECIPE_BODY}" \ + PROMPT=$(printf '%s\n\n%s\n\n%s\n\n%s\n' "${PHASE_DIRECTIVE}" "${RUNNER_CTX}" "${FIX_POLICY}" "${RECIPE_BODY}" \ | sed "s|{{suite}}|${SUITE}|g" \ | sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \ | sed "s|{{memory_path}}|.agentic-ci-state|g") @@ -190,13 +201,442 @@ jobs: --verbose \ 2>&1 | tee /tmp/claude-audit-log.txt + - name: Check fix backlog + id: backlog + if: steps.audit.outcome == 'success' && matrix.suite != 'test-health' + run: | + BACKLOG_SIZE=$(jq '.fix_backlog // [] | length' .agentic-ci-state/runner-state.json 2>/dev/null || echo 0) + echo "size=${BACKLOG_SIZE}" >> "$GITHUB_OUTPUT" + echo "fix_backlog has ${BACKLOG_SIZE} entries" + + - name: Snapshot pre-fix attempted_fixes + # Captures (id, attempts-length) pairs before the fix step runs so + # the post-fix gates can identify which entry grew during *this* + # run, instead of grabbing the last globally-open entry (which + # might be a stale orphan from a prior crashed run). 
+ id: snapshot + if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 + run: | + jq -c '.attempted_fixes // [] | map({id, n: (.attempts | length)})' \ + .agentic-ci-state/runner-state.json > /tmp/prior-attempted-fixes.json + echo "Snapshot: $(cat /tmp/prior-attempted-fixes.json)" + + - name: Run fix recipe + id: fix + # Custom if: bypasses implicit success(), so snapshot.outcome must + # be checked explicitly. Without it, a snapshot failure (corrupt + # runner-state, disk error) would leave /tmp/prior-attempted-fixes.json + # missing, the scope gate's jq --slurpfile would short-circuit, and + # the gate would exit 0 — silently approving the agent's PR. + if: steps.audit.outcome == 'success' && steps.backlog.outcome == 'success' && steps.snapshot.outcome == 'success' && matrix.suite != 'test-health' && fromJSON(steps.backlog.outputs.size || '0') > 0 + env: + ANTHROPIC_BASE_URL: ${{ secrets.AGENTIC_CI_API_BASE_URL }} + ANTHROPIC_API_KEY: ${{ secrets.AGENTIC_CI_API_KEY }} + AGENTIC_CI_MODEL: ${{ vars.AGENTIC_CI_MODEL }} + DISABLE_PROMPT_CACHING: "1" + GH_TOKEN: ${{ github.token }} + GITHUB_REPOSITORY: ${{ github.repository }} + SUITE: ${{ matrix.suite }} + run: | + set -o pipefail + + RECIPE_DIR=".agents/recipes/${SUITE}" + + # Build prompt: phase directive + _runner.md + _fix-policy.md + recipe body (strip YAML frontmatter) + PHASE_DIRECTIVE=$(cat .agents/recipes/_phase-fix.md) + RUNNER_CTX=$(cat .agents/recipes/_runner.md) + FIX_POLICY=$(cat .agents/recipes/_fix-policy.md) + RECIPE_BODY=$(sed '1,/^---$/{ /^---$/,/^---$/d }' "${RECIPE_DIR}/recipe.md") + + PROMPT=$(printf '%s\n\n%s\n\n%s\n\n%s\n' "${PHASE_DIRECTIVE}" "${RUNNER_CTX}" "${FIX_POLICY}" "${RECIPE_BODY}" \ + | sed "s|{{suite}}|${SUITE}|g" \ + | sed "s|{{date}}|$(date -u +%Y-%m-%d)|g" \ + | sed "s|{{memory_path}}|.agentic-ci-state|g") + + stdbuf -oL -eL claude \ + --model "$AGENTIC_CI_MODEL" \ + -p 
"$PROMPT" \ + --max-turns 50 \ + --output-format stream-json \ + --verbose \ + 2>&1 | tee /tmp/claude-fix-log.txt + + - name: Validate fix scope (allowlist + LOC + file cap) + # Workflow-level enforcement of the localized-fix bar from + # _fix-policy.md. Recipe instructions alone cannot bind the agent; + # this gate re-derives the diff and closes the PR if the agent + # escaped the allowlist or the LOC/file caps. The docs-and-references + # suite additionally gets AST-based docstring-only enforcement on + # .py edits (no non-docstring/non-comment lines may change). + id: scope_gate + # Belt-and-suspenders: also check snapshot succeeded. The fix step + # already gates on snapshot, so if we got here the snapshot ran, + # but be explicit so a future condition refactor cannot regress. + if: steps.fix.outcome == 'success' && steps.snapshot.outcome == 'success' + env: + SUITE: ${{ matrix.suite }} + GH_TOKEN: ${{ github.token }} + run: | + set -o pipefail + + # Identify every attempted_fixes entry that grew during *this* run + # (vs the pre-fix snapshot), not just the last globally-open entry. + OPEN_ENTRIES=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' + (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p + | [ + .attempted_fixes // [] + | .[] + | select( + ((.attempts | last | .outcome) == "open") + and ((.attempts | length) > ($p[.id] // 0)) + ) + ] + ' .agentic-ci-state/runner-state.json) + OPEN_COUNT=$(echo "$OPEN_ENTRIES" | jq 'length') + if [ "$OPEN_COUNT" -eq 0 ]; then + echo "No new open attempted_fix recorded by this run; nothing to validate." + exit 0 + fi + + echo "Validating ${OPEN_COUNT} new open attempted_fix entries." 
+ REJECTED=0 + + while IFS= read -r OPEN; do + BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') + PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') + FINDING_ID=$(echo "$OPEN" | jq -r '.id') + DIFF_REF="" + REASONS="" + + if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + if [ -n "$BRANCH" ]; then + echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + fi + fi + if [ -z "$BRANCH" ]; then + REASONS="${REASONS}- open attempt has no branch and could not be recovered from PR ${PR_NUMBER:-unknown}\n" + fi + + # Diff against the actual pushed branch (origin/$BRANCH), not + # local HEAD — HEAD may not match what was pushed if the agent + # left the working tree in an unexpected state. + if [ -n "$BRANCH" ]; then + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + if git rev-parse --verify "refs/remotes/origin/$BRANCH" >/dev/null 2>&1; then + DIFF_REF="origin/$BRANCH" + else + echo "::warning::origin/$BRANCH not fetchable; falling back to HEAD" + DIFF_REF="HEAD" + fi + fi + + case "$SUITE" in + docs-and-references) + ALLOW='^(architecture/|docs/|README\.md$|CONTRIBUTING\.md$|DEVELOPMENT\.md$|STYLEGUIDE\.md$|packages/[^/]+/src/.*\.py$)' + ;; + dependencies) + ALLOW='^packages/[^/]+/pyproject\.toml$' + ;; + structure|code-quality) + ALLOW='^packages/[^/]+/src/.*\.py$' + ;; + *) + echo "::error::No allowlist defined for suite: $SUITE" + exit 1 + ;; + esac + + if [ -n "$BRANCH" ]; then + mapfile -t FILE_ARR < <(git diff --name-only "origin/main...$DIFF_REF") + FILE_COUNT=${#FILE_ARR[@]} + if [ "$FILE_COUNT" -gt 0 ]; then + BAD=$(printf '%s\n' "${FILE_ARR[@]}" | grep -vE "$ALLOW" | grep -v '^$' || true) + else + BAD="" + fi + LOC_DELTA=$(git diff --shortstat "origin/main...$DIFF_REF" \ + | awk '{ a=0; d=0; for (i=1;i<=NF;i++) { if ($i ~ /insertion/) a=$(i-1); if ($i ~ 
/deletion/) d=$(i-1) } print a+d }') + : "${LOC_DELTA:=0}" + else + FILE_ARR=() + FILE_COUNT=0 + BAD="" + LOC_DELTA=0 + fi + + if [ -n "$BAD" ]; then + REASONS="${REASONS}- files outside allowlist:\n$(echo "$BAD" | sed 's/^/ - /')\n" + fi + if [ "$FILE_COUNT" -gt 3 ]; then + REASONS="${REASONS}- file count ($FILE_COUNT) exceeds 3-file cap\n" + fi + if [ "$LOC_DELTA" -gt 50 ]; then + REASONS="${REASONS}- LOC delta ($LOC_DELTA) exceeds 50-line cap\n" + fi + + # Docs suite: AST-enforce the docstring-only caveat on .py edits. + if [ "$SUITE" = "docs-and-references" ] && [ "$FILE_COUNT" -gt 0 ]; then + PY_FILES=$(printf '%s\n' "${FILE_ARR[@]}" | grep -E '^packages/[^/]+/src/.*\.py$' || true) + if [ -n "$PY_FILES" ]; then + NON_DOCSTRING=$(PY_FILES="$PY_FILES" DIFF_REF="$DIFF_REF" python3 - <<'PY' + import ast + import os + import re + import subprocess + import sys + + files = [p for p in os.environ['PY_FILES'].splitlines() if p] + diff_ref = os.environ['DIFF_REF'] + violations = [] + hunk_re = re.compile(r'@@ -(?P<old>\d+)(?:,\d+)? \+(?P<new>\d+)(?:,\d+)? 
@@') + + try: + base_ref = subprocess.check_output( + ['git', 'merge-base', 'origin/main', diff_ref], text=True + ).strip() + except subprocess.CalledProcessError: + violations.append(f'could not compute merge base for origin/main...{diff_ref}') + base_ref = 'origin/main' + + def collect_docstring_lines(ref, path, *, missing_ok=False): + try: + content = subprocess.check_output( + ['git', 'show', f'{ref}:{path}'], text=True, errors='replace' + ) + except subprocess.CalledProcessError: + if missing_ok: + return set() + violations.append(f'{path}: file missing at {ref}') + return None + try: + tree = ast.parse(content) + except SyntaxError as e: + violations.append(f'{path}: parse error at {ref} ({e})') + return None + + lines = set() + for node in ast.walk(tree): + if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)): + body = getattr(node, 'body', None) or [] + if (body and isinstance(body[0], ast.Expr) + and isinstance(body[0].value, ast.Constant) + and isinstance(body[0].value.value, str)): + start, end = body[0].lineno, body[0].end_lineno + if start and end: + lines.update(range(start, end + 1)) + return lines + + for path in files: + old_docstring_lines = collect_docstring_lines(base_ref, path, missing_ok=True) + new_docstring_lines = collect_docstring_lines(diff_ref, path) + if new_docstring_lines is None: + violations.append(f'{path}: file deleted or unreadable at {diff_ref}') + continue + if old_docstring_lines is None: + continue + try: + hunks = subprocess.check_output( + ['git', 'diff', '-U0', f'origin/main...{diff_ref}', '--', path], + text=True, errors='replace', + ) + except subprocess.CalledProcessError: + violations.append(f'{path}: could not compute diff') + continue + old_cur = None + new_cur = None + for line in hunks.splitlines(): + if line.startswith('@@'): + match = hunk_re.match(line) + if match: + old_cur = int(match.group('old')) + new_cur = int(match.group('new')) + else: + old_cur = None + new_cur = None + 
continue + if old_cur is None or new_cur is None: + continue + if line.startswith('+') and not line.startswith('+++'): + stripped = line[1:].strip() + ln = new_cur + new_cur += 1 + if not stripped or stripped.startswith('#'): + continue + if ln not in new_docstring_lines: + violations.append(f'{path}:{ln} added outside docstring') + elif line.startswith('-') and not line.startswith('---'): + stripped = line[1:].strip() + ln = old_cur + old_cur += 1 + if not stripped or stripped.startswith('#'): + continue + if ln not in old_docstring_lines: + violations.append(f'{path}:{ln} removed outside docstring') + elif line.startswith(' '): + old_cur += 1 + new_cur += 1 + + if violations: + print('\n'.join(violations)) + sys.exit(1) + PY + ) || true + if [ -n "$NON_DOCSTRING" ]; then + REASONS="${REASONS}- non-docstring/non-comment .py edits in docs suite:\n$(echo "$NON_DOCSTRING" | sed 's/^/ - /')\n" + fi + fi + fi + + if [ -z "$REASONS" ]; then + echo "Scope gate passed for ${FINDING_ID}: ${FILE_COUNT} file(s), ${LOC_DELTA} LOC, all within allowlist." + continue + fi + + REJECTED=1 + echo "::error::Scope gate violation for ${FINDING_ID}" + printf '%b' "$REASONS" + + if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + MSG=$(printf 'Closed by workflow scope gate. The pushed diff violated the localized-fix bar (see `.agents/recipes/_fix-policy.md`):\n\n%b\nThe `attempted_fixes` entry has been flipped to `abandoned`.' 
"$REASONS") + gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ + echo "::warning::gh pr close failed; branch may need manual cleanup" + elif [ -n "$BRANCH" ]; then + git push origin --delete "$BRANCH" || \ + echo "::warning::Could not delete remote branch $BRANCH" + else + echo "::warning::No PR number or branch available for cleanup" + fi + + FINDING_ID="$FINDING_ID" python3 -c " + import json, os + finding_id = os.environ['FINDING_ID'] + path = '.agentic-ci-state/runner-state.json' + with open(path) as f: + state = json.load(f) + for entry in state.get('attempted_fixes', []): + if entry.get('id') != finding_id: + continue + attempts = entry.get('attempts') or [] + if attempts and attempts[-1].get('outcome') == 'open': + attempts[-1]['outcome'] = 'abandoned' + attempts[-1]['gate_violation'] = True + tmp = path + '.tmp' + with open(tmp, 'w') as f: + json.dump(state, f, indent=2) + os.replace(tmp, path) + " + + done < <(echo "$OPEN_ENTRIES" | jq -c '.[]') + + if [ "$REJECTED" -eq 1 ]; then + echo "rejected=true" >> "$GITHUB_OUTPUT" + fi + exit 0 + + - name: Verify dependencies lockfile + # Dependencies suite only: re-run make install-dev against the + # agent's pyproject.toml changes. This catches the failure mode + # where the per-package test target passed against the *old* + # lockfile but the proposed dep does not actually resolve. + id: lockfile_gate + if: matrix.suite == 'dependencies' && steps.fix.outcome == 'success' && steps.scope_gate.outcome == 'success' + env: + GH_TOKEN: ${{ github.token }} + run: | + set -o pipefail + + # Same snapshot-based selector as the scope gate: target every + # entry whose attempts grew during this run. 
+ OPEN_ENTRIES=$(jq -c --slurpfile prior /tmp/prior-attempted-fixes.json ' + (($prior[0] // []) | map({key: .id, value: .n}) | from_entries) as $p + | [ + .attempted_fixes // [] + | .[] + | select( + ((.attempts | last | .outcome) == "open") + and ((.attempts | length) > ($p[.id] // 0)) + ) + ] + ' .agentic-ci-state/runner-state.json) + OPEN_COUNT=$(echo "$OPEN_ENTRIES" | jq 'length') + if [ "$OPEN_COUNT" -eq 0 ]; then + echo "No new open attempted_fix; skipping lockfile verification." + exit 0 + fi + + echo "Verifying dependencies lockfile for ${OPEN_COUNT} new open attempted_fix entries." + REJECTED=0 + + while IFS= read -r OPEN; do + PR_NUMBER=$(echo "$OPEN" | jq -r '.attempts | last | .pr_number // empty') + BRANCH=$(echo "$OPEN" | jq -r '.attempts | last | .branch // empty') + FINDING_ID=$(echo "$OPEN" | jq -r '.id') + + # Verify against the actually-pushed branch, not local HEAD. + if [ -z "$BRANCH" ] && [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + BRANCH=$(gh pr view "$PR_NUMBER" --json headRefName -q .headRefName 2>/dev/null || true) + if [ -n "$BRANCH" ]; then + echo "::warning::Open attempt had no branch; recovered $BRANCH from PR #$PR_NUMBER." + fi + fi + if [ -n "$BRANCH" ]; then + git fetch --depth=50 origin "$BRANCH" 2>/dev/null || true + git checkout --force --detach "refs/remotes/origin/$BRANCH" 2>/dev/null \ + || echo "::warning::Could not checkout origin/$BRANCH; verifying current tree" + fi + + if make install-dev 2>&1 | tee /tmp/install-dev-verify.log; then + echo "Lockfile resolves cleanly for ${FINDING_ID}." + continue + fi + + REJECTED=1 + echo "::error::make install-dev failed against the agent's pyproject changes for ${FINDING_ID}" + + if [ -n "$PR_NUMBER" ] && [ "$PR_NUMBER" != "null" ]; then + MSG="Closed by workflow lockfile verification. \`make install-dev\` failed against the agent's \`pyproject.toml\` changes — the dependency edit does not resolve cleanly. See \`/tmp/install-dev-verify.log\` in the workflow artifact." 
+ gh pr close "$PR_NUMBER" --comment "$MSG" --delete-branch || \ + echo "::warning::gh pr close failed" + elif [ -n "$BRANCH" ]; then + git push origin --delete "$BRANCH" || true + else + echo "::warning::No PR number or branch available for cleanup" + fi + + FINDING_ID="$FINDING_ID" python3 -c " + import json, os + finding_id = os.environ['FINDING_ID'] + path = '.agentic-ci-state/runner-state.json' + with open(path) as f: + state = json.load(f) + for entry in state.get('attempted_fixes', []): + if entry.get('id') != finding_id: + continue + attempts = entry.get('attempts') or [] + if attempts and attempts[-1].get('outcome') == 'open': + attempts[-1]['outcome'] = 'abandoned' + attempts[-1]['lockfile_verification_failed'] = True + tmp = path + '.tmp' + with open(tmp, 'w') as f: + json.dump(state, f, indent=2) + os.replace(tmp, path) + " + + done < <(echo "$OPEN_ENTRIES" | jq -c '.[]') + + if [ "$REJECTED" -eq 1 ]; then + echo "rejected=true" >> "$GITHUB_OUTPUT" + fi + exit 0 + - name: Update runner memory if: always() env: SUITE: ${{ matrix.suite }} AUDIT_OUTCOME: ${{ steps.audit.outcome }} run: | - # Always validate state (cache saves regardless of outcome) + # Always validate state before cache save/post-job handling. python3 -c " import json, datetime, os try: @@ -205,7 +645,9 @@ jobs: except (json.JSONDecodeError, FileNotFoundError) as e: print(f'::warning::runner-state.json is invalid ({e}), resetting') state = {'suite': os.environ['SUITE'], 'known_issues': [], 'baselines': {}} - # Only stamp last_run if the audit actually succeeded + # Only stamp last_run if the audit actually succeeded. + # Fix phase manages its own state via attempted_fixes; its outcome + # does not gate last_run. 
if os.environ.get('AUDIT_OUTCOME') == 'success': state['last_run'] = datetime.datetime.now(datetime.timezone.utc).isoformat() state['suite'] = os.environ['SUITE'] @@ -214,13 +656,20 @@ jobs: " - name: Upload agent log - if: failure() + # Always upload: for autonomous PR generation, the most interesting + # failure mode is "the workflow succeeded but the PR was wrong". + # The full event stream is the only way to look back days later. + if: always() uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: name: claude-audit-log-${{ matrix.suite }}-${{ github.run_id }}-${{ github.run_attempt }} path: | /tmp/claude-audit-log.txt + /tmp/claude-fix-log.txt /tmp/audit-${{ matrix.suite }}.md + /tmp/pr-body-${{ matrix.suite }}.md + /tmp/install-dev-verify.log + .agentic-ci-state/runner-state.json retention-days: 14 if-no-files-found: ignore @@ -228,12 +677,42 @@ jobs: if: always() env: SUITE: ${{ matrix.suite }} + AUDIT_OUTCOME: ${{ steps.audit.outcome }} + FIX_OUTCOME: ${{ steps.fix.outcome }} + BACKLOG_SIZE: ${{ steps.backlog.outputs.size }} + SCOPE_REJECTED: ${{ steps.scope_gate.outputs.rejected }} + LOCKFILE_REJECTED: ${{ steps.lockfile_gate.outputs.rejected }} run: | echo "## Daily Audit: ${SUITE}" >> "$GITHUB_STEP_SUMMARY" echo "" >> "$GITHUB_STEP_SUMMARY" + echo "- Audit: \`${AUDIT_OUTCOME:-unknown}\`" >> "$GITHUB_STEP_SUMMARY" + echo "- Fix backlog size: \`${BACKLOG_SIZE:-n/a}\`" >> "$GITHUB_STEP_SUMMARY" + echo "- Fix: \`${FIX_OUTCOME:-skipped}\`" >> "$GITHUB_STEP_SUMMARY" + if [ "${SCOPE_REJECTED}" = "true" ]; then + echo "- Scope gate: \`rejected and closed PR\`" >> "$GITHUB_STEP_SUMMARY" + fi + if [ "${LOCKFILE_REJECTED}" = "true" ]; then + echo "- Lockfile gate: \`rejected and closed PR\`" >> "$GITHUB_STEP_SUMMARY" + fi + echo "" >> "$GITHUB_STEP_SUMMARY" if [ -s "/tmp/audit-${SUITE}.md" ]; then cat "/tmp/audit-${SUITE}.md" >> "$GITHUB_STEP_SUMMARY" else echo "No report generated. 
See the \`claude-audit-log-*\` artifact on failures for the full event stream." >> "$GITHUB_STEP_SUMMARY" fi + + - name: Save rejected gate state + if: always() && (steps.scope_gate.outputs.rejected == 'true' || steps.lockfile_gate.outputs.rejected == 'true') + uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5 + with: + path: | + .agentic-ci-state + graphify-out + key: agentic-ci-state-${{ matrix.suite }}-${{ github.run_id }}-${{ github.run_attempt }}-rejected + + - name: Fail rejected fix gates + if: always() && (steps.scope_gate.outputs.rejected == 'true' || steps.lockfile_gate.outputs.rejected == 'true') + run: | + echo "::error::A post-fix gate rejected and closed the agent PR. Runner memory was saved before failing." + exit 1 diff --git a/.github/workflows/agentic-ci-issue-triage.yml b/.github/workflows/agentic-ci-issue-triage.yml index 7a2dcdbc6..080ace5b7 100644 --- a/.github/workflows/agentic-ci-issue-triage.yml +++ b/.github/workflows/agentic-ci-issue-triage.yml @@ -97,46 +97,77 @@ jobs: continue-on-error: true - name: Fallback post if agent did not post + shell: bash env: GH_TOKEN: ${{ github.token }} TRACKING_ISSUE: ${{ vars.ISSUE_TRIAGE_TRACKING_ISSUE }} run: | - if [ ! -s "/tmp/issue-triage-report.md" ]; then + # Collect report parts. Numbered files (multi-part) take precedence; + # fall back to the unsuffixed file for single-part runs. + shopt -s nullglob + PARTS=(/tmp/issue-triage-report-*.md) + shopt -u nullglob + if [ ${#PARTS[@]} -eq 0 ] && [ -s /tmp/issue-triage-report.md ]; then + PARTS=(/tmp/issue-triage-report.md) + fi + if [ ${#PARTS[@]} -eq 0 ]; then echo "::warning::Triage report not created by agent." exit 0 fi - - # Check if the agent already posted/updated the comment. 
- MARKER="<!-- agentic-ci-issue-triage -->" - EXISTING=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ - --jq "[.[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${MARKER}\"))] | last | .id" \ - 2>/dev/null || echo "") - - REPORT=$(cat /tmp/issue-triage-report.md) - - # Only post if the report marker is not already in a recent comment - # with today's date (agent already posted). + mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) + EXPECTED_PARTS=${#PARTS[@]} + + # Skip fallback only if every expected part identity (i/N) appears + # in a today-dated bot comment. Identity-based, not count-based: + # a duplicate post of one part should not mask a missing other. + # The pre-`capture()` `test()` guard is required: jq `capture()` + # raises an error on non-matching input, which would truncate the + # stream if any unrelated today-dated bot comment lacks the + # triage marker (e.g. from another automated workflow). TODAY=$(date -u +%Y-%m-%d) - if [ -n "$EXISTING" ] && [ "$EXISTING" != "null" ]; then - EXISTING_BODY=$(gh api "repos/${{ github.repository }}/issues/comments/${EXISTING}" --jq '.body') - if echo "$EXISTING_BODY" | grep -q "$TODAY"; then - echo "Agent already posted today's report. Skipping fallback." - exit 0 + SEEN_PARTS=$(gh api "repos/${{ github.repository }}/issues/${TRACKING_ISSUE}/comments" \ + --paginate \ + --jq ".[] | select(.user.login == \"github-actions[bot]\") | select(.body | contains(\"${TODAY}\")) | select(.body | test(\"agentic-ci-issue-triage:[0-9]+/${EXPECTED_PARTS}\")) | .body | capture(\"agentic-ci-issue-triage:(?<i>[0-9]+)/${EXPECTED_PARTS}\") | .i" \ + 2>/dev/null | sort -u) + + MISSING=() + for i in $(seq 1 "${EXPECTED_PARTS}"); do + if ! echo "${SEEN_PARTS}" | grep -qx "${i}"; then + MISSING+=("${i}") fi - # Update existing comment. 
- gh api -X PATCH "repos/${{ github.repository }}/issues/comments/${EXISTING}" \ - -f body="$REPORT" - echo "Updated existing triage comment." - else - gh issue comment "$TRACKING_ISSUE" --body-file /tmp/issue-triage-report.md - echo "Posted new triage comment." + done + + if [ ${#MISSING[@]} -eq 0 ]; then + echo "All ${EXPECTED_PARTS} parts already posted today (identities: $(echo "${SEEN_PARTS}" | tr '\n' ',' | sed 's/,$//')). Skipping fallback." + exit 0 + fi + + if [ -n "${SEEN_PARTS}" ]; then + echo "::warning::Posting only missing parts ${MISSING[*]} of ${EXPECTED_PARTS}; the agent already posted parts $(echo "${SEEN_PARTS}" | tr '\n' ',' | sed 's/,$//')." fi + for i in "${MISSING[@]}"; do + PART="${PARTS[$((i-1))]}" + gh issue comment "$TRACKING_ISSUE" --body-file "$PART" + echo "Posted part ${i}: ${PART}" + done + - name: Write job summary if: always() + shell: bash run: | - if [ -s "/tmp/issue-triage-report.md" ]; then - cat /tmp/issue-triage-report.md >> "$GITHUB_STEP_SUMMARY" + shopt -s nullglob + PARTS=(/tmp/issue-triage-report-*.md) + shopt -u nullglob + if [ ${#PARTS[@]} -eq 0 ] && [ -s /tmp/issue-triage-report.md ]; then + PARTS=(/tmp/issue-triage-report.md) + fi + if [ ${#PARTS[@]} -gt 0 ]; then + mapfile -t PARTS < <(printf '%s\n' "${PARTS[@]}" | sort -V) + for PART in "${PARTS[@]}"; do + cat "${PART}" >> "$GITHUB_STEP_SUMMARY" + echo "" >> "$GITHUB_STEP_SUMMARY" + done else echo "No triage report was generated." >> "$GITHUB_STEP_SUMMARY" fi
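A minimal sketch of the identity-based check that the "Fallback post if agent did not post" step performs, pulled out so it can be run without `gh` or a tracking issue. `missing_parts` is a hypothetical helper name used only for illustration; the workflow inlines this loop and gets the seen identities from the `gh api ... --jq` call rather than from an argument.

```shell
# Hypothetical helper (not defined in the workflow): prints each part number
# in 1..EXPECTED that does not appear in a newline-separated list of part
# identities already seen in today's bot comments.
# The body runs in a subshell so its variables do not leak to the caller.
missing_parts() (
  expected="$1"
  seen="$2"
  for i in $(seq 1 "${expected}"); do
    # -x matches the whole line, so identity "1" does not satisfy "10" or "11".
    if ! printf '%s\n' "${seen}" | grep -qx "${i}"; then
      printf '%s\n' "${i}"
    fi
  done
)

# Part 1 was posted twice and part 3 once; part 2 is still missing.
# A count-based check would see three comments and wrongly skip the fallback.
missing_parts 3 "$(printf '1\n1\n3')"
```

Comparing identities rather than counts is what lets the step post only the missing parts after a partial agent run, instead of re-posting everything or nothing.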