`.agents/recipes/_fix-policy.md`

# Agentic CI Fix Policy

Prepended to every daily-suite recipe alongside `_runner.md`. Defines what
"open a PR" means for these recipes and the rules that apply across all of
them. Each suite recipe declares only its eligible finding categories, its
branch types, and any risk-specific notes; everything else is defined here.

When in doubt, fall back to report-only.

## Localized fix bar

A finding may be converted to a fix only if all of the following hold:

- **Bounded scope**: ≤3 files, ≤50 LOC net.
- **Reversible**: no public API changes, no `__all__` deletions, no version
bumps (Dependabot owns those), no schema changes, no migrations.
- **Self-evident**: the audit established both the problem *and* the unique
correct fix. Mechanical, not interpretive.
- **Test-safe**: when the recipe declares `test_required`, run the
per-package test target for the affected package and abort on failure.
The Makefile does not expose `test-<package>` directly, so use this mapping:

  | Package directory | Test target |
  |-------------------|-------------|
  | `packages/data-designer-config` | `make test-config` |
  | `packages/data-designer-engine` | `make test-engine` |
  | `packages/data-designer` | `make test-interface` |

- **Single concern**: one finding per PR.
- **Allowlisted paths**: matches the suite's path allowlist.

If the top-ranked candidate fails the bar, try the next. If none of the top
5 qualify, skip the fix step and emit report-only.
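The package-to-target mapping is mechanical enough to encode. A minimal Python sketch, assuming changed paths arrive as repo-relative POSIX strings; `TEST_TARGETS` and `targets_for_paths` are hypothetical names, not part of the repo or its Makefile:

```python
from pathlib import PurePosixPath

# Mirrors the mapping table above: package directory -> make target.
TEST_TARGETS = {
    "packages/data-designer-config": "make test-config",
    "packages/data-designer-engine": "make test-engine",
    "packages/data-designer": "make test-interface",
}


def targets_for_paths(changed_paths: list[str]) -> set[str]:
    """Return the make targets covering every changed file.

    Raises ValueError for a path outside the mapped packages, which the
    fix phase would treat as grounds to abandon the candidate.
    """
    targets: set[str] = set()
    for raw in changed_paths:
        # Compare whole path components (packages/<name>) so that
        # "data-designer" never matches "data-designer-config".
        package_dir = "/".join(PurePosixPath(raw).parts[:2])
        if package_dir not in TEST_TARGETS:
            raise ValueError(f"no test target for {raw}")
        targets.add(TEST_TARGETS[package_dir])
    return targets
```

Matching on whole path components keeps `packages/data-designer` from swallowing `packages/data-designer-config`; a plain `startswith` check would get that wrong.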

## Allowlists

### Per-suite path allowlist

| Suite | Paths the recipe MAY modify |
|-------|-----------------------------|
| docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) |
| dependencies | `packages/*/pyproject.toml` |
| structure | `packages/*/src/**/*.py` |
| code-quality | `packages/*/src/**/*.py` |
| test-health | (no fix phase) |

### Shared forbidden paths (all suites)

- `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`,
`.git/**`, anything in `.gitignore`.
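One way to check a changed path against these lists, sketched in Python under stated assumptions: `*` stays within a single directory while `**` may span several (the table does not define glob semantics, so this is an assumption), the `.gitignore` rule and the docs suite's "docstring-only" restriction are not modeled, and all names are hypothetical.

```python
import re

# Per-suite allowlists, copied from the table above.
SUITE_ALLOWLIST: dict[str, list[str]] = {
    "docs-and-references": [
        "architecture/**", "docs/**", "README.md", "CONTRIBUTING.md",
        "DEVELOPMENT.md", "STYLEGUIDE.md", "packages/*/src/**/*.py",
    ],
    "dependencies": ["packages/*/pyproject.toml"],
    "structure": ["packages/*/src/**/*.py"],
    "code-quality": ["packages/*/src/**/*.py"],
    "test-health": [],  # no fix phase
}

# Shared forbidden paths. "pyproject.toml" only matches the repo root
# because patterns are full-matched. `.gitignore` is not modeled; a real
# check could shell out to `git check-ignore`.
FORBIDDEN = [".github/workflows/**", ".agents/**", "pyproject.toml", ".git/**"]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a path glob: `*` stays in one directory, `**` crosses them."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def _matches(path: str, patterns: list[str]) -> bool:
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def path_allowed(suite: str, path: str) -> bool:
    """True if a repo-relative path may be modified by the given suite."""
    if _matches(path, FORBIDDEN):
        return False
    return _matches(path, SUITE_ALLOWLIST.get(suite, []))
```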

### Shared forbidden commands

- `git push --force` (any variant), `git rebase`, `git reset --hard`,
`git branch -D`/`-d`/`--delete`.
- `gh pr merge`, `gh pr close`, `gh pr review`.
- `pip install`, `uv pip install` (use `make install-dev` only).
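A heuristic guard for the forbidden commands, written as a hypothetical Python helper. It inspects only the leading argv tokens, so forms such as `git -C <dir> push --force`, bundled short flags (`-fu`), or `python -m pip install` would slip through; treat it as an illustration of the rule, not an enforcement mechanism:

```python
import shlex


def is_forbidden(command: str) -> bool:
    """Flag commands the fix policy forbids (argv-prefix heuristic only)."""
    argv = shlex.split(command)
    if len(argv) < 2:
        return False
    tool, sub, args = argv[0], argv[1], argv[2:]
    if tool == "git":
        if sub == "push":
            # Any force variant: -f, --force, --force-with-lease, +refspec.
            return any(a == "-f" or a.startswith("--force") or a.startswith("+")
                       for a in args)
        if sub == "rebase":
            return True
        if sub == "reset":
            return "--hard" in args
        if sub == "branch":
            return any(a in ("-D", "-d", "--delete") for a in args)
        return False
    if tool == "gh" and sub == "pr":
        return bool(args) and args[0] in ("merge", "close", "review")
    if tool == "pip" and sub == "install":
        return True
    if tool == "uv" and sub == "pip" and args[:1] == ["install"]:
        return True
    return False
```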

## Runner-state schema

Each daily recipe maintains two arrays in
`{{memory_path}}/runner-state.json` beyond the existing `known_issues` /
`baselines`:

```json
{
  "fix_backlog": [
    { "id": "<hash>", "category": "...", "first_seen": "YYYY-MM-DD",
      "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } }
  ],
  "attempted_fixes": [
    { "id": "<hash>", "attempts": [
        { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD",
          "branch": "agentic-ci/..." }
      ] }
  ]
}
}
```

Also: `draft_until_proven` (boolean, per-suite, default `true` for
code-quality and unset elsewhere) controls draft-PR mode.
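The schema can be written down as Python `TypedDict`s. This is a sketch: the type names and the `load_state`/`draft_mode` helpers are hypothetical, and reading `draft_until_proven` from the top level of each suite's state file (falling back to `true` only for code-quality) is an assumption about where the per-suite flag lives.

```python
import json
from typing import Literal, Optional, TypedDict

Outcome = Literal["open", "merged", "closed", "abandoned"]


class Attempt(TypedDict):
    pr_number: Optional[int]  # null when the recipe never reached PR creation
    outcome: Outcome
    at: str  # YYYY-MM-DD
    branch: str


class AttemptedFix(TypedDict):
    id: str
    attempts: list[Attempt]


class BacklogEntry(TypedDict):
    id: str
    category: str
    first_seen: str  # YYYY-MM-DD
    last_seen: str   # YYYY-MM-DD
    data: dict       # category-specific fields


def load_state(text: str) -> dict:
    """Parse runner-state.json, adding the two fix arrays if missing."""
    state = json.loads(text) if text.strip() else {}
    state.setdefault("fix_backlog", [])
    state.setdefault("attempted_fixes", [])
    return state


def draft_mode(state: dict, suite: str) -> bool:
    # Assumption: the flag sits at the top level of the suite's state file;
    # when unset it defaults to True for code-quality and False elsewhere.
    return bool(state.get("draft_until_proven", suite == "code-quality"))
```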

### `fix_backlog` rules (audit phase populates this)

- Append every detected finding in an eligible category. If `id` is already
present, **refresh both `last_seen` and `data`** with the current scan's
values. The `data` field is used by the fix phase to apply the change
without re-scanning, so stale `data` would let an old plan drive a new
PR after the underlying file moved or changed.
- Drop entries with `last_seen` older than 30 days.
- Cap at 200 entries (drop oldest by `first_seen`).
- Populated **before** the `known_issues` filter so fixable findings persist
even when their report row is suppressed for being unchanged.
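The `fix_backlog` rules translate into two small functions. A minimal Python sketch; function names are hypothetical, dates are assumed to be ISO `YYYY-MM-DD` strings as in the schema, and reading "older than 30 days" as `last_seen` strictly before `today - 30 days` is an assumption:

```python
from datetime import date, timedelta


def upsert_finding(backlog: list[dict], finding_id: str, category: str,
                   data: dict, today: date) -> None:
    """Add a finding, or refresh last_seen *and* data on an existing one."""
    for entry in backlog:
        if entry["id"] == finding_id:
            entry["last_seen"] = today.isoformat()
            entry["data"] = data  # never let a stale plan drive a new PR
            return
    backlog.append({
        "id": finding_id, "category": category,
        "first_seen": today.isoformat(), "last_seen": today.isoformat(),
        "data": data,
    })


def prune_backlog(backlog: list[dict], today: date,
                  max_age_days: int = 30, cap: int = 200) -> list[dict]:
    """Drop entries unseen for more than max_age_days, then cap by first_seen."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [e for e in backlog if date.fromisoformat(e["last_seen"]) >= cutoff]
    # ISO dates sort chronologically as strings; keep the newest `cap`.
    fresh.sort(key=lambda e: e["first_seen"])
    return fresh[max(0, len(fresh) - cap):]
```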

### `attempted_fixes` rules

`outcome` ∈ `{open, merged, closed, abandoned}`.

- `abandoned` means the recipe could not produce a PR (tests failed,
conflict, lint failed, allowlist rejected, etc.).
- Reconcile against open PRs (`gh pr list`) at the start of each fix
run to recover from crashes that left state un-updated. The
reconciliation algorithm: list open PRs whose bodies contain the
`<!-- agentic-ci finding=<id> suite=<suite> -->` marker, parse out
each `<id>`, and back-fill any missing `attempted_fixes` entries with
`outcome: "open"` and the parsed `pr_number` and `branch`.
- Prune: drop `merged` entries older than 90 days. Do **not** prune
`closed` or `abandoned` entries by age, because pruning a single-strike entry
would erase the history needed to ever reach the two-strike threshold.
- The 200-entry cap handles long-tail cleanup. Eviction order:
non-two-strike entries first, oldest-first by `attempts[0].at`.
Two-strike entries (≥2 `closed`/`abandoned`) are exempt from cap
eviction unless every other entry has already been evicted, since they
represent maintainer-action signals and must not be silently
forgotten. If two-strike entries alone exceed 200, that's itself a
signal worth surfacing; in that pathological case, evict oldest-first
by `attempts[0].at`.
- Two-strike entries surface in the report under
`Repeatedly-failed fix attempts` and are filtered from selection
permanently.
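A sketch of the pruning and cap-eviction rules in Python. Assumptions: an entry counts as `merged` when its latest attempt is `merged`, the 90-day age is measured from that attempt's `at`, and `at` values are ISO dates (so they sort correctly as strings). Helper names are hypothetical.

```python
from datetime import date, timedelta

FAILED = ("closed", "abandoned")


def is_two_strike(entry: dict) -> bool:
    """Two or more closed/abandoned attempts mark a maintainer-action signal."""
    return sum(a["outcome"] in FAILED for a in entry["attempts"]) >= 2


def prune_attempts(entries: list[dict], today: date, cap: int = 200) -> list[dict]:
    cutoff = today - timedelta(days=90)

    def stale_merge(e: dict) -> bool:
        # Only merged entries age out; closed/abandoned history is kept.
        latest = e["attempts"][-1]
        return latest["outcome"] == "merged" and date.fromisoformat(latest["at"]) < cutoff

    kept = [e for e in entries if not stale_merge(e)]
    if len(kept) <= cap:
        return kept
    # Evict non-two-strike entries first, oldest first by attempts[0].at;
    # two-strike entries are reached only when nothing else is left.
    order = sorted(kept, key=lambda e: (is_two_strike(e), e["attempts"][0]["at"]))
    evict = {id(e) for e in order[: len(kept) - cap]}
    return [e for e in kept if id(e) not in evict]
```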

## Finding hash

`finding_id = sha1(suite + ":" + canonical_key)[:12]`, where
`canonical_key` uses durable identifiers only β€” never line numbers or free
text:

| Suite (category) | canonical_key |
|------------------|---------------|
| docs (broken-link) | `<source-file>:<target>` |
| docs (docstring-drift) | `<source-file>:<symbol>:<param-or-empty>:<drift-type>` |
| docs (arch-ref-rename) | `<doc-file>:<old-symbol>` |
| dependencies (transitive-gap) | `<package>:<dep>:transitive` |
| dependencies (unused) | `<package>:<dep>:unused` |
| structure (missing-future) | `<source-file>:missing-future` |
| structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` |
| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:<try-body-hash>:<ordinal>:bare-except` |

Symbols use fully-qualified Python names.
`try-body-hash` is `sha1(<try-block body, leading/trailing whitespace
stripped, internal lines preserved>)[:8]`.
`ordinal` is the 1-based position of this bare-except among bare-excepts
in the same enclosing symbol, in source order. Both are needed: the body
hash distinguishes most cases, and the ordinal disambiguates the rare
case of two bare-except blocks with byte-identical try bodies.
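The hash and the bare-except key can be computed with the standard-library `ast` and `hashlib` modules. A minimal sketch, assuming the caller supplies the dotted module name and that the "try-block body" runs from the first body statement's line to the last statement's end line (so a one-line `try: x()` would include the `try:` keyword). Helper names are hypothetical.

```python
import ast
import hashlib


def finding_id(suite: str, canonical_key: str) -> str:
    return hashlib.sha1(f"{suite}:{canonical_key}".encode()).hexdigest()[:12]


def try_body_hash(lines: list[str], node: ast.Try) -> str:
    # Try body from its first to its last statement, outer whitespace
    # stripped, internal lines preserved.
    first, last = node.body[0], node.body[-1]
    body = "\n".join(lines[first.lineno - 1 : last.end_lineno]).strip()
    return hashlib.sha1(body.encode()).hexdigest()[:8]


def bare_except_keys(source_file: str, module: str, source: str) -> list[str]:
    """canonical_key values for every bare `except:` in a module."""
    lines = source.splitlines()
    found: list[tuple[int, str, str]] = []  # (except lineno, symbol, body hash)

    def visit(node: ast.AST, symbol: str) -> None:
        for child in ast.iter_child_nodes(node):
            scope = symbol
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                scope = f"{symbol}.{child.name}"  # fully-qualified name
            elif isinstance(child, ast.Try):
                for handler in child.handlers:
                    if handler.type is None:  # bare `except:`
                        found.append((handler.lineno, symbol, try_body_hash(lines, child)))
            visit(child, scope)

    visit(ast.parse(source), module)
    ordinals: dict[str, int] = {}
    keys = []
    for _, symbol, body_hash in sorted(found):  # source order by except line
        ordinals[symbol] = ordinals.get(symbol, 0) + 1
        keys.append(f"{source_file}:{symbol}:{body_hash}:{ordinals[symbol]}:bare-except")
    return keys
```

Sorting by the `except` line before assigning ordinals keeps the ordinal in source order even when try blocks are nested.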

## Ranking

Earlier criteria override later ones:

1. **Fix confidence** (per-category):

   | Category | Confidence |
   |----------|------------|
   | structure / missing-future | 1.0 |
   | structure / lazy-import | 0.9 |
   | docs / broken-link | 0.9 |
   | dependencies / transitive-gap | 0.85 |
   | docs / arch-ref-rename | 0.8 |
   | dependencies / unused | 0.75 |
   | docs / docstring-drift | 0.75 |
   | code-quality / bare-except | 0.6 |

2. **Defect severity**:

   | Severity | Examples |
   |----------|----------|
   | high | missing transitive dep, heavy import bypassing lazy system |
   | medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API |
   | low | broken link in dev-notes, missing `from __future__ import annotations`, unused dep |

3. **User-facing impact**: visible to docs-site readers or plugin
consumers vs internal-only.

4. **Recency**: newer findings rank above long-standing ones.

Record the chosen finding's id, scores, and rationale at the top of
`/tmp/audit-{{suite}}.md`.
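Because earlier criteria override later ones, the ranking is a lexicographic sort on a tuple. A Python sketch, assuming hypothetical category keys like `structure/missing-future`, a severity label assigned during the audit, and recency measured by `first_seen`:

```python
from dataclasses import dataclass
from datetime import date

CONFIDENCE = {
    "structure/missing-future": 1.0,
    "structure/lazy-import": 0.9,
    "docs/broken-link": 0.9,
    "dependencies/transitive-gap": 0.85,
    "docs/arch-ref-rename": 0.8,
    "dependencies/unused": 0.75,
    "docs/docstring-drift": 0.75,
    "code-quality/bare-except": 0.6,
}
SEVERITY = {"high": 3, "medium": 2, "low": 1}


@dataclass
class Candidate:
    id: str
    category: str      # key into CONFIDENCE
    severity: str      # "high" | "medium" | "low", judged by the audit
    user_facing: bool  # docs-site readers or plugin consumers
    first_seen: date


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Tuples compare element by element, so earlier criteria override later
    # ones; reverse=True puts the best first, including the newest first_seen.
    return sorted(
        candidates,
        key=lambda c: (CONFIDENCE[c.category], SEVERITY[c.severity],
                       c.user_facing, c.first_seen),
        reverse=True,
    )
```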

## Standard fix procedure

The fix phase of every eligible recipe follows these steps. Suite recipes
declare only the parts that vary (eligible categories, branch type,
`test_required`, suite-specific quirks).

1. Reconcile `attempted_fixes` against open PRs (`gh pr list`) to recover
any state lost to a prior crash.
2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or
`merged`; surface two-strike entries in the report's
`Repeatedly-failed fix attempts` section and drop them from selection.
3. Rank the remainder per the Ranking section.
4. For each candidate, up to the top 5:
   1. Re-verify the finding still applies (re-grep / re-read). If not,
      remove it from `fix_backlog` and continue.
   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
      a non-allowlisted path, abandon and continue.
   3. If the category sets `test_required: true`, run the per-package
      test target (see the mapping table in "Localized fix bar" above)
      for the package containing the change. On failure: abandon and
      continue.
   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
      `<type>(agentic-ci): <one-line>`. Push.
   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
      hidden metadata block:
      `<!-- agentic-ci finding=<id> suite=<suite> -->`
   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
      iff `draft_until_proven` is true for the suite.
   7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
   8. Record an `attempted_fixes` entry with `outcome: "open"` and exit.
5. If all 5 candidates were abandoned, append a one-line note to the
report and exit cleanly. The state already reflects the abandonments.

On any failure mid-flow: record `outcome: "abandoned"` for the chosen
finding (with `pr_number: null`), leave any pushed branch in place
(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
to the next candidate.
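Steps 2, 4, and 5 reduce to a filter followed by a bounded loop. A Python sketch; `filter_backlog`, `try_candidates`, and the `attempt_fix` callback (standing in for steps 4.1 through 4.8, returning `True` once a PR is open) are hypothetical:

```python
from typing import Callable, Optional


def filter_backlog(backlog: list[dict],
                   attempted: list[dict]) -> tuple[list[dict], list[str]]:
    """Return (eligible entries, two-strike ids for the report)."""
    history = {e["id"]: e["attempts"] for e in attempted}
    eligible, repeated = [], []
    for entry in backlog:
        attempts = history.get(entry["id"], [])
        if attempts and attempts[-1]["outcome"] in ("open", "merged"):
            continue  # already in flight or already landed
        if sum(a["outcome"] in ("closed", "abandoned") for a in attempts) >= 2:
            repeated.append(entry["id"])  # "Repeatedly-failed fix attempts"
            continue
        eligible.append(entry)
    return eligible, repeated


def try_candidates(ranked: list[dict], attempt_fix: Callable[[dict], bool],
                   limit: int = 5) -> Optional[str]:
    """Try at most `limit` ranked candidates; return the id that got a PR."""
    for entry in ranked[:limit]:
        if attempt_fix(entry):
            return entry["id"]
    return None  # all abandoned: append a one-line note to the report
```

Ranking the eligible entries per the Ranking section before calling `try_candidates` is left to the caller.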

## PR conventions

- **Use `gh pr create --body-file`**, not `/create-pr`. The skill is
interactive-only and shells the body inline; CI needs determinism.
- **Title**: conventional, `<type>(agentic-ci): <one-line>`.
- **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
- **Draft PRs**: `code-quality` opens draft until a maintainer flips
`draft_until_proven` to `false` in runner-state, after at least two
non-draft PRs from that suite have landed clean. This flip is
intentionally manual: it is the sole human-gated promotion step in
the fix policy and must not be automated.
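The naming conventions and the hidden marker are what make crash recovery possible. A Python sketch of building them and back-filling `attempted_fixes` from open PRs; the PR dicts are assumed to look like the output of `gh pr list --json number,headRefName,body`, recording today's date as `at` is an assumption (the PR's real creation date is not fetched), and all names are hypothetical:

```python
import re
from datetime import date

MARKER_RE = re.compile(r"<!-- agentic-ci finding=(\S+) suite=(\S+) -->")


def branch_name(change_type: str, suite: str, day: date, slug: str) -> str:
    return f"agentic-ci/{change_type}/{suite}-{day:%Y%m%d}-{slug}"


def pr_title(change_type: str, summary: str) -> str:
    return f"{change_type}(agentic-ci): {summary}"


def marker(finding_id: str, suite: str) -> str:
    return f"<!-- agentic-ci finding={finding_id} suite={suite} -->"


def reconcile(attempted: list[dict], open_prs: list[dict],
              suite: str, today: date) -> None:
    """Back-fill `open` attempts for PRs whose body carries our marker."""
    by_id = {e["id"]: e for e in attempted}
    for pr in open_prs:
        match = MARKER_RE.search(pr.get("body") or "")
        if match is None or match.group(2) != suite:
            continue  # not an agentic-ci PR, or another suite's
        fid = match.group(1)
        entry = by_id.get(fid)
        if entry is None:
            entry = {"id": fid, "attempts": []}
            attempted.append(entry)
            by_id[fid] = entry
        if any(a.get("pr_number") == pr["number"] for a in entry["attempts"]):
            continue  # state already knows about this PR
        entry["attempts"].append({
            "pr_number": pr["number"],
            "outcome": "open",
            "at": today.isoformat(),
            "branch": pr["headRefName"],
        })
```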

## Atomicity

Each fix-phase invocation produces exactly one of:

- **Report-only**: runner-state updated; no branch, commit, or PR.
- **Report + PR**: same, plus a pushed branch, a commit, and a PR. The
`attempted_fixes` entry is recorded *before* the recipe exits.

No half-states. The runner state is the source of truth for what the
recipe has tried; never silently drop a failed attempt.

The matrix-level concurrency for the daily workflow uses
`cancel-in-progress: false` so a fix in flight cannot be cancelled
between push and PR open. The trade-off is a queued duplicate run if a
manual dispatch arrives while cron is still going; that's preferable to
orphaned branches with no `attempted_fixes` record.

## Workflow-level scope gate

The agent's compliance with the path allowlists and the localized-fix
bar is load-bearing for autonomous PR generation, but the recipe alone
cannot enforce them. The daily workflow runs a post-fix scope gate that
re-derives the per-suite allowlist (mirrored from the table above) and
the diff stats from the pushed branch, then closes the PR and deletes
the remote branch on violation. The gate also flips the
`attempted_fixes` entry from `open` to `abandoned` so two-strike logic
sees the failure. Keep the workflow's allowlist regexes in sync with the
table above; the workflow is the enforcement, the table is the
specification.
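What "re-derives the diff stats" might look like, sketched in Python over `git diff --numstat` output (tab-separated added, deleted, path). Assumptions: "LOC net" is read as added minus deleted in absolute value, binary files are treated as violations, renames are not handled (running git with `--no-renames` avoids `{old => new}` paths), and the two regexes below are hypothetical stand-ins for the workflow's real ones:

```python
import re

MAX_FILES, MAX_NET_LOC = 3, 50

# Illustrative allowlist regexes for two suites only.
ALLOW_RE = {
    "dependencies": [re.compile(r"packages/[^/]+/pyproject\.toml")],
    "structure": [re.compile(r"packages/[^/]+/src/.+\.py")],
}


def scope_violations(numstat: str, suite: str) -> list[str]:
    """Check `git diff --numstat` output against the localized-fix bar."""
    problems: list[str] = []
    files, net = 0, 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        files += 1
        if added == "-" or deleted == "-":
            problems.append(f"binary change: {path}")
            continue
        net += int(added) - int(deleted)
        if not any(r.fullmatch(path) for r in ALLOW_RE.get(suite, [])):
            problems.append(f"path not allowlisted: {path}")
    if files > MAX_FILES:
        problems.append(f"{files} files changed (max {MAX_FILES})")
    if abs(net) > MAX_NET_LOC:
        problems.append(f"net LOC {net} (max {MAX_NET_LOC})")
    return problems
```

An empty result means the branch passes; any entry would trigger the close-and-delete path and the `open` to `abandoned` state flip.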

`.agents/recipes/_phase-audit.md`

## Phase directive

This invocation runs the **AUDIT** phase only.

- Execute the audit steps from the recipe and write the report to
`/tmp/audit-{{suite}}.md`.
- Update `{{memory_path}}/runner-state.json` with detected findings,
including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE
applying the `known_issues` filter to the report, so fixable findings
persist across runs even when their report row is suppressed).
- Do NOT attempt any fix. Do NOT create any branches, commits, or PRs.
- Do NOT modify any files outside `{{memory_path}}/` and the report file
`/tmp/audit-{{suite}}.md` itself.
- A separate invocation will run the FIX phase if `fix_backlog` has
eligible candidates and the suite has a fix phase.
- Read the recipe in full for context; the "Fix phase" section informs
which finding categories should populate `fix_backlog`, but you must
not act on them in this invocation.

`.agents/recipes/_phase-fix.md`

## Phase directive

This invocation runs the **FIX** phase only.

- The audit phase has already completed in a previous invocation. Its
report is at `/tmp/audit-{{suite}}.md` and
`{{memory_path}}/runner-state.json` has the populated `fix_backlog`.
- Execute only the recipe's "Fix phase" section per `_fix-policy.md`.
Do NOT redo audit work; that is, do NOT re-scan whole packages or
rebuild `fix_backlog` from scratch. The "no re-scan" rule does NOT
override the per-candidate re-verification step required by
`_fix-policy.md` §"Standard fix procedure" step 4.1: when you pick a
candidate, you MUST re-grep / re-read the specific file or symbol it
points at to confirm the finding still applies before editing.
Re-verification of a single candidate is required; re-scanning the
codebase to discover new findings is forbidden.
- Pick the highest-ranked eligible candidate from `fix_backlog`, apply
the fix, run the package's tests if applicable, commit, push, and open
the PR using `gh pr create --body-file`.
- Record the attempt in `attempted_fixes` (whether successful, abandoned,
or failed through the top-5 fallback) before exiting.
- If no candidate qualifies after trying up to 5 of them, exit cleanly,
append a short note to `/tmp/audit-{{suite}}.md` describing what was
tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
- Do NOT delete branches, even on failure (per `_runner.md` and
`_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow
to reap over time.
- Read the recipe in full for context, but treat the audit phase as
already done.

`.agents/recipes/_runner.md`

Write all output to a temp file (e.g., `/tmp/recipe-output.md`). The workflow
will handle posting it. Do not post directly to GitHub; the workflow controls
output routing.

If your recipe produces code changes, commit them on a new branch following
the pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}` where `{type}`
matches the change kind (`chore`/`docs`/`fix`/`refactor`).

For PR creation in CI, use `gh pr create --body-file /tmp/pr-body-<suite>.md`
directly rather than the `/create-pr` skill. The skill assumes an interactive
session (it can prompt about uncommitted changes, base branch, etc.) and
shells the body inline, which breaks on backticks and special characters.
Daily-suite recipes that open PRs are governed by `_fix-policy.md`; read it
for the full PR contract (allowlists, draft mode, hidden metadata, branch
naming, atomicity).