diff --git a/.claude/commands/blame-context.md b/.claude/commands/blame-context.md new file mode 100644 index 0000000000..41e0a3a8cb --- /dev/null +++ b/.claude/commands/blame-context.md @@ -0,0 +1,52 @@ +--- +description: Recover historic context for a file:line via `git blame` plus the squash-PR pivot. Wraps `docs/skills/git-blame-historic-context.md` end-to-end and prints the strengthens / weakens / inverts decision so the caller can weight a finding before flagging suspicious-looking code. Read-only. +--- + +Look up historic context for: $ARGUMENTS + +## Parse $ARGUMENTS + +Accept any of: + +- `path:line` — single line. e.g. `crates/driver/src/.../settlement.rs:444` +- `path:start-end` — line range. e.g. `crates/driver/src/.../settlement.rs:434-447` +- `path line` or `path start-end` — space-separated form (handy when paths contain `:`). + +If `$ARGUMENTS` is empty or unparseable, print the usage block and abort: + +``` +Usage: /blame-context : + /blame-context :- +``` + +## Procedure + +Follow `docs/skills/git-blame-historic-context.md` end-to-end. Concretely: + +1. `git blame -L , -- ` to find the originating commit. +2. If the surface looks like a wholesale move/refactor (same author across many lines, recent date, identical hashes), retry with `git blame -w -C -C -C -L , -- ` to recover the real authoring commit. +3. `git log -1 --format='%s%n%b' ` for the commit body. +4. If the subject ends with `(#NNNN)`, pivot to the PR conversation: `gh pr view ` (the gh CLI infers the repo from the working directory). The PR body is usually richer than the squash commit alone. + +Then print the report below. + +## Output + +``` +─── Blame for : + + +─── Originating commit / PR + + +─── Decision + the suspicion that this code is unusual. + +Action: +``` + +## Rules + +- **Read-only.** `git blame`, `git log`, `git show`, `gh pr view`, `gh api` GET-only. No `git commit`, no `git checkout`, no mutating `gh api` verbs. +- **Don't comment on the PR.** Don't edit files. The caller owns what to do with the decision — this command just supplies the evidence. +- **Don't invent a decision.** If blame surfaces nothing useful (line is brand new, generator-output, vendored), print `Decision: insufficient signal` and explain why. diff --git a/.claude/commands/pr-synthesis.md b/.claude/commands/pr-synthesis.md new file mode 100644 index 0000000000..f813adbc6b --- /dev/null +++ b/.claude/commands/pr-synthesis.md @@ -0,0 +1,76 @@ +--- +description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Read-only. +--- + +Synthesise PR: $ARGUMENTS + +## Parse $ARGUMENTS + +Accept any of: + +- A PR number: `4267`/`#4267` +- A full URL: `https://github.com/cowprotocol/services/pull/4267` +- An `owner/repo#N` form: `cowprotocol/services#4267` + +Default `owner/repo` to `cowprotocol/services` when only a number is given. + +If `$ARGUMENTS` is empty or unparseable, print: + +``` +Usage: /pr-synthesis + /pr-synthesis https://github.com/owner/repo/pull/ + /pr-synthesis owner/repo# +``` + +and abort. + +## Procedure + +Fetch in this order — context first, then diff. Lets the synthesis read the diff with the author's intent already in mind, and lets `gh pr view` report `additions`/`deletions` so the diff fetch can be sized appropriately. + +**Step 1 — PR metadata + closing issues.** Cheap; bounded size. + +```bash +gh pr view -R / --json title,body,files,additions,deletions,labels,baseRefName,headRefName,closingIssuesReferences +``` + +**Step 2 — linked issues.** For each entry in `closingIssuesReferences` (GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N`, more reliable than regexing the body, and follows cross-repo references correctly), fetch the issue: + +```bash +gh issue view -R / --json title,body,labels,state +``` + +If `closingIssuesReferences` is empty, proceed without a linked issue — never invent one. + +**Step 3 — diff fetch (size-gated).** Use `additions + deletions` from step 1 to decide: + +- **`additions + deletions <= 2000`** — fetch the full diff: + + ```bash + gh pr diff -R / + ``` + +- **`additions + deletions > 2000`** — *do not* fetch the full diff. It can blow the context window (e.g. PR #4217 = 376k lines). Get the complete per-file list via the paginated REST endpoint — `gh pr view --json files` silently caps at 100 files, which underreports scope on big PRs: + + ```bash + gh api --paginate "repos///pulls//files" \ + --jq '.[] | {filename, status, additions, deletions}' + ``` + + Build `` from these per-file records (`filename`, `additions`, `deletions`, `status` — added / modified / renamed / removed). Bucket lockfiles, generated bindings, vendored artifacts, and codegen output — call them out as a single line each, not file-by-file. State in the synthesis that the diff was summarised at file-scope only. + +Build: + +- `` = full hunks (small PR) or per-file scope buckets (large PR). The ground truth. +- `` = title + body +- `` = fetched issue(s), if any + +Then follow `docs/skills/pr-context-synthesis.md` Rules and Shape. Output the synthesis verbatim — no header, no metadata, no separator lines. The caller pastes this directly into a review thread, an incident report, or a Slack message. + +## Rules + +- **Read-only.** `gh pr view`, `gh pr diff`, `gh issue view`, `gh api` GET-only. No `gh pr review`, no `gh pr comment`, no mutating `gh api` verbs. +- **Don't fetch a diff you can't read.** The 2000-line gate above is a hard preflight, not a suggestion. Above the gate, the per-file scope is enough to write the synthesis honestly; trying to swallow a 100k-line diff just truncates the context and corrupts the output silently. +- **Synthesise, don't copy-paste.** If `` is sparse, say so plainly: *"description is minimal; intent inferred from diff"*. +- **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism. The anti-vague-verb rule from the skill doc applies verbatim. +- **Watch for description-vs-diff drift.** `` must describe ``'s current state. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*. Fetching the issue + metadata before the diff is what lets you spot these — read the intent first, then check whether the diff matches. diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md new file mode 100644 index 0000000000..1777b3b353 --- /dev/null +++ b/.claude/commands/review-pr.md @@ -0,0 +1,135 @@ +--- +description: Produce a structured PR review for cowprotocol/services. Invoked locally as `/review-pr` (diff mode, against current branch vs main) or `/review-pr ` (PR mode). Same command also runs in CI via `.github/workflows/claude-code-review.yml`, where it posts a single review comment instead of printing to terminal. Read-only in local modes; the user posts any comments manually. +--- + +Review PR: $ARGUMENTS + +Follow the instructions in `./docs/COW_PR_REVIEW_SKILL.md` to produce the review report. + +## Prologue (execute in order; abort on any failure) + +### 1. Detect mode + +The skill runs in one of three modes. Detection: + +- If the environment variable `$GITHUB_ACTIONS == "true"` → `mode = "pr-ci"`. + - `$ARGUMENTS` MUST be a PR number, URL, or `owner/repo#N` form (the workflow passes it). +- Else if `$ARGUMENTS` is non-empty → `mode = "pr-local"`. Parse the argument (see [step 2](#2-parse-arguments-pr-modes-only)). +- Else (`$ARGUMENTS` empty, not in CI) → `mode = "diff"`. No `gh` calls needed; source is `git diff $(git merge-base HEAD origin/main)..HEAD` (the actual command runs in step 3 below, after fetching `origin/main`). + +### 2. Parse $ARGUMENTS (PR modes only) + +*(Skip in `diff` mode.)* + +Accept any of: + +- A PR number: `4267` +- A full URL: `https://github.com/cowprotocol/services/pull/4267` +- An `owner/repo#N` form: `cowprotocol/services#4267` + +Default `owner/repo` to `cowprotocol/services` when only a number is given. + +Extract: ``, ``, ``. + +If unparseable, print and abort: + +``` +Usage: /review-pr # diff mode (current branch vs main) + /review-pr # PR mode + /review-pr https://github.com/owner/repo/pull/ + /review-pr owner/repo# +``` + +### 3. Diff-mode preflight + +*(Only in `mode == "diff"`.)* + +Run: + +```bash +git fetch origin main --quiet +BASE=$(git merge-base HEAD origin/main) +git diff --stat "$BASE..HEAD" +``` + +If `git diff "$BASE..HEAD"` is empty, print `No diff vs main — nothing to review.` and exit clean (not an error). + +There is **no** clean-tree check, **no** rebase, and **no** `git pull` of main. Diff scope comes from the fetched `origin/main` merge-base. + +### 4. PR-mode preflight + +*(Only in `mode == "pr-local"` or `mode == "pr-ci"`.)* + +#### 4a. Working tree (pr-local only) + +Run `git status --porcelain`. If non-empty, print it plus: + +``` +Working tree is dirty. Stash or commit your changes, then re-run. + + git stash # temporary + git stash pop # to restore later +``` + +Then **abort**. Never auto-stash. + +#### 4b. Save the current branch (pr-local only) + +Save the current branch name to `` so the report's NEXT STEPS footer can suggest `git switch ` when the review is done. + +#### 4c. Fetch base ref + +Run `git fetch origin --quiet`. This makes the diff comparable to base without rebasing or modifying the user's branch. + +#### 4d. Checkout the PR + +Run `gh pr checkout -R /`. Failure handling: + +- **`gh` not installed** → print `Install gh: https://cli.github.com/` and abort. +- **Auth error** → print `gh auth status` output plus `Run: gh auth login` and abort. +- **PR doesn't exist / wrong repo** → surface `gh`'s error verbatim and abort. +- **Fork without checkout permission** → switch to **degraded static-diff mode**: + - Set `mode_qualifier = "degraded static-diff"`. + - In the reference doc's §2, replace `gh pr diff ` with `gh pr diff --patch -R /`. + - Flag the qualifier in the report header's `Mode:` line. +- **Any other error** → surface verbatim and abort. + +In `pr-ci`, the workflow has already checked out the PR branch — skip 4d and just verify `HEAD` matches the expected ref. + +### 5. Optional-tooling probe + +Detect which optional accelerators are available in the current session: Serena MCP (`mcp__plugin_serena_serena__*`) and `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`). + +Build a `loaded_context` list of whichever ones resolved. Pass it through to the reference doc; it prints verbatim in the report header's `Loaded context:` line. + +Do not abort if a skill is missing. Do not print install banners. + +### 6. Noise filter (before handoff) + +Classify each changed file: + +**Review surface (read fully):** + +- Anything under `crates/*/src/**/*.rs` (excluding `contracts/generated/**`). +- `crates/e2e/tests/**/*.rs`. +- `contracts/solidity/**/*.sol` (authored Solidity). +- Config files: `**/openapi.yml`, `database/sql/**`, `configs/**`, semantically interesting `*.toml`. + +**Noise (skip or skim):** + +- `Cargo.lock` (any). CI validates the resolution; reviewing the lockfile diff is high-cost, low-signal. +- `contracts/generated/**` and `contracts/artifacts/**` — machine-generated bindings and ABI JSON. +- Auto-generated `Cargo.toml` entries from contract-binding crates. +- Binary fixtures, ABI blobs, snapshot files. + +Report the filter in the report's `Scope:` line as `+X −Y across Z files (~N LOC human-written; rest generated/lockfile, filtered)`. + +### 7. Handoff + +Read `docs/COW_PR_REVIEW_SKILL.md` and follow it from §2 (Metadata Fetch) onward, passing through: + +- `mode` (`diff` / `pr-local` / `pr-ci`) and `mode_qualifier` if set. +- ``, ``, `` (PR modes only). +- `prior_branch` (`pr-local` only). +- `loaded_context` (optional skills detected in step 5). +- The noise-filter classification from step 6 — the reference doc uses it to decide what to read. diff --git a/.github/workflows/claude-code-review.yml b/.github/workflows/claude-code-review.yml new file mode 100644 index 0000000000..8623219db7 --- /dev/null +++ b/.github/workflows/claude-code-review.yml @@ -0,0 +1,34 @@ +name: Claude Code Review + +on: + pull_request: + types: [ready_for_review] + +jobs: + claude-review: + if: ${{ !github.event.pull_request.draft }} + runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: write + issues: read + id-token: write + + steps: + - name: Checkout repository + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + with: + # Full history is needed so the skill's `git blame`-based + # historic-context step can resolve any line in the diff. + fetch-depth: 0 + + - name: Run Claude Code Review + id: claude-review + uses: anthropics/claude-code-action@e0f2d99545298b87c2f984ab534af3a6534142ae # v1 + with: + anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} + # Invoke the in-repo slash command. The skill detects + # $GITHUB_ACTIONS=true and switches to `pr-ci` mode, posting + # a single consolidated review comment instead of streaming + # to a terminal. Skill source of truth: docs/COW_PR_REVIEW_SKILL.md + prompt: '/review-pr ${{ github.repository }}#${{ github.event.pull_request.number }}' diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md new file mode 100644 index 0000000000..7a6d7855ee --- /dev/null +++ b/docs/COW_PR_REVIEW_SKILL.md @@ -0,0 +1,396 @@ +# CoW Services PR Review Skill + +This document instructs Claude how to produce a PR review for cowprotocol/services. It is invoked by `.claude/commands/review-pr.md` (locally) or by `.github/workflows/claude-code-review.yml` (in CI). One skill, three operating modes. + +At the point this document is read, the entry-point has already determined: + +- **`mode`** — one of: + - `diff` — local, no PR yet. Source is `git diff $(git merge-base HEAD origin/main)..HEAD` (the whole feature-branch worth of work, computed against the freshly-fetched remote `main` to avoid a stale local `main`). No PR metadata, no `gh` calls. Output to terminal. + - `pr-local` — local, PR exists. Source is `gh pr diff ` plus PR metadata. Output to terminal. + - `pr-ci` — running inside `.github/workflows/claude-code-review.yml`. Source is `gh pr diff ` plus PR metadata. Output is a single review comment posted to the PR. +- **``, ``, ``** (only in `pr-local` / `pr-ci`). +- **`prior_branch`** (only in `pr-local` — the branch to print at the end). + +The mode shapes which steps run and how the report is delivered, but the *content* of the review is the same in all three. + +--- + +## Core Principles (read before executing) + +- **Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean. The goal is not to maximise finding count — it is to be worth a senior reviewer's attention. +- **Local modes never post to GitHub.** In `diff` and `pr-local`, output is strictly terminal. No `gh pr review`, no `gh pr comment`. The user posts whatever they choose. In `pr-ci`, post exactly one consolidated review comment — no per-line spam. +- **Code is the primary source of truth.** `CLAUDE.md`, design docs in `docs/`, and this skill's own sibling docs can go stale. When a finding turns on *"X is called from Y"* or *"this field is read by Z"*, verify with `git grep` / `rg` / LSP — not by citing a doc. +- **Inverted: this PR can make existing docs / comments / its own description stale.** If a code change makes a comment, a `docs/` page, or the PR's own description no longer match the diff's current state, that is itself a finding (`Action:` → update X). +- **`git blame` before flagging code that looks unusual.** Often code looks weird because it had to. Before suggesting a "cleanup", blame the affected lines, read the originating commit message and (if any) linked PR. A comment that says *"this looks accidental, did you mean X?"* without that step risks asking the author to undo a hard-won fix. +- **Explain, don't just flag.** Each finding must give the reviewer enough context to understand *and defend* the point — not just forward AI-generated text. +- **One framing per finding:** end with either `Action:` (concrete task) or `Question:` (clarification needed). Never both. +- **Token discipline.** Don't read whole files when grep or LSP suffices. Build a codemap before reading file bodies. + +--- + +## Universal Guardrails + +Apply these as the default lens for every change. Pull in CoW-specific siblings ([§3](#3-conditional-context)) only when the diff warrants them. + +1. **Keep the public API surface minimal.** A new `pub fn`, `pub struct`, or `pub mod` that isn't required by an external caller is a Medium finding asking why it isn't `pub(crate)` / `pub(super)` / private. Smaller surface = fewer downstream breakages = freer refactor for the next person. +2. **Avoid rightward drift.** Code that's deeply nested (4+ levels, especially `match` inside `if let` inside `for` inside `async`) is hard to read and usually hides a missing extraction. Suggest an early-return, a helper, or `let-else`. +3. **One responsibility per component.** A function, struct, or module that does two unrelated things (validates *and* persists; parses *and* renders) is harder to test and to reuse. Flag with a `Question:` if you're not sure the split is artificial. +4. **Split big files.** A new file pushing past ~500 lines, or a touched file growing past ~1000 lines, is worth flagging. Suggest a sensible split (often by responsibility from #3). +5. **Avoid argument bloat.** A function taking 6+ positional arguments is a code smell — usually missing a config struct, a builder, or a method on a context object. Especially flag if the arguments are mostly being threaded through unchanged. +6. **Errors carry context.** `?` propagating a low-level error to a high-level boundary without enrichment loses the *what was the caller trying to do* information. `anyhow!("{err}")` flattens cause chains. Both are findings; severity depends on the path. + +--- + +## Execution Flow + +Steps run in this order. `diff` mode skips PR-metadata steps; `pr-ci` swaps the report sink at the end. + +1. Fetch PR metadata and linked issue(s) — [§2](#2-metadata-fetch). *(`pr-local` / `pr-ci` only.)* +2. Triage prior review comments — [§2.5](#25-prior-comment-follow-up). *(`pr-local` / `pr-ci` only; further skip conditions in §2.5.)* +3. Classify diff paths and load conditional context — [§3](#3-conditional-context). +4. Build a targeted codemap — [§4](#4-codemap-phase). +5. Synthesize the context block — [§5](#5-context-synthesis). +6. Review and produce findings — [§6](#6-review-and-severity). +7. Emit the report — [§7](#7-report-templates). Sink depends on mode. +8. Offer verification (background) — [§8](#8-verification-offer). *(Local modes only.)* +9. Print cleanup hint — [§9](#9-cleanup). *(`pr-local` only.)* + +Error behaviour is consolidated in [§10](#10-error-playbook). + +--- + +## 2. Metadata Fetch + +*(Skip in `diff` mode — there is no PR yet.)* + +Run in **parallel** (single message, multiple Bash tool calls): + +```bash +gh pr view -R / \ + --json title,body,author,labels,files,baseRefName,headRefName,headRefOid,additions,deletions,commits,reviewDecision,isDraft,state,closingIssuesReferences + +gh pr diff -R / +``` + +In **degraded static-diff mode** (fork without checkout permission, only relevant in `pr-local`), replace `gh pr diff` with: + +```bash +gh pr diff --patch -R / +``` + +### Linked issues + +The `gh pr view` call above already returns `closingIssuesReferences` — GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N` from the PR body. Iterate that array (in parallel with the diff fetch) and pull each issue: + +```bash +gh issue view -R / --json title,body,labels,state +``` + +If `closingIssuesReferences` is empty, proceed without one. Do not manufacture one. + +### State handling + +- `state == "CLOSED"` or `"MERGED"` → proceed; prepend a one-line warning to the report. +- `isDraft == true` → proceed; prepend `Draft — author may still be iterating.` + +--- + +## 2.5 Prior-comment follow-up + +Skip when any of these holds — no output, the section is omitted from the report: + +- `mode == diff` (no PR yet). +- No prior human inline reviews on the PR. +- For every human reviewer, their latest review's `commit_id` already equals `` — no new commits since their last round. + +Otherwise, for each prior inline comment from a human reviewer (use `gh api repos///pulls//reviews` and `/comments`, GET only), surface one entry in the "Prior-comment follow-up" block ([§7](#7-report-templates)). Cite the new code at `:` that addresses it, the author's reply, or note that nothing has changed since ``. Use stable identifiers `[A]`, `[B]`, ... so findings raised in this review can chain context with `Re: [A]`. + +**Be conservative.** When the diff doesn't make the answer obvious, say so explicitly — false *"addressed"* tricks the reviewer into closing a thread that should stay open. Don't infer satisfaction from emoji reactions or short acknowledgements; the reviewer makes that call. + +**Read-only.** No `gh pr review`, no `gh api` mutating verbs (`POST`/`PATCH`/`DELETE`), no comment-resolution endpoints. The reviewer resolves threads, not you. + +--- + +## 3. Conditional Context + +For each changed file, evaluate this table and accumulate a `context_docs` list. Read each matched doc once. + +| Match | Load | +|---|---| +| Any file under `database/sql/**` | `docs/review-context/database-migrations.md` | + +Add a new sibling only when: + +- A real review surfaced a CoW-specific concern the AI consistently missed, **and** +- That concern can't reasonably be inferred from the [Universal Guardrails](#universal-guardrails) plus general Rust judgment, **and** +- It can be expressed as a tight checklist (≤30 lines), not a sprawling rulebook. + +### One inline guardrail worth keeping + +When a PR touches **any `openapi.yml`**: scan for breaking changes (removed/renamed/typed-changed fields, new required request fields, narrowed enums, changed auth or HTTP method). If any are present, ask whether the goal could be achieved non-breakingly (additive field, new optional, deprecation window) and whether the affected consumer teams (Frontend, SAFE, etc.) have been notified. Severity: High when undisclosed in the PR description; Medium otherwise. + +--- + +## 4. Codemap Phase + +**Purpose:** before reading file bodies, map the symbols the diff touches, their callers, and their call sites. A codemap turns a 1000-line diff into a ~20-line mental model and catches findings that only become visible at the *shape* level (caller-count inconsistencies, dead abstractions, leaky public APIs). + +### What to map + +For each non-trivial symbol the diff adds, modifies, or deletes: + +1. **New public types / traits / functions** — fields, methods, signatures. +2. **Modified function signatures** — caller count, were all sites updated? +3. **New trait impls** — which types implement the trait? Is the trait used outside this PR? +4. **Error-type changes** — where do callers match on this error? + +### Tools (cheapest viable option first) + +| Tool | Status | When to use | +|---|---|---| +| `Grep` / `rg` with `-n` on a symbol name | Always available | Caller counts and basic location lookups. Example: `rg 'OrderValidator::new\b' crates/`. | +| `git blame` / `git log` | Always available | Historic context on suspicious-looking lines. Local, read-only. | +| `gh api` (read-only verbs) | Always available | Reading individual file contents at a specific ref (e.g. degraded static-diff mode in §4). **Never** use mutating verbs (`POST`/`PATCH`/`DELETE`) — review skill is read-only. | +| `mcp__plugin_serena_serena__find_symbol` / `find_referencing_symbols` / `get_symbols_overview` | Optional (LSP-backed; available when Serena MCP is configured) | Precise location + kind + signature without reading the full file. | +| Skills from `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Optional (installed via `npx skills add actionbook/rust-skills`) | Richer cross-crate analysis. Not present in CI by default. | +| `Read` of a full file | Last resort | Only when the diff hunks plus the cheaper tools don't pin down what you need. | + +**Fallback rule:** in CI and in `pr-local` (full checkout), every codemap step still works with `rg` and `git` against the local working tree. The optional tools are accelerators, not requirements. + +**Exception — degraded static-diff mode.** When `pr-local` falls back to static-diff (a fork the user can't `gh pr checkout`), the working tree is the *prior* branch, not the PR's. `rg crates/`, `Read` of repo files, and `git blame` on the working tree all reflect the wrong code. In this mode: + +- Take the diff content from `gh pr diff --patch -R /` (already arranged in [`commands/review-pr.md` §4d](../.claude/commands/review-pr.md)). +- Read individual files at the PR's head SHA via `gh api /repos///contents/?ref=` instead of `Read`. +- Skip `git blame`-based historic context for lines that only exist on the PR branch (the local repo doesn't know about them); blame against `origin/main` is still fine for *unchanged* surrounding code. +- Caller-count searches via `rg` are usable only for symbols that exist on `origin/main`; new symbols introduced by the PR must be looked up via the patch and `gh api`. + +The report header's `Mode:` line already flags `degraded static-diff` — a reviewer reading the output knows which constraint applied. + +### What the codemap produces + +A short block in the report that looks like: + +``` +Codemap +─────────────────────────────────────────────────────────── +New symbols (crate::module): + (kind, key fact) + ... + +Callers of : sites ( real, test). All updated ✓ +``` + +This is the raw material §5 (synthesis) and §6 (findings) work from. + +### When to skip + +Trivial PRs (docs-only, single-line bump, pure test addition) — skip. Pure refactors with no added public API — skim. Everything else — do it. + +--- + +## 5. Context Synthesis + +Follow the [`pr-context-synthesis`](skills/pr-context-synthesis.md) skill with `` = PR title + body from [§2](#2-metadata-fetch), `` = the fetched issue (if any, also from §2), and `` = the [§4](#4-codemap-phase) codemap plus the changed file list. + +--- + +## 6. Review and Severity + +Apply, in order: + +1. The [Universal Guardrails](#universal-guardrails). +2. The conditional context from [§3](#3-conditional-context), if any was loaded. +3. CoW-services conventions from `CLAUDE.md`. +4. Optional accelerators from [§4 → tools table](#tools-cheapest-viable-option-first) when available — Serena MCP for symbol lookups, `actionbook/rust-skills` for caller / trait / structural analysis. If they aren't installed, reason from `rg` + general Rust knowledge plus the [Universal Guardrails](#universal-guardrails). + +### Historic context + +Before flagging code that looks unusual, redundant, or "easy to clean up", follow the [`git-blame-historic-context`](skills/git-blame-historic-context.md) skill and factor what blame reveals into the finding's Explanation. + +### Severity Rubric + +| Severity | Meaning | +|---|---| +| **High** | Merging as-is is a real risk: correctness bug, data loss, security issue, incompatible DB migration, auction/settlement invariant broken, likely panic, unsound `unsafe`. | +| **Medium** | Worth fixing before merge — won't break prod but will cost later. Missing error context, public-API ergonomics, unhandled edge case, n-1 rollout incompatibility, undisclosed breaking API change. | +| **Small / QoL** | Would genuinely improve the code. **Not a nit.** | + +### Anti-nit Rule (mandatory) + +- If the only reason to change it is taste or stylistic preference, **do not report it**. +- Formatting belongs to `rustfmt` / CI. Never raise a finding whose fix is "run `cargo +nightly fmt`". +- Clippy lints are a CI concern. Don't surface them as findings — CI flags them automatically. If the new code reads poorly, raise a Small finding on the actual readability issue (Universal Guardrails #2/#3 cover most cases). +- If you're uncertain whether something is a nit, omit it. LGTM is the right verdict when the PR is clean. +- **Don't inflate severity** to look thorough. Each finding's severity is what a senior reviewer would actually call it on GitHub. Either it's worth discussing or it's omitted. + +### Per-finding shape + +1. **Title** — short noun phrase (≤ 8 words). +2. **Location** — `path/to/file.ext:line` or `path:start-end`. +3. **Explanation** — mechanism, impact, and (if relevant) what `git blame` / the codemap revealed. +4. **`Action:` OR `Question:`** — exactly one. + +When a finding has a clear mechanical fix, phrase the `Action:` so the reviewer can paste it as a GitHub *Suggested change* block. + +--- + +## 7. Report Templates + +### Terminal form (`diff` and `pr-local`) + +``` +═══════════════════════════════════════════════════════════ +PR # (or "Diff review — <branch>" in diff mode) +═══════════════════════════════════════════════════════════ +Author: @<author> (omit in diff mode) +Scope: +<add> −<del> across <N> files + (~X LOC human-written; rest generated/lockfile, filtered) +Labels: <labels or "—"> (omit in diff mode) +Base/Head: <baseRef> ← <headRef> (or "main ← <branch>" in diff mode) +Linked issue: #<N> — <title> (omit if none) +Mode: diff | pr-local | pr-ci ; full checkout | degraded static-diff + +Codemap +─────────────────────────────────────────────────────────── +<from §4 — omit if skipped> + +Prior-comment follow-up — @<reviewer> at <prior_sha_short> +─────────────────────────────────────────────────────────── +[A] <path>:<original_line> + Asked: <≤12-word recap of what was asked> + Status: ✓ Addressed | 💬 Discussion needed | ⏳ Pending + ⚠ Silently dropped | 🚫 Moot | ❓ Unclear + Cite: <path>:<line at HEAD>, or "Author replied: ...", or + "No change since <prior_sha_short>" + +Sort what-needs-attention first: Discussion needed → Pending → +Silently dropped → Unclear → Addressed → Moot. + +(Omit the whole block if §2.5 skipped.) + +─────────────────────────────────────────────────────────── +CONTEXT +─────────────────────────────────────────────────────────── +<synthesis from §5> + +─────────────────────────────────────────────────────────── +VERDICT: <LGTM | Changes requested | Needs clarification> +─────────────────────────────────────────────────────────── + +FINDINGS + [High] <count> + [Medium] <count> + [Small] <count> (QoL-only; nits omitted) + +───── High ───────────────────────────────────────────────── +1. <title> + Location: <path>:<line> + + <explanation> + + Action: <task> OR + Question: <question> + +───── Medium ─────────────────────────────────────────────── +<same shape> + +───── Small / QoL ───────────────────────────────────────── +<same shape> + +─────────────────────────────────────────────────────────── +VERIFICATION (only if user opted in) +─────────────────────────────────────────────────────────── +cargo check <status> +cargo clippy <status> +cargo +nightly fmt <status> + +─────────────────────────────────────────────────────────── +NEXT STEPS (pr-local only) +─────────────────────────────────────────────────────────── +Currently on: <current branch> +Return with: git switch <prior_branch> +``` + +#### LGTM short form + +When there are zero findings at all severities, collapse everything between `VERDICT:` and `NEXT STEPS` to a single line: + +``` +VERDICT: LGTM — no blocking or notable issues. +``` + +Header, CONTEXT, and (in `pr-local`) NEXT STEPS still print. + +### Comment form (`pr-ci`) + +In CI, post **one** review comment via the action. Use the same body shape as the terminal form, minus: + +- The NEXT STEPS section (no local branch state to print). +- The VERIFICATION block (CI runs its own checks separately). +- ANSI box-drawing characters (Markdown headings instead). + +GitHub-render the body using `##`/`###` headings and fenced code blocks. Keep findings collapsible with `<details>` if there are more than ~5. + +### Verdict selection + +- **LGTM** — no High or Medium findings. +- **Changes requested** — any High, or Medium that blocks safety/correctness. +- **Needs clarification** — no High, but one or more findings use the `Question:` form. + +--- + +## 8. Verification Offer + +*(Local modes only. Skip in `pr-ci` — CI runs its own check/clippy/fmt jobs in parallel.)* + +After printing the report, ask the user: + +> "Run local verification in the background? Reply with one of: +> `check` — `cargo check --locked --workspace --all-features --all-targets` +> `clippy` — `cargo clippy --locked --workspace --all-features --all-targets -- -D warnings` +> `fmt` — `cargo +nightly fmt --all -- --check` +> `all` — run all three +> `skip` — don't run anything" + +Dispatch each selected command as a **background** Bash invocation. On completion, append the result to a VERIFICATION block. + +If `cargo check` surfaces **compile errors inside files this PR modified**, fold them into Findings as **High**. `cargo clippy` and `cargo +nightly fmt --check` results are status-only — record them in the VERIFICATION block, never as findings. CI gates on both, so duplicating them in the review report is noise. + +Tests are intentionally not in the menu — services' suite is too long-running to gate every review on. + +--- + +## 9. Cleanup + +*(`pr-local` only.)* + +The NEXT STEPS footer names the current branch and the `git switch` to return to. Print it; never run it. The user may want to stay on the PR branch. + +--- + +## 10. Error Playbook + +| Condition | Behaviour | +|---|---| +| `gh` not installed (pr modes) | Print `Install gh: https://cli.github.com/`, abort. | +| `gh` not authenticated | Print `gh auth status` output + `Run: gh auth login`, abort. | +| PR number not parseable | Print usage, abort. | +| PR doesn't exist / wrong repo | Surface `gh` error verbatim, abort. | +| PR closed/merged | Prepend warning, proceed. | +| PR is draft | Prepend warning, proceed. | +| Working tree dirty (pr-local) | Print `git status --porcelain`, instruct stash/commit, abort. **No auto-stash.** | +| `gh pr checkout` fails (fork permission) | Degrade to static-diff mode, flag in report header. | +| Optional skill not installed | Continue using `rg` / general Rust knowledge. Do **not** abort. | +| Follow-up triage fails (5xx, rate-limit on `/reviews` or `/comments`) | Skip the section, append `follow-up triage skipped: <reason>` to the report header's `Mode:` line, continue with the rest of the review. | +| `diff` mode but no diff (branch == main) | Print `No diff vs main — nothing to review.`, exit clean. | +| Verification command fails | Surface output in VERIFICATION block; raise findings only for issues inside changed files. | + +**Rule of thumb:** never silently degrade. If a tool was missing or a step was skipped, the report's Mode/Header line should reflect it. + +--- + +## 11. Maintenance Notes + +- When the AI consistently misses a CoW-specific concern across multiple reviews, first try expressing it as one more bullet in [Universal Guardrails](#universal-guardrails). Only carve a sibling doc if it can't be expressed generically. +- When the skill produces a false-positive finding, add a one-line counter-example to the [Anti-nit Rule](#anti-nit-rule-mandatory). +- Keep this document under 500 lines. diff --git a/docs/review-context/database-migrations.md b/docs/review-context/database-migrations.md new file mode 100644 index 0000000000..6e3bc788f5 --- /dev/null +++ b/docs/review-context/database-migrations.md @@ -0,0 +1,53 @@ +# Review Context — Database Migrations + +Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches `database/sql/**`. + +This file extends — does not replace — the reminder in `.github/nitpicks.yml`. The nitpick bot posts the generic warning automatically; the reviewer applies judgment using the checks below. + +The checks are dictated by *how* we roll out — not by SQL itself. + +## Rollout shape + +Two facts shape every check below: + +1. **k8s rolls pods, not the cluster.** During a release, the new autopilot starts and runs Flyway *before* the old pod is shut down. For a non-trivial overlap window the previous version is processing requests against the new schema. Anything that breaks that path breaks production. +2. **Staging and production are independent and roll out at different times.** A migration lands in staging first; production follows on its own cadence. While the gap is open, our shadow environment can be running an *older* code version against the *newer* schema (or vice versa, briefly). Any change that breaks that combination breaks shadow without breaking the obvious path the author tested. + +Both points generalise beyond DB — config schema changes, request/response formats, and message-queue payloads have the same n-1 / staging-vs-production caveat. When a PR touches any of those *and* the DB, mention the cross-cutting risk in the synthesis. + +## Non-negotiables + +Each item carries its own severity. Items 1–3 are correctness/availability concerns and default to **High**; items 4–5 are author hygiene and default to **Medium**. + +1. **Reversibility.** State whether the migration is reversible. If yes, include or link the rollback script. If no, the PR explains *why* irreversibility is acceptable. Missing either → **High**. +2. **n-1 schema compatibility.** The previous app version must still function against the new schema. A migration that drops or renames a column, narrows a type, or adds a `NOT NULL` constraint without code already tolerating both shapes → **High** until the rollout plan is spelled out (typically: ship change in three releases — add → migrate code → drop). +3. **Blocking index creation on hot tables.** Use `CREATE INDEX CONCURRENTLY` on anything in the auction/settlement critical path (`orders`, `trades`, `auctions`, `settlements`, `order_events`, `auction_orders`, `quotes`, `order_quotes`). A blocking `CREATE INDEX` on one of these → **High**. +4. **Authoritative table list.** New tables must appear in `crates/database/src/lib.rs` (search for the top-level table list). Missing → **Medium** (CI may also catch it; still the author's responsibility). +5. **README.** Schema changes update the SQL or DB README, whichever is the convention at review time. Missing → **Medium**. + +## Other shapes that usually warrant **High** + +- `ALTER TABLE ... ADD COLUMN NOT NULL` on a multi-million-row table without a default — table lock plus slow backfill. Remedy: add nullable, batched backfill, then `NOT NULL` in a later migration. +- Renaming a column in a single migration rather than the three-release add → migrate → drop pattern. +- Adding a `UNIQUE` constraint without first verifying current data is unique (migrations fail when duplicates exist). +- `ALTER COLUMN ... TYPE ...` on a large table without a multi-step migration plan — Postgres rewrites every row and holds an `ACCESS EXCLUSIVE` lock for the duration. Remedy: new column, dual-write, backfill, swap, drop old. + +## Usually worth flagging as **Medium** + +- New foreign keys without an explicit `ON DELETE` clause (default `NO ACTION` is often surprising). +- New tables without indexes on the columns obvious queries will filter on. + +## Not findings + +- SQL style — trailing commas, keyword casing, indentation. +- Whether the migration could be one statement instead of three. If correct, accept it. +- Migration filename style — Flyway naming is enforced mechanically. + +## Questions worth asking the author + +When in doubt, prefer a `Question:` over a flagged `Action:`: + +- *"What's the expected row count in `<table>` at rollout time?"* — drives `CONCURRENTLY` and batched-backfill decisions. +- *"Which release pairs this migration with the code change?"* — makes the n-1 reasoning explicit. +- *"How does this look on shadow during the staging→production gap?"* — surfaces cross-environment compatibility before the author finds out by paging. +- *"Has the rollback script been tested against a production-sized dataset?"* — only relevant if a rollback was included. diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md new file mode 100644 index 0000000000..6982fecfe3 --- /dev/null +++ b/docs/skills/git-blame-historic-context.md @@ -0,0 +1,111 @@ +# Skill — `git blame` for historic context + +Use before flagging code that looks unusual, redundant, accidental, or "easy to clean up". Often, certain decisions led to sub-optimal-looking code, and those decisions are codified in git history rather than the code itself. + +## When to invoke + +Not on first sighting. Build the full picture first — read the change / PR / file end-to-end and *collect* lines that look off during the pass — then run blame on each candidate. Code that looks weird is often being partially fixed, moved, or replaced by the diff you're currently reading; suspicion based on a single line, before you've seen its neighbours and the rest of the diff, is unreliable. + +## How to invoke + +Two ways: + +- **Procedurally** — follow the steps below. +- **Via slash command** — `/blame-context <path>:<line>` wraps the procedure end-to-end and prints the strengthens / weakens / inverts decision. Useful when you want a one-shot answer rather than walking the procedure manually. + +## Examples + +### Example 1 — magic constant (weakens) + +A reviewer sees this in `crates/driver/src/domain/competition/solution/settlement.rs:444`: + +```rust +let max_gas = eth::Gas(block_limit.0 / eth::U256::from(2)); +``` + +`/2` looks arbitrary — a reviewer might be tempted to flag it as a magic number. Run the procedure first: + +```bash +git blame -L 444,444 -- crates/driver/src/domain/competition/solution/settlement.rs +# → a4ee76aae3 Felix Leupold 2024-03-18 ... + +git log -1 --format='%s%n%b' a4ee76aae3 +# → subject ends with `(#NNNN)`; pivot to the PR +gh pr view <NNNN> +# → body: "block builders' default algorithm picks tx whose gas limit +# fits remaining space; leave headroom for inclusion." +``` + +Decision: **weakens** the "magic number" suspicion — the constant has documented inclusion-economics rationale. Drop the finding, or downgrade to a `Question:` confirming the rationale still applies on the chain in question. + +### Example 2 — defensive process exit (inverts) + +A reviewer sees this in `crates/observe/src/panic_hook.rs:14-15`: + +```rust +let new_hook = move |info: &std::panic::PanicHookInfo| { + previous_hook(info); + std::process::exit(1); +}; +``` + +Hard-killing the process from inside a panic hook looks aggressive — surely a panic should be reported and recovered from, not nuke the whole binary? Run the procedure first: + +```bash +git blame -L 14,15 -- crates/observe/src/panic_hook.rs +# → 8b918a02df Valentin 2022-09-12 ... +# (file moved from crates/shared/src/exit_process_on_panic.rs; the +# whitespace-insensitive copy-detection retry is unnecessary here +# because the same SHA blames the moved lines.) + +git log -1 --format='%s%n%b' 8b918a02df +# → "Exit process if thread panics (#530) +# Fixes #514 ." + +gh pr view 530 +# → body links to issue #514, whose body reads: +# "We spawn background tasks through `tokio::task::spawn` … a panic +# in a spawned task/thread does not affect the rest of the program. +# In these cases we want the whole program to exit." +``` + +Decision: **inverts** the suspicion. Removing the `process::exit(1)` would re-introduce exactly the silent-corruption failure mode the original PR fixed — a background panic (e.g. a cache updater) would be swallowed by tokio and the rest of the process would carry on with stale state. Drop the finding entirely; *suggesting* this change would ask the author to undo a deliberate cross-process invariant. + +## Procedure + +```bash +git blame -L <start>,<end> -- <path> # who/what/when + +# cowprotocol/services squash-merges. Most blames point at one commit whose +# subject ends with "(#NNNN)" — extract the PR number, then pivot to the +# PR conversation, which is usually richer than the commit body alone. +git log -1 --format='%s%n%b' <sha> +gh pr view <NNNN> +``` + +## Decision + +Promote what blame reveals into the finding's Explanation, then weigh the finding: + +- **Strengthens** — surrounding code was added recently for a reason the diff now contradicts. Keep / raise severity. +- **Weakens** — the originating PR explains *why* the shape is unusual (deliberate workaround, perf fix, cross-version compat). Soften, or pivot from `Action:` to `Question:`. Example: a two-line guard looked over-defensive; blame showed it was a hot-patch lifted from prod logs — Medium → Question, asking whether the failure mode still applies. +- **Inverts** — flagging this would ask the author to undo a hard-won fix. Drop the finding. Example: a `Some(_) =>` arm looked redundant; blame revealed it was added months ago to swallow a panic on an edge case the diff was about to remove. Dropped, with a note that the panic is back. + +## Edge cases + +- **Merge commit, not squash** — inspect with `git log -1 --format='%s%n%b' <sha>` and walk parents (`<sha>^1`, `^2`) to find the commit that actually authored the line. +- **Same author as the PR under review, recent** — context is fresh; ask the author directly in the review thread instead of synthesising blame. +- **Refactor moved code wholesale** — surface blame points at the move, not the originating fix. Use `git blame -w -C -C -C -L <start>,<end> -- <path>` (whitespace-insensitive, copy-detection) to recover the real authoring commit. +- **Vendored / generated / contract-binding code** — blame the generator's input (upstream config, source `.sol`, codegen template). Skip if the surface is a JSON ABI or lockfile. + +## When to skip + +- Lines entirely new in the diff under review (no history yet). +- Pure additions of new symbols (nothing to blame). +- Generated code where the input lives elsewhere — blame the input instead. + +## Used by + +- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) §6 — before flagging unusual-looking code. +- [`COW_ORDER_DEBUG_SKILL.md`](../COW_ORDER_DEBUG_SKILL.md) — when investigating *"why is this check here?"* during order debugging. +- Ad-hoc code investigations where a line of code prompts *"this looks accidental"*. diff --git a/docs/skills/pr-blame-walk.md b/docs/skills/pr-blame-walk.md new file mode 100644 index 0000000000..3ddb7e7655 --- /dev/null +++ b/docs/skills/pr-blame-walk.md @@ -0,0 +1,157 @@ +# Skill — pr-blame-walk + +Use to investigate *"a symptom appeared in prod between T1 and T2 — which merged PR is the most likely cause?"*. Inputs are a query (the symptom + locating signal) and a merge window; output is a ranked list of suspect PRs, each with a tight context block and a one-sentence evidence-anchored *"Why suspected"*. + +This is **not** a per-PR review — that is [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md). It is also not a causality proof: output is *suspects*, not *the* cause. Final attribution requires reproducing the symptom against a revert. + +## Inputs + +- `<query>` — the symptom and any locating signal: error string, endpoint, metric name, network, onset time. Examples: + - *"500s on `POST /api/v1/orders` mainnet starting 2026-04-20 14:00 UTC"* + - *"`auction_rewards` dropped 30% on Gnosis from 2026-04-15"* + - *"settlement gas usage on mainnet up 2× since last week"* +- `<onset>` — the timestamp the symptom began, RFC3339 if known. Used as a hard cutoff: PRs merged after `<onset>` cannot be the cause. +- `<time_window>` — the merge window, ending at or before `<onset>`. Either a date range (`merged:YYYY-MM-DD..YYYY-MM-DD`) or a count (last N merged PRs). +- `<repo>` — `<owner>/<repo>` (defaults to `cowprotocol/services`). +- `<scope>` — optional path filter (e.g. `crates/autopilot/`, `crates/driver/src/domain/competition/`, `database/sql/`). Skip if the symptom is system-wide. + +## Procedure + +### 1. Enumerate candidates + +```bash +gh pr list -R <repo> --state merged \ + --search "merged:<YYYY-MM-DD>..<YYYY-MM-DD>" \ + --json number,title,mergedAt,mergeCommit,author,additions,deletions \ + --limit 100 +``` + +If the result hits `--limit`, narrow `<time_window>` and re-run — over 100 candidates means the window is too wide for the rubric to do useful work. Default `gh pr list` limit is 30; pass `--limit` explicitly. + +### 2. Cheap pre-filter (no fetch) + +Drop candidates without fetching their diff: + +- **Merged after `<onset>`** — `mergedAt > <onset>`. Cannot be cause. +- **Pure dep bump** — title matches `^Bump `, `^chore: bump`, `RUSTSEC`, `dependabot`. Diff is `Cargo.toml` version bumps + `Cargo.lock` only. +- **Pure docs / CI workflow / formatting / test-only** — files confined to `docs/`, `.github/`, `*.md`, `**/tests/`. Keep CI changes only if `<query>` is itself a build/CI symptom. +- **Network mismatch** — title or label explicitly names a chain different from `<query>`'s network *and* the diff doesn't touch shared infra. + +### 3. Per-candidate fetch (parallel, capped) + +For each surviving candidate, in parallel — **cap concurrency at 5–10**. `gh` shares the GitHub API budget (5000 req/hr authenticated) with the rest of the session. + +```bash +gh pr view <N> -R <repo> --json title,body,files,baseRefName,headRefName,labels +gh pr diff <N> -R <repo> +``` + +If `gh` returns `API rate limit exceeded`, pause and retry with smaller batches. Do not fall back to scraping the web UI. + +### 4. Per-candidate context block + +Run [`pr-context-synthesis`](pr-context-synthesis.md) per candidate with `<pr_text>` = title+body, `<linked_issue>` = referenced issue if any, `<diff_summary>` = file list + diff hunks. **Cap output at 3–4 sentences** — this skill needs the *what*, not the full *why/how*. The anti-vague-verb rule from that skill's §Rules applies verbatim. + +If the diff revives or reverts code whose origin isn't obvious from the PR text, run [`git-blame-historic-context`](git-blame-historic-context.md) on the deleted lines (against `origin/main^`) to recover the originating commit/PR — useful for "Why suspected" when the suspect undoes a prior fix. + +### 5. Score, rank, emit + +Apply [Scoring rubric](#scoring-rubric). Drop everything below Medium. Sort by tier, then by tiebreaker. Emit per [Output](#output). + +## Scoring rubric + +For each surviving candidate, evaluate signals and pick the highest tier that applies: + +| Tier | Signal | +|---|---| +| **High** | Touches a file/symbol/endpoint **explicitly named** in `<query>`. | +| **High** | Modifies behaviour the symptom directly describes (e.g. `<query>` says *"rewards dropped"* and the PR changes a reward calculation; *"500 on /quote"* and the PR touches the quote handler). | +| **High** | Touches a code path the symptom would naturally route through, on the same network/chain mentioned in `<query>`. | +| **Medium** | Same crate/module as a High signal but a different surface (sibling function, shared util). | +| **Medium** | Touches shared infra (gas estimation, RPC client, DB access, serialization, settlement queueing) that could plausibly chain through to the symptom. | +| **Drop** | Pure rename / move / formatting — no behaviour change. (`additions ≈ deletions` and the diff is `mv`-shaped or whitespace-only.) | +| **Drop** | Pure comment-only changes. | +| **Drop** | Wholly unrelated crate with no plausible chain to the symptom. | + +(The pre-filter Drops in step 2 don't reach this rubric — they were dropped without fetching.) + +### Tiebreakers + +When two PRs land at the same tier: + +1. **Smaller diff first.** Smaller change = cheaper to bisect / revert. `additions+deletions` is the proxy. +2. **Hot-path > cold-path.** A change in a request-handling code path beats a change to one-shot init or batch-job code. +3. **Multiple High signals stacked > single High signal.** Note the stack in *"Why suspected"*. + +### Single-best mode + +If exactly one candidate scores High and the next-highest is Medium-or-lower, surface only that one in *High suspects* with `(only High suspect; next is <tier>)` after the title. Mediums still print. + +## Output + +``` +PR-blame-walk — <query> +Window: <time_window> +Onset: <onset or "—"> +Repo: <repo> +Scope: <scope or "—"> +Scanned: <total candidates after enumeration> +Pre-filter: <count> dropped (post-onset, deps, docs/CI, network mismatch) +Surfaced: <high count> high, <medium count> medium + +───── High suspects ──────────────────────────────────────── +1. #<N> — <title> [merged <YYYY-MM-DD> by @<author>; +<add>/-<del>] + What: <3–4 sentence synthesis from pr-context-synthesis> + Why suspected: <one or two sentences naming a file, symbol, or + behaviour change. Vague verbs forbidden.> + Inspect: gh pr view <N> -R <repo> --web + +2. ... + +───── Medium suspects ───────────────────────────────────── +<same shape> +``` + +Empty result: + +``` +No suspects in window. Symptom likely originates outside this PR set: +config change, upstream dep, infra/RPC, or a PR predating the window. +Consider broadening <time_window>, removing <scope>, or checking +deploys (`gh api -X GET repos/<repo>/deployments?per_page=20`) and infra. +``` + +## Evidence integration + +The skill itself doesn't run evidence queries — but the operator usually has them open already. Cite the query that would sharpen the score; let the operator run it. + +| Evidence | Strengthens score when | Read-only query | +|---|---|---| +| **Victoria Logs** | Symptom onset matches the suspect's deploy time | `victorialogs_query` MCP, e.g. `container:!controller AND <symptom> \| fields _time, _msg, all` over the window straddling `mergedAt` | +| **postgres MCP** | Symptom is DB-shaped (timeouts, missing rows, migration drift) and the suspect added a migration | `mcp__postgres-protocol__query` with `SELECT` only; show the SQL before running | +| **Squash-commit → PR lookup** | You have a SHA from `git log` or a deploy log and need its PR | `gh pr list -R <repo> --search "<sha>" --state merged` | +| **Deploys vs merges** | Suspect deploy-time vs merge-time mismatch (a PR can merge hours before deploy) | `gh api -X GET repos/<repo>/deployments?per_page=20` | + +The principle: name *which* evidence query would tip the score, not the result of running it. + +## Rules + +1. **Score from evidence, not vibes.** Every *"Why suspected"* line names a concrete file, symbol, or behaviour change. *"Looks suspicious"* / *"updates X"* / *"changes Y"* are failures — the anti-vague-verb rule from [`pr-context-synthesis`](pr-context-synthesis.md) §Rules applies verbatim. +2. **The query is the lens.** A PR risky in general but unrelated to `<query>` is not a suspect. This skill is not a retroactive review. +3. **Don't synthesise causality.** Output is *suspects*; final attribution requires a revert / bisect. +4. **Time-anchor.** Drop PRs merged after `<onset>` — they cannot be the cause. Watch for force-push edge cases: trust `mergeCommit.oid` and `gh pr view <N>`, not the squash subject's `(#NNNN)` alone. +5. **Read-only.** Every command in this skill is read-only: `gh` (`view` / `list` / `diff` / `api` GET-only), `git` (`log` / `blame` / `show`), MCP query tools (`SELECT` only). Never `gh pr review`, `gh pr comment`, mutating `gh api` verbs (`POST`/`PATCH`/`DELETE`), DB writes, or `git commit`/`push`/`checkout` of suspect branches. + +## When to skip + +- **Symptom not narrowed to a window yet** (e.g. *"things feel slow"*). Narrow via metrics/logs first, then run this skill. +- **Single suspect already in mind** — use `/review-pr <N>` or read the diff directly. This skill is N→1, not 1→1. +- **Symptom predates available PR history** — broaden the window or accept the cause is older than the rubric can see. +- **Symptom is config / infra / upstream** — there are no source-code PRs in this repo to scan; check deploys, config, RPC provider, or external dependencies instead. +- **Pre-filter drops everything** — the rubric has nothing to score; widen `<time_window>` or relax `<scope>`. + +## Used by + +- Ad-hoc incident investigations: paged on a fresh prod regression, run this before opening individual PRs. +- May be invoked as a follow-up from [`COW_ORDER_DEBUG_SKILL.md`](../COW_ORDER_DEBUG_SKILL.md) when an order-debug session pins a regression to a window with no obvious cause inside the order's lifecycle. +- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) is the *complement*, not a caller — once this skill surfaces a suspect, run `/review-pr <N>` on it. diff --git a/docs/skills/pr-context-synthesis.md b/docs/skills/pr-context-synthesis.md new file mode 100644 index 0000000000..941b5f22e5 --- /dev/null +++ b/docs/skills/pr-context-synthesis.md @@ -0,0 +1,49 @@ +# Skill — PR context synthesis + +Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR (or PR-shaped change). Consumed by the PR review report's CONTEXT section and by other workflows that need a per-PR summary (e.g. ad-hoc *"summarise this PR for me"* or per-candidate context in an incident-investigation walk). + +## How to invoke + +Two ways: + +- **Procedurally** — follow the rules below when you need a tight 1–3 paragraph synthesis (CONTEXT block of `/review-pr`, ad-hoc *"summarise this PR for me"*, per-candidate context for an investigation). +- **Via slash command** — `/pr-synthesis <N|owner/repo#N|url>` fetches the PR, linked issue, and diff for you and prints the synthesis verbatim. + +## Inputs + +Listed ground-truth-first; downstream items are interpreted relative to the diff. + +- `<diff_summary>` — file scope plus a codemap or per-file note of the actual change. The ground truth the synthesis must stay anchored to. For large PRs this may be file-scope only (paths + ±counts + change type), without hunks — `/pr-synthesis` falls back to that when the diff is too big to fetch in full. +- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), use the current branch name and the relevant commit messages (i.e. diff against `main`). +- `<linked_issue>` — title and body of any issue referenced via `Fixes #N` / `Closes #N` / `Resolves #N`. May be empty. + +## Rules + +1. **Synthesize, don't copy-paste.** If `<pr_text>` is five words, say so plainly: *"description is minimal; intent inferred from diff"*. Don't pad to look thorough. +2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*. Don't raise an `Action:` finding here — this skill reports facts; the consumer (e.g. `/review-pr`) decides whether to escalate. Do **not** flag the absence of a changelog of removed/superseded behaviour either — that belongs in commit history, not the description. +3. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism. + +## Shape + +- **Paragraph 1** — *what* changed. Component + concrete change, drawn from `<diff_summary>`. +- **Paragraph 2** — *why*. Drawn from `<pr_text>` and `<linked_issue>`. If both are thin, say so. +- **Paragraph 3** (only if warranted) — *how*. The approach, not a line-by-line walkthrough. + +## Example + +Inputs (real PR — cowprotocol/services#4371): + +- `<pr_text>` — *"Enforce EIP-7825 per-tx gas cap on settlement"*; body explains the Fusaka mempool cap and references the existing quote-side enforcement (#4261). +- `<linked_issue>` — #4368, *"Driver doesn't enforce the EIP-7825 per-tx gas cap"* (labels: bug, good first issue). +- `<diff_summary>` — +58 −1 in `settlement.rs`; new const `EIP_7825_TX_GAS_CAP`, `Gas::new` capped via `min(half_block, EIP_7825)`, three tests. + +Output: + +> Fusaka introduced EIP-7825, capping any single tx at 2^24 − 1 gas. The driver's `Gas::new` was applying only the older `block_limit/2` heuristic; a solution between the EIP-7825 cap and that heuristic could pass validation in `/solve` and never settle. Issue #4368 flagged this as theoretical (no production hits); the fix ports the same idea PR #4261 applied to the quote path. +> +> Mechanically: `max_gas = min(half_block, EIP_7825)`, preserving inclusion-economics on chains with low block gas limits via the `min`. + +## When to skip + +- Trivial changes (docs typo, single-line dep bump, lockfile-only). One sentence is enough; don't force the three-paragraph shape. +- `<pr_text>` and `<linked_issue>` are both empty *and* `<diff_summary>` already speaks for itself (e.g. a single-file rename). One sentence noting that the diff is self-explanatory is the correct output.