From 37520e04ea3a63710856e31cdd5d0fa6f7054704 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Tue, 21 Apr 2026 14:06:06 +0530
Subject: [PATCH 01/14] added claude skill (WIP) for PR reviews help

---
 .claude/commands/review-pr.md              | 132 ++++++++
 .claude/settings.json                      |  13 +
 docs/COW_PR_REVIEW_SKILL.md                | 350 +++++++++++++++++++++
 docs/review-context/alloy-rs.md            |  58 ++++
 docs/review-context/database-migrations.md |  48 +++
 docs/review-context/openapi.md             |  58 ++++
 6 files changed, 659 insertions(+)
 create mode 100644 .claude/commands/review-pr.md
 create mode 100644 .claude/settings.json
 create mode 100644 docs/COW_PR_REVIEW_SKILL.md
 create mode 100644 docs/review-context/alloy-rs.md
 create mode 100644 docs/review-context/database-migrations.md
 create mode 100644 docs/review-context/openapi.md
diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
new file mode 100644
index 0000000000..e246cb898a
--- /dev/null
+++ b/.claude/commands/review-pr.md
@@ -0,0 +1,132 @@
+---
+description: Produce a structured local review report for a CoW services PR (read-only; user posts comments manually)
+---
+
+Review PR: $ARGUMENTS
+
+Follow the instructions in `./docs/COW_PR_REVIEW_SKILL.md` to produce the review report.
+
+## Prologue (execute in order; abort on any failure)
+
+### 1. Parse $ARGUMENTS
+
+Accept any of:
+- A PR number: `4267`
+- A full URL: `https://github.com/cowprotocol/services/pull/4267`
+- An `owner/repo#N` form: `cowprotocol/services#4267`
+
+Default owner/repo to `cowprotocol/services` if only a number is given.
+
+Extract: `<PR_NUMBER>`, `<owner>`, `<repo>`.
+
+If unparseable, print this usage and abort:
+```
+Usage: /review-pr <PR_NUMBER>
+       /review-pr https://github.com/owner/repo/pull/<N>
+       /review-pr owner/repo#<N>
+```
+
+### 2. Prereq check
+
+**Hard (abort if missing):**
+
+Check that `<RUST_SKILLS_PATH>` exists as a directory. If missing, print and **abort**:
+```
+✗ Required skill missing: actionbook/rust-skills
+
+  Install:
+    npx skills add actionbook/rust-skills
+
+  Then exit this Claude session (/exit) and restart with:
+    claude --continue
+```
+
+**Soft (warn and continue if missing):**
+
+Check that `<QM_PATH>` exists as a directory. If missing, print the banner below and **continue**:
+```
+⚠ Optional skill missing: alirezarezvani/claude-skills/ra-qm-team
+  Install (recommended):
+    npx ai-agent-skills install alirezarezvani/claude-skills/ra-qm-team
+```
+
+Build a `loaded_context` list containing the prereq skills that ARE installed. This list appears in the report header's `Loaded context:` line.
+
+> **Note to maintainer:** Replace `<RUST_SKILLS_PATH>` and `<QM_PATH>` with the absolute paths captured in `docs/superpowers/plans/2026-04-21-pr-review-skill.notes.md` (§0.1). They're left as placeholders because the installer paths are determined empirically post-restart.
+
+### 3. Clean-tree check
+
+Run `git status --porcelain`. If output is non-empty, print it plus the following message, then **abort**:
+```
+Working tree is dirty. Stash or commit your changes, then re-run.
+
+  git stash        # temporary
+  git stash pop    # to restore later
+```
+
+**Never auto-stash.**
+
+### 4. Update main
+
+Save the current branch name to `<prior_branch>` (this variable is used again in step 8 of the reference doc).
+
+Then:
+- If the current branch IS `main`: run `git pull --rebase origin main`.
+- Otherwise: run `git fetch origin main` (do **not** rebase the user's feature branch silently).
+
+On rebase conflict, abort and print:
+```
+Rebase conflict on main. Resolve manually, then re-run.
+
+Conflicted files:
+<output of: git diff --name-only --diff-filter=U>
+```
+
+### 5. Checkout PR
+
+Run `gh pr checkout <PR_NUMBER> -R <owner>/<repo>`.
+
+On failure, handle specifically:
+
+- **`gh` not installed** → print `Install gh: https://cli.github.com/` and abort.
+- **Auth error** → print the output of `gh auth status` plus `Run: gh auth login` and abort.
+- **PR doesn't exist / wrong repo** → surface `gh`'s error verbatim and abort.
+- **Fork without checkout permission** → switch to **degraded static-diff mode**:
+  - Do **not** abort.
+  - Set `mode = "degraded static-diff"`.
+  - In step 6 of the reference doc, replace the `gh pr diff` call with:
+    ```bash
+    gh pr diff <PR_NUMBER> --patch -R <owner>/<repo>
+    ```
+  - Flag the degraded mode prominently in the report header's `Mode:` line.
+- **Any other error** → surface verbatim and abort.
+
+If successful, `mode = "full checkout"`.
+
+### 6. Noise filtering (before handoff)
+
+CoW services PRs frequently bundle generated/mechanical files with the real change. Before handing off, classify each changed file in the diff:
+
+**Review surface (read fully):**
+- Anything under `crates/*/src/**/*.rs` (excluding `contracts/generated/**`).
+- `crates/e2e/tests/**/*.rs`.
+- `contracts/solidity/**/*.sol` (authored Solidity).
+- Config files: `**/openapi.yml`, `database/sql/**`, `configs/**`, `*.toml` (only if semantically interesting — dep version bumps with CI green are generally fine).
+
+**Noise (skip or skim):**
+- `Cargo.lock` (any): reviewing lockfile diffs is high-cost, low-signal. If CI is green, cargo resolved the graph; a lockfile review won't catch what CI missed. Skip.
+- `contracts/generated/**` and `contracts/artifacts/**`: machine-generated bindings and ABI JSON. Skip unless the PR is explicitly *about* the binding generation process.
+- Auto-generated `Cargo.toml` entries from contract-binding crates (`contracts/generated/*/Cargo.toml`): skip.
+- Binary fixtures, ABI blobs, snapshot files: skip.
+
+Report the filter in the report's header `Scope:` line as `+X −Y across Z files (~N LOC human-written; rest generated/lockfile, filtered)`. This tells the human reviewer the *real* surface size — a 4000-line diff is often 400 lines of actual review.
+
+### 7. Handoff
+
+Read `docs/COW_PR_REVIEW_SKILL.md` and follow it from **§2. Metadata Fetch** onward, passing through:
+
+- `<PR_NUMBER>`, `<owner>`, `<repo>`
+- `<prior_branch>` (the branch to return to at the end)
+- `loaded_context` (list of installed prereq skills)
+- `mode` (`full checkout` or `degraded static-diff`)
+- The filter list from step 6 — the reference doc uses it to decide what to `Read`.
diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000000..14cc499073
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,13 @@
+{
+  "extraKnownMarketplaces": {
+    "superpowers-dev": {
+      "source": {
+        "source": "github",
+        "repo": "obra/superpowers"
+      }
+    }
+  },
+  "enabledPlugins": {
+    "superpowers@superpowers-dev": true
+  }
+}
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
new file mode 100644
index 0000000000..403eadae0f
--- /dev/null
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -0,0 +1,350 @@
+# CoW Services PR Review Skill
+
+This document instructs Claude how to produce a local PR review report. It is invoked by `.claude/commands/review-pr.md` after the prologue (arg parsing, prereq check, clean-tree check, main update, PR checkout).
+
+At this point you have:
+
+- A parsed `<PR_NUMBER>`, `<owner>`, `<repo>`.
+- A `mode` flag: `full checkout` (default) or `degraded static-diff` (fork fallback).
+- A `loaded_context` list of installed prereq skills.
+- A `prior_branch` variable holding the branch name to return to.
+
+## Core Principles (read before executing)
+
+- **Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean.
+- **Never post to GitHub.** Output is strictly for the user's terminal. No `gh pr review`, no `gh pr comment`, no `gh pr close`. The user decides what to say on GitHub.
+- **Explain, don't just flag.** Each finding must give the reviewer enough context to understand *and defend* the point — not just forward AI-generated text.
+- **Actionable framing.** Every finding ends with either a concrete `Action:` or a specific `Question:`. Never both.
+
+## Execution Flow
+
+1. Fetch PR metadata and linked issue(s) — [§2. Metadata Fetch](#2-metadata-fetch)
+2. Classify diff paths and load sibling context docs — [§3. Classification](#3-classification)
+3. Synthesize the context block — [§4. Context Synthesis](#4-context-synthesis)
+4. Produce findings by severity — [§5. Review and Severity](#5-review-and-severity)
+5. Print the structured report — [§6. Report Template](#6-report-template)
+6. Offer verification (background) — [§7. Verification Offer](#7-verification-offer)
+7. Print cleanup hint — [§8. Cleanup](#8-cleanup)
+
+Error behavior is consolidated in [§9. Error Playbook](#9-error-playbook).
+
+---
+
+## 2. Metadata Fetch
+
+Run these commands in **parallel** (single message, multiple Bash tool calls):
+
+```bash
+# Full metadata
+gh pr view <PR_NUMBER> -R <owner>/<repo> --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,commits,reviewDecision,isDraft,state
+
+# Full diff
+gh pr diff <PR_NUMBER> -R <owner>/<repo>
+```
+
+In **degraded static-diff mode** (fork fallback), replace `gh pr diff` with:
+```bash
+gh pr diff <PR_NUMBER> --patch -R <owner>/<repo>
+```
+
+### Linked issues
+
+Parse the PR body for `Fixes #<N>`, `Closes #<N>`, `Resolves #<N>` (case-insensitive). For each match, fetch in parallel with the above:
+```bash
+gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
+```
+
+If no linked issue is referenced, proceed without one. Do not manufacture an issue link.
+
+### State handling
+
+- `state == "CLOSED"` or `state == "MERGED"` → proceed, but prepend to the report:
+  > ⚠ This PR is {closed,merged}; review is informational.
+- `isDraft == true` → proceed, but prepend:
+  > ⚠ This PR is a draft; author may still be iterating.
+
+---
+
+## 3. Classification
+
+Walk the file list from `gh pr view --json files`. For each changed file, evaluate these rules and accumulate a `context_docs` list:
+
+| Rule | Match | Load |
+|---|---|---|
+| R1 — Alloy usage (path) | Any change under `crates/ethrpc/`, `crates/chain/`, or `crates/contracts/` | `docs/review-context/alloy-rs.md` |
+| R2 — Alloy usage (import) | Any `.rs` file whose diff hunks add `use alloy::*;`, `use alloy_*;`, or `alloy::` qualified paths | `docs/review-context/alloy-rs.md` |
+| R3 — DB migrations | Any file under `database/sql/**` | `docs/review-context/database-migrations.md` |
+| R4 — OpenAPI | Any file matching `**/openapi.yml` | `docs/review-context/openapi.md` |
+
+R1 and R2 are OR'd — load `alloy-rs.md` **once** if either matches.
+
+### Loading
+
+Read each matched sibling doc via the `Read` tool. Record the loaded list — it will appear in the report header's `Loaded context:` line.
+
+If no rules match, `context_docs` is empty and the review proceeds with only the always-loaded `CLAUDE.md` + `actionbook/rust-skills`.
+
+### Future siblings
+
+This list will grow. When adding a new sibling doc (e.g. `solver-engine.md`, `autopilot.md`), add its trigger row to the table above. Keep the filters tight — a sibling should load only when its content is actually relevant.
+
+---
+
+## 4. Context Synthesis
+
+Produce a 1-3 paragraph block combining:
+
+- PR title.
+- PR description (Description / Changes / How to test sections from the template, if filled).
+- Linked issue title + description, if any.
+- File scope — breadth of the diff (single crate, cross-crate, docs-only, DB, API spec).
+- Rough intent — new feature, bugfix, refactor, dep bump, docs, test-only.
+
+### Rules
+
+1. **Synthesize, do not copy-paste.** If the description is five words, say so: *"PR description is minimal — intent inferred from diff"*. Don't pretend.
+2. **Flag description-vs-diff mismatches as findings.** If the description says "docs-only" but `.rs` files are touched, open a **High**-severity finding titled `PR description contradicts diff scope` immediately. This is the one case where a finding precedes the normal review loop.
+3. **Linked-issue context goes into synthesis, not as a separate block.** Summarise motivation from the issue alongside the PR's own description.
+4. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
+
+### Shape
+
+- **Paragraph 1** — *What* the PR changes (mechanics, file scope).
+- **Paragraph 2** — *Why* the change exists (from description + linked issue).
+- **Paragraph 3** (only if warranted) — *How* it's implemented (the approach, not a line-by-line walkthrough).
+
+---
+
+## 5. Review and Severity
+
+Read the diff. For non-trivial hunks, read the full changed file(s) for surrounding context. Apply, in order:
+
+1. Generic Rust review from `actionbook/rust-skills`.
+2. CoW services conventions from `CLAUDE.md`.
+3. Conditionally loaded sibling docs from [§3](#3-classification).
+4. Soft QM skill (`ra-qm-team`), if in `loaded_context`.
+
+### Severity Rubric
+
+| Severity | Meaning | Example |
+|---|---|---|
+| **High** | Merging as-is is a real risk: correctness bug, data loss, security issue, incompatible DB migration, auction/settlement invariant broken, likely panic, unsound `unsafe`. | `.unwrap()` on a solver response path; SQL migration that rewrites a multi-million-row table without `CONCURRENTLY`. |
+| **Medium** | Worth fixing before merge — won't break prod but will cost later. Missing error context, test gap on a new invariant, public API ergonomics, missing doc-comment on a cross-crate `pub`, unhandled edge case. | New `pub fn` in `shared` with no doc-comment; `?` swallowing an error without context. |
+| **Small / QoL** | Would genuinely improve the code. **Not a nit.** | `Vec` could be `impl Iterator` in a hot path; duplicated 3-line block could be a helper. |
+
+### Anti-nit Rule (mandatory)
+
+- If the only reason to change it is personal taste or stylistic preference, **do not report it**.
+- Formatting findings belong to `rustfmt` / CI, not to this skill. Never surface a finding whose fix is "run `cargo +nightly fmt`".
+- Clippy lints are a CI concern by default. **Exception:** a clippy warning inside the new code may be reported as **Small** if and only if it improves correctness or clarity — never for style.
+- If you're uncertain whether something is a nit, omit it. LGTM when clean.
+- **Don't inflate severity.** The severity of a finding is what a senior reviewer would actually call it in GitHub, not what sounds safer. Most substantive comments are just unmarked; only the few purely cosmetic ones carry a `nit:` prefix in practice. Don't tier-down everything to "Small" to avoid confrontation — that dilutes the signal. Either it's a Medium worth discussing or it's omitted.
+
+### Reviewer Discipline — Heuristics Beyond "Is This Bug"
+
+A senior reviewer catches things that aren't bugs. They shape the code for future readers and future maintainers. These are the patterns a good CoW review surfaces:
+
+1. **Motivation before mechanism.** Before reviewing *how* the code works, verify *why* it exists. If the PR description doesn't justify the change, or justifies it with an assumption ("EIP-4626 tokens will have coverage problems"), ask the author to confirm the assumption — as a `Question:`, not a demand. You can't judge tradeoffs against a motivation you don't understand.
+
+2. **Root cause over workaround.** If a new file contains logic that feels like it exists to cancel out a wart somewhere else ("insert WETH into non_vault_tokens", "treat this special case differently"), investigate the wart. Fixing the root cause upstream is usually cleaner than adding a compensating wart downstream. Ask the author: *"this logic looks like it's working around X — could X be fixed instead?"*
+
+3. **Suggest simpler primitives; listen to pushback.** If code uses `join!` where `try_join!` would work, ask. If it uses `Mutex<HashSet<_>>` where `DashSet` would work, ask. The author often has a reason (as in the eip4626.rs case where `join!` is deliberate because the branch logic depends on both results), and the back-and-forth resolves the question. Framing: *"Can this be `<simpler alternative>`?"*, not *"Use `<alternative>`"*.
+
+4. **Return-type consistency over side-effects at distance.** A function returning `Result<Option<(T, U)>, Error>` explicitly encodes its three states (value / no-value / error). A function that writes to a cache two layers down the call stack as a side effect hides the same information. Prefer explicit return types. The cost of a verbose type signature is always smaller than the cost of a reviewer (or future you) missing a hidden mutation.
+
+5. **Top-to-bottom readability.** Within a file, order functions in the direction callers-before-callees or high-level-before-low-level (pick one and keep it consistent). When reviewing, if you find yourself scrolling back and forth between `fn do_thing` and `fn helper_for_thing`, flag the ordering — *not as a nit, as a Medium if it's non-trivial navigation*.
+
+6. **Stale comments.** When behavior changes, comments describing that behavior often rot. Read the comments *against* the code around them and flag any drift. A comment saying "zero timeout signals best-effort" attached to code that no longer has a zero-timeout path is a Medium finding — future readers will believe it.
+
+7. **Description-vs-code mismatches.** If the PR body describes `Mutex<HashSet>` and the code has `DashSet`, the description is stale. Flag it — not because it changes the code, but because whoever reads the PR as history will be confused. Small finding, usually.
+
+8. **Bundled orthogonal changes.** When a PR contains a change that isn't clearly required by the main feature, ask: does this belong in its own PR? Sometimes the answer is "it's required, here's why" (fine — ask for the reason to be in the commit message or a code comment). Sometimes the answer is "you're right, let me split it" (better git history).
+
+9. **Error taxonomies in web3 code.** Blockchain RPC errors have multiple categories: contract reverts, RPC transport failures, provider-side rate-limiting, timeout errors, decoding failures. Each should lead to *different* handling. If new code treats `Err(_)` as a single bucket when the branches should differ (e.g. cache a non-vault verdict on contract revert, but *not* on a network error), this is a correctness issue — Medium or High depending on whether the wrong classification can poison downstream behavior.
+
+10. **Generated-code and lockfile diffs.** Never include generated-bindings or `Cargo.lock` findings. CI's type-check validates the generated code; if CI is green, there's nothing a reviewer will see that the compiler won't. Filter these at the entry-point (see `.claude/commands/review-pr.md` §6) and report only on human-written changes.
+
+### Using GitHub "Suggested change" Blocks in the Output
+
+When a finding has a clear mechanical fix (e.g. remove a `.clone()`, swap a type), phrase the `Action:` line so the user can paste it straight into a GitHub *Suggested change* block. Example:
+
+```
+Action: Replace `self.provider.clone()` with `&self.provider` — the alloy
+        Instance constructors take `IntoProvider` which is implemented for
+        both owned and borrowed providers. (GitHub suggested-change:
+        `let vault = IERC4626::new(token, &self.provider);`)
+```
+
+The bracketed suggestion is copy-pasteable into GitHub's suggestion feature, which makes the author's acceptance a single click.
+
+### Per-finding Shape
+
+Every finding contains exactly these four parts:
+
+1. **Title** — short noun phrase (≤ 8 words).
+2. **Location** — `path/to/file.ext:line` (or `path/to/file.ext:start-end` for a range).
+3. **Explanation** — enough that the reviewer understands *and can defend* the point without re-reading the diff. Must include:
+   - **The mechanism** (what's wrong).
+   - **The impact** (why it matters for CoW services specifically — auction, settlement, solver competition, DB migration, etc.).
+   - **Repo-specific context** where it helps (e.g. "this path runs per-auction, ~12s cadence, so an extra RPC is costly").
+4. **Action OR Question** — exactly one, never both:
+   - `Action: <concrete task the author should do>`
+   - `Question: <specific clarification needed before the reviewer can decide>`
+
+---
+
+## 6. Report Template
+
+Print the report in this exact shape. Omit sections that don't apply (e.g. no `VERIFICATION` block if the user declined).
+
+```
+═══════════════════════════════════════════════════════════
+PR #<N> — <title>
+═══════════════════════════════════════════════════════════
+Author:       @<author>
+Scope:        +<additions> −<deletions> across <N> files
+Labels:       <labels, comma-separated; or "—">
+Base/Head:    <baseRef> ← <headRef>
+Linked issue: #<N> — <issue title>            (omit line if none)
+Loaded context: <comma-separated loaded_context list>
+Mode:         full checkout   |   degraded static-diff
+
+───────────────────────────────────────────────────────────
+CONTEXT
+───────────────────────────────────────────────────────────
+<synthesis from §4>
+
+───────────────────────────────────────────────────────────
+VERDICT:  <LGTM | Changes requested | Needs clarification>
+───────────────────────────────────────────────────────────
+
+FINDINGS
+  [High]    <count>
+  [Medium]  <count>
+  [Small]   <count>    (QoL-only; true nits omitted)
+
+───── High ─────────────────────────────────────────────────
+1. <title>
+   Location:  <path>:<line>
+
+   <explanation>
+
+   Action:    <task>      OR
+   Question:  <question>
+
+───── Medium ───────────────────────────────────────────────
+<same shape>
+
+───── Small / QoL ─────────────────────────────────────────
+<same shape>
+
+───────────────────────────────────────────────────────────
+VERIFICATION   (only if user opted in)
+───────────────────────────────────────────────────────────
+cargo check            <status>
+cargo clippy           <status>
+cargo +nightly fmt     <status>
+
+───────────────────────────────────────────────────────────
+NEXT STEPS
+───────────────────────────────────────────────────────────
+Currently on:  <current branch>
+Return with:   git switch <prior_branch>
+```
+
+### LGTM short form
+
+When there are **zero** findings (High, Medium, and Small all zero), collapse everything between `VERDICT:` and `NEXT STEPS` to a single line:
+
+```
+VERDICT:  LGTM — no blocking or notable issues.
+```
+
+The header, CONTEXT, and NEXT STEPS sections still print.
+
+### Verdict selection
+
+- **LGTM** — no High or Medium findings; zero or a handful of Small findings that the reviewer may still choose to post.
+- **Changes requested** — any High findings, or Medium findings that block safety/correctness.
+- **Needs clarification** — no High findings, but one or more Medium/Small findings use the `Question:` form because the reviewer can't decide without author input.
+
+---
+
+## 7. Verification Offer
+
+After printing the report, ask the user inline:
+
+> "Run local verification in the background? Reply with one of:
+> `check` — `cargo check --locked --workspace --all-features --all-targets`
+> `clippy` — `cargo clippy --locked --workspace --all-features --all-targets -- -D warnings`
+> `fmt` — `cargo +nightly fmt --all -- --check`
+> `all` — run all three
+> `skip` — don't run anything"
+
+On user reply:
+
+- **`skip`** → omit the VERIFICATION block entirely. Proceed to [§8](#8-cleanup).
+- **`check` / `clippy` / `fmt` / `all`** → dispatch each selected command as a **background** Bash invocation (`run_in_background: true`). Do not sleep, do not poll — the runtime sends a completion notification. On completion, call `BashOutput` to retrieve the result and append it to the VERIFICATION block:
+  ```
+  cargo check            ✅  clean
+  cargo clippy           ⚠   2 warnings (details below)
+  cargo +nightly fmt     ✅  clean
+  ```
+
+### When verification output produces findings
+
+- If `cargo check` or `cargo clippy` surfaces errors/warnings **inside files changed by this PR**, add them to the Findings list:
+  - Compile errors → **High** (the PR doesn't build).
+  - Clippy warnings that flag a correctness issue → **Medium**.
+  - Clippy warnings that flag clarity (e.g. `needless_return` inside new code) → **Small** — but only when they pass the [§5](#5-review-and-severity) Anti-nit rule.
+- Do **not** surface warnings from files the PR did not modify.
+- `cargo +nightly fmt -- --check` failures are **not** findings. The Anti-nit rule forbids surfacing format findings. The VERIFICATION block reports the status; that's where it ends.
+
+### Never run tests by default
+
+The menu deliberately omits `cargo nextest run`. Services' test suite is large and takes minutes — gating every review on a full test run is too expensive. If the reviewer wants tests, they run them outside the skill.
+
+---
+
+## 8. Cleanup
+
+The NEXT STEPS footer (already part of the report template in [§6](#6-report-template)) names the current branch and the exact command to return to the prior branch:
+
+```
+Currently on:  <current branch>
+Return with:   git switch <prior_branch>
+```
+
+**Do not run `git switch` yourself.** The user may want to stay on the PR branch to continue investigating, run their own tests, or browse related code. The command is a hint, not an action.
+
+---
+
+## 9. Error Playbook
+
+| Condition | Behavior |
+|---|---|
+| `gh` not installed | Print `Install gh: https://cli.github.com/`, abort. |
+| `gh` not authenticated | Print `gh auth status` output + `Run: gh auth login`, abort. |
+| PR number not parseable | Print usage string (see `.claude/commands/review-pr.md` step 1), abort. |
+| PR doesn't exist / wrong repo | Surface `gh` error verbatim, abort. |
+| PR is closed or merged | Prepend state warning to the report ([§2](#2-metadata-fetch)), proceed. |
+| PR is a draft | Prepend draft warning to the report ([§2](#2-metadata-fetch)), proceed. |
+| Working tree dirty | Print `git status --porcelain`, instruct `git stash` or commit, abort — **no auto-stash**. |
+| `git pull --rebase origin main` conflict | Abort, print conflicted files, instruct manual resolution. |
+| `gh pr checkout` fails (fork permission) | Degrade to static-diff mode, flag in report header `Mode:` field. |
+| Hard prereq skill missing | Print install command, abort. Handled in the entry-point prologue, not here. |
+| Soft prereq skill missing | Print install-command banner, continue. Handled in the entry-point prologue, not here. |
+| Verification offer declined (`skip`) | Omit VERIFICATION block. |
+| Verification command fails or warns | Surface output in VERIFICATION block; add findings only for issues inside changed files. |
+
+**Rule of thumb:** never silently degrade behavior without the `Mode:` header reflecting it. If something went sideways, the reviewer should know from the report itself.
+
+---
+
+## Maintenance Notes
+
+- **When you find yourself adding a project-specific heuristic more than twice**, move it into a sibling context doc under `docs/review-context/` and add a trigger rule to [§3](#3-classification).
+- **When you find the skill making a false-positive finding**, add a counter-example to the Anti-nit rule section in [§5](#5-review-and-severity). The rubric calibrates over time.
+- **When CoW introduces a new subsystem with its own review considerations** (e.g. a new crate, new auction policy, new settlement path), create a sibling doc for it and wire in a trigger.
+
+This skill is expected to grow — both `docs/review-context/*.md` and this reference's rubric will accumulate CoW-specific knowledge over time.
diff --git a/docs/review-context/alloy-rs.md b/docs/review-context/alloy-rs.md
new file mode 100644
index 0000000000..24fba96dc1
--- /dev/null
+++ b/docs/review-context/alloy-rs.md
@@ -0,0 +1,58 @@
+# Review Context — Alloy-rs Usage
+
+Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches `crates/ethrpc/`, `crates/chain/`, `crates/contracts/`, or adds `alloy::*` / `alloy_*` imports.
+
+**External reference:** https://alloy.rs/introduction/prompting (AI-optimized alloy guide). Fetch it once per session when actually reviewing alloy changes.
+
+## High-signal review checks
+
+1. **Provider reuse — and pass by reference.** Alloy's `Provider` is clonable and meant to be cloned, not rebuilt. Any new `ProviderBuilder::new().connect_http(...)` in a hot path (auction loop, per-order path, per-request handler) is a **Medium** finding — it should reuse an existing provider.
+
+   Additionally, alloy's contract `Instance` constructors accept `IntoProvider`, which is implemented for both owned providers *and references*. Pass by reference when the caller retains ownership:
+   ```rust
+   // ❌ unnecessary clone
+   let vault = IERC4626::new(token, self.provider.clone());
+   // ✅ pass by reference
+   let vault = IERC4626::new(token, &self.provider);
+   ```
+   Gratuitous `.clone()` on providers is a **Small (QoL)** finding — and a good candidate for a GitHub `Suggested change` block (mechanical fix, zero-risk).
+
+2. **Batched RPC.** CoW's `ethrpc` crate adds batching on top of alloy. New direct RPC calls that bypass the batching layer can hit rate limits under auction load. Flag any direct `provider.get_*` calls in hot paths that aren't going through `ethrpc::Web3` — **Medium**.
+
+3. **Distinguish contract-revert from transport/network error.** This is the single most CoW-specific alloy gotcha. When code runs a `call()` against an arbitrary on-chain contract and interprets the `Err`, the error has categories:
+
+   - **Contract revert** (valid ERC-20 returned an error, `asset()` not implemented, etc.) → deterministic. Safe to cache "this token is X".
+   - **Network / transport error** (RPC timeout, 5xx from the node, connection reset) → transient. **Must not be cached** — a transient failure at cache-write time would permanently mis-classify the token until restart.
+   - **Decoding error** (contract returned bytes that didn't decode into the expected type) → treat as contract-level; usually means the contract at that address doesn't implement the interface.
+
+   The `ethrpc::alloy::errors::ContractErrorExt` trait exposes `.is_contract_error()` for this split. Whenever a diff has `Err(_)` in a `call()` result's match arm and then *caches* a verdict, verify that the match is gated on `is_contract_error()`. This is a **High** finding if missing — a single network blip could persistently mark a valid vault as non-vault.
+
+3. **Decoding errors.** Alloy's `sol!` macro emits `Result` types that should be surfaced with context, not `.unwrap()`. An `.unwrap()` on `decode()` output in a code path that handles arbitrary on-chain data is **High** — malformed returns from an uncooperative contract should not panic the auction loop.
+
+4. **Gas estimation.** Avoid `provider.estimate_gas` inside the settlement path. Gas estimation hits the node and can be slow or flaky. Use the precomputed gas values from the solver solution instead. Inline gas estimation in the critical settlement path → **High**; outside it → **Medium** with a suggestion to cache.
+
+5. **Block number or tag.** When reading chain state, verify the block tag is correct for the context:
+   - Auction-time reads → the auction's block (explicit tag, not `latest`).
+   - Simulation → `latest` or `pending` depending on purpose.
+   - Settlement submission → varies by private-RPC strategy; follow existing patterns in the same file.
+
+   A blind `BlockNumberOrTag::Latest` in code that should use the auction block is **High** (inconsistent with the auction's view of the world → score mismatches).
+
+6. **Ethers-rs migration artifacts.** Services is mid-migration from `ethers-rs` to `alloy`. Mixing primitive types across the boundary (e.g. `ethers::H160` passed to a function expecting `alloy::primitives::Address`) is **Medium** — each layer should have a single primitive type. Look for ad-hoc conversions that hint at a leak across the boundary.
+
+7. **Primitives vs. types.** `alloy::primitives::U256` vs. `alloy::sol_types::sol!`-generated types — verify arithmetic is done on the primitives and the sol! types are used only at contract boundaries. Arithmetic on sol! types inside business logic is **Small** at minimum, **Medium** if it obscures overflow handling.
+
+8. **`join!` vs `try_join!` with branch logic.** When code needs the result of *both* futures before deciding what the outcome means (classic pattern: `asset()` + `decimals()` on an unknown token, where the *combination* of successes/failures carries meaning), `tokio::join!` is correct even though `try_join!` looks shorter. `try_join!` short-circuits on the first error, collapsing four outcomes into two — which is wrong for classification code.
+
+   When reviewing: if you see `tokio::join!` where `try_join!` would be briefer, check whether the code's match arms depend on *which combination* of results errored. If yes, it's correct — don't suggest `try_join!`. If no, suggest `try_join!` as a simplification, framed as a question (*"Can this use `try_join!`?"*).
+
+9. **Generated `sol!` bindings vs inline `alloy::sol!`.** CoW prefers generated contract bindings (via the `contracts` crate) over inline `alloy::sol! { #[sol(rpc)] interface X { ... } }` blocks in business code. Generated bindings avoid per-crate compile cost and let `Cmd+Click` jump to the expanded source. An inline `sol!` macro in a non-test business file is **Small** with an Action to move it to the generated-bindings path. (Test code is a softer rule — inline for quick tests is fine.)
+
+## Not findings
+
+- Style choices in how `sol!` is invoked (macro-vs-file).
+- Which alloy sub-crate was imported directly vs. through `alloy::*` re-exports.
+- Import ordering, `use` grouping.
+- Whether to use `Address::ZERO` vs. `alloy::primitives::address!("0x...")` for the null address (both are fine).
+
+These are taste / `rustfmt` concerns. The Anti-nit rule in `COW_PR_REVIEW_SKILL.md §5` forbids surfacing them.
diff --git a/docs/review-context/database-migrations.md b/docs/review-context/database-migrations.md
new file mode 100644
index 0000000000..531dc22ad1
--- /dev/null
+++ b/docs/review-context/database-migrations.md
@@ -0,0 +1,48 @@
+# Review Context — Database Migrations
+
+Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches `database/sql/**`.
+
+This file extends — does not replace — the reminder in `.github/nitpicks.yml`. The nitpick bot posts the generic DB warning automatically; the reviewer applies judgment using the checks below.
+
+## Non-negotiables (all **High** until addressed)
+
+1. **Reversibility.** The migration must state whether it's reversible. If yes, the rollback script is present or explicitly linked in the PR description. If no, the PR explains *why* irreversibility is acceptable for this change. Missing either → **High**.
+
+2. **Compatibility window.** During rollout, k8s starts the new autopilot, runs Flyway, and only then shuts down the old pod. That overlap means the **previous** version must still be able to process requests on the **new** schema. Any migration that breaks the old schema path (e.g. drops a column the old code reads) → **High** until the PR lays out an n-1 compatibility plan, typically:
+   - Release A: make the change compatible (add new column, keep old).
+   - Release B: ship code that uses the new column.
+   - Release C: drop the old column.
+
+3. **Blocking index creation on hot tables.** Tables in the critical path — `orders`, `trades`, `auctions`, `settlements`, `order_events`, `auction_orders`, `quotes`, `order_quotes` — must use `CREATE INDEX CONCURRENTLY`. A blocking `CREATE INDEX` on one of these → **High**.
+
+4. **Table-list update.** New tables must be added to the authoritative list at `crates/database/src/lib.rs:51-87`. Missing → **Medium** (not High because a CI check may also catch it, but it's still the author's responsibility).
+
+5. **DB README update.** Schema changes must update the DB README (`crates/database/README.md` or the SQL folder's README — whichever is the convention at review time). Missing → **Medium**.
+
+## Usually worth flagging (**High**)
+
+- `ALTER TABLE ... ADD COLUMN NOT NULL` on a multi-million-row table **without** a default → table lock + slow backfill → **High**. Remedy: add as nullable, backfill in a batched job, then add `NOT NULL` in a later migration.
+- Dropping a column still referenced in code the old pod runs → **High** (see §2 compatibility window).
+- Renaming a column in a single migration (rather than add-new + migrate + drop-old across three releases) → **High**.
+- Adding a `UNIQUE` constraint without verifying current data is unique → **High** (migration will fail in prod if duplicates exist).
+- Changing a column's type in place (e.g. `VARCHAR(32)` → `VARCHAR(64)`) on a large table without a plan for the rewrite cost → **Medium** to **High** depending on table size.
+
+## Usually worth flagging (**Medium**)
+
+- Migration scripts without comments explaining the "why" when the "what" is non-obvious.
+- Introducing a foreign key without `ON DELETE` behavior being specified (the default is `NO ACTION`, which can be surprising).
+- New tables without indexes on columns that obvious queries will filter on.
+
+## Not findings
+
+- SQL style: trailing commas, keyword casing, indentation.
+- Whether the migration could have been one statement instead of three — if it's correct, it's correct.
+- Whether the migration file should have been named differently (Flyway naming convention is enforced mechanically).
+
+## Questions worth asking the author
+
+When reviewing, these often deserve a `Question:` in a finding:
+
+- "What's the expected row count in `<table>` in production at rollout time?" (drives the need for `CONCURRENTLY`, batched backfill).
+- "Which release are we pairing this migration with on the code side?" (drives the n-1 compatibility reasoning).
+- "Have you tested the rollback script against a production-sized dataset?" (if one is included).
diff --git a/docs/review-context/openapi.md b/docs/review-context/openapi.md
new file mode 100644
index 0000000000..2f3d7fbfe7
--- /dev/null
+++ b/docs/review-context/openapi.md
@@ -0,0 +1,58 @@
+# Review Context — OpenAPI Spec Changes
+
+Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches any `**/openapi.yml`.
+
+This file extends — does not replace — the reminder in `.github/nitpicks.yml` about breaking-change communication.
+
+## Classify the change first
+
+Before reviewing, bucket the diff into:
+
+- **Breaking** — any of:
+  - Removing an endpoint, path, or response field.
+  - Renaming a field, path parameter, or query parameter.
+  - Changing a field's type (string → number, array → object, enum widening into a non-enum, etc.).
+  - Making a previously optional field required.
+  - **Narrowing** an enum's value set (removing a value). Adding enum values is non-breaking in most clients but may still surprise strict parsers — see below.
+  - Changing auth requirements on an endpoint.
+  - Changing an endpoint's HTTP method or path.
+- **Non-breaking but noteworthy** — new required field on a request body; new endpoint; changed default; stricter validation; added enum value.
+- **Pure additive** — new response field (old clients ignore it), new optional request field with a server-side default.
+
+The bucket drives the severity ceiling.
+
+## Breaking changes
+
+If the PR contains any breaking change:
+
+1. **High** — unless the PR description explicitly:
+   - Calls out the break.
+   - Names the affected consumers (at minimum: **Frontend team** and **SAFE team**).
+   - Describes the versioning / migration path (rollout sequence, deprecation window, any shim endpoints).
+
+2. Always include a `Question:` asking whether the consumers have been notified. Even if the description says they have, a reviewer-visible "yes, confirmed" is worth forcing.
+
+3. If the break is irreversible (e.g. a response field's semantic meaning changes while the name stays the same), escalate to **High** even if communication looks clean — semantic breaks are the ones that silently corrupt client behavior.
+
+## Non-breaking but worth flagging
+
+- **New required field on a request body** → check whether a server-side default exists. If not, existing clients will 400 — effectively breaking. **High**.
+- **New endpoint without a mention in the PR description** → **Medium**. Clients discover endpoints via the spec; teammates should know a new one landed.
+- **Missing `description:` on a new field or endpoint** → **Small**.
+- **New enum value** on a field clients parse strictly → **Medium** with a `Question:` — many OpenAPI-generated clients fail hard on unknown enum values; worth checking whether the Frontend / SAFE generators are lenient.
+
+## Not findings
+
+- YAML style, key ordering, indentation.
+- Whether the change could have been split into multiple PRs — only worth flagging if the change genuinely mixes unrelated concerns.
+- Minor description rewording that doesn't change the contract.
+
+## Communication checklist (use in Action:)
+
+When the PR warrants a Changes-requested verdict for missing communication, the `Action:` can reference this list verbatim:
+
+- [ ] Breaking changes called out in the PR description.
+- [ ] Frontend team notified (link the Slack thread or issue).
+- [ ] SAFE team notified (link the Slack thread or issue).
+- [ ] Versioning/migration path documented — what rolls out when, what deprecates.
+- [ ] Any shim or compatibility endpoint planned; if yes, link its PR or issue.

From 0dcf5cb6eeca73bdd50b3e22cece72ca75bced13 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Wed, 22 Apr 2026 02:51:36 +0530
Subject: [PATCH 02/14] Expand PR review skill with codemap phase, skill
 router, and code-vs-docs discipline

---
 .claude/commands/review-pr.md |   2 +-
 docs/COW_PR_REVIEW_SKILL.md   | 130 +++++++++++++++++++++++++++++++---
 2 files changed, 120 insertions(+), 12 deletions(-)

diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
index e246cb898a..ee2f9313c6 100644
--- a/.claude/commands/review-pr.md
+++ b/.claude/commands/review-pr.md
@@ -1,5 +1,5 @@
 ---
-description: Produce a structured local review report for a CoW services PR (read-only; user posts comments manually)
+description: Produce a structured local PR review report for cowprotocol/services — synthesizes context, builds a codemap, applies CoW-specific + Rust review skills, and emits severity-ranked findings with actionable questions. Use when the user says "review this PR", "/review-pr 1234", "look at PR #1234", or pastes a cowprotocol/services PR URL. Read-only; the user posts any comments manually.
 ---
 
 Review PR: $ARGUMENTS
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index 403eadae0f..5ddff262ec 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -11,20 +11,23 @@ At this point you have:
 
 ## Core Principles (read before executing)
 
-- **Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean.
-- **Never post to GitHub.** Output is strictly for the user's terminal. No `gh pr review`, no `gh pr comment`, no `gh pr close`. The user decides what to say on GitHub.
+- **CRITICAL: Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean. The goal is not to maximise finding count — the goal is to be worth a senior reviewer's attention.
+- **CRITICAL: Never post to GitHub.** Output is strictly for the user's terminal. No `gh pr review`, no `gh pr comment`, no `gh pr close`. The user decides what to say on GitHub.
+- **CRITICAL: Code is the primary source of truth.** `CLAUDE.md`, existing design docs, and this skill's own sibling docs can go stale. When a finding turns on *"X is called from Y"* or *"this field is read by Z"*, verify by grepping the codebase or using an LSP symbol tool — not by citing a doc. Docs give you higher-level *shape*; code gives you ground truth.
 - **Explain, don't just flag.** Each finding must give the reviewer enough context to understand *and defend* the point — not just forward AI-generated text.
 - **Actionable framing.** Every finding ends with either a concrete `Action:` or a specific `Question:`. Never both.
+- **Token discipline.** Don't read entire files when a grep or a targeted LSP symbol lookup suffices. Build a codemap (see [§3.5](#35-codemap-phase)) *before* reading file bodies. When you do need a file, read hunks adjacent to changed lines rather than the whole thing.
 
 ## Execution Flow
 
 1. Fetch PR metadata and linked issue(s) — [§2. Metadata Fetch](#2-metadata-fetch)
 2. Classify diff paths and load sibling context docs — [§3. Classification](#3-classification)
-3. Synthesize the context block — [§4. Context Synthesis](#4-context-synthesis)
-4. Produce findings by severity — [§5. Review and Severity](#5-review-and-severity)
-5. Print the structured report — [§6. Report Template](#6-report-template)
-6. Offer verification (background) — [§7. Verification Offer](#7-verification-offer)
-7. Print cleanup hint — [§8. Cleanup](#8-cleanup)
+3. Build a targeted codemap — [§3.5. Codemap Phase](#35-codemap-phase)
+4. Synthesize the context block — [§4. Context Synthesis](#4-context-synthesis)
+5. Produce findings by severity — [§5. Review and Severity](#5-review-and-severity)
+6. Print the structured report — [§6. Report Template](#6-report-template)
+7. Offer verification (background) — [§7. Verification Offer](#7-verification-offer)
+8. Print cleanup hint — [§8. Cleanup](#8-cleanup)
 
 Error behavior is consolidated in [§9. Error Playbook](#9-error-playbook).
 
@@ -90,6 +93,61 @@ This list will grow. When adding a new sibling doc (e.g. `solver-engine.md`, `au
 
 ---
 
+## 3.5 Codemap Phase
+
+**Purpose:** Before reading file bodies, build a targeted map of the symbols the diff touches, their callers, and their call sites. A codemap turns a 1000-line diff into a ~20-line mental model and preempts the "I read 10 files to find the impact" failure mode. It also catches findings that only become visible at the *shape* level (API ergonomics, unused abstractions, caller-count inconsistencies).
+
+### What to map
+
+For each non-trivial symbol the diff adds, modifies, or deletes:
+
+1. **New public types / traits / functions** — what are their fields / methods / signatures? (`rust-symbol-analyzer` or `get_symbols_overview`.)
+2. **Modified function signatures** — who calls them? (`rust-call-graph`, `find_referencing_symbols`, or `rg '<fn_name>\b' crates/`.) This is how you catch "this signature changed but 4 call sites weren't updated".
+3. **New trait impls** — which types implement the trait? Is the trait used anywhere outside the PR? (`rust-trait-explorer` or `find_referencing_symbols`.)
+4. **Error-type changes** — where do callers match on this error? (`rg '::<ErrorVariant>'` / `find_referencing_symbols`.)
+
+### Tools (prefer the cheapest viable option)
+
+In order of token cost, ascending:
+
+| Tool | When |
+|---|---|
+| `Grep` with `-n` on a symbol name | Fastest. Use when you need caller counts, not structure. Example: `rg 'OrderValidator::new\b' crates/` to verify all call sites were updated. |
+| `mcp__plugin_serena_serena__find_symbol` / `find_referencing_symbols` | Cheap, precise. Use when you need location + kind + signature, not the whole file. |
+| `mcp__plugin_serena_serena__get_symbols_overview` on a single file | Use before reading the file body — gets the symbol table for free. |
+| LSP-backed skills (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Richer analysis — full call hierarchies, trait impl trees, type relationships. Use for diffs that touch cross-crate abstractions. |
+| Reading full files with `Read` | Last resort. Only when the diff hunks don't give enough surrounding context and the LSP tools can't pin down what you need. |
+
+### What the codemap produces
+
+A short block in the report header (shown to the reviewer) that looks like:
+
+```
+Codemap
+───────────────────────────────────────────────────────────
+New symbols (shared::order_validation):
+  Eip1271Simulating      (trait, new)
+  Eip1271Simulator       (struct, 3 pub fields, no constructor)
+  ValidationError::SimulationFailed(String)  (new variant)
+
+Callers of OrderValidator::new: 16 sites total (1 real, 15 test).
+  All updated in diff ✓
+
+Config asymmetry noted:
+  configs::Eip1271SimulationMode (3 variants, Disabled default)
+  shared::Eip1271SimulationMode  (2 variants, Shadow default)
+```
+
+This is not filler — it's the raw material §4 (synthesis) and §5 (findings) work from. A finding like *"`Eip1271Simulator` pub fields have no constructor; 16 callers means each future field addition is a source-break"* only becomes findable once the codemap surfaces the caller count.
+
+### When to skip the codemap
+
+- Trivial PRs (docs-only, single-line version bump, pure test addition) — skip.
+- Pure refactor PRs where the diff has no added public API — skim only.
+- Everything else — do it.
+
+---
+
 ## 4. Context Synthesis
 
 Produce a 1-3 paragraph block combining:
@@ -117,13 +175,32 @@ Produce a 1-3 paragraph block combining:
 
 ## 5. Review and Severity
 
-Read the diff. For non-trivial hunks, read the full changed file(s) for surrounding context. Apply, in order:
+With the codemap ([§3.5](#35-codemap-phase)) and context synthesis ([§4](#4-context-synthesis)) in hand, review the diff. For non-trivial hunks, read the full changed file only when the codemap + diff don't answer the question. Apply, in order:
 
-1. Generic Rust review from `actionbook/rust-skills`.
-2. CoW services conventions from `CLAUDE.md`.
-3. Conditionally loaded sibling docs from [§3](#3-classification).
+1. CoW services conventions from `CLAUDE.md`.
+2. Sibling docs from [§3](#3-classification) (conditionally loaded).
+3. **Activate installed Rust review skills by diff content (below).**
 4. Soft QM skill (`ra-qm-team`), if in `loaded_context`.
 
+### Skill router — activate installed Rust skills by diff content
+
+These skills are installed via `actionbook/rust-skills` (hard prereq) and the related ecosystem. They're most effective when *explicitly* activated based on what the diff contains. Before writing findings, scan the diff and invoke any skill whose trigger fires:
+
+| Skill | Trigger in diff | Why activate |
+|---|---|---|
+| `m06-error-handling` | Adds/modifies `Result`, `Option`, `?`, `.unwrap()`, `.expect()`, `anyhow!`, `thiserror`, or error-enum variants | Validates error taxonomy, propagation, lost context (e.g. `anyhow!("{err}")` flattening), panic-vs-Result choice. |
+| `m07-concurrency` | Adds `tokio::`, `async fn`, `.await`, `tokio::join!` / `try_join!`, `tokio::spawn`, `tokio::time::timeout`, `Mutex`, `RwLock`, `Arc<...>` in shared state | Validates timeout scoping, join-vs-try_join, deadlock/lock-contention, task cancellation semantics, Send/Sync bounds. |
+| `m04-zero-cost` | Adds new generics, `impl Trait`, `dyn Trait`, trait objects, `Box<dyn ...>` | Validates static-vs-dynamic dispatch choice, unnecessary allocation, trait-object safety, monomorphization cost on a workspace this large. |
+| `m05-type-driven` | Adds newtypes, `PhantomData`, marker traits, builder patterns, type-state | Validates "make invalid states unrepresentable" and whether the type design actually narrows the state space. |
+| `m15-anti-pattern` | Any non-trivial new code | Sanity pass for common Rust anti-patterns. Cheap; run it. |
+| `m10-performance` | Changes to hot paths (auction loop, settlement submission, per-order handlers, native price estimation) | Validates allocations, caching, loop invariants, lock granularity. |
+| `unsafe-checker` | Any `unsafe` block, FFI (`extern`), `transmute`, raw pointers, `MaybeUninit` | **Mandatory** — any finding here defaults to **High**. Soundness issues are never Small. |
+| `rust-trait-explorer` | Adds a new trait or a new impl of an existing trait | Maps the trait's existing impls — catches "you added a default method to a trait with 12 impls, one of them should override it". |
+| `rust-call-graph` / `rust-code-navigator` | Modified function signatures on cross-crate public APIs | Catches missed caller sites, breaking changes, downstream blast radius. |
+| `ra-qm-skills` | Soft prereq | QM checklists — supplementary if installed. |
+
+**Rule of thumb:** If a skill's trigger keywords appear in the diff's **added** lines, activate it. Don't run skills on context lines (unchanged code around the diff) — that wastes tokens on things you're not actually reviewing.
+
 ### Severity Rubric
 
 | Severity | Meaning | Example |
@@ -158,6 +235,8 @@ A senior reviewer catches things that aren't bugs. They shape the code for futur
 
 7. **Description-vs-code mismatches.** If the PR body describes `Mutex<HashSet>` and the code has `DashSet`, the description is stale. Flag it — not because it changes the code, but because whoever reads the PR as history will be confused. Small finding, usually.
 
+   **Related: design spec referenced but not committed.** If the PR body mentions a design spec (e.g. `docs/superpowers/specs/...`) that isn't in the PR's diff, that's a **Small** finding asking the author to commit it — teammates reading the PR six months from now can't see the rationale otherwise.
+
 8. **Bundled orthogonal changes.** When a PR contains a change that isn't clearly required by the main feature, ask: does this belong in its own PR? Sometimes the answer is "it's required, here's why" (fine — ask for the reason to be in the commit message or a code comment). Sometimes the answer is "you're right, let me split it" (better git history).
 
 9. **Error taxonomies in web3 code.** Blockchain RPC errors have multiple categories: contract reverts, RPC transport failures, provider-side rate-limiting, timeout errors, decoding failures. Each should lead to *different* handling. If new code treats `Err(_)` as a single bucket when the branches should differ (e.g. cache a non-vault verdict on contract revert, but *not* on a network error), this is a correctness issue — Medium or High depending on whether the wrong classification can poison downstream behavior.
@@ -203,12 +282,21 @@ PR #<N> — <title>
 ═══════════════════════════════════════════════════════════
 Author:       @<author>
 Scope:        +<additions> −<deletions> across <N> files
+              (include a "(~X LOC human-written; rest generated/lockfile,
+              filtered)" suffix when the filter materially changed the count)
 Labels:       <labels, comma-separated; or "—">
 Base/Head:    <baseRef> ← <headRef>
 Linked issue: #<N> — <issue title>            (omit line if none)
 Loaded context: <comma-separated loaded_context list>
+Activated skills: <list of Rust skills fired by the skill router, e.g.
+                  m07-concurrency, m06-error-handling>
 Mode:         full checkout   |   degraded static-diff
 
+Codemap
+───────────────────────────────────────────────────────────
+<concise codemap from §3.5 — new symbols, modified signatures, caller counts.
+ Omit when the diff is trivial enough that §3.5 was skipped.>
+
 ───────────────────────────────────────────────────────────
 CONTEXT
 ───────────────────────────────────────────────────────────
@@ -341,6 +429,26 @@ Return with:   git switch <prior_branch>
 
 ---
 
+## 10. Code-vs-docs Discipline (Always Apply)
+
+When a finding rests on a claim about the codebase, verify the claim by looking at the code — not by trusting a doc, a comment, or this skill's own sibling files.
+
+- **Claim:** *"`OrderSimulator::encode_order` only reads `OrderData` and `Interactions`."*
+  **Wrong:** cite a doc comment.
+  **Right:** `rg 'fn encode_order' crates/orderbook/src/` → read the function body → verify.
+
+- **Claim:** *"All `OrderValidator::new` call sites have been updated."*
+  **Wrong:** count the test assertions in the diff.
+  **Right:** `rg 'OrderValidator::new\b' crates/` → compare the count to the diff's modified lines.
+
+- **Claim:** *"This module is only used by X."*
+  **Wrong:** trust the module's top-level comment.
+  **Right:** `find_referencing_symbols` on the public exports → see actual call graph.
+
+Docs age. Comments lie. Grep and LSP don't. When reporting a finding that depends on such a claim, verify *before* you write the finding — not after the author pushes back.
+
+---
+
 ## Maintenance Notes
 
 - **When you find yourself adding a project-specific heuristic more than twice**, move it into a sibling context doc under `docs/review-context/` and add a trigger rule to [§3](#3-classification).

From 120ca11b04f8fbf265bc5b04064ca1799d30979a Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Sat, 25 Apr 2026 21:41:40 +0530
Subject: [PATCH 03/14] refactor acc to review comments: 3 modes, CI wiring,
 smaller surface

---
 .claude/commands/review-pr.md              | 135 +++---
 .claude/settings.json                      |  13 -
 .github/workflows/claude-code-review.yml   |  34 ++
 docs/COW_PR_REVIEW_SKILL.md                | 456 +++++++++------------
 docs/review-context/alloy-rs.md            |  58 ---
 docs/review-context/database-migrations.md |  58 +--
 docs/review-context/openapi.md             |  58 ---
 7 files changed, 320 insertions(+), 492 deletions(-)
 delete mode 100644 .claude/settings.json
 create mode 100644 .github/workflows/claude-code-review.yml
 delete mode 100644 docs/review-context/alloy-rs.md
 delete mode 100644 docs/review-context/openapi.md

diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
index ee2f9313c6..ad811bcd93 100644
--- a/.claude/commands/review-pr.md
+++ b/.claude/commands/review-pr.md
@@ -1,5 +1,5 @@
 ---
-description: Produce a structured local PR review report for cowprotocol/services — synthesizes context, builds a codemap, applies CoW-specific + Rust review skills, and emits severity-ranked findings with actionable questions. Use when the user says "review this PR", "/review-pr 1234", "look at PR #1234", or pastes a cowprotocol/services PR URL. Read-only; the user posts any comments manually.
+description: Produce a structured PR review for cowprotocol/services. Invoked locally as `/review-pr` (diff mode, against current branch vs main) or `/review-pr <N|url>` (PR mode). Same command also runs in CI via `.github/workflows/claude-code-review.yml`, where it posts a single review comment instead of printing to terminal. Read-only in local modes; the user posts any comments manually.
 ---
 
 Review PR: $ARGUMENTS
@@ -8,55 +8,62 @@ Follow the instructions in `./docs/COW_PR_REVIEW_SKILL.md` to produce the review
 
 ## Prologue (execute in order; abort on any failure)
 
-### 1. Parse $ARGUMENTS
+### 1. Detect mode
+
+The skill runs in one of three modes. Detection:
+
+- If the environment variable `$GITHUB_ACTIONS == "true"` → `mode = "pr-ci"`.
+  - `$ARGUMENTS` MUST be a PR number, URL, or `owner/repo#N` form (the workflow passes it).
+- Else if `$ARGUMENTS` is non-empty → `mode = "pr-local"`. Parse the argument (see [step 2](#2-parse-arguments-pr-modes-only)).
+- Else (`$ARGUMENTS` empty, not in CI) → `mode = "diff"`. No `gh` calls needed; source is `git diff $(git merge-base HEAD main)..HEAD`.
+
+### 2. Parse $ARGUMENTS (PR modes only)
+
+*(Skip in `diff` mode.)*
 
 Accept any of:
+
 - A PR number: `4267`
 - A full URL: `https://github.com/cowprotocol/services/pull/4267`
 - An `owner/repo#N` form: `cowprotocol/services#4267`
 
-Default owner/repo to `cowprotocol/services` if only a number is given.
+Default `owner/repo` to `cowprotocol/services` when only a number is given.
 
 Extract: `<PR_NUMBER>`, `<owner>`, `<repo>`.
 
-If unparseable, print this usage and abort:
+If unparseable, print and abort:
+
 ```
-Usage: /review-pr <PR_NUMBER>
+Usage: /review-pr                       # diff mode (current branch vs main)
+       /review-pr <PR_NUMBER>           # PR mode
        /review-pr https://github.com/owner/repo/pull/<N>
        /review-pr owner/repo#<N>
 ```
 
-### 2. Prereq check
+### 3. Diff-mode preflight
 
-**Hard (abort if missing):**
+*(Only in `mode == "diff"`.)*
 
-Check that `<RUST_SKILLS_PATH>` exists as a directory. If missing, print and **abort**:
-```
-✗ Required skill missing: actionbook/rust-skills
-
-  Install:
-    npx skills add actionbook/rust-skills
+Run:
 
-  Then exit this Claude session (/exit) and restart with:
-    claude --continue
+```bash
+git fetch origin main --quiet
+BASE=$(git merge-base HEAD origin/main)
+git diff --stat "$BASE..HEAD"
 ```
 
-**Soft (warn and continue if missing):**
+If `git diff "$BASE..HEAD"` is empty, print `No diff vs main — nothing to review.` and exit clean (not an error).
 
-Check that `<QM_PATH>` exists as a directory. If missing, print the banner below and **continue**:
-```
-⚠ Optional skill missing: alirezarezvani/claude-skills/ra-qm-team
-  Install (recommended):
-    npx ai-agent-skills install alirezarezvani/claude-skills/ra-qm-team
-```
+There is **no** clean-tree check, **no** rebase, and **no** `git pull` of main. Diff scope comes from the fetched `origin/main` merge-base.
+
+### 4. PR-mode preflight
 
-Build a `loaded_context` list containing the prereq skills that ARE installed. This list appears in the report header's `Loaded context:` line.
+*(Only in `mode == "pr-local"` or `mode == "pr-ci"`.)*
 
-> **Note to maintainer:** Replace `<RUST_SKILLS_PATH>` and `<QM_PATH>` with the absolute paths captured in `docs/superpowers/plans/2026-04-21-pr-review-skill.notes.md` (§0.1). They're left as placeholders because the installer paths are determined empirically post-restart.
+#### 4a. Working tree (pr-local only)
 
-### 3. Clean-tree check
+Run `git status --porcelain`. If non-empty, print it plus:
 
-Run `git status --porcelain`. If output is non-empty, print it plus the following message, then **abort**:
 ```
 Working tree is dirty. Stash or commit your changes, then re-run.
 
@@ -64,69 +71,65 @@ Working tree is dirty. Stash or commit your changes, then re-run.
   git stash pop    # to restore later
 ```
 
-**Never auto-stash.**
+Then **abort**. Never auto-stash.
 
-### 4. Update main
+#### 4b. Save the current branch (pr-local only)
 
-Save the current branch name to `<prior_branch>` (this variable is used again in step 8 of the reference doc).
+Save the current branch name to `<prior_branch>` so the report's NEXT STEPS footer can suggest `git switch <prior_branch>` when the review is done.
 
-Then:
-- If the current branch IS `main`: run `git pull --rebase origin main`.
-- Otherwise: run `git fetch origin main` (do **not** rebase the user's feature branch silently).
+#### 4c. Fetch base ref
 
-On rebase conflict, abort and print:
-```
-Rebase conflict on main. Resolve manually, then re-run.
-
-Conflicted files:
-<output of: git diff --name-only --diff-filter=U>
-```
+Run `git fetch origin --quiet`. This makes the diff comparable to base without rebasing or modifying the user's branch.
 
-### 5. Checkout PR
+#### 4d. Checkout the PR
 
-Run `gh pr checkout <PR_NUMBER> -R <owner>/<repo>`.
-
-On failure, handle specifically:
+Run `gh pr checkout <PR_NUMBER> -R <owner>/<repo>`. Failure handling:
 
 - **`gh` not installed** → print `Install gh: https://cli.github.com/` and abort.
-- **Auth error** → print the output of `gh auth status` plus `Run: gh auth login` and abort.
+- **Auth error** → print `gh auth status` output plus `Run: gh auth login` and abort.
 - **PR doesn't exist / wrong repo** → surface `gh`'s error verbatim and abort.
 - **Fork without checkout permission** → switch to **degraded static-diff mode**:
-  - Do **not** abort.
-  - Set `mode = "degraded static-diff"`.
-  - In step 6 of the reference doc, replace the `gh pr diff` call with:
-    ```bash
-    gh pr diff <PR_NUMBER> --patch -R <owner>/<repo>
-    ```
-  - Flag the degraded mode prominently in the report header's `Mode:` line.
+  - Set `mode_qualifier = "degraded static-diff"`.
+  - In the reference doc's §2, replace `gh pr diff <N>` with `gh pr diff <N> --patch -R <owner>/<repo>`.
+  - Flag the qualifier in the report header's `Mode:` line.
 - **Any other error** → surface verbatim and abort.
 
-If successful, `mode = "full checkout"`.
+In `pr-ci`, the workflow has already checked out the PR branch — skip 4d and just verify `HEAD` matches the expected ref.
+
+### 5. Optional-tooling probe
 
-### 6. Noise filtering (before handoff)
+Detect which optional accelerators are available in the current session: Serena MCP (`mcp__plugin_serena_serena__*`), `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`), `ra-qm-skills`, the `m04`/`m06`/`m07`/`m15` modules, `unsafe-checker`.
 
-CoW services PRs frequently bundle generated/mechanical files with the real change. Before handing off, classify each changed file in the diff:
+Build a `loaded_context` list of whichever ones resolved. Pass it through to the reference doc; it prints verbatim in the report header's `Loaded context:` line.
+
+Do not abort if a skill is missing. Do not print install banners.
+
+### 6. Noise filter (before handoff)
+
+Classify each changed file:
 
 **Review surface (read fully):**
+
 - Anything under `crates/*/src/**/*.rs` (excluding `contracts/generated/**`).
 - `crates/e2e/tests/**/*.rs`.
 - `contracts/solidity/**/*.sol` (authored Solidity).
-- Config files: `**/openapi.yml`, `database/sql/**`, `configs/**`, `*.toml` (only if semantically interesting — dep version bumps with CI green are generally fine).
+- Config files: `**/openapi.yml`, `database/sql/**`, `configs/**`, semantically interesting `*.toml`.
 
 **Noise (skip or skim):**
-- `Cargo.lock` (any): reviewing lockfile diffs is high-cost, low-signal. If CI is green, cargo resolved the graph; a lockfile review won't catch what CI missed. Skip.
-- `contracts/generated/**` and `contracts/artifacts/**`: machine-generated bindings and ABI JSON. Skip unless the PR is explicitly *about* the binding generation process.
-- Auto-generated `Cargo.toml` entries from contract-binding crates (`contracts/generated/*/Cargo.toml`): skip.
-- Binary fixtures, ABI blobs, snapshot files: skip.
 
-Report the filter in the report's header `Scope:` line as `+X −Y across Z files (~N LOC human-written; rest generated/lockfile, filtered)`. This tells the human reviewer the *real* surface size — a 4000-line diff is often 400 lines of actual review.
+- `Cargo.lock` (any). CI validates the resolution; reviewing the lockfile diff is high-cost, low-signal.
+- `contracts/generated/**` and `contracts/artifacts/**` — machine-generated bindings and ABI JSON.
+- Auto-generated `Cargo.toml` entries from contract-binding crates.
+- Binary fixtures, ABI blobs, snapshot files.
+
+Report the filter in the report's `Scope:` line as `+X −Y across Z files (~N LOC human-written; rest generated/lockfile, filtered)`.
 
 ### 7. Handoff
 
-Read `docs/COW_PR_REVIEW_SKILL.md` and follow it from **§2. Metadata Fetch** onward, passing through:
+Read `docs/COW_PR_REVIEW_SKILL.md` and follow it from §2 (Metadata Fetch) onward, passing through:
 
-- `<PR_NUMBER>`, `<owner>`, `<repo>`
-- `<prior_branch>` (the branch to return to at the end)
-- `loaded_context` (list of installed prereq skills)
-- `mode` (`full checkout` or `degraded static-diff`)
-- The filter list from step 6 — the reference doc uses it to decide what to `Read`.
+- `mode` (`diff` / `pr-local` / `pr-ci`) and `mode_qualifier` if set.
+- `<PR_NUMBER>`, `<owner>`, `<repo>` (PR modes only).
+- `prior_branch` (`pr-local` only).
+- `loaded_context` (optional skills detected in step 5).
+- The noise-filter classification from step 6 — the reference doc uses it to decide what to read.
diff --git a/.claude/settings.json b/.claude/settings.json
deleted file mode 100644
index 14cc499073..0000000000
--- a/.claude/settings.json
+++ /dev/null
@@ -1,13 +0,0 @@
-{
-  "extraKnownMarketplaces": {
-    "superpowers-dev": {
-      "source": {
-        "source": "github",
-        "repo": "obra/superpowers"
-      }
-    }
-  },
-  "enabledPlugins": {
-    "superpowers@superpowers-dev": true
-  }
-}
diff --git a/.github/workflows/claude-code-review.yml b/.github/workflows/claude-code-review.yml
new file mode 100644
index 0000000000..8623219db7
--- /dev/null
+++ b/.github/workflows/claude-code-review.yml
@@ -0,0 +1,34 @@
+name: Claude Code Review
+
+on:
+  pull_request:
+    types: [ready_for_review]
+
+jobs:
+  claude-review:
+    if: ${{ !github.event.pull_request.draft }}
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+      issues: read
+      id-token: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          # Full history is needed so the skill's `git blame`-based
+          # historic-context step can resolve any line in the diff.
+          fetch-depth: 0
+
+      - name: Run Claude Code Review
+        id: claude-review
+        uses: anthropics/claude-code-action@e0f2d99545298b87c2f984ab534af3a6534142ae # v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          # Invoke the in-repo slash command. The skill detects
+          # $GITHUB_ACTIONS=true and switches to `pr-ci` mode, posting
+          # a single consolidated review comment instead of streaming
+          # to a terminal. Skill source of truth: docs/COW_PR_REVIEW_SKILL.md
+          prompt: '/review-pr ${{ github.repository }}#${{ github.event.pull_request.number }}'
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index 5ddff262ec..179f444745 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -1,306 +1,256 @@
 # CoW Services PR Review Skill
 
-This document instructs Claude how to produce a local PR review report. It is invoked by `.claude/commands/review-pr.md` after the prologue (arg parsing, prereq check, clean-tree check, main update, PR checkout).
+This document instructs Claude how to produce a PR review for cowprotocol/services. It is invoked by `.claude/commands/review-pr.md` (locally) or by `.github/workflows/claude-code-review.yml` (in CI). One skill, three operating modes.
 
-At this point you have:
+At the point this document is read, the entry-point has already determined:
 
-- A parsed `<PR_NUMBER>`, `<owner>`, `<repo>`.
-- A `mode` flag: `full checkout` (default) or `degraded static-diff` (fork fallback).
-- A `loaded_context` list of installed prereq skills.
-- A `prior_branch` variable holding the branch name to return to.
+- **`mode`** — one of:
+  - `diff` — local, no PR yet. Source is `git diff $(git merge-base HEAD main)..HEAD` (the whole feature-branch worth of work). No PR metadata, no `gh` calls. Output to terminal.
+  - `pr-local` — local, PR exists. Source is `gh pr diff <N>` plus PR metadata. Output to terminal.
+  - `pr-ci` — running inside `.github/workflows/claude-code-review.yml`. Source is `gh pr diff <N>` plus PR metadata. Output is a single review comment posted to the PR.
+- **`<PR_NUMBER>`, `<owner>`, `<repo>`** (only in `pr-local` / `pr-ci`).
+- **`prior_branch`** (only in `pr-local` — the branch to print at the end).
+
+The mode shapes which steps run and how the report is delivered, but the *content* of the review is the same in all three.
+
+---
 
 ## Core Principles (read before executing)
 
-- **CRITICAL: Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean. The goal is not to maximise finding count — the goal is to be worth a senior reviewer's attention.
-- **CRITICAL: Never post to GitHub.** Output is strictly for the user's terminal. No `gh pr review`, no `gh pr comment`, no `gh pr close`. The user decides what to say on GitHub.
-- **CRITICAL: Code is the primary source of truth.** `CLAUDE.md`, existing design docs, and this skill's own sibling docs can go stale. When a finding turns on *"X is called from Y"* or *"this field is read by Z"*, verify by grepping the codebase or using an LSP symbol tool — not by citing a doc. Docs give you higher-level *shape*; code gives you ground truth.
+- **Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean. The goal is not to maximise finding count — it is to be worth a senior reviewer's attention.
+- **Local modes never post to GitHub.** In `diff` and `pr-local`, output is strictly terminal. No `gh pr review`, no `gh pr comment`. The user posts whatever they choose. In `pr-ci`, post exactly one consolidated review comment — no per-line spam.
+- **Code is the primary source of truth.** `CLAUDE.md`, design docs in `docs/`, and this skill's own sibling docs can go stale. When a finding turns on *"X is called from Y"* or *"this field is read by Z"*, verify with `git grep` / `rg` / LSP — not by citing a doc.
+- **Inverted: this PR can make existing docs / comments / its own description stale.** If a code change makes a comment, a `docs/` page, or the PR's own description no longer match the diff's current state, that is itself a finding (`Action:` → update X).
+- **`git blame` before flagging code that looks unusual.** Often code looks weird because it had to. Before suggesting a "cleanup", blame the affected lines, read the originating commit message and (if any) linked PR. A comment that says *"this looks accidental, did you mean X?"* without that step risks asking the author to undo a hard-won fix.
 - **Explain, don't just flag.** Each finding must give the reviewer enough context to understand *and defend* the point — not just forward AI-generated text.
-- **Actionable framing.** Every finding ends with either a concrete `Action:` or a specific `Question:`. Never both.
-- **Token discipline.** Don't read entire files when a grep or a targeted LSP symbol lookup suffices. Build a codemap (see [§3.5](#35-codemap-phase)) *before* reading file bodies. When you do need a file, read hunks adjacent to changed lines rather than the whole thing.
+- **One framing per finding:** end with either `Action:` (concrete task) or `Question:` (clarification needed). Never both.
+- **Token discipline.** Don't read whole files when grep or LSP suffices. Build a codemap before reading file bodies.
+
+---
+
+## Universal Guardrails
+
+Apply these as the default lens for every change. Pull in CoW-specific siblings ([§3](#3-conditional-context)) only when the diff warrants them.
+
+1. **Keep the public API surface minimal.** A new `pub fn`, `pub struct`, or `pub mod` that isn't required by an external caller is a Medium finding asking why it isn't `pub(crate)` / `pub(super)` / private. Smaller surface = fewer downstream breakages = freer refactor for the next person.
+2. **Avoid rightward drift.** Code that's deeply nested (4+ levels, especially `match` inside `if let` inside `for` inside `async`) is hard to read and usually hides a missing extraction. Suggest an early-return, a helper, or `let-else`.
+3. **One responsibility per component.** A function, struct, or module that does two unrelated things (validates *and* persists; parses *and* renders) is harder to test and to reuse. Flag with a `Question:` if you're not sure the split is artificial.
+4. **Split big files.** A new file pushing past ~500 lines, or a touched file growing past ~1000 lines, is worth flagging. Suggest a sensible split (often by responsibility from #3).
+5. **Avoid argument bloat.** A function taking 6+ positional arguments is a code smell — usually missing a config struct, a builder, or a method on a context object. Especially flag if the arguments are mostly being threaded through unchanged.
+6. **Errors carry context.** `?` propagating a low-level error to a high-level boundary without enrichment loses the *what was the caller trying to do* information. `anyhow!("{err}")` flattens cause chains. Both are findings; severity depends on the path.
+
+---
 
 ## Execution Flow
 
-1. Fetch PR metadata and linked issue(s) — [§2. Metadata Fetch](#2-metadata-fetch)
-2. Classify diff paths and load sibling context docs — [§3. Classification](#3-classification)
-3. Build a targeted codemap — [§3.5. Codemap Phase](#35-codemap-phase)
-4. Synthesize the context block — [§4. Context Synthesis](#4-context-synthesis)
-5. Produce findings by severity — [§5. Review and Severity](#5-review-and-severity)
-6. Print the structured report — [§6. Report Template](#6-report-template)
-7. Offer verification (background) — [§7. Verification Offer](#7-verification-offer)
-8. Print cleanup hint — [§8. Cleanup](#8-cleanup)
+Steps run in this order. `diff` mode skips PR-metadata steps; `pr-ci` swaps the report sink at the end.
+
+1. Fetch PR metadata and linked issue(s) — [§2](#2-metadata-fetch). *(`pr-local` / `pr-ci` only.)*
+2. Classify diff paths and load conditional context — [§3](#3-conditional-context).
+3. Build a targeted codemap — [§4](#4-codemap-phase).
+4. Synthesize the context block — [§5](#5-context-synthesis).
+5. Review and produce findings — [§6](#6-review-and-severity).
+6. Emit the report — [§7](#7-report-templates). Sink depends on mode.
+7. Offer verification (background) — [§8](#8-verification-offer). *(Local modes only.)*
+8. Print cleanup hint — [§9](#9-cleanup). *(`pr-local` only.)*
 
-Error behavior is consolidated in [§9. Error Playbook](#9-error-playbook).
+Error behaviour is consolidated in [§10](#10-error-playbook).
 
 ---
 
 ## 2. Metadata Fetch
 
-Run these commands in **parallel** (single message, multiple Bash tool calls):
+*(Skip in `diff` mode — there is no PR yet.)*
+
+Run in **parallel** (single message, multiple Bash tool calls):
 
 ```bash
-# Full metadata
-gh pr view <PR_NUMBER> -R <owner>/<repo> --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,commits,reviewDecision,isDraft,state
+gh pr view <PR_NUMBER> -R <owner>/<repo> \
+  --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,commits,reviewDecision,isDraft,state
 
-# Full diff
 gh pr diff <PR_NUMBER> -R <owner>/<repo>
 ```
 
-In **degraded static-diff mode** (fork fallback), replace `gh pr diff` with:
+In **degraded static-diff mode** (fork without checkout permission, only relevant in `pr-local`), replace `gh pr diff` with:
+
 ```bash
 gh pr diff <PR_NUMBER> --patch -R <owner>/<repo>
 ```
 
 ### Linked issues
 
-Parse the PR body for `Fixes #<N>`, `Closes #<N>`, `Resolves #<N>` (case-insensitive). For each match, fetch in parallel with the above:
+Parse the PR body for `Fixes #<N>`, `Closes #<N>`, `Resolves #<N>` (case-insensitive). Fetch each in parallel with the above:
+
 ```bash
 gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
 ```
 
-If no linked issue is referenced, proceed without one. Do not manufacture an issue link.
+If no linked issue is referenced, proceed without one. Do not manufacture one.
 
 ### State handling
 
-- `state == "CLOSED"` or `state == "MERGED"` → proceed, but prepend to the report:
-  > ⚠ This PR is {closed,merged}; review is informational.
-- `isDraft == true` → proceed, but prepend:
-  > ⚠ This PR is a draft; author may still be iterating.
+- `state == "CLOSED"` or `"MERGED"` → proceed; prepend a one-line warning to the report.
+- `isDraft == true` → proceed; prepend `Draft — author may still be iterating.`
 
 ---
 
-## 3. Classification
+## 3. Conditional Context
 
-Walk the file list from `gh pr view --json files`. For each changed file, evaluate these rules and accumulate a `context_docs` list:
+For each changed file, evaluate this table and accumulate a `context_docs` list. Read each matched doc once.
 
-| Rule | Match | Load |
-|---|---|---|
-| R1 — Alloy usage (path) | Any change under `crates/ethrpc/`, `crates/chain/`, or `crates/contracts/` | `docs/review-context/alloy-rs.md` |
-| R2 — Alloy usage (import) | Any `.rs` file whose diff hunks add `use alloy::*;`, `use alloy_*;`, or `alloy::` qualified paths | `docs/review-context/alloy-rs.md` |
-| R3 — DB migrations | Any file under `database/sql/**` | `docs/review-context/database-migrations.md` |
-| R4 — OpenAPI | Any file matching `**/openapi.yml` | `docs/review-context/openapi.md` |
-
-R1 and R2 are OR'd — load `alloy-rs.md` **once** if either matches.
-
-### Loading
+| Match | Load |
+|---|---|
+| Any file under `database/sql/**` | `docs/review-context/database-migrations.md` |
 
-Read each matched sibling doc via the `Read` tool. Record the loaded list — it will appear in the report header's `Loaded context:` line.
+Add a new sibling only when:
 
-If no rules match, `context_docs` is empty and the review proceeds with only the always-loaded `CLAUDE.md` + `actionbook/rust-skills`.
+- A real review surfaced a CoW-specific concern the AI consistently missed, **and**
+- That concern can't reasonably be inferred from the [Universal Guardrails](#universal-guardrails) plus general Rust judgment, **and**
+- It can be expressed as a tight checklist (≤30 lines), not a sprawling rulebook.
 
-### Future siblings
+### One inline guardrail worth keeping
 
-This list will grow. When adding a new sibling doc (e.g. `solver-engine.md`, `autopilot.md`), add its trigger row to the table above. Keep the filters tight — a sibling should load only when its content is actually relevant.
+When a PR touches **any `openapi.yml`**: scan for breaking changes (removed/renamed/typed-changed fields, new required request fields, narrowed enums, changed auth or HTTP method). If any are present, ask whether the goal could be achieved non-breakingly (additive field, new optional, deprecation window) and whether the affected consumer teams (Frontend, SAFE, etc.) have been notified. Severity: High when undisclosed in the PR description; Medium otherwise.
 
 ---
 
-## 3.5 Codemap Phase
+## 4. Codemap Phase
 
-**Purpose:** Before reading file bodies, build a targeted map of the symbols the diff touches, their callers, and their call sites. A codemap turns a 1000-line diff into a ~20-line mental model and preempts the "I read 10 files to find the impact" failure mode. It also catches findings that only become visible at the *shape* level (API ergonomics, unused abstractions, caller-count inconsistencies).
+**Purpose:** before reading file bodies, map the symbols the diff touches, their callers, and their call sites. A codemap turns a 1000-line diff into a ~20-line mental model and catches findings that only become visible at the *shape* level (caller-count inconsistencies, dead abstractions, leaky public APIs).
 
 ### What to map
 
 For each non-trivial symbol the diff adds, modifies, or deletes:
 
-1. **New public types / traits / functions** — what are their fields / methods / signatures? (`rust-symbol-analyzer` or `get_symbols_overview`.)
-2. **Modified function signatures** — who calls them? (`rust-call-graph`, `find_referencing_symbols`, or `rg '<fn_name>\b' crates/`.) This is how you catch "this signature changed but 4 call sites weren't updated".
-3. **New trait impls** — which types implement the trait? Is the trait used anywhere outside the PR? (`rust-trait-explorer` or `find_referencing_symbols`.)
-4. **Error-type changes** — where do callers match on this error? (`rg '::<ErrorVariant>'` / `find_referencing_symbols`.)
+1. **New public types / traits / functions** — fields, methods, signatures.
+2. **Modified function signatures** — caller count, were all sites updated?
+3. **New trait impls** — which types implement the trait? Is the trait used outside this PR?
+4. **Error-type changes** — where do callers match on this error?
 
-### Tools (prefer the cheapest viable option)
+### Tools (cheapest viable option first)
 
-In order of token cost, ascending:
+| Tool | Status | When to use |
+|---|---|---|
+| `Grep` / `rg` with `-n` on a symbol name | Always available | Caller counts and basic location lookups. Example: `rg 'OrderValidator::new\b' crates/`. |
+| `gh api` / `git blame` | Always available | Historic context on suspicious-looking lines. |
+| `mcp__plugin_serena_serena__find_symbol` / `find_referencing_symbols` / `get_symbols_overview` | Optional (LSP-backed; available when Serena MCP is configured) | Precise location + kind + signature without reading the full file. |
+| Skills from `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Optional (installed via `npx skills add actionbook/rust-skills`) | Richer cross-crate analysis. Not present in CI by default. |
+| `Read` of a full file | Last resort | Only when the diff hunks plus the cheaper tools don't pin down what you need. |
 
-| Tool | When |
-|---|---|
-| `Grep` with `-n` on a symbol name | Fastest. Use when you need caller counts, not structure. Example: `rg 'OrderValidator::new\b' crates/` to verify all call sites were updated. |
-| `mcp__plugin_serena_serena__find_symbol` / `find_referencing_symbols` | Cheap, precise. Use when you need location + kind + signature, not the whole file. |
-| `mcp__plugin_serena_serena__get_symbols_overview` on a single file | Use before reading the file body — gets the symbol table for free. |
-| LSP-backed skills (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Richer analysis — full call hierarchies, trait impl trees, type relationships. Use for diffs that touch cross-crate abstractions. |
-| Reading full files with `Read` | Last resort. Only when the diff hunks don't give enough surrounding context and the LSP tools can't pin down what you need. |
+**Fallback rule:** in CI (or any environment where the optional tools aren't installed), every codemap step still works with `rg` and `git`. The optional tools are accelerators, not requirements.
 
 ### What the codemap produces
 
-A short block in the report header (shown to the reviewer) that looks like:
+A short block in the report that looks like:
 
 ```
 Codemap
 ───────────────────────────────────────────────────────────
-New symbols (shared::order_validation):
-  Eip1271Simulating      (trait, new)
-  Eip1271Simulator       (struct, 3 pub fields, no constructor)
-  ValidationError::SimulationFailed(String)  (new variant)
-
-Callers of OrderValidator::new: 16 sites total (1 real, 15 test).
-  All updated in diff ✓
+New symbols (crate::module):
+  <Name>  (kind, key fact)
+  ...
 
-Config asymmetry noted:
-  configs::Eip1271SimulationMode (3 variants, Disabled default)
-  shared::Eip1271SimulationMode  (2 variants, Shadow default)
+Callers of <Name>: <count> sites (<real> real, <test> test). All updated ✓
 ```
 
-This is not filler — it's the raw material §4 (synthesis) and §5 (findings) work from. A finding like *"`Eip1271Simulator` pub fields have no constructor; 16 callers means each future field addition is a source-break"* only becomes findable once the codemap surfaces the caller count.
+This is the raw material §5 (synthesis) and §6 (findings) work from.
 
-### When to skip the codemap
+### When to skip
 
-- Trivial PRs (docs-only, single-line version bump, pure test addition) — skip.
-- Pure refactor PRs where the diff has no added public API — skim only.
-- Everything else — do it.
+Trivial PRs (docs-only, single-line bump, pure test addition) — skip. Pure refactors with no added public API — skim. Everything else — do it.
 
 ---
 
-## 4. Context Synthesis
+## 5. Context Synthesis
 
-Produce a 1-3 paragraph block combining:
-
-- PR title.
-- PR description (Description / Changes / How to test sections from the template, if filled).
-- Linked issue title + description, if any.
-- File scope — breadth of the diff (single crate, cross-crate, docs-only, DB, API spec).
-- Rough intent — new feature, bugfix, refactor, dep bump, docs, test-only.
+Produce 1–3 paragraphs combining the PR title, description, linked issue(s), file scope, and intent (feature, bugfix, refactor, dep bump, docs, test).
 
 ### Rules
 
-1. **Synthesize, do not copy-paste.** If the description is five words, say so: *"PR description is minimal — intent inferred from diff"*. Don't pretend.
-2. **Flag description-vs-diff mismatches as findings.** If the description says "docs-only" but `.rs` files are touched, open a **High**-severity finding titled `PR description contradicts diff scope` immediately. This is the one case where a finding precedes the normal review loop.
-3. **Linked-issue context goes into synthesis, not as a separate block.** Summarise motivation from the issue alongside the PR's own description.
-4. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
+1. **Synthesize, don't copy-paste.** If the description is five words, say so plainly: *"description is minimal; intent inferred from diff"*.
+2. **Watch for description-vs-diff drift.** A PR description must describe the diff's *current* state, not the author's iteration history. If a claim in the description is no longer true of the diff, raise a finding with `Action: update the PR description to match the current diff`. Do not flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.
+3. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
 
 ### Shape
 
-- **Paragraph 1** — *What* the PR changes (mechanics, file scope).
-- **Paragraph 2** — *Why* the change exists (from description + linked issue).
-- **Paragraph 3** (only if warranted) — *How* it's implemented (the approach, not a line-by-line walkthrough).
+- **Paragraph 1** — *what* changed.
+- **Paragraph 2** — *why* (description + linked issue).
+- **Paragraph 3** (if warranted) — *how* (the approach, not a line-by-line walkthrough).
 
 ---
 
-## 5. Review and Severity
+## 6. Review and Severity
 
-With the codemap ([§3.5](#35-codemap-phase)) and context synthesis ([§4](#4-context-synthesis)) in hand, review the diff. For non-trivial hunks, read the full changed file only when the codemap + diff don't answer the question. Apply, in order:
+Apply, in order:
 
-1. CoW services conventions from `CLAUDE.md`.
-2. Sibling docs from [§3](#3-classification) (conditionally loaded).
-3. **Activate installed Rust review skills by diff content (below).**
-4. Soft QM skill (`ra-qm-team`), if in `loaded_context`.
+1. The [Universal Guardrails](#universal-guardrails).
+2. The conditional context from [§3](#3-conditional-context), if any was loaded.
+3. CoW-services conventions from `CLAUDE.md`.
+4. Optional skills from [§4 → tools table](#tools-cheapest-viable-option-first), activated by what the diff actually contains. If installed, invoke `m06-error-handling` for `Result`/`Option`/`?` changes, `m07-concurrency` for `tokio::*` / async / locking, `m04-zero-cost` for new generics or trait objects, `m15-anti-pattern` for general sanity, `unsafe-checker` for any `unsafe` (mandatory High). If they aren't installed, reason from general Rust knowledge plus the [Universal Guardrails](#universal-guardrails).
 
-### Skill router — activate installed Rust skills by diff content
+### Use `git blame` for historic context
 
-These skills are installed via `actionbook/rust-skills` (hard prereq) and the related ecosystem. They're most effective when *explicitly* activated based on what the diff contains. Before writing findings, scan the diff and invoke any skill whose trigger fires:
+Before flagging code that looks unusual, redundant, or "easy to clean up":
 
-| Skill | Trigger in diff | Why activate |
-|---|---|---|
-| `m06-error-handling` | Adds/modifies `Result`, `Option`, `?`, `.unwrap()`, `.expect()`, `anyhow!`, `thiserror`, or error-enum variants | Validates error taxonomy, propagation, lost context (e.g. `anyhow!("{err}")` flattening), panic-vs-Result choice. |
-| `m07-concurrency` | Adds `tokio::`, `async fn`, `.await`, `tokio::join!` / `try_join!`, `tokio::spawn`, `tokio::time::timeout`, `Mutex`, `RwLock`, `Arc<...>` in shared state | Validates timeout scoping, join-vs-try_join, deadlock/lock-contention, task cancellation semantics, Send/Sync bounds. |
-| `m04-zero-cost` | Adds new generics, `impl Trait`, `dyn Trait`, trait objects, `Box<dyn ...>` | Validates static-vs-dynamic dispatch choice, unnecessary allocation, trait-object safety, monomorphization cost on a workspace this large. |
-| `m05-type-driven` | Adds newtypes, `PhantomData`, marker traits, builder patterns, type-state | Validates "make invalid states unrepresentable" and whether the type design actually narrows the state space. |
-| `m15-anti-pattern` | Any non-trivial new code | Sanity pass for common Rust anti-patterns. Cheap; run it. |
-| `m10-performance` | Changes to hot paths (auction loop, settlement submission, per-order handlers, native price estimation) | Validates allocations, caching, loop invariants, lock granularity. |
-| `unsafe-checker` | Any `unsafe` block, FFI (`extern`), `transmute`, raw pointers, `MaybeUninit` | **Mandatory** — any finding here defaults to **High**. Soundness issues are never Small. |
-| `rust-trait-explorer` | Adds a new trait or a new impl of an existing trait | Maps the trait's existing impls — catches "you added a default method to a trait with 12 impls, one of them should override it". |
-| `rust-call-graph` / `rust-code-navigator` | Modified function signatures on cross-crate public APIs | Catches missed caller sites, breaking changes, downstream blast radius. |
-| `ra-qm-skills` | Soft prereq | QM checklists — supplementary if installed. |
-
-**Rule of thumb:** If a skill's trigger keywords appear in the diff's **added** lines, activate it. Don't run skills on context lines (unchanged code around the diff) — that wastes tokens on things you're not actually reviewing.
+```bash
+git blame -L <start>,<end> -- <file>            # who/what/when
+git log --format='%H %s' -n 1 <commit>          # commit message
+gh pr view <#> -R cowprotocol/services          # if commit message links a PR
+```
+
+If the originating commit message or PR explains *why* the code is shaped that way, factor that into your finding. A "this looks accidental" comment is much weaker when blame shows a deliberate fix from six months ago. Mention what blame revealed in the finding's Explanation so the reviewer can defend the point.
 
 ### Severity Rubric
 
-| Severity | Meaning | Example |
-|---|---|---|
-| **High** | Merging as-is is a real risk: correctness bug, data loss, security issue, incompatible DB migration, auction/settlement invariant broken, likely panic, unsound `unsafe`. | `.unwrap()` on a solver response path; SQL migration that rewrites a multi-million-row table without `CONCURRENTLY`. |
-| **Medium** | Worth fixing before merge — won't break prod but will cost later. Missing error context, test gap on a new invariant, public API ergonomics, missing doc-comment on a cross-crate `pub`, unhandled edge case. | New `pub fn` in `shared` with no doc-comment; `?` swallowing an error without context. |
-| **Small / QoL** | Would genuinely improve the code. **Not a nit.** | `Vec` could be `impl Iterator` in a hot path; duplicated 3-line block could be a helper. |
+| Severity | Meaning |
+|---|---|
+| **High** | Merging as-is is a real risk: correctness bug, data loss, security issue, incompatible DB migration, auction/settlement invariant broken, likely panic, unsound `unsafe`. |
+| **Medium** | Worth fixing before merge — won't break prod but will cost later. Missing error context, public-API ergonomics, unhandled edge case, n-1 rollout incompatibility, undisclosed breaking API change. |
+| **Small / QoL** | Would genuinely improve the code. **Not a nit.** |
 
 ### Anti-nit Rule (mandatory)
 
-- If the only reason to change it is personal taste or stylistic preference, **do not report it**.
-- Formatting findings belong to `rustfmt` / CI, not to this skill. Never surface a finding whose fix is "run `cargo +nightly fmt`".
-- Clippy lints are a CI concern by default. **Exception:** a clippy warning inside the new code may be reported as **Small** if and only if it improves correctness or clarity — never for style.
-- If you're uncertain whether something is a nit, omit it. LGTM when clean.
-- **Don't inflate severity.** The severity of a finding is what a senior reviewer would actually call it in GitHub, not what sounds safer. Most substantive comments are just unmarked; only the few purely cosmetic ones carry a `nit:` prefix in practice. Don't tier-down everything to "Small" to avoid confrontation — that dilutes the signal. Either it's a Medium worth discussing or it's omitted.
-
-### Reviewer Discipline — Heuristics Beyond "Is This Bug"
-
-A senior reviewer catches things that aren't bugs. They shape the code for future readers and future maintainers. These are the patterns a good CoW review surfaces:
-
-1. **Motivation before mechanism.** Before reviewing *how* the code works, verify *why* it exists. If the PR description doesn't justify the change, or justifies it with an assumption ("EIP-4626 tokens will have coverage problems"), ask the author to confirm the assumption — as a `Question:`, not a demand. You can't judge tradeoffs against a motivation you don't understand.
-
-2. **Root cause over workaround.** If a new file contains logic that feels like it exists to cancel out a wart somewhere else ("insert WETH into non_vault_tokens", "treat this special case differently"), investigate the wart. Fixing the root cause upstream is usually cleaner than adding a compensating wart downstream. Ask the author: *"this logic looks like it's working around X — could X be fixed instead?"*
-
-3. **Suggest simpler primitives; listen to pushback.** If code uses `join!` where `try_join!` would work, ask. If it uses `Mutex<HashSet<_>>` where `DashSet` would work, ask. The author often has a reason (as in the eip4626.rs case where `join!` is deliberate because the branch logic depends on both results), and the back-and-forth resolves the question. Framing: *"Can this be `<simpler alternative>`?"*, not *"Use `<alternative>`"*.
-
-4. **Return-type consistency over side-effects at distance.** A function returning `Result<Option<(T, U)>, Error>` explicitly encodes its three states (value / no-value / error). A function that writes to a cache two layers down the call stack as a side effect hides the same information. Prefer explicit return types. The cost of a verbose type signature is always smaller than the cost of a reviewer (or future you) missing a hidden mutation.
-
-5. **Top-to-bottom readability.** Within a file, order functions in the direction callers-before-callees or high-level-before-low-level (pick one and keep it consistent). When reviewing, if you find yourself scrolling back and forth between `fn do_thing` and `fn helper_for_thing`, flag the ordering — *not as a nit, as a Medium if it's non-trivial navigation*.
+- If the only reason to change it is taste or stylistic preference, **do not report it**.
+- Formatting belongs to `rustfmt` / CI. Never raise a finding whose fix is "run `cargo +nightly fmt`".
+- Clippy lints are a CI concern by default. **Exception:** a clippy warning inside the new code may be reported as **Small** if and only if it improves correctness or clarity — never style.
+- If you're uncertain whether something is a nit, omit it. LGTM is the right verdict when the PR is clean.
+- **Don't inflate severity** to look thorough. Each finding's severity is what a senior reviewer would actually call it on GitHub. Either it's worth discussing or it's omitted.
 
-6. **Stale comments.** When behavior changes, comments describing that behavior often rot. Read the comments *against* the code around them and flag any drift. A comment saying "zero timeout signals best-effort" attached to code that no longer has a zero-timeout path is a Medium finding — future readers will believe it.
-
-7. **Description-vs-code mismatches.** If the PR body describes `Mutex<HashSet>` and the code has `DashSet`, the description is stale. Flag it — not because it changes the code, but because whoever reads the PR as history will be confused. Small finding, usually.
-
-   **Related: design spec referenced but not committed.** If the PR body mentions a design spec (e.g. `docs/superpowers/specs/...`) that isn't in the PR's diff, that's a **Small** finding asking the author to commit it — teammates reading the PR six months from now can't see the rationale otherwise.
-
-8. **Bundled orthogonal changes.** When a PR contains a change that isn't clearly required by the main feature, ask: does this belong in its own PR? Sometimes the answer is "it's required, here's why" (fine — ask for the reason to be in the commit message or a code comment). Sometimes the answer is "you're right, let me split it" (better git history).
-
-9. **Error taxonomies in web3 code.** Blockchain RPC errors have multiple categories: contract reverts, RPC transport failures, provider-side rate-limiting, timeout errors, decoding failures. Each should lead to *different* handling. If new code treats `Err(_)` as a single bucket when the branches should differ (e.g. cache a non-vault verdict on contract revert, but *not* on a network error), this is a correctness issue — Medium or High depending on whether the wrong classification can poison downstream behavior.
-
-10. **Generated-code and lockfile diffs.** Never include generated-bindings or `Cargo.lock` findings. CI's type-check validates the generated code; if CI is green, there's nothing a reviewer will see that the compiler won't. Filter these at the entry-point (see `.claude/commands/review-pr.md` §6) and report only on human-written changes.
-
-### Using GitHub "Suggested change" Blocks in the Output
-
-When a finding has a clear mechanical fix (e.g. remove a `.clone()`, swap a type), phrase the `Action:` line so the user can paste it straight into a GitHub *Suggested change* block. Example:
-
-```
-Action: Replace `self.provider.clone()` with `&self.provider` — the alloy
-        Instance constructors take `IntoProvider` which is implemented for
-        both owned and borrowed providers. (GitHub suggested-change:
-        `let vault = IERC4626::new(token, &self.provider);`)
-```
-
-The bracketed suggestion is copy-pasteable into GitHub's suggestion feature, which makes the author's acceptance a single click.
-
-### Per-finding Shape
-
-Every finding contains exactly these four parts:
+### Per-finding shape
 
 1. **Title** — short noun phrase (≤ 8 words).
-2. **Location** — `path/to/file.ext:line` (or `path/to/file.ext:start-end` for a range).
-3. **Explanation** — enough that the reviewer understands *and can defend* the point without re-reading the diff. Must include:
-   - **The mechanism** (what's wrong).
-   - **The impact** (why it matters for CoW services specifically — auction, settlement, solver competition, DB migration, etc.).
-   - **Repo-specific context** where it helps (e.g. "this path runs per-auction, ~12s cadence, so an extra RPC is costly").
-4. **Action OR Question** — exactly one, never both:
-   - `Action: <concrete task the author should do>`
-   - `Question: <specific clarification needed before the reviewer can decide>`
+2. **Location** — `path/to/file.ext:line` or `path:start-end`.
+3. **Explanation** — mechanism, impact, and (if relevant) what `git blame` / the codemap revealed.
+4. **`Action:` OR `Question:`** — exactly one.
+
+When a finding has a clear mechanical fix, phrase the `Action:` so the reviewer can paste it as a GitHub *Suggested change* block.
 
 ---
 
-## 6. Report Template
+## 7. Report Templates
 
-Print the report in this exact shape. Omit sections that don't apply (e.g. no `VERIFICATION` block if the user declined).
+### Terminal form (`diff` and `pr-local`)
 
 ```
 ═══════════════════════════════════════════════════════════
-PR #<N> — <title>
+PR #<N> — <title>                       (or "Diff review — <branch>" in diff mode)
 ═══════════════════════════════════════════════════════════
-Author:       @<author>
-Scope:        +<additions> −<deletions> across <N> files
-              (include a "(~X LOC human-written; rest generated/lockfile,
-              filtered)" suffix when the filter materially changed the count)
-Labels:       <labels, comma-separated; or "—">
-Base/Head:    <baseRef> ← <headRef>
-Linked issue: #<N> — <issue title>            (omit line if none)
-Loaded context: <comma-separated loaded_context list>
-Activated skills: <list of Rust skills fired by the skill router, e.g.
-                  m07-concurrency, m06-error-handling>
-Mode:         full checkout   |   degraded static-diff
+Author:       @<author>                 (omit in diff mode)
+Scope:        +<add> −<del> across <N> files
+              (~X LOC human-written; rest generated/lockfile, filtered)
+Labels:       <labels or "—">           (omit in diff mode)
+Base/Head:    <baseRef> ← <headRef>     (or "main ← <branch>" in diff mode)
+Linked issue: #<N> — <title>            (omit if none)
+Mode:         diff | pr-local | pr-ci   ;  full checkout | degraded static-diff
 
 Codemap
 ───────────────────────────────────────────────────────────
-<concise codemap from §3.5 — new symbols, modified signatures, caller counts.
- Omit when the diff is trivial enough that §3.5 was skipped.>
+<from §4 — omit if skipped>
 
 ───────────────────────────────────────────────────────────
 CONTEXT
 ───────────────────────────────────────────────────────────
-<synthesis from §4>
+<synthesis from §5>
 
 ───────────────────────────────────────────────────────────
 VERDICT:  <LGTM | Changes requested | Needs clarification>
@@ -309,7 +259,7 @@ VERDICT:  <LGTM | Changes requested | Needs clarification>
 FINDINGS
   [High]    <count>
   [Medium]  <count>
-  [Small]   <count>    (QoL-only; true nits omitted)
+  [Small]   <count>    (QoL-only; nits omitted)
 
 ───── High ─────────────────────────────────────────────────
 1. <title>
@@ -334,33 +284,45 @@ cargo clippy           <status>
 cargo +nightly fmt     <status>
 
 ───────────────────────────────────────────────────────────
-NEXT STEPS
+NEXT STEPS                              (pr-local only)
 ───────────────────────────────────────────────────────────
 Currently on:  <current branch>
 Return with:   git switch <prior_branch>
 ```
 
-### LGTM short form
+#### LGTM short form
 
-When there are **zero** findings (High, Medium, and Small all zero), collapse everything between `VERDICT:` and `NEXT STEPS` to a single line:
+When there are zero findings at all severities, collapse everything between `VERDICT:` and `NEXT STEPS` to a single line:
 
 ```
 VERDICT:  LGTM — no blocking or notable issues.
 ```
 
-The header, CONTEXT, and NEXT STEPS sections still print.
+Header, CONTEXT, and (in `pr-local`) NEXT STEPS still print.
+
+### Comment form (`pr-ci`)
+
+In CI, post **one** review comment via the action. Use the same body shape as the terminal form, minus:
+
+- The NEXT STEPS section (no local branch state to print).
+- The VERIFICATION block (CI runs its own checks separately).
+- ANSI box-drawing characters (Markdown headings instead).
+
+GitHub-render the body using `##`/`###` headings and fenced code blocks. Keep findings collapsible with `<details>` if there are more than ~5.
 
 ### Verdict selection
 
-- **LGTM** — no High or Medium findings; zero or a handful of Small findings that the reviewer may still choose to post.
-- **Changes requested** — any High findings, or Medium findings that block safety/correctness.
-- **Needs clarification** — no High findings, but one or more Medium/Small findings use the `Question:` form because the reviewer can't decide without author input.
+- **LGTM** — no High or Medium findings.
+- **Changes requested** — any High, or Medium that blocks safety/correctness.
+- **Needs clarification** — no High, but one or more findings use the `Question:` form.
 
 ---
 
-## 7. Verification Offer
+## 8. Verification Offer
 
-After printing the report, ask the user inline:
+*(Local modes only. Skip in `pr-ci` — CI runs its own check/clippy/fmt jobs in parallel.)*
+
+After printing the report, ask the user:
 
 > "Run local verification in the background? Reply with one of:
 > `check` — `cargo check --locked --workspace --all-features --all-targets`
@@ -369,90 +331,44 @@ After printing the report, ask the user inline:
 > `all` — run all three
 > `skip` — don't run anything"
 
-On user reply:
-
-- **`skip`** → omit the VERIFICATION block entirely. Proceed to [§8](#8-cleanup).
-- **`check` / `clippy` / `fmt` / `all`** → dispatch each selected command as a **background** Bash invocation (`run_in_background: true`). Do not sleep, do not poll — the runtime sends a completion notification. On completion, call `BashOutput` to retrieve the result and append it to the VERIFICATION block:
-  ```
-  cargo check            ✅  clean
-  cargo clippy           ⚠   2 warnings (details below)
-  cargo +nightly fmt     ✅  clean
-  ```
-
-### When verification output produces findings
+Dispatch each selected command as a **background** Bash invocation. On completion, append the result to a VERIFICATION block.
 
-- If `cargo check` or `cargo clippy` surfaces errors/warnings **inside files changed by this PR**, add them to the Findings list:
-  - Compile errors → **High** (the PR doesn't build).
-  - Clippy warnings that flag a correctness issue → **Medium**.
-  - Clippy warnings that flag clarity (e.g. `needless_return` inside new code) → **Small** — but only when they pass the [§5](#5-review-and-severity) Anti-nit rule.
-- Do **not** surface warnings from files the PR did not modify.
-- `cargo +nightly fmt -- --check` failures are **not** findings. The Anti-nit rule forbids surfacing format findings. The VERIFICATION block reports the status; that's where it ends.
+If `cargo check` or `cargo clippy` surfaces issues **inside files this PR modified**, fold them into Findings (compile errors → High, correctness clippy → Medium, clarity clippy → Small subject to the Anti-nit rule). Do not surface warnings from files the PR didn't touch. `fmt --check` failures are a status, never a finding.
 
-### Never run tests by default
-
-The menu deliberately omits `cargo nextest run`. Services' test suite is large and takes minutes — gating every review on a full test run is too expensive. If the reviewer wants tests, they run them outside the skill.
+Tests are intentionally not in the menu — services' suite is too long-running to gate every review on.
 
 ---
 
-## 8. Cleanup
-
-The NEXT STEPS footer (already part of the report template in [§6](#6-report-template)) names the current branch and the exact command to return to the prior branch:
+## 9. Cleanup
 
-```
-Currently on:  <current branch>
-Return with:   git switch <prior_branch>
-```
+*(`pr-local` only.)*
 
-**Do not run `git switch` yourself.** The user may want to stay on the PR branch to continue investigating, run their own tests, or browse related code. The command is a hint, not an action.
+The NEXT STEPS footer names the current branch and the `git switch` to return to. Print it; never run it. The user may want to stay on the PR branch.
 
 ---
 
-## 9. Error Playbook
+## 10. Error Playbook
 
-| Condition | Behavior |
+| Condition | Behaviour |
 |---|---|
-| `gh` not installed | Print `Install gh: https://cli.github.com/`, abort. |
+| `gh` not installed (pr modes) | Print `Install gh: https://cli.github.com/`, abort. |
 | `gh` not authenticated | Print `gh auth status` output + `Run: gh auth login`, abort. |
-| PR number not parseable | Print usage string (see `.claude/commands/review-pr.md` step 1), abort. |
+| PR number not parseable | Print usage, abort. |
 | PR doesn't exist / wrong repo | Surface `gh` error verbatim, abort. |
-| PR is closed or merged | Prepend state warning to the report ([§2](#2-metadata-fetch)), proceed. |
-| PR is a draft | Prepend draft warning to the report ([§2](#2-metadata-fetch)), proceed. |
-| Working tree dirty | Print `git status --porcelain`, instruct `git stash` or commit, abort — **no auto-stash**. |
-| `git pull --rebase origin main` conflict | Abort, print conflicted files, instruct manual resolution. |
-| `gh pr checkout` fails (fork permission) | Degrade to static-diff mode, flag in report header `Mode:` field. |
-| Hard prereq skill missing | Print install command, abort. Handled in the entry-point prologue, not here. |
-| Soft prereq skill missing | Print install-command banner, continue. Handled in the entry-point prologue, not here. |
-| Verification offer declined (`skip`) | Omit VERIFICATION block. |
-| Verification command fails or warns | Surface output in VERIFICATION block; add findings only for issues inside changed files. |
-
-**Rule of thumb:** never silently degrade behavior without the `Mode:` header reflecting it. If something went sideways, the reviewer should know from the report itself.
+| PR closed/merged | Prepend warning, proceed. |
+| PR is draft | Prepend warning, proceed. |
+| Working tree dirty (pr-local) | Print `git status --porcelain`, instruct stash/commit, abort. **No auto-stash.** |
+| `gh pr checkout` fails (fork permission) | Degrade to static-diff mode, flag in report header. |
+| Optional skill not installed | Continue using `rg` / general Rust knowledge. Do **not** abort. |
+| `diff` mode but no diff (branch == main) | Print `No diff vs main — nothing to review.`, exit clean. |
+| Verification command fails | Surface output in VERIFICATION block; raise findings only for issues inside changed files. |
 
----
-
-## 10. Code-vs-docs Discipline (Always Apply)
-
-When a finding rests on a claim about the codebase, verify the claim by looking at the code — not by trusting a doc, a comment, or this skill's own sibling files.
-
-- **Claim:** *"`OrderSimulator::encode_order` only reads `OrderData` and `Interactions`."*
-  **Wrong:** cite a doc comment.
-  **Right:** `rg 'fn encode_order' crates/orderbook/src/` → read the function body → verify.
-
-- **Claim:** *"All `OrderValidator::new` call sites have been updated."*
-  **Wrong:** count the test assertions in the diff.
-  **Right:** `rg 'OrderValidator::new\b' crates/` → compare the count to the diff's modified lines.
-
-- **Claim:** *"This module is only used by X."*
-  **Wrong:** trust the module's top-level comment.
-  **Right:** `find_referencing_symbols` on the public exports → see actual call graph.
-
-Docs age. Comments lie. Grep and LSP don't. When reporting a finding that depends on such a claim, verify *before* you write the finding — not after the author pushes back.
+**Rule of thumb:** never silently degrade. If a tool was missing or a step was skipped, the report's Mode/Header line should reflect it.
 
 ---
 
-## Maintenance Notes
-
-- **When you find yourself adding a project-specific heuristic more than twice**, move it into a sibling context doc under `docs/review-context/` and add a trigger rule to [§3](#3-classification).
-- **When you find the skill making a false-positive finding**, add a counter-example to the Anti-nit rule section in [§5](#5-review-and-severity). The rubric calibrates over time.
-- **When CoW introduces a new subsystem with its own review considerations** (e.g. a new crate, new auction policy, new settlement path), create a sibling doc for it and wire in a trigger.
+## 11. Maintenance Notes
 
-This skill is expected to grow — both `docs/review-context/*.md` and this reference's rubric will accumulate CoW-specific knowledge over time.
+- When the AI consistently misses a CoW-specific concern across multiple reviews, first try expressing it as one more bullet in [Universal Guardrails](#universal-guardrails). Only carve a sibling doc if it can't be expressed generically.
+- When the skill produces a false-positive finding, add a one-line counter-example to the [Anti-nit Rule](#anti-nit-rule-mandatory).
+- Keep this document under 500 lines.
diff --git a/docs/review-context/alloy-rs.md b/docs/review-context/alloy-rs.md
deleted file mode 100644
index 24fba96dc1..0000000000
--- a/docs/review-context/alloy-rs.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# Review Context — Alloy-rs Usage
-
-Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches `crates/ethrpc/`, `crates/chain/`, `crates/contracts/`, or adds `alloy::*` / `alloy_*` imports.
-
-**External reference:** https://alloy.rs/introduction/prompting (AI-optimized alloy guide). Fetch it once per session when actually reviewing alloy changes.
-
-## High-signal review checks
-
-1. **Provider reuse — and pass by reference.** Alloy's `Provider` is clonable and meant to be cloned, not rebuilt. Any new `ProviderBuilder::new().connect_http(...)` in a hot path (auction loop, per-order path, per-request handler) is a **Medium** finding — it should reuse an existing provider.
-
-   Additionally, alloy's contract `Instance` constructors accept `IntoProvider`, which is implemented for both owned providers *and references*. Pass by reference when the caller retains ownership:
-   ```rust
-   // ❌ unnecessary clone
-   let vault = IERC4626::new(token, self.provider.clone());
-   // ✅ pass by reference
-   let vault = IERC4626::new(token, &self.provider);
-   ```
-   Gratuitous `.clone()` on providers is a **Small (QoL)** finding — and a good candidate for a GitHub `Suggested change` block (mechanical fix, zero-risk).
-
-2. **Batched RPC.** CoW's `ethrpc` crate adds batching on top of alloy. New direct RPC calls that bypass the batching layer can hit rate limits under auction load. Flag any direct `provider.get_*` calls in hot paths that aren't going through `ethrpc::Web3` — **Medium**.
-
-3. **Distinguish contract-revert from transport/network error.** This is the single most CoW-specific alloy gotcha. When code runs a `call()` against an arbitrary on-chain contract and interprets the `Err`, the error has categories:
-
-   - **Contract revert** (valid ERC-20 returned an error, `asset()` not implemented, etc.) → deterministic. Safe to cache "this token is X".
-   - **Network / transport error** (RPC timeout, 5xx from the node, connection reset) → transient. **Must not be cached** — a transient failure at cache-write time would permanently mis-classify the token until restart.
-   - **Decoding error** (contract returned bytes that didn't decode into the expected type) → treat as contract-level; usually means the contract at that address doesn't implement the interface.
-
-   The `ethrpc::alloy::errors::ContractErrorExt` trait exposes `.is_contract_error()` for this split. Whenever a diff has `Err(_)` in a `call()` result's match arm and then *caches* a verdict, verify that the match is gated on `is_contract_error()`. This is a **High** finding if missing — a single network blip could persistently mark a valid vault as non-vault.
-
-3. **Decoding errors.** Alloy's `sol!` macro emits `Result` types that should be surfaced with context, not `.unwrap()`. An `.unwrap()` on `decode()` output in a code path that handles arbitrary on-chain data is **High** — malformed returns from an uncooperative contract should not panic the auction loop.
-
-4. **Gas estimation.** Avoid `provider.estimate_gas` inside the settlement path. Gas estimation hits the node and can be slow or flaky. Use the precomputed gas values from the solver solution instead. Inline gas estimation in the critical settlement path → **High**; outside it → **Medium** with a suggestion to cache.
-
-5. **Block number or tag.** When reading chain state, verify the block tag is correct for the context:
-   - Auction-time reads → the auction's block (explicit tag, not `latest`).
-   - Simulation → `latest` or `pending` depending on purpose.
-   - Settlement submission → varies by private-RPC strategy; follow existing patterns in the same file.
-
-   A blind `BlockNumberOrTag::Latest` in code that should use the auction block is **High** (inconsistent with the auction's view of the world → score mismatches).
-
-6. **Ethers-rs migration artifacts.** Services is mid-migration from `ethers-rs` to `alloy`. Mixing primitive types across the boundary (e.g. `ethers::H160` passed to a function expecting `alloy::primitives::Address`) is **Medium** — each layer should have a single primitive type. Look for ad-hoc conversions that hint at a leak across the boundary.
-
-7. **Primitives vs. types.** `alloy::primitives::U256` vs. `alloy::sol_types::sol!`-generated types — verify arithmetic is done on the primitives and the sol! types are used only at contract boundaries. Arithmetic on sol! types inside business logic is **Small** at minimum, **Medium** if it obscures overflow handling.
-
-8. **`join!` vs `try_join!` with branch logic.** When code needs the result of *both* futures before deciding what the outcome means (classic pattern: `asset()` + `decimals()` on an unknown token, where the *combination* of successes/failures carries meaning), `tokio::join!` is correct even though `try_join!` looks shorter. `try_join!` short-circuits on the first error, collapsing four outcomes into two — which is wrong for classification code.
-
-   When reviewing: if you see `tokio::join!` where `try_join!` would be briefer, check whether the code's match arms depend on *which combination* of results errored. If yes, it's correct — don't suggest `try_join!`. If no, suggest `try_join!` as a simplification, framed as a question (*"Can this use `try_join!`?"*).
-
-9. **Generated `sol!` bindings vs inline `alloy::sol!`.** CoW prefers generated contract bindings (via the `contracts` crate) over inline `alloy::sol! { #[sol(rpc)] interface X { ... } }` blocks in business code. Generated bindings avoid per-crate compile cost and let `Cmd+Click` jump to the expanded source. An inline `sol!` macro in a non-test business file is **Small** with an Action to move it to the generated-bindings path. (Test code is a softer rule — inline for quick tests is fine.)
-
-## Not findings
-
-- Style choices in how `sol!` is invoked (macro-vs-file).
-- Which alloy sub-crate was imported directly vs. through `alloy::*` re-exports.
-- Import ordering, `use` grouping.
-- Whether to use `Address::ZERO` vs. `alloy::primitives::address!("0x...")` for the null address (both are fine).
-
-These are taste / `rustfmt` concerns. The Anti-nit rule in `COW_PR_REVIEW_SKILL.md §5` forbids surfacing them.
diff --git a/docs/review-context/database-migrations.md b/docs/review-context/database-migrations.md
index 531dc22ad1..8dfb155ccc 100644
--- a/docs/review-context/database-migrations.md
+++ b/docs/review-context/database-migrations.md
@@ -2,47 +2,51 @@
 
 Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches `database/sql/**`.
 
-This file extends — does not replace — the reminder in `.github/nitpicks.yml`. The nitpick bot posts the generic DB warning automatically; the reviewer applies judgment using the checks below.
+This file extends — does not replace — the reminder in `.github/nitpicks.yml`. The nitpick bot posts the generic warning automatically; the reviewer applies judgment using the checks below.
 
-## Non-negotiables (all **High** until addressed)
+The checks are dictated by *how* we roll out — not by SQL itself.
 
-1. **Reversibility.** The migration must state whether it's reversible. If yes, the rollback script is present or explicitly linked in the PR description. If no, the PR explains *why* irreversibility is acceptable for this change. Missing either → **High**.
+## Rollout shape
 
-2. **Compatibility window.** During rollout, k8s starts the new autopilot, runs Flyway, and only then shuts down the old pod. That overlap means the **previous** version must still be able to process requests on the **new** schema. Any migration that breaks the old schema path (e.g. drops a column the old code reads) → **High** until the PR lays out an n-1 compatibility plan, typically:
-   - Release A: make the change compatible (add new column, keep old).
-   - Release B: ship code that uses the new column.
-   - Release C: drop the old column.
+Two facts shape every check below:
 
-3. **Blocking index creation on hot tables.** Tables in the critical path — `orders`, `trades`, `auctions`, `settlements`, `order_events`, `auction_orders`, `quotes`, `order_quotes` — must use `CREATE INDEX CONCURRENTLY`. A blocking `CREATE INDEX` on one of these → **High**.
+1. **k8s rolls pods, not the cluster.** During a release, the new autopilot starts and runs Flyway *before* the old pod is shut down. For a non-trivial overlap window the previous version is processing requests against the new schema. Anything that breaks that path breaks production.
+2. **Staging and production are independent and roll out at different times.** A migration lands in staging first; production follows on its own cadence. While the gap is open, our shadow environment can be running an *older* code version against the *newer* schema (or vice versa, briefly). Any change that breaks that combination breaks shadow without breaking the obvious path the author tested.
 
-4. **Table-list update.** New tables must be added to the authoritative list at `crates/database/src/lib.rs:51-87`. Missing → **Medium** (not High because a CI check may also catch it, but it's still the author's responsibility).
+Both points generalise beyond DB — config schema changes, request/response formats, and message-queue payloads have the same n-1 / staging-vs-production caveat. When a PR touches any of those *and* the DB, mention the cross-cutting risk in the synthesis.
 
-5. **DB README update.** Schema changes must update the DB README (`crates/database/README.md` or the SQL folder's README — whichever is the convention at review time). Missing → **Medium**.
+## Non-negotiables (default **High** until addressed)
 
-## Usually worth flagging (**High**)
+1. **Reversibility.** State whether the migration is reversible. If yes, include or link the rollback script. If no, the PR explains *why* irreversibility is acceptable. Missing either → **High**.
+2. **n-1 schema compatibility.** The previous app version must still function against the new schema. A migration that drops or renames a column, narrows a type, or adds a `NOT NULL` constraint without code already tolerating both shapes → **High** until the rollout plan is spelled out (typically: ship change in three releases — add → migrate code → drop).
+3. **Blocking index creation on hot tables.** Use `CREATE INDEX CONCURRENTLY` on anything in the auction/settlement critical path (`orders`, `trades`, `auctions`, `settlements`, `order_events`, `auction_orders`, `quotes`, `order_quotes`). A blocking `CREATE INDEX` on one of these → **High**.
+4. **Authoritative table list.** New tables must appear in `crates/database/src/lib.rs` (search for the top-level table list). Missing → **Medium** (CI may also catch it; still the author's responsibility).
+5. **README.** Schema changes update the SQL or DB README, whichever is the convention at review time. Missing → **Medium**.
 
-- `ALTER TABLE ... ADD COLUMN NOT NULL` on a multi-million-row table **without** a default → table lock + slow backfill → **High**. Remedy: add as nullable, backfill in a batched job, then add `NOT NULL` in a later migration.
-- Dropping a column still referenced in code the old pod runs → **High** (see §2 compatibility window).
-- Renaming a column in a single migration (rather than add-new + migrate + drop-old across three releases) → **High**.
-- Adding a `UNIQUE` constraint without verifying current data is unique → **High** (migration will fail in prod if duplicates exist).
-- Changing a column's type in place (e.g. `VARCHAR(32)` → `VARCHAR(64)`) on a large table without a plan for the rewrite cost → **Medium** to **High** depending on table size.
+## Other shapes that usually warrant **High**
 
-## Usually worth flagging (**Medium**)
+- `ALTER TABLE ... ADD COLUMN NOT NULL` on a multi-million-row table without a default — table lock plus slow backfill. Remedy: add nullable, batched backfill, then `NOT NULL` in a later migration.
+- Renaming a column in a single migration rather than the three-release add → migrate → drop pattern.
+- Adding a `UNIQUE` constraint without first verifying current data is unique (the migration will fail in prod if duplicates exist).
+- Type changes in place on a large table without a plan for the rewrite cost.
 
-- Migration scripts without comments explaining the "why" when the "what" is non-obvious.
-- Introducing a foreign key without `ON DELETE` behavior being specified (the default is `NO ACTION`, which can be surprising).
-- New tables without indexes on columns that obvious queries will filter on.
+## Usually worth flagging as **Medium**
+
+- Migration scripts without comments when the *what* is non-obvious.
+- New foreign keys without an explicit `ON DELETE` clause (default `NO ACTION` is often surprising).
+- New tables without indexes on the columns obvious queries will filter on.
 
 ## Not findings
 
-- SQL style: trailing commas, keyword casing, indentation.
-- Whether the migration could have been one statement instead of three — if it's correct, it's correct.
-- Whether the migration file should have been named differently (Flyway naming convention is enforced mechanically).
+- SQL style — trailing commas, keyword casing, indentation.
+- Whether the migration could be one statement instead of three. If correct, accept it.
+- Migration filename style — Flyway naming is enforced mechanically.
 
 ## Questions worth asking the author
 
-When reviewing, these often deserve a `Question:` in a finding:
+When in doubt, prefer a `Question:` over a flagged `Action:`:
 
-- "What's the expected row count in `<table>` in production at rollout time?" (drives the need for `CONCURRENTLY`, batched backfill).
-- "Which release are we pairing this migration with on the code side?" (drives the n-1 compatibility reasoning).
-- "Have you tested the rollback script against a production-sized dataset?" (if one is included).
+- *"What's the expected row count in `<table>` at rollout time?"* — drives `CONCURRENTLY` and batched-backfill decisions.
+- *"Which release pairs this migration with the code change?"* — makes the n-1 reasoning explicit.
+- *"How does this look on shadow during the staging→production gap?"* — surfaces cross-environment compatibility before the author finds out by paging.
+- *"Has the rollback script been tested against a production-sized dataset?"* — only relevant if a rollback was included.
diff --git a/docs/review-context/openapi.md b/docs/review-context/openapi.md
deleted file mode 100644
index 2f3d7fbfe7..0000000000
--- a/docs/review-context/openapi.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# Review Context — OpenAPI Spec Changes
-
-Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches any `**/openapi.yml`.
-
-This file extends — does not replace — the reminder in `.github/nitpicks.yml` about breaking-change communication.
-
-## Classify the change first
-
-Before reviewing, bucket the diff into:
-
-- **Breaking** — any of:
-  - Removing an endpoint, path, or response field.
-  - Renaming a field, path parameter, or query parameter.
-  - Changing a field's type (string → number, array → object, enum widening into a non-enum, etc.).
-  - Making a previously optional field required.
-  - **Narrowing** an enum's value set (removing a value). Adding enum values is non-breaking in most clients but may still surprise strict parsers — see below.
-  - Changing auth requirements on an endpoint.
-  - Changing an endpoint's HTTP method or path.
-- **Non-breaking but noteworthy** — new required field on a request body; new endpoint; changed default; stricter validation; added enum value.
-- **Pure additive** — new response field (old clients ignore it), new optional request field with a server-side default.
-
-The bucket drives the severity ceiling.
-
-## Breaking changes
-
-If the PR contains any breaking change:
-
-1. **High** — unless the PR description explicitly:
-   - Calls out the break.
-   - Names the affected consumers (at minimum: **Frontend team** and **SAFE team**).
-   - Describes the versioning / migration path (rollout sequence, deprecation window, any shim endpoints).
-
-2. Always include a `Question:` asking whether the consumers have been notified. Even if the description says they have, a reviewer-visible "yes, confirmed" is worth forcing.
-
-3. If the break is irreversible (e.g. a response field's semantic meaning changes while the name stays the same), escalate to **High** even if communication looks clean — semantic breaks are the ones that silently corrupt client behavior.
-
-## Non-breaking but worth flagging
-
-- **New required field on a request body** → check whether a server-side default exists. If not, existing clients will 400 — effectively breaking. **High**.
-- **New endpoint without a mention in the PR description** → **Medium**. Clients discover endpoints via the spec; teammates should know a new one landed.
-- **Missing `description:` on a new field or endpoint** → **Small**.
-- **New enum value** on a field clients parse strictly → **Medium** with a `Question:` — many OpenAPI-generated clients fail hard on unknown enum values; worth checking whether the Frontend / SAFE generators are lenient.
-
-## Not findings
-
-- YAML style, key ordering, indentation.
-- Whether the change could have been split into multiple PRs — only worth flagging if the change genuinely mixes unrelated concerns.
-- Minor description rewording that doesn't change the contract.
-
-## Communication checklist (use in Action:)
-
-When the PR warrants a Changes-requested verdict for missing communication, the `Action:` can reference this list verbatim:
-
-- [ ] Breaking changes called out in the PR description.
-- [ ] Frontend team notified (link the Slack thread or issue).
-- [ ] SAFE team notified (link the Slack thread or issue).
-- [ ] Versioning/migration path documented — what rolls out when, what deprecates.
-- [ ] Any shim or compatibility endpoint planned; if yes, link its PR or issue.

From c84480d8fe9c33e6f1653eab5d83c5cfb1d89440 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Mon, 27 Apr 2026 06:36:22 +0530
Subject: [PATCH 04/14] refactor: update acc to relevant gemini review points

---
 .claude/commands/review-pr.md              |  2 +-
 docs/COW_PR_REVIEW_SKILL.md                | 13 +++++++++++--
 docs/review-context/database-migrations.md |  4 +++-
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
index ad811bcd93..b94f88f14c 100644
--- a/.claude/commands/review-pr.md
+++ b/.claude/commands/review-pr.md
@@ -15,7 +15,7 @@ The skill runs in one of three modes. Detection:
 - If the environment variable `$GITHUB_ACTIONS == "true"` → `mode = "pr-ci"`.
   - `$ARGUMENTS` MUST be a PR number, URL, or `owner/repo#N` form (the workflow passes it).
 - Else if `$ARGUMENTS` is non-empty → `mode = "pr-local"`. Parse the argument (see [step 2](#2-parse-arguments-pr-modes-only)).
-- Else (`$ARGUMENTS` empty, not in CI) → `mode = "diff"`. No `gh` calls needed; source is `git diff $(git merge-base HEAD main)..HEAD`.
+- Else (`$ARGUMENTS` empty, not in CI) → `mode = "diff"`. No `gh` calls needed; source is `git diff $(git merge-base HEAD origin/main)..HEAD` (the actual command runs in step 3 below, after fetching `origin/main`).
 
 ### 2. Parse $ARGUMENTS (PR modes only)
 
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index 179f444745..0a81bfa5ed 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -5,7 +5,7 @@ This document instructs Claude how to produce a PR review for cowprotocol/servic
 At the point this document is read, the entry-point has already determined:
 
 - **`mode`** — one of:
-  - `diff` — local, no PR yet. Source is `git diff $(git merge-base HEAD main)..HEAD` (the whole feature-branch worth of work). No PR metadata, no `gh` calls. Output to terminal.
+  - `diff` — local, no PR yet. Source is `git diff $(git merge-base HEAD origin/main)..HEAD` (the whole feature-branch worth of work, computed against the freshly-fetched remote `main` to avoid a stale local `main`). No PR metadata, no `gh` calls. Output to terminal.
   - `pr-local` — local, PR exists. Source is `gh pr diff <N>` plus PR metadata. Output to terminal.
   - `pr-ci` — running inside `.github/workflows/claude-code-review.yml`. Source is `gh pr diff <N>` plus PR metadata. Output is a single review comment posted to the PR.
 - **`<PR_NUMBER>`, `<owner>`, `<repo>`** (only in `pr-local` / `pr-ci`).
@@ -137,7 +137,16 @@ For each non-trivial symbol the diff adds, modifies, or deletes:
 | Skills from `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Optional (installed via `npx skills add actionbook/rust-skills`) | Richer cross-crate analysis. Not present in CI by default. |
 | `Read` of a full file | Last resort | Only when the diff hunks plus the cheaper tools don't pin down what you need. |
 
-**Fallback rule:** in CI (or any environment where the optional tools aren't installed), every codemap step still works with `rg` and `git`. The optional tools are accelerators, not requirements.
+**Fallback rule:** in CI and in `pr-local` (full checkout), every codemap step still works with `rg` and `git` against the local working tree. The optional tools are accelerators, not requirements.
+
+**Exception — degraded static-diff mode.** When `pr-local` falls back to static-diff (a fork the user can't `gh pr checkout`), the working tree is the *prior* branch, not the PR's. `rg crates/`, `Read` of repo files, and `git blame` on the working tree all reflect the wrong code. In this mode:
+
+- Take the diff content from `gh pr diff <N> --patch -R <owner>/<repo>` (already arranged in [`commands/review-pr.md` §4d](../.claude/commands/review-pr.md)).
+- Read individual files at the PR's head SHA via `gh api /repos/<owner>/<repo>/contents/<path>?ref=<head_sha>` instead of `Read`.
+- Skip `git blame`-based historic context for lines that only exist on the PR branch (the local repo doesn't know about them); blame against `origin/main` is still fine for *unchanged* surrounding code.
+- Caller-count searches via `rg` are usable only for symbols that exist on `origin/main`; new symbols introduced by the PR must be looked up via the patch and `gh api`.
+
+The report header's `Mode:` line already flags `degraded static-diff` — a reviewer reading the output knows which constraint applied.
 
 ### What the codemap produces
 
diff --git a/docs/review-context/database-migrations.md b/docs/review-context/database-migrations.md
index 8dfb155ccc..c12626356b 100644
--- a/docs/review-context/database-migrations.md
+++ b/docs/review-context/database-migrations.md
@@ -15,7 +15,9 @@ Two facts shape every check below:
 
 Both points generalise beyond DB — config schema changes, request/response formats, and message-queue payloads have the same n-1 / staging-vs-production caveat. When a PR touches any of those *and* the DB, mention the cross-cutting risk in the synthesis.
 
-## Non-negotiables (default **High** until addressed)
+## Non-negotiables
+
+Each item carries its own severity. Items 1–3 are correctness/availability concerns and default to **High**; items 4–5 are author hygiene and default to **Medium**.
 
 1. **Reversibility.** State whether the migration is reversible. If yes, include or link the rollback script. If no, the PR explains *why* irreversibility is acceptable. Missing either → **High**.
 2. **n-1 schema compatibility.** The previous app version must still function against the new schema. A migration that drops or renames a column, narrows a type, or adds a `NOT NULL` constraint without code already tolerating both shapes → **High** until the rollout plan is spelled out (typically: ship change in three releases — add → migrate code → drop).

From 2dc79ba04b15ed6fabc244863260be8615835ecd Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Mon, 27 Apr 2026 22:48:39 +0530
Subject: [PATCH 05/14] refactor: update optional accelerators in PR review
 documentation

---
 .claude/commands/review-pr.md | 2 +-
 docs/COW_PR_REVIEW_SKILL.md   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
index b94f88f14c..1777b3b353 100644
--- a/.claude/commands/review-pr.md
+++ b/.claude/commands/review-pr.md
@@ -98,7 +98,7 @@ In `pr-ci`, the workflow has already checked out the PR branch — skip 4d and j
 
 ### 5. Optional-tooling probe
 
-Detect which optional accelerators are available in the current session: Serena MCP (`mcp__plugin_serena_serena__*`), `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`), `ra-qm-skills`, the `m04`/`m06`/`m07`/`m15` modules, `unsafe-checker`.
+Detect which optional accelerators are available in the current session: Serena MCP (`mcp__plugin_serena_serena__*`) and `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`).
 
 Build a `loaded_context` list of whichever ones resolved. Pass it through to the reference doc; it prints verbatim in the report header's `Loaded context:` line.
 
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index 0a81bfa5ed..c1477648e1 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -195,7 +195,7 @@ Apply, in order:
 1. The [Universal Guardrails](#universal-guardrails).
 2. The conditional context from [§3](#3-conditional-context), if any was loaded.
 3. CoW-services conventions from `CLAUDE.md`.
-4. Optional skills from [§4 → tools table](#tools-cheapest-viable-option-first), activated by what the diff actually contains. If installed, invoke `m06-error-handling` for `Result`/`Option`/`?` changes, `m07-concurrency` for `tokio::*` / async / locking, `m04-zero-cost` for new generics or trait objects, `m15-anti-pattern` for general sanity, `unsafe-checker` for any `unsafe` (mandatory High). If they aren't installed, reason from general Rust knowledge plus the [Universal Guardrails](#universal-guardrails).
+4. Optional accelerators from [§4 → tools table](#tools-cheapest-viable-option-first) when available — Serena MCP for symbol lookups, `actionbook/rust-skills` for caller / trait / structural analysis. If they aren't installed, reason from `rg` + general Rust knowledge plus the [Universal Guardrails](#universal-guardrails).
 
 ### Use `git blame` for historic context
 

From 2598a28581306d7e4f2717bab894087024ca9c6c Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Mon, 27 Apr 2026 23:11:16 +0530
Subject: [PATCH 06/14] =?UTF-8?q?refactor:=20address=20Jose=20review=20?=
 =?UTF-8?q?=E2=80=94=20split=20gh=20api=20/=20git=20blame,=20drop=20clippy?=
 =?UTF-8?q?=20exception,=20tighten=20db-migrations=20sibling?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/COW_PR_REVIEW_SKILL.md                | 7 ++++---
 docs/review-context/database-migrations.md | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index c1477648e1..3065800f6c 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -132,7 +132,8 @@ For each non-trivial symbol the diff adds, modifies, or deletes:
 | Tool | Status | When to use |
 |---|---|---|
 | `Grep` / `rg` with `-n` on a symbol name | Always available | Caller counts and basic location lookups. Example: `rg 'OrderValidator::new\b' crates/`. |
-| `gh api` / `git blame` | Always available | Historic context on suspicious-looking lines. |
+| `git blame` / `git log` | Always available | Historic context on suspicious-looking lines. Local, read-only. |
+| `gh api` (read-only verbs) | Always available | Reading individual file contents at a specific ref (e.g. degraded static-diff mode in §4). **Never** use mutating verbs (`POST`/`PATCH`/`DELETE`) — review skill is read-only. |
 | `mcp__plugin_serena_serena__find_symbol` / `find_referencing_symbols` / `get_symbols_overview` | Optional (LSP-backed; available when Serena MCP is configured) | Precise location + kind + signature without reading the full file. |
 | Skills from `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Optional (installed via `npx skills add actionbook/rust-skills`) | Richer cross-crate analysis. Not present in CI by default. |
 | `Read` of a full file | Last resort | Only when the diff hunks plus the cheaper tools don't pin down what you need. |
@@ -221,7 +222,7 @@ If the originating commit message or PR explains *why* the code is shaped that w
 
 - If the only reason to change it is taste or stylistic preference, **do not report it**.
 - Formatting belongs to `rustfmt` / CI. Never raise a finding whose fix is "run `cargo +nightly fmt`".
-- Clippy lints are a CI concern by default. **Exception:** a clippy warning inside the new code may be reported as **Small** if and only if it improves correctness or clarity — never style.
+- Clippy lints are a CI concern. Don't surface them as findings — CI flags them automatically. If the new code reads poorly, raise a Small finding on the actual readability issue (Universal Guardrails #2/#3 cover most cases).
 - If you're uncertain whether something is a nit, omit it. LGTM is the right verdict when the PR is clean.
 - **Don't inflate severity** to look thorough. Each finding's severity is what a senior reviewer would actually call it on GitHub. Either it's worth discussing or it's omitted.
 
@@ -342,7 +343,7 @@ After printing the report, ask the user:
 
 Dispatch each selected command as a **background** Bash invocation. On completion, append the result to a VERIFICATION block.
 
-If `cargo check` or `cargo clippy` surfaces issues **inside files this PR modified**, fold them into Findings (compile errors → High, correctness clippy → Medium, clarity clippy → Small subject to the Anti-nit rule). Do not surface warnings from files the PR didn't touch. `fmt --check` failures are a status, never a finding.
+If `cargo check` surfaces **compile errors inside files this PR modified**, fold them into Findings as **High**. `cargo clippy` and `cargo +nightly fmt --check` results are status-only — record them in the VERIFICATION block, never as findings. CI gates on both, so duplicating them in the review report is noise.
 
 Tests are intentionally not in the menu — services' suite is too long-running to gate every review on.
 
diff --git a/docs/review-context/database-migrations.md b/docs/review-context/database-migrations.md
index c12626356b..6e3bc788f5 100644
--- a/docs/review-context/database-migrations.md
+++ b/docs/review-context/database-migrations.md
@@ -29,12 +29,11 @@ Each item carries its own severity. Items 1–3 are correctness/availability con
 
 - `ALTER TABLE ... ADD COLUMN NOT NULL` on a multi-million-row table without a default — table lock plus slow backfill. Remedy: add nullable, batched backfill, then `NOT NULL` in a later migration.
 - Renaming a column in a single migration rather than the three-release add → migrate → drop pattern.
-- Adding a `UNIQUE` constraint without first verifying current data is unique (the migration will fail in prod if duplicates exist).
-- Type changes in place on a large table without a plan for the rewrite cost.
+- Adding a `UNIQUE` constraint without first verifying current data is unique (migrations fail when duplicates exist).
+- `ALTER COLUMN ... TYPE ...` on a large table without a multi-step migration plan — Postgres rewrites every row and holds an `ACCESS EXCLUSIVE` lock for the duration. Remedy: new column, dual-write, backfill, swap, drop old.
 
 ## Usually worth flagging as **Medium**
 
-- Migration scripts without comments when the *what* is non-obvious.
 - New foreign keys without an explicit `ON DELETE` clause (default `NO ACTION` is often surprising).
 - New tables without indexes on the columns obvious queries will filter on.
 

From 04370d6670100c70a0be0917585285159b8968a6 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Tue, 28 Apr 2026 01:43:27 +0530
Subject: [PATCH 07/14] feat: split out two skills for independent usage

---
 docs/COW_PR_REVIEW_SKILL.md               |  26 +----
 docs/skills/git-blame-historic-context.md |  23 +++++
 docs/skills/pr-blame-walk.md              | 111 ++++++++++++++++++++++
 docs/skills/pr-context-synthesis.md       |  26 +++++
 4 files changed, 163 insertions(+), 23 deletions(-)
 create mode 100644 docs/skills/git-blame-historic-context.md
 create mode 100644 docs/skills/pr-blame-walk.md
 create mode 100644 docs/skills/pr-context-synthesis.md

diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index 3065800f6c..fbcd0e21a7 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -173,19 +173,7 @@ Trivial PRs (docs-only, single-line bump, pure test addition) — skip. Pure ref
 
 ## 5. Context Synthesis
 
-Produce 1–3 paragraphs combining the PR title, description, linked issue(s), file scope, and intent (feature, bugfix, refactor, dep bump, docs, test).
-
-### Rules
-
-1. **Synthesize, don't copy-paste.** If the description is five words, say so plainly: *"description is minimal; intent inferred from diff"*.
-2. **Watch for description-vs-diff drift.** A PR description must describe the diff's *current* state, not the author's iteration history. If a claim in the description is no longer true of the diff, raise a finding with `Action: update the PR description to match the current diff`. Do not flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.
-3. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
-
-### Shape
-
-- **Paragraph 1** — *what* changed.
-- **Paragraph 2** — *why* (description + linked issue).
-- **Paragraph 3** (if warranted) — *how* (the approach, not a line-by-line walkthrough).
+Follow the [`pr-context-synthesis`](skills/pr-context-synthesis.md) skill with `<pr_text>` = PR title + body from [§2](#2-metadata-fetch), `<linked_issue>` = the fetched issue (if any, also from §2), and `<diff_summary>` = the [§4](#4-codemap-phase) codemap plus the changed file list.
 
 ---
 
@@ -198,17 +186,9 @@ Apply, in order:
 3. CoW-services conventions from `CLAUDE.md`.
 4. Optional accelerators from [§4 → tools table](#tools-cheapest-viable-option-first) when available — Serena MCP for symbol lookups, `actionbook/rust-skills` for caller / trait / structural analysis. If they aren't installed, reason from `rg` + general Rust knowledge plus the [Universal Guardrails](#universal-guardrails).
 
-### Use `git blame` for historic context
-
-Before flagging code that looks unusual, redundant, or "easy to clean up":
-
-```bash
-git blame -L <start>,<end> -- <file>            # who/what/when
-git log --format='%H %s' -n 1 <commit>          # commit message
-gh pr view <#> -R cowprotocol/services          # if commit message links a PR
-```
+### Historic context
 
-If the originating commit message or PR explains *why* the code is shaped that way, factor that into your finding. A "this looks accidental" comment is much weaker when blame shows a deliberate fix from six months ago. Mention what blame revealed in the finding's Explanation so the reviewer can defend the point.
+Before flagging code that looks unusual, redundant, or "easy to clean up", follow the [`git-blame-historic-context`](skills/git-blame-historic-context.md) skill and factor what blame reveals into the finding's Explanation.
 
 ### Severity Rubric
 
diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md
new file mode 100644
index 0000000000..1de29a1cef
--- /dev/null
+++ b/docs/skills/git-blame-historic-context.md
@@ -0,0 +1,23 @@
+# Skill — `git blame` for historic context
+
+Use before flagging code that looks unusual, redundant, accidental, or "easy to clean up". Often that code looks weird because it had to.
+
+## Procedure
+
+```bash
+git blame -L <start>,<end> -- <file>            # who/what/when
+git log --format='%H %s' -n 1 <commit>          # commit message
+gh pr view <#> -R <owner>/<repo>                # if commit message links a PR
+```
+
+## Decision
+
+If the originating commit message or PR explains *why* the code is shaped the way it is, factor that into the finding. A "this looks accidental, did you mean X?" comment is much weaker when blame shows a deliberate fix from six months ago — and risks asking the author to undo a hard-won change.
+
+Mention what blame revealed in the finding's Explanation so the reviewer can defend the point without re-running the lookup.
+
+## When to skip
+
+- Lines entirely new in the diff under review (no history yet).
+- Pure additions of new symbols (nothing to blame).
+- Generated code — blame the generator's input instead.
diff --git a/docs/skills/pr-blame-walk.md b/docs/skills/pr-blame-walk.md
new file mode 100644
index 0000000000..0283918393
--- /dev/null
+++ b/docs/skills/pr-blame-walk.md
@@ -0,0 +1,111 @@
+# Skill — pr-blame-walk
+
+Use to investigate *"did one of the last N merged PRs cause X?"* — a regression, alert, metric drop, or fresh prod incident. Inputs are a query (the symptom) and a time window of merged PRs; output is a ranked list of suspect PRs with evidence per suspect.
+
+This is **not** a per-PR review. The unit of work is a window of merged PRs scored against an external observation (an alert, a regression report, a metric drop), not a single PR being assessed for merge-readiness. For per-PR review, use [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md).
+
+## Inputs
+
+- `<query>` — what we're hunting. The symptom and any locating signal: error string, endpoint, metric name, network, onset time. Examples:
+  - *"500s on `POST /api/v1/orders` mainnet starting 2026-04-20 14:00 UTC"*
+  - *"`auction_rewards` dropped 30% on Gnosis from 2026-04-15"*
+  - *"settlement gas usage on mainnet up 2× since last week"*
+- `<time_window>` — the merge window to scan. Either a date range (`merged:2026-04-15..2026-04-20`) or a count (last N merged PRs). The window must bracket the symptom's onset; over-broad windows produce noise.
+- `<repo>` — `<owner>/<repo>` (defaults to `cowprotocol/services`).
+- `<scope>` — optional path filter (e.g. `crates/autopilot/`, `crates/driver/src/domain/competition/`, `database/sql/`) to narrow candidates when the symptom localises clearly. Skip if the symptom is system-wide.
+
+## Procedure
+
+### 1. Enumerate candidates
+
+```bash
+gh pr list -R <repo> --state merged --search "<time_window>" \
+  --json number,title,mergedAt,mergeCommit,author,files,additions,deletions \
+  --limit 100
+```
+
+If `<scope>` is set, drop PRs whose `files[].path` list doesn't intersect it. Drop pure docs/lockfile/dep-bump PRs unless the symptom is a build/CI issue.
+
+### 2. Per-candidate fetch (parallel)
+
+For each surviving candidate, in parallel:
+
+```bash
+gh pr view <N> -R <repo> --json title,body,files
+gh pr diff <N> -R <repo>
+```
+
+### 3. Per-candidate context block
+
+Run [`pr-context-synthesis`](pr-context-synthesis.md) with `<pr_text>` = title+body, `<linked_issue>` = the referenced issue (if any), `<diff_summary>` = file list + diff hunks. Keep the output to **one paragraph** — this skill needs the *what*, not the full *why/how*.
+
+### 4. Score against the query
+
+Apply the rubric below. Drop everything that scores below Medium. Keep High and Medium.
+
+### 5. Rank and emit
+
+Sort by score (High first, then Medium). Output uses the shape in [Output](#output).
+
+## Scoring rubric
+
+For each PR, evaluate signals and pick the highest tier that applies:
+
+| Tier | Signal |
+|---|---|
+| **High** | Touches a file/symbol/endpoint **explicitly named** in `<query>`. |
+| **High** | Modifies behaviour the symptom directly describes (e.g. `<query>` says "rewards dropped" and the PR changes a reward calculation; or `<query>` says "500 on /quote" and the PR touches the quote handler). |
+| **High** | Touches a code path the symptom would naturally route through, on the same network/chain mentioned in `<query>`. |
+| **Medium** | Touches the same crate/module as a High signal but a different surface (sibling function, shared util). |
+| **Medium** | Touches shared infra (gas estimation, RPC client, DB access, serialization) that could plausibly chain through to the symptom. |
+| **Drop** | Pure docs / dep bumps / lockfile / formatting / test-only / CI config — unless `<query>` is itself a build/CI symptom. |
+| **Drop** | Touches a wholly unrelated crate with no plausible chain to the symptom. |
+
+If two High signals stack (e.g. names the symbol *and* touches the same network's config), note that in the "Why suspected" line — it's a stronger lead than a single signal.
+
+## Output
+
+```
+PR-blame-walk — <query>
+Window:     <time_window>
+Repo:       <repo>
+Scope:      <scope or "—">
+Scanned:    <total candidates after scope filter>
+Surfaced:   <high count> high, <medium count> medium
+
+───── High suspects ────────────────────────────────────────
+1. #<N> — <title>                  [merged <YYYY-MM-DD> by @<author>]
+   What: <one-paragraph synthesis from pr-context-synthesis>
+   Why suspected: <one or two sentences anchoring the score against
+                  the query — must name a file, symbol, or behaviour
+                  change>
+   Inspect: gh pr view <N> -R <repo> --web
+
+2. ...
+
+───── Medium suspects ─────────────────────────────────────
+<same shape>
+```
+
+If zero candidates score High or Medium:
+
+```
+No suspects in window. Symptom likely originates outside this PR set:
+config change, upstream dep, infra/RPC, or a PR that predates the window.
+Consider broadening <time_window> or removing <scope>.
+```
+
+## Rules
+
+1. **Score from evidence, not vibes.** Every "Why suspected" line must point at a concrete file, symbol, or behaviour change. *"Looks suspicious"* is not an answer.
+2. **The query is the lens.** A PR that is risky in general but unrelated to `<query>` is not a suspect. This skill is not a retroactive review — that is what the per-PR review skill is for.
+3. **Don't synthesise causality.** The output identifies *suspects*, not *the* cause. Final attribution requires reproducing the symptom against a revert (or the equivalent) of the suspect.
+4. **Time-anchor the window.** If `<query>` includes "started at 14:00 UTC on 2026-04-20", the window must bracket that timestamp. A multi-month sweep produces noise, not signal.
+5. **Read-only.** Never run mutating `gh` verbs, never post on GitHub from this skill, never check out a suspect's branch. The output goes to terminal.
+
+## When to skip
+
+- The symptom predates the available PR history in the window — broaden the window or accept that the cause is older.
+- The symptom is clearly config / infra / upstream — there are no PRs in this repo to scan.
+- A single-PR investigation is what's needed (you already have a suspect) — use the per-PR review skill (`/review-pr <N>`) instead.
+- Symptom hasn't been narrowed enough for the rubric to do work (e.g. *"things feel slow"*) — narrow the symptom first via logs / metrics, then run this skill.
diff --git a/docs/skills/pr-context-synthesis.md b/docs/skills/pr-context-synthesis.md
new file mode 100644
index 0000000000..9e898d5de7
--- /dev/null
+++ b/docs/skills/pr-context-synthesis.md
@@ -0,0 +1,26 @@
+# Skill — PR context synthesis
+
+Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR (or PR-shaped change). Consumed by the PR review report's CONTEXT section and by [`pr-blame-walk`](pr-blame-walk.md), which calls this skill once per candidate PR before scoring.
+
+## Inputs
+
+- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), substitute the branch name plus the relevant commit messages.
+- `<linked_issue>` — title and body of any issue referenced via `Fixes #N` / `Closes #N` / `Resolves #N`. May be empty.
+- `<diff_summary>` — file scope plus a codemap or per-file note of the actual change. The ground truth the synthesis must stay anchored to.
+
+## Rules
+
+1. **Synthesize, don't copy-paste.** If `<pr_text>` is five words, say so plainly: *"description is minimal; intent inferred from diff"*. Don't pad to look thorough.
+2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, raise a finding with `Action: update the PR description to match the current diff`. Do **not** flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.
+3. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
+
+## Shape
+
+- **Paragraph 1** — *what* changed. Component + concrete change, drawn from `<diff_summary>`.
+- **Paragraph 2** — *why*. Drawn from `<pr_text>` and `<linked_issue>`. If both are thin, say so.
+- **Paragraph 3** (only if warranted) — *how*. The approach, not a line-by-line walkthrough.
+
+## When to skip
+
+- Trivial changes (docs typo, single-line dep bump, lockfile-only). One sentence is enough; don't force the three-paragraph shape.
+- `<pr_text>` and `<linked_issue>` are both empty *and* `<diff_summary>` already speaks for itself (e.g. a single-file rename). One sentence noting that the diff is self-explanatory is the correct output.

From 827b159525a556cccd1e4ad29b97bb9f1b470d79 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Tue, 28 Apr 2026 04:28:09 +0530
Subject: [PATCH 08/14] refactors for git blame and PR blame walk skills

---
 docs/skills/git-blame-historic-context.md |  31 +++++-
 docs/skills/pr-blame-walk.md              | 130 +++++++++++++++-------
 2 files changed, 113 insertions(+), 48 deletions(-)

diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md
index 1de29a1cef..bcac34cac6 100644
--- a/docs/skills/git-blame-historic-context.md
+++ b/docs/skills/git-blame-historic-context.md
@@ -5,19 +5,38 @@ Use before flagging code that looks unusual, redundant, accidental, or "easy to
 ## Procedure
 
 ```bash
-git blame -L <start>,<end> -- <file>            # who/what/when
-git log --format='%H %s' -n 1 <commit>          # commit message
-gh pr view <#> -R <owner>/<repo>                # if commit message links a PR
+git blame -L <start>,<end> -- <path>                # who/what/when
+
+# cowprotocol/services squash-merges. Most blames point at one commit whose
+# subject ends with "(#NNNN)" — extract the PR number, then pivot to the
+# PR conversation, which is usually richer than the commit body alone.
+git log -1 --format='%s%n%b' <sha>
+gh pr view <NNNN> -R <owner>/<repo>
 ```
 
 ## Decision
 
-If the originating commit message or PR explains *why* the code is shaped the way it is, factor that into the finding. A "this looks accidental, did you mean X?" comment is much weaker when blame shows a deliberate fix from six months ago — and risks asking the author to undo a hard-won change.
+Promote what blame reveals into the finding's Explanation, then weight the finding:
+
+- **Strengthens** — surrounding code was added recently for a reason the diff now contradicts. Keep / raise severity.
+- **Weakens** — the originating PR explains *why* the shape is unusual (deliberate workaround, perf fix, cross-version compat). Soften, or pivot from `Action:` to `Question:`. Example: a two-line guard looked over-defensive; blame showed it was a hot-patch lifted from prod logs — Medium → Question, asking whether the failure mode still applies.
+- **Inverts** — flagging this would ask the author to undo a hard-won fix. Drop the finding. Example: a `Some(_) =>` arm looked redundant; blame revealed it was added months ago to swallow a panic on an edge case the diff was about to remove. Dropped, with a note that the panic is back.
+
+## Edge cases
 
-Mention what blame revealed in the finding's Explanation so the reviewer can defend the point without re-running the lookup.
+- **Merge commit, not squash** — rare here; inspect with `git log -1 --format='%s%n%b' <sha>` and walk parents (`<sha>^1`, `^2`) to find the commit that actually authored the line.
+- **Same author as the PR under review, recent** — context is fresh; ask the author directly in the review thread instead of synthesising blame.
+- **Refactor moved code wholesale** — surface blame points at the move, not the originating fix. Use `git blame -w -C -C -C -L <start>,<end> -- <path>` (whitespace-insensitive, copy-detection) to recover the real authoring commit.
+- **Vendored / generated / contract-binding code** — blame the generator's input (upstream config, source `.sol`, codegen template). Skip if the surface is a JSON ABI or lockfile.
 
 ## When to skip
 
 - Lines entirely new in the diff under review (no history yet).
 - Pure additions of new symbols (nothing to blame).
-- Generated code — blame the generator's input instead.
+- Generated code where the input lives elsewhere — blame the input instead.
+
+## Used by
+
+- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) §6 — before flagging unusual-looking code.
+- [`COW_ORDER_DEBUG_SKILL.md`](../COW_ORDER_DEBUG_SKILL.md) — when investigating *"why is this check here?"* during order debugging.
+- Ad-hoc code investigations where a line of code prompts *"this looks accidental"*.
diff --git a/docs/skills/pr-blame-walk.md b/docs/skills/pr-blame-walk.md
index 0283918393..3ddb7e7655 100644
--- a/docs/skills/pr-blame-walk.md
+++ b/docs/skills/pr-blame-walk.md
@@ -1,84 +1,109 @@
 # Skill — pr-blame-walk
 
-Use to investigate *"did one of the last N merged PRs cause X?"* — a regression, alert, metric drop, or fresh prod incident. Inputs are a query (the symptom) and a time window of merged PRs; output is a ranked list of suspect PRs with evidence per suspect.
+Use to investigate *"a symptom appeared in prod between T1 and T2 — which merged PR is the most likely cause?"*. Inputs are a query (the symptom + locating signal) and a merge window; output is a ranked list of suspect PRs, each with a tight context block and a one-sentence evidence-anchored *"Why suspected"*.
 
-This is **not** a per-PR review. The unit of work is a window of merged PRs scored against an external observation (an alert, a regression report, a metric drop), not a single PR being assessed for merge-readiness. For per-PR review, use [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md).
+This is **not** a per-PR review — that is [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md). It is also not a causality proof: output is *suspects*, not *the* cause. Final attribution requires reproducing the symptom against a revert.
 
 ## Inputs
 
-- `<query>` — what we're hunting. The symptom and any locating signal: error string, endpoint, metric name, network, onset time. Examples:
+- `<query>` — the symptom and any locating signal: error string, endpoint, metric name, network, onset time. Examples:
   - *"500s on `POST /api/v1/orders` mainnet starting 2026-04-20 14:00 UTC"*
   - *"`auction_rewards` dropped 30% on Gnosis from 2026-04-15"*
   - *"settlement gas usage on mainnet up 2× since last week"*
-- `<time_window>` — the merge window to scan. Either a date range (`merged:2026-04-15..2026-04-20`) or a count (last N merged PRs). The window must bracket the symptom's onset; over-broad windows produce noise.
+- `<onset>` — the timestamp the symptom began, RFC3339 if known. Used as a hard cutoff: PRs merged after `<onset>` cannot be the cause.
+- `<time_window>` — the merge window, ending at or before `<onset>`. Either a date range (`merged:YYYY-MM-DD..YYYY-MM-DD`) or a count (last N merged PRs).
 - `<repo>` — `<owner>/<repo>` (defaults to `cowprotocol/services`).
-- `<scope>` — optional path filter (e.g. `crates/autopilot/`, `crates/driver/src/domain/competition/`, `database/sql/`) to narrow candidates when the symptom localises clearly. Skip if the symptom is system-wide.
+- `<scope>` — optional path filter (e.g. `crates/autopilot/`, `crates/driver/src/domain/competition/`, `database/sql/`). Skip if the symptom is system-wide.
 
 ## Procedure
 
 ### 1. Enumerate candidates
 
 ```bash
-gh pr list -R <repo> --state merged --search "<time_window>" \
-  --json number,title,mergedAt,mergeCommit,author,files,additions,deletions \
+gh pr list -R <repo> --state merged \
+  --search "merged:<YYYY-MM-DD>..<YYYY-MM-DD>" \
+  --json number,title,mergedAt,mergeCommit,author,additions,deletions \
   --limit 100
 ```
 
-If `<scope>` is set, drop PRs whose `files[].path` list doesn't intersect it. Drop pure docs/lockfile/dep-bump PRs unless the symptom is a build/CI issue.
+If the result hits `--limit`, narrow `<time_window>` and re-run — over 100 candidates means the window is too wide for the rubric to do useful work. Default `gh pr list` limit is 30; pass `--limit` explicitly.
 
-### 2. Per-candidate fetch (parallel)
+### 2. Cheap pre-filter (no fetch)
 
-For each surviving candidate, in parallel:
+Drop candidates without fetching their diff:
+
+- **Merged after `<onset>`** — `mergedAt > <onset>`. Cannot be cause.
+- **Pure dep bump** — title matches `^Bump `, `^chore: bump`, `RUSTSEC`, `dependabot`. Diff is `Cargo.toml` version bumps + `Cargo.lock` only.
+- **Pure docs / CI workflow / formatting / test-only** — files confined to `docs/`, `.github/`, `*.md`, `**/tests/`. Keep CI changes only if `<query>` is itself a build/CI symptom.
+- **Network mismatch** — title or label explicitly names a chain different from `<query>`'s network *and* the diff doesn't touch shared infra.
+
+### 3. Per-candidate fetch (parallel, capped)
+
+For each surviving candidate, in parallel — **cap concurrency at 5–10**. `gh` shares the GitHub API budget (5000 req/hr authenticated) with the rest of the session.
 
 ```bash
-gh pr view <N> -R <repo> --json title,body,files
+gh pr view <N> -R <repo> --json title,body,files,baseRefName,headRefName,labels
 gh pr diff <N> -R <repo>
 ```
 
-### 3. Per-candidate context block
+If `gh` returns `API rate limit exceeded`, pause and retry with smaller batches. Do not fall back to scraping the web UI.
 
-Run [`pr-context-synthesis`](pr-context-synthesis.md) with `<pr_text>` = title+body, `<linked_issue>` = the referenced issue (if any), `<diff_summary>` = file list + diff hunks. Keep the output to **one paragraph** — this skill needs the *what*, not the full *why/how*.
+### 4. Per-candidate context block
 
-### 4. Score against the query
+Run [`pr-context-synthesis`](pr-context-synthesis.md) per candidate with `<pr_text>` = title+body, `<linked_issue>` = referenced issue if any, `<diff_summary>` = file list + diff hunks. **Cap output at 3–4 sentences** — this skill needs the *what*, not the full *why/how*. The anti-vague-verb rule from that skill's §Rules applies verbatim.
 
-Apply the rubric below. Drop everything that scores below Medium. Keep High and Medium.
+If the diff revives or reverts code whose origin isn't obvious from the PR text, run [`git-blame-historic-context`](git-blame-historic-context.md) on the deleted lines (against `origin/main^`) to recover the originating commit/PR — useful for "Why suspected" when the suspect undoes a prior fix.
 
-### 5. Rank and emit
+### 5. Score, rank, emit
 
-Sort by score (High first, then Medium). Output uses the shape in [Output](#output).
+Apply [Scoring rubric](#scoring-rubric). Drop everything below Medium. Sort by tier, then by tiebreaker. Emit per [Output](#output).
 
 ## Scoring rubric
 
-For each PR, evaluate signals and pick the highest tier that applies:
+For each surviving candidate, evaluate signals and pick the highest tier that applies:
 
 | Tier | Signal |
 |---|---|
 | **High** | Touches a file/symbol/endpoint **explicitly named** in `<query>`. |
-| **High** | Modifies behaviour the symptom directly describes (e.g. `<query>` says "rewards dropped" and the PR changes a reward calculation; or `<query>` says "500 on /quote" and the PR touches the quote handler). |
+| **High** | Modifies behaviour the symptom directly describes (e.g. `<query>` says *"rewards dropped"* and the PR changes a reward calculation; *"500 on /quote"* and the PR touches the quote handler). |
 | **High** | Touches a code path the symptom would naturally route through, on the same network/chain mentioned in `<query>`. |
-| **Medium** | Touches the same crate/module as a High signal but a different surface (sibling function, shared util). |
-| **Medium** | Touches shared infra (gas estimation, RPC client, DB access, serialization) that could plausibly chain through to the symptom. |
-| **Drop** | Pure docs / dep bumps / lockfile / formatting / test-only / CI config — unless `<query>` is itself a build/CI symptom. |
-| **Drop** | Touches a wholly unrelated crate with no plausible chain to the symptom. |
+| **Medium** | Same crate/module as a High signal but a different surface (sibling function, shared util). |
+| **Medium** | Touches shared infra (gas estimation, RPC client, DB access, serialization, settlement queueing) that could plausibly chain through to the symptom. |
+| **Drop** | Pure rename / move / formatting — no behaviour change. (`additions ≈ deletions` and the diff is `mv`-shaped or whitespace-only.) |
+| **Drop** | Pure comment-only changes. |
+| **Drop** | Wholly unrelated crate with no plausible chain to the symptom. |
+
+(The pre-filter Drops in step 2 don't reach this rubric — they were dropped without fetching.)
+
+### Tiebreakers
+
+When two PRs land at the same tier:
+
+1. **Smaller diff first.** Smaller change = cheaper to bisect / revert. `additions+deletions` is the proxy.
+2. **Hot-path > cold-path.** A change in a request-handling code path beats a change to one-shot init or batch-job code.
+3. **Multiple High signals stacked > single High signal.** Note the stack in *"Why suspected"*.
 
-If two High signals stack (e.g. names the symbol *and* touches the same network's config), note that in the "Why suspected" line — it's a stronger lead than a single signal.
+### Single-best mode
+
+If exactly one candidate scores High and the next-highest is Medium-or-lower, surface only that one in *High suspects* with `(only High suspect; next is <tier>)` after the title. Mediums still print.
 
 ## Output
 
 ```
 PR-blame-walk — <query>
 Window:     <time_window>
+Onset:      <onset or "—">
 Repo:       <repo>
 Scope:      <scope or "—">
-Scanned:    <total candidates after scope filter>
+Scanned:    <total candidates after enumeration>
+Pre-filter: <count> dropped (post-onset, deps, docs/CI, network mismatch)
 Surfaced:   <high count> high, <medium count> medium
 
 ───── High suspects ────────────────────────────────────────
-1. #<N> — <title>                  [merged <YYYY-MM-DD> by @<author>]
-   What: <one-paragraph synthesis from pr-context-synthesis>
-   Why suspected: <one or two sentences anchoring the score against
-                  the query — must name a file, symbol, or behaviour
-                  change>
+1. #<N> — <title>                  [merged <YYYY-MM-DD> by @<author>; +<add>/-<del>]
+   What: <3–4 sentence synthesis from pr-context-synthesis>
+   Why suspected: <one or two sentences naming a file, symbol, or
+                  behaviour change. Vague verbs forbidden.>
    Inspect: gh pr view <N> -R <repo> --web
 
 2. ...
@@ -87,25 +112,46 @@ Surfaced:   <high count> high, <medium count> medium
 <same shape>
 ```
 
-If zero candidates score High or Medium:
+Empty result:
 
 ```
 No suspects in window. Symptom likely originates outside this PR set:
-config change, upstream dep, infra/RPC, or a PR that predates the window.
-Consider broadening <time_window> or removing <scope>.
+config change, upstream dep, infra/RPC, or a PR predating the window.
+Consider broadening <time_window>, removing <scope>, or checking
+deploys (`gh api -X GET repos/<repo>/deployments?per_page=20`) and infra.
 ```
 
+## Evidence integration
+
+The skill itself doesn't run evidence queries — but the operator usually has them open already. Cite the query that would sharpen the score; let the operator run it.
+
+| Evidence | Strengthens score when | Read-only query |
+|---|---|---|
+| **Victoria Logs** | Symptom onset matches the suspect's deploy time | `victorialogs_query` MCP, e.g. `container:!controller AND <symptom> \| fields _time, _msg, all` over the window straddling `mergedAt` |
+| **postgres MCP** | Symptom is DB-shaped (timeouts, missing rows, migration drift) and the suspect added a migration | `mcp__postgres-protocol__query` with `SELECT` only; show the SQL before running |
+| **Squash-commit → PR lookup** | You have a SHA from `git log` or a deploy log and need its PR | `gh pr list -R <repo> --search "<sha>" --state merged` |
+| **Deploys vs merges** | Suspect deploy-time vs merge-time mismatch (a PR can merge hours before deploy) | `gh api -X GET repos/<repo>/deployments?per_page=20` |
+
+The principle: name *which* evidence query would tip the score, not the result of running it.
+
 ## Rules
 
-1. **Score from evidence, not vibes.** Every "Why suspected" line must point at a concrete file, symbol, or behaviour change. *"Looks suspicious"* is not an answer.
-2. **The query is the lens.** A PR that is risky in general but unrelated to `<query>` is not a suspect. This skill is not a retroactive review — that is what the per-PR review skill is for.
-3. **Don't synthesise causality.** The output identifies *suspects*, not *the* cause. Final attribution requires reproducing the symptom against a revert (or the equivalent) of the suspect.
-4. **Time-anchor the window.** If `<query>` includes "started at 14:00 UTC on 2026-04-20", the window must bracket that timestamp. A multi-month sweep produces noise, not signal.
-5. **Read-only.** Never run mutating `gh` verbs, never post on GitHub from this skill, never check out a suspect's branch. The output goes to terminal.
+1. **Score from evidence, not vibes.** Every *"Why suspected"* line names a concrete file, symbol, or behaviour change. *"Looks suspicious"* / *"updates X"* / *"changes Y"* are failures — the anti-vague-verb rule from [`pr-context-synthesis`](pr-context-synthesis.md) §Rules applies verbatim.
+2. **The query is the lens.** A PR risky in general but unrelated to `<query>` is not a suspect. This skill is not a retroactive review.
+3. **Don't synthesise causality.** Output is *suspects*; final attribution requires a revert / bisect.
+4. **Time-anchor.** Drop PRs merged after `<onset>` — they cannot be the cause. Watch for force-push edge cases: trust `mergeCommit.oid` and `gh pr view <N>`, not the squash subject's `(#NNNN)` alone.
+5. **Read-only.** Every command in this skill is read-only: `gh` (`view` / `list` / `diff` / `api` GET-only), `git` (`log` / `blame` / `show`), MCP query tools (`SELECT` only). Never `gh pr review`, `gh pr comment`, mutating `gh api` verbs (`POST`/`PATCH`/`DELETE`), DB writes, or `git commit`/`push`/`checkout` of suspect branches.
 
 ## When to skip
 
-- The symptom predates the available PR history in the window — broaden the window or accept that the cause is older.
-- The symptom is clearly config / infra / upstream — there are no PRs in this repo to scan.
-- A single-PR investigation is what's needed (you already have a suspect) — use the per-PR review skill (`/review-pr <N>`) instead.
-- Symptom hasn't been narrowed enough for the rubric to do work (e.g. *"things feel slow"*) — narrow the symptom first via logs / metrics, then run this skill.
+- **Symptom not narrowed to a window yet** (e.g. *"things feel slow"*). Narrow via metrics/logs first, then run this skill.
+- **Single suspect already in mind** — use `/review-pr <N>` or read the diff directly. This skill is N→1, not 1→1.
+- **Symptom predates available PR history** — broaden the window or accept the cause is older than the rubric can see.
+- **Symptom is config / infra / upstream** — there are no source-code PRs in this repo to scan; check deploys, config, RPC provider, or external dependencies instead.
+- **Pre-filter drops everything** — the rubric has nothing to score; widen `<time_window>` or relax `<scope>`.
+
+## Used by
+
+- Ad-hoc incident investigations: paged on a fresh prod regression, run this before opening individual PRs.
+- May be invoked as a follow-up from [`COW_ORDER_DEBUG_SKILL.md`](../COW_ORDER_DEBUG_SKILL.md) when an order-debug session pins a regression to a window with no obvious cause inside the order's lifecycle.
+- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) is the *complement*, not a caller — once this skill surfaces a suspect, run `/review-pr <N>` on it.

From 456d4a9117c2bb67aac6b74110e83acb41a1e344 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Fri, 1 May 2026 14:30:24 +0530
Subject: [PATCH 09/14] feat: enhance PR review process by introducing
 prior-comment follow-up skill

---
 docs/COW_PR_REVIEW_SKILL.md       |  34 +++++--
 docs/skills/pr-followup-triage.md | 163 ++++++++++++++++++++++++++++++
 2 files changed, 190 insertions(+), 7 deletions(-)
 create mode 100644 docs/skills/pr-followup-triage.md

diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index fbcd0e21a7..57961b6816 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -46,13 +46,14 @@ Apply these as the default lens for every change. Pull in CoW-specific siblings
 Steps run in this order. `diff` mode skips PR-metadata steps; `pr-ci` swaps the report sink at the end.
 
 1. Fetch PR metadata and linked issue(s) — [§2](#2-metadata-fetch). *(`pr-local` / `pr-ci` only.)*
-2. Classify diff paths and load conditional context — [§3](#3-conditional-context).
-3. Build a targeted codemap — [§4](#4-codemap-phase).
-4. Synthesize the context block — [§5](#5-context-synthesis).
-5. Review and produce findings — [§6](#6-review-and-severity).
-6. Emit the report — [§7](#7-report-templates). Sink depends on mode.
-7. Offer verification (background) — [§8](#8-verification-offer). *(Local modes only.)*
-8. Print cleanup hint — [§9](#9-cleanup). *(`pr-local` only.)*
+2. Triage prior review comments — [§2.5](#25-prior-comment-follow-up). *(`pr-local` / `pr-ci` only; further skip conditions in the sibling skill.)*
+3. Classify diff paths and load conditional context — [§3](#3-conditional-context).
+4. Build a targeted codemap — [§4](#4-codemap-phase).
+5. Synthesize the context block — [§5](#5-context-synthesis).
+6. Review and produce findings — [§6](#6-review-and-severity).
+7. Emit the report — [§7](#7-report-templates). Sink depends on mode.
+8. Offer verification (background) — [§8](#8-verification-offer). *(Local modes only.)*
+9. Print cleanup hint — [§9](#9-cleanup). *(`pr-local` only.)*
 
 Error behaviour is consolidated in [§10](#10-error-playbook).
 
@@ -94,6 +95,20 @@ If no linked issue is referenced, proceed without one. Do not manufacture one.
 
 ---
 
+## 2.5 Prior-comment follow-up
+
+*(Skip in `diff` mode — there is no PR yet.)*
+
+When a PR has prior inline review comments and the author has pushed new commits since, the reviewer's first question is *"did the author address what I asked last round?"* — not *"is this PR sound from scratch?"*. Surfacing that delta up front lets the reviewer resolve threads (or push back) without re-reading the whole conversation cold.
+
+Follow the [`pr-followup-triage`](skills/pr-followup-triage.md) skill with `<PR_NUMBER>`, `<owner>`, `<repo>` from [§2](#2-metadata-fetch), and `<head_sha>` = `commits[-1].oid` from the same metadata.
+
+The skill's own [When to skip](skills/pr-followup-triage.md#when-to-skip) gates apply — most PRs (no prior reviews, or no commits since) will produce no output and the section is omitted from the report. Do not synthesise a follow-up block for a first-pass review.
+
+The output is the "Prior-comment follow-up" section in [§7](#7-report-templates). Findings raised in this PR may reference a prior comment by its bracketed identifier with `Re: [A]` to chain context.
+
+---
+
 ## 3. Conditional Context
 
 For each changed file, evaluate this table and accumulate a `context_docs` list. Read each matched doc once.
@@ -237,6 +252,10 @@ Codemap
 ───────────────────────────────────────────────────────────
 <from §4 — omit if skipped>
 
+Prior-comment follow-up — @<reviewer> at <prior_sha_short>
+───────────────────────────────────────────────────────────
+<from §2.5 — omit entirely if the sibling skill skipped>
+
 ───────────────────────────────────────────────────────────
 CONTEXT
 ───────────────────────────────────────────────────────────
@@ -350,6 +369,7 @@ The NEXT STEPS footer names the current branch and the `git switch` to return to
 | Working tree dirty (pr-local) | Print `git status --porcelain`, instruct stash/commit, abort. **No auto-stash.** |
 | `gh pr checkout` fails (fork permission) | Degrade to static-diff mode, flag in report header. |
 | Optional skill not installed | Continue using `rg` / general Rust knowledge. Do **not** abort. |
+| Follow-up triage fails (5xx, rate-limit on `/reviews` or `/comments`) | Skip the section, append `follow-up triage skipped: <reason>` to the report header's `Mode:` line, continue with the rest of the review. |
 | `diff` mode but no diff (branch == main) | Print `No diff vs main — nothing to review.`, exit clean. |
 | Verification command fails | Surface output in VERIFICATION block; raise findings only for issues inside changed files. |
 
diff --git a/docs/skills/pr-followup-triage.md b/docs/skills/pr-followup-triage.md
new file mode 100644
index 0000000000..66b2cc6aa9
--- /dev/null
+++ b/docs/skills/pr-followup-triage.md
@@ -0,0 +1,163 @@
+# Skill — pr-followup-triage
+
+Use when a PR under review already has prior inline review comments and the author has pushed new commits since. Classifies each prior comment as **Addressed**, **Discussion needed**, **Pending**, **Silently dropped**, **Moot**, or **Unclear** at the current HEAD, each with a citation the reviewer can click through to verify in one step.
+
+This is a *triage* skill: it surfaces evidence so the human reviewer can decide whether to resolve a GitHub conversation. It never resolves threads itself, never re-quotes the original comments, and never tries to read sentiment from emoji reactions or upvotes.
+
+Read-only: only `gh api` GET verbs and `git` read commands. Never `gh pr review`, mutating `gh api` verbs (`POST` / `PATCH` / `DELETE`), or comment-resolution endpoints.
+
+## Inputs
+
+- `<PR_NUMBER>`, `<owner>`, `<repo>` — same as [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md).
+- `<head_sha>` — current PR head, already fetched in [§2 of the main skill](../COW_PR_REVIEW_SKILL.md#2-metadata-fetch).
+- `<reviewer_filter>` — logins to track. Defaults:
+  - `pr-local`: the `gh`-authenticated user (`gh api user --jq .login`). If that call fails, fall back to "all human reviewers".
+  - `pr-ci`: all human reviewers (`user.type == "User"`; excludes `*[bot]`).
+
+## When to skip
+
+- No reviews on the PR (zero rows from `gh api repos/<owner>/<repo>/pulls/<N>/reviews`).
+- No reviews from any login in `<reviewer_filter>`.
+- For *every* filtered reviewer, their latest review's `commit_id` equals `<head_sha>` — no commits since their last round, so nothing has changed.
+
+The third gate is per-reviewer: triage continues for any reviewer whose latest review predates `<head_sha>`. Only when *all* of them are caught up do we skip the section entirely.
+
+In any skip case, omit the "Prior-comment follow-up" section from the report entirely. Do not print a placeholder.
+
+## Procedure
+
+### 1. Enumerate prior reviews
+
+```bash
+gh api repos/<owner>/<repo>/pulls/<PR_NUMBER>/reviews --paginate
+```
+
+Drop reviews where `state == "DISMISSED"`. Drop reviews from logins outside `<reviewer_filter>`.
+
+For each remaining reviewer, keep only their **latest** review (by `submitted_at`). Older rounds are assumed superseded — by either the reviewer's later round or the new commits.
+
+If nothing remains, skip per [When to skip](#when-to-skip).
+
+### 2. Fetch the inline comments
+
+```bash
+gh api repos/<owner>/<repo>/pulls/<PR_NUMBER>/comments --paginate
+```
+
+Filter to comments whose `pull_request_review_id` matches one of the latest reviews from step 1.
+
+For each comment `c`, record:
+- `c.id`, `c.path`, `c.original_line` (stable; use this), `c.commit_id` (the commit it was made against), `c.body`.
+- Replies — every comment whose `in_reply_to_id == c.id`, sorted by `created_at` ascending.
+
+A review with only a top-level body and no inline comments is general feedback the reviewer can recall directly; it produces no triage entry.
+
+### 3. Classify each comment
+
+For each `c`:
+
+#### a. File-level moot
+
+If `c.path` does not exist at `<head_sha>` (`gh api 'repos/<owner>/<repo>/contents/<c.path>?ref=<head_sha>'` returns 404), check whether the file was *renamed* by scanning the compare endpoint:
+
+```bash
+gh api 'repos/<owner>/<repo>/compare/<c.commit_id>...<head_sha>' \
+  --jq '.files[] | select(.previous_filename == "<c.path>") | .filename'
+```
+
+- If a rename target exists, treat that as the new `c.path` for steps b–d and record the rename in the cite line.
+- Otherwise, status = **Moot (file removed)**. Done.
+
+#### b. Code-change detection
+
+Pull the file at the two commits and locate the function/block containing `c.original_line`:
+
+```bash
+gh api 'repos/<owner>/<repo>/contents/<c.path>?ref=<head_sha>'      --jq '.content' | base64 -d
+gh api 'repos/<owner>/<repo>/contents/<c.path>?ref=<c.commit_id>'   --jq '.content' | base64 -d
+```
+
+Compare the relevant span (function body, struct definition, or surrounding ~20 lines if no clean enclosing scope):
+
+- **Block removed** in HEAD → status = **Moot (code removed)**. Note which commit removed it via `git log <c.commit_id>..<head_sha> -- <c.path>` (run locally; falls back to `gh api .../compare/...` in degraded static-diff mode).
+- **Span identical** → `code_change = false`. Move to step c.
+- **Span differs** → `code_change = true`. Capture the new content.
+
+#### c. Reply detection
+
+Inspect the reply chain for `c`:
+
+- No replies → `reply = none`.
+- Latest reply is short and forward-looking (`"will fix"`, `"good catch"`, `"TODO"`, `"later"`) → `reply = commitment`.
+- Latest reply asserts done or argues a position (`"done"`, `"fixed in <sha>"`, `"no, because X"`) → `reply = substantive`.
+
+Don't over-classify; if the reply is ambiguous, treat as `substantive` and let the human read it.
+
+#### d. Final status
+
+| `code_change` | `reply` | Status |
+|---|---|---|
+| yes, addresses the comment | any | **Addressed** |
+| yes, diverges from the comment | any | **Discussion needed** |
+| yes, can't tell from the diff | any | **Unclear** |
+| no | `substantive` | **Discussion needed** |
+| no | `commitment` | **Pending** |
+| no | `none` | **Silently dropped** |
+
+Whether a code change "addresses" the comment is an LLM judgement. **Be conservative.** Default to **Unclear** when in doubt — a false **Addressed** misleads the reviewer into resolving a thread that should stay open.
+
+## Output
+
+Append one section to the main report, between Codemap and CONTEXT:
+
+```
+Prior-comment follow-up — @<reviewer> at <prior_sha_short>
+───────────────────────────────────────────────────────────
+[A] <c.path>:<original_line>
+    Asked:   <≤12-word recap of the comment's request>
+    Status:  ✓ Addressed
+    Cite:    <c.path>:<line at HEAD>  ← changed in <commit_short>; reads
+             "<2-line excerpt of the new code>"
+
+[B] <c.path>:<original_line>
+    Asked:   ...
+    Status:  💬 Discussion needed
+    Cite:    Author replied: "<≤30-char excerpt>"; <c.path>:<line> unchanged.
+
+[C] <c.path>:<original_line>
+    Asked:   ...
+    Status:  ⏳ Pending
+    Cite:    Author replied "will fix" but <c.path>:<line> unchanged.
+
+[D] ...
+    Status:  ⚠ Silently dropped
+    Cite:    No reply; <c.path>:<line> unchanged since <prior_sha_short>.
+
+[E] ...
+    Status:  🚫 Moot (file removed in <commit_short>)
+
+[F] ...
+    Status:  ❓ Unclear
+    Cite:    Code at <c.path>:<line> changed in <commit_short>, but the
+             new shape doesn't obviously match what was asked. Re-read
+             the thread.
+```
+
+Sort entries by status, with what-needs-attention first: `Discussion needed` → `Pending` → `Silently dropped` → `Unclear` → `Addressed` → `Moot`. The reviewer's eye lands first on what still needs attention.
+
+Stable identifiers `[A]`, `[B]`, ... let new findings reference an old comment with `Re: [A]`.
+
+If `<reviewer_filter>` matched multiple logins, group by reviewer with a sub-heading.
+
+## Rules
+
+1. **Conservative defaults.** When the LLM cannot tell whether a code change addresses a comment, status is **Unclear**, never **Addressed**. False "Addressed" is worse than no claim — it tricks the reviewer into closing a thread that shouldn't close.
+2. **Cite, don't paraphrase.** Every `Cite:` line ends with a `<path>:<line>` at HEAD, a `"removed in <commit>"`, or a short reply excerpt. The reviewer must be one click from verifying.
+3. **No sentiment parsing.** Don't infer satisfaction from `:+1:` reactions or "ok thanks". The reviewer makes that judgement, not the AI.
+4. **Don't quote the original comment.** The reviewer wrote it; they remember. The output's job is the *delta* — what changed about the comment's status since it was made.
+5. **Read-only.** No `gh pr review`, no comment-resolution endpoints, no mutating verbs. Even if the AI is certain everything is fine, the human resolves.
+6. **One round per reviewer.** Triage the latest review per reviewer only. Earlier rounds are out of scope; if they matter, the reviewer leaves a fresh round.
+
+## Used by
+
+- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) — invoked from the new follow-up triage step when prior human reviews exist on the PR.

From 09d2413ae9479867a5b22e306a68f639aa266649 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Fri, 1 May 2026 17:13:57 +0530
Subject: [PATCH 10/14] refactor: update prior-comment follow-up section in PR
 review documentation and remove obsolete skill

---
 docs/COW_PR_REVIEW_SKILL.md       |  26 +++--
 docs/skills/pr-followup-triage.md | 163 ------------------------------
 2 files changed, 19 insertions(+), 170 deletions(-)
 delete mode 100644 docs/skills/pr-followup-triage.md

diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index 57961b6816..c3a111357c 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -46,7 +46,7 @@ Apply these as the default lens for every change. Pull in CoW-specific siblings
 Steps run in this order. `diff` mode skips PR-metadata steps; `pr-ci` swaps the report sink at the end.
 
 1. Fetch PR metadata and linked issue(s) — [§2](#2-metadata-fetch). *(`pr-local` / `pr-ci` only.)*
-2. Triage prior review comments — [§2.5](#25-prior-comment-follow-up). *(`pr-local` / `pr-ci` only; further skip conditions in the sibling skill.)*
+2. Triage prior review comments — [§2.5](#25-prior-comment-follow-up). *(`pr-local` / `pr-ci` only; further skip conditions in §2.5.)*
 3. Classify diff paths and load conditional context — [§3](#3-conditional-context).
 4. Build a targeted codemap — [§4](#4-codemap-phase).
 5. Synthesize the context block — [§5](#5-context-synthesis).
@@ -97,15 +97,17 @@ If no linked issue is referenced, proceed without one. Do not manufacture one.
 
 ## 2.5 Prior-comment follow-up
 
-*(Skip in `diff` mode — there is no PR yet.)*
+Skip when any of these holds — no output, the section is omitted from the report:
 
-When a PR has prior inline review comments and the author has pushed new commits since, the reviewer's first question is *"did the author address what I asked last round?"* — not *"is this PR sound from scratch?"*. Surfacing that delta up front lets the reviewer resolve threads (or push back) without re-reading the whole conversation cold.
+- `mode == diff` (no PR yet).
+- No prior human inline reviews on the PR.
+- For every human reviewer, their latest review's `commit_id` already equals `<head_sha>` — no new commits since their last round.
 
-Follow the [`pr-followup-triage`](skills/pr-followup-triage.md) skill with `<PR_NUMBER>`, `<owner>`, `<repo>` from [§2](#2-metadata-fetch), and `<head_sha>` = `commits[-1].oid` from the same metadata.
+Otherwise, for each prior inline comment from a human reviewer (use `gh api repos/<owner>/<repo>/pulls/<PR_NUMBER>/reviews` and `/comments`, GET only), surface one entry in the "Prior-comment follow-up" block ([§7](#7-report-templates)). Cite the new code at `<path>:<line>` that addresses it, the author's reply, or note that nothing has changed since `<prior_sha>`. Use stable identifiers `[A]`, `[B]`, ... so findings raised in this review can chain context with `Re: [A]`.
 
-The skill's own [When to skip](skills/pr-followup-triage.md#when-to-skip) gates apply — most PRs (no prior reviews, or no commits since) will produce no output and the section is omitted from the report. Do not synthesise a follow-up block for a first-pass review.
+**Be conservative.** When the diff doesn't make the answer obvious, say so explicitly — false *"addressed"* tricks the reviewer into closing a thread that should stay open. Don't infer satisfaction from emoji reactions or short acknowledgements; the reviewer makes that call.
 
-The output is the "Prior-comment follow-up" section in [§7](#7-report-templates). Findings raised in this PR may reference a prior comment by its bracketed identifier with `Re: [A]` to chain context.
+**Read-only.** No `gh pr review`, no `gh api` mutating verbs (`POST`/`PATCH`/`DELETE`), no comment-resolution endpoints. The reviewer resolves threads, not you.
 
 ---
 
@@ -254,7 +256,17 @@ Codemap
 
 Prior-comment follow-up — @<reviewer> at <prior_sha_short>
 ───────────────────────────────────────────────────────────
-<from §2.5 — omit entirely if the sibling skill skipped>
+[A] <path>:<original_line>
+    Asked:   <≤12-word recap of what was asked>
+    Status:  ✓ Addressed  |  💬 Discussion needed  |  ⏳ Pending
+             ⚠ Silently dropped  |  🚫 Moot  |  ❓ Unclear
+    Cite:    <path>:<line at HEAD>, or "Author replied: ...", or
+             "No change since <prior_sha_short>"
+
+Sort what-needs-attention first: Discussion needed → Pending →
+Silently dropped → Unclear → Addressed → Moot.
+
+(Omit the whole block if §2.5 skipped.)
 
 ───────────────────────────────────────────────────────────
 CONTEXT
diff --git a/docs/skills/pr-followup-triage.md b/docs/skills/pr-followup-triage.md
deleted file mode 100644
index 66b2cc6aa9..0000000000
--- a/docs/skills/pr-followup-triage.md
+++ /dev/null
@@ -1,163 +0,0 @@
-# Skill — pr-followup-triage
-
-Use when a PR under review already has prior inline review comments and the author has pushed new commits since. Classifies each prior comment as **Addressed**, **Discussion needed**, **Pending**, **Silently dropped**, **Moot**, or **Unclear** at the current HEAD, each with a citation the reviewer can click through to verify in one step.
-
-This is a *triage* skill: it surfaces evidence so the human reviewer can decide whether to resolve a GitHub conversation. It never resolves threads itself, never re-quotes the original comments, and never tries to read sentiment from emoji reactions or upvotes.
-
-Read-only: only `gh api` GET verbs and `git` read commands. Never `gh pr review`, mutating `gh api` verbs (`POST` / `PATCH` / `DELETE`), or comment-resolution endpoints.
-
-## Inputs
-
-- `<PR_NUMBER>`, `<owner>`, `<repo>` — same as [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md).
-- `<head_sha>` — current PR head, already fetched in [§2 of the main skill](../COW_PR_REVIEW_SKILL.md#2-metadata-fetch).
-- `<reviewer_filter>` — logins to track. Defaults:
-  - `pr-local`: the `gh`-authenticated user (`gh api user --jq .login`). If that call fails, fall back to "all human reviewers".
-  - `pr-ci`: all human reviewers (`user.type == "User"`; excludes `*[bot]`).
-
-## When to skip
-
-- No reviews on the PR (zero rows from `gh api repos/<owner>/<repo>/pulls/<N>/reviews`).
-- No reviews from any login in `<reviewer_filter>`.
-- For *every* filtered reviewer, their latest review's `commit_id` equals `<head_sha>` — no commits since their last round, so nothing has changed.
-
-The third gate is per-reviewer: triage continues for any reviewer whose latest review predates `<head_sha>`. Only when *all* of them are caught up do we skip the section entirely.
-
-In any skip case, omit the "Prior-comment follow-up" section from the report entirely. Do not print a placeholder.
-
-## Procedure
-
-### 1. Enumerate prior reviews
-
-```bash
-gh api repos/<owner>/<repo>/pulls/<PR_NUMBER>/reviews --paginate
-```
-
-Drop reviews where `state == "DISMISSED"`. Drop reviews from logins outside `<reviewer_filter>`.
-
-For each remaining reviewer, keep only their **latest** review (by `submitted_at`). Older rounds are assumed superseded — by either the reviewer's later round or the new commits.
-
-If nothing remains, skip per [When to skip](#when-to-skip).
-
-### 2. Fetch the inline comments
-
-```bash
-gh api repos/<owner>/<repo>/pulls/<PR_NUMBER>/comments --paginate
-```
-
-Filter to comments whose `pull_request_review_id` matches one of the latest reviews from step 1.
-
-For each comment `c`, record:
-- `c.id`, `c.path`, `c.original_line` (stable; use this), `c.commit_id` (the commit it was made against), `c.body`.
-- Replies — every comment whose `in_reply_to_id == c.id`, sorted by `created_at` ascending.
-
-A review with only a top-level body and no inline comments is general feedback the reviewer can recall directly; it produces no triage entry.
-
-### 3. Classify each comment
-
-For each `c`:
-
-#### a. File-level moot
-
-If `c.path` does not exist at `<head_sha>` (`gh api 'repos/<owner>/<repo>/contents/<c.path>?ref=<head_sha>'` returns 404), check whether the file was *renamed* by scanning the compare endpoint:
-
-```bash
-gh api 'repos/<owner>/<repo>/compare/<c.commit_id>...<head_sha>' \
-  --jq '.files[] | select(.previous_filename == "<c.path>") | .filename'
-```
-
-- If a rename target exists, treat that as the new `c.path` for steps b–d and record the rename in the cite line.
-- Otherwise, status = **Moot (file removed)**. Done.
-
-#### b. Code-change detection
-
-Pull the file at the two commits and locate the function/block containing `c.original_line`:
-
-```bash
-gh api 'repos/<owner>/<repo>/contents/<c.path>?ref=<head_sha>'      --jq '.content' | base64 -d
-gh api 'repos/<owner>/<repo>/contents/<c.path>?ref=<c.commit_id>'   --jq '.content' | base64 -d
-```
-
-Compare the relevant span (function body, struct definition, or surrounding ~20 lines if no clean enclosing scope):
-
-- **Block removed** in HEAD → status = **Moot (code removed)**. Note which commit removed it via `git log <c.commit_id>..<head_sha> -- <c.path>` (run locally; falls back to `gh api .../compare/...` in degraded static-diff mode).
-- **Span identical** → `code_change = false`. Move to step c.
-- **Span differs** → `code_change = true`. Capture the new content.
-
-#### c. Reply detection
-
-Inspect the reply chain for `c`:
-
-- No replies → `reply = none`.
-- Latest reply is short and forward-looking (`"will fix"`, `"good catch"`, `"TODO"`, `"later"`) → `reply = commitment`.
-- Latest reply asserts done or argues a position (`"done"`, `"fixed in <sha>"`, `"no, because X"`) → `reply = substantive`.
-
-Don't over-classify; if the reply is ambiguous, treat as `substantive` and let the human read it.
-
-#### d. Final status
-
-| `code_change` | `reply` | Status |
-|---|---|---|
-| yes, addresses the comment | any | **Addressed** |
-| yes, diverges from the comment | any | **Discussion needed** |
-| yes, can't tell from the diff | any | **Unclear** |
-| no | `substantive` | **Discussion needed** |
-| no | `commitment` | **Pending** |
-| no | `none` | **Silently dropped** |
-
-Whether a code change "addresses" the comment is an LLM judgement. **Be conservative.** Default to **Unclear** when in doubt — a false **Addressed** misleads the reviewer into resolving a thread that should stay open.
-
-## Output
-
-Append one section to the main report, between Codemap and CONTEXT:
-
-```
-Prior-comment follow-up — @<reviewer> at <prior_sha_short>
-───────────────────────────────────────────────────────────
-[A] <c.path>:<original_line>
-    Asked:   <≤12-word recap of the comment's request>
-    Status:  ✓ Addressed
-    Cite:    <c.path>:<line at HEAD>  ← changed in <commit_short>; reads
-             "<2-line excerpt of the new code>"
-
-[B] <c.path>:<original_line>
-    Asked:   ...
-    Status:  💬 Discussion needed
-    Cite:    Author replied: "<≤30-char excerpt>"; <c.path>:<line> unchanged.
-
-[C] <c.path>:<original_line>
-    Asked:   ...
-    Status:  ⏳ Pending
-    Cite:    Author replied "will fix" but <c.path>:<line> unchanged.
-
-[D] ...
-    Status:  ⚠ Silently dropped
-    Cite:    No reply; <c.path>:<line> unchanged since <prior_sha_short>.
-
-[E] ...
-    Status:  🚫 Moot (file removed in <commit_short>)
-
-[F] ...
-    Status:  ❓ Unclear
-    Cite:    Code at <c.path>:<line> changed in <commit_short>, but the
-             new shape doesn't obviously match what was asked. Re-read
-             the thread.
-```
-
-Sort entries by status, with what-needs-attention first: `Discussion needed` → `Pending` → `Silently dropped` → `Unclear` → `Addressed` → `Moot`. The reviewer's eye lands first on what still needs attention.
-
-Stable identifiers `[A]`, `[B]`, ... let new findings reference an old comment with `Re: [A]`.
-
-If `<reviewer_filter>` matched multiple logins, group by reviewer with a sub-heading.
-
-## Rules
-
-1. **Conservative defaults.** When the LLM cannot tell whether a code change addresses a comment, status is **Unclear**, never **Addressed**. False "Addressed" is worse than no claim — it tricks the reviewer into closing a thread that shouldn't close.
-2. **Cite, don't paraphrase.** Every `Cite:` line ends with a `<path>:<line>` at HEAD, a `"removed in <commit>"`, or a short reply excerpt. The reviewer must be one click from verifying.
-3. **No sentiment parsing.** Don't infer satisfaction from `:+1:` reactions or "ok thanks". The reviewer makes that judgement, not the AI.
-4. **Don't quote the original comment.** The reviewer wrote it; they remember. The output's job is the *delta* — what changed about the comment's status since it was made.
-5. **Read-only.** No `gh pr review`, no comment-resolution endpoints, no mutating verbs. Even if the AI is certain everything is fine, the human resolves.
-6. **One round per reviewer.** Triage the latest review per reviewer only. Earlier rounds are out of scope; if they matter, the reviewer leaves a fresh round.
-
-## Used by
-
-- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) — invoked from the new follow-up triage step when prior human reviews exist on the PR.

From b04e4f56d2c96f70ec0f67b3ea5e874d036369c7 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Tue, 5 May 2026 18:43:58 +0530
Subject: [PATCH 11/14] feat: add /blame-context and /pr-synthesis slash
 commands with worked examples

---
 .claude/commands/blame-context.md         | 52 +++++++++++++++++++++
 .claude/commands/pr-synthesis.md          | 57 +++++++++++++++++++++++
 docs/skills/git-blame-historic-context.md | 30 ++++++++++++
 docs/skills/pr-context-synthesis.md       | 21 +++++++++
 4 files changed, 160 insertions(+)
 create mode 100644 .claude/commands/blame-context.md
 create mode 100644 .claude/commands/pr-synthesis.md

diff --git a/.claude/commands/blame-context.md b/.claude/commands/blame-context.md
new file mode 100644
index 0000000000..bfddef651c
--- /dev/null
+++ b/.claude/commands/blame-context.md
@@ -0,0 +1,52 @@
+---
+description: Recover historic context for a file:line via `git blame` plus the squash-PR pivot. Wraps `docs/skills/git-blame-historic-context.md` end-to-end and prints the strengthens / weakens / inverts decision so the caller can weight a finding before flagging suspicious-looking code. Read-only.
+---
+
+Look up historic context for: $ARGUMENTS
+
+## Parse $ARGUMENTS
+
+Accept any of:
+
+- `path:line` — single line. e.g. `crates/driver/src/.../settlement.rs:444`
+- `path:start-end` — line range. e.g. `crates/driver/src/.../settlement.rs:434-447`
+- `path line` or `path start-end` — space-separated form (handy when paths contain `:`).
+
+If `$ARGUMENTS` is empty or unparseable, print the usage block and abort:
+
+```
+Usage: /blame-context <path>:<line>
+       /blame-context <path>:<start>-<end>
+```
+
+## Procedure
+
+Follow `docs/skills/git-blame-historic-context.md` end-to-end. Concretely:
+
+1. `git blame -L <start>,<end> -- <path>` to find the originating commit.
+2. If the surface looks like a wholesale move/refactor (same author across many lines, recent date, identical hashes), retry with `git blame -w -C -C -C -L <start>,<end> -- <path>` to recover the real authoring commit.
+3. `git log -1 --format='%s%n%b' <sha>` for the commit body.
+4. If the subject ends with `(#NNNN)`, pivot to the PR conversation: `gh pr view <NNNN> -R <owner>/<repo>`. The PR body is usually richer than the squash commit alone.
+
+Then print the report below.
+
+## Output
+
+```
+─── Blame for <path>:<lines>
+<sha>  <author>  <date>  <subject>
+
+─── Originating commit / PR
+<commit body, OR PR title + body if a (#NNNN) PR was found>
+
+─── Decision
+<Strengthens | Weakens | Inverts> the suspicion that this code is unusual.
+<one-sentence reason naming the concrete signal from the originating PR/commit>
+Action: <keep finding | downgrade to Question | drop>
+```
+
+## Rules
+
+- **Read-only.** `git blame`, `git log`, `git show`, `gh pr view`, `gh api` GET-only. No `git commit`, no `git checkout`, no mutating `gh api` verbs.
+- **Don't comment on the PR.** Don't edit files. The caller owns what to do with the decision — this command just supplies the evidence.
+- **Don't invent a decision.** If blame surfaces nothing useful (line is brand new, generator-output, vendored), print `Decision: insufficient signal` and explain why.
diff --git a/.claude/commands/pr-synthesis.md b/.claude/commands/pr-synthesis.md
new file mode 100644
index 0000000000..e1a2b23400
--- /dev/null
+++ b/.claude/commands/pr-synthesis.md
@@ -0,0 +1,57 @@
+---
+description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Useful as a standalone summary or as a building block for review / incident-investigation flows. Read-only.
+---
+
+Synthesise PR: $ARGUMENTS
+
+## Parse $ARGUMENTS
+
+Accept any of:
+
+- A PR number: `4267`
+- A full URL: `https://github.com/cowprotocol/services/pull/4267`
+- An `owner/repo#N` form: `cowprotocol/services#4267`
+
+Default `owner/repo` to `cowprotocol/services` when only a number is given.
+
+If `$ARGUMENTS` is empty or unparseable, print:
+
+```
+Usage: /pr-synthesis <PR_NUMBER>
+       /pr-synthesis https://github.com/owner/repo/pull/<N>
+       /pr-synthesis owner/repo#<N>
+```
+
+and abort.
+
+## Procedure
+
+Fetch in parallel:
+
+```bash
+gh pr view <N> -R <owner>/<repo> --json title,body,files,labels,baseRefName,headRefName
+gh pr diff <N> -R <owner>/<repo>
+```
+
+Parse the PR body for `Fixes #N` / `Closes #N` / `Resolves #N` (case-insensitive) and fetch each linked issue:
+
+```bash
+gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
+```
+
+If no linked issue is referenced, proceed without one — never invent one.
+
+Build:
+
+- `<pr_text>` = title + body
+- `<linked_issue>` = the fetched issue, if any
+- `<diff_summary>` = a per-file note of the change scope (path + brief description, drawn from the diff hunks)
+
+Then follow `docs/skills/pr-context-synthesis.md` Rules and Shape. Output the synthesis verbatim — no header, no metadata, no separator lines. The caller pastes this directly into a review thread, an incident report, or a Slack message.
+
+## Rules
+
+- **Read-only.** `gh pr view`, `gh pr diff`, `gh issue view`, `gh api` GET-only. No `gh pr review`, no `gh pr comment`, no mutating `gh api` verbs.
+- **Synthesise, don't copy-paste.** If `<pr_text>` is sparse, say so plainly: *"description is minimal; intent inferred from diff"*.
+- **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism. The anti-vague-verb rule from the skill doc applies verbatim.
+- **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s current state. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*.
diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md
index bcac34cac6..1a13d8990c 100644
--- a/docs/skills/git-blame-historic-context.md
+++ b/docs/skills/git-blame-historic-context.md
@@ -2,6 +2,36 @@
 
 Use before flagging code that looks unusual, redundant, accidental, or "easy to clean up". Often that code looks weird because it had to.
 
+## How to invoke
+
+Two ways:
+
+- **Procedurally** — follow the steps below when you encounter unusual-looking code in any context (PR review, order-debug session, ad-hoc *"why is this here?"*).
+- **Via slash command** — `/blame-context <path>:<line>` wraps the procedure end-to-end and prints the strengthens / weakens / inverts decision. Useful when you want a one-shot answer rather than walking the procedure manually.
+
+## Example
+
+A reviewer sees this in `crates/driver/src/domain/competition/solution/settlement.rs:444`:
+
+```rust
+let max_gas = eth::Gas(block_limit.0 / eth::U256::from(2));
+```
+
+`/2` looks arbitrary — a reviewer might be tempted to flag it as a magic number. Run the procedure first:
+
+```bash
+git blame -L 444,444 -- crates/driver/src/domain/competition/solution/settlement.rs
+# → a4ee76aae3  Felix Leupold  2024-03-18  ...
+
+git log -1 --format='%s%n%b' a4ee76aae3
+# → subject ends with `(#NNNN)`; pivot to the PR
+gh pr view <NNNN> -R cowprotocol/services
+# → body: "block builders' default algorithm picks tx whose gas limit
+#    fits remaining space; leave headroom for inclusion."
+```
+
+Decision: **weakens** the "magic number" suspicion — the constant has documented inclusion-economics rationale. Drop the finding, or downgrade to a `Question:` confirming the rationale still applies on the chain in question.
+
 ## Procedure
 
 ```bash
diff --git a/docs/skills/pr-context-synthesis.md b/docs/skills/pr-context-synthesis.md
index 9e898d5de7..0afbd8b480 100644
--- a/docs/skills/pr-context-synthesis.md
+++ b/docs/skills/pr-context-synthesis.md
@@ -2,6 +2,13 @@
 
 Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR (or PR-shaped change). Consumed by the PR review report's CONTEXT section and by [`pr-blame-walk`](pr-blame-walk.md), which calls this skill once per candidate PR before scoring.
 
+## How to invoke
+
+Two ways:
+
+- **Procedurally** — follow the rules below when you need a tight 1–3 paragraph synthesis (CONTEXT block of `/review-pr`, per-candidate block in `pr-blame-walk`, ad-hoc *"summarise this PR for me"*).
+- **Via slash command** — `/pr-synthesis <N|owner/repo#N|url>` fetches the PR, linked issue, and diff for you and prints the synthesis verbatim.
+
 ## Inputs
 
 - `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), substitute the branch name plus the relevant commit messages.
@@ -20,6 +27,20 @@ Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR
 - **Paragraph 2** — *why*. Drawn from `<pr_text>` and `<linked_issue>`. If both are thin, say so.
 - **Paragraph 3** (only if warranted) — *how*. The approach, not a line-by-line walkthrough.
 
+## Example
+
+Inputs (real PR — cowprotocol/services#4371):
+
+- `<pr_text>` — *"Enforce EIP-7825 per-tx gas cap on settlement"*; body explains the Fusaka mempool cap and references the existing quote-side enforcement (#4261).
+- `<linked_issue>` — #4368, *"Driver doesn't enforce the EIP-7825 per-tx gas cap"* (labels: bug, good first issue).
+- `<diff_summary>` — +58 −1 in `settlement.rs`; new const `EIP_7825_TX_GAS_CAP`, `Gas::new` capped via `min(half_block, EIP_7825)`, three tests.
+
+Output:
+
+> Fusaka introduced EIP-7825, capping any single tx at 2^24 − 1 gas. The driver's `Gas::new` was applying only the older `block_limit/2` heuristic; a solution between the EIP-7825 cap and that heuristic could pass validation in `/solve` and never settle. Issue #4368 flagged this as theoretical (no production hits); the fix ports the same idea PR #4261 applied to the quote path.
+>
+> Mechanically: `max_gas = min(half_block, EIP_7825)`, preserving inclusion-economics on chains with low block gas limits via the `min`.
+
 ## When to skip
 
 - Trivial changes (docs typo, single-line dep bump, lockfile-only). One sentence is enough; don't force the three-paragraph shape.

From 6f58e13ba9175bf65d60dd343271c5f874f101f2 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Tue, 5 May 2026 19:43:16 +0530
Subject: [PATCH 12/14] =?UTF-8?q?fix:=20address=20Gemini=20review=20?=
 =?UTF-8?q?=E2=80=94=20headRefOid,=20closingIssuesReferences,=20drop=20unu?=
 =?UTF-8?q?sed=20-R=20flags,=20align=20drift=20rule?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .claude/commands/blame-context.md         | 2 +-
 .claude/commands/pr-synthesis.md          | 8 ++++----
 docs/COW_PR_REVIEW_SKILL.md               | 8 ++++----
 docs/skills/git-blame-historic-context.md | 4 ++--
 docs/skills/pr-context-synthesis.md       | 2 +-
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/.claude/commands/blame-context.md b/.claude/commands/blame-context.md
index bfddef651c..41e0a3a8cb 100644
--- a/.claude/commands/blame-context.md
+++ b/.claude/commands/blame-context.md
@@ -26,7 +26,7 @@ Follow `docs/skills/git-blame-historic-context.md` end-to-end. Concretely:
 1. `git blame -L <start>,<end> -- <path>` to find the originating commit.
 2. If the surface looks like a wholesale move/refactor (same author across many lines, recent date, identical hashes), retry with `git blame -w -C -C -C -L <start>,<end> -- <path>` to recover the real authoring commit.
 3. `git log -1 --format='%s%n%b' <sha>` for the commit body.
-4. If the subject ends with `(#NNNN)`, pivot to the PR conversation: `gh pr view <NNNN> -R <owner>/<repo>`. The PR body is usually richer than the squash commit alone.
+4. If the subject ends with `(#NNNN)`, pivot to the PR conversation: `gh pr view <NNNN>` (the gh CLI infers the repo from the working directory). The PR body is usually richer than the squash commit alone.
 
 Then print the report below.
 
diff --git a/.claude/commands/pr-synthesis.md b/.claude/commands/pr-synthesis.md
index e1a2b23400..558122896f 100644
--- a/.claude/commands/pr-synthesis.md
+++ b/.claude/commands/pr-synthesis.md
@@ -29,17 +29,17 @@ and abort.
 Fetch in parallel:
 
 ```bash
-gh pr view <N> -R <owner>/<repo> --json title,body,files,labels,baseRefName,headRefName
+gh pr view <N> -R <owner>/<repo> --json title,body,files,labels,baseRefName,headRefName,closingIssuesReferences
 gh pr diff <N> -R <owner>/<repo>
 ```
 
-Parse the PR body for `Fixes #N` / `Closes #N` / `Resolves #N` (case-insensitive) and fetch each linked issue:
+For each entry in `closingIssuesReferences` (GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N`, more reliable than regexing the body), fetch the issue:
 
 ```bash
-gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
+gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state
 ```
 
-If no linked issue is referenced, proceed without one — never invent one.
+If `closingIssuesReferences` is empty, proceed without a linked issue — never invent one.
 
 Build:
 
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
index c3a111357c..7a6d7855ee 100644
--- a/docs/COW_PR_REVIEW_SKILL.md
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -67,7 +67,7 @@ Run in **parallel** (single message, multiple Bash tool calls):
 
 ```bash
 gh pr view <PR_NUMBER> -R <owner>/<repo> \
-  --json title,body,author,labels,files,baseRefName,headRefName,additions,deletions,commits,reviewDecision,isDraft,state
+  --json title,body,author,labels,files,baseRefName,headRefName,headRefOid,additions,deletions,commits,reviewDecision,isDraft,state,closingIssuesReferences
 
 gh pr diff <PR_NUMBER> -R <owner>/<repo>
 ```
@@ -80,13 +80,13 @@ gh pr diff <PR_NUMBER> --patch -R <owner>/<repo>
 
 ### Linked issues
 
-Parse the PR body for `Fixes #<N>`, `Closes #<N>`, `Resolves #<N>` (case-insensitive). Fetch each in parallel with the above:
+The `gh pr view` call above already returns `closingIssuesReferences` — GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N` from the PR body. Iterate that array (in parallel with the diff fetch) and pull each issue:
 
 ```bash
-gh issue view <N> -R <owner>/<repo> --json title,body,labels,state
+gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state
 ```
 
-If no linked issue is referenced, proceed without one. Do not manufacture one.
+If `closingIssuesReferences` is empty, proceed without one. Do not manufacture one.
 
 ### State handling
 
diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md
index 1a13d8990c..a9da2a9db1 100644
--- a/docs/skills/git-blame-historic-context.md
+++ b/docs/skills/git-blame-historic-context.md
@@ -25,7 +25,7 @@ git blame -L 444,444 -- crates/driver/src/domain/competition/solution/settlement
 
 git log -1 --format='%s%n%b' a4ee76aae3
 # → subject ends with `(#NNNN)`; pivot to the PR
-gh pr view <NNNN> -R cowprotocol/services
+gh pr view <NNNN>
 # → body: "block builders' default algorithm picks tx whose gas limit
 #    fits remaining space; leave headroom for inclusion."
 ```
@@ -41,7 +41,7 @@ git blame -L <start>,<end> -- <path>                # who/what/when
 # subject ends with "(#NNNN)" — extract the PR number, then pivot to the
 # PR conversation, which is usually richer than the commit body alone.
 git log -1 --format='%s%n%b' <sha>
-gh pr view <NNNN> -R <owner>/<repo>
+gh pr view <NNNN>
 ```
 
 ## Decision
diff --git a/docs/skills/pr-context-synthesis.md b/docs/skills/pr-context-synthesis.md
index 0afbd8b480..dbc7ff355d 100644
--- a/docs/skills/pr-context-synthesis.md
+++ b/docs/skills/pr-context-synthesis.md
@@ -18,7 +18,7 @@ Two ways:
 ## Rules
 
 1. **Synthesize, don't copy-paste.** If `<pr_text>` is five words, say so plainly: *"description is minimal; intent inferred from diff"*. Don't pad to look thorough.
-2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, raise a finding with `Action: update the PR description to match the current diff`. Do **not** flag the absence of a changelog of removed/superseded behaviour — that belongs in commit history, not the description.
+2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*. Don't raise an `Action:` finding here — this skill reports facts; the consumer (e.g. `/review-pr`) decides whether to escalate. Do **not** flag the absence of a changelog of removed/superseded behaviour either — that belongs in commit history, not the description.
 3. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
 
 ## Shape

From c867cbcbd6f06a157e53147569120b97e18ba3ca Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Wed, 6 May 2026 20:35:57 +0530
Subject: [PATCH 13/14] address Jose review on /pr-synthesis: size-gate diff
 fetch, reorder fetch+inputs, cosmetic fixes

---
 .claude/commands/pr-synthesis.md    | 37 ++++++++++++++++++++++-------
 docs/skills/pr-context-synthesis.md | 10 ++++----
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/.claude/commands/pr-synthesis.md b/.claude/commands/pr-synthesis.md
index 558122896f..f813adbc6b 100644
--- a/.claude/commands/pr-synthesis.md
+++ b/.claude/commands/pr-synthesis.md
@@ -1,5 +1,5 @@
 ---
-description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Useful as a standalone summary or as a building block for review / incident-investigation flows. Read-only.
+description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Read-only.
 ---
 
 Synthesise PR: $ARGUMENTS
@@ -8,7 +8,7 @@ Synthesise PR: $ARGUMENTS
 
 Accept any of:
 
-- A PR number: `4267`
+- A PR number: `4267`/`#4267`
 - A full URL: `https://github.com/cowprotocol/services/pull/4267`
 - An `owner/repo#N` form: `cowprotocol/services#4267`
 
@@ -26,14 +26,15 @@ and abort.
 
 ## Procedure
 
-Fetch in parallel:
+Fetch in this order — context first, then diff. Lets the synthesis read the diff with the author's intent already in mind, and lets `gh pr view` report `additions`/`deletions` so the diff fetch can be sized appropriately.
+
+**Step 1 — PR metadata + closing issues.** Cheap; bounded size.
 
 ```bash
-gh pr view <N> -R <owner>/<repo> --json title,body,files,labels,baseRefName,headRefName,closingIssuesReferences
-gh pr diff <N> -R <owner>/<repo>
+gh pr view <N> -R <owner>/<repo> --json title,body,files,additions,deletions,labels,baseRefName,headRefName,closingIssuesReferences
 ```
 
-For each entry in `closingIssuesReferences` (GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N`, more reliable than regexing the body), fetch the issue:
+**Step 2 — linked issues.** For each entry in `closingIssuesReferences` (GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N`, more reliable than regexing the body, and follows cross-repo references correctly), fetch the issue:
 
 ```bash
 gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state
@@ -41,17 +42,35 @@ gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name>
 
 If `closingIssuesReferences` is empty, proceed without a linked issue — never invent one.
 
+**Step 3 — diff fetch (size-gated).** Use `additions + deletions` from step 1 to decide:
+
+- **`additions + deletions <= 2000`** — fetch the full diff:
+
+  ```bash
+  gh pr diff <N> -R <owner>/<repo>
+  ```
+
+- **`additions + deletions > 2000`** — *do not* fetch the full diff. It can blow the context window (e.g. PR #4217 = 376k lines). Get the complete per-file list via the paginated REST endpoint — `gh pr view --json files` silently caps at 100 files, which underreports scope on big PRs:
+
+  ```bash
+  gh api --paginate "repos/<owner>/<repo>/pulls/<N>/files" \
+    --jq '.[] | {filename, status, additions, deletions}'
+  ```
+
+  Build `<diff_summary>` from these per-file records (`filename`, `additions`, `deletions`, `status` — added / modified / renamed / removed). Bucket lockfiles, generated bindings, vendored artifacts, and codegen output — call them out as a single line each, not file-by-file. State in the synthesis that the diff was summarised at file-scope only.
+
 Build:
 
+- `<diff_summary>` = full hunks (small PR) or per-file scope buckets (large PR). The ground truth.
 - `<pr_text>` = title + body
-- `<linked_issue>` = the fetched issue, if any
-- `<diff_summary>` = a per-file note of the change scope (path + brief description, drawn from the diff hunks)
+- `<linked_issue>` = fetched issue(s), if any
 
 Then follow `docs/skills/pr-context-synthesis.md` Rules and Shape. Output the synthesis verbatim — no header, no metadata, no separator lines. The caller pastes this directly into a review thread, an incident report, or a Slack message.
 
 ## Rules
 
 - **Read-only.** `gh pr view`, `gh pr diff`, `gh issue view`, `gh api` GET-only. No `gh pr review`, no `gh pr comment`, no mutating `gh api` verbs.
+- **Don't fetch a diff you can't read.** The 2000-line gate above is a hard preflight, not a suggestion. Above the gate, the per-file scope is enough to write the synthesis honestly; trying to swallow a 100k-line diff just truncates the context and corrupts the output silently.
 - **Synthesise, don't copy-paste.** If `<pr_text>` is sparse, say so plainly: *"description is minimal; intent inferred from diff"*.
 - **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism. The anti-vague-verb rule from the skill doc applies verbatim.
-- **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s current state. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*.
+- **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s current state. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*. Fetching the issue + metadata before the diff is what lets you spot these — read the intent first, then check whether the diff matches.
diff --git a/docs/skills/pr-context-synthesis.md b/docs/skills/pr-context-synthesis.md
index dbc7ff355d..941b5f22e5 100644
--- a/docs/skills/pr-context-synthesis.md
+++ b/docs/skills/pr-context-synthesis.md
@@ -1,19 +1,21 @@
 # Skill — PR context synthesis
 
-Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR (or PR-shaped change). Consumed by the PR review report's CONTEXT section and by [`pr-blame-walk`](pr-blame-walk.md), which calls this skill once per candidate PR before scoring.
+Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR (or PR-shaped change). Consumed by the PR review report's CONTEXT section and by other workflows that need a per-PR summary (e.g. ad-hoc *"summarise this PR for me"* or per-candidate context in an incident-investigation walk).
 
 ## How to invoke
 
 Two ways:
 
-- **Procedurally** — follow the rules below when you need a tight 1–3 paragraph synthesis (CONTEXT block of `/review-pr`, per-candidate block in `pr-blame-walk`, ad-hoc *"summarise this PR for me"*).
+- **Procedurally** — follow the rules below when you need a tight 1–3 paragraph synthesis (CONTEXT block of `/review-pr`, ad-hoc *"summarise this PR for me"*, per-candidate context for an investigation).
 - **Via slash command** — `/pr-synthesis <N|owner/repo#N|url>` fetches the PR, linked issue, and diff for you and prints the synthesis verbatim.
 
 ## Inputs
 
-- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), substitute the branch name plus the relevant commit messages.
+Listed ground-truth-first; downstream items are interpreted relative to the diff.
+
+- `<diff_summary>` — file scope plus a codemap or per-file note of the actual change. The ground truth the synthesis must stay anchored to. For large PRs this may be file-scope only (paths + ±counts + change type), without hunks — `/pr-synthesis` falls back to that when the diff is too big to fetch in full.
+- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), use the current branch name and the relevant commit messages (i.e. diff against `main`).
 - `<linked_issue>` — title and body of any issue referenced via `Fixes #N` / `Closes #N` / `Resolves #N`. May be empty.
-- `<diff_summary>` — file scope plus a codemap or per-file note of the actual change. The ground truth the synthesis must stay anchored to.
 
 ## Rules
 

From eb4c869718a71c8cb34fb9cd00e7a061a359e365 Mon Sep 17 00:00:00 2001
From: Aryan Godara <aryangodara03@gmail.com>
Date: Wed, 6 May 2026 21:12:09 +0530
Subject: [PATCH 14/14] address Jose review on git-blame skill: reframe 'when
 to invoke', add second worked example, typo + cosmetic fixes

---
 docs/skills/git-blame-historic-context.md | 49 ++++++++++++++++++++---
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md
index a9da2a9db1..6982fecfe3 100644
--- a/docs/skills/git-blame-historic-context.md
+++ b/docs/skills/git-blame-historic-context.md
@@ -1,15 +1,21 @@
 # Skill — `git blame` for historic context
 
-Use before flagging code that looks unusual, redundant, accidental, or "easy to clean up". Often that code looks weird because it had to.
+Use before flagging code that looks unusual, redundant, accidental, or "easy to clean up". Often, certain decisions led to sub-optimal-looking code, and those decisions are codified in git history rather than the code itself.
+
+## When to invoke
+
+Not on first sighting. Build the full picture first — read the change / PR / file end-to-end and *collect* lines that look off during the pass — then run blame on each candidate. Code that looks weird is often being partially fixed, moved, or replaced by the diff you're currently reading; suspicion based on a single line, before you've seen its neighbours and the rest of the diff, is unreliable.
 
 ## How to invoke
 
 Two ways:
 
-- **Procedurally** — follow the steps below when you encounter unusual-looking code in any context (PR review, order-debug session, ad-hoc *"why is this here?"*).
+- **Procedurally** — follow the steps below.
 - **Via slash command** — `/blame-context <path>:<line>` wraps the procedure end-to-end and prints the strengthens / weakens / inverts decision. Useful when you want a one-shot answer rather than walking the procedure manually.
 
-## Example
+## Examples
+
+### Example 1 — magic constant (weakens)
 
 A reviewer sees this in `crates/driver/src/domain/competition/solution/settlement.rs:444`:
 
@@ -32,6 +38,39 @@ gh pr view <NNNN>
 
 Decision: **weakens** the "magic number" suspicion — the constant has documented inclusion-economics rationale. Drop the finding, or downgrade to a `Question:` confirming the rationale still applies on the chain in question.
 
+### Example 2 — defensive process exit (inverts)
+
+A reviewer sees this in `crates/observe/src/panic_hook.rs:14-15`:
+
+```rust
+let new_hook = move |info: &std::panic::PanicHookInfo| {
+    previous_hook(info);
+    std::process::exit(1);
+};
+```
+
+Hard-killing the process from inside a panic hook looks aggressive — surely a panic should be reported and recovered from, not nuke the whole binary? Run the procedure first:
+
+```bash
+git blame -L 14,15 -- crates/observe/src/panic_hook.rs
+# → 8b918a02df  Valentin  2022-09-12  ...
+# (file moved from crates/shared/src/exit_process_on_panic.rs; the
+# whitespace-insensitive copy-detection retry is unnecessary here
+# because the same SHA blames the moved lines.)
+
+git log -1 --format='%s%n%b' 8b918a02df
+# → "Exit process if thread panics (#530)
+#    Fixes #514 ."
+
+gh pr view 530
+# → body links to issue #514, whose body reads:
+#   "We spawn background tasks through `tokio::task::spawn` … a panic
+#    in a spawned task/thread does not affect the rest of the program.
+#    In these cases we want the whole program to exit."
+```
+
+Decision: **inverts** the suspicion. Removing the `process::exit(1)` would re-introduce exactly the silent-corruption failure mode the original PR fixed — a background panic (e.g. a cache updater) would be swallowed by tokio and the rest of the process would carry on with stale state. Drop the finding entirely; *suggesting* this change would ask the author to undo a deliberate cross-process invariant.
+
 ## Procedure
 
 ```bash
@@ -46,7 +85,7 @@ gh pr view <NNNN>
 
 ## Decision
 
-Promote what blame reveals into the finding's Explanation, then weight the finding:
+Promote what blame reveals into the finding's Explanation, then weigh the finding:
 
 - **Strengthens** — surrounding code was added recently for a reason the diff now contradicts. Keep / raise severity.
 - **Weakens** — the originating PR explains *why* the shape is unusual (deliberate workaround, perf fix, cross-version compat). Soften, or pivot from `Action:` to `Question:`. Example: a two-line guard looked over-defensive; blame showed it was a hot-patch lifted from prod logs — Medium → Question, asking whether the failure mode still applies.
@@ -54,7 +93,7 @@ Promote what blame reveals into the finding's Explanation, then weight the findi
 
 ## Edge cases
 
-- **Merge commit, not squash** — rare here; inspect with `git log -1 --format='%s%n%b' <sha>` and walk parents (`<sha>^1`, `^2`) to find the commit that actually authored the line.
+- **Merge commit, not squash** — inspect with `git log -1 --format='%s%n%b' <sha>` and walk parents (`<sha>^1`, `^2`) to find the commit that actually authored the line.
 - **Same author as the PR under review, recent** — context is fresh; ask the author directly in the review thread instead of synthesising blame.
 - **Refactor moved code wholesale** — surface blame points at the move, not the originating fix. Use `git blame -w -C -C -C -L <start>,<end> -- <path>` (whitespace-insensitive, copy-detection) to recover the real authoring commit.
 - **Vendored / generated / contract-binding code** — blame the generator's input (upstream config, source `.sol`, codegen template). Skip if the surface is a JSON ABI or lockfile.