diff --git a/.claude/commands/blame-context.md b/.claude/commands/blame-context.md
new file mode 100644
index 0000000000..41e0a3a8cb
--- /dev/null
+++ b/.claude/commands/blame-context.md
@@ -0,0 +1,52 @@
+---
+description: Recover historic context for a file:line via `git blame` plus the squash-PR pivot. Wraps `docs/skills/git-blame-historic-context.md` end-to-end and prints the strengthens / weakens / inverts decision so the caller can weight a finding before flagging suspicious-looking code. Read-only.
+---
+
+Look up historic context for: $ARGUMENTS
+
+## Parse $ARGUMENTS
+
+Accept any of:
+
+- `path:line` — single line. e.g. `crates/driver/src/.../settlement.rs:444`
+- `path:start-end` — line range. e.g. `crates/driver/src/.../settlement.rs:434-447`
+- `path line` or `path start-end` — space-separated form (handy when paths contain `:`).
+
+If `$ARGUMENTS` is empty or unparseable, print the usage block and abort:
+
+```
+Usage: /blame-context <path>:<line>
+       /blame-context <path>:<start>-<end>
+```
+
+## Procedure
+
+Follow `docs/skills/git-blame-historic-context.md` end-to-end. Concretely:
+
+1. `git blame -L <start>,<end> -- <path>` to find the originating commit.
+2. If the surface looks like a wholesale move/refactor (same author across many lines, recent date, identical hashes), retry with `git blame -w -C -C -C -L <start>,<end> -- <path>` to recover the real authoring commit.
+3. `git log -1 --format='%s%n%b' <sha>` for the commit body.
+4. If the subject ends with `(#NNNN)`, pivot to the PR conversation: `gh pr view <NNNN>` (the gh CLI infers the repo from the working directory). The PR body is usually richer than the squash commit alone.
+
+Then print the report below.
+
+## Output
+
+```
+─── Blame for <path>:<lines>
+<sha>  <author>  <date>  <subject>
+
+─── Originating commit / PR
+<commit body, OR PR title + body if a (#NNNN) PR was found>
+
+─── Decision
+<Strengthens | Weakens | Inverts> the suspicion that this code is unusual.
+<one-sentence reason naming the concrete signal from the originating PR/commit>
+Action: <keep finding | downgrade to Question | drop>
+```
+
+## Rules
+
+- **Read-only.** `git blame`, `git log`, `git show`, `gh pr view`, `gh api` GET-only. No `git commit`, no `git checkout`, no mutating `gh api` verbs.
+- **Don't comment on the PR.** Don't edit files. The caller owns what to do with the decision — this command just supplies the evidence.
+- **Don't invent a decision.** If blame surfaces nothing useful (line is brand new, generator-output, vendored), print `Decision: insufficient signal` and explain why.
diff --git a/.claude/commands/pr-synthesis.md b/.claude/commands/pr-synthesis.md
new file mode 100644
index 0000000000..f813adbc6b
--- /dev/null
+++ b/.claude/commands/pr-synthesis.md
@@ -0,0 +1,76 @@
+---
+description: Produce a tight 1–3 paragraph what / why / how synthesis of a single PR. Fetches title, body, linked issue, and diff scope; outputs per `docs/skills/pr-context-synthesis.md`. Read-only.
+---
+
+Synthesise PR: $ARGUMENTS
+
+## Parse $ARGUMENTS
+
+Accept any of:
+
+- A PR number: `4267`/`#4267`
+- A full URL: `https://github.com/cowprotocol/services/pull/4267`
+- An `owner/repo#N` form: `cowprotocol/services#4267`
+
+Default `owner/repo` to `cowprotocol/services` when only a number is given.
+
+If `$ARGUMENTS` is empty or unparseable, print:
+
+```
+Usage: /pr-synthesis <PR_NUMBER>
+       /pr-synthesis https://github.com/owner/repo/pull/<N>
+       /pr-synthesis owner/repo#<N>
+```
+
+and abort.
+
+## Procedure
+
+Fetch in this order — context first, then diff. Lets the synthesis read the diff with the author's intent already in mind, and lets `gh pr view` report `additions`/`deletions` so the diff fetch can be sized appropriately.
+
+**Step 1 — PR metadata + closing issues.** Cheap; bounded size.
+
+```bash
+gh pr view <N> -R <owner>/<repo> --json title,body,files,additions,deletions,labels,baseRefName,headRefName,closingIssuesReferences
+```
+
+**Step 2 — linked issues.** For each entry in `closingIssuesReferences` (GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N`, more reliable than regexing the body, and follows cross-repo references correctly), fetch the issue:
+
+```bash
+gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state
+```
+
+If `closingIssuesReferences` is empty, proceed without a linked issue — never invent one.
+
+**Step 3 — diff fetch (size-gated).** Use `additions + deletions` from step 1 to decide:
+
+- **`additions + deletions <= 2000`** — fetch the full diff:
+
+  ```bash
+  gh pr diff <N> -R <owner>/<repo>
+  ```
+
+- **`additions + deletions > 2000`** — *do not* fetch the full diff. It can blow the context window (e.g. PR #4217 = 376k lines). Get the complete per-file list via the paginated REST endpoint — `gh pr view --json files` silently caps at 100 files, which underreports scope on big PRs:
+
+  ```bash
+  gh api --paginate "repos/<owner>/<repo>/pulls/<N>/files" \
+    --jq '.[] | {filename, status, additions, deletions}'
+  ```
+
+  Build `<diff_summary>` from these per-file records (`filename`, `additions`, `deletions`, `status` — added / modified / renamed / removed). Bucket lockfiles, generated bindings, vendored artifacts, and codegen output — call them out as a single line each, not file-by-file. State in the synthesis that the diff was summarised at file-scope only.
+
+Build:
+
+- `<diff_summary>` = full hunks (small PR) or per-file scope buckets (large PR). The ground truth.
+- `<pr_text>` = title + body
+- `<linked_issue>` = fetched issue(s), if any
+
+Then follow `docs/skills/pr-context-synthesis.md` Rules and Shape. Output the synthesis verbatim — no header, no metadata, no separator lines. The caller pastes this directly into a review thread, an incident report, or a Slack message.
+
+## Rules
+
+- **Read-only.** `gh pr view`, `gh pr diff`, `gh issue view`, `gh api` GET-only. No `gh pr review`, no `gh pr comment`, no mutating `gh api` verbs.
+- **Don't fetch a diff you can't read.** The 2000-line gate above is a hard preflight, not a suggestion. Above the gate, the per-file scope is enough to write the synthesis honestly; trying to swallow a 100k-line diff just truncates the context and corrupts the output silently.
+- **Synthesise, don't copy-paste.** If `<pr_text>` is sparse, say so plainly: *"description is minimal; intent inferred from diff"*.
+- **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism. The anti-vague-verb rule from the skill doc applies verbatim.
+- **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s current state. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*. Fetching the issue + metadata before the diff is what lets you spot these — read the intent first, then check whether the diff matches.
diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md
new file mode 100644
index 0000000000..1777b3b353
--- /dev/null
+++ b/.claude/commands/review-pr.md
@@ -0,0 +1,135 @@
+---
+description: Produce a structured PR review for cowprotocol/services. Invoked locally as `/review-pr` (diff mode, against current branch vs main) or `/review-pr <N|url>` (PR mode). Same command also runs in CI via `.github/workflows/claude-code-review.yml`, where it posts a single review comment instead of printing to terminal. Read-only in local modes; the user posts any comments manually.
+---
+
+Review PR: $ARGUMENTS
+
+Follow the instructions in `./docs/COW_PR_REVIEW_SKILL.md` to produce the review report.
+
+## Prologue (execute in order; abort on any failure)
+
+### 1. Detect mode
+
+The skill runs in one of three modes. Detection:
+
+- If the environment variable `$GITHUB_ACTIONS == "true"` → `mode = "pr-ci"`.
+  - `$ARGUMENTS` MUST be a PR number, URL, or `owner/repo#N` form (the workflow passes it).
+- Else if `$ARGUMENTS` is non-empty → `mode = "pr-local"`. Parse the argument (see [step 2](#2-parse-arguments-pr-modes-only)).
+- Else (`$ARGUMENTS` empty, not in CI) → `mode = "diff"`. No `gh` calls needed; source is `git diff $(git merge-base HEAD origin/main)..HEAD` (the actual command runs in step 3 below, after fetching `origin/main`).
+
+### 2. Parse $ARGUMENTS (PR modes only)
+
+*(Skip in `diff` mode.)*
+
+Accept any of:
+
+- A PR number: `4267`
+- A full URL: `https://github.com/cowprotocol/services/pull/4267`
+- An `owner/repo#N` form: `cowprotocol/services#4267`
+
+Default `owner/repo` to `cowprotocol/services` when only a number is given.
+
+Extract: `<PR_NUMBER>`, `<owner>`, `<repo>`.
+
+If unparseable, print and abort:
+
+```
+Usage: /review-pr                       # diff mode (current branch vs main)
+       /review-pr <PR_NUMBER>           # PR mode
+       /review-pr https://github.com/owner/repo/pull/<N>
+       /review-pr owner/repo#<N>
+```
+
+### 3. Diff-mode preflight
+
+*(Only in `mode == "diff"`.)*
+
+Run:
+
+```bash
+git fetch origin main --quiet
+BASE=$(git merge-base HEAD origin/main)
+git diff --stat "$BASE..HEAD"
+```
+
+If `git diff "$BASE..HEAD"` is empty, print `No diff vs main — nothing to review.` and exit clean (not an error).
+
+There is **no** clean-tree check, **no** rebase, and **no** `git pull` of main. Diff scope comes from the fetched `origin/main` merge-base.
+
+### 4. PR-mode preflight
+
+*(Only in `mode == "pr-local"` or `mode == "pr-ci"`.)*
+
+#### 4a. Working tree (pr-local only)
+
+Run `git status --porcelain`. If non-empty, print it plus:
+
+```
+Working tree is dirty. Stash or commit your changes, then re-run.
+
+  git stash        # temporary
+  git stash pop    # to restore later
+```
+
+Then **abort**. Never auto-stash.
+
+#### 4b. Save the current branch (pr-local only)
+
+Save the current branch name to `<prior_branch>` so the report's NEXT STEPS footer can suggest `git switch <prior_branch>` when the review is done.
+
+#### 4c. Fetch base ref
+
+Run `git fetch origin --quiet`. This makes the diff comparable to base without rebasing or modifying the user's branch.
+
+#### 4d. Checkout the PR
+
+Run `gh pr checkout <PR_NUMBER> -R <owner>/<repo>`. Failure handling:
+
+- **`gh` not installed** → print `Install gh: https://cli.github.com/` and abort.
+- **Auth error** → print `gh auth status` output plus `Run: gh auth login` and abort.
+- **PR doesn't exist / wrong repo** → surface `gh`'s error verbatim and abort.
+- **Fork without checkout permission** → switch to **degraded static-diff mode**:
+  - Set `mode_qualifier = "degraded static-diff"`.
+  - In the reference doc's §2, replace `gh pr diff <N>` with `gh pr diff <N> --patch -R <owner>/<repo>`.
+  - Flag the qualifier in the report header's `Mode:` line.
+- **Any other error** → surface verbatim and abort.
+
+In `pr-ci`, the workflow has already checked out the PR branch — skip 4d and just verify `HEAD` matches the expected ref.
+
+### 5. Optional-tooling probe
+
+Detect which optional accelerators are available in the current session: Serena MCP (`mcp__plugin_serena_serena__*`) and `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`).
+
+Build a `loaded_context` list of whichever ones resolved. Pass it through to the reference doc; it prints verbatim in the report header's `Loaded context:` line.
+
+Do not abort if a skill is missing. Do not print install banners.
+
+### 6. Noise filter (before handoff)
+
+Classify each changed file:
+
+**Review surface (read fully):**
+
+- Anything under `crates/*/src/**/*.rs` (excluding `contracts/generated/**`).
+- `crates/e2e/tests/**/*.rs`.
+- `contracts/solidity/**/*.sol` (authored Solidity).
+- Config files: `**/openapi.yml`, `database/sql/**`, `configs/**`, semantically interesting `*.toml`.
+
+**Noise (skip or skim):**
+
+- `Cargo.lock` (any). CI validates the resolution; reviewing the lockfile diff is high-cost, low-signal.
+- `contracts/generated/**` and `contracts/artifacts/**` — machine-generated bindings and ABI JSON.
+- Auto-generated `Cargo.toml` entries from contract-binding crates.
+- Binary fixtures, ABI blobs, snapshot files.
+
+Report the filter in the report's `Scope:` line as `+X −Y across Z files (~N LOC human-written; rest generated/lockfile, filtered)`.
+
+### 7. Handoff
+
+Read `docs/COW_PR_REVIEW_SKILL.md` and follow it from §2 (Metadata Fetch) onward, passing through:
+
+- `mode` (`diff` / `pr-local` / `pr-ci`) and `mode_qualifier` if set.
+- `<PR_NUMBER>`, `<owner>`, `<repo>` (PR modes only).
+- `prior_branch` (`pr-local` only).
+- `loaded_context` (optional skills detected in step 5).
+- The noise-filter classification from step 6 — the reference doc uses it to decide what to read.
diff --git a/.github/workflows/claude-code-review.yml b/.github/workflows/claude-code-review.yml
new file mode 100644
index 0000000000..8623219db7
--- /dev/null
+++ b/.github/workflows/claude-code-review.yml
@@ -0,0 +1,34 @@
+name: Claude Code Review
+
+on:
+  pull_request:
+    types: [ready_for_review]
+
+jobs:
+  claude-review:
+    if: ${{ !github.event.pull_request.draft }}
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+      issues: read
+      id-token: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          # Full history is needed so the skill's `git blame`-based
+          # historic-context step can resolve any line in the diff.
+          fetch-depth: 0
+
+      - name: Run Claude Code Review
+        id: claude-review
+        uses: anthropics/claude-code-action@e0f2d99545298b87c2f984ab534af3a6534142ae # v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          # Invoke the in-repo slash command. The skill detects
+          # $GITHUB_ACTIONS=true and switches to `pr-ci` mode, posting
+          # a single consolidated review comment instead of streaming
+          # to a terminal. Skill source of truth: docs/COW_PR_REVIEW_SKILL.md
+          prompt: '/review-pr ${{ github.repository }}#${{ github.event.pull_request.number }}'
diff --git a/docs/COW_PR_REVIEW_SKILL.md b/docs/COW_PR_REVIEW_SKILL.md
new file mode 100644
index 0000000000..7a6d7855ee
--- /dev/null
+++ b/docs/COW_PR_REVIEW_SKILL.md
@@ -0,0 +1,396 @@
+# CoW Services PR Review Skill
+
+This document instructs Claude how to produce a PR review for cowprotocol/services. It is invoked by `.claude/commands/review-pr.md` (locally) or by `.github/workflows/claude-code-review.yml` (in CI). One skill, three operating modes.
+
+At the point this document is read, the entry-point has already determined:
+
+- **`mode`** — one of:
+  - `diff` — local, no PR yet. Source is `git diff $(git merge-base HEAD origin/main)..HEAD` (the whole feature-branch worth of work, computed against the freshly-fetched remote `main` to avoid a stale local `main`). No PR metadata, no `gh` calls. Output to terminal.
+  - `pr-local` — local, PR exists. Source is `gh pr diff <N>` plus PR metadata. Output to terminal.
+  - `pr-ci` — running inside `.github/workflows/claude-code-review.yml`. Source is `gh pr diff <N>` plus PR metadata. Output is a single review comment posted to the PR.
+- **`<PR_NUMBER>`, `<owner>`, `<repo>`** (only in `pr-local` / `pr-ci`).
+- **`prior_branch`** (only in `pr-local` — the branch to print at the end).
+
+The mode shapes which steps run and how the report is delivered, but the *content* of the review is the same in all three.
+
+---
+
+## Core Principles (read before executing)
+
+- **Signal over noise.** Report genuine concerns only. LGTM is a perfectly valid verdict and is the correct one whenever the PR is clean. The goal is not to maximise finding count — it is to be worth a senior reviewer's attention.
+- **Local modes never post to GitHub.** In `diff` and `pr-local`, output is strictly terminal. No `gh pr review`, no `gh pr comment`. The user posts whatever they choose. In `pr-ci`, post exactly one consolidated review comment — no per-line spam.
+- **Code is the primary source of truth.** `CLAUDE.md`, design docs in `docs/`, and this skill's own sibling docs can go stale. When a finding turns on *"X is called from Y"* or *"this field is read by Z"*, verify with `git grep` / `rg` / LSP — not by citing a doc.
+- **Inverted: this PR can make existing docs / comments / its own description stale.** If a code change makes a comment, a `docs/` page, or the PR's own description no longer match the diff's current state, that is itself a finding (`Action:` → update X).
+- **`git blame` before flagging code that looks unusual.** Often code looks weird because it had to. Before suggesting a "cleanup", blame the affected lines, read the originating commit message and (if any) linked PR. A comment that says *"this looks accidental, did you mean X?"* without that step risks asking the author to undo a hard-won fix.
+- **Explain, don't just flag.** Each finding must give the reviewer enough context to understand *and defend* the point — not just forward AI-generated text.
+- **One framing per finding:** end with either `Action:` (concrete task) or `Question:` (clarification needed). Never both.
+- **Token discipline.** Don't read whole files when grep or LSP suffices. Build a codemap before reading file bodies.
+
+---
+
+## Universal Guardrails
+
+Apply these as the default lens for every change. Pull in CoW-specific siblings ([§3](#3-conditional-context)) only when the diff warrants them.
+
+1. **Keep the public API surface minimal.** A new `pub fn`, `pub struct`, or `pub mod` that isn't required by an external caller is a Medium finding asking why it isn't `pub(crate)` / `pub(super)` / private. Smaller surface = fewer downstream breakages = freer refactor for the next person.
+2. **Avoid rightward drift.** Code that's deeply nested (4+ levels, especially `match` inside `if let` inside `for` inside `async`) is hard to read and usually hides a missing extraction. Suggest an early-return, a helper, or `let-else`.
+3. **One responsibility per component.** A function, struct, or module that does two unrelated things (validates *and* persists; parses *and* renders) is harder to test and to reuse. Flag with a `Question:` if you're not sure the split is artificial.
+4. **Split big files.** A new file pushing past ~500 lines, or a touched file growing past ~1000 lines, is worth flagging. Suggest a sensible split (often by responsibility from #3).
+5. **Avoid argument bloat.** A function taking 6+ positional arguments is a code smell — usually missing a config struct, a builder, or a method on a context object. Especially flag if the arguments are mostly being threaded through unchanged.
+6. **Errors carry context.** `?` propagating a low-level error to a high-level boundary without enrichment loses the *what was the caller trying to do* information. `anyhow!("{err}")` flattens cause chains. Both are findings; severity depends on the path.
+
+---
+
+## Execution Flow
+
+Steps run in this order. `diff` mode skips PR-metadata steps; `pr-ci` swaps the report sink at the end.
+
+1. Fetch PR metadata and linked issue(s) — [§2](#2-metadata-fetch). *(`pr-local` / `pr-ci` only.)*
+2. Triage prior review comments — [§2.5](#25-prior-comment-follow-up). *(`pr-local` / `pr-ci` only; further skip conditions in §2.5.)*
+3. Classify diff paths and load conditional context — [§3](#3-conditional-context).
+4. Build a targeted codemap — [§4](#4-codemap-phase).
+5. Synthesize the context block — [§5](#5-context-synthesis).
+6. Review and produce findings — [§6](#6-review-and-severity).
+7. Emit the report — [§7](#7-report-templates). Sink depends on mode.
+8. Offer verification (background) — [§8](#8-verification-offer). *(Local modes only.)*
+9. Print cleanup hint — [§9](#9-cleanup). *(`pr-local` only.)*
+
+Error behaviour is consolidated in [§10](#10-error-playbook).
+
+---
+
+## 2. Metadata Fetch
+
+*(Skip in `diff` mode — there is no PR yet.)*
+
+Run in **parallel** (single message, multiple Bash tool calls):
+
+```bash
+gh pr view <PR_NUMBER> -R <owner>/<repo> \
+  --json title,body,author,labels,files,baseRefName,headRefName,headRefOid,additions,deletions,commits,reviewDecision,isDraft,state,closingIssuesReferences
+
+gh pr diff <PR_NUMBER> -R <owner>/<repo>
+```
+
+In **degraded static-diff mode** (fork without checkout permission, only relevant in `pr-local`), replace `gh pr diff` with:
+
+```bash
+gh pr diff <PR_NUMBER> --patch -R <owner>/<repo>
+```
+
+### Linked issues
+
+The `gh pr view` call above already returns `closingIssuesReferences` — GitHub's own parsing of `Fixes #N` / `Closes #N` / `Resolves #N` from the PR body. Iterate that array (in parallel with the diff fetch) and pull each issue:
+
+```bash
+gh issue view <issue.number> -R <issue.repository.owner>/<issue.repository.name> --json title,body,labels,state
+```
+
+If `closingIssuesReferences` is empty, proceed without one. Do not manufacture one.
+
+### State handling
+
+- `state == "CLOSED"` or `"MERGED"` → proceed; prepend a one-line warning to the report.
+- `isDraft == true` → proceed; prepend `Draft — author may still be iterating.`
+
+---
+
+## 2.5 Prior-comment follow-up
+
+Skip when any of these holds — no output, the section is omitted from the report:
+
+- `mode == diff` (no PR yet).
+- No prior human inline reviews on the PR.
+- For every human reviewer, their latest review's `commit_id` already equals `<head_sha>` — no new commits since their last round.
+
+Otherwise, for each prior inline comment from a human reviewer (use `gh api repos/<owner>/<repo>/pulls/<PR_NUMBER>/reviews` and `/comments`, GET only), surface one entry in the "Prior-comment follow-up" block ([§7](#7-report-templates)). Cite the new code at `<path>:<line>` that addresses it, the author's reply, or note that nothing has changed since `<prior_sha>`. Use stable identifiers `[A]`, `[B]`, ... so findings raised in this review can chain context with `Re: [A]`.
+
+**Be conservative.** When the diff doesn't make the answer obvious, say so explicitly — false *"addressed"* tricks the reviewer into closing a thread that should stay open. Don't infer satisfaction from emoji reactions or short acknowledgements; the reviewer makes that call.
+
+**Read-only.** No `gh pr review`, no `gh api` mutating verbs (`POST`/`PATCH`/`DELETE`), no comment-resolution endpoints. The reviewer resolves threads, not you.
+
+---
+
+## 3. Conditional Context
+
+For each changed file, evaluate this table and accumulate a `context_docs` list. Read each matched doc once.
+
+| Match | Load |
+|---|---|
+| Any file under `database/sql/**` | `docs/review-context/database-migrations.md` |
+
+Add a new sibling only when:
+
+- A real review surfaced a CoW-specific concern the AI consistently missed, **and**
+- That concern can't reasonably be inferred from the [Universal Guardrails](#universal-guardrails) plus general Rust judgment, **and**
+- It can be expressed as a tight checklist (≤30 lines), not a sprawling rulebook.
+
+### One inline guardrail worth keeping
+
+When a PR touches **any `openapi.yml`**: scan for breaking changes (removed/renamed/typed-changed fields, new required request fields, narrowed enums, changed auth or HTTP method). If any are present, ask whether the goal could be achieved non-breakingly (additive field, new optional, deprecation window) and whether the affected consumer teams (Frontend, SAFE, etc.) have been notified. Severity: High when undisclosed in the PR description; Medium otherwise.
+
+---
+
+## 4. Codemap Phase
+
+**Purpose:** before reading file bodies, map the symbols the diff touches, their callers, and their call sites. A codemap turns a 1000-line diff into a ~20-line mental model and catches findings that only become visible at the *shape* level (caller-count inconsistencies, dead abstractions, leaky public APIs).
+
+### What to map
+
+For each non-trivial symbol the diff adds, modifies, or deletes:
+
+1. **New public types / traits / functions** — fields, methods, signatures.
+2. **Modified function signatures** — caller count, were all sites updated?
+3. **New trait impls** — which types implement the trait? Is the trait used outside this PR?
+4. **Error-type changes** — where do callers match on this error?
+
+### Tools (cheapest viable option first)
+
+| Tool | Status | When to use |
+|---|---|---|
+| `Grep` / `rg` with `-n` on a symbol name | Always available | Caller counts and basic location lookups. Example: `rg 'OrderValidator::new\b' crates/`. |
+| `git blame` / `git log` | Always available | Historic context on suspicious-looking lines. Local, read-only. |
+| `gh api` (read-only verbs) | Always available | Reading individual file contents at a specific ref (e.g. degraded static-diff mode in §4). **Never** use mutating verbs (`POST`/`PATCH`/`DELETE`) — review skill is read-only. |
+| `mcp__plugin_serena_serena__find_symbol` / `find_referencing_symbols` / `get_symbols_overview` | Optional (LSP-backed; available when Serena MCP is configured) | Precise location + kind + signature without reading the full file. |
+| Skills from `actionbook/rust-skills` (`rust-call-graph`, `rust-symbol-analyzer`, `rust-trait-explorer`, `rust-code-navigator`) | Optional (installed via `npx skills add actionbook/rust-skills`) | Richer cross-crate analysis. Not present in CI by default. |
+| `Read` of a full file | Last resort | Only when the diff hunks plus the cheaper tools don't pin down what you need. |
+
+**Fallback rule:** in CI and in `pr-local` (full checkout), every codemap step still works with `rg` and `git` against the local working tree. The optional tools are accelerators, not requirements.
+
+**Exception — degraded static-diff mode.** When `pr-local` falls back to static-diff (a fork the user can't `gh pr checkout`), the working tree is the *prior* branch, not the PR's. `rg crates/`, `Read` of repo files, and `git blame` on the working tree all reflect the wrong code. In this mode:
+
+- Take the diff content from `gh pr diff <N> --patch -R <owner>/<repo>` (already arranged in [`commands/review-pr.md` §4d](../.claude/commands/review-pr.md)).
+- Read individual files at the PR's head SHA via `gh api /repos/<owner>/<repo>/contents/<path>?ref=<head_sha>` instead of `Read`.
+- Skip `git blame`-based historic context for lines that only exist on the PR branch (the local repo doesn't know about them); blame against `origin/main` is still fine for *unchanged* surrounding code.
+- Caller-count searches via `rg` are usable only for symbols that exist on `origin/main`; new symbols introduced by the PR must be looked up via the patch and `gh api`.
+
+The report header's `Mode:` line already flags `degraded static-diff` — a reviewer reading the output knows which constraint applied.
+
+### What the codemap produces
+
+A short block in the report that looks like:
+
+```
+Codemap
+───────────────────────────────────────────────────────────
+New symbols (crate::module):
+  <Name>  (kind, key fact)
+  ...
+
+Callers of <Name>: <count> sites (<real> real, <test> test). All updated ✓
+```
+
+This is the raw material §5 (synthesis) and §6 (findings) work from.
+
+### When to skip
+
+Trivial PRs (docs-only, single-line bump, pure test addition) — skip. Pure refactors with no added public API — skim. Everything else — do it.
+
+---
+
+## 5. Context Synthesis
+
+Follow the [`pr-context-synthesis`](skills/pr-context-synthesis.md) skill with `<pr_text>` = PR title + body from [§2](#2-metadata-fetch), `<linked_issue>` = the fetched issue (if any, also from §2), and `<diff_summary>` = the [§4](#4-codemap-phase) codemap plus the changed file list.
+
+---
+
+## 6. Review and Severity
+
+Apply, in order:
+
+1. The [Universal Guardrails](#universal-guardrails).
+2. The conditional context from [§3](#3-conditional-context), if any was loaded.
+3. CoW-services conventions from `CLAUDE.md`.
+4. Optional accelerators from [§4 → tools table](#tools-cheapest-viable-option-first) when available — Serena MCP for symbol lookups, `actionbook/rust-skills` for caller / trait / structural analysis. If they aren't installed, reason from `rg` + general Rust knowledge plus the [Universal Guardrails](#universal-guardrails).
+
+### Historic context
+
+Before flagging code that looks unusual, redundant, or "easy to clean up", follow the [`git-blame-historic-context`](skills/git-blame-historic-context.md) skill and factor what blame reveals into the finding's Explanation.
+
+### Severity Rubric
+
+| Severity | Meaning |
+|---|---|
+| **High** | Merging as-is is a real risk: correctness bug, data loss, security issue, incompatible DB migration, auction/settlement invariant broken, likely panic, unsound `unsafe`. |
+| **Medium** | Worth fixing before merge — won't break prod but will cost later. Missing error context, public-API ergonomics, unhandled edge case, n-1 rollout incompatibility, undisclosed breaking API change. |
+| **Small / QoL** | Would genuinely improve the code. **Not a nit.** |
+
+### Anti-nit Rule (mandatory)
+
+- If the only reason to change it is taste or stylistic preference, **do not report it**.
+- Formatting belongs to `rustfmt` / CI. Never raise a finding whose fix is "run `cargo +nightly fmt`".
+- Clippy lints are a CI concern. Don't surface them as findings — CI flags them automatically. If the new code reads poorly, raise a Small finding on the actual readability issue (Universal Guardrails #2/#3 cover most cases).
+- If you're uncertain whether something is a nit, omit it. LGTM is the right verdict when the PR is clean.
+- **Don't inflate severity** to look thorough. Each finding's severity is what a senior reviewer would actually call it on GitHub. Either it's worth discussing or it's omitted.
+
+### Per-finding shape
+
+1. **Title** — short noun phrase (≤ 8 words).
+2. **Location** — `path/to/file.ext:line` or `path:start-end`.
+3. **Explanation** — mechanism, impact, and (if relevant) what `git blame` / the codemap revealed.
+4. **`Action:` OR `Question:`** — exactly one.
+
+When a finding has a clear mechanical fix, phrase the `Action:` so the reviewer can paste it as a GitHub *Suggested change* block.
+
+---
+
+## 7. Report Templates
+
+### Terminal form (`diff` and `pr-local`)
+
+```
+═══════════════════════════════════════════════════════════
+PR #<N> — <title>                       (or "Diff review — <branch>" in diff mode)
+═══════════════════════════════════════════════════════════
+Author:       @<author>                 (omit in diff mode)
+Scope:        +<add> −<del> across <N> files
+              (~X LOC human-written; rest generated/lockfile, filtered)
+Labels:       <labels or "—">           (omit in diff mode)
+Base/Head:    <baseRef> ← <headRef>     (or "main ← <branch>" in diff mode)
+Linked issue: #<N> — <title>            (omit if none)
+Mode:         diff | pr-local | pr-ci   ;  full checkout | degraded static-diff
+
+Codemap
+───────────────────────────────────────────────────────────
+<from §4 — omit if skipped>
+
+Prior-comment follow-up — @<reviewer> at <prior_sha_short>
+───────────────────────────────────────────────────────────
+[A] <path>:<original_line>
+    Asked:   <≤12-word recap of what was asked>
+    Status:  ✓ Addressed  |  💬 Discussion needed  |  ⏳ Pending
+             ⚠ Silently dropped  |  🚫 Moot  |  ❓ Unclear
+    Cite:    <path>:<line at HEAD>, or "Author replied: ...", or
+             "No change since <prior_sha_short>"
+
+Sort what-needs-attention first: Discussion needed → Pending →
+Silently dropped → Unclear → Addressed → Moot.
+
+(Omit the whole block if §2.5 skipped.)
+
+───────────────────────────────────────────────────────────
+CONTEXT
+───────────────────────────────────────────────────────────
+<synthesis from §5>
+
+───────────────────────────────────────────────────────────
+VERDICT:  <LGTM | Changes requested | Needs clarification>
+───────────────────────────────────────────────────────────
+
+FINDINGS
+  [High]    <count>
+  [Medium]  <count>
+  [Small]   <count>    (QoL-only; nits omitted)
+
+───── High ─────────────────────────────────────────────────
+1. <title>
+   Location:  <path>:<line>
+
+   <explanation>
+
+   Action:    <task>      OR
+   Question:  <question>
+
+───── Medium ───────────────────────────────────────────────
+<same shape>
+
+───── Small / QoL ─────────────────────────────────────────
+<same shape>
+
+───────────────────────────────────────────────────────────
+VERIFICATION   (only if user opted in)
+───────────────────────────────────────────────────────────
+cargo check            <status>
+cargo clippy           <status>
+cargo +nightly fmt     <status>
+
+───────────────────────────────────────────────────────────
+NEXT STEPS                              (pr-local only)
+───────────────────────────────────────────────────────────
+Currently on:  <current branch>
+Return with:   git switch <prior_branch>
+```
+
+#### LGTM short form
+
+When there are zero findings at all severities, collapse everything between `VERDICT:` and `NEXT STEPS` to a single line:
+
+```
+VERDICT:  LGTM — no blocking or notable issues.
+```
+
+Header, CONTEXT, and (in `pr-local`) NEXT STEPS still print.
+
+### Comment form (`pr-ci`)
+
+In CI, post **one** review comment via the action. Use the same body shape as the terminal form, minus:
+
+- The NEXT STEPS section (no local branch state to print).
+- The VERIFICATION block (CI runs its own checks separately).
+- ANSI box-drawing characters (Markdown headings instead).
+
+GitHub-render the body using `##`/`###` headings and fenced code blocks. Keep findings collapsible with `<details>` if there are more than ~5.
+
+### Verdict selection
+
+- **LGTM** — no High or Medium findings.
+- **Changes requested** — any High, or Medium that blocks safety/correctness.
+- **Needs clarification** — no High, but one or more findings use the `Question:` form.
+
+---
+
+## 8. Verification Offer
+
+*(Local modes only. Skip in `pr-ci` — CI runs its own check/clippy/fmt jobs in parallel.)*
+
+After printing the report, ask the user:
+
+> "Run local verification in the background? Reply with one of:
+> `check` — `cargo check --locked --workspace --all-features --all-targets`
+> `clippy` — `cargo clippy --locked --workspace --all-features --all-targets -- -D warnings`
+> `fmt` — `cargo +nightly fmt --all -- --check`
+> `all` — run all three
+> `skip` — don't run anything"
+
+Dispatch each selected command as a **background** Bash invocation. On completion, append the result to a VERIFICATION block.
+
+If `cargo check` surfaces **compile errors inside files this PR modified**, fold them into Findings as **High**. `cargo clippy` and `cargo +nightly fmt --check` results are status-only — record them in the VERIFICATION block, never as findings. CI gates on both, so duplicating them in the review report is noise.
+
+Tests are intentionally not in the menu — services' suite is too long-running to gate every review on.
+
+---
+
+## 9. Cleanup
+
+*(`pr-local` only.)*
+
+The NEXT STEPS footer names the current branch and the `git switch` to return to. Print it; never run it. The user may want to stay on the PR branch.
+
+---
+
+## 10. Error Playbook
+
+| Condition | Behaviour |
+|---|---|
+| `gh` not installed (pr modes) | Print `Install gh: https://cli.github.com/`, abort. |
+| `gh` not authenticated | Print `gh auth status` output + `Run: gh auth login`, abort. |
+| PR number not parseable | Print usage, abort. |
+| PR doesn't exist / wrong repo | Surface `gh` error verbatim, abort. |
+| PR closed/merged | Prepend warning, proceed. |
+| PR is draft | Prepend warning, proceed. |
+| Working tree dirty (pr-local) | Print `git status --porcelain`, instruct stash/commit, abort. **No auto-stash.** |
+| `gh pr checkout` fails (fork permission) | Degrade to static-diff mode, flag in report header. |
+| Optional skill not installed | Continue using `rg` / general Rust knowledge. Do **not** abort. |
+| Follow-up triage fails (5xx, rate-limit on `/reviews` or `/comments`) | Skip the section, append `follow-up triage skipped: <reason>` to the report header's `Mode:` line, continue with the rest of the review. |
+| `diff` mode but no diff (branch == main) | Print `No diff vs main — nothing to review.`, exit clean. |
+| Verification command fails | Surface output in VERIFICATION block; raise findings only for issues inside changed files. |
+
+**Rule of thumb:** never silently degrade. If a tool was missing or a step was skipped, the report's Mode/Header line should reflect it.
+
+---
+
+## 11. Maintenance Notes
+
+- When the AI consistently misses a CoW-specific concern across multiple reviews, first try expressing it as one more bullet in [Universal Guardrails](#universal-guardrails). Only carve a sibling doc if it can't be expressed generically.
+- When the skill produces a false-positive finding, add a one-line counter-example to the [Anti-nit Rule](#anti-nit-rule-mandatory).
+- Keep this document under 500 lines.
diff --git a/docs/review-context/database-migrations.md b/docs/review-context/database-migrations.md
new file mode 100644
index 0000000000..6e3bc788f5
--- /dev/null
+++ b/docs/review-context/database-migrations.md
@@ -0,0 +1,53 @@
+# Review Context — Database Migrations
+
+Loaded by `COW_PR_REVIEW_SKILL.md §3` when a PR touches `database/sql/**`.
+
+This file extends — does not replace — the reminder in `.github/nitpicks.yml`. The nitpick bot posts the generic warning automatically; the reviewer applies judgment using the checks below.
+
+The checks are dictated by *how* we roll out — not by SQL itself.
+
+## Rollout shape
+
+Two facts shape every check below:
+
+1. **k8s rolls pods, not the cluster.** During a release, the new autopilot starts and runs Flyway *before* the old pod is shut down. For a non-trivial overlap window the previous version is processing requests against the new schema. Anything that breaks that path breaks production.
+2. **Staging and production are independent and roll out at different times.** A migration lands in staging first; production follows on its own cadence. While the gap is open, our shadow environment can be running an *older* code version against the *newer* schema (or vice versa, briefly). Any change that breaks that combination breaks shadow without breaking the obvious path the author tested.
+
+Both points generalise beyond DB — config schema changes, request/response formats, and message-queue payloads have the same n-1 / staging-vs-production caveat. When a PR touches any of those *and* the DB, mention the cross-cutting risk in the synthesis.
+
+## Non-negotiables
+
+Each item carries its own severity. Items 1–3 are correctness/availability concerns and default to **High**; items 4–5 are author hygiene and default to **Medium**.
+
+1. **Reversibility.** State whether the migration is reversible. If yes, include or link the rollback script. If no, the PR explains *why* irreversibility is acceptable. Missing either → **High**.
+2. **n-1 schema compatibility.** The previous app version must still function against the new schema. A migration that drops or renames a column, narrows a type, or adds a `NOT NULL` constraint without code already tolerating both shapes → **High** until the rollout plan is spelled out (typically: ship change in three releases — add → migrate code → drop).
+3. **Blocking index creation on hot tables.** Use `CREATE INDEX CONCURRENTLY` on anything in the auction/settlement critical path (`orders`, `trades`, `auctions`, `settlements`, `order_events`, `auction_orders`, `quotes`, `order_quotes`). A blocking `CREATE INDEX` on one of these → **High**.
+4. **Authoritative table list.** New tables must appear in `crates/database/src/lib.rs` (search for the top-level table list). Missing → **Medium** (CI may also catch it; still the author's responsibility).
+5. **README.** Schema changes update the SQL or DB README, whichever is the convention at review time. Missing → **Medium**.
+
+## Other shapes that usually warrant **High**
+
+- `ALTER TABLE ... ADD COLUMN NOT NULL` on a multi-million-row table without a default — table lock plus slow backfill. Remedy: add nullable, batched backfill, then `NOT NULL` in a later migration.
+- Renaming a column in a single migration rather than the three-release add → migrate → drop pattern.
+- Adding a `UNIQUE` constraint without first verifying current data is unique (migrations fail when duplicates exist).
+- `ALTER COLUMN ... TYPE ...` on a large table without a multi-step migration plan — Postgres rewrites every row and holds an `ACCESS EXCLUSIVE` lock for the duration. Remedy: new column, dual-write, backfill, swap, drop old.
+
+## Usually worth flagging as **Medium**
+
+- New foreign keys without an explicit `ON DELETE` clause (default `NO ACTION` is often surprising).
+- New tables without indexes on the columns obvious queries will filter on.
+
+## Not findings
+
+- SQL style — trailing commas, keyword casing, indentation.
+- Whether the migration could be one statement instead of three. If correct, accept it.
+- Migration filename style — Flyway naming is enforced mechanically.
+
+## Questions worth asking the author
+
+When in doubt, prefer a `Question:` over a flagged `Action:`:
+
+- *"What's the expected row count in `<table>` at rollout time?"* — drives `CONCURRENTLY` and batched-backfill decisions.
+- *"Which release pairs this migration with the code change?"* — makes the n-1 reasoning explicit.
+- *"How does this look on shadow during the staging→production gap?"* — surfaces cross-environment compatibility before the author finds out by paging.
+- *"Has the rollback script been tested against a production-sized dataset?"* — only relevant if a rollback was included.
diff --git a/docs/skills/git-blame-historic-context.md b/docs/skills/git-blame-historic-context.md
new file mode 100644
index 0000000000..6982fecfe3
--- /dev/null
+++ b/docs/skills/git-blame-historic-context.md
@@ -0,0 +1,111 @@
+# Skill — `git blame` for historic context
+
+Use before flagging code that looks unusual, redundant, accidental, or "easy to clean up". Often, certain decisions led to sub-optimal-looking code, and those decisions are codified in git history rather than the code itself.
+
+## When to invoke
+
+Not on first sighting. Build the full picture first — read the change / PR / file end-to-end and *collect* lines that look off during the pass — then run blame on each candidate. Code that looks weird is often being partially fixed, moved, or replaced by the diff you're currently reading; suspicion based on a single line, before you've seen its neighbours and the rest of the diff, is unreliable.
+
+## How to invoke
+
+Two ways:
+
+- **Procedurally** — follow the steps below.
+- **Via slash command** — `/blame-context <path>:<line>` wraps the procedure end-to-end and prints the strengthens / weakens / inverts decision. Useful when you want a one-shot answer rather than walking the procedure manually.
+
+## Examples
+
+### Example 1 — magic constant (weakens)
+
+A reviewer sees this in `crates/driver/src/domain/competition/solution/settlement.rs:444`:
+
+```rust
+let max_gas = eth::Gas(block_limit.0 / eth::U256::from(2));
+```
+
+`/2` looks arbitrary — a reviewer might be tempted to flag it as a magic number. Run the procedure first:
+
+```bash
+git blame -L 444,444 -- crates/driver/src/domain/competition/solution/settlement.rs
+# → a4ee76aae3  Felix Leupold  2024-03-18  ...
+
+git log -1 --format='%s%n%b' a4ee76aae3
+# → subject ends with `(#NNNN)`; pivot to the PR
+gh pr view <NNNN>
+# → body: "block builders' default algorithm picks tx whose gas limit
+#    fits remaining space; leave headroom for inclusion."
+```
+
+Decision: **weakens** the "magic number" suspicion — the constant has documented inclusion-economics rationale. Drop the finding, or downgrade to a `Question:` confirming the rationale still applies on the chain in question.
+
+### Example 2 — defensive process exit (inverts)
+
+A reviewer sees this in `crates/observe/src/panic_hook.rs:14-15`:
+
+```rust
+let new_hook = move |info: &std::panic::PanicHookInfo| {
+    previous_hook(info);
+    std::process::exit(1);
+};
+```
+
+Hard-killing the process from inside a panic hook looks aggressive — surely a panic should be reported and recovered from, not nuke the whole binary? Run the procedure first:
+
+```bash
+git blame -L 14,15 -- crates/observe/src/panic_hook.rs
+# → 8b918a02df  Valentin  2022-09-12  ...
+# (file moved from crates/shared/src/exit_process_on_panic.rs; the
+# whitespace-insensitive copy-detection retry is unnecessary here
+# because the same SHA blames the moved lines.)
+
+git log -1 --format='%s%n%b' 8b918a02df
+# → "Exit process if thread panics (#530)
+#    Fixes #514 ."
+
+gh pr view 530
+# → body links to issue #514, whose body reads:
+#   "We spawn background tasks through `tokio::task::spawn` … a panic
+#    in a spawned task/thread does not affect the rest of the program.
+#    In these cases we want the whole program to exit."
+```
+
+Decision: **inverts** the suspicion. Removing the `process::exit(1)` would re-introduce exactly the silent-corruption failure mode the original PR fixed — a background panic (e.g. a cache updater) would be swallowed by tokio and the rest of the process would carry on with stale state. Drop the finding entirely; *suggesting* this change would ask the author to undo a deliberate cross-process invariant.
+
+## Procedure
+
+```bash
+git blame -L <start>,<end> -- <path>                # who/what/when
+
+# cowprotocol/services squash-merges. Most blames point at one commit whose
+# subject ends with "(#NNNN)" — extract the PR number, then pivot to the
+# PR conversation, which is usually richer than the commit body alone.
+git log -1 --format='%s%n%b' <sha>
+gh pr view <NNNN>
+```
+
+## Decision
+
+Promote what blame reveals into the finding's Explanation, then weigh the finding:
+
+- **Strengthens** — surrounding code was added recently for a reason the diff now contradicts. Keep / raise severity.
+- **Weakens** — the originating PR explains *why* the shape is unusual (deliberate workaround, perf fix, cross-version compat). Soften, or pivot from `Action:` to `Question:`. Example: a two-line guard looked over-defensive; blame showed it was a hot-patch lifted from prod logs — Medium → Question, asking whether the failure mode still applies.
+- **Inverts** — flagging this would ask the author to undo a hard-won fix. Drop the finding. Example: a `Some(_) =>` arm looked redundant; blame revealed it was added months ago to swallow a panic on an edge case the diff was about to remove. Dropped, with a note that the panic is back.
+
+## Edge cases
+
+- **Merge commit, not squash** — inspect with `git log -1 --format='%s%n%b' <sha>` and walk parents (`<sha>^1`, `^2`) to find the commit that actually authored the line.
+- **Same author as the PR under review, recent** — context is fresh; ask the author directly in the review thread instead of synthesising blame.
+- **Refactor moved code wholesale** — surface blame points at the move, not the originating fix. Use `git blame -w -C -C -C -L <start>,<end> -- <path>` (whitespace-insensitive, copy-detection) to recover the real authoring commit.
+- **Vendored / generated / contract-binding code** — blame the generator's input (upstream config, source `.sol`, codegen template). Skip if the surface is a JSON ABI or lockfile.
+
+## When to skip
+
+- Lines entirely new in the diff under review (no history yet).
+- Pure additions of new symbols (nothing to blame).
+- Generated code where the input lives elsewhere — blame the input instead.
+
+## Used by
+
+- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) §6 — before flagging unusual-looking code.
+- [`COW_ORDER_DEBUG_SKILL.md`](../COW_ORDER_DEBUG_SKILL.md) — when investigating *"why is this check here?"* during order debugging.
+- Ad-hoc code investigations where a line of code prompts *"this looks accidental"*.
diff --git a/docs/skills/pr-blame-walk.md b/docs/skills/pr-blame-walk.md
new file mode 100644
index 0000000000..3ddb7e7655
--- /dev/null
+++ b/docs/skills/pr-blame-walk.md
@@ -0,0 +1,157 @@
+# Skill — pr-blame-walk
+
+Use to investigate *"a symptom appeared in prod between T1 and T2 — which merged PR is the most likely cause?"*. Inputs are a query (the symptom + locating signal) and a merge window; output is a ranked list of suspect PRs, each with a tight context block and a one-sentence evidence-anchored *"Why suspected"*.
+
+This is **not** a per-PR review — that is [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md). It is also not a causality proof: output is *suspects*, not *the* cause. Final attribution requires reproducing the symptom against a revert.
+
+## Inputs
+
+- `<query>` — the symptom and any locating signal: error string, endpoint, metric name, network, onset time. Examples:
+  - *"500s on `POST /api/v1/orders` mainnet starting 2026-04-20 14:00 UTC"*
+  - *"`auction_rewards` dropped 30% on Gnosis from 2026-04-15"*
+  - *"settlement gas usage on mainnet up 2× since last week"*
+- `<onset>` — the timestamp the symptom began, RFC3339 if known. Used as a hard cutoff: PRs merged after `<onset>` cannot be the cause.
+- `<time_window>` — the merge window, ending at or before `<onset>`. Either a date range (`merged:YYYY-MM-DD..YYYY-MM-DD`) or a count (last N merged PRs).
+- `<repo>` — `<owner>/<repo>` (defaults to `cowprotocol/services`).
+- `<scope>` — optional path filter (e.g. `crates/autopilot/`, `crates/driver/src/domain/competition/`, `database/sql/`). Skip if the symptom is system-wide.
+
+## Procedure
+
+### 1. Enumerate candidates
+
+```bash
+gh pr list -R <repo> --state merged \
+  --search "merged:<YYYY-MM-DD>..<YYYY-MM-DD>" \
+  --json number,title,mergedAt,mergeCommit,author,additions,deletions \
+  --limit 100
+```
+
+If the result hits `--limit`, narrow `<time_window>` and re-run — over 100 candidates means the window is too wide for the rubric to do useful work. Default `gh pr list` limit is 30; pass `--limit` explicitly.
+
+### 2. Cheap pre-filter (no fetch)
+
+Drop candidates without fetching their diff:
+
+- **Merged after `<onset>`** — `mergedAt > <onset>`. Cannot be cause.
+- **Pure dep bump** — title matches `^Bump `, `^chore: bump`, `RUSTSEC`, `dependabot`. Diff is `Cargo.toml` version bumps + `Cargo.lock` only.
+- **Pure docs / CI workflow / formatting / test-only** — files confined to `docs/`, `.github/`, `*.md`, `**/tests/`. Keep CI changes only if `<query>` is itself a build/CI symptom.
+- **Network mismatch** — title or label explicitly names a chain different from `<query>`'s network *and* the diff doesn't touch shared infra.
+
+### 3. Per-candidate fetch (parallel, capped)
+
+For each surviving candidate, in parallel — **cap concurrency at 5–10**. `gh` shares the GitHub API budget (5000 req/hr authenticated) with the rest of the session.
+
+```bash
+gh pr view <N> -R <repo> --json title,body,files,baseRefName,headRefName,labels
+gh pr diff <N> -R <repo>
+```
+
+If `gh` returns `API rate limit exceeded`, pause and retry with smaller batches. Do not fall back to scraping the web UI.
+
+### 4. Per-candidate context block
+
+Run [`pr-context-synthesis`](pr-context-synthesis.md) per candidate with `<pr_text>` = title+body, `<linked_issue>` = referenced issue if any, `<diff_summary>` = file list + diff hunks. **Cap output at 3–4 sentences** — this skill needs the *what*, not the full *why/how*. The anti-vague-verb rule from that skill's §Rules applies verbatim.
+
+If the diff revives or reverts code whose origin isn't obvious from the PR text, run [`git-blame-historic-context`](git-blame-historic-context.md) on the deleted lines (against `origin/main^`) to recover the originating commit/PR — useful for "Why suspected" when the suspect undoes a prior fix.
+
+### 5. Score, rank, emit
+
+Apply [Scoring rubric](#scoring-rubric). Drop everything below Medium. Sort by tier, then by tiebreaker. Emit per [Output](#output).
+
+## Scoring rubric
+
+For each surviving candidate, evaluate signals and pick the highest tier that applies:
+
+| Tier | Signal |
+|---|---|
+| **High** | Touches a file/symbol/endpoint **explicitly named** in `<query>`. |
+| **High** | Modifies behaviour the symptom directly describes (e.g. `<query>` says *"rewards dropped"* and the PR changes a reward calculation; *"500 on /quote"* and the PR touches the quote handler). |
+| **High** | Touches a code path the symptom would naturally route through, on the same network/chain mentioned in `<query>`. |
+| **Medium** | Same crate/module as a High signal but a different surface (sibling function, shared util). |
+| **Medium** | Touches shared infra (gas estimation, RPC client, DB access, serialization, settlement queueing) that could plausibly chain through to the symptom. |
+| **Drop** | Pure rename / move / formatting — no behaviour change. (`additions ≈ deletions` and the diff is `mv`-shaped or whitespace-only.) |
+| **Drop** | Pure comment-only changes. |
+| **Drop** | Wholly unrelated crate with no plausible chain to the symptom. |
+
+(The pre-filter Drops in step 2 don't reach this rubric — they were dropped without fetching.)
+
+### Tiebreakers
+
+When two PRs land at the same tier:
+
+1. **Smaller diff first.** Smaller change = cheaper to bisect / revert. `additions+deletions` is the proxy.
+2. **Hot-path > cold-path.** A change in a request-handling code path beats a change to one-shot init or batch-job code.
+3. **Multiple High signals stacked > single High signal.** Note the stack in *"Why suspected"*.
+
+### Single-best mode
+
+If exactly one candidate scores High and the next-highest is Medium-or-lower, surface only that one in *High suspects* with `(only High suspect; next is <tier>)` after the title. Mediums still print.
+
+## Output
+
+```
+PR-blame-walk — <query>
+Window:     <time_window>
+Onset:      <onset or "—">
+Repo:       <repo>
+Scope:      <scope or "—">
+Scanned:    <total candidates after enumeration>
+Pre-filter: <count> dropped (post-onset, deps, docs/CI, network mismatch)
+Surfaced:   <high count> high, <medium count> medium
+
+───── High suspects ────────────────────────────────────────
+1. #<N> — <title>                  [merged <YYYY-MM-DD> by @<author>; +<add>/-<del>]
+   What: <3–4 sentence synthesis from pr-context-synthesis>
+   Why suspected: <one or two sentences naming a file, symbol, or
+                  behaviour change. Vague verbs forbidden.>
+   Inspect: gh pr view <N> -R <repo> --web
+
+2. ...
+
+───── Medium suspects ─────────────────────────────────────
+<same shape>
+```
+
+Empty result:
+
+```
+No suspects in window. Symptom likely originates outside this PR set:
+config change, upstream dep, infra/RPC, or a PR predating the window.
+Consider broadening <time_window>, removing <scope>, or checking
+deploys (`gh api -X GET repos/<repo>/deployments?per_page=20`) and infra.
+```
+
+## Evidence integration
+
+The skill itself doesn't run evidence queries — but the operator usually has them open already. Cite the query that would sharpen the score; let the operator run it.
+
+| Evidence | Strengthens score when | Read-only query |
+|---|---|---|
+| **Victoria Logs** | Symptom onset matches the suspect's deploy time | `victorialogs_query` MCP, e.g. `container:!controller AND <symptom> \| fields _time, _msg, all` over the window straddling `mergedAt` |
+| **postgres MCP** | Symptom is DB-shaped (timeouts, missing rows, migration drift) and the suspect added a migration | `mcp__postgres-protocol__query` with `SELECT` only; show the SQL before running |
+| **Squash-commit → PR lookup** | You have a SHA from `git log` or a deploy log and need its PR | `gh pr list -R <repo> --search "<sha>" --state merged` |
+| **Deploys vs merges** | Suspect deploy-time vs merge-time mismatch (a PR can merge hours before deploy) | `gh api -X GET repos/<repo>/deployments?per_page=20` |
+
+The principle: name *which* evidence query would tip the score, not the result of running it.
+
+## Rules
+
+1. **Score from evidence, not vibes.** Every *"Why suspected"* line names a concrete file, symbol, or behaviour change. *"Looks suspicious"* / *"updates X"* / *"changes Y"* are failures — the anti-vague-verb rule from [`pr-context-synthesis`](pr-context-synthesis.md) §Rules applies verbatim.
+2. **The query is the lens.** A PR risky in general but unrelated to `<query>` is not a suspect. This skill is not a retroactive review.
+3. **Don't synthesise causality.** Output is *suspects*; final attribution requires a revert / bisect.
+4. **Time-anchor.** Drop PRs merged after `<onset>` — they cannot be the cause. Watch for force-push edge cases: trust `mergeCommit.oid` and `gh pr view <N>`, not the squash subject's `(#NNNN)` alone.
+5. **Read-only.** Every command in this skill is read-only: `gh` (`view` / `list` / `diff` / `api` GET-only), `git` (`log` / `blame` / `show`), MCP query tools (`SELECT` only). Never `gh pr review`, `gh pr comment`, mutating `gh api` verbs (`POST`/`PATCH`/`DELETE`), DB writes, or `git commit`/`push`/`checkout` of suspect branches.
+
+## When to skip
+
+- **Symptom not narrowed to a window yet** (e.g. *"things feel slow"*). Narrow via metrics/logs first, then run this skill.
+- **Single suspect already in mind** — use `/review-pr <N>` or read the diff directly. This skill is N→1, not 1→1.
+- **Symptom predates available PR history** — broaden the window or accept the cause is older than the rubric can see.
+- **Symptom is config / infra / upstream** — there are no source-code PRs in this repo to scan; check deploys, config, RPC provider, or external dependencies instead.
+- **Pre-filter drops everything** — the rubric has nothing to score; widen `<time_window>` or relax `<scope>`.
+
+## Used by
+
+- Ad-hoc incident investigations: paged on a fresh prod regression, run this before opening individual PRs.
+- May be invoked as a follow-up from [`COW_ORDER_DEBUG_SKILL.md`](../COW_ORDER_DEBUG_SKILL.md) when an order-debug session pins a regression to a window with no obvious cause inside the order's lifecycle.
+- [`COW_PR_REVIEW_SKILL.md`](../COW_PR_REVIEW_SKILL.md) is the *complement*, not a caller — once this skill surfaces a suspect, run `/review-pr <N>` on it.
diff --git a/docs/skills/pr-context-synthesis.md b/docs/skills/pr-context-synthesis.md
new file mode 100644
index 0000000000..941b5f22e5
--- /dev/null
+++ b/docs/skills/pr-context-synthesis.md
@@ -0,0 +1,49 @@
+# Skill — PR context synthesis
+
+Use to produce a tight 1–3 paragraph *what / why / how* block for a single PR (or PR-shaped change). Consumed by the PR review report's CONTEXT section and by other workflows that need a per-PR summary (e.g. ad-hoc *"summarise this PR for me"* or per-candidate context in an incident-investigation walk).
+
+## How to invoke
+
+Two ways:
+
+- **Procedurally** — follow the rules below when you need a tight 1–3 paragraph synthesis (CONTEXT block of `/review-pr`, ad-hoc *"summarise this PR for me"*, per-candidate context for an investigation).
+- **Via slash command** — `/pr-synthesis <N|owner/repo#N|url>` fetches the PR, linked issue, and diff for you and prints the synthesis verbatim.
+
+## Inputs
+
+Listed ground-truth-first; downstream items are interpreted relative to the diff.
+
+- `<diff_summary>` — file scope plus a codemap or per-file note of the actual change. The ground truth the synthesis must stay anchored to. For large PRs this may be file-scope only (paths + ±counts + change type), without hunks — `/pr-synthesis` falls back to that when the diff is too big to fetch in full.
+- `<pr_text>` — PR title and body. If there is no PR yet (e.g. local-diff mode), use the current branch name and the relevant commit messages (i.e. diff against `main`).
+- `<linked_issue>` — title and body of any issue referenced via `Fixes #N` / `Closes #N` / `Resolves #N`. May be empty.
+
+## Rules
+
+1. **Synthesize, don't copy-paste.** If `<pr_text>` is five words, say so plainly: *"description is minimal; intent inferred from diff"*. Don't pad to look thorough.
+2. **Watch for description-vs-diff drift.** `<pr_text>` must describe `<diff_summary>`'s *current* state, not the author's iteration history. If a claim is no longer true of the diff, note it in the synthesis as *"description claims X; diff shows Y"*. Don't raise an `Action:` finding here — this skill reports facts; the consumer (e.g. `/review-pr`) decides whether to escalate. Do **not** flag the absence of a changelog of removed/superseded behaviour either — that belongs in commit history, not the description.
+3. **No vague verbs.** *"This PR updates something"* is a failure. Name the component, the change, and the mechanism.
+
+## Shape
+
+- **Paragraph 1** — *what* changed. Component + concrete change, drawn from `<diff_summary>`.
+- **Paragraph 2** — *why*. Drawn from `<pr_text>` and `<linked_issue>`. If both are thin, say so.
+- **Paragraph 3** (only if warranted) — *how*. The approach, not a line-by-line walkthrough.
+
+## Example
+
+Inputs (real PR — cowprotocol/services#4371):
+
+- `<pr_text>` — *"Enforce EIP-7825 per-tx gas cap on settlement"*; body explains the Fusaka mempool cap and references the existing quote-side enforcement (#4261).
+- `<linked_issue>` — #4368, *"Driver doesn't enforce the EIP-7825 per-tx gas cap"* (labels: bug, good first issue).
+- `<diff_summary>` — +58 −1 in `settlement.rs`; new const `EIP_7825_TX_GAS_CAP`, `Gas::new` capped via `min(half_block, EIP_7825)`, three tests.
+
+Output:
+
+> Fusaka introduced EIP-7825, capping any single tx at 2^24 − 1 gas. The driver's `Gas::new` was applying only the older `block_limit/2` heuristic; a solution between the EIP-7825 cap and that heuristic could pass validation in `/solve` and never settle. Issue #4368 flagged this as theoretical (no production hits); the fix ports the same idea PR #4261 applied to the quote path.
+>
+> Mechanically: `max_gas = min(half_block, EIP_7825)`, preserving inclusion-economics on chains with low block gas limits via the `min`.
+
+## When to skip
+
+- Trivial changes (docs typo, single-line dep bump, lockfile-only). One sentence is enough; don't force the three-paragraph shape.
+- `<pr_text>` and `<linked_issue>` are both empty *and* `<diff_summary>` already speaks for itself (e.g. a single-file rename). One sentence noting that the diff is self-explanatory is the correct output.