diff --git a/skills/requesting-code-review/SKILL.md b/skills/requesting-code-review/SKILL.md index 6db906a..e6b0129 100644 --- a/skills/requesting-code-review/SKILL.md +++ b/skills/requesting-code-review/SKILL.md @@ -62,6 +62,9 @@ The reviewer must explicitly scan for each of these bug classes on every diff. T | **Scope-vs-dispatch drift** | The implementation ships materially MORE or LESS than the dispatch asked for. Both directions are findings. (See the dedicated gate above.) | | **Shortcut / band-aid fix** | The fix suppresses a symptom without addressing the root cause. Examples: nil-guard on a value that should never be nil (ask why it is nil); retry loop masking a race condition; special-casing one known-bad input instead of fixing the upstream producer; deleting a failing test instead of fixing the underlying code. Ask: "if the root cause were restored and only the symptom suppressor kept, would the test still pass?" If yes, the test proves nothing about correctness. | | **One-sided boundary wiring** | A change touches an interface boundary (producer→consumer, caller→callee, plugin→host, sender→handler — see `agents/boundary-classes.md` for the canonical list) but only one side is implemented or tested. The other side is a stub, TODO, or unchecked assumption. Both sides of every boundary must be wired, and at least one test must exercise the full crossing end-to-end (not mock both sides independently). | +| **Test hermeticity / non-determinism** | A test makes a real network call (e.g. `connect("localhost:50051")`), depends on a real DB without a `t.Skip`-when-missing guard, writes to a shared filesystem path that races with parallel tests, depends on the system clock without injection, or relies on external services that may flake. CI should be reproducible from a clean checkout; tests that touch the world without mediation are landmines. Promote to **Important** even if the test currently passes — non-determinism only fails on the day you most need green CI. | +| **Cross-platform / portability** | A path uses `/` literals or string-`contains("\\")` checks where `std::path::Path::components` (Rust) or `filepath.Join`/`filepath.Separator` (Go) would be portable. Code spawns shell commands with bash-only syntax. Code reads Unix-only env vars (`HOME`, `XDG_*`, `SHELL`) without falling back on Windows. Code splits `PATH` on `:` rather than the platform separator (`;` on Windows). Asset paths assume one filesystem layout. CI matrices commonly hit Linux first, so these regressions slip past local development. | +| **Vacuous assertion** | A test whose body asserts something tautologically true so it cannot fail meaningfully. Two flavors: (1) syntactic tautology — `assert err == nil \|\| err != nil`, `assert result == result`, `assert p >= Floor \|\| p < Floor`. (2) trivially satisfied for every test input — assertion is non-tautological in general but the test inputs never exercise the failing branch (e.g., `assert composed_probability == Floor` on inputs that never trigger clamping; `assert spawns_observed == spawns_recorded` where both sides come from the same in-process list). The function under test may be correct, but the test proves nothing. Worse than no test — gives false confidence. Either rewrite the assertion to exercise the named path, or rename the test to match what it does. | For each finding, the reviewer cites the bug class explicitly. This makes patterns visible across reviews and lets the team identify recurring weaknesses. diff --git a/skills/subagent-driven-development/SKILL.md b/skills/subagent-driven-development/SKILL.md index 481e367..9e27d48 100644 --- a/skills/subagent-driven-development/SKILL.md +++ b/skills/subagent-driven-development/SKILL.md @@ -199,7 +199,7 @@ Agent tool: - DM the implementer who built it with specific issues - Wait for fix and re-review 4. If approved: - - Mark the "Review quality:" task as completed via TaskUpdate + - Mark BOTH tasks completed via TaskUpdate: the "Review quality:" task AND the corresponding "Implement:" task. Code-reviewer is the only role that flips the Implement task to completed; missing this leaves the orchestrator with bookkeeping stragglers. - DM team-lead: "Task N fully approved" ## Team Conventions diff --git a/skills/subagent-driven-development/code-quality-reviewer-prompt.md b/skills/subagent-driven-development/code-quality-reviewer-prompt.md index 34bcba9..165571d 100644 --- a/skills/subagent-driven-development/code-quality-reviewer-prompt.md +++ b/skills/subagent-driven-development/code-quality-reviewer-prompt.md @@ -2,16 +2,16 @@ Use this template when dispatching a code quality reviewer subagent. -**Purpose:** Verify implementation is well-built (clean, tested, maintainable) +**Purpose:** Verify implementation is well-built (clean, tested, maintainable) and find bugs the implementer missed **Only dispatch after spec compliance review passes.** **Agent Teams additions:** When using Agent Teams, also add to the prompt: - Wait for DMs from spec-reviewer saying a task is spec-approved -- DM implementer when quality issues are found +- DM the implementer who CURRENTLY OWNS the task (per `TaskList` `owner` field), not the implementer who wrote the original test fixture or whoever happened to be idle. The owner is the person responsible for fixing it. - DM the orchestrator (team-lead) when task is fully approved -- Use TaskUpdate to mark "Review quality:" tasks as completed +- After approving, **mark BOTH tasks completed**: the "Review quality: …" task AND the corresponding "Implement: …" task. Code-reviewer is the only role that flips the Implement task to completed; missing this leaves the team-lead with bookkeeping stragglers. ``` @@ -30,9 +30,81 @@ Task tool (superpowers:code-reviewer): For the bug-class checklist and verdict vocabulary, use `skills/requesting-code-review/SKILL.md`. + ## Adversarial framing — required mindset + + You are NOT validating that the code matches the spec. You are looking + for bugs the implementer missed. Find at least three things wrong, even + if they seem minor. Bias toward finding issues. If fewer than three are + found, explicitly document every bug-class check you ran and what you + verified. Reflexive approval is forbidden. + + ## Required workflow (DO ALL FOUR — do not skip any) + + 1. **Read the diff for the WHOLE task:** + - When `BASE_SHA` and `HEAD_SHA` are both known: `git diff ..` (covers all commits in the task) + - When you only have the tip commit: `git show ` covers ONE commit. Check `git log ..` first; if the task spans multiple commits, prefer the range diff. + - Do NOT use `git diff HEAD~1` blindly — it picks an arbitrary parent that may not be the task's base. + + 2. **Run the relevant tests yourself** in the same turn. Do NOT trust + the implementer's "tests pass" report. Use the project's actual test + invocation, NOT a hard-coded one — read the project's CONTRIBUTING.md / + README / Makefile if unsure: + - Go projects with build tags: `go test -race -tags= .//...` — only include `-tags=e2e` if the project actually uses that tag AND you intend to run the integration tests. Don't force non-hermetic e2e runs that the rubric's hermeticity rule says to flag. + - Go projects without build tags: `go test -race .//...` + - Rust: `cargo test --manifest-path ` (add `--features ` only when the diff actually changed features-gated code). + - For tests-only diffs: run the smallest relevant test scope. Go: `go test .// -run ` to scope to specific tests, or the full package if multiple. Rust: `cargo test --manifest-path --test ` to scope to a specific integration test, or `cargo test -p ` for the full crate. Capture pass/fail in the review. + + 3. **Run the full bug-class checklist from + `skills/requesting-code-review/SKILL.md`**. Every class. State for + each: "ran, found X" or "ran, nothing found". + + 4. **Apply Minor→Important promotion rules.** Promote any of these + from Minor to Important (blocking) before issuing your verdict: + - Test makes a real network call, real DB call without skip-when-missing, + real filesystem write that could race, or any non-hermetic dependency + that could cause CI flakes. + - Magic number duplicated across modules where the source-of-truth + constant should be referenced (or exposed for tests to reference). + - Hardcoded path separator (`/`), endpoint, or platform-specific + assumption that breaks on Windows/macOS/Linux when it should be + portable. + - Test-only deferral noted as "fix later in Task X" that is the third + (or later) such deferral against the same module — call it out as + cascade risk before it becomes a fix-cycle in a later task. + - **Vacuous assertion** — two flavors: (a) syntactic tautology like + `assert err == nil || err != nil` or `assert p >= Floor || p < Floor`; + (b) assertion is non-tautological in general but the chosen test inputs + trivially satisfy it (e.g., `assert composed_probability == Floor` when + no input combination actually triggers clamping; `assert spawn_count == + observed_count` when both values come from the same in-memory list). + The function under test may be correct, but the test doesn't prove it. + Worse than no test — gives false confidence. Promote to Important and + require the implementer to either rewrite the assertion to actually + exercise the named path, or rename the test to match what it does. + + ## Output format + + Per-finding inline at file:line, using the format from + `skills/requesting-code-review/SKILL.md`. End with one verdict: + SHIP-IT | FIX-FORWARD | REQUEST-CHANGES | REVERT-AND-REWRITE. + When notified that a task is spec-approved and ready for quality review: - Notify the implementer when quality issues are found - Notify the orchestrator (team-lead on Claude Code) when the task is fully approved ``` **Code reviewer returns:** Strengths, Issues (Critical/Important/Minor), Assessment + +## Why this template inlines key rules + +The bug-class checklist and adversarial framing live in +`skills/requesting-code-review/SKILL.md`, but field experience shows that +delegating these via reference alone produces reviewers who "review the +work" rather than adversarially scan it. The reviewer reads the spec, +sees the diff matches, and signs off — without running tests, without +applying every bug class, and without promoting Minor determinism issues +to Important. + +The four-step workflow (read diff → run tests → full checklist → promote) +is the floor, not the ceiling. Inlining it here forces every dispatch to +include the discipline rather than trusting the agent to look it up. diff --git a/skills/subagent-driven-development/spec-reviewer-prompt.md b/skills/subagent-driven-development/spec-reviewer-prompt.md index 48d60b9..aee6f0c 100644 --- a/skills/subagent-driven-development/spec-reviewer-prompt.md +++ b/skills/subagent-driven-development/spec-reviewer-prompt.md @@ -8,8 +8,8 @@ Use this template when dispatching a spec compliance reviewer subagent. **Agent Teams additions:** When using Agent Teams, also add to the prompt: - Wait for DMs from implementers saying a task is ready - DM code-reviewer when spec compliance passes -- DM implementer when issues are found -- Use TaskUpdate to mark "Review spec:" tasks as completed +- DM the implementer who CURRENTLY OWNS the task (per `TaskList` `owner` field) when issues are found, not whoever sent the most recent DM +- Use TaskUpdate to mark "Review spec:" tasks as completed (only the Review spec task; never the Implement task — code-reviewer marks that one) ``` @@ -30,12 +30,61 @@ Task tool (general-purpose): [From implementer's report] - Verify by reading actual code — do not trust the report. Compare - implementation to spec line by line. Report: - - ✅ Spec compliant (after code inspection) - - ❌ Issues found: [list specifically what's missing or extra, with file:line references] + ## Required workflow (DO ALL FIVE) + + 1. **Read the actual diff for the WHOLE task**: + - If `BASE_SHA` and `HEAD_SHA` are both known: `git diff ..` (covers all commits in the task) + - If only the latest commit is provided: `git show ` covers a single commit; if you suspect the task spans multiple commits (look at `git log ..`), prefer the range diff + - Do NOT trust the implementer's summary + - Do NOT use `git diff HEAD~1` blindly — it picks an arbitrary parent that may not be the task's base + + 2. **Walk every "Acceptance" bullet from the task description**. + For each, locate the corresponding change in the diff: + - Required file created/modified? Confirm path. + - Required function/RPC/test added? grep the diff for it. + - Required behavior asserted? Read the test body, not just the file. + - Required side-effect (DB write, AGE edge, forensic_evidence row, + event emission)? grep for the helper call by name. + + If any acceptance bullet has no corresponding change: **MISSING**. + + 3. **For TDD tasks, run the test yourself**. Do not trust "I ran it": + - "RED" task (failing test): `go test ./...` (or equivalent) MUST FAIL + when you run it. If it passes, the test isn't asserting what the + spec says it asserts. + - "GREEN" task (implementation makes prior RED test pass): run the + same command; the prior failing test MUST now pass. + - Capture and paste the actual command output in your review. + + 4. **Diff scope check** — list everything the diff does that the spec + did NOT ask for. Even helpful additions are SCOPE CREEP findings; + flag them. Implementer may justify and you accept on merits. + + 5. **Reuse-vs-reinvention check** — if the spec says "emit X via the + existing pipeline", grep the diff to confirm the existing helper + was called. A new parallel helper or inline raw SQL where a typed + helper exists = reject as scope creep / drift. + + ## Output + + - ✅ Spec compliant (after running steps 1–5; cite which were verified) + - ❌ Issues found — list per finding with file:line: + - **MISSING** — spec asked for X, diff doesn't deliver it + - **SCOPE CREEP** — diff includes Y not in spec + - **DRIFT** — diff fakes/reinvents an existing helper When notified that a task is ready for review: - Notify the code-reviewer when spec compliance passes - Notify the implementer when issues are found ``` + +## Why these steps are inlined + +Field experience: spec-reviewers who "read the spec and check the diff" +reflexively approve once each acceptance bullet has a matching file in +the diff — without verifying the file actually does what the spec says, +without running the test the spec describes, and without grepping for +the side-effects the spec demands. + +The five-step workflow is the floor. The reviewer must produce evidence +(command output, grep matches) for each step before issuing approval.