Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions skills/requesting-code-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,9 @@ The reviewer must explicitly scan for each of these bug classes on every diff. T
| **Scope-vs-dispatch drift** | The implementation ships materially MORE or LESS than the dispatch asked for. Both directions are findings. (See the dedicated gate above.) |
| **Shortcut / band-aid fix** | The fix suppresses a symptom without addressing the root cause. Examples: nil-guard on a value that should never be nil (ask why it is nil); retry loop masking a race condition; special-casing one known-bad input instead of fixing the upstream producer; deleting a failing test instead of fixing the underlying code. Ask: "if the root cause were restored and only the symptom suppressor kept, would the test still pass?" If yes, the test proves nothing about correctness. |
| **One-sided boundary wiring** | A change touches an interface boundary (producer→consumer, caller→callee, plugin→host, sender→handler — see `agents/boundary-classes.md` for the canonical list) but only one side is implemented or tested. The other side is a stub, TODO, or unchecked assumption. Both sides of every boundary must be wired, and at least one test must exercise the full crossing end-to-end (not mock both sides independently). |
| **Test hermeticity / non-determinism** | A test makes a real network call (e.g. `connect("localhost:50051")`), depends on a real DB without a `t.Skip`-when-missing guard, writes to a shared filesystem path that races with parallel tests, depends on the system clock without injection, or relies on external services that may flake. CI should be reproducible from a clean checkout; tests that touch the world without mediation are landmines. Promote to **Important** even if the test currently passes — non-determinism only fails on the day you most need green CI. |
| **Cross-platform / portability** | A path uses `/` literals or string-`contains("\\")` checks where `std::path::Path::components` (Rust) or `filepath.Join`/`filepath.Separator` (Go) would be portable. Code spawns shell commands with bash-only syntax. Code reads Unix-only env vars (`HOME`, `XDG_*`, `SHELL`) without falling back on Windows. Code splits `PATH` on `:` rather than the platform separator (`;` on Windows). Asset paths assume one filesystem layout. CI matrices commonly hit Linux first, so these regressions slip past local development. |
| **Vacuous assertion** | A test whose body asserts something tautologically true so it cannot fail meaningfully. Two flavors: (1) syntactic tautology — `assert err == nil \|\| err != nil`, `assert result == result`, `assert p >= Floor \|\| p < Floor`. (2) trivially satisfied for every test input — assertion is non-tautological in general but the test inputs never exercise the failing branch (e.g., `assert composed_probability == Floor` on inputs that never trigger clamping; `assert spawns_observed == spawns_recorded` where both sides come from the same in-process list). The function under test may be correct, but the test proves nothing. Worse than no test — gives false confidence. Either rewrite the assertion to exercise the named path, or rename the test to match what it does. |

For each finding, the reviewer cites the bug class explicitly. This makes patterns visible across reviews and lets the team identify recurring weaknesses.

Expand Down
2 changes: 1 addition & 1 deletion skills/subagent-driven-development/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -199,7 +199,7 @@ Agent tool:
- DM the implementer who built it with specific issues
- Wait for fix and re-review
4. If approved:
- Mark the "Review quality:" task as completed via TaskUpdate
- Mark BOTH tasks completed via TaskUpdate: the "Review quality:" task AND the corresponding "Implement:" task. Code-reviewer is the only role that flips the Implement task to completed; missing this leaves the orchestrator with bookkeeping stragglers.
- DM team-lead: "Task N fully approved"

## Team Conventions
Expand Down
78 changes: 75 additions & 3 deletions skills/subagent-driven-development/code-quality-reviewer-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,16 +2,16 @@

Use this template when dispatching a code quality reviewer subagent.

**Purpose:** Verify implementation is well-built (clean, tested, maintainable)
**Purpose:** Verify implementation is well-built (clean, tested, maintainable) and find bugs the implementer missed

**Only dispatch after spec compliance review passes.**

<host: claude-code>
**Agent Teams additions:** When using Agent Teams, also add to the prompt:
- Wait for DMs from spec-reviewer saying a task is spec-approved
- DM implementer when quality issues are found
- DM the implementer who CURRENTLY OWNS the task (per `TaskList` `owner` field), not the implementer who wrote the original test fixture or whoever happened to be idle. The owner is the person responsible for fixing it.
- DM the orchestrator (team-lead) when task is fully approved
- Use TaskUpdate to mark "Review quality:" tasks as completed
- After approving, **mark BOTH tasks completed**: the "Review quality: …" task AND the corresponding "Implement: …" task. Code-reviewer is the only role that flips the Implement task to completed; missing this leaves the team-lead with bookkeeping stragglers.
Comment thread
intel352 marked this conversation as resolved.
</host>

```
Expand All @@ -30,9 +30,81 @@ Task tool (superpowers:code-reviewer):
For the bug-class checklist and verdict vocabulary, use
`skills/requesting-code-review/SKILL.md`.

## Adversarial framing — required mindset

You are NOT validating that the code matches the spec. You are looking
for bugs the implementer missed. Find at least three things wrong, even
if they seem minor. Bias toward finding issues. If fewer than three are
found, explicitly document every bug-class check you ran and what you
verified. Reflexive approval is forbidden.

## Required workflow (DO ALL FOUR — do not skip any)

1. **Read the diff for the WHOLE task:**
- When `BASE_SHA` and `HEAD_SHA` are both known: `git diff <BASE_SHA>..<HEAD_SHA>` (covers all commits in the task)
- When you only have the tip commit: `git show <HEAD_SHA>` covers ONE commit. Check `git log <suspected-base>..<HEAD_SHA>` first; if the task spans multiple commits, prefer the range diff.
- Do NOT use `git diff HEAD~1` blindly — it picks an arbitrary parent that may not be the task's base.

2. **Run the relevant tests yourself** in the same turn. Do NOT trust
the implementer's "tests pass" report. Use the project's actual test
invocation, NOT a hard-coded one — read the project's CONTRIBUTING.md /
README / Makefile if unsure:
- Go projects with build tags: `go test -race -tags=<project's tag(s)> ./<changed-pkg>/...` — only include `-tags=e2e` if the project actually uses that tag AND you intend to run the integration tests. Don't force non-hermetic e2e runs that the rubric's hermeticity rule says to flag.
- Go projects without build tags: `go test -race ./<changed-pkg>/...`
- Rust: `cargo test --manifest-path <path>` (add `--features <relevant>` only when the diff actually changed features-gated code).
- For tests-only diffs: run the smallest relevant test scope. Go: `go test ./<changed-pkg>/ -run <TestName>` to scope to specific tests, or the full package if multiple. Rust: `cargo test --manifest-path <path> --test <test-binary>` to scope to a specific integration test, or `cargo test -p <crate>` for the full crate. Capture pass/fail in the review.

3. **Run the full bug-class checklist from
`skills/requesting-code-review/SKILL.md`**. Every class. State for
each: "ran, found X" or "ran, nothing found".

4. **Apply Minor→Important promotion rules.** Promote any of these
from Minor to Important (blocking) before issuing your verdict:
- Test makes a real network call, real DB call without skip-when-missing,
real filesystem write that could race, or any non-hermetic dependency
that could cause CI flakes.
- Magic number duplicated across modules where the source-of-truth
constant should be referenced (or exposed for tests to reference).
- Hardcoded path separator (`/`), endpoint, or platform-specific
assumption that breaks on Windows/macOS/Linux when it should be
portable.
- Test-only deferral noted as "fix later in Task X" that is the third
(or later) such deferral against the same module — call it out as
cascade risk before it becomes a fix-cycle in a later task.
- **Vacuous assertion** — two flavors: (a) syntactic tautology like
`assert err == nil || err != nil` or `assert p >= Floor || p < Floor`;
(b) assertion is non-tautological in general but the chosen test inputs
trivially satisfy it (e.g., `assert composed_probability == Floor` when
no input combination actually triggers clamping; `assert spawn_count ==
observed_count` when both values come from the same in-memory list).
The function under test may be correct, but the test doesn't prove it.
Worse than no test — gives false confidence. Promote to Important and
require the implementer to either rewrite the assertion to actually
exercise the named path, or rename the test to match what it does.

## Output format

Per-finding inline at file:line, using the format from
`skills/requesting-code-review/SKILL.md`. End with one verdict:
SHIP-IT | FIX-FORWARD | REQUEST-CHANGES | REVERT-AND-REWRITE.

When notified that a task is spec-approved and ready for quality review:
- Notify the implementer when quality issues are found
- Notify the orchestrator (team-lead on Claude Code) when the task is fully approved
```

**Code reviewer returns:** Strengths, Issues (Critical/Important/Minor), Assessment

## Why this template inlines key rules

The bug-class checklist and adversarial framing live in
`skills/requesting-code-review/SKILL.md`, but field experience shows that
delegating these via reference alone produces reviewers who "review the
work" rather than adversarially scan it. The reviewer reads the spec,
sees the diff matches, and signs off — without running tests, without
applying every bug class, and without promoting Minor determinism issues
to Important.

The four-step workflow (read diff → run tests → full checklist → promote)
is the floor, not the ceiling. Inlining it here forces every dispatch to
include the discipline rather than trusting the agent to look it up.
61 changes: 55 additions & 6 deletions skills/subagent-driven-development/spec-reviewer-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@ Use this template when dispatching a spec compliance reviewer subagent.
**Agent Teams additions:** When using Agent Teams, also add to the prompt:
- Wait for DMs from implementers saying a task is ready
- DM code-reviewer when spec compliance passes
- DM implementer when issues are found
- Use TaskUpdate to mark "Review spec:" tasks as completed
- DM the implementer who CURRENTLY OWNS the task (per `TaskList` `owner` field) when issues are found, not whoever sent the most recent DM
- Use TaskUpdate to mark "Review spec:" tasks as completed (only the Review spec task; never the Implement task — code-reviewer marks that one)
</host>

```
Expand All @@ -30,12 +30,61 @@ Task tool (general-purpose):

[From implementer's report]

Verify by reading actual code — do not trust the report. Compare
implementation to spec line by line. Report:
- ✅ Spec compliant (after code inspection)
- ❌ Issues found: [list specifically what's missing or extra, with file:line references]
## Required workflow (DO ALL FIVE)

1. **Read the actual diff for the WHOLE task**:
- If `BASE_SHA` and `HEAD_SHA` are both known: `git diff <BASE_SHA>..<HEAD_SHA>` (covers all commits in the task)
- If only the latest commit is provided: `git show <HEAD_SHA>` covers a single commit; if you suspect the task spans multiple commits (look at `git log <suspected-base>..<HEAD_SHA>`), prefer the range diff
- Do NOT trust the implementer's summary
- Do NOT use `git diff HEAD~1` blindly — it picks an arbitrary parent that may not be the task's base

2. **Walk every "Acceptance" bullet from the task description**.
For each, locate the corresponding change in the diff:
- Required file created/modified? Confirm path.
- Required function/RPC/test added? grep the diff for it.
- Required behavior asserted? Read the test body, not just the file.
- Required side-effect (DB write, AGE edge, forensic_evidence row,
event emission)? grep for the helper call by name.

If any acceptance bullet has no corresponding change: **MISSING**.

3. **For TDD tasks, run the test yourself**. Do not trust "I ran it":
- "RED" task (failing test): `go test ./...` (or equivalent) MUST FAIL
when you run it. If it passes, the test isn't asserting what the
spec says it asserts.
- "GREEN" task (implementation makes prior RED test pass): run the
same command; the prior failing test MUST now pass.
- Capture and paste the actual command output in your review.

4. **Diff scope check** — list everything the diff does that the spec
did NOT ask for. Even helpful additions are SCOPE CREEP findings;
flag them. Implementer may justify and you accept on merits.

5. **Reuse-vs-reinvention check** — if the spec says "emit X via the
existing pipeline", grep the diff to confirm the existing helper
was called. A new parallel helper or inline raw SQL where a typed
helper exists = reject as scope creep / drift.

## Output

- ✅ Spec compliant (after running steps 1–5; cite which were verified)
- ❌ Issues found — list per finding with file:line:
- **MISSING** — spec asked for X, diff doesn't deliver it
- **SCOPE CREEP** — diff includes Y not in spec
- **DRIFT** — diff fakes/reinvents an existing helper

When notified that a task is ready for review:
- Notify the code-reviewer when spec compliance passes
- Notify the implementer when issues are found
```

## Why these steps are inlined

Field experience: spec-reviewers who "read the spec and check the diff"
reflexively approve once each acceptance bullet has a matching file in
the diff — without verifying the file actually does what the spec says,
without running the test the spec describes, and without grepping for
the side-effects the spec demands.

The five-step workflow is the floor. The reviewer must produce evidence
(command output, grep matches) for each step before issuing approval.
Loading