GoCodeAlone · intel352 · Apr 25, 2026 · Apr 25, 2026 · Apr 25, 2026 · Apr 25, 2026
diff --git a/skills/requesting-code-review/SKILL.md b/skills/requesting-code-review/SKILL.md
@@ -62,6 +62,9 @@ The reviewer must explicitly scan for each of these bug classes on every diff. T
 | **Scope-vs-dispatch drift** | The implementation ships materially MORE or LESS than the dispatch asked for. Both directions are findings. (See the dedicated gate above.) |
 | **Shortcut / band-aid fix** | The fix suppresses a symptom without addressing the root cause. Examples: nil-guard on a value that should never be nil (ask why it is nil); retry loop masking a race condition; special-casing one known-bad input instead of fixing the upstream producer; deleting a failing test instead of fixing the underlying code. Ask: "if the root cause were restored and only the symptom suppressor kept, would the test still pass?" If yes, the test proves nothing about correctness. |
 | **One-sided boundary wiring** | A change touches an interface boundary (producer→consumer, caller→callee, plugin→host, sender→handler — see `agents/boundary-classes.md` for the canonical list) but only one side is implemented or tested. The other side is a stub, TODO, or unchecked assumption. Both sides of every boundary must be wired, and at least one test must exercise the full crossing end-to-end (not mock both sides independently). |
+| **Test hermeticity / non-determinism** | A test makes a real network call (e.g. `connect("localhost:50051")`), depends on a real DB without a `t.Skip`-when-missing guard, writes to a shared filesystem path that races with parallel tests, depends on the system clock without injection, or relies on external services that may flake. CI should be reproducible from a clean checkout; tests that touch the world without mediation are landmines. Promote to **Important** even if the test currently passes — non-determinism only fails on the day you most need green CI. |
+| **Cross-platform / portability** | A path uses `/` literals or string-`contains("\\")` checks where `std::path::Path::components` (Rust) or `filepath.Join`/`filepath.Separator` (Go) would be portable. Code spawns shell commands with bash-only syntax. Code reads Unix-only env vars (`HOME`, `XDG_*`, `SHELL`) without falling back on Windows. Code splits `PATH` on `:` rather than the platform separator (`;` on Windows). Asset paths assume one filesystem layout. CI matrices commonly hit Linux first, so these regressions slip past local development. |
+| **Vacuous assertion** | A test whose body asserts something tautologically true so it cannot fail meaningfully. Two flavors: (1) syntactic tautology — `assert err == nil \|\| err != nil`, `assert result == result`, `assert p >= Floor \|\| p < Floor`. (2) trivially satisfied for every test input — assertion is non-tautological in general but the test inputs never exercise the failing branch (e.g., `assert composed_probability == Floor` on inputs that never trigger clamping; `assert spawns_observed == spawns_recorded` where both sides come from the same in-process list). The function under test may be correct, but the test proves nothing. Worse than no test — gives false confidence. Either rewrite the assertion to exercise the named path, or rename the test to match what it does. |
 
 For each finding, the reviewer cites the bug class explicitly. This makes patterns visible across reviews and lets the team identify recurring weaknesses.
 

diff --git a/skills/subagent-driven-development/SKILL.md b/skills/subagent-driven-development/SKILL.md
@@ -199,7 +199,7 @@ Agent tool:
        - DM the implementer who built it with specific issues
        - Wait for fix and re-review
     4. If approved:
-       - Mark the "Review quality:" task as completed via TaskUpdate
+       - Mark BOTH tasks completed via TaskUpdate: the "Review quality:" task AND the corresponding "Implement:" task. Code-reviewer is the only role that flips the Implement task to completed; missing this leaves the orchestrator with bookkeeping stragglers.
        - DM team-lead: "Task N fully approved"
 
     ## Team Conventions

diff --git a/skills/subagent-driven-development/code-quality-reviewer-prompt.md b/skills/subagent-driven-development/code-quality-reviewer-prompt.md
@@ -2,16 +2,16 @@
 
 Use this template when dispatching a code quality reviewer subagent.
 
-**Purpose:** Verify implementation is well-built (clean, tested, maintainable)
+**Purpose:** Verify implementation is well-built (clean, tested, maintainable) and find bugs the implementer missed
 
 **Only dispatch after spec compliance review passes.**
 
 <host: claude-code>
 **Agent Teams additions:** When using Agent Teams, also add to the prompt:
 - Wait for DMs from spec-reviewer saying a task is spec-approved
-- DM implementer when quality issues are found
+- DM the implementer who CURRENTLY OWNS the task (per `TaskList` `owner` field), not the implementer who wrote the original test fixture or whoever happened to be idle. The owner is the person responsible for fixing it.
 - DM the orchestrator (team-lead) when task is fully approved
-- Use TaskUpdate to mark "Review quality:" tasks as completed
+- After approving, **mark BOTH tasks completed**: the "Review quality: …" task AND the corresponding "Implement: …" task. Code-reviewer is the only role that flips the Implement task to completed; missing this leaves the team-lead with bookkeeping stragglers.
 </host>
 
 ```
@@ -30,9 +30,81 @@ Task tool (superpowers:code-reviewer):
   For the bug-class checklist and verdict vocabulary, use
   `skills/requesting-code-review/SKILL.md`.
 
+  ## Adversarial framing — required mindset
+
+  You are NOT validating that the code matches the spec. You are looking
+  for bugs the implementer missed. Find at least three things wrong, even
+  if they seem minor. Bias toward finding issues. If fewer than three are
+  found, explicitly document every bug-class check you ran and what you
+  verified. Reflexive approval is forbidden.
+
+  ## Required workflow (DO ALL FOUR — do not skip any)
+
+  1. **Read the diff for the WHOLE task:**
+     - When `BASE_SHA` and `HEAD_SHA` are both known: `git diff <BASE_SHA>..<HEAD_SHA>` (covers all commits in the task)
+     - When you only have the tip commit: `git show <HEAD_SHA>` covers ONE commit. Check `git log <suspected-base>..<HEAD_SHA>` first; if the task spans multiple commits, prefer the range diff.
+     - Do NOT use `git diff HEAD~1` blindly — it picks an arbitrary parent that may not be the task's base.
+
+  2. **Run the relevant tests yourself** in the same turn. Do NOT trust
+     the implementer's "tests pass" report. Use the project's actual test
+     invocation, NOT a hard-coded one — read the project's CONTRIBUTING.md /
+     README / Makefile if unsure:
+     - Go projects with build tags: `go test -race -tags=<project's tag(s)> ./<changed-pkg>/...` — only include `-tags=e2e` if the project actually uses that tag AND you intend to run the integration tests. Don't force non-hermetic e2e runs that the rubric's hermeticity rule says to flag.
+     - Go projects without build tags: `go test -race ./<changed-pkg>/...`
+     - Rust: `cargo test --manifest-path <path>` (add `--features <relevant>` only when the diff actually changed features-gated code).
+     - For tests-only diffs: run the smallest relevant test scope. Go: `go test ./<changed-pkg>/ -run <TestName>` to scope to specific tests, or the full package if multiple. Rust: `cargo test --manifest-path <path> --test <test-binary>` to scope to a specific integration test, or `cargo test -p <crate>` for the full crate. Capture pass/fail in the review.
+
+  3. **Run the full bug-class checklist from
+     `skills/requesting-code-review/SKILL.md`**. Every class. State for
+     each: "ran, found X" or "ran, nothing found".
+
+  4. **Apply Minor→Important promotion rules.** Promote any of these
+     from Minor to Important (blocking) before issuing your verdict:
+     - Test makes a real network call, real DB call without skip-when-missing,
+       real filesystem write that could race, or any non-hermetic dependency
+       that could cause CI flakes.
+     - Magic number duplicated across modules where the source-of-truth
+       constant should be referenced (or exposed for tests to reference).
+     - Hardcoded path separator (`/`), endpoint, or platform-specific
+       assumption that breaks on Windows/macOS/Linux when it should be
+       portable.
+     - Test-only deferral noted as "fix later in Task X" that is the third
+       (or later) such deferral against the same module — call it out as
+       cascade risk before it becomes a fix-cycle in a later task.
+     - **Vacuous assertion** — two flavors: (a) syntactic tautology like
+       `assert err == nil || err != nil` or `assert p >= Floor || p < Floor`;
+       (b) assertion is non-tautological in general but the chosen test inputs
+       trivially satisfy it (e.g., `assert composed_probability == Floor` when
+       no input combination actually triggers clamping; `assert spawn_count ==
+       observed_count` when both values come from the same in-memory list).
+       The function under test may be correct, but the test doesn't prove it.
+       Worse than no test — gives false confidence. Promote to Important and
+       require the implementer to either rewrite the assertion to actually
+       exercise the named path, or rename the test to match what it does.
+
+  ## Output format
+
+  Per-finding inline at file:line, using the format from
+  `skills/requesting-code-review/SKILL.md`. End with one verdict:
+  SHIP-IT | FIX-FORWARD | REQUEST-CHANGES | REVERT-AND-REWRITE.
+
   When notified that a task is spec-approved and ready for quality review:
   - Notify the implementer when quality issues are found
   - Notify the orchestrator (team-lead on Claude Code) when the task is fully approved
 ```
 
 **Code reviewer returns:** Strengths, Issues (Critical/Important/Minor), Assessment
+
+## Why this template inlines key rules
+
+The bug-class checklist and adversarial framing live in
+`skills/requesting-code-review/SKILL.md`, but field experience shows that
+delegating these via reference alone produces reviewers who "review the
+work" rather than adversarially scan it. The reviewer reads the spec,
+sees the diff matches, and signs off — without running tests, without
+applying every bug class, and without promoting Minor determinism issues
+to Important.
+
+The four-step workflow (read diff → run tests → full checklist → promote)
+is the floor, not the ceiling. Inlining it here forces every dispatch to
+include the discipline rather than trusting the agent to look it up.
diff --git a/skills/subagent-driven-development/spec-reviewer-prompt.md b/skills/subagent-driven-development/spec-reviewer-prompt.md
@@ -8,8 +8,8 @@ Use this template when dispatching a spec compliance reviewer subagent.
 **Agent Teams additions:** When using Agent Teams, also add to the prompt:
 - Wait for DMs from implementers saying a task is ready
 - DM code-reviewer when spec compliance passes
-- DM implementer when issues are found
-- Use TaskUpdate to mark "Review spec:" tasks as completed
+- DM the implementer who CURRENTLY OWNS the task (per `TaskList` `owner` field) when issues are found, not whoever sent the most recent DM
+- Use TaskUpdate to mark "Review spec:" tasks as completed (only the Review spec task; never the Implement task — code-reviewer marks that one)
 </host>
 
 ```
@@ -30,12 +30,61 @@ Task tool (general-purpose):
 
     [From implementer's report]
 
-    Verify by reading actual code — do not trust the report. Compare
-    implementation to spec line by line. Report:
-    - ✅ Spec compliant (after code inspection)
-    - ❌ Issues found: [list specifically what's missing or extra, with file:line references]
+    ## Required workflow (DO ALL FIVE)
+
+    1. **Read the actual diff for the WHOLE task**:
+       - If `BASE_SHA` and `HEAD_SHA` are both known: `git diff <BASE_SHA>..<HEAD_SHA>` (covers all commits in the task)
+       - If only the latest commit is provided: `git show <HEAD_SHA>` covers a single commit; if you suspect the task spans multiple commits (look at `git log <suspected-base>..<HEAD_SHA>`), prefer the range diff
+       - Do NOT trust the implementer's summary
+       - Do NOT use `git diff HEAD~1` blindly — it picks an arbitrary parent that may not be the task's base
+
+    2. **Walk every "Acceptance" bullet from the task description**.
+       For each, locate the corresponding change in the diff:
+       - Required file created/modified? Confirm path.
+       - Required function/RPC/test added? grep the diff for it.
+       - Required behavior asserted? Read the test body, not just the file.
+       - Required side-effect (DB write, AGE edge, forensic_evidence row,
+         event emission)? grep for the helper call by name.
+
+       If any acceptance bullet has no corresponding change: **MISSING**.
+
+    3. **For TDD tasks, run the test yourself**. Do not trust "I ran it":
+       - "RED" task (failing test): `go test ./...` (or equivalent) MUST FAIL
+         when you run it. If it passes, the test isn't asserting what the
+         spec says it asserts.
+       - "GREEN" task (implementation makes prior RED test pass): run the
+         same command; the prior failing test MUST now pass.
+       - Capture and paste the actual command output in your review.
+
+    4. **Diff scope check** — list everything the diff does that the spec
+       did NOT ask for. Even helpful additions are SCOPE CREEP findings;
+       flag them. Implementer may justify and you accept on merits.
+
+    5. **Reuse-vs-reinvention check** — if the spec says "emit X via the
+       existing pipeline", grep the diff to confirm the existing helper
+       was called. A new parallel helper or inline raw SQL where a typed
+       helper exists = reject as scope creep / drift.
+
+    ## Output
+
+    - ✅ Spec compliant (after running steps 1–5; cite which were verified)
+    - ❌ Issues found — list per finding with file:line:
+      - **MISSING** — spec asked for X, diff doesn't deliver it
+      - **SCOPE CREEP** — diff includes Y not in spec
+      - **DRIFT** — diff fakes/reinvents an existing helper
 
     When notified that a task is ready for review:
     - Notify the code-reviewer when spec compliance passes
     - Notify the implementer when issues are found
 ```
+
+## Why these steps are inlined
+
+Field experience: spec-reviewers who "read the spec and check the diff"
+reflexively approve once each acceptance bullet has a matching file in
+the diff — without verifying the file actually does what the spec says,
+without running the test the spec describes, and without grepping for
+the side-effects the spec demands.
+
+The five-step workflow is the floor. The reviewer must produce evidence
+(command output, grep matches) for each step before issuing approval.