Tighten subagent-driven-development reviewer prompts#21
Merged
Conversation
Field experience running subagent-driven-development on a Phase 0
foundations branch (separate-Steam-SKU companion-game scaffold) showed
that reviewers reflexively approve once each spec acceptance bullet has
a matching file in the diff — without running the test the spec
describes, without applying every bug class, and without promoting
Minor determinism issues to Important. Copilot subsequently caught
seven issues that internal review had missed:
- test using a real network call to localhost
- hardcoded magic number duplicated against a constant in another module
- Windows path-separator bug in a HUD-file detector
- FORBIDDEN list scope mismatched its own doc comment
- test invocation command copy-paste error
- Cargo.toml missing [[test]] entries (deferred-then-cascaded)
These are bugs the existing rubric should have caught — symmetry
violations, comment-vs-code drift, missing edge cases, magic literals.
The gap was reviewer discipline: prompts delegated the bug-class checklist
to skills/requesting-code-review/SKILL.md by reference, and reviewers
glossed past it without applying every class.
Changes:
1. skills/subagent-driven-development/code-quality-reviewer-prompt.md —
inline a four-step required workflow (read git show <sha>, run the
tests yourself, full bug-class scan, Minor->Important promotion).
The bug-class checklist still lives in SKILL.md but the discipline
to apply it on every dispatch is now in the prompt template.
2. skills/subagent-driven-development/spec-reviewer-prompt.md —
inline a five-step workflow that requires walking every Acceptance
bullet and locating its corresponding diff change, running TDD
tests yourself (red must fail; green must pass), checking scope
creep, and grepping for reuse-of-existing-helpers.
3. skills/requesting-code-review/SKILL.md — add two new bug classes
to the checklist:
- Test hermeticity / non-determinism (real network/DB/clock/fs in
tests; promote to Important even if the test passes today).
- Cross-platform / portability (path-separator literals, bash-only
shell syntax, Unix-only env-var assumptions).
The four-step / five-step inlines duplicate spirit-of-rule that's also
in SKILL.md; this is intentional. Field experience: the reviewer skill
file gets read once at agent spawn and then forgotten on subsequent
review rounds. Inlining the discipline in the prompt template makes it
unavoidable on every dispatch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR tightens the reviewer prompt templates used by the subagent-driven-development workflow to enforce more disciplined reviews (read the real diff, run tests, apply the full checklist, and promote determinism/portability issues).
Changes:
- Inline a required step-by-step workflow into the spec reviewer prompt (acceptance-by-acceptance verification, TDD test execution, scope and reuse checks).
- Inline an adversarial, test-running, checklist-driven workflow into the code-quality reviewer prompt (including Minor→Important promotion rules).
- Extend the requesting-code-review bug-class checklist with new “test hermeticity” and “cross-platform/portability” classes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| skills/subagent-driven-development/spec-reviewer-prompt.md | Adds an explicit 5-step spec-review workflow and required output expectations. |
| skills/subagent-driven-development/code-quality-reviewer-prompt.md | Adds adversarial framing and a 4-step quality-review workflow including test execution and promotion rules. |
| skills/requesting-code-review/SKILL.md | Adds two new bug classes to the reviewer checklist: hermeticity and portability. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Phase 1 field-data refinements after running the tightened rubric on a 13-task implementation phase. Three patterns surfaced that the existing prompt didn't address explicitly: 1. **DM-routing slips.** Code-reviewer occasionally DM'd the wrong implementer when fixes were needed — typically the one who wrote a prior task's test fixture, rather than the current task's owner. Add: explicit "DM the implementer who CURRENTLY OWNS the task per TaskList owner field" rule. Same rule added to spec-reviewer prompt. 2. **Bookkeeping completion gap.** Code-reviewer consistently marked the "Review quality" task completed but missed flipping the "Implement" task to completed, leaving team-lead with a stragglers queue. Add: explicit "mark BOTH tasks completed" rule with note that code-reviewer is the only role that flips the Implement task. 3. **Vacuous-assertion bug class.** Phase 1 reviewer-loop caught a test whose assertion `p < Floor || p > Ceiling` was tautologically true for all the inputs it exercised — the function under test was correct but the test proved nothing. Existing checklist had "Test-name-vs-body mismatch" but didn't explicitly cover the tautology case. Add as a 14th bug class with explicit examples. Field results from Phase 1 with the tightened rubric (pre-refinement): - 4 fix cycles (3 caught by code-reviewer, 1 by spec-reviewer running tests directly) - Error-swallowing class caught twice (Tasks 1.7 and 1.13) by the same pattern recognition — pure validation of the existing checklist - Code-reviewer cited actual `go test` / `cargo test` output on every approval, walked failure modes one-by-one, and consistently produced audit-grade output Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SKILL.md: clarify cross-platform examples — PATH exists on Windows with `;` separator; HOME / XDG_* / SHELL are the truly Unix-only vars. Mention the PATH-separator gotcha explicitly. - spec-reviewer-prompt.md Step 1: prefer `git diff <BASE>..<HEAD>` when the task spans multiple commits; `git show <HEAD>` covers only the tip and would miss earlier task commits. - code-quality-reviewer-prompt.md Step 1: same clarification on range diff vs single-commit show. - code-quality-reviewer-prompt.md Step 2: drop hard-coded `-tags=e2e` for Go. The project's actual tag set is project-specific; forcing e2e tags can run non-hermetic tests, contradicting the rubric's own hermeticity rule. Examples now show: with project's actual tags; without tags; Rust with conditional features. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- SKILL.md (subagent-driven-development) line 202: code-reviewer now marks BOTH tasks completed (Review quality + Implement). Reconciles the canonical workflow with the prompt-template addition. Field experience showed leaving Implement tasks at in_progress strands bookkeeping for the orchestrator. - SKILL.md (requesting-code-review) Vacuous-assertion class: replaced the misleading example. Now distinguishes (a) syntactic tautology (`err == nil || err != nil`, `p >= Floor || p < Floor`) from (b) trivially-satisfied-for-every-test-input (`composed_probability == Floor` on inputs that never trigger clamping; `spawn_count == observed_count` where both come from the same in-memory list — this was the actual Phase 2 finding pattern). - code-quality-reviewer-prompt.md: same vacuous-assertion fix as SKILL.md table. - code-quality-reviewer-prompt.md "tests-only" command: replaced the vague "run only the changed test file" with language-specific smallest-scope examples — `go test ./<pkg>/ -run <TestName>` or full package for Go; `cargo test --test <binary>` or `-p <crate>` for Rust. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352
added a commit
that referenced
this pull request
Apr 25, 2026
Picks up the reviewer-prompt tightening from PR #21: - Inlined four-step / five-step required workflows in the subagent-driven-development reviewer prompt templates so reviewers apply the bug-class checklist + run-the-tests-yourself discipline on every dispatch - Three new bug classes in skills/requesting-code-review/SKILL.md: Test hermeticity / non-determinism, Cross-platform / portability, Vacuous assertion - DM-routing rule (DM the implementer who currently OWNS the task) and bookkeeping rule (code-reviewer marks BOTH the Review-quality AND the Implement task completed) reconciled across SKILL.md and the prompt template - Range-diff guidance for multi-commit tasks (`git diff <base>..<head>`) - Project-appropriate test invocations (no hard-coded `-tags=e2e`) Field-validated across 24 implementation tasks (Phase 1 + Phase 2 of a Core Runner companion-game scaffold): 4 vacuous-assertion catches, 3 error-swallowing catches, 2 cross-task cascade preemptions, code- reviewers consistently citing actual test-run output. Additive only. No breaking changes to existing skill APIs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352
added a commit
that referenced
this pull request
Apr 25, 2026
Picks up the reviewer-prompt tightening from PR #21: - Inlined four-step / five-step required workflows in the subagent-driven-development reviewer prompt templates so reviewers apply the bug-class checklist + run-the-tests-yourself discipline on every dispatch - Three new bug classes in skills/requesting-code-review/SKILL.md: Test hermeticity / non-determinism, Cross-platform / portability, Vacuous assertion - DM-routing rule (DM the implementer who currently OWNS the task) and bookkeeping rule (code-reviewer marks BOTH the Review-quality AND the Implement task completed) reconciled across SKILL.md and the prompt template - Range-diff guidance for multi-commit tasks (`git diff <base>..<head>`) - Project-appropriate test invocations (no hard-coded `-tags=e2e`) Field-validated across 24 implementation tasks (Phase 1 + Phase 2 of a Core Runner companion-game scaffold): 4 vacuous-assertion catches, 3 error-swallowing catches, 2 cross-task cascade preemptions, code- reviewers consistently citing actual test-run output. Additive only. No breaking changes to existing skill APIs. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Field experience running
subagent-driven-developmenton a Phase 0 + Phase 1 + Phase 2 sequence (Core Runner companion-game scaffold for the Core Dump MMORPG) showed that internal reviewers were systematically passing PRs that Copilot subsequently flagged. The existing rubric inskills/requesting-code-review/SKILL.mdwas comprehensive — twelve bug classes with adversarial framing — but reviewers were reflexively approving once each spec acceptance bullet had a matching file in the diff, without running the test the spec described, without applying every bug class, and without promoting Minor determinism issues to Important.This PR tightens the prompt templates to inline the discipline that the bug-class checklist alone wasn't enforcing. Plus three new bug classes (Test hermeticity, Cross-platform / portability, Vacuous assertion) all validated by repeated field catches across Phases 1 and 2.
Field validation
Across 24 implementation tasks (Phase 1 + 2) under the tightened rubric:
_ = err/ unchecked-Scan patternCode-reviewer now consistently cites actual
go test/cargo testoutput on every approval. Spec-reviewer runs TDD red/green checks themselves rather than trusting implementer reports.Changes
skills/subagent-driven-development/code-quality-reviewer-prompt.mdgit diff <BASE>..<HEAD>for multi-commit tasks; singlegit show <HEAD>only when verified single-commit), run the tests yourself with project-appropriate invocation, run the full bug-class checklist, apply Minor→Important promotion.skills/subagent-driven-development/spec-reviewer-prompt.mdskills/requesting-code-review/SKILL.md(bug-class checklist) — adds three new classes:/path literals where portable APIs exist, bash-only shell syntax, Unix-only env vars (with note thatPATHexists on Windows with;separator).err == nil || err != nil,p >= Floor || p < Floor) from (b) trivially-satisfied-for-every-test-input (the more common field pattern, e.g. assertingcomposed_probability == Flooron inputs that never trigger clamping).skills/subagent-driven-development/SKILL.md— code-reviewer now marks BOTH the "Review quality:" task AND the corresponding "Implement:" task as completed. Field experience: leaving Implement tasks atin_progressafter quality approval strands bookkeeping for the orchestrator. The prompt template and SKILL.md are now consistent on this rule.Phase 2 retro additions:
TaskList, not whoever sent the most recent message)Why inline rules that already exist in SKILL.md?
Bug-class checklists and adversarial framing live in
requesting-code-review/SKILL.md, but field experience shows that delegating these by reference alone produces reviewers who "review the work" rather than adversarially scan it. The reviewer reads the spec, sees the diff matches, signs off — without running tests, without applying every bug class.The four-step / five-step inlines duplicate spirit-of-rule that's also in SKILL.md. This is intentional. The reviewer skill file gets read once at agent spawn and then forgotten on subsequent review rounds. Inlining the discipline in the prompt template makes it unavoidable on every dispatch.
Test plan
🤖 Generated with Claude Code