Harden review prompts for consistency and noise reduction#579
Harden review prompts for consistency and noise reduction#579mariusvniekerk wants to merge 13 commits intomainfrom
Conversation
Bare "high/medium/low" labels give agents no shared calibration standard, leading to inconsistent severity across reviews. Defining each level in terms of real-world impact (data loss, exploitability, blast radius) aligns all agents on the same scale and naturally prevents low-value findings from being over-rated. Inspired by the impact × breadth scoring pattern from research-oriented analysis skills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace "brief explanation of the problem" with "what specifically goes wrong if this is not fixed." This is the articulation test pattern from research-oriented analysis skills — every finding must justify itself with concrete impact reasoning, not just pattern-matching against a checklist. Findings like "this violates best practices" become impossible to write when the prompt demands specific harm. This is the single most effective noise reduction technique across the mop-mapping skill set. Applied to all review types: standard, dirty, range, security, and design. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /rethink skill uses explicit evidence thresholds — "1 observation is a data point, 3+ is a pattern worth investigating." The /verify skill grounds every check in specific data. Applied here as negative prompt instructions that suppress the most common false positive categories: hypothetical issues in unseen code, style preferences, unfounded "missing tests" claims, and flagging patterns that match existing codebase conventions. Security reviews get a lighter version — they should still err toward reporting, but not flag theoretical vulnerabilities in untouched code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /verify skill's "recite" phase is its most powerful technique: read only the title, predict what the content should be, then check alignment. Applied here by reversing the old instruction "Do not review the commit message" — the commit message now becomes the primary lens for evaluating the diff. When a commit says "fix race condition" but the diff adds a mutex on the wrong resource, that's a high-value finding that pure diff-scanning misses. Intent-implementation gaps are now the first check category, above bugs and security, because they catch the class of errors where the code is internally consistent but doesn't do what the developer intended. The dirty-changes prompt is unchanged since uncommitted changes have no commit message to analyze. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /verify and /synthesize skills both enforce quality gates — checks that must pass before output is considered complete. Applied here as a final self-verification instruction: every finding must reference a specific diff location, severity must match the described impact, and no two findings may contradict each other. Findings that fail these checks are dropped. This catches the most embarrassing review failures (high-severity verdict with no actual line references, "pass" with critical findings listed) at near-zero cost since the model performs the check during the same generation. Applied to all review types: standard, dirty, range, and security. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /rethink skill's evidence accumulation pattern — "1 observation is a data point, 3+ is a pattern worth investigating" — directly applies to the insights system. Without explicit thresholds, the insights agent may recommend guideline changes from 1-2 occurrences (noise) or hesitate on strong 6+ patterns. Added tiered thresholds to the recurring patterns section and gated guideline suggestions on minimum 3 occurrences. This helps close the feedback loop between review noise and guideline refinement with appropriate confidence levels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
Short or meaningless commit messages like "fix", "wip", or "update" don't carry enough signal for an intent-implementation comparison. When the message is vague, the reviewer now infers intent from the diff itself and skips the alignment check rather than fixating on a low-information message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
Two review findings: 1. Quality gate required exact file-and-line for every finding, which would discard legitimate omission-based findings (missing test coverage, architectural gaps). Now requires "narrowest applicable location" — line when possible, file or diff-level when the issue is an omission or span. 2. Range prompt compared individual commit messages against the aggregate diff. In multi-commit ranges, later commits intentionally refine earlier ones, producing false "intent gap" findings. Now validates whether the final result achieves the series' overall goal instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
Commit messages are attacker-controlled input in shared repositories and could contain prompt-like instructions to suppress or skew review findings. Two mitigations: 1. System prompts now explicitly state that commit messages are untrusted descriptive context that must never be followed as instructions. 2. The builder wraps embedded commit message content in <commit-message> and <commit-messages> XML tags with context-only="true" to clearly demarcate where untrusted data begins and ends. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
roborev: Combined Review (
|
A crafted commit message containing </commit-message> could break out of the context-only wrapper and inject top-level instructions into the review prompt. Use encoding/xml.EscapeText from stdlib to escape all interpolated commit metadata (subject, author, body) before embedding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
Return a safe placeholder instead of silently ignoring the error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
Commit metadata lives in the trimmable currentOverflow section and may be dropped when the diff is very large. The system prompts now conditionally enable intent-alignment: "If a <commit-message> tag is present..." with an explicit fallback to infer intent from the diff when no message is available. This avoids fabricated intent-based findings when prompt budget trimming removes the commit message, while keeping the metadata trimmable so oversized subjects/authors don't blow the budget for Codex fallback variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
|
Going to handle all this overflow crap with some follow on templating work |
Summary
high/medium/lowlabels with concrete definitions tied to real-world impact (data loss, exploitability, blast radius). Gives all agents a shared calibration standard so severity is consistent across reviews.🤖 Generated with Claude Code