37 changes: 36 additions & 1 deletion README.md
@@ -6,7 +6,7 @@ Adversarial bug finding skill for [Claude Code](https://claude.com/claude-code).

## How it works

Inspired by [@systematicls's article]([https://x.com/systematicls](https://x.com/systematicls/status/2028814227004395561)) on exploiting LLM sycophancy for better code review:
Inspired by [@systematicls's article](https://x.com/systematicls/status/2028814227004395561) on exploiting LLM sycophancy for better code review:

1. **Hunter** - Scans your code and reports every possible bug (biased to over-report)
2. **Skeptic** - Tries to disprove each bug (biased to dismiss false positives)
@@ -34,6 +34,41 @@ Claude Code auto-discovers skills in `~/.claude/skills/`.

**Branch diff mode** (`-b`) scans only files changed in a branch compared to a base branch (defaults to `main`). It reads the full file contents — not just the diff — so bug detection quality is preserved.

### Dynamic Model Assignment

Assign different AI providers to each role using CLI flags. Defaults to all-Claude when no flags are given.

**Presets:**

```
/bug-hunt --preset=claude src/ # All Claude (default)
/bug-hunt --preset=codex src/ # All Codex
/bug-hunt --preset=gemini src/ # All Gemini
/bug-hunt --preset=mixed src/ # Hunter=Codex, Skeptic=Claude, Referee=Gemini
```

**Individual role overrides:**

```
/bug-hunt --hunter=codex --skeptic=claude --referee=gemini src/
/bug-hunt --hunter=codex src/ # Only Hunter uses Codex, rest default to Claude
```

Individual flags override preset values:

```
/bug-hunt --preset=codex --referee=claude src/ # Codex for Hunter+Skeptic, Claude for Referee
```

**Supported providers:**

| Provider | CLI Required | Install |
|----------|-------------|---------|
| `claude` | None (built-in) | Included with Claude Code |
| `codex` | [Codex CLI](https://github.com/openai/codex) | `npm install -g @openai/codex` |
| `gemini` | [Gemini CLI](https://github.com/google-gemini/gemini-cli) | `npm install -g @google/gemini-cli` |

Claude roles run as isolated Claude Code subagents. Codex and Gemini roles shell out to their respective CLI tools.

## Update

174 changes: 138 additions & 36 deletions SKILL.md
@@ -1,7 +1,7 @@
---
name: bug-hunt
description: "Run adversarial bug hunting on your codebase. Uses 3 isolated agents (Hunter, Skeptic, Referee) to find and verify real bugs with high fidelity. Invoke with /bug-hunt, /bug-hunt [path], or /bug-hunt -b <branch> [--base <base>]."
argument-hint: "[path | -b <branch> [--base <base-branch>]]"
description: "Run adversarial bug hunting on your codebase. Uses 3 isolated agents (Hunter, Skeptic, Referee) to find and verify real bugs with high fidelity. Supports dynamic model assignment with --preset and --hunter/--skeptic/--referee flags, plus branch-diff scans with -b/--base."
argument-hint: "[--preset=claude|codex|gemini|mixed] [--hunter=claude|codex|gemini] [--skeptic=claude|codex|gemini] [--referee=claude|codex|gemini] [path | -b <branch> [--base <base-branch>]]"
disable-model-invocation: true
---

@@ -12,77 +12,179 @@ Run a 3-agent adversarial bug hunt on your codebase. Each agent runs in isolatio
## Usage

```
/bug-hunt # Scan entire project
/bug-hunt src/ # Scan specific directory
/bug-hunt lib/auth.ts # Scan specific file
/bug-hunt -b feature-xyz # Scan files changed in feature-xyz vs main
/bug-hunt -b feature-xyz --base dev # Scan files changed in feature-xyz vs dev
/bug-hunt
/bug-hunt src/
/bug-hunt lib/auth.ts
/bug-hunt -b feature-xyz
/bug-hunt -b feature-xyz --base dev
/bug-hunt --preset=mixed src/
/bug-hunt --hunter=codex --referee=gemini -b feature-xyz --base dev
```

## Target
## Arguments

The raw arguments are: $ARGUMENTS

**Parse the arguments as follows:**
## Step 0: Parse and Validate Arguments

1. If arguments contain `-b <branch>`: this is a **branch diff mode**.
Parse the arguments string above to extract:

1. **Provider selection flags**: `--preset=`, `--hunter=`, `--skeptic=`, `--referee=`
2. **Branch diff flags**: `-b <branch>` and optional `--base <base-branch>`
3. **Scan target**: either a path target or the explicit file list from branch diff mode

**Valid providers:** `claude`, `codex`, `gemini`

**Validation:** If any role-specific provider flag (`--hunter`, `--skeptic`, or `--referee`) contains a value other than `claude`, `codex`, or `gemini`, stop immediately and report the error:
```
Error: Invalid provider "[value]". Valid providers are: claude, codex, gemini
```

If `--preset` contains a value other than `claude`, `codex`, `gemini`, or `mixed`, stop immediately and report the error:
```
Error: Invalid preset "[value]". Valid presets are: claude, codex, gemini, mixed
```

**Preset definitions:**

| Preset | Hunter | Skeptic | Referee |
|--------|--------|---------|---------|
| `claude` (default) | claude | claude | claude |
| `codex` | codex | codex | codex |
| `gemini` | gemini | gemini | gemini |
| `mixed` | codex | claude | gemini |

**Resolution order:**
1. Start with the preset (default: `claude`)
2. Individual `--hunter=`, `--skeptic=`, `--referee=` flags override the preset for that role
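
The validation rules, preset table, and resolution order above can be sketched as a small shell function. This is only an illustration of the logic, not how the skill itself parses arguments (the skill asks the model to do that). The name `resolve_roles` and its output format (three space-separated providers in Hunter, Skeptic, Referee order) are assumptions made for the example, as is treating an empty flag value such as `--hunter=` as "not set".

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve_roles is not part of the skill. It only
# illustrates "start with the preset, then apply per-role overrides".
resolve_roles() {
  local preset="claude" hunter="" skeptic="" referee="" arg value
  for arg in "$@"; do
    case "$arg" in
      --preset=*)  preset="${arg#--preset=}" ;;
      --hunter=*)  hunter="${arg#--hunter=}" ;;
      --skeptic=*) skeptic="${arg#--skeptic=}" ;;
      --referee=*) referee="${arg#--referee=}" ;;
    esac
  done

  # Role flags accept only the three providers (empty means "not given").
  for value in "$hunter" "$skeptic" "$referee"; do
    case "$value" in
      ""|claude|codex|gemini) ;;
      *) echo "Error: Invalid provider \"$value\". Valid providers are: claude, codex, gemini" >&2
         return 1 ;;
    esac
  done

  # Step 1: expand the preset into per-role defaults.
  local h s r
  case "$preset" in
    claude|codex|gemini) h="$preset"; s="$preset"; r="$preset" ;;
    mixed) h="codex"; s="claude"; r="gemini" ;;
    *) echo "Error: Invalid preset \"$preset\". Valid presets are: claude, codex, gemini, mixed" >&2
       return 1 ;;
  esac

  # Step 2: an individual flag overrides the preset for that role only.
  echo "${hunter:-$h} ${skeptic:-$s} ${referee:-$r}"
}
```

For example, `resolve_roles --preset=codex --referee=claude src/` prints `codex codex claude`: the preset fills all three roles, then the Referee override replaces one of them.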

**Target resolution:**

1. If arguments contain `-b <branch>`, use **branch diff mode**.
- Extract the branch name after `-b`.
- If `--base <base-branch>` is also present, use that as the base branch. Otherwise default to `main`.
- Run `git diff --name-only <base>...<branch>` using the Bash tool to get the list of changed files.
- If the command fails (e.g. branch not found), report the error to the user and stop.
- If the command fails (for example, branch not found), report the error to the user and stop.
- If no files changed, tell the user there are no changes to scan and stop.
- The scan target is the list of changed files (scan their full contents, not just the diff).
2. If arguments do NOT contain `-b`: treat the entire argument string as a **path target** (file or directory). If empty, scan the current working directory.

## Execution Steps
- The scan target is the list of changed files. Scan their full contents, not just the diff.
2. If arguments do NOT contain `-b`, treat the remaining non-provider arguments as a path target (file or directory). If empty, scan the current working directory.
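
As a rough sketch of branch diff mode, the helpers below build the `<base>...<branch>` range and handle the two stop conditions (git failure, no changed files). The function names and the wording of the messages are hypothetical. The three-dot form compares the branch against its merge base with the base branch, so only files changed on the branch since it diverged are listed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating branch diff mode; not part of the skill.

# Build the range argument, defaulting the base branch to "main".
diff_range() {
  local branch="$1" base="${2:-main}"
  printf '%s...%s\n' "$base" "$branch"
}

# List files changed on <branch> relative to its merge base with <base>.
changed_files() {
  local range
  range="$(diff_range "$@")"
  git diff --name-only "$range"
}

# Resolve the scan target, stopping on git errors or an empty diff.
scan_branch() {
  local files
  if ! files="$(changed_files "$@")"; then
    echo "Error: git diff failed (for example, branch not found)." >&2
    return 1
  fi
  if [ -z "$files" ]; then
    echo "No changes to scan."
    return 0
  fi
  printf '%s\n' "$files"
}
```

Calling `scan_branch feature-xyz dev` inside a repository would print the changed file paths one per line. The agents then read those files in full rather than the diff hunks.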

You MUST follow these steps in exact order. Each agent runs as a separate subagent via the Agent tool to ensure context isolation.
**Path validation:** If a path target was specified, verify it exists using Glob or the filesystem before proceeding. If the target does not exist, stop immediately and report:
```
Error: Scan target "[path]" does not exist.
```

### Step 1: Parse arguments and resolve target
After parsing and validation, announce the configuration to the user:

Follow the rules in the **Target** section above to determine the scan target. If in branch diff mode, run the git diff command now and collect the file list.
```
Bug Hunt Configuration:
Hunter: [provider]
Skeptic: [provider]
Referee: [provider]
Target: [path, "current directory", or changed files from <base>...<branch>]
```

### Step 2: Read the prompt files
## Step 1: Read the prompt files

Read these files using the skill directory variable:
- ${CLAUDE_SKILL_DIR}/prompts/hunter.md
- ${CLAUDE_SKILL_DIR}/prompts/skeptic.md
- ${CLAUDE_SKILL_DIR}/prompts/referee.md

### Step 3: Run the Hunter Agent
## Step 2: Run the Hunter Agent

Dispatch the Hunter based on its assigned provider:

**If provider is `claude`:**
Launch a general-purpose subagent with the hunter prompt via the Agent tool. Include the resolved scan target in the agent's task. If in branch diff mode, pass the explicit file list so the Hunter only scans those files, using their full contents. The Hunter must use tools (Read, Glob, Grep) to examine the actual code.

Launch a general-purpose subagent with the hunter prompt. Include the scan target in the agent's task. If in branch diff mode, pass the explicit file list so the Hunter only scans those files (full contents). The Hunter must use tools (Read, Glob, Grep) to examine the actual code.
**If provider is `codex` or `gemini`:**
Write the full prompt (hunter instructions + scan target) to a unique temporary file using `mktemp`:

```bash
PROMPT_FILE=$(mktemp /tmp/bug-hunt-hunter-XXXXXX.md)
```

Wait for the Hunter to complete and capture its full output.
Write the prompt content to `$PROMPT_FILE` using the Write tool, then invoke the CLI via stdin:

### Step 3b: Check for findings
- **Codex:** `cat "$PROMPT_FILE" | codex exec -`
- **Gemini:** `cat "$PROMPT_FILE" | gemini -p -`

If the Hunter reported TOTAL FINDINGS: 0, skip Steps 4-5 and go directly to Step 6 with a clean report. No need to run Skeptic and Referee on zero findings.
**Never interpolate prompt content or scan target directly into shell command strings.** Always pass via stdin or file to prevent injection.

### Step 4: Run the Skeptic Agent
If the external CLI is not installed, report the error clearly and suggest installing it. Clean up the temp file after capturing the output.

Launch a NEW general-purpose subagent with the skeptic prompt. Inject the Hunter's structured bug list (BUG-IDs, files, lines, claims, evidence, severity, points). Do NOT include any narrative or methodology text outside the structured findings.
Wait for completion and capture the full output.
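
The temp-file-plus-stdin pattern described above might look like the following wrapper. This is a sketch under assumptions: `run_with_prompt` is a hypothetical name, and it writes the prompt with `printf` where the skill uses the Write tool. The template ends in `X`s because some `mktemp` implementations only substitute trailing `X`s. The key property is that the prompt text is never spliced into a command string, so shell metacharacters inside it are treated as plain data.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper; illustrative only, not part of the skill.
# Usage: run_with_prompt <role> <prompt text> <cli> [cli args...]
run_with_prompt() {
  local role="$1" prompt="$2"
  shift 2

  # Fail early with a clear message if the external CLI is missing.
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Error: '$1' is not installed or not on PATH." >&2
    return 127
  fi

  local prompt_file status=0
  prompt_file="$(mktemp "${TMPDIR:-/tmp}/bug-hunt-$role-XXXXXX")" || return 1

  # The prompt travels as data: written to a file, then fed on stdin.
  printf '%s\n' "$prompt" > "$prompt_file"
  "$@" < "$prompt_file" || status=$?

  # Clean up whether or not the CLI succeeded, then report its status.
  rm -f "$prompt_file"
  return "$status"
}
```

With this wrapper, the two invocations shown above would become `run_with_prompt hunter "$PROMPT" codex exec -` and `run_with_prompt hunter "$PROMPT" gemini -p -`, where `$PROMPT` is assumed to hold the hunter instructions plus the scan target.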

## Step 2b: Check Hunter success and findings

**First, verify the Hunter completed successfully.** If the Hunter agent failed (CLI error, crash, empty output, or missing structured findings), stop immediately and report the error to the user. Do not proceed to the Skeptic or Referee.

If the Hunter reported TOTAL FINDINGS: 0, skip Steps 3-4 and go directly to Step 5 with a clean report. No need to run Skeptic and Referee on zero findings.
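
The gate in this step can be sketched as a three-way classification of the Hunter's output. The function name `hunter_gate` and the labels `failed`, `clean`, and `findings` are invented for the example. It also assumes that "missing structured findings" can be detected by the absence of a `TOTAL FINDINGS: <n>` line, which the hunter prompt may or may not guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical gate; assumes the Hunter output contains "TOTAL FINDINGS: <n>".
# Usage: hunter_gate <hunter output> [exit code of the Hunter run]
hunter_gate() {
  local output="$1" exit_code="${2:-0}" total

  # CLI error, crash, or empty output: stop before the Skeptic and Referee.
  if [ "$exit_code" -ne 0 ] || [ -z "$output" ]; then
    echo "failed"
    return
  fi

  # Extract the count from the first TOTAL FINDINGS line, if there is one.
  total="$(printf '%s\n' "$output" \
    | sed -n 's/.*TOTAL FINDINGS: *\([0-9][0-9]*\).*/\1/p' \
    | head -n 1)"

  if [ -z "$total" ]; then
    echo "failed"      # no structured findings in the output
  elif [ "$total" -eq 0 ]; then
    echo "clean"       # skip Skeptic and Referee, go to the final report
  else
    echo "findings"    # continue to the Skeptic
  fi
}
```

Only the `findings` outcome continues to the Skeptic. `clean` jumps to the final report, and `failed` stops the run with an error.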

## Step 3: Run the Skeptic Agent

Dispatch the Skeptic based on its assigned provider:

**If provider is `claude`:**
Launch a NEW general-purpose subagent with the skeptic prompt via the Agent tool. Inject the Hunter's structured bug list (BUG-IDs, files, lines, claims, evidence, severity, points). Do NOT include any narrative or methodology text outside the structured findings.

The Skeptic must independently read the code to verify each claim.

Wait for the Skeptic to complete and capture its full output.
**If provider is `codex` or `gemini`:**
Write the full prompt (skeptic instructions + Hunter's structured bug list) to a unique temporary file using `mktemp`:

```bash
PROMPT_FILE=$(mktemp /tmp/bug-hunt-skeptic-XXXXXX.md)
```

Write the prompt content to `$PROMPT_FILE`, then invoke the CLI via stdin:

- **Codex:** `cat "$PROMPT_FILE" | codex exec -`
- **Gemini:** `cat "$PROMPT_FILE" | gemini -p -`

**Never interpolate prompt content or report data directly into shell command strings.** Always pass via stdin or file.

### Step 5: Run the Referee Agent
Clean up the temp file after capturing the output.

Launch a NEW general-purpose subagent with the referee prompt. Inject BOTH:
Wait for completion and capture the full output.

## Step 4: Run the Referee Agent

Dispatch the Referee based on its assigned provider:

**If provider is `claude`:**
Launch a NEW general-purpose subagent with the referee prompt via the Agent tool. Inject BOTH:
- The Hunter's full bug report
- The Skeptic's full challenge report

The Referee must independently read the code to make final judgments.

Wait for the Referee to complete and capture its full output.
**If provider is `codex` or `gemini`:**
Write the full prompt (referee instructions + Hunter's report + Skeptic's report) to a unique temporary file using `mktemp`:

```bash
PROMPT_FILE=$(mktemp /tmp/bug-hunt-referee-XXXXXX.md)
```

Write the prompt content to `$PROMPT_FILE`, then invoke the CLI via stdin:

- **Codex:** `cat "$PROMPT_FILE" | codex exec -`
- **Gemini:** `cat "$PROMPT_FILE" | gemini -p -`

**Never interpolate prompt content or report data directly into shell command strings.** Always pass via stdin or file.

Clean up the temp file after capturing the output.

Wait for completion and capture the full output.

### Step 6: Present the Final Report
## Step 5: Present the Final Report

Display the Referee's final verified bug report to the user. Include:
1. The summary stats
2. The confirmed bugs table (sorted by severity)
3. Low-confidence items flagged for manual review
4. A collapsed section with dismissed bugs (for transparency)
1. The configuration used (which provider ran each role)
2. The summary stats
3. The confirmed bugs table (sorted by severity)
4. Low-confidence items flagged for manual review
5. A collapsed section with dismissed bugs (for transparency)

If zero bugs were confirmed, say so clearly a clean report is a good result.
If zero bugs were confirmed, say so clearly - a clean report is a good result.