From 8656a5fefc6522a2434bd36022d3be31bb6f2086 Mon Sep 17 00:00:00 2001 From: clydle <161477597+clydle@users.noreply.github.com> Date: Sat, 9 May 2026 15:16:53 -0400 Subject: [PATCH] codex: add verify mode + Claude-skepticism preamble (v1.1.0) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds two hardening changes to the /codex skill: A) SKEPTICISM PREAMBLE injected into all three existing modes (Review, Challenge, Consult). Codex is instructed to flag as P1: unverified third-party UI/API claims, bulk actions without single-item test prerequisites, confidence-language without evidence, and WordPress URL changes without DB reference checks. B) New Step 2D (Verify mode): /codex verify — pastes a Claude recommendation and gets independent fact-checking from Codex. Each external-system claim gets CLAIM/CITATION/VERDICT/REASONING/P1 entries and a single GATE: PASS/FAIL line. Both changes use tempfile + stdin (codex review - < file, codex exec - < file) rather than heredocs or positional args, sidestepping shell-escaping, EOF-collision, and Windows command-line length limits. Closes the 2026-05-09 ShipStation incident failure path: Claude recommended clicking "Send Notifications" (label-based inference, no doc verification), which re-sent shipping emails to 814 customers instead of replaying webhooks. Verify mode would have returned GATE: FAIL on that recommendation. Also bumps version 1.0.0 → 1.1.0 and updates frontmatter description. Co-Authored-By: Claude Sonnet 4.6 --- codex/SKILL.md | 512 ++++++++++++++++++++++++++++++++++++++++---- codex/SKILL.md.tmpl | 510 +++++++++++++++++++++++++++++++++++++++---- 2 files changed, 937 insertions(+), 85 deletions(-) diff --git a/codex/SKILL.md b/codex/SKILL.md index 77705f7e1d..d2a980bd00 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -1,12 +1,14 @@ --- name: codex -version: 1.0.0 +version: 1.1.0 description: | - OpenAI Codex CLI wrapper — three modes. Code review: independent diff review via - codex review with pass/fail gate. Challenge: adversarial mode that tries to break - your code. Consult: ask codex anything with session continuity for follow-ups. - The "200 IQ autistic developer" second opinion. Use when asked to "codex review", - "codex challenge", "ask codex", "second opinion", or "consult codex". + OpenAI Codex CLI wrapper — four modes. Code review: independent diff review + with pass/fail gate, Claude-skepticism preamble. Challenge: adversarial mode + that tries to break your code. Consult: ask codex anything with session + continuity. Verify: audit a recommendation Claude gave you, independently — + closes the 2026-05-09 ShipStation-class failure path. Use when asked to + "codex review", "codex challenge", "codex verify", "ask codex", "second + opinion", or "consult codex". allowed-tools: - Bash - Read @@ -251,6 +253,53 @@ assumptions, catches things you might miss. Present its output faithfully, not s --- +## Claude Code Trust Boundaries + +The /codex skill exists because Claude Code has a documented failure mode of +making confident, plausible-sounding claims about third-party systems without +verifying them against documentation. On 2026-05-09 Claude recommended clicking +ShipStation's "Send Notifications" button as "the preferred path" for replaying +WooCommerce shipnotify webhooks. The button does not replay webhooks — it +re-sends ShipStation's own branded shipping confirmation emails. 814 customers +received duplicate emails for orders shipped weeks earlier. Source: button-label +inference, no doc verification. + +When constructing any codex prompt — Review, Challenge, Consult, or Verify — +prepend the following SKEPTICISM PREAMBLE verbatim: + +──────────────────────────────────────────────────────────────────── +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. +──────────────────────────────────────────────────────────────────── + +--- + ## Step 0: Check codex binary ```bash @@ -269,7 +318,8 @@ Parse the user's input to determine which mode to run: 1. `/codex review` or `/codex review ` — **Review mode** (Step 2A) 2. `/codex challenge` or `/codex challenge ` — **Challenge mode** (Step 2B) -3. `/codex` with no arguments — **Auto-detect:** +3. `/codex verify` or `/codex verify ` — **Verify mode** (Step 2D) +4. `/codex` with no arguments — **Auto-detect:** - Check for a diff (with fallback if origin isn't available): `git diff origin/ --stat 2>/dev/null | tail -1 || git diff --stat 2>/dev/null | tail -1` - If a diff exists, use AskUserQuestion: @@ -277,48 +327,90 @@ Parse the user's input to determine which mode to run: Codex detected changes against the base branch. What should it do? A) Review the diff (code review with pass/fail gate) B) Challenge the diff (adversarial — try to break it) - C) Something else — I'll provide a prompt + C) Verify a recommendation Claude gave me (paste it next) + D) Something else — I'll provide a prompt ``` - If no diff, check for plan files scoped to the current project: `ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/dev/null | head -1` If no project-scoped match, fall back to: `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1` but warn the user: "Note: this plan may be from a different project." - If a plan file exists, offer to review it - - Otherwise, ask: "What would you like to ask Codex?" -4. `/codex ` — **Consult mode** (Step 2C), where the remaining text is the prompt + - Otherwise, ask: "Did you want to verify a recommendation Claude gave you, or ask Codex something else?" +5. `/codex ` — **Consult mode** (Step 2C), where the remaining text is the prompt --- ## Step 2A: Review Mode -Run Codex code review against the current branch diff. +Run Codex code review against the current branch diff, with the SKEPTICISM PREAMBLE prepended. -1. Create temp files for output capture: +1. Create temp files: ```bash +TMPPROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.txt) TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) ``` -2. Run the review (5-minute timeout): +2. Write the assembled prompt (preamble + review body) to the temp file. Substitute + `` with the detected base branch before writing. Use a single-quoted heredoc + since the SKEPTICISM PREAMBLE contains no shell variables that need expansion: ```bash -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +Now perform code review on the diff against the base branch. Apply P1/P2 +markers per existing convention, plus the additions from the preamble above. +EOF_PREAMBLE +``` + +If the user provided custom instructions (e.g., `/codex review focus on security`), +append them after the body — they never replace the preamble: +```bash +printf '\n\nAdditional user focus: %s\n' "$USER_FOCUS" >> "$TMPPROMPT" ``` -Use `timeout: 300000` on the Bash call. If the user provided custom instructions -(e.g., `/codex review focus on security`), pass them as the prompt argument: +3. Run the review via stdin (5-minute timeout). `codex review` accepts `-` to read + prompt from stdin: ```bash -codex review "focus on security" --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review - --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" < "$TMPPROMPT" ``` -3. Capture the output. Then parse cost from stderr: +4. Capture the output. Then parse cost from stderr: ```bash grep "tokens used" "$TMPERR" 2>/dev/null || echo "tokens: unknown" ``` -4. Determine gate verdict by checking the review output for critical findings. +5. Determine gate verdict by checking the review output for critical findings. If the output contains `[P1]` — the gate is **FAIL**. If no `[P1]` markers are found (only `[P2]` or no findings) — the gate is **PASS**. -5. Present the output: +6. Present the output: ``` CODEX SAYS (code review): @@ -334,7 +426,7 @@ or GATE: FAIL (N critical findings) ``` -6. **Cross-model comparison:** If `/review` (Claude's own review) was already run +7. **Cross-model comparison:** If `/review` (Claude's own review) was already run earlier in this conversation, compare the two sets of findings: ``` @@ -343,9 +435,19 @@ CROSS-MODEL ANALYSIS: Only Codex found: [findings unique to Codex] Only Claude found: [findings unique to Claude's /review] Agreement rate: X% (N/M total unique findings overlap) + +EXTERNAL-SYSTEM DISAGREEMENTS (if any): + When Codex and Claude disagree on a claim about a third-party UI/API/plugin/ + vendor system, surface the disagreement with: + "Codex contradicts Claude on . Codex cites: . Claude's + reasoning was: ." + Per the SKEPTICISM PREAMBLE, the user should default to Codex's verdict + unless they produce explicit contradicting evidence (a fetched doc URL or + verified test result). This rule is stated once in the preamble; this + section just makes the disagreement findable. ``` -7. Persist the review result: +8. Persist the review result: ```bash ~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}' ``` @@ -353,9 +455,9 @@ CROSS-MODEL ANALYSIS: Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers). -8. Clean up temp files: +9. Clean up temp files: ```bash -rm -f "$TMPERR" +rm -f "$TMPPROMPT" "$TMPERR" ``` --- @@ -365,18 +467,91 @@ rm -f "$TMPERR" Codex tries to break your code — finding edge cases, race conditions, security holes, and failure modes that a normal review would miss. -1. Construct the adversarial prompt. If the user provided a focus area -(e.g., `/codex challenge security`), include it: - -Default prompt (no focus): -"Review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." +1. Create a temp file: +```bash +TMPPROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.txt) +``` -With focus (e.g., "security"): -"Review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial." +2. Write the assembled prompt (preamble + adversarial body) to the temp file. + Substitute `` with the actual detected base branch before writing. -2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout): + Default prompt (no focus): +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +Now: review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. +EOF_PREAMBLE +``` + + With a user-specified focus (e.g., "security") — substitute `` and + `` with their actual values before writing: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +Now: review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Focus specifically on . Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial. +EOF_PREAMBLE +``` + +3. Run codex exec with **JSONL output** via stdin (5-minute timeout). `codex exec` + accepts `-` to read prompt from stdin: ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | python3 -c " +codex exec - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null < "$TMPPROMPT" | python3 -c " import sys, json for line in sys.stdin: line = line.strip() @@ -407,7 +582,12 @@ for line in sys.stdin: This parses codex's JSONL events to extract reasoning traces, tool calls, and the final response. The `[codex thinking]` lines show what codex reasoned through before its answer. -3. Present the full streamed output: +4. Clean up: +```bash +rm -f "$TMPPROMPT" +``` + +5. Present the full streamed output: ``` CODEX SAYS (adversarial challenge): @@ -437,6 +617,7 @@ B) Start a new conversation 2. Create temp files: ```bash +TMPPROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.txt) TMPRESP=$(mktemp /tmp/codex-resp-XXXXXX.txt) TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) ``` @@ -448,20 +629,91 @@ ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/d ``` If no project-scoped match, fall back to `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1` but warn: "Note: this plan may be from a different project — verify before sending to Codex." -Read the plan file and prepend the persona to the user's prompt: -"You are a brutally honest technical reviewer. Review this plan for: logical gaps and + +4. Assemble the prompt with the SKEPTICISM PREAMBLE prepended and write to the temp file. + + For **plan review**, read the plan content and include it in the prompt: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +You are a brutally honest technical reviewer. Review this plan for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems. THE PLAN: -" + +EOF_PREAMBLE +``` -4. Run codex exec with **JSONL output** to capture reasoning traces (5-minute timeout): + For **general consult**, write the preamble then append the user's prompt: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. +EOF_PREAMBLE +printf '\n\n%s\n' "$USER_PROMPT" >> "$TMPPROMPT" +``` + +5. Run codex exec with **JSONL output** via stdin (5-minute timeout): For a **new session:** ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c " +codex exec - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" < "$TMPPROMPT" | python3 -c " import sys, json for line in sys.stdin: line = line.strip() @@ -494,12 +746,12 @@ for line in sys.stdin: For a **resumed session** (user chose "Continue"): ```bash -codex exec resume "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c " +codex exec resume - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" < "$TMPPROMPT" | python3 -c " " ``` -5. Capture session ID from the streamed output. The parser prints `SESSION_ID:` +6. Capture session ID from the streamed output. The parser prints `SESSION_ID:` from the `thread.started` event. Save it for follow-ups: ```bash mkdir -p .context @@ -507,7 +759,7 @@ mkdir -p .context Save the session ID printed by the parser (the line starting with `SESSION_ID:`) to `.context/codex-session-id`. -6. Present the full streamed output: +7. Present the full streamed output: ``` CODEX SAYS (consult): @@ -518,10 +770,181 @@ Tokens: N | Est. cost: ~$X.XX Session saved — run /codex again to continue this conversation. ``` -7. After presenting, note any points where Codex's analysis differs from your own +8. After presenting, note any points where Codex's analysis differs from your own understanding. If there is a disagreement, flag it: "Note: Claude Code disagrees on X because Y." +9. Clean up: +```bash +rm -f "$TMPPROMPT" "$TMPRESP" "$TMPERR" +``` + +--- + +## Step 2D: Verify Mode + +The user pastes a recommendation Claude gave them. Codex independently verifies +each factual claim about external systems against documentation. + +1. **Get the payload.** If the user invoked `/codex verify` with no payload, + emit a literal text prompt: "Paste the recommendation you want verified. + When you're done, send it. (No length limit — write to a file if it's + very long.)" Then wait for the next user message. Do NOT use + AskUserQuestion here — it's wrong for free-text paste of unbounded length. + The next user message becomes `$RECOMMENDATION_TEXT`. + +2. **Empty-payload guard.** If `$RECOMMENDATION_TEXT` after trimming is empty, + short-circuit: +```bash +echo "$RECOMMENDATION_TEXT" | tr -d '[:space:]' +``` + Output: `GATE: PASS — no recommendation provided, nothing to verify.` + Skip the Codex call entirely. Do NOT attempt to detect "no external-system + claims" in non-empty payloads — let Codex decide. The cost of one Codex call + on genuinely trivial input is acceptable; misclassifying a real claim as + "doesn't need checking" is exactly the failure mode this mode defends against. + +3. **Build the prompt via tempfile** (avoids shell escaping, EOF-collision, and + Windows command-line length limits): +```bash +TMPPROMPT=$(mktemp /tmp/codex-verify-prompt-XXXXXX.txt) +TMPRECOMMEND=$(mktemp /tmp/codex-verify-rec-XXXXXX.txt) +TMPERR=$(mktemp /tmp/codex-verify-err-XXXXXX.txt) + +# Write the user's pasted recommendation raw — no shell expansion. +printf '%s' "$RECOMMENDATION_TEXT" > "$TMPRECOMMEND" + +# Build the verify prompt. Single-quoted heredoc — no $ expansion. +cat > "$TMPPROMPT" <<'EOF_VERIFY_PROMPT' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +The file at the path provided below contains a recommendation generated by +Claude Code (an LLM). The user is considering acting on it. Your job is +independent verification. + +Begin by reading the recommendation: use your file-read or shell tool to +`cat` the file at the path listed at the end of this prompt. + +For every factual claim in the recommendation that touches an external +system (third-party UI button, API endpoint, plugin behavior, vendor +configuration, SaaS dashboard semantic, button label meaning, webhook +behavior, payment gateway behavior, email send behavior), produce one +verification entry in this exact format: + + CLAIM: + CITATION: + VERDICT: + REASONING: + P1: + +After listing all verification entries, output exactly one GATE line: + GATE: PASS — only if every claim is VERIFIED with a non-DOC_NOT_FOUND citation. + GATE: FAIL — N P1 findings. + +Hard rule: PASS-by-omission is not allowed. If you found no claims to +check, output `GATE: PASS — no external-system claims found.` instead. + +RECOMMENDATION FILE PATH: +EOF_VERIFY_PROMPT + +# Append the file path outside the heredoc — so it expands correctly and +# a future EOF_VERIFY_PROMPT in user content can't break the heredoc. +printf '%s\n' "$TMPRECOMMEND" >> "$TMPPROMPT" +``` + +4. **Run codex exec via stdin** (5-minute timeout): +```bash +codex exec - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" < "$TMPPROMPT" | python3 -c " +import sys, json +for line in sys.stdin: + line = line.strip() + if not line: continue + try: + obj = json.loads(line) + t = obj.get('type','') + if t == 'item.completed' and 'item' in obj: + item = obj['item'] + itype = item.get('type','') + text = item.get('text','') + if itype == 'reasoning' and text: + print(f'[codex thinking] {text}') + print() + elif itype == 'agent_message' and text: + print(text) + elif itype == 'command_execution': + cmd = item.get('command','') + if cmd: print(f'[codex ran] {cmd}') + elif t == 'turn.completed': + usage = obj.get('usage',{}) + tokens = usage.get('input_tokens',0) + usage.get('output_tokens',0) + if tokens: print(f'\ntokens used: {tokens}') + except: pass +" +``` + +5. **Parse the GATE line and present output:** + +``` +CODEX SAYS (verify): +════════════════════════════════════════════════════════════ + +════════════════════════════════════════════════════════════ +GATE: FAIL — N P1 findings. DO NOT ACT ON THE RECOMMENDATION AS WRITTEN. +Tokens: N | Est. cost: ~$X.XX +``` + +or: + +``` +GATE: PASS — all external-system claims verified with citations. +``` + +6. **Persist the verify result:** +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-verify","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}' +``` + +Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), +GATE ("pass" or "fail"), findings (count of P1 entries). + +7. **Clean up:** +```bash +rm -f "$TMPPROMPT" "$TMPRECOMMEND" "$TMPERR" +``` + +8. **No session continuity for verify.** Unlike Step 2C (Consult), verify is + one-shot. If a follow-up verify is needed for a related claim, the user + re-invokes `/codex verify` with new payload. + --- ## Model & Reasoning @@ -572,3 +995,6 @@ If token count is not available, display: `Tokens: unknown` - **5-minute timeout** on all Bash calls to codex (`timeout: 300000`). - **No double-reviewing.** If the user already ran `/review`, Codex provides a second independent opinion. Do not re-run Claude Code's own review. +- **Skepticism preamble is mandatory.** Every codex prompt — Review, Challenge, + Consult, Verify — must prepend the SKEPTICISM PREAMBLE block from the + "Claude Code Trust Boundaries" section. No exceptions. diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl index 30b603ee28..f1e8583523 100644 --- a/codex/SKILL.md.tmpl +++ b/codex/SKILL.md.tmpl @@ -1,12 +1,14 @@ --- name: codex -version: 1.0.0 +version: 1.1.0 description: | - OpenAI Codex CLI wrapper — three modes. Code review: independent diff review via - codex review with pass/fail gate. Challenge: adversarial mode that tries to break - your code. Consult: ask codex anything with session continuity for follow-ups. - The "200 IQ autistic developer" second opinion. Use when asked to "codex review", - "codex challenge", "ask codex", "second opinion", or "consult codex". + OpenAI Codex CLI wrapper — four modes. Code review: independent diff review + with pass/fail gate, Claude-skepticism preamble. Challenge: adversarial mode + that tries to break your code. Consult: ask codex anything with session + continuity. Verify: audit a recommendation Claude gave you, independently — + closes the 2026-05-09 ShipStation-class failure path. Use when asked to + "codex review", "codex challenge", "codex verify", "ask codex", "second + opinion", or "consult codex". allowed-tools: - Bash - Read @@ -30,6 +32,53 @@ assumptions, catches things you might miss. Present its output faithfully, not s --- +## Claude Code Trust Boundaries + +The /codex skill exists because Claude Code has a documented failure mode of +making confident, plausible-sounding claims about third-party systems without +verifying them against documentation. On 2026-05-09 Claude recommended clicking +ShipStation's "Send Notifications" button as "the preferred path" for replaying +WooCommerce shipnotify webhooks. The button does not replay webhooks — it +re-sends ShipStation's own branded shipping confirmation emails. 814 customers +received duplicate emails for orders shipped weeks earlier. Source: button-label +inference, no doc verification. + +When constructing any codex prompt — Review, Challenge, Consult, or Verify — +prepend the following SKEPTICISM PREAMBLE verbatim: + +──────────────────────────────────────────────────────────────────── +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. +──────────────────────────────────────────────────────────────────── + +--- + ## Step 0: Check codex binary ```bash @@ -48,7 +97,8 @@ Parse the user's input to determine which mode to run: 1. `/codex review` or `/codex review ` — **Review mode** (Step 2A) 2. `/codex challenge` or `/codex challenge ` — **Challenge mode** (Step 2B) -3. `/codex` with no arguments — **Auto-detect:** +3. `/codex verify` or `/codex verify ` — **Verify mode** (Step 2D) +4. `/codex` with no arguments — **Auto-detect:** - Check for a diff (with fallback if origin isn't available): `git diff origin/ --stat 2>/dev/null | tail -1 || git diff --stat 2>/dev/null | tail -1` - If a diff exists, use AskUserQuestion: @@ -56,48 +106,90 @@ Parse the user's input to determine which mode to run: Codex detected changes against the base branch. What should it do? A) Review the diff (code review with pass/fail gate) B) Challenge the diff (adversarial — try to break it) - C) Something else — I'll provide a prompt + C) Verify a recommendation Claude gave me (paste it next) + D) Something else — I'll provide a prompt ``` - If no diff, check for plan files scoped to the current project: `ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/dev/null | head -1` If no project-scoped match, fall back to: `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1` but warn the user: "Note: this plan may be from a different project." - If a plan file exists, offer to review it - - Otherwise, ask: "What would you like to ask Codex?" -4. `/codex ` — **Consult mode** (Step 2C), where the remaining text is the prompt + - Otherwise, ask: "Did you want to verify a recommendation Claude gave you, or ask Codex something else?" +5. `/codex ` — **Consult mode** (Step 2C), where the remaining text is the prompt --- ## Step 2A: Review Mode -Run Codex code review against the current branch diff. +Run Codex code review against the current branch diff, with the SKEPTICISM PREAMBLE prepended. -1. Create temp files for output capture: +1. Create temp files: ```bash +TMPPROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.txt) TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) ``` -2. Run the review (5-minute timeout): +2. Write the assembled prompt (preamble + review body) to the temp file. Substitute + `` with the detected base branch before writing. Use a single-quoted heredoc + since the SKEPTICISM PREAMBLE contains no shell variables that need expansion: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +Now perform code review on the diff against the base branch. Apply P1/P2 +markers per existing convention, plus the additions from the preamble above. +EOF_PREAMBLE +``` + +If the user provided custom instructions (e.g., `/codex review focus on security`), +append them after the body — they never replace the preamble: ```bash -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +printf '\n\nAdditional user focus: %s\n' "$USER_FOCUS" >> "$TMPPROMPT" ``` -Use `timeout: 300000` on the Bash call. If the user provided custom instructions -(e.g., `/codex review focus on security`), pass them as the prompt argument: +3. Run the review via stdin (5-minute timeout). `codex review` accepts `-` to read + prompt from stdin: ```bash -codex review "focus on security" --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review - --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" < "$TMPPROMPT" ``` -3. Capture the output. Then parse cost from stderr: +4. Capture the output. Then parse cost from stderr: ```bash grep "tokens used" "$TMPERR" 2>/dev/null || echo "tokens: unknown" ``` -4. Determine gate verdict by checking the review output for critical findings. +5. Determine gate verdict by checking the review output for critical findings. If the output contains `[P1]` — the gate is **FAIL**. If no `[P1]` markers are found (only `[P2]` or no findings) — the gate is **PASS**. -5. Present the output: +6. Present the output: ``` CODEX SAYS (code review): @@ -113,7 +205,7 @@ or GATE: FAIL (N critical findings) ``` -6. **Cross-model comparison:** If `/review` (Claude's own review) was already run +7. **Cross-model comparison:** If `/review` (Claude's own review) was already run earlier in this conversation, compare the two sets of findings: ``` @@ -122,9 +214,19 @@ CROSS-MODEL ANALYSIS: Only Codex found: [findings unique to Codex] Only Claude found: [findings unique to Claude's /review] Agreement rate: X% (N/M total unique findings overlap) + +EXTERNAL-SYSTEM DISAGREEMENTS (if any): + When Codex and Claude disagree on a claim about a third-party UI/API/plugin/ + vendor system, surface the disagreement with: + "Codex contradicts Claude on . Codex cites: . Claude's + reasoning was: ." + Per the SKEPTICISM PREAMBLE, the user should default to Codex's verdict + unless they produce explicit contradicting evidence (a fetched doc URL or + verified test result). This rule is stated once in the preamble; this + section just makes the disagreement findable. ``` -7. Persist the review result: +8. Persist the review result: ```bash ~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}' ``` @@ -132,9 +234,9 @@ CROSS-MODEL ANALYSIS: Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"), findings (count of [P1] + [P2] markers). -8. Clean up temp files: +9. Clean up temp files: ```bash -rm -f "$TMPERR" +rm -f "$TMPPROMPT" "$TMPERR" ``` --- @@ -144,18 +246,91 @@ rm -f "$TMPERR" Codex tries to break your code — finding edge cases, race conditions, security holes, and failure modes that a normal review would miss. -1. Construct the adversarial prompt. If the user provided a focus area -(e.g., `/codex challenge security`), include it: +1. Create a temp file: +```bash +TMPPROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.txt) +``` -Default prompt (no focus): -"Review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." +2. Write the assembled prompt (preamble + adversarial body) to the temp file. + Substitute `` with the actual detected base branch before writing. + + Default prompt (no focus): +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +Now: review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. +EOF_PREAMBLE +``` -With focus (e.g., "security"): -"Review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial." + With a user-specified focus (e.g., "security") — substitute `` and + `` with their actual values before writing: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +Now: review the changes on this branch against the base branch. Run `git diff origin/` to see the diff. Focus specifically on . Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial. +EOF_PREAMBLE +``` -2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout): +3. Run codex exec with **JSONL output** via stdin (5-minute timeout). `codex exec` + accepts `-` to read prompt from stdin: ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | python3 -c " +codex exec - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null < "$TMPPROMPT" | python3 -c " import sys, json for line in sys.stdin: line = line.strip() @@ -186,7 +361,12 @@ for line in sys.stdin: This parses codex's JSONL events to extract reasoning traces, tool calls, and the final response. The `[codex thinking]` lines show what codex reasoned through before its answer. -3. Present the full streamed output: +4. Clean up: +```bash +rm -f "$TMPPROMPT" +``` + +5. Present the full streamed output: ``` CODEX SAYS (adversarial challenge): @@ -216,6 +396,7 @@ B) Start a new conversation 2. Create temp files: ```bash +TMPPROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.txt) TMPRESP=$(mktemp /tmp/codex-resp-XXXXXX.txt) TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) ``` @@ -227,20 +408,91 @@ ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/d ``` If no project-scoped match, fall back to `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1` but warn: "Note: this plan may be from a different project — verify before sending to Codex." -Read the plan file and prepend the persona to the user's prompt: -"You are a brutally honest technical reviewer. Review this plan for: logical gaps and + +4. Assemble the prompt with the SKEPTICISM PREAMBLE prepended and write to the temp file. + + For **plan review**, read the plan content and include it in the prompt: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +You are a brutally honest technical reviewer. Review this plan for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems. THE PLAN: -" + +EOF_PREAMBLE +``` -4. Run codex exec with **JSONL output** to capture reasoning traces (5-minute timeout): + For **general consult**, write the preamble then append the user's prompt: +```bash +cat > "$TMPPROMPT" <<'EOF_PREAMBLE' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. +EOF_PREAMBLE +printf '\n\n%s\n' "$USER_PROMPT" >> "$TMPPROMPT" +``` + +5. Run codex exec with **JSONL output** via stdin (5-minute timeout): For a **new session:** ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c " +codex exec - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" < "$TMPPROMPT" | python3 -c " import sys, json for line in sys.stdin: line = line.strip() @@ -273,12 +525,12 @@ for line in sys.stdin: For a **resumed session** (user chose "Continue"): ```bash -codex exec resume "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c " +codex exec resume - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" < "$TMPPROMPT" | python3 -c " " ``` -5. Capture session ID from the streamed output. The parser prints `SESSION_ID:` +6. Capture session ID from the streamed output. The parser prints `SESSION_ID:` from the `thread.started` event. Save it for follow-ups: ```bash mkdir -p .context @@ -286,7 +538,7 @@ mkdir -p .context Save the session ID printed by the parser (the line starting with `SESSION_ID:`) to `.context/codex-session-id`. -6. Present the full streamed output: +7. Present the full streamed output: ``` CODEX SAYS (consult): @@ -297,10 +549,181 @@ Tokens: N | Est. cost: ~$X.XX Session saved — run /codex again to continue this conversation. ``` -7. After presenting, note any points where Codex's analysis differs from your own +8. After presenting, note any points where Codex's analysis differs from your own understanding. If there is a disagreement, flag it: "Note: Claude Code disagrees on X because Y." +9. Clean up: +```bash +rm -f "$TMPPROMPT" "$TMPRESP" "$TMPERR" +``` + +--- + +## Step 2D: Verify Mode + +The user pastes a recommendation Claude gave them. Codex independently verifies +each factual claim about external systems against documentation. + +1. **Get the payload.** If the user invoked `/codex verify` with no payload, + emit a literal text prompt: "Paste the recommendation you want verified. + When you're done, send it. (No length limit — write to a file if it's + very long.)" Then wait for the next user message. Do NOT use + AskUserQuestion here — it's wrong for free-text paste of unbounded length. + The next user message becomes `$RECOMMENDATION_TEXT`. + +2. **Empty-payload guard.** If `$RECOMMENDATION_TEXT` after trimming is empty, + short-circuit: +```bash +echo "$RECOMMENDATION_TEXT" | tr -d '[:space:]' +``` + Output: `GATE: PASS — no recommendation provided, nothing to verify.` + Skip the Codex call entirely. Do NOT attempt to detect "no external-system + claims" in non-empty payloads — let Codex decide. The cost of one Codex call + on genuinely trivial input is acceptable; misclassifying a real claim as + "doesn't need checking" is exactly the failure mode this mode defends against. + +3. **Build the prompt via tempfile** (avoids shell escaping, EOF-collision, and + Windows command-line length limits): +```bash +TMPPROMPT=$(mktemp /tmp/codex-verify-prompt-XXXXXX.txt) +TMPRECOMMEND=$(mktemp /tmp/codex-verify-rec-XXXXXX.txt) +TMPERR=$(mktemp /tmp/codex-verify-err-XXXXXX.txt) + +# Write the user's pasted recommendation raw — no shell expansion. +printf '%s' "$RECOMMENDATION_TEXT" > "$TMPRECOMMEND" + +# Build the verify prompt. Single-quoted heredoc — no $ expansion. +cat > "$TMPPROMPT" <<'EOF_VERIFY_PROMPT' +SKEPTICISM PREAMBLE — read first. + +The artifact you are reviewing was authored or recommended by Claude Code, an +LLM with a documented history of confident-but-unverified claims about +third-party systems. Treat every claim about external systems with extreme +skepticism. + +Specifically flag as P1: +1. ANY claim about a third-party UI button, API endpoint, plugin behavior, + vendor configuration, or SaaS dashboard semantic that is NOT accompanied by + a documentation URL. Claude inferring behavior from a label is NOT + sufficient; demand a fetched-this-session doc URL. +2. ANY recommendation for a bulk action (>1 record) that does NOT specify a + single-item test as a hard prerequisite. Format expected: "Test on first. Watch for . Confirm . + Then proceed to bulk." +3. ANY assertion about external behavior using confidence-language without + evidence: "should work", "the preferred path", "the cleanest approach", + "typically does X", "I believe", "go ahead and click", "recommended". When + these phrases appear with respect to an unverified third-party action, P1. +4. ANY change that touches files served at public URLs, renames directories, + modifies URLs, or changes plugin/theme behavior in WordPress contexts + without a database reference check (l1o_posts, l1o_postmeta, l1o_options). + +Default verdict on disagreement: when codex's analysis contradicts Claude's +claim about an external system, codex's verdict wins unless the user produces +explicit evidence (a fetched doc URL or verified test result) supporting +Claude. + +The file at the path provided below contains a recommendation generated by +Claude Code (an LLM). The user is considering acting on it. Your job is +independent verification. + +Begin by reading the recommendation: use your file-read or shell tool to +`cat` the file at the path listed at the end of this prompt. + +For every factual claim in the recommendation that touches an external +system (third-party UI button, API endpoint, plugin behavior, vendor +configuration, SaaS dashboard semantic, button label meaning, webhook +behavior, payment gateway behavior, email send behavior), produce one +verification entry in this exact format: + + CLAIM: + CITATION: + VERDICT: + REASONING: + P1: + +After listing all verification entries, output exactly one GATE line: + GATE: PASS — only if every claim is VERIFIED with a non-DOC_NOT_FOUND citation. + GATE: FAIL — N P1 findings. + +Hard rule: PASS-by-omission is not allowed. If you found no claims to +check, output `GATE: PASS — no external-system claims found.` instead. + +RECOMMENDATION FILE PATH: +EOF_VERIFY_PROMPT + +# Append the file path outside the heredoc — so it expands correctly and +# a future EOF_VERIFY_PROMPT in user content can't break the heredoc. +printf '%s\n' "$TMPRECOMMEND" >> "$TMPPROMPT" +``` + +4. **Run codex exec via stdin** (5-minute timeout): +```bash +codex exec - -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" < "$TMPPROMPT" | python3 -c " +import sys, json +for line in sys.stdin: + line = line.strip() + if not line: continue + try: + obj = json.loads(line) + t = obj.get('type','') + if t == 'item.completed' and 'item' in obj: + item = obj['item'] + itype = item.get('type','') + text = item.get('text','') + if itype == 'reasoning' and text: + print(f'[codex thinking] {text}') + print() + elif itype == 'agent_message' and text: + print(text) + elif itype == 'command_execution': + cmd = item.get('command','') + if cmd: print(f'[codex ran] {cmd}') + elif t == 'turn.completed': + usage = obj.get('usage',{}) + tokens = usage.get('input_tokens',0) + usage.get('output_tokens',0) + if tokens: print(f'\ntokens used: {tokens}') + except: pass +" +``` + +5. **Parse the GATE line and present output:** + +``` +CODEX SAYS (verify): +════════════════════════════════════════════════════════════ + +════════════════════════════════════════════════════════════ +GATE: FAIL — N P1 findings. DO NOT ACT ON THE RECOMMENDATION AS WRITTEN. +Tokens: N | Est. cost: ~$X.XX +``` + +or: + +``` +GATE: PASS — all external-system claims verified with citations. +``` + +6. **Persist the verify result:** +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-verify","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N}' +``` + +Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), +GATE ("pass" or "fail"), findings (count of P1 entries). + +7. **Clean up:** +```bash +rm -f "$TMPPROMPT" "$TMPRECOMMEND" "$TMPERR" +``` + +8. **No session continuity for verify.** Unlike Step 2C (Consult), verify is + one-shot. If a follow-up verify is needed for a related claim, the user + re-invokes `/codex verify` with new payload. + --- ## Model & Reasoning @@ -351,3 +774,6 @@ If token count is not available, display: `Tokens: unknown` - **5-minute timeout** on all Bash calls to codex (`timeout: 300000`). - **No double-reviewing.** If the user already ran `/review`, Codex provides a second independent opinion. Do not re-run Claude Code's own review. +- **Skepticism preamble is mandatory.** Every codex prompt — Review, Challenge, + Consult, Verify — must prepend the SKEPTICISM PREAMBLE block from the + "Claude Code Trust Boundaries" section. No exceptions.