From 647e27cefa12bcccdc88b1e5d8d9c70c0117e6b6 Mon Sep 17 00:00:00 2001 From: zazabap Date: Tue, 31 Mar 2026 16:13:42 +0000 Subject: [PATCH 01/16] docs: add design spec for proposed reductions Typst note Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198), 4 Tier 1a blocked issues (#379, #380, #888, #822), and 3 Tier 1b blocked issues (#892, #894, #890). Co-Authored-By: Claude Opus 4.6 (1M context) --- ...6-03-31-proposed-reductions-note-design.md | 125 ++++++++++++++++++ 1 file changed, 125 insertions(+) create mode 100644 docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md diff --git a/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md new file mode 100644 index 000000000..9ce7a1a90 --- /dev/null +++ b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md @@ -0,0 +1,125 @@ +# Design: Proposed Reduction Rules — Typst Verification Note + +**Date:** 2026-03-31 +**Goal:** Create a standalone Typst document with compiled PDF that formalizes 9 reduction rules from issue #770, resolving blockers for 7 incomplete issues and adding 2 high-leverage NP-hardness chain extensions. 
+ +## Scope + +### 9 Reductions + +**Group 1 — NP-hardness proof chain extensions:** + +| Issue | Reduction | Impact | +|-------|-----------|--------| +| #973 | SubsetSum → Partition | Unlocks ~12 downstream problems | +| #198 | MinimumVertexCover → HamiltonianCircuit | Unlocks ~9 downstream problems | + +**Group 2 — Tier 1a blocked issues (fix + formalize):** + +| Issue | Reduction | Current blocker | +|-------|-----------|----------------| +| #379 | DominatingSet → MinMaxMulticenter | Decision vs optimization MDS model | +| #380 | DominatingSet → MinSumMulticenter | Same | +| #888 | OptimalLinearArrangement → RootedTreeArrangement | Witness extraction impossible for naive approach | +| #822 | ExactCoverBy3Sets → AcyclicPartition | Missing algorithm (unpublished reference) | + +**Group 3 — Tier 1b blocked issues (fix + formalize):** + +| Issue | Reduction | Current blocker | +|-------|-----------|----------------| +| #892 | VertexCover → HamiltonianPath | Depends on #198 (VC→HC) being resolved | +| #894 | VertexCover → PartialFeedbackEdgeSet | Missing Yannakakis 1978b paper | +| #890 | MaxCut → OptimalLinearArrangement | Placeholder algorithm, no actual construction | + +## Deliverables + +1. **`docs/paper/proposed-reductions.typ`** — standalone Typst document +2. **`docs/paper/proposed-reductions.pdf`** — compiled PDF checked into repo +3. **Updated GitHub issues** — #379, #380, #888, #822, #892, #894, #890 corrected with verified algorithms +4. **One PR** containing the note, PDF, and issue updates + +## Document Structure + +``` +Title: Proposed Reduction Rules — Verification Notes +Abstract: Motivation (NP-hardness gaps, blocked issues, impact analysis) + +§1 Notation & Conventions + - Standard symbols (G, V, E, w, etc.) 
+ - Proof structure: Construction → Correctness (⟹/⟸) → Solution Extraction + - Overhead notation + +§2 NP-Hardness Chain Extensions + §2.1 SubsetSum → Partition (#973) + §2.2 MinimumVertexCover → HamiltonianCircuit (#198) + §2.3 VertexCover → HamiltonianPath (#892) + +§3 Graph Reductions + §3.1 MaxCut → OptimalLinearArrangement (#890) + §3.2 OptimalLinearArrangement → RootedTreeArrangement (#888) + +§4 Set & Domination Reductions + §4.1 DominatingSet → MinMaxMulticenter (#379) + §4.2 DominatingSet → MinSumMulticenter (#380) + §4.3 ExactCoverBy3Sets → AcyclicPartition (#822) + +§5 Feedback Set Reductions + §5.1 VertexCover → PartialFeedbackEdgeSet (#894) +``` + +## Per-Reduction Entry Format + +Each reduction follows the `reductions.typ` convention: + +1. **Theorem statement** — 1-3 sentence intuition, citation (e.g., `[GJ79, ND50]`) +2. **Proof** with three mandatory subsections: + - _Construction._ Numbered algorithm steps, all symbols defined before use + - _Correctness._ Bidirectional: (⟹) forward direction, (⟸) backward direction + - _Solution extraction._ How to map target solution back to source +3. **Overhead table** — target size fields as functions of source size fields +4. **Worked example** — concrete small instance, full construction steps, solution verification + +Mathematical notation uses Typst math mode: `$V$`, `$E$`, `$arrow.r.double$`, `$overline(x)$`, etc. + +## Research Plan for Blocked Issues + +For each blocked reduction: + +1. **Search** for the original reference via the citation in Garey & Johnson +2. **Reconstruct** the correct algorithm from the paper or from first principles +3. **Verify** correctness with a hand-worked example in the note +4. 
**Resolve** the blocker: + - #379/#380: Clarify that the reduction operates on the decision variant; note model alignment needed + - #888: Research Gavril 1977a gadget construction for forcing path-tree solutions + - #822: Research the acyclic partition reduction from G&J or construct from first principles + - #892: Chain through #198 (VC→HC→HP); detail the HC→HP modification + - #894: Search for Yannakakis 1978b or reconstruct the gadget + - #890: Research the Garey-Johnson-Stockmeyer 1976 construction + +If a reference is unavailable, construct a novel reduction and clearly mark it as such. + +## Typst Setup + +- Standalone document (not importing from `reductions.typ`) +- Uses: `ctheorems` for theorem/proof environments, `cetz` if diagrams needed +- Page: A4, New Computer Modern 10pt +- Theorem numbering: `Theorem 2.1`, `Theorem 2.2`, etc. +- No dependency on `examples.json` or `reduction_graph.json` (standalone) +- Compile command: `typst compile docs/paper/proposed-reductions.typ docs/paper/proposed-reductions.pdf` + +## Quality Criteria + +Each reduction must satisfy: +1. **Math equations correct** — all formulas verified against source paper or hand-derivation +2. **Provable correctness** — both directions of the proof are rigorous, no hand-waving +3. **Algorithm clear** — detailed enough that a developer can implement `reduce_to()` and `extract_solution()` directly from the proof +4. 
**From math to code verifiable** — overhead expressions match the construction, worked example can be used as a test case + +## PR Structure + +- Branch: `feat/proposed-reductions-note` +- Files: + - `docs/paper/proposed-reductions.typ` (new) + - `docs/paper/proposed-reductions.pdf` (new, compiled) +- No code changes — this is a documentation-only PR +- Issue updates done via `gh issue edit` (not in the PR diff) From aa5735372b0318515b98998c3a7919643cc7da1e Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 05:26:21 +0000 Subject: [PATCH 02/16] feat: add verify-reduction skill for mathematical verification of reductions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New skill: /verify-reduction End-to-end pipeline that takes a reduction rule issue and produces: 1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples) 2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5) 3. Lean 4 lemmas (non-trivial structural proofs required) Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR. Strict quality gates (zero tolerance): - No "trivial" category — every reduction ≥5000 checks - 7 mandatory Python sections including NO (infeasible) example - Non-trivial Lean required (rfl/omega tautologies rejected) - Zero hand-waving in Typst ("clearly", "obviously" → rejected) - Mandatory gap analysis: every proof claim must have a test - Self-review checklist with 20+ items across 4 categories Developed and validated through PR #975 (800K+ checks, 3 bugs caught) and tested on issues #868 (caught wrong example) and #841 (35K checks). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/CLAUDE.md | 1 + .claude/skills/verify-reduction/SKILL.md | 460 +++++++++++++++++++++++ 2 files changed, 461 insertions(+) create mode 100644 .claude/skills/verify-reduction/SKILL.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 27ce2026c..659825237 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -26,6 +26,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. +- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), and Lean lemmas (non-trivial required). Iterates until all checks pass. Creates worktree + PR. - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md new file mode 100644 index 000000000..9adff5f62 --- /dev/null +++ b/.claude/skills/verify-reduction/SKILL.md @@ -0,0 +1,460 @@ +# Verify Reduction + +End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational and formal verification, iterating until all checks pass. 
Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. + +Outputs: Typst proof entry, Python verification script, Lean lemmas — all at PR #975 quality level. + +**This skill is STRICT. Cutting corners produces reductions that get rejected during review.** + +## Invocation + +``` +/verify-reduction 868 # from a GitHub issue number +/verify-reduction SubsetSum Partition # from source/target names +``` + +## Prerequisites + +- `sympy` and `networkx` installed (`pip install sympy networkx`) +- Both source and target models must exist in the codebase (`pred show `) +- For Lean: `elan` installed with Lean 4 toolchain + +## Process + +```dot +digraph verify { + rankdir=TB; + "Parse input" [shape=box]; + "Create worktree" [shape=box]; + "Read issue" [shape=box]; + "Study models" [shape=box]; + "Write Typst proof" [shape=box]; + "Compile PDF" [shape=box]; + "Write Python script" [shape=box]; + "Run script" [shape=box]; + "All pass?" [shape=diamond]; + "Fix proof + script" [shape=box]; + "Enough checks?" [shape=diamond]; + "Enhance script" [shape=box]; + "Add Lean lemmas" [shape=box]; + "Lean builds?" [shape=diamond]; + "Fix Lean" [shape=box]; + "Self-review (HARSH)" [shape=box]; + "Create PR" [shape=box]; + "Report" [shape=doublecircle]; + + "Parse input" -> "Create worktree"; + "Create worktree" -> "Read issue"; + "Read issue" -> "Study models"; + "Study models" -> "Write Typst proof"; + "Write Typst proof" -> "Compile PDF"; + "Compile PDF" -> "Write Python script"; + "Write Python script" -> "Run script"; + "Run script" -> "All pass?"; + "All pass?" -> "Enough checks?" [label="yes"]; + "All pass?" -> "Fix proof + script" [label="no"]; + "Fix proof + script" -> "Run script"; + "Enough checks?" -> "Add Lean lemmas" [label="yes"]; + "Enough checks?" -> "Enhance script" [label="no"]; + "Enhance script" -> "Run script"; + "Add Lean lemmas" -> "Lean builds?"; + "Lean builds?" -> "Self-review (HARSH)" [label="yes"]; + "Lean builds?" 
-> "Fix Lean" [label="no"]; + "Fix Lean" -> "Lean builds?"; + "Self-review (HARSH)" -> "Create PR"; + "Create PR" -> "Report"; +} +``` + +--- + +## Step 0: Parse Input and Create Worktree + +### 0a. Parse input + +```bash +REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner) +ISSUE= +ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number) +``` + +### 0b. Create worktree + +```bash +REPO_ROOT=$(pwd) +BRANCH_JSON=$(python3 scripts/pipeline_worktree.py prepare-issue-branch \ + --issue "$ISSUE" --slug "verify--" --base main --format json) +BRANCH=$(printf '%s\n' "$BRANCH_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['branch'])") +WORKTREE_JSON=$(python3 scripts/pipeline_worktree.py enter --name "verify-$ISSUE" --format json) +WORKTREE_DIR=$(printf '%s\n' "$WORKTREE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['worktree_dir'])") +cd "$WORKTREE_DIR" && git checkout "$BRANCH" +``` + +If already inside a worktree, skip creation and use the current branch. + +## Step 1: Read Issue and Study Models + +```bash +gh issue view "$ISSUE" --json title,body +pred show --json +pred show --json +``` + +Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. + +If the issue is incomplete, use WebSearch to find the original reference. + +## Step 2: Write Typst Proof + +Append to `docs/paper/proposed-reductions.typ` (or create a standalone `proposed-reductions-.typ` if the main file is on another branch). + +### MANDATORY structure + +```typst +== Source $arrow.r$ Target + +#theorem[...] + +#proof[ + _Construction._ ... + _Correctness._ + ($arrow.r.double$) ... + ($arrow.l.double$) ... + _Solution extraction._ ... 
+] + +*Overhead.* (table) + +*Feasible example.* (YES instance, fully worked) + +*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) +``` + +### HARD requirements + +- **Construction**: numbered steps, every symbol defined before first use +- **Correctness**: genuinely independent ⟹ and ⟸ — NOT "the converse is similar" +- **No hand-waving**: ZERO instances of "clearly", "obviously", "it is easy to see", "straightforward" +- **No scratch work**: ZERO instances of "Wait", "Hmm", "Actually", "Let me try" +- **TWO examples minimum**: one YES instance (satisfiable/feasible) and one NO instance (unsatisfiable/infeasible). Both fully worked with numerical verification. +- **Example must be non-trivial**: the example must have ≥ 3 variables/vertices. A 1-variable or 2-vertex example is too degenerate to catch bugs. + +Compile: +```bash +python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')" +``` + +## Step 3: Write Python Verification Script + +Create `docs/paper/verify-reductions/verify__.py`. + +### ALL 7 sections are MANDATORY + +```python +#!/usr/bin/env python3 +"""§X.Y Source → Target (#NNN): exhaustive + structural verification.""" +import itertools, sys +from sympy import symbols, simplify # Section 1 is NOT optional + +passed = failed = 0 + +def check(condition, msg=""): + global passed, failed + if condition: passed += 1 + else: failed += 1; print(f" FAIL: {msg}") + +def main(): + # === Section 1: Symbolic checks (sympy) — MANDATORY === + # At minimum: verify overhead formula symbolically for general n. + # For algebraic reductions: verify key identities. + # "The overhead is trivial" is NOT an excuse to skip this section. + + # === Section 2: Exhaustive forward + backward — MANDATORY === + # n ≤ 5 MINIMUM for all reduction types. + # n ≤ 6 for identity/algebraic reductions. + # Test ALL instances (or sample ≥ 300 per (n,m) if exhaustive is infeasible). 
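    # A hypothetical shape for this section's loop (names illustrative,
    # adapt per reduction):
    #   for n in range(1, 6):
    #       for inst in all_source_instances(n):   # every instance of size n
    #           tgt = reduce_to(inst)              # the Typst construction, verbatim
    #           check(feasible_source(inst) == feasible_target(tgt),
    #                 f"equivalence broken for {inst}")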
+ + # === Section 3: Solution extraction — MANDATORY === + # For EVERY feasible instance: extract source solution from target, + # verify it satisfies the source problem. + # This is the most commonly skipped section. DO NOT SKIP IT. + + # === Section 4: Overhead formula — MANDATORY === + # Build the actual target instance, measure its size fields, + # compare against the overhead formula. + + # === Section 5: Structural properties — MANDATORY === + # Even for "trivial" reductions, verify at least: + # - Target instance is well-formed (valid graph, valid formula, etc.) + # - No degenerate cases (empty subsets, isolated vertices, etc.) + # For gadget reductions: girth, connectivity, widget structure. + + # === Section 6: YES example from Typst — MANDATORY === + # Reproduce the exact numbers from the Typst proof's feasible example. + + # === Section 7: NO example from Typst — MANDATORY === + # Reproduce the exact numbers from the Typst proof's infeasible example. + # Verify that both source and target are infeasible. + + print(f"Source → Target: {passed} passed, {failed} failed") + return 1 if failed else 0 + +if __name__ == "__main__": + sys.exit(main()) +``` + +### Minimum check counts — STRICTLY ENFORCED + +| Type | Minimum checks | Minimum n | Strategy | +|------|---------------|-----------|----------| +| Identity (same graph, different objective) | 10,000 | n ≤ 6 | Exhaustive ALL graphs | +| Algebraic (padding, complement, De Morgan) | 10,000 | n ≤ 5 | Symbolic + exhaustive | +| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | Construction + formula + structural | +| Composition (A→B→C) | 10,000 | n ≤ 5 | Exhaustive per step | + +**There is no "trivial" category.** Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing. + +## Step 4: Run and Iterate (THE CRITICAL LOOP) + +```bash +python3 docs/paper/verify-reductions/verify__.py +``` + +### Iteration 1: First run + +Run the script. Fix any failures. Re-run. 
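The loop above is easiest to internalize on a concrete case. Below is a minimal, self-contained sketch of the Section 2 "exhaustive forward + backward" core for SubsetSum → Partition (#973), using the textbook two-element gadget. All names (`reduce_to`, the brute-force oracles) are illustrative, not the repository's actual API, and a real script would also cover Sections 1 and 3–7:

```python
import itertools

def reduce_to(weights, target):
    """SubsetSum (weights, target) -> Partition multiset.

    Gadget: append 2W - t and W + t, where W = sum(weights).
    Together the two new elements sum to 3W, more than half of the
    new total 4W, so any balanced split separates them -- forcing
    the side holding 2W - t to complete itself with a subset of the
    original weights summing to exactly t.
    """
    total = sum(weights)
    return list(weights) + [2 * total - target, total + target]

def has_subset_sum(items, target):
    """Brute-force oracle: does some subset of items sum to target?"""
    return any(sum(combo) == target
               for r in range(len(items) + 1)
               for combo in itertools.combinations(items, r))

def has_partition(items):
    """Brute-force oracle: can items split into two equal-sum halves?"""
    total = sum(items)
    return total % 2 == 0 and has_subset_sum(items, total // 2)

def exhaustive_equivalence(max_n=4, max_w=4):
    """Feasibility must agree on EVERY small instance, not a sample."""
    checked = 0
    for n in range(1, max_n + 1):
        for weights in itertools.product(range(1, max_w + 1), repeat=n):
            for target in range(sum(weights) + 1):
                assert (has_subset_sum(weights, target)
                        == has_partition(reduce_to(weights, target))), \
                    (weights, target)
                checked += 1
    return checked
```

When the assertion fires, it names the first failing `(weights, target)` pair — exactly the counterexample to feed back into the proof-fixing loop.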
+ +### Iteration 2: Check count audit — STRICT + +Print this table and fill it in honestly: + +``` +CHECK COUNT AUDIT: + Total checks: ___ (minimum: 5,000) + Forward direction: ___ instances tested (minimum: all n ≤ 5) + Backward direction: ___ instances tested (minimum: all n ≤ 5) + Solution extraction: ___ feasible instances tested + Overhead formula: ___ instances compared + Symbolic (sympy): ___ identities verified + YES example: verified? [yes/no] + NO example: verified? [yes/no] + Structural properties: ___ checks +``` + +If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. + +### Iteration 3: Gap analysis — MANDATORY + +List EVERY claim in the Typst proof. For each, state whether it's tested: + +``` +CLAIM TESTED BY +"Universe has 2n elements" Section 4: overhead formula ✓ +"Complementarity forces consistency" Section 3: extraction ✓ +"Clause subset is non-monochromatic" Section 2: forward direction ✓ +"No clause is all-true or all-false" Section 2: backward direction ✓ +... +``` + +If any claim has no test, add one. If it's untestable, document WHY. + +## Step 5: Add Lean Lemmas — STRICT REQUIREMENTS + +### HARD requirement: at least one NON-TRIVIAL lemma + +`n + m = n + m := rfl` does NOT count. A "non-trivial" lemma must satisfy at least one of: + +1. **Uses a Mathlib tactic beyond `rfl`/`omega`**: e.g., `simp`, `Finset.sum_union`, lattice operations +2. **States a structural property**: e.g., "complementarity subsets partition the universe into pairs" +3. **Formalizes the key invariant**: e.g., "NAE-satisfying ↔ 2-colorable hypergraph" + +If Mathlib genuinely lacks the infrastructure (no Hamiltonian paths, no DAG quotients), write the strongest lemma you CAN prove and document what WOULD be proved with better Mathlib support. 
+ +### Examples of acceptable vs unacceptable Lean lemmas + +**Unacceptable (trivial arithmetic):** +```lean +theorem overhead (n m : ℕ) : n + m = n + m := rfl -- proves nothing +theorem universe (n : ℕ) : 2 * n = 2 * n := rfl -- proves nothing +``` + +**Acceptable (structural):** +```lean +-- Uses Mathlib's lattice theory to prove a graph-structural fact +theorem complement_partition (G : SimpleGraph V) : G ⊔ Gᶜ = ⊤ := sup_compl_eq_top + +-- Formalizes the key definition used in the proof +def IsNAESatisfying (assignment : Fin n → Bool) (clause : Finset (Fin n × Bool)) : Prop := + ¬(∀ l ∈ clause, ...) ∧ ¬(∀ l ∈ clause, ...) + +-- Proves the overhead formula requires actual computation +theorem overhead_nontrivial (n m : ℕ) (h : m > 0) : + 2 * n + (n + m) > 2 * n := by omega -- at least uses a hypothesis +``` + +### Build and verify + +```bash +cd docs/paper/verify-reductions/lean +export PATH="$HOME/.elan/bin:$PATH" +lake build +``` + +## Step 6: Self-Review — THE HARSHEST STEP + +Before declaring verified, run through this checklist. **Every item must be YES.** If any is NO, go back and fix it. 
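Part of this checklist is mechanical and can be pre-checked before the manual pass. A hypothetical helper, sketched here (the `handwaving_hits` name and the exact `BANNED` list are illustrative assumptions, not an existing repo script), scans a Typst source for the banned phrases:

```python
import pathlib
import re
import sys

# Phrases the self-review rejects. Lowercase entries match
# case-insensitively; capitalized scratch-work markers match exactly,
# so ordinary mid-sentence words like "actually" are not flagged.
# (Exact matching can still over-flag, e.g. "Waiting" — a sketch, not a gate.)
BANNED = [
    "clearly", "obviously", "it is easy to see", "straightforward",
    "Wait", "Hmm", "Actually", "Let me try",
]

def handwaving_hits(path):
    """Return sorted (line_number, phrase) pairs for banned phrases in a file."""
    text = pathlib.Path(path).read_text(encoding="utf-8")
    hits = []
    for phrase in BANNED:
        flags = re.IGNORECASE if phrase[0].islower() else 0
        for match in re.finditer(re.escape(phrase), text, flags):
            line = text.count("\n", 0, match.start()) + 1
            hits.append((line, phrase))
    return sorted(hits)

if __name__ == "__main__" and len(sys.argv) > 1:
    for line, phrase in handwaving_hits(sys.argv[1]):
        print(f"line {line}: banned phrase {phrase!r}")
```

Run against `docs/paper/proposed-reductions.typ`; an empty output clears the two "zero instances" checklist items, while any hit pinpoints the line to rewrite.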
+ +### Typst proof + +- [ ] Compiles without errors +- [ ] Has Construction with numbered steps +- [ ] Has Correctness with independent ⟹ and ⟸ paragraphs +- [ ] Has Solution extraction +- [ ] Has Overhead table with formula +- [ ] Has YES example (feasible, ≥ 3 variables/vertices, fully worked) +- [ ] Has NO example (infeasible, fully worked with explanation of WHY infeasible) +- [ ] Zero instances of "clearly", "obviously", "it is easy to see" +- [ ] Zero instances of "Wait", "Hmm", "Actually", scratch work +- [ ] Every symbol defined before first use + +### Python script + +- [ ] 0 failures +- [ ] ≥ 5,000 total checks +- [ ] Section 1 (symbolic) present and non-empty +- [ ] Section 2 (exhaustive) covers n ≤ 5 minimum +- [ ] Section 3 (extraction) tests EVERY feasible instance +- [ ] Section 4 (overhead) compares formula vs actual for all tested instances +- [ ] Section 5 (structural) has at least one non-trivial check +- [ ] Section 6 (YES example) reproduces Typst example numbers exactly +- [ ] Section 7 (NO example) reproduces Typst infeasible example exactly +- [ ] Gap analysis performed — every Typst claim has a corresponding test + +### Lean + +- [ ] Builds without errors (warnings OK) +- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology) +- [ ] Every `sorry` has a comment explaining WHY + +### Cross-consistency + +- [ ] The Python script's `reduce()` function implements EXACTLY the Typst construction +- [ ] The Python script's `extract_solution()` implements EXACTLY the Typst extraction +- [ ] The overhead formula in Python matches the Typst overhead table +- [ ] The examples in Python match the Typst examples (same numbers, same instances) + +## Step 7: Report + +``` +=== Verification Report: Source → Target (#NNN) === + +Typst proof: §X.Y + - Construction: ✓ (N steps) + - Correctness: ✓ (⟹ + ⟸) + - Extraction: ✓ + - Overhead: ✓ + - YES example: ✓ (N vars/vertices) + - NO example: ✓ (N vars/vertices, reason: ...) 
+ +Python: verify__.py + - Checks: N passed, 0 failed + - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO) + - Forward: exhaustive n ≤ K + - Backward: exhaustive n ≤ K + - Gap analysis: all claims covered + +Lean: ReductionProofs/Basic.lean (or .lean) + - Non-trivial lemmas: N + - Trivial lemmas: M + - Sorry: J (with justification) + +Bugs found: +Iterations: N rounds + +Verdict: VERIFIED / OPEN (with reason) +``` + +## Step 8: Commit, Create PR, Clean Up + +### 8a. Commit + +```bash +git add docs/paper/.typ docs/paper/verify-reductions/verify_*.py \ + docs/paper/verify-reductions/lean/ReductionProofs/*.lean +git add -f docs/paper/.pdf +git commit -m "docs: /verify-reduction # VERIFIED + +Typst: Construction + Correctness + Extraction + Overhead + YES/NO examples +Python: N checks, 0 failures (exhaustive n ≤ K, 7 sections) +Lean: M non-trivial lemmas + +Co-Authored-By: Claude Opus 4.6 (1M context) " +``` + +### 8b. Push and create PR + +```bash +git push -u origin "$BRANCH" +gh pr create --title "docs: verify reduction #" --body "..." +``` + +### 8c. Clean up worktree + +```bash +cd "$REPO_ROOT" +python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" +``` + +### 8d. Comment on issue + +```bash +gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." 
+``` + +## Quality Gates — NON-NEGOTIABLE + +A reduction is **VERIFIED** when ALL of these hold: + +- [ ] Typst compiles, has all mandatory sections including YES and NO examples +- [ ] Zero hand-waving language +- [ ] Python has 0 failures AND ≥ 5,000 checks +- [ ] All 7 Python sections present and non-empty +- [ ] Exhaustive n ≤ 5 minimum +- [ ] Solution extraction verified for all feasible instances +- [ ] Overhead formula matches actual construction +- [ ] Both Typst examples reproduced by script +- [ ] Gap analysis shows all Typst claims tested +- [ ] At least 1 non-trivial Lean lemma +- [ ] Cross-consistency between Typst and Python verified + +**If even ONE gate fails, the reduction is NOT verified. Go back and fix it.** + +## Common Mistakes — ZERO TOLERANCE + +| Mistake | Consequence | +|---------|-------------| +| Lean lemma is just `rfl` or `omega` on a tautology | Rejected — add structural lemma | +| No symbolic checks (Section 1 empty) | Rejected — add sympy verification | +| Only YES example, no NO example | Rejected — add infeasible instance | +| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | +| "Passed on first run" without gap analysis | Rejected — perform gap analysis | +| Example has < 3 variables | Rejected — too degenerate | +| Script has < 5,000 checks | Rejected — enhance exhaustive testing | +| Extraction not tested | Rejected — add Section 3 | +| Typst proof says "clearly" | Rejected — rewrite without hand-waving | + +## Integration + +- **After `add-rule`**: invoke `/verify-reduction` before creating PR +- **After `write-rule-in-paper`**: invoke to verify paper entry +- **During `review-structural`**: check verification script exists and passes +- **Before `issue-to-pr --execute`**: pre-validate the algorithm + +## Reference: PR #975 Quality Level + +Target quality defined by PR #975: +- 800,000+ total checks, 0 unexpected failures +- 3 bugs caught through iteration loop +- Every script has forward + backward + 
extraction + overhead + example +- Non-trivial Lean: `G ⊔ Gᶜ = ⊤` via `sup_compl_eq_top` +- Failures marked OPEN honestly with diagnosis From 83e39318d06456163edfc3434c4c579ed553e753 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 05:34:17 +0000 Subject: [PATCH 03/16] style: add YAML frontmatter + professional tone to verify-reduction skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added frontmatter (name, description) matching other skills' convention - Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP", "NON-NEGOTIABLE") to professional but firm language - All quality gates unchanged — same strictness, better presentation Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 9adff5f62..beecc5d46 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -1,11 +1,14 @@ +--- +name: verify-reduction +description: End-to-end verification of a reduction rule — generates Typst proof, Python verification script (≥5000 checks), and Lean lemmas, iterating until all checks pass. Creates worktree + PR. +--- + # Verify Reduction End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational and formal verification, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. Outputs: Typst proof entry, Python verification script, Lean lemmas — all at PR #975 quality level. -**This skill is STRICT. 
Cutting corners produces reductions that get rejected during review.** - ## Invocation ``` @@ -212,9 +215,9 @@ if __name__ == "__main__": | Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | Construction + formula + structural | | Composition (A→B→C) | 10,000 | n ≤ 5 | Exhaustive per step | -**There is no "trivial" category.** Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing. +Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing regardless of perceived simplicity. -## Step 4: Run and Iterate (THE CRITICAL LOOP) +## Step 4: Run and Iterate ```bash python3 docs/paper/verify-reductions/verify__.py @@ -258,7 +261,7 @@ CLAIM TESTED BY If any claim has no test, add one. If it's untestable, document WHY. -## Step 5: Add Lean Lemmas — STRICT REQUIREMENTS +## Step 5: Add Lean Lemmas ### HARD requirement: at least one NON-TRIVIAL lemma @@ -300,7 +303,7 @@ export PATH="$HOME/.elan/bin:$PATH" lake build ``` -## Step 6: Self-Review — THE HARSHEST STEP +## Step 6: Self-Review Before declaring verified, run through this checklist. **Every item must be YES.** If any is NO, go back and fix it. @@ -411,7 +414,7 @@ python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." ``` -## Quality Gates — NON-NEGOTIABLE +## Quality Gates A reduction is **VERIFIED** when ALL of these hold: @@ -427,9 +430,9 @@ A reduction is **VERIFIED** when ALL of these hold: - [ ] At least 1 non-trivial Lean lemma - [ ] Cross-consistency between Typst and Python verified -**If even ONE gate fails, the reduction is NOT verified. Go back and fix it.** +If any gate fails, go back and fix it before declaring the reduction verified. 
-## Common Mistakes — ZERO TOLERANCE +## Common Mistakes | Mistake | Consequence | |---------|-------------| From d09823d860b117f4248b5b1bdc6f13ecc74ac033 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 06:04:36 +0000 Subject: [PATCH 04/16] feat: adversarial multi-agent verification in verify-reduction skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces Lean-required gates with adversarial second agent: - Step 5: Adversary agent independently implements reduce() and extract_solution() from theorem statement only (not constructor's script) - Step 5c: Cross-comparison of both implementations on 1000 instances - Lean downgraded from required to optional - hypothesis property-based testing for n up to 50 - Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md): - Same agent writing proof + test is the #1 risk for AI verification - Two independent implementations agreeing > one + trivial Lean - Lean caught 0 bugs in PR #975; Python caught all 4 Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 302 +++++++++++++----- ...6-04-01-adversarial-verification-design.md | 145 +++++++++ 2 files changed, 372 insertions(+), 75 deletions(-) create mode 100644 docs/superpowers/specs/2026-04-01-adversarial-verification-design.md diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index beecc5d46..8981aa8d4 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -1,13 +1,13 @@ --- name: verify-reduction -description: End-to-end verification of a reduction rule — generates Typst proof, Python verification script (≥5000 checks), and Lean lemmas, iterating until all checks pass. Creates worktree + PR. 
+description: End-to-end verification of a reduction rule — generates Typst proof, constructor Python script (>=5000 checks), and adversary Python script (>=5000 independent checks), iterating until all checks pass. Creates worktree + PR. --- # Verify Reduction -End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational and formal verification, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. +End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational verification from two independent implementations, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. -Outputs: Typst proof entry, Python verification script, Lean lemmas — all at PR #975 quality level. +Outputs: Typst proof entry, constructor Python verification script, adversary Python verification script — all at PR #975 quality level. ## Invocation @@ -18,9 +18,9 @@ Outputs: Typst proof entry, Python verification script, Lean lemmas — all at P ## Prerequisites -- `sympy` and `networkx` installed (`pip install sympy networkx`) +- `sympy`, `networkx`, and `hypothesis` installed (`pip install sympy networkx hypothesis`) - Both source and target models must exist in the codebase (`pred show `) -- For Lean: `elan` installed with Lean 4 toolchain +- Optional: `elan` with Lean 4 toolchain (for formal lemmas — not required) ## Process @@ -33,16 +33,19 @@ digraph verify { "Study models" [shape=box]; "Write Typst proof" [shape=box]; "Compile PDF" [shape=box]; - "Write Python script" [shape=box]; - "Run script" [shape=box]; + "Write constructor script" [shape=box]; + "Run constructor" [shape=box]; "All pass?" [shape=diamond]; "Fix proof + script" [shape=box]; "Enough checks?" 
[shape=diamond]; "Enhance script" [shape=box]; - "Add Lean lemmas" [shape=box]; - "Lean builds?" [shape=diamond]; - "Fix Lean" [shape=box]; - "Self-review (HARSH)" [shape=box]; + "Dispatch adversary" [shape=box]; + "Run adversary" [shape=box]; + "Adversary pass?" [shape=diamond]; + "Cross-compare" [shape=box]; + "Verdict table" [shape=diamond]; + "Investigate discrepancy" [shape=box]; + "Self-review" [shape=box]; "Create PR" [shape=box]; "Report" [shape=doublecircle]; @@ -51,20 +54,23 @@ digraph verify { "Read issue" -> "Study models"; "Study models" -> "Write Typst proof"; "Write Typst proof" -> "Compile PDF"; - "Compile PDF" -> "Write Python script"; - "Write Python script" -> "Run script"; - "Run script" -> "All pass?"; + "Compile PDF" -> "Write constructor script"; + "Write constructor script" -> "Run constructor"; + "Run constructor" -> "All pass?"; "All pass?" -> "Enough checks?" [label="yes"]; "All pass?" -> "Fix proof + script" [label="no"]; - "Fix proof + script" -> "Run script"; - "Enough checks?" -> "Add Lean lemmas" [label="yes"]; + "Fix proof + script" -> "Run constructor"; + "Enough checks?" -> "Dispatch adversary" [label="yes"]; "Enough checks?" -> "Enhance script" [label="no"]; - "Enhance script" -> "Run script"; - "Add Lean lemmas" -> "Lean builds?"; - "Lean builds?" -> "Self-review (HARSH)" [label="yes"]; - "Lean builds?" -> "Fix Lean" [label="no"]; - "Fix Lean" -> "Lean builds?"; - "Self-review (HARSH)" -> "Create PR"; + "Enhance script" -> "Run constructor"; + "Dispatch adversary" -> "Run adversary"; + "Run adversary" -> "Adversary pass?"; + "Adversary pass?" 
-> "Cross-compare" [label="yes or no"]; + "Cross-compare" -> "Verdict table"; + "Verdict table" -> "Self-review" [label="verified"]; + "Verdict table" -> "Investigate discrepancy" [label="discrepancy"]; + "Investigate discrepancy" -> "Write constructor script"; + "Self-review" -> "Create PR"; "Create PR" -> "Report"; } ``` @@ -147,7 +153,7 @@ Compile: python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')" ``` -## Step 3: Write Python Verification Script +## Step 3: Write Constructor Python Verification Script Create `docs/paper/verify-reductions/verify__.py`. @@ -261,51 +267,164 @@ CLAIM TESTED BY If any claim has no test, add one. If it's untestable, document WHY. -## Step 5: Add Lean Lemmas +## Step 5: Adversary Verification -### HARD requirement: at least one NON-TRIVIAL lemma +The adversary step provides independent verification by a second agent that implements the reduction from scratch, using only the Typst proof as specification. This catches bugs that the constructor's own tests cannot find (confirmation bias, shared implementation errors). -`n + m = n + m := rfl` does NOT count. A "non-trivial" lemma must satisfy at least one of: +### 5a. Dispatch adversary agent -1. **Uses a Mathlib tactic beyond `rfl`/`omega`**: e.g., `simp`, `Finset.sum_union`, lattice operations -2. **States a structural property**: e.g., "complementarity subsets partition the universe into pairs" -3. **Formalizes the key invariant**: e.g., "NAE-satisfying ↔ 2-colorable hypergraph" +Launch a subagent with the following prompt template. The adversary must not see the constructor's Python script — it reads only the Typst proof. -If Mathlib genuinely lacks the infrastructure (no Hamiltonian paths, no DAG quotients), write the strongest lemma you CAN prove and document what WOULD be proved with better Mathlib support. +**Adversary prompt:** -### Examples of acceptable vs unacceptable Lean lemmas +```` +You are an adversary verifier for a mathematical reduction proof. 
-**Unacceptable (trivial arithmetic):** -```lean -theorem overhead (n m : ℕ) : n + m = n + m := rfl -- proves nothing -theorem universe (n : ℕ) : 2 * n = 2 * n := rfl -- proves nothing -``` +Your goal: independently implement and test the reduction described in the Typst +proof below, trying to find bugs. You have no access to the constructor's +implementation. Write your own from scratch based solely on the mathematical +specification. + +## Input + +Typst proof file: docs/paper/proposed-reductions.typ (section on ) +Issue: # + +## Your task + +1. Read the Typst proof carefully. Extract: + - The construction algorithm (how to build the target instance) + - The correctness argument (forward and backward directions) + - The solution extraction procedure + - The overhead formula + - The YES and NO examples -**Acceptable (structural):** -```lean --- Uses Mathlib's lattice theory to prove a graph-structural fact -theorem complement_partition (G : SimpleGraph V) : G ⊔ Gᶜ = ⊤ := sup_compl_eq_top +2. Create `docs/paper/verify-reductions/adversary__.py` with: + - Your own `reduce()` function implementing the construction from scratch + - Your own `extract_solution()` function implementing solution extraction + - Your own `is_feasible_source()` and `is_feasible_target()` validators + - Exhaustive testing for n ≤ 5 (forward + backward + extraction) + - Overhead formula verification + - Reproduction of both Typst examples (YES and NO) + - Property-based testing using `hypothesis` (at least 2 strategies) + - Minimum 5,000 total checks --- Formalizes the key definition used in the proof -def IsNAESatisfying (assignment : Fin n → Bool) (clause : Finset (Fin n × Bool)) : Prop := - ¬(∀ l ∈ clause, ...) ∧ ¬(∀ l ∈ clause, ...) +3. Use `hypothesis` for property-based testing. 
Example strategies: + ```python + from hypothesis import given, settings + from hypothesis import strategies as st --- Proves the overhead formula requires actual computation -theorem overhead_nontrivial (n m : ℕ) (h : m > 0) : - 2 * n + (n + m) > 2 * n := by omega -- at least uses a hypothesis + @given(st.lists(st.integers(0, 1), min_size=3, max_size=8)) + @settings(max_examples=500) + def test_roundtrip(assignment): + # Build source from assignment, reduce, check equivalence + ... + ``` + +4. Run the script. Report pass/fail counts and any bugs found. + +5. Do NOT read or import from `verify__.py`. Your implementation + must be independent. + +## Output format + +Print at the end: +``` +ADVERSARY: : N passed, M failed +BUGS FOUND: ``` +```` -### Build and verify +### 5b. Run adversary script ```bash -cd docs/paper/verify-reductions/lean -export PATH="$HOME/.elan/bin:$PATH" -lake build +python3 docs/paper/verify-reductions/adversary__.py ``` +### 5c. Cross-comparison + +After both scripts have run, compare their `reduce()` outputs on a shared set of instances. Create and run this comparison inline: + +```python +#!/usr/bin/env python3 +"""Cross-compare constructor and adversary implementations.""" +import sys +sys.path.insert(0, "docs/paper/verify-reductions") + +from verify__ import reduce as constructor_reduce +from adversary__ import reduce as adversary_reduce + +# Also import feasibility checkers from both sides +from verify__ import is_feasible_source, is_feasible_target +from adversary__ import ( + is_feasible_source as adv_is_feasible_source, + is_feasible_target as adv_is_feasible_target, +) + +def normalize(target_instance): + """Convert target instance to a canonical hashable form for comparison. 
+ Adapt this to the specific target problem type.""" + # For graph problems: sorted edge list + vertex count + # For formula problems: sorted clause list + # For set problems: frozenset of frozensets + return tuple(sorted(str(x) for x in target_instance)) + +agree = disagree = 0 +feasibility_mismatch = 0 + +# Generate shared test instances (adapt to source problem type) +import itertools +for n in range(3, 6): + for instance in generate_all_instances(n): # problem-specific generator + c_target = constructor_reduce(instance) + a_target = adversary_reduce(instance) + + # Compare structural equivalence + if normalize(c_target) == normalize(a_target): + agree += 1 + else: + disagree += 1 + print(f" DISAGREE on instance {instance}") + print(f" Constructor: {c_target}") + print(f" Adversary: {a_target}") + + # Compare feasibility verdicts + c_feas = is_feasible_target(c_target) + a_feas = adv_is_feasible_target(a_target) + if c_feas != a_feas: + feasibility_mismatch += 1 + print(f" FEASIBILITY MISMATCH on {instance}: " + f"constructor={c_feas}, adversary={a_feas}") + +print(f"\nCross-comparison: {agree} agree, {disagree} disagree, " + f"{feasibility_mismatch} feasibility mismatches") +if disagree > 0 or feasibility_mismatch > 0: + print("ACTION REQUIRED: investigate discrepancies before proceeding") + sys.exit(1) +``` + +Adapt the `normalize()` function and instance generator to the specific source/target problem types. + +### 5d. Verdict criteria + +Use this table to determine the outcome. All three signals must be considered together: + +| Constructor | Adversary | Cross-comparison | Verdict | Action | +|-------------|-----------|-----------------|---------|--------| +| Pass | Pass | Agree | Verified | Proceed to Step 6 | +| Pass | Pass | Disagree | Suspect | Both produce valid but structurally different targets. Investigate whether both are correct reductions (e.g., isomorphic gadgets) or one has a latent bug. Resolve before proceeding. 
| +| Pass | Fail | Agree | Adversary bug | Review adversary script for implementation errors. If adversary misread the Typst spec, fix spec clarity and re-run adversary. | +| Pass | Fail | Disagree | Suspect | Constructor may have a bug masked by its own tests. Investigate the disagreeing instances manually. | +| Fail | Pass | Agree | Constructor bug | Fix constructor script, re-run from Step 4. | +| Fail | Pass | Disagree | Suspect | Investigate both — the agreeing instances may be coincidental. | +| Fail | Fail | Agree | Proof bug | The reduction itself is likely wrong. Re-examine the Typst proof. Return to Step 2. | +| Fail | Fail | Disagree | Proof bug | Same as above — shared failures with structural disagreement indicate a fundamental problem. | + +When the verdict is "Suspect," the resolution is to manually inspect the disagreeing instances until the root cause is identified, then loop back to the appropriate step. + ## Step 6: Self-Review -Before declaring verified, run through this checklist. **Every item must be YES.** If any is NO, go back and fix it. +Before declaring verified, run through this checklist. Every item must be YES. If any is NO, go back and fix it. ### Typst proof @@ -320,7 +439,7 @@ Before declaring verified, run through this checklist. **Every item must be YES. - [ ] Zero instances of "Wait", "Hmm", "Actually", scratch work - [ ] Every symbol defined before first use -### Python script +### Constructor Python script - [ ] 0 failures - [ ] ≥ 5,000 total checks @@ -333,18 +452,31 @@ Before declaring verified, run through this checklist. **Every item must be YES. 
- [ ] Section 7 (NO example) reproduces Typst infeasible example exactly - [ ] Gap analysis performed — every Typst claim has a corresponding test -### Lean +### Adversary Python script -- [ ] Builds without errors (warnings OK) -- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology) -- [ ] Every `sorry` has a comment explaining WHY +- [ ] 0 failures +- [ ] ≥ 5,000 total checks +- [ ] Implemented independently (no imports from constructor script) +- [ ] Uses `hypothesis` property-based testing (at least 2 strategies) +- [ ] Exhaustive testing for n ≤ 5 +- [ ] Reproduces both Typst examples ### Cross-consistency -- [ ] The Python script's `reduce()` function implements EXACTLY the Typst construction -- [ ] The Python script's `extract_solution()` implements EXACTLY the Typst extraction -- [ ] The overhead formula in Python matches the Typst overhead table -- [ ] The examples in Python match the Typst examples (same numbers, same instances) +- [ ] Cross-comparison script ran with 0 disagreements +- [ ] Cross-comparison script ran with 0 feasibility mismatches +- [ ] The constructor script's `reduce()` implements the Typst construction +- [ ] The adversary script's `reduce()` implements the Typst construction independently +- [ ] The overhead formula in both scripts matches the Typst overhead table +- [ ] The examples in both scripts match the Typst examples (same numbers, same instances) + +### Lean (optional) + +If Lean lemmas were added: + +- [ ] Builds without errors (warnings OK) +- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology) +- [ ] Every `sorry` has a comment explaining WHY ## Step 7: Report @@ -359,14 +491,26 @@ Typst proof: §X.Y - YES example: ✓ (N vars/vertices) - NO example: ✓ (N vars/vertices, reason: ...) 
-Python: verify__.py +Constructor: verify__.py - Checks: N passed, 0 failed - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO) - Forward: exhaustive n ≤ K - Backward: exhaustive n ≤ K - Gap analysis: all claims covered -Lean: ReductionProofs/Basic.lean (or .lean) +Adversary: adversary__.py + - Checks: N passed, 0 failed + - Property-based: M hypothesis examples + - Forward: exhaustive n ≤ K + - Backward: exhaustive n ≤ K + - Bugs found: + +Cross-comparison: + - Instances compared: N + - Structural agreement: N/N + - Feasibility agreement: N/N + +Lean: .lean (optional) - Non-trivial lemmas: N - Trivial lemmas: M - Sorry: J (with justification) @@ -382,14 +526,16 @@ Verdict: VERIFIED / OPEN (with reason) ### 8a. Commit ```bash -git add docs/paper/.typ docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/lean/ReductionProofs/*.lean +git add docs/paper/.typ \ + docs/paper/verify-reductions/verify_*.py \ + docs/paper/verify-reductions/adversary_*.py git add -f docs/paper/.pdf git commit -m "docs: /verify-reduction # VERIFIED Typst: Construction + Correctness + Extraction + Overhead + YES/NO examples -Python: N checks, 0 failures (exhaustive n ≤ K, 7 sections) -Lean: M non-trivial lemmas +Constructor: N checks, 0 failures (exhaustive n ≤ K, 7 sections) +Adversary: M checks, 0 failures (independent impl + hypothesis) +Cross-comparison: all instances agree Co-Authored-By: Claude Opus 4.6 (1M context) " ``` @@ -420,15 +566,18 @@ A reduction is **VERIFIED** when ALL of these hold: - [ ] Typst compiles, has all mandatory sections including YES and NO examples - [ ] Zero hand-waving language -- [ ] Python has 0 failures AND ≥ 5,000 checks -- [ ] All 7 Python sections present and non-empty -- [ ] Exhaustive n ≤ 5 minimum +- [ ] Constructor Python has 0 failures and ≥ 5,000 checks +- [ ] All 7 constructor Python sections present and non-empty +- [ ] Adversary Python has 0 failures and ≥ 5,000 checks +- [ ] Adversary 
Python uses `hypothesis` property-based testing +- [ ] Adversary implementation is independent (no shared code with constructor) +- [ ] Exhaustive n ≤ 5 minimum (both scripts) - [ ] Solution extraction verified for all feasible instances - [ ] Overhead formula matches actual construction -- [ ] Both Typst examples reproduced by script +- [ ] Both Typst examples reproduced by both scripts - [ ] Gap analysis shows all Typst claims tested -- [ ] At least 1 non-trivial Lean lemma -- [ ] Cross-consistency between Typst and Python verified +- [ ] Cross-comparison shows 0 disagreements and 0 feasibility mismatches +- [ ] Cross-consistency between Typst, constructor, and adversary verified If any gate fails, go back and fix it before declaring the reduction verified. @@ -436,15 +585,18 @@ If any gate fails, go back and fix it before declaring the reduction verified. | Mistake | Consequence | |---------|-------------| -| Lean lemma is just `rfl` or `omega` on a tautology | Rejected — add structural lemma | +| Adversary imports from constructor script | Rejected — must be independent | +| No property-based testing in adversary | Rejected — add hypothesis strategies | | No symbolic checks (Section 1 empty) | Rejected — add sympy verification | | Only YES example, no NO example | Rejected — add infeasible instance | | n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | | "Passed on first run" without gap analysis | Rejected — perform gap analysis | | Example has < 3 variables | Rejected — too degenerate | -| Script has < 5,000 checks | Rejected — enhance exhaustive testing | +| Either script has < 5,000 checks | Rejected — enhance exhaustive testing | | Extraction not tested | Rejected — add Section 3 | | Typst proof says "clearly" | Rejected — rewrite without hand-waving | +| Cross-comparison skipped | Rejected — run comparison script | +| Disagreements dismissed without investigation | Rejected — resolve every discrepancy | ## Integration @@ -459,5 +611,5 @@ 
Target quality defined by PR #975: - 800,000+ total checks, 0 unexpected failures - 3 bugs caught through iteration loop - Every script has forward + backward + extraction + overhead + example -- Non-trivial Lean: `G ⊔ Gᶜ = ⊤` via `sup_compl_eq_top` +- Two independent implementations agreeing on all test instances - Failures marked OPEN honestly with diagnosis diff --git a/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md new file mode 100644 index 000000000..c42684f08 --- /dev/null +++ b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md @@ -0,0 +1,145 @@ +# Design: Adversarial Multi-Agent Reduction Verification + +**Date:** 2026-04-01 +**Goal:** Strengthen `/verify-reduction` with an adversarial second agent that independently verifies the reduction, catching systematic errors that the single-agent approach misses. + +## Problem Statement + +The current verification flow has a fundamental weakness: the same AI agent writes both the mathematical proof AND the verification script. If the agent systematically misunderstands the reduction (e.g., wrong variable mapping, wrong extraction direction), it will write a wrong proof and a wrong script that agree with each other. + +Evidence from PR #975: +- Python caught 4 bugs; Typst proofs caught 0; Lean caught 0 +- All bugs were construction errors or formula mistakes +- No bug was caused by "proof is right but script is wrong" — they were always "both are wrong in the same way" + +The adversarial approach addresses this by having two independent agents verify the same reduction. 
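To make the failure mode concrete, here is a deliberately buggy toy pair for a SubsetSum → Partition style padding construction. Both implementations, the bug, and the instance are illustrative, not taken from any PR:

```python
# Constructor and adversary implement the same padding construction, but the
# constructor drops the absolute value: a bug its own tests never hit because
# its instance generator only produced cases with sum(sizes) >= 2 * target.
def constructor_reduce(sizes, target):
    return sizes + [sum(sizes) - 2 * target]       # BUG: can emit a negative size

def adversary_reduce(sizes, target):
    return sizes + [abs(sum(sizes) - 2 * target)]  # independent implementation

# Cross-comparison on a shared instance exposes the blind spot:
sizes, target = [1, 2], 5                          # sum = 3 < 2 * target = 10
print(constructor_reduce(sizes, target))           # [1, 2, -7]
print(adversary_reduce(sizes, target))             # [1, 2, 7]
```

A single agent testing only its own generator's instances reports 0 failures here; the disagreement surfaces only when two implementations are compared on the same inputs.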
+ +## Architecture + +``` +/verify-reduction + │ + ├── Steps 0-2: Parse input, create worktree, read issue, write Typst proof + │ + ├── Step 3: Constructor Agent + │ └── Writes verify__.py (7 sections, ≥5000 checks) + │ + ├── Step 4: Run constructor script, iterate until 0 failures + │ + ├── Step 4b: Adversary Agent (dispatched as subagent with worktree isolation) + │ ├── Input: theorem statement + construction steps ONLY + │ │ (stripped of constructor's script, internal reasoning, examples) + │ ├── Writes: adversary__.py + │ │ ├── Independent reduce() implementation + │ │ ├── Independent extract_solution() + │ │ ├── Property-based testing (hypothesis, n up to 50) + │ │ ├── Targeted counterexample search + │ │ └── ≥5000 independent checks + │ └── Output: pass/fail + counterexamples + │ + ├── Step 4c: Cross-Comparison + │ ├── constructor.reduce(x) vs adversary.reduce(x) for 1000 instances + │ ├── constructor.check(adversary.reduce(x)) vs adversary.check(constructor.reduce(x)) + │ └── Disagreement → diagnose (real bug vs script bug) + │ + ├── Steps 5-8: Self-review, report, commit, PR + │ + └── Final PR includes BOTH scripts + cross-comparison results +``` + +## Adversary Agent Design + +### What the adversary receives + +A stripped prompt containing ONLY: +- The theorem statement (1-3 sentences) +- The construction steps (numbered) +- The extraction procedure +- The overhead formula + +The adversary does NOT receive: +- The constructor's Python script +- The Typst proof's correctness argument +- The worked examples +- Any internal reasoning from Step 3 + +### What the adversary produces + +`adversary__.py` with: + +1. **Independent `reduce()` function** — implemented from the construction steps, without seeing the constructor's implementation +2. **Independent `extract_solution()` function** — same +3. **Exhaustive testing** — forward + backward for n ≤ 5 (matching constructor's minimum) +4. 
**Property-based testing** — `hypothesis` with `@given(st.integers(min_value=2, max_value=50))` for random instances +5. **Targeted counterexample search**: + - All-True / all-False assignments + - Single-variable instances + - Maximum and minimum clause/subset sizes + - Instances where source is barely feasible (one satisfying assignment) + - Instances where source is barely infeasible (off by one element) +6. **Diagnostic output** — any counterexample prints the full instance, expected result, actual result + +### Cross-comparison + +After both scripts pass independently, the orchestrator runs: + +```python +# Import both implementations +from verify_source_target import reduce as constructor_reduce, check as constructor_check +from adversary_source_target import reduce as adversary_reduce, check as adversary_check + +# 1. Do they produce the same target instance? +for instance in random_instances(1000): + t1 = constructor_reduce(instance) + t2 = adversary_reduce(instance) + assert t1 == t2, f"Reductions disagree on {instance}" + +# 2. 
Cross-check feasibility +for instance in random_instances(1000): + assert constructor_check(adversary_reduce(instance)) == adversary_check(constructor_reduce(instance)) +``` + +## Verdict Criteria + +| Constructor | Adversary | Cross-check | Verdict | +|-------------|-----------|-------------|---------| +| 0 fail | 0 fail | Agreement | **VERIFIED** | +| 0 fail | Counterexample found | — | Real bug (constructor has blind spot) | +| Fail | — | — | Bug found before adversary needed | +| 0 fail | 0 fail | Disagreement | One implementation is wrong — diagnose | + +## Why Lean is Dropped + +Lean's contribution to this project: +- **Caught bugs:** 0 out of 4 +- **Useful proofs:** `G ⊔ Gᶜ = ⊤` (1 genuinely meaningful theorem) +- **Trivial proofs:** `n + m = n + m`, `14m + (2m-n) + 2nK = 16m-n+2nK` (add no confidence) +- **Infrastructure gap:** No `ReduceTo`, `NAESatisfiability`, `SetSplitting`, or reduction framework in Mathlib +- **ROI:** Negative. Every Lean lemma requires duplicating problem definitions that already exist in Rust. + +The adversarial Python approach provides stronger evidence: +- Two independent implementations agreeing > one implementation + arithmetic Lean proofs +- Property-based testing catches scale-dependent bugs that Lean can't +- No Mathlib infrastructure needed + +Lean remains **optional** (nice for credibility) but is removed from the required quality gates. 
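A sketch of what the adversary's property-based layer looks like in practice, using a SubsetSum → Partition padding construction as a stand-in; the strategy bounds and helper names are illustrative assumptions:

```python
from hypothesis import given, settings
from hypothesis import strategies as st

def reduce_to_partition(sizes, target):
    # Assumed construction: pad with |sum - 2*target|
    return sizes + [abs(sum(sizes) - 2 * target)]

def reachable_sums(sizes):
    """All subset sums of `sizes` (brute-force dynamic programming)."""
    sums = {0}
    for s in sizes:
        sums |= {x + s for x in sums}
    return sums

@given(st.lists(st.integers(1, 100), min_size=1, max_size=50),
       st.integers(0, 500))
@settings(max_examples=200)
def test_feasibility_preserved(sizes, target):
    src = target in reachable_sums(sizes)           # SubsetSum verdict
    padded = reduce_to_partition(sizes, target)
    total = sum(padded)                             # Partition verdict below
    tgt = total % 2 == 0 and total // 2 in reachable_sums(padded)
    assert src == tgt

test_feasibility_preserved()
print("property-based check passed")
```

Because `hypothesis` shrinks counterexamples, a scale-dependent bug shows up as a minimal failing instance rather than a random large one.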
+ +## Changes to verify-reduction Skill + +| Step | Before | After | +|------|--------|-------| +| Step 3 | Constructor writes script | Unchanged | +| Step 4 | Iterate until 0 failures | Unchanged | +| **Step 4b** | — | Adversary agent writes independent script | +| **Step 4c** | — | Cross-comparison | +| Step 5 | Lean required | Lean optional | +| Quality gates | 1 script ≥5000 checks | 2 independent scripts ≥5000 each + cross-comparison | +| Deliverables | 3 files (Typst + Python + Lean) | 3 files (Typst + constructor Python + adversary Python) | + +## Cost + +- Compute: ~2x (two agents instead of one) +- Time: ~1.5x (adversary runs in parallel via subagent) +- Complexity: moderate (orchestrator manages two outputs) + +Worth it: a wrong reduction that passes review is far more expensive than 2x compute. From 518824a273370a910169a5cf8466581d31adef6d Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 07:55:09 +0000 Subject: [PATCH 05/16] docs: design spec for verify-reduction skill enhancements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Typst↔Python auto-matching, test vectors JSON for downstream consumption by add-rule and review-pipeline, adversary tailoring by reduction type, compositional verification via pred CLI. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- ...01-verify-reduction-enhancements-design.md | 174 ++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md diff --git a/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md new file mode 100644 index 000000000..1ea071e75 --- /dev/null +++ b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md @@ -0,0 +1,174 @@ +# Verify-Reduction Skill Enhancements + +**Date:** 2026-04-01 +**Status:** Draft +**Context:** Lessons from running `/verify-reduction` on 3 reductions (#841 NAE→SS, #973 SS→Partition, #198 VC→HC) and comparison with PR #975's verification approach. + +## Problem + +The verify-reduction skill (PR #979) produces high-quality mathematical verification (Typst proofs + dual Python scripts), but: + +1. **Typst ↔ Python matching is manual.** The human writes examples in both places and hopes they agree. No automated cross-check. +2. **Gap analysis is manual.** The skill asks the agent to list every Typst claim and map it to a test — this is done as a text block, not verified programmatically. +3. **Verification is disconnected from Rust implementation.** The Python scripts verify the math, but when `/add-rule` implements the Rust code, it re-reads the issue from scratch. Verified artifacts aren't consumed downstream. +4. **No compositional testing.** The strongest verification — testing via alternative `pred` reduction paths — only becomes possible after the Rust implementation exists. The current skill doesn't set this up. 
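The compositional idea in point 4 can be sketched in miniature without the Rust side: solve the same source instance directly and via a reduction path, and require agreement. The problems and solvers below are stand-ins for `pred` paths:

```python
import random

def reachable_sums(sizes):
    """All subset sums of `sizes` (brute force)."""
    sums = {0}
    for s in sizes:
        sums |= {x + s for x in sums}
    return sums

def solve_subset_sum(sizes, target):        # path A: solve the source directly
    return target in reachable_sums(sizes)

def reduce_to_partition(sizes, target):     # path B, step 1: reduce
    return sizes + [abs(sum(sizes) - 2 * target)]

def solve_partition(sizes):                 # path B, step 2: solve the target
    total = sum(sizes)
    return total % 2 == 0 and total // 2 in reachable_sums(sizes)

random.seed(0)
for _ in range(1000):
    sizes = [random.randint(1, 20) for _ in range(random.randint(1, 8))]
    target = random.randint(0, 60)
    assert solve_subset_sum(sizes, target) == solve_partition(
        reduce_to_partition(sizes, target)), (sizes, target)
print("1000 instances: both paths agree")
```

Disagreement on any instance would localize a bug to either the reduction or one of the solvers, which is the same signal alternative `pred solve` paths provide once the Rust implementation exists.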
+ +## Design + +### Pipeline Integration + +The verify-reduction skill is a **pre-verification gate** in a larger pipeline: + +``` +Issue → /verify-reduction → /add-rule → /review-pipeline + (Typst + Python) (Rust impl) (agentic test) +``` + +The Python `reduce()` function is the verified spec. `/add-rule` translates it to Rust. `/review-pipeline`'s agentic test (`pred reduce`, `pred solve`) confirms the Rust matches. + +### Enhancement 1: Typst ↔ Python Auto-Matching + +#### Example-level (mandatory) + +The constructor script exports test vectors to a JSON sidecar file: + +``` +docs/paper/verify-reductions/test_vectors__.json +``` + +Contents: +```json +{ + "source": "SubsetSum", + "target": "Partition", + "yes_instance": { + "input": {"sizes": [3, 5, 7, 1, 4], "target": 8}, + "output": {"sizes": [3, 5, 7, 1, 4, 4]}, + "source_feasible": true, + "target_feasible": true, + "source_solution": [0, 1, 1, 0, 0], + "extracted_solution": [0, 1, 1, 0, 0] + }, + "no_instance": { + "input": {"sizes": [3, 7, 11], "target": 5}, + "output": {"sizes": [3, 7, 11, 11]}, + "source_feasible": false, + "target_feasible": false + }, + "overhead": { + "num_elements": "num_elements + 1" + } +} +``` + +A new validation step (Step 4.5) loads both the test vectors JSON and the Typst file, then checks that key numerical values from the JSON (input sizes, targets, output sizes, feasibility verdicts) appear in the Typst YES/NO example sections. The check is substring-based on the rendered numbers — not a full Typst parser. This catches divergence between the proof and the scripts (e.g., Typst says `S = {3, 5, 7}` but Python tests `S = {3, 5, 8}`). 
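A minimal sketch of the Step 4.5 check; the helper names are assumptions, and the real step would read the JSON sidecar and the `.typ` source from disk rather than in-memory strings:

```python
import json
import re

def rendered_numbers(obj):
    """Integer literals appearing anywhere in a JSON-serializable object."""
    return set(re.findall(r"\d+", json.dumps(obj)))

def missing_from_typst(vectors, typst_source):
    """Numbers from the YES/NO test vectors that never appear in the Typst
    source. An empty result means the examples agree (substring check only)."""
    wanted = (rendered_numbers(vectors["yes_instance"])
              | rendered_numbers(vectors["no_instance"]))
    present = set(re.findall(r"\d+", typst_source))
    return sorted(wanted - present, key=int)

vectors = {
    "yes_instance": {"input": {"sizes": [3, 5, 7, 1, 4], "target": 8}},
    "no_instance": {"input": {"sizes": [3, 7, 11], "target": 5}},
}
good = "YES: $S = {3, 5, 7, 1, 4}$, $T = 8$.  NO: $S = {3, 7, 11}$, $T = 5$."
bad = "YES: $S = {3, 5, 7, 1, 4}$, $T = 8$.  NO: $S = {3, 7, 12}$, $T = 5$."
print(missing_from_typst(vectors, good))   # []
print(missing_from_typst(vectors, bad))    # ['11']
```

The check deliberately over-accepts (a number present anywhere in the file satisfies it); the goal is catching gross divergence cheaply, not parsing Typst.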
+ +The same JSON is consumed downstream: +- `/add-rule` reads it to generate Rust `#[test]` closed-loop cases +- `/review-pipeline`'s agentic test uses it for `pred reduce`/`pred solve` verification + +#### Claim-level (best-effort) + +The constructor script emits structured claim tags: + +```python +claim("overhead_universe_size", "2*n", verified=True) +claim("forward_case2", "T + d = Sigma - T", verified=True) +claim("extraction_opposite_side", "sigma < 2T => opposite side from padding", verified=True) +``` + +Implementation: a `claims` list accumulated during `main()`, exported alongside the test vectors. The self-review step checks: every claim keyword should plausibly map to a Typst proof section. Claims without `verified=True` get flagged. This replaces the manual gap analysis table. + +### Enhancement 2: Adversary Tailoring by Reduction Type + +The adversary prompt is currently generic. Tailor it by reduction type: + +| Type | Adversary Focus | Rationale | +|------|----------------|-----------| +| Identity (same graph, different objective) | Exhaustive enumeration, edge-case configs | Bugs hide in subtle objective differences | +| Algebraic (padding, complement, case split) | Symbolic case boundaries, off-by-one in padding | SS→Partition had 3 cases; boundary conditions are error-prone | +| Gadget (widget construction) | Widget structure, traversal patterns, connectivity | VC→HC: the 3 traversal patterns and cross-edge positions are the core invariant | + +The skill detects type from the issue description (gadget keywords: "widget", "component", "gadget"; algebraic keywords: "padding", "complement", "case"; identity: everything else) and adjusts the adversary prompt template. + +### Enhancement 3: Downstream Integration with add-rule + +When `/add-rule` is invoked for a reduction that has verified artifacts: + +1. **Check for existing verification:** Look for `docs/paper/verify-reductions/verify__.py` and `test_vectors__.json`. +2. 
**Read Python `reduce()` as pseudocode:** The verified Python function is the spec for the Rust `reduce_to()` implementation. The agent translates it directly. +3. **Read overhead from test vectors JSON:** The `overhead` field maps directly to `#[reduction(overhead = {...})]` expressions. +4. **Generate Rust tests from test vectors:** The YES/NO instances become closed-loop test cases: construct source → reduce → solve target → extract → verify. +5. **Read Typst proof for documentation:** The proof's Construction section becomes the doc comment on the Rust impl. + +This eliminates re-derivation from the issue. The verified artifacts are the single source of truth. + +### Enhancement 4: Compositional Verification in review-pipeline + +After the Rust implementation exists, review-pipeline's agentic test gains a compositional check: + +1. Load test vectors from `test_vectors__.json` +2. For each test vector, run `pred solve ` via the new path +3. If an alternative path exists (e.g., both `NAE→SS→ILP` and `NAE→ILP`), run via both and compare +4. Disagreement = bug in either the new reduction or the test vector + +This is the strongest verification because it tests the actual Rust code through independent code paths. It happens naturally in review-pipeline's existing agentic test step — we just need the test vectors file to drive it. + +## Changes to SKILL.md + +### New Step 4.5: Auto-Matching Validation + +After the constructor passes (Step 4), before the adversary (Step 5): + +1. Constructor exports `test_vectors__.json` with YES/NO instances, overhead, and claims +2. Parse the Typst proof for numerical values in example sections +3. Verify the Typst examples match the JSON test vectors +4. Verify all `claim()` tags have `verified=True` +5. 
Flag any untested claims + +### Updated Step 5: Typed Adversary Prompt + +The adversary prompt template includes a reduction-type section: + +- **Identity reductions:** "Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration." +- **Algebraic reductions:** "Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently." +- **Gadget reductions:** "Focus on widget structure invariants. Verify each traversal pattern independently. Test that interior vertices have no external edges. Check connectivity after removing widgets." + +### Updated Step 8: Downstream Artifacts + +The commit includes the test vectors JSON alongside the Typst and Python files: + +```bash +git add docs/paper/verify-reductions/test_vectors__.json +``` + +The PR description notes: "Test vectors available for `/add-rule` to consume when implementing the Rust reduction." + +### Updated Integration Section + +``` +- **Before `/add-rule`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-rule`**: reads Python `reduce()` as pseudocode, overhead from JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances, compositional check via alternative paths if available +``` + +## Non-Goals + +- **No Rust code generation.** verify-reduction stays in Typst + Python. Rust translation is add-rule's job. +- **No manifest format.** The Python script + test vectors JSON are the artifacts. No intermediate spec language. +- **No Lean changes.** Lean remains optional and orthogonal to this enhancement. + +## Files Changed + +1. `.claude/skills/verify-reduction/SKILL.md` — new Step 4.5, updated Step 5 and Step 8 +2. `.claude/skills/add-rule/SKILL.md` — add "check for existing verification artifacts" to Step 1 +3. 
`.claude/skills/review-pipeline/SKILL.md` — add compositional test vector check to agentic test step (or document as optional enhancement) + +## Success Criteria + +After implementing these enhancements, running `/verify-reduction` → `/add-rule` → `/review-pipeline` on a new reduction should: + +1. Produce a test vectors JSON consumed by both add-rule and review-pipeline +2. Auto-detect Typst ↔ Python example divergence (if introduced deliberately as a test) +3. Generate Rust tests from the same test vectors that the Python scripts verified +4. Catch Rust implementation bugs via `pred reduce`/`pred solve` on those test vectors From 05d06ac86199acdfd1c715dcbf89ab6dbcaedd7e Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:15:23 +0000 Subject: [PATCH 06/16] docs: implementation plan for verify-reduction enhancements 5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed adversary, Step 8 downstream artifacts), create add-reduction skill, register in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) --- ...026-04-01-verify-reduction-enhancements.md | 466 ++++++++++++++++++ 1 file changed, 466 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md diff --git a/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md new file mode 100644 index 000000000..debd6c62f --- /dev/null +++ b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md @@ -0,0 +1,466 @@ +# Verify-Reduction Enhancements Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Enhance `/verify-reduction` with auto-matching, test vector export, and typed adversary prompts. 
Create new `/add-reduction` skill that consumes verified artifacts. Keep original `/add-rule` and `/review-pipeline` unchanged. + +**Architecture:** Three file changes — update verify-reduction SKILL.md (new Step 4.5, updated Step 5/8), create add-reduction SKILL.md (fork of add-rule that reads Python reduce() + test vectors JSON), register add-reduction in CLAUDE.md. + +**Tech Stack:** Markdown skill definitions only — no code changes. + +--- + +### Task 1: Update verify-reduction SKILL.md — Add Step 4.5 (Auto-Matching + Test Vectors Export) + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +The new Step 4.5 goes between the current Step 4 (Run and Iterate) and Step 5 (Adversary Verification). It adds three things: (a) constructor exports test vectors JSON, (b) Typst ↔ JSON cross-check, (c) structured claim tags replace manual gap analysis. + +- [ ] **Step 1: Insert new Step 4.5 after the current "Iteration 3: Gap analysis" block** + +In `.claude/skills/verify-reduction/SKILL.md`, find the line `If any claim has no test, add one. If it's untestable, document WHY.` (end of Step 4, Iteration 3). After it, insert: + +```markdown +### Iteration 4: Export test vectors and validate Typst matching — MANDATORY + +After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption: + +**File:** `docs/paper/verify-reductions/test_vectors__.json` + +```json +{ + "source": "", + "target": "", + "issue": , + "yes_instance": { + "input": { ... }, + "output": { ... }, + "source_feasible": true, + "target_feasible": true, + "source_solution": [ ... ], + "extracted_solution": [ ... ] + }, + "no_instance": { + "input": { ... }, + "output": { ... 
}, + "source_feasible": false, + "target_feasible": false + }, + "overhead": { + "field_name": "expression using source getters" + }, + "claims": [ + {"tag": "overhead_universe_size", "formula": "2*n", "verified": true}, + {"tag": "forward_correctness", "description": "NAE-sat implies valid splitting", "verified": true} + ] +} +``` + +Add this export at the end of `main()` in the constructor script: + +```python +import json + +test_vectors = { + "source": "", + "target": "", + "issue": , + "yes_instance": { ... }, # from Section 6 + "no_instance": { ... }, # from Section 7 + "overhead": { ... }, # from Section 1/4 + "claims": claims_list, # accumulated claim() calls +} + +with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: + json.dump(test_vectors, f, indent=2) +print(f"Test vectors exported to test_vectors__.json") +``` + +**Typst ↔ JSON cross-check:** + +After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes, feasibility verdicts), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text, not a full parser: + +```python +typst_text = open("").read() +for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing YES value: {val}" +for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing NO value: {val}" +``` + +If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. 
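The raw substring check can be brittle: Python renders a list as `[1, 2, 3]`, while the Typst example may typeset the same values as `(1, 2, 3)` or spell them out in prose. As an optional hardening sketch (not part of the mandatory step; the helper name and fallback behavior are assumptions, not existing code), the check can fall back to matching the digit sequence:

```python
import re

def values_present(typst_text: str, value) -> bool:
    """Check whether a JSON value appears in the Typst text.

    First tries the plain substring check; on failure, compares digit
    sequences so that formatting differences ([1, 2, 3] vs. (1, 2, 3))
    do not raise false alarms.
    """
    needle = str(value)
    if needle in typst_text:
        return True
    # Fallback: look for the value's numbers as a contiguous run.
    digits = re.findall(r"-?\d+", needle)
    haystack = re.findall(r"-?\d+", typst_text)
    if not digits:
        return False
    for i in range(len(haystack) - len(digits) + 1):
        if haystack[i:i + len(digits)] == digits:
            return True
    return False
```

If this fallback also fails for a value, treat it as a real Typst ↔ JSON divergence rather than loosening the check further.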
+ +**Structured claims (best-effort replacement for manual gap analysis):** + +Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: + +```python +claims_list = [] + +def claim(tag, formula_or_desc, verified=True): + claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) +``` + +Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). +``` + +- [ ] **Step 2: Verify the edit is well-placed** + +Read the file and confirm Step 4.5 appears between Step 4 and Step 5 with correct markdown heading level. + +- [ ] **Step 3: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: add Step 4.5 to verify-reduction — test vectors export + Typst auto-matching" +``` + +--- + +### Task 2: Update verify-reduction SKILL.md — Typed Adversary Prompt (Step 5) + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +- [ ] **Step 1: Add reduction-type detection and tailored instructions to the adversary prompt** + +In `.claude/skills/verify-reduction/SKILL.md`, find the adversary prompt template in Step 5a (the block starting `You are an adversary verifier`). Before the `## Your task` section of the adversary prompt, insert: + +````markdown +## Reduction type + +Detect the reduction type from the Typst proof and tailor your testing focus: + +- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. + +- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. 
Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. + +- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. Check structural properties (connectivity, edge counts, degree sequences) across all small instances. +```` + +- [ ] **Step 2: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: add typed adversary prompts to verify-reduction (identity/algebraic/gadget)" +``` + +--- + +### Task 3: Update verify-reduction SKILL.md — Downstream Artifacts (Step 8) and Integration + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +- [ ] **Step 1: Update Step 8a commit to include test vectors JSON** + +Find the `git add` block in Step 8a. Change it to: + +```bash +git add docs/paper/.typ \ + docs/paper/verify-reductions/verify_*.py \ + docs/paper/verify-reductions/adversary_*.py \ + docs/paper/verify-reductions/test_vectors_*.json +git add -f docs/paper/.pdf +``` + +- [ ] **Step 2: Update the Integration section at the bottom of the file** + +Replace the current Integration section with: + +```markdown +## Integration + +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
+ +- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available + +### Standalone usage + +- **After `write-rule-in-paper`**: invoke to verify paper entry +- **During `review-structural`**: check verification script exists and passes +- **Before `issue-to-pr --execute`**: pre-validate the algorithm +``` + +- [ ] **Step 3: Add test vectors JSON to the Quality Gates checklist** + +In the Quality Gates section, add after the cross-consistency items: + +```markdown +- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims +- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) +``` + +- [ ] **Step 4: Add test vectors JSON to the Self-Review checklist (Step 6)** + +In Step 6, add a new subsection after "Cross-consistency": + +```markdown +### Test vectors and auto-matching + +- [ ] `test_vectors__.json` exported successfully +- [ ] YES instance in JSON matches Typst feasible example (values present) +- [ ] NO instance in JSON matches Typst infeasible example (values present) +- [ ] All claims have `verified: true` +- [ ] At least 5 claims for non-trivial reductions +``` + +- [ ] **Step 5: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: update verify-reduction Step 8 + Integration for downstream pipeline" +``` + +--- + +### Task 4: Create add-reduction skill + +**Files:** +- Create: `.claude/skills/add-reduction/SKILL.md` + +This is a fork of add-rule that reads verified artifacts instead of deriving from the issue. 
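Before forking the skill, it helps to see what "consuming verified artifacts" looks like mechanically. A minimal sketch, assuming the test vectors JSON schema from Task 1 (`summarize_test_vectors` is an illustrative helper, not code that exists in the repo):

```python
import json
from pathlib import Path

def summarize_test_vectors(path) -> dict:
    """Compact summary of a verify-reduction test vectors file."""
    tv = json.loads(Path(path).read_text())
    unverified = [c["tag"] for c in tv.get("claims", []) if not c.get("verified")]
    return {
        "reduction": f'{tv["source"]} -> {tv["target"]}',
        "yes_feasible": tv["yes_instance"]["source_feasible"],
        "no_feasible": tv["no_instance"]["source_feasible"],
        "overhead_fields": sorted(tv.get("overhead", {})),
        "unverified_claims": unverified,  # should be empty before implementing
    }
```

If `unverified_claims` is non-empty, the right move is to return to `/verify-reduction` before writing any Rust.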
+ +- [ ] **Step 1: Create the skill directory** + +```bash +mkdir -p .claude/skills/add-reduction +``` + +- [ ] **Step 2: Write the SKILL.md** + +Create `.claude/skills/add-reduction/SKILL.md` with the following content: + +```markdown +--- +name: add-reduction +description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON +--- + +# Add Reduction (from Verified Artifacts) + +Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. + +**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. + +## Step 0: Locate Verified Artifacts + +Check for existing verification artifacts: + +```bash +ls docs/paper/verify-reductions/verify__.py +ls docs/paper/verify-reductions/test_vectors__.json +ls docs/paper/verify-reductions/_.typ +``` + +If any are missing, run `/verify-reduction` first. + +### Read the artifacts + +1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. +2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. +3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. + +```bash +# Load test vectors +TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) +``` + +Extract from test vectors JSON: +- `overhead` → use directly in `#[reduction(overhead = { ... 
})]`
+- `yes_instance.input` / `yes_instance.output` → first closed-loop test case
+- `no_instance.input` / `no_instance.output` → infeasible test case
+- `claims` → verify each is preserved in the Rust implementation
+
+## Reference Implementations
+
+Same as `/add-rule`:
+- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs`
+- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs`
+- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet`
+- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`)
+
+## Step 1: Implement the reduction
+
+Create `src/rules/<source>_<target>.rs`.
+
+**Translation guide:** Map the Python `reduce()` function to Rust:
+
+| Python | Rust |
+|--------|------|
+| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` |
+| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec<usize>` |
+| `literal_to_element(lit, n)` | Private helper method |
+| Python list of ints | `Vec<usize>`, `Vec<bool>`, etc. (match problem type) |
+
+**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro:
+
+```rust
+#[reduction(overhead = {
+    // Copy expressions verbatim from test_vectors JSON "overhead" field
+    field_name = "expression",
+})]
+```
+
+The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl.
+
+## Step 2: Register in mod.rs
+
+Same as `/add-rule` Step 2. Add `mod <source>_<target>;` to `src/rules/mod.rs`.
+
+## Step 3: Write unit tests from test vectors
+
+Create `src/unit_tests/rules/<source>_<target>.rs`.
+
+**Generate tests directly from test vectors JSON:**
+
+The YES instance becomes the primary closed-loop test:
+
+```rust
+#[test]
+fn test_<source>_to_<target>_closed_loop() {
+    // Construct source from test_vectors.yes_instance.input
+    let source = <Source>::try_new(/* fields from JSON */).unwrap();
+
+    // Reduce
+    let reduction = ReduceTo::<Target>::reduce_to(&source);
+
+    // Verify target matches test_vectors.yes_instance.output
+    let target = reduction.target_problem();
+    assert_eq!(target.<getter>(), /* value from JSON output */);
+
+    // Solve and extract
+    let solver = BruteForce;
+    for witness in solver.find_all_witnesses(target).unwrap() {
+        let extracted = reduction.extract_solution(&witness);
+        // Verify extracted solution is valid for source
+        let val = source.evaluate(&extracted);
+        assert!(val.0); // Or check objective value
+    }
+}
+```
+
+The NO instance becomes the infeasible test:
+
+```rust
+#[test]
+fn test_<source>_to_<target>_infeasible() {
+    // Construct source from test_vectors.no_instance.input
+    let source = <Source>::try_new(/* fields from JSON */).unwrap();
+
+    // Reduce
+    let reduction = ReduceTo::<Target>::reduce_to(&source);
+
+    // Verify target is also infeasible
+    let solver = BruteForce;
+    let witnesses = solver.find_all_witnesses(reduction.target_problem());
+    assert!(witnesses.is_none() || witnesses.unwrap().is_empty());
+}
+```
+
+Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON.
+
+## Step 4: Add canonical example to example_db
+
+Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example.
+
+## Step 5: Document in paper
+
+The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.).
+
+Follow `/add-rule` Step 5 for the exact format. 
The heavy writing is already done; this step is reformatting. + +## Step 6: Regenerate exports and verify + +Same as `/add-rule` Step 6: + +```bash +cargo run --example export_graph +cargo run --example export_schemas +make regenerate-fixtures +make test clippy +``` + +## Solver Rules + +Same as `/add-rule`. If the target problem needs ILP, implement alongside. + +## CLI Impact + +Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. + +## File Naming + +Same as `/add-rule`: +- Rule file: `src/rules/_.rs` +- Test file: `src/unit_tests/rules/_.rs` +- Canonical example: builder function in `src/example_db/rule_builders.rs` + +## Common Mistakes + +All mistakes from `/add-rule` apply, plus: + +| Mistake | Fix | +|---------|-----| +| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | +| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | +| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | +| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | +| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | +``` + +- [ ] **Step 3: Commit** + +```bash +git add .claude/skills/add-reduction/SKILL.md +git commit -m "feat: create add-reduction skill — consumes verified artifacts from verify-reduction" +``` + +--- + +### Task 5: Register add-reduction in CLAUDE.md + +**Files:** +- Modify: `.claude/CLAUDE.md` + +- [ ] **Step 1: Add add-reduction entry to the Skills list** + +In `.claude/CLAUDE.md`, find the line: +``` +- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. 
+``` + +Add immediately after it: +``` +- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. +``` + +- [ ] **Step 2: Update the verify-reduction description** + +Find the line starting `- [verify-reduction]` and replace it with: + +``` +- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. +``` + +- [ ] **Step 3: Commit** + +```bash +git add .claude/CLAUDE.md +git commit -m "feat: register add-reduction skill in CLAUDE.md, update verify-reduction description" +``` + +--- + +Plan complete and saved to `docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md`. Two execution options: + +**1. Subagent-Driven (recommended)** - I dispatch a fresh subagent per task, review between tasks, fast iteration + +**2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints + +Which approach? 
\ No newline at end of file From d5b782189e250ca500dc859bce345ecebd8531d4 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:27:11 +0000 Subject: [PATCH 07/16] feat: enhance verify-reduction with test vectors export, typed adversary, pipeline integration Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 115 ++++++++++++++++++++++- 1 file changed, 113 insertions(+), 2 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 8981aa8d4..d2db24139 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -267,6 +267,87 @@ CLAIM TESTED BY If any claim has no test, add one. If it's untestable, document WHY. +### Iteration 4: Export test vectors and validate Typst matching — MANDATORY + +After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption by `/add-reduction`: + +**File:** `docs/paper/verify-reductions/test_vectors__.json` + +```json +{ + "source": "", + "target": "", + "issue": "", + "yes_instance": { + "input": { "...problem-specific fields..." }, + "output": { "...reduced instance fields..." }, + "source_feasible": true, + "target_feasible": true, + "source_solution": ["...config..."], + "extracted_solution": ["...config..."] + }, + "no_instance": { + "input": { "...problem-specific fields..." }, + "output": { "...reduced instance fields..." }, + "source_feasible": false, + "target_feasible": false + }, + "overhead": { + "field_name": "expression using source getters" + }, + "claims": [ + {"tag": "claim_tag", "formula": "formula or description", "verified": true} + ] +} +``` + +Add this export at the end of `main()` in the constructor script: + +```python +import json + +test_vectors = { + "source": "", + "target": "", + "issue": , + "yes_instance": { ... }, # from Section 6 + "no_instance": { ... 
}, # from Section 7 + "overhead": { ... }, # from Section 1/4 + "claims": claims_list, # accumulated claim() calls +} + +with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: + json.dump(test_vectors, f, indent=2) +print(f"Test vectors exported to test_vectors__.json") +``` + +**Typst ↔ JSON cross-check:** + +After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text — not a full parser: + +```python +typst_text = open("").read() +for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing YES value: {val}" +for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing NO value: {val}" +``` + +If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. + +**Structured claims (best-effort replacement for manual gap analysis):** + +Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: + +```python +claims_list = [] + +def claim(tag, formula_or_desc, verified=True): + claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) +``` + +Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). + ## Step 5: Adversary Verification The adversary step provides independent verification by a second agent that implements the reduction from scratch, using only the Typst proof as specification. This catches bugs that the constructor's own tests cannot find (confirmation bias, shared implementation errors). 
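The compositional check described in the Integration section (solve each test-vector instance via two independent reduction paths and compare) can be sketched as a tiny harness. Here `solve_via` stands in for whatever drives `pred solve` along a chosen reduction path; its signature is an assumption for illustration, not the actual CLI interface:

```python
import json

def compositional_check(test_vectors_path, solve_via, path_a, path_b):
    """Solve each test-vector instance via two reduction paths.

    `solve_via(instance_dict, path_list) -> bool` is a caller-supplied
    stand-in for running the solver along one path. Any disagreement
    between the paths, or with the recorded feasibility verdict,
    indicates a bug in a reduction or in the test vector itself.
    """
    with open(test_vectors_path) as f:
        tv = json.load(f)
    for name in ("yes_instance", "no_instance"):
        inst = tv[name]["input"]
        expected = tv[name]["source_feasible"]
        a = solve_via(inst, path_a)
        b = solve_via(inst, path_b)
        assert a == b, f"{name}: paths disagree ({a} vs {b})"
        assert a == expected, f"{name}: both paths contradict the recorded verdict"
```

When only one reduction path exists, the second assertion alone still checks the Rust implementation against the recorded verdicts.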
@@ -290,6 +371,16 @@ specification. Typst proof file: docs/paper/proposed-reductions.typ (section on ) Issue: # +## Reduction type + +Detect the reduction type from the Typst proof and tailor your testing focus: + +- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. + +- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. + +- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. Check structural properties (connectivity, edge counts, degree sequences) across all small instances. + ## Your task 1. Read the Typst proof carefully. Extract: @@ -470,6 +561,14 @@ Before declaring verified, run through this checklist. Every item must be YES. 
I - [ ] The overhead formula in both scripts matches the Typst overhead table - [ ] The examples in both scripts match the Typst examples (same numbers, same instances) +### Test vectors and auto-matching + +- [ ] `test_vectors__.json` exported successfully +- [ ] YES instance in JSON matches Typst feasible example (values present) +- [ ] NO instance in JSON matches Typst infeasible example (values present) +- [ ] All claims have `verified: true` +- [ ] At least 5 claims for non-trivial reductions + ### Lean (optional) If Lean lemmas were added: @@ -528,7 +627,8 @@ Verdict: VERIFIED / OPEN (with reason) ```bash git add docs/paper/.typ \ docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/adversary_*.py + docs/paper/verify-reductions/adversary_*.py \ + docs/paper/verify-reductions/test_vectors_*.json git add -f docs/paper/.pdf git commit -m "docs: /verify-reduction # VERIFIED @@ -578,6 +678,8 @@ A reduction is **VERIFIED** when ALL of these hold: - [ ] Gap analysis shows all Typst claims tested - [ ] Cross-comparison shows 0 disagreements and 0 feasibility mismatches - [ ] Cross-consistency between Typst, constructor, and adversary verified +- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims +- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) If any gate fails, go back and fix it before declaring the reduction verified. @@ -600,7 +702,16 @@ If any gate fails, go back and fix it before declaring the reduction verified. ## Integration -- **After `add-rule`**: invoke `/verify-reduction` before creating PR +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
+ +- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available + +### Standalone usage + - **After `write-rule-in-paper`**: invoke to verify paper entry - **During `review-structural`**: check verification script exists and passes - **Before `issue-to-pr --execute`**: pre-validate the algorithm From 7d8c417fa24bf3893931254700570be84374cbca Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:27:22 +0000 Subject: [PATCH 08/16] =?UTF-8?q?feat:=20create=20add-reduction=20skill=20?= =?UTF-8?q?=E2=80=94=20consumes=20verified=20artifacts=20from=20verify-red?= =?UTF-8?q?uction?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .claude/skills/add-reduction/SKILL.md | 175 ++++++++++++++++++++++++++ 1 file changed, 175 insertions(+) create mode 100644 .claude/skills/add-reduction/SKILL.md diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md new file mode 100644 index 000000000..325180acf --- /dev/null +++ b/.claude/skills/add-reduction/SKILL.md @@ -0,0 +1,175 @@ +--- +name: add-reduction +description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON +--- + +# Add Reduction (from Verified Artifacts) + +Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. 
+ +**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. + +## Step 0: Locate Verified Artifacts + +Check for existing verification artifacts: + +```bash +ls docs/paper/verify-reductions/verify__.py +ls docs/paper/verify-reductions/test_vectors__.json +ls docs/paper/verify-reductions/_.typ +``` + +If any are missing, run `/verify-reduction` first. + +### Read the artifacts + +1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. +2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. +3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. + +```bash +# Load test vectors +TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) +``` + +Extract from test vectors JSON: +- `overhead` → use directly in `#[reduction(overhead = { ... })]` +- `yes_instance.input` / `yes_instance.output` → first closed-loop test case +- `no_instance.input` / `no_instance.output` → infeasible test case +- `claims` → verify each is preserved in the Rust implementation + +## Reference Implementations + +Same as `/add-rule`: +- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs` +- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` +- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` +- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) + +## Step 1: Implement the reduction + +Create `src/rules/_.rs`. 
+ +**Translation guide:** Map the Python `reduce()` function to Rust: + +| Python | Rust | +|--------|------| +| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` | +| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec` | +| `literal_to_element(lit, n)` | Private helper method | +| Python list of ints | `Vec`, `Vec`, etc. (match problem type) | + +**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro: + +```rust +#[reduction(overhead = { + // Copy expressions verbatim from test_vectors JSON "overhead" field + field_name = "expression", +})] +``` + +The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl. + +## Step 2: Register in mod.rs + +Same as `/add-rule` Step 2. Add `mod _;` to `src/rules/mod.rs`. + +## Step 3: Write unit tests from test vectors + +Create `src/unit_tests/rules/_.rs`. + +**Generate tests directly from test vectors JSON:** + +The YES instance becomes the primary closed-loop test: + +```rust +#[test] +fn test__to__closed_loop() { + // Construct source from test_vectors.yes_instance.input + let source = ::try_new(/* fields from JSON */).unwrap(); + + // Reduce + let reduction = ReduceTo::::reduce_to(&source); + + // Verify target matches test_vectors.yes_instance.output + let target = reduction.target_problem(); + assert_eq!(target.(), /* value from JSON output */); + + // Solve and extract + let solver = BruteForce; + for witness in solver.find_all_witnesses(target).unwrap() { + let extracted = reduction.extract_solution(&witness); + // Verify extracted solution is valid for source + let val = source.evaluate(&extracted); + assert!(val.0); // Or check objective value + } +} +``` + +The NO instance becomes the infeasible test: + +```rust +#[test] +fn test__to__infeasible() { + // Construct source from test_vectors.no_instance.input + let source = 
::try_new(/* fields from JSON */).unwrap(); + + // Reduce + let reduction = ReduceTo::::reduce_to(&source); + + // Verify target is also infeasible + let solver = BruteForce; + let witnesses = solver.find_all_witnesses(reduction.target_problem()); + assert!(witnesses.is_none() || witnesses.unwrap().is_empty()); +} +``` + +Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON. + +## Step 4: Add canonical example to example_db + +Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example. + +## Step 5: Document in paper + +The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.). + +Follow `/add-rule` Step 5 for the exact format. The heavy writing is already done; this step is reformatting. + +## Step 6: Regenerate exports and verify + +Same as `/add-rule` Step 6: + +```bash +cargo run --example export_graph +cargo run --example export_schemas +make regenerate-fixtures +make test clippy +``` + +## Solver Rules + +Same as `/add-rule`. If the target problem needs ILP, implement alongside. + +## CLI Impact + +Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. 
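Since every step above consumes the test vectors JSON, a quick shape check catches a truncated or stale file before any Rust is written. A sketch, assuming the key names listed in Step 0 (the inline dict stands in for the loaded file):

```python
# Hypothetical test-vectors payload; in practice this comes from
# json.load() on the file under docs/paper/verify-reductions/.
vectors = {
    "overhead": {"num_elements": "2 * n"},
    "yes_instance": {"input": {"n": 2}, "output": {"universe_size": 4}},
    "no_instance": {"input": {"n": 1}, "output": {"universe_size": 2}},
    "claims": ["target is feasible iff source is feasible"],
}

# Top-level keys the downstream steps rely on (per Step 0).
required = ("overhead", "yes_instance", "no_instance", "claims")
missing = [k for k in required if k not in vectors]
assert not missing, f"test vectors missing keys: {missing}"

# Each instance must carry exact input/output values for the Rust tests.
for name in ("yes_instance", "no_instance"):
    assert {"input", "output"} <= vectors[name].keys(), f"{name} lacks input/output"

print("test vectors shape OK")
```

If the check fails, re-run `/verify-reduction` rather than hand-editing the JSON — the artifacts are generated, not authored.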
+ +## File Naming + +Same as `/add-rule`: +- Rule file: `src/rules/_.rs` +- Test file: `src/unit_tests/rules/_.rs` +- Canonical example: builder function in `src/example_db/rule_builders.rs` + +## Common Mistakes + +All mistakes from `/add-rule` apply, plus: + +| Mistake | Fix | +|---------|-----| +| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | +| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | +| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | +| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | +| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | From 022c1c0a9520c63bcf889a4257983c0106223994 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:55:51 +0000 Subject: [PATCH 09/16] feat: register add-reduction skill in CLAUDE.md, update verify-reduction description Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/CLAUDE.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 659825237..a3b771d5a 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -10,6 +10,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct ` -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`. - [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. - [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. 
Can be used standalone (brainstorms with user) or called from `issue-to-pr`. +- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. - [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`. - [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`. - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). @@ -26,7 +27,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. -- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), and Lean lemmas (non-trivial required). Iterates until all checks pass. Creates worktree + PR. 
+- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility From 1895610ea091c8fcbdb0144639b23ea2879aa0da Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 10:17:57 +0000 Subject: [PATCH 10/16] fix: three improvements to verify-reduction and add-reduction skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 1. verify-reduction Step 1: type compatibility gate — checks source/target Value types before proceeding. Stops and comments on issue if types are incompatible (e.g., optimization → decision needs K parameter). 2. add-reduction Step 7: mandatory cleanup of verification artifacts from docs/paper/verify-reductions/ — Python scripts, JSON, Typst, PDF must not get into the library. 3. add-reduction Steps 4/4b/5: mandatory requirements from #974 — canonical example in rule_builders.rs (Check 9), example-db lookup test (Check 10), paper reduction-rule entry (Check 11). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 72 ++++++++++++++++++++++-- .claude/skills/verify-reduction/SKILL.md | 35 ++++++++++++ 2 files changed, 102 insertions(+), 5 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index 325180acf..2848e9df6 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -126,15 +126,58 @@ fn test__to__infeasible() { Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON. -## Step 4: Add canonical example to example_db +## Step 4: Add canonical example to example_db (Check 9 from #974) -Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example. +**This step is MANDATORY — many past PRs skipped it.** Add a builder function in `src/example_db/rule_builders.rs` that constructs a small canonical instance for this reduction. -## Step 5: Document in paper +Use the YES instance from the test vectors JSON as the canonical example. Follow existing patterns in `rule_builders.rs`: -The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.). +```rust +pub fn build__to__example() -> RuleExample { + // Construct the source instance from test vectors YES instance + let source = ::new(/* fields from test_vectors.yes_instance.input */); + let reduction = ReduceTo::::reduce_to(&source); + RuleExample { + source: Box::new(source), + target: Box::new(reduction.target_problem().clone()), + reduction_result: Box::new(reduction), + } +} +``` + +Register the builder in `build_rule_examples()` in the same file. 
+ +## Step 4b: Add example-db lookup test (Check 10 from #974) + +**Also MANDATORY.** Add a test in `src/unit_tests/example_db.rs` that verifies the canonical example can be loaded: + +```rust +#[test] +fn test_example__to_() { + let examples = build_rule_examples(); + let found = examples.iter().any(|e| /* matches source/target names */); + assert!(found, "Missing canonical example for "); +} +``` + +Or add the rule to the existing exhaustive lookup test if one exists. + +## Step 5: Document in paper (Check 11 from #974) + +**Also MANDATORY.** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template: + +```typst +#reduction-rule("Source", "Target", + example: true, + example-caption: [Description], +)[ + Rule statement from Typst proof... +][ + Proof from Typst proof (Construction + Correctness + Extraction)... +] +``` -Follow `/add-rule` Step 5 for the exact format. The heavy writing is already done; this step is reformatting. +The proof text, worked examples, and overhead table are already written in the verification Typst file — adapt them to the paper's macros. Follow `/add-rule` Step 5 for the exact format. ## Step 6: Regenerate exports and verify @@ -147,6 +190,21 @@ make regenerate-fixtures make test clippy ``` +## Step 7: Clean up verification artifacts + +**MANDATORY.** After the Rust implementation is complete and all tests pass, remove the verification artifacts from `docs/paper/verify-reductions/`. These artifacts served their purpose as a pre-verification gate and must NOT be committed into the library. 
+ +```bash +git rm docs/paper/verify-reductions/verify__.py +git rm docs/paper/verify-reductions/adversary__.py +git rm docs/paper/verify-reductions/cross_compare__.py +git rm docs/paper/verify-reductions/test_vectors__.json +git rm docs/paper/verify-reductions/_.typ +git rm docs/paper/verify-reductions/_.pdf +``` + +The Typst proof content lives on in the paper entry (`docs/paper/reductions.typ`). The Python scripts and test vectors were scaffolding — the Rust tests are the permanent verification. + ## Solver Rules Same as `/add-rule`. If the target problem needs ILP, implement alongside. @@ -173,3 +231,7 @@ All mistakes from `/add-rule` apply, plus: | Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | | Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | | Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | +| Missing canonical example in `rule_builders.rs` (Check 9 from #974) | MANDATORY — add builder function using the YES test vector | +| Missing example-db lookup test (Check 10 from #974) | MANDATORY — add test in `example_db.rs` | +| Missing paper `reduction-rule` entry (Check 11 from #974) | MANDATORY — integrate Typst proof into `reductions.typ` | +| Leaving verification artifacts in `docs/paper/verify-reductions/` | MANDATORY cleanup — `git rm` all Python scripts, JSON, Typst, PDF from verify-reductions | diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index d2db24139..21310f016 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -109,6 +109,41 @@ pred show --json pred show --json ``` +### Type compatibility gate — MANDATORY + +Before proceeding, check that the source and target Value types are compatible for a witness-preserving reduction (`ReduceTo` / `ReductionResult`): + +```bash +# Check Value 
types in the model source files +grep "type Value = " src/models/*/.rs +grep "type Value = " src/models/*/.rs +``` + +**Compatible pairs:** +- `Or` → `Or` (feasibility → feasibility) — always OK +- `Min` → `Min` or `Max` → `Max` (same optimization sense) — OK if witness extraction preserves optimality +- `Or` → `Max`/`Min` — OK for decision-to-optimization embeddings + +**Incompatible pairs that require a K parameter:** +- `Min` or `Max` → `Or` (optimization → decision) — the G&J-style reductions need a threshold K that doesn't exist on the source type. These cannot be implemented as `ReduceTo`. + +If the types are incompatible, **STOP immediately** and comment on the issue: + +```bash +gh issue comment "$ISSUE" --body "verify-reduction: **BLOCKED** — type mismatch. + +Source \`\` has \`Value = \` (optimization) but target \`\` has \`Value = \` (decision). The G&J reduction requires a threshold parameter K that doesn't exist on the source type. This reduction cannot be implemented as \`ReduceTo\` in the current type system. + +Options: +1. Add a decision-variant model with a K parameter +2. Use \`ReduceToAggregate\` for value-only reduction (no witness extraction) +3. Choose a different source/target pair with compatible types" +``` + +Do NOT create a worktree, write proofs, or run verification for incompatible reductions. + +### Continue if compatible + Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. If the issue is incomplete, use WebSearch to find the original reference. 
From deb05981047cf88b03eb69aae73818b6d93e872b Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 10:31:59 +0000 Subject: [PATCH 11/16] =?UTF-8?q?refactor:=20concise=20verify-reduction=20?= =?UTF-8?q?(761=E2=86=92124=20lines)=20+=20self-contained=20add-reduction?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit verify-reduction: removed verbose templates, condensed checklists into prose, kept all requirements but removed boilerplate code blocks that the agent can derive from context. add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper Steps 1-6, and #974 requirements (Checks 9/10/11) into a single self-contained skill. No need to read 3 other skills. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 263 ++++---- .claude/skills/verify-reduction/SKILL.md | 743 ++--------------------- 2 files changed, 163 insertions(+), 843 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index 2848e9df6..f47c0fb8e 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -5,13 +5,11 @@ description: Add a new reduction rule using verified artifacts from /verify-redu # Add Reduction (from Verified Artifacts) -Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. +Complete pipeline for adding a reduction rule when `/verify-reduction` has produced verified artifacts. Translates Python `reduce()` to Rust, writes tests from test vectors, adds example-db entry, writes paper entry, then cleans up verification artifacts. -**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. 
+**When to use:** After `/verify-reduction` PR exists. Use `/add-rule` when no verification artifacts exist. -## Step 0: Locate Verified Artifacts - -Check for existing verification artifacts: +## Step 0: Locate and Read Verified Artifacts ```bash ls docs/paper/verify-reductions/verify__.py @@ -21,217 +19,176 @@ ls docs/paper/verify-reductions/_.typ If any are missing, run `/verify-reduction` first. -### Read the artifacts - -1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. -2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. -3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. - -```bash -# Load test vectors -TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) -``` - -Extract from test vectors JSON: -- `overhead` → use directly in `#[reduction(overhead = { ... })]` -- `yes_instance.input` / `yes_instance.output` → first closed-loop test case -- `no_instance.input` / `no_instance.output` → infeasible test case -- `claims` → verify each is preserved in the Rust implementation +**Read these three artifacts:** +1. **Python `reduce()` function** — the verified spec. Translate the algorithm to Rust, not the syntax. +2. **Test vectors JSON** — YES/NO instances with exact values, overhead expressions, verified claims. +3. **Typst proof** — Construction section for doc comments, proof for paper entry. 
-## Reference Implementations +## Step 1: Implement the Reduction -Same as `/add-rule`: -- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs` -- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` -- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` -- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) +Create `src/rules/_.rs`. Follow the pattern in `src/rules/satisfiability_naesatisfiability.rs`. -## Step 1: Implement the reduction - -Create `src/rules/_.rs`. - -**Translation guide:** Map the Python `reduce()` function to Rust: +**Translation guide from Python to Rust:** | Python | Rust | |--------|------| | `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` | | `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec` | -| `literal_to_element(lit, n)` | Private helper method | -| Python list of ints | `Vec`, `Vec`, etc. (match problem type) | +| Helper functions (e.g., `literal_to_element`) | Private functions in the same file | +| Python list of ints | `Vec`, `Vec`, etc. (match the target problem's API) | -**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro: +**Required structure:** ```rust +/// Result struct — holds target + any state needed for extraction. 
+#[derive(Debug, Clone)] +pub struct ReductionXToY { target: TargetType, /* mapping state */ } + +impl ReductionResult for ReductionXToY { + type Source = SourceType; + type Target = TargetType; + fn target_problem(&self) -> &Self::Target { &self.target } + fn extract_solution(&self, target_solution: &[usize]) -> Vec { + // Translate Python extract_assignment() logic + } +} + #[reduction(overhead = { - // Copy expressions verbatim from test_vectors JSON "overhead" field + // Copy verbatim from test_vectors JSON "overhead" field field_name = "expression", })] -``` +impl ReduceTo for SourceType { + type Result = ReductionXToY; + fn reduce_to(&self) -> Self::Result { + // Translate Python reduce() logic + } +} -The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl. +#[cfg(test)] +#[path = "../unit_tests/rules/_.rs"] +mod tests; +``` ## Step 2: Register in mod.rs -Same as `/add-rule` Step 2. Add `mod _;` to `src/rules/mod.rs`. +Add to `src/rules/mod.rs`: +```rust +pub(crate) mod _; +``` -## Step 3: Write unit tests from test vectors +## Step 3: Write Unit Tests from Test Vectors Create `src/unit_tests/rules/_.rs`. 
-**Generate tests directly from test vectors JSON:** - -The YES instance becomes the primary closed-loop test: +**From test vectors JSON, generate at minimum:** -```rust -#[test] -fn test__to__closed_loop() { - // Construct source from test_vectors.yes_instance.input - let source = ::try_new(/* fields from JSON */).unwrap(); - - // Reduce - let reduction = ReduceTo::::reduce_to(&source); - - // Verify target matches test_vectors.yes_instance.output - let target = reduction.target_problem(); - assert_eq!(target.(), /* value from JSON output */); - - // Solve and extract - let solver = BruteForce; - for witness in solver.find_all_witnesses(target).unwrap() { - let extracted = reduction.extract_solution(&witness); - // Verify extracted solution is valid for source - let val = source.evaluate(&extracted); - assert!(val.0); // Or check objective value - } -} -``` +1. **Closed-loop test** (`test__to__closed_loop`) — construct source from `yes_instance.input`, reduce, verify target matches `yes_instance.output`, solve target with `BruteForce`, extract solutions, verify each is valid for source. -The NO instance becomes the infeasible test: +2. **Infeasible test** (`test__to__infeasible`) — construct source from `no_instance.input`, reduce, verify target is also infeasible (no witnesses). -```rust -#[test] -fn test__to__infeasible() { - // Construct source from test_vectors.no_instance.input - let source = ::try_new(/* fields from JSON */).unwrap(); - - // Reduce - let reduction = ReduceTo::::reduce_to(&source); - - // Verify target is also infeasible - let solver = BruteForce; - let witnesses = solver.find_all_witnesses(reduction.target_problem()); - assert!(witnesses.is_none() || witnesses.unwrap().is_empty()); -} -``` +3. **Structural test** — verify target dimensions match overhead formula, check well-formedness. -Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON. 
+Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` -## Step 4: Add canonical example to example_db (Check 9 from #974) +## Step 4: Add Canonical Example to example_db -**This step is MANDATORY — many past PRs skipped it.** Add a builder function in `src/example_db/rule_builders.rs` that constructs a small canonical instance for this reduction. +**MANDATORY (Check 9 from #974).** Add a builder in `src/example_db/rule_builders.rs` using the YES test vector instance. Register in `build_rule_examples()`. Follow existing patterns in the file. -Use the YES instance from the test vectors JSON as the canonical example. Follow existing patterns in `rule_builders.rs`: +## Step 4b: Add Example-DB Lookup Test -```rust -pub fn build__to__example() -> RuleExample { - // Construct the source instance from test vectors YES instance - let source = ::new(/* fields from test_vectors.yes_instance.input */); - let reduction = ReduceTo::::reduce_to(&source); - RuleExample { - source: Box::new(source), - target: Box::new(reduction.target_problem().clone()), - reduction_result: Box::new(reduction), - } -} -``` +**MANDATORY (Check 10 from #974).** Verify the new example is discoverable in `src/unit_tests/example_db.rs`. Add the rule to the existing exhaustive lookup test, or add a standalone test. -Register the builder in `build_rule_examples()` in the same file. +## Step 5: Write Paper Entry -## Step 4b: Add example-db lookup test (Check 10 from #974) +**MANDATORY (Check 11 from #974).** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the paper's macros. -**Also MANDATORY.** Add a test in `src/unit_tests/example_db.rs` that verifies the canonical example can be loaded: +### 5a. 
Load example data -```rust -#[test] -fn test_example__to_() { - let examples = build_rule_examples(); - let found = examples.iter().any(|e| /* matches source/target names */); - assert!(found, "Missing canonical example for "); -} +```typst +#let src_tgt = load-example("Source", "Target") +#let src_tgt_sol = src_tgt.solutions.at(0) ``` -Or add the rule to the existing exhaustive lookup test if one exists. - -## Step 5: Document in paper (Check 11 from #974) - -**Also MANDATORY.** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template: +### 5b. Write reduction-rule entry ```typst #reduction-rule("Source", "Target", example: true, - example-caption: [Description], + example-caption: [Description ($n = ...$, $|E| = ...$)], )[ - Rule statement from Typst proof... + // Theorem body: complexity + construction summary + overhead hint ][ - Proof from Typst proof (Construction + Correctness + Extraction)... + // Proof body: _Construction._ ... _Correctness._ ... _Solution extraction._ ... + // Adapt from the verified Typst proof — reformat, don't rewrite ] ``` -The proof text, worked examples, and overhead table are already written in the verification Typst file — adapt them to the paper's macros. Follow `/add-rule` Step 5 for the exact format. +### 5c. Write worked example (extra block) + +Step-by-step walkthrough with concrete numbers from JSON data. Must include: +- `pred-commands()` block at top (create/reduce/solve/evaluate) +- Source instance display (graph visualization if applicable) +- Construction walkthrough with intermediate values +- Solution verified end-to-end +- Witness multiplicity note + +**Gold-standard reference:** Search for `reduction-rule("KColoring", "QUBO"` in `reductions.typ`. -## Step 6: Regenerate exports and verify +### 5d. 
Register display name (if new problem in paper) -Same as `/add-rule` Step 6: +Add to `display-name` dictionary if the problem doesn't have an entry yet. + +### 5e. Build + +```bash +make paper # Must compile without errors or new completeness warnings +``` + +## Step 6: Regenerate Exports and Verify ```bash cargo run --example export_graph cargo run --example export_schemas -make regenerate-fixtures +make regenerate-fixtures # Slow — needs fixtures for paper example data make test clippy ``` -## Step 7: Clean up verification artifacts +## Step 7: Clean Up Verification Artifacts -**MANDATORY.** After the Rust implementation is complete and all tests pass, remove the verification artifacts from `docs/paper/verify-reductions/`. These artifacts served their purpose as a pre-verification gate and must NOT be committed into the library. +**MANDATORY.** Verification artifacts must NOT be committed into the library. Remove them after the Rust implementation is complete: ```bash -git rm docs/paper/verify-reductions/verify__.py -git rm docs/paper/verify-reductions/adversary__.py -git rm docs/paper/verify-reductions/cross_compare__.py -git rm docs/paper/verify-reductions/test_vectors__.json -git rm docs/paper/verify-reductions/_.typ -git rm docs/paper/verify-reductions/_.pdf +git rm -f docs/paper/verify-reductions/*** ``` -The Typst proof content lives on in the paper entry (`docs/paper/reductions.typ`). The Python scripts and test vectors were scaffolding — the Rust tests are the permanent verification. - -## Solver Rules +The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. -Same as `/add-rule`. If the target problem needs ILP, implement alongside. +## Step 8: Commit and Create PR -## CLI Impact - -Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. 
- -## File Naming - -Same as `/add-rule`: -- Rule file: `src/rules/_.rs` -- Test file: `src/unit_tests/rules/_.rs` -- Canonical example: builder function in `src/example_db/rule_builders.rs` +```bash +git add src/rules/_.rs \ + src/unit_tests/rules/_.rs \ + src/rules/mod.rs \ + src/example_db/rule_builders.rs \ + src/unit_tests/example_db.rs \ + docs/paper/reductions.typ +git commit -m "feat: add reduction (#) + +Implemented via /verify-reduction → /add-reduction pipeline. +Verification: N checks (constructor) + M checks (adversary), 0 failures." +git push -u origin "" +gh pr create --title "feat: add reduction (#)" --body "..." +``` ## Common Mistakes -All mistakes from `/add-rule` apply, plus: - | Mistake | Fix | |---------|-----| -| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | -| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | -| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | -| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | -| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | -| Missing canonical example in `rule_builders.rs` (Check 9 from #974) | MANDATORY — add builder function using the YES test vector | -| Missing example-db lookup test (Check 10 from #974) | MANDATORY — add test in `example_db.rs` | -| Missing paper `reduction-rule` entry (Check 11 from #974) | MANDATORY — integrate Typst proof into `reductions.typ` | -| Leaving verification artifacts in `docs/paper/verify-reductions/` | MANDATORY cleanup — `git rm` all Python scripts, JSON, Typst, PDF from verify-reductions | +| Re-deriving algorithm from issue instead of Python `reduce()` | Python function is the verified spec — translate it | +| Overhead expressions don't match test vectors JSON | Copy 
verbatim from `overhead` field | +| Skipping infeasible (NO) test case | NO instance is in test vectors — always include | +| Missing canonical example in `rule_builders.rs` | MANDATORY — Check 9 from #974 | +| Missing example-db lookup test | MANDATORY — Check 10 from #974 | +| Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | +| Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | +| Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 21310f016..89b11ee2a 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -5,103 +5,32 @@ description: End-to-end verification of a reduction rule — generates Typst pro # Verify Reduction -End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational verification from two independent implementations, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. +Pre-verification gate for reduction rules. Produces Typst proof + dual Python verification scripts + test vectors JSON, iterating until all checks pass. Creates worktree + PR. Downstream: `/add-reduction` consumes the artifacts to implement Rust code. -Outputs: Typst proof entry, constructor Python verification script, adversary Python verification script — all at PR #975 quality level. 
+``` +Issue → /verify-reduction (Typst + Python) → /add-reduction (Rust) → /review-pipeline +``` ## Invocation ``` -/verify-reduction 868 # from a GitHub issue number -/verify-reduction SubsetSum Partition # from source/target names +/verify-reduction 868 +/verify-reduction SubsetSum Partition ``` -## Prerequisites - -- `sympy`, `networkx`, and `hypothesis` installed (`pip install sympy networkx hypothesis`) -- Both source and target models must exist in the codebase (`pred show `) -- Optional: `elan` with Lean 4 toolchain (for formal lemmas — not required) - -## Process - -```dot -digraph verify { - rankdir=TB; - "Parse input" [shape=box]; - "Create worktree" [shape=box]; - "Read issue" [shape=box]; - "Study models" [shape=box]; - "Write Typst proof" [shape=box]; - "Compile PDF" [shape=box]; - "Write constructor script" [shape=box]; - "Run constructor" [shape=box]; - "All pass?" [shape=diamond]; - "Fix proof + script" [shape=box]; - "Enough checks?" [shape=diamond]; - "Enhance script" [shape=box]; - "Dispatch adversary" [shape=box]; - "Run adversary" [shape=box]; - "Adversary pass?" [shape=diamond]; - "Cross-compare" [shape=box]; - "Verdict table" [shape=diamond]; - "Investigate discrepancy" [shape=box]; - "Self-review" [shape=box]; - "Create PR" [shape=box]; - "Report" [shape=doublecircle]; - - "Parse input" -> "Create worktree"; - "Create worktree" -> "Read issue"; - "Read issue" -> "Study models"; - "Study models" -> "Write Typst proof"; - "Write Typst proof" -> "Compile PDF"; - "Compile PDF" -> "Write constructor script"; - "Write constructor script" -> "Run constructor"; - "Run constructor" -> "All pass?"; - "All pass?" -> "Enough checks?" [label="yes"]; - "All pass?" -> "Fix proof + script" [label="no"]; - "Fix proof + script" -> "Run constructor"; - "Enough checks?" -> "Dispatch adversary" [label="yes"]; - "Enough checks?" 
-> "Enhance script" [label="no"];
-  "Enhance script" -> "Run constructor";
-  "Dispatch adversary" -> "Run adversary";
-  "Run adversary" -> "Adversary pass?";
-  "Adversary pass?" -> "Cross-compare" [label="yes or no"];
-  "Cross-compare" -> "Verdict table";
-  "Verdict table" -> "Self-review" [label="verified"];
-  "Verdict table" -> "Investigate discrepancy" [label="discrepancy"];
-  "Investigate discrepancy" -> "Write constructor script";
-  "Self-review" -> "Create PR";
-  "Create PR" -> "Report";
-}
-```
-
----
-
 ## Step 0: Parse Input and Create Worktree
 
-### 0a. Parse input
-
 ```bash
 REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner)
 ISSUE=<issue-number>
 ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number)
-```
-
-### 0b. Create worktree
-
-```bash
 REPO_ROOT=$(pwd)
-BRANCH_JSON=$(python3 scripts/pipeline_worktree.py prepare-issue-branch \
-  --issue "$ISSUE" --slug "verify-<source>-<target>" --base main --format json)
-BRANCH=$(printf '%s\n' "$BRANCH_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['branch'])")
 WORKTREE_JSON=$(python3 scripts/pipeline_worktree.py enter --name "verify-$ISSUE" --format json)
 WORKTREE_DIR=$(printf '%s\n' "$WORKTREE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['worktree_dir'])")
-cd "$WORKTREE_DIR" && git checkout "$BRANCH"
+cd "$WORKTREE_DIR" && git checkout -b "issue/$ISSUE-verify-<source>-<target>"
 ```
-
-If already inside a worktree, skip creation and use the current branch.
-
-## Step 1: Read Issue and Study Models
+## Step 1: Read Issue, Study Models, Type Check
 
 ```bash
 gh issue view "$ISSUE" --json title,body
 pred show <model> --json
 ```
 
 ### Type compatibility gate — MANDATORY
 
-Before proceeding, check that the source and target Value types are compatible for a witness-preserving reduction (`ReduceTo` / `ReductionResult`):
+Check source/target `Value` types before any work:
 
 ```bash
-# Check Value types in the model source files
-grep "type Value = " src/models/*/<source>.rs
-grep "type Value = " src/models/*/<target>.rs
+grep "type Value = " src/models/*/<source>.rs src/models/*/<target>.rs
 ```
 
-**Compatible pairs:**
-- `Or` → `Or` (feasibility → feasibility) — always OK
-- `Min` → `Min` or `Max` → `Max` (same optimization sense) — OK if witness extraction preserves optimality
-- `Or` → `Max`/`Min` — OK for decision-to-optimization embeddings
-
-**Incompatible pairs that require a K parameter:**
-- `Min` or `Max` → `Or` (optimization → decision) — the G&J-style reductions need a threshold K that doesn't exist on the source type. These cannot be implemented as `ReduceTo`.
-
-If the types are incompatible, **STOP immediately** and comment on the issue:
+**Compatible:** `Or`→`Or`, `Min`→`Min`, `Max`→`Max`, `Or`→`Min`/`Max`.
+**Incompatible:** `Min`/`Max`→`Or` (needs threshold K that the source type lacks).
 
-```bash
-gh issue comment "$ISSUE" --body "verify-reduction: **BLOCKED** — type mismatch.
-
-Source \`<Source>\` has \`Value = <Min|Max>\` (optimization) but target \`<Target>\` has \`Value = <Or>\` (decision). The G&J reduction requires a threshold parameter K that doesn't exist on the source type. This reduction cannot be implemented as \`ReduceTo\` in the current type system.
-
-Options:
-1. Add a decision-variant model with a K parameter
-2. Use \`ReduceToAggregate\` for value-only reduction (no witness extraction)
-3.
Choose a different source/target pair with compatible types" -``` +If incompatible, STOP and comment on the issue explaining the type mismatch and options (add decision variant, use `ReduceToAggregate`, or choose different pair). Do NOT proceed. -Do NOT create a worktree, write proofs, or run verification for incompatible reductions. +### If compatible -### Continue if compatible - -Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. - -If the issue is incomplete, use WebSearch to find the original reference. +Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. Use WebSearch if the issue is incomplete. ## Step 2: Write Typst Proof -Append to `docs/paper/proposed-reductions.typ` (or create a standalone `proposed-reductions-.typ` if the main file is on another branch). - -### MANDATORY structure - -```typst -== Source $arrow.r$ Target - -#theorem[...] - -#proof[ - _Construction._ ... - _Correctness._ - ($arrow.r.double$) ... - ($arrow.l.double$) ... - _Solution extraction._ ... -] - -*Overhead.* (table) - -*Feasible example.* (YES instance, fully worked) - -*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) -``` - -### HARD requirements - -- **Construction**: numbered steps, every symbol defined before first use -- **Correctness**: genuinely independent ⟹ and ⟸ — NOT "the converse is similar" -- **No hand-waving**: ZERO instances of "clearly", "obviously", "it is easy to see", "straightforward" -- **No scratch work**: ZERO instances of "Wait", "Hmm", "Actually", "Let me try" -- **TWO examples minimum**: one YES instance (satisfiable/feasible) and one NO instance (unsatisfiable/infeasible). Both fully worked with numerical verification. -- **Example must be non-trivial**: the example must have ≥ 3 variables/vertices. A 1-variable or 2-vertex example is too degenerate to catch bugs. 
- -Compile: -```bash -python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')" -``` - -## Step 3: Write Constructor Python Verification Script - -Create `docs/paper/verify-reductions/verify__.py`. +Create standalone `docs/paper/verify-reductions/_.typ`. -### ALL 7 sections are MANDATORY +**Mandatory structure:** Construction (numbered steps, symbols defined before use) → Correctness (independent ⟹ and ⟸) → Solution extraction → Overhead table → YES example (≥3 variables, fully worked) → NO example (fully worked, explain WHY infeasible). -```python -#!/usr/bin/env python3 -"""§X.Y Source → Target (#NNN): exhaustive + structural verification.""" -import itertools, sys -from sympy import symbols, simplify # Section 1 is NOT optional +**Hard rules:** Zero "clearly"/"obviously"/"straightforward". Zero scratch work. Two examples minimum with ≥3 variables. -passed = failed = 0 +Compile: `python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')"` -def check(condition, msg=""): - global passed, failed - if condition: passed += 1 - else: failed += 1; print(f" FAIL: {msg}") +## Step 3: Write Constructor Python Script -def main(): - # === Section 1: Symbolic checks (sympy) — MANDATORY === - # At minimum: verify overhead formula symbolically for general n. - # For algebraic reductions: verify key identities. - # "The overhead is trivial" is NOT an excuse to skip this section. +Create `docs/paper/verify-reductions/verify__.py` with ALL 7 mandatory sections: - # === Section 2: Exhaustive forward + backward — MANDATORY === - # n ≤ 5 MINIMUM for all reduction types. - # n ≤ 6 for identity/algebraic reductions. - # Test ALL instances (or sample ≥ 300 per (n,m) if exhaustive is infeasible). +1. **Symbolic checks** (sympy) — overhead formulas, key algebraic identities +2. **Exhaustive forward + backward** — n ≤ 5 minimum, all instances or ≥300 sampled +3. **Solution extraction** — extract source solution from every feasible target witness +4. 
**Overhead formula** — compare actual target size against formula +5. **Structural properties** — well-formedness, no degenerate cases, gadget structure +6. **YES example** — reproduce exact Typst numbers +7. **NO example** — reproduce exact Typst numbers, verify both sides infeasible - # === Section 3: Solution extraction — MANDATORY === - # For EVERY feasible instance: extract source solution from target, - # verify it satisfies the source problem. - # This is the most commonly skipped section. DO NOT SKIP IT. - - # === Section 4: Overhead formula — MANDATORY === - # Build the actual target instance, measure its size fields, - # compare against the overhead formula. - - # === Section 5: Structural properties — MANDATORY === - # Even for "trivial" reductions, verify at least: - # - Target instance is well-formed (valid graph, valid formula, etc.) - # - No degenerate cases (empty subsets, isolated vertices, etc.) - # For gadget reductions: girth, connectivity, widget structure. - - # === Section 6: YES example from Typst — MANDATORY === - # Reproduce the exact numbers from the Typst proof's feasible example. - - # === Section 7: NO example from Typst — MANDATORY === - # Reproduce the exact numbers from the Typst proof's infeasible example. - # Verify that both source and target are infeasible. 
- - print(f"Source → Target: {passed} passed, {failed} failed") - return 1 if failed else 0 - -if __name__ == "__main__": - sys.exit(main()) -``` - -### Minimum check counts — STRICTLY ENFORCED - -| Type | Minimum checks | Minimum n | Strategy | -|------|---------------|-----------|----------| -| Identity (same graph, different objective) | 10,000 | n ≤ 6 | Exhaustive ALL graphs | -| Algebraic (padding, complement, De Morgan) | 10,000 | n ≤ 5 | Symbolic + exhaustive | -| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | Construction + formula + structural | -| Composition (A→B→C) | 10,000 | n ≤ 5 | Exhaustive per step | - -Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing regardless of perceived simplicity. +**Minimum:** 5,000 checks regardless of reduction type. 10,000 for identity/algebraic. ## Step 4: Run and Iterate -```bash -python3 docs/paper/verify-reductions/verify__.py -``` - -### Iteration 1: First run - -Run the script. Fix any failures. Re-run. - -### Iteration 2: Check count audit — STRICT - -Print this table and fill it in honestly: - -``` -CHECK COUNT AUDIT: - Total checks: ___ (minimum: 5,000) - Forward direction: ___ instances tested (minimum: all n ≤ 5) - Backward direction: ___ instances tested (minimum: all n ≤ 5) - Solution extraction: ___ feasible instances tested - Overhead formula: ___ instances compared - Symbolic (sympy): ___ identities verified - YES example: verified? [yes/no] - NO example: verified? [yes/no] - Structural properties: ___ checks -``` - -If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. - -### Iteration 3: Gap analysis — MANDATORY - -List EVERY claim in the Typst proof. For each, state whether it's tested: +Run the script. Fix failures. Re-run until 0 failures. 
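Iterating to zero failures is easier when every check feeds the per-category totals directly. A minimal sketch of such a counter harness — the names are illustrative, not a required API:

```python
from collections import Counter

passed = failed = 0
by_category = Counter()  # feeds the per-category audit totals

def check(condition, category, msg=""):
    """Count one check under its audit category; report failures inline."""
    global passed, failed
    by_category[category] += 1
    if condition:
        passed += 1
    else:
        failed += 1
        print(f"  FAIL [{category}]: {msg}")

# Illustrative checks under two audit categories
check(2 + 2 == 4, "symbolic", "overhead formula identity")
check(sorted([3, 1, 2]) == [1, 2, 3], "forward", "feasible instance n=3")

print(f"Total: {passed} passed, {failed} failed")
for category, n in sorted(by_category.items()):
    print(f"  {category}: {n}")
```

With one `check()` call site per verified fact, the audit table can be printed straight from `by_category` instead of being filled in by hand.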
-``` -CLAIM TESTED BY -"Universe has 2n elements" Section 4: overhead formula ✓ -"Complementarity forces consistency" Section 3: extraction ✓ -"Clause subset is non-monochromatic" Section 2: forward direction ✓ -"No clause is all-true or all-false" Section 2: backward direction ✓ -... -``` - -If any claim has no test, add one. If it's untestable, document WHY. - -### Iteration 4: Export test vectors and validate Typst matching — MANDATORY - -After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption by `/add-reduction`: - -**File:** `docs/paper/verify-reductions/test_vectors__.json` - -```json -{ - "source": "", - "target": "", - "issue": "", - "yes_instance": { - "input": { "...problem-specific fields..." }, - "output": { "...reduced instance fields..." }, - "source_feasible": true, - "target_feasible": true, - "source_solution": ["...config..."], - "extracted_solution": ["...config..."] - }, - "no_instance": { - "input": { "...problem-specific fields..." }, - "output": { "...reduced instance fields..." }, - "source_feasible": false, - "target_feasible": false - }, - "overhead": { - "field_name": "expression using source getters" - }, - "claims": [ - {"tag": "claim_tag", "formula": "formula or description", "verified": true} - ] -} -``` +**Check count audit:** Print totals for each category. If any is below minimum, enhance and re-run. -Add this export at the end of `main()` in the constructor script: - -```python -import json - -test_vectors = { - "source": "", - "target": "", - "issue": , - "yes_instance": { ... }, # from Section 6 - "no_instance": { ... }, # from Section 7 - "overhead": { ... 
}, # from Section 1/4 - "claims": claims_list, # accumulated claim() calls -} - -with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: - json.dump(test_vectors, f, indent=2) -print(f"Test vectors exported to test_vectors__.json") -``` - -**Typst ↔ JSON cross-check:** - -After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text — not a full parser: - -```python -typst_text = open("").read() -for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: - assert str(val) in typst_text, f"Typst missing YES value: {val}" -for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: - assert str(val) in typst_text, f"Typst missing NO value: {val}" -``` +**Gap analysis:** List every Typst claim and its corresponding test. Add tests for untested claims. -If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. - -**Structured claims (best-effort replacement for manual gap analysis):** - -Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: - -```python -claims_list = [] - -def claim(tag, formula_or_desc, verified=True): - claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) -``` - -Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). +**Export test vectors** (`test_vectors__.json`) with YES/NO instances, overhead expressions, and structured `claim()` tags. Cross-check that key values from the JSON appear in the Typst file text. 
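A condensed sketch of the claim accumulator, the export, and the substring cross-check — problem names and numbers here are invented for illustration; adapt the keys to the actual source/target types:

```python
import json

claims = []

def claim(tag, formula, verified=True):
    """Record one Typst proof claim as it is verified."""
    claims.append({"tag": tag, "formula": formula, "verified": verified})

claim("universe_size", "|U| = 2n")
claim("overhead_edges", "m' = 3m + 2")

test_vectors = {
    "source": "SourceProblem",  # illustrative names
    "target": "TargetProblem",
    "yes_instance": {"input": {"n": 4}, "output": {"m": 14}},
    "no_instance": {"input": {"n": 3}, "output": {"m": 11}},
    "claims": claims,
}
blob = json.dumps(test_vectors, indent=2)

# Substring cross-check: every key numeric value must appear in the Typst text
typst_text = "Feasible example: n = 4 gives m = 14. Infeasible example: n = 3 gives m = 11."
for inst in (test_vectors["yes_instance"], test_vectors["no_instance"]):
    for val in list(inst["input"].values()) + list(inst["output"].values()):
        assert str(val) in typst_text, f"Typst missing value: {val}"

print(f"{len(claims)} claims recorded; cross-check passed")
```

In the real script `blob` is written to the `test_vectors_*.json` file and `typst_text` is read from the proof file; the substring match is deliberately crude — it flags drift, it does not parse Typst.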
## Step 5: Adversary Verification -The adversary step provides independent verification by a second agent that implements the reduction from scratch, using only the Typst proof as specification. This catches bugs that the constructor's own tests cannot find (confirmation bias, shared implementation errors). - -### 5a. Dispatch adversary agent - -Launch a subagent with the following prompt template. The adversary must not see the constructor's Python script — it reads only the Typst proof. - -**Adversary prompt:** - -```` -You are an adversary verifier for a mathematical reduction proof. - -Your goal: independently implement and test the reduction described in the Typst -proof below, trying to find bugs. You have no access to the constructor's -implementation. Write your own from scratch based solely on the mathematical -specification. - -## Input - -Typst proof file: docs/paper/proposed-reductions.typ (section on ) -Issue: # - -## Reduction type - -Detect the reduction type from the Typst proof and tailor your testing focus: - -- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. - -- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. - -- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. 
Check structural properties (connectivity, edge counts, degree sequences) across all small instances. - -## Your task - -1. Read the Typst proof carefully. Extract: - - The construction algorithm (how to build the target instance) - - The correctness argument (forward and backward directions) - - The solution extraction procedure - - The overhead formula - - The YES and NO examples - -2. Create `docs/paper/verify-reductions/adversary__.py` with: - - Your own `reduce()` function implementing the construction from scratch - - Your own `extract_solution()` function implementing solution extraction - - Your own `is_feasible_source()` and `is_feasible_target()` validators - - Exhaustive testing for n ≤ 5 (forward + backward + extraction) - - Overhead formula verification - - Reproduction of both Typst examples (YES and NO) - - Property-based testing using `hypothesis` (at least 2 strategies) - - Minimum 5,000 total checks - -3. Use `hypothesis` for property-based testing. Example strategies: - ```python - from hypothesis import given, settings - from hypothesis import strategies as st - - @given(st.lists(st.integers(0, 1), min_size=3, max_size=8)) - @settings(max_examples=500) - def test_roundtrip(assignment): - # Build source from assignment, reduce, check equivalence - ... - ``` - -4. Run the script. Report pass/fail counts and any bugs found. - -5. Do NOT read or import from `verify__.py`. Your implementation - must be independent. - -## Output format - -Print at the end: -``` -ADVERSARY: : N passed, M failed -BUGS FOUND: -``` -```` - -### 5b. Run adversary script - -```bash -python3 docs/paper/verify-reductions/adversary__.py -``` - -### 5c. Cross-comparison - -After both scripts have run, compare their `reduce()` outputs on a shared set of instances. 
Create and run this comparison inline: - -```python -#!/usr/bin/env python3 -"""Cross-compare constructor and adversary implementations.""" -import sys -sys.path.insert(0, "docs/paper/verify-reductions") - -from verify__ import reduce as constructor_reduce -from adversary__ import reduce as adversary_reduce - -# Also import feasibility checkers from both sides -from verify__ import is_feasible_source, is_feasible_target -from adversary__ import ( - is_feasible_source as adv_is_feasible_source, - is_feasible_target as adv_is_feasible_target, -) - -def normalize(target_instance): - """Convert target instance to a canonical hashable form for comparison. - Adapt this to the specific target problem type.""" - # For graph problems: sorted edge list + vertex count - # For formula problems: sorted clause list - # For set problems: frozenset of frozensets - return tuple(sorted(str(x) for x in target_instance)) - -agree = disagree = 0 -feasibility_mismatch = 0 - -# Generate shared test instances (adapt to source problem type) -import itertools -for n in range(3, 6): - for instance in generate_all_instances(n): # problem-specific generator - c_target = constructor_reduce(instance) - a_target = adversary_reduce(instance) - - # Compare structural equivalence - if normalize(c_target) == normalize(a_target): - agree += 1 - else: - disagree += 1 - print(f" DISAGREE on instance {instance}") - print(f" Constructor: {c_target}") - print(f" Adversary: {a_target}") - - # Compare feasibility verdicts - c_feas = is_feasible_target(c_target) - a_feas = adv_is_feasible_target(a_target) - if c_feas != a_feas: - feasibility_mismatch += 1 - print(f" FEASIBILITY MISMATCH on {instance}: " - f"constructor={c_feas}, adversary={a_feas}") - -print(f"\nCross-comparison: {agree} agree, {disagree} disagree, " - f"{feasibility_mismatch} feasibility mismatches") -if disagree > 0 or feasibility_mismatch > 0: - print("ACTION REQUIRED: investigate discrepancies before proceeding") - sys.exit(1) -``` - 
-Adapt the `normalize()` function and instance generator to the specific source/target problem types. - -### 5d. Verdict criteria - -Use this table to determine the outcome. All three signals must be considered together: - -| Constructor | Adversary | Cross-comparison | Verdict | Action | -|-------------|-----------|-----------------|---------|--------| -| Pass | Pass | Agree | Verified | Proceed to Step 6 | -| Pass | Pass | Disagree | Suspect | Both produce valid but structurally different targets. Investigate whether both are correct reductions (e.g., isomorphic gadgets) or one has a latent bug. Resolve before proceeding. | -| Pass | Fail | Agree | Adversary bug | Review adversary script for implementation errors. If adversary misread the Typst spec, fix spec clarity and re-run adversary. | -| Pass | Fail | Disagree | Suspect | Constructor may have a bug masked by its own tests. Investigate the disagreeing instances manually. | -| Fail | Pass | Agree | Constructor bug | Fix constructor script, re-run from Step 4. | -| Fail | Pass | Disagree | Suspect | Investigate both — the agreeing instances may be coincidental. | -| Fail | Fail | Agree | Proof bug | The reduction itself is likely wrong. Re-examine the Typst proof. Return to Step 2. | -| Fail | Fail | Disagree | Proof bug | Same as above — shared failures with structural disagreement indicate a fundamental problem. | - -When the verdict is "Suspect," the resolution is to manually inspect the disagreeing instances until the root cause is identified, then loop back to the appropriate step. - -## Step 6: Self-Review - -Before declaring verified, run through this checklist. Every item must be YES. If any is NO, go back and fix it. 
- -### Typst proof - -- [ ] Compiles without errors -- [ ] Has Construction with numbered steps -- [ ] Has Correctness with independent ⟹ and ⟸ paragraphs -- [ ] Has Solution extraction -- [ ] Has Overhead table with formula -- [ ] Has YES example (feasible, ≥ 3 variables/vertices, fully worked) -- [ ] Has NO example (infeasible, fully worked with explanation of WHY infeasible) -- [ ] Zero instances of "clearly", "obviously", "it is easy to see" -- [ ] Zero instances of "Wait", "Hmm", "Actually", scratch work -- [ ] Every symbol defined before first use - -### Constructor Python script - -- [ ] 0 failures -- [ ] ≥ 5,000 total checks -- [ ] Section 1 (symbolic) present and non-empty -- [ ] Section 2 (exhaustive) covers n ≤ 5 minimum -- [ ] Section 3 (extraction) tests EVERY feasible instance -- [ ] Section 4 (overhead) compares formula vs actual for all tested instances -- [ ] Section 5 (structural) has at least one non-trivial check -- [ ] Section 6 (YES example) reproduces Typst example numbers exactly -- [ ] Section 7 (NO example) reproduces Typst infeasible example exactly -- [ ] Gap analysis performed — every Typst claim has a corresponding test - -### Adversary Python script - -- [ ] 0 failures -- [ ] ≥ 5,000 total checks -- [ ] Implemented independently (no imports from constructor script) -- [ ] Uses `hypothesis` property-based testing (at least 2 strategies) -- [ ] Exhaustive testing for n ≤ 5 -- [ ] Reproduces both Typst examples - -### Cross-consistency - -- [ ] Cross-comparison script ran with 0 disagreements -- [ ] Cross-comparison script ran with 0 feasibility mismatches -- [ ] The constructor script's `reduce()` implements the Typst construction -- [ ] The adversary script's `reduce()` implements the Typst construction independently -- [ ] The overhead formula in both scripts matches the Typst overhead table -- [ ] The examples in both scripts match the Typst examples (same numbers, same instances) - -### Test vectors and auto-matching +Dispatch a 
subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. Requirements: own `reduce()`, own `extract_solution()`, `hypothesis` PBT (≥2 strategies), ≥5,000 checks, reproduce both Typst examples.
-- [ ] `test_vectors_<source>_<target>.json` exported successfully
-- [ ] YES instance in JSON matches Typst feasible example (values present)
-- [ ] NO instance in JSON matches Typst infeasible example (values present)
-- [ ] All claims have `verified: true`
-- [ ] At least 5 claims for non-trivial reductions
+**Typed adversary prompt:** Include reduction-type focus instructions:
+- **Identity:** exhaustive enumeration n ≤ 6, edge-case configs
+- **Algebraic:** case boundary conditions (e.g., Σ = 2T ± 1), per-case extraction
+- **Gadget:** widget structure invariants, traversal patterns, interior isolation
-### Lean (optional)
+After adversary runs, **cross-compare** constructor and adversary `reduce()` outputs on shared instances. Use the verdict table:
-If Lean lemmas were added:
+| Constructor | Adversary | Cross-compare | Verdict |
+|-------------|-----------|---------------|---------|
+| Pass | Pass | Agree | **Verified** → proceed |
+| Pass | Pass | Disagree | **Suspect** → investigate structural differences |
-- [ ] Builds without errors (warnings OK)
-- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology)
-- [ ] Every `sorry` has a comment explaining WHY
+| Exactly one fails | — | — | **Implementation bug** → fix the failing script, re-run from its step |
+| Both fail | — | — | **Proof bug** → return to Step 2 |
-## Step 7: Report
+## Step 6: Self-Review Checklist
+
+Every item must be YES: Typst compiles with all sections, zero hand-waving, constructor ≥5K checks with 0 failures across all 7 sections, adversary ≥5K checks with 0 failures + hypothesis, cross-comparison 0 disagreements, test vectors JSON exported with Typst auto-matching verified.
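The adversary's `hypothesis` requirement is easiest to see on a concrete pair. A sketch using SubsetSum → Partition (#973) as a stand-in — the padding elements 2Σ−T and Σ+T are the textbook construction, assumed here for illustration, not necessarily the one the verified proof will use:

```python
from itertools import combinations

from hypothesis import given, settings
from hypothesis import strategies as st

def reduce_subset_sum_to_partition(s, t):
    """Pad with 2Σ−T and Σ+T; a partition exists iff some subset of s sums to t."""
    total = sum(s)
    return s + [2 * total - t, total + t]

def has_subset_sum(s, t):
    # Brute-force oracle: fine for the tiny instances sampled here
    return any(sum(c) == t for r in range(len(s) + 1) for c in combinations(s, r))

def has_partition(s):
    total = sum(s)
    return total % 2 == 0 and has_subset_sum(s, total // 2)

@given(st.lists(st.integers(1, 8), min_size=1, max_size=6), st.integers(0, 48))
@settings(max_examples=200, deadline=None)
def test_equivalence(s, t):
    if t > sum(s):  # construction assumes 0 ≤ T ≤ Σ
        return
    assert has_subset_sum(s, t) == has_partition(reduce_subset_sum_to_partition(s, t))

test_equivalence()
print("equivalence property held on all sampled instances")
```

In the real adversary script the strategy bounds and `max_examples` scale up until the ≥5,000-check floor is met, and `reduce()` is derived from the Typst proof alone.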
-``` -=== Verification Report: Source → Target (#NNN) === - -Typst proof: §X.Y - - Construction: ✓ (N steps) - - Correctness: ✓ (⟹ + ⟸) - - Extraction: ✓ - - Overhead: ✓ - - YES example: ✓ (N vars/vertices) - - NO example: ✓ (N vars/vertices, reason: ...) - -Constructor: verify__.py - - Checks: N passed, 0 failed - - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO) - - Forward: exhaustive n ≤ K - - Backward: exhaustive n ≤ K - - Gap analysis: all claims covered - -Adversary: adversary__.py - - Checks: N passed, 0 failed - - Property-based: M hypothesis examples - - Forward: exhaustive n ≤ K - - Backward: exhaustive n ≤ K - - Bugs found: - -Cross-comparison: - - Instances compared: N - - Structural agreement: N/N - - Feasibility agreement: N/N - -Lean: .lean (optional) - - Non-trivial lemmas: N - - Trivial lemmas: M - - Sorry: J (with justification) - -Bugs found: -Iterations: N rounds - -Verdict: VERIFIED / OPEN (with reason) -``` - -## Step 8: Commit, Create PR, Clean Up - -### 8a. Commit +## Step 7: Commit, Create PR, Clean Up ```bash -git add docs/paper/.typ \ - docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/adversary_*.py \ - docs/paper/verify-reductions/test_vectors_*.json -git add -f docs/paper/.pdf -git commit -m "docs: /verify-reduction # VERIFIED - -Typst: Construction + Correctness + Extraction + Overhead + YES/NO examples -Constructor: N checks, 0 failures (exhaustive n ≤ K, 7 sections) -Adversary: M checks, 0 failures (independent impl + hypothesis) -Cross-comparison: all instances agree - -Co-Authored-By: Claude Opus 4.6 (1M context) " -``` - -### 8b. Push and create PR - -```bash -git push -u origin "$BRANCH" +git add docs/paper/verify-reductions/*** +git add -f docs/paper/verify-reductions/_.pdf +git commit -m "docs: /verify-reduction # VERIFIED" +git push -u origin "" gh pr create --title "docs: verify reduction #" --body "..." -``` - -### 8c. 
Clean up worktree - -```bash -cd "$REPO_ROOT" -python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" -``` - -### 8d. Comment on issue - -```bash gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." +cd "$REPO_ROOT" && python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" ``` - -## Quality Gates - -A reduction is **VERIFIED** when ALL of these hold: - -- [ ] Typst compiles, has all mandatory sections including YES and NO examples -- [ ] Zero hand-waving language -- [ ] Constructor Python has 0 failures and ≥ 5,000 checks -- [ ] All 7 constructor Python sections present and non-empty -- [ ] Adversary Python has 0 failures and ≥ 5,000 checks -- [ ] Adversary Python uses `hypothesis` property-based testing -- [ ] Adversary implementation is independent (no shared code with constructor) -- [ ] Exhaustive n ≤ 5 minimum (both scripts) -- [ ] Solution extraction verified for all feasible instances -- [ ] Overhead formula matches actual construction -- [ ] Both Typst examples reproduced by both scripts -- [ ] Gap analysis shows all Typst claims tested -- [ ] Cross-comparison shows 0 disagreements and 0 feasibility mismatches -- [ ] Cross-consistency between Typst, constructor, and adversary verified -- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims -- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) - -If any gate fails, go back and fix it before declaring the reduction verified. 
- -## Common Mistakes - -| Mistake | Consequence | -|---------|-------------| -| Adversary imports from constructor script | Rejected — must be independent | -| No property-based testing in adversary | Rejected — add hypothesis strategies | -| No symbolic checks (Section 1 empty) | Rejected — add sympy verification | -| Only YES example, no NO example | Rejected — add infeasible instance | -| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | -| "Passed on first run" without gap analysis | Rejected — perform gap analysis | -| Example has < 3 variables | Rejected — too degenerate | -| Either script has < 5,000 checks | Rejected — enhance exhaustive testing | -| Extraction not tested | Rejected — add Section 3 | -| Typst proof says "clearly" | Rejected — rewrite without hand-waving | -| Cross-comparison skipped | Rejected — run comparison script | -| Disagreements dismissed without investigation | Rejected — resolve every discrepancy | - -## Integration - -### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline - -`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
- -- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON -- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors -- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available - -### Standalone usage - -- **After `write-rule-in-paper`**: invoke to verify paper entry -- **During `review-structural`**: check verification script exists and passes -- **Before `issue-to-pr --execute`**: pre-validate the algorithm - -## Reference: PR #975 Quality Level - -Target quality defined by PR #975: -- 800,000+ total checks, 0 unexpected failures -- 3 bugs caught through iteration loop -- Every script has forward + backward + extraction + overhead + example -- Two independent implementations agreeing on all test instances -- Failures marked OPEN honestly with diagnosis From 26d1ca2ea5b958eb74f55201b805821593f73f89 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 10:55:13 +0000 Subject: [PATCH 12/16] =?UTF-8?q?fix:=20restore=20structural=20requirement?= =?UTF-8?q?s=20in=20verify-reduction=20(124=E2=86=92274=20lines)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous rewrite over-condensed the skill, removing gates that agents need to follow: 7-section descriptions with table, minimum check count table, check count audit template, gap analysis format, common mistakes table, and self-review checklist with checkboxes. Restored: all structural gates and requirements. Kept concise: no verbose Python/Typst code templates (agent derives these). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 204 ++++++++++++++++++++--- 1 file changed, 177 insertions(+), 27 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 89b11ee2a..9bb30a03f 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -59,9 +59,28 @@ Extract: construction algorithm, correctness argument, overhead formulas, worked Create standalone `docs/paper/verify-reductions/_.typ`. -**Mandatory structure:** Construction (numbered steps, symbols defined before use) → Correctness (independent ⟹ and ⟸) → Solution extraction → Overhead table → YES example (≥3 variables, fully worked) → NO example (fully worked, explain WHY infeasible). +**Mandatory structure:** + +```typst +== Source $arrow.r$ Target +#theorem[...] +#proof[ + _Construction._ (numbered steps, every symbol defined before first use) + _Correctness._ + ($arrow.r.double$) ... (genuinely independent, NOT "the converse is similar") + ($arrow.l.double$) ... + _Solution extraction._ ... +] +*Overhead.* (table with target metric → formula) +*Feasible example.* (YES instance, ≥3 variables, fully worked with numbers) +*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) +``` -**Hard rules:** Zero "clearly"/"obviously"/"straightforward". Zero scratch work. Two examples minimum with ≥3 variables. +**Hard rules:** +- Zero instances of "clearly", "obviously", "it is easy to see", "straightforward" +- Zero scratch work ("Wait", "Hmm", "Actually", "Let me try") +- Two examples minimum, both with ≥3 variables/vertices +- Every symbol defined before first use Compile: `python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')"` @@ -69,47 +88,166 @@ Compile: `python3 -c "import typst; typst.compile('.typ', output='.p Create `docs/paper/verify-reductions/verify__.py` with ALL 7 mandatory sections: -1. 
**Symbolic checks** (sympy) — overhead formulas, key algebraic identities -2. **Exhaustive forward + backward** — n ≤ 5 minimum, all instances or ≥300 sampled -3. **Solution extraction** — extract source solution from every feasible target witness -4. **Overhead formula** — compare actual target size against formula -5. **Structural properties** — well-formedness, no degenerate cases, gadget structure -6. **YES example** — reproduce exact Typst numbers -7. **NO example** — reproduce exact Typst numbers, verify both sides infeasible +| Section | What to verify | Notes | +|---------|---------------|-------| +| 1. Symbolic (sympy) | Overhead formulas symbolically for general n | "The overhead is trivial" is NOT an excuse to skip | +| 2. Exhaustive forward+backward | Source feasible ⟺ target feasible | n ≤ 5 minimum. ALL instances or ≥300 sampled per (n,m) | +| 3. Solution extraction | Extract source solution from every feasible target witness | Most commonly skipped section. DO NOT SKIP | +| 4. Overhead formula | Build target, measure actual size, compare against formula | Catches off-by-one in construction | +| 5. Structural properties | Target well-formed, no degenerate cases | Gadget reductions: girth, connectivity, widget structure | +| 6. YES example | Reproduce exact Typst feasible example numbers | Every value must match | +| 7. NO example | Reproduce exact Typst infeasible example, verify both sides infeasible | Must verify WHY infeasible | + +### Minimum check counts — STRICTLY ENFORCED -**Minimum:** 5,000 checks regardless of reduction type. 10,000 for identity/algebraic. +| Type | Minimum checks | Minimum n | +|------|---------------|-----------| +| Identity (same graph, different objective) | 10,000 | n ≤ 6 | +| Algebraic (padding, complement, case split) | 10,000 | n ≤ 5 | +| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | + +Every reduction gets at least 5,000 checks regardless of perceived simplicity. ## Step 4: Run and Iterate -Run the script. 
Fix failures. Re-run until 0 failures. +```bash +python3 docs/paper/verify-reductions/verify__.py +``` + +### Iteration 1: Fix failures -**Check count audit:** Print totals for each category. If any is below minimum, enhance and re-run. +Run the script. Fix any failures. Re-run until 0 failures. -**Gap analysis:** List every Typst claim and its corresponding test. Add tests for untested claims. +### Iteration 2: Check count audit -**Export test vectors** (`test_vectors__.json`) with YES/NO instances, overhead expressions, and structured `claim()` tags. Cross-check that key values from the JSON appear in the Typst file text. +Print and fill this table honestly: + +``` +CHECK COUNT AUDIT: + Total checks: ___ (minimum: 5,000) + Forward direction: ___ instances (minimum: all n ≤ 5) + Backward direction: ___ instances (minimum: all n ≤ 5) + Solution extraction: ___ feasible instances tested + Overhead formula: ___ instances compared + Symbolic (sympy): ___ identities verified + YES example: verified? [yes/no] + NO example: verified? [yes/no] + Structural properties: ___ checks +``` + +If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. + +### Iteration 3: Gap analysis + +List EVERY claim in the Typst proof and whether it's tested: + +``` +CLAIM TESTED BY +"Universe has 2n elements" Section 4: overhead ✓ +"Complementarity forces consistency" Section 3: extraction ✓ +"Forward: NAE-sat → valid splitting" Section 2: exhaustive ✓ +... +``` + +If any claim has no test, add one. If untestable, document WHY. + +### Iteration 4: Export test vectors and validate Typst matching + +Export `docs/paper/verify-reductions/test_vectors__.json` with: + +```json +{ + "source": "", "target": "", "issue": , + "yes_instance": { "input": {...}, "output": {...}, "source_feasible": true, "target_feasible": true, "source_solution": [...], "extracted_solution": [...] 
}, + "no_instance": { "input": {...}, "output": {...}, "source_feasible": false, "target_feasible": false }, + "overhead": { "field": "expression" }, + "claims": [ {"tag": "...", "formula": "...", "verified": true} ] +} +``` + +**Typst ↔ JSON cross-check:** Verify key numerical values from the JSON appear in the Typst example sections (substring search). If any value is missing, the proof and script are out of sync — fix before proceeding. ## Step 5: Adversary Verification -Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. Requirements: own `reduce()`, own `extract_solution()`, `hypothesis` PBT (≥2 strategies), ≥5,000 checks, reproduce both Typst examples. +Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. + +**Adversary requirements:** +- Own `reduce()` function from scratch +- Own `extract_solution()` function +- Own `is_feasible_source()` and `is_feasible_target()` validators +- Exhaustive forward + backward for n ≤ 5 +- `hypothesis` property-based testing (≥2 strategies) +- Reproduce both Typst examples (YES and NO) +- ≥5,000 total checks +- Must NOT import from the constructor script -**Typed adversary prompt:** Include reduction-type focus instructions: -- **Identity:** exhaustive enumeration n ≤ 6, edge-case configs -- **Algebraic:** case boundary conditions (e.g., Σ = 2T ± 1), per-case extraction -- **Gadget:** widget structure invariants, traversal patterns, interior isolation +**Typed adversary focus** (include in prompt): +- **Identity reductions:** exhaustive enumeration n ≤ 6, edge-case configs (all-zero, all-one, alternating) +- **Algebraic reductions:** case boundary conditions (e.g., Σ = 2T exactly, Σ = 2T ± 1), per-case extraction +- **Gadget reductions:** widget structure invariants, traversal patterns, interior vertex isolation -After adversary runs, **cross-compare** constructor and 
adversary `reduce()` outputs on shared instances. Use the verdict table: +### Cross-comparison -| Constructor | Adversary | Cross-compare | Verdict | -|-------------|-----------|---------------|---------| -| Pass | Pass | Agree | **Verified** → proceed | -| Pass | Pass | Disagree | **Suspect** → investigate structural differences | -| Any fail | — | — | Fix and re-run from the failing step | -| Both fail | — | — | **Proof bug** → return to Step 2 | +After both scripts pass, compare `reduce()` outputs on shared instances. Both must produce structurally identical targets and agree on feasibility for all tested instances. + +### Verdict table + +| Constructor | Adversary | Cross-compare | Verdict | Action | +|-------------|-----------|---------------|---------|--------| +| Pass | Pass | Agree | **Verified** | Proceed to Step 6 | +| Pass | Pass | Disagree | **Suspect** | Investigate — may be isomorphic or latent bug | +| Pass | Fail | — | **Adversary bug** | Fix adversary or clarify Typst spec | +| Fail | Pass | — | **Constructor bug** | Fix constructor, re-run from Step 4 | +| Fail | Fail | — | **Proof bug** | Re-examine Typst proof, return to Step 2 | ## Step 6: Self-Review Checklist -Every item must be YES: Typst compiles with all sections, zero hand-waving, constructor ≥5K checks with 0 failures across all 7 sections, adversary ≥5K checks with 0 failures + hypothesis, cross-comparison 0 disagreements, test vectors JSON exported with Typst auto-matching verified. +Every item must be YES. If any is NO, go back and fix. 
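Several of the wording gates below can be spot-checked mechanically before the manual pass. A minimal Rust sketch (the banned-word lists come from Step 2's hard rules; treating the proof as a plain string, rather than parsed Typst, is an assumption for illustration):

```rust
// Mechanical spot-check for the "zero hand-waving" / "zero scratch work"
// checklist items. Hand-waving phrases are matched case-insensitively;
// scratch-work markers are matched case-sensitively, since they appear
// capitalized at sentence starts ("Wait", "Hmm", ...).
fn handwaving_violations(proof: &str) -> Vec<&'static str> {
    const BANNED: [&str; 4] = ["clearly", "obviously", "it is easy to see", "straightforward"];
    const SCRATCH: [&str; 4] = ["Wait", "Hmm", "Actually", "Let me try"];
    let lower = proof.to_lowercase();
    let mut hits: Vec<&'static str> = BANNED
        .iter()
        .copied()
        .filter(|w| lower.contains(*w))
        .collect();
    hits.extend(SCRATCH.iter().copied().filter(|w| proof.contains(*w)));
    hits
}
```

An empty result is necessary but not sufficient — the itemized checklist below still requires a manual pass.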
+ +### Typst proof +- [ ] Compiles without errors +- [ ] Construction with numbered steps, symbols defined before use +- [ ] Correctness with independent ⟹ and ⟸ paragraphs +- [ ] Solution extraction section present +- [ ] Overhead table with formulas +- [ ] YES example (≥3 variables, fully worked) +- [ ] NO example (fully worked, explains WHY infeasible) +- [ ] Zero hand-waving language +- [ ] Zero scratch work + +### Constructor Python +- [ ] 0 failures, ≥5,000 total checks +- [ ] All 7 sections present and non-empty +- [ ] Exhaustive n ≤ 5 +- [ ] Extraction tested for every feasible instance +- [ ] Gap analysis: every Typst claim has a test + +### Adversary Python +- [ ] 0 failures, ≥5,000 total checks +- [ ] Independent implementation (no imports from constructor) +- [ ] `hypothesis` PBT with ≥2 strategies +- [ ] Reproduces both Typst examples + +### Cross-consistency +- [ ] Cross-comparison: 0 disagreements, 0 feasibility mismatches +- [ ] Test vectors JSON exported with Typst auto-matching verified +- [ ] All claims have `verified: true` + +## Common Mistakes + +| Mistake | Consequence | +|---------|-------------| +| Adversary imports from constructor script | Rejected — must be independent | +| No `hypothesis` PBT in adversary | Rejected | +| Section 1 (symbolic) empty | Rejected — "overhead is trivial" is not an excuse | +| Only YES example, no NO example | Rejected | +| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | +| No gap analysis | Rejected — perform before proceeding | +| Example has < 3 variables | Rejected — too degenerate | +| Either script has < 5,000 checks | Rejected — enhance testing | +| Extraction (Section 3) not tested | Rejected — most commonly skipped | +| Cross-comparison skipped | Rejected | +| Disagreements dismissed without investigation | Rejected | ## Step 7: Commit, Create PR, Clean Up @@ -122,3 +260,15 @@ gh pr create --title "docs: verify reduction #" gh issue comment "$ISSUE" --body "verify-reduction report: 
VERIFIED (PR #)..." cd "$REPO_ROOT" && python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" ``` + +## Integration + +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. + +### Standalone usage + +- After `write-rule-in-paper`: invoke to verify paper entry +- During `review-structural`: check verification script exists and passes +- Before `issue-to-pr --execute`: pre-validate the algorithm From 1e2b375201468cfbd42d59fb664f01071b155fe4 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 12:40:07 +0000 Subject: [PATCH 13/16] fix: harden add-reduction with file-level verification gates Steps 4, 4b, 5 now have HARD GATE labels with verification commands that check the SPECIFIC required files appear in `git diff --name-only`. Step 8 has a pre-commit gate that lists all 6 required files and blocks commit if any is missing. Root cause: subagents skipped Steps 4 (put example in rule file instead of rule_builders.rs) and 5 (skipped paper entry entirely) because the skill said "MANDATORY" but had no mechanical enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 63 +++++++++++++++++++++++---- 1 file changed, 54 insertions(+), 9 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index f47c0fb8e..6986157b3 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -92,15 +92,29 @@ Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` ## Step 4: Add Canonical Example to example_db -**MANDATORY (Check 9 from #974).** Add a builder in `src/example_db/rule_builders.rs` using the YES test vector instance. Register in `build_rule_examples()`. 
Follow existing patterns in the file. +**HARD GATE (Check 9 from #974).** You MUST modify `src/example_db/rule_builders.rs` — not the rule file, not anywhere else. Read the file first, follow existing patterns, add a builder function using the YES test vector instance, register in `build_rule_examples()`. + +**Verification — run this and confirm the file was modified:** +```bash +git diff --name-only | grep "rule_builders.rs" +# MUST show: src/example_db/rule_builders.rs +# If it does NOT appear, you skipped this step. Go back and do it. +``` ## Step 4b: Add Example-DB Lookup Test -**MANDATORY (Check 10 from #974).** Verify the new example is discoverable in `src/unit_tests/example_db.rs`. Add the rule to the existing exhaustive lookup test, or add a standalone test. +**HARD GATE (Check 10 from #974).** You MUST modify `src/unit_tests/example_db.rs`. Read the file first, add the new rule to the existing exhaustive lookup test or add a standalone test. + +**Verification:** +```bash +git diff --name-only | grep "example_db.rs" +# MUST show: src/unit_tests/example_db.rs +# If it does NOT appear, you skipped this step. Go back and do it. +``` ## Step 5: Write Paper Entry -**MANDATORY (Check 11 from #974).** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the paper's macros. +**HARD GATE (Check 11 from #974).** You MUST modify `docs/paper/reductions.typ`. The Typst proof already exists from `/verify-reduction` — reformat it into the paper's macros. Do NOT skip this step. ### 5a. Load example data @@ -138,12 +152,19 @@ Step-by-step walkthrough with concrete numbers from JSON data. Must include: Add to `display-name` dictionary if the problem doesn't have an entry yet. -### 5e. Build +### 5e. 
Build and verify ```bash make paper # Must compile without errors or new completeness warnings ``` +**Verification — confirm the paper file was modified:** +```bash +git diff --name-only | grep "reductions.typ" +# MUST show: docs/paper/reductions.typ +# If it does NOT appear, you skipped Step 5. Go back and do it. +``` + ## Step 6: Regenerate Exports and Verify ```bash @@ -163,7 +184,34 @@ git rm -f docs/paper/verify-reductions/*** The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. -## Step 8: Commit and Create PR +## Step 8: Pre-Commit Gate and Create PR + +**Before committing, run this checklist. ALL must pass:** + +```bash +# Gate 1: Required files modified +echo "=== Pre-commit file gate ===" +for f in \ + "src/rules/_.rs" \ + "src/unit_tests/rules/_.rs" \ + "src/rules/mod.rs" \ + "src/example_db/rule_builders.rs" \ + "src/unit_tests/example_db.rs" \ + "docs/paper/reductions.typ"; do + git diff --name-only HEAD | grep -q "$(basename $f)" && echo " ✓ $f" || echo " ✗ MISSING: $f" +done + +# Gate 2: No verification artifacts remaining +ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" + +# Gate 3: Tests pass +cargo test 2>&1 | tail -3 + +# Gate 4: Paper compiles +make paper 2>&1 | tail -3 +``` + +**If ANY file shows ✗ MISSING, STOP and go back to the skipped step.** Do NOT commit with missing files. ```bash git add src/rules/_.rs \ @@ -172,10 +220,7 @@ git add src/rules/_.rs \ src/example_db/rule_builders.rs \ src/unit_tests/example_db.rs \ docs/paper/reductions.typ -git commit -m "feat: add reduction (#) - -Implemented via /verify-reduction → /add-reduction pipeline. -Verification: N checks (constructor) + M checks (adversary), 0 failures." +git commit -m "feat: add reduction (#)" git push -u origin "" gh pr create --title "feat: add reduction (#)" --body "..." 
``` From f5fa88771e4b4e5d3656178d58187b03531fc2b7 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 14:02:25 +0000 Subject: [PATCH 14/16] fix: add CI-equivalent checks to add-reduction pre-commit gate Root cause: PRs #985 and #991 failed CI because: 1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop) 2. New reductions can create paths that dominate existing direct reductions (test_find_dominated_rules_returns_known_set has hardcoded known set) Added to Step 6: - Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`) - Mandatory `cargo test` (full suite, not filtered) - Explicit dominated-rules gate with fix instructions Added to Common Mistakes: - clippy without -D warnings - dominated rules test - skipping full cargo test Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 38 ++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index 6986157b3..a644a0650 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -165,15 +165,36 @@ git diff --name-only | grep "reductions.typ" # If it does NOT appear, you skipped Step 5. Go back and do it. ``` -## Step 6: Regenerate Exports and Verify +## Step 6: Regenerate Exports, CI Checks, and Dominated-Rules Gate ```bash cargo run --example export_graph cargo run --example export_schemas make regenerate-fixtures # Slow — needs fixtures for paper example data -make test clippy ``` +### CI-equivalent checks — MANDATORY + +These must pass before committing. CI will fail if you skip them. 
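One concrete reason the exact flag matters: `needless_range_loop` — the lint that broke CI per this commit — is a warn-by-default clippy lint, so plain `cargo clippy` only prints a warning, while `-- -D warnings` (what CI runs) promotes it to a hard error. A hypothetical snippet of the offending pattern and its idiomatic fix:

```rust
// Passes `cargo clippy` with a warning, but fails `cargo clippy -- -D warnings`:
// clippy flags the index loop as `needless_range_loop`.
fn sum_first(v: &[i32]) -> i32 {
    let mut s = 0;
    for i in 0..v.len() {
        s += v[i];
    }
    s
}

// Idiomatic iterator form that clippy accepts under -D warnings.
fn sum_first_fixed(v: &[i32]) -> i32 {
    v.iter().sum()
}
```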
+ +```bash +# Clippy with -D warnings (CI uses this exact flag — local clippy without -D is insufficient) +cargo clippy -- -D warnings 2>&1 | tail -5 + +# Full test suite (catches dominated-rules test failures) +cargo test 2>&1 | tail -5 +``` + +### Dominated-rules gate — CHECK THIS + +Adding a new reduction can create paths that **dominate** existing direct reductions. The test `test_find_dominated_rules_returns_known_set` in `src/unit_tests/rules/analysis.rs` has a hardcoded set of known dominated pairs. If your new reduction creates a shorter path to a target that already has a direct reduction, this test will fail. + +```bash +cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -10 +``` + +If it fails with "Unexpected dominated rule: X -> Y (dominated by X -> ... -> Y)", add the new pair to the known set in `src/unit_tests/rules/analysis.rs`. + ## Step 7: Clean Up Verification Artifacts **MANDATORY.** Verification artifacts must NOT be committed into the library. Remove them after the Rust implementation is complete: @@ -204,10 +225,16 @@ done # Gate 2: No verification artifacts remaining ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" -# Gate 3: Tests pass +# Gate 3: Clippy with -D warnings (CI uses this — local clippy alone is insufficient) +cargo clippy -- -D warnings 2>&1 | tail -3 + +# Gate 4: Full test suite (catches dominated-rules failures) cargo test 2>&1 | tail -3 -# Gate 4: Paper compiles +# Gate 5: Dominated-rules test specifically +cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -3 + +# Gate 6: Paper compiles make paper 2>&1 | tail -3 ``` @@ -237,3 +264,6 @@ gh pr create --title "feat: add reduction (#)" --bo | Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | | Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | | Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | +| 
Running `cargo clippy` without `-D warnings` | CI uses `-D warnings` — local clippy without it misses lint failures | +| New reduction dominates existing direct reduction | Check `test_find_dominated_rules_returns_known_set` — add new pair to known set in `analysis.rs` | +| Skipping full `cargo test` before commit | Dominated-rules test only runs in full suite, not filtered tests | From 257845fc730e8937678bd748d9fdd5f2cd7a2ff9 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 14:08:25 +0000 Subject: [PATCH 15/16] fix: correct add-reduction HARD GATE for canonical examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit rule_builders.rs is a 4-line pass-through — canonical examples are registered via canonical_rule_example_specs() in each rule file, wired through mod.rs. Updated Step 4 to match actual architecture. Also added analysis.rs to git add list (for dominated-rules updates). Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index a644a0650..b84a2e9fc 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -92,13 +92,20 @@ Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` ## Step 4: Add Canonical Example to example_db -**HARD GATE (Check 9 from #974).** You MUST modify `src/example_db/rule_builders.rs` — not the rule file, not anywhere else. Read the file first, follow existing patterns, add a builder function using the YES test vector instance, register in `build_rule_examples()`. +**HARD GATE (Check 9 from #974).** Add a `canonical_rule_example_specs()` function in the reduction rule file, and wire it into `src/rules/mod.rs` where other specs are collected. 
Read `src/rules/mod.rs` first — search for `canonical_rule_example_specs` to see how existing rules register their examples. -**Verification — run this and confirm the file was modified:** +The builder function must: +1. Construct the source instance from the YES test vector +2. Reduce it to the target +3. Return a `RuleExampleSpec` with build closure + +**Verification:** ```bash -git diff --name-only | grep "rule_builders.rs" -# MUST show: src/example_db/rule_builders.rs -# If it does NOT appear, you skipped this step. Go back and do it. +# The rule file must define canonical_rule_example_specs +grep "canonical_rule_example_specs" src/rules/_.rs +# mod.rs must wire it in +grep "_::canonical_rule_example_specs" src/rules/mod.rs +# Both must show results. If either is empty, go back and do it. ``` ## Step 4b: Add Example-DB Lookup Test @@ -216,7 +223,6 @@ for f in \ "src/rules/_.rs" \ "src/unit_tests/rules/_.rs" \ "src/rules/mod.rs" \ - "src/example_db/rule_builders.rs" \ "src/unit_tests/example_db.rs" \ "docs/paper/reductions.typ"; do git diff --name-only HEAD | grep -q "$(basename $f)" && echo " ✓ $f" || echo " ✗ MISSING: $f" @@ -244,8 +250,8 @@ make paper 2>&1 | tail -3 git add src/rules/_.rs \ src/unit_tests/rules/_.rs \ src/rules/mod.rs \ - src/example_db/rule_builders.rs \ src/unit_tests/example_db.rs \ + src/unit_tests/rules/analysis.rs \ docs/paper/reductions.typ git commit -m "feat: add reduction (#)" git push -u origin "" @@ -259,8 +265,8 @@ gh pr create --title "feat: add reduction (#)" --bo | Re-deriving algorithm from issue instead of Python `reduce()` | Python function is the verified spec — translate it | | Overhead expressions don't match test vectors JSON | Copy verbatim from `overhead` field | | Skipping infeasible (NO) test case | NO instance is in test vectors — always include | -| Missing canonical example in `rule_builders.rs` | MANDATORY — Check 9 from #974 | -| Missing example-db lookup test | MANDATORY — Check 10 from #974 | +| Missing 
`canonical_rule_example_specs()` in rule file + `mod.rs` wiring | MANDATORY — Check 9 from #974 | +| Missing example-db lookup test in `example_db.rs` | MANDATORY — Check 10 from #974 | | Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | | Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | | Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | From 62a26fa44d4952bbcfb08bced41afefb961e5694 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 14:29:13 +0000 Subject: [PATCH 16/16] feat: parent-side verification + pre-commit hook for add-reduction MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two enforcement mechanisms that don't rely on subagent compliance: 1. Parent-side verification (Step 8a): After subagent reports DONE, the parent runs file gate checks independently. If any required file is missing, sends the subagent back — doesn't trust self-report. 2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh): Mechanically blocks commits of new rule files unless example_db.rs, reductions.typ, and mod.rs are also staged. Subagents cannot bypass. Root cause: subagents skip HARD GATE steps despite skill text saying "MANDATORY". Text-based enforcement doesn't work — need mechanical checks that run after the subagent, not instructions the subagent reads. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/hooks/add-reduction-precommit.sh | 49 ++++++++++ .claude/skills/add-reduction/SKILL.md | 113 +++++++++++++++++------ 2 files changed, 135 insertions(+), 27 deletions(-) create mode 100755 .claude/hooks/add-reduction-precommit.sh diff --git a/.claude/hooks/add-reduction-precommit.sh b/.claude/hooks/add-reduction-precommit.sh new file mode 100755 index 000000000..f59aaefa6 --- /dev/null +++ b/.claude/hooks/add-reduction-precommit.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Pre-commit hook for add-reduction: blocks commits missing required files. +# +# This hook fires on any commit that stages a new rule file under src/rules/. +# It checks that the companion files (example_db test, paper entry, mod.rs) +# are also staged. If any are missing, the commit is blocked. +# +# Install: symlink or copy to .git/hooks/pre-commit +# ln -sf ../../.claude/hooks/add-reduction-precommit.sh .git/hooks/pre-commit + +STAGED=$(git diff --cached --name-only) + +# Only run for commits that add/modify rule files (not mod.rs or traits.rs) +RULE_FILES=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | grep '\.rs$') +if [ -z "$RULE_FILES" ]; then + exit 0 +fi + +ERRORS=0 + +# Check example_db.rs is staged +if ! echo "$STAGED" | grep -q "example_db.rs"; then + echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check reductions.typ is staged +if ! echo "$STAGED" | grep -q "reductions.typ"; then + echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check mod.rs is staged +if ! 
echo "$STAGED" | grep -q "mod.rs"; then + echo "BLOCKED: src/rules/mod.rs not staged" + ERRORS=$((ERRORS + 1)) +fi + +# Check no verification artifacts are staged +if echo "$STAGED" | grep -q "docs/paper/verify-reductions/"; then + echo "BLOCKED: verification artifacts still staged — run git rm first" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -gt 0 ]; then + echo "" + echo "Fix the above and re-commit. See /add-reduction skill for details." + exit 1 +fi diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index b84a2e9fc..42498cdfe 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -212,52 +212,111 @@ git rm -f docs/paper/verify-reductions/*** The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. -## Step 8: Pre-Commit Gate and Create PR +## Step 8: Parent-Side Verification and Commit -**Before committing, run this checklist. ALL must pass:** +**Do NOT trust the subagent's self-report.** After the implementation subagent reports DONE, the parent (you) MUST run these checks independently: + +### 8a. 
File gate (parent runs this, not the subagent) ```bash -# Gate 1: Required files modified -echo "=== Pre-commit file gate ===" -for f in \ - "src/rules/_.rs" \ - "src/unit_tests/rules/_.rs" \ - "src/rules/mod.rs" \ - "src/unit_tests/example_db.rs" \ - "docs/paper/reductions.typ"; do - git diff --name-only HEAD | grep -q "$(basename $f)" && echo " ✓ $f" || echo " ✗ MISSING: $f" +echo "=== Parent-side file gate ===" +REQUIRED_PATTERNS=( + "_.rs" # rule file + "mod.rs" # module registration + "example_db.rs" # lookup test + "reductions.typ" # paper entry +) +CHANGED=$(git diff --name-only HEAD) +for pat in "${REQUIRED_PATTERNS[@]}"; do + echo "$CHANGED" | grep -q "$pat" && echo " ✓ $pat" || echo " ✗ MISSING: $pat — send subagent back" done -# Gate 2: No verification artifacts remaining -ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" +# Canonical example wired? +grep -q "canonical_rule_example_specs" src/rules/_.rs && echo " ✓ canonical_rule_example_specs in rule file" || echo " ✗ MISSING canonical_rule_example_specs" +grep -q "_::canonical_rule_example_specs" src/rules/mod.rs && echo " ✓ wired in mod.rs" || echo " ✗ MISSING mod.rs wiring" -# Gate 3: Clippy with -D warnings (CI uses this — local clippy alone is insufficient) -cargo clippy -- -D warnings 2>&1 | tail -3 +# Artifacts cleaned? +ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" +``` -# Gate 4: Full test suite (catches dominated-rules failures) -cargo test 2>&1 | tail -3 +**If ANY line shows ✗, send the subagent back to fix it.** Do NOT proceed. -# Gate 5: Dominated-rules test specifically -cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -3 +### 8b. 
CI-equivalent checks (parent runs) -# Gate 6: Paper compiles +```bash +cargo clippy -- -D warnings 2>&1 | tail -3 +cargo test 2>&1 | tail -3 make paper 2>&1 | tail -3 ``` -**If ANY file shows ✗ MISSING, STOP and go back to the skipped step.** Do NOT commit with missing files. +### 8c. Commit and push + +Only after ALL gates pass: ```bash -git add src/rules/_.rs \ - src/unit_tests/rules/_.rs \ - src/rules/mod.rs \ - src/unit_tests/example_db.rs \ - src/unit_tests/rules/analysis.rs \ - docs/paper/reductions.typ +git add -A git commit -m "feat: add reduction (#)" git push -u origin "" gh pr create --title "feat: add reduction (#)" --body "..." ``` +## Pre-Commit Hook + +Install this hook to mechanically block commits missing required files. The hook runs automatically — subagents cannot bypass it. + +**File:** `.claude/hooks/add-reduction-precommit.sh` + +```bash +#!/bin/bash +# Pre-commit hook for add-reduction: blocks commits missing required files. +# Install: copy to .git/hooks/pre-commit or add to settings.json hooks. + +STAGED=$(git diff --cached --name-only) + +# Only run for reduction rule commits +if ! echo "$STAGED" | grep -q "src/rules/.*\.rs$"; then + exit 0 +fi + +# Extract the reduction name from staged rule files +RULE_FILE=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | head -1) +if [ -z "$RULE_FILE" ]; then + exit 0 +fi + +ERRORS=0 + +# Check example_db.rs is staged +if ! echo "$STAGED" | grep -q "example_db.rs"; then + echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check reductions.typ is staged +if ! echo "$STAGED" | grep -q "reductions.typ"; then + echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check mod.rs is staged +if ! 
echo "$STAGED" | grep -q "mod.rs"; then + echo "BLOCKED: src/rules/mod.rs not staged" + ERRORS=$((ERRORS + 1)) +fi + +# Check no verification artifacts are staged +if echo "$STAGED" | grep -q "verify-reductions/"; then + echo "BLOCKED: verification artifacts still staged — run git rm first" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -gt 0 ]; then + echo "" + echo "Fix the above and re-commit. See /add-reduction skill for details." + exit 1 +fi +``` + ## Common Mistakes | Mistake | Fix |