diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 27ce2026c..a3b771d5a 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -10,6 +10,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct ` -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`. - [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. - [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. +- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. - [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`. - [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`. - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). @@ -26,6 +27,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. +- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility diff --git a/.claude/hooks/add-reduction-precommit.sh b/.claude/hooks/add-reduction-precommit.sh new file mode 100755 index 000000000..f59aaefa6 --- /dev/null +++ b/.claude/hooks/add-reduction-precommit.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Pre-commit hook for add-reduction: blocks commits missing required files. +# +# This hook fires on any commit that stages a new rule file under src/rules/. +# It checks that the companion files (example_db test, paper entry, mod.rs) +# are also staged. If any are missing, the commit is blocked. +# +# Install: symlink or copy to .git/hooks/pre-commit +# ln -sf ../../.claude/hooks/add-reduction-precommit.sh .git/hooks/pre-commit + +STAGED=$(git diff --cached --name-only) + +# Only run for commits that add/modify rule files (not mod.rs or traits.rs) +RULE_FILES=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | grep '\.rs$') +if [ -z "$RULE_FILES" ]; then + exit 0 +fi + +ERRORS=0 + +# Check example_db.rs is staged +if ! echo "$STAGED" | grep -q "example_db.rs"; then + echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check reductions.typ is staged +if ! echo "$STAGED" | grep -q "reductions.typ"; then + echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check mod.rs is staged +if ! echo "$STAGED" | grep -q "mod.rs"; then + echo "BLOCKED: src/rules/mod.rs not staged" + ERRORS=$((ERRORS + 1)) +fi + +# Check no verification artifacts are staged +if echo "$STAGED" | grep -q "docs/paper/verify-reductions/"; then + echo "BLOCKED: verification artifacts still staged — run git rm first" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -gt 0 ]; then + echo "" + echo "Fix the above and re-commit. See /add-reduction skill for details." + exit 1 +fi diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md new file mode 100644 index 000000000..42498cdfe --- /dev/null +++ b/.claude/skills/add-reduction/SKILL.md @@ -0,0 +1,334 @@ +--- +name: add-reduction +description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON +--- + +# Add Reduction (from Verified Artifacts) + +Complete pipeline for adding a reduction rule when `/verify-reduction` has produced verified artifacts. Translates Python `reduce()` to Rust, writes tests from test vectors, adds example-db entry, writes paper entry, then cleans up verification artifacts. + +**When to use:** After `/verify-reduction` PR exists. Use `/add-rule` when no verification artifacts exist. + +## Step 0: Locate and Read Verified Artifacts + +```bash +ls docs/paper/verify-reductions/verify__.py +ls docs/paper/verify-reductions/test_vectors__.json +ls docs/paper/verify-reductions/_.typ +``` + +If any are missing, run `/verify-reduction` first. + +**Read these three artifacts:** +1. **Python `reduce()` function** — the verified spec. Translate the algorithm to Rust, not the syntax. +2. **Test vectors JSON** — YES/NO instances with exact values, overhead expressions, verified claims. +3. **Typst proof** — Construction section for doc comments, proof for paper entry. + +## Step 1: Implement the Reduction + +Create `src/rules/_.rs`. Follow the pattern in `src/rules/satisfiability_naesatisfiability.rs`. + +**Translation guide from Python to Rust:** + +| Python | Rust | +|--------|------| +| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` | +| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec` | +| Helper functions (e.g., `literal_to_element`) | Private functions in the same file | +| Python list of ints | `Vec`, `Vec`, etc. (match the target problem's API) | + +**Required structure:** + +```rust +/// Result struct — holds target + any state needed for extraction. +#[derive(Debug, Clone)] +pub struct ReductionXToY { target: TargetType, /* mapping state */ } + +impl ReductionResult for ReductionXToY { + type Source = SourceType; + type Target = TargetType; + fn target_problem(&self) -> &Self::Target { &self.target } + fn extract_solution(&self, target_solution: &[usize]) -> Vec { + // Translate Python extract_assignment() logic + } +} + +#[reduction(overhead = { + // Copy verbatim from test_vectors JSON "overhead" field + field_name = "expression", +})] +impl ReduceTo for SourceType { + type Result = ReductionXToY; + fn reduce_to(&self) -> Self::Result { + // Translate Python reduce() logic + } +} + +#[cfg(test)] +#[path = "../unit_tests/rules/_.rs"] +mod tests; +``` + +## Step 2: Register in mod.rs + +Add to `src/rules/mod.rs`: +```rust +pub(crate) mod _; +``` + +## Step 3: Write Unit Tests from Test Vectors + +Create `src/unit_tests/rules/_.rs`. + +**From test vectors JSON, generate at minimum:** + +1. **Closed-loop test** (`test__to__closed_loop`) — construct source from `yes_instance.input`, reduce, verify target matches `yes_instance.output`, solve target with `BruteForce`, extract solutions, verify each is valid for source. + +2. **Infeasible test** (`test__to__infeasible`) — construct source from `no_instance.input`, reduce, verify target is also infeasible (no witnesses). + +3. **Structural test** — verify target dimensions match overhead formula, check well-formedness. + +Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` + +## Step 4: Add Canonical Example to example_db + +**HARD GATE (Check 9 from #974).** Add a `canonical_rule_example_specs()` function in the reduction rule file, and wire it into `src/rules/mod.rs` where other specs are collected. Read `src/rules/mod.rs` first — search for `canonical_rule_example_specs` to see how existing rules register their examples. + +The builder function must: +1. Construct the source instance from the YES test vector +2. Reduce it to the target +3. Return a `RuleExampleSpec` with build closure + +**Verification:** +```bash +# The rule file must define canonical_rule_example_specs +grep "canonical_rule_example_specs" src/rules/_.rs +# mod.rs must wire it in +grep "_::canonical_rule_example_specs" src/rules/mod.rs +# Both must show results. If either is empty, go back and do it. +``` + +## Step 4b: Add Example-DB Lookup Test + +**HARD GATE (Check 10 from #974).** You MUST modify `src/unit_tests/example_db.rs`. Read the file first, add the new rule to the existing exhaustive lookup test or add a standalone test. + +**Verification:** +```bash +git diff --name-only | grep "example_db.rs" +# MUST show: src/unit_tests/example_db.rs +# If it does NOT appear, you skipped this step. Go back and do it. +``` + +## Step 5: Write Paper Entry + +**HARD GATE (Check 11 from #974).** You MUST modify `docs/paper/reductions.typ`. The Typst proof already exists from `/verify-reduction` — reformat it into the paper's macros. Do NOT skip this step. + +### 5a. Load example data + +```typst +#let src_tgt = load-example("Source", "Target") +#let src_tgt_sol = src_tgt.solutions.at(0) +``` + +### 5b. Write reduction-rule entry + +```typst +#reduction-rule("Source", "Target", + example: true, + example-caption: [Description ($n = ...$, $|E| = ...$)], +)[ + // Theorem body: complexity + construction summary + overhead hint +][ + // Proof body: _Construction._ ... _Correctness._ ... _Solution extraction._ ... + // Adapt from the verified Typst proof — reformat, don't rewrite +] +``` + +### 5c. Write worked example (extra block) + +Step-by-step walkthrough with concrete numbers from JSON data. Must include: +- `pred-commands()` block at top (create/reduce/solve/evaluate) +- Source instance display (graph visualization if applicable) +- Construction walkthrough with intermediate values +- Solution verified end-to-end +- Witness multiplicity note + +**Gold-standard reference:** Search for `reduction-rule("KColoring", "QUBO"` in `reductions.typ`. + +### 5d. Register display name (if new problem in paper) + +Add to `display-name` dictionary if the problem doesn't have an entry yet. + +### 5e. Build and verify + +```bash +make paper # Must compile without errors or new completeness warnings +``` + +**Verification — confirm the paper file was modified:** +```bash +git diff --name-only | grep "reductions.typ" +# MUST show: docs/paper/reductions.typ +# If it does NOT appear, you skipped Step 5. Go back and do it. +``` + +## Step 6: Regenerate Exports, CI Checks, and Dominated-Rules Gate + +```bash +cargo run --example export_graph +cargo run --example export_schemas +make regenerate-fixtures # Slow — needs fixtures for paper example data +``` + +### CI-equivalent checks — MANDATORY + +These must pass before committing. CI will fail if you skip them. + +```bash +# Clippy with -D warnings (CI uses this exact flag — local clippy without -D is insufficient) +cargo clippy -- -D warnings 2>&1 | tail -5 + +# Full test suite (catches dominated-rules test failures) +cargo test 2>&1 | tail -5 +``` + +### Dominated-rules gate — CHECK THIS + +Adding a new reduction can create paths that **dominate** existing direct reductions. The test `test_find_dominated_rules_returns_known_set` in `src/unit_tests/rules/analysis.rs` has a hardcoded set of known dominated pairs. If your new reduction creates a shorter path to a target that already has a direct reduction, this test will fail. + +```bash +cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -10 +``` + +If it fails with "Unexpected dominated rule: X -> Y (dominated by X -> ... -> Y)", add the new pair to the known set in `src/unit_tests/rules/analysis.rs`. + +## Step 7: Clean Up Verification Artifacts + +**MANDATORY.** Verification artifacts must NOT be committed into the library. Remove them after the Rust implementation is complete: + +```bash +git rm -f docs/paper/verify-reductions/*** +``` + +The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. + +## Step 8: Parent-Side Verification and Commit + +**Do NOT trust the subagent's self-report.** After the implementation subagent reports DONE, the parent (you) MUST run these checks independently: + +### 8a. File gate (parent runs this, not the subagent) + +```bash +echo "=== Parent-side file gate ===" +REQUIRED_PATTERNS=( + "_.rs" # rule file + "mod.rs" # module registration + "example_db.rs" # lookup test + "reductions.typ" # paper entry +) +CHANGED=$(git diff --name-only HEAD) +for pat in "${REQUIRED_PATTERNS[@]}"; do + echo "$CHANGED" | grep -q "$pat" && echo " ✓ $pat" || echo " ✗ MISSING: $pat — send subagent back" +done + +# Canonical example wired? +grep -q "canonical_rule_example_specs" src/rules/_.rs && echo " ✓ canonical_rule_example_specs in rule file" || echo " ✗ MISSING canonical_rule_example_specs" +grep -q "_::canonical_rule_example_specs" src/rules/mod.rs && echo " ✓ wired in mod.rs" || echo " ✗ MISSING mod.rs wiring" + +# Artifacts cleaned? +ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" +``` + +**If ANY line shows ✗, send the subagent back to fix it.** Do NOT proceed. + +### 8b. CI-equivalent checks (parent runs) + +```bash +cargo clippy -- -D warnings 2>&1 | tail -3 +cargo test 2>&1 | tail -3 +make paper 2>&1 | tail -3 +``` + +### 8c. Commit and push + +Only after ALL gates pass: + +```bash +git add -A +git commit -m "feat: add reduction (#)" +git push -u origin "" +gh pr create --title "feat: add reduction (#)" --body "..." +``` + +## Pre-Commit Hook + +Install this hook to mechanically block commits missing required files. The hook runs automatically — subagents cannot bypass it. + +**File:** `.claude/hooks/add-reduction-precommit.sh` + +```bash +#!/bin/bash +# Pre-commit hook for add-reduction: blocks commits missing required files. +# Install: copy to .git/hooks/pre-commit or add to settings.json hooks. + +STAGED=$(git diff --cached --name-only) + +# Only run for reduction rule commits +if ! echo "$STAGED" | grep -q "src/rules/.*\.rs$"; then + exit 0 +fi + +# Extract the reduction name from staged rule files +RULE_FILE=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | head -1) +if [ -z "$RULE_FILE" ]; then + exit 0 +fi + +ERRORS=0 + +# Check example_db.rs is staged +if ! echo "$STAGED" | grep -q "example_db.rs"; then + echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check reductions.typ is staged +if ! echo "$STAGED" | grep -q "reductions.typ"; then + echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check mod.rs is staged +if ! echo "$STAGED" | grep -q "mod.rs"; then + echo "BLOCKED: src/rules/mod.rs not staged" + ERRORS=$((ERRORS + 1)) +fi + +# Check no verification artifacts are staged +if echo "$STAGED" | grep -q "verify-reductions/"; then + echo "BLOCKED: verification artifacts still staged — run git rm first" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -gt 0 ]; then + echo "" + echo "Fix the above and re-commit. See /add-reduction skill for details." + exit 1 +fi +``` + +## Common Mistakes + +| Mistake | Fix | +|---------|-----| +| Re-deriving algorithm from issue instead of Python `reduce()` | Python function is the verified spec — translate it | +| Overhead expressions don't match test vectors JSON | Copy verbatim from `overhead` field | +| Skipping infeasible (NO) test case | NO instance is in test vectors — always include | +| Missing `canonical_rule_example_specs()` in rule file + `mod.rs` wiring | MANDATORY — Check 9 from #974 | +| Missing example-db lookup test in `example_db.rs` | MANDATORY — Check 10 from #974 | +| Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | +| Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | +| Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | +| Running `cargo clippy` without `-D warnings` | CI uses `-D warnings` — local clippy without it misses lint failures | +| New reduction dominates existing direct reduction | Check `test_find_dominated_rules_returns_known_set` — add new pair to known set in `analysis.rs` | +| Skipping full `cargo test` before commit | Dominated-rules test only runs in full suite, not filtered tests | diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md new file mode 100644 index 000000000..9bb30a03f --- /dev/null +++ b/.claude/skills/verify-reduction/SKILL.md @@ -0,0 +1,274 @@ +--- +name: verify-reduction +description: End-to-end verification of a reduction rule — generates Typst proof, constructor Python script (>=5000 checks), and adversary Python script (>=5000 independent checks), iterating until all checks pass. Creates worktree + PR. +--- + +# Verify Reduction + +Pre-verification gate for reduction rules. Produces Typst proof + dual Python verification scripts + test vectors JSON, iterating until all checks pass. Creates worktree + PR. Downstream: `/add-reduction` consumes the artifacts to implement Rust code. + +``` +Issue → /verify-reduction (Typst + Python) → /add-reduction (Rust) → /review-pipeline +``` + +## Invocation + +``` +/verify-reduction 868 +/verify-reduction SubsetSum Partition +``` + +## Step 0: Parse Input and Create Worktree + +```bash +REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner) +ISSUE= +ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number) +REPO_ROOT=$(pwd) +WORKTREE_JSON=$(python3 scripts/pipeline_worktree.py enter --name "verify-$ISSUE" --format json) +WORKTREE_DIR=$(printf '%s\n' "$WORKTREE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['worktree_dir'])") +cd "$WORKTREE_DIR" && git checkout -b "issue/$ISSUE-verify--" +``` + +## Step 1: Read Issue, Study Models, Type Check + +```bash +gh issue view "$ISSUE" --json title,body +pred show --json +pred show --json +``` + +### Type compatibility gate — MANDATORY + +Check source/target `Value` types before any work: + +```bash +grep "type Value = " src/models/*/.rs src/models/*/.rs +``` + +**Compatible:** `Or`→`Or`, `Min`→`Min`, `Max`→`Max`, `Or`→`Min`/`Max`. +**Incompatible:** `Min`/`Max`→`Or` (needs threshold K that the source type lacks). + +If incompatible, STOP and comment on the issue explaining the type mismatch and options (add decision variant, use `ReduceToAggregate`, or choose different pair). Do NOT proceed. + +### If compatible + +Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. Use WebSearch if the issue is incomplete. + +## Step 2: Write Typst Proof + +Create standalone `docs/paper/verify-reductions/_.typ`. + +**Mandatory structure:** + +```typst +== Source $arrow.r$ Target +#theorem[...] +#proof[ + _Construction._ (numbered steps, every symbol defined before first use) + _Correctness._ + ($arrow.r.double$) ... (genuinely independent, NOT "the converse is similar") + ($arrow.l.double$) ... + _Solution extraction._ ... +] +*Overhead.* (table with target metric → formula) +*Feasible example.* (YES instance, ≥3 variables, fully worked with numbers) +*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) +``` + +**Hard rules:** +- Zero instances of "clearly", "obviously", "it is easy to see", "straightforward" +- Zero scratch work ("Wait", "Hmm", "Actually", "Let me try") +- Two examples minimum, both with ≥3 variables/vertices +- Every symbol defined before first use + +Compile: `python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')"` + +## Step 3: Write Constructor Python Script + +Create `docs/paper/verify-reductions/verify__.py` with ALL 7 mandatory sections: + +| Section | What to verify | Notes | +|---------|---------------|-------| +| 1. Symbolic (sympy) | Overhead formulas symbolically for general n | "The overhead is trivial" is NOT an excuse to skip | +| 2. Exhaustive forward+backward | Source feasible ⟺ target feasible | n ≤ 5 minimum. ALL instances or ≥300 sampled per (n,m) | +| 3. Solution extraction | Extract source solution from every feasible target witness | Most commonly skipped section. DO NOT SKIP | +| 4. Overhead formula | Build target, measure actual size, compare against formula | Catches off-by-one in construction | +| 5. Structural properties | Target well-formed, no degenerate cases | Gadget reductions: girth, connectivity, widget structure | +| 6. YES example | Reproduce exact Typst feasible example numbers | Every value must match | +| 7. NO example | Reproduce exact Typst infeasible example, verify both sides infeasible | Must verify WHY infeasible | + +### Minimum check counts — STRICTLY ENFORCED + +| Type | Minimum checks | Minimum n | +|------|---------------|-----------| +| Identity (same graph, different objective) | 10,000 | n ≤ 6 | +| Algebraic (padding, complement, case split) | 10,000 | n ≤ 5 | +| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | + +Every reduction gets at least 5,000 checks regardless of perceived simplicity. + +## Step 4: Run and Iterate + +```bash +python3 docs/paper/verify-reductions/verify__.py +``` + +### Iteration 1: Fix failures + +Run the script. Fix any failures. Re-run until 0 failures. + +### Iteration 2: Check count audit + +Print and fill this table honestly: + +``` +CHECK COUNT AUDIT: + Total checks: ___ (minimum: 5,000) + Forward direction: ___ instances (minimum: all n ≤ 5) + Backward direction: ___ instances (minimum: all n ≤ 5) + Solution extraction: ___ feasible instances tested + Overhead formula: ___ instances compared + Symbolic (sympy): ___ identities verified + YES example: verified? [yes/no] + NO example: verified? [yes/no] + Structural properties: ___ checks +``` + +If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. + +### Iteration 3: Gap analysis + +List EVERY claim in the Typst proof and whether it's tested: + +``` +CLAIM TESTED BY +"Universe has 2n elements" Section 4: overhead ✓ +"Complementarity forces consistency" Section 3: extraction ✓ +"Forward: NAE-sat → valid splitting" Section 2: exhaustive ✓ +... +``` + +If any claim has no test, add one. If untestable, document WHY. + +### Iteration 4: Export test vectors and validate Typst matching + +Export `docs/paper/verify-reductions/test_vectors__.json` with: + +```json +{ + "source": "", "target": "", "issue": , + "yes_instance": { "input": {...}, "output": {...}, "source_feasible": true, "target_feasible": true, "source_solution": [...], "extracted_solution": [...] }, + "no_instance": { "input": {...}, "output": {...}, "source_feasible": false, "target_feasible": false }, + "overhead": { "field": "expression" }, + "claims": [ {"tag": "...", "formula": "...", "verified": true} ] +} +``` + +**Typst ↔ JSON cross-check:** Verify key numerical values from the JSON appear in the Typst example sections (substring search). If any value is missing, the proof and script are out of sync — fix before proceeding. + +## Step 5: Adversary Verification + +Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. + +**Adversary requirements:** +- Own `reduce()` function from scratch +- Own `extract_solution()` function +- Own `is_feasible_source()` and `is_feasible_target()` validators +- Exhaustive forward + backward for n ≤ 5 +- `hypothesis` property-based testing (≥2 strategies) +- Reproduce both Typst examples (YES and NO) +- ≥5,000 total checks +- Must NOT import from the constructor script + +**Typed adversary focus** (include in prompt): +- **Identity reductions:** exhaustive enumeration n ≤ 6, edge-case configs (all-zero, all-one, alternating) +- **Algebraic reductions:** case boundary conditions (e.g., Σ = 2T exactly, Σ = 2T ± 1), per-case extraction +- **Gadget reductions:** widget structure invariants, traversal patterns, interior vertex isolation + +### Cross-comparison + +After both scripts pass, compare `reduce()` outputs on shared instances. Both must produce structurally identical targets and agree on feasibility for all tested instances. + +### Verdict table + +| Constructor | Adversary | Cross-compare | Verdict | Action | +|-------------|-----------|---------------|---------|--------| +| Pass | Pass | Agree | **Verified** | Proceed to Step 6 | +| Pass | Pass | Disagree | **Suspect** | Investigate — may be isomorphic or latent bug | +| Pass | Fail | — | **Adversary bug** | Fix adversary or clarify Typst spec | +| Fail | Pass | — | **Constructor bug** | Fix constructor, re-run from Step 4 | +| Fail | Fail | — | **Proof bug** | Re-examine Typst proof, return to Step 2 | + +## Step 6: Self-Review Checklist + +Every item must be YES. If any is NO, go back and fix. + +### Typst proof +- [ ] Compiles without errors +- [ ] Construction with numbered steps, symbols defined before use +- [ ] Correctness with independent ⟹ and ⟸ paragraphs +- [ ] Solution extraction section present +- [ ] Overhead table with formulas +- [ ] YES example (≥3 variables, fully worked) +- [ ] NO example (fully worked, explains WHY infeasible) +- [ ] Zero hand-waving language +- [ ] Zero scratch work + +### Constructor Python +- [ ] 0 failures, ≥5,000 total checks +- [ ] All 7 sections present and non-empty +- [ ] Exhaustive n ≤ 5 +- [ ] Extraction tested for every feasible instance +- [ ] Gap analysis: every Typst claim has a test + +### Adversary Python +- [ ] 0 failures, ≥5,000 total checks +- [ ] Independent implementation (no imports from constructor) +- [ ] `hypothesis` PBT with ≥2 strategies +- [ ] Reproduces both Typst examples + +### Cross-consistency +- [ ] Cross-comparison: 0 disagreements, 0 feasibility mismatches +- [ ] Test vectors JSON exported with Typst auto-matching verified +- [ ] All claims have `verified: true` + +## Common Mistakes + +| Mistake | Consequence | +|---------|-------------| +| Adversary imports from constructor script | Rejected — must be independent | +| No `hypothesis` PBT in adversary | Rejected | +| Section 1 (symbolic) empty | Rejected — "overhead is trivial" is not an excuse | +| Only YES example, no NO example | Rejected | +| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | +| No gap analysis | Rejected — perform before proceeding | +| Example has < 3 variables | Rejected — too degenerate | +| Either script has < 5,000 checks | Rejected — enhance testing | +| Extraction (Section 3) not tested | Rejected — most commonly skipped | +| Cross-comparison skipped | Rejected | +| Disagreements dismissed without investigation | Rejected | + +## Step 7: Commit, Create PR, Clean Up + +```bash +git add docs/paper/verify-reductions/*** +git add -f docs/paper/verify-reductions/_.pdf +git commit -m "docs: /verify-reduction # VERIFIED" +git push -u origin "" +gh pr create --title "docs: verify reduction #" --body "..." +gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." +cd "$REPO_ROOT" && python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" +``` + +## Integration + +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. + +### Standalone usage + +- After `write-rule-in-paper`: invoke to verify paper entry +- During `review-structural`: check verification script exists and passes +- Before `issue-to-pr --execute`: pre-validate the algorithm diff --git a/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md new file mode 100644 index 000000000..debd6c62f --- /dev/null +++ b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md @@ -0,0 +1,466 @@ +# Verify-Reduction Enhancements Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Enhance `/verify-reduction` with auto-matching, test vector export, and typed adversary prompts. Create new `/add-reduction` skill that consumes verified artifacts. Keep original `/add-rule` and `/review-pipeline` unchanged. + +**Architecture:** Three file changes — update verify-reduction SKILL.md (new Step 4.5, updated Step 5/8), create add-reduction SKILL.md (fork of add-rule that reads Python reduce() + test vectors JSON), register add-reduction in CLAUDE.md. + +**Tech Stack:** Markdown skill definitions only — no code changes. + +--- + +### Task 1: Update verify-reduction SKILL.md — Add Step 4.5 (Auto-Matching + Test Vectors Export) + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +The new Step 4.5 goes between the current Step 4 (Run and Iterate) and Step 5 (Adversary Verification). It adds three things: (a) constructor exports test vectors JSON, (b) Typst ↔ JSON cross-check, (c) structured claim tags replace manual gap analysis. + +- [ ] **Step 1: Insert new Step 4.5 after the current "Iteration 3: Gap analysis" block** + +In `.claude/skills/verify-reduction/SKILL.md`, find the line `If any claim has no test, add one. If it's untestable, document WHY.` (end of Step 4, Iteration 3). After it, insert: + +```markdown +### Iteration 4: Export test vectors and validate Typst matching — MANDATORY + +After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption: + +**File:** `docs/paper/verify-reductions/test_vectors__.json` + +```json +{ + "source": "", + "target": "", + "issue": , + "yes_instance": { + "input": { ... }, + "output": { ... }, + "source_feasible": true, + "target_feasible": true, + "source_solution": [ ... ], + "extracted_solution": [ ... ] + }, + "no_instance": { + "input": { ... }, + "output": { ... }, + "source_feasible": false, + "target_feasible": false + }, + "overhead": { + "field_name": "expression using source getters" + }, + "claims": [ + {"tag": "overhead_universe_size", "formula": "2*n", "verified": true}, + {"tag": "forward_correctness", "description": "NAE-sat implies valid splitting", "verified": true} + ] +} +``` + +Add this export at the end of `main()` in the constructor script: + +```python +import json + +test_vectors = { + "source": "", + "target": "", + "issue": , + "yes_instance": { ... }, # from Section 6 + "no_instance": { ... }, # from Section 7 + "overhead": { ... }, # from Section 1/4 + "claims": claims_list, # accumulated claim() calls +} + +with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: + json.dump(test_vectors, f, indent=2) +print(f"Test vectors exported to test_vectors__.json") +``` + +**Typst ↔ JSON cross-check:** + +After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes, feasibility verdicts), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text, not a full parser: + +```python +typst_text = open("").read() +for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing YES value: {val}" +for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing NO value: {val}" +``` + +If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. + +**Structured claims (best-effort replacement for manual gap analysis):** + +Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: + +```python +claims_list = [] + +def claim(tag, formula_or_desc, verified=True): + claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) +``` + +Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). +``` + +- [ ] **Step 2: Verify the edit is well-placed** + +Read the file and confirm Step 4.5 appears between Step 4 and Step 5 with correct markdown heading level. + +- [ ] **Step 3: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: add Step 4.5 to verify-reduction — test vectors export + Typst auto-matching" +``` + +--- + +### Task 2: Update verify-reduction SKILL.md — Typed Adversary Prompt (Step 5) + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +- [ ] **Step 1: Add reduction-type detection and tailored instructions to the adversary prompt** + +In `.claude/skills/verify-reduction/SKILL.md`, find the adversary prompt template in Step 5a (the block starting `You are an adversary verifier`). Before the `## Your task` section of the adversary prompt, insert: + +````markdown +## Reduction type + +Detect the reduction type from the Typst proof and tailor your testing focus: + +- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. + +- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. + +- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. Check structural properties (connectivity, edge counts, degree sequences) across all small instances. +```` + +- [ ] **Step 2: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: add typed adversary prompts to verify-reduction (identity/algebraic/gadget)" +``` + +--- + +### Task 3: Update verify-reduction SKILL.md — Downstream Artifacts (Step 8) and Integration + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +- [ ] **Step 1: Update Step 8a commit to include test vectors JSON** + +Find the `git add` block in Step 8a. Change it to: + +```bash +git add docs/paper/.typ \ + docs/paper/verify-reductions/verify_*.py \ + docs/paper/verify-reductions/adversary_*.py \ + docs/paper/verify-reductions/test_vectors_*.json +git add -f docs/paper/.pdf +``` + +- [ ] **Step 2: Update the Integration section at the bottom of the file** + +Replace the current Integration section with: + +```markdown +## Integration + +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. + +- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available + +### Standalone usage + +- **After `write-rule-in-paper`**: invoke to verify paper entry +- **During `review-structural`**: check verification script exists and passes +- **Before `issue-to-pr --execute`**: pre-validate the algorithm +``` + +- [ ] **Step 3: Add test vectors JSON to the Quality Gates checklist** + +In the Quality Gates section, add after the cross-consistency items: + +```markdown +- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims +- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) +``` + +- [ ] **Step 4: Add test vectors JSON to the Self-Review checklist (Step 6)** + +In Step 6, add a new subsection after "Cross-consistency": + +```markdown +### Test vectors and auto-matching + +- [ ] `test_vectors__.json` exported successfully +- [ ] YES instance in JSON matches Typst feasible example (values present) +- [ ] NO instance in JSON matches Typst infeasible example (values present) +- [ ] All claims have `verified: true` +- [ ] At least 5 claims for non-trivial reductions +``` + +- [ ] **Step 5: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: update verify-reduction Step 8 + Integration for downstream pipeline" +``` + +--- + +### Task 4: Create add-reduction skill + +**Files:** +- Create: `.claude/skills/add-reduction/SKILL.md` + +This is a fork of add-rule that reads verified artifacts instead of deriving from the issue. + +- [ ] **Step 1: Create the skill directory** + +```bash +mkdir -p .claude/skills/add-reduction +``` + +- [ ] **Step 2: Write the SKILL.md** + +Create `.claude/skills/add-reduction/SKILL.md` with the following content: + +```markdown +--- +name: add-reduction +description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON +--- + +# Add Reduction (from Verified Artifacts) + +Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. + +**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. + +## Step 0: Locate Verified Artifacts + +Check for existing verification artifacts: + +```bash +ls docs/paper/verify-reductions/verify__.py +ls docs/paper/verify-reductions/test_vectors__.json +ls docs/paper/verify-reductions/_.typ +``` + +If any are missing, run `/verify-reduction` first. + +### Read the artifacts + +1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. +2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. +3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. + +```bash +# Load test vectors +TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) +``` + +Extract from test vectors JSON: +- `overhead` → use directly in `#[reduction(overhead = { ... })]` +- `yes_instance.input` / `yes_instance.output` → first closed-loop test case +- `no_instance.input` / `no_instance.output` → infeasible test case +- `claims` → verify each is preserved in the Rust implementation + +## Reference Implementations + +Same as `/add-rule`: +- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs` +- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` +- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` +- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) + +## Step 1: Implement the reduction + +Create `src/rules/_.rs`. + +**Translation guide:** Map the Python `reduce()` function to Rust: + +| Python | Rust | +|--------|------| +| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` | +| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec` | +| `literal_to_element(lit, n)` | Private helper method | +| Python list of ints | `Vec`, `Vec`, etc. (match problem type) | + +**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro: + +```rust +#[reduction(overhead = { + // Copy expressions verbatim from test_vectors JSON "overhead" field + field_name = "expression", +})] +``` + +The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl. + +## Step 2: Register in mod.rs + +Same as `/add-rule` Step 2. Add `mod _;` to `src/rules/mod.rs`. + +## Step 3: Write unit tests from test vectors + +Create `src/unit_tests/rules/_.rs`. + +**Generate tests directly from test vectors JSON:** + +The YES instance becomes the primary closed-loop test: + +```rust +#[test] +fn test__to__closed_loop() { + // Construct source from test_vectors.yes_instance.input + let source = ::try_new(/* fields from JSON */).unwrap(); + + // Reduce + let reduction = ReduceTo::::reduce_to(&source); + + // Verify target matches test_vectors.yes_instance.output + let target = reduction.target_problem(); + assert_eq!(target.(), /* value from JSON output */); + + // Solve and extract + let solver = BruteForce; + for witness in solver.find_all_witnesses(target).unwrap() { + let extracted = reduction.extract_solution(&witness); + // Verify extracted solution is valid for source + let val = source.evaluate(&extracted); + assert!(val.0); // Or check objective value + } +} +``` + +The NO instance becomes the infeasible test: + +```rust +#[test] +fn test__to__infeasible() { + // Construct source from test_vectors.no_instance.input + let source = ::try_new(/* fields from JSON */).unwrap(); + + // Reduce + let reduction = ReduceTo::::reduce_to(&source); + + // Verify target is also infeasible + let solver = BruteForce; + let witnesses = solver.find_all_witnesses(reduction.target_problem()); + assert!(witnesses.is_none() || witnesses.unwrap().is_empty()); +} +``` + +Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON. + +## Step 4: Add canonical example to example_db + +Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example. + +## Step 5: Document in paper + +The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.). + +Follow `/add-rule` Step 5 for the exact format. The heavy writing is already done; this step is reformatting. + +## Step 6: Regenerate exports and verify + +Same as `/add-rule` Step 6: + +```bash +cargo run --example export_graph +cargo run --example export_schemas +make regenerate-fixtures +make test clippy +``` + +## Solver Rules + +Same as `/add-rule`. If the target problem needs ILP, implement alongside. + +## CLI Impact + +Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. + +## File Naming + +Same as `/add-rule`: +- Rule file: `src/rules/_.rs` +- Test file: `src/unit_tests/rules/_.rs` +- Canonical example: builder function in `src/example_db/rule_builders.rs` + +## Common Mistakes + +All mistakes from `/add-rule` apply, plus: + +| Mistake | Fix | +|---------|-----| +| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | +| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | +| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | +| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | +| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | +``` + +- [ ] **Step 3: Commit** + +```bash +git add .claude/skills/add-reduction/SKILL.md +git commit -m "feat: create add-reduction skill — consumes verified artifacts from verify-reduction" +``` + +--- + +### Task 5: Register add-reduction in CLAUDE.md + +**Files:** +- Modify: `.claude/CLAUDE.md` + +- [ ] **Step 1: Add add-reduction entry to the Skills list** + +In `.claude/CLAUDE.md`, find the line: +``` +- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. +``` + +Add immediately after it: +``` +- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. +``` + +- [ ] **Step 2: Update the verify-reduction description** + +Find the line starting `- [verify-reduction]` and replace it with: + +``` +- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. +``` + +- [ ] **Step 3: Commit** + +```bash +git add .claude/CLAUDE.md +git commit -m "feat: register add-reduction skill in CLAUDE.md, update verify-reduction description" +``` + +--- + +Plan complete and saved to `docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md`. Two execution options: + +**1. Subagent-Driven (recommended)** - I dispatch a fresh subagent per task, review between tasks, fast iteration + +**2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints + +Which approach? \ No newline at end of file diff --git a/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md new file mode 100644 index 000000000..9ce7a1a90 --- /dev/null +++ b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md @@ -0,0 +1,125 @@ +# Design: Proposed Reduction Rules — Typst Verification Note + +**Date:** 2026-03-31 +**Goal:** Create a standalone Typst document with compiled PDF that formalizes 9 reduction rules from issue #770, resolving blockers for 7 incomplete issues and adding 2 high-leverage NP-hardness chain extensions. + +## Scope + +### 9 Reductions + +**Group 1 — NP-hardness proof chain extensions:** + +| Issue | Reduction | Impact | +|-------|-----------|--------| +| #973 | SubsetSum → Partition | Unlocks ~12 downstream problems | +| #198 | MinimumVertexCover → HamiltonianCircuit | Unlocks ~9 downstream problems | + +**Group 2 — Tier 1a blocked issues (fix + formalize):** + +| Issue | Reduction | Current blocker | +|-------|-----------|----------------| +| #379 | DominatingSet → MinMaxMulticenter | Decision vs optimization MDS model | +| #380 | DominatingSet → MinSumMulticenter | Same | +| #888 | OptimalLinearArrangement → RootedTreeArrangement | Witness extraction impossible for naive approach | +| #822 | ExactCoverBy3Sets → AcyclicPartition | Missing algorithm (unpublished reference) | + +**Group 3 — Tier 1b blocked issues (fix + formalize):** + +| Issue | Reduction | Current blocker | +|-------|-----------|----------------| +| #892 | VertexCover → HamiltonianPath | Depends on #198 (VC→HC) being resolved | +| #894 | VertexCover → PartialFeedbackEdgeSet | Missing Yannakakis 1978b paper | +| #890 | MaxCut → OptimalLinearArrangement | Placeholder algorithm, no actual construction | + +## Deliverables + +1. **`docs/paper/proposed-reductions.typ`** — standalone Typst document +2. **`docs/paper/proposed-reductions.pdf`** — compiled PDF checked into repo +3. **Updated GitHub issues** — #379, #380, #888, #822, #892, #894, #890 corrected with verified algorithms +4. **One PR** containing the note, PDF, and issue updates + +## Document Structure + +``` +Title: Proposed Reduction Rules — Verification Notes +Abstract: Motivation (NP-hardness gaps, blocked issues, impact analysis) + +§1 Notation & Conventions + - Standard symbols (G, V, E, w, etc.) + - Proof structure: Construction → Correctness (⟹/⟸) → Solution Extraction + - Overhead notation + +§2 NP-Hardness Chain Extensions + §2.1 SubsetSum → Partition (#973) + §2.2 MinimumVertexCover → HamiltonianCircuit (#198) + §2.3 VertexCover → HamiltonianPath (#892) + +§3 Graph Reductions + §3.1 MaxCut → OptimalLinearArrangement (#890) + §3.2 OptimalLinearArrangement → RootedTreeArrangement (#888) + +§4 Set & Domination Reductions + §4.1 DominatingSet → MinMaxMulticenter (#379) + §4.2 DominatingSet → MinSumMulticenter (#380) + §4.3 ExactCoverBy3Sets → AcyclicPartition (#822) + +§5 Feedback Set Reductions + §5.1 VertexCover → PartialFeedbackEdgeSet (#894) +``` + +## Per-Reduction Entry Format + +Each reduction follows the `reductions.typ` convention: + +1. **Theorem statement** — 1-3 sentence intuition, citation (e.g., `[GJ79, ND50]`) +2. **Proof** with three mandatory subsections: + - _Construction._ Numbered algorithm steps, all symbols defined before use + - _Correctness._ Bidirectional: (⟹) forward direction, (⟸) backward direction + - _Solution extraction._ How to map target solution back to source +3. **Overhead table** — target size fields as functions of source size fields +4. **Worked example** — concrete small instance, full construction steps, solution verification + +Mathematical notation uses Typst math mode: `$V$`, `$E$`, `$arrow.r.double$`, `$overline(x)$`, etc. + +## Research Plan for Blocked Issues + +For each blocked reduction: + +1. **Search** for the original reference via the citation in Garey & Johnson +2. **Reconstruct** the correct algorithm from the paper or from first principles +3. **Verify** correctness with a hand-worked example in the note +4. **Resolve** the blocker: + - #379/#380: Clarify that the reduction operates on the decision variant; note model alignment needed + - #888: Research Gavril 1977a gadget construction for forcing path-tree solutions + - #822: Research the acyclic partition reduction from G&J or construct from first principles + - #892: Chain through #198 (VC→HC→HP); detail the HC→HP modification + - #894: Search for Yannakakis 1978b or reconstruct the gadget + - #890: Research the Garey-Johnson-Stockmeyer 1976 construction + +If a reference is unavailable, construct a novel reduction and clearly mark it as such. + +## Typst Setup + +- Standalone document (not importing from `reductions.typ`) +- Uses: `ctheorems` for theorem/proof environments, `cetz` if diagrams needed +- Page: A4, New Computer Modern 10pt +- Theorem numbering: `Theorem 2.1`, `Theorem 2.2`, etc. +- No dependency on `examples.json` or `reduction_graph.json` (standalone) +- Compile command: `typst compile docs/paper/proposed-reductions.typ docs/paper/proposed-reductions.pdf` + +## Quality Criteria + +Each reduction must satisfy: +1. **Math equations correct** — all formulas verified against source paper or hand-derivation +2. **Provable correctness** — both directions of the proof are rigorous, no hand-waving +3. **Algorithm clear** — detailed enough that a developer can implement `reduce_to()` and `extract_solution()` directly from the proof +4. **From math to code verifiable** — overhead expressions match the construction, worked example can be used as a test case + +## PR Structure + +- Branch: `feat/proposed-reductions-note` +- Files: + - `docs/paper/proposed-reductions.typ` (new) + - `docs/paper/proposed-reductions.pdf` (new, compiled) +- No code changes — this is a documentation-only PR +- Issue updates done via `gh issue edit` (not in the PR diff) diff --git a/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md new file mode 100644 index 000000000..c42684f08 --- /dev/null +++ b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md @@ -0,0 +1,145 @@ +# Design: Adversarial Multi-Agent Reduction Verification + +**Date:** 2026-04-01 +**Goal:** Strengthen `/verify-reduction` with an adversarial second agent that independently verifies the reduction, catching systematic errors that the single-agent approach misses. + +## Problem Statement + +The current verification flow has a fundamental weakness: the same AI agent writes both the mathematical proof AND the verification script. If the agent systematically misunderstands the reduction (e.g., wrong variable mapping, wrong extraction direction), it will write a wrong proof and a wrong script that agree with each other. + +Evidence from PR #975: +- Python caught 4 bugs; Typst proofs caught 0; Lean caught 0 +- All bugs were construction errors or formula mistakes +- No bug was caused by "proof is right but script is wrong" — they were always "both are wrong in the same way" + +The adversarial approach addresses this by having two independent agents verify the same reduction. + +## Architecture + +``` +/verify-reduction + │ + ├── Steps 0-2: Parse input, create worktree, read issue, write Typst proof + │ + ├── Step 3: Constructor Agent + │ └── Writes verify__.py (7 sections, ≥5000 checks) + │ + ├── Step 4: Run constructor script, iterate until 0 failures + │ + ├── Step 4b: Adversary Agent (dispatched as subagent with worktree isolation) + │ ├── Input: theorem statement + construction steps ONLY + │ │ (stripped of constructor's script, internal reasoning, examples) + │ ├── Writes: adversary__.py + │ │ ├── Independent reduce() implementation + │ │ ├── Independent extract_solution() + │ │ ├── Property-based testing (hypothesis, n up to 50) + │ │ ├── Targeted counterexample search + │ │ └── ≥5000 independent checks + │ └── Output: pass/fail + counterexamples + │ + ├── Step 4c: Cross-Comparison + │ ├── constructor.reduce(x) vs adversary.reduce(x) for 1000 instances + │ ├── constructor.check(adversary.reduce(x)) vs adversary.check(constructor.reduce(x)) + │ └── Disagreement → diagnose (real bug vs script bug) + │ + ├── Steps 5-8: Self-review, report, commit, PR + │ + └── Final PR includes BOTH scripts + cross-comparison results +``` + +## Adversary Agent Design + +### What the adversary receives + +A stripped prompt containing ONLY: +- The theorem statement (1-3 sentences) +- The construction steps (numbered) +- The extraction procedure +- The overhead formula + +The adversary does NOT receive: +- The constructor's Python script +- The Typst proof's correctness argument +- The worked examples +- Any internal reasoning from Step 3 + +### What the adversary produces + +`adversary__.py` with: + +1. **Independent `reduce()` function** — implemented from the construction steps, without seeing the constructor's implementation +2. **Independent `extract_solution()` function** — same +3. **Exhaustive testing** — forward + backward for n ≤ 5 (matching constructor's minimum) +4. **Property-based testing** — `hypothesis` with `@given(st.integers(min_value=2, max_value=50))` for random instances +5. **Targeted counterexample search**: + - All-True / all-False assignments + - Single-variable instances + - Maximum and minimum clause/subset sizes + - Instances where source is barely feasible (one satisfying assignment) + - Instances where source is barely infeasible (off by one element) +6. **Diagnostic output** — any counterexample prints the full instance, expected result, actual result + +### Cross-comparison + +After both scripts pass independently, the orchestrator runs: + +```python +# Import both implementations +from verify_source_target import reduce as constructor_reduce, check as constructor_check +from adversary_source_target import reduce as adversary_reduce, check as adversary_check + +# 1. Do they produce the same target instance? +for instance in random_instances(1000): + t1 = constructor_reduce(instance) + t2 = adversary_reduce(instance) + assert t1 == t2, f"Reductions disagree on {instance}" + +# 2. Cross-check feasibility +for instance in random_instances(1000): + assert constructor_check(adversary_reduce(instance)) == adversary_check(constructor_reduce(instance)) +``` + +## Verdict Criteria + +| Constructor | Adversary | Cross-check | Verdict | +|-------------|-----------|-------------|---------| +| 0 fail | 0 fail | Agreement | **VERIFIED** | +| 0 fail | Counterexample found | — | Real bug (constructor has blind spot) | +| Fail | — | — | Bug found before adversary needed | +| 0 fail | 0 fail | Disagreement | One implementation is wrong — diagnose | + +## Why Lean is Dropped + +Lean's contribution to this project: +- **Caught bugs:** 0 out of 4 +- **Useful proofs:** `G ⊔ Gᶜ = ⊤` (1 genuinely meaningful theorem) +- **Trivial proofs:** `n + m = n + m`, `14m + (2m-n) + 2nK = 16m-n+2nK` (add no confidence) +- **Infrastructure gap:** No `ReduceTo`, `NAESatisfiability`, `SetSplitting`, or reduction framework in Mathlib +- **ROI:** Negative. Every Lean lemma requires duplicating problem definitions that already exist in Rust. + +The adversarial Python approach provides stronger evidence: +- Two independent implementations agreeing > one implementation + arithmetic Lean proofs +- Property-based testing catches scale-dependent bugs that Lean can't +- No Mathlib infrastructure needed + +Lean remains **optional** (nice for credibility) but is removed from the required quality gates. + +## Changes to verify-reduction Skill + +| Step | Before | After | +|------|--------|-------| +| Step 3 | Constructor writes script | Unchanged | +| Step 4 | Iterate until 0 failures | Unchanged | +| **Step 4b** | — | Adversary agent writes independent script | +| **Step 4c** | — | Cross-comparison | +| Step 5 | Lean required | Lean optional | +| Quality gates | 1 script ≥5000 checks | 2 independent scripts ≥5000 each + cross-comparison | +| Deliverables | 3 files (Typst + Python + Lean) | 3 files (Typst + constructor Python + adversary Python) | + +## Cost + +- Compute: ~2x (two agents instead of one) +- Time: ~1.5x (adversary runs in parallel via subagent) +- Complexity: moderate (orchestrator manages two outputs) + +Worth it: a wrong reduction that passes review is far more expensive than 2x compute. diff --git a/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md new file mode 100644 index 000000000..1ea071e75 --- /dev/null +++ b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md @@ -0,0 +1,174 @@ +# Verify-Reduction Skill Enhancements + +**Date:** 2026-04-01 +**Status:** Draft +**Context:** Lessons from running `/verify-reduction` on 3 reductions (#841 NAE→SS, #973 SS→Partition, #198 VC→HC) and comparison with PR #975's verification approach. + +## Problem + +The verify-reduction skill (PR #979) produces high-quality mathematical verification (Typst proofs + dual Python scripts), but: + +1. **Typst ↔ Python matching is manual.** The human writes examples in both places and hopes they agree. No automated cross-check. +2. **Gap analysis is manual.** The skill asks the agent to list every Typst claim and map it to a test — this is done as a text block, not verified programmatically. +3. **Verification is disconnected from Rust implementation.** The Python scripts verify the math, but when `/add-rule` implements the Rust code, it re-reads the issue from scratch. Verified artifacts aren't consumed downstream. +4. **No compositional testing.** The strongest verification — testing via alternative `pred` reduction paths — only becomes possible after the Rust implementation exists. The current skill doesn't set this up. + +## Design + +### Pipeline Integration + +The verify-reduction skill is a **pre-verification gate** in a larger pipeline: + +``` +Issue → /verify-reduction → /add-rule → /review-pipeline + (Typst + Python) (Rust impl) (agentic test) +``` + +The Python `reduce()` function is the verified spec. `/add-rule` translates it to Rust. `/review-pipeline`'s agentic test (`pred reduce`, `pred solve`) confirms the Rust matches. + +### Enhancement 1: Typst ↔ Python Auto-Matching + +#### Example-level (mandatory) + +The constructor script exports test vectors to a JSON sidecar file: + +``` +docs/paper/verify-reductions/test_vectors__.json +``` + +Contents: +```json +{ + "source": "SubsetSum", + "target": "Partition", + "yes_instance": { + "input": {"sizes": [3, 5, 7, 1, 4], "target": 8}, + "output": {"sizes": [3, 5, 7, 1, 4, 4]}, + "source_feasible": true, + "target_feasible": true, + "source_solution": [0, 1, 1, 0, 0], + "extracted_solution": [0, 1, 1, 0, 0] + }, + "no_instance": { + "input": {"sizes": [3, 7, 11], "target": 5}, + "output": {"sizes": [3, 7, 11, 11]}, + "source_feasible": false, + "target_feasible": false + }, + "overhead": { + "num_elements": "num_elements + 1" + } +} +``` + +A new validation step (Step 4.5) loads both the test vectors JSON and the Typst file, then checks that key numerical values from the JSON (input sizes, targets, output sizes, feasibility verdicts) appear in the Typst YES/NO example sections. The check is substring-based on the rendered numbers — not a full Typst parser. This catches divergence between the proof and the scripts (e.g., Typst says `S = {3, 5, 7}` but Python tests `S = {3, 5, 8}`). + +The same JSON is consumed downstream: +- `/add-rule` reads it to generate Rust `#[test]` closed-loop cases +- `/review-pipeline`'s agentic test uses it for `pred reduce`/`pred solve` verification + +#### Claim-level (best-effort) + +The constructor script emits structured claim tags: + +```python +claim("overhead_universe_size", "2*n", verified=True) +claim("forward_case2", "T + d = Sigma - T", verified=True) +claim("extraction_opposite_side", "sigma < 2T => opposite side from padding", verified=True) +``` + +Implementation: a `claims` list accumulated during `main()`, exported alongside the test vectors. The self-review step checks: every claim keyword should plausibly map to a Typst proof section. Claims without `verified=True` get flagged. This replaces the manual gap analysis table. + +### Enhancement 2: Adversary Tailoring by Reduction Type + +The adversary prompt is currently generic. Tailor it by reduction type: + +| Type | Adversary Focus | Rationale | +|------|----------------|-----------| +| Identity (same graph, different objective) | Exhaustive enumeration, edge-case configs | Bugs hide in subtle objective differences | +| Algebraic (padding, complement, case split) | Symbolic case boundaries, off-by-one in padding | SS→Partition had 3 cases; boundary conditions are error-prone | +| Gadget (widget construction) | Widget structure, traversal patterns, connectivity | VC→HC: the 3 traversal patterns and cross-edge positions are the core invariant | + +The skill detects type from the issue description (gadget keywords: "widget", "component", "gadget"; algebraic keywords: "padding", "complement", "case"; identity: everything else) and adjusts the adversary prompt template. + +### Enhancement 3: Downstream Integration with add-rule + +When `/add-rule` is invoked for a reduction that has verified artifacts: + +1. **Check for existing verification:** Look for `docs/paper/verify-reductions/verify__.py` and `test_vectors__.json`. +2. **Read Python `reduce()` as pseudocode:** The verified Python function is the spec for the Rust `reduce_to()` implementation. The agent translates it directly. +3. **Read overhead from test vectors JSON:** The `overhead` field maps directly to `#[reduction(overhead = {...})]` expressions. +4. **Generate Rust tests from test vectors:** The YES/NO instances become closed-loop test cases: construct source → reduce → solve target → extract → verify. +5. **Read Typst proof for documentation:** The proof's Construction section becomes the doc comment on the Rust impl. + +This eliminates re-derivation from the issue. The verified artifacts are the single source of truth. + +### Enhancement 4: Compositional Verification in review-pipeline + +After the Rust implementation exists, review-pipeline's agentic test gains a compositional check: + +1. Load test vectors from `test_vectors__.json` +2. For each test vector, run `pred solve ` via the new path +3. If an alternative path exists (e.g., both `NAE→SS→ILP` and `NAE→ILP`), run via both and compare +4. Disagreement = bug in either the new reduction or the test vector + +This is the strongest verification because it tests the actual Rust code through independent code paths. It happens naturally in review-pipeline's existing agentic test step — we just need the test vectors file to drive it. + +## Changes to SKILL.md + +### New Step 4.5: Auto-Matching Validation + +After the constructor passes (Step 4), before the adversary (Step 5): + +1. Constructor exports `test_vectors__.json` with YES/NO instances, overhead, and claims +2. Parse the Typst proof for numerical values in example sections +3. Verify the Typst examples match the JSON test vectors +4. Verify all `claim()` tags have `verified=True` +5. Flag any untested claims + +### Updated Step 5: Typed Adversary Prompt + +The adversary prompt template includes a reduction-type section: + +- **Identity reductions:** "Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration." +- **Algebraic reductions:** "Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently." +- **Gadget reductions:** "Focus on widget structure invariants. Verify each traversal pattern independently. Test that interior vertices have no external edges. Check connectivity after removing widgets." + +### Updated Step 8: Downstream Artifacts + +The commit includes the test vectors JSON alongside the Typst and Python files: + +```bash +git add docs/paper/verify-reductions/test_vectors__.json +``` + +The PR description notes: "Test vectors available for `/add-rule` to consume when implementing the Rust reduction." + +### Updated Integration Section + +``` +- **Before `/add-rule`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-rule`**: reads Python `reduce()` as pseudocode, overhead from JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances, compositional check via alternative paths if available +``` + +## Non-Goals + +- **No Rust code generation.** verify-reduction stays in Typst + Python. Rust translation is add-rule's job. +- **No manifest format.** The Python script + test vectors JSON are the artifacts. No intermediate spec language. +- **No Lean changes.** Lean remains optional and orthogonal to this enhancement. + +## Files Changed + +1. `.claude/skills/verify-reduction/SKILL.md` — new Step 4.5, updated Step 5 and Step 8 +2. `.claude/skills/add-rule/SKILL.md` — add "check for existing verification artifacts" to Step 1 +3. `.claude/skills/review-pipeline/SKILL.md` — add compositional test vector check to agentic test step (or document as optional enhancement) + +## Success Criteria + +After implementing these enhancements, running `/verify-reduction` → `/add-rule` → `/review-pipeline` on a new reduction should: + +1. Produce a test vectors JSON consumed by both add-rule and review-pipeline +2. Auto-detect Typst ↔ Python example divergence (if introduced deliberately as a test) +3. Generate Rust tests from the same test vectors that the Python scripts verified +4. Catch Rust implementation bugs via `pred reduce`/`pred solve` on those test vectors