feat: add verify-reduction skill by zazabap · Pull Request #979 · CodingThrust/problem-reductions

zazabap · 2026-04-01T05:26:38Z

Summary

New skill /verify-reduction for end-to-end mathematical verification of reduction rules.

Takes an issue number, produces 3 verified artifacts, iterates until all checks pass, and submits a PR:

Typst proof — Construction + Correctness (⟹/⟸) + Extraction + Overhead + YES/NO examples
Python script — 7 mandatory sections, ≥5000 checks, exhaustive n≤5
Lean lemmas — non-trivial structural proofs (not just rfl)
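As an illustration of the kind of Python script described above, here is a minimal sketch for the SAT → NON-TAUTOLOGY case (issue #868). It assumes the textbook construction, which negates the CNF by De Morgan to get a DNF. That may not be what the generated script actually does. The literal encoding and the helper names `reduce`, `extract_solution`, `check_instance`, and `run` are hypothetical. Formulas are drawn from a seeded random generator, and every assignment is enumerated for n ≤ 5, so the check count is deterministic.

```python
import itertools
import random

# Assumed encoding: literal +i is x_i, -i is NOT x_i, variables numbered 1..n.
# A formula is a list of clauses (CNF) or terms (DNF), each a list of literals.

def lit_true(lit, assignment):
    return assignment[abs(lit)] == (lit > 0)

def eval_cnf(cnf, assignment):
    # CNF is true iff every clause contains a true literal.
    return all(any(lit_true(l, assignment) for l in clause) for clause in cnf)

def eval_dnf(dnf, assignment):
    # DNF is true iff some term has all of its literals true.
    return any(all(lit_true(l, assignment) for l in term) for term in dnf)

def reduce(cnf):
    # Hypothetical construction: NOT(phi) as a DNF whose terms are the
    # clauses of phi with every literal flipped (De Morgan).
    return [[-l for l in clause] for clause in cnf]

def extract_solution(falsifying_assignment):
    # An assignment falsifying NOT(phi) satisfies phi, so extraction is the identity.
    return dict(falsifying_assignment)

def all_assignments(n):
    for bits in itertools.product([False, True], repeat=n):
        yield dict(zip(range(1, n + 1), bits))

def check_instance(cnf, n):
    """Run 1 + 2**n checks on one formula and return how many ran."""
    dnf = reduce(cnf)
    # Forward and backward direction: phi is satisfiable iff NOT(phi) is not a tautology.
    source_yes = any(eval_cnf(cnf, a) for a in all_assignments(n))
    target_yes = any(not eval_dnf(dnf, a) for a in all_assignments(n))
    assert source_yes == target_yes
    checks = 1
    # Extraction, checked pointwise on every assignment.
    for a in all_assignments(n):
        assert (not eval_dnf(dnf, a)) == eval_cnf(cnf, extract_solution(a))
        checks += 1
    return checks

def random_cnf(rng, n):
    return [
        [v if rng.random() < 0.5 else -v
         for v in rng.sample(range(1, n + 1), rng.randint(1, n))]
        for _ in range(rng.randint(1, 2 * n))
    ]

def run(per_n=200, seed=0):
    rng = random.Random(seed)
    total = 0
    for n in range(1, 6):  # all 2**n assignments are enumerated for n <= 5
        for _ in range(per_n):
            total += check_instance(random_cnf(rng, n), n)
    return total

if __name__ == "__main__":
    total = run()
    assert total >= 5000, total
    print(f"{total} checks passed")
```

Each formula contributes 1 + 2^n checks. With 200 formulas per n the run performs 200 × (3 + 5 + 9 + 17 + 33) = 13,400 checks, above the 5,000 floor.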

Strict quality gates

No "trivial" category — every reduction gets ≥5,000 checks
Two examples required (YES + NO instance, both ≥3 variables)
Zero hand-waving ("clearly", "obviously" → rejected)
Mandatory gap analysis with claim-to-test mapping
Self-review checklist: 20+ items across Typst/Python/Lean/cross-consistency
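The hand-waving and gap-analysis gates lend themselves to mechanical checks. Here is a minimal sketch. It assumes the Typst proof is available as plain text and that the gap analysis is kept as a mapping from each claim to the names of the tests that cover it. The PR does not specify either format, and the phrase list contains only the two words it names.

```python
import re

# Only "clearly" and "obviously" come from the PR text; extend as needed.
BANNED_PHRASES = ["clearly", "obviously"]

def find_hand_waving(typst_source):
    """Return (line_number, phrase) pairs for banned phrases (whole words, any case)."""
    hits = []
    for lineno, line in enumerate(typst_source.splitlines(), start=1):
        for phrase in BANNED_PHRASES:
            if re.search(rf"\b{re.escape(phrase)}\b", line, flags=re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits

def untested_claims(claim_to_tests):
    """Gap analysis: claims that map to no test at all."""
    return sorted(claim for claim, tests in claim_to_tests.items() if not tests)

proof = "Construction.\nThe map is clearly injective.\nObviously the size is linear."
print(find_hand_waving(proof))  # [(2, 'clearly'), (3, 'obviously')]
print(untested_claims({
    "forward direction": ["test_forward_exhaustive"],
    "backward direction": ["test_backward_exhaustive"],
    "overhead": [],
}))  # ['overhead']
```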

Validation

Developed through PR #975 (9 reductions, 800K+ checks, 3 bugs caught by Python verification). Tested on:

Issue #868, [Rule] SATISFIABILITY to NON-TAUTOLOGY (SAT → NonTautology): caught a wrong example assignment
Issue #841, [Rule] NOT-ALL-EQUAL 3SAT to SET SPLITTING (NAE-SAT → SetSplitting): 35K checks; scratch work caught during self-review

Files

.claude/skills/verify-reduction/SKILL.md — the skill definition
.claude/CLAUDE.md — registration entry added

Test plan

Skill validated on issue #868 ([Rule] SATISFIABILITY to NON-TAUTOLOGY) in PR #975 (docs: add proposed reduction rules verification note (9 reductions))
Skill validated on issue #841 ([Rule] NOT-ALL-EQUAL 3SAT to SET SPLITTING) in PR #978 (NAESatisfiability → SetSplitting)
Strict mode caught issues that lenient mode missed (closed PR #977, also NAESatisfiability → SetSplitting)

🤖 Generated with Claude Code

Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198), 4 Tier 1a blocked issues (#379, #380, #888, #822), and 3 Tier 1b blocked issues (#892, #894, #890). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…uctions
New skill: /verify-reduction <issue-number>
End-to-end pipeline that takes a reduction rule issue and produces:
1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples)
2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5)
3. Lean 4 lemmas (non-trivial structural proofs required)
Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR.
Strict quality gates (zero tolerance):
- No "trivial" category: every reduction ≥5000 checks
- 7 mandatory Python sections including NO (infeasible) example
- Non-trivial Lean required (rfl/omega tautologies rejected)
- Zero hand-waving in Typst ("clearly", "obviously" → rejected)
- Mandatory gap analysis: every proof claim must have a test
- Self-review checklist with 20+ items across 4 categories
Developed and validated through PR #975 (800K+ checks, 3 bugs caught) and tested on issues #868 (caught wrong example) and #841 (35K checks).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-01T05:31:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.03%. Comparing base (423506c) to head (62a26fa).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #979   +/-   ##
=======================================
  Coverage   98.03%   98.03%           
=======================================
  Files         784      784           
  Lines       82310    82310           
=======================================
+ Hits        80695    80696    +1     
+ Misses       1615     1614    -1


…kill
- Added frontmatter (name, description) matching other skills' convention
- Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP", "NON-NEGOTIABLE") to professional but firm language
- All quality gates unchanged: same strictness, better presentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replaces Lean-required gates with an adversarial second agent:
- Step 5: Adversary agent independently implements reduce() and extract_solution() from the theorem statement only (not from the constructor's script)
- Step 5c: Cross-comparison of both implementations on 1000 instances
- Lean downgraded from required to optional
- hypothesis property-based testing for n up to 50
- Quality gates: 2 independent scripts with ≥5000 checks each, plus cross-comparison
Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md):
- The same agent writing both proof and test is the #1 risk for AI verification
- Two independent implementations that agree are stronger evidence than one implementation plus trivial Lean
- Lean caught 0 bugs in PR #975; Python caught all 4
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
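Here is a sketch of the Step 5c cross-comparison, using NAE-3SAT → Set Splitting (issue #841). It uses the standard construction: one element per literal, one set per clause, and one set {x_i, ¬x_i} per variable. The two `reduce_*` functions stand in for the constructor's and the adversary's independent implementations, and they deliberately use different element labelings. All names are hypothetical. A seeded `random.Random` generator stands in for the hypothesis property-based tests. Both targets are solved by brute force, and the answers are compared against the brute-forced source answer.

```python
import itertools
import random

# Assumed encoding: literal +i is x_i, -i is NOT x_i, variables 1..n.

def nae_satisfiable(n, clauses):
    """Brute force: some assignment gives every clause a true and a false literal."""
    for bits in itertools.product([False, True], repeat=n):
        if all(len({bits[abs(l) - 1] == (l > 0) for l in c}) == 2 for c in clauses):
            return True
    return False

def set_splittable(universe, sets):
    """Brute force: some 2-coloring of the universe leaves no set monochromatic."""
    index = {e: k for k, e in enumerate(universe)}
    for colors in itertools.product([0, 1], repeat=len(universe)):
        if all(len({colors[index[e]] for e in s}) == 2 for s in sets):
            return True
    return False

def reduce_constructor(n, clauses):
    # Elements are the literals themselves; one set per clause plus {x_i, NOT x_i}.
    universe = [l for i in range(1, n + 1) for l in (i, -i)]
    sets = [frozenset(c) for c in clauses]
    sets += [frozenset({i, -i}) for i in range(1, n + 1)]
    return universe, sets

def reduce_adversary(n, clauses):
    # Written independently with another labeling: x_i -> 2(i-1), NOT x_i -> 2(i-1)+1.
    def element(lit):
        return 2 * (abs(lit) - 1) + int(lit < 0)
    universe = list(range(2 * n))
    sets = [frozenset(element(l) for l in c) for c in clauses]
    sets += [frozenset({2 * i, 2 * i + 1}) for i in range(n)]
    return universe, sets

def random_nae3sat(rng):
    n = rng.randint(3, 4)
    clauses = [[v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), 3)]
               for _ in range(rng.randint(1, 6))]
    return n, clauses

def cross_compare(num_instances=1000, seed=0):
    """Return instances where the source and the two target verdicts disagree."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(num_instances):
        n, clauses = random_nae3sat(rng)
        expected = nae_satisfiable(n, clauses)
        a = set_splittable(*reduce_constructor(n, clauses))
        b = set_splittable(*reduce_adversary(n, clauses))
        if not (a == b == expected):
            disagreements.append((n, clauses))
    return disagreements

if __name__ == "__main__":
    print(f"{len(cross_compare())} disagreements")
```

Comparing solver verdicts rather than raw outputs lets the two implementations label elements differently while still being checked against each other. A stronger version would also cross-extract, feeding one side's target solution into the other side's `extract_solution`.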

Typst↔Python auto-matching, test vectors JSON for downstream consumption by add-rule and review-pipeline, adversary tailoring by reduction type, compositional verification via pred CLI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
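Here is a minimal sketch of exporting and re-validating test vectors for downstream consumers. The schema (`source`, `target`, `source_feasible`, `target_feasible`) is assumed for illustration and is not the skill's actual format. The two vectors are hand-checked SAT → NON-TAUTOLOGY instances, where the target DNF is the clause-wise negation of the source CNF.

```python
import json

# Hypothetical schema; field names are illustrative only.
REQUIRED_KEYS = {"source", "target", "source_feasible", "target_feasible"}

def export_test_vectors(vectors):
    """Serialize test vectors with a stable key order."""
    return json.dumps({"version": 1, "vectors": vectors}, indent=2, sort_keys=True)

def load_and_validate(text):
    """Parse exported vectors and re-check the invariant a consumer relies on."""
    vectors = json.loads(text)["vectors"]
    for k, vec in enumerate(vectors):
        missing = REQUIRED_KEYS - set(vec)
        if missing:
            raise ValueError(f"vector {k} is missing {sorted(missing)}")
        if vec["source_feasible"] != vec["target_feasible"]:
            raise ValueError(f"vector {k}: source and target feasibility differ")
    return vectors

# Literal +i is x_i, -i is NOT x_i.
VECTORS = [
    # (x1 OR x2) AND (NOT x1): satisfiable by x1=F, x2=T, which falsifies the DNF.
    {"source": [[1, 2], [-1]], "target": [[-1, -2], [1]],
     "source_feasible": True, "target_feasible": True},
    # x1 AND NOT x1: unsatisfiable; the DNF (NOT x1) OR x1 is a tautology.
    {"source": [[1], [-1]], "target": [[-1], [1]],
     "source_feasible": False, "target_feasible": False},
]

print(len(load_and_validate(export_test_vectors(VECTORS))))  # 2
```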

5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed adversary, Step 8 downstream artifacts), create add-reduction skill, register in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ary, pipeline integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…erify-reduction

…ion description Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. verify-reduction Step 1: type compatibility gate. Checks source/target Value types before proceeding; stops and comments on the issue if the types are incompatible (e.g., optimization → decision needs a K parameter).
2. add-reduction Step 7: mandatory cleanup of verification artifacts from docs/paper/verify-reductions/. Python scripts, JSON, Typst, and PDF files must not get into the library.
3. add-reduction Steps 4/4b/5: mandatory requirements from #974: canonical example in rule_builders.rs (Check 9), example-db lookup test (Check 10), paper reduction-rule entry (Check 11).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
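The type compatibility gate in item 1 can be sketched as a small decision function. The kind labels and the `has_k_parameter` flag are hypothetical stand-ins for the repository's actual Value types. Only the optimization → decision rule comes from the commit message. Escalating every other mismatch for manual review is an assumption.

```python
# Hypothetical kind labels standing in for the repository's Value types.
DECISION = "decision"
OPTIMIZATION = "optimization"

def type_gate(source_kind, target_kind, has_k_parameter=False):
    """Return None if the reduction may proceed, else a reason to post on the issue."""
    if source_kind == target_kind:
        return None
    if (source_kind, target_kind) == (OPTIMIZATION, DECISION):
        # Case named in the commit message: the decision side needs a threshold K.
        return None if has_k_parameter else "optimization -> decision needs a K parameter"
    # Assumption: any other mismatch is escalated rather than silently accepted.
    return f"{source_kind} -> {target_kind}: value types differ, manual review needed"

for pair in [(DECISION, DECISION), (OPTIMIZATION, DECISION), (DECISION, OPTIMIZATION)]:
    print(pair, type_gate(*pair))
```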

…dd-reduction verify-reduction: removed verbose templates, condensed checklists into prose, kept all requirements but removed boilerplate code blocks that the agent can derive from context. add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper Steps 1-6, and #974 requirements (Checks 9/10/11) into a single self-contained skill. No need to read 3 other skills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous rewrite over-condensed the skill, removing gates that agents need to follow: 7-section descriptions with table, minimum check count table, check count audit template, gap analysis format, common mistakes table, and self-review checklist with checkboxes. Restored: all structural gates and requirements. Kept concise: no verbose Python/Typst code templates (agent derives these). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
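Here is a sketch of a check-count audit that enforces both gates: every section must run at least one check, and the total must reach 5000. The seven section names are placeholders because this message does not list them. `CheckAudit` is an illustrative helper, not the skill's audit template.

```python
# Hypothetical section names; the skill requires seven but they are not named here.
SECTIONS = (
    "symbolic", "forward", "backward", "extraction",
    "overhead", "yes_example", "no_example",
)

class CheckAudit:
    """Counts passing checks per section and reports gate violations."""

    def __init__(self, sections=SECTIONS, min_total=5000):
        self.min_total = min_total
        self.counts = {name: 0 for name in sections}

    def check(self, section, condition, message=""):
        if section not in self.counts:
            raise KeyError(f"unknown section {section!r}")
        if not condition:
            raise AssertionError(f"[{section}] check failed: {message}")
        self.counts[section] += 1

    def problems(self):
        found = [f"section {name!r} has no checks"
                 for name, count in self.counts.items() if count == 0]
        total = sum(self.counts.values())
        if total < self.min_total:
            found.append(f"only {total} checks, need at least {self.min_total}")
        return found

    def report(self):
        lines = [f"{name:12} {count:6d}" for name, count in self.counts.items()]
        lines.append(f"{'total':12} {sum(self.counts.values()):6d}")
        return "\n".join(lines)

audit = CheckAudit()
for x in range(-2500, 2500):
    audit.check("forward", x * x >= 0, f"x={x}")
print(audit.report())
print(audit.problems())  # the six untouched sections are still flagged
```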

Steps 4, 4b, 5 now have HARD GATE labels with verification commands that check the SPECIFIC required files appear in `git diff --name-only`. Step 8 has a pre-commit gate that lists all 6 required files and blocks commit if any is missing. Root cause: subagents skipped Steps 4 (put example in rule file instead of rule_builders.rs) and 5 (skipped paper entry entirely) because the skill said "MANDATORY" but had no mechanical enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
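Here is a minimal sketch of a file gate built on `git diff --name-only`. It makes two assumptions: the diff is taken against the merge base with `main` (`main...HEAD`), and files are matched on basename. Matching full paths would be stricter, since several directories can contain a `mod.rs`. The names in the usage comment are the three given in a later commit. This message does not spell out all six required files.

```python
import subprocess
from pathlib import PurePosixPath

def changed_files(base="main"):
    """Paths changed on the current branch since it diverged from `base` (runs git)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]

def missing_required(changed, required_names):
    """Required file names (compared by basename) absent from the changed paths."""
    present = {PurePosixPath(path).name for path in changed}
    return sorted(set(required_names) - present)

# Usage inside a gate (requires a git checkout):
#   missing = missing_required(changed_files(), ["example_db.rs", "reductions.typ", "mod.rs"])
#   if missing:
#       raise SystemExit(f"HARD GATE failed, missing: {missing}")
```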

Root cause: PRs #985 and #991 failed CI because:
1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop)
2. New reductions can create paths that dominate existing direct reductions (test_find_dominated_rules_returns_known_set has a hardcoded known set)
Added to Step 6:
- Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`)
- Mandatory `cargo test` (full suite, not filtered)
- Explicit dominated-rules gate with fix instructions
Added to Common Mistakes:
- clippy without -D warnings
- dominated rules test
- skipping full cargo test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
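A simplified sketch shows why a new rule can break `test_find_dominated_rules_returns_known_set`. A direct rule u → v becomes a dominance candidate once some other path u → … → v exists. The sketch ignores overhead comparison, which the real analysis presumably also performs, and the problem names are placeholders. The clippy gate itself is just `cargo clippy -- -D warnings`. That flag turns every warning, such as `needless_range_loop`, into an error, as CI does.

```python
from collections import defaultdict

def dominated_candidates(edges):
    """Direct rules (u, v) that also have an alternative path u -> ... -> v of length >= 2.

    Simplified: real dominance presumably also compares overheads, which this ignores.
    """
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    result = set()
    for u, v in edges:
        # Start from u's other successors so the direct edge u -> v is never used.
        stack = [w for w in graph[u] if w != v]
        seen = set(stack) | {u}
        while stack:
            node = stack.pop()
            if node == v:
                result.add((u, v))
                break
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return result

before = [("A", "C"), ("B", "C")]
after = before + [("A", "B")]  # a newly added rule A -> B
print(dominated_candidates(before))  # set()
print(dominated_candidates(after))   # {('A', 'C')}
```

Adding the single rule A → B changes the dominated set even though no existing rule was touched. That is why a hardcoded known set has to be revisited whenever a reduction is added.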

rule_builders.rs is a 4-line pass-through — canonical examples are registered via canonical_rule_example_specs() in each rule file, wired through mod.rs. Updated Step 4 to match actual architecture. Also added analysis.rs to git add list (for dominated-rules updates). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two enforcement mechanisms that don't rely on subagent compliance:
1. Parent-side verification (Step 8a): After the subagent reports DONE, the parent runs the file gate checks independently. If any required file is missing, it sends the subagent back instead of trusting the self-report.
2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh): Mechanically blocks commits of new rule files unless example_db.rs, reductions.typ, and mod.rs are also staged. Subagents cannot bypass it.
Root cause: subagents skip HARD GATE steps even though the skill text says "MANDATORY". Text-based enforcement doesn't work; the fix needs mechanical checks that run after the subagent, not instructions the subagent reads.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
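The pre-commit hook in the PR is a shell script. The following Python sketch implements the same rule so it can be tested in isolation. `RULES_DIR` and the test for what counts as a rule file are assumptions about the repository layout. The companion file names come from the commit message.

```python
import subprocess
import sys

RULES_DIR = "src/rules/"  # hypothetical location of rule files
COMPANIONS = ("example_db.rs", "reductions.typ", "mod.rs")  # named in the commit message

def blocking_reason(name_status_lines):
    """Given `git diff --cached --name-status` lines, return why to block, or None."""
    new_rules = []
    staged_names = set()
    for line in name_status_lines:
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        status, path = fields[0], fields[-1]  # for renames the last field is the new path
        staged_names.add(path.rsplit("/", 1)[-1])
        is_rule_file = (path.startswith(RULES_DIR) and path.endswith(".rs")
                        and not path.endswith("/mod.rs"))
        if status == "A" and is_rule_file:
            new_rules.append(path)
    if not new_rules:
        return None
    missing = [name for name in COMPANIONS if name not in staged_names]
    if not missing:
        return None
    return f"new rule file(s) {new_rules} staged without: {', '.join(missing)}"

def main():
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-status"],
        check=True, capture_output=True, text=True,
    ).stdout
    reason = blocking_reason(out.splitlines())
    if reason:
        print(f"pre-commit: {reason}", file=sys.stderr)
        return 1  # a non-zero exit status makes git abort the commit
    return 0
```

If installed as a git `pre-commit` hook that ends with `sys.exit(main())`, a non-zero return aborts the commit. Keeping the decision in a pure function such as `blocking_reason` makes the rule testable without a repository.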

zazabap and others added 2 commits March 31, 2026 16:13

zazabap and others added 14 commits April 1, 2026 05:34

feat: enhance verify-reduction with test vectors export, typed advers…

d5b7821


feat: create add-reduction skill — consumes verified artifacts from v…

7d8c417


feat: register add-reduction skill in CLAUDE.md, update verify-reduct…

022c1c0


zazabap wants to merge 16 commits into main from feat/verify-reduction-skill
