
feat: add verify-reduction skill #979

Open
zazabap wants to merge 16 commits into main from feat/verify-reduction-skill


@zazabap (Collaborator) commented Apr 1, 2026

Summary

Adds a new skill, /verify-reduction, for end-to-end mathematical verification of reduction rules.

Takes an issue number, produces 3 verified artifacts, iterates until all checks pass, and submits a PR:

  1. Typst proof — Construction + Correctness (⟹/⟸) + Extraction + Overhead + YES/NO examples
  2. Python script — 7 mandatory sections, ≥5000 checks, exhaustive n≤5
  3. Lean lemmas — non-trivial structural proofs (not just rfl)
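To illustrate what "exhaustive n≤5" means for the Python artifact, here is a minimal sketch that checks a textbook stand-in reduction, Independent Set (G, k) → Vertex Cover (G, n − k), on every graph with at most 5 vertices. The names `reduce`, `extract_solution` and `verify_exhaustive` are hypothetical, and the sketch covers only the correctness and extraction checks, not all 7 mandatory sections.

```python
from itertools import combinations


def all_graphs(n):
    """Yield every simple graph on vertices 0..n-1 as an edge list."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield [p for i, p in enumerate(pairs) if (mask >> i) & 1]


def subsets(n):
    """Yield every subset of {0, ..., n-1} as a set."""
    for mask in range(1 << n):
        yield {v for v in range(n) if (mask >> v) & 1}


def is_independent(edges, s):
    return all(not (u in s and v in s) for u, v in edges)


def is_cover(edges, c):
    return all(u in c or v in c for u, v in edges)


def reduce(n, edges, k):
    """Hypothetical reduction: Independent Set (G, k) -> Vertex Cover (G, n - k)."""
    return n, edges, n - k


def extract_solution(n, cover):
    """Map a target vertex cover back to a source independent set (its complement)."""
    return set(range(n)) - cover


def verify_exhaustive(max_n=5):
    """Brute-force every instance up to max_n vertices; return the number of checks."""
    checks = 0
    for n in range(1, max_n + 1):
        for edges in all_graphs(n):
            subs = list(subsets(n))
            alpha = max(len(s) for s in subs if is_independent(edges, s))
            covers = [c for c in subs if is_cover(edges, c)]
            tau = min(len(c) for c in covers)
            for k in range(n + 1):
                _, _, target_k = reduce(n, edges, k)
                # Correctness (both directions): source is YES exactly when target is YES.
                assert (alpha >= k) == (tau <= target_k), (n, edges, k)
                checks += 1
            for c in covers:
                # Extraction: every target solution maps to a valid source solution.
                s = extract_solution(n, c)
                assert is_independent(edges, s) and len(s) == n - len(c)
                checks += 1
    return checks


if __name__ == "__main__":
    print(f"{verify_exhaustive()} checks passed")
```

At n = 5 alone there are 1,024 graphs × 6 thresholds, so the correctness checks by themselves already clear the 5,000-check bar.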

Strict quality gates

  • No "trivial" category — every reduction gets ≥5,000 checks
  • Two examples required (YES + NO instance, both ≥3 variables)
  • Zero hand-waving ("clearly", "obviously" → rejected)
  • Mandatory gap analysis with claim-to-test mapping
  • Self-review checklist: 20+ items across Typst/Python/Lean/cross-consistency
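The "zero hand-waving" gate can be enforced mechanically rather than by reading. A minimal sketch, assuming the proof is available as plain Typst source text; only "clearly" and "obviously" come from the skill's rules, and the other phrases in `BANNED` are illustrative additions.

```python
import re

# "clearly" and "obviously" come from the skill's rules; the rest are illustrative.
BANNED = ("clearly", "obviously", "trivially", "it is easy to see")
PATTERN = re.compile(r"\b(" + "|".join(re.escape(p) for p in BANNED) + r")\b", re.IGNORECASE)


def find_hand_waving(typst_source):
    """Return (line_number, phrase) pairs for every banned phrase in the proof."""
    hits = []
    for lineno, line in enumerate(typst_source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits


proof = "Clearly, the map is injective.\nBy Lemma 2, every clause is satisfied."
print(find_hand_waving(proof))  # [(1, 'clearly')]
```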

Validation

Developed through PR #975 (9 reductions, 800K+ checks, 3 bugs caught by Python verification). Tested on:

Files

  • .claude/skills/verify-reduction/SKILL.md — the skill definition
  • .claude/CLAUDE.md — registration entry added

Test plan

🤖 Generated with Claude Code

zazabap and others added 2 commits March 31, 2026 16:13
Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198),
4 Tier 1a blocked issues (#379, #380, #888, #822), and
3 Tier 1b blocked issues (#892, #894, #890).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uctions

New skill: /verify-reduction <issue-number>

End-to-end pipeline that takes a reduction rule issue and produces:
1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples)
2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5)
3. Lean 4 lemmas (non-trivial structural proofs required)

Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR.

Strict quality gates (zero tolerance):
- No "trivial" category — every reduction ≥5000 checks
- 7 mandatory Python sections including NO (infeasible) example
- Non-trivial Lean required (rfl/omega tautologies rejected)
- Zero hand-waving in Typst ("clearly", "obviously" → rejected)
- Mandatory gap analysis: every proof claim must have a test
- Self-review checklist with 20+ items across 4 categories

Developed and validated through PR #975 (800K+ checks, 3 bugs caught)
and tested on issues #868 (caught wrong example) and #841 (35K checks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.03%. Comparing base (423506c) to head (62a26fa).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #979   +/-   ##
=======================================
  Coverage   98.03%   98.03%           
=======================================
  Files         784      784           
  Lines       82310    82310           
=======================================
+ Hits        80695    80696    +1     
+ Misses       1615     1614    -1     


zazabap and others added 14 commits April 1, 2026 05:34
…kill

- Added frontmatter (name, description) matching other skills' convention
- Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP",
  "NON-NEGOTIABLE") to professional but firm language
- All quality gates unchanged — same strictness, better presentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces Lean-required gates with an adversarial second agent:

- Step 5: Adversary agent independently implements reduce() and
  extract_solution() from theorem statement only (not constructor's script)
- Step 5c: Cross-comparison of both implementations on 1000 instances
- Lean downgraded from required to optional
- Property-based testing with `hypothesis` for n up to 50
- Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison
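Step 5c's cross-comparison amounts to running two independently written reductions on the same random instances and collecting every disagreement. A minimal sketch reusing a hypothetical Independent Set → Vertex Cover example; it uses the standard-library `random` module instead of `hypothesis` to stay dependency-free, and all function names are assumptions.

```python
import random


def reduce_constructor(n, edges, k):
    """Constructor's implementation (hypothetical IS -> VC reduction)."""
    return n, sorted(edges), n - k


def reduce_adversary(n, edges, k):
    """Adversary's implementation, written from the theorem statement only."""
    normalized = sorted((min(u, v), max(u, v)) for u, v in edges)
    return n, normalized, n - k


def random_instance(rng, max_n=8):
    n = rng.randint(1, max_n)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.4]
    return n, edges, rng.randint(0, n)


def cross_compare(adversary=reduce_adversary, trials=1000, seed=0):
    """Run both implementations on the same instances; return every disagreement."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        instance = random_instance(rng)
        ours, theirs = reduce_constructor(*instance), adversary(*instance)
        if ours != theirs:
            mismatches.append((instance, ours, theirs))
    return mismatches


print(len(cross_compare()))  # 0
```

Passing a deliberately buggy adversary (for example one that returns `n - k + 1`) makes every trial show up as a mismatch, which is the point of keeping the two implementations independent.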

Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md):
- Same agent writing proof + test is the #1 risk for AI verification
- Two independent implementations agreeing > one + trivial Lean
- Lean caught 0 bugs in PR #975; Python caught all 4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Typst↔Python auto-matching, test vectors JSON for downstream
consumption by add-rule and review-pipeline, adversary tailoring
by reduction type, compositional verification via pred CLI.
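One plausible shape for the test-vectors JSON is a list of source/target pairs that downstream skills such as add-rule or review-pipeline could load with `json.load`. The field names, the rule label, and the stand-in reduction below are hypothetical; the commit does not specify the schema.

```python
import json


def reduce_is_to_vc(n, edges, k):
    """Hypothetical stand-in reduction: Independent Set -> Vertex Cover."""
    return n, edges, n - k


def make_test_vectors(instances, reduce_fn, rule_name):
    """Serialize source/target pairs; the schema here is illustrative only."""
    vectors = []
    for n, edges, k in instances:
        tn, tedges, tk = reduce_fn(n, edges, k)
        vectors.append({
            "source": {"num_vertices": n, "edges": [list(e) for e in edges], "k": k},
            "target": {"num_vertices": tn, "edges": [list(e) for e in tedges], "k": tk},
        })
    return json.dumps({"rule": rule_name, "vectors": vectors}, indent=2)


doc = make_test_vectors([(3, [(0, 1), (1, 2)], 2)], reduce_is_to_vc, "IndependentSet->VertexCover")
print(json.loads(doc)["vectors"][0]["target"]["k"])  # 1
```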

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed
adversary, Step 8 downstream artifacts), create add-reduction skill,
register in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ary, pipeline integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion description

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. verify-reduction Step 1: type compatibility gate — checks source/target
   Value types before proceeding. Stops and comments on issue if types are
   incompatible (e.g., optimization → decision needs K parameter).

2. add-reduction Step 7: mandatory cleanup of verification artifacts from
   docs/paper/verify-reductions/. Python scripts, JSON, Typst, and PDF files
   must not be committed to the library.

3. add-reduction Steps 4/4b/5: mandatory requirements from #974 —
   canonical example in rule_builders.rs (Check 9), example-db lookup
   test (Check 10), paper reduction-rule entry (Check 11).
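The type compatibility gate in item 1 can be modeled as a small decision function. This is a sketch under assumptions: `ValueKind`, `type_gate`, and the "review" outcome for other mismatches are hypothetical; only the rule that optimization → decision needs a K parameter comes from the commit message.

```python
from enum import Enum


class ValueKind(Enum):
    DECISION = "decision"          # instance has a YES/NO answer
    OPTIMIZATION = "optimization"  # instance has an objective to minimize or maximize


def type_gate(source, target, k_available):
    """Return "proceed", "stop" (comment on the issue), or "review"."""
    if source is target:
        return "proceed"
    if source is ValueKind.OPTIMIZATION and target is ValueKind.DECISION:
        # The commit's example: the decision target needs a threshold K
        # that an optimization source does not supply by itself.
        return "proceed" if k_available else "stop"
    # Assumption: any other mismatch is left for a human to judge.
    return "review"
```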

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dd-reduction

verify-reduction: removed verbose templates, condensed checklists into
prose, kept all requirements but removed boilerplate code blocks that
the agent can derive from context.

add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper
Steps 1-6, and #974 requirements (Checks 9/10/11) into a single
self-contained skill. No need to read 3 other skills.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous rewrite over-condensed the skill, removing gates that agents
need to follow: 7-section descriptions with table, minimum check count
table, check count audit template, gap analysis format, common mistakes
table, and self-review checklist with checkboxes.

Restored: all structural gates and requirements.
Kept concise: no verbose Python/Typst code templates (agent derives these).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Steps 4, 4b, 5 now have HARD GATE labels with verification commands
that check the SPECIFIC required files appear in `git diff --name-only`.
Step 8 has a pre-commit gate that lists all 6 required files and blocks
commit if any is missing.
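Each HARD GATE reduces to comparing the output of `git diff --name-only` against a list of required files. A minimal sketch; the function name and the suffix-matching rule are assumptions, and the required list is passed in because the exact files per step are defined in the skill itself. The example paths are hypothetical.

```python
def missing_required(diff_name_only, required):
    """Return the required files that do not appear in `git diff --name-only` output.

    A required entry matches a changed path if it equals the path or is its
    trailing component(s), e.g. "reductions.typ" matches "docs/paper/reductions.typ".
    """
    changed = [line.strip() for line in diff_name_only.splitlines() if line.strip()]
    return [
        req for req in required
        if not any(path == req or path.endswith("/" + req) for path in changed)
    ]


diff = "src/rules/foo_to_bar.rs\ndocs/paper/reductions.typ\n"
print(missing_required(diff, ["reductions.typ", "example_db.rs"]))  # ['example_db.rs']
```

In a real gate the input would come from something like `subprocess.run(["git", "diff", "--name-only", "main"], capture_output=True, text=True).stdout`, and a non-empty result blocks the commit.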

Root cause: subagents skipped Steps 4 (put example in rule file instead
of rule_builders.rs) and 5 (skipped paper entry entirely) because the
skill said "MANDATORY" but had no mechanical enforcement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: PRs #985 and #991 failed CI because:
1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop)
2. New reductions can create paths that dominate existing direct reductions
   (test_find_dominated_rules_returns_known_set has hardcoded known set)

Added to Step 6:
- Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`)
- Mandatory `cargo test` (full suite, not filtered)
- Explicit dominated-rules gate with fix instructions

Added to Common Mistakes:
- clippy without -D warnings
- dominated rules test
- skipping full cargo test
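Why a new reduction can break a hardcoded known set: under a simplified model (an assumption; the real logic in analysis.rs may also weigh overhead), a direct rule A → C counts as dominated once another chain of two or more rules also leads from A to C. Adding B → C next to existing rules A → B and A → C then makes A → C newly dominated, so the expected set in the test must be updated. The function names are hypothetical.

```python
from collections import defaultdict


def reachable_without(graph, start, goal, skip_edge):
    """Depth-first search from start to goal that ignores the single edge skip_edge."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            if nxt == goal:
                return True
            seen.add(nxt)
            stack.append(nxt)
    return False


def dominated_rules(edges):
    """Direct rules (a, b) for which some other path from a to b exists."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    return {(a, b) for a, b in edges if reachable_without(graph, a, b, (a, b))}


print(dominated_rules([("A", "B"), ("A", "C")]))              # set()
print(dominated_rules([("A", "B"), ("A", "C"), ("B", "C")]))  # {('A', 'C')}
```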

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rule_builders.rs is a 4-line pass-through — canonical examples are
registered via canonical_rule_example_specs() in each rule file,
wired through mod.rs. Updated Step 4 to match actual architecture.

Also added analysis.rs to git add list (for dominated-rules updates).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two enforcement mechanisms that don't rely on subagent compliance:

1. Parent-side verification (Step 8a): After subagent reports DONE,
   the parent runs file gate checks independently. If any required
   file is missing, sends the subagent back — doesn't trust self-report.

2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh):
   Mechanically blocks commits of new rule files unless example_db.rs,
   reductions.typ, and mod.rs are also staged. Subagents cannot bypass.
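The hook's rule can be modeled in a few lines. The real hook is a shell script; this Python sketch only illustrates the logic by parsing `git diff --cached --name-status` output. `RULE_DIR` is a hypothetical path, and matching companion files by basename is a simplification.

```python
RULE_DIR = "src/rules/"  # hypothetical location of rule files
REQUIRED_COMPANIONS = ("example_db.rs", "reductions.typ", "mod.rs")


def precommit_violations(name_status):
    """List companion files that must also be staged when a new rule file is added."""
    added_rules, staged = [], set()
    for line in name_status.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status, path = parts[0], parts[-1]  # renames list old and new paths; keep the new one
        staged.add(path.rsplit("/", 1)[-1])
        if status == "A" and path.startswith(RULE_DIR) and path.endswith(".rs") \
                and not path.endswith("mod.rs"):
            added_rules.append(path)
    if not added_rules:
        return []
    return [name for name in REQUIRED_COMPANIONS if name not in staged]


staged_output = "A\tsrc/rules/foo_to_bar.rs\nM\tsrc/rules/mod.rs\n"
print(precommit_violations(staged_output))  # ['example_db.rs', 'reductions.typ']
```

A non-empty result corresponds to the hook rejecting the commit, which is what makes the check independent of whether the subagent followed the skill text.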

Root cause: subagents skip HARD GATE steps despite skill text saying
"MANDATORY". Text-based enforcement doesn't work — need mechanical
checks that run after the subagent, not instructions the subagent reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>