Claude/fix prover wiring epy ir #29
Merged
hyperpolymath merged 8 commits into main on Apr 21, 2026
Conversation
src/rust/provers/mizar_complete.rs (1320 LoC, deleted) — a second declaration of `pub struct MizarBackend` with its own `impl ProverBackend` block. The factory at provers/mod.rs:1206 dispatches only to `mizar::MizarBackend`, so `mizar_complete` was never reachable; keeping it inflated the file count (68 vs the genuine 66) and the `cargo check` surface without shipping reachable code.

src/rust/provers/cvc5.rs.stub (deleted) — a placeholder predating the real cvc5.rs implementation. Never compiled (not in the mod tree), but still audit noise.

.gitignore — ignore .claude/worktrees/, the subagent isolation-mode sandboxes that git should never track.

No factory / mod.rs changes; no change to the test count (628 pass). The remaining count discrepancy (105 ProverKind variants vs 65 backend files) is the TypeChecker-variant issue — 39+ variants declared without factory dispatch. Flagged for a separate decision.
Before: only 4 of 54 Julia extractors emitted premise records (abella, coqgym, dedukti, lean3). The other 50 declared a `premises = Dict[]` vector but never `push!`-ed to it, so the premises_COMPLETE.jsonl built by `just align-premises` carried tiny coverage — training saw 1 392 examples from 50 000 proof states (2.8 %), and MRR plateaued at the 0.66 baseline from the Mar-21 run against a 66-example floor.

This wave covers the 22 highest-leverage extractors (all the Lean / Coq / Isabelle / HOL / F* / Idris2 / ACL2 / Dafny / PVS / Mizar / Metamath / Mathcomp / Mathlib4 / AFP / Agda families plus the bounded model-checker / verifier extractors). Each gets a prover-appropriate `hyp_patterns` list (intros/have/let for Lean; DISCH_TAC/REWRITE_TAC for HOL; assume/fix/obtain for Isabelle; proof-step labels for Metamath; etc.) and a premise-push loop that shares the same `proof_id` the sibling proof_state was emitted with, so the FK join in dataloader.jl:381 finally has a match surface.

The remaining 28 extractors (model checkers, SAT/SMT benchmarks, synthetic-only provers) will land in a follow-up wave — same template, narrower signal. Next step on the hardware host: `just corpus-refresh` should now produce a premises_COMPLETE.jsonl with orders-of-magnitude more records per proof_state, and `just retrain` should move MRR off 0.66.
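The per-extractor change follows one shape. A minimal Julia sketch of it — the names `LEAN_HYP_PATTERNS`, `push_premises!`, and the record fields are illustrative, not the repo's actual API:

```julia
# Hypothetical sketch of the premise-push loop added to each extractor.
# A prover-appropriate pattern list scans the proof body for hypothesis
# introductions; each hit becomes a premise record carrying the same
# proof_id as the sibling proof_state, so the dataloader FK join matches.

const LEAN_HYP_PATTERNS = [r"\bintros?\b", r"\bhave\b", r"\blet\b"]

function push_premises!(premises::Vector{Dict{String,Any}},
                        proof_id::String, body::String,
                        patterns::Vector{Regex})
    for (lineno, line) in enumerate(split(body, '\n'))
        for pat in patterns
            if occursin(pat, line)
                push!(premises, Dict{String,Any}(
                    "proof_id" => proof_id,  # same id as the proof_state record
                    "line"     => lineno,
                    "text"     => strip(line),
                ))
                break  # at most one premise record per matching line
            end
        end
    end
    return premises
end
```

The key property is that `proof_id` is taken from the same loop iteration that emitted the proof_state, which is what gives the join in dataloader.jl:381 a match surface.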
Same template as 16e757f wave. MiniZinc constraint-model premises are variable declarations and constraint labels (vars/constraints reachable through the proof body).
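For a constraint language like MiniZinc, the "premises" are declarations rather than tactic steps. A hedged Julia sketch of what that extraction could look like — the regexes and function name are assumptions, not the repo's code:

```julia
# Hypothetical sketch: MiniZinc premises are variable declarations and
# constraint bodies pulled from the model source.
const MZN_VAR = r"^\s*(?:var|array)\b.*?:\s*([A-Za-z][A-Za-z0-9_]*)"m
const MZN_CON = r"^\s*constraint\b\s*(.*?);"m

function minizinc_premises(proof_id::String, src::String)
    prem = Dict{String,String}[]
    for m in eachmatch(MZN_VAR, src)   # declared decision variables
        push!(prem, Dict("proof_id" => proof_id, "kind" => "var",
                         "text" => m.captures[1]))
    end
    for m in eachmatch(MZN_CON, src)   # constraint bodies / labels
        push!(prem, Dict("proof_id" => proof_id, "kind" => "constraint",
                         "text" => strip(m.captures[1])))
    end
    return prem
end
```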
Single source of truth for where ECHIDNA is going. Ties together:
* the 8-stage map (ML signal → learning loop closure → interaction
honesty → distributed+fast → cross-prover semantics → sovereign
tooling → self-verified)
* row-by-row end-state targets (the flip-the-audit-table pass:
every "not present" / "partial" claim gets a concrete numerical
or structural target)
* current sprint (S1-S5: extractor wave closeout, training rerun,
typed_wasm crate extract + TypeChecker dispatch, Verisim read
paths, tactic-synthesis pilot on 5 provers)
* agent-tier guidance (Opus design, Sonnet implement, Haiku bulk)
* inheritance table mapping the existing planning docs
(FUTURE_DEVELOPMENT_ROADMAP.md, handover/PRODUCTION-WIRING-PLAN,
design/SPARK_ADOPTION_PLAN, L1/L2/L3 handover prompts, STATE.md,
STATE.a2ml) onto their stages in this document.
Going forward: decisions recorded here, not in chat; stage sub-items
get `[done YYYY-MM-DD ✓]` stamps as they land; §3 rows preserve
history (the "Today" column becomes "Done" but the "End-state"
column stays as the invariant).
… checkers/SMT/verifiers) Completes the premise-wiring pass started in 16e757f/9a897a9.

Files edited (23, the remaining un-wired extractors): nunchaku, nusmv, prism, proverif, seahorn, spin, tamarin, tlaps, tlc, typed_wasm, uppaal, viper, why3, dreal, abc, acl2s, isabelle_tropical, isabelle_zf, sat_benchmarks, smtlib, tptp, typechecker_ecosystem, opentheory.

Pattern used:
- ps/ts/pm extractors (save_common): push!(pm, Dict(proof_id=>id, …)) inside the same loop that pushes proof_states
- Standalone extractors (own save fn): add a premises vector, populate per-record, write to premises_NAME.jsonl alongside the proof_states file
- Complex a2ml extractors (smtlib, tptp): add a premises vec + JSON3 write
- Extractor-specific identifier regex for maximum semantic signal; generic \b([A-Za-z][A-Za-z0-9_']{1,40})\b fallback with a len<50 guard where the source language lacks tactically meaningful identifiers

Expected effect: premises_COMPLETE.jsonl gains ~600K–2M new rows; the proof_id join in dataloader.jl:381 now covers close to 100% of proof_states, ending the 2.8% utilisation ceiling.

https://claude.ai/code/session_01NFEuqk68SFBy49KRSSVL8N
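The generic fallback branch mentioned above can be sketched in a few lines of Julia; the function name is hypothetical, but the regex and the length guard are the ones the commit describes:

```julia
# Hypothetical sketch of the generic identifier fallback, used where the
# source language has no tactically meaningful tokens. The character class
# already caps identifiers at 41 chars; the len<50 guard from the commit
# is kept as an explicit belt-and-braces check.
const GENERIC_IDENT = r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b"

function fallback_identifiers(body::String)
    idents = String[]
    for m in eachmatch(GENERIC_IDENT, body)
        tok = String(m.captures[1])
        length(tok) < 50 && push!(idents, tok)
    end
    return unique(idents)  # one premise candidate per distinct identifier
end
```

Note the trade-off the commit flags: this fallback yields a broad, low-precision match surface, which is why extractor-specific regexes are preferred wherever the prover's syntax allows one.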
Adds two sections to .claude/CLAUDE.md:
1. "Canonical Roadmap" — points every Claude session (and Gemini)
at docs/ROADMAP.md as the source of truth, with the instruction
to record decisions there not in chat.
2. "Supervised Multi-Agent Execution" — codifies the scout pattern
as a standing project-level rule: before spawning a focused
implementation agent, the supervisor MUST run a Haiku scout
pass to clear trivial cruft. Full prompt template included
verbatim so anyone (human or model) can reproduce it.
Scope: only applies when a supervisor tier delegates to an
executor/bulk tier; pure single-file edits under ~10 lines are
exempted. Caveats spelled out: scout is not a full audit; must
not over-fix (TODOs may be load-bearing spec); is idempotent;
supervisor decides what to do with blockers before the
implementation agent launches.
Commit convention for scout-applied fixes:
chore(<scope>): scout-pass trivial cleanup ahead of <step id>
This is now standing procedure for all supervised work on echidna.
Partial scaffold for Sprint S3 crate extraction. Full implementation (lib.rs + TypeInfo routing + workspace registration) follows in the S3 Sonnet agent pass. https://claude.ai/code/session_01NFEuqk68SFBy49KRSSVL8N