Claude/fix prover wiring epy ir #29
Merged
hyperpolymath merged 8 commits into main on Apr 21, 2026
Conversation
src/rust/provers/mizar_complete.rs (1320 LoC, deleted) — a second declaration of `pub struct MizarBackend` with its own `impl ProverBackend` block. The factory at provers/mod.rs:1206 dispatches only to `mizar::MizarBackend`, so `mizar_complete` was never reachable; keeping it inflated the file count (68 vs the genuine 66) and the `cargo check` surface without shipping reachable code.

src/rust/provers/cvc5.rs.stub (deleted) — a placeholder predating the real cvc5.rs implementation. Never compiled (not in the mod tree), but still audit noise.

.gitignore — ignore .claude/worktrees/, the subagent isolation-mode sandboxes that git should never track.

No factory / mod.rs changes; no change to the test count (628 pass). The remaining count discrepancy (105 ProverKind variants vs 65 backend files) is the TypeChecker-variant issue — 39+ variants declared without factory dispatch. Flagged for a separate decision.
Before: only 4 of 54 Julia extractors emitted premise records (abella, coqgym, dedukti, lean3). The other 50 declared a `premises = Dict[]` vector but never `push!`-ed to it, so the premises_COMPLETE.jsonl built by `just align-premises` carried tiny coverage — training saw 1 392 examples from 50 000 proof states (2.8 %), and MRR plateaued at the 0.66 baseline from the Mar-21 run against a 66-example floor.

This wave covers the 22 highest-leverage extractors (all the Lean / Coq / Isabelle / HOL / F* / Idris2 / ACL2 / Dafny / PVS / Mizar / Metamath / Mathcomp / Mathlib4 / AFP / Agda families plus the bounded model-checker / verifier extractors). Each gets a prover-appropriate `hyp_patterns` list (intros/have/let for Lean; DISCH_TAC/REWRITE_TAC for HOL; assume/fix/obtain for Isabelle; proof-step labels for Metamath; etc.) and a premise-push loop that shares the same `proof_id` the sibling proof_state was emitted with, so the FK join in dataloader.jl:381 finally has a match surface.

The remaining 28 extractors (model checkers, SAT/SMT benchmarks, synthetic-only provers) will land in a follow-up wave — same template, narrower signal. Next step on the hardware host: `just corpus-refresh` should now produce a premises_COMPLETE.jsonl with orders-of-magnitude more records per proof_state, and `just retrain` should move MRR off 0.66.
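The per-extractor change follows one shape. A minimal Julia sketch of it — the names `LEAN_HYP_PATTERNS`, `push_premises!`, and the record fields are illustrative, not the repo's actual API:

```julia
# Hypothetical sketch of the premise-push loop added to each extractor.
# A prover-appropriate pattern list scans the proof body for hypothesis
# introductions; each hit becomes a premise record carrying the same
# proof_id as the sibling proof_state, so the dataloader FK join matches.

const LEAN_HYP_PATTERNS = [r"\bintros?\b", r"\bhave\b", r"\blet\b"]

function push_premises!(premises::Vector{Dict{String,Any}},
                        proof_id::String, body::String,
                        patterns::Vector{Regex})
    for (lineno, line) in enumerate(split(body, '\n'))
        for pat in patterns
            if occursin(pat, line)
                push!(premises, Dict{String,Any}(
                    "proof_id" => proof_id,  # same id as the proof_state record
                    "line"     => lineno,
                    "text"     => strip(line),
                ))
                break  # at most one premise record per matching line
            end
        end
    end
    return premises
end
```

The key property is that `proof_id` is taken from the same loop iteration that emitted the proof_state, which is what gives the join in dataloader.jl:381 a match surface.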
Same template as 16e757f wave. MiniZinc constraint-model premises are variable declarations and constraint labels (vars/constraints reachable through the proof body).
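For a constraint language like MiniZinc, the "premises" are declarations rather than tactic steps. A hedged Julia sketch of what that extraction could look like — the regexes and function name are assumptions, not the repo's code:

```julia
# Hypothetical sketch: MiniZinc premises are variable declarations and
# constraint bodies pulled from the model source.
const MZN_VAR = r"^\s*(?:var|array)\b.*?:\s*([A-Za-z][A-Za-z0-9_]*)"m
const MZN_CON = r"^\s*constraint\b\s*(.*?);"m

function minizinc_premises(proof_id::String, src::String)
    prem = Dict{String,String}[]
    for m in eachmatch(MZN_VAR, src)   # declared decision variables
        push!(prem, Dict("proof_id" => proof_id, "kind" => "var",
                         "text" => m.captures[1]))
    end
    for m in eachmatch(MZN_CON, src)   # constraint bodies / labels
        push!(prem, Dict("proof_id" => proof_id, "kind" => "constraint",
                         "text" => strip(m.captures[1])))
    end
    return prem
end
```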
Single source of truth for where ECHIDNA is going. Ties together:
* the 8-stage map (ML signal → learning loop closure → interaction
honesty → distributed+fast → cross-prover semantics → sovereign
tooling → self-verified)
* row-by-row end-state targets (the flip-the-audit-table pass:
every "not present" / "partial" claim gets a concrete numerical
or structural target)
* current sprint (S1-S5: extractor wave closeout, training rerun,
typed_wasm crate extract + TypeChecker dispatch, Verisim read
paths, tactic-synthesis pilot on 5 provers)
* agent-tier guidance (Opus design, Sonnet implement, Haiku bulk)
* inheritance table mapping the existing planning docs
(FUTURE_DEVELOPMENT_ROADMAP.md, handover/PRODUCTION-WIRING-PLAN,
design/SPARK_ADOPTION_PLAN, L1/L2/L3 handover prompts, STATE.md,
STATE.a2ml) onto their stages in this document.
Going forward: decisions recorded here, not in chat; stage sub-items
get `[done YYYY-MM-DD ✓]` stamps as they land; §3 rows preserve
history (the "Today" column becomes "Done" but the "End-state"
column stays as the invariant).
… checkers/SMT/verifiers) Completes the premise-wiring pass started in 16e757f/9a897a9.

Files edited (23, the remaining un-wired extractors): nunchaku, nusmv, prism, proverif, seahorn, spin, tamarin, tlaps, tlc, typed_wasm, uppaal, viper, why3, dreal, abc, acl2s, isabelle_tropical, isabelle_zf, sat_benchmarks, smtlib, tptp, typechecker_ecosystem, opentheory.

Pattern used:
- ps/ts/pm extractors (save_common): push!(pm, Dict(proof_id=>id, …)) inside the same loop that pushes proof_states
- Standalone extractors (own save fn): add a premises vector, populate per-record, write to premises_NAME.jsonl alongside the proof_states file
- Complex a2ml extractors (smtlib, tptp): add a premises vec + JSON3 write
- Extractor-specific identifier regex for maximum semantic signal; generic \b([A-Za-z][A-Za-z0-9_']{1,40})\b fallback with a len<50 guard where the source language lacks tactically meaningful identifiers

Expected effect: premises_COMPLETE.jsonl gains ~600K–2M new rows; the proof_id join in dataloader.jl:381 now covers close to 100% of proof_states, ending the 2.8% utilisation ceiling.

https://claude.ai/code/session_01NFEuqk68SFBy49KRSSVL8N
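The generic fallback branch mentioned above can be sketched in a few lines of Julia; the function name is hypothetical, but the regex and the length guard are the ones the commit describes:

```julia
# Hypothetical sketch of the generic identifier fallback, used where the
# source language has no tactically meaningful tokens. The character class
# already caps identifiers at 41 chars; the len<50 guard from the commit
# is kept as an explicit belt-and-braces check.
const GENERIC_IDENT = r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b"

function fallback_identifiers(body::String)
    idents = String[]
    for m in eachmatch(GENERIC_IDENT, body)
        tok = String(m.captures[1])
        length(tok) < 50 && push!(idents, tok)
    end
    return unique(idents)  # one premise candidate per distinct identifier
end
```

Note the trade-off the commit flags: this fallback yields a broad, low-precision match surface, which is why extractor-specific regexes are preferred wherever the prover's syntax allows one.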
Adds two sections to .claude/CLAUDE.md:
1. "Canonical Roadmap" — points every Claude session (and Gemini)
at docs/ROADMAP.md as the source of truth, with the instruction
to record decisions there not in chat.
2. "Supervised Multi-Agent Execution" — codifies the scout pattern
as a standing project-level rule: before spawning a focused
implementation agent, the supervisor MUST run a Haiku scout
pass to clear trivial cruft. Full prompt template included
verbatim so anyone (human or model) can reproduce it.
Scope: only applies when a supervisor tier delegates to an
executor/bulk tier; pure single-file edits under ~10 lines are
exempted. Caveats spelled out: scout is not a full audit; must
not over-fix (TODOs may be load-bearing spec); is idempotent;
supervisor decides what to do with blockers before the
implementation agent launches.
Commit convention for scout-applied fixes:
chore(<scope>): scout-pass trivial cleanup ahead of <step id>
This is now standing procedure for all supervised work on echidna.
Partial scaffold for Sprint S3 crate extraction. Full implementation (lib.rs + TypeInfo routing + workspace registration) follows in the S3 Sonnet agent pass. https://claude.ai/code/session_01NFEuqk68SFBy49KRSSVL8N