
Claude/fix prover wiring epy ir #30

Merged
hyperpolymath merged 11 commits into main from
claude/fix-prover-wiring-epyIR
Apr 21, 2026
Conversation

@hyperpolymath
Owner

No description provided.

claude added 10 commits April 21, 2026 09:26
src/rust/provers/mizar_complete.rs  (1320 LoC, deleted)
  Second declaration of `pub struct MizarBackend` with its own
  `impl ProverBackend` block.  The factory at provers/mod.rs:1206
  only dispatches to `mizar::MizarBackend`; `mizar_complete` was never
  reachable.  Keeping it inflated the file count (68 vs genuine 66)
  and the "cargo check" surface without shipping reachable code.

src/rust/provers/cvc5.rs.stub  (deleted)
  Placeholder predating the real cvc5.rs implementation.  Never
  compiled (not in mod tree) but still audit noise.

.gitignore
  Ignore .claude/worktrees/ — subagent isolation-mode sandboxes
  that git should never track.

No factory / mod.rs changes; no test count change (628 pass).
Remaining count discrepancy (105 ProverKind variants vs 65 backend
files) is the TypeChecker-variant issue — 39+ variants declared
without factory dispatch.  Flagged for separate decision.
Before: only 4 of 54 Julia extractors emitted premise records
(abella, coqgym, dedukti, lean3).  The other 50 declared a
`premises = Dict[]` vector but never push!-ed to it, so the
premises_COMPLETE.jsonl built by `just align-premises` carried
tiny coverage — training saw 1,392 examples from 50,000 proof
states (2.8%) and MRR plateaued at the 0.66 baseline from the
Mar-21 run against a 66-example floor.

This wave covers the 22 highest-leverage extractors (all the
Lean / Coq / Isabelle / HOL / F* / Idris2 / ACL2 / Dafny / PVS /
Mizar / Metamath / Mathcomp / Mathlib4 / AFP / Agda families plus
the bounded model-checker / verifier extractors).  Each gets a
prover-appropriate `hyp_patterns` list (intros/have/let for Lean;
DISCH_TAC/REWRITE_TAC for HOL; assume/fix/obtain for Isabelle;
proof-step labels for Metamath; etc.) and a premise-push loop that
shares the same `proof_id` the sibling proof_state was emitted
with, so the FK join in dataloader.jl:381 finally has a match
surface.
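A minimal sketch of the shared-`proof_id` pattern this wave applies — Python here for illustration (the real extractors are Julia and use `push!` on a premises vector); the pattern list and function names are hypothetical:

```python
import re

# Hypothetical hyp_patterns for a Lean-style extractor; the real
# per-prover lists live in each Julia extractor.
HYP_PATTERNS = [r"\bintros?\b", r"\bhave\b", r"\blet\b"]

def extract(proof_id, proof_body, proof_states, premises):
    # Emit the proof_state and its premise records under the SAME
    # proof_id, so the downstream join on proof_id has a match surface.
    proof_states.append({"proof_id": proof_id, "body": proof_body})
    for pat in HYP_PATTERNS:
        for m in re.finditer(pat, proof_body):
            premises.append({"proof_id": proof_id, "premise": m.group(0)})

proof_states, premises = [], []
extract("lean3:0001", "intros h; have hx := h.left", proof_states, premises)
```

The point is only that both record kinds carry the same id; the 4-of-54 failure mode was premises vectors that stayed empty while proof_states kept flowing.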

Remaining 28 extractors (model checkers, SAT/SMT benchmarks,
synthetic-only provers) will land in a follow-up wave — same
template, narrower signal.

Next step on the hardware host: `just corpus-refresh` should now
produce a premises_COMPLETE.jsonl with orders-of-magnitude more
records per proof_state, and `just retrain` should move MRR off
0.66.
Same template as 16e757f wave.  MiniZinc constraint-model premises
are variable declarations and constraint labels (vars/constraints
reachable through the proof body).
Same template as 16e757f / dbd611a waves.  naproche uses ForTheL
natural-language identifiers; nitpick uses Isabelle identifiers
from the lemma body.
Single source of truth for where ECHIDNA is going.  Ties together:

  * the 8-stage map (ML signal → learning loop closure → interaction
    honesty → distributed+fast → cross-prover semantics → sovereign
    tooling → self-verified)
  * row-by-row end-state targets (the flip-the-audit-table pass:
    every "not present" / "partial" claim gets a concrete numerical
    or structural target)
  * current sprint (S1-S5: extractor wave closeout, training rerun,
    typed_wasm crate extract + TypeChecker dispatch, Verisim read
    paths, tactic-synthesis pilot on 5 provers)
  * agent-tier guidance (Opus design, Sonnet implement, Haiku bulk)
  * inheritance table mapping the existing planning docs
    (FUTURE_DEVELOPMENT_ROADMAP.md, handover/PRODUCTION-WIRING-PLAN,
    design/SPARK_ADOPTION_PLAN, L1/L2/L3 handover prompts, STATE.md,
    STATE.a2ml) onto their stages in this document.

Going forward: decisions recorded here, not in chat; stage sub-items
get `[done YYYY-MM-DD ✓]` stamps as they land; §3 rows preserve
history (the "Today" column becomes "Done" but the "End-state"
column stays as the invariant).
… checkers/SMT/verifiers)

Completes the premise-wiring pass started in 16e757f/9a897a9.

Files edited (23, the remaining un-wired extractors):
  nunchaku, nusmv, prism, proverif, seahorn, spin, tamarin
  tlaps, tlc, typed_wasm, uppaal, viper, why3, dreal, abc
  acl2s, isabelle_tropical, isabelle_zf, sat_benchmarks
  smtlib, tptp, typechecker_ecosystem, opentheory

Pattern used:
  - ps/ts/pm extractors (save_common): push!(pm, Dict(proof_id=>id, …))
    inside the same loop that pushes proof_states
  - Standalone extractors (own save fn): add premises vector, populate
    per-record, write to premises_NAME.jsonl alongside proof_states file
  - Complex a2ml extractors (smtlib, tptp): add premises vec + JSON3 write
  - Extractor-specific identifier regex for max semantic signal;
    generic \b([A-Za-z][A-Za-z0-9_']{1,40})\b fallback with len<50 guard
    where the source language lacks tactically meaningful identifiers
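The generic fallback described above can be sketched as follows (Python for illustration; the guard threshold and regex are taken from the commit message, the function name is hypothetical):

```python
import re

# Generic identifier fallback: word-like tokens of 2-41 characters,
# allowing primes and underscores after the leading letter.
GENERIC_ID = re.compile(r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b")

def fallback_premises(line):
    # len<50 guard: skip pathological lines that carry no
    # tactically meaningful identifiers.
    if len(line) >= 50:
        return []
    return GENERIC_ID.findall(line)

ids = fallback_premises("assume foo_bar' x1")
```

Note the trailing `\b` backtracks off a final prime, so `foo_bar'` is captured as `foo_bar`.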

Expected effect: premises_COMPLETE.jsonl gains ~600K–2M new rows;
proof_id join in dataloader.jl:381 now covers close to 100% of
proof_states, ending the 2.8% utilisation ceiling.
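One way to verify that claim on the hardware host is to measure the join surface directly — a sketch, assuming each JSONL row carries a `proof_id` field (field name per the commit messages; the function itself is hypothetical):

```python
import json

def join_coverage(proof_states_path, premises_path):
    # Fraction of proof_states with at least one premise row sharing
    # the same proof_id -- the join the dataloader depends on.
    with open(premises_path) as f:
        premise_ids = {json.loads(line)["proof_id"] for line in f}
    total = hits = 0
    with open(proof_states_path) as f:
        for line in f:
            total += 1
            hits += json.loads(line)["proof_id"] in premise_ids
    return hits / total if total else 0.0
```

Before this wave the number was 0.028; "close to 100%" means it should now approach 1.0.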

https://claude.ai/code/session_01NFEuqk68SFBy49KRSSVL8N
Adds two sections to .claude/CLAUDE.md:

  1. "Canonical Roadmap" — points every Claude session (and Gemini)
     at docs/ROADMAP.md as the source of truth, with the instruction
     to record decisions there not in chat.

  2. "Supervised Multi-Agent Execution" — codifies the scout pattern
     as a standing project-level rule: before spawning a focused
     implementation agent, the supervisor MUST run a Haiku scout
     pass to clear trivial cruft.  Full prompt template included
     verbatim so anyone (human or model) can reproduce it.

Scope: only applies when a supervisor tier delegates to an
executor/bulk tier; pure single-file edits under ~10 lines are
exempted.  Caveats spelled out: scout is not a full audit; must
not over-fix (TODOs may be load-bearing spec); is idempotent;
supervisor decides what to do with blockers before the
implementation agent launches.

Commit convention for scout-applied fixes:
  chore(<scope>): scout-pass trivial cleanup ahead of <step id>

This is now standing procedure for all supervised work on echidna.
Partial scaffold for Sprint S3 crate extraction.  Full implementation
(lib.rs + TypeInfo routing + workspace registration) follows in the
S3 Sonnet agent pass.

https://claude.ai/code/session_01NFEuqk68SFBy49KRSSVL8N
Background: a retrain run on a fresh 50k proof_states / 2.16M premise
corpus produced 0 training examples because align-premises was not run
between merge-corpora and retrain.  The dataloader joins by proof_id
and merge_corpus rewrites proof_state ids to fresh sequential counters
while premise files keep original extractor ids — so without alignment
the join matches nothing.

Adds a retrain-skip-align target for the iterate-on-model-code case.
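The failure mode is easy to reproduce in miniature — a sketch with hypothetical record structure (Python for brevity; the real pipeline is Julia), showing why the un-aligned join matches nothing and what align-premises must restore:

```python
# merge-corpora renumbers proof_states with fresh sequential counters
# while premise files keep the original extractor ids.
proof_states = [
    {"proof_id": 0, "orig_id": "lean3:0007"},
    {"proof_id": 1, "orig_id": "coqgym:0042"},
]
premises = [{"proof_id": "lean3:0007", "premise": "intros"}]

# Without alignment: the join on proof_id matches nothing.
merged_ids = {ps["proof_id"] for ps in proof_states}
assert not any(p["proof_id"] in merged_ids for p in premises)

# align-premises step: remap extractor ids onto the merged counters.
id_map = {ps["orig_id"]: ps["proof_id"] for ps in proof_states}
aligned = [{**p, "proof_id": id_map[p["proof_id"]]} for p in premises]
```

The retrain-skip-align target is only safe when neither merge-corpora nor the premise files have changed since the last alignment.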
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Signed-off-by: Jonathan D.A. Jewell <6759885+hyperpolymath@users.noreply.github.com>
@hyperpolymath hyperpolymath merged commit 1b3a0e9 into main Apr 21, 2026
23 of 37 checks passed
@hyperpolymath hyperpolymath deleted the claude/fix-prover-wiring-epyIR branch April 21, 2026 10:53