Claude/echidna mcp#34
Merged
hyperpolymath merged 8 commits into main (Apr 22, 2026)
Conversation
…tool
Exposes the `echidna` CLI through the Model Context Protocol so Claude
Code / Claude API / other MCP-aware agents can invoke ECHIDNA's 105
prover backends as a first-class tool-use action.
### What ships
- New workspace member `crates/echidna-mcp/` with a Rust binary built
on rmcp 1.5 (`ToolRouter` + `#[tool_router]` / `#[tool_handler]`
macros, stdio transport). Single tool: `prove`.
- Tool schema (auto-generated from `ProveParams` via schemars 1):
- `file: string` (required) — path to proof file (SMT-LIB, Lean,
Coq, Agda, Isabelle, Mizar, F*, SPASS, Metamath, …)
- `prover: string` (optional) — backend name; auto-detect if absent
- `timeout_secs: u32` (optional, default 300)
Returns `{verified, message, prover, raw_output, stderr}`.
- Stdio transport; logging goes to stderr so it never pollutes the
JSON-RPC stream on stdout. The echidna binary is resolved via the
`ECHIDNA_BIN` env var, falling back to `echidna` on PATH.
- Claude Code wiring: `.claude/settings.json` with
`{"mcpServers": {"echidna": {"command": "echidna-mcp"}}}`.
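Concretely, an agent invokes the tool with a `tools/call` request like the one below (the `id` and file path are illustrative; the argument names follow the schema above and the values match the smoke test):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "prove",
    "arguments": {
      "file": "/tmp/example.smt2",
      "prover": "Z3",
      "timeout_secs": 30
    }
  }
}
```

The result wraps the `{verified, message, prover, raw_output, stderr}` object described above.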
### Smoke test (verified on this branch)
1. Install z3 (apt) — 4.8.12 in PATH.
2. Trivial SMT-LIB file `(set-logic QF_LIA) (declare-const x Int)
(assert (> x 0)) (assert (< x 2)) (check-sat)`.
3. Feed the MCP server: initialize → notifications/initialized →
tools/list → tools/call(prove, {file, prover:"Z3", timeout_secs:30}).
4. Response: `{"verified": true, "message": "✓ Proof verified
successfully...", "prover": "Z3", ...}`.
End-to-end path works: MCP JSON-RPC → echidna CLI → Z3 → verdict.
### Roadmap update (docs/handover/TODO.md)
Added "Wave-5 (new backend targets, no adapter yet)" section tracking
prover backends that do not yet have an adapter in src/rust/provers/:
- **Andromeda** — scaffold `provers/andromeda.rs`, shells out to
`andromeda` CLI (OCaml source build required). Open-source (MIT).
- **Theorema** — deferred (Mathematica-based, commercial licence).
- **Globular** — not a CLI prover (web UI for higher category theory);
skipped unless scoped to graphical proof capture.
### Next sessions
- Install + wire the remaining 8 provers from the agreed MVP subset:
Coq/Rocq, Isabelle, Agda, Mizar, F*, Lean 4, SPASS (no SPASS apt
package in this environment; needs a manual install), Metamath.
- Containerfile: ECHIDNA + curated provers + the MCP server, so
agents can `podman run` a self-contained proving service.
- Andromeda backend scaffold (Wave-5 entry #1).
Upgrades the capnp runtime and capnpc code-generator in echidna-wire
to clear RUSTSEC-2025-0143 ("Unsound APIs of public constant::Reader
and StructSchema", fixed in >=0.24). capnpc 0.25 regenerates the
wire module identically and the 3 echidna-wire smoke tests pass.
Only remaining cargo-audit warning is RUSTSEC-2025-0134 (rustls-pemfile
unmaintained) pulled transitively via tonic 0.12 in echidna-grpc; that
is tracked as a separate tonic 0.13 bump campaign.
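The bump itself reduces to two version lines in echidna-wire's manifest (a sketch; the real Cargo.toml layout and feature flags may differ):

```toml
[dependencies]
capnp = "0.24"    # >= 0.24 clears RUSTSEC-2025-0143

[build-dependencies]
capnpc = "0.25"   # regenerates the wire module identically
```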
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
The previous pipeline parsed a .smt2/.v/.agda/.dfg/.mm file into the
universal Term IR and then reconstructed the solver's input from that
IR (via each backend's `export(state)` or `term_to_*` helper) before
invoking the solver. The parsers are incomplete, so non-trivial
inputs were silently mangled — the Coq parser literally emits
`"param": "(P"` with a stray paren — and every downstream solver
then rejected the regenerated file. A few backends hid this by
early-returning `Ok(true)` whenever `state.goals.is_empty()`, which
fires for real files whose parser simply fails to populate goals the
way the generic pipeline expects — fake success, solver never ran.
Fix follows the pattern already established by Mizar: `parse_file`
stashes the absolute path under `"source_path"` and `parse_string`
stashes the raw content under `"{prover}_source"`; `verify_proof`
prefers those over the lossy reconstruction and falls back to the
old IR-driven path only for synthetic ProofStates that were never
parsed from real source.
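The preference order can be sketched in a few lines of std-only Rust. The `ProofState` stand-in, the `choose_input` helper, and the `coq_source` key (one instance of the `"{prover}_source"` convention) are illustrative, not ECHIDNA's actual types:

```rust
use std::collections::HashMap;

// Illustrative stand-in for ECHIDNA's ProofState; only the metadata
// map matters for the preference logic sketched here.
struct ProofState {
    metadata: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum SolverInput {
    File(String), // original path stashed by parse_file
    Raw(String),  // raw content stashed by parse_string
    Ir,           // lossy IR reconstruction, synthetic states only
}

// Mirror of the preference order described above, with "coq" standing
// in for the per-prover "{prover}_source" key.
fn choose_input(state: &ProofState) -> SolverInput {
    if let Some(p) = state.metadata.get("source_path") {
        SolverInput::File(p.clone())
    } else if let Some(s) = state.metadata.get("coq_source") {
        SolverInput::Raw(s.clone())
    } else {
        SolverInput::Ir
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("source_path".into(), "/proofs/even.v".into());
    let state = ProofState { metadata };
    assert_eq!(
        choose_input(&state),
        SolverInput::File("/proofs/even.v".into())
    );
}
```

The `if let` chain is the whole fix: real files always carry one of the two metadata keys, so the lossy path is only ever reached by synthetic states.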
Also fixes two binary-invocation bugs surfaced during end-to-end
smoke testing:
- Agda: spawn the process in the file's parent directory so the
top-level module name resolves relative to CWD (otherwise every
non-trivial file trips "module does not match file name").
- SPASS: pass input as a positional file argument (the binary
rejects DFG on stdin, misreading it as additional flags); use
`-TimeLimit=N` flag syntax (space-separated form makes SPASS
try to open "N" as a filename); drop `-PGiven=0 -PProblem=0`
which suppress the "Proof found" banner parse_result matches on.
- Metamath: wire `verify_proof` to the real `metamath` CLI via a
scripted stdin session (`read "…"; verify proof *; exit`) instead
of returning `Ok(state.goals.is_empty())` — the previous impl
never ran the solver.
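The Agda and SPASS fixes boil down to how the `Command` is built; a minimal sketch (helper names are hypothetical, and the real adapters add more flags and a timeout):

```rust
use std::path::Path;
use std::process::Command;

// Run Agda from the file's parent directory and pass the bare file
// name, so the top-level module name resolves against the CWD.
fn agda_command(file: &Path) -> Command {
    let dir = file.parent().unwrap_or_else(|| Path::new("."));
    let name = file.file_name().expect("proof file must have a name");
    let mut cmd = Command::new("agda");
    cmd.current_dir(dir).arg(name);
    cmd
}

// Hand SPASS its input as a positional argument (it rejects DFG on
// stdin) and keep `-TimeLimit=N` as a single token: the space-
// separated form makes SPASS try to open "N" as a filename.
fn spass_command(file: &Path, timeout_secs: u32) -> Command {
    let mut cmd = Command::new("SPASS");
    cmd.arg(format!("-TimeLimit={timeout_secs}")).arg(file);
    cmd
}

fn main() {
    let agda = agda_command(Path::new("/proofs/Nat/Add.agda"));
    assert_eq!(agda.get_current_dir(), Some(Path::new("/proofs/Nat")));

    let spass = spass_command(Path::new("/proofs/group.dfg"), 30);
    let args: Vec<_> = spass.get_args().collect();
    assert_eq!(args[0], "-TimeLimit=30");
}
```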
End-to-end MCP smoke tests now pass for all six apt-installable
MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS): `tools/call
prove` → solver → `{"verified": true}`. The 625-test unit suite
remains green; clippy on the bin+lib remains clean.
The remaining ~30 LOSSY_ROUNDTRIP backends (Lean, Isabelle, F*,
Idris2, Vampire, etc.), the PVS/HOL4 fake-success stubs, and the
TRIVIAL_ONLY SAT solvers (CaDiCaL/Kissat/MiniSat) are tracked for
follow-up.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Both backends' verify_proof was `Ok(state.goals.is_empty())` — no solver invocation at any point, so every file with "no goals left after parsing" (the normal case for top-level declarations) reported VERIFIED without running PVS or HOL4.
parse_file now stashes the original path as source_path, consistent with the pattern introduced for the SMT/Coq/Agda/Metamath/SPASS backends. verify_proof prefers that path: for PVS it runs the executable in batch typecheck mode and rejects runs containing "error" or "typecheck failed"; for HOL4 it runs the configured driver against the file.
Neither PVS nor HOL4 is installed in this environment, so the real-solver path is code-only; the fallback preserves the legacy empty-goals shortcut for synthetic ProofStates that never came from a parsed file. The 625-test unit suite remains green.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Multi-stage build that ships echidna + echidna-mcp alongside the 6
apt-installable MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS)
and three upstream-distributed ones (Lean 4 via elan, Isabelle 2025
via tum.de tarball with a pre-built Main heap, F* via the latest
GitHub-release Linux asset).
The runtime entry point is /app/bin/echidna-mcp so an MCP client
connects over stdio:
podman build -f .containerization/Containerfile.mcp -t echidna:mcp .
podman run --rm -i echidna:mcp
Consumers wire it into Claude Code / Claude Desktop / any MCP client
via:
{"mcpServers": {"echidna": {
"command": "podman",
"args": ["run","--rm","-i","echidna:mcp"]
}}}
Mizar is omitted — it has no Debian package and the SourceForge
upstream distributes neither a stable URL nor a signed tarball, so
pulling it reproducibly inside a container build is a separate
exercise.
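A multi-stage Containerfile of this shape might look like the sketch below. Package names, stage layout, and the elided Lean/Isabelle/F* download steps are assumptions; the real file is `.containerization/Containerfile.mcp`:

```dockerfile
# Build stage: compile echidna and the MCP server (illustrative)
FROM rust:1-bookworm AS build
WORKDIR /src
COPY . .
RUN cargo build --release -p echidna -p echidna-mcp

# Runtime stage: apt-installable MVP provers + upstream installs
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        z3 cvc5 coq agda metamath spass \
    && rm -rf /var/lib/apt/lists/*
# (Lean 4 via elan, Isabelle 2025 tarball, F* release asset elided)
COPY --from=build /src/target/release/echidna /app/bin/echidna
COPY --from=build /src/target/release/echidna-mcp /app/bin/echidna-mcp
ENV ECHIDNA_BIN=/app/bin/echidna
# MCP clients speak JSON-RPC to this entry point over stdio
ENTRYPOINT ["/app/bin/echidna-mcp"]
```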
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Batch-apply the same fix shape used for the MVP provers (z3, cvc5, coq,
agda, metamath, spass) to a first wave of Phase-3 backends: abc, acl2,
altergo, eprover, vampire. parse_file stashes the absolute path as
"source_path"; parse_string stashes the raw content as "{prover}_source";
verify_proof prefers those over reconstruction and falls back to the old
IR-driven path only for synthetic ProofStates.
No binary-invocation changes in this wave — we only gate the existing
solver-spawn logic behind a metadata lookup. The 625-test unit suite
remains green after the change; cargo build --release completes clean.
Further Phase-3 waves will follow for the remaining LOSSY backends.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Applies the same parse_file → stash "source_path", parse_string →
stash "{prover}_source", verify_proof → prefer-metadata-before-IR
fix to the rest of the LOSSY_ROUNDTRIP backends: alloy, cbmc, chuffed,
dafny, dreal, framac, fstar, glpk, hol_light, idris2, imandra,
isabelle, key, lean, minizinc, minlog, mizar, nuprl, nusmv, ortools,
prism, proverif, scip, seahorn, spin_checker, tamarin, tlaps, tlc,
twelf, uppaal, viper, why3.
Non-trivial adaptations required to preserve semantics:
- idris2: Idris 2 resolves the top-level module name from CWD like
Agda, so the source_path branch spawns with current_dir set to
the file's parent and passes the bare filename.
- seahorn: already had a seahorn_input_file escape hatch for pre-
compiled .bc/.ll bitcode; source_path becomes the primary
preference with the existing hatch and the IR reconstruction as
successive fallbacks.
- cbmc/framac: the --unwind / -wp-* metadata knobs were hoisted
above the source_path branch so every path honours the user's
bound overrides.
- prism/tlc: these write two files (model + properties/cfg). The
source_path branch passes the real model file verbatim; the
multi-file generation fallback stays intact behind the if-let.
- isabelle/mizar: both already had partial source stashing
(raw_thy_content, source_path). Normalised to the canonical
source_path + {prover}_source convention; Mizar's is_complete()
short-circuit was moved below the source-backed branches so real
files are not rejected for having open goals in our IR.
- acl2: ACL2 reads input on stdin — the source_path branch opens
the real file as Stdio::from(File) rather than piping string
content, matching the existing lossy-path invocation shape.
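The ACL2 item can be sketched with std::process alone; here `cat` stands in for the real `acl2` binary so the snippet is runnable anywhere:

```rust
use std::fs::File;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Write a throwaway input file, standing in for the user's real
    // .lisp file that parse_file stashed under source_path.
    let path = std::env::temp_dir().join("echidna_acl2_demo.lisp");
    std::fs::write(&path, "(defthm trivial t)\n")?;

    // Open the real file as the child's stdin instead of piping its
    // contents through a string: Stdio::from(File) hands the fd over.
    let out = Command::new("cat")
        .stdin(Stdio::from(File::open(&path)?))
        .output()?;
    assert_eq!(out.stdout, b"(defthm trivial t)\n");
    Ok(())
}
```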
No trait signatures changed; every fallback path remains byte-for-
byte identical behind an if-let guard. Full workspace build, the
625-test unit suite, and `cargo clippy --lib --bin echidna` all
remain green.
Phase 3 closes out the broken-backend audit that started with the
Z3/CVC5/Coq MVP fixes: 39 backends now prefer their original source
over the lossy Term-IR round-trip.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC