Claude/echidna mcp#34
Merged
hyperpolymath merged 8 commits into main (Apr 22, 2026)
Conversation
…tool
Exposes the `echidna` CLI through the Model Context Protocol so Claude
Code / Claude API / other MCP-aware agents can invoke ECHIDNA's 105
prover backends as a first-class tool-use action.
### What ships
- New workspace member `crates/echidna-mcp/` with a Rust binary built
on rmcp 1.5 (`ToolRouter` + `#[tool_router]` / `#[tool_handler]`
macros, stdio transport). Single tool: `prove`.
- Tool schema (auto-generated from `ProveParams` via schemars 1):
- `file: string` (required) — path to proof file (SMT-LIB, Lean,
Coq, Agda, Isabelle, Mizar, F*, SPASS, Metamath, …)
- `prover: string` (optional) — backend name; auto-detect if absent
- `timeout_secs: u32` (optional, default 300)
Returns `{verified, message, prover, raw_output, stderr}`.
- Stdio transport; logging goes to stderr so it never pollutes the
JSON-RPC stream on stdout. The echidna binary is resolved via the
`ECHIDNA_BIN` env var, falling back to `echidna` on PATH.
- Claude Code wiring: `.claude/settings.json` with
`{"mcpServers": {"echidna": {"command": "echidna-mcp"}}}`.
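Concretely, an agent invokes the tool with a `tools/call` request like the one below (the `id` and file path are illustrative; the argument names follow the schema above and the values match the smoke test):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "prove",
    "arguments": {
      "file": "/tmp/example.smt2",
      "prover": "Z3",
      "timeout_secs": 30
    }
  }
}
```

The result wraps the `{verified, message, prover, raw_output, stderr}` object described above.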
### Smoke test (verified on this branch)
1. Install z3 (apt) — 4.8.12 in PATH.
2. Trivial SMT-LIB file `(set-logic QF_LIA) (declare-const x Int)
(assert (> x 0)) (assert (< x 2)) (check-sat)`.
3. Feed the MCP server: initialize → notifications/initialized →
tools/list → tools/call(prove, {file, prover:"Z3", timeout_secs:30}).
4. Response: `{"verified": true, "message": "✓ Proof verified
successfully...", "prover": "Z3", ...}`.
End-to-end path works: MCP JSON-RPC → echidna CLI → Z3 → verdict.
### Roadmap update (docs/handover/TODO.md)
Added "Wave-5 (new backend targets, no adapter yet)" section tracking
prover backends that do not yet have an adapter in src/rust/provers/:
- **Andromeda** — scaffold `provers/andromeda.rs`, shells out to
`andromeda` CLI (OCaml source build required). Open-source (MIT).
- **Theorema** — deferred (Mathematica-based, commercial licence).
- **Globular** — not a CLI prover (web UI for higher category theory);
skipped unless scoped to graphical proof capture.
### Next sessions
- Install + wire the remaining 8 provers from the agreed MVP subset:
Coq/Rocq, Isabelle, Agda, Mizar, F*, Lean 4, SPASS (no SPASS apt
package in this environment; needs a manual install), Metamath.
- Containerfile: ECHIDNA + curated provers + the MCP server, so
agents can `podman run` a self-contained proving service.
- Andromeda backend scaffold (Wave-5 entry #1).
Upgrades the capnp runtime and capnpc code-generator in echidna-wire
to clear RUSTSEC-2025-0143 ("Unsound APIs of public constant::Reader
and StructSchema", fixed in >=0.24). capnpc 0.25 regenerates the
wire module identically and the 3 echidna-wire smoke tests pass.
Only remaining cargo-audit warning is RUSTSEC-2025-0134 (rustls-pemfile
unmaintained) pulled transitively via tonic 0.12 in echidna-grpc; that
is tracked as a separate tonic 0.13 bump campaign.
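The bump itself reduces to two version lines in echidna-wire's manifest (a sketch; the real Cargo.toml layout and feature flags may differ):

```toml
[dependencies]
capnp = "0.24"    # >= 0.24 clears RUSTSEC-2025-0143

[build-dependencies]
capnpc = "0.25"   # regenerates the wire module identically
```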
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
The previous pipeline parsed a .smt2/.v/.agda/.dfg/.mm file into the
universal Term IR and then reconstructed the solver's input from that
IR (via each backend's `export(state)` or `term_to_*` helper) before
invoking the solver. The parsers are incomplete, so non-trivial
inputs were silently mangled — the Coq parser literally emits
`"param": "(P"` with a stray paren — and every downstream solver
then rejected the regenerated file. A few backends hid this by
early-returning `Ok(true)` whenever `state.goals.is_empty()`, which
fires for real files whose parser simply fails to populate goals the
way the generic pipeline expects — fake success, solver never ran.
Fix follows the pattern already established by Mizar: `parse_file`
stashes the absolute path under `"source_path"` and `parse_string`
stashes the raw content under `"{prover}_source"`; `verify_proof`
prefers those over the lossy reconstruction and falls back to the
old IR-driven path only for synthetic ProofStates that were never
parsed from real source.
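The preference order can be sketched in a few lines of std-only Rust. The `ProofState` stand-in, the `choose_input` helper, and the `coq_source` key (one instance of the `"{prover}_source"` convention) are illustrative, not ECHIDNA's actual types:

```rust
use std::collections::HashMap;

// Illustrative stand-in for ECHIDNA's ProofState; only the metadata
// map matters for the preference logic sketched here.
struct ProofState {
    metadata: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum SolverInput {
    File(String), // original path stashed by parse_file
    Raw(String),  // raw content stashed by parse_string
    Ir,           // lossy IR reconstruction, synthetic states only
}

// Mirror of the preference order described above, with "coq" standing
// in for the per-prover "{prover}_source" key.
fn choose_input(state: &ProofState) -> SolverInput {
    if let Some(p) = state.metadata.get("source_path") {
        SolverInput::File(p.clone())
    } else if let Some(s) = state.metadata.get("coq_source") {
        SolverInput::Raw(s.clone())
    } else {
        SolverInput::Ir
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("source_path".into(), "/proofs/even.v".into());
    let state = ProofState { metadata };
    assert_eq!(
        choose_input(&state),
        SolverInput::File("/proofs/even.v".into())
    );
}
```

The `if let` chain is the whole fix: real files always carry one of the two metadata keys, so the lossy path is only ever reached by synthetic states.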
Also fixes two binary-invocation bugs surfaced during end-to-end
smoke testing:
- Agda: spawn the process in the file's parent directory so the
top-level module name resolves relative to CWD (otherwise every
non-trivial file trips "module does not match file name").
- SPASS: pass input as a positional file argument (the binary
rejects DFG on stdin, misreading it as additional flags); use
`-TimeLimit=N` flag syntax (space-separated form makes SPASS
try to open "N" as a filename); drop `-PGiven=0 -PProblem=0`
which suppress the "Proof found" banner parse_result matches on.
- Metamath: wire `verify_proof` to the real `metamath` CLI via a
scripted stdin session (`read "…"; verify proof *; exit`) instead
of returning `Ok(state.goals.is_empty())` — the previous impl
never ran the solver.
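The Agda and SPASS fixes boil down to how the `Command` is built; a minimal sketch (helper names are hypothetical, and the real adapters add more flags and a timeout):

```rust
use std::path::Path;
use std::process::Command;

// Run Agda from the file's parent directory and pass the bare file
// name, so the top-level module name resolves against the CWD.
fn agda_command(file: &Path) -> Command {
    let dir = file.parent().unwrap_or_else(|| Path::new("."));
    let name = file.file_name().expect("proof file must have a name");
    let mut cmd = Command::new("agda");
    cmd.current_dir(dir).arg(name);
    cmd
}

// Hand SPASS its input as a positional argument (it rejects DFG on
// stdin) and keep `-TimeLimit=N` as a single token: the space-
// separated form makes SPASS try to open "N" as a filename.
fn spass_command(file: &Path, timeout_secs: u32) -> Command {
    let mut cmd = Command::new("SPASS");
    cmd.arg(format!("-TimeLimit={timeout_secs}")).arg(file);
    cmd
}

fn main() {
    let agda = agda_command(Path::new("/proofs/Nat/Add.agda"));
    assert_eq!(agda.get_current_dir(), Some(Path::new("/proofs/Nat")));

    let spass = spass_command(Path::new("/proofs/group.dfg"), 30);
    let args: Vec<_> = spass.get_args().collect();
    assert_eq!(args[0], "-TimeLimit=30");
}
```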
End-to-end MCP smoke tests now pass for all six apt-installable
MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS): `tools/call
prove` → solver → `{"verified": true}`. The 625-test unit suite
remains green; clippy on the bin+lib remains clean.
The remaining ~30 LOSSY_ROUNDTRIP backends (Lean, Isabelle, F*,
Idris2, Vampire, etc.), the PVS/HOL4 fake-success stubs, and the
TRIVIAL_ONLY SAT solvers (CaDiCaL/Kissat/MiniSat) are tracked for
follow-up.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Both backends' verify_proof was `Ok(state.goals.is_empty())` — no solver invocation at any point, so every file with "no goals left after parsing" (the normal case for top-level declarations) reported VERIFIED without running PVS or HOL4.
parse_file now stashes the original path as source_path, consistent with the pattern introduced for the SMT/Coq/Agda/Metamath/SPASS backends. verify_proof prefers that path: for PVS it runs the executable in batch typecheck mode and rejects runs containing "error" or "typecheck failed"; for HOL4 it runs the configured driver against the file.
Neither PVS nor HOL4 is installed in this environment, so the real-solver path is code-only; the fallback preserves the legacy empty-goals shortcut for synthetic ProofStates that never came from a parsed file. The 625-test unit suite remains green.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Multi-stage build that ships echidna + echidna-mcp alongside the 6
apt-installable MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS)
and three upstream-distributed ones (Lean 4 via elan, Isabelle 2025
via tum.de tarball with a pre-built Main heap, F* via the latest
GitHub-release Linux asset).
The runtime entry point is /app/bin/echidna-mcp so an MCP client
connects over stdio:
podman build -f .containerization/Containerfile.mcp -t echidna:mcp .
podman run --rm -i echidna:mcp
Consumers wire it into Claude Code / Claude Desktop / any MCP client
via:
{"mcpServers": {"echidna": {
"command": "podman",
"args": ["run","--rm","-i","echidna:mcp"]
}}}
Mizar is omitted — it has no Debian package and the SourceForge
upstream distributes neither a stable URL nor a signed tarball, so
pulling it reproducibly inside a container build is a separate
exercise.
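A multi-stage Containerfile of this shape might look like the sketch below. Package names, stage layout, and the elided Lean/Isabelle/F* download steps are assumptions; the real file is `.containerization/Containerfile.mcp`:

```dockerfile
# Build stage: compile echidna and the MCP server (illustrative)
FROM rust:1-bookworm AS build
WORKDIR /src
COPY . .
RUN cargo build --release -p echidna -p echidna-mcp

# Runtime stage: apt-installable MVP provers + upstream installs
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        z3 cvc5 coq agda metamath spass \
    && rm -rf /var/lib/apt/lists/*
# (Lean 4 via elan, Isabelle 2025 tarball, F* release asset elided)
COPY --from=build /src/target/release/echidna /app/bin/echidna
COPY --from=build /src/target/release/echidna-mcp /app/bin/echidna-mcp
ENV ECHIDNA_BIN=/app/bin/echidna
# MCP clients speak JSON-RPC to this entry point over stdio
ENTRYPOINT ["/app/bin/echidna-mcp"]
```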
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Batch-apply the same fix shape used for the MVP provers (z3, cvc5, coq,
agda, metamath, spass) to a first wave of Phase-3 backends: abc, acl2,
altergo, eprover, vampire. parse_file stashes the absolute path as
"source_path"; parse_string stashes the raw content as "{prover}_source";
verify_proof prefers those over reconstruction and falls back to the old
IR-driven path only for synthetic ProofStates.
No binary-invocation changes in this wave — we only gate the existing
solver-spawn logic behind a metadata lookup. The 625-test unit suite
remains green after the change; cargo build --release completes clean.
Further Phase-3 waves will follow for the remaining LOSSY backends.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Applies the same parse_file → stash "source_path", parse_string →
stash "{prover}_source", verify_proof → prefer-metadata-before-IR
fix to the rest of the LOSSY_ROUNDTRIP backends: alloy, cbmc, chuffed,
dafny, dreal, framac, fstar, glpk, hol_light, idris2, imandra,
isabelle, key, lean, minizinc, minlog, mizar, nuprl, nusmv, ortools,
prism, proverif, scip, seahorn, spin_checker, tamarin, tlaps, tlc,
twelf, uppaal, viper, why3.
Non-trivial adaptations required to preserve semantics:
- idris2: Idris 2 resolves the top-level module name from CWD like
Agda, so the source_path branch spawns with current_dir set to
the file's parent and passes the bare filename.
- seahorn: already had a seahorn_input_file escape hatch for pre-
compiled .bc/.ll bitcode; source_path becomes the primary
preference with the existing hatch and the IR reconstruction as
successive fallbacks.
- cbmc/framac: the --unwind / -wp-* metadata knobs were hoisted
above the source_path branch so every path honours the user's
bound overrides.
- prism/tlc: these write two files (model + properties/cfg). The
source_path branch passes the real model file verbatim; the
multi-file generation fallback stays intact behind the if-let.
- isabelle/mizar: both already had partial source stashing
(raw_thy_content, source_path). Normalised to the canonical
source_path + {prover}_source convention; Mizar's is_complete()
short-circuit was moved below the source-backed branches so real
files are not rejected for having open goals in our IR.
- acl2: ACL2 reads input on stdin — the source_path branch opens
the real file as Stdio::from(File) rather than piping string
content, matching the existing lossy-path invocation shape.
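The ACL2 item can be sketched with std::process alone; here `cat` stands in for the real `acl2` binary so the snippet is runnable anywhere:

```rust
use std::fs::File;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Write a throwaway input file, standing in for the user's real
    // .lisp file that parse_file stashed under source_path.
    let path = std::env::temp_dir().join("echidna_acl2_demo.lisp");
    std::fs::write(&path, "(defthm trivial t)\n")?;

    // Open the real file as the child's stdin instead of piping its
    // contents through a string: Stdio::from(File) hands the fd over.
    let out = Command::new("cat")
        .stdin(Stdio::from(File::open(&path)?))
        .output()?;
    assert_eq!(out.stdout, b"(defthm trivial t)\n");
    Ok(())
}
```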
No trait signatures changed; every fallback path remains byte-for-
byte identical behind an if-let guard. Full workspace build, the
625-test unit suite, and `cargo clippy --lib --bin echidna` all
remain green.
Phase 3 closes out the broken-backend audit that started with the
Z3/CVC5/Coq MVP fixes: 39 backends now prefer their original source
over the lossy Term-IR round-trip.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC