Skip to content

Claude/echidna mcp#33

Merged
hyperpolymath merged 7 commits intomainfrom
claude/echidna-mcp
Apr 22, 2026
Merged

Claude/echidna mcp#33
hyperpolymath merged 7 commits intomainfrom
claude/echidna-mcp

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

No description provided.

claude and others added 7 commits April 22, 2026 16:39
…tool

Exposes the `echidna` CLI through the Model Context Protocol so Claude
Code / Claude API / other MCP-aware agents can invoke ECHIDNA's 105
prover backends as a first-class tool-use action.

### What ships

- New workspace member `crates/echidna-mcp/` with a Rust binary built
  on rmcp 1.5 (`ToolRouter` + `#[tool_router]` / `#[tool_handler]`
  macros, stdio transport). Single tool: `prove`.
- Tool schema (auto-generated from `ProveParams` via schemars 1):
  - `file: string` (required) — path to proof file (SMT-LIB, Lean,
    Coq, Agda, Isabelle, Mizar, F*, SPASS, Metamath, …)
  - `prover: string` (optional) — backend name; auto-detect if absent
  - `timeout_secs: u32` (optional, default 300)
  Returns `{verified, message, prover, raw_output, stderr}`.
- Stdio transport; log goes to stderr (so MCP clients don't see it on
  stdout). Echidna binary path via `ECHIDNA_BIN` env or `echidna` in PATH.
- Claude Code wiring: `.claude/settings.json` with
  `{"mcpServers": {"echidna": {"command": "echidna-mcp"}}}`.

### Smoke test (verified on this branch)

1. Install z3 (apt) — 4.8.12 in PATH.
2. Trivial SMT-LIB file `(set-logic QF_LIA) (declare-const x Int)
   (assert (> x 0)) (assert (< x 2)) (check-sat)`.
3. Feed the MCP server: initialize → notifications/initialized →
   tools/list → tools/call(prove, {file, prover:"Z3", timeout_secs:30}).
4. Response: `{"verified": true, "message": "✓ Proof verified
   successfully...", "prover": "Z3", ...}`.

End-to-end path works: MCP JSON-RPC → echidna CLI → Z3 → verdict.

### Roadmap update (docs/handover/TODO.md)

Added "Wave-5 (new backend targets, no adapter yet)" section tracking
prover backends that do not yet have an adapter in src/rust/provers/:
- **Andromeda** — scaffold `provers/andromeda.rs`, shells out to
  `andromeda` CLI (OCaml source build required). Open-source (MIT).
- **Theorema** — deferred (Mathematica-based, commercial licence).
- **Globular** — not a CLI prover (web UI for higher category theory);
  skipped unless scoped to graphical proof capture.

### Next sessions

- Install + wire the remaining 7 provers from the agreed MVP subset:
  Coq/Rocq, Isabelle, Agda, Mizar, F*, Lean 4, SPASS (SPASS apt pkg
  unavailable on this env, needs manual), Metamath.
- Containerfile: ECHIDNA + curated provers + the MCP server, so
  agents can `podman run` a self-contained proving service.
- Andromeda backend scaffold (Wave-5 entry #1).
Upgrades the capnp runtime and capnpc code-generator in echidna-wire
to clear RUSTSEC-2025-0143 ("Unsound APIs of public constant::Reader
and StructSchema", fixed in >=0.24). capnpc 0.25 regenerates the
wire module identically and the 3 echidna-wire smoke tests pass.

Only remaining cargo-audit warning is RUSTSEC-2025-0134 (rustls-pemfile
unmaintained) pulled transitively via tonic 0.12 in echidna-grpc; that
is tracked as a separate tonic 0.13 bump campaign.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
The previous pipeline parsed a .smt2/.v/.agda/.dfg/.mm file into the
universal Term IR and then reconstructed the solver's input from that
IR (via each backend's `export(state)` or `term_to_*` helper) before
invoking the solver.  The parsers are incomplete, so non-trivial
inputs were silently mangled — the Coq parser literally emits
`"param": "(P"` with a stray paren — and every downstream solver
then rejected the regenerated file.  A few backends hid this by
early-returning `Ok(true)` whenever `state.goals.is_empty()`, which
fires for real files whose parser simply fails to populate goals the
way the generic pipeline expects — fake success, solver never ran.

Fix follows the pattern already established by Mizar: `parse_file`
stashes the absolute path under `"source_path"` and `parse_string`
stashes the raw content under `"{prover}_source"`; `verify_proof`
prefers those over the lossy reconstruction and falls back to the
old IR-driven path only for synthetic ProofStates that were never
parsed from real source.

Also fixes two binary-invocation bugs surfaced during end-to-end
smoke testing:

- Agda: spawn the process in the file's parent directory so the
  top-level module name resolves relative to CWD (otherwise every
  non-trivial file trips "module does not match file name").
- SPASS: pass input as a positional file argument (the binary
  rejects DFG on stdin, misreading it as additional flags); use
  `-TimeLimit=N` flag syntax (space-separated form makes SPASS
  try to open "N" as a filename); drop `-PGiven=0 -PProblem=0`
  which suppress the "Proof found" banner parse_result matches on.
- Metamath: wire `verify_proof` to the real `metamath` CLI via a
  scripted stdin session (`read "…"; verify proof *; exit`) instead
  of returning `Ok(state.goals.is_empty())` — the previous impl
  never ran the solver.

End-to-end MCP smoke tests now pass for all six apt-installable
MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS): `tools/call
prove` → solver → `{"verified": true}`.  The 625-test unit suite
remains green; clippy on the bin+lib remains clean.

The remaining ~30 LOSSY_ROUNDTRIP backends (Lean, Isabelle, F*,
Idris2, Vampire, etc.), the PVS/HOL4 fake-success stubs, and the
TRIVIAL_ONLY SAT solvers (CaDiCaL/Kissat/MiniSat) are tracked for
follow-up.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Both backends' verify_proof was `Ok(state.goals.is_empty())` — no
solver invocation at any point, so every file with "no goals left
after parsing" (which is the normal case for top-level declarations)
reported VERIFIED without running PVS or HOL4.

parse_file now stashes the original path as source_path (consistent
with the pattern introduced for the SMT/Coq/Agda/Metamath/SPASS
backends).  verify_proof prefers that path: for PVS it runs the
executable in batch typecheck mode and rejects runs containing
"error" or "typecheck failed"; for HOL4 it runs the configured
driver against the file.

Neither PVS nor HOL4 is installed in this environment, so the
real-solver path is code-only; the fallback preserves the legacy
empty-goals shortcut for synthetic ProofStates that never came
from a parsed file.  The 625-test unit suite remains green.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Multi-stage build that ships echidna + echidna-mcp alongside the 6
apt-installable MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS)
and three upstream-distributed ones (Lean 4 via elan, Isabelle 2025
via tum.de tarball with a pre-built Main heap, F* via the latest
GitHub-release Linux asset).

The runtime entry point is /app/bin/echidna-mcp so an MCP client
connects over stdio:

    podman build -f .containerization/Containerfile.mcp -t echidna:mcp .
    podman run --rm -i echidna:mcp

Consumers wire it into Claude Code / Claude Desktop / any MCP client
via:

    {"mcpServers": {"echidna": {
        "command": "podman",
        "args": ["run","--rm","-i","echidna:mcp"]
    }}}

Mizar is omitted — it has no Debian package and the SourceForge
upstream distributes neither a stable URL nor a signed tarball, so
pulling it reproducibly inside a container build is a separate
exercise.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Batch-apply the same fix shape used for the MVP provers (z3, cvc5, coq,
agda, metamath, spass) to a first wave of Phase-3 backends: abc, acl2,
altergo, eprover, vampire.  parse_file stashes the absolute path as
"source_path"; parse_string stashes the raw content as "{prover}_source";
verify_proof prefers those over reconstruction and falls back to the old
IR-driven path only for synthetic ProofStates.

No binary-invocation changes in this wave — we only gate the existing
solver-spawn logic behind a metadata lookup.  The 625-test unit suite
remains green after the change; cargo build --release completes clean.

Further Phase-3 waves will follow for the remaining LOSSY backends.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@hyperpolymath hyperpolymath merged commit 6c07c3d into main Apr 22, 2026
@hyperpolymath hyperpolymath deleted the claude/echidna-mcp branch April 22, 2026 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants