Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions .claude/CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,70 @@ The following files in `.machine_readable/` contain structured project metadata:
- `.machine_readable/6a2/PLAYBOOK.a2ml` - Operational runbook
- `.machine_readable/bot_directives/*.a2ml` - Per-bot permission and scope rules

## Canonical Roadmap

See [`docs/ROADMAP.md`](../docs/ROADMAP.md) for the single source of truth on
where ECHIDNA is going: the 8‑stage map, row‑by‑row endpoint targets,
current sprint (S1–S5), and agent‑tier guidance. Record decisions
there, not in chat.

## Supervised Multi-Agent Execution

When a supervising agent (Opus) plans work that will be implemented by
a focused executor (Sonnet) or a mechanical bulk agent (Haiku), the
supervisor MUST run a **Haiku scout pass** immediately before spawning
the implementation agent, unless the scope is strictly single‑file
and under ten lines of change.

### Scout pattern

**Purpose**: clear obvious cruft (unused imports, stale comments,
dead stubs, broken references) so the implementation agent arrives
to clean ground and does not burn tokens on trivial lint.

**When to run**: once per upcoming step, after the step's scope is
locked and before the implementation agent is launched.

**Prompt template** (copy this verbatim, fill in the file list):

```
Haiku scout, scope = <exact file list for the step>

1. Confirm each file compiles in isolation (cargo check --lib / julia -e).
2. Scan for: unused imports, dead code, `todo!()/FIXME/XXX`,
orphan type references, stale doc comments referring to
renamed symbols, broken `mod.rs` exports.
3. Trivially fix:
- unused imports (remove)
- obvious typos in comments
- unused `_var` renames
4. DO NOT fix — FLAG to me:
- any TODO with semantic content (may be load-bearing)
- anything requiring judgement about intent
- any cross-module change
5. Report: <20 lines. Green-light or list of blockers.
```

**Caveats**:

- Scout is **not** a full audit. Full audits remain the periodic
cross‑module sweep (cf. the Agent A + Agent B pattern in session
history); scout is narrow prep for an imminent step.
- Scout must **not over‑fix**. TODO/FIXME comments sometimes
contain the spec for the upcoming work — flag, don't delete.
- Scout is **idempotent**. Re‑running on unchanged code should
produce the same report.
- If scout flags a blocker, the supervisor decides whether to
resolve it inline, expand the step's scope, or defer. The
implementation agent does not launch until the supervisor
green‑lights.

**Commit convention** (when the scout applied trivial fixes):

```
chore(<scope>): scout-pass trivial cleanup ahead of <step id>
```

### CRITICAL: .scm metadata files are DEPRECATED

**All `.scm` state/metadata files have been replaced by `.a2ml`.**
Expand Down
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -193,3 +193,6 @@ deps/
.cache/
build/
dist/

# Subagent worktrees (isolation mode)
.claude/worktrees/
17 changes: 17 additions & 0 deletions crates/typed_wasm/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# SPDX-License-Identifier: PMPL-1.0-or-later

[package]
name = "typed-wasm"
version = "0.1.0"
edition = "2021"
authors = ["Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk>"]
license = "PMPL-1.0-or-later"
description = "TypedWasm prover oracle engine for ECHIDNA — standalone domain types and engine"
repository = "https://github.com/hyperpolymath/echidna"

[dependencies]
anyhow = "1"
async-trait = "0.1"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
192 changes: 192 additions & 0 deletions docs/ROADMAP.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,192 @@
<!-- SPDX-License-Identifier: PMPL-1.0-or-later -->
<!-- SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk> -->

# ECHIDNA Roadmap

**Status**: canonical • supersedes the aspirational parts of
[`FUTURE_DEVELOPMENT_ROADMAP.md`](./FUTURE_DEVELOPMENT_ROADMAP.md).
**Last revised**: 2026‑04‑20
**Scope**: the shortest honest path from today's repo to the endpoint vision.

This document is the single source of truth for where ECHIDNA is going.
The domain‑specific plans under [`handover/`](./handover/) and
[`design/SPARK_ADOPTION_PLAN.md`](./design/SPARK_ADOPTION_PLAN.md) are
still authoritative for their sub‑areas; they fit *inside* this
roadmap as implementations of individual stages.

## 1. Endpoint

ECHIDNA, when complete, is the **reasoning substrate of the
hyperpolymath ecosystem** — not just a proof tool. It is:

* a unified interface to every mature theorem prover, SMT solver,
model checker, and type checker in public use;
* an ML layer (GNN + Transformer over Flux.jl) that actually learns
premise selection, tactic synthesis, and strategy routing well
enough to beat hand‑curated choices on held‑out theorems;
* a persistent E‑R layer (Verisim) that stores every proof, tactic
application, and version as a content‑addressed row and closes a
real learning loop over historical outcomes;
* a parallel dispatch layer (Chapel) that runs portfolio solves
across CPU cores and dispatches specialised work to coprocessors
(GPU / tensor / vector / FPGA / QPU);
* a high‑speed IPC fabric (Cap'n Proto) between the Rust core and
the Julia ML sidecar and between the backend and the frontend;
* a frontend in AffineScript + Deno (TEA architecture) with i18n via
the LOL locale repo;
* a 16‑endpoint Zig ABI that external consumers (Coursera labs,
other projects) link against, with an SPDX‑pinned, Guix‑built,
Chainguard‑based container managed by Stapeln and hardened by the
Vordr / Selur / Svalinn / Cerro‑Torre surround;
* itself formally verified — the trust kernel's critical invariants
are proved in SPARK and in Idris2 / Agda.

From a user's seat: **indistinguishable from magic for easy
obligations, honest about ambiguity for hard ones, never silently
wrong.**

## 2. Stage map

```
Stage 1 Train the model at all ML layer can receive signal
1a extractors emit premises [done 2026‑04‑20, 22 of 50 ✓]
1b align_premises joins correctly [done 2026‑04‑19 ✓]
1c residual extractor gaps fixed [in progress, 28 to go]
1d vocab covers the surface [255 K; target ~1 M]

Stage 2 Signal becomes useful MRR moves off 0.66 baseline
2a run training at scale hardware task
2b online vocab growth new feature
2c re‑baseline and publish metrics run gate

Stage 3 Learning loop closes model improves from outcomes
3a Verisim read paths cross‑prover queries by goal_hash
3b hypatia strategy loop wired `mv_prover_success_by_class` reads
3c Proof / TacticApplication / ProofVersion emission wired
[defined 2026‑04‑19 ✓, emission pending]

Stage 4 Interaction layer honest every declared ProverKind works
4a typed_wasm → crates/typed_wasm extract to standalone crate
4b 40 TypeChecker variant dispatch Sigma‑route via the crate
4c tactic synthesis template 37 suggest_tactics stubs replaced
4d search_theorems template 49 stubs replaced, query Verisim

Stage 5 Distributed & fast scale + specialised hardware
5a Cap'n Proto IPC Rust ↔ Julia, :8090
5b Chapel full integration dispatch.rs picks by config
5c coprocessor abstraction trait + tensor/vector/QPU/FPGA
5d Tier‑4 live‑CI provisioning 19 backends, Containerfile path

Stage 6 Cross‑prover semantics translation actually works
6a OpenTheory bridge real bidirectional term translation
6b Dedukti bridge real λΠ modulo as lingua franca
6c mathematical identity arbitration canonicalise across representations
6d Coursera delivery Zig ABI exposes capability

Stage 7 Sovereign tooling surround the rest of the ecosystem
7a Stapeln container lifecycle
7b Vordr entry gate
7c Selur scheduler
7d Svalinn trust boundary
7e Cerro‑Torre observability
7f AffineScript‑TEA frontend src/rescript/, ≥33 modules
7g LOL i18n locale/ + t!() macro
7h Zig ABI 16‑endpoint surface src/zig/echidna_abi.zig

Stage 8 Self‑verified ECHIDNA proves ECHIDNA
8a Idris2 meta‑proofs extend from 32 modules to full kernel
8b Agda meta‑proofs extend from 12 modules
8c SPARK‑verified trust kernel crates/echidna-core-spark/
8d panic‑attack hardened proptest + AFL++ nightly
```

## 3. Endpoint target — row by row

| Claim | Today | End‑state target |
|---|---|---|
| "Every important solver" | 59 backends; 37 `suggest_tactics` stubs; 49 `search_theorems` stubs | **65 backends, every one with real `suggest_tactics` (GNN‑ranked top‑k) and `search_theorems` (queries Verisim by `goal_hash`) — no `Ok(vec![])` returns anywhere** |
| "Vocab at 2.5 M" | 255 K | **~1 M canonical tokens** after Mathport + Iris + VST + Flyspeck + HoTT absorption, with **online growth** adding tokens during training |
| "Chapel fully supported" | 2 files, POC only | **`dispatch.rs` picks Chapel‑parallel dispatch by config**; runtime init + cancellation + error propagation wired; ≥1 OoM speedup on portfolio solves |
| "Cap'n Proto serialisation" | 0 `.capnp` files | **`crates/echidna-wire/`** contains schemas for ProofState / Goal / Tactic / EmbeddingRequest / RankingResponse; IPC on :8090 is Cap'n Proto; JSON retained only as debug fallback |
| "Vordr / Selur / Svalinn / Cerro‑Torre / Stapeln" | Not present | Each named as a versioned dependency in Cargo.toml or the container definition, wired into its role |
| "AffineScript‑TEA frontend" | Not present (10 `.res` files stub) | **`src/rescript/`** holds ≥33 AffineScript‑TEA modules, persistent Model → Msg → Update loop, talks to core over Cap'n Proto WebSocket |
| "LOL i18n" | Not present | **`locale/`** with LOL‑sourced translations; `t!()` macro for every user‑facing string |
| "Rust/SPARK base stable" | Rust only | **`crates/echidna-core-spark/`** holds the Ada/SPARK‑verified kernel (axiom scanning, trust‑level computation); proved free of runtime errors; called from Rust via C ABI |
| "16‑endpoint Zig ABI" | 4 shared libs, endpoint count unverified | **`src/zig/echidna_abi.zig`** exports exactly 16 functions (`echidna_prove`, `echidna_apply_tactic`, `echidna_rank_premises`, `echidna_verify_certificate`, `echidna_post_octad`, `echidna_query_identity`, …) with a versioned header |
| "100 K+ per prover" | Uneven; long tail <20 K | **Top 15 provers ≥ 100 K records each**; next 20 ≥ 25 K each; bottom 10 synthetically augmented to ≥ 10 K; `stats_UNIFIED.json` reports the distribution publicly |
| "Panic‑attacked" | No fuzz infra | **`benches/panic_attack/`** with proptest + AFL++ harnesses covering every parser, FFI boundary, IPC schema; nightly CI ≥ 1 h; zero known panic paths |
| "Prover count" | 59 real, 105 declared | **`ProverKind` = 65 variants, every one factory‑dispatched**; CLAUDE.md states the canonical count; no orphans |

## 4. Current sprint — "the smallest set that flips half the table"

Five concrete commitments. Landed together inside ~6 weeks they move
"Every important solver", "Vocab", "Prover count", half of "100 K+",
and start the self‑learning loop closure.

| # | Stage | Commitment | Agent tier |
|---|---|---|---|
| S1 | 1c | Finish the extractor premise‑emission wave (22 done; 28 remaining) | **Sonnet** (in flight) |
| S2 | 2a / 2c | Run training on the fixed corpus; record new MRR; commit `metrics_baseline.jsonl` | **Hardware / you** |
| S3 | 4a / 4b | Extract `typed_wasm` to `crates/typed_wasm/`; route 40 TypeChecker variants through Sigma parameters | **Opus‑design + Sonnet‑impl** |
| S4 | 3a / 3b | Wire Verisim cross‑prover read paths (`goal_hash` queries + `mv_prover_success_by_class` + hypatia loop) | **Opus‑design + Sonnet‑impl** |
| S5 | 4c / 4d (pilot) | Tactic synthesis template for 5 high‑value provers (coq, lean, agda, isabelle, z3) using the GNN | **Opus‑design + Sonnet‑impl** |

After S1–S5 land, the roadmap's next sprint takes up Stage 5 (IPC +
Chapel + Tier‑4 CI) and Stage 8 begins in parallel.

## 5. Agent‑tier guidance

**Opus** (supervisor): architecture, schema design, cross‑module
contracts, merge review. Touches: IPC protocol design, Verisim
schema additions, Chapel FFI shape, coprocessor trait, SPARK kernel
surface.

**Sonnet** (executor): focused implementation against a clear spec,
3–20 files, moderate complexity. Touches: per‑prover tactic
synthesis, Verisim read client, typed_wasm crate extraction,
adapter code, worker logic.

**Haiku** (bulk): template‑driven mechanical edits, CI YAML,
deletes, repetitive stubs. Touches: Tier‑4 provisioning shell
commands, extractor boilerplate, per‑prover factory branches that
follow a mould, localisation key additions.

## 6. Inheritance from existing planning docs

* **[`docs/FUTURE_DEVELOPMENT_ROADMAP.md`](FUTURE_DEVELOPMENT_ROADMAP.md)**
— 2026‑01‑29 vision document. Its chapters on RL for tactic
search (§1.1), active learning (§1.2), and distributed dispatch
map onto Stages 3c / 4c and 5b/5c of this roadmap. Retain as
background reading; this document is the source of truth for
sequencing.
* **[`docs/handover/PRODUCTION-WIRING-PLAN.md`](handover/PRODUCTION-WIRING-PLAN.md)**
— L1/L2/L3 sprint doc. L1 = Stage 5a. L2 = Stage 5b. L3 Wave‑1/2
are complete (in Stage 4 surface); Wave‑3 is partially complete in
this branch (tamarin/proverif/metamath/twelf/ortools); Wave‑4 is
Stage 5d.
* **[`docs/design/SPARK_ADOPTION_PLAN.md`](design/SPARK_ADOPTION_PLAN.md)**
— design for the SPARK‑verified kernel. Executes Stage 8c.
* **[`docs/handover/L1-CAPNPROTO-PROMPT.md`](handover/L1-CAPNPROTO-PROMPT.md)**
— implementation brief for Stage 5a. Use as the Sonnet prompt
when S6 begins.
* **[`docs/handover/L2-CHAPEL-PROMPT.md`](handover/L2-CHAPEL-PROMPT.md)**
— implementation brief for Stage 5b.
* **[`docs/handover/L3-LIVE-PROVER-CI-PROMPT.md`](handover/L3-LIVE-PROVER-CI-PROMPT.md)**
— implementation brief for Stage 5d.
* **[`docs/handover/STATE.md`](handover/STATE.md)** and
**[`.machine_readable/6a2/STATE.a2ml`](../.machine_readable/6a2/STATE.a2ml)**
— running log of session‑by‑session progress. Continue to use as
running state; sync highlights to §2 of this document at each
sprint boundary.

## 7. How to update this roadmap

* When a stage sub‑item lands, mark its line `[done YYYY‑MM‑DD ✓]`
in §2 and update the corresponding row of §3.
* When a new endpoint target emerges, add a row to §3, extend §2's
stage map if it doesn't already fit, and add it to the sprint
plan in §4 only once it's next‑up.
* Don't remove rows from §3 when they land — keep the history
visible, the "Today" column becomes "Done" and the "End‑state"
column stays as the invariant.
* Record non‑trivial decisions in this doc, not in chat.
8 changes: 8 additions & 0 deletions scripts/extract_abc.jl
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,14 @@ function run_extract()
"goal"=>"$(splitext(f)[2][2:end])_circuit size=$(sz)",
"kind"=>splitext(f)[2][2:end],
"context"=>Any[]))
# Emit premises: identifiers from circuit filename stem
stem = splitext(basename(f))[1]
for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", stem)
h = strip(hm.captures[1])
!isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
"proof_id"=>id, "premise"=>h,
"prover"=>"ABC", "theorem"=>basename(f), "source"=>"abc"))
end
id += 1
catch e; println("Warning: $f: $e"); end
end
Expand Down
34 changes: 34 additions & 0 deletions scripts/extract_acl2.jl
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__)))
const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "acl2")
const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data")
const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_acl2.jsonl")
const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_acl2.jsonl")
const STATS_FILE = joinpath(OUTPUT_DIR, "stats_acl2.json")
const START_ID = 92000

Expand Down Expand Up @@ -297,8 +298,17 @@ function run()::Tuple{Int,Int}
end
println(" Generated $added unique synthetic proofs")

# ACL2 premise patterns: :use hints, :in-theory enables, defthm names
acl2_hyp_patterns = [
r":use\s+\(?:?instance\s+([a-zA-Z0-9_-]+)",
r":use\s+\(([a-zA-Z0-9_-]+)",
r":in-theory\s+\(enable\s+([a-zA-Z0-9_\s-]+)\)",
r"\(defthm\s+([a-zA-Z0-9_-]+)",
]

current_id = START_ID
output_records = Dict{String,Any}[]
premises = Dict{String,Any}[]
for entry in all_entries
record = Dict{String,Any}(
"id" => current_id,
Expand All @@ -310,6 +320,25 @@ function run()::Tuple{Int,Int}
"source" => get(entry, "source", "acl2"),
)
push!(output_records, record)
# Emit premise records from hints/tactic_proof
proof_text = get(entry, "tactic_proof", "") * " " * string(get(entry, "hints", ""))
thm_name = entry["theorem"]
for hyp_pattern in acl2_hyp_patterns
for hyp_match in eachmatch(hyp_pattern, proof_text)
hyps = [strip(h) for h in split(hyp_match.captures[1], ' ')]
for hyp in hyps
if !isempty(hyp) && length(hyp) < 50
push!(premises, Dict{String,Any}(
"proof_id" => current_id,
"premise" => String(hyp),
"prover" => "ACL2",
"theorem" => thm_name,
"source" => get(entry, "source", "acl2"),
))
end
end
end
end
current_id += 1
end

Expand All @@ -318,6 +347,11 @@ function run()::Tuple{Int,Int}
println(fh, JSON3.write(rec))
end
end
open(PREMISES_FILE, "w") do fh
for p in premises
println(fh, JSON3.write(p))
end
end

stats = Dict{String,Any}(
"prover" => "ACL2",
Expand Down
7 changes: 7 additions & 0 deletions scripts/extract_acl2s.jl
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,13 @@ function extract_acl2s_proofs()
"theorem" => name, "goal" => statement,
"context" => Any[],
))
# Emit premises: ACL2s identifiers from theorem statement
for hm in eachmatch(r"\b([a-zA-Z][a-zA-Z0-9_\-]{1,40})\b", statement)
h = strip(hm.captures[1])
!isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}(
"proof_id"=>current_id, "premise"=>h,
"prover"=>"ACL2s", "theorem"=>name, "source"=>"acl2s"))
end
current_id += 1
end
end
Expand Down
Loading
Loading