diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index f1d0609..ed45a27 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -10,6 +10,70 @@ The following files in `.machine_readable/` contain structured project metadata: - `.machine_readable/6a2/PLAYBOOK.a2ml` - Operational runbook - `.machine_readable/bot_directives/*.a2ml` - Per-bot permission and scope rules +## Canonical Roadmap + +See [`docs/ROADMAP.md`](../docs/ROADMAP.md) for the single source of truth on +where ECHIDNA is going: the 8‑stage map, row‑by‑row endpoint targets, +current sprint (S1–S5), and agent‑tier guidance. Record decisions +there, not in chat. + +## Supervised Multi-Agent Execution + +When a supervising agent (Opus) plans work that will be implemented by +a focused executor (Sonnet) or a mechanical bulk agent (Haiku), the +supervisor MUST run a **Haiku scout pass** immediately before spawning +the implementation agent, unless the scope is strictly single‑file +and under ten lines of change. + +### Scout pattern + +**Purpose**: clear obvious cruft (unused imports, stale comments, +dead stubs, broken references) so the implementation agent arrives +to clean ground and does not burn tokens on trivial lint. + +**When to run**: once per upcoming step, after the step's scope is +locked and before the implementation agent is launched. + +**Prompt template** (copy this verbatim, fill in the file list): + +``` +Haiku scout, scope = + +1. Confirm each file compiles in isolation (cargo check --lib / julia -e). +2. Scan for: unused imports, dead code, `todo!()/FIXME/XXX`, + orphan type references, stale doc comments referring to + renamed symbols, broken `mod.rs` exports. +3. Trivially fix: + - unused imports (remove) + - obvious typos in comments + - unused `_var` renames +4. DO NOT fix — FLAG to me: + - any TODO with semantic content (may be load-bearing) + - anything requiring judgement about intent + - any cross-module change +5. Report: <20 lines. Green-light or list of blockers. 
+``` + +**Caveats**: + +- Scout is **not** a full audit. Full audits remain the periodic + cross‑module sweep (cf. the Agent A + Agent B pattern in session + history); scout is narrow prep for an imminent step. +- Scout must **not over‑fix**. TODO/FIXME comments sometimes + contain the spec for the upcoming work — flag, don't delete. +- Scout is **idempotent**. Re‑running on unchanged code should + produce the same report. +- If scout flags a blocker, the supervisor decides whether to + resolve it inline, expand the step's scope, or defer. The + implementation agent does not launch until the supervisor + green‑lights. + +**Commit convention** (when the scout applied trivial fixes): + +``` +chore(): scout-pass trivial cleanup ahead of +``` + ### CRITICAL: .scm metadata files are DEPRECATED **All `.scm` state/metadata files have been replaced by `.a2ml`.** diff --git a/.gitignore b/.gitignore index a8ee254..480256f 100644 --- a/.gitignore +++ b/.gitignore @@ -193,3 +193,6 @@ deps/ .cache/ build/ dist/ + +# Subagent worktrees (isolation mode) +.claude/worktrees/ diff --git a/crates/typed_wasm/Cargo.toml b/crates/typed_wasm/Cargo.toml new file mode 100644 index 0000000..c27f2d2 --- /dev/null +++ b/crates/typed_wasm/Cargo.toml @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: PMPL-1.0-or-later + +[package] +name = "typed-wasm" +version = "0.1.0" +edition = "2021" +authors = ["Jonathan D.A. 
Jewell "] +license = "PMPL-1.0-or-later" +description = "TypedWasm prover oracle engine for ECHIDNA — standalone domain types and engine" +repository = "https://github.com/hyperpolymath/echidna" + +[dependencies] +anyhow = "1" +async-trait = "0.1" +tokio = { version = "1", features = ["full"] } +serde = { version = "1", features = ["derive"] } +serde_json = "1" diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md new file mode 100644 index 0000000..4ddf763 --- /dev/null +++ b/docs/ROADMAP.md @@ -0,0 +1,192 @@ + + + +# ECHIDNA Roadmap + +**Status**: canonical • supersedes the aspirational parts of +[`FUTURE_DEVELOPMENT_ROADMAP.md`](./FUTURE_DEVELOPMENT_ROADMAP.md). +**Last revised**: 2026‑04‑20 +**Scope**: the shortest honest path from today's repo to the endpoint vision. + +This document is the single source of truth for where ECHIDNA is going. +The domain‑specific plans under [`handover/`](./handover/) and +[`design/SPARK_ADOPTION_PLAN.md`](./design/SPARK_ADOPTION_PLAN.md) are +still authoritative for their sub‑areas; they fit *inside* this +roadmap as implementations of individual stages. + +## 1. Endpoint + +ECHIDNA, when complete, is the **reasoning substrate of the +hyperpolymath ecosystem** — not just a proof tool. 
It is: + +* a unified interface to every mature theorem prover, SMT solver, + model checker, and type checker in public use; +* an ML layer (GNN + Transformer over Flux.jl) that actually learns + premise selection, tactic synthesis, and strategy routing well + enough to beat hand‑curated choices on held‑out theorems; +* a persistent E‑R layer (Verisim) that stores every proof, tactic + application, and version as a content‑addressed row and closes a + real learning loop over historical outcomes; +* a parallel dispatch layer (Chapel) that runs portfolio solves + across CPU cores and dispatches specialised work to coprocessors + (GPU / tensor / vector / FPGA / QPU); +* a high‑speed IPC fabric (Cap'n Proto) between the Rust core and + the Julia ML sidecar and between the backend and the frontend; +* a frontend in AffineScript + Deno (TEA architecture) with i18n via + the LOL locale repo; +* a 16‑endpoint Zig ABI that external consumers (Coursera labs, + other projects) link against, with an SPDX‑pinned, Guix‑built, + Chainguard‑based container managed by Stapeln and hardened by the + Vordr / Selur / Svalinn / Cerro‑Torre surround; +* itself formally verified — the trust kernel's critical invariants + are proved in SPARK and in Idris2 / Agda. + +From a user's seat: **indistinguishable from magic for easy +obligations, honest about ambiguity for hard ones, never silently +wrong.** + +## 2. 
Stage map + +``` +Stage 1 Train the model at all ML layer can receive signal + 1a extractors emit premises [done 2026‑04‑20, 22 of 50 ✓] + 1b align_premises joins correctly [done 2026‑04‑19 ✓] + 1c residual extractor gaps fixed [in progress, 28 to go] + 1d vocab covers the surface [255 K; target ~1 M] + +Stage 2 Signal becomes useful MRR moves off 0.66 baseline + 2a run training at scale hardware task + 2b online vocab growth new feature + 2c re‑baseline and publish metrics run gate + +Stage 3 Learning loop closes model improves from outcomes + 3a Verisim read paths cross‑prover queries by goal_hash + 3b hypatia strategy loop wired `mv_prover_success_by_class` reads + 3c Proof / TacticApplication / ProofVersion emission wired + [defined 2026‑04‑19 ✓, emission pending] + +Stage 4 Interaction layer honest every declared ProverKind works + 4a typed_wasm → crates/typed_wasm extract to standalone crate + 4b 40 TypeChecker variant dispatch Sigma‑route via the crate + 4c tactic synthesis template 37 suggest_tactics stubs replaced + 4d search_theorems template 49 stubs replaced, query Verisim + +Stage 5 Distributed & fast scale + specialised hardware + 5a Cap'n Proto IPC Rust ↔ Julia, :8090 + 5b Chapel full integration dispatch.rs picks by config + 5c coprocessor abstraction trait + tensor/vector/QPU/FPGA + 5d Tier‑4 live‑CI provisioning 19 backends, Containerfile path + +Stage 6 Cross‑prover semantics translation actually works + 6a OpenTheory bridge real bidirectional term translation + 6b Dedukti bridge real λΠ modulo as lingua franca + 6c mathematical identity arbitration canonicalise across representations + 6d Coursera delivery Zig ABI exposes capability + +Stage 7 Sovereign tooling surround the rest of the ecosystem + 7a Stapeln container lifecycle + 7b Vordr entry gate + 7c Selur scheduler + 7d Svalinn trust boundary + 7e Cerro‑Torre observability + 7f AffineScript‑TEA frontend src/rescript/, ≥33 modules + 7g LOL i18n locale/ + t!() macro + 7h Zig ABI 16‑endpoint 
surface src/zig/echidna_abi.zig + +Stage 8 Self‑verified ECHIDNA proves ECHIDNA + 8a Idris2 meta‑proofs extend from 32 modules to full kernel + 8b Agda meta‑proofs extend from 12 modules + 8c SPARK‑verified trust kernel crates/echidna-core-spark/ + 8d panic‑attack hardened proptest + AFL++ nightly +``` + +## 3. Endpoint target — row by row + +| Claim | Today | End‑state target | +|---|---|---| +| "Every important solver" | 59 backends; 37 `suggest_tactics` stubs; 49 `search_theorems` stubs | **65 backends, every one with real `suggest_tactics` (GNN‑ranked top‑k) and `search_theorems` (queries Verisim by `goal_hash`) — no `Ok(vec![])` returns anywhere** | +| "Vocab at 2.5 M" | 255 K | **~1 M canonical tokens** after Mathport + Iris + VST + Flyspeck + HoTT absorption, with **online growth** adding tokens during training | +| "Chapel fully supported" | 2 files, POC only | **`dispatch.rs` picks Chapel‑parallel dispatch by config**; runtime init + cancellation + error propagation wired; ≥1 OoM speedup on portfolio solves | +| "Cap'n Proto serialisation" | 0 `.capnp` files | **`crates/echidna-wire/`** contains schemas for ProofState / Goal / Tactic / EmbeddingRequest / RankingResponse; IPC on :8090 is Cap'n Proto; JSON retained only as debug fallback | +| "Vordr / Selur / Svalinn / Cerro‑Torre / Stapeln" | Not present | Each named as a versioned dependency in Cargo.toml or the container definition, wired into its role | +| "AffineScript‑TEA frontend" | Not present (10 `.res` files stub) | **`src/rescript/`** holds ≥33 AffineScript‑TEA modules, persistent Model → Msg → Update loop, talks to core over Cap'n Proto WebSocket | +| "LOL i18n" | Not present | **`locale/`** with LOL‑sourced translations; `t!()` macro for every user‑facing string | +| "Rust/SPARK base stable" | Rust only | **`crates/echidna-core-spark/`** holds the Ada/SPARK‑verified kernel (axiom scanning, trust‑level computation); proved free of runtime errors; called from Rust via C ABI | +| "16‑endpoint Zig 
ABI" | 4 shared libs, endpoint count unverified | **`src/zig/echidna_abi.zig`** exports exactly 16 functions (`echidna_prove`, `echidna_apply_tactic`, `echidna_rank_premises`, `echidna_verify_certificate`, `echidna_post_octad`, `echidna_query_identity`, …) with a versioned header | +| "100 K+ per prover" | Uneven; long tail <20 K | **Top 15 provers ≥ 100 K records each**; next 20 ≥ 25 K each; bottom 10 synthetically augmented to ≥ 10 K; `stats_UNIFIED.json` reports the distribution publicly | +| "Panic‑attacked" | No fuzz infra | **`benches/panic_attack/`** with proptest + AFL++ harnesses covering every parser, FFI boundary, IPC schema; nightly CI ≥ 1 h; zero known panic paths | +| "Prover count" | 59 real, 105 declared | **`ProverKind` = 65 variants, every one factory‑dispatched**; CLAUDE.md states the canonical count; no orphans | + +## 4. Current sprint — "the smallest set that flips half the table" + +Five concrete commitments. Landed together inside ~6 weeks they move +"Every important solver", "Vocab", "Prover count", half of "100 K+", +and start the self‑learning loop closure. + +| # | Stage | Commitment | Agent tier | +|---|---|---|---| +| S1 | 1c | Finish the extractor premise‑emission wave (22 done; 28 remaining) | **Sonnet** (in flight) | +| S2 | 2a / 2c | Run training on the fixed corpus; record new MRR; commit `metrics_baseline.jsonl` | **Hardware / you** | +| S3 | 4a / 4b | Extract `typed_wasm` to `crates/typed_wasm/`; route 40 TypeChecker variants through Sigma parameters | **Opus‑design + Sonnet‑impl** | +| S4 | 3a / 3b | Wire Verisim cross‑prover read paths (`goal_hash` queries + `mv_prover_success_by_class` + hypatia loop) | **Opus‑design + Sonnet‑impl** | +| S5 | 4c / 4d (pilot) | Tactic synthesis template for 5 high‑value provers (coq, lean, agda, isabelle, z3) using the GNN | **Opus‑design + Sonnet‑impl** | + +After S1–S5 land, the roadmap's next sprint takes up Stage 5 (IPC + +Chapel + Tier‑4 CI) and Stage 8 begins in parallel. + +## 5. 
Agent‑tier guidance + +**Opus** (supervisor): architecture, schema design, cross‑module +contracts, merge review. Touches: IPC protocol design, Verisim +schema additions, Chapel FFI shape, coprocessor trait, SPARK kernel +surface. + +**Sonnet** (executor): focused implementation against a clear spec, +3–20 files, moderate complexity. Touches: per‑prover tactic +synthesis, Verisim read client, typed_wasm crate extraction, +adapter code, worker logic. + +**Haiku** (bulk): template‑driven mechanical edits, CI YAML, +deletes, repetitive stubs. Touches: Tier‑4 provisioning shell +commands, extractor boilerplate, per‑prover factory branches that +follow a mould, localisation key additions. + +## 6. Inheritance from existing planning docs + +* **[`docs/FUTURE_DEVELOPMENT_ROADMAP.md`](FUTURE_DEVELOPMENT_ROADMAP.md)** + — 2026‑01‑29 vision document. Its chapters on RL for tactic + search (§1.1), active learning (§1.2), and distributed dispatch + map onto Stages 3c / 4c and 5b/5c of this roadmap. Retain as + background reading; this document is the source of truth for + sequencing. +* **[`docs/handover/PRODUCTION-WIRING-PLAN.md`](handover/PRODUCTION-WIRING-PLAN.md)** + — L1/L2/L3 sprint doc. L1 = Stage 5a. L2 = Stage 5b. L3 Wave‑1/2 + are complete (in Stage 4 surface); Wave‑3 is partially complete in + this branch (tamarin/proverif/metamath/twelf/ortools); Wave‑4 is + Stage 5d. +* **[`docs/design/SPARK_ADOPTION_PLAN.md`](design/SPARK_ADOPTION_PLAN.md)** + — design for the SPARK‑verified kernel. Executes Stage 8c. +* **[`docs/handover/L1-CAPNPROTO-PROMPT.md`](handover/L1-CAPNPROTO-PROMPT.md)** + — implementation brief for Stage 5a. Use as the Sonnet prompt + when S6 begins. +* **[`docs/handover/L2-CHAPEL-PROMPT.md`](handover/L2-CHAPEL-PROMPT.md)** + — implementation brief for Stage 5b. +* **[`docs/handover/L3-LIVE-PROVER-CI-PROMPT.md`](handover/L3-LIVE-PROVER-CI-PROMPT.md)** + — implementation brief for Stage 5d. 
+* **[`docs/handover/STATE.md`](handover/STATE.md)** and + **[`.machine_readable/6a2/STATE.a2ml`](../.machine_readable/6a2/STATE.a2ml)** + — running log of session‑by‑session progress. Continue to use as + running state; sync highlights to §2 of this document at each + sprint boundary. + +## 7. How to update this roadmap + +* When a stage sub‑item lands, mark its line `[done YYYY‑MM‑DD ✓]` + in §2 and update the corresponding row of §3. +* When a new endpoint target emerges, add a row to §3, extend §2's + stage map if it doesn't already fit, and add it to the sprint + plan in §4 only once it's next‑up. +* Don't remove rows from §3 when they land — keep the history + visible, the "Today" column becomes "Done" and the "End‑state" + column stays as the invariant. +* Record non‑trivial decisions in this doc, not in chat. diff --git a/scripts/extract_abc.jl b/scripts/extract_abc.jl index a6c5465..09711c2 100644 --- a/scripts/extract_abc.jl +++ b/scripts/extract_abc.jl @@ -43,6 +43,14 @@ function run_extract() "goal"=>"$(splitext(f)[2][2:end])_circuit size=$(sz)", "kind"=>splitext(f)[2][2:end], "context"=>Any[])) + # Emit premises: identifiers from circuit filename stem + stem = splitext(basename(f))[1] + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", stem) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"ABC", "theorem"=>basename(f), "source"=>"abc")) + end id += 1 catch e; println("Warning: $f: $e"); end end diff --git a/scripts/extract_acl2.jl b/scripts/extract_acl2.jl index 5f45d44..1b62c80 100644 --- a/scripts/extract_acl2.jl +++ b/scripts/extract_acl2.jl @@ -26,6 +26,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "acl2") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_acl2.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, 
"premises_acl2.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_acl2.json") const START_ID = 92000 @@ -297,8 +298,17 @@ function run()::Tuple{Int,Int} end println(" Generated $added unique synthetic proofs") + # ACL2 premise patterns: :use hints, :in-theory enables, defthm names + acl2_hyp_patterns = [ + r":use\s+\(?:?instance\s+([a-zA-Z0-9_-]+)", + r":use\s+\(([a-zA-Z0-9_-]+)", + r":in-theory\s+\(enable\s+([a-zA-Z0-9_\s-]+)\)", + r"\(defthm\s+([a-zA-Z0-9_-]+)", + ] + current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -310,6 +320,25 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "acl2"), ) push!(output_records, record) + # Emit premise records from hints/tactic_proof + proof_text = get(entry, "tactic_proof", "") * " " * string(get(entry, "hints", "")) + thm_name = entry["theorem"] + for hyp_pattern in acl2_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyps = [strip(h) for h in split(hyp_match.captures[1], ' ')] + for hyp in hyps + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "ACL2", + "theorem" => thm_name, + "source" => get(entry, "source", "acl2"), + )) + end + end + end + end current_id += 1 end @@ -318,6 +347,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end stats = Dict{String,Any}( "prover" => "ACL2", diff --git a/scripts/extract_acl2s.jl b/scripts/extract_acl2s.jl index 28f58ea..adbfa64 100644 --- a/scripts/extract_acl2s.jl +++ b/scripts/extract_acl2s.jl @@ -48,6 +48,13 @@ function extract_acl2s_proofs() "theorem" => name, "goal" => statement, "context" => Any[], )) + # Emit premises: ACL2s identifiers from theorem statement + for hm in eachmatch(r"\b([a-zA-Z][a-zA-Z0-9_\-]{1,40})\b", 
statement) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"ACL2s", "theorem"=>name, "source"=>"acl2s")) + end current_id += 1 end end diff --git a/scripts/extract_afp.jl b/scripts/extract_afp.jl index f534691..df9bc33 100644 --- a/scripts/extract_afp.jl +++ b/scripts/extract_afp.jl @@ -40,6 +40,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const AFP_ROOT = joinpath(REPO_ROOT, "external_corpora", "afp", "thys") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_afp.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_afp.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_afp.json") const START_ID = 220000 const MAX_PER_ARTICLE = 200 # target ~100K+ (893 articles, raw pool 217 593) @@ -302,8 +303,17 @@ function main() println("After per-article cap: $(length(all_recs)) (cap=$MAX_PER_ARTICLE)") println("Articles represented: $(length(per_article))") + # Isabelle premise patterns: assume/fix/obtain/case keywords + isabelle_hyp_patterns = [ + r"assume\s+\"([a-zA-Z0-9_\s]+)\"", + r"\bfix\s+([a-zA-Z0-9_\s]+)", + r"obtain\s+([a-zA-Z0-9_\s]+)\s+where", + r"\bcase\s+([a-zA-Z0-9_]+)", + ] + mkpath(OUTPUT_DIR) nid = START_ID + premises_out = Dict{String,Any}[] open(OUTPUT_FILE, "w") do fh for rec in all_recs delete!(rec, "_article") @@ -311,9 +321,34 @@ function main() rec["prover"] = "Isabelle" JSON3.write(fh, rec) println(fh) + # Emit premise records from tactic_proof text + proof_text = get(rec, "tactic_proof", "") + thm_name = get(rec, "theorem", "") + for hyp_pattern in isabelle_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyps = [strip(h) for h in split(hyp_match.captures[1], ',')] + for hyp in hyps + if !isempty(hyp) && length(hyp) < 50 + push!(premises_out, Dict{String,Any}( + "proof_id" => nid, + "premise" => String(hyp), + "prover" => "Isabelle", + "theorem" 
=> thm_name, + "source" => "afp", + )) + end + end + end + end nid += 1 end end + open(PREMISES_FILE, "w") do fh + for p in premises_out + JSON3.write(fh, p); println(fh) + end + end + println("Wrote $(length(premises_out)) premise records to $PREMISES_FILE") total = nid - START_ID println("Wrote $total records to $OUTPUT_FILE") diff --git a/scripts/extract_agda.jl b/scripts/extract_agda.jl index 2926ff7..fd7a287 100644 --- a/scripts/extract_agda.jl +++ b/scripts/extract_agda.jl @@ -43,8 +43,9 @@ using Dates const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") -const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_agda.jsonl") -const STATS_FILE = joinpath(OUTPUT_DIR, "stats_agda.json") +const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_agda.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_agda.jsonl") +const STATS_FILE = joinpath(OUTPUT_DIR, "stats_agda.json") const START_ID = 210000 # Multiple Agda corpus roots. Added 2026-04-18 (echidna#17): the @@ -426,9 +427,18 @@ function main() println("Files with keepable decls: $parsed_ok") println("Raw declarations: $(length(all_records))") + # Agda premise patterns: with/where bindings, pattern-lambda vars + agda_hyp_patterns = [ + r"with\s+([a-zA-Z0-9_]+)", + r"\bwhere\s+([a-zA-Z0-9_]+)\s*=", + r"let\s+([a-zA-Z0-9_]+)\s*=", + r"\bin\s+([a-zA-Z0-9_]+)\b", + ] + # Assign IDs and write JSONL. mkpath(OUTPUT_DIR) nid = START_ID + premises_out = Dict{String,Any}[] open(OUTPUT_FILE, "w") do fh for rec in all_records rec["id"] = nid @@ -438,9 +448,32 @@ function main() # referenced in rec["source"]. 
JSON3.write(fh, rec) println(fh) + # Emit premise records from the signature body (goal field) + sig_text = get(rec, "goal", "") + thm_name = get(rec, "theorem", "") + for hyp_pattern in agda_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, sig_text) + hyp = strip(hyp_match.captures[1]) + if !isempty(hyp) && length(hyp) < 50 + push!(premises_out, Dict{String,Any}( + "proof_id" => nid, + "premise" => String(hyp), + "prover" => "Agda", + "theorem" => thm_name, + "source" => "agda", + )) + end + end + end nid += 1 end end + open(PREMISES_FILE, "w") do fh + for p in premises_out + JSON3.write(fh, p); println(fh) + end + end + println("Wrote $(length(premises_out)) premise records to $PREMISES_FILE") total = nid - START_ID println("Wrote $total records to $OUTPUT_FILE") diff --git a/scripts/extract_alloy.jl b/scripts/extract_alloy.jl index eda3e9c..3404b14 100644 --- a/scripts/extract_alloy.jl +++ b/scripts/extract_alloy.jl @@ -50,6 +50,14 @@ function run_extract() "source_file"=>rel, "theorem"=>name, "goal"=>goal, "kind"=>kind, "context"=>Any[])) + # Emit premises: Alloy pred/sig names referenced in body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_]+)\b", goal) + h = strip(hm.captures[1]) + if !isempty(h) && length(h) < 50 + push!(pm, Dict{String,Any}("proof_id"=>id, "premise"=>h, + "prover"=>"Alloy", "theorem"=>name, "source"=>"alloy")) + end + end id += 1 end end diff --git a/scripts/extract_arend.jl b/scripts/extract_arend.jl index f784fb3..8be7b57 100644 --- a/scripts/extract_arend.jl +++ b/scripts/extract_arend.jl @@ -60,6 +60,14 @@ function run_extract() push!(ps, Dict{String,Any}("id"=>id, "prover"=>"arend", "source_file"=>rel, "theorem"=>name, "kind"=>kind, "goal"=>body, "context"=>Any[])) + # Emit premises: \\func/\\lemma names referenced in body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_'-]{1,40})\b", body) + h = strip(hm.captures[1]) + if !isempty(h) && length(h) < 50 + push!(pm, Dict{String,Any}("proof_id"=>id, "premise"=>h, + "prover"=>"arend", 
"theorem"=>name, "source"=>"arend")) + end + end id += 1 end end diff --git a/scripts/extract_athena.jl b/scripts/extract_athena.jl index cd54556..05f73a3 100644 --- a/scripts/extract_athena.jl +++ b/scripts/extract_athena.jl @@ -69,6 +69,14 @@ function run_extract() push!(ps, Dict{String,Any}("id"=>id, "prover"=>"athena", "source_file"=>rel, "theorem"=>name, "goal"=>body, "kind"=>kind, "context"=>Any[])) + # Emit premises: Athena define/assert identifiers in body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_'-]{1,40})\b", body) + h = strip(hm.captures[1]) + if !isempty(h) && length(h) < 50 + push!(pm, Dict{String,Any}("proof_id"=>id, "premise"=>h, + "prover"=>"athena", "theorem"=>name, "source"=>"athena")) + end + end id += 1 end end diff --git a/scripts/extract_boogie.jl b/scripts/extract_boogie.jl index 6638981..d98b57b 100644 --- a/scripts/extract_boogie.jl +++ b/scripts/extract_boogie.jl @@ -56,6 +56,14 @@ function extract_boogie_programs() const_pattern = r"const\s+(?:unique\s+)?(?:\{:[^}]*\}\s*)*([a-zA-Z0-9_.]+)\s*:\s*([^;]+);" type_pattern = r"type\s+(?:\{:[^}]*\}\s*)*([a-zA-Z0-9_.]+)\s*(?:<[^>]*>\s*)?(?:=([^;]+))?;" + # Boogie premise patterns: requires/ensures/modifies + called procedure names + boogie_hyp_patterns = [ + r"\brequires\s+([a-zA-Z][a-zA-Z0-9_.]*)\b", + r"\bensures\s+([a-zA-Z][a-zA-Z0-9_.]*)\b", + r"\binvariant\s+([a-zA-Z][a-zA-Z0-9_.]*)\b", + r"\bcall\s+([a-zA-Z][a-zA-Z0-9_.]*)\b", + ] + for f in bpl_files content = try read(f, String) @@ -78,6 +86,14 @@ function extract_boogie_programs() "source_file" => rel, "theorem" => name, "goal" => contract, "context" => Any[])) + for hyp_pattern in boogie_hyp_patterns + for hm in eachmatch(hyp_pattern, contract) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"Boogie", "theorem"=>name, "source"=>"boogie")) + end + end current_id += 1 end matches = try diff --git a/scripts/extract_cameleer.jl 
b/scripts/extract_cameleer.jl index 84c1cae..2c1a4af 100644 --- a/scripts/extract_cameleer.jl +++ b/scripts/extract_cameleer.jl @@ -49,6 +49,9 @@ function extract_cameleer_proofs() type_pat = r"type\s+([a-zA-Z0-9_']+)\s+(?:[^=]*?=\s*([^(]+?))?\s*(?:\(\*@\s*(.*?)\s*\*\))"s standalone_pat = r"\(\*@\s*(requires|ensures|modifies|consumes|invariant|variant|raises|diverges|equivalent)\s+(.+?)\s*\*\)"s + # Cameleer/GOSPEL premise patterns: requires/ensures identifiers + cameleer_hyp_pat = r"\b(requires|ensures|modifies|invariant|variant)\s+([a-zA-Z][a-zA-Z0-9_']*)\b" + for f in src_files content = try read(f, String) @@ -66,6 +69,12 @@ function extract_cameleer_proofs() "source_file" => rel, "theorem" => name, "goal" => spec, "kind" => "let_spec", "context" => Any[])) + for hm in eachmatch(cameleer_hyp_pat, spec) + h = strip(hm.captures[2]) + !isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"Cameleer", "theorem"=>name, "source"=>"cameleer")) + end current_id += 1 end for (pat, kind) in ((val_pat, "val"), (type_pat, "type")) diff --git a/scripts/extract_cbmc.jl b/scripts/extract_cbmc.jl index 3789da8..27a35f9 100644 --- a/scripts/extract_cbmc.jl +++ b/scripts/extract_cbmc.jl @@ -17,9 +17,18 @@ function run_extract() try c = read(f, String) for m in eachmatch(pat, c) + goal_text = strip(m.captures[1]) + thm_name = strip(m.captures[2]) push!(ps, Dict{String,Any}("id"=>id, "prover"=>"CBMC", "source_file"=>relpath(f, DIR), - "theorem"=>strip(m.captures[2]), "goal"=>strip(m.captures[1]), "context"=>Any[])) + "theorem"=>thm_name, "goal"=>goal_text, "context"=>Any[])) + # Emit premises: C identifiers in assert condition + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", goal_text) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"CBMC", "theorem"=>thm_name, "source"=>"cbmc")) + end id += 1 end catch e; 
println("Warning: $f: $e"); end diff --git a/scripts/extract_dafny.jl b/scripts/extract_dafny.jl index 5537576..0f47294 100644 --- a/scripts/extract_dafny.jl +++ b/scripts/extract_dafny.jl @@ -26,6 +26,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "dafny") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_dafny.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_dafny.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_dafny.json") const START_ID = 96000 @@ -367,8 +368,17 @@ function run()::Tuple{Int,Int} end println(" Generated $added unique synthetic proofs") + # Dafny premise patterns: requires/ensures/invariant/modifies clauses + dafny_hyp_patterns = [ + r"\brequires\s+([a-zA-Z][a-zA-Z0-9_]*)\b", + r"\bensures\s+([a-zA-Z][a-zA-Z0-9_]*)\b", + r"\binvariant\s+([a-zA-Z][a-zA-Z0-9_]*)\b", + r"\bmodifies\s+([a-zA-Z][a-zA-Z0-9_]*)\b", + ] + current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -380,6 +390,22 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "dafny"), ) push!(output_records, record) + spec_text = get(entry, "goal", "") * " " * get(entry, "tactic_proof", "") + thm_name = entry["theorem"] + for hyp_pattern in dafny_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, spec_text) + hyp = strip(hyp_match.captures[1]) + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "Dafny", + "theorem" => thm_name, + "source" => get(entry, "source", "dafny"), + )) + end + end + end current_id += 1 end @@ -388,6 +414,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end stats = 
Dict{String,Any}( "prover" => "Dafny", diff --git a/scripts/extract_dreal.jl b/scripts/extract_dreal.jl index 9bc880d..253014a 100644 --- a/scripts/extract_dreal.jl +++ b/scripts/extract_dreal.jl @@ -21,6 +21,13 @@ function run_extract() "source_file"=>relpath(f, DIR), "theorem"=>"dreal_$(id)", "goal"=>strip(m.captures[1]), "context"=>Any[])) + # Emit premises: SMT-LIB identifiers from assertion body + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", m.captures[1]) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"DReal", "theorem"=>"dreal_$(id)", "source"=>"dreal")) + end id += 1 end catch e; println("Warning: $f: $e"); end diff --git a/scripts/extract_framac.jl b/scripts/extract_framac.jl index 9b85678..f042c71 100644 --- a/scripts/extract_framac.jl +++ b/scripts/extract_framac.jl @@ -34,11 +34,19 @@ function run_extract() for m in matches body = first(strip(m.captures[2]), 800) isempty(body) && continue + thm_name = "$(m.captures[1])_$(id)" push!(ps, Dict{String,Any}("id"=>id, "prover"=>"FramaC", "source_file"=>rel, - "theorem"=>"$(m.captures[1])_$(id)", "goal"=>body, + "theorem"=>thm_name, "goal"=>body, "annotation_kind"=>strip(replace(m.captures[1], r"\s+"=>"_")), "context"=>Any[])) + # Emit premises: C identifiers in ACSL clause body + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", body) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"FramaC", "theorem"=>thm_name, "source"=>"framac")) + end id += 1 end matches = try diff --git a/scripts/extract_fstar.jl b/scripts/extract_fstar.jl index fb62db3..421461b 100644 --- a/scripts/extract_fstar.jl +++ b/scripts/extract_fstar.jl @@ -25,6 +25,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "fstar") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") 
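The extractor diffs above all follow one mould: scan the spec/proof text with per-prover clause regexes, take the first capture group as a premise name, filter out empty and over-long tokens (`length(h) < 50` in the Julia scripts), and emit one JSONL record per hit keyed by `proof_id`. A minimal Python sketch of that shared shape — the Dafny-style keywords are from the diffs above, but the function name and sample strings here are illustrative, not taken from the repo:

```python
import json
import re

# Clause keywords whose first identifier becomes a premise record
# (mirrors the dafny_hyp_patterns / fstar_hyp_patterns lists above).
HYP_PATTERNS = [
    re.compile(r"\brequires\s+([a-zA-Z][a-zA-Z0-9_]*)\b"),
    re.compile(r"\bensures\s+([a-zA-Z][a-zA-Z0-9_]*)\b"),
    re.compile(r"\binvariant\s+([a-zA-Z][a-zA-Z0-9_]*)\b"),
]

def emit_premises(proof_id, prover, theorem, spec_text, source):
    """One record per clause match, filtered like the Julia extractors."""
    records = []
    for pat in HYP_PATTERNS:
        for m in pat.finditer(spec_text):
            hyp = m.group(1).strip()
            if hyp and len(hyp) < 50:  # drop empties and regex over-matches
                records.append({
                    "proof_id": proof_id,
                    "premise": hyp,
                    "prover": prover,
                    "theorem": theorem,
                    "source": source,
                })
    return records

recs = emit_premises(96000, "Dafny", "AbsCorrect",
                     "requires x_nonneg ensures abs_ok invariant inv_holds",
                     "dafny")
print("\n".join(json.dumps(r) for r in recs))
```

The `< 50` length guard is doing real work in every extractor: the broad identifier regexes deliberately over-match, and the filter keeps that noise out of the premise vocabulary before `align_premises` joins the files.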
const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_fstar.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_fstar.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_fstar.json") const START_ID = 97000 @@ -425,8 +426,17 @@ function run()::Tuple{Int,Int} end println(" Generated $added unique synthetic proofs") + # F* premise patterns: let/requires/ensures bindings + fstar_hyp_patterns = [ + r"\blet\s+([a-zA-Z0-9_]+)\s*=", + r"\bassume\s+([a-zA-Z0-9_]+)\b", + r"\brequires\s+([a-zA-Z0-9_]+)\b", + r"\bensures\s+([a-zA-Z0-9_]+)\b", + ] + current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -438,6 +448,22 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "fstar"), ) push!(output_records, record) + proof_text = get(entry, "tactic_proof", "") * " " * get(entry, "goal", "") + thm_name = entry["theorem"] + for hyp_pattern in fstar_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyp = strip(hyp_match.captures[1]) + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "FStar", + "theorem" => thm_name, + "source" => get(entry, "source", "fstar"), + )) + end + end + end current_id += 1 end @@ -446,6 +472,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end stats = Dict{String,Any}( "prover" => "FStar", diff --git a/scripts/extract_hol4.jl b/scripts/extract_hol4.jl index bd724c5..6926916 100644 --- a/scripts/extract_hol4.jl +++ b/scripts/extract_hol4.jl @@ -24,6 +24,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "hol4") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, 
"proof_states_hol4.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_hol4.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_hol4.json") const START_ID = 91000 @@ -514,8 +515,18 @@ function run()::Tuple{Int,Int} end println(" Generated $(added) unique synthetic proofs") + # HOL4 premise patterns: DISCH_TAC/ASSUME_TAC constants, REWRITE_TAC thm names + hol4_hyp_patterns = [ + r"\bDISCH_TAC\b", + r"ASSUME_TAC\s+([A-Z_][A-Z0-9_]+)", + r"REWRITE_TAC\s*\[([A-Z_][A-Z0-9_\s,]+)\]", + r"metis_tac\s*\[([a-zA-Z_][a-zA-Z0-9_\s,]+)\]", + r"\brw\s*\[([a-zA-Z_][a-zA-Z0-9_\s,]+)\]", + ] + current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -527,6 +538,25 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "hol4"), ) push!(output_records, record) + proof_text = get(entry, "tactic_proof", "") + thm_name = entry["theorem"] + for hyp_pattern in hol4_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + raw = isempty(hyp_match.captures) || hyp_match.captures[1] === nothing ?
"" : hyp_match.captures[1] + hyps = [strip(h) for h in split(raw, ',')] + for hyp in hyps + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "HOL4", + "theorem" => thm_name, + "source" => get(entry, "source", "hol4"), + )) + end + end + end + end current_id += 1 end @@ -535,6 +565,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end stats = Dict{String,Any}( "prover" => "HOL4", diff --git a/scripts/extract_hol_light.jl b/scripts/extract_hol_light.jl index 4c245c8..89a9f9b 100644 --- a/scripts/extract_hol_light.jl +++ b/scripts/extract_hol_light.jl @@ -24,6 +24,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "hol_light") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_hol_light.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_hol_light.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_hol_light.json") const START_ID = 90000 @@ -513,9 +514,18 @@ function run()::Tuple{Int,Int} end println(" Generated $(added) unique synthetic proofs") + # HOL Light premise patterns: REWRITE_TAC/MESON_TAC/ASSUME theorem names + hol_light_hyp_patterns = [ + r"REWRITE_TAC\s*\[([A-Z_][A-Z0-9_\s,;]+)\]", + r"MESON_TAC\s*\[([A-Z_][A-Z0-9_\s,;]+)\]", + r"ASSUME\s+`([a-zA-Z][a-zA-Z0-9_']*)`", + r"\bASM_REWRITE_TAC\s*\[([A-Z_][A-Z0-9_\s,;]+)\]", + ] + # Assign IDs and normalise schema current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -527,6 +537,24 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "hol_light"), ) push!(output_records, record) + proof_text = get(entry, "tactic_proof", "") + thm_name = 
entry["theorem"] + for hyp_pattern in hol_light_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyps = [strip(h) for h in split(hyp_match.captures[1], r"[,;]")] + for hyp in hyps + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "HOLLight", + "theorem" => thm_name, + "source" => get(entry, "source", "hol_light"), + )) + end + end + end + end current_id += 1 end @@ -536,6 +564,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end # Write stats stats = Dict{String,Any}( diff --git a/scripts/extract_idris2.jl b/scripts/extract_idris2.jl index fe36f95..4384d4f 100644 --- a/scripts/extract_idris2.jl +++ b/scripts/extract_idris2.jl @@ -25,6 +25,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "idris2") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_idris2.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_idris2.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_idris2.json") const START_ID = 98000 @@ -415,8 +416,17 @@ function run()::Tuple{Int,Int} end println(" Generated $added unique synthetic proofs") + # Idris2 premise patterns: intro/case/let/Refl names + idris2_hyp_patterns = [ + r"\bcase\s+([a-zA-Z0-9_]+)\s+of", + r"\blet\s+([a-zA-Z0-9_]+)\s*=", + r"\b(Refl|LTEZero|LTESucc|FZ|FS)\b", + r"cong\s+([a-zA-Z0-9_]+)\b", + ] + current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -428,6 +438,22 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "idris2"), ) push!(output_records, record) + proof_text = get(entry, "tactic_proof", "") + thm_name = entry["theorem"] + for 
hyp_pattern in idris2_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyp = strip(hyp_match.captures[1]) + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "Idris2", + "theorem" => thm_name, + "source" => get(entry, "source", "idris2"), + )) + end + end + end current_id += 1 end @@ -436,6 +462,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end stats = Dict{String,Any}( "prover" => "Idris2", diff --git a/scripts/extract_isabelle_tropical.jl b/scripts/extract_isabelle_tropical.jl index e30261e..7c7aaf8 100644 --- a/scripts/extract_isabelle_tropical.jl +++ b/scripts/extract_isabelle_tropical.jl @@ -19,7 +19,7 @@ # Schema matches existing ECHIDNA corpus records: # { "id", "prover", "theorem", "goal", "context", "tactic_proof", "source" } -using JSON3 +using JSON3, Dates # --------------------------------------------------------------------------- # Configuration @@ -28,6 +28,7 @@ using JSON3 const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_isabelle.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_isabelle_tropical.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_isabelle_tropical.json") const START_ID = 200000 @@ -650,6 +651,7 @@ function run() # Emit JSONL current_id = START_ID + premises_records = Dict{String,Any}[] open(OUTPUT_FILE, "w") do fh for e in unique_entries record = Dict{String,Any}( @@ -662,15 +664,26 @@ function run() "source" => "tropical_resource_typing/$(e.source_file)", ) println(fh, JSON3.write(record)) + # Emit premises: Isabelle identifiers from goal + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b", e.goal) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && 
push!(premises_records, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"Isabelle", "theorem"=>e.name, "source"=>"isabelle_tropical")) + end current_id += 1 end end + open(PREMISES_FILE, "w") do fh + for p in premises_records; println(fh, JSON3.write(p)); end + end println("[Isabelle] Written to $(OUTPUT_FILE)") # Stats stats = Dict{String,Any}( "prover" => "Isabelle", "total_proofs" => length(unique_entries), + "premises_count" => length(premises_records), "from_thy_files" => length(unique_entries) - added, "synthetic_added" => added, "id_range" => [START_ID, current_id - 1], diff --git a/scripts/extract_isabelle_zf.jl b/scripts/extract_isabelle_zf.jl index fac394d..c24f5ad 100644 --- a/scripts/extract_isabelle_zf.jl +++ b/scripts/extract_isabelle_zf.jl @@ -56,6 +56,13 @@ function extract_isabelle_zf_proofs() "theorem" => name, "goal" => statement, "context" => Any[], )) + # Emit premises: Isabelle/ZF identifiers from theorem statement + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b", statement) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"IsabelleZF", "theorem"=>name, "source"=>"isabelle_zf")) + end current_id += 1 end end diff --git a/scripts/extract_key.jl b/scripts/extract_key.jl index 1ee94e4..f117876 100644 --- a/scripts/extract_key.jl +++ b/scripts/extract_key.jl @@ -51,6 +51,13 @@ function run_extract() "source_file"=>rel, "theorem"=>name, "goal"=>body, "kind"=>kind, "context"=>Any[])) + # Emit premises: Java/JML identifiers in body + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", body) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"KeY", "theorem"=>name, "source"=>"key")) + end id += 1 end end diff --git a/scripts/extract_lambda_prolog.jl b/scripts/extract_lambda_prolog.jl index d835f7e..f1a10b1 100644 --- 
a/scripts/extract_lambda_prolog.jl +++ b/scripts/extract_lambda_prolog.jl @@ -59,6 +59,13 @@ function run_extract() push!(ps, Dict{String,Any}("id"=>id, "prover"=>"lambda_prolog", "source_file"=>rel, "theorem"=>n, "goal"=>g, "kind"=>"clause", "context"=>Any[])) + # Emit premises: predicate names in clause args + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_']{1,40})\b", g) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"lambda_prolog", "theorem"=>n, "source"=>"lambda_prolog")) + end id += 1 end matches = try collect(eachmatch(decl_pat, c)) catch; Any[] end diff --git a/scripts/extract_mathcomp.jl b/scripts/extract_mathcomp.jl index a4eaebe..73e971f 100644 --- a/scripts/extract_mathcomp.jl +++ b/scripts/extract_mathcomp.jl @@ -11,7 +11,7 @@ # training_data/tactics_mathcomp.a2ml # training_data/stats_mathcomp.a2ml -using Dates +using Dates, JSON3 include("a2ml_emit.jl") using .A2MLEmit @@ -69,9 +69,19 @@ function extract_all(base_dir::String) proof_states = Dict{String,Any}[] tactics = Dict{String,Any}[] + premises = Dict{String,Any}[] skipped_noproof = 0 errors = 0 + # Coq/ssreflect premise patterns: intro/apply/have/pose/specialize + coq_hyp_patterns = [ + r"\bintro\s+([a-zA-Z0-9_']+)", + r"\bintros\s+([a-zA-Z0-9_'\s]+)", + r"\bhave\s+([a-zA-Z0-9_']+)\s*:", + r"\bpose\s+([a-zA-Z0-9_']+)\s*:=", + r"\bspecialize\s+\(?([a-zA-Z0-9_']+)\b", + ] + for (idx, fpath) in enumerate(files) content = try read(fpath, String) @@ -138,6 +148,25 @@ function extract_all(base_dir::String) "proof_steps" => length(steps), )) + # Emit premise records from proof body + proof_body_text = String(proof_body) + for hyp_pattern in coq_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_body_text) + hyps = [strip(h) for h in split(hyp_match.captures[1], ' ')] + for hyp in hyps + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => record_id, + "premise" => 
String(hyp), + "prover" => PROVER, + "theorem" => theorem, + "source" => "mathcomp", + )) + end + end + end + end + for (step_idx, tac) in enumerate(steps) push!(tactics, Dict{String,Any}( "proof_id" => record_id, @@ -158,15 +187,16 @@ function extract_all(base_dir::String) "total_files_scanned" => length(files), "total_proofs" => length(proof_states), "total_tactics" => length(tactics), + "total_premises" => length(premises), "skipped_no_proof" => skipped_noproof, "read_errors" => errors, "source" => "Mathematical Components (math-comp/math-comp)", "id_range" => id_range, ) - return proof_states, tactics, stats + return proof_states, tactics, premises, stats end -function save_results(proof_states, tactics, stats; output_dir="training_data") +function save_results(proof_states, tactics, premises, stats; output_dir="training_data") mkpath(output_dir) write_records_file( joinpath(output_dir, "proof_states_mathcomp.a2ml"), @@ -176,12 +206,17 @@ function save_results(proof_states, tactics, stats; output_dir="training_data") joinpath(output_dir, "tactics_mathcomp.a2ml"), stats, tactics, "tactic"; header="mathcomp tactic records (ssreflect tactics per step)") + open(joinpath(output_dir, "premises_mathcomp.jsonl"), "w") do fh + for p in premises + JSON3.write(fh, p); println(fh) + end + end open(joinpath(output_dir, "stats_mathcomp.a2ml"), "w") do fh println(fh, "# SPDX-License-Identifier: PMPL-1.0-or-later") println(fh, "# mathcomp extraction statistics"); println(fh) A2MLEmit.write_metadata_table(fh, stats) end - println("\nSaved $(length(proof_states)) proof states, $(length(tactics)) tactics -> $output_dir/*_mathcomp.a2ml") + println("\nSaved $(length(proof_states)) proof states, $(length(tactics)) tactics, $(length(premises)) premises -> $output_dir/*_mathcomp.*") end function main()::Int @@ -207,22 +242,25 @@ function main()::Int # Override extract_all to scan every root. 
proof_states = Dict{String,Any}[] tactics = Dict{String,Any}[] + premises = Dict{String,Any}[] stats = Dict{String,Any}() for root in roots - ps, ts, st = extract_all(root) + ps, ts, pm, st = extract_all(root) append!(proof_states, ps) append!(tactics, ts) + append!(premises, pm) stats = st # keep last end stats["total_proofs"] = length(proof_states) stats["total_tactics"] = length(tactics) + stats["total_premises"] = length(premises) stats["id_range"] = isempty(proof_states) ? "none" : "$(ID_BASE)-$(ID_BASE + length(proof_states) - 1)" if isempty(proof_states) println("\nWARNING: No proof states extracted.") return 1 end - save_results(proof_states, tactics, stats) + save_results(proof_states, tactics, premises, stats) println("\nTotal: $(stats["total_proofs"]) proofs / $(stats["total_tactics"]) tactics (IDs $(stats["id_range"]))") return 0 end diff --git a/scripts/extract_mathlib4.jl b/scripts/extract_mathlib4.jl index 8981ddf..6c2fc55 100644 --- a/scripts/extract_mathlib4.jl +++ b/scripts/extract_mathlib4.jl @@ -98,6 +98,32 @@ function extract_mathlib4_proofs() end end + # Extract premises/hypotheses from proof body + hyp_patterns = [ + r"intro\s+([a-zA-Z0-9_\s]+)", + r"intros\s+([a-zA-Z0-9_\s]+)", + r"⟨([a-zA-Z0-9_\s,]+)⟩", + r"let\s+([a-zA-Z0-9_]+)\s*:=", + r"have\s+([a-zA-Z0-9_]+)\s*:=", + ] + proof_text = !isnothing(by_match) ? 
strip(by_match.captures[1]) : "" + for hyp_pattern in hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyps = [strip(h) for h in split(hyp_match.captures[1], ',')] + for hyp in hyps + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "Lean", + "theorem" => name, + "source" => "mathlib4", + )) + end + end + end + + current_id += 1 end end diff --git a/scripts/extract_matita.jl b/scripts/extract_matita.jl index 6681a14..aa9652c 100644 --- a/scripts/extract_matita.jl +++ b/scripts/extract_matita.jl @@ -37,6 +37,13 @@ function run_extract() push!(ps, Dict{String,Any}("id"=>id, "prover"=>"matita", "source_file"=>rel, "theorem"=>name, "goal"=>goal, "kind"=>kind, "context"=>Any[])) + # Emit premises: Matita/Coq-style identifiers in goal type + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_']{1,40})\b", goal) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"matita", "theorem"=>name, "source"=>"matita")) + end id += 1 end matches = try collect(eachmatch(let_pattern, c)) catch; Any[] end diff --git a/scripts/extract_mercury.jl b/scripts/extract_mercury.jl index 80c43a3..3a71cde 100644 --- a/scripts/extract_mercury.jl +++ b/scripts/extract_mercury.jl @@ -36,6 +36,13 @@ function run_extract() if !isempty(n) push!(ps, Dict{String,Any}("id"=>id, "prover"=>"mercury", "source_file"=>relpath(f, DIR), "theorem"=>n, "goal"=>g, "context"=>Any[])) + # Emit premises: type names in predicate signature + for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", g) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"mercury", "theorem"=>n, "source"=>"mercury")) + end id += 1 end end diff --git a/scripts/extract_metamath.jl b/scripts/extract_metamath.jl index 5c03dd6..07a91bf 100644 ---
a/scripts/extract_metamath.jl +++ b/scripts/extract_metamath.jl @@ -79,11 +79,16 @@ Returns the number of theorems saved. function save_as_training_data(theorems::Vector{Theorem})::Int proof_states = Dict{String,Any}[] tactics = Dict{String,Any}[] + premises = Dict{String,Any}[] + + # Metamath premises are proof-step labels (space-separated identifiers in proof) + metamath_step_pat = r"\b([a-z][a-z0-9.-]{1,30})\b" for (i, theorem) in enumerate(theorems) + rec_id = i - 1 + 1000 # Proof state state = Dict{String,Any}( - "id" => i - 1 + 1000, # Start from 1000 to avoid conflicts + "id" => rec_id, "prover" => "Metamath", "theorem" => theorem.name, "goal" => theorem.statement, @@ -95,13 +100,28 @@ function save_as_training_data(theorems::Vector{Theorem})::Int # Tactic tactic = Dict{String,Any}( - "proof_id" => i - 1 + 1000, + "proof_id" => rec_id, "step" => 1, "tactic" => "metamath_prove", "prover" => "Metamath", "proof_text" => theorem.proof ) push!(tactics, tactic) + + # Emit premise records: each proof-step label is a referenced lemma + seen_steps = Set{String}() + for hyp_match in eachmatch(metamath_step_pat, theorem.proof) + step = hyp_match.captures[1] + step ∈ seen_steps && continue + push!(seen_steps, step) + step != theorem.name && push!(premises, Dict{String,Any}( + "proof_id" => rec_id, + "premise" => step, + "prover" => "Metamath", + "theorem" => theorem.name, + "source" => "Metamath", + )) + end end # Save to files @@ -120,6 +140,12 @@ function save_as_training_data(theorems::Vector{Theorem})::Int end end + open(joinpath(training_dir, "premises_metamath.jsonl"), "w") do f + for p in premises + println(f, JSON3.write(p)) + end + end + # Save stats stats = Dict{String,Any}( "version" => "v2.0-metamath-python", diff --git a/scripts/extract_minizinc.jl b/scripts/extract_minizinc.jl index 1551efa..60663d3 100644 --- a/scripts/extract_minizinc.jl +++ b/scripts/extract_minizinc.jl @@ -25,6 +25,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const 
EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "minizinc") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_minizinc.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_minizinc.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_minizinc.json") const START_ID = 99000 @@ -708,9 +709,13 @@ function run()::Tuple{Int,Int} # (model, solver) pair — 5× the per-solver coverage without any # new data. Synthetic entries retain their specific `solver` # assignment so sub-family statistics stay accurate. + # MiniZinc premise patterns: constraint/variable identifiers + mzn_hyp_pat = r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b" + solvers_cycle = ["MiniZinc", "Chuffed", "ORTools", "SCIP", "GLPK"] current_id = START_ID output_records = Dict{String,Any}[] + premises_out = Dict{String,Any}[] for entry in all_entries if haskey(entry, "solver") @@ -748,11 +753,29 @@ function run()::Tuple{Int,Int} end end + # Emit premises: constraint/variable names per record + for rec in output_records + goal_text = get(rec, "goal", "") + thm_name = get(rec, "theorem", "") + for hm in eachmatch(mzn_hyp_pat, goal_text) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(premises_out, Dict{String,Any}( + "proof_id"=>rec["id"], "premise"=>h, + "prover"=>get(rec, "prover", "MiniZinc"), + "theorem"=>thm_name, "source"=>"minizinc")) + end + end + open(OUTPUT_FILE, "w") do fh for rec in output_records println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises_out + println(fh, JSON3.write(p)) + end + end # Count per solver solver_counts = Dict{String,Int}() diff --git a/scripts/extract_mizar.jl b/scripts/extract_mizar.jl index 007a19d..fd4fb7b 100644 --- a/scripts/extract_mizar.jl +++ b/scripts/extract_mizar.jl @@ -25,6 +25,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "mizar") const OUTPUT_DIR = 
joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_mizar.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_mizar.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_mizar.json") const START_ID = 94000 @@ -455,8 +456,17 @@ function run()::Tuple{Int,Int} end println(" Generated $(added) unique synthetic proofs") + # Mizar premise patterns: assume/let/consider/per cases keywords + mizar_hyp_patterns = [ + r"\bassume\s+([A-Za-z][A-Za-z0-9_]*)\b", + r"\blet\s+([A-Za-z][A-Za-z0-9_]*)\b", + r"\bconsider\s+([A-Za-z][A-Za-z0-9_]*)\b", + r"\bset\s+([A-Za-z][A-Za-z0-9_]*)\s*=", + ] + current_id = START_ID output_records = Dict{String,Any}[] + premises = Dict{String,Any}[] for entry in all_entries record = Dict{String,Any}( "id" => current_id, @@ -468,6 +478,22 @@ function run()::Tuple{Int,Int} "source" => get(entry, "source", "mizar"), ) push!(output_records, record) + proof_text = get(entry, "tactic_proof", "") + thm_name = entry["theorem"] + for hyp_pattern in mizar_hyp_patterns + for hyp_match in eachmatch(hyp_pattern, proof_text) + hyp = strip(hyp_match.captures[1]) + if !isempty(hyp) && length(hyp) < 50 + push!(premises, Dict{String,Any}( + "proof_id" => current_id, + "premise" => String(hyp), + "prover" => "Mizar", + "theorem" => thm_name, + "source" => get(entry, "source", "mizar"), + )) + end + end + end current_id += 1 end @@ -476,6 +502,11 @@ function run()::Tuple{Int,Int} println(fh, JSON3.write(rec)) end end + open(PREMISES_FILE, "w") do fh + for p in premises + println(fh, JSON3.write(p)) + end + end stats = Dict{String,Any}( "prover" => "Mizar", diff --git a/scripts/extract_naproche.jl b/scripts/extract_naproche.jl index 41eea4d..23e3bfe 100644 --- a/scripts/extract_naproche.jl +++ b/scripts/extract_naproche.jl @@ -50,6 +50,13 @@ function run_extract() push!(ps, Dict{String,Any}("id"=>id, "prover"=>"naproche", "source_file"=>rel, "theorem"=>name, "kind"=>kind, "goal"=>body, "context"=>Any[])) + # Emit 
premises: ForTheL identifiers referenced in body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_]{1,40})\b", body) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"naproche", "theorem"=>name, "source"=>"naproche")) + end id += 1 end diff --git a/scripts/extract_nitpick.jl b/scripts/extract_nitpick.jl index 22bb356..51249bf 100644 --- a/scripts/extract_nitpick.jl +++ b/scripts/extract_nitpick.jl @@ -74,6 +74,13 @@ function run_extract() "source_file"=>rel, "theorem"=>theorem, "goal"=>first(lemma_body, 600), "opts"=>opts, "kind"=>kind, "context"=>Any[])) + # Emit premises: Isabelle identifiers from lemma body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b", lemma_body) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"nitpick", "theorem"=>theorem, "source"=>"nitpick")) + end id += 1 end end diff --git a/scripts/extract_nunchaku.jl b/scripts/extract_nunchaku.jl index b388793..400b99c 100644 --- a/scripts/extract_nunchaku.jl +++ b/scripts/extract_nunchaku.jl @@ -56,6 +56,13 @@ function run_extract() push!(ps, Dict{String,Any}("id"=>id, "prover"=>"nunchaku", "source_file"=>rel, "theorem"=>n, "goal"=>g, "kind"=>"goal", "context"=>Any[])) + # Emit premises: identifiers from goal body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b", g) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"nunchaku", "theorem"=>n, "source"=>"nunchaku")) + end id += 1 end else @@ -83,6 +90,13 @@ function run_extract() "source_file"=>rel, "theorem"=>theorem, "goal"=>first(lemma_body, 600), "opts"=>opts, "kind"=>kind, "context"=>Any[])) + # Emit premises: Isabelle identifiers from lemma body + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_']{1,40})\b", lemma_body) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 
&& push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"nunchaku", "theorem"=>theorem, "source"=>"nunchaku")) + end id += 1 end end diff --git a/scripts/extract_nusmv.jl b/scripts/extract_nusmv.jl index 4c1779a..7b463d6 100644 --- a/scripts/extract_nusmv.jl +++ b/scripts/extract_nusmv.jl @@ -20,6 +20,13 @@ function run_extract() "source_file"=>relpath(f, DIR), "theorem"=>"$(m.captures[1])_$(id)", "goal"=>strip(m.captures[2]), "spec_kind"=>m.captures[1], "context"=>Any[])) + # Emit premises: identifiers from CTL/LTL formula + for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", m.captures[2]) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"NuSMV", "theorem"=>"$(m.captures[1])_$(id)", "source"=>"nusmv")) + end id += 1 end catch e; println("Warning: $f: $e"); end diff --git a/scripts/extract_opentheory.jl b/scripts/extract_opentheory.jl index 79e72c6..fa73fbd 100644 --- a/scripts/extract_opentheory.jl +++ b/scripts/extract_opentheory.jl @@ -29,6 +29,7 @@ const OT_ROOT = joinpath(REPO_ROOT, "external_corpora", "opentheory") const OT_THEORIES = joinpath(OT_ROOT, "data", "theories") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_opentheory.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_opentheory.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_opentheory.json") const START_ID = 2_500_000 @@ -190,6 +191,7 @@ function save_results(packages::Vector, theorems::Vector) # aligned HOL-family provers listed — downstream VeriSim will # materialise per-prover alignment edges from this. 
current_id = START_ID + premises_records = Dict{String,Any}[] open(OUTPUT_FILE, "w") do fh for pkg in packages name = get(pkg, "name", "unknown") @@ -206,6 +208,13 @@ function save_results(packages::Vector, theorems::Vector) "source" => get(pkg, "__path", "opentheory/unknown.thy"), ) println(fh, JSON3.write(rec)) + # Emit premises: dotted name components from package name + for part in split(name, '.') + h = strip(part) + !isempty(h) && length(h) < 50 && push!(premises_records, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"OpenTheory", "theorem"=>name, "source"=>"opentheory")) + end current_id += 1 end for thm in theorems @@ -222,9 +231,21 @@ function save_results(packages::Vector, theorems::Vector) "source" => thm["source"], ) println(fh, JSON3.write(rec)) + # Emit premises: identifiers from theorem name and goal hint + for text in (thm["theorem"], thm["goal"]) + for hm in eachmatch(r"\b([A-Za-z][A-Za-z0-9_'.]{1,40})\b", text) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(premises_records, Dict{String,Any}( + "proof_id"=>current_id, "premise"=>h, + "prover"=>"OpenTheory", "theorem"=>thm["theorem"], "source"=>"opentheory")) + end + end current_id += 1 end end + open(PREMISES_FILE, "w") do fh + for p in premises_records; println(fh, JSON3.write(p)); end + end stats = Dict{String,Any}( "prover" => "OpenTheory", diff --git a/scripts/extract_prism.jl b/scripts/extract_prism.jl index ffdecbf..1c1aa07 100644 --- a/scripts/extract_prism.jl +++ b/scripts/extract_prism.jl @@ -48,6 +48,13 @@ function run_extract() "theorem"=>"prob_$(id)", "goal"=>goal, "kind"=>"property", "operator"=>String(m.captures[1]), "context"=>Any[])) + # Emit premises: identifiers from property formula + for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", goal) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"Prism", "theorem"=>"prob_$(id)", "source"=>"prism")) 
+ end id += 1 end for (pat, kind) in ((named_prop_pat, "named_property"), @@ -65,6 +72,13 @@ "source_file"=>rel, "theorem"=>name, "goal"=>goal, "kind"=>kind, "context"=>Any[])) + # Emit premises: identifiers from module/formula body + for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", goal) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"Prism", "theorem"=>name, "source"=>"prism")) + end id += 1 end end diff --git a/scripts/extract_proverif.jl b/scripts/extract_proverif.jl index 38f4d43..50960e7 100644 --- a/scripts/extract_proverif.jl +++ b/scripts/extract_proverif.jl @@ -21,6 +21,13 @@ function run_extract() "source_file"=>relpath(f, DIR), "theorem"=>"$(m.captures[1])_$(id)", "goal"=>strip(m.captures[2]), "query_kind"=>m.captures[1], "context"=>Any[])) + # Emit premises: identifiers from query formula + for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", m.captures[2]) + h = strip(hm.captures[1]) + !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}( + "proof_id"=>id, "premise"=>h, + "prover"=>"ProVerif", "theorem"=>"$(m.captures[1])_$(id)", "source"=>"proverif")) + end id += 1 end catch e; println("Warning: $f: $e"); end diff --git a/scripts/extract_pvs.jl b/scripts/extract_pvs.jl index 513da8c..6938857 100644 --- a/scripts/extract_pvs.jl +++ b/scripts/extract_pvs.jl @@ -25,6 +25,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__))) const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "pvs") const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data") const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_pvs.jsonl") +const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_pvs.jsonl") const STATS_FILE = joinpath(OUTPUT_DIR, "stats_pvs.json") const START_ID = 93000 @@ -283,8 +284,17 @@ function run()::Tuple{Int,Int} end println(" Generated $added unique synthetic proofs") + # PVS premise patterns: LET/ASK/USING
keywords + tactic names
+    pvs_hyp_patterns = [
+        r"\bLET\s+([a-zA-Z][a-zA-Z0-9_]*)\s*=",
+        r"\bASK\s+([a-zA-Z][a-zA-Z0-9_]*)\b",
+        r"\bUSING\s+([a-zA-Z][a-zA-Z0-9_]*)\b",
+        r"\b(grind|expand|simplify|rewrite|split|flatten)\b",
+    ]
+
     current_id = START_ID
     output_records = Dict{String,Any}[]
+    premises = Dict{String,Any}[]
     for entry in all_entries
         record = Dict{String,Any}(
             "id" => current_id,
@@ -296,6 +306,22 @@ function run()::Tuple{Int,Int}
             "source" => get(entry, "source", "pvs"),
         )
         push!(output_records, record)
+        proof_text = get(entry, "tactic_proof", "")
+        thm_name = entry["theorem"]
+        for hyp_pattern in pvs_hyp_patterns
+            for hyp_match in eachmatch(hyp_pattern, proof_text)
+                hyp = strip(hyp_match.captures[1])
+                if !isempty(hyp) && length(hyp) < 50
+                    push!(premises, Dict{String,Any}(
+                        "proof_id" => current_id,
+                        "premise" => String(hyp),
+                        "prover" => "PVS",
+                        "theorem" => thm_name,
+                        "source" => get(entry, "source", "pvs"),
+                    ))
+                end
+            end
+        end
         current_id += 1
     end
@@ -304,6 +330,11 @@ function run()::Tuple{Int,Int}
             println(fh, JSON3.write(rec))
         end
     end
+    open(PREMISES_FILE, "w") do fh
+        for p in premises
+            println(fh, JSON3.write(p))
+        end
+    end

     stats = Dict{String,Any}(
         "prover" => "PVS",
diff --git a/scripts/extract_sat_benchmarks.jl b/scripts/extract_sat_benchmarks.jl
index 153e345..81cacd2 100644
--- a/scripts/extract_sat_benchmarks.jl
+++ b/scripts/extract_sat_benchmarks.jl
@@ -38,6 +38,14 @@ function collect_for(prover::String, start_id::Int)
             "theorem"=>basename(f), "goal"=>"cnf vars=$nvar clauses=$nclause",
             "context"=>Any[]))
+        # Emit premises: identifiers from benchmark filename stem
+        stem = splitext(basename(f))[1]
+        for hm in eachmatch(r"\b([a-zA-Z][a-zA-Z0-9_]{1,40})\b", stem)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>prover, "theorem"=>basename(f), "source"=>"sat_benchmarks"))
+        end
         id += 1
     catch e; println("Warning: $f: $e"); end
 end
diff --git a/scripts/extract_seahorn.jl b/scripts/extract_seahorn.jl
index ee35613..12271f3 100644
--- a/scripts/extract_seahorn.jl
+++ b/scripts/extract_seahorn.jl
@@ -40,6 +40,13 @@ function run_extract()
             "source_file"=>rel, "theorem"=>"$(kind)_$(id)", "goal"=>body,
             "annotation_kind"=>kind, "context"=>Any[]))
+        # Emit premises: C identifiers from assertion body
+        for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", body)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"SeaHorn", "theorem"=>"$(kind)_$(id)", "source"=>"seahorn"))
+        end
         id += 1
     end
 end
diff --git a/scripts/extract_smtlib.jl b/scripts/extract_smtlib.jl
index c6ac113..2e33a78 100644
--- a/scripts/extract_smtlib.jl
+++ b/scripts/extract_smtlib.jl
@@ -11,7 +11,7 @@
 #   training_data/tactics_smtlib.a2ml
 #   training_data/stats_smtlib.a2ml

-using Dates
+using Dates, JSON3

 include("a2ml_emit.jl")
 using .A2MLEmit
@@ -240,6 +240,7 @@ function extract_all(base_dir::String)
     proof_states = Dict{String,Any}[]
     tactics = Dict{String,Any}[]
+    premises = Dict{String,Any}[]
     prover_counts = Dict{String,Int}(p => 0 for p in PROVERS)
     logic_counts = Dict{String,Int}()
     status_counts = Dict{String,Int}("sat" => 0, "unsat" => 0, "unknown" => 0)
@@ -313,6 +314,13 @@ function extract_all(base_dir::String)
             "synthetic_goal" => synthetic,
         )
         push!(proof_states, state)
+        # Emit premises: SMT-LIB identifiers from assertion goal
+        for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_\-]{1,40})\b", goal)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}(
+                "proof_id"=>record_id, "premise"=>h,
+                "prover"=>prover, "theorem"=>parsed["name"], "source"=>"smtlib"))
+        end
         prover_counts[prover] += 1
     end
@@ -363,7 +371,7 @@ function extract_all(base_dir::String)
         "id_range" => isempty(proof_states) ? "none" : "$(ID_BASE)-$(ID_BASE + length(proof_states) - 1)",
     )

-    return proof_states, tactics, stats
+    return proof_states, tactics, premises, stats
 end
@@ -372,7 +380,7 @@ end

 Write extraction results to JSONL / JSON files.
 """
-function save_results(proof_states, tactics, stats; output_dir="training_data")
+function save_results(proof_states, tactics, premises, stats; output_dir="training_data")
     mkpath(output_dir)

     write_records_file(
@@ -394,8 +402,13 @@ function save_results(proof_states, tactics, stats; output_dir="training_data")
         A2MLEmit.write_metadata_table(fh, stats)
     end

+    open(joinpath(output_dir, "premises_smtlib.jsonl"), "w") do fh
+        for p in premises; println(fh, JSON3.write(p)); end
+    end
+
     println("\nSaved $(length(proof_states)) proof states -> $(output_dir)/proof_states_smtlib.a2ml")
     println("Saved $(length(tactics)) tactics -> $(output_dir)/tactics_smtlib.a2ml")
+    println("Saved $(length(premises)) premises -> $(output_dir)/premises_smtlib.jsonl")
     println("Saved stats -> $(output_dir)/stats_smtlib.a2ml")
 end
@@ -416,14 +429,14 @@ function main()::Int
         return 1
     end

-    proof_states, tactics, stats = extract_all(base_dir)
+    proof_states, tactics, premises, stats = extract_all(base_dir)

     if isempty(proof_states)
         println("\nWARNING: No proof states extracted. Check corpus contents.")
         return 1
     end

-    save_results(proof_states, tactics, stats)
+    save_results(proof_states, tactics, premises, stats)

     println("\nProver distribution:")
     for (prover, count) in stats["prover_distribution"]
diff --git a/scripts/extract_spin.jl b/scripts/extract_spin.jl
index 97d3807..179c380 100644
--- a/scripts/extract_spin.jl
+++ b/scripts/extract_spin.jl
@@ -41,6 +41,13 @@ function run_extract()
         push!(ps, Dict{String,Any}("id"=>id, "prover"=>"SPIN",
             "source_file"=>rel, "theorem"=>name, "goal"=>goal, "kind"=>kind,
             "context"=>Any[]))
+        # Emit premises: identifiers from LTL/inline/mtype/define body
+        for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", goal)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"SPIN", "theorem"=>name, "source"=>"spin"))
+        end
         id += 1
     end
 end
diff --git a/scripts/extract_tamarin.jl b/scripts/extract_tamarin.jl
index 655c266..a0e75b6 100644
--- a/scripts/extract_tamarin.jl
+++ b/scripts/extract_tamarin.jl
@@ -20,6 +20,13 @@ function run_extract()
         push!(ps, Dict{String,Any}("id"=>id, "prover"=>"Tamarin",
             "source_file"=>relpath(f, DIR), "theorem"=>strip(m.captures[1]),
             "goal"=>strip(m.captures[2]), "context"=>Any[]))
+        # Emit premises: identifiers from lemma formula
+        for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", m.captures[2])
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"Tamarin", "theorem"=>strip(m.captures[1]), "source"=>"tamarin"))
+        end
         id += 1
     catch e; println("Warning: $f: $e"); end
diff --git a/scripts/extract_tlaps.jl b/scripts/extract_tlaps.jl
index b141742..592a8e0 100644
--- a/scripts/extract_tlaps.jl
+++ b/scripts/extract_tlaps.jl
@@ -45,6 +45,13 @@ function run_extract()
         push!(ps, Dict{String,Any}("id"=>id, "prover"=>"TLAPS",
             "source_file"=>rel, "theorem"=>name, "goal"=>body, "kind"=>kind,
             "context"=>Any[]))
+        # Emit premises: TLA+ identifiers from theorem body
+        for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", body)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"TLAPS", "theorem"=>name, "source"=>"tlaps"))
+        end
         id += 1
     end
 end
@@ -64,6 +71,13 @@ function run_extract()
             "source_file"=>rel, "theorem"=>name, "goal"=>body,
             "kind"=>"definition", "context"=>Any[]))
+        # Emit premises: TLA+ identifiers from definition body
+        for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", body)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"TLAPS", "theorem"=>name, "source"=>"tlaps"))
+        end
         id += 1
     end
 end
diff --git a/scripts/extract_tlc.jl b/scripts/extract_tlc.jl
index 981cffc..0e6c0b2 100644
--- a/scripts/extract_tlc.jl
+++ b/scripts/extract_tlc.jl
@@ -29,6 +29,13 @@ function run_extract()
         push!(ps, Dict{String,Any}("id"=>id, "prover"=>"TLC",
             "source_file"=>relpath(f, DIR), "theorem"=>strip(m.captures[2]),
             "goal"=>"$(m.captures[1]) $(m.captures[2])", "context"=>Any[]))
+        # Emit premises: identifiers from invariant/property name
+        for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", m.captures[2])
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"TLC", "theorem"=>strip(m.captures[2]), "source"=>"tlc"))
+        end
         id += 1
     catch e; println("Warning: $f: $e"); end
diff --git a/scripts/extract_tptp.jl b/scripts/extract_tptp.jl
index ed5e71a..cc09b31 100644
--- a/scripts/extract_tptp.jl
+++ b/scripts/extract_tptp.jl
@@ -11,7 +11,7 @@
 #   training_data/tactics_tptp.a2ml
 #   training_data/stats_tptp.a2ml

-using Dates
+using Dates, JSON3

 include("a2ml_emit.jl")
 using .A2MLEmit
@@ -162,6 +162,7 @@ function extract_all(base_dir::String)
     proof_states = Dict{String, Any}[]
     tactics = Dict{String, Any}[]
+    premises = Dict{String, Any}[]
     prover_counts = Dict(p => 0 for p in PROVERS)
     skipped = 0
     errors = 0
@@ -222,6 +223,14 @@ function extract_all(base_dir::String)
             "from_negated" => get(parsed, "from_negated", false),
         )
         push!(proof_states, state)
+        # Emit premises: TPTP identifiers from conjecture
+        conj = parsed["conjecture"]
+        for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", conj)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(premises, Dict{String,Any}(
+                "proof_id"=>record_id, "premise"=>h,
+                "prover"=>prover, "theorem"=>parsed["name"], "source"=>"tptp"))
+        end
         prover_counts[prover] += 1

         tactic = Dict{String, Any}(
@@ -255,15 +264,15 @@ function extract_all(base_dir::String)
         "id_range" => id_range,
     )

-    return proof_states, tactics, stats
+    return proof_states, tactics, premises, stats
 end

 """
-    save_results(proof_states, tactics, stats; output_dir="training_data")
+    save_results(proof_states, tactics, premises, stats; output_dir="training_data")

 Write extraction results to JSONL / JSON files.
 """
-function save_results(proof_states, tactics, stats; output_dir::String="training_data")
+function save_results(proof_states, tactics, premises, stats; output_dir::String="training_data")
     mkpath(output_dir)

     write_records_file(
@@ -286,8 +295,13 @@ function save_results(proof_states, tactics, stats; output_dir::String="training
         A2MLEmit.write_metadata_table(fh, stats)
     end

+    open(joinpath(output_dir, "premises_tptp.jsonl"), "w") do fh
+        for p in premises; println(fh, JSON3.write(p)); end
+    end
+
     println("\nSaved $(length(proof_states)) proof states -> $output_dir/proof_states_tptp.a2ml")
     println("Saved $(length(tactics)) tactics -> $output_dir/tactics_tptp.a2ml")
+    println("Saved $(length(premises)) premises -> $output_dir/premises_tptp.jsonl")
     println("Saved stats -> $output_dir/stats_tptp.a2ml")
 end

@@ -308,14 +322,14 @@ function main()::Int
         return 1
     end

-    proof_states, tactics, stats = extract_all(base_dir)
+    proof_states, tactics, premises, stats = extract_all(base_dir)

     if isempty(proof_states)
         println("\nWARNING: No proof states extracted. Check corpus contents.")
         return 1
     end

-    save_results(proof_states, tactics, stats)
+    save_results(proof_states, tactics, premises, stats)

     println("\nProver distribution:")
     for (prover, count) in stats["prover_distribution"]
diff --git a/scripts/extract_typechecker_ecosystem.jl b/scripts/extract_typechecker_ecosystem.jl
index 9e2da33..6bbcdf4 100644
--- a/scripts/extract_typechecker_ecosystem.jl
+++ b/scripts/extract_typechecker_ecosystem.jl
@@ -23,6 +23,7 @@ using JSON3
 const REPO_ROOT = dirname(dirname(abspath(@__FILE__)))
 const TRAINING_DIR = joinpath(REPO_ROOT, "training_data")
 const OUTPUT_FILE = joinpath(TRAINING_DIR, "proof_states_typechecker_ecosystem.jsonl")
+const PREMISES_FILE = joinpath(TRAINING_DIR, "premises_typechecker_ecosystem.jsonl")
 const STATS_FILE = joinpath(TRAINING_DIR, "stats_typechecker_ecosystem.json")
 const START_ID = 320000
 const TARGET_PER_TOOL = 220
@@ -368,10 +369,12 @@ function write_output(entries::Vector{Entry})::Dict{String,Int}
     prover_counts = Dict{String,Int}()
     mkpath(TRAINING_DIR)
+    premises_records = Dict{String,Any}[]
     open(OUTPUT_FILE, "w") do fh
         for (idx, e) in enumerate(entries)
+            record_id = START_ID + idx - 1
             record = Dict{String,Any}(
-                "id" => START_ID + idx - 1,
+                "id" => record_id,
                 "prover" => e.prover,
                 "theorem" => e.theorem,
                 "goal" => e.goal,
@@ -380,9 +383,19 @@ function write_output(entries::Vector{Entry})::Dict{String,Int}
                 "source" => e.source,
             )
             println(fh, JSON3.write(record))
+            # Emit premises: identifiers from typechecker goal snippet
+            for hm in eachmatch(r"\b([a-zA-Z][a-zA-Z0-9_\-]{1,40})\b", e.goal)
+                h = strip(hm.captures[1])
+                !isempty(h) && length(h) < 50 && push!(premises_records, Dict{String,Any}(
+                    "proof_id"=>record_id, "premise"=>h,
+                    "prover"=>e.prover, "theorem"=>e.theorem, "source"=>"typechecker_ecosystem"))
+            end
             prover_counts[e.prover] = get(prover_counts, e.prover, 0) + 1
         end
     end
+    open(PREMISES_FILE, "w") do fh
+        for p in premises_records; println(fh, JSON3.write(p)); end
+    end
     return prover_counts
 end
diff --git a/scripts/extract_typed_wasm.jl b/scripts/extract_typed_wasm.jl
index 763db8b..4ee0550 100644
--- a/scripts/extract_typed_wasm.jl
+++ b/scripts/extract_typed_wasm.jl
@@ -154,6 +154,14 @@ function run_extract()
         rec["id"] = id
         rec["prover"] = "TypedWasm"
         push!(ps, rec)
+        # Emit premises: Rust identifiers from proptest goal/args
+        goal_text = get(rec, "goal", "")
+        for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", goal_text)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"TypedWasm", "theorem"=>rec["theorem"], "source"=>"typed_wasm"))
+        end
         id += 1
         real_count += 1
     end
diff --git a/scripts/extract_uppaal.jl b/scripts/extract_uppaal.jl
index ae2b753..35713f3 100644
--- a/scripts/extract_uppaal.jl
+++ b/scripts/extract_uppaal.jl
@@ -21,6 +21,13 @@ function run_extract()
             "source_file"=>relpath(f, DIR), "theorem"=>"tctl_$(id)",
             "goal"=>strip(m.match), "context"=>Any[]))
+        # Emit premises: identifiers from TCTL query
+        for hm in eachmatch(r"\b([A-Za-z_][A-Za-z0-9_]{1,40})\b", m.match)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"UPPAAL", "theorem"=>"tctl_$(id)", "source"=>"uppaal"))
+        end
         id += 1
     catch e; println("Warning: $f: $e"); end
diff --git a/scripts/extract_viper.jl b/scripts/extract_viper.jl
index c0de304..5c468da 100644
--- a/scripts/extract_viper.jl
+++ b/scripts/extract_viper.jl
@@ -21,6 +21,13 @@ function run_extract()
         spec = m.captures[2] === nothing ? "" : strip(m.captures[2])
         push!(ps, Dict{String,Any}("id"=>id, "prover"=>"Viper",
             "source_file"=>relpath(f, DIR), "theorem"=>n, "goal"=>spec,
             "context"=>Any[]))
+        # Emit premises: identifiers from method spec (requires/ensures)
+        for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_]{1,40})\b", spec)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(pm, Dict{String,Any}(
+                "proof_id"=>id, "premise"=>h,
+                "prover"=>"Viper", "theorem"=>n, "source"=>"viper"))
+        end
         id += 1
     catch e; println("Warning: $f: $e"); end
diff --git a/scripts/extract_why3.jl b/scripts/extract_why3.jl
index 077d0ca..6323725 100644
--- a/scripts/extract_why3.jl
+++ b/scripts/extract_why3.jl
@@ -24,6 +24,7 @@ const REPO_ROOT = dirname(dirname(abspath(@__FILE__)))
 const EXTERNAL_DIR = joinpath(REPO_ROOT, "external_corpora", "why3")
 const OUTPUT_DIR = joinpath(REPO_ROOT, "training_data")
 const OUTPUT_FILE = joinpath(OUTPUT_DIR, "proof_states_why3.jsonl")
+const PREMISES_FILE = joinpath(OUTPUT_DIR, "premises_why3.jsonl")
 const STATS_FILE = joinpath(OUTPUT_DIR, "stats_why3.json")
 const START_ID = 95000
@@ -373,6 +374,7 @@ function run()::Tuple{Int,Int}
     current_id = START_ID
     output_records = Dict{String,Any}[]
+    premises_records = Dict{String,Any}[]
     for entry in all_entries
         record = Dict{String,Any}(
             "id" => current_id,
@@ -384,6 +386,14 @@ function run()::Tuple{Int,Int}
             "source" => get(entry, "source", "why3"),
         )
         push!(output_records, record)
+        # Emit premises: Why3 identifiers from goal body
+        goal_text = entry["goal"]
+        for hm in eachmatch(r"\b([a-zA-Z_][a-zA-Z0-9_']{1,40})\b", goal_text)
+            h = strip(hm.captures[1])
+            !isempty(h) && length(h) < 50 && push!(premises_records, Dict{String,Any}(
+                "proof_id"=>current_id, "premise"=>h,
+                "prover"=>"Why3", "theorem"=>entry["theorem"], "source"=>"why3"))
+        end
         current_id += 1
     end
@@ -393,9 +403,16 @@ function run()::Tuple{Int,Int}
         end
     end

+    open(PREMISES_FILE, "w") do fh
+        for p in premises_records
+            println(fh, JSON3.write(p))
+        end
+    end
+
     stats = Dict{String,Any}(
         "prover" => "Why3",
         "total_proofs" => length(output_records),
+        "premises_count" => length(premises_records),
         "extracted_from_source" => extracted_count,
         "synthetic_added" => length(output_records) - extracted_count,
         "id_range" => [START_ID, current_id - 1],
diff --git a/src/rust/provers/cvc5.rs.stub b/src/rust/provers/cvc5.rs.stub
deleted file mode 100644
index 9d275c4..0000000
--- a/src/rust/provers/cvc5.rs.stub
+++ /dev/null
@@ -1,25 +0,0 @@
-// SPDX-FileCopyrightText: 2025 ECHIDNA Project Team
-// SPDX-License-Identifier: MIT OR Palimpsest-0.6
-use async_trait::async_trait;
-use anyhow::{anyhow, Result};
-use std::path::PathBuf;
-use crate::core::{ProofState, Tactic, TacticResult};
-use crate::provers::{ProverBackend, ProverConfig, ProverKind};
-
-pub struct CVC5Backend { config: ProverConfig }
-impl CVC5Backend { pub fn new(config: ProverConfig) -> Self { CVC5Backend { config } } }
-
-#[async_trait]
-impl ProverBackend for CVC5Backend {
-    fn kind(&self) -> ProverKind { ProverKind::CVC5 }
-    async fn version(&self) -> Result<String> { Err(anyhow!("Not implemented")) }
-    async fn parse_file(&self, _path: PathBuf) -> Result<ProofState> { Err(anyhow!("Not implemented")) }
-    async fn parse_string(&self, _content: &str) -> Result<ProofState> { Err(anyhow!("Not implemented")) }
-    async fn apply_tactic(&self, _state: &ProofState, _tactic: &Tactic) -> Result<TacticResult> { Err(anyhow!("Not implemented")) }
-    async fn verify_proof(&self, _state: &ProofState) -> Result<bool> { Err(anyhow!("Not implemented")) }
-    async fn export(&self, _state: &ProofState) -> Result<String> { Err(anyhow!("Not implemented")) }
-    async fn suggest_tactics(&self, _state: &ProofState, _limit: usize) -> Result<Vec<Tactic>> { Err(anyhow!("Not implemented")) }
-    async fn search_theorems(&self, _pattern: &str) -> Result<Vec<String>> { Err(anyhow!("Not implemented")) }
-    fn config(&self) -> &ProverConfig { &self.config }
-    fn set_config(&mut self, config: ProverConfig) { self.config = config; }
-}
diff --git a/src/rust/provers/mizar_complete.rs
b/src/rust/provers/mizar_complete.rs deleted file mode 100644 index 3cc226f..0000000 --- a/src/rust/provers/mizar_complete.rs +++ /dev/null @@ -1,1320 +0,0 @@ -// SPDX-FileCopyrightText: 2025 ECHIDNA Project Team -// SPDX-License-Identifier: PMPL-1.0-or-later - -#![allow(dead_code)] - -//! Mizar theorem prover backend implementation -//! -//! Mizar is a Tier 2 prover (complexity 3/5) with natural-language-like syntax. -//! It features the Mizar Mathematical Library (MML), one of the largest collections -//! of formalized mathematics. -//! -//! Key features: -//! - Natural language proof style (let, assume, thus, per cases) -//! - Environ sections for article dependencies -//! - MML (Mizar Mathematical Library) integration -//! - Multi-phase verification (accommodation, analyzer) -//! - Extensive library of formalized mathematics - -use async_trait::async_trait; -use anyhow::{anyhow, Context, Result}; -use serde::{Deserialize, Serialize}; -use std::collections::HashMap; -use std::path::{Path, PathBuf}; -use std::process::Stdio; -use tokio::fs; -use tokio::io::AsyncWriteExt; -use tokio::process::Command; - -use crate::core::{ - Context as ProofContext, Definition, Goal, Hypothesis, ProofState, Tactic, TacticResult, - Term, Theorem, Variable, -}; -use crate::provers::{ProverBackend, ProverConfig, ProverKind}; - -/// Mizar prover backend -pub struct MizarBackend { - config: ProverConfig, - mml_path: PathBuf, -} - -impl MizarBackend { - /// Create a new Mizar backend - pub fn new(config: ProverConfig) -> Self { - let mml_path = std::env::var("MIZFILES") - .map(PathBuf::from) - .unwrap_or_else(|_| PathBuf::from("/usr/local/share/mizar")); - - MizarBackend { config, mml_path } - } - - /// Set the MML (Mizar Mathematical Library) path - pub fn with_mml_path(mut self, path: PathBuf) -> Self { - self.mml_path = path; - self - } - - /// Parse a Mizar article from string content - fn parse_article(&self, content: &str) -> Result { - let mut parser = MizarParser::new(content); 
- parser.parse() - } - - /// Run Mizar verifier on a file - async fn verify_file(&self, path: &Path) -> Result { - // Mizar verification is a two-phase process: - // 1. Accommodation (mizf) - processes environ directives - // 2. Analyzer (verifier) - type checks and verifies proofs - - // Phase 1: Accommodation - let mizf_output = self.run_mizf(path).await?; - if !mizf_output.success { - return Ok(MizarVerificationResult { - success: false, - errors: mizf_output.errors, - warnings: mizf_output.warnings, - }); - } - - // Phase 2: Verification - let verifier_output = self.run_verifier(path).await?; - - Ok(MizarVerificationResult { - success: verifier_output.success, - errors: verifier_output.errors, - warnings: verifier_output.warnings, - }) - } - - /// Run mizf (accommodation phase) - async fn run_mizf(&self, path: &Path) -> Result { - let mizf_path = self.config.executable.parent() - .unwrap_or_else(|| Path::new("/usr/local/bin")) - .join("mizf"); - - let output = Command::new(&mizf_path) - .arg(path) - .env("MIZFILES", &self.mml_path) - .stdout(Stdio::piped()) - .stderr(Stdio::piped()) - .output() - .await - .context("Failed to run mizf")?; - - let stdout = String::from_utf8_lossy(&output.stdout); - let stderr = String::from_utf8_lossy(&output.stderr); - - let (errors, warnings) = self.parse_mizar_messages(&stdout, &stderr); - - Ok(MizarPhaseOutput { - success: output.status.success() && errors.is_empty(), - errors, - warnings, - }) - } - - /// Run verifier (verification phase) - async fn run_verifier(&self, path: &Path) -> Result { - let verifier_path = self.config.executable.parent() - .unwrap_or_else(|| Path::new("/usr/local/bin")) - .join("verifier"); - - let output = Command::new(&verifier_path) - .arg(path) - .env("MIZFILES", &self.mml_path) - .stdout(Stdio::piped()) - .stderr(Stdio::piped()) - .output() - .await - .context("Failed to run verifier")?; - - let stdout = String::from_utf8_lossy(&output.stdout); - let stderr = 
String::from_utf8_lossy(&output.stderr); - - let (errors, warnings) = self.parse_mizar_messages(&stdout, &stderr); - - Ok(MizarPhaseOutput { - success: output.status.success() && errors.is_empty(), - errors, - warnings, - }) - } - - /// Parse Mizar error and warning messages - fn parse_mizar_messages(&self, stdout: &str, stderr: &str) -> (Vec, Vec) { - let mut errors = Vec::new(); - let mut warnings = Vec::new(); - - for line in stdout.lines().chain(stderr.lines()) { - if line.contains("error") || line.starts_with('*') { - if let Some(error) = self.parse_error_line(line) { - errors.push(error); - } - } else if line.contains("warning") { - warnings.push(line.to_string()); - } - } - - (errors, warnings) - } - - /// Parse a single error line - fn parse_error_line(&self, line: &str) -> Option { - if line.starts_with('*') { - let parts: Vec<&str> = line.split_whitespace().collect(); - if parts.len() >= 4 { - return Some(MizarError { - line: parts[1].parse().ok()?, - column: parts[2].parse().ok()?, - code: parts[3].to_string(), - message: parts[4..].join(" "), - }); - } - } - - let parts: Vec<&str> = line.splitn(4, ':').collect(); - if parts.len() == 4 { - return Some(MizarError { - line: parts[1].trim().parse().ok()?, - column: parts[2].trim().parse().ok()?, - code: String::new(), - message: parts[3].trim().to_string(), - }); - } - - None - } - - /// Convert Mizar term to universal Term - fn mizar_to_term(&self, mizar_term: &MizarTerm) -> Result { - match mizar_term { - MizarTerm::Variable(name) => Ok(Term::Var(name.clone())), - MizarTerm::Constant(name) => Ok(Term::Const(name.clone())), - MizarTerm::Application { func, args } => { - let func_term = Box::new(self.mizar_to_term(func)?); - let arg_terms: Result> = - args.iter().map(|a| self.mizar_to_term(a)).collect(); - Ok(Term::App { - func: func_term, - args: arg_terms?, - }) - } - MizarTerm::Quantifier { kind, var, var_type, body } => { - let var_type_term = self.mizar_to_term(var_type)?; - let body_term = 
self.mizar_to_term(body)?; - - match kind { - QuantifierKind::ForAll => Ok(Term::Pi { - param: var.clone(), - param_type: Box::new(var_type_term), - body: Box::new(body_term), - }), - QuantifierKind::Exists => { - Ok(Term::App { - func: Box::new(Term::Const("Exists".to_string())), - args: vec![Term::Lambda { - param: var.clone(), - param_type: Some(Box::new(var_type_term)), - body: Box::new(body_term), - }], - }) - } - } - } - MizarTerm::BinaryOp { op, left, right } => { - let left_term = self.mizar_to_term(left)?; - let right_term = self.mizar_to_term(right)?; - Ok(Term::App { - func: Box::new(Term::Const(op.to_string())), - args: vec![left_term, right_term], - }) - } - MizarTerm::UnaryOp { op, operand } => { - let operand_term = self.mizar_to_term(operand)?; - Ok(Term::App { - func: Box::new(Term::Const(op.to_string())), - args: vec![operand_term], - }) - } - } - } - - /// Convert universal Term to Mizar syntax - fn term_to_mizar(&self, term: &Term) -> String { - match term { - Term::Var(name) => name.clone(), - Term::Const(name) => name.clone(), - Term::App { func, args } => { - let func_str = self.term_to_mizar(func); - let args_str = args - .iter() - .map(|a| self.term_to_mizar(a)) - .collect::>() - .join(", "); - format!("{}({})", func_str, args_str) - } - Term::Lambda { param, param_type, body } => { - let type_str = param_type - .as_ref() - .map(|t| format!(" being {}", self.term_to_mizar(t))) - .unwrap_or_default(); - format!("λ {}{} . {}", param, type_str, self.term_to_mizar(body)) - } - Term::Pi { param, param_type, body } => { - format!( - "for {} being {} holds {}", - param, - self.term_to_mizar(param_type), - self.term_to_mizar(body) - ) - } - Term::Universe(level) => format!("Type{}", level), - Term::Meta(id) => format!("?{}", id), - Term::ProverSpecific { .. 
} => "".to_string(), - } - } - - /// Export proof state to Mizar article format - fn export_to_mizar(&self, state: &ProofState) -> Result { - let mut output = String::new(); - - output.push_str(":: Generated by ECHIDNA\n"); - output.push_str(":: Mizar article\n\n"); - - output.push_str("environ\n\n"); - output.push_str(" vocabularies SUBSET_1, XBOOLE_0, TARSKI;\n"); - output.push_str(" notations TARSKI, XBOOLE_0;\n"); - output.push_str(" constructors TARSKI, XBOOLE_0;\n"); - output.push_str(" registrations XBOOLE_0;\n\n"); - output.push_str("begin\n\n"); - - for theorem in &state.context.theorems { - output.push_str(&format!("theorem {}:\n", theorem.name)); - output.push_str(&format!(" {}\n", self.term_to_mizar(&theorem.statement))); - output.push_str("proof\n"); - output.push_str(" thus thesis;\n"); - output.push_str("end;\n\n"); - } - - for (i, goal) in state.goals.iter().enumerate() { - output.push_str(&format!(":: Goal {}\n", i + 1)); - output.push_str(&format!("theorem Goal{}:\n", i + 1)); - output.push_str(&format!(" {}\n", self.term_to_mizar(&goal.target))); - output.push_str("proof\n"); - output.push_str(" :: Proof to be completed\n"); - output.push_str(" thus thesis;\n"); - output.push_str("end;\n\n"); - } - - Ok(output) - } -} - -#[async_trait] -impl ProverBackend for MizarBackend { - fn kind(&self) -> ProverKind { - ProverKind::Mizar - } - - async fn version(&self) -> Result { - let output = Command::new(&self.config.executable) - .arg("--version") - .output() - .await - .context("Failed to get Mizar version")?; - - Ok(String::from_utf8_lossy(&output.stdout).trim().to_string()) - } - - async fn parse_file(&self, path: PathBuf) -> Result { - let content = fs::read_to_string(&path) - .await - .context("Failed to read Mizar file")?; - - self.parse_string(&content).await - } - - async fn parse_string(&self, content: &str) -> Result { - let article = self.parse_article(content)?; - - let mut goals = Vec::new(); - let mut context = ProofContext::default(); - - 
for theorem in &article.theorems { - let statement = self.mizar_to_term(&theorem.formula)?; - - if theorem.proof.is_none() || !theorem.is_proved { - goals.push(Goal { - id: theorem.name.clone(), - target: statement.clone(), - hypotheses: vec![], - }); - } - - context.theorems.push(Theorem { - name: theorem.name.clone(), - statement, - proof: None, - aspects: vec![], - }); - } - - for def in &article.definitions { - let def_type = self.mizar_to_term(&def.def_type)?; - let def_body = self.mizar_to_term(&def.body)?; - - context.definitions.push(Definition { - name: def.name.clone(), - ty: def_type, - body: def_body, - }); - } - - Ok(ProofState { - goals, - context, - proof_script: vec![], - metadata: HashMap::new(), - }) - } - - async fn apply_tactic(&self, state: &ProofState, tactic: &Tactic) -> Result { - let mut new_state = state.clone(); - - match tactic { - Tactic::Apply(theorem_name) => { - if new_state.goals.is_empty() { - return Ok(TacticResult::QED); - } - new_state.proof_script.push(tactic.clone()); - Ok(TacticResult::Success(new_state)) - } - Tactic::Intro(name) => { - if new_state.goals.is_empty() { - return Ok(TacticResult::QED); - } - - let goal = &mut new_state.goals[0]; - let hypothesis_name = name - .clone() - .unwrap_or_else(|| format!("H{}", goal.hypotheses.len())); - - if let Term::Pi { param, param_type, body } = &goal.target { - goal.hypotheses.push(Hypothesis { - name: hypothesis_name, - ty: *param_type.clone(), - body: None, - }); - goal.target = *body.clone(); - new_state.proof_script.push(tactic.clone()); - Ok(TacticResult::Success(new_state)) - } else { - Err(anyhow!("Cannot introduce: goal is not a Pi type")) - } - } - Tactic::Cases(term) => { - if new_state.goals.is_empty() { - return Ok(TacticResult::QED); - } - new_state.proof_script.push(tactic.clone()); - Ok(TacticResult::Success(new_state)) - } - Tactic::Assumption => { - if new_state.goals.is_empty() { - return Ok(TacticResult::QED); - } - - let goal = &new_state.goals[0]; - let 
found = goal - .hypotheses - .iter() - .any(|h| h.ty == goal.target); - - if found { - new_state.goals.remove(0); - new_state.proof_script.push(tactic.clone()); - if new_state.goals.is_empty() { - Ok(TacticResult::QED) - } else { - Ok(TacticResult::Success(new_state)) - } - } else { - Err(anyhow!("No matching hypothesis found")) - } - } - Tactic::Exact(term) => { - if new_state.goals.is_empty() { - return Ok(TacticResult::QED); - } - - let goal = &new_state.goals[0]; - if term == &goal.target { - new_state.goals.remove(0); - new_state.proof_script.push(tactic.clone()); - if new_state.goals.is_empty() { - Ok(TacticResult::QED) - } else { - Ok(TacticResult::Success(new_state)) - } - } else { - Err(anyhow!("Exact term does not match goal")) - } - } - Tactic::Custom { prover, command, args } => { - if prover != "mizar" { - return Err(anyhow!("Custom tactic not for Mizar")); - } - - match command.as_str() { - "thus" | "hence" => { - new_state.proof_script.push(tactic.clone()); - Ok(TacticResult::Success(new_state)) - } - "per_cases" => { - new_state.proof_script.push(tactic.clone()); - Ok(TacticResult::Success(new_state)) - } - _ => Err(anyhow!("Unknown Mizar tactic: {}", command)), - } - } - _ => { - new_state.proof_script.push(tactic.clone()); - Ok(TacticResult::Success(new_state)) - } - } - } - - async fn verify_proof(&self, state: &ProofState) -> Result { - if !state.is_complete() { - return Ok(false); - } - - let mizar_content = self.export_to_mizar(state)?; - - let temp_dir = std::env::temp_dir(); - let temp_file = temp_dir.join(format!("echidna_{}.miz", uuid::Uuid::new_v4())); - - let mut file = fs::File::create(&temp_file).await?; - file.write_all(mizar_content.as_bytes()).await?; - file.sync_all().await?; - drop(file); - - let result = self.verify_file(&temp_file).await?; - - let _ = fs::remove_file(&temp_file).await; - - Ok(result.success) - } - - async fn export(&self, state: &ProofState) -> Result { - self.export_to_mizar(state) - } - - async fn 
suggest_tactics(&self, state: &ProofState, limit: usize) -> Result> { - if state.goals.is_empty() { - return Ok(vec![]); - } - - let goal = &state.goals[0]; - let mut suggestions = Vec::new(); - - match &goal.target { - Term::Pi { .. } => { - suggestions.push(Tactic::Intro(None)); - } - Term::App { func, .. } => { - for theorem in &state.context.theorems { - suggestions.push(Tactic::Apply(theorem.name.clone())); - if suggestions.len() >= limit { - break; - } - } - } - _ => { - suggestions.push(Tactic::Assumption); - } - } - - suggestions.truncate(limit); - Ok(suggestions) - } - - async fn search_theorems(&self, pattern: &str) -> Result> { - let mml_dict = self.mml_path.join("mml.lar"); - if !mml_dict.exists() { - return Ok(vec![]); - } - - let content = fs::read_to_string(&mml_dict).await?; - let mut results = Vec::new(); - - for line in content.lines() { - if line.to_lowercase().contains(&pattern.to_lowercase()) { - results.push(line.to_string()); - if results.len() >= 100 { - break; - } - } - } - - Ok(results) - } - - fn config(&self) -> &ProverConfig { - &self.config - } - - fn set_config(&mut self, config: ProverConfig) { - self.config = config; - } -} - -// ============================================================================ -// Parser Implementation -// ============================================================================ - -#[derive(Debug, Clone)] -struct MizarArticle { - environ: MizarEnviron, - theorems: Vec, - definitions: Vec, - schemes: Vec, -} - -#[derive(Debug, Clone, Default)] -struct MizarEnviron { - vocabularies: Vec, - notations: Vec, - constructors: Vec, - registrations: Vec, - theorems: Vec, - requirements: Vec, -} - -#[derive(Debug, Clone)] -struct MizarTheorem { - name: String, - formula: MizarTerm, - proof: Option, - is_proved: bool, -} - -#[derive(Debug, Clone)] -struct MizarDefinition { - name: String, - def_type: MizarTerm, - body: MizarTerm, -} - -#[derive(Debug, Clone)] -struct MizarScheme { - name: String, - parameters: 
Vec, - formula: MizarTerm, -} - -#[derive(Debug, Clone)] -struct MizarProof { - steps: Vec, -} - -#[derive(Debug, Clone)] -enum MizarProofStep { - Let { vars: Vec, var_type: String }, - Assume { label: Option, formula: MizarTerm }, - Thus { formula: MizarTerm, justification: Option }, - Hence { formula: MizarTerm, justification: Option }, - PerCases { cases: Vec }, - Take { term: MizarTerm }, - Consider { vars: Vec, formula: MizarTerm, justification: Option }, -} - -#[derive(Debug, Clone, PartialEq)] -enum MizarTerm { - Variable(String), - Constant(String), - Application { - func: Box, - args: Vec, - }, - Quantifier { - kind: QuantifierKind, - var: String, - var_type: Box, - body: Box, - }, - BinaryOp { - op: String, - left: Box, - right: Box, - }, - UnaryOp { - op: String, - operand: Box, - }, -} - -#[derive(Debug, Clone, PartialEq)] -enum QuantifierKind { - ForAll, - Exists, -} - -struct MizarParser { - input: String, - pos: usize, -} - -impl MizarParser { - fn new(input: &str) -> Self { - MizarParser { - input: input.to_string(), - pos: 0, - } - } - - fn parse(&mut self) -> Result { - self.skip_whitespace_and_comments(); - - let environ = self.parse_environ()?; - - self.expect_keyword("begin")?; - - let mut theorems = Vec::new(); - let mut definitions = Vec::new(); - let mut schemes = Vec::new(); - - while self.pos < self.input.len() { - self.skip_whitespace_and_comments(); - if self.pos >= self.input.len() { - break; - } - - let keyword = self.peek_keyword(); - match keyword.as_deref() { - Some("theorem") => { - theorems.push(self.parse_theorem()?); - } - Some("definition") => { - definitions.push(self.parse_definition()?); - } - Some("scheme") => { - schemes.push(self.parse_scheme()?); - } - _ => break, - } - } - - Ok(MizarArticle { - environ, - theorems, - definitions, - schemes, - }) - } - - fn parse_environ(&mut self) -> Result { - self.expect_keyword("environ")?; - - let mut environ = MizarEnviron::default(); - - while !self.check_keyword("begin") { - 
            self.skip_whitespace_and_comments();

            let keyword = self.peek_keyword();
            match keyword.as_deref() {
                Some("vocabularies") => {
                    self.consume_keyword("vocabularies");
                    environ.vocabularies = self.parse_identifier_list()?;
                }
                Some("notations") => {
                    self.consume_keyword("notations");
                    environ.notations = self.parse_identifier_list()?;
                }
                Some("constructors") => {
                    self.consume_keyword("constructors");
                    environ.constructors = self.parse_identifier_list()?;
                }
                Some("registrations") => {
                    self.consume_keyword("registrations");
                    environ.registrations = self.parse_identifier_list()?;
                }
                Some("theorems") => {
                    self.consume_keyword("theorems");
                    environ.theorems = self.parse_identifier_list()?;
                }
                Some("requirements") => {
                    self.consume_keyword("requirements");
                    environ.requirements = self.parse_identifier_list()?;
                }
                _ => break,
            }
        }

        Ok(environ)
    }

    fn parse_identifier_list(&mut self) -> Result<Vec<String>> {
        let mut identifiers = Vec::new();

        loop {
            self.skip_whitespace_and_comments();

            if let Some(id) = self.parse_identifier() {
                identifiers.push(id);
                self.skip_whitespace_and_comments();
                if self.check_char(',') {
                    self.consume_char(',');
                } else {
                    break;
                }
            } else {
                break;
            }
        }

        self.expect_char(';')?;
        Ok(identifiers)
    }

    fn parse_theorem(&mut self) -> Result<MizarTheorem> {
        self.expect_keyword("theorem")?;

        let name = self.parse_identifier()
            .unwrap_or_else(|| format!("Theorem{}", self.pos));

        self.expect_char(':')?;

        let formula = self.parse_formula()?;

        let (proof, is_proved) = if self.check_keyword("proof") {
            (Some(self.parse_proof()?), true)
        } else {
            self.expect_char(';')?;
            (None, false)
        };

        Ok(MizarTheorem {
            name,
            formula,
            proof,
            is_proved,
        })
    }

    fn parse_definition(&mut self) -> Result<MizarDefinition> {
        self.expect_keyword("definition")?;

        let name = self.parse_identifier()
            .ok_or_else(|| anyhow!("Expected definition name"))?;

        let def_type =
            MizarTerm::Constant("Type".to_string());
        let body = MizarTerm::Constant("body".to_string());

        while !self.check_keyword("end") {
            self.pos += 1;
            if self.pos >= self.input.len() {
                break;
            }
        }
        if self.check_keyword("end") {
            self.expect_keyword("end")?;
            self.expect_char(';')?;
        }

        Ok(MizarDefinition {
            name,
            def_type,
            body,
        })
    }

    fn parse_scheme(&mut self) -> Result<MizarScheme> {
        self.expect_keyword("scheme")?;

        let name = self.parse_identifier()
            .ok_or_else(|| anyhow!("Expected scheme name"))?;

        let parameters = vec![];
        let formula = MizarTerm::Constant("scheme".to_string());

        while !self.check_keyword("end") {
            self.pos += 1;
            if self.pos >= self.input.len() {
                break;
            }
        }
        if self.check_keyword("end") {
            self.expect_keyword("end")?;
            self.expect_char(';')?;
        }

        Ok(MizarScheme {
            name,
            parameters,
            formula,
        })
    }

    fn parse_proof(&mut self) -> Result<MizarProof> {
        self.expect_keyword("proof")?;

        let mut steps = Vec::new();

        while !self.check_keyword("end") {
            self.skip_whitespace_and_comments();

            if self.pos >= self.input.len() || self.check_keyword("end") {
                break;
            }

            let keyword = self.peek_keyword();
            match keyword.as_deref() {
                Some("let") => {
                    steps.push(self.parse_let_step()?);
                }
                Some("assume") => {
                    steps.push(self.parse_assume_step()?);
                }
                Some("thus") => {
                    steps.push(self.parse_thus_step()?);
                }
                Some("hence") => {
                    steps.push(self.parse_hence_step()?);
                }
                Some("per") => {
                    steps.push(self.parse_per_cases()?);
                }
                Some("take") => {
                    steps.push(self.parse_take_step()?);
                }
                _ => {
                    self.pos += 1;
                }
            }
        }

        self.expect_keyword("end")?;
        self.expect_char(';')?;

        Ok(MizarProof { steps })
    }

    fn parse_let_step(&mut self) -> Result<MizarProofStep> {
        self.expect_keyword("let")?;

        let mut vars = Vec::new();
        vars.push(self.parse_identifier()
            .ok_or_else(|| anyhow!("Expected variable name"))?);

        while self.check_char(',') {
            self.consume_char(',');
            if let
                Some(var) = self.parse_identifier() {
                vars.push(var);
            }
        }

        self.expect_keyword("be")?;
        let var_type = self.parse_identifier()
            .ok_or_else(|| anyhow!("Expected type name"))?;

        self.expect_char(';')?;

        Ok(MizarProofStep::Let { vars, var_type })
    }

    fn parse_assume_step(&mut self) -> Result<MizarProofStep> {
        self.expect_keyword("assume")?;

        let label = if self.peek().is_some_and(|c| c.is_ascii_uppercase()) {
            let lbl = self.parse_identifier();
            if self.check_char(':') {
                self.consume_char(':');
                lbl
            } else {
                None
            }
        } else {
            None
        };

        let formula = self.parse_formula()?;
        self.expect_char(';')?;

        Ok(MizarProofStep::Assume { label, formula })
    }

    fn parse_thus_step(&mut self) -> Result<MizarProofStep> {
        self.expect_keyword("thus")?;

        let formula = self.parse_formula()?;

        let justification = if self.check_keyword("by") {
            self.consume_keyword("by");
            Some(self.parse_justification()?)
        } else {
            None
        };

        self.expect_char(';')?;

        Ok(MizarProofStep::Thus { formula, justification })
    }

    fn parse_hence_step(&mut self) -> Result<MizarProofStep> {
        self.expect_keyword("hence")?;

        let formula = self.parse_formula()?;

        let justification = if self.check_keyword("by") {
            self.consume_keyword("by");
            Some(self.parse_justification()?)
        } else {
            None
        };

        self.expect_char(';')?;

        Ok(MizarProofStep::Hence { formula, justification })
    }

    fn parse_per_cases(&mut self) -> Result<MizarProofStep> {
        self.expect_keyword("per")?;
        self.expect_keyword("cases")?;

        let mut cases = Vec::new();

        while self.check_keyword("suppose") {
            self.expect_keyword("suppose")?;

            while !self.check_keyword("end") && self.pos < self.input.len() {
                self.pos += 1;
            }

            cases.push(MizarProof { steps: vec![] });
        }

        Ok(MizarProofStep::PerCases { cases })
    }

    fn parse_take_step(&mut self) -> Result<MizarProofStep> {
        self.expect_keyword("take")?;

        let term = self.parse_term()?;
        self.expect_char(';')?;

        Ok(MizarProofStep::Take { term })
    }

    fn parse_formula(&mut self) -> Result<MizarTerm> {
        self.skip_whitespace_and_comments();

        if self.check_keyword("for") {
            return self.parse_for_formula();
        }

        self.parse_term()
    }

    fn parse_for_formula(&mut self) -> Result<MizarTerm> {
        self.expect_keyword("for")?;

        let var = self.parse_identifier()
            .ok_or_else(|| anyhow!("Expected variable"))?;

        let mut var_type = MizarTerm::Constant("set".to_string());

        if self.check_keyword("being") {
            self.consume_keyword("being");
            let type_name = self.parse_identifier()
                .ok_or_else(|| anyhow!("Expected type name"))?;
            var_type = MizarTerm::Constant(type_name);
        }

        self.expect_keyword("holds")?;

        let body = self.parse_formula()?;

        Ok(MizarTerm::Quantifier {
            kind: QuantifierKind::ForAll,
            var,
            var_type: Box::new(var_type),
            body: Box::new(body),
        })
    }

    fn parse_term(&mut self) -> Result<MizarTerm> {
        self.skip_whitespace_and_comments();

        if let Some(id) = self.parse_identifier() {
            self.skip_whitespace_and_comments();

            if let Some(op) = self.parse_binary_op() {
                let right = self.parse_term()?;
                return Ok(MizarTerm::BinaryOp {
                    op,
                    left: Box::new(MizarTerm::Variable(id)),
                    right: Box::new(right),
                });
            }

            if self.check_char('(') {
                self.consume_char('(');
                let mut args = vec![];

                if
                    !self.check_char(')') {
                    loop {
                        args.push(self.parse_term()?);
                        self.skip_whitespace_and_comments();
                        if self.check_char(',') {
                            self.consume_char(',');
                        } else {
                            break;
                        }
                    }
                }

                self.expect_char(')')?;

                return Ok(MizarTerm::Application {
                    func: Box::new(MizarTerm::Variable(id)),
                    args,
                });
            }

            if id.chars().next().unwrap().is_uppercase() {
                Ok(MizarTerm::Constant(id))
            } else {
                Ok(MizarTerm::Variable(id))
            }
        } else {
            Err(anyhow!("Expected term"))
        }
    }

    fn parse_binary_op(&mut self) -> Option<String> {
        self.skip_whitespace_and_comments();

        let ops = [
            ("=", "="),
            ("<>", "≠"),
            ("c=", "⊆"),
            ("\\/", "∪"),
            ("/\\", "∩"),
            ("\\", "\\"),
            ("&", "&"),
            ("or", "or"),
            ("implies", "⇒"),
            ("iff", "⇔"),
            ("+", "+"),
            ("-", "-"),
            ("*", "*"),
            ("/", "/"),
            ("<=", "≤"),
            (">=", "≥"),
            ("<", "<"),
            (">", ">"),
        ];

        for (pattern, op) in ops {
            if self.input[self.pos..].starts_with(pattern) {
                self.pos += pattern.len();
                return Some(op.to_string());
            }
        }

        None
    }

    fn parse_justification(&mut self) -> Result<String> {
        let mut justification = String::new();

        loop {
            self.skip_whitespace_and_comments();

            if self.check_char(';') {
                break;
            }

            if let Some(id) = self.parse_identifier() {
                justification.push_str(&id);
                justification.push(' ');
            } else if self.check_char(':') {
                self.consume_char(':');
                justification.push(':');
            } else if self.check_char(',') {
                self.consume_char(',');
                justification.push(',');
                justification.push(' ');
            } else {
                break;
            }
        }

        Ok(justification.trim().to_string())
    }

    fn parse_identifier(&mut self) -> Option<String> {
        self.skip_whitespace_and_comments();

        let start = self.pos;
        while self.pos < self.input.len() {
            let ch = self.input.as_bytes()[self.pos] as char;
            if ch.is_alphanumeric() || ch == '_' {
                self.pos += 1;
            } else {
                break;
            }
        }

        if self.pos > start {
            Some(self.input[start..self.pos].to_string())
        } else {
            None
        }
    }

    fn
        peek_keyword(&self) -> Option<String> {
        let mut temp_pos = self.pos;

        while temp_pos < self.input.len() && self.input.as_bytes()[temp_pos].is_ascii_whitespace() {
            temp_pos += 1;
        }

        let start = temp_pos;
        while temp_pos < self.input.len() {
            let ch = self.input.as_bytes()[temp_pos] as char;
            if ch.is_alphanumeric() || ch == '_' {
                temp_pos += 1;
            } else {
                break;
            }
        }

        if temp_pos > start {
            Some(self.input[start..temp_pos].to_string())
        } else {
            None
        }
    }

    fn check_keyword(&self, keyword: &str) -> bool {
        self.peek_keyword().as_deref() == Some(keyword)
    }

    fn consume_keyword(&mut self, keyword: &str) {
        self.skip_whitespace_and_comments();
        if self.input[self.pos..].starts_with(keyword) {
            self.pos += keyword.len();
        }
    }

    fn expect_keyword(&mut self, keyword: &str) -> Result<()> {
        self.skip_whitespace_and_comments();
        if self.pos + keyword.len() > self.input.len() || !self.input[self.pos..].starts_with(keyword) {
            return Err(anyhow!("Expected keyword '{}' at position {}", keyword, self.pos));
        }
        self.pos += keyword.len();
        Ok(())
    }

    fn peek(&self) -> Option<char> {
        if self.pos < self.input.len() {
            Some(self.input.as_bytes()[self.pos] as char)
        } else {
            None
        }
    }

    fn check_char(&self, ch: char) -> bool {
        self.peek() == Some(ch)
    }

    fn consume_char(&mut self, ch: char) {
        if self.peek() == Some(ch) {
            self.pos += 1;
        }
    }

    fn expect_char(&mut self, ch: char) -> Result<()> {
        self.skip_whitespace_and_comments();
        if self.peek() != Some(ch) {
            return Err(anyhow!("Expected '{}' at position {}, found {:?}", ch, self.pos, self.peek()));
        }
        self.pos += 1;
        Ok(())
    }

    fn skip_whitespace_and_comments(&mut self) {
        while self.pos < self.input.len() {
            let ch = self.input.as_bytes()[self.pos] as char;

            if ch.is_whitespace() {
                self.pos += 1;
            } else if self.input[self.pos..].starts_with("::") {
                while self.pos < self.input.len() && self.input.as_bytes()[self.pos] as char != '\n' {
                    self.pos += 1;
                }
            } else {
                break;
            }
        }
    }
}

#[derive(Debug, Clone)]
struct MizarVerificationResult {
    success: bool,
    errors: Vec<MizarError>,
    warnings: Vec<String>,
}

#[derive(Debug, Clone)]
struct MizarPhaseOutput {
    success: bool,
    errors: Vec<MizarError>,
    warnings: Vec<String>,
}

#[derive(Debug, Clone)]
struct MizarError {
    line: usize,
    column: usize,
    code: String,
    message: String,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_mizar_parser_basic() {
        let content = r#"
environ
  vocabularies SUBSET_1, XBOOLE_0;
  notations XBOOLE_0;
begin

theorem Th1:
  for P being set holds P = P
proof
  let P be set;
  thus P = P;
end;
"#;

        let mut parser = MizarParser::new(content);
        let article = parser.parse().unwrap();

        assert_eq!(article.theorems.len(), 1);
        assert_eq!(article.theorems[0].name, "Th1");
    }

    #[test]
    fn test_mizar_backend_creation() {
        let config = ProverConfig::default();
        let backend = MizarBackend::new(config);
        assert_eq!(backend.kind(), ProverKind::Mizar);
    }
}
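`MizarParser` above is a hand-rolled cursor-and-peek recursive-descent parser. A minimal self-contained sketch of the same pattern, stripped to a toy `for <var> holds <var> = <var>` fragment (the `TinyParser` name and grammar are hypothetical, not part of this crate):

```rust
// Sketch of the cursor-and-peek recursive-descent pattern used by
// MizarParser. ASCII-oriented: the cursor advances byte by byte.
struct TinyParser<'a> {
    input: &'a str,
    pos: usize,
}

impl<'a> TinyParser<'a> {
    fn new(input: &'a str) -> Self {
        TinyParser { input, pos: 0 }
    }

    // Skip ASCII whitespace, advancing the cursor.
    fn skip_ws(&mut self) {
        while self.input[self.pos..].starts_with(|c: char| c.is_ascii_whitespace()) {
            self.pos += 1;
        }
    }

    // Consume an expected literal token, or fail with its position.
    fn expect(&mut self, tok: &str) -> Result<(), String> {
        self.skip_ws();
        if self.input[self.pos..].starts_with(tok) {
            self.pos += tok.len();
            Ok(())
        } else {
            Err(format!("expected {:?} at {}", tok, self.pos))
        }
    }

    // Read one alphanumeric/underscore identifier.
    fn ident(&mut self) -> Result<String, String> {
        self.skip_ws();
        let start = self.pos;
        while self.input[self.pos..].starts_with(|c: char| c.is_ascii_alphanumeric() || c == '_') {
            self.pos += 1;
        }
        if self.pos > start {
            Ok(self.input[start..self.pos].to_string())
        } else {
            Err(format!("expected identifier at {}", self.pos))
        }
    }

    // Parse `for x holds x = x` into (var, lhs, rhs).
    fn parse_forall(&mut self) -> Result<(String, String, String), String> {
        self.expect("for")?;
        let var = self.ident()?;
        self.expect("holds")?;
        let lhs = self.ident()?;
        self.expect("=")?;
        let rhs = self.ident()?;
        Ok((var, lhs, rhs))
    }
}

fn main() {
    let mut p = TinyParser::new("for x holds x = x");
    let (var, lhs, rhs) = p.parse_forall().expect("parse failed");
    println!("{var} {lhs} {rhs}"); // prints "x x x"
}
```

Like the full parser, the sketch keeps a single byte cursor and lets every production skip leading whitespace itself, which is what makes the `check_*`/`expect_*`/`consume_*` helpers composable without a separate tokenizer.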