damborik22 · Apr 24, 2026 · Apr 23, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -16,6 +16,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ### Changed
 - **BindCraft pin** `828fd9f` → `7cd4ace` (3 upstream bugfixes): graylab→west.rosettacommons.org PyRosetta wheels (x86_64), `range(11,15)→(11,16)` model-selection fix, stage-3 `onehot_plddt` init + `align_pdbs` crash guard
 
+### Added (Parts L + M — Protein-Hunter & RFD3, on `refactor/af3-rfd3-ph`)
+- **Part L — Protein-Hunter** (Cho et al. 2025) installable via `bindmaster install --tool protein-hunter` (x86 only; aarch64 blocked by pyrosetta). Conda env `bindmaster_protein_hunter` (Py 3.10), vendored Boltz-2 + LigandMPNN + Chai-1 (sokrypton fork), shortcut `bin/protein-hunter`. New Evaluator extractor reads `summary_high_iptm.csv` by default (`--all-protein-hunter-designs` for all runs). Supports all 6 modalities via upstream `design.py` flags (protein / cyclic / ligand-CCD / ligand-SMILES / DNA / RNA). `SourceTool` Literal + tool colors/displays extended.
+- **Part M — RFD3 (RosettaCommons/foundry v0.1.9)** installable via `bindmaster install --tool rfd3`. Conda env `bindmaster_rfd3` (Py 3.12), `rc-foundry[rfd3,mpnn]` from PyPI, weights at `BindMaster/weights/foundry/`. BSD-3-Clause, commercial-use OK, works on aarch64 (no DGL). Shortcut `bin/rfd3` runs `rfd3 design ...` or opens an env shell. New `RFD3Extractor` with defensive CSV/FASTA parsing. Tool colors/displays added.
+- **RFAA deprecated (not deleted)**. Dropped from interactive menu and from the `--tool all` meta-tool. Still installable via `bindmaster install --tool rfaa` for reproducing existing runs. `install_rfaa()` now prints a deprecation banner pointing at RFD3 and `docs/rfaa_manual_reinstall.md`.
+- **New doc** `docs/rfaa_manual_reinstall.md` captures commit SHAs, post-install patches, and manual-reproducibility steps for long-term RFAA maintenance.
+
+### Added (Part J — Protenix refolder, on `refactor/af3-rfd3-ph`)
+- **Protenix v0.5.0 as universal 2nd refolding engine** — ByteDance's open-source AlphaFold 3 reimplementation (~3-4 GB weights auto-downloaded from ByteDance TOS, runs comfortably on 24 GB GPUs).
+- New CLI: `binder-compare refold-protenix` — runs inside the existing `bindmaster_pxdesign` conda env (no new env needed).
+- New files: `Evaluator/scripts/refold_protenix.py`, `Evaluator/binder_comparison/refolding/protenix_runner.py`, `Evaluator/binder_comparison/cli/refold_protenix.py`.
+- Schema: `protenix_*` columns in `StandardisedMetrics` (iptm, ptm, ranking_score, plddt_binder_mean/min, plddt_target_mean, pae_bt/tb/bb, bt_ipsae, tb_ipsae, ipsae_min). `af3_*` counterparts also reserved for Part K. pLDDT rescaled 0-100 → 0-1 on ingest.
+- Scoring: new generic `add_ipsae_from_pae_files(df, prefix=...)` for any engine's saved PAE matrix.
+- Merger: multi-engine support — `merge_refold_results(boltz2_csv, ..., protenix_csv=..., af3_csv=...)`. Accepts any combination; outer-joins on `sequence`.
+- `compute_agreement` now sums {boltz_pae_ipsae_min, protenix_ipsae_min, af3_ipsae_min} passing the 0.61 threshold (0–3 on Spark, 0–2 on x86).
+- Orchestration:
+  - `Evaluator/evaluate.sh` auto-detects `bindmaster_pxdesign`; Protenix step runs between Boltz-2 and report unless `--skip-protenix` or env missing.
+  - `binder-compare run --protenix-env bindmaster_pxdesign` enables Protenix; omit to skip.
+  - `binder-compare report` gains `--protenix-results` and `--af3-results`.
+- Installer: PXDesign step now pip-installs `binder-compare` into `bindmaster_pxdesign` env so Protenix refolding is available after `bindmaster install --tool pxdesign`.
+- **Live smoke test passed** — 2 × 43aa random binders against 76aa ubiquitin target: inference ~12 s/design on RTX 3090, CSV + `*_pae.npy` populated, token-pair PAE extracted via `need_atom_confidence=True`, DunbrackLab ipSAE computed downstream in the report.
+
+### Removed (Part I — AF2 refolding removal, on `refactor/af3-rfd3-ph`)
+- Evaluator AF2 refolding is gone. This is step 1 of the AF3/Protenix refactor; AF3 (aarch64-only, DGX Spark) and Protenix (universal) will provide the second engine in Parts J & K.
+- Deleted files: `Evaluator/scripts/refold_af2.py`, `Evaluator/scripts/refold_Version6.py`, `Evaluator/binder_comparison/refolding/af2_runner.py`, `Evaluator/binder_comparison/cli/refold_af2.py`, `Evaluator/envs/binder-eval-af2.yml`
+- Installer no longer creates `binder-eval-af2` conda env (uninstall path still cleans legacy installs)
+- Schema: removed 8 `af2_*` fields from `StandardisedMetrics`, 2 from `PerResidueData`; pruned `af2_*` entries from `LOWER_IS_BETTER`, `ZSCORE_METRICS`; `model_weights` default now `{"boltz2": 1.0}`
+- Scoring: deleted `add_af2_ipsae_from_files`; `compute_agreement` engine list now `[boltz_pae_ipsae_min, protenix_ipsae_min, af3_ipsae_min]` (Protenix/AF3 columns land in Parts J & K)
+- Merger: `merge_refold_results(boltz2_csv, sequences_fasta)` (dropped `af2_csv` param)
+- Report & plots: removed `_compute_af2_boltz2_r`, `_correlation_callout_html`, `plot_af2_vs_boltz2_scatter`; pruned all `af2_*` columns from display lists and tooltip map
+- Evaluator orchestration: `evaluate.sh` is now 2-step (Boltz-2 + report); `binder-compare run` is 3-step (extract + refold-boltz2 + report)
+- BindCraft's internal AF2 design path, PXDesign's internal AF2 eval, and Proteina-Complexa's AF2 cross-val **all stay** — only Evaluator AF2 refolding was removed
+
 ### Fixed
 - Configurator `ask_choice()` return value destructuring for PXDesign mode and preset selection
 - RFAA template: Python 3.12 f-string syntax replaced with 3.10-compatible `ligand_line` variable
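
Review note on the CHANGELOG entries above: the multi-engine merge and the agreement count can be sketched in a few lines. This is only an illustration, assuming rows are plain dicts. The helper names `outer_join_on_sequence` and `agreement_count` are hypothetical stand-ins, not the signatures of `merge_refold_results` or `compute_agreement`; only the three column names, the outer join on `sequence`, and the 0.61 threshold come from the changelog.

```python
import math

# Column names and threshold as listed in the changelog entry.
IPSAE_THRESHOLD = 0.61
ENGINE_COLUMNS = ("boltz_pae_ipsae_min", "protenix_ipsae_min", "af3_ipsae_min")


def outer_join_on_sequence(*tables):
    """Outer-join lists of row dicts on their 'sequence' key (hypothetical helper)."""
    merged = {}
    for table in tables:
        for row in table:
            # Create the row on first sight, then layer each engine's columns on top.
            merged.setdefault(row["sequence"], {"sequence": row["sequence"]}).update(row)
    return list(merged.values())


def agreement_count(row):
    """Count engines whose ipsae_min exceeds the threshold; missing or NaN counts as a fail."""
    count = 0
    for column in ENGINE_COLUMNS:
        value = row.get(column)
        if value is not None and not math.isnan(value) and value > IPSAE_THRESHOLD:
            count += 1
    return count


# Made-up example data: two engines, partially overlapping sequences.
boltz_rows = [
    {"sequence": "AAA", "boltz_pae_ipsae_min": 0.70},
    {"sequence": "CCC", "boltz_pae_ipsae_min": 0.40},
]
protenix_rows = [
    {"sequence": "AAA", "protenix_ipsae_min": 0.65},
    {"sequence": "GGG", "protenix_ipsae_min": 0.80},
]

rows = outer_join_on_sequence(boltz_rows, protenix_rows)
for row in rows:
    row["agreement_count"] = agreement_count(row)

# Rank by agreement_count first, then by the best available ipsae_min, both descending.
rows.sort(
    key=lambda r: (r["agreement_count"], max(r.get(c, 0.0) for c in ENGINE_COLUMNS)),
    reverse=True,
)
```

With an outer join, a sequence refolded by only one engine stays in the table and can reach an agreement count of at most 1, which matches the "accepts any combination" wording of the merger entry.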

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -26,8 +26,9 @@ Target structure (.pdb / .mmcif)
     → Evaluator:
        1. Extract sequences from all tool outputs
        2. Refold with Boltz-2 (Mosaic venv)
-       3. Refold with AF2 (ColabDesign)
-       4. Rank, score, and generate HTML report
+       3. (x86) Refold with Protenix (bindmaster_pxdesign env)  [Part J, in progress]
+       4. (aarch64 / DGX Spark) Refold with AlphaFold 3 v3.0.2 (binder-eval-af3)  [Part K, in progress]
+       5. Rank, score, and generate HTML report
 ```
 
 ### Directory layout
@@ -46,9 +47,9 @@ BindMaster/
 │   └── evaluator.py           ← lightweight evaluator (Mosaic venv, ~780 lines)
 ├── Evaluator/                 ← bundled full evaluation pipeline package
 │   ├── binder_comparison/     ← core Python package (extractors, refolding, scoring, viz)
-│   ├── scripts/               ← standalone refold scripts (refold_boltz2.py, refold_af2.py)
+│   ├── scripts/               ← standalone refold scripts (refold_boltz2.py, refold_protenix.py [todo], refold_af3.py [todo])
 │   ├── evaluate.sh            ← shell orchestrator for full 4-step pipeline
-│   ├── envs/                  ← conda env specs (binder-eval.yml, binder-eval-af2.yml)
+│   ├── envs/                  ← conda env specs (binder-eval.yml, binder-eval-af3.yml [aarch64 only, todo])
 │   ├── docs/                  ← pipeline_reference.md (metrics, known issues)
 │   └── pyproject.toml         ← package: "binder-comparison" v0.1.0
 ├── bindmaster_examples/
@@ -85,7 +86,6 @@ Each tool runs in its own isolated environment. **Never mix packages across envi
 | `bindmaster_pxdesign` | PXDesign | 3.11 | conda | Protenix binder design + eval |
 | `Proteina-Complexa/.venv` | Proteina-Complexa | 3.12 | uv | Flow matching + test-time compute binder design |
 | `binder-eval` | Evaluator | 3.10 | conda | Sequence extraction + reporting |
-| `binder-eval-af2` | Evaluator | 3.10 | conda | AF2 refolding via ColabDesign |
 
 The `bindmaster.py` CLI dispatcher uses `os.execv()` to launch sub-commands in their correct environment — `install` runs in bash, `configure` runs in system Python, `evaluate` runs in the Mosaic `.venv` Python.
 
@@ -110,7 +110,7 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
 - **stdlib-only CLI:** `bindmaster.py` uses only stdlib so it works on any Python 3.10+ without pip installs.
 - **uv for Mosaic:** Mosaic uses `uv` instead of conda because it needs JAX with CUDA, and uv resolves this faster and more reliably.
 - **Pinned commits:** Tool repos are cloned at pinned commits (`BINDCRAFT_COMMIT`, `BOLTZGEN_COMMIT`, `MOSAIC_COMMIT`) for reproducible installs.
-- **Separate evaluator envs:** Boltz-2 refolding needs JAX (Mosaic venv), AF2 refolding needs ColabDesign (conda). These conflict, so they run in separate environments orchestrated by `evaluate.sh`.
+- **Separate evaluator envs:** Boltz-2 refolding runs in the Mosaic venv (JAX). The new Protenix refolder (Part J) rides the existing `bindmaster_pxdesign` conda env. AF3 (Part K) on DGX Spark gets its own `binder-eval-af3` env. `evaluate.sh` orchestrates all three.
 
 ---
 
@@ -147,7 +147,7 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
 - Python classes: PascalCase
 - Python variables/functions: snake_case
 - Bash constants: UPPER_CASE
-- Conda envs: BindCraft, BoltzGen, binder-eval, binder-eval-af2
+- Conda envs: BindCraft, BoltzGen, binder-eval, bindmaster_pxdesign, bindmaster_rfaa (legacy — being replaced by bindmaster_rfd3)
 
 ### Git and branching
 
@@ -190,7 +190,7 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
 
 ### Evaluation metrics and ranking
 
-**Primary metric: `ipsae_min`** — the minimum of binder→target and target→binder iPSAE scores. Computed from PAE arrays using the DunbrackLab 2025 formula: `max_i[mean_j(1/(1+(PAE_ij/d0)²))]` (d0_res variant, uniform 10 Å PAE cutoff for both Boltz-2 and AF2). Ranking uses agreement_count (how many engines agree ipsae_min > 0.61) as primary sort, then ipsae_min desc.
+**Primary metric: `ipsae_min`** — the minimum of binder→target and target→binder iPSAE scores. Computed from PAE arrays using the DunbrackLab 2025 formula: `max_i[mean_j(1/(1+(PAE_ij/d0)²))]` (d0_res variant, uniform 10 Å PAE cutoff across all engines). Ranking uses agreement_count (how many engines agree ipsae_min > 0.61) as primary sort, then ipsae_min desc.
 
 **Direction guide:**
 - **Higher is better:** `iptm`, `bt_ipsae`, `tb_ipsae`, `ipsae_min`, `plddt_binder_mean`, `binder_ptm`
@@ -208,13 +208,13 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
 ### Critical domain facts
 
 - **iptm is gameable** — AF2-designed sequences (BindCraft) tend to score high on ipTM by construction. Use `ipsae_min` as the primary ranking metric instead.
-- **AF2 vs Boltz-2 disagreement** — For short binders (~60aa), Boltz-2 may score high while AF2 scores low. This is meaningful signal, not noise. The `agreement_count` column reflects how many engines agree above the 0.61 threshold.
+- **Engine disagreement is signal, not noise** — For short binders (~60aa), different refolding engines often disagree on interface quality. The `agreement_count` column reflects how many engines pass the 0.61 threshold; higher = stronger candidate.
 - **Binder length is a main driver** — Longer binders tend to score lower on `ipsae_min` (r ≈ -0.78).
 - **Mosaic designs.csv format** — Can mix column formats between workers (old 11-col / new 13-col). The parser must handle this carefully or columns misalign. The `is_top` column marks the ~40 refolded designs out of ~800 total; extractors filter to `is_top=1` by default.
 - **Mosaic `target_sequence` placeholder** — The Mosaic template (`hallucinate_bindmaster.py`) writes `"REPLACE_ME"` as `target_sequence` when not configured. The legacy evaluator guards against using this as a real target sequence.
-- **AF2 pLDDT scale** — ColabDesign `get_plddt()` returns values in [0,1], not [0,100].
-- **PAE ordering** — Boltz-2: [binder|target]; AF2: [target|binder]. Column prefixes distinguish them (`boltz_pae_*` vs `af2_*`).
-- **Append-mode CSVs** — Both `refold_boltz2.py` and `refold_af2.py` append to CSV. If rerun after partial failure, check for duplicate `run_id` entries.
+- **pLDDT scale** — Boltz-2 returns [0,1]; AF3 native is [0,100] and is rescaled to [0,1] on ingest by the refold runner so report columns are directly comparable.
+- **PAE ordering** — Boltz-2 is native [binder|target]. AF3 is token-order so we always put target first in the input JSON, giving [target|binder] — the evaluator transposes internally. Column prefixes distinguish engines (`boltz_pae_*`, `protenix_*`, `af3_*`).
+- **Append-mode CSVs** — `refold_boltz2.py` appends to CSV. If rerun after partial failure, check for duplicate `run_id` entries.
 
 ### Lab-specific information
 
@@ -350,12 +350,8 @@ conda run -n binder-eval binder-compare extract \
 Mosaic/.venv/bin/binder-compare refold-boltz2 \
     --sequences seqs.fasta --target-seq SEQ -o boltz2.csv
 
-# Refold with AF2
-conda run -n binder-eval-af2 binder-compare refold-af2 \
-    --sequences seqs.fasta --target-pdb PDB -o af2.csv
-
-# Generate report
+# Generate report (Boltz-2 only for now; Protenix / AF3 land in Parts J & K)
 conda run -n binder-eval binder-compare report \
-    --boltz2-results boltz2.csv --af2-results af2.csv \
+    --boltz2-results boltz2.csv \
     --sequences seqs.fasta -o ./report
 ```
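
Review note on the `ipsae_min` definition in the CLAUDE.md hunk above: the formula `max_i[mean_j(1/(1+(PAE_ij/d0)²))]` can be sketched in pure Python as follows. This is a sketch under stated assumptions, not the Evaluator's implementation. The `d0` expression (TM-score style, with the length clamped at 27 and `d0` clamped at 1.0) and the per-residue filtering by the PAE cutoff reflect my reading of the d0_res variant and should be checked against the scoring module. All function names here are hypothetical.

```python
def d0_from_length(n_residues):
    """TM-score-style d0. ASSUMPTION: length clamped at 27, d0 clamped at 1.0."""
    length = max(float(n_residues), 27.0)
    return max(1.0, 1.24 * (length - 15.0) ** (1.0 / 3.0) - 1.8)


def directional_ipsae(pae_block, pae_cutoff=10.0):
    """Directional score from one inter-chain PAE block.

    pae_block[i][j] is the PAE (in Å) for aligned residue i of one chain and
    scored residue j of the other chain.
    """
    best = 0.0
    for row in pae_block:
        # d0_res variant (assumed): only pairs under the cutoff contribute,
        # and d0 depends on how many pairs pass for this aligned residue.
        valid = [pae for pae in row if pae < pae_cutoff]
        if not valid:
            continue
        d0 = d0_from_length(len(valid))
        mean_score = sum(1.0 / (1.0 + (pae / d0) ** 2) for pae in valid) / len(valid)
        best = max(best, mean_score)
    return best


def ipsae_min(pae_binder_to_target, pae_target_to_binder, pae_cutoff=10.0):
    """Minimum of the two directional scores, as in the primary ranking metric."""
    return min(
        directional_ipsae(pae_binder_to_target, pae_cutoff),
        directional_ipsae(pae_target_to_binder, pae_cutoff),
    )
```

Taking the minimum of the two directions means a design only scores well when both the binder→target and target→binder blocks are confident, which is why an engine that orders chains as [target|binder] has to have its blocks swapped before scoring.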
diff --git a/Evaluator/binder_comparison/__init__.py b/Evaluator/binder_comparison/__init__.py
@@ -1,7 +1,8 @@
 """Binder Design Comparison Tool.
 
-Compare binder sequences from BindCraft, BoltzGen, and Mosaic using
-standardised refolding with both AF2 and Boltz2, then ensemble the results.
+Compare binder sequences from BindCraft, BoltzGen, Mosaic, PXDesign,
+Proteina-Complexa, and Protein Hunter using Boltz-2 standardised refolding
+(plus Protenix on x86 and AF3 on aarch64/DGX Spark).
 """
 
 __version__ = "0.1.0"
diff --git a/Evaluator/binder_comparison/cli/__init__.py b/Evaluator/binder_comparison/cli/__init__.py
@@ -1,3 +1,3 @@
-from . import extract, refold_af2, refold_boltz2, report, run, validate
+from . import extract, parse_seqs, refold_boltz2, refold_protenix, report, run, validate
 
-__all__ = ["extract", "refold_af2", "refold_boltz2", "report", "run", "validate"]
+__all__ = ["extract", "parse_seqs", "refold_boltz2", "refold_protenix", "report", "run", "validate"]
diff --git a/Evaluator/binder_comparison/cli/extract.py b/Evaluator/binder_comparison/cli/extract.py
@@ -20,8 +20,10 @@
     BoltzGenExtractor,
     MosaicExtractor,
     ProteinaComplexaExtractor,
+    ProteinHunterExtractor,
     PXDesignExtractor,
     RFAAExtractor,
+    RFD3Extractor,
 )
 from ..io.write import write_fasta
 
@@ -60,12 +62,25 @@ def run(args: argparse.Namespace) -> None:
         print(f"  → {len(extracted)} sequences")
         all_binders.extend(extracted)
 
+    if args.rfd3:
+        print(f"[extract] RFD3: {args.rfd3}")
+        extracted = RFD3Extractor().extract(args.rfd3)
+        print(f"  → {len(extracted)} sequences")
+        all_binders.extend(extracted)
+
     if args.proteina_complexa:
         print(f"[extract] Proteina-Complexa: {args.proteina_complexa}")
         extracted = ProteinaComplexaExtractor().extract(args.proteina_complexa)
         print(f"  → {len(extracted)} sequences")
         all_binders.extend(extracted)
 
+    if args.protein_hunter:
+        print(f"[extract] Protein-Hunter: {args.protein_hunter}")
+        all_runs = getattr(args, "all_protein_hunter_designs", False)
+        extracted = ProteinHunterExtractor(all_runs=all_runs).extract(args.protein_hunter)
+        print(f"  → {len(extracted)} sequences")
+        all_binders.extend(extracted)
+
     if not all_binders:
         print("[extract] ERROR: no binders found. Check input directories.", file=sys.stderr)
         sys.exit(1)
@@ -107,18 +122,30 @@ def add_parser(subparsers) -> None:
     p.add_argument("--boltzgen", metavar="DIR", help="BoltzGen output directory")
     p.add_argument("--mosaic", metavar="DIR", help="Mosaic output directory (containing designs.csv)")
     p.add_argument("--pxdesign", metavar="DIR", help="PXDesign output directory (containing summary.csv)")
-    p.add_argument("--rfaa", metavar="DIR", help="RFAA output directory (containing sequences.csv)")
+    p.add_argument("--rfaa", metavar="DIR", help="RFAA output directory (legacy — RFD3 preferred)")
+    p.add_argument("--rfd3", metavar="DIR", help="RFD3 / foundry output directory (replaces RFAA)")
     p.add_argument(
         "--proteina-complexa",
         metavar="DIR",
         dest="proteina_complexa",
         help="Proteina-Complexa output directory (containing sequences.csv)",
     )
+    p.add_argument(
+        "--protein-hunter",
+        metavar="DIR",
+        dest="protein_hunter",
+        help="Protein-Hunter output directory (containing summary_high_iptm.csv)",
+    )
     p.add_argument("--output", "-o", required=True, metavar="FILE", help="Output FASTA path (e.g. sequences.fasta)")
     p.add_argument("--keep-duplicates", action="store_true", help="Do not deduplicate identical sequences across tools")
     p.add_argument(
         "--all-mosaic-designs",
         action="store_true",
         help="Include all Mosaic designs (default: only is_top=1 refolded designs)",
     )
+    p.add_argument(
+        "--all-protein-hunter-designs",
+        action="store_true",
+        help="Include all Protein-Hunter designs (default: only summary_high_iptm.csv rows)",
+    )
     p.set_defaults(func=run)
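
Review note on the new `extract` flags above: a minimal, self-contained sketch of how they land on the argparse namespace. The parser below is a stripped-down stand-in, not the real `add_parser`, and `extract-sketch` plus the example paths are made up. It shows why `run()` can read `all_protein_hunter_designs` through `getattr(..., False)`: a namespace built by another caller may lack the attribute entirely.

```python
import argparse


def build_parser():
    """Stand-in parser carrying only the flags added in this hunk."""
    parser = argparse.ArgumentParser(prog="extract-sketch")
    parser.add_argument("--rfd3", metavar="DIR")
    parser.add_argument("--protein-hunter", metavar="DIR", dest="protein_hunter")
    parser.add_argument("--all-protein-hunter-designs", action="store_true")
    return parser


# Flags given on the command line: dashes become underscores on the namespace.
args = build_parser().parse_args(
    ["--protein-hunter", "runs/ph", "--all-protein-hunter-designs"]
)

# A namespace constructed by hand (e.g. by an orchestrating command) may omit
# the boolean flag; getattr with a default keeps the extractor call safe.
partial_args = argparse.Namespace(protein_hunter="runs/ph")
all_runs = getattr(partial_args, "all_protein_hunter_designs", False)
```

Because `store_true` flags default to `False` when parsed normally, the `getattr` fallback only matters for namespaces that bypass this parser.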
diff --git a/Evaluator/binder_comparison/cli/refold_af2.py b/Evaluator/binder_comparison/cli/refold_af2.py