Merged
32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
### Changed
- **BindCraft pin** `828fd9f` → `7cd4ace` (3 upstream bugfixes): graylab→west.rosettacommons.org PyRosetta wheels (x86_64), `range(11,15)→(11,16)` model-selection fix, stage-3 `onehot_plddt` init + `align_pdbs` crash guard

### Added (Parts L + M — Protein-Hunter & RFD3, on `refactor/af3-rfd3-ph`)
- **Part L — Protein-Hunter** (Cho et al. 2025) installable via `bindmaster install --tool protein-hunter` (x86 only; aarch64 blocked by pyrosetta). Conda env `bindmaster_protein_hunter` (Py 3.10), vendored Boltz-2 + LigandMPNN + Chai-1 (sokrypton fork), shortcut `bin/protein-hunter`. New Evaluator extractor reads `summary_high_iptm.csv` by default (`--all-protein-hunter-designs` for all runs). Supports all 6 modalities via upstream `design.py` flags (protein / cyclic / ligand-CCD / ligand-SMILES / DNA / RNA). `SourceTool` Literal + tool colors/displays extended.
- **Part M — RFD3 (RosettaCommons/foundry v0.1.9)** installable via `bindmaster install --tool rfd3`. Conda env `bindmaster_rfd3` (Py 3.12), `rc-foundry[rfd3,mpnn]` from PyPI, weights at `BindMaster/weights/foundry/`. BSD-3-Clause, commercial-use OK, works on aarch64 (no DGL). Shortcut `bin/rfd3` runs `rfd3 design ...` or opens an env shell. New `RFD3Extractor` with defensive CSV/FASTA parsing. Tool colors/displays added.
- **RFAA deprecated (not deleted)**. Dropped from interactive menu and from the `--tool all` meta-tool. Still installable via `bindmaster install --tool rfaa` for reproducing existing runs. `install_rfaa()` now prints a deprecation banner pointing at RFD3 and `docs/rfaa_manual_reinstall.md`.
- **New doc** `docs/rfaa_manual_reinstall.md` captures commit SHAs, post-install patches, and manual-reproducibility steps for long-term RFAA maintenance.

### Added (Part J — Protenix refolder, on `refactor/af3-rfd3-ph`)
- **Protenix v0.5.0 as universal 2nd refolding engine** — ByteDance's open-source AlphaFold 3 reimplementation (~3-4 GB weights auto-downloaded from ByteDance TOS, runs comfortably on 24 GB GPUs).
- New CLI: `binder-compare refold-protenix` — runs inside the existing `bindmaster_pxdesign` conda env (no new env needed).
- New files: `Evaluator/scripts/refold_protenix.py`, `Evaluator/binder_comparison/refolding/protenix_runner.py`, `Evaluator/binder_comparison/cli/refold_protenix.py`.
- Schema: `protenix_*` columns in `StandardisedMetrics` (iptm, ptm, ranking_score, plddt_binder_mean/min, plddt_target_mean, pae_bt/tb/bb, bt_ipsae, tb_ipsae, ipsae_min). `af3_*` counterparts also reserved for Part K. pLDDT rescaled 0-100 → 0-1 on ingest.
- Scoring: new generic `add_ipsae_from_pae_files(df, prefix=...)` for any engine's saved PAE matrix.
- Merger: multi-engine support — `merge_refold_results(boltz2_csv, ..., protenix_csv=..., af3_csv=...)`. Accepts any combination; outer-joins on `sequence`.
- `compute_agreement` now sums {boltz_pae_ipsae_min, protenix_ipsae_min, af3_ipsae_min} passing the 0.61 threshold (0–3 on Spark, 0–2 on x86).
- Orchestration:
- `Evaluator/evaluate.sh` auto-detects `bindmaster_pxdesign`; Protenix step runs between Boltz-2 and report unless `--skip-protenix` or env missing.
- `binder-compare run --protenix-env bindmaster_pxdesign` enables Protenix; omit to skip.
- `binder-compare report` gains `--protenix-results` and `--af3-results`.
- Installer: PXDesign step now pip-installs `binder-compare` into `bindmaster_pxdesign` env so Protenix refolding is available after `bindmaster install --tool pxdesign`.
- **Live smoke test passed** — 2 × 43aa random binders against 76aa ubiquitin target: inference ~12 s/design on RTX 3090, CSV + `*_pae.npy` populated, token-pair PAE extracted via `need_atom_confidence=True`, DunbrackLab ipSAE computed downstream in the report.
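
The merger and agreement bullets above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `binder_comparison`: the 0.61 threshold and the three `*_ipsae_min` column names come from the changelog, while `outer_join_on_sequence`, `agreement_count`, and the dict-of-dicts row layout are hypothetical stand-ins, and the strict `>` comparison is an assumption.

```python
from typing import Dict, Optional

# Values taken from the changelog text.
IPSAE_THRESHOLD = 0.61
ENGINE_COLUMNS = ("boltz_pae_ipsae_min", "protenix_ipsae_min", "af3_ipsae_min")

# Hypothetical row layout: column name -> value (None when an engine was not run).
Row = Dict[str, Optional[float]]


def outer_join_on_sequence(*engine_tables: Dict[str, Row]) -> Dict[str, Row]:
    """Outer-join per-engine tables keyed by binder sequence.

    A sequence seen by any engine gets a row; columns belonging to engines
    that never scored it are filled with None.
    """
    merged: Dict[str, Row] = {}
    for table in engine_tables:
        for sequence, metrics in table.items():
            merged.setdefault(sequence, {}).update(metrics)
    for row in merged.values():
        for column in ENGINE_COLUMNS:
            row.setdefault(column, None)
    return merged


def agreement_count(row: Row) -> int:
    """Count engines whose ipsae_min exceeds the threshold (missing = no vote)."""
    return sum(
        1
        for column in ENGINE_COLUMNS
        if row.get(column) is not None and row[column] > IPSAE_THRESHOLD
    )


if __name__ == "__main__":
    boltz = {"AAA": {"boltz_pae_ipsae_min": 0.70}, "CCC": {"boltz_pae_ipsae_min": 0.50}}
    protenix = {"AAA": {"protenix_ipsae_min": 0.65}, "GGG": {"protenix_ipsae_min": 0.90}}
    for seq, row in outer_join_on_sequence(boltz, protenix).items():
        print(seq, agreement_count(row))
```

Treating a missing engine as "no vote" rather than as a failure is what lets the same count range over 0–3 or 0–2 depending on which engines are installed.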

### Removed (Part I — AF2 refolding removal, on `refactor/af3-rfd3-ph`)
- Evaluator AF2 refolding is gone. This is step 1 of the AF3/Protenix refactor; AF3 (aarch64-only, DGX Spark) and Protenix (universal) will provide the second engine in Parts J & K.
- Deleted files: `Evaluator/scripts/refold_af2.py`, `Evaluator/scripts/refold_Version6.py`, `Evaluator/binder_comparison/refolding/af2_runner.py`, `Evaluator/binder_comparison/cli/refold_af2.py`, `Evaluator/envs/binder-eval-af2.yml`
- Installer no longer creates `binder-eval-af2` conda env (uninstall path still cleans legacy installs)
- Schema: removed 8 `af2_*` fields from `StandardisedMetrics`, 2 from `PerResidueData`; pruned `af2_*` entries from `LOWER_IS_BETTER`, `ZSCORE_METRICS`; `model_weights` default now `{"boltz2": 1.0}`
- Scoring: deleted `add_af2_ipsae_from_files`; `compute_agreement` engine list now `[boltz_pae_ipsae_min, protenix_ipsae_min, af3_ipsae_min]` (Protenix/AF3 columns land in Parts J & K)
- Merger: `merge_refold_results(boltz2_csv, sequences_fasta)` (dropped `af2_csv` param)
- Report & plots: removed `_compute_af2_boltz2_r`, `_correlation_callout_html`, `plot_af2_vs_boltz2_scatter`; pruned all `af2_*` columns from display lists and tooltip map
- Evaluator orchestration: `evaluate.sh` is now 2-step (Boltz-2 + report); `binder-compare run` is 3-step (extract + refold-boltz2 + report)
- BindCraft's internal AF2 design path, PXDesign's internal AF2 eval, and Proteina-Complexa's AF2 cross-val **all stay** — only Evaluator AF2 refolding was removed

### Fixed
- Configurator `ask_choice()` return value destructuring for PXDesign mode and preset selection
- RFAA template: Python 3.12 f-string syntax replaced with 3.10-compatible `ligand_line` variable
32 changes: 14 additions & 18 deletions CLAUDE.md
@@ -26,8 +26,9 @@ Target structure (.pdb / .mmcif)
→ Evaluator:
1. Extract sequences from all tool outputs
2. Refold with Boltz-2 (Mosaic venv)
3. Refold with AF2 (ColabDesign)
4. Rank, score, and generate HTML report
3. (x86) Refold with Protenix (bindmaster_pxdesign env) [Part J, in progress]
4. (aarch64 / DGX Spark) Refold with AlphaFold 3 v3.0.2 (binder-eval-af3) [Part K, in progress]
5. Rank, score, and generate HTML report
```

### Directory layout
@@ -46,9 +47,9 @@ BindMaster/
│ └── evaluator.py ← lightweight evaluator (Mosaic venv, ~780 lines)
├── Evaluator/ ← bundled full evaluation pipeline package
│ ├── binder_comparison/ ← core Python package (extractors, refolding, scoring, viz)
│ ├── scripts/ ← standalone refold scripts (refold_boltz2.py, refold_af2.py)
│ ├── scripts/ ← standalone refold scripts (refold_boltz2.py, refold_protenix.py [todo], refold_af3.py [todo])
│ ├── evaluate.sh ← shell orchestrator for full 4-step pipeline
│ ├── envs/ ← conda env specs (binder-eval.yml, binder-eval-af2.yml)
│ ├── envs/ ← conda env specs (binder-eval.yml, binder-eval-af3.yml [aarch64 only, todo])
│ ├── docs/ ← pipeline_reference.md (metrics, known issues)
│ └── pyproject.toml ← package: "binder-comparison" v0.1.0
├── bindmaster_examples/
@@ -85,7 +86,6 @@ Each tool runs in its own isolated environment. **Never mix packages across envi
| `bindmaster_pxdesign` | PXDesign | 3.11 | conda | Protenix binder design + eval |
| `Proteina-Complexa/.venv` | Proteina-Complexa | 3.12 | uv | Flow matching + test-time compute binder design |
| `binder-eval` | Evaluator | 3.10 | conda | Sequence extraction + reporting |
| `binder-eval-af2` | Evaluator | 3.10 | conda | AF2 refolding via ColabDesign |

The `bindmaster.py` CLI dispatcher uses `os.execv()` to launch sub-commands in their correct environment — `install` runs in bash, `configure` runs in system Python, `evaluate` runs in the Mosaic `.venv` Python.

@@ -110,7 +110,7 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
- **stdlib-only CLI:** `bindmaster.py` uses only stdlib so it works on any Python 3.10+ without pip installs.
- **uv for Mosaic:** Mosaic uses `uv` instead of conda because it needs JAX with CUDA, and uv resolves this faster and more reliably.
- **Pinned commits:** Tool repos are cloned at pinned commits (`BINDCRAFT_COMMIT`, `BOLTZGEN_COMMIT`, `MOSAIC_COMMIT`) for reproducible installs.
- **Separate evaluator envs:** Boltz-2 refolding needs JAX (Mosaic venv), AF2 refolding needs ColabDesign (conda). These conflict, so they run in separate environments orchestrated by `evaluate.sh`.
- **Separate evaluator envs:** Boltz-2 refolding runs in the Mosaic venv (JAX). The new Protenix refolder (Part J) rides the existing `bindmaster_pxdesign` conda env. AF3 (Part K) on DGX Spark gets its own `binder-eval-af3` env. `evaluate.sh` orchestrates all three.

---

@@ -147,7 +147,7 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
- Python classes: PascalCase
- Python variables/functions: snake_case
- Bash constants: UPPER_CASE
- Conda envs: BindCraft, BoltzGen, binder-eval, binder-eval-af2
- Conda envs: BindCraft, BoltzGen, binder-eval, bindmaster_pxdesign, bindmaster_rfaa (legacy — being replaced by bindmaster_rfd3)

### Git and branching

@@ -190,7 +190,7 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments

### Evaluation metrics and ranking

**Primary metric: `ipsae_min`** — the minimum of binder→target and target→binder iPSAE scores. Computed from PAE arrays using the DunbrackLab 2025 formula: `max_i[mean_j(1/(1+(PAE_ij/d0)²))]` (d0_res variant, uniform 10 Å PAE cutoff for both Boltz-2 and AF2). Ranking uses agreement_count (how many engines agree ipsae_min > 0.61) as primary sort, then ipsae_min desc.
**Primary metric: `ipsae_min`** — the minimum of binder→target and target→binder iPSAE scores. Computed from PAE arrays using the DunbrackLab 2025 formula: `max_i[mean_j(1/(1+(PAE_ij/d0)²))]` (d0_res variant, uniform 10 Å PAE cutoff across all engines). Ranking uses agreement_count (how many engines agree ipsae_min > 0.61) as primary sort, then ipsae_min desc.

**Direction guide:**
- **Higher is better:** `iptm`, `bt_ipsae`, `tb_ipsae`, `ipsae_min`, `plddt_binder_mean`, `binder_ptm`
@@ -208,13 +208,13 @@ In **standalone mode** (`--standalone` or auto-detected), all conda environments
### Critical domain facts

- **iptm is gameable** — AF2-designed sequences (BindCraft) tend to score high on ipTM by construction. Use `ipsae_min` as the primary ranking metric instead.
- **AF2 vs Boltz-2 disagreement** — For short binders (~60aa), Boltz-2 may score high while AF2 scores low. This is meaningful signal, not noise. The `agreement_count` column reflects how many engines agree above the 0.61 threshold.
- **Engine disagreement is signal, not noise** — For short binders (~60aa), different refolding engines often disagree on interface quality. The `agreement_count` column reflects how many engines pass the 0.61 threshold; higher = stronger candidate.
- **Binder length is a main driver** — Longer binders tend to score lower on `ipsae_min` (r ≈ -0.78).
- **Mosaic designs.csv format** — Can mix column formats between workers (old 11-col / new 13-col). The parser must handle this carefully or columns misalign. The `is_top` column marks the ~40 refolded designs out of ~800 total; extractors filter to `is_top=1` by default.
- **Mosaic `target_sequence` placeholder** — The Mosaic template (`hallucinate_bindmaster.py`) writes `"REPLACE_ME"` as `target_sequence` when not configured. The legacy evaluator guards against using this as a real target sequence.
- **AF2 pLDDT scale** — ColabDesign `get_plddt()` returns values in [0,1], not [0,100].
- **PAE ordering** — Boltz-2: [binder|target]; AF2: [target|binder]. Column prefixes distinguish them (`boltz_pae_*` vs `af2_*`).
- **Append-mode CSVs** — Both `refold_boltz2.py` and `refold_af2.py` append to CSV. If rerun after partial failure, check for duplicate `run_id` entries.
- **pLDDT scale** — Boltz-2 returns [0,1]; AF3 native is [0,100] and is rescaled to [0,1] on ingest by the refold runner so report columns are directly comparable.
- **PAE ordering** — Boltz-2 is native [binder|target]. AF3 is token-order so we always put target first in the input JSON, giving [target|binder] — the evaluator transposes internally. Column prefixes distinguish engines (`boltz_pae_*`, `protenix_*`, `af3_*`).
- **Append-mode CSVs** — `refold_boltz2.py` appends to CSV. If rerun after partial failure, check for duplicate `run_id` entries.

### Lab-specific information

@@ -350,12 +350,8 @@ conda run -n binder-eval binder-compare extract \
Mosaic/.venv/bin/binder-compare refold-boltz2 \
--sequences seqs.fasta --target-seq SEQ -o boltz2.csv

# Refold with AF2
conda run -n binder-eval-af2 binder-compare refold-af2 \
--sequences seqs.fasta --target-pdb PDB -o af2.csv

# Generate report
# Generate report (Boltz-2 only for now; Protenix / AF3 land in Parts J & K)
conda run -n binder-eval binder-compare report \
--boltz2-results boltz2.csv --af2-results af2.csv \
--boltz2-results boltz2.csv \
--sequences seqs.fasta -o ./report
```
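
The `ipsae_min` formula quoted in the CLAUDE.md hunk above, `max_i[mean_j(1/(1+(PAE_ij/d0)²))]`, can be worked through in plain Python. This is a hedged sketch rather than the Evaluator's implementation: the 10 Å cutoff and the "d0_res variant" label come from the text, but the TM-score-style `calc_d0` (length floor of 27, minimum of 1.0 Å), the choice to average only over pairs below the cutoff, and all function names are assumptions about how that variant is usually defined.

```python
from typing import List, Sequence

PAE_CUTOFF = 10.0  # Å, the uniform cutoff stated in the text


def calc_d0(n_residues: int) -> float:
    """TM-score-style d0 (assumed form: length floored at 27, d0 floored at 1.0 Å)."""
    length = max(float(n_residues), 27.0)
    return max(1.0, 1.24 * (length - 15.0) ** (1.0 / 3.0) - 1.8)


def directional_ipsae(
    pae: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
    cutoff: float = PAE_CUTOFF,
) -> float:
    """max over i in rows of mean over j in cols of 1 / (1 + (PAE_ij / d0)^2).

    Assumption: only pairs with PAE_ij < cutoff enter the mean, and d0 is
    computed per row from the number of such pairs (the "d0_res" idea).
    """
    best = 0.0
    for i in rows:
        kept: List[float] = [pae[i][j] for j in cols if pae[i][j] < cutoff]
        if not kept:
            continue  # no confident contacts for this residue
        d0 = calc_d0(len(kept))
        score = sum(1.0 / (1.0 + (p / d0) ** 2) for p in kept) / len(kept)
        best = max(best, score)
    return best


def ipsae_min(
    pae: Sequence[Sequence[float]],
    binder_idx: Sequence[int],
    target_idx: Sequence[int],
) -> float:
    """Minimum of the binder->target and target->binder directional scores."""
    bt = directional_ipsae(pae, binder_idx, target_idx)
    tb = directional_ipsae(pae, target_idx, binder_idx)
    return min(bt, tb)


if __name__ == "__main__":
    # Toy 4x4 PAE matrix: residues 0-1 are the binder, 2-3 the target.
    toy = [
        [0.5, 1.0, 3.0, 4.0],
        [1.0, 0.5, 5.0, 12.0],
        [3.5, 6.0, 0.5, 1.0],
        [4.5, 11.0, 1.0, 0.5],
    ]
    print(ipsae_min(toy, binder_idx=[0, 1], target_idx=[2, 3]))
```

Passing explicit index lists rather than assuming a block layout means the same function serves a [binder|target] matrix and a [target|binder] matrix; only the two index arguments change.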
5 changes: 3 additions & 2 deletions Evaluator/binder_comparison/__init__.py
@@ -1,7 +1,8 @@
"""Binder Design Comparison Tool.

Compare binder sequences from BindCraft, BoltzGen, and Mosaic using
standardised refolding with both AF2 and Boltz2, then ensemble the results.
Compare binder sequences from BindCraft, BoltzGen, Mosaic, PXDesign,
Proteina-Complexa, and Protein Hunter using Boltz-2 standardised refolding
(plus Protenix on x86 and AF3 on aarch64/DGX Spark).
"""

__version__ = "0.1.0"
4 changes: 2 additions & 2 deletions Evaluator/binder_comparison/cli/__init__.py
@@ -1,3 +1,3 @@
from . import extract, refold_af2, refold_boltz2, report, run, validate
from . import extract, parse_seqs, refold_boltz2, refold_protenix, report, run, validate

__all__ = ["extract", "refold_af2", "refold_boltz2", "report", "run", "validate"]
__all__ = ["extract", "parse_seqs", "refold_boltz2", "refold_protenix", "report", "run", "validate"]
29 changes: 28 additions & 1 deletion Evaluator/binder_comparison/cli/extract.py
@@ -20,8 +20,10 @@
    BoltzGenExtractor,
    MosaicExtractor,
    ProteinaComplexaExtractor,
    ProteinHunterExtractor,
    PXDesignExtractor,
    RFAAExtractor,
    RFD3Extractor,
)
from ..io.write import write_fasta

@@ -60,12 +62,25 @@ def run(args: argparse.Namespace) -> None:
        print(f" → {len(extracted)} sequences")
        all_binders.extend(extracted)

    if args.rfd3:
        print(f"[extract] RFD3: {args.rfd3}")
        extracted = RFD3Extractor().extract(args.rfd3)
        print(f" → {len(extracted)} sequences")
        all_binders.extend(extracted)

    if args.proteina_complexa:
        print(f"[extract] Proteina-Complexa: {args.proteina_complexa}")
        extracted = ProteinaComplexaExtractor().extract(args.proteina_complexa)
        print(f" → {len(extracted)} sequences")
        all_binders.extend(extracted)

    if args.protein_hunter:
        print(f"[extract] Protein-Hunter: {args.protein_hunter}")
        all_runs = getattr(args, "all_protein_hunter_designs", False)
        extracted = ProteinHunterExtractor(all_runs=all_runs).extract(args.protein_hunter)
        print(f" → {len(extracted)} sequences")
        all_binders.extend(extracted)

    if not all_binders:
        print("[extract] ERROR: no binders found. Check input directories.", file=sys.stderr)
        sys.exit(1)
@@ -107,18 +122,30 @@ def add_parser(subparsers) -> None:
    p.add_argument("--boltzgen", metavar="DIR", help="BoltzGen output directory")
    p.add_argument("--mosaic", metavar="DIR", help="Mosaic output directory (containing designs.csv)")
    p.add_argument("--pxdesign", metavar="DIR", help="PXDesign output directory (containing summary.csv)")
    p.add_argument("--rfaa", metavar="DIR", help="RFAA output directory (containing sequences.csv)")
    p.add_argument("--rfaa", metavar="DIR", help="RFAA output directory (legacy — RFD3 preferred)")
    p.add_argument("--rfd3", metavar="DIR", help="RFD3 / foundry output directory (replaces RFAA)")
    p.add_argument(
        "--proteina-complexa",
        metavar="DIR",
        dest="proteina_complexa",
        help="Proteina-Complexa output directory (containing sequences.csv)",
    )
    p.add_argument(
        "--protein-hunter",
        metavar="DIR",
        dest="protein_hunter",
        help="Protein-Hunter output directory (containing summary_high_iptm.csv)",
    )
    p.add_argument("--output", "-o", required=True, metavar="FILE", help="Output FASTA path (e.g. sequences.fasta)")
    p.add_argument("--keep-duplicates", action="store_true", help="Do not deduplicate identical sequences across tools")
    p.add_argument(
        "--all-mosaic-designs",
        action="store_true",
        help="Include all Mosaic designs (default: only is_top=1 refolded designs)",
    )
    p.add_argument(
        "--all-protein-hunter-designs",
        action="store_true",
        help="Include all Protein-Hunter designs (default: only summary_high_iptm.csv rows)",
    )
    p.set_defaults(func=run)
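
To see how the new flags surface on the parsed namespace, here is a small standalone sketch. It reuses the flag names from the hunk above but is not the project's parser: `build_sketch_parser` and `wants_all_protein_hunter` are hypothetical helpers, and the program name is made up. It also shows why `run()` reading the flag through `getattr(..., False)` tolerates a namespace built by hand without that attribute.

```python
import argparse


def build_sketch_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser carrying only the flags discussed here."""
    parser = argparse.ArgumentParser(prog="extract-sketch")
    parser.add_argument("--rfd3", metavar="DIR")
    parser.add_argument("--protein-hunter", metavar="DIR", dest="protein_hunter")
    parser.add_argument("--all-protein-hunter-designs", action="store_true")
    parser.add_argument("--output", "-o", required=True, metavar="FILE")
    return parser


def wants_all_protein_hunter(args: argparse.Namespace) -> bool:
    """Mirror the defensive lookup: a missing attribute means 'filtered only'."""
    return getattr(args, "all_protein_hunter_designs", False)


if __name__ == "__main__":
    parsed = build_sketch_parser().parse_args(
        ["--protein-hunter", "runs/ph", "-o", "seqs.fasta"]
    )
    # argparse converts dashes to underscores for the attribute name.
    print(parsed.protein_hunter, wants_all_protein_hunter(parsed))

    # A namespace assembled programmatically may lack the flag entirely.
    bare = argparse.Namespace(protein_hunter="runs/ph")
    print(wants_all_protein_hunter(bare))
```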
73 changes: 0 additions & 73 deletions Evaluator/binder_comparison/cli/refold_af2.py

This file was deleted.
