Part J: Add Protenix as universal 2nd refolding engine, remove AF2 #5
Merged
Describes the 6 parts of a large refactor:
I — Remove AF2 refolding from Evaluator
J — Protenix refolder (universal, via bindmaster_pxdesign env)
K — AF3 v3.0.2 refolder (aarch64 / DGX Spark only)
L — Protein Hunter with all 6 modalities (protein / cyclic / ligand via CCD or SMILES / DNA / RNA)
M — RFD3 (RosettaCommons/foundry) replaces RFAA, which is hard-deleted
N — Distributed-workflow docs (design on x86, evaluate on Spark)
Key architectural choices captured:
- AF3 is aarch64-only because the 80 GB VRAM target exceeds our 3090 (24 GB)
- x86 keeps Boltz-2 + Protenix as two independent refolding engines
- AF3 weights require a manual Google Form request (2–3 business day wait)
- RFAA deleted (not deprecated); docs/rfaa_manual_reinstall.md retains
commit SHAs + patch list for ad-hoc recreation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Evaluator AF2 refolding is removed. Parts J (Protenix, universal) and K (AF3,
aarch64/DGX Spark) will restore a multi-engine agreement_count.
Note: BindCraft's internal AF2 design, PXDesign's internal AF2 eval, and
Proteina-Complexa's AF2 cross-val all stay — only Evaluator AF2 refolding
is removed.
Deleted:
- Evaluator/scripts/refold_af2.py, refold_Version6.py
- Evaluator/binder_comparison/refolding/af2_runner.py
- Evaluator/binder_comparison/cli/refold_af2.py
- Evaluator/envs/binder-eval-af2.yml
Schema (Evaluator/binder_comparison/core/schema.py):
- dropped 8 af2_* fields from StandardisedMetrics, 2 from PerResidueData
- pruned af2_* entries from LOWER_IS_BETTER and ZSCORE_METRICS
- model_weights default: {"af2": 0.6, "boltz2": 0.4} → {"boltz2": 1.0}
Scoring (comparison/scoring.py):
- deleted add_af2_ipsae_from_files
- compute_agreement engine list now [boltz_pae_ipsae_min, protenix_ipsae_min,
af3_ipsae_min]; Protenix/AF3 columns arrive in Parts J & K
- _best_ipsae_col + rank_by_adaptyv_method no longer consider af2_*
Merger (comparison/merger.py):
- rewritten to Boltz-2 only: merge_refold_results(boltz2_csv, sequences_fasta)
- _load_af2 + _AF2_DROP_COLS gone
CLI:
- binder-compare refold-af2 subcommand removed
- binder-compare run is now a 3-step pipeline (extract → refold-boltz2 → report)
- binder-compare report no longer accepts --af2-results / --af2-pae-dir
Visualization:
- plots.py: METRICS_DISPLAY af2_* entries pruned; plot_af2_vs_boltz2_scatter
deleted; plot_pae_heatmaps / load_pae_data_from_df simplified to Boltz-2 only
- report.py: _compute_af2_boltz2_r + _correlation_callout_html deleted;
all af2_* tooltips and display columns removed; methodology text updated
Installers:
- install/install.sh + install/install_aarch.sh: binder-eval-af2 env is no
longer created; uninstall path still cleans legacy envs
- Evaluator/install.sh: single-env install (binder-eval only)
- Evaluator/evaluate.sh: 2-step pipeline (Boltz-2 + report)
- Evaluator/pyproject.toml: af2 optional deps group dropped
Configurator (configurator/configurator.py):
- evaluator env-detection now checks binder-eval, not the removed
binder-eval-af2
- Prompts + status text updated to "Boltz-2 refolding + ranked report"
Smoke tests pass: all modules import, `binder-compare --help` lists 6 subcommands
(no refold-af2), ruff check + format green, shellcheck clean on updated scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a second refolding engine to the Evaluator without creating a new env —
Protenix v0.5.0 (ByteDance's open-source AlphaFold 3 re-implementation) rides
the existing bindmaster_pxdesign conda env that PXDesign already installs.
New CLI subcommand:
conda run -n bindmaster_pxdesign binder-compare refold-protenix \
--sequences seqs.fasta --target-seq SEQ -o protenix_results.csv
Schema additions in core/schema.py:
- StandardisedMetrics gains protenix_* fields (iptm, ptm, ranking_score,
plddt_binder_mean/min, plddt_target_mean, pae_bt/tb/bb, bt_ipsae,
tb_ipsae, ipsae_min). pLDDT rescaled 0-100 → 0-1 on ingest.
- af3_* counterparts reserved for Part K (aarch64 / DGX Spark only).
- PerResidueData gains protenix_pae, af3_pae.
- LOWER_IS_BETTER + ZSCORE_METRICS extended for both engines.
Merger (comparison/merger.py): multi-engine, outer-joins on sequence.
merge_refold_results(boltz2_csv, ..., protenix_csv=..., af3_csv=...)
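The outer join on sequence can be pictured with a small standard-library sketch. The helper name `outer_join_on_sequence` and the list-of-dicts row layout are illustrative assumptions; the real `merge_refold_results` is not shown in this PR text and may well be built on pandas.

```python
from typing import Dict, List

Row = Dict[str, object]


def outer_join_on_sequence(*tables: List[Row]) -> List[Row]:
    """Outer-join per-engine result rows on the 'sequence' key.

    A sequence that one engine did not refold keeps its row; that
    engine's columns are filled with None instead of dropping the design.
    """
    columns: List[str] = []          # union of all non-key columns, in first-seen order
    merged: Dict[str, Row] = {}      # sequence -> accumulated metrics
    for table in tables:
        for row in table:
            for col in row:
                if col != "sequence" and col not in columns:
                    columns.append(col)
            merged.setdefault(str(row["sequence"]), {}).update(
                {k: v for k, v in row.items() if k != "sequence"}
            )
    return [
        {"sequence": seq, **{c: vals.get(c) for c in columns}}
        for seq, vals in merged.items()
    ]


# Toy inputs: Protenix only refolded one of the two designs.
boltz2 = [
    {"sequence": "MKT", "boltz_pae_ipsae_min": 0.70},
    {"sequence": "GSH", "boltz_pae_ipsae_min": 0.40},
]
protenix = [{"sequence": "MKT", "protenix_ipsae_min": 0.65}]
rows = outer_join_on_sequence(boltz2, protenix)
```

With an outer join, a design that only one engine scored still reaches the report, and the missing engine simply does not contribute to its agreement count.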
Scoring (comparison/scoring.py): new generic
add_ipsae_from_pae_files(df, prefix=...)
that computes DunbrackLab d0res ipSAE from any engine's saved PAE .npy.
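As a rough illustration of what a d0res-style ipSAE looks like, here is a pure-Python sketch on a nested-list PAE matrix. It follows our understanding of the DunbrackLab definition (per-residue d0 from the number of partner residues under the PAE cutoff, best per-residue mean taken per direction) and is not the code of `add_ipsae_from_pae_files`. The function names, the target-first token order, and the 10 Å default are assumptions; a saved `*_pae.npy` would normally be loaded with `numpy.load` first.

```python
from typing import Sequence


def d0(n_pairs: int) -> float:
    """TM-score-style d0 for n_pairs residues, floored at n=27 and d0=1.0."""
    n = max(n_pairs, 27)
    return max(1.0, 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8)


def directional_ipsae(pae: Sequence[Sequence[float]], src: range, dst: range,
                      cutoff: float = 10.0) -> float:
    """Best per-residue score over src residues, aligned on src, scored on dst."""
    best = 0.0
    for i in src:
        # Only inter-chain pairs with PAE under the cutoff contribute.
        close = [pae[i][j] for j in dst if pae[i][j] < cutoff]
        if not close:
            continue
        d = d0(len(close))  # d0res: d0 depends on this residue's pair count
        score = sum(1.0 / (1.0 + (e / d) ** 2) for e in close) / len(close)
        best = max(best, score)
    return best


def ipsae_min(pae: Sequence[Sequence[float]], n_target: int,
              cutoff: float = 10.0) -> float:
    """Minimum of the target->binder and binder->target directional scores.

    Assumes tokens are ordered target first, binder second.
    """
    target = range(n_target)
    binder = range(n_target, len(pae))
    tb = directional_ipsae(pae, target, binder, cutoff)
    bt = directional_ipsae(pae, binder, target, cutoff)
    return min(tb, bt)


# Toy 4x4 matrices (2 target + 2 binder tokens).
confident = [[0.0] * 4 for _ in range(4)]
uncertain = [[20.0] * 4 for _ in range(4)]
```

Taking the minimum over the two directions makes the score conservative: an interface only scores well if both chains are confidently placed relative to each other.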
compute_agreement now counts how many of
{boltz_pae_ipsae_min, protenix_ipsae_min, af3_ipsae_min}
pass the 0.61 threshold: the count ranges 0–2 on x86 and 0–3 on Spark once
AF3 is wired in Part K.
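The agreement count described here can be sketched as a per-row tally. The column names and the 0.61 threshold come from the commit message; the function name `agreement_count`, the use of `>=` for "passing", and the choice to treat missing or NaN values as not passing are assumptions.

```python
import math
from typing import Mapping, Optional

ENGINE_COLUMNS = ("boltz_pae_ipsae_min", "protenix_ipsae_min", "af3_ipsae_min")
IPSAE_THRESHOLD = 0.61


def _passes(value: Optional[float], threshold: float) -> bool:
    """True when the engine produced a value at or above the threshold."""
    if value is None:
        return False
    value = float(value)
    return not math.isnan(value) and value >= threshold


def agreement_count(row: Mapping[str, Optional[float]],
                    threshold: float = IPSAE_THRESHOLD) -> int:
    """Number of refolding engines whose ipSAE-min passes the threshold.

    Engines with no column in the row (for example AF3 on x86) contribute 0.
    """
    return sum(_passes(row.get(col), threshold) for col in ENGINE_COLUMNS)


# x86 row: only Boltz-2 and Protenix columns exist, so the maximum is 2.
x86_row = {"boltz_pae_ipsae_min": 0.70, "protenix_ipsae_min": 0.65}
# Spark row: all three engines present.
spark_row = {"boltz_pae_ipsae_min": 0.70, "protenix_ipsae_min": 0.50,
             "af3_ipsae_min": 0.80}
```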
Orchestration:
- Evaluator/evaluate.sh auto-detects the bindmaster_pxdesign env and runs
Protenix as step 2 of 3 unless --skip-protenix.
- binder-compare run --protenix-env bindmaster_pxdesign enables Protenix.
- binder-compare report gains --protenix-results and --af3-results flags.
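The auto-detection step can be sketched as follows, in Python for consistency with the other examples (evaluate.sh itself is a shell script). The sketch relies on `conda env list --json`, which prints an object with an `envs` list of environment paths; the helper names and the decision function are hypothetical.

```python
import json
import os
import subprocess
from typing import Set


def env_names(conda_json: str) -> Set[str]:
    """Extract environment names from `conda env list --json` output."""
    envs = json.loads(conda_json).get("envs", [])
    return {os.path.basename(os.path.normpath(path)) for path in envs}


def installed_envs() -> Set[str]:
    """Names of installed conda envs, or an empty set if conda is unavailable."""
    try:
        proc = subprocess.run(
            ["conda", "env", "list", "--json"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return set()
    return env_names(proc.stdout)


def should_run_protenix(envs: Set[str], skip_protenix: bool = False,
                        env: str = "bindmaster_pxdesign") -> bool:
    """Run the Protenix step only if it is not skipped and its env exists."""
    return (not skip_protenix) and env in envs


# Example payload in the shape conda emits.
sample = '{"envs": ["/opt/conda", "/opt/conda/envs/bindmaster_pxdesign"]}'
names = env_names(sample)
```

Falling back to an empty set when conda is missing means the pipeline degrades to Boltz-2 only rather than failing outright, which matches the optional nature of the Protenix step.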
Installer (install/install.sh + install/install_aarch.sh): the PXDesign step
now pip-installs binder-compare[report] into bindmaster_pxdesign so the
refolder is callable from there immediately after `bindmaster install --tool
pxdesign`.
Runtime details:
- Protenix weights (~3-4 GB) auto-download from ByteDance TOS on first use.
- need_atom_confidence=True forced at call time so the token-pair PAE is
written to the full_data JSON (required for DunbrackLab ipSAE).
- use_msa=False by default — MSA-free inference, no internet needed after
the initial weight pull.
- chain_plddt[0] = target, chain_plddt[1] = binder (already 0-1 scale).
- Binder atom pLDDT min derived from atom_plddt + atom_to_token_idx +
token_asym_id.
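The binder atom pLDDT minimum mentioned in the last bullet can be derived roughly as below. The three array names are taken from the commit message; the function name, the `binder_asym_id` parameter, and its numbering are assumptions, and no rescaling is applied here.

```python
from typing import Sequence


def binder_atom_plddt_min(atom_plddt: Sequence[float],
                          atom_to_token_idx: Sequence[int],
                          token_asym_id: Sequence[int],
                          binder_asym_id: int) -> float:
    """Minimum atom-level pLDDT over atoms that belong to the binder chain.

    Each atom is mapped to its token, and each token to its chain
    (asym) id; only atoms whose chain id matches the binder are kept.
    """
    binder_values = [
        plddt
        for plddt, token in zip(atom_plddt, atom_to_token_idx)
        if token_asym_id[token] == binder_asym_id
    ]
    if not binder_values:
        raise ValueError("no atoms found for the requested binder chain")
    return min(binder_values)


# Toy complex: tokens 0-1 are the target (asym id 0), token 2 is the binder.
atom_plddt = [0.91, 0.88, 0.95, 0.72, 0.80]
atom_to_token_idx = [0, 0, 1, 2, 2]
token_asym_id = [0, 0, 1]
lowest = binder_atom_plddt_min(atom_plddt, atom_to_token_idx, token_asym_id, 1)
```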
Live smoke test (2 × 43aa random binders vs 76aa ubiquitin):
- Weight download + CUDA init: ~6 min first run; ~10 s warm.
- Inference: ~12 s/design on RTX 3090 at n_cycle=3 n_step=50 n_sample=1.
- CSV + *_pae.npy + CIF all populated; PAE shape (119, 119) matches
target+binder tokens.
- Report pipeline (merger → add_ipsae_from_pae_files → agreement_count →
rank_by_adaptyv_method) green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New design tool: Protein-Hunter (Cho et al. 2025, bioRxiv 10.1101/2025.10.10.681530)
— Boltz-2 / Chai-1 multi-cycle structure hallucination for protein, cyclic
peptide, small-molecule (CCD/SMILES), DNA, and RNA binders (all 6 modalities
supported natively by upstream design.py).
Installer (install/install.sh only — aarch64 deferred):
- PROTEIN_HUNTER_{REPO,COMMIT,DIR} constants
- DO_PROTEIN_HUNTER flag wired into --tool parsing, interactive menu
(8-tool selector), `all` meta-tool, uninstall, and run order
- install_protein_hunter(): clone @ pinned commit d4bd9515..., conda env
bindmaster_protein_hunter (Py 3.10), PyTorch 2.2+ CUDA 12.1, vendored
boltz_ph pip-installed, pyrosetta-installer, chai-lab (sokrypton fork),
LigandMPNN weights symlinked from RFAA install when present
- _write_protein_hunter_shortcut(): bin/protein-hunter opens env shell and
prints the 6-modality flag cheat sheet
- Uninstall case removes env + cloned dir + shortcut
aarch64: Protein-Hunter is x86-only in this release — pyrosetta has no
aarch64 wheel, and the chai-lab fork is untested on ARM. install_aarch.sh
is deliberately untouched.
Evaluator integration:
- extractors/protein_hunter.py: ProteinHunterExtractor reads
summary_high_iptm.csv by default (high-ipTM + %X filter, analogous to
Mosaic is_top=1). Pass all_runs=True → reads summary_all_runs.csv and
extracts best_seq per run.
- Exported from extractors/__init__.py
- cli/extract.py: new --protein-hunter DIR and
--all-protein-hunter-designs flags
- core/schema.py: SourceTool literal gains "protein_hunter" (and the
previously-missing "proteina_complexa")
- visualization/plots.py: TOOL_COLOURS + _TOOL_DISPLAY entries
- visualization/report.py: _TOOL_COLOURS_NGL + _TOOL_DISPLAY + CSS class
.tool-protein_hunter (#00838F teal-cyan)
- cli/report.py: PyMOL color + display name entries
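The file-selection behaviour of the extractor can be sketched like this. The two summary file names and the `all_runs` switch come from the commit message; the function names are hypothetical, and the sequence column of `summary_high_iptm.csv` is not stated, so it is left as a parameter that defaults to `best_seq` purely as an assumption.

```python
import csv
from pathlib import Path
from typing import List, Union


def summary_path(run_dir: Union[str, Path], all_runs: bool = False) -> Path:
    """Pick the filtered summary by default, the full summary on request."""
    name = "summary_all_runs.csv" if all_runs else "summary_high_iptm.csv"
    return Path(run_dir) / name


def read_sequences(run_dir: Union[str, Path], all_runs: bool = False,
                   seq_column: str = "best_seq") -> List[str]:
    """Return the non-empty sequences from the chosen summary file.

    seq_column is an assumption; adjust it to the actual CSV header.
    """
    path = summary_path(run_dir, all_runs)
    with path.open(newline="") as handle:
        return [
            row[seq_column]
            for row in csv.DictReader(handle)
            if row.get(seq_column)
        ]
```

Defaulting to the filtered file keeps the Evaluator input small, while `all_runs=True` trades that for completeness when the filter is suspected of being too strict.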
Deferred to follow-up commits:
- Configurator page + modality-specific run-script templates
- aarch64 installer support (pyrosetta blocker)
- Live install smoke test (~30 min + several GB of weights)
Lint clean (ruff + shellcheck); all binder_comparison imports green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RFAA is deprecated in favor of RFD3 (RosettaCommons/foundry v0.1.9) but
remains installable to reproduce existing runs. Decision: soft-delete via
menu removal + deprecation banner rather than ripping out 22 files of tests
/ tui / bindmaster/tools/rfaa code that would also need updating.
Installer (install/install.sh):
- FOUNDRY_{REPO,COMMIT,DIR,WEIGHTS_DIR} constants (v0.1.9, weights under
BindMaster/weights/foundry)
- DO_RFD3 flag, --tool rfd3|foundry parsing, uninstall case
- install_rfd3(): conda env bindmaster_rfd3 (Py 3.12), PyTorch 2.2+ CUDA
12.1 wheels, rc-foundry[rfd3,mpnn] from PyPI, foundry install rfd3
(weights), smoke test on `rfd3 --help`
- _write_rfd3_shortcut(): bin/rfd3 runs `rfd3 design ...` passthrough or
opens an env shell with FOUNDRY_CHECKPOINT_DIR exported
- is_rfd3_installed() status check
- install_rfaa() now prints a deprecation banner pointing at
docs/rfaa_manual_reinstall.md
- --tool all no longer includes RFAA (dropped from meta-tool + interactive
menu; opt in via --tool rfaa on the CLI)
- RFD3 replaces RFAA slot #5 in the 8-tool interactive menu
- Tool summary lists RFAA as (legacy) in yellow
aarch64 installer (install_aarch.sh) untouched in this commit; RFD3 will be
wired in during Part K (DGX Spark) work. The main benefit of RFD3 there is
that it has no DGL dependency, unblocking the Grace-Hopper path that RFAA
could never reach.
Evaluator:
- extractors/rfd3.py — defensive CSV/FASTA parser (foundry output schema
isn't 100% locked in v0.1.9; refine during Part N1 end-to-end test)
- Registered in extractors/__init__.py + cli/extract.py (--rfd3 DIR flag)
- core/schema.py SourceTool literal gains "rfd3"
- Tool colors/displays added in visualization/plots.py, report.py
(_TOOL_COLOURS_NGL, _TOOL_DISPLAY, CSS class .tool-rfd3), cli/report.py
(PyMOL colors)
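A "defensive" output scanner of the kind described for extractors/rfd3.py might look like the sketch below. Because the foundry output schema is not locked, every specific here is an assumption: the candidate column names, the FASTA suffixes, and the function names are hypothetical and not taken from foundry or from the actual extractor.

```python
import csv
from pathlib import Path
from typing import Iterator, List, Tuple, Union

SEQUENCE_COLUMNS = ("sequence", "seq", "binder_sequence")  # hypothetical candidates
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) records, tolerating wrapped sequence lines."""
    name, chunks = None, []
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None and chunks:
                yield name, "".join(chunks)
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else path.stem), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None and chunks:
        yield name, "".join(chunks)


def read_csv(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield sequences from the first recognised sequence column, if any."""
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fields = reader.fieldnames or []
        column = next((c for c in SEQUENCE_COLUMNS if c in fields), None)
        if column is None:
            return  # unknown schema: skip the file instead of failing
        for idx, row in enumerate(reader):
            seq = (row.get(column) or "").strip()
            if seq:
                yield f"{path.stem}_{idx}", seq


def scan_outputs(out_dir: Union[str, Path]) -> List[Tuple[str, str]]:
    """Collect unique sequences from every CSV/FASTA file under out_dir."""
    seen = {}
    for path in sorted(Path(out_dir).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".csv":
            records = read_csv(path)
        elif suffix in FASTA_SUFFIXES:
            records = read_fasta(path)
        else:
            continue
        for name, seq in records:
            seen.setdefault(seq.upper(), name)  # first occurrence wins
    return [(name, seq) for seq, name in seen.items()]
```

Skipping unrecognised files rather than raising keeps the extractor usable while the upstream schema is still moving, at the cost of silently ignoring outputs it does not understand.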
Docs:
- docs/rfaa_manual_reinstall.md — commit SHAs (f913a19 RFAA, 26ec57a
LigandMPNN), post-install patches, manual install recipe, notable open
upstream PRs (#21 TRP fix, #26 dir-portability, #37 ContigMap), migration
notes to RFD3 including what's compatible / what isn't (config schema,
contig syntax, AtomWorks-handled post-processing).
Lint clean (ruff + shellcheck); all binder_comparison imports green.
Live smoke test for RFD3 install deferred — will happen alongside Part K's
DGX Spark deployment to cover both aarch64 and x86 simultaneously.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single line-wrapping nit left over from the AF2-removal edits in Part I.
Re-running ruff format collapses the evaluator detection expression back to
a single line now that the condition is short enough. No logic change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
This PR implements Part J of the AF3 + RFD3 refactor: replaces AlphaFold2 with Protenix v0.5.0 (ByteDance's open-source AF3 reimplementation) as the universal 2nd refolding engine across all platforms. AF2 refolding code is completely removed from the Evaluator, while Protenix is integrated as a lightweight alternative that runs comfortably on 24 GB VRAM.
Key Changes
Evaluator (AF2 removal):
- `refold_af2.py`, `refold_Version6.py`, and `af2_runner.py` (all AF2 refolding logic)
- `binder-eval-af2` conda environment definition
- `refold-af2` CLI subcommand and all AF2-specific argument parsing
Evaluator (Protenix integration):
- `refold_protenix.py` (standalone batch refolder for Protenix v0.5.0)
- […] `iptm`, `ptm`, `ranking_score`, `plddt_*`, `pae_*`
- `protenix_runner.py` (CLI wrapper for batch refolding)
- `refold_protenix.py` CLI subcommand
- […] `merger.py` to prefix Protenix columns with `protenix_` and join on `sequence`
- […] `scoring.py` to compute ipSAE from Protenix PAE files (uniform 10 Å cutoff across engines)
Installer & tool support:
- […] `install.sh` to add RFD3 (foundry) and Protein-Hunter as new tools
- […] `--tool rfaa`)
- […] `docs/rfaa_manual_reinstall.md` with instructions for manual RFAA re-installation
New extractors:
- `rfd3.py` extractor (defensive CSV/FASTA scanning for RFD3 / foundry outputs)
- `protein_hunter.py` extractor (reads `summary_high_iptm.csv` or `summary_all_runs.csv`)
Visualization & reporting:
- […] `rfd3` (deep-orange) and `protein_hunter` (teal-cyan)
Workflow updates:
- `evaluate.sh` now requires only `--target-seq` (no `--target-pdb`)
- `--protenix-env` flag to specify Protenix conda environment (default: `bindmaster_pxdesign`)
- `--skip-protenix` flag to skip Protenix refolding
- […] `run.py` orchestrator to call `refold-protenix` instead of `refold-af2`
Implementation Details
- […] `bindmaster_pxdesign` conda env shipped by PXDesign installer (v0.5.0 pinned)
- […] `full_data.json`; extracted and saved as .npy for ipSAE computation
https://claude.ai/code/session_01KMBQ6cJe46ZNuDkNpkRrbE