nf-binder-design

Nextflow pipelines for de novo protein binder design.

⚠️ NOTE: Major change in v0.2.0: individual workflows have moved into workflows/ and are all launched via a single main.nf entry point with the --method flag. To update existing wrapper scripts, you should be able to simply use nextflow run Australian-Protein-Design-Initiative/nf-binder-design --method <method> and keep the other arguments the same. ⚠️

RFdiffusion → ProteinMPNN → AlphaFold2 initial guess → Boltz-2 refolding
RFdiffusion Partial Diffusion → Boltz-2 refolding
BindCraft (in parallel across multiple GPUs)
BoltzGen (design protein and peptide binders using BoltzGen)
"Boltz Pulldown" (an AlphaPulldown-like protocol using Boltz-2)

⚠️ Note: Components of these workflows use RFdiffusion and BindCraft, which depend on PyRosetta/Rosetta, which is free for non-commercial use. Commercial use requires a paid license agreement with University of Washington: https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md and https://rosettacommons.org/software/licensing-faq/

Full documentation: https://australian-protein-design-initiative.github.io/nf-binder-design/

Setup

Install Nextflow.

Clone the git repository:

git clone https://github.com/Australian-Protein-Design-Initiative/nf-binder-design

Examples

See the examples directory for examples.

Commandline options

For any of the workflows, you can see the commandline options with --help, eg:

nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
  --method rfd --help
nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
  --method bindcraft --help
nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
  --method boltzgen --help

Available methods: rfd, rfd_partial, bindcraft, boltzgen, boltz_pulldown

Any of the --params commandline options can alternatively be defined in a params.json file and passed to the workflow with -params-file params.json.
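
As a minimal sketch of the -params-file approach, the script below writes a few options into params.json and prints a matching launch command. It assumes the usual Nextflow convention that each JSON key is the option name without its leading --. The values are borrowed from the RFdiffusion example later in this README and are illustrative only.

```shell
#!/bin/bash
set -euo pipefail

# Each key is a workflow option without its leading "--" (values are illustrative).
cat > params.json <<'EOF'
{
  "input_pdb": "target.pdb",
  "outdir": "results",
  "contigs": "[A371-508/A753-883/A946-1118/A1135-1153/0 70-100]",
  "hotspot_res": "A473,A995,A411,A421",
  "rfd_n_designs": 10
}
EOF

# Print the launch command instead of running it, so it can be reviewed first.
launch_cmd="nextflow run Australian-Protein-Design-Initiative/nf-binder-design --method rfd -params-file params.json -profile local"
echo "$launch_cmd"
```

Keeping the parameters in a file like this makes a run easier to record and repeat than a long command line.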

Binder design with RFdiffusion

Single node or local workstation

Simple example (single 'local' compute node):

OUTDIR=results
mkdir -p $OUTDIR/logs

nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
    --method rfd \
    --input_pdb target.pdb \
    --outdir $OUTDIR \
    --contigs "[A371-508/A753-883/A946-1118/A1135-1153/0 70-100]" \
    --hotspot_res "A473,A995,A411,A421" \
    --rfd_n_designs=10 \
    --rfd_batch_size 1 \
    -with-report $OUTDIR/logs/report_$(date +%Y%m%d_%H%M%S).html \
    -with-trace $OUTDIR/logs/trace_$(date +%Y%m%d_%H%M%S).txt \
    -resume \
    -profile local

If you are working on a specific HPC cluster like M3 or MLeRP, you should omit -profile local and add the -c flag pointing to the specific platform config, eg -c conf/platforms/m3.config for M3.

See the rfd workflow documentation for more details on options available.
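
Hotspots refer to residues on the target, so a hotspot that falls outside the target segments kept by --contigs is usually a mistake. The sketch below is a hypothetical pre-flight check, not part of the pipeline: the function name check_hotspots is invented here, and it assumes single-letter chain IDs and target segments written as a chain letter followed by a start-end range, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: report hotspots that are not covered by any target
# segment in a contigs string. Returns 0 if all are covered, 1 otherwise.
check_hotspots() {
  local contigs="$1" hotspots="$2"
  local missing=0 found h chain num seg
  local -a segments hs

  # Strip the surrounding brackets, then split segments on "/" and spaces.
  local cleaned="${contigs//\[/}"
  cleaned="${cleaned//\]/}"
  IFS='/ ' read -r -a segments <<< "$cleaned"
  IFS=',' read -r -a hs <<< "$hotspots"

  for h in "${hs[@]}"; do
    chain="${h:0:1}"
    num="${h:1}"
    if ! [[ "$num" =~ ^[0-9]+$ ]]; then
      echo "cannot parse hotspot $h" >&2
      missing=1
      continue
    fi
    found=0
    for seg in "${segments[@]}"; do
      # Target segments look like A371-508. Binder lengths (70-100) and
      # chain breaks (0) carry no chain letter and are skipped.
      if [[ "$seg" =~ ^([A-Za-z])([0-9]+)-([0-9]+)$ ]]; then
        if [[ "${BASH_REMATCH[1]}" == "$chain" ]] &&
           (( 10#$num >= 10#${BASH_REMATCH[2]} && 10#$num <= 10#${BASH_REMATCH[3]} )); then
          found=1
          break
        fi
      fi
    done
    if (( ! found )); then
      echo "hotspot $h is outside the contigs" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_hotspots "[A371-508/A753-883/A946-1118/A1135-1153/0 70-100]" "A473,A995,A411,A421"; then
  echo "all hotspots fall inside the target contigs"
fi
```

Running a check like this before launching can catch a mistyped residue number before any GPU time is spent.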

Parallel on an HPC cluster

A more complex example, as a wrapper script for the M3 HPC cluster, using the site-specific config (-c), a specific RFdiffusion model (--rfd_model_path), a radius of gyration filter on the generated RFdiffusion backbones (--rfd_filters), custom ProteinMPNN weights (--pmpnn_weights) and radius of gyration potentials (--rfd_extra_args):

#!/bin/bash
# CHANGE THIS - this is the path where your git clone of this repo is
WF_PATH="/some/path/to/nf-binder-design"

mkdir -p results/logs
DATESTAMP=$(date +%Y%m%d_%H%M%S)

# Ensure our tmp directory is in a location with enough space
export TMPDIR=$(realpath ./tmp)
export NXF_TEMP=$TMPDIR
mkdir -p $TMPDIR

# CHANGE THIS to a path in scratch or scratch2 to act as the cache directory for apptainer
# Containers will be automatically downloaded to this path.
# You can add it to ~/.bashrc if you prefer
export NXF_APPTAINER_CACHEDIR=/some/path/to/scratch2/apptainer_cache
export NXF_APPTAINER_TMPDIR=$TMPDIR

# There's a module for Nextflow on M3
module load nextflow/24.04.3 || true

# CHANGE the --slurm_account to match the project ID you wish to run SLURM jobs under
nextflow \
-c ${WF_PATH}/conf/platforms/m3.config run \
${WF_PATH}/main.nf \
--method rfd \
--slurm_account=ab12 \
--input_pdb 'input/target_cropped.pdb' \
--design_name my-binder \
--outdir results \
--contigs "[B346-521/B601-696/B786-856/0 70-130]" \
--hotspot_res "B472,B476,B484,B488" \
--rfd_n_designs=1000 \
--rfd_batch_size=5 \
--rfd_filters="rg<20" \
--pmpnn_seqs_per_struct=2 \
--pmpnn_relax_cycles=1 \
--pmpnn_weights="/models/HyperMPNN/retrained_models/v48_020_epoch300_hyper.pt" \
--rfd_model_path="/models/rfdiffusion/Complex_beta_ckpt.pt" \
--rfd_extra_args='potentials.guiding_potentials=[\"type:binder_ROG,weight:7,min_dist:10\"] potentials.guide_decay="quadratic"' \
-resume \
-with-report results/logs/report_${DATESTAMP}.html \
-with-trace results/logs/trace_${DATESTAMP}.txt

See the M3 HPC cluster examples for more examples specific to running on SLURM.

Partial diffusion on binder designs

NOTE: In the output from previous design runs, the binder always appears to be named chain A, with the other chains named B, C, etc., irrespective of the chain IDs in the original target PDB file. Residues are numbered sequentially from 1 to N, ignoring any gaps in the chain, rather than following the original target numbering.
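
Because the chain IDs and numbering can differ from the original target, it may help to inspect a design PDB before choosing hotspots. The sketch below is a hypothetical helper (the name chain_ranges is invented here) that lists each chain with its first and last residue number and its residue count. It assumes standard fixed-column PDB ATOM records and counts one CA atom per residue.

```shell
#!/bin/bash
# Hypothetical helper: summarise chain IDs and residue ranges in a PDB file.
# Reads fixed PDB columns: atom name 13-16, chain ID 22, residue number 23-26.
chain_ranges() {
  awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
         c = substr($0, 22, 1)
         r = substr($0, 23, 4) + 0
         if (!(c in n)) { order[++k] = c; first[c] = r }
         last[c] = r
         n[c]++
       }
       END {
         for (i = 1; i <= k; i++) {
           c = order[i]
           printf "%s %d-%d n=%d\n", c, first[c], last[c], n[c]
         }
       }' "$1"
}

# Summarise every PDB file given on the command line.
for pdb in "$@"; do
  echo "== $pdb"
  chain_ranges "$pdb"
done
```

Saved under a name of your choice, for example chain_ranges.sh, it could be run as bash chain_ranges.sh my_designs/*.pdb.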

OUTDIR=results
mkdir -p $OUTDIR/logs

# Generate 10 partial designs for each binder, in batches of 5
# Note the 'single quotes' around the '*.pdb' glob pattern !
nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
    --method rfd_partial \
    --input_pdb 'my_designs/*.pdb' \
    --rfd_n_partial_per_binder=10 \
    --rfd_batch_size=5 \
    --hotspot_res "A473,A995,A411,A421" \
    --rfd_partial_T=2,5,10,20 \
    -with-report $OUTDIR/logs/report_$(date +%Y%m%d_%H%M%S).html \
    -with-trace $OUTDIR/logs/trace_$(date +%Y%m%d_%H%M%S).txt \
    -profile local

See the rfd_partial workflow documentation for more details on options available.

Binder design with BindCraft

The --method bindcraft workflow helps run BindCraft trajectories in parallel across multiple GPUs. This is particularly well suited for running BindCraft on an HPC cluster, or a workstation with multiple GPUs.

Unlike the default BindCraft configuration, which runs for an indeterminate amount of time until a target number of accepted designs is found, this pipeline runs a fixed number of trajectories (set with --bindcraft_n_traj) and then stops.

Example:

DATESTAMP=$(date +%Y%m%d_%H%M%S)

nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
  --method bindcraft \
  --input_pdb 'input/PDL1.pdb' \
  --outdir results \
  --target_chains "A" \
  --hotspot_res "A56,A125" \
  --hotspot_subsample 0.5 \
  --binder_length_range "55-120" \
  --bindcraft_n_traj 2 \
  --bindcraft_batch_size 1 \
  --bindcraft_advanced_settings_preset "default_4stage_multimer" \
  --bindcraft_filters_preset "default_filters" \
  -profile local \
  -resume \
  -with-report results/logs/report_${DATESTAMP}.html \
  -with-trace results/logs/trace_${DATESTAMP}.txt

The values for --bindcraft_advanced_settings_preset and --bindcraft_filters_preset are the names of the files in the BindCraft settings_advanced and settings_filters directories (without the .json extension).

--hotspot_subsample sets the proportion of the hotspot residues that is randomly selected for each design, allowing the impact of hotspot selection to be explored in a single run.

If you have multiple GPUs per compute node, you can specify them with the --gpu_devices flag, eg --gpu_devices=0,1.

Results are saved to the --outdir directory, in the bindcraft subdirectory, with CSV outputs from each batch combined into single tables, eg bindcraft/final_design_stats.csv.

A report summarizing the results is generated in bindcraft_report.html.
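
Since a run stops after a fixed number of trajectories, a quick way to see how many designs it produced is to count the rows of the combined table. The sketch below is a hypothetical helper (count_csv_rows is invented here). It assumes the CSV has a single header line and one design per line, with no line breaks inside quoted fields.

```shell
#!/bin/bash
# Hypothetical helper: count data rows (everything after the header) in a CSV.
# Prints 0 if the file does not exist.
count_csv_rows() {
  local csv="$1"
  if [ ! -f "$csv" ]; then
    echo 0
    return
  fi
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$csv"
}

# OUTDIR should match the --outdir passed to the workflow (default here: results).
OUTDIR="${OUTDIR:-results}"
echo "final designs: $(count_csv_rows "$OUTDIR/bindcraft/final_design_stats.csv")"
```

If the count is too low, the run can be repeated with a larger --bindcraft_n_traj.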

See the bindcraft workflow documentation for more details on options and output.

Binder design with BoltzGen

The --method boltzgen workflow automates the design of binders using the BoltzGen generative model. It supports the protein-anything, peptide-anything, protein-small_molecule and nanobody-anything protocols.

Example:

nextflow run Australian-Protein-Design-Initiative/nf-binder-design \
    --method boltzgen \
    --config_yaml config/my_design.yaml \
    --outdir results \
    --num_designs 100 \
    --batch_size 10 \
    --devices 1

See the boltzgen workflow documentation for more details on configuration files and options.

License

MIT

Note that some software dependencies of the pipeline are under less permissive licenses - in particular, RFdiffusion and BindCraft use Rosetta/PyRosetta which is only free for Non-Commercial use.
