Developed at the Artificial Intelligence Protein Design Lab.
This repository provides a reproducible workflow for ligand binder design using diffusion models. The workflow integrates multiple state-of-the-art tools to generate and validate protein binders for small molecule ligands.
The workflow consists of 9 sequential steps:
- Parameters Setup (`0_params/`) - Ligand preparation and parameter files
- Backbone Generation (`1_diffusion/`) - Structure generation using RFdiffusion
- Backbone Filtering (`2_backbone_filter/`) - DSSP and SASA-based filtering
- Sequence Design (`3_lmpnn/`) - Sequence generation using LigandMPNN
- Rosetta Scoring (`4_rscore_filter/`) - Energy-based filtering
- AlphaFold3 Prediction (`5_af3/`) - Structure prediction and validation
- Boltz Prediction (`6_boltz/`) - Alternative structure prediction
- PLACER Analysis (`7_placer/`) - Binding site analysis
- RMSD Filtering (`8_rmsd_filter/`) - Final structure validation
- Reproducible Workflow: Complete protocol from ligand input to validated binders
- Multiple Validation Steps: Combines geometric, energetic, and structural filters
- Modern AI Tools: Utilizes RFdiffusion, LigandMPNN, AlphaFold3, and Boltz
- Scalable: Designed for SLURM-based cluster environments
- Performance Optimized: Multiprocessing enabled for DSSP/SASA/Rosetta scoring and RMSD calculations
- Enhanced RMSD Analysis: Biopython-based structure handling with RDKit for ligand symmetry-aware RMSD calculations
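The symmetry-aware RMSD mentioned above can be illustrated with a small pure-Python sketch. It is not the repository's implementation: the function names and the toy coordinates are hypothetical, and the list of equivalent atom mappings is written by hand. With RDKit, such mappings would typically come from matching the ligand onto itself (for example `GetSubstructMatches` with `uniquify=False`), and `rdMolAlign.CalcRMS` performs a comparable in-place, symmetry-aware calculation. Whether the workflow uses either call is an assumption.

```python
import math


def rmsd(ref, probe):
    """Plain RMSD between two equally long lists of (x, y, z) tuples, without superposition."""
    if len(ref) != len(probe):
        raise ValueError("coordinate lists must have the same length")
    squared = sum((a - b) ** 2 for p, q in zip(ref, probe) for a, b in zip(p, q))
    return math.sqrt(squared / len(ref))


def symmetry_aware_rmsd(ref, probe, mappings):
    """Minimum RMSD over atom mappings; mapping[i] is the probe index paired with ref atom i."""
    return min(rmsd(ref, [probe[j] for j in mapping]) for mapping in mappings)


# Toy carboxylate-like fragment: atom 0 is C, atoms 1 and 2 are equivalent O atoms.
ref = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.6, 1.0, 0.0)]
# Same pose, but the two oxygens are listed in swapped order.
probe = [ref[0], ref[2], ref[1]]
# Identity mapping plus the mapping that swaps the two equivalent oxygens.
mappings = [(0, 1, 2), (0, 2, 1)]

naive = rmsd(ref, probe)                           # penalizes the arbitrary atom order
aware = symmetry_aware_rmsd(ref, probe, mappings)  # picks the best equivalent mapping
print(f"naive={naive:.3f} aware={aware:.3f}")
```

The naive value is nonzero only because of atom ordering, while the symmetry-aware value is zero for this identical pose. This is why a symmetry-aware metric is preferable when comparing predicted and designed ligand poses.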
- RFdiffusion All-Atom
- LigandMPNN
- Rosetta
- AlphaFold3
- Boltz
- PLACER
- TMalign
- PyMOL
- Python 3.8+
- PyArrow (for parquet file handling)
Note: This workflow uses the Parquet file format to handle double headers efficiently. Parquet files can be viewed and analyzed with VS Code's Data Wrangler extension.
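As a quick sanity check of the Python-side requirements listed above, the following sketch reports whether the interpreter is at least 3.8 and whether PyArrow can be imported. The function name `check_python_requirements` is hypothetical and not part of the repository. The external tools (RFdiffusion, Rosetta, and so on) are not checked, because their install locations and executable names depend on the local setup.

```python
import importlib.util
import sys


def check_python_requirements(min_version=(3, 8), modules=("pyarrow",)):
    """Return a dict describing whether the interpreter and the given modules are available."""
    report = {"python_ok": sys.version_info >= min_version}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_python_requirements().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```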
- Place your ligand files in the `0_params/` directory
- Run each workflow step sequentially using the provided `run.sh` scripts
- Review and filter results at each step as needed
- Final candidates will be available in `8_rmsd_filter/`
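The usage steps above could be scripted with a small driver like the one below. This is a sketch under stated assumptions, not a script shipped with the repository: it assumes every step directory contains a `run.sh` at its top level and that each script blocks until the step has finished. On a SLURM cluster a `run.sh` may only submit jobs and return immediately, in which case the steps should still be launched by hand after the previous results have been reviewed. The names `plan_steps` and `run_steps` are hypothetical.

```python
import subprocess
from pathlib import Path

# Step directories in the order given in the workflow description.
STEP_DIRS = [
    "0_params", "1_diffusion", "2_backbone_filter", "3_lmpnn",
    "4_rscore_filter", "5_af3", "6_boltz", "7_placer", "8_rmsd_filter",
]


def plan_steps(repo_root, first=0, last=8):
    """Return (directory, command) pairs for steps first..last, in order."""
    root = Path(repo_root)
    # Assumption: each step is started with `bash run.sh` from inside its directory.
    return [(root / name, ["bash", "run.sh"]) for name in STEP_DIRS[first:last + 1]]


def run_steps(repo_root, first=0, last=8, dry_run=True):
    """Print each planned command and, unless dry_run is set, execute it."""
    for workdir, cmd in plan_steps(repo_root, first, last):
        print(f"[{workdir.name}] {' '.join(cmd)}")
        if dry_run:
            continue
        # check=True stops the chain at the first failing step.
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_steps(".", dry_run=True)
```

Running a sub-range, for example `run_steps(".", first=1, last=2, dry_run=False)`, leaves room for the manual review and filtering between steps that the workflow calls for.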