Preprint: "Data-centric training enables meaningful interaction learning in protein–ligand binding affinity prediction." ChemRxiv.
Tip
Always use a virtual environment to manage dependencies.
python -m venv .venv
source .venv/bin/activate
Quick setup for inference. Install the package directly from PyPI:
pip install docktdeep
For a containerized setup that requires no Python environment:
# clone the repository (needed for the model checkpoint)
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep
# build the image and install the wrapper script
./install.sh
This installs a docktdeep command system-wide. Run it from any directory containing your data files:
docktdeep --proteins protein.pdb --ligands ligand.mol2 --output-csv results.csv
# with GPU support
docktdeep --gpu --proteins protein.pdb --ligands ligand.mol2 --output-csv results.csv
Note
The wrapper automatically mounts the current working directory into the container. All input files must be in the current directory and output files will be written there.
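Because the wrapper only mounts the current working directory, a quick pre-flight check can catch inputs the container would not see. The sketch below is a hypothetical helper (`files_outside_cwd` is not part of DockTDeep) that assumes Python 3.9+ and only inspects paths; it does not call the wrapper. It treats subdirectories of the working directory as visible, which is an assumption about how the mount behaves.

```python
from pathlib import Path


def files_outside_cwd(paths, cwd=None):
    """Return the paths that do not resolve to a location inside cwd.

    Hypothetical helper for illustration; not part of DockTDeep.
    """
    base = (Path(cwd) if cwd is not None else Path.cwd()).resolve()
    outside = []
    for p in paths:
        # Relative paths are interpreted against the base directory;
        # absolute paths are kept as given.
        resolved = (base / p).resolve()
        if not resolved.is_relative_to(base):  # requires Python 3.9+
            outside.append(str(p))
    return outside


if __name__ == "__main__":
    inputs = ["protein.pdb", "ligand.mol2", "../elsewhere/ligand.mol2"]
    for path in files_outside_cwd(inputs):
        print(f"not visible from the current directory: {path}")
```

Any path the helper reports would need to be copied or moved into the working directory before running the containerized command.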
Predict binding affinities for protein-ligand pairs (predictions are given in kcal/mol).
# single protein-ligand pair
docktdeep predict --proteins protein.pdb --ligands ligand.pdb --output-csv results.csv
# multiple pairs
docktdeep predict \
--proteins protein1.pdb protein2.pdb \
--ligands ligand1.pdb ligand2.pdb \
--output-csv results.csv \
--max-batch-size 16
# single protein with multiple ligands (protein auto-replicated)
docktdeep predict \
--proteins protein.pdb \
--ligands ligand1.mol2 ligand2.mol2 ligand3.mol2 \
--output-csv results.csv
# multi-mol2 file (e.g., docking output with multiple poses)
docktdeep predict \
--proteins protein.pdb \
--ligands docked_poses.mol2
# options available in help
docktdeep predict --help
Note
- When using a single protein with multiple ligands, the protein is automatically replicated (no need to repeat the protein path).
- Multi-mol2 files (common output from docking programs) are automatically split into individual molecules.
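The two behaviors in the note above can be pictured with a short sketch. This is not DockTDeep's implementation: `split_multi_mol2` and `pair_inputs` are hypothetical names chosen for illustration, and the only format fact relied on is that each molecule record in a Tripos mol2 file begins with an `@<TRIPOS>MOLECULE` line.

```python
MOLECULE_TAG = "@<TRIPOS>MOLECULE"


def split_multi_mol2(text):
    """Split the text of a (multi-)mol2 file into one string per molecule.

    Hypothetical helper; lines before the first molecule record are dropped.
    """
    blocks, current = [], []
    for line in text.splitlines(keepends=True):
        # A new MOLECULE tag closes the block collected so far.
        if line.startswith(MOLECULE_TAG) and current:
            blocks.append("".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("".join(current))
    # Keep only blocks that actually contain a molecule record.
    return [b for b in blocks if MOLECULE_TAG in b]


def pair_inputs(proteins, ligands):
    """Pair proteins with ligands, replicating a single protein if needed."""
    if len(proteins) == 1:
        proteins = proteins * len(ligands)
    if len(proteins) != len(ligands):
        raise ValueError("expected one protein, or one protein per ligand")
    return list(zip(proteins, ligands))


if __name__ == "__main__":
    # Two minimal, made-up molecule records standing in for docking poses.
    poses = (
        "@<TRIPOS>MOLECULE\npose_1\n 0 0 0\nSMALL\nNO_CHARGES\n"
        "@<TRIPOS>MOLECULE\npose_2\n 0 0 0\nSMALL\nNO_CHARGES\n"
    )
    ligands = split_multi_mol2(poses)
    for protein, ligand in pair_inputs(["protein.pdb"], ligands):
        # The line after the MOLECULE tag holds the molecule name.
        print(protein, ligand.splitlines()[1])
```

In this sketch a single protein path is repeated once per ligand, so every pose from a docking run is scored against the same receptor.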
For development and training custom models:
# clone the repository
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# install deps
python -m pip install -r requirements.txt
# run tests to verify installation
python -m pytest tests/
Initialize a new aim repository for tracking experiments:
aim init
# to start the aim server
aim server
To see all available training options:
python train.py --help
Train a model with optimized hyperparameters:
python train.py \
--model Baseline \
--experiment experiment-name \
--depthwise-convs \
--adaptive-pooling \
--optim AdamW \
--max-epochs 1500 \
--batch-size 64 \
--lr 0.00087469 \
--beta1 0.25693012 \
--eps 0.00032933 \
--dropout 0.25348994 \
--wdecay 0.0000169 \
--molecular-dropout 0.06 \
--molecular-dropout-unit complex \
--random-rotation \
--dataframe-path path/to/dataframe.csv \
--root-dir path/to/data/PDBbind2020 \
--ligand-path-pattern "{c}/{c}_ligand_rnum.pdb" \
--protein-path-pattern "{c}/{c}_protein_prep.pdb" \
--split-column random_split
If you use DockTDeep in your research, please cite:
@article{dasilva2025docktdeep,
title={Data-centric training enables meaningful interaction learning in protein--ligand binding affinity prediction},
author={da Silva, Matheus M. P. and Vidal, Lincon and Guedes, Isabella and de Magalh{\~a}es, Camila and Cust{\'o}dio, F{\'a}bio and Dardenne, Laurent},
year={2025}
}
- DockTGrid: a Python package for generating deep learning-ready voxel grids of molecular complexes. GitHub.