Skip to content

Alberto024/protein-pocket-volume

Repository files navigation

protein-pocket-volume

This repository contains a tool that can be used to calculate the volume of a pocket within a protein.

This tool streamlines the process of aligning a set of protein structures to a common reference via USalign, running fpocket on the aligned structures, and calculating the volume based on a user-defined bounding box.

Python Version License: MIT

Overview

There are many tools available that can identify protein cavities and estimate volume, but many seem to struggle with consistently defining the boundary between bulk solvent and the inner cavity. To make fair comparisons of pocket volume between protein homologs, this boundary should be defined consistently.

The protein_pocket_volume tool in this repo tries to address this challenge through two key steps:

  1. Structural Alignment: All proteins must be superimposed onto a common reference frame.
  2. User-Defined Bounding Box: A bounding box is set appropriately on the reference protein to define the cavity of interest.

Prerequisites

  • Python 3.11++
  • Docker: The Docker daemon must be running.
  • Git: For cloning the USalign repository.
  • UV: For installing this package.

Installation

1. Build the USalign Docker Image

This repo uses docker for running USalign. You must build this image first.

# Clone the official USalign repository
git clone https://github.com/pylelab/USalign.git

# Navigate into the directory and build the Docker image
cd USalign
docker build -t usalign:latest .

2. Install protein-pocket-volume

You can now install the protein-pocket-volume package into a new virtual environment with:

# Clone this repository
git clone https://github.com/Alberto024/protein-pocket-volume.git
cd protein-pocket-volume

# Create virtual environment for protein-pocket-volume
uv sync

# Activate the virtual environment
source .venv/bin/activate

Alternatively, you can install it into an existing virtual environment via uv by running:

# Install without adding to pyproject.toml
uv pip install 'git+https://github.com/Alberto024/protein-pocket-volume.git'

# Or add to pyproject.toml
uv add 'git+https://github.com/Alberto024/protein-pocket-volume.git'

You can check if the installation was successful by running:

protein_pocket_volume --help

Usage

The main command is protein_pocket_volume. You must provide an input directory of target PDBs and a reference PDB for alignment.

Additionally, you must define the bounding box parameters using the --grid-center and --grid-size options which each accept 3 floating point numbers. Furthermore, the fpocket parameters should be tuned such that your cavity of interest is appropriately recognized and mapped out using the --fpocket-min-spheres-per-pocket, --fpocket-min-sphere-size, and --fpocket-max-sphere-size options.

Command Line Usage:

╰─➤  protein_pocket_volume --help

 Usage: protein_pocket_volume [OPTIONS] INPUT_DIR REFERENCE_PDB

 Calculate active site volumes for all PDB files in a directory.

 This command performs the following steps: 1. Sets up output directories. 2. Aligns all PDBs in the input
 directory to the reference PDB using USalign. 3. Runs fpocket on each newly aligned PDB file. 4. Concatenates the
 resulting pocket PQR files for each protein. 5. Calculates the volume for each protein based on the defined grid.
 6. Saves the results to a CSV file.

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    input_dir          DIRECTORY  Directory containing PDB files. [default: None] [required]                    │
│ *    reference_pdb      FILE       Path to the reference PDB file for alignment. [default: None] [required]      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --output-dir                                          DIRECTORY               Directory to save fpocket outputs  │
│                                                                               and intermediate files.            │
│                                                                               [default: pocket_volume_outputs]   │
│ --grid-center                                         <FLOAT FLOAT FLOAT>...  Grid center coordinates (X Y Z).   │
│                                                                               [default: 0.0, 0.0, 0.0]           │
│ --grid-size                                           <FLOAT FLOAT FLOAT>...  Grid size in angstroms (Length     │
│                                                                               Width Height).                     │
│                                                                               [default: 16.0, 16.0, 16.0]        │
│ --grid-resolution                                     INTEGER                 Number of points along each grid   │
│                                                                               axis.                              │
│                                                                               [default: 30]                      │
│ --fpocket-min-spheres-per-pocket                      INTEGER                 fpocket -i: min spheres per        │
│                                                                               pocket.                            │
│                                                                               [default: 15]                      │
│ --fpocket-min-sphere-size                             FLOAT                   fpocket -m: min sphere size.       │
│                                                                               [default: 3.4]                     │
│ --fpocket-max-sphere-size                             FLOAT                   fpocket -M: max sphere size.       │
│                                                                               [default: 6.2]                     │
│ --visualize                         --no-visualize                            Visualize point cloud.             │
│                                                                               [default: no-visualize]            │
│ --install-completion                                                          Install completion for the current │
│                                                                               shell.                             │
│ --show-completion                                                             Show completion for the current    │
│                                                                               shell, to copy it or customize the │
│                                                                               installation.                      │
│ --help                                                                        Show this message and exit.        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The protein_pocket_volume tool once executed will produce output in the following format:

pocket_volume_outputs/
  aligned_pdbs/
    example_protein.aligned.pdb
  fpocket_outputs/
    example_protein.aligned_out/
      example_protein.aligned_pockets.pqr
      pockets/
        pocket1_vert.pqr
        pocket2_vert.pqr
        ...
  pqr_files/
    example_protein.pqr
  logs/
    fpocket.log
    usalign.lot
  pocket_volumes.csv

How to choose a bounding box

Choosing the centroid and box dimensions for the pocket bounding box is critical. It should be chosen such that it will capture all the variation in your pocket of interest.

The recommended approach is to first optimize the fpocket parameters on your reference protein.

You can easily run fpocket on a protein via docker. For example, if you are in a directory with the file protein.pdb then you can run fpocket with:

docker run --rm -v `pwd`:/workdir fpocket/fpocket fpocket -f /workdir/protein.pdb -i 30 -m 3.4 -M 6.2

Take note of the parameters -i, -m, and -M which can be used to tune the fpocket algorithm. There are other options as well that you can see by running:

docker run --rm fpocket/fpocket fpocket -h

Or going to the fpocket documentation.

Fpocket will produce an output directory in your current directory that should contain a file named protein_out/protein.pml which you can open via pymol with

pymol protein_out/protein.pml

Make sure to choose fpocket parameters such that your active site is fully mapped out. It's okay if there are off-target pockets detected as they will be ignored once you select a bounding box.

In pymol, you can draw a bounding box by loading the script named scripts/draw_box.py located in this repository.

For example, first load the script in the pymol command line:

PyMOL>load protein-pocket-volume/scripts/draw_box.py

Note, the path to the draw_box.py should be correctly set. Then you can visualize a bounding box with:

PyMOL>cgo_box(center=[0,0,0], size=[16,16,16], name='box1', color='blue')

Adjust the center and size parameters until your bounding box covers your pocket of interest, but be sure to provide it extra space to account for the variation that other protein homologs will have.

One way to get a good starting point is to select 2 atoms that can act as vertices on opposite corners of the bounding box and then use the cmd.get_atom_coords command to get their coordinates individually, then use those coordinates to define a bounding box centroid and dimensions.

PyMOL>print(cmd.get_atom_coords('sele'))
[123.03800201416016, 113.74600219726562, 3.4489998817443848]

Alternatively, you can use the calculate_box_dimensions command after loading the draw_box.py script to calculate the centroid and dimensions of the bounding box. The calculate_box_dimensions command requires a selection of exactly 2 atoms and can optionally draw a bounding box automatically. For example:

PyMOL>calculate_box_dimensions(selection='sele', draw=True)
Analysis for selection: 'sele'
  - Point 1: [123.038 113.746   3.449]
  - Point 2: [136.215  98.486  12.934]
  - Box Centroid: [129.6265 106.116    8.1915]
  - Box Dimensions: [13.176994 15.260002  9.485001]
Bounding box drawn with name: box_4909

The bounding box will be used to define the pocket for every homolog of interest, so it is important to define it in a way that will not cut off the pocket for other homologs with larger pockets.

Alternatively, the AutoDockTools pymol plugin can be used to visualize a bounding box.

Lastly, you can visualize the point cloud defined by the intersection of the bounding box and the fpocket mapped spheres by using the --visualize flag of the protein_pocket_volume command. An example plot will be made like the one below:

Example Point Cloud

Limitations

The calculation of the pocket volume is only an estimate for several reasons:

  • Proteins are dynamic and measuring the volume of an individual structure will only be a snapshot of possible conformations
  • The bounding box is user defined to capture variation in the cavity of interest, but its not necessarily the true bulk solvent/pocket boundary
  • Defining the bounding box for a single reference protein that is applied to all homologs of interest may not always be appropriate. Doing this carries the assumption that the cavity of interest is conserved enough across your protein homologs that a single bounding box would provide a reasonable boundary for all of them.
  • Choosing a single reference protein to align every homolog to carries bias. Proteins that are more structurally similar to the reference protein will have a bounding box applied more consistently than proteins that are more structurally distant. An alternative approach would be to perform a multiple structure alignment from all homologs of interest and using that alignment to define a bounding box. A challenge with that approach is that multiple structure alignments for hundreds to thousands of proteins is computationally intensive. It is possible to take an intermediate approach and select a handful of representatives that could be used to perform a smaller but still representative multiple structural alignment.
  • Using a plane to define the bulk solvent boundary (via a bounding box) is possibly not the ideal geometric shape. It's possible that some kind of curved surface would be more appropriate.

Also, defining the bounding box using a centroid and box dimensions is not the most user-friendly and requires careful consideration and tuning from the user. A more automated approach could be to automatically define a bounding box using structurally conserved points on the proteins themselves.

Furthermore, volume itself is only a proxy for characterizing steric interactions in a pocket. There are numerous other factors that play a functional role in protein pockets including morphology, electrostatics, hydrophobicity, etc. To get a more holistic view of the properties of a pocket these should all be considered.

Citation

If you use this code, please cite:

Englund E, Schmidt M, Nava AA, Lechner A, Deng K, Jocic R, Lin Y, Roberts J, Benites VT, Kakumanu R, Gin JW.
Expanding extender substrate selection for unnatural polyketide biosynthesis by acyltransferase domain exchange within a modular polyketide synthase.
Journal of the American Chemical Society. 2023 Apr 14;145(16):8822-32.
https://doi.org/10.1021/jacs.2c11027

Additionally, this code relies on the following programs:

USalign

Zhang, C., Shine, M., Pyle, A.M. et al.
US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes.
Nat Methods 19, 1109–1115 (2022).
https://doi.org/10.1038/s41592-022-01585-1

FPocket

Le Guilloux, V., Schmidtke, P. & Tuffery, P.
Fpocket: An open source platform for ligand pocket detection.
BMC Bioinformatics 10, 168 (2009).
https://doi.org/10.1186/1471-2105-10-168
  • docker
  • numpy
  • pandas
  • pydantic
  • rich
  • tqdm
  • typer
  • matplotlib
  • ruff
  • uv

We thank all the authors and maintainers for their open-source contributions!

About

Measure volume of a protein pocket

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages