
Protein Mutation Mapping fails on Complex Phase using Apo Mapping JSON file #120

@sukritsingh

Describe the bug
Context: I am trying to run a protein mutation protocol on a protein-protein interaction (Apo phase = protein monomer, Complex phase = protein dimer). In the Apo phase, I have successfully generated a JSON file of the mapping and a DAG, and have even run a full CycleUnit of the transformation (which is great!).

I was told that I should reuse the mapping from the Apo phase when running the Complex phase.

Issue: However, when I run the DAG for the Complex phase using the mapping generated in the Apo phase, it crashes (see the stack trace below).

To Reproduce
Steps to reproduce the behavior (ideally a minimally reproducible example):

Using these example files:
apo-mapping-fails-files.tar.gz

The apo mapping was made using essentially the following code block (with the example files I have attached):

import os

from openmm.app import PDBFile
from pdbfixer import PDBFixer
from kartograf import KartografAtomMapper
from gufe import ProteinComponent

# Build the mutant apo structure (P61A on chain A) from the wild-type apo structure
pdbfixer = PDBFixer("./structures/apo-structure-for-mapping-p61a.pdb")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.applyMutations(["PRO-61-ALA"], "A")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.addMissingAtoms()
pdbfixer.addMissingHydrogens(7.0)

omm_top = pdbfixer.topology
omm_pos = pdbfixer.positions
with open("./structures/mutated_dimer_P61A.pdb", "w") as out_file:
    PDBFile.writeFile(omm_top, omm_pos, out_file)

atom_mapper = KartografAtomMapper(map_exact_ring_matches_only=False, atom_map_hydrogens=True)
mutation_string = "P61A"
# Read the starting (wild-type) apo PDB
initial_comp = ProteinComponent.from_pdb_file("structures/apo-structure-for-mapping-p61a.pdb")
# Read the final (mutated) apo PDB written above
final_comp = ProteinComponent.from_pdb_file(f"structures/mutated_dimer_{mutation_string}.pdb")
# Take the first suggested wild-type -> mutant atom mapping
mapping = next(atom_mapper.suggest_mappings(initial_comp, final_comp))

# Serialize the mapping to JSON
mappings_dir = "."
os.makedirs(mappings_dir, exist_ok=True)
with open(f"{mappings_dir}/dimer_{mutation_string}.json", "w") as out_file:
    mapping.to_json(out_file)
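One way to sanity-check a mapping before reusing it is to list which residues still contain unmapped atoms; for a single point mutation such as P61A, one would expect only residue 61 to show up. The sketch below is a hypothetical helper that works on plain Python data rather than on `ProteinComponent` objects: each atom is described by an assumed `(chain_id, residue_number, residue_name, atom_name)` tuple, and the mapping is a plain `dict` of component-A index to component-B index (gufe's `LigandAtomMapping` exposes such a dict as `componentA_to_componentB`; whether that is the object kartograf returns here is an assumption). Extracting the tuples from the real structures is left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-atom label: (chain_id, residue_number, residue_name, atom_name)
Atom = Tuple[str, int, str, str]


def residues_with_unmapped_atoms(
    mapping: Dict[int, int], atoms_a: List[Atom], atoms_b: List[Atom]
) -> Set[Tuple[str, int]]:
    """Return the (chain_id, residue_number) pairs that hold at least one
    atom left out of the A -> B index mapping, on either side."""
    mapped_a = set(mapping.keys())
    mapped_b = set(mapping.values())
    residues: Set[Tuple[str, int]] = set()
    for index, (chain, resnum, _resname, _name) in enumerate(atoms_a):
        if index not in mapped_a:
            residues.add((chain, resnum))
    for index, (chain, resnum, _resname, _name) in enumerate(atoms_b):
        if index not in mapped_b:
            residues.add((chain, resnum))
    return residues


# Toy example: GLY 60 is untouched, PRO 61 becomes ALA 61.
wild_type = [("A", 60, "GLY", "N"), ("A", 60, "GLY", "CA"),
             ("A", 61, "PRO", "N"), ("A", 61, "PRO", "CA"), ("A", 61, "PRO", "CG")]
mutant = [("A", 60, "GLY", "N"), ("A", 60, "GLY", "CA"),
          ("A", 61, "ALA", "N"), ("A", 61, "ALA", "CA"), ("A", 61, "ALA", "CB")]
mapping = {0: 0, 1: 1, 2: 2, 3: 3}  # PRO CG and ALA CB are left unmapped
print(residues_with_unmapped_atoms(mapping, wild_type, mutant))  # {('A', 61)}
```

If the same check run against the complex-phase structures also lists residues from the partner chain, or from elsewhere in chain A, the indices in the apo mapping are probably not lining up with the dimer.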

Then, the Complex phase structures can be generated using essentially the same code block, but with slightly different inputs:

from openmm.app import PDBFile
from pdbfixer import PDBFixer

# Build the mutant dimer (P61A on chain A) from the wild-type dimer
pdbfixer = PDBFixer("./structures/dimer-wild-type.pdb")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.applyMutations(["PRO-61-ALA"], "A")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.addMissingAtoms()
pdbfixer.addMissingHydrogens(7.0)

omm_top = pdbfixer.topology
omm_pos = pdbfixer.positions
with open("./structures/dimer-mutant.pdb", "w") as out_file:
    PDBFile.writeFile(omm_top, omm_pos, out_file)

The apo mapping can then be used to set up the DAG for the Complex phase (as I understand it) using the provided setup-complex-phase-dag.py script.
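A mapping stores integer atom indices, so reusing the apo mapping in the Complex phase implicitly assumes that each index points at the same atom in the dimer as in the monomer. A minimal sketch of re-expressing the indices by atom identity instead of by position, under the assumptions that each `(chain_id, residue_number, residue_name, atom_name)` tuple is unique within a structure and that the mutated chain keeps the same chain ID and numbering in both phases. The function name `translate_mapping` is hypothetical and not part of gufe, kartograf or feflow, and mapping the partner chain one-to-one is an assumption about what a complete complex mapping needs, not something confirmed by feflow.

```python
from typing import Dict, List, Tuple

# Hypothetical per-atom label: (chain_id, residue_number, residue_name, atom_name)
Atom = Tuple[str, int, str, str]


def translate_mapping(
    apo_mapping: Dict[int, int],
    apo_a: List[Atom], apo_b: List[Atom],
    complex_a: List[Atom], complex_b: List[Atom],
) -> Dict[int, int]:
    """Re-express an apo-phase A -> B index mapping in complex-phase indices,
    matching atoms by their identity tuple rather than by position."""
    # Look-up tables from atom identity to its index in each complex structure
    index_in_complex_a = {atom: i for i, atom in enumerate(complex_a)}
    index_in_complex_b = {atom: i for i, atom in enumerate(complex_b)}

    complex_mapping: Dict[int, int] = {}
    for apo_i, apo_j in apo_mapping.items():
        # A KeyError here means an apo atom has no counterpart in the complex
        new_i = index_in_complex_a[apo_a[apo_i]]
        new_j = index_in_complex_b[apo_b[apo_j]]
        complex_mapping[new_i] = new_j

    # Atoms found only in the complex (e.g. the partner chain) are assumed to be
    # untouched by the mutation and are mapped one-to-one when identical.
    already_mapped_b = set(complex_mapping.values())
    for atom, i in index_in_complex_a.items():
        j = index_in_complex_b.get(atom)
        if j is not None and i not in complex_mapping and j not in already_mapped_b:
            complex_mapping[i] = j
            already_mapped_b.add(j)
    return complex_mapping


# Toy example: the partner chain "B" is written before chain "A" in the dimer files,
# so every chain-A index is shifted relative to the apo structures.
apo_wt = [("A", 60, "GLY", "N"), ("A", 61, "PRO", "N"), ("A", 61, "PRO", "CG")]
apo_mut = [("A", 60, "GLY", "N"), ("A", 61, "ALA", "N"), ("A", 61, "ALA", "CB")]
apo_mapping = {0: 0, 1: 1}
complex_wt = [("B", 10, "SER", "N")] + apo_wt
complex_mut = [("B", 10, "SER", "N")] + apo_mut
print(translate_mapping(apo_mapping, apo_wt, apo_mut, complex_wt, complex_mut))
```

Matching by identity is the design choice here: it does not care where chain A sits in the dimer file, but it does rely on residue numbering and atom names being identical between the apo and complex PDB files, which is worth checking for these inputs.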

Then the protocol DAG is executed using:

python run-protocol-complex-dag.py --protocol-dags-dir .

Software versions
This is run on Ubuntu 20.04, using the environment provided in PALE:

Output
Stack Trace of the error:

Executing protocol dag.
/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/openff/amber_ff_ports/amber_ff_ports.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename
Traceback (most recent call last):                                                                                                              
  File "/home/sukrit/work/her2-p61r-pale/run_protocol_dag_complex_p61a.py", line 71, in <module>                                                
    protocol_result_dag = execute_DAG(protocol_dag_deserialized, keep_shared=True, shared_basedir=results_path, scratch_basedir=results_path)   
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   
  File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/gufe/protocols/protocoldag.py", line 417, in execute_DAG                  
    result = unit.execute(context=context, raise_error=raise_error, **inputs)                                                                   
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                   
  File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/gufe/protocols/protocolunit.py", line 322, in execute                     
    outputs = self._execute(context, **inputs)                                                                                                  
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                  
  File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/feflow/protocols/nonequilibrium_cycling.py", line 314, in _execute        
    hybrid_factory = HybridTopologyFactory(                                                                                                     
                     ^^^^^^^^^^^^^^^^^^^^^^                                                                                                     
  File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/feflow/utils/hybrid_topology.py", line 238, in __init__                   
    self._hybrid_topology = self._create_mdtraj_topology()                                                                                      
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                      
  File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/feflow/utils/hybrid_topology.py", line 2640, in _create_mdtraj_topology   
    first_mapped_old_atom_index = mapped_old_atom_indices[0]                                                                                    
                                  ~~~~~~~~~~~~~~~~~~~~~~~^^^                                                                                    
IndexError: list index out of range

Expected behavior
Running the Complex phase DAG should proceed past "Executing protocol dag." without crashing, or the code should give some indication that the Apo mapping is inappropriate for this Complex phase (although it would be useful to know why that would be the case here).
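From the traceback alone, the crash happens because `_create_mdtraj_topology` takes the first element of `mapped_old_atom_indices` and that list is empty. Assuming (not verified here) that this list holds the mapped atoms sharing a residue with an atom that exists only in the new (mutant) system, the kind of early, descriptive error asked for above could look like the following sketch. It is a simplified, hypothetical stand-in and not feflow's code: `unique_new_atoms`, `new_atom_residue` and `mapped_new_atoms` are plain-Python placeholders for data a hybrid topology factory would have.

```python
from typing import Dict, Iterable, List, Set


def residues_missing_anchor(
    unique_new_atoms: Iterable[int],
    new_atom_residue: Dict[int, int],
    mapped_new_atoms: Set[int],
) -> Dict[int, List[int]]:
    """Group new-only atoms by residue and return the residues that contain
    no mapped atom at all (the case that would yield an empty list)."""
    # Residue index -> mapped atoms it contains
    mapped_by_residue: Dict[int, Set[int]] = {}
    for atom in mapped_new_atoms:
        mapped_by_residue.setdefault(new_atom_residue[atom], set()).add(atom)

    orphans: Dict[int, List[int]] = {}
    for atom in unique_new_atoms:
        residue = new_atom_residue[atom]
        if not mapped_by_residue.get(residue):
            orphans.setdefault(residue, []).append(atom)
    return orphans


def check_mapping_anchors(
    unique_new_atoms: Iterable[int],
    new_atom_residue: Dict[int, int],
    mapped_new_atoms: Set[int],
) -> None:
    """Raise a descriptive ValueError instead of an IndexError further down."""
    orphans = residues_missing_anchor(unique_new_atoms, new_atom_residue, mapped_new_atoms)
    if orphans:
        raise ValueError(
            f"{len(orphans)} residue(s) contain unmapped new atoms but no mapped atoms "
            f"(residue indices: {sorted(orphans)}); the atom mapping may not match "
            "this topology, e.g. an apo-phase mapping reused for a complex."
        )


# Residue 0 holds mapped atoms 0 and 1 plus new-only atom 2: fine.
# Residue 1 holds only new-only atoms 3 and 4: the problematic case.
residue_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
try:
    check_mapping_anchors([2, 3, 4], residue_of, {0, 1})
except ValueError as err:
    print(err)
```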

Additional context
This is my first time trying to run a Complex phase protein mutation transformation using FEFlow (all previous Apo phase DAGs have run a CycleUnit fine), so I'm entirely open to the possibility that I should not be using the Apo mapping, or that I am doing something else wrong!
