Description
Describe the bug
Context: I am trying to run a protein mutation protocol on a protein-protein interaction (Apo phase = protein monomer, Complex phase = protein dimer). In the Apo phase, I have successfully generated a JSON of the mapping, built a DAG, and even run a full CycleUnit of the transformation (which is great!).
I was told that I should use the same Mapping from the Apo phase when doing the complex phase.
Issue: However, when I run the DAG on the complex phase using the Mapping generated from the Apo phase, execution crashes (see stack trace below).
To Reproduce
Steps to reproduce the behavior (ideally a minimally reproducible example):
Using these example files:
apo-mapping-fails-files.tar.gz
The apo mapping was made using essentially the following code block (using the example files I have attached):
import os

from pdbfixer import PDBFixer
import pandas as pd
import numpy as np
from kartograf import KartografAtomMapper
from gufe import ProteinComponent
from gufe.tokenization import JSON_HANDLER
from openmm.app import PDBFile

# Load the apo structure and apply the P61A mutation on chain A
pdbfixer = PDBFixer("./structures/apo-structure-for-mapping-p61a.pdb")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.applyMutations(["PRO-61-ALA"], "A")
# Re-run the search so the atoms of the mutated residue get built
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.addMissingAtoms()
pdbfixer.addMissingHydrogens(7.0)

# Write the mutated structure to disk
omm_top = pdbfixer.topology
omm_pos = pdbfixer.positions
with open("./structures/mutated_dimer_P61A.pdb", "w") as out_file:
    PDBFile.writeFile(omm_top, omm_pos, out_file)

atom_mapper = KartografAtomMapper(map_exact_ring_matches_only=False, atom_map_hydrogens=True)
mutation_string = "P61A"
# Read the starting apo pdb
initial_comp = ProteinComponent.from_pdb_file("structures/apo-structure-for-mapping-p61a.pdb")
# Read the final mutated apo pdb
final_comp = ProteinComponent.from_pdb_file(f"structures/mutated_dimer_{mutation_string}.pdb")
# Generate mappings
mapping = next(atom_mapper.suggest_mappings(initial_comp, final_comp))

# Serialize the mapping to JSON
mappings_dir = "."
os.makedirs(mappings_dir, exist_ok=True)
with open(f"{mappings_dir}/dimer_{mutation_string}.json", "w") as out_file:
    mapping.to_json(out_file)
Then, the Complex phase structures can be generated using essentially the same code block, but with slightly different inputs:
from pdbfixer import PDBFixer
pdbfixer = PDBFixer("./structures/dimer-wild-type.pdb")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.applyMutations(["PRO-61-ALA"], "A")
pdbfixer.findMissingResidues()
pdbfixer.findMissingAtoms()
pdbfixer.addMissingAtoms()
pdbfixer.addMissingHydrogens(7.0)
from openmm.app import PDBFile
omm_top = pdbfixer.topology
omm_pos = pdbfixer.positions
with open("./structures/dimer-mutant.pdb.pdb", "w") as out_file:
    PDBFile.writeFile(omm_top, omm_pos, out_file)
The apo mapping can then be used to set up the DAG for the complex phase (as I understand it) using the setup-complex-phase-dag.py script provided.
Then the protocol DAG is executed using:
python run-protocol-complex-dag.py --protocol-dags-dir .
Software versions
This environment is run on Ubuntu 20.04 using the environment provided in PALE:
Output
Stack Trace of the error:
Executing protocol dag.
/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/openff/amber_ff_ports/amber_ff_ports.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import resource_filename
Traceback (most recent call last):
File "/home/sukrit/work/her2-p61r-pale/run_protocol_dag_complex_p61a.py", line 71, in <module>
protocol_result_dag = execute_DAG(protocol_dag_deserialized, keep_shared=True, shared_basedir=results_path, scratch_basedir=results_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/gufe/protocols/protocoldag.py", line 417, in execute_DAG
result = unit.execute(context=context, raise_error=raise_error, **inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/gufe/protocols/protocolunit.py", line 322, in execute
outputs = self._execute(context, **inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/feflow/protocols/nonequilibrium_cycling.py", line 314, in _execute
hybrid_factory = HybridTopologyFactory(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/feflow/utils/hybrid_topology.py", line 238, in __init__
self._hybrid_topology = self._create_mdtraj_topology()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sukrit/anaconda3/envs/her2/lib/python3.12/site-packages/feflow/utils/hybrid_topology.py", line 2640, in _create_mdtraj_topology
first_mapped_old_atom_index = mapped_old_atom_indices[0]
~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
Expected behavior
The code generating the DAG should just print out Executing protocol dag, or provide some indication that the Apo mapping is inappropriate for use with this complex phase (although it would be useful to know why that would be the case here).
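To make the second option concrete, here is a minimal sketch of the kind of pre-flight check I have in mind. The helper name `check_mapping_fits` is hypothetical and is not part of FEFlow, gufe, or Kartograf. It assumes the mapping is available as a plain `{old_index: new_index}` dict and that the atom counts of the two components it will be applied to are known. It only catches empty, out-of-range, or non-one-to-one mappings; it cannot tell whether in-range indices point to the wrong atoms (for example, because the dimer orders its atoms differently from the monomer).

```python
def check_mapping_fits(mapping, n_atoms_old, n_atoms_new):
    """Return a list of human-readable problems; an empty list means none were found.

    mapping      -- dict of {old-state atom index: new-state atom index}
    n_atoms_old  -- number of atoms in the old-state (wild-type) component
    n_atoms_new  -- number of atoms in the new-state (mutant) component
    """
    problems = []
    if not mapping:
        problems.append("mapping is empty")

    # Indices must address atoms that exist in the component they refer to
    bad_old = sorted(i for i in mapping if not 0 <= i < n_atoms_old)
    bad_new = sorted(j for j in mapping.values() if not 0 <= j < n_atoms_new)
    if bad_old:
        problems.append(f"old-state indices out of range: {bad_old}")
    if bad_new:
        problems.append(f"new-state indices out of range: {bad_new}")

    # Two old atoms mapped onto the same new atom is never valid
    if len(set(mapping.values())) != len(mapping):
        problems.append("mapping is not one-to-one")
    return problems


if __name__ == "__main__":
    # Toy example: index 5 does not exist in a 3-atom old state
    for problem in check_mapping_fits({0: 0, 5: 1}, n_atoms_old=3, n_atoms_new=3):
        print(problem)
```

Running something like this before building the hybrid topology would let the failure surface as a readable message instead of an IndexError deep inside the factory. The atom counts could come from the OpenMM topologies of the two structures, for example via `getNumAtoms()`.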
Additional context
This is my first time trying to run a Complex phase protein mutation transformation using FEFlow (all previous Apo phase DAGs have run a CycleUnit fine), so I'm entirely open to the possibility that I should not be using the Apo mapping, or doing something else!