Summary
Build an open-source virtual screening module that enables users to take a Q2MM-optimized TSFF and screen libraries of catalyst/ligand candidates to predict enantioselectivity -- without any proprietary software.
Background
The legacy Q2MM included a powerful screening workflow (scripts/screen/ and smiles_to_catvs/) that:
- Prepared structures from SMILES via PubChem 3D lookup
- Merged ligand/substrate/reaction template structures using SMARTS pattern matching
- Set up conformational searches (MacroModel COMP/CHIG/TORS/RCA4 commands)
- Ran conformational sampling via MacroModel (bmin)
- Eliminated redundant conformers and ranked by TS energy
- Predicted enantioselectivity from Boltzmann-weighted TS energy differences
This was one of Q2MM's killer applications -- you optimize a TSFF once, then screen hundreds of ligand variants in hours instead of running DFT on each one.
The problem: Every step depended on Schrodinger's proprietary .mae format, schrodinger.structutils, and MacroModel. The legacy code (scripts/screen/merge.py at 903 lines, setup_com_from_mae.py at 446 lines, etc.) is being removed as part of the open-source modernization.
Proposed Open-Source Stack
| Legacy (Schrodinger) | Modern (Open Source) |
|---|---|
| .mae file format | Mol2, SDF, PDB via RDKit |
| schrodinger.structutils.analyze | RDKit substructure matching (Chem.MolFromSmarts) |
| MacroModel merge.py (903 lines) | RDKit AllChem.ConstrainedEmbed() or custom overlay |
| MacroModel conformational search (bmin) | CREST, RDKit ETKDG, or OpenMM + enhanced sampling |
| MacroModel energy evaluation | OpenMM with Q2MM-optimized TSFF |
| MacroModel redundant conformer elimination | RDKit Chem.rdMolAlign.GetBestRMS() + clustering |
Proposed API
from q2mm.screening import VirtualScreen, CatalystLibrary
# ForceField, OpenMMEngine, and reaction_mol are assumed to be imported or defined elsewhere
# Load the optimized TSFF
ff = ForceField.from_mm3_fld("optimized.fld")
# Build a catalyst library from SMILES
library = CatalystLibrary.from_smiles([
    ("BINAP", "c1ccc2c(c1)-c1ccccc1C2c1cccc2ccccc12"),
    ("SEGPHOS", "..."),
    # or from a file
])
# Set up the screen
screen = VirtualScreen(
    forcefield=ff,
    reaction_template=reaction_mol,  # Q2MMMolecule with TS geometry
    engine=OpenMMEngine(),
    conformer_method="etkdg",  # or "crest"
    n_conformers=100,
)
# Run screening
results = screen.run(library)
# Analyze
for entry in results.ranked():
    print(f"{entry.name}: ddG = {entry.ddg_kcal:.1f}, ee = {entry.predicted_ee:.0f}%")
Key Components
1. Structure Merging (replaces merge.py)
- SMARTS-based atom mapping between template and candidate
- 3D overlay using matched atoms as constraints
- Handle enantiomer generation (mirror x-coordinates, as legacy did)
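The enantiomer step above can be sketched with no chemistry toolkit at all. This is a minimal illustration, assuming coordinates are held as plain (x, y, z) tuples. The names `mirror_x` and `signed_volume` are hypothetical and not part of Q2MM. Reflecting through the yz-plane inverts every stereocenter, which the signed-volume check confirms for four atoms around a center.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def mirror_x(coords: Sequence[Coord]) -> List[Coord]:
    """Reflect every atom through the yz-plane by negating its x-coordinate."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(a: Coord, b: Coord, c: Coord, d: Coord) -> float:
    """Triple product (b-a) . ((c-a) x (d-a)); its sign flips under reflection."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    ad = [d[i] - a[i] for i in range(3)]
    cross = (
        ac[1] * ad[2] - ac[2] * ad[1],
        ac[2] * ad[0] - ac[0] * ad[2],
        ac[0] * ad[1] - ac[1] * ad[0],
    )
    return sum(ab[i] * cross[i] for i in range(3))


# Four atoms forming a right-handed tetrahedral arrangement.
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirrored = mirror_x(original)
```

Connectivity is unchanged by the reflection, so any stereo labels stored alongside the coordinates would need to be recomputed separately.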
2. Conformational Sampling
- RDKit ETKDG: Fast, good for initial ensemble generation
- CREST (optional): GFN2-xTB-based, better coverage for flexible systems
- OpenMM MD: Use the TSFF itself for short MD + clustering
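Whichever sampler is used, the resulting ensemble usually needs redundancy elimination before ranking. The sketch below shows one simple option, a greedy filter that keeps the lowest-energy conformer of each RMSD cluster. It assumes a precomputed symmetric pairwise RMSD matrix (for example from RDKit's GetBestRMS, as listed in the stack table). The function name `select_unique_conformers` and the 0.5 Å default threshold are illustrative assumptions, not Q2MM settings.

```python
from typing import List, Sequence


def select_unique_conformers(
    energies: Sequence[float],
    rmsd: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of representative conformers, lowest energy first.

    A conformer is kept only if its RMSD to every already-kept
    conformer exceeds `threshold` (same units as the matrix).
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    kept: List[int] = []
    for i in order:
        if all(rmsd[i][j] > threshold for j in kept):
            kept.append(i)
    return kept


# Toy ensemble: conformers 0 and 1 are near-duplicates, as are 2 and 3.
energies = [1.0, 0.0, 0.2, 3.0]
rmsd = [
    [0.0, 0.1, 2.0, 2.0],
    [0.1, 0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0, 0.2],
    [2.0, 2.0, 0.2, 0.0],
]
unique = select_unique_conformers(energies, rmsd)
```

Sorting by energy first means each cluster is represented by its lowest-energy member, which is the one that matters most for the Boltzmann weighting later.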
3. Energy Evaluation
- Use OpenMMEngine with the optimized TSFF
- Minimize each conformer with constrained TS-defining atoms
- Compute Boltzmann-weighted energies
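The Boltzmann weighting step can be sketched as follows. This is a minimal illustration, assuming minimized conformer energies in kcal/mol and a default temperature of 298.15 K. The function names are hypothetical. Energies are shifted by the minimum before exponentiation to avoid overflow.

```python
import math
from typing import List, Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann populations for a set of conformer energies."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_free_energy(energies: Sequence[float], temperature: float = 298.15) -> float:
    """Ensemble free energy G = -RT ln(sum_i exp(-E_i / RT)), in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    z = sum(math.exp(-(e - e_min) / rt) for e in energies)
    return e_min - rt * math.log(z)


weights = boltzmann_weights([0.0, 0.0])
g_single = ensemble_free_energy([5.0])
g_degenerate = ensemble_free_energy([0.0, 0.0])
```

A single conformer returns its own energy, while two degenerate conformers lower the ensemble free energy by RT ln 2, reflecting the extra conformational entropy.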
4. Selectivity Prediction
- Delta-delta-G from competing TS energies
- Boltzmann populations across conformers
- Predicted ee% with confidence intervals
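The conversion from delta-delta-G to ee follows from the ratio of the two competing pathways: with major/minor = exp(ddG/RT), ee = (ratio - 1)/(ratio + 1) = tanh(ddG/2RT). The sketch below assumes ddG is defined as G(minor TS) - G(major TS) in kcal/mol. The function name `predicted_ee` is illustrative, and confidence intervals are not covered here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def predicted_ee(ddg_kcal: float, temperature: float = 298.15) -> float:
    """Predicted ee in percent from ddG = G(minor TS) - G(major TS).

    A positive ddG favors the major enantiomer; a negative value
    flips the sign of the returned ee.
    """
    rt = R_KCAL * temperature
    return 100.0 * math.tanh(ddg_kcal / (2.0 * rt))


ee_zero = predicted_ee(0.0)
ee_one = predicted_ee(1.0)
```

Under these assumptions, a ddG of 1.0 kcal/mol at 298.15 K corresponds to roughly 69% ee, which shows how sensitive the prediction is to sub-kcal/mol errors in the TS energies.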
Acceptance Criteria
- CatalystLibrary.from_smiles() generates 3D structures via RDKit
References
- scripts/screen/ (removed), smiles_to_catvs/ (upstream only)