feat: Open-source virtual screening module (replaces legacy CatVS/screen) #69

@ericchansen

Summary

Build an open-source virtual screening module that lets users take a Q2MM-optimized transition-state force field (TSFF) and screen libraries of catalyst/ligand candidates for predicted enantioselectivity, with no proprietary software.

Background

The legacy Q2MM included a powerful screening workflow (scripts/screen/ and smiles_to_catvs/) that:

  1. Prepared structures from SMILES via PubChem 3D lookup
  2. Merged ligand/substrate/reaction template structures using SMARTS pattern matching
  3. Set up conformational searches (MacroModel COMP/CHIG/TORS/RCA4 commands)
  4. Ran conformational sampling via MacroModel (bmin)
  5. Eliminated redundant conformers and ranked by TS energy
  6. Predicted enantioselectivity from Boltzmann-weighted TS energy differences

This was one of Q2MM's killer applications -- you optimize a TSFF once, then screen hundreds of ligand variants in hours instead of running DFT on each one.

The problem: Every step depended on Schrodinger's proprietary .mae format, schrodinger.structutils, and MacroModel. The legacy code (scripts/screen/merge.py at 903 lines, setup_com_from_mae.py at 446 lines, etc.) is being removed as part of the open-source modernization.

Proposed Open-Source Stack

| Legacy (Schrodinger) | Modern (Open Source) |
| --- | --- |
| .mae file format | Mol2, SDF, PDB via RDKit |
| schrodinger.structutils.analyze | RDKit substructure matching (Chem.MolFromSmarts) |
| MacroModel merge.py (903 lines) | RDKit AllChem.ConstrainedEmbed() or custom overlay |
| MacroModel conformational search (bmin) | CREST, RDKit ETKDG, or OpenMM + enhanced sampling |
| MacroModel energy evaluation | OpenMM with Q2MM-optimized TSFF |
| MacroModel redundant conformer elimination | RDKit Chem.rdMolAlign.GetBestRMS() + clustering |

Proposed API

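from q2mm.screening import VirtualScreen, CatalystLibrary

# ForceField, OpenMMEngine, and reaction_mol are assumed to come from the
# existing q2mm package; their import paths are omitted in this sketch.

# Load the optimized TSFF
ff = ForceField.from_mm3_fld("optimized.fld")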

# Build a catalyst library from SMILES
library = CatalystLibrary.from_smiles([
    # NOTE: placeholder SMILES; it contains no phosphorus, so it is not the real BINAP structure
    ("BINAP", "c1ccc2c(c1)-c1ccccc1C2c1cccc2ccccc12"),
    ("SEGPHOS", "..."),
    # or from a file
])

# Set up the screen
screen = VirtualScreen(
    forcefield=ff,
    reaction_template=reaction_mol,   # Q2MMMolecule with TS geometry
    engine=OpenMMEngine(),
    conformer_method="etkdg",         # or "crest"
    n_conformers=100,
)

# Run screening
results = screen.run(library)

# Analyze
for entry in results.ranked():
    print(f"{entry.name}: ddG = {entry.ddg_kcal:.1f}, ee = {entry.predicted_ee:.0f}%")
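
The loop above implies a results object whose entries expose `name`, `ddg_kcal`, and `predicted_ee`. One possible shape for it is sketched below. The class names `ScreenEntry` and `ScreenResults`, and the choice to rank by the magnitude of ddG, are assumptions made for illustration and are not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScreenEntry:
    """One screened candidate (hypothetical container)."""
    name: str
    ddg_kcal: float      # free-energy gap between competing TS ensembles, kcal/mol
    predicted_ee: float  # signed enantiomeric excess, percent


@dataclass
class ScreenResults:
    """Collection of entries with a ranking helper (hypothetical container)."""
    entries: list[ScreenEntry] = field(default_factory=list)

    def ranked(self) -> list[ScreenEntry]:
        # Assumption: "best" means most selective, i.e. largest |ddG|,
        # regardless of which enantiomer is favoured.
        return sorted(self.entries, key=lambda e: abs(e.ddg_kcal), reverse=True)


# Illustrative values only; these are not computed or measured results.
demo = ScreenResults([
    ScreenEntry("ligand_A", 0.5, 40.0),
    ScreenEntry("ligand_B", -2.0, -93.0),
    ScreenEntry("ligand_C", 1.2, 77.0),
])
```

Ranking by absolute value keeps strongly selective catalysts at the top even when they favour the opposite enantiomer. A real implementation might instead rank by signed ddG for a target product.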

Key Components

1. Structure Merging (replaces merge.py)

  • SMARTS-based atom mapping between template and candidate
  • 3D overlay using matched atoms as constraints
  • Handle enantiomer generation (mirror x-coordinates, as legacy did)
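
The enantiomer step in the list above can be sketched in plain Python: negating one Cartesian axis reflects the structure through a plane, which inverts every stereocentre. The helper names `mirror_x` and `signed_volume` are hypothetical. The signed-volume check is only there to show that handedness flips.

```python
Point = tuple[float, float, float]


def mirror_x(coords: list[Point]) -> list[Point]:
    """Reflect a structure through the yz-plane by negating x."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """Scalar triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    u = tuple(q - p for p, q in zip(a, b))
    v = tuple(q - p for p, q in zip(a, c))
    w = tuple(q - p for p, q in zip(a, d))
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


# A right-handed arrangement of four points and its mirror image.
tetra = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirrored = mirror_x(tetra)
```

Reflection leaves all interatomic distances unchanged, so the mirrored structure can reuse the same atom ordering and force field parameters.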

2. Conformational Sampling

  • RDKit ETKDG: Fast, good for initial ensemble generation
  • CREST (optional): GFN2-xTB-based, better coverage for flexible systems
  • OpenMM MD: Use the TSFF itself for short MD + clustering
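
Whichever sampler is used, the raw ensemble usually contains near-duplicates that need pruning before energies are compared. The sketch below shows a greedy, energy-ordered filter. It assumes conformers are already aligned and share atom ordering, and it uses a plain coordinate RMSD rather than the aligned, symmetry-aware RMSD that RDKit's `GetBestRMS()` computes. The function names and the 0.5 Å threshold are illustrative assumptions.

```python
import math

Point = tuple[float, float, float]


def rmsd(a: list[Point], b: list[Point]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def unique_conformers(
    conformers: list[list[Point]],
    energies: list[float],
    threshold: float = 0.5,  # Angstrom; assumed cutoff, tune per system
) -> list[int]:
    """Return indices of kept conformers, lowest energy first."""
    order = sorted(range(len(conformers)), key=energies.__getitem__)
    kept: list[int] = []
    for i in order:
        # Keep a conformer only if it differs from every one already kept.
        if all(rmsd(conformers[i], conformers[j]) > threshold for j in kept):
            kept.append(i)
    return kept


confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.01, 0.0, 0.0)],  # near-duplicate of the first
    [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
kept = unique_conformers(confs, energies=[1.0, 0.5, 2.0])
```

Visiting conformers in order of increasing energy means that each cluster of duplicates is represented by its lowest-energy member.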

3. Energy Evaluation

  • Use OpenMMEngine with the optimized TSFF
  • Minimize each conformer with constrained TS-defining atoms
  • Compute Boltzmann-weighted energies
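
For the Boltzmann weighting step, a minimal sketch is given below. It assumes minimized conformer energies in kcal/mol and a single temperature, and it treats the ensemble free energy as G = -RT ln Σ exp(-E_i/RT). The function names are hypothetical. Energies are shifted by the minimum before exponentiation to avoid overflow.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies: list[float], temperature: float = 298.15) -> list[float]:
    """Normalized Boltzmann populations for a set of conformer energies (kcal/mol)."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_free_energy(energies: list[float], temperature: float = 298.15) -> float:
    """Free energy of the conformer ensemble: -RT ln(sum_i exp(-E_i / RT))."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    partition = sum(math.exp(-(e - e_min) / rt) for e in energies)
    return e_min - rt * math.log(partition)
```

Two degenerate conformers each receive a weight of 0.5, and the ensemble free energy drops by RT ln 2 relative to a single conformer. That is the entropic benefit of having more than one accessible transition-state geometry.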

4. Selectivity Prediction

  • Delta-delta-G from competing TS energies
  • Boltzmann populations across conformers
  • Predicted ee% with confidence intervals
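
The link between ddG and ee% can be made explicit. Under the usual assumption of kinetic control, the product ratio is exp(ddG/RT), so ee = (ratio - 1)/(ratio + 1) = tanh(ddG / 2RT). The sketch below uses that identity. The function name and the sign convention (ddG = G of the minor pathway minus G of the major pathway) are assumptions made for illustration.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def ee_from_ddg(ddg_kcal: float, temperature: float = 298.15) -> float:
    """Signed enantiomeric excess (percent) from a TS free-energy gap in kcal/mol.

    Assumes kinetic control: product ratio = exp(ddG / RT).
    """
    rt = R_KCAL * temperature
    return 100.0 * math.tanh(ddg_kcal / (2.0 * rt))
```

Because tanh saturates, a gap of roughly 1 kcal/mol at room temperature already corresponds to an ee near 70%. Small errors in the force field energies therefore translate into large uncertainty in predicted ee at low selectivity, which is one reason to report confidence intervals alongside the point estimate.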

Acceptance Criteria

  • CatalystLibrary.from_smiles() generates 3D structures via RDKit
  • Structure merging via SMARTS pattern matching (no Schrodinger)
  • Conformational search with at least one open-source method
  • Energy evaluation pipeline with OpenMM
  • Boltzmann-weighted ee% prediction
  • End-to-end test: known catalyst system with expected selectivity
  • Tutorial / example notebook demonstrating the workflow
  • No proprietary dependencies
