MauricioCafiero/CafChem

CafChem - Libraries for computational chemistry/drug design research.

See below for sample notebooks covering computational and medicinal chemistry, machine learning, and AI research tools.

Some basics and background material

Generative models for molecules

Machine Learning for chemistry

Protein / Ligand interactions

Protein Models

Medicinal Chemistry

Quantum Chemistry

LLMs for Medchem

¹ Solvation (adding explicit waters and optimizing) is available in ReDock and QM_UMA.

Install:

git clone https://github.com/MauricioCafiero/CafChem.git

import CafChem.CafChemGPT as ccgpt
import CafChem.CafChemRNN as ccrnn
import CafChem.CafChemTxGemma as cctxg
import CafChem.CafChemSkipDense as ccsd
import CafChem.CafChemHFClassifier as cchf
import CafChem.CafChemBoltz as ccb
import CafChem.CafChemQM_UMA as ccqm
import CafChem.CafChemEleon as ccel
import CafChem.CafChemProp as ccp
import CafChem.CafChemSubs as ccs
import CafChem.CafChemReDock as ccr
import CafChem.CafChemBML as ccml
import CafChem.CafChemFragGrow as ccfg
import CafChem.CafChemMLPPyTorch as ccmlp
import CafChem.CafChemPsi4 as ccp4
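
The imports above only resolve if the directory that contains the cloned CafChem folder is on Python's import path. A minimal sketch of one way to arrange that, assuming the clone sits in the current working directory; the helper name `add_clone_parent_to_path` is hypothetical and not part of CafChem:

```python
import sys
from pathlib import Path


def add_clone_parent_to_path(clone_dir="CafChem"):
    """Put the directory that contains the clone on sys.path.

    Hypothetical helper, not part of CafChem. `clone_dir` is assumed to be
    the folder created by `git clone`.
    """
    parent = str(Path(clone_dir).resolve().parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


parent_dir = add_clone_parent_to_path("CafChem")
# After this, `import CafChem.CafChemGPT as ccgpt` should resolve, provided
# the clone exists and the module's own dependencies are installed.
```

In a notebook started from the directory where `git clone` was run, the current directory is usually already importable, so this step may be unnecessary.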

CafChemGPT

  • example notebook
  • Train a GPT on a SMILES dataset. Use the tools provided to generate novel molecules.
  • Using a provided foundation model, finetune with a specific dataset for targeted molecule generation.
  • This also uses the CafChemGPTINF module for inference.

CafChemRNN

  • example notebook
  • Train an RNN on a SMILES dataset. Use the tools provided to generate novel molecules.
  • Using a provided foundation model, finetune with a specific dataset for targeted molecule generation.

CafChemSubs

  • example notebook
  • Generate analogues of a molecule (from SMILES strings) using generative mask-filling and/or substitutions on phenyl rings.
  • Can also calculate some properties (QED, Lipinski properties) related to drug design.
  • Calculate fingerprint-based Tanimoto similarities between the molecules in a list, and between each molecule and a known active.
  • Visualize molecules.
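
As an illustration of the Tanimoto comparison mentioned above, here is a minimal sketch in plain Python. It does not use the CafChemSubs API; fingerprints are assumed to be given as sets of on-bit indices, the function names are hypothetical, and the bit sets are made-up toy values.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_against_active(active_fp, candidate_fps):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(active_fp, fp)) for name, fp in candidate_fps.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy on-bit sets standing in for real fingerprints (made-up values).
active = {1, 4, 7, 9}
candidates = {"mol_a": {1, 4, 7, 8}, "mol_b": {2, 3}, "mol_c": {1, 4, 7, 9}}
ranking = rank_against_active(active, candidates)
```

With real molecules, the on-bit sets would come from a cheminformatics toolkit rather than being typed by hand.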

CafChemFragGrow

  • example notebook
  • Explore a binding site with chemical fragments.
  • Various viewing options to probe the nature of the binding site.

CafChemBML

  • example notebook
  • Read ChEMBL CSV files and clean the data.
  • Featurize data, remove outliers, scale, apply PCA and split into training and validation sets.
  • Perform analysis with tree-based methods, linear methods, SVR, and MLP.

CafChemSkipDense

  • example notebook
  • Create regression and classification models using skipdense neural networks.
  • Train, save, load and evaluate models.

CafChemMLPPyTorch

  • example notebook
  • Featurize a dataset and train an MLP using PyTorch.
  • Evaluate, predict with, save and load models.

CafChemClassifiers

  • example notebook
  • Create a classifier model using a variety of scikit-learn models.
  • Load a CSV with quantitative data and create classes.
  • Tree-based models, Logistic Regression, Support Vector Machines, Ridge, MLP.
  • Analyze data with confusion matrices.
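
The two data-handling steps above (turning quantitative values into classes, then scoring predictions with a confusion matrix) can be sketched in plain Python. This is an illustrative sketch only, not the CafChemClassifiers implementation; the function names, the threshold, and the data values are all assumptions.

```python
def to_classes(values, threshold):
    """Label each value 1 ("active") if it is >= threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count outcomes; rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


measured = [5.2, 6.8, 7.1, 4.9, 6.0]   # made-up activity values
predicted = [5.5, 6.1, 7.4, 6.3, 5.8]  # made-up model outputs
y_true = to_classes(measured, threshold=6.0)
y_pred = to_classes(predicted, threshold=6.0)
cm = confusion_matrix(y_true, y_pred)
```

The diagonal of `cm` holds the correctly classified counts; off-diagonal entries are the two kinds of misclassification.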

CafChemBoost

  • example notebook
  • Featurize SMILES data with RDKit, Mordred or fingerprints.
  • Perform classification or regression.
  • XGBoost, LightGBM, and CatBoost.
  • Evaluate models.

CafChemEDA

  • example notebook
  • Calculate RDKit or Mordred features, or fingerprints for a set of molecules.
  • Use PCA or t-SNE to reduce feature dimensionality to 2 and view in a plot.
  • Perform autoviz analysis.

CafChemHFClassifier

  • example notebook
  • Create a classifier model using HuggingFace.
  • Analyze data with confusion matrices.
  • Load datasets, add tokens, train, push all to the HuggingFace hub.

CafChemProp

CafChemEleon

CafChemALDataBuild

  • example notebook
  • Use active learning and a Gaussian process regressor to build up a dataset to a desired accuracy.
  • Export the dataset at the end.

CafChemAutoDockVina

CafChemReDock

  • example notebook
  • Dock molecules (given as SMILES strings) into a protein using DockString and save the poses.
  • Calculate the interaction between a docking pose and a trimmed protein active site using Meta's UMA MLIP.
  • Visualize molecules.

CafChemBoltz

  • example notebook
  • Input a protein sequence and a list of SMILES strings.
  • Co-fold the protein/ligand pairs using Boltz2, extract the structures and predict IC50.

CafChemODDT

  • example notebook
  • Use various methods to compare molecules from SDF files.
  • Find all interactions between a protein (PDB file) and a ligand (SDF file).

CafChemPDBFixer

  • example notebook
  • Use PDBFixer to prepare a PDB file for docking or MD.
  • Treats both proteins and ligands.
  • Use the output from this notebook to create PDBQT files with obabel.

CafChemAlphaFold

  • example notebook
  • ColabFold version of AlphaFold2, lightly adapted for CafChem.
  • Citations to original work in the notebook.

CafChemESMFold

  • example notebook
  • ColabFold version of ESMFold, lightly adapted for CafChem.
  • Citations to original work in the notebook.

CafChemProteinMaskEmbed

  • example notebook
  • Use the ESM model to mask a protein and generate novel proteins via mask-filling.
  • Calculate ESM embeddings and use them to find cosine similarity.
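
For reference, the cosine similarity used to compare embeddings can be sketched in a few lines of plain Python. This is not the CafChemProteinMaskEmbed API; the function name is hypothetical and the short vectors stand in for real embeddings, which are much longer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors standing in for embeddings.
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # orthogonal to emb_a
sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
```

Values near 1 indicate embeddings pointing in nearly the same direction, and values near 0 indicate unrelated directions.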

CafChemProteinGPT

  • example notebook
  • Train or finetune a GPT on protein data.
  • Download specific protein data from UniProt.
  • Generate novel proteins with GPT models.

CafChemESM

CafChemPySCF

  • example notebook
  • Run HF, DFT, MP2 and CCSD(T) calculations.
  • Supports implicit solvent, TDDFT and molecular dynamics.

CafChemQM_UMA

  • example notebook
  • Uses ASE to implement calculations using Meta's UMA MLIP.
  • Perform energy calculations, geometry optimizations, vibrational calculations, and thermodynamics calculations.
  • Calculate the Gibbs energy, enthalpy and entropy of a reaction.
  • Perform simple molecular dynamics (Langevin works; velocity Verlet seems a bit buggy).
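
The reaction quantities listed above follow from per-species values as products minus reactants, with ΔG = ΔH − TΔS. A minimal sketch of that bookkeeping in plain Python follows. It is not the CafChemQM_UMA API; the function name, the tuple layout, the units, and all numbers are assumptions chosen for illustration.

```python
def reaction_thermo(reactants, products, temperature=298.15):
    """Return (delta_H, delta_S, delta_G) for a reaction.

    Each species is a tuple (stoichiometric coefficient, H, S), with H in
    kJ/mol and S in kJ/(mol K). The temperature is in kelvin.
    """
    def total(species, index):
        return sum(item[0] * item[index] for item in species)

    delta_h = total(products, 1) - total(reactants, 1)
    delta_s = total(products, 2) - total(reactants, 2)
    delta_g = delta_h - temperature * delta_s
    return delta_h, delta_s, delta_g


# Made-up values for a hypothetical reaction A + 2 B -> C.
reactants = [(1, -100.0, 0.200), (2, -50.0, 0.150)]
products = [(1, -260.0, 0.300)]
dH, dS, dG = reaction_thermo(reactants, products, temperature=300.0)
```

In practice the per-species enthalpies and entropies would come from the vibrational and thermodynamics calculations rather than being entered by hand.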

CafChemSkala

  • example notebook
  • Implements the Microsoft Skala DFT functional in ASE. Also includes LDA, PBE, and TPSS.
  • Includes several def2 basis sets.
  • Calculate energy, geometry, dipole, vibrational frequencies.

CafChemPharm

  • example notebook
  • Generate a defined number of conformers for a list of molecules.
  • Test pharmacophore features of a single or multiple conformers against a known active.

CafChemPK

  • example notebook
  • Predict human, monkey, dog and rat pharmacokinetic properties.

CafChemBl

  • example notebook
  • Query UniProt for protein IDs.
  • Query ChEMBL for bioactive molecules for the desired protein.

CafChemSKFP

  • example notebook
  • Generate 2D and 3D features/fingerprints for molecules.
  • Apply molecule filters.
  • Perform distance calculations between molecules.

CafChemTargets

CafChemPsi4

  • example notebook
  • Use the Psi4 code to run DFT energy and geometry optimization calculations.
  • Use SAPT in Psi4 to explore contributions to interaction energies.

CafChemAgent

CafChemEmbed

  • example notebook
  • Create a contrastive pairs dataset.
  • Train an embedding model.
  • Use embeddings for similarity calculations or as features for regression.

CafChemTxGemma

  • example notebook
  • Inference with TxGemma models.
  • These models have been finetuned to answer many types of medicinal chemistry questions.
  • Finetune a TxGemma model on your own medchem dataset.

CafChemEther0

  • example notebook
  • Inference with the Ether0 model.
  • This model has been finetuned to answer many types of medicinal chemistry questions (see the notebook for use cases).
