This project provides a reaction prediction model based on a deep learning framework using graph neural networks (GNN). It predicts whether a given pair of reactants is likely to undergo a reaction. The model accepts input as pairs of SMILES strings and outputs the predicted reaction probability.
reaction-predictor/
├── predict.py                        # Command-line interface for prediction
├── train.py                          # Training script (optional, not required for inference)
├── RxnPred/                          # Core model and data utilities
│   ├── model.py
│   ├── getTfRecord.py
│   └── ...
├── Model/
│   └── graph_structure_reaction/
│       ├── *.json                    # Model config files
│       ├── model_*.ckpt              # Trained model weights
│       └── isotonic_model.joblib     # Isotonic regression calibration model
├── demo.csv                          # Example input file
├── demo_out.csv                      # Example output file
└── README.md                         # Documentation
- Clone the repository
git clone https://github.com/ZhuMetLab/ReactionPredictor.git
cd ReactionPredictor
- Install dependencies
We recommend using a conda environment:
conda create -n RxnPred python=3.10
conda activate RxnPred
pip install tensorflow==2.12.0 pandas tqdm joblib rdkit==2022.9.1 bayesian-optimization molmass

Input should be a CSV file with two columns:
- SMILES1: SMILES string of the first metabolite
- SMILES2: SMILES string of the second metabolite
Example (demo.csv):
SMILES1,SMILES2
C(CC(=O)O)[C@@H](C(=O)O)N,CC(=O)N[C@@H](CCC(=O)O)C(=O)O
CCCCC,C(C(=O)O)N
CC(=O)SCCNC(=O)CCNC(=O)[C@@H](C(C)(C)COP(=O)(O)OP(=O)(O)OC[C@@H]1[C@H]([C@H]([C@@H](O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O,CC(C)(COP(=O)(O)OP(=O)(O)OC[C@@H]1[C@H]([C@H]([C@@H](O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)[C@H](C(=O)NCCC(=O)NCCS)O
ABCDE,C(C(=O)O)N
C(CS(=O)(=O)O)N,C(CS(=O)(=O)O)NC(=O)CO
C(CCNCCCN)CN,C(CCNCCCN)CNCCCN
C(CCNCCCN)CNCCCN,CCCCCC
C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O,C([C@H]([C@H]([C@@H]([C@H](C=O)O)O)O)O)OP(=O)(O)O
COCOCOCOCOCOCOCOCOCOCOC,C[C@H](CCCC(C)C)[C@H]1CC[C@@H]2[C@@]1(CC[C@H]3[C@H]2CC=C4[C@@]3(CC[C@@H](C4)O)C)C
C(C(=O)O)N,TiO2
Note: Invalid or unparseable SMILES will be flagged as ERROR in the output.
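As a minimal sketch of preparing an input file in the format described above, the snippet below writes a list of SMILES pairs to a CSV with the `SMILES1` and `SMILES2` header and then checks that both columns are present. The helper names (`write_pairs`, `has_required_columns`) and the file name `my_pairs.csv` are hypothetical and not part of this repository. It uses only the Python standard library and does not check SMILES validity.

```python
import csv

# Column names taken from the input format described in this README.
REQUIRED_COLUMNS = ["SMILES1", "SMILES2"]


def write_pairs(path, pairs):
    """Write (SMILES1, SMILES2) pairs to a CSV file with the expected header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(REQUIRED_COLUMNS)
        writer.writerows(pairs)


def has_required_columns(path):
    """Return True if the CSV header contains both required columns."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return all(column in header for column in REQUIRED_COLUMNS)


# Two pairs copied from demo.csv; "my_pairs.csv" is a hypothetical file name.
pairs = [
    ("CCCCC", "C(C(=O)O)N"),
    ("C(CS(=O)(=O)O)N", "C(CS(=O)(=O)O)NC(=O)CO"),
]
write_pairs("my_pairs.csv", pairs)
print(has_required_columns("my_pairs.csv"))
```

The resulting file can then be passed to `predict.py` through `--input`.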
The output CSV will contain the same SMILES pairs with an additional score column indicating predicted reaction probability (0–1), or "ERROR" for invalid inputs.
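A short sketch of how the output might be post-processed, separating scored pairs from those marked as errors. It assumes the added column is literally named `score`, which should be checked against `demo_out.csv`. The function name `split_results` is hypothetical, and the numeric value in the sample is a placeholder, not a real model prediction.

```python
import csv
import io


def split_results(handle, score_column="score"):
    """Split output rows into scored pairs and pairs flagged as ERROR.

    The column name "score" is an assumption; adjust it to match the
    header of the actual output file.
    """
    scored, failed = [], []
    for row in csv.DictReader(handle):
        pair = (row["SMILES1"], row["SMILES2"])
        value = row[score_column]
        if value.strip().upper() == "ERROR":
            failed.append(pair)
        else:
            scored.append((pair, float(value)))
    return scored, failed


# In-memory stand-in for an output file; 0.12 is a placeholder value.
sample = io.StringIO(
    "SMILES1,SMILES2,score\n"
    "CCCCC,C(C(=O)O)N,0.12\n"
    "ABCDE,C(C(=O)O)N,ERROR\n"
)
scored, failed = split_results(sample)
print(len(scored), len(failed))
```

To use this with a real file, open it with `open("demo_out.csv", newline="")` and pass the file object in place of `sample`.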
You can run prediction from the command line:
python predict.py --input demo.csv --output demo_out.csv
Optional arguments:
- --batch_size: Batch size for prediction (default: 64)
- --keep_tfrecord: If specified, temporary TFRecord files will be preserved
Example with options:
python predict.py --input demo.csv --output demo_out.csv --batch_size 128 --keep_tfrecord

- Architecture: GNN-based model using graph convolutions and dense layers
- Input: Metabolite SMILES pairs
- Output: Reaction probability

- Input SMILES must be valid; otherwise the line will be marked as "ERROR".
- Temporary files (.tfrecord) are deleted by default unless --keep_tfrecord is specified.
- This repo currently supports prediction only. For training or model fine-tuning, refer to train.py.
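For scripted use, the command line documented earlier can also be assembled from Python. The sketch below only builds the argument list from the flags listed in this README (`--input`, `--output`, `--batch_size`, `--keep_tfrecord`). The function name `build_predict_command` is hypothetical, and the snippet does not run the model.

```python
import sys


def build_predict_command(input_csv, output_csv, batch_size=None, keep_tfrecord=False):
    """Build the argument list for predict.py from the documented flags."""
    command = [sys.executable, "predict.py", "--input", input_csv, "--output", output_csv]
    if batch_size is not None:
        command += ["--batch_size", str(batch_size)]
    if keep_tfrecord:
        # Flag only; when present, temporary TFRecord files are preserved.
        command.append("--keep_tfrecord")
    return command


command = build_predict_command("demo.csv", "demo_out.csv", batch_size=128, keep_tfrecord=True)
# Print everything after the interpreter path for readability.
print(" ".join(command[1:]))
# prints: predict.py --input demo.csv --output demo_out.csv --batch_size 128 --keep_tfrecord

# To actually run it from the repository root (requires the installed dependencies):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.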
For questions, feel free to open an issue or contact the authors at zhanghs@sioc.ac.cn.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
See LICENSE for details.
