A colour prediction workflow for biosynthetically produced dyes and pigments.
A code repository to reproduce the results published by Karlov et al.
- Poetry is required to run the package (version 1.8.3 was used)
- Clone the repo
    git clone git@github.com:Colorifix/dyedactic_public.git
- Download a release of XTB executable (6.6.1 was tested) and make sure executable is in your $PATH
- Make sure you are in the root directory
    cd DyeDactic_public
- Please run
    poetry install
  to install dependencies
- Then any script can be launched using
    poetry run python /path/to/script.py
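
Before launching any script, it can help to confirm that the external tools named in the steps above are reachable. The snippet below is a minimal sketch and not part of the repository: it assumes the executables are called `poetry` and `xtb`, and the helper names `find_executable` and `check_prerequisites` are made up for illustration.

```python
import shutil


def find_executable(name):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def check_prerequisites(names=("poetry", "xtb")):
    """Map each required executable name to its resolved path (or None).

    The default names are assumptions; adjust them if your binaries
    are installed under different names.
    """
    return {name: find_executable(name) for name in names}


if __name__ == "__main__":
    for name, path in check_prerequisites().items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

If `xtb` is reported as not found, add the directory containing the downloaded executable to `$PATH` and run the check again.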
- src/ - directory containing all functions and classes, from 3D structure generation to colour estimation
- src/convert_spectrum_to_colour.py - contains functions to convert absorption spectra to RGB colours, together with a test run
- src/optimize_xtb.py - wrapper functions that use the external XTB program to optimise geometry and calculate energies and HOMO-LUMO gaps
- src/orca_inputs.py - a class for generating ORCA (5.0.3) input files to optimise geometry and do TD-DFT calculations
- src/tauromers.py - tautomer enumeration functions which use heuristics to prune the tautomer generation tree and rank tautomers by XTB energies
- src/utils.py - miscellaneous helper functions
- data/ - contains the collected data set of natural colourants, calculated QM descriptors, TD-DFT results, and experimental absorption spectra
- data/pigments.csv - a csv file containing the database of collected natural compounds with experimental data and references
- data/pigment_pH_SI.csv - contains experimental absorption spectra for the 4 colourants explored in the paper (emodin, quinalizarin, biliverdin, and orcein) at different pH levels
- inputs/ - directory for input files for ORCA calculations
- mpnn_training/ - folder for the chemprop-based neural network model that predicts the lowest light absorption energies
- mpnn_training/data/ - a directory for raw and clean training data
- mpnn_training/data/20210205_all_expt_data_no_duplicates_solvent_calcs.csv - a training set provided by Greenman et al.
- mpnn_training/data/data_all.csv - a csv file with the MPNN training (90% of natural and 90% of artificial colourants together after the split), validation, and test data sets with transitions and solvent
- mpnn_training/data/test_natural.csv - a natural colourant test set (10% of the collected set) used solely to estimate the prediction error
- mpnn_training/hyperopt - parameters and NN weights
- mpnn_training/chemprop_hyperopt.py - a script to run hyperparameter optimisation (training is done using GPU)
- mpnn_training/predict.py - a prediction script which runs with test_natural.csv by default
- mpnn_training/prepare_dataset.ipynb - prepares a train/test split for MPNN training and cleans the initial data from outliers; to run the natural compound split, TD-DFT calculations have to be done first
- biliverdin_colour_vs_pH.py - a script for biliverdin halochromism visualisation based on pKa, transition energies and oscillator strengths
- emodin_colour_vs_pH.py - a script for emodin halochromism visualisation based on pKa, transition energies and oscillator strengths
- quinalizarin_colour_vs_pH.py - a script for quinalizarin halochromism visualisation based on pKa, transition energies and oscillator strengths
- orcein_colour_vs_pH.py - a script for orcein halochromism visualisation based on pKa, transition energies and oscillator strengths
- generate_inputs.py - a script to generate input files for ORCA and xyz coordinates
- plot_experimental_spectra_SI.py - takes experimental spectra in csv format, prints them, and converts them to the corresponding colours
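
The `*_colour_vs_pH.py` scripts listed above combine pKa values, transition energies and oscillator strengths. As a rough illustration of how these three quantities can be combined, the sketch below weights the bands of a protonated and a deprotonated form by their Henderson-Hasselbalch populations. It is not the code used in this repository: it assumes a single acid-base equilibrium and Gaussian band shapes with an arbitrary width, and every function name and numerical value is hypothetical.

```python
import math

HC_EV_NM = 1239.841984  # h*c in eV*nm, converts photon energy to wavelength


def ev_to_nm(energy_ev):
    """Convert a transition energy in eV to a wavelength in nm."""
    return HC_EV_NM / energy_ev


def deprotonated_fraction(ph, pka):
    """Henderson-Hasselbalch population of the deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def mixture_spectrum(wavelengths_nm, ph, pka, acid_bands, base_bands,
                     width_ev=0.3):
    """Population-weighted absorption profile in arbitrary units.

    Each band is a (transition energy in eV, oscillator strength) pair,
    broadened with a Gaussian of standard deviation `width_ev`
    (an assumed, illustrative value).
    """
    f_base = deprotonated_fraction(ph, pka)
    f_acid = 1.0 - f_base
    spectrum = []
    for wavelength in wavelengths_nm:
        energy = HC_EV_NM / wavelength
        total = 0.0
        for fraction, bands in ((f_acid, acid_bands), (f_base, base_bands)):
            for e0, strength in bands:
                gauss = math.exp(-0.5 * ((energy - e0) / width_ev) ** 2)
                total += fraction * strength * gauss
        spectrum.append(total)
    return spectrum


def peak_wavelength(wavelengths_nm, spectrum):
    """Wavelength at which the profile is largest."""
    return max(zip(spectrum, wavelengths_nm))[1]


if __name__ == "__main__":
    # Made-up values for illustration only, not results from the paper.
    pka = 7.0
    acid_bands = [(2.8, 0.5)]
    base_bands = [(2.3, 0.6)]
    grid = list(range(380, 781))
    for ph in (2.0, 7.0, 12.0):
        spec = mixture_spectrum(grid, ph, pka, acid_bands, base_bands)
        print(f"pH {ph}: absorption maximum near "
              f"{peak_wavelength(grid, spec)} nm")
```

A profile produced this way could then be converted to an RGB colour; the functions in src/convert_spectrum_to_colour.py serve that purpose in the repository, though their exact interface is not shown here.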