Colorifix/DyeDactic

DyeDactic: learning colours


License: AGPL v3

A colour prediction workflow for biosynthetically produced dyes and pigments.
A code repository to reproduce the results published by Karlov et al.

Command line set up

  • Poetry is required to run the package (version 1.8.3 was used)
  • Clone the repo: git clone git@github.com:Colorifix/dyedactic_public.git
  • Download a release of the XTB executable (6.6.1 was tested) and make sure the executable is in your $PATH
  • Make sure you are in the root directory: cd DyeDactic_public
  • Run poetry install to install the dependencies
  • Then any script can be launched using poetry run python /path/to/script.py
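
Before launching any script it can help to confirm that the external tools listed above are reachable. The snippet below is a minimal sketch and not part of the repository: the helper name missing_tools is hypothetical, and it assumes the executables are called poetry and xtb on your system.

```python
import shutil


def missing_tools(tools=("poetry", "xtb")):
    """Return the names of required executables that are not found on $PATH.

    The default tool names are assumptions; adjust them if your
    installation uses different executable names.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```

If the check reports xtb as missing, add the directory containing the XTB executable to $PATH and run it again.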

Description of package files

Data and source files

  • src/ - contains all functions and classes, from 3D structure generation to colour estimation
  • src/convert_spectrum_to_colour.py - contains functions to convert absorption spectra to RGB colours together with a test run
  • src/optimize_xtb.py - has wrapping functions to use XTB external program to optimise geometry, calculate energies, and HOMO-LUMO gaps
  • src/orca_inputs.py - a class for generating ORCA (5.0.3) input files to optimise geometry and do TD-DFT calculations
  • src/tauromers.py - tautomer enumeration functions which use heuristics to prune the tautomer generation tree and estimate energies using XTB
  • src/utils.py - miscellaneous helper functions
  • data/ - contains the collected data set of natural colourants, calculated QM descriptors, TD-DFT results, and experimental absorption spectra
  • data/pigments.csv - a csv file containing the database of collected natural compounds with experimental data and references
  • data/pigment_pH_SI.csv - contains experimental absorption spectra for 4 colourants explored in the paper (emodin, quinalizarin, biliverdin, and orcein) at different pH levels
  • inputs/ - directory for input files for ORCA calculations
  • mpnn_training/ - a folder for the chemprop-based neural network model that predicts the lowest light absorption energies
  • mpnn_training/data/ - a directory for raw and clean training data
  • mpnn_training/data/20210205_all_expt_data_no_duplicates_solvent_calcs.csv - a training set provided by Greenman et al.
  • mpnn_training/data/data_all.csv - a csv file for MPNN training data (90% of natural and 90% of artificial colourants together after split), validation, and test data sets with transitions and solvent
  • mpnn_training/data/test_natural.csv - a natural colourant test set (10% of the collected set) solely to estimate prediction error
  • mpnn_training/hyperopt - parameters and NN weights
  • mpnn_training/chemprop_hyperopt.py - a script to run hyperparameter optimisation (training is done using GPU)
  • mpnn_training/predict.py - a prediction script which runs with test_natural.csv by default
  • mpnn_training/prepare_dataset.ipynb - prepares a train/test split for MPNN training and cleans the initial data from outliers; the TD-DFT calculations have to be done before the natural compound split can be run
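
The 90%/10% division described for data_all.csv and test_natural.csv can be pictured as a seeded random split. The sketch below only illustrates that idea: the function name split_train_test, the seed, and the use of plain shuffling are assumptions, and the notebook in the repository may split and clean the data differently.

```python
import random


def split_train_test(records, test_fraction=0.1, seed=0):
    """Shuffle records reproducibly and hold out a fraction as a test set.

    Returns (train, test). The seed and the purely random strategy are
    illustrative assumptions, not the procedure used in the notebook.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical compound identifiers standing in for rows of a csv file.
    compounds = [f"compound_{i}" for i in range(50)]
    train, test = split_train_test(compounds)
    print(len(train), "training records,", len(test), "test records")
```

Fixing the seed keeps the held-out set identical between runs, which matters when the test set is used solely to estimate prediction error.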

Scripts for image generation and input preparation

  • biliverdin_colour_vs_pH.py - a script for biliverdin halochromicity visualisation based on pKa, transition energies, and oscillator strengths
  • emodin_colour_vs_pH.py - a script for emodin halochromicity visualisation based on pKa, transition energies, and oscillator strengths
  • quinalizarin_colour_vs_pH.py - a script for quinalizarin halochromicity visualisation based on pKa, transition energies, and oscillator strengths
  • orcein_colour_vs_pH.py - a script for orcein halochromicity visualisation based on pKa, transition energies, and oscillator strengths
  • generate_inputs.py - a script to generate input files for ORCA and xyz coordinates
  • plot_experimental_spectra_SI.py - takes experimental spectra in csv format, prints them, and converts them to the corresponding colours
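
As a rough picture of how pKa, transition energies, and oscillator strengths can combine into a pH-dependent spectrum, the sketch below mixes the spectra of a protonated and a deprotonated form. It is not the code used by the scripts above. It assumes a single acid/base pair (Henderson-Hasselbalch), Gaussian broadening with an arbitrary width, and hypothetical function names and input values.

```python
import math

EV_NM = 1239.84198  # h*c in eV*nm, converts photon energy to wavelength


def deprotonated_fraction(ph, pka):
    """Fraction of the deprotonated form for one acid/base pair."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def stick_to_spectrum(transitions, wavelengths_nm, sigma_ev=0.2):
    """Broaden (energy_eV, oscillator_strength) pairs with Gaussians.

    sigma_ev is an assumed broadening width, not a value from the paper.
    """
    spectrum = []
    for wavelength in wavelengths_nm:
        energy = EV_NM / wavelength
        spectrum.append(sum(
            strength * math.exp(-0.5 * ((energy - e0) / sigma_ev) ** 2)
            for e0, strength in transitions
        ))
    return spectrum


def mixed_spectrum(ph, pka, acid_transitions, base_transitions, wavelengths_nm):
    """Population-weighted sum of the acid and base spectra at a given pH."""
    x_base = deprotonated_fraction(ph, pka)
    acid = stick_to_spectrum(acid_transitions, wavelengths_nm)
    base = stick_to_spectrum(base_transitions, wavelengths_nm)
    return [(1.0 - x_base) * a + x_base * b for a, b in zip(acid, base)]


if __name__ == "__main__":
    grid = list(range(380, 781, 50))  # visible range in nm
    # Hypothetical pKa and transitions, for illustration only.
    for ph in (3.0, 7.0, 11.0):
        values = mixed_spectrum(ph, 7.0, [(2.9, 0.5)], [(2.3, 0.6)], grid)
        print(ph, [round(v, 3) for v in values])
```

The resulting absorption curve would then be passed to a spectrum-to-colour conversion such as the one described for src/convert_spectrum_to_colour.py. Compounds with several ionisable groups need one population term per protonation state.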
