Machine learning–guided optimization of ultrasound-assisted extraction parameters for maximizing bioactive compound yield.

- Use Latin Hypercube Sampling (LHS) to design the experiments and ensure representative coverage of the parameter space. The generation is fully data driven and parameterized through `config.json`.
  - List each variable with its `[min, max]` range in `config.json`.
  - Generate `total_samples` Latin Hypercube samples to cover the parameter space.
    - Generate all $2^n$ boundary combinations.
    - Construct a Latin Hypercube Sampling (LHS) design in `n` dimensions.
  - Select a diverse subset of `pretest_size` points with a MaxMin algorithm (a greedy space-filling heuristic) for the initial experiments (Set = 1); the remaining points are reserved for follow-up (Set = 2).
    - MaxMin: at each step, select the point that maximizes the minimum distance to the already chosen points.
    - Boundary combinations are always included in Set 1.
  - Visualize the outcome with `seaborn.pairplot`.
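
The sampling steps can be sketched as follows. This is a minimal illustration, not the repository's actual code: the config layout (a `variables` mapping), the variable names and ranges, and the helper functions `boundary_points`, `latin_hypercube` and `maxmin_select` are all hypothetical. It also assumes that the $2^n$ boundary combinations are added on top of the `total_samples` LHS points, and it builds the LHS design with plain NumPy rather than with pyDOE.

```python
import itertools
import json

import numpy as np

# Hypothetical config contents; names, ranges and the "variables" key are illustrative.
CONFIG_TEXT = """
{
  "variables": {
    "enzyme_concentration": [0.5, 2.0],
    "temperature": [30, 60],
    "time": [10, 40]
  },
  "total_samples": 30,
  "pretest_size": 12
}
"""


def boundary_points(bounds):
    """Return all 2^n corner combinations of the [min, max] ranges."""
    return np.array(list(itertools.product(*bounds)), dtype=float)


def latin_hypercube(n_samples, bounds, rng):
    """Draw one point per stratum in each dimension, shuffling strata per column."""
    bounds = np.asarray(bounds, dtype=float)
    n_dims = len(bounds)
    # Row i starts in stratum i of [0, 1) for every dimension.
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for j in range(n_dims):
        unit[:, j] = rng.permutation(unit[:, j])
    return bounds[:, 0] + unit * (bounds[:, 1] - bounds[:, 0])


def maxmin_select(points, bounds, n_select, seed_idx):
    """Greedy MaxMin: repeatedly add the point farthest from the chosen set."""
    bounds = np.asarray(bounds, dtype=float)
    # Rescale to the unit cube so every variable contributes equally to distance.
    scaled = (points - bounds[:, 0]) / (bounds[:, 1] - bounds[:, 0])
    chosen = list(seed_idx)
    dists = np.linalg.norm(scaled[:, None, :] - scaled[chosen][None, :, :], axis=2)
    min_dist = dists.min(axis=1)  # distance from each point to its nearest chosen point
    while len(chosen) < n_select:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(scaled - scaled[nxt], axis=1))
    return chosen


config = json.loads(CONFIG_TEXT)
names = list(config["variables"])
bounds = [config["variables"][name] for name in names]
rng = np.random.default_rng(0)

corners = boundary_points(bounds)
lhs_points = latin_hypercube(config["total_samples"], bounds, rng)
design = np.vstack([corners, lhs_points])

# Seeding with the corners guarantees they all end up in Set 1.
set1 = maxmin_select(design, bounds, config["pretest_size"], range(len(corners)))
set_label = np.full(len(design), 2)
set_label[set1] = 1
```

Seeding the greedy search with the corner points is one simple way to keep every boundary combination in Set 1. If `pretest_size` is smaller than $2^n$, this sketch returns the corners alone.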
- Use k-nearest neighbors (k-NN) regression to model and optimize the parameter space, e.g. enzyme concentration, ultrasound temperature, time, and power (work in progress).
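
Since the modeling step is still in progress, the following is only a rough sketch of how k-NN regression could be used to rank candidate settings. The data are synthetic (a made-up response surface, not experimental results), the two variables and their ranges are placeholders, and `knn_predict` is a hypothetical helper written in plain NumPy to show the mechanics. scikit-learn's `KNeighborsRegressor` provides the same kind of model.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict each query as the mean response of its k nearest training points."""
    # Scale by the training range so no variable dominates the distance by its units.
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    train = (X_train - lo) / span
    query = (X_query - lo) / span
    dist = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


rng = np.random.default_rng(0)

# Synthetic "experiments": temperature and time drawn from placeholder ranges.
X = rng.uniform([30.0, 10.0], [60.0, 40.0], size=(40, 2))
# Made-up yield surface, used only so the example has something to fit.
y = -(((X[:, 0] - 45.0) / 15.0) ** 2) - (((X[:, 1] - 25.0) / 15.0) ** 2)

# Evaluate the model on a regular grid of candidate settings.
grid = np.array([[t, m] for t in np.linspace(30, 60, 16) for m in np.linspace(10, 40, 16)])
pred = knn_predict(X, y, grid, k=3)
best = grid[np.argmax(pred)]
print("Candidate with the highest predicted response:", best)
```

Because k-NN averages observed responses, its predictions never exceed the best measured value, so a candidate chosen this way is a suggestion for the next experiment rather than a confirmed optimum.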

Dependencies: `numpy`, `pandas`, `pyDOE`, `scikit-learn`, `matplotlib`, `seaborn`.

Install the dependencies with `pip install -r requirements.txt`.

This project was supported by guidance from Prof. Yinsheng Zhang at Zhejiang Gongshang University, whose feedback greatly improved the analysis. I also gratefully acknowledge Prof. Lina Chen's lab at NJMU for sharing the raw chemical data.