🧬 Deep learning-based prognosis and diagnosis using Escherichia coli growth-coupled metabolic sensors
This repository contains the full codebase used for prognosis and diagnosis of COVID-19 based on Escherichia coli growth dynamics, as well as statistical analyses using Generalized Additive Mixed Models (GAMMs).
Preprint: Deep learning-based prognosis and diagnosis using Escherichia coli growth-coupled metabolic sensors
This project explores multiple machine learning and statistical approaches to classify biological conditions from microbial growth data:
- Growth parameter-based models
- Time-series deep learning models
- Context-aware models (FiLM)
- Multistrain fusion strategies
- Statistical inference using GAMMs
All methods are evaluated under a nested cross-validation framework, which keeps hyperparameter tuning separate from the data used to estimate performance.
📂 Growth_parameter_based_classification/
- Input: extracted growth parameters
- Models:
  - Support Vector Machine (SVM)
  - Logistic Regression
  - XGBoost
  - Soft-voting ensemble
- Main script: `growth_parameters_main.py`
📂 Time_series_classification/
- Input: raw growth curves
- Models:
  - 1D Convolutional Neural Network (CNN1D)
  - Temporal Convolutional Network (TCN)
- Optional features:
  - First & second derivatives
  - Channel-wise normalization
- Scripts: `time_series_main.py`, `time_series_model.py`
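The optional features listed above can be illustrated with a minimal, dependency-free sketch. It is not taken from `time_series_main.py`: the function names, the use of finite differences, the assumption of evenly spaced time points, and the per-curve z-scoring are all illustrative choices, and the repository may compute these features differently.

```python
from statistics import mean, pstdev


def finite_difference(values, dt=1.0):
    """Central differences inside the curve, one-sided at the two ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time points")
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out


def zscore(channel, eps=1e-8):
    """Normalize one channel to zero mean and (close to) unit variance."""
    mu, sigma = mean(channel), pstdev(channel)
    return [(v - mu) / (sigma + eps) for v in channel]


def build_channels(curve, dt=1.0):
    """Stack the curve with its first and second derivatives, each z-scored."""
    d1 = finite_difference(curve, dt)
    d2 = finite_difference(d1, dt)
    return [zscore(ch) for ch in (curve, d1, d2)]


# Hypothetical optical-density readings at evenly spaced time points.
curve = [0.05, 0.06, 0.09, 0.15, 0.27, 0.45, 0.62, 0.71, 0.74, 0.75]
channels = build_channels(curve)
print(len(channels), "channels of length", len(channels[0]))
```

Normalizing each channel separately keeps the derivative channels, which are much smaller in magnitude than the raw curve, on a scale comparable to it before they reach a convolutional model.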
📂 FiLM/
- Combines:
  - Growth parameters → conditioning signal
  - Growth curves → primary input
- Mechanism:
  - Feature-wise Linear Modulation (FiLM)
  - Parameters generated via MLP
- Models:
  - FiLM-CNN1D
  - FiLM-TCN
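As a rough illustration of the mechanism above: FiLM scales and shifts each feature channel with a pair of values (gamma, beta) produced from the conditioning signal. The sketch below uses plain Python lists and a hypothetical two-layer MLP with random weights; the layer sizes, names, and initialization are assumptions for illustration and do not reflect the repository's actual models.

```python
import random


def linear(inputs, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def film_generator(params, hidden_w, hidden_b, out_w, out_b, n_channels):
    """Tiny MLP mapping growth parameters to per-channel (gamma, beta)."""
    hidden = relu(linear(params, hidden_w, hidden_b))
    out = linear(hidden, out_w, out_b)
    return out[:n_channels], out[n_channels:]


def film(feature_maps, gamma, beta):
    """Scale and shift every time step of channel c by gamma[c] and beta[c]."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(feature_maps, gamma, beta)]


# Hypothetical sizes: 3 growth parameters, 4 hidden units, 2 feature channels.
rng = random.Random(0)
n_params, n_hidden, n_channels = 3, 4, 2
hidden_w = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(n_hidden)]
hidden_b = [0.0] * n_hidden
out_w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(2 * n_channels)]
# Output biases push gamma towards 1 and beta towards 0 (an assumed initialization).
out_b = [1.0] * n_channels + [0.0] * n_channels

growth_params = [0.8, 1.2, 0.3]                   # conditioning signal
feature_maps = [[0.1, 0.4, 0.9], [0.2, 0.2, 0.5]]  # channels x time, e.g. after a conv layer

gamma, beta = film_generator(growth_params, hidden_w, hidden_b, out_w, out_b, n_channels)
modulated = film(feature_maps, gamma, beta)
print(modulated)
```

In a trained network the MLP weights are learned jointly with the convolutional layers, so the growth parameters decide how strongly each curve feature is amplified or suppressed.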
📂 Multistrain_models/
- Combines multiple strains as multi-channel input
- Optional FiLM conditioning per strain
- Independent models per strain
- Weighted soft-voting ensemble
- Weights learned per fold to avoid leakage
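The weighted soft-voting step can be sketched as follows. The README does not state how the weights are learned, so this example assumes one simple rule: each strain's weight is proportional to that strain model's balanced accuracy on validation data drawn from the training portion of the fold. All function names and numbers are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def fit_weights(val_probs_per_strain, y_val, threshold=0.5):
    """Weight each strain model by its validation balanced accuracy (assumed rule)."""
    scores = []
    for probs in val_probs_per_strain:
        preds = [int(p >= threshold) for p in probs]
        scores.append(balanced_accuracy(y_val, preds))
    total = sum(scores)
    if total == 0:
        return [1 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_soft_vote(test_probs_per_strain, weights):
    """Weighted average of positive-class probabilities, sample by sample."""
    return [sum(w * p for w, p in zip(weights, sample))
            for sample in zip(*test_probs_per_strain)]


# Validation predictions come from the training side of the fold only,
# so the held-out test samples never influence the weights.
y_val = [0, 0, 1, 1]
val_probs = [
    [0.2, 0.4, 0.7, 0.9],  # strain A model: balanced accuracy 1.0
    [0.6, 0.3, 0.4, 0.8],  # strain B model: balanced accuracy 0.5
]
weights = fit_weights(val_probs, y_val)

test_probs = [[0.9, 0.2], [0.3, 0.6]]  # one row per strain, one column per test sample
fused = weighted_soft_vote(test_probs, weights)
print(weights, fused)
```

Fitting the weights inside each fold, rather than once on the whole dataset, is what prevents information about the test patients from leaking into the ensemble.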
- Nested cross-validation:
  - 5 outer folds
  - 3 inner folds
- Patient-level splitting
- Shared splits across all models
- Optimization target: balanced accuracy
- Hyperparameter tuning:
  - Optuna (for time-series models)
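The splitting scheme above (5 outer folds, 3 inner folds, whole patients kept together) can be sketched without any dependencies. This is an illustrative reconstruction, not the repository's splitting code: the function names, seeds, and patient IDs are hypothetical, and the sketch does not stratify by class. scikit-learn's `GroupKFold` or `StratifiedGroupKFold` are library alternatives.

```python
import random


def patient_folds(patient_ids, n_folds, seed):
    """Assign whole patients to folds so that no patient spans two folds."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    return [set(patients[i::n_folds]) for i in range(n_folds)]


def split_indices(patient_ids, held_out):
    """Return (train, test) sample positions for one held-out patient set."""
    train = [i for i, p in enumerate(patient_ids) if p not in held_out]
    test = [i for i, p in enumerate(patient_ids) if p in held_out]
    return train, test


def nested_splits(patient_ids, n_outer=5, n_inner=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold."""
    for k, held_out in enumerate(patient_folds(patient_ids, n_outer, seed)):
        outer_train, outer_test = split_indices(patient_ids, held_out)
        train_patients = [patient_ids[i] for i in outer_train]
        inner = []
        for inner_held in patient_folds(train_patients, n_inner, seed + k + 1):
            tr, va = split_indices(train_patients, inner_held)
            # Map positions within the outer-training subset back to dataset indices.
            inner.append(([outer_train[i] for i in tr],
                          [outer_train[i] for i in va]))
        yield outer_train, outer_test, inner


# Hypothetical dataset: 15 patients with 2 growth curves each.
patient_ids = [f"P{i // 2:02d}" for i in range(30)]
splits = list(nested_splits(patient_ids))

for outer_train, outer_test, inner in splits:
    print(len(outer_train), "train /", len(outer_test), "test /", len(inner), "inner folds")
```

Within each outer fold, hyperparameters would be chosen by their mean balanced accuracy over the inner validation sets, after which the model is refit on `outer_train` and scored once on `outer_test`. Saving `splits` to disk and reloading it for every model is one way to share splits across models.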
📂 GAMM models/
- Model selection based on the Akaike Information Criterion (AIC)
- Basis dimension tuning via `k.check()`

`GAMM_all_mutants_model_comparison.R`, `GAMM_selected_mutants_model_comparison.R`
→ Select best model per strain and classification task
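
For reference, the criterion used for this comparison has the standard form below, where $\ell(\hat{\theta})$ is the maximized log-likelihood and $k$ is the number of model parameters. For penalized smooth models, the effective degrees of freedom typically take the place of $k$. Among candidate models fitted to the same data, the one with the lowest AIC is preferred. How the comparison scripts break ties or handle near-equal AIC values is not stated here.

```latex
\mathrm{AIC} = -2\,\ell(\hat{\theta}) + 2k
```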
`GAMM_all_mutants.R`, `GAMM_selected_mutants.R`
→ Perform statistical testing:
- Detect differences between conditions
- Identify significant time windows
- Compute confidence intervals
- Fitted using `mgcv::bam()` (v1.9.3)
- Includes:
  - Simultaneous 95% confidence bands
  - Pointwise variance estimation
FiLM/
GAMM models/
Growth_parameter_based_classification/
Multistrain_models/
Time_series_classification/
Each folder contains:
- Model scripts
- Data inputs (parameters or time series)
- Cross-validation splits
- RStudio-generated files (`.Rproj.user`, `.RData`, `.Rhistory`) are excluded from version control
- Excel files correspond to experimental datasets used in the study
- Splits are shared across all models to ensure fair comparison
- Paul Ahavi 📧 paul.ahavi228@gmail.com
- Jean-Loup Faulon 📧 jean-loup.faulon@inrae.fr