CellDivider — Prediction Models

This folder contains the model code and artifacts used by the CellDivider prediction pipeline. It covers three modelling approaches for predicting phenotypes from processed expression features:

  • ElasticNet (regularized linear model)
  • Multilayer Perceptron (MLP) neural network
  • XGBoost (gradient-boosted trees)

Models

ElasticNet

  • Description: Linear regression with a combined L1/L2 penalty (a mix of Lasso and Ridge). Useful as a baseline and for interpretable feature selection, since the L1 term can shrink uninformative coefficients to exactly zero.
  • Key hyperparameters:
    • alpha (overall regularization strength; larger values shrink coefficients more)
    • l1_ratio (fraction of the penalty that is L1: 1.0 is pure Lasso, 0.0 is pure Ridge)
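As a minimal sketch of what `alpha` and `l1_ratio` control, the NumPy code below runs coordinate descent on the ElasticNet objective in the form scikit-learn uses, (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||_2^2. It is not code from this folder: `elastic_net_cd` and `soft_threshold` are hypothetical helpers, and the sketch assumes centred data with no intercept. If the pipeline uses `sklearn.linear_model.ElasticNet` (the parameter names match), that estimator minimizes the same objective.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink z towards zero by t."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def elastic_net_cd(X, y, alpha=1.0, l1_ratio=0.5, n_iter=200):
    """Hypothetical helper: coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes centred X and y (no intercept term)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||X_j||^2 for each feature
    resid = y - X @ w                   # current residual y - Xw
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * w[j]      # remove feature j from the fit
            rho = X[:, j] @ resid / n    # correlation with the partial residual
            # L1 part soft-thresholds; L2 part inflates the denominator
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            resid -= X[:, j] * w[j]      # add the updated feature j back
    return w


# Example on synthetic data with two informative features (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0])
print(np.round(elastic_net_cd(X, y, alpha=0.05, l1_ratio=0.9), 2))
```

With `l1_ratio=1.0` and a large `alpha`, every coefficient is thresholded to exactly zero; lowering `alpha` lets the informative features back in, which is the feature-selection behaviour described above.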

Multilayer Perceptron (MLP)

  • Description: Feed-forward neural network with one or more hidden layers and non-linear activations.
  • Key hyperparameters:
    • hidden_dim (hidden layer sizes)
    • num_layers (number of hidden layers)
    • dropout_rate (regularization)
    • start_lr (initial learning rate for the Adam optimizer)
    • batch size (dataloader batch size)

The main MLP training code is in mlp/train_mlp.py.
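One plausible way the hyperparameters above fit together in PyTorch is sketched below. It is not the contents of mlp/train_mlp.py: `build_mlp` and `train_mlp` are hypothetical names, and the ReLU activation, MSE loss and single regression output are assumptions. The name `start_lr` suggests the real script may decay the learning rate during training; this sketch keeps it constant.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def build_mlp(in_dim: int, hidden_dim: int = 128, num_layers: int = 2,
              dropout_rate: float = 0.1, out_dim: int = 1) -> nn.Sequential:
    """Stack `num_layers` hidden blocks of Linear -> ReLU -> Dropout."""
    layers = []
    dim = in_dim
    for _ in range(num_layers):
        layers += [nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Dropout(dropout_rate)]
        dim = hidden_dim
    layers.append(nn.Linear(dim, out_dim))  # linear output head for regression
    return nn.Sequential(*layers)


def train_mlp(X: torch.Tensor, y: torch.Tensor, start_lr: float = 1e-3,
              batch_size: int = 32, epochs: int = 20, **model_kwargs):
    """Train with Adam at a constant learning rate; returns (model, per-epoch loss)."""
    model = build_mlp(X.shape[1], **model_kwargs)
    optimizer = torch.optim.Adam(model.parameters(), lr=start_lr)
    loss_fn = nn.MSELoss()
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    history = []
    for _ in range(epochs):
        model.train()  # dropout active during training
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)
        history.append(total / len(X))  # mean training loss for this epoch
    return model, history


# Example on synthetic data (illustration only)
torch.manual_seed(0)
X = torch.randn(256, 10)
y = 2.0 * X[:, :1] - X[:, 1:2]  # shape (256, 1), matches out_dim=1
model, history = train_mlp(X, y, hidden_dim=64, num_layers=2,
                           dropout_rate=0.1, start_lr=1e-3, batch_size=32)
```

Call `model.eval()` before predicting so that dropout is switched off.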

XGBoost

  • Description: Gradient-boosted decision trees: each new tree is fitted to correct the errors of the ensemble built so far.
  • Key hyperparameters:
    • n_estimators (number of trees)
    • max_depth (maximum tree depth)
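    • learning_rate (shrinkage applied to each tree's contribution)
    • subsample, colsample_bytree (row/column sampling for regularization)
    • gamma (minimum loss reduction required to make a further split; larger values give more conservative trees)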

Installation

Activate your Python environment (conda or venv is recommended), then install the dependencies:

pip install -r requirements.txt

If GPU support does not work out of the box, install the PyTorch build that matches your GPU/CUDA setup: https://pytorch.org/get-started/locally/
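After installing, one way to check whether PyTorch can see a GPU is the snippet below; `describe_torch_setup` is a hypothetical helper, not part of this folder. It only detects CUDA devices: on Apple silicon, `torch.backends.mps.is_available()` is the equivalent check.

```python
import torch


def describe_torch_setup() -> str:
    """Report the installed PyTorch version and whether a CUDA GPU is visible."""
    if torch.cuda.is_available():
        device = torch.cuda.get_device_name(0)  # name of the first CUDA device
        return f"torch {torch.__version__}, CUDA GPU: {device}"
    return f"torch {torch.__version__}, no CUDA GPU detected (CPU only)"


print(describe_torch_setup())
```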