
Sinha-CompBio-Lab/Cell2Mice


Cell2Mice

Cell2Mice asks: how should we evaluate in vitro assays for drug testing in aging research and which assays are most informative?

Our long-term goal is to develop an in vitro assay for drug testing in aging research. We propose that a useful in vitro assay should carry information predictive of the same drug's outcome in vivo. Testing this requires two things for each drug: its outcome in vivo and its readout in vitro. The project is therefore restricted to drugs for which both sets of information are available.

As an initial test, we extracted information from two sources: the NIA Interventions Testing Program (ITP), the largest uniform in vivo drug-testing program for lifespan extension in mice, and the JUMP Cell Painting Consortium, a large-scale phenotypic screen that captured cell images for over 116,000 compound treatments in U2OS cells.

Our study design tests whether imaging features extracted from drug-treated cells contain information that predicts the same drug's efficacy in mice in the ITP.


Summarized Findings

  • We tested whether JUMP Cell Painting morphological features predict ITP lifespan-extension labels (log Wang–Allison odds ratio) for compounds present in both datasets.
  • After overlapping JUMP with ITP and removing compounds without WA OR labels, n = 29 compounds were available for modeling. Sex-specific models were built separately (male and female).
  • Two feature sets were tested: 440 CellProfiler features and 737 Harmony batch-corrected features, both preprocessed and made available by the Broad Institute. The two sets gave qualitatively similar results.
  • Single-feature Spearman correlations with log(WA OR) were approximately symmetric and centered near zero in both sexes for both feature sets, with no individual feature standing out.
  • Ridge regression with per-fold top-100 feature selection (top 50 positive, top 50 negative ρ), evaluated by 3-fold stratified CV (quartile-balanced) over 1000 iterations, yielded mean test-set Spearman ρ of approximately 0.2 (male) and < 0 (female).
  • We interpret this as a weak male signal and no female signal, both inconclusive at this n.

Data

  • JUMP Cell Painting (JCP2022): Compound profiles for U2OS cells, downloaded from JUMP's remote Parquet store via DuckDB (code/get_JUMP_features.py). Two feature sets used:
    • CellProfiler features (pycytominer-cleaned), 440 features. Standard JUMP preprocessing (low-variance feature removal, etc.) followed by pycytominer's correlated-feature cleanup.
    • Harmony batch-corrected features, 737 features.
  • NIA ITP: Lifespan outcomes per compound, per sex, per cohort/dose. Labels are log-transformed Wang–Allison odds ratios (code/lifespan_labels.py).
  • Compound overlap: Established by InChIKey matching (exact and 14-character connectivity-prefix fuzzy match). 32 compounds overlap JUMP and ITP; 3 lack a usable WA OR label, leaving n = 29 for modeling. JUMP profiles are at a single dose per compound; for compounds with multiple ITP doses, the maximum tested dose was used.
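The two-tier InChIKey matching described above can be sketched as follows. This is a minimal illustration, not the code in the repository's notebooks: the function names `connectivity_prefix` and `match_compounds` are hypothetical, and the keys in the example are synthetic placeholders rather than real compounds. The first 14 characters of an InChIKey hash the connectivity layer, so a prefix match ignores differences such as stereochemistry or charge state.

```python
def connectivity_prefix(inchikey: str) -> str:
    """Return the first 14 characters of an InChIKey (the connectivity block)."""
    return inchikey.strip().upper()[:14]


def match_compounds(itp_keys, jump_keys):
    """Map each ITP InChIKey to ("exact" | "fuzzy" | "none", matching JUMP keys)."""
    jump_full = {k.strip().upper() for k in jump_keys}
    jump_by_prefix = {}
    for key in jump_full:
        jump_by_prefix.setdefault(connectivity_prefix(key), []).append(key)

    matches = {}
    for key in itp_keys:
        norm = key.strip().upper()
        prefix = connectivity_prefix(norm)
        if norm in jump_full:
            # Full 27-character key is identical in both datasets.
            matches[key] = ("exact", [norm])
        elif prefix in jump_by_prefix:
            # Only the 14-character connectivity block agrees.
            matches[key] = ("fuzzy", sorted(jump_by_prefix[prefix]))
        else:
            matches[key] = ("none", [])
    return matches


# Synthetic placeholder keys (assumption: for illustration only).
itp_keys = ["AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB-UHFFFAOYSA-N"]
jump_keys = ["AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB-ZZZZZZZZSA-M"]
for key, (kind, hits) in match_compounds(itp_keys, jump_keys).items():
    print(key, kind, hits)
```

A fuzzy match can return more than one JUMP key per ITP compound, so any real pipeline would need a rule for choosing among them; that rule is not specified here.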

Methods

Compound-level profiles. JUMP provides data at the well level. To obtain a single profile per compound, we averaged across all wells for that compound.
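As a rough illustration of the well-to-compound aggregation, the sketch below averages feature vectors per compound using only the standard library. The function name `compound_profiles`, the compound IDs, and the toy values are assumptions for the example, not names or data from the repository.

```python
from collections import defaultdict
from statistics import fmean


def compound_profiles(wells):
    """Average well-level feature vectors into one profile per compound.

    wells: iterable of (compound_id, feature_vector) pairs, where every
    feature_vector has the same length.
    """
    grouped = defaultdict(list)
    for compound_id, features in wells:
        grouped[compound_id].append(features)
    # zip(*rows) walks the wells column by column, i.e. feature by feature.
    return {
        compound_id: [fmean(column) for column in zip(*rows)]
        for compound_id, rows in grouped.items()
    }


# Toy data (assumption): two wells for cpd_A, one well for cpd_B.
wells = [
    ("cpd_A", [1.0, 2.0]),
    ("cpd_A", [3.0, 4.0]),
    ("cpd_B", [0.5, -0.5]),
]
print(compound_profiles(wells))
```

With the profiles held in a pandas DataFrame, a group-by on the compound column followed by a mean would express the same operation.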

Single-feature analysis. For each feature, Spearman ρ was computed against log(WA OR), separately for male and female ITP outcomes.
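For readers unfamiliar with the statistic, the following is a small self-contained sketch of a feature-wise Spearman computation: rank both variables (ties get the mean rank) and take the Pearson correlation of the ranks. The helper names and toy numbers are assumptions; in practice `scipy.stats.spearmanr` computes the same quantity.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant variable has no defined correlation
    return cov / sqrt(var_x * var_y)


def featurewise_rho(profiles, labels):
    """profiles: rows are compounds, columns are features; labels: log(WA OR)."""
    return [spearman_rho(list(column), labels) for column in zip(*profiles)]


# Toy data (assumption): 4 compounds, 2 features, made-up labels.
profiles = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
labels = [0.1, 0.2, 0.3, 0.4]
print(featurewise_rho(profiles, labels))
```

Running this once with the male labels and once with the female labels would mirror the sex-specific analysis described above.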

Model. Ridge regression on the top 100 features (top 50 by positive ρ, top 50 by negative ρ). Feature selection was performed inside each CV fold on the training set only, avoiding leakage into the test set.
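A minimal NumPy sketch of leakage-free, per-fold feature selection followed by ridge regression is given below. Several details are assumptions made for the example and are not stated in this README: the ridge penalty `alpha=1.0`, z-scoring with training-set statistics, the closed-form solver, and all function names. The point it illustrates is that ranking and selecting features uses only the training rows of the fold.

```python
import numpy as np


def rank_columns(a):
    """Ordinal ranks per column (ties broken by position; fine for continuous data)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)


def spearman_to_label(X, y):
    """Spearman rho between each column of X and y, as Pearson on ranks."""
    rx = rank_columns(X)
    ry = rank_columns(y.reshape(-1, 1))[:, 0]
    rx -= rx.mean(axis=0)
    ry -= ry.mean()
    denom = np.sqrt((rx ** 2).sum(axis=0) * (ry ** 2).sum())
    return (rx * ry[:, None]).sum(axis=0) / denom


def select_features(X_train, y_train, k_each=50):
    """Indices of the k_each most negative and k_each most positive rho features."""
    order = np.argsort(spearman_to_label(X_train, y_train))
    return np.concatenate([order[:k_each], order[-k_each:]])


def fit_predict_fold(X_train, y_train, X_test, alpha=1.0, k_each=50):
    """Select features on the training rows only, fit ridge, predict test rows."""
    cols = select_features(X_train, y_train, k_each)
    mu = X_train[:, cols].mean(axis=0)
    sd = X_train[:, cols].std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    A = (X_train[:, cols] - mu) / sd
    B = (X_test[:, cols] - mu) / sd  # test rows scaled with training statistics
    y_mean = y_train.mean()
    # Closed-form ridge solution: (A^T A + alpha I)^-1 A^T (y - mean(y))
    coef = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_train - y_mean))
    return B @ coef + y_mean


# Synthetic data (assumption): 20 compounds, 30 features, feature 0 drives the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)
print(fit_predict_fold(X[:14], y[:14], X[14:], k_each=5))
```

Calling `select_features` on the full dataset before splitting would let test-set labels influence which features are kept, which is the leakage the per-fold design avoids.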

Evaluation. 3-fold CV, stratified by WA OR quartile to keep the label distribution balanced across folds, repeated for 1000 iterations with different fold assignments. We also ran leave-one-out CV (LOOCV), in which each compound is held out once, evaluated as Spearman ρ over the pooled n = 29 (predicted, true) pairs. Test-set Spearman ρ is the primary metric.
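One way to build quartile-balanced folds is sketched below: bin the labels into quartiles, then deal the members of each quartile round-robin into folds in random order. This is an illustrative assumption about how such a split could be made, not the repository's implementation; the function name and the toy labels are hypothetical. scikit-learn's `StratifiedKFold` applied to the quartile bin labels is a common alternative.

```python
import numpy as np


def quartile_stratified_folds(y, n_folds=3, seed=0):
    """Assign each sample to a fold, spreading every label quartile across folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(y, [0.25, 0.5, 0.75])
    quartile = np.searchsorted(edges, y, side="right")  # bin index 0..3
    folds = np.empty(len(y), dtype=int)
    for q in range(4):
        members = rng.permutation(np.flatnonzero(quartile == q))
        offset = rng.integers(n_folds)  # random start so no fold is always first
        folds[members] = (np.arange(len(members)) + offset) % n_folds
    return folds


# Toy labels (assumption): 29 evenly spaced values standing in for log(WA OR).
y = np.linspace(-1.0, 1.0, 29)
for iteration in range(3):  # the study repeats this 1000 times
    folds = quartile_stratified_folds(y, seed=iteration)
    print(iteration, np.bincount(folds, minlength=3))
```

Changing the seed on every iteration gives a different fold assignment each time while keeping each quartile's members split as evenly as possible.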

Null model. A within-feature shuffle null was generated by independently permuting the values of each feature column across compounds, then running the same training / CV / feature-selection pipeline 1000 times. This destroys both feature–label and feature–feature correlations, providing a strong null against which real performance is compared.
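The within-feature shuffle itself is a small operation; a sketch under the assumption that profiles are stored as a compounds × features NumPy array is shown below (the function name is hypothetical). Each column is permuted with its own independent permutation, so every feature keeps its marginal distribution while its alignment with the labels and with the other features is broken.

```python
import numpy as np


def shuffle_within_features(X, rng):
    """Independently permute each column of X across rows (compounds)."""
    X = np.asarray(X)
    shuffled = np.empty_like(X)
    for j in range(X.shape[1]):
        shuffled[:, j] = rng.permutation(X[:, j])
    return shuffled


# Toy matrix (assumption): 5 compounds x 4 features.
rng = np.random.default_rng(0)
X = np.arange(20.0).reshape(5, 4)
X_null = shuffle_within_features(X, rng)
print(X_null)
```

To build the null distribution, the shuffled matrix would be passed through the same feature-selection, training, and CV steps as the real data, once per null iteration.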

Sex-specific models. Male and female outcomes were modeled separately.

Feature sets. The full pipeline was run on both the 440-feature pycytominer-cleaned CellProfiler set and the 737-feature Harmony-corrected set.


Results

Note: Downstream analysis focuses on the Harmony features. Separate folders in the repository explore the CellProfiler features.

Harmony Batch Effect Exploration

PCA and UMAP of Harmony-corrected well profiles show sources well mixed across the embedding, with only minor residual structure. The two clusters visible in both PCA and UMAP are driven by simvastatin: one cluster is almost entirely simvastatin wells, and the other contains every other compound. This split was not investigated further here.

Coloring by Wang–Allison outcome shows no separation between lifespan-extending and non-extending compounds.

Single-feature correlations

For both feature sets and both sexes, the distribution of feature-wise Spearman ρ values against log(WA OR) was approximately symmetric and centered near zero. No feature emerged as clearly separated from the rest of the distribution.

| Feature set | Sex | n features | Mean ρ | Median ρ | Min ρ | Max ρ |
|---|---|---|---|---|---|---|
| Harmony-corrected | Male | 737 | −0.045 | −0.033 | −0.591 | +0.504 |
| Harmony-corrected | Female | 737 | −0.009 | −0.014 | −0.452 | +0.504 |

Ridge regression: 3-fold CV

Mean test Spearman ρ across 1000 iterations of quartile-stratified 3-fold CV (Harmony features):

  • Male: ~+0.2 (pooled-prediction ρ = +0.217). The real distribution of test ρ values is shifted to the right of the within-feature-shuffle null distribution (real centered ~+0.2; null centered ~−0.1).
  • Female: ~−0.12. Real and null distributions overlap.

Ridge regression: LOOCV

Pooled-prediction Spearman ρ over the n = 29 LOOCV predictions:

  • Male: +0.184
  • Female: −0.214

Per-iteration prediction outputs and metadata are in predictions/.


Interpretation

We read these results as a weak male signal and no female signal, both inconclusive at this n. Specifically:

  • We are not claiming that JUMP Cell Painting morphology predicts ITP lifespan extension.
  • We are not claiming it does not. n = 29 is too small to support either conclusion. This is a record of what has been tried.

Limitations

  • Small n. 29 compounds is underpowered.
  • Cell line. U2OS osteosarcoma cells are an immortalized cancer cell line. Whether morphological features in this background carry signal about in vivo aging is itself part of what Cell2Mice is testing.
  • Acute vs. chronic exposure. JUMP exposes cells to compounds for ~48 h. ITP feeds compounds chronically over months to years. The pharmacology being read out is qualitatively different.
  • Dose mismatch. JUMP screening concentrations are not matched to ITP feeding doses; we used the max ITP dose per compound for labeling, which is a simplification.
  • Fuzzy InChIKey matches. 19 of the 32 overlapping compounds (~59%) were matched between JUMP and ITP via the 14-character InChIKey connectivity prefix rather than the full InChIKey.

Reproducing

Run the steps in the order below: set up the environment, then obtain the data files and run the download scripts, then run the notebooks that consume them.

Environment

Clone the repository, create a virtual environment, and install dependencies:

  • python3 -m venv .venv && source .venv/bin/activate (Windows: .venv\Scripts\activate)
  • pip install -r requirements.txt
    Add extras locally as needed (for example RDKit or pycytominer if you run notebooks or scripts that import them).

Data files and download scripts

  1. ITP combined survival CSV: used by code/build_itp_wang_allison_table.ipynb, code/CellProfiler_Feature_Exploration/cellprofiler_feature_analysis.ipynb, and code/Harmony_Features/harmony_feature_analysis.ipynb. Place or symlink an export at data/raw/NIA_ITP_Lifespan_Data/ITP_survival.csv, or change the path in those notebooks to match your file. To build a fresh combined export from NIA pages, run code/crawl_ITP_retrieval.py (see the script docstring for usage).
  2. JUMP ∩ ITP overlap: notebook code/get_jump_compound_overlap.ipynb. Regenerates the overlap CSVs (aging_jump_overlap_exact.csv, aging_jump_overlap_fuzzy.csv, aging_no_jump_match.csv) used under data/processed/. Run from the repository root (or adjust save paths in the notebook) so outputs land next to the other processed tables.
  3. Wang–Allison merge: notebook code/build_itp_wang_allison_table.ipynb. Reads data/processed/NIA_ITP_JUMP_Overlap.xlsx plus the combined survival CSV and writes data/processed/NIA_ITP_JUMP_WangAllison.xlsx, which downstream modeling notebooks expect.
  4. JUMP feature manifest: file manifest_aging_jump_profiles.json. Notebooks resolve it under data/processed/features/ first, then data/legacy/processed/features/ if you keep an older copy in legacy.
  5. JUMP compound profiles (remote Parquet): script code/get_JUMP_features.py. Pulls pycytominer-cleaned CellProfiler (interpretable) and Harmony-corrected (harmony) compound profiles for the overlap list into data/processed/features/ (see the script's --help; the default overlap input is data/processed/NIA_ITP_JUMP_Overlap.xlsx). Requires duckdb, pandas, and openpyxl, as noted in the script header.
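The manifest lookup order described in step 4 can be expressed as a short helper. This is a sketch of the stated search order only; the function name `find_manifest` and its `repo_root` parameter are assumptions, and the notebooks may implement the lookup differently.

```python
from pathlib import Path

MANIFEST_NAME = "manifest_aging_jump_profiles.json"
# Search order as described above: current location first, then legacy.
SEARCH_DIRS = ("data/processed/features", "data/legacy/processed/features")


def find_manifest(repo_root="."):
    """Return the first existing manifest path, checking SEARCH_DIRS in order."""
    root = Path(repo_root)
    for rel_dir in SEARCH_DIRS:
        candidate = root / rel_dir / MANIFEST_NAME
        if candidate.is_file():
            return candidate
    searched = ", ".join(SEARCH_DIRS)
    raise FileNotFoundError(f"{MANIFEST_NAME} not found under: {searched}")


# Usage (from the repository root), once the manifest exists:
#     manifest_path = find_manifest(".")
```

Raising an explicit error when neither location has the file makes a missing download step easier to diagnose than a later failure inside a notebook.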

Analysis and modeling notebooks

  1. Single-feature / EDA (survival CSV + processed feature tables):
    • code/CellProfiler_Feature_Exploration/cellprofiler_feature_analysis.ipynb
    • code/Harmony_Features/harmony_feature_analysis.ipynb
  2. Batch / embedding-style exploration (optional; same inputs as feature notebooks):
    • code/CellProfiler_Feature_Exploration/CellProfiler_batch_exploration.ipynb
    • code/Harmony_Features/Harmony_batch_exploration.ipynb
  3. Cross-validated models and nulls → predictions/:
    • code/CellProfiler_Feature_Exploration/cellprofiler_model_testing.ipynb
    • code/Harmony_Features/harmony_model_testing.ipynb
  4. Summaries of saved predictions (read from predictions/):
    • code/CellProfiler_model_predictions_exploration/explore_cellprofiler_performance.ipynb
    • code/Harmony_model_predictions_exploration/explore_Harmony_performance.ipynb
    • code/Harmony_model_predictions_exploration/explore_Harmony_predictions.ipynb

About

Testing whether imaging features of drug-treated U2OS cells from JUMP Cell Painting predict compound-induced lifespan extension in the NIA ITP. Initial results are inconclusive at n = 29 compounds.
