MaikTungsten/Deconvolution_benchmarking


Deconvolution Benchmarking Pipeline

Pipeline for benchmarking a variety of bulk RNA‑seq deconvolution tools using single‑cell references, orchestrated with Nextflow.

Overview

  1. PREPARE (subworkflow)
    1. PREPROCESS
      1. Reads bulk (.csv/.tsv/.h5ad) and sc (.h5ad)
      2. Removes zero‑variance genes and aligns genes (intersection between bulk and single-cell)
  2. DECONVOLVE (subworkflow)
    1. DECONVOLUTION per selected tool
      1. Runs tool wrapper with --bulk and --sc
      2. Writes predictions CSV (one per tool)
  3. ASSESS (subworkflow)
    1. EVALUATE
      1. Merges predictions
      2. Evaluates vs ground truth (if available)
    2. VISUALIZE
      1. Creates figures from evaluations
      2. Creates figures using ground truth (if available)
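The PREPROCESS step above can be illustrated with a minimal, dependency-free sketch. It is not the pipeline's actual code: the function names (`drop_zero_variance`, `preprocess`) and the use of plain lists instead of AnnData objects are assumptions made only to show the idea of dropping zero-variance genes and keeping the gene intersection.

```python
from statistics import pvariance


def drop_zero_variance(matrix, genes):
    """Drop columns (genes) whose values are constant across all rows.

    matrix: list of rows (samples or cells), genes: column names.
    Hypothetical helper, not part of the pipeline.
    """
    keep = [j for j in range(len(genes))
            if pvariance([row[j] for row in matrix]) > 0]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [genes[j] for j in keep]


def preprocess(bulk, bulk_genes, sc, sc_genes):
    """Filter both matrices, then keep only genes present in both."""
    bulk, bulk_genes = drop_zero_variance(bulk, bulk_genes)
    sc, sc_genes = drop_zero_variance(sc, sc_genes)

    # Intersection of gene names, kept in bulk order (an assumption).
    sc_set = set(sc_genes)
    shared = [g for g in bulk_genes if g in sc_set]

    def select(matrix, genes):
        index = {g: j for j, g in enumerate(genes)}
        return [[row[index[g]] for g in shared] for row in matrix]

    return select(bulk, bulk_genes), select(sc, sc_genes), shared


# Toy data: gene B is constant in bulk, gene C is constant in sc.
bulk_out, sc_out, shared_genes = preprocess(
    [[1, 5, 0], [2, 5, 3]], ["A", "B", "C"],
    [[0, 1, 2], [0, 3, 4], [0, 5, 6]], ["C", "A", "D"],
)
print(shared_genes)
```

After this step both matrices share the same gene columns in the same order, which is what the tool wrappers need when they receive --bulk and --sc.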

Data Inputs

Format

  • Required: a samplesheet CSV with the header bulk,sc_ref,truth. The truth column may be left blank.
  • Data expectations:
    • bulk: samples × genes matrix
    • sc_ref: .h5ad with obs['cell_type']
      • Optional GrooD features: GrooD may use obs['individual'] and/or obs['condition'] if these columns are present and the target is set in its wrapper
  • Naming: sc_ref cell‑type names and ground‑truth cell types should ideally align.

Samplesheet Parameters

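| Parameter | Required | Types | Notes |
|-----------|----------|-------|-------|
| bulk | yes | .csv/.tsv/.h5ad | samples × genes |
| sc_ref | yes | .h5ad | must contain obs['cell_type'] |
| truth | optional | .csv/.tsv | samples × cell types (values are proportions) |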

Samplesheet Examples

With a ground truth file:

bulk,sc_ref,truth
/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv
/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv

Without a ground truth file (leave the truth field blank):

bulk,sc_ref,truth
/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,
/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,
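To check a samplesheet before launching a run, something like the following sketch could be used. It relies only on the Python standard library; the function name `read_samplesheet` and the exact error handling are assumptions for illustration, not the pipeline's own validation.

```python
import csv
import io

REQUIRED_COLUMNS = ("bulk", "sc_ref")  # truth is optional


def read_samplesheet(text):
    """Parse samplesheet text into a list of dicts (hypothetical helper).

    A blank or missing truth field is returned as None.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"samplesheet is missing column(s): {missing}")

    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        bulk = (row.get("bulk") or "").strip()
        sc_ref = (row.get("sc_ref") or "").strip()
        truth = (row.get("truth") or "").strip() or None
        if not bulk or not sc_ref:
            raise ValueError(f"line {line_no}: bulk and sc_ref are required")
        rows.append({"bulk": bulk, "sc_ref": sc_ref, "truth": truth})
    return rows


example = (
    "bulk,sc_ref,truth\n"
    "/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv\n"
    "/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,\n"
)
samples = read_samplesheet(example)
print(samples[1]["truth"])
```

Rows with an empty truth field simply carry None, mirroring the "if available" behaviour of the evaluation steps.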

Usage

Example

nextflow run main.nf \
  --input /path/to/samplesheet.csv \
  --output_dir /path/to/output \
  --tools BayesPrism,DWLS,GrooD,MuSiC,Scaden

Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| input | ./samplesheet.csv | Path to samplesheet CSV |
| output_dir | ./results/ | Output directory root |
| tools | BayesPrism,DWLS,GrooD,MuSiC,Scaden | One or more tool names (comma-separated). Not case-sensitive. |

Set tool‑specific parameters in wrappers under ./bin/tools/ (e.g. bayesprism.R, dwls.py, grood.py, music.R, scaden.py).
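The case-insensitive, comma-separated --tools value could be normalised roughly as in the sketch below. The function `parse_tools`, the whitespace trimming, the de-duplication, and the error on unknown names are all assumptions for illustration; the pipeline itself may handle these cases differently.

```python
# Tool names as listed in this README.
KNOWN_TOOLS = ("BayesPrism", "DWLS", "GrooD", "MuSiC", "Scaden")


def parse_tools(value):
    """Map a comma-separated, case-insensitive list to canonical names.

    Hypothetical helper, not the pipeline's own parser.
    """
    canonical = {name.lower(): name for name in KNOWN_TOOLS}
    selected = []
    for token in value.split(","):
        key = token.strip().lower()
        if not key:
            continue  # ignore empty entries such as a trailing comma
        if key not in canonical:
            raise ValueError(f"unknown tool: {token.strip()!r}")
        if canonical[key] not in selected:
            selected.append(canonical[key])
    return selected


print(parse_tools("bayesprism, MUSIC,scaden"))
```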

Getting Started

Prerequisites

# Core
- Nextflow ≥ 23
- OpenJDK 17
- Conda/Mamba or Docker

# Internal dependencies
- Conda/Mamba: ./env/deconv-benchmark-conda.yml
- Docker: ./env/deconv-benchmark-docker.yml

Installation

Conda

# Create env from .yml
conda env create -f ./env/deconv-benchmark-conda.yml -n deconv-benchmark

# Activate env
conda activate deconv-benchmark

Mamba

# Create env from .yml
mamba env create -f ./env/deconv-benchmark-conda.yml -n deconv-benchmark

# Activate env
mamba activate deconv-benchmark

Docker

# Build container
docker build -t deconv-benchmark:latest -f ./env/Dockerfile .

# Run pipeline with Docker profile
nextflow run main.nf -profile docker \
  --input /path/to/samplesheet.csv \
  --output_dir /path/to/output \
  --tools BayesPrism,DWLS,GrooD,MuSiC,Scaden

Outputs

  • Results are grouped by a tag derived from input basenames: <bulkBase>_<scBase>_<truthBase|notruth>.
  • Key files:
    • evaluation/evaluation.csv: long‑format table of metrics per tool/cell type
    • evaluation/predictions_merged.csv: wide-format table of merged predictions
    • evaluation/summary_stats.csv: summary statistics across metrics
    • figures/*.png|*.svg: plots from evaluations
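As an illustration of the tag format <bulkBase>_<scBase>_<truthBase|notruth>, the sketch below builds the tag from the three samplesheet paths. It assumes that "basename" means the file name without its last extension; the function name `result_tag` is hypothetical and the pipeline's real logic may differ.

```python
from pathlib import Path


def result_tag(bulk, sc_ref, truth=None):
    """Build the output folder tag from input paths (hypothetical helper)."""
    # Assumption: the basename is the file name minus its final extension.
    truth_base = Path(truth).stem if truth else "notruth"
    return f"{Path(bulk).stem}_{Path(sc_ref).stem}_{truth_base}"


with_truth = result_tag(
    "/path/to/bulk1.h5ad", "/path/to/sc_ref.h5ad", "/path/to/ground_truth.tsv"
)
without_truth = result_tag("/path/to/bulk1.h5ad", "/path/to/sc_ref.h5ad")
print(with_truth)
print(without_truth)
```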

Example Structure

results/
└── Bulk_SingleCell_GroundTruth/
    ├── evaluation/
    │   ├── evaluation.csv
    │   ├── predictions_merged.csv
    │   └── summary_stats.csv
    ├── figures/
    │   ├── annotated_heatmap.png
    │   ├── annotated_heatmap.svg
    │   └── ...
    ├── predictions/
    │   ├── bayesprism.csv
    │   ├── dwls.csv
    │   ├── music.csv
    │   ├── scaden.csv
    │   └── grood.csv
    └── preprocessed/
        ├── bulk_preprocessed.h5ad
        └── sc_preprocessed.h5ad

Citations

Tools

| Method | License | Citation |
|--------|---------|----------|
| BayesPrism | free (GPL 3.0) | Chu, T., Wang, Z., Pe’er, D. et al. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat Cancer 3, 505–517 (2022). https://doi.org/10.1038/s43018-022-00356-3 |
| DWLS | free (GPL) | Tsoucas, D., Dong, R., Chen, H., Zhu, Q., Guo, G., & Yuan, G.-C. (2019). Accurate estimation of cell-type composition from gene expression data. Nature Communications, 10(1), 2975. https://doi.org/10.1038/s41467-019-10802-z |
| GrooD | TBD | TBD |
| MuSiC | free (GPL 3.0) | Wang, X., Park, J., Susztak, K., Zhang, N. R., & Li, M. (2019). Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nature Communications, 10(1), 380. https://doi.org/10.1038/s41467-018-08023-x |
| Scaden | free (MIT) | Menden, K., Marouf, M., Oller, S., Dalmia, A., Kloiber, K., Heutink, P., & Bonn, S. Deep-learning-based cell composition analysis from tissue expression profiles. https://doi.org/10.1101/659227 |

Other

  • Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. https://doi.org/10.1038/nbt.3820
