Deconvolution Benchmarking Pipeline

A Nextflow pipeline for benchmarking bulk RNA-seq deconvolution tools that use single-cell references.

Overview

PREPARE (subworkflow)
1. PREPROCESS
  1. Reads the bulk matrix (.csv/.tsv/.h5ad) and the single-cell reference (.h5ad)
  2. Removes zero-variance genes and restricts bulk and single-cell data to their shared genes (intersection)
DECONVOLVE (subworkflow)
1. DECONVOLUTION per selected tool
  1. Runs the tool's wrapper script with --bulk and --sc
  2. Writes a predictions CSV (one per tool)
ASSESS (subworkflow)
1. EVALUATE
  1. Merges the predictions from all tools
  2. Evaluates predictions against the ground truth (if provided)
2. VISUALIZE
  1. Creates figures from the evaluation results
  2. Creates ground-truth-based figures (if ground truth is provided)
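The PREPROCESS gene filtering can be sketched in plain Python. This is a minimal illustration, not the pipeline's implementation: it assumes each dataset is a list of rows (samples or cells) over a gene list, that zero-variance genes are dropped from each dataset separately, and that the shared genes keep the bulk gene order. The names `drop_zero_variance` and `align_genes` are hypothetical.

```python
from statistics import pvariance


def drop_zero_variance(genes, matrix):
    """Keep only genes (columns) whose values vary across rows.

    Hypothetical helper: rows are samples (bulk) or cells (single-cell).
    """
    keep = [j for j in range(len(genes)) if pvariance([row[j] for row in matrix]) > 0]
    return [genes[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def align_genes(bulk_genes, bulk, sc_genes, sc):
    """Restrict both matrices to their shared genes, in bulk gene order."""
    sc_set = set(sc_genes)
    shared = [g for g in bulk_genes if g in sc_set]
    bulk_idx = {g: j for j, g in enumerate(bulk_genes)}
    sc_idx = {g: j for j, g in enumerate(sc_genes)}
    bulk_out = [[row[bulk_idx[g]] for g in shared] for row in bulk]
    sc_out = [[row[sc_idx[g]] for g in shared] for row in sc]
    return shared, bulk_out, sc_out


# Toy data: gene B is constant in bulk, gene D is constant in the reference.
bulk_genes, bulk = drop_zero_variance(["A", "B", "C"], [[1, 0, 5], [2, 0, 7]])
sc_genes, sc = drop_zero_variance(["C", "A", "D"], [[0, 1, 3], [4, 2, 3]])
shared, bulk_aligned, sc_aligned = align_genes(bulk_genes, bulk, sc_genes, sc)
print(shared)  # ['A', 'C']
```

Note that a per-dataset variance filter like this one would drop every gene from a bulk matrix with a single sample, so the pipeline may compute variance differently in that case.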

Data Inputs

Format

Required: a samplesheet CSV with the header bulk,sc_ref,truth; truth is optional and its values can be left blank.
Data expectations:
- bulk: samples × genes matrix
- sc_ref: .h5ad with obs['cell_type']
  - Optional GrooD features: GrooD can use obs['individual'] and/or obs['condition'] if these columns are present and the corresponding target is set in its wrapper
Naming: cell-type names in sc_ref should ideally match the cell-type names in the ground-truth file.

Samplesheet Parameters

Column	Required	Formats	Notes
bulk	yes	.csv/.tsv/.h5ad	samples × genes
sc_ref	yes	.h5ad	must contain `obs['cell_type']`
truth	optional	.csv/.tsv	samples × cell types (proportions)
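Because cell-type names should ideally match between sc_ref and the ground truth, a quick pre-flight check can catch mismatches before a run. The sketch below is hypothetical and not part of the pipeline: it assumes the truth file's first column holds sample IDs, the remaining columns are cell types, and each row's proportions should sum to about 1.

```python
import csv
import io
import math


def check_truth(text, delimiter, reference_cell_types, tol=1e-3):
    """Return (cell types absent from the reference, samples whose row does not sum to ~1).

    Hypothetical helper. Assumes column 0 holds sample IDs and the other
    columns hold per-cell-type proportions.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    unmatched = sorted(set(header[1:]) - set(reference_cell_types))
    off_total = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        total = sum(float(x) for x in row[1:])
        if not math.isclose(total, 1.0, abs_tol=tol):
            off_total.append(row[0])
    return unmatched, off_total


truth_tsv = "sample\tT cell\tB cell\tNK\ns1\t0.5\t0.3\t0.2\ns2\t0.6\t0.6\t0.0\n"
print(check_truth(truth_tsv, "\t", ["T cell", "B cell", "Monocyte"]))
# (['NK'], ['s2'])
```

In practice `reference_cell_types` would come from the unique values of `obs['cell_type']` in the sc_ref file; reading the .h5ad is left out here to keep the sketch dependency-free.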

Samplesheet Examples

With a ground-truth file:

bulk,sc_ref,truth
/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv
/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv

Without a ground-truth file (leave the truth field blank):

bulk,sc_ref,truth
/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,
/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,
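The samplesheet rules above (fixed header, accepted formats, blank truth allowed) can be checked with a short Python sketch. It is a hypothetical helper, not the pipeline's own input parsing; the extension sets mirror the Samplesheet Parameters table.

```python
import csv
import io
from pathlib import Path

# Accepted extensions, taken from the Samplesheet Parameters table.
ALLOWED = {
    "bulk": {".csv", ".tsv", ".h5ad"},
    "sc_ref": {".h5ad"},
    "truth": {".csv", ".tsv"},
}


def parse_samplesheet(text):
    """Parse samplesheet text into dicts; a blank truth field becomes None.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != ["bulk", "sc_ref", "truth"]:
        raise ValueError(f"expected header bulk,sc_ref,truth, got {reader.fieldnames}")
    rows = []
    for i, rec in enumerate(reader, start=1):
        # Missing trailing fields come back as None, so normalise them to "".
        entry = {key: (rec[key] or "").strip() for key in ALLOWED}
        if not entry["bulk"] or not entry["sc_ref"]:
            raise ValueError(f"row {i}: bulk and sc_ref are required")
        for key, value in entry.items():
            if value and Path(value).suffix.lower() not in ALLOWED[key]:
                raise ValueError(f"row {i}: unsupported {key} file {value!r}")
        entry["truth"] = entry["truth"] or None
        rows.append(entry)
    return rows


sheet = (
    "bulk,sc_ref,truth\n"
    "/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv\n"
    "/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,\n"
)
for row in parse_samplesheet(sheet):
    print(row)
```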

Usage

Example

nextflow run main.nf \
  --input /path/to/samplesheet.csv \
  --output_dir /path/to/output \
  --tools BayesPrism,DWLS,GrooD,MuSiC,Scaden

Parameters

Parameter	Default	Description
input	./samplesheet.csv	Path to samplesheet CSV
output_dir	./results/	Output directory root
tools	BayesPrism,DWLS,GrooD,MuSiC,Scaden	Comma-separated list of one or more tool names (case-insensitive)

Tool-specific parameters are set in the wrapper scripts under ./bin/tools/ (e.g. bayesprism.R, dwls.py, grood.py, music.R, scaden.py).
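The case-insensitive, comma-separated --tools value can be mapped to wrapper scripts as sketched below. This is a hypothetical illustration, not the pipeline's Nextflow logic; the wrapper file names come from the sentence above, and the real list under ./bin/tools/ may contain more.

```python
# Wrapper file names as listed above; the real list in ./bin/tools/ may differ.
WRAPPERS = {
    "bayesprism": "bayesprism.R",
    "dwls": "dwls.py",
    "grood": "grood.py",
    "music": "music.R",
    "scaden": "scaden.py",
}


def resolve_tools(tools_param, wrapper_dir="./bin/tools"):
    """Map a comma-separated, case-insensitive --tools value to wrapper paths.

    Hypothetical helper; duplicates are dropped and input order is kept.
    """
    paths = []
    for raw in tools_param.split(","):
        name = raw.strip().lower()
        if not name:
            continue
        if name not in WRAPPERS:
            raise ValueError(f"unknown tool {raw.strip()!r}; choose from {sorted(WRAPPERS)}")
        path = f"{wrapper_dir}/{WRAPPERS[name]}"
        if path not in paths:
            paths.append(path)
    return paths


print(resolve_tools("BayesPrism, dwls,SCADEN,scaden"))
# ['./bin/tools/bayesprism.R', './bin/tools/dwls.py', './bin/tools/scaden.py']
```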

Getting Started

Prerequisites

# Core
- Nextflow ≥ 23
- OpenJDK 17
- Conda/Mamba or Docker

# Internal dependencies
- Conda/Mamba: ./env/deconv-benchmark-conda.yml
- Docker: ./env/deconv-benchmark-docker.yml

Installation

Conda

# Create env from .yml
conda env create -f ./env/deconv-benchmark-conda.yml -n deconv-benchmark

# Activate env
conda activate deconv-benchmark

Mamba

# Create env from .yml
mamba env create -f ./env/deconv-benchmark-conda.yml -n deconv-benchmark

# Activate env
mamba activate deconv-benchmark

Docker

# Build container
docker build -t deconv-benchmark:latest -f ./env/Dockerfile .

# Run pipeline with Docker profile
nextflow run main.nf -profile docker \
  --input /path/to/samplesheet.csv \
  --output_dir /path/to/output \
  --tools BayesPrism,DWLS,GrooD,MuSiC,Scaden

Outputs

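Results are grouped into one directory per input combination, named with a tag derived from the input file basenames: <bulkBase>_<scBase>_<truthBase|notruth>.
Key files:
- evaluation/evaluation.csv: long-format table of metrics per tool and cell type
- evaluation/predictions_merged.csv: wide-format table of merged predictions
- evaluation/summary_stats.csv: summary statistics across metrics
- figures/*.png, figures/*.svg: plots generated from the evaluations
- predictions/<tool>.csv: per-tool predictions
- preprocessed/: preprocessed bulk and single-cell .h5ad files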

Example Structure

results/
└── Bulk_SingleCell_GroundTruth/
    ├── evaluation/
    │   ├── evaluation.csv
    │   ├── predictions_merged.csv
    │   └── summary_stats.csv
    ├── figures/
    │   ├── annotated_heatmap.png
    │   ├── annotated_heatmap.svg
    │   └── ...
    ├── predictions/
    │   ├── bayesprism.csv
    │   ├── dwls.csv
    │   ├── music.csv
    │   ├── scaden.csv
    │   └── grood.csv
    └── preprocessed/
        ├── bulk_preprocessed.h5ad
        └── sc_preprocessed.h5ad
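The top-level directory name in the tree above follows the <bulkBase>_<scBase>_<truthBase|notruth> pattern. A minimal sketch, assuming "basename" means the file name with its final extension removed (as Python's `Path.stem` does); `result_tag` is a hypothetical name, not a function in the pipeline.

```python
from pathlib import Path


def result_tag(bulk, sc_ref, truth=None):
    """Build <bulkBase>_<scBase>_<truthBase|notruth> from input paths."""
    truth_base = Path(truth).stem if truth else "notruth"
    return f"{Path(bulk).stem}_{Path(sc_ref).stem}_{truth_base}"


print(result_tag("/path/to/bulk1.h5ad", "/path/to/sc_ref.h5ad", "/path/to/ground_truth.tsv"))
# bulk1_sc_ref_ground_truth
print(result_tag("/path/to/bulk2.h5ad", "/path/to/sc_ref.h5ad"))
# bulk2_sc_ref_notruth
```

Because basenames may themselves contain underscores (as sc_ref does here), the tag is meant to be read, not split back into its parts.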

Citations

Tools

Method	License	Citation
BayesPrism	free (GPL 3.0)	Chu, T., Wang, Z., Pe’er, D., et al. (2022). Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nature Cancer, 3, 505–517. https://doi.org/10.1038/s43018-022-00356-3
DWLS	free (GPL)	Tsoucas, D., Dong, R., Chen, H., Zhu, Q., Guo, G., & Yuan, G.-C. (2019). Accurate estimation of cell-type composition from gene expression data. Nature Communications, 10(1), 2975. https://doi.org/10.1038/s41467-019-10802-z
GrooD	TBD	TBD
MuSiC	free (GPL 3.0)	Wang, X., Park, J., Susztak, K., Zhang, N. R., & Li, M. (2019). Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nature Communications, 10(1), 380. https://doi.org/10.1038/s41467-018-08023-x
Scaden	free (MIT)	Menden, K., Marouf, M., Oller, S., Dalmia, A., Kloiber, K., Heutink, P., & Bonn, S. Deep-learning-based cell composition analysis from tissue expression profiles. bioRxiv. https://doi.org/10.1101/659227

Other

Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. https://doi.org/10.1038/nbt.3820
