Pipeline for benchmarking a variety of bulk RNA‑seq deconvolution tools using single‑cell references, orchestrated with Nextflow.
- PREPARE (subworkflow)
  - `PREPROCESS`
    - Reads bulk (.csv/.tsv/.h5ad) and sc (.h5ad)
    - Removes zero‑variance genes and aligns genes (intersection between bulk and single-cell)
- DECONVOLVE (subworkflow)
  - `DECONVOLUTION` per selected tool
    - Runs tool wrapper with `--bulk` and `--sc`
    - Writes predictions CSV (one per tool)
- ASSESS (subworkflow)
  - `EVALUATE`
    - Merges predictions
    - Evaluates vs ground truth (if available)
  - `VISUALIZE`
    - Creates figures from evaluations
    - Creates figures using ground truth (if available)
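The gene filtering and alignment done in the preprocessing step can be pictured with a small, dependency-free sketch. This is not the pipeline's implementation (which presumably works on `.h5ad` objects); the function names `drop_zero_variance` and `align_genes` and the gene-to-values dict layout are assumptions made for illustration only.

```python
from statistics import pvariance


def drop_zero_variance(matrix):
    """Keep only genes whose values vary across rows.

    `matrix` maps gene name -> list of expression values
    (one value per bulk sample or per single cell).
    """
    return {gene: values for gene, values in matrix.items() if pvariance(values) > 0}


def align_genes(bulk, sc):
    """Restrict both matrices to their shared genes, in one sorted order."""
    shared = sorted(set(bulk) & set(sc))
    return {g: bulk[g] for g in shared}, {g: sc[g] for g in shared}


# Toy data (hypothetical values): gene -> expression across samples or cells.
bulk = {"CD3E": [5.0, 7.0], "MS4A1": [2.0, 2.0], "ACTB": [9.0, 4.0]}
sc = {"CD3E": [1.0, 0.0, 3.0], "ACTB": [4.0, 6.0, 5.0], "NKG7": [0.0, 2.0, 1.0]}

bulk_pp, sc_pp = align_genes(drop_zero_variance(bulk), drop_zero_variance(sc))
print(list(bulk_pp))  # genes that vary in both inputs and are present in both
```

Filtering before intersecting means a gene that is constant in either input is dropped from both, so the two preprocessed matrices always share the same gene set.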
- Required: a samplesheet CSV with header `bulk,sc_ref,truth` where `truth` is optional.
- Data expectations:
  - bulk: samples × genes matrix
  - sc_ref: `.h5ad` with `obs['cell_type']`
    - Optional GrooD features: may use `obs['individual']` and/or `obs['condition']` if present and target set in its wrapper
- Naming: `sc_ref` cell‑type names and ground‑truth cell types should ideally align.
| Parameter | Required | Types | Notes |
|---|---|---|---|
| bulk | yes | .csv/.tsv/.h5ad | samples × genes |
| sc_ref | yes | .h5ad | must contain obs['cell_type'] |
| truth | optional | .csv/.tsv | samples × proportions |
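A samplesheet following the table above could be checked before launching a run with a sketch like the one below. It only inspects column names and file extensions. It does not open the files, so it cannot confirm that `obs['cell_type']` exists. The function name `validate_samplesheet` and the error messages are hypothetical and not part of the pipeline.

```python
import csv
import io

REQUIRED_COLUMNS = ("bulk", "sc_ref")  # `truth` is optional
BULK_SUFFIXES = (".csv", ".tsv", ".h5ad")
TRUTH_SUFFIXES = (".csv", ".tsv")


def validate_samplesheet(handle):
    """Parse a samplesheet and return its rows; raise ValueError on the first problem."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        bulk = (row.get("bulk") or "").strip()
        sc_ref = (row.get("sc_ref") or "").strip()
        truth = (row.get("truth") or "").strip()  # blank means "no ground truth"
        if not bulk.endswith(BULK_SUFFIXES):
            raise ValueError(f"line {line_no}: bulk must be .csv, .tsv or .h5ad")
        if not sc_ref.endswith(".h5ad"):
            raise ValueError(f"line {line_no}: sc_ref must be .h5ad")
        if truth and not truth.endswith(TRUTH_SUFFIXES):
            raise ValueError(f"line {line_no}: truth must be .csv or .tsv")
        rows.append({"bulk": bulk, "sc_ref": sc_ref, "truth": truth or None})
    return rows


# In-memory samplesheet: one row with ground truth, one without.
sheet = io.StringIO(
    "bulk,sc_ref,truth\n"
    "/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv\n"
    "/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,\n"
)
rows = validate_samplesheet(sheet)
print([row["truth"] for row in rows])
```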
With ground truth file:

```csv
bulk,sc_ref,truth
/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv
/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,/path/to/ground_truth.tsv
```

Without ground truth file (leave blank):

```csv
bulk,sc_ref,truth
/path/to/bulk1.h5ad,/path/to/sc_ref.h5ad,
/path/to/bulk2.h5ad,/path/to/sc_ref.h5ad,
```

```bash
nextflow run main.nf \
  --input /path/to/samplesheet.csv \
  --output_dir /path/to/output \
  --tools BayesPrism,DWLS,GrooD,MuSiC,Scaden
```

| Parameter | Default | Description |
|---|---|---|
| input | ./samplesheet.csv | Path to samplesheet CSV |
| output_dir | ./results/ | Output directory root |
| tools | BayesPrism,DWLS,GrooD,MuSiC,Scaden | One or more tool names (comma-separated). Not case-sensitive. |
Set tool‑specific parameters in wrappers under ./bin/tools/ (e.g. bayesprism.R, dwls.py, grood.py, music.R, scaden.py).
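Since the `tools` value is comma-separated and not case-sensitive, its handling can be sketched as a lookup from lowercased names to canonical ones. How the pipeline itself treats unknown names, duplicates, or stray commas is not stated here, so the behavior below (raise, de-duplicate, skip) is an assumption, and `parse_tools` is a hypothetical helper.

```python
KNOWN_TOOLS = ("BayesPrism", "DWLS", "GrooD", "MuSiC", "Scaden")


def parse_tools(value):
    """Split a comma-separated tool list and map each entry to its canonical name."""
    canonical = {name.lower(): name for name in KNOWN_TOOLS}
    selected = []
    for raw in value.split(","):
        key = raw.strip().lower()
        if not key:
            continue  # assumption: tolerate stray commas
        if key not in canonical:
            raise ValueError(f"unknown tool: {raw.strip()!r}")
        if canonical[key] not in selected:  # assumption: ignore repeats
            selected.append(canonical[key])
    return selected


print(parse_tools("bayesprism, MUSIC,scaden"))
# ['BayesPrism', 'MuSiC', 'Scaden']
```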
# Core
- Nextflow ≥ 23
- OpenJDK 17
- Conda/Mamba or Docker
# Internal dependencies
- Conda/Mamba: ./env/deconv-benchmark-conda.yml
- Docker: ./env/deconv-benchmark-docker.yml

```bash
# Create env from .yml
conda env create -f ./env/deconv-benchmark-conda.yml -n deconv-benchmark
# Activate env
conda activate deconv-benchmark
```

```bash
# Create env from .yml
mamba env create -f ./env/deconv-benchmark-conda.yml -n deconv-benchmark
# Activate env
mamba activate deconv-benchmark
```

```bash
# Build container
docker build -t deconv-benchmark:latest -f ./env/Dockerfile .
# Run pipeline with Docker profile
nextflow run main.nf -profile docker \
  --input /path/to/samplesheet.csv \
  --output_dir /path/to/output \
  --tools BayesPrism,DWLS,GrooD,MuSiC,Scaden
```

- Results are grouped by a tag derived from input basenames: `<bulkBase>_<scBase>_<truthBase|notruth>`.
- Key files:
  - `evaluation/evaluation.csv`: long‑format table of metrics per tool/cell type
  - `evaluation/predictions_merged.csv`: wide-format table of merged predictions
  - `evaluation/summary_stats.csv`: summary statistics across metrics
  - `figures/*.png|*.svg`: plots from evaluations
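To make the long-format layout of `evaluation/evaluation.csv` concrete, here is a minimal sketch that produces one row per tool and cell type. The metrics the pipeline actually reports are not listed here, so RMSE is used purely as a stand-in, and the column names (`tool`, `cell_type`, `metric`, `value`) are assumptions. Only cell types named identically in predictions and ground truth are scored, which is why aligned naming matters.

```python
import math


def rmse(pred, true):
    """Root-mean-square error between two equally long lists of proportions."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def evaluate(predictions, truth):
    """Return long-format rows, one per (tool, cell type).

    predictions: tool -> cell type -> proportions per sample
    truth:       cell type -> proportions per sample (same sample order)
    """
    rows = []
    for tool, by_type in sorted(predictions.items()):
        for cell_type in sorted(set(by_type) & set(truth)):
            rows.append({
                "tool": tool,
                "cell_type": cell_type,
                "metric": "rmse",  # stand-in metric, not necessarily the pipeline's
                "value": rmse(by_type[cell_type], truth[cell_type]),
            })
    return rows


# Toy proportions for two samples (hypothetical values).
truth = {"T cell": [0.6, 0.2], "B cell": [0.4, 0.8]}
predictions = {
    "music": {"T cell": [0.5, 0.3], "B cell": [0.5, 0.7]},
    "scaden": {"T cell": [0.6, 0.2], "B cell": [0.4, 0.8]},
}

for row in evaluate(predictions, truth):
    print(row["tool"], row["cell_type"], row["metric"], round(row["value"], 3))
```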
```text
results/
└── Bulk_SingleCell_GroundTruth/
    ├── evaluation/
    │   ├── evaluation.csv
    │   ├── predictions_merged.csv
    │   └── summary_stats.csv
    ├── figures/
    │   ├── annotated_heatmap.png
    │   ├── annotated_heatmap.svg
    │   └── ...
    ├── predictions/
    │   ├── bayesprism.csv
    │   ├── dwls.csv
    │   ├── music.csv
    │   ├── scaden.csv
    │   └── grood.csv
    └── preprocessed/
        ├── bulk_preprocessed.h5ad
        └── sc_preprocessed.h5ad
```
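The results folder name follows the `<bulkBase>_<scBase>_<truthBase|notruth>` pattern, which can be sketched as below. The sketch assumes that a "basename" is the file name with its final extension removed, which matches the `Bulk_SingleCell_GroundTruth` example but is not confirmed by the pipeline code. `result_tag` is a hypothetical helper.

```python
from pathlib import Path


def result_tag(bulk, sc_ref, truth=None):
    """Build `<bulkBase>_<scBase>_<truthBase|notruth>` from input paths."""
    # Assumption: the base is the file name without its last extension.
    truth_base = Path(truth).stem if truth else "notruth"
    return f"{Path(bulk).stem}_{Path(sc_ref).stem}_{truth_base}"


print(result_tag("/data/Bulk.h5ad", "/data/SingleCell.h5ad", "/data/GroundTruth.tsv"))
# Bulk_SingleCell_GroundTruth
print(result_tag("/data/Bulk.h5ad", "/data/SingleCell.h5ad"))
# Bulk_SingleCell_notruth
```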
| Method | License | Citation |
|---|---|---|
| BayesPrism | free (GPL 3.0) | Chu, T., Wang, Z., Pe’er, D. et al. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat Cancer 3, 505–517 (2022). https://doi.org/10.1038/s43018-022-00356-3 |
| DWLS | free (GPL) | Tsoucas, D., Dong, R., Chen, H., Zhu, Q., Guo, G., & Yuan, G.-C. (2019). Accurate estimation of cell-type composition from gene expression data. Nature Communications, 10(1), 2975. https://doi.org/10.1038/s41467-019-10802-z |
| GrooD | TBD | TBD |
| MuSiC | free (GPL 3.0) | Wang, X., Park, J., Susztak, K., Zhang, N. R., & Li, M. (2019). Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nature Communications, 10(1), 380. https://doi.org/10.1038/s41467-018-08023-x |
| Scaden | free (MIT) | Menden, K., Marouf, M., Oller, S., Dalmia, A., Kloiber, K., Heutink, P., & Bonn, S. Deep-learning-based cell composition analysis from tissue expression profiles. https://doi.org/10.1101/659227 |
- Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. https://doi.org/10.1038/nbt.3820