A Nextflow implementation of ResolVI and scVIVA for comprehensive spatial transcriptomics analysis.
nf-core/parallax is a comprehensive bioinformatics pipeline that integrates ResolVI (Resolution of Variational Inference) and scVIVA (single-cell Variational Inference for Variational Analysis) for advanced spatial transcriptomics data analysis. This dual-method approach provides state-of-the-art noise correction, cell type prediction, and niche-aware differential expression analysis.
- ResolVI: Deep generative model for technical noise correction in spatial transcriptomics data, particularly effective for platforms like 10x Xenium
- scVIVA: Deep generative model for spatial transcriptomics that incorporates both cell-intrinsic and neighboring gene expression patterns for niche-aware differential expression analysis
The pipeline takes spatialdata zarr stores (typically generated by the SOPA processing pipeline) as input and performs:
- Data preprocessing with flexible annotation label support
- ResolVI model training with GPU acceleration for noise correction and cell type prediction
- Differential abundance (DA) analysis for spatial niche composition
- scVIVA model training for niche-aware analysis using ResolVI predictions
- Niche-aware differential expression analysis with spatial context
- Comprehensive visualization including spatial plots, UMAP embeddings, and comparative analyses
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible.
The pipeline consists of six main steps organized in two parallel workflows for maximum efficiency:
- RESOLVI_PREPROCESS: Converts spatialdata zarr stores to AnnData format with flexible annotation labels
- RESOLVI_TRAIN: Trains ResolVI model, generates corrected counts, and produces cell type predictions
- RESOLVI_ANALYZE: Performs differential abundance analysis for spatial niche composition
- RESOLVI_VISUALIZE: Creates comprehensive visualizations (runs in parallel with scVIVA)
- SCVIVA_TRAIN: Trains scVIVA model using ResolVI cell type predictions for niche-aware analysis
- SCVIVA_ANALYZE: Performs niche-aware differential expression analysis with spatial context
This parallel architecture maximizes computational efficiency while providing complementary analyses:
- ResolVI: Focuses on noise correction and cell type prediction + spatial abundance analysis
- scVIVA: Leverages ResolVI predictions for advanced niche-aware differential expression
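As a rough illustration of how the six steps could overlap, the sketch below groups them into "waves" of steps that can run at the same time. The dependency edges are an assumption inferred from the step descriptions above (for example, that RESOLVI_VISUALIZE waits for RESOLVI_ANALYZE); they are not read from the pipeline code.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each step maps to the set of steps it waits for.
# These edges are inferred from the step descriptions, not from the pipeline source.
DEPENDENCIES = {
    "RESOLVI_PREPROCESS": set(),
    "RESOLVI_TRAIN": {"RESOLVI_PREPROCESS"},
    "RESOLVI_ANALYZE": {"RESOLVI_TRAIN"},
    "RESOLVI_VISUALIZE": {"RESOLVI_ANALYZE"},
    "SCVIVA_TRAIN": {"RESOLVI_TRAIN"},
    "SCVIVA_ANALYZE": {"SCVIVA_TRAIN"},
}


def execution_waves(dependencies):
    """Group steps into waves; steps in the same wave have no pending dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Under these assumed edges, the ResolVI and scVIVA branches only diverge after RESOLVI_TRAIN has finished, which matches the description of scVIVA using ResolVI predictions.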
- Install Nextflow (>=23.04.0).
- Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please only use it within pipelines as a last resort; see docs.
- Download the pipeline and test it on a minimal dataset with a single command:
  nextflow run nf-core/parallax -profile test,docker --outdir <OUTDIR>
  Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`docker` in the example command above). You can chain multiple config profiles in a comma-separated string.
  - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
  - Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute.
  - If you are using `singularity`, please use the `nf-core download` command to download images first, before running the pipeline. Set the `NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir` Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
  - If you are using `conda`, it is highly recommended to use the `NXF_CONDA_CACHEDIR` or `conda.cacheDir` settings to store the environments in a central location for future pipeline runs.
- Start running your own analysis!
  nextflow run nf-core/parallax --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It has to be a comma-separated file with 2 columns and a header row, as shown in the example below.
sample_id,zarr_path
sample1,/path/to/sample1_spatialdata.zarr
sample2,/path/to/sample2_spatialdata.zarr
sample3,/path/to/sample3_spatialdata.zarr

| Column | Description |
|---|---|
| `sample_id` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (_). |
| `zarr_path` | Full path to the spatialdata zarr store. The store should be generated by the SOPA processing pipeline or a compatible spatial transcriptomics preprocessing tool. |
An example samplesheet has been provided with the pipeline.
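Before launching a run, it can help to check a samplesheet against the two-column layout described above. The following is a minimal sketch using only the Python standard library; `validate_samplesheet` is a hypothetical helper and not part of the pipeline, which may validate its input differently.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample_id", "zarr_path"]


def validate_samplesheet(text):
    """Return a list of (sample_id, zarr_path) tuples, or raise ValueError."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(f"expected header {REQUIRED_COLUMNS}, got {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if None in row:  # csv.DictReader stores surplus fields under the key None
            raise ValueError(f"line {line_no}: too many columns")
        # Mirror the documented behaviour: spaces in sample names become underscores.
        sample_id = (row["sample_id"] or "").strip().replace(" ", "_")
        zarr_path = (row["zarr_path"] or "").strip()
        if not sample_id or not zarr_path:
            raise ValueError(f"line {line_no}: both columns must be non-empty")
        rows.append((sample_id, zarr_path))
    return rows


example = "sample_id,zarr_path\nsample 1,/path/to/sample1_spatialdata.zarr\n"
print(validate_samplesheet(example))
```

The sketch deliberately allows repeated `sample_id` values, since the table above states that one sample may appear on several rows.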
You can optionally provide a list of marker genes for focused analysis and visualization. This can be done in three ways:
- As a comma-separated string (recommended):
  --marker_genes "EPCAM,CD3D,CD68,COL1A1,PECAM1"
- As a text file (one gene per line):
  --marker_genes /path/to/marker_genes.txt
  Example marker_genes.txt format:
  EPCAM
  CD3D
  CD68
  COL1A1
  PECAM1
- As a list in your config file:
  params { marker_genes = ['EPCAM', 'CD3D', 'CD68', 'COL1A1', 'PECAM1'] }
If no marker genes are specified, the pipeline will analyze all genes present in the dataset.
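The three input forms above can be normalized into one gene list. The sketch below shows one way this might be done; `parse_marker_genes` is a hypothetical helper, and treating "no markers" as `None` (meaning "use all genes") is an assumption about how the downstream code would interpret it.

```python
from pathlib import Path


def parse_marker_genes(value):
    """Return a de-duplicated list of gene names, or None when no markers were given."""
    if value is None or value == "":
        return None
    if isinstance(value, (list, tuple)):
        genes = [str(gene) for gene in value]        # list from a config file
    elif Path(value).is_file():
        genes = Path(value).read_text().splitlines()  # text file, one gene per line
    else:
        genes = value.split(",")                      # comma-separated string
    cleaned = [gene.strip() for gene in genes if gene.strip()]
    # dict.fromkeys keeps the first occurrence of each gene and preserves order.
    return list(dict.fromkeys(cleaned)) or None


print(parse_marker_genes("EPCAM,CD3D,CD68,COL1A1,PECAM1"))
```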
You can provide custom comparison specifications for both differential abundance and niche-aware differential expression analyses:
--da_comparisons /path/to/da_comparisons.json
JSON format example:
[
{
"name": "tcells_vs_bcells",
"group1": "T cells",
"group2": "B cells"
},
{
"name": "epithelial_vs_stromal",
"group1": "Epithelial",
"group2": "Stromal"
}
]
--scviva_comparisons /path/to/scviva_comparisons.json
The scVIVA workflow supports flexible condition specifications for niche-aware differential expression analysis. Conditions can be established through multiple approaches to enable sophisticated spatial comparisons.
Compare different cell types without specific conditions:
[
{
"name": "tcells_vs_bcells_niche",
"group1": "T cells",
"group2": "B cells"
}
]
Compare the same cell type across different treatments or disease states:
[
{
"name": "treated_vs_control_tcells",
"group1": "T cells",
"group2": "T cells",
"condition1": "treated",
"condition2": "control",
"condition_column": "treatment"
}
]
Compare cell types or conditions across different spatial regions:
[
{
"name": "core_vs_periphery_macrophages",
"group1": "Macrophages",
"group2": "Macrophages",
"condition1": "tumor_core",
"condition2": "tumor_periphery",
"condition_column": "spatial_region"
}
]
Map specific samples to conditions when metadata isn't directly encoded:
[
{
"name": "tumor_vs_normal_epithelial",
"group1": "Epithelial",
"group2": "Epithelial",
"condition1": "tumor",
"condition2": "normal",
"condition_column": "tissue_type",
"samples_condition1": ["tumor_sample1", "tumor_sample2", "tumor_sample3"],
"samples_condition2": ["normal_sample1", "normal_sample2", "normal_sample3"]
}
]
Compare different cell types across different conditions:
[
{
"name": "tumor_tcells_vs_normal_bcells",
"group1": "T cells",
"group2": "B cells",
"condition1": "tumor",
"condition2": "normal",
"condition_column": "tissue_type"
}
]
The pipeline automatically detects and establishes conditions using the following priority order:
- Explicit condition column: Uses the column specified in `condition_column`
- Sample mapping: Creates conditions from the `samples_condition1` and `samples_condition2` lists
- Spatial regions: Uses the `spatial_region` column if available
- Common metadata columns: Automatically detects `treatment`, `tissue_type`, etc.
- Sample ID fallback: Uses `sample_id` as conditions if no other method works
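To make the priority order concrete, here is a small sketch of how such a fallback chain could be written. `choose_condition_source` is a hypothetical helper; the actual pipeline logic may differ, in particular in how it handles a `condition_column` that is named in the comparison but absent from the data (this sketch then falls through to the next rule).

```python
# Columns tried under "common metadata columns"; the list above ends in "etc.",
# so the real set is likely larger than these two.
COMMON_COLUMNS = ("treatment", "tissue_type")


def choose_condition_source(comparison, obs_columns):
    """Return (method, column) for the first applicable rule in the priority list."""
    columns = set(obs_columns)
    explicit = comparison.get("condition_column")
    has_mapping = bool(comparison.get("samples_condition1")) and bool(
        comparison.get("samples_condition2")
    )
    if explicit and explicit in columns:
        return ("explicit_column", explicit)
    if has_mapping:
        # Conditions are derived from sample IDs listed in the comparison.
        return ("sample_mapping", "sample_id")
    if "spatial_region" in columns:
        return ("spatial_region", "spatial_region")
    for name in COMMON_COLUMNS:
        if name in columns:
            return ("common_metadata", name)
    if "sample_id" in columns:
        return ("sample_id_fallback", "sample_id")
    raise ValueError("Cannot establish conditions")


comparison = {"name": "treated_vs_control_tcells", "condition_column": "treatment"}
print(choose_condition_source(comparison, ["sample_id", "treatment"]))
```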
Ensure your AnnData objects contain the necessary metadata:
- Cell type predictions: `resolvi_predicted` (automatically generated by ResolVI)
- Sample information: `sample_id` (automatically added during preprocessing)

Depending on your comparisons, the following condition columns may also be used:
- `condition`: General condition labels
- `treatment`: Treatment/control labels
- `tissue_type`: Tissue or disease state labels
- `spatial_region`: Spatial region annotations
- `timepoint`: Temporal conditions
- Custom condition columns as specified in your comparisons

Comparison template with all supported fields (JSON):
[
{
"name": "comparison_name",
"group1": "Cell_Type_1",
"group2": "Cell_Type_2",
"condition1": "condition_A",
"condition2": "condition_B",
"condition_column": "metadata_column",
"samples_condition1": ["sample1", "sample2"],
"samples_condition2": ["sample3", "sample4"]
}
]
CSV format example:
name,group1,group2,condition1,condition2,condition_column
tcells_vs_bcells,T cells,B cells,,,
treated_vs_control_tcells,T cells,T cells,treated,control,treatment
Example run with simple comparisons:
nextflow run nf-core/parallax \
  --input samplesheet.csv \
  --scviva_comparisons simple_comparisons.json \
  --outdir results
Example run with complex comparisons and a custom annotation label:
nextflow run nf-core/parallax \
  --input samplesheet.csv \
  --scviva_comparisons complex_comparisons.json \
  --annotation_label cell_type \
  --outdir results
When defining comparisons, keep the following in mind:
- Minimum cell numbers: Ensure each comparison group has ≥10 cells for reliable DE analysis
- Balanced comparisons: Try to have reasonably balanced group sizes when possible
- Clear naming: Use descriptive comparison names for easier interpretation
- Condition validation: Verify that your condition columns exist in the data
- Sample mapping: When using sample mapping, ensure sample IDs match exactly
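The minimum-cell recommendation can be checked ahead of time by counting the cells on each side of a comparison. The sketch below assumes cells are available as plain `(cell_type, condition)` pairs, for instance exported from the AnnData `obs` table; both function names are hypothetical.

```python
MIN_CELLS = 10  # threshold taken from the "Minimum cell numbers" recommendation


def comparison_group_sizes(cells, comparison):
    """Count cells on each side of a comparison.

    cells: iterable of (cell_type, condition) pairs.
    A side without a condition (condition1/condition2 missing) matches any condition.
    """
    sides = (
        (comparison["group1"], comparison.get("condition1")),
        (comparison["group2"], comparison.get("condition2")),
    )
    sizes = [0, 0]
    for cell_type, condition in cells:
        for index, (group, wanted) in enumerate(sides):
            if cell_type == group and (wanted is None or condition == wanted):
                sizes[index] += 1
    return tuple(sizes)


def has_enough_cells(sizes, min_cells=MIN_CELLS):
    """True when every side of the comparison reaches the minimum cell count."""
    return all(size >= min_cells for size in sizes)


cells = [("T cells", "treated")] * 12 + [("T cells", "control")] * 4
comparison = {
    "name": "treated_vs_control_tcells",
    "group1": "T cells",
    "group2": "T cells",
    "condition1": "treated",
    "condition2": "control",
}
sizes = comparison_group_sizes(cells, comparison)
print(sizes, has_enough_cells(sizes))
```

The returned sizes also give a quick view of how balanced the two groups are.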
Common error messages and how to resolve them:
- "Cannot establish conditions": Check that specified condition columns exist in your data
- "Insufficient cells": Reduce the number of comparisons or combine similar conditions
- "Sample not found": Verify that the sample IDs in `samples_condition1` and `samples_condition2` match your data exactly
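Two of these errors (a missing condition column and unknown sample IDs) can often be caught before a run starts. The sketch below is a hypothetical pre-flight check, not the pipeline's own validation; treating `name`, `group1` and `group2` as required is an assumption based on every example comparison including them.

```python
def preflight_errors(comparison, obs_columns, sample_ids):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    errors = []
    for key in ("name", "group1", "group2"):
        if not comparison.get(key):
            errors.append(f"missing required field: {key}")
    mapped = list(comparison.get("samples_condition1", [])) + list(
        comparison.get("samples_condition2", [])
    )
    column = comparison.get("condition_column")
    # A missing column is tolerated when a sample mapping can supply the conditions.
    if column and column not in obs_columns and not mapped:
        errors.append(f"condition column not in data: {column}")
    known = set(sample_ids)
    for sample in mapped:
        if sample not in known:  # exact, case-sensitive match
            errors.append(f"sample not found: {sample}")
    return errors


comparison = {
    "name": "tumor_vs_normal_epithelial",
    "group1": "Epithelial",
    "group2": "Epithelial",
    "condition_column": "tissue_type",
    "samples_condition1": ["tumor_sample1"],
    "samples_condition2": ["normal_sample1"],
}
print(preflight_errors(comparison, ["sample_id"], ["tumor_sample1", "Normal_sample1"]))
```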
The pipeline logs detailed information about:
- Available metadata columns
- Condition establishment methods used
- Cell counts for each comparison group
- Success/failure status for each comparison
ResolVI options:
- `--annotation_label`: Column name for cell type annotations (default: 'cell_type')
- `--max_epochs`: Maximum number of training epochs for ResolVI model (default: 100)
- `--num_samples`: Number of posterior samples for uncertainty quantification (default: 20)
- `--num_gpus`: Number of GPUs to use for training (default: auto-detect, 0 for CPU-only, -1 for all available)
- `--da_comparisons`: JSON or CSV file specifying differential abundance comparisons

scVIVA options:
- `--scviva_max_epochs`: Maximum number of training epochs for scVIVA model (default: 100)
- `--scviva_comparisons`: JSON or CSV file specifying niche-aware DE comparisons

Resource limits:
- `--max_memory`: Maximum memory allocation (default: '128.GB')
- `--max_cpus`: Maximum CPU cores (default: 16)
- `--max_time`: Maximum runtime per job (default: '240.h')
The pipeline generates comprehensive output organized in the following directory structure:
results/
├── preprocessing/ # Preprocessed AnnData files
├── resolvi_training/ # Trained ResolVI models and processed data
├── resolvi_analysis/ # ResolVI analyzed data and DA results
├── resolvi_visualization/ # ResolVI plots and visualizations
├── scviva_training/ # Trained scVIVA models and processed data
├── scviva_analysis/ # scVIVA niche-aware DE results
├── differential_abundance/ # Spatial niche composition analysis
├── niche_differential_expression/ # Niche-aware DE analysis results
├── plots/ # Combined visualization outputs
├── multiqc/ # MultiQC report
└── pipeline_info/ # Pipeline execution information
ResolVI outputs:
- Trained Models: `resolvi_training/resolvi_model/` - Saved ResolVI models
- Corrected Data: `resolvi_analysis/*_analyzed.h5ad` - AnnData with corrected counts and predictions
- DA Results: `differential_abundance/da_*.csv` - Differential abundance analysis results
- Spatial Plots: `plots/resolvi_spatial_*.png` - Spatial distribution plots

scVIVA outputs:
- Trained Models: `scviva_training/scviva_model/` - Saved scVIVA models
- Niche-aware Data: `scviva_analysis/*_analyzed.h5ad` - AnnData with niche-aware analysis
- DE Results: `niche_differential_expression/de_*.csv` - Niche-aware DE analysis results
- Niche Plots: `plots/scviva_niche_*.png` - Niche-aware visualization plots

Combined outputs:
- UMAP Plots: `plots/umap_*.png` - UMAP embeddings showing predictions vs. annotations
- Gene Expression: `plots/gene_*_comparison.png` - Before/after correction comparisons
- Method Comparison: `plots/resolvi_vs_scviva_*.png` - Comparative analysis plots
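After a run, the file patterns listed above can be used to gather results programmatically. The following sketch uses `pathlib` globbing; the dictionary keys and the `collect_outputs` helper are illustrative names, and only a subset of the patterns is included.

```python
from pathlib import Path

# Glob patterns copied from the output lists above, relative to --outdir.
OUTPUT_PATTERNS = {
    "resolvi_analyzed": "resolvi_analysis/*_analyzed.h5ad",
    "da_results": "differential_abundance/da_*.csv",
    "scviva_analyzed": "scviva_analysis/*_analyzed.h5ad",
    "de_results": "niche_differential_expression/de_*.csv",
    "umap_plots": "plots/umap_*.png",
}


def collect_outputs(outdir):
    """Map each label to the sorted list of matching paths, relative to outdir."""
    root = Path(outdir)
    return {
        label: sorted(path.relative_to(root).as_posix() for path in root.glob(pattern))
        for label, pattern in OUTPUT_PATTERNS.items()
    }


# Example usage (assumes a finished run in ./results):
# for label, paths in collect_outputs("results").items():
#     print(label, len(paths))
```

Labels whose pattern matches nothing map to an empty list, which makes it easy to spot a step that produced no output.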
The pipeline supports GPU acceleration for both ResolVI and scVIVA model training, which can significantly reduce training time:
- Auto-detection: Leave `--num_gpus` unspecified for automatic GPU detection
- Specific GPU count: Use `--num_gpus N` to use N GPUs
- All available GPUs: Use `--num_gpus -1`
- CPU-only: Use `--num_gpus 0`
Both ResolVI and scVIVA training processes use the same GPU configuration for consistency and optimal resource utilization.
Ensure your execution environment has appropriate GPU drivers and CUDA support when using GPU acceleration.
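The four `--num_gpus` modes can be summarized as a small mapping from the flag value to a device count. This is only a sketch of the documented semantics: `resolve_gpu_count` is a hypothetical helper, and two behaviours are assumptions, namely that auto-detection uses every detected GPU and that requesting more GPUs than are present is an error rather than being clamped.

```python
def resolve_gpu_count(num_gpus, detected):
    """Translate a --num_gpus value into the number of GPUs to use.

    num_gpus: None when the flag is unspecified, otherwise an integer.
    detected: number of GPUs visible on the host.
    """
    if num_gpus is None or num_gpus == -1:
        return detected  # auto-detect (assumed) or all available
    if num_gpus < -1:
        raise ValueError(f"invalid --num_gpus value: {num_gpus}")
    if num_gpus > detected:
        raise ValueError(f"requested {num_gpus} GPUs but only {detected} detected")
    return num_gpus  # 0 means CPU-only


for flag in (None, -1, 0, 2):
    print(flag, "->", resolve_gpu_count(flag, detected=4))
```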
- Dual-method approach: Combines ResolVI's noise correction with scVIVA's niche-aware analysis
- Flexible annotation labels: Support for any cell type annotation column name
- Parallel processing: Maximum computational efficiency with parallel workflows
- Custom comparisons: User-defined comparison specifications for both DA and DE analyses
- Noise decomposition: Separates true signal, diffusion, and background components
- Cell type prediction: Semi-supervised learning with uncertainty quantification
- Spatial niche analysis: Differential abundance of cell types in spatial contexts
- Niche-aware DE: Differential expression analysis considering spatial neighborhoods
- Rich visualizations: Spatial plots, UMAP embeddings, and comparative analyses
The nf-core/parallax pipeline comes with documentation about the pipeline:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
ResolVI is a deep generative model specifically designed for spatial transcriptomics data that:
- Corrects technical noise while preserving biological signal
- Decomposes noise sources into true signal, diffusion, and background components
- Predicts cell types using semi-supervised learning
- Maintains spatial context for downstream spatial analysis
scVIVA is a deep generative model for spatial transcriptomics that:
- Incorporates neighborhood information for niche-aware analysis
- Performs differential expression considering spatial context
- Leverages cell-intrinsic and neighboring gene expression patterns
- Provides niche-specific insights beyond traditional DE methods
The combination of ResolVI and scVIVA provides:
- Comprehensive noise correction (ResolVI) followed by niche-aware analysis (scVIVA)
- Cell type predictions from ResolVI inform spatial neighborhood analysis in scVIVA
- Complementary analyses: Differential abundance (ResolVI) and niche-aware DE (scVIVA)
- Maximum efficiency: Parallel processing after initial ResolVI training
nf-core/parallax was originally written by Christopher Tastad.
We thank the following people for their extensive assistance in the development of this pipeline:
- The ResolVI development team for the original method implementation
- The scVIVA development team for the niche-aware analysis method
- The nf-core community for framework and best practices
- The SOPA development team for spatialdata preprocessing tools
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #parallax channel (you can join with this invite).
If you use nf-core/parallax for your analysis, please cite:
ResolVI: Resolution of Variational Inference for spatial transcriptomics
DOI: doi.org/10.1101/2025.01.20.634005 Nature Biotechnology 2022 Feb 07. doi: 10.1038/s41587-021-01206-w
scVIVA: single-cell Variational Inference for Variational Analysis of spatial transcriptomics
DOI: 10.1101/2025.06.01.657182 Nature Biotechnology 2022 Feb 07. doi: 10.1038/s41587-021-01206-w
Squidpy: a scalable framework for spatial omics analysis
Nature Methods 2022 Feb DOI: doi.org/10.1038/s41592-021-01358-2
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
