GitHub - ChoBioLab/parallax

A Nextflow implementation of ResolVI and scVIVA for comprehensive spatial transcriptomics analysis.

Introduction

nf-core/parallax is a comprehensive bioinformatics pipeline that integrates ResolVI (Resolution of Variational Inference) and scVIVA (single-cell Variational Inference for Variational Analysis) for advanced spatial transcriptomics data analysis. This dual-method approach provides state-of-the-art noise correction, cell type prediction, and niche-aware differential expression analysis.

Key Technologies

ResolVI: Deep generative model for technical noise correction in spatial transcriptomics data, particularly effective for platforms like 10x Xenium
scVIVA: Deep generative model for spatial transcriptomics that incorporates both cell-intrinsic and neighboring gene expression patterns for niche-aware differential expression analysis

The pipeline takes spatialdata zarr stores (typically generated by the SOPA processing pipeline) as input and performs:

Data preprocessing with flexible annotation label support
ResolVI model training with GPU acceleration for noise correction and cell type prediction
Differential abundance (DA) analysis for spatial niche composition
scVIVA model training for niche-aware analysis using ResolVI predictions
Niche-aware differential expression analysis with spatial context
Comprehensive visualization including spatial plots, UMAP embeddings, and comparative analyses

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible.

Pipeline Summary

The pipeline consists of six main steps organized in two parallel workflows for maximum efficiency:

ResolVI Workflow

RESOLVI_PREPROCESS: Converts spatialdata zarr stores to AnnData format with flexible annotation labels
RESOLVI_TRAIN: Trains ResolVI model, generates corrected counts, and produces cell type predictions
RESOLVI_ANALYZE: Performs differential abundance analysis for spatial niche composition
RESOLVI_VISUALIZE: Creates comprehensive visualizations (runs in parallel with scVIVA)

scVIVA Workflow (runs in parallel after ResolVI training)

SCVIVA_TRAIN: Trains scVIVA model using ResolVI cell type predictions for niche-aware analysis
SCVIVA_ANALYZE: Performs niche-aware differential expression analysis with spatial context

This parallel architecture maximizes computational efficiency while providing complementary analyses:

ResolVI: Focuses on noise correction and cell type prediction + spatial abundance analysis
scVIVA: Leverages ResolVI predictions for advanced niche-aware differential expression
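
The dependency structure described above can be sketched as a small graph. This is an illustration only: Nextflow schedules processes by dataflow rather than in fixed waves, and the dependency of RESOLVI_VISUALIZE on RESOLVI_ANALYZE is an assumption, since the text only states that visualization runs in parallel with scVIVA.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it waits for.
# ASSUMPTION: RESOLVI_VISUALIZE consumes the output of RESOLVI_ANALYZE.
STAGE_DEPENDENCIES = {
    "RESOLVI_PREPROCESS": set(),
    "RESOLVI_TRAIN": {"RESOLVI_PREPROCESS"},
    "RESOLVI_ANALYZE": {"RESOLVI_TRAIN"},
    "RESOLVI_VISUALIZE": {"RESOLVI_ANALYZE"},
    "SCVIVA_TRAIN": {"RESOLVI_TRAIN"},
    "SCVIVA_ANALYZE": {"SCVIVA_TRAIN"},
}


def execution_waves(dependencies):
    """Group stages into waves; stages in the same wave could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(STAGE_DEPENDENCIES), start=1):
        print(number, ", ".join(wave))
```

Under these assumptions the two workflows share the first two stages and then proceed side by side, which is the point at which the scVIVA branch starts "after ResolVI training".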

Quick Start

Install Nextflow (>=23.04.0)
Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please use it within pipelines only as a last resort (see docs).
Download the pipeline and test it on a minimal dataset with a single command:
```
nextflow run nf-core/parallax -profile test,docker --outdir <OUTDIR>
```
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (test,docker in the example command above). You can chain multiple config profiles in a comma-separated string.
- The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
- Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute.
- If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Set the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
- If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.

Start running your own analysis!

nextflow run nf-core/parallax --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>

Input Requirements

Samplesheet

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the --input parameter to specify its location. It has to be a comma-separated file with 2 columns and a header row, as shown in the example below.

sample_id,zarr_path
sample1,/path/to/sample1_spatialdata.zarr
sample2,/path/to/sample2_spatialdata.zarr
sample3,/path/to/sample3_spatialdata.zarr

| Column | Description |
| --- | --- |
| `sample_id` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `zarr_path` | Full path to spatialdata zarr store file. File should be generated by SOPA processing pipeline or compatible spatial transcriptomics preprocessing tool. |

An example samplesheet has been provided with the pipeline.
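
Before launching a run, it can help to check a samplesheet against the format described above. The following is a minimal sketch, not the pipeline's own validation code: `read_samplesheet` is a hypothetical helper, and the strict header check (exactly `sample_id,zarr_path` in that order) is an assumption.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample_id", "zarr_path"]


def read_samplesheet(handle):
    """Return (sample_id, zarr_path) pairs from an open samplesheet file."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(
            f"expected header {REQUIRED_COLUMNS}, got {reader.fieldnames}"
        )
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        # Mirror the documented behaviour: spaces become underscores.
        sample_id = (row["sample_id"] or "").strip().replace(" ", "_")
        zarr_path = (row["zarr_path"] or "").strip()
        if not sample_id or not zarr_path:
            raise ValueError(f"line {line_number}: empty sample_id or zarr_path")
        rows.append((sample_id, zarr_path))
    return rows


if __name__ == "__main__":
    example = io.StringIO(
        "sample_id,zarr_path\n"
        "sample 1,/path/to/sample1_spatialdata.zarr\n"
    )
    print(read_samplesheet(example))
```

The sketch only checks the shape of the file; it does not verify that each zarr store exists or is readable.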

Marker Genes

You can optionally provide a list of marker genes for focused analysis and visualization. This can be done in three ways:

As a comma-separated string (recommended):

--marker_genes "EPCAM,CD3D,CD68,COL1A1,PECAM1"

As a text file (one gene per line):

--marker_genes /path/to/marker_genes.txt

Example marker_genes.txt format:

EPCAM
CD3D
CD68
COL1A1
PECAM1

As a comma-separated list in your config file:

params {
    marker_genes = ['EPCAM', 'CD3D', 'CD68', 'COL1A1', 'PECAM1']
}

If no marker genes are specified, the pipeline will analyze all genes present in the dataset.
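
The three input forms above can be normalized to a single list of gene names. This is a hedged sketch of one way to do it; `parse_marker_genes` is a hypothetical helper, and treating an empty result as "analyze all genes" is an assumption based on the sentence above.

```python
from pathlib import Path


def _is_existing_file(value):
    """Return True if value names an existing file, False otherwise."""
    try:
        return Path(value).is_file()
    except OSError:
        # A long comma-separated gene string may not be a valid file name.
        return False


def parse_marker_genes(value):
    """Normalize marker genes given as a list, a file path, or a comma string."""
    if value is None:
        return []  # assumption: an empty list means "use all genes"
    if isinstance(value, (list, tuple)):
        genes = value
    elif _is_existing_file(value):
        genes = Path(value).read_text().splitlines()  # one gene per line
    else:
        genes = value.split(",")
    # Drop surrounding whitespace and blank entries.
    return [gene.strip() for gene in genes if gene.strip()]


if __name__ == "__main__":
    print(parse_marker_genes("EPCAM,CD3D,CD68,COL1A1,PECAM1"))
```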

Comparison Specifications

You can provide custom comparison specifications for both differential abundance and niche-aware differential expression analyses:

Differential Abundance Comparisons

--da_comparisons /path/to/da_comparisons.json

JSON format example:

[
  {
    "name": "tcells_vs_bcells",
    "group1": "T cells",
    "group2": "B cells"
  },
  {
    "name": "epithelial_vs_stromal",
    "group1": "Epithelial",
    "group2": "Stromal"
  }
]

scVIVA Niche-aware DE Comparisons

--scviva_comparisons /path/to/scviva_comparisons.json

scVIVA Conditions Setup

The scVIVA workflow supports flexible condition specifications for niche-aware differential expression analysis. Conditions can be established through multiple approaches to enable sophisticated spatial comparisons.

Condition Types

1. Cell Type Comparisons (Simple)

Compare different cell types without specific conditions:

[
  {
    "name": "tcells_vs_bcells_niche",
    "group1": "T cells",
    "group2": "B cells"
  }
]

2. Treatment/Disease Conditions

Compare the same cell type across different treatments or disease states:

[
  {
    "name": "treated_vs_control_tcells",
    "group1": "T cells",
    "group2": "T cells",
    "condition1": "treated",
    "condition2": "control",
    "condition_column": "treatment"
  }
]

3. Spatial Region Conditions

Compare cell types or conditions across different spatial regions:

[
  {
    "name": "core_vs_periphery_macrophages",
    "group1": "Macrophages",
    "group2": "Macrophages",
    "condition1": "tumor_core",
    "condition2": "tumor_periphery",
    "condition_column": "spatial_region"
  }
]

4. Sample-Based Conditions

Map specific samples to conditions when metadata isn't directly encoded:

[
  {
    "name": "tumor_vs_normal_epithelial",
    "group1": "Epithelial",
    "group2": "Epithelial",
    "condition1": "tumor",
    "condition2": "normal",
    "condition_column": "tissue_type",
    "samples_condition1": ["tumor_sample1", "tumor_sample2", "tumor_sample3"],
    "samples_condition2": ["normal_sample1", "normal_sample2", "normal_sample3"]
  }
]

5. Cross-Condition Cell Type Comparisons

Compare different cell types across different conditions:

[
  {
    "name": "tumor_tcells_vs_normal_bcells",
    "group1": "T cells",
    "group2": "B cells",
    "condition1": "tumor",
    "condition2": "normal",
    "condition_column": "tissue_type"
  }
]

Condition Establishment Methods

The pipeline automatically detects and establishes conditions using the following priority order:

1. Explicit condition column: Uses the column specified in condition_column
2. Sample mapping: Creates conditions from samples_condition1 and samples_condition2 lists
3. Spatial regions: Uses spatial_region column if available
4. Common metadata columns: Automatically detects treatment, tissue_type, etc.
5. Sample ID fallback: Uses sample_id as conditions if no other method works
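
The priority order above can be sketched as a simple fall-through. This is an illustration under stated assumptions, not the pipeline's implementation: `choose_condition_method` and the method labels are hypothetical, the "etc." in the list is limited here to the two columns the text names, and an explicit `condition_column` is assumed to count only when that column is present in the data.

```python
# ASSUMPTION: only the columns named in the text; "etc." is left open.
COMMON_COLUMNS = ("treatment", "tissue_type")


def choose_condition_method(comparison, columns):
    """Return (method, column) for one comparison, given available metadata columns."""
    column = comparison.get("condition_column")
    if column and column in columns:
        return ("explicit_column", column)
    if comparison.get("samples_condition1") and comparison.get("samples_condition2"):
        return ("sample_mapping", "sample_id")
    if "spatial_region" in columns:
        return ("spatial_region", "spatial_region")
    for name in COMMON_COLUMNS:
        if name in columns:
            return ("common_column", name)
    if "sample_id" in columns:
        return ("sample_id_fallback", "sample_id")
    raise ValueError("Cannot establish conditions")


if __name__ == "__main__":
    comparison = {
        "name": "treated_vs_control_tcells",
        "condition_column": "treatment",
    }
    print(choose_condition_method(comparison, {"sample_id", "treatment"}))
```

A comparison that names a `condition_column` absent from the data but supplies both sample lists falls through to sample mapping, which matches the sample-based example earlier in this section.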

Required Data Preparation

Ensure your AnnData objects contain the necessary metadata:

Essential Columns

Cell type predictions: resolvi_predicted (automatically generated by ResolVI)
Sample information: sample_id (automatically added during preprocessing)

Optional Condition Columns

condition: General condition labels
treatment: Treatment/control labels
tissue_type: Tissue or disease state labels
spatial_region: Spatial region annotations
timepoint: Temporal conditions
Custom condition columns as specified in your comparisons

File Formats

JSON Format (Recommended)

[
  {
    "name": "comparison_name",
    "group1": "Cell_Type_1",
    "group2": "Cell_Type_2",
    "condition1": "condition_A",
    "condition2": "condition_B",
    "condition_column": "metadata_column",
    "samples_condition1": ["sample1", "sample2"],
    "samples_condition2": ["sample3", "sample4"]
  }
]
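
A comparison file in this JSON format can be checked before a run with a few lines of Python. This is a sketch, not the pipeline's validator: `load_comparisons` is a hypothetical helper, and the rules it enforces (`name`, `group1` and `group2` required; `condition1` and `condition2` given together; no other keys) are assumptions inferred from the examples in this section.

```python
import json

REQUIRED_KEYS = {"name", "group1", "group2"}
OPTIONAL_KEYS = {
    "condition1",
    "condition2",
    "condition_column",
    "samples_condition1",
    "samples_condition2",
}


def load_comparisons(text):
    """Parse a JSON comparison list and check each entry's keys."""
    comparisons = json.loads(text)
    if not isinstance(comparisons, list):
        raise ValueError("expected a JSON list of comparison objects")
    for index, entry in enumerate(comparisons):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index}: expected a JSON object")
        keys = set(entry)
        missing = REQUIRED_KEYS - keys
        unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
        if missing:
            raise ValueError(f"entry {index}: missing keys {sorted(missing)}")
        if unknown:
            raise ValueError(f"entry {index}: unknown keys {sorted(unknown)}")
        # ASSUMPTION: the two condition values only make sense as a pair.
        if ("condition1" in keys) != ("condition2" in keys):
            raise ValueError(f"entry {index}: condition1 and condition2 must be paired")
    return comparisons


if __name__ == "__main__":
    text = '[{"name": "tcells_vs_bcells", "group1": "T cells", "group2": "B cells"}]'
    print(load_comparisons(text))
```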

CSV Format

name,group1,group2,condition1,condition2,condition_column
tcells_vs_bcells,T cells,B cells,,,
treated_vs_control_tcells,T cells,T cells,treated,control,treatment
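
The CSV form carries the same fields as the JSON form, with empty cells standing in for omitted keys. A minimal sketch of that correspondence follows; `csv_to_comparisons` is a hypothetical helper, and dropping empty cells is an assumption about how the two formats relate.

```python
import csv
import io
import json


def csv_to_comparisons(handle):
    """Convert comparison rows from CSV into JSON-style dictionaries."""
    comparisons = []
    for row in csv.DictReader(handle):
        # Keep only named columns with a non-empty value.
        comparisons.append({key: value for key, value in row.items() if key and value})
    return comparisons


if __name__ == "__main__":
    example = io.StringIO(
        "name,group1,group2,condition1,condition2,condition_column\n"
        "tcells_vs_bcells,T cells,B cells,,,\n"
        "treated_vs_control_tcells,T cells,T cells,treated,control,treatment\n"
    )
    print(json.dumps(csv_to_comparisons(example), indent=2))
```

Note that the CSV header shown above has no columns for `samples_condition1` or `samples_condition2`, so sample-based conditions appear to need the JSON format.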

Usage Examples

Basic cell type comparison:

nextflow run nf-core/parallax \
  --input samplesheet.csv \
  --scviva_comparisons simple_comparisons.json \
  --outdir results

Complex condition-based analysis:

nextflow run nf-core/parallax \
  --input samplesheet.csv \
  --scviva_comparisons complex_comparisons.json \
  --annotation_label cell_type \
  --outdir results

Best Practices

Minimum cell numbers: Ensure each comparison group has ≥10 cells for reliable DE analysis
Balanced comparisons: Try to have reasonably balanced group sizes when possible
Clear naming: Use descriptive comparison names for easier interpretation
Condition validation: Verify that your condition columns exist in the data
Sample mapping: When using sample mapping, ensure sample IDs match exactly
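
The minimum cell number guideline can be checked from a plain list of per-cell labels, for example the values of the cell type column. This is a sketch under that assumption; `small_groups` is a hypothetical helper and works on any sequence of labels rather than on an AnnData object directly.

```python
from collections import Counter

MIN_CELLS = 10  # threshold from the best-practice guideline above


def small_groups(cell_labels, groups, min_cells=MIN_CELLS):
    """Return {group: count} for comparison groups with fewer than min_cells cells."""
    counts = Counter(cell_labels)
    # Counter returns 0 for labels that never occur, so absent groups are flagged too.
    return {group: counts[group] for group in groups if counts[group] < min_cells}


if __name__ == "__main__":
    labels = ["T cells"] * 12 + ["B cells"] * 4
    print(small_groups(labels, ["T cells", "B cells", "Macrophages"]))
```

Groups reported by this check are candidates for the "Insufficient cells" situation described under Troubleshooting.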

Troubleshooting

Common Issues:

"Cannot establish conditions": Check that specified condition columns exist in your data
"Insufficient cells": Reduce the number of comparisons or combine similar conditions
"Sample not found": Verify sample IDs in samples_condition1/2 match your data exactly

Debug Information:

The pipeline logs detailed information about:

Available metadata columns
Condition establishment methods used
Cell counts for each comparison group
Success/failure status for each comparison

Key Parameters

Core ResolVI Parameters

--annotation_label: Column name for cell type annotations (default: 'cell_type')
--max_epochs: Maximum number of training epochs for ResolVI model (default: 100)
--num_samples: Number of posterior samples for uncertainty quantification (default: 20)
--num_gpus: Number of GPUs to use for training (default: auto-detect, 0 for CPU-only, -1 for all available)
--da_comparisons: JSON or CSV file specifying differential abundance comparisons

scVIVA Parameters

--scviva_max_epochs: Maximum number of training epochs for scVIVA model (default: 100)
--scviva_comparisons: JSON or CSV file specifying niche-aware DE comparisons

Resource Parameters

--max_memory: Maximum memory allocation (default: '128.GB')
--max_cpus: Maximum CPU cores (default: 16)
--max_time: Maximum runtime per job (default: '240.h')

Output

The pipeline generates comprehensive output organized in the following directory structure:

results/
├── preprocessing/              # Preprocessed AnnData files
├── resolvi_training/          # Trained ResolVI models and processed data
├── resolvi_analysis/          # ResolVI analyzed data and DA results
├── resolvi_visualization/     # ResolVI plots and visualizations
├── scviva_training/           # Trained scVIVA models and processed data
├── scviva_analysis/           # scVIVA niche-aware DE results
├── differential_abundance/    # Spatial niche composition analysis
├── niche_differential_expression/ # Niche-aware DE analysis results
├── plots/                     # Combined visualization outputs
├── multiqc/                   # MultiQC report
└── pipeline_info/             # Pipeline execution information

Key Output Files

ResolVI Outputs

Trained Models: resolvi_training/resolvi_model/ - Saved ResolVI models
Corrected Data: resolvi_analysis/*_analyzed.h5ad - AnnData with corrected counts and predictions
DA Results: differential_abundance/da_*.csv - Differential abundance analysis results
Spatial Plots: plots/resolvi_spatial_*.png - Spatial distribution plots

scVIVA Outputs

Trained Models: scviva_training/scviva_model/ - Saved scVIVA models
Niche-aware Data: scviva_analysis/*_analyzed.h5ad - AnnData with niche-aware analysis
DE Results: niche_differential_expression/de_*.csv - Niche-aware DE analysis results
Niche Plots: plots/scviva_niche_*.png - Niche-aware visualization plots

Comparative Visualizations

UMAP Plots: plots/umap_*.png - UMAP embeddings showing predictions vs. annotations
Gene Expression: plots/gene_*_comparison.png - Before/after correction comparisons
Method Comparison: plots/resolvi_vs_scviva_*.png - Comparative analysis plots

GPU Support

The pipeline supports GPU acceleration for both ResolVI and scVIVA model training, which can significantly reduce training time:

Auto-detection: Leave --num_gpus unspecified for automatic GPU detection
Specific GPU count: Use --num_gpus N to use N GPUs
All available GPUs: Use --num_gpus -1
CPU-only: Use --num_gpus 0

Both ResolVI and scVIVA training processes use the same GPU configuration for consistency and optimal resource utilization.

Ensure your execution environment has appropriate GPU drivers and CUDA support when using GPU acceleration.
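
The four --num_gpus settings listed above can be read as a mapping from the requested value to a device count. The sketch below illustrates that reading only: `resolve_gpu_count` is a hypothetical helper, the number of available GPUs is passed in rather than detected, and both "auto-detect uses every available GPU" and "requesting more GPUs than available is an error" are assumptions.

```python
def resolve_gpu_count(num_gpus, available):
    """Map a --num_gpus style value to the number of GPUs to use.

    num_gpus: None (unspecified), -1 (all), 0 (CPU-only) or a positive count.
    available: number of GPUs visible on the machine.
    """
    if available < 0:
        raise ValueError("available must be non-negative")
    if num_gpus is None or num_gpus == -1:
        # ASSUMPTION: auto-detection and -1 both resolve to all visible GPUs.
        return available
    if num_gpus < -1:
        raise ValueError("num_gpus must be -1, 0 or a positive integer")
    if num_gpus > available:
        # ASSUMPTION: fail loudly instead of silently using fewer GPUs.
        raise ValueError(f"requested {num_gpus} GPUs but only {available} available")
    return num_gpus


if __name__ == "__main__":
    for requested in (None, -1, 0, 2):
        print(requested, resolve_gpu_count(requested, available=4))
```

A result of 0 corresponds to CPU-only training.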

Pipeline Features

Advanced Spatial Analysis

Dual-method approach: Combines ResolVI's noise correction with scVIVA's niche-aware analysis
Flexible annotation labels: Support for any cell type annotation column name
Parallel processing: Maximum computational efficiency with parallel workflows
Custom comparisons: User-defined comparison specifications for both DA and DE analyses

Comprehensive Output

Noise decomposition: Separates true signal, diffusion, and background components
Cell type prediction: Semi-supervised learning with uncertainty quantification
Spatial niche analysis: Differential abundance of cell types in spatial contexts
Niche-aware DE: Differential expression analysis considering spatial neighborhoods
Rich visualizations: Spatial plots, UMAP embeddings, and comparative analyses

Documentation

The nf-core/parallax pipeline comes with documentation about the pipeline:

Installation
Pipeline configuration
- Local installation
- Adding your own system config
Running the pipeline
Output and how to interpret the results
Troubleshooting

Methods Overview

ResolVI (Resolution of Variational Inference)

ResolVI is a deep generative model specifically designed for spatial transcriptomics data that:

Corrects technical noise while preserving biological signal
Decomposes noise sources into true signal, diffusion, and background components
Predicts cell types using semi-supervised learning
Maintains spatial context for downstream spatial analysis

scVIVA (single-cell Variational Inference for Variational Analysis)

scVIVA is a deep generative model for spatial transcriptomics that:

Incorporates neighborhood information for niche-aware analysis
Performs differential expression considering spatial context
Leverages cell-intrinsic and neighboring gene expression patterns
Provides niche-specific insights beyond traditional DE methods

Integrated Workflow Benefits

The combination of ResolVI and scVIVA provides:

Comprehensive noise correction (ResolVI) followed by niche-aware analysis (scVIVA)
Cell type predictions from ResolVI inform spatial neighborhood analysis in scVIVA
Complementary analyses: Differential abundance (ResolVI) and niche-aware DE (scVIVA)
Maximum efficiency: Parallel processing after initial ResolVI training

Credits

nf-core/parallax was originally written by Christopher Tastad.

We thank the following people for their extensive assistance in the development of this pipeline:

The ResolVI development team for the original method implementation
The scVIVA development team for the niche-aware analysis method
The nf-core community for framework and best practices
The SOPA development team for spatialdata preprocessing tools

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #parallax channel (you can join with this invite).

Citations

If you use nf-core/parallax for your analysis, please cite:

Pipeline

ResolVI Method

ResolVI: Resolution of Variational Inference for spatial transcriptomics

doi: 10.1101/2025.01.20.634005

Nature Biotechnology 2022 Feb 07. doi: 10.1038/s41587-021-01206-w

scVIVA Method

scVIVA: single-cell Variational Inference for Variational Analysis of spatial transcriptomics

doi: 10.1101/2025.06.01.657182

Nature Biotechnology 2022 Feb 07. doi: 10.1038/s41587-021-01206-w

Squidpy

Squidpy: a scalable framework for spatial omics analysis

Nature Methods 2022 Feb DOI: doi.org/10.1038/s41592-021-01358-2

nf-core Framework

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
