ORB (Omics reference based benchmarking)

Introduction

jlab/orb is a bioinformatics pipeline that calculates performance evaluation scores for assembled sets of contigs. With Marbel, researchers can create an in silico dataset that resembles the characteristics of their target environment and then use ORB to test which assembler best suits the analysis of their sample. The pipeline uses minimap2, Bowtie2, Salmon, DESeq2, edgeR, Calour and custom scripts to calculate the scores, and includes benchmarking of orthologous groups and differential expression (DE).

Recommended usage:

  • Create in silico datasets using Marbel and name the output directories NAME_microbiome
  • Assemble the datasets
  • Run the pipeline

Fill in the required parameters for each run by adjusting the example config files example/tool.config, example/dataset.config and example/resources.config.

For each dataset, create a dataset.config with the dataset name. If you want to visualise several datasets together, choose the same parent directory for their outdir parameters, e.g., PATH/group/dataset1, PATH/group/dataset2.

Run each dataset with:

nextflow run . \
   -profile apptainer \
   -c example/dataset.config \
   -c example/resources.config \
   -c example/tool.config
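
When several datasets are evaluated, the command above can be repeated once per dataset config. The following is a minimal sketch of such a loop, not part of the pipeline itself: the config paths configs/dataset1.config and configs/dataset2.config, the helper names orb_command and run_all, and the DRY_RUN switch are all hypothetical. By default it only prints the commands so they can be checked before anything is launched.

```shell
#!/bin/sh
# Sketch: one ORB run per dataset config.
# Helper names, DRY_RUN and the config paths are assumptions for illustration.
set -eu

# Print the nextflow command for a single dataset config ($1).
orb_command() {
    printf 'nextflow run . -profile apptainer -c %s -c example/resources.config -c example/tool.config\n' "$1"
}

# Loop over all dataset configs given as arguments.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
run_all() {
    for cfg in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            orb_command "$cfg"
        else
            nextflow run . \
                -profile apptainer \
                -c "$cfg" \
                -c example/resources.config \
                -c example/tool.config
        fi
    done
}

# Hypothetical per-dataset config files.
run_all configs/dataset1.config configs/dataset2.config
```

Setting DRY_RUN=0 in the environment runs the datasets one after another; the shared resources and tool configs stay the same for every run.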

Afterwards, the runs can be visualised in several ways. A script that creates all ORB figures is provided: plotting/plot_orb_figures.py

For the dependencies you can use:

conda env create -f plotting/orb_plots.yaml

conda activate orb_plots

The script can be started with:

python plotting/plot_orb_figures.py <fp_orb_basedir> <fp_marbel_basedir> <marbel_sequence_file> <file_ending_svg_or_png> <outdir_name> <caviar_log_files>

fp_orb_basedir: Path to the ORB results.

fp_marbel_basedir: Path to the in silico datasets. Dataset folders require the _microbiome suffix.

marbel_sequence_file: Path to the bio index file of the Marbel repository: src/marbel/data/deduplicated_pangenome_EDGAR_Microbiome_JLAB2.fas.bgz.bio_index

settings: style YAML file; for example style settings, see plotting/style.yaml

file_ending_svg_or_png: file ending, supported: svg or png

outdir_name: name of the directory where the plots should be saved

caviar_log_files: directory of the log files if the datasets were assembled with Caviar; otherwise pass an empty string: ""
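
Because the script takes its arguments by position, it can help to assign them to named variables first. The sketch below does that for the six arguments of the usage line; every path is a hypothetical placeholder, fp_orb_basedir is assumed to be the shared parent directory of the ORB outdirs, and plot_command is an illustrative helper that only prints the resulting call. The empty last argument corresponds to datasets that were not assembled with Caviar.

```shell
#!/bin/sh
# Sketch: build the plotting call from named variables.
# All paths and the helper name plot_command are assumptions.
set -eu

fp_orb_basedir="PATH/group"                 # assumed: parent dir of the ORB outdirs
fp_marbel_basedir="PATH/marbel_datasets"    # contains the *_microbiome folders
marbel_sequence_file="PATH/marbel/src/marbel/data/deduplicated_pangenome_EDGAR_Microbiome_JLAB2.fas.bgz.bio_index"
file_ending="svg"                           # supported: svg or png
outdir_name="orb_plots_out"
caviar_log_files=""                         # empty: not assembled with Caviar

# Print the command with each argument quoted, so the empty one stays visible.
plot_command() {
    case "$4" in
        svg|png) ;;
        *) echo "file ending must be svg or png, got: $4" >&2; return 1 ;;
    esac
    printf "python plotting/plot_orb_figures.py '%s' '%s' '%s' '%s' '%s' '%s'\n" \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

plot_command "$fp_orb_basedir" "$fp_marbel_basedir" "$marbel_sequence_file" \
    "$file_ending" "$outdir_name" "$caviar_log_files"
```

The printed line can be pasted into a shell in which the orb_plots environment is active.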

This script will take some time on the first run. If you recreate environments, remove the cache folder data_cache.

Individual plots can also be imported from plot_include_orb and created in a notebook.

Credits

jlab/refbasedassemblereval was originally written by Timo Wentong Lin.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license. Some code from the nf-core community was modified and is included alongside our own code in modules/local.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.