This workflow performs fastq quality trimming, quality checks, alignment and counting using STAR.
The workflow is built with Snakemake and is organized according to the best practices that Snakemake recommends, as described at https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html. For a tutorial on how to use Snakemake, see here. The key files are workflow/Snakefile and config/config.yaml, both included in this repo. All programs, with the exception of R, are run in conda environments specified in the workflow. These conda environments can be reused or, optionally, created fresh.
The workflow assumes that you have a resources/fastq directory containing the fastq files. It also assumes that you have a reference genome and gtf file located within the resources/genome directory. Included in the workflow is an indexing step that may or may not be necessary depending on where you point to for your reference genome.
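Before launching the workflow, it can help to confirm that the expected inputs are in place. The following is a minimal sketch, not part of the workflow itself: the function names and the file suffixes it looks for are assumptions made for illustration, so adjust them to match your own file naming. It only checks for fastq files and a gtf file, because the form of the reference genome (sequence file or prebuilt index) depends on your setup.

```python
from pathlib import Path

# Assumed file-name suffixes; these are illustrative, not defined by the workflow.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")
GTF_SUFFIXES = (".gtf", ".gtf.gz")


def find_files(directory, suffixes):
    """Return the files directly inside `directory` whose names end with one of `suffixes`."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )


def check_inputs(repo_root="."):
    """Return a list of problems found; an empty list means the layout looks fine."""
    root = Path(repo_root)
    fastq_dir = root / "resources" / "fastq"
    genome_dir = root / "resources" / "genome"

    problems = []
    if not find_files(fastq_dir, FASTQ_SUFFIXES):
        problems.append(f"no fastq files found in {fastq_dir}")
    if not find_files(genome_dir, GTF_SUFFIXES):
        problems.append(f"no gtf file found in {genome_dir}")
    return problems


if __name__ == "__main__":
    # Run from the top of the repo to check the default layout.
    for problem in check_inputs():
        print("WARNING:", problem)
```

Run from the top level of the repo, the script prints one warning per missing input and prints nothing when both directories contain the expected files.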
The STAR workflow now includes a quality control step that uses Trim Galore to check and trim the fastq files.
Following alignment and counting, the counts are ready for analysis with DESeq2, edgeR, limma, or any other tool of choice.
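Downstream tools generally expect a single gene-by-sample count matrix. The sketch below shows one way to assemble it, assuming the counting step uses STAR's `--quantMode GeneCounts`, which writes one `ReadsPerGene.out.tab` file per sample with four tab-separated columns (gene ID, unstranded counts, read-1-strand counts, read-2-strand counts) preceded by four `N_` summary rows. Whether this workflow produces exactly these files, and where it writes them, is an assumption here; the function names are hypothetical and the paths are supplied by the caller.

```python
import csv

# Summary rows that STAR writes at the top of ReadsPerGene.out.tab.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_counts(path, column=1):
    """Parse one ReadsPerGene.out.tab file into a {gene_id: count} dict.

    column: 1 = unstranded, 2 = read-1 strand, 3 = read-2 strand.
    Pick the column that matches your library's strandedness.
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0] in SUMMARY_ROWS:
                continue  # skip blank lines and the summary rows
            counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(sample_paths, column=1):
    """Combine per-sample counts into a matrix.

    sample_paths: {sample_name: path_to_ReadsPerGene.out.tab}
    Returns (genes, samples, rows), where rows[i][j] is the count of
    genes[i] in samples[j]. Genes missing from a sample are counted as 0.
    """
    per_sample = {
        name: read_star_counts(path, column)
        for name, path in sample_paths.items()
    }
    samples = sorted(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows
```

The returned rows can be written out as a tab-separated file and loaded into R for DESeq2, edgeR, or limma.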
Snakemake recommends using mamba over conda. Mamba is a C++ implementation of conda, and thus will generally run faster. If you have not installed either, we recommend installing Mambaforge.
If you already have conda, mamba is included in the snakemake environment setup below.
To build a Snakemake conda environment, you can use these commands:
conda env create -f workflow/envs/snakemake.yml
conda config --set channel_priority strict
This repo contains R Markdown documents describing how to execute the workflow.
- workflow/notebooks/workflow.md describes how to run the whole workflow
Best practice is to fork this repo for each project that you want to use the pipeline on.