Skip to content

PallaresLab/VariantCalling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Variant Calling pipeline

Description

This snakemake pipeline is designed for Variant Calling

Inputs

  • the bam files
  • output directory
  • reference genome
  • SNP database
  • bed file

Outputs

  • logs\ - Directory of log files for each job, check here first if you run into errors
  • working\ - Directory containing intermediate files for each job

Workflow

  1. **genome_prepare
  2. **BQSR--BaseRecalibrator, ApplyBQSR
  3. **HaplotypeCaller--HaplotypeCaller
  4. **JointCallSNPs--JointCallSNP, GenotypeGVCFs, GatherVcfs
  5. **VQSR--VariantRecalibrator, ApplyVQSR

Setup environment

  1. Install conda

  2. Clone workflow into working directory

    git clone <repo> <dir>
    cd <dir>
  3. Create a new enviroment

    conda env create -n <project> --file environment.yaml
  4. Activate the environment

    conda activate <project_name>
  5. Enable the Bioconda channel

    conda config --add channels bioconda
    conda config --add channels conda-forge
    

    !!!Notice conda default channel will cause SSL error with Max Planck intranet.

  6. Install snakemake

    conda install snakemake
  7. Edit configuration files change the path of fastq_dir, output_dir, reference_genome in "config.yaml"

  8. Create index for your SNP database

    gatk IndexFeatureFile -I dbSNP.vcf.gz
  9. The first time you are executing this snakemake pipeline it should run locally, once the first run is over (you can use --dry), you can switch to running it on the cluster.

    snakemake --configfile "config.yaml" --use-conda  --cores N --dryrun
  10. Execute the workflow (Add the --notemp flag if you want to keep the intermediate files instead of automatically deleting them)

    snakemake --configfile "config.yaml" --use-conda  --cores N

if you need to submit the jobs to SGE cluster to run the pipeline

download snakemake-executor-plugin-cluster-generic by pip
```bash
pip install snakemake-executor-plugin-cluster-generic
```
then
```bash
snakemake --use-conda --jobs {cores}  --executor cluster-generic --cluster-generic-submit-cmd  "qsub -cwd -V -l h_vmem={resources.mem_mb}M  -pe parallel {threads} -o logs/ -e logs/"
```

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages