diff --git a/.github/workflows/github-actions-demo.yml b/.github/workflows/github-actions-demo.yml new file mode 100644 index 0000000..42bf76d --- /dev/null +++ b/.github/workflows/github-actions-demo.yml @@ -0,0 +1,17 @@ +name: test CBP nextflow pipeline +run-name: ${{ github.actor }} is testing the Canadian Biogenome Project pipeline +on: [push] +jobs: + Explore-GitHub-Actions: + runs-on: ubuntu-latest + steps: + - run: echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event." + - run: echo "🐧 This job is now running on a ${{ runner.os }} server hosted by GitHub!" + - run: echo "🔎 The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}." + - name: List files in the repository + run: | + ls ${{ github.workspace }} + - uses: actions/checkout@v3 + - uses: nf-core/setup-nextflow@v1 + - run: nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda -c nextflow_github_test.config + - run: echo "🍏 This job's status is ${{ job.status }}." diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..75a07d6 --- /dev/null +++ b/.gitignore @@ -0,0 +1,18 @@ +/work/* +/work +/assembly/* +/assembly +/blobtools/* +/blobtools +/hic_scaffolding/* +/hic_scaffolding +/preprocessing/* +/preprocessing +/purge_dups/* +/purge_dups +/QC/* +/QC +.n* +.git/ +/V2 +/V2/* diff --git a/CBP_workflow.png b/CBP_workflow.png new file mode 100644 index 0000000..cb6f8c6 Binary files /dev/null and b/CBP_workflow.png differ diff --git a/README.md b/README.md index 2088fc6..981804f 100644 --- a/README.md +++ b/README.md @@ -7,12 +7,21 @@ In short, each step of the pipeline is included in a module. Most of the modules A lot of the modules available in this pipeline were developed by members of the nf-core/genomeassembler group, if you want to participate, feel free to join the community. 
+## **Table of Contents** +* **[Input data](#input-data)** +* **[Output data](#output-files)** +* **[Process](#process)** + * [Running the pipeline with test data](#running-the-pipeline-with-test-data) + * [Running the pipeline with your own data](#running-the-pipeline-with-your-own-data) +* **[Credits](#credits)** +* **[Details on the test dataset](#details-on-the-test-dataset)** + ## Input data -The pipeline was developped to take as input PacBio ccs files (bam) and Hi-C files (fastq.gz). The pipeline also support the inclusion of nanopore data and short-reads for polishing. +The pipeline was developed to take as input PacBio files (bam, from Sequel II or Revio machines) and Hi-C files (fastq.gz). The pipeline also supports the inclusion of nanopore data and short reads for polishing. -The pipeline also require information related to the specie of interest such as genome size or ploidy. This information can be found on GoaT (https://goat.genomehubs.org). +The pipeline also requires the species' NCBI Taxonomy ID, which can be found on GoaT (https://goat.genomehubs.org) or on NCBI. @@ -22,40 +31,39 @@ The pipeline generates many files and intermediate files, most are self explanat ## Process -An overview of the pipeline is visible on the following subway map. Some parts of the pipeline may have been commented out in this version as they relied on locaaly installed software. The code is still available in case you also want to locally install the software and try it out. +An overview of the pipeline is visible on the following subway map. Some parts of the pipeline may have been commented out in this version as they relied on locally installed software. The code is still available in case you also want to locally install the software and try it out. By default, the pipeline will use hifiasm with PacBio data for the assembly, and if Hi-C data is available, YAHS is used for the scaffolding. 
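The input files and species information described above are declared as pipeline parameters. A minimal sketch of an input block (paths and taxid are illustrative; parameter names are the ones referenced in `main.nf`):

```nextflow
// Minimal input declaration -- values are illustrative, names come from main.nf
params {
    id                = 'mySample1'              // sample identifier used in meta maps
    taxon_taxid       = '9606'                   // NCBI Taxonomy ID (resolved via GoaT)
    pacbio_input_type = 'hifi'                   // 'hifi', 'ccs' or 'clr'
    bam_cell1         = '/path/to/cell1.bam'     // at least one PacBio SMRT cell
    bam_cell2         = null                     // optional additional cells (up to bam_cell4)
    hic_read1         = '/path/to/hic_R1.fastq.gz'
    hic_read2         = '/path/to/hic_R2.fastq.gz'
}
```

Nanopore (`ont_fastq_1`) and parental short-read parameters follow the same pattern and are only used when set.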
Other assembler and scaffolder are available within the pipeline, to change, you need to edit the nextflow.config file. Software used that would require local installation: -- LongQC - -- MitoHifi +- [LongQC](https://github.com/yfukasawa/LongQC) +- [MitoHifi](https://github.com/marcelauliano/MitoHiFi) +- [Juicer](https://github.com/aidenlab/juicer) -- Juicer Software that relies on locally downloaded files / databases : -- Busco - -- Kraken +- [Busco](https://busco.ezlab.org/busco_userguide.html#download-and-automated-update) +- [Kraken](http://ccb.jhu.edu/software/kraken/)
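The locally downloaded databases listed above are wired in through parameters as well. A sketch, assuming a standard Kraken2 database has been fetched locally (`execute_kraken`, `kraken_db`, and `run_busco` appear in `main.nf`; check `nextflow.config` for the exact Busco setup in your version):

```nextflow
// Pointing the pipeline at locally downloaded databases -- paths are illustrative.
params {
    execute_kraken = 'yes'                        // any other value skips the Kraken steps
    kraken_db      = '/data/db/kraken2_standard'  // locally downloaded Kraken2 database
    run_busco      = 'yes'                        // Busco lineages must be available locally
}
```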

- +

Figure : Overview of the Canadian Biogenome project assembly pipeline

-## Running the pipeline with test data (will work once the repo is public) -To run this pipeline, you need nextflow, conda and singularity installed on your system. + +## Running the pipeline with test data +To run this pipeline, you need nextflow and conda or singularity installed on your system. A set of test data are available in this repo to allow you to test the pipeline with just one command line: ``` -nextflow run bcgsc/Canadian_Biogenome_Project -latest -r dev +nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda ``` The outputs are organized in several subfolder that are self-explenatory. @@ -87,16 +95,18 @@ nextflow run main.nf -profile singularity ## Credits -The pipeline was originnally written by @scorreard with the help and input from : +The pipeline was originally written by [@scorreard](https://github.com/scorreard) with the help and input from : - Members of the Jones lab (Canada's Michael Smith Genome Sciences Centre, Vancouver, Canada). -- Members of the Earth Biogenome Project and other affiliated projects. + - Special thanks to [@Glenn Chang](https://github.com/Glenn032787) for reviewing this repo. +- Members of the Earth Biogenome Project and other affiliated projects. - Members of the nf-core / nextflow community. 
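Defaults (hifiasm assembly, YAHS scaffolding) can be overridden with a small custom config passed via `-c`, the same mechanism the CI workflow uses with `nextflow_github_test.config`. A sketch with illustrative values (check `nextflow.config` for the values accepted by your version):

```nextflow
// custom.config -- override assembly choices without editing nextflow.config
params {
    assembly_method         = 'flye'   // 'hifiasm', 'canu', 'flye' or 'verkko'
    assembly_secondary_mode = 'hifi'   // must match the chosen assembler's supported modes
    mitohifi                = 'no'     // skip mitochondrial assembly
}
```

It would then be passed on the command line, e.g. `nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda -c custom.config`.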
+ ## Details on the test dataset The PacBio data is a subset of covid ssequences obtained with this command lines : diff --git a/subset_covid_hifi.bam b/example_input/subset_covid_hifi.bam similarity index 100% rename from subset_covid_hifi.bam rename to example_input/subset_covid_hifi.bam diff --git a/test_1.fastq.gz b/example_input/test_1.fastq.gz similarity index 100% rename from test_1.fastq.gz rename to example_input/test_1.fastq.gz diff --git a/test_2.fastq.gz b/example_input/test_2.fastq.gz similarity index 100% rename from test_2.fastq.gz rename to example_input/test_2.fastq.gz diff --git a/main.nf b/main.nf index f2625a2..cf49eaf 100644 --- a/main.nf +++ b/main.nf @@ -6,7 +6,6 @@ log.info """ CBP pipeline - Solenne Correard - Jones lab ============================================= Specie id : ${params.id} -Taxon : ${params.taxon_name} Taxon number : ${params.taxon_taxid} PacBio input type : ${params.pacbio_input_type} PacBio reads cell 1 : ${params.bam_cell1} @@ -21,26 +20,35 @@ Output path : ${params.outdir} Pipeline version : ${params.pipeline_version} Assembly method : ${params.assembly_method} Assembly mode : ${params.assembly_secondary_mode} +FCS (Foreign Conta Screen) : ${params.fcs} Polishing method : ${params.polishing_method} Purging method : ${params.purging_method} Scaffolding method : ${params.scaffolding_method} -Manual curation : ${params.manual_curation} Mitochondrial assembly : ${params.mitohifi} +Pretext Hi-C map : ${params.pretext} +Juicer Hi-C map : ${params.juicer} +Methylation calling : ${params.methylation_calling} +Comparison to related genome : ${params.genome_comparison} +Blobtools : ${params.blobtools} +Busco : ${params.run_busco} +Manual curation : ${params.manual_curation} """ +include { GOAT_TAXONSEARCH } from './modules/goat/taxonsearch/main.nf' + //Pre-processing -include { CCS as CCS_PACBIO_CELL1; CCS as CCS_PACBIO_CELL2; CCS as CCS_PACBIO_CELL3; CCS as CCS_PACBIO_CELL4 } from './modules/pacbio/ccs/main.nf' -include { 
BAMTOOLS_FILTER as BAMTOOLS_FILTER_PACBIO_CELL1; BAMTOOLS_FILTER as BAMTOOLS_FILTER_PACBIO_CELL2; BAMTOOLS_FILTER as BAMTOOLS_FILTER_PACBIO_CELL3; BAMTOOLS_FILTER as BAMTOOLS_FILTER_PACBIO_CELL4 } from './modules/bamtools_filter/main.nf' -include { PBINDEX as PBINDEX_FILTERED_PACBIO_CELL1; PBINDEX as PBINDEX_FILTERED_PACBIO_CELL2; PBINDEX as PBINDEX_FILTERED_PACBIO_CELL3; PBINDEX as PBINDEX_FILTERED_PACBIO_CELL4 } from './modules/pacbio/pbbam/pbindex/main.nf' +include { CCS as CCS_PACBIO } from './modules/pacbio/ccs/main.nf' +include { BAMTOOLS_FILTER as BAMTOOLS_FILTER_PACBIO } from './modules/bamtools_filter/main.nf' +include { PBINDEX } from './modules/pacbio/pbindex/main.nf' +include { PBBAM_PBMERGE } from './modules/pacbio/pbmerge/main.nf' include { BAM2FASTX } from './modules/pacbio/bam2fastx/main.nf' -include { TWOBAM2FASTX } from './modules/pacbio/bam2fastx/2bam2fastx/main.nf' -include { THREEBAM2FASTX } from './modules/pacbio/bam2fastx/3bam2fastx/main.nf' -include { FOURBAM2FASTX } from './modules/pacbio/bam2fastx/4bam2fastx/main.nf' + +include { PREPROCESS_MERGED } from './modules/pacbio/preprocess_merged/main.nf' include { CUTADAPT } from './modules/cutadapt/main.nf' //QC Input data -include { LONGQC } from './modules/LongQC/main.nf' +include { LONGQC as LONGQC_PACBIO; LONGQC as LONGQC_ONT } from './modules/LongQC/main.nf' include { MERYL_COUNT } from './modules/meryl/count/main.nf' include { MERYL_UNIONSUM } from './modules/meryl/unionsum/main.nf' include { MERYL_HISTOGRAM } from './modules/meryl/histogram/main.nf' @@ -48,10 +56,17 @@ include { GENOMESCOPE2 } from './modules/genomescope2/main.nf' include { KRAKEN2_KRAKEN2 as KRAKEN2_KRAKEN2_PACBIO_BAM; KRAKEN2_KRAKEN2 as KRAKEN2_KRAKEN2_HIC_READS; KRAKEN2_KRAKEN2 as KRAKEN2_KRAKEN2_SR_READS; KRAKEN2_KRAKEN2 as KRAKEN2_KRAKEN2_ONT_READS } from './modules/kraken2/main.nf' include { COVERAGE_CALCULATION } from './modules/coverage_calculation/main.nf' +//Mitochondrial assembly +include { FASTQGZ_TO_FASTA } 
from './modules/fastqgz_to_fasta/main.nf' +include { FIND_MITO_REFERENCE } from './modules/mitohifi/findmitoreference/main.nf' +include { MITOHIFI } from './modules/mitohifi/mitohifi/main.nf' + + //Assembly //HifiASM +include { YAK as YAK_PAT; YAK as YAK_MAT } from './modules/yak/main.nf' include { HIFIASM } from './modules/hifiasm/main.nf' -include { GFA_TO_FA; GFA_TO_FA as GFA_TO_FA2 } from './modules/gfa_to_fa/main.nf' +include { GFA_TO_FA as GFA_TO_FA_hap1; GFA_TO_FA as GFA_TO_FA_hap2 } from './modules/gfa_to_fa/main.nf' //Canu include { CANU } from './modules/canu/main.nf' @@ -69,17 +84,24 @@ include { VERKKO } from './modules/verkko/main.nf' //Polishing //Pilon include { PILON } from './modules/pilon/main.nf' -include { BWAMEM2_INDEX } from './modules/bwamem2/index/main.nf' -include { BWAMEM2_MEM } from './modules/bwamem2/mem/main.nf' -include { SAMTOOLS_INDEX } from './modules/samtools/index/main.nf' +include { BWAMEM2_INDEX as BWAMEM2_INDEX_PILON } from './modules/bwamem2/index/main.nf' +include { BWAMEM2_MEM as BWAMEM2_MEM_PILON } from './modules/bwamem2/mem/main.nf' +include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_PILON } from './modules/samtools/index/main.nf' + +//NCBI cleaning sequence +include { FCS_FCSADAPTOR as FCS_FCSADAPTOR_hap1; FCS_FCSADAPTOR as FCS_FCSADAPTOR_ALT } from './modules/fcs/fcsadaptor/' +include { FCS_FCSGX as FCS_FCSGX_hap1; FCS_FCSGX as FCS_FCSGX_ALT } from './modules/fcs/fcsgx' +include { FCS_FCSGX_CLEAN as FCS_FCSGX_CLEAN_hap1; FCS_FCSGX_CLEAN as FCS_FCSGX_CLEAN_ALT } from './modules/fcs/fcsgx_clean' + //PurgeDups +include { CAT } from './modules/cat/main.nf' include { MINIMAP2_ALIGN as MINIMAP2_ALIGN_TO_CONTIG; MINIMAP2_ALIGN as MINIMAP2_ALIGN_TO_SELF; MINIMAP2_ALIGN as MINIMAP2_ALIGN_TO_CONTIG_ALT; MINIMAP2_ALIGN as MINIMAP2_ALIGN_TO_SELF_ALT } from './modules/minimap2/align/main.nf' -include { PURGEDUPS_SPLITFA; PURGEDUPS_SPLITFA as PURGEDUPS_SPLITFA_ALT } from './modules/purgedups/splitfa/main.nf' -include { PURGEDUPS_PBCSTAT; 
PURGEDUPS_PBCSTAT as PURGEDUPS_PBCSTAT_ALT } from './modules/purgedups/pbcstat/main.nf' -include { PURGEDUPS_CALCUTS; PURGEDUPS_CALCUTS as PURGEDUPS_CALCUTS_ALT } from './modules/purgedups/calcuts/main.nf' -include { PURGEDUPS_PURGEDUPS; PURGEDUPS_PURGEDUPS as PURGEDUPS_PURGEDUPS_ALT } from './modules/purgedups/purgedups/main.nf' -include { PURGEDUPS_GETSEQS; PURGEDUPS_GETSEQS as PURGEDUPS_GETSEQS_ALT } from './modules/purgedups/getseqs/main.nf' +include { PURGEDUPS_SPLITFA as PURGEDUPS_SPLITFA_hap1; PURGEDUPS_SPLITFA as PURGEDUPS_SPLITFA_ALT } from './modules/purgedups/splitfa/main.nf' +include { PURGEDUPS_PBCSTAT as PURGEDUPS_PBCSTAT_hap1; PURGEDUPS_PBCSTAT as PURGEDUPS_PBCSTAT_ALT } from './modules/purgedups/pbcstat/main.nf' +include { PURGEDUPS_CALCUTS as PURGEDUPS_CALCUTS_hap1; PURGEDUPS_CALCUTS as PURGEDUPS_CALCUTS_ALT } from './modules/purgedups/calcuts/main.nf' +include { PURGEDUPS_PURGEDUPS as PURGEDUPS_PURGEDUPS_hap1; PURGEDUPS_PURGEDUPS as PURGEDUPS_PURGEDUPS_ALT } from './modules/purgedups/purgedups/main.nf' +include { PURGEDUPS_GETSEQS as PURGEDUPS_GETSEQS_hap1; PURGEDUPS_GETSEQS as PURGEDUPS_GETSEQS_ALT } from './modules/purgedups/getseqs/main.nf' //HIC scaffolding-SALSA2 include { PREPARE_GENOME } from './modules/nfcore_hic/subworkflows/local/prepare_genome.nf' @@ -97,67 +119,99 @@ include { SALSA2 } from './modules/salsa2/main.nf' include { SALSA2_JUICER } from './modules/juicer/salsa2_juicer/main.nf' //HIC scaffolding-YAHS -include { SAMTOOLS_FAIDX as SAMTOOLS_FAIDX1; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX2; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX1_ALT; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX2_ALT } from './modules/samtools/faidx/main.nf' -include { CHROMAP_INDEX; CHROMAP_INDEX as CHROMAP_INDEX_ALT } from './modules/chromap/index/main.nf' -include { CHROMAP_CHROMAP; CHROMAP_CHROMAP as CHROMAP_CHROMAP_ALT } from './modules/chromap/chromap/main.nf' -include { YAHS; YAHS as YAHS_ALT } from './modules/yahs/main.nf' +include { SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_FCS_hap1; 
SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_FCS_hap2; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_PURGE_hap1; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_PURGE_hap2; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_SCAFF_hap1; SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_SCAFF_hap2 } from './modules/samtools/faidx/main.nf' +include { CHROMAP_INDEX as CHROMAP_INDEX_hap1; CHROMAP_INDEX as CHROMAP_INDEX_ALT } from './modules/chromap/index/main.nf' +include { CHROMAP_CHROMAP as CHROMAP_CHROMAP_hap1; CHROMAP_CHROMAP as CHROMAP_CHROMAP_ALT } from './modules/chromap/chromap/main.nf' +include { YAHS as YAHS_hap1; YAHS as YAHS_ALT } from './modules/yahs/main.nf' + +//Map PacBio data against newly generated assembly +include { JASMINE } from './modules/pacbio/jasmine/main.nf' +include { PBMM2 } from './modules/pacbio/pbmm2/main.nf' +include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_PBMM2 } from './modules/samtools/index/main.nf' //Assembly QC include { YAHS_JUICER } from './modules/juicer/yahs_juicer/main.nf' include { JUICER } from './modules/juicer/juicer/main.nf' include { PRETEXTMAP } from './modules/pretext/pretextmap/main.nf' include { PRETEXTSNAPSHOT } from './modules/pretext/pretextsnapshot/main.nf' - -include { CAT } from './modules/cat/main.nf' -include { BUSCO ; BUSCO as BUSCO_lin2; BUSCO as BUSCO_lin3; BUSCO as BUSCO_lin4; BUSCO as BUSCO_ALT; BUSCO as BUSCO_lin2ALT; BUSCO as BUSCO_lin3ALT; BUSCO as BUSCO_lin4ALT } from './modules/busco/main.nf' -include { MERQURY as MERQURY1; MERQURY as MERQURY2; MERQURY as MERQURY3 } from './modules/merqury/main.nf' -include { MERQURY_DOUBLE as MERQURY1_DOUBLE; MERQURY_DOUBLE as MERQURY2_DOUBLE; MERQURY_DOUBLE as MERQURY3_DOUBLE } from './modules/merqury/merqury_double/main.nf' -include { QUAST as QUAST1; QUAST as QUAST2; QUAST as QUAST3; QUAST as QUAST_PILON } from './modules/quast/main.nf' -include { QUAST_DOUBLE as QUAST1_DOUBLE; QUAST_DOUBLE as QUAST2_DOUBLE; QUAST_DOUBLE as QUAST3_DOUBLE } from './modules/quast/quast_double/main.nf' +include { BEDTOOLS_GENOMECOV } from 
'./modules/bedtools/genomecov/main.nf' +include { GFASTATS } from './modules/gfastats/main.nf' +include { TIDK } from './modules/tidk/main.nf' +include { PRETEXTGRAPH as PRETEXTGRAPH_TELO; PRETEXTGRAPH as PRETEXTGRAPH_TELO_COV } from './modules/pretext/pretextgraph/main.nf' + +//Genome comparison +include { NCBIGENOMEDOWNLOAD } from './modules/ncbigenomedownload/main.nf' +include { JUPITER } from './modules/jupiter/main.nf' +include { MASHMAP } from './modules/mashmap/main.nf' + +//QC of assemblies +include { BUSCO as BUSCO_lin1_PRIM; BUSCO as BUSCO_lin1_cleaned; BUSCO as BUSCO_lin1_purged; BUSCO as BUSCO_lin1_SCAFF; BUSCO as BUSCO_lin2; BUSCO as BUSCO_lin3; BUSCO as BUSCO_lin4; BUSCO as BUSCO_ALT } from './modules/busco/main.nf' +include { MERQURY as MERQURY_ASS; MERQURY as MERQURY_PURGED; MERQURY as MERQURY_SCAFF } from './modules/merqury/main.nf' +include { MERQURY_DOUBLE as MERQURY_ASS_DOUBLE; MERQURY_DOUBLE as MERQURY_PURGED_DOUBLE; MERQURY_DOUBLE as MERQURY_SCAFF_DOUBLE } from './modules/merqury/merqury_double/main.nf' +include { QUAST as QUAST_ASS; QUAST as QUAST_PILON; QUAST as QUAST_CLEAN; QUAST as QUAST_PURGED; QUAST as QUAST_SCAFF } from './modules/quast/main.nf' +include { QUAST_DOUBLE as QUAST_ASS_DOUBLE; QUAST_DOUBLE as QUAST_CLEAN_DOUBLE; QUAST_DOUBLE as QUAST_PURGED_DOUBLE; QUAST_DOUBLE as QUAST_SCAFF_DOUBLE } from './modules/quast/quast_double/main.nf' include { MULTIQC } from './modules/multiqc/main.nf' +//BlobToolKit include { GZIP } from './modules/gzip/main.nf' -include { BLOBTOOLS_CONFIG } from './modules/blobtools/blobtools_config/main.nf' +include { BLOBTOOLS_CONFIG_1LINEAGE } from './modules/blobtools/blobtools_config/blobtools_config_1lineage/main.nf' +include { BLOBTOOLS_CONFIG_2LINEAGES } from './modules/blobtools/blobtools_config/blobtools_config_2lineages/main.nf' include { BLOBTOOLS_PIPELINE } from './modules/blobtools/blobtools_pipeline/main.nf' include { BLOBTOOLS_CREATE } from './modules/blobtools/blobtools_create/main.nf' include 
{ BLOBTOOLS_ADD } from './modules/blobtools/blobtools_add/main.nf' -include { BLOBTOOLS_VIEW_SNAIL } from './modules/blobtools/blobtools_view_snail/main.nf' -include { BLOBTOOLS_VIEW_BLOB } from './modules/blobtools/blobtools_view_blob/main.nf' -include { BLOBTOOLS_VIEW_CUMULATIVE } from './modules/blobtools/blobtools_view_cumulative/main.nf' +include { BLOBTOOLS_VIEW } from './modules/blobtools/blobtools_view/main.nf' -//Mitochondrial assembly -include { FASTQGZ_TO_FASTA } from './modules/fastqgz_to_fasta/main.nf' -include { FIND_MITO_REFERENCE } from './modules/mitohifi/findmitoreference/main.nf' -include { MITOHIFI } from './modules/mitohifi/mitohifi/main.nf' +include { OVERVIEW_GENERATION_SAMPLE } from './modules/overview_generation/sample/main.nf' +include { CUSTOM_DUMPSOFTWAREVERSIONS } from './modules/dumpsoftwareversions/main' + +include { RAPIDCURATION_SPLIT } from './modules/manualcuration/main.nf' workflow { ////////////////////////////////////////////////// INPUT ////////////////////////////////// +taxon = [ + [ id:params.id ], // meta map + params.taxon_taxid, + [] + ] + //PacBio data - input_pacbio_cell1 = [ - [ id:params.id, single_end: true], // meta map - [ file(params.bam_cell1, checkIfExists: true) ] - ] - if( params.bam_cell2 ){ - input_pacbio_cell2 = [ - [ id:'pacbio_cell2', single_end: true], // meta map - [ file(params.bam_cell2, checkIfExists: true) ] - ] - } - if( params.bam_cell3 ){ - input_pacbio_cell3 = [ - [ id:'pacbio_cell3', single_end: true], // meta map - [ file(params.bam_cell3, checkIfExists: true) ] - ] - } + if( params.bam_cell4 ){ - input_pacbio_cell4 = [ - [ id:'pacbio_cell4', single_end: true], // meta map - [ file(params.bam_cell4, checkIfExists: true) ] + input_pacbio = [ + [ id:params.id, single_end: true], // meta map + [ + file(params.bam_cell1, checkIfExists: true), + file(params.bam_cell2, checkIfExists: true), + file(params.bam_cell3, checkIfExists: true), + file(params.bam_cell4, checkIfExists: true) + + ] ] 
- } - + } else if( params.bam_cell3 ){ + input_pacbio = [ + [ id:params.id, single_end: true], // meta map + [ + file(params.bam_cell1, checkIfExists: true), + file(params.bam_cell2, checkIfExists: true), + file(params.bam_cell3, checkIfExists: true) + ] + ] + } else if( params.bam_cell2 ){ + input_pacbio = [ + [ id:params.id, single_end: true], // meta map + [ + file(params.bam_cell1, checkIfExists: true), + file(params.bam_cell2, checkIfExists: true) + ] + ] + } else { + input_pacbio = [ + [ id:params.id, single_end: true], // meta map + [ file(params.bam_cell1, checkIfExists: true) ] + ] + } //ONT data if (params.ont_fastq_1) { @@ -189,6 +243,29 @@ workflow { ] } +//Paternal Illumina SR data (Only if illumina_SR defined) + if (( params.illumina_SR_pat_read1 ) && ( params.illumina_SR_pat_read2 )) { + input_illumina_SR_pat_R1_R2 = [ + [ id:'input_illumina_SR_pat_R1_R2', single_end: false], // meta map + [ + file(params.illumina_SR_pat_read1, checkIfExists: true), + file(params.illumina_SR_pat_read2, checkIfExists: true) + ] + ] + } + +//Maternal Illumina SR data (Only if illumina_SR defined) + if (( params.illumina_SR_mat_read1 ) && ( params.illumina_SR_mat_read2 )) { + input_illumina_SR_mat_R1_R2 = [ + [ id:'input_illumina_SR_mat_R1_R2', single_end: false], // meta map + [ + file(params.illumina_SR_mat_read1, checkIfExists: true), + file(params.illumina_SR_mat_read2, checkIfExists: true) + ] + ] + } + + ////////////////////////////////////////////////// DUMMY FILES ////////////////////////////////// quast_fasta = file('fasta_dummy') quast_gff = file('gff_dummy') @@ -200,85 +277,104 @@ workflow { ////////////////////////////////////////////////// WORKFLOW ////////////////////////////////// - if (params.pacbio_input_type == 'subreads') { - CCS_PACBIO_CELL1(input_pacbio_cell1) - if( params.bam_cell2 ) { - CCS_PACBIO_CELL2(input_pacbio_cell2) - } - if( params.bam_cell3 ) { - CCS_PACBIO_CELL3(input_pacbio_cell3) - } - if( params.bam_cell4 ) { - 
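The cascading `if`/`else if` above builds the same meta-map/file-list pair for one to four SMRT cells. Under the assumption the chain already makes (cells are filled in order, so `bam_cell4` implies the first three), it could be sketched more compactly as:

```nextflow
// Sketch only: collect whichever bam_cell params are set, in order.
// Same meta map and file list as the explicit cascade in main.nf.
def cells = [params.bam_cell1, params.bam_cell2, params.bam_cell3, params.bam_cell4]
def bams  = cells.findAll { it }.collect { file(it, checkIfExists: true) } // drop unset cells, fail fast on missing files
input_pacbio = [ [ id:params.id, single_end: true ], bams ]
```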
CCS_PACBIO_CELL4(input_pacbio_cell4) - } - - //Pre-processing - BAMTOOLS_FILTER_PACBIO_CELL1 (CCS_PACBIO_CELL1.out.bam) - PBINDEX_FILTERED_PACBIO_CELL1 (BAMTOOLS_FILTER_PACBIO_CELL1.out.filtered_bam) - - if( params.bam_cell2 ) { - BAMTOOLS_FILTER_PACBIO_CELL2 (CCS_PACBIO_CELL2.out.bam) - PBINDEX_FILTERED_PACBIO_CELL2 (BAMTOOLS_FILTER_PACBIO_CELL2.out.filtered_bam) - } - if( params.bam_cell3 ){ - BAMTOOLS_FILTER_PACBIO_CELL3 (CCS_PACBIO_CELL3.out.bam) - PBINDEX_FILTERED_PACBIO_CELL3 (BAMTOOLS_FILTER_PACBIO_CELL3.out.filtered_bam) - } - if( params.bam_cell4 ){ - BAMTOOLS_FILTER_PACBIO_CELL4 (CCS_PACBIO_CELL4.out.bam) - PBINDEX_FILTERED_PACBIO_CELL4 (BAMTOOLS_FILTER_PACBIO_CELL4.out.filtered_bam) - } + // To gather all QC reports for MultiQC + mqc_input = Channel.empty() + // To gather used software versions for MultiQC + ch_versions = Channel.empty() + + GOAT_TAXONSEARCH(taxon) + +//PacBio data is very large +//When there are several HiFi SMRT cells, +//doing the merging, filtering and bam2fastx steps in different modules is space consuming, +//limiting the capacity to run several genomes in parallel +//Merging the steps in one module and deleting the intermediate files was a solution + + if ((params.bam_cell2) && (params.pacbio_input_type == 'hifi')){ + //No need for filtering as only HiFi reads are in this file (output from the Revio machine) + //pbmerge includes indexing + PBBAM_PBMERGE(input_pacbio) + ch_versions = ch_versions.mix(PBBAM_PBMERGE.out.versions) + final_pacBio_bam = PBBAM_PBMERGE.out.bam + final_pacBio_bam_index = PBBAM_PBMERGE.out.pbi + } else if ((params.bam_cell2) && (params.pacbio_input_type == 'ccs')){ + //MERGED STEPS : PBBAM_PBMERGE + BAMTOOLS_FILTER_PACBIO + PREPROCESS_MERGED(input_pacbio) + PBINDEX (PREPROCESS_MERGED.out.filtered_bam) + ch_versions = ch_versions.mix(PREPROCESS_MERGED.out.versions) + ch_versions = ch_versions.mix(PBINDEX.out.versions) + final_pacBio_bam = PREPROCESS_MERGED.out.filtered_bam + final_pacBio_bam_index = 
PBINDEX.out.index + } else if ((params.bam_cell2) && (params.pacbio_input_type == 'clr')) { + PBBAM_PBMERGE(input_pacbio) + CCS_PACBIO(PBBAM_PBMERGE.out.bam) + BAMTOOLS_FILTER_PACBIO (CCS_PACBIO.out.bam) + PBINDEX (BAMTOOLS_FILTER_PACBIO.out.filtered_bam) + ch_versions = ch_versions.mix(PBBAM_PBMERGE.out.versions) + ch_versions = ch_versions.mix(CCS_PACBIO.out.versions) + ch_versions = ch_versions.mix(BAMTOOLS_FILTER_PACBIO.out.versions) + ch_versions = ch_versions.mix(PBINDEX.out.versions) + final_pacBio_bam = BAMTOOLS_FILTER_PACBIO.out.filtered_bam + final_pacBio_bam_index = PBINDEX.out.index + } else if (!(params.bam_cell2) && (params.pacbio_input_type == 'hifi')){ + PBINDEX(params.bam_cell1) + ch_versions = ch_versions.mix(PBINDEX.out.versions) + final_pacBio_bam = params.bam_cell1 + final_pacBio_bam_index = PBINDEX.out.index + } else if (!(params.bam_cell2) && (params.pacbio_input_type == 'ccs')){ + BAMTOOLS_FILTER_PACBIO (input_pacbio) + PBINDEX (BAMTOOLS_FILTER_PACBIO.out.filtered_bam) + ch_versions = ch_versions.mix(BAMTOOLS_FILTER_PACBIO.out.versions) + ch_versions = ch_versions.mix(PBINDEX.out.versions) + final_pacBio_bam = BAMTOOLS_FILTER_PACBIO.out.filtered_bam + final_pacBio_bam_index = PBINDEX.out.index + } else if (!(params.bam_cell2) && (params.pacbio_input_type == 'clr')) { + CCS_PACBIO(input_pacbio) + BAMTOOLS_FILTER_PACBIO (CCS_PACBIO.out.bam) + PBINDEX (BAMTOOLS_FILTER_PACBIO.out.filtered_bam) + ch_versions = ch_versions.mix(CCS_PACBIO.out.versions) + ch_versions = ch_versions.mix(BAMTOOLS_FILTER_PACBIO.out.versions) + ch_versions = ch_versions.mix(PBINDEX.out.versions) + final_pacBio_bam = BAMTOOLS_FILTER_PACBIO.out.filtered_bam + final_pacBio_bam_index = PBINDEX.out.index } else { - //Pre-processing - BAMTOOLS_FILTER_PACBIO_CELL1 (input_pacbio_cell1) - PBINDEX_FILTERED_PACBIO_CELL1 (BAMTOOLS_FILTER_PACBIO_CELL1.out.filtered_bam) - - if( params.bam_cell2 ) { - BAMTOOLS_FILTER_PACBIO_CELL2 (input_pacbio_cell2) - 
PBINDEX_FILTERED_PACBIO_CELL2 (BAMTOOLS_FILTER_PACBIO_CELL2.out.filtered_bam) - } - if( params.bam_cell3 ){ - BAMTOOLS_FILTER_PACBIO_CELL3 (input_pacbio_cell3) - PBINDEX_FILTERED_PACBIO_CELL3 (BAMTOOLS_FILTER_PACBIO_CELL3.out.filtered_bam) - } - if( params.bam_cell4 ){ - BAMTOOLS_FILTER_PACBIO_CELL4 (input_pacbio_cell4) - PBINDEX_FILTERED_PACBIO_CELL4 (BAMTOOLS_FILTER_PACBIO_CELL4.out.filtered_bam) - } + error "Invalid pacbio input parameters" } - //Merge the multiple pacbio bam files if multiple are generated and generate fastq files - - if( params.bam_cell4 ) { - FOURBAM2FASTX(BAMTOOLS_FILTER_PACBIO_CELL1.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL1.out.index), BAMTOOLS_FILTER_PACBIO_CELL2.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL2.out.index), BAMTOOLS_FILTER_PACBIO_CELL3.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL3.out.index), BAMTOOLS_FILTER_PACBIO_CELL4.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL4.out.index)) - CUTADAPT (FOURBAM2FASTX.out.reads) - } else if( params.bam_cell3 ) { - THREEBAM2FASTX(BAMTOOLS_FILTER_PACBIO_CELL1.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL1.out.index), BAMTOOLS_FILTER_PACBIO_CELL2.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL2.out.index), BAMTOOLS_FILTER_PACBIO_CELL3.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL3.out.index)) - CUTADAPT (THREEBAM2FASTX.out.reads) - } else if( params.bam_cell2 ){ - TWOBAM2FASTX(BAMTOOLS_FILTER_PACBIO_CELL1.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL1.out.index), BAMTOOLS_FILTER_PACBIO_CELL2.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL2.out.index)) - CUTADAPT (TWOBAM2FASTX.out.reads) - } else { - BAM2FASTX (BAMTOOLS_FILTER_PACBIO_CELL1.out.filtered_bam.join(PBINDEX_FILTERED_PACBIO_CELL1.out.index)) - CUTADAPT (BAM2FASTX.out.reads) - } + BAM2FASTX (final_pacBio_bam.join(final_pacBio_bam_index)) + ch_versions = ch_versions.mix(BAM2FASTX.out.versions) + bam2fastx_output = BAM2FASTX.out.reads + + CUTADAPT (bam2fastx_output) + ch_versions = 
ch_versions.mix(CUTADAPT.out.versions) //QC Input data - mqc_input = Channel.empty() +// LONGQC_PACBIO (CUTADAPT.out.reads) MERYL_COUNT (CUTADAPT.out.reads) MERYL_HISTOGRAM (MERYL_COUNT.out.meryl_db) - GENOMESCOPE2 (MERYL_HISTOGRAM.out.hist) - COVERAGE_CALCULATION(CUTADAPT.out.reads) + GENOMESCOPE2 (MERYL_HISTOGRAM.out.hist, GOAT_TAXONSEARCH.out.ploidy) + COVERAGE_CALCULATION(CUTADAPT.out.reads, GOAT_TAXONSEARCH.out.genome_size) + + if (params.execute_kraken == 'yes') { + KRAKEN2_KRAKEN2_PACBIO_BAM (CUTADAPT.out.reads, params.kraken_db, false, false ) + mqc_input = mqc_input.mix(KRAKEN2_KRAKEN2_PACBIO_BAM.out.report.collect{it[1]}) + kraken_pacbio = KRAKEN2_KRAKEN2_PACBIO_BAM.out.report + ch_versions = ch_versions.mix(KRAKEN2_KRAKEN2_PACBIO_BAM.out.versions) + } else { + kraken_pacbio = [ + [ id:'dummy', single_end: true], // meta map + [ file('kraken_pacbio_dummy')] + ] + } + -//All the following steps are commented out as they require some local installation to work -/* - LONGQC (CUTADAPT.out.reads) - KRAKEN2_KRAKEN2_PACBIO_BAM (CUTADAPT.out.reads, params.kraken_db, false, false ) - mqc_input = mqc_input.mix(KRAKEN2_KRAKEN2_PACBIO_BAM.out.report.collect{it[1]}) - COVERAGE_CALCULATION(CUTADAPT.out.reads) + // Gather versions of all tools used +// ch_versions = ch_versions.mix(LONGQC_PACBIO.out.versions) + ch_versions = ch_versions.mix(MERYL_COUNT.out.versions) + ch_versions = ch_versions.mix(GENOMESCOPE2.out.versions) //ONLY if Hi-C data available - if (( params.hic_read1 ) && (params.hic_read2 )) { + if (( params.hic_read1 ) && (params.hic_read2 ) && (params.execute_kraken == 'yes')) { KRAKEN2_KRAKEN2_HIC_READS (input_hic_R1_R2, params.kraken_db, false, false ) mqc_input = mqc_input.mix(KRAKEN2_KRAKEN2_HIC_READS.out.report.collect{it[1]}) kraken_hic = KRAKEN2_KRAKEN2_HIC_READS.out.report @@ -299,67 +395,101 @@ workflow { if (params.ont_fastq_1) { KRAKEN2_KRAKEN2_ONT_READS (input_ont_fastq_1, params.kraken_db, false, false ) mqc_input = 
         mqc_input.mix(KRAKEN2_KRAKEN2_ONT_READS.out.report.collect{it[1]})
+        LONGQC_ONT(input_ont_fastq_1)
     }
 
     if (params.mitohifi == 'yes') {
         //Mitochondrial assembly
         FASTQGZ_TO_FASTA(CUTADAPT.out.reads)
-        FIND_MITO_REFERENCE(FASTQGZ_TO_FASTA.out.fasta, params.taxon_name)
+        FIND_MITO_REFERENCE(FASTQGZ_TO_FASTA.out.fasta, GOAT_TAXONSEARCH.out.scientific_name)
         MITOHIFI(FASTQGZ_TO_FASTA.out.fasta, FIND_MITO_REFERENCE.out.reference_fasta, FIND_MITO_REFERENCE.out.reference_gb)
+
+        ch_versions = ch_versions.mix(FASTQGZ_TO_FASTA.out.versions)
+        ch_versions = ch_versions.mix(FIND_MITO_REFERENCE.out.versions)
+        ch_versions = ch_versions.mix(MITOHIFI.out.versions)
     }
-*/
+
     //Assembly : The method is selected in the parameters : 'hifiasm' or 'flye' or 'canu' or 'verkko'
     if ( params.assembly_method == 'hifiasm') {
         //HifiASM : Need to select a secondary mode : 'pacbio' or 'pacbio+hic' or 'pacbio+ont' or 'pacbio+ont+hic'
         if (params.assembly_secondary_mode == 'pacbio+hic') {
-            HIFIASM (CUTADAPT.out.reads, [], [], params.hic_read1, params.hic_read2, [] )
+            HIFIASM (CUTADAPT.out.reads, [], [], params.hic_read1, params.hic_read2, [], GOAT_TAXONSEARCH.out.ploidy, GOAT_TAXONSEARCH.out.genome_size )
         } else if (params.assembly_secondary_mode == 'pacbio+ont') {
-            HIFIASM (CUTADAPT.out.reads, [], [], [], [], params.ont_fastq_1 )
+            HIFIASM (CUTADAPT.out.reads, [], [], [], [], params.ont_fastq_1, GOAT_TAXONSEARCH.out.ploidy, GOAT_TAXONSEARCH.out.genome_size )
         } else if (params.assembly_secondary_mode == 'pacbio') {
-            HIFIASM (CUTADAPT.out.reads, [], [], [], [], [] )
+            HIFIASM (CUTADAPT.out.reads, [], [], [], [], [], GOAT_TAXONSEARCH.out.ploidy, GOAT_TAXONSEARCH.out.genome_size )
         } else if (params.assembly_secondary_mode == 'pacbio+ont+hic') {
-            HIFIASM (CUTADAPT.out.reads, [], [], params.hic_read1, params.hic_read2,params.ont_fastq_1 )
+            HIFIASM (CUTADAPT.out.reads, [], [], params.hic_read1, params.hic_read2, params.ont_fastq_1, GOAT_TAXONSEARCH.out.ploidy, GOAT_TAXONSEARCH.out.genome_size )
+        } else if (params.assembly_secondary_mode == 'trio') {
+            YAK_PAT(input_illumina_SR_pat_R1_R2)
+            YAK_MAT(input_illumina_SR_mat_R1_R2)
+            HIFIASM (CUTADAPT.out.reads, YAK_PAT.out.yak.collect{it[1]}, YAK_MAT.out.yak.collect{it[1]}, [], [], [], GOAT_TAXONSEARCH.out.ploidy, GOAT_TAXONSEARCH.out.genome_size )
         } else {
             error "Invalid hifiasm mode: params.assembly_secondary_mode. These modes are currently supported : 'pacbio' or 'pacbio+hic' or 'pacbio+ont' or 'pacbio+ont+hic'"
         }
-        GFA_TO_FA (HIFIASM.out.hap1_contigs)
-        assembly_primary = GFA_TO_FA.out.fa_assembly
-        GFA_TO_FA2 (HIFIASM.out.hap2_contigs)
-        assembly_alternate = GFA_TO_FA2.out.fa_assembly
+        GFA_TO_FA_hap1 (HIFIASM.out.hap1_contigs)
+        GFA_TO_FA_hap2 (HIFIASM.out.hap2_contigs)
+        ch_versions = ch_versions.mix(HIFIASM.out.versions)
+        ch_versions = ch_versions.mix(GFA_TO_FA_hap1.out.versions)
+
+        assembly_primary = GFA_TO_FA_hap1.out.fa_assembly
+        assembly_alternate = GFA_TO_FA_hap2.out.fa_assembly
     } else if ( params.assembly_method == 'canu') {
         //CANU
         if (params.assembly_secondary_mode == 'hicanu') {
-            CANU(CUTADAPT.out.reads)
+            CANU(CUTADAPT.out.reads, GOAT_TAXONSEARCH.out.genome_size)
         } else if (params.assembly_secondary_mode == 'ont') {
-            CANU(input_ont_fastq_1)
+            CANU(input_ont_fastq_1, GOAT_TAXONSEARCH.out.genome_size)
+        } else if (params.assembly_secondary_mode == 'clr') {
+            CANU(CUTADAPT.out.reads, GOAT_TAXONSEARCH.out.genome_size)
         } else {
-            error "Invalid canu mode: params.assembly_secondary_mode. These modes are currently supported : 'hicanu' or 'ont'"
+            error "Invalid canu mode: params.assembly_secondary_mode. These modes are currently supported : 'hicanu', 'ont' or 'clr'"
         }
-        assembly_primary = CANU.out.assembly
+        ch_versions = ch_versions.mix(CANU.out.versions)
+
+        assembly_primary = CANU.out.assembly
     } else if ( params.assembly_method == 'flye') {
         //FLYE
         if (params.assembly_secondary_mode== 'hifi') {
             mode = "--pacbio-hifi"
             FLYE (CUTADAPT.out.reads, mode)
             MINIMAP_ALIGN_FLYE (CUTADAPT.out.reads, FLYE.out.fasta.collect{it[1]}, false, false, false)
-            RACON (CUTADAPT.out.reads, FLYE.out.fasta.join (MINIMAP_ALIGN_FLYE.out.paf))
-            LONGSTITCH (CUTADAPT.out.reads, RACON.out.improved_assembly)
+            RACON (CUTADAPT.out.reads, FLYE.out.fasta, MINIMAP_ALIGN_FLYE.out.paf)
+            LONGSTITCH (CUTADAPT.out.reads, RACON.out.improved_assembly, GOAT_TAXONSEARCH.out.genome_size)
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(FLYE.out.versions)
         } else if (params.assembly_secondary_mode== 'ont') {
             mode = "--nano-raw"
             FLYE (input_ont_fastq_1, mode)
             MINIMAP_ALIGN_FLYE (input_ont_fastq_1, FLYE.out.fasta.collect{it[1]}, false, false, false)
-            RACON (input_ont_fastq_1, FLYE.out.fasta.join (MINIMAP_ALIGN_FLYE.out.paf))
-            LONGSTITCH (input_ont_fastq_1, RACON.out.improved_assembly)
+            RACON (input_ont_fastq_1, FLYE.out.fasta, MINIMAP_ALIGN_FLYE.out.paf)
+            LONGSTITCH (input_ont_fastq_1, RACON.out.improved_assembly, GOAT_TAXONSEARCH.out.genome_size)
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(FLYE.out.versions)
         } else if (params.assembly_secondary_mode== 'pacbio+ont') {
             FLYE_PACBIO_ONT (CUTADAPT.out.reads, input_ont_fastq_1)
             MINIMAP_ALIGN_FLYE (input_ont_fastq_1, FLYE_PACBIO_ONT.out.fasta.collect{it[1]}, false, false, false)
-            RACON (input_ont_fastq_1, FLYE_PACBIO_ONT.out.fasta.join (MINIMAP_ALIGN_FLYE.out.paf))
-            LONGSTITCH (input_ont_fastq_1, RACON.out.improved_assembly)
+            RACON (input_ont_fastq_1, FLYE_PACBIO_ONT.out.fasta, MINIMAP_ALIGN_FLYE.out.paf)
+            LONGSTITCH (input_ont_fastq_1, RACON.out.improved_assembly, GOAT_TAXONSEARCH.out.genome_size)
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(FLYE_PACBIO_ONT.out.versions)
+        } else if (params.assembly_secondary_mode== 'clr') {
+            mode = "--pacbio-raw"
+            FLYE (CUTADAPT.out.reads, mode)
+            MINIMAP_ALIGN_FLYE (CUTADAPT.out.reads, FLYE.out.fasta.collect{it[1]}, false, false, false)
+            RACON (CUTADAPT.out.reads, FLYE.out.fasta, MINIMAP_ALIGN_FLYE.out.paf)
+            LONGSTITCH (CUTADAPT.out.reads, RACON.out.improved_assembly, GOAT_TAXONSEARCH.out.genome_size)
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(FLYE.out.versions)
         } else {
-            error "Invalid flye mode: params.assembly_secondary_mode. These modes are currently supported : 'hifi' or 'ont' or 'pacbio+ont'"
+            error "Invalid flye mode: params.assembly_secondary_mode. These modes are currently supported : 'hifi' or 'ont' or 'pacbio+ont' or 'clr'"
         }
-        assembly_primary = LONGSTITCH.out.assembly
+        assembly_primary = LONGSTITCH.out.assembly
+        // Gather versions of all tools used
+        ch_versions = ch_versions.mix(MINIMAP_ALIGN_FLYE.out.versions)
+        ch_versions = ch_versions.mix(RACON.out.versions)
+        ch_versions = ch_versions.mix(LONGSTITCH.out.versions)
     } else if ( params.assembly_method == 'verkko') {
         //VERKKO
         if (params.assembly_secondary_mode == 'pacbio+ont') {
@@ -372,89 +502,182 @@ workflow {
             error "Invalid verkko mode: params.assembly_secondary_mode. These modes are currently supported : 'pacbio', 'ont', 'pacbio+ont'"
         }
         assembly_primary = VERKKO.out.assembly
+        // Gather versions of all tools used
+        ch_versions = ch_versions.mix(VERKKO.out.versions)
     } else {
         error "Invalid alignment method: params.assembly_method. These methods are currently supported : 'hifiasm', 'canu', 'flye', 'verkko'. "
     }
 
     //QC post assembly
-    if ((params.assembly_method == 'hifiasm')&& (params.ploidy != '1')) {
-        QUAST1_DOUBLE (assembly_primary, assembly_alternate, quast_fasta, quast_gff, false, false )
-        mqc_input = mqc_input.mix(QUAST1_DOUBLE.out.tsv)
-        quast_contig = QUAST1_DOUBLE.out.tsv
-        MERQURY1_DOUBLE (MERYL_COUNT.out.meryl_db, assembly_primary, assembly_alternate)
+    if ((params.run_busco == 'yes') && (params.busco_extend == 'every_step')) {
+        if (params.lineage) {
+            BUSCO_lin1_PRIM(assembly_primary, params.lineage, params.busco_lineages_path, [])
+        } else {
+            BUSCO_lin1_PRIM(assembly_primary, 'auto', params.busco_lineages_path, [])
+        }
+        mqc_input = mqc_input.mix(BUSCO_lin1_PRIM.out.short_summaries_txt.collect{it[1]})
+    }
+
+    if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+        QUAST_ASS_DOUBLE (assembly_primary, assembly_alternate, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+        mqc_input = mqc_input.mix(QUAST_ASS_DOUBLE.out.tsv)
+        quast_contig = QUAST_ASS_DOUBLE.out.renamed_tsv
+        MERQURY_ASS_DOUBLE (MERYL_COUNT.out.meryl_db, assembly_primary, assembly_alternate)
+        // Gather versions of all tools used
+        ch_versions = ch_versions.mix(QUAST_ASS_DOUBLE.out.versions)
+        ch_versions = ch_versions.mix(MERQURY_ASS_DOUBLE.out.versions)
     } else {
-        QUAST1 (assembly_primary, quast_fasta, quast_gff, false, false )
-        mqc_input = mqc_input.mix(QUAST1.out.tsv)
-        quast_contig = QUAST1.out.tsv
-        MERQURY1 (MERYL_COUNT.out.meryl_db.join(assembly_primary))
+        QUAST_ASS (assembly_primary, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+        mqc_input = mqc_input.mix(QUAST_ASS.out.tsv)
+        quast_contig = QUAST_ASS.out.renamed_tsv
+        MERQURY_ASS (MERYL_COUNT.out.meryl_db.join(assembly_primary))
+        // Gather versions of all tools used
+        ch_versions = ch_versions.mix(QUAST_ASS.out.versions)
+        ch_versions = ch_versions.mix(MERQURY_ASS.out.versions)
     }
 
     //Polishing (likely not going to happen, only for primary assembly for now)
     if (params.polishing_method == 'pilon') {
-        BWAMEM2_INDEX(assembly_primary)
-        BWAMEM2_MEM(input_illumina_SR_R1_R2, BWAMEM2_INDEX.out.index, true)
-        SAMTOOLS_INDEX(BWAMEM2_MEM.out.bam)
+        BWAMEM2_INDEX_PILON(assembly_primary)
+        BWAMEM2_MEM_PILON(input_illumina_SR_R1_R2, BWAMEM2_INDEX_PILON.out.index, true)
+        SAMTOOLS_INDEX_PILON(BWAMEM2_MEM_PILON.out.bam)
         pilon_mode="--frags"
-        PILON(assembly_primary, pilon_mode, BWAMEM2_MEM.out.bam.join(SAMTOOLS_INDEX.out.bai))
+        PILON(assembly_primary, pilon_mode, BWAMEM2_MEM_PILON.out.bam.join(SAMTOOLS_INDEX_PILON.out.bai))
         assembly_polished = PILON.out.improved_assembly
-        QUAST_PILON(assembly_polished, quast_fasta, quast_gff, false, false )
+        QUAST_PILON(assembly_polished, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
         mqc_input = mqc_input.mix(QUAST_PILON.out.tsv)
-        assembly_unpurged = assembly_polished
+        ch_versions = ch_versions.mix(BWAMEM2_INDEX_PILON.out.versions)
+        ch_versions = ch_versions.mix(BWAMEM2_MEM_PILON.out.versions)
+        ch_versions = ch_versions.mix(SAMTOOLS_INDEX_PILON.out.versions)
+        ch_versions = ch_versions.mix(PILON.out.versions)
+
+        assembly_unpurged = assembly_polished
     } else {
-        assembly_unpurged = assembly_primary
+        assembly_unpurged = assembly_primary
+    }
+
+    if (params.fcs == 'yes') {
+        //Assembly cleaning
+        FCS_FCSADAPTOR_hap1(assembly_unpurged)
+        FCS_FCSGX_hap1(FCS_FCSADAPTOR_hap1.out.cleaned_assembly)
+        FCS_FCSGX_CLEAN_hap1(FCS_FCSADAPTOR_hap1.out.cleaned_assembly, FCS_FCSGX_hap1.out.fcs_gx_report)
+        SAMTOOLS_FAIDX_FCS_hap1(FCS_FCSGX_CLEAN_hap1.out.cleaned_fasta)
+        ch_versions = ch_versions.mix(FCS_FCSADAPTOR_hap1.out.versions)
+        ch_versions = ch_versions.mix(FCS_FCSGX_hap1.out.versions)
+        ch_versions = ch_versions.mix(FCS_FCSGX_CLEAN_hap1.out.versions)
+
+        cleaned_hap1 = FCS_FCSGX_CLEAN_hap1.out.cleaned_fasta
+        cleaned_hap1_index = SAMTOOLS_FAIDX_FCS_hap1.out.fai
+
+        if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+            FCS_FCSADAPTOR_ALT(assembly_alternate)
+            FCS_FCSGX_ALT(FCS_FCSADAPTOR_ALT.out.cleaned_assembly)
+            FCS_FCSGX_CLEAN_ALT(FCS_FCSADAPTOR_ALT.out.cleaned_assembly, FCS_FCSGX_ALT.out.fcs_gx_report)
+            SAMTOOLS_FAIDX_FCS_hap2(FCS_FCSGX_CLEAN_ALT.out.cleaned_fasta)
+            cleaned_hap2 = FCS_FCSGX_CLEAN_ALT.out.cleaned_fasta
+            cleaned_hap2_index = SAMTOOLS_FAIDX_FCS_hap2.out.fai
+        }
+    } else {
+        cleaned_hap1 = assembly_unpurged
+        SAMTOOLS_FAIDX_FCS_hap1(cleaned_hap1)
+        cleaned_hap1_index = SAMTOOLS_FAIDX_FCS_hap1.out.fai
+        if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+            cleaned_hap2 = assembly_alternate
+            SAMTOOLS_FAIDX_FCS_hap2(cleaned_hap2)
+            cleaned_hap2_index = SAMTOOLS_FAIDX_FCS_hap2.out.fai
+        }
     }
-
-    //PurgeDups for primary assembly
-    PURGEDUPS_SPLITFA (assembly_unpurged)
-    MINIMAP2_ALIGN_TO_CONTIG (CUTADAPT.out.reads, assembly_unpurged.collect{it[1]}, false, false, false)
-    MINIMAP2_ALIGN_TO_SELF (PURGEDUPS_SPLITFA.out.split_fasta, [], false, false, false)
-    PURGEDUPS_PBCSTAT (MINIMAP2_ALIGN_TO_CONTIG.out.paf)
-    PURGEDUPS_CALCUTS (PURGEDUPS_PBCSTAT.out.stat)
-    PURGEDUPS_PURGEDUPS (
-        PURGEDUPS_PBCSTAT.out.basecov
-            .join (PURGEDUPS_CALCUTS.out.cutoff )
-            .join (MINIMAP2_ALIGN_TO_SELF.out.paf )
-    )
-    PURGEDUPS_GETSEQS (assembly_unpurged.join(PURGEDUPS_PURGEDUPS.out.bed))
-    SAMTOOLS_FAIDX1 (PURGEDUPS_GETSEQS.out.purged)
-    purged_primary = PURGEDUPS_GETSEQS.out.purged
-
-    if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-        //Merge haplotig from purge_dups and alternate assembly from hifiasm
-        CAT (assembly_alternate, PURGEDUPS_GETSEQS.out.haplotigs)
-        PURGEDUPS_SPLITFA_ALT (CAT.out.alternate_contigs_full)
-        MINIMAP2_ALIGN_TO_CONTIG_ALT (CUTADAPT.out.reads, CAT.out.alternate_contigs_full.collect{it[1]}, false, false, false)
-        MINIMAP2_ALIGN_TO_SELF_ALT (PURGEDUPS_SPLITFA_ALT.out.split_fasta, [], false, false, false)
-        PURGEDUPS_PBCSTAT_ALT (MINIMAP2_ALIGN_TO_CONTIG_ALT.out.paf)
-        PURGEDUPS_CALCUTS_ALT (PURGEDUPS_PBCSTAT_ALT.out.stat)
-        PURGEDUPS_PURGEDUPS_ALT (
-            PURGEDUPS_PBCSTAT_ALT.out.basecov
-                .join (PURGEDUPS_CALCUTS_ALT.out.cutoff )
-                .join (MINIMAP2_ALIGN_TO_SELF_ALT.out.paf )
-        )
-        PURGEDUPS_GETSEQS_ALT (assembly_alternate.join(PURGEDUPS_PURGEDUPS_ALT.out.bed))
-        SAMTOOLS_FAIDX1_ALT (PURGEDUPS_GETSEQS_ALT.out.purged)
-        purged_alternate = PURGEDUPS_GETSEQS_ALT.out.purged
-
-        QUAST2_DOUBLE (purged_primary, purged_alternate, quast_fasta, quast_gff, false, false )
-        mqc_input = mqc_input.mix(QUAST2_DOUBLE.out.tsv)
-        quast_contig_purged=QUAST2_DOUBLE.out.tsv
-        MERQURY2_DOUBLE(MERYL_COUNT.out.meryl_db, purged_primary, purged_alternate)
+    //QC on cleaned contig assemblies
+    if ((params.run_busco == 'yes') && (params.busco_extend == 'every_step')) {
+        if (params.lineage) {
+            BUSCO_lin1_cleaned(cleaned_hap1, params.lineage, params.busco_lineages_path, [])
+        } else {
+            BUSCO_lin1_cleaned(cleaned_hap1, 'auto', params.busco_lineages_path, [])
+        }
+        mqc_input = mqc_input.mix(BUSCO_lin1_cleaned.out.short_summaries_txt.collect{it[1]})
+    }
+    if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+        QUAST_CLEAN_DOUBLE (cleaned_hap1, cleaned_hap2, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+        mqc_input = mqc_input.mix(QUAST_CLEAN_DOUBLE.out.tsv)
     } else {
-        QUAST2 (purged_primary, quast_fasta, quast_gff, false, false )
-        mqc_input = mqc_input.mix(QUAST2.out.tsv)
-        quast_contig_purged = QUAST2.out.tsv
-        MERQURY2(MERYL_COUNT.out.meryl_db.join(purged_primary))
+        QUAST_CLEAN (cleaned_hap1, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+        mqc_input = mqc_input.mix(QUAST_CLEAN.out.tsv)
     }
 
+    if (params.purging_method == 'purge_dups') {
+        //PurgeDups for primary assembly
+        PURGEDUPS_SPLITFA_hap1 (cleaned_hap1)
+        MINIMAP2_ALIGN_TO_CONTIG (CUTADAPT.out.reads, cleaned_hap1.collect{it[1]}, false, false, false)
+        MINIMAP2_ALIGN_TO_SELF (PURGEDUPS_SPLITFA_hap1.out.split_fasta, [], false, false, false)
+        PURGEDUPS_PBCSTAT_hap1 (MINIMAP2_ALIGN_TO_CONTIG.out.paf)
+        PURGEDUPS_CALCUTS_hap1 (PURGEDUPS_PBCSTAT_hap1.out.stat)
+        PURGEDUPS_PURGEDUPS_hap1 (PURGEDUPS_PBCSTAT_hap1.out.basecov.join (PURGEDUPS_CALCUTS_hap1.out.cutoff), MINIMAP2_ALIGN_TO_SELF.out.paf )
+        PURGEDUPS_GETSEQS_hap1 (cleaned_hap1, PURGEDUPS_PURGEDUPS_hap1.out.bed)
+        SAMTOOLS_FAIDX_PURGE_hap1 (PURGEDUPS_GETSEQS_hap1.out.purged)
+        purged_hap1 = PURGEDUPS_GETSEQS_hap1.out.purged
+        purged_hap1_index = SAMTOOLS_FAIDX_PURGE_hap1.out.fai
+        ch_versions = ch_versions.mix(PURGEDUPS_SPLITFA_hap1.out.versions)
+        ch_versions = ch_versions.mix(MINIMAP2_ALIGN_TO_CONTIG.out.versions)
+        ch_versions = ch_versions.mix(MINIMAP2_ALIGN_TO_SELF.out.versions)
+        ch_versions = ch_versions.mix(PURGEDUPS_PBCSTAT_hap1.out.versions)
+        ch_versions = ch_versions.mix(PURGEDUPS_CALCUTS_hap1.out.versions)
+        ch_versions = ch_versions.mix(PURGEDUPS_PURGEDUPS_hap1.out.versions)
+        ch_versions = ch_versions.mix(PURGEDUPS_GETSEQS_hap1.out.versions)
+        ch_versions = ch_versions.mix(SAMTOOLS_FAIDX_PURGE_hap1.out.versions)
+
+        if (params.run_busco == 'yes') {
+            if (params.lineage) {
+                BUSCO_lin1_purged(purged_hap1, params.lineage, params.busco_lineages_path, [])
+            } else {
+                BUSCO_lin1_purged(purged_hap1, 'auto', params.busco_lineages_path, [])
+            }
+            mqc_input = mqc_input.mix(BUSCO_lin1_purged.out.short_summaries_txt.collect{it[1]})
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(BUSCO_lin1_purged.out.versions)
+        }
+
+        if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+            //Merge haplotig from purge_dups and alternate assembly from hifiasm
+            CAT (cleaned_hap2, PURGEDUPS_GETSEQS_hap1.out.haplotigs)
+            PURGEDUPS_SPLITFA_ALT (CAT.out.alternate_contigs_full)
+            MINIMAP2_ALIGN_TO_CONTIG_ALT (CUTADAPT.out.reads, CAT.out.alternate_contigs_full.collect{it[1]}, false, false, false)
+            MINIMAP2_ALIGN_TO_SELF_ALT (PURGEDUPS_SPLITFA_ALT.out.split_fasta, [], false, false, false)
+            PURGEDUPS_PBCSTAT_ALT (MINIMAP2_ALIGN_TO_CONTIG_ALT.out.paf)
+            PURGEDUPS_CALCUTS_ALT (PURGEDUPS_PBCSTAT_ALT.out.stat)
+            PURGEDUPS_PURGEDUPS_ALT (PURGEDUPS_PBCSTAT_ALT.out.basecov.join (PURGEDUPS_CALCUTS_ALT.out.cutoff), MINIMAP2_ALIGN_TO_SELF_ALT.out.paf )
+            PURGEDUPS_GETSEQS_ALT (CAT.out.alternate_contigs_full, PURGEDUPS_PURGEDUPS_ALT.out.bed)
+            SAMTOOLS_FAIDX_PURGE_hap2 (PURGEDUPS_GETSEQS_ALT.out.purged)
+            purged_hap2 = PURGEDUPS_GETSEQS_ALT.out.purged
+            purged_hap2_index = SAMTOOLS_FAIDX_PURGE_hap2.out.fai
+
+            QUAST_PURGED_DOUBLE (purged_hap1, purged_hap2, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+            mqc_input = mqc_input.mix(QUAST_PURGED_DOUBLE.out.tsv)
+            quast_contig_purged=QUAST_PURGED_DOUBLE.out.renamed_tsv
+            MERQURY_PURGED_DOUBLE(MERYL_COUNT.out.meryl_db, purged_hap1, purged_hap2)
+        } else {
+            QUAST_PURGED (purged_hap1, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+            mqc_input = mqc_input.mix(QUAST_PURGED.out.tsv)
+            quast_contig_purged = QUAST_PURGED.out.renamed_tsv
+            MERQURY_PURGED(MERYL_COUNT.out.meryl_db.join(purged_hap1))
+        }
+    } else {
+        purged_hap1 = cleaned_hap1
+        purged_hap1_index = cleaned_hap1_index
+        quast_contig_purged = file('quast_purged_dummy')
+        if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+            purged_hap2 = cleaned_hap2
+            purged_hap2_index = cleaned_hap2_index
+        }
+    }
+
     //Only if HiC data is available
     if (( params.hic_read1 ) && ( params.hic_read2 )) {
         //HIC scaffolding: The method is selected in the parameters : salsa or yahs
         if ( params.scaffolding_method == "salsa") {
             // For SALSA2, need to run nf-core/hic
-            PREPARE_GENOME (PURGEDUPS_GETSEQS.out.purged, params.restriction_site)
+            PREPARE_GENOME (purged_hap1, params.restriction_site)
             FASTQC (input_hic_R1_R2)
             ch_map_res = Channel.from( params.bin_size ).splitCsv().flatten().toInteger()
             HICPRO (input_hic_R1_R2, PREPARE_GENOME.out.index, PREPARE_GENOME.out.res_frag, PREPARE_GENOME.out.chromosome_size, params.ligation_site, ch_map_res)
@@ -471,7 +694,7 @@ workflow {
                 .filter{ it[0].resolution == it[2] }
                 .map { it -> [it[0], it[1], it[2]]}
                 .set{ ch_cool_compartments }
-            COMPARTMENTS (ch_cool_compartments, PURGEDUPS_GETSEQS.out.purged, PREPARE_GENOME.out.chromosome_size)
+            COMPARTMENTS (ch_cool_compartments, purged_hap1, PREPARE_GENOME.out.chromosome_size)
             COOLER.out.cool
                 .combine(ch_tads_res)
                 .filter{ it[0].resolution == it[2] }
@@ -489,11 +712,11 @@ workflow {
             //Then start the pipeline again.
 
             BED_PROCESSING(HICPRO.out.bam)
-            SALSA2 (PURGEDUPS_GETSEQS.out.purged.join(SAMTOOLS_FAIDX1.out.fai), BED_PROCESSING.out.sorted_bed.collect{it[1]}, [], [], [] )
-            SAMTOOLS_FAIDX2 (SALSA2.out.fasta)
+            SALSA2 (purged_hap1.join(purged_hap1_index), BED_PROCESSING.out.sorted_bed.collect{it[1]}, [], [], [] )
+            SAMTOOLS_FAIDX_SCAFF_hap1 (SALSA2.out.fasta)
             scaffold = SALSA2.out.fasta
             scaffold_agp = SALSA2.out.agp
-            scaffold_index = SAMTOOLS_FAIDX2.out.fai
+            scaffold_index = SAMTOOLS_FAIDX_SCAFF_hap1.out.fai
 
             SALSA2_JUICER (
                 scaffold
@@ -503,71 +726,139 @@ workflow {
                    .join(SALSA2.out.scaffold_length_iteration_1)
             )
         } else if ( params.scaffolding_method == "yahs") {
-            CHROMAP_INDEX(PURGEDUPS_GETSEQS.out.purged)
-            CHROMAP_CHROMAP(input_hic_R1_R2, PURGEDUPS_GETSEQS.out.purged, CHROMAP_INDEX.out.index, [],[],[],[])
-            YAHS(PURGEDUPS_GETSEQS.out.purged, SAMTOOLS_FAIDX1.out.fai, CHROMAP_CHROMAP.out.bam)
-            SAMTOOLS_FAIDX2(YAHS.out.fasta)
-
-            scaffold = YAHS.out.fasta
-            scaffold_agp = YAHS.out.agp
-            scaffold_bin = YAHS.out.bin
-            scaffold_index = SAMTOOLS_FAIDX2.out.fai
-
-            if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-                CHROMAP_INDEX_ALT(purged_alternate)
-                CHROMAP_CHROMAP_ALT(input_hic_R1_R2, purged_alternate, CHROMAP_INDEX_ALT.out.index, [],[],[],[])
-                YAHS_ALT(purged_alternate, SAMTOOLS_FAIDX1_ALT.out.fai, CHROMAP_CHROMAP_ALT.out.bam)
-                SAMTOOLS_FAIDX2_ALT(YAHS_ALT.out.fasta)
+            CHROMAP_INDEX_hap1(purged_hap1)
+            CHROMAP_CHROMAP_hap1(input_hic_R1_R2, purged_hap1, CHROMAP_INDEX_hap1.out.index, [],[],[],[])
+            YAHS_hap1(purged_hap1, purged_hap1_index, CHROMAP_CHROMAP_hap1.out.bam)
+            SAMTOOLS_FAIDX_SCAFF_hap1(YAHS_hap1.out.fasta)
+
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(CHROMAP_INDEX_hap1.out.versions)
+            ch_versions = ch_versions.mix(CHROMAP_CHROMAP_hap1.out.versions)
+            ch_versions = ch_versions.mix(YAHS_hap1.out.versions)
+            ch_versions = ch_versions.mix(SAMTOOLS_FAIDX_SCAFF_hap1.out.versions)
+
+            scaffold = YAHS_hap1.out.fasta
+            scaffold_agp = YAHS_hap1.out.agp
+            scaffold_bin = YAHS_hap1.out.bin
+            scaffold_index = SAMTOOLS_FAIDX_SCAFF_hap1.out.fai
+
+            if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+                CHROMAP_INDEX_ALT(purged_hap2)
+                CHROMAP_CHROMAP_ALT(input_hic_R1_R2, purged_hap2, CHROMAP_INDEX_ALT.out.index, [],[],[],[])
+                YAHS_ALT(purged_hap2, purged_hap2_index, CHROMAP_CHROMAP_ALT.out.bam)
+                SAMTOOLS_FAIDX_SCAFF_hap2(YAHS_ALT.out.fasta)
                 scaffold_alt = YAHS_ALT.out.fasta
                 scaffold_agp_alt = YAHS_ALT.out.agp
                 scaffold_bin_alt = YAHS_ALT.out.bin
-                scaffold_index_alt = SAMTOOLS_FAIDX2_ALT.out.fai
+                scaffold_index_alt = SAMTOOLS_FAIDX_SCAFF_hap2.out.fai
             }
         } else {
            error "Invalid alignment mode: params.scaffolding_method "
        }
 
         //Scaffold QC
-        if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-            QUAST3_DOUBLE (scaffold, scaffold_alt, quast_fasta, quast_gff, false, false )
-            mqc_input = mqc_input.mix(QUAST3_DOUBLE.out.tsv)
-            quast_scaffold = QUAST3_DOUBLE.out.tsv
-            MERQURY3_DOUBLE(MERYL_COUNT.out.meryl_db, scaffold, scaffold_alt)
+        if (params.run_busco == 'yes') {
+            if (params.lineage) {
+                BUSCO_lin1_SCAFF(scaffold, params.lineage, params.busco_lineages_path, [])
+            } else {
+                BUSCO_lin1_SCAFF(scaffold, 'auto', params.busco_lineages_path, [])
+            }
+            mqc_input = mqc_input.mix(BUSCO_lin1_SCAFF.out.short_summaries_txt.collect{it[1]})
+        }
+        if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+            QUAST_SCAFF_DOUBLE (scaffold, scaffold_alt, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+            mqc_input = mqc_input.mix(QUAST_SCAFF_DOUBLE.out.tsv)
+            quast_scaffold = QUAST_SCAFF_DOUBLE.out.renamed_tsv
+            MERQURY_SCAFF_DOUBLE(MERYL_COUNT.out.meryl_db, scaffold, scaffold_alt)
         } else {
-            QUAST3 (scaffold, quast_fasta, quast_gff, false, false )
-            mqc_input = mqc_input.mix(QUAST3.out.tsv)
-            quast_scaffold = QUAST3.out.tsv
-            MERQURY3(MERYL_COUNT.out.meryl_db.join(scaffold))
+            QUAST_SCAFF (scaffold, quast_fasta, quast_gff, false, false, GOAT_TAXONSEARCH.out.genome_size)
+            mqc_input = mqc_input.mix(QUAST_SCAFF.out.tsv)
+            quast_scaffold = QUAST_SCAFF.out.renamed_tsv
+            MERQURY_SCAFF(MERYL_COUNT.out.meryl_db.join(scaffold))
         }
-
-        // JUICER must have contig fai for scaffold assembly
-        YAHS_JUICER (scaffold_agp.join(scaffold_bin), SAMTOOLS_FAIDX1.out.fai)
+
+        if (params.methylation_calling == 'yes') {
+            //Map PacBio data against newly generated assembly
+            JASMINE (final_pacBio_bam)
+            ch_versions = ch_versions.mix(JASMINE.out.versions)
+            PBMM2 (JASMINE.out.cpg_bam, scaffold)
+        } else {
+            PBMM2 (final_pacBio_bam, scaffold)
+        }
+        SAMTOOLS_INDEX_PBMM2 (PBMM2.out.aligned_bam)
+        // Gather versions of all tools used
+        ch_versions = ch_versions.mix(PBMM2.out.versions)
+        ch_versions = ch_versions.mix(SAMTOOLS_INDEX_PBMM2.out.versions)
+
+
+        // JUICER must have contig fai for scaffold assembly
+        YAHS_JUICER (scaffold_agp, scaffold_bin, purged_hap1_index)
         chrom_size = YAHS_JUICER.out.chrom_sizes
-//        JUICER(YAHS_JUICER.out.chrom_sizes, YAHS_JUICER.out.alignments_sorted_txt)
-        PRETEXTMAP(YAHS_JUICER.out.chrom_sizes, YAHS_JUICER.out.alignments_sorted_txt)
-        PRETEXTSNAPSHOT (PRETEXTMAP.out.pretext)
-
-//Blobtoolkit is commented out as it requires some local installation
-/*
-        //BLOBTOOLSKIT
+        // Gather versions of all tools used
+        ch_versions = ch_versions.mix(YAHS_JUICER.out.versions)
+
+        GFASTATS(scaffold)
+        //Identify telomere sequences
+        TIDK(scaffold)
+        //Calculate Pacbio coverage and output a bedgraph
+        BEDTOOLS_GENOMECOV(PBMM2.out.aligned_bam, '1', [], 'bedgraph')
+        ch_versions = ch_versions.mix(GFASTATS.out.versions)
+        ch_versions = ch_versions.mix(TIDK.out.versions)
+
+        if (params.pretext == 'yes'){
+            //PRETEXT
+            PRETEXTMAP(YAHS_JUICER.out.chrom_sizes, YAHS_JUICER.out.alignments_sorted_txt)
+            PRETEXTSNAPSHOT (PRETEXTMAP.out.pretext)
+            //Add the telomere bedgraph to pretextgraph
+            PRETEXTGRAPH_TELO(PRETEXTMAP.out.pretext, TIDK.out.bedgraph_telomere, 'telomere')
+            //Add the coverage bedgraph to pretextgraph
+            PRETEXTGRAPH_TELO_COV(PRETEXTGRAPH_TELO.out.pretext, BEDTOOLS_GENOMECOV.out.genomecov, 'coverage')
+
+            ch_versions = ch_versions.mix(PRETEXTMAP.out.versions)
+            ch_versions = ch_versions.mix(PRETEXTSNAPSHOT.out.versions)
+            ch_versions = ch_versions.mix(PRETEXTGRAPH_TELO.out.versions)
+        }
+
+        if ((params.juicer == 'yes')) {
+            // JUICER must have contig fai for scaffold assembly
+            JUICER(YAHS_JUICER.out.chrom_sizes, YAHS_JUICER.out.alignments_sorted_txt)
+            // Gather versions of all tools used
+            ch_versions = ch_versions.mix(JUICER.out.versions)
+        }
+
+
+        //Genome comparison
+        if (params.genome_comparison == 'yes') {
+            input_ncbi = [ [ id:params.related_genome, single_end:true ] ]
+            NCBIGENOMEDOWNLOAD(input_ncbi, [])
+            JUPITER(scaffold, NCBIGENOMEDOWNLOAD.out.fna)
+            MASHMAP(scaffold, NCBIGENOMEDOWNLOAD.out.fna)
+            ch_versions = ch_versions.mix(NCBIGENOMEDOWNLOAD.out.versions)
+            ch_versions = ch_versions.mix(JUPITER.out.versions)
+            ch_versions = ch_versions.mix(MASHMAP.out.versions)
+        }
+
         GZIP(scaffold)
-        if( params.bam_cell4 ) {
-            BLOBTOOLS_CONFIG(GZIP.out.gz, FOURBAM2FASTX.out.reads)
-        } else if( params.bam_cell3 ) {
-            BLOBTOOLS_CONFIG(GZIP.out.gz, THREEBAM2FASTX.out.reads)
-        } else if( params.bam_cell2 ){
-            BLOBTOOLS_CONFIG(GZIP.out.gz, TWOBAM2FASTX.out.reads)
-        } else {
-            BLOBTOOLS_CONFIG(GZIP.out.gz, BAM2FASTX.out.reads)
+
+        if ((params.blobtools == 'yes') && (params.lineage)) {
+            if (params.lineage2) {
+                BLOBTOOLS_CONFIG_2LINEAGES(GZIP.out.gz, bam2fastx_output)
+                blobtools_config=BLOBTOOLS_CONFIG_2LINEAGES.out.config
+            } else {
+                BLOBTOOLS_CONFIG_1LINEAGE(GZIP.out.gz, bam2fastx_output)
+                blobtools_config=BLOBTOOLS_CONFIG_1LINEAGE.out.config
+            }
+            BLOBTOOLS_PIPELINE(blobtools_config, GZIP.out.gz)
+            BLOBTOOLS_CREATE(scaffold, blobtools_config)
+            BLOBTOOLS_ADD(BLOBTOOLS_PIPELINE.out.blast_out, BLOBTOOLS_PIPELINE.out.diamond_proteome_out, BLOBTOOLS_PIPELINE.out.diamond_busco_out, BLOBTOOLS_PIPELINE.out.assembly_minimap_bam, BLOBTOOLS_PIPELINE.out.hic_minimap_bam, BLOBTOOLS_PIPELINE.out.lineage1_full_table_tsv, BLOBTOOLS_CREATE.out.blobtools_folder)
+            BLOBTOOLS_VIEW(BLOBTOOLS_ADD.out.blobtools_folder)
+
+            ch_versions = ch_versions.mix(BLOBTOOLS_PIPELINE.out.versions)
         }
-        BLOBTOOLS_PIPELINE(BLOBTOOLS_CONFIG.out.config, GZIP.out.gz)
-        BLOBTOOLS_CREATE(scaffold, BLOBTOOLS_CONFIG.out.config)
-        BLOBTOOLS_ADD(BLOBTOOLS_PIPELINE.out.blast_out, BLOBTOOLS_PIPELINE.out.diamond_proteome_out, BLOBTOOLS_PIPELINE.out.diamond_busco_out, BLOBTOOLS_PIPELINE.out.assembly_minimap_bam, BLOBTOOLS_PIPELINE.out.hic_minimap_bam , BLOBTOOLS_PIPELINE.out.lineage1_full_table_tsv , BLOBTOOLS_PIPELINE.out.lineage2_full_table_tsv, BLOBTOOLS_CREATE.out.blobtools_folder)
-        BLOBTOOLS_VIEW_SNAIL(BLOBTOOLS_ADD.out.blobtools_folder)
-        BLOBTOOLS_VIEW_BLOB(BLOBTOOLS_ADD.out.blobtools_folder)
-        BLOBTOOLS_VIEW_CUMULATIVE(BLOBTOOLS_ADD.out.blobtools_folder)
-*/
+
+        RAPIDCURATION_SPLIT(scaffold)
+        ch_versions = ch_versions.mix(RAPIDCURATION_SPLIT.out.versions)
     } else {
         quast_scaffold = file('quast_scaffold_dummy')
         chrom_size = [
@@ -576,38 +867,47 @@ workflow {
        ]
     }
 
-
-//BUSCO is commented out as it requires local database
-/*
-    //Busco ran only once on the most final assembly (scaffold > purged > contig)
+    //BUSCO is run on all lineages for hap 1 only for the most final assembly (scaffold > purged > contig)
+    //BUSCO is run on the principal lineage for hap2 only for the most final assembly (scaffold > purged > contig)
     //Define which file to run Busco on
     if (( params.hic_read1 ) && ( params.hic_read2 )) {
         busco_assembly = scaffold
-        if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
+        if ((params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
            busco_assembly_alt = scaffold_alt
        }
     } else {
         busco_assembly = purged_primary
-        if (( params.assembly_method == 'hifiasm' ) && (params.ploidy != '1')) {
+        if (( params.assembly_method == 'hifiasm' ) && (params.hap2 == 'yes')) {
            busco_assembly_alt = purged_alternate
        }
     }
+    if (params.run_busco == 'yes') {
+        if (( params.hic_read1 ) && ( params.hic_read2 )) {
+            busco_lin1_json=BUSCO_lin1_SCAFF.out.short_summaries_json
+        } else {
+            busco_lin1_json=BUSCO_lin1_purged.out.short_summaries_json
+        }
+    } else {
+        busco_lin1_json = [
+            [ id:'dummy', single_end: true], // meta map
+            [ file('busco_lin1_json_dummy')]
+        ]
+    }
 
-    BUSCO (busco_assembly, params.lineage, params.busco_lineages_path, [])
-    mqc_input = mqc_input.mix(BUSCO.out.short_summaries_txt.collect{it[1]})
-    if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-        BUSCO_ALT (busco_assembly_alt, params.lineage, params.busco_lineages_path, [])
+    if ((params.run_busco == 'yes') && (params.busco_extend == 'every_step') && (params.assembly_method == 'hifiasm') && (params.hap2 == 'yes')) {
+        if (params.lineage) {
+            BUSCO_ALT (busco_assembly_alt, params.lineage, params.busco_lineages_path, [])
+        } else {
+            BUSCO_ALT (busco_assembly_alt, 'auto', params.busco_lineages_path, [])
+        }
         mqc_input = mqc_input.mix(BUSCO_ALT.out.short_summaries_txt.collect{it[1]})
     }
+
     if (params.lineage2) {
-        BUSCO_lin2(busco_assembly, params.lineage2, params.busco_lineages_path, [])
-        mqc_input = mqc_input.mix(BUSCO_lin2.out.short_summaries_txt.collect{it[1]})
+        BUSCO_lin2(busco_assembly, params.lineage2, params.busco_lineages_path, [])
+        mqc_input = mqc_input.mix(BUSCO_lin2.out.short_summaries_txt.collect{it[1]})
         busco_lin2_json = BUSCO_lin2.out.short_summaries_json
-//        if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-//            BUSCO_lin2ALT (busco_assembly_alt, params.lineage2, params.busco_lineages_path, [])
-//            mqc_input = mqc_input.mix(BUSCO_lin2ALT.out.short_summaries_txt.collect{it[1]})
-//        }
     } else {
         busco_lin2_json = [
            [ id:'dummy', single_end: true], // meta map
@@ -618,10 +918,6 @@ workflow {
         BUSCO_lin3(busco_assembly, params.lineage3, params.busco_lineages_path, [])
         mqc_input = mqc_input.mix(BUSCO_lin3.out.short_summaries_txt.collect{it[1]})
         busco_lin3_json = BUSCO_lin3.out.short_summaries_json
-//        if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-//            BUSCO_lin3ALT (busco_assembly_alt, params.lineage3, params.busco_lineages_path, [])
-//            mqc_input = mqc_input.mix(BUSCO_lin3ALT.out.short_summaries_txt.collect{it[1]})
-//        }
     } else {
         busco_lin3_json = [
            [ id:'dummy', single_end: true], // meta map
@@ -632,20 +928,26 @@ workflow {
         BUSCO_lin4(busco_assembly, params.lineage4, params.busco_lineages_path, [])
         mqc_input = mqc_input.mix(BUSCO_lin4.out.short_summaries_txt.collect{it[1]})
         busco_lin4_json = BUSCO_lin4.out.short_summaries_json
-//        if ((params.assembly_method == 'hifiasm') && (params.ploidy != '1')) {
-//            BUSCO_lin4ALT (busco_assembly_alt, params.lineage4, params.busco_lineages_path, [])
-//            mqc_input = mqc_input.mix(BUSCO_lin4ALT.out.short_summaries_txt.collect{it[1]})
-//        }
     } else {
         busco_lin4_json = [
            [ id:'dummy', single_end: true], // meta map
            [ file('busco_lin4_json_dummy')]
        ]
     }
-*/
+
+    // Gather versions of all tools used
+    ch_version_yaml = Channel.empty()
+    CUSTOM_DUMPSOFTWAREVERSIONS(ch_versions.unique().collectFile(name: 'collated_versions.yml'))
+    ch_version_yaml = CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect()
 
     //MultiQC report
+    mqc_input = mqc_input.mix(ch_version_yaml)
     MULTIQC (mqc_input.collect(), [], [], [], COVERAGE_CALCULATION.out.coverage)
+    ch_versions = ch_versions.mix(MULTIQC.out.versions)
+
+//    OVERVIEW_GENERATION_SAMPLE(LONGQC_PACBIO.out.report_json, kraken_pacbio, kraken_hic, quast_contig, quast_contig_purged, quast_scaffold, busco_lin1_json, busco_lin2_json, busco_lin3_json, busco_lin4_json, chrom_size, GOAT_TAXONSEARCH.out.ploidy, GOAT_TAXONSEARCH.out.haploid_number, GOAT_TAXONSEARCH.out.scientific_name, GOAT_TAXONSEARCH.out.genome_size)
+
+
 }
diff --git a/modules/LongQC/main.nf b/modules/LongQC/main.nf
index 3a2afef..d472ab9 100644
--- a/modules/LongQC/main.nf
+++ b/modules/LongQC/main.nf
@@ -10,25 +10,31 @@ process LONGQC {
     output:
     tuple val(meta), path('*.html'), emit: report
    tuple val(meta), path('*.json'), emit: report_json
+    path "versions.yml"            , emit: versions
 
     script:
     def args = task.ext.args ?: ''
     def prefix = task.ext.prefix ?: "${meta.id}"
     """
     python \\
-        $args \\
-        longQC.py \\
-        sampleqc \\
-        -x pb-sequel \\
-        -o LongQC \\
-        --sample_name ${meta.id} \\
-        -p 32 \\
-        ${reads} \\
+        ${params.singularity_cache}/LongQC/longQC.py \\
+        sampleqc \\
+        -o LongQC \\
+        --sample_name ${meta.id} \\
+        -p 32 \\
+        $args \\
+        ${reads} \\
         > LongQC.log
 
     mv LongQC/*.html .
     mv LongQC/*.json .
     rm LongQC/analysis/*.fastq
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        python: \$(python --version | sed 's/Python //g')
+        longqc: \$(python ${params.singularity_cache}/LongQC/longQC.py --version | sed 's/LongQC //g')
+    END_VERSIONS
     """
 }
diff --git a/modules/bed_processing/main.nf b/modules/bed_processing/main.nf
index 58cd147..1575d0b 100644
--- a/modules/bed_processing/main.nf
+++ b/modules/bed_processing/main.nf
@@ -2,20 +2,27 @@ process BED_PROCESSING {
     tag "$meta.id"
     label 'process_high'
 
-    conda "bioconda::bedtools=2.30.0"
-    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/bedtools:2.30.0--hc088bd4_0' :
-        'quay.io/biocontainers/bedtools:2.30.0--hc088bd4_0' }"
-
     input:
     tuple val(meta), path(bam)
 
     output:
     tuple val(meta), path("*_sorted.bed"), emit: sorted_bed
+    path "versions.yml"                  , emit: versions
 
     script:
     """
+    #awk '{print \$1"\t"\$2"\t"\$3"\t"\$4"/1""\t"\$5"\t"\$6}' $bed_F > F.bed
+    #awk '{print \$1"\t"\$2"\t"\$3"\t"\$4"/2""\t"\$5"\t"\$6}' $bed_R > R.bed
+    #cat F.bed R.bed > concat.bed
+    #for BAM in `ls $projectDir/results/hicpro/mapping/*.bam`;
+    #do bedtools bamtobed -i $BAM > merged_bed
+
     bedtools bamtobed -i $bam > merged_bed
     sort --parallel=8 --buffer-size=80% --temporary-directory=$projectDir --output=bed_sorted.bed \$merged_bed
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
+    END_VERSIONS
     """
 }
diff --git a/modules/bedtools/bamtobed/main.nf b/modules/bedtools/bamtobed/main.nf
index e967357..92b5d48 100644
--- a/modules/bedtools/bamtobed/main.nf
+++ b/modules/bedtools/bamtobed/main.nf
@@ -2,7 +2,7 @@ process BEDTOOLS_BAMTOBED {
     tag "$meta.id"
     label 'process_medium'
 
-    conda "bioconda::bedtools=2.30.0"
+//    conda "bioconda::bedtools=2.30.0"
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
         'https://depot.galaxyproject.org/singularity/bedtools:2.30.0--hc088bd4_0' :
         'quay.io/biocontainers/bedtools:2.30.0--hc088bd4_0' }"
diff --git a/modules/bedtools/genomecov/main.nf b/modules/bedtools/genomecov/main.nf
new file mode 100644
index 0000000..e805d84
--- /dev/null
+++ b/modules/bedtools/genomecov/main.nf
@@ -0,0 +1,67 @@
+process BEDTOOLS_GENOMECOV {
+    tag "$meta.id"
+    label 'process_single'
+
+    conda "bioconda::bedtools=2.30.0"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/bedtools:2.30.0--hc088bd4_0' :
+        'biocontainers/bedtools:2.30.0--hc088bd4_0' }"
+
+    input:
+    tuple val(meta), path(intervals)
+    val(scale)
+    path sizes
+    val extension
+
+    output:
+    tuple val(meta), path("*.${extension}")    , emit: genomecov
+    tuple val(meta), path("*_gap.${extension}"), emit: gap_bedgraph
+    path "versions.yml"                        , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def args_list = args.tokenize()
+    args += (scale > 0 && scale != 1) ? " -scale $scale" : ""
+    if (!args_list.contains('-bg') && (scale > 0 && scale != 1)) {
+        args += " -bg"
+    }
+
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    if (intervals.name =~ /\.bam/) {
+        """
+        bedtools \\
+            genomecov \\
+            -ibam $intervals \\
+            $args \\
+            > ${prefix}.${extension}
+
+        #Bedgraph with gaps
+        awk '\$4 < 1' ${prefix}.${extension} > ${prefix}_gap.${extension}
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
+        END_VERSIONS
+        """
+    } else {
+        """
+        bedtools \\
+            genomecov \\
+            -i $intervals \\
+            -g $sizes \\
+            $args \\
+            > ${prefix}.${extension}
+
+        #Bedgraph with gaps
+        awk '\$4 < 1' ${prefix}.${extension} > ${prefix}_gap.${extension}
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
+        END_VERSIONS
+        """
+    }
+}
diff --git a/modules/blobtools/blobtools_add/main.nf b/modules/blobtools/blobtools_add/main.nf
index 0ea2de0..6380d13 100644
--- a/modules/blobtools/blobtools_add/main.nf
+++ b/modules/blobtools/blobtools_add/main.nf
@@ -9,28 +9,34 @@ process BLOBTOOLS_ADD {
     tuple val(meta), path(assembly_minimap_bam)
     tuple val(meta), path(hic_minimap_bam)
     tuple val(meta), path(lineage1_full_table_tsv)
-    tuple val(meta), path(lineage2_full_table_tsv)
+//    tuple val(meta), path(lineage2_full_table_tsv)
     tuple val(meta), path(blobtools_folder)
 
     output:
+//    tuple val(meta),
path("${meta.id}/*.json"), emit: json tuple val(meta), path("${meta.id}"), emit: blobtools_folder + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ - singularity exec -B /projects blobtoolkit-blobtools_latest.sif blobtools add \ + singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools add \ --hits ${blast_out} \ --hits ${diamond_proteome_out} \ --hits ${diamond_busco_out} \ --taxrule bestsumorder \ - --taxdump BlobtoolkitDatabase/taxdump \ + --taxdump ${params.Blobtoolkit_db}/taxdump \ --cov ${assembly_minimap_bam} \ --cov ${hic_minimap_bam} \ --busco ${lineage1_full_table_tsv} \ - --busco ${lineage2_full_table_tsv} \ --link taxon.taxid.ENA="https://www.ebi.ac.uk/ena/data/view/Taxon:${params.taxon_taxid}" \ --link taxon.name.Wikipedia="https://en.wikipedia.org/wiki/${meta.id}" \ ${meta.id} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + blobtools add: \$(singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools --version | sed -e "s/blobtoolkit v//g") + END_VERSIONS """ } diff --git a/modules/blobtools/blobtools_config/blobtools_config_1lineage/main.nf b/modules/blobtools/blobtools_config/blobtools_config_1lineage/main.nf new file mode 100644 index 0000000..8f32fca --- /dev/null +++ b/modules/blobtools/blobtools_config/blobtools_config_1lineage/main.nf @@ -0,0 +1,66 @@ +process BLOBTOOLS_CONFIG_1LINEAGE { + tag "$meta.id" + label 'process_low' + + input: + tuple val(meta), path(assembly) + tuple val(meta), path(pacbio_fastq) + + output: + tuple val(meta), path('config.yaml'), emit: config + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" +//The assembly file must be compressed (fa.gz) bgzip -c + """ + echo "assembly: + accession: ${meta.id} + file: ${params.outdir}/QC/blobtools/${assembly} + level: scaffold + prefix: ${meta.id} + busco: + 
download_dir: ${params.busco_lineages_path} + lineages: + - ${params.lineage} + basal_lineages: + - ${params.lineage} + reads: + paired: + - prefix: ${params.Illumina_prefix} + platform: ILLUMINA + file: ${params.hic_read1};${params.hic_read2} + single: + - prefix: ${meta.id} + platform: PACBIO_SMRT + file: ${params.outdir}/preprocessing/bam2fastx/${pacbio_fastq} + revision: 0 + settings: + blast_chunk: 100000 + blast_max_chunks: 10 + blast_overlap: 0 + blast_min_length: 1000 + taxdump: ${params.Blobtoolkit_db}/taxdump + tmp: /tmp + similarity: + defaults: + evalue: 1.0e-10 + import_evalue: 1.0e-25 + max_target_seqs: 10 + taxrule: bestdistorder + diamond_blastx: + name: reference_proteomes + path: ${params.Blobtoolkit_db}/uniprot + diamond_blastp: + name: reference_proteomes + path: ${params.Blobtoolkit_db}/uniprot + import_max_target_seqs: 100000 + blastn: + name: nt + path: ${params.Blobtoolkit_db}/nt + taxon: + name: ${params.taxon_name} + taxid: '${params.taxon_taxid}' + version: 1" > config.yaml + """ +} diff --git a/modules/blobtools/blobtools_config/main.nf b/modules/blobtools/blobtools_config/blobtools_config_2lineages/main.nf similarity index 82% rename from modules/blobtools/blobtools_config/main.nf rename to modules/blobtools/blobtools_config/blobtools_config_2lineages/main.nf index ccc7d25..7de5b75 100644 --- a/modules/blobtools/blobtools_config/main.nf +++ b/modules/blobtools/blobtools_config/blobtools_config_2lineages/main.nf @@ -1,4 +1,4 @@ -process BLOBTOOLS_CONFIG { +process BLOBTOOLS_CONFIG_2LINEAGES { tag "$meta.id" label 'process_low' @@ -16,11 +16,11 @@ process BLOBTOOLS_CONFIG { """ echo "assembly: accession: ${meta.id} - file: ${params.outdir}/blobtools/${assembly} + file: ${params.outdir}/QC/blobtools/${assembly} level: scaffold prefix: ${meta.id} busco: - download_dir: busco_downloads/ + download_dir: ${params.busco_lineages_path} lineages: - ${params.lineage} - ${params.lineage2} @@ -42,7 +42,7 @@ process BLOBTOOLS_CONFIG { 
blast_max_chunks: 10 blast_overlap: 0 blast_min_length: 1000 - taxdump: BlobtoolkitDatabase/taxdump + taxdump: ${params.Blobtoolkit_db}/taxdump tmp: /tmp similarity: defaults: @@ -52,14 +52,14 @@ process BLOBTOOLS_CONFIG { taxrule: bestdistorder diamond_blastx: name: reference_proteomes - path: BlobtoolkitDatabase/uniprot + path: ${params.Blobtoolkit_db}/uniprot diamond_blastp: name: reference_proteomes - path: BlobtoolkitDatabase/uniprot + path: ${params.Blobtoolkit_db}/uniprot import_max_target_seqs: 100000 blastn: name: nt - path: BlobtoolkitDatabase/nt + path: ${params.Blobtoolkit_db}/nt taxon: name: ${params.taxon_name} taxid: '${params.taxon_taxid}' diff --git a/modules/blobtools/blobtools_create/main.nf b/modules/blobtools/blobtools_create/main.nf index 2d3275a..e4182c1 100644 --- a/modules/blobtools/blobtools_create/main.nf +++ b/modules/blobtools/blobtools_create/main.nf @@ -8,18 +8,25 @@ process BLOBTOOLS_CREATE { output: tuple val(meta), path("${meta.id}"), emit: blobtools_folder +// tuple val(meta), path('blobtools_folder'), emit: blobtools_folder tuple val(meta), path('*/*.json'), emit: json tuple val(meta), path('*/meta.json'), emit:meta_json + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" - """ - singularity exec -B /projects blobtoolkit-blobtools_latest.sif blobtools create \ + """ + singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools create \ --fasta ${assembly} \ --meta ${config} \ --taxid ${params.taxon_taxid} \ - --taxdump BlobtoolkitDatabase/taxdump \ + --taxdump ${params.Blobtoolkit_db}/taxdump \ ${meta.id} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + blobtools create: \$(singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools --version | sed -e "s/blobtoolkit v//g") + END_VERSIONS """ } diff --git a/modules/blobtools/blobtools_pipeline/main.nf 
b/modules/blobtools/blobtools_pipeline/main.nf index a9bc2be..f3a781c 100644 --- a/modules/blobtools/blobtools_pipeline/main.nf +++ b/modules/blobtools/blobtools_pipeline/main.nf @@ -13,7 +13,8 @@ process BLOBTOOLS_PIPELINE { tuple val(meta), path('assembly_minimap.bam'), emit: assembly_minimap_bam tuple val(meta), path('hic_minimap.bam'), emit:hic_minimap_bam tuple val(meta), path('lineage1_full_table.tsv.gz'), emit: lineage1_full_table_tsv - tuple val(meta), path('lineage2_full_table.tsv.gz'), emit: lineage2_full_table_tsv + tuple val(meta), path('lineage2_full_table.tsv.gz'), emit: lineage2_full_table_tsv, optional: true + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' @@ -30,7 +31,7 @@ process BLOBTOOLS_PIPELINE { --configfile $config \ --latency-wait 60 \ --stats blobtoolkit.stats \ - -s blobtoolkit/insdc-pipeline/blobtoolkit.smk + -s ${params.blobtoolkit_path}/insdc-pipeline/blobtoolkit.smk cp blastn/*.blastn.nt.out . cp diamond/*.diamond.reference_proteomes.out . @@ -38,7 +39,17 @@ process BLOBTOOLS_PIPELINE { cp minimap/*.bam . 
mv *.${meta.id}.bam assembly_minimap.bam mv ${meta.id}.*.bam hic_minimap.bam - cp busco/${meta.id}.busco.${params.lineage}/full_table.tsv.gz lineage1_full_table.tsv.gz - cp busco/${meta.id}.busco.${params.lineage2}/full_table.tsv.gz lineage2_full_table.tsv.gz + cp busco/${meta.id}.busco.*/full_table.tsv.gz lineage1_full_table.tsv.gz + #cp busco/${meta.id}.busco.${params.lineage2}/full_table.tsv.gz lineage2_full_table.tsv.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + snakemake : \$(snakemake --version) + minimap2: \$(minimap2 --version 2>&1) + windowmasker: \$(windowmasker -version-full | head -n 1 | sed 's/^.*windowmasker: //; s/ .*\$//') + busco: \$(singularity run -B /projects ${params.singularity_cache}/busco5.sif busco --version 2>&1 | sed 's/^BUSCO //' ) + diamond: \$(diamond --version 2>&1 | tail -n 1 | sed 's/^diamond version //') + blast: \$(blastn -version 2>&1 | sed 's/^.*blastn: //; s/ .*\$//' | head -1) + END_VERSIONS """ } diff --git a/modules/blobtools/blobtools_view/main.nf b/modules/blobtools/blobtools_view/main.nf new file mode 100644 index 0000000..bb6ea9e --- /dev/null +++ b/modules/blobtools/blobtools_view/main.nf @@ -0,0 +1,57 @@ +process BLOBTOOLS_VIEW { + tag "$meta.id" + label 'process_high' + time '10minutes' + errorStrategy 'retry' + maxRetries 5 + + input: + tuple val(meta), path(blobtools_folder) + + output: + tuple val(meta), path('*.png'), emit: png + path "versions.yml" , emit: versions + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools view \\ + --host http://localhost \\ + --timeout 60 \\ + --ports 8010-8099 \\ + --view blob \\ + --param largeFonts=true \\ + --format png \\ + --out ${meta.id} \\ + ${meta.id} + + singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools view \\ + --host http://localhost \\ + --timeout 600 \\ 
+ --ports 8010-8099 \\ + --view cumulative \\ + --param largeFonts=true \\ + --format png \\ + --out ${meta.id} \\ + ${meta.id} + + + singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools view \\ + --host http://localhost \\ + --timeout 60 \\ + --ports 8010-8099 \\ + --view snail \\ + --param largeFonts=true \\ + --format png \\ + --out ${meta.id} \\ + ${meta.id} + + cp */*.png . + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + blobtools : \$(singularity exec -B /projects ${params.singularity_cache}/blobtoolkit-blobtools_latest.sif blobtools --version | sed -e "s/blobtoolkit v//g") + END_VERSIONS + """ +} diff --git a/modules/blobtools/blobtools_view_blob/main.nf b/modules/blobtools/blobtools_view_blob/main.nf deleted file mode 100644 index 5e8224a..0000000 --- a/modules/blobtools/blobtools_view_blob/main.nf +++ /dev/null @@ -1,28 +0,0 @@ -process BLOBTOOLS_VIEW_BLOB { - tag "$meta.id" - label 'process_high' - time '10minutes' - - input: - tuple val(meta), path(blobtools_folder) - - output: - tuple val(meta), path('*.png'), emit: png - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - """ - singularity exec -B /projects blobtoolkit-blobtools_latest.sif blobtools view \\ - --host http://localhost \\ - --timeout 60 \\ - --ports 8010-8099 \\ - --view blob \\ - --param largeFonts=true \\ - --format png \\ - --out ${meta.id} \\ - ${meta.id} - - cp */*.png . 
- """ -} diff --git a/modules/blobtools/blobtools_view_cumulative/main.nf b/modules/blobtools/blobtools_view_cumulative/main.nf deleted file mode 100644 index a6699f2..0000000 --- a/modules/blobtools/blobtools_view_cumulative/main.nf +++ /dev/null @@ -1,28 +0,0 @@ -process BLOBTOOLS_VIEW_CUMULATIVE { - tag "$meta.id" - label 'process_high' - time '10minutes' - - input: - tuple val(meta), path(blobtools_folder) - - output: - tuple val(meta), path('*.png'), emit: png - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - """ - singularity exec -B /projects blobtoolkit-blobtools_latest.sif blobtools view \\ - --host http://localhost \\ - --timeout 600 \\ - --ports 8010-8099 \\ - --view cumulative \\ - --param largeFonts=true \\ - --format png \\ - --out ${meta.id} \\ - ${meta.id} - - cp */*.png . - """ -} diff --git a/modules/blobtools/blobtools_view_snail/main.nf b/modules/blobtools/blobtools_view_snail/main.nf deleted file mode 100644 index 3a3dd25..0000000 --- a/modules/blobtools/blobtools_view_snail/main.nf +++ /dev/null @@ -1,28 +0,0 @@ -process BLOBTOOLS_VIEW_SNAIL { - tag "$meta.id" - label 'process_high' - time '10minutes' - - input: - tuple val(meta), path(blobtools_folder) - - output: - tuple val(meta), path('*snail.png'), emit: png - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - """ - singularity exec -B /projects blobtoolkit-blobtools_latest.sif blobtools view \\ - --host http://localhost \\ - --timeout 60 \\ - --ports 8010-8099 \\ - --view snail \\ - --param largeFonts=true \\ - --format png \\ - --out ${meta.id} \\ - ${meta.id} - - cp */*snail.png . - """ -} diff --git a/modules/busco/main.nf b/modules/busco/main.nf index 2b7448f..baf383d 100644 --- a/modules/busco/main.nf +++ b/modules/busco/main.nf @@ -1,11 +1,11 @@ process BUSCO { tag "$meta.id" - label 'process_medium' + label 'process_high' - conda (params.enable_conda ? 
"bioconda::busco=5.4.5" : null) + conda "bioconda::busco=5.3.2" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/busco:5.4.5--pyhdfd78af_0': - 'quay.io/biocontainers/busco:5.4.5--pyhdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/busco:5.3.2--pyhdfd78af_0': + 'quay.io/biocontainers/busco:5.3.2--pyhdfd78af_0' }" input: tuple val(meta), path('tmp_input/*') @@ -27,8 +27,8 @@ process BUSCO { def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}-${lineage}" def busco_config = config_file ? "--config $config_file" : '' - def busco_lineage = lineage.equals('auto') ? '--auto-lineage' : "--lineage_dataset ${lineage}" - def busco_lineage_dir = busco_lineages_path ? "--offline --download_path ${busco_lineages_path}" : '' + def busco_lineage = lineage.equals('auto') ? '--auto-lineage-euk' : "--lineage_dataset ${lineage}" + def busco_lineage_dir = busco_lineages_path ? "--download_path ${busco_lineages_path}" : '' """ # Nextflow changes the container --entrypoint to /bin/bash (container default entrypoint: /usr/local/env-execute) # Check for container variable initialisation script and source it. @@ -61,7 +61,7 @@ process BUSCO { cd .. busco \\ - --cpu 32 \\ + --cpu $task.cpus \\ --in "\$INPUT_SEQS" \\ --out ${prefix}-busco \\ $busco_lineage \\ @@ -76,6 +76,11 @@ process BUSCO { mv ${prefix}-busco/batch_summary.txt ${prefix}-busco.batch_summary.txt mv ${prefix}-busco/*/short_summary.*.{json,txt} . || echo "Short summaries were not available: No genes were found." + for f in ${prefix}-busco/${meta.id}*/run_*/full_table.tsv ;do fp=\$(dirname "\$f"); mv "\$f" "\$fp"_full_table.tsv ;done + #find . -type f -name 'full_table.tsv' -print0 | xargs --null -I{} mv {} {}_full_table.tsv + + mv ${prefix}-busco/*/*full_table.tsv . 
+ cat <<-END_VERSIONS > versions.yml "${task.process}": busco: \$( busco --version 2>&1 | sed 's/^BUSCO //' ) diff --git a/modules/bwamem2/index/main.nf b/modules/bwamem2/index/main.nf index dea99ce..34370d8 100644 --- a/modules/bwamem2/index/main.nf +++ b/modules/bwamem2/index/main.nf @@ -2,7 +2,7 @@ process BWAMEM2_INDEX { tag "$fasta" label 'process_single' - conda "bioconda::bwa-mem2=2.2.1" +// conda "bioconda::bwa-mem2=2.2.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/bwa-mem2:2.2.1--he513fc3_0' : 'quay.io/biocontainers/bwa-mem2:2.2.1--he513fc3_0' }" diff --git a/modules/bwamem2/mem/main.nf b/modules/bwamem2/mem/main.nf index 826ffe8..8683704 100644 --- a/modules/bwamem2/mem/main.nf +++ b/modules/bwamem2/mem/main.nf @@ -2,7 +2,7 @@ process BWAMEM2_MEM { tag "$meta.id" label 'process_high' - conda "bioconda::bwa-mem2=2.2.1 bioconda::samtools=1.16.1" +// conda "bioconda::bwa-mem2=2.2.1 bioconda::samtools=1.16.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0' : 'quay.io/biocontainers/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0' }" diff --git a/modules/canu/main.nf b/modules/canu/main.nf index b925632..46cb044 100644 --- a/modules/canu/main.nf +++ b/modules/canu/main.nf @@ -2,7 +2,7 @@ process CANU { tag "$meta.id" label 'process_high' - conda "bioconda::canu=2.2" +// conda "bioconda::canu=2.2" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/canu:2.2--ha47f30e_0': 'quay.io/biocontainers/canu:2.2--ha47f30e_0' }" @@ -10,6 +10,7 @@ process CANU { input: //As Canu doesn't take pacbio+ont, it can only be one or the other type tuple val(meta), path(reads) + val(genome_size) output: tuple val(meta), path("*.report") , emit: report @@ -28,37 +29,56 @@ process CANU { script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" + def genomesize = genome_size ? "genomesize=$genome_size" : "" if (params.assembly_secondary_mode == 'hicanu'){ """ - canu \\ - -p ${prefix} \\ - $args \\ - maxThreads=$task.cpus \\ - -pacbio-hifi $reads + canu \\ + -p ${prefix} \\ + $args \\ + $genomesize \\ + maxThreads=$task.cpus \\ + -pacbio-hifi $reads - gzip *.fasta + gzip *.fasta - cat <<-END_VERSIONS > versions.yml - "${task.process}": - canu: \$(echo \$(canu --version 2>&1) | sed 's/^.*canu //; s/Using.*\$//' ) - END_VERSIONS + cat <<-END_VERSIONS > versions.yml + "${task.process}": + canu: \$(echo \$(canu --version 2>&1) | sed 's/^.*canu //; s/Using.*\$//' ) + END_VERSIONS """ } else if (params.assembly_secondary_mode == 'ont'){ """ - canu \\ - -p ${prefix} \\ - $args \\ - maxThreads=$task.cpus \\ - -nanopore $reads + canu \\ + -p ${prefix} \\ + $args \\ + $genomesize \\ + maxThreads=$task.cpus \\ + -nanopore $reads - gzip *.fasta + gzip *.fasta - cat <<-END_VERSIONS > versions.yml - "${task.process}": - canu: \$(echo \$(canu --version 2>&1) | sed 's/^.*canu //; s/Using.*\$//' ) - END_VERSIONS + cat <<-END_VERSIONS > versions.yml + "${task.process}": + canu: \$(echo \$(canu --version 2>&1) | sed 's/^.*canu //; s/Using.*\$//' ) + END_VERSIONS + """ + } else if (params.assembly_secondary_mode == 'clr'){ + """ + canu \\ + -p ${prefix} \\ + $args \\ + $genomesize \\ + maxThreads=$task.cpus \\ + -pacbio $reads + + gzip *.fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + canu: \$(echo \$(canu --version 2>&1) | sed 's/^.*canu //; s/Using.*\$//' ) + 
END_VERSIONS """ } else { - error "Canu needs a correct mode : 'hicanu' or 'ont'" + error "Canu needs a correct mode : 'hicanu', 'ont' or 'clr'" } } diff --git a/modules/cat/main.nf b/modules/cat/main.nf index bc7bcab..9c7edcc 100644 --- a/modules/cat/main.nf +++ b/modules/cat/main.nf @@ -8,9 +8,15 @@ process CAT { output: tuple val(meta), path("*_alternate_contigs_full.fa"), emit: alternate_contigs_full + path "versions.yml" , emit: versions script: """ cat $file1 $file2 > ${meta.id}_alternate_contigs_full.fa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(cat --version | sed 's/cat (GNU coreutils) //g' | sed -n 1p) + END_VERSIONS """ } diff --git a/modules/chromap/chromap/main.nf b/modules/chromap/chromap/main.nf index 8934fd6..28cbded 100644 --- a/modules/chromap/chromap/main.nf +++ b/modules/chromap/chromap/main.nf @@ -2,7 +2,7 @@ process CHROMAP_CHROMAP { tag "$meta.id" label 'process_high' - conda "bioconda::chromap=0.2.1 bioconda::samtools=1.16.1" +// conda "bioconda::chromap=0.2.1 bioconda::samtools=1.16.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/mulled-v2-1f09f39f20b1c4ee36581dc81cc323c70e661633:25259bafb105193269a9fd7595434c6fbddd4d3b-0' : 'quay.io/biocontainers/mulled-v2-1f09f39f20b1c4ee36581dc81cc323c70e661633:25259bafb105193269a9fd7595434c6fbddd4d3b-0' }" diff --git a/modules/chromap/index/main.nf b/modules/chromap/index/main.nf index a79e731..7e9631d 100644 --- a/modules/chromap/index/main.nf +++ b/modules/chromap/index/main.nf @@ -2,7 +2,7 @@ process CHROMAP_INDEX { tag "$meta.id" label 'process_medium' - conda "bioconda::chromap=0.2.1" +// conda "bioconda::chromap=0.2.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/chromap:0.2.1--hd03093a_0' : 'quay.io/biocontainers/chromap:0.2.1--hd03093a_0' }" diff --git a/modules/coverage_calculation/main.nf b/modules/coverage_calculation/main.nf index 5b588e9..49f0d66 100644 --- a/modules/coverage_calculation/main.nf +++ b/modules/coverage_calculation/main.nf @@ -4,6 +4,7 @@ process COVERAGE_CALCULATION { input: tuple val(meta), path(fastq) + val (genome_size) output: path('*.txt'), emit: coverage @@ -12,6 +13,6 @@ process COVERAGE_CALCULATION { def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ - gzip -cd $fastq | paste - - - - | cut -f 2 | tr -d '\n' | wc -c | awk '{ print "Genome coverage = "\$0/(${params.hap_gen_size_Gb}*1000000000)}' > coverage.txt + gzip -cd $fastq | paste - - - - | cut -f 2 | tr -d '\n' | wc -c | awk '{ print "Genome coverage = "\$0/${genome_size}}' > coverage.txt """ } diff --git a/modules/cutadapt/main.nf b/modules/cutadapt/main.nf index 9b310c0..31795a4 100644 --- a/modules/cutadapt/main.nf +++ b/modules/cutadapt/main.nf @@ -2,7 +2,7 @@ process CUTADAPT { tag "$meta.id" label 'process_medium' - conda (params.enable_conda ? 'bioconda::cutadapt=3.4' : null) + conda 'bioconda::cutadapt=3.4' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/cutadapt:3.4--py39h38f01e4_1' : 'quay.io/biocontainers/cutadapt:3.4--py39h38f01e4_1' }" diff --git a/modules/dumpsoftwareversions/main.nf b/modules/dumpsoftwareversions/main.nf new file mode 100644 index 0000000..9eff50d --- /dev/null +++ b/modules/dumpsoftwareversions/main.nf @@ -0,0 +1,24 @@ +process CUSTOM_DUMPSOFTWAREVERSIONS { + label 'process_single' + + // Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container + conda 'bioconda::multiqc=1.13' + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/multiqc:1.13--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.13--pyhdfd78af_0' }" + + input: + path versions + + output: + path "software_versions.yml" , emit: yml + path "software_versions_mqc.yml", emit: mqc_yml + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + template 'dumpsoftwareversions.py' +} diff --git a/modules/dumpsoftwareversions/templates/dumpsoftwareversions.py b/modules/dumpsoftwareversions/templates/dumpsoftwareversions.py new file mode 100644 index 0000000..da03340 --- /dev/null +++ b/modules/dumpsoftwareversions/templates/dumpsoftwareversions.py @@ -0,0 +1,101 @@ +#!/usr/bin/env python + + +"""Provide functions to merge multiple versions.yml files.""" + + +import yaml +import platform +from textwrap import dedent + + +def _make_versions_html(versions): + """Generate a tabular HTML output of all versions for MultiQC.""" + html = [ + dedent( + """\\ + <style> + #nf-core-versions tbody:nth-child(even) { + background-color: #f2f2f2; + } + </style> + <table class="table" style="width:100%" id="nf-core-versions"> + <thead> + <tr> + <th> Process Name </th> + <th> Software </th> + <th> Version </th> + </tr> + </thead> + """ + ) + ] + for process, tmp_versions in sorted(versions.items()): + html.append("<tbody>") + for i, (tool, version) in enumerate(sorted(tmp_versions.items())): + html.append( + dedent( + f"""\\ + <tr> + <td><samp>{process if (i == 0) else ''}</samp></td> + <td><samp>{tool}</samp></td> + <td><samp>{version}</samp></td> + </tr> + """ + ) + ) + html.append("</tbody>") + html.append("</table>") + return "\\n".join(html) + + +def main(): + """Load all version files and generate merged output.""" + versions_this_module = {} + versions_this_module["${task.process}"] = { + "python": platform.python_version(), + "yaml": yaml.__version__, + } + + with open("$versions") as f: + versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module + + # aggregate versions by the module name (derived from fully-qualified process name) + versions_by_module = {} + for process, process_versions in versions_by_process.items(): + module = process.split(":")[-1] + try: + if versions_by_module[module] != process_versions: + raise AssertionError( + "We assume that software versions are the same between all modules. " + "If you see this error-message it means you discovered an edge-case " + "and should open an issue in nf-core/tools. " + ) + except KeyError: + versions_by_module[module] = process_versions + + versions_by_module["Workflow"] = { + "Nextflow": "$workflow.nextflow.version", + "$workflow.manifest.name": "$workflow.manifest.version", + } + + versions_mqc = { + "id": "software_versions", + "section_name": "${workflow.manifest.name} Software Versions", + "section_href": "https://github.com/${workflow.manifest.name}", + "plot_type": "html", + "description": "are collected at run time from the software output.", + "data": _make_versions_html(versions_by_module), + } + + with open("software_versions.yml", "w") as f: + yaml.dump(versions_by_module, f, default_flow_style=False) + with open("software_versions_mqc.yml", "w") as f: + yaml.dump(versions_mqc, f, default_flow_style=False) + + with open("versions.yml", "w") as f: + yaml.dump(versions_this_module, f, default_flow_style=False) + + +if __name__ == "__main__": + main() diff --git a/modules/fastqgz_to_fasta/main.nf b/modules/fastqgz_to_fasta/main.nf index b9d7d36..7c056bb 100644 --- a/modules/fastqgz_to_fasta/main.nf +++ b/modules/fastqgz_to_fasta/main.nf @@ -2,7 +2,6 @@ process FASTQGZ_TO_FASTA { 
tag "$meta.id" label 'process_medium' - conda "bioconda::samtools=1.16.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' : 'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }" @@ -13,9 +12,15 @@ process FASTQGZ_TO_FASTA { output: tuple val(meta), path('*.fasta'), emit: fasta + path "versions.yml" , emit: versions script: """ bgzip -cd $reads | paste - - - - | cut -f 1,2 | sed 's/^/>/' | tr "\t" "\n" | sed 's/@//' > ${meta.id}_pacbio.fasta + +cat <<-END_VERSIONS > versions.yml +"${task.process}": + bgzip: \$(echo \$(bgzip --version 2>&1) | sed 's/^bgzip (htslib) //; s/ Copyright.*//') +END_VERSIONS """ } diff --git a/modules/fcs/fcsadaptor/main.nf b/modules/fcs/fcsadaptor/main.nf new file mode 100644 index 0000000..bc44af5 --- /dev/null +++ b/modules/fcs/fcsadaptor/main.nf @@ -0,0 +1,51 @@ +process FCS_FCSADAPTOR { + tag "$meta.id" + label 'process_low' + + // WARN: Version information not provided by tool on CLI. Please update version string below when bumping container versions. + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/0.2.3/fcs-adaptor.0.2.3.sif': + 'ncbi/fcs-adaptor:0.2.3' }" + + // Exit if running this module with -profile conda / -profile mamba + if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { + exit 1, "FCS_FCSADAPTOR module does not support Conda. Please use Docker / Singularity / Podman instead." 
+ } + + input: + tuple val(meta), path(assembly) + + output: + tuple val(meta), path("*.cleaned_sequences.fa.gz"), emit: cleaned_assembly + tuple val(meta), path("*.fcs_adaptor_report.txt") , emit: adaptor_report + tuple val(meta), path("*.fcs_adaptor.log") , emit: log + tuple val(meta), path("*.pipeline_args.yaml") , emit: pipeline_args + tuple val(meta), path("*.skipped_trims.jsonl") , emit: skipped_trims + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def FCSADAPTOR_VERSION = '0.2.3' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. + """ + /app/fcs/bin/av_screen_x \\ + -o output/ \\ + $args \\ + $assembly + + # compress and/or rename files with prefix + gzip -cf output/cleaned_sequences/* > "${assembly.baseName}.cleaned_sequences.fa.gz" + cp "output/fcs_adaptor_report.txt" "${assembly.baseName}.fcs_adaptor_report.txt" + cp "output/fcs_adaptor.log" "${assembly.baseName}.fcs_adaptor.log" + cp "output/pipeline_args.yaml" "${assembly.baseName}.pipeline_args.yaml" + cp "output/skipped_trims.jsonl" "${assembly.baseName}.skipped_trims.jsonl" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + FCS-adaptor: $FCSADAPTOR_VERSION + END_VERSIONS + """ +} diff --git a/modules/fcs/fcsadaptor/meta.yml b/modules/fcs/fcsadaptor/meta.yml new file mode 100644 index 0000000..e726336 --- /dev/null +++ b/modules/fcs/fcsadaptor/meta.yml @@ -0,0 +1,62 @@ +name: "fcs_fcsadaptor" +description: Run NCBI's FCS adaptor on assembled genomes +keywords: + - assembly + - genomics + - quality control + - contamination + - NCBI +tools: + - "fcs": + description: | + The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign + organisms in genome assemblies to prepare your data for submission. 
Therefore, the + submission process to NCBI is faster and fewer contaminated genomes are submitted. + This reduces errors in analyses and conclusions, not just for the original data submitter + but for all subsequent users of the assembly. + homepage: "https://www.ncbi.nlm.nih.gov/data-hub/cgr/data-quality-tools/" + documentation: "https://github.com/ncbi/fcs/wiki/FCS-adaptor" + tool_dev_url: "https://github.com/ncbi/fcs" + + licence: "United States Government Work" +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - assembly: + type: file + description: assembly fasta file +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - cleaned_assembly: + type: file + description: Cleaned assembly in fasta format + pattern: "*.{cleaned_sequences.fa.gz}" + - adaptor_report: + type: file + description: Report of identified adaptors + pattern: "*.{fcs_adaptor_report.txt}" + - log: + type: file + description: Log file + pattern: "*.{fcs_adaptor.log}" + - pipeline_args: + type: file + description: Run arguments + pattern: "*.{pipeline_args.yaml}" + - skipped_trims: + type: file + description: Skipped trim information + pattern: "*.{skipped_trims.jsonl}" +authors: + - "@d4straub" diff --git a/modules/fcs/fcsgx/main.nf b/modules/fcs/fcsgx/main.nf new file mode 100644 index 0000000..ca24376 --- /dev/null +++ b/modules/fcs/fcsgx/main.nf @@ -0,0 +1,44 @@ +process FCS_FCSGX { + tag "$meta.id" + label 'process_medium' + + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/0.4.0/fcs-gx.sif' : + 'ncbi/fcs-gx:0.4.0' }" + + // Exit if running this module with -profile conda / -profile mamba + if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { + exit 1, "FCS_FCSGX module does not support Conda. Please use Docker / Singularity / Podman instead." + } + + input: + tuple val(meta), path(assembly) + + output: + tuple val(meta), path("*.fcs_gx_report.txt"), emit: fcs_gx_report + tuple val(meta), path("*.taxonomy.rpt") , emit: taxonomy_report + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def FCSGX_VERSION = '0.4.0' + + """ + python3 /app/bin/run_gx \\ + --fasta $assembly \\ + --out-dir . \\ + --gx-db ${params.fcs_gx_database} \\ + --tax-id ${params.taxon_taxid} \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version 2>&1 | sed -e "s/Python //g") + FCS-GX: $FCSGX_VERSION + END_VERSIONS + """ +} diff --git a/modules/fcs/fcsgx/meta.yml b/modules/fcs/fcsgx/meta.yml new file mode 100644 index 0000000..a413db6 --- /dev/null +++ b/modules/fcs/fcsgx/meta.yml @@ -0,0 +1,60 @@ +name: "fcs_fcsgx" +description: Run FCS-GX on assembled genomes. The contigs of the assembly are searched against a reference database excluding the given taxid. +keywords: + - assembly + - genomics + - quality control + - contamination + - NCBI +tools: + - "fcs": + description: | + "The Foreign Contamination Screening (FCS) tool rapidly detects contaminants from foreign + organisms in genome assemblies to prepare your data for submission. Therefore, the + submission process to NCBI is faster and fewer contaminated genomes are submitted. + This reduces errors in analyses and conclusions, not just for the original data submitter + but for all subsequent users of the assembly."
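The FCS_FCSGX guard above rejects conda/mamba profiles with `workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1`. A POSIX-shell analogue of the same token check, for illustration only (the profile strings are arbitrary examples):

```shell
# Split a comma-separated profile string and flag conda/mamba tokens,
# mirroring tokenize(',').intersect(['conda','mamba']) in the module.
profile="singularity,test"
supported=yes
old_ifs=$IFS
IFS=','
for token in $profile; do
    case "$token" in
        conda|mamba) supported=no ;;
    esac
done
IFS=$old_ifs
echo "$supported"
```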
+ homepage: "https://www.ncbi.nlm.nih.gov/data-hub/cgr/data-quality-tools/" + documentation: "https://github.com/ncbi/fcs/wiki/FCS-GX" + tool_dev_url: "https://github.com/ncbi/fcs" + + licence: "United States Government Work" + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', taxid:'6973' ] + - assembly: + type: file + description: assembly fasta file + - database: + type: file + description: Files of the database downloaded from the ncbi server, + https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/. All files + of one db should be downloaded and given to the process as + channel.collect(). The link contains 2 databases, test-only and all. + Use all for pipeline usage and test-only for tests. + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', taxid:'9606' ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - fcs_gx_report: + type: file + description: Report containing the contig identifier and recommended action (EXCLUDE, TRIM, FIX, REVIEW) + pattern: "*.fcs_gx_report.txt" + - taxonomy_report: + type: file + description: Report containing the contig identifier and mapped contaminant species + pattern: "*.taxonomy.rpt" + +authors: + - "@tillenglert" diff --git a/modules/fcs/fcsgx_clean/main.nf b/modules/fcs/fcsgx_clean/main.nf new file mode 100644 index 0000000..66beb4b --- /dev/null +++ b/modules/fcs/fcsgx_clean/main.nf @@ -0,0 +1,42 @@ +process FCS_FCSGX_CLEAN { + tag "$meta.id" + label 'process_low' + + input: + tuple val(meta), path(assembly) + tuple val(meta2), path(fcsgx_report) + + output: + tuple val(meta), path("*.cleaned.fasta.gz") , emit: cleaned_fasta + tuple val(meta), path("*.contam.fasta.gz") , emit: contam_fasta + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = 
task.ext.prefix ?: "${meta.id}" + def FCSGX_VERSION = '0.4.0' + """ + cp $fcsgx_report local_report.txt + + zcat $assembly | python3 ${params.singularity_cache}/fcs.py \\ + --image=${params.singularity_cache}/ftp.ncbi.nlm.nih.gov-genomes-TOOLS-FCS-releases-0.4.0-fcs-gx.sif \\ + clean genome \\ + --action-report=local_report.txt \\ + --output=cleaned.fasta \\ + --contam-fasta-out=contam.fasta + + gzip -c cleaned.fasta > ${assembly.baseName}.cleaned.fasta.gz + gzip -c contam.fasta > ${assembly.baseName}.contam.fasta.gz + rm cleaned.fasta + rm contam.fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version 2>&1 | sed -e "s/Python //g") + FCS-GX: $FCSGX_VERSION + END_VERSIONS + """ +} diff --git a/modules/flye/flye_pacbio_ont/main.nf b/modules/flye/flye_pacbio_ont/main.nf index cb479b9..f7fb5a6 100644 --- a/modules/flye/flye_pacbio_ont/main.nf +++ b/modules/flye/flye_pacbio_ont/main.nf @@ -2,7 +2,7 @@ process FLYE_PACBIO_ONT { tag "$meta.id" label 'process_high' - conda "bioconda::flye=2.9" + //conda "bioconda::flye=2.9" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/flye:2.9--py39h6935b12_1' : 'quay.io/biocontainers/flye:2.9--py39h6935b12_1' }" diff --git a/modules/flye/main.nf b/modules/flye/main.nf index 420d5fe..012150f 100644 --- a/modules/flye/main.nf +++ b/modules/flye/main.nf @@ -2,7 +2,7 @@ process FLYE { tag "$meta.id" label 'process_high' - conda "bioconda::flye=2.9" + //conda "bioconda::flye=2.9" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/flye:2.9--py39h6935b12_1' : 'quay.io/biocontainers/flye:2.9--py39h6935b12_1' }" diff --git a/modules/genomescope2/main.nf b/modules/genomescope2/main.nf index 2ddf9e4..60e71c2 100644 --- a/modules/genomescope2/main.nf +++ b/modules/genomescope2/main.nf @@ -2,13 +2,14 @@ process GENOMESCOPE2 { tag "$meta.id" label 'process_low' - conda (params.enable_conda ? "bioconda::genomescope2=2.0" : null) + conda "bioconda::genomescope2=2.0" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/genomescope2:2.0--py310r41hdfd78af_5': 'quay.io/biocontainers/genomescope2:2.0--py310r41hdfd78af_5' }" input: tuple val(meta), path(histogram) + val(ploidy) output: tuple val(meta), path("*_linear_plot.png") , emit: linear_plot_png @@ -24,10 +25,12 @@ process GENOMESCOPE2 { script: def args = task.ext.args ?: '' + def ploidy_value = ploidy ? "-p $ploidy" : "" prefix = task.ext.prefix ?: "${meta.id}" """ genomescope2 \\ --input $histogram \\ + $ploidy_value \\ $args \\ --output . 
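The GENOMESCOPE2 change builds the `-p` flag only when a ploidy value is supplied (`def ploidy_value = ploidy ? "-p $ploidy" : ""`). The same optional-flag pattern in plain shell uses `${VAR:+...}` expansion; the printed command line below is illustrative, not a real invocation:

```shell
# Emit "-p <n>" only when PLOIDY is set and non-empty.
PLOIDY=2
ploidy_opt=${PLOIDY:+-p $PLOIDY}
echo "with ploidy: genomescope2 --input reads.hist $ploidy_opt --output ."

# Unset -> the whole flag disappears instead of leaving a dangling "-p".
unset PLOIDY
ploidy_opt=${PLOIDY:+-p $PLOIDY}
echo "without ploidy: opt='$ploidy_opt'"
```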
\\ --name_prefix $prefix diff --git a/modules/gfa_to_fa/main.nf b/modules/gfa_to_fa/main.nf index 0e10baf..946683e 100644 --- a/modules/gfa_to_fa/main.nf +++ b/modules/gfa_to_fa/main.nf @@ -7,11 +7,17 @@ process GFA_TO_FA { output: tuple val(meta), path('*.fa'), emit: fa_assembly + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ awk '/^S/{print ">"\$2;print \$3}' $GFA_assembly > ${GFA_assembly.baseName}.fa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + awk: \$(awk --version | sed 's/GNU Awk //g'| sed -n 1p) + END_VERSIONS """ } diff --git a/modules/gfastats/main.nf b/modules/gfastats/main.nf new file mode 100644 index 0000000..9e3c24f --- /dev/null +++ b/modules/gfastats/main.nf @@ -0,0 +1,34 @@ +process GFASTATS { + tag "$meta.id" + label 'process_low' + + conda "bioconda::gfastats=1.3.6" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gfastats:1.3.6--hdcf5f25_3': + 'biocontainers/gfastats:1.3.6--hdcf5f25_3' }" + + input: + tuple val(meta), path(assembly) // input.[fasta|fastq|gfa][.gz] + + output: + tuple val(meta), path("*.assembly_summary"), emit: assembly_summary + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + gfastats \\ + $args \\ + --threads $task.cpus \\ + $assembly > ${prefix}.assembly_summary + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gfastats: \$( gfastats -v | sed '1!d;s/.*v//' ) + END_VERSIONS + """ +} diff --git a/modules/goat/taxonsearch/main.nf b/modules/goat/taxonsearch/main.nf new file mode 100644 index 0000000..fbf3757 --- /dev/null +++ b/modules/goat/taxonsearch/main.nf @@ -0,0 +1,47 @@ +process GOAT_TAXONSEARCH { + tag "$meta.id" + label 'process_single' + + conda "bioconda::goat=0.2.0" + 
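The GFA_TO_FA awk one-liner keeps only GFA `S` (segment) lines and prints each as a FASTA record. Here it runs on a tiny hand-made GFA; note the `\$2`/`\$3` in the module are Nextflow escapes for the plain `$2`/`$3` used below:

```shell
# Minimal GFA: header, two segments, and one link line that awk must skip.
printf 'H\tVN:Z:1.0\nS\tutg1\tACGT\nL\tutg1\t+\tutg2\t+\t0M\nS\tutg2\tGGCC\n' > mini.gfa

# S-lines are "S <name> <sequence>"; print the name as a header, then the sequence.
awk '/^S/{print ">"$2;print $3}' mini.gfa > mini.fa
cat mini.fa
```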
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/goat:0.2.0--h92d785c_0': + 'biocontainers/goat:0.2.0--h92d785c_0' }" + + input: + tuple val(meta), val(taxon), path(taxa_file) + + output: + tuple val(meta), path("*.tsv"), emit: taxonsearch + path "versions.yml" , emit: versions + env(ploidy) , emit: ploidy + env(haploid_number) , emit: haploid_number + env(scientific_name) , emit: scientific_name + env(genome_size) , emit: genome_size + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + input = taxa_file ? "-f ${taxa_file}" : "-t \"${taxon}\"" + if (!taxon && !taxa_file) error "No input. Valid input: single taxon identifier or a .txt file with identifiers" + if (taxon && taxa_file ) error "Only one input is required: a single taxon identifier or a .txt file with identifiers" + """ + goat-cli taxon search \\ + $args \\ + $input > ${prefix}.tsv + + sed 's/\t/,/g' ${prefix}.tsv > ${prefix}.csv + + ploidy=\$(awk -v get='^(ploidy)' 'BEGIN{FS=OFS=","}FNR==1{for(i=1;i<=NF;i++)if(\$i~get)cols[++c]=i}{for(i=1; i<=c; i++)printf "%s%s", \$(cols[i]), (i versions.yml + "${task.process}": + goat: \$(goat-cli --version | cut -d' ' -f2) + END_VERSIONS + """ +} diff --git a/modules/goat/taxonsearch/t.awk b/modules/goat/taxonsearch/t.awk new file mode 100644 index 0000000..c82cebb --- /dev/null +++ b/modules/goat/taxonsearch/t.awk @@ -0,0 +1,8 @@ +NR==1 { + for (i=1; i<=NF; i++) { + ix[$i] = i + } +} +NR>1 { + print $ix[c1] +} diff --git a/modules/gzip/main.nf b/modules/gzip/main.nf index b59dad6..6d4d47a 100644 --- a/modules/gzip/main.nf +++ b/modules/gzip/main.nf @@ -2,7 +2,6 @@ process GZIP { tag "$meta.id" label 'process_medium' - conda "bioconda::samtools=1.16.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
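GOAT_TAXONSEARCH flattens its TSV result to CSV with `sed 's/\t/,/g'` before the column lookup. A sketch on a fabricated two-column result (GNU sed understands `\t` in the pattern; BSD sed would need a literal tab):

```shell
# Fabricated goat-style TSV result.
printf 'taxon_id\tploidy\n6973\t2\n' > search.tsv

# Replace every tab with a comma, as the module does before parsing columns.
sed 's/\t/,/g' search.tsv > search.csv
cat search.csv
```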
'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' : 'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }" @@ -13,9 +12,15 @@ process GZIP { output: tuple val(meta), path('*.gz'), emit: gz + path "versions.yml" , emit: versions script: """ bgzip -c $file_to_compress > ${meta.id}.scaffolds_FINAL.fa.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bgzip: \$(bgzip --version | sed -n 1p |sed 's/bgzip (htslib) //g') + END_VERSIONS """ } diff --git a/modules/hifiasm/main.nf b/modules/hifiasm/main.nf index 79bd26f..5e0fb65 100644 --- a/modules/hifiasm/main.nf +++ b/modules/hifiasm/main.nf @@ -2,10 +2,10 @@ process HIFIASM { tag "$meta.id" label 'process_high' - conda "bioconda::hifiasm=0.18.8" + conda "bioconda::hifiasm=0.19.4" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/hifiasm:0.18.5--h5b5514e_0' : - 'quay.io/biocontainers/hifiasm:0.18.5--h5b5514e_0' }" + 'https://depot.galaxyproject.org/singularity/hifiasm:0.19.4--h5b5514e_0' : + 'quay.io/biocontainers/hifiasm:0.19.4--h5b5514e_0' }" input: tuple val(meta), path(reads) @@ -14,6 +14,8 @@ process HIFIASM { path hic_read1 path hic_read2 path(nanopore_UL) + val ploidy + val genome_size output: tuple val(meta), path("*.r_utg.gfa") , emit: raw_unitigs @@ -34,6 +36,7 @@ process HIFIASM { script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" + def ploidy_value = ploidy ? 
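The versions.yml additions throughout this diff follow one idiom: capture the tool's version banner, keep the first line with `sed -n 1p`, and strip the tool-name prefix with a second sed. A sketch on canned text standing in for the real `bgzip --version` output:

```shell
# Stand-in for `bgzip --version` output (fabricated, two lines).
banner='bgzip (htslib) 1.16
Copyright (C) 2022 Genome Research Ltd.'

# sed -n 1p keeps only the first line; the second sed removes the prefix.
version=$(printf '%s\n' "$banner" | sed -n 1p | sed 's/bgzip (htslib) //g')
echo "$version"
```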
"--n-hap $ploidy" : "" if ((paternal_kmer_dump) && (maternal_kmer_dump) && (hic_read1) && (hic_read2)) { error "Hifiasm Trio-binning and Hi-C integrated should not be used at the same time" } else if ((paternal_kmer_dump) && !(maternal_kmer_dump)) { @@ -42,8 +45,12 @@ process HIFIASM { error "Hifiasm Trio-binning requires paternal data" } else if ((paternal_kmer_dump) && (maternal_kmer_dump)) { """ + hg_size_kb=\$(echo $genome_size | awk '{print \$1 /1000}') + hifiasm \\ $args \\ + $ploidy_value \\ + --hg-size \${hg_size_kb}k \\ -o ${prefix}.asm \\ -t $task.cpus \\ -1 $paternal_kmer_dump \\ @@ -61,8 +68,12 @@ process HIFIASM { error "Hifiasm Hi-C integrated requires paired-end data (only R2 specified here)" } else if ((hic_read1) && (hic_read2) && !(nanopore_UL)) { """ + hg_size_kb=\$(echo $genome_size | awk '{print \$1 /1000}') + hifiasm \\ $args \\ + $ploidy_value \\ + --hg-size \${hg_size_kb}k \\ -o ${prefix}.asm \\ -t $task.cpus \\ --h1 $hic_read1 \\ @@ -76,8 +87,12 @@ process HIFIASM { """ } else if ((nanopore_UL) && !(hic_read1) && !(hic_read2)) { """ + hg_size_kb=\$(echo $genome_size | awk '{print \$1 /1000}') + hifiasm \\ $args \\ + $ploidy_value \\ + --hg-size \${hg_size_kb}k \\ -o ${prefix}.asm \\ -t $task.cpus \\ --ul $nanopore_UL \\ @@ -90,8 +105,12 @@ process HIFIASM { """ } else if ((hic_read1) && (hic_read2) && (nanopore_UL)) { """ + hg_size_kb=\$(echo $genome_size | awk '{print \$1 /1000}') + hifiasm \\ $args \\ + $ploidy_value \\ + --hg-size \${hg_size_kb}k \\ -o ${prefix}.asm \\ -t $task.cpus \\ --h1 $hic_read1 \\ @@ -106,8 +125,12 @@ process HIFIASM { """ } else { """ + hg_size_kb=\$(echo $genome_size | awk '{print \$1 /1000}') + hifiasm \\ $args \\ + $ploidy_value \\ + --hg-size \${hg_size_kb}k \\ -o ${prefix}.asm \\ -t $task.cpus \\ $reads diff --git a/modules/juicer/juicer/main.nf b/modules/juicer/juicer/main.nf index a228c3e..8af38ec 100644 --- a/modules/juicer/juicer/main.nf +++ b/modules/juicer/juicer/main.nf @@ -10,6 +10,7 @@ process 
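Every hifiasm branch now derives `--hg-size` from the same one-liner, `hg_size_kb=\$(echo $genome_size | awk '{print \$1 /1000}')`: divide the supplied size by 1000 and append `k` when the flag is assembled. In plain shell, with 3100000 as an arbitrary example value in whatever unit the pipeline supplies:

```shell
genome_size=3100000

# awk divides the first field by 1000; the k suffix is added when the
# hifiasm flag is built, giving e.g. --hg-size 3100k.
hg_size_kb=$(echo $genome_size | awk '{print $1 /1000}')
echo "--hg-size ${hg_size_kb}k"
```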
JUICER { output: tuple val(meta), path("*.hic"), emit: hic + path "versions.yml" , emit: versions script: """ @@ -17,5 +18,11 @@ process JUICER { $alignments_sorted_txt \\ ${meta.id}.hic \\ $chrom_sizes + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + openjdk: \$(echo \$(java -version 2>&1) | grep version | sed 's/\"//g' | cut -f3 -d ' ') + juicer : \$(echo \$(java -jar ${params.JUICER_JAR} --version | sed -n 2p | sed 's/Juicer Tools Version //g')) + END_VERSIONS """ } diff --git a/modules/juicer/salsa2_juicer/juicer_tools_1.22.01.jar b/modules/juicer/salsa2_juicer/juicer_tools_1.22.01.jar deleted file mode 100644 index 1a141cd..0000000 Binary files a/modules/juicer/salsa2_juicer/juicer_tools_1.22.01.jar and /dev/null differ diff --git a/modules/juicer/salsa2_juicer/main.nf b/modules/juicer/salsa2_juicer/main.nf index fca6b37..eebf8b4 100644 --- a/modules/juicer/salsa2_juicer/main.nf +++ b/modules/juicer/salsa2_juicer/main.nf @@ -12,7 +12,7 @@ process SALSA2_JUICER { output: tuple val(meta), path("*.hic"), emit: hic tuple val(meta), path('*.assembly'), emit: assembly - + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' @@ -24,5 +24,13 @@ process SALSA2_JUICER { awk '{if (\$2 > \$6) {print \$1"\t"\$6"\t"\$7"\t"\$8"\t"\$5"\t"\$2"\t"\$3"\t"\$4} else {print}}' alignments.txt | sort -k2,2d -k6,6d -T $projectDir --parallel=8 | awk 'NF' > alignments_sorted.txt java -jar ${params.JUICER_JAR} pre $args -o ${meta.id} alignments_sorted.txt salsa_scaffolds.hic chromosome_sizes.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cut: \$(cut --version | sed 's/cut (GNU coreutils) //g' | sed -n 1p) + python: \$(python --version | sed 's/Python //g') + awk: \$(awk --version | sed 's/GNU Awk //g' | sed -n 1p) + java: \$(java --version | sed 's/java //g' | sed -n 1p) + END_VERSIONS """ } diff --git a/modules/juicer/yahs_juicer/main.nf b/modules/juicer/yahs_juicer/main.nf index 706fd99..526cfec 100644 --- 
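SALSA2_JUICER normalises each alignment row before sorting: when the first mate's chromosome name (field 2) sorts after the second's (field 6), the two mate blocks are swapped so `sort -k2,2d -k6,6d` groups pairs consistently. The same awk on one fabricated row:

```shell
# Fields: str1 chr1 pos1 frag1 str2 chr2 pos2 frag2 (fabricated values).
printf '0 chrB 10 0 1 chrA 20 1\n' > alignments.txt

# chrB sorts after chrA, so the row is rewritten with the chrA block first.
awk '{if ($2 > $6) {print $1"\t"$6"\t"$7"\t"$8"\t"$5"\t"$2"\t"$3"\t"$4} else {print}}' alignments.txt > sorted_input.txt
cat sorted_input.txt
```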
a/modules/juicer/yahs_juicer/main.nf +++ b/modules/juicer/yahs_juicer/main.nf @@ -5,8 +5,9 @@ process YAHS_JUICER { container "https://depot.galaxyproject.org/singularity/yahs%3A1.2a.2--h7132678_0" input: - tuple val(meta), path(agp), path (bin) - tuple val(meta), path(index) + tuple val(meta), path(agp) + tuple val(meta2), path (bin) + tuple val(meta3), path(index) output: tuple val(meta), path("*_scaffolds_final.chrom.sizes"), emit: chrom_sizes @@ -14,21 +15,27 @@ process YAHS_JUICER { tuple val(meta), path('*.assembly'), emit: assembly tuple val(meta), path('*.txt'), emit: juicer_txt tuple val(meta), path('*.log'), emit: juicer_log + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' """ juicer pre \\ - -a -o ${meta.id} \\ - $bin \\ - $agp \\ - $index + -a -o ${meta.id} \\ + $bin \\ + $agp \\ + $index - juicer pre \\ - $bin \\ - $agp \\ - $index 2>tmp_juicer_pre.log | LC_ALL=C sort -k2,2d -k6,6d -S32G | awk 'NF' > ${meta.id}_alignments_sorted.txt + juicer pre \\ + $bin \\ + $agp \\ + $index 2>tmp_juicer_pre.log | LC_ALL=C sort -k2,2d -k6,6d -S32G | awk 'NF' > ${meta.id}_alignments_sorted.txt cat tmp_juicer_pre.log | grep "PRE_C_SIZE" | cut -d' ' -f2- > ${meta.id}_scaffolds_final.chrom.sizes + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + juicer : \$(echo \$(juicer pre --version)) + END_VERSIONS """ } diff --git a/modules/jupiter/main.nf b/modules/jupiter/main.nf new file mode 100644 index 0000000..da3c1d2 --- /dev/null +++ b/modules/jupiter/main.nf @@ -0,0 +1,38 @@ +process JUPITER { + tag "$meta.id" + label 'process_medium' + + input: + tuple val(meta), path(assembly) + tuple val(meta2), path(reference) + + output: + tuple val(meta), path("*.png") , emit: plot + tuple val(meta), path("*.txt") , emit: txt + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: '' + """ + source \$(conda info --json | 
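YAHS_JUICER recovers the chromosome-sizes table from the `juicer pre` log by keeping `PRE_C_SIZE` lines and cutting off the leading tag. A sketch on fabricated log lines (the exact log format here is an assumption):

```shell
# Fabricated juicer pre log.
printf 'PRE_C_SIZE: scaffold_1 52000000\nsome other message\nPRE_C_SIZE: scaffold_2 31000000\n' > tmp_juicer_pre.log

# Keep the size records, drop the leading "PRE_C_SIZE:" field.
cat tmp_juicer_pre.log | grep "PRE_C_SIZE" | cut -d' ' -f2- > scaffolds_final.chrom.sizes
cat scaffolds_final.chrom.sizes
```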
awk '/conda_prefix/ { gsub(/"|,/, "", \$2); print \$2 }')/bin/activate /home/scorreard/miniconda3/envs/circos + + gzip -cd $reference > jupiter_reference.fa + + ${params.singularity_cache}/Jupiter/./jupiter \\ + t=$task.cpus \\ + name=${prefix}_${reference.simpleName} \\ + ref=jupiter_reference.fa \\ + fa=$assembly \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + jupiter: \$( ${params.singularity_cache}/Jupiter/./jupiter --version | head -n1) + circos : \$( circos --version | sed 's/circos | v //g'| sed 's/ | 15 Jun 2019 | Perl 5.032001//g') + END_VERSIONS + """ +} diff --git a/modules/kraken2/main.nf b/modules/kraken2/main.nf index 96251d2..9df6f5a 100644 --- a/modules/kraken2/main.nf +++ b/modules/kraken2/main.nf @@ -2,7 +2,7 @@ process KRAKEN2_KRAKEN2 { tag "$meta.id" label 'process_high' - conda (params.enable_conda ? 'bioconda::kraken2=2.1.2 conda-forge::pigz=2.6' : null) + conda 'bioconda::kraken2=2.1.2 conda-forge::pigz=2.6' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' : 'quay.io/biocontainers/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' }" diff --git a/modules/longstitch/main.nf b/modules/longstitch/main.nf index 0ab2600..1e7f67b 100644 --- a/modules/longstitch/main.nf +++ b/modules/longstitch/main.nf @@ -2,16 +2,17 @@ process LONGSTITCH { tag "$meta.id" label 'process_high' - conda "bioconda::longstitch" + conda "bioconda::longstitch=1.0.4" // container "https://depot.galaxyproject.org/singularity/longstitch%3A1.0.3--hdfd78af_0" - container "docker://quay.io/biocontainers/longstitch:1.0.2--hdfd78af_0" + container "docker://quay.io/biocontainers/longstitch:1.0.4--hdfd78af_0" input: - tuple val(meta), path(reads) - tuple val(meta2), path(assembly) + tuple val(meta2), path(reads) + tuple val(meta), path(assembly) + val(genome_size) output: - tuple val(meta), path('*.ntLink.scaffolds.fa') , emit: assembly + tuple val(meta), path ('*ntLink-arks.longstitch-scaffolds.fa') , emit: assembly path "versions.yml" , emit: versions when: @@ -19,15 +20,20 @@ process LONGSTITCH { script: def args = task.ext.args ?: '' + def G = genome_size ? 
"G=$genome_size" : "" def prefix = task.ext.prefix ?: "${meta.id}" """ gzip -cd ${assembly} > ${assembly.simpleName}.fa - ln -s ${reads} ${reads.simpleName}.fq.gz - longstitch tigmint-ntLink-arks \\ + #gunzip -c ${reads} > ${reads.simpleName}.fq.gz + + cp $reads ${reads.simpleName}.fq.gz + + longstitch ntLink-arks \\ draft=${assembly.simpleName} \\ reads=${reads.simpleName} \\ $args \\ + $G \\ t=$task.cpus cat <<-END_VERSIONS > versions.yml diff --git a/modules/manualcuration/main.nf b/modules/manualcuration/main.nf new file mode 100644 index 0000000..de03037 --- /dev/null +++ b/modules/manualcuration/main.nf @@ -0,0 +1,34 @@ +process RAPIDCURATION_SPLIT { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::seqtk=1.3 conda-forge::perl=5.32.1" +// container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? +// 'https://depot.galaxyproject.org/singularity/seqtk:1.3--h5bf99c6_3' : +// 'biocontainers/seqtk:1.3--h5bf99c6_3' }" + + input: + tuple val(meta), path(assembly) + + output: + tuple val(meta), path("*.tpf") , emit: split_tpf + path "versions.yml" , emit: versions + + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: '' + """ + perl ${params.singularity_cache}/rapid-curation/rapid_split.pl -fa $assembly + mv *.tpf ${prefix}.tpf + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + seqtk: \$(echo \$(seqtk 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + perl: \$(perl --version | grep 'This is perl' | sed 's/.*(v//g' | sed 's/)//g') + END_VERSIONS + """ +} diff --git a/modules/mashmap/main.nf b/modules/mashmap/main.nf new file mode 100644 index 0000000..cb47ff7 --- /dev/null +++ b/modules/mashmap/main.nf @@ -0,0 +1,43 @@ +process MASHMAP { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::mashmap=3.0.4 bioconda::perl-bioperl=1.7.2 conda-forge::gsl=2.7 mkl=2023.1.0 conda-forge::gnuplot=5.4.5" +// container 
"${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? +// 'https://depot.galaxyproject.org/singularity/mashmap%3A3.0.4--h97b747e_0' : +// 'biocontainers/mashmap:3.0.4--h97b747e_0' }" + + input: + tuple val(meta), path(assembly) + tuple val(meta2), path(reference) + + output: + tuple val(meta), path("*.png") , emit: mashmap_png + tuple val(meta), path("*.txt"), emit: mashmap_txt + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: '' + """ + mashmap \\ + -r $reference \\ + -q $assembly \\ + $args \\ + -t $task.cpus \\ + -o ${prefix}_${reference.simpleName}.out + + perl ${params.singularity_cache}/MashMap/scripts/generateDotPlot png large ${prefix}_${reference.simpleName}.out + + awk '{print \$1"\t"\$6"\t"\$5}' ${prefix}_${reference.simpleName}.out | sort | uniq > mashmap_correspondance.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + mashmap: \$(echo \$(mashmap --version 2>&1)) + perl: \$(perl --version | grep 'This is perl' | sed 's/.*(v//g' | sed 's/)//g') + END_VERSIONS + """ +} diff --git a/modules/merqury/main.nf b/modules/merqury/main.nf index 2e5b06b..b862acc 100644 --- a/modules/merqury/main.nf +++ b/modules/merqury/main.nf @@ -11,21 +11,21 @@ process MERQURY { tuple val(meta), path(meryl_db), path(assembly) output: - tuple val(meta), path("*_only.bed") , emit: assembly_only_kmers_bed - tuple val(meta), path("*_only.wig") , emit: assembly_only_kmers_wig - tuple val(meta), path("*.completeness.stats"), emit: stats - tuple val(meta), path("*.dist_only.hist") , emit: dist_hist - tuple val(meta), path("*.spectra-cn.fl.png") , emit: spectra_cn_fl_png - tuple val(meta), path("*.spectra-cn.hist") , emit: spectra_cn_hist - tuple val(meta), path("*.spectra-cn.ln.png") , emit: spectra_cn_ln_png - tuple val(meta), path("*.spectra-cn.st.png") , emit: spectra_cn_st_png - tuple val(meta), 
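The MASHMAP correspondence table projects three columns from the mapping output — query name (field 1), reference name (field 6), strand (field 5) — and collapses duplicates with `sort | uniq`. On two fabricated mashmap-style rows for the same contig:

```shell
# Two fabricated mappings of one contig to the same reference and strand.
printf 'ctg1 500 0 499 + chr1 1000 0 499 99.9\nctg1 500 0 499 + chr1 1000 500 999 98.0\n' > mapping.out

# Project query, reference, strand; repeated triples collapse to one row.
awk '{print $1"\t"$6"\t"$5}' mapping.out | sort | uniq > mashmap_correspondance.txt
cat mashmap_correspondance.txt
```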
path("*.spectra-asm.fl.png"), emit: spectra_asm_fl_png - tuple val(meta), path("*.spectra-asm.hist") , emit: spectra_asm_hist - tuple val(meta), path("*.spectra-asm.ln.png"), emit: spectra_asm_ln_png - tuple val(meta), path("*.spectra-asm.st.png"), emit: spectra_asm_st_png - tuple val(meta), path("${prefix}.qv") , emit: assembly_qv - tuple val(meta), path("${prefix}.*.qv") , emit: scaffold_qv - tuple val(meta), path("*.hist.ploidy") , emit: read_ploidy + tuple val(meta), path("*_only.bed") , emit: assembly_only_kmers_bed, optional: true + tuple val(meta), path("*_only.wig") , emit: assembly_only_kmers_wig, optional: true + tuple val(meta), path("*.completeness.stats"), emit: stats , optional: true + tuple val(meta), path("*.dist_only.hist") , emit: dist_hist , optional: true + tuple val(meta), path("*.spectra-cn.fl.png") , emit: spectra_cn_fl_png , optional: true + tuple val(meta), path("*.spectra-cn.hist") , emit: spectra_cn_hist , optional: true + tuple val(meta), path("*.spectra-cn.ln.png") , emit: spectra_cn_ln_png , optional: true + tuple val(meta), path("*.spectra-cn.st.png") , emit: spectra_cn_st_png , optional: true + tuple val(meta), path("*.spectra-asm.fl.png"), emit: spectra_asm_fl_png , optional: true + tuple val(meta), path("*.spectra-asm.hist") , emit: spectra_asm_hist , optional: true + tuple val(meta), path("*.spectra-asm.ln.png"), emit: spectra_asm_ln_png , optional: true + tuple val(meta), path("*.spectra-asm.st.png"), emit: spectra_asm_st_png , optional: true + tuple val(meta), path("${prefix}.qv") , emit: assembly_qv , optional: true + tuple val(meta), path("${prefix}.*.qv") , emit: scaffold_qv , optional: true + tuple val(meta), path("*.hist.ploidy") , emit: read_ploidy , optional: true path "versions.yml" , emit: versions when: diff --git a/modules/merqury/merqury_double/main.nf b/modules/merqury/merqury_double/main.nf index 5d0d2d9..a26a329 100644 --- a/modules/merqury/merqury_double/main.nf +++ b/modules/merqury/merqury_double/main.nf @@ 
-13,21 +13,21 @@ process MERQURY_DOUBLE { tuple val(meta3), path(assembly2) output: - tuple val(meta), path("*_only.bed") , emit: assembly_only_kmers_bed - tuple val(meta), path("*_only.wig") , emit: assembly_only_kmers_wig - tuple val(meta), path("*.completeness.stats"), emit: stats - tuple val(meta), path("*.dist_only.hist") , emit: dist_hist - tuple val(meta), path("*.spectra-cn.fl.png") , emit: spectra_cn_fl_png - tuple val(meta), path("*.spectra-cn.hist") , emit: spectra_cn_hist - tuple val(meta), path("*.spectra-cn.ln.png") , emit: spectra_cn_ln_png - tuple val(meta), path("*.spectra-cn.st.png") , emit: spectra_cn_st_png - tuple val(meta), path("*.spectra-asm.fl.png"), emit: spectra_asm_fl_png - tuple val(meta), path("*.spectra-asm.hist") , emit: spectra_asm_hist - tuple val(meta), path("*.spectra-asm.ln.png"), emit: spectra_asm_ln_png - tuple val(meta), path("*.spectra-asm.st.png"), emit: spectra_asm_st_png - tuple val(meta), path("${prefix}.qv") , emit: assembly_qv - tuple val(meta), path("${prefix}.*.qv") , emit: scaffold_qv - tuple val(meta), path("*.hist.ploidy") , emit: read_ploidy + tuple val(meta), path("*_only.bed") , emit: assembly_only_kmers_bed, optional: true + tuple val(meta), path("*_only.wig") , emit: assembly_only_kmers_wig, optional: true + tuple val(meta), path("*.completeness.stats"), emit: stats , optional: true + tuple val(meta), path("*.dist_only.hist") , emit: dist_hist , optional: true + tuple val(meta), path("*.spectra-cn.fl.png") , emit: spectra_cn_fl_png , optional: true + tuple val(meta), path("*.spectra-cn.hist") , emit: spectra_cn_hist , optional: true + tuple val(meta), path("*.spectra-cn.ln.png") , emit: spectra_cn_ln_png , optional: true + tuple val(meta), path("*.spectra-cn.st.png") , emit: spectra_cn_st_png , optional: true + tuple val(meta), path("*.spectra-asm.fl.png"), emit: spectra_asm_fl_png , optional: true + tuple val(meta), path("*.spectra-asm.hist") , emit: spectra_asm_hist , optional: true + tuple val(meta), 
path("*.spectra-asm.ln.png"), emit: spectra_asm_ln_png , optional: true + tuple val(meta), path("*.spectra-asm.st.png"), emit: spectra_asm_st_png , optional: true + tuple val(meta), path("${prefix}.qv") , emit: assembly_qv , optional: true + tuple val(meta), path("${prefix}.*.qv") , emit: scaffold_qv , optional: true + tuple val(meta), path("*.hist.ploidy") , emit: read_ploidy , optional: true path "versions.yml" , emit: versions when: diff --git a/modules/meryl/count/main.nf b/modules/meryl/count/main.nf index 6e7ff18..d1ed723 100644 --- a/modules/meryl/count/main.nf +++ b/modules/meryl/count/main.nf @@ -2,7 +2,7 @@ process MERYL_COUNT { tag "$meta.id" label 'process_high' - conda (params.enable_conda ? "bioconda::meryl=1.3" : null) + conda "bioconda::meryl=1.3" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/meryl:1.3--h87f3376_1': 'quay.io/biocontainers/meryl:1.3--h87f3376_1' }" diff --git a/modules/meryl/histogram/main.nf b/modules/meryl/histogram/main.nf index a1f18f0..b69f0a2 100644 --- a/modules/meryl/histogram/main.nf +++ b/modules/meryl/histogram/main.nf @@ -2,7 +2,7 @@ process MERYL_HISTOGRAM { tag "$meta.id" label 'process_low' - conda (params.enable_conda ? "bioconda::meryl=1.3" : null) + conda "bioconda::meryl=1.3" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/meryl:1.3--h87f3376_1': 'quay.io/biocontainers/meryl:1.3--h87f3376_1' }" diff --git a/modules/meryl/unionsum/main.nf b/modules/meryl/unionsum/main.nf index 98baa87..881b522 100644 --- a/modules/meryl/unionsum/main.nf +++ b/modules/meryl/unionsum/main.nf @@ -2,7 +2,7 @@ process MERYL_UNIONSUM { tag "$meta.id" label 'process_low' - conda (params.enable_conda ? 
"bioconda::meryl=1.3" : null) + conda "bioconda::meryl=1.3" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/meryl:1.3--h87f3376_1': 'quay.io/biocontainers/meryl:1.3--h87f3376_1' }" diff --git a/modules/minimap2/align/main.nf b/modules/minimap2/align/main.nf index 4c952dc..c3de828 100644 --- a/modules/minimap2/align/main.nf +++ b/modules/minimap2/align/main.nf @@ -1,8 +1,8 @@ process MINIMAP2_ALIGN { tag "$meta.id" - label 'process_medium' + label 'process_high' - conda (params.enable_conda ? 'bioconda::minimap2=2.21 bioconda::samtools=1.12' : null) + conda 'bioconda::minimap2=2.21 bioconda::samtools=1.12' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:1679e915ddb9d6b4abda91880c4b48857d471bd8-0' : 'quay.io/biocontainers/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:1679e915ddb9d6b4abda91880c4b48857d471bd8-0' }" diff --git a/modules/minimap2/index/.main.swp b/modules/minimap2/index/.main.swp new file mode 100644 index 0000000..a1a3c9d Binary files /dev/null and b/modules/minimap2/index/.main.swp differ diff --git a/modules/minimap2/index/main.nf b/modules/minimap2/index/main.nf index 25e9429..bdb101a 100644 --- a/modules/minimap2/index/main.nf +++ b/modules/minimap2/index/main.nf @@ -1,7 +1,7 @@ process MINIMAP2_INDEX { label 'process_medium' - conda (params.enable_conda ? 'bioconda::minimap2=2.21' : null) + conda 'bioconda::minimap2=2.21' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/minimap2:2.21--h5bf99c6_0' : 'quay.io/biocontainers/minimap2:2.21--h5bf99c6_0' }" diff --git a/modules/mitohifi/findmitoreference/main.nf b/modules/mitohifi/findmitoreference/main.nf index 2c5ea79..61dd26c 100644 --- a/modules/mitohifi/findmitoreference/main.nf +++ b/modules/mitohifi/findmitoreference/main.nf @@ -11,15 +11,22 @@ process FIND_MITO_REFERENCE { output: tuple val (meta), path("*.fasta"), emit : reference_fasta tuple val (meta), path("*.gb"), emit : reference_gb + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" + def VERSION = "3.0.0" """ - findMitoReference.py \\ + ${params.singularity_cache}/MitoHiFi/findMitoReference.py \\ --species "$specie" \\ - --email $params.email \\ + --email ${params.email_adress} \\ --outfolder . \\ --min_length 16000 + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + MitoHiFi: $VERSION + END_VERSIONS """ } diff --git a/modules/mitohifi/mitohifi/main.nf b/modules/mitohifi/mitohifi/main.nf index 644742d..cdbf127 100644 --- a/modules/mitohifi/mitohifi/main.nf +++ b/modules/mitohifi/mitohifi/main.nf @@ -2,7 +2,7 @@ process MITOHIFI { tag "$meta.id" label 'process_medium' - conda '/home/miniconda3/envs/mitohifi_v3' + conda '/home/scorreard/miniconda3/envs/mitohifi_v3' input: tuple val(meta), path(reads_fasta) @@ -13,16 +13,26 @@ process MITOHIFI { tuple val (meta), path('final_mitogenome.fasta'), emit : final_mito_fasta tuple val (meta), path('final_mitogenome.gb'), emit : final_mito_gb tuple val (meta), path('contigs_stats.tsv'), emit : mito_contig_stat + tuple val (meta), path('*.png'), emit : figures + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ - python mitohifi.py \\ + python ${params.singularity_cache}/MitoHiFi/mitohifi.py \\ $args \\ -r "$reads_fasta" \\ -f $reference_fasta \\ -g $reference_gb \\ -t 10 + + sed ' 1 
s/.*/& [topology=circular] [location=mitochondrion]/' final_mitogenome.fasta > final_mitogenome_tagged.fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + mitohifi: \$(python ${params.singularity_cache}/MitoHiFi/mitohifi.py --version | sed 's/MitoHiFi//g') + END_VERSIONS """ } diff --git a/modules/multiqc/main.nf b/modules/multiqc/main.nf index 2b5765b..4bd8768 100644 --- a/modules/multiqc/main.nf +++ b/modules/multiqc/main.nf @@ -1,7 +1,7 @@ process MULTIQC { label 'process_single' - conda (params.enable_conda ? 'bioconda::multiqc=1.13' : null) + conda 'bioconda::multiqc=1.13' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/multiqc:1.13--pyhdfd78af_0' : 'quay.io/biocontainers/multiqc:1.13--pyhdfd78af_0' }" @@ -32,14 +32,17 @@ process MULTIQC { echo \$coverage_estimation multiqc \\ - --force \\ $args \\ $config \\ $extra_config \\ - -b "\${coverage_estimation}" \\ - . - - + -b "Species common name: ${params.id}" \\ + -b "Species taxonomic ID: ${params.taxon_taxid}" \\ + -b "Species scientific name: ${params.taxon_name}" \\ + -b "Species ploidy: ${params.ploidy}" \\ + -b "Species genome size: ${params.hap_gen_size_Gb}" \\ + -b "Species number of chromosomes: ${params.chrom_num}" \\ + -b "\${coverage_estimation}" \\ + . cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/ncbigenomedownload/main.nf b/modules/ncbigenomedownload/main.nf new file mode 100644 index 0000000..9f59623 --- /dev/null +++ b/modules/ncbigenomedownload/main.nf @@ -0,0 +1,49 @@ +process NCBIGENOMEDOWNLOAD { + tag "$meta.id" + label 'process_low' + + conda "bioconda::ncbi-genome-download=0.3.1" +// container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+// 'https://depot.galaxyproject.org/singularity/ncbi-genome-download:0.3.1--pyh5e36f6f_0' : +// 'biocontainers/ncbi-genome-download:0.3.1--pyh5e36f6f_0' }" + + input: + val meta + path accessions + + output: + tuple val(meta), path("*_genomic.gbff.gz") , emit: gbk , optional: true + tuple val(meta), path("*_genomic.fna.gz") , emit: fna , optional: true + tuple val(meta), path("*_rm.out.gz") , emit: rm , optional: true + tuple val(meta), path("*_feature_table.txt.gz") , emit: features, optional: true + tuple val(meta), path("*_genomic.gff.gz") , emit: gff , optional: true + tuple val(meta), path("*_protein.faa.gz") , emit: faa , optional: true + tuple val(meta), path("*_protein.gpff.gz") , emit: gpff , optional: true + tuple val(meta), path("*_wgsmaster.gbff.gz") , emit: wgs_gbk , optional: true + tuple val(meta), path("*_cds_from_genomic.fna.gz"), emit: cds , optional: true + tuple val(meta), path("*_rna.fna.gz") , emit: rna , optional: true + tuple val(meta), path("*_rna_from_genomic.fna.gz"), emit: rna_fna , optional: true + tuple val(meta), path("*_assembly_report.txt") , emit: report , optional: true + tuple val(meta), path("*_assembly_stats.txt") , emit: stats , optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def accessions_opt = accessions ? 
"-A ${accessions}" : "" + """ + ncbi-genome-download \\ + $args \\ + $accessions_opt \\ + --output-folder ./ \\ + --flat-output + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ncbigenomedownload: \$( ncbi-genome-download --version ) + END_VERSIONS + """ +} diff --git a/modules/overview_generation/.publication.R.swp b/modules/overview_generation/.publication.R.swp new file mode 100644 index 0000000..bb06504 Binary files /dev/null and b/modules/overview_generation/.publication.R.swp differ diff --git a/modules/overview_generation/publication.R b/modules/overview_generation/publication.R new file mode 100644 index 0000000..7825259 --- /dev/null +++ b/modules/overview_generation/publication.R @@ -0,0 +1,140 @@ +install.packages('ReporteRs') +library('ReporteRs') + +# Create a word document to contain R outputs +doc <- docx() +# Add a title to the document +doc <- addTitle(doc, paste0("The genome sequence of the ", common_name, ", ", scientific_name), level = 1) + +# Add a sub title (author list) +doc <- addTitle(doc, "Author list and affiliation", level = 2) +doc <- addParagraph(doc, "") + +# Add a sub title (abstract) +doc <- addTitle(doc, "Abstract", level = 2) +doc <- addParagraph(doc, paste0("We present a genome assembly of ", scientific_name, " (the ", common_name, "; . The genome sequence is ", genome_size, " gigabases in size. The assembly is composed of ", scaffold_number, " scaffolds, with an N50 of ", scaffold_N50, " and a BUSCO score of ", Busco_lin1, " for lineage ", lin1, ".")) + +# Add a sub title (Keywords) +doc <- addTitle(doc, "Keywords", level = 2) +doc <- addParagraph(doc, paste0(scientific_name, ", ", common_name, ", genome sequence")) + +# Add a sub title (Taxonomy) +doc <- addTitle(doc, "Species taxonomy", level = 2) +doc <- addParagraph(doc, "") + +# Add a sub title (Intro) +doc <- addTitle(doc, "Introduction", level = 2) +doc <- addParagraph(doc, paste0("The ", common_name, ", ", scientific_name, ".
The genome of ", scientific_name, " was sequenced as part of the Canadian BioGenome Project (CBP). The ", scientific_name, " genome will provide insights into genomic diversity and architecture, and inform conservation genomics applications.")) + +# Add a sub title (Method) +doc <- addTitle(doc, "Methods", level = 2) +doc <- addTitle(doc, "Sample collection", level = 3) +doc <- addParagraph(doc, "") + +doc <- addTitle(doc, "Sample extraction, library construction and sequencing", level = 3) +doc <- addParagraph(doc, " using the at . PacBio genome libraries were constructed using and sequenced on at . A Hi-C library was constructed using the at and subjected to PE150 sequencing on an instrument at . If short read, RNA or other data is generated for this species, add the information>") + +doc <- addTitle(doc, "Genome assembly", level = 3) +if ((assembly_method == "hifiasm") & (purging_method == "purge_dups") & (scaffolding_method == "yahs") & (manual_curation == "none")) { + # Hifiasm + purge_dups + yahs (no manual curation) + doc <- addParagraph(doc, paste0("Assembly was carried out using hifiasm with the ", assembly_secondary_mode, " mode (Cheng et al., 2021). Purging was done using purge_dups (Guan et al., 2020). Scaffolding with Hi-C data was carried out using YaHS (Zhou et al., 2023). The Hi-C contact maps were generated using Pretext (Harry, 2022). The final sequence was analyzed using BlobToolKit (Challis et al., 2020) and BUSCO scores were generated (Manni et al., 2021; Simão et al., 2015). The steps listed before as well as the generation of the manuscript template are organized in a nextflow pipeline available at: https://github.com/bcgsc/Canadian_Biogenome_Project.
Software tools and versions are listed in Table 3.")) +} else if ((assembly_method == "hifiasm") & (purging_method == "purge_dups") & (scaffolding_method == "yahs") & (manual_curation == "yes")) { + # Hifiasm + purge_dups + yahs + manual curation using JUICER + doc <- addParagraph(doc, paste0("Assembly was carried out using hifiasm with the ", assembly_secondary_mode, " mode (Cheng et al., 2021). Purging was done using purge_dups (Guan et al., 2020). Scaffolding with Hi-C data was carried out using YaHS (Zhou et al., 2023). The Hi-C contact maps were generated using Pretext (Harry, 2022). Manual curation was performed using Juicer (Durand et al., 2016). The final sequence was analyzed using BlobToolKit (Challis et al., 2020) and BUSCO scores were generated (Manni et al., 2021; Simão et al., 2015). The steps listed before as well as the generation of the manuscript template are organized in a nextflow pipeline available at: https://github.com/bcgsc/Canadian_Biogenome_Project. Software tools and versions are listed in Table 3.")) +} else if ((assembly_method == "flye") & (purging_method == "purge_dups") & (scaffolding_method == "yahs") & (manual_curation == "none")) { + # Flye + purge_dups + yahs (no manual curation) + doc <- addParagraph(doc, paste0("Assembly was carried out using flye (Kolmogorov et al., 2019). Purging was done using purge_dups (Guan et al., 2020). Scaffolding with Hi-C data was carried out using YaHS (Zhou et al., 2023). The Hi-C contact maps were generated using Pretext (Harry, 2022). The final sequence was analyzed using BlobToolKit (Challis et al., 2020) and BUSCO scores were generated (Manni et al., 2021; Simão et al., 2015). The steps listed before as well as the generation of the manuscript template are organized in a nextflow pipeline available at: https://github.com/bcgsc/Canadian_Biogenome_Project.
Software tools and versions are listed in Table 3.")) +} else if ((assembly_method == "flye") & (purging_method == "purge_dups") & (scaffolding_method == "yahs") & (manual_curation == "yes")) { + # Flye + purge_dups + yahs + manual curation using JUICER + doc <- addParagraph(doc, paste0("Assembly was carried out using flye (Kolmogorov et al., 2019). Purging was done using purge_dups (Guan et al., 2020). Scaffolding with Hi-C data was carried out using YaHS (Zhou et al., 2023). The Hi-C contact maps were generated using Pretext (Harry, 2022). Manual curation was performed using Juicer (Durand et al., 2016). The final sequence was analyzed using BlobToolKit (Challis et al., 2020) and BUSCO scores were generated (Manni et al., 2021; Simão et al., 2015). The steps listed before as well as the generation of the manuscript template are organized in a nextflow pipeline available at : https://github.com/bcgsc/Canadian_Biogenome_Project. Software tools and versions are listed in Table 3.")) +} else { + doc <- addParagraph(doc, paste0("")) +} + +if (mitohifi == "yes") { + doc <- addParagraph(doc, paste0("The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021).")) +} + +##MAY NEED TO INCLUDE MERQURY IF FIGURE OR QV USED +#To evaluate the assembly, MerquryFK was used to estimate consensus quality (QV) scores and k-mer completeness (Rhie et al., 2020). + +#May need to include barcoding + +# Add a sub title (Results) +doc <- addTitle(doc, "Results", level = 2) +doc <- addTitle(doc, "Genome sequence report", level = 3) +if (polishing_method == "none") { + doc <- addParagraph(doc, paste0("The genome of ", common_name, " collected from was sequenced. A total of ", pacbio_coverage, "-fold coverage in PacBio long reads (read quality > ", pacbio_minrq, ") were generated. Contigs were then scaffolded with Hi-C data or >. 
The final assembly has a total length of ", assembly_length, " Gb organized in ", scaffold_number, " sequence scaffolds with a scaffold N50 of ", scaffold_N50, " Mb (Table 1). ", perc_ass_chr, " of the assembly sequence was assigned to the ", chrom_number, " longest scaffolds representing the species’ known ", chrom_number, " autosomes () (numbered by sequence length; Figure 1–Figure 4; Table 2). Determining gene coverage using BUSCO, we estimated ", Busco_lin1, "% gene completeness using the ", lin1, " reference set (Manni et al., 2021).")) +} else { + doc <- addParagraph(doc, paste0("")) +} + + +doc <- addTitle(doc, "Genome annotation", level = 3) +doc <- addParagraph(doc, paste0("Annotation for the ", common_name, " genome assembly (", assembly_name, " (", assembly_number, ")) was generated by the Ensembl Rapid Release gene annotation pipeline (Aken et al., 2016). The resulting Ensembl annotation includes transcripts assigned to coding and non-coding genes (", scientific_name, " - Ensembl Rapid Release). The ", common_name, " assembly was also annotated for protein sequences using RefSeq ().")) + + +# Add a sub title (Data availability) +doc <- addTitle(doc, "Data availability", level = 2) +doc <- addTitle(doc, "Underlying data", level = 3) +doc <- addParagraph(doc, paste0("National Centre for Biotechnology Information BioProject: ", common_name, " (", scientific_name, ") genome sequencing and assembly, ", assembly_name, ". Accession number: .")) +doc <- addParagraph(doc, paste0("The genome sequence is released openly for reuse. The ", scientific_name, " genome sequencing initiative is part of the Canadian BioGenome Project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome is annotated through the Reference Sequence (RefSeq) database in BioProject accession number .
Raw data and assembly accession identifiers are reported in Table 1.")) + + +# Add a sub title (References) +doc <- addTitle(doc, "References", level = 2) +doc <- addParagraph(doc, "") + +#Figures +doc <- addTitle(doc, "Figures", level = 1) +doc <- addTitle(doc, paste0("Figure 1. Genome assembly of ", scientific_name, ", ", assembly_name, ": metrics."), level = 2) +doc <- addImage(doc, "Fig1.png") +doc <- addParagraph(doc, paste0("Snail plot showing N50 metrics, base pair composition and BUSCO gene completeness for ", scientific_name, " (", assembly_name, ") generated from Blobtoolkit v.2.6.4 (Challis et al., 2020). The plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the ", assembly_length, " bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (", length_longest_scaffold, " bp) shown in red. Orange and pale-orange arcs show the N50 and N90 chromosome lengths (", scaffold_N50, " bp and ", scaffold_N90, " bp, respectively). The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot displays the distribution of GC (blue), AT (pale blue) and N (white) percentages using the same bins as the inner plot. A summary of complete (", Busco_complete_lin1, "%), fragmented (", Busco_frag_lin1, "%), duplicated (", Busco_dup_lin1, "%), and missing (", Busco_missing_lin1, "%) BUSCO genes in the ", lin1, " set is shown in the top right.")) + +doc <- addTitle(doc, paste0("Figure 2. Genome assembly of ", scientific_name, ", ", assembly_name, ": GC-content."), level = 2) +doc <- addImage(doc, "Fig2.png") +doc <- addParagraph(doc, paste0("GC-coverage plot of ", scientific_name, " (", assembly_name, ") generated from Blobtoolkit v.2.6.4 (Challis et al., 2020).
Scaffolds are coloured by phylum with represented by blue and no-hit represented by pale blue. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis.")) + +doc <- addTitle(doc, paste0("Figure 3. Genome assembly of ", scientific_name, ", ", assembly_name, ": cumulative sequence length."), level = 2) +doc <- addImage(doc, "Fig3.png") +doc <- addParagraph(doc, paste0("Cumulative sequence length of ", scientific_name, " (", assembly_name, ") generated from Blobtoolkit v.2.6.4 (Challis et al., 2020). The grey line shows the cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the BUSCO genes tax rule, with represented by blue and no-hit represented by pale blue.")) + +doc <- addTitle(doc, paste0("Figure 4. Genome assembly of ", scientific_name, ", ", assembly_name, ": Hi-C contact map."), level = 2) +doc <- addImage(doc, "Fig4.png") +doc <- addParagraph(doc, paste0("Hi-C contact map of ", assembly_name, " assembly visualized using Pretext. Scaffolds are shown in order of size from left to right and top to bottom.")) + +#Tables +doc <- addTitle(doc, "Tables", level = 1) +doc <- addTitle(doc, paste0("Table 1. Genome data for ", scientific_name, ", ", assembly_name, "."), level = 2) +data_table1 <- matrix(c("Project accession data", " ", "Assembly identifier", assembly_name, "Species", scientific_name, "Specimen", "", "NCBI Taxonomy ID", "", "BioProject", "", "BioSample ID", "", "Isolate Information", ""), ncol=2, byrow=TRUE) +Table1 <- vanilla.table(data_table1) +Table1 <- setZebraStyle(Table1, odd = '#eeeeee', even = 'white') +doc <- addFlexTable(doc, Table1) +doc <- addParagraph(doc, paste0("* BUSCO scores based on the ", lin1, " BUSCO set using v5.0.0.
C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison.")) + + + + + + + + + + + +# Add a hyperlink +#list of hyperlinks : Bioproject, Ensembl release, CBP website? Link to pipeine? References? Tables? Figures? +#my_link <- pot('Click here to visit STHDA web site!', +# hyperlink = 'http://www.sthda.com/english', +# format=textBoldItalic(color = 'blue', underline = TRUE )) +#doc <- addParagraph(doc, my_link) + + + + + + +#http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-package + +# Write the Word document to a file +writeDoc(doc, file = paste0(common_name, "_publication_template.docx")) diff --git a/modules/overview_generation/sample/main.nf b/modules/overview_generation/sample/main.nf new file mode 100644 index 0000000..d1df173 --- /dev/null +++ b/modules/overview_generation/sample/main.nf @@ -0,0 +1,75 @@ +process OVERVIEW_GENERATION_SAMPLE { + tag "$meta.id" + label 'process_low' + + //container = 'https://depot.galaxyproject.org/singularity/r-stringr%3A1.1.0--r3.3.1_0' + container = 'https://depot.galaxyproject.org/singularity/r-rjson%3A0.2.15--r3.3.2_0' + + input: + tuple val(meta), path(longqc) //LonQC + tuple val(meta1), path(kraken_pacbio) //kraken_pacbio + tuple val(meta2), path(kraken_hic) //kraken_hic + path(quast_contig) //quast_contig + path(quast_contig_purged) //quast_contig_purged + path(quast_scaffold) //quast_scaffold + tuple val(meta6), path(busco_lin1) //busco_lineage1 + tuple val(meta7), path(busco_lin2) //busco_lineage2 + tuple val(meta8), path(busco_lin3) //busco_lineage3 + tuple val(meta9), path(busco_lin4) //busco_lineage4 + tuple val(meta10), path(chrom_size) + val (ploidy) + val(haploid_number) + val(scientific_name) + val(genome_size) + + output : + tuple val(meta), path('*.tsv'), emit: overview + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + Rscript 
${params.modules_path}/overview_generation/sample/overview_generation_sample.R \\ + $params.id \\ + $params.pipeline_version \\ + $params.outdir \\ + $params.pacbio_input_type \\ + $params.bam_cell1 \\ + $params.bam_cell2 \\ + $params.bam_cell3 \\ + $params.bam_cell4 \\ + $params.ont_fastq_1 \\ + $params.hic_read1 \\ + $params.hic_read2 \\ + $params.illumina_SR_read1 \\ + $params.illumina_SR_read2 \\ + $params.pacbio_rq \\ + $params.assembly_method \\ + $params.assembly_secondary_mode \\ + $params.polishing_method \\ + $params.purging_method \\ + $params.scaffolding_method \\ + $params.manual_curation \\ + $params.mitohifi \\ + $params.taxon_taxid \\ + $ploidy \\ + $genome_size \\ + $haploid_number \\ + $params.lineage \\ + $params.lineage2 \\ + $params.lineage3 \\ + $params.lineage4 \\ + $longqc \\ + $kraken_pacbio \\ + $kraken_hic \\ + $quast_contig \\ + $quast_contig_purged \\ + $quast_scaffold \\ + $busco_lin1 \\ + $busco_lin2 \\ + $busco_lin3 \\ + $busco_lin4 \\ + $chrom_size \\ + $scientific_name + """ +} diff --git a/modules/overview_generation/sample/overview_generation_sample.R b/modules/overview_generation/sample/overview_generation_sample.R new file mode 100644 index 0000000..4a65485 --- /dev/null +++ b/modules/overview_generation/sample/overview_generation_sample.R @@ -0,0 +1,455 @@ +#rjson_url = "https://cran.r-project.org/src/contrib/Archive/rjson/rjson_0.2.20.tar.gz" + +#install.packages(rjson_url, repos=NULL, type="source", lib="${params.singularity_cache}") +library("rjson") +#library(dplyr) +#library(stringr) + + +args <- commandArgs(trailingOnly = TRUE) + +xxid =(args[1]) +pipeline_version = (args[2]) +outdir =(args[3]) +pacbio_input_type =(args[4]) +bam_cell1 =(args[5]) +bam_cell2 =(args[6]) +bam_cell3 =(args[7]) +bam_cell4 =(args[8]) +ont_fastq_1 =(args[9]) +hic_read1 =(args[10]) +hic_read2 =(args[11]) +illumina_SR_read1 =(args[12]) +illumina_SR_read2 =(args[13]) +pacbio_rq =(args[14]) +assembly_method =(args[15]) +assembly_secondary_mode 
=(args[16]) +polishing_method =(args[17]) +purging_method = (args[18]) +scaffolding_method =(args[19]) +manual_curation=(args[20]) +mitohifi=(args[21]) +taxon_taxid =(args[22]) +ploidy =(args[23]) +hap_gen_size_Gb =(args[24]) +chrom_num =(args[25]) +lineage =(args[26]) +lineage2 =(args[27]) +lineage3 =(args[28]) +lineage4 =(args[29]) + +taxon_name =paste0((args[41]), "_", (args[42])) + +#Depending on the version, the genome size is in bp or in Gb, need to make it all in Gb +if (hap_gen_size_Gb > 1000) { + hap_gen_size_Gb = as.numeric(hap_gen_size_Gb)/1000000000 +} + +#Count the number of files +#number of pacbio bam files (including both barcoded and unbarcoded) +pacbio_concat = paste(bam_cell1, bam_cell2, bam_cell3, bam_cell4, sep = "; ") +#pacbio_n_files = 4-str_count(pacbio_concat, "null") +pacbio_n_files = 4 - lengths(regmatches(pacbio_concat, gregexpr("null", pacbio_concat))) + + +#hic +#Number of PAIRED files +hic_concat = paste(hic_read1, hic_read2, sep = "; ") +#hic_n_paired_files = 2-str_count(hic_concat, "null") +hic_n_paired_files = (2 - lengths(regmatches(hic_concat, gregexpr("null", hic_concat))))/2 + +#pe150 (illumina short reads) +#Number of PAIRED files +pe150_concat = paste(illumina_SR_read1, illumina_SR_read2, sep = "; ") +#pe150_n_paired_files = 2-str_count(pe150_concat, "null") +pe150_n_paired_files = (2 - lengths(regmatches(pe150_concat, gregexpr("null", pe150_concat))))/2 + +#ONT +ont_concat = paste(ont_fastq_1, sep = "; ") +#ont_n_files = 1-str_count(ont_concat, "null") +ont_n_files = 1 - lengths(regmatches(ont_concat, gregexpr("null", ont_concat))) + +#Extract the information from the different files to generate aggregated tsv and figures + +#LongQC data +if (file.exists(args[30])) { + #longqc_report= fromJSON(file="~/Downloads/Greenland_cockle_R/QC_vals_longQC_sampleqc_Greenland_cockle_004.json") + longqc_report= fromJSON(file=(args[30])) + pacbio_n_reads_longqc = longqc_report[["Num_of_reads"]] + pacbio_longest_read_longqc_kb =
longqc_report[["Longest_read"]]/1000 + pacbio_mean_read_length_longqc_kb = longqc_report[["Length_stats"]][["Mean_read_length"]]/1000 + pacbio_coverage_x=(pacbio_n_reads_longqc*pacbio_mean_read_length_longqc_kb)/(as.numeric(hap_gen_size_Gb)*1000000) +} else { + pacbio_n_reads_longqc = "NA" + pacbio_longest_read_longqc_kb = "NA" + pacbio_mean_read_length_longqc_kb = "NA" + pacbio_coverage_x = "NA" +} +lonqc_overview = cbind(pacbio_n_reads_longqc, pacbio_longest_read_longqc_kb, pacbio_mean_read_length_longqc_kb, pacbio_coverage_x) +#head(lonqc_overview) + + +#Pacbio Kraken results +#kraken_pacbio_results = read.table("~/Downloads/Greenland_cockle_R/Greenland_cockle_004.kraken2.report.txt", sep = "\t", blank.lines.skip = FALSE, quote="", comment.char="") +kraken_pacbio_results = read.table(args[31], sep = "\t", blank.lines.skip = FALSE, quote="", comment.char="") +colnames(kraken_pacbio_results)=c("perc", "n_reads_clade", "n_reads_taxon", "rank_code", "NCBI_tax", "scientific_name") +#Remove spaces in scientific_name column +kraken_pacbio_results$scientific_name = gsub('\\s+', '', kraken_pacbio_results$scientific_name) +kraken_pacbio_actinopteri_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Actinopteri", 1] +kraken_pacbio_amphibia_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Amphibia", 1] +kraken_pacbio_aves_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Aves", 1] +kraken_pacbio_bivalvia_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Bivalvia", 1] +kraken_pacbio_insecta_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Insecta", 1] +kraken_pacbio_magnoliopsida_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Magnoliopsida", 1] +kraken_pacbio_mammallia_perc = kraken_pacbio_results[kraken_pacbio_results$scientific_name == "Mammalia", 1] +kraken_pacbio_unclassified_perc =
kraken_pacbio_results[kraken_pacbio_results$scientific_name == "unclassified", 1] +kraken_pacbio_other_perc = 100 - (kraken_pacbio_actinopteri_perc + kraken_pacbio_amphibia_perc + kraken_pacbio_aves_perc + kraken_pacbio_bivalvia_perc + kraken_pacbio_insecta_perc + kraken_pacbio_magnoliopsida_perc + kraken_pacbio_mammallia_perc + kraken_pacbio_unclassified_perc) + +kraken_pacbio_overview = cbind(kraken_pacbio_actinopteri_perc, kraken_pacbio_amphibia_perc, kraken_pacbio_aves_perc, kraken_pacbio_bivalvia_perc, kraken_pacbio_insecta_perc, kraken_pacbio_magnoliopsida_perc, kraken_pacbio_mammallia_perc, kraken_pacbio_unclassified_perc, kraken_pacbio_other_perc) + +#Kraken hic results +if (hic_n_paired_files > 0){ + #kraken_hic_results = read.table("~/Downloads/Greenland_cockle_R/Greenland_cockle_004.kraken2.report.txt", sep = "\t", blank.lines.skip = FALSE, quote="", comment.char="") + kraken_hic_results = read.table(args[32], sep = "\t", blank.lines.skip = FALSE, quote="", comment.char="") + colnames(kraken_hic_results)=c("perc", "n_reads_clade", "n_reads_taxon", "rank_code", "NCBI_tax", "scientific_name") + #Remove spaces in scientific_name column + kraken_hic_results$scientific_name = gsub('\\s+', '', kraken_hic_results$scientific_name) + kraken_hic_actinopteri_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Actinopteri", 1] + kraken_hic_amphibia_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Amphibia", 1] + kraken_hic_aves_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Aves", 1] + kraken_hic_bivalvia_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Bivalvia", 1] + kraken_hic_insecta_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Insecta", 1] + kraken_hic_magnoliopsida_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Magnoliopsida", 1] + kraken_hic_mammallia_perc = kraken_hic_results[kraken_hic_results$scientific_name == "Mammalia", 1] + kraken_hic_unclassified_perc 
= kraken_hic_results[kraken_hic_results$scientific_name == "unclassified", 1] + kraken_hic_other_perc = 100 - (kraken_hic_actinopteri_perc + kraken_hic_amphibia_perc + kraken_hic_aves_perc + kraken_hic_bivalvia_perc + kraken_hic_insecta_perc + kraken_hic_magnoliopsida_perc + kraken_hic_mammallia_perc + kraken_hic_unclassified_perc) + kraken_hic_root_numer_reads = kraken_hic_results[kraken_hic_results$scientific_name == "root", 2] + kraken_hic_unclassified_numer_reads = kraken_hic_results[kraken_hic_results$scientific_name == "unclassified", 2] + kraken_hic_root_unclassified_numer_reads = kraken_hic_root_numer_reads + kraken_hic_unclassified_numer_reads +} else { + kraken_hic_actinopteri_perc = "NA" + kraken_hic_amphibia_perc = "NA" + kraken_hic_aves_perc = "NA" + kraken_hic_bivalvia_perc = "NA" + kraken_hic_insecta_perc = "NA" + kraken_hic_magnoliopsida_perc = "NA" + kraken_hic_mammallia_perc = "NA" + kraken_hic_unclassified_perc = "NA" + kraken_hic_other_perc = "NA" + kraken_hic_root_numer_reads = "NA" + kraken_hic_unclassified_numer_reads = "NA" + kraken_hic_root_unclassified_numer_reads = "NA" +} + +kraken_hic_overview = cbind(kraken_hic_actinopteri_perc, kraken_hic_amphibia_perc, kraken_hic_aves_perc, kraken_hic_bivalvia_perc, kraken_hic_insecta_perc, kraken_hic_magnoliopsida_perc, kraken_hic_mammallia_perc, kraken_hic_unclassified_perc, kraken_hic_other_perc, kraken_hic_root_numer_reads, kraken_hic_unclassified_numer_reads, kraken_hic_root_unclassified_numer_reads) + +#Quast +#Differs if hifiasm is used as there is both haplotypes (Quast_double) +#from quast_report.tsv +#quast_table_contig = read.table("~/Downloads/Greenland_cockle_R/Greenland_cockle_004_report.tsv", sep = "\t", header=T, comment.char="@") +quast_table_contig = read.table(args[33], sep = "\t", header=T, comment.char="@") + +hap1_contig_n50_quast_mb = quast_table_contig[quast_table_contig$Assembly=="N50",2]/1000000 +hap1_contig_l50_quast = 
quast_table_contig[quast_table_contig$Assembly=="L50",2] +hap1_contig_n90_quast_mb = quast_table_contig[quast_table_contig$Assembly=="N90",2]/1000000 +hap1_contig_l90_quast = quast_table_contig[quast_table_contig$Assembly=="L90",2] +hap1_contig_assembly_length_quast_gb = quast_table_contig[quast_table_contig$Assembly =="Total length",2]/1000000000 +hap1_contig_largest_contig_quast_mb = quast_table_contig[quast_table_contig$Assembly =="Largest contig",2]/1000000 +hap1_contig_number_quast = quast_table_contig[quast_table_contig$Assembly =="# contigs",2] +hap1_contig_GC_quast = quast_table_contig[quast_table_contig$Assembly == "GC (%)", 2] + +if (assembly_method == "hifiasm") { + hap2_contig_n50_quast_mb = quast_table_contig[quast_table_contig$Assembly=="N50",3]/1000000 + hap2_contig_l50_quast = quast_table_contig[quast_table_contig$Assembly=="L50",3] + hap2_contig_n90_quast_mb = quast_table_contig[quast_table_contig$Assembly=="N90",3]/1000000 + hap2_contig_l90_quast = quast_table_contig[quast_table_contig$Assembly=="L90",3] + hap2_contig_assembly_length_quast_gb = quast_table_contig[quast_table_contig$Assembly =="Total length",3]/1000000000 + hap2_contig_largest_contig_quast_mb = quast_table_contig[quast_table_contig$Assembly =="Largest contig",3]/1000000 + hap2_contig_number_quast = quast_table_contig[quast_table_contig$Assembly =="# contigs",3] + hap2_contig_GC_quast = quast_table_contig[quast_table_contig$Assembly == "GC (%)", 3] +} else { + hap2_contig_n50_quast_mb = "NA" + hap2_contig_l50_quast = "NA" + hap2_contig_n90_quast_mb = "NA" + hap2_contig_l90_quast = "NA" + hap2_contig_assembly_length_quast_gb = "NA" + hap2_contig_largest_contig_quast_mb = "NA" + hap2_contig_number_quast = "NA" + hap2_contig_GC_quast = "NA" +} + +quast_contig_overview = cbind (hap1_contig_n50_quast_mb, hap1_contig_l50_quast, hap1_contig_n90_quast_mb, hap1_contig_l90_quast, hap1_contig_assembly_length_quast_gb, hap1_contig_largest_contig_quast_mb, hap1_contig_number_quast, 
hap1_contig_GC_quast, hap2_contig_n50_quast_mb, hap2_contig_l50_quast, hap2_contig_n90_quast_mb, hap2_contig_l90_quast, hap2_contig_assembly_length_quast_gb, hap2_contig_largest_contig_quast_mb, hap2_contig_number_quast, hap2_contig_GC_quast) + +#If assembly is purged +if (file.exists(args[34])){ + #quast_table_contig_purged = read.table("~/Downloads/Greenland_cockle_R/Greenland_cockle_004_report.tsv", sep = "\t", header=T, comment.char="@") + quast_table_contig_purged = read.table(args[34], sep = "\t", header=T, comment.char="@") + + hap1_contig_purged_n50_quast_mb = quast_table_contig_purged[quast_table_contig_purged$Assembly=="N50",2]/1000000 + hap1_contig_purged_l50_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly=="L50",2] + hap1_contig_purged_n90_quast_mb = quast_table_contig_purged[quast_table_contig_purged$Assembly=="N90",2]/1000000 + hap1_contig_purged_l90_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly=="L90",2] + hap1_contig_purged_assembly_length_quast_gb = quast_table_contig_purged[quast_table_contig_purged$Assembly =="Total length",2]/1000000000 + hap1_contig_purged_largest_contig_quast_mb = quast_table_contig_purged[quast_table_contig_purged$Assembly =="Largest contig",2]/1000000 + hap1_contig_purged_number_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly =="# contigs",2] + hap1_contig_purged_GC_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly =="GC (%)", 2] +} else { + hap1_contig_purged_n50_quast_mb = "NA" + hap1_contig_purged_l50_quast = "NA" + hap1_contig_purged_n90_quast_mb = "NA" + hap1_contig_purged_l90_quast = "NA" + hap1_contig_purged_assembly_length_quast_gb = "NA" + hap1_contig_purged_largest_contig_quast_mb = "NA" + hap1_contig_purged_number_quast = "NA" + hap1_contig_purged_GC_quast = "NA" +} + +if ((file.exists(args[34])) & assembly_method == "hifiasm") { + #quast_table_contig_purged = 
read.table("~/Downloads/Greenland_cockle_R/Greenland_cockle_004_report.tsv", sep = "\t", header=T, comment.char="@") + quast_table_contig_purged = read.table(args[34], sep = "\t", header=T, comment.char="@") + + hap2_contig_purged_n50_quast_mb = quast_table_contig_purged[quast_table_contig_purged$Assembly=="N50",3]/1000000 + hap2_contig_purged_l50_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly=="L50",3] + hap2_contig_purged_n90_quast_mb = quast_table_contig_purged[quast_table_contig_purged$Assembly=="N90",3]/1000000 + hap2_contig_purged_l90_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly=="L90",3] + hap2_contig_purged_assembly_length_quast_gb = quast_table_contig_purged[quast_table_contig_purged$Assembly =="Total length",3]/1000000000 + hap2_contig_purged_largest_contig_quast_mb = quast_table_contig_purged[quast_table_contig_purged$Assembly =="Largest contig",3]/1000000 + hap2_contig_purged_number_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly =="# contigs",3] + hap2_contig_purged_GC_quast = quast_table_contig_purged[quast_table_contig_purged$Assembly =="GC (%)", 3] +} else { + hap2_contig_purged_n50_quast_mb = "NA" + hap2_contig_purged_l50_quast = "NA" + hap2_contig_purged_n90_quast_mb = "NA" + hap2_contig_purged_l90_quast = "NA" + hap2_contig_purged_assembly_length_quast_gb = "NA" + hap2_contig_purged_largest_contig_quast_mb = "NA" + hap2_contig_purged_number_quast = "NA" + hap2_contig_purged_GC_quast = "NA" +} + +quast_contig_purged_overview = cbind (hap1_contig_purged_n50_quast_mb, hap1_contig_purged_l50_quast, hap1_contig_purged_n90_quast_mb, hap1_contig_purged_l90_quast, hap1_contig_purged_assembly_length_quast_gb, hap1_contig_purged_largest_contig_quast_mb, hap1_contig_purged_number_quast, hap1_contig_purged_GC_quast, hap2_contig_purged_n50_quast_mb, hap2_contig_purged_l50_quast, hap2_contig_purged_n90_quast_mb, hap2_contig_purged_l90_quast, hap2_contig_purged_assembly_length_quast_gb, 
hap2_contig_purged_largest_contig_quast_mb, hap2_contig_purged_number_quast, hap2_contig_purged_GC_quast) + + +#If assembly is scaffolded +if (file.exists(args[35])){ + #quast_table_scaffold = read.table("~/Downloads/Greenland_cockle_R/Greenland_cockle_004_report.tsv", sep = "\t", header=T, comment.char="@") + quast_table_scaffold = read.table(args[35], sep = "\t", header=T, comment.char="@") + + hap1_scaffold_n50_quast_mb = quast_table_scaffold[quast_table_scaffold$Assembly=="N50",2]/1000000 + hap1_scaffold_l50_quast = quast_table_scaffold[quast_table_scaffold$Assembly=="L50",2] + hap1_scaffold_n90_quast_mb = quast_table_scaffold[quast_table_scaffold$Assembly=="N90",2]/1000000 + hap1_scaffold_l90_quast = quast_table_scaffold[quast_table_scaffold$Assembly=="L90",2] + hap1_scaffold_assembly_length_quast_gb = quast_table_scaffold[quast_table_scaffold$Assembly =="Total length",2]/1000000000 + hap1_scaffold_largest_contig_quast_mb = quast_table_scaffold[quast_table_scaffold$Assembly =="Largest contig",2]/1000000 + hap1_scaffold_number_quast = quast_table_scaffold[quast_table_scaffold$Assembly =="# contigs",2] + hap1_scaffold_GC_quast = quast_table_scaffold[quast_table_scaffold$Assembly =="GC (%)", 2] +} else { + hap1_scaffold_n50_quast_mb = "NA" + hap1_scaffold_l50_quast = "NA" + hap1_scaffold_n90_quast_mb = "NA" + hap1_scaffold_l90_quast = "NA" + hap1_scaffold_assembly_length_quast_gb = "NA" + hap1_scaffold_largest_contig_quast_mb = "NA" + hap1_scaffold_number_quast = "NA" + hap1_scaffold_GC_quast = "NA" +} + +if ((file.exists(args[35])) & assembly_method == "hifiasm") { + quast_table_scaffold = read.table(args[35], sep = "\t", header=T, comment.char="@") + hap2_scaffold_n50_quast_mb = quast_table_scaffold[quast_table_scaffold$Assembly=="N50",3]/1000000 + hap2_scaffold_l50_quast = quast_table_scaffold[quast_table_scaffold$Assembly=="L50",3] + hap2_scaffold_n90_quast_mb = quast_table_scaffold[quast_table_scaffold$Assembly=="N90",3]/1000000 + hap2_scaffold_l90_quast = 
quast_table_scaffold[quast_table_scaffold$Assembly=="L90",3] + hap2_scaffold_assembly_length_quast_gb = quast_table_scaffold[quast_table_scaffold$Assembly =="Total length",3]/1000000000 + hap2_scaffold_largest_contig_quast_mb = quast_table_scaffold[quast_table_scaffold$Assembly =="Largest contig",3]/1000000 + hap2_scaffold_number_quast = quast_table_scaffold[quast_table_scaffold$Assembly =="# contigs",3] + hap2_scaffold_GC_quast = quast_table_scaffold[quast_table_scaffold$Assembly =="GC (%)", 3] +} else { + hap2_scaffold_n50_quast_mb = "NA" + hap2_scaffold_l50_quast = "NA" + hap2_scaffold_n90_quast_mb = "NA" + hap2_scaffold_l90_quast = "NA" + hap2_scaffold_assembly_length_quast_gb = "NA" + hap2_scaffold_largest_contig_quast_mb = "NA" + hap2_scaffold_number_quast = "NA" + hap2_scaffold_GC_quast = "NA" +} + +quast_scaffold_overview = cbind (hap1_scaffold_n50_quast_mb, hap1_scaffold_l50_quast, hap1_scaffold_n90_quast_mb, hap1_scaffold_l90_quast, hap1_scaffold_assembly_length_quast_gb, hap1_scaffold_largest_contig_quast_mb, hap1_scaffold_number_quast, hap1_scaffold_GC_quast, hap2_scaffold_n50_quast_mb, hap2_scaffold_l50_quast, hap2_scaffold_n90_quast_mb, hap2_scaffold_l90_quast, hap2_scaffold_assembly_length_quast_gb, hap2_scaffold_largest_contig_quast_mb, hap2_scaffold_number_quast, hap2_scaffold_GC_quast) + + +#Busco scores +#BUSCO is run only on the latest available assembly (scaffold > purged > contig) as it takes a long time +#lineage1 +if (file.exists(args[36])) { +#busco_report_lin1= fromJSON(file="~/Downloads/Greenland_cockle_R/short_summary.specific.vertebrata_odb10.Greenland_cockle_004.asm.bp.hap1.p_ctg.purged_scaffolds_final.fa.json") + busco_report_lin1= fromJSON(file=args[36]) +# lin1 = busco_report_lin1[["lineage_dataset"]][["name"]] +# busco_complete_single_lin1 = busco_report_lin1[["results"]][["Single copy"]] +# busco_complete_duplicated_lin1 = busco_report_lin1[["results"]][["Multi copy"]] +# busco_complete_single_duplicated_lin1 = 
busco_report_lin1[["results"]][["Complete"]] +# busco_fragmented_lin1 = busco_report_lin1[["results"]][["Fragmented"]] +# busco_missing_lin1 = busco_report_lin1[["results"]][["Missing"]] + lin1_temp = busco_report_lin1[["dataset"]] + lin1=sub('.*/', '', lin1_temp) + busco_total_num_lin1 = as.numeric(busco_report_lin1[["dataset_total_buscos"]]) + busco_complete_single_lin1_num = busco_report_lin1[["S"]] + busco_complete_single_lin1 = busco_complete_single_lin1_num*100/busco_total_num_lin1 + busco_complete_duplicated_lin1_num = busco_report_lin1[["D"]] + busco_complete_duplicated_lin1 = busco_complete_duplicated_lin1_num*100/busco_total_num_lin1 + busco_complete_single_duplicated_lin1_num = busco_report_lin1[["C"]] + busco_complete_single_duplicated_lin1 = busco_complete_single_duplicated_lin1_num*100/busco_total_num_lin1 + busco_fragmented_lin1_num = busco_report_lin1[["F"]] + busco_fragmented_lin1 = busco_fragmented_lin1_num*100/busco_total_num_lin1 + busco_missing_lin1_num = busco_report_lin1[["M"]] + busco_missing_lin1 = busco_missing_lin1_num*100/busco_total_num_lin1 +} else { + busco_report_lin1 = "NA" + lin1 = "NA" + busco_complete_single_lin1 = "NA" + busco_complete_duplicated_lin1 = "NA" + busco_complete_single_duplicated_lin1 = "NA" + busco_fragmented_lin1 = "NA" + busco_missing_lin1 = "NA" +} +busco_lin1_overview = cbind(lin1, busco_complete_single_lin1, busco_complete_duplicated_lin1, busco_complete_single_duplicated_lin1, busco_fragmented_lin1, busco_missing_lin1) +#colnames(busco_lin1_overview) = c(paste0("busco_complete_single_",lin1), paste0("busco_complete_duplicated_",lin1), paste0("busco_complete_single_duplicated_single_",lin1), paste0("busco_fragmented_",lin1), paste0("busco_missing_",lin1)) + +#lineage2 +#if ( lineage2 != "null") { +if (file.exists(args[37])) { + #busco_report_lin2= fromJSON(file="~/Downloads/Greenland_cockle_R/short_summary.specific.vertebrata_odb10.Greenland_cockle_004.asm.bp.hap1.p_ctg.purged_scaffolds_final.fa.json") + 
busco_report_lin2= fromJSON(file=args[37]) + lin2_temp = busco_report_lin2[["dataset"]] + lin2=sub('.*/', '', lin2_temp) + busco_total_num_lin2 = as.numeric(busco_report_lin2[["dataset_total_buscos"]]) + busco_complete_single_lin2_num = busco_report_lin2[["S"]] + busco_complete_single_lin2 = busco_complete_single_lin2_num*100/busco_total_num_lin2 + busco_complete_duplicated_lin2_num = busco_report_lin2[["D"]] + busco_complete_duplicated_lin2 = busco_complete_duplicated_lin2_num*100/busco_total_num_lin2 + busco_complete_single_duplicated_lin2_num = busco_report_lin2[["C"]] + busco_complete_single_duplicated_lin2 = busco_complete_single_duplicated_lin2_num*100/busco_total_num_lin2 + busco_fragmented_lin2_num = busco_report_lin2[["F"]] + busco_fragmented_lin2 = busco_fragmented_lin2_num*100/busco_total_num_lin2 + busco_missing_lin2_num = busco_report_lin2[["M"]] + busco_missing_lin2 = busco_missing_lin2_num*100/busco_total_num_lin2 +# lin2 = busco_report_lin2[["lineage_dataset"]][["name"]] +# busco_complete_single_lin2 = busco_report_lin2[["results"]][["Single copy"]] +# busco_complete_duplicated_lin2 = busco_report_lin2[["results"]][["Multi copy"]] +# busco_complete_single_duplicated_lin2 = busco_report_lin2[["results"]][["Complete"]] +# busco_fragmented_lin2 = busco_report_lin2[["results"]][["Fragmented"]] +# busco_missing_lin2 = busco_report_lin2[["results"]][["Missing"]] +} else { + lin2 = "null" + busco_complete_single_lin2 = "NA" + busco_complete_duplicated_lin2 = "NA" + busco_complete_single_duplicated_lin2 = "NA" + busco_fragmented_lin2 = "NA" + busco_missing_lin2 = "NA" +} + +busco_lin2_overview = cbind(lin2, busco_complete_single_lin2, busco_complete_duplicated_lin2, busco_complete_single_duplicated_lin2, busco_fragmented_lin2, busco_missing_lin2) +#colnames(busco_lin2_overview) = c(paste0("busco_complete_single_",lin2), paste0("busco_complete_duplicated_",lin2), paste0("busco_complete_single_duplicated_single_",lin2), paste0("busco_fragmented_",lin2), 
paste0("busco_missing_",lin2)) + +#lineage3 +#if ( lineage3 != "null") { +if (file.exists(args[38])) { + #busco_report_lin3= fromJSON(file="~/Downloads/Greenland_cockle_R/short_summary.specific.vertebrata_odb10.Greenland_cockle_004.asm.bp.hap1.p_ctg.purged_scaffolds_final.fa.json") + busco_report_lin3= fromJSON(file=args[38]) + lin3_temp = busco_report_lin3[["dataset"]] + lin3=sub('.*/', '', lin3_temp) + busco_total_num_lin3 = as.numeric(busco_report_lin3[["dataset_total_buscos"]]) + busco_complete_single_lin3_num = busco_report_lin3[["S"]] + busco_complete_single_lin3 = busco_complete_single_lin3_num*100/busco_total_num_lin3 + busco_complete_duplicated_lin3_num = busco_report_lin3[["D"]] + busco_complete_duplicated_lin3 = busco_complete_duplicated_lin3_num*100/busco_total_num_lin3 + busco_complete_single_duplicated_lin3_num = busco_report_lin3[["C"]] + busco_complete_single_duplicated_lin3 = busco_complete_single_duplicated_lin3_num*100/busco_total_num_lin3 + busco_fragmented_lin3_num = busco_report_lin3[["F"]] + busco_fragmented_lin3 = busco_fragmented_lin3_num*100/busco_total_num_lin3 + busco_missing_lin3_num = busco_report_lin3[["M"]] + busco_missing_lin3 = busco_missing_lin3_num*100/busco_total_num_lin3 +# lin3 = busco_report_lin3[["lineage_dataset"]][["name"]] +# busco_complete_single_lin3 = busco_report_lin3[["results"]][["Single copy"]] +# busco_complete_duplicated_lin3 = busco_report_lin3[["results"]][["Multi copy"]] +# busco_complete_single_duplicated_lin3 = busco_report_lin3[["results"]][["Complete"]] +# busco_fragmented_lin3 = busco_report_lin3[["results"]][["Fragmented"]] +# busco_missing_lin3 = busco_report_lin3[["results"]][["Missing"]] +} else { + lin3 = "null" + busco_complete_single_lin3 = "NA" + busco_complete_duplicated_lin3 = "NA" + busco_complete_single_duplicated_lin3 = "NA" + busco_fragmented_lin3 = "NA" + busco_missing_lin3 = "NA" +} + +busco_lin3_overview = cbind(lin3, busco_complete_single_lin3, busco_complete_duplicated_lin3, 
busco_complete_single_duplicated_lin3, busco_fragmented_lin3, busco_missing_lin3) +#colnames(busco_lin3_overview) = c(paste0("busco_complete_single_",lin3), paste0("busco_complete_duplicated_",lin3), paste0("busco_complete_single_duplicated_single_",lin3), paste0("busco_fragmented_",lin3), paste0("busco_missing_",lin3)) + +#lineage4 +#if ( lineage4 != "null") { +if (file.exists(args[39])) { + #busco_report_lin4= fromJSON(file="~/Downloads/Greenland_cockle_R/short_summary.specific.vertebrata_odb10.Greenland_cockle_004.asm.bp.hap1.p_ctg.purged_scaffolds_final.fa.json") + busco_report_lin4= fromJSON(file=args[39]) + lin4_temp = busco_report_lin4[["dataset"]] + lin4=sub('.*/', '', lin4_temp) + busco_total_num_lin4 = as.numeric(busco_report_lin4[["dataset_total_buscos"]]) + busco_complete_single_lin4_num = busco_report_lin4[["S"]] + busco_complete_single_lin4 = busco_complete_single_lin4_num*100/busco_total_num_lin4 + busco_complete_duplicated_lin4_num = busco_report_lin4[["D"]] + busco_complete_duplicated_lin4 = busco_complete_duplicated_lin4_num*100/busco_total_num_lin4 + busco_complete_single_duplicated_lin4_num = busco_report_lin4[["C"]] + busco_complete_single_duplicated_lin4 = busco_complete_single_duplicated_lin4_num*100/busco_total_num_lin4 + busco_fragmented_lin4_num = busco_report_lin4[["F"]] + busco_fragmented_lin4 = busco_fragmented_lin4_num*100/busco_total_num_lin4 + busco_missing_lin4_num = busco_report_lin4[["M"]] + busco_missing_lin4 = busco_missing_lin4_num*100/busco_total_num_lin4 +# lin4 = busco_report_lin4[["lineage_dataset"]][["name"]] +# busco_complete_single_lin4 = busco_report_lin4[["results"]][["Single copy"]] +# busco_complete_duplicated_lin4 = busco_report_lin4[["results"]][["Multi copy"]] +# busco_complete_single_duplicated_lin4 = busco_report_lin4[["results"]][["Complete"]] +# busco_fragmented_lin4 = busco_report_lin4[["results"]][["Fragmented"]] +# busco_missing_lin4 = busco_report_lin4[["results"]][["Missing"]] +} else { + lin4 = "null" + 
busco_complete_single_lin4 = "NA" + busco_complete_duplicated_lin4 = "NA" + busco_complete_single_duplicated_lin4 = "NA" + busco_fragmented_lin4 = "NA" + busco_missing_lin4 = "NA" +} + +busco_lin4_overview = cbind(lin4, busco_complete_single_lin4, busco_complete_duplicated_lin4, busco_complete_single_duplicated_lin4, busco_fragmented_lin4, busco_missing_lin4) +#colnames(busco_lin4_overview) = c(paste0("busco_complete_single_",lin4), paste0("busco_complete_duplicated_",lin4), paste0("busco_complete_single_duplicated_single_",lin4), paste0("busco_fragmented_",lin4), paste0("busco_missing_",lin4)) + +#Percentage of the genome included in the expected number of chromosomes +#Load $id_scaffolds_final.chrom.sizes +if (file.exists(args[40])) { + chrom_size_table = read.table(args[40], sep = " ") + colnames(chrom_size_table) = c("chromosome", "size") + ass_length_adding_scaffold_bp = sum(as.numeric(chrom_size_table$size)) + ass_length_adding_scaffold_gb = ass_length_adding_scaffold_bp / 1000000000 + #Extract the table rows for the expected number of chromosomes + #chrom_num = 19 + subset_chrom_size_table = chrom_size_table [1:chrom_num,] + #cumulative size + cum_length_n_chrom_bp = sum(subset_chrom_size_table$size) + #hap_gen_size_Gb = 1.38 + perc_ass_assigned_to_expected_n_of_chr = cum_length_n_chrom_bp*100/ass_length_adding_scaffold_bp +} else { + ass_length_adding_scaffold_gb = "NA" + perc_ass_assigned_to_expected_n_of_chr = "NA" + +} + + +##Write file + +overview_table = cbind(xxid, taxon_name, taxon_taxid, lineage, lineage2, lineage3, lineage4, hap_gen_size_Gb, ploidy, chrom_num, pacbio_n_files, pacbio_input_type, pacbio_rq, hic_n_paired_files, pe150_n_paired_files, ont_n_files, pipeline_version, assembly_method, assembly_secondary_mode, polishing_method, purging_method, scaffolding_method, manual_curation, mitohifi, lonqc_overview, kraken_pacbio_overview, kraken_hic_overview, quast_contig_overview, quast_contig_purged_overview, quast_scaffold_overview, busco_lin1_overview, 
busco_lin2_overview, busco_lin3_overview, busco_lin4_overview, ass_length_adding_scaffold_gb, perc_ass_assigned_to_expected_n_of_chr) + +write.table(overview_table, file="overview_results.tsv", quote=FALSE, row.names = FALSE, sep="\t") + + + diff --git a/modules/pacbio/bam2fastx/2bam2fastx/main.nf b/modules/pacbio/bam2fastx/2bam2fastx/main.nf deleted file mode 100644 index 09347e6..0000000 --- a/modules/pacbio/bam2fastx/2bam2fastx/main.nf +++ /dev/null @@ -1,29 +0,0 @@ -process TWOBAM2FASTX { - tag "$meta.id" - label 'process_medium' - - conda (params.enable_conda ? "bioconda::bam2fastx=1.3.1" : null) - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/bam2fastx%3A1.3.1--hf05d43a_1': - 'quay.io/biocontainers/bam2fastx' }" - - input: - tuple val(meta), path(bam), path(index) - tuple val(meta2), path(bam2), path(index2) - - output: - tuple val(meta), path('*.fastq.gz'), emit: reads - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - """ - bam2fastq \\ - $args \\ - -o ${prefix} \\ - $bam \\ - $bam2 \\ - > ${prefix}.bam2fastx.log - cat <<-END_VERSIONS > versions.yml - """ -} diff --git a/modules/pacbio/bam2fastx/3bam2fastx/main.nf b/modules/pacbio/bam2fastx/3bam2fastx/main.nf deleted file mode 100644 index beb759c..0000000 --- a/modules/pacbio/bam2fastx/3bam2fastx/main.nf +++ /dev/null @@ -1,31 +0,0 @@ -process THREEBAM2FASTX { - tag "$meta.id" - label 'process_medium' - - conda (params.enable_conda ? "bioconda::bam2fastx=1.3.1" : null) - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/bam2fastx%3A1.3.1--hf05d43a_1': - 'quay.io/biocontainers/bam2fastx' }" - - input: - tuple val(meta), path(bam), path(index) - tuple val(meta2), path(bam2), path(index2) - tuple val(meta3), path(bam3), path(index3) - - output: - tuple val(meta), path('*.fastq.gz'), emit: reads - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - """ - bam2fastq \\ - $args \\ - -o ${prefix} \\ - $bam \\ - $bam2 \\ - $bam3 \\ - > ${prefix}.bam2fastx.log - cat <<-END_VERSIONS > versions.yml - """ -} diff --git a/modules/pacbio/bam2fastx/4bam2fastx/main.nf b/modules/pacbio/bam2fastx/4bam2fastx/main.nf deleted file mode 100644 index 314a317..0000000 --- a/modules/pacbio/bam2fastx/4bam2fastx/main.nf +++ /dev/null @@ -1,33 +0,0 @@ -process FOURBAM2FASTX { - tag "$meta.id" - label 'process_medium' - - conda (params.enable_conda ? "bioconda::bam2fastx=1.3.1" : null) - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/bam2fastx%3A1.3.1--hf05d43a_1': - 'quay.io/biocontainers/bam2fastx' }" - - input: - tuple val(meta), path(bam), path(index) - tuple val(meta2), path(bam2), path(index2) - tuple val(meta3), path(bam3), path(index3) - tuple val(meta4), path(bam4), path(index4) - - output: - tuple val(meta), path('*.fastq.gz'), emit: reads - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - """ - bam2fastq \\ - $args \\ - -o ${prefix} \\ - $bam \\ - $bam2 \\ - $bam3 \\ - $bam4 \\ - > ${prefix}.bam2fastx.log - cat <<-END_VERSIONS > versions.yml - """ -} diff --git a/modules/pacbio/bam2fastx/main.nf b/modules/pacbio/bam2fastx/main.nf index c336f2b..263be01 100644 --- a/modules/pacbio/bam2fastx/main.nf +++ b/modules/pacbio/bam2fastx/main.nf @@ -2,16 +2,17 @@ process BAM2FASTX { tag "$meta.id" label 'process_medium' - conda (params.enable_conda ? 
"bioconda::bam2fastx=1.3.1" : null) + conda "bioconda::pbtk==3.1.0" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/bam2fastx%3A1.3.1--hf05d43a_1': - 'quay.io/biocontainers/bam2fastx' }" + 'https://depot.galaxyproject.org/singularity/pbtk:3.1.0--h9ee0642_0': + 'quay.io/biocontainers/pbtk' }" input: tuple val(meta), path(bam, stageAs: '??.bam'), path(index, stageAs: '??.bam.pbi') output: tuple val(meta), path('*.fastq.gz'), emit: reads + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' @@ -19,9 +20,13 @@ process BAM2FASTX { """ bam2fastq \\ $args \\ - -o ${prefix} \\ - $bam \\ + -o ${prefix} \\ + $bam \\ > ${prefix}.bam2fastx.log + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bam2fastq: \$(bam2fastq --version | sed 's/bam2fastq //g') + END_VERSIONS """ } diff --git a/modules/pacbio/ccs/main.nf b/modules/pacbio/ccs/main.nf index 08c5811..87be9f0 100644 --- a/modules/pacbio/ccs/main.nf +++ b/modules/pacbio/ccs/main.nf @@ -2,7 +2,7 @@ process CCS { tag "$meta.id" label 'process_medium' - conda (params.enable_conda ? "bioconda::pbccs" : null) + conda "bioconda::pbccs" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/pbccs%3A6.4.0--h9ee0642_0': 'quay.io/biocontainers/pbccs' }" @@ -12,6 +12,7 @@ process CCS { output: tuple val(meta), path('*_postccs.bam'), emit: bam + path "versions.yml" , emit: versions script: """ @@ -20,5 +21,10 @@ process CCS { --all \\ $bam \\ ${meta.id}_postccs.bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ccs: \$(ccs --version | sed 's/ccs v//g') + END_VERSIONS """ } diff --git a/modules/pacbio/jasmine/main.nf b/modules/pacbio/jasmine/main.nf new file mode 100644 index 0000000..e4c602c --- /dev/null +++ b/modules/pacbio/jasmine/main.nf @@ -0,0 +1,31 @@ +process JASMINE { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::pbjasmine" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pbjasmine:2.0.0--h9ee0642_0': + 'quay.io/biocontainers/pbjasmine' }" + + input : + tuple val(meta), path (unaligned_bam) + + output : + tuple val(meta), path ('*_5mc.bam'), emit : cpg_bam + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + jasmine $unaligned_bam ${prefix}_5mc.bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + jasmine: \$( jasmine --version | sed 's/jasmine //g'| head -n1 ) + END_VERSIONS + """ +} diff --git a/modules/pacbio/pbbam/pbindex/main.nf b/modules/pacbio/pbbam/pbindex/main.nf deleted file mode 100644 index 4b3ffaa..0000000 --- a/modules/pacbio/pbbam/pbindex/main.nf +++ /dev/null @@ -1,21 +0,0 @@ -process PBINDEX { - tag "$meta.id" - label 'process_medium' - - conda (params.enable_conda ? "bioconda::pbbam" : null) - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/pbbam%3A2.1.0--h3f0f298_2': - 'quay.io/biocontainers/pbbam' }" - - input: - tuple val(meta), path(bam) - - output: - tuple val(meta), path('*.bam.pbi'), emit: index - - script: - """ - pbindex \\ - $bam - """ -} diff --git a/modules/pacbio/pbindex/main.nf b/modules/pacbio/pbindex/main.nf new file mode 100644 index 0000000..7c43b7a --- /dev/null +++ b/modules/pacbio/pbindex/main.nf @@ -0,0 +1,27 @@ +process PBINDEX { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::pbtk==3.1.0" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pbtk:3.1.0--h9ee0642_0': + 'quay.io/biocontainers/pbtk' }" + + input: + tuple val(meta), path(bam) + + output: + tuple val(meta), path('*.bam.pbi'), emit: index + path "versions.yml" , emit: versions + + script: + """ + pbindex \\ + $bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pbindex: \$(pbindex --version | sed 's/pbindex //g'| head -n1 ) + END_VERSIONS + """ +} diff --git a/modules/pacbio/pbmerge/main.nf b/modules/pacbio/pbmerge/main.nf new file mode 100644 index 0000000..a0f137e --- /dev/null +++ b/modules/pacbio/pbmerge/main.nf @@ -0,0 +1,35 @@ +process PBBAM_PBMERGE { + tag "$meta.id" + label 'process_low' + + conda "bioconda::pbtk==3.1.0" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/pbtk:3.1.0--h9ee0642_0': + 'quay.io/biocontainers/pbtk' }" + + input: + tuple val(meta), path(bam, stageAs: "?/*") + + output: + tuple val(meta), path("*.bam"), emit: bam + tuple val(meta), path("*.pbi"), emit: pbi + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + pbmerge \\ + -o ${prefix}.bam \\ + $args \\ + */*.bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pbbam: \$( pbmerge --version | head -n1 | sed 's/pbmerge //' | sed -E 's/ .+//' ) + END_VERSIONS + """ +} diff --git a/modules/pacbio/pbmm2/main.nf b/modules/pacbio/pbmm2/main.nf new file mode 100644 index 0000000..ad72faa --- /dev/null +++ b/modules/pacbio/pbmm2/main.nf @@ -0,0 +1,32 @@ +process PBMM2 { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::pbmm2==1.12.0" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/pbmm2:1.12.0--h9ee0642_0': + 'quay.io/biocontainers/pbmm2:1.12.0--h9ee0642_0' }" + + input : + tuple val(meta), path (unaligned_bam) + tuple val(meta2), path (fasta) + + output : + tuple val(meta), path ('*_aligned.bam'), emit : aligned_bam + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + pbmm2 align $unaligned_bam $fasta ${prefix}_aligned.bam --sort + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pbmm2: \$(pbmm2 --version | sed 's/pbmm2 //g'| head -n1 ) + END_VERSIONS + """ +} diff --git a/modules/pacbio/preprocess_merged/main.nf b/modules/pacbio/preprocess_merged/main.nf new file mode 100644 index 0000000..cc45983 --- /dev/null +++ b/modules/pacbio/preprocess_merged/main.nf @@ -0,0 +1,42 @@ +process PREPROCESS_MERGED { + tag "$meta.id" + label 'process_high' + + conda "bioconda::pbtk==3.1.0 bioconda::bamtools=2.5.2" + + input: + tuple val(meta), path(bam, stageAs: "?/*") + + output: + tuple val(meta), path('*_filtered.bam'), emit: filtered_bam + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args_pbmerge = task.ext.args_pbmerge ?: '' + def args_bamtools_filter = task.ext.args_bamtools_filter ?: '' + def args_bam2fastq = task.ext.args_bam2fastq ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + pbmerge \\ + -o ${prefix}_merged.bam \\ + $args_pbmerge \\ + */*.bam + + bamtools \\ + filter \\ + -in ${prefix}_merged.bam \\ + $args_bamtools_filter \\ + -out ${prefix}_filtered.bam + + rm *_merged.bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pbbam: \$( pbmerge --version | head -n1 | sed 's/pbmerge //' | sed -E 's/ .+//' ) + bamtools: \$( bamtools --version | grep -e 'bamtools' | sed 's/^.*bamtools //' ) + END_VERSIONS + """ +} diff --git a/modules/pilon/main.nf b/modules/pilon/main.nf 
index 53a1b98..79beae1 100644 --- a/modules/pilon/main.nf +++ b/modules/pilon/main.nf @@ -2,7 +2,7 @@ process PILON { tag "$meta.id" label 'process_medium' - conda "bioconda::pilon=1.24" +// conda "bioconda::pilon=1.24" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/pilon:1.24--hdfd78af_0' : 'quay.io/biocontainers/pilon' }" @@ -18,6 +18,7 @@ process PILON { tuple val(meta), path("*.vcf") , emit: vcf, optional : true tuple val(meta), path("*.bed") , emit: tracks_bed, optional : true tuple val(meta), path("*.wig") , emit: tracks_wig, optional : true + path "versions.yml" , emit: versions script: def args = task.ext.args ?: '' @@ -31,5 +32,10 @@ process PILON { --output ${meta.id} \\ --threads $task.cpus \\ $pilon_mode $bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pilon: \$(pilon --version | sed 's/pilon v//g') + END_VERSIONS """ } diff --git a/modules/pretext/pretextgraph/main.nf b/modules/pretext/pretextgraph/main.nf new file mode 100644 index 0000000..2173b42 --- /dev/null +++ b/modules/pretext/pretextgraph/main.nf @@ -0,0 +1,31 @@ +process PRETEXTGRAPH { + tag "$meta.id" + label 'process_single' + + conda "bioconda::pretextgraph=0.0.6" + + input: + tuple val(meta), path(pretext_file) + tuple val(meta2), path(bedgraph) + val (graph_name) + + output: + tuple val(meta), path("*_2.pretext"), emit: pretext + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + gzip -c $bedgraph > bedgraph.file.gz + zcat bedgraph.file.gz | PretextGraph -i ${pretext_file} -n "$graph_name" -o ${prefix}_${graph_name}_2.pretext + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + PretextGraph: 0.0.6 + END_VERSIONS + """ +} diff --git a/modules/pretext/pretextmap/main.nf b/modules/pretext/pretextmap/main.nf index 
eb4b7bc..dfea2fc 100644 --- a/modules/pretext/pretextmap/main.nf +++ b/modules/pretext/pretextmap/main.nf @@ -1,4 +1,3 @@ - process PRETEXTMAP { tag "$meta.id" label 'process_single' @@ -14,9 +13,16 @@ process PRETEXTMAP { output: tuple val(meta), path("*.pretext"), emit: pretext + path "versions.yml" , emit: versions script: """ (awk 'BEGIN{print "## pairs format v1.0"} {print "#chromsize:\t"\$1"\t"\$2} END {print "#columns:\treadID\tchr1\tpos1\tchr2\tpos2\tstrand1\tstrand2"}' $chrom_sizes; awk '{print ".\t"\$2"\t"\$3"\t"\$6"\t"\$7"\t.\t."}' $alignments_sorted_txt) | PretextMap -o ${meta.id}.pretext + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pretextmap: \$(PretextMap | grep "Version" | sed 's/PretextMap Version //g') + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' ) + END_VERSIONS """ } diff --git a/modules/pretext/pretextsnapshot/main.nf b/modules/pretext/pretextsnapshot/main.nf index a881fe0..f4abb86 100644 --- a/modules/pretext/pretextsnapshot/main.nf +++ b/modules/pretext/pretextsnapshot/main.nf @@ -2,7 +2,7 @@ process PRETEXTSNAPSHOT { tag "$meta.id" label 'process_single' - conda "bioconda::pretextsnapshot=0.0.4" +// conda "bioconda::pretextsnapshot=0.0.4" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/pretextsnapshot:0.0.4--h7d875b9_0': 'quay.io/biocontainers/pretextsnapshot:0.0.4--h7d875b9_0' }" diff --git a/modules/purgedups/calcuts/main.nf b/modules/purgedups/calcuts/main.nf index e6fe8aa..d1f4589 100644 --- a/modules/purgedups/calcuts/main.nf +++ b/modules/purgedups/calcuts/main.nf @@ -2,7 +2,7 @@ process PURGEDUPS_CALCUTS { tag "$meta.id" label 'process_single' - conda (params.enable_conda ? "bioconda::purge_dups=1.2.6" : null) + conda "bioconda::purge_dups=1.2.6" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/purge_dups:1.2.6--h7132678_0': 'quay.io/biocontainers/purge_dups:1.2.6--h7132678_0' }" diff --git a/modules/purgedups/getseqs/main.nf b/modules/purgedups/getseqs/main.nf index f3063f2..5eefaa7 100644 --- a/modules/purgedups/getseqs/main.nf +++ b/modules/purgedups/getseqs/main.nf @@ -2,17 +2,18 @@ process PURGEDUPS_GETSEQS { tag "$meta.id" label 'process_single' - conda (params.enable_conda ? "bioconda::purge_dups=1.2.6" : null) + conda "bioconda::purge_dups=1.2.6" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/purge_dups:1.2.6--h7132678_0': 'quay.io/biocontainers/purge_dups:1.2.6--h7132678_0' }" input: - tuple val(meta), path(assembly), path(bed) + tuple val(meta), path(assembly) + tuple val(meta2), path(bed) output: - tuple val(meta), path("*.hap.fa") , emit: haplotigs - tuple val(meta), path("*.purged.fa"), emit: purged + tuple val(meta2), path("*.hap.fa") , emit: haplotigs + tuple val(meta2), path("*.purged.fa"), emit: purged path "versions.yml" , emit: versions when: diff --git a/modules/purgedups/pbcstat/main.nf b/modules/purgedups/pbcstat/main.nf index 62caf37..8830998 100644 --- a/modules/purgedups/pbcstat/main.nf +++ b/modules/purgedups/pbcstat/main.nf @@ -2,7 +2,7 @@ process PURGEDUPS_PBCSTAT { tag "$meta.id" label 'process_single' - conda (params.enable_conda ? "bioconda::purge_dups=1.2.6" : null) + conda "bioconda::purge_dups=1.2.6" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/purge_dups:1.2.6--h7132678_0': 'quay.io/biocontainers/purge_dups:1.2.6--h7132678_0' }" diff --git a/modules/purgedups/purgedups/main.nf b/modules/purgedups/purgedups/main.nf index 5314850..b068b8f 100644 --- a/modules/purgedups/purgedups/main.nf +++ b/modules/purgedups/purgedups/main.nf @@ -2,13 +2,14 @@ process PURGEDUPS_PURGEDUPS { tag "$meta.id" label 'process_single' - conda (params.enable_conda ? "bioconda::purge_dups=1.2.6" : null) + conda "bioconda::purge_dups=1.2.6" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/purge_dups:1.2.6--h7132678_0': 'quay.io/biocontainers/purge_dups:1.2.6--h7132678_0' }" input: - tuple val(meta), path(basecov), path(cutoff), path(paf) + tuple val(meta), path(basecov), path(cutoff) + tuple val(meta2), path(paf) // tuple val(meta), path(basecov) // path(cutoff) // path(paf) diff --git a/modules/purgedups/splitfa/main.nf b/modules/purgedups/splitfa/main.nf index b7d7311..4401ee1 100644 --- a/modules/purgedups/splitfa/main.nf +++ b/modules/purgedups/splitfa/main.nf @@ -2,7 +2,7 @@ process PURGEDUPS_SPLITFA { tag "$meta.id" label 'process_single' - conda (params.enable_conda ? "bioconda::purge_dups=1.2.6" : null) + conda "bioconda::purge_dups=1.2.6" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/purge_dups:1.2.6--h7132678_0': 'quay.io/biocontainers/purge_dups:1.2.6--h7132678_0' }" diff --git a/modules/quast/main.nf b/modules/quast/main.nf index b99cba5..e9f693b 100644 --- a/modules/quast/main.nf +++ b/modules/quast/main.nf @@ -2,7 +2,7 @@ process QUAST { tag "$meta.id" label 'process_medium' - conda (params.enable_conda ? 'bioconda::quast=5.2.0' : null) + conda 'bioconda::quast=5.2.0' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/quast:5.2.0--py39pl5321h2add14b_1' : 'quay.io/biocontainers/quast:5.2.0--py39pl5321h2add14b_1' }" @@ -13,10 +13,11 @@ process QUAST { path gff val use_fasta val use_gff + val genome_size output: - path "${prefix}" , emit: results - path '*.tsv' , emit: tsv + path "*_quast_report.tsv" , emit: renamed_tsv + path 'report.tsv' , emit: tsv path "versions.yml" , emit: versions when: @@ -24,23 +25,22 @@ process QUAST { script: def args = task.ext.args ?: '' - def args2 = task.ext.args2 ?: '' + def est_ref_size = genome_size ? "--est-ref-size $genome_size" : "" prefix = task.ext.prefix ?: 'quast' def features = use_gff ? "--features $gff" : '' def reference = use_fasta ? "-r $fasta" : '' """ - $args2 - quast.py \\ --output-dir $prefix \\ $reference \\ $features \\ --threads $task.cpus \\ $args \\ + $est_ref_size \\ ${consensus.join(' ')} mv ${prefix}/report.tsv report.tsv - #ln -s ${prefix}/report.tsv + cp report.tsv ${consensus.baseName}_quast_report.tsv cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/quast/quast_double/main.nf b/modules/quast/quast_double/main.nf index 360287a..661ebcd 100644 --- a/modules/quast/quast_double/main.nf +++ b/modules/quast/quast_double/main.nf @@ -2,7 +2,7 @@ process QUAST_DOUBLE { tag "$meta.id" label 'process_medium' - conda (params.enable_conda ? 'bioconda::quast=5.2.0' : null) + conda 'bioconda::quast=5.2.0' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/quast:5.2.0--py39pl5321h2add14b_1' : 'quay.io/biocontainers/quast:5.2.0--py39pl5321h2add14b_1' }" @@ -14,10 +14,11 @@ process QUAST_DOUBLE { path gff val use_fasta val use_gff + val genome_size output: - path "${prefix}" , emit: results - path '*.tsv' , emit: tsv + path "*_quast_report.tsv" , emit: renamed_tsv + path 'report.tsv' , emit: tsv path "versions.yml" , emit: versions when: @@ -25,24 +26,23 @@ process QUAST_DOUBLE { script: def args = task.ext.args ?: '' - def args2 = task.ext.args2 ?: '' + def est_ref_size = genome_size ? "--est-ref-size $genome_size" : "" prefix = task.ext.prefix ?: 'quast' def features = use_gff ? "--features $gff" : '' def reference = use_fasta ? "-r $fasta" : '' """ - $args2 - quast.py \\ --output-dir $prefix \\ $reference \\ $features \\ --threads $task.cpus \\ $args \\ + $est_ref_size \\ ${consensus.join(' ')} \\ - $alternate + $alternate mv ${prefix}/report.tsv report.tsv - + cp report.tsv ${consensus.baseName}_quast_report.tsv #ln -s ${prefix}/report.tsv cat <<-END_VERSIONS > versions.yml diff --git a/modules/racon/main.nf b/modules/racon/main.nf index df76ef6..5c94322 100644 --- a/modules/racon/main.nf +++ b/modules/racon/main.nf @@ -8,8 +8,9 @@ process RACON { 'quay.io/biocontainers/racon:1.4.20--h9a82719_1' }" input: - tuple val(meta), path(reads) - tuple val(meta2), path(assembly), path(paf) + tuple val(meta2), path(reads) + tuple val(meta), path(assembly) + tuple val(meta3), path(paf) output: tuple val(meta), path('*_assembly_consensus.fasta.gz') , emit: improved_assembly diff --git a/modules/salsa2/main.nf b/modules/salsa2/main.nf index a96a79e..0770f3b 100644 --- a/modules/salsa2/main.nf +++ b/modules/salsa2/main.nf @@ -3,7 +3,7 @@ process SALSA2 { label 'process_medium' // WARN: Version information not provided by tool on CLI. Please update version string below when bumping container versions. -// conda (params.enable_conda ? 
"bioconda::salsa2=2.3" : null) +// conda "bioconda::salsa2=2.3" // container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? // 'https://depot.galaxyproject.org/singularity/salsa2:2.3--py27hee3b9ab_0': // 'quay.io/biocontainers/salsa2:2.3--py27hee3b9ab_0' }" diff --git a/modules/samtools/faidx/main.nf b/modules/samtools/faidx/main.nf index 16985f3..d34c75b 100644 --- a/modules/samtools/faidx/main.nf +++ b/modules/samtools/faidx/main.nf @@ -21,10 +21,17 @@ process SAMTOOLS_FAIDX { script: def args = task.ext.args ?: '' """ + FILE=$fasta + if [[ \$FILE == *.gz ]] + then + gzip -cdf $fasta > unzipped.fasta + FILE=unzipped.fasta + fi + samtools \\ faidx \\ $args \\ - $fasta + \$FILE cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/samtools/merge/main.nf b/modules/samtools/merge/main.nf index 1b88b43..6167477 100644 --- a/modules/samtools/merge/main.nf +++ b/modules/samtools/merge/main.nf @@ -2,7 +2,7 @@ process SAMTOOLS_MERGE { tag "$meta.id" label 'process_low' - conda "bioconda::samtools=1.16.1" +// conda "bioconda::samtools=1.16.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' : 'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }" diff --git a/modules/samtools/sort/main.nf b/modules/samtools/sort/main.nf new file mode 100644 index 0000000..af6ea66 --- /dev/null +++ b/modules/samtools/sort/main.nf @@ -0,0 +1,32 @@ +process SAMTOOLS_SORT { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::samtools=1.16.1" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' : + 'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }" + + input: + tuple val(meta), path(bam) + + output: + tuple val(meta), path("*_sorted.bam"), emit: bam + tuple val(meta), path("*.csi"), emit: csi, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + if ("$bam" == "${prefix}_sorted.bam") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + """ + samtools sort -n $args -@ $task.cpus -o ${prefix}_sorted.bam -T $prefix $bam + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/sed/main.nf b/modules/sed/main.nf new file mode 100644 index 0000000..3241844 --- /dev/null +++ b/modules/sed/main.nf @@ -0,0 +1,16 @@ +process SED_NONE { + tag "$meta.id" + label 'process_high' + + input: + tuple val(meta), path(file1) + + output: + tuple val(meta), path("*_sed.fa"), emit: assembly + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + sed 's/None-None//' $file1 > ${prefix}_sed.fa + """ +} diff --git a/modules/sort/main.nf b/modules/sort/main.nf index be25f68..e995283 100644 --- a/modules/sort/main.nf +++ b/modules/sort/main.nf @@ -7,9 +7,15 @@ process SORT { output: tuple val(meta), path("*_sorted.bed"), emit: sorted_bed + path "versions.yml" , emit: versions script: """ sort --parallel=8 --buffer-size=80% --temporary-directory=$projectDir --output=${bed.simpleName}_sorted.bed $bed + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sort: \$(sort --version | sed 's/sort (GNU coreutils) //g' | sed -n 1p) + END_VERSIONS """ } diff --git a/modules/tidk/main.nf b/modules/tidk/main.nf new file mode 100644 index 0000000..bc29e02 --- /dev/null +++ 
b/modules/tidk/main.nf @@ -0,0 +1,50 @@ +process TIDK { + tag "$meta.id" + label 'process_medium' + + conda "bioconda::tidk=0.2.31" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/tidk:0.2.31--hdbdd923_2' : + 'quay.io/biocontainers/tidk:0.2.31' }" + + input: + tuple val(meta), path(fasta) + + output: + tuple val(meta), path("*.tsv") , emit: tsv_telomere + tuple val(meta), path("*.bedgraph"), emit: bedgraph_telomere + tuple val(meta), path("*.svg") , emit: plot_telomere + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + tidk --version + tidk search \\ + $args \\ + -o ${prefix} \\ + -e tsv \\ + --string ${params.string_telomere} \\ + --dir . \\ + ${fasta} + + tidk plot -t ${prefix}_telomeric_repeat_windows.tsv + + tidk search \\ + $args \\ + -o ${prefix} \\ + -e bedgraph \\ + --string ${params.string_telomere} \\ + --dir . 
\\ + ${fasta} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tidk: \$(tidk --version | sed -e "s/tidk//g") + END_VERSIONS + """ +} diff --git a/modules/verkko/main.nf b/modules/verkko/main.nf index afd8151..034adb8 100644 --- a/modules/verkko/main.nf +++ b/modules/verkko/main.nf @@ -14,7 +14,7 @@ process VERKKO { output: tuple val(meta), path("*_verkko_assembly.fasta") , emit: assembly tuple val(meta), path("*_homopolymer-compressed.gfa") , emit: gfa -// path "versions.yml" , emit: versions + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when @@ -32,6 +32,11 @@ process VERKKO { mv ${meta.id}/assembly.fasta ${meta.id}_verkko_assembly.fasta mv ${meta.id}/assembly.homopolymer-compressed.gfa ${meta.id}_homopolymer-compressed.gfa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + verkko: \$(verkko --version | sed 's/verkko v//g') + END_VERSIONS """ } else if (params.assembly_secondary_mode == 'pacbio'){ """ @@ -42,6 +47,11 @@ process VERKKO { mv ${meta.id}/assembly.fasta ${meta.id}_verkko_assembly.fasta mv ${meta.id}/assembly.homopolymer-compressed.gfa ${meta.id}_homopolymer-compressed.gfa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + verkko: \$(verkko --version | sed 's/verkko v//g') + END_VERSIONS """ } else if (params.assembly_secondary_mode == 'ont'){ """ @@ -52,6 +62,11 @@ process VERKKO { mv ${meta.id}/assembly.fasta ${meta.id}_verkko_assembly.fasta mv ${meta.id}/assembly.homopolymer-compressed.gfa ${meta.id}_homopolymer-compressed.gfa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + verkko: \$(verkko --version | sed 's/verkko v//g') + END_VERSIONS """ } else { error "Verkko needs a correct mode : 'pacbio', 'pacbio+ont' or 'ont'" diff --git a/modules/yahs/main.nf b/modules/yahs/main.nf index 56cfb5f..c35c4ba 100644 --- a/modules/yahs/main.nf +++ b/modules/yahs/main.nf @@ -13,17 +13,31 @@ process YAHS { tuple val(meta), path("*bin"), emit: bin tuple val(meta), 
path("*_scaffolds_final.agp"), emit: agp tuple val(meta), path("*_scaffolds_final.fa"), emit: fasta + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when script: + def prefix = task.ext.prefix ?: "${meta.id}" def args = task.ext.args ?: '' """ + FILE=$assembly + if [[ \$FILE == *.gz ]] + then + gzip -cdf $assembly > unzipped.fasta + FILE=unzipped.fasta + fi + yahs \\ - -o ${assembly.baseName} \\ - $assembly \\ + -o ${assembly.baseName} \\ + \$FILE \\ $bam \\ $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + yahs: \$(yahs --version | sed 's/yahs v//g') + END_VERSIONS """ } diff --git a/modules/yak/main.nf b/modules/yak/main.nf new file mode 100644 index 0000000..9684f6c --- /dev/null +++ b/modules/yak/main.nf @@ -0,0 +1,32 @@ +process YAK { + tag "$meta.id" + label 'process_high' + + conda "bioconda::yak=0.1" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/yak%3A0.1--he4a0461_4' : + 'quay.io/biocontainers/yak:0.1--he4a0461_4' }" + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path ("*.yak"), emit: yak + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + # for paired end: to provide two identical streams + yak count -b37 -t32 -o ${prefix}.yak <(zcat $reads) + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + yak: \$(echo \$(yak --version 2>&1) | sed 's/^.*yak //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/nextflow.config b/nextflow.config index b4ae482..7b6df96 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,29 +1,37 @@ params { -//Genral parameters - id = "github" - outdir = "." 
- pipeline_version = "V1" - +//Species parameters + id = "github" + taxon_taxid = "50000" + related_genome = "GCA_002816235.1" // optional, will be used to generate the Jupiter plot + string_telomere = "TTAGGG" + pipeline_version = "V2" + +//Optional (if not indicated, autolineage for busco) +// lineage = "" +// lineage2 = "vertebrata_odb10" +// lineage3 = "metazoa_odb10" +// lineage4 = "eukaryota_odb10" + +//Data input + raw_data_path = "https://github.com/bcgsc/Canadian_Biogenome_Project/raw/V2/example_input" //PacBio input - pacbio_input_type = "ccs" //'subreads' or 'ccs' - bam_cell1 = "subset_covid_hifi.bam" -// bam_cell2 = "" -// bam_cell3 = "" -// bam_cell4 = "" - - -//ONT input -// ont_fastq_1 = "" + pacbio_input_type = "ccs" // 'hifi' or 'ccs' or 'clr' - HiFi mode: skips filtering + bam_cell1 = "${raw_data_path}/subset_covid_hifi.bam" +// bam_cell2 = "${raw_data_path}/pacbio/" +// bam_cell3 = "${raw_data_path}/pacbio/" +// bam_cell4 = "${raw_data_path}/pacbio/" //HiC Illumina input - hic_read1 = "test_1.fastq.gz" - hic_read2 = "test_2.fastq.gz" + hic_read1 = "${raw_data_path}/test_1.fastq.gz" + hic_read2 = "${raw_data_path}/test_2.fastq.gz" + Illumina_prefix = "test" - Illumina_prefix = "test" +//ONT input +// ont_fastq_1 = "${raw_data_path}/nanopore/" //Illumina short reads input -// illumina_SR_read1 = "" -// illumina_SR_read2 = "" +// illumina_SR_read1 = "${raw_data_path}/SR/" +// illumina_SR_read2 = "${raw_data_path}/SR/" //Pre-processing //PacBio reads obtained from SickKids are CCS, not Hifi : It includes HiFi reads but also reads of lower quality. 
@@ -31,95 +39,128 @@ params { // If 'pacbio_rq'=0.99 --> output bam will contain HiFi Reads with predicted accuracy ≥Q20 (HiFi reads only, Probability of incorrect base call : 1 in 100), equivalent of using extracthifi software // If 'pacbio_rq'=0.9 --> Phred Quality Score =10, Probability of incorrect base call : 1 in 10 // If 'pacbio_rq'=0.999 --> Phred Quality Score =30, Probability of incorrect base call : 1 in 1000 - pacbio_rq = "0.99" - +// For CLR, set 'pacbio_rq'=0.1 to keep all the CLR reads. Canu will deal with them + pacbio_rq = "0.9" //Method assembly_method = "hifiasm" // 'hifiasm' or 'canu' of 'flye' or 'verkko' assembly_secondary_mode = "pacbio" // Depends on the assembly method selected, details in the following lines : + hap2 = "no" // With hifiasm, it is possible to process hap2 + // With hifiasm : 'pacbio' (uses pacbio data only), 'pacbio+hic' (--h1 //--h2 : include Hi-C integration, requires Hi-C reads, VGP says that the output requires additional manual curation), 'pacbio+ont' (--ul : Ultra-long ONT integration), 'pacbio+ont+hic' -// With canu : 'hicanu' (-pacbio-hifi : uses HiFi data only), 'ont' (-nanopore : uses nanopore data only) -// With flye : 'hifi' (--pacbio-hifi mode), 'ont' (--nano-raw mode), 'pacbio+ont' +// With canu : 'hicanu' (-pacbio-hifi : uses HiFi data only), 'ont' (-nanopore : uses nanopore data only), 'clr' (-pacbio : for clr reads (lower quality than hifi)) +// With flye : 'hifi' (--pacbio-hifi mode), 'ont' (--nano-raw mode), 'pacbio+ont', 'clr' (--pacbio-raw) // With verkko : 'pacbio' (--hifi: uses HiFi data only), 'ont' (--nano : uses nanopore data only), 'pacbio+ont' (--hifi --nano) polishing_method = "none" // 'pilon' or 'none' - purging_method = "purge_dups" //DO NOT MODIFY + purging_method = "purge_dups" // "purge_dups" or "no" scaffolding_method = "yahs" // 'yahs' or 'salsa' - manual_curation = "none" // DO NOT MODIFY - mitohifi = "no" // 'yes' or 'no' + +//Optional steps + mitohifi = "no" // 'yes' or 'no' - 
Generate the mitochondrial assembly + execute_kraken = "no" // 'yes' or 'no' - Assigning taxonomic labels to short DNA sequences + fcs = "no" // 'yes' or 'no' - Foreign contamination screening + methylation_calling = "no" // 'yes' or 'no' + juicer = "no" // 'yes' or 'no' - HiC contact map + genome_comparison = "no" // 'yes' or 'no' - Jupiter plots using circos + blobtools = "no" // 'yes' or 'no' - Overview of data quality + pretext = "no" // 'yes' or 'no' - HiC contact map + run_busco = "no" // 'yes' or 'no' + busco_extend = "limited" // 'every_step' or 'limited' + manual_curation = "none" // 'yes' or 'no' - This parameter doesn't change the pipeline, it is only used to track which assemblies have been manually curated //If scaffolding_method == 'salsa' restriction_site = "^GATC,G^ANTC,C^TNAG,T^TAA" ligation_site = "GATCGATC,GANTGATC,GANTANTC,GATCANTC" bin_size = "1000000" -// Additional info - taxon_taxid = "1" - taxon_name = "sars cov" //From GoAT (https://goat.genomehubs.org) - ploidy = "1" //From GoAT (https://goat.genomehubs.org) - hap_gen_size_Gb = "0.000029" //From GoAT (https://goat.genomehubs.org) - in Gb without the unit (Ex : in GOAT, genome size --> 1.15G, input 1.15 here) - chrom_num = "2" //From GoAT (https://goat.genomehubs.org) - Number of chromosomes - lineage = "vertebrata_odb10" - lineage2 = "metazoa_odb10" - lineage3 = "eukaryota_odb10" - - email = "" //For mitohifi - //Path - Do not modify - JUICER_JAR = "/juicer_tools_1.22.01.jar" - busco_lineages_path = "/busco_downloads/" - kraken_db = "/kraken-db/" + scratch_dir = "$baseDir" + outdir = "${scratch_dir}/${id}/${pipeline_version}/" + busco_lineages_path = "${scratch_dir}/busco_downloads/" + kraken_db = "${scratch_dir}/kraken-db/" + singularity_cache = "${scratch_dir}/singularity/" + fcs_gx_database = "${scratch_dir}/fcs_gx/gxdb/all" + blobtoolkit_path = "${scratch_dir}/blobtoolkit" + modules_path = "${scratch_dir}/pipeline/modules/" + + JUICER_JAR = 
"${singularity_cache}/juicer_tools_1.22.01.jar" + Blobtoolkit_db = "/BlobtoolkitDatabase/" + email_adress = "" } process { - executor="local" + cache = 'lenient' + executor= "local" // "local" or "slurm" withLabel:'process_high' { - cpus = 16 - memory = '300 GB' + cpus = 2 + memory = '6 GB' } withLabel:'process_medium' { - cpus = 7 - memory = '200 GB' + cpus = 2 + memory = '6 GB' } withLabel:'process_low' { - cpus = 3 - memory = '100 GB' + cpus = 2 + memory = '6 GB' } -//Pre-processing} - withName: 'BAMTOOLS_FILTER_PACBIO_CELL1|BAMTOOLS_FILTER_PACBIO_CELL2|BAMTOOLS_FILTER_PACBIO_CELL3|BAMTOOLS_FILTER_PACBIO_CELL4' { + withName: 'GOAT_TAXONSEARCH' { + ext.args = "-P -k -G" + } + +//Pre-processing + withName: 'BAMTOOLS_FILTER_PACBIO' { ext.args = [ params.pacbio_rq ? "-tag 'rq':'>=${params.pacbio_rq}'" : '', ].join(' ') } //Needed to run the blobtools_pipeline script - withName: 'BAM2FASTX|TWOBAM2FASTX|THREEBAM2FASTX|FOURBAM2FASTX' { + withName: 'BAM2FASTX' { publishDir = [ path: { "${params.outdir}/preprocessing/bam2fastx" }, - mode : 'copy' + mode : 'copy', + pattern : "*.fastq.gz" ] } + + withName: 'PREPROCESS_MERGED' { + ext.args_bamtools_filter = [ + params.pacbio_rq ? 
"-tag 'rq':'>=${params.pacbio_rq}'" : '', + ].join(' ') + publishDir = [ + path: { "${params.outdir}/preprocessing/bam2fastx" }, + mode : 'copy', + pattern : "*.fastq.gz" + ] + } + withName: 'CUTADAPT' { ext.args = '--anywhere ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT --anywhere ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT --error-rate 0.1 --overlap 35 --times 3 --revcomp --discard-trimmed' } //QC Input data - withName: 'LONGQC' { + withName: 'LONGQC_PACBIO' { + ext.args = '-x pb-sequel' publishDir = [ - path: { "${params.outdir}/QC/LongQC" }, + path: { "${params.outdir}/QC/input_data/LongQC_PacBio" }, mode : 'copy' ] } + withName: 'LONGQC_ONT' { + ext.args = '-x ont-rapid' + publishDir = [ + path: { "${params.outdir}/QC/input_data/LongQC_ONT" }, + mode : 'copy' + ] + } withName: 'MERYL_COUNT|MERYL_UNIONSUM|MERYL_HISTOGRAM' { ext.args = 'k=21' } withName: 'GENOMESCOPE2' { - ext.args = [ - '-k 21', - params.ploidy ? "-p ${params.ploidy}" : '' - ].join(' ') + ext.args = '-k 21' publishDir = [ - path: { "${params.outdir}/QC/genomescope2" }, + path: { "${params.outdir}/QC/input_data/genomescope2" }, mode : 'copy', pattern : "*.png" ] @@ -133,27 +174,20 @@ process { //Assembly //HifiASM withName: 'HIFIASM' { - cpus=64 - memory = '600 GB' - ext.args = [ - '-l 1', - params.ploidy ? "--n-hap ${params.ploidy}" : '', - params.hap_gen_size_Gb ? "--hg-size ${params.hap_gen_size_Gb}g" : '' - ].join(' ') + cpus=16 + memory = '62 GB' + ext.args = '-l 1' publishDir = [ - path: { "${params.outdir}/assembly/hifiasm" }, + path: { "${params.outdir}/assembly/1_contig/hifiasm" }, mode : 'copy', - saveAs: { filename -> "${params.id}_$filename" } + saveAs: { filename -> "$filename" } ] } //CANU withName: 'CANU' { - ext.args = [ - params.hap_gen_size_Gb ? 
"genomesize=${params.hap_gen_size_Gb}g" : '' - ].join(' ') - publishDir = [ - path: { "${params.outdir}/assembly/canu" }, + publishDir = [ + path: { "${params.outdir}/assembly/1_contig/canu" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" } ] @@ -170,12 +204,9 @@ process { ext.args = '-m 8 -x -6 -g -8 -w 500' } withName: 'LONGSTITCH' { - ext.args = [ - 'z=100', - params.hap_gen_size_Gb ? "G=${params.hap_gen_size_Gb}e9" : '' - ].join(' ') + ext.args = 'z=100' publishDir = [ - path: { "${params.outdir}/assembly/flye" }, + path: { "${params.outdir}/assembly/1_contig/flye" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" } ] @@ -183,7 +214,7 @@ process { //VERKKO withName: 'VERKKO' { publishDir = [ - path: { "${params.outdir}/assembly/verkko" }, + path: { "${params.outdir}/assembly/1_contig/verkko" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" } ] @@ -193,7 +224,7 @@ process { withName: 'MITOHIFI' { ext.args = '-p 40 -o 2' publishDir = [ - path: { "${params.outdir}/mitohifi" }, + path: { "${params.outdir}/assembly/mitohifi" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" } ] @@ -202,19 +233,47 @@ process { //PILON +////Assembly cleaning + withName: 'FCS_FCSADAPTOR_hap1|FCS_FCSADAPTOR_ALT' { + ext.args = '--euk' + publishDir = [ + path: { "${params.outdir}/assembly/2_FCS/fcs_adaptor_cleaned/" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + ] + } + + withName: 'FCS_FCSGX_hap1|FCS_FCSGX_ALT' { + memory = '600 GB' + cpus=42 + publishDir = [ + path: { "${params.outdir}/assembly/2_FCS/fcs_gx_cleaned/" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + ] + } + + withName: 'FCS_FCSGX_CLEAN_hap1|FCS_FCSGX_CLEAN_ALT' { + publishDir = [ + path: { "${params.outdir}/assembly/2_FCS/fcs_gx_cleaned/" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + ] + } + //PurgeDups - withName: 'PURGEDUPS_CALCUTS|PURGEDUPS_CALCUTS_ALT' { + withName: 'PURGEDUPS_CALCUTS_hap1|PURGEDUPS_CALCUTS_ALT' { 
ext.args = '-d 1 -u 63' } - withName: 'MINIMAP2_ALIGN_TO_CONTIG|MINIMAP2_ALIGN_TO_CONTIG_ALT' { + withName: 'MINIMAP2_ALIGN_TO_CONTIG_hap1|MINIMAP2_ALIGN_TO_CONTIG_ALT' { ext.args = '-x asm5' } - withName: 'MINIMAP2_ALIGN_TO_SELF|MINIMAP2_ALIGN_TO_SELF_ALT' { + withName: 'MINIMAP2_ALIGN_TO_SELF_hap1|MINIMAP2_ALIGN_TO_SELF_ALT' { ext.args = '-DP -k19 -w 19 -m200' } - withName: 'PURGEDUPS_GETSEQS|SAMTOOLS_FAIDX1|PURGEDUPS_GETSEQS_ALT|SAMTOOLS_FAIDX1_ALT' { + withName: 'PURGEDUPS_GETSEQS_hap1|SAMTOOLS_FAIDX1|PURGEDUPS_GETSEQS_ALT|SAMTOOLS_FAIDX1_ALT' { publishDir = [ - path: { "${params.outdir}/purge_dups" }, + path: { "${params.outdir}/assembly/3_purged/purge_dups" }, mode : 'copy' ] } @@ -224,149 +283,346 @@ process { withName: 'SALSA2' { ext.args = '-o scaffolds -m CLEAN -e GATC,GANTC,CTNAG,TTAA' publishDir = [ - path: { "${params.outdir}/hic_scaffolding/salsa2" }, + path: { "${params.outdir}/assembly/4_scaffold/salsa2" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" } ] } //YAHS - withName: 'CHROMAP_CHROMAP|CHROMAP_CHROMAP_ALT' { + withName: 'CHROMAP_CHROMAP_hap1|CHROMAP_CHROMAP_ALT' { ext.args = '--preset hic --remove-pcr-duplicates --SAM' } - withName: 'YAHS|YAHS_ALT' { + withName: 'YAHS_ALT' { ext.args = '-l 10 --no-contig-ec -e GATC,GANTC,CTNAG,TTAA' publishDir = [ - path: { "${params.outdir}/hic_scaffolding/yahs" }, + path: { "${params.outdir}/assembly/4_scaffold/yahs" }, mode : 'copy', - saveAs: { filename -> "${params.id}_$filename" }, + saveAs: { filename -> "$filename" }, pattern : "*.fa" ] } + withName: 'YAHS_hap1' { + ext.args = '-l 10 --no-contig-ec -e GATC,GANTC,CTNAG,TTAA' + publishDir = [ + path: { "${params.outdir}/manualcuration/" }, + mode : 'copy', + ] + publishDir = [ + path: { "${params.outdir}/assembly/4_scaffold/yahs" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + pattern : "*.fa" + ] + } + //Assembly QC //CONTACT Maps withName: 'SALSA2_JUICER|YAHS_JUICER' { ext.args = '-a' publishDir = [ - path: { 
"${params.outdir}/QC/juicer" }, + path: { "${params.outdir}/QC/juicer/scaffold/" }, mode : 'copy' ] } withName: 'JUICER' { ext.args = '-S postproc' publishDir = [ - path: { "${params.outdir}/QC/juicer" }, + path: { "${params.outdir}/QC/juicer/scaffold/" }, mode : 'copy' ] } + withName: 'PRETEXTMAP' { + publishDir = [ + path: { "${params.outdir}/QC/pretext/pretext/scaffold/" }, + mode : 'copy' + ] + } + withName: 'PRETEXTGRAPH_TELO_COV|PRETEXTGRAPH_TELO' { + publishDir = [ + path: { "${params.outdir}/manualcuration/" }, + mode : 'copy', + ] + publishDir = [ + path: { "${params.outdir}/QC/pretext/pretext/scaffold/" }, + mode : 'copy' + ] + } + withName: 'PRETEXTSNAPSHOT' { ext.args = '--sequences "=full, =all"' cpus=4 memory = '100 GB' - publishDir = [ - path: { "${params.outdir}/QC/pretext" }, + publishDir = [ + path: { "${params.outdir}/QC/pretext/pretext/scaffold/" }, mode : 'copy' - ] + ] } + + //Assembly continuity //MERQURY - withName: 'MERQURY1|MERQURY2|MERQURY3|MERQURY1_DOUBLE|MERQURY2_DOUBLE|MERQURY3_DOUBLE' { + withName: 'MERQURY_ASS|MERQURY_ASS_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/merqury/contig/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + + withName: 'MERQURY_PURGED|MERQURY_PURGED_DOUBLE' { publishDir = [ - path: { "${params.outdir}/QC/merqury" }, + path: { "${params.outdir}/QC/merqury/purged" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" } ] } - withName: 'QUAST1|QUAST1_DOUBLE' { - ext.args2 = [ - params.hap_gen_size_Gb ? "hap_gen_size_bp=\$(echo ${params.hap_gen_size_Gb} | awk '{print \$1 * 1000000000}')" : '' - ].join(' ') - ext.args = [ - params.hap_gen_size_Gb ? 
"--est-ref-size \$hap_gen_size_bp" : '' - ].join(' ') + withName: 'MERQURY_SCAFF|MERQURY_SCAFF_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/merqury/scaffold" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + + withName: 'QUAST_ASS|QUAST_ASS_DOUBLE' { publishDir = [ - path: { "${params.outdir}/QC/quast1" }, + path: { "${params.outdir}/QC/quast/contig/" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, - pattern : "*.tsv" + pattern : "report.tsv" ] } - withName: 'QUAST2|QUAST2_DOUBLE' { - ext.args2 = [ - params.hap_gen_size_Gb ? "hap_gen_size_bp=\$(echo ${params.hap_gen_size_Gb} | awk '{print \$1 * 1000000000}')" : '' - ].join(' ') - ext.args = [ - params.hap_gen_size_Gb ? "--est-ref-size \$hap_gen_size_bp" : '' - ].join(' ') + + withName: 'QUAST_PILON' { publishDir = [ - path: { "${params.outdir}/QC/quast2" }, + path: { "${params.outdir}/QC/quast/pilon/" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, - pattern : "*.tsv" + pattern : "report.tsv" ] } - withName: 'QUAST_PILON' { - ext.args2 = [ - params.hap_gen_size_Gb ? "hap_gen_size_bp=\$(echo ${params.hap_gen_size_Gb} | awk '{print \$1 * 1000000000}')" : '' - ].join(' ') - ext.args = [ - params.hap_gen_size_Gb ? "--est-ref-size \$hap_gen_size_bp" : '' - ].join(' ') + + withName: 'QUAST_CLEAN|QUAST_CLEAN_DOUBLE' { publishDir = [ - path: { "${params.outdir}/QC/quast_pilon" }, + path: { "${params.outdir}/QC/quast/FCS/" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, - pattern : "*.tsv" - ] - } - withName: 'QUAST3|QUAST3_DOUBLE' { - ext.args2 = [ - params.hap_gen_size_Gb ? "hap_gen_size_bp=\$(echo ${params.hap_gen_size_Gb} | awk '{print \$1 * 1000000000}')" : '' - ].join(' ') - ext.args = [ - params.hap_gen_size_Gb ? 
"--est-ref-size \$hap_gen_size_bp" : '' - ].join(' ') + pattern : "report.tsv" + ] + } + + withName: 'QUAST_PURGED|QUAST_PURGED_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/quast/purged/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "report.tsv" + ] + } + + withName: 'QUAST_SCAFF|QUAST_SCAFF_DOUBLE' { publishDir = [ - path: { "${params.outdir}/QC/quast3" }, + path: { "${params.outdir}/QC/quast/scaffold/" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, - pattern : "*.tsv" + pattern : "report.tsv" ] } // Completeness - withName: 'BUSCO|BUSCO_ALT' { + withName: 'BUSCO_lin1_PRIM' { + memory = '700 GB' + cpus=32 ext.args = '--mode genome' publishDir = [ - path: { "${params.outdir}/QC/busco1" }, + path: { "${params.outdir}/QC/busco/contig_hap1_lin1" }, mode : 'copy', pattern : "short_summary.*.txt" ] + publishDir = [ + path: { "${params.outdir}/QC/busco/contig_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] } - withName: 'BUSCO_lin2|BUSCO_lin3|BUSCO_lin4|BUSCO_lin2ALT|BUSCO_lin3ALT|BUSCO_lin4ALT' { + withName: 'BUSCO_lin1_cleaned' { + memory = '700 GB' + cpus=32 ext.args = '--mode genome' - } - -//MultiQC - withName: 'MULTIQC' { + publishDir = [ + path: { "${params.outdir}/QC/busco/FCS_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/FCS_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + } + withName: 'BUSCO_lin1_purged' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/purged_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/purged_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + } + 
withName: 'BUSCO_lin1_SCAFF' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/scaffold_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/scaffold_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin2' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin2" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] publishDir = [ - path: { "${params.outdir}/QC/multiqc" }, + path: { "${params.outdir}/QC/busco/hap1_lin2" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin3' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin3" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin3" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin4' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin4" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin4" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, - pattern : "*_report.html" + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_ALT' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap2_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap2_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] 
+ } + +//Methylation + withName: 'JASMINE|PBMM2|SAMTOOLS_INDEX_PBMM2' { + publishDir = [ + path: { "${params.outdir}/methylation/" }, + mode : 'copy', + ] + } + +//BLOBTOOLSKIT + withName: 'GZIP|BLOBTOOLS_CONFIG|BLOBTOOLS_PIPELINE|BLOBTOOLS_CREATE|BLOBTOOLS_ADD|BLOBTOOLS_VIEW' { + publishDir = [ + path: { "${params.outdir}/QC/blobtools/" }, + mode : 'copy', + ] + } + + withName: 'RAPIDCURATION_SPLIT' { + publishDir = [ + path: { "${params.outdir}/manualcuration/" }, + mode : 'copy', + ] + } + +// Genome COmparison + withName: 'NCBIGENOMEDOWNLOAD' { + ext.args = [ + params.related_genome ? "-s genbank -A ${params.related_genome} --formats fasta all" : '' + ].join(' ') + } + withName: 'MASHMAP' { + ext.args = '-f one-to-one --pi 95 -s 100000' + publishDir = [ + path: { "${params.outdir}/QC/mashmap/scaffold/" }, + mode : 'copy', + ] + } + withName: 'JUPITER' { + ext.args = 'ng=90' + publishDir = [ + path: { "${params.outdir}/QC/jupiter/scaffold/" }, + mode : 'copy', + ] + } + + withName: 'BEDTOOLS_GENOMECOV' { + ext.args = '-bga' + } + +//MultiQC + withName: 'MULTIQC' { + ext.args = '--fullnames --force' publishDir = [ path: { "${params.outdir}/QC/multiqc" }, mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, - pattern : "*.json" + pattern : "*_report.html" ] - } + } //Overview withName: 'OVERVIEW_GENERATION_SAMPLE' { @@ -375,92 +631,89 @@ process { mode : 'copy', saveAs: { filename -> "${params.id}_$filename" }, ] - } + } -//BLOBTOOLSKIT - withName: 'GZIP' { + withName: 'CUSTOM_DUMPSOFTWAREVERSIONS' { publishDir = [ - path: { "${params.outdir}/blobtools/" }, + path: { "${params.outdir}/QC/versions/" }, mode : 'copy', - saveAs: { filename -> "$filename" } - ] - } - withName: 'BLOBTOOLS_VIEW_SNAIL|BLOBTOOLS_VIEW_BLOB|BLOBTOOLS_VIEW_CUMULATIVE' { - publishDir = [ - path: { "${params.outdir}/blobtools/" }, - mode : 'copy' ] } + } -profiles { - debug { process.beforeScript = 'echo $HOSTNAME' } - conda { - conda.enabled = true - docker.enabled = false - 
singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - } - mamba { - conda.enabled = true - conda.useMamba = true - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - } - docker { - docker.enabled = true - docker.userEmulation = true - conda.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - } - arm { - docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64' - } - singularity { - singularity.enabled = true - singularity.autoMounts = true - conda.enabled = false - docker.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - } - podman { - podman.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - shifter.enabled = false - charliecloud.enabled = false - } - shifter { - shifter.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - charliecloud.enabled = false - } - charliecloud { - charliecloud.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - } - gitpod { - executor.name = 'local' - executor.cpus = 16 - executor.memory = 60.GB - } +conda{ + enabled = true + createOptions = '--channel conda-forge' } +profiles { + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + conda.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + mamba { + conda.enabled = true + conda.useMamba = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true 
+ conda.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + arm { + docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64' + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + conda.enabled = false + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + gitpod { + executor.name = 'local' + executor.cpus = 16 + executor.memory = 60.GB + } +} diff --git a/nextflow_github_test.config b/nextflow_github_test.config new file mode 100644 index 0000000..f0013a6 --- /dev/null +++ b/nextflow_github_test.config @@ -0,0 +1,719 @@ +params { +//Specie parameters + id = "github" + taxon_taxid = "50000" + related_genome = "GCA_002816235.1" // optional, will be used to generate the Jupiter plot + string_telomere = "TTAGGG" + pipeline_version = "V2" + +//Optional (if not indicated, autolineage for busco) +// lineage = "" +// lineage2 = "vertebrata_odb10" +// lineage3 = "metazoa_odb10" +// lineage4 = "eukaryota_odb10" + +//Data input + raw_data_path = "https://github.com/bcgsc/Canadian_Biogenome_Project/raw/V2/example_input" +//PacBio input + pacbio_input_type = "ccs" // 'hifi' or 'ccs' or 'clr' - HiFi mode: skips filtering + bam_cell1 = "${raw_data_path}/subset_covid_hifi.bam" +// bam_cell2 = "${raw_data_path}/pacbio/" +// bam_cell3 = "${raw_data_path}/pacbio/" +// bam_cell4 = 
"${raw_data_path}/pacbio/" + +//HiC Illumina input + hic_read1 = "${raw_data_path}/test_1.fastq.gz" + hic_read2 = "${raw_data_path}/test_2.fastq.gz" + Illumina_prefix = "test" + +//ONT input +// ont_fastq_1 = "${raw_data_path}/nanopore/" + +//Illumina short reads input +// illumina_SR_read1 = "${raw_data_path}/SR/" +// illumina_SR_read2 = "${raw_data_path}/SR/" + +//Pre-processing +//PacBio reads obtained from SickKids are CCS, not HiFi: they include HiFi reads but also reads of lower quality. +// This threshold allows removing reads of lower quality (equivalent to --min-rq in the ccs software). +// If 'pacbio_rq'=0.99 --> the output bam will contain HiFi reads with predicted accuracy ≥Q20 (HiFi reads only, probability of incorrect base call: 1 in 100), equivalent to using the extracthifi software +// If 'pacbio_rq'=0.9 --> Phred quality score = 10, probability of incorrect base call: 1 in 10 +// If 'pacbio_rq'=0.999 --> Phred quality score = 30, probability of incorrect base call: 1 in 1000 +// For CLR, set 'pacbio_rq'=0.1 to keep all the CLR reads.
Canu will deal with them. + pacbio_rq = "0.9" + +//Method + assembly_method = "hifiasm" // 'hifiasm' or 'canu' or 'flye' or 'verkko' + assembly_secondary_mode = "pacbio" // Depends on the assembly method selected, details in the following lines: + hap2 = "no" // With hifiasm, it is possible to process hap2 + +// With hifiasm : 'pacbio' (uses pacbio data only), 'pacbio+hic' (--h1 //--h2 : includes Hi-C integration, requires Hi-C reads, VGP notes that the output requires additional manual curation), 'pacbio+ont' (--ul : ultra-long ONT integration), 'pacbio+ont+hic' +// With canu : 'hicanu' (-pacbio-hifi : uses HiFi data only), 'ont' (-nanopore : uses nanopore data only), 'clr' (-pacbio : for CLR reads (lower quality than HiFi)) +// With flye : 'hifi' (--pacbio-hifi mode), 'ont' (--nano-raw mode), 'pacbio+ont', 'clr' (--pacbio-raw) +// With verkko : 'pacbio' (--hifi : uses HiFi data only), 'ont' (--nano : uses nanopore data only), 'pacbio+ont' (--hifi --nano) + polishing_method = "none" // 'pilon' or 'none' + purging_method = "purge_dups" // 'purge_dups' or 'no' + scaffolding_method = "yahs" // 'yahs' or 'salsa' + +//Optional steps + mitohifi = "no" // 'yes' or 'no' - Generate the mitochondrial assembly + execute_kraken = "no" // 'yes' or 'no' - Assign taxonomic labels to short DNA sequences + fcs = "no" // 'yes' or 'no' - Foreign contamination screening + methylation_calling = "no" // 'yes' or 'no' + juicer = "no" // 'yes' or 'no' - Hi-C contact map + genome_comparison = "no" // 'yes' or 'no' - Jupiter plots using circos + blobtools = "no" // 'yes' or 'no' - Overview of data quality + pretext = "no" // 'yes' or 'no' - Hi-C contact map + run_busco = "no" // 'yes' or 'no' + busco_extend = "limited" // 'every_step' or 'limited' + manual_curation = "none" // 'yes' or 'no' - This parameter doesn't change the pipeline; it is only used to track which assemblies have been manually curated + +//If scaffolding_method == 'salsa' + restriction_site = "^GATC,G^ANTC,C^TNAG,T^TAA" + 
ligation_site = "GATCGATC,GANTGATC,GANTANTC,GATCANTC" + bin_size = "1000000" + +//Path - Do not modify + scratch_dir = "$baseDir" + outdir = "${scratch_dir}/${id}/${pipeline_version}/" + busco_lineages_path = "${scratch_dir}/busco_downloads/" + kraken_db = "${scratch_dir}/kraken-db/" + singularity_cache = "${scratch_dir}/singularity/" + fcs_gx_database = "${scratch_dir}/fcs_gx/gxdb/all" + blobtoolkit_path = "${scratch_dir}/blobtoolkit" + modules_path = "${scratch_dir}/pipeline/modules/" + + JUICER_JAR = "${singularity_cache}/juicer_tools_1.22.01.jar" + Blobtoolkit_db = "/BlobtoolkitDatabase/" + email_adress = "" +} + +process { + cache = 'lenient' + executor= "local" // "local" or "slurm" + withLabel:'process_high' { + cpus = 2 + memory = '6 GB' + } + withLabel:'process_medium' { + cpus = 2 + memory = '6 GB' + } + withLabel:'process_low' { + cpus = 2 + memory = '6 GB' + } + + withName: 'GOAT_TAXONSEARCH' { + ext.args = "-P -k -G" + } + +//Pre-processing + withName: 'BAMTOOLS_FILTER_PACBIO' { + ext.args = [ + params.pacbio_rq ? "-tag 'rq':'>=${params.pacbio_rq}'" : '', + ].join(' ') + } + +//Needed to run the blobtools_pipeline script + withName: 'BAM2FASTX' { + publishDir = [ + path: { "${params.outdir}/preprocessing/bam2fastx" }, + mode : 'copy', + pattern : "*.fastq.gz" + ] + } + + withName: 'PREPROCESS_MERGED' { + ext.args_bamtools_filter = [ + params.pacbio_rq ? 
"-tag 'rq':'>=${params.pacbio_rq}'" : '', + ].join(' ') + publishDir = [ + path: { "${params.outdir}/preprocessing/bam2fastx" }, + mode : 'copy', + pattern : "*.fastq.gz" + ] + } + + withName: 'CUTADAPT' { + ext.args = '--anywhere ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT --anywhere ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT --error-rate 0.1 --overlap 35 --times 3 --revcomp --discard-trimmed' + } + +//QC Input data + withName: 'LONGQC_PACBIO' { + ext.args = '-x pb-sequel' + publishDir = [ + path: { "${params.outdir}/QC/input_data/LongQC_PacBio" }, + mode : 'copy' + ] + } + withName: 'LONGQC_ONT' { + ext.args = '-x ont-rapid' + publishDir = [ + path: { "${params.outdir}/QC/input_data/LongQC_ONT" }, + mode : 'copy' + ] + } + withName: 'MERYL_COUNT|MERYL_UNIONSUM|MERYL_HISTOGRAM' { + ext.args = 'k=21' + } + withName: 'GENOMESCOPE2' { + ext.args = '-k 21' + publishDir = [ + path: { "${params.outdir}/QC/input_data/genomescope2" }, + mode : 'copy', + pattern : "*.png" + ] + } + withName: 'KRAKEN2_KRAKEN2_PACBIO_BAM|KRAKEN2_KRAKEN2_HIC_READS|KRAKEN2_KRAKEN2_SR_READS|KRAKEN2_KRAKEN2_ONT_READS' { + cpus=64 + memory = '400 GB' + ext.args = '--memory-mapping --quick' + } + +//Assembly +//HifiASM + withName: 'HIFIASM' { + cpus=2 + memory = '6 GB' + ext.args = '-l 1' + publishDir = [ + path: { "${params.outdir}/assembly/1_contig/hifiasm" }, + mode : 'copy', + saveAs: { filename -> "$filename" } + ] + } + +//CANU + withName: 'CANU' { + publishDir = [ + path: { "${params.outdir}/assembly/1_contig/canu" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + +//FLYE + withName: 'FLYE|FLYE_PACBIO_ONT' { + memory = '300 GB' + } + withName: 'MINIMAP_ALIGN_FLYE' { + ext.args = '-xmap-hifi' + } + withName: 'RACON' { + ext.args = '-m 8 -x -6 -g -8 -w 500' + } + withName: 'LONGSTITCH' { + ext.args = 'z=100' + publishDir = [ + path: { "${params.outdir}/assembly/1_contig/flye" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } 
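The `pacbio_rq` thresholds documented earlier in this config correspond to Phred quality scores via Q = -10·log10(1 - rq). A standalone Python sanity check of those numbers (not part of the pipeline):

```python
import math

def rq_to_phred(rq):
    """Convert a PacBio predicted read accuracy (rq) to a Phred quality score."""
    return -10 * math.log10(1 - rq)

# Matches the comments on pacbio_rq: 0.9 -> Q10, 0.99 -> Q20, 0.999 -> Q30
for rq in (0.9, 0.99, 0.999):
    print(rq, round(rq_to_phred(rq)))
```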
+//VERKKO + withName: 'VERKKO' { + publishDir = [ + path: { "${params.outdir}/assembly/1_contig/verkko" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + +//MitoHifi + withName: 'MITOHIFI' { + ext.args = '-p 40 -o 2' + publishDir = [ + path: { "${params.outdir}/assembly/mitohifi" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + + +//PILON + +////Assembly cleaning + withName: 'FCS_FCSADAPTOR_hap1|FCS_FCSADAPTOR_ALT' { + ext.args = '--euk' + publishDir = [ + path: { "${params.outdir}/assembly/2_FCS/fcs_adaptor_cleaned/" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + ] + } + + withName: 'FCS_FCSGX_hap1|FCS_FCSGX_ALT' { + memory = '600 GB' + cpus=42 + publishDir = [ + path: { "${params.outdir}/assembly/2_FCS/fcs_gx_cleaned/" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + ] + } + + withName: 'FCS_FCSGX_CLEAN_hap1|FCS_FCSGX_CLEAN_ALT' { + publishDir = [ + path: { "${params.outdir}/assembly/2_FCS/fcs_gx_cleaned/" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + ] + } + +//PurgeDups + withName: 'PURGEDUPS_CALCUTS_hap1|PURGEDUPS_CALCUTS_ALT' { + ext.args = '-d 1 -u 63' + } + withName: 'MINIMAP2_ALIGN_TO_CONTIG_hap1|MINIMAP2_ALIGN_TO_CONTIG_ALT' { + ext.args = '-x asm5' + } + withName: 'MINIMAP2_ALIGN_TO_SELF_hap1|MINIMAP2_ALIGN_TO_SELF_ALT' { + ext.args = '-DP -k19 -w 19 -m200' + } + withName: 'PURGEDUPS_GETSEQS_hap1|SAMTOOLS_FAIDX1|PURGEDUPS_GETSEQS_ALT|SAMTOOLS_FAIDX1_ALT' { + publishDir = [ + path: { "${params.outdir}/assembly/3_purged/purge_dups" }, + mode : 'copy' + ] + } + +//HIC scaffolding +//SALSA2 + withName: 'SALSA2' { + ext.args = '-o scaffolds -m CLEAN -e GATC,GANTC,CTNAG,TTAA' + publishDir = [ + path: { "${params.outdir}/assembly/4_scaffold/salsa2" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + +//YAHS + withName: 'CHROMAP_CHROMAP_hap1|CHROMAP_CHROMAP_ALT' { + ext.args = '--preset hic --remove-pcr-duplicates --SAM' + } + + 
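The `-e GATC,GANTC,CTNAG,TTAA` motif lists passed to SALSA2 above (and to YAHS below) use the IUPAC ambiguity code N for "any base". A small standalone helper, not part of the pipeline, showing what such a motif expands to:

```python
from itertools import product

# Minimal IUPAC table covering the codes that appear in this config's motifs
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def expand_motif(motif):
    """Expand an ambiguous motif (e.g. GANTC) into all concrete sequences."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in motif))]

print(expand_motif("GANTC"))  # the four concrete Hi-C cut sites behind GANTC
```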
withName: 'YAHS_ALT' { + ext.args = '-l 10 --no-contig-ec -e GATC,GANTC,CTNAG,TTAA' + publishDir = [ + path: { "${params.outdir}/assembly/4_scaffold/yahs" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + pattern : "*.fa" + ] + } + + withName: 'YAHS_hap1' { + ext.args = '-l 10 --no-contig-ec -e GATC,GANTC,CTNAG,TTAA' + publishDir = [ + path: { "${params.outdir}/manualcuration/" }, + mode : 'copy', + ] + publishDir = [ + path: { "${params.outdir}/assembly/4_scaffold/yahs" }, + mode : 'copy', + saveAs: { filename -> "$filename" }, + pattern : "*.fa" + ] + } + +//Assembly QC +//CONTACT Maps + withName: 'SALSA2_JUICER|YAHS_JUICER' { + ext.args = '-a' + publishDir = [ + path: { "${params.outdir}/QC/juicer/scaffold/" }, + mode : 'copy' + ] + } + withName: 'JUICER' { + ext.args = '-S postproc' + publishDir = [ + path: { "${params.outdir}/QC/juicer/scaffold/" }, + mode : 'copy' + ] + } + + withName: 'PRETEXTMAP' { + publishDir = [ + path: { "${params.outdir}/QC/pretext/pretext/scaffold/" }, + mode : 'copy' + ] + } + withName: 'PRETEXTGRAPH_TELO_COV|PRETEXTGRAPH_TELO' { + publishDir = [ + path: { "${params.outdir}/manualcuration/" }, + mode : 'copy', + ] + publishDir = [ + path: { "${params.outdir}/QC/pretext/pretext/scaffold/" }, + mode : 'copy' + ] + } + + withName: 'PRETEXTSNAPSHOT' { + ext.args = '--sequences "=full, =all"' + cpus=4 + memory = '100 GB' + publishDir = [ + path: { "${params.outdir}/QC/pretext/pretext/scaffold/" }, + mode : 'copy' + ] + } + + + +//Assembly continuity +//MERQURY + withName: 'MERQURY_ASS|MERQURY_ASS_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/merqury/contig/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + + withName: 'MERQURY_PURGED|MERQURY_PURGED_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/merqury/purged" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + + withName: 'MERQURY_SCAFF|MERQURY_SCAFF_DOUBLE' { + publishDir = [ + path: { 
"${params.outdir}/QC/merqury/scaffold" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" } + ] + } + + withName: 'QUAST_ASS|QUAST_ASS_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/quast/contig/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "report.tsv" + ] + } + + withName: 'QUAST_PILON' { + publishDir = [ + path: { "${params.outdir}/QC/quast/pilon/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "report.tsv" + ] + } + + withName: 'QUAST_CLEAN|QUAST_CLEAN_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/quast/FCS/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "report.tsv" + ] + } + + withName: 'QUAST_PURGED|QUAST_PURGED_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/quast/purged/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "report.tsv" + ] + } + + withName: 'QUAST_SCAFF|QUAST_SCAFF_DOUBLE' { + publishDir = [ + path: { "${params.outdir}/QC/quast/scaffold/" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "report.tsv" + ] + } + +// Completeness + withName: 'BUSCO_lin1_PRIM' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/contig_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/contig_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin1_cleaned' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/FCS_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/FCS_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + 
} + withName: 'BUSCO_lin1_purged' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/purged_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/purged_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + } + withName: 'BUSCO_lin1_SCAFF' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/scaffold_hap1_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/scaffold_hap1_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin2' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin2" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin2" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin3' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin3" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin3" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_lin4' { + memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin4" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap1_lin4" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + withName: 'BUSCO_ALT' { + 
memory = '700 GB' + cpus=32 + ext.args = '--mode genome' + publishDir = [ + path: { "${params.outdir}/QC/busco/hap2_lin1" }, + mode : 'copy', + pattern : "short_summary.*.txt" + ] + publishDir = [ + path: { "${params.outdir}/QC/busco/hap2_lin1" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*full_table.tsv" + ] + } + +//Methylation + withName: 'JASMINE|PBMM2|SAMTOOLS_INDEX_PBMM2' { + publishDir = [ + path: { "${params.outdir}/methylation/" }, + mode : 'copy', + ] + } + +//BLOBTOOLSKIT + withName: 'GZIP|BLOBTOOLS_CONFIG|BLOBTOOLS_PIPELINE|BLOBTOOLS_CREATE|BLOBTOOLS_ADD|BLOBTOOLS_VIEW' { + publishDir = [ + path: { "${params.outdir}/QC/blobtools/" }, + mode : 'copy', + ] + } + + withName: 'RAPIDCURATION_SPLIT' { + publishDir = [ + path: { "${params.outdir}/manualcuration/" }, + mode : 'copy', + ] + } + +// Genome COmparison + withName: 'NCBIGENOMEDOWNLOAD' { + ext.args = [ + params.related_genome ? "-s genbank -A ${params.related_genome} --formats fasta all" : '' + ].join(' ') + } + withName: 'MASHMAP' { + ext.args = '-f one-to-one --pi 95 -s 100000' + publishDir = [ + path: { "${params.outdir}/QC/mashmap/scaffold/" }, + mode : 'copy', + ] + } + withName: 'JUPITER' { + ext.args = 'ng=90' + publishDir = [ + path: { "${params.outdir}/QC/jupiter/scaffold/" }, + mode : 'copy', + ] + } + + withName: 'BEDTOOLS_GENOMECOV' { + ext.args = '-bga' + } + +//MultiQC + withName: 'MULTIQC' { + ext.args = '--fullnames --force' + publishDir = [ + path: { "${params.outdir}/QC/multiqc" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + pattern : "*_report.html" + ] + } + +//Overview + withName: 'OVERVIEW_GENERATION_SAMPLE' { + publishDir = [ + path: { "${params.outdir}/QC/overview" }, + mode : 'copy', + saveAs: { filename -> "${params.id}_$filename" }, + ] + } + + withName: 'CUSTOM_DUMPSOFTWAREVERSIONS' { + publishDir = [ + path: { "${params.outdir}/QC/versions/" }, + mode : 'copy', + ] + } + +} + +conda{ + enabled = true + 
createOptions = '--channel conda-forge' +} + +profiles { + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + conda.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + mamba { + conda.enabled = true + conda.useMamba = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true + conda.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + arm { + docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64' + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + conda.enabled = false + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + gitpod { + executor.name = 'local' + executor.cpus = 16 + executor.memory = 60.GB + } +} diff --git a/res/CBP_workflow.png b/res/CBP_workflow.png new file mode 100644 index 0000000..cb6f8c6 Binary files /dev/null and b/res/CBP_workflow.png differ
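To adapt this test config to a real sample, the main edits are the sample identity and the input paths; the CI workflow above then runs the pipeline with `nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda -c nextflow_github_test.config`. A sketch of the overrides only — the ID, taxid, and paths below are placeholders, not values from this repository:

```groovy
params {
//Specie parameters
    id          = "my_sample"   // hypothetical sample identifier
    taxon_taxid = "9606"        // NCBI Taxonomy ID of your species (from GoaT or NCBI)

//Data input (placeholder paths - substitute your own files)
    bam_cell1   = "/data/pacbio/movie1.bam"
    hic_read1   = "/data/hic/sample_R1.fastq.gz"
    hic_read2   = "/data/hic/sample_R2.fastq.gz"
}
```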