New module: exomiser/analyse #11023
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| process EXOMISER_ANALYSE { | ||
| tag "${meta.id}" | ||
| label 'process_medium' | ||
|
|
||
| container "nf-core/exomiser-cli:15.0.0-bash" | ||
|
|
||
| input: | ||
| tuple val(meta), path(vcf), path(ped), val(assembly), path(phenopacket), path(analysis_script) | ||
| tuple val(meta2), path(reference_cache, stageAs: 'exomiser_data/*'), val(reference_version) | ||
| tuple val(meta3), path(phenotype_cache, stageAs: 'exomiser_data/*'), val(phenotype_version) | ||
|
Comment on lines +9 to +10
Contributor
Would it maybe make sense to have a separate module that takes care of properly loading this data? That is what we did for PCGR. That would be then
Contributor
Author
IMO the data is too big to be loaded on the fly. It comes down to about 50 GB of reference data in total.
Contributor
Yes, that's why I would handle it separately, or at least that is how we are doing it with the VEP cache etc. Either we add the data to the VEP cache thingy @maxulysse built, or we create a module that can be used in a pipeline to have this loaded; see PCGR in the variantprioritization pipeline. And then for testing we subsample this cache to chr22 etc. (I did that for PCGR as well).
Contributor
see #9295 |
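One thing a dedicated loading step could do is fail early when the staged cache does not look as expected. The following is only a minimal sketch: it assumes the data directory holds one sub-directory per release named `<version>_<build>` plus `<version>_phenotype` (for example `2402_hg38`). That layout and the helper name `check_exomiser_cache` are assumptions for illustration, not something this PR defines.

```shell
# Hypothetical helper: verify that the staged Exomiser data directory contains
# the expected reference and phenotype sub-directories before running the tool.
# Arguments: <data_dir> <build> <reference_version> <phenotype_version>
check_exomiser_cache() {
    data_dir=$1
    build=$2
    ref_version=$3
    pheno_version=$4

    # Assumed layout: <data_dir>/<version>_<build> and <data_dir>/<version>_phenotype
    for d in "${ref_version}_${build}" "${pheno_version}_phenotype"; do
        if [ ! -d "${data_dir}/${d}" ]; then
            echo "missing cache directory: ${data_dir}/${d}" >&2
            return 1
        fi
    done
    return 0
}
```

A call such as `check_exomiser_cache ./exomiser_data hg38 1234 1234` would then return non-zero if either directory is absent.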
||
|
|
||
| output: | ||
| tuple val(meta), path("*.tsv"), emit: tsv | ||
| tuple val(meta), path("*.json"), emit: json | ||
| tuple val(meta), path("*.html"), emit: html | ||
| tuple val(meta), path("*.parquet"), emit: parquet | ||
| tuple val(meta), path("*.vcf"), emit: vcf | ||
|
Contributor
I would say we need to bgzip this VCF before we output it.
Contributor
Author
I agree. The thing is, I'm only just testing this tool and I haven't had the opportunity to look further into it.
Contributor
Yes, it's bgzipped (otherwise it cannot be indexed, afaik), so perfect |
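For anyone who wants to confirm this on an actual output file, a quick check is to look at the first two bytes. This is a minimal sketch with a hypothetical helper name, `is_gzipped`. It only tests for the gzip magic bytes (`1f 8b`), which bgzip output also carries, so it cannot tell plain gzip from BGZF.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
# bgzip (BGZF) output is gzip-compatible and passes this check as well;
# the check does NOT distinguish plain gzip from BGZF.
is_gzipped() {
    # Read the first two bytes as hex and strip the whitespace od adds.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

Usage would look like `is_gzipped test.vcf.gz && echo "compressed"`, where the file name is just a placeholder.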
||
| tuple val("${task.process}"), val('exomiser'), eval("exomiser --version"), topic: versions, emit: versions_exomiser | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| // Exit if running this module with -profile conda / -profile mamba | ||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error("EXOMISER_ANALYSE module does not support Conda. Please use Docker / Singularity / Podman instead.") | ||
| } | ||
| def args = task.ext.args ?: '' | ||
| def prefix = task.ext.prefix ?: "${meta.id}" | ||
| def ped_cmd = ped ? "--ped=${ped}" : "" | ||
| def phenopacket_cmd = phenopacket ? "--sample=${phenopacket}" : "" | ||
| def assembly_cmd = assembly ? "--assembly=${assembly}" : "" | ||
| def analysis_cmd = analysis_script ? "--analysis ${analysis_script}" : "" | ||
| def vcf_cmd = vcf ? "--vcf=${vcf}" : "" | ||
|
|
||
| """ | ||
| export EXOMISER_DATA_DIRECTORY=./exomiser_data | ||
matthdsm marked this conversation as resolved.
|
||
| export EXOMISER_${assembly}_DATA_VERSION=${reference_version} | ||
| export EXOMISER_PHENOTYPE_DATA_VERSION=${phenotype_version} | ||
|
|
||
| exomiser analyse \\ | ||
| ${ped_cmd} \\ | ||
| ${phenopacket_cmd} \\ | ||
| ${assembly_cmd} \\ | ||
| ${vcf_cmd}\\ | ||
| ${analysis_cmd} \\ | ||
| ${args} \\ | ||
| --output-directory=\$PWD \\ | ||
| --output-filename=${prefix} | ||
| """ | ||
|
|
||
| stub: | ||
matthdsm marked this conversation as resolved.
|
||
| // Exit if running this module with -profile conda / -profile mamba | ||
| if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
| error("EXOMISER_ANALYSE module does not support Conda. Please use Docker / Singularity / Podman instead.") | ||
| } | ||
| def args = task.ext.args ?: '' | ||
| def prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| """ | ||
| echo ${args} | ||
| touch ${prefix}.tsv | ||
| touch ${prefix}.json | ||
| touch ${prefix}.html | ||
| touch ${prefix}.parquet | ||
| touch ${prefix}.vcf | ||
| """ | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| name: "exomiser_analyse" | ||
| description: Phenotype-driven variant prioritisation for rare Mendelian | ||
| disorders. | ||
| keywords: | ||
| - exomiser | ||
| - variant prioritisation | ||
| - rare disease | ||
| - Mendelian disorders | ||
| tools: | ||
| - "exomiser": | ||
| description: "A Tool to Annotate and Prioritize Exome Variants" | ||
| homepage: "https://exomiser.readthedocs.io/en/stable/" | ||
| documentation: "https://exomiser.readthedocs.io/en/stable/" | ||
| tool_dev_url: "https://github.com/exomiser/Exomiser" | ||
| doi: "10.1038/s41525-024-00456-2" | ||
| licence: | ||
| - "AGPL-3.0" | ||
| identifier: biotools:exomiser | ||
| input: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - vcf: | ||
| type: file | ||
| description: "VCF file containing variants to be analysed." | ||
| pattern: "*.vcf.gz" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3989 | ||
| - ped: | ||
| type: file | ||
| description: "PED file containing family information." | ||
| pattern: "*.ped" | ||
| ontologies: [] | ||
| - assembly: | ||
| type: string | ||
| description: "Genome assembly to use. e.g. GRCh37, GRCh38" | ||
| - phenopacket: | ||
| type: file | ||
| description: "Phenopacket file containing phenotype information." | ||
| pattern: "*.{yml,yaml,json}" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3750 | ||
| - edam: http://edamontology.org/format_3464 | ||
| - analysis_script: | ||
| type: file | ||
| description: "Custom analysis script for Exomiser analysis" | ||
| pattern: "*.{yml,yaml,json}" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3750 | ||
| - edam: http://edamontology.org/format_3464 | ||
| - - meta2: | ||
| type: map | ||
| description: Groovy Map containing reference cache information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - reference_cache: | ||
| type: file | ||
| description: "Reference cache for Exomiser analysis" | ||
| pattern: "exomiser_data/*" | ||
| ontologies: [] | ||
| - reference_version: | ||
| type: string | ||
| description: "Reference version for Exomiser analysis" | ||
| - - meta3: | ||
| type: map | ||
| description: Groovy Map containing phenotype cache information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - phenotype_cache: | ||
| type: file | ||
| description: "Phenotype cache for Exomiser analysis" | ||
| pattern: "exomiser_data/*" | ||
| ontologies: [] | ||
| - phenotype_version: | ||
| type: string | ||
| description: "Phenotype version for Exomiser analysis" | ||
| output: | ||
| tsv: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - "*.tsv": | ||
| type: file | ||
| description: "TSV file containing prioritized variants." | ||
| pattern: "*.tsv" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3475 | ||
| json: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - "*.json": | ||
| type: file | ||
| description: "JSON file containing prioritized variants." | ||
| pattern: "*.json" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3464 | ||
| html: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - "*.html": | ||
| type: file | ||
| description: "HTML file containing prioritized variants." | ||
| pattern: "*.html" | ||
| ontologies: [] | ||
| parquet: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - "*.parquet": | ||
| type: file | ||
| description: "Parquet file containing prioritized variants." | ||
| pattern: "*.parquet" | ||
| ontologies: [] | ||
| vcf: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information. e.g. `[ | ||
| id:'sample1' ]` | ||
| - "*.vcf": | ||
| type: file | ||
| description: "VCF file containing prioritized variants." | ||
| pattern: "*.vcf" | ||
| ontologies: [] | ||
| versions_exomiser: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - exomiser: | ||
| type: string | ||
| description: The name of the tool | ||
| - exomiser --version: | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| topics: | ||
| versions: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - exomiser: | ||
| type: string | ||
| description: The name of the tool | ||
| - exomiser --version: | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| authors: | ||
| - "@matthdsm" | ||
| maintainers: | ||
| - "@matthdsm" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| nextflow_process { | ||
|
|
||
| name "Test Process EXOMISER_ANALYSE" | ||
| script "../main.nf" | ||
| process "EXOMISER_ANALYSE" | ||
|
|
||
| tag "modules" | ||
| tag "modules_nfcore" | ||
| tag "exomiser" | ||
| tag "exomiser/analyse" | ||
|
|
||
| test("homo_sapiens - vcf - stub") { | ||
|
|
||
| options "-stub" | ||
|
|
||
| when { | ||
| process { | ||
| """ | ||
| input[0] = [ | ||
| [ id:'test' ], | ||
| file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/vcf/test.vcf.gz'), | ||
matthdsm marked this conversation as resolved.
|
||
| file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/ped/test.ped'), | ||
| "GRCh38", | ||
| file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/phenopacket/test.yml'), | ||
| [] | ||
| ] | ||
| input[1] = [ | ||
| [ id:'test' ], | ||
| file("s3://nf-core-reference-data/exomiser/GRCh38/1234/reference_cache"), | ||
| "1234" | ||
| ] | ||
| input[2] = [ | ||
| [ id:'test' ], | ||
| file("s3://nf-core-reference-data/exomiser/GRCh38/1234/phenotype_cache"), | ||
| "1234" | ||
| ] | ||
| """ | ||
| } | ||
| } | ||
|
|
||
| then { | ||
| assertAll ( | ||
| { assert process.success }, | ||
| { assert snapshot(process.out.findAll { key, val -> key.startsWith("versions") }).match() } | ||
| ) | ||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| { | ||
| "homo_sapiens - vcf - stub": { | ||
| "content": [ | ||
| { | ||
| "versions_exomiser": [ | ||
|
|
||
| ] | ||
| } | ||
| ], | ||
| "meta": { | ||
| "nf-test": "0.9.3", | ||
| "nextflow": "25.10.4" | ||
| }, | ||
| "timestamp": "2026-03-23T19:09:41.158068" | ||
| } | ||
| } |
This stuff looks reasonable to me, I don't really have a better idea