Add HTSLIB/BGZIPTABIX and deprecate redundant modules #11571
base: master
Changes from all commits
934acb9
e22514b
10752d9
eab5fd4
1abe38a
83b1596
3b15095
cb0ef77
1f0b792
d9d8010
df9984d
64170ab
7c60cfd
e7dea71
74279c7
4c4cf85
7cb1de4
03f2444
6e29c48
3ef74c5
7db504a
cd93d55
69f07c3
115d7b7
1747b37
5bfd88a
fddf258
f0e63d8
a5a8ec9
8762946
a8f7621
dfd06a9
d15e7c7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| --- | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json | ||
| channels: | ||
| - conda-forge | ||
| - bioconda | ||
| dependencies: | ||
| - "bioconda::htslib=1.23.1" | ||
| - "conda-forge::xz=5.8.3" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| process HTSLIB_BGZIPTABIX { | ||
| tag "${meta.id}" | ||
| label 'process_low' | ||
|
|
||
| conda "${moduleDir}/environment.yml" | ||
| container "${workflow.containerEngine in ['singularity', 'apptainer'] && !task.ext.singularity_pull_docker_container | ||
| ? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/33/33a1f2c7f36ec58339e41cbea096d121f606918778a91cfbef944b40ba7ce48b/data' | ||
| : 'community.wave.seqera.io/library/htslib_xz:49c8c84af5c4b3b9'}" | ||
|
|
||
| input: | ||
| tuple val(meta), path(infile), path(infile_tbi), path(regions) | ||
| val action | ||
| val make_index | ||
| val out_ext | ||
|
Comment on lines +12 to +14
Contributor: What is the reason for having them all in separate channels? I feel like they could be in a tuple, but that's not a strong opinion. In the pipeline I would control these values via the meta map anyway and then set them in the modules config, I think.
Contributor (Author): I feel like the practice with the most support in the community for this type of "setting" input is to have them as separate |
||
|
|
||
| output: | ||
| tuple val(meta), path("${outfile}"), emit: output | ||
| tuple val(meta), path("${outfile}.{tbi,csi}"), emit: index, optional: true | ||
| // all htslib tools have the same version, we use bgzip | ||
| tuple val("${task.process}"), val('htslib'), eval("bgzip --version | sed '1! d; s/bgzip (htslib) //'"), topic: versions, emit: versions_htslib | ||
|
Contributor: we should add the
Contributor (Author): Done |
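The two `eval` version outputs above rely on the same `sed` idiom: keep only the first line of the `--version` banner, then strip the tool prefix. A minimal sketch of that idiom follows. The banner text is an assumed sample written out by hand, not captured tool output, so the snippet runs without htslib or xz installed.

```shell
# Assumed sample banners (illustrative only, not captured from the real tools).
bgzip_banner='bgzip (htslib) 1.23.1
Copyright (C) 2025 Genome Research Ltd.'
xz_banner='xz (XZ Utils) 5.8.3
liblzma 5.8.3'

# '1! d' deletes every line except the first; the s/// then removes the prefix,
# leaving only the version string.
htslib_version=$(printf '%s\n' "$bgzip_banner" | sed '1! d; s/bgzip (htslib) //')
xz_version=$(printf '%s\n' "$xz_banner" | sed '1! d; s/xz (XZ Utils) //')

echo "htslib=${htslib_version} xz=${xz_version}"
```

If a tool ever changed the wording of its first banner line, the substitution would silently leave the prefix in place, which is worth keeping in mind when bumping versions.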
||
| tuple val("${task.process}"), val('xz'), eval("xz --version | sed '1! d; s/xz (XZ Utils) //'"), topic: versions, emit: versions_xz | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def allowed_actions = ["compress", "decompress"] | ||
| if (action !in allowed_actions) { | ||
| error("htslib/bgziptabix: Invalid action: ${action}. Allowed actions are: ${allowed_actions.join(', ')}") | ||
| } | ||
|
|
||
| if (action == "decompress" && make_index) { | ||
| log.warn("htslib/bgziptabix: Cannot create index when decompressing. Ignoring make_index option.") | ||
| } | ||
|
|
||
| def args = task.ext.args ?: '' | ||
| def args2 = task.ext.args2 ?: '' | ||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
| outfile = action == "compress" ? (out_ext ? "${prefix}.${out_ext}.gz" : "${prefix}.gz") : (out_ext ? "${prefix}.${out_ext}" : "${prefix}") | ||
|
|
||
| def compress_cmd = action == "compress" ? "bgzip -c ${args} -@ ${task.cpus}" : "cat" | ||
| def bgzip_cmd = action == "compress" ? "[ '\$(basename ${infile})' != '\$(basename ${outfile})' ] && ln -s ${infile} ${outfile}" : "bgzip -c -d ${args} -@ ${task.cpus} ${infile} > ${outfile}" | ||
|
|
||
| def regions_arg = regions ? "-R ${regions}" : "" | ||
| def tabix_cmd = (make_index && !infile_tbi) ? "tabix -@ ${task.cpus} ${regions_arg} ${args2} -f ${outfile}" : "" | ||
| def link_tabix_cmd = make_index && infile_tbi ? "ln -s ${infile_tbi} ${outfile}.${infile_tbi.extension}" : "" | ||
| def uncompressed_cmd = action == "compress" ? "${compress_cmd} ${infile} > ${outfile}" : (infile.getName() == outfile ? "" : "ln -s ${infile} ${outfile}") | ||
| """ | ||
| ${link_tabix_cmd} | ||
|
|
||
| FILE_TYPE=\$(htsfile ${infile}) | ||
|
|
||
| case "\$FILE_TYPE" in | ||
| *BGZF-compressed*) | ||
| ${bgzip_cmd} ;; | ||
| *gzip-compressed*) | ||
| [ "\$(basename ${infile})" == "\$(basename ${outfile})" ] && echo "Input and output names cannot be the same" && exit 1 | ||
| zcat ${infile} | ${compress_cmd} > ${outfile} ;; | ||
| *bzip2-compressed*) | ||
| bzcat ${infile} | ${compress_cmd} > ${outfile} ;; | ||
| *XZ-compressed*) | ||
| xzcat ${infile} | ${compress_cmd} > ${outfile} ;; | ||
| *) | ||
| ${uncompressed_cmd} ;; | ||
| esac | ||
|
|
||
| ${tabix_cmd} | ||
| """ | ||
|
|
||
| stub: | ||
| def args = task.ext.args ?: '' | ||
| def args2 = task.ext.args2 ?: '' | ||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
| outfile = action == "compress" ? (out_ext ? "${prefix}.${out_ext}.gz" : "${prefix}.gz") : (out_ext ? "${prefix}.${out_ext}" : "${prefix}") | ||
|
|
||
| def touch_cmd = action == "compress" ? "echo | bgzip -c" : "echo" | ||
| def index_fmt = args2.contains('-C') ? 'csi' : 'tbi' | ||
| def tabix_cmd = make_index ? "touch ${outfile}.${index_fmt}" : "" | ||
| def link_tabix_cmd = make_index && infile_tbi ? "ln -s ${infile_tbi} ${outfile}.${infile_tbi.extension}" : "" | ||
| """ | ||
| echo ${args} | ||
|
|
||
| ${touch_cmd} > ${outfile} | ||
|
|
||
| ${tabix_cmd} | ||
| ${link_tabix_cmd} | ||
| """ | ||
| } | ||
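The `case` statement in the script block dispatches on the description string printed by `htsfile`. The sketch below isolates that pattern matching so it can be tried without htslib installed. The function name `classify_input` and the sample description strings are hypothetical, chosen only to contain the substrings the module matches on.

```shell
# classify_input DESCRIPTION
# Prints which branch of the module's case statement a description would hit.
# Patterns are tried in order, as in the module.
classify_input() {
    case "$1" in
        *BGZF-compressed*)  echo "bgzf" ;;   # already bgzip: link or bgzip -d
        *gzip-compressed*)  echo "gzip" ;;   # zcat, then recompress
        *bzip2-compressed*) echo "bzip2" ;;  # bzcat, then recompress
        *XZ-compressed*)    echo "xz" ;;     # xzcat, then recompress
        *)                  echo "plain" ;;  # treated as uncompressed
    esac
}

# Hypothetical description strings, for illustration only.
classify_input "sample.vcf.gz: VCF BGZF-compressed variant calling data"
classify_input "sample.txt.gz: unknown gzip-compressed data"
classify_input "sample.txt: unknown text"
```

Anything that matches none of the four compression patterns lands in the fallback branch, which is why plain text input does not need its own pattern.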
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| name: "htslib_bgziptabix" | ||
| description: "Multi-purpose module to compress, decompress and index files using bgzip | ||
| and tabix." | ||
| keywords: | ||
| - compress | ||
| - decompress | ||
| - index | ||
| - bgzip | ||
| - tabix | ||
| - gzip | ||
| - bzip | ||
| - xz | ||
| tools: | ||
| - "htslib": | ||
| description: "C library for high-throughput sequencing data formats." | ||
| homepage: "http://www.htslib.org/" | ||
| documentation: "http://www.htslib.org/doc/" | ||
| tool_dev_url: "https://github.com/samtools/htslib" | ||
| doi: "10.1093/gigascience/giab007" | ||
| licence: | ||
| - "MIT" | ||
| identifier: biotools:htslib | ||
| input: | ||
| - - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. [ id:'sample1' ] | ||
| - infile: | ||
| type: file | ||
| description: Input file to compress or decompress | ||
| pattern: "*" | ||
| ontologies: [] | ||
| - infile_tbi: | ||
| type: file | ||
| description: Optional tabix index for the input file. | ||
| pattern: "*.{tbi,csi}" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3616 # tabix | ||
| - regions: | ||
| type: file | ||
| description: Optional file of regions to extract (BED or chr:start-end format). | ||
| Only used when creating an index for the output file. | ||
| pattern: "*.{bed,txt,tsv}" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3475 # TSV | ||
| - edam: http://edamontology.org/format_3003 # BED | ||
| - action: | ||
|
Contributor: I am not 100% sure I like the approach of one module doing both directions. I feel like having one compress module and one decompress module would make reuse a bit easier, but I see the maintenance burden again.
Contributor (Author): It's a philosophical question; we can discuss it with the others on Slack. It was Maxime's idea to merge everything into one module, and I think it makes sense maintenance-wise (without putting that much burden on pipeline developers). |
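For readers weighing the single-module design, it may help to see how `action` and `out_ext` together determine the output name. The following is a shell re-statement of the `outfile` expression in `main.nf`, written as a sketch. The function name `outfile_name` is hypothetical and does not exist in the module.

```shell
# outfile_name ACTION PREFIX [OUT_EXT]
# Mirrors the module's naming rule: append OUT_EXT if given, then ".gz"
# only when compressing. Returns 1 for any action other than
# compress/decompress, as the module rejects those.
outfile_name() {
    action=$1
    prefix=$2
    out_ext=${3:-}

    case "$action" in
        compress|decompress) ;;
        *) echo "Invalid action: $action" >&2; return 1 ;;
    esac

    name=$prefix
    if [ -n "$out_ext" ]; then
        name="${name}.${out_ext}"
    fi
    if [ "$action" = "compress" ]; then
        name="${name}.gz"
    fi
    echo "$name"
}

outfile_name compress   sample1 vcf   # sample1.vcf.gz
outfile_name decompress sample1 vcf   # sample1.vcf
outfile_name compress   sample1       # sample1.gz
```

Under this rule a pipeline developer switches direction by changing one value input, which is the reuse trade-off discussed in the thread.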
||
| type: string | ||
| description: Action to perform, either `compress` or `decompress` | ||
| - make_index: | ||
|
Contributor: I would argue that we want this index to always be created automatically, and that it is then either used downstream in the pipeline with some kind of .join or ignored. In most cases I think the index is needed rather than not.
Contributor (Author): I tried to remove this option, but it causes issues in cases where the input is legitimate for bgzip but not supported by tabix (i.e. any non-tabular file). I know it's a minor use case, but I see no other way to make it a comprehensive bgzip module. |
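The role of `make_index` can be summarised as a small decision table: link an index that was supplied with the input, run `tabix` when none was supplied, or do nothing. The sketch below expresses that table in shell. The function name `index_plan` is hypothetical. The `decompress` branch follows the module's warning text, which says `make_index` is ignored when decompressing; that is an assumption about the intended behaviour rather than a restatement of the Groovy expressions.

```shell
# index_plan ACTION MAKE_INDEX [INPUT_INDEX]
# ACTION is "compress" or "decompress"; MAKE_INDEX is "true" or "false";
# INPUT_INDEX is the path of an index supplied with the input, or empty.
index_plan() {
    action=$1
    make_index=$2
    input_index=${3:-}

    if [ "$action" != "compress" ] || [ "$make_index" != "true" ]; then
        echo "none"    # nothing to index (non-tabular input, or decompressing)
    elif [ -n "$input_index" ]; then
        echo "link"    # reuse the supplied index via a symlink
    else
        echo "tabix"   # build a new index for the compressed output
    fi
}

index_plan compress true                      # tabix
index_plan compress true sample1.vcf.gz.tbi   # link
index_plan compress false                     # none
```

Setting `make_index` to false is the escape hatch for the non-tabular case described in the reply above.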
||
| type: boolean | ||
| description: Whether to create a tabix index for the output file; only used | ||
| if `action` is `compress` | ||
| - out_ext: | ||
| type: string | ||
| description: Output file extension without `.gz` suffix (for example `vcf`) | ||
| output: | ||
| output: | ||
| - - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. [ id:'sample1' ] | ||
| - ${outfile}: | ||
| type: file | ||
| description: Compressed or decompressed output file | ||
| pattern: "*" | ||
| ontologies: [] | ||
| index: | ||
| - - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. [ id:'sample1' ] | ||
| - ${outfile}.{tbi,csi}: | ||
| type: file | ||
| description: Tabix index file for the compressed output file | ||
| pattern: "*.{tbi,csi}" | ||
| ontologies: | ||
| - edam: http://edamontology.org/format_3616 # tabix | ||
| versions_htslib: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - htslib: | ||
| type: string | ||
| description: The name of the tool | ||
| - bgzip --version | sed '1! d; s/bgzip (htslib) //': | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| versions_xz: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - xz: | ||
| type: string | ||
| description: The name of the tool | ||
| - xz --version | sed '1! d; s/xz (XZ Utils) //': | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| topics: | ||
| versions: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - htslib: | ||
| type: string | ||
| description: The name of the tool | ||
| - bgzip --version | sed '1! d; s/bgzip (htslib) //': | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - xz: | ||
| type: string | ||
| description: The name of the tool | ||
| - xz --version | sed '1! d; s/xz (XZ Utils) //': | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| authors: | ||
| - "@itrujnara" | ||
| maintainers: | ||
| - "@itrujnara" | ||
I would use
on this file to clean this up nicely. (mainly setting
{}
Done, it broke Harshil alignment though