33 commits
934acb9
feat(htslib_bgziptabix): add module for bgzip compression and decompr…
itrujnara May 7, 2026
e22514b
feat(htslib_bgziptabix): update environment and main process for impr…
itrujnara May 7, 2026
10752d9
feat(tabix_bgzip): deprecate TABIX/BGZIP
itrujnara May 7, 2026
eab5fd4
feat: update references from TABIX_BGZIP to HTSLIB_BGZIPTABIX across …
itrujnara May 7, 2026
1abe38a
feat(tabix_bgziptabix): deprecate TABIX/BGZIPTABIX
itrujnara May 7, 2026
83b1596
feat: refactor input assignment in test processes for consistency
itrujnara May 7, 2026
3b15095
feat: update test snapshots and fix timestamp formatting in VCF annot…
itrujnara May 8, 2026
cb0ef77
feat: deprecate SAMTOOLS_BGZIP
itrujnara May 8, 2026
1f0b792
feat: Refactor and deprecate tabix module in favor of HTSLIB/BGZIPTABIX
itrujnara May 8, 2026
d9d8010
feat: update VCF processing workflows to improve output handling and …
itrujnara May 8, 2026
df9984d
Merge branch 'master' into tabixupdate
itrujnara May 8, 2026
64170ab
feat: update tabix command logic and add test for VCF compression wit…
itrujnara May 8, 2026
7c60cfd
feat: update input file definitions and add optional regions file sup…
itrujnara May 8, 2026
e7dea71
Merge branch 'tabixupdate' of https://github.com/itrujnara/modules in…
itrujnara May 8, 2026
74279c7
feat: update output file definitions to use consistent naming for com…
itrujnara May 8, 2026
4c4cf85
fix: Add missing module information in subworkflow meta.yml
itrujnara May 8, 2026
7cb1de4
Merge branch 'master' into tabixupdate
itrujnara May 8, 2026
03f2444
fix: update samtools version to 1.23.1 in environment.yml
itrujnara May 8, 2026
6e29c48
Merge branch 'tabixupdate' of https://github.com/itrujnara/modules in…
itrujnara May 8, 2026
3ef74c5
test: Update snapshot for bam_variant_calling_mpileup_bcftools
itrujnara May 8, 2026
7db504a
fix: update container image and samtools version in PARAPHASE process
itrujnara May 8, 2026
cd93d55
fix (multivcfanalyzer): update htslib version to 1.23.1 and adjust te…
itrujnara May 8, 2026
69f07c3
fix(paraphase): standardize quotes in gzip command for consistency
itrujnara May 8, 2026
115d7b7
Merge branch 'master' into tabixupdate
itrujnara May 11, 2026
1747b37
fix(htslib_bgziptabix): improve string interpolation and update descr…
itrujnara May 12, 2026
5bfd88a
Merge branch 'tabixupdate' of https://github.com/itrujnara/modules in…
itrujnara May 12, 2026
fddf258
fix: fix linting in htslib meta nd paraphase stub
itrujnara May 12, 2026
f0e63d8
Merge branch 'master' into tabixupdate
itrujnara May 12, 2026
a5a8ec9
test: update snapshot for htslib/bgziptabix
itrujnara May 12, 2026
8762946
Merge branch 'tabixupdate' of https://github.com/itrujnara/modules in…
itrujnara May 12, 2026
a8f7621
test: update snapshot for htslib/bgziptabix
itrujnara May 12, 2026
dfd06a9
Merge branch 'master' into tabixupdate
itrujnara May 13, 2026
d15e7c7
Merge branch 'master' into tabixupdate
itrujnara May 13, 2026
15 changes: 10 additions & 5 deletions modules/nf-core/bandage/image/tests/main.nf.test
@@ -9,7 +9,7 @@ nextflow_process {
tag "modules_nfcore"
tag "bandage"
tag "bandage/image"
- tag "tabix/bgzip"
+ tag "htslib/bgziptabix"

test("test-bandage-image") {

@@ -41,14 +41,19 @@ nextflow_process {
test("test-bandage-image - gzip") {

setup {
- run("TABIX_BGZIP"){
- script "../../../tabix/bgzip/main.nf"
+ run("HTSLIB_BGZIPTABIX"){
+ script "../../../htslib/bgziptabix/main.nf"
process {
"""
input[0] = [
[ id:'B-3106' ], // meta map
- file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/gfa/assembly.gfa', checkIfExists: true)
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/gfa/assembly.gfa', checkIfExists: true),
+ [],
+ []
]
+ input[1] = "compress"
+ input[2] = false
+ input[3] = "gfa"
"""
}
}
@@ -57,7 +62,7 @@ nextflow_process {
when {
process {
"""
- input[0] = TABIX_BGZIP.out.output
+ input[0] = HTSLIB_BGZIPTABIX.out.output
"""
}
}
15 changes: 10 additions & 5 deletions modules/nf-core/custom/geneticmapconvert/tests/main.nf.test
@@ -10,7 +10,7 @@ nextflow_process {
tag "modules_nfcore"
tag "custom"
tag "custom/geneticmapconvert"
- tag "tabix/bgzip"
+ tag "htslib/bgziptabix"

test("Convert map with pos\\tchr\\tcm - with header - meta.chr (glimpse format)") {
when {
@@ -44,14 +44,19 @@ nextflow_process {

test("Convert map with pos\\tchr\\tcM - with header - meta.chr (glimpse compressed format)") {
setup {
- run("TABIX_BGZIP"){
- script "../../../tabix/bgzip/main.nf"
+ run("HTSLIB_BGZIPTABIX"){
+ script "../../../htslib/bgziptabix/main.nf"
process {
"""
input[0] = [
[id: "test", chr:"chr21"],
- file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genetic_map/genome.GRCh38.chr21.glimpse.map', checkIfExists: true)
+ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genetic_map/genome.GRCh38.chr21.glimpse.map', checkIfExists: true),
+ [],
+ []
]
+ input[1] = "compress"
+ input[2] = false
+ input[3] = "map"
"""
}
}
@@ -62,7 +67,7 @@ nextflow_process {
}
process {
"""
- input[0] = TABIX_BGZIP.out.output
+ input[0] = HTSLIB_BGZIPTABIX.out.output
"""
}
}
8 changes: 8 additions & 0 deletions modules/nf-core/htslib/bgziptabix/environment.yml
@@ -0,0 +1,8 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
channels:
- conda-forge
- bioconda
dependencies:
- "bioconda::htslib=1.23.1"
- "conda-forge::xz=5.8.3"
88 changes: 88 additions & 0 deletions modules/nf-core/htslib/bgziptabix/main.nf
Contributor:

I would use

nextflow lint -format -sort-declarations -spaces 4 -harshil-alignment

on this file to clean this up nicely. (mainly setting {}

Contributor Author:

Done, it broke Harshil alignment though

@@ -0,0 +1,88 @@
process HTSLIB_BGZIPTABIX {
tag "${meta.id}"
label 'process_low'

conda "${moduleDir}/environment.yml"
container "${workflow.containerEngine in ['singularity', 'apptainer'] && !task.ext.singularity_pull_docker_container
? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/33/33a1f2c7f36ec58339e41cbea096d121f606918778a91cfbef944b40ba7ce48b/data'
: 'community.wave.seqera.io/library/htslib_xz:49c8c84af5c4b3b9'}"

input:
tuple val(meta), path(infile), path(infile_tbi), path(regions)
val action
val make_index
val out_ext
Comment on lines +12 to +14
Contributor:

What is the reason for having them all as separate channels? I feel like they could be in a tuple, but that's not a strong opinion. In the pipeline I would control these values via the meta map anyway and then set this in the modules config, I think.

Contributor Author:

I feel like the practice with the most support in the community is to have this type of "setting" input as separate val channels. Some (e.g. Simon) would push much of this into the config, but that's not fully accepted as it stands.


output:
tuple val(meta), path("${outfile}"), emit: output
tuple val(meta), path("${outfile}.{tbi,csi}"), emit: index, optional: true
// all htslib tools have the same version, we use bgzip
tuple val("${task.process}"), val('htslib'), eval("bgzip --version | sed '1! d; s/bgzip (htslib) //'"), topic: versions, emit: versions_htslib
Contributor:

we should add the xz version here as well

Contributor Author:

Done

tuple val("${task.process}"), val('xz'), eval("xz --version | sed '1! d; s/xz (XZ Utils) //'"), topic: versions, emit: versions_xz

when:
task.ext.when == null || task.ext.when

script:
def allowed_actions = ["compress", "decompress"]
if (action !in allowed_actions) {
error("htslib/bgziptabix: Invalid action: ${action}. Allowed actions are: ${allowed_actions.join(', ')}")
}

if (action == "decompress" && make_index) {
log.warn("htslib/bgziptabix: Cannot create index when decompressing. Ignoring make_index option.")
}

def args = task.ext.args ?: ''
def args2 = task.ext.args2 ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
outfile = action == "compress" ? (out_ext ? "${prefix}.${out_ext}.gz" : "${prefix}.gz") : (out_ext ? "${prefix}.${out_ext}" : "${prefix}")

def compress_cmd = action == "compress" ? "bgzip -c ${args} -@ ${task.cpus}" : "cat"
def bgzip_cmd = action == "compress" ? "[ \"\$(basename ${infile})\" != \"\$(basename ${outfile})\" ] && ln -s ${infile} ${outfile}" : "bgzip -c -d ${args} -@ ${task.cpus} ${infile} > ${outfile}"

def regions_arg = regions ? "-R ${regions}" : ""
def tabix_cmd = (make_index && !infile_tbi) ? "tabix -@ ${task.cpus} ${regions_arg} ${args2} -f ${outfile}" : ""
def link_tabix_cmd = make_index && infile_tbi ? "ln -s ${infile_tbi} ${outfile}.${infile_tbi.extension}" : ""
def uncompressed_cmd = action == "compress" ? "${compress_cmd} ${infile} > ${outfile}" : (infile.getName() == outfile ? "" : "ln -s ${infile} ${outfile}")
"""
${link_tabix_cmd}

FILE_TYPE=\$(htsfile ${infile})

case "\$FILE_TYPE" in
*BGZF-compressed*)
${bgzip_cmd} ;;
*gzip-compressed*)
[ "\$(basename ${infile})" == "\$(basename ${outfile})" ] && echo "Input and output names cannot be the same" && exit 1
zcat ${infile} | ${compress_cmd} > ${outfile} ;;
*bzip2-compressed*)
bzcat ${infile} | ${compress_cmd} > ${outfile} ;;
*XZ-compressed*)
xzcat ${infile} | ${compress_cmd} > ${outfile} ;;
*)
${uncompressed_cmd} ;;
esac

${tabix_cmd}
"""

stub:
def args = task.ext.args ?: ''
def args2 = task.ext.args2 ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
outfile = action == "compress" ? (out_ext ? "${prefix}.${out_ext}.gz" : "${prefix}.gz") : (out_ext ? "${prefix}.${out_ext}" : "${prefix}")

def touch_cmd = action == "compress" ? "echo | bgzip -c" : "echo"
def index_fmt = args2.contains('-C') ? 'csi' : 'tbi'
def tabix_cmd = make_index ? "touch ${outfile}.${index_fmt}" : ""
def link_tabix_cmd = make_index && infile_tbi ? "ln -s ${infile_tbi} ${outfile}.${infile_tbi.extension}" : ""
"""
echo ${args}

${touch_cmd} > ${outfile}

${tabix_cmd}
${link_tabix_cmd}
"""
}
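The output naming and the `case` dispatch in the process above can be tried out without htslib installed. The sketch below is only an illustration: `outfile_name` and `pick_reader` are hypothetical helpers that are not part of the module, and the description string passed to `pick_reader` is made up. Only the substrings matched by the module's `case` patterns are assumed to appear in real `htsfile` output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; neither is part of the module.

# outfile_name ACTION PREFIX OUT_EXT
# Mirrors the `outfile` expression: append the optional extension, then add
# `.gz` only when compressing.
outfile_name() {
    local action="$1" prefix="$2" out_ext="$3"
    local name="$prefix"
    if [ -n "$out_ext" ]; then
        name="${name}.${out_ext}"
    fi
    if [ "$action" = "compress" ]; then
        name="${name}.gz"
    fi
    printf '%s\n' "$name"
}

# pick_reader DESCRIPTION
# Mirrors the pattern order of the `case` statement and prints a label for the
# branch that a given file-type description would select.
pick_reader() {
    case "$1" in
        *BGZF-compressed*)  printf 'bgzf\n' ;;   # link or decompress with bgzip
        *gzip-compressed*)  printf 'zcat\n' ;;   # re-stream through zcat
        *bzip2-compressed*) printf 'bzcat\n' ;;  # re-stream through bzcat
        *XZ-compressed*)    printf 'xzcat\n' ;;  # re-stream through xzcat
        *)                  printf 'plain\n' ;;  # treated as uncompressed
    esac
}

outfile_name compress test vcf                   # prints test.vcf.gz
outfile_name decompress test ""                  # prints test
pick_reader "reads.fq.xz: XZ-compressed data"    # prints xzcat
```

The BGZF pattern is tested before the plain gzip pattern, as in the module. Anything that matches no pattern falls through to the uncompressed branch.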
125 changes: 125 additions & 0 deletions modules/nf-core/htslib/bgziptabix/meta.yml
@@ -0,0 +1,125 @@
name: "htslib_bgziptabix"
description: "Multi-purpose module to compress, decompress and index files using bgzip
and tabix."
keywords:
- compress
- decompress
- index
- bgzip
- tabix
- gzip
- bzip
- xz
tools:
- "htslib":
description: "C library for high-throughput sequencing data formats."
homepage: "http://www.htslib.org/"
documentation: "http://www.htslib.org/doc/"
tool_dev_url: "https://github.com/samtools/htslib"
doi: "10.1093/gigascience/giab007"
licence:
- "MIT"
identifier: biotools:htslib
input:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'sample1' ]
- infile:
type: file
description: Input file to compress or decompress
pattern: "*"
ontologies: []
- infile_tbi:
type: file
description: Optional tabix index for the input file.
pattern: "*.{tbi,csi}"
ontologies:
- edam: http://edamontology.org/format_3616 # tabix
- regions:
type: file
description: Optional file of regions to extract (BED or chr:start-end format).
Only used when creating an index for the output file.
pattern: "*.{bed,txt,tsv}"
ontologies:
- edam: http://edamontology.org/format_3475 # TSV
- edam: http://edamontology.org/format_3003 # BED
- action:
Contributor:

I am not 100% sure I like the approach of one module handling both directions. I feel like having one compress module and one decompress module would make it a bit easier to reuse, but then I see the maintenance burden again.

Contributor Author:

It's a philosophical question, we can discuss it with the others on Slack. It was Maxime's idea to merge everything into one module and I think it makes sense maintenance-wise (without putting that much burden on pipeline developers).

type: string
description: Action to perform, either `compress` or `decompress`
- make_index:
Contributor:

I would argue that we want this index to always be created automatically, and that it's then either used downstream in the pipeline with some kind of .join or ignored. In most cases, I think the index is needed rather than not needed.

Contributor Author:

I tried to remove this option, but it causes issues in cases where the format is legitimate bgzip input, but not supported by tabix (i.e. any non-tabular file). I know it's a minor use case, but I see no other way to make it a comprehensive bgzip module.

type: boolean
description: Whether to create a tabix index for the output file; only used
if `action` is `compress`
- out_ext:
type: string
description: Output file extension without `.gz` suffix (for example `vcf`)
output:
output:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'sample1' ]
- ${outfile}:
type: file
description: Compressed or decompressed output file
pattern: "*"
ontologies: []
index:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'sample1' ]
- ${outfile}.{tbi,csi}:
type: file
description: Tabix index file for the compressed output file
pattern: "*.{tbi,csi}"
ontologies:
- edam: http://edamontology.org/format_3616 # tabix
versions_htslib:
- - ${task.process}:
type: string
description: The name of the process
- htslib:
type: string
description: The name of the tool
- bgzip --version | sed '1! d; s/bgzip (htslib) //':
type: eval
description: The expression to obtain the version of the tool
versions_xz:
- - ${task.process}:
type: string
description: The name of the process
- xz:
type: string
description: The name of the tool
- xz --version | sed '1! d; s/xz (XZ Utils) //':
type: eval
description: The expression to obtain the version of the tool
topics:
versions:
- - ${task.process}:
type: string
description: The name of the process
- htslib:
type: string
description: The name of the tool
- bgzip --version | sed '1! d; s/bgzip (htslib) //':
type: eval
description: The expression to obtain the version of the tool
- - ${task.process}:
type: string
description: The name of the process
- xz:
type: string
description: The name of the tool
- xz --version | sed '1! d; s/xz (XZ Utils) //':
type: eval
description: The expression to obtain the version of the tool
authors:
- "@itrujnara"
maintainers:
- "@itrujnara"
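The two `eval` expressions listed under `versions_htslib` and `versions_xz` keep only the first line of each tool's `--version` output and strip the leading program label. A minimal sketch of that behaviour follows, using canned text in place of the real binaries. The helper `first_line_version` is hypothetical, and the second line of each sample banner is an assumption for illustration; the version numbers are the ones pinned in `environment.yml`.

```shell
#!/usr/bin/env bash
# first_line_version LABEL
# Hypothetical helper, not part of the module. It applies the same sed program
# shape as the eval expressions to text read from stdin:
#   1!d        deletes every line except the first
#   s/LABEL//  removes the program label, leaving only the version string
first_line_version() {
    local label="$1"
    sed -e '1!d' -e "s/${label}//"
}

# Canned stand-ins for `bgzip --version` and `xz --version`.
printf 'bgzip (htslib) 1.23.1\n(further banner lines)\n' | first_line_version 'bgzip (htslib) '
printf 'xz (XZ Utils) 5.8.3\n(further banner lines)\n'   | first_line_version 'xz (XZ Utils) '
```

Parentheses are literal characters in sed's basic regular expressions, so the labels need no escaping.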