tabix/tabix: update module to support region-based VCF extraction#11064

Open
luisas wants to merge 11 commits into nf-core:master from luisas:tabix-tabix-update

Conversation

@luisas luisas commented Mar 27, 2026

Add support for region-based VCF extraction

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs?
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

luisas added 8 commits March 27, 2026 09:47
- Reverts tabix/tabix to its original indexing behaviour
- New tabix/extract module: takes a bgzipped VCF + tbi + optional regions file
- Outputs bgzipped VCF (always) and tbi index (optional, controlled by create_index input)
- Follows samtools/view pattern for optional outputs
Always runs extraction + bgzip + tabix index. Both vcf and tbi outputs
are optional: true — callers use whichever channels they need.
- Drop tabix/extract — functionality merged into tabix/tabix
- New input: optional regions file (pass [] to use indexing mode)
- New input: index path (required for extraction mode)
- index output: optional, emitted in indexing mode
- vcf output: optional, emitted in extraction mode
- tbi output: optional, emitted in extraction mode
@luisas luisas requested a review from maxulysse as a code owner March 27, 2026 13:53

@famosab famosab left a comment

Ok I started this review but then I realized that this is in my eyes a different module. Maybe @nf-core/maintainers have the same opinion but I would rather have this separate because from tabix/tabix I just expect indexing and I would otherwise call it tabix/extract or something similar.

Edit: I just saw that that was your original idea, and now I am a little confused about what led to the change :D

Comment on lines +11 to +12
tuple val(meta), path(tab)
tuple val(meta2), path(regions)

Please make this into one input tuple :)

Suggested change
- tuple val(meta), path(tab)
- tuple val(meta2), path(regions)
+ tuple val(meta), path(tab), path(regions)

Contributor Author

They are actually 2 different files that I believe conceptually work better in 2 tuples: the first one is the input file (with its own meta) and the second one is the subsetting instructions file (also should probably have its own meta) - do you think I could leave it separate?

Contributor

One might want to do sample specific subsetting, which is much easier with 1 tuple.

It's easier to join inputs than it is to pass two inputs together in sync 😉

This is one of the reasons we are moving to a flatter input structure.
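
As a minimal sketch of the "join first, then pass one tuple" point above: assuming both channels carry the same meta map as their first element, the caller can build the single `[meta, tab, regions]` input with standard channel operators. The channel names and file paths below are hypothetical and are not taken from this PR.

```nextflow
// Sketch only: channel names and file paths are hypothetical.
workflow {
    // [meta, tab]: one bgzipped VCF per sample
    ch_tab = Channel.of(
        [ [id: 'sample1'], file('sample1.vcf.gz') ],
        [ [id: 'sample2'], file('sample2.vcf.gz') ]
    )

    // [meta, regions]: sample-specific regions files, emitted in a different order
    ch_regions = Channel.of(
        [ [id: 'sample2'], file('sample2.regions.bed') ],
        [ [id: 'sample1'], file('sample1.regions.bed') ]
    )

    // join matches on the first element (meta) by default, so emission order
    // does not matter; each item becomes [meta, tab, regions]
    ch_per_sample = ch_tab.join(ch_regions)

    // For one regions file shared by every sample, combine appends it to
    // each [meta, tab] item, again giving [meta, tab, regions]
    ch_shared = ch_tab.combine(Channel.of(file('shared.regions.bed')))

    ch_per_sample.view()
    ch_shared.view()
}
```

With two separate input tuples, the caller would instead have to make sure both channels emit their items in matching order.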

  output:
- tuple val(meta), path("*.{tbi,csi}"), emit: index
+ tuple val(meta), path("*.{tbi,csi}"), emit: index, optional: true
+ tuple val(meta), path("${prefix}.vcf"), emit: vcf, optional: true

Can we emit this bgzipped please? :)

Suggested change
- tuple val(meta), path("${prefix}.vcf"), emit: vcf, optional: true
+ tuple val(meta), path("${prefix}.vcf.gz"), emit: vcf, optional: true

luisas commented Mar 27, 2026

Hi @famosab, thanks for starting the review!

I started doing it like tabix_extract, but then I had a chat with @FriederikeHanssen about how to handle modules that have different optional outputs depending on the input flags, and because of the maintenance burden we thought it would be better to have it in one module? Similar to samtools/view.

I am torn tbh, I also like the tabix/extract version.

3 participants