tabix/tabix: update module to support region-based VCF extraction#11064
tabix/tabix: update module to support region-based VCF extraction#11064luisas wants to merge 11 commits intonf-core:masterfrom
Conversation
- Reverts tabix/tabix to its original indexing behaviour - New tabix/extract module: takes a bgzipped VCF + tbi + optional regions file - Outputs bgzipped VCF (always) and tbi index (optional, controlled by create_index input) - Follows samtools/view pattern for optional outputs
Always runs extraction + bgzip + tabix index. Both vcf and tbi outputs are optional: true — callers use whichever channels they need.
- Drop tabix/extract — functionality merged into tabix/tabix - New input: optional regions file (pass [] to use indexing mode) - New input: index path (required for extraction mode) - index output: optional, emitted in indexing mode - vcf output: optional, emitted in extraction mode - tbi output: optional, emitted in extraction mode
There was a problem hiding this comment.
Ok I started this review but then I realized that this is in my eyes a different module. Maybe @nf-core/maintainers have the same opinion but I would rather have this separate because from tabix/tabix I just expect indexing and I would otherwise call it tabix/extract or something similar.
Edit: I just saw that that was you original idea and now I am a little confused what led to the change :D
modules/nf-core/tabix/tabix/main.nf
Outdated
| tuple val(meta), path(tab) | ||
| tuple val(meta2), path(regions) |
There was a problem hiding this comment.
Please make this into one input tuple :)
| tuple val(meta), path(tab) | |
| tuple val(meta2), path(regions) | |
| tuple val(meta), path(tab), path(regions) |
There was a problem hiding this comment.
They are actually 2 different files that I believe conceptually work better in 2 tuples: the first one is the input file (with its own meta) and the second one is the subsetting instructions file (also should probably have its own meta) - do you think I could leave it separate?
There was a problem hiding this comment.
One might want to do sample specific subsetting, which is much easier with 1 tuple.
It's easier to join inputs then it is to pass two inputs together in sync 😉
This is one of the reasons we are moving to a flatter input structure.
modules/nf-core/tabix/tabix/main.nf
Outdated
| output: | ||
| tuple val(meta), path("*.{tbi,csi}"), emit: index | ||
| tuple val(meta), path("*.{tbi,csi}"), emit: index, optional: true | ||
| tuple val(meta), path("${prefix}.vcf"), emit: vcf, optional: true |
There was a problem hiding this comment.
Can we emit this bgzipped please? :)
| tuple val(meta), path("${prefix}.vcf"), emit: vcf, optional: true | |
| tuple val(meta), path("${prefix}.vcf.gz"), emit: vcf, optional: true |
|
Hi @famosab, thanks for starting the review! I started doing it like tabix_extract but then I had a chat with @FriederikeHanssen on how to handle these modules that have different optional outputs depending on the input flags and because of maintainance burdens we thought it would be better to have it in one module? Similar to samtools/view I am torn tbh, I am also liking the tabix/extract version. |
Add support for region-based VCF extraction
PR checklist
topic: versions- See version_topicslabelnf-core modules test <MODULE> --profile dockernf-core modules test <MODULE> --profile singularitynf-core modules test <MODULE> --profile condanf-core subworkflows test <SUBWORKFLOW> --profile dockernf-core subworkflows test <SUBWORKFLOW> --profile singularitynf-core subworkflows test <SUBWORKFLOW> --profile conda