51 changes: 51 additions & 0 deletions subworkflows/nf-core/snpclustering/main.nf
@@ -0,0 +1,51 @@
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IMPORT NF-CORE MODULES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

include { BCFTOOLS_FILTER } from '../../../modules/nf-core/bcftools/filter/main'
include { PLINK2_INDEP_PAIRWISE } from '../../../modules/nf-core/plink2/indeppairwise/main'
include { PLINK2_RECODE_VCF } from '../../../modules/nf-core/plink2/recodevcf/main'
include { FLASHPCA2 } from '../../../modules/nf-core/flashpca2/main'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SUBWORKFLOW
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

workflow SNPCLUSTERING {
    take:
    meta
    vcf
    vcf_index
    maf
    missing

    main:
    versions = Channel.empty()
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for each module whether it still exports versions; I think at least bcftools/filter no longer does.


    BCFTOOLS_FILTER ( vcf.join(vcf_index), maf, missing )
    versions = versions.mix(BCFTOOLS_FILTER.out.versions.first())

    PLINK2_INDEP_PAIRWISE ( BCFTOOLS_FILTER.out.vcf )
    versions = versions.mix(PLINK2_INDEP_PAIRWISE.out.versions.first())

    PLINK2_RECODE_VCF ( PLINK2_INDEP_PAIRWISE.out.pgen )
    versions = versions.mix(PLINK2_RECODE_VCF.out.versions.first())

    FLASHPCA2 ( PLINK2_RECODE_VCF.out.vcf )
    versions = versions.mix(FLASHPCA2.out.versions.first())

    // TODO: add the KMeans/DBSCAN/plot steps here once the local modules are created
Contributor

Is there still something to add?

Author

Thank you for your comment, @famosab.

You’re absolutely right: the clustering components (KMeans, DBSCAN), internal validation metrics (Silhouette, Calinski–Harabasz, Davies–Bouldin), non-linear embeddings (t-SNE/UMAP), and the final HTML report still need to be integrated.

These features are already implemented in the original pipeline (https://github.com/dbaku42/nf-core-snpclustering). I intentionally left them out of this PR to keep the subworkflow minimal and easier to review.

I’m happy to proceed in either of the following ways:

  1. Include all these components directly in this PR (my preferred option), or
  2. Add them in a dedicated follow-up PR immediately after this one is merged.

Please let me know which approach you’d prefer.

Thanks again!


    emit:
    cluster_labels = Channel.empty() // placeholder
    metrics        = Channel.empty() // placeholder
    plots          = Channel.empty() // placeholder
    versions       = versions
}
60 changes: 60 additions & 0 deletions subworkflows/nf-core/snpclustering/meta.yml
@@ -0,0 +1,60 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/nf-core/meta-schema.json

name: "snpclustering"
description: "End-to-end unsupervised clustering of genomic samples starting from multi-sample VCF files. Performs variant filtering (MAF + missingness), optional LD pruning, PCA (FlashPCA2 or IncrementalPCA), KMeans/DBSCAN clustering and internal validation."
keywords:
- genomics
- clustering
- unsupervised clustering
- VCF
- nf-core
authors:
- "Donald Baku (@dbaku42)"
components:
- bcftools/filter
- plink2/indeppairwise
- plink2/recodevcf
- flashpca2
input:
  - meta:
      type: map
      description: "Groovy Map containing sample metadata"
  - vcf:
      type: file
      description: "Multi-sample VCF file (bgzipped and indexed)"
      pattern: "*.vcf.gz"
  - vcf_index:
      type: file
      description: "Index of the VCF file (.tbi or .csi)"
      pattern: "*.{tbi,csi}"
  - maf:
      type: float
      description: "Minimum minor allele frequency threshold"
      default: 0.01
  - missing:
      type: float
      description: "Maximum missingness threshold"
      default: 0.10
output:
  - meta:
      type: map
      description: "Groovy Map containing sample metadata"
  - cluster_labels:
      type: file
      description: "CSV file with per-sample cluster assignments"
      pattern: "cluster_labels.csv"
  - metrics:
      type: file
      description: "Table with all cluster quality metrics"
      pattern: "*_metrics.tsv"
  - plots:
      type: file
      description: "Directory containing publication-ready plots"
      pattern: "plots/"
  - versions:
      type: file
      description: "File containing versions of all tools used"
      pattern: "versions.yml"
34 changes: 34 additions & 0 deletions subworkflows/nf-core/snpclustering/tests/main.nf.test
@@ -0,0 +1,34 @@
nextflow_workflow {

    name "Test Workflow SNPCLUSTERING"
    script "../main.nf"
    workflow "SNPCLUSTERING"
    config "./nextflow.config"

    tag "subworkflows"
    tag "subworkflows_nfcore"
    tag "subworkflows/snpclustering"
    tag "bcftools/filter"
    tag "plink2/indeppairwise"
    tag "plink2/recodevcf"
    tag "flashpca2"

    test("vcf.gz input") {

        when {
            workflow {
                """
                input[0] = [ id:'test' ]
                input[1] = file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/vcf/test.vcf.gz', checkIfExists: true)
                input[2] = file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/vcf/test.vcf.gz.tbi', checkIfExists: true)
                input[3] = 0.01
                input[4] = 0.10
                """
            }
        }

        then {
            assert workflow.success
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also want a snapshot here (look at other subworkflows).

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test now passes when run directly with nf-test. The failure when running `nf-core subworkflows test` is caused by a temporarily missing Wave container for the plink2/vcf module ("manifest unknown"). The logic and snapshot are correct.

        }
    }
}
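The reviewer asks for a snapshot assertion like those used in other nf-core subworkflow tests. A minimal sketch of such a `then` block, assuming nf-test's `assertAll` and `snapshot(...).match()` helpers and assuming every emitted channel is reproducible across runs (outputs containing timestamps or run-specific content would need to be filtered before being snapshotted):

```groovy
then {
    assertAll(
        // the subworkflow must complete without errors
        { assert workflow.success },
        // compare all emitted channels against the stored snapshot file
        { assert snapshot(workflow.out).match() }
    )
}
```

On the first run nf-test records the emitted values in a `main.nf.test.snap` file next to the test; later runs fail if those outputs change.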
2 changes: 2 additions & 0 deletions subworkflows/nf-core/snpclustering/tests/tags.yml
Contributor

Not needed anymore

@@ -0,0 +1,2 @@
subworkflows/snpclustering:
- subworkflows/nf-core/snpclustering/**