Variant Medium Nextflow Pipeline #3

Merged
ozlemmuslu merged 25 commits into TRON-Bioinformatics:main from khersameesh24:quick-nextflow-pipeline
Jan 8, 2026
Conversation

Member

khersameesh24 commented Dec 14, 2025

  • Nextflow modules for variant filtering and variant calling (SNV/indel)
  • Singularity support for the variantmedium modules
  • Samplesheet input (CSV/TSV)
  • src/ moved to bin/ to fix relative imports
  • Some nf-core standards implemented; these will need further refinement later
  • variantmedium.sh launcher script for the pipeline; run bash variantmedium.sh --help for usage information (README updated accordingly)

Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive Nextflow pipeline for the VariantMedium somatic variant caller, enabling automated execution of SNV and INDEL calling workflows with support for both conda and singularity environments.

Key changes:

  • Nextflow DSL2 modules and workflows for variant filtering and calling
  • Bash launcher script (variantmedium.sh) for simplified pipeline execution
  • Code reorganization from src/ to bin/ to fix relative imports
  • Samplesheet-based input handling (CSV/TSV formats)
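
To make the samplesheet item concrete, here is a minimal Python sketch of reading a CSV or TSV samplesheet. The column names (`sample`, `tumor_bam`, `normal_bam`) and the function `read_samplesheet` are assumptions for illustration; the PR summary does not state the actual schema or how bin/prepare_input_files.py parses it.

```python
import csv
import io

# Hypothetical column names; the PR does not list the real samplesheet schema.
REQUIRED_COLUMNS = ("sample", "tumor_bam", "normal_bam")


def read_samplesheet(text, filename):
    """Parse CSV or TSV samplesheet text into a list of row dicts."""
    # Pick the delimiter from the file extension: .tsv -> tab, otherwise comma.
    delimiter = "\t" if filename.lower().endswith(".tsv") else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        empty = [c for c in REQUIRED_COLUMNS if not (row[c] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {empty}")
        rows.append({c: row[c].strip() for c in REQUIRED_COLUMNS})
    return rows


# The same content in both formats should parse to identical rows.
csv_text = "sample,tumor_bam,normal_bam\nS1,S1_T.bam,S1_N.bam\n"
tsv_text = csv_text.replace(",", "\t")
rows_csv = read_samplesheet(csv_text, "samples.csv")
rows_tsv = read_samplesheet(tsv_text, "samples.tsv")
```

Failing early on a missing column or an empty cell keeps malformed input from reaching the downstream filtering and calling steps.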

Reviewed changes

Copilot reviewed 33 out of 71 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| workflows/variantmedium_call_variants.nf | Implements SNV/INDEL variant calling workflow with DenseNet models |
| workflows/variantmedium_filter_candidates.nf | Orchestrates ExtraTrees-based candidate filtering for SNV/INDEL |
| workflows/variantmedium_stage_data.nf | Stages reference data and ML models |
| workflows/variantmedium_prepare_inputs.nf | Prepares input TSV files from samplesheet |
| variantmedium.sh | Bash launcher script orchestrating all 8 pipeline steps |
| main.nf | Entry point coordinating workflow execution based on execution_step parameter |
| nextflow.config | Configuration with conda/singularity profiles and parameter definitions |
| conf/modules.config | Per-module configuration including publish directories |
| conf/base.config | Process resource labels and defaults |
| subworkflows/data_staging/main.nf | Stages references and models through dedicated modules |
| subworkflows/parse_samplesheet/main.nf | Validates and parses input samplesheet |
| subworkflows/parameter_validation/main.nf | Validates required parameters |
| modules/variantmedium/filter/main.nf | ExtraTrees filtering module |
| modules/variantmedium/call/main.nf | DenseNet variant calling module |
| modules/prepare_inputs/main.nf | Generates TSV files for downstream tools |
| modules/stage_refs/main.nf | Downloads and stages reference data |
| modules/stage_models/main.nf | Downloads and verifies ML model weights |
| bin/prepare_input_files.py | Python script to generate pipeline input TSVs |
| bin/filter_candidates.py | Python script for candidate filtering |
| bin/run_variant_medium.py | Entry point for variant calling |
| bin/src/* | Refactored Python source code with fixed relative imports |
| README.md | Updated with new pipeline launcher documentation |
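
The summary says modules/stage_models/main.nf "downloads and verifies" model weights but does not say how. One common way to verify a downloaded file is to compare its SHA-256 digest against a known value; the sketch below assumes that approach. The functions `sha256_of` and `verify_file` and the file name are hypothetical, not taken from the PR.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_sha256):
    """Raise ValueError if the file's digest does not match the expected one."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True


# Demo with a throwaway file standing in for a downloaded weights file.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model_weights.bin"  # hypothetical file name
    weights.write_bytes(b"not real weights")
    expected = hashlib.sha256(b"not real weights").hexdigest()
    ok = verify_file(weights, expected)
    try:
        verify_file(weights, "0" * 64)  # deliberately wrong digest
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```

Reading in chunks keeps memory use flat even for large weight files.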
Comments suppressed due to low confidence (1)

README.md:80

  • There's a spelling error in "envirnments". It should be "environments".
reference genome and S07604624 SureSelect Human All Exon V6+UTR from UCSC if you need them.

Comment thread workflows/variantmedium_call_variants.nf Outdated
Comment thread conf/modules.config Outdated
Comment thread main.nf Outdated
Comment thread subworkflows/parse_samplesheet/main.nf Outdated
Comment thread subworkflows/data_staging/main.nf Outdated
Comment thread modules/stage_refs/main.nf Outdated
Comment thread modules/stage_models/main.nf Outdated
Comment thread modules/variantmedium/filter/main.nf Outdated
Comment thread modules/variantmedium/call/main.nf
Comment thread modules/prepare_inputs/main.nf
Comment thread README.md Outdated
VariantMedium is a deep learning-based somatic variant caller for matched tumor-normal short-read sequencing data. It integrates machine learning–based filtering and 3D convolutional neural networks to classify candidate sites as somatic, germline, or non-variant, with high sensitivity and robustness across diverse genomic contexts and sample types.

## Dependencies
## Dependencies (handleled in the module environments)
Member

what do you mean by this?

Member Author

The modules now each have an environment.yml file from which the environment is built on the fly, so we do not need any prior installations.

Member

We don't even need a Nextflow installation?

Comment thread README.md
- conda >= 4.4 (miniconda >=23.11.0 recommended)
- CUDA 11.4 (optional for GPU support)

## Installation
Member

The user would still need to clone the repository, right?

Member Author

No need to clone the repo either; the pipeline can be run as nextflow run TRON-Bioinformatics/VariantMedium [--options].

Comment thread README.md
--profile STRING Nextflow profile name (conda, singularity) [default: conda]
[Parts of the pipeline may not support singularity - Prefer using conda]
OPTIONAL ARGUMENTS:
--config PATH Path to custom config file (.conf)
Member

The parameters in the config file are essential for running the pipeline, so this is a required argument.

Comment thread README.md
Member

We need a very simple, minimal usage example for basic users who come with their BAM files and want VCFs with variant calls. Only the minimum required steps should be covered, explained as simply as possible.

All further details can come in a later section of the README or in separate documentation.

Comment thread modules/variantmedium/call/main.nf Outdated

stub:
"""
touch fake.somatic_snv.VariantMedium.tsv
Member

what are these files for?

Comment thread nextflow.config Outdated
reference_dir = 'ref_data' // directory name for reference data
models_dir = 'models' // directory name for trained models

// call
Member

Actually, some of the parameters below (learning rate, drop rate, possibly epoch) are not updated during the inference stage and are only there for data ops. The others (aug_rate and aug_mixes) might really impact the results and should not be changed (especially aug_rate; aug_mixes is also mostly there for tracking).

Could you add a note that they are not to be changed?
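
Beyond a note in the config, one option is to reject overrides of these parameters at run time. The Python sketch below is a hypothetical guard: the parameter names `aug_rate` and `aug_mixes` come from the comment above, while `apply_overrides` and all values shown are placeholders, not the pipeline's real code or defaults.

```python
# Parameter names come from the review comment; the guard itself is hypothetical.
LOCKED_PARAMS = ("aug_rate", "aug_mixes")


def apply_overrides(defaults, overrides):
    """Merge user overrides into defaults, refusing changes to locked parameters."""
    changed = [
        key for key in LOCKED_PARAMS
        if key in overrides and overrides[key] != defaults.get(key)
    ]
    if changed:
        raise ValueError(f"parameters must not be changed for inference: {changed}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Placeholder values for illustration only; not the pipeline's real defaults.
defaults = {"aug_rate": 0.0, "aug_mixes": 0, "outdir": "results"}
merged = apply_overrides(defaults, {"outdir": "run1"})
```

Whether a hard failure or a warning is more appropriate here is a design decision for the maintainers.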

Comment thread nextflow.config Outdated

manifest {
name = 'TRON-Bioinformatics/variantmedium'
author = 'Ozlem Muslu, Jonas Ibn-Salem, Shaya Akbarinejad, Luis Kress'
Member

Please add your name as well :)

Comment thread nextflow.config Outdated
nextflowVersion = '>=24.10.3'
version = VERSION
doi = DOI
version = '1.1.0'
Member

Let's update the version to 1.1.1

Comment thread variantmedium.sh Outdated
#---------------------------------------

TSV_FOLDER="${OUTDIR}/tsv_folder"
REF_DIR="${OUTDIR}/data_staging/ref_data"
Member

variables on lines 171-174 should be pulled from the config file

Member

This behaviour (setting a default and updating it from the config later) is risky: users might think they don't need to provide these files even though their BAMs are aligned to a different version of the reference genome.
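
The concern above can be illustrated with a small Python sketch that refuses to fall back to a default and requires the reference directory to be set explicitly. The key `reference_dir` appears in nextflow.config; the function `resolve_reference_dir` and the dictionary-based config are assumptions for illustration, since the launcher itself is a Bash script.

```python
import tempfile
from pathlib import Path


def resolve_reference_dir(config):
    """Return the reference directory from config, with no silent default."""
    value = config.get("reference_dir")
    if not value:
        # Fail loudly so users cannot run against a reference they did not choose.
        raise ValueError("reference_dir must be set explicitly in the config")
    path = Path(value)
    if not path.is_dir():
        raise FileNotFoundError(f"reference_dir does not exist: {path}")
    return path


# An existing directory is accepted as-is.
with tempfile.TemporaryDirectory() as tmp:
    resolved_ok = resolve_reference_dir({"reference_dir": tmp}) == Path(tmp)

# A config without the key is rejected instead of defaulting.
try:
    resolve_reference_dir({})
    missing_rejected = False
except ValueError:
    missing_rejected = True
```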

touch fake.all_scores_somatic_indel.tsv
touch fake.all_scores_germline_indel.tsv
touch sample.somatic_snv.VariantMedium.tsv
touch sample.germline_snv.VariantMedium.tsv
Member

Not sure why we have the germline files here. VariantMedium is not designed for germline variant calling, and I have it only for experimental reasons.

@ozlemmuslu ozlemmuslu merged commit 05685c9 into TRON-Bioinformatics:main Jan 8, 2026