Merged
2 changes: 1 addition & 1 deletion conf/dev.config
@@ -24,6 +24,6 @@ process {

    // ONSITE uses its own container (not OpenMS thirdparty)
    withName: '.*:PHOSPHO_SCORING:ONSITE' {
-        container = {"${ ( workflow.containerEngine == 'singularity' || workflow.containerEngine == 'apptainer' ) && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/pyonsite:0.0.1--pyhdfd78af_0' : 'quay.io/biocontainers/pyonsite:0.0.1--pyhdfd78af_0' }"}
+        container = {"${ ( workflow.containerEngine == 'singularity' || workflow.containerEngine == 'apptainer' ) && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/pyonsite:0.0.2--pyhdfd78af_0' : 'quay.io/biocontainers/pyonsite:0.0.2--pyhdfd78af_0' }"}
    }
}
5 changes: 5 additions & 0 deletions modules.json
@@ -40,6 +40,11 @@
  "https://github.com/bigbio/nf-modules.git": {
  "modules": {
  "bigbio": {
+ "onsite": {
+ "branch": "main",
+ "git_sha": "e4f16df2acb2c084a5043840212d0fb52f69fbbc",
+ "installed_by": ["modules"]
+ },
  "thermorawfileparser": {
  "branch": "main",
  "git_sha": "a1a4a11ff508b2b5c23c9fb21c51c3327b748d4d",
7 changes: 7 additions & 0 deletions modules/bigbio/onsite/environment.yml
@@ -0,0 +1,7 @@
name: onsite
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bioconda::pyonsite=0.0.2
126 changes: 126 additions & 0 deletions modules/bigbio/onsite/main.nf
@@ -0,0 +1,126 @@
process ONSITE {
tag "$meta.mzml_id"
label 'process_medium'
label 'onsite'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/pyonsite:0.0.2--pyhdfd78af_0' :
'quay.io/biocontainers/pyonsite:0.0.2--pyhdfd78af_0' }"

input:
tuple val(meta), path(mzml_file), path(id_file)

output:
tuple val(meta), path("${id_file.baseName}_*.idXML"), emit: ptm_in_id_onsite

Copilot AI Jan 4, 2026

The output pattern uses a wildcard, "${id_file.baseName}_*.idXML", which is less specific than the previous implementation's "${id_file.baseName}onsite.idXML". It could match unintended files if anything else writes a similarly named file into the task directory. Consider making the pattern specific to the selected algorithm to avoid the ambiguity.

Suggested change
- tuple val(meta), path("${id_file.baseName}_*.idXML"), emit: ptm_in_id_onsite
+ tuple val(meta), path("${id_file.baseName}_${params.onsite_algorithm ?: 'lucxor'}.idXML"), emit: ptm_in_id_onsite
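
To illustrate the difference between the two patterns, here is a minimal plain-Groovy sketch that uses Java's glob matcher. The file names (`sample_lucxor.idXML`, `sample_perc.idXML`) and the `globMatches` helper are made up for illustration, and the matcher only approximates how Nextflow resolves output globs.

```groovy
import java.nio.file.FileSystems
import java.nio.file.Paths

// Hypothetical helper: true when `name` matches `pattern` (java.nio glob syntax).
def globMatches = { String pattern, String name ->
    FileSystems.getDefault().getPathMatcher('glob:' + pattern).matches(Paths.get(name))
}

def baseName  = 'sample'                                  // stands in for id_file.baseName
def algorithm = 'lucxor'                                  // the module's default algorithm
def wildcard  = baseName + '_*.idXML'                     // pattern currently in the PR
def specific  = baseName + '_' + algorithm + '.idXML'     // pattern from the suggestion

// Both patterns pick up the file the process is expected to write.
assert globMatches(wildcard, 'sample_lucxor.idXML')
assert globMatches(specific, 'sample_lucxor.idXML')

// Only the wildcard also picks up an unrelated file that shares the prefix.
assert globMatches(wildcard, 'sample_perc.idXML')
assert !globMatches(specific, 'sample_perc.idXML')
```

Whether such a stray file can actually appear depends on what else is staged into the task work directory, so this shows the risk rather than a confirmed failure.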

Copilot uses AI. Check for mistakes.
path "versions.yml", emit: versions
path "*.log", emit: log

script:
def args = task.ext.args ?: ''

Copilot AI Jan 4, 2026

The variable 'args' is read from task.ext.args but is never used when the command is built. Either remove it, or append it to each algorithm command so that extra options can be supplied through task.ext.args.
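
If the second option is chosen, the wiring could look roughly like the sketch below. It is plain Groovy so it runs on its own: the `task` map is a stand-in for the object Nextflow provides, and the file names are hypothetical.

```groovy
// Stand-in for Nextflow's `task` object; in a real process the runtime supplies it.
def task = [ext: [args: '--compute-all-scores']]

def args = task.ext.args ?: ''

// Hypothetical file names; in the module they come from the input tuple.
def command = [
    'onsite ascore',
    '-in sample.mzML',
    '-id sample.idXML',
    '-out sample_ascore.idXML',
    args                          // extra options supplied through task.ext.args
].findAll { it }.join(' ')        // drop empty parts so an unset args adds nothing

assert command.endsWith('--compute-all-scores')
```

In the module itself this would amount to adding `${args}` as one more line in each `algorithm_cmd` string.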

def prefix = task.ext.prefix ?: "${meta.mzml_id}"

Copilot AI Jan 4, 2026

The variable 'prefix' is defined but never used in the script. Consider removing this unused variable or utilizing it in the output file naming if that was the intent.

Suggested change
- def prefix = task.ext.prefix ?: "${meta.mzml_id}"

// Algorithm selection: lucxor (default), ascore, or phosphors
def algorithm = params.onsite_algorithm ?: 'lucxor'

// Common parameters for all algorithms
def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.05'

Copilot AI Jan 4, 2026

The default fragment_tolerance has changed from 0.5 in the old code to '0.05' (a string) in the new code. That is a tenfold tighter default and could change results. Please confirm the change is intentional: if it is, document why 0.05 is now preferred; if not, restore 0.5.

Suggested change
- def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.05'
+ def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.5'
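
For reference, this is how the Elvis (`?:`) default behaves in plain Groovy. The `resolveTolerance` closure is a hypothetical stand-in for the expression `params.onsite_fragment_tolerance ?: '0.05'`; it is a sketch of the language semantics, not code from the module.

```groovy
// Hypothetical stand-in for: params.onsite_fragment_tolerance ?: '0.05'
def resolveTolerance = { value -> value ?: '0.05' }

assert resolveTolerance(null) == '0.05'      // parameter unset: the new default applies
assert resolveTolerance(0.5) == 0.5          // explicit number wins (a BigDecimal here)
assert resolveTolerance('0.5') == '0.5'      // explicit string wins
assert resolveTolerance(0) == '0.05'         // 0 is falsy in Groovy, so the default replaces it

// Once interpolated into the command line, a number and a string render the same.
def flag = "--fragment-mass-tolerance ${resolveTolerance(0.5)}".toString()
assert flag == '--fragment-mass-tolerance 0.5'
```

So the string-versus-number difference is harmless after interpolation, while the value itself is what changes behaviour. Setting `params.onsite_fragment_tolerance = 0.5` in a config would override the new default for anyone who depends on the old one.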

def compute_all_scores = params.onsite_compute_all_scores ? '--compute-all-scores' : ''

// Set default value for add_decoys (can be overridden by setting params.onsite_add_decoys = false)
def onsite_add_decoys = params.containsKey('onsite_add_decoys') ? params.onsite_add_decoys : true

// Algorithm-specific parameters
def fragment_unit = ''
def add_decoys = onsite_add_decoys ? '--add-decoys' : ''
def debug = params.onsite_debug ? '--debug' : ''

// Build algorithm-specific command
def algorithm_cmd = ''

if (algorithm == 'ascore') {
// AScore: uses -in, -id, -out, --fragment-mass-unit
fragment_unit = params.onsite_fragment_unit ?: 'Da'
algorithm_cmd = """
onsite ascore \\
-in ${mzml_file} \\
-id ${id_file} \\
-out ${id_file.baseName}_ascore.idXML \\
--fragment-mass-tolerance ${fragment_tolerance} \\
--fragment-mass-unit ${fragment_unit} \\
${add_decoys} \\
${compute_all_scores} \\
${debug}
"""
} else if (algorithm == 'phosphors') {
// PhosphoRS: uses -in, -id, -out, --fragment-mass-unit
fragment_unit = params.onsite_fragment_unit ?: 'Da'
algorithm_cmd = """
onsite phosphors \\
-in ${mzml_file} \\
-id ${id_file} \\
-out ${id_file.baseName}_phosphors.idXML \\
--fragment-mass-tolerance ${fragment_tolerance} \\
--fragment-mass-unit ${fragment_unit} \\
${add_decoys} \\
${compute_all_scores} \\
${debug}
"""
} else if (algorithm == 'lucxor') {
// LucXor: uses -in, -id, -out, --fragment-error-units (note: error-units not mass-unit)
fragment_unit = params.onsite_fragment_error_units ?: 'Da'
def fragment_method = params.onsite_fragment_method ?: 'CID'
def min_mz = params.onsite_min_mz ?: '150.0'
def max_charge = params.onsite_max_charge_state ?: '5'
def max_peptide_len = params.onsite_max_peptide_length ?: '40'
def max_num_perm = params.onsite_max_num_perm ?: '16384'
def modeling_threshold = params.onsite_modeling_score_threshold ?: '0.95'
def scoring_threshold = params.onsite_scoring_threshold ?: '0.0'
def min_num_psms = params.onsite_min_num_psms_model ?: '5'
def rt_tolerance = params.onsite_rt_tolerance ?: '0.01'
def disable_split_by_charge = params.onsite_disable_split_by_charge ? '--disable-split-by-charge' : ''

// Optional target modifications - default for LucXor includes decoy
def target_mods = params.onsite_target_modifications ? "--target-modifications ${params.onsite_target_modifications}" : "--target-modifications 'Phospho(S),Phospho(T),Phospho(Y),PhosphoDecoy(A)'"
def neutral_losses = params.onsite_neutral_losses ? "--neutral-losses ${params.onsite_neutral_losses}" : "--neutral-losses 'sty -H3PO4 -97.97690'"
def decoy_mass = params.onsite_decoy_mass ? "--decoy-mass ${params.onsite_decoy_mass}" : "--decoy-mass 79.966331"
def decoy_losses = params.onsite_decoy_neutral_losses ? "--decoy-neutral-losses ${params.onsite_decoy_neutral_losses}" : "--decoy-neutral-losses 'X -H3PO4 -97.97690'"

algorithm_cmd = """
onsite lucxor \\
-in ${mzml_file} \\
-id ${id_file} \\
-out ${id_file.baseName}_lucxor.idXML \\
--fragment-method ${fragment_method} \\
--fragment-mass-tolerance ${fragment_tolerance} \\
--fragment-error-units ${fragment_unit} \\
--min-mz ${min_mz} \\
${target_mods} \\
${neutral_losses} \\
${decoy_mass} \\
${decoy_losses} \\
--max-charge-state ${max_charge} \\
--max-peptide-length ${max_peptide_len} \\
--max-num-perm ${max_num_perm} \\
--modeling-score-threshold ${modeling_threshold} \\
--scoring-threshold ${scoring_threshold} \\
--min-num-psms-model ${min_num_psms} \\
--rt-tolerance ${rt_tolerance} \\
${disable_split_by_charge} \\
${compute_all_scores} \\
${debug}
"""
} else {
error "Unknown onsite algorithm: ${algorithm}. Supported algorithms: ascore, phosphors, lucxor"
}

"""
${algorithm_cmd} \\
2>&1 | tee ${id_file.baseName}_${algorithm}.log
Comment on lines +116 to +118

Copilot AI Jan 4, 2026

The ONSITE process builds the onsite command line by interpolating several params.* values (for example params.onsite_target_modifications, params.onsite_neutral_losses, params.onsite_decoy_mass, params.onsite_decoy_neutral_losses, and the numeric thresholds) directly into a shell string (algorithm_cmd), which the script block then executes. If any of these values, or a derived value such as id_file.baseName, contains shell metacharacters (;, &, backticks, $(...), and so on), anyone who controls pipeline parameters or file names can inject additional commands and run code in the workflow environment. This matters most when the module runs as part of a multi-tenant service. To mitigate it, either validate every params.* value used in algorithm_cmd against its expected format (a numeric range or an allowlist of tokens) before use, or quote and escape the values when the command is built instead of interpolating them raw into the bash script.
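
The validation and quoting described above could be sketched in plain Groovy as follows. The three helpers (`requireOneOf`, `requireNumber`, `shellQuote`) are hypothetical names introduced for illustration; they are not part of the module or of Nextflow. The allowlist of algorithms is the one the process already reports in its error message.

```groovy
// Hypothetical helpers for illustration only.

// Accept only values from a fixed allowlist.
def requireOneOf = { String name, Object value, List<String> allowed ->
    def text = String.valueOf(value)
    if (!allowed.contains(text)) {
        throw new IllegalArgumentException('Invalid ' + name + ': ' + text)
    }
    text
}

// Accept only plain non-negative decimal numbers such as 5 or 0.05.
def requireNumber = { String name, Object value ->
    def text = String.valueOf(value)
    if (!(text ==~ /\d+(\.\d+)?/)) {
        throw new IllegalArgumentException('Invalid ' + name + ': ' + text)
    }
    text
}

// Wrap free-form text in single quotes for a POSIX shell, escaping embedded quotes.
def shellQuote = { Object value ->
    "'" + String.valueOf(value).replace("'", "'\\''") + "'"
}

def algorithm = requireOneOf('onsite_algorithm', 'lucxor', ['ascore', 'phosphors', 'lucxor'])
def tolerance = requireNumber('onsite_fragment_tolerance', '0.05')
def losses    = shellQuote('sty -H3PO4 -97.97690')
```

Numeric and enumerated parameters are best rejected outright when malformed, while free-form ones such as the neutral-loss specification need quoting because spaces are legitimate there.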

Suggested change
- """
- ${algorithm_cmd} \\
- 2>&1 | tee ${id_file.baseName}_${algorithm}.log
+ // Sanitize values used in shell redirection/filenames to prevent command injection
+ def safeAlgorithm = (algorithm =~ /[^A-Za-z0-9_-]/).replaceAll('_')
+ def safeBaseName = (id_file.baseName =~ /[^A-Za-z0-9._-]/).replaceAll('_')
+ def log_file = "${safeBaseName}_${safeAlgorithm}.log"
+ """
+ ${algorithm_cmd} \\
+ 2>&1 | tee '${log_file}'

cat <<-END_VERSIONS > versions.yml
"${task.process}":
onsite: \$(onsite --version 2>&1 | grep -oP 'version \\K[0-9.]+' || echo "unknown")
algorithm: ${algorithm}
END_VERSIONS
"""
}
56 changes: 56 additions & 0 deletions modules/bigbio/onsite/meta.yml
@@ -0,0 +1,56 @@
name: onsite
description: Post-translational modification (PTM) localization using onsite algorithms (AScore, PhosphoRS, LucXor)
keywords:
  - onsite
  - PTM
  - phosphorylation
  - AScore
  - PhosphoRS
  - LucXor
  - modification
  - localization
tools:
  - onsite:
      description: |
        Comprehensive Python package for mass spectrometry post-translational modification (PTM) localization.
        Provides algorithms for confident phosphorylation site localization and scoring, including
        implementations of AScore, PhosphoRS, and LucXor (LuciPHOr2).
      homepage: https://github.com/bigbio/onsite
      documentation: https://github.com/bigbio/onsite/blob/main/README.md
      tool_dev_url: https://github.com/bigbio/onsite
      doi: ""
      licence: ["MIT"]
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', mzml_id:'sample1' ]
  - mzml_file:
      type: file
      description: Input spectrum file in mzML format
      pattern: "*.mzML"
  - id_file:
      type: file
      description: Protein/peptide identifications file in idXML format
      pattern: "*.idXML"
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', mzml_id:'sample1' ]
  - ptm_in_id_onsite:
      type: file
      description: Protein/peptide identifications file with PTM localization scores
      pattern: "*_{ascore,phosphors,lucxor}.idXML"
  - log:
      type: file
      description: Log file from onsite execution
      pattern: "*.log"
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
Comment on lines +37 to +53

Copilot AI Jan 4, 2026

The meta.yml file has an incorrect structure. Lines 37-53 should be under an "output:" section, not under "input:". The current structure duplicates the meta input and incorrectly places output fields within the input section. The structure should be:

  • input: (lines 24-36)
  • output: (lines 37-53)

authors:
  - "@ypriverol"
  - "@weizhongchun"
112 changes: 112 additions & 0 deletions modules/bigbio/onsite/tests/main.nf.test
@@ -0,0 +1,112 @@
nextflow_process {

name "Test Process ONSITE"
script "../main.nf"
process "ONSITE"
tag "modules"
tag "modules_onsite"
tag "onsite"

test("Should run AScore algorithm") {

when {
process {
"""
input[0] = [
[ id: 'test', mzml_id: 'test_sample' ],
file(params.test_data['proteomics']['onsite']['mzml'], checkIfExists: true),
file(params.test_data['proteomics']['onsite']['idxml'], checkIfExists: true)
]
"""
}
params {
onsite_algorithm = 'ascore'
}
}

then {
assert process.success
assert snapshot(process.out.versions).match("versions_ascore")
assert process.out.ptm_in_id_onsite.size() == 1
assert process.out.log.size() == 1
// Check output file has correct naming
assert process.out.ptm_in_id_onsite[0][1].toString().endsWith('_ascore.idXML')
}
}

test("Should run PhosphoRS algorithm") {

when {
process {
"""
input[0] = [
[ id: 'test', mzml_id: 'test_sample' ],
file(params.test_data['proteomics']['onsite']['mzml'], checkIfExists: true),
file(params.test_data['proteomics']['onsite']['idxml'], checkIfExists: true)
]
"""
}
params {
onsite_algorithm = 'phosphors'
}
}

then {
assert process.success
assert snapshot(process.out.versions).match("versions_phosphors")
assert process.out.ptm_in_id_onsite.size() == 1
assert process.out.log.size() == 1
// Check output file has correct naming
assert process.out.ptm_in_id_onsite[0][1].toString().endsWith('_phosphors.idXML')
}
}

test("Should run LucXor algorithm") {

when {
process {
"""
input[0] = [
[ id: 'test', mzml_id: 'test_sample' ],
file(params.test_data['proteomics']['onsite']['mzml'], checkIfExists: true),
file(params.test_data['proteomics']['onsite']['idxml'], checkIfExists: true)
]
"""
}
params {
onsite_algorithm = 'lucxor'
}
}

then {
assert process.success
assert snapshot(process.out.versions).match("versions_lucxor")
assert process.out.ptm_in_id_onsite.size() == 1
assert process.out.log.size() == 1
// Check output file has correct naming
assert process.out.ptm_in_id_onsite[0][1].toString().endsWith('_lucxor.idXML')
}
}

test("Should run stub mode") {

options "-stub"

when {
process {
"""
input[0] = [
[ id: 'test', mzml_id: 'test_sample' ],
file(params.test_data['proteomics']['onsite']['mzml'], checkIfExists: true),
file(params.test_data['proteomics']['onsite']['idxml'], checkIfExists: true)
]
"""
}
}

then {
assert process.success
assert snapshot(process.out.versions).match("versions_stub")
}
}
Comment on lines +91 to +111

Copilot AI Jan 4, 2026

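The test file includes a stub mode test (lines 91-111), but main.nf does not define a stub section. Without one, Nextflow runs the regular script block even when -stub is set, so this test exercises the real tool rather than a stub and its versions snapshot is not independent of the installed onsite version. A stub block should be added to the process definition to support stub mode testing.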
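
A stub block along the following lines could sit inside `process ONSITE`, next to `script:`. It is a sketch, not the module's actual code: it assumes the stub should create the same file names as the real command, and it reuses the version line from the existing script block. As a process fragment it needs the Nextflow runtime and does not run on its own.

```groovy
    stub:
    // Same fallback as the script block, so stub outputs match the declared patterns.
    def algorithm = params.onsite_algorithm ?: 'lucxor'
    """
    touch ${id_file.baseName}_${algorithm}.idXML
    touch ${id_file.baseName}_${algorithm}.log

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        onsite: \$(onsite --version 2>&1 | grep -oP 'version \\K[0-9.]+' || echo "unknown")
        algorithm: ${algorithm}
    END_VERSIONS
    """
```

Both `touch` lines are needed because the process declares an idXML output and a `*.log` output, and a stub run still checks that every declared output exists.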

}
3 changes: 3 additions & 0 deletions modules/bigbio/onsite/tests/nextflow.config
@@ -0,0 +1,3 @@
process {
    publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }
}