Skip to content

Pass CWL ToolTimeLimit.timelimit to slurm job submission#5502

Merged
adamnovak merged 16 commits into
DataBiosphere:masterfrom
lonbar:issues/featurebranch
Jun 4, 2026
Merged

Pass CWL ToolTimeLimit.timelimit to slurm job submission#5502
adamnovak merged 16 commits into
DataBiosphere:masterfrom
lonbar:issues/featurebranch

Conversation

@lonbar

@lonbar lonbar commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Provides an initial implementation for the option to pass runtimes to slurm's resource allocation. Addresss #3037.

Changelog Entry

To be copied to the draft changelog by merger:

  • Runtimes for slurm jobs can now be set using --defaultWalltime
  • Time limits from CWL's ToolTimeLimit are used in slurm batch submissions

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passed tests, including the Gitlab tests, for the most recent commit in its branch.
  • Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@mr-c

mr-c commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

@adamnovak CI is running using https://github.com/DataBiosphere/toil/tree/issues/3037-wallclock-slurm (a.k.a contrib/admin/test-pr lonbar issues/featurebranch issues/3037-wallclock-slurm)

Comment thread src/toil/cwl/cwltoil.py Outdated
Comment thread src/toil/job.py Outdated
@adamnovak

Copy link
Copy Markdown
Member

Oh excellent, we have wanted this for a while.

@lonbar

lonbar commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

@mr-c I have been made aware of --maxJobDuration, which allows toil itself to cancel jobs after a specified time. I was wondering if it makes sense to use this input instead of adding --defaultWalltime.

My thinking is that it might make sense to keep them separate, as currently the logic is:

  1. If ToolTimeLimit.timelimit is set, use that value.
  2. If ToolTimeLimit.timelimit is not set but --defaultWalltime is used, use the value of --defaultWalltime.
  3. Otherwise, do not set a time limit for slurm jobs.

This means that --defaultWalltime provides a baseline wall time for steps that do not specify a one and if, say, a step in a CWL sets ToolTimeLimit.timelimit: 0, the resulting slurm submission will not contain a wall time argument even if --defaultWalltime is set. --maxJobDuration allows a user to put an upper bound to the runtime of such jobs. Happy to hear your thoughts.

@mr-c

mr-c commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

@mr-c I have been made aware of --maxJobDuration, which allows toil itself to cancel jobs after a specified time. I was wondering if it makes sense to use this input instead of adding --defaultWalltime.

My thinking is that it might make sense to keep them separate, as currently the logic is:

1. If `ToolTimeLimit.timelimit` is set, use that value.

2. If `ToolTimeLimit.timelimit` is not set but `--defaultWalltime` is used, use the value of `--defaultWalltime`.

3. Otherwise, do not set a time limit for slurm jobs.

This means that --defaultWalltime provides a baseline wall time for steps that do not specify a one and if, say, a step in a CWL sets ToolTimeLimit.timelimit: 0, the resulting slurm submission will not contain a wall time argument even if --defaultWalltime is set. --maxJobDuration allows a user to put an upper bound to the runtime of such jobs. Happy to hear your thoughts.

I think it makes sense to keep --defaultWalltime and still respect --maxJobDuration if set, yes.

@lonbar lonbar marked this pull request as ready for review May 12, 2026 11:21
@adamnovak adamnovak requested a review from mr-c May 12, 2026 20:44

@adamnovak adamnovak left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good, but I think some of the variable names ought to change and I think supporting --defaultWalltime=0 would be good.

Comment thread src/toil/lib/conversions.py Outdated
Comment thread src/toil/options/common.py Outdated
)
cpu_note = "Fractions of a core (for example 0.1) are supported on some batch systems [mesos, single_machine]"
disk_mem_note = "Standard suffixes like K, Ki, M, Mi, G or Gi are supported"
disk_walltime_note = "Values are assumed to be in seconds. A value of 0 does not limit the walltime"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing disk_ about this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! Updated in 7d734b4.

Comment thread src/toil/options/common.py Outdated
Comment thread src/toil/job.py Outdated
Comment thread src/toil/batchSystems/slurm.py
@lonbar lonbar requested a review from adamnovak May 21, 2026 13:58
@mr-c mr-c added the cwl label Jun 1, 2026

@adamnovak adamnovak left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use more testing to ensure that the limit is really applied, and I think the WDL interpreter will want to plug into this eventually, and it would be good if --defaultWalltime could take formatted durations and not just a number of seconds, but all that can be improved later.

@adamnovak

Copy link
Copy Markdown
Member

I pulled this in with contrib/admin/test-pr lonbar issues/featurebranch issues/3037-wallclock-slurm for testing.

@adamnovak adamnovak merged commit d60175f into DataBiosphere:master Jun 4, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement wall clock time resource requests in Toil and expose to CWL and WDL jobs

3 participants