
DSL2: #1167

@ilight1542

Description


Is your feature request related to a problem? Please describe

I'm running into some issues getting the test_full run snapshot done. The main hang-up is that dsl2-eager runs BAM_SPLIT_BY_REGION:SAMTOOLS_VIEW and its associated jobs contig by contig, so for any lower-quality reference genome (i.e. one with many contigs) the job count quickly climbs into the thousands.

Because each of these jobs has a very low computational load, the scheduling overhead is significant and causes immense slowdowns on shared clusters. At times this can get a user restricted from spawning new jobs, or get the parent process killed or pushed past its runtime limit. Many users will be running dsl2-eager on clusters with prioritization queues, where the whole eager run balloons in runtime when jobs that take under 10 seconds to execute sit in the queue (e.g. in SLURM) for 10+ minutes.
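To put rough numbers on the queueing overhead described above, here is a back-of-envelope Python sketch. It assumes every job waits the same fixed time in the queue and runs for the same fixed time, which real schedulers do not guarantee; the function name `queue_share` is illustrative only. The inputs are the figures from this issue (roughly 1,000 jobs, about 10 s of work each, about 10 min queued each).

```python
def queue_share(n_jobs: int, runtime_s: float, queue_wait_s: float) -> tuple[float, float, float]:
    """Back-of-envelope estimate of queueing vs compute time.

    Assumes every job waits the same time in the queue and runs for the same
    time; real schedulers (e.g. SLURM with priority queues) vary per job.
    Totals are summed over jobs (job-seconds), not wall-clock time.
    """
    queued = n_jobs * queue_wait_s
    compute = n_jobs * runtime_s
    return queued, compute, queued / (queued + compute)


# Figures from the issue: ~1,000 jobs, ~10 s of work each, ~10 min queued each.
queued, compute, share = queue_share(1000, 10, 600)
print(f"queued: {queued} job-s, compute: {compute} job-s, queued share: {share:.1%}")
# queued: 600000 job-s, compute: 10000 job-s, queued share: 98.4%
```

With these inputs, about 98% of the summed job time is spent waiting rather than computing. The share depends only on the ratio of queue wait to runtime, while the absolute queued time grows linearly with the number of contigs.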

Describe the solution you'd like

Include an optional parameter for processing the deduplication steps without contig-by-contig splitting (as was implemented in DSL1 eager).
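A minimal Python sketch of what the requested option would change, assuming a hypothetical boolean switch `split_by_region` (not an existing nf-core/eager parameter) and a hypothetical `plan_dedup_jobs` helper: with splitting on, one job is planned per contig; with it off, a single job covers the whole BAM, as in DSL1 eager.

```python
from dataclasses import dataclass, field


@dataclass
class DedupJob:
    """One hypothetical deduplication job for a sample's BAM."""
    sample: str
    # Contigs this job is restricted to; an empty list means the whole BAM.
    contigs: list[str] = field(default_factory=list)


def plan_dedup_jobs(sample: str, contigs: list[str], split_by_region: bool = True) -> list[DedupJob]:
    """Plan dedup jobs for one sample (hypothetical helper, for illustration)."""
    if split_by_region:
        # Current DSL2 behaviour described in the issue: one job per contig.
        return [DedupJob(sample, [contig]) for contig in contigs]
    # Requested optional behaviour: a single job over the unsplit BAM.
    return [DedupJob(sample)]


contigs = ["contig_1", "contig_2", "contig_3"]
print(len(plan_dedup_jobs("sample_A", contigs)))                         # 3
print(len(plan_dedup_jobs("sample_A", contigs, split_by_region=False)))  # 1
```

For a reference with three contigs this yields three jobs versus one; for a fragmented assembly with thousands of contigs, the difference is thousands of scheduler submissions versus one per sample.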

Metadata

Assignees: none
Labels: enhancement (New feature or request)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
