Is your feature request related to a problem? Please describe
I'm running into some issues getting the test_full run snapshot done. The main hangup is that dsl2-eager runs each BAM_SPLIT_BY_REGION:SAMTOOLS_VIEW and its associated jobs on a contig-by-contig basis. For any lower-quality reference genome with many contigs, this quickly approaches thousands of jobs. The scheduling overhead of these computationally very light jobs is therefore significant and causes immense slowdowns on shared clusters. At times this can get a user restricted from spawning new jobs, or get the parent process killed or pushed past its runtime limit. Many users will be running dsl2-eager on clusters with prioritization queues, where the runtime of the whole eager run balloons when jobs that take <10 seconds to execute sit in the queue (e.g. in Slurm) for 10+ minutes.
Describe the solution you'd like
Include an optional parameter for processing the deduplication steps without contig-by-contig splitting (as was implemented in DSL1 eager).