Bug report
When staging files, it can be helpful to set process resources based on the size of the input files. However, I've found this difficult to do when the input is a list of files.
Assume three files, each containing its own base name (file name: contents):
bar.fastq: bar
baz.fastq: baz
foo.fastq: foo
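To make the setup easier to reproduce, here is a minimal sketch that creates these files in the current directory. The trailing newline in each file is an assumption on my part; the exact byte contents should not matter for the behavior described here.

```shell
# Create the three fixture files in the current directory.
# Assumption: each file holds its base name plus a trailing newline.
for name in bar baz foo; do
    printf '%s\n' "$name" > "$name.fastq"
done

# Show the byte size of each file (4 bytes each under the assumption above).
wc -c bar.fastq baz.fastq foo.fastq
```

Run this in the same directory as main.nf so that the glob patterns in the workflows match.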
Consider this trivial workflow:
workflow {
    manyChan = Channel.fromPath("*.fastq")
    singleChan = Channel.fromPath("foo.fastq")

    manyChan \
        | collect \
        | view { it -> "many: ${it}" }

    singleChan \
        | collect \
        | view { it -> "single: ${it}" }
}
It produces:
bash-4.2# nextflow run main.nf
N E X T F L O W ~ version 23.04.1
Launching `main.nf` [awesome_kare] DSL2 - revision: 5ceaaa278c
single: [/work/foo.fastq]
many: [/work/foo.fastq, /work/baz.fastq, /work/bar.fastq]
As expected, both channels emit a list of files, even when there is only one file.
Consider this less trivial workflow:
process sizeFastqs {
    input:
    path(fastqs, stageAs: '*.fastq')

    output:
    path 'summed.txt'

    shell:
    // Sum the sizes of all staged input files
    summed = fastqs*.size().sum()
    """
    printf '!{summed}' > summed.txt
    """
}

workflow {
    // chan = Channel.fromPath("*.fastq") // works
    // chan = Channel.fromPath("b*.fastq") // works
    chan = Channel.fromPath("foo.fastq") // doesn't work
    sizeFastqs(chan.collect())
}
Expected behavior and actual behavior
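I expected summed.txt to contain the size of foo.fastq. Instead, with the single-file channel, the run fails: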
bash-4.2# nextflow run main.nf
N E X T F L O W ~ version 23.04.1
Launching `main.nf` [pedantic_bardeen] DSL2 - revision: e78dae9554
[- ] process > sizeFastqs -
ERROR ~ Error executing process > 'sizeFastqs'
Caused by:
No such file or directory: .fastq
Note that the two commented-out channel definitions ("*.fastq" and "b*.fastq") work here. So it seems that with path(...), [file] is treated differently than [file, file]. I could understand file being treated differently than [file], but I'd expect [file] and [file, file] to behave the same way.
Is this a bug? If not, what is a reliable way to sum the input sizes in a | collect | myProc pipeline?
Steps to reproduce the problem
All details are above.
Program output
Caused by:
No such file or directory: .fastq
Source block:
summed = fastqs*.size().sum()
"""
printf '!{summed}' > summed.txt
"""
Environment
bash-4.2# nextflow info
Version: 23.04.1 build 5866
Created: 15-04-2023 06:51 UTC
System: Linux 5.15.0-69-generic
Runtime: Groovy 3.0.16 on OpenJDK 64-Bit Server VM 17.0.6+10-LTS
Encoding: UTF-8 (UTF-8)