
path input with collect behaves differently for single element vs. multiple #3886

@jchorl

Bug report

When staging files, it can be helpful to set resources based on size. However, I've found this to be difficult when dealing with a list of files.

Assume files with contents:

bar.fastq
bar

baz.fastq
baz

foo.fastq
foo

Consider this trivial workflow:

workflow {
  manyChan = Channel.fromPath("*.fastq")
  singleChan = Channel.fromPath("foo.fastq")
  manyChan \
    | collect \
    | view { it -> "many: ${it}" }
  singleChan \
    | collect \
    | view { it -> "single: ${it}" }
}

It produces:

bash-4.2# nextflow run main.nf
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [awesome_kare] DSL2 - revision: 5ceaaa278c
single: [/work/foo.fastq]
many: [/work/foo.fastq, /work/baz.fastq, /work/bar.fastq]

As expected, both channels emit lists of files.

Consider this less trivial workflow:

process sizeFastqs {
  input:
    path(fastqs, stageAs: '*.fastq')
  output:
    path 'summed.txt'

  shell:
    summed = fastqs*.size().sum()
    """
    printf '!{summed}' > summed.txt
    """
}

workflow {
  // chan = Channel.fromPath("*.fastq")  // works
  // chan = Channel.fromPath("b*.fastq") // works
  chan = Channel.fromPath("foo.fastq")   // doesn't work
  sizeFastqs(chan.collect())
}

Expected behavior and actual behavior

We see:

bash-4.2# nextflow run main.nf
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [pedantic_bardeen] DSL2 - revision: e78dae9554
[-        ] process > sizeFastqs -
ERROR ~ Error executing process > 'sizeFastqs'

Caused by:
  No such file or directory: .fastq

Note that the other two channel factories work here, so it seems that with path(...), a one-element list [file] is treated differently from [file, file]. I could understand file being different from [file], but I'd expect [file] and [file, file] to behave the same way.

Is this a bug? If not, what is a reliable way to sum input size for a | collect | myProc?
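One possible workaround, sketched in plain Groovy rather than verified against Nextflow 23.04.1, is to normalize the input to a list before summing. The idea rests on the fact that java.nio.file.Path is itself iterable over its name segments, so spreading over a single path may not iterate over files at all. The helper names toFileList and totalSize are hypothetical and not part of Nextflow, and whether Nextflow's staged path objects behave like ordinary java.nio paths here is an assumption.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper (not part of Nextflow): wrap a lone value in a list so
// that later code always iterates over files. Without this, spreading over a
// single java.nio Path would walk its name segments instead.
List toFileList(Object value) {
    value instanceof Collection ? (value as List) : [value]
}

// Sum file sizes in bytes; returns 0 for an empty list.
long totalSize(Object files) {
    toFileList(files).collect { Files.size(it as Path) }.sum(0L) as long
}
```

In the process above, this would presumably replace the spread expression, e.g. summed = totalSize(fastqs), so that a single file and a list of files go through the same code path.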

Steps to reproduce the problem

All details are above.

Program output

Caused by:
  No such file or directory: .fastq

Source block:
  summed = fastqs*.size().sum()
  """
      printf '!{summed}' > summed.txt
      """

Environment

bash-4.2# nextflow info
  Version: 23.04.1 build 5866
  Created: 15-04-2023 06:51 UTC 
  System: Linux 5.15.0-69-generic
  Runtime: Groovy 3.0.16 on OpenJDK 64-Bit Server VM 17.0.6+10-LTS
  Encoding: UTF-8 (UTF-8)
