
Masurca#11049

Draft
LiaOb21 wants to merge 8 commits into nf-core:master from LiaOb21:masurca

@LiaOb21
Contributor

@LiaOb21 LiaOb21 commented Mar 25, 2026

Masurca module [draft]

Supports assembly with CABOG only. Flye and SOAPdenovo are not supported.

Supported reads:

  • single end or merged reads
  • paired end only
  • paired end + mate pair (jump)
  • paired end + nanopore
  • paired end + pacbio
  • paired end + nanopore + pacbio
  • paired end + reference genome

Other kinds of reads (Sanger, 454, etc.) are not supported.
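
As a rough illustration of how the read combinations listed above could map onto the DATA section of a MaSuRCA config, here is a minimal sketch. It is not the module's actual code: the function name `build_data_section` and its argument order are hypothetical, the values 500 and 50 are placeholders, and the key names (`PE=`, `JUMP=`, `NANOPORE=`, `PACBIO=`, `REFERENCE=`) follow my reading of the MaSuRCA sample config and should be checked against the MaSuRCA documentation.

```shell
# Sketch only: print a MaSuRCA DATA section for whichever read sets are given.
# The function name and argument order are hypothetical, not the module's code.
build_data_section() {
    pe_reads="$1"       # one file (single-end/merged) or "R1 R2" (paired-end)
    frag_mean="$2"
    frag_stdev="$3"
    jump_reads="$4"     # empty string when there is no mate-pair library
    jump_mean="$5"
    jump_stdev="$6"
    nanopore="$7"       # empty string when there are no nanopore reads
    pacbio="$8"         # empty string when there are no pacbio reads
    reference="$9"      # empty string when there is no reference genome

    printf 'DATA\n'
    printf 'PE= pe %s %s %s\n' "$frag_mean" "$frag_stdev" "$pe_reads"
    if [ -n "$jump_reads" ]; then
        printf 'JUMP= sh %s %s %s\n' "$jump_mean" "$jump_stdev" "$jump_reads"
    fi
    if [ -n "$nanopore" ]; then printf 'NANOPORE=%s\n' "$nanopore"; fi
    if [ -n "$pacbio" ]; then printf 'PACBIO=%s\n' "$pacbio"; fi
    if [ -n "$reference" ]; then printf 'REFERENCE=%s\n' "$reference"; fi
    printf 'END\n'
}

# Example: paired end + nanopore; unused slots are passed as empty strings.
build_data_section "/data/r1.fq /data/r2.fq" 500 50 "" "" "" "/data/ont.fq" "" ""
```

Only the lines for inputs that are actually present are emitted, so the same function covers each supported combination.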

Current problems:

  • conda tests pass on the first run, but on the second run the MD5 checksums of the outputs are different
  • Docker doesn't work. I tried multiple builds with Wave without success.

PR checklist

Closes #10945

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@chriswyatt1
Contributor

Are the values here pipeline-wide, or specific to each genome being run? Is that why they need to be separate channel inputs?
val fragment_mean
val fragment_stdev
val jump_mean
val jump_stdev

@chriswyatt1
Contributor

path(other_reads) doesn't appear to be used, so could be removed

@chriswyatt1
Contributor

chriswyatt1 commented Mar 26, 2026

Looks like a specific build type, so I think seqera containers won't be able to handle that.
https://github.com/alekseyzimin/masurca?tab=readme-ov-file#compileinstall-requirements
We will need to build this manually I think. I can have a go

Then we will need to do this:
https://nf-co.re/docs/contributing/guidelines/recommendations/custom_containers

@LiaOb21
Contributor Author

LiaOb21 commented Mar 26, 2026

Are the values here pipeline-wide, or specific to each genome being run? Is that why they need to be separate channel inputs? val fragment_mean, val fragment_stdev, val jump_mean, val jump_stdev

I think they are meant to be as realistic as possible for each specific genome. However, I ran all the tests locally and manually in a conda env, using the same nf-core datasets and the same values that I'm using in main.nf.test, and they all gave me the expected outputs.

path(other_reads) doesn't appear to be used, so could be removed

Yes, I was initially planning to integrate that as well, but then I decided to leave it. I forgot to remove it from the input tuple.

Looks like a specific build type, so I think seqera containers won't be able to handle that. https://github.com/alekseyzimin/masurca?tab=readme-ov-file#compileinstall-requirements We will need to build this manually I think. I can have a go

Thank you @chriswyatt1! 😊

@chriswyatt1
Contributor

OK. To keep this module more atomic, we should probably have these values as ext args. But if it's helpful to have them in a channel, I think that can be justified, for example if we expect that users will always need to pass this information in per genome rather than set it once in modules.config.

@chriswyatt1
Contributor

My only other general comment is about all the echo statements. Are they essential, or could we move some of this into a script? I will check what other modules do in this case.

@LiaOb21
Contributor Author

LiaOb21 commented Mar 26, 2026

OK. To keep this module more atomic, we should probably have these values as ext args. But if it's helpful to have them in a channel, I think that can be justified, for example if we expect that users will always need to pass this information in per genome rather than set it once in modules.config.

I think the ideal use would be a specific value per genome. Personally I use a fixed average value, as I often don't have information about the fragment size. Also, it is not completely clear from the MaSuRCA documentation whether they mean the fragment size or the insert size. I don't know what would be more appropriate for the nf-core module, so I'll follow your suggestion 😊

My only other general comment is about all the echo statements. Are they essential, or could we move some of this into a script? I will check what other modules do in this case.

Yes, that's the main reason I was so confused about how to set up this module. I asked in the community and they suggested having a look at these two modules:

controlfreec also uses echo commands, and I thought that was probably the easiest way to build the config for masurca. But I'm happy to change it if there is a better way to deal with this! 😊
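
One possible alternative to a long run of echo statements is a single here-document. The sketch below is only an illustration under assumptions, not what the module or controlfreec actually does: the function name `write_masurca_parameters` is hypothetical, only a handful of PARAMETERS keys are shown, and the key names follow my reading of the MaSuRCA sample config. `FLYE_ASSEMBLY=0` and `SOAP_ASSEMBLY=0` reflect the CABOG-only scope described in this PR.

```shell
# Sketch only: write the PARAMETERS section with one here-document
# instead of one echo per line. The function name is hypothetical.
write_masurca_parameters() {
    threads="$1"    # number of CPUs to pass to MaSuRCA
    jf_size="$2"    # jellyfish hash size; the right value depends on the genome

    cat <<EOF
PARAMETERS
GRAPH_KMER_SIZE = auto
NUM_THREADS = ${threads}
JF_SIZE = ${jf_size}
FLYE_ASSEMBLY=0
SOAP_ASSEMBLY=0
END
EOF
}

# Example call with placeholder values.
write_masurca_parameters 4 200000000
```

If something like this were placed inside a Nextflow script block, shell variables such as `${threads}` would need escaping (or replacing with Nextflow variables), since Nextflow interpolates `$` itself.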

@chriswyatt1
Contributor

I made a container, not sure it will work, but worth a try:

quay.io/ecoflowucl/masurca, so in the module you can just put ecoflowucl/masurca.

Test with: docker pull quay.io/ecoflowucl/masurca:v4.1.4

Dockerfile is here: https://github.com/Eco-Flow/docker-build/tree/main/masurca

@chriswyatt1
Contributor

I think the container works. But yes, some output files contain timestamps and run-specific work dir paths, which change on each run. So we need to make the nf-test more specific and pick files that won't change. For example:
assemble.sh:
#!/bin/bash

assemble.sh generated by masurca:

CONFIG_PATH="/Users/cwyatt/Downloads/modules-2/.nf-test/tests/aad480bb9b0561d1bd4bc958001b0e93/work/e4/920dca1167539f25218df125acaf84/test_masurca_config.txt"

test-masurca.log:
[Thu Mar 26 18:02:31 UTC 2026] Processing pe library reads
[Thu Mar 26 18:02:32 UTC 2026] Average PE read length 100
[Thu Mar 26 18:02:32 UTC 2026] Using kmer size of 67 for the graph
[Thu Mar 26 18:02:32 UTC 2026] MIN_Q_CHAR: 33
[Thu Mar 26 18:02:32 UTC 2026] Creating mer database for Quorum
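
To illustrate why these two files give a different MD5 on every run, here is a small sketch that strips the run-specific parts before comparing. It is an assumption-laden illustration, not part of the module: the function name `normalize_output` is hypothetical, and the path pattern simply assumes a `.../work/xx/<hash>/` layout like the one in the CONFIG_PATH above. In practice the nf-test would more likely snapshot only stable files or assert on file contents instead.

```shell
# Sketch only: remove the parts of the output that change between runs.
# The function name and the WORKDIR placeholder are hypothetical.
normalize_output() {
    # 1) drop a leading "[...]" timestamp prefix from log lines
    # 2) replace an absolute path up to .../work/xx/<hash>/ with "WORKDIR/"
    sed -e 's/^\[[^]]*\] //' \
        -e 's#/[^" ]*/work/[0-9a-f][0-9a-f]/[0-9a-f]*/#WORKDIR/#g'
}

# Two log lines from different runs become identical after normalization.
printf '%s\n' '[Thu Mar 26 18:02:32 UTC 2026] Using kmer size of 67 for the graph' | normalize_output
printf '%s\n' '[Fri Mar 27 09:15:00 UTC 2026] Using kmer size of 67 for the graph' | normalize_output
```

After normalization, what remains (read length, kmer size and so on) is the part one would expect to be reproducible between runs.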

Labels

container-issue help wanted Extra attention is needed

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

new module: MASURCA

2 participants