The current implementation of `RandomReadsMG` treats contigs as linear, which prevents the generation of reads that span the (often arbitrary) start/end seam of circular elements. It would be very helpful if `RandomReadsMG` could simulate reads from circular genomes (e.g., plasmids and bacterial/archaeal chromosomes), which is critical for generating realistic metagenomic datasets. This functionality is present in other simulators such as `readSimulator`.
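To make the requested behavior concrete, here is a minimal Java sketch. It is not `RandomReadsMG`'s actual code, and `CircularReadSketch`, `extractCircular`, and `sampleRead` are hypothetical names. For a circular contig, the start position is drawn from the whole contig and bases are taken modulo the contig length, so a read can run across the end/start seam. It assumes forward-strand reads no longer than the contig.

```java
import java.util.Random;

/** Hypothetical sketch, not RandomReadsMG internals: seam-spanning reads on circular contigs. */
public final class CircularReadSketch {

    /** Extracts readLen bases beginning at start, wrapping modulo the contig length. */
    static String extractCircular(String contig, int start, int readLen) {
        int len = contig.length();
        StringBuilder sb = new StringBuilder(readLen);
        for (int i = 0; i < readLen; i++) {
            sb.append(contig.charAt((start + i) % len));
        }
        return sb.toString();
    }

    /**
     * Samples one forward-strand read. Linear contigs only allow starts in
     * [0, len - readLen]; circular contigs allow any start in [0, len),
     * so reads can cross the end/start seam.
     */
    static String sampleRead(String contig, int readLen, boolean circular, Random rng) {
        int len = contig.length();
        if (readLen <= 0 || readLen > len) {
            throw new IllegalArgumentException("readLen must be in [1, contig length]");
        }
        if (circular) {
            return extractCircular(contig, rng.nextInt(len), readLen);
        }
        int start = rng.nextInt(len - readLen + 1);
        return contig.substring(start, start + readLen);
    }

    public static void main(String[] args) {
        String plasmid = "ACGTACGGTT";
        // Starts 3 bases before the end and wraps to the beginning.
        System.out.println(extractCircular(plasmid, 7, 6)); // prints GTTACG
        System.out.println(sampleRead(plasmid, 4, true, new Random(42)));
    }
}
```

A side effect of drawing starts uniformly over the whole circular contig is that every position is covered equally in expectation, whereas the linear branch undercovers the first and last `readLen - 1` bases.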
A key difficulty is how to designate specific contigs in a multi-FASTA input as circular. Enforcing a specific FASTA header format would be complex and brittle, so I think it would suffice to apply the circularity property at the input-file level by extending the existing custom depth notation.
The new notation could be (feel free to pick any other notation):
- Current notation: `<file>=X` (e.g., `ecoli.fa=40`) sets a custom depth of 40x for `ecoli.fa`.
- Proposed notation: `<file>=Xc` (e.g., `plasmid_library.fa=50c`) would set a custom depth of 50x and treat all contigs within `plasmid_library.fa` as circular.
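A minimal sketch of how the proposed `c` suffix could be parsed, assuming the depth value is a plain non-negative decimal number. `DepthSpecSketch`, `DepthSpec`, and `parse` are hypothetical names, not part of BBTools, and the `record` requires Java 16 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the proposed "<file>=X" / "<file>=Xc" notation. */
public final class DepthSpecSketch {

    /** One parsed input: file path, custom depth, and whether its contigs are circular. */
    record DepthSpec(String file, double depth, boolean circular) {}

    // Depth is a non-negative decimal, optionally followed by 'c' (or 'C') for circular.
    private static final Pattern VALUE = Pattern.compile("(\\d+(?:\\.\\d+)?)([cC]?)");

    static DepthSpec parse(String arg) {
        // Split on the last '=' so a file path containing '=' is kept intact.
        int eq = arg.lastIndexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected <file>=X or <file>=Xc: " + arg);
        }
        Matcher m = VALUE.matcher(arg.substring(eq + 1));
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad depth value in: " + arg);
        }
        return new DepthSpec(arg.substring(0, eq),
                Double.parseDouble(m.group(1)),
                !m.group(2).isEmpty());
    }

    public static void main(String[] args) {
        System.out.println(parse("ecoli.fa=40"));
        // prints DepthSpec[file=ecoli.fa, depth=40.0, circular=false]
        System.out.println(parse("plasmid_library.fa=50c"));
        // prints DepthSpec[file=plasmid_library.fa, depth=50.0, circular=true]
    }
}
```

Validating the value with a regex rather than passing it straight to `Double.parseDouble` rejects strings such as `NaN` or `Infinity`, which `parseDouble` would otherwise accept.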