FASTQ headers sometimes include a forward/reverse pair identifier. In my case, the identifier sits at the end of the header, as .1 for fastq1 and .2 for fastq2:
# F1
@A66666:555:HTK7JDSX2:4:1128:30879:19867.1
# F2
@A66666:555:HTK7JDSX2:4:1128:30879:19867.2
According to this article on the FASTQ format, Illumina reads can also use "/1" or "/2" to indicate the pair.
With my FASTQ files, the pipeline broke at this stage:
#### extract split reads
samtools view -h $sample.unique.bam \
| python3 $dir/extractSplitReads_BwaMem.py -i stdin \
| samtools view -Sb > $sample.unsort.splitters.bam
samtools sort -@ $sort_t -o $sample.splitters.bam $sample.unsort.splitters.bam
samtools index $sample.splitters.bam
samtools index $sample.unique.bam
No .splitters.bam was produced, and therefore no acc.csv either.
It worked when I removed the .1 and .2 from the end of the headers.
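For anyone hitting the same problem, here is a minimal sketch of that workaround in Python. It is not part of the pipeline. The function names `strip_pair_suffix` and `normalize_fastq` are hypothetical, and it assumes strict 4-line FASTQ records and that a trailing `.1`, `.2`, `/1` or `/2` on the read name is always a pair identifier:

```python
import re

# Assumption: a read name ending in ".1", ".2", "/1" or "/2" carries a pair
# identifier. The read name is the first whitespace-delimited token.
_PAIR_SUFFIX = re.compile(r"^(@\S+?)[./][12](\s.*)?$")


def strip_pair_suffix(header):
    """Return the header line without its trailing pair identifier."""
    line = header.rstrip("\n")
    match = _PAIR_SUFFIX.match(line)
    if match is None:
        return line  # nothing to strip, keep the header unchanged
    # Keep any comment that follows the read name.
    return match.group(1) + (match.group(2) or "")


def normalize_fastq(lines):
    """Yield FASTQ lines with pair identifiers removed from header lines only.

    Assumes 4-line records, so the header is every 4th line starting at 0.
    Sequence, "+" and quality lines are passed through untouched.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_pair_suffix(line) if index % 4 == 0 else line
```

To use it, open each FASTQ file, pass the file object to `normalize_fastq`, and write the yielded lines to a new file before running the pipeline. Only the line position decides what counts as a header, so quality strings that happen to start with `@` are left alone.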
My suggestion is to add a recommendation to the documentation that the paired files should have identical headers. It would also be good if the pipeline could handle such cases :).
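As a rough illustration of what such a check could look like, the sketch below compares read names between the two files and reports the first mismatch. The names `read_names` and `first_mismatch` are hypothetical, and it again assumes strict 4-line FASTQ records:

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the read name (first token, without '@') of each 4-line record."""
    for index, line in enumerate(lines):
        if index % 4 == 0:
            # An empty header line yields an empty name instead of failing.
            yield line.split()[0][1:] if line.strip() else ""


def first_mismatch(lines1, lines2):
    """Return (record_index, name1, name2) for the first differing pair.

    Returns None when all read names match. A file with fewer records
    shows up as a mismatch against None.
    """
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for record_index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return record_index, name1, name2
    return None
```

With headers like mine, this would report a mismatch on the very first record, which could be turned into an early, readable error message instead of a missing .splitters.bam later on.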
Thanks for the cool tool!
Cheers,