
Deal with non-identical paired-end FASTQ headers #2

@LucasMS


FASTQ headers sometimes contain the forward/reverse pair identifier. In my case, the identifier is at the end of the header, as .1 or .2 for fastq1 and fastq2, respectively:

```
# F1
@A66666:555:HTK7JDSX2:4:1128:30879:19867.1
# F2
@A66666:555:HTK7JDSX2:4:1128:30879:19867.2
```

According to this article on the FASTQ format, Illumina reads can also carry a "/1" or "/2" suffix to indicate the mate.

With my FASTQ files, the pipeline broke at this stage:

```shell
#### extract split reads
samtools view -h $sample.unique.bam \
| python3 $dir/extractSplitReads_BwaMem.py -i stdin \
| samtools view -Sb > $sample.unsort.splitters.bam
samtools sort -@ $sort_t -o $sample.splitters.bam $sample.unsort.splitters.bam
samtools index $sample.splitters.bam
samtools index $sample.unique.bam
```

No .splitters.bam file was produced, and therefore no acc.csv either.

It worked after I removed the .1 and .2 suffixes from the ends of the headers.
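A minimal sketch of that workaround, assuming uncompressed FASTQ with standard 4-line records (no wrapped sequence lines) and the mate suffix being the very last thing on the header line (no trailing comment). The function name `strip_pair_suffix` is hypothetical and not part of the tool; it removes a trailing `.1`/`.2` or `/1`/`/2` from header lines only, so quality strings that happen to end in `.1` are left alone.

```shell
# Hypothetical helper (not part of the pipeline): strip a trailing
# ".1"/".2" or "/1"/"/2" mate suffix from FASTQ header lines.
# Assumes 4-line records and that the suffix ends the header line.
strip_pair_suffix() {
  # Line 1 of every 4-line record is the header; only it is edited.
  # Sequence, "+" and quality lines are printed unchanged.
  awk 'NR % 4 == 1 { sub("[./][12]$", "") } { print }'
}

# Example usage (file names are placeholders):
# strip_pair_suffix < sample_R1.fastq > sample_R1.fixed.fastq
# strip_pair_suffix < sample_R2.fastq > sample_R2.fixed.fastq
```

For gzipped input, the same function can sit in a stream, e.g. `gzip -dc sample_R1.fastq.gz | strip_pair_suffix | gzip > sample_R1.fixed.fastq.gz`.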

My suggestion is to add a note recommending that paired files have identical headers. It would also be good if the pipeline could handle such cases :).
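One possible pre-flight check for that recommendation, sketched under assumptions rather than taken from the tool: the hypothetical `check_pair_headers` function compares the read name (first whitespace-delimited field) of every header in two uncompressed 4-line FASTQ files, prints each mismatch, and returns a non-zero status if any pair differs. It keeps all R1 headers in memory, assumes R1 is non-empty, and does not detect an R2 file that is shorter than R1.

```shell
# Hypothetical helper (not part of the pipeline): compare read names
# between an R1 and an R2 FASTQ file, record by record.
# Assumes uncompressed 4-line records and a non-empty R1 file.
check_pair_headers() {
  awk '
    # First file (R1): remember each header name, keyed by line number.
    FNR == NR { if (FNR % 4 == 1) name[FNR] = $1; next }
    # Second file (R2): report header lines whose name differs from R1.
    FNR % 4 == 1 && name[FNR] != $1 {
      print "mismatch: " name[FNR] " vs " $1
      bad = 1
    }
    # Exit status 1 if any mismatch was seen, 0 otherwise.
    END { exit bad }
  ' "$1" "$2"
}

# Example usage (file names are placeholders):
# check_pair_headers sample_R1.fastq sample_R2.fastq || echo "paired headers differ"
```

With headers like the ones above (`...19867.1` vs `...19867.2`), this check reports a mismatch for every record.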

Thanks for the cool tool!
Cheers,
