I was running some external FASTQ data through, and the function below assigned the barcode as
"1172 length=111" from FASTQ records that look like this:
@SRR20318439.1 A00536:248:HFHTKDSX3:1:1101:2736:1000 length=111
NCACNAATTNAAACCATTACAACNATNAACACTNTATNATAAATANNCCNANNTNCCANANTATAAAAAACNCATTANTACANTCATTTAAATTATATTAAATTTATACCT
+
#FFF#FFFF#FFFFF:FFFFFFF#FF#FFFFFF#FFF#FFFFFFF##FF#F##F#FFF#F#FFFFFFFFFF#FFFFF#FFFF#FFFFFFFFFFFFFFFFFFFFFFFFFFFF
The external data has no barcode in the header, so the function does not produce a meaningful result: it takes whatever follows the last ":" in each header, which here is the last field of the read name plus " length=111". Because the resulting string contains a space, downstream processes failed, since every command containing the "barcode" variable ended up with an extra space in it. Perhaps some checks could be added for when the FASTQ/BAM headers are not formatted as expected, for cases like external data or users. I could not find any barcodes in the metadata either, so I couldn't add them to the original data before running.
barcodes_from_fastq () {
set +o pipefail
zcat -f $1 \
| head -n10000 \
| awk '{
if (NR%4==1) {
split($0, parts, ":");
arr[ parts[ length(parts) ] ]++
}} END { for (i in arr) {print arr[i]"\\t"i} }' \
| sort -k1nr | head -n1 | cut -f2
# | tr -c "[ACGTN]" "\\t"
set -o pipefail
}
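For what it's worth, here is a rough sketch of the kind of check I have in mind. It is not the pipeline's code: the name barcodes_from_fastq_checked, the error message, and the accepted barcode pattern (A/C/G/T/N, optionally two indexes joined by "+" or "-") are my own assumptions. It keeps the same header parsing, but replaces the set +o pipefail toggle with "|| true" and refuses to return anything that does not look like a barcode.

```shell
# Hypothetical variant of barcodes_from_fastq: same header parsing, plus a
# sanity check on the result. Not the pipeline's actual code.
barcodes_from_fastq_checked () {
    local fastq="$1"
    local barcode

    # Most frequent text after the last ":" among the first 2,500 headers.
    # "|| true" ignores the non-zero status seen under pipefail when head
    # closes the pipe early; a truly failed read leaves barcode empty,
    # which the check below rejects.
    barcode=$(
        zcat -f "$fastq" \
            | head -n 10000 \
            | awk 'NR % 4 == 1 {
                    n = split($0, parts, ":")
                    count[parts[n]]++
                }
                END { for (b in count) print count[b] "\t" b }' \
            | sort -k1,1nr \
            | head -n 1 \
            | cut -f2
    ) || true

    # Assumed barcode format: A/C/G/T/N, optionally two indexes joined by
    # "+" or "-". Spaces, "length=111" or an empty string are rejected.
    if ! printf '%s\n' "$barcode" | grep -Eq '^[ACGTN]+([+-][ACGTN]+)?$'; then
        echo "ERROR: no barcode found in the headers of $fastq (got '$barcode')" >&2
        return 1
    fi

    printf '%s\n' "$barcode"
}
```

A caller could then stop early with something like barcode=$(barcodes_from_fastq_checked "$fastq") || exit 1, rather than passing a string with a space in it to downstream commands. Falling back to a placeholder barcode instead of failing might suit external data better, but that is a design decision for the maintainers.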