Hi there :)
First of all, a wonderful collection of tools and a pipeline !! I have had trouble with some if the installation but I am just chopping and cutting the pipeline to fit my needs.
A quick query for the Virfinder step. So I know you want to do q-value < 0.01 and length > 1000bp. I have run the virfinder rscript file and got the output folder. I just want to double check that the filtering awk script is correct.
head P10_virfinder.tsv
"name" "length" "score" "pvalue" "qvalue"
"1" "k141_270856 flag=1 multi=2.0000 len=320" 320 0.138513157806992 0.588997020854022 0.56810885275558
"2" "k141_33860 flag=1 multi=2.0000 len=361" 361 0.248927602838187 0.42085402184707 0.525288351341074
"3" "k141_237000 flag=1 multi=6.0000 len=478" 478 0.392398453460999 0.271737835153923 0.470800520085438
"4" "k141_93112 flag=1 multi=5.0000 len=419" 419 0.316555183433318 0.343932472691162 0.499862326324076
"5" "k141_177752 flag=1 multi=2.0000 len=398" 398 0.461420194784674 0.219920556107249 0.450421369675625
"6" "k141_321640 flag=1 multi=4.0000 len=383" 383 0.456895550455814 0.222641509433962 0.450905940743768
"7" "k141_211608 flag=1 multi=2.0000 len=477" 477 0.152712635250371 0.562542204568024 0.561554065996342
"8" "k141_16930 flag=1 multi=2.0000 len=426" 426 0.165548747858572 0.539602780536246 0.555654697140864
"9" "k141_8465 flag=1 multi=3.0000 len=431" 431 0.855015816724245 0.0269116186693148 0.268291393791584
The subsequent awk script is then this obviously (ignore all mu '"'"'" this is for a loop script that generates jobs for all of my samples).
cat '"$PALPAL"'_virfinder.tsv | awk -F'\t' '"'"'{ if ($4 <= 0.01) print }'"'"' | awk -F'_' '"'"'{ if ($4 >= 1000) print }'"'"' | cut -f2 | sed "s/\"//g" > '"$PALPAL"'_virfinder_filtered_data.txt
So this awk script is currently firstly using awk on the fourth \t column which is p-value ($4), should this not be $5 as the <0.01 filtering is for q-value ?
Secondly, second awk is extracting the fourth instance after underscore, looking at the output this doesn't seem to be correct. If you are wanting length >1000 would it not be a better idea to do awk -F'\t' '"'"'{ if ($2 >= 1000) print }'"'"'
Looking at the virfinder github it seems the original results format would work with your awk script, but I think a new update may of switched this ?
I used the mamba install in the mudoger install scripts but edited a wee bit (mamba create -n virfinder_env -c bioconda r-virfinder) so I believe both my local and the mudogoer install versions would be the same.
Happy to provide more info if needed, and I apologise in advance if I am barking up the wrong tree, but I was doing some QC and testing and noticed that the awk was not approproate for the output file generated by virfinder :).
Ben
Hi there :)
First of all, a wonderful collection of tools and a pipeline !! I have had trouble with some if the installation but I am just chopping and cutting the pipeline to fit my needs.
A quick query for the Virfinder step. So I know you want to do q-value < 0.01 and length > 1000bp. I have run the virfinder rscript file and got the output folder. I just want to double check that the filtering awk script is correct.
head P10_virfinder.tsvThe subsequent awk script is then this obviously (ignore all mu '"'"'" this is for a loop script that generates jobs for all of my samples).
So this awk script is currently firstly using awk on the fourth \t column which is p-value ($4), should this not be $5 as the <0.01 filtering is for q-value ?
Secondly, second awk is extracting the fourth instance after underscore, looking at the output this doesn't seem to be correct. If you are wanting length >1000 would it not be a better idea to do
awk -F'\t' '"'"'{ if ($2 >= 1000) print }'"'"'Looking at the
virfindergithub it seems the original results format would work with your awk script, but I think a new update may of switched this ?I used the mamba install in the mudoger install scripts but edited a wee bit (
mamba create -n virfinder_env -c bioconda r-virfinder) so I believe both my local and the mudogoer install versions would be the same.Happy to provide more info if needed, and I apologise in advance if I am barking up the wrong tree, but I was doing some QC and testing and noticed that the awk was not approproate for the output file generated by
virfinder:).Ben