-
Notifications
You must be signed in to change notification settings - Fork 5
Description
Hi,
Thanks for making this wonderful tool! I have run into an issue:
After running the following command...
samtools view ./WT_96_2.bam | BAFExtract -generate_compressed_pileup_per_SAM stdin ./CaSpER/dm6.list ./CaSpER/baf/wt96_2 50 0; BAFExtract -get_SNVs_per_pileup ./CaSpER/dm6.list ./CaSpER/baf/wt96_2 ./CaSpER/dm6_fasta_pileup/ 20 4 0.1 ./CaSpER/baf/wt96_2/WT_96_2.baf
...I get the following std output:
Generating pileup from SAM file with min mapp qual 50, min base qual 0
Dumping the pileups per SAM file stdin
Allocating pileup memory.
Done.
Maximum # of alleles at each position: 65536
Phred+33 encoding @ 14. read (1):
CACTCAATTATATACTTTATATGGTCGGAAAAGCTTCCTTCTGCCTGTAA
D@@@@EHHEEHEHHEHHGGIH1G1D1FHHHGHGC@<C@HHHHECHHCE@H
Finished reading.00. read
27379256 0 signal, 4037446 4-bit, 636538 8-bit, 26073 12-bit, 18 14-bit, 0 16-bit positions.
Skipping loading check.
24265304 0 signal, 3313416 4-bit, 517137 8-bit, 14343 12-bit, 27 14-bit, 0 16-bit positions.
Skipping loading check.
21308450 0 signal, 3421544 4-bit, 532190 8-bit, 24291 12-bit, 461 14-bit, 0 16-bit positions.
Skipping loading check.
19142111 0 signal, 3866743 4-bit, 516616 8-bit, 16734 12-bit, 67 14-bit, 0 16-bit positions.
Skipping loading check.
20211830 0 signal, 2818271 4-bit, 463435 8-bit, 20107 12-bit, 69 14-bit, 0 16-bit positions.
Skipping loading check.
3663413 0 signal, 3939 4-bit, 0 8-bit, 0 12-bit, 0 14-bit, 0 16-bit positions.
Skipping loading check.
1060875 0 signal, 256237 4-bit, 28894 8-bit, 2125 12-bit, 0 14-bit, 0 16-bit positions.
Skipping loading check.
Calling SNVs with minimum 20 total and 4 alternate read coverage and 0.100 min alternate frequency.
Procesing 3R
Loading pileup of length 32079331
Loaded 32079331 long pileup.
Procesing 3L
Loading pileup of length 28110227
Loaded 28110227 long pileup.
Procesing 2R
Loading pileup of length 25286936
Loaded 25286936 long pileup.
Procesing X
Loading pileup of length 23542271
Loaded 23542271 long pileup.
Procesing 2L
Loading pileup of length 23513712
Loaded 23513712 long pileup.
Procesing Y
Loading pileup of length 3667352
Loaded 3667352 long pileup.
Procesing 4
Loading pileup of length 1348131
Loaded 1348131 long pileup.
Question 1: what does the 20 4 0.1 parameter for BAFExtract -get_SNVs_per_pileup mean?
It seems to have worked and I can see the .baf file along with the allele_counts.bin files for each chromosome in the output folder. This is the output:
$WT_96_2$WT_96_2.baf
chr position alt ref coverage baf dev
21231 4 66150 15 8 23 0.6521739 0.15217391
21232 4 116697 142 0 142 1.0000000 0.50000000
21233 4 116733 87 53 140 0.6214286 0.12142857
21234 4 117428 55 24 79 0.6962025 0.19620253
21235 4 117647 27 0 27 1.0000000 0.50000000
21236 4 138445 16 13 29 0.5517241 0.05172414
21237 4 177684 42 14 56 0.7500000 0.25000000
21238 4 308618 26 12 38 0.6842105 0.18421053
21239 4 310183 111 86 197 0.5634518 0.06345178
21240 4 310405 23 2 25 0.9200000 0.42000000
21241 4 313589 21 18 39 0.5384615 0.03846154
21242 4 331005 19 3 22 0.8636364 0.36363636
21243 4 331333 18 8 26 0.6923077 0.19230769
21244 4 349108 16 9 25 0.6400000 0.14000000
21245 4 372148 20 9 29 0.6896552 0.18965517
21246 4 372423 42 15 57 0.7368421 0.23684211
21247 4 372620 13 8 21 0.6190476 0.11904762
21248 4 408746 56 0 56 1.0000000 0.50000000
21249 4 546394 14 8 22 0.6363636 0.13636364
21250 4 611349 22 0 22 1.0000000 0.50000000
21251 4 691913 33 0 33 1.0000000 0.50000000
21252 4 918729 58 0 58 1.0000000 0.50000000
21253 4 951893 23 0 23 1.0000000 0.50000000
21254 4 952037 33 2 35 0.9428571 0.44285714
21255 4 952409 31 0 31 1.0000000 0.50000000
21256 4 962628 23 0 23 1.0000000 0.50000000
21257 4 1016473 26 0 26 1.0000000 0.50000000
21258 4 1016485 24 0 24 1.0000000 0.50000000
21259 4 1028923 5 41 46 0.1086957 0.39130435
21260 4 1029043 26 0 26 1.0000000 0.50000000
21261 4 1115591 94 0 94 1.0000000 0.50000000
21262 4 1118484 33 0 33 1.0000000 0.50000000
HOWEVER, given that dm6 (fruit-flies) has 2L, 2R, 3L, 3R, 4, X, Y chromosomes, and that the generated baf file only ends up containing the values for chromosome 4, I don't quite understand what happened there.
So, Question 2: How can I fix this?
This is for every sample run for the overall experiment. Any help would be greatly appreciated!
Regards,
Deep