Skip to content

Baf for only one (the last in the list) chromosome is made. #13

@chatterjee89

Description

@chatterjee89

Hi,

Thanks for making this wonderful tool! I have run into an issue:

After running the following command...
samtools view ./WT_96_2.bam | BAFExtract -generate_compressed_pileup_per_SAM stdin ./CaSpER/dm6.list ./CaSpER/baf/wt96_2 50 0; BAFExtract -get_SNVs_per_pileup ./CaSpER/dm6.list ./CaSpER/baf/wt96_2 ./CaSpER/dm6_fasta_pileup/ 20 4 0.1 ./CaSpER/baf/wt96_2/WT_96_2.baf

...I get the following std output:

Generating pileup from SAM file with min mapp qual 50, min base qual 0
Dumping the pileups per SAM file stdin
Allocating pileup memory.
Done.
Maximum # of alleles at each position: 65536
Phred+33 encoding @ 14. read (1):
CACTCAATTATATACTTTATATGGTCGGAAAAGCTTCCTTCTGCCTGTAA
D@@@@EHHEEHEHHEHHGGIH1G1D1FHHHGHGC@<C@HHHHECHHCE@H
Finished reading.00. read
27379256 0 signal, 4037446 4-bit, 636538 8-bit, 26073 12-bit, 18 14-bit, 0 16-bit positions.
Skipping loading check.
24265304 0 signal, 3313416 4-bit, 517137 8-bit, 14343 12-bit, 27 14-bit, 0 16-bit positions.
Skipping loading check.
21308450 0 signal, 3421544 4-bit, 532190 8-bit, 24291 12-bit, 461 14-bit, 0 16-bit positions.
Skipping loading check.
19142111 0 signal, 3866743 4-bit, 516616 8-bit, 16734 12-bit, 67 14-bit, 0 16-bit positions.
Skipping loading check.
20211830 0 signal, 2818271 4-bit, 463435 8-bit, 20107 12-bit, 69 14-bit, 0 16-bit positions.
Skipping loading check.
3663413 0 signal, 3939 4-bit, 0 8-bit, 0 12-bit, 0 14-bit, 0 16-bit positions.
Skipping loading check.
1060875 0 signal, 256237 4-bit, 28894 8-bit, 2125 12-bit, 0 14-bit, 0 16-bit positions.
Skipping loading check.
Calling SNVs with minimum 20 total and 4 alternate read coverage and 0.100 min alternate frequency.
Procesing 3R
Loading pileup of length 32079331
Loaded 32079331 long pileup.
Procesing 3L
Loading pileup of length 28110227
Loaded 28110227 long pileup.
Procesing 2R
Loading pileup of length 25286936
Loaded 25286936 long pileup.
Procesing X
Loading pileup of length 23542271
Loaded 23542271 long pileup.
Procesing 2L
Loading pileup of length 23513712
Loaded 23513712 long pileup.
Procesing Y
Loading pileup of length 3667352
Loaded 3667352 long pileup.
Procesing 4
Loading pileup of length 1348131
Loaded 1348131 long pileup.

Question 1: what does the 20 4 0.1 parameter for BAFExtract -get_SNVs_per_pileup mean?

It seems to have worked and I can see the .baf file along with the allele_counts.bin files for each chromosome in the output folder. This is the output:

$WT_96_2$WT_96_2.baf
      chr position alt ref coverage       baf        dev
21231   4    66150  15   8       23 0.6521739 0.15217391
21232   4   116697 142   0      142 1.0000000 0.50000000
21233   4   116733  87  53      140 0.6214286 0.12142857
21234   4   117428  55  24       79 0.6962025 0.19620253
21235   4   117647  27   0       27 1.0000000 0.50000000
21236   4   138445  16  13       29 0.5517241 0.05172414
21237   4   177684  42  14       56 0.7500000 0.25000000
21238   4   308618  26  12       38 0.6842105 0.18421053
21239   4   310183 111  86      197 0.5634518 0.06345178
21240   4   310405  23   2       25 0.9200000 0.42000000
21241   4   313589  21  18       39 0.5384615 0.03846154
21242   4   331005  19   3       22 0.8636364 0.36363636
21243   4   331333  18   8       26 0.6923077 0.19230769
21244   4   349108  16   9       25 0.6400000 0.14000000
21245   4   372148  20   9       29 0.6896552 0.18965517
21246   4   372423  42  15       57 0.7368421 0.23684211
21247   4   372620  13   8       21 0.6190476 0.11904762
21248   4   408746  56   0       56 1.0000000 0.50000000
21249   4   546394  14   8       22 0.6363636 0.13636364
21250   4   611349  22   0       22 1.0000000 0.50000000
21251   4   691913  33   0       33 1.0000000 0.50000000
21252   4   918729  58   0       58 1.0000000 0.50000000
21253   4   951893  23   0       23 1.0000000 0.50000000
21254   4   952037  33   2       35 0.9428571 0.44285714
21255   4   952409  31   0       31 1.0000000 0.50000000
21256   4   962628  23   0       23 1.0000000 0.50000000
21257   4  1016473  26   0       26 1.0000000 0.50000000
21258   4  1016485  24   0       24 1.0000000 0.50000000
21259   4  1028923   5  41       46 0.1086957 0.39130435
21260   4  1029043  26   0       26 1.0000000 0.50000000
21261   4  1115591  94   0       94 1.0000000 0.50000000
21262   4  1118484  33   0       33 1.0000000 0.50000000

HOWEVER, given that dm6 (fruit-flies) has 2L, 2R, 3L, 3R, 4, X, Y chromosomes, and that the generated baf file only ends up containing the values for chromosome 4, I don't quite understand what happened there.
So, Question 2: How can I fix this?

This is for every sample run for the overall experiment. Any help would be greatly appreciated!

Regards,
Deep

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions