All mapped reads flagged as "too short" #396

@jacopoM28

Dear all,

I am having trouble with the TADbit tools: after filtering, all mapped reads are flagged as both "too close from RES" and "too short". I have two Hi-C libraries generated with Arima-HiC (150 bp PE reads). I mapped each library and each read end separately, iteratively and without specifying a restriction enzyme (--renz NONE), using the following commands:

$tadbit map --fastq "$FASTQ.1_R1" --index ../qmProCava1.cleaned.HiC_ProCav2.curated.1.primary.curated.gem --read 1 \
--renz NONE -C 40 --windows 1:15 1:20 1:25 1:30 1:35 1:40 1:45 1:50 1:55 1:60 1:65 1:70 1:75 -w Large --iterative

$tadbit map --fastq "$FASTQ.1_R2" --index ../qmProCava1.cleaned.HiC_ProCav2.curated.1.primary.curated.gem --read 2 \
--renz NONE -C 40 --windows 1:15 1:20 1:25 1:30 1:35 1:40 1:45 1:50 1:55 1:60 1:65 1:70 1:75 -w Large --iterative

$tadbit map --fastq "$FASTQ.2_R1" --index ../qmProCava1.cleaned.HiC_ProCav2.curated.1.primary.curated.gem --read 1 \
--renz NONE -C 40 --windows 1:15 1:20 1:25 1:30 1:35 1:40 1:45 1:50 1:55 1:60 1:65 1:70 1:75 -w Large --iterative

$tadbit map --fastq "$FASTQ.2_R2" --index ../qmProCava1.cleaned.HiC_ProCav2.curated.1.primary.curated.gem --read 2 \
--renz NONE -C 40 --windows 1:15 1:20 1:25 1:30 1:35 1:40 1:45 1:50 1:55 1:60 1:65 1:70 1:75 -w Large --iterative
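
The four calls above differ only in the library number and the read number, so they can also be written as a loop. This is only a sketch of the same commands with the same flags: `run_map` and `DRY_RUN` are names made up for illustration (they are not part of TADbit), and `sample` is a placeholder used when `$FASTQ` is not set. By default it only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: the same four `tadbit map` calls as above, written as a loop.
# `run_map` and `DRY_RUN` are illustrative names, not part of TADbit.
FASTQ=${FASTQ:-sample}   # placeholder prefix; set FASTQ to your own read-file prefix
INDEX=../qmProCava1.cleaned.HiC_ProCav2.curated.1.primary.curated.gem
WINDOWS="1:15 1:20 1:25 1:30 1:35 1:40 1:45 1:50 1:55 1:60 1:65 1:70 1:75"

run_map() {
    lib=$1       # library number (1 or 2)
    read_no=$2   # read end (1 or 2)
    # $WINDOWS is intentionally unquoted so each window becomes its own argument.
    set -- tadbit map --fastq "$FASTQ.${lib}_R${read_no}" --index "$INDEX" \
        --read "$read_no" --renz NONE -C 40 --windows $WINDOWS -w Large --iterative
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"    # dry run: print the command instead of running it
    else
        "$@"
    fi
}

for l in 1 2; do
    for r in 1 2; do
        run_map "$l" "$r"
    done
done
```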

After this, I merged all files with:

$tadbit parse -w Large/ --genome ../qmProCava1.cleaned.HiC_ProCav2.curated.1.primary.curated.fa

Finally, to filter the reads, I ran:

$tadbit filter -w Large/ -C 10 --apply 1 2 3 4 6 7 9 10
Getting intersection between read 1 and read 2
Get insert size...
  - median insert size = 356.0
  - double median absolution of insert size = 87.0
  - max insert size (when a gap in continuity of > 10 bp is found in fragment lengths) = 1356
   Using the maximum continuous fragment size(1356 bp) to check for pseudo-dangling ends
   Using maximum continuous fragment size plus the MAD (1443 bp) to check for random breaks
identify pairs to filter...
Filtered reads (and percentage of total):

                   Mapped both  :  103,322,115 (100.00%)
  -----------------------------------------------------
   1-               self-circle :    5,101,974 (  4.94%)
   2-              dangling-end :   27,421,527 ( 26.54%)
   3-                     error :   10,421,255 ( 10.09%)
   4-        extra dangling-end :            0 (  0.00%)
   5-        too close from RES :  103,322,116 (100.00%)
   6-                 too short :  103,322,116 (100.00%)
   7-                 too large :            0 (  0.00%)
   8-          over-represented :   74,076,173 ( 71.69%)
   9-                duplicated :   45,211,936 ( 43.76%)
  10-             random breaks :            0 (  0.00%)
    saving to file 0 reads without.

As the TADbit log shows, every read pair is flagged as both "too close from RES" and "too short", and no reads are saved after filtering. Is there something wrong with my commands, or am I missing something?
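
To double-check my reading of the summary, here is a small awk sketch that picks out the filters that flag 100% of the pairs and compares them with the `--apply` list from my filter command. `flag_all` and `APPLIED` are names made up for illustration; the table is copied from the log above.

```shell
#!/bin/sh
# Sketch: list the filters in the summary that flag every read pair and
# report whether each one is in the --apply list used for `tadbit filter`.
APPLIED="1 2 3 4 6 7 9 10"   # from: tadbit filter -w Large/ -C 10 --apply ...

flag_all() {
    awk -v applied="$APPLIED" '
        BEGIN { n = split(applied, a, " "); for (i = 1; i <= n; i++) used[a[i]] = 1 }
        # Filter rows look like: "   6-   too short :  103,322,116 (100.00%)"
        /^ *[0-9]+-/ && /\(100\.00%\)/ {
            id = $1; sub(/-$/, "", id)      # "6-" becomes "6"
            name = $0
            sub(/^ *[0-9]+- */, "", name)   # drop the leading filter number
            sub(/ *:.*$/, "", name)         # drop the count and percentage
            printf "%s (%s): %s\n", id, name, (id in used) ? "applied" : "not applied"
        }'
}

flag_all <<'EOF'
                   Mapped both  :  103,322,115 (100.00%)
  -----------------------------------------------------
   1-               self-circle :    5,101,974 (  4.94%)
   2-              dangling-end :   27,421,527 ( 26.54%)
   3-                     error :   10,421,255 ( 10.09%)
   4-        extra dangling-end :            0 (  0.00%)
   5-        too close from RES :  103,322,116 (100.00%)
   6-                 too short :  103,322,116 (100.00%)
   7-                 too large :            0 (  0.00%)
   8-          over-represented :   74,076,173 ( 71.69%)
   9-                duplicated :   45,211,936 ( 43.76%)
  10-             random breaks :            0 (  0.00%)
EOF
```

On this table it reports filter 5 as not applied and filter 6 as applied, so filter 6 ("too short") on its own would account for the 0 reads saved.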

Thanks in advance for your support!

All the best,
Jacopo
