Description
- There is a plot showing the distribution of positions in the BAM. This is useful, but I would see if it is possible to convert the x-axis from genomicPosition (which we need to use in the raw data) and report it by chromosome. In R/ggplot I solved this in the past by manually specifying breaks and labels. For example, if you know that chromosome 1 position 1 = genomicPosition 1, you can label it as such, and repeat for each contig. You would need a sequence dictionary providing ordered contigs and lengths, but you probably already have this.
1b) This will be a little tricky when the actual region is a very specific zone within one contig. However, perhaps you could explicitly test this by inspecting the min/max genomicPositions. If their contigs are the same, then maybe label the x-axis as "Position on XX"? In this case, reporting actual contig position would be useful to people. Another thought might be to report the x-axis using actual contig position, but facet the plot horizontally by contig (ideally allowing the Python plot code to distribute space unevenly)? I actually wonder if this idea would be easier to implement than what I suggested in #1.
-
Related to the plot above, I would facet by R1/R2 if possible. Putting these above one another probably makes the most sense. I think you have them as two lines, but it's not easy to see both now.
-
Just a comment: the "Concordance between the input BAM calls and the corresponding nimble call" is probably useful.
-
In the "Distribution of nimble base pair scores across r1 and r2 mates" plot, do you think it would make sense to just drop unaligned (maybe report that as a number or percentage in the title text), to make the range within aligned more clear?
-
In the "Distribution of nimble base pair scores across r1 and r2 mates" plot, since there is a hard cutoff being applied, would it make sense to add a dotted line at this threshold to denote that?
-
Why is the 'Density of positions reported in the input BAM file for read-pairs that received a nimble alignment' plot faceted for NKG2D, but not the others? Are those two different contigs? If so, you might already be doing what I suggested in 1b. I would just: a) label them as such, b) allow the facets to have different widths, c) report positions unique to that contig, and d) still facet R1/R2 vertically.
-
Comment: CCR7 seems to have a real clear issue with R1 alignment, for example. Same on CD27. Same for SELL.
-
Comment: the "Concordance between the input BAM calls and the corresponding nimble call" plots are informative for the LILRs in particular.
-
This is picky, but what's the sort order of the features? Maybe it should be alphabetic?
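For the x-axis relabelling suggested in items 1 and 1b, here is a rough Python sketch of the breaks-and-labels approach. It assumes genomicPosition is a 1-based coordinate over the contigs concatenated in sequence-dictionary order, which may not match how nimble actually computes it. The function names and the `seq_dict` structure are hypothetical, not part of nimble.

```python
from itertools import accumulate


def contig_axis_ticks(seq_dict):
    """Compute x-axis breaks and labels from an ordered sequence dictionary.

    seq_dict: ordered list of (contig_name, length) pairs (hypothetical structure).
    Returns (starts, midpoints, names), where starts[i] is the assumed 1-based
    genomicPosition of the first base of contig i and midpoints[i] is a
    reasonable place to put that contig's tick label.
    """
    names = [name for name, _ in seq_dict]
    lengths = [length for _, length in seq_dict]
    # Offset of each contig = total length of all contigs before it.
    offsets = [0] + list(accumulate(lengths))[:-1]
    starts = [offset + 1 for offset in offsets]
    midpoints = [offset + length / 2 for offset, length in zip(offsets, lengths)]
    return starts, midpoints, names


def to_contig_position(genomic_pos, seq_dict):
    """Map an assumed 1-based genomicPosition to (contig_name, 1-based position)."""
    if genomic_pos < 1:
        raise ValueError("genomicPosition must be >= 1")
    offset = 0
    for name, length in seq_dict:
        if genomic_pos <= offset + length:
            return name, genomic_pos - offset
        offset += length
    raise ValueError("genomicPosition is beyond the end of the last contig")


def axis_label(min_pos, max_pos, seq_dict):
    """Item 1b: if all data fall on one contig, label the axis with that contig."""
    first_contig, _ = to_contig_position(min_pos, seq_dict)
    last_contig, _ = to_contig_position(max_pos, seq_dict)
    if first_contig == last_contig:
        return f"Position on {first_contig}"
    return "Genomic position"
```

With matplotlib, the result could be applied with something like `ax.set_xticks(midpoints)` and `ax.set_xticklabels(names)`, drawing a vertical line at each value in `starts` to mark contig boundaries. In the single-contig case, `to_contig_position` gives the per-contig coordinate to plot instead.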
There are some additional plots I think would be useful:
-
Could you summarize the data in terms of F/F, F/U, F/R, R/R, R/U, etc.? Maybe as a tile plot, where the color is scaled based on the proportion of read pairs in each category?
-
Can you summarize reads that are filtered and the reason for filtering? I realize nimble doesn't have complete access to all the data yet, but whatever information is available would be useful. For example, we see a lot of cases where features have strong R2 hits, but lots of zeros for R1. Understanding what happened to those R1s could be important.
-
Since the set of features can be large, it might be a nice enhancement to have some kind of bullet list of feature names at the top, where each is a link to jump to the start of that feature's section.
-
It would probably be useful to print a basic header first, with something like "Nimble QC Report", that also prints the date run, tool version, perhaps the exact command executed, name of reference library, etc.
-
Should sense/antisense be represented in any of these figures? For example, if a given feature is getting a lot of R1 or R2 hits in the opposite direction (which the F/R figure might also report), then we should have a way to know about that.
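To make the F/F, F/U, F/R, R/R, R/U summary suggested above more concrete, here is a minimal Python sketch of the table that a tile plot could be colored from. It assumes F = forward, R = reverse, and U = unaligned, and that each read pair can be reduced to an (R1, R2) orientation tuple. How nimble actually stores this is not something I know, so the input format and function name are hypothetical.

```python
from collections import Counter

# Assumed orientation codes: F = forward, R = reverse, U = unaligned.
ORIENTATIONS = ("F", "R", "U")


def orientation_proportions(pairs):
    """Return a 3x3 grid of read-pair proportions.

    pairs: iterable of (r1_orientation, r2_orientation) tuples (hypothetical input).
    Rows are the R1 orientation and columns the R2 orientation, both in the
    order given by ORIENTATIONS. An empty input yields a grid of zeros.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    return [
        [counts[(r1, r2)] / total if total else 0.0 for r2 in ORIENTATIONS]
        for r1 in ORIENTATIONS
    ]


# Example: two F/R pairs, one F/U pair, one R/R pair.
example = [("F", "R"), ("F", "R"), ("F", "U"), ("R", "R")]
grid = orientation_proportions(example)
```

The resulting grid could be passed to something like matplotlib's `ax.imshow(grid)` with `ORIENTATIONS` as the tick labels on both axes, one tile plot per feature. It would also show features with many hits in the opposite direction, which relates to the sense/antisense question.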