Support run mode providing FASTQ pair and TSV with allowable cell/umi barcodes #78

@bbimber

As is, nimble accepts either a pair of FASTQs as input (for bulk RNA-seq) or a 10x BAM. The only reason we need this BAM is that it has the cell barcodes and UMIs already parsed from the raw reads. This is somewhat limiting, because it means we need to re-create that BAM every time before running nimble. We should support a third mode:

  1. I created a tool that will iterate the 10x BAM and write out two TSVs, one each for the CR->CB and UR->UB mappings (https://github.com/BimberLab/DISCVRSeq/blob/c456a336cbb7bed2c7c17a7b72781554f3004554/src/main/java/com/github/discvrseq/walkers/Save10xBarcodes.java#L62). The idea is to provide a map of raw->corrected sequence for all valid combinations that CellRanger found in the data.

  2. We should make a python-based entrypoint (maybe just a different argument in the existing 'align' task). This would accept N pairs of FASTQs as input, along with TSVs for raw->corrected cell barcodes and raw->corrected UMIs. We also need to supply the cell barcode pattern and UMI pattern.

  3. The python code should read those TSVs into memory as hash maps. It should then iterate each pair of FASTQs (ideally with some kind of dual iterator that reads forward and reverse at once). For each read pair, parse R1 to grab the raw CB and raw UMI. Use the in-memory lookup to translate those to the corrected CB/UB.

  4. Strip the barcode portion from R1 and then write both forward and reverse reads to a new BAM file (as unaligned reads). If the R1 has no bases beyond the barcode information, skip it. Write UB/CB tags in that BAM.

  5. This new BAM should be able to feed directly into the existing codepath. We still need to sort it, but all downstream code is exactly the same as for a regular 10x BAM.
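
A minimal sketch of steps 3 and 4, to make the proposal concrete. Everything here is an assumption rather than existing nimble code: the function names are hypothetical, the TSVs are assumed to be two headerless columns (raw, corrected), and the barcode "pattern" is simplified to fixed lengths (`cb_len`, `umi_len`) at the start of R1. Read pairs whose raw CB or UMI is missing from the maps are dropped, which the steps above do not specify. "Skip it" in step 4 is read as "skip R1 only", so R2 is still written as a single unmapped read. To stay dependency-free, the sketch emits unaligned SAM text records; a real implementation would write BAM through a BAM library.

```python
import gzip
from itertools import zip_longest


def load_barcode_map(path):
    """Read a two-column TSV (raw<TAB>corrected) into a dict."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[0]:
                mapping[fields[0]] = fields[1]
    return mapping


def read_fastq(path):
    """Yield (name, sequence, quality) for each record in a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                return
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip()
            yield header[1:].split()[0], seq, qual


def sam_line(name, flag, seq, qual, cb, ub):
    """Format one unaligned SAM record carrying CB/UB tags."""
    return "\t".join([name, str(flag), "*", "0", "0", "*", "*", "0", "0",
                      seq, qual, f"CB:Z:{cb}", f"UB:Z:{ub}"])


def sam_records(r1_path, r2_path, cb_map, umi_map, cb_len=16, umi_len=12):
    """Iterate R1/R2 together and yield tagged, unaligned SAM lines."""
    prefix = cb_len + umi_len
    pairs = zip_longest(read_fastq(r1_path), read_fastq(r2_path))
    for r1, r2 in pairs:
        if r1 is None or r2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        name, seq1, qual1 = r1
        _, seq2, qual2 = r2
        # Translate raw CB/UMI to corrected values via the in-memory maps.
        cb = cb_map.get(seq1[:cb_len])
        ub = umi_map.get(seq1[cb_len:prefix])
        if cb is None or ub is None:
            continue  # assumption: drop pairs CellRanger did not report
        if len(seq1) > prefix:
            # Flags 77/141: paired, both mates unmapped, first/second in pair.
            yield sam_line(name, 77, seq1[prefix:], qual1[prefix:], cb, ub)
            yield sam_line(name, 141, seq2, qual2, cb, ub)
        else:
            # R1 holds only barcode bases: write R2 alone (flag 4, unmapped).
            yield sam_line(name, 4, seq2, qual2, cb, ub)
```

The two FASTQ readers are plain generators combined with `zip_longest`, so both files are streamed in lockstep and a length mismatch between R1 and R2 raises an error instead of silently truncating.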
