At the Arc Institute, we use binseq instead of FASTQ files for better performance. It would be great to see binseq supported in STAR-suite as an optional input backend. Some features of binseq:
- Binary two-bit sequence encoding
- Support for single-end and paired-end reads
- Optional storage of qualities and headers
- A blocked/columnar layout
- Per-column ZSTD compression
- Potential lazy/partial decoding of fields that a workflow actually needs
CBQ input could provide up to 2.0x faster input ingestion vs native gzip FASTQ.
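
For readers unfamiliar with two-bit encoding, here is a minimal sketch of the general idea: each of A/C/G/T fits in two bits, so four bases pack into one byte. The bit assignment, the packing order, and the function names `pack_2bit` / `unpack_2bit` are assumptions made for illustration. This is not binseq's actual on-disk layout or API.

```python
# Illustrative only: the bit assignment and packing order below are
# assumptions, not the binseq file format.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= ENCODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


if __name__ == "__main__":
    read = "ACGTACGTA"
    packed = pack_2bit(read)
    print(len(read), "bases ->", len(packed), "bytes")
    print(unpack_2bit(packed, len(read)) == read)
```

Note that a plain two-bit alphabet has no code for `N` or other ambiguity symbols, so this sketch raises `KeyError` on such input. A real reader has to handle those bases some other way.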