Output anndata files #1

@ilan-gold

I'd like to tackle this once I'm back from vacation.

I need to read the STARSolo manual closely to understand what can be handled cleanly and what can't, but I think a good medium-term goal is to output all cell- and feature-alignable information in anndata. Long-term, it would be great to have some sort of high-level "aligner schema" that records how a count matrix was generated, but that is a stretch.

For anndata, there are now readers in most bioinformatics-relevant languages I can think of and the format is well-documented: https://github.com/kaizhang/anndata-rs, https://github.com/ilan-gold/anndata.js, https://github.com/scverse/anndataR, and maybe others I'm missing (Julia?).

Ideally we would output zarr, because it is both the fastest option (ideally with sharding + v3) and the best supported across languages (the main outlier being R, where there is ongoing work: scverse/anndataR#190). Relatedly, hdf5 is not cloud-friendly (i.e., browser or local, and with no prospect of JavaScript support, unlike zarr support in R) and is generally slower from what I have observed (in Rust and Python at least, where multithreading in zarr is the default and works well). However, providing options for both is probably realistic.

The other thing about zarr is that SpatialData only supports zarr.
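
To make the "options for both" idea concrete, here is a rough Python sketch of what a format switch could look like. It is only an illustration: `output_path`, `write_counts`, and the `fmt` option are hypothetical names, not anything this project defines, and the sketch assumes the `anndata`, `pandas`, and `scipy` packages are installed.

```python
from pathlib import Path


def output_path(prefix: str, fmt: str) -> Path:
    """Map a (hypothetical) output-format option to a file or store path."""
    suffixes = {"zarr": ".zarr", "h5ad": ".h5ad"}
    if fmt not in suffixes:
        raise ValueError(f"unsupported format {fmt!r}; expected 'zarr' or 'h5ad'")
    return Path(prefix + suffixes[fmt])


def write_counts(counts, barcodes, features, prefix, fmt="zarr"):
    """Wrap a cells x features count matrix in AnnData and write it out.

    Assumption: `counts` is anything scipy can turn into a CSR matrix,
    with one row per barcode and one column per feature.
    """
    # Imported lazily so the path logic above works without these packages.
    import anndata as ad
    import pandas as pd
    from scipy import sparse

    adata = ad.AnnData(
        X=sparse.csr_matrix(counts),
        obs=pd.DataFrame(index=barcodes),  # per-cell annotations go here
        var=pd.DataFrame(index=features),  # per-feature annotations go here
    )
    path = output_path(prefix, fmt)
    if fmt == "zarr":
        adata.write_zarr(path)
    else:
        adata.write_h5ad(path)
    return path
```

Anything alignable to cells would be added as columns of `obs`, and anything alignable to features as columns of `var`, before writing; the switch itself stays a two-line branch.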
