cellarr-se

cellarr-se is a read-only, out-of-core coordinator for TileDB-backed genomic datasets. It wraps the cellarr-array and cellarr-frame primitives in a lazy, SummarizedExperiment-compatible interface, so you can slice large datasets stored on disk without loading them fully into memory.

Single-cell and bulk RNA-seq datasets frequently exceed available RAM. cellarr-se keeps assay matrices and metadata tables on disk as TileDB arrays and reads data only when you request a slice, applying the same row and column selection to every assay and metadata table. The result is always a standard in-memory SummarizedExperiment object.
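The synchronization idea can be sketched in plain Python. This is a minimal illustration, not cellarr-se's implementation: `LazyMatrix`, `synchronized_slice`, and `from_nested` are hypothetical names, and nested lists stand in for TileDB arrays. The point is that one set of row and column positions is applied to every assay and to both metadata tables, and data is read only when the slice is requested.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Block = List[List[float]]


@dataclass
class LazyMatrix:
    """Hypothetical stand-in for an on-disk assay: holds a reader, not data."""
    reader: Callable[[List[int], List[int]], Block]

    def read(self, rows: List[int], cols: List[int]) -> Block:
        # I/O happens here, and only for the requested rows and columns.
        return self.reader(rows, cols)


def synchronized_slice(assays: Dict[str, LazyMatrix], row_meta: List[dict],
                       col_meta: List[dict], rows: List[int], cols: List[int]) -> dict:
    """Apply one (rows, cols) selection to every assay and metadata table."""
    return {
        "assays": {name: m.read(rows, cols) for name, m in assays.items()},
        "row_data": [row_meta[i] for i in rows],
        "col_data": [col_meta[j] for j in cols],
    }


def from_nested(data: Block, log: list) -> LazyMatrix:
    """Wrap an in-memory nested list and record every read in `log`."""
    def reader(rows, cols):
        log.append((tuple(rows), tuple(cols)))
        return [[data[i][j] for j in cols] for i in rows]
    return LazyMatrix(reader)


# Usage: two assays over the same 2 genes x 3 samples.
calls: list = []
assays = {
    "counts": from_nested([[1, 2, 3], [4, 5, 6]], calls),
    "tpm": from_nested([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], calls),
}
assert calls == []  # wrapping the assays read nothing
result = synchronized_slice(
    assays,
    row_meta=[{"gene_id": "g1"}, {"gene_id": "g2"}],
    col_meta=[{"sample": "s1"}, {"sample": "s2"}, {"sample": "s3"}],
    rows=[1],
    cols=[0, 2],
)
```

Because the positions are resolved once and shared, the assays and metadata in the result cannot drift out of alignment.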

Install

pip install cellarr-se

Usage

Construction

CellArraySE wraps existing TileDB arrays and frames; it does not create them. Use cellarr-array and cellarr-frame to build the backing stores first.

from cellarr_se import CellArraySE

se = CellArraySE(
    assays={"counts": my_cell_array, "tpm": my_tpm_array},
    row_data=my_row_frame,   # gene annotations (CellArrayFrame)
    col_data=my_col_frame,   # sample annotations (CellArrayFrame)
)
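Because every slice is applied to all components at once, the assays and metadata tables have to agree on dimensions. The helper below is a hypothetical sketch of such a consistency check; it is an assumption that CellArraySE validates its inputs this way, and the sketch works on plain shapes and record counts rather than real TileDB objects.

```python
from typing import Dict, Tuple


def validate_components(assay_shapes: Dict[str, Tuple[int, int]],
                        n_row_records: int,
                        n_col_records: int) -> Tuple[int, int]:
    """Return the shared (n_rows, n_cols), or raise ValueError on a mismatch.

    Hypothetical helper: assay_shapes maps assay name -> (rows, cols);
    n_row_records / n_col_records are the lengths of row_data / col_data.
    """
    if not assay_shapes:
        raise ValueError("at least one assay is required")
    shapes = set(assay_shapes.values())
    if len(shapes) != 1:
        raise ValueError(f"assays disagree on shape: {assay_shapes}")
    n_rows, n_cols = shapes.pop()
    if n_row_records != n_rows:
        raise ValueError(f"row_data has {n_row_records} records, assays have {n_rows} rows")
    if n_col_records != n_cols:
        raise ValueError(f"col_data has {n_col_records} records, assays have {n_cols} columns")
    return n_rows, n_cols


# Usage: matches the shapes in the inspection example.
shape = validate_components({"counts": (20000, 500), "tpm": (20000, 500)}, 20000, 500)
```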

Inspection

se.shape          # (n_genes, n_samples)
se.assay_names    # ["counts", "tpm"]
se.row_names      # pd.Index of gene identifiers
se.col_names      # pd.Index of sample identifiers
se.row_columns    # list of gene metadata fields
se.col_columns    # list of sample metadata fields

se.show()         # print a summary with the first 5 rows of each metadata table
repr(se)          # <CellArraySE: 20000x500 | counts, tpm>

Slicing

Bracket notation supports integer indices, slices, name strings, and lists:

# Positional slice
subset = se[0:100, 0:50]

# Single gene and single sample, by position
single = se[5, 3]

# Lists of indices or names
subset = se[["BRCA1", "TP53"], ["sample_001", "sample_042"]]

For attribute-filtered access, use slice() with TileDB query strings:

# Filter rows and columns by metadata attributes
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_query="tissue == 'liver'",
)

# Combine a row query with positional column selection, and restrict the assays and row metadata fields returned
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_subset=slice(0, 50),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)
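In cellarr-se the query strings are evaluated by TileDB against the metadata arrays. To illustrate what a row or column query resolves to, here is a toy evaluator that handles only `field == 'value'` clauses joined by `and`; `evaluate_query` is hypothetical and does not reproduce TileDB's query-condition grammar.

```python
import re
from typing import Dict, List

# Matches a single clause such as: gene_type == 'protein_coding'
_CLAUSE = re.compile(r"^\s*(\w+)\s*==\s*'([^']*)'\s*$")


def evaluate_query(query: str, records: List[Dict[str, str]]) -> List[int]:
    """Return positions of records satisfying every `field == 'value'` clause.

    Toy evaluator: only equality clauses joined by `and` are supported,
    and quoted values must not themselves contain ' and '.
    """
    clauses = []
    for part in re.split(r"\s+and\s+", query.strip()):
        m = _CLAUSE.match(part)
        if m is None:
            raise ValueError(f"unsupported clause: {part!r}")
        clauses.append((m.group(1), m.group(2)))
    return [
        i for i, rec in enumerate(records)
        if all(rec.get(field) == value for field, value in clauses)
    ]


row_records = [
    {"gene_id": "g1", "gene_type": "protein_coding"},
    {"gene_id": "g2", "gene_type": "lncRNA"},
    {"gene_id": "g3", "gene_type": "protein_coding"},
]
rows = evaluate_query("gene_type == 'protein_coding'", row_records)
```

In a coordinator like this, the resulting row positions would be paired with the column positions from col_subset (for example `list(range(n_cols))[slice(0, 50)]`) before the requested assays are read.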

Both se[...] and se.slice(...) return a standard in-memory SummarizedExperiment.

Assay metadata

se.is_sparse("counts")        # True if backed by SparseCellArray
se.get_assay_type("counts")   # numpy dtype of the assay
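Both methods can presumably be answered from the backing objects' types and schemas without reading any cells. A hypothetical sketch with toy classes; `ToyDenseArray`, `ToySparseArray`, and `ToyAssayRegistry` are illustrative names, not cellarr-array classes, and dtypes are plain strings to keep the example dependency-free:

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class ToyDenseArray:
    dtype: str  # e.g. "float64"; stands in for a numpy dtype


@dataclass
class ToySparseArray:
    dtype: str


Backing = Union[ToyDenseArray, ToySparseArray]


class ToyAssayRegistry:
    """Answers assay metadata questions from the backing objects alone."""

    def __init__(self, assays: Dict[str, Backing]):
        self._assays = assays

    def _get(self, name: str) -> Backing:
        if name not in self._assays:
            raise KeyError(f"unknown assay: {name!r}")
        return self._assays[name]

    def is_sparse(self, name: str) -> bool:
        # Decided by the backing type; no cell data is touched.
        return isinstance(self._get(name), ToySparseArray)

    def get_assay_type(self, name: str) -> str:
        return self._get(name).dtype


registry = ToyAssayRegistry({"counts": ToySparseArray("int32"), "tpm": ToyDenseArray("float64")})
```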

Demo

A worked example covering construction, inspection, and slicing is available in the demo notebook.

Note

This project has been set up using BiocSetup and PyScaffold.
