diff --git a/404.html b/404.html
index 7a24aa47..6415dcbe 100644
--- a/404.html
+++ b/404.html
@@ -77,7 +77,7 @@

Page not found (404)

diff --git a/articles/index.html b/articles/index.html
index fbfc6fa7..6d6fecda 100644
--- a/articles/index.html
+++ b/articles/index.html
@@ -73,7 +73,7 @@

Additional articles

diff --git a/authors.html b/authors.html
index f0599b2b..11253fc1 100644
--- a/authors.html
+++ b/authors.html
@@ -69,13 +69,13 @@

Citation

Parks B, Abdi I (2025). BPCells: Single Cell Counts Matrices to PCA.
-R package version 0.3.1, https://github.com/bnprks/BPCells, https://bnprks.github.io/BPCells.
+R package version 0.3.1, https://bnprks.github.io/BPCells.

@Manual{,
   title = {BPCells: Single Cell Counts Matrices to PCA},
   author = {Benjamin Parks and Immanuel Abdi},
   year = {2025},
-  note = {R package version 0.3.1, https://github.com/bnprks/BPCells},
+  note = {R package version 0.3.1},
   url = {https://bnprks.github.io/BPCells},
 }
@@ -89,7 +89,7 @@

Citation

diff --git a/index.html b/index.html
index 89ef73cb..201dc877 100644
--- a/index.html
+++ b/index.html
@@ -197,10 +197,10 @@

Citation

Developers

@@ -215,7 +215,7 @@

Developers

diff --git a/news/index.html b/news/index.html
index a30aa69a..b0874988 100644
--- a/news/index.html
+++ b/news/index.html
@@ -58,6 +58,7 @@

Features

Bug-fixes

To-dos

@@ -86,7 +87,7 @@

Improvements
  • Export merge_peaks_iterative(), which helps create non-overlapping peak sets. (pull request #216)
  • Add support for uint16_t when reading in anndata matrices using open_matrix_anndata_hdf5(). (pull request #248)
  • Switch write_matrix_10x_hdf5() to use signed rather than unsigned integers for indices, indptr, and shape to improve compatibility with 10x-produced files. (Thanks to @ycli1995 for pull request #256)
-
  • Change behaviour during cbind() and rbind() when matrices are of different types, to upcast instead of erroring out. (pull request #265)
+
  • Change behaviour during cbind() and rbind() when matrices are of different types, to upcast instead of erroring out. (pull request #265)

Bug-fixes

@@ -239,7 +240,7 @@

Bug-fixes
  • #26 reported thanks to @ttumkaya)
-
  • Renaming rownames() or colnames() is now propagated when saving matrices (Issue #29 reported thanks to @realzehuali, with an additional fix after report thanks to @Dario-Rocha)
+
  • Renaming rownames() or colnames() is now propagated when saving matrices (Issue #29 reported thanks to @realzehuali, with an additional fix after report thanks to @Dario-Rocha)
  • Fixed 64-bit integer overflow (!) that could cause incorrect p-value calculations in marker_features() for features with more than 2.6 million zeros.
  • Improved robustness of the Windows installation process for setups that do not need the -lsz linker flag to compile hdf5
  • Fixed possible memory safety bug where wrapped R objects (such as dgCMatrix) could be potentially garbage collected while C++ was still trying to access the data in rare circumstances.
@@ -312,7 +313,7 @@

Features

-Site built with pkgdown 2.1.1.
+Site built with pkgdown 2.2.0.

diff --git a/reference/index.html b/reference/index.html
index 89b9c171..a948d300 100644
--- a/reference/index.html
+++ b/reference/index.html
@@ -47,14 +47,12 @@

    ATAC-seq Fragments +

    Fragment I/O

    - -
    +
    @@ -65,13 +63,15 @@

    Fragment I/O
    write_fragments_memory() write_fragments_dir() open_fragments_dir() write_fragments_hdf5() open_fragments_hdf5()
    Read/write BPCells fragment objects
    -
    + +
    convert_to_fragments() as() as.data.frame()
    @@ -82,8 +82,7 @@

    Fragment I/O
    ATAC Analysis

    - -

    +
    @@ -94,55 +93,64 @@

    ATAC Analysis
    nucleosome_counts()
    Count fragments by nucleosomal size
    -
    + +
    footprint()
    Get footprints around a set of genomic coordinates
    -
    + +
    peak_matrix()
    Calculate ranges x cells overlap matrix
    -
    + +
    tile_matrix()
    Calculate ranges x cells tile overlap matrix
    -
    + +
    gene_score_weights_archr() gene_score_archr()
    Calculate GeneActivityScores
    -
    + +
    call_peaks_macs()
    Call peaks using MACS2/3
    -
    + +
    call_peaks_tile()
    Call peaks from tiles
    -
    + +
    merge_peaks_iterative()
    Merge peaks
    -
    + +
    write_insertion_bedgraph() write_insertion_bed()
    @@ -153,8 +161,7 @@

    ATAC Analysis
    Fragment Operations

    - -

    +
    @@ -165,49 +172,57 @@

    Fragment Operations
    select_chromosomes()
    Subset, translate, or reorder chromosome IDs
    -
    + +
    select_cells()
    Subset, translate, or reorder cell IDs
    -
    + +
    merge_cells()
    Merge cells into pseudobulks
    -
    + +
    subset_lengths()
    Subset fragments by length
    -
    + +
    select_regions()
    Subset fragments by genomic region
    -
    + +
    prefix_cell_names()
    Add sample prefix to cell names
    -
    + +
    show(<IterableFragments>) cellNames() `cellNames<-`() chrNames() `chrNames<-`()
    IterableFragments methods
    -
    + +
    fragments_identical()
    @@ -218,8 +233,7 @@

    Fragment Operations
    Genomic Range Calculations

    - -

    +
    @@ -230,31 +244,36 @@

    Genomic Range Calculations
    Genomic range formats
    -
    + +
    order_ranges()
    Get end-sorted ordering for genome ranges
    -
    + +
    range_distance_to_nearest()
    Find signed distance to nearest genomic ranges
    -
    + +
    extend_ranges()
    Extend genome ranges in a strand-aware fashion.
    -
    + +
    gene_score_tiles_archr()
    Calculate gene-tile distances for ArchR gene activities
    -
    + +
    normalize_ranges()
    @@ -265,14 +284,12 @@

    Matrix Operations (RNA + ATAC) +

    Matrix I/O

    - -
    +
    @@ -283,25 +300,29 @@

    Matrix I/O
    open_matrix_anndata_hdf5() write_matrix_anndata_hdf5() write_matrix_anndata_hdf5_dense()
    Read/write AnnData matrix
    -
    + +
    write_matrix_memory() write_matrix_dir() open_matrix_dir() write_matrix_hdf5() open_matrix_hdf5()
    Read/write sparse matrices
    -
    + +
    import_matrix_market() import_matrix_market_10x()
    Import MatrixMarket files
    -
    + +
    as() as.matrix()
    @@ -312,8 +333,7 @@

    Matrix I/O
    Matrix Operations

    - -

    +
    @@ -324,85 +344,99 @@

    Matrix Operations
    matrix_stats()
    Calculate matrix stats
    -
    + +
    svds()
    Calculate svds
    -
    + +
    convert_matrix_type()
    Convert the type of a matrix
    -
    + +
    transpose_storage_order()
    Transpose the storage order for a matrix
    -
    + +
    sctransform_pearson()
    SCTransform Pearson Residuals
    -
    + +
    min_scalar() min_by_row() min_by_col()
    Elementwise minimum
    -
    + +
    add_rows() add_cols() multiply_rows() multiply_cols()
    Broadcasting vector arithmetic
    -
    + +
    binarize()
    Convert matrix elements to zeros and ones
    -
    + +
    all_matrix_inputs() `all_matrix_inputs<-`()
    Get/set inputs to a matrix transform
    -
    + +
    checksum()
    Calculate the MD5 checksum of an IterableMatrix
    -
    + +
    apply_by_row() apply_by_col()
    Apply a function to summarize rows/cols
    -
    + +
    regress_out()
    Regress out unwanted variation
    -
    + +
    matrix_type() storage_order() show(<IterableMatrix>) t(<IterableMatrix>) `%*%`(<IterableMatrix>,<matrix>) rowSums(<IterableMatrix>) colSums(<IterableMatrix>) rowMeans(<IterableMatrix>) colMeans(<IterableMatrix>) colVars() rowVars() rowMaxs() colMaxs() rowQuantiles() colQuantiles() log1p(<IterableMatrix>) log1p_slow() expm1(<IterableMatrix>) expm1_slow() `^`(<IterableMatrix>,<numeric>) `<`(<numeric>,<IterableMatrix>) `>`(<IterableMatrix>,<numeric>) `<=`(<numeric>,<IterableMatrix>) `>=`(<IterableMatrix>,<numeric>) round(<IterableMatrix>) `*`(<IterableMatrix>,<numeric>) `+`(<IterableMatrix>,<numeric>) `/`(<IterableMatrix>,<numeric>) `-`(<IterableMatrix>,<numeric>)
    IterableMatrix methods
    -
    + +
    pseudobulk_matrix()
    @@ -413,8 +447,7 @@

    Reference Annotations
    +

    + +
    match_gene_symbol() canonical_gene_symbol()
    Gene symbol matching
    -
    + +
    read_gtf() read_gencode_genes() read_gencode_transcripts()
    Read GTF gene annotations
    -
    + +
    read_bed() read_encode_blacklist()
    Read a bed file into a data frame
    -
    + +
    read_ucsc_chrom_sizes()
    @@ -454,8 +491,7 @@

    Clustering +

    @@ -466,25 +502,29 @@

    Clustering
    knn_hnsw() knn_annoy()
    Get a knn object from reduced dimensions
    -
    + +
    cluster_graph_leiden() cluster_graph_louvain() cluster_graph_seurat()
    Cluster an adjacency matrix
    -
    + +
    knn_to_graph() knn_to_snn_graph() knn_to_geodesic_graph()
    K Nearest Neighbor (KNN) Graph
    -
    + +
    cluster_membership_matrix()
    @@ -495,14 +535,12 @@

    Plots

    +

    Single cell plots

    - -
    +
    @@ -513,37 +551,43 @@

    Single cell plots
    plot_embedding()

    Plot UMAP or embeddings
    -
    + +
    plot_dot()
    Dotplot
    -
    + +
    plot_fragment_length()
    Fragment size distribution
    -
    + +
    plot_tf_footprint()
    Plot TF footprint
    -
    + +
    plot_tss_profile()
    Plot TSS profile
    -
    + +
    plot_tss_scatter()
    @@ -554,8 +598,7 @@

    Single cell plots
    Genomic track plots

    - -

    +
    @@ -566,43 +609,50 @@

    Genomic track plots
    trackplot_coverage()
    Pseudobulk coverage trackplot
    -
    + +
    trackplot_gene()
    Plot transcript models
    -
    + +
    trackplot_loop()
    Plot loops
    -
    + +
    trackplot_genome_annotation()
    Plot range-based annotation tracks (e.g. peaks)
    -
    + +
    trackplot_scalebar()
    Plot scale bar
    -
    + +
    gene_region()
    Find gene region
    -
    + +
    set_trackplot_label() set_trackplot_height() get_trackplot_height()
    @@ -613,8 +663,7 @@

    Genomic track plots
    Plotting utilities

    - -

    +
    @@ -625,13 +674,15 @@

    Plotting utilities
    collect_features()
    Collect features for plotting
    -
    + +
    rotate_x_labels()
    @@ -642,8 +693,7 @@

    Data

    - -

    +
    @@ -664,7 +714,7 @@

    Data

    diff --git a/search.json b/search.json index 55ecd6e0..d2442d0c 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Basic tutorial","text":"tutorial, : Load RNA ATAC-seq data 10x multiome experiment Filter high-quality cells RNA PCA + UMAP dimensionality reduction Unbiased clustering Visualize marker genes annotate clusters Call ATAC-seq peaks ATAC PCA + UMAP dimensionality reduction Visualize transcription factor footprints Plot accessibility genome tracks tutorial work--progress, inspired Seurat’s PBMC 3k clustering tutorial.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"install-packages","dir":"Articles","previous_headings":"Setup","what":"Install packages","title":"Basic tutorial","text":"Install cran dependencies: irlba (PCA) uwot (UMAP) RcppHNSW (clustering) igraph (clustering) BiocManager (access bioconductor packages) ggplot2 version <=3.3.5 >=3.4.1 (hexbin broken versions 3.3.6-3.4.0) Bioconductor dependencies: BSgenome.Hsapiens.UCSC.hg38 (TF motif scanning) Github: motifmatchr (TF motif scanning) chromVARmotifs (TF motif database)","code":"install.packages(c(\"irlba\", \"uwot\", \"RcppHNSW\", \"igraph\", \"BiocManager\", \"remotes\", \"ggplot2\")) BiocManager::install(\"BSgenome.Hsapiens.UCSC.hg38\") remotes::install_github(c(\"GreenleafLab/motifmatchr\", \"GreenleafLab/chromVARmotifs\"), repos=BiocManager::repositories()) remotes::install_github(c(\"bnprks/BPCells/r\"))"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"set-up-analysis-folder","dir":"Articles","previous_headings":"Setup","what":"Set up analysis folder","title":"Basic tutorial","text":"","code":"library(BPCells) suppressPackageStartupMessages({ library(dplyr) }) # Substitute your preferred working directory for data_dir data_dir <- file.path(tempdir(), 
\"pbmc-3k\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) setwd(data_dir)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"download-data","dir":"Articles","previous_headings":"Setup","what":"Download data","title":"Basic tutorial","text":"Next, download 3k PBMC dataset 10x Genomics temporary directory. files 500MB large combined","code":"url_base <- \"https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/\" rna_raw_url <- paste0(url_base, \"pbmc_granulocyte_sorted_3k_raw_feature_bc_matrix.h5\") atac_raw_url <- paste0(url_base, \"pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz\") # Increase download timeout from 60 seconds to 5 minutes options(timeout=300) # Only download files if we haven't downloaded already if (!file.exists(\"pbmc_3k_10x.h5\")) { download.file(rna_raw_url, \"pbmc_3k_10x.h5\", mode=\"wb\") } if (!file.exists(\"pbmc_3k_10x.fragments.tsv.gz\")) { download.file(atac_raw_url, \"pbmc_3k_10x.fragments.tsv.gz\", mode=\"wb\") }"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"data-loading","dir":"Articles","previous_headings":"","what":"Data Loading","title":"Basic tutorial","text":"First, convert raw data inputs 10x format bitpacked compressed format stored binary files disk. BPCells can still read data don’t convert format, certain ATAC-seq functionality run much faster converted data. Convert RNA matrix: Convert ATAC-seq fragments ATAC storage space dropped 468 MB gzipped 10x file 209 MB bitpacked storage. RNA storage space dropped 51.2 MB 10x hdf5 file gzip compression 33.5 MB using bitpacking compression. case, storage space little misleading since 39% bitpacked storage spent gene + cell names. 
case 10x compressed hdf5 bitpacking compression 4-6x smaller uncompressed sparse matrix format AnnData h5ad’s default.","code":"# Check if we already ran import if (!file.exists(\"pbmc_3k_rna_raw\")) { mat_raw <- open_matrix_10x_hdf5(\"pbmc_3k_10x.h5\", feature_type=\"Gene Expression\") %>% write_matrix_dir(\"pbmc_3k_rna_raw\") } else { mat_raw <- open_matrix_dir(\"pbmc_3k_rna_raw\") } mat_raw ## 36601 x 650165 IterableMatrix object with class MatrixDir ## ## Row names: ENSG00000243485, ENSG00000237613 ... ENSG00000277196 ## Col names: AAACAGCCAAACAACA-1, AAACAGCCAAACATAG-1 ... TTTGTTGGTTTGTTGC-1 ## ## Data type: uint32_t ## Storage order: column major ## ## Queued Operations: ## 1. Load compressed matrix from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_rna_raw # Check if we already ran import if (!file.exists(\"pbmc_3k_frags\")) { frags_raw <- open_fragments_10x(\"pbmc_3k_10x.fragments.tsv.gz\") %>% write_fragments_dir(\"pbmc_3k_frags\") } else { frags_raw <- open_fragments_dir(\"pbmc_3k_frags\") } frags_raw ## IterableFragments object of class \"FragmentsDir\" ## ## Cells: 462264 cells with names TTTAGCAAGGTAGCTT-1, GCCTTTGGTTGGTTCT-1 ... ATCACCCTCCATAATG-1 ## Chromosomes: 39 chromosomes with names chr1, chr10 ... KI270713.1 ## ## Queued Operations: ## 1. Read compressed fragments from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_frags"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"rna-seq-filtering","dir":"Articles","previous_headings":"Filter for high-quality cells","what":"RNA-seq filtering","title":"Basic tutorial","text":"use simple minimum read threshold RNA-seq quality. cutoff choose just first knee log-log plot reads vs. 
barcode rank, separates cells empty droplets.","code":"reads_per_cell <- Matrix::colSums(mat_raw) plot_read_count_knee(reads_per_cell, cutoff = 1e3)"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"download-reference-annotations","dir":"Articles","previous_headings":"Filter for high-quality cells > ATAC-seq filtering","what":"Download reference annotations","title":"Basic tutorial","text":"fetch reference information necessary calculate quality-control statistics. default, fetches latest annotations hg38. Since fetching references involves downloading gtf bed files, provide name directory save files . also allows us skip re-downloading files next time.","code":"genes <- read_gencode_transcripts( \"./references\", release=\"42\", transcript_choice=\"MANE_Select\", annotation_set = \"basic\", features=\"transcript\" # Make sure to set this so we don't get exons as well ) head(genes) ## # A tibble: 6 × 13 ## chr source feature start end score strand frame gene_id gene_type ## ## 1 chr1 HAVANA transcript 65418 71585 . + . ENSG000001… protein_… ## 2 chr1 HAVANA transcript 450739 451678 . - . ENSG000002… protein_… ## 3 chr1 HAVANA transcript 685715 686654 . - . ENSG000002… protein_… ## 4 chr1 HAVANA transcript 923922 944574 . + . ENSG000001… protein_… ## 5 chr1 HAVANA transcript 944202 959256 . - . ENSG000001… protein_… ## 6 chr1 HAVANA transcript 960583 965719 . + . 
ENSG000001… protein_… ## # ℹ 3 more variables: gene_name , transcript_id , MANE_Select blacklist <- read_encode_blacklist(\"./references\", genome=\"hg38\") head(blacklist) ## # A tibble: 6 × 4 ## chr start end reason ## ## 1 chr10 0 45700 Low Mappability ## 2 chr10 38481300 38596500 High Signal Region ## 3 chr10 38782600 38967900 High Signal Region ## 4 chr10 39901300 41712900 High Signal Region ## 5 chr10 41838900 42107300 High Signal Region ## 6 chr10 42279400 42322500 High Signal Region chrom_sizes <- read_ucsc_chrom_sizes(\"./references\", genome=\"hg38\") head(chrom_sizes) ## # A tibble: 6 × 3 ## chr start end ## ## 1 chr1 0 248956422 ## 2 chr2 0 242193529 ## 3 chr3 0 198295559 ## 4 chr4 0 190214555 ## 5 chr5 0 181538259 ## 6 chr6 0 170805979"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"calculate-atac-seq-quality-control-metrics","dir":"Articles","previous_headings":"Filter for high-quality cells > ATAC-seq filtering","what":"Calculate ATAC-seq quality-control metrics","title":"Basic tutorial","text":"can calculate several built-quality control metrics barcode, including number fragments TSS enrichment. calculations fully compatible ArchR’s methodology quality control statistics. One key ways identify high-quality cells ATAC-seq data plot number fragments vs. TSS Enrichment. plot puts empty droplets bottom-left quadrant, low-quality/dead cells bottom-right quadrant, high-quality cells top-right quadrant. flow-cytometry perspective, use bottom-left group empty droplets negative control help set cutoffs. Due thresholding ArchR’s formula applies denominator TSS Enrichment calculation, low-read cells can’t assigned high TSS Enrichment value. plot TSS enrichment without thresholding, following: Note 200/101 fraction accounts ReadsInTSS drawing 101-bp windows, ReadsFlankingTSS drawing 2x100-bp windows. results low-read droplets measuring high TSS Enrichment, use slightly adjusted cutoffs. can also plot sample-level quality control plots. 
left, fragment length distribution shows three broad bumps corresponding nucleosome spacing (147bp), smaller wiggles corresponding DNA winding (11.5bp). right, TSS enrichment profile shows strong enrichment signal transcription start sites, well small asymmetrical bump downstream TSS +1 nucleosome.","code":"atac_qc <- qc_scATAC(frags_raw, genes, blacklist) head(atac_qc) ## # A tibble: 6 × 10 ## cellName TSSEnrichment nFrags subNucleosomal monoNucleosomal multiNucleosomal ## ## 1 TTTAGCAA… 45.1 16363 8069 5588 2706 ## 2 GCCTTTGG… 0.198 3 1 2 0 ## 3 AGCCGGTT… 30.9 33313 15855 11868 5590 ## 4 TGATTAGT… 41.9 11908 6103 3817 1988 ## 5 ATTGACTC… 43.9 13075 6932 4141 2002 ## 6 CGTTAGGT… 31.5 14874 6833 5405 2636 ## # ℹ 4 more variables: ReadsInTSS , ReadsFlankingTSS , ## # ReadsInPromoter , ReadsInBlacklist plot_tss_scatter(atac_qc, min_frags=1000, min_tss=10) atac_qc %>% dplyr::mutate(TSSEnrichment=ReadsInTSS/pmax(1,ReadsFlankingTSS) * 200/101) %>% plot_tss_scatter(min_frags=2000, min_tss=20) + ggplot2::labs(title=\"Raw TSS Enrichment\") plot_fragment_length(frags_raw) + plot_tss_profile(frags_raw, genes)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"select-high-quality-cells","dir":"Articles","previous_headings":"Filter for high-quality cells","what":"Select high-quality cells","title":"Basic tutorial","text":"take cells pass minimum RNA reads, minimum ATAC reads, minimum TSS Enrichment cutoffs. subset RNA ATAC input data just cells passing filter. RNA, subset genes least 3 reads. 
subset operation also puts cells matching order simplifies cross-modality calculations later .","code":"pass_atac <- atac_qc %>% dplyr::filter(nFrags > 1000, TSSEnrichment > 10) %>% dplyr::pull(cellName) pass_rna <- colnames(mat_raw)[Matrix::colSums(mat_raw) > 1e3] keeper_cells <- intersect(pass_atac, pass_rna) frags <- frags_raw %>% select_cells(keeper_cells) keeper_genes <- Matrix::rowSums(mat_raw) > 3 mat <- mat_raw[keeper_genes,keeper_cells]"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"matrix-normalization","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"Matrix normalization","title":"Basic tutorial","text":", walk Seurat-style matrix normalization calculations manually, though soon helper functions simplify process. First log-normalize, roughly equivalent Seurat::NormalizeData Next pick variable genes: look normalized matrix object, can see quite math operations queued performed --fly needed. improve performance downstream PCA, save sparse normalized matrix temporary file just prior normalizations make matrix dense. saves storage space preventing us re-calculate queued operations several-hundred times PCA optimization iterations. case, matrix quite small ’ll just store memory. larger example swap write_matrix_dir(tempfile(\"mat\")) Finally, perform z-score normalization makes matrix dense.","code":"# Normalize by reads-per-cell mat <- multiply_cols(mat, 1/Matrix::colSums(mat)) # Log normalization mat <- log1p(mat * 10000) # Log normalization stats <- matrix_stats(mat, row_stats=\"variance\") # To keep the example small, we'll do a very naive variable gene selection variable_genes <- order(stats$row_stats[\"variance\",], decreasing=TRUE) %>% head(1000) %>% sort() mat_norm <- mat[variable_genes,] mat_norm ## 1000 x 2600 IterableMatrix object with class TransformLog1p ## ## Row names: ENSG00000078369, ENSG00000116251 ... ENSG00000212907 ## Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... 
TACTAAGTCCAATAGC-1 ## ## Data type: double ## Storage order: column major ## ## Queued Operations: ## 1. Load compressed matrix from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_rna_raw ## 2. Select rows: 87, 171 ... 36568 and cols: 640783, 89020 ... 504383 ## 3. Convert type from uint32_t to double ## 4. Scale by 1e+04 ## 5. Scale columns by 0.000221, 0.000118 ... 0.000177 ## 6. Transform log1p mat_norm <- mat_norm %>% write_matrix_memory(compress=FALSE) gene_means <- stats$row_stats[\"mean\",variable_genes] gene_vars <- stats$row_stats[\"variance\", variable_genes] mat_norm <- (mat_norm - gene_means) / gene_vars"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"pca-and-umap","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"PCA and UMAP","title":"Basic tutorial","text":"PCA can performed standard solver like irlba, though BPCells also provides C++-level solver based Spectra package built-parallelization support. Next calculate UMAP coordinates","code":"svd <- BPCells::svds(mat_norm, k=50) # Alternate option: irlba::irlba(mat_norm, nv=50) pca <- multiply_cols(svd$v, svd$d) cat(sprintf(\"PCA dimensions: %s\\n\", toString(dim(pca)))) pca[1:4,1:3] ## PCA dimensions: 2600, 50 ## [,1] [,2] [,3] ## [1,] 15.167732 0.8951489 -2.3650024 ## [2,] 6.599775 7.2484737 4.4369185 ## [3,] 14.621697 -1.1929478 -0.6439662 ## [4,] 8.142875 1.0977223 -2.5066235 set.seed(12341512) umap <- uwot::umap(pca) umap[1:4,] ## [,1] [,2] ## [1,] 10.280173 3.032208 ## [2,] -1.513292 -9.496552 ## [3,] 10.052034 3.215374 ## [4,] 8.457198 0.609063"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"clustering","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"Clustering","title":"Basic tutorial","text":"perform quick clustering follows, based PCA coordinates. 
now can visualize clusters UMAP:","code":"clusts <- knn_hnsw(pca, ef=500) %>% # Find approximate nearest neighbors knn_to_snn_graph() %>% # Convert to a SNN graph cluster_graph_louvain() # Perform graph-based clustering cat(sprintf(\"Clusts length: %s\\n\", length(clusts))) clusts[1:10] ## Clusts length: 2600 ## [1] 1 2 1 2 2 3 2 2 4 5 ## Levels: 1 2 3 4 5 6 7 8 9 10 11 12 plot_embedding(clusts, umap)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"visualize-marker-genes","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"Visualize marker genes","title":"Basic tutorial","text":"annotate clusters cell types, can plot several marker genes overlaid onto UMAP. observe cluster-specific enrichment B-cell marker MS4A1, T-cell marker CD3E, Monocyte marker LYZ. allows us make broad cell type groupings follows: can visualize marker genes cluster using dot plot. typical situations, known marker genes clear, others less specific.","code":"plot_embedding( source = mat, umap, features = c(\"MS4A1\", \"GNLY\", \"CD3E\", \"CD14\", \"FCER1A\", \"FCGR3A\", \"LYZ\", \"CD4\",\"CD8\"), ) cluster_annotations <- c( \"1\" = \"T\", \"2\" = \"CD8 T\", \"3\" = \"B\", \"4\" = \"T\", \"5\" = \"NK\", \"6\" = \"Mono\", \"7\" = \"Mono\", \"8\" = \"Mono\", \"9\" = \"T\", \"10\" = \"DC\", \"11\" = \"Mono\", \"12\" = \"DC\" ) cell_types <- cluster_annotations[clusts] plot_embedding(cell_types, umap) plot_dot( mat, c(\"MS4A1\", \"GNLY\", \"CD3E\", \"CD14\", \"FCER1A\", \"FCGR3A\", \"LYZ\", \"CD4\", \"CD8\"), cell_types )"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"atac-normalization-pca-and-umap","dir":"Articles","previous_headings":"","what":"ATAC Normalization, PCA and UMAP","title":"Basic tutorial","text":"start tile-based peak calling, tests pre-determined overlapping tile positions significant enrichment ATAC-seq signal genome-wide background cell type independently. 
faster using traditional peak-caller like MACS, though default parameters peaks always 200bp wide positioning resolution approximately +/- 30bp. Next compute peak matrix counting many ATAC-seq insertions overlap peak. save memory rather saving disk since dataset quite small. Next calculate TF-IDF normalization. formula TF-IDF variant Stuart et al. Looking LSI matrix, can see power BPCells performing matrix operations --fly: LSI normalization fact calculated time fragment overlap calculations read matrix. don’t need store intermediate matrices calculations, even peak matrix can re-calculated --fly based fragments object saved disk. Just like RNA, save matrix running PCA. larger dataset, save disk rather memory. Finally, z-score normalization LSI matrix run PCA. standard practice running PCA, commonly done ATAC-seq datasets due fact greatly increases memory usage. methods, 1st PC highly correlated number reads per cell, thrown empirical correction. Luckily, BPCells can avoid memory usage can just normalize data run PCA usual Next calculate UMAP cluster, just like RNA can plot ATAC-seq embedding ATAC-derived clusters, easily compare RNA-derived clusters earlier. BPCells works based order cells matrix fragment object. Since ATAC PCA rows cell order RNA clusters, datasets combine additional work. skip normalization, first observe get high correlation first PC reads-per-cell peak matrix terms actual PCA results, can see cell embeddings mostly 1--1 correspondence across first 6 PCs, though later PCs start diverge. first PC raw TF-IDF corresponds mostly read depth, signal spread across 2 PCs z-score normalized variant. look loading peak PCA, see similar result. 
Finally, UMAP generated exclude first PC fairly similar, though notable difference positioning dendritic cells","code":"frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE) peaks <- call_peaks_tile(frags_filter_blacklist, chrom_sizes, cell_groups=cell_types, effective_genome_size = 2.8e9) head(peaks) ## # A tibble: 6 × 7 ## chr start end group p_val q_val enrichment ## ## 1 chr1 16644600 16644800 T 0 0 1017. ## 2 chr19 18281733 18281933 Mono 0 0 518. ## 3 chr17 81860866 81861066 DC 7.87e- 63 1.21e- 55 552. ## 4 chr1 1724333 1724533 Mono 0 0 512. ## 5 chr1 228140000 228140200 NK 2.77e-162 2.14e-155 842. ## 6 chr8 30083133 30083333 CD8 T 4.80e-220 7.41e-213 744. top_peaks <- head(peaks, 50000) top_peaks <- top_peaks[order_ranges(top_peaks, chrNames(frags)),] peak_mat <- peak_matrix(frags, top_peaks, mode=\"insertions\") mat_lsi <- peak_mat %>% multiply_cols(1 / Matrix::colSums(peak_mat)) %>% multiply_rows(1 / Matrix::rowMeans(peak_mat)) mat_lsi <- log1p(10000 * mat_lsi) mat_lsi ## 50000 x 2600 IterableMatrix object with class TransformLog1p ## ## Row names: chr1:817200-817400, chr1:827466-827666 ... chrX:155881200-155881400 ## Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 ## ## Data type: double ## Storage order: row major ## ## Queued Operations: ## 1. Read compressed fragments from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_frags ## 2. Select 2600 cells by name: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 ## 3. Calculate 2600 peaks over 50000 ranges: chr1:817201-817400 ... chrX:155881201-155881400 ## 4. Convert type from uint32_t to double ## 5. Scale by 1e+04 ## 6. Scale columns by 6.71e-05, 5.17e-05 ... 0.000801 ## 7. Scale rows by 11.1, 2.78 ... 3.77 ## 8. 
Transform log1p mat_lsi <- write_matrix_memory(mat_lsi, compress=FALSE) # Compute colMean and colVariance in one pass cell_peak_stats <- matrix_stats(mat_lsi, col_stats=\"variance\")$col_stats cell_means <- cell_peak_stats[\"mean\",] cell_vars <- cell_peak_stats[\"variance\",] mat_lsi_norm <- mat_lsi %>% add_cols(-cell_means) %>% multiply_cols(1 / cell_vars) svd_atac <- BPCells::svds(mat_lsi_norm, k=10) pca_atac <- multiply_cols(svd_atac$v, svd_atac$d) pca_atac[1:4,1:4] ## [,1] [,2] [,3] [,4] ## [1,] -103.64071 1.553515 2.436147 21.22977 ## [2,] -44.75342 -28.737622 -12.681591 -10.01745 ## [3,] -90.74857 3.266168 3.627660 12.95109 ## [4,] -90.74640 -6.447103 6.853840 -15.76629 set.seed(12341512) umap_atac <- uwot::umap(pca_atac) umap_atac[1:4,] ## [,1] [,2] ## [1,] 7.341801 4.20816287 ## [2,] 5.417667 -0.05818889 ## [3,] 4.133997 9.12835273 ## [4,] 8.743609 0.89240464 clusts_atac <- knn_hnsw(pca_atac, ef=500) %>% # Find approximate nearest neighbors knn_to_snn_graph() %>% # Convert to a SNN graph cluster_graph_louvain() # Perform graph-based clustering plot_embedding(clusts_atac, umap_atac, colors_discrete = discrete_palette(\"ironMan\")) + ggplot2::guides(color=\"none\") + plot_embedding(cell_types, umap_atac) svd_atac_no_norm <- BPCells::svds(mat_lsi, k=10) pca_atac_no_norm <- multiply_cols(svd_atac_no_norm$v, svd_atac$d) cor_to_depth <- dplyr::bind_rows( tibble::tibble( method=\"z-score normalize\", abs_cor_to_depth = as.numeric(abs(cor(Matrix::colSums(mat_lsi), pca_atac))), PC=seq_along(abs_cor_to_depth) ), tibble::tibble( method=\"raw TF-IDF\", abs_cor_to_depth = as.numeric(abs(cor(Matrix::colSums(mat_lsi), pca_atac_no_norm))), PC=seq_along(abs_cor_to_depth) ) ) ggplot2::ggplot(cor_to_depth, ggplot2::aes(PC, abs_cor_to_depth, color=method)) + ggplot2::geom_point() + ggplot2::theme_bw() + ggplot2::labs(title=\"Correlation to of PCs to read depth\") cor_between_embeddings <- tidyr::expand_grid( pca_atac_no_norm = seq_len(ncol(pca_atac_no_norm)), 
pca_atac=seq_len(ncol(pca_atac)) ) %>% mutate( cor = as.numeric(abs(cor(.env$pca_atac, .env$pca_atac_no_norm))) ) ggplot2::ggplot(cor_between_embeddings, ggplot2::aes(pca_atac, pca_atac_no_norm, fill=abs(cor))) + ggplot2::geom_tile() + ggplot2::geom_text(mapping=ggplot2::aes(label=sprintf(\"%.2f\", cor))) + ggplot2::scale_x_continuous(breaks=1:10) + ggplot2::scale_y_continuous(breaks=1:10) + ggplot2::theme_classic() + ggplot2::labs(title=\"Correlation between cell embeddings\", x=\"z-score normalize PCs\", y =\"raw TF-IDF PCs\") cor_between_loadings <- tidyr::expand_grid( pca_atac_no_norm = seq_len(ncol(svd_atac_no_norm$u)), pca_atac=seq_len(ncol(svd_atac$u)) ) %>% mutate( cor = as.numeric(abs(cor(.env$svd_atac$u, .env$svd_atac_no_norm$u))) ) ggplot2::ggplot(cor_between_loadings, ggplot2::aes(pca_atac, pca_atac_no_norm, fill=abs(cor))) + ggplot2::geom_tile() + ggplot2::geom_text(mapping=ggplot2::aes(label=sprintf(\"%.2f\", cor))) + ggplot2::scale_x_continuous(breaks=1:10) + ggplot2::scale_y_continuous(breaks=1:10) + ggplot2::theme_classic() + ggplot2::labs(title=\"Correlation between peak loadings\", x=\"z-score normalize PCs\", y =\"raw TF-IDF PCs\") set.seed(12341512) umap_atac_no_norm <- uwot::umap(pca_atac_no_norm[,-1]) plot_embedding(clusts_atac, umap_atac_no_norm, colors_discrete = discrete_palette(\"ironMan\")) + ggplot2::guides(color=\"none\") + plot_embedding(cell_types, umap_atac_no_norm)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"motif-footprinting","dir":"Articles","previous_headings":"","what":"Motif footprinting","title":"Basic tutorial","text":"motif footprinting, first need find instances motifs--interest peaks Next, can use motif positions plot aggregate accessibility surrounding TF binding sites across cell types proxy TF activity. ’re able see enrichment accessibility neighboring sites myeloid transcription factor DC Monocyte cells. 
Transcription factor binding (generally) mutually-exclusive nucleosome occupancy, transcription factor bound creates accessibility flanking regions. squiggly bit center due Tn5 insertion bias motif . can also use patchwork library show multiple plots grid, highlighting cell-type-specific factors well general factors like CTCF.","code":"suppressPackageStartupMessages({ library(GenomicRanges) library(Biostrings) }) peaks_sorted <- dplyr::arrange(peaks, chr, start) peaks_gr <- dplyr::mutate(peaks_sorted, start = start + 1) %>% as(\"GenomicRanges\") selected_motifs <- c( \"CEBPA\" = \"ENSG00000245848_LINE568_CEBPA_D_N4\", \"EOMES\" = \"ENSG00000163508_LINE3544_EOMES_D_N1\", \"SPI1\" = \"ENSG00000066336_LINE1813_SPI1_D_N5\", \"CTCF\" = \"ENSG00000102974_LINE747_CTCF_D_N67\" ) suppressWarnings({ motif_positions <- motifmatchr::matchMotifs( chromVARmotifs::human_pwms_v2[selected_motifs], peaks_gr, genome=\"hg38\", out=\"positions\") }) names(motif_positions) <- names(selected_motifs) motif_positions ## GRangesList object of length 4: ## $CEBPA ## GRanges object with 13983 ranges and 1 metadata column: ## seqnames ranges strand | score ## | ## [1] chr1 1060191-1060200 + | 7.24878 ## [2] chr1 1398356-1398365 - | 7.31950 ## [3] chr1 1408228-1408237 - | 7.91954 ## [4] chr1 1470604-1470613 + | 7.26055 ## [5] chr1 1614370-1614379 - | 7.33072 ## ... ... ... ... . ... ## [13979] chrX 154247973-154247982 + | 7.91954 ## [13980] chrX 154377819-154377828 - | 7.91954 ## [13981] chrX 154497506-154497515 + | 8.62478 ## [13982] chrX 154734157-154734166 + | 7.33771 ## [13983] chrX 155242494-155242503 - | 7.24396 ## ------- ## seqinfo: 39 sequences from an unspecified genome; no seqlengths ## ## ... 
## <3 more elements> plot_tf_footprint( frags, motif_positions$CEBPA, cell_groups = cell_types, flank = 250, smooth = 2 ) + ggplot2::labs(title=\"CEBPA\") footprinting_plots <- list() for (motif in names(selected_motifs)) { footprinting_plots[[motif]] <- plot_tf_footprint( frags, motif_positions[[motif]], cell_groups = cell_types, flank=250, smooth=2) + ggplot2::labs(title=motif, color=\"Cluster\") } patchwork::wrap_plots(footprinting_plots, guides=\"collect\")"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"genome-accessibility-tracks","dir":"Articles","previous_headings":"","what":"Genome accessibility tracks","title":"Basic tutorial","text":"plot genome accessibility tracks, need select genome region view. BPCells provides helper function find genome regions centered around gene. normalizing tracks, need provide total number reads cell type. can substituted total reads peaks metrics desired. can create first component track plot plotting genome tracks . can see small peak center mainly present B cells (top row), unclear sits relative B-cell marker CD19. much useful gene annotation track added . ’ll get set canonical transcripts (one per gene) Gencode can make annotation track optionally scale bar. Finally, can stack elements trackplot_combine(). Now see small peak just upstream CD19 gene.","code":"region <- gene_region(genes, \"CD19\", extend_bp = 1e5) region ## $chr ## [1] \"chr16\" ## ## $start ## [1] 28831970 ## ## $end ## [1] 29039342 read_counts <- atac_qc$nFrags[ match(cellNames(frags), atac_qc$cellName) ] coverage_plot <- trackplot_coverage( frags, region = region, groups=cell_types, read_counts, bins=500 ) coverage_plot transcripts <- read_gencode_transcripts(\"./references\", release=\"42\") head(transcripts) ## # A tibble: 6 × 13 ## chr source feature start end score strand frame gene_id gene_type ## ## 1 chr1 HAVANA transcript 65418 71585 . + . ENSG000001… protein_… ## 2 chr1 HAVANA exon 65418 65433 . + . 
ENSG000001… protein_… ## 3 chr1 HAVANA exon 65519 65573 . + . ENSG000001… protein_… ## 4 chr1 HAVANA exon 69036 71585 . + . ENSG000001… protein_… ## 5 chr1 HAVANA transcript 450739 451678 . - . ENSG000002… protein_… ## 6 chr1 HAVANA exon 450739 451678 . - . ENSG000002… protein_… ## # ℹ 3 more variables: gene_name , transcript_id , MANE_Select gene_plot <- trackplot_gene(transcripts, region) gene_plot scalebar_plot <- trackplot_scalebar(region) scalebar_plot # We list plots in order from top to bottom to combine. # Notice that our inputs are also just ggplot objects, so we can make modifications # like removing the color legend from our gene track. trackplot_combine( list( scalebar_plot, coverage_plot, gene_plot + ggplot2::guides(color=\"none\") ) )"},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"rna-seq-normalization-pca","dir":"Articles > Web-only","previous_headings":"","what":"RNA-seq normalization + PCA","title":"Performance Benchmarks","text":"BPCells can perform operations streaming disk, able use dramatically less memory traditional -memory workflows. also extensively optimized underlying C++ code make BPCells much faster traditional disk-backed tools like DelayedArray. benchmark , show time memory usage perform standardized workflow data normalization, variable gene selection, PCA. Note show Seurat’s -memory workflow, though Seurat v5 also offers BPCells integration disk-backed operations. tools given 3 hour time limit 256GB RAM, BPCells able process largest datasets within resource limits. Notice execution speed tends scale number non-zero entries matrix, whereas memory usage BPCells scales number cells (.e. 
space required store output PCA embeddings).","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"multi-threading","dir":"Articles > Web-only","previous_headings":"RNA-seq normalization + PCA","what":"Multi-threading","title":"Performance Benchmarks","text":"benchmark run single-threaded since tools support multi-threading. However, BPCells can offer 5-10x speedups multiple threads:","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"counts-matrices-rna-or-atac","dir":"Articles > Web-only","previous_headings":"Bitpacking compression","what":"Counts matrices (RNA or ATAC)","title":"Performance Benchmarks","text":"BPCells uses bitpacking compression help speed disk-backed workflows. general purpose compression algorithms shrink file sizes reduce disk read bandwidth, typically come high compute cost. BPCells able provide similar space savings general-purpose compression algorithms like gzip (10x HDF5) LZ4-Blosc (zarr), much lower compute cost. AnnData h5ad files default using compression due speed costs read/write, BPCells bitpacking compression can get faster read/write 4-7x space savings. , benchmark storing + loading 1.3M cell RNA-seq experiment 10x Genomics, using default compression settings storage format1. Notice BPCells compressed format fast enough provide faster read write speeds uncompressed h5ad2. Note don’t benchmark 10x HDF5 write, since 10x directly provide software perform arbitrary matrix writes. likely even slower 10x read speed.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"fragment-alignments-atac","dir":"Articles > Web-only","previous_headings":"Bitpacking compression","what":"Fragment alignments (ATAC)","title":"Performance Benchmarks","text":"addition RNA/ATAC counts matrices, BPCells also introduces compressed file formats scATAC-seq fragments. 
compared 10x fragment files, ArchR, SnapATAC2 storage 1M cell dataset, found BPCells gave smallest file sizes fastest read/write speeds. using bitpacking compression, BPCells can afford storage space keep fragments fully genome-sorted order. makes import 10x fragment files dramatically faster compared ArchR SnapATAC2 must re-sort fragments grouped cell. ’ll see later also speeds genomic overlap calculations.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"atac-seq-overlap-calculations","dir":"Articles > Web-only","previous_headings":"","what":"ATAC-seq overlap calculations","title":"Performance Benchmarks","text":"working large reference datasets, ’s often helpful able quickly re-quantify cell x peak overlap matrices directly fragments, datasets usually use different sets peak coordinates depending biology interest. BPCells stores fragments genome-sorted order, ’s able perform peak calculations much faster ArchR SnapATAC2. BPCells takes seconds find overlaps small set 10 peaks, also means fast calculating genomic coverage tracks visualization large datasets.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"m-cell-analysis-of-cellxgene-census","dir":"Articles > Web-only","previous_headings":"","what":"44M cell analysis of CELLxGENE census","title":"Performance Benchmarks","text":"combination fast storage disk-backed compute, BPCells able handle unique human cells CELLxGENE census laptop 16 threads 32GB RAM. Compared TileDB matrix storage format, found BPCells file formats offer much faster read/write times similar space usage3. Note dataset take 750GB store counts matrix without compression. much faster file read speeds, BPCells able light compute tasks like computing per-gene mean variance across 44M cells <3 minutes. PCA expensive operation BPCells performs, using 167 passes input matrix calculate 32 PCs. Still, completes less 1 hour server 6.2 hours laptop. 
makes atlas-scale analysis possible laptop leaving headroom server datasets order magnitude larger. Benchmark details: Mean variance computed log-normalized matrix, unlike plots BPCells manuscript just scaled read counts match default normalization CELLxGENE census. Laptop 32GB RAM 16 threads; server 256GB RAM 32 threads. See BPCells manuscript details.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"update-log","dir":"Articles > Web-only","previous_headings":"","what":"Update log","title":"Performance Benchmarks","text":"Normalization & PCA: Add comparison datasets, add DelayedArray results, add multithreading plot RNA storage: Add comparisons zarr include write times. ATAC storage: Add 10x SnapATAC2 results include read times, switch comparison larger dataset. Peak matrix: Update benchmarks show cross-dataset results 100k peaks, peak subset results 1M cells rather 30k cells. Add comparisons SnapATAC2. Add results 44M cell analysis. March 30, 2023: Added clarification AnnData benchmarks referred h5ad default compression settings (.e. none). March 29, 2023: Created benchmark page.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"matrix-logical-storage-layout","dir":"Articles > Web-only","previous_headings":"","what":"Matrix Logical Storage Layout","title":"Matrix and Fragment Storage Formats","text":"data storage, use storage abstraction named data arrays, stored e.g. single group hdf5 directory files. matrix format compressed sparse column/row (CSC/CSR) format following data arrays: interpretation array follows: val - Values non-zero entries increasing order (column, row) position. index - index[] provides 0-based row index value found val[] (column index row-major storage order) idxptr - indexes idx val entries column j can found idxptr[j] idxptr[j+1] - 1 , inclusive. 
(row j row-major storage order) shape - number rows matrix, followed number columns row_names - Names row matrix (optional) col_names - Names column matrix (optional) storage_order- col compressed-sparse-column, row compressed-sparse-row Bitpacked compressed matrices consist following modifications: val: unsigned 32-bit integers, replace val val_data, val_idx, val_idx_offsets corresponding BP-128m1 encoding described . total number values already stored last value idxptr. 32-bit 64-bit floats val remains unchanged. index: replace index array BP-128d1z encoded data arrays index_data, index_idx, index_idx_offsets, index_starts matrix stored single directory, HDF5 group, R S4 object. storage format matrix encoded version string. current version string format [compression]-[datatype]-matrix-v2, [compression] can either packed unpacked, [datatype] can one uint, float, double corresponding 32-bit unsigned integer, 32-bit float, 64-bit double respectively. v1 formats, difference idxptr type uint32.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"genomic-fragments-logical-storage-layout","dir":"Articles > Web-only","previous_headings":"","what":"Genomic fragments logical storage layout","title":"Matrix and Fragment Storage Formats","text":"BPCells fragment files store (chromosome, start, end, cell ID) fragment, sorted (chromosome, start). coordinate system follows bed format convention, first base chromosome numbered 0 end coordinate fragment non-inclusive. means 10 base pair long fragment starting first base genome start=0 end=10. End coordinates always guaranteed least large start coordinates. Uncompressed fragment data stored following arrays: arrays following contents: cell: List numeric cell IDs, one per fragment. smallest cell ID 0. start: List fragment start coordinates. first base chromosome 0. end: List fragment end coordinates. base end coordinate one past last base fragment. 
end_max: end_max[] maximum end coordinate fragments start chromosome fragment index *128-127. multiple chromosomes fragments given chunk 128 fragments, end_max maximum end coordinates. end_max array allows quickly seeking fragments overlapping given genomic region. chr_ptr: chr_ptr[2*] index first fragment chromosome cell, start, end arrays. chr_ptr[2*+ 1]-1 index last fragment chromosome . Fragments need necessarily sorted order increasing chromosome ID, though fragments given chromosome must still stored contiguously. allows logically re-ordering chromosomes write-time even input data source support reading chromosomes --order (.e. 10x fragment files without genome index). cell_names: string identifiers numeric cell ID. chr_names: string identifiers numeric chromosome ID. Compressed fragments stored following modifications: cell replaced cell_data, cell_idx, cell_idx_offsets, compressed according BP-128 encoding. start replaced start_data, start_idx, start_idx_offsets, start_starts, compressed according BP-128d1 encoding. end replaced end_data, end_idx, end_idx_offsets, stores start - end fragment, encoded using BP-128 encoding. current version string equal unpacked-fragments-v2 uncompressed fragments, packed-fragments-v2 compressed fragments. 
v1 formats, difference chr_ptr type uint32.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bitpacking-formats","dir":"Articles > Web-only","previous_headings":"","what":"Bitpacking formats","title":"Matrix and Fragment Storage Formats","text":"bitpacked formats based formats described paper Lemire Boytsov.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128","title":"Matrix and Fragment Storage Formats","text":"vanilla BP-128 format stored 3 arrays follows: data - stream bitpacked data, represented 32-bit integers interleaved bit layout shown Lemire Boytsov figure 6. chunk 128 32-bit input integers B bits per integer stored using 4B 32-bit integers holding bitpacked data. idx - list 32-bit integers, encoded data integers index 128*128*+ 127 can found data index idx[] index idx[+1]-1. lists 2^{32} (4 billion) entries greater, idx stores index modulo 2^{32} idx_offsets - list 64-bit integers, values idx indices idx_offsets[] idx_offsets[+1]-1 *(2^32) added .","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128m1","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128m1","title":"Matrix and Fragment Storage Formats","text":"BP-128, 1 subtracted value prior compression","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128d1","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128d1","title":"Matrix and Fragment Storage Formats","text":"Equivalent BP-128* algorithm Lemire Boytsov integers difference encoded prior bitpacking. best lists sorted integers. 
data - Encoding vanilla BP-128, difference encoding prior bitpacking: x_{0}^{\\prime}=0, x_{1}^{\\prime}=x_{1}-x_{0}, x_{2}^{\\prime}=x_{2}-x_{1}, …, x_{127}^{\\prime}=x_{127}-x_{126} idx, idx_offsets - identical BP-128 starts - list 32-bit integers, starts[] decoded value integer index 128*","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128d1z","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128d1z","title":"Matrix and Fragment Storage Formats","text":"Similar BP-128d1 zigzag encoding applied difference encoding. best lists close fully sorted runs integers. data - Encoding BP-128d1, difference encoding bitpacking, results zigzag encoded, zigzag(x)=2x x\\geq0, zigzag(x)=-2x-1 x<0. idx, idx_offsets - identical BP-128 starts - identical BP-128d1 Illustrative reference code BP-128 d1 zigzag transformations can found .","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"physical-storage-layout","dir":"Articles > Web-only","previous_headings":"","what":"Physical storage layout","title":"Matrix and Fragment Storage Formats","text":"abstraction named data arrays can realized different formats. three currently supported BPCells :","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"directory-of-files-format","dir":"Articles > Web-only","previous_headings":"Physical storage layout","what":"Directory of files format:","title":"Matrix and Fragment Storage Formats","text":"default storage backend due simplicity high performance. Arrays stored binary files within directory. Numeric array files 8-byte header followed data values little-endian binary format integers IEEE-754 32-bit 64-bit floating point numbers. 
Header values 8-byte ASCII text follows: unsigned 32-bit integer UINT32v1, unsigned 64-bit integer UINT64v1, 32-bit float FLOATSv1, 64-bit float DOUBLEv1. Arrays strings stored ASCII text one array value per line header. version string stored file named “version” containing version string followed newline.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"hdf5-file-format","dir":"Articles > Web-only","previous_headings":"Physical storage layout","what":"Hdf5 file format:","title":"Matrix and Fragment Storage Formats","text":"storage backend can useful embedding BPCells formats group within h5ad HDF5 file. Arrays numbers stored HDF5 datasets using built-HDF5 encoding format. Arrays strings stored HDF5 variable length string datasets. version string stored version attribute HDF5 group.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"r-object-format","dir":"Articles > Web-only","previous_headings":"Physical storage layout","what":"R object format:","title":"Matrix and Fragment Storage Formats","text":"storage backend primarily useful testing, bitpacking compression -memory data desired avoid disk bandwidth bottlenecks. Strings stored native R character arrays. Unsigned integers 32-bit floats stored native R integer arrays bitcasting R signed integers required data types. 64-bit floats stored native R numeric arrays. 64-bit integers stored doubles R numeric arrays. reduces highest representable value 2^{64}-1 2^{53}-1 (9 quadrillion), expect pose practical problems. Named collections arrays stored R lists (writing) S4 objects (reading). 
version string stored string vector named “version” length 1.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html","id":"operating-principles","dir":"Articles > Web-only","previous_headings":"","what":"Operating Principles","title":"How BPCells works","text":"Two key principles understand using BPCells operations streaming lazy. Streaming means minimal amount data stored memory computation happening. almost memory used storing intermediate results. Hence, can compute operations large matrices without ever loading fully memory. Lazy means real work performed matrix fragment objects result needs returned R object written disk. helps support streaming computation, since otherwise forced compute intermediate results use additional memory.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html","id":"basic-usage","dir":"Articles > Web-only","previous_headings":"Operating Principles","what":"Basic usage","title":"How BPCells works","text":"begin basic example loading ATAC fragments 10x fragments file, reading peak set bed file, calculating cell x peak matrix. bitpacked compressed fragment file half size 10x file, much faster read.","code":"library(\"BPCells\") # File reading is lazy, so this is instantaneous fragments <- open_fragments_10x(\"atac_fragments.tsv.gz\") # This is when we actually read the file, should take 1-2 minutes to scan # since we bottleneck on gzip decompression. packed_fragments <- write_fragments_dir(fragments, \"pbmc-3k-fragments\") # Later, we can re-open these fragments from the same directory packed_fragments <- open_fragments_dir(\"pbmc-3k-fragments\") peaks <- read_bed(\"peaks.bed\") # This is fast because the peak matrix calculation is lazy. # It will be computed on-the-fly when we actually need results from it. peak_matrix <- peak_matrix(packed_fragments, peaks) # Here is where the peak matrix calculation happens. 
Runs over 10-times # faster than ArchR, which utilizes IRanges to perform overlap calculations. R_matrix <- as(peak_matrix, \"dgCMatrix\")"},{"path":"https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html","id":"streaming-operations","dir":"Articles > Web-only","previous_headings":"Operating Principles","what":"Streaming operations","title":"How BPCells works","text":"lazy, stream-oriented design means can calculate complicated transformations single pass. faster memory-efficient calculating several intermediate results sequential manner. example, perform following pipeline: 1. Exclude fragments non-standard chromosomes 2. Subset cells 3. Add Tn5 offset 4. Calculate peak matrix 5. Calculate mean-accessibility per peak done using e.g. GRanges sparse matrices, need 3 passes fragments saving intermediate results, 2 passes peak matrix. BPCells’ streaming operations, can done directly fragments single pass, memory usage limited bytes per cell iterating peak matrix returning colMeans. Note knew cell names ahead time, even perform operation directly original 10x fragments without ever saving fragments memory. 
fairly slow 10x fragment files slow decompress, ’s recommended convert BPCells format.","code":"# Here I make use of the new pipe operator |> for better readability # We'll subset to just the standard chromosomes standard_chr <- which( stringr::str_detect(chrNames(packed_fragments), \"^chr[0-9XY]+$\") ) # Pick a random subset of 100 cells to consider set.seed(1337) keeper_cells <- sample(cellNames(packed_fragments), 100) # Run the pipeline, and save the average accessibility per peak peak_accessibility <- packed_fragments |> select_chromosomes(standard_chr) |> select_cells(keeper_cells) |> shift_fragments(shift_start=4, shift_end=-5) |> peak_matrix(peaks) |> colMeans()"},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-efficiency.html","id":"normalizations-and-pca","dir":"Articles > Web-only","previous_headings":"","what":"Normalizations and PCA","title":"Efficiency tips","text":"Avoid dense matrices whenever possible. Put normalizations preserve sparsity (0 values stay 0) normalizations break sparsity (e.g. adding values row/column). typical RNA-seq matrix <5% non-zero entries, code operate 20x entries dense matrix. operations, recommend using lazy evaluation avoid creating intermediate matrices. one common exception rule running PCA. PCA requires looping matrix several hundred times, often faster write matrix disk just PCA rather recalculating entries PCA iteration. storage efficiency, keep sparsity-breaking normalizations delayed, store sparse normalizations temporary location write_matrix_dir() apply sparsity-breaking normalizations Adding values rows/columns matrix little overhead PCA translates pre post processing step mat-vec multiply iteration. 
sparsity-breaking operation, adding vector matrix causes operations become expensive, however.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-efficiency.html","id":"storage-order","dir":"Articles > Web-only","previous_headings":"","what":"Storage order","title":"Efficiency tips","text":"Marker features can computed matrix indexed gene/feature. Sparse matrix multiplication can performed matrices storage order Sparse matrix multiplication performance can change dramatically depending storage order relative matrix size/sparsity. column-major matrices, left matrix fast load contain delayed operations, right matrix can slow load contain many delayed operations. row-major matrices left/right preferences reversed. can check storage order matrix printing R terminal calling t() function, BPCells just flips boolean flag whether matrix row-major column-major. affect underlying storage order. adjust underlying storage order, call transpose_storage_order(). slower operation, requires writing new copy data disk.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-efficiency.html","id":"other-tips","dir":"Articles > Web-only","previous_headings":"","what":"Other tips","title":"Efficiency tips","text":"running disk-backed analysis, always try store working copy data fast local SSDs. default laptops, servers may want copy data files networked file system physically attached SSD best performance. Use single call matrix_stats() calculate mean + variance single pass matrix possible. See function reference details. ATAC-seq data, can calculate variable features tile matrix without ever saving disk. 
allows subset variable tiles create peak matrix just variable tiles space savings.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-philosophy.html","id":"working-without-a-project-object","dir":"Articles > Web-only","previous_headings":"","what":"Working without a project object","title":"Programming Philosophy","text":"Imagine want plot UMAP cells colored cluster. BPCells, way providing: 1, matrix cells x UMAP coordinates 2. vector listing cells belong cluster correspondence cells clusters determined based ordering. rows UMAP matrix order cluster membership vector. keep simple, recommend following approach: See tutorial example, make keeper_cells vector order data consistently according list cell IDs. downstream operations (PCA, clustering, etc.), cell order preserved unless explicitly change . things “just work” keep track per-cell metadata, can helpful make data frame tracking sample IDs, cluster membership, metadata Working without project object provides lot flexibility, since user can easily swap UMAP embeddings, cluster assignments, etc. just providing different variable input. ’s also need “export” metadata since wasn’t import step begin . course, power come additional responsibility keep track metadata. Keeping BPCells flexible power users retaining ease--use newbies ongoing effort, BPCells currently falls side power users","code":""},{"path":"https://bnprks.github.io/BPCells/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Benjamin Parks. Author, maintainer, copyright holder. Immanuel Abdi. Author. Stanford University. Copyright holder, funder. Genentech, Inc.. Copyright holder, funder.","code":""},{"path":"https://bnprks.github.io/BPCells/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Parks B, Abdi (2025). BPCells: Single Cell Counts Matrices PCA. 
R package version 0.3.1, https://github.com/bnprks/BPCells, https://bnprks.github.io/BPCells.","code":"@Manual{, title = {BPCells: Single Cell Counts Matrices to PCA}, author = {Benjamin Parks and Immanuel Abdi}, year = {2025}, note = {R package version 0.3.1, https://github.com/bnprks/BPCells}, url = {https://bnprks.github.io/BPCells}, }"},{"path":"https://bnprks.github.io/BPCells/index.html","id":"bpcells","dir":"","previous_headings":"","what":"Single Cell Counts Matrices to PCA","title":"Single Cell Counts Matrices to PCA","text":"site R package. Python site (experimental) BPCells package high performance single cell analysis large RNA-seq ATAC-seq datasets. can run normalization PCA 1.3M cell dataset 4 minutes 2GB RAM, create scATAC-seq peak matrices fragment coordinates 50x less CPU time ArchR SnapATAC2. BPCells can even handle full CELLxGENE census human dataset, running full precision PCA 44M cell x 60k gene matrix 6 hours laptop <1 hour server. See benchmarks page details. BPCells provides: Efficient storage single cell datasets via bitpacking compression Fast, disk-backed RNA-seq ATAC-seq data processing powered C++ Downstream analysis marker genes, clustering Interoperability AnnData, 10x datasets, R sparse matrices, GRanges Demonstrated scalability 44M cells laptop Additionally, BPCells exposes optimized data processing infrastructure use scaling 3rd party single cell tools (e.g. 
Seurat)","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"learn-more","dir":"","previous_headings":"","what":"Learn more","title":"Single Cell Counts Matrices to PCA","text":"BioRxiv preprint Benchmarks Multiomic analysis example BPCells works Additional articles Function documentation News","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"r-installation","dir":"","previous_headings":"","what":"R Installation","title":"Single Cell Counts Matrices to PCA","text":"recommend installing BPCells directly github: installing, must HDF5 library installed accessible system. HDF5 can installed choice package manager. See operating system specific instructions . Mac Windows users trouble installing github, check R-universe page instructions install pre-built binary packages. binary packages automatically track latest github main branch. BPCells available via conda thanks @mfansler Conda Forge R team (see issue #241 details). issues bioconda package reported bioconda-recipes. Version updates managed bioconda team.","code":"remotes::install_github(\"bnprks/BPCells/r\")"},{"path":"https://bnprks.github.io/BPCells/index.html","id":"linux","dir":"","previous_headings":"R Installation","what":"Linux","title":"Single Cell Counts Matrices to PCA","text":"Obtaining HDF5 dependency usually pretty straightforward Linux apt: sudo apt-get install libhdf5-dev yum: sudo yum install hdf5-devel Note: Linux users prefer distro’s package manager (e.g. apt yum) possible, appears give slightly reliable installation experience.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"windows","dir":"","previous_headings":"R Installation","what":"Windows","title":"Single Cell Counts Matrices to PCA","text":"Compiling R packages source Windows requires installing R tools Windows. 
See Issue #9 discussion.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"macos","dir":"","previous_headings":"R Installation","what":"MacOS","title":"Single Cell Counts Matrices to PCA","text":"MacOS, installing HDF5 homebrew seems reliable: brew install hdf5. Mac-specific troubleshooting: Check R installation running sessionInfo(), seeing lists ARM x86 “Platform”. easiest option use ARM R homebrew default ARM hdf5 installation possible (though tricky) install x86 copy homebrew order access x86 version hdf5 Older Macs (10.14 Mojave older): default compiler old Macs support needed C++17 filesystem features. See issue #3 tips getting newer compiler set via homebrew.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"supported-compilers","dir":"","previous_headings":"R Installation","what":"Supported compilers","title":"Single Cell Counts Matrices to PCA","text":"cases, already appropriate compiler. BPCells recommends gcc >=9.1, clang >= 9.0. corresponds versions late-2018 newer. Older versions may work cases long basic C++17 support, officially supported.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"general-installation-troubleshooting","dir":"","previous_headings":"R Installation","what":"General Installation troubleshooting","title":"Single Cell Counts Matrices to PCA","text":"BPCells tries print informative error messages compilation help diagnose problem. verbose set information, run Sys.setenv(BPCELLS_DEBUG_INSTALL=\"true\") prior remotes::install_github(\"bnprks/BPCells/r\"). still can’t solve issue additional information, feel free file Github issue, sure use collapsible section verbose installation log.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"Single Cell Counts Matrices to PCA","text":"BPCells open source project, welcome quality contributions. 
interested contributing experience C++, along Python R, feel free reach ideas like implement . ’re happy provide pointers get started, time permitting. unfamiliar C++ difficult contribute code, detailed bug reports reproducible examples still great way help . Github issues best forum . maintain single cell analysis package want use BPCells improve scalability, ’re happy provide advice. couple labs try far, promising success. Email best way get touch (look DESCRIPTION file github contact info). Python developers welcome, though current python package still experimental status.","code":""},{"path":"https://bnprks.github.io/BPCells/python/api/fragments.html","id":null,"dir":"Python > Api","previous_headings":"","what":"Fragment functions#","title":null,"text":"experimental.import_10x_fragments(input, output) Convert 10x fragment file BPCells format experimental.build_cell_groups(fragments, ...) Build cell_groups array use pseudobulk_insertion_counts() experimental.pseudobulk_insertion_counts(...) Calculate pseudobulk coverage matrix experimental.precalculate_insertion_counts(...) 
Precalculate per-base insertion counts fragment data experimental.PrecalculatedInsertionMatrix(path) Disk-backed precalculated insertion matrix","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/api/matrix.html","id":null,"dir":"Python > Api","previous_headings":"","what":"Matrix functions#","title":null,"text":"experimental.DirMatrix(dir) Disk-backed BPCells integer matrix experimental.MemMatrix(dir[, threads]) -memory BPCells integer matrix","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.T.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.T#","title":null,"text":"property DirMatrix.T: DirMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_h5ad.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_h5ad#","title":null,"text":"classmethod DirMatrix.from_h5ad(h5ad_path: str, out_dir: str, group: str = 'X') → DirMatrix[source]# Create DirMatrix h5ad file. Truncates floating point values integers Parameters: h5ad_path (str) – Path h5ad file out_dir (str) – Output path DirMatrix group (str, optional) – HDF5 group read matrix . Defaults “X”. 
Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_hstack.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_hstack#","title":null,"text":"classmethod DirMatrix.from_hstack(mats: List[DirMatrix], out_dir: str) → DirMatrix[source]# Create DirMatrix concatenating list DirMatrix objects horizontally (column wise) Parameters: mats (List[DirMatrix]) – List input matrices out_dir (str) – Output path DirMatrix Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_scipy_sparse.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_scipy_sparse#","title":null,"text":"classmethod DirMatrix.from_scipy_sparse(scipy_mat: spmatrix, dir: str) → DirMatrix[source]# Create DirMatrix scipy sparse matrix. 
write compressed sparse column format input types scipy.sparse.csr_matrix Parameters: scipy_mat (scipy.spmatrix) – Scipy sparse matrix dir (str) – Path write matrix Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_vstack.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_vstack#","title":null,"text":"classmethod DirMatrix.from_vstack(mats: List[DirMatrix], out_dir: str) → DirMatrix[source]# Create DirMatrix concatenating list DirMatrix objects vertically (row wise) Parameters: mats (List[DirMatrix]) – List input matrices out_dir (str) – Output path DirMatrix Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix#","title":null,"text":"class bpcells.experimental.DirMatrix(dir: str)[source]# Disk-backed BPCells integer matrix reads BPCells-format matrices, returning scipy.sparse.csc_matrix objects sliced. Parameters: dir (str) – Path matrix directory Examples Attributes DirMatrix.T Return transposed view matrix DirMatrix.shape Dimensions matrix DirMatrix.threads Number threads use reading (default=1) Methods DirMatrix.from_h5ad(h5ad_path, out_dir[, group]) Create DirMatrix h5ad file. DirMatrix.from_hstack(mats, out_dir) Create DirMatrix concatenating list DirMatrix objects horizontally (column wise) DirMatrix.from_scipy_sparse(scipy_mat, dir) Create DirMatrix scipy sparse matrix. 
DirMatrix.from_vstack(mats, out_dir) Create DirMatrix concatenating list DirMatrix objects vertically (row wise) DirMatrix.transpose() Return transposed view matrix","code":">>> from bpcells import DirMatrix >>> mat = DirMatrix(\"/path/to/matrix\") >>> mat[:,[1,3,2,4]] <3x4 sparse matrix of type '' with 6 stored elements in Compressed Sparse Column format>"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.shape.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.shape#","title":null,"text":"DirMatrix.shape# Dimensions matrix Type: Tuple[int,int]","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.threads.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.threads#","title":null,"text":"DirMatrix.threads# Number threads use reading (default=1) Type: int","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.transpose.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.transpose#","title":null,"text":"DirMatrix.transpose() → DirMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.T.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.T#","title":null,"text":"property MemMatrix.T: MemMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix#","title":null,"text":"class bpcells.experimental.MemMatrix(dir: str, threads: int 
= 0)[source]# -memory BPCells integer matrix reads BPCells-format matrices disk, returning scipy.sparse.csc_matrix objects sliced. much memory-intensive, consistently fast random reads Parameters: dir (str) – Path matrix directory Examples Attributes MemMatrix.T Return transposed view matrix MemMatrix.shape Dimensions matrix MemMatrix.threads Threads used reads (default=1) Methods MemMatrix.transpose() Return transposed view matrix","code":">>> from bpcells import MemMatrix >>> mat = MemMatrix(\"/path/to/matrix\") >>> mat[:,[1,3,2,4]] <3x4 sparse matrix of type '' with 6 stored elements in Compressed Sparse Column format>"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.shape.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.shape#","title":null,"text":"MemMatrix.shape# Dimensions matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.threads.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.threads#","title":null,"text":"MemMatrix.threads# Threads used reads (default=1)","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.transpose.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.transpose#","title":null,"text":"MemMatrix.transpose() → MemMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.PrecalculatedInsertionMatrix.get_counts.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.PrecalculatedInsertionMatrix.get_counts#","title":null,"text":"PrecalculatedInsertionMatrix.get_counts(regions: DataFrame)[source]# Load pseudobulk insertion 
counts Parameters: regions (pandas.DataFrame) – Pandas dataframe columns (chrom, start, end) representing genomic ranges (0-based, end-exclusive like BED format). regions must size. chrom string column; start/end numeric. Returns: Numpy array dimensions (region, pseudobulks, position) type numpy.int32 Return type: numpy.ndarray","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.PrecalculatedInsertionMatrix.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.PrecalculatedInsertionMatrix#","title":null,"text":"class bpcells.experimental.PrecalculatedInsertionMatrix(path: str)[source]# Disk-backed precalculated insertion matrix reads per-base precalculated insertion matrices. current implementation EXPERIMENTAL, crash matrices 2^32-1 non-zero entries. Parameters: dir (str) – Path matrix directory See also precalculate_insertion_counts() Attributes PrecalculatedInsertionMatrix.shape Methods PrecalculatedInsertionMatrix.get_counts(regions) Load pseudobulk insertion counts","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.PrecalculatedInsertionMatrix.shape.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.PrecalculatedInsertionMatrix.shape#","title":null,"text":"property PrecalculatedInsertionMatrix.shape: Tuple[int, int][source]#","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.build_cell_groups.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.build_cell_groups#","title":null,"text":"bpcells.experimental.build_cell_groups(fragments: str, cell_ids: Sequence[str], group_ids: Sequence[str], group_order: Sequence[str]) → ndarray[source]# Build cell_groups array use pseudobulk_insertion_counts() Parameters: fragments (str) – Path BPCells fragments 
directory cell_ids (list[str]) – List cell IDs group_ids (list[str]) – List pseudobulk IDs cell (length cell_ids) group_order (list[str]) – Output order pseudobulks (Contain unique group_ids) Returns: Numpy array suitable input cell_groups pseudobulk_insertion_counts(). length total number cells fragments input, specifying output pseudobulk index cell (-1 cell excluded consideration) Return type: numpy.ndarray See also pseudobulk_insertion_counts()","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.import_10x_fragments.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.import_10x_fragments#","title":null,"text":"bpcells.experimental.import_10x_fragments(input: str, output: str, shift_start: int = 0, shift_end: int = 0, keeper_cells: List[str] | None = None)[source]# Convert 10x fragment file BPCells format Parameters: input (str) – Path 10x input file output (str) – Path BPCells output directory shift_start (int) – Basepairs add start coordinates (generally positive number) shift_end (int) – Basepairs subtract end coordinates (generally negative number) keeper_cells (list[str]) – None, save fragments cells keeper_cells list","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.precalculate_insertion_counts.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.precalculate_insertion_counts#","title":null,"text":"bpcells.experimental.precalculate_insertion_counts(fragments: str, output_dir: str, cell_groups: Sequence[int], chrom_sizes: str | Dict[str, int], threads: int = 0)[source]# Precalculate per-base insertion counts fragment data current implementation EXPERIMENTAL, crash matrices 2^32-1 non-zero entries. 
Parameters: fragments (str) – Path BPCells fragments directory output_dir (str) – Path save insertion counts cell_groups (list[int]) – List pseudobulk groupings created build_cell_groups() chrom_sizes (str | dict[str, int]) – Path/URL UCSC-style chrom.sizes file, dictionary mapping chromosome names sizes threads (int) – Number threads use matrix calculation (default = 1) Returns: PrecalculatedInsertionMatrix object See also PrecalculatedInsertionMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.pseudobulk_insertion_counts.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.pseudobulk_insertion_counts#","title":null,"text":"bpcells.experimental.pseudobulk_insertion_counts(fragments: str, regions: DataFrame, cell_groups: Sequence[int], bin_size: int = 1) → ndarray[source]# Calculate pseudobulk coverage matrix Coverage calculated number start/end coordinates falling given position bin. Parameters: fragments (str) – Path BPCells fragments directory regions (pandas.DataFrame) – Pandas dataframe columns (chrom, start, end) representing genomic ranges (0-based, end-exclusive like BED format). regions must size. chrom string column; start/end numeric. cell_groups (list[int]) – List pseudobulk groupings created build_cell_groups() bin_size (int) – Size bins within region given basepairs. region width even multiple resolution_bp, last region may truncated. Returns: Numpy array dimensions (region, pseudobulks, position) type numpy.int32 Return type: numpy.ndarray See also build_cell_groups()","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"BPCells#","title":null,"text":"BPCells python bindings still experimental API subject change. existing functionality mainly focused allowing read/write access BPCells file formats integer matrices scATAC fragments. 
Future updates add data-processing functions present R interface (e.g. streaming normalization, PCA, ATAC-seq peak/tile matrix creation). provide Python access shared C++ core code. Notably, plotting functionality currently planned implementation, written primarily R relies R plotting libraries present Python. helper functions R BPCells implemented pure R thus unlikely added Python near future. functionality interest , welcome contributions – able write code pure Python. Reach via github/email interested. BPCells can directly installed via pip: Matrix slicing Basepair insertion dataloading Fragment functions Matrix functions Installation Tutorials API Reference R Docs","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"Installation#","title":null,"text":"BPCells can directly installed via pip:","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"Tutorials#","title":null,"text":"Matrix slicing Basepair insertion dataloading","code":""},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"API Reference#","title":null,"text":"Fragment functions Matrix functions Installation Tutorials API Reference R Docs","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Basepair insertion counts tutorial#","title":null,"text":"BPCells python bindings can used query basepair-level coverage predefined cell types. way works two steps: 10x ArchR arrow files converted BPCells format. flexible BPCells R bindings, though single-sample 10x import supported python bindings. BPCells python bindings use input fragment files create large matrix dimensions (# cell types, # basepairs genome). 
cell type groupings determined. BPCells python bindings can slice arbitrary genomic regions, returning numpy array dimensions (regions, cell types, basepairs) Benchmark dataset: 600K cell subset Catlas paper, 2.5 billion fragments Benchmark task: Load 128 random 501-bp peak regions 111 cell types basepair resolution Storage location: Local SSD. Networked file systems slower BPCells BigWigs Creation time 4.7 minutes, 8 threads ? File size 6.2 GB 13 GB Query time 0.37 seconds 2.2 seconds Cell type count aggregation can re-run fully Python Query time 6x faster BigWigs Caveat prototype: due development time limitations, insertion matrix implementation support >=2^32 non-zero entries (4.29 billion). catlas dataset 3.2 billion non-zero entries. limitation can removed additional technical work, workaround multiple matrix objects can created individually <2^32 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Benchmark estimates#","title":null,"text":"Benchmark dataset: 600K cell subset Catlas paper, 2.5 billion fragments Benchmark task: Load 128 random 501-bp peak regions 111 cell types basepair resolution Storage location: Local SSD. Networked file systems slower BPCells BigWigs Creation time 4.7 minutes, 8 threads ? File size 6.2 GB 13 GB Query time 0.37 seconds 2.2 seconds","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Main benefits of BPCells#","title":null,"text":"Cell type count aggregation can re-run fully Python Query time 6x faster BigWigs Caveat prototype: due development time limitations, insertion matrix implementation support >=2^32 non-zero entries (4.29 billion). catlas dataset 3.2 billion non-zero entries. 
limitation can removed additional technical work, workaround multiple matrix objects can created individually <2^32 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Usage Demo#","title":null,"text":"use public 500-cell 10x dataset 484 rows × 20 columns Notice conversion allows adjusting start/end coordinates, well subsetting barcodes passing QC. Adding 1 end coordinate necessary 10x inputs produced cellranger calculate insertion matrix, first define cell groups, well ordering cell groups want output matrix. use first two characters cell barcode since annotated cell types available. Note possible leave cells calling build_cell_groups, case data included precalculated matrix Next, precalculate insertion counts matrix, can use parallelization speed portions work. can load pre-calculated matrix input path. query matrix, use pandas DataFrame, columns (chrom, start, end). regions must length BPCells returns numpy array dimensions (regions, cell types, basepairs), holding per-base counts cell type simple wrap matrix pytorch-compatible dataset, given set regions training set. Note use non-standard __getitems__() function pytorch uses provide batched loading higher performance. 
dataset object can directly passed torch.utils.data.DataLoader.","code":"import bpcells.experimental import pandas as pd import os.path import subprocess import tempfile tmpdir = tempfile.TemporaryDirectory() fragments_10x_path = os.path.join(tmpdir.name, \"atac_fragments.tsv.gz\") data_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz\" subprocess.run([\"curl\", \"--silent\", data_url], stdout=open(fragments_10x_path, \"w\")) CompletedProcess(args=['curl', '--silent', 'https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz'], returncode=0) metadata_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_singlecell.csv\" metadata_path = os.path.join(tmpdir.name, \"cell_metadata.csv\") subprocess.run([\"curl\", \"--silent\", metadata_url], stdout=open(metadata_path, \"w\")) cell_metadata = pd.read_csv(metadata_path) cell_metadata = cell_metadata[cell_metadata.is__cell_barcode == 1].reset_index() cell_metadata cell_metadata.is__cell_barcode.sum() np.int64(484) %%time fragments_bpcells_path = os.path.join(tmpdir.name, \"bpcells_fragments\") bpcells.experimental.import_10x_fragments( input = fragments_10x_path, output = fragments_bpcells_path, shift_end=1, keeper_cells=cell_metadata.barcode[cell_metadata.is__cell_barcode == 1] ) CPU times: user 3.43 s, sys: 74.9 ms, total: 3.51 s Wall time: 3.45 s %%time barcodes = cell_metadata.barcode clusters = cell_metadata.barcode.str.slice(0,2) cluster_order = sorted(set(clusters)) cell_groups_array = bpcells.experimental.build_cell_groups(fragments_bpcells_path, barcodes, clusters, cluster_order) # We could provide a dict or local file path, but URL is easier chrom_sizes = \"http://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes\" insertions_matrix_path = os.path.join(tmpdir.name, \"bpcells_insertions_matrix\") 
bpcells.experimental.precalculate_insertion_counts( fragments_bpcells_path, insertions_matrix_path, cell_groups_array, chrom_sizes, threads=4 ) CPU times: user 3min 8s, sys: 710 ms, total: 3min 8s Wall time: 1min 45s Notebooks","previous_headings":"","what":"Data download#","title":null,"text":"use public 500-cell 10x dataset 484 rows × 20 columns","code":"import os.path import subprocess import tempfile tmpdir = tempfile.TemporaryDirectory() fragments_10x_path = os.path.join(tmpdir.name, \"atac_fragments.tsv.gz\") data_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz\" subprocess.run([\"curl\", \"--silent\", data_url], stdout=open(fragments_10x_path, \"w\")) CompletedProcess(args=['curl', '--silent', 'https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz'], returncode=0) metadata_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_singlecell.csv\" metadata_path = os.path.join(tmpdir.name, \"cell_metadata.csv\") subprocess.run([\"curl\", \"--silent\", metadata_url], stdout=open(metadata_path, \"w\")) cell_metadata = pd.read_csv(metadata_path) cell_metadata = cell_metadata[cell_metadata.is__cell_barcode == 1].reset_index() cell_metadata cell_metadata.is__cell_barcode.sum() np.int64(484)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Convert to BPCells format#","title":null,"text":"Notice conversion allows adjusting start/end coordinates, well subsetting barcodes passing QC. 
Adding 1 end coordinate necessary 10x inputs produced cellranger","code":"%%time fragments_bpcells_path = os.path.join(tmpdir.name, \"bpcells_fragments\") bpcells.experimental.import_10x_fragments( input = fragments_10x_path, output = fragments_bpcells_path, shift_end=1, keeper_cells=cell_metadata.barcode[cell_metadata.is__cell_barcode == 1] ) CPU times: user 3.43 s, sys: 74.9 ms, total: 3.51 s Wall time: 3.45 s"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Create the insertion matrix#","title":null,"text":"calculate insertion matrix, first define cell groups, well ordering cell groups want output matrix. use first two characters cell barcode since annotated cell types available. Note possible leave cells calling build_cell_groups, case data included precalculated matrix Next, precalculate insertion counts matrix, can use parallelization speed portions work.","code":"%%time barcodes = cell_metadata.barcode clusters = cell_metadata.barcode.str.slice(0,2) cluster_order = sorted(set(clusters)) cell_groups_array = bpcells.experimental.build_cell_groups(fragments_bpcells_path, barcodes, clusters, cluster_order) # We could provide a dict or local file path, but URL is easier chrom_sizes = \"http://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes\" insertions_matrix_path = os.path.join(tmpdir.name, \"bpcells_insertions_matrix\") bpcells.experimental.precalculate_insertion_counts( fragments_bpcells_path, insertions_matrix_path, cell_groups_array, chrom_sizes, threads=4 ) CPU times: user 3min 8s, sys: 710 ms, total: 3min 8s Wall time: 1min 45s Notebooks","previous_headings":"","what":"Querying the insertion matrix#","title":null,"text":"can load pre-calculated matrix input path. query matrix, use pandas DataFrame, columns (chrom, start, end). 
regions must length BPCells returns numpy array dimensions (regions, cell types, basepairs), holding per-base counts cell type","code":"mat = bpcells.experimental.PrecalculatedInsertionMatrix(insertions_matrix_path) mat Notebooks","previous_headings":"","what":"Pytorch-compatible dataset#","title":null,"text":"simple wrap matrix pytorch-compatible dataset, given set regions training set. Note use non-standard __getitems__() function pytorch uses provide batched loading higher performance. dataset object can directly passed torch.utils.data.DataLoader.","code":"class BPCellsDataset: def __init__(self, regions, matrix_dir): self.regions = regions[[\"chrom\", \"start\", \"end\"]] matrix_dir = str(os.path.abspath(os.path.expanduser(matrix_dir))) self.mat = bpcells.experimental.PrecalculatedInsertionMatrix(matrix_dir) peak_width = self.regions.end[0] - self.regions.start[0] assert (self.regions.end - self.regions.start == peak_width).all() def __getitem__(self, i): return self.__getitems__([i])[0] def __getitems__(self, idx): # Adding this function allows for batched loading # See: https://github.com/pytorch/pytorch/issues/107218 # Return tensor of shape (batch_size, n_tasks, basepairs) return self.mat.get_counts( self.regions.iloc[idx,] ) def __len__(self): return self.regions.shape[0]"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Matrix slicing tutorial#","title":null,"text":"BPCells prototype Python bindings allow matrix creation slicing, optional multithreaded reads. 
scimilarity dataset 15M human cells, compressed storage 64GB (2.2 bytes/non-zero) Read speeds 10k random cells 15M human cells (range 5 random tests) Storage location 1 thread 4 threads Memory 2.8-4.7 seconds 1.0-1.1 seconds Local SSD 4.5-4.9 seconds 1.5-1.7 seconds Networked FS (warm cache) 20-21 seconds 5.5-6.2 seconds Networked FS (cold cache) 🙁 76-115 seconds Slicing matrix returns scipy.sparse matrix can use many slicing options standard numpy matrices can also make transposed view matrix similar numpy. work done, just switch row-major col-major representations matrix path 13 files (compressed integer matrices), contain data metadata can concatenate multiple matrices single file disk low memory usage. allows importing many samples parallel, concatenating together single matrix larger matrices, can desirable perform matrix reading multi-threaded manner. using multiple threads, BPCells divide matrix slice query chunks loaded parallel, recombined memory threads completed. performing random slicing along major storage axis, seek latency primary performance bottleneck. Setting high number threads (even actual core count machine) can help mitigate filesystem seek latency. slicing across non-major storage axis, decompression speed can become performance bottleneck. Setting threads number available cores can help parallelize decompression speed. cell-major RNA-seq matrices, thread can process compressed input rate 1 GB/s, filesystems >1GB/s sequential read speeds benefit parallelization. neural network training use-cases, fast slicing performance may critical avoid bottlenecking data loads. case, BPCells supports loading compressed data memory, eliminates seek latency saving ~4x memory usage compared uncompressed scipy sparse matrix. 
Loading can performed existing BPCells matrix directory, current version involves re-compressing data -memory load time (avoidable, bit trickier code direct loading isn’t implemented yet)","code":"import bpcells.experimental import os import tempfile import numpy as np import scipy.sparse tmp = tempfile.TemporaryDirectory() os.chdir(tmp.name) mat = scipy.sparse.csc_matrix(np.array([ [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]] )) mat mat.toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]]) bp_mat = bpcells.experimental.DirMatrix.from_scipy_sparse(mat, \"basic_mat\") bp_mat <3x4 col-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat> bp_mat[:,:] bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat[1:3, [0,2]].toarray() array([[0, 5], [2, 6]], dtype=uint32) bp_mat[[True, False, True], -2:].toarray() array([[4, 0], [6, 0]], dtype=uint32) bp_mat.T <4x3 row-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat> !ls -l basic_mat total 44 -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 col_names -rw-rw-r-- 1 bparks bparks 48 Aug 25 00:51 idxptr -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 index_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 index_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 index_idx_offsets -rw-rw-r-- 1 bparks bparks 12 Aug 25 00:51 index_starts -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 row_names -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 shape -rw-rw-r-- 1 bparks bparks 4 Aug 25 00:51 storage_order -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 val_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 val_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 val_idx_offsets -rw-rw-r-- 1 bparks bparks 22 Aug 25 00:51 version bp_mat = bpcells.experimental.DirMatrix(\"basic_mat\") import anndata anndata.AnnData(mat).write(\"mat.h5ad\") bp_mat = bpcells.experimental.DirMatrix.from_h5ad(\"mat.h5ad\", \"basic_mat_from_h5ad\") bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) 
bpcells.experimental.DirMatrix.from_hstack( [bp_mat, bp_mat], \"basic_mat_hstack\" )[:,:].toarray() array([[1, 0, 4, 0, 1, 0, 4, 0], [0, 0, 5, 7, 0, 0, 5, 7], [2, 3, 6, 0, 2, 3, 6, 0]], dtype=uint32) bpcells.experimental.DirMatrix.from_vstack( [bp_mat, bp_mat], \"basic_mat_vstack\" )[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0], [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat.threads = 8 bp_mat[:,:].toarray() # This will be performed with 8 threads now array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat_mem = bpcells.experimental.MemMatrix(\"basic_mat\") bp_mat_mem.threads = 8 bp_mat_mem[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Performance estimates#","title":null,"text":"scimilarity dataset 15M human cells, compressed storage 64GB (2.2 bytes/non-zero) Read speeds 10k random cells 15M human cells (range 5 random tests) Storage location 1 thread 4 threads Memory 2.8-4.7 seconds 1.0-1.1 seconds Local SSD 4.5-4.9 seconds 1.5-1.7 seconds Networked FS (warm cache) 20-21 seconds 5.5-6.2 seconds Networked FS (cold cache) 🙁 76-115 seconds","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Demo data setup#","title":null,"text":"","code":"import bpcells.experimental import os import tempfile import numpy as np import scipy.sparse tmp = tempfile.TemporaryDirectory() os.chdir(tmp.name) mat = scipy.sparse.csc_matrix(np.array([ [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]] )) mat mat.toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]])"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Basic usage from 
scipy.sparse#","title":null,"text":"Slicing matrix returns scipy.sparse matrix can use many slicing options standard numpy matrices can also make transposed view matrix similar numpy. work done, just switch row-major col-major representations","code":"bp_mat = bpcells.experimental.DirMatrix.from_scipy_sparse(mat, \"basic_mat\") bp_mat <3x4 col-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat> bp_mat[:,:] bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat[1:3, [0,2]].toarray() array([[0, 5], [2, 6]], dtype=uint32) bp_mat[[True, False, True], -2:].toarray() array([[4, 0], [6, 0]], dtype=uint32) bp_mat.T <4x3 row-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat>"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Reopening the matrix later#","title":null,"text":"matrix path 13 files (compressed integer matrices), contain data metadata","code":"!ls -l basic_mat total 44 -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 col_names -rw-rw-r-- 1 bparks bparks 48 Aug 25 00:51 idxptr -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 index_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 index_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 index_idx_offsets -rw-rw-r-- 1 bparks bparks 12 Aug 25 00:51 index_starts -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 row_names -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 shape -rw-rw-r-- 1 bparks bparks 4 Aug 25 00:51 storage_order -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 val_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 val_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 val_idx_offsets -rw-rw-r-- 1 bparks bparks 22 Aug 25 00:51 version bp_mat = bpcells.experimental.DirMatrix(\"basic_mat\")"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Import from h5ad#","title":null,"text":"","code":"import 
anndata anndata.AnnData(mat).write(\"mat.h5ad\") bp_mat = bpcells.experimental.DirMatrix.from_h5ad(\"mat.h5ad\", \"basic_mat_from_h5ad\") bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Concatenate multiple matrices#","title":null,"text":"can concatenate multiple matrices single file disk low memory usage. allows importing many samples parallel, concatenating together single matrix","code":"bpcells.experimental.DirMatrix.from_hstack( [bp_mat, bp_mat], \"basic_mat_hstack\" )[:,:].toarray() array([[1, 0, 4, 0, 1, 0, 4, 0], [0, 0, 5, 7, 0, 0, 5, 7], [2, 3, 6, 0, 2, 3, 6, 0]], dtype=uint32) bpcells.experimental.DirMatrix.from_vstack( [bp_mat, bp_mat], \"basic_mat_vstack\" )[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0], [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Multithreaded operation#","title":null,"text":"larger matrices, can desirable perform matrix reading multi-threaded manner. using multiple threads, BPCells divide matrix slice query chunks loaded parallel, recombined memory threads completed. performing random slicing along major storage axis, seek latency primary performance bottleneck. Setting high number threads (even actual core count machine) can help mitigate filesystem seek latency. slicing across non-major storage axis, decompression speed can become performance bottleneck. Setting threads number available cores can help parallelize decompression speed. 
cell-major RNA-seq matrices, thread can process compressed input rate 1 GB/s, filesystems >1GB/s sequential read speeds benefit parallelization.","code":"bp_mat.threads = 8 bp_mat[:,:].toarray() # This will be performed with 8 threads now array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Compressed in-memory storage#","title":null,"text":"neural network training use-cases, fast slicing performance may critical avoid bottlenecking data loads. case, BPCells supports loading compressed data memory, eliminates seek latency saving ~4x memory usage compared uncompressed scipy sparse matrix. Loading can performed existing BPCells matrix directory, current version involves re-compressing data -memory load time (avoidable, bit trickier code direct loading isn’t implemented yet)","code":"bp_mat_mem = bpcells.experimental.MemMatrix(\"basic_mat\") bp_mat_mem.threads = 8 bp_mat_mem[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"Python Docs#","title":null,"text":"BPCells python bindings still experimental API subject change. existing functionality mainly focused allowing read/write access BPCells file formats integer matrices scATAC fragments. Future updates add data-processing functions present R interface (e.g. streaming normalization, PCA, ATAC-seq peak/tile matrix creation). provide Python access shared C++ core code. Notably, plotting functionality currently planned implementation, written primarily R relies R plotting libraries present Python. helper functions R BPCells implemented pure R thus unlikely added Python near future. functionality interest , welcome contributions – able write code pure Python. Reach via github/email interested. 
BPCells can directly installed via pip: Matrix slicing Basepair insertion dataloading Fragment functions Matrix functions","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"Installation#","title":null,"text":"BPCells can directly installed via pip:","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"Tutorials#","title":null,"text":"Matrix slicing Basepair insertion dataloading","code":""},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"API Reference#","title":null,"text":"Fragment functions Matrix functions","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_open_matrix_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","title":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","text":"experimental sparse-column format designed handle storage matrices many columns -zero, less 2^32-1 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_open_matrix_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","text":"","code":"EXPERIMENTAL_open_matrix_dir(dir, buffer_size = 8192L)"},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_open_matrix_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","text":"dir Directory load data buffer_size performance tuning . 
number items buffered memory calling writes disk.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_write_matrix_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","title":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","text":"experimental sparse-column format designed handle storage matrices many columns -zero, less 2^32-1 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_write_matrix_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","text":"","code":"EXPERIMENTAL_write_matrix_dir(mat, dir, buffer_size = 8192L, overwrite = FALSE)"},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_write_matrix_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","text":"dir Directory save data overwrite TRUE, write temp dir overwrite existing data. Alternatively, pass temp path string customize temp dir location.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableFragments methods — IterableFragments-methods","title":"IterableFragments methods — IterableFragments-methods","text":"Methods IterableFragments objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableFragments methods — IterableFragments-methods","text":"","code":"# S4 method for class 'IterableFragments' show(object) cellNames(x) cellNames(x, ...) 
<- value chrNames(x) chrNames(x, ...) <- value"},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableFragments methods — IterableFragments-methods","text":"object IterableFragments object x IterableFragments object value Character vector new names","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"IterableFragments methods — IterableFragments-methods","text":"cellNames(): Character vector cell names, NULL none known chrNames(): Character vector chromosome names, NULL none known","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"IterableFragments methods — IterableFragments-methods","text":"cellNames<- possible replace names, add new names. chrNames<- possible replace names, add new names.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableFragments methods — IterableFragments-methods","text":"show(IterableFragments): Print IterableFragments cellNames(): Get cell names cellNames(x, ...) <- value: Set cell names chrNames(): Get chromosome names chrNames(x, ...) 
<- value: Set chromosome names","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"IterableFragments methods — IterableFragments-methods","text":"","code":"## Prep data frags <- tibble::tibble( chr = paste0(\"chr\", c(rep(1,3), rep(2,3))), start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) frags #> # A tibble: 6 × 4 #> chr start end cell_id #> #> 1 chr1 10 40 cell1 #> 2 chr1 60 90 cell1 #> 3 chr1 110 140 cell2 #> 4 chr2 160 190 cell2 #> 5 chr2 210 240 cell3 #> 6 chr2 260 290 cell3 frags <- frags %>% convert_to_fragments() ####################################################################### ## show(IterableFragments) example ####################################################################### show(frags) #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 2 chromosomes with names chr1, chr2 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ####################################################################### ## cellNames(IterableFragments) example ####################################################################### cellNames(frags) #> [1] \"cell1\" \"cell2\" \"cell3\" ####################################################################### ## cellNames(IterableFragments)<- example ####################################################################### cellNames(frags) <- paste0(\"cell\", 5:7) cellNames(frags) #> [1] \"cell5\" \"cell6\" \"cell7\" ####################################################################### ## chrNames(IterableFragments) example ####################################################################### chrNames(frags) #> [1] \"chr1\" \"chr2\" ####################################################################### ## chrNames(IterableFragments)<- example ####################################################################### chrNames(frags) <- paste0(\"chr\", 5:6) chrNames(frags) #> [1] \"chr5\" \"chr6\""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"Methods defined classes extend IterableFragments, providing access metadata specialised behaviours storage backends selection wrappers.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"","code":"# S4 method for class 'FragmentsTsv' chrNames(x) # S4 method for class 'FragmentsTsv' cellNames(x) # S4 method for class 'UnpackedMemFragments' chrNames(x) # S4 method for class 'UnpackedMemFragments' cellNames(x) # S4 method for class 
'PackedMemFragments' chrNames(x) # S4 method for class 'PackedMemFragments' cellNames(x) # S4 method for class 'FragmentsDir' chrNames(x) # S4 method for class 'FragmentsDir' cellNames(x) # S4 method for class 'FragmentsHDF5' chrNames(x) # S4 method for class 'FragmentsHDF5' cellNames(x) # S4 method for class 'ChrSelectName' chrNames(x) # S4 method for class 'ChrSelectIndex' chrNames(x) # S4 method for class 'CellSelectName' cellNames(x) # S4 method for class 'CellSelectIndex' cellNames(x) # S4 method for class 'CellMerge' cellNames(x) # S4 method for class 'ChrRename' chrNames(x) # S4 method for class 'CellRename' cellNames(x) # S4 method for class 'CellPrefix' cellNames(x) # S4 method for class 'MergeFragments' chrNames(x) # S4 method for class 'MergeFragments' cellNames(x)"},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"x object inheriting IterableFragments.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"chrNames(FragmentsTsv): Get chromosome names FragmentsTsv cellNames(FragmentsTsv): Get cell names FragmentsTsv chrNames(UnpackedMemFragments): Get chromosome names UnpackedMemFragments cellNames(UnpackedMemFragments): Get cell names UnpackedMemFragments chrNames(PackedMemFragments): Get chromosome names PackedMemFragments cellNames(PackedMemFragments): Get cell names PackedMemFragments chrNames(FragmentsDir): Get chromosome names FragmentsDir cellNames(FragmentsDir): Get cell names FragmentsDir chrNames(FragmentsHDF5): Get chromosome names FragmentsHDF5 cellNames(FragmentsHDF5): Get cell names FragmentsHDF5 chrNames(ChrSelectName): Get chromosome 
names ChrSelectName chrNames(ChrSelectIndex): Get chromosome names ChrSelectIndex cellNames(CellSelectName): Get cell names CellSelectName cellNames(CellSelectIndex): Get cell names CellSelectIndex cellNames(CellMerge): Get cell names CellMerge chrNames(ChrRename): Get chromosome names ChrRename cellNames(CellRename): Get cell names CellRename cellNames(CellPrefix): Get cell names CellPrefix chrNames(MergeFragments): Get chromosome names MergeFragments cellNames(MergeFragments): Get cell names MergeFragments","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-matrixgenerics.html","id":null,"dir":"Reference","previous_headings":"","what":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","title":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","text":"S4 methods enabling MatrixGenerics generics (e.g., rowQuantiles, colQuantiles, rowVars, colVars, rowMaxs, colMaxs) operate IterableMatrix. registered runtime MatrixGenerics available.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-matrixgenerics.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","text":"x IterableMatrix. ... 
Passed underlying implementation.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-matrixgenerics.html","id":"availability","dir":"Reference","previous_headings":"","what":"Availability","title":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","text":"Methods registered conditionally; MatrixGenerics installed, nothing registered generics fall back usual.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"Generic methods built-functions IterableMatrix objects. include methods described IterableMatrix-methods sense redundancy. instance, %*% described IterableMatrix matrix left right respectively. need show method IterableMatrix right instead.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"","code":"# S4 method for class 'matrix,IterableMatrix' x %*% y # S4 method for class 'numeric,IterableMatrix' x %*% y # S4 method for class 'dgCMatrix,IterableMatrix' x %*% y # S3 method for class 'IterableMatrix' rowQuantiles( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) colQuantiles.IterableMatrix( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) # S4 method for class 'IterableMatrix,numeric' e1 < e2 # S4 method for class 'numeric,IterableMatrix' e1 > e2 # S4 method for class 'IterableMatrix,numeric' e1 <= e2 # S4 method for class 'numeric,IterableMatrix' e1 
>= e2 # S4 method for class 'numeric,IterableMatrix' e1 * e2 # S4 method for class 'numeric,IterableMatrix' e1 + e2 # S4 method for class 'numeric,IterableMatrix' e1 - e2"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"digits Number decimal places quantile calculations ... Additional arguments passed methods drop Logical indicating whether drop dimensions subsetting.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"x %*% y: Multiply dense matrix IterableMatrix x %*% y: Multiply numeric row vector IterableMatrix x %*% y: Multiply dgCMatrix IterableMatrix rowQuantiles(IterableMatrix): Calculate rowQuantiles (replacement matrixStats::rowQuantiles) colQuantiles.IterableMatrix(): Calculate colQuantiles (replacement matrixStats::colQuantiles) e1 < e2: Perform matrix < numeric comparison (unsupported) e1 > e2: Perform numeric > matrix comparison (unsupported) e1 <= e2: Perform matrix <= numeric comparison (unsupported) e1 >= e2: Compare numeric value IterableMatrix using >= (numeric left operand) e1 * e2: Multiply IterableMatrix numeric value row-wise vector (numeric left operand) e1 + e2: Add IterableMatrix numeric value row-wise vector (numeric left operand) e1 - e2: Subtract matrix numeric constant/vector","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableMatrix methods — IterableMatrix-methods","title":"IterableMatrix methods — IterableMatrix-methods","text":"Generic methods built-functions IterableMatrix 
objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableMatrix methods — IterableMatrix-methods","text":"","code":"matrix_type(x) storage_order(x) # S4 method for class 'IterableMatrix' show(object) # S4 method for class 'IterableMatrix' t(x) # S4 method for class 'IterableMatrix,matrix' x %*% y # S4 method for class 'IterableMatrix' rowSums(x) # S4 method for class 'IterableMatrix' colSums(x) # S4 method for class 'IterableMatrix' rowMeans(x) # S4 method for class 'IterableMatrix' colMeans(x) colVars( x, rows = NULL, cols = NULL, na.rm = FALSE, center = NULL, ..., useNames = TRUE ) rowVars( x, rows = NULL, cols = NULL, na.rm = FALSE, center = NULL, ..., useNames = TRUE ) rowMaxs(x, rows = NULL, cols = NULL, na.rm = FALSE, ..., useNames = TRUE) colMaxs(x, rows = NULL, cols = NULL, na.rm = FALSE, ..., useNames = TRUE) rowQuantiles( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) colQuantiles( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) # S4 method for class 'IterableMatrix' log1p(x) log1p_slow(x) # S4 method for class 'IterableMatrix' expm1(x) expm1_slow(x) # S4 method for class 'IterableMatrix,numeric' e1^e2 # S4 method for class 'numeric,IterableMatrix' e1 < e2 # S4 method for class 'IterableMatrix,numeric' e1 > e2 # S4 method for class 'numeric,IterableMatrix' e1 <= e2 # S4 method for class 'IterableMatrix,numeric' e1 >= e2 # S4 method for class 'IterableMatrix' round(x, digits = 0) # S4 method for class 'IterableMatrix,numeric' e1 * e2 # S4 method for class 'IterableMatrix,numeric' e1 + e2 # S4 method for class 'IterableMatrix,numeric' e1/e2 # S4 method for class 'IterableMatrix,numeric' e1 - 
e2"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableMatrix methods — IterableMatrix-methods","text":"x IterableMatrix object matrix-like object. object IterableMatrix object y matrix probs (Numeric) Quantile value(s) computed, 0 1. type (Integer) 4 9 selecting quantile algorithm use, detailed matrixStats::rowQuantiles()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"IterableMatrix methods — IterableMatrix-methods","text":"t() Transposed object x %*% y: dense matrix result rowSums(): vector row sums colSums(): vector col sums rowMeans(): vector row means colMeans(): vector col means colVars(): vector col variance rowVars(): vector row variance rowMaxs(): vector maxes every row colMaxs(): vector column maxes rowQuantiles(): length(probs) == 1, return numeric number entries equal number rows matrix. Else, return Matrix quantile values, cols representing quantile, row representing row input matrix. colQuantiles(): length(probs) == 1, return numeric number entries equal number columns matrix. 
Else, return Matrix quantile values, cols representing quantile, row representing col input matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableMatrix methods — IterableMatrix-methods","text":"matrix_type(): Get matrix data type (mat_uint32_t, mat_float, mat_double now) storage_order(): Get matrix storage order (\"row\" \"col\") show(IterableMatrix): Display IterableMatrix t(IterableMatrix): Transpose IterableMatrix x %*% y: Multiply dense matrix rowSums(IterableMatrix): Calculate rowSums colSums(IterableMatrix): Calculate colSums rowMeans(IterableMatrix): Calculate rowMeans colMeans(IterableMatrix): Calculate colMeans colVars(): Calculate colVars (replacement matrixStats::colVars()) rowVars(): Calculate rowVars (replacement matrixStats::rowVars()) rowMaxs(): Calculate rowMaxs (replacement matrixStats::rowMaxs()) colMaxs(): Calculate colMaxs (replacement matrixStats::colMaxs()) rowQuantiles(): Calculate rowQuantiles (replacement matrixStats::rowQuantiles) colQuantiles(): Calculate colQuantiles (replacement matrixStats::colQuantiles) log1p(IterableMatrix): Calculate log(x + 1) log1p_slow(): Calculate log(x + 1) (non-SIMD version) expm1(IterableMatrix): Calculate exp(x) - 1 expm1_slow(): Calculate exp(x) - 1 (non-SIMD version) e1^e2: Calculate x^y (elementwise; y > 0) e1 < e2: Binarize matrix according numeric < matrix comparison e1 > e2: Binarize matrix according matrix > numeric comparison e1 <= e2: Binarize matrix according numeric <= matrix comparison e1 >= e2: Binarize matrix according matrix >= numeric comparison round(IterableMatrix): Round nearest integer (digits must 0) e1 * e2: Multiply constant, multiply rows vector length nrow(mat) e1 + e2: Add constant, row-wise addition vector length nrow(mat) e1 / e2: Divide constant, divide rows vector length nrow(mat) e1 - e2: Subtract constant, row-wise subtraction vector 
length nrow(mat)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"IterableMatrix methods — IterableMatrix-methods","text":"","code":"## Prep data mat <- matrix(1:25, nrow = 5) %>% as(\"dgCMatrix\") mat #> 5 x 5 sparse Matrix of class \"dgCMatrix\" #> #> [1,] 1 6 11 16 21 #> [2,] 2 7 12 17 22 #> [3,] 3 8 13 18 23 #> [4,] 4 9 14 19 24 #> [5,] 5 10 15 20 25 mat <- as(mat, \"IterableMatrix\") mat #> 5 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ####################################################################### ## matrix_type() example ####################################################################### matrix_type(mat) #> [1] \"double\" ####################################################################### ## storage_order() example ####################################################################### storage_order(mat) #> [1] \"col\" ####################################################################### ## show() example ####################################################################### show(mat) #> 5 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ####################################################################### ## t() example ####################################################################### t(mat) #> 5 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. 
Load dgCMatrix from memory ####################################################################### ## `x %*% y` example ####################################################################### mat %*% as(matrix(1:50, nrow = 5), \"dgCMatrix\") #> 5 x 10 IterableMatrix object with class MatrixMultiply #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (5x5) * Iterable_dgCMatrix_wrapper (5x10) ####################################################################### ## rowSums() example ####################################################################### rowSums(mat) #> [1] 55 60 65 70 75 ####################################################################### ## colSums() example ####################################################################### colSums(mat) #> [1] 15 40 65 90 115 ####################################################################### ## rowMeans() example ####################################################################### rowMeans(mat) #> [1] 11 12 13 14 15 ####################################################################### ## colMeans() example ####################################################################### colMeans(mat) #> [1] 3 8 13 18 23 ####################################################################### ## colVars() example ####################################################################### colVars(mat) #> [1] 2.5 2.5 2.5 2.5 2.5 ####################################################################### ## rowMaxs() example ####################################################################### rowMaxs(mat) #> [1] 21 22 23 24 25 ####################################################################### ## colMaxs() example ####################################################################### colMaxs(mat) #> [1] 5 10 15 20 25 
####################################################################### ## rowQuantiles() example ####################################################################### rowQuantiles(transpose_storage_order(mat)) #> 0% 25% 50% 75% 100% #> [1,] 1 6 11 16 21 #> [2,] 2 7 12 17 22 #> [3,] 3 8 13 18 23 #> [4,] 4 9 14 19 24 #> [5,] 5 10 15 20 25 ####################################################################### ## colQuantiles() example ####################################################################### colQuantiles(mat) #> 0% 25% 50% 75% 100% #> [1,] 1 2 3 4 5 #> [2,] 6 7 8 9 10 #> [3,] 11 12 13 14 15 #> [4,] 16 17 18 19 20 #> [5,] 21 22 23 24 25 ####################################################################### ## log1p() example ####################################################################### log1p(mat) #> 5 x 5 IterableMatrix object with class TransformLog1p #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform log1p ####################################################################### ## log1p_slow() example ####################################################################### log1p_slow(mat) #> 5 x 5 IterableMatrix object with class TransformLog1pSlow #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform log1p (non-SIMD implementation) ####################################################################### ## expm1() example ####################################################################### expm1(mat) #> 5 x 5 IterableMatrix object with class TransformExpm1 #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. 
Transform expm1 ####################################################################### ## expm1_slow() example ####################################################################### expm1_slow(mat) #> 5 x 5 IterableMatrix object with class TransformExpm1Slow #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform expm1 (non-SIMD implementation) ####################################################################### ## `e1 < e2` example ####################################################################### 5 < mat #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x < 5 #> 3. Convert type from double to uint32_t ####################################################################### ## `e1 > e2` example ####################################################################### mat > 5 #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x < 5 #> 3. Convert type from double to uint32_t ####################################################################### ## `e1 <= e2` example ####################################################################### 5 <= mat #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x <= 5 #> 3. 
Convert type from double to uint32_t ####################################################################### ## `e1 >= e2` example ####################################################################### mat >= 5 #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x <= 5 #> 3. Convert type from double to uint32_t ####################################################################### ## round() example ####################################################################### round(mat) #> 5 x 5 IterableMatrix object with class TransformRound #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform round to 0 decimal places ####################################################################### ## `e1 * e2` example ####################################################################### ## Multiplying by a constant mat * 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale by 5 ## Multiplying by a vector of length `nrow(mat)` mat * 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale rows by 1, 2 ... 
5 ####################################################################### ## `e1 + e2` example ####################################################################### ## Add by a constant mat + 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Shift by 5 ## Adding row-wise by a vector of length `nrow(mat)` mat + 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Shift rows by 1, 2 ... 5 ####################################################################### ## `e1 / e2` example ####################################################################### ## Divide by a constant mat / 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale by 0.2 ## Divide by a vector of length `nrow(mat)` mat / 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale rows by 1, 0.5 ... 0.2 ####################################################################### ## `e1 - e2` example ####################################################################### ## Subtracting by a constant mat - 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. 
Load dgCMatrix from memory #> 2. Shift by -5 ## Subtracting by a vector of length `nrow(mat)` mat - 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Shift rows by -1, -2 ... -5"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableMatrix subclass methods — IterableMatrix-misc-methods","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"Methods classes extend IterableMatrix dispatched directly base class. typically helper objects wrap another matrix alter behaviour (e.g., concatenation, -disk access).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"","code":"# S4 method for class 'MatrixMultiply' matrix_type(x) # S4 method for class 'MatrixMultiply,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'MatrixMask' matrix_type(x) # S4 method for class 'MatrixRankTransform' matrix_type(x) # S4 method for class 'MatrixSubset' matrix_type(x) # S4 method for class 'MatrixSubset,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'RenameDims' matrix_type(x) # S4 method for class 'RenameDims,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'RowBindMatrices' matrix_type(x) # S4 method for class 'ColBindMatrices' matrix_type(x) # S4 method for class 'RowBindMatrices,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'ColBindMatrices,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'PackedMatrixMem_uint32_t' matrix_type(x) # S4 method for class 'PackedMatrixMem_float' matrix_type(x) # S4 method for 
class 'PackedMatrixMem_double' matrix_type(x) # S4 method for class 'UnpackedMatrixMem_uint32_t' matrix_type(x) # S4 method for class 'UnpackedMatrixMem_float' matrix_type(x) # S4 method for class 'UnpackedMatrixMem_double' matrix_type(x) # S4 method for class 'MatrixDir' matrix_type(x) # S4 method for class 'EXPERIMENTAL_MatrixDirCompressedCol' matrix_type(x) # S4 method for class 'MatrixH5' matrix_type(x) # S4 method for class '10xMatrixH5' matrix_type(x) # S4 method for class 'AnnDataMatrixH5' matrix_type(x) # S4 method for class 'PeakMatrix' matrix_type(x) # S4 method for class 'PeakMatrix,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'TileMatrix' matrix_type(x) # S4 method for class 'TileMatrix,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'ConvertMatrixType' matrix_type(x) # S4 method for class 'ConvertMatrixType,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'Iterable_dgCMatrix_wrapper' matrix_type(x) # S4 method for class 'TransformedMatrix' matrix_type(x) # S4 method for class 'TransformedMatrix,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'TransformScaleShift,numeric' e1 * e2 # S4 method for class 'TransformScaleShift,numeric' e1 + e2 # S4 method for class 'numeric,TransformScaleShift' e1 * e2 # S4 method for class 'numeric,TransformScaleShift' e1 + e2"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"x object inheriting IterableMatrix. Row indices selection helpers. j Column indices selection helpers. ... Additional arguments passed call. drop Logical indicating whether drop dimensions (subsetting). e1 Left operand binary operations. 
e2 Right operand binary operations.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"matrix_type(MatrixMultiply): Matrix data type MatrixMultiply objects x[: Subset MatrixMultiply results matrix_type(MatrixMask): Matrix data type MatrixMask objects matrix_type(MatrixRankTransform): Matrix data type MatrixRankTransform objects matrix_type(MatrixSubset): Matrix data type MatrixSubset objects x[: Subset MatrixSubset transforms matrix_type(RenameDims): Matrix data type RenameDims objects x[: Subset RenameDims transforms matrix_type(RowBindMatrices): Matrix data type RowBindMatrices objects matrix_type(ColBindMatrices): Matrix data type ColBindMatrices objects x[: Subset RowBindMatrices transforms x[: Subset ColBindMatrices transforms matrix_type(PackedMatrixMem_uint32_t): Matrix data type PackedMatrixMem_uint32_t objects matrix_type(PackedMatrixMem_float): Matrix data type PackedMatrixMem_float objects matrix_type(PackedMatrixMem_double): Matrix data type PackedMatrixMem_double objects matrix_type(UnpackedMatrixMem_uint32_t): Matrix data type UnpackedMatrixMem_uint32_t objects matrix_type(UnpackedMatrixMem_float): Matrix data type UnpackedMatrixMem_float objects matrix_type(UnpackedMatrixMem_double): Matrix data type UnpackedMatrixMem_double objects matrix_type(MatrixDir): Matrix data type MatrixDir objects matrix_type(EXPERIMENTAL_MatrixDirCompressedCol): Matrix data type EXPERIMENTAL_MatrixDirCompressedCol objects matrix_type(MatrixH5): Matrix data type MatrixH5 objects matrix_type(`10xMatrixH5`): Matrix data type 10xMatrixH5 objects matrix_type(AnnDataMatrixH5): Matrix data type AnnDataMatrixH5 objects matrix_type(PeakMatrix): Matrix data type PeakMatrix objects x[: Subset PeakMatrix matrix_type(TileMatrix): Matrix data type TileMatrix objects x[: Subset 
TileMatrix matrix_type(ConvertMatrixType): Matrix data type ConvertMatrixType objects x[: Subset ConvertMatrixType transforms matrix_type(Iterable_dgCMatrix_wrapper): Matrix data type Iterable_dgCMatrix_wrapper objects matrix_type(TransformedMatrix): Matrix data type TransformedMatrix objects x[: Subset TransformedMatrix results e1 * e2: Scale TransformScaleShift results numeric values e1 + e2: Shift TransformScaleShift results numeric values e1 * e2: Apply numeric scaling left TransformScaleShift results e1 + e2: Add TransformScaleShift results numeric values (numeric left operand)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-class.html","id":null,"dir":"Reference","previous_headings":"","what":"Represent a sparse matrix-vector product operation — LinearOperator-class","title":"Represent a sparse matrix-vector product operation — LinearOperator-class","text":"LinearOperators perform sparse matrix-vector product operations downstream matrix solvers. avoid repeatedly calling iterate_matrix SVD solver possible efficiency gain","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":null,"dir":"Reference","previous_headings":"","what":"LinearOperator multiplication helpers — LinearOperator-math","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"Methods enabling \\%*% LinearOperator objects dense matrices numeric vectors.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"","code":"# S4 method for class 'LinearOperator,matrix' x %*% y # S4 method for class 'matrix,LinearOperator' x %*% y # S4 method for class 'LinearOperator,numeric' x %*% y # S4 method for class 'numeric,LinearOperator' x %*% 
y"},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"x Left operand. y Right operand.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"x %*% y: Multiply LinearOperator dense matrix x %*% y: Multiply dense matrix LinearOperator x %*% y: Multiply LinearOperator numeric vector x %*% y: Multiply numeric vector LinearOperator","code":""},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":null,"dir":"Reference","previous_headings":"","what":"Get/set inputs to a matrix transform — all_matrix_inputs","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"matrix object can either input (.e. file disk raw matrix memory), can represent delayed operation one matrices. all_matrix_inputs() getter setter functions allow accessing base-level input matrices list, changing . useful want re-locate data disk without losing transformed BPCells matrix. 
(Note: experimental API; potentially subject revisions).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"","code":"all_matrix_inputs(x) all_matrix_inputs(x) <- value"},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"x IterableMatrix value List IterableMatrix objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"List IterableMatrix objects. matrix m input object, all_matrix_inputs(m) return list(m).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":null,"dir":"Reference","previous_headings":"","what":"Apply a function to summarize rows/cols — apply_by_row","title":"Apply a function to summarize rows/cols — apply_by_row","text":"Apply custom R function row/col BPCells matrix. run slower builtin C++-backed functions, keep memory benefits disk-backed operations.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Apply a function to summarize rows/cols — apply_by_row","text":"","code":"apply_by_row(mat, fun, ...) apply_by_col(mat, fun, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Apply a function to summarize rows/cols — apply_by_row","text":"mat IterableMatrix object fun function(val, row, col) takes row/col values returns summary output. 
Argument details: val - Vector length (# non-zero values) value non-zero matrix entry row - one-based row index (apply_by_col: vector length (# non-zero values), apply_by_row: single integer) col - one-based col index (apply_by_col: single integer, apply_by_row: vector length (# non-zero values)) ... - Optional additional arguments (named row, col, val) ... Optional additional arguments passed fun","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Apply a function to summarize rows/cols — apply_by_row","text":"apply_by_row - list length nrow(matrix) results returned fun() row apply_by_col - list length ncol(matrix) results returned fun() col","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Apply a function to summarize rows/cols — apply_by_row","text":"functions require row-major matrix storage apply_by_row col-major storage apply_by_col, matrices stored wrong order may need re-ordered copy created using transpose_storage_order() first. required able keep memory-usage low allow calculating result single streaming pass input matrix. 
vector/matrix outputs desired instead lists, calling unlist(x) .call(cbind, x) .call(rbind, x) can convert list output.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Apply a function to summarize rows/cols — apply_by_row","text":"","code":"mat <- matrix(rbinom(40, 1, 0.5) * sample.int(5, 40, replace = TRUE), nrow = 4) rownames(mat) <- paste0(\"gene\", 1:4) mat #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] #> gene1 0 0 2 0 0 0 3 3 0 2 #> gene2 1 0 3 0 0 3 4 1 1 2 #> gene3 2 0 1 0 0 2 0 0 0 0 #> gene4 0 0 0 0 3 0 1 0 0 0 mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") ####################################################################### ## apply_by_row() example ####################################################################### ## Get mean of every row ## expect an error in the case that col-major matrix is passed apply_by_row(mat, function(val, row, col) {sum(val) / nrow(mat)}) %>% unlist() #> Error in apply_by_row(mat, function(val, row, col) { sum(val)/nrow(mat)}): Cannot call apply_by_row on a col-major matrix. 
Please call transpose_storage_order() first ## Need to transpose matrix to make sure it is in row-order mat_row_order <- transpose_storage_order(mat) ## works as expected for row major apply_by_row(mat_row_order, function(val, row, col) sum(val) / ncol(mat_row_order) ) %>% unlist() #> [1] 1.0 1.5 0.5 0.4 # Also analogous to running rowMeans() without names rowMeans(mat) #> gene1 gene2 gene3 gene4 #> 1.0 1.5 0.5 0.4 ####################################################################### ## apply_by_col() example ####################################################################### ## Get argmax of every col apply_by_col(mat, function(val, row, col) if (length(val) > 0) row[which.max(val)] else 1L ) %>% unlist() #> [1] 3 1 2 1 4 2 2 1 2 1"},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert matrix elements to zeros and ones — binarize","title":"Convert matrix elements to zeros and ones — binarize","text":"Binarize compares matrix element values threshold value sets output elements either zero one. default, element values greater threshold set one; otherwise, set zero. strict_inequality set FALSE, element values greater equal threshold set one. alternative, <, <=, >, >= operators also supported.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert matrix elements to zeros and ones — binarize","text":"","code":"binarize(mat, threshold = 0, strict_inequality = TRUE)"},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert matrix elements to zeros and ones — binarize","text":"mat IterableMatrix threshold numeric value determines whether elements x set zero one. 
strict_inequality logical value determining whether comparison threshold >= (strict_inequality=FALSE) > (strict_inequality=TRUE).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert matrix elements to zeros and ones — binarize","text":"binarized IterableMatrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_macs_peaks.html","id":null,"dir":"Reference","previous_headings":"","what":"Call peaks using MACS2/3 — call_macs_peaks","title":"Call peaks using MACS2/3 — call_macs_peaks","text":"function renamed call_peaks_macs()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_macs_peaks.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call peaks using MACS2/3 — call_macs_peaks","text":"","code":"call_macs_peaks(...)"},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":null,"dir":"Reference","previous_headings":"","what":"Call peaks using MACS2/3 — call_peaks_macs","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"Export pseudobulk bed files input MACS, run MACS read output peaks tibble. step can run independently, allowing quickly re-loading results already completed call, running MACS externally (e.g. via cluster job submission) increased parallelization. 
See details information.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"","code":"call_peaks_macs( fragments, path, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), effective_genome_size = 2.9e+09, insertion_mode = c(\"both\", \"start_only\", \"end_only\"), step = c(\"all\", \"prep-inputs\", \"run-macs\", \"read-outputs\"), macs_executable = NULL, additional_params = \"--call-summits --keep-dup all --shift -75 --extsize 150 --nomodel --nolambda\", verbose = FALSE, threads = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"fragments IterableFragments object path (string) Parent directory store MACS inputs outputs. Inputs stored /input/ outputs /output//. See \"File format\" details cell_groups Grouping vector one entry per cell fragments, e.g. cluster IDs effective_genome_size (numeric) Effective genome size MACS. Default 2.9e9 following MACS default GRCh38. See deeptools values common genomes. insertion_mode (string) fragment ends use coverage calculation. One , start_only, end_only. step (string) step run. One , prep-inputs, run-macs, read-outputs. prep-inputs, create input bed files macs, provides shell script per cell group command run macs. run-macs, also run bash scripts execute macs. read-outputs, read outputs tibbles. macs_executable (string) Path either MACS2/3 executable. Default (NULL) autodetect PATH. additional_params (string) Additional parameters pass MACS2/3. verbose (bool) Whether provide verbose output MACS. used step run-macs . 
threads (int) Number threads use.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"step prep-inputs, return script paths cell group given character vector. step run-macs, return NULL. step read-outputs , returns tibble peaks cell group concatenated. Columns chr, start, end, group, name, score, strand, fold_enrichment, log10_pvalue, log10_qvalue, summit_offset","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"File format: Inputs written bed file used input MACS, well shell file containing call MACS written cell group. Bed files containing chr, start, end coordinates insertions written /input/.bed.gz. Shell commands run MACS manually written /input/.sh. Outputs written output directory subdirectory cell group. cell group's output directory contains file narrowPeaks, peaks, summits. NarrowPeaks written /output//_peaks.narrowPeak. Peaks written /output//_peaks.xls. Summits written /output//_summits.bed. narrowPeaks file read tibble returned. information outputs MACS, visit MACS docs Performance: Running 2600 cell dataset taking start end insertions account, written input bedfiles MACS outputs used 364 MB 158 MB space respectively. 4 threads, running function end end took 74 seconds, 61 seconds spent running MACS. Running MACS manually: run MACS manually, first run call_peaks_macs() step=\"prep-inputs\". , manually run shell scripts generated /input/.sh. 
Finally, run call_peaks_macs() original arguments, setting step=\"read-outputs\".","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"","code":"macs_files <- file.path(tempdir(), \"peaks\") frags <- get_demo_frags() head(call_peaks_macs(frags, macs_files)) #> # A tibble: 6 × 11 #> chr start end group name score strand fold_enrichment log10_pvalue #> #> 1 chr11 175907 176057 all all_peak_1 22 . 2.95 3.29 #> 2 chr11 179864 180015 all all_peak_2 33 . 3.68 4.44 #> 3 chr11 180095 180352 all all_peak_3 13 . 2.21 2.23 #> 4 chr11 184430 184599 all all_peak_4 33 . 3.68 4.44 #> 5 chr11 188061 188273 all all_peak_5 56 . 5.16 6.97 #> 6 chr11 189522 189672 all all_peak_6 33 . 3.68 4.44 #> # ℹ 2 more variables: log10_qvalue , summit_offset ## Can also just run the input prep, then run macs manually ## by setting step to 'prep-inputs' macs_script <- call_peaks_macs(frags, macs_files, step = \"prep-inputs\") system2(\"bash\", macs_script[1], stdout = FALSE, stderr = FALSE) ## Then read the narrow peaks files list.files(file.path(macs_files, \"output\", \"all\")) #> [1] \"all_peaks.narrowPeak\" \"all_peaks.xls\" \"all_summits.bed\" #> [4] \"log.txt\" ## call_peaks_macs() can also solely perform the output reading step head(call_peaks_macs(frags, macs_files, step = \"read-outputs\")) #> # A tibble: 6 × 11 #> chr start end group name score strand fold_enrichment log10_pvalue #> #> 1 chr11 175907 176057 all all_peak_1 22 . 2.95 3.29 #> 2 chr11 179864 180015 all all_peak_2 33 . 3.68 4.44 #> 3 chr11 180095 180352 all all_peak_3 13 . 2.21 2.23 #> 4 chr11 184430 184599 all all_peak_4 33 . 3.68 4.44 #> 5 chr11 188061 188273 all all_peak_5 56 . 5.16 6.97 #> 6 chr11 189522 189672 all all_peak_6 33 . 
3.68 4.44 #> # ℹ 2 more variables: log10_qvalue , summit_offset "},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":null,"dir":"Reference","previous_headings":"","what":"Call peaks from tiles — call_peaks_tile","title":"Call peaks from tiles — call_peaks_tile","text":"Calling peaks pre-set list tiles can much faster using dedicated peak-calling software like macs3. resulting peaks less precise terms exact coordinates, sufficient analyses.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call peaks from tiles — call_peaks_tile","text":"","code":"call_peaks_tile( fragments, chromosome_sizes, cell_groups = rep.int(\"all\", length(cellNames(fragments))), effective_genome_size = NULL, peak_width = 200, peak_tiling = 3, fdr_cutoff = 0.01, merge_peaks = c(\"all\", \"group\", \"none\") )"},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Call peaks from tiles — call_peaks_tile","text":"fragments IterableFragments object chromosome_sizes Chromosome start end coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position See read_ucsc_chrom_sizes(). cell_groups Grouping vector one entry per cell fragments, e.g. cluster IDs effective_genome_size (Optional) effective genome size poisson background rate estimation. See deeptools values common genomes. Defaults sum chromosome sizes, overestimates peak significance peak_width Width candidate peaks peak_tiling Number candidate peaks overlapping base genome. E.g. 
peak_width = 300 peak_tiling = 3 results candidate peaks 300bp spaced 100bp apart fdr_cutoff Adjusted p-value significance cutoff merge_peaks merge significant peaks merge_peaks_iterative() \"\" Merge full set peaks \"group\" Merge peaks within group \"none\" perform merging","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Call peaks from tiles — call_peaks_tile","text":"tibble peak calls following columns: chr, start, end: genome coordinates group: group ID peak identified p_val, q_val: Poisson p-value BH-corrected p-value enrichment: Enrichment counts peak compared genome-wide background","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Call peaks from tiles — call_peaks_tile","text":"Peak calling steps: Estimate genome-wide expected insertions per tile based peak_width, effective_genome_size, per-group read counts Tile genome nonoverlapping tiles size peak_width tile group, calculate p_value based Poisson model Compute adjusted p-values using BH method using total number tiles number hypotheses tested. 
Repeat steps 2-4 peak_tiling times, evenly spaced offsets merge_peaks \"\" \"group\": use merge_peaks_iterative() within group keep significant overlapping candidate peaks merge_peaks \"\", perform final round merge_peaks_iterative(), prioritizing peak within-group significance rank","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Call peaks from tiles — call_peaks_tile","text":"","code":"## Prep data reference_dir <- file.path(tempdir(), \"references\") frags <- get_demo_frags() ## Remove blacklist regions from fragments blacklist <- read_encode_blacklist(reference_dir, genome=\"hg38\") frags_filter_blacklist <- select_regions(frags, blacklist, invert_selection = TRUE) chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) ## Call peaks call_peaks_tile(frags_filter_blacklist, chrom_sizes, effective_genome_size = 2.8e9) #> # A tibble: 73,160 × 7 #> chr start end group p_val q_val enrichment #> #> 1 chr11 65615400 65615600 all 0 0 6764. #> 2 chr4 2262266 2262466 all 0 0 6422. #> 3 chr11 119057200 119057400 all 0 0 6188. #> 4 chr11 695133 695333 all 0 0 6180. #> 5 chr11 2400400 2400600 all 0 0 6166. #> 6 chr4 1346933 1347133 all 0 0 6109. #> 7 chr11 3797600 3797800 all 0 0 6017. #> 8 chr11 64878600 64878800 all 0 0 5948. #> 9 chr11 57667733 57667933 all 0 0 5946. #> 10 chr11 83156933 83157133 all 0 0 5913. 
#> # ℹ 73,150 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate the MD5 checksum of an IterableMatrix — checksum","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"Calculate MD5 checksum IterableMatrix return checksum hexadecimal format.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"","code":"checksum(matrix)"},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"matrix IterableMatrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"MD5 checksum string hexadecimal format.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"checksum() converts non-zero elements sparse input matrix double precision, concatenates element value element row column index words, uses 16-byte blocks along matrix dimensions row column names calculate checksum. checksum value depends storage order column- row-order matrices element values give different checksum values. checksum() uses element index values little-endian CPU storage order. 
converts little-endian order big-endian architecture although tested.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"","code":"library(Matrix) library(BPCells) m1 <- matrix(seq(1,12), nrow=3) m2 <- as(m1, 'dgCMatrix') m3 <- as(m2, 'IterableMatrix') checksum(m3) #> [1] \"8a6bf37ef376f7d74b4642a2ed0fc58d\""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster an adjacency matrix — cluster_graph_leiden","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Cluster adjacency matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"","code":"cluster_graph_leiden( snn, resolution = 1, objective_function = c(\"modularity\", \"CPM\"), seed = 12531, ... ) cluster_graph_louvain(snn, resolution = 1, seed = 12531) cluster_graph_seurat(snn, resolution = 0.8, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"snn Symmetric adjacency matrix (dgCMatrix) output e.g. knn_to_snn_graph() knn_to_geodesic_graph(). lower triangle used resolution Resolution parameter. Higher values result clusters objective_function Graph statistic optimize clustering. Modularity default keeps resolution independent dataset size (see details ). meaning option, see igraph::cluster_leiden(). seed Random seed clustering initialization ... 
Additional arguments underlying clustering function","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Factor vector containing cluster assignment cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"cluster_graph_leiden: Leiden clustering algorithm igraph::cluster_leiden(). Note using objective_function = \"CPM\" number clusters empirically scales cells * resolution, 1e-3 good resolution 10k cells, 1M cells better 1e-5 resolution. resolution 1 good default objective_function = \"modularity\" per default. cluster_graph_louvain: Louvain graph clustering algorithm igraph::cluster_louvain() cluster_graph_seurat: Seurat's clustering algorithm Seurat::FindClusters()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"Take cell embedding matrix, find k nearest neighbors (KNN) cell, convert KNN graph (adjacency matrix), run graph-based clustering algorithm. 
steps can customized passing function performs step (see details).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"","code":"cluster_cells_graph( mat, knn_method = knn_hnsw, knn_to_graph_method = knn_to_geodesic_graph, cluster_graph_method = cluster_graph_leiden, threads = 0L, verbose = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"mat (matrix) Cell embeddings matrix shape (cells x n_embeddings) knn_method (function) Function takes embedding matrix first argument returns k nearest neighbors (KNN) object. example, knn_hnsw(), knn_annoy(), parameterized version (see Details). knn_to_graph_method (function) Function takes KNN object returns graph undirected graph (lower-triangular dgCMatrix adjacency matrix). example, knn_to_graph(), knn_to_snn_graph(), knn_to_geodesic_graph(), parameterized version (see Details). cluster_graph_method (function) Function takes undirected graph cell similarity returns factor cluster assignments cell. example, cluster_graph_leiden(), cluster_graph_louvain(), cluster_graph_seurat(), parameterized version (see Details). threads (integer) Number threads use knn_method, knn_to_graph_method cluster_graph_method. functions utilize threads argument, silently ignored. verbose (logical) Whether print progress information knn_method, knn_to_graph_method cluster_graph_method. 
functions utilize verbose argument, silently ignored.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"(factor) Factor vector containing cluster assignment cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"Customizing clustering steps BPCells functions named like knn_*, knn_to_graph_*, cluster_graph_* support customizing parameters via partial function application. example, look 20 neighbors k nearest neighbors search, setting knn_method=knn_hnsw(k=20) convenient shortcut knn_method=function(x) knn_hnsw(x, k=20). Similarly, lowering default clustering resolution can done cluster_graph_method=cluster_graph_louvain(resolution=0.5). works functions written return partially parameterized copy function object first argument missing. even advanced customization, users can manually call knn, knn_to_graph, cluster_graph methods rather using cluster_cells_graph() convenient wrapper. Implementing custom clustering steps required interfaces step follows: knn_method: First argument matrix cell embeddings, shape (cells x n_embeddings). Returns named list two matrices dimension (cells x k): idx: Neighbor indices, idx[c, n] index nth nearest neighbor cell c. dist: Neighbor distances, dist[c, n] distance cell c nth nearest neighbor. Self-neighbors allowed, sufficient search effort idx[c,1] == c nearly cells. knn_to_graph_method: First argument KNN object returned knn_method. Returns weighted similarity graph lower triangular sparse adjacency matrix (dgCMatrix). cells j, similarity score adjacency_mat[max(,j), min(,j)]. 
cluster_graph_method: First argument weighted similarity graph returned knn_to_graph_method. Returns factor vector length cells cluster assignment cell.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster an adjacency matrix — cluster_graph_leiden","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Cluster adjacency matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"","code":"cluster_graph_leiden( mat, resolution = 1, objective_function = c(\"modularity\", \"CPM\"), seed = 12531, ... ) cluster_graph_louvain(mat, resolution = 1, seed = 12531) cluster_graph_seurat(mat, resolution = 0.8, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"mat Symmetric adjacency matrix (dgCMatrix) output e.g. knn_to_snn_graph() knn_to_geodesic_graph(). lower triangle used. resolution Resolution parameter. Higher values result clusters objective_function Graph statistic optimize clustering. Modularity default keeps resolution independent dataset size (see details ). meaning option, see igraph::cluster_leiden(). seed Random seed clustering initialization ... 
Additional arguments underlying clustering function","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Factor vector containing cluster assignment cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"cluster_graph_leiden: Leiden clustering algorithm igraph::cluster_leiden(). Note using objective_function = \"CPM\" number clusters empirically scales cells * resolution, 1e-3 good resolution 10k cells, 1M cells better 1e-5 resolution. resolution 1 good default objective_function = \"modularity\" per default. cluster_graph_louvain: Louvain graph clustering algorithm igraph::cluster_louvain() cluster_graph_seurat: Seurat's clustering algorithm Seurat::FindClusters()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert grouping vector to sparse matrix — cluster_membership_matrix","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"Converts vector membership IDs sparse matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"","code":"cluster_membership_matrix(groups, group_order = NULL)"},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"groups Vector one entry per cell, specifying cell's 
group group_order Optional vector listing ordering groups","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"cell x group matrix entry 1 cell given group","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":null,"dir":"Reference","previous_headings":"","what":"Collect features for plotting — collect_features","title":"Collect features for plotting — collect_features","text":"Helper function data features plot diverse set data sources.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Collect features for plotting — collect_features","text":"","code":"collect_features( source, features = NULL, gene_mapping = human_gene_mapping, n = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Collect features for plotting — collect_features","text":"source Matrix data frame pull features , vector feature values single feature. matrix, features must rows. features Character vector features names plot source vector. gene_mapping optional vector gene name matching match_gene_symbol(). Ignored source data frame. n Internal-use parameter marking number nested calls. 
used finding name \"source\" input variable caller's perspective","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Collect features for plotting — collect_features","text":"Data frame one column feature requested","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Collect features for plotting — collect_features","text":"source data.frame, features drawn columns. source matrix object (IterableMatrix, dgCMatrix, matrix), features drawn rows.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/concat_dimnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function for rbind/cbind concatenating dimnames — concat_dimnames","title":"Helper function for rbind/cbind concatenating dimnames — concat_dimnames","text":"Helper function rbind/cbind concatenating dimnames","code":""},{"path":"https://bnprks.github.io/BPCells/reference/concat_dimnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function for rbind/cbind concatenating dimnames — concat_dimnames","text":"","code":"concat_dimnames(x, y, len_x, len_y, warning_prefix, dim_type)"},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert the type of a matrix — convert_matrix_type","title":"Convert the type of a matrix — convert_matrix_type","text":"Convert type matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert the type of a matrix — convert_matrix_type","text":"","code":"convert_matrix_type(matrix, type = c(\"uint32_t\", \"double\", 
\"float\"))"},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert the type of a matrix — convert_matrix_type","text":"matrix IterableMatrix object input type One uint32_t (unsigned 32-bit integer), float (32-bit real number), double (64-bit real number)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert the type of a matrix — convert_matrix_type","text":"IterableMatrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert the type of a matrix — convert_matrix_type","text":"","code":"mat <- matrix(rnorm(50), nrow = 10, ncol = 5) rownames(mat) <- paste0(\"gene\", seq_len(10)) colnames(mat) <- paste0(\"cell\", seq_len(5)) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") mat #> 10 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory convert_matrix_type(mat, \"float\") #> 10 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: float #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. 
Convert type from double to float"},{"path":"https://bnprks.github.io/BPCells/reference/create_partial.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper to create partial functions — create_partial","title":"Helper to create partial functions — create_partial","text":"Automatically creates partial application caller function including non-missing arguments.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/create_partial.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper to create partial functions — create_partial","text":"","code":"create_partial()"},{"path":"https://bnprks.github.io/BPCells/reference/create_partial.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Helper to create partial functions — create_partial","text":"bpcells_partial object (function extra attributes)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve BPCells demo data — get_demo_mat","title":"Retrieve BPCells demo data — get_demo_mat","text":"Functions download matrices fragments derived 10X Genomics PBMC 3k dataset, options filter common qc metrics, subset genes fragments chromosome 4 11.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve BPCells demo data — get_demo_mat","text":"","code":"get_demo_mat(filter_qc = TRUE, subset = TRUE) get_demo_frags(filter_qc = TRUE, subset = TRUE) remove_demo_data()"},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve BPCells demo data — get_demo_mat","text":"filter_qc (bool) Whether filter RNA ATAC data using qc metrics (described details). 
subset (bool) Whether subset genes/insertions chromosome 4 11.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve BPCells demo data — get_demo_mat","text":"get_demo_mat(): (IterableMatrix) (features x cells) matrix. get_demo_frags(): (IterableFragments) Fragments object. remove_demo_data(): NULL","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Retrieve BPCells demo data — get_demo_mat","text":"data functions experimental. interface, well demo dataset likely undergo changes near future. Data Processing: first time either get_demo_mat(), get_demo_frags(), run demo data downloaded stored BPCells data directory (file.path(tools::R_user_dir(\"BPCells\", =\"data\"), \"demo_data\")). Subsequent calls function use previously downloaded matrix/fragments, given combination filtering subsetting performed previously. preparation matrix can reproduced running internal function prepare_demo_data() directory set BPCells data directory. case demo data pre-downloaded demo data download fails, prepare_demo_data() act fallback. matrix get_demo_mat() fragments get_demo_frags() may removed running remove_demo_data(). Filtering using QC information fragments matrix object chooses cells least 1000 reads, 1000 frags, minimum TSS enrichment 10. Subsetting provides genes insertions chromosomes 4 11. Dimensions: Data size: Function Description: get_demo_mat(): Retrieve demo IterableMatrix object representing 10X Genomics PBMC 3k dataset. get_demo_frags(): Retrieve demo IterableFragments object representing 10X Genomics PBMC 3k dataset. 
remove_demo_data(): Remove demo data BPCells data directory.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve BPCells demo data — get_demo_mat","text":"","code":"####################################################################### ## get_demo_mat() example ####################################################################### get_demo_mat() #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /home/imman/.local/share/R/BPCells/demo_data/demo_mat_filtered_subsetted ####################################################################### ## get_demo_frags() example ####################################################################### get_demo_frags() #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered_subsetted ####################################################################### ## remove_demo_data() example ####################################################################### remove_demo_data() ## Demo data folder is now empty data_dir <- file.path(tools::R_user_dir(\"BPCells\", which = \"data\"), \"demo_data\") list.files(data_dir) #> character(0)"},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine ggplot track plots into an aligned grid. 
— draw_trackplot_grid","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"function renamed trackplot_combine().","code":""},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"","code":"draw_trackplot_grid( ..., labels, title = NULL, heights = rep(1, length(plots)), label_width = 0.2, label_style = list(fontface = \"bold\", size = 4) )"},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"... Plots order top bottom, generally plain ggplots. better accommodate many bulk tracks, patchwork objects contain multiple tracks also accepted. case, plot labels drawn attribute $patchwork$labels present, rather labels argument. labels Text labels display track title Text overarching title plot heights Relative heights component plot. suggested use 1 standard height pseudobulk track. label_width Fraction width used labels relative main track area label_style Arguments pass geom_text adjust label text style","code":""},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"plot object aligned genome plots. aligned row text label, y-axis, plot body. relative height row given heights. 
shared title x-axis put top.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/ensure_downloaded.html","id":null,"dir":"Reference","previous_headings":"","what":"Download a file with a custom timeout — ensure_downloaded","title":"Download a file with a custom timeout — ensure_downloaded","text":"Download file custom timeout","code":""},{"path":"https://bnprks.github.io/BPCells/reference/ensure_downloaded.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download a file with a custom timeout — ensure_downloaded","text":"","code":"ensure_downloaded(path, backup_url, timeout)"},{"path":"https://bnprks.github.io/BPCells/reference/ensure_downloaded.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download a file with a custom timeout — ensure_downloaded","text":"path Output path write file timeout timeout seconds url download ","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Extend genome ranges in a strand-aware fashion. — extend_ranges","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"Extend genome ranges strand-aware fashion.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"","code":"extend_ranges( ranges, upstream = 0, downstream = 0, metadata_cols = c(\"strand\"), chromosome_sizes = NULL, zero_based_coords = !is(ranges, \"GRanges\") )"},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"ranges Genomic regions given GRanges, data.frame, list. 
See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position upstream Number bases extend range upstream (negative shrink width) downstream Number bases extend range downstream (negative shrink width) metadata_cols Optional list metadata columns require & extract chromosome_sizes (optional) Size chromosomes genomic-ranges object zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"Note ranges blocked extending past beginning chromosome (base 0), chromosome_sizes given also blocked extending past end chromosome","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extend genome ranges in a strand-aware fashion. 
— extend_ranges","text":"","code":"## Prep data ranges <- tibble::tibble( chr = \"chr1\", start = seq(50, 4050, 1000), end = start + 50, strand = \"+\" ) ranges #> # A tibble: 5 × 4 #> chr start end strand #> #> 1 chr1 50 100 + #> 2 chr1 1050 1100 + #> 3 chr1 2050 2100 + #> 4 chr1 3050 3100 + #> 5 chr1 4050 4100 + ## Extend ranges 1 bp upstream, 1 bp downstream extend_ranges(ranges, upstream = 1, downstream = 1) #> # A tibble: 5 × 4 #> chr start end strand #> #> 1 chr1 49 101 TRUE #> 2 chr1 1049 1101 TRUE #> 3 chr1 2049 2101 TRUE #> 4 chr1 3049 3101 TRUE #> 5 chr1 4049 4101 TRUE"},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":null,"dir":"Reference","previous_headings":"","what":"Get footprints around a set of genomic coordinates — footprint","title":"Get footprints around a set of genomic coordinates — footprint","text":"Get footprints around set genomic coordinates","code":""},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get footprints around a set of genomic coordinates — footprint","text":"","code":"footprint( fragments, ranges, zero_based_coords = !is(ranges, \"GRanges\"), cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), cell_weights = rlang::rep_along(cell_groups, 1), flank = 125L, normalization_width = flank%/%10L )"},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get footprints around a set of genomic coordinates — footprint","text":"fragments IterableFragments object ranges Footprint centers given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand \"+\" strand ranges footprint around start coordinate, \"-\" strand ranges around end coordinate. 
zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range cell_groups Character factor assigning group cell, order cellNames(fragments) cell_weights Numeric vector assigning weight factors (e.g. inverse total reads) cell, order cellNames(fragments) flank Number flanking basepairs include either side motif normalization_width Number basepairs upstream + downstream extremes use calculating enrichment","code":""},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get footprints around a set of genomic coordinates — footprint","text":"tibble::tibble() columns group, position, count, enrichment","code":""},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get footprints around a set of genomic coordinates — footprint","text":"","code":"## Prep data frags <- get_demo_frags() ## Motif positions taken from taking a subset of GATA1 motifs ## positions in peaks using motifmatchr ## See basic tutorial for description of generating ## positions motif_positions <- tibble::tibble( chr = rep(\"chr4\", 3), start = c(338237, 498344, 499851), end = c(338247, 498354, 499861), strand = c(\"-\", \"+\", \"+\"), score = c(8.1422, 8.1415, 9.59462) ) ## Run footprinting footprint(frags, motif_positions) #> # A tibble: 251 × 4 #> group position count enrichment #> #> 1 all -125 0 0 #> 2 all -124 1 2 #> 3 all -123 0 0 #> 4 all -122 0 0 #> 5 all -121 2 4 #> 6 all -120 0 0 #> 7 all -119 1 2 #> 8 all -118 0 0 #> 9 all -117 0 0 #> 10 all -116 1 2 #> # ℹ 241 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert between BPCells fragments and R objects. 
— convert_to_fragments","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"BPCells fragments can interconverted GRanges data.frame R objects. main conversion method R's builtin () function, though convert_to_fragments() helper also available. R objects except GRanges, BPCells assumes 0-based, end-exclusive coordinate system. (See genomic-ranges-like reference details)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"","code":"# Convert from R to BPCells convert_to_fragments(x, zero_based_coords = !is(x, \"GRanges\")) as(x, \"IterableFragments\") # Convert from BPCells to R as.data.frame(bpcells_fragments) as(bpcells_fragments, \"data.frame\") as(bpcells_fragments, \"GRanges\")"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"x Fragment coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position cell_id: cell barcodes unique identifiers string factor zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert between BPCells fragments and R objects. 
— convert_to_fragments","text":"convert_to_fragments(): IterableFragments object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"","code":"frags_table <- tibble::tibble( chr = paste0(\"chr\", 1:10), start = 0, end = 5, cell_id = \"cell1\" ) frags_table #> # A tibble: 10 × 4 #> chr start end cell_id #> #> 1 chr1 0 5 cell1 #> 2 chr2 0 5 cell1 #> 3 chr3 0 5 cell1 #> 4 chr4 0 5 cell1 #> 5 chr5 0 5 cell1 #> 6 chr6 0 5 cell1 #> 7 chr7 0 5 cell1 #> 8 chr8 0 5 cell1 #> 9 chr9 0 5 cell1 #> 10 chr10 0 5 cell1 frags_granges <- GenomicRanges::makeGRangesFromDataFrame( frags_table, keep.extra.columns = TRUE ) frags_granges #> GRanges object with 10 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 0-5 * | cell1 #> [2] chr2 0-5 * | cell1 #> [3] chr3 0-5 * | cell1 #> [4] chr4 0-5 * | cell1 #> [5] chr5 0-5 * | cell1 #> [6] chr6 0-5 * | cell1 #> [7] chr7 0-5 * | cell1 #> [8] chr8 0-5 * | cell1 #> [9] chr9 0-5 * | cell1 #> [10] chr10 0-5 * | cell1 #> ------- #> seqinfo: 10 sequences from an unspecified genome; no seqlengths ####################################################################### ## convert_to_fragments() example ####################################################################### frags <- convert_to_fragments(frags_granges) frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 10 chromosomes with names chr1, chr2 ... chr10 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ####################################################################### ## as(x, \"IterableFragments\") example ####################################################################### frags <- as(frags_table, \"IterableFragments\") frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 10 chromosomes with names chr1, chr10 ... chr9 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory ####################################################################### ## as(bpcells_fragments, \"data.frame\") example ####################################################################### frags_table <- as(frags, \"data.frame\") frags_table #> chr start end cell_id #> 1 chr1 0 5 cell1 #> 2 chr10 0 5 cell1 #> 3 chr2 0 5 cell1 #> 4 chr3 0 5 cell1 #> 5 chr4 0 5 cell1 #> 6 chr5 0 5 cell1 #> 7 chr6 0 5 cell1 #> 8 chr7 0 5 cell1 #> 9 chr8 0 5 cell1 #> 10 chr9 0 5 cell1 ####################################################################### ## as(bpcells_fragments, \"GRanges\") example ####################################################################### frags_granges <- as(frags, \"GRanges\") frags_granges #> GRanges object with 10 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 1-5 * | cell1 #> [2] chr10 1-5 * | cell1 #> [3] chr2 1-5 * | cell1 #> [4] chr3 1-5 * | cell1 #> [5] chr4 1-5 * | cell1 #> [6] chr5 1-5 * | cell1 #> [7] chr6 1-5 * | cell1 #> [8] chr7 1-5 * | cell1 #> [9] chr8 1-5 * | cell1 #> [10] chr9 1-5 * | cell1 #> ------- #> seqinfo: 10 sequences from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write BPCells fragment objects — write_fragments_memory","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"BPCells fragments can read/written compressed (bitpacked) 
uncompressed form variety storage locations: memory (R object), hdf5 file, directory disk (containing binary files).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"","code":"write_fragments_memory(fragments, compress = TRUE) write_fragments_dir( fragments, dir, compress = TRUE, buffer_size = 1024L, overwrite = FALSE ) open_fragments_dir(dir, buffer_size = 1024L) write_fragments_hdf5( fragments, path, group = \"fragments\", compress = TRUE, buffer_size = 8192L, chunk_size = 1024L, overwrite = FALSE, gzip_level = 0L ) open_fragments_hdf5(path, group = \"fragments\", buffer_size = 16384L)"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"fragments Input fragments object compress Whether compress data. compression, storage size half size gzip-compressed 10x fragments file. dir Directory read/write data buffer_size performance tuning . number items bufferred memory calling writes disk. overwrite TRUE, write temp dir overwrite existing data. Alternatively, pass temp path string customize temp dir location. path Path hdf5 file disk group group within hdf5 file write data . writing existing hdf5 file group must already use chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression). recommended compression compatibility outside programs required. 
Otherwise, using compress=TRUE recommended >10x faster often similar compression levels.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"Fragment object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"Saving directory disk good default local analysis, provides best /O performance lowest memory usage. HDF5 format allows saving within existing hdf5 files group data together, memory format provides fastest performance event memory usage unimportant.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"","code":"## Create temporary directory to keep demo fragments data_dir <- file.path(tempdir(), \"frags\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) ## Get demo frags loaded from disk frags <- get_demo_frags() frags #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. 
Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered_subsetted ####################################################################### ## write_fragments_memory() example ####################################################################### frags_memory <- write_fragments_memory(frags) frags_memory #> IterableFragments object of class \"PackedMemFragments\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from memory ####################################################################### ## write_fragments_dir() example ####################################################################### frags <- write_fragments_dir( frags_memory, file.path(data_dir, \"demo_frags\"), overwrite = TRUE ) frags #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from directory /tmp/RtmpCiGY9C/frags/demo_frags ####################################################################### ## open_fragments_dir() example ####################################################################### frags <- open_fragments_dir(file.path(data_dir, \"demo_frags\")) frags #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. 
Read compressed fragments from directory /tmp/RtmpCiGY9C/frags/demo_frags ####################################################################### ## write_fragments_hdf5() example ####################################################################### frags_hdf5 <- write_fragments_hdf5( frags, file.path(data_dir, \"demo_frags.h5\"), overwrite = TRUE ) frags_hdf5 #> IterableFragments object of class \"FragmentsHDF5\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from /tmp/RtmpCiGY9C/frags/demo_frags.h5, group fragments ####################################################################### ## open_fragments_hdf5() example ####################################################################### frags_hdf5 <- open_fragments_hdf5(file.path(data_dir, \"demo_frags.h5\")) frags_hdf5 #> IterableFragments object of class \"FragmentsHDF5\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. 
Read compressed fragments from /tmp/RtmpCiGY9C/frags/demo_frags.h5, group fragments"},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if two fragments objects are identical — fragments_identical","title":"Check if two fragments objects are identical — fragments_identical","text":"Check two fragments objects identical","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if two fragments objects are identical — fragments_identical","text":"","code":"fragments_identical(fragments1, fragments2)"},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if two fragments objects are identical — fragments_identical","text":"fragments1 First IterableFragments compare fragments2 Second IterableFragments compare","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if two fragments objects are identical — fragments_identical","text":"boolean whether fragments objects identical","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if two fragments objects are identical — fragments_identical","text":"","code":"## Prep data frags_1 <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags_1 #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory frags_2_identical <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags_3_different <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(4,2))) ) %>% convert_to_fragments() ## In the case of mismatching cell ids fragments_identical(frags_1, frags_3_different) #> [1] FALSE ## In the case of two identical frag objects fragments_identical(frags_1, frags_2_identical) #> [1] TRUE"},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":null,"dir":"Reference","previous_headings":"","what":"Gene Symbol Mapping data — human_gene_mapping","title":"Gene Symbol Mapping data — human_gene_mapping","text":"Mapping canonical gene symbols corresponding unambiguous alias, previous symbol, ensembl ID, entrez ID.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gene Symbol Mapping data — human_gene_mapping","text":"","code":"human_gene_mapping mouse_gene_mapping"},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Gene Symbol Mapping data — human_gene_mapping","text":"human_gene_mapping named character vector. Names aliases IDs values corresponding canonical gene symbol mouse_gene_mapping named character vector. 
Names aliases IDs values corresponding canonical gene symbol","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Gene Symbol Mapping data — human_gene_mapping","text":"human_gene_mapping http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/non_alt_loci_set.txt mouse_gene_mapping http://www.informatics.jax.org/downloads/reports/MGI_EntrezGene.rpt http://www.informatics.jax.org/downloads/reports/MRK_ENSEMBL.rpt","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Gene Symbol Mapping data — human_gene_mapping","text":"See source code data-raw/human_gene_mapping.R data-raw/mouse_gene_mapping.R exactly mappings made.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gene Symbol Mapping data — human_gene_mapping","text":"","code":"####################################################################### ## human_gene_mapping head(human_gene_mapping) #> 0808y08y 1 1-8D 1-8U 1-Cys 1/2-SBSRNA4 #> \"NFYC-AS1\" \"A1BG\" \"IFITM2\" \"IFITM3\" \"PRDX6\" \"SEC24B-AS1\" ####################################################################### ####################################################################### ## mouse_gene_mapping head(mouse_gene_mapping) #> (ACTbEGFP)10sb (CAM)alpha1B-AR #> \"Tg(CAG-EGFP)1Osb\" \"Tg(CAMalpha1b)7Wjk\" #> (CaMKII)Cre2834 (G2019S) LRRK2 #> \"Tg(Camk2a-cre)2834Lusc\" \"Tg(PDGFB-LRRK2*G2019S)32Hlw\" #> (G93A)Tg+ (H163R) PS-1 YAC #> \"Tg(SOD1*G93A)1Gur\" \"Tg(PSEN1H163R)G9Btla\""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":null,"dir":"Reference","previous_headings":"","what":"Gene symbol matching — match_gene_symbol","title":"Gene symbol matching — 
match_gene_symbol","text":"Correct alias gene symbols, Ensembl IDs, Entrez IDs canonical gene symbols. useful matching gene names different datasets might always use gene naming conventions.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gene symbol matching — match_gene_symbol","text":"","code":"match_gene_symbol(query, subject, gene_mapping = human_gene_mapping) canonical_gene_symbol(query, gene_mapping = human_gene_mapping)"},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Gene symbol matching — match_gene_symbol","text":"query Character vector gene symbols IDs subject Vector gene symbols IDs index gene_mapping Named vector names gene symbols IDs values canonical gene symbols","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Gene symbol matching — match_gene_symbol","text":"match_gene_symbol Integer vector indices v subject[v] corresponds gene symbols query canonical_gene_symbol Character vector canonical gene symbols symbol query","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gene symbol matching — match_gene_symbol","text":"","code":"####################################################################### ## match_gene_symbol() example ####################################################################### match_gene_symbol( c(\"CD8\", \"CD4\", \"CD45\"), c(\"ENSG00000081237.19\", \"ENSG00000153563.15\", \"ENSG00000010610.9\", \"ENSG00000288825\") ) #> [1] 2 3 1 ####################################################################### ## canonical_gene_symbol() example 
####################################################################### canonical_gene_symbol(c(\"CD45\", \"CD8\", \"CD4\")) #> CD45 CD8 CD4 #> \"PTPRC\" \"CD8A\" \"CD4\""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":null,"dir":"Reference","previous_headings":"","what":"Find gene region — gene_region","title":"Find gene region — gene_region","text":"Conveniently look region gene gene symbol. value returned function can used region argument trackplot functions trackplot_coverage() trackplot_gene()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find gene region — gene_region","text":"","code":"gene_region( genes, gene_symbol, extend_bp = c(10000, 10000), gene_mapping = human_gene_mapping )"},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find gene region — gene_region","text":"genes Transcipt features given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand gene_name: Symbol gene ID gene_symbol Name gene symbol ID extend_bp Bases extend region upstream downstream gene. length 1, extension symmetric. length 2, provide upstream extension downstream extension positive distances. 
gene_mapping Named vector names gene symbols IDs values canonical gene symbols","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find gene region — gene_region","text":"List chr, start, end positions use trackplot functions.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find gene region — gene_region","text":"","code":"## Prep data genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) ## Get gene region gene_region(genes, \"CD19\", extend_bp = 1e5) #> $chr #> [1] \"chr16\" #> #> $start #> [1] 28831970 #> #> $end #> [1] 29039342 #>"},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"ArchR-style gene activity scores based weighted sum tile according signed distance tile gene body. 
function calculates signed distances according ArchR's default parameters.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"","code":"gene_score_tiles_archr( genes, chromosome_sizes = NULL, tile_width = 500, addArchRBug = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"genes Gene coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand chromosome_sizes (optional) Size chromosomes genomic-ranges object tile_width Size tiles consider addArchRBug Replicate ArchR bug handling nested genes","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"Tibble one range per tile, additional metadata columns gene_idx (row index gene tile corresponds ) distance. Distance signed distance calculated tile smaller start coordinate gene gene + strand, distance negative. 
distance adjacent non-overlapping regions 1bp, counting .","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"ArchR's tile distance algorithm works follows Genes extended 5kb upstream Genes linked tiles 1kb-100kb upstream + downstream, tiles beyond neighboring gene considered","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"","code":"## Prep data directory <- file.path(tempdir(), \"references\") genes <- read_gencode_genes( directory, release = \"42\", annotation_set = \"basic\", ) ## Get gene scores by tile gene_score_tiles_archr( genes ) #> # A tibble: 6,900,314 × 5 #> chr start end gene_idx distance #> #> 1 chr1 0 500 1 -6369 #> 2 chr1 500 1000 1 -5869 #> 3 chr1 1000 1500 1 -5369 #> 4 chr1 1500 2000 1 -4869 #> 5 chr1 2000 2500 1 -4369 #> 6 chr1 2500 3000 1 -3869 #> 7 chr1 3000 3500 1 -3369 #> 8 chr1 3500 4000 1 -2869 #> 9 chr1 4000 4500 1 -2369 #> 10 chr1 4500 5000 1 -1869 #> # ℹ 6,900,304 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate GeneActivityScores — gene_score_weights_archr","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"Gene activity scores can calculated distance-weighted sum per-tile accessibility. tile weights gene can represented sparse matrix dimension genes x tiles. multiply weight matrix corresponding tile matrix (tiles x cells), can get gene activity score matrix genes x cells. 
gene_score_weights_archr() calculates weight matrix (best pre-computed tile matrix), gene_score_archr() provides easy--use wrapper.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"","code":"gene_score_weights_archr( genes, chromosome_sizes, blacklist = NULL, tile_width = 500, gene_name_column = \"gene_id\", addArchRBug = FALSE ) gene_score_archr( fragments, genes, chromosome_sizes, blacklist = NULL, tile_width = 500, gene_name_column = \"gene_id\", addArchRBug = FALSE, tile_max_count = 4, scale_factor = 10000, tile_matrix_path = tempfile(pattern = \"gene_score_tile_mat\") )"},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"genes Gene coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand chromosome_sizes Chromosome start end coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position See read_ucsc_chrom_sizes(). blacklist Regions exclude calculations, given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position tile_width Size tiles consider gene_name_column NULL, column name genes use row names addArchRBug Replicate ArchR bug handling nested genes fragments Input fragments object tile_max_count Maximum value tile counts matrix. null, tile counts higher clipped tile_max_count. 
Equivalent ceiling argument ArchR::addGeneScoreMatrix() scale_factor null, counts cell scaled sum scale_factor. Equivalent scaleTo argument ArchR::addGeneScoreMatrix() tile_matrix_path Path directory intermediate tile matrix saved","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"gene_score_weights_archr Weight matrix dimension genes x tiles gene_score_archr Gene score matrix dimension genes x cells.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"gene_score_weights_archr: Given set tile coordinates distances returned gene_score_tiles_archr(), calculate weight matrix dimensions genes x tiles. matrix can multiplied tile matrix obtain ArchR-compatible gene activity scores.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"","code":"## Prep data reference_dir <- file.path(tempdir(), \"references\") frags <- get_demo_frags() genes <- read_gencode_genes( reference_dir, release=\"42\", annotation_set = \"basic\", ) %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) blacklist <- read_encode_blacklist(reference_dir, genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) chrom_sizes$tile_width = 500 ####################################################################### ## gene_score_weights_archr() example ####################################################################### ## Get gene score weight matrix 
(genes x tiles) gene_score_weights <- gene_score_weights_archr( genes, chrom_sizes, blacklist ) ## Get tile matrix (tiles x cells) tiles <- tile_matrix(frags, chrom_sizes, mode = \"fragments\") ## Get gene scores per cell gene_score_weights %*% tiles #> 3849 x 2600 IterableMatrix object with class MatrixMultiply #> #> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * ConvertMatrixType (2600x650604) ####################################################################### ## gene_score_archr() example ####################################################################### ## This is a wrapper that creates both the gene score weight ## matrix and tile matrix together gene_score_archr(frags, genes, chrom_sizes, blacklist) #> 3849 x 2600 IterableMatrix object with class TransformScaleShift #> #> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * TransformMin (2600x650604) #> 2. Scale columns by 0.917, 0.495 ... 8.53"},{"path":"https://bnprks.github.io/BPCells/reference/genomic-ranges-like.html","id":null,"dir":"Reference","previous_headings":"","what":"Genomic range formats — genomic-ranges-like","title":"Genomic range formats — genomic-ranges-like","text":"BPCells accepts flexible set genomic ranges-like objects input, either GRanges, data.frame, lists, character vectors. objects must specify chromosome, start, end coordinates along optional metadata range. 
exception GenomicRanges::GRanges objects, BPCells assumes objects use zero-based, end-exclusive coordinate system (see details).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/genomic-ranges-like.html","id":"valid-range-like-objects","dir":"Reference","previous_headings":"","what":"Valid Range-like objects","title":"Genomic range formats — genomic-ranges-like","text":"BPCells can interpret following types ranges: list(), data.frame(), columns: chr: Character factor chromosome names start: Start coordinates (0-based) end: End coordinates (exclusive) (optional) strand: \"+\"/\"-\" TRUE/FALSE pos/neg strand (optional) Additional metadata named list entries data.frame columns GenomicRanges::GRanges start(x) interpreted 1-based start coordinate end(x) interpreted inclusive end coordinate strand(x): \"*\" entries interpeted postive strand (optional) mcols(x) holds additional metadata character Given format \"chr1:1000-2000\" \"chr1:1,000-2,000\" Uses 0-based, end-exclusive coordinate system used ranges additional metadata required","code":""},{"path":"https://bnprks.github.io/BPCells/reference/genomic-ranges-like.html","id":"range-coordinate-systems","dir":"Reference","previous_headings":"","what":"Range coordinate systems","title":"Genomic range formats — genomic-ranges-like","text":"two main conventions coordinate systems: One-based, end-inclusive ranges first base chromosome numbered 1 last base range equal end coordinate e.g. 1-5 describes first 5 bases chromosome Used formats SAM, GTF BPCells, used reading writing GenomicRanges::GRanges objects Zero-based, end-exclusive ranges first base chromosome numbered 0 last base range one less end coordinate e.g. 
0-5 describes first 5 bases chromosome Used formats BAM, BED BPCells, used range objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":null,"dir":"Reference","previous_headings":"","what":"Import MatrixMarket files — import_matrix_market","title":"Import MatrixMarket files — import_matrix_market","text":"Read sparse matrix MatrixMarket file. text-based format used 10x, Parse, others store sparse matrices. Format details NIST website.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Import MatrixMarket files — import_matrix_market","text":"","code":"import_matrix_market( mtx_path, outdir = tempfile(\"matrix_market\"), row_names = NULL, col_names = NULL, row_major = FALSE, tmpdir = tempdir(), load_bytes = 4194304L, sort_bytes = 1073741824L ) import_matrix_market_10x( mtx_dir, outdir = tempfile(\"matrix_market\"), feature_type = NULL, row_major = FALSE, tmpdir = tempdir(), load_bytes = 4194304L, sort_bytes = 1073741824L )"},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Import MatrixMarket files — import_matrix_market","text":"mtx_path Path mtx mtx.gz file outdir Directory store output row_names Character vector row names col_names Character vector col names row_major true, store matrix row-major orientation tmpdir Temporary directory use intermediate storage load_bytes minimum contiguous load size merge sort passes sort_bytes amount memory allocate re-sorting chunks entries mtx_dir Directory holding matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz feature_type String vector feature types include. 
(cellranger 3.0 newer)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Import MatrixMarket files — import_matrix_market","text":"MatrixDir object imported matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Import MatrixMarket files — import_matrix_market","text":"Import MatrixMarket mtx files BPCells format. implementation ensures fixed memory usage even large inputs -disk sorts. much slower hdf5 inputs, use MatrixMarket format absolutely necessary. rough speed estimate, importing 17GB Parse 1M PBMC DGE_1M_PBMC.mtx file takes 4 minutes 1.3GB RAM, producing compressed output matrix 1.5GB. mtx.gz files slower import due gzip decompression. importing 10x mtx files, row column names can read automatically using import_matrix_market_10x() convenience function.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_adjacency_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if an input is a graph adjacency matrix. — is_adjacency_matrix","title":"Check if an input is a graph adjacency matrix. — is_adjacency_matrix","text":"Clustering functions like cluster_graph_leiden() cluster_graph_louvain() require graph adjacency matrix input. assume square dgCMatrix graph adjacency matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_adjacency_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if an input is a graph adjacency matrix. — is_adjacency_matrix","text":"","code":"is_adjacency_matrix(mat)"},{"path":"https://bnprks.github.io/BPCells/reference/is_adjacency_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if an input is a graph adjacency matrix. 
— is_adjacency_matrix","text":"TRUE mat graph adjacency matrix, FALSE otherwise","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_knn_object.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if an input is a kNN object — is_knn_object","title":"Check if an input is a kNN object — is_knn_object","text":"knn object functions knn_hnsw() knn_annoy() return list two matrices, idx dist. used inputs create graph adjacency matrices clustering. Assume list least idx dist items kNN object.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_knn_object.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if an input is a kNN object — is_knn_object","text":"","code":"is_knn_object(mat)"},{"path":"https://bnprks.github.io/BPCells/reference/is_knn_object.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if an input is a kNN object — is_knn_object","text":"TRUE mat knn object, FALSE otherwise","code":""},{"path":"https://bnprks.github.io/BPCells/reference/iterate_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a wrapped pointer to the iterable matrix — iterate_matrix","title":"Get a wrapped pointer to the iterable matrix — iterate_matrix","text":"Get wrapped pointer iterable matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/iterate_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a wrapped pointer to the iterable matrix — iterate_matrix","text":"","code":"iterate_matrix(x)"},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a knn object from reduced dimensions — knn_hnsw","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"Search approximate nearest neighbors cells reduced dimensions (e.g. PCA), return k nearest neighbors (knn) cell. 
Optionally, can find neighbors two separate sets cells utilizing data query.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"","code":"knn_hnsw( data, query = NULL, k = 10, metric = c(\"euclidean\", \"cosine\"), verbose = TRUE, threads = 1, ef = 100 ) knn_annoy( data, query = NULL, k = 10, metric = c(\"euclidean\", \"cosine\", \"manhattan\", \"hamming\"), n_trees = 50, search_k = -1 )"},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"data cell x dims matrix reference dataset query cell x dims matrix query dataset (optional) k number neighbors calculate metric distance metric use verbose whether print progress information search threads Number threads use. Note result non-deterministic threads > 1 ef ef parameter RcppHNSW::hnsw_search(). Increase slower search improved accuracy n_trees Number trees index build time. trees gives higher accuracy search_k Number nodes inspect query, -1 default value. Higher number gives higher accuracy","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"Named list two matrices dimension (cells x k): idx: Neighbor indices, idx[c, n] index nth nearest neighbor cell c. dist: Neighbor distances, dist[c, n] distance cell c nth nearest neighbor. query given, nearest neighbors found mapping data matrix , likely including self-neighbors (.e. 
idx[c,1] == c cells).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"knn_hnsw: Use RcppHNSW knn engine knn_annoy: Use RcppAnnoy knn engine","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":null,"dir":"Reference","previous_headings":"","what":"K Nearest Neighbor (KNN) Graph — knn_to_graph","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"Convert KNN object (e.g. returned knn_hnsw() knn_annoy()) graph. graph represented sparse adjacency matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"","code":"knn_to_graph(knn, use_weights = FALSE, self_loops = TRUE) knn_to_snn_graph( knn, min_val = 1/15, self_loops = FALSE, return_type = c(\"matrix\", \"list\") ) knn_to_geodesic_graph(knn, return_type = c(\"matrix\", \"list\"), threads = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"knn List 2 matrices – idx cell x K neighbor indices, dist cell x K neighbor distances use_weights boolean whether replace distance weights 1 self_loops Whether allow self-loops output graph min_val minimum jaccard index neighbors. 
Values round 0 return_type Whether return sparse adjacency matrix edge list threads Number threads use calculations","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"knn_to_graph Sparse matrix (dgCMatrix) mat[,j] = distance cell cell j, 0 cell j K nearest neighbors knn_to_snn_graph return_type == \"matrix\": Sparse matrix (dgCMatrix) mat[,j] = jaccard index overlap nearest neighbors cell cell j, 0 jaccard index < min_val. lower triangle filled , compatible BPCells clustering methods return_type == \"list\": List 3 equal-length vectors , j, weight, along integer dim. correspond rows, cols, values non-zero entries lower triangle adjacency matrix. dim total number vertices (cells) graph knn_to_geodesic_graph return_type == \"matrix\": Sparse matrix (dgCMatrix) mat[,j] = normalized similarity cell cell j. lower triangle filled , compatible BPCells clustering methods return_type == \"list\": List 3 equal-length vectors , j, weight, along integer dim. correspond rows, cols, values non-zero entries lower triangle adjacency matrix. dim total number vertices (cells) graph","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"knn_to_graph Create knn graph knn_to_snn_graph Convert knn object shared nearest neighbors adjacency matrix. follows algorithm Seurat uses compute SNN graphs knn_to_geodesic_graph Convert knn object undirected weighted graph, using geodesic distance estimation method UMAP package. matches output umap._umap.fuzzy_simplicial_set umap-learn python package, used default scanpy.pp.neighbors. re-weights symmetrizes KNN graph, usually use less memory return sparser graph knn_to_snn_graph computes 2nd-order neighbors. 
Note: cells listed nearest neighbor, results may differ slightly umap._umap.fuzzy_simplicial_set, assumes self always successfully found approximate nearest neighbor search.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/linear_operator.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct a LinearOperator object — linear_operator","title":"Construct a LinearOperator object — linear_operator","text":"Constructs C++ matrix object save pointer use repeated matrix-vector products bit experimental still internal use","code":""},{"path":"https://bnprks.github.io/BPCells/reference/linear_operator.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct a LinearOperator object — linear_operator","text":"","code":"linear_operator(mat)"},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":null,"dir":"Reference","previous_headings":"","what":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","text":"Test MACS executable valid. macs_executable NULL, function try auto-detect MACS PATH, preference MACS3 MACS2. macs_executable provided, function check MACS can called.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. 
If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","text":"","code":"macs_path_is_valid(macs_executable)"},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","text":"macs_executable (string) Path either MACS2/3 executable. Default (NULL) autodetect PATH.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. 
— macs_path_is_valid","text":"MACS executable path.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":null,"dir":"Reference","previous_headings":"","what":"Test for marker features — marker_features","title":"Test for marker features — marker_features","text":"Given features x cells matrix, perform one-vs-differential tests find markers.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test for marker features — marker_features","text":"","code":"marker_features(mat, groups, method = \"wilcoxon\")"},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test for marker features — marker_features","text":"mat IterableMatrix object dimensions features x cells groups Character/factor vector cell groups/clusters. Length #cells method Test method use. Current options : wilcoxon: Wilcoxon rank-sum test .k.Mann-Whitney U test","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test for marker features — marker_features","text":"tibble following columns: foreground: Group ID used foreground background: Group ID used background (NA comparing rest cells) feature: ID feature p_val_raw: Unadjusted p-value differential test foreground_mean: Average value foreground group background_mean: Average value background group","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Test for marker features — marker_features","text":"Tips using values function: Use dplyr::mutate() add columns e.g. adjusted p-value log fold change. 
Use dplyr::filter() get differential genes given threshold get adjusted p-values, use R p.adjust(), recommended method \"BH\" get log2 fold change: input matrix already log-transformed, calculate (foreground_mean - background_mean)/log(2). input matrix log-transformed, calculate log2(foreground_mean/background_mean)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Test for marker features — marker_features","text":"","code":"mat <- get_demo_mat() groups <- sample(c(\"A\", \"B\", \"C\", \"D\"), ncol(mat), replace = TRUE) marker_feats <- marker_features(mat, groups) #> Warning: marker features calculation requires row-major storage #> • Consider using transpose_storage_order() if running marker_features repeatedly #> This message is displayed once every 8 hours. #> Writing transposed storage order to /tmp/RtmpCiGY9C/transpose3d2cda46aa001f ## to see the results of one specific group vs all other groups marker_feats %>% dplyr::filter(foreground == \"A\") #> # A tibble: 3,582 × 6 #> foreground background feature p_val_raw foreground_mean background_mean #> #> 1 A NA ENSG00000272… 0.130 0.0275 0.0427 #> 2 A NA ENSG00000250… 0.886 0.136 0.143 #> 3 A NA ENSG00000275… 0.412 0 0.00103 #> 4 A NA ENSG00000186… 1 0 0 #> 5 A NA ENSG00000286… 0.389 0.0107 0.00771 #> 6 A NA ENSG00000131… 0.347 0.113 0.131 #> 7 A NA ENSG00000281… 0.657 0.0183 0.0211 #> 8 A NA ENSG00000272… 1 0 0 #> 9 A NA ENSG00000182… 0.148 0.359 0.304 #> 10 A NA ENSG00000174… 0.832 0.111 0.111 #> # ℹ 3,572 more rows ## get only differential genes given a threshold value marker_feats %>% dplyr::filter(p_val_raw < 0.05) #> # A tibble: 473 × 6 #> foreground background feature p_val_raw foreground_mean background_mean #> #> 1 B NA ENSG00000178… 0.0360 0.0180 0.00931 #> 2 A NA ENSG00000163… 0.0436 0.0748 0.102 #> 3 C NA ENSG00000159… 0.0380 0.205 0.145 #> 4 A NA ENSG00000125… 0.0429 0.00763 
0.0175 #> 5 B NA ENSG00000159… 0.0238 0.0616 0.0982 #> 6 D NA ENSG00000159… 0.0484 0.120 0.0787 #> 7 B NA ENSG00000248… 0.0160 0.00300 0 #> 8 C NA ENSG00000173… 0.00666 0 0.0143 #> 9 B NA ENSG00000013… 0.00435 0.168 0.113 #> 10 A NA ENSG00000246… 0.0221 0.0260 0.0123 #> # ℹ 463 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/mask_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","title":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","text":"Mask matrix entries zero Set matrix entries zero given mask matrix dimensions. Normally, non-zero values mask set matrix entry zero. inverted, zero values mask matrix set matrix entry zero.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mask_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","text":"","code":"mask_matrix(mat, mask, invert = FALSE)"},{"path":"https://bnprks.github.io/BPCells/reference/mask_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. 
If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","text":"mat Data matrix (IterableMatrix) mask Mask matrix (IterableMatrix dgCMatrix)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":null,"dir":"Reference","previous_headings":"","what":"Broadcasting vector arithmetic — add_rows","title":"Broadcasting vector arithmetic — add_rows","text":"Convenience functions adding multiplying row / column matrix number.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Broadcasting vector arithmetic — add_rows","text":"","code":"add_rows(mat, vec) add_cols(mat, vec) multiply_rows(mat, vec) multiply_cols(mat, vec)"},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Broadcasting vector arithmetic — add_rows","text":"mat Matrix-like object vec Numeric vector","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Broadcasting vector arithmetic — add_rows","text":"Matrix-like object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Broadcasting vector arithmetic — add_rows","text":"","code":"set.seed(12345) mat <- matrix(rpois(40, lambda = 5), nrow = 4) rownames(mat) <- paste0(\"gene\", 1:4) mat <- mat %>% as(\"dgCMatrix\") mat #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 5 6 6 4 5 6 3 3 8 #> gene2 8 3 11 . 4 4 4 5 6 8 #> gene3 6 4 1 4 3 9 6 7 4 6 #> gene4 8 5 3 5 9 6 5 . 
4 3 mat <- mat %>% as(\"IterableMatrix\") ####################################################################### ## add_rows() example ####################################################################### add_rows(mat, 1:4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 7 6 7 7 5 6 7 4 4 9 #> gene2 10 5 13 2 6 6 6 7 8 10 #> gene3 9 7 4 7 6 12 9 10 7 9 #> gene4 12 9 7 9 13 10 9 4 8 7 ####################################################################### ## add_cols() example ####################################################################### add_cols(mat, 1:10) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 7 7 9 10 9 11 13 11 12 18 #> gene2 9 5 14 4 9 10 11 13 15 18 #> gene3 7 6 4 8 8 15 13 15 13 16 #> gene4 9 7 6 9 14 12 12 8 13 13 ####################################################################### ## multiply_rows() example ####################################################################### multiply_rows(mat, 1:4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 5 6 6 4 5 6 3 3 8 #> gene2 16 6 22 . 8 8 8 10 12 16 #> gene3 18 12 3 12 9 27 18 21 12 18 #> gene4 32 20 12 20 36 24 20 . 16 12 ####################################################################### ## multiply_cols() example ####################################################################### multiply_cols(mat, 1:10) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 10 18 24 20 30 42 24 27 80 #> gene2 8 6 33 . 20 24 28 40 54 80 #> gene3 6 8 3 16 15 54 42 56 36 60 #> gene4 8 10 9 20 45 36 35 . 36 30"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_R_conversion.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert between BPCells matrix and R objects. — matrix_R_conversion","title":"Convert between BPCells matrix and R objects. 
— matrix_R_conversion","text":"BPCells matrices can interconverted Matrix package dgCMatrix sparse matrices, well base R dense matrices (though may result high memory usage large matrices)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_R_conversion.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert between BPCells matrix and R objects. — matrix_R_conversion","text":"","code":"# Convert to R from BPCells as(bpcells_mat, \"dgCMatrix\") # Sparse matrix conversion as.matrix(bpcells_mat) # Dense matrix conversion # Convert to BPCells from R as(dgc_mat, \"IterableMatrix\")"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_R_conversion.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert between BPCells matrix and R objects. — matrix_R_conversion","text":"","code":"mat <- get_demo_mat()[1:2, 1:2] mat #> 2 x 2 IterableMatrix object with class MatrixSubset #> #> Row names: ENSG00000272602, ENSG00000250312 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /home/imman/.local/share/R/BPCells/demo_data/demo_mat_filtered_subsetted #> 2. Select rows: 1, 2 and cols: 1, 2 ####################################################################### ## as(bpcells_mat, \"dgCMatrix\") example ####################################################################### mat_dgc <- as(mat, \"dgCMatrix\") mat_dgc #> 2 x 2 sparse Matrix of class \"dgCMatrix\" #> TTTAGCAAGGTAGCTT-1 AGCCGGTTCCGGAACC-1 #> ENSG00000272602 1 . #> ENSG00000250312 . . ## as.matrix(bpcells_mat) example as.matrix(mat) #> Warning: Converting to a dense matrix may use excessive memory #> This message is displayed once every 8 hours. 
#> TTTAGCAAGGTAGCTT-1 AGCCGGTTCCGGAACC-1 #> ENSG00000272602 1 0 #> ENSG00000250312 0 0 ## Alternatively, can also use function as() as(mat, \"matrix\") #> TTTAGCAAGGTAGCTT-1 AGCCGGTTCCGGAACC-1 #> ENSG00000272602 1 0 #> ENSG00000250312 0 0 ####################################################################### ## as(dgc_mat, \"IterableMatrix\") example ####################################################################### as(mat_dgc, \"IterableMatrix\") #> 2 x 2 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: ENSG00000272602, ENSG00000250312 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_inputs.html","id":null,"dir":"Reference","previous_headings":"","what":"Return a list of input matrices to the current matrix (experimental) — matrix_inputs","title":"Return a list of input matrices to the current matrix (experimental) — matrix_inputs","text":"File objects 0 inputs. transforms 1 input. transforms (e.g. matrix multiplication matrix concatenation) can multiple used primarily know safe clear dimnames intermediate transformed matrices. C++ relies base matrices (non-transform) dimnames, R relies outermost matrix (transform) dimnames.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_inputs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return a list of input matrices to the current matrix (experimental) — matrix_inputs","text":"","code":"matrix_inputs(x)"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write sparse matrices — write_matrix_memory","title":"Read/write sparse matrices — write_matrix_memory","text":"BPCells matrices stored sparse format, meaning non-zero entries stored. 
Matrices can store integer counts data decimal numbers (float double). See details information.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write sparse matrices — write_matrix_memory","text":"","code":"write_matrix_memory(mat, compress = TRUE) write_matrix_dir( mat, dir, compress = TRUE, buffer_size = 8192L, overwrite = FALSE ) open_matrix_dir(dir, buffer_size = 8192L) write_matrix_hdf5( mat, path, group, compress = TRUE, buffer_size = 8192L, chunk_size = 1024L, overwrite = FALSE, gzip_level = 0L ) open_matrix_hdf5(path, group, buffer_size = 16384L)"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write sparse matrices — write_matrix_memory","text":"compress Whether compress data. dir Directory save data buffer_size performance tuning . number items buffered memory calling writes disk. overwrite TRUE, write temp dir overwrite existing data. Alternatively, pass temp path string customize temp dir location. path Path hdf5 file disk group group within hdf5 file write data . writing existing hdf5 file group must already use chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression). recommended compression compatibility outside programs required. Otherwise, using compress=TRUE recommended >10x faster often similar compression levels. 
matrix Input matrix, either IterableMatrix dgCMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write sparse matrices — write_matrix_memory","text":"BPCells matrix object","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"storage-locations","dir":"Reference","previous_headings":"","what":"Storage locations","title":"Read/write sparse matrices — write_matrix_memory","text":"Matrices can stored directory disk, memory, HDF5 file. Saving directory disk good default local analysis, provides best /O performance lowest memory usage. HDF5 format allows saving within existing hdf5 files group data together, memory format provides fastest performance event memory usage unimportant.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"bitpacking-compression","dir":"Reference","previous_headings":"","what":"Bitpacking Compression","title":"Read/write sparse matrices — write_matrix_memory","text":"typical RNA counts matrices holding integer counts, bitpacking compression result 6-8x less space R dgCMatrix, 4-6x smaller scipy csc_matrix. compression effective count values matrix small, rows matrix sorted rowMeans. tests RNA-seq data optimal ordering save 40% storage space. non-integer data row indices compressed, values space savings smaller. non-integer data matrices, bitpacking compression much less effective, can applied indexes entry values. 
still space savings, far less counts matrices.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write sparse matrices — write_matrix_memory","text":"","code":"## Create temporary directory to keep demo matrix data_dir <- file.path(tempdir(), \"mat\") if (dir.exists(data_dir)) unlink(data_dir, recursive = TRUE) dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) mat <- get_demo_mat() mat #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /home/imman/.local/share/R/BPCells/demo_data/demo_mat_filtered_subsetted ####################################################################### ## write_matrix_memory() example ####################################################################### mat_memory <- write_matrix_memory(mat) mat_memory #> 3582 x 2600 IterableMatrix object with class PackedMatrixMem_uint32_t #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from memory ####################################################################### ## write_matrix_dir() example ####################################################################### mat %>% write_matrix_dir( file.path(data_dir, \"demo_mat\"), overwrite = TRUE ) #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... 
TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /tmp/RtmpCiGY9C/mat/demo_mat ####################################################################### ## open_matrix_dir() example ####################################################################### mat <- open_matrix_dir( file.path(data_dir, \"demo_mat\") ) mat #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /tmp/RtmpCiGY9C/mat/demo_mat ####################################################################### ## write_matrix_hdf5() example ####################################################################### mat %>% write_matrix_hdf5(path = file.path(data_dir, \"demo_mat.h5\"), group = \"mat\") #> 3582 x 2600 IterableMatrix object with class MatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix in hdf5 file /tmp/RtmpCiGY9C/mat/demo_mat.h5, group mat ####################################################################### ## open_matrix_hdf5() example ####################################################################### mat_hdf5 <- open_matrix_hdf5( file.path(data_dir, \"demo_mat.h5\"), group = 'mat' ) mat_hdf5 #> 3582 x 2600 IterableMatrix object with class MatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 
Load compressed matrix in hdf5 file /tmp/RtmpCiGY9C/mat/demo_mat.h5, group mat"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate matrix stats — matrix_stats","title":"Calculate matrix stats — matrix_stats","text":"Calculate matrix stats","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate matrix stats — matrix_stats","text":"","code":"matrix_stats( matrix, row_stats = c(\"none\", \"nonzero\", \"mean\", \"variance\"), col_stats = c(\"none\", \"nonzero\", \"mean\", \"variance\"), threads = 0L )"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate matrix stats — matrix_stats","text":"matrix Input matrix object row_stats row statistics compute col_stats col statistics compute threads Number threads use execution","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate matrix stats — matrix_stats","text":"List row_stats: matrix n_stats x n_rows, col_stats: matrix n_stats x n_cols","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate matrix stats — matrix_stats","text":"statistics calculated single pass matrix, method desirable use efficiency purposes compared standard rowMeans colMeans multiple statistics needed. stats ordered complexity: nonzero, mean, variance. less complex stats calculated process calculating complicated stat. 
calculate mean variance simultaneously, just ask variance, compute mean nonzero counts side-effect","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate matrix stats — matrix_stats","text":"","code":"mat <- matrix(rpois(100, lambda = 5), nrow = 10) rownames(mat) <- paste0(\"gene\", 1:10) colnames(mat) <- paste0(\"cell\", 1:10) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") ## By default, no row or column stats are calculated res_none <- matrix_stats(mat) res_none #> $row_stats #> gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10 #> #> $col_stats #> cell1 cell2 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 #> ## Request row variance (automatically computes mean and nonzero too) res_row_var <- matrix_stats(mat, row_stats = \"variance\") res_row_var #> $row_stats #> gene1 gene2 gene3 gene4 gene5 gene6 gene7 #> nonzero 10.000000 10.000000 10.00000 10.000000 10.000000 10.000000 10.000000 #> mean 6.000000 5.200000 5.40000 4.800000 5.700000 5.800000 7.000000 #> variance 5.555556 1.733333 10.93333 3.288889 6.677778 3.511111 5.555556 #> gene8 gene9 gene10 #> nonzero 10.000000 10.000000 10.000000 #> mean 4.200000 3.500000 4.800000 #> variance 3.288889 3.388889 5.288889 #> #> $col_stats #> cell1 cell2 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 #> ## Request row variance and column mean res_both_var <- matrix_stats( matrix = mat, row_stats = \"variance\", col_stats = \"mean\" ) res_both_var #> $row_stats #> gene1 gene2 gene3 gene4 gene5 gene6 gene7 #> nonzero 10.000000 10.000000 10.00000 10.000000 10.000000 10.000000 10.000000 #> mean 6.000000 5.200000 5.40000 4.800000 5.700000 5.800000 7.000000 #> variance 5.555556 1.733333 10.93333 3.288889 6.677778 3.511111 5.555556 #> gene8 gene9 gene10 #> nonzero 10.000000 10.000000 10.000000 #> mean 4.200000 3.500000 4.800000 #> variance 3.288889 3.388889 5.288889 #> #> 
$col_stats #> cell1 cell2 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 #> nonzero 10.0 10.0 10.0 10 10.0 10.0 10.0 10.0 10.0 10.0 #> mean 4.5 4.9 6.5 5 4.3 5.1 5.8 5.4 5.4 5.5 #>"},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":null,"dir":"Reference","previous_headings":"","what":"Merge cells into pseudobulks — merge_cells","title":"Merge cells into pseudobulks — merge_cells","text":"Peak tile matrix calculations can sped reducing number cells. cases outputs going added together afterwards, can provide performance improvement","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merge cells into pseudobulks — merge_cells","text":"","code":"merge_cells(fragments, cell_groups)"},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merge cells into pseudobulks — merge_cells","text":"fragments Input fragments object cell_groups Character factor vector providing group cell. Ordering cellNames(fragments)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merge cells into pseudobulks — merge_cells","text":"","code":"frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ## Pseudobulk into two groups merge_cells(frags, as.factor(c(rep(1,3), rep(2,3)))) #> IterableFragments object of class \"CellMerge\" #> #> Cells: 2 cells with names 1, 2 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Merge 6 cells into 2 groups"},{"path":"https://bnprks.github.io/BPCells/reference/merge_dimnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function for rbind/cbind merging dimnames — merge_dimnames","title":"Helper function for rbind/cbind merging dimnames — merge_dimnames","text":"Helper function rbind/cbind merging dimnames","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_dimnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function for rbind/cbind merging dimnames — merge_dimnames","text":"","code":"merge_dimnames(x, y, warning_prefix, dim_type)"},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":null,"dir":"Reference","previous_headings":"","what":"Merge peaks — merge_peaks_iterative","title":"Merge peaks — merge_peaks_iterative","text":"Merge peaks according ArchR's iterative merging algorithm. details ArchR website","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merge peaks — merge_peaks_iterative","text":"","code":"merge_peaks_iterative(peaks)"},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merge peaks — merge_peaks_iterative","text":"peaks Peaks given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. 
Required attributes: chr, start, end: genomic position Must ordered priority columns chr, start, end.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Merge peaks — merge_peaks_iterative","text":"tibble::tibble() nonoverlapping subset rows peaks. metadata columns preserved","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Merge peaks — merge_peaks_iterative","text":"Properties merged peaks: peaks merged set overlap Peaks prioritized according order original input output peaks subset input peaks, peak boundaries changed","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merge peaks — merge_peaks_iterative","text":"","code":"## Create example peaks peaks <- tibble::tibble( chr = \"chr1\", start = as.integer(1:10), end = start + 2L ) peaks #> # A tibble: 10 × 3 #> chr start end #> #> 1 chr1 1 3 #> 2 chr1 2 4 #> 3 chr1 3 5 #> 4 chr1 4 6 #> 5 chr1 5 7 #> 6 chr1 6 8 #> 7 chr1 7 9 #> 8 chr1 8 10 #> 9 chr1 9 11 #> 10 chr1 10 12 ## Merge peaks merge_peaks_iterative(peaks) #> # A tibble: 5 × 3 #> chr start end #> #> 1 chr1 1 3 #> 2 chr1 3 5 #> 3 chr1 5 7 #> 4 chr1 7 9 #> 5 chr1 9 11"},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":null,"dir":"Reference","previous_headings":"","what":"Elementwise minimum — min_scalar","title":"Elementwise minimum — min_scalar","text":"min_scalar: Take minimum global constant min_by_row: Take minimum per-row constant min_by_col: Take minimum per-col constant","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Elementwise minimum — 
min_scalar","text":"","code":"min_scalar(mat, val) min_by_row(mat, vals) min_by_col(mat, vals)"},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Elementwise minimum — min_scalar","text":"mat IterableMatrix val Single positive numeric value","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Elementwise minimum — min_scalar","text":"IterableMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Elementwise minimum — min_scalar","text":"Take minimum value matrix per-row, per-col, global constant. constant must >0 preserve sparsity matrix. effect capping maximum value matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Elementwise minimum — min_scalar","text":"","code":"set.seed(12345) mat <- matrix(rpois(40, lambda = 5), nrow = 4) rownames(mat) <- paste0(\"gene\", 1:4) mat <- mat %>% as(\"dgCMatrix\") mat #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 5 6 6 4 5 6 3 3 8 #> gene2 8 3 11 . 4 4 4 5 6 8 #> gene3 6 4 1 4 3 9 6 7 4 6 #> gene4 8 5 3 5 9 6 5 . 4 3 mat <- mat %>% as(\"IterableMatrix\") ####################################################################### ## min_scalar() example ####################################################################### min_scalar(mat, 4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 4 4 4 4 4 4 4 3 3 4 #> gene2 4 3 4 . 4 4 4 4 4 4 #> gene3 4 4 1 4 3 4 4 4 4 4 #> gene4 4 4 3 4 4 4 4 . 
4 3 ####################################################################### ## min_by_row() example ####################################################################### min_by_row(mat, 1:4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 1 1 1 1 1 1 1 1 1 1 #> gene2 2 2 2 . 2 2 2 2 2 2 #> gene3 3 3 1 3 3 3 3 3 3 3 #> gene4 4 4 3 4 4 4 4 . 4 3 ####################################################################### ## min_by_col() example ####################################################################### min_by_col(mat, 1:10) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 1 2 3 4 4 5 6 3 3 8 #> gene2 1 2 3 . 4 4 4 5 6 8 #> gene3 1 2 1 4 3 6 6 7 4 6 #> gene4 1 2 3 4 5 6 5 . 4 3"},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Normalize an object representing genomic ranges — normalize_ranges","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"Normalize object representing genomic ranges","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"","code":"normalize_ranges( ranges, metadata_cols = character(0), zero_based_coords = !is(ranges, \"GRanges\"), n = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"ranges Genomic regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. 
Required attributes: chr, start, end: genomic position metadata_cols Optional list metadata columns require & extract zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"data frame zero-based coordinates, elements chr (factor), start (int), end (int). ranges chr level information, chr levels sorted unique values chr. strand metadata_cols, output strand element TRUE positive strand, FALSE negative strand. (Converted character vector \"+\"/\"-\" necessary)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"","code":"## Prep data ranges <- GenomicRanges::GRanges( seqnames = S4Vectors::Rle(c(\"chr1\", \"chr2\", \"chr3\"), c(1, 2, 2)), ranges = IRanges::IRanges(101:105, end = 111:115, names = head(letters, 5)), strand = S4Vectors::Rle(GenomicRanges::strand(c(\"-\", \"+\", \"*\")), c(1, 2, 2)), score = 1:5, GC = seq(1, 0, length=5)) ranges #> GRanges object with 5 ranges and 2 metadata columns: #> seqnames ranges strand | score GC #> | #> a chr1 101-111 - | 1 1.00 #> b chr2 102-112 + | 2 0.75 #> c chr2 103-113 + | 3 0.50 #> d chr3 104-114 * | 4 0.25 #> e chr3 105-115 * | 5 0.00 #> ------- #> seqinfo: 3 sequences from an unspecified genome; no seqlengths ## Normalize ranges normalize_ranges(ranges) #> # A tibble: 5 × 3 #> chr start end #> #> 1 chr1 100 111 #> 2 chr2 101 112 #> 3 chr2 102 113 #> 4 chr3 103 114 #> 5 chr3 104 115 ## With metadata information normalize_ranges(ranges, metadata_cols = c(\"strand\", \"score\", \"GC\")) #> # A tibble: 5 × 6 #> strand 
chr start end score GC #> #> 1 FALSE chr1 100 111 1 1 #> 2 TRUE chr2 101 112 2 0.75 #> 3 TRUE chr2 102 113 3 0.5 #> 4 TRUE chr3 103 114 4 0.25 #> 5 TRUE chr3 104 115 5 0"},{"path":"https://bnprks.github.io/BPCells/reference/normalize_unique_file_names.html","id":null,"dir":"Reference","previous_headings":"","what":"Adjust a set of (unique) potential file names to not include any invalid characters. — normalize_unique_file_names","title":"Adjust a set of (unique) potential file names to not include any invalid characters. — normalize_unique_file_names","text":"Adjust set (unique) potential file names include invalid characters.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_unique_file_names.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adjust a set of (unique) potential file names to not include any invalid characters. — normalize_unique_file_names","text":"","code":"normalize_unique_file_names(names, replacement = \"_\")"},{"path":"https://bnprks.github.io/BPCells/reference/normalized_dimnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function to set dimnames to NULL instead of 0-length character vectors — normalized_dimnames","title":"Helper function to set dimnames to NULL instead of 0-length character vectors — normalized_dimnames","text":"Helper function set dimnames NULL instead 0-length character vectors","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalized_dimnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function to set dimnames to NULL instead of 0-length character vectors — normalized_dimnames","text":"","code":"normalized_dimnames(row_names, col_names)"},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":null,"dir":"Reference","previous_headings":"","what":"Count fragments by nucleosomal size — nucleosome_counts","title":"Count fragments by nucleosomal 
size — nucleosome_counts","text":"Count fragments nucleosomal size","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"","code":"nucleosome_counts(fragments, nucleosome_width = 147)"},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"fragments Fragments object nucleosome_width Integer cutoff use nucleosome width","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"List names subNucleosomal, monoNucleosomal, multiNucleosomal, nFrags, containing count vectors fragments class per cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"Shorter nucleosome_width subNucleosomal, nucleosome_width 2*nucleosome_width-1 monoNucleosomal, anything longer multiNucleosomal. 
sum fragments given nFrags","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"","code":"## Prep data frags_sub_nucleosomal <- tibble::tibble( chr = 1, start = seq(0, 3000, by = 1000), end = start + 146, cell_id = c(rep(\"cell1\", 3), rep(\"cell2\", 1)) ) frags_sub_nucleosomal #> # A tibble: 4 × 4 #> chr start end cell_id #> #> 1 1 0 146 cell1 #> 2 1 1000 1146 cell1 #> 3 1 2000 2146 cell1 #> 4 1 3000 3146 cell2 frags_nucleosomal <- tibble::tibble( chr = 1, start = seq(5000, 7000, by = 1000), end = start + 147, # Value equal to nucleosome_width is inclusive cell_id = c(rep(\"cell1\", 1), rep(\"cell2\", 2)) ) frags_nucleosomal #> # A tibble: 3 × 4 #> chr start end cell_id #> #> 1 1 5000 5147 cell1 #> 2 1 6000 6147 cell2 #> 3 1 7000 7147 cell2 frags_multi_nucleosomal <- tibble::tibble( chr = 1, start = seq(12000, 15000, by = 1000), end = start + 294, # Value equal to 2x nucleosome_width cell_id = c(rep(\"cell1\", 2), rep(\"cell2\", 2)) ) frags_multi_nucleosomal #> # A tibble: 4 × 4 #> chr start end cell_id #> #> 1 1 12000 12294 cell1 #> 2 1 13000 13294 cell1 #> 3 1 14000 14294 cell2 #> 4 1 15000 15294 cell2 frags <- dplyr::bind_rows( frags_sub_nucleosomal, frags_nucleosomal, frags_multi_nucleosomal ) %>% convert_to_fragments() ## Get nucleosome counts head(nucleosome_counts(frags)) #> $subNucleosomal #> [1] 3 1 #> #> $monoNucleosomal #> [1] 1 2 #> #> $multiNucleosomal #> [1] 2 2 #> #> $nFrags #> [1] 6 5 #>"},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write a 10x fragments file — open_fragments_10x","title":"Read/write a 10x fragments file — open_fragments_10x","text":"10x fragment files come bed-like format, columns chr, start, end, cell_id, pcr_duplicates. 
Unlike standard bed format, format cellranger inclusive end-coordinate, meaning end coordinate counted tagmentation site, rather offset 1.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write a 10x fragments file — open_fragments_10x","text":"","code":"open_fragments_10x(path, comment = \"#\", end_inclusive = TRUE) write_fragments_10x( fragments, path, end_inclusive = TRUE, append_5th_column = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write a 10x fragments file — open_fragments_10x","text":"path File path (e.g. fragments.tsv fragments.tsv.gz) comment Skip lines beginning file start comment string end_inclusive Whether end coordinate bed inclusive – .e. insertion end coordinate rather base end coordinate. 10x default, though quite standard bed file format. 
fragments Input fragments object append_5th_column Whether include 5th column 0 compatibility 10x fragment file outputs (defaults 4 columns chr,start,end,cell)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write a 10x fragments file — open_fragments_10x","text":"10x fragments file object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write a 10x fragments file — open_fragments_10x","text":"open_fragments_10x disk operations take place fragments used function write_fragments_10x Fragments written disk immediately, returned readable object.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write a 10x fragments file — open_fragments_10x","text":"","code":"## Download example fragments from pbmc 500 dataset and save in temp directory data_dir <- file.path(tempdir(), \"frags_10x\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) url_base <- \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/\" frags_file <- \"atac_pbmc_500_nextgem_fragments.tsv.gz\" atac_raw_url <- paste0(url_base, frags_file) if (!file.exists(file.path(data_dir, frags_file))) { download.file(atac_raw_url, file.path(data_dir, frags_file), mode=\"wb\") } ####################################################################### ## open_fragments_10x() example ####################################################################### frags <- open_fragments_10x( file.path(data_dir, frags_file) ) ## A Fragments object imported from 10x will not have cell/chromosome ## information directly known unless written as a BPCells fragment object frags #> IterableFragments object of class 
\"ShiftFragments\" #> #> Cells: count unknown #> Chromosomes: count unknown #> #> Queued Operations: #> 1. Load 10x fragments file from /tmp/RtmpCiGY9C/frags_10x/atac_pbmc_500_nextgem_fragments.tsv.gz #> 2. Shift start +0bp, end +1bp frags %>% write_fragments_dir( file.path(data_dir, \"demo_frags_from_h5\"), overwrite = TRUE ) #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 219780 cells with names CCACGTTCAGTAACTC-1, GCGAGAAGTCCACCAG-1 ... AAACGAAGTTCAGAAA-1 #> Chromosomes: 39 chromosomes with names chr1, chr10 ... KI270713.1 #> #> Queued Operations: #> 1. Read compressed fragments from directory /tmp/RtmpCiGY9C/frags_10x/demo_frags_from_h5 ####################################################################### ## write_fragments_10x() example ####################################################################### frags <- write_fragments_10x( frags, file.path(data_dir, paste0(\"new_\", frags_file)) ) frags #> IterableFragments object of class \"ShiftFragments\" #> #> Cells: count unknown #> Chromosomes: count unknown #> #> Queued Operations: #> 1. Load 10x fragments file from /tmp/RtmpCiGY9C/frags_10x/new_atac_pbmc_500_nextgem_fragments.tsv.gz #> 2. 
Shift start +0bp, end +1bp"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"Read/write 10x feature matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"","code":"open_matrix_10x_hdf5(path, feature_type = NULL, buffer_size = 16384L) write_matrix_10x_hdf5( mat, path, barcodes = colnames(mat), feature_ids = rownames(mat), feature_names = rownames(mat), feature_types = \"Gene Expression\", feature_metadata = list(), buffer_size = 16384L, chunk_size = 1024L, gzip_level = 0L, type = c(\"uint32_t\", \"double\", \"float\", \"auto\") )"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"path Path hdf5 file disk feature_type Optional selection feature types include output matrix. multiome data, options \"Gene Expression\" \"Peaks\". option compatible files cellranger 3.0 newer. buffer_size performance tuning . number items buffered memory calling writes disk. mat IterableMatrix barcodes Vector names cells feature_ids Vector IDs features feature_names Vector names features feature_types String vector feature types feature_metadata Named list additional metadata vectors store feature chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression) type Data type output matrix. Default uint32_t match matrix 10x UMI counts. Non-integer data types include float double. 
auto, use data type mat.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"BPCells matrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"10x format makes use gzip compression matrix data, can slow read performance. Consider writing another format read performance important . Input matrices must column-major storage order, rownames colnames set, names must provided relevant metadata parameters. metadata parameters read default BPCells, possible export use tools.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"","code":"## Download example matrices from pbmc 500 dataset and save in temp directory data_dir <- file.path(tempdir(), \"mat_10x\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) url_base <- \"https://cf.10xgenomics.com/samples/cell-exp/6.1.0/500_PBMC_3p_LT_Chromium_X/\" mat_file <- \"500_PBMC_3p_LT_Chromium_X_filtered_feature_bc_matrix.h5\" rna_url <- paste0(url_base, mat_file) if (!file.exists(file.path(data_dir, mat_file))) { download.file(rna_url, file.path(data_dir, mat_file), mode=\"wb\") } ####################################################################### ## open_matrix_10x_hdf5() example ####################################################################### mat <- open_matrix_10x_hdf5( file.path(data_dir, mat_file) ) mat #> 36601 x 587 IterableMatrix object with class 10xMatrixH5 #> #> Row names: ENSG00000243485, ENSG00000237613 ... 
ENSG00000277196 #> Col names: AATCACGAGCATTGAA-1, AATCACGCAAGCCATT-1 ... TTTGTTGTCTCTAGGA-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 10x HDF5 feature matrix in file /tmp/RtmpCiGY9C/mat_10x/500_PBMC_3p_LT_Chromium_X_filtered_feature_bc_matrix.h5 ####################################################################### ## write_matrix_10x_hdf5() example ####################################################################### mat <- write_matrix_10x_hdf5( mat, file.path(data_dir, paste0(\"new\", mat_file)) ) mat #> 36601 x 587 IterableMatrix object with class 10xMatrixH5 #> #> Row names: ENSG00000243485, ENSG00000237613 ... ENSG00000277196 #> Col names: AATCACGAGCATTGAA-1, AATCACGCAAGCCATT-1 ... TTTGTTGTCTCTAGGA-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 10x HDF5 feature matrix in file /tmp/RtmpCiGY9C/mat_10x/new500_PBMC_3p_LT_Chromium_X_filtered_feature_bc_matrix.h5"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write AnnData matrix — open_matrix_anndata_hdf5","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"Read write matrix anndata hdf5 file. functions automatically transpose matrices converting /AnnData format. AnnData convention stores cells rows, whereas R convention stores cells columns. behavior undesired, call t() manually matrix inputs outputs functions. 
users writing AnnData files default write_matrix_anndata_hdf5() rather dense variant (see details information).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"","code":"open_matrix_anndata_hdf5(path, group = \"X\", buffer_size = 16384L) write_matrix_anndata_hdf5( mat, path, group = \"X\", buffer_size = 16384L, chunk_size = 1024L, gzip_level = 0L ) write_matrix_anndata_hdf5_dense( mat, path, dataset = \"X\", buffer_size = 16384L, chunk_size = 1024L, gzip_level = 0L )"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"path Path hdf5 file disk group group within hdf5 file write data . writing existing hdf5 file group must already use buffer_size performance tuning . number items buffered memory calling writes disk. chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression) dataset dataset within hdf5 file write matrix . Used write_matrix_anndata_hdf5_dense","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"AnnDataMatrixH5 object, cells columns.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"Efficiency considerations: Reading dense AnnData matrix generally slower sparse single cell datasets, recommended re-write dense AnnData inputs sparse format early processing. 
write_matrix_anndata_hdf5() used default, always writes efficient sparse format. write_matrix_anndata_hdf5_dense() writes AnnData dense format, can used smaller matrices efficiency file size less concern increased portability (e.g. writing obsm varm matrices). See AnnData docs format details. Dimension names: Dimnames inferred obs/_index var/_index based length matching. helps infer dimnames obsp, varm, etc. number len(obs) == len(var), dimname inference disabled.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"","code":"## Create temporary directory to keep demo matrix data_dir <- file.path(tempdir(), \"mat_anndata\") if (dir.exists(data_dir)) unlink(data_dir, recursive = TRUE) dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) mat <- get_demo_mat() ####################################################################### ## write_matrix_anndata_hdf5() example ####################################################################### mat <- write_matrix_anndata_hdf5( mat, file.path(data_dir, paste0(\"new_demo_mat.h5\")) ) mat #> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 
AnnData HDF5 matrix in file /tmp/RtmpCiGY9C/mat_anndata/new_demo_mat.h5, group X ####################################################################### ## open_matrix_anndata_hdf5() example ####################################################################### mat <- open_matrix_anndata_hdf5( file.path(data_dir, paste0(\"new_demo_mat.h5\")) ) mat #> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. AnnData HDF5 matrix in file /tmp/RtmpCiGY9C/mat_anndata/new_demo_mat.h5, group X ####################################################################### ## write_matrix_anndata_hdf5_dense() example ####################################################################### mat <- write_matrix_anndata_hdf5_dense( mat, file.path(data_dir, paste0(\"new_demo_mat_dense.h5\")) ) mat #> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 
AnnData HDF5 matrix in file /tmp/RtmpCiGY9C/mat_anndata/new_demo_mat_dense.h5, group X"},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Get end-sorted ordering for genome ranges — order_ranges","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"Use function order regions prior calling peak_matrix() tile_matrix().","code":""},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"","code":"order_ranges(ranges, chr_levels, sort_by_end = TRUE)"},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"ranges Genomic regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position chr_levels Ordering chromosome names sort_by_end TRUE (default), sort (chr, end, start). Else sort (chr, start, end)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"Numeric vector analogous order function. 
Provides index selection reorder input ranges sorted chr, end, start","code":""},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"","code":"## Prep data ranges <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(310, 0, -60), cell_id = paste0(\"cell1\") ) %>% as(\"GRanges\") ranges #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 10-320 * | cell1 #> [2] chr1 60-310 * | cell1 #> [3] chr1 110-300 * | cell1 #> [4] chr1 160-290 * | cell1 #> [5] chr1 210-280 * | cell1 #> [6] chr1 260-270 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths ## Get end-sorted ordering order_ranges(ranges, levels(GenomicRanges::seqnames(ranges))) #> [1] 6 5 4 3 2 1"},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":null,"dir":"Reference","previous_headings":"","what":"Color palettes — discrete_palette","title":"Color palettes — discrete_palette","text":"color palettes derived ArchR color palettes, provide large sets distinguishable colors","code":""},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Color palettes — discrete_palette","text":"","code":"discrete_palette(name, n = 1) continuous_palette(name)"},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Color palettes — discrete_palette","text":"name Name color palette. Valid discrete palettes : stallion, calm, kelly, bear, ironMan, circus, paired, grove, summerNight, captain. 
Valid continuous palettes bluePurpleDark n Minimum number colors needed","code":""},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Color palettes — discrete_palette","text":"Character vector hex color codes","code":""},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Color palettes — discrete_palette","text":"requested number colors large, new palette constructed via interpolation requested palette","code":""},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Prepare a matrix for multi-threaded operation — parallel_split","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"Transforms matrix matrix_stats matrix multiplies vector/dense matrix evaluated parallel. speeds specific operations, reading writing matrix general. 
parallelism guaranteed work additional operations applied parallel split.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"","code":"parallel_split(mat, threads, chunks = threads)"},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"mat IterableMatrix threads Number execution threads chunks Number chunks use (>= threads)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"IterableMatrix perform certain operations parallel","code":""},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":null,"dir":"Reference","previous_headings":"","what":"Create partial function calls — partial_apply","title":"Create partial function calls — partial_apply","text":"Specify arguments function.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create partial function calls — partial_apply","text":"","code":"partial_apply(f, ..., .overwrite = TRUE, .missing_args_error = TRUE)"},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create partial function calls — partial_apply","text":"f function ... 
Named arguments f .overwrite (bool) f already output partial_apply(), whether parameter re-definitions ignored overwrite existing definitions .missing_args_error (bool) TRUE, passing arguments function's signature raise error, otherwise ignored","code":""},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create partial function calls — partial_apply","text":"bpcells_partial object (function extra attributes)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate ranges x cells overlap matrix — peak_matrix","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"Calculate ranges x cells overlap matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"","code":"peak_matrix( fragments, ranges, mode = c(\"insertions\", \"fragments\", \"overlaps\"), zero_based_coords = !is(ranges, \"GRanges\"), explicit_peak_names = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"fragments Input fragments object. Must cell names chromosome names defined ranges Peaks/ranges overlap, given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position mode Mode counting peak overlaps. (See \"value\" section details) zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. 
Defaults true GRanges false formats (see archived UCSC blogpost) explicit_peak_names Boolean whether add rownames output matrix format e.g chr1:500-1000, start end coords given 0-based coordinate system. Note either way, peak names written matrix saved.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"Iterable matrix object dimension ranges x cells. saved, column names output matrix format chr1:500-1000, start end coords given 0-based coordinate system. mode options \"insertions\": Start end coordinates separately overlapped peak \"fragments\": Like \"insertions\", fragment can contribute 1 count peak, even start end coordinates overlap \"overlaps\": Like \"fragments\", overlap also counted fragment fully spans peak even neither start end falls within peak","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"calculating matrix directly fragments tsv, necessary first call select_chromosomes() order provide ordering chromosomes expect reading tsv.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"","code":"## Prep demo data frags <- get_demo_frags(subset = FALSE) chrom_sizes <- read_ucsc_chrom_sizes(file.path(tempdir(), \"references\"), genome=\"hg38\") blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE) peaks <- call_peaks_tile( frags_filter_blacklist, chrom_sizes, effective_genome_size = 2.8e9 ) top_peaks <- head(peaks, 5000) 
top_peaks <- top_peaks[order_ranges(top_peaks, chrNames(frags)),] ## Get peak matrix peak_matrix(frags_filter_blacklist, top_peaks, mode=\"insertions\") #> 5000 x 2600 IterableMatrix object with class PeakMatrix #> #> Row names: chr1:959200-959400, chr1:1019400-1019600 ... chrX:154379066-154379266 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: row major #> #> Queued Operations: #> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered #> 2. Subset to fragments not overlapping 636 ranges: chr10:1-45700 ... chrY:26637301-57227400 #> 3. Calculate 2600 peaks over 5000 ranges: chr1:959201-959400 ... chrX:154379067-154379266"},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":null,"dir":"Reference","previous_headings":"","what":"Dotplot — plot_dot","title":"Dotplot — plot_dot","text":"Plot feature levels per group cluster grid dots. Dots colored z-score normalized average expression, sized percent non-zero.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Dotplot — plot_dot","text":"","code":"plot_dot( source, features, groups, group_order = NULL, gene_mapping = human_gene_mapping, colors = c(\"lightgrey\", \"#4682B4\"), return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Dotplot — plot_dot","text":"source Feature x cell matrix data.frame features. best results, features sparse log-normalized (e.g. run log1p() zero raw counts map zero) features Character vector features plot groups Vector one entry per cell, specifying cell's group group_order Optional vector listing ordering groups gene_mapping optional vector gene name matching match_gene_symbol(). 
colors Color scale plot return_data true, return data just plotting rather plot. apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Dotplot — plot_dot","text":"","code":"## Prep data mat <- get_demo_mat() cell_types <- paste(\"Group\", rep(1:3, length.out = length(colnames(mat)))) ## Plot dot plot <- plot_dot(mat, c(\"MS4A1\", \"CD3E\"), cell_types) BPCells:::render_plot_from_storage( plot, width = 4, height = 5 )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot UMAP or embeddings — plot_embedding","title":"Plot UMAP or embeddings — plot_embedding","text":"Plot one features coloring cells UMAP plot.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot UMAP or embeddings — plot_embedding","text":"","code":"plot_embedding( source, embedding, features = NULL, quantile_range = c(0.01, 0.99), randomize_order = TRUE, smooth = NULL, smooth_rounds = 3, gene_mapping = human_gene_mapping, size = NULL, rasterize = FALSE, raster_pixels = 512, legend_continuous = c(\"auto\", \"quantile\", \"value\"), labels_quantile_range = TRUE, colors_continuous = c(\"lightgrey\", \"#4682B4\"), legend_discrete = TRUE, labels_discrete = TRUE, colors_discrete = discrete_palette(\"stallion\"), return_data = FALSE, return_plot_list = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot UMAP or embeddings — plot_embedding","text":"source Matrix, data frame pull features , vector feature values single feature. matrix, features must rows. 
embedding matrix dimensions cells x 2 embedding coordinates features Character vector features plot source vector. quantile_range (optional) Length 2 vector giving quantiles clip minimum maximum color scale values, fractions 0 1. NULL NA values skip clipping randomize_order TRUE, shuffle cells prevent overplotting biases. Can pass integer instead specify random seed use. smooth (optional) Sparse matrix dimensions cells x cells cell-cell distance weights smoothing. smooth_rounds Number multiplication rounds apply smoothing. gene_mapping optional vector gene name matching match_gene_symbol(). Ignored source data frame. size Point size plotting rasterize Whether rasterize point drawing speed display graphics programs. raster_pixels Number pixels use rasterizing. Can provide one number square dimensions, two numbers width x height. legend_continuous Whether label continuous features quantile value. \"auto\" labels quantile features continuous quantile_range NULL. Quantile labeling adds text annotation listing range displayed values. labels_quantile_range Whether add text label value range feature legend set quantile colors_continuous Vector colors use continuous color palette legend_discrete Whether show legend discrete (categorical) features. labels_discrete Whether add text labels center group discrete (categorical) features. colors_discrete Vector colors use discrete (categorical) features. return_data true, return data just plotting rather plot. return_plot_list TRUE, return multiple plots list, rather single plot combined using patchwork::wrap_plots() apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot UMAP or embeddings — plot_embedding","text":"default, returns ggplot2 object requested features plotted grid. 
return_data return_plot_list called, return value match argument.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"smoothing","dir":"Reference","previous_headings":"","what":"Smoothing","title":"Plot UMAP or embeddings — plot_embedding","text":"Smoothing performed follows: first, smoothing matrix normalized sum incoming weights every cell 1. , raw data values repeatedly multiplied smoothing matrix re-scaled average value stays .","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot UMAP or embeddings — plot_embedding","text":"","code":"set.seed(123) mat <- get_demo_mat() ## Normalize matrix mat_norm <- log1p(multiply_cols(mat, 1/colSums(mat)) * 10000) %>% write_matrix_memory(compress = FALSE) ## Get variable genes stats <- matrix_stats(mat, row_stats = \"variance\") variable_genes <- order(stats$row_stats[\"variance\",], decreasing=TRUE) %>% head(1000) %>% sort() # Z score normalize genes mat_norm <- mat[variable_genes, ] gene_means <- stats$row_stats['mean', variable_genes] gene_vars <- stats$row_stats['variance', variable_genes] mat_norm <- (mat_norm - gene_means) / gene_vars ## Save matrix to memory mat_norm <- mat_norm %>% write_matrix_memory(compress = FALSE) ## Run SVD svd <- BPCells::svds(mat_norm, k = 10) pca <- multiply_cols(svd$v, svd$d) ## Get UMAP umap <- uwot::umap(pca) ## Get clusters clusts <- knn_hnsw(pca, ef = 500) %>% knn_to_snn_graph() %>% cluster_graph_louvain() #> 14:58:39 Building HNSW index with metric 'euclidean' ef = 200 M = 16 using 1 threads #> 14:58:39 Finished building index #> 14:58:39 Searching HNSW index with ef = 500 and 1 threads #> 14:58:39 Finished searching ## Plot embeddings print(length(clusts)) #> [1] 2600 plot_embedding(clusts, umap) ### Can also plot by features #plot_embedding( # source = mat, # umap, # features = c(\"MS4A1\", \"CD3E\"), 
#)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":null,"dir":"Reference","previous_headings":"","what":"Fragment size distribution — plot_fragment_length","title":"Fragment size distribution — plot_fragment_length","text":"Plot distribution fragment lengths, length basepairs x-axis, proportion fragments y-axis. Typical plots show 10-basepair periodicity, well humps spaced multiples nucleosome width (150bp).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fragment size distribution — plot_fragment_length","text":"","code":"plot_fragment_length( fragments, max_length = 500, return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fragment size distribution — plot_fragment_length","text":"fragments Fragments object max_length Maximum length show plot return_data true, return data just plotting rather plot. 
apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fragment size distribution — plot_fragment_length","text":"Numeric vector index contains number length-fragments","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fragment size distribution — plot_fragment_length","text":"","code":"frags <- get_demo_frags(filter_qc = FALSE, subset = FALSE) plot_fragment_length(frags)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":null,"dir":"Reference","previous_headings":"","what":"Knee plot of single cell read counts — plot_read_count_knee","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"Plots read count rank vs. number reads log-log scale.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"","code":"plot_read_count_knee( read_counts, cutoff = NULL, return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"read_counts Vector read counts per cell cutoff (optional) Read cutoff mark plot return_data true, return data just plotting rather plot. 
apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"ggplot2 plot object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"Performs logarithmic downsampling reduce number points plotted","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"","code":"## Prep data mat <- get_demo_mat(filter_qc = FALSE, subset = FALSE) reads_per_cell <- colSums(mat) # Render knee plot plot_read_count_knee(reads_per_cell, cutoff = 1e3)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tf_footprint.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot TF footprint — plot_tf_footprint","title":"Plot TF footprint — plot_tf_footprint","text":"Plot footprinting around TF motif sites","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tf_footprint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot TF footprint — plot_tf_footprint","text":"","code":"plot_tf_footprint( fragments, motif_positions, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), flank = 250L, smooth = 0L, zero_based_coords = !is(genes, \"GRanges\"), colors = discrete_palette(\"stallion\"), return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tf_footprint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot TF 
footprint — plot_tf_footprint","text":"fragments IterableFragments object motif_positions Coordinate ranges motifs (must include strand) constant width cell_groups Character factor assigning group cell, order cellNames(fragments) flank Number flanking basepairs include either side motif smooth (optional) Sparse matrix dimensions cells x cells cell-cell distance weights smoothing. zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range return_data true, return data just plotting rather plot. apply_styling false, return plot without pretty styling applied","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot TSS profile — plot_tss_profile","title":"Plot TSS profile — plot_tss_profile","text":"Plot enrichment insertions relative transcription start sites (TSS). Typically, plot shows strong enrichment insertions near TSS, small bump downstream around 220bp downstream TSS +1 nucleosome.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot TSS profile — plot_tss_profile","text":"","code":"plot_tss_profile( fragments, genes, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), flank = 2000L, smooth = 0L, zero_based_coords = !is(genes, \"GRanges\"), colors = discrete_palette(\"stallion\"), return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot TSS profile — plot_tss_profile","text":"fragments IterableFragments object genes Coordinate ranges genes (must include strand) cell_groups Character factor assigning group cell, order cellNames(fragments) flank Number flanking basepairs include either side motif smooth 
Number bases smooth (rolling average) zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range return_data true, return data just plotting rather plot. apply_styling false, return plot without pretty styling applied","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot TSS profile — plot_tss_profile","text":"","code":"## Prep data frags <- get_demo_frags(filter_qc = FALSE, subset = FALSE) genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) ## Plot tss profile plot_tss_profile(frags, genes)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":null,"dir":"Reference","previous_headings":"","what":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"Density scatter plot log10(fragment_count) x-axis TSS enrichment y-axis. plot useful select cell barcodes experiment correspond high-quality cells","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"","code":"plot_tss_scatter( atac_qc, min_frags = NULL, min_tss = NULL, bins = 100, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"atac_qc Tibble returned qc_scATAC(). 
Must columns nFrags TSSEnrichment min_frags Minimum fragment count cutoff min_tss Minimum TSS Enrichment cutoff bins Number bins density calculation apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"","code":"## Prep data frags <- get_demo_frags(filter_qc = FALSE, subset = FALSE) genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") atac_qc <- qc_scATAC(frags, genes, blacklist) ## Render tss enrichment vs fragment plot plot_tss_scatter(atac_qc, min_frags = 1000, min_tss = 10)"},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":null,"dir":"Reference","previous_headings":"","what":"Add sample prefix to cell names — prefix_cell_names","title":"Add sample prefix to cell names — prefix_cell_names","text":"Rename cells adding prefix names. commonly sample name. cells receive exact text prefix added beginning, separator characters like \"_\" must included given prefix. Use prior merging fragments different experiments c() order help prevent cell name clashes.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add sample prefix to cell names — prefix_cell_names","text":"","code":"prefix_cell_names(fragments, prefix)"},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add sample prefix to cell names — prefix_cell_names","text":"fragments Input fragments object. 
prefix String add prefix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add sample prefix to cell names — prefix_cell_names","text":"Fragments object prefixed names","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add sample prefix to cell names — prefix_cell_names","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory ## Prefix cells with foo prefix_cell_names(frags, \"foo_\") %>% as(\"GRanges\") #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 11-40 * | foo_cell1 #> [2] chr1 61-90 * | foo_cell1 #> [3] chr1 111-140 * | foo_cell2 #> [4] chr1 161-190 * | foo_cell2 #> [5] chr1 211-240 * | foo_cell3 #> [6] chr1 261-290 * | foo_cell3 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a small demo matrix and fragment object. — prepare_demo_data","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"Downloads 10x Genomics dataset, consisting 3k cells performs optional QC subsetting. 
Holds subsetted objects disk, returns list matrix fragments.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"","code":"prepare_demo_data( directory = NULL, filter_qc = TRUE, subset = TRUE, timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"directory (character) directory input/output data stored. Downloaded intermediates stored subdir intermediates. NULL, temporary directory created. filter_qc (bool) Whether filter RNA ATAC data using QC information. subset (bool) Whether subset genes/insertions chromosome 4 11. timeout (numeric) Timeout downloading files seconds.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"(list) list RNA matrix name mat, ATAC fragments name frags.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"function downloads 10x Genomics PBMC 3k dataset. Filtering using QC information fragments matrix provides cells least 1000 reads, 1000 frags, minimum tss enrichment 10. Subsetting provides genes insertions chromosomes 4 11. name matrix fragments folders demo_mat demo_frags respectively. 
Additionally, choosing qc filter appends _filtered, choosing subset data appends _subsetted name.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"Given (features x cells) matrix, group cells cell_groups aggregate counts method feature.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"","code":"pseudobulk_matrix(mat, cell_groups, method = \"sum\", threads = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"mat IterableMatrix object dimensions features x cells cell_groups (Character/factor) Vector group/cluster assignments cell. Length must ncol(mat). method (Character vector) Method(s) aggregate counts. one method provided, output matrix. multiple methods provided, output named list matrices. Current options : nonzeros, sum, mean, variance. threads (integer) Number threads use.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"method length 1, returns matrix shape (features x groups). method greater length 1, returns list matrices matrix representing pseudobulk matrix different aggregation method. 
matrix shape (features x groups), names one nonzeros, sum, mean, variance.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"simpler stats calculated process calculating complex statistics. calculating variance, nonzeros mean can included extra calculation time, calculating mean, adding nonzeros take extra time.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"","code":"set.seed(12345) mat <- matrix(rpois(100, lambda = 5), nrow = 10) rownames(mat) <- paste0(\"gene\", 1:10) colnames(mat) <- paste0(\"cell\", 1:10) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") groups <- rep(c(\"Cluster1\", \"Cluster2\"), each = 5) ## When calculating only sum across two groups pseudobulk_res <- pseudobulk_matrix( mat = mat, cell_groups = groups, method = \"sum\" ) pseudobulk_res #> Cluster1 Cluster2 #> gene1 26 38 #> gene2 19 27 #> gene3 32 21 #> gene4 27 19 #> gene5 22 27 #> gene6 20 23 #> gene7 24 37 #> gene8 24 22 #> gene9 20 23 #> gene10 34 21 ## Can also request multiple summary statistics for pseudobulking pseudobulk_res_multi <- pseudobulk_matrix( mat = mat, cell_groups = groups, method = c(\"mean\", \"variance\") ) names(pseudobulk_res_multi) #> [1] \"mean\" \"variance\" pseudobulk_res_multi$mean #> Cluster1 Cluster2 #> gene1 5.2 7.6 #> gene2 3.8 5.4 #> gene3 6.4 4.2 #> gene4 5.4 3.8 #> gene5 4.4 5.4 #> gene6 4.0 4.6 #> gene7 4.8 7.4 #> gene8 4.8 4.4 #> gene9 4.0 4.6 #> gene10 6.8 4.2"},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate ArchR-compatible per-cell QC statistics — 
qc_scATAC","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"Calculate ArchR-compatible per-cell QC statistics","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"","code":"qc_scATAC(fragments, genes, blacklist)"},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"fragments IterableFragments object genes Gene coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position blacklist Blacklisted regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"data.frame QC data","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"implementation mimics ArchR's default parameters. uses requiring flexibility tweak default parameters, best option re-implement function required changes. Output columns data.frame: cellName: cell name cell nFrags: number fragments per cell subNucleosomal, monoNucleosomal, multiNucleosomal: number fragments size 1-146bp, 147-254bp, 255bp + respectively. 
equivalent ArchR's nMonoFrags, nDiFrags, nMultiFrags respectively TSSEnrichment: AvgInsertInTSS / max(AvgInsertFlankingTSS, 0.1), AvgInsertInTSS ReadsInTSS / 101 (window size), AvgInsertFlankingTSS ReadsFlankingTSS / (100*2) (window size). max(0.1) ensures low-read cells get assigned spuriously high TSSEnrichment. ReadsInPromoter: Number reads 2000bp upstream TSS 101bp downstream TSS ReadsInBlacklist: Number reads provided blacklist region ReadsInTSS: Number reads overlapping 101bp centered around TSS ReadsFlankingTSS: Number reads overlapping 1901-2000bp +/- TSS Differences ArchR: Note ArchR default uses different set annotations derive TSS sites promoter sites. function uses just one annotation gene start+end sites, must called twice exactly re-calculate ArchR QC stats. ArchR's PromoterRatio BlacklistRatio included output, can easily calculated ReadsInPromoter / nFrags ReadsInBlacklist / nFrags. Similarly, ArchR's NucleosomeRatio can calculated (monoNucleosomal + multiNucleosomal) / subNucleosomal.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"","code":"## Prep data frags <- get_demo_frags(subset = FALSE) reference_dir <- file.path(tempdir(), \"references\") genes <- read_gencode_transcripts( reference_dir, release=\"42\", transcript_choice=\"MANE_Select\", annotation_set = \"basic\", features=\"transcript\" ) blacklist <- read_encode_blacklist(reference_dir, genome = \"hg38\") ## Run qc head(qc_scATAC(frags, genes, blacklist)) #> # A tibble: 6 × 10 #> cellName TSSEnrichment nFrags subNucleosomal monoNucleosomal multiNucleosomal #> #> 1 TTTAGCAA… 45.1 16363 8069 5588 2706 #> 2 AGCCGGTT… 30.9 33313 15855 11868 5590 #> 3 TGATTAGT… 41.9 11908 6103 3817 1988 #> 4 ATTGACTC… 43.9 13075 6932 4141 2002 #> 5 CGTTAGGT… 31.5 14874 6833 5405 2636 #> 6 AAACCGCG… 41.9 30141 15085 10199 
4857 #> # ℹ 4 more variables: ReadsInTSS , ReadsFlankingTSS , #> # ReadsInPromoter , ReadsInBlacklist "},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":null,"dir":"Reference","previous_headings":"","what":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"Given set genomic ranges, find distance nearest neighbors upstream downstream.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"","code":"range_distance_to_nearest( ranges, addArchRBug = FALSE, zero_based_coords = !is(ranges, \"GRanges\") )"},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"ranges Genomic regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand addArchRBug boolean reproduce ArchR's bug incorrectly handles nested genes zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range","code":""},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"2-column data.frame columns upstream downstream, containing distances nearest neighbor respective directions. 
ranges + * strand, distance calculated : upstream = max(start(range) - end(upstreamNeighbor), 0) downstream = max(start(downstreamNeighbor) - end(range), 0) ranges - strand, definition upstream downstream flipped. Note definition distance one GenomicRanges::distance(), ranges neighbor overlap given distance 1 rather 0.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"","code":"## Prep data ranges <- tibble::tibble( chr = \"chr1\", start = seq(10, 410, 100), end = start + 50, strand = \"+\" ) ## Add one range that is completely nested in the other ranges ranges_with_nesting <- ranges %>% tibble::add_row(chr = \"chr1\", start = 11, end = 20, strand = \"+\") ## Get range distance to nearest range_distance_to_nearest(ranges_with_nesting) #> # A tibble: 6 × 2 #> upstream downstream #> #> 1 Inf 51 #> 2 51 51 #> 3 51 51 #> 4 51 51 #> 5 51 Inf #> 6 0 0"},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":null,"dir":"Reference","previous_headings":"","what":"Rank-transform a matrix — rank_transform","title":"Rank-transform a matrix — rank_transform","text":"Rank values within row/col matrix, output rank values new matrix. Rank values offset rank 0 value 0, ties handled averaging ranks.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rank-transform a matrix — rank_transform","text":"","code":"rank_transform(mat, axis)"},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rank-transform a matrix — rank_transform","text":"mat Data matrix (IterableMatrix) axis Axis rank values within. 
\"col\" rank values within column, \"row\" rank values within row.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Rank-transform a matrix — rank_transform","text":"Note efficient rank calculation depends storage order matrix, may necessary call transpose_storage_order()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a bed file into a data frame — read_bed","title":"Read a bed file into a data frame — read_bed","text":"Bed files can contain peak blacklist annotations. utilities help read those annotations","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a bed file into a data frame — read_bed","text":"","code":"read_bed( path, additional_columns = character(0), backup_url = NULL, timeout = 300 ) read_encode_blacklist( dir, genome = c(\"hg38\", \"mm10\", \"hg19\", \"dm6\", \"dm3\", \"ce11\", \"ce10\"), timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a bed file into a data frame — read_bed","text":"path Path file (desired save location backup_url used) additional_columns Names additional columns bed file backup_url path exist, provides URL download gtf timeout Maximum time seconds wait download backup_url dir Output directory cache downloaded gtf file genome genome name","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read a bed file into a data frame — read_bed","text":"Data frame coordinates using 0-based 
convention.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read a bed file into a data frame — read_bed","text":"read_bed Read bed file disk url. read_encode_blacklist Downloads Boyle Lab blacklist, described https://doi.org/10.1038/s41598-019-45839-z","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read a bed file into a data frame — read_bed","text":"","code":"## Dummy bed file creation data.frame( chrom = rep(\"chr1\", 6), start = seq(20, 121, 20), end = seq(39, 140, 20) ) %>% write.table(\"./references/example.bed\", row.names = FALSE, col.names = FALSE, sep = \"\\t\") ####################################################################### ## read_bed() example ####################################################################### read_bed(\"./references/example.bed\") #> # A tibble: 6 × 3 #> chr start end #> #> 1 chr1 20 39 #> 2 chr1 40 59 #> 3 chr1 60 79 #> 4 chr1 80 99 #> 5 chr1 100 119 #> 6 chr1 120 139 ####################################################################### ## read_encode_blacklist() example ####################################################################### read_encode_blacklist(\"./reference\") #> # A tibble: 636 × 4 #> chr start end reason #> #> 1 chr10 0 45700 Low Mappability #> 2 chr10 38481300 38596500 High Signal Region #> 3 chr10 38782600 38967900 High Signal Region #> 4 chr10 39901300 41712900 High Signal Region #> 5 chr10 41838900 42107300 High Signal Region #> 6 chr10 42279400 42322500 High Signal Region #> 7 chr10 126946300 126953400 Low Mappability #> 8 chr10 133625800 133797400 High Signal Region #> 9 chr11 0 194500 Low Mappability #> 10 chr11 518900 520700 Low Mappability #> # ℹ 626 more 
rows"},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":null,"dir":"Reference","previous_headings":"","what":"Read GTF gene annotations — read_gtf","title":"Read GTF gene annotations — read_gtf","text":"Read gene annotations gtf format data frame. source can URL, gtf file disk, gencode release version.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read GTF gene annotations — read_gtf","text":"","code":"read_gtf( path, attributes = c(\"gene_id\"), tags = character(0), features = c(\"gene\"), keep_attribute_column = FALSE, backup_url = NULL, timeout = 300 ) read_gencode_genes( dir, release = \"latest\", annotation_set = c(\"basic\", \"comprehensive\"), gene_type = \"lncRNA|protein_coding|IG_.*_gene|TR_.*_gene\", attributes = c(\"gene_id\", \"gene_type\", \"gene_name\"), tags = character(0), features = c(\"gene\"), timeout = 300 ) read_gencode_transcripts( dir, release = \"latest\", transcript_choice = c(\"MANE_Select\", \"Ensembl_Canonical\", \"all\"), annotation_set = c(\"basic\", \"comprehensive\"), gene_type = \"lncRNA|protein_coding|IG_.*_gene|TR_.*_gene\", attributes = c(\"gene_id\", \"gene_type\", \"gene_name\", \"transcript_id\"), features = c(\"transcript\", \"exon\"), timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read GTF gene annotations — read_gtf","text":"path Path file (desired save location backup_url used) attributes Vector GTF attribute names parse columns tags Vector tags parse boolean presence/absence features List features types keep GTF (e.g. 
gene, transcript, exon, intron) keep_attribute_column Boolean whether preserve raw attribute text column backup_url path exist, provides URL download gtf timeout Maximum time seconds wait download backup_url dir Output directory cache downloaded gtf file release release version (prefix M mouse versions). recent version, use \"latest\" \"latest_mouse\" annotation_set Either \"basic\" \"comprehensive\" annotation sets (see details section). gene_type Regular expression gene types keep. Defaults protein_coding, lncRNA, IG/TR genes transcript_choice Method selecting representative transcripts. Choices : MANE_Select: human-, conservative Ensembl_Canonical: human+mouse, superset MANE_Select human : Preserve transcript models (recommended plotting)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read GTF gene annotations — read_gtf","text":"Data frame coordinates using 0-based convention. Columns : chr source feature start end score strand frame attributes (optional; named according listed attributes) tags (named according listed tags)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read GTF gene annotations — read_gtf","text":"read_gtf Read gtf file URL read_gencode_genes Read gene annotations directly GENCODE. file name vary depending release annotation set requested, format gencode.v42.annotation.gtf.gz. GENCODE currently recommends basic set: https://www.gencodegenes.org/human/. release 42, comprehensive basic sets identical gene-level annotations, comprehensive set additional transcript variants annotated. 
read_gencode_transcripts Read transcript models GENCODE, use trackplot_gene()","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read GTF gene annotations — read_gtf","text":"","code":"####################################################################### ## read_gtf() example ####################################################################### species <- \"Saccharomyces_cerevisiae\" version <- \"GCF_000146045.2_R64\" head(read_gtf( path = sprintf(\"./reference/%s_genomic.gtf.gz\", version), backup_url = sprintf( \"https://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/%s/reference/%s/%s_genomic.gtf.gz\", species, version, version ) )) #> # A tibble: 6 × 9 #> chr source feature start end score strand frame gene_id #> #> 1 NC_001133.9 RefSeq gene 1806 2169 . - . YAL068C #> 2 NC_001133.9 RefSeq gene 2479 2707 . + . YAL067W-A #> 3 NC_001133.9 RefSeq gene 7234 9016 . - . YAL067C #> 4 NC_001133.9 RefSeq gene 11564 11951 . - . YAL065C #> 5 NC_001133.9 RefSeq gene 12045 12426 . + . YAL064W-B #> 6 NC_001133.9 RefSeq gene 13362 13743 . - . YAL064C-A ####################################################################### ## read_gencode_genes() example ####################################################################### read_gencode_genes(\"./references\", release = \"42\") #> # A tibble: 39,319 × 11 #> chr source feature start end score strand frame gene_id gene_type #> #> 1 chr1 HAVANA gene 11868 14409 . + . ENSG00000290… lncRNA #> 2 chr1 HAVANA gene 29553 31109 . + . ENSG00000243… lncRNA #> 3 chr1 HAVANA gene 34553 36081 . - . ENSG00000237… lncRNA #> 4 chr1 HAVANA gene 57597 64116 . + . ENSG00000290… lncRNA #> 5 chr1 HAVANA gene 65418 71585 . + . ENSG00000186… protein_… #> 6 chr1 HAVANA gene 89294 133723 . - . ENSG00000238… lncRNA #> 7 chr1 HAVANA gene 89550 91105 . - . ENSG00000239… lncRNA #> 8 chr1 HAVANA gene 139789 140339 . - . 
ENSG00000239… lncRNA #> 9 chr1 HAVANA gene 141473 173862 . - . ENSG00000241… lncRNA #> 10 chr1 HAVANA gene 160445 161525 . + . ENSG00000241… lncRNA #> # ℹ 39,309 more rows #> # ℹ 1 more variable: gene_name ####################################################################### ## read_gencode_transcripts() example ####################################################################### ## If read_gencode_genes() was already run on the same release, ## will reuse previously downloaded annotations read_gencode_transcripts(\"./references\", release = \"42\") #> # A tibble: 220,296 × 13 #> chr source feature start end score strand frame gene_id gene_type #> #> 1 chr1 HAVANA transcript 65418 71585 . + . ENSG00000… protein_… #> 2 chr1 HAVANA exon 65418 65433 . + . ENSG00000… protein_… #> 3 chr1 HAVANA exon 65519 65573 . + . ENSG00000… protein_… #> 4 chr1 HAVANA exon 69036 71585 . + . ENSG00000… protein_… #> 5 chr1 HAVANA transcript 450739 451678 . - . ENSG00000… protein_… #> 6 chr1 HAVANA exon 450739 451678 . - . ENSG00000… protein_… #> 7 chr1 HAVANA transcript 685715 686654 . - . ENSG00000… protein_… #> 8 chr1 HAVANA exon 685715 686654 . - . ENSG00000… protein_… #> 9 chr1 HAVANA transcript 923922 944574 . + . ENSG00000… protein_… #> 10 chr1 HAVANA exon 923922 924948 . + . ENSG00000… protein_… #> # ℹ 220,286 more rows #> # ℹ 3 more variables: gene_name , transcript_id , MANE_Select "},{"path":"https://bnprks.github.io/BPCells/reference/read_ucsc_chrom_sizes.html","id":null,"dir":"Reference","previous_headings":"","what":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","title":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","text":"Read chromosome sizes UCSC return tibble one row per chromosome. 
underlying data pulled : https://hgdownload.soe.ucsc.edu/downloads.html","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_ucsc_chrom_sizes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","text":"","code":"read_ucsc_chrom_sizes( dir, genome = c(\"hg38\", \"mm39\", \"mm10\", \"mm9\", \"hg19\"), keep_chromosomes = \"chr[0-9]+|chrX|chrY\", timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/read_ucsc_chrom_sizes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","text":"","code":"read_ucsc_chrom_sizes(\"./reference\") #> # A tibble: 24 × 3 #> chr start end #> #> 1 chr1 0 248956422 #> 2 chr2 0 242193529 #> 3 chr3 0 198295559 #> 4 chr4 0 190214555 #> 5 chr5 0 181538259 #> 6 chr6 0 170805979 #> 7 chr7 0 159345973 #> 8 chrX 0 156040895 #> 9 chr8 0 145138636 #> 10 chr9 0 138394717 #> # ℹ 14 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. 
magrittr %>% Matrix colMeans, colSums, rowMeans, rowSums","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":null,"dir":"Reference","previous_headings":"","what":"Regress out unwanted variation — regress_out","title":"Regress out unwanted variation — regress_out","text":"Regress effects confounding variables using linear least squares regression model.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Regress out unwanted variation — regress_out","text":"","code":"regress_out(mat, latent_data, prediction_axis = c(\"row\", \"col\"))"},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Regress out unwanted variation — regress_out","text":"mat Input IterableMatrix latent_data Data regress , data.frame column variable regress . prediction_axis axis corresponds prediction outputs linear models (e.g. gene axis typical single cell analysis). Options include \"row\" (default) \"col\".","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Regress out unwanted variation — regress_out","text":"IterableMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Regress out unwanted variation — regress_out","text":"Conceptually, regress_out calculates linear least squares best fit model row matrix. (column prediction_axis \"col\"). input data regression model columns latent_data, model tries predict values corresponding row (column) mat. fitting model, regress_out subtract model predictions input values, aiming retain effects explained variables latent_data. 
models can fit efficiently since share input data calculations closed-form best fit solution shared. QR factorization model matrix dense matrix-vector multiply sufficient fully calculate residual values. Efficiency considerations: output matrix dense rather sparse, mean variance calculations may run comparatively slowly. However, PCA matrix/vector multiply operations can performed nearly cost input matrix due mathematical simplifications. Memory usage scales n_features * (nrow(mat) + ncol(mat)). Generally, n_features == ncol(latent_data), categorical variables latent_data, category expanded indicator variable. Memory usage therefore higher using categorical input variables many (.e. >100) distinct values.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/render_plot_from_storage.html","id":null,"dir":"Reference","previous_headings":"","what":"Render a plot with intermediate disk storage step — render_plot_from_storage","title":"Render a plot with intermediate disk storage step — render_plot_from_storage","text":"Take plotting object save temp storage, can outputted exact dimensions. 
Primarily used allow adjusting plot dimensions within function reference examples.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/render_plot_from_storage.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Render a plot with intermediate disk storage step — render_plot_from_storage","text":"","code":"render_plot_from_storage(plot, width, height)"},{"path":"https://bnprks.github.io/BPCells/reference/render_plot_from_storage.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Render a plot with intermediate disk storage step — render_plot_from_storage","text":"plot (ggplot) ggplot output plotting function width (numeric) width rendered plot height (numeric) height rendered plot","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rotate_x_labels.html","id":null,"dir":"Reference","previous_headings":"","what":"Rotate ggplot x axis labels — rotate_x_labels","title":"Rotate ggplot x axis labels — rotate_x_labels","text":"Rotate ggplot x axis labels","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rotate_x_labels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rotate ggplot x axis labels — rotate_x_labels","text":"","code":"rotate_x_labels(degrees = 45)"},{"path":"https://bnprks.github.io/BPCells/reference/rotate_x_labels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rotate ggplot x axis labels — rotate_x_labels","text":"degrees Number degrees rotate ","code":""},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":null,"dir":"Reference","previous_headings":"","what":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"Scan fragments without performing operations (used 
benchmarking)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"","code":"scan_fragments(fragments)"},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"fragments Fragments object scan","code":""},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"Length 4 vector fragment count, sums chr, starts, ends","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":null,"dir":"Reference","previous_headings":"","what":"SCTransform Pearson Residuals — sctransform_pearson","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"Calculate pearson residuals negative binomial sctransform model. Normalized values calculated (X - mu) / sqrt(mu + mu^2/theta). 
mu calculated cell_read_counts * gene_beta.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"","code":"sctransform_pearson( mat, gene_theta, gene_beta, cell_read_counts, min_var = -Inf, clip_range = c(-10, 10), columns_are_cells = TRUE, slow = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"mat IterableMatrix (raw counts) gene_theta Vector per-gene thetas (overdispersion values) gene_beta Vector per-gene betas (expression level values) cell_read_counts Vector total reads per (umi count RNA) min_var Minimum value clipping variance clip_range Length 2 vector min max clipping range columns_are_cells Whether columns matrix correspond cells (default) genes slow TRUE, use 10x slower precise implementation (default FALSE)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"IterableMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"parameterization used somewhat simplified compared original SCTransform paper, particular uses linear-scale rather log-scale represent cell_read_counts gene_beta variables. also support addition arbitrary cell metadata (e.g. 
batch) add negative binomial regression.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset, translate, or reorder cell IDs — select_cells","title":"Subset, translate, or reorder cell IDs — select_cells","text":"Subset, translate, reorder cell IDs","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset, translate, or reorder cell IDs — select_cells","text":"","code":"select_cells(fragments, cell_selection)"},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset, translate, or reorder cell IDs — select_cells","text":"fragments Input fragments object cell_selection List cell IDs (numeric), names (character), logical mask.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Subset, translate, or reorder cell IDs — select_cells","text":"Numeric cell IDs re-assigned order cell_selection. output cell ID n taken input cell ID/name cell_selection[n].","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset, translate, or reorder cell IDs — select_cells","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ## Select cells by name select_cells(frags, \"cell1\") #> IterableFragments object of class \"CellSelectName\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Select 1 cells by name: cell1 ## Select cells by index select_cells(frags, c(1,3)) #> IterableFragments object of class \"CellSelectIndex\" #> #> Cells: 2 cells with names cell1, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Select 2 cells by index: 1, 3"},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset, translate, or reorder chromosome IDs — select_chromosomes","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"Subset, translate, reorder chromosome IDs","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"","code":"select_chromosomes(fragments, chromosome_selection)"},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"fragments Input fragments object chromosome_selection List chromosome IDs (numeric), names (character), logical mask.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"Numeric chromosome IDs re-assigned order chromosome_selection. 
output chromosome ID n taken input chromosome ID/name chromosome_selection[n].","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"","code":"## Prep data frags <- tibble::tibble( chr = c(rep(\"chr1\", 2), rep(\"chrX\", 2), rep(\"chr3\", 2)), start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell1\") ) %>% as(\"GRanges\") frags <- frags %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 3 chromosomes with names chr1, chr3, chrX #> #> Queued Operations: #> 1. Read uncompressed fragments from memory ## Selecting by chromosome IDs select_chromosomes(frags, c(1, 3)) #> IterableFragments object of class \"ChrSelectIndex\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 2 chromosomes with names chr1, chrX #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Select 2 chromosomes by index: 1, 3 ## Selecting by name select_chromosomes(frags, c(\"chrX\")) #> IterableFragments object of class \"ChrSelectName\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 1 chromosomes with names chrX #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. 
Select 1 chromosomes by name: chrX"},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset fragments by genomic region — select_regions","title":"Subset fragments by genomic region — select_regions","text":"Fragments can subset based overlapping (overlapping) set regions","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset fragments by genomic region — select_regions","text":"","code":"select_regions( fragments, ranges, invert_selection = FALSE, zero_based_coords = !is(ranges, \"GRanges\") )"},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset fragments by genomic region — select_regions","text":"fragments Input fragments object. ranges Peaks/ranges overlap, given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position invert_selection TRUE, select fragments overlapping selected regions instead fragments overlapping selected regions. zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. 
Defaults true GRanges false formats (see archived UCSC blogpost)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset fragments by genomic region — select_regions","text":"Fragments object filtered according selected regions","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset fragments by genomic region — select_regions","text":"","code":"frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = \"cell1\" ) frags #> # A tibble: 6 × 4 #> chr start end cell_id #> #> 1 chr1 10 15 cell1 #> 2 chr1 60 70 cell1 #> 3 chr1 110 125 cell1 #> 4 chr1 160 180 cell1 #> 5 chr1 210 235 cell1 #> 6 chr1 260 290 cell1 frags <- frags %>% convert_to_fragments() region <- tibble::tibble( chr = \"chr1\", start = 60, end = 130 ) %>% as(\"GRanges\") ## Select ranges overlapping with region select_regions(frags, region) %>% as(\"GRanges\") #> GRanges object with 2 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 61-70 * | cell1 #> [2] chr1 111-125 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths ## Select ranges not overlapping with region select_regions(frags, region, invert_selection = TRUE) %>% as(\"GRanges\") #> GRanges object with 4 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 11-15 * | cell1 #> [2] chr1 161-180 * | cell1 #> [3] chr1 211-235 * | cell1 #> [4] chr1 261-290 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":null,"dir":"Reference","previous_headings":"","what":"Set matrix op thread count — set_threads","title":"Set matrix op thread count — set_threads","text":"Set 
number threads use sparse-dense multiply matrix_stats.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set matrix op thread count — set_threads","text":"","code":"set_threads(mat, threads = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set matrix op thread count — set_threads","text":"mat IterableMatrix, product rbind cbind threads Number threads use execution","code":""},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Set matrix op thread count — set_threads","text":"valid concatenated matrices","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":null,"dir":"Reference","previous_headings":"","what":"Shift start or end coordinates — shift_fragments","title":"Shift start or end coordinates — shift_fragments","text":"Shifts start end fragments fixed amount, can useful correct Tn5 offset.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Shift start or end coordinates — shift_fragments","text":"","code":"shift_fragments(fragments, shift_start = 0L, shift_end = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Shift start or end coordinates — shift_fragments","text":"fragments Input fragments object shift_start many basepairs shift start coords shift_end many basepairs shift end coords","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Shift start or end 
coordinates — shift_fragments","text":"Shifted fragments object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Shift start or end coordinates — shift_fragments","text":"correct Tn5 offset +/- 4bp since Tn5 cut sites opposite strands offset 9bp. However, +4/-5 bp often applied bed-format files, since end coordinate bed files 1 past last basepair sequenced DNA fragment. results bed-like format except inclusive end coordinates.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Shift start or end coordinates — shift_fragments","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell1\") ) %>% as(\"GRanges\") frags #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 10-40 * | cell1 #> [2] chr1 60-90 * | cell1 #> [3] chr1 110-140 * | cell1 #> [4] chr1 160-190 * | cell1 #> [5] chr1 210-240 * | cell1 #> [6] chr1 260-290 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths frags <- frags %>% convert_to_fragments() ## Shift fragments shift_fragments(frags, shift_start = 4, shift_end = -4) %>% as(\"GRanges\") #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 14-36 * | cell1 #> [2] chr1 64-86 * | cell1 #> [3] chr1 114-136 * | cell1 #> [4] chr1 164-186 * | cell1 #> [5] chr1 214-236 * | cell1 #> [6] chr1 264-286 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset fragments by length — subset_lengths","title":"Subset fragments by length — 
subset_lengths","text":"Subset fragments length","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset fragments by length — subset_lengths","text":"","code":"subset_lengths(fragments, min_len = 0L, max_len = NA_integer_)"},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset fragments by length — subset_lengths","text":"fragments Input fragments object min_len Minimum bases fragment (inclusive) max_len Maximum bases fragment (inclusive)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset fragments by length — subset_lengths","text":"Fragments object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Subset fragments by length — subset_lengths","text":"Fragment length calculated end-start","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset fragments by length — subset_lengths","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) frags #> # A tibble: 6 × 4 #> chr start end cell_id #> #> 1 chr1 10 15 cell1 #> 2 chr1 60 70 cell1 #> 3 chr1 110 125 cell2 #> 4 chr1 160 180 cell2 #> 5 chr1 210 235 cell3 #> 6 chr1 260 290 cell3 frags <- frags %>% convert_to_fragments() ## Subset lengths subset_lengths(frags, min_len = 10, max_len = 20) %>% as(\"GRanges\") #> GRanges object with 3 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] 
chr1 61-70 * | cell1 #> [2] chr1 111-125 * | cell2 #> [3] chr1 161-180 * | cell2 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate svds — svds","title":"Calculate svds — svds","text":"Use C++ Spectra solver (RSpectra package), order compute largest k values corresponding singular vectors. Empirically, memory usage much lower using irlba::irlba(), likely due avoiding R garbage creation solving due pure-C++ solver. documentation slightly-edited version RSpectra::svds() documentation.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate svds — svds","text":"","code":"svds(A, k, nu = k, nv = k, opts = list(), threads=0L, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate svds — svds","text":"matrix whose truncated SVD computed. k Number singular values requested. nu Number right singular vectors computed. must 0 'k'. (Must equal 'k' BPCells IterableMatrix) opts Control parameters related computing algorithm. See Details threads Control threads use calculating mat-vec products (BPCells specific)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate svds — svds","text":"list following components: d vector computed singular values. u m nu matrix whose columns contain left singular vectors. nu == 0, NULL returned. v n nv matrix whose columns contain right singular vectors. nv == 0, NULL returned. nconv Number converged singular values. niter Number iterations used. 
nops Number matrix-vector multiplications used.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate svds — svds","text":"RSpectra installed, function just add method RSpectra::svds() IterableMatrix class. opts argument list can supply following parameters: ncv Number Lanczos basis vectors use. vectors result faster convergence, greater memory use. ncv must satisfy \\(k < ncv \\le p\\) p = min(m, n). Default min(p, max(2*k+1, 20)). tol Precision parameter. Default 1e-10. maxitr Maximum number iterations. Default 1000. center Either logical value (TRUE/FALSE), numeric vector length \\(n\\). vector \\(c\\) supplied, SVD computed matrix \\(A - 1c'\\), implicit way without actually forming matrix. center = TRUE effect center = colMeans(A). Default FALSE. Ignored BPCells scale Either logical value (TRUE/FALSE), numeric vector length \\(n\\). vector \\(s\\) supplied, SVD computed matrix \\((A - 1c')S\\), \\(c\\) centering vector \\(S = diag(1/s)\\). scale = TRUE, vector \\(s\\) computed column norm \\(A - 1c'\\). Default FALSE. Ignored BPCells","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculate svds — svds","text":"Qiu Y, Mei J (2022). RSpectra: Solvers Large-Scale Eigenvalue SVD Problems. 
R package version 0.16-1, https://CRAN.R-project.org/package=RSpectra.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate svds — svds","text":"","code":"mat <- matrix(rnorm(500), nrow = 50, ncol = 10) rownames(mat) <- paste0(\"gene\", seq_len(50)) colnames(mat) <- paste0(\"cell\", seq_len(10)) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") svd_res <- svds(mat, k = 5) names(svd_res) #> [1] \"d\" \"u\" \"v\" \"niter\" \"nops\" \"nconv\" svd_res$d #> [1] 10.213518 9.181788 8.371677 7.570168 7.202453 dim(svd_res$u) #> [1] 50 5 dim(svd_res$v) #> [1] 10 5 # Can also pass in values directly into RSpectra::svds svd_res <- svds(mat, k = 5, opts=c(maxitr = 500))"},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate ranges x cells tile overlap matrix — tile_matrix","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"Calculate ranges x cells tile overlap matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"","code":"tile_matrix( fragments, ranges, mode = c(\"insertions\", \"fragments\"), zero_based_coords = !is(ranges, \"GRanges\"), explicit_tile_names = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"fragments Input fragments object ranges Tiled regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. 
Required attributes: chr, start, end: genomic position tile_width: Size tile region basepairs Must non-overlapping sorted (chr, start), chromosomes ordered according chromosome names fragments mode Mode counting tile overlaps. (See \"value\" section detail) zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost) explicit_tile_names Boolean whether add rownames output matrix format e.g chr1:500-1000, start end coords given 0-based coordinate system. whole-genome Tile matrices names take ~5 seconds generate take 400MB memory. Note either way, tile names written matrix saved.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"Iterable matrix object dimension ranges x cells. saved, column names format chr1:500-1000, start end coords given 0-based coordinate system. 
mode options \"insertions\": Start end coordinates separately overlapped tile \"fragments\": Like \"insertions\", fragment can contribute 1 count tile, even start end coordinates overlap","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"calculating matrix directly fragments tsv, necessary first call select_chromosomes() order provide ordering chromosomes expect reading tsv.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"","code":"## Prep demo data frags <- get_demo_frags(subset = FALSE) chrom_sizes <- read_ucsc_chrom_sizes(file.path(tempdir(), \"references\"), genome=\"hg38\") blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE) ranges <- tibble::tibble( chr = \"chr4\", start = 0, end = \"190214555\", tile_width = 200 ) ## Get tile matrix tile_matrix(frags_filter_blacklist, ranges) #> 951073 x 2600 IterableMatrix object with class TileMatrix #> #> Row names: unknown names #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: row major #> #> Queued Operations: #> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered #> 2. Subset to fragments not overlapping 636 ranges: chr10:1-45700 ... chrY:26637301-57227400 #> 3. 
Calculate 951073 tiles over 1 ranges: chr4:1-190214555 (200bp), chr4:1-190214555 (200bp)"},{"path":"https://bnprks.github.io/BPCells/reference/tile_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Get ranges corresponding to selected tiles of a tile matrix — tile_ranges","title":"Get ranges corresponding to selected tiles of a tile matrix — tile_ranges","text":"Get ranges corresponding selected tiles tile matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get ranges corresponding to selected tiles of a tile matrix — tile_ranges","text":"","code":"tile_ranges(tile_matrix, selection)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":null,"dir":"Reference","previous_headings":"","what":"Pseudobulk trackplot — trackplot_bulk","title":"Pseudobulk trackplot — trackplot_bulk","text":"function renamed trackplot_coverage() Plot pseudobulk genome track, showing number fragment insertions across region.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pseudobulk trackplot — trackplot_bulk","text":"","code":"trackplot_bulk( fragments, region, groups, cell_read_counts, group_order = NULL, bins = 200, clip_quantile = 0.999, colors = discrete_palette(\"stallion\"), legend_label = \"group\", zero_based_coords = !is(region, \"GRanges\"), return_data = FALSE, return_plot_list = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pseudobulk trackplot — trackplot_bulk","text":"fragments Fragments object region GRanges length 1 region plot, list/data.frame one entry chr, start, end. 
See gene_region() genomic-ranges details groups Vector one entry per cell, specifying cell's group cell_read_counts Numeric vector read counts cell (used normalization) group_order Optional vector listing ordering groups bins Number bins plot across region clip_quantile (optional) Quantile values clipping y-axis limits. Default 0.999 crop just extreme outliers across region. NULL disable clipping colors Character vector color values (optionally named group) legend_label Custom label put legend zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost) return_data true, return data just plotting rather plot. return_plot_list TRUE, return multiple plots list, rather single plot combined using patchwork::wrap_plots() apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pseudobulk trackplot — trackplot_bulk","text":"Returns combined plot pseudobulk genome tracks. compatibility draw_trackplot_grid(), extra attribute $patches$labels added specify labels track. 
return_data return_plot_list TRUE, return value modified accordingly.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","text":"Calculate y positions trackplot segments avoid overlap Steps: Calculate maximum overlap depth transcripts Iterate start/end segments sorted order Randomly assign segment y-coordinate 1 max overlap depth, restriction segment y-coordinate overlapping segment","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — 
trackplot_calculate_segment_height","text":"","code":"trackplot_calculate_segment_height(data)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","text":"data tibble genome ranges start end columns, assumed chromosome.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","text":"Vector y coordinates, one per input row, ranges y coordinate overlap","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine track plots — trackplot_combine","title":"Combine track plots — trackplot_combine","text":"Combines multiple track plots region single grid. 
Uses patchwork package perform alignment.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine track plots — trackplot_combine","text":"","code":"trackplot_combine( tracks, side_plot = NULL, title = NULL, side_plot_width = 0.3 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine track plots — trackplot_combine","text":"tracks List tracks order top bottom, generally ggplots output trackplot_*() functions. side_plot Optional plot align right (e.g. RNA expression per cluster). aligned first trackplot_coverage() output present, else first generic ggplot alignment. horizontal orientation cluster ordering coverage plots. title Text overarching title plot side_plot_width Fraction width used side plot relative main track area","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine track plots — trackplot_combine","text":"plot object aligned genome plots. aligned row text label, y-axis, plot body. relative height row given heights. 
shared title x-axis put top.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Combine track plots — trackplot_combine","text":"","code":"## Prep data frags <- get_demo_frags() ## Use genes and blacklist to determine proper number of reads per cell genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") read_counts <- qc_scATAC(frags, genes, blacklist)$nFrags region <- \"chr4:3034877-4034877\" cell_types <- paste(\"Group\", rep(1:3, length.out = length(cellNames(frags)))) transcripts <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\" ) region <- \"chr4:3034877-4034877\" ## Get all trackplots and scalebars to combine plot_scalebar <- trackplot_scalebar(region) plot_gene <- trackplot_gene(transcripts, region) plot_coverage <- trackplot_coverage(frags, region, groups = cell_types, cell_read_counts = read_counts) ## Combine trackplots and render ## Also remove colors from gene track plot <- trackplot_combine( list(plot_scalebar, plot_coverage, plot_gene + ggplot2::guides(color = \"none\")) ) BPCells:::render_plot_from_storage(plot, width = 6, height = 4)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":null,"dir":"Reference","previous_headings":"","what":"Pseudobulk coverage trackplot — trackplot_coverage","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"Plot pseudobulk genome track, showing number fragment insertions across region cell type group.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pseudobulk coverage 
trackplot — trackplot_coverage","text":"","code":"trackplot_coverage( fragments, region, groups, cell_read_counts, group_order = NULL, bins = 500, clip_quantile = 0.999, colors = discrete_palette(\"stallion\"), legend_label = NULL, zero_based_coords = !is(region, \"GRanges\"), return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"fragments Fragments object region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details groups Vector one entry per cell, specifying cell's group cell_read_counts Numeric vector read counts cell (used normalization) group_order Optional vector listing ordering groups bins Number bins plot across region clip_quantile (optional) Quantile values clipping y-axis limits. Default 0.999 crop just extreme outliers across region. NULL disable clipping colors Character vector color values (optionally named group) legend_label Custom label put legend (longer used color legend shown anymore) zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost) return_data true, return data just plotting rather plot. scale_bar Whether include scale bar top track (TRUE FALSE)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"Returns combined plot pseudobulk genome tracks. compatibility draw_trackplot_grid(), extra attribute $patches$labels added specify labels track. 
return_data return_plot_list TRUE, return value modified accordingly.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"","code":"frags <- get_demo_frags() ## Use genes and blacklist to determine proper number of reads per cell genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") read_counts <- qc_scATAC(frags, genes, blacklist)$nFrags region <- \"chr4:3034877-4034877\" cell_types <- paste(\"Group\", rep(1:3, length.out = length(cellNames(frags)))) BPCells:::render_plot_from_storage( trackplot_coverage(frags, region, groups = cell_types, cell_read_counts = read_counts), width = 6, height = 3 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":null,"dir":"Reference","previous_headings":"","what":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","title":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"Break segments smaller segments length plot, divided size","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"","code":"trackplot_create_arrow_segs(data, region, size = 50, head_only = FALSE)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Break up 
segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"data Dataframe full segments broken region Region plotted end start attr size int Number arrows span x axis track head_only bool TRUE, head segment plotted","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"Dataframe segments broken smaller segments. columns start, end, additional metadata columns original data","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot transcript models — trackplot_gene","title":"Plot transcript models — trackplot_gene","text":"Plot transcript models","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot transcript models — trackplot_gene","text":"","code":"trackplot_gene( transcripts, region, exon_size = 2.5, gene_size = 0.5, label_size = 11 * 0.8/ggplot2::.pt, track_label = \"Genes\", return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot transcript models — trackplot_gene","text":"transcripts Transcript features given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand feature: entries marked \"transcript\" \"exon\" considered gene_name: Symbol gene ID display transcript_id: Transcript identifier link transcripts exons Usually given output read_gencode_transcripts() region Region plot, e.g. 
output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details exon_size size exon lines units mm gene_size size intron/gene lines units mm label_size size transcript labels units mm return_data true, return data just plotting rather plot. labels Character vector labels item transcripts. NA items labeled transcript_size size transcript lines units mm","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot transcript models — trackplot_gene","text":"Plot gene locations","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot transcript models — trackplot_gene","text":"","code":"## Prep data transcripts <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) region <- \"chr4:3034877-4034877\" ## Plot gene trackplot plot <- trackplot_gene(transcripts, region) BPCells:::render_plot_from_storage(plot, width = 6, height = 1)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","title":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","text":"Plot range-based annotation tracks (e.g. peaks)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot range-based annotation tracks (e.g. 
peaks) — trackplot_genome_annotation","text":"","code":"trackplot_genome_annotation( loci, region, color_by = NULL, colors = NULL, label_by = NULL, label_size = 11 * 0.8/ggplot2::.pt, show_strand = FALSE, annotation_size = 2.5, track_label = \"Peaks\", return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","text":"loci Genomic loci given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details color_by Name metadata column loci use coloring, data vector length loci. Column must numeric convertible factor. colors Vector hex color codes use color scale. numeric color_by data, passed ggplot2::scale_color_gradientn(), otherwise interpreted discrete color palette ggplot2::scale_color_manual() label_by Name metadata column loci use labeling, data vector length loci. Column must hold string data. label_size size labels units mm show_strand TRUE, show strand direction arrows annotation_size size annotation lines mm return_data true, return data just plotting rather plot.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot range-based annotation tracks (e.g. 
peaks) — trackplot_genome_annotation","text":"Plot genomic loci return_data FALSE, otherwise returns data frame used generate plot","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","text":"","code":"## Prep data ## Peaks generated from demo frags, as input into `call_peaks_tile()` peaks <- tibble::tibble( chr = factor(rep(\"chr4\", 16)), start = c(3041400, 3041733, 3037400, 3041933, 3040466, 3041200, 3038200, 3038000, 3040266, 3037733, 3040800, 3042133, 3038466, 3037200, 3043333, 3040066), end = c(3041600, 3041933, 3037600, 3042133, 3040666, 3041400, 3038400, 3038200, 3040466, 3037933, 3041000, 3042333, 3038666, 3037400, 3043533, 3040266), enrichment = c(46.4, 43.5, 28.4, 27.3, 17.3, 11.7, 10.5, 7.95, 7.22, 6.86, 6.32, 6.14, 5.96, 5.06, 4.51, 3.43) ) region <- \"chr4:3034877-3044877\" ## Plot peaks BPCells:::render_plot_from_storage( trackplot_genome_annotation(peaks, region, color_by = \"enrichment\"), width = 6, height = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot loops — trackplot_loop","title":"Plot loops — trackplot_loop","text":"Plot loops","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot loops — trackplot_loop","text":"","code":"trackplot_loop( loops, region, color_by = NULL, colors = NULL, allow_truncated = TRUE, curvature = 0.75, track_label = \"Links\", return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot loops — trackplot_loop","text":"loops Genomic regions given GRanges, 
data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details color_by Name metadata column loops use coloring, data vector length loci. Column must numeric convertible factor. colors Vector hex color codes use color scale. numeric color_by data, passed ggplot2::scale_color_gradientn(), otherwise interpreted discrete color palette ggplot2::scale_color_manual() allow_truncated FALSE, remove loops fully contained within region curvature Curvature value 0 1. 1 180-degree arc, 0 flat lines. return_data true, return data just plotting rather plot.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot loops — trackplot_loop","text":"Plot loops connecting genomic coordinates","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot loops — trackplot_loop","text":"","code":"peaks <- c(3054877, 3334877, 3534877, 3634877, 3734877) loops <- tibble::tibble( chr = \"chr4\", start = peaks[c(1,1,2,3)], end = peaks[c(2,3,4,5)], score = c(4,1,3,2) ) region <- \"chr4:3034877-4034877\" ## Plot loops plot <- trackplot_loop(loops, region, color_by = \"score\") BPCells:::render_plot_from_storage(plot, width = 6, height = 1.5)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":null,"dir":"Reference","previous_headings":"","what":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — 
trackplot_normalize_ranges_with_metadata","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"Normalize trackplot ranges data, handling metadata argument renaming type conversions Type conversions follows: color -> factor numeric label -> string","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"","code":"trackplot_normalize_ranges_with_metadata(data, metadata)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"data Input ranges-like object metadata List form e.g. list(color=color_by, label=label_by). values can either column names data vectors. 
NULL values skipped","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"Tibble normalized ranges additional columns populated requested metadata","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot scale bar — trackplot_scalebar","title":"Plot scale bar — trackplot_scalebar","text":"Plots human-readable scale bar coordinates region plotted","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot scale bar — trackplot_scalebar","text":"","code":"trackplot_scalebar(region, font_pt = 11)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot scale bar — trackplot_scalebar","text":"region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. 
See help(\"genomic-ranges-like\") details font_pt Font size scale bar labels units pt.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot scale bar — trackplot_scalebar","text":"Plot coordinates scalebar plotted genomic region","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot scale bar — trackplot_scalebar","text":"","code":"region <- \"chr4:3034877-3044877\" BPCells:::render_plot_from_storage( trackplot_scalebar(region), width = 6, height = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":null,"dir":"Reference","previous_headings":"","what":"Adjust trackplot properties — set_trackplot_label","title":"Adjust trackplot properties — set_trackplot_label","text":"Adjust labels heights trackplots. Labels set facet labels ggplot2, heights additional properties read trackplot_combine() determine relative height input plots.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adjust trackplot properties — set_trackplot_label","text":"","code":"set_trackplot_label(plot, labels) set_trackplot_height(plot, height) get_trackplot_height(plot)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Adjust trackplot properties — set_trackplot_label","text":"plot ggplot object labels character vector labels – must match existing number facets plot height New height. numeric, adjusts relative height. ggplot2::unit grid::unit sets absolute height specified units. 
\"null\" units interpreted relative height.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Adjust trackplot properties — set_trackplot_label","text":"set_trackplot_label: ggplot object adjusted facet labels set_trackplot_height: ggplot object adjusted trackplot height get_trackplot_height: ggplot2::unit object height setting","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":null,"dir":"Reference","previous_headings":"","what":"Transpose the storage order for a matrix — transpose_storage_order","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"Transpose storage order matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"","code":"transpose_storage_order( matrix, outdir = tempfile(\"transpose\"), tmpdir = tempdir(), load_bytes = 4194304L, sort_bytes = 1073741824L )"},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"matrix Input matrix outdir Directory store output tmpdir Temporary directory use intermediate storage load_bytes minimum contiguous load size merge sort passes sort_bytes amount memory allocate re-sorting chunks entries","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"MatrixDir object copy input matrix, storage order 
flipped","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"re-sorts entries matrix change storage order row-major col-major. large matrices, can slow – around 2 minutes transpose 500k cell RNA-seq matrix default load_bytes (4MiB) sort_bytes (1GiB) parameters allow ~85GB data sorted two passes data, ~7.3TB data sorted three passes data.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"","code":"mat <- matrix(rnorm(50), nrow = 10, ncol = 5) rownames(mat) <- paste0(\"gene\", seq_len(10)) colnames(mat) <- paste0(\"cell\", seq_len(5)) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") mat #> 10 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ## A regular transpose operation switches a user's rows and cols t(mat) #> 5 x 10 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: cell1, cell2 ... cell5 #> Col names: gene1, gene2 ... gene10 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ## Running `transpose_storage_order()` instead changes whether the storage is in row-major or col-major, ## but does not switch the rows and cols transpose_storage_order(mat) #> 10 x 5 IterableMatrix object with class MatrixDir #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. 
Load compressed matrix from directory /tmp/RtmpCiGY9C/transpose3d2cda1481e785"},{"path":"https://bnprks.github.io/BPCells/reference/wrapMatrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an S4 matrix object wrapping another matrix object — wrapMatrix","title":"Construct an S4 matrix object wrapping another matrix object — wrapMatrix","text":"Helps avoid duplicate storage dimnames","code":""},{"path":"https://bnprks.github.io/BPCells/reference/wrapMatrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an S4 matrix object wrapping another matrix object — wrapMatrix","text":"","code":"wrapMatrix(class, m, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":null,"dir":"Reference","previous_headings":"","what":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"Write insertion counts data one pseudobulks bed/bedgraph format. Beds hold chrom, start, end data, bedGraphs also provide score column. 
reports total number insertions basepair group listed cell_groups.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"","code":"write_insertion_bedgraph( fragments, path, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), insertion_mode = c(\"both\", \"start_only\", \"end_only\"), tile_width = 1, normalization_method = c(\"none\", \"cpm\", \"n_cells\"), chrom_sizes = NULL ) write_insertion_bed( fragments, path, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), insertion_mode = c(\"both\", \"start_only\", \"end_only\"), verbose = FALSE, threads = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"fragments IterableFragments object path (character vector) Path(s) save bed/bedgraphs , optionally ending \".gz\" add gzip compression. cell_groups provided, path must named character vector, one name level cell_groups cell_groups Character factor assigning group cell, order cellNames(fragments) insertion_mode (string) fragment ends use coverage calculation. One \"both\", \"start_only\", \"end_only\" tile_width (integer) Width tiles use binning insertions. insertions single bin summed. tile_width 1, functionally equivalent write_insertion_bedgraph(). normalization_method (character) Normalization method use. One : none: normalization cpm: Normalize total number fragments group, scaling 1 million fragments (i.e. CPM). n_cells: Normalize total number cells group. chrom_sizes (GRanges, data.frame, list, numeric, NULL) Chromosome sizes clip tiles end chromosome. NULL, tile_width required 1. 
data.frame list, must contain columns chr end (See help(\"genomic-ranges-like\")). numeric vector, assumed chromosome sizes order chrNames(fragments). verbose (bool) Whether provide verbose progress output console. threads (int) Number threads use.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"NULL","code":""},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"","code":"## Prep data frags <- get_demo_frags() bedgraph_outputs <- file.path(tempdir(), \"bedgraph_outputs\") ###################################################### ## `write_insertion_bedgraph()` examples ###################################################### ## Write insertions write_insertion_bedgraph(frags, file.path(bedgraph_outputs, \"all.tar.gz\")) list.files(bedgraph_outputs) #> [1] \"all.tar.gz\" # With tiling chrom_sizes <- read_ucsc_chrom_sizes(\"./reference\", genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) write_insertion_bedgraph(frags, file.path(bedgraph_outputs, \"all_tiled.bedGraph\"), chrom_sizes = chrom_sizes, normalization_method = \"cpm\", tile_width = 100) reads <- readr::read_tsv(file.path(bedgraph_outputs, \"all_tiled.bedGraph\"), col_names = c(\"chr\", \"start\", \"end\", \"score\"), show_col_types = FALSE) head(reads) #> # A tibble: 6 × 4 #> chr start end score #> #> 1 chr4 10000 10100 1.45 #> 2 chr4 10100 10200 0.869 #> 3 chr4 10300 10400 0.290 #> 4 chr4 10400 10500 0.145 #> 5 chr4 10600 10700 0.434 #> 6 chr4 11100 11200 0.145 ###################################################### ## `write_insertion_bed()` examples 
###################################################### # We utilize two groups this time bed_outputs <- file.path(tempdir(), \"bed_outputs\") cell_groups <- rep(c(\"A\", \"B\"), length.out = length(cellNames(frags))) bed_paths <- c(file.path(bed_outputs, \"A.bed\"), file.path(bed_outputs, \"B.bed\")) names(bed_paths) <- c(\"A\", \"B\") write_insertion_bed( frags, path = bed_paths, cell_groups = cell_groups, verbose = TRUE ) #> 2025-11-10 15:00:48 Writing bed file for cluster: A #> 2025-11-10 15:00:48 Bed file for cluster: A written to: /tmp/RtmpCiGY9C/bed_outputs/A.bed #> 2025-11-10 15:00:48 Writing bed file for cluster: B #> 2025-11-10 15:00:49 Bed file for cluster: B written to: /tmp/RtmpCiGY9C/bed_outputs/B.bed #> 2025-11-10 15:00:49 Finished writing bed files list.files(bed_outputs) #> [1] \"A.bed\" \"B.bed\" head(readr::read_tsv( file.path(bed_outputs, \"A.bed\"), col_names = c(\"chr\", \"start\", \"end\"), show_col_types = FALSE) ) #> # A tibble: 6 × 3 #> chr start end #> #> 1 chr4 10035 10036 #> 2 chr4 10045 10046 #> 3 chr4 10045 10046 #> 4 chr4 10046 10047 #> 5 chr4 10046 10047 #> 6 chr4 10066 10067"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"breaking-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"BPCells 0.4.0 (in-progress main branch)","text":"Change first parameter name cluster_graph_leiden(), cluster_graph_louvain() cluster_graph_seurat() snn mat accurately reflect input type. 
(pull request #292)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-4-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.4.0 (in-progress main branch)","text":"Create wrapper function cluster_cells_graph() wraps steps knn object creation, graph adjacency creation, clustering within single function (pull request #292) Add tile_width normalization arguments write_insertion_bedgraph() allow flexible bedgraph creation (pull request #299) Export write_insertion_bed(), originally helper peak calling (pull request #302)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-4-0","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.4.0 (in-progress main branch)","text":"Fix error documentation examples plot_embedding(), resulting way documentation examples use nested function calls (pull request #316).","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"to-dos-0-4-0","dir":"Changelog","previous_headings":"","what":"To-dos","title":"BPCells 0.4.0 (in-progress main branch)","text":"Add support sparse pseudobulking pseudobulk_matrix(). Currently progress #268. Add support duplicate rows/cols subsetting operations. Add support matrix matrix addition. Maybe add CCA support? Refactor C++ backend take logic R S4 methods. allow cleaner separation R C++ code, allow much quicker porting Python future.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bpcells-031-7212025","dir":"Changelog","previous_headings":"","what":"BPCells 0.3.1 (7/21/2025)","title":"BPCells 0.3.1 (7/21/2025)","text":"BPCells 0.3.1 release covers 7 months changes 40 commits 5 contributors. Notable changes include writing matrices AnnData’s dense format, methods retrieving demo data testing examples. Full details changes . 
Thanks @ycli1995 @mfansler pull requests contributed release, well users submitted github issues help identify fix bugs.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-3-1","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.3.1 (7/21/2025)","text":"Add write_matrix_anndata_hdf5_dense() allows writing matrices AnnData’s dense format, commonly used obsm varm matrices. (Thanks @ycli1995 pull request #166) Add get_demo_mat(), get_demo_frags() remove_demo_data() retrieve small test matrix/fragments object PBMC 3k dataset 10X Genomics. (pull request #193)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"improvements-0-3-1","dir":"Changelog","previous_headings":"","what":"Improvements","title":"BPCells 0.3.1 (7/21/2025)","text":"Speed taking large subsets large concatenated matrices, e.g. selecting 9M cells 10M cell matrix composed ~100 concatenated pieces. (pull request #179) matrix_stats() now also works types matrix dgCMatrix. (pull request #190) Fixed memory errors running writeInsertionBed() writeInsertionBedGraph() (pull request #{118, 134}) Export merge_peaks_iterative(), helps create non-overlapping peak sets. (pull request #216) Add support uint16_t reading anndata matrices using open_matrix_anndata_hdf5(). (pull request #248) Switch write_matrix_10x_hdf5() use signed rather unsigned integers indices, indptr, shape improve compatibility 10x-produced files. (Thanks @ycli1995 pull request #256) Change behaviour cbind() rbind() matrices different types, upcast instead erroring . (pull request #265)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-3-1","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.3.1 (7/21/2025)","text":"Fix error message printing MACS crashes call_peaks_macs() (pull request #175) Fix gene_score_archr() gene_score_weights_archr() malfunctioning non-default tile_width settings. 
(Thanks @Baboon61 reporting issue #185) Fix gene_score_archr() chromosome_sizes argument sorted. (Thanks @Baboon61 reporting issue #188) Fix matrix transpose error BPCells loaded via devtools::load_all() BiocGenerics imported previously. (pull request #191) Fix error using single group write_insertion_bedgraph() (pull request #214) Fix GRanges conversion functions sometimes defined BPCells built binary package prior GenomicRanges installed. (pull request #231; thanks @mfansler reporting issue #229) Fix error write_matrix_hdf5() overwriting .h5 file exist. (pull request #234) Fix configure script use pre-installed libhwy available installation time. (Thanks @mfansler submitting PR #228) Fix line-ending issue caused windows-created matrices readable platforms. (pull request #257; thanks @pavsol reporting issue #253) Fix compilation exists system-installed libhwy old. (pull request #288, thanks @GerardoZA reporting issue #285)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bpcells-030-12212024","dir":"Changelog","previous_headings":"","what":"BPCells 0.3.0 (12/21/2024)","title":"BPCells 0.3.0 (12/21/2024)","text":"BPCells 0.3.0 release covers 6 months changes 45 commits 5 contributors. Notable improvements release include support peak calling MACS addition pseudobulk matrix stats calculations. also released initial prototype BPCells Python library (details ). Full details changes . Thanks @ycli1995, @Yunuuuu, @douglasgscofield pull requests contributed release, well users sumitted github issues help identify fix bugs. also added @immanuelazn team new hire! responsible many new features release continue help maintenance new development moving forwards.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-3-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.3.0 (12/21/2024)","text":"apply_by_col() apply_by_row() allow providing custom R functions compute per row/col summaries. 
initial tests calculating row/col means using R functions ~2x slower C++-based implementation memory usage remains low. Add rowMaxs() colMaxs() functions, return maximum value row column matrix. matrixStats MatrixGenerics packages installed, BPCells::rowMaxs() fall back implementations non-BPCells objects. Thanks @immanuelazn first contribution new lab hire! Add regress_out() allow removing unwanted sources variation via least squares linear regression models. Thanks @ycli1995 pull request #110 Add trackplot_genome_annotation() plotting peaks, options directional arrows, colors, labels, peak widths. (pull request #113) Add MACS2/3 input creation peak calling call_peaks_macs()(pull request #118). Note, renamed call_macs_peaks() pull request #143 Add rowQuantiles() colQuantiles() functions, return quantiles row/column matrix. Currently rowQuantiles() works row-major matrices colQuantiles() works col-major matrices. matrixStats MatrixGenerics packages installed, BPCells::colQuantiles() fall back implementations non-BPCells objects. (pull request #128) Add pseudobulk_matrix() allows pseudobulk aggregation sum mean calculation per-pseudobulk variance nonzero statistics gene (pull request #128)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"improvements-0-3-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"BPCells 0.3.0 (12/21/2024)","text":"trackplot_loop() now accepts discrete color scales trackplot_combine() now smarter layout logic margins, well detecting plots combined cover different genomic regions. (pull request #116) select_cells() select_chromosomes() now also allow using logical mask selection. (pull request #117) BPCells installation can now also configured setting LDFLAGS CFLAGS environment variables addition setting ~/.R/Makevars (pull request #124) open_matrix_anndata_hdf5() now supports reading AnnData matrices dense format. 
(pull request #146) cluster_graph_leiden() now better defaults produce reasonable cluster counts regardless dataset size. (pull request #147)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-3-0","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.3.0 (12/21/2024)","text":"Fixed error message matrix large converted dgCMatrix. (Thanks @RookieA1 reporting issue #95) Fixed forgetting dimnames subsetting certain sets operations. (Thanks @Yunuuuu reporting issues #97 #100) Fixed plotting crashes running trackplot_coverage() fragments single cluster. (Thanks @sjessa directly reporting bug coming fix) Fixed issues trackplot_coverage() called ranges less 500 bp length (Thanks @bettybliu directly reporting bug.) Fix Rcpp warning created handling compressed matrices one non-zero entry (pull request #123) Fixed discrepancy default ArchR BPCells peak calling insertion method, BPCells defaulted using start fragment opposed ArchR’s method using start end sites fragments (pull request #143) Fix error tile_matrix() fragment mode (pull request #141) Fix precision bug sctransform_pearson() ARM architecture (pull request #141) Fix type-confusion error pseudobulk_matrix() gets integer matrix (pull request #174)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"deprecations-0-3-0","dir":"Changelog","previous_headings":"","what":"Deprecations","title":"BPCells 0.3.0 (12/21/2024)","text":"trackplot_coverage() legend_label argument now ignored, color legend longer shown default coverage plots.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bpcells-020-6142024","dir":"Changelog","previous_headings":"","what":"BPCells 0.2.0 (6/14/2024)","title":"BPCells 0.2.0 (6/14/2024)","text":"finally declaring new release version, covering large amount changes improvements past year. 
Among major features parallelization options svds() matrix_stats(), improved genomic track plots, runtime CPU feature detection SIMD code (enables higher performance, portable builds). Full details changes . version also comes new installation path, done preparation future Python package release. (can one folder R one Python, rather R files sit root folder). breaking change requires slightly modified installation command. Thanks @brgew, @ycli1995, @Yunuuuu pull requests contributed release, well users submitted github issues help identify fix bugs.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"breaking-changes-0-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"BPCells 0.2.0 (6/14/2024)","text":"r-universe mirrors add \"subdir\": \"r\" packages.json config. New slots added 10x matrix objects, saved RDS files may need 10x matrix inputs re-opened replaced calling all_matrix_inputs(). Outside loading old RDS files changes needed. trackplot_gene() now returns plot facet label match new trackplot system. label can removed calling trackplot_gene(...) + ggplot2::facet_null() equivalent old function’s output.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"deprecations-0-2-0","dir":"Changelog","previous_headings":"","what":"Deprecations","title":"BPCells 0.2.0 (6/14/2024)","text":"draw_trackplot_grid() deprecated, replaced trackplot_combine() simplified arguments trackplot_bulk() deprecated, replaced trackplot_coverage() equivalent functionality old function names output deprecation warnings, otherwise work .","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-2-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.2.0 (6/14/2024)","text":"New svds() function, based excellent Spectra C++ library (used RSpectra) Yixuan Qiu. ensure lower memory usage compared irlba, achieving similar speed + accuracy. 
normalizations supported, operations like marker_features() writing matrix disk remain single-threaded. Running svds() many threads gene-major matrices can result high memory usage now. problem present cell-major matrices. Reading text-based MatrixMarket inputs (e.g. 10x Parse) now supported via import_matrix_market() convenience function import_matrix_market_10x(). implementation uses disk-backed sorting allow importing large files low memory usage. Added binarize() function associated generics <, <=, >, >=. supports comparison non-negative numbers currently. (Thanks contribution @brgew) Added round() matrix transformation (Thanks contributions @brgew) Add getter/setter function all_matrix_inputs() help enable relocating underlying storage BPCells matrix transform objects. hdf5-writing functions now support gzip_level parameter, enable shuffle + gzip filter compression. generally much slower bitpacking compression, adds improved storage options files must read outside programs. Thanks @ycli1995 submitting improvement pull #42. AnnData export now supported via write_matrix_anndata_hdf5() (issue #49) Re-licensed code base use dual-licensed Apache V2 MIT instead GPLv3 Assigning subset now supported (e.g. m1[,j] <- m2). Note modify data disk. Instead, uses series subsetting concatenation operations provide appearance overwriting appropriate entries. Added knn_to_geodesic_graph(), matches Scanpy default construction graph-based clustering Add checksum(), allows calculating MD5 checksum matrix contents. Thanks @brgrew submitting improvement pull request #83 write_insertion_bedgraph() allows exporting pseudobulk insertion data bedgraph format","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"improvements-0-2-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"BPCells 0.2.0 (6/14/2024)","text":"Merging fragments c() now handles inputs mismatched chromosome names. 
Merging fragments now 2-3.5x faster SNN graph construction knn_to_snn_graph() work smoothly large datasets due C++ implementation Reduced memory usage marker_features() samples millions cells large number clusters compare. Windows, increased maximum number files can simultaneously open. Previously, opening >63 compressed counts matrices simultaneously hit limit. Now least 1,000 simultaneous matrices possible. Subsetting peak tile matrices [ now propagates always avoid computing parts peak/tile matrix discarded subset. Subsetting tile matrix automatically convert peak matrix possible improved efficiency. Subsetting RowBindMatrices ColBindMatrices now propagates avoid touching matrices selected indices Added logic help reduce cases subsetting causes BPCells fall back less efficient matrix-vector multiply algorithm. affects math transforms. part , filtering part subset propagate earlier transformation steps, reordering . Thanks @nimanouri-nm raising issue #65 fix bug initial implementation. Additional C++17 filesystem backwards compatibility allow slightly older compilers GCC 7.5 build BPCells. 
.matrix() produce integer matrices appropriate (Thanks @Yunuuuu pull #77) 10x HDF5 matrices can now read write non-integer types requested (Thanks @ycli1995 pull #75) Old-style 10x files cellranger v2 can now read multi-genome files, returned list (Thanks @ycli1995 pull #75) Trackplots now use faceting provide per-plot labels, leading easier--use trackplot_combine() trackplot_gene() now draws arrows direction transcription trackplot_loop() new track type allows plotting interactions genomic regions, instance peak-gene correlations loop calls Hi-C trackplot_scalebar() added show genomic scale trackplot functions now return ggplot objects additional metadata stored plotting height track Labels heights trackplots can adjusted using set_trackplot_label() set_trackplot_height() getting started pbmc 3k vignette now includes updated trackplot APIs final example Add rowVars() colVars() functions, convenience wrappers around matrix_stats(). matrixStats MatrixGenerics packages installed, BPCells::rowVars() fall back implementations non-BPCells objects. Unfortunately, matrixStats::rowVars() generic, either BPCells::rowVars() BPCells::colVars() Optimize mean variance calculations matrices added per-row per-column constant. Adds run-time detection CPU features eliminate architecture-specific compilation now, Pow SIMD implementation removed, Square gets new SIMD implementation Empirically, operations using SIMD math instructions 2x faster. includes log1p(), sctransform_pearson() Minor speedups dense-sparse matrix multiply functions (1.1-1.5x faster)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-2-0","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.2.0 (6/14/2024)","text":"Fixed fragment transforms using chrNames(frags) <- val cellNames(frags) <- val cause downstream errors. Fixed errors transpose_storage_order() matrices >4 billion non-zero entries. 
Fixed error transpose_storage_order() matrices non-zero entries. Fixed bug writing fragment files >512 chromosomes. Fixed bug reading fragment files >4 billion fragments. Fixed file permissions errors using read-hdf5 files (Issue #26 reported thanks @ttumkaya) Renaming rownames() colnames() now propagated saving matrices (Issue #29 reported thanks @realzehuali, additional fix report thanks @Dario-Rocha) Fixed 64-bit integer overflow (!) cause incorrect p-value calculations marker_features() features 2.6 million zeros. Improved robustness Windows installation process setups need -lsz linker flag compile hdf5 Fixed possible memory safety bug wrapped R objects (dgCMatrix) potentially garbage collected C++ still trying access data rare circumstances. Fixed case dimnames preserved calling convert_matrix_type() twice row cancels (e.g. double -> uint32_t -> double). Thanks @brgrew reporting issue #43 Caused fixed issue resulting unusably slow performance reading matrices HDF5 files. Broken versions range commit 21f8dcf fix 3711a40 (October 18-November 3, 2023). Thanks @abhiachoudhary reporting issue #53 Fixed error svds() handling row-major matrices correctly. Thanks @ycli1995 reporting issue #55 Fixed error row/col name handling AnnData matrices. Thanks @lisch7 reporting issue #57 Fixed error merging matrices different data types. Thanks @Yunuuuu identifying issue providing fix (#68 #70) Fixed issue losing dimnames subset assignment [<-. Thanks @Yunuuuu identifying issue #67 Fixed incorrect results cases scaling matrix shifting. Thanks @Yunuuuu identifying issue #72 Fixed infinite loop bug calling transpose_storage_order() densely-transformed matrix. 
Thanks @Yunuuuu reporting issue #71 h5ad outputs now subset properly loaded Python anndata package (Thanks issue described @ggruenhagen3 issue #49 fixed @ycli1995 pull #81) Disk-backed fragment objects now load via absolute path, matching behavior matrices making objects loaded via readRDS() can used different working directories. footprints() now respects user interrupts via Ctrl-C","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-1-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.1.0 (4/7/2023)","text":"Reading/writing 10x fragment files disk Reading/writing compressed fragments disk (folder hdf5 group) Interconversion fragments objects GRanges / data.frame Merging multiple source fragment files transparently run time Calculation Cell x Peak matrices, Cell x Tile matrices ArchR-compatible QC calculations ArchR-compatible gene activity score calculations Filtering fragments chromosmes, cells, lengths, genomic region Fast peak calling approximation via overlapping tiles Conversion /R sparse matrices Read-write access 10x hdf5 feature matrices, read-access AnnData files Reading/writing compressed matrices disk (folder hdf5 group) Support integer single/double-precision floating point matrices disk Fast transposition storage order, switch indexing cell gene/feature. Concatenation multiple source matrix files transparently run time Single-pass calculation row/column mean variance Wilcoxon marker feature calculation Transparent handling vector +, -, *, /, log1p streaming normalization, along less common operations. allows implementation ATAC-seq LSI Seurat default normalization, along published log-based normalizations. SCTransform pearson residual calculation Multiplication sparse matrices Read count knee cutoffs UMAP embeddings Dot plots Transcription factor footprinting / TSS profile plotting Fragments vs. 
TSS Enrichment ATAC-seq QC plot Pseudobulk genome track plots, gene annotation plots Matching gene symbols/IDs canonical symbols Download transcript annotations Gencode GTF files Download + parse UCSC chromosome sizes Parse peak files BED format; Download ENCODE blacklist region Wrappers knn graph calculation + clustering Note: operations interoperate storage formats. example, matrix operations can applied directly AnnData 10x matrix file. many cases bitpacking-compressed formats provide performance/space advantages, required use computations.","code":""}] +[{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Basic tutorial","text":"tutorial, : Load RNA ATAC-seq data 10x multiome experiment Filter high-quality cells RNA PCA + UMAP dimensionality reduction Unbiased clustering Visualize marker genes annotate clusters Call ATAC-seq peaks ATAC PCA + UMAP dimensionality reduction Visualize transcription factor footprints Plot accessibility genome tracks tutorial work--progress, inspired Seurat’s PBMC 3k clustering tutorial.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"install-packages","dir":"Articles","previous_headings":"Setup","what":"Install packages","title":"Basic tutorial","text":"Install cran dependencies: irlba (PCA) uwot (UMAP) RcppHNSW (clustering) igraph (clustering) BiocManager (access bioconductor packages) ggplot2 version <=3.3.5 >=3.4.1 (hexbin broken versions 3.3.6-3.4.0) Bioconductor dependencies: BSgenome.Hsapiens.UCSC.hg38 (TF motif scanning) Github: motifmatchr (TF motif scanning) chromVARmotifs (TF motif database)","code":"install.packages(c(\"irlba\", \"uwot\", \"RcppHNSW\", \"igraph\", \"BiocManager\", \"remotes\", \"ggplot2\")) BiocManager::install(\"BSgenome.Hsapiens.UCSC.hg38\") remotes::install_github(c(\"GreenleafLab/motifmatchr\", \"GreenleafLab/chromVARmotifs\"), 
repos=BiocManager::repositories()) remotes::install_github(c(\"bnprks/BPCells/r\"))"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"set-up-analysis-folder","dir":"Articles","previous_headings":"Setup","what":"Set up analysis folder","title":"Basic tutorial","text":"","code":"library(BPCells) suppressPackageStartupMessages({ library(dplyr) }) # Substitute your preferred working directory for data_dir data_dir <- file.path(tempdir(), \"pbmc-3k\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) setwd(data_dir)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"download-data","dir":"Articles","previous_headings":"Setup","what":"Download data","title":"Basic tutorial","text":"Next, download 3k PBMC dataset 10x Genomics temporary directory. files 500MB large combined","code":"url_base <- \"https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/\" rna_raw_url <- paste0(url_base, \"pbmc_granulocyte_sorted_3k_raw_feature_bc_matrix.h5\") atac_raw_url <- paste0(url_base, \"pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz\") # Increase download timeout from 60 seconds to 5 minutes options(timeout=300) # Only download files if we haven't downloaded already if (!file.exists(\"pbmc_3k_10x.h5\")) { download.file(rna_raw_url, \"pbmc_3k_10x.h5\", mode=\"wb\") } if (!file.exists(\"pbmc_3k_10x.fragments.tsv.gz\")) { download.file(atac_raw_url, \"pbmc_3k_10x.fragments.tsv.gz\", mode=\"wb\") }"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"data-loading","dir":"Articles","previous_headings":"","what":"Data Loading","title":"Basic tutorial","text":"First, convert raw data inputs 10x format bitpacked compressed format stored binary files disk. BPCells can still read data don’t convert format, certain ATAC-seq functionality run much faster converted data. Convert RNA matrix: Convert ATAC-seq fragments ATAC storage space dropped 468 MB gzipped 10x file 209 MB bitpacked storage. 
RNA storage space dropped 51.2 MB 10x hdf5 file gzip compression 33.5 MB using bitpacking compression. case, storage space little misleading since 39% bitpacked storage spent gene + cell names. case 10x compressed hdf5 bitpacking compression 4-6x smaller uncompressed sparse matrix format AnnData h5ad’s default.","code":"# Check if we already ran import if (!file.exists(\"pbmc_3k_rna_raw\")) { mat_raw <- open_matrix_10x_hdf5(\"pbmc_3k_10x.h5\", feature_type=\"Gene Expression\") %>% write_matrix_dir(\"pbmc_3k_rna_raw\") } else { mat_raw <- open_matrix_dir(\"pbmc_3k_rna_raw\") } mat_raw ## 36601 x 650165 IterableMatrix object with class MatrixDir ## ## Row names: ENSG00000243485, ENSG00000237613 ... ENSG00000277196 ## Col names: AAACAGCCAAACAACA-1, AAACAGCCAAACATAG-1 ... TTTGTTGGTTTGTTGC-1 ## ## Data type: uint32_t ## Storage order: column major ## ## Queued Operations: ## 1. Load compressed matrix from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_rna_raw # Check if we already ran import if (!file.exists(\"pbmc_3k_frags\")) { frags_raw <- open_fragments_10x(\"pbmc_3k_10x.fragments.tsv.gz\") %>% write_fragments_dir(\"pbmc_3k_frags\") } else { frags_raw <- open_fragments_dir(\"pbmc_3k_frags\") } frags_raw ## IterableFragments object of class \"FragmentsDir\" ## ## Cells: 462264 cells with names TTTAGCAAGGTAGCTT-1, GCCTTTGGTTGGTTCT-1 ... ATCACCCTCCATAATG-1 ## Chromosomes: 39 chromosomes with names chr1, chr10 ... KI270713.1 ## ## Queued Operations: ## 1. Read compressed fragments from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_frags"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"rna-seq-filtering","dir":"Articles","previous_headings":"Filter for high-quality cells","what":"RNA-seq filtering","title":"Basic tutorial","text":"use simple minimum read threshold RNA-seq quality. cutoff choose just first knee log-log plot reads vs. 
barcode rank, separates cells empty droplets.","code":"reads_per_cell <- Matrix::colSums(mat_raw) plot_read_count_knee(reads_per_cell, cutoff = 1e3)"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"download-reference-annotations","dir":"Articles","previous_headings":"Filter for high-quality cells > ATAC-seq filtering","what":"Download reference annotations","title":"Basic tutorial","text":"fetch reference information necessary calculate quality-control statistics. default, fetches latest annotations hg38. Since fetching references involves downloading gtf bed files, provide name directory save files . also allows us skip re-downloading files next time.","code":"genes <- read_gencode_transcripts( \"./references\", release=\"42\", transcript_choice=\"MANE_Select\", annotation_set = \"basic\", features=\"transcript\" # Make sure to set this so we don't get exons as well ) head(genes) ## # A tibble: 6 × 13 ## chr source feature start end score strand frame gene_id gene_type ## ## 1 chr1 HAVANA transcript 65418 71585 . + . ENSG000001… protein_… ## 2 chr1 HAVANA transcript 450739 451678 . - . ENSG000002… protein_… ## 3 chr1 HAVANA transcript 685715 686654 . - . ENSG000002… protein_… ## 4 chr1 HAVANA transcript 923922 944574 . + . ENSG000001… protein_… ## 5 chr1 HAVANA transcript 944202 959256 . - . ENSG000001… protein_… ## 6 chr1 HAVANA transcript 960583 965719 . + . 
ENSG000001… protein_… ## # ℹ 3 more variables: gene_name , transcript_id , MANE_Select blacklist <- read_encode_blacklist(\"./references\", genome=\"hg38\") head(blacklist) ## # A tibble: 6 × 4 ## chr start end reason ## ## 1 chr10 0 45700 Low Mappability ## 2 chr10 38481300 38596500 High Signal Region ## 3 chr10 38782600 38967900 High Signal Region ## 4 chr10 39901300 41712900 High Signal Region ## 5 chr10 41838900 42107300 High Signal Region ## 6 chr10 42279400 42322500 High Signal Region chrom_sizes <- read_ucsc_chrom_sizes(\"./references\", genome=\"hg38\") head(chrom_sizes) ## # A tibble: 6 × 3 ## chr start end ## ## 1 chr1 0 248956422 ## 2 chr2 0 242193529 ## 3 chr3 0 198295559 ## 4 chr4 0 190214555 ## 5 chr5 0 181538259 ## 6 chr6 0 170805979"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"calculate-atac-seq-quality-control-metrics","dir":"Articles","previous_headings":"Filter for high-quality cells > ATAC-seq filtering","what":"Calculate ATAC-seq quality-control metrics","title":"Basic tutorial","text":"can calculate several built-quality control metrics barcode, including number fragments TSS enrichment. calculations fully compatible ArchR’s methodology quality control statistics. One key ways identify high-quality cells ATAC-seq data plot number fragments vs. TSS Enrichment. plot puts empty droplets bottom-left quadrant, low-quality/dead cells bottom-right quadrant, high-quality cells top-right quadrant. flow-cytometry perspective, use bottom-left group empty droplets negative control help set cutoffs. Due thresholding ArchR’s formula applies denominator TSS Enrichment calculation, low-read cells can’t assigned high TSS Enrichment value. plot TSS enrichment without thresholding, following: Note 200/101 fraction accounts ReadsInTSS drawing 101-bp windows, ReadsFlankingTSS drawing 2x100-bp windows. results low-read droplets measuring high TSS Enrichment, use slightly adjusted cutoffs. can also plot sample-level quality control plots. 
left, fragment length distribution shows three broad bumps corresponding nucleosome spacing (147bp), smaller wiggles corresponding DNA winding (11.5bp). right, TSS enrichment profile shows strong enrichment signal transcription start sites, well small asymmetrical bump downstream TSS +1 nucleosome.","code":"atac_qc <- qc_scATAC(frags_raw, genes, blacklist) head(atac_qc) ## # A tibble: 6 × 10 ## cellName TSSEnrichment nFrags subNucleosomal monoNucleosomal multiNucleosomal ## ## 1 TTTAGCAA… 45.1 16363 8069 5588 2706 ## 2 GCCTTTGG… 0.198 3 1 2 0 ## 3 AGCCGGTT… 30.9 33313 15855 11868 5590 ## 4 TGATTAGT… 41.9 11908 6103 3817 1988 ## 5 ATTGACTC… 43.9 13075 6932 4141 2002 ## 6 CGTTAGGT… 31.5 14874 6833 5405 2636 ## # ℹ 4 more variables: ReadsInTSS , ReadsFlankingTSS , ## # ReadsInPromoter , ReadsInBlacklist plot_tss_scatter(atac_qc, min_frags=1000, min_tss=10) atac_qc %>% dplyr::mutate(TSSEnrichment=ReadsInTSS/pmax(1,ReadsFlankingTSS) * 200/101) %>% plot_tss_scatter(min_frags=2000, min_tss=20) + ggplot2::labs(title=\"Raw TSS Enrichment\") plot_fragment_length(frags_raw) + plot_tss_profile(frags_raw, genes)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"select-high-quality-cells","dir":"Articles","previous_headings":"Filter for high-quality cells","what":"Select high-quality cells","title":"Basic tutorial","text":"take cells pass minimum RNA reads, minimum ATAC reads, minimum TSS Enrichment cutoffs. subset RNA ATAC input data just cells passing filter. RNA, subset genes least 3 reads. 
subset operation also puts cells matching order simplifies cross-modality calculations later .","code":"pass_atac <- atac_qc %>% dplyr::filter(nFrags > 1000, TSSEnrichment > 10) %>% dplyr::pull(cellName) pass_rna <- colnames(mat_raw)[Matrix::colSums(mat_raw) > 1e3] keeper_cells <- intersect(pass_atac, pass_rna) frags <- frags_raw %>% select_cells(keeper_cells) keeper_genes <- Matrix::rowSums(mat_raw) > 3 mat <- mat_raw[keeper_genes,keeper_cells]"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"matrix-normalization","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"Matrix normalization","title":"Basic tutorial","text":", walk Seurat-style matrix normalization calculations manually, though soon helper functions simplify process. First log-normalize, roughly equivalent Seurat::NormalizeData Next pick variable genes: look normalized matrix object, can see quite math operations queued performed --fly needed. improve performance downstream PCA, save sparse normalized matrix temporary file just prior normalizations make matrix dense. saves storage space preventing us re-calculate queued operations several-hundred times PCA optimization iterations. case, matrix quite small ’ll just store memory. larger example swap write_matrix_dir(tempfile(\"mat\")) Finally, perform z-score normalization makes matrix dense.","code":"# Normalize by reads-per-cell mat <- multiply_cols(mat, 1/Matrix::colSums(mat)) # Log normalization mat <- log1p(mat * 10000) # Log normalization stats <- matrix_stats(mat, row_stats=\"variance\") # To keep the example small, we'll do a very naive variable gene selection variable_genes <- order(stats$row_stats[\"variance\",], decreasing=TRUE) %>% head(1000) %>% sort() mat_norm <- mat[variable_genes,] mat_norm ## 1000 x 2600 IterableMatrix object with class TransformLog1p ## ## Row names: ENSG00000078369, ENSG00000116251 ... ENSG00000212907 ## Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... 
TACTAAGTCCAATAGC-1 ## ## Data type: double ## Storage order: column major ## ## Queued Operations: ## 1. Load compressed matrix from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_rna_raw ## 2. Select rows: 87, 171 ... 36568 and cols: 640783, 89020 ... 504383 ## 3. Convert type from uint32_t to double ## 4. Scale by 1e+04 ## 5. Scale columns by 0.000221, 0.000118 ... 0.000177 ## 6. Transform log1p mat_norm <- mat_norm %>% write_matrix_memory(compress=FALSE) gene_means <- stats$row_stats[\"mean\",variable_genes] gene_vars <- stats$row_stats[\"variance\", variable_genes] mat_norm <- (mat_norm - gene_means) / gene_vars"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"pca-and-umap","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"PCA and UMAP","title":"Basic tutorial","text":"PCA can performed standard solver like irlba, though BPCells also provides C++-level solver based Spectra package built-parallelization support. Next calculate UMAP coordinates","code":"svd <- BPCells::svds(mat_norm, k=50) # Alternate option: irlba::irlba(mat_norm, nv=50) pca <- multiply_cols(svd$v, svd$d) cat(sprintf(\"PCA dimensions: %s\\n\", toString(dim(pca)))) pca[1:4,1:3] ## PCA dimensions: 2600, 50 ## [,1] [,2] [,3] ## [1,] 15.167732 0.8951489 -2.3650024 ## [2,] 6.599775 7.2484737 4.4369185 ## [3,] 14.621697 -1.1929478 -0.6439662 ## [4,] 8.142875 1.0977223 -2.5066235 set.seed(12341512) umap <- uwot::umap(pca) umap[1:4,] ## [,1] [,2] ## [1,] 10.280173 3.032208 ## [2,] -1.513292 -9.496552 ## [3,] 10.052034 3.215374 ## [4,] 8.457198 0.609063"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"clustering","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"Clustering","title":"Basic tutorial","text":"perform quick clustering follows, based PCA coordinates. 
now can visualize clusters UMAP:","code":"clusts <- knn_hnsw(pca, ef=500) %>% # Find approximate nearest neighbors knn_to_snn_graph() %>% # Convert to a SNN graph cluster_graph_louvain() # Perform graph-based clustering cat(sprintf(\"Clusts length: %s\\n\", length(clusts))) clusts[1:10] ## Clusts length: 2600 ## [1] 1 2 1 2 2 3 2 2 4 5 ## Levels: 1 2 3 4 5 6 7 8 9 10 11 12 plot_embedding(clusts, umap)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"visualize-marker-genes","dir":"Articles","previous_headings":"RNA Normalization, PCA and UMAP","what":"Visualize marker genes","title":"Basic tutorial","text":"annotate clusters cell types, can plot several marker genes overlaid onto UMAP. observe cluster-specific enrichment B-cell marker MS4A1, T-cell marker CD3E, Monocyte marker LYZ. allows us make broad cell type groupings follows: can visualize marker genes cluster using dot plot. typical situations, known marker genes clear, others less specific.","code":"plot_embedding( source = mat, umap, features = c(\"MS4A1\", \"GNLY\", \"CD3E\", \"CD14\", \"FCER1A\", \"FCGR3A\", \"LYZ\", \"CD4\",\"CD8\"), ) cluster_annotations <- c( \"1\" = \"T\", \"2\" = \"CD8 T\", \"3\" = \"B\", \"4\" = \"T\", \"5\" = \"NK\", \"6\" = \"Mono\", \"7\" = \"Mono\", \"8\" = \"Mono\", \"9\" = \"T\", \"10\" = \"DC\", \"11\" = \"Mono\", \"12\" = \"DC\" ) cell_types <- cluster_annotations[clusts] plot_embedding(cell_types, umap) plot_dot( mat, c(\"MS4A1\", \"GNLY\", \"CD3E\", \"CD14\", \"FCER1A\", \"FCGR3A\", \"LYZ\", \"CD4\", \"CD8\"), cell_types )"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"atac-normalization-pca-and-umap","dir":"Articles","previous_headings":"","what":"ATAC Normalization, PCA and UMAP","title":"Basic tutorial","text":"start tile-based peak calling, tests pre-determined overlapping tile positions significant enrichment ATAC-seq signal genome-wide background cell type independently. 
faster using traditional peak-caller like MACS, though default parameters peaks always 200bp wide positioning resolution approximately +/- 30bp. Next compute peak matrix counting many ATAC-seq insertions overlap peak. save memory rather saving disk since dataset quite small. Next calculate TF-IDF normalization. formula TF-IDF variant Stuart et al. Looking LSI matrix, can see power BPCells performing matrix operations --fly: LSI normalization fact calculated time fragment overlap calculations read matrix. don’t need store intermediate matrices calculations, even peak matrix can re-calculated --fly based fragments object saved disk. Just like RNA, save matrix running PCA. larger dataset, save disk rather memory. Finally, z-score normalization LSI matrix run PCA. standard practice running PCA, commonly done ATAC-seq datasets due fact greatly increases memory usage. methods, 1st PC highly correlated number reads per cell, thrown empirical correction. Luckily, BPCells can avoid memory usage can just normalize data run PCA usual Next calculate UMAP cluster, just like RNA can plot ATAC-seq embedding ATAC-derived clusters, easily compare RNA-derived clusters earlier. BPCells works based order cells matrix fragment object. Since ATAC PCA rows cell order RNA clusters, datasets combine additional work. skip normalization, first observe get high correlation first PC reads-per-cell peak matrix terms actual PCA results, can see cell embeddings mostly 1--1 correspondence across first 6 PCs, though later PCs start diverge. first PC raw TF-IDF corresponds mostly read depth, signal spread across 2 PCs z-score normalized variant. look loading peak PCA, see similar result. 
Finally, UMAP generated exclude first PC fairly similar, though notable difference positioning dendritic cells","code":"frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE) peaks <- call_peaks_tile(frags_filter_blacklist, chrom_sizes, cell_groups=cell_types, effective_genome_size = 2.8e9) head(peaks) ## # A tibble: 6 × 7 ## chr start end group p_val q_val enrichment ## ## 1 chr1 16644600 16644800 T 0 0 1017. ## 2 chr19 18281733 18281933 Mono 0 0 518. ## 3 chr17 81860866 81861066 DC 7.87e- 63 1.21e- 55 552. ## 4 chr1 1724333 1724533 Mono 0 0 512. ## 5 chr1 228140000 228140200 NK 2.77e-162 2.14e-155 842. ## 6 chr8 30083133 30083333 CD8 T 4.80e-220 7.41e-213 744. top_peaks <- head(peaks, 50000) top_peaks <- top_peaks[order_ranges(top_peaks, chrNames(frags)),] peak_mat <- peak_matrix(frags, top_peaks, mode=\"insertions\") mat_lsi <- peak_mat %>% multiply_cols(1 / Matrix::colSums(peak_mat)) %>% multiply_rows(1 / Matrix::rowMeans(peak_mat)) mat_lsi <- log1p(10000 * mat_lsi) mat_lsi ## 50000 x 2600 IterableMatrix object with class TransformLog1p ## ## Row names: chr1:817200-817400, chr1:827466-827666 ... chrX:155881200-155881400 ## Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 ## ## Data type: double ## Storage order: row major ## ## Queued Operations: ## 1. Read compressed fragments from directory /mnt/c/Users/Immanuel/PycharmProjects/BPCells/r/vignettes/pbmc-3k-data/pbmc_3k_frags ## 2. Select 2600 cells by name: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 ## 3. Calculate 2600 peaks over 50000 ranges: chr1:817201-817400 ... chrX:155881201-155881400 ## 4. Convert type from uint32_t to double ## 5. Scale by 1e+04 ## 6. Scale columns by 6.71e-05, 5.17e-05 ... 0.000801 ## 7. Scale rows by 11.1, 2.78 ... 3.77 ## 8. 
Transform log1p mat_lsi <- write_matrix_memory(mat_lsi, compress=FALSE) # Compute colMean and colVariance in one pass cell_peak_stats <- matrix_stats(mat_lsi, col_stats=\"variance\")$col_stats cell_means <- cell_peak_stats[\"mean\",] cell_vars <- cell_peak_stats[\"variance\",] mat_lsi_norm <- mat_lsi %>% add_cols(-cell_means) %>% multiply_cols(1 / cell_vars) svd_atac <- BPCells::svds(mat_lsi_norm, k=10) pca_atac <- multiply_cols(svd_atac$v, svd_atac$d) pca_atac[1:4,1:4] ## [,1] [,2] [,3] [,4] ## [1,] -103.64071 1.553515 2.436147 21.22977 ## [2,] -44.75342 -28.737622 -12.681591 -10.01745 ## [3,] -90.74857 3.266168 3.627660 12.95109 ## [4,] -90.74640 -6.447103 6.853840 -15.76629 set.seed(12341512) umap_atac <- uwot::umap(pca_atac) umap_atac[1:4,] ## [,1] [,2] ## [1,] 7.341801 4.20816287 ## [2,] 5.417667 -0.05818889 ## [3,] 4.133997 9.12835273 ## [4,] 8.743609 0.89240464 clusts_atac <- knn_hnsw(pca_atac, ef=500) %>% # Find approximate nearest neighbors knn_to_snn_graph() %>% # Convert to a SNN graph cluster_graph_louvain() # Perform graph-based clustering plot_embedding(clusts_atac, umap_atac, colors_discrete = discrete_palette(\"ironMan\")) + ggplot2::guides(color=\"none\") + plot_embedding(cell_types, umap_atac) svd_atac_no_norm <- BPCells::svds(mat_lsi, k=10) pca_atac_no_norm <- multiply_cols(svd_atac_no_norm$v, svd_atac_no_norm$d) cor_to_depth <- dplyr::bind_rows( tibble::tibble( method=\"z-score normalize\", abs_cor_to_depth = as.numeric(abs(cor(Matrix::colSums(mat_lsi), pca_atac))), PC=seq_along(abs_cor_to_depth) ), tibble::tibble( method=\"raw TF-IDF\", abs_cor_to_depth = as.numeric(abs(cor(Matrix::colSums(mat_lsi), pca_atac_no_norm))), PC=seq_along(abs_cor_to_depth) ) ) ggplot2::ggplot(cor_to_depth, ggplot2::aes(PC, abs_cor_to_depth, color=method)) + ggplot2::geom_point() + ggplot2::theme_bw() + ggplot2::labs(title=\"Correlation of PCs to read depth\") cor_between_embeddings <- tidyr::expand_grid( pca_atac_no_norm = seq_len(ncol(pca_atac_no_norm)), 
pca_atac=seq_len(ncol(pca_atac)) ) %>% mutate( cor = as.numeric(abs(cor(.env$pca_atac, .env$pca_atac_no_norm))) ) ggplot2::ggplot(cor_between_embeddings, ggplot2::aes(pca_atac, pca_atac_no_norm, fill=abs(cor))) + ggplot2::geom_tile() + ggplot2::geom_text(mapping=ggplot2::aes(label=sprintf(\"%.2f\", cor))) + ggplot2::scale_x_continuous(breaks=1:10) + ggplot2::scale_y_continuous(breaks=1:10) + ggplot2::theme_classic() + ggplot2::labs(title=\"Correlation between cell embeddings\", x=\"z-score normalize PCs\", y =\"raw TF-IDF PCs\") cor_between_loadings <- tidyr::expand_grid( pca_atac_no_norm = seq_len(ncol(svd_atac_no_norm$u)), pca_atac=seq_len(ncol(svd_atac$u)) ) %>% mutate( cor = as.numeric(abs(cor(.env$svd_atac$u, .env$svd_atac_no_norm$u))) ) ggplot2::ggplot(cor_between_loadings, ggplot2::aes(pca_atac, pca_atac_no_norm, fill=abs(cor))) + ggplot2::geom_tile() + ggplot2::geom_text(mapping=ggplot2::aes(label=sprintf(\"%.2f\", cor))) + ggplot2::scale_x_continuous(breaks=1:10) + ggplot2::scale_y_continuous(breaks=1:10) + ggplot2::theme_classic() + ggplot2::labs(title=\"Correlation between peak loadings\", x=\"z-score normalize PCs\", y =\"raw TF-IDF PCs\") set.seed(12341512) umap_atac_no_norm <- uwot::umap(pca_atac_no_norm[,-1]) plot_embedding(clusts_atac, umap_atac_no_norm, colors_discrete = discrete_palette(\"ironMan\")) + ggplot2::guides(color=\"none\") + plot_embedding(cell_types, umap_atac_no_norm)"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"motif-footprinting","dir":"Articles","previous_headings":"","what":"Motif footprinting","title":"Basic tutorial","text":"motif footprinting, first need find instances motifs--interest peaks Next, can use motif positions plot aggregate accessibility surrounding TF binding sites across cell types proxy TF activity. ’re able see enrichment accessibility neighboring sites myeloid transcription factor DC Monocyte cells. 
Transcription factor binding (generally) mutually-exclusive nucleosome occupancy, transcription factor bound creates accessibility flanking regions. squiggly bit center due Tn5 insertion bias motif . can also use patchwork library show multiple plots grid, highlighting cell-type-specific factors well general factors like CTCF.","code":"suppressPackageStartupMessages({ library(GenomicRanges) library(Biostrings) }) peaks_sorted <- dplyr::arrange(peaks, chr, start) peaks_gr <- dplyr::mutate(peaks_sorted, start = start + 1) %>% as(\"GenomicRanges\") selected_motifs <- c( \"CEBPA\" = \"ENSG00000245848_LINE568_CEBPA_D_N4\", \"EOMES\" = \"ENSG00000163508_LINE3544_EOMES_D_N1\", \"SPI1\" = \"ENSG00000066336_LINE1813_SPI1_D_N5\", \"CTCF\" = \"ENSG00000102974_LINE747_CTCF_D_N67\" ) suppressWarnings({ motif_positions <- motifmatchr::matchMotifs( chromVARmotifs::human_pwms_v2[selected_motifs], peaks_gr, genome=\"hg38\", out=\"positions\") }) names(motif_positions) <- names(selected_motifs) motif_positions ## GRangesList object of length 4: ## $CEBPA ## GRanges object with 13983 ranges and 1 metadata column: ## seqnames ranges strand | score ## | ## [1] chr1 1060191-1060200 + | 7.24878 ## [2] chr1 1398356-1398365 - | 7.31950 ## [3] chr1 1408228-1408237 - | 7.91954 ## [4] chr1 1470604-1470613 + | 7.26055 ## [5] chr1 1614370-1614379 - | 7.33072 ## ... ... ... ... . ... ## [13979] chrX 154247973-154247982 + | 7.91954 ## [13980] chrX 154377819-154377828 - | 7.91954 ## [13981] chrX 154497506-154497515 + | 8.62478 ## [13982] chrX 154734157-154734166 + | 7.33771 ## [13983] chrX 155242494-155242503 - | 7.24396 ## ------- ## seqinfo: 39 sequences from an unspecified genome; no seqlengths ## ## ... 
## <3 more elements> plot_tf_footprint( frags, motif_positions$CEBPA, cell_groups = cell_types, flank = 250, smooth = 2 ) + ggplot2::labs(title=\"CEBPA\") footprinting_plots <- list() for (motif in names(selected_motifs)) { footprinting_plots[[motif]] <- plot_tf_footprint( frags, motif_positions[[motif]], cell_groups = cell_types, flank=250, smooth=2) + ggplot2::labs(title=motif, color=\"Cluster\") } patchwork::wrap_plots(footprinting_plots, guides=\"collect\")"},{"path":"https://bnprks.github.io/BPCells/articles/pbmc3k.html","id":"genome-accessibility-tracks","dir":"Articles","previous_headings":"","what":"Genome accessibility tracks","title":"Basic tutorial","text":"plot genome accessibility tracks, need select genome region view. BPCells provides helper function find genome regions centered around gene. normalizing tracks, need provide total number reads cell type. can substituted total reads peaks metrics desired. can create first component track plot plotting genome tracks . can see small peak center mainly present B cells (top row), unclear sits relative B-cell marker CD19. much useful gene annotation track added . ’ll get set canonical transcripts (one per gene) Gencode can make annotation track optionally scale bar. Finally, can stack elements trackplot_combine(). Now see small peak just upstream CD19 gene.","code":"region <- gene_region(genes, \"CD19\", extend_bp = 1e5) region ## $chr ## [1] \"chr16\" ## ## $start ## [1] 28831970 ## ## $end ## [1] 29039342 read_counts <- atac_qc$nFrags[ match(cellNames(frags), atac_qc$cellName) ] coverage_plot <- trackplot_coverage( frags, region = region, groups=cell_types, read_counts, bins=500 ) coverage_plot transcripts <- read_gencode_transcripts(\"./references\", release=\"42\") head(transcripts) ## # A tibble: 6 × 13 ## chr source feature start end score strand frame gene_id gene_type ## ## 1 chr1 HAVANA transcript 65418 71585 . + . ENSG000001… protein_… ## 2 chr1 HAVANA exon 65418 65433 . + . 
ENSG000001… protein_… ## 3 chr1 HAVANA exon 65519 65573 . + . ENSG000001… protein_… ## 4 chr1 HAVANA exon 69036 71585 . + . ENSG000001… protein_… ## 5 chr1 HAVANA transcript 450739 451678 . - . ENSG000002… protein_… ## 6 chr1 HAVANA exon 450739 451678 . - . ENSG000002… protein_… ## # ℹ 3 more variables: gene_name , transcript_id , MANE_Select gene_plot <- trackplot_gene(transcripts, region) gene_plot scalebar_plot <- trackplot_scalebar(region) scalebar_plot # We list plots in order from top to bottom to combine. # Notice that our inputs are also just ggplot objects, so we can make modifications # like removing the color legend from our gene track. trackplot_combine( list( scalebar_plot, coverage_plot, gene_plot + ggplot2::guides(color=\"none\") ) )"},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"rna-seq-normalization-pca","dir":"Articles > Web-only","previous_headings":"","what":"RNA-seq normalization + PCA","title":"Performance Benchmarks","text":"BPCells can perform operations streaming disk, able use dramatically less memory traditional -memory workflows. also extensively optimized underlying C++ code make BPCells much faster traditional disk-backed tools like DelayedArray. benchmark , show time memory usage perform standardized workflow data normalization, variable gene selection, PCA. Note show Seurat’s -memory workflow, though Seurat v5 also offers BPCells integration disk-backed operations. tools given 3 hour time limit 256GB RAM, BPCells able process largest datasets within resource limits. Notice execution speed tends scale number non-zero entries matrix, whereas memory usage BPCells scales number cells (.e. 
space required store output PCA embeddings).","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"multi-threading","dir":"Articles > Web-only","previous_headings":"RNA-seq normalization + PCA","what":"Multi-threading","title":"Performance Benchmarks","text":"benchmark run single-threaded since tools support multi-threading. However, BPCells can offer 5-10x speedups multiple threads:","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"counts-matrices-rna-or-atac","dir":"Articles > Web-only","previous_headings":"Bitpacking compression","what":"Counts matrices (RNA or ATAC)","title":"Performance Benchmarks","text":"BPCells uses bitpacking compression help speed disk-backed workflows. general purpose compression algorithms shrink file sizes reduce disk read bandwidth, typically come high compute cost. BPCells able provide similar space savings general-purpose compression algorithms like gzip (10x HDF5) LZ4-Blosc (zarr), much lower compute cost. AnnData h5ad files default using compression due speed costs read/write, BPCells bitpacking compression can get faster read/write 4-7x space savings. , benchmark storing + loading 1.3M cell RNA-seq experiment 10x Genomics, using default compression settings storage format1. Notice BPCells compressed format fast enough provide faster read write speeds uncompressed h5ad2. Note don’t benchmark 10x HDF5 write, since 10x directly provide software perform arbitrary matrix writes. likely even slower 10x read speed.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"fragment-alignments-atac","dir":"Articles > Web-only","previous_headings":"Bitpacking compression","what":"Fragment alignments (ATAC)","title":"Performance Benchmarks","text":"addition RNA/ATAC counts matrices, BPCells also introduces compressed file formats scATAC-seq fragments. 
compared 10x fragment files, ArchR, SnapATAC2 storage 1M cell dataset, found BPCells gave smallest file sizes fastest read/write speeds. using bitpacking compression, BPCells can afford storage space keep fragments fully genome-sorted order. makes import 10x fragment files dramatically faster compared ArchR SnapATAC2 must re-sort fragments grouped cell. ’ll see later also speeds genomic overlap calculations.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"atac-seq-overlap-calculations","dir":"Articles > Web-only","previous_headings":"","what":"ATAC-seq overlap calculations","title":"Performance Benchmarks","text":"working large reference datasets, ’s often helpful able quickly re-quantify cell x peak overlap matrices directly fragments, datasets usually use different sets peak coordinates depending biology interest. BPCells stores fragments genome-sorted order, ’s able perform peak calculations much faster ArchR SnapATAC2. BPCells takes seconds find overlaps small set 10 peaks, also means fast calculating genomic coverage tracks visualization large datasets.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"m-cell-analysis-of-cellxgene-census","dir":"Articles > Web-only","previous_headings":"","what":"44M cell analysis of CELLxGENE census","title":"Performance Benchmarks","text":"combination fast storage disk-backed compute, BPCells able handle unique human cells CELLxGENE census laptop 16 threads 32GB RAM. Compared TileDB matrix storage format, found BPCells file formats offer much faster read/write times similar space usage3. Note dataset take 750GB store counts matrix without compression. much faster file read speeds, BPCells able light compute tasks like computing per-gene mean variance across 44M cells <3 minutes. PCA expensive operation BPCells performs, using 167 passes input matrix calculate 32 PCs. Still, completes less 1 hour server 6.2 hours laptop. 
makes atlas-scale analysis possible laptop leaving headroom server datasets order magnitude larger. Benchmark details: Mean variance computed log-normalized matrix, unlike plots BPCells manuscript just scaled read counts match default normalization CELLxGENE census. Laptop 32GB RAM 16 threads; server 256GB RAM 32 threads. See BPCells manuscript details.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/benchmarks.html","id":"update-log","dir":"Articles > Web-only","previous_headings":"","what":"Update log","title":"Performance Benchmarks","text":"Normalization & PCA: Add comparison datasets, add DelayedArray results, add multithreading plot RNA storage: Add comparisons zarr include write times. ATAC storage: Add 10x SnapATAC2 results include read times, switch comparison larger dataset. Peak matrix: Update benchmarks show cross-dataset results 100k peaks, peak subset results 1M cells rather 30k cells. Add comparisons SnapATAC2. Add results 44M cell analysis. March 30, 2023: Added clarification AnnData benchmarks referred h5ad default compression settings (.e. none). March 29, 2023: Created benchmark page.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"matrix-logical-storage-layout","dir":"Articles > Web-only","previous_headings":"","what":"Matrix Logical Storage Layout","title":"Matrix and Fragment Storage Formats","text":"data storage, use storage abstraction named data arrays, stored e.g. single group hdf5 directory files. matrix format compressed sparse column/row (CSC/CSR) format following data arrays: interpretation array follows: val - Values non-zero entries increasing order (column, row) position. index - index[] provides 0-based row index value found val[] (column index row-major storage order) idxptr - indexes idx val entries column j can found idxptr[j] idxptr[j+1] - 1 , inclusive. 
(row j row-major storage order) shape - number rows matrix, followed number columns row_names - Names row matrix (optional) col_names - Names column matrix (optional) storage_order- col compressed-sparse-column, row compressed-sparse-row Bitpacked compressed matrices consist following modifications: val: unsigned 32-bit integers, replace val val_data, val_idx, val_idx_offsets corresponding BP-128m1 encoding described . total number values already stored last value idxptr. 32-bit 64-bit floats val remains unchanged. index: replace index array BP-128d1z encoded data arrays index_data, index_idx, index_idx_offsets, index_starts matrix stored single directory, HDF5 group, R S4 object. storage format matrix encoded version string. current version string format [compression]-[datatype]-matrix-v2, [compression] can either packed unpacked, [datatype] can one uint, float, double corresponding 32-bit unsigned integer, 32-bit float, 64-bit double respectively. v1 formats, difference idxptr type uint32.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"genomic-fragments-logical-storage-layout","dir":"Articles > Web-only","previous_headings":"","what":"Genomic fragments logical storage layout","title":"Matrix and Fragment Storage Formats","text":"BPCells fragment files store (chromosome, start, end, cell ID) fragment, sorted (chromosome, start). coordinate system follows bed format convention, first base chromosome numbered 0 end coordinate fragment non-inclusive. means 10 base pair long fragment starting first base genome start=0 end=10. End coordinates always guaranteed least large start coordinates. Uncompressed fragment data stored following arrays: arrays following contents: cell: List numeric cell IDs, one per fragment. smallest cell ID 0. start: List fragment start coordinates. first base chromosome 0. end: List fragment end coordinates. base end coordinate one past last base fragment. 
end_max: end_max[] maximum end coordinate fragments start chromosome fragment index *128-127. multiple chromosomes fragments given chunk 128 fragments, end_max maximum end coordinates. end_max array allows quickly seeking fragments overlapping given genomic region. chr_ptr: chr_ptr[2*] index first fragment chromosome cell, start, end arrays. chr_ptr[2*+ 1]-1 index last fragment chromosome . Fragments need necessarily sorted order increasing chromosome ID, though fragments given chromosome must still stored contiguously. allows logically re-ordering chromosomes write-time even input data source support reading chromosomes --order (.e. 10x fragment files without genome index). cell_names: string identifiers numeric cell ID. chr_names: string identifiers numeric chromosome ID. Compressed fragments stored following modifications: cell replaced cell_data, cell_idx, cell_idx_offsets, compressed according BP-128 encoding. start replaced start_data, start_idx, start_idx_offsets, start_starts, compressed according BP-128d1 encoding. end replaced end_data, end_idx, end_idx_offsets, stores start - end fragment, encoded using BP-128 encoding. current version string equal unpacked-fragments-v2 uncompressed fragments, packed-fragments-v2 compressed fragments. 
v1 formats, difference chr_ptr type uint32.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bitpacking-formats","dir":"Articles > Web-only","previous_headings":"","what":"Bitpacking formats","title":"Matrix and Fragment Storage Formats","text":"bitpacked formats based formats described paper Lemire Boytsov.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128","title":"Matrix and Fragment Storage Formats","text":"vanilla BP-128 format stored 3 arrays follows: data - stream bitpacked data, represented 32-bit integers interleaved bit layout shown Lemire Boytsov figure 6. chunk 128 32-bit input integers BB bits per integer stored using 4B4B 32-bit integers holding bitpacked data. idx - list 32-bit integers, encoded data integers index 128*128*+ 127 can found data index idx[] index idx[+1]-1. lists 2322^{32} (4 billion) entries greater, idx stores index modulo 2322^{32} idx_offsets - list 64-bit integers, values idx indices idx_offsets[] idx_offsets[+1]-1 *(2^32) added .","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128m1","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128m1","title":"Matrix and Fragment Storage Formats","text":"BP-128, 1 subtracted value prior compression","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128d1","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128d1","title":"Matrix and Fragment Storage Formats","text":"Equivalent BP-128* algorithm Lemire Boytsov integers difference encoded prior bitpacking. best lists sorted integers. 
data - Encoding vanilla BP-128, difference encoding prior bitpacking: x0′=0x_{0}^{\\prime}=0, x1′=x1−x0x_{1}^{\\prime}=x_{1}-x_{0}, x2′=x2−x1x_{2}^{\\prime}=x_{2}-x_{1}, …, x127′=x127−x126x_{127}^{\\prime}=x_{127}-x_{126} idx, idx_offsets - identical BP-128 starts - list 32-bit integers, starts[] decoded value integer index 128*","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"bp-128d1z","dir":"Articles > Web-only","previous_headings":"Bitpacking formats","what":"BP-128d1z","title":"Matrix and Fragment Storage Formats","text":"Similar BP128d1 zigzag encoding applied difference encoding. best lists close fully sorted runs integers. data - Encoding BP-128d1, difference encoding bitpacking, results zigzag encoded, zigzag(x)=2xzigzag(x)=2x x≥0x\\geq0, zigzag(x)=−2x−1zigzag(x)=-2x-1 x<0x<0. idx, idx_offsets - identical BP-128 starts - identical BP128-d1 Illustrative reference code BP-128 d1 zigzag transformations can found .","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"physical-storage-layout","dir":"Articles > Web-only","previous_headings":"","what":"Physical storage layout","title":"Matrix and Fragment Storage Formats","text":"abstraction named data arrays can realized different formats. three currently supported BPCells :","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"directory-of-files-format","dir":"Articles > Web-only","previous_headings":"Physical storage layout","what":"Directory of files format:","title":"Matrix and Fragment Storage Formats","text":"default storage backend due simplicity high performance. Arrays stored binary files within directory. Numeric array files 8-byte header followed data values little-endian binary format integers IEEE-754 32-bit 64-bit floating point numbers. 
Header values 8-byte ASCII text follows: unsigned 32-bit integer UINT32v1, unsigned 64-bit integer UINT64v1, 32-bit float FLOATSv1, 64-bit float DOUBLEv1. Arrays strings stored ASCII text one array value per line header. version string stored file named “version” containing version string followed newline.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"hdf5-file-format","dir":"Articles > Web-only","previous_headings":"Physical storage layout","what":"Hdf5 file format:","title":"Matrix and Fragment Storage Formats","text":"storage backend can useful embedding BPCells formats group within h5ad HDF5 file. Arrays numbers stored HDF5 datasets using built-HDF5 encoding format. Arrays strings stored HDF5 variable length string datasets. version string stored version attribute HDF5 group.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/bitpacking-format.html","id":"r-object-format","dir":"Articles > Web-only","previous_headings":"Physical storage layout","what":"R object format:","title":"Matrix and Fragment Storage Formats","text":"storage backend primarily useful testing, bitpacking compression -memory data desired avoid disk bandwidth bottlenecks. Strings stored native R character arrays. Unsigned integers 32-bit floats stored native R integer arrays bitcasting R signed integers required data types. 64-bit floats stored native R numeric arrays. 64-bit integers stored doubles R numeric arrays. reduces highest representable value 264−12^{64}-1 253−12^{53}-1 (9 quadrillion), expect pose practical problems. Named collections arrays stored R lists (writing) S4 objects (reading). 
version string stored string vector named “version” length 1.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html","id":"operating-principles","dir":"Articles > Web-only","previous_headings":"","what":"Operating Principles","title":"How BPCells works","text":"Two key principles understand using BPCells operations streaming lazy. Streaming means minimal amount data stored memory computation happening. almost memory used storing intermediate results. Hence, can compute operations large matrices without ever loading fully memory. Lazy means real work performed matrix fragment objects result needs returned R object written disk. helps support streaming computation, since otherwise forced compute intermediate results use additional memory.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html","id":"basic-usage","dir":"Articles > Web-only","previous_headings":"Operating Principles","what":"Basic usage","title":"How BPCells works","text":"begin basic example loading ATAC fragments 10x fragments file, reading peak set bed file, calculating cell x peak matrix. bitpacked compressed fragment file half size 10x file, much faster read.","code":"library(\"BPCells\") # File reading is lazy, so this is instantaneous fragments <- open_fragments_10x(\"atac_fragments.tsv.gz\") # This is when we actually read the file, should take 1-2 minutes to scan # since we bottleneck on gzip decompression. packed_fragments <- write_fragments_dir(fragments, \"pbmc-3k-fragments\") # Later, we can re-open these fragments packed_fragments <- open_fragments_dir(\"pbmc-3k-fragments\") peaks <- read_bed(\"peaks.bed\") # This is fast because the peak matrix calculation is lazy. # It will be computed on-the-fly when we actually need results from it. peak_matrix <- peak_matrix(packed_fragments, peaks) # Here is where the peak matrix calculation happens. 
Runs over 10-times # faster than ArchR, which utilizes IRanges to perform overlap calculations. R_matrix <- as(peak_matrix, \"dgCMatrix\")"},{"path":"https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html","id":"streaming-operations","dir":"Articles > Web-only","previous_headings":"Operating Principles","what":"Streaming operations","title":"How BPCells works","text":"lazy, stream-oriented design means can calculate complicated transformations single pass. faster memory-efficient calculating several intermediate results sequential manner. example, perform following pipeline: 1. Exclude fragments non-standard chromosomes 2. Subset cells 3. Add Tn5 offset 4. Calculate peak matrix 5. Calculate mean-accessibility per peak done using e.g. GRanges sparse matrices, need 3 passes fragments saving intermediate results, 2 passes peak matrix. BPCells’ streaming operations, can done directly fragments single pass, memory usage limited bytes per cell iterating peak matrix returning colMeans. Note knew cell names ahead time, even perform operation directly original 10x fragments without ever saving fragments memory. 
fairly slow 10x fragment files slow decompress, ’s recommended convert BPCells format.","code":"# Here I make use of the new pipe operator |> for better readability # We'll subset to just the standard chromosomes standard_chr <- which( stringr::str_detect(chrNames(packed_fragments), \"^chr[0-9XY]+$\") ) # Pick a random subset of 100 cells to consider set.seed(1337) keeper_cells <- sample(cellNames(packed_fragments), 100) # Run the pipeline, and save the average accessibility per peak peak_accessibility <- packed_fragments |> select_chromosomes(standard_chr) |> select_cells(keeper_cells) |> shift_fragments(shift_start=4, shift_end=-5) |> peak_matrix(peaks) |> colMeans()"},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-efficiency.html","id":"normalizations-and-pca","dir":"Articles > Web-only","previous_headings":"","what":"Normalizations and PCA","title":"Efficiency tips","text":"Avoid dense matrices whenever possible. Put normalizations preserve sparsity (0 values stay 0) normalizations break sparsity (e.g. adding values row/column). typical RNA-seq matrix <5% non-zero entries, code operate 20x entries dense matrix. operations, recommend using lazy evaluation avoid creating intermediate matrices. one common exception rule running PCA. PCA requires looping matrix several hundred times, often faster write matrix disk just PCA rather recalculating entries PCA iteration. storage efficiency, keep sparsity-breaking normalizations delayed, store sparse normalizations temporary location write_matrix_dir() apply sparsity-breaking normalizations Adding values rows/columns matrix little overhead PCA translates pre post processing step mat-vec multiply iteration. 
sparsity-breaking operation, adding vector matrix causes operations become expensive, however.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-efficiency.html","id":"storage-order","dir":"Articles > Web-only","previous_headings":"","what":"Storage order","title":"Efficiency tips","text":"Marker features can computed matrix indexed gene/feature. Sparse matrix multiplication can performed matrices storage order Sparse matrix multiplication performance can change dramatically depending storage order relative matrix size/sparsity. column-major matrices, left matrix fast load contain delayed operations, right matrix can slow load contain many delayed operations. row-major matrices left/right preferences reversed. can check storage order matrix printing R terminal calling t() function, BPCells just flips boolean flag whether matrix row-major column-major. affect underlying storage order. adjust underlying storage order, call transpose_storage_order(). slower operation, requires writing new copy data disk.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-efficiency.html","id":"other-tips","dir":"Articles > Web-only","previous_headings":"","what":"Other tips","title":"Efficiency tips","text":"running disk-backed analysis, always try store working copy data fast local SSDs. default laptops, servers may want copy data files networked file system physically attached SSD best performance. Use single call matrix_stats() calculate mean + variance single pass matrix possible. See function reference details. ATAC-seq data, can calculate variable features tile matrix without ever saving disk. 
allows subset variable tiles create peak matrix just variable tiles space savings.","code":""},{"path":"https://bnprks.github.io/BPCells/articles/web-only/programming-philosophy.html","id":"working-without-a-project-object","dir":"Articles > Web-only","previous_headings":"","what":"Working without a project object","title":"Programming Philosophy","text":"Imagine want plot UMAP cells colored cluster. BPCells, way providing: 1, matrix cells x UMAP coordinates 2. vector listing cells belong cluster correspondence cells clusters determined based ordering. rows UMAP matrix order cluster membership vector. keep simple, recommend following approach: See tutorial example, make keeper_cells vector order data consistently according list cell IDs. downstream operations (PCA, clustering, etc.), cell order preserved unless explicitly change . things “just work” keep track per-cell metadata, can helpful make data frame tracking sample IDs, cluster membership, metadata Working without project object provides lot flexibility, since user can easily swap UMAP embeddings, cluster assignments, etc. just providing different variable input. ’s also need “export” metadata since wasn’t import step begin . course, power come additional responsibility keep track metadata. Keeping BPCells flexible power users retaining ease--use newbies ongoing effort, BPCells currently falls side power users","code":""},{"path":"https://bnprks.github.io/BPCells/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Benjamin Parks. Author, maintainer, copyright holder. Immanuel Abdi. Author. Stanford University. Copyright holder, funder. Genentech, Inc.. Copyright holder, funder.","code":""},{"path":"https://bnprks.github.io/BPCells/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Parks B, Abdi (2025). BPCells: Single Cell Counts Matrices PCA. 
R package version 0.3.1, https://bnprks.github.io/BPCells.","code":"@Manual{, title = {BPCells: Single Cell Counts Matrices to PCA}, author = {Benjamin Parks and Immanuel Abdi}, year = {2025}, note = {R package version 0.3.1}, url = {https://bnprks.github.io/BPCells}, }"},{"path":"https://bnprks.github.io/BPCells/index.html","id":"bpcells","dir":"","previous_headings":"","what":"Single Cell Counts Matrices to PCA","title":"Single Cell Counts Matrices to PCA","text":"site R package. Python site (experimental) BPCells package high performance single cell analysis large RNA-seq ATAC-seq datasets. can run normalization PCA 1.3M cell dataset 4 minutes 2GB RAM, create scATAC-seq peak matrices fragment coordinates 50x less CPU time ArchR SnapATAC2. BPCells can even handle full CELLxGENE census human dataset, running full precision PCA 44M cell x 60k gene matrix 6 hours laptop <1 hour server. See benchmarks page details. BPCells provides: Efficient storage single cell datasets via bitpacking compression Fast, disk-backed RNA-seq ATAC-seq data processing powered C++ Downstream analysis marker genes, clustering Interoperability AnnData, 10x datasets, R sparse matrices, GRanges Demonstrated scalability 44M cells laptop Additionally, BPCells exposes optimized data processing infrastructure use scaling 3rd party single cell tools (e.g. Seurat)","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"learn-more","dir":"","previous_headings":"","what":"Learn more","title":"Single Cell Counts Matrices to PCA","text":"BioRxiv preprint Benchmarks Multiomic analysis example BPCells works Additional articles Function documentation News","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"r-installation","dir":"","previous_headings":"","what":"R Installation","title":"Single Cell Counts Matrices to PCA","text":"recommend installing BPCells directly github: installing, must HDF5 library installed accessible system. 
HDF5 can installed choice package manager. See operating system specific instructions . Mac Windows users trouble installing github, check R-universe page instructions install pre-built binary packages. binary packages automatically track latest github main branch. BPCells available via conda thanks @mfansler Conda Forge R team (see issue #241 details). issues bioconda package reported bioconda-recipes. Version updates managed bioconda team.","code":"remotes::install_github(\"bnprks/BPCells/r\")"},{"path":"https://bnprks.github.io/BPCells/index.html","id":"linux","dir":"","previous_headings":"R Installation","what":"Linux","title":"Single Cell Counts Matrices to PCA","text":"Obtaining HDF5 dependency usually pretty straightforward Linux apt: sudo apt-get install libhdf5-dev yum: sudo yum install hdf5-devel Note: Linux users prefer distro’s package manager (e.g. apt yum) possible, appears give slightly reliable installation experience.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"windows","dir":"","previous_headings":"R Installation","what":"Windows","title":"Single Cell Counts Matrices to PCA","text":"Compiling R packages source Windows requires installing R tools Windows. See Issue #9 discussion.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"macos","dir":"","previous_headings":"R Installation","what":"MacOS","title":"Single Cell Counts Matrices to PCA","text":"MacOS, installing HDF5 homebrew seems reliable: brew install hdf5. Mac-specific troubleshooting: Check R installation running sessionInfo(), seeing lists ARM x86 “Platform”. easiest option use ARM R homebrew default ARM hdf5 installation possible (though tricky) install x86 copy homebrew order access x86 version hdf5 Older Macs (10.14 Mojave older): default compiler old Macs support needed C++17 filesystem features. 
See issue #3 tips getting newer compiler set via homebrew.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"supported-compilers","dir":"","previous_headings":"R Installation","what":"Supported compilers","title":"Single Cell Counts Matrices to PCA","text":"cases, already appropriate compiler. BPCells recommends gcc >=9.1, clang >= 9.0. corresponds versions late-2018 newer. Older versions may work cases long basic C++17 support, officially supported.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"general-installation-troubleshooting","dir":"","previous_headings":"R Installation","what":"General Installation troubleshooting","title":"Single Cell Counts Matrices to PCA","text":"BPCells tries print informative error messages compilation help diagnose problem. verbose set information, run Sys.setenv(BPCELLS_DEBUG_INSTALL=\"true\") prior remotes::install_github(\"bnprks/BPCells/r\"). still can’t solve issue additional information, feel free file Github issue, sure use collapsible section verbose installation log.","code":""},{"path":"https://bnprks.github.io/BPCells/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"Single Cell Counts Matrices to PCA","text":"BPCells open source project, welcome quality contributions. interested contributing experience C++, along Python R, feel free reach ideas like implement . ’re happy provide pointers get started, time permitting. unfamiliar C++ difficult contribute code, detailed bug reports reproducible examples still great way help . Github issues best forum . maintain single cell analysis package want use BPCells improve scalability, ’re happy provide advice. couple labs try far, promising success. Email best way get touch (look DESCRIPTION file github contact info). 
Python developers welcome, though current python package still experimental status.","code":""},{"path":"https://bnprks.github.io/BPCells/python/api/fragments.html","id":null,"dir":"Python > Api","previous_headings":"","what":"Fragment functions#","title":null,"text":"experimental.import_10x_fragments(input, output) Convert 10x fragment file BPCells format experimental.build_cell_groups(fragments, ...) Build cell_groups array use pseudobulk_insertion_counts() experimental.pseudobulk_insertion_counts(...) Calculate pseudobulk coverage matrix experimental.precalculate_insertion_counts(...) Precalculate per-base insertion counts fragment data experimental.PrecalculatedInsertionMatrix(path) Disk-backed precalculated insertion matrix","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/api/matrix.html","id":null,"dir":"Python > Api","previous_headings":"","what":"Matrix functions#","title":null,"text":"experimental.DirMatrix(dir) Disk-backed BPCells integer matrix experimental.MemMatrix(dir[, threads]) -memory BPCells integer matrix","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.T.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.T#","title":null,"text":"property DirMatrix.T: DirMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_h5ad.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_h5ad#","title":null,"text":"classmethod DirMatrix.from_h5ad(h5ad_path: str, out_dir: str, group: str = 'X') → DirMatrix[source]# Create DirMatrix h5ad file. Truncates floating point values integers Parameters: h5ad_path (str) – Path h5ad file out_dir (str) – Output path DirMatrix group (str, optional) – HDF5 group read matrix . Defaults “X”. 
Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_hstack.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_hstack#","title":null,"text":"classmethod DirMatrix.from_hstack(mats: List[DirMatrix], out_dir: str) → DirMatrix[source]# Create DirMatrix concatenating list DirMatrix objects horizontally (column wise) Parameters: mats (List[DirMatrix]) – List input matrices out_dir (str) – Output path DirMatrix Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_scipy_sparse.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_scipy_sparse#","title":null,"text":"classmethod DirMatrix.from_scipy_sparse(scipy_mat: spmatrix, dir: str) → DirMatrix[source]# Create DirMatrix scipy sparse matrix. 
write compressed sparse column format input types scipy.sparse.csr_matrix Parameters: scipy_mat (scipy.spmatrix) – Scipy sparse matrix dir (str) – Path write matrix Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.from_vstack.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.from_vstack#","title":null,"text":"classmethod DirMatrix.from_vstack(mats: List[DirMatrix], out_dir: str) → DirMatrix[source]# Create DirMatrix concatenating list DirMatrix objects vertically (row wise) Parameters: mats (List[DirMatrix]) – List input matrices out_dir (str) – Output path DirMatrix Returns: View matrix written disk Return type: DirMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix#","title":null,"text":"class bpcells.experimental.DirMatrix(dir: str)[source]# Disk-backed BPCells integer matrix reads BPCells-format matrices, returning scipy.sparse.csc_matrix objects sliced. Parameters: dir (str) – Path matrix directory Examples Attributes DirMatrix.T Return transposed view matrix DirMatrix.shape Dimensions matrix DirMatrix.threads Number threads use reading (default=1) Methods DirMatrix.from_h5ad(h5ad_path, out_dir[, group]) Create DirMatrix h5ad file. DirMatrix.from_hstack(mats, out_dir) Create DirMatrix concatenating list DirMatrix objects horizontally (column wise) DirMatrix.from_scipy_sparse(scipy_mat, dir) Create DirMatrix scipy sparse matrix. 
DirMatrix.from_vstack(mats, out_dir) Create DirMatrix concatenating list DirMatrix objects vertically (row wise) DirMatrix.transpose() Return transposed view matrix","code":">>> from bpcells import DirMatrix >>> mat = DirMatrix(\"/path/to/matrix\") >>> mat[:,[1,3,2,4]] <3x4 sparse matrix of type '' with 6 stored elements in Compressed Sparse Column format>"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.shape.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.shape#","title":null,"text":"DirMatrix.shape# Dimensions matrix Type: Tuple[int,int]","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.threads.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.threads#","title":null,"text":"DirMatrix.threads# Number threads use reading (default=1) Type: int","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.DirMatrix.transpose.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.DirMatrix.transpose#","title":null,"text":"DirMatrix.transpose() → DirMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.T.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.T#","title":null,"text":"property MemMatrix.T: MemMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix#","title":null,"text":"class bpcells.experimental.MemMatrix(dir: str, threads: int 
= 0)[source]# -memory BPCells integer matrix reads BPCells-format matrices disk, returning scipy.sparse.csc_matrix objects sliced. much memory-intensive, consistently fast random reads Parameters: dir (str) – Path matrix directory Examples Attributes MemMatrix.T Return transposed view matrix MemMatrix.shape Dimensions matrix MemMatrix.threads Threads used reads (default=1) Methods MemMatrix.transpose() Return transposed view matrix","code":">>> from bpcells import MemMatrix >>> mat = MemMatrix(\"/path/to/matrix\") >>> mat[:,[1,3,2,4]] <3x4 sparse matrix of type '' with 6 stored elements in Compressed Sparse Column format>"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.shape.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.shape#","title":null,"text":"MemMatrix.shape# Dimensions matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.threads.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.threads#","title":null,"text":"MemMatrix.threads# Threads used reads (default=1)","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.MemMatrix.transpose.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.MemMatrix.transpose#","title":null,"text":"MemMatrix.transpose() → MemMatrix[source]# Return transposed view matrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.PrecalculatedInsertionMatrix.get_counts.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.PrecalculatedInsertionMatrix.get_counts#","title":null,"text":"PrecalculatedInsertionMatrix.get_counts(regions: DataFrame)[source]# Load pseudobulk insertion 
counts Parameters: regions (pandas.DataFrame) – Pandas dataframe columns (chrom, start, end) representing genomic ranges (0-based, end-exclusive like BED format). regions must size. chrom string column; start/end numeric. Returns: Numpy array dimensions (region, pseudobulks, position) type numpy.int32 Return type: numpy.ndarray","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.PrecalculatedInsertionMatrix.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.PrecalculatedInsertionMatrix#","title":null,"text":"class bpcells.experimental.PrecalculatedInsertionMatrix(path: str)[source]# Disk-backed precalculated insertion matrix reads per-base precalculated insertion matrices. current implementation EXPERIMENTAL, crash matrices 2^32-1 non-zero entries. Parameters: dir (str) – Path matrix directory See also precalculate_insertion_counts() Attributes PrecalculatedInsertionMatrix.shape Methods PrecalculatedInsertionMatrix.get_counts(regions) Load pseudobulk insertion counts","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.PrecalculatedInsertionMatrix.shape.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.PrecalculatedInsertionMatrix.shape#","title":null,"text":"property PrecalculatedInsertionMatrix.shape: Tuple[int, int][source]#","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.build_cell_groups.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.build_cell_groups#","title":null,"text":"bpcells.experimental.build_cell_groups(fragments: str, cell_ids: Sequence[str], group_ids: Sequence[str], group_order: Sequence[str]) → ndarray[source]# Build cell_groups array use pseudobulk_insertion_counts() Parameters: fragments (str) – Path BPCells fragments 
directory cell_ids (list[str]) – List cell IDs group_ids (list[str]) – List pseudobulk IDs cell (length cell_ids) group_order (list[str]) – Output order pseudobulks (Contain unique group_ids) Returns: Numpy array suitable input cell_groups pseudobulk_insertion_counts(). length total number cells fragments input, specifying output pseudobulk index cell (-1 cell excluded consideration) Return type: numpy.ndarray See also pseudobulk_insertion_counts()","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.import_10x_fragments.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.import_10x_fragments#","title":null,"text":"bpcells.experimental.import_10x_fragments(input: str, output: str, shift_start: int = 0, shift_end: int = 0, keeper_cells: List[str] | None = None)[source]# Convert 10x fragment file BPCells format Parameters: input (str) – Path 10x input file output (str) – Path BPCells output directory shift_start (int) – Basepairs add start coordinates (generally positive number) shift_end (int) – Basepairs subtract end coordinates (generally negative number) keeper_cells (list[str]) – None, save fragments cells keeper_cells list","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.precalculate_insertion_counts.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.precalculate_insertion_counts#","title":null,"text":"bpcells.experimental.precalculate_insertion_counts(fragments: str, output_dir: str, cell_groups: Sequence[int], chrom_sizes: str | Dict[str, int], threads: int = 0)[source]# Precalculate per-base insertion counts fragment data current implementation EXPERIMENTAL, crash matrices 2^32-1 non-zero entries. 
Parameters: fragments (str) – Path BPCells fragments directory output_dir (str) – Path save insertion counts cell_groups (list[int]) – List pseudobulk groupings created build_cell_groups() chrom_sizes (str | dict[str, int]) – Path/URL UCSC-style chrom.sizes file, dictionary mapping chromosome names sizes threads (int) – Number threads use matrix calculation (default = 1) Returns: PrecalculatedInsertionMatrix object See also PrecalculatedInsertionMatrix","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/generated/bpcells.experimental.pseudobulk_insertion_counts.html","id":null,"dir":"Python > Generated","previous_headings":"","what":"bpcells.experimental.pseudobulk_insertion_counts#","title":null,"text":"bpcells.experimental.pseudobulk_insertion_counts(fragments: str, regions: DataFrame, cell_groups: Sequence[int], bin_size: int = 1) → ndarray[source]# Calculate pseudobulk coverage matrix Coverage calculated number start/end coordinates falling given position bin. Parameters: fragments (str) – Path BPCells fragments directory regions (pandas.DataFrame) – Pandas dataframe columns (chrom, start, end) representing genomic ranges (0-based, end-exclusive like BED format). regions must size. chrom string column; start/end numeric. cell_groups (list[int]) – List pseudobulk groupings created build_cell_groups() bin_size (int) – Size bins within region given basepairs. region width even multiple resolution_bp, last region may truncated. Returns: Numpy array dimensions (region, pseudobulks, position) type numpy.int32 Return type: numpy.ndarray See also build_cell_groups()","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"BPCells#","title":null,"text":"BPCells python bindings still experimental API subject change. existing functionality mainly focused allowing read/write access BPCells file formats integer matrices scATAC fragments. 
Future updates add data-processing functions present R interface (e.g. streaming normalization, PCA, ATAC-seq peak/tile matrix creation). provide Python access shared C++ core code. Notably, plotting functionality currently planned implementation, written primarily R relies R plotting libraries present Python. helper functions R BPCells implemented pure R thus unlikely added Python near future. functionality interest , welcome contributions – able write code pure Python. Reach via github/email interested. BPCells can directly installed via pip: Matrix slicing Basepair insertion dataloading Fragment functions Matrix functions Installation Tutorials API Reference R Docs","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"Installation#","title":null,"text":"BPCells can directly installed via pip:","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"Tutorials#","title":null,"text":"Matrix slicing Basepair insertion dataloading","code":""},{"path":"https://bnprks.github.io/BPCells/python/index.html","id":null,"dir":"Python","previous_headings":"","what":"API Reference#","title":null,"text":"Fragment functions Matrix functions Installation Tutorials API Reference R Docs","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Basepair insertion counts tutorial#","title":null,"text":"BPCells python bindings can used query basepair-level coverage predefined cell types. way works two steps: 10x ArchR arrow files converted BPCells format. flexible BPCells R bindings, though single-sample 10x import supported python bindings. BPCells python bindings use input fragment files create large matrix dimensions (# cell types, # basepairs genome). 
cell type groupings determined. BPCells python bindings can slice arbitrary genomic regions, returning numpy array dimensions (regions, cell types, basepairs) Benchmark dataset: 600K cell subset Catlas paper, 2.5 billion fragments Benchmark task: Load 128 random 501-bp peak regions 111 cell types basepair resolution Storage location: Local SSD. Networked file systems slower BPCells BigWigs Creation time 4.7 minutes, 8 threads ? File size 6.2 GB 13 GB Query time 0.37 seconds 2.2 seconds Cell type count aggregation can re-run fully Python Query time 6x faster BigWigs Caveat prototype: due development time limitations, insertion matrix implementation support >=2^32 non-zero entries (4.29 billion). catlas dataset 3.2 billion non-zero entries. limitation can removed additional technical work, workaround multiple matrix objects can created individually <2^32 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Benchmark estimates#","title":null,"text":"Benchmark dataset: 600K cell subset Catlas paper, 2.5 billion fragments Benchmark task: Load 128 random 501-bp peak regions 111 cell types basepair resolution Storage location: Local SSD. Networked file systems slower BPCells BigWigs Creation time 4.7 minutes, 8 threads ? File size 6.2 GB 13 GB Query time 0.37 seconds 2.2 seconds","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Main benefits of BPCells#","title":null,"text":"Cell type count aggregation can re-run fully Python Query time 6x faster BigWigs Caveat prototype: due development time limitations, insertion matrix implementation support >=2^32 non-zero entries (4.29 billion). catlas dataset 3.2 billion non-zero entries. 
limitation can removed additional technical work, workaround multiple matrix objects can created individually <2^32 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Usage Demo#","title":null,"text":"use public 500-cell 10x dataset 484 rows × 20 columns Notice conversion allows adjusting start/end coordinates, well subsetting barcodes passing QC. Adding 1 end coordinate necessary 10x inputs produced cellranger calculate insertion matrix, first define cell groups, well ordering cell groups want output matrix. use first two characters cell barcode since annotated cell types available. Note possible leave cells calling build_cell_groups, case data included precalculated matrix Next, precalculate insertion counts matrix, can use parallelization speed portions work. can load pre-calculated matrix input path. query matrix, use pandas DataFrame, columns (chrom, start, end). regions must length BPCells returns numpy array dimensions (regions, cell types, basepairs), holding per-base counts cell type simple wrap matrix pytorch-compatible dataset, given set regions training set. Note use non-standard __getitems__() function pytorch uses provide batched loading higher performance. 
dataset object can directly passed torch.utils.data.DataLoader.","code":"import bpcells.experimental import pandas as pd import os.path import subprocess import tempfile tmpdir = tempfile.TemporaryDirectory() fragments_10x_path = os.path.join(tmpdir.name, \"atac_fragments.tsv.gz\") data_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz\" subprocess.run([\"curl\", \"--silent\", data_url], stdout=open(fragments_10x_path, \"w\")) CompletedProcess(args=['curl', '--silent', 'https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz'], returncode=0) metadata_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_singlecell.csv\" metadata_path = os.path.join(tmpdir.name, \"cell_metadata.csv\") subprocess.run([\"curl\", \"--silent\", metadata_url], stdout=open(metadata_path, \"w\")) cell_metadata = pd.read_csv(metadata_path) cell_metadata = cell_metadata[cell_metadata.is__cell_barcode == 1].reset_index() cell_metadata cell_metadata.is__cell_barcode.sum() np.int64(484) %%time fragments_bpcells_path = os.path.join(tmpdir.name, \"bpcells_fragments\") bpcells.experimental.import_10x_fragments( input = fragments_10x_path, output = fragments_bpcells_path, shift_end=1, keeper_cells=cell_metadata.barcode[cell_metadata.is__cell_barcode == 1] ) CPU times: user 3.43 s, sys: 74.9 ms, total: 3.51 s Wall time: 3.45 s %%time barcodes = cell_metadata.barcode clusters = cell_metadata.barcode.str.slice(0,2) cluster_order = sorted(set(clusters)) cell_groups_array = bpcells.experimental.build_cell_groups(fragments_bpcells_path, barcodes, clusters, cluster_order) # We could provide a dict or local file path, but URL is easier chrom_sizes = \"http://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes\" insertions_matrix_path = os.path.join(tmpdir.name, \"bpcells_insertions_matrix\") 
bpcells.experimental.precalculate_insertion_counts( fragments_bpcells_path, insertions_matrix_path, cell_groups_array, chrom_sizes, threads=4 ) CPU times: user 3min 8s, sys: 710 ms, total: 3min 8s Wall time: 1min 45s Notebooks","previous_headings":"","what":"Data download#","title":null,"text":"use public 500-cell 10x dataset 484 rows × 20 columns","code":"import os.path import subprocess import tempfile tmpdir = tempfile.TemporaryDirectory() fragments_10x_path = os.path.join(tmpdir.name, \"atac_fragments.tsv.gz\") data_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz\" subprocess.run([\"curl\", \"--silent\", data_url], stdout=open(fragments_10x_path, \"w\")) CompletedProcess(args=['curl', '--silent', 'https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fragments.tsv.gz'], returncode=0) metadata_url = \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_singlecell.csv\" metadata_path = os.path.join(tmpdir.name, \"cell_metadata.csv\") subprocess.run([\"curl\", \"--silent\", metadata_url], stdout=open(metadata_path, \"w\")) cell_metadata = pd.read_csv(metadata_path) cell_metadata = cell_metadata[cell_metadata.is__cell_barcode == 1].reset_index() cell_metadata cell_metadata.is__cell_barcode.sum() np.int64(484)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Convert to BPCells format#","title":null,"text":"Notice conversion allows adjusting start/end coordinates, well subsetting barcodes passing QC. 
Adding 1 end coordinate necessary 10x inputs produced cellranger","code":"%%time fragments_bpcells_path = os.path.join(tmpdir.name, \"bpcells_fragments\") bpcells.experimental.import_10x_fragments( input = fragments_10x_path, output = fragments_bpcells_path, shift_end=1, keeper_cells=cell_metadata.barcode[cell_metadata.is__cell_barcode == 1] ) CPU times: user 3.43 s, sys: 74.9 ms, total: 3.51 s Wall time: 3.45 s"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/fragment_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Create the insertion matrix#","title":null,"text":"calculate insertion matrix, first define cell groups, well ordering cell groups want output matrix. use first two characters cell barcode since annotated cell types available. Note possible leave cells calling build_cell_groups, case data included precalculated matrix Next, precalculate insertion counts matrix, can use parallelization speed portions work.","code":"%%time barcodes = cell_metadata.barcode clusters = cell_metadata.barcode.str.slice(0,2) cluster_order = sorted(set(clusters)) cell_groups_array = bpcells.experimental.build_cell_groups(fragments_bpcells_path, barcodes, clusters, cluster_order) # We could provide a dict or local file path, but URL is easier chrom_sizes = \"http://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes\" insertions_matrix_path = os.path.join(tmpdir.name, \"bpcells_insertions_matrix\") bpcells.experimental.precalculate_insertion_counts( fragments_bpcells_path, insertions_matrix_path, cell_groups_array, chrom_sizes, threads=4 ) CPU times: user 3min 8s, sys: 710 ms, total: 3min 8s Wall time: 1min 45s Notebooks","previous_headings":"","what":"Querying the insertion matrix#","title":null,"text":"can load pre-calculated matrix input path. query matrix, use pandas DataFrame, columns (chrom, start, end). 
regions must length BPCells returns numpy array dimensions (regions, cell types, basepairs), holding per-base counts cell type","code":"mat = bpcells.experimental.PrecalculatedInsertionMatrix(insertions_matrix_path) mat Notebooks","previous_headings":"","what":"Pytorch-compatible dataset#","title":null,"text":"simple wrap matrix pytorch-compatible dataset, given set regions training set. Note use non-standard __getitems__() function pytorch uses provide batched loading higher performance. dataset object can directly passed torch.utils.data.DataLoader.","code":"class BPCellsDataset: def __init__(self, regions, matrix_dir): self.regions = regions[[\"chrom\", \"start\", \"end\"]] matrix_dir = str(os.path.abspath(os.path.expanduser(matrix_dir))) self.mat = bpcells.experimental.PrecalculatedInsertionMatrix(matrix_dir) peak_width = self.regions.end[0] - self.regions.start[0] assert (self.regions.end - self.regions.start == peak_width).all() def __getitem__(self, i): return self.__getitems__([i])[0] def __getitems__(self, idx): # Adding this function allows for batched loading # See: https://github.com/pytorch/pytorch/issues/107218 # Return tensor of shape (batch_size, n_tasks, basepairs) return self.mat.get_counts( self.regions.iloc[idx,] ) def __len__(self): return self.regions.shape[0]"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Matrix slicing tutorial#","title":null,"text":"BPCells prototype Python bindings allow matrix creation slicing, optional multithreaded reads. 
scimilarity dataset 15M human cells, compressed storage 64GB (2.2 bytes/non-zero) Read speeds 10k random cells 15M human cells (range 5 random tests) Storage location 1 thread 4 threads Memory 2.8-4.7 seconds 1.0-1.1 seconds Local SSD 4.5-4.9 seconds 1.5-1.7 seconds Networked FS (warm cache) 20-21 seconds 5.5-6.2 seconds Networked FS (cold cache) 🙁 76-115 seconds Slicing matrix returns scipy.sparse matrix can use many slicing options standard numpy matrices can also make transposed view matrix similar numpy. work done, just switch row-major col-major representations matrix path 13 files (compressed integer matrices), contain data metadata can concatenate multiple matrices single file disk low memory usage. allows importing many samples parallel, concatenating together single matrix larger matrices, can desirable perform matrix reading multi-threaded manner. using multiple threads, BPCells divide matrix slice query chunks loaded parallel, recombined memory threads completed. performing random slicing along major storage axis, seek latency primary performance bottleneck. Setting high number threads (even actual core count machine) can help mitigate filesystem seek latency. slicing across non-major storage axis, decompression speed can become performance bottleneck. Setting threads number available cores can help parallelize decompression speed. cell-major RNA-seq matrices, thread can process compressed input rate 1 GB/s, filesystems >1GB/s sequential read speeds benefit parallelization. neural network training use-cases, fast slicing performance may critical avoid bottlenecking data loads. case, BPCells supports loading compressed data memory, eliminates seek latency saving ~4x memory usage compared uncompressed scipy sparse matrix. 
Loading can performed existing BPCells matrix directory, current version involves re-compressing data -memory load time (avoidable, bit trickier code direct loading isn’t implemented yet)","code":"import bpcells.experimental import os import tempfile import numpy as np import scipy.sparse tmp = tempfile.TemporaryDirectory() os.chdir(tmp.name) mat = scipy.sparse.csc_matrix(np.array([ [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]] )) mat mat.toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]]) bp_mat = bpcells.experimental.DirMatrix.from_scipy_sparse(mat, \"basic_mat\") bp_mat <3x4 col-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat> bp_mat[:,:] bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat[1:3, [0,2]].toarray() array([[0, 5], [2, 6]], dtype=uint32) bp_mat[[True, False, True], -2:].toarray() array([[4, 0], [6, 0]], dtype=uint32) bp_mat.T <4x3 row-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat> !ls -l basic_mat total 44 -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 col_names -rw-rw-r-- 1 bparks bparks 48 Aug 25 00:51 idxptr -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 index_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 index_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 index_idx_offsets -rw-rw-r-- 1 bparks bparks 12 Aug 25 00:51 index_starts -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 row_names -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 shape -rw-rw-r-- 1 bparks bparks 4 Aug 25 00:51 storage_order -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 val_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 val_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 val_idx_offsets -rw-rw-r-- 1 bparks bparks 22 Aug 25 00:51 version bp_mat = bpcells.experimental.DirMatrix(\"basic_mat\") import anndata anndata.AnnData(mat).write(\"mat.h5ad\") bp_mat = bpcells.experimental.DirMatrix.from_h5ad(\"mat.h5ad\", \"basic_mat_from_h5ad\") bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) 
bpcells.experimental.DirMatrix.from_hstack( [bp_mat, bp_mat], \"basic_mat_hstack\" )[:,:].toarray() array([[1, 0, 4, 0, 1, 0, 4, 0], [0, 0, 5, 7, 0, 0, 5, 7], [2, 3, 6, 0, 2, 3, 6, 0]], dtype=uint32) bpcells.experimental.DirMatrix.from_vstack( [bp_mat, bp_mat], \"basic_mat_vstack\" )[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0], [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat.threads = 8 bp_mat[:,:].toarray() # This will be performed with 8 threads now array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat_mem = bpcells.experimental.MemMatrix(\"basic_mat\") bp_mat_mem.threads = 8 bp_mat_mem[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Performance estimates#","title":null,"text":"scimilarity dataset 15M human cells, compressed storage 64GB (2.2 bytes/non-zero) Read speeds 10k random cells 15M human cells (range 5 random tests) Storage location 1 thread 4 threads Memory 2.8-4.7 seconds 1.0-1.1 seconds Local SSD 4.5-4.9 seconds 1.5-1.7 seconds Networked FS (warm cache) 20-21 seconds 5.5-6.2 seconds Networked FS (cold cache) 🙁 76-115 seconds","code":""},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Demo data setup#","title":null,"text":"","code":"import bpcells.experimental import os import tempfile import numpy as np import scipy.sparse tmp = tempfile.TemporaryDirectory() os.chdir(tmp.name) mat = scipy.sparse.csc_matrix(np.array([ [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]] )) mat mat.toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]])"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Basic usage from 
scipy.sparse#","title":null,"text":"Slicing matrix returns scipy.sparse matrix can use many slicing options standard numpy matrices can also make transposed view matrix similar numpy. work done, just switch row-major col-major representations","code":"bp_mat = bpcells.experimental.DirMatrix.from_scipy_sparse(mat, \"basic_mat\") bp_mat <3x4 col-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat> bp_mat[:,:] bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32) bp_mat[1:3, [0,2]].toarray() array([[0, 5], [2, 6]], dtype=uint32) bp_mat[[True, False, True], -2:].toarray() array([[4, 0], [6, 0]], dtype=uint32) bp_mat.T <4x3 row-major sparse array stored in \t/tmp/tmpgnnfj3gp/basic_mat>"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Reopening the matrix later#","title":null,"text":"matrix path 13 files (compressed integer matrices), contain data metadata","code":"!ls -l basic_mat total 44 -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 col_names -rw-rw-r-- 1 bparks bparks 48 Aug 25 00:51 idxptr -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 index_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 index_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 index_idx_offsets -rw-rw-r-- 1 bparks bparks 12 Aug 25 00:51 index_starts -rw-rw-r-- 1 bparks bparks 0 Aug 25 00:51 row_names -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 shape -rw-rw-r-- 1 bparks bparks 4 Aug 25 00:51 storage_order -rw-rw-r-- 1 bparks bparks 56 Aug 25 00:51 val_data -rw-rw-r-- 1 bparks bparks 16 Aug 25 00:51 val_idx -rw-rw-r-- 1 bparks bparks 24 Aug 25 00:51 val_idx_offsets -rw-rw-r-- 1 bparks bparks 22 Aug 25 00:51 version bp_mat = bpcells.experimental.DirMatrix(\"basic_mat\")"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Import from h5ad#","title":null,"text":"","code":"import 
anndata anndata.AnnData(mat).write(\"mat.h5ad\") bp_mat = bpcells.experimental.DirMatrix.from_h5ad(\"mat.h5ad\", \"basic_mat_from_h5ad\") bp_mat[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Concatenate multiple matrices#","title":null,"text":"can concatenate multiple matrices single file disk low memory usage. allows importing many samples parallel, concatenating together single matrix","code":"bpcells.experimental.DirMatrix.from_hstack( [bp_mat, bp_mat], \"basic_mat_hstack\" )[:,:].toarray() array([[1, 0, 4, 0, 1, 0, 4, 0], [0, 0, 5, 7, 0, 0, 5, 7], [2, 3, 6, 0, 2, 3, 6, 0]], dtype=uint32) bpcells.experimental.DirMatrix.from_vstack( [bp_mat, bp_mat], \"basic_mat_vstack\" )[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0], [1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Multithreaded operation#","title":null,"text":"larger matrices, can desirable perform matrix reading multi-threaded manner. using multiple threads, BPCells divide matrix slice query chunks loaded parallel, recombined memory threads completed. performing random slicing along major storage axis, seek latency primary performance bottleneck. Setting high number threads (even actual core count machine) can help mitigate filesystem seek latency. slicing across non-major storage axis, decompression speed can become performance bottleneck. Setting threads number available cores can help parallelize decompression speed. 
cell-major RNA-seq matrices, thread can process compressed input rate 1 GB/s, filesystems >1GB/s sequential read speeds benefit parallelization.","code":"bp_mat.threads = 8 bp_mat[:,:].toarray() # This will be performed with 8 threads now array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":"https://bnprks.github.io/BPCells/python/notebooks/matrix_basics.html","id":null,"dir":"Python > Notebooks","previous_headings":"","what":"Compressed in-memory storage#","title":null,"text":"neural network training use-cases, fast slicing performance may critical avoid bottlenecking data loads. case, BPCells supports loading compressed data memory, eliminates seek latency saving ~4x memory usage compared uncompressed scipy sparse matrix. Loading can performed existing BPCells matrix directory, current version involves re-compressing data -memory load time (avoidable, bit trickier code direct loading isn’t implemented yet)","code":"bp_mat_mem = bpcells.experimental.MemMatrix(\"basic_mat\") bp_mat_mem.threads = 8 bp_mat_mem[:,:].toarray() array([[1, 0, 4, 0], [0, 0, 5, 7], [2, 3, 6, 0]], dtype=uint32)"},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"Python Docs#","title":null,"text":"BPCells python bindings still experimental API subject change. existing functionality mainly focused allowing read/write access BPCells file formats integer matrices scATAC fragments. Future updates add data-processing functions present R interface (e.g. streaming normalization, PCA, ATAC-seq peak/tile matrix creation). provide Python access shared C++ core code. Notably, plotting functionality currently planned implementation, written primarily R relies R plotting libraries present Python. helper functions R BPCells implemented pure R thus unlikely added Python near future. functionality interest , welcome contributions – able write code pure Python. Reach via github/email interested. 
BPCells can directly installed via pip: Matrix slicing Basepair insertion dataloading Fragment functions Matrix functions","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"Installation#","title":null,"text":"BPCells can directly installed via pip:","code":"python -m pip install bpcells"},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"Tutorials#","title":null,"text":"Matrix slicing Basepair insertion dataloading","code":""},{"path":"https://bnprks.github.io/BPCells/python/python.html","id":null,"dir":"Python","previous_headings":"","what":"API Reference#","title":null,"text":"Fragment functions Matrix functions","code":""},{"path":[]},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_open_matrix_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","title":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","text":"experimental sparse-column format designed handle storage matrices many columns -zero, less 2^32-1 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_open_matrix_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","text":"","code":"EXPERIMENTAL_open_matrix_dir(dir, buffer_size = 8192L)"},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_open_matrix_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Open experimental sparse-column format integer matrix — EXPERIMENTAL_open_matrix_dir","text":"dir Directory load data buffer_size performance tuning . 
number items buffered memory calling writes disk.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_write_matrix_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","title":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","text":"experimental sparse-column format designed handle storage matrices many columns -zero, less 2^32-1 non-zero entries.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_write_matrix_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","text":"","code":"EXPERIMENTAL_write_matrix_dir(mat, dir, buffer_size = 8192L, overwrite = FALSE)"},{"path":"https://bnprks.github.io/BPCells/reference/EXPERIMENTAL_write_matrix_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write to experimental sparse-column format integer matrix — EXPERIMENTAL_write_matrix_dir","text":"dir Directory save data overwrite TRUE, write temp dir overwrite existing data. Alternatively, pass temp path string customize temp dir location.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableFragments methods — IterableFragments-methods","title":"IterableFragments methods — IterableFragments-methods","text":"Methods IterableFragments objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableFragments methods — IterableFragments-methods","text":"","code":"# S4 method for class 'IterableFragments' show(object) cellNames(x) cellNames(x, ...) 
<- value chrNames(x) chrNames(x, ...) <- value"},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableFragments methods — IterableFragments-methods","text":"object IterableFragments object x IterableFragments object value Character vector new names"},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"IterableFragments methods — IterableFragments-methods","text":"cellNames(): Character vector cell names, NULL none known chrNames(): Character vector chromosome names, NULL none known"},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"IterableFragments methods — IterableFragments-methods","text":"cellNames<- possible replace names, add new names. chrNames<- possible replace names, add new names."},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableFragments methods — IterableFragments-methods","text":"show(IterableFragments): Print IterableFragments cellNames(): Get cell names cellNames(x, ...) <- value: Set cell names chrNames(): Get chromosome names chrNames(x, ...) 
<- value: Set chromosome names","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-methods.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"IterableFragments methods — IterableFragments-methods","text":"","code":"## Prep data frags <- tibble::tibble( chr = paste0(\"chr\", c(rep(1,3), rep(2,3))), start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) frags #> # A tibble: 6 × 4 #> chr start end cell_id #> #> 1 chr1 10 40 cell1 #> 2 chr1 60 90 cell1 #> 3 chr1 110 140 cell2 #> 4 chr2 160 190 cell2 #> 5 chr2 210 240 cell3 #> 6 chr2 260 290 cell3 frags <- frags %>% convert_to_fragments() ####################################################################### ## show(IterableFragments) example ####################################################################### show(frags) #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 2 chromosomes with names chr1, chr2 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ####################################################################### ## cellNames(IterableFragments) example ####################################################################### cellNames(frags) #> [1] \"cell1\" \"cell2\" \"cell3\" ####################################################################### ## cellNames(IterableFragments)<- example ####################################################################### cellNames(frags) <- paste0(\"cell\", 5:7) cellNames(frags) #> [1] \"cell5\" \"cell6\" \"cell7\" ####################################################################### ## chrNames(IterableFragments) example ####################################################################### chrNames(frags) #> [1] \"chr1\" \"chr2\" ####################################################################### ## chrNames(IterableFragments)<- example ####################################################################### chrNames(frags) <- paste0(\"chr\", 5:6) chrNames(frags) #> [1] \"chr5\" \"chr6\""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"Methods defined classes extend IterableFragments, providing access metadata specialised behaviours storage backends selection wrappers.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"","code":"# S4 method for class 'FragmentsTsv' chrNames(x) # S4 method for class 'FragmentsTsv' cellNames(x) # S4 method for class 'UnpackedMemFragments' chrNames(x) # S4 method for class 'UnpackedMemFragments' cellNames(x) # S4 method for class 
'PackedMemFragments' chrNames(x) # S4 method for class 'PackedMemFragments' cellNames(x) # S4 method for class 'FragmentsDir' chrNames(x) # S4 method for class 'FragmentsDir' cellNames(x) # S4 method for class 'FragmentsHDF5' chrNames(x) # S4 method for class 'FragmentsHDF5' cellNames(x) # S4 method for class 'ChrSelectName' chrNames(x) # S4 method for class 'ChrSelectIndex' chrNames(x) # S4 method for class 'CellSelectName' cellNames(x) # S4 method for class 'CellSelectIndex' cellNames(x) # S4 method for class 'CellMerge' cellNames(x) # S4 method for class 'ChrRename' chrNames(x) # S4 method for class 'CellRename' cellNames(x) # S4 method for class 'CellPrefix' cellNames(x) # S4 method for class 'MergeFragments' chrNames(x) # S4 method for class 'MergeFragments' cellNames(x)"},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"x object inheriting IterableFragments.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableFragments-misc-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableFragments subclass methods — chrNames,FragmentsTsv-method","text":"chrNames(FragmentsTsv): Get chromosome names FragmentsTsv cellNames(FragmentsTsv): Get cell names FragmentsTsv chrNames(UnpackedMemFragments): Get chromosome names UnpackedMemFragments cellNames(UnpackedMemFragments): Get cell names UnpackedMemFragments chrNames(PackedMemFragments): Get chromosome names PackedMemFragments cellNames(PackedMemFragments): Get cell names PackedMemFragments chrNames(FragmentsDir): Get chromosome names FragmentsDir cellNames(FragmentsDir): Get cell names FragmentsDir chrNames(FragmentsHDF5): Get chromosome names FragmentsHDF5 cellNames(FragmentsHDF5): Get cell names FragmentsHDF5 chrNames(ChrSelectName): Get chromosome 
names ChrSelectName chrNames(ChrSelectIndex): Get chromosome names ChrSelectIndex cellNames(CellSelectName): Get cell names CellSelectName cellNames(CellSelectIndex): Get cell names CellSelectIndex cellNames(CellMerge): Get cell names CellMerge chrNames(ChrRename): Get chromosome names ChrRename cellNames(CellRename): Get cell names CellRename cellNames(CellPrefix): Get cell names CellPrefix chrNames(MergeFragments): Get chromosome names MergeFragments cellNames(MergeFragments): Get cell names MergeFragments","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-matrixgenerics.html","id":null,"dir":"Reference","previous_headings":"","what":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","title":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","text":"S4 methods enabling MatrixGenerics generics (e.g., rowQuantiles, colQuantiles, rowVars, colVars, rowMaxs, colMaxs) operate IterableMatrix. registered runtime MatrixGenerics available.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-matrixgenerics.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","text":"x IterableMatrix. ... 
Passed underlying implementation.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-matrixgenerics.html","id":"availability","dir":"Reference","previous_headings":"","what":"Availability","title":"MatrixGenerics methods for IterableMatrix — IterableMatrix-matrixgenerics","text":"Methods registered conditionally; MatrixGenerics installed, nothing registered generics fall back usual.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"Generic methods built-functions IterableMatrix objects. include methods described IterableMatrix-methods sense redundancy. instance, %*% described IterableMatrix matrix left right respectively. need show method IterableMatrix right instead.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"","code":"# S4 method for class 'matrix,IterableMatrix' x %*% y # S4 method for class 'numeric,IterableMatrix' x %*% y # S4 method for class 'dgCMatrix,IterableMatrix' x %*% y # S3 method for class 'IterableMatrix' rowQuantiles( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) colQuantiles.IterableMatrix( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) # S4 method for class 'IterableMatrix,numeric' e1 < e2 # S4 method for class 'numeric,IterableMatrix' e1 > e2 # S4 method for class 'IterableMatrix,numeric' e1 <= e2 # S4 method for class 'numeric,IterableMatrix' e1 
>= e2 # S4 method for class 'numeric,IterableMatrix' e1 * e2 # S4 method for class 'numeric,IterableMatrix' e1 + e2 # S4 method for class 'numeric,IterableMatrix' e1 - e2"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"digits Number decimal places quantile calculations ... Additional arguments passed methods drop Logical indicating whether drop dimensions subsetting.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods-misc.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableMatrix methods miscellaneous — IterableMatrix-methods-misc","text":"x %*% y: Multiply dense matrix IterableMatrix x %*% y: Multiply numeric row vector IterableMatrix x %*% y: Multiply dgCMatrix IterableMatrix rowQuantiles(IterableMatrix): Calculate rowQuantiles (replacement matrixStats::rowQuantiles) colQuantiles.IterableMatrix(): Calculate colQuantiles (replacement matrixStats::colQuantiles) e1 < e2: Perform matrix < numeric comparison (unsupported) e1 > e2: Perform numeric > matrix comparison (unsupported) e1 <= e2: Perform matrix <= numeric comparison (unsupported) e1 >= e2: Compare numeric value IterableMatrix using >= (numeric left operand) e1 * e2: Multiply IterableMatrix numeric value row-wise vector (numeric left operand) e1 + e2: Add IterableMatrix numeric value row-wise vector (numeric left operand) e1 - e2: Subtract matrix numeric constant/vector","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableMatrix methods — IterableMatrix-methods","title":"IterableMatrix methods — IterableMatrix-methods","text":"Generic methods built-functions IterableMatrix 
objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableMatrix methods — IterableMatrix-methods","text":"","code":"matrix_type(x) storage_order(x) # S4 method for class 'IterableMatrix' show(object) # S4 method for class 'IterableMatrix' t(x) # S4 method for class 'IterableMatrix,matrix' x %*% y # S4 method for class 'IterableMatrix' rowSums(x) # S4 method for class 'IterableMatrix' colSums(x) # S4 method for class 'IterableMatrix' rowMeans(x) # S4 method for class 'IterableMatrix' colMeans(x) colVars( x, rows = NULL, cols = NULL, na.rm = FALSE, center = NULL, ..., useNames = TRUE ) rowVars( x, rows = NULL, cols = NULL, na.rm = FALSE, center = NULL, ..., useNames = TRUE ) rowMaxs(x, rows = NULL, cols = NULL, na.rm = FALSE, ..., useNames = TRUE) colMaxs(x, rows = NULL, cols = NULL, na.rm = FALSE, ..., useNames = TRUE) rowQuantiles( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) colQuantiles( x, rows = NULL, cols = NULL, probs = seq(from = 0, to = 1, by = 0.25), na.rm = FALSE, type = 7L, digits = 7L, ..., useNames = TRUE, drop = TRUE ) # S4 method for class 'IterableMatrix' log1p(x) log1p_slow(x) # S4 method for class 'IterableMatrix' expm1(x) expm1_slow(x) # S4 method for class 'IterableMatrix,numeric' e1^e2 # S4 method for class 'numeric,IterableMatrix' e1 < e2 # S4 method for class 'IterableMatrix,numeric' e1 > e2 # S4 method for class 'numeric,IterableMatrix' e1 <= e2 # S4 method for class 'IterableMatrix,numeric' e1 >= e2 # S4 method for class 'IterableMatrix' round(x, digits = 0) # S4 method for class 'IterableMatrix,numeric' e1 * e2 # S4 method for class 'IterableMatrix,numeric' e1 + e2 # S4 method for class 'IterableMatrix,numeric' e1/e2 # S4 method for class 'IterableMatrix,numeric' e1 - 
e2"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableMatrix methods — IterableMatrix-methods","text":"x IterableMatrix object matrix-like object. object IterableMatrix object y matrix probs (Numeric) Quantile value(s) computed, 0 1. type (Integer) 4 9 selecting quantile algorithm use, detailed matrixStats::rowQuantiles()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"IterableMatrix methods — IterableMatrix-methods","text":"t() Transposed object x %*% y: dense matrix result rowSums(): vector row sums colSums(): vector col sums rowMeans(): vector row means colMeans(): vector col means colVars(): vector col variance rowVars(): vector row variance rowMaxs(): vector maxes every row colMaxs(): vector column maxes rowQuantiles(): length(probs) == 1, return numeric number entries equal number rows matrix. Else, return Matrix quantile values, cols representing quantile, row representing row input matrix. colQuantiles(): length(probs) == 1, return numeric number entries equal number columns matrix. 
Else, return Matrix quantile values, cols representing quantile, row representing col input matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableMatrix methods — IterableMatrix-methods","text":"matrix_type(): Get matrix data type (mat_uint32_t, mat_float, mat_double now) storage_order(): Get matrix storage order (\"row\" \"col\") show(IterableMatrix): Display IterableMatrix t(IterableMatrix): Transpose IterableMatrix x %*% y: Multiply dense matrix rowSums(IterableMatrix): Calculate rowSums colSums(IterableMatrix): Calculate colSums rowMeans(IterableMatrix): Calculate rowMeans colMeans(IterableMatrix): Calculate colMeans colVars(): Calculate colVars (replacement matrixStats::colVars()) rowVars(): Calculate rowVars (replacement matrixStats::rowVars()) rowMaxs(): Calculate rowMaxs (replacement matrixStats::rowMaxs()) colMaxs(): Calculate colMaxs (replacement matrixStats::colMaxs()) rowQuantiles(): Calculate rowQuantiles (replacement matrixStats::rowQuantiles) colQuantiles(): Calculate colQuantiles (replacement matrixStats::colQuantiles) log1p(IterableMatrix): Calculate log(x + 1) log1p_slow(): Calculate log(x + 1) (non-SIMD version) expm1(IterableMatrix): Calculate exp(x) - 1 expm1_slow(): Calculate exp(x) - 1 (non-SIMD version) e1^e2: Calculate x^y (elementwise; y > 0) e1 < e2: Binarize matrix according numeric < matrix comparison e1 > e2: Binarize matrix according matrix > numeric comparison e1 <= e2: Binarize matrix according numeric <= matrix comparison e1 >= e2: Binarize matrix according matrix >= numeric comparison round(IterableMatrix): round nearest integer (digits must 0) e1 * e2: Multiply constant, multiply rows vector length nrow(mat) e1 + e2: Add constant, row-wise addition vector length nrow(mat) e1 / e2: Divide constant, divide rows vector length nrow(mat) e1 - e2: Subtract constant, row-wise subtraction vector 
length nrow(mat)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-methods.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"IterableMatrix methods — IterableMatrix-methods","text":"","code":"## Prep data mat <- matrix(1:25, nrow = 5) %>% as(\"dgCMatrix\") mat #> 5 x 5 sparse Matrix of class \"dgCMatrix\" #> #> [1,] 1 6 11 16 21 #> [2,] 2 7 12 17 22 #> [3,] 3 8 13 18 23 #> [4,] 4 9 14 19 24 #> [5,] 5 10 15 20 25 mat <- as(mat, \"IterableMatrix\") mat #> 5 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ####################################################################### ## matrix_type() example ####################################################################### matrix_type(mat) #> [1] \"double\" ####################################################################### ## storage_order() example ####################################################################### storage_order(mat) #> [1] \"col\" ####################################################################### ## show() example ####################################################################### show(mat) #> 5 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ####################################################################### ## t() example ####################################################################### t(mat) #> 5 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. 
Load dgCMatrix from memory ####################################################################### ## `x %*% y` example ####################################################################### mat %*% as(matrix(1:50, nrow = 5), \"dgCMatrix\") #> 5 x 10 IterableMatrix object with class MatrixMultiply #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (5x5) * Iterable_dgCMatrix_wrapper (5x10) ####################################################################### ## rowSums() example ####################################################################### rowSums(mat) #> [1] 55 60 65 70 75 ####################################################################### ## colSums() example ####################################################################### colSums(mat) #> [1] 15 40 65 90 115 ####################################################################### ## rowMeans() example ####################################################################### rowMeans(mat) #> [1] 11 12 13 14 15 ####################################################################### ## colMeans() example ####################################################################### colMeans(mat) #> [1] 3 8 13 18 23 ####################################################################### ## colVars() example ####################################################################### colVars(mat) #> [1] 2.5 2.5 2.5 2.5 2.5 ####################################################################### ## rowMaxs() example ####################################################################### rowMaxs(mat) #> [1] 21 22 23 24 25 ####################################################################### ## colMaxs() example ####################################################################### colMaxs(mat) #> [1] 5 10 15 20 25 
####################################################################### ## rowQuantiles() example ####################################################################### rowQuantiles(transpose_storage_order(mat)) #> 0% 25% 50% 75% 100% #> [1,] 1 6 11 16 21 #> [2,] 2 7 12 17 22 #> [3,] 3 8 13 18 23 #> [4,] 4 9 14 19 24 #> [5,] 5 10 15 20 25 ####################################################################### ## colQuantiles() example ####################################################################### colQuantiles(mat) #> 0% 25% 50% 75% 100% #> [1,] 1 2 3 4 5 #> [2,] 6 7 8 9 10 #> [3,] 11 12 13 14 15 #> [4,] 16 17 18 19 20 #> [5,] 21 22 23 24 25 ####################################################################### ## log1p() example ####################################################################### log1p(mat) #> 5 x 5 IterableMatrix object with class TransformLog1p #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform log1p ####################################################################### ## log1p_slow() example ####################################################################### log1p_slow(mat) #> 5 x 5 IterableMatrix object with class TransformLog1pSlow #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform log1p (non-SIMD implementation) ####################################################################### ## expm1() example ####################################################################### expm1(mat) #> 5 x 5 IterableMatrix object with class TransformExpm1 #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. 
Transform expm1 ####################################################################### ## expm1_slow() example ####################################################################### expm1_slow(mat) #> 5 x 5 IterableMatrix object with class TransformExpm1Slow #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform expm1 (non-SIMD implementation) ####################################################################### ## `e1 < e2` example ####################################################################### 5 < mat #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x < 5 #> 3. Convert type from double to uint32_t ####################################################################### ## `e1 > e2` example ####################################################################### mat > 5 #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x < 5 #> 3. Convert type from double to uint32_t ####################################################################### ## `e1 <= e2` example ####################################################################### 5 <= mat #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x <= 5 #> 3. 
Convert type from double to uint32_t ####################################################################### ## `e1 >= e2` example ####################################################################### mat >= 5 #> 5 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Binarize according to formula: x <= 5 #> 3. Convert type from double to uint32_t ####################################################################### ## round() example ####################################################################### round(mat) #> 5 x 5 IterableMatrix object with class TransformRound #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Transform round to 0 decimal places ####################################################################### ## `e1 * e2` example ####################################################################### ## Multiplying by a constant mat * 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale by 5 ## Multiplying by a vector of length `nrow(mat)` mat * 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale rows by 1, 2 ... 
5 ####################################################################### ## `e1 + e2` example ####################################################################### ## Add by a constant mat + 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Shift by 5 ## Adding row-wise by a vector of length `nrow(mat)` mat + 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Shift rows by 1, 2 ... 5 ####################################################################### ## `e1 / e2` example ####################################################################### ## Divide by a constant mat / 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale by 0.2 ## Divide by a vector of length `nrow(mat)` mat / 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Scale rows by 1, 0.5 ... 0.2 ####################################################################### ## `e1 - e2` example ####################################################################### ## Subtracting by a constant mat - 5 #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. 
Load dgCMatrix from memory #> 2. Shift by -5 ## Subtracting by a vector of length `nrow(mat)` mat - 1:nrow(mat) #> 5 x 5 IterableMatrix object with class TransformScaleShift #> #> Row names: unknown names #> Col names: unknown names #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. Shift rows by -1, -2 ... -5"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":null,"dir":"Reference","previous_headings":"","what":"IterableMatrix subclass methods — IterableMatrix-misc-methods","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"Methods classes extend IterableMatrix dispatched directly base class. typically helper objects wrap another matrix alter behaviour (e.g., concatenation, -disk access).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"","code":"# S4 method for class 'MatrixMultiply' matrix_type(x) # S4 method for class 'MatrixMultiply,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'MatrixMask' matrix_type(x) # S4 method for class 'MatrixRankTransform' matrix_type(x) # S4 method for class 'MatrixSubset' matrix_type(x) # S4 method for class 'MatrixSubset,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'RenameDims' matrix_type(x) # S4 method for class 'RenameDims,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'RowBindMatrices' matrix_type(x) # S4 method for class 'ColBindMatrices' matrix_type(x) # S4 method for class 'RowBindMatrices,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'ColBindMatrices,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'PackedMatrixMem_uint32_t' matrix_type(x) # S4 method for class 'PackedMatrixMem_float' matrix_type(x) # S4 method for 
class 'PackedMatrixMem_double' matrix_type(x) # S4 method for class 'UnpackedMatrixMem_uint32_t' matrix_type(x) # S4 method for class 'UnpackedMatrixMem_float' matrix_type(x) # S4 method for class 'UnpackedMatrixMem_double' matrix_type(x) # S4 method for class 'MatrixDir' matrix_type(x) # S4 method for class 'EXPERIMENTAL_MatrixDirCompressedCol' matrix_type(x) # S4 method for class 'MatrixH5' matrix_type(x) # S4 method for class '10xMatrixH5' matrix_type(x) # S4 method for class 'AnnDataMatrixH5' matrix_type(x) # S4 method for class 'PeakMatrix' matrix_type(x) # S4 method for class 'PeakMatrix,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'TileMatrix' matrix_type(x) # S4 method for class 'TileMatrix,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'ConvertMatrixType' matrix_type(x) # S4 method for class 'ConvertMatrixType,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'Iterable_dgCMatrix_wrapper' matrix_type(x) # S4 method for class 'TransformedMatrix' matrix_type(x) # S4 method for class 'TransformedMatrix,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] # S4 method for class 'TransformScaleShift,numeric' e1 * e2 # S4 method for class 'TransformScaleShift,numeric' e1 + e2 # S4 method for class 'numeric,TransformScaleShift' e1 * e2 # S4 method for class 'numeric,TransformScaleShift' e1 + e2"},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"x object inheriting IterableMatrix. Row indices selection helpers. j Column indices selection helpers. ... Additional arguments passed call. drop Logical indicating whether drop dimensions (subsetting). e1 Left operand binary operations. 
e2 Right operand binary operations.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/IterableMatrix-misc-methods.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"IterableMatrix subclass methods — IterableMatrix-misc-methods","text":"matrix_type(MatrixMultiply): Matrix data type MatrixMultiply objects x[: Subset MatrixMultiply results matrix_type(MatrixMask): Matrix data type MatrixMask objects matrix_type(MatrixRankTransform): Matrix data type MatrixRankTransform objects matrix_type(MatrixSubset): Matrix data type MatrixSubset objects x[: Subset MatrixSubset transforms matrix_type(RenameDims): Matrix data type RenameDims objects x[: Subset RenameDims transforms matrix_type(RowBindMatrices): Matrix data type RowBindMatrices objects matrix_type(ColBindMatrices): Matrix data type ColBindMatrices objects x[: Subset RowBindMatrices transforms x[: Subset ColBindMatrices transforms matrix_type(PackedMatrixMem_uint32_t): Matrix data type PackedMatrixMem_uint32_t objects matrix_type(PackedMatrixMem_float): Matrix data type PackedMatrixMem_float objects matrix_type(PackedMatrixMem_double): Matrix data type PackedMatrixMem_double objects matrix_type(UnpackedMatrixMem_uint32_t): Matrix data type UnpackedMatrixMem_uint32_t objects matrix_type(UnpackedMatrixMem_float): Matrix data type UnpackedMatrixMem_float objects matrix_type(UnpackedMatrixMem_double): Matrix data type UnpackedMatrixMem_double objects matrix_type(MatrixDir): Matrix data type MatrixDir objects matrix_type(EXPERIMENTAL_MatrixDirCompressedCol): Matrix data type EXPERIMENTAL_MatrixDirCompressedCol objects matrix_type(MatrixH5): Matrix data type MatrixH5 objects matrix_type(`10xMatrixH5`): Matrix data type 10xMatrixH5 objects matrix_type(AnnDataMatrixH5): Matrix data type AnnDataMatrixH5 objects matrix_type(PeakMatrix): Matrix data type PeakMatrix objects x[: Subset PeakMatrix matrix_type(TileMatrix): Matrix data type TileMatrix objects x[: Subset 
TileMatrix matrix_type(ConvertMatrixType): Matrix data type ConvertMatrixType objects x[: Subset ConvertMatrixType transforms matrix_type(Iterable_dgCMatrix_wrapper): Matrix data type Iterable_dgCMatrix_wrapper objects matrix_type(TransformedMatrix): Matrix data type TransformedMatrix objects x[: Subset TransformedMatrix results e1 * e2: Scale TransformScaleShift results numeric values e1 + e2: Shift TransformScaleShift results numeric values e1 * e2: Apply numeric scaling left TransformScaleShift results e1 + e2: Add TransformScaleShift results numeric values (numeric left operand)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-class.html","id":null,"dir":"Reference","previous_headings":"","what":"Represent a sparse matrix-vector product operation — LinearOperator-class","title":"Represent a sparse matrix-vector product operation — LinearOperator-class","text":"LinearOperators perform sparse matrix-vector product operations downstream matrix solvers. avoid repeatedly calling iterate_matrix SVD solver possible efficiency gain","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":null,"dir":"Reference","previous_headings":"","what":"LinearOperator multiplication helpers — LinearOperator-math","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"Methods enabling \\%*% LinearOperator objects dense matrices numeric vectors.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"","code":"# S4 method for class 'LinearOperator,matrix' x %*% y # S4 method for class 'matrix,LinearOperator' x %*% y # S4 method for class 'LinearOperator,numeric' x %*% y # S4 method for class 'numeric,LinearOperator' x %*% 
y"},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"x Left operand. y Right operand.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/LinearOperator-math.html","id":"functions","dir":"Reference","previous_headings":"","what":"Functions","title":"LinearOperator multiplication helpers — LinearOperator-math","text":"x %*% y: Multiply LinearOperator dense matrix x %*% y: Multiply dense matrix LinearOperator x %*% y: Multiply LinearOperator numeric vector x %*% y: Multiply numeric vector LinearOperator","code":""},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":null,"dir":"Reference","previous_headings":"","what":"Get/set inputs to a matrix transform — all_matrix_inputs","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"matrix object can either input (.e. file disk raw matrix memory), can represent delayed operation one matrices. all_matrix_inputs() getter setter functions allow accessing base-level input matrices list, changing . useful want re-locate data disk without losing transformed BPCells matrix. 
(Note: experimental API; potentially subject revisions).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"","code":"all_matrix_inputs(x) all_matrix_inputs(x) <- value"},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"x IterableMatrix value List IterableMatrix objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/all_matrix_inputs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get/set inputs to a matrix transform — all_matrix_inputs","text":"List IterableMatrix objects. matrix m input object, all_matrix_inputs(m) return list(m).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":null,"dir":"Reference","previous_headings":"","what":"Apply a function to summarize rows/cols — apply_by_row","title":"Apply a function to summarize rows/cols — apply_by_row","text":"Apply custom R function row/col BPCells matrix. run slower builtin C++-backed functions, keep memory benefits disk-backed operations.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Apply a function to summarize rows/cols — apply_by_row","text":"","code":"apply_by_row(mat, fun, ...) apply_by_col(mat, fun, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Apply a function to summarize rows/cols — apply_by_row","text":"mat IterableMatrix object fun function(val, row, col) takes row/col values returns summary output. 
Argument details: val - Vector length (# non-zero values) value non-zero matrix entry row - one-based row index (apply_by_col: vector length (# non-zero values), apply_by_row: single integer) col - one-based col index (apply_by_col: single integer, apply_by_row: vector length (# non-zero values)) ... - Optional additional arguments (named row, col, val) ... Optional additional arguments passed fun","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Apply a function to summarize rows/cols — apply_by_row","text":"apply_by_row - list length nrow(matrix) results returned fun() row apply_by_col - list length ncol(matrix) results returned fun() col","code":""},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Apply a function to summarize rows/cols — apply_by_row","text":"functions require row-major matrix storage apply_by_row col-major storage apply_by_col, matrices stored wrong order may need re-ordered copy created using transpose_storage_order() first. required able keep memory-usage low allow calculating result single streaming pass input matrix. 
vector/matrix outputs desired instead lists, calling unlist(x) .call(cbind, x) .call(rbind, x) can convert list output.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/apply_by_row.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Apply a function to summarize rows/cols — apply_by_row","text":"","code":"mat <- matrix(rbinom(40, 1, 0.5) * sample.int(5, 40, replace = TRUE), nrow = 4) rownames(mat) <- paste0(\"gene\", 1:4) mat #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] #> gene1 0 0 2 0 0 0 3 3 0 2 #> gene2 1 0 3 0 0 3 4 1 1 2 #> gene3 2 0 1 0 0 2 0 0 0 0 #> gene4 0 0 0 0 3 0 1 0 0 0 mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") ####################################################################### ## apply_by_row() example ####################################################################### ## Get mean of every row ## expect an error in the case that col-major matrix is passed apply_by_row(mat, function(val, row, col) {sum(val) / nrow(mat)}) %>% unlist() #> Error in apply_by_row(mat, function(val, row, col) { sum(val)/nrow(mat)}): Cannot call apply_by_row on a col-major matrix. 
Please call transpose_storage_order() first ## Need to transpose matrix to make sure it is in row-order mat_row_order <- transpose_storage_order(mat) ## works as expected for row major apply_by_row(mat_row_order, function(val, row, col) sum(val) / ncol(mat_row_order) ) %>% unlist() #> [1] 1.0 1.5 0.5 0.4 # Also analogous to running rowMeans() without names rowMeans(mat) #> gene1 gene2 gene3 gene4 #> 1.0 1.5 0.5 0.4 ####################################################################### ## apply_by_col() example ####################################################################### ## Get argmax of every col apply_by_col(mat, function(val, row, col) if (length(val) > 0) row[which.max(val)] else 1L ) %>% unlist() #> [1] 3 1 2 1 4 2 2 1 2 1"},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert matrix elements to zeros and ones — binarize","title":"Convert matrix elements to zeros and ones — binarize","text":"Binarize compares matrix element values threshold value sets output elements either zero one. default, element values greater threshold set one; otherwise, set zero. strict_inequality set FALSE, element values greater equal threshold set one. alternative, <, <=, >, >= operators also supported.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert matrix elements to zeros and ones — binarize","text":"","code":"binarize(mat, threshold = 0, strict_inequality = TRUE)"},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert matrix elements to zeros and ones — binarize","text":"mat IterableMatrix threshold numeric value determines whether elements x set zero one. 
strict_inequality logical value determining whether comparison threshold >= (strict_inequality=FALSE) > (strict_inequality=TRUE).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/binarize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert matrix elements to zeros and ones — binarize","text":"binarized IterableMatrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_macs_peaks.html","id":null,"dir":"Reference","previous_headings":"","what":"Call peaks using MACS2/3 — call_macs_peaks","title":"Call peaks using MACS2/3 — call_macs_peaks","text":"function renamed call_peaks_macs()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_macs_peaks.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call peaks using MACS2/3 — call_macs_peaks","text":"","code":"call_macs_peaks(...)"},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":null,"dir":"Reference","previous_headings":"","what":"Call peaks using MACS2/3 — call_peaks_macs","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"Export pseudobulk bed files input MACS, run MACS read output peaks tibble. step can run independently, allowing quickly re-loading results already completed call, running MACS externally (e.g. via cluster job submission) increased parallelization. 
See details information.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"","code":"call_peaks_macs( fragments, path, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), effective_genome_size = 2.9e+09, insertion_mode = c(\"both\", \"start_only\", \"end_only\"), step = c(\"all\", \"prep-inputs\", \"run-macs\", \"read-outputs\"), macs_executable = NULL, additional_params = \"--call-summits --keep-dup all --shift -75 --extsize 150 --nomodel --nolambda\", verbose = FALSE, threads = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"fragments IterableFragments object path (string) Parent directory store MACS inputs outputs. Inputs stored /input/ outputs /output//. See \"File format\" details cell_groups Grouping vector one entry per cell fragments, e.g. cluster IDs effective_genome_size (numeric) Effective genome size MACS. Default 2.9e9 following MACS default GRCh38. See deeptools values common genomes. insertion_mode (string) fragment ends use coverage calculation. One , start_only, end_only. step (string) step run. One , prep-inputs, run-macs, read-outputs. prep-inputs, create input bed files macs, provides shell script per cell group command run macs. run-macs, also run bash scripts execute macs. read-outputs, read outputs tibbles. macs_executable (string) Path either MACS2/3 executable. Default (NULL) autodetect PATH. additional_params (string) Additional parameters pass MACS2/3. verbose (bool) Whether provide verbose output MACS. used step run-macs . 
threads (int) Number threads use.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"step prep-inputs, return script paths cell group given character vector. step run-macs, return NULL. step read-outputs , returns tibble peaks cell group concatenated. Columns chr, start, end, group, name, score, strand, fold_enrichment, log10_pvalue, log10_qvalue, summit_offset","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"File format: Inputs written bed file used input MACS, well shell file containing call MACS written cell group. Bed files containing chr, start, end coordinates insertions written /input/.bed.gz. Shell commands run MACS manually written /input/.sh. Outputs written output directory subdirectory cell group. cell group's output directory contains file narrowPeaks, peaks, summits. NarrowPeaks written /output//_peaks.narrowPeak. Peaks written /output//_peaks.xls. Summits written /output//_summits.bed. narrowPeaks file read tibble returned. information outputs MACS, visit MACS docs Performance: Running 2600 cell dataset taking start end insertions account, written input bedfiles MACS outputs used 364 MB 158 MB space respectively. 4 threads, running function end end took 74 seconds, 61 seconds spent running MACS. Running MACS manually: run MACS manually, first run call_peaks_macs() step=\"prep-inputs\". , manually run shell scripts generated /input/.sh. 
Finally, run call_peaks_macs() original arguments, setting step=\"read-outputs\".","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_macs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Call peaks using MACS2/3 — call_peaks_macs","text":"","code":"macs_files <- file.path(tempdir(), \"peaks\") frags <- get_demo_frags() head(call_peaks_macs(frags, macs_files)) #> # A tibble: 6 × 11 #> chr start end group name score strand fold_enrichment log10_pvalue #> #> 1 chr11 175907 176057 all all_peak_1 22 . 2.95 3.29 #> 2 chr11 179864 180015 all all_peak_2 33 . 3.68 4.44 #> 3 chr11 180095 180352 all all_peak_3 13 . 2.21 2.23 #> 4 chr11 184430 184599 all all_peak_4 33 . 3.68 4.44 #> 5 chr11 188061 188273 all all_peak_5 56 . 5.16 6.97 #> 6 chr11 189522 189672 all all_peak_6 33 . 3.68 4.44 #> # ℹ 2 more variables: log10_qvalue , summit_offset ## Can also just run the input prep, then run macs manually ## by setting step to 'prep-inputs' macs_script <- call_peaks_macs(frags, macs_files, step = \"prep-inputs\") system2(\"bash\", macs_script[1], stdout = FALSE, stderr = FALSE) ## Then read the narrow peaks files list.files(file.path(macs_files, \"output\", \"all\")) #> [1] \"all_peaks.narrowPeak\" \"all_peaks.xls\" \"all_summits.bed\" #> [4] \"log.txt\" ## call_peaks_macs() can also solely perform the output reading step head(call_peaks_macs(frags, macs_files, step = \"read-outputs\")) #> # A tibble: 6 × 11 #> chr start end group name score strand fold_enrichment log10_pvalue #> #> 1 chr11 175907 176057 all all_peak_1 22 . 2.95 3.29 #> 2 chr11 179864 180015 all all_peak_2 33 . 3.68 4.44 #> 3 chr11 180095 180352 all all_peak_3 13 . 2.21 2.23 #> 4 chr11 184430 184599 all all_peak_4 33 . 3.68 4.44 #> 5 chr11 188061 188273 all all_peak_5 56 . 5.16 6.97 #> 6 chr11 189522 189672 all all_peak_6 33 . 
3.68 4.44 #> # ℹ 2 more variables: log10_qvalue , summit_offset "},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":null,"dir":"Reference","previous_headings":"","what":"Call peaks from tiles — call_peaks_tile","title":"Call peaks from tiles — call_peaks_tile","text":"Calling peaks pre-set list tiles can much faster using dedicated peak-calling software like macs3. resulting peaks less precise terms exact coordinates, sufficient analyses.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call peaks from tiles — call_peaks_tile","text":"","code":"call_peaks_tile( fragments, chromosome_sizes, cell_groups = rep.int(\"all\", length(cellNames(fragments))), effective_genome_size = NULL, peak_width = 200, peak_tiling = 3, fdr_cutoff = 0.01, merge_peaks = c(\"all\", \"group\", \"none\") )"},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Call peaks from tiles — call_peaks_tile","text":"fragments IterableFragments object chromosome_sizes Chromosome start end coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position See read_ucsc_chrom_sizes(). cell_groups Grouping vector one entry per cell fragments, e.g. cluster IDs effective_genome_size (Optional) effective genome size poisson background rate estimation. See deeptools values common genomes. Defaults sum chromosome sizes, overestimates peak significance peak_width Width candidate peaks peak_tiling Number candidate peaks overlapping base genome. E.g. 
peak_width = 300 peak_tiling = 3 results candidate peaks 300bp spaced 100bp apart fdr_cutoff Adjusted p-value significance cutoff merge_peaks merge significant peaks merge_peaks_iterative() \"\" Merge full set peaks \"group\" Merge peaks within group \"none\" perform merging","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Call peaks from tiles — call_peaks_tile","text":"tibble peak calls following columns: chr, start, end: genome coordinates group: group ID peak identified p_val, q_val: Poisson p-value BH-corrected p-value enrichment: Enrichment counts peak compared genome-wide background","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Call peaks from tiles — call_peaks_tile","text":"Peak calling steps: Estimate genome-wide expected insertions per tile based peak_width, effective_genome_size, per-group read counts Tile genome nonoverlapping tiles size peak_width tile group, calculate p_value based Poisson model Compute adjusted p-values using BH method using total number tiles number hypotheses tested. 
Repeat steps 2-4 peak_tiling times, evenly spaced offsets merge_peaks \"\" \"group\": use merge_peaks_iterative() within group keep significant overlapping candidate peaks merge_peaks \"\", perform final round merge_peaks_iterative(), prioritizing peak within-group significance rank","code":""},{"path":"https://bnprks.github.io/BPCells/reference/call_peaks_tile.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Call peaks from tiles — call_peaks_tile","text":"","code":"## Prep data reference_dir <- file.path(tempdir(), \"references\") frags <- get_demo_frags() ## Remove blacklist regions from fragments blacklist <- read_encode_blacklist(reference_dir, genome=\"hg38\") frags_filter_blacklist <- select_regions(frags, blacklist, invert_selection = TRUE) chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) ## Call peaks call_peaks_tile(frags_filter_blacklist, chrom_sizes, effective_genome_size = 2.8e9) #> # A tibble: 73,160 × 7 #> chr start end group p_val q_val enrichment #> #> 1 chr11 65615400 65615600 all 0 0 6764. #> 2 chr4 2262266 2262466 all 0 0 6422. #> 3 chr11 119057200 119057400 all 0 0 6188. #> 4 chr11 695133 695333 all 0 0 6180. #> 5 chr11 2400400 2400600 all 0 0 6166. #> 6 chr4 1346933 1347133 all 0 0 6109. #> 7 chr11 3797600 3797800 all 0 0 6017. #> 8 chr11 64878600 64878800 all 0 0 5948. #> 9 chr11 57667733 57667933 all 0 0 5946. #> 10 chr11 83156933 83157133 all 0 0 5913. 
#> # ℹ 73,150 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate the MD5 checksum of an IterableMatrix — checksum","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"Calculate MD5 checksum IterableMatrix return checksum hexadecimal format.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"","code":"checksum(matrix)"},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"matrix IterableMatrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"MD5 checksum string hexadecimal format.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"checksum() converts non-zero elements sparse input matrix double precision, concatenates element value element row column index words, uses 16-byte blocks along matrix dimensions row column names calculate checksum. checksum value depends storage order column- row-order matrices element values give different checksum values. checksum() uses element index values little-endian CPU storage order. 
converts little-endian order big-endian architecture although tested.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/checksum.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate the MD5 checksum of an IterableMatrix — checksum","text":"","code":"library(Matrix) library(BPCells) m1 <- matrix(seq(1,12), nrow=3) m2 <- as(m1, 'dgCMatrix') m3 <- as(m2, 'IterableMatrix') checksum(m3) #> [1] \"8a6bf37ef376f7d74b4642a2ed0fc58d\""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster an adjacency matrix — cluster_graph_leiden","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Cluster adjacency matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"","code":"cluster_graph_leiden( snn, resolution = 1, objective_function = c(\"modularity\", \"CPM\"), seed = 12531, ... ) cluster_graph_louvain(snn, resolution = 1, seed = 12531) cluster_graph_seurat(snn, resolution = 0.8, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"snn Symmetric adjacency matrix (dgCMatrix) output e.g. knn_to_snn_graph() knn_to_geodesic_graph(). lower triangle used resolution Resolution parameter. Higher values result clusters objective_function Graph statistic optimize clustering. Modularity default keeps resolution independent dataset size (see details ). meaning option, see igraph::cluster_leiden(). seed Random seed clustering initialization ... 
Additional arguments underlying clustering function","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Factor vector containing cluster assignment cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"cluster_graph_leiden: Leiden clustering algorithm igraph::cluster_leiden(). Note using objective_function = \"CPM\" number clusters empirically scales cells * resolution, 1e-3 good resolution 10k cells, 1M cells better 1e-5 resolution. resolution 1 good default objective_function = \"modularity\" per default. cluster_graph_louvain: Louvain graph clustering algorithm igraph::cluster_louvain() cluster_graph_seurat: Seurat's clustering algorithm Seurat::FindClusters()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"Take cell embedding matrix, find k nearest neighbors (KNN) cell, convert KNN graph (adjacency matrix), run graph-based clustering algorithm. 
steps can customized passing function performs step (see details).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"","code":"cluster_cells_graph( mat, knn_method = knn_hnsw, knn_to_graph_method = knn_to_geodesic_graph, cluster_graph_method = cluster_graph_leiden, threads = 0L, verbose = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"mat (matrix) Cell embeddings matrix shape (cells x n_embeddings) knn_method (function) Function takes embedding matrix first argument returns k nearest neighbors (KNN) object. example, knn_hnsw(), knn_annoy(), parameterized version (see Details). knn_to_graph_method (function) Function takes KNN object returns graph undirected graph (lower-triangular dgCMatrix adjacency matrix). example, knn_to_graph(), knn_to_snn_graph(), knn_to_geodesic_graph(), parameterized version (see Details). cluster_graph_method (function) Function takes undirected graph cell similarity returns factor cluster assignments cell. example, cluster_graph_leiden(), cluster_graph_louvain(), cluster_graph_seurat(), parameterized version (see Details). threads (integer) Number threads use knn_method, knn_to_graph_method cluster_graph_method. functions utilize threads argument, silently ignored. verbose (logical) Whether print progress information knn_method, knn_to_graph_method cluster_graph_method. 
functions utilize verbose argument, silently ignored.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"(factor) Factor vector containing cluster assignment cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_cells_graph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster cell embeddings using a KNN graph-based algorithm — cluster_cells_graph","text":"Customizing clustering steps BPCells functions named like knn_*, knn_to_graph_*, cluster_graph_* support customizing parameters via partial function application. example, look 20 neighbors k nearest neighbors search, setting knn_method=knn_hnsw(k=20) convenient shortcut knn_method=function(x) knn_hnsw(x, k=20). Similarly, lowering default clustering resolution can done cluster_graph_method=cluster_graph_louvain(resolution=0.5). works functions written return partially parameterized copy function object first argument missing. even advanced customization, users can manually call knn, knn_to_graph, cluster_graph methods rather using cluster_cells_graph() convenient wrapper. Implementing custom clustering steps required interfaces step follows: knn_method: First argument matrix cell embeddings, shape (cells x n_embeddings). Returns named list two matrices dimension (cells x k): idx: Neighbor indices, idx[c, n] index nth nearest neighbor cell c. dist: Neighbor distances, dist[c, n] distance cell c nth nearest neighbor. Self-neighbors allowed, sufficient search effort idx[c,1] == c nearly cells. knn_to_graph_method: First argument KNN object returned knn_method. Returns weighted similarity graph lower triangular sparse adjacency matrix (dgCMatrix). cells j, similarity score adjacency_mat[max(,j), min(,j)]. 
cluster_graph_method: First argument weighted similarity graph returned knn_to_graph_method. Returns factor vector length cells cluster assignment cell.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster an adjacency matrix — cluster_graph_leiden","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Cluster adjacency matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"","code":"cluster_graph_leiden( mat, resolution = 1, objective_function = c(\"modularity\", \"CPM\"), seed = 12531, ... ) cluster_graph_louvain(mat, resolution = 1, seed = 12531) cluster_graph_seurat(mat, resolution = 0.8, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"mat Symmetric adjacency matrix (dgCMatrix) output e.g. knn_to_snn_graph() knn_to_geodesic_graph(). lower triangle used. resolution Resolution parameter. Higher values result clusters objective_function Graph statistic optimize clustering. Modularity default keeps resolution independent dataset size (see details ). meaning option, see igraph::cluster_leiden(). seed Random seed clustering initialization ... 
Additional arguments underlying clustering function","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"Factor vector containing cluster assignment cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_graph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster an adjacency matrix — cluster_graph_leiden","text":"cluster_graph_leiden: Leiden clustering algorithm igraph::cluster_leiden(). Note using objective_function = \"CPM\" number clusters empirically scales cells * resolution, 1e-3 good resolution 10k cells, 1M cells better 1e-5 resolution. resolution 1 good default objective_function = \"modularity\" per default. cluster_graph_louvain: Louvain graph clustering algorithm igraph::cluster_louvain() cluster_graph_seurat: Seurat's clustering algorithm Seurat::FindClusters()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert grouping vector to sparse matrix — cluster_membership_matrix","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"Converts vector membership IDs sparse matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"","code":"cluster_membership_matrix(groups, group_order = NULL)"},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"groups Vector one entry per cell, specifying cell's 
group group_order Optional vector listing ordering groups","code":""},{"path":"https://bnprks.github.io/BPCells/reference/cluster_membership_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert grouping vector to sparse matrix — cluster_membership_matrix","text":"cell x group matrix entry 1 cell given group","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":null,"dir":"Reference","previous_headings":"","what":"Collect features for plotting — collect_features","title":"Collect features for plotting — collect_features","text":"Helper function data features plot diverse set data sources.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Collect features for plotting — collect_features","text":"","code":"collect_features( source, features = NULL, gene_mapping = human_gene_mapping, n = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Collect features for plotting — collect_features","text":"source Matrix data frame pull features , vector feature values single feature. matrix, features must rows. features Character vector features names plot source vector. gene_mapping optional vector gene name matching match_gene_symbol(). Ignored source data frame. n Internal-use parameter marking number nested calls. 
used finding name \"source\" input variable caller's perspective","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Collect features for plotting — collect_features","text":"Data frame one column feature requested","code":""},{"path":"https://bnprks.github.io/BPCells/reference/collect_features.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Collect features for plotting — collect_features","text":"source data.frame, features drawn columns. source matrix object (IterableMatrix, dgCMatrix, matrix), features drawn rows.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/concat_dimnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function for rbind/cbind concatenating dimnames — concat_dimnames","title":"Helper function for rbind/cbind concatenating dimnames — concat_dimnames","text":"Helper function rbind/cbind concatenating dimnames","code":""},{"path":"https://bnprks.github.io/BPCells/reference/concat_dimnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function for rbind/cbind concatenating dimnames — concat_dimnames","text":"","code":"concat_dimnames(x, y, len_x, len_y, warning_prefix, dim_type)"},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert the type of a matrix — convert_matrix_type","title":"Convert the type of a matrix — convert_matrix_type","text":"Convert type matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert the type of a matrix — convert_matrix_type","text":"","code":"convert_matrix_type(matrix, type = c(\"uint32_t\", \"double\", 
\"float\"))"},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert the type of a matrix — convert_matrix_type","text":"matrix IterableMatrix object input type One uint32_t (unsigned 32-bit integer), float (32-bit real number), double (64-bit real number)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert the type of a matrix — convert_matrix_type","text":"IterableMatrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/convert_matrix_type.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert the type of a matrix — convert_matrix_type","text":"","code":"mat <- matrix(rnorm(50), nrow = 10, ncol = 5) rownames(mat) <- paste0(\"gene\", seq_len(10)) colnames(mat) <- paste0(\"cell\", seq_len(5)) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") mat #> 10 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory convert_matrix_type(mat, \"float\") #> 10 x 5 IterableMatrix object with class ConvertMatrixType #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: float #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory #> 2. 
Convert type from double to float"},{"path":"https://bnprks.github.io/BPCells/reference/create_partial.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper to create partial functions — create_partial","title":"Helper to create partial functions — create_partial","text":"Automatically creates partial application caller function including non-missing arguments.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/create_partial.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper to create partial functions — create_partial","text":"","code":"create_partial()"},{"path":"https://bnprks.github.io/BPCells/reference/create_partial.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Helper to create partial functions — create_partial","text":"bpcells_partial object (function extra attributes)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve BPCells demo data — get_demo_mat","title":"Retrieve BPCells demo data — get_demo_mat","text":"Functions download matrices fragments derived 10X Genomics PBMC 3k dataset, options filter common qc metrics, subset genes fragments chromosome 4 11.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve BPCells demo data — get_demo_mat","text":"","code":"get_demo_mat(filter_qc = TRUE, subset = TRUE) get_demo_frags(filter_qc = TRUE, subset = TRUE) remove_demo_data()"},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve BPCells demo data — get_demo_mat","text":"filter_qc (bool) Whether filter RNA ATAC data using qc metrics (described details). 
subset (bool) Whether subset genes/insertions chromosome 4 11.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve BPCells demo data — get_demo_mat","text":"get_demo_mat(): (IterableMatrix) (features x cells) matrix. get_demo_frags(): (IterableFragments) Fragments object. remove_demo_data(): NULL","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Retrieve BPCells demo data — get_demo_mat","text":"data functions experimental. interface, well demo dataset likely undergo changes near future. Data Processing: first time either get_demo_mat(), get_demo_frags(), ran demo data downloaded stored BPCells data directory (file.path(tools::R_user_dir(\"BPCells\", =\"data\"), \"demo_data\")). Subsequent calls function use previously downloaded matrix/fragments, given combination filtering subsetting performed previously. preparation matrix can reproduced running internal function prepare_demo_data() directory set BPCells data directory. case demo data pre-downloaded demo data download fails, prepare_demo_data() act fallback. matrix get_demo_mat() fragments get_demo_frags() may removed running remove_demo_data(). Filtering using QC information fragments matrix object chooses cells least 1000 reads, 1000 frags, minimum tss enrichment 10. Subsetting provides genes insertions chromosomes 4 11. Dimensions: Data size: Function Description: get_demo_mat(): Retrieve demo IterableMatrix object representing 10X Genomics PBMC 3k dataset. get_demo_frags(): Retrieve demo IterableFragments object representing 10X Genomics PBMC 3k dataset. 
remove_demo_data(): Remove demo data BPCells data directory.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/demo_data.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve BPCells demo data — get_demo_mat","text":"","code":"####################################################################### ## get_demo_mat() example ####################################################################### get_demo_mat() #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /home/imman/.local/share/R/BPCells/demo_data/demo_mat_filtered_subsetted ####################################################################### ## get_demo_frags() example ####################################################################### get_demo_frags() #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered_subsetted ####################################################################### ## remove_demo_data() example ####################################################################### remove_demo_data() ## Demo data folder is now empty data_dir <- file.path(tools::R_user_dir(\"BPCells\", which = \"data\"), \"demo_data\") list.files(data_dir) #> character(0)"},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine ggplot track plots into an aligned grid. 
— draw_trackplot_grid","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"function renamed trackplot_combine().","code":""},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"","code":"draw_trackplot_grid( ..., labels, title = NULL, heights = rep(1, length(plots)), label_width = 0.2, label_style = list(fontface = \"bold\", size = 4) )"},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"... Plots order top bottom, generally plain ggplots. better accommodate many bulk tracks, patchwork objects contain multiple tracks also accepted. case, plot labels drawn attribute $patchwork$labels present, rather labels argument. labels Text labels display track title Text overarching title plot heights Relative heights component plot. suggested use 1 standard height pseudobulk track. label_width Fraction width used labels relative main track area label_style Arguments pass geom_text adjust label text style","code":""},{"path":"https://bnprks.github.io/BPCells/reference/draw_trackplot_grid.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine ggplot track plots into an aligned grid. — draw_trackplot_grid","text":"plot object aligned genome plots. aligned row text label, y-axis, plot body. relative height row given heights. 
shared title x-axis put top.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/ensure_downloaded.html","id":null,"dir":"Reference","previous_headings":"","what":"Download a file with a custom timeout — ensure_downloaded","title":"Download a file with a custom timeout — ensure_downloaded","text":"Download file custom timeout","code":""},{"path":"https://bnprks.github.io/BPCells/reference/ensure_downloaded.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download a file with a custom timeout — ensure_downloaded","text":"","code":"ensure_downloaded(path, backup_url, timeout)"},{"path":"https://bnprks.github.io/BPCells/reference/ensure_downloaded.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download a file with a custom timeout — ensure_downloaded","text":"path Output path write file timeout timeout seconds url download ","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Extend genome ranges in a strand-aware fashion. — extend_ranges","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"Extend genome ranges strand-aware fashion.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"","code":"extend_ranges( ranges, upstream = 0, downstream = 0, metadata_cols = c(\"strand\"), chromosome_sizes = NULL, zero_based_coords = !is(ranges, \"GRanges\") )"},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"ranges Genomic regions given GRanges, data.frame, list. 
See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position upstream Number bases extend range upstream (negative shrink width) downstream Number bases extend range downstream (negative shrink width) metadata_cols Optional list metadata columns require & extract chromosome_sizes (optional) Size chromosomes genomic-ranges object zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extend genome ranges in a strand-aware fashion. — extend_ranges","text":"Note ranges blocked extending past beginning chromosome (base 0), chromosome_sizes given also blocked extending past end chromosome","code":""},{"path":"https://bnprks.github.io/BPCells/reference/extend_ranges.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extend genome ranges in a strand-aware fashion. 
— extend_ranges","text":"","code":"## Prep data ranges <- tibble::tibble( chr = \"chr1\", start = seq(50, 4050, 1000), end = start + 50, strand = \"+\" ) ranges #> # A tibble: 5 × 4 #> chr start end strand #> #> 1 chr1 50 100 + #> 2 chr1 1050 1100 + #> 3 chr1 2050 2100 + #> 4 chr1 3050 3100 + #> 5 chr1 4050 4100 + ## Extend ranges 1 bp upstream, 1 bp downstream extend_ranges(ranges, upstream = 1, downstream = 1) #> # A tibble: 5 × 4 #> chr start end strand #> #> 1 chr1 49 101 TRUE #> 2 chr1 1049 1101 TRUE #> 3 chr1 2049 2101 TRUE #> 4 chr1 3049 3101 TRUE #> 5 chr1 4049 4101 TRUE"},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":null,"dir":"Reference","previous_headings":"","what":"Get footprints around a set of genomic coordinates — footprint","title":"Get footprints around a set of genomic coordinates — footprint","text":"Get footprints around set genomic coordinates","code":""},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get footprints around a set of genomic coordinates — footprint","text":"","code":"footprint( fragments, ranges, zero_based_coords = !is(ranges, \"GRanges\"), cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), cell_weights = rlang::rep_along(cell_groups, 1), flank = 125L, normalization_width = flank%/%10L )"},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get footprints around a set of genomic coordinates — footprint","text":"fragments IterableFragments object ranges Footprint centers given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand \"+\" strand ranges footprint around start coordinate, \"-\" strand ranges around end coordinate. 
zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range cell_groups Character factor assigning group cell, order cellNames(fragments) cell_weights Numeric vector assigning weight factors (e.g. inverse total reads) cell, order cellNames(fragments) flank Number flanking basepairs include either side motif normalization_width Number basepairs upstream + downstream extremes use calculating enrichment","code":""},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get footprints around a set of genomic coordinates — footprint","text":"tibble::tibble() columns group, position, count, enrichment","code":""},{"path":"https://bnprks.github.io/BPCells/reference/footprint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get footprints around a set of genomic coordinates — footprint","text":"","code":"## Prep data frags <- get_demo_frags() ## Motif positions taken from taking a subset of GATA1 motifs ## positions in peaks using motifmatchr ## See basic tutorial for description of generating ## positions motif_positions <- tibble::tibble( chr = rep(\"chr4\", 3), start = c(338237, 498344, 499851), end = c(338247, 498354, 499861), strand = c(\"-\", \"+\", \"+\"), score = c(8.1422, 8.1415, 9.59462) ) ## Run footprinting footprint(frags, motif_positions) #> # A tibble: 251 × 4 #> group position count enrichment #> #> 1 all -125 0 0 #> 2 all -124 1 2 #> 3 all -123 0 0 #> 4 all -122 0 0 #> 5 all -121 2 4 #> 6 all -120 0 0 #> 7 all -119 1 2 #> 8 all -118 0 0 #> 9 all -117 0 0 #> 10 all -116 1 2 #> # ℹ 241 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert between BPCells fragments and R objects. 
— convert_to_fragments","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"BPCells fragments can interconverted GRanges data.frame R objects. main conversion method R's builtin () function, though convert_to_fragments() helper also available. R objects except GRanges, BPCells assumes 0-based, end-exclusive coordinate system. (See genomic-ranges-like reference details)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"","code":"# Convert from R to BPCells convert_to_fragments(x, zero_based_coords = !is(x, \"GRanges\")) as(x, \"IterableFragments\") # Convert from BPCells to R as.data.frame(bpcells_fragments) as(bpcells_fragments, \"data.frame\") as(bpcells_fragments, \"GRanges\")"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"x Fragment coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position cell_id: cell barcodes unique identifiers string factor zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert between BPCells fragments and R objects. 
— convert_to_fragments","text":"convert_to_fragments(): IterableFragments object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_R_conversion.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert between BPCells fragments and R objects. — convert_to_fragments","text":"","code":"frags_table <- tibble::tibble( chr = paste0(\"chr\", 1:10), start = 0, end = 5, cell_id = \"cell1\" ) frags_table #> # A tibble: 10 × 4 #> chr start end cell_id #> #> 1 chr1 0 5 cell1 #> 2 chr2 0 5 cell1 #> 3 chr3 0 5 cell1 #> 4 chr4 0 5 cell1 #> 5 chr5 0 5 cell1 #> 6 chr6 0 5 cell1 #> 7 chr7 0 5 cell1 #> 8 chr8 0 5 cell1 #> 9 chr9 0 5 cell1 #> 10 chr10 0 5 cell1 frags_granges <- GenomicRanges::makeGRangesFromDataFrame( frags_table, keep.extra.columns = TRUE ) frags_granges #> GRanges object with 10 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 0-5 * | cell1 #> [2] chr2 0-5 * | cell1 #> [3] chr3 0-5 * | cell1 #> [4] chr4 0-5 * | cell1 #> [5] chr5 0-5 * | cell1 #> [6] chr6 0-5 * | cell1 #> [7] chr7 0-5 * | cell1 #> [8] chr8 0-5 * | cell1 #> [9] chr9 0-5 * | cell1 #> [10] chr10 0-5 * | cell1 #> ------- #> seqinfo: 10 sequences from an unspecified genome; no seqlengths ####################################################################### ## convert_to_fragments() example ####################################################################### frags <- convert_to_fragments(frags_granges) frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 10 chromosomes with names chr1, chr2 ... chr10 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ####################################################################### ## as(x, \"IterableFragments\") example ####################################################################### frags <- as(frags_table, \"IterableFragments\") frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 10 chromosomes with names chr1, chr10 ... chr9 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory ####################################################################### ## as(bpcells_fragments, \"data.frame\") example ####################################################################### frags_table <- as(frags, \"data.frame\") frags_table #> chr start end cell_id #> 1 chr1 0 5 cell1 #> 2 chr10 0 5 cell1 #> 3 chr2 0 5 cell1 #> 4 chr3 0 5 cell1 #> 5 chr4 0 5 cell1 #> 6 chr5 0 5 cell1 #> 7 chr6 0 5 cell1 #> 8 chr7 0 5 cell1 #> 9 chr8 0 5 cell1 #> 10 chr9 0 5 cell1 ####################################################################### ## as(bpcells_fragments, \"GRanges\") example ####################################################################### frags_granges <- as(frags, \"GRanges\") frags_granges #> GRanges object with 10 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 1-5 * | cell1 #> [2] chr10 1-5 * | cell1 #> [3] chr2 1-5 * | cell1 #> [4] chr3 1-5 * | cell1 #> [5] chr4 1-5 * | cell1 #> [6] chr5 1-5 * | cell1 #> [7] chr6 1-5 * | cell1 #> [8] chr7 1-5 * | cell1 #> [9] chr8 1-5 * | cell1 #> [10] chr9 1-5 * | cell1 #> ------- #> seqinfo: 10 sequences from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write BPCells fragment objects — write_fragments_memory","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"BPCells fragments can read/written compressed (bitpacked) 
uncompressed form variety storage locations: memory (R object), hdf5 file, directory disk (containing binary files).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"","code":"write_fragments_memory(fragments, compress = TRUE) write_fragments_dir( fragments, dir, compress = TRUE, buffer_size = 1024L, overwrite = FALSE ) open_fragments_dir(dir, buffer_size = 1024L) write_fragments_hdf5( fragments, path, group = \"fragments\", compress = TRUE, buffer_size = 8192L, chunk_size = 1024L, overwrite = FALSE, gzip_level = 0L ) open_fragments_hdf5(path, group = \"fragments\", buffer_size = 16384L)"},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"fragments Input fragments object compress Whether compress data. compression, storage size half size gzip-compressed 10x fragments file. dir Directory read/write data buffer_size performance tuning . number items buffered memory calling writes disk. overwrite TRUE, write temp dir overwrite existing data. Alternatively, pass temp path string customize temp dir location. path Path hdf5 file disk group group within hdf5 file write data . writing existing hdf5 file group must already use chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression). recommended compression compatibility outside programs required. 
Otherwise, using compress=TRUE recommended >10x faster often similar compression levels.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"Fragment object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"Saving directory disk good default local analysis, provides best I/O performance lowest memory usage. HDF5 format allows saving within existing hdf5 files group data together, memory format provides fastest performance event memory usage unimportant.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragment_io.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write BPCells fragment objects — write_fragments_memory","text":"","code":"## Create temporary directory to keep demo fragments data_dir <- file.path(tempdir(), \"frags\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) ## Get demo frags loaded from disk frags <- get_demo_frags() frags #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. 
Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered_subsetted ####################################################################### ## write_fragments_memory() example ####################################################################### frags_memory <- write_fragments_memory(frags) frags_memory #> IterableFragments object of class \"PackedMemFragments\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from memory ####################################################################### ## write_fragments_dir() example ####################################################################### frags <- write_fragments_dir( frags_memory, file.path(data_dir, \"demo_frags\"), overwrite = TRUE ) frags #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from directory /tmp/RtmpCiGY9C/frags/demo_frags ####################################################################### ## open_fragments_dir() example ####################################################################### frags <- open_fragments_dir(file.path(data_dir, \"demo_frags\")) frags #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. 
Read compressed fragments from directory /tmp/RtmpCiGY9C/frags/demo_frags ####################################################################### ## write_fragments_hdf5() example ####################################################################### frags_hdf5 <- write_fragments_hdf5( frags, file.path(data_dir, \"demo_frags.h5\"), overwrite = TRUE ) frags_hdf5 #> IterableFragments object of class \"FragmentsHDF5\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. Read compressed fragments from /tmp/RtmpCiGY9C/frags/demo_frags.h5, group fragments ####################################################################### ## open_fragments_hdf5() example ####################################################################### frags_hdf5 <- open_fragments_hdf5(file.path(data_dir, \"demo_frags.h5\")) frags_hdf5 #> IterableFragments object of class \"FragmentsHDF5\" #> #> Cells: 2600 cells with names TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> Chromosomes: 2 chromosomes with names chr4, chr11 #> #> Queued Operations: #> 1. 
Read compressed fragments from /tmp/RtmpCiGY9C/frags/demo_frags.h5, group fragments"},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if two fragments objects are identical — fragments_identical","title":"Check if two fragments objects are identical — fragments_identical","text":"Check two fragments objects identical","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if two fragments objects are identical — fragments_identical","text":"","code":"fragments_identical(fragments1, fragments2)"},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if two fragments objects are identical — fragments_identical","text":"fragments1 First IterableFragments compare fragments2 Second IterableFragments compare","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if two fragments objects are identical — fragments_identical","text":"boolean whether fragments objects identical","code":""},{"path":"https://bnprks.github.io/BPCells/reference/fragments_identical.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if two fragments objects are identical — fragments_identical","text":"","code":"## Prep data frags_1 <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags_1 #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory frags_2_identical <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags_3_different <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(4,2))) ) %>% convert_to_fragments() ## In the case of mismatching cell ids fragments_identical(frags_1, frags_3_different) #> [1] FALSE ## In the case of two identical frag objects fragments_identical(frags_1, frags_2_identical) #> [1] TRUE"},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":null,"dir":"Reference","previous_headings":"","what":"Gene Symbol Mapping data — human_gene_mapping","title":"Gene Symbol Mapping data — human_gene_mapping","text":"Mapping canonical gene symbols corresponding unambiguous alias, previous symbol, ensembl ID, entrez ID.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gene Symbol Mapping data — human_gene_mapping","text":"","code":"human_gene_mapping mouse_gene_mapping"},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Gene Symbol Mapping data — human_gene_mapping","text":"human_gene_mapping named character vector. Names aliases IDs values corresponding canonical gene symbol mouse_gene_mapping named character vector. 
Names aliases IDs values corresponding canonical gene symbol","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Gene Symbol Mapping data — human_gene_mapping","text":"human_gene_mapping http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/non_alt_loci_set.txt mouse_gene_mapping http://www.informatics.jax.org/downloads/reports/MGI_EntrezGene.rpt http://www.informatics.jax.org/downloads/reports/MRK_ENSEMBL.rpt","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Gene Symbol Mapping data — human_gene_mapping","text":"See source code data-raw/human_gene_mapping.R data-raw/mouse_gene_mapping.R exactly mappings made.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_mapping.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gene Symbol Mapping data — human_gene_mapping","text":"","code":"####################################################################### ## human_gene_mapping head(human_gene_mapping) #> 0808y08y 1 1-8D 1-8U 1-Cys 1/2-SBSRNA4 #> \"NFYC-AS1\" \"A1BG\" \"IFITM2\" \"IFITM3\" \"PRDX6\" \"SEC24B-AS1\" ####################################################################### ####################################################################### ## mouse_gene_mapping head(mouse_gene_mapping) #> (ACTbEGFP)10sb (CAM)alpha1B-AR #> \"Tg(CAG-EGFP)1Osb\" \"Tg(CAMalpha1b)7Wjk\" #> (CaMKII)Cre2834 (G2019S) LRRK2 #> \"Tg(Camk2a-cre)2834Lusc\" \"Tg(PDGFB-LRRK2*G2019S)32Hlw\" #> (G93A)Tg+ (H163R) PS-1 YAC #> \"Tg(SOD1*G93A)1Gur\" \"Tg(PSEN1H163R)G9Btla\""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":null,"dir":"Reference","previous_headings":"","what":"Gene symbol matching — match_gene_symbol","title":"Gene symbol matching — 
match_gene_symbol","text":"Correct alias gene symbols, Ensembl IDs, Entrez IDs canonical gene symbols. useful matching gene names different datasets might always use gene naming conventions.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gene symbol matching — match_gene_symbol","text":"","code":"match_gene_symbol(query, subject, gene_mapping = human_gene_mapping) canonical_gene_symbol(query, gene_mapping = human_gene_mapping)"},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Gene symbol matching — match_gene_symbol","text":"query Character vector gene symbols IDs subject Vector gene symbols IDs index gene_mapping Named vector names gene symbols IDs values canonical gene symbols","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Gene symbol matching — match_gene_symbol","text":"match_gene_symbol Integer vector indices v subject[v] corresponds gene symbols query canonical_gene_symbol Character vector canonical gene symbols symbol query","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_matching.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gene symbol matching — match_gene_symbol","text":"","code":"####################################################################### ## match_gene_symbol() example ####################################################################### match_gene_symbol( c(\"CD8\", \"CD4\", \"CD45\"), c(\"ENSG00000081237.19\", \"ENSG00000153563.15\", \"ENSG00000010610.9\", \"ENSG00000288825\") ) #> [1] 2 3 1 ####################################################################### ## canonical_gene_symbol() example 
####################################################################### canonical_gene_symbol(c(\"CD45\", \"CD8\", \"CD4\")) #> CD45 CD8 CD4 #> \"PTPRC\" \"CD8A\" \"CD4\""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":null,"dir":"Reference","previous_headings":"","what":"Find gene region — gene_region","title":"Find gene region — gene_region","text":"Conveniently look region gene gene symbol. value returned function can used region argument trackplot functions trackplot_coverage() trackplot_gene()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find gene region — gene_region","text":"","code":"gene_region( genes, gene_symbol, extend_bp = c(10000, 10000), gene_mapping = human_gene_mapping )"},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find gene region — gene_region","text":"genes Transcript features given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand gene_name: Symbol gene ID gene_symbol Name gene symbol ID extend_bp Bases extend region upstream downstream gene. length 1, extension symmetric. length 2, provide upstream extension downstream extension positive distances. 
gene_mapping Named vector names gene symbols IDs values canonical gene symbols","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find gene region — gene_region","text":"List chr, start, end positions use trackplot functions.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_region.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find gene region — gene_region","text":"","code":"## Prep data genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) ## Get gene region gene_region(genes, \"CD19\", extend_bp = 1e5) #> $chr #> [1] \"chr16\" #> #> $start #> [1] 28831970 #> #> $end #> [1] 29039342 #>"},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"ArchR-style gene activity scores based weighted sum tile according signed distance tile gene body. 
function calculates signed distances according ArchR's default parameters.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"","code":"gene_score_tiles_archr( genes, chromosome_sizes = NULL, tile_width = 500, addArchRBug = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"genes Gene coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand chromosome_sizes (optional) Size chromosomes genomic-ranges object tile_width Size tiles consider addArchRBug Replicate ArchR bug handling nested genes","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"Tibble one range per tile, additional metadata columns gene_idx (row index gene tile corresponds ) distance. Distance signed distance calculated tile smaller start coordinate gene gene + strand, distance negative. 
distance adjacent non-overlapping regions 1bp, counting .","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"ArchR's tile distance algorithm works follows Genes extended 5kb upstream Genes linked tiles 1kb-100kb upstream + downstream, tiles beyond neighboring gene considered","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_score_tiles_archr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate gene-tile distances for ArchR gene activities — gene_score_tiles_archr","text":"","code":"## Prep data directory <- file.path(tempdir(), \"references\") genes <- read_gencode_genes( directory, release = \"42\", annotation_set = \"basic\", ) ## Get gene scores by tile gene_score_tiles_archr( genes ) #> # A tibble: 6,900,314 × 5 #> chr start end gene_idx distance #> #> 1 chr1 0 500 1 -6369 #> 2 chr1 500 1000 1 -5869 #> 3 chr1 1000 1500 1 -5369 #> 4 chr1 1500 2000 1 -4869 #> 5 chr1 2000 2500 1 -4369 #> 6 chr1 2500 3000 1 -3869 #> 7 chr1 3000 3500 1 -3369 #> 8 chr1 3500 4000 1 -2869 #> 9 chr1 4000 4500 1 -2369 #> 10 chr1 4500 5000 1 -1869 #> # ℹ 6,900,304 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate GeneActivityScores — gene_score_weights_archr","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"Gene activity scores can calculated distance-weighted sum per-tile accessibility. tile weights gene can represented sparse matrix dimension genes x tiles. multiply weight matrix corresponding tile matrix (tiles x cells), can get gene activity score matrix genes x cells. 
gene_score_weights_archr() calculates weight matrix (best pre-computed tile matrix), gene_score_archr() provides easy--use wrapper.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"","code":"gene_score_weights_archr( genes, chromosome_sizes, blacklist = NULL, tile_width = 500, gene_name_column = \"gene_id\", addArchRBug = FALSE ) gene_score_archr( fragments, genes, chromosome_sizes, blacklist = NULL, tile_width = 500, gene_name_column = \"gene_id\", addArchRBug = FALSE, tile_max_count = 4, scale_factor = 10000, tile_matrix_path = tempfile(pattern = \"gene_score_tile_mat\") )"},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"genes Gene coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand chromosome_sizes Chromosome start end coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position See read_ucsc_chrom_sizes(). blacklist Regions exclude calculations, given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position tile_width Size tiles consider gene_name_column NULL, column name genes use row names addArchRBug Replicate ArchR bug handling nested genes fragments Input fragments object tile_max_count Maximum value tile counts matrix. null, tile counts higher clipped tile_max_count. 
Equivalent ceiling argument ArchR::addGeneScoreMatrix() scale_factor null, counts cell scaled sum scale_factor. Equivalent scaleTo argument ArchR::addGeneScoreMatrix() tile_matrix_path Path directory intermediate tile matrix saved","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"gene_score_weights_archr Weight matrix dimension genes x tiles gene_score_archr Gene score matrix dimension genes x cells.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"gene_score_weights_archr: Given set tile coordinates distances returned gene_score_tiles_archr(), calculate weight matrix dimensions genes x tiles. matrix can multiplied tile matrix obtain ArchR-compatible gene activity scores.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/gene_scores.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate GeneActivityScores — gene_score_weights_archr","text":"","code":"## Prep data reference_dir <- file.path(tempdir(), \"references\") frags <- get_demo_frags() genes <- read_gencode_genes( reference_dir, release=\"42\", annotation_set = \"basic\", ) %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) blacklist <- read_encode_blacklist(reference_dir, genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) chrom_sizes$tile_width = 500 ####################################################################### ## gene_score_weights_archr() example ####################################################################### ## Get gene score weight matrix 
(genes x tiles) gene_score_weights <- gene_score_weights_archr( genes, chrom_sizes, blacklist ) ## Get tile matrix (tiles x cells) tiles <- tile_matrix(frags, chrom_sizes, mode = \"fragments\") ## Get gene scores per cell gene_score_weights %*% tiles #> 3849 x 2600 IterableMatrix object with class MatrixMultiply #> #> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * ConvertMatrixType (2600x650604) ####################################################################### ## gene_score_archr() example ####################################################################### ## This is a wrapper that creates both the gene score weight ## matrix and tile matrix together gene_score_archr(frags, genes, chrom_sizes, blacklist) #> 3849 x 2600 IterableMatrix object with class TransformScaleShift #> #> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * TransformMin (2600x650604) #> 2. Scale columns by 0.917, 0.495 ... 8.53"},{"path":"https://bnprks.github.io/BPCells/reference/genomic-ranges-like.html","id":null,"dir":"Reference","previous_headings":"","what":"Genomic range formats — genomic-ranges-like","title":"Genomic range formats — genomic-ranges-like","text":"BPCells accepts flexible set genomic ranges-like objects input, either GRanges, data.frame, lists, character vectors. objects must specify chromosome, start, end coordinates along optional metadata range. 
exception GenomicRanges::GRanges objects, BPCells assumes objects use zero-based, end-exclusive coordinate system (see details).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/genomic-ranges-like.html","id":"valid-range-like-objects","dir":"Reference","previous_headings":"","what":"Valid Range-like objects","title":"Genomic range formats — genomic-ranges-like","text":"BPCells can interpret following types ranges: list(), data.frame(), columns: chr: Character factor chromosome names start: Start coordinates (0-based) end: End coordinates (exclusive) (optional) strand: \"+\"/\"-\" TRUE/FALSE pos/neg strand (optional) Additional metadata named list entries data.frame columns GenomicRanges::GRanges start(x) interpreted 1-based start coordinate end(x) interpreted inclusive end coordinate strand(x): \"*\" entries interpreted positive strand (optional) mcols(x) holds additional metadata character Given format \"chr1:1000-2000\" \"chr1:1,000-2,000\" Uses 0-based, end-exclusive coordinate system used ranges additional metadata required","code":""},{"path":"https://bnprks.github.io/BPCells/reference/genomic-ranges-like.html","id":"range-coordinate-systems","dir":"Reference","previous_headings":"","what":"Range coordinate systems","title":"Genomic range formats — genomic-ranges-like","text":"two main conventions coordinate systems: One-based, end-inclusive ranges first base chromosome numbered 1 last base range equal end coordinate e.g. 1-5 describes first 5 bases chromosome Used formats SAM, GTF BPCells, used reading writing GenomicRanges::GRanges objects Zero-based, end-exclusive ranges first base chromosome numbered 0 last base range one less end coordinate e.g. 
0-5 describes first 5 bases chromosome Used formats BAM, BED BPCells, used range objects","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":null,"dir":"Reference","previous_headings":"","what":"Import MatrixMarket files — import_matrix_market","title":"Import MatrixMarket files — import_matrix_market","text":"Read sparse matrix MatrixMarket file. text-based format used 10x, Parse, others store sparse matrices. Format details NIST website.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Import MatrixMarket files — import_matrix_market","text":"","code":"import_matrix_market( mtx_path, outdir = tempfile(\"matrix_market\"), row_names = NULL, col_names = NULL, row_major = FALSE, tmpdir = tempdir(), load_bytes = 4194304L, sort_bytes = 1073741824L ) import_matrix_market_10x( mtx_dir, outdir = tempfile(\"matrix_market\"), feature_type = NULL, row_major = FALSE, tmpdir = tempdir(), load_bytes = 4194304L, sort_bytes = 1073741824L )"},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Import MatrixMarket files — import_matrix_market","text":"mtx_path Path mtx mtx.gz file outdir Directory store output row_names Character vector row names col_names Character vector col names row_major true, store matrix row-major orientation tmpdir Temporary directory use intermediate storage load_bytes minimum contiguous load size merge sort passes sort_bytes amount memory allocate re-sorting chunks entries mtx_dir Directory holding matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz feature_type String vector feature types include. 
(cellranger 3.0 newer)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Import MatrixMarket files — import_matrix_market","text":"MatrixDir object imported matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/import_matrix_market.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Import MatrixMarket files — import_matrix_market","text":"Import MatrixMarket mtx files BPCells format. implementation ensures fixed memory usage even large inputs -disk sorts. much slower hdf5 inputs, use MatrixMarket format absolutely necessary. rough speed estimate, importing 17GB Parse 1M PBMC DGE_1M_PBMC.mtx file takes 4 minutes 1.3GB RAM, producing compressed output matrix 1.5GB. mtx.gz files slower import due gzip decompression. importing 10x mtx files, row column names can read automatically using import_matrix_market_10x() convenience function.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_adjacency_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if an input is a graph adjacency matrix. — is_adjacency_matrix","title":"Check if an input is a graph adjacency matrix. — is_adjacency_matrix","text":"Clustering functions like cluster_graph_leiden() cluster_graph_louvain() require graph adjacency matrix input. assume square dgCMatrix graph adjacency matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_adjacency_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if an input is a graph adjacency matrix. — is_adjacency_matrix","text":"","code":"is_adjacency_matrix(mat)"},{"path":"https://bnprks.github.io/BPCells/reference/is_adjacency_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if an input is a graph adjacency matrix. 
— is_adjacency_matrix","text":"TRUE mat graph adjacency matrix, FALSE otherwise","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_knn_object.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if an input is a kNN object — is_knn_object","title":"Check if an input is a kNN object — is_knn_object","text":"knn object functions knn_hnsw() knn_annoy() return list two matrices, idx dist. used inputs create graph adjacency matrices clustering. Assume list least idx dist items kNN object.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/is_knn_object.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if an input is a kNN object — is_knn_object","text":"","code":"is_knn_object(mat)"},{"path":"https://bnprks.github.io/BPCells/reference/is_knn_object.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if an input is a kNN object — is_knn_object","text":"TRUE mat knn object, FALSE otherwise","code":""},{"path":"https://bnprks.github.io/BPCells/reference/iterate_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a wrapped pointer to the iterable matrix — iterate_matrix","title":"Get a wrapped pointer to the iterable matrix — iterate_matrix","text":"Get wrapped pointer iterable matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/iterate_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a wrapped pointer to the iterable matrix — iterate_matrix","text":"","code":"iterate_matrix(x)"},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a knn object from reduced dimensions — knn_hnsw","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"Search approximate nearest neighbors cells reduced dimensions (e.g. PCA), return k nearest neighbors (knn) cell. 
Optionally, can find neighbors two separate sets cells utilizing data query.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"","code":"knn_hnsw( data, query = NULL, k = 10, metric = c(\"euclidean\", \"cosine\"), verbose = TRUE, threads = 1, ef = 100 ) knn_annoy( data, query = NULL, k = 10, metric = c(\"euclidean\", \"cosine\", \"manhattan\", \"hamming\"), n_trees = 50, search_k = -1 )"},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"data cell x dims matrix reference dataset query cell x dims matrix query dataset (optional) k number neighbors calculate metric distance metric use verbose whether print progress information search threads Number threads use. Note result non-deterministic threads > 1 ef ef parameter RcppHNSW::hnsw_search(). Increase slower search improved accuracy n_trees Number trees index build time. trees gives higher accuracy search_k Number nodes inspect query, -1 default value. Higher number gives higher accuracy","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"Named list two matrices dimension (cells x k): idx: Neighbor indices, idx[c, n] index nth nearest neighbor cell c. dist: Neighbor distances, dist[c, n] distance cell c nth nearest neighbor. query given, nearest neighbors found mapping data matrix , likely including self-neighbors (.e. 
idx[c,1] == c cells).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get a knn object from reduced dimensions — knn_hnsw","text":"knn_hnsw: Use RcppHNSW knn engine knn_annoy: Use RcppAnnoy knn engine","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":null,"dir":"Reference","previous_headings":"","what":"K Nearest Neighbor (KNN) Graph — knn_to_graph","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"Convert KNN object (e.g. returned knn_hnsw() knn_annoy()) graph. graph represented sparse adjacency matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"","code":"knn_to_graph(knn, use_weights = FALSE, self_loops = TRUE) knn_to_snn_graph( knn, min_val = 1/15, self_loops = FALSE, return_type = c(\"matrix\", \"list\") ) knn_to_geodesic_graph(knn, return_type = c(\"matrix\", \"list\"), threads = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"knn List 2 matrices – idx cell x K neighbor indices, dist cell x K neighbor distances use_weights boolean whether replace distance weights 1 self_loops Whether allow self-loops output graph min_val minimum jaccard index neighbors. 
Values round 0 return_type Whether return sparse adjacency matrix edge list threads Number threads use calculations","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"knn_to_graph Sparse matrix (dgCMatrix) mat[,j] = distance cell cell j, 0 cell j K nearest neighbors knn_to_snn_graph return_type == \"matrix\": Sparse matrix (dgCMatrix) mat[,j] = jaccard index overlap nearest neighbors cell cell j, 0 jaccard index < min_val. lower triangle filled , compatible BPCells clustering methods return_type == \"list\": List 3 equal-length vectors , j, weight, along integer dim. correspond rows, cols, values non-zero entries lower triangle adjacency matrix. dim total number vertices (cells) graph knn_to_geodesic_graph return_type == \"matrix\": Sparse matrix (dgCMatrix) mat[,j] = normalized similarity cell cell j. lower triangle filled , compatible BPCells clustering methods return_type == \"list\": List 3 equal-length vectors , j, weight, along integer dim. correspond rows, cols, values non-zero entries lower triangle adjacency matrix. dim total number vertices (cells) graph","code":""},{"path":"https://bnprks.github.io/BPCells/reference/knn_graph.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"K Nearest Neighbor (KNN) Graph — knn_to_graph","text":"knn_to_graph Create knn graph knn_to_snn_graph Convert knn object shared nearest neighbors adjacency matrix. follows algorithm Seurat uses compute SNN graphs knn_to_geodesic_graph Convert knn object undirected weighted graph, using geodesic distance estimation method UMAP package. matches output umap._umap.fuzzy_simplicial_set umap-learn python package, used default scanpy.pp.neighbors. re-weights symmetrizes KNN graph, usually use less memory return sparser graph knn_to_snn_graph computes 2nd-order neighbors. 
Note: cells listed nearest neighbor, results may differ slightly umap._umap.fuzzy_simplicial_set, assumes self always successfully found approximate nearest neighbor search.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/linear_operator.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct a LinearOperator object — linear_operator","title":"Construct a LinearOperator object — linear_operator","text":"Constructs C++ matrix object save pointer use repeated matrix-vector products bit experimental still internal use","code":""},{"path":"https://bnprks.github.io/BPCells/reference/linear_operator.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct a LinearOperator object — linear_operator","text":"","code":"linear_operator(mat)"},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":null,"dir":"Reference","previous_headings":"","what":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","text":"Test MACS executable valid. macs_executable NULL, function try auto-detect MACS PATH, preference MACS3 MACS2. macs_executable provided, function check MACS can called.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. 
If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","text":"","code":"macs_path_is_valid(macs_executable)"},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. — macs_path_is_valid","text":"macs_executable (string) Path either MACS2/3 executable. Default (NULL) autodetect PATH.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/macs_path_is_valid.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test if MACS executable is valid. If macs_executable is NULL, this function will try to auto-detect MACS from PATH, with preference for MACS3 over MACS2. If macs_executable is provided, this function will check if MACS can be called. 
— macs_path_is_valid","text":"MACS executable path.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":null,"dir":"Reference","previous_headings":"","what":"Test for marker features — marker_features","title":"Test for marker features — marker_features","text":"Given features x cells matrix, perform one-vs-differential tests find markers.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test for marker features — marker_features","text":"","code":"marker_features(mat, groups, method = \"wilcoxon\")"},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test for marker features — marker_features","text":"mat IterableMatrix object dimensions features x cells groups Character/factor vector cell groups/clusters. Length #cells method Test method use. Current options : wilcoxon: Wilcoxon rank-sum test .k.Mann-Whitney U test","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test for marker features — marker_features","text":"tibble following columns: foreground: Group ID used foreground background: Group ID used background (NA comparing rest cells) feature: ID feature p_val_raw: Unadjusted p-value differential test foreground_mean: Average value foreground group background_mean: Average value background group","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Test for marker features — marker_features","text":"Tips using values function: Use dplyr::mutate() add columns e.g. adjusted p-value log fold change. 
Use dplyr::filter() get differential genes given threshold get adjusted p-values, use R p.adjust(), recommended method \"BH\" get log2 fold change: input matrix already log-transformed, calculate (foreground_mean - background_mean)/log(2). input matrix log-transformed, calculate log2(foreground_mean/background_mean)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/marker_features.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Test for marker features — marker_features","text":"","code":"mat <- get_demo_mat() groups <- sample(c(\"A\", \"B\", \"C\", \"D\"), ncol(mat), replace = TRUE) marker_feats <- marker_features(mat, groups) #> Warning: marker features calculation requires row-major storage #> • Consider using transpose_storage_order() if running marker_features repeatedly #> This message is displayed once every 8 hours. #> Writing transposed storage order to /tmp/RtmpCiGY9C/transpose3d2cda46aa001f ## to see the results of one specific group vs all other groups marker_feats %>% dplyr::filter(foreground == \"A\") #> # A tibble: 3,582 × 6 #> foreground background feature p_val_raw foreground_mean background_mean #> #> 1 A NA ENSG00000272… 0.130 0.0275 0.0427 #> 2 A NA ENSG00000250… 0.886 0.136 0.143 #> 3 A NA ENSG00000275… 0.412 0 0.00103 #> 4 A NA ENSG00000186… 1 0 0 #> 5 A NA ENSG00000286… 0.389 0.0107 0.00771 #> 6 A NA ENSG00000131… 0.347 0.113 0.131 #> 7 A NA ENSG00000281… 0.657 0.0183 0.0211 #> 8 A NA ENSG00000272… 1 0 0 #> 9 A NA ENSG00000182… 0.148 0.359 0.304 #> 10 A NA ENSG00000174… 0.832 0.111 0.111 #> # ℹ 3,572 more rows ## get only differential genes given a threshold value marker_feats %>% dplyr::filter(p_val_raw < 0.05) #> # A tibble: 473 × 6 #> foreground background feature p_val_raw foreground_mean background_mean #> #> 1 B NA ENSG00000178… 0.0360 0.0180 0.00931 #> 2 A NA ENSG00000163… 0.0436 0.0748 0.102 #> 3 C NA ENSG00000159… 0.0380 0.205 0.145 #> 4 A NA ENSG00000125… 0.0429 0.00763 
0.0175 #> 5 B NA ENSG00000159… 0.0238 0.0616 0.0982 #> 6 D NA ENSG00000159… 0.0484 0.120 0.0787 #> 7 B NA ENSG00000248… 0.0160 0.00300 0 #> 8 C NA ENSG00000173… 0.00666 0 0.0143 #> 9 B NA ENSG00000013… 0.00435 0.168 0.113 #> 10 A NA ENSG00000246… 0.0221 0.0260 0.0123 #> # ℹ 463 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/mask_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","title":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","text":"Mask matrix entries zero Set matrix entries zero given mask matrix dimensions. Normally, non-zero values mask set matrix entry zero. inverted, zero values mask matrix set matrix entry zero.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mask_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","text":"","code":"mask_matrix(mat, mask, invert = FALSE)"},{"path":"https://bnprks.github.io/BPCells/reference/mask_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Mask matrix entries to zero Set matrix entries to zero given a mask matrix of the same dimensions. Normally, non-zero values in the mask will set the matrix entry to zero. 
If inverted, zero values in the mask matrix will set the matrix entry to zero. — mask_matrix","text":"mat Data matrix (IterableMatrix) mask Mask matrix (IterableMatrix dgCMatrix)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":null,"dir":"Reference","previous_headings":"","what":"Broadcasting vector arithmetic — add_rows","title":"Broadcasting vector arithmetic — add_rows","text":"Convenience functions adding multiplying row / column matrix number.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Broadcasting vector arithmetic — add_rows","text":"","code":"add_rows(mat, vec) add_cols(mat, vec) multiply_rows(mat, vec) multiply_cols(mat, vec)"},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Broadcasting vector arithmetic — add_rows","text":"mat Matrix-like object vec Numeric vector","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Broadcasting vector arithmetic — add_rows","text":"Matrix-like object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/mat_norm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Broadcasting vector arithmetic — add_rows","text":"","code":"set.seed(12345) mat <- matrix(rpois(40, lambda = 5), nrow = 4) rownames(mat) <- paste0(\"gene\", 1:4) mat <- mat %>% as(\"dgCMatrix\") mat #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 5 6 6 4 5 6 3 3 8 #> gene2 8 3 11 . 4 4 4 5 6 8 #> gene3 6 4 1 4 3 9 6 7 4 6 #> gene4 8 5 3 5 9 6 5 . 
4 3 mat <- mat %>% as(\"IterableMatrix\") ####################################################################### ## add_rows() example ####################################################################### add_rows(mat, 1:4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 7 6 7 7 5 6 7 4 4 9 #> gene2 10 5 13 2 6 6 6 7 8 10 #> gene3 9 7 4 7 6 12 9 10 7 9 #> gene4 12 9 7 9 13 10 9 4 8 7 ####################################################################### ## add_cols() example ####################################################################### add_cols(mat, 1:10) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 7 7 9 10 9 11 13 11 12 18 #> gene2 9 5 14 4 9 10 11 13 15 18 #> gene3 7 6 4 8 8 15 13 15 13 16 #> gene4 9 7 6 9 14 12 12 8 13 13 ####################################################################### ## multiply_rows() example ####################################################################### multiply_rows(mat, 1:4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 5 6 6 4 5 6 3 3 8 #> gene2 16 6 22 . 8 8 8 10 12 16 #> gene3 18 12 3 12 9 27 18 21 12 18 #> gene4 32 20 12 20 36 24 20 . 16 12 ####################################################################### ## multiply_cols() example ####################################################################### multiply_cols(mat, 1:10) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 10 18 24 20 30 42 24 27 80 #> gene2 8 6 33 . 20 24 28 40 54 80 #> gene3 6 8 3 16 15 54 42 56 36 60 #> gene4 8 10 9 20 45 36 35 . 36 30"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_R_conversion.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert between BPCells matrix and R objects. — matrix_R_conversion","title":"Convert between BPCells matrix and R objects. 
— matrix_R_conversion","text":"BPCells matrices can interconverted Matrix package dgCMatrix sparse matrices, well base R dense matrices (though may result high memory usage large matrices)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_R_conversion.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert between BPCells matrix and R objects. — matrix_R_conversion","text":"","code":"# Convert to R from BPCells as(bpcells_mat, \"dgCMatrix\") # Sparse matrix conversion as.matrix(bpcells_mat) # Dense matrix conversion # Convert to BPCells from R as(dgc_mat, \"IterableMatrix\")"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_R_conversion.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert between BPCells matrix and R objects. — matrix_R_conversion","text":"","code":"mat <- get_demo_mat()[1:2, 1:2] mat #> 2 x 2 IterableMatrix object with class MatrixSubset #> #> Row names: ENSG00000272602, ENSG00000250312 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /home/imman/.local/share/R/BPCells/demo_data/demo_mat_filtered_subsetted #> 2. Select rows: 1, 2 and cols: 1, 2 ####################################################################### ## as(bpcells_mat, \"dgCMatrix\") example ####################################################################### mat_dgc <- as(mat, \"dgCMatrix\") mat_dgc #> 2 x 2 sparse Matrix of class \"dgCMatrix\" #> TTTAGCAAGGTAGCTT-1 AGCCGGTTCCGGAACC-1 #> ENSG00000272602 1 . #> ENSG00000250312 . . ## as.matrix(bpcells_mat) example as.matrix(mat) #> Warning: Converting to a dense matrix may use excessive memory #> This message is displayed once every 8 hours. 
#> TTTAGCAAGGTAGCTT-1 AGCCGGTTCCGGAACC-1 #> ENSG00000272602 1 0 #> ENSG00000250312 0 0 ## Alternatively, can also use function as() as(mat, \"matrix\") #> TTTAGCAAGGTAGCTT-1 AGCCGGTTCCGGAACC-1 #> ENSG00000272602 1 0 #> ENSG00000250312 0 0 ####################################################################### ## as(dgc_mat, \"IterableMatrix\") example ####################################################################### as(mat_dgc, \"IterableMatrix\") #> 2 x 2 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: ENSG00000272602, ENSG00000250312 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_inputs.html","id":null,"dir":"Reference","previous_headings":"","what":"Return a list of input matrices to the current matrix (experimental) — matrix_inputs","title":"Return a list of input matrices to the current matrix (experimental) — matrix_inputs","text":"File objects 0 inputs. transforms 1 input. transforms (e.g. matrix multiplication matrix concatenation) can multiple used primarily know safe clear dimnames intermediate transformed matrices. C++ relies base matrices (non-transform) dimnames, R relies outermost matrix (transform) dimnames.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_inputs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return a list of input matrices to the current matrix (experimental) — matrix_inputs","text":"","code":"matrix_inputs(x)"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write sparse matrices — write_matrix_memory","title":"Read/write sparse matrices — write_matrix_memory","text":"BPCells matrices stored sparse format, meaning non-zero entries stored. 
Matrices can store integer counts data decimal numbers (float double). See details information.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write sparse matrices — write_matrix_memory","text":"","code":"write_matrix_memory(mat, compress = TRUE) write_matrix_dir( mat, dir, compress = TRUE, buffer_size = 8192L, overwrite = FALSE ) open_matrix_dir(dir, buffer_size = 8192L) write_matrix_hdf5( mat, path, group, compress = TRUE, buffer_size = 8192L, chunk_size = 1024L, overwrite = FALSE, gzip_level = 0L ) open_matrix_hdf5(path, group, buffer_size = 16384L)"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write sparse matrices — write_matrix_memory","text":"compress Whether compress data. dir Directory save data buffer_size performance tuning . number items buffered memory calling writes disk. overwrite TRUE, write temp dir overwrite existing data. Alternatively, pass temp path string customize temp dir location. path Path hdf5 file disk group group within hdf5 file write data . writing existing hdf5 file group must already use chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression). recommended compression compatibility outside programs required. Otherwise, using compress=TRUE recommended >10x faster often similar compression levels. 
matrix Input matrix, either IterableMatrix dgCMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write sparse matrices — write_matrix_memory","text":"BPCells matrix object","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"storage-locations","dir":"Reference","previous_headings":"","what":"Storage locations","title":"Read/write sparse matrices — write_matrix_memory","text":"Matrices can stored directory disk, memory, HDF5 file. Saving directory disk good default local analysis, provides best /O performance lowest memory usage. HDF5 format allows saving within existing hdf5 files group data together, memory format provides fastest performance event memory usage unimportant.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"bitpacking-compression","dir":"Reference","previous_headings":"","what":"Bitpacking Compression","title":"Read/write sparse matrices — write_matrix_memory","text":"typical RNA counts matrices holding integer counts, bitpacking compression result 6-8x less space R dgCMatrix, 4-6x smaller scipy csc_matrix. compression effective count values matrix small, rows matrix sorted rowMeans. tests RNA-seq data optimal ordering save 40% storage space. non-integer data row indices compressed, values space savings smaller. non-integer data matrices, bitpacking compression much less effective, can applied indexes entry values. 
still space savings, far less counts matrices.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_io.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write sparse matrices — write_matrix_memory","text":"","code":"## Create temporary directory to keep demo matrix data_dir <- file.path(tempdir(), \"mat\") if (dir.exists(data_dir)) unlink(data_dir, recursive = TRUE) dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) mat <- get_demo_mat() mat #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /home/imman/.local/share/R/BPCells/demo_data/demo_mat_filtered_subsetted ####################################################################### ## write_matrix_memory() example ####################################################################### mat_memory <- write_matrix_memory(mat) mat_memory #> 3582 x 2600 IterableMatrix object with class PackedMatrixMem_uint32_t #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from memory ####################################################################### ## write_matrix_dir() example ####################################################################### mat %>% write_matrix_dir( file.path(data_dir, \"demo_mat\"), overwrite = TRUE ) #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... 
TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /tmp/RtmpCiGY9C/mat/demo_mat ####################################################################### ## open_matrix_dir() example ####################################################################### mat <- open_matrix_dir( file.path(data_dir, \"demo_mat\") ) mat #> 3582 x 2600 IterableMatrix object with class MatrixDir #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix from directory /tmp/RtmpCiGY9C/mat/demo_mat ####################################################################### ## write_matrix_hdf5() example ####################################################################### mat %>% write_matrix_hdf5(path = file.path(data_dir, \"demo_mat.h5\"), group = \"mat\") #> 3582 x 2600 IterableMatrix object with class MatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. Load compressed matrix in hdf5 file /tmp/RtmpCiGY9C/mat/demo_mat.h5, group mat ####################################################################### ## open_matrix_hdf5() example ####################################################################### mat_hdf5 <- open_matrix_hdf5( file.path(data_dir, \"demo_mat.h5\"), group = 'mat' ) mat_hdf5 #> 3582 x 2600 IterableMatrix object with class MatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 
Load compressed matrix in hdf5 file /tmp/RtmpCiGY9C/mat/demo_mat.h5, group mat"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate matrix stats — matrix_stats","title":"Calculate matrix stats — matrix_stats","text":"Calculate matrix stats","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate matrix stats — matrix_stats","text":"","code":"matrix_stats( matrix, row_stats = c(\"none\", \"nonzero\", \"mean\", \"variance\"), col_stats = c(\"none\", \"nonzero\", \"mean\", \"variance\"), threads = 0L )"},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate matrix stats — matrix_stats","text":"matrix Input matrix object row_stats row statistics compute col_stats col statistics compute threads Number threads use execution","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate matrix stats — matrix_stats","text":"List row_stats: matrix n_stats x n_rows, col_stats: matrix n_stats x n_cols","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate matrix stats — matrix_stats","text":"statistics calculated single pass matrix, method desirable use efficiency purposes compared standard rowMeans colMeans multiple statistics needed. stats ordered complexity: nonzero, mean, variance. less complex stats calculated process calculating complicated stat. 
calculate mean variance simultaneously, just ask variance, compute mean nonzero counts side-effect","code":""},{"path":"https://bnprks.github.io/BPCells/reference/matrix_stats.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate matrix stats — matrix_stats","text":"","code":"mat <- matrix(rpois(100, lambda = 5), nrow = 10) rownames(mat) <- paste0(\"gene\", 1:10) colnames(mat) <- paste0(\"cell\", 1:10) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") ## By default, no row or column stats are calculated res_none <- matrix_stats(mat) res_none #> $row_stats #> gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10 #> #> $col_stats #> cell1 cell2 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 #> ## Request row variance (automatically computes mean and nonzero too) res_row_var <- matrix_stats(mat, row_stats = \"variance\") res_row_var #> $row_stats #> gene1 gene2 gene3 gene4 gene5 gene6 gene7 #> nonzero 10.000000 10.000000 10.00000 10.000000 10.000000 10.000000 10.000000 #> mean 6.000000 5.200000 5.40000 4.800000 5.700000 5.800000 7.000000 #> variance 5.555556 1.733333 10.93333 3.288889 6.677778 3.511111 5.555556 #> gene8 gene9 gene10 #> nonzero 10.000000 10.000000 10.000000 #> mean 4.200000 3.500000 4.800000 #> variance 3.288889 3.388889 5.288889 #> #> $col_stats #> cell1 cell2 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 #> ## Request row variance and column mean (nonzero counts are computed as well) res_both_var <- matrix_stats( mat = mat, row_stats = \"variance\", col_stats = \"mean\" ) res_both_var #> $row_stats #> gene1 gene2 gene3 gene4 gene5 gene6 gene7 #> nonzero 10.000000 10.000000 10.00000 10.000000 10.000000 10.000000 10.000000 #> mean 6.000000 5.200000 5.40000 4.800000 5.700000 5.800000 7.000000 #> variance 5.555556 1.733333 10.93333 3.288889 6.677778 3.511111 5.555556 #> gene8 gene9 gene10 #> nonzero 10.000000 10.000000 10.000000 #> mean 4.200000 3.500000 4.800000 #> variance 3.288889 3.388889 5.288889 #> #> 
$col_stats #> cell1 cell2 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 #> nonzero 10.0 10.0 10.0 10 10.0 10.0 10.0 10.0 10.0 10.0 #> mean 4.5 4.9 6.5 5 4.3 5.1 5.8 5.4 5.4 5.5 #>"},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":null,"dir":"Reference","previous_headings":"","what":"Merge cells into pseudobulks — merge_cells","title":"Merge cells into pseudobulks — merge_cells","text":"Peak tile matrix calculations can sped reducing number cells. cases outputs going added together afterwards, can provide performance improvement","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merge cells into pseudobulks — merge_cells","text":"","code":"merge_cells(fragments, cell_groups)"},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merge cells into pseudobulks — merge_cells","text":"fragments Input fragments object cell_groups Character factor vector providing group cell. Ordering cellNames(fragments)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_cells.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merge cells into pseudobulks — merge_cells","text":"","code":"frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ## Pseudobulk into two groups merge_cells(frags, as.factor(c(rep(1,3), rep(2,3)))) #> IterableFragments object of class \"CellMerge\" #> #> Cells: 2 cells with names 1, 2 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Merge 6 cells into 2 groups"},{"path":"https://bnprks.github.io/BPCells/reference/merge_dimnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function for rbind/cbind merging dimnames — merge_dimnames","title":"Helper function for rbind/cbind merging dimnames — merge_dimnames","text":"Helper function rbind/cbind merging dimnames","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_dimnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function for rbind/cbind merging dimnames — merge_dimnames","text":"","code":"merge_dimnames(x, y, warning_prefix, dim_type)"},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":null,"dir":"Reference","previous_headings":"","what":"Merge peaks — merge_peaks_iterative","title":"Merge peaks — merge_peaks_iterative","text":"Merge peaks according ArchR's iterative merging algorithm. details ArchR website","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merge peaks — merge_peaks_iterative","text":"","code":"merge_peaks_iterative(peaks)"},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merge peaks — merge_peaks_iterative","text":"peaks Peaks given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. 
Required attributes: chr, start, end: genomic position Must ordered priority columns chr, start, end.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Merge peaks — merge_peaks_iterative","text":"tibble::tibble() nonoverlapping subset rows peaks. metadata columns preserved","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Merge peaks — merge_peaks_iterative","text":"Properties merged peaks: peaks merged set overlap Peaks prioritized according order original input output peaks subset input peaks, peak boundaries changed","code":""},{"path":"https://bnprks.github.io/BPCells/reference/merge_peaks_iterative.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merge peaks — merge_peaks_iterative","text":"","code":"## Create example peaks peaks <- tibble::tibble( chr = \"chr1\", start = as.integer(1:10), end = start + 2L ) peaks #> # A tibble: 10 × 3 #> chr start end #> #> 1 chr1 1 3 #> 2 chr1 2 4 #> 3 chr1 3 5 #> 4 chr1 4 6 #> 5 chr1 5 7 #> 6 chr1 6 8 #> 7 chr1 7 9 #> 8 chr1 8 10 #> 9 chr1 9 11 #> 10 chr1 10 12 ## Merge peaks merge_peaks_iterative(peaks) #> # A tibble: 5 × 3 #> chr start end #> #> 1 chr1 1 3 #> 2 chr1 3 5 #> 3 chr1 5 7 #> 4 chr1 7 9 #> 5 chr1 9 11"},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":null,"dir":"Reference","previous_headings":"","what":"Elementwise minimum — min_scalar","title":"Elementwise minimum — min_scalar","text":"min_scalar: Take minimum global constant min_by_row: Take minimum per-row constant min_by_col: Take minimum per-col constant","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Elementwise minimum — 
min_scalar","text":"","code":"min_scalar(mat, val) min_by_row(mat, vals) min_by_col(mat, vals)"},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Elementwise minimum — min_scalar","text":"mat IterableMatrix val Single positive numeric value","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Elementwise minimum — min_scalar","text":"IterableMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Elementwise minimum — min_scalar","text":"Take minimum value matrix per-row, per-col, global constant. constant must >0 preserve sparsity matrix. effect capping maximum value matrix.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/min_elementwise.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Elementwise minimum — min_scalar","text":"","code":"set.seed(12345) mat <- matrix(rpois(40, lambda = 5), nrow = 4) rownames(mat) <- paste0(\"gene\", 1:4) mat <- mat %>% as(\"dgCMatrix\") mat #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 6 5 6 6 4 5 6 3 3 8 #> gene2 8 3 11 . 4 4 4 5 6 8 #> gene3 6 4 1 4 3 9 6 7 4 6 #> gene4 8 5 3 5 9 6 5 . 4 3 mat <- mat %>% as(\"IterableMatrix\") ####################################################################### ## min_scalar() example ####################################################################### min_scalar(mat, 4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 4 4 4 4 4 4 4 3 3 4 #> gene2 4 3 4 . 4 4 4 4 4 4 #> gene3 4 4 1 4 3 4 4 4 4 4 #> gene4 4 4 3 4 4 4 4 . 
4 3 ####################################################################### ## min_by_row() example ####################################################################### min_by_row(mat, 1:4) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 1 1 1 1 1 1 1 1 1 1 #> gene2 2 2 2 . 2 2 2 2 2 2 #> gene3 3 3 1 3 3 3 3 3 3 3 #> gene4 4 4 3 4 4 4 4 . 4 3 ####################################################################### ## min_by_col() example ####################################################################### min_by_col(mat, 1:10) %>% as(\"dgCMatrix\") #> 4 x 10 sparse Matrix of class \"dgCMatrix\" #> #> gene1 1 2 3 4 4 5 6 3 3 8 #> gene2 1 2 3 . 4 4 4 5 6 8 #> gene3 1 2 1 4 3 6 6 7 4 6 #> gene4 1 2 3 4 5 6 5 . 4 3"},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Normalize an object representing genomic ranges — normalize_ranges","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"Normalize object representing genomic ranges","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"","code":"normalize_ranges( ranges, metadata_cols = character(0), zero_based_coords = !is(ranges, \"GRanges\"), n = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"ranges Genomic regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. 
Required attributes: chr, start, end: genomic position metadata_cols Optional list metadata columns require & extract zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"data frame zero-based coordinates, elements chr (factor), start (int), end (int). ranges chr level information, chr levels sorted unique values chr. strand metadata_cols, output strand element TRUE positive strand, FALSE negative strand. (Converted character vector \"+\"/\"-\" necessary)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_ranges.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Normalize an object representing genomic ranges — normalize_ranges","text":"","code":"## Prep data ranges <- GenomicRanges::GRanges( seqnames = S4Vectors::Rle(c(\"chr1\", \"chr2\", \"chr3\"), c(1, 2, 2)), ranges = IRanges::IRanges(101:105, end = 111:115, names = head(letters, 5)), strand = S4Vectors::Rle(GenomicRanges::strand(c(\"-\", \"+\", \"*\")), c(1, 2, 2)), score = 1:5, GC = seq(1, 0, length=5)) ranges #> GRanges object with 5 ranges and 2 metadata columns: #> seqnames ranges strand | score GC #> | #> a chr1 101-111 - | 1 1.00 #> b chr2 102-112 + | 2 0.75 #> c chr2 103-113 + | 3 0.50 #> d chr3 104-114 * | 4 0.25 #> e chr3 105-115 * | 5 0.00 #> ------- #> seqinfo: 3 sequences from an unspecified genome; no seqlengths ## Normalize ranges normalize_ranges(ranges) #> # A tibble: 5 × 3 #> chr start end #> #> 1 chr1 100 111 #> 2 chr2 101 112 #> 3 chr2 102 113 #> 4 chr3 103 114 #> 5 chr3 104 115 ## With metadata information normalize_ranges(ranges, metadata_cols = c(\"strand\", \"score\", \"GC\")) #> # A tibble: 5 × 6 #> strand 
chr start end score GC #> #> 1 FALSE chr1 100 111 1 1 #> 2 TRUE chr2 101 112 2 0.75 #> 3 TRUE chr2 102 113 3 0.5 #> 4 TRUE chr3 103 114 4 0.25 #> 5 TRUE chr3 104 115 5 0"},{"path":"https://bnprks.github.io/BPCells/reference/normalize_unique_file_names.html","id":null,"dir":"Reference","previous_headings":"","what":"Adjust a set of (unique) potential file names to not include any invalid characters. — normalize_unique_file_names","title":"Adjust a set of (unique) potential file names to not include any invalid characters. — normalize_unique_file_names","text":"Adjust set (unique) potential file names include invalid characters.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalize_unique_file_names.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adjust a set of (unique) potential file names to not include any invalid characters. — normalize_unique_file_names","text":"","code":"normalize_unique_file_names(names, replacement = \"_\")"},{"path":"https://bnprks.github.io/BPCells/reference/normalized_dimnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Helper function to set dimnames to NULL instead of 0-length character vectors — normalized_dimnames","title":"Helper function to set dimnames to NULL instead of 0-length character vectors — normalized_dimnames","text":"Helper function set dimnames NULL instead 0-length character vectors","code":""},{"path":"https://bnprks.github.io/BPCells/reference/normalized_dimnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Helper function to set dimnames to NULL instead of 0-length character vectors — normalized_dimnames","text":"","code":"normalized_dimnames(row_names, col_names)"},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":null,"dir":"Reference","previous_headings":"","what":"Count fragments by nucleosomal size — nucleosome_counts","title":"Count fragments by nucleosomal 
size — nucleosome_counts","text":"Count fragments nucleosomal size","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"","code":"nucleosome_counts(fragments, nucleosome_width = 147)"},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"fragments Fragments object nucleosome_width Integer cutoff use nucleosome width","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"List names subNucleosomal, monoNucleosomal, multiNucleosomal, nFrags, containing count vectors fragments class per cell.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"Shorter nucleosome_width subNucleosomal, nucleosome_width 2*nucleosome_width-1 monoNucleosomal, anything longer multiNucleosomal. 
sum fragments given nFrags","code":""},{"path":"https://bnprks.github.io/BPCells/reference/nucleosome_counts.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Count fragments by nucleosomal size — nucleosome_counts","text":"","code":"## Prep data frags_sub_nucleosomal <- tibble::tibble( chr = 1, start = seq(0, 3000, by = 1000), end = start + 146, cell_id = c(rep(\"cell1\", 3), rep(\"cell2\", 1)) ) frags_sub_nucleosomal #> # A tibble: 4 × 4 #> chr start end cell_id #> #> 1 1 0 146 cell1 #> 2 1 1000 1146 cell1 #> 3 1 2000 2146 cell1 #> 4 1 3000 3146 cell2 frags_nucleosomal <- tibble::tibble( chr = 1, start = seq(5000, 7000, by = 1000), end = start + 147, # Value equal to nucleosome_width is inclusive cell_id = c(rep(\"cell1\", 1), rep(\"cell2\", 2)) ) frags_nucleosomal #> # A tibble: 3 × 4 #> chr start end cell_id #> #> 1 1 5000 5147 cell1 #> 2 1 6000 6147 cell2 #> 3 1 7000 7147 cell2 frags_multi_nucleosomal <- tibble::tibble( chr = 1, start = seq(12000, 15000, by = 1000), end = start + 294, # Value equal to 2x nucleosome_width cell_id = c(rep(\"cell1\", 2), rep(\"cell2\", 2)) ) frags_multi_nucleosomal #> # A tibble: 4 × 4 #> chr start end cell_id #> #> 1 1 12000 12294 cell1 #> 2 1 13000 13294 cell1 #> 3 1 14000 14294 cell2 #> 4 1 15000 15294 cell2 frags <- dplyr::bind_rows( frags_sub_nucleosomal, frags_nucleosomal, frags_multi_nucleosomal ) %>% convert_to_fragments() ## Get nucleosome counts head(nucleosome_counts(frags)) #> $subNucleosomal #> [1] 3 1 #> #> $monoNucleosomal #> [1] 1 2 #> #> $multiNucleosomal #> [1] 2 2 #> #> $nFrags #> [1] 6 5 #>"},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write a 10x fragments file — open_fragments_10x","title":"Read/write a 10x fragments file — open_fragments_10x","text":"10x fragment files come bed-like format, columns chr, start, end, cell_id, pcr_duplicates. 
Unlike standard bed format, format cellranger inclusive end-coordinate, meaning end coordinate counted tagmentation site, rather offset 1.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write a 10x fragments file — open_fragments_10x","text":"","code":"open_fragments_10x(path, comment = \"#\", end_inclusive = TRUE) write_fragments_10x( fragments, path, end_inclusive = TRUE, append_5th_column = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write a 10x fragments file — open_fragments_10x","text":"path File path (e.g. fragments.tsv fragments.tsv.gz) comment Skip lines beginning file start comment string end_inclusive Whether end coordinate bed inclusive – .e. insertion end coordinate rather base end coordinate. 10x default, though quite standard bed file format. 
fragments Input fragments object append_5th_column Whether include 5th column 0 compatibility 10x fragment file outputs (defaults 4 columns chr,start,end,cell)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write a 10x fragments file — open_fragments_10x","text":"10x fragments file object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write a 10x fragments file — open_fragments_10x","text":"open_fragments_10x disk operations take place fragments used function write_fragments_10x Fragments written disk immediately, returned readable object.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_fragments_10x.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write a 10x fragments file — open_fragments_10x","text":"","code":"## Download example fragments from pbmc 500 dataset and save in temp directory data_dir <- file.path(tempdir(), \"frags_10x\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) url_base <- \"https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/\" frags_file <- \"atac_pbmc_500_nextgem_fragments.tsv.gz\" atac_raw_url <- paste0(url_base, frags_file) if (!file.exists(file.path(data_dir, frags_file))) { download.file(atac_raw_url, file.path(data_dir, frags_file), mode=\"wb\") } ####################################################################### ## open_fragments_10x() example ####################################################################### frags <- open_fragments_10x( file.path(data_dir, frags_file) ) ## A Fragments object imported from 10x will not have cell/chromosome ## information directly known unless written as a BPCells fragment object frags #> IterableFragments object of class 
\"ShiftFragments\" #> #> Cells: count unknown #> Chromosomes: count unknown #> #> Queued Operations: #> 1. Load 10x fragments file from /tmp/RtmpCiGY9C/frags_10x/atac_pbmc_500_nextgem_fragments.tsv.gz #> 2. Shift start +0bp, end +1bp frags %>% write_fragments_dir( file.path(data_dir, \"demo_frags_from_h5\"), overwrite = TRUE ) #> IterableFragments object of class \"FragmentsDir\" #> #> Cells: 219780 cells with names CCACGTTCAGTAACTC-1, GCGAGAAGTCCACCAG-1 ... AAACGAAGTTCAGAAA-1 #> Chromosomes: 39 chromosomes with names chr1, chr10 ... KI270713.1 #> #> Queued Operations: #> 1. Read compressed fragments from directory /tmp/RtmpCiGY9C/frags_10x/demo_frags_from_h5 ####################################################################### ## write_fragments_10x() example ####################################################################### frags <- write_fragments_10x( frags, file.path(data_dir, paste0(\"new_\", frags_file)) ) frags #> IterableFragments object of class \"ShiftFragments\" #> #> Cells: count unknown #> Chromosomes: count unknown #> #> Queued Operations: #> 1. Load 10x fragments file from /tmp/RtmpCiGY9C/frags_10x/new_atac_pbmc_500_nextgem_fragments.tsv.gz #> 2. 
Shift start +0bp, end +1bp"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"Read/write 10x feature matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"","code":"open_matrix_10x_hdf5(path, feature_type = NULL, buffer_size = 16384L) write_matrix_10x_hdf5( mat, path, barcodes = colnames(mat), feature_ids = rownames(mat), feature_names = rownames(mat), feature_types = \"Gene Expression\", feature_metadata = list(), buffer_size = 16384L, chunk_size = 1024L, gzip_level = 0L, type = c(\"uint32_t\", \"double\", \"float\", \"auto\") )"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"path Path hdf5 file disk feature_type Optional selection feature types include output matrix. multiome data, options \"Gene Expression\" \"Peaks\". option compatible files cellranger 3.0 newer. buffer_size performance tuning . number items buffered memory calling writes disk. mat IterableMatrix barcodes Vector names cells feature_ids Vector IDs features feature_names Vector names features feature_types String vector feature types feature_metadata Named list additional metadata vectors store feature chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression) type Data type output matrix. Default uint32_t match matrix 10x UMI counts. Non-integer data types include float double. 
auto, use data type mat.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"BPCells matrix object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"10x format makes use gzip compression matrix data, can slow read performance. Consider writing another format read performance important . Input matrices must column-major storage order, rownames colnames set, names must provided relevant metadata parameters. metadata parameters read default BPCells, possible export use tools.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_10x_hdf5.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write a 10x feature matrix — open_matrix_10x_hdf5","text":"","code":"## Download example matrices from pbmc 500 dataset and save in temp directory data_dir <- file.path(tempdir(), \"mat_10x\") dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) url_base <- \"https://cf.10xgenomics.com/samples/cell-exp/6.1.0/500_PBMC_3p_LT_Chromium_X/\" mat_file <- \"500_PBMC_3p_LT_Chromium_X_filtered_feature_bc_matrix.h5\" rna_url <- paste0(url_base, mat_file) if (!file.exists(file.path(data_dir, mat_file))) { download.file(rna_url, file.path(data_dir, mat_file), mode=\"wb\") } ####################################################################### ## open_matrix_10x_hdf5() example ####################################################################### mat <- open_matrix_10x_hdf5( file.path(data_dir, mat_file) ) mat #> 36601 x 587 IterableMatrix object with class 10xMatrixH5 #> #> Row names: ENSG00000243485, ENSG00000237613 ... 
ENSG00000277196 #> Col names: AATCACGAGCATTGAA-1, AATCACGCAAGCCATT-1 ... TTTGTTGTCTCTAGGA-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 10x HDF5 feature matrix in file /tmp/RtmpCiGY9C/mat_10x/500_PBMC_3p_LT_Chromium_X_filtered_feature_bc_matrix.h5 ####################################################################### ## write_matrix_10x_hdf5() example ####################################################################### mat <- write_matrix_10x_hdf5( mat, file.path(data_dir, paste0(\"new\", mat_file)) ) mat #> 36601 x 587 IterableMatrix object with class 10xMatrixH5 #> #> Row names: ENSG00000243485, ENSG00000237613 ... ENSG00000277196 #> Col names: AATCACGAGCATTGAA-1, AATCACGCAAGCCATT-1 ... TTTGTTGTCTCTAGGA-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 10x HDF5 feature matrix in file /tmp/RtmpCiGY9C/mat_10x/new500_PBMC_3p_LT_Chromium_X_filtered_feature_bc_matrix.h5"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":null,"dir":"Reference","previous_headings":"","what":"Read/write AnnData matrix — open_matrix_anndata_hdf5","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"Read write matrix anndata hdf5 file. functions automatically transpose matrices converting /AnnData format. AnnData convention stores cells rows, whereas R convention stores cells columns. behavior undesired, call t() manually matrix inputs outputs functions. 
users writing AnnData files default write_matrix_anndata_hdf5() rather dense variant (see details information).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"","code":"open_matrix_anndata_hdf5(path, group = \"X\", buffer_size = 16384L) write_matrix_anndata_hdf5( mat, path, group = \"X\", buffer_size = 16384L, chunk_size = 1024L, gzip_level = 0L ) write_matrix_anndata_hdf5_dense( mat, path, dataset = \"X\", buffer_size = 16384L, chunk_size = 1024L, gzip_level = 0L )"},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"path Path hdf5 file disk group group within hdf5 file write data . writing existing hdf5 file group must already use buffer_size performance tuning . number items buffered memory calling writes disk. chunk_size performance tuning . chunk size used HDF5 array storage. gzip_level Gzip compression level. Default 0 (compression) dataset dataset within hdf5 file write matrix . Used write_matrix_anndata_hdf5_dense","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"AnnDataMatrixH5 object, cells columns.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"Efficiency considerations: Reading dense AnnData matrix generally slower sparse single cell datasets, recommended re-write dense AnnData inputs sparse format early processing. 
write_matrix_anndata_hdf5() used default, always writes efficient sparse format. write_matrix_anndata_hdf5_dense() writes AnnData dense format, can used smaller matrices efficiency file size less concern increased portability (e.g. writing obsm varm matrices). See AnnData docs format details. Dimension names: Dimnames inferred obs/_index var/_index based length matching. helps infer dimnames obsp, varm, etc. number len(obs) == len(var), dimname inference disabled.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/open_matrix_anndata_hdf5.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read/write AnnData matrix — open_matrix_anndata_hdf5","text":"","code":"## Create temporary directory to keep demo matrix data_dir <- file.path(tempdir(), \"mat_anndata\") if (dir.exists(data_dir)) unlink(data_dir, recursive = TRUE) dir.create(data_dir, recursive = TRUE, showWarnings = FALSE) mat <- get_demo_mat() ####################################################################### ## write_matrix_anndata_hdf5() example ####################################################################### mat <- write_matrix_anndata_hdf5( mat, file.path(data_dir, paste0(\"new_demo_mat.h5\")) ) mat #> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 
AnnData HDF5 matrix in file /tmp/RtmpCiGY9C/mat_anndata/new_demo_mat.h5, group X ####################################################################### ## open_matrix_anndata_hdf5() example ####################################################################### mat <- open_matrix_anndata_hdf5( file.path(data_dir, paste0(\"new_demo_mat.h5\")) ) mat #> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. AnnData HDF5 matrix in file /tmp/RtmpCiGY9C/mat_anndata/new_demo_mat.h5, group X ####################################################################### ## write_matrix_anndata_hdf5_dense() example ####################################################################### mat <- write_matrix_anndata_hdf5_dense( mat, file.path(data_dir, paste0(\"new_demo_mat_dense.h5\")) ) mat #> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5 #> #> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: column major #> #> Queued Operations: #> 1. 
AnnData HDF5 matrix in file /tmp/RtmpCiGY9C/mat_anndata/new_demo_mat_dense.h5, group X"},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Get end-sorted ordering for genome ranges — order_ranges","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"Use function order regions prior calling peak_matrix() tile_matrix().","code":""},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"","code":"order_ranges(ranges, chr_levels, sort_by_end = TRUE)"},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"ranges Genomic regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position chr_levels Ordering chromosome names sort_by_end TRUE (default), sort (chr, end, start). Else sort (chr, start, end)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"Numeric vector analogous order function. 
Provides index selection reorder input ranges sorted chr, end, start","code":""},{"path":"https://bnprks.github.io/BPCells/reference/order_ranges.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get end-sorted ordering for genome ranges — order_ranges","text":"","code":"## Prep data ranges <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(310, 0, -60), cell_id = paste0(\"cell1\") ) %>% as(\"GRanges\") ranges #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 10-320 * | cell1 #> [2] chr1 60-310 * | cell1 #> [3] chr1 110-300 * | cell1 #> [4] chr1 160-290 * | cell1 #> [5] chr1 210-280 * | cell1 #> [6] chr1 260-270 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths ## Get end-sorted ordering order_ranges(ranges, levels(GenomicRanges::seqnames(ranges))) #> [1] 6 5 4 3 2 1"},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":null,"dir":"Reference","previous_headings":"","what":"Color palettes — discrete_palette","title":"Color palettes — discrete_palette","text":"color palettes derived ArchR color palettes, provide large sets distinguishable colors","code":""},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Color palettes — discrete_palette","text":"","code":"discrete_palette(name, n = 1) continuous_palette(name)"},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Color palettes — discrete_palette","text":"name Name color palette. Valid discrete palettes : stallion, calm, kelly, bear, ironMan, circus, paired, grove, summerNight, captain. 
Valid continuous palettes bluePurpleDark n Minimum number colors needed","code":""},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Color palettes — discrete_palette","text":"Character vector hex color codes","code":""},{"path":"https://bnprks.github.io/BPCells/reference/palettes.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Color palettes — discrete_palette","text":"requested number colors large, new palette constructed via interpolation requested palette","code":""},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Prepare a matrix for multi-threaded operation — parallel_split","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"Transforms matrix matrix_stats matrix multiplies vector/dense matrix evaluated parallel. speeds specific operations, reading writing matrix general. 
parallelism guaranteed work additional operations applied parallel split.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"","code":"parallel_split(mat, threads, chunks = threads)"},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"mat IterableMatrix threads Number execution threads chunks Number chunks use (>= threads)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/parallel_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Prepare a matrix for multi-threaded operation — parallel_split","text":"IterableMatrix perform certain operations parallel","code":""},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":null,"dir":"Reference","previous_headings":"","what":"Create partial function calls — partial_apply","title":"Create partial function calls — partial_apply","text":"Specify arguments function.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create partial function calls — partial_apply","text":"","code":"partial_apply(f, ..., .overwrite = TRUE, .missing_args_error = TRUE)"},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create partial function calls — partial_apply","text":"f function ... 
Named arguments f .overwrite (bool) f already output partial_apply(), whether parameter re-definitions ignored overwrite existing definitions .missing_args_error (bool) TRUE, passing arguments function's signature raise error, otherwise ignored","code":""},{"path":"https://bnprks.github.io/BPCells/reference/partial_apply.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create partial function calls — partial_apply","text":"bpcells_partial object (function extra attributes)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate ranges x cells overlap matrix — peak_matrix","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"Calculate ranges x cells overlap matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"","code":"peak_matrix( fragments, ranges, mode = c(\"insertions\", \"fragments\", \"overlaps\"), zero_based_coords = !is(ranges, \"GRanges\"), explicit_peak_names = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"fragments Input fragments object. Must cell names chromosome names defined ranges Peaks/ranges overlap, given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position mode Mode counting peak overlaps. (See \"value\" section details) zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. 
Defaults true GRanges false formats (see archived UCSC blogpost) explicit_peak_names Boolean whether add rownames output matrix format e.g chr1:500-1000, start end coords given 0-based coordinate system. Note either way, peak names written matrix saved.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"Iterable matrix object dimension ranges x cells. saved, column names output matrix format chr1:500-1000, start end coords given 0-based coordinate system. mode options \"insertions\": Start end coordinates separately overlapped peak \"fragments\": Like \"insertions\", fragment can contribute 1 count peak, even start end coordinates overlap \"overlaps\": Like \"fragments\", overlap also counted fragment fully spans peak even neither start end falls within peak","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"calculating matrix directly fragments tsv, necessary first call select_chromosomes() order provide ordering chromosomes expect reading tsv.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/peak_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate ranges x cells overlap matrix — peak_matrix","text":"","code":"## Prep demo data frags <- get_demo_frags(subset = FALSE) chrom_sizes <- read_ucsc_chrom_sizes(file.path(tempdir(), \"references\"), genome=\"hg38\") blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE) peaks <- call_peaks_tile( frags_filter_blacklist, chrom_sizes, effective_genome_size = 2.8e9 ) top_peaks <- head(peaks, 5000) 
top_peaks <- top_peaks[order_ranges(top_peaks, chrNames(frags)),] ## Get peak matrix peak_matrix(frags_filter_blacklist, top_peaks, mode=\"insertions\") #> 5000 x 2600 IterableMatrix object with class PeakMatrix #> #> Row names: chr1:959200-959400, chr1:1019400-1019600 ... chrX:154379066-154379266 #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: row major #> #> Queued Operations: #> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered #> 2. Subset to fragments not overlapping 636 ranges: chr10:1-45700 ... chrY:26637301-57227400 #> 3. Calculate 2600 peaks over 5000 ranges: chr1:959201-959400 ... chrX:154379067-154379266"},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":null,"dir":"Reference","previous_headings":"","what":"Dotplot — plot_dot","title":"Dotplot — plot_dot","text":"Plot feature levels per group cluster grid dots. Dots colored z-score normalized average expression, sized percent non-zero.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Dotplot — plot_dot","text":"","code":"plot_dot( source, features, groups, group_order = NULL, gene_mapping = human_gene_mapping, colors = c(\"lightgrey\", \"#4682B4\"), return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Dotplot — plot_dot","text":"source Feature x cell matrix data.frame features. best results, features sparse log-normalized (e.g. run log1p() zero raw counts map zero) features Character vector features plot groups Vector one entry per cell, specifying cell's group group_order Optional vector listing ordering groups gene_mapping optional vector gene name matching match_gene_symbol(). 
colors Color scale plot return_data true, return data just plotting rather plot. apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_dot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Dotplot — plot_dot","text":"","code":"## Prep data mat <- get_demo_mat() cell_types <- paste(\"Group\", rep(1:3, length.out = length(colnames(mat)))) ## Plot dot plot <- plot_dot(mat, c(\"MS4A1\", \"CD3E\"), cell_types) BPCells:::render_plot_from_storage( plot, width = 4, height = 5 )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot UMAP or embeddings — plot_embedding","title":"Plot UMAP or embeddings — plot_embedding","text":"Plot one features coloring cells UMAP plot.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot UMAP or embeddings — plot_embedding","text":"","code":"plot_embedding( source, embedding, features = NULL, quantile_range = c(0.01, 0.99), randomize_order = TRUE, smooth = NULL, smooth_rounds = 3, gene_mapping = human_gene_mapping, size = NULL, rasterize = FALSE, raster_pixels = 512, legend_continuous = c(\"auto\", \"quantile\", \"value\"), labels_quantile_range = TRUE, colors_continuous = c(\"lightgrey\", \"#4682B4\"), legend_discrete = TRUE, labels_discrete = TRUE, colors_discrete = discrete_palette(\"stallion\"), return_data = FALSE, return_plot_list = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot UMAP or embeddings — plot_embedding","text":"source Matrix, data frame pull features , vector feature values single feature. matrix, features must rows. 
embedding matrix dimensions cells x 2 embedding coordinates features Character vector features plot source vector. quantile_range (optional) Length 2 vector giving quantiles clip minimum maximum color scale values, fractions 0 1. NULL NA values skip clipping randomize_order TRUE, shuffle cells prevent overplotting biases. Can pass integer instead specify random seed use. smooth (optional) Sparse matrix dimensions cells x cells cell-cell distance weights smoothing. smooth_rounds Number multiplication rounds apply smoothing. gene_mapping optional vector gene name matching match_gene_symbol(). Ignored source data frame. size Point size plotting rasterize Whether rasterize point drawing speed display graphics programs. raster_pixels Number pixels use rasterizing. Can provide one number square dimensions, two numbers width x height. legend_continuous Whether label continuous features quantile value. \"auto\" labels quantile features continuous quantile_range NULL. Quantile labeling adds text annotation listing range displayed values. labels_quantile_range Whether add text label value range feature legend set quantile colors_continuous Vector colors use continuous color palette legend_discrete Whether show legend discrete (categorical) features. labels_discrete Whether add text labels center group discrete (categorical) features. colors_discrete Vector colors use discrete (categorical) features. return_data true, return data just plotting rather plot. return_plot_list TRUE, return multiple plots list, rather single plot combined using patchwork::wrap_plots() apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot UMAP or embeddings — plot_embedding","text":"default, returns ggplot2 object requested features plotted grid. 
return_data return_plot_list called, return value match argument.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"smoothing","dir":"Reference","previous_headings":"","what":"Smoothing","title":"Plot UMAP or embeddings — plot_embedding","text":"Smoothing performed follows: first, smoothing matrix normalized sum incoming weights every cell 1. , raw data values repeatedly multiplied smoothing matrix re-scaled average value stays .","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_embedding.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot UMAP or embeddings — plot_embedding","text":"","code":"set.seed(123) mat <- get_demo_mat() ## Normalize matrix mat_norm <- log1p(multiply_cols(mat, 1/colSums(mat)) * 10000) %>% write_matrix_memory(compress = FALSE) ## Get variable genes stats <- matrix_stats(mat, row_stats = \"variance\") variable_genes <- order(stats$row_stats[\"variance\",], decreasing=TRUE) %>% head(1000) %>% sort() # Z score normalize genes mat_norm <- mat[variable_genes, ] gene_means <- stats$row_stats['mean', variable_genes] gene_vars <- stats$row_stats['variance', variable_genes] mat_norm <- (mat_norm - gene_means) / gene_vars ## Save matrix to memory mat_norm <- mat_norm %>% write_matrix_memory(compress = FALSE) ## Run SVD svd <- BPCells::svds(mat_norm, k = 10) pca <- multiply_cols(svd$v, svd$d) ## Get UMAP umap <- uwot::umap(pca) ## Get clusters clusts <- knn_hnsw(pca, ef = 500) %>% knn_to_snn_graph() %>% cluster_graph_louvain() #> 14:58:39 Building HNSW index with metric 'euclidean' ef = 200 M = 16 using 1 threads #> 14:58:39 Finished building index #> 14:58:39 Searching HNSW index with ef = 500 and 1 threads #> 14:58:39 Finished searching ## Plot embeddings print(length(clusts)) #> [1] 2600 plot_embedding(clusts, umap) ### Can also plot by features #plot_embedding( # source = mat, # umap, # features = c(\"MS4A1\", \"CD3E\"), 
#)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":null,"dir":"Reference","previous_headings":"","what":"Fragment size distribution — plot_fragment_length","title":"Fragment size distribution — plot_fragment_length","text":"Plot distribution fragment lengths, length basepairs x-axis, proportion fragments y-axis. Typical plots show 10-basepair periodicity, well humps spaced multiples nucleosome width (150bp).","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fragment size distribution — plot_fragment_length","text":"","code":"plot_fragment_length( fragments, max_length = 500, return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fragment size distribution — plot_fragment_length","text":"fragments Fragments object max_length Maximum length show plot return_data true, return data just plotting rather plot. 
apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fragment size distribution — plot_fragment_length","text":"Numeric vector index contains number length-fragments","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_fragment_length.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fragment size distribution — plot_fragment_length","text":"","code":"frags <- get_demo_frags(filter_qc = FALSE, subset = FALSE) plot_fragment_length(frags)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":null,"dir":"Reference","previous_headings":"","what":"Knee plot of single cell read counts — plot_read_count_knee","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"Plots read count rank vs. number reads log-log scale.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"","code":"plot_read_count_knee( read_counts, cutoff = NULL, return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"read_counts Vector read counts per cell cutoff (optional) Read cutoff mark plot return_data true, return data just plotting rather plot. 
apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"ggplot2 plot object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"Performs logarithmic downsampling reduce number points plotted","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_read_count_knee.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Knee plot of single cell read counts — plot_read_count_knee","text":"","code":"## Prep data mat <- get_demo_mat(filter_qc = FALSE, subset = FALSE) reads_per_cell <- colSums(mat) # Render knee plot plot_read_count_knee(reads_per_cell, cutoff = 1e3)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tf_footprint.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot TF footprint — plot_tf_footprint","title":"Plot TF footprint — plot_tf_footprint","text":"Plot footprinting around TF motif sites","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tf_footprint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot TF footprint — plot_tf_footprint","text":"","code":"plot_tf_footprint( fragments, motif_positions, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), flank = 250L, smooth = 0L, zero_based_coords = !is(genes, \"GRanges\"), colors = discrete_palette(\"stallion\"), return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tf_footprint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot TF 
footprint — plot_tf_footprint","text":"fragments IterableFragments object motif_positions Coordinate ranges motifs (must include strand) constant width cell_groups Character factor assigning group cell, order cellNames(fragments) flank Number flanking basepairs include either side motif smooth (optional) Sparse matrix dimensions cells x cells cell-cell distance weights smoothing. zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range return_data true, return data just plotting rather plot. apply_styling false, return plot without pretty styling applied","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot TSS profile — plot_tss_profile","title":"Plot TSS profile — plot_tss_profile","text":"Plot enrichment insertions relative transcription start sites (TSS). Typically, plot shows strong enrichment insertions near TSS, small bump downstream around 220bp downstream TSS +1 nucleosome.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot TSS profile — plot_tss_profile","text":"","code":"plot_tss_profile( fragments, genes, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), flank = 2000L, smooth = 0L, zero_based_coords = !is(genes, \"GRanges\"), colors = discrete_palette(\"stallion\"), return_data = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot TSS profile — plot_tss_profile","text":"fragments IterableFragments object genes Coordinate ranges genes (must include strand) cell_groups Character factor assigning group cell, order cellNames(fragments) flank Number flanking basepairs include either side motif smooth 
Number bases smooth (rolling average) zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range return_data true, return data just plotting rather plot. apply_styling false, return plot without pretty styling applied","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_profile.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot TSS profile — plot_tss_profile","text":"","code":"## Prep data frags <- get_demo_frags(filter_qc = FALSE, subset = FALSE) genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) ## Plot tss profile plot_tss_profile(frags, genes)"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":null,"dir":"Reference","previous_headings":"","what":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"Density scatter plot log10(fragment_count) x-axis TSS enrichment y-axis. plot useful select cell barcodes experiment correspond high-quality cells","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"","code":"plot_tss_scatter( atac_qc, min_frags = NULL, min_tss = NULL, bins = 100, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"atac_qc Tibble returned qc_scATAC(). 
Must columns nFrags TSSEnrichment min_frags Minimum fragment count cutoff min_tss Minimum TSS Enrichment cutoff bins Number bins density calculation apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/plot_tss_scatter.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"TSS Enrichment vs. Fragment Counts plot — plot_tss_scatter","text":"","code":"## Prep data frags <- get_demo_frags(filter_qc = FALSE, subset = FALSE) genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") atac_qc <- qc_scATAC(frags, genes, blacklist) ## Render tss enrichment vs fragment plot plot_tss_scatter(atac_qc, min_frags = 1000, min_tss = 10)"},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":null,"dir":"Reference","previous_headings":"","what":"Add sample prefix to cell names — prefix_cell_names","title":"Add sample prefix to cell names — prefix_cell_names","text":"Rename cells adding prefix names. commonly sample name. cells receive exact text prefix added beginning, separator characters like \"_\" must included given prefix. Use prior merging fragments different experiments c() order help prevent cell name clashes.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add sample prefix to cell names — prefix_cell_names","text":"","code":"prefix_cell_names(fragments, prefix)"},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add sample prefix to cell names — prefix_cell_names","text":"fragments Input fragments object. 
prefix String add prefix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add sample prefix to cell names — prefix_cell_names","text":"Fragments object prefixed names","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prefix_cell_names.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add sample prefix to cell names — prefix_cell_names","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory ## Prefix cells with foo prefix_cell_names(frags, \"foo_\") %>% as(\"GRanges\") #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 11-40 * | foo_cell1 #> [2] chr1 61-90 * | foo_cell1 #> [3] chr1 111-140 * | foo_cell2 #> [4] chr1 161-190 * | foo_cell2 #> [5] chr1 211-240 * | foo_cell3 #> [6] chr1 261-290 * | foo_cell3 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a small demo matrix and fragment object. — prepare_demo_data","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"Downloads 10x Genomics dataset, consisting 3k cells performs optional QC subsetting. 
Holds subsetted objects disk, returns list matrix fragments.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"","code":"prepare_demo_data( directory = NULL, filter_qc = TRUE, subset = TRUE, timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"directory (character) directory input/output data stored. Downloaded intermediates stored subdir intermediates. NULL, temporary directory created. filter_qc (bool) Whether filter RNA ATAC data using QC information. subset (bool) Whether subset genes/insertions chromosome 4 11. timeout (numeric) Timeout downloading files seconds.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"(list) list RNA matrix name mat, ATAC fragments name frags.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/prepare_demo_data.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a small demo matrix and fragment object. — prepare_demo_data","text":"function downloads 10x Genomics PBMC 3k dataset. Filtering using QC information fragments matrix provides cells least 1000 reads, 1000 frags, minimum tss enrichment 10. Subsetting provides genes insertions chromosomes 4 11. name matrix fragments folders demo_mat demo_frags respectively. 
Additionally, choosing qc filter appends _filtered, choosing subset data appends _subsetted name.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"Given (features x cells) matrix, group cells cell_groups aggregate counts method feature.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"","code":"pseudobulk_matrix(mat, cell_groups, method = \"sum\", threads = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"mat IterableMatrix object dimensions features x cells cell_groups (Character/factor) Vector group/cluster assignments cell. Length must ncol(mat). method (Character vector) Method(s) aggregate counts. one method provided, output matrix. multiple methods provided, output named list matrices. Current options : nonzeros, sum, mean, variance. threads (integer) Number threads use.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"method length 1, returns matrix shape (features x groups). method greater length 1, returns list matrices matrix representing pseudobulk matrix different aggregation method. 
matrix shape (features x groups), names one nonzeros, sum, mean, variance.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"simpler stats calculated process calculating complex statistics. calculating variance, nonzeros mean can included extra calculation time, calculating mean, adding nonzeros take extra time.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/pseudobulk_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Aggregate counts matrices by cell group or feature. — pseudobulk_matrix","text":"","code":"set.seed(12345) mat <- matrix(rpois(100, lambda = 5), nrow = 10) rownames(mat) <- paste0(\"gene\", 1:10) colnames(mat) <- paste0(\"cell\", 1:10) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") groups <- rep(c(\"Cluster1\", \"Cluster2\"), each = 5) ## When calculating only sum across two groups pseudobulk_res <- pseudobulk_matrix( mat = mat, cell_groups = groups, method = \"sum\" ) pseudobulk_res #> Cluster1 Cluster2 #> gene1 26 38 #> gene2 19 27 #> gene3 32 21 #> gene4 27 19 #> gene5 22 27 #> gene6 20 23 #> gene7 24 37 #> gene8 24 22 #> gene9 20 23 #> gene10 34 21 ## Can also request multiple summary statistics for pseudobulking pseudobulk_res_multi <- pseudobulk_matrix( mat = mat, cell_groups = groups, method = c(\"mean\", \"variance\") ) names(pseudobulk_res_multi) #> [1] \"mean\" \"variance\" pseudobulk_res_multi$mean #> Cluster1 Cluster2 #> gene1 5.2 7.6 #> gene2 3.8 5.4 #> gene3 6.4 4.2 #> gene4 5.4 3.8 #> gene5 4.4 5.4 #> gene6 4.0 4.6 #> gene7 4.8 7.4 #> gene8 4.8 4.4 #> gene9 4.0 4.6 #> gene10 6.8 4.2"},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate ArchR-compatible per-cell QC statistics — 
qc_scATAC","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"Calculate ArchR-compatible per-cell QC statistics","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"","code":"qc_scATAC(fragments, genes, blacklist)"},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"fragments IterableFragments object genes Gene coordinates given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position blacklist Blacklisted regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"data.frame QC data","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"implementation mimics ArchR's default parameters. uses requiring flexibility tweak default parameters, best option re-implement function required changes. Output columns data.frame: cellName: cell name cell nFrags: number fragments per cell subNucleosomal, monoNucleosomal, multiNucleosomal: number fragments size 1-146bp, 147-254bp, 255bp + respectively. 
equivalent ArchR's nMonoFrags, nDiFrags, nMultiFrags respectively TSSEnrichment: AvgInsertInTSS / max(AvgInsertFlankingTSS, 0.1), AvgInsertInTSS ReadsInTSS / 101 (window size), AvgInsertFlankingTSS ReadsFlankingTSS / (100*2) (window size). max(0.1) ensures low-read cells get assigned spuriously high TSSEnrichment. ReadsInPromoter: Number reads 2000bp upstream TSS 101bp downstream TSS ReadsInBlacklist: Number reads provided blacklist region ReadsInTSS: Number reads overlapping 101bp centered around TSS ReadsFlankingTSS: Number reads overlapping 1901-2000bp +/- TSS Differences ArchR: Note ArchR default uses different set annotations derive TSS sites promoter sites. function uses just one annotation gene start+end sites, must called twice exactly re-calculate ArchR QC stats. ArchR's PromoterRatio BlacklistRatio included output, can easily calculated ReadsInPromoter / nFrags ReadsInBlacklist / nFrags. Similarly, ArchR's NucleosomeRatio can calculated (monoNucleosomal + multiNucleosomal) / subNucleosomal.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/qc_scATAC.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate ArchR-compatible per-cell QC statistics — qc_scATAC","text":"","code":"## Prep data frags <- get_demo_frags(subset = FALSE) reference_dir <- file.path(tempdir(), \"references\") genes <- read_gencode_transcripts( reference_dir, release=\"42\", transcript_choice=\"MANE_Select\", annotation_set = \"basic\", features=\"transcript\" ) blacklist <- read_encode_blacklist(reference_dir, genome = \"hg38\") ## Run qc head(qc_scATAC(frags, genes, blacklist)) #> # A tibble: 6 × 10 #> cellName TSSEnrichment nFrags subNucleosomal monoNucleosomal multiNucleosomal #> #> 1 TTTAGCAA… 45.1 16363 8069 5588 2706 #> 2 AGCCGGTT… 30.9 33313 15855 11868 5590 #> 3 TGATTAGT… 41.9 11908 6103 3817 1988 #> 4 ATTGACTC… 43.9 13075 6932 4141 2002 #> 5 CGTTAGGT… 31.5 14874 6833 5405 2636 #> 6 AAACCGCG… 41.9 30141 15085 10199 
4857 #> # ℹ 4 more variables: ReadsInTSS , ReadsFlankingTSS , #> # ReadsInPromoter , ReadsInBlacklist "},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":null,"dir":"Reference","previous_headings":"","what":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"Given set genomic ranges, find distance nearest neighbors upstream downstream.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"","code":"range_distance_to_nearest( ranges, addArchRBug = FALSE, zero_based_coords = !is(ranges, \"GRanges\") )"},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"ranges Genomic regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand addArchRBug boolean reproduce ArchR's bug incorrectly handles nested genes zero_based_coords true, coordinates start 0 end coordinate included range. false, coordinates start 1 end coordinate included range","code":""},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"2-column data.frame columns upstream downstream, containing distances nearest neighbor respective directions. 
ranges + * strand, distance calculated : upstream = max(start(range) - end(upstreamNeighbor), 0) downstream = max(start(downstreamNeighbor) - end(range), 0) ranges - strand, definition upstream downstream flipped. Note definition distance one GenomicRanges::distance(), ranges neighbor overlap given distance 1 rather 0.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/range_distance_to_nearest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find signed distance to nearest genomic ranges — range_distance_to_nearest","text":"","code":"## Prep data ranges <- tibble::tibble( chr = \"chr1\", start = seq(10, 410, 100), end = start + 50, strand = \"+\" ) ## Add one range that is completely nested in the other ranges ranges_with_nesting <- ranges %>% tibble::add_row(chr = \"chr1\", start = 11, end = 20, strand = \"+\") ## Get range distance to nearest range_distance_to_nearest(ranges_with_nesting) #> # A tibble: 6 × 2 #> upstream downstream #> #> 1 Inf 51 #> 2 51 51 #> 3 51 51 #> 4 51 51 #> 5 51 Inf #> 6 0 0"},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":null,"dir":"Reference","previous_headings":"","what":"Rank-transform a matrix — rank_transform","title":"Rank-transform a matrix — rank_transform","text":"Rank values within row/col matrix, output rank values new matrix. Rank values offset rank 0 value 0, ties handled averaging ranks.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rank-transform a matrix — rank_transform","text":"","code":"rank_transform(mat, axis)"},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rank-transform a matrix — rank_transform","text":"mat Data matrix (IterableMatrix) axis Axis rank values within. 
\"col\" rank values within column, \"row\" rank values within row.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rank_transform.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Rank-transform a matrix — rank_transform","text":"Note efficient rank calculation depends storage order matrix, may necessary call transpose_storage_order()","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a bed file into a data frame — read_bed","title":"Read a bed file into a data frame — read_bed","text":"Bed files can contain peak blacklist annotations. utilities help read thos annotations","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a bed file into a data frame — read_bed","text":"","code":"read_bed( path, additional_columns = character(0), backup_url = NULL, timeout = 300 ) read_encode_blacklist( dir, genome = c(\"hg38\", \"mm10\", \"hg19\", \"dm6\", \"dm3\", \"ce11\", \"ce10\"), timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a bed file into a data frame — read_bed","text":"path Path file (desired save location backup_url used) additional_columns Names additional columns bed file backup_url path exist, provides URL download gtf timeout Maximum time seconds wait download backup_url dir Output directory cache downloaded gtf file genome genome name","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read a bed file into a data frame — read_bed","text":"Data frame coordinates using 0-based 
convention.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read a bed file into a data frame — read_bed","text":"read_bed Read bed file disk url. read_encode_blacklist Downloads Boyle Lab blacklist, described https://doi.org/10.1038/s41598-019-45839-z","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/read_bed.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read a bed file into a data frame — read_bed","text":"","code":"## Dummy bed file creation data.frame( chrom = rep(\"chr1\", 6), start = seq(20, 121, 20), end = seq(39, 140, 20) ) %>% write.table(\"./references/example.bed\", row.names = FALSE, col.names = FALSE, sep = \"\\t\") ####################################################################### ## read_bed() example ####################################################################### read_bed(\"./references/example.bed\") #> # A tibble: 6 × 3 #> chr start end #> #> 1 chr1 20 39 #> 2 chr1 40 59 #> 3 chr1 60 79 #> 4 chr1 80 99 #> 5 chr1 100 119 #> 6 chr1 120 139 ####################################################################### ## read_encode_blacklist() example ####################################################################### read_encode_blacklist(\"./reference\") #> # A tibble: 636 × 4 #> chr start end reason #> #> 1 chr10 0 45700 Low Mappability #> 2 chr10 38481300 38596500 High Signal Region #> 3 chr10 38782600 38967900 High Signal Region #> 4 chr10 39901300 41712900 High Signal Region #> 5 chr10 41838900 42107300 High Signal Region #> 6 chr10 42279400 42322500 High Signal Region #> 7 chr10 126946300 126953400 Low Mappability #> 8 chr10 133625800 133797400 High Signal Region #> 9 chr11 0 194500 Low Mappability #> 10 chr11 518900 520700 Low Mappability #> # ℹ 626 more 
rows"},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":null,"dir":"Reference","previous_headings":"","what":"Read GTF gene annotations — read_gtf","title":"Read GTF gene annotations — read_gtf","text":"Read gene annotations gtf format data frame. source can URL, gtf file disk, gencode release version.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read GTF gene annotations — read_gtf","text":"","code":"read_gtf( path, attributes = c(\"gene_id\"), tags = character(0), features = c(\"gene\"), keep_attribute_column = FALSE, backup_url = NULL, timeout = 300 ) read_gencode_genes( dir, release = \"latest\", annotation_set = c(\"basic\", \"comprehensive\"), gene_type = \"lncRNA|protein_coding|IG_.*_gene|TR_.*_gene\", attributes = c(\"gene_id\", \"gene_type\", \"gene_name\"), tags = character(0), features = c(\"gene\"), timeout = 300 ) read_gencode_transcripts( dir, release = \"latest\", transcript_choice = c(\"MANE_Select\", \"Ensembl_Canonical\", \"all\"), annotation_set = c(\"basic\", \"comprehensive\"), gene_type = \"lncRNA|protein_coding|IG_.*_gene|TR_.*_gene\", attributes = c(\"gene_id\", \"gene_type\", \"gene_name\", \"transcript_id\"), features = c(\"transcript\", \"exon\"), timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read GTF gene annotations — read_gtf","text":"path Path file (desired save location backup_url used) attributes Vector GTF attribute names parse columns tags Vector tags parse boolean presence/absence features List features types keep GTF (e.g. 
gene, transcript, exon, intron) keep_attribute_column Boolean whether preserve raw attribute text column backup_url path exist, provides URL download gtf timeout Maximum time seconds wait download backup_url dir Output directory cache downloaded gtf file release release version (prefix M mouse versions). recent version, use \"latest\" \"latest_mouse\" annotation_set Either \"basic\" \"comprehensive\" annotation sets (see details section). gene_type Regular expression gene types keep. Defaults protein_coding, lncRNA, IG/TR genes transcript_choice Method selecting representative transcripts. Choices : MANE_Select: human-, conservative Ensembl_Canonical: human+mouse, superset MANE_Select human : Preserve transcript models (recommended plotting)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read GTF gene annotations — read_gtf","text":"Data frame coordinates using 0-based convention. Columns : chr source feature start end score strand frame attributes (optional; named according listed attributes) tags (named according listed tags)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read GTF gene annotations — read_gtf","text":"read_gtf Read gtf file URL read_gencode_genes Read gene annotations directly GENCODE. file name vary depending release annotation set requested, format gencode.v42.annotation.gtf.gz. GENCODE currently recommends basic set: https://www.gencodegenes.org/human/. release 42, comprehensive basic sets identical gene-level annotations, comprehensive set additional transcript variants annotated. 
read_gencode_transcripts Read transcript models GENCODE, use trackplot_gene()","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/read_gtf.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read GTF gene annotations — read_gtf","text":"","code":"####################################################################### ## read_gtf() example ####################################################################### species <- \"Saccharomyces_cerevisiae\" version <- \"GCF_000146045.2_R64\" head(read_gtf( path = sprintf(\"./reference/%s_genomic.gtf.gz\", version), backup_url = sprintf( \"https://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/%s/reference/%s/%s_genomic.gtf.gz\", species, version, version ) )) #> # A tibble: 6 × 9 #> chr source feature start end score strand frame gene_id #> #> 1 NC_001133.9 RefSeq gene 1806 2169 . - . YAL068C #> 2 NC_001133.9 RefSeq gene 2479 2707 . + . YAL067W-A #> 3 NC_001133.9 RefSeq gene 7234 9016 . - . YAL067C #> 4 NC_001133.9 RefSeq gene 11564 11951 . - . YAL065C #> 5 NC_001133.9 RefSeq gene 12045 12426 . + . YAL064W-B #> 6 NC_001133.9 RefSeq gene 13362 13743 . - . YAL064C-A ####################################################################### ## read_gencode_genes() example ####################################################################### read_gencode_genes(\"./references\", release = \"42\") #> # A tibble: 39,319 × 11 #> chr source feature start end score strand frame gene_id gene_type #> #> 1 chr1 HAVANA gene 11868 14409 . + . ENSG00000290… lncRNA #> 2 chr1 HAVANA gene 29553 31109 . + . ENSG00000243… lncRNA #> 3 chr1 HAVANA gene 34553 36081 . - . ENSG00000237… lncRNA #> 4 chr1 HAVANA gene 57597 64116 . + . ENSG00000290… lncRNA #> 5 chr1 HAVANA gene 65418 71585 . + . ENSG00000186… protein_… #> 6 chr1 HAVANA gene 89294 133723 . - . ENSG00000238… lncRNA #> 7 chr1 HAVANA gene 89550 91105 . - . ENSG00000239… lncRNA #> 8 chr1 HAVANA gene 139789 140339 . - . 
ENSG00000239… lncRNA #> 9 chr1 HAVANA gene 141473 173862 . - . ENSG00000241… lncRNA #> 10 chr1 HAVANA gene 160445 161525 . + . ENSG00000241… lncRNA #> # ℹ 39,309 more rows #> # ℹ 1 more variable: gene_name ####################################################################### ## read_gencode_transcripts() example ####################################################################### ## If read_gencode_genes() was already ran on the same release, ## will reuse previously downloaded annotations read_gencode_transcripts(\"./references\", release = \"42\") #> # A tibble: 220,296 × 13 #> chr source feature start end score strand frame gene_id gene_type #> #> 1 chr1 HAVANA transcript 65418 71585 . + . ENSG00000… protein_… #> 2 chr1 HAVANA exon 65418 65433 . + . ENSG00000… protein_… #> 3 chr1 HAVANA exon 65519 65573 . + . ENSG00000… protein_… #> 4 chr1 HAVANA exon 69036 71585 . + . ENSG00000… protein_… #> 5 chr1 HAVANA transcript 450739 451678 . - . ENSG00000… protein_… #> 6 chr1 HAVANA exon 450739 451678 . - . ENSG00000… protein_… #> 7 chr1 HAVANA transcript 685715 686654 . - . ENSG00000… protein_… #> 8 chr1 HAVANA exon 685715 686654 . - . ENSG00000… protein_… #> 9 chr1 HAVANA transcript 923922 944574 . + . ENSG00000… protein_… #> 10 chr1 HAVANA exon 923922 924948 . + . ENSG00000… protein_… #> # ℹ 220,286 more rows #> # ℹ 3 more variables: gene_name , transcript_id , MANE_Select "},{"path":"https://bnprks.github.io/BPCells/reference/read_ucsc_chrom_sizes.html","id":null,"dir":"Reference","previous_headings":"","what":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","title":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","text":"Read chromosome sizes UCSC return tibble one row per chromosome. 
underlying data pulled : https://hgdownload.soe.ucsc.edu/downloads.html","code":""},{"path":"https://bnprks.github.io/BPCells/reference/read_ucsc_chrom_sizes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","text":"","code":"read_ucsc_chrom_sizes( dir, genome = c(\"hg38\", \"mm39\", \"mm10\", \"mm9\", \"hg19\"), keep_chromosomes = \"chr[0-9]+|chrX|chrY\", timeout = 300 )"},{"path":"https://bnprks.github.io/BPCells/reference/read_ucsc_chrom_sizes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read UCSC chromosome sizes — read_ucsc_chrom_sizes","text":"","code":"read_ucsc_chrom_sizes(\"./reference\") #> # A tibble: 24 × 3 #> chr start end #> #> 1 chr1 0 248956422 #> 2 chr2 0 242193529 #> 3 chr3 0 198295559 #> 4 chr4 0 190214555 #> 5 chr5 0 181538259 #> 6 chr6 0 170805979 #> 7 chr7 0 159345973 #> 8 chrX 0 156040895 #> 9 chr8 0 145138636 #> 10 chr9 0 138394717 #> # ℹ 14 more rows"},{"path":"https://bnprks.github.io/BPCells/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. 
magrittr %>% Matrix colMeans, colSums, rowMeans, rowSums","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":null,"dir":"Reference","previous_headings":"","what":"Regress out unwanted variation — regress_out","title":"Regress out unwanted variation — regress_out","text":"Regress effects confounding variables using linear least squares regression model.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Regress out unwanted variation — regress_out","text":"","code":"regress_out(mat, latent_data, prediction_axis = c(\"row\", \"col\"))"},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Regress out unwanted variation — regress_out","text":"mat Input IterableMatrix latent_data Data regress , data.frame column variable regress . prediction_axis axis corresponds prediction outputs linear models (e.g. gene axis typical single cell analysis). Options include \"row\" (default) \"col\".","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Regress out unwanted variation — regress_out","text":"IterableMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/regress_out.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Regress out unwanted variation — regress_out","text":"Conceptually, regress_out calculates linear least squares best fit model row matrix. (column prediction_axis \"col\"). input data regression model columns latent_data, model tries predict values corresponding row (column) mat. fitting model, regress_out subtract model predictions input values, aiming retain effects explained variables latent_data. 
models can fit efficiently since share input data calculations closed-form best fit solution shared. QR factorization model matrix dense matrix-vector multiply sufficient fully calculate residual values. Efficiency considerations: output matrix dense rather sparse, mean variance calculations may run comparatively slowly. However, PCA matrix/vector multiply operations can performed nearly cost input matrix due mathematical simplifications. Memory usage scales n_features * ((nrow(mat) + ncol(mat)). Generally, n_features == ncol(latent_data), categorical variables latent_data, category expanded indicator variable. Memory usage therefore higher using categorical input variables many (.e. >100) distinct values.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/render_plot_from_storage.html","id":null,"dir":"Reference","previous_headings":"","what":"Render a plot with intermediate disk storage step — render_plot_from_storage","title":"Render a plot with intermediate disk storage step — render_plot_from_storage","text":"Take plotting object save temp storage, can outputted exact dimensions. 
Primarily used allow adjusting plot dimensions within function reference examples.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/render_plot_from_storage.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Render a plot with intermediate disk storage step — render_plot_from_storage","text":"","code":"render_plot_from_storage(plot, width, height)"},{"path":"https://bnprks.github.io/BPCells/reference/render_plot_from_storage.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Render a plot with intermediate disk storage step — render_plot_from_storage","text":"plot (ggplot) ggplot output plotting function width (numeric) width rendered plot height (numeric) height rendered plot","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rotate_x_labels.html","id":null,"dir":"Reference","previous_headings":"","what":"Rotate ggplot x axis labels — rotate_x_labels","title":"Rotate ggplot x axis labels — rotate_x_labels","text":"Rotate ggplot x axis labels","code":""},{"path":"https://bnprks.github.io/BPCells/reference/rotate_x_labels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rotate ggplot x axis labels — rotate_x_labels","text":"","code":"rotate_x_labels(degrees = 45)"},{"path":"https://bnprks.github.io/BPCells/reference/rotate_x_labels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rotate ggplot x axis labels — rotate_x_labels","text":"degrees Number degrees rotate ","code":""},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":null,"dir":"Reference","previous_headings":"","what":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"Scan fragments without performing operations (used 
benchmarking)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"","code":"scan_fragments(fragments)"},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"fragments Fragments object scan","code":""},{"path":"https://bnprks.github.io/BPCells/reference/scan_fragments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Scan through fragments without performing any operations (used for benchmarking) — scan_fragments","text":"Length 4 vector fragment count, sums chr, starts, ends","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":null,"dir":"Reference","previous_headings":"","what":"SCTransform Pearson Residuals — sctransform_pearson","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"Calculate pearson residuals negative binomial sctransform model. Normalized values calculated (X - mu) / sqrt(mu + mu^2/theta). 
mu calculated cell_read_counts * gene_beta.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"","code":"sctransform_pearson( mat, gene_theta, gene_beta, cell_read_counts, min_var = -Inf, clip_range = c(-10, 10), columns_are_cells = TRUE, slow = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"mat IterableMatrix (raw counts) gene_theta Vector per-gene thetas (overdispersion values) gene_beta Vector per-gene betas (expression level values) cell_read_counts Vector total reads per (umi count RNA) min_var Minimum value clipping variance clip_range Length 2 vector min max clipping range columns_are_cells Whether columns matrix correspond cells (default) genes slow TRUE, use 10x slower precise implementation (default FALSE)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"IterableMatrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/sctransform_pearson.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"SCTransform Pearson Residuals — sctransform_pearson","text":"parameterization used somewhat simplified compared original SCTransform paper, particular uses linear-scale rather log-scale represent cell_read_counts gene_beta variables. also support addition arbitrary cell metadata (e.g. 
batch) add negative binomial regression.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset, translate, or reorder cell IDs — select_cells","title":"Subset, translate, or reorder cell IDs — select_cells","text":"Subset, translate, reorder cell IDs","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset, translate, or reorder cell IDs — select_cells","text":"","code":"select_cells(fragments, cell_selection)"},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset, translate, or reorder cell IDs — select_cells","text":"fragments Input fragments object cell_selection List cell IDs (numeric), names (character), logical mask.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Subset, translate, or reorder cell IDs — select_cells","text":"Numeric cell IDs re-assigned order cell_selection. output cell ID n taken input cell ID/name cell_selection[n].","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_cells.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset, translate, or reorder cell IDs — select_cells","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 3 cells with names cell1, cell2, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. 
Read uncompressed fragments from memory ## Select cells by name select_cells(frags, \"cell1\") #> IterableFragments object of class \"CellSelectName\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Select 1 cells by name: cell1 ## Select cells by index select_cells(frags, c(1,3)) #> IterableFragments object of class \"CellSelectIndex\" #> #> Cells: 2 cells with names cell1, cell3 #> Chromosomes: 1 chromosomes with names chr1 #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Select 2 cells by index: 1, 3"},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset, translate, or reorder chromosome IDs — select_chromosomes","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"Subset, translate, reorder chromosome IDs","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"","code":"select_chromosomes(fragments, chromosome_selection)"},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"fragments Input fragments object chromosome_selection List chromosme IDs (numeric), names (character), logical mask.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"Numeric chromosome IDs re-assigned order chromosome_selection. 
output chromosome ID n taken input chromosome ID/name chromosome_selection[n].","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_chromosomes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset, translate, or reorder chromosome IDs — select_chromosomes","text":"","code":"## Prep data frags <- tibble::tibble( chr = c(rep(\"chr1\", 2), rep(\"chrX\", 2), rep(\"chr3\", 2)), start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell1\") ) %>% as(\"GRanges\") frags <- frags %>% convert_to_fragments() frags #> IterableFragments object of class \"UnpackedMemFragments\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 3 chromosomes with names chr1, chr3, chrX #> #> Queued Operations: #> 1. Read uncompressed fragments from memory ## Selecting by chromosome IDs select_chromosomes(frags, c(1, 3)) #> IterableFragments object of class \"ChrSelectIndex\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 2 chromosomes with names chr1, chrX #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. Select 2 chromosomes by index: 1, 3 ## Selecting by name select_chromosomes(frags, c(\"chrX\")) #> IterableFragments object of class \"ChrSelectName\" #> #> Cells: 1 cells with names cell1 #> Chromosomes: 1 chromosomes with names chrX #> #> Queued Operations: #> 1. Read uncompressed fragments from memory #> 2. 
Select 1 chromosomes by name: chrX"},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset fragments by genomic region — select_regions","title":"Subset fragments by genomic region — select_regions","text":"Fragments can subset based overlapping (overlapping) set regions","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset fragments by genomic region — select_regions","text":"","code":"select_regions( fragments, ranges, invert_selection = FALSE, zero_based_coords = !is(ranges, \"GRanges\") )"},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset fragments by genomic region — select_regions","text":"fragments Input fragments object. ranges Peaks/ranges overlap, given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position invert_selection TRUE, select fragments overlapping selected regions instead fragments overlapping selected regions. zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. 
Defaults true GRanges false formats (see archived UCSC blogpost)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset fragments by genomic region — select_regions","text":"Fragments object filtered according selected regions","code":""},{"path":"https://bnprks.github.io/BPCells/reference/select_regions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset fragments by genomic region — select_regions","text":"","code":"frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = \"cell1\" ) frags #> # A tibble: 6 × 4 #> chr start end cell_id #> #> 1 chr1 10 15 cell1 #> 2 chr1 60 70 cell1 #> 3 chr1 110 125 cell1 #> 4 chr1 160 180 cell1 #> 5 chr1 210 235 cell1 #> 6 chr1 260 290 cell1 frags <- frags %>% convert_to_fragments() region <- tibble::tibble( chr = \"chr1\", start = 60, end = 130 ) %>% as(\"GRanges\") ## Select ranges overlapping with region select_regions(frags, region) %>% as(\"GRanges\") #> GRanges object with 2 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 61-70 * | cell1 #> [2] chr1 111-125 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths ## Select ranges not overlapping with region select_regions(frags, region, invert_selection = TRUE) %>% as(\"GRanges\") #> GRanges object with 4 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 11-15 * | cell1 #> [2] chr1 161-180 * | cell1 #> [3] chr1 211-235 * | cell1 #> [4] chr1 261-290 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":null,"dir":"Reference","previous_headings":"","what":"Set matrix op thread count — set_threads","title":"Set matrix op thread count — set_threads","text":"Set 
number threads use sparse-dense multiply matrix_stats.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set matrix op thread count — set_threads","text":"","code":"set_threads(mat, threads = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set matrix op thread count — set_threads","text":"mat IterableMatrix, product rbind cbind threads Number threads use execution","code":""},{"path":"https://bnprks.github.io/BPCells/reference/set_threads.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Set matrix op thread count — set_threads","text":"valid concatenated matrices","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":null,"dir":"Reference","previous_headings":"","what":"Shift start or end coordinates — shift_fragments","title":"Shift start or end coordinates — shift_fragments","text":"Shifts start end fragments fixed amount, can useful correct Tn5 offset.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Shift start or end coordinates — shift_fragments","text":"","code":"shift_fragments(fragments, shift_start = 0L, shift_end = 0L)"},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Shift start or end coordinates — shift_fragments","text":"fragments Input fragments object shift_start many basepairs shift start coords shift_end many basepairs shift end coords","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Shift start or end 
coordinates — shift_fragments","text":"Shifted fragments object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Shift start or end coordinates — shift_fragments","text":"correct Tn5 offset +/- 4bp since Tn5 cut sites opposite strands offset 9bp. However, +4/-5 bp often applied bed-format files, since end coordinate bed files 1 past last basepair sequenced DNA fragment. results bed-like format except inclusive end coordinates.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/shift_fragments.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Shift start or end coordinates — shift_fragments","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + 30, cell_id = paste0(\"cell1\") ) %>% as(\"GRanges\") frags #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 10-40 * | cell1 #> [2] chr1 60-90 * | cell1 #> [3] chr1 110-140 * | cell1 #> [4] chr1 160-190 * | cell1 #> [5] chr1 210-240 * | cell1 #> [6] chr1 260-290 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths frags <- frags %>% convert_to_fragments() ## Shift fragments shift_fragments(frags, shift_start = 4, shift_end = -4) %>% as(\"GRanges\") #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] chr1 14-36 * | cell1 #> [2] chr1 64-86 * | cell1 #> [3] chr1 114-136 * | cell1 #> [4] chr1 164-186 * | cell1 #> [5] chr1 214-236 * | cell1 #> [6] chr1 264-286 * | cell1 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset fragments by length — subset_lengths","title":"Subset fragments by length — 
subset_lengths","text":"Subset fragments length","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset fragments by length — subset_lengths","text":"","code":"subset_lengths(fragments, min_len = 0L, max_len = NA_integer_)"},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset fragments by length — subset_lengths","text":"fragments Input fragments object min_len Minimum bases fragment (inclusive) max_len Maximum bases fragment (inclusive)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset fragments by length — subset_lengths","text":"Fragments object","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Subset fragments by length — subset_lengths","text":"Fragment length calculated end-start","code":""},{"path":"https://bnprks.github.io/BPCells/reference/subset_lengths.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset fragments by length — subset_lengths","text":"","code":"## Prep data frags <- tibble::tibble( chr = \"chr1\", start = seq(10, 260, 50), end = start + seq(5, 30, 5), cell_id = paste0(\"cell\", c(rep(1, 2), rep(2,2), rep(3,2))) ) frags #> # A tibble: 6 × 4 #> chr start end cell_id #> #> 1 chr1 10 15 cell1 #> 2 chr1 60 70 cell1 #> 3 chr1 110 125 cell2 #> 4 chr1 160 180 cell2 #> 5 chr1 210 235 cell3 #> 6 chr1 260 290 cell3 frags <- frags %>% convert_to_fragments() ## Subset lengths subset_lengths(frags, min_len = 10, max_len = 20) %>% as(\"GRanges\") #> GRanges object with 3 ranges and 1 metadata column: #> seqnames ranges strand | cell_id #> | #> [1] 
chr1 61-70 * | cell1 #> [2] chr1 111-125 * | cell2 #> [3] chr1 161-180 * | cell2 #> ------- #> seqinfo: 1 sequence from an unspecified genome; no seqlengths"},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate svds — svds","title":"Calculate svds — svds","text":"Use C++ Spectra solver (RSpectra package), order compute largest k values corresponding singular vectors. Empirically, memory usage much lower using irlba::irlba(), likely due avoiding R garbage creation solving due pure-C++ solver. documentation slightly-edited version RSpectra::svds() documentation.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate svds — svds","text":"","code":"svds(A, k, nu = k, nv = k, opts = list(), threads=0L, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate svds — svds","text":"matrix whose truncated SVD computed. k Number singular values requested. nu Number right singular vectors computed. must 0 'k'. (Must equal 'k' BPCells IterableMatrix) opts Control parameters related computing algorithm. See Details threads Control threads use calculating mat-vec producs (BPCells specific)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate svds — svds","text":"list following components: d vector computed singular values. u m nu matrix whose columns contain left singular vectors. nu == 0, NULL returned. v n nv matrix whose columns contain right singular vectors. nv == 0, NULL returned. nconv Number converged singular values. niter Number iterations used. 
nops Number matrix-vector multiplications used.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate svds — svds","text":"RSpectra installed, function just add method RSpectra::svds() IterableMatrix class. opts argument list can supply following parameters: ncv Number Lanzcos basis vectors use. vectors result faster convergence, greater memory use. ncv must satisfy \\(k < ncv \\le p\\) p = min(m, n). Default min(p, max(2*k+1, 20)). tol Precision parameter. Default 1e-10. maxitr Maximum number iterations. Default 1000. center Either logical value (TRUE/FALSE), numeric vector length \\(n\\). vector \\(c\\) supplied, SVD computed matrix \\(- 1c'\\), implicit way without actually forming matrix. center = TRUE effect center = colMeans(). Default FALSE. Ignored BPCells scale Either logical value (TRUE/FALSE), numeric vector length \\(n\\). vector \\(s\\) supplied, SVD computed matrix \\((- 1c')S\\), \\(c\\) centering vector \\(S = diag(1/s)\\). scale = TRUE, vector \\(s\\) computed column norm \\(- 1c'\\). Default FALSE. Ignored BPCells","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Calculate svds — svds","text":"Qiu Y, Mei J (2022). RSpectra: Solvers Large-Scale Eigenvalue SVD Problems. 
R package version 0.16-1, https://CRAN.R-project.org/package=RSpectra.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/svds.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate svds — svds","text":"","code":"mat <- matrix(rnorm(500), nrow = 50, ncol = 10) rownames(mat) <- paste0(\"gene\", seq_len(50)) colnames(mat) <- paste0(\"cell\", seq_len(10)) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") svd_res <- svds(mat, k = 5) names(svd_res) #> [1] \"d\" \"u\" \"v\" \"niter\" \"nops\" \"nconv\" svd_res$d #> [1] 10.213518 9.181788 8.371677 7.570168 7.202453 dim(svd_res$u) #> [1] 50 5 dim(svd_res$v) #> [1] 10 5 # Can also pass in values directly into RSpectra::svds svd_res <- svds(mat, k = 5, opts=c(maxitr = 500))"},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate ranges x cells tile overlap matrix — tile_matrix","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"Calculate ranges x cells tile overlap matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"","code":"tile_matrix( fragments, ranges, mode = c(\"insertions\", \"fragments\"), zero_based_coords = !is(ranges, \"GRanges\"), explicit_tile_names = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"fragments Input fragments object ranges Tiled regions given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. 
Required attributes: chr, start, end: genomic position tile_width: Size tile region basepairs Must non-overlapping sorted (chr, start), chromosomes ordered according chromosome names fragments mode Mode counting tile overlaps. (See \"value\" section detail) zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost) explicit_tile_names Boolean whether add rownames output matrix format e.g chr1:500-1000, start end coords given 0-based coordinate system. whole-genome Tile matrices names take ~5 seconds generate take 400MB memory. Note either way, tile names written matrix saved.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"Iterable matrix object dimension ranges x cells. saved, column names format chr1:500-1000, start end coords given 0-based coordinate system. 
mode options \"insertions\": Start end coordinates separately overlapped tile \"fragments\": Like \"insertions\", fragment can contribute 1 count tile, even start end coordinates overlap","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"calculating matrix directly fragments tsv, necessary first call select_chromosomes() order provide ordering chromosomes expect reading tsv.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate ranges x cells tile overlap matrix — tile_matrix","text":"","code":"## Prep demo data frags <- get_demo_frags(subset = FALSE) chrom_sizes <- read_ucsc_chrom_sizes(file.path(tempdir(), \"references\"), genome=\"hg38\") blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE) ranges <- tibble::tibble( chr = \"chr4\", start = 0, end = \"190214555\", tile_width = 200 ) ## Get tile matrix tile_matrix(frags_filter_blacklist, ranges) #> 951073 x 2600 IterableMatrix object with class TileMatrix #> #> Row names: unknown names #> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1 #> #> Data type: uint32_t #> Storage order: row major #> #> Queued Operations: #> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered #> 2. Subset to fragments not overlapping 636 ranges: chr10:1-45700 ... chrY:26637301-57227400 #> 3. 
Calculate 951073 tiles over 1 ranges: chr4:1-190214555 (200bp), chr4:1-190214555 (200bp)"},{"path":"https://bnprks.github.io/BPCells/reference/tile_ranges.html","id":null,"dir":"Reference","previous_headings":"","what":"Get ranges corresponding to selected tiles of a tile matrix — tile_ranges","title":"Get ranges corresponding to selected tiles of a tile matrix — tile_ranges","text":"Get ranges corresponding selected tiles tile matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/tile_ranges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get ranges corresponding to selected tiles of a tile matrix — tile_ranges","text":"","code":"tile_ranges(tile_matrix, selection)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":null,"dir":"Reference","previous_headings":"","what":"Pseudobulk trackplot — trackplot_bulk","title":"Pseudobulk trackplot — trackplot_bulk","text":"function renamed trackplot_coverage() Plot pseudobulk genome track, showing number fragment insertions across region.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pseudobulk trackplot — trackplot_bulk","text":"","code":"trackplot_bulk( fragments, region, groups, cell_read_counts, group_order = NULL, bins = 200, clip_quantile = 0.999, colors = discrete_palette(\"stallion\"), legend_label = \"group\", zero_based_coords = !is(region, \"GRanges\"), return_data = FALSE, return_plot_list = FALSE, apply_styling = TRUE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pseudobulk trackplot — trackplot_bulk","text":"fragments Fragments object region GRanges length 1 region plot, list/data.frame one entry chr, start, end. 
See gene_region() genomic-ranges details groups Vector one entry per cell, specifying cell's group cell_read_counts Numeric vector read counts cell (used normalization) group_order Optional vector listing ordering groups bins Number bins plot across region clip_quantile (optional) Quantile values clipping y-axis limits. Default 0.999 crop just extreme outliers across region. NULL disable clipping colors Character vector color values (optionally named group) legend_label Custom label put legend zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost) return_data true, return data just plotting rather plot. return_plot_list TRUE, return multiple plots list, rather single plot combined using patchwork::wrap_plots() apply_styling false, return plot without pretty styling applied","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_bulk.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pseudobulk trackplot — trackplot_bulk","text":"Returns combined plot pseudobulk genome tracks. compatibility draw_trackplot_grid(), extra attribute $patches$labels added specify labels track. 
return_data return_plot_list TRUE, return value modified accordingly.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","text":"Calculate y positions trackplot segments avoid overlap Steps: Calculate maximum overlap depth transcripts Iterate start/end segments sorted order Randomly assign segment y-coordinate 1 max overlap depth, restriction segment y-coordinate overlapping segment","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — 
trackplot_calculate_segment_height","text":"","code":"trackplot_calculate_segment_height(data)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","text":"data tibble genome ranges start end columns, assumed chromosome.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_calculate_segment_height.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate y positions for trackplot segments to avoid overlap Steps: Calculate the maximum overlap depth of transcripts Iterate through start/end of segments in sorted order Randomly assign each segment a y-coordinate between 1 and max overlap depth, with the restriction that a segment can't have the same y-coordinate as an overlapping segment — trackplot_calculate_segment_height","text":"Vector y coordinates, one per input row, ranges y coordinate overlap","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine track plots — trackplot_combine","title":"Combine track plots — trackplot_combine","text":"Combines multiple track plots region single grid. 
Uses patchwork package perform alignment.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine track plots — trackplot_combine","text":"","code":"trackplot_combine( tracks, side_plot = NULL, title = NULL, side_plot_width = 0.3 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine track plots — trackplot_combine","text":"tracks List tracks order top bottom, generally ggplots output trackplot_*() functions. side_plot Optional plot align right (e.g. RNA expression per cluster). aligned first trackplot_coverage() output present, else first generic ggplot alignment. horizontal orientation cluster ordering coverage plots. title Text overarching title plot side_plot_width Fraction width used side plot relative main track area","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine track plots — trackplot_combine","text":"plot object aligned genome plots. aligned row text label, y-axis, plot body. relative height row given heights. 
shared title x-axis put top.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_combine.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Combine track plots — trackplot_combine","text":"","code":"## Prep data frags <- get_demo_frags() ## Use genes and blacklist to determine proper number of reads per cell genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") read_counts <- qc_scATAC(frags, genes, blacklist)$nFrags region <- \"chr4:3034877-4034877\" cell_types <- paste(\"Group\", rep(1:3, length.out = length(cellNames(frags)))) transcripts <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\" ) region <- \"chr4:3034877-4034877\" ## Get all trackplots and scalebars to combine plot_scalebar <- trackplot_scalebar(region) plot_gene <- trackplot_gene(transcripts, region) plot_coverage <- trackplot_coverage(frags, region, groups = cell_types, cell_read_counts = read_counts) ## Combine trackplots and render ## Also remove colors from gene track plot <- trackplot_combine( list(plot_scalebar, plot_coverage, plot_gene + ggplot2::guides(color = \"none\")) ) BPCells:::render_plot_from_storage(plot, width = 6, height = 4)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":null,"dir":"Reference","previous_headings":"","what":"Pseudobulk coverage trackplot — trackplot_coverage","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"Plot pseudobulk genome track, showing number fragment insertions across region cell type group.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pseudobulk coverage 
trackplot — trackplot_coverage","text":"","code":"trackplot_coverage( fragments, region, groups, cell_read_counts, group_order = NULL, bins = 500, clip_quantile = 0.999, colors = discrete_palette(\"stallion\"), legend_label = NULL, zero_based_coords = !is(region, \"GRanges\"), return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"fragments Fragments object region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details groups Vector one entry per cell, specifying cell's group cell_read_counts Numeric vector read counts cell (used normalization) group_order Optional vector listing ordering groups bins Number bins plot across region clip_quantile (optional) Quantile values clipping y-axis limits. Default 0.999 crop just extreme outliers across region. NULL disable clipping colors Character vector color values (optionally named group) legend_label Custom label put legend (longer used color legend shown anymore) zero_based_coords Whether convert ranges 1-based end-inclusive coordinate system 0-based end-exclusive coordinate system. Defaults true GRanges false formats (see archived UCSC blogpost) return_data true, return data just plotting rather plot. scale_bar Whether include scale bar top track (TRUE FALSE)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"Returns combined plot pseudobulk genome tracks. compatibility draw_trackplot_grid(), extra attribute $patches$labels added specify labels track. 
return_data return_plot_list TRUE, return value modified accordingly.","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_coverage.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pseudobulk coverage trackplot — trackplot_coverage","text":"","code":"frags <- get_demo_frags() ## Use genes and blacklist to determine proper number of reads per cell genes <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) blacklist <- read_encode_blacklist(file.path(tempdir(), \"references\"), genome=\"hg38\") read_counts <- qc_scATAC(frags, genes, blacklist)$nFrags region <- \"chr4:3034877-4034877\" cell_types <- paste(\"Group\", rep(1:3, length.out = length(cellNames(frags)))) BPCells:::render_plot_from_storage( trackplot_coverage(frags, region, groups = cell_types, cell_read_counts = read_counts), width = 6, height = 3 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":null,"dir":"Reference","previous_headings":"","what":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","title":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"Break segments smaller segments length plot, divided size","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"","code":"trackplot_create_arrow_segs(data, region, size = 50, head_only = FALSE)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Break up 
segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"data Dataframe full segments broken region Region plotted end start attr size int Number arrows span x axis track head_only bool TRUE, head segment plotted","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_create_arrow_segs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Break up segments into smaller segments the length of the plot, divided by size — trackplot_create_arrow_segs","text":"Dataframe segments broken smaller segments. columns start, end, additional metadata columns original data","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot transcript models — trackplot_gene","title":"Plot transcript models — trackplot_gene","text":"Plot transcript models","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot transcript models — trackplot_gene","text":"","code":"trackplot_gene( transcripts, region, exon_size = 2.5, gene_size = 0.5, label_size = 11 * 0.8/ggplot2::.pt, track_label = \"Genes\", return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot transcript models — trackplot_gene","text":"transcripts Transcript features given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position strand: +/- TRUE/FALSE positive negative strand feature: entries marked \"transcript\" \"exon\" considered gene_name: Symbol gene ID display transcript_id: Transcript identifier link transcripts exons Usually given output read_gencode_transcripts() region Region plot, e.g. 
output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details exon_size size exon lines units mm gene_size size intron/gene lines units mm label_size size transcript labels units mm return_data true, return data just plotting rather plot. labels Character vector labels item transcripts. NA items labeled transcript_size size transcript lines units mm","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot transcript models — trackplot_gene","text":"Plot gene locations","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_gene.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot transcript models — trackplot_gene","text":"","code":"## Prep data transcripts <- read_gencode_transcripts( file.path(tempdir(), \"references\"), release = \"42\", annotation_set = \"basic\", features = \"transcript\" ) region <- \"chr4:3034877-4034877\" ## Plot gene trackplot plot <- trackplot_gene(transcripts, region) BPCells:::render_plot_from_storage(plot, width = 6, height = 1)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","title":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","text":"Plot range-based annotation tracks (e.g. peaks)","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot range-based annotation tracks (e.g. 
peaks) — trackplot_genome_annotation","text":"","code":"trackplot_genome_annotation( loci, region, color_by = NULL, colors = NULL, label_by = NULL, label_size = 11 * 0.8/ggplot2::.pt, show_strand = FALSE, annotation_size = 2.5, track_label = \"Peaks\", return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","text":"loci Genomic loci given GRanges, data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details color_by Name metadata column loci use coloring, data vector length loci. Column must numeric convertible factor. colors Vector hex color codes use color scale. numeric color_by data, passed ggplot2::scale_color_gradientn(), otherwise interpreted discrete color palette ggplot2::scale_color_manual() label_by Name metadata column loci use labeling, data vector length loci. Column must hold string data. label_size size labels units mm show_strand TRUE, show strand direction arrows annotation_size size annotation lines mm return_data true, return data just plotting rather plot.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot range-based annotation tracks (e.g. 
peaks) — trackplot_genome_annotation","text":"Plot genomic loci return_data FALSE, otherwise returns data frame used generate plot","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_genome_annotation.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot range-based annotation tracks (e.g. peaks) — trackplot_genome_annotation","text":"","code":"## Prep data ## Peaks generated from demo frags, as input into `call_peaks_tile()` peaks <- tibble::tibble( chr = factor(rep(\"chr4\", 16)), start = c(3041400, 3041733, 3037400, 3041933, 3040466, 3041200, 3038200, 3038000, 3040266, 3037733, 3040800, 3042133, 3038466, 3037200, 3043333, 3040066), end = c(3041600, 3041933, 3037600, 3042133, 3040666, 3041400, 3038400, 3038200, 3040466, 3037933, 3041000, 3042333, 3038666, 3037400, 3043533, 3040266), enrichment = c(46.4, 43.5, 28.4, 27.3, 17.3, 11.7, 10.5, 7.95, 7.22, 6.86, 6.32, 6.14, 5.96, 5.06, 4.51, 3.43) ) region <- \"chr4:3034877-3044877\" ## Plot peaks BPCells:::render_plot_from_storage( trackplot_genome_annotation(peaks, region, color_by = \"enrichment\"), width = 6, height = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot loops — trackplot_loop","title":"Plot loops — trackplot_loop","text":"Plot loops","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot loops — trackplot_loop","text":"","code":"trackplot_loop( loops, region, color_by = NULL, colors = NULL, allow_truncated = TRUE, curvature = 0.75, track_label = \"Links\", return_data = FALSE )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot loops — trackplot_loop","text":"loops Genomic regions given GRanges, 
data.frame, list. See help(\"genomic-ranges-like\") details format coordinate systems. Required attributes: chr, start, end: genomic position region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. See help(\"genomic-ranges-like\") details color_by Name metadata column loops use coloring, data vector length loci. Column must numeric convertible factor. colors Vector hex color codes use color scale. numeric color_by data, passed ggplot2::scale_color_gradientn(), otherwise interpreted discrete color palette ggplot2::scale_color_manual() allow_truncated FALSE, remove loops fully contained within region curvature Curvature value 0 1. 1 180-degree arc, 0 flat lines. return_data true, return data just plotting rather plot.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot loops — trackplot_loop","text":"Plot loops connecting genomic coordinates","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_loop.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot loops — trackplot_loop","text":"","code":"peaks <- c(3054877, 3334877, 3534877, 3634877, 3734877) loops <- tibble::tibble( chr = \"chr4\", start = peaks[c(1,1,2,3)], end = peaks[c(2,3,4,5)], score = c(4,1,3,2) ) region <- \"chr4:3034877-4034877\" ## Plot loops plot <- trackplot_loop(loops, region, color_by = \"score\") BPCells:::render_plot_from_storage(plot, width = 6, height = 1.5)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":null,"dir":"Reference","previous_headings":"","what":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — 
trackplot_normalize_ranges_with_metadata","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"Normalize trackplot ranges data, handling metadata argument renaming type conversions Type conversions follows: color -> factor numeric label -> string","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"","code":"trackplot_normalize_ranges_with_metadata(data, metadata)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"data Input ranges-like object metadata List form e.g. list(color=color_by, label=label_by). values can either column names data vectors. 
NULL values skipped","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_normalize_ranges_with_metadata.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Normalize trackplot ranges data, while handling metadata argument renaming and type conversions Type conversions are as follows: color -> factor or numeric label -> string — trackplot_normalize_ranges_with_metadata","text":"Tibble normalized ranges additional columns populated requested metadata","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot scale bar — trackplot_scalebar","title":"Plot scale bar — trackplot_scalebar","text":"Plots human-readable scale bar coordinates region plotted","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot scale bar — trackplot_scalebar","text":"","code":"trackplot_scalebar(region, font_pt = 11)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot scale bar — trackplot_scalebar","text":"region Region plot, e.g. output gene_region(). String format \"chr1:100-200\", list/data.frame/GRanges length 1 specifying chr, start, end. 
See help(\"genomic-ranges-like\") details font_pt Font size scale bar labels units pt.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Plot scale bar — trackplot_scalebar","text":"Plot coordinates scalebar plotted genomic region","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_scalebar.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Plot scale bar — trackplot_scalebar","text":"","code":"region <- \"chr4:3034877-3044877\" BPCells:::render_plot_from_storage( trackplot_scalebar(region), width = 6, height = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":null,"dir":"Reference","previous_headings":"","what":"Adjust trackplot properties — set_trackplot_label","title":"Adjust trackplot properties — set_trackplot_label","text":"Adjust labels heights trackplots. Labels set facet labels ggplot2, heights additional properties read trackplot_combine() determine relative height input plots.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adjust trackplot properties — set_trackplot_label","text":"","code":"set_trackplot_label(plot, labels) set_trackplot_height(plot, height) get_trackplot_height(plot)"},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Adjust trackplot properties — set_trackplot_label","text":"plot ggplot object labels character vector labels – must match existing number facets plot height New height. numeric, adjusts relative height. ggplot2::unit grid::unit sets absolute height specified units. 
\"null\" units interpreted relative height.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/trackplot_utils.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Adjust trackplot properties — set_trackplot_label","text":"set_trackplot_label: ggplot object adjusted facet labels set_trackplot_height: ggplot object adjusted trackplot height get_trackplot_height: ggplot2::unit object height setting","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":null,"dir":"Reference","previous_headings":"","what":"Transpose the storage order for a matrix — transpose_storage_order","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"Transpose storage order matrix","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"","code":"transpose_storage_order( matrix, outdir = tempfile(\"transpose\"), tmpdir = tempdir(), load_bytes = 4194304L, sort_bytes = 1073741824L )"},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"matrix Input matrix outdir Directory store output tmpdir Temporary directory use intermediate storage load_bytes minimum contiguous load size merge sort passes sort_bytes amount memory allocate re-sorting chunks entries","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"MatrixDir object copy input matrix, storage order 
flipped","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"re-sorts entries matrix change storage order row-major col-major. large matrices, can slow – around 2 minutes transpose 500k cell RNA-seq matrix default load_bytes (4MiB) sort_bytes (1GiB) parameters allow ~85GB data sorted two passes data, ~7.3TB data sorted three passes data.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/transpose_storage_order.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Transpose the storage order for a matrix — transpose_storage_order","text":"","code":"mat <- matrix(rnorm(50), nrow = 10, ncol = 5) rownames(mat) <- paste0(\"gene\", seq_len(10)) colnames(mat) <- paste0(\"cell\", seq_len(5)) mat <- mat %>% as(\"dgCMatrix\") %>% as(\"IterableMatrix\") mat #> 10 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: double #> Storage order: column major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ## A regular transpose operation switches a user's rows and cols t(mat) #> 5 x 10 IterableMatrix object with class Iterable_dgCMatrix_wrapper #> #> Row names: cell1, cell2 ... cell5 #> Col names: gene1, gene2 ... gene10 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. Load dgCMatrix from memory ## Running `transpose_storage_order()` instead changes whether the storage is in row-major or col-major, ## but does not switch the rows and cols transpose_storage_order(mat) #> 10 x 5 IterableMatrix object with class MatrixDir #> #> Row names: gene1, gene2 ... gene10 #> Col names: cell1, cell2 ... cell5 #> #> Data type: double #> Storage order: row major #> #> Queued Operations: #> 1. 
Load compressed matrix from directory /tmp/RtmpCiGY9C/transpose3d2cda1481e785"},{"path":"https://bnprks.github.io/BPCells/reference/wrapMatrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an S4 matrix object wrapping another matrix object — wrapMatrix","title":"Construct an S4 matrix object wrapping another matrix object — wrapMatrix","text":"Helps avoid duplicate storage dimnames","code":""},{"path":"https://bnprks.github.io/BPCells/reference/wrapMatrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an S4 matrix object wrapping another matrix object — wrapMatrix","text":"","code":"wrapMatrix(class, m, ...)"},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":null,"dir":"Reference","previous_headings":"","what":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"Write insertion counts data one pseudobulks bed/bedgraph format. Beds hold chrom, start, end data, bedGraphs also provide score column. 
reports total number insertions basepair group listed cell_groups.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"","code":"write_insertion_bedgraph( fragments, path, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), insertion_mode = c(\"both\", \"start_only\", \"end_only\"), tile_width = 1, normalization_method = c(\"none\", \"cpm\", \"n_cells\"), chrom_sizes = NULL ) write_insertion_bed( fragments, path, cell_groups = rlang::rep_along(cellNames(fragments), \"all\"), insertion_mode = c(\"both\", \"start_only\", \"end_only\"), verbose = FALSE, threads = 1 )"},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"fragments IterableFragments object path (character vector) Path(s) save bed/bedgraphs , optionally ending \".gz\" add gzip compression. cell_groups provided, path must named character vector, one name level cell_groups cell_groups Character factor assigning group cell, order cellNames(fragments) insertion_mode (string) fragment ends use coverage calculation. One \"\", \"start_only\", \"end_only\" tile_width (integer) Width tiles use binning insertions. insertions single bin summed. tile_width 1, functionally equivalent write_insertion_bedgraph(). normalization_method (character) Normalization method use. One : none: normalization cpm: Normalize total number fragments group, scaling 1 million fragments (.e. CPM). n_cells: Normalize total number cells group. chrom_sizes (GRanges, data.frame, list, numeric, NULL) Chromosome sizes clip tiles end chromosome. NULL, tile_width required 1. 
data.frame list, must contain columns chr end (See help(\"genomic-ranges-like\")). numeric vector, assumed chromosome sizes order chrNames(fragments). verbose (bool) Whether provide verbose progress output console. threads (int) Number threads use.","code":""},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"NULL","code":""},{"path":"https://bnprks.github.io/BPCells/reference/write_insertion_bedgraph.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Write insertion counts to bed/bedgraph file — write_insertion_bedgraph","text":"","code":"## Prep data frags <- get_demo_frags() bedgraph_outputs <- file.path(tempdir(), \"bedgraph_outputs\") ###################################################### ## `write_insertion_bedgraph()` examples ###################################################### ## Write insertions write_insertion_bedgraph(frags, file.path(bedgraph_outputs, \"all.tar.gz\")) list.files(bedgraph_outputs) #> [1] \"all.tar.gz\" # With tiling chrom_sizes <- read_ucsc_chrom_sizes(\"./reference\", genome=\"hg38\") %>% dplyr::filter(chr %in% c(\"chr4\", \"chr11\")) write_insertion_bedgraph(frags, file.path(bedgraph_outputs, \"all_tiled.bedGraph\"), chrom_sizes = chrom_sizes, normalization_method = \"cpm\", tile_width = 100) reads <- readr::read_tsv(file.path(bedgraph_outputs, \"all_tiled.bedGraph\"), col_names = c(\"chr\", \"start\", \"end\", \"score\"), show_col_types = FALSE) head(reads) #> # A tibble: 6 × 4 #> chr start end score #> #> 1 chr4 10000 10100 1.45 #> 2 chr4 10100 10200 0.869 #> 3 chr4 10300 10400 0.290 #> 4 chr4 10400 10500 0.145 #> 5 chr4 10600 10700 0.434 #> 6 chr4 11100 11200 0.145 ###################################################### ## `write_insertion_bed()` examples 
###################################################### # We utilize two groups this time bed_outputs <- file.path(tempdir(), \"bed_outputs\") cell_groups <- rep(c(\"A\", \"B\"), length.out = length(cellNames(frags))) bed_paths <- c(file.path(bed_outputs, \"A.bed\"), file.path(bed_outputs, \"B.bed\")) names(bed_paths) <- c(\"A\", \"B\") write_insertion_bed( frags, path = bed_paths, cell_groups = cell_groups, verbose = TRUE ) #> 2025-11-10 15:00:48 Writing bed file for cluster: A #> 2025-11-10 15:00:48 Bed file for cluster: A written to: /tmp/RtmpCiGY9C/bed_outputs/A.bed #> 2025-11-10 15:00:48 Writing bed file for cluster: B #> 2025-11-10 15:00:49 Bed file for cluster: B written to: /tmp/RtmpCiGY9C/bed_outputs/B.bed #> 2025-11-10 15:00:49 Finished writing bed files list.files(bed_outputs) #> [1] \"A.bed\" \"B.bed\" head(readr::read_tsv( file.path(bed_outputs, \"A.bed\"), col_names = c(\"chr\", \"start\", \"end\"), show_col_types = FALSE) ) #> # A tibble: 6 × 3 #> chr start end #> #> 1 chr4 10035 10036 #> 2 chr4 10045 10046 #> 3 chr4 10045 10046 #> 4 chr4 10046 10047 #> 5 chr4 10046 10047 #> 6 chr4 10066 10067"},{"path":[]},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"breaking-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"BPCells 0.4.0 (in-progress main branch)","text":"Change first parameter name cluster_graph_leiden(), cluster_graph_louvain() cluster_graph_seurat() snn mat accurately reflect input type. 
(pull request #292)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-4-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.4.0 (in-progress main branch)","text":"Create wrapper function cluster_cells_graph() wraps steps knn object creation, graph adjacency creation, clustering within single function (pull request #292) Add tile_width normalization arguments write_insertion_bedgraph() allow flexible bedgraph creation (pull request #299) Export write_insertion_bed(), originally helper peak calling (pull request #302)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-4-0","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.4.0 (in-progress main branch)","text":"Fix error documentation examples plot_embedding(), resulting way documentation examples use nested function calls (pull request #316). Fix error qc_scATAC() fragments near start chromosome (pull request #320).","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"to-dos-0-4-0","dir":"Changelog","previous_headings":"","what":"To-dos","title":"BPCells 0.4.0 (in-progress main branch)","text":"Add support sparse pseudobulking pseudobulk_matrix(). Currently progress #268. Add support duplicate rows/cols subsetting operations. Add support matrix matrix addition. Maybe add CCA support? Refactor C++ backend take logic R S4 methods. allow cleaner separation R C++ code, allow much quicker porting Python future.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bpcells-031-7212025","dir":"Changelog","previous_headings":"","what":"BPCells 0.3.1 (7/21/2025)","title":"BPCells 0.3.1 (7/21/2025)","text":"BPCells 0.3.1 release covers 7 months changes 40 commits 5 contributors. Notable changes include writing matrices AnnData’s dense format, methods retrieving demo data testing examples. Full details changes . 
Thanks @ycli1995 @mfansler pull requests contributed release, well users submitted github issues help identify fix bugs.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-3-1","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.3.1 (7/21/2025)","text":"Add write_matrix_anndata_hdf5_dense() allows writing matrices AnnData’s dense format, commonly used obsm varm matrices. (Thanks @ycli1995 pull request #166) Add get_demo_mat(), get_demo_frags() remove_demo_data() retrieve small test matrix/fragments object PBMC 3k dataset 10X Genomics. (pull request #193)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"improvements-0-3-1","dir":"Changelog","previous_headings":"","what":"Improvements","title":"BPCells 0.3.1 (7/21/2025)","text":"Speed taking large subsets large concatenated matrices, e.g. selecting 9M cells 10M cell matrix composed ~100 concatenated pieces. (pull request #179) matrix_stats() now also works types matrix dgCMatrix. (pull request #190) Fixed memory errors running writeInsertionBed() writeInsertionBedGraph() (pull request #{118, 134}) Export merge_peaks_iterative(), helps create non-overlapping peak sets. (pull request #216) Add support uint16_t reading anndata matrices using open_matrix_anndata_hdf5(). (pull request #248) Switch write_matrix_10x_hdf5() use signed rather unsigned integers indices, indptr, shape improve compatibility 10x-produced files. (Thanks @ycli1995 pull request #256) Change behaviour cbind() rbind() matrices different types, upcast instead erroring . (pull request #265)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-3-1","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.3.1 (7/21/2025)","text":"Fix error message printing MACS crashes call_peaks_macs() (pull request #175) Fix gene_score_archr() gene_score_weights_archr() malfunctioning non-default tile_width settings. 
(Thanks @Baboon61 reporting issue #185) Fix gene_score_archr() chromosome_sizes argument sorted. (Thanks @Baboon61 reporting issue #188) Fix matrix transpose error BPCells loaded via devtools::load_all() BiocGenerics imported previously. (pull request #191) Fix error using single group write_insertion_bedgraph() (pull request #214) Fix GRanges conversion functions sometimes defined BPCells built binary package prior GenomicRanges installed. (pull request #231; thanks @mfansler reporting issue #229) Fix error write_matrix_hdf5() overwriting .h5 file exist. (pull request #234) Fix configure script use pre-installed libhwy available installation time. (Thanks @mfansler submitting PR #228) Fix line-ending issue caused windows-created matrices readable platforms. (pull request #257; thanks @pavsol reporting issue #253) Fix compilation exists system-installed libhwy old. (pull request #288, thanks @GerardoZA reporting issue #285)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bpcells-030-12212024","dir":"Changelog","previous_headings":"","what":"BPCells 0.3.0 (12/21/2024)","title":"BPCells 0.3.0 (12/21/2024)","text":"BPCells 0.3.0 release covers 6 months changes 45 commits 5 contributors. Notable improvements release include support peak calling MACS addition pseudobulk matrix stats calculations. also released initial prototype BPCells Python library (details ). Full details changes . Thanks @ycli1995, @Yunuuuu, @douglasgscofield pull requests contributed release, well users submitted github issues help identify fix bugs. also added @immanuelazn team new hire! responsible many new features release continue help maintenance new development moving forwards.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-3-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.3.0 (12/21/2024)","text":"apply_by_col() apply_by_row() allow providing custom R functions compute per row/col summaries. 
initial tests calculating row/col means using R functions ~2x slower C++-based implementation memory usage remains low. Add rowMaxs() colMaxs() functions, return maximum value row column matrix. matrixStats MatrixGenerics packages installed, BPCells::rowMaxs() fall back implementations non-BPCells objects. Thanks @immanuelazn first contribution new lab hire! Add regress_out() allow removing unwanted sources variation via least squares linear regression models. Thanks @ycli1995 pull request #110 Add trackplot_genome_annotation() plotting peaks, options directional arrows, colors, labels, peak widths. (pull request #113) Add MACS2/3 input creation peak calling call_peaks_macs()(pull request #118). Note, renamed call_macs_peaks() pull request #143 Add rowQuantiles() colQuantiles() functions, return quantiles row/column matrix. Currently rowQuantiles() works row-major matrices colQuantiles() works col-major matrices. matrixStats MatrixGenerics packages installed, BPCells::colQuantiles() fall back implementations non-BPCells objects. (pull request #128) Add pseudobulk_matrix() allows pseudobulk aggregation sum mean calculation per-pseudobulk variance nonzero statistics gene (pull request #128)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"improvements-0-3-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"BPCells 0.3.0 (12/21/2024)","text":"trackplot_loop() now accepts discrete color scales trackplot_combine() now smarter layout logic margins, well detecting plots combined cover different genomic regions. (pull request #116) select_cells() select_chromosomes() now also allow using logical mask selection. (pull request #117) BPCells installation can now also configured setting LDFLAGS CFLAGS environment variables addition setting ~/.R/Makevars (pull request #124) open_matrix_anndata_hdf5() now supports reading AnnData matrices dense format. 
(pull request #146) cluster_graph_leiden() now better defaults produce reasonable cluster counts regardless dataset size. (pull request #147)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-3-0","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.3.0 (12/21/2024)","text":"Fixed error message matrix large converted dgCMatrix. (Thanks @RookieA1 reporting issue #95) Fixed forgetting dimnames subsetting certain sets operations. (Thanks @Yunuuuu reporting issues #97 #100) Fixed plotting crashes running trackplot_coverage() fragments single cluster. (Thanks @sjessa directly reporting bug coming fix) Fixed issues trackplot_coverage() called ranges less 500 bp length (Thanks @bettybliu directly reporting bug.) Fix Rcpp warning created handling compressed matrices one non-zero entry (pull request #123) Fixed discrepancy default ArchR BPCells peak calling insertion method, BPCells defaulted using start fragment opposed ArchR’s method using start end sites fragments (pull request #143) Fix error tile_matrix() fragment mode (pull request #141) Fix precision bug sctransform_pearson() ARM architecture (pull request #141) Fix type-confusion error pseudobulk_matrix() gets integer matrix (pull request #174)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"deprecations-0-3-0","dir":"Changelog","previous_headings":"","what":"Deprecations","title":"BPCells 0.3.0 (12/21/2024)","text":"trackplot_coverage() legend_label argument now ignored, color legend longer shown default coverage plots.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bpcells-020-6142024","dir":"Changelog","previous_headings":"","what":"BPCells 0.2.0 (6/14/2024)","title":"BPCells 0.2.0 (6/14/2024)","text":"finally declaring new release version, covering large amount changes improvements past year. 
Among major features parallelization options svds() matrix_stats(), improved genomic track plots, runtime CPU feature detection SIMD code (enables higher performance, portable builds). Full details changes . version also comes new installation path, done preparation future Python package release. (can one folder R one Python, rather R files sit root folder). breaking change requires slightly modified installation command. Thanks @brgew, @ycli1995, @Yunuuuu pull requests contributed release, well users submitted github issues help identify fix bugs.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"breaking-changes-0-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"BPCells 0.2.0 (6/14/2024)","text":"r-universe mirrors add \"subdir\": \"r\" packages.json config. New slots added 10x matrix objects, saved RDS files may need 10x matrix inputs re-opened replaced calling all_matrix_inputs(). Outside loading old RDS files changes needed. trackplot_gene() now returns plot facet label match new trackplot system. label can removed calling trackplot_gene(...) + ggplot2::facet_null() equivalent old function’s output.","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"deprecations-0-2-0","dir":"Changelog","previous_headings":"","what":"Deprecations","title":"BPCells 0.2.0 (6/14/2024)","text":"draw_trackplot_grid() deprecated, replaced trackplot_combine() simplified arguments trackplot_bulk() deprecated, replaced trackplot_coverage() equivalent functionality old function names output deprecation warnings, otherwise work .","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-2-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.2.0 (6/14/2024)","text":"New svds() function, based excellent Spectra C++ library (used RSpectra) Yixuan Qiu. ensure lower memory usage compared irlba, achieving similar speed + accuracy. 
normalizations supported, operations like marker_features() writing matrix disk remain single-threaded. Running svds() many threads gene-major matrices can result high memory usage now. problem present cell-major matrices. Reading text-based MatrixMarket inputs (e.g. 10x Parse) now supported via import_matrix_market() convenience function import_matrix_market_10x(). implementation uses disk-backed sorting allow importing large files low memory usage. Added binarize() function associated generics <, <=, >, >=. supports comparison non-negative numbers currently. (Thanks contribution @brgew) Added round() matrix transformation (Thanks contributions @brgew) Add getter/setter function all_matrix_inputs() help enable relocating underlying storage BPCells matrix transform objects. hdf5-writing functions now support gzip_level parameter, enable shuffle + gzip filter compression. generally much slower bitpacking compression, adds improved storage options files must read outside programs. Thanks @ycli1995 submitting improvement pull #42. AnnData export now supported via write_matrix_anndata_hdf5() (issue #49) Re-licensed code base use dual-licensed Apache V2 MIT instead GPLv3 Assigning subset now supported (e.g. m1[,j] <- m2). Note modify data disk. Instead, uses series subsetting concatenation operations provide appearance overwriting appropriate entries. Added knn_to_geodesic_graph(), matches Scanpy default construction graph-based clustering Add checksum(), allows calculating MD5 checksum matrix contents. Thanks @brgrew submitting improvement pull request #83 write_insertion_bedgraph() allows exporting pseudobulk insertion data bedgraph format","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"improvements-0-2-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"BPCells 0.2.0 (6/14/2024)","text":"Merging fragments c() now handles inputs mismatched chromosome names. 
Merging fragments now 2-3.5x faster SNN graph construction knn_to_snn_graph() work smoothly large datasets due C++ implementation Reduced memory usage marker_features() samples millions cells large number clusters compare. Windows, increased maximum number files can simultaneously open. Previously, opening >63 compressed counts matrices simultaneously hit limit. Now least 1,000 simultaneous matrices possible. Subsetting peak tile matrices [ now propagates always avoid computing parts peak/tile matrix discarded subset. Subsetting tile matrix automatically convert peak matrix possible improved efficiency. Subsetting RowBindMatrices ColBindMatrices now propagates avoid touching matrices selected indices Added logic help reduce cases subsetting causes BPCells fall back less efficient matrix-vector multiply algorithm. affects math transforms. part , filtering part subset propagate earlier transformation steps, reordering . Thanks @nimanouri-nm raising issue #65 fix bug initial implementation. Additional C++17 filesystem backwards compatibility allow slightly older compilers GCC 7.5 build BPCells. 
.matrix() produce integer matrices appropriate (Thanks @Yunuuuu pull #77) 10x HDF5 matrices can now read write non-integer types requested (Thanks @ycli1995 pull #75) Old-style 10x files cellranger v2 can now read multi-genome files, returned list (Thanks @ycli1995 pull #75) Trackplots now use faceting provide per-plot labels, leading easier--use trackplot_combine() trackplot_gene() now draws arrows direction transcription trackplot_loop() new track type allows plotting interactions genomic regions, instance peak-gene correlations loop calls Hi-C trackplot_scalebar() added show genomic scale trackplot functions now return ggplot objects additional metadata stored plotting height track Labels heights trackplots can adjusted using set_trackplot_label() set_trackplot_height() getting started pbmc 3k vignette now includes updated trackplot APIs final example Add rowVars() colVars() functions, convenience wrappers around matrix_stats(). matrixStats MatrixGenerics packages installed, BPCells::rowVars() fall back implementations non-BPCells objects. Unfortunately, matrixStats::rowVars() generic, either BPCells::rowVars() BPCells::colVars() Optimize mean variance calculations matrices added per-row per-column constant. Adds run-time detection CPU features eliminate architecture-specific compilation now, Pow SIMD implementation removed, Square gets new SIMD implementation Empirically, operations using SIMD math instructions 2x faster. includes log1p(), sctransform_pearson() Minor speedups dense-sparse matrix multiply functions (1.1-1.5x faster)","code":""},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"bug-fixes-0-2-0","dir":"Changelog","previous_headings":"","what":"Bug-fixes","title":"BPCells 0.2.0 (6/14/2024)","text":"Fixed fragment transforms using chrNames(frags) <- val cellNames(frags) <- val cause downstream errors. Fixed errors transpose_storage_order() matrices >4 billion non-zero entries. 
Fixed error transpose_storage_order() matrices non-zero entries. Fixed bug writing fragment files >512 chromosomes. Fixed bug reading fragment files >4 billion fragments. Fixed file permissions errors using read-hdf5 files (Issue #26 reported thanks @ttumkaya) Renaming rownames() colnames() now propagated saving matrices (Issue #29 reported thanks @realzehuali, additional fix report thanks @Dario-Rocha) Fixed 64-bit integer overflow (!) cause incorrect p-value calculations marker_features() features 2.6 million zeros. Improved robustness Windows installation process setups need -lsz linker flag compile hdf5 Fixed possible memory safety bug wrapped R objects (dgCMatrix) potentially garbage collected C++ still trying access data rare circumstances. Fixed case dimnames preserved calling convert_matrix_type() twice row cancels (e.g. double -> uint32_t -> double). Thanks @brgrew reporting issue #43 Caused fixed issue resulting unusably slow performance reading matrices HDF5 files. Broken versions range commit 21f8dcf fix 3711a40 (October 18-November 3, 2023). Thanks @abhiachoudhary reporting issue #53 Fixed error svds() handling row-major matrices correctly. Thanks @ycli1995 reporting issue #55 Fixed error row/col name handling AnnData matrices. Thanks @lisch7 reporting issue #57 Fixed error merging matrices different data types. Thanks @Yunuuuu identifying issue providing fix (#68 #70) Fixed issue losing dimnames subset assignment [<-. Thanks @Yunuuuu identifying issue #67 Fixed incorrect results cases scaling matrix shifting. Thanks @Yunuuuu identifying issue #72 Fixed infinite loop bug calling transpose_storage_order() densely-transformed matrix. 
Thanks @Yunuuuu reporting issue #71 h5ad outputs now subset properly loaded Python anndata package (Thanks issue described @ggruenhagen3 issue #49 fixed @ycli1995 pull #81) Disk-backed fragment objects now load via absolute path, matching behavior matrices making objects loaded via readRDS() can used different working directories. footprints() now respects user interrupts via Ctrl-C","code":""},{"path":[]},{"path":"https://bnprks.github.io/BPCells/news/index.html","id":"features-0-1-0","dir":"Changelog","previous_headings":"","what":"Features","title":"BPCells 0.1.0 (4/7/2023)","text":"Reading/writing 10x fragment files disk Reading/writing compressed fragments disk (folder hdf5 group) Interconversion fragments objects GRanges / data.frame Merging multiple source fragment files transparently run time Calculation Cell x Peak matrices, Cell x Tile matrices ArchR-compatible QC calculations ArchR-compatible gene activity score calculations Filtering fragments chromosomes, cells, lengths, genomic region Fast peak calling approximation via overlapping tiles Conversion /R sparse matrices Read-write access 10x hdf5 feature matrices, read-access AnnData files Reading/writing compressed matrices disk (folder hdf5 group) Support integer single/double-precision floating point matrices disk Fast transposition storage order, switch indexing cell gene/feature. Concatenation multiple source matrix files transparently run time Single-pass calculation row/column mean variance Wilcoxon marker feature calculation Transparent handling vector +, -, *, /, log1p streaming normalization, along less common operations. allows implementation ATAC-seq LSI Seurat default normalization, along published log-based normalizations. SCTransform pearson residual calculation Multiplication sparse matrices Read count knee cutoffs UMAP embeddings Dot plots Transcription factor footprinting / TSS profile plotting Fragments vs. 
TSS Enrichment ATAC-seq QC plot Pseudobulk genome track plots, gene annotation plots Matching gene symbols/IDs canonical symbols Download transcript annotations Gencode GTF files Download + parse UCSC chromosome sizes Parse peak files BED format; Download ENCODE blacklist region Wrappers knn graph calculation + clustering Note: operations interoperate storage formats. example, matrix operations can applied directly AnnData 10x matrix file. many cases bitpacking-compressed formats provide performance/space advantages, required use computations.","code":""}]