Conversation
❌ 4 Tests Failed:
```python
rng = np.random.default_rng(rng)
if not isinstance(rng, _FakeRandomGen) or not isinstance(
    rng._arg, int | np.random.RandomState
):
    # TODO: remove this error and if we don't have a _FakeRandomGen,
    # just use rng.integers to make a seed farther down
    msg = f"rng needs to be an int or a np.random.RandomState, not a {type(rng).__name__} when passing a dask array"
```
reminder to self: do this TODO
```python
for rowidx, sub_rng in zip(
    under_target, rng.spawn(len(under_target)), strict=True
):
    _downsample_array(
```
I think `spawn` is more for parallel independent streams. This is correct for reproducibility, but it may be paying a performance overhead for an independence that is usually not needed. So far I've only seen `spawn` used either when the `rng` is handed to another function while the current code flow needs to keep its own stream independent, or in parallel cases, not in a sequential for loop. My argument goes like this: if this for loop were unrolled and the code were written sequentially, would we still use `spawn`? I think not, because we don't do that in sparse multiply, for example.
In fact I ran an AI-generated script, and these were the results. For some reason `_FakeRandomGen` is super slow, but they also show that `spawn` has a real overhead.
```
❯ python bench_downsample.py
Warming up numba JIT...
Done.

counts_per_cell=50 n_obs=500 n_vars=200 nnz=29840
  legacy (random_state=0)          20.07 ms
  real Generator (with spawn)       8.30 ms
  real Generator (no spawn)         4.53 ms
  --> spawn overhead: +83%   real vs legacy: -59%

counts_per_cell=100 n_obs=5000 n_vars=500 nnz=746289
  legacy (random_state=0)         399.50 ms
  real Generator (with spawn)      86.80 ms
  real Generator (no spawn)        53.74 ms
  --> spawn overhead: +62%   real vs legacy: -78%

counts_per_cell=50 n_obs=20000 n_vars=1000 nnz=5969861
  legacy (random_state=0)        2968.35 ms
  real Generator (with spawn)     320.06 ms
  real Generator (no spawn)       185.98 ms
  --> spawn overhead: +72%   real vs legacy: -89%
```
```
❯ python bench_downsample.py
Warming up numba JIT...
Done.

counts_per_cell=50 n_obs=500 n_vars=200 nnz=29840
  legacy (_FakeRandomGen)          18.96 ms
  real Generator (with spawn)       7.77 ms
  real Generator (no spawn)         4.22 ms
  --> spawn overhead: +84%   real vs legacy: -59%

counts_per_cell=100 n_obs=5000 n_vars=500 nnz=746289
  legacy (_FakeRandomGen)         412.81 ms
  real Generator (with spawn)      88.76 ms
  real Generator (no spawn)        54.64 ms
  --> spawn overhead: +62%   real vs legacy: -78%

counts_per_cell=50 n_obs=20000 n_vars=1000 nnz=5969861
  legacy (_FakeRandomGen)        2956.91 ms
  real Generator (with spawn)     322.48 ms
  real Generator (no spawn)       187.00 ms
  --> spawn overhead: +72%   real vs legacy: -89%
```
the script: https://gist.github.com/selmanozleyen/2046c849bb10f541642cea2ac7daa0db
Is this true? I was looking into your comment and found this:

```python
import numpy as np

# High quality entropy created with: f"0x{secrets.randbits(128):x}"
entropy = 0x3034c61a9ae04ff8cb62ab8ec2c4b501

rng = np.random.default_rng(entropy)
# Generate a random number as the first step
first_random = rng.uniform()

rng = np.random.default_rng(entropy)
# Now, spawn as the first step
child_rng1, child_rng2 = rng.spawn(2)
# And then do RNG
assert first_random == rng.uniform()
```

Also, reading through the linked thread, it seems the consensus was …
@ilan-gold yeah, I'm sure! I was confused: they have two parts of internal state. Spawning advances the SeedSequence. I can do a writeup when I'm back working.
Ok @flying-sheep, that makes a load of sense. I assumed as much was going on (how else could what I posted be true, unless they were literally tracking the number of children created?) and it turns out that is exactly what is happening. Like literally: after spawning, only the spawn counter advances, so the entropy is unchanged.
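The two parts of internal state can be seen directly on `SeedSequence`: spawning leaves the entropy untouched and only advances the spawn counter:

```python
import numpy as np

ss = np.random.SeedSequence(42)
children = ss.spawn(2)

# the parent's entropy is unchanged by spawning...
assert ss.entropy == 42
# ...but the parent counts how many children it has produced,
# so the next spawn call yields fresh, non-colliding streams
assert ss.n_children_spawned == 2
# each child is identified by its position in the spawn tree
assert children[0].spawn_key == (0,)
assert children[1].spawn_key == (1,)
```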
Should we store the new RNG in adata.uns? If no, this fixes #1131
No opinion, seems moderately unrelated. I guess with the new structure it's a bit meaningless: seeds have no impact if you're passing in a Generator, and you can't serialize a Generator.
Should we keep random_state in the docs?
No
How should we annotate that the default rng isn’t actually None but “a new instance of _LegacyRandom(0)” but people can pass rng=None to get the future default behavior?
I noticed there is no default docstring for `rng` like there is for `random_state`. Why not? It seems like the long-term goal is to unify public APIs. Is there a way to create a default docstring that generalizes, but can refer to an old version of the package/docs as a disclaimer for the "default" behavior? Maybe not. It's hard for me to hold everything in my head.
Should I handle passing rng to neighbors transformer?
It seems like we don't do it now unless it's a self-constructed transformer, so I probably wouldn't. Is this what you mean?
| """ | ||
| rng_init, rng_umap = np.random.default_rng(rng).spawn(2) | ||
| meta_random_state = ( | ||
| dict(random_state=rng._arg) if isinstance(rng, _FakeRandomGen) else {} |
There was a problem hiding this comment.
I guess it's to distinguish it from all the attributes inherited from Generator, but that's probably not necessary.
It's a private class, so once we've established that an object has this class, accessing its fields might as well be public.
| """Return `self` `n_children` times. | ||
|
|
||
| In a real generator, the spawned children are independent, | ||
| but for backwards compatibility we return the same instance. |
There was a problem hiding this comment.
| but for backwards compatibility we return the same instance. | |
| but for backwards compatibility we return the same instance so that its internal state is advanced by each child. |
```python
meta_random_state = (
    dict(random_state=rng._arg) if isinstance(rng, _FakeRandomGen) else {}
)
rng = _if_legacy_apply_global(rng)
```
Is this not a behavior change? For example, the `init_pos in adata.obsm` branch combined with `layout == "fa"` would never set the global seed?
this kind of comment is exactly why I was looking forward to more eyes on this PR, thanks!
```python
if isinstance(rng, _FakeRandomGen):
    from sklearn.random_projection import sample_without_replacement

    idx = sample_without_replacement(np.prod(dims), nsamp, random_state=rng._arg)
```
Again, it seems like `_arg` should be public.
```python
    return _downsample_array_inner(col, cumcounts, sample, inplace=inplace)


# TODO: can/should this be parallelized?
```
Seems like a good candidate!
yeah, I just left it alone to not cause any more changes in this PR than there already were.
```python
rng = np.random.default_rng(rng)
# this exists only to be stored in our PCA container object
random_state_meta = _legacy_random_state(rng)
[rng_init, rng_svds] = rng.spawn(2)
```
I see you have "we use spawn to pass sub-generators to each subtask, so we can add subtasks without affecting subsequent calls e.g." in your comment on the numpy issue but why do we want this property of independent subtasks?
See my comment there: numpy/numpy#24086 (comment)
I'm torn between these two arguments:

- when libraries change from under us, things change anyway, so you can't expect your script to produce the exact same result after updating your packages, even when passing a seed, and
- some steps aren't impactful enough to have qualitative changes, and spawning allows us to pass `rng` to more things that we previously didn't pass it to (see example in that comment)
```diff
 [rng_root] = rng.spawn(1)
-[rng_eigsh] = rng_root.spawn(1)
+rng_eigsh, rng_other_stuff = rng_root.spawn(2)
 random_init = rng_root.uniform(...)
 eigsh(A, k, v0=random_init, rng=rng_eigsh)
+do_thing(rng_other_stuff)
```

I think in your example it only matters if you don't know whether `eigsh` still works under the hood in parallel after it returns. But we know it doesn't, right? So continuing with `rng_other_stuff` or `rng_eigsh` doesn't matter.
I think it only matters for either:
- abstraction reasons: when you don't know or don't want to know what the function you call does with the rng, how long it keeps it etc.
- or writing parallel code
Not in AnnData, but being serialized in general is an explicit feature of them.
great point, I’ll add that, and that’ll make documenting this easy.
I think we do it when the user passed a transformer class, or when explicitly using a transformer that supports it (sklearn?). My point was: sklearn doesn't support `numpy.random.Generator`.

#3371

The idea is to make our code behave as closely as possible to how it did, but make it read like modern code:

- `random_state` into `rng` argument (deduplicating them and using 0 by default)
- … `RandomState`)
- `_FakeRandomGen.wrap_global` and `_if_legacy_apply_global` to conditionally replace `np.random.seed` and other global-state-mutating functions
- `_legacy_random_state` to get back a `random_state` argument for APIs that don't take `rng`, e.g. scikit-learn stuff and `"random_state"` in `adata.uns[thing]["params"]`

I also didn't convert the `transformer` argument to `neighbors` (yet?), or deprecated stuff like `louvain` or the `external` APIs.

Reviewing

First, some short info about how Generators work:

- … `spawn` independent children (doing so advances their internal state once)
- … `rng: SeedLike | RNGLike | None = None` for the argument, `None` meaning random initialization

Now, questions to the reviewers:

- Should we store the new RNG in `adata.uns`? If no, this fixes "Passing a RandomState instance can cause failures to save" #1131
- Should we keep `random_state` in the docs?
- How should we annotate that the default `rng` isn't actually `None` but "a new instance of `_LegacyRandom(0)`", while people can pass `rng=None` to get the future default behavior?
- Should I handle passing `rng` to neighbors transformer?
- … `rng` can be passed?
- … `np.random.seed()`

TODO:

- … `np.random.seed` calls (maybe `if isinstance(rng, _FakeRandomGen): gen = legacy_numpy_gen(rng)` or so?)
- add `spawn` to `_FakeRandomGen` (that does nothing) and use `spawn` for a tree structure
- `rng.clone()` or similar? numpy/numpy#24086 (comment)