Skip to content

perf(score_genes): avoid copy-heavy sparse nan mean path#4159

Open
SID-6921 wants to merge 1 commit into
scverse:mainfrom
SID-6921:perf/sparse-nanmean-no-copy
Open

perf(score_genes): avoid copy-heavy sparse nan mean path#4159
SID-6921 wants to merge 1 commit into
scverse:mainfrom
SID-6921:perf/sparse-nanmean-no-copy

Conversation

@SID-6921

Copy link
Copy Markdown
  • Problem: _sparse_nanmean currently makes sparse copies and eliminate_zeros calls.
  • Change: aggregate sums and NaN counts directly via compressed index pointers (csr/csc), no matrix copies.
  • Correctness: preserves np.nanmean-equivalent behavior for sparse matrices.
  • Tests: expanded test_sparse_nanmean to run on both csr and csc formats.
  • Validation command: ANNDATA_ZARR_WRITE_FORMAT=3 python -m pytest tests/test_score_genes.py -k sparse_nanmean -q
  • Closes _sparse_nanmean is inefficient #1894

Copilot AI review requested due to automatic review settings June 14, 2026 07:44

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR optimizes _sparse_nanmean() for compressed sparse matrices and expands unit tests to validate behavior across sparse storage formats.

Changes:

  • Reworked _sparse_nanmean() to avoid sparse matrix copies/eliminate_zeros() and compute reductions via indptr-based aggregation.
  • Added explicit runtime validation for axis values.
  • Extended tests to run _sparse_nanmean() against both CSR and CSC inputs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
tests/test_score_genes.py Parameterizes the test to exercise both CSR and CSC matrix formats.
src/scanpy/tools/_score_genes.py Replaces copy-heavy NaN-mean computation with a pointer-based aggregation approach and adds axis validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +59 to +60
with np.errstate(invalid="ignore", divide="ignore"):
return sums / counts
Comment on lines +48 to +56
segment_ids = np.repeat(np.arange(out_size), segment_lengths)
isnan = np.isnan(mat.data)

sums = np.bincount(
segment_ids[~isnan],
weights=mat.data[~isnan],
minlength=out_size,
).astype(np.float64, copy=False)
nan_counts = np.bincount(segment_ids[isnan], minlength=out_size)
Comment on lines +37 to +39
if axis not in (0, 1):
msg = "axis must be 0 or 1"
raise ValueError(msg)
@SID-6921

Copy link
Copy Markdown
Author

Maintainers: this PR is failing the metadata gate because I cannot apply labels from a fork. Could you please add the
o milestone\ label (or assign a milestone)? The code checks pass locally.

@codecov

codecov Bot commented Jun 14, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.60%. Comparing base (2ae768e) to head (0a84825).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/scanpy/tools/_score_genes.py 85.71% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4159      +/-   ##
==========================================
- Coverage   79.61%   79.60%   -0.02%     
==========================================
  Files         120      120              
  Lines       12786    12790       +4     
==========================================
+ Hits        10180    10181       +1     
- Misses       2606     2609       +3     
Flag Coverage Δ
hatch-test.low-vers 78.85% <85.71%> (+<0.01%) ⬆️
hatch-test.pre 79.46% <85.71%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/scanpy/tools/_score_genes.py 86.20% <85.71%> (-1.30%) ⬇️

... and 2 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_sparse_nanmean is inefficient

2 participants