perf(score_genes): avoid copy-heavy sparse nan mean path #4159
SID-6921
commented
Jun 14, 2026
- Problem: _sparse_nanmean currently makes sparse matrix copies and calls eliminate_zeros.
- Change: aggregate sums and NaN counts directly from the compressed index pointers (indptr) of CSR/CSC matrices, without copying the matrix.
- Correctness: preserves np.nanmean-equivalent behavior for sparse matrices.
- Tests: expanded test_sparse_nanmean to run on both csr and csc formats.
- Validation command: ANNDATA_ZARR_WRITE_FORMAT=3 python -m pytest tests/test_score_genes.py -k sparse_nanmean -q
- Closes _sparse_nanmean is inefficient #1894
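
For readers who want to see the idea in isolation, the following is a minimal sketch of indptr-based NaN-mean aggregation, not the code from this PR. The function name `sparse_nanmean_sketch` is hypothetical, and the handling of the "other" axis via `mat.indices` is an assumption about one way to avoid a format conversion. Implicit and explicit zeros both count as zero-valued entries, matching what `np.nanmean` does on the dense equivalent.

```python
import numpy as np
from scipy import sparse


def sparse_nanmean_sketch(mat, axis):
    """NaN-ignoring mean of a CSR/CSC matrix along `axis` (illustrative only)."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    if mat.format not in ("csr", "csc"):
        raise TypeError("expected a CSR or CSC matrix")

    # One output value per index of the axis that is NOT reduced.
    out_size = mat.shape[1 - axis]
    # Axis whose reduction matches the indptr segments:
    # CSR segments are rows (reduce over axis 1), CSC segments are columns.
    segment_axis = 1 if mat.format == "csr" else 0

    if axis == segment_axis:
        # Expand indptr into one output index per stored value.
        segment_ids = np.repeat(np.arange(out_size), np.diff(mat.indptr))
    else:
        # The stored minor indices already name the output slot.
        segment_ids = mat.indices

    isnan = np.isnan(mat.data)
    sums = np.bincount(
        segment_ids[~isnan], weights=mat.data[~isnan], minlength=out_size
    )
    nan_counts = np.bincount(segment_ids[isnan], minlength=out_size)

    # Every non-NaN entry counts, including implicit zeros.
    counts = mat.shape[axis] - nan_counts
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # all-NaN slices yield NaN
```

Only `mat.data`, `mat.indices`, and `mat.indptr` are read, so no sparse copy or `eliminate_zeros()` call is needed in this sketch.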
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR optimizes _sparse_nanmean() for compressed sparse matrices and expands unit tests to validate behavior across sparse storage formats.
Changes:
- Reworked _sparse_nanmean() to avoid sparse matrix copies/eliminate_zeros() and compute reductions via indptr-based aggregation.
- Added explicit runtime validation for axis values.
- Extended tests to run _sparse_nanmean() against both CSR and CSC inputs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_score_genes.py | Parameterizes the test to exercise both CSR and CSC matrix formats. |
| src/scanpy/tools/_score_genes.py | Replaces copy-heavy NaN-mean computation with a pointer-based aggregation approach and adds axis validation. |
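
As a rough illustration of what parameterizing a test over both sparse formats can look like, here is a minimal sketch. It is not the test from this PR: `nanmean_under_test` is a hypothetical stand-in that simply densifies the input, and the test name and data are made up for the example.

```python
import numpy as np
import pytest
from scipy import sparse


def nanmean_under_test(mat, axis):
    # Hypothetical stand-in: densify and defer to NumPy.
    # Replace with the function actually being tested.
    return np.nanmean(mat.toarray(), axis=axis)


@pytest.mark.parametrize(
    "fmt", [sparse.csr_matrix, sparse.csc_matrix], ids=["csr", "csc"]
)
@pytest.mark.parametrize("axis", [0, 1])
def test_nanmean_both_formats(fmt, axis):
    # NaNs are stored explicitly; zeros stay implicit in the sparse matrix.
    dense = np.array(
        [[1.0, np.nan, 0.0], [0.0, 2.0, np.nan], [3.0, np.nan, 4.0]]
    )
    result = nanmean_under_test(fmt(dense), axis)
    np.testing.assert_allclose(result, np.nanmean(dense, axis=axis))
```

Stacking the two `parametrize` decorators runs the test once per (format, axis) pair, so a regression that affects only one storage format or one axis shows up as a separately named failure.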
with np.errstate(invalid="ignore", divide="ignore"):
    return sums / counts

segment_ids = np.repeat(np.arange(out_size), segment_lengths)
isnan = np.isnan(mat.data)

sums = np.bincount(
    segment_ids[~isnan],
    weights=mat.data[~isnan],
    minlength=out_size,
).astype(np.float64, copy=False)
nan_counts = np.bincount(segment_ids[isnan], minlength=out_size)

if axis not in (0, 1):
    msg = "axis must be 0 or 1"
    raise ValueError(msg)

Maintainers: this PR is failing the metadata gate because I cannot apply labels from a fork. Could you please add the |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4159 +/- ##
==========================================
- Coverage 79.61% 79.60% -0.02%
==========================================
Files 120 120
Lines 12786 12790 +4
==========================================
+ Hits 10180 10181 +1
- Misses 2606 2609 +3
Flags with carried forward coverage won't be shown.