perf: chan's parallel mean-var algorithm for dask-backed arrays (sparse/dense)#4143
ilan-gold wants to merge 10 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4143 +/- ##
==========================================
+ Coverage 79.61% 79.65% +0.04%
==========================================
Files 120 120
Lines 12786 12826 +40
==========================================
+ Hits 10180 10217 +37
- Misses 2606 2609 +3
@ilan-gold I added njit support, see #4153. This enables rank_genes_groups to use njit. I integrated this into the rank_genes_groups PR and benchmarked both there and here; it gives a speedup in both cases at typical group x gene sizes.
Nice. I commented on something there, but once you've got the pre-commit fixed as well, I'll merge it into this.
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
return out
@numba.njit(inline="always")  # noqa: TID251
@ilan-gold I think I had forgotten to add these noqa comments, but I have the pre-commit hook set up now.
See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
Based on a discussion in #4118 (comment) with @zboldyga.
This has two benefits: it lets us calculate the mean and variance in one pass instead of effectively two (the sum of squares and the squared sum), and it gets rid of a numerical instability issue that @zboldyga found the solution to (see removed comment).
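For readers unfamiliar with the parallel algorithm linked above, here is a minimal NumPy sketch of the pairwise combine step. It is not the code in this PR: the helper names (`chunk_stats`, `combine`, `mean_var_chunked`) are hypothetical, the chunking is a plain Python loop rather than dask, and the per-chunk statistics are computed with ordinary NumPy calls for brevity.

```python
from functools import reduce

import numpy as np


def chunk_stats(x):
    """Per-chunk (count, mean, M2), where M2 = sum((x - mean)**2) over rows."""
    n = x.shape[0]
    mean = x.mean(axis=0)
    m2 = ((x - mean) ** 2).sum(axis=0)
    return n, mean, m2


def combine(a, b):
    """Merge two (count, mean, M2) triples using Chan's parallel update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    m2 = m2_a + m2_b + delta**2 * (n_a * n_b / n)
    return n, mean, m2


def mean_var_chunked(x, chunk_size, ddof=0):
    """Column-wise mean and variance, computed chunk by chunk and then merged."""
    parts = [
        chunk_stats(x[i : i + chunk_size])
        for i in range(0, x.shape[0], chunk_size)
    ]
    n, mean, m2 = reduce(combine, parts)
    return mean, m2 / (n - ddof)


# Data with a large offset, where E[x^2] - E[x]^2 tends to lose precision.
rng = np.random.default_rng(0)
X = rng.normal(loc=1e6, scale=1.0, size=(1000, 5))
mean, var = mean_var_chunked(X, chunk_size=128)
```

Because each chunk only contributes a `(count, mean, M2)` triple and the merge works for any grouping of chunks, the combine can run as a tree reduction over array blocks. It also avoids subtracting two large, nearly equal quantities, which is where the sum-of-squares formulation loses precision.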