Skip to content

[ENH] Refactor sparse.py type hierarchy #737

@genedan

Description

@genedan

Description

I don't think we'll be able to cleanly solve #736 without reorganizing the type hierarchy of chainladder/utils/sparse.py.

What I had in mind was a dual typing protocol of ArrayModule (kind of like an ArrayLibraryLike type) and BackendArray (like numpy's ArrayLike type, but with modifications for interop between the 4 current backend array types to match our use case).

However, since the frequently-executed xp = get_array_module(...) returns either a ModuleType (numpy, dask, cupy) or an array class (sparse.COO) it's not clear which one it'll be when it's called dynamically. That is, it's unknown whether the type hint should be:

# Case when backend is numpy, sparse, or dask.
xp: ArrayModule = obj.get_array_module(...)

or

# Case when backend is sparse.
xp: BackendArray = obj.get_array_module(...)

Furthermore, when the backend is sparse, xp is also callable, but it's not in the case of the other backends. In order to use xp to create an array, you need to do it at different levels of hierarchy depending on the backend:

# Case sparse
sparse_array = xp(...)
# Case numpy
numpy_array = xp.ndarray(...)

A more consistent implementation of case sparse would be:

sparse_array = xp.COO(...)

If we manage to get a general ArrayModule class defined, we could do something like:

array = xp.BackendArray(...)

where the type of array to be produced is automatically determined by Triangle.array_backend. This would also avoid the various checks that are in place to make sure the backend is a certain type (usually sparse) before creating the array, and avoid calling the backend-specific ndarray and COO.

Is your feature request aligned with the scope of the package?

  • Yes, absolutely!
  • No, but it's still worth discussing.
  • N/A (this request is not a codebase enhancement).

Describe the solution you'd like, or your current workaround.

I believe this can be improved by reorganizing the objects in chainladder/utils/sparse.py. The library sparse can be augmented by assigning it the missing top-level functions that are either in numpy, or the ones that are currently defined in the file, whereas the augmented COO-level method calls can be mapped to sparse.COO.

I was able to play around with this approach and get the tests to pass, I would like to double-check later and open a PR for feedback.

Do you have any additional supporting notes?

This might not be doable in a 100% clean way, I noticed some COO methods to be overridden by top-level numpy functions, but I do think getting most of the way there will solve #736 and greatly improve the goals of #486.

Metadata

Metadata

Assignees

Labels

Effort > Moderate 🐕Mid-sized tasks estimated to take a few days to a few weeks.Impact > Moderate 🔶User-visible but non-breaking change. Treated like a minor version bump (e.g., 0.6.5 → 0.7.0).Refactor 🔧Code level improvements that restructure existing logic without changing behavior.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions