Description
When incorporating active storage into Dask, the Dask graph needs to be modified non-lazily to account for the fact that some of the work is being done externally to Dask, i.e. on the server where the data is [*].
If cf-python thinks that active storage operations are not possible, then it simply doesn't modify the graph. There are many reasons why active storage may not be deemed OK, such as the data having already been operated on (f += 2) or the chunks not pointing to files on disk, but the relevant one here is that the file resides at a location that doesn't support active reductions.
So, it would be great to have a method of Active that can tell us if a given file can be reduced actively, something like (notional API):
>>> a = Active('/path/to/file.nc')
>>> a.isactive()
True  # or False
I imagine that this could entail some try ... except ... approach whereby we assume the file is active and send off some modified URI that returns True iff it is possible.
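To make the try ... except idea a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: `is_active`, the `probe` callable and the two fake probes are illustration-only names, not part of any existing Active API, and a real probe would have to send whatever modified URI the server actually understands.

```python
from typing import Callable


def is_active(uri: str, probe: Callable[[str], bool]) -> bool:
    """Return True only if `probe` confirms active reductions for `uri`.

    `probe` is a hypothetical callable standing in for "send off a
    modified URI and see what comes back". Any exception it raises is
    treated as "not active".
    """
    try:
        return bool(probe(uri))
    except Exception:
        # Unreachable server, unsupported request, etc.: fall back to
        # the non-active code path rather than failing.
        return False


# Fake probes so that the sketch runs without a server (assumptions).
def probe_supported(uri: str) -> bool:
    return True


def probe_unsupported(uri: str) -> bool:
    raise ConnectionError(f"no active storage endpoint for {uri}")


supported = is_active("/path/to/file.nc", probe_supported)
unsupported = is_active("/path/to/file.nc", probe_unsupported)
```

Catching every exception is deliberately broad here; a real implementation would probably want to distinguish "server says no" from "server could not be reached".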
[*] (The detail of this is that the chunk reduction function used by dask.array.reduction needs to be changed from the usual function that expects to do some work (e.g. np.max) to the identity function that does no work (e.g. lambda x: x). This has to be done prior to the compute().)
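As a rough illustration of the footnote, the sketch below compares a normal `dask.array.reduction` call with one whose chunk function is the identity. It is not cf-python's actual code: `identity_chunk` is a hypothetical name, and the second array simply stands in for per-chunk maxima that an active storage server is assumed to have already computed.

```python
import dask.array as da
import numpy as np


def identity_chunk(x, **kwargs):
    """Chunk function that does no work.

    It ignores the axis/keepdims keywords that Dask passes, because the
    chunk is assumed to have been reduced already, outside of Dask.
    """
    return x


# Usual case: Dask itself computes the maximum of each chunk.
raw = da.from_array(np.array([3.0, 7.0, 5.0, 1.0]), chunks=2)
usual = da.reduction(raw, chunk=np.max, aggregate=np.max, dtype=raw.dtype)

# Active-storage stand-in: each chunk already holds its own maximum
# (7.0 for the first chunk above, 5.0 for the second), so the chunk
# function only needs to pass the value through to the aggregation.
per_chunk_max = da.from_array(np.array([7.0, 5.0]), chunks=1)
active = da.reduction(
    per_chunk_max,
    chunk=identity_chunk,
    aggregate=np.max,
    dtype=per_chunk_max.dtype,
)

usual_result = float(usual.compute())
active_result = float(active.compute())
```

Both graphs give the same final maximum; the difference is only where the per-chunk work is done, which is why the choice of chunk function has to be made when the graph is built.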