New keyword parameter to cf.Field.regrids and cf.Field.regridc: max_masked#950
New keyword parameter to cf.Field.regrids and cf.Field.regridc: max_masked#950davidhassell wants to merge 6 commits into
cf.Field.regrids and cf.Field.regridc: max_masked#950Conversation
sadielbartholomew
left a comment
There was a problem hiding this comment.
Aside from some typos it is perfect, so please review those and then merge at will.
That said, I wanted to comment regarding API consistency - though it relates more to the issue at hand and your proposed way to handle it (i.e. #949) rather than the implementation of it here. Namely I notice we have the keyword 'mtol' for Field.collapse and the various convenience methods for collapses (mean, integral etc.) and that aligns very closely with the new max_masked keyword introduced, given its description:
Set the fraction of input data elements which is allowed to contain missing data when contributing to an individual output data element. Where this fraction exceeds mtol, missing data is returned. The default is 1, meaning that a missing datum in the output array occurs when its contributing input array elements are all missing data. A value of 0 means that a missing datum in the output array occurs whenever any of its contributing input array elements are missing data. Any intermediate value is permitted.
They have different names but (by my understanding) effectively control the same thing (input masking to output masking) - bar the slight complication of the role of any weighting used for the regrid case? But one is specified by number, the other by fraction.
So in future we could consider choosing one over the other for both collapses and regridding (and any other applicable methods).
|
Hi Sadie - thanks for the review. I'd like to think about the pertinent API question you raise before merging. This would involve small tweak to the code (to ascertain the number of source grid cells in play, and to convert the fraction to a number of masked points in each case), but that's no more or less fiddly than what we've already got! I'll take a look .... |
|
Hi Sadie - PR updated for |
There was a problem hiding this comment.
Thanks David for updating the approach - sorry it was extra work but like you indicated, it provides a more flexible approach than the max_masked original kwarg idea and it means we have a consistent kwarg 'mtol' across multiple supported methods. So, much nicer overall.
I reviewed the new commit and then did a sanity check on the whole PR branch. Overall is all good. Some more typos to correct (just two really, copied and pasted across) and a changelog conflict to resolve, but other than that all ready to merge so please go ahead.
|
|
||
| A destination grid cell j will be masked if *mtol* | ||
| multiplied by the total number of source cells i for | ||
| which ``w_ji >= min_weight`` is greater then the |
There was a problem hiding this comment.
| which ``w_ji >= min_weight`` is greater then the | |
| which ``w_ji >= min_weight`` is greater than the |
| For instance, for a rectilinear source grid for which | ||
| up to 4 source grid cells contribute to each | ||
| destination grid cell, if *mtol* is in the range | ||
| ``[0.5, 0.75)`` then a destination grid cell will in | ||
| general only be be masked if three or more of its | ||
| source grid cells are masked. |
There was a problem hiding this comment.
+1, very useful to have an example as its a bit difficult to envision the concept in practice otherwise.
|
|
||
| A destination grid cell j will be masked if *mtol* | ||
| multiplied by the total number of source cells i for | ||
| which ``w_ji >= min_weight`` is greater then the |
There was a problem hiding this comment.
| which ``w_ji >= min_weight`` is greater then the | |
| which ``w_ji >= min_weight`` is greater than the |
| up to 4 source grid cells contribute to each | ||
| destination grid cell, if *mtol* is in the range | ||
| ``[0.5, 0.75)`` then a destination grid cell will in | ||
| general only be be masked if three or more of its |
There was a problem hiding this comment.
| general only be be masked if three or more of its | |
| general only be masked if three or more of its |
| up to 4 source grid cells contribute to each | ||
| destination grid cell, if *mtol* is in the range | ||
| ``[0.5, 0.75)`` then a destination grid cell will in | ||
| general only be be masked if three or more of its |
There was a problem hiding this comment.
| general only be be masked if three or more of its | |
| general only be masked if three or more of its |
|
|
||
| A destination grid cell j will be masked if *mtol* | ||
| multiplied by the total number of source cells i for which | ||
| ``w_ji >= min_weight`` is greater then the number of those |
There was a problem hiding this comment.
| ``w_ji >= min_weight`` is greater then the number of those | |
| ``w_ji >= min_weight`` is greater than the number of those |
| For instance, for a rectilinear source grid for which up | ||
| to 4 source grid cells contribute to each destination grid | ||
| cell, if *mtol* is in the range ``[0.5, 0.75)`` then a | ||
| destination grid cell will in general only be be masked if |
There was a problem hiding this comment.
| destination grid cell will in general only be be masked if | |
| destination grid cell will in general only be masked if |
Fixes #949
All the excitement happens in
cf/data/dask_regrid.py. The rest is just passing the newmax_maskedparameter about, docs and tests :)