polygonize: cupy backend honours float tolerance from numba path (#2151) #2153
…#2151)

The cupy backend binned float pixels into regions by exact value, while the numba CPU path (used by the numpy, dask and dask+cupy backends) compares neighbouring pixels with `_is_close` (atol=1e-8, rtol=1e-5). On float rasters with near-equal values this produced different polygon sets across backends.

For float dtypes, group sorted unique values in the cupy backend using the same tolerance and label each group as a single region. Integer dtypes keep the existing exact-equality path.

Adds a regression test asserting that the cupy backend yields the same polygon count and per-value areas as the numpy backend on a float raster with values 1.0 and 1.000001. Also records this finding in the accuracy sweep state CSV.
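A minimal pure-Python sketch of the tolerance predicate and the value grouping described above. It assumes `_is_close` follows the `numpy.isclose` convention `abs(a - b) <= atol + rtol * abs(b)`; the actual numba formula is not quoted in this PR, and `is_close` / `group_sorted_values` are hypothetical names used only for illustration.

```python
# ASSUMPTION: numpy.isclose-style formula standing in for the numba `_is_close`;
# the real xrspatial implementation may differ in detail.
ATOL = 1e-8
RTOL = 1e-5


def is_close(a: float, b: float, atol: float = ATOL, rtol: float = RTOL) -> bool:
    """Return True if a and b agree within an absolute plus relative tolerance."""
    return abs(a - b) <= atol + rtol * abs(b)


def group_sorted_values(values, atol=ATOL, rtol=RTOL):
    """Hypothetical greedy grouping: walk the sorted unique values and start a
    new group whenever a value is not close to the previous one."""
    groups = []
    for v in sorted(set(values)):
        if groups and is_close(groups[-1][-1], v, atol, rtol):
            groups[-1].append(v)  # chain onto the current group
        else:
            groups.append([v])    # start a new group
    return groups


if __name__ == "__main__":
    print(1.0 == 1.000001)                               # False: exact binning splits them
    print(is_close(1.0, 1.000001))                       # True: tolerance merges them
    print(group_sorted_values([1.0, 1.000001, 3.0, 1.0]))  # [[1.0, 1.000001], [3.0]]
```

Because each value is compared only with its predecessor, the grouping chains values regardless of where their pixels sit in the raster, which is the weakness the follow-up commit addresses.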
…parity

The first pass at #2151 added value-only greedy chaining of unique float values in `_calculate_regions_cupy`. That fixed the reported pattern (adjacent near-equal pixels) but still diverged from the CPU path when three or more transitively close values were present in the raster without intermediate adjacent pixels. Example:

    [1.0,      1.000018]
    [1.000009, 3.0     ]

CPU spatial CCL yields three regions: 1.0 + 1.000009 (close, adjacent), 1.000018 alone (not close to either neighbour), and 3.0 alone. The value-grouping heuristic chained 1.0 -> 1.000009 -> 1.000018 and labeled the connected mask as a single region, so cupy returned two regions.

Route float dtypes through `_polygonize_numpy` in `_polygonize_cupy` so the numba `_is_close` predicate is applied to spatially adjacent pixels; that is the only place CPU and GPU semantics can agree without re-implementing CCL. The data is already transferred to the host for boundary tracing, so the fallback costs one extra `_calculate_regions` call.

Drop `_group_float_values_by_tolerance` and the float branch of `_calculate_regions_cupy` as dead code. Extend the regression test to parametrize over the original adjacent pattern and the transitivity edge case.
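The transitivity case above can be reproduced with a small pure-Python sketch that counts regions both ways. It assumes 4-connectivity and a `numpy.isclose`-style formula for `_is_close`; `count_components`, `regions_spatial` and `regions_value_chained` are hypothetical helpers, not xrspatial functions, and the breadth-first flood fill stands in for whatever CCL the numba path actually uses.

```python
import operator
from collections import deque

# ASSUMPTION: numpy.isclose-style tolerance standing in for the numba `_is_close`.
ATOL = 1e-8
RTOL = 1e-5


def is_close(a, b):
    return abs(a - b) <= ATOL + RTOL * abs(b)


def count_components(grid, same):
    """Count 4-connected components, joining neighbours for which same(a, b) holds."""
    nrows, ncols = len(grid), len(grid[0])
    seen = [[False] * ncols for _ in range(nrows)]
    count = 0
    for r in range(nrows):
        for c in range(ncols):
            if seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                pr, pc = queue.popleft()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < nrows and 0 <= nc < ncols
                            and not seen[nr][nc]
                            and same(grid[pr][pc], grid[nr][nc])):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count


def regions_spatial(raster):
    """CPU-style semantics: closeness is only tested between adjacent pixels."""
    return count_components(raster, is_close)


def regions_value_chained(raster):
    """First-pass cupy heuristic: chain sorted unique values by tolerance,
    then label connected pixels whose values landed in the same chain."""
    group_of, group_id, prev = {}, -1, None
    for v in sorted({v for row in raster for v in row}):
        if prev is None or not is_close(prev, v):
            group_id += 1
        group_of[v] = group_id
        prev = v
    labels = [[group_of[v] for v in row] for row in raster]
    return count_components(labels, operator.eq)


EXAMPLE = [[1.0, 1.000018],
           [1.000009, 3.0]]

if __name__ == "__main__":
    print(regions_spatial(EXAMPLE))        # 3
    print(regions_value_chained(EXAMPLE))  # 2
```

The chained version puts 1.000018 in the same group as 1.0 via 1.000009, and because the 1.000018 pixel touches the 1.0 pixel the two merge. The spatial version never compares 1.000018 with 1.000009, since those pixels are only diagonal neighbours under 4-connectivity.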
PR Review: polygonize: cupy backend honours float tolerance from numba path (#2151)
I ran this review on commit
Blocker (fixed in follow-up)
Suggestions
Nits
What looks good
Checklist
Summary
- The cupy backend binned float pixels into regions by exact value (`data == v`), while the numba CPU path used by the numpy / dask / dask+cupy backends compared neighbouring pixels with `_is_close` (`atol=1e-8`, `rtol=1e-5`).
- As a result, `1.0` and `1.000001` were split into different regions on cupy but merged on numpy.
- In `_calculate_regions_cupy`, for float dtypes, greedily chain sorted unique values using the same tolerance and label each chained group as one region. Integer dtypes keep the existing exact-equality path.

Fixes #2151.
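For contrast, the original exact-equality binning can be sketched in pure Python: one boolean mask per unique value, each labelled on its own. This mirrors the `data == v` step only in spirit; `regions_exact_binning` and `count_mask_components` are hypothetical names, 4-connectivity is assumed, and the real cupy code labels its masks on the GPU.

```python
def count_mask_components(mask):
    """Count 4-connected True components in a boolean grid (iterative DFS)."""
    nrows, ncols = len(mask), len(mask[0])
    seen = set()
    count = 0
    for r in range(nrows):
        for c in range(ncols):
            if not mask[r][c] or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                pr, pc = stack.pop()
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if (0 <= nr < nrows and 0 <= nc < ncols
                            and mask[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return count


def regions_exact_binning(raster):
    """Hypothetical model of the old cupy path: one mask per unique value
    (`data == v`), labelled independently, so only identical neighbours merge."""
    values = sorted({v for row in raster for v in row})
    return sum(
        count_mask_components([[x == v for x in row] for row in raster])
        for v in values
    )


if __name__ == "__main__":
    print(regions_exact_binning([[1.0, 1.000001]]))  # 2: near-equal floats split
    print(regions_exact_binning([[1, 1], [2, 2]]))   # 2: integer path is unaffected
```

For integer rasters exact equality is the intended semantics, which is why that path is kept; the divergence only appears for floats such as 1.0 and 1.000001, which the tolerance predicate treats as one region.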
Test plan
- `pytest xrspatial/tests/test_polygonize.py` (all 115 tests pass, 13 skipped)
- `test_polygonize_cupy_float_tolerance_matches_numpy_2151` covering the divergence reported in #2151 (polygonize: cupy backend uses exact equality where numpy uses tolerance)
- `[1.0, 2.0, 3.0]`

Discovered by the `/sweep-accuracy` audit (Cat 5 - Backend Inconsistency).
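A hedged sketch of the comparison the regression test is described as making: the same polygon count and the same per-value areas across backends. It works on plain `(value, area)` pairs rather than on xrspatial's `polygonize` output, whose return structure is not shown in this PR. `ring_area`, `areas_by_value` and `assert_backends_match` are hypothetical names; areas use the shoelace formula on exterior rings only (holes ignored), and the helper assumes both backends label a merged region with the same representative value.

```python
import math
from collections import defaultdict


def ring_area(points):
    """Area of a closed ring [(x0, y0), ..., (x0, y0)] via the shoelace formula."""
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def areas_by_value(polygons):
    """Sum areas per region value from (value, area) pairs."""
    totals = defaultdict(float)
    for value, area in polygons:
        totals[value] += area
    return dict(totals)


def assert_backends_match(reference, candidate, rel_tol=1e-9):
    """HYPOTHETICAL helper: assert equal polygon counts and per-value total areas.

    Assumes both backends report the same representative value for a merged region.
    """
    assert len(reference) == len(candidate), (len(reference), len(candidate))
    ref, cand = areas_by_value(reference), areas_by_value(candidate)
    assert ref.keys() == cand.keys(), (sorted(ref), sorted(cand))
    for value, area in ref.items():
        assert math.isclose(area, cand[value], rel_tol=rel_tol), (value, area, cand[value])


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print(ring_area(square))  # 1.0
    numpy_result = [(1.0, 2.0), (3.0, 1.0)]
    cupy_result = [(3.0, 1.0), (1.0, 2.0)]
    assert_backends_match(numpy_result, cupy_result)  # passes: order does not matter
```

Comparing per-value totals rather than individual polygons keeps the check independent of the order in which each backend emits its polygons.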