polygonize: close test-coverage gaps for backends, edge cases, params #2156

Merged
brendancol merged 2 commits into main from deep-sweep-test-coverage-polygonize-2026-05-19 on May 20, 2026
@brendancol
Contributor

Summary

Deep-sweep test-coverage pass against xrspatial/polygonize.py on 2026-05-19. Adds xrspatial/tests/test_polygonize_coverage_2026_05_19.py with 58 tests, all passing on a CUDA host.

Gaps closed

Cat 1 (backend coverage)

  • simplify_tolerance= and mask= parity on the dask+cupy backend. numpy / cupy / dask were already covered.

Cat 2 (NaN / Inf / nodata)

  • NaN parity for cupy and dask+cupy backends.
  • all-NaN raster on every backend.
  • +/-Inf pins on every backend. The numpy / dask / dask+cupy backends silently absorb Inf cells into adjacent finite polygons (issue #2155, "polygonize numpy/dask backends silently absorb +Inf/-Inf cells"); cupy emits them correctly. The pins lock the current asymmetric behaviour so the fix is visible as a test diff.
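
To make the expectation behind the +/-Inf pins concrete, here is a minimal pure-Python sketch. It is not xrspatial's algorithm: `count_regions` is a hypothetical helper that counts 4-connected regions per cell value, and it assumes NaN cells are treated as nodata and skipped. Because `inf == inf` is true while `nan == nan` is false, +Inf and -Inf cells form ordinary regions of their own instead of merging into finite neighbours.

```python
import math
from collections import deque

INF = float("inf")
NAN = float("nan")


def count_regions(grid):
    """Count 4-connected regions per value (hypothetical helper, not xrspatial).

    Assumption: NaN cells are nodata and never start or join a region.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = {}
    for r in range(rows):
        for c in range(cols):
            value = grid[r][c]
            if seen[r][c] or math.isnan(value):
                continue
            counts[value] = counts.get(value, 0) + 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill over equal-valued neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return counts


grid = [
    [1.0, 1.0, INF],
    [NAN, 1.0, -INF],
]
regions = count_regions(grid)
```

Under these assumptions, a backend that absorbs Inf cells would report fewer region classes than this reference count, which is the kind of difference the pins are meant to make visible.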

Cat 3 (geometric edge cases)

  • 1x1 single-pixel raster on all four backends + geopandas.
  • Nx1 single-column raster on all four backends. Exercises the nx==1 padding path at polygonize.py:565 and the cupy nx==1 numpy-fallback at polygonize.py:671. Neither code path had direct test coverage before.
  • 1xN single-row raster on all four backends.
  • All-equal-value raster on all four backends.

Cat 4 (parameter coverage)

  • column_name= non-default value across geopandas / spatialpandas / geojson return types.
  • Error paths: bad connectivity, bad transform length, mask shape mismatch, mask underlying-type mismatch.
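
The error-path tests above can be sketched in plain Python. Everything here is a stand-in: `validate_args_sketch` is a hypothetical function, not xrspatial's validation code, and the specific rules (connectivity of 4 or 8, a 6-element affine transform, `ValueError` as the exception type) are assumptions made for illustration. The mask underlying-type check is omitted because it depends on array backends.

```python
def validate_args_sketch(shape, connectivity=4, transform=None, mask_shape=None):
    """Hypothetical argument validation; the rules are assumed, not taken from xrspatial."""
    if connectivity not in (4, 8):
        raise ValueError(f"connectivity must be 4 or 8, got {connectivity!r}")
    if transform is not None and len(transform) != 6:
        raise ValueError(f"transform must have 6 elements, got {len(transform)}")
    if mask_shape is not None and tuple(mask_shape) != tuple(shape):
        raise ValueError(f"mask shape {mask_shape} does not match raster shape {shape}")


def raises(exc_type, func, *args, **kwargs):
    """Return True if func(*args, **kwargs) raises exc_type, else False."""
    try:
        func(*args, **kwargs)
    except exc_type:
        return True
    return False


bad_connectivity = raises(ValueError, validate_args_sketch, (3, 3), connectivity=6)
bad_transform = raises(ValueError, validate_args_sketch, (3, 3), transform=(1, 0, 0))
bad_mask_shape = raises(ValueError, validate_args_sketch, (3, 3), mask_shape=(3, 4))
valid_call = raises(ValueError, validate_args_sketch, (3, 3), connectivity=8,
                    transform=(1, 0, 0, 0, 1, 0), mask_shape=(3, 3))
```

In a pytest suite the `raises` helper would normally be replaced by `pytest.raises`, with one test per invalid argument so that each failure is reported separately.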

Cat 5 not applicable: polygonize returns lists / dataframes, not a DataArray with attrs to propagate.

Source-bug surfaced

Filed #2155 for a real bug surfaced by the +/-Inf tests: numpy / dask / dask+cupy backends silently absorb Inf cells. Per sweep rules the source fix is NOT bundled with this test-only PR.

Test plan

  • python -m pytest xrspatial/tests/test_polygonize_coverage_2026_05_19.py -> 58 passed.
  • python -m pytest xrspatial/tests/test_polygonize.py xrspatial/tests/test_polygonize_coverage_2026_05_19.py -> 172 passed, 13 skipped (pre-existing skips for optional deps).
  • CUDA host: cupy and dask+cupy tests executed locally.

Deep-sweep test-coverage pass 1 (2026-05-19): adds
test_polygonize_coverage_2026_05_19.py with 58 tests, all passing on
a CUDA host.

Closes the following audit-flagged gaps:

Cat 1 (backend coverage)
  - simplify_tolerance + mask= parity on the dask+cupy backend
    (numpy / cupy / dask were already covered).

Cat 2 (NaN / Inf / nodata)
  - NaN parity for cupy and dask+cupy.
  - all-NaN raster on every backend.
  - +/-Inf pins on every backend.  numpy / dask / dask+cupy currently
    silently absorb Inf cells into adjacent finite polygons (issue
    #2155); cupy emits them correctly.  Pins lock the asymmetric
    behaviour so the fix is visible.

Cat 3 (geometric)
  - 1x1 single-pixel raster on all four backends + geopandas.
  - Nx1 single-column raster exercises the nx==1 padding path
    (polygonize.py:565) and the cupy nx==1 numpy-fallback
    (polygonize.py:671).
  - 1xN single-row and all-equal-value rasters on all four backends.

Cat 4 (parameter coverage)
  - column_name non-default value across geopandas / spatialpandas /
    geojson return types.
  - Validation error paths: bad connectivity, bad transform length,
    mask shape mismatch, mask underlying-type mismatch.

Cat 5 N/A: polygonize returns lists / dataframes, not a DataArray
with attrs to propagate.

@github-actions bot added the "performance" label (PR touches performance-sensitive code) on May 19, 2026.

Self-review of #2156 surfaced four loose assertions that would let
regressions slip past:

- test_dask_cupy_inf_emits_polygons used an if/else with identical
  inner assertions, so it never failed regardless of behaviour.  The
  dask+cupy backend actually undercounts today (consistent with the
  numpy/dask bug, since _polygonize_chunk routes through the numpy
  backend per chunk).  Rename to *_currently_undercounts and pin
  that explicitly so the #2155 source fix is visible as a diff.

- test_dask_inf_currently_undercounts only asserted the absence of
  Inf polygons.  Mirror the numpy sibling's area + value=1.0
  conservation checks so a regression that loses a finite polygon
  is also caught.

- TestSimplifyDaskCupy.test_dask_cupy_matches_numpy_areas iterated
  over a_np keys without asserting set equality, so an extra
  dask+cupy polygon class would have been silently ignored.

- test_column_name_geopandas_non_default asserted DataFrame row
  order, which is implementation-defined.  Sort before comparing.
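
Two of the tightened assertions above can be illustrated with a small, self-contained sketch. The function names and the sample values are hypothetical and do not come from the test file; the point is only the difference between a check that iterates over the reference keys and one that first asserts key-set equality, plus an order-independent row comparison.

```python
def areas_match_loose(reference, candidate, tol=1e-9):
    """Loose check: only looks at keys present in `reference`."""
    return all(
        key in candidate and abs(reference[key] - candidate[key]) <= tol
        for key in reference
    )


def areas_match_strict(reference, candidate, tol=1e-9):
    """Strict check: key sets must be identical before areas are compared."""
    if set(reference) != set(candidate):
        return False
    return all(abs(reference[key] - candidate[key]) <= tol for key in reference)


def rows_equal_unordered(rows_a, rows_b):
    """Compare (value, area) rows without depending on row order."""
    return sorted(rows_a) == sorted(rows_b)


# Made-up per-value areas: the candidate carries an extra polygon class (3.0).
reference_areas = {1.0: 4.0, 2.0: 5.0}
candidate_areas = {1.0: 4.0, 2.0: 5.0, 3.0: 1.0}

loose_result = areas_match_loose(reference_areas, candidate_areas)
strict_result = areas_match_strict(reference_areas, candidate_areas)

rows_a = [(1.0, 4.0), (2.0, 5.0)]
rows_b = [(2.0, 5.0), (1.0, 4.0)]
```

Here the loose check accepts the candidate despite the extra class while the strict check rejects it, and the two row lists compare equal only once both are sorted.
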
@brendancol merged commit afa8730 into main on May 20, 2026
4 of 5 checks passed