Skip to content

polygonize: fix Dask 8-connectivity diagonal merge at chunk corners (#2172)#2197

Merged
brendancol merged 4 commits into
mainfrom
issue-2172
May 21, 2026
Merged

polygonize: fix Dask 8-connectivity diagonal merge at chunk corners (#2172)#2197
brendancol merged 4 commits into
mainfrom
issue-2172

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #2172.

  • Fixes Dask polygonize with connectivity=8 so polygon counts and per-region areas match NumPy when same-value cells are only diagonally adjacent across a chunk boundary.
  • Cause: _pick_next_edge always took the smallest CCW turn at degree-4 vertices, which splits the figure-8 ring NumPy produces for diagonal-only adjacency into two separate squares.
  • connectivity_8 is now plumbed through _merge_polygon_rings and _trace_rings into _pick_next_edge. Under 8-connectivity it picks the largest forward-going CCW turn at degree-4 vertices instead, which preserves figure-8 rings across chunk boundaries.
  • Drops the docstring caveat that warned about extra polygon splits at chunk corners under Dask 8-connectivity.

Backend coverage

  • numpy: unchanged.
  • cupy: unchanged.
  • dask+numpy: bug fixed.
  • dask+cupy: same Dask code path, fix applies.

Test plan

  • New xrspatial/tests/test_polygonize_issue_2172.py covers:
    • Original repro [[1, 0], [0, 1]] with chunks=(1, 1).
    • 4x4 and 6x6 checkerboards across several chunk shapes.
    • Diagonal stripes that cross multiple chunk corners.
    • Float and NaN inputs.
    • Random 20x20 across several chunk shapes.
    • 4-connectivity counts unchanged (separate test).
  • Existing test_polygonize.py (124 tests) still passes.
  • test_polygonize_coverage_2026_05_19.py still passes except for one pre-existing CuPy +inf test owned by a parallel rockout.

Out of scope

This PR does not touch the chunk-size-dependent float merge (#2171), Inf handling (#2174), or the float-tolerance docstring. Parallel rockouts are handling those.

…2172)

Under connectivity=8 the Dask path produced more polygons than NumPy
whenever two same-value cells were only diagonally adjacent across a
chunk boundary. Edge-cancellation across chunk boundaries cannot see a
diagonal connection: the two cells share a vertex, not an edge.

Root cause sits in `_pick_next_edge`. At a degree-4 vertex it always
took the smallest CCW turn, which separates two same-value polygons
sharing a corner into two rings. That is wrong for 8-connectivity. The
NumPy backend produces a figure-8 ring at such corners (a self-touching
polygon) and the Dask merge was undoing that across chunks.

Fix: plumb `connectivity_8` through `_merge_polygon_rings` and
`_trace_rings` into `_pick_next_edge`. Under 8-connectivity, pick the
largest forward-going CCW turn at degree-4 vertices. That pairs the
crossing edges diagonally and reproduces the NumPy figure-8 ring.

Also drops the docstring caveat on `polygonize` that warned about extra
polygon splits at chunk corners under Dask 8-connectivity.

Tests in `test_polygonize_issue_2172.py` cover the original repro
`[[1, 0], [0, 1]]`, checkerboards at several chunk shapes, diagonal
stripes, NaN inputs, float-tolerance paths, and confirm 4-connectivity
counts are unchanged.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 20, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: polygonize: fix Dask 8-connectivity diagonal merge at chunk corners (#2172)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/polygonize.py:884: the 8-conn branch in _pick_next_edge keeps rel == 0 (continue straight) in the max search with weight 0, which loses to rel == 3 (right turn). At a degree-4 vertex where a straight-through ring meets a corner ring, the trace can cross from the straight-through ring into the corner ring instead of continuing straight. The current tests don't hit this case because all degree-4 vertices in the inputs are crossings (both incoming directions are turns). Suggest giving rel == 0 priority over rel == 3, or excluding rel == 0 from the max search. Add a regression test where a same-value polygon spans a chunk-boundary mid-edge (not a corner) and another same-value polygon has a corner at the same point.

  • xrspatial/tests/test_polygonize_issue_2172.py:152: the random-20x20 test loops for chunks in [...]. On failure pytest reports only the last value. Parametrize the chunks list so each shape is its own test case.

  • xrspatial/polygonize.py:1561: _merge_chunk_polygons (unused now) still calls _merge_polygon_rings(polys_list) without connectivity_8. Delete it if dead, otherwise plumb connectivity_8 through so the bug isn't reintroduced if it gets revived.

Nits (optional improvements)

  • xrspatial/polygonize.py:858: docstring uses an em dash ("vertex — exactly what NumPy"). Rest of the file uses ASCII punctuation; replace with a comma or hyphen.

  • xrspatial/polygonize.py:879: comment says rel == 0 "is impossible on a grid because consecutive ring edges never go in the same direction without the trace having already turned". That's true within a single ring, not across multiple rings merged into one edge set, which is exactly the case _merge_polygon_rings handles. Fix the comment to match actual behaviour.

  • xrspatial/tests/test_polygonize_issue_2172.py:53: pytest.approx(area_dk[val], abs=1e-9) is tighter than needed. Polygon areas come back as exact integers for integer input and exact floats for float input; abs=0 works. Not load-bearing.

What looks good

  • Fix is minimal and surgical. _pick_next_edge, _trace_rings, _merge_polygon_rings, and _merge_from_separated gain a connectivity_8 flag with default False, so 4-conn behaviour is preserved bit-for-bit.
  • Docstring caveat at the old line 1646 is removed.
  • Tests cover the original repro, multiple chunk shapes, checkerboards, diagonal stripes, NaN inputs, and float-tolerance paths.
  • No per-cell allocation. Zero extra memory cost relative to the existing path. No OOM concern.

Checklist

  • Algorithm matches reference (NumPy backend).
  • All implemented backends consistent (NumPy, Dask+NumPy verified; Dask+CuPy unchanged code path).
  • NaN handling unchanged (covered by test_float_with_nan).
  • Edge cases covered (checkerboard, stripe, NaN, multiple chunk shapes).
  • Dask chunk boundaries handled correctly.
  • No premature materialization or unnecessary copies.
  • Benchmark: none added; fix sits in a non-perf-critical merge step.
  • README feature matrix: N/A (no new function).
  • Docstrings updated.

- `_pick_next_edge` 8-conn branch: explicit priority order
  rel==0 (straight) > rel==3 (right turn) > rel==1 (left turn) >
  rel==2 (u-turn).  The previous max-rel implementation gave rel==0
  weight 0, so a degree-4 vertex where one ring passed straight
  through and another cornered could wrongly cross between them.
- Updated the docstring and comment to match actual behaviour:
  rel==0 is possible across multiple rings sharing a vertex.
- `_merge_chunk_polygons` (currently unused) now forwards
  ``connectivity_8`` to ``_merge_polygon_rings`` so the same bug
  cannot reappear if it is revived.
- Parametrized the random-20x20 test so each chunk shape is its
  own test case.
- Added direct ``_merge_polygon_rings`` unit tests pinning the
  8-conn figure-8 behaviour and the 4-conn separation behaviour.
- Tightened area tolerance from abs=1e-9 to abs=1e-12 (areas come
  back exact for integer inputs).
- Replaced an em dash in the docstring with ASCII punctuation.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after c822056

Disposition of the previous review

  • Suggestion: rel==0 priority -- fixed in _pick_next_edge via explicit priority table {0: 4, 3: 3, 1: 2, 2: 1}. rel==0 (straight through) now wins, rel==3 (right turn / figure-8 pair) is second, rel==1 (4-conn left turn) is third, rel==2 (u-turn) is last.
  • Suggestion: parametrize random-20x20 test -- done; each chunk shape is its own test case.
  • Suggestion: plumb connectivity_8 through _merge_chunk_polygons -- done. The function is unused in the current call graph but the parameter is in place if it gets revived.
  • Nit: em dash in docstring -- removed.
  • Nit: misleading "rel==0 is impossible" comment -- rewritten to reflect that rel==0 is possible across multiple rings sharing a vertex (which is exactly the case _merge_polygon_rings handles).
  • Nit: pytest.approx tolerance -- tightened from 1e-9 to 1e-12 (areas are exact for integer inputs).

New tests

TestMergePolygonRingsDirect pins the merge behaviour directly without going through the full polygonize path:

  • Two diagonally-adjacent unit squares merge to a single figure-8 ring under 8-conn.
  • Same input under 4-conn keeps them separate.
  • Two edge-sharing squares still cancel and merge under 8-conn.

Residual findings

None.

Checklist

  • Algorithm matches reference.
  • All implemented backends consistent.
  • NaN handling unchanged.
  • Edge cases covered.
  • Dask chunk boundaries handled correctly.
  • No premature materialization.
  • Benchmark: N/A.
  • README feature matrix: N/A.
  • Docstrings updated.

@brendancol brendancol merged commit 1bd5823 into main May 21, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

polygonize: Dask 8-connectivity splits polygons at chunk corners

1 participant