polygonize: fix Dask 8-connectivity diagonal merge at chunk corners (#2172)#2197
Conversation
…2172) Under connectivity=8 the Dask path produced more polygons than NumPy whenever two same-value cells were only diagonally adjacent across a chunk boundary. Edge-cancellation across chunk boundaries cannot see a diagonal connection: the two cells share a vertex, not an edge. Root cause sits in `_pick_next_edge`. At a degree-4 vertex it always took the smallest CCW turn, which separates two same-value polygons sharing a corner into two rings. That is wrong for 8-connectivity. The NumPy backend produces a figure-8 ring at such corners (a self-touching polygon) and the Dask merge was undoing that across chunks. Fix: plumb `connectivity_8` through `_merge_polygon_rings` and `_trace_rings` into `_pick_next_edge`. Under 8-connectivity, pick the largest forward-going CCW turn at degree-4 vertices. That pairs the crossing edges diagonally and reproduces the NumPy figure-8 ring. Also drops the docstring caveat on `polygonize` that warned about extra polygon splits at chunk corners under Dask 8-connectivity. Tests in `test_polygonize_issue_2172.py` cover the original repro `[[1, 0], [0, 1]]`, checkerboards at several chunk shapes, diagonal stripes, NaN inputs, float-tolerance paths, and confirm 4-connectivity counts are unchanged.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: polygonize: fix Dask 8-connectivity diagonal merge at chunk corners (#2172)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/polygonize.py:884: the 8-conn branch in_pick_next_edgekeepsrel == 0(continue straight) in the max search with weight 0, which loses torel == 3(right turn). At a degree-4 vertex where a straight-through ring meets a corner ring, the trace can cross from the straight-through ring into the corner ring instead of continuing straight. The current tests don't hit this case because all degree-4 vertices in the inputs are crossings (both incoming directions are turns). Suggest givingrel == 0priority overrel == 3, or excludingrel == 0from the max search. Add a regression test where a same-value polygon spans a chunk-boundary mid-edge (not a corner) and another same-value polygon has a corner at the same point. -
xrspatial/tests/test_polygonize_issue_2172.py:152: the random-20x20 test loopsfor chunks in [...]. On failure pytest reports only the last value. Parametrize the chunks list so each shape is its own test case. -
xrspatial/polygonize.py:1561:_merge_chunk_polygons(unused now) still calls_merge_polygon_rings(polys_list)withoutconnectivity_8. Delete it if dead, otherwise plumbconnectivity_8through so the bug isn't reintroduced if it gets revived.
Nits (optional improvements)
-
xrspatial/polygonize.py:858: docstring uses an em dash ("vertex — exactly what NumPy"). Rest of the file uses ASCII punctuation; replace with a comma or hyphen. -
xrspatial/polygonize.py:879: comment saysrel == 0"is impossible on a grid because consecutive ring edges never go in the same direction without the trace having already turned". That's true within a single ring, not across multiple rings merged into one edge set, which is exactly the case_merge_polygon_ringshandles. Fix the comment to match actual behaviour. -
xrspatial/tests/test_polygonize_issue_2172.py:53:pytest.approx(area_dk[val], abs=1e-9)is tighter than needed. Polygon areas come back as exact integers for integer input and exact floats for float input;abs=0works. Not load-bearing.
What looks good
- Fix is minimal and surgical.
_pick_next_edge,_trace_rings,_merge_polygon_rings, and_merge_from_separatedgain aconnectivity_8flag with defaultFalse, so 4-conn behaviour is preserved bit-for-bit. - Docstring caveat at the old line 1646 is removed.
- Tests cover the original repro, multiple chunk shapes, checkerboards, diagonal stripes, NaN inputs, and float-tolerance paths.
- No per-cell allocation. Zero extra memory cost relative to the existing path. No OOM concern.
Checklist
- Algorithm matches reference (NumPy backend).
- All implemented backends consistent (NumPy, Dask+NumPy verified; Dask+CuPy unchanged code path).
- NaN handling unchanged (covered by
test_float_with_nan). - Edge cases covered (checkerboard, stripe, NaN, multiple chunk shapes).
- Dask chunk boundaries handled correctly.
- No premature materialization or unnecessary copies.
- Benchmark: none added; fix sits in a non-perf-critical merge step.
- README feature matrix: N/A (no new function).
- Docstrings updated.
- `_pick_next_edge` 8-conn branch: explicit priority order rel==0 (straight) > rel==3 (right turn) > rel==1 (left turn) > rel==2 (u-turn). The previous max-rel implementation gave rel==0 weight 0, so a degree-4 vertex where one ring passed straight through and another cornered could wrongly cross between them. - Updated the docstring and comment to match actual behaviour: rel==0 is possible across multiple rings sharing a vertex. - `_merge_chunk_polygons` (currently unused) now forwards ``connectivity_8`` to ``_merge_polygon_rings`` so the same bug cannot reappear if it is revived. - Parametrized the random-20x20 test so each chunk shape is its own test case. - Added direct ``_merge_polygon_rings`` unit tests pinning the 8-conn figure-8 behaviour and the 4-conn separation behaviour. - Tightened area tolerance from abs=1e-9 to abs=1e-12 (areas come back exact for integer inputs). - Replaced an em dash in the docstring with ASCII punctuation.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review after c822056
Disposition of the previous review
- Suggestion: rel==0 priority -- fixed in
_pick_next_edgevia explicit priority table {0: 4, 3: 3, 1: 2, 2: 1}. rel==0 (straight through) now wins, rel==3 (right turn / figure-8 pair) is second, rel==1 (4-conn left turn) is third, rel==2 (u-turn) is last. - Suggestion: parametrize random-20x20 test -- done; each chunk shape is its own test case.
- Suggestion: plumb connectivity_8 through
_merge_chunk_polygons-- done. The function is unused in the current call graph but the parameter is in place if it gets revived. - Nit: em dash in docstring -- removed.
- Nit: misleading "rel==0 is impossible" comment -- rewritten to reflect that rel==0 is possible across multiple rings sharing a vertex (which is exactly the case
_merge_polygon_ringshandles). - Nit: pytest.approx tolerance -- tightened from 1e-9 to 1e-12 (areas are exact for integer inputs).
New tests
TestMergePolygonRingsDirect pins the merge behaviour directly without going through the full polygonize path:
- Two diagonally-adjacent unit squares merge to a single figure-8 ring under 8-conn.
- Same input under 4-conn keeps them separate.
- Two edge-sharing squares still cancel and merge under 8-conn.
Residual findings
None.
Checklist
- Algorithm matches reference.
- All implemented backends consistent.
- NaN handling unchanged.
- Edge cases covered.
- Dask chunk boundaries handled correctly.
- No premature materialization.
- Benchmark: N/A.
- README feature matrix: N/A.
- Docstrings updated.
Summary
Closes #2172.
polygonizewithconnectivity=8so polygon counts and per-region areas match NumPy when same-value cells are only diagonally adjacent across a chunk boundary._pick_next_edgealways took the smallest CCW turn at degree-4 vertices, which splits the figure-8 ring NumPy produces for diagonal-only adjacency into two separate squares.connectivity_8is now plumbed through_merge_polygon_ringsand_trace_ringsinto_pick_next_edge. Under 8-connectivity it picks the largest forward-going CCW turn at degree-4 vertices instead, which preserves figure-8 rings across chunk boundaries.Backend coverage
Test plan
xrspatial/tests/test_polygonize_issue_2172.pycovers:[[1, 0], [0, 1]]withchunks=(1, 1).test_polygonize.py(124 tests) still passes.test_polygonize_coverage_2026_05_19.pystill passes except for one pre-existing CuPy +inf test owned by a parallel rockout.Out of scope
This PR does not touch the chunk-size-dependent float merge (#2171), Inf handling (#2174), or the float-tolerance docstring. Parallel rockouts are handling those.