Skip to content

Release gate: eager-vs-dask raster equivalence (#2357)#2362

Merged
brendancol merged 4 commits into
mainfrom
issue-2357
May 24, 2026
Merged

Release gate: eager-vs-dask raster equivalence (#2357)#2362
brendancol merged 4 commits into
mainfrom
issue-2357

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2357.

PR 1 of 5 of epic #2341.

Summary

  • Adds xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py. Reads each of four representative corpus fixtures (integer-nodata, float-NaN-nodata, MinIsWhite, masked-nodata-lifecycle) once through open_geotiff and once through read_geotiff_dask, then asserts pixel values (NaN-aware), dims, coords (dtype + bytes per axis), and the seven release-attr keys (transform, crs, crs_wkt, nodata, masked_nodata, georef_status, raster_type) all match.
  • Helpers are inlined in the test module rather than pulled into a new shared helper file; the issue spec calls for keeping PRs 1-5 independent.
  • Adds one row under "Local GeoTIFF read and write" in docs/source/reference/release_gate_geotiff.rst citing the new test file.

Backend coverage

Eager numpy (open_geotiff) and dask numpy (read_geotiff_dask). GPU and dask+GPU are out of scope: the issue scopes this PR to the two stable backends.

Test plan

  • pytest xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py -v -- 5 passed locally.
  • CI green on the same file across the supported Python / NumPy matrix.

PR 1 of 5 of epic #2341. Locks in pixel + dims + coords + seven
release-attr-keys parity between open_geotiff and read_geotiff_dask
across integer-nodata, float-NaN-nodata, MinIsWhite, and
masked-nodata-lifecycle corpus fixtures. Adds a row to the
release-gate doc page citing the new test file.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 24, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Release gate: eager-vs-dask raster equivalence (#2357)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py:108-111 -- the masked-nodata-lifecycle row passes {"mask_nodata": True}, but True is already the default on both open_geotiff and read_geotiff_dask, so this row reads the fixture with the same effective kwargs as the int-dtype-nodata row at line 92. The two cells exercise an identical code path and pass / fail together. To make the fourth scenario distinct (and to actually cover the "lifecycle" both ways), pass {"mask_nodata": False} here so the row pins the raw-sentinel branch in parity against the masked branch from the first row. Verified locally: with mask_nodata=True the read returns float64 with masked_nodata=True; with mask_nodata=False it returns uint16 with masked_nodata=False. Today the matrix only covers the masked branch on the int fixture.

  • xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py:210-224 -- _attr_equal falls back to == for the transform 6-tuple of floats. The sibling parity gate test_backend_full_parity_2211.py:608-623 allows a 1e-9 ULP tolerance on the transform tuple for the same reason: float coords reconstructed through different code paths can pick up sub-ULP drift even when the on-disk values are identical. Bit-exact works today and is the strongest signal, so keeping == is defensible; consider mirroring the ULP tolerance only if this gate starts flaking on float drift. Flagging so the next reader of this file knows the divergence from the sibling gate is intentional rather than an oversight.

Nits (optional improvements)

  • xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py:119-127 -- _materialise is only called inside _assert_values_equal; it could be inlined to drop one level of indirection. Keeping it separate is fine too if the intent is to give a stable hook for the sibling PRs of epic #2341 to reuse via copy-paste; if so, a one-line comment saying as much would make that intent visible.

  • docs/source/reference/release_gate_geotiff.rst:81 -- "masked-nodata-lifecycle fixtures" is plural, but in practice the corpus reuses the nodata_int_sentinel_uint16 fixture with different kwargs rather than carrying a separate masked-lifecycle fixture file. Rewording to "across four scenarios (integer-nodata, float-NaN-nodata, MinIsWhite, masked-nodata-lifecycle)" tracks the test more literally.

What looks good

  • The @pytest.mark.release_gate mark is the right home for this gate and the marker is already registered in setup.cfg:115.
  • The seven release-attr keys match the list called out in the issue spec.
  • The _attr_equal helper handles NaN sentinels, ndarray attrs, and nested tuples / lists; the NaN branch is essential for the nodata_nan_float32 row.
  • test_release_gate_corpus_is_non_empty is a useful sentinel against a future refactor that empties out the parametrize list and lets the matrix pass vacuously.
  • Helpers are inlined per the issue's "no shared helper module in this PR" constraint, keeping the PR independent of the four sibling PRs.

Checklist

  • Algorithm matches the issue spec (pixels + dims + coords + seven attrs)
  • Both implemented backends (eager numpy, dask numpy) produce consistent results on the in-tree corpus
  • NaN handling correct (equal_nan=True for floats, NaN-aware attr comparison)
  • Edge cases covered for the four scenarios listed in the issue (modulo the mask_nodata=True redundancy noted above)
  • Dask chunk size is sane for the corpus fixtures (32 over 64x64 max)
  • No premature materialisation or unnecessary copies
  • No new benchmarks needed (test-only PR)
  • README feature matrix not applicable (no new public API)
  • Docstrings present on module, helpers, and the parametrized test

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review: Release gate: eager-vs-dask raster equivalence (#2357)

Second pass after the review fixes in commits 94d36bf5 and 5afc20b0.

Disposition of the first-pass findings

  • Fixed: masked-nodata-lifecycle row now passes mask_nodata=False, so it is the contrast cell to the default-True int-dtype-nodata row rather than a duplicate of it. The fixture surface now pins both sides of the nodata lifecycle. Verified locally: 5/5 still pass after the kwarg flip.
  • Dismissed with rationale in code: the transform 6-tuple comparison stays bit-exact via == (no ULP tolerance). The new docstring on _attr_equal explains the divergence from the sibling gate in test_backend_full_parity_2211.py and why it is intentional for the same-file eager-vs-dask axis. If this gate ever flakes on float drift, the comment points the next reader at the knob to turn.
  • Fixed: _materialise now carries a sentence noting it is kept as a named helper for symmetry across the four sibling PRs of epic #2341. The hook is documented rather than mysterious.
  • Fixed: the doc row at release_gate_geotiff.rst:74 now reads "across four scenarios: integer-nodata, float-NaN-nodata, MinIsWhite, and the mask_nodata=False raw-sentinel branch of the nodata lifecycle". Tracks the parametrize matrix literally.
  • Follow-on nit fix (commit 5afc20b0): the comment above the _CORPUS literal had a stale mask_nodata=True example; updated to mask_nodata=False to match the actual row.

Remaining findings

None.

Checklist

  • All review actions either applied or dismissed in code with a rationale.
  • All 5 tests still pass locally.
  • No new blockers, suggestions, or nits surfaced by the second pass.

@brendancol
Copy link
Copy Markdown
Contributor Author

CI status note: the macOS run (macos-latest, 3.14) job failed with 4 test failures, and the ubuntu / windows jobs were auto-cancelled by the matrix-strategy fail-fast after macOS went red. The 4 failing tests are pre-existing on main and unrelated to this PR:

  • xrspatial/geotiff/tests/test_unsupported_features_2349.py::test_vrt_with_skewed_geotransform_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_unsupported_resample_alg_raises
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_srcrect_size_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_dstrect_size_rejected

Verified by checking out origin/main (9c40df49) into a separate worktree and running the same four nodeids -- all four fail with identical error messages there. The failures are regex / wording mismatches between the test expectations and the current error messages in those VRT validators; they are out of scope for this PR (which only adds test_release_gate_eager_dask_parity_2341.py and one doc row).

The run (3.12) job and the second run (ubuntu-latest, 3.14) job both passed, and the new test file's 5 cells pass locally:

xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py::test_release_gate_eager_dask_full_parity[int-dtype-nodata] PASSED
xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py::test_release_gate_eager_dask_full_parity[float-dtype-nan-nodata] PASSED
xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py::test_release_gate_eager_dask_full_parity[miniswhite] PASSED
xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py::test_release_gate_eager_dask_full_parity[masked-nodata-lifecycle] PASSED
xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py::test_release_gate_corpus_is_non_empty PASSED

Flagging rather than papering over -- the 4 unrelated failures should be triaged in a separate PR.

@brendancol brendancol merged commit ee4ea34 into main May 24, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Release-gate: eager-vs-dask raster equivalence for representative GeoTIFFs (PR 1 of 5 of epic #2341)

1 participant