Release gate: eager-vs-dask raster equivalence (#2357) #2362
PR 1 of 5 of epic #2341. Locks in pixel + dims + coords + seven release-attr-keys parity between open_geotiff and read_geotiff_dask across integer-nodata, float-NaN-nodata, MinIsWhite, and masked-nodata-lifecycle corpus fixtures. Adds a row to the release-gate doc page citing the new test file.
brendancol
left a comment
PR Review: Release gate: eager-vs-dask raster equivalence (#2357)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
- `xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py:108-111` -- the `masked-nodata-lifecycle` row passes `{"mask_nodata": True}`, but `True` is already the default on both `open_geotiff` and `read_geotiff_dask`, so this row reads the fixture with the same effective kwargs as the `int-dtype-nodata` row at line 92. The two cells exercise an identical code path and pass / fail together. To make the fourth scenario distinct (and to actually cover the "lifecycle" both ways), pass `{"mask_nodata": False}` here so the row pins the raw-sentinel branch in parity against the masked branch from the first row. Verified locally: with `mask_nodata=True` the read returns float64 with `masked_nodata=True`; with `mask_nodata=False` it returns uint16 with `masked_nodata=False`. Today the matrix only covers the masked branch on the int fixture.
- `xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py:210-224` -- `_attr_equal` falls back to `==` for the `transform` 6-tuple of floats. The sibling parity gate `test_backend_full_parity_2211.py:608-623` allows a 1e-9 ULP tolerance on the transform tuple for the same reason: float coords reconstructed through different code paths can pick up sub-ULP drift even when the on-disk values are identical. Bit-exact works today and is the strongest signal, so keeping `==` is defensible; consider mirroring the ULP tolerance only if this gate starts flaking on float drift. Flagging so the next reader of this file knows the divergence from the sibling gate is intentional rather than an oversight.
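To make the trade-off in the second suggestion concrete, here is a minimal sketch of a transform comparison with an optional tolerance knob. It is not the code in either test file: `transforms_match` is a hypothetical name, the transform values are illustrative, and reading the sibling gate's 1e-9 figure as an absolute per-element tolerance is an assumption.

```python
import math


def transforms_match(a, b, tol=None):
    """Compare two 6-tuple affine transforms (hypothetical helper).

    tol=None  -> bit-exact comparison via ==, the behaviour this gate keeps.
    tol=float -> absolute per-element tolerance (assumed reading of the
                 sibling gate's 1e-9 allowance).
    """
    if len(a) != len(b):
        return False
    if tol is None:
        return tuple(a) == tuple(b)
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
    )


# Illustrative values only: one element nudged by a single representable step.
eager = (30.0, 0.0, 500000.0, 0.0, -30.0, 4100000.0)
drifted = (30.0, 0.0, math.nextafter(500000.0, math.inf), 0.0, -30.0, 4100000.0)

print(transforms_match(eager, drifted))            # bit-exact check
print(transforms_match(eager, drifted, tol=1e-9))  # tolerant check
```

The bit-exact form fails on the smallest possible drift, which is what makes it the stronger signal while both backends read the same file; the tolerant form is the knob to turn if that ever becomes noise. `math.nextafter` requires Python 3.9 or later.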
Nits (optional improvements)
- `xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py:119-127` -- `_materialise` is only called inside `_assert_values_equal`; it could be inlined to drop one level of indirection. Keeping it separate is fine too if the intent is to give a stable hook for the sibling PRs of epic #2341 to reuse via copy-paste; if so, a one-line comment saying as much would make that intent visible.
- `docs/source/reference/release_gate_geotiff.rst:81` -- "masked-nodata-lifecycle fixtures" is plural, but in practice the corpus reuses the `nodata_int_sentinel_uint16` fixture with different kwargs rather than carrying a separate masked-lifecycle fixture file. Rewording to "across four scenarios (integer-nodata, float-NaN-nodata, MinIsWhite, masked-nodata-lifecycle)" tracks the test more literally.
What looks good
- The `@pytest.mark.release_gate` mark is the right home for this gate and the marker is already registered in `setup.cfg:115`.
- The seven release-attr keys match the list called out in the issue spec.
- The `_attr_equal` helper handles NaN sentinels, ndarray attrs, and nested tuples / lists; the NaN branch is essential for the `nodata_nan_float32` row.
- `test_release_gate_corpus_is_non_empty` is a useful sentinel against a future refactor that empties out the parametrize list and lets the matrix pass vacuously.
- Helpers are inlined per the issue's "no shared helper module in this PR" constraint, keeping the PR independent of the four sibling PRs.
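For readers without the test file open, the NaN-aware attribute comparison praised above can be sketched roughly as follows. This is an illustration, not the PR's `_attr_equal`: the names `attr_equal` and `mismatched_keys` are hypothetical, the ndarray branch is omitted to keep the sketch stdlib-only, and treating a tuple and a list with equal elements as equal is an assumption. The key list is the one given in the PR summary.

```python
import math

# The seven release-attr keys listed in the PR summary.
RELEASE_ATTR_KEYS = (
    "transform", "crs", "crs_wkt", "nodata",
    "masked_nodata", "georef_status", "raster_type",
)


def attr_equal(a, b):
    """NaN-aware, recursive equality for scalar and tuple/list attrs."""
    if isinstance(a, (tuple, list)) and isinstance(b, (tuple, list)):
        return len(a) == len(b) and all(attr_equal(x, y) for x, y in zip(a, b))
    # Plain == would report NaN != NaN, so handle the NaN sentinel first.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def mismatched_keys(eager_attrs, dask_attrs):
    """Return the release-attr keys whose values differ between backends.

    A key absent on both sides compares equal here (None == None); whether
    the real gate requires presence is not stated in the review.
    """
    return [
        key for key in RELEASE_ATTR_KEYS
        if not attr_equal(eager_attrs.get(key), dask_attrs.get(key))
    ]


# Illustrative attrs only; the values are not taken from the corpus.
eager_attrs = {"transform": (1.0, 0.0, 0.0, 0.0, -1.0, 64.0),
               "nodata": float("nan"), "masked_nodata": True}
dask_attrs = {"transform": (1.0, 0.0, 0.0, 0.0, -1.0, 64.0),
              "nodata": float("nan"), "masked_nodata": True}
print(mismatched_keys(eager_attrs, dask_attrs))
```

Without the NaN branch, a float-NaN `nodata` attr would fail parity even when both backends agree, which is why the review calls that branch essential for the `nodata_nan_float32` row.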
Checklist
- Algorithm matches the issue spec (pixels + dims + coords + seven attrs)
- Both implemented backends (eager numpy, dask numpy) produce consistent results on the in-tree corpus
- NaN handling correct (`equal_nan=True` for floats, NaN-aware attr comparison)
- Edge cases covered for the four scenarios listed in the issue (modulo the `mask_nodata=True` redundancy noted above)
- Dask chunk size is sane for the corpus fixtures (32 over 64x64 max)
- No premature materialisation or unnecessary copies
- No new benchmarks needed (test-only PR)
- README feature matrix not applicable (no new public API)
- Docstrings present on module, helpers, and the parametrized test
brendancol
left a comment
Follow-up review: Release gate: eager-vs-dask raster equivalence (#2357)
Second pass after the review fixes in commits 94d36bf5 and 5afc20b0.
Disposition of the first-pass findings
- Fixed: `masked-nodata-lifecycle` row now passes `mask_nodata=False`, so it is the contrast cell to the default-`True` `int-dtype-nodata` row rather than a duplicate of it. The fixture surface now pins both sides of the nodata lifecycle. Verified locally: 5/5 still pass after the kwarg flip.
- Dismissed with rationale in code: the `transform` 6-tuple comparison stays bit-exact via `==` (no ULP tolerance). The new docstring on `_attr_equal` explains the divergence from the sibling gate in `test_backend_full_parity_2211.py` and why it is intentional for the same-file eager-vs-dask axis. If this gate ever flakes on float drift, the comment points the next reader at the knob to turn.
- Fixed: `_materialise` now carries a sentence noting it is kept as a named helper for symmetry across the four sibling PRs of epic #2341. The hook is documented rather than mysterious.
- Fixed: the doc row at `release_gate_geotiff.rst:74` now reads "across four scenarios: integer-nodata, float-NaN-nodata, MinIsWhite, and the `mask_nodata=False` raw-sentinel branch of the nodata lifecycle". Tracks the parametrize matrix literally.
- Follow-on nit fix (commit `5afc20b0`): the comment above the `_CORPUS` literal had a stale `mask_nodata=True` example; updated to `mask_nodata=False` to match the actual row.
Remaining findings
None.
Checklist
- All review actions either applied or dismissed in code with a rationale.
- All 5 tests still pass locally.
- No new blockers, suggestions, or nits surfaced by the second pass.
CI status note: the macOS
Verified by checking out The Flagging rather than papering over -- the 4 unrelated failures should be triaged in a separate PR.
Closes #2357.
PR 1 of 5 of epic #2341.
Summary
- Adds `xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py`. Reads each of four representative corpus fixtures (integer-nodata, float-NaN-nodata, MinIsWhite, masked-nodata-lifecycle) once through `open_geotiff` and once through `read_geotiff_dask`, then asserts pixel values (NaN-aware), `dims`, `coords` (dtype + bytes per axis), and the seven release-attr keys (`transform`, `crs`, `crs_wkt`, `nodata`, `masked_nodata`, `georef_status`, `raster_type`) all match.
- Adds a row to `docs/source/reference/release_gate_geotiff.rst` citing the new test file.
Backend coverage
Eager numpy (`open_geotiff`) and dask numpy (`read_geotiff_dask`). GPU and dask+GPU are out of scope: the issue scopes this PR to the two stable backends.
Test plan
- `pytest xrspatial/geotiff/tests/test_release_gate_eager_dask_parity_2341.py -v` -- 5 passed locally.