Add release-gate windowed-read coords + shifted-transform parity test (#2358)#2364
Add release-gate windowed-read coords + shifted-transform parity test (#2358)#2364brendancol wants to merge 3 commits into
Conversation
…#2358) Adds xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py to lock the contract that a windowed read returns the correct shape, coords that slice the unwindowed coords bit-exact, an attrs['transform'] equal to T_full * Affine.translation(col_off, row_off) with no float drift, and unchanged canonical non-transform release attrs. Covers eager and dask read paths across a representative corpus (int16 with nodata, float32 tiled with NaN nodata, float32 stripped without nodata). Adds the matching row in docs/source/reference/release_gate_geotiff.rst. Part of epic #2341, PR 2 of 5.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Add release-gate windowed-read coords + shifted-transform parity test (#2358)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
_assert_canonical_attrs_unchangedusesisinstance(full_val, float) and np.isnan(full_val)to detect a NaN nodata sentinel (test file lines 325-329). The reader currently normalises the read-back nodata to a Pythonfloat, so this works today, but if any backend ever returns the sentinel asnp.float32(which is what the input fixture passes toto_geotiff) the isinstance branch silently falls through towin_val == full_valand NaN != NaN flips the test from "checked NaN equality" to "always fails". A more durable form is something liketry: full_is_nan = bool(np.isnan(full_val)) except (TypeError, ValueError): full_is_nan = False
which handles both Python and numpy scalar NaNs without binding to a specific dtype.
-
The dask backend uses
chunks=32over a256x256raster with a window starting at(row_off=32, col_off=64). Both window offsets are exact multiples of 32, so the window aligns to chunk boundaries and the test does not exercise the case where the window splits a chunk. A second parametrize case at e.g.chunks=20or a window offset of(33, 65)would cover the misaligned path without much extra runtime.
Nits (optional improvements)
-
_PIXEL_HEIGHT = -25.0,_ORIGIN_Y = 4001987.25-- the values are well chosen to defeat integer-only window math, but a one-line comment on_ORIGIN_X = 500123.5explaining "fractional origin so a windowed reader that drops the fractional part fails the equality assertion" would make the choice self-documenting. -
The corpus uses three files: int16-deflate-stripped, float32-deflate-tiled, float32-none-stripped. A MinIsWhite cell is on the epic's representative-files list (see
test_backend_pixel_parity_matrix_1813.py::_TIFF_FIXTURES). Out of scope for this PR if you want to keep the diff small, but consider adding it in a follow-up so this gate covers every dtype/layout the eager/dask matrix covers.
What looks good
- Algebra:
_expected_window_transformkeeps thebanddskew terms so the formula generalises if a rotated-grid fixture is added later. The implementation matchesT_full * Affine.translation(col_off, row_off)exactly for axis-aligned grids. - No rasterio oracle: the transform check pins the spec directly, as the issue asked.
- Tuple equality (not
approx) on the transform forces the reader to thread the algebraic identity all the way through, instead of re-deriving the origin from the y/x coord arrays where float rounding could creep in. masked_nodatais correctly identified as data-dependent and asserted structurally (presence + bool type), with the rationale in a comment pointing at_attrs.py.- Temp filenames include the issue tag + a uuid so sibling rockout worktrees and xdist workers cannot collide.
@pytest.mark.release_gateis registered insetup.cfg(line 115).- The matching row was added to
docs/source/reference/release_gate_geotiff.rstunder "Local GeoTIFF read and write" with a clear one-line acceptance.
Checklist
- Algorithm matches reference/paper (algebraic identity, not rasterio)
- All implemented backends produce consistent results (eager + dask both pass)
- NaN handling is correct (works today; see Suggestion 1 for a durability nit)
- Edge cases covered (int sentinel, float NaN sentinel, no sentinel)
- Dask chunk boundaries handled (32-aligned today; see Suggestion 2 for the misaligned case)
- No premature materialization (no
.compute(); coords are already eager on xarray) - Benchmark not needed (test-only PR)
- README feature matrix update not needed (no new public function)
- Docstrings present and accurate
Applies the four findings from the self-review on PR #2364: - Robust NaN compare on the canonical-attrs check: drop the ``isinstance(x, float)`` gate in favour of a try/except around ``np.isnan(x)`` so a future backend that returns a numpy-scalar nodata cannot silently fall into the ``==`` branch where NaN != NaN flips the test from "checked equal" to "always fails". - Add a chunk-misaligned window case ``(33, 65, 95, 191)`` alongside the chunk-aligned ``(32, 64, 96, 192)`` window so the dask reader has to split chunks instead of starting on a chunk boundary. - Extend the corpus with a uint8 MinIsWhite stripped fixture so this release-gate row covers the same representative layout as ``test_backend_pixel_parity_matrix_1813.py``. Gated on tifffile availability via the same skip marker pattern. Because the MinIsWhite fixture is written without GeoTIFF tags, the transform-shift assertion handles the no-georef case by requiring that neither read carries a ``transform``; an invented transform on the windowed side would be the silent-wrongness this gate exists to catch. - Document the ``_ORIGIN_X = 500123.5`` choice with a comment on why the fractional value matters for the exact-tuple equality. Test count goes from 24 (3 files x 2 readers x 4 assertions) to 64 (4 files x 2 readers x 2 windows x 4 assertions); local run is still under one second.
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): Add release-gate windowed-read coords + shifted-transform parity test (#2358)
Second pass after the round-1 follow-up commit (2d53ea29).
Blockers
None.
Suggestions
None.
Nits (optional)
- The module header docstring (lines 8-12) still names three corpus cells (integer + nodata, float + NaN nodata, float no nodata). The follow-up commit added a fourth (
uint8MinIsWhite stripped). Update the docstring so the prose matches_CORPUS. One-line fix.
What looks good
- All four findings from the first review are addressed.
- NaN compare is now backend-agnostic via
try: np.isnan(x); a future numpy-scalar nodata cannot silently slip into the==branch. - The chunk-misaligned window
(33, 65, 95, 191)exercises the dask reader's chunk-split path; the aligned window keeps the boundary-aligned case covered. - MinIsWhite cell added with the same
tifffile-gated marker pattern used bytest_backend_pixel_parity_matrix_1813.py, and_assert_transform_shiftednow handles the legitimate no-georef case (notransformon either side) without papering over a real attr-drop. - The
_ORIGIN_X = 500123.5fractional value is now self-documenting.
Checklist
- Algorithm matches the algebraic spec (no rasterio oracle)
- Eager and dask paths covered (numpy + dask+numpy)
- NaN handling is robust (backend-agnostic via
np.isnan) - Edge cases covered (int sentinel, float NaN, no sentinel, MinIsWhite no-georef)
- Dask chunk boundaries handled (aligned + misaligned cases)
- No premature materialization
- Benchmark not needed (test-only)
- README feature matrix update not needed (no new public function)
- Docstrings present (header prose has a minor stale-count nit; see above)
CI status noteThe Python 3.14 pytest jobs are red on this PR, but the four failing tests are pre-existing failures on
All four are regex-match failures against The Python 3.12 jobs and the new test file added by this PR pass on every backend matrix cell (eager + dask x 4 corpus files x 2 windows x 4 assertions = 64 cases; MinIsWhite cells skip cleanly when Flagging the red checks as out of scope for this PR -- the regex tightening lives in sibling tests and should land on its own. |
Closes #2358. PR 2 of 5 for epic #2341.
Summary
xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.pylocks the windowed-read contract: shape equals the window's(height, width),coords['y'/'x']are a bit-exact slice of the unwindowed coords,attrs['transform']equalsT_full * Affine.translation(col_off, row_off)by exact tuple equality, and the non-transform canonical release attrs match the unwindowed read.open_geotiff(window=...)) and dask (read_geotiff_dask(window=..., chunks=32)) paths are exercised.masked_nodatais data-dependent (true iff sentinel-to-NaN actually ran), so it is asserted structurally (presence + bool type) instead of by value.docs/source/reference/release_gate_geotiff.rst.Backend coverage
open_geotiff)read_geotiff_dask)GPU paths are out of scope for this row and are covered by
test_backend_pixel_parity_matrix_1813.py.Test plan
pytest xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py-- 24 passed locally (3 files x 2 readers x 4 assertions).Notes
T_full * Affine.translation(col_off, row_off)by hand. It does not callrasterio.windows.transform, so the test pins the spec rather than whatever rasterio happens to compute.