geotiff: golden corpus phase 3 PR 4, dask+GPU backend (#1930)#2040

Merged
brendancol merged 3 commits into main from 1930-phase3-4-dask-gpu
May 18, 2026
@brendancol
Contributor

Summary

Phase 3 PR 4 of the golden corpus plan in #1930. Reads each fixture through open_geotiff(str(path), gpu=True, chunks=32, on_gpu_failure='strict'), returning a dask-of-cupy DataArray. The oracle pulls pixels via .compute() then .get() so the comparison machinery is unchanged.
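
The `.compute()` then `.get()` hand-off can be sketched with duck typing. This is only an illustration, not the oracle's code: `to_host` and the two fake classes are hypothetical stand-ins for a dask array (whose `.compute()` runs the graph) and a `cupy.ndarray` (whose `.get()` copies device memory to the host).

```python
def to_host(array):
    """Materialise a lazy, device-resident array as a host-side object."""
    if hasattr(array, "compute"):  # dask-style: evaluate the task graph
        array = array.compute()
    if hasattr(array, "get"):      # cupy-style: copy device memory to host
        array = array.get()
    return array


class FakeDeviceArray:
    """Hypothetical stand-in for cupy.ndarray."""

    def __init__(self, values):
        self.values = values

    def get(self):
        return list(self.values)


class FakeLazyArray:
    """Hypothetical stand-in for a dask array backed by device chunks."""

    def __init__(self, values):
        self.values = values

    def compute(self):
        return FakeDeviceArray(self.values)


pixels = to_host(FakeLazyArray([1, 2, 3]))
```

Because both steps are optional, the same helper would leave an eager host array untouched, which is one way the comparison code downstream can stay backend-agnostic.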

The module pytest.importorskips cupy and skips cleanly when no CUDA device is reachable. Strict on-gpu-failure keeps a silent CPU fallback from masking dask+GPU coverage.
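
A minimal sketch of a "no CUDA device reachable" probe, assuming CuPy's runtime binding `cupy.cuda.runtime.getDeviceCount()`. The helper name `cuda_device_reachable` is hypothetical, and the module itself relies on `pytest.importorskip` as described above, so treat this as one possible shape rather than the PR's code.

```python
def cuda_device_reachable():
    """Return True only if cupy imports and reports at least one CUDA device."""
    try:
        import cupy  # missing on CPU-only installs

        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Covers a missing package as well as a missing driver or runtime.
        return False


reachable = cuda_device_reachable()
```

A test module could call such a helper at import time and issue `pytest.skip(..., allow_module_level=True)` when it returns `False`.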

23 fixtures pass at level-0 parity. 7 xfailed with strict=True (shared codec/attrs gaps). 2 skipped (MinIsWhite + the example_* manifest entry).

_DASK_GPU_SKIPS is reserved for combo-specific gaps and is empty in this PR. A test_dask_gpu_candidate_is_chunked_and_on_device belt-and-braces check catches both failure modes (chunks= silently dropped or gpu=True silently fallen back) by computing one chunk and asserting it is a cupy.ndarray.

Test plan

  • pytest xrspatial/geotiff/tests/test_golden_corpus_dask_gpu_1930.py: 23 passed, 7 xfailed, 2 skipped
  • Module skips cleanly on CPU-only environments

Refs #1930. Fourth of six backend-wiring PRs; HTTP/COG and VRT remain.

@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 18, 2026
brendancol added a commit that referenced this pull request May 18, 2026
* geotiff: oracle understands masked-nodata candidates (#1930)

Closes the 5-of-7 xfail shared by every phase-3 backend module.

xrspatial reads integer GeoTIFFs whose nodata tag carries an
integer sentinel by masking sentinel-equal pixels to NaN and
upcasting the array to float (issue #1988). It stamps
``attrs['masked_nodata'] = True`` so a write round-trip can put
the tag back. The oracle previously did strict dtype + raw-pixel
comparison, so the dtype shift and sentinel-to-NaN rewrite both
registered as mismatches.

This change adds ``_normalise_for_masked_nodata`` to ``_oracle.py``.
When the candidate reports the contract, the rasterio reference is
cast to the candidate's float dtype and pixels equal to the
sentinel are replaced with NaN; the dtype and pixel assertions
then run on directly comparable arrays. Fixtures that do not
report the contract pass through unchanged, so the 23+ currently
passing fixtures and the citation / JPEG xfails are untouched.
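
A rough sketch of the normalisation step described above, assuming NumPy arrays on both sides. The function name `normalise_reference` and the sample values are hypothetical; the real helper is ``_normalise_for_masked_nodata`` and its signature is not shown here.

```python
import numpy as np


def normalise_reference(reference, candidate_dtype, sentinel):
    """Cast an integer reference to the candidate's float dtype and
    replace sentinel-equal pixels with NaN."""
    out = reference.astype(candidate_dtype)
    out[reference == sentinel] = np.nan
    return out


# Reference as rasterio would return it: raw uint16 with a 65535 sentinel.
reference = np.array([[1, 2], [65535, 4]], dtype=np.uint16)
# Candidate as a masking reader would return it: float32 with NaN.
candidate = np.array([[1, 2], [np.nan, 4]], dtype=np.float32)

normalised = normalise_reference(reference, candidate.dtype, 65535)

# Both checks now run on directly comparable arrays.
assert normalised.dtype == candidate.dtype
np.testing.assert_array_equal(normalised, candidate)  # NaNs in the same place compare equal
```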

Drops the five nodata-masking entries from ``_PARITY_GAPS`` in
each of the three on-main backend modules (eager numpy, dask
numpy, GPU). The matching ``xfail(strict=True)`` flips to a real
failure when the test starts passing, so the oracle change and
the cleanup must land together to keep main green.
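
To illustrate why the two changes are coupled, here is a sketch of how a gap table might be turned into strict xfail marks with pytest. The table contents, fixture names, and `build_params` helper are hypothetical; only the ``xfail(strict=True)`` behaviour is taken from the text.

```python
import pytest

# Hypothetical gap table: fixture name -> reason it is expected to fail.
PARITY_GAPS = {"example_gap_fixture": "known reader gap"}
FIXTURES = ["example_ok_fixture", "example_gap_fixture"]


def build_params(fixtures, gaps):
    """Attach xfail(strict=True) to every fixture listed in the gap table."""
    params = []
    for name in fixtures:
        marks = []
        if name in gaps:
            # strict=True: an unexpected pass is reported as a failure.
            marks.append(pytest.mark.xfail(reason=gaps[name], strict=True))
        params.append(pytest.param(name, marks=marks, id=name))
    return params


PARAMS = build_params(FIXTURES, PARITY_GAPS)
```

With this layout, fixing the oracle without deleting the table entry leaves a strict xfail on a test that now passes, which pytest reports as a failure.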

Test coverage in ``golden_corpus/test_oracle.py``:

* masked_nodata candidate matches the oracle
* masked_nodata flag missing -- strict dtype check still fires
* candidate that forgot to mask a sentinel pixel -- pixel check fires
* candidate with wrong ``attrs['nodata']`` -- nodata check fires
* plain float-NaN fixtures are not affected by the new path
* masked_nodata flag plus NaN sentinel passes through cleanly

Follow-ups: the same five entries will be removed from the
``_PARITY_GAPS`` tables in the still-open phase-3 PRs #2040
(dask+GPU), #2041 (HTTP), and #2042 (VRT) via separate commits
on each branch.

* geotiff: tighten masked-nodata oracle guards (#1930)

Address PR #2046 review:

* Reject fractional sentinels via ``float(nd_float).is_integer()``.
  Without the guard, a sentinel like ``3.5`` would cast to ``3``
  and mask every 3-valued pixel.
* Reject sentinels outside the source integer dtype's range via
  ``info.min <= int(nd_float) <= info.max``. Without the guard,
  ``np.uint16(-1.0)`` wraps to ``65535`` and would mask the
  dtype-max pixel.

Both guards mirror the upstream xrspatial reader's checks in
``xrspatial/geotiff/__init__.py``, so the oracle's interpretation
of the masked-nodata contract is now strictly tighter than what
the reader can emit.
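
The two guards can be sketched as a single predicate. The function name `sentinel_is_maskable` is hypothetical; the two expressions follow the ones quoted in the bullets above.

```python
import numpy as np


def sentinel_is_maskable(nodata, source_dtype):
    """Return True only if nodata is a whole number that fits source_dtype."""
    nd_float = float(nodata)
    if not nd_float.is_integer():  # rejects 3.5, and also NaN and inf
        return False
    info = np.iinfo(source_dtype)
    return info.min <= int(nd_float) <= info.max


checks = {
    "in_range": sentinel_is_maskable(65535, np.uint16),
    "fractional": sentinel_is_maskable(3.5, np.uint16),
    "negative_for_unsigned": sentinel_is_maskable(-1.0, np.uint16),
    "nan": sentinel_is_maskable(float("nan"), np.uint16),
}
```

When the predicate is false, the comparison would stay on the raw-pixel path instead of masking anything.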

Added two tests:

* ``test_masked_nodata_fractional_sentinel_does_not_mask`` --
  builds a fractional-nodata fixture, confirms the oracle stays on
  the raw-pixel path (dtype mismatch fires, not pixel mismatch).
* ``test_masked_nodata_out_of_range_sentinel_does_not_mask`` --
  rasterio refuses to write an out-of-range nodata at the writer
  level, so this calls ``_normalise_for_masked_nodata`` directly
  with synthesised inputs and confirms the inputs pass through
  unchanged.

Mirrors the eager / dask / GPU parity layers but reads each fixture
through ``open_geotiff(str(path), gpu=True, chunks=32,
on_gpu_failure='strict')``, returning a dask-of-cupy DataArray. The
oracle pulls pixels via ``.compute()`` then ``.get()`` so the
comparison machinery is unchanged.

Skips cleanly when no CUDA device is reachable; strict on-gpu-failure
keeps a silent CPU fallback from masking dask+GPU coverage.

23 fixtures pass, 7 xfailed (shared codec/attrs gaps), 2 skipped.
``_DASK_GPU_SKIPS`` is reserved for combo-specific gaps and is empty
in the first pass. A
``test_dask_gpu_candidate_is_chunked_and_on_device`` belt-and-braces
check catches both failure modes (``chunks=`` dropped or ``gpu=True``
silently fallen back) by computing one chunk and asserting it is a
``cupy.ndarray``.

Mirror the phase 3 PR 2 fix: assert the chunk grid is at least 2x2
along the spatial axes after picking a fixture whose pixel extent
is at least ``2 * CHUNK_SIZE``. Catches the failure mode where
``chunks=`` is accepted but stitched into a single chunk that
covers the whole file (windowing logic never runs).
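
The arithmetic behind the 2x2 requirement can be sketched as follows. `CHUNK_SIZE = 32` is an assumption taken from the ``chunks=32`` argument, and both helper names are hypothetical.

```python
import math

CHUNK_SIZE = 32  # assumed to match the chunks=32 argument


def chunk_grid(shape, chunk_size=CHUNK_SIZE):
    """Number of chunks along each spatial axis for a fixed chunk size."""
    return tuple(math.ceil(extent / chunk_size) for extent in shape)


def is_multi_chunk(shape, chunk_size=CHUNK_SIZE):
    """True if the grid is at least 2x2, so windowed reads are exercised."""
    return all(count >= 2 for count in chunk_grid(shape, chunk_size))


grid = chunk_grid((100, 70))
```

On a real dask-backed array the observed grid is available as `numblocks`, so a test can compare it against this expectation. A fixture smaller than `2 * CHUNK_SIZE` on either axis can never satisfy the check, hence the fixture selection step.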

Follow-up to #2046, which extended the oracle to handle
``attrs['masked_nodata']=True`` candidates. The five
``nodata_int_sentinel_uint16`` / ``stripped_*_uint16`` /
``tiled_*_uint16`` fixtures now pass the oracle directly, so
keeping ``xfail(strict=True)`` on them would flip to
unexpected-pass and break the module.

This branch is also rebased onto current main so the new oracle
behaviour is what runs.
@brendancol brendancol force-pushed the 1930-phase3-4-dask-gpu branch from bcfa648 to 0f189d1 Compare May 18, 2026 17:57
@brendancol brendancol merged commit 07814ce into main May 18, 2026
4 of 5 checks passed
