geotiff: golden corpus phase 3 PR 2, dask+numpy backend (#1930) #2038
Merged
Mirrors phase 3 PR 1 (#2036) but reads each fixture through ``open_geotiff(str(path), chunks=32)``. The oracle pulls the candidate's pixels via ``.compute()`` so the comparison machinery is unchanged; what this PR actually exercises is the windowed-decode plumbing in ``read_geotiff_dask``. The skip / xfail taxonomy is intentionally identical to the eager module's, since the parity gaps live in the codec / attrs layer that both backends share. A separate ``_DASK_SKIPS`` table (empty in the first pass) is reserved for dask-only gaps so the difference between "shared gap" and "dask plumbing gap" stays legible in future PRs. 23 fixtures pass, 7 xfailed (the same shared gaps the eager backend flags), 2 skipped (MinIsWhite + the example_ manifest entry without an on-disk .tif). A ``test_dask_candidate_is_actually_chunked`` sanity test catches the failure mode where ``chunks=`` is silently dropped and the eager path runs instead.
This was referenced May 18, 2026
Address phase 3 PR 2 review: ``test_dask_candidate_is_actually_chunked`` previously checked ``hasattr(da.data, 'dask')`` which passes for any dask-backed array including single-chunk ones. A regression that materialised the full file into one chunk would still pass the ``hasattr`` check while leaving the windowing logic untested. Tighten the helper to pick a fixture whose pixel extent is at least ``2 * CHUNK_SIZE`` along both axes and assert ``numblocks >= 2`` on the spatial axes, so the chunk grid is actually 2x2 or larger.
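The tightened check described above could be sketched roughly as follows. This is a minimal illustration, not the helper from the PR: ``expected_numblocks``, ``is_large_enough`` and ``assert_actually_chunked`` are hypothetical names, and the only thing assumed about the candidate is that its backing array exposes a dask-style ``numblocks`` tuple (as ``dask.array.Array`` does).

```python
import math

CHUNK_SIZE = 32  # mirrors chunks=32 from the PR text


def expected_numblocks(shape, chunk_size=CHUNK_SIZE):
    """Blocks per axis for a regular chunk grid (the last block may be ragged)."""
    return tuple(math.ceil(n / chunk_size) for n in shape)


def is_large_enough(shape, chunk_size=CHUNK_SIZE):
    """True when both spatial (last two) axes span at least two full chunks."""
    return all(n >= 2 * chunk_size for n in shape[-2:])


def assert_actually_chunked(data):
    """Fail unless `data` has a chunk grid of 2x2 or larger on the spatial axes.

    `data` is anything exposing a dask-style `numblocks` tuple. A plain
    `hasattr(data, 'dask')` check would also accept a single-chunk array,
    which is the gap the review pointed out.
    """
    numblocks = getattr(data, "numblocks", None)
    assert numblocks is not None, "no chunk grid: chunks= was probably dropped"
    assert all(n >= 2 for n in numblocks[-2:]), f"single-chunk axis: {numblocks}"
```

Filtering fixtures with ``is_large_enough`` first keeps the assertion meaningful: a fixture smaller than ``2 * CHUNK_SIZE`` on either axis would legitimately produce a single block there.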
brendancol added a commit that referenced this pull request May 18, 2026
* geotiff: golden corpus phase 3 PR 3, GPU backend (#1930)

  Mirrors phase 3 PR 1 (#2036) and PR 2 (#2038) but reads each fixture through ``open_geotiff(str(path), gpu=True, on_gpu_failure='strict')``, returning a CuPy-backed DataArray. The oracle's ``_candidate_pixels`` pulls the device array back to host via ``.get()`` before comparing.

  The whole module ``pytest.importorskip``s cupy and skips if no CUDA device is reachable, so CI matrices without a GPU collect zero tests here. The strict on-gpu-failure mode is on so a silent CPU fallback surfaces as an exception rather than masking GPU coverage.

  23 fixtures pass, 7 xfailed (shared codec/attrs gaps), 2 skipped (MinIsWhite + the example_ manifest entry). ``_GPU_SKIPS`` is reserved for GPU-specific gaps and is empty for now.

  A ``test_gpu_candidate_is_actually_on_device`` belt-and-braces check confirms the result is a ``cupy.ndarray`` rather than a CPU-fallback numpy array.

* geotiff: move JPEG cell to GPU-only skip table (#1930)

  Address phase 3 PR 3 review: ``compression_jpeg_uint8_ycbcr`` lived in ``_PARITY_GAPS`` with the reason "RGB band axis order divergence", which describes the failure mode on the eager and dask backends. On the GPU path with ``on_gpu_failure='strict'`` the JPEG-YCbCr decoder raises ``OSError: broken data stream`` before the oracle gets to compare anything, so the recorded reason no longer matches what actually happens.

  Move the entry to ``_GPU_SKIPS`` with a GPU-accurate reason. The xfail mechanics are unchanged; this is documentation that now matches reality. Update the module docstring to match.
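One way to picture the two-table split is a lookup that consults the backend-specific table before the shared one. The sketch below assumes both tables are plain dicts keyed by fixture id with a reason string as the value; that layout, the ``gap_reason`` helper, and the ``hypothetical_shared_fixture`` entry are illustrative assumptions, not the module's actual code. Only the table names, the JPEG fixture id, and the failure text come from the commit message.

```python
# Shared gaps: live in the codec / attrs layer, so every backend inherits them.
_PARITY_GAPS = {
    # Hypothetical fixture id, used only to illustrate a shared entry.
    "hypothetical_shared_fixture": "integer nodata masking",
}

# GPU-only gaps: the recorded reason describes what the GPU path actually does.
_GPU_SKIPS = {
    "compression_jpeg_uint8_ycbcr": (
        "JPEG-YCbCr decoder raises OSError: broken data stream "
        "under on_gpu_failure='strict'"
    ),
}


def gap_reason(fixture_id, shared=_PARITY_GAPS, backend=_GPU_SKIPS):
    """Return (kind, reason) for a known gap, or None if the fixture should pass.

    The backend table wins so that a backend-accurate reason is reported
    even if the same fixture id were ever listed in both tables.
    """
    if fixture_id in backend:
        return ("backend", backend[fixture_id])
    if fixture_id in shared:
        return ("shared", shared[fixture_id])
    return None
```

Keeping the reason next to the fixture id is what makes the move in the second commit a documentation change: the outcome stays an expected failure, but the reported cause now matches the GPU path.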
Summary
Phase 3 PR 2 of the golden corpus plan in #1930. Mirrors phase 3 PR 1 (#2036, just merged) but reads each fixture through ``open_geotiff(str(path), chunks=32)`` instead of the eager call. The oracle pulls the candidate's pixels via ``.compute()`` so the comparison machinery is unchanged; what this PR exercises is the windowed-decode plumbing in ``read_geotiff_dask``.

The skip / xfail taxonomy is intentionally identical to the eager module's. The parity gaps it flags (integer nodata masking, citation CRS, RGB band axis order, MinIsWhite inversion) live in the codec / attrs layer that both backends share, so the dask backend inherits them verbatim.

A separate ``_DASK_SKIPS`` table is set up but empty in this PR. Add an entry only when a fixture is dask-specific (eager passes, dask does not); the separation keeps "shared codec/attrs gap" and "dask plumbing gap" legible in future PRs.

23 fixtures pass at level-0 parity. 7 xfailed with strict=True (same shared gaps). 2 skipped: MinIsWhite (intentional inversion per #1797) and the ``example_*`` manifest entry without an on-disk .tif.

A ``test_dask_candidate_is_actually_chunked`` sanity test catches the failure mode where ``chunks=`` is silently dropped and the eager path runs instead.

Test plan
``pytest xrspatial/geotiff/tests/test_golden_corpus_dask_numpy_1930.py``: 23 passed, 7 xfailed, 2 skipped

Refs #1930. Second of six backend-wiring PRs (eager numpy done in #2036; dask+numpy here; GPU, dask+GPU, HTTP/COG, VRT to follow).
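The point that the comparison machinery stays unchanged can be illustrated with a small normalisation step that turns any candidate into a host-side NumPy array before comparing. This is a hedged sketch only: ``candidate_pixels`` is a stand-in for whatever the oracle really does, and ``FakeLazy`` is a hypothetical class that mimics just the ``.compute()`` method so the example runs without dask installed.

```python
import numpy as np


def candidate_pixels(data):
    """Return a host-side NumPy array for eager, lazy, or device-backed data."""
    if hasattr(data, "compute"):  # dask-style lazy array: materialise the chunks
        data = data.compute()
    if hasattr(data, "get"):      # CuPy-style device array: copy back to host
        data = data.get()
    return np.asarray(data)


class FakeLazy:
    """Hypothetical stand-in for a chunked array; only mimics .compute()."""

    def __init__(self, values):
        self._values = values

    def compute(self):
        return self._values


reference = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Eager and lazy candidates reduce to the same pixels, so one comparison
# routine can serve both backends.
assert np.array_equal(candidate_pixels(reference), reference)
assert np.array_equal(candidate_pixels(FakeLazy(reference)), reference)
```

Because the materialised pixels are identical by construction, a separate check on the chunk grid is still needed to tell the lazy path from the eager one.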