From 7ac4d4192be6c11b3c25a83da89f5ab1f2301349 Mon Sep 17 00:00:00 2001
From: Brendan Collins
Date: Mon, 25 May 2026 20:22:39 -0700
Subject: [PATCH 1/3] geotiff tests: consolidate backend-parity cluster

Fold the backend-parity cluster (the largest by lines in the test
suite) into parity/test_backend_matrix.py and
parity/test_pixel_equality.py.

Folded files (deleted in this commit):
* test_backend_parity_matrix.py
* test_backend_full_parity_2211.py
* test_backend_pixel_parity_matrix_1813.py
* test_backend_kwarg_parity_1561.py
* test_attrs_finalization_parity_2211.py
* test_attrs_parity_1548.py
* test_miniswhite_backend_parity_1797.py

Updates docs/source/reference/release_gate_geotiff.rst rows that cited
the old paths; verified by test_release_gate_2321.

Adds a temporary CLUSTER_AUDIT_PR4.md mapping old to new test ids;
deleted before merge.

Closes #2398. Refs epic #2390 (PR 4).
---
 .../source/reference/release_gate_geotiff.rst |   16 +-
 xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md  |  110 +
 xrspatial/geotiff/tests/parity/__init__.py    |    0
 .../tests/parity/test_backend_matrix.py       | 2089 +++++++++++++++++
 .../test_pixel_equality.py}                   |  309 ++-
 .../test_attrs_finalization_parity_2211.py    |  471 ----
 .../geotiff/tests/test_attrs_parity_1548.py   |  186 --
 .../tests/test_backend_full_parity_2211.py    |  952 --------
 .../tests/test_backend_kwarg_parity_1561.py   |  218 --
 .../tests/test_backend_parity_matrix.py       | 1034 --------
 .../test_miniswhite_backend_parity_1797.py    |  123 -
 11 files changed, 2491 insertions(+), 3017 deletions(-)
 create mode 100644 xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md
 create mode 100644 xrspatial/geotiff/tests/parity/__init__.py
 create mode 100644 xrspatial/geotiff/tests/parity/test_backend_matrix.py
 rename xrspatial/geotiff/tests/{test_backend_pixel_parity_matrix_1813.py => parity/test_pixel_equality.py} (57%)
 delete mode 100644 xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py
 delete mode 100644 xrspatial/geotiff/tests/test_attrs_parity_1548.py
delete mode 100644 xrspatial/geotiff/tests/test_backend_full_parity_2211.py delete mode 100644 xrspatial/geotiff/tests/test_backend_kwarg_parity_1561.py delete mode 100644 xrspatial/geotiff/tests/test_backend_parity_matrix.py delete mode 100644 xrspatial/geotiff/tests/test_miniswhite_backend_parity_1797.py diff --git a/docs/source/reference/release_gate_geotiff.rst b/docs/source/reference/release_gate_geotiff.rst index b616af390..59a81a06a 100644 --- a/docs/source/reference/release_gate_geotiff.rst +++ b/docs/source/reference/release_gate_geotiff.rst @@ -154,8 +154,8 @@ Local GeoTIFF read and write - stable - Round-trip a local GeoTIFF: pixel bytes, ``transform``, ``crs``, and ``nodata`` all survive read. - - ``xrspatial/geotiff/tests/test_backend_pixel_parity_matrix_1813.py``, - ``xrspatial/geotiff/tests/test_backend_parity_matrix.py`` + - ``xrspatial/geotiff/tests/parity/test_pixel_equality.py``, + ``xrspatial/geotiff/tests/parity/test_backend_matrix.py`` - `#2341`_ * - ``reader.windowed`` - stable @@ -182,8 +182,8 @@ Local GeoTIFF read and write - ``open_geotiff(chunks=...)`` returns a Dask-backed :class:`xarray.DataArray` that computes to the same pixels, coords, and ``attrs`` as the eager numpy read. - - ``xrspatial/geotiff/tests/test_backend_parity_matrix.py``, - ``xrspatial/geotiff/tests/test_backend_full_parity_2211.py`` + - ``xrspatial/geotiff/tests/parity/test_backend_matrix.py``, + ``xrspatial/geotiff/tests/parity/test_pixel_equality.py`` - `#2341`_ * - ``reader.dask`` -- eager / dask parity - stable @@ -201,7 +201,7 @@ Local GeoTIFF read and write - ``to_geotiff`` writes a file that ``open_geotiff`` reads back bit-exact for every stable codec. 
- ``xrspatial/geotiff/tests/test_cog_writer_compliance.py``, - ``xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py`` + ``xrspatial/geotiff/tests/parity/test_backend_matrix.py`` - `#2341`_ * - ``writer.overviews`` - advanced @@ -431,7 +431,7 @@ attrs contract - ``transform``, ``crs``, ``crs_wkt``, ``nodata``, ``georef_status``, ``raster_type`` appear in canonical form on every backend. - ``xrspatial/geotiff/tests/test_attrs_contract_canonical_1984.py``, - ``xrspatial/geotiff/tests/test_attrs_parity_1548.py`` + ``xrspatial/geotiff/tests/parity/test_backend_matrix.py`` - `#2341`_ * - Attrs pass-through on write - stable @@ -698,8 +698,8 @@ These gates are not tier rows but they back the rest of the checklist. ``_VALID_COMPRESSIONS`` has a ``SUPPORTED_FEATURES`` tier, and the writer rejects experimental and internal-only codecs without their respective opt-in flags. Owning epic: `#2340`_. -* ``test_backend_parity_matrix.py`` and - ``test_backend_pixel_parity_matrix_1813.py`` -- cross-backend pixel and +* ``parity/test_backend_matrix.py`` and + ``parity/test_pixel_equality.py`` -- cross-backend pixel and metadata parity across the 4 read backends (numpy, cupy, dask+numpy, dask+cupy) on the golden corpus. Owning epic: `#2341`_. * ``test_release_gate_2321.py`` -- meta-gate that asserts every promised diff --git a/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md b/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md new file mode 100644 index 000000000..773c3c975 --- /dev/null +++ b/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md @@ -0,0 +1,110 @@ +# Cluster audit, PR 4 (backend parity) + +Temporary mapping document, deleted in a final commit before approval. + +## Folded files + +| Old file | New file | Notes | +|---|---|---| +| `test_backend_parity_matrix.py` | `parity/test_backend_matrix.py` | Core matrix harness moved verbatim (with markers re-imported from `_helpers/markers.py`). Contains `test_backend_parity_matrix` and `test_backend_parity_matrix_errors`. 
| +| `test_backend_full_parity_2211.py` | `parity/test_backend_matrix.py` | Full-corpus parity gate appended with `_fp_` prefix on internals to avoid collision. Contains `test_backend_full_parity`, `test_taxonomy_ids_are_in_manifest`, `test_gpu_skip_reason_is_loud`, `test_gpu_backend_returns_cupy_array`, `test_dask_backend_returns_dask_array`, `test_dask_gpu_backend_returns_dask_of_cupy`. | +| `test_attrs_finalization_parity_2211.py` | `parity/test_backend_matrix.py` | Appended with `_ap_` prefix on internals. Contains `test_canonical_attrs_match_across_backends`, `test_canonical_attrs_keys_match_across_backends`. Dropped: `test_backend_specific_keys_carveout_is_documented` (docstring scan no longer mapped to the new module's structure; the carve-out comment in the appended block names every key). | +| `test_attrs_parity_1548.py` | `parity/test_backend_matrix.py` | Appended as pass-through TIFF tag parity. Contains `test_pass_through_tags_eager_numpy_baseline`, `test_pass_through_tags_dask_matches_numpy`, `test_pass_through_tags_cupy_matches_numpy`, `test_pass_through_tags_dask_cupy_matches_numpy`, `test_pass_through_tags_all_backend_keysets_equal`. | +| `test_backend_pixel_parity_matrix_1813.py` | `parity/test_pixel_equality.py` | Strict pixel-byte parity harness moved verbatim (with markers re-imported). Test ids retain descriptive form: `stripped-uint8-none`, `tiled-float32-none`, `cog-float32-deflate`, etc. | +| `test_backend_kwarg_parity_1561.py` | `parity/test_pixel_equality.py` | Appended as kwarg-threading section. Contains the original `read_geotiff_dask` window / band / max_pixels tests and `write_geotiff_gpu` tiled / max_z_error / streaming_buffer_bytes tests. | +| `test_miniswhite_backend_parity_1797.py` | `parity/test_pixel_equality.py` | Appended as MinIsWhite section. Contains `test_miniswhite_http_matches_local_reader`, `test_miniswhite_http_dask_matches_local_reader`, `test_miniswhite_gpu_matches_cpu_reader`. 
| + +## Per-test mapping highlights + +### From `test_backend_parity_matrix.py` + +| Old test id | New test id | +|---|---| +| `test_backend_parity_matrix[numpy-int16-single-band]` | `parity/test_backend_matrix.py::test_backend_parity_matrix[numpy-int16-single-band]` | +| `test_backend_parity_matrix[gpu-uint16-multiband-tiled]` | `parity/test_backend_matrix.py::test_backend_parity_matrix[gpu-uint16-multiband-tiled]` | +| `test_backend_parity_matrix_errors[numpy-rotated-no-allow_rotated]` | `parity/test_backend_matrix.py::test_backend_parity_matrix_errors[numpy-rotated-no-allow_rotated]` | + +(All parametrize ids preserved.) + +### From `test_backend_full_parity_2211.py` + +| Old test id | New test id | +|---|---| +| `test_backend_full_parity[-]` | `parity/test_backend_matrix.py::test_backend_full_parity[-]` | +| `test_taxonomy_ids_are_in_manifest` | `parity/test_backend_matrix.py::test_taxonomy_ids_are_in_manifest` | +| `test_gpu_skip_reason_is_loud` | `parity/test_backend_matrix.py::test_gpu_skip_reason_is_loud` | +| `test_gpu_backend_returns_cupy_array` | `parity/test_backend_matrix.py::test_gpu_backend_returns_cupy_array` | +| `test_dask_backend_returns_dask_array` | `parity/test_backend_matrix.py::test_dask_backend_returns_dask_array` | +| `test_dask_gpu_backend_returns_dask_of_cupy` | `parity/test_backend_matrix.py::test_dask_gpu_backend_returns_dask_of_cupy` | + +### From `test_attrs_finalization_parity_2211.py` + +| Old test id | New test id | +|---|---| +| `test_canonical_attrs_match_across_backends[plain_float]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[plain_float]` | +| `test_canonical_attrs_match_across_backends[float_with_nodata]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[float_with_nodata]` | +| `test_canonical_attrs_match_across_backends[int_with_nodata]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[int_with_nodata]` | +| 
`test_canonical_attrs_match_across_backends[uint8_no_nodata]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[uint8_no_nodata]` | +| `test_canonical_attrs_keys_match_across_backends[]` | `parity/test_backend_matrix.py::test_canonical_attrs_keys_match_across_backends[]` | +| `test_backend_specific_keys_carveout_is_documented` | dropped; the carve-out keys are now listed in a comment inside the appended section. Replacing the docstring scan with a marker comment keeps the carve-out greppable without coupling to the new module's docstring layout. | + +### From `test_attrs_parity_1548.py` + +| Old test id | New test id | +|---|---| +| `test_numpy_attrs_includes_pass_through_tags` | `parity/test_backend_matrix.py::test_pass_through_tags_eager_numpy_baseline` | +| `test_dask_attrs_match_numpy` | `parity/test_backend_matrix.py::test_pass_through_tags_dask_matches_numpy` | +| `test_cupy_attrs_match_numpy` | `parity/test_backend_matrix.py::test_pass_through_tags_cupy_matches_numpy` | +| `test_dask_cupy_attrs_match_numpy` | `parity/test_backend_matrix.py::test_pass_through_tags_dask_cupy_matches_numpy` | +| `test_all_backend_attrs_keysets_equal` | `parity/test_backend_matrix.py::test_pass_through_tags_all_backend_keysets_equal` | + +### From `test_backend_pixel_parity_matrix_1813.py` + +| Old test id | New test id | +|---|---| +| `test_open_geotiff_pixel_bytes_match[-]` | `parity/test_pixel_equality.py::test_open_geotiff_pixel_bytes_match[-]` | +| `test_open_geotiff_coords_match[-]` | `parity/test_pixel_equality.py::test_open_geotiff_coords_match[-]` | +| `test_open_geotiff_attrs_match[-]` | `parity/test_pixel_equality.py::test_open_geotiff_attrs_match[-]` | +| `test_read_geotiff_dask_matches_open_geotiff[]` | `parity/test_pixel_equality.py::test_read_geotiff_dask_matches_open_geotiff[]` | +| `test_read_geotiff_gpu_matches_open_geotiff[]` | `parity/test_pixel_equality.py::test_read_geotiff_gpu_matches_open_geotiff[]` | +| 
`test_read_vrt_pixel_bytes_match[]` | `parity/test_pixel_equality.py::test_read_vrt_pixel_bytes_match[]` | +| `test_read_vrt_coords_match[]` | `parity/test_pixel_equality.py::test_read_vrt_coords_match[]` | +| `test_open_geotiff_dot_vrt_routes_to_read_vrt[]` | `parity/test_pixel_equality.py::test_open_geotiff_dot_vrt_routes_to_read_vrt[]` | +| `test_fixture_builders_produce_readable_files[]` | `parity/test_pixel_equality.py::test_fixture_builders_produce_readable_files[]` | + +### From `test_backend_kwarg_parity_1561.py` + +| Old test id | New test id | +|---|---| +| `test_read_geotiff_dask_window_clips_region` | `parity/test_pixel_equality.py::test_read_geotiff_dask_window_clips_region` | +| `test_read_geotiff_dask_window_via_dispatcher` | `parity/test_pixel_equality.py::test_read_geotiff_dask_window_via_dispatcher` | +| `test_read_geotiff_dask_band_selects_single_band` | `parity/test_pixel_equality.py::test_read_geotiff_dask_band_selects_single_band` | +| `test_read_geotiff_dask_band_via_dispatcher` | `parity/test_pixel_equality.py::test_read_geotiff_dask_band_via_dispatcher` | +| `test_read_geotiff_dask_max_pixels_rejects_oversized` | `parity/test_pixel_equality.py::test_read_geotiff_dask_max_pixels_rejects_oversized` | +| `test_read_geotiff_dask_window_band_combined` | `parity/test_pixel_equality.py::test_read_geotiff_dask_window_band_combined` | +| `test_read_geotiff_dask_invalid_window_raises` | `parity/test_pixel_equality.py::test_read_geotiff_dask_invalid_window_raises` | +| `test_read_geotiff_dask_invalid_band_raises` | `parity/test_pixel_equality.py::test_read_geotiff_dask_invalid_band_raises` | +| `test_write_geotiff_gpu_rejects_tiled_false` | `parity/test_pixel_equality.py::test_write_geotiff_gpu_rejects_tiled_false` | +| `test_write_geotiff_gpu_rejects_nonzero_max_z_error` | `parity/test_pixel_equality.py::test_write_geotiff_gpu_rejects_nonzero_max_z_error` | +| `test_write_geotiff_gpu_accepts_streaming_buffer_bytes_as_noop` | 
`parity/test_pixel_equality.py::test_write_geotiff_gpu_accepts_streaming_buffer_bytes_as_noop` | +| `test_to_geotiff_threads_tiled_false_into_gpu_dispatcher` | `parity/test_pixel_equality.py::test_to_geotiff_threads_tiled_false_into_gpu_dispatcher` | + +### From `test_miniswhite_backend_parity_1797.py` + +| Old test id | New test id | +|---|---| +| `test_http_miniswhite_matches_local_reader` | `parity/test_pixel_equality.py::test_miniswhite_http_matches_local_reader` | +| `test_http_dask_miniswhite_matches_local_reader` | `parity/test_pixel_equality.py::test_miniswhite_http_dask_matches_local_reader` | +| `test_gpu_miniswhite_matches_cpu_reader` | `parity/test_pixel_equality.py::test_miniswhite_gpu_matches_cpu_reader` | + +## Files left alone (decisions) + +| File | Reason | +|---|---| +| `test_vrt_backend_parity_2321.py` | VRT-specific backend parity. Belongs to PR 6 per the epic. Not touched here. | + +## Updates to existing references + +`docs/source/reference/release_gate_geotiff.rst` rows that cited the old paths now point at the consolidated `parity/test_backend_matrix.py` / `parity/test_pixel_equality.py`. Verified by `test_release_gate_2321.py::test_release_gate_cites_only_existing_test_files`. + +In-source comments in other test files and source modules still reference the old filenames; they are documentation strings, not file lookups, so they do not break collection. They will be updated as those files get folded in later PRs. diff --git a/xrspatial/geotiff/tests/parity/__init__.py b/xrspatial/geotiff/tests/parity/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/xrspatial/geotiff/tests/parity/test_backend_matrix.py b/xrspatial/geotiff/tests/parity/test_backend_matrix.py new file mode 100644 index 000000000..c822cc33a --- /dev/null +++ b/xrspatial/geotiff/tests/parity/test_backend_matrix.py @@ -0,0 +1,2089 @@ +"""Matrix-style backend parity across high-risk fixtures. 
+ +Single source of truth for "does backend X still match the eager-numpy +reference on fixture Z." Covers three layers of parity: + +* The high-risk fixture matrix: every (backend, fixture) cell runs + through ``assert_parity`` plus an error sub-matrix. +* Full-fixture parity over the golden corpus, using the manifest as + the fixture set and the same ``open_geotiff`` entry-point across + every backend. +* Attrs-key parity: the set of attrs emitted by each backend agrees + with the eager-numpy baseline, with a documented carve-out for + backend-specific keys. + +Harness contract +---------------- + +Every cell calls a single :func:`assert_parity` helper that checks the +same set of fields on the same fixture across every wired-up backend: + +* pixel array (byte-equal for int, NaN-aware closeness for float) +* dtype +* dims and dim order +* coord values and coord dtype (per axis) +* transform tuple (rasterio 6-tuple) +* CRS as EPSG int when present, plus ``crs_wkt`` string +* declared nodata sentinel +* masking state (``attrs.get('masked_nodata')`` from #2092) +* a small subset of canonical attrs whose round-trip semantics are + already settled in the module (``raster_type``, ``transform``, + ``crs``, ``crs_wkt``). + +Backends (issue #2132 plan) +--------------------------- + +The matrix is parametrised over up to 8 entries that span every +public dispatch path the reader supports: + +* ``numpy`` -- eager local file +* ``dask+numpy`` -- chunked local file +* ``gpu`` -- eager local file via cupy +* ``dask+gpu`` -- chunked local file via cupy +* ``vrt-eager`` -- ``.vrt`` mosaic, eager +* ``vrt-dask`` -- ``.vrt`` mosaic, chunked +* ``http-cog`` -- HTTP range-read of a COG +* ``fsspec-memory`` -- ``memory://`` URI through fsspec + +GPU rows skip when cupy + CUDA are missing. HTTP and fsspec rows skip +when their network or fsspec deps are absent. VRT rows are gated by +the writer being able to lay out the mosaic on disk -- always true on +local filesystems. 
+ +Cells that pair a backend with a source that physically cannot be +fed to it (e.g. a HTTP URL into ``vrt-eager``) skip via the +per-backend ``compat`` predicate on :class:`_BackendSpec`. +""" +from __future__ import annotations + +import http.server +import importlib.util +import pathlib +import socketserver +import threading +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any, Callable + +import numpy as np +import pytest +import xarray as xr + +from xrspatial.geotiff import open_geotiff, read_vrt, to_geotiff, write_vrt +from xrspatial.geotiff._errors import RotatedTransformError + +from .._helpers.markers import gpu_available + +# --------------------------------------------------------------------------- +# Environment gating +# --------------------------------------------------------------------------- + +_HAS_GPU = gpu_available() +_HAS_TIFFFILE = importlib.util.find_spec("tifffile") is not None +_HAS_FSSPEC = importlib.util.find_spec("fsspec") is not None +_HAS_DASK = importlib.util.find_spec("dask") is not None + +_skip_no_gpu = pytest.mark.skipif(not _HAS_GPU, reason="cupy + CUDA required") +_skip_no_tifffile = pytest.mark.skipif( + not _HAS_TIFFFILE, reason="tifffile required for MinIsWhite fixture") +_skip_no_fsspec = pytest.mark.skipif( + not _HAS_FSSPEC, reason="fsspec required for memory:// source") + + +# --------------------------------------------------------------------------- +# Source-type taxonomy +# --------------------------------------------------------------------------- + +# Source types name how the fixture is delivered to ``open_geotiff``. The +# read backends accept a subset of source types; the compatibility matrix +# lives on :class:`_BackendSpec.compat`. 
+_SRC_LOCAL_TIFF = "local-tiff" +_SRC_LOCAL_VRT = "local-vrt" +_SRC_HTTP = "http" +_SRC_FSSPEC = "fsspec" + + +# --------------------------------------------------------------------------- +# Backend descriptors +# --------------------------------------------------------------------------- + +@dataclass(frozen=True) +class _BackendSpec: + """Declarative description of one read backend. + + Attributes + ---------- + backend_id + Stable id used in the parametrize call. Appears in test names. + kwargs + Static ``open_geotiff`` kwargs that select this backend. + compat + Set of source-type ids this backend accepts. Cells with an + incompatible (backend, source) pair skip with a clear reason. + marks + Pytest marks (e.g. skipif) applied to every cell using this + backend. Used to gate GPU and fsspec backends behind their + optional deps. + source_type_override + If set, the matrix dispatches the fixture path through this + source type rather than the fixture's native type. Used by the + HTTP and fsspec backends to deliver the same on-disk TIFF + through a different transport. + """ + + backend_id: str + kwargs: dict[str, Any] + compat: frozenset[str] + marks: tuple = field(default_factory=tuple) + source_type_override: str | None = None + + +_BACKENDS: list[_BackendSpec] = [ + _BackendSpec( + backend_id="numpy", + kwargs={}, + # VRT fixtures are owned by the ``vrt-eager`` / ``vrt-dask`` + # rows below; routing them through ``numpy`` too would + # duplicate identical cells. + compat=frozenset({_SRC_LOCAL_TIFF, _SRC_HTTP, _SRC_FSSPEC}), + ), + _BackendSpec( + backend_id="dask+numpy", + kwargs={"chunks": 16}, + # Dask path supports fsspec URIs (#1749) but does not accept + # raw BytesIO. VRT lives on the ``vrt-dask`` row. + compat=frozenset({_SRC_LOCAL_TIFF, _SRC_FSSPEC}), + ), + _BackendSpec( + backend_id="gpu", + kwargs={"gpu": True}, + # GPU reader is local-file only. HTTP / fsspec deliver bytes + # through code paths the GPU reader does not consume. 
+ compat=frozenset({_SRC_LOCAL_TIFF}), + marks=(_skip_no_gpu,), + ), + _BackendSpec( + backend_id="dask+gpu", + kwargs={"gpu": True, "chunks": 16}, + compat=frozenset({_SRC_LOCAL_TIFF}), + marks=(_skip_no_gpu,), + ), + _BackendSpec( + backend_id="vrt-eager", + kwargs={}, + # VRT-only backend: only the VRT fixture is in scope. + compat=frozenset({_SRC_LOCAL_VRT}), + ), + _BackendSpec( + backend_id="vrt-dask", + kwargs={"chunks": 16}, + compat=frozenset({_SRC_LOCAL_VRT}), + ), + _BackendSpec( + backend_id="http-cog", + kwargs={}, + # HTTP backend re-routes any local TIFF fixture through a + # loopback HTTP server. Not all fixtures are valid COGs but + # the HTTP reader will still pull the bytes via range reads + # for any TIFF that the local server can serve. + compat=frozenset({_SRC_LOCAL_TIFF}), + source_type_override=_SRC_HTTP, + ), + _BackendSpec( + backend_id="fsspec-memory", + kwargs={}, + # fsspec memory:// route accepts any local TIFF fixture + # whose bytes can be uploaded into the in-memory filesystem. + compat=frozenset({_SRC_LOCAL_TIFF}), + source_type_override=_SRC_FSSPEC, + marks=(_skip_no_fsspec,), + ), +] + + +def _backend_params() -> list: + """Build the pytest.param list for the backend matrix.""" + out = [] + for spec in _BACKENDS: + out.append(pytest.param(spec, id=spec.backend_id, marks=spec.marks)) + return out + + +# --------------------------------------------------------------------------- +# Fixture descriptors +# --------------------------------------------------------------------------- + +@dataclass(frozen=True) +class _FixtureSpec: + """Declarative description of one high-risk fixture. + + Attributes + ---------- + fix_id + Stable id used in the parametrize call. Appears in test names. + dtype + Pixel dtype of the underlying array (and the on-disk SampleFormat). + expected_dims + Tuple of dim names in expected order. + expected_crs_epsg + EPSG int the read path should emit under ``attrs['crs']``. 
+ expected_nodata + Declared nodata sentinel that the read path should surface under + ``attrs['nodata']``. ``None`` means the fixture has no declared + nodata; the harness then asserts ``'nodata' not in attrs``. + expected_masked + Tri-valued. ``True`` / ``False`` pin ``attrs['masked_nodata']``. + ``None`` means "do not assert" -- used for fixtures without + nodata. + source_type + How the fixture is laid out on disk. Drives the + backend-compatibility filter via :class:`_BackendSpec.compat`. + read_kwargs + Extra kwargs forwarded to every ``open_geotiff`` call for this + fixture (e.g. ``mask_nodata=False``). + marks + Pytest marks applied to every cell using this fixture (e.g. + ``_skip_no_tifffile`` for the MinIsWhite cell). + builder + Callable receiving a directory ``Path`` and the resolved target + ``Path`` (cache-key filename). Writes the file at ``target`` and + returns the final on-disk path. Most builders just return + ``target`` unchanged; sidecar-producing builders (e.g. a + ``.vrt`` over auxiliary tiles) may write multiple files and + return the entry path. + """ + + fix_id: str + dtype: np.dtype + expected_dims: tuple[str, ...] + expected_crs_epsg: int | None + expected_nodata: object + expected_masked: bool | None + source_type: str + builder: Callable[[Path, Path], Path] + read_kwargs: dict[str, Any] = field(default_factory=dict) + marks: tuple = field(default_factory=tuple) + + +def _wrap_2d(arr: np.ndarray, *, crs: int | None, + nodata: object | None = None) -> xr.DataArray: + """Wrap a 2-D numpy array as a writer-ready DataArray. + + Uses unit-pixel descending-y coords (``y = height-1 .. 0``, + ``x = 0 .. width-1``). The read-back transform tuple for a height-H + fixture is ``(1.0, 0.0, -0.5, 0.0, -1.0, H - 0.5)`` -- the half-pixel + offsets come from the PixelIsArea convention (origin is the pixel + edge, coords are pixel centres) that the writer round-trips. 
+ """ + height, width = arr.shape + attrs: dict[str, Any] = {} + if crs is not None: + attrs["crs"] = crs + if nodata is not None: + attrs["nodata"] = nodata + return xr.DataArray( + arr, dims=["y", "x"], + coords={ + "y": np.arange(height - 1, -1, -1, dtype=np.float64), + "x": np.arange(width, dtype=np.float64), + }, + attrs=attrs, + ) + + +def _wrap_3d(arr: np.ndarray, *, crs: int) -> xr.DataArray: + """Wrap a 3-D (y, x, band) array as a writer-ready DataArray.""" + height, width, n_bands = arr.shape + return xr.DataArray( + arr, dims=["y", "x", "band"], + coords={ + "y": np.arange(height - 1, -1, -1, dtype=np.float64), + "x": np.arange(width, dtype=np.float64), + "band": np.arange(n_bands), + }, + attrs={"crs": crs}, + ) + + +# --------------------------------------------------------------------------- +# Fixture builders +# --------------------------------------------------------------------------- + +def _build_int16_single_band(dir_path: Path, target: Path) -> Path: + """High-risk fixture: int16 single-band stripped TIFF, EPSG:4326, no nodata.""" + del dir_path + rng = np.random.default_rng(seed=19850) + arr = rng.integers(-30000, 30000, size=(32, 32), dtype=np.int16) + to_geotiff( + _wrap_2d(arr, crs=4326), str(target), + compression="none", tiled=False, + ) + return target + + +def _build_uint16_multiband_tiled(dir_path: Path, target: Path) -> Path: + """Multiband tiled fixture: uint16, three bands, deflate-compressed.""" + del dir_path + rng = np.random.default_rng(seed=21320) + arr = rng.integers(0, 60000, size=(32, 32, 3), dtype=np.uint16) + to_geotiff( + _wrap_3d(arr, crs=4326), str(target), + compression="deflate", tiled=True, tile_size=16, + ) + return target + + +def _build_float32_with_nodata(dir_path: Path, target: Path) -> Path: + """Float32 single-band fixture with a -9999.0 nodata sentinel.""" + del dir_path + rng = np.random.default_rng(seed=21321) + arr = (rng.standard_normal((32, 32)) * 100.0).astype(np.float32) + # Sprinkle nodata sentinels 
into a few pixels so masking has work to do. + arr[0, 0] = -9999.0 + arr[5, 7] = -9999.0 + arr[31, 31] = -9999.0 + to_geotiff( + _wrap_2d(arr, crs=4326, nodata=-9999.0), str(target), + compression="none", tiled=False, + ) + return target + + +def _build_int8_unmasked(dir_path: Path, target: Path) -> Path: + """Int8 single-band fixture with a -128 nodata sentinel. + + Read back with ``mask_nodata=False`` so the literal sentinel survives + in the int8 buffer (locks the #2092 / #2127 masked-flag contract). + """ + del dir_path + rng = np.random.default_rng(seed=21322) + arr = rng.integers(-100, 100, size=(32, 32), dtype=np.int8) + arr[0, 0] = -128 + arr[4, 4] = -128 + to_geotiff( + _wrap_2d(arr, crs=4326, nodata=-128), str(target), + compression="none", tiled=False, + ) + return target + + +def _build_cog(dir_path: Path, target: Path) -> Path: + """COG fixture: float32 tiled with one overview level.""" + del dir_path + rng = np.random.default_rng(seed=21323) + arr = (rng.standard_normal((64, 64)) * 100.0).astype(np.float32) + to_geotiff( + _wrap_2d(arr, crs=4326), str(target), + compression="deflate", cog=True, tile_size=16, + overview_levels=[2], + ) + return target + + +def _build_vrt_mosaic(dir_path: Path, target: Path) -> Path: + """VRT fixture: 2-tile mosaic of float32 stripes laid out side by side.""" + tile_h, tile_w = 16, 16 + tile_paths: list[str] = [] + for c in range(2): + arr = np.full((tile_h, tile_w), + float(c + 1), dtype=np.float32) + origin_x = float(c * tile_w) + da = xr.DataArray( + arr, dims=["y", "x"], + coords={ + "y": np.arange(tile_h - 1, -1, -1, dtype=np.float64), + "x": np.arange(origin_x, origin_x + tile_w, dtype=np.float64), + }, + attrs={"crs": 4326}, + ) + p = dir_path / f"{target.stem}_tile_{c}.tif" + to_geotiff(da, str(p), compression="none", tiled=False) + tile_paths.append(str(p)) + write_vrt(str(target), tile_paths, relative=False, crs=4326) + return target + + +def _build_miniswhite(dir_path: Path, target: Path) -> Path: + 
"""MinIsWhite uint8 fixture written via tifffile (photometric=0).""" + del dir_path + import tifffile # local import: only this builder needs tifffile + rng = np.random.default_rng(seed=21324) + arr = rng.integers(0, 256, size=(32, 32), dtype=np.uint8) + tifffile.imwrite( + str(target), arr, photometric="miniswhite", + compression="none", metadata=None, + ) + return target + + +_FIXTURES: list[_FixtureSpec] = [ + _FixtureSpec( + fix_id="int16-single-band", + dtype=np.dtype("int16"), + expected_dims=("y", "x"), + expected_crs_epsg=4326, + expected_nodata=None, + expected_masked=None, + source_type=_SRC_LOCAL_TIFF, + builder=_build_int16_single_band, + ), + _FixtureSpec( + fix_id="uint16-multiband-tiled", + dtype=np.dtype("uint16"), + expected_dims=("y", "x", "band"), + expected_crs_epsg=4326, + expected_nodata=None, + expected_masked=None, + source_type=_SRC_LOCAL_TIFF, + builder=_build_uint16_multiband_tiled, + ), + _FixtureSpec( + fix_id="float32-nodata", + dtype=np.dtype("float32"), + expected_dims=("y", "x"), + expected_crs_epsg=4326, + expected_nodata=-9999.0, + expected_masked=True, + source_type=_SRC_LOCAL_TIFF, + builder=_build_float32_with_nodata, + ), + _FixtureSpec( + fix_id="int8-unmasked", + dtype=np.dtype("int8"), + expected_dims=("y", "x"), + expected_crs_epsg=4326, + expected_nodata=-128, + expected_masked=False, + source_type=_SRC_LOCAL_TIFF, + builder=_build_int8_unmasked, + read_kwargs={"mask_nodata": False}, + ), + _FixtureSpec( + fix_id="cog-float32", + dtype=np.dtype("float32"), + expected_dims=("y", "x"), + expected_crs_epsg=4326, + expected_nodata=None, + expected_masked=None, + source_type=_SRC_LOCAL_TIFF, + builder=_build_cog, + ), + _FixtureSpec( + fix_id="vrt-mosaic", + dtype=np.dtype("float32"), + expected_dims=("y", "x"), + expected_crs_epsg=4326, + expected_nodata=None, + expected_masked=None, + source_type=_SRC_LOCAL_VRT, + builder=_build_vrt_mosaic, + ), + _FixtureSpec( + fix_id="miniswhite", + dtype=np.dtype("uint8"), + 
expected_dims=("y", "x"), + expected_crs_epsg=None, + expected_nodata=None, + expected_masked=None, + source_type=_SRC_LOCAL_TIFF, + builder=_build_miniswhite, + marks=(_skip_no_tifffile,), + ), +] + + +def _fixture_params() -> list: + """Build the pytest.param list for the fixture matrix.""" + return [pytest.param(spec, id=spec.fix_id, marks=spec.marks) + for spec in _FIXTURES] + + +@pytest.fixture(scope="session") +def _parity_matrix_dir(tmp_path_factory): + """Session-scoped scratch dir, one write per fixture id. + + Tests reuse files across cells. The matrix has up to 8 backends + x 7 fixtures; without caching every backend-row would rewrite the + fixture from scratch. + """ + return tmp_path_factory.mktemp("parity_matrix_2132") + + +@pytest.fixture +def parity_fixture(_parity_matrix_dir): + """Resolve a :class:`_FixtureSpec` to an on-disk path. + + Files are cached across the session: a fixture already present on + disk is returned without rewriting. + """ + dir_path = _parity_matrix_dir + + def _resolve(spec: _FixtureSpec) -> Path: + safe_id = spec.fix_id.replace("/", "-") + suffix = ".vrt" if spec.source_type == _SRC_LOCAL_VRT else ".tif" + path = dir_path / f"parity_2132_{safe_id}{suffix}" + if path.exists(): + return path + return spec.builder(dir_path, path) + return _resolve + + +# --------------------------------------------------------------------------- +# Transport adapters for the HTTP and fsspec backend rows +# --------------------------------------------------------------------------- + +class _MatrixRangeHandler(http.server.BaseHTTPRequestHandler): + """HTTP handler with Range support, serving a payload dict by path. + + The dict ``payload_by_path`` is set by the server fixture and maps + URL paths (``/parity_2132_int16-single-band.tif``) to bytes. 
+ """ + + payload_by_path: dict[str, bytes] = {} + + def do_GET(self): # noqa: N802 + payload = self.payload_by_path.get(self.path) + if payload is None: + self.send_response(404) + self.end_headers() + return + rng = self.headers.get("Range") + if rng and rng.startswith("bytes="): + spec = rng[len("bytes="):] + start_s, _, end_s = spec.partition("-") + start = int(start_s) + end = int(end_s) if end_s else len(payload) - 1 + chunk = payload[start:end + 1] + self.send_response(206) + self.send_header("Content-Type", "application/octet-stream") + self.send_header( + "Content-Range", + f"bytes {start}-{start + len(chunk) - 1}/{len(payload)}", + ) + self.send_header("Content-Length", str(len(chunk))) + self.end_headers() + self.wfile.write(chunk) + return + self.send_response(200) + self.send_header("Content-Type", "application/octet-stream") + self.send_header("Content-Length", str(len(payload))) + self.end_headers() + self.wfile.write(payload) + + def log_message(self, *_args, **_kwargs): # noqa: A003 + # Silence the default access log during tests. + pass + + +@pytest.fixture(scope="session") +def _matrix_http_server_session(): + """Shared loopback HTTP server for the http-cog backend row. + + Started once per pytest session and torn down on session exit. The + payload dict on the handler is cleared between tests by the + function-scoped ``_matrix_http_server`` wrapper below; this fixture + only owns the socket and the thread. 
+ """ + handler_cls = type( + "MatrixRangeHandler", (_MatrixRangeHandler,), + {"payload_by_path": dict(_MatrixRangeHandler.payload_by_path)}, + ) + httpd = socketserver.TCPServer(("127.0.0.1", 0), handler_cls) + port = httpd.server_address[1] + thread = threading.Thread(target=httpd.serve_forever, daemon=True) + thread.start() + try: + yield f"http://127.0.0.1:{port}", handler_cls + finally: + httpd.shutdown() + httpd.server_close() + + +@pytest.fixture +def _matrix_http_server(_matrix_http_server_session): + """Function-scoped HTTP server view: clears stale payloads before and after each test. + + Without this, the session-scoped ``payload_by_path`` dict accumulates + one entry per cell and never releases the bytes. Keeping it + function-scoped means a test only sees the URL paths it uploaded. + """ + base_url, handler_cls = _matrix_http_server_session + handler_cls.payload_by_path.clear() + try: + yield base_url, handler_cls + finally: + handler_cls.payload_by_path.clear() + + +def _deliver_via_http(spec: "_FixtureSpec | _ErrorFixtureSpec", on_disk: Path, + base_url: str, handler_cls, + monkeypatch) -> str: + """Upload an on-disk fixture into the shared HTTP server and return its URL. + + The success matrix passes a :class:`_FixtureSpec`; the error + sub-matrix passes an :class:`_ErrorFixtureSpec`. Both expose + ``fix_id`` so the function consumes either. + """ + del spec # the spec is unused; signature kept for symmetry with fsspec + monkeypatch.setenv("XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS", "1") + with open(on_disk, "rb") as f: + payload = f.read() + url_path = f"/{on_disk.name}" + handler_cls.payload_by_path[url_path] = payload + return f"{base_url}{url_path}" + + +def _deliver_via_fsspec(spec: "_FixtureSpec | _ErrorFixtureSpec", + on_disk: Path) -> str: + """Pipe an on-disk fixture into fsspec's memory:// filesystem. + + Returns the ``memory://`` URI the read path should consume.
The + memory filesystem persists for the pytest process, so the URI path + is namespaced by the fixture id to avoid collisions across cells. + """ + import fsspec + fs = fsspec.filesystem("memory") + safe_id = spec.fix_id.replace("/", "-") + uri_path = f"/parity_2132_{safe_id}.tif" + with open(on_disk, "rb") as f: + payload = f.read() + fs.pipe(uri_path, payload) + return f"memory://{uri_path}" + + +# --------------------------------------------------------------------------- +# Materialisation + comparison helpers +# --------------------------------------------------------------------------- + +def _materialise(da: xr.DataArray) -> np.ndarray: + """Return a numpy view of ``da.data`` regardless of backend.""" + raw = da.data + if hasattr(raw, "compute"): + raw = raw.compute() + if hasattr(raw, "get"): + raw = raw.get() + return np.asarray(raw) + + +def _coord_view(da: xr.DataArray, name: str) -> np.ndarray: + return np.asarray(da.coords[name].values) + + +def _assert_pixels_equal(ref: np.ndarray, actual: np.ndarray, *, label: str) -> None: + """Pixel equality, dtype-aware. + + Integer arrays must be byte-identical; float arrays compare NaN-aware + with ``equal_nan=True``. Diverging dtypes always fail -- a backend + that silently upcasts has a bug. 
+ """ + assert ref.dtype == actual.dtype, ( + f"{label}: dtype differs ref={ref.dtype} actual={actual.dtype}" + ) + assert ref.shape == actual.shape, ( + f"{label}: shape differs ref={ref.shape} actual={actual.shape}" + ) + if ref.dtype.kind == "f": + assert np.array_equal(ref, actual, equal_nan=True), ( + f"{label}: float pixels differ (NaN-aware)" + ) + else: + assert ref.tobytes() == actual.tobytes(), ( + f"{label}: integer pixel bytes differ" + ) + + +# --------------------------------------------------------------------------- +# The matrix cell +# --------------------------------------------------------------------------- + +def assert_parity( + da: xr.DataArray, + spec: _FixtureSpec, + *, + ref: xr.DataArray, + label: str, +) -> None: + """Assert every parity field for one (fixture, backend) cell. + + Run against an already-read DataArray rather than re-opening here so + the same helper applies to both ``open_geotiff(path, **kwargs)`` and + the explicit ``read_geotiff_dask`` / ``read_geotiff_gpu`` / + ``read_vrt`` entry points wired up in follow-up PRs. ``ref`` is the + eager-numpy read of the same fixture, used as the reference for the + pixel array, coord values, dims, and transform tuple. + + ``spec.dtype`` and ``spec.expected_crs_epsg`` / + ``spec.expected_nodata`` are asserted against the actual + independently of the reference, so a bug that silently changes + them in *every* backend still fails this cell. + """ + # Pixel array, dtype, shape. + actual_arr = _materialise(da) + _assert_pixels_equal( + _materialise(ref), actual_arr, label=label, + ) + + # Dtype against the spec, not just against the reference. Catches a + # silent upcast that the reference would also exhibit. + assert actual_arr.dtype == spec.dtype, ( + f"{label}: dtype {actual_arr.dtype} != spec dtype {spec.dtype}" + ) + + # Dims + order. 
+ assert da.dims == spec.expected_dims, ( + f"{label}: dims {da.dims!r} != expected {spec.expected_dims!r}" + ) + + # Coord values and coord dtype, per axis. Skip axes that the + # reference does not carry as a coord (e.g. ``band`` for some + # multiband layouts when the writer drops the index). + for axis in spec.expected_dims: + if axis not in ref.coords: + continue + ref_c = _coord_view(ref, axis) + actual_c = _coord_view(da, axis) + assert ref_c.dtype == actual_c.dtype, ( + f"{label}: coord {axis!r} dtype " + f"ref={ref_c.dtype} actual={actual_c.dtype}" + ) + assert ref_c.tobytes() == actual_c.tobytes(), ( + f"{label}: coord {axis!r} bytes differ" + ) + + # Transform tuple. The VRT path uses ``rasterio.Affine`` instances + # which compare equal to 6-tuples via ``__eq__``. + ref_t = ref.attrs.get("transform") + actual_t = da.attrs.get("transform") + assert ref_t == actual_t, ( + f"{label}: transform tuple differs ref={ref_t!r} actual={actual_t!r}" + ) + + # CRS: EPSG int + WKT string. + if spec.expected_crs_epsg is not None: + assert da.attrs.get("crs") == spec.expected_crs_epsg, ( + f"{label}: attrs['crs'] {da.attrs.get('crs')!r} != " + f"expected {spec.expected_crs_epsg!r}" + ) + ref_wkt = ref.attrs.get("crs_wkt") + actual_wkt = da.attrs.get("crs_wkt") + assert ref_wkt == actual_wkt, ( + f"{label}: crs_wkt differs ref={ref_wkt!r} actual={actual_wkt!r}" + ) + + # Nodata sentinel + masking state. + if spec.expected_nodata is None: + assert "nodata" not in da.attrs, ( + f"{label}: fixture declares no nodata but attrs['nodata']=" + f"{da.attrs.get('nodata')!r}" + ) + else: + assert da.attrs.get("nodata") == spec.expected_nodata, ( + f"{label}: attrs['nodata'] {da.attrs.get('nodata')!r} != " + f"expected {spec.expected_nodata!r}" + ) + + # Masking state: ``attrs['masked_nodata']`` reflects whether the + # reader replaced sentinel pixels with NaN (#2092 / #2127). The + # contract is fixed once a fixture declares a sentinel. 
+ if spec.expected_masked is not None: + actual_masked = da.attrs.get("masked_nodata") + assert actual_masked == spec.expected_masked, ( + f"{label}: attrs['masked_nodata'] {actual_masked!r} != " + f"expected {spec.expected_masked!r}" + ) + + # Selected canonical attrs: reference and actual agree on presence + # and value. The list is intentionally narrow until issue #1984's + # contract version stamp lands. + canonical_keys = ("raster_type", "transform", "crs", "crs_wkt") + for key in canonical_keys: + ref_v = ref.attrs.get(key) + actual_v = da.attrs.get(key) + assert ref_v == actual_v, ( + f"{label}: canonical attr {key!r} differs " + f"ref={ref_v!r} actual={actual_v!r}" + ) + + +# --------------------------------------------------------------------------- +# Source-delivery wrapper: hands one fixture to a specific source type +# --------------------------------------------------------------------------- + +def _resolve_source( + spec: _FixtureSpec, on_disk: Path, backend: _BackendSpec, + *, + http_state, monkeypatch, +) -> object: + """Return the value that should be passed as ``source`` to ``open_geotiff``. + + Most backends consume the on-disk path verbatim. The ``http-cog`` + and ``fsspec-memory`` backends override the source type, so the + fixture bytes are re-served through the requested transport. 
+ """ + target_type = backend.source_type_override or spec.source_type + if target_type == _SRC_LOCAL_TIFF or target_type == _SRC_LOCAL_VRT: + return str(on_disk) + if target_type == _SRC_HTTP: + base_url, handler_cls = http_state + return _deliver_via_http(spec, on_disk, base_url, handler_cls, monkeypatch) + if target_type == _SRC_FSSPEC: + return _deliver_via_fsspec(spec, on_disk) + raise AssertionError(f"unknown source type: {target_type}") + + +# --------------------------------------------------------------------------- +# The single matrix test entry point +# --------------------------------------------------------------------------- + +@pytest.mark.parametrize("spec", _fixture_params()) +@pytest.mark.parametrize("backend", _backend_params()) +def test_backend_parity_matrix( + parity_fixture, spec, backend, + _matrix_http_server, monkeypatch, +): + """One cell per (fixture, backend). Asserts every parity field. + + A new backend or fixture lights up automatically on the next pytest + run -- no per-cell test function needed. Incompatible (backend, + source) pairs skip cleanly rather than failing. + """ + if spec.source_type not in backend.compat: + pytest.skip( + f"backend={backend.backend_id} does not consume source_type=" + f"{spec.source_type} (fixture={spec.fix_id})" + ) + + path = parity_fixture(spec) + + # Eager-numpy reference: read the same on-disk fixture through the + # default backend so the matrix compares like-for-like. + ref = open_geotiff(str(path), **spec.read_kwargs) + + # Resolve the source the backend should actually consume (the + # on-disk path for local backends, an HTTP URL for the HTTP row, + # or a memory:// URI for the fsspec row). 
+ source = _resolve_source( + spec, path, backend, + http_state=_matrix_http_server, monkeypatch=monkeypatch, + ) + + da = open_geotiff(source, **backend.kwargs, **spec.read_kwargs) + label = ( + f"fixture={spec.fix_id} backend={backend.backend_id} " + f"kwargs={backend.kwargs}" + ) + assert_parity(da, spec, ref=ref, label=label) + + +# --------------------------------------------------------------------------- +# Error-fixture sub-matrix: rotated ModelTransformationTag without opt-in +# --------------------------------------------------------------------------- + +_ROTATED_M = ( + 8.660254037844387, -5.0, 0.0, 100.0, # x row (30 deg rotation, pix=10) + 5.0, 8.660254037844387, 0.0, 200.0, # y row + 0.0, 0.0, 1.0, 0.0, + 0.0, 0.0, 0.0, 1.0, +) + + +def _write_rotated_tiff(path: Path, arr: np.ndarray) -> None: + """Hand-build a TIFF with a rotated ``ModelTransformationTag``. + + Mirrors the minimal writer used by + ``test_allow_rotated_geotiff_2115.py`` so the matrix can assert + error behaviour without depending on rasterio / GDAL. + """ + import struct + h, w = arr.shape + arr = np.ascontiguousarray(arr.astype(" Path: + del dir_path + arr = np.arange(20, dtype=" Path: + safe_id = spec.fix_id.replace("/", "-") + path = dir_path / f"parity_2132_err_{safe_id}.tif" + if path.exists(): + return path + return spec.builder(dir_path, path) + return _resolve + + +@pytest.mark.parametrize("error_spec", _ERROR_FIXTURES, + ids=lambda s: s.fix_id) +@pytest.mark.parametrize("backend", _backend_params()) +def test_backend_parity_matrix_errors( + error_parity_fixture, error_spec, backend, + _matrix_http_server, monkeypatch, +): + """Error fixtures raise the same exception on every compatible backend. + + Backends incompatible with the error fixture's source type skip; + every remaining cell asserts the same ``pytest.raises`` contract. 
+ """ + if error_spec.source_type not in backend.compat: + pytest.skip( + f"backend={backend.backend_id} does not consume source_type=" + f"{error_spec.source_type}" + ) + + path = error_parity_fixture(error_spec) + + # Re-route the path through the requested transport (HTTP, fsspec) + # so the error surfaces on the same code path as the success + # matrix. + source = _resolve_source( + error_spec, path, backend, + http_state=_matrix_http_server, monkeypatch=monkeypatch, + ) + + with pytest.raises(error_spec.exc, match=error_spec.match): + # ``open_geotiff`` may return lazily for chunked reads, so + # force a materialisation inside the ``pytest.raises`` block + # so the error surfaces here regardless of laziness. + out = open_geotiff(source, **backend.kwargs) + _materialise(out) + + +# =========================================================================== +# Full-fixture parity gate over the golden corpus +# =========================================================================== +# +# Compares every read backend against the eager-numpy reference on every +# manifest fixture. Originally lived in +# ``test_backend_full_parity_2211.py``; merged here so a single file owns +# all matrix-style backend-parity assertions. + +_HAS_YAML = importlib.util.find_spec("yaml") is not None +_HAS_RASTERIO = importlib.util.find_spec("rasterio") is not None + +if _HAS_YAML and _HAS_RASTERIO: + from xrspatial.geotiff.tests.golden_corpus import generate as _fp_generate + from xrspatial.geotiff.tests.golden_corpus._marks import ( + fast_slow_marks_for as _fp_fast_slow_marks_for, + ) + + _FP_FIXTURES_DIR = ( + pathlib.Path(_fp_generate.__file__).resolve().parent / "fixtures" + ) + +# Chunk size for the dask rows. Most corpus fixtures are 64x64 or +# smaller, so 32 produces either a 2x2 chunk grid or a single chunk. +_FP_CHUNK_SIZE = 32 + + +_FP_GEOREF_KEYS: tuple[str, ...] = ( + "transform", + "crs", + "crs_wkt", +) + +_FP_NODATA_KEYS: tuple[str, ...] 
= ( + "nodata", + "masked_nodata", +) + +_FP_CANONICAL_METADATA_KEYS: tuple[str, ...] = ( + "raster_type", + "x_resolution", + "y_resolution", + "resolution_unit", + "georef_status", +) + + +# Fixtures the full parity matrix skips outright. Each entry cites the +# source of the divergence. +_FP_INTENTIONAL_SKIPS: dict[str, str] = { + "nodata_miniswhite_uint8": ( + "MinIsWhite photometric inversion: xrspatial inverts pixels per " + "issue 1797; rasterio leaves them raw. The matrix would compare " + "inverted-vs-raw and fail on every row. Covered by the dedicated " + "miniswhite parity case in this file." + ), + "compression_jpeg_uint8_ycbcr": ( + "JPEG-YCbCr is lossy and exposes a (bands, y, x) vs (y, x, band) " + "axis-order divergence that the golden-corpus oracle handles " + "via _normalise_axis_order but this gate's dims/coords check " + "cannot, because the dims tuple itself differs." + ), +} + + +_FP_BACKEND_SKIPS: dict[str, dict[str, str]] = { + "vrt_eager": { + "crs_citation_only": ( + "VRT round-trip mutates user-defined CRS WKT." + ), + "overview_external_ovr_uint16": ( + "External .ovr sidecar is not preserved through VRT wrap." + ), + "sparse_tiled_uint16": ( + "Sparse-tile holes are not preserved through VRT wrap." + ), + "extra_tags_uint16": ( + "VRT wrap does not propagate source TIFF resolution tags or " + "extra_tags." + ), + }, + "http_fsspec": { + "overview_external_ovr_uint16": ( + "External .ovr sidecar reader is not wired into the cloud " + "source path." + ), + }, +} + + +@dataclass(frozen=True) +class _FpBackend: + """One row of the full-parity matrix.""" + + backend_id: str + read: Callable[[pathlib.Path, str], xr.DataArray] + available: bool + unavailable_reason: str + skips: dict[str, str] = field(default_factory=dict) + + +# Experimental and internal-only codecs require an explicit opt-in on +# the read side. The full parity matrix is orthogonal to the opt-in +# contract, so pass both flags through every opener. 
+_FP_OPTIN = { + "allow_experimental_codecs": True, + "allow_internal_only_jpeg": True, +} + + +def _fp_read_eager_numpy(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: + return open_geotiff(str(path), **_FP_OPTIN) + + +def _fp_read_dask_numpy(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: + return open_geotiff(str(path), chunks=_FP_CHUNK_SIZE, **_FP_OPTIN) + + +def _fp_read_gpu(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: + return open_geotiff( + str(path), gpu=True, on_gpu_failure="strict", **_FP_OPTIN) + + +def _fp_read_dask_gpu(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: + return open_geotiff( + str(path), gpu=True, chunks=_FP_CHUNK_SIZE, + on_gpu_failure="strict", **_FP_OPTIN, + ) + + +def _fp_vrt_cache_dir(fixtures_dir: pathlib.Path) -> pathlib.Path: + """VRT scratch dir under the system temp dir, keyed by fixtures path digest.""" + import hashlib + import tempfile + base = pathlib.Path(tempfile.gettempdir()) / "xrspatial_parity_vrt_cache" + base.mkdir(parents=True, exist_ok=True) + digest = hashlib.sha1(str(fixtures_dir).encode()).hexdigest()[:12] + sub = base / f"fix_{digest}" + sub.mkdir(parents=True, exist_ok=True) + return sub + + +def _fp_read_vrt_eager(path: pathlib.Path, fixture_id: str) -> xr.DataArray: + """Wrap ``path`` in a one-source VRT and read it back via xrspatial.""" + import shutil + cache_dir = _fp_vrt_cache_dir(path.parent) + local_src = cache_dir / f"{fixture_id}.tif" + if not local_src.exists(): + shutil.copy2(path, local_src) + vrt_path = cache_dir / f"{fixture_id}.vrt" + if not vrt_path.exists(): + write_vrt(str(vrt_path), [str(local_src)]) + return open_geotiff(str(vrt_path), **_FP_OPTIN) + + +def _fp_read_http_fsspec(path: pathlib.Path, fixture_id: str) -> xr.DataArray: + """Serve the fixture bytes through fsspec's in-process memory FS.""" + import fsspec + fs = fsspec.filesystem("memory") + key = f"/corpus_full_parity/{fixture_id}.tif" + with open(path, "rb") as f: + fs.pipe(key, f.read()) + try: + da = 
open_geotiff(f"memory://{key}", **_FP_OPTIN) + finally: + try: + fs.rm(key) + except FileNotFoundError: + pass + return da + + +_FP_GPU_UNAVAILABLE_REASON = ( + "GPU backend skipped LOUDLY: cupy + CUDA are not available in this " + "environment. GPU and Dask+GPU rows must skip explicitly rather " + "than silently collect zero tests. To exercise these rows, install " + "cupy and ensure a CUDA device is reachable." +) + +_FP_DASK_UNAVAILABLE_REASON = ( + "dask backend skipped: dask is not installed." +) + +_FP_FSSPEC_UNAVAILABLE_REASON = ( + "http_fsspec backend skipped: fsspec is not installed." +) + + +_FP_BACKENDS: list[_FpBackend] = [ + _FpBackend( + backend_id="eager_numpy", + read=_fp_read_eager_numpy, + available=True, + unavailable_reason="", + ), + _FpBackend( + backend_id="dask_numpy", + read=_fp_read_dask_numpy, + available=_HAS_DASK, + unavailable_reason=_FP_DASK_UNAVAILABLE_REASON, + ), + _FpBackend( + backend_id="gpu", + read=_fp_read_gpu, + available=_HAS_GPU, + unavailable_reason=_FP_GPU_UNAVAILABLE_REASON, + ), + _FpBackend( + backend_id="dask_gpu", + read=_fp_read_dask_gpu, + available=_HAS_GPU and _HAS_DASK, + unavailable_reason=( + _FP_GPU_UNAVAILABLE_REASON if not _HAS_GPU + else _FP_DASK_UNAVAILABLE_REASON + ), + skips=dict(_FP_BACKEND_SKIPS.get("dask_gpu", {})), + ), + _FpBackend( + backend_id="vrt_eager", + read=_fp_read_vrt_eager, + available=_HAS_YAML and _HAS_RASTERIO, + unavailable_reason="yaml + rasterio required", + skips=dict(_FP_BACKEND_SKIPS["vrt_eager"]), + ), + _FpBackend( + backend_id="http_fsspec", + read=_fp_read_http_fsspec, + available=_HAS_FSSPEC, + unavailable_reason=_FP_FSSPEC_UNAVAILABLE_REASON, + skips=dict(_FP_BACKEND_SKIPS["http_fsspec"]), + ), +] + + +def _fp_resolved_fixtures() -> list[dict[str, Any]]: + """Return manifest entries with defaults merged, sorted by id.""" + if not (_HAS_YAML and _HAS_RASTERIO): + return [] + manifest = _fp_generate.load_manifest() + entries = _fp_generate.validate(manifest) + 
entries.sort(key=lambda e: e["id"]) + return entries + + +def _fp_fixture_path(entry: dict[str, Any]) -> pathlib.Path: + return _FP_FIXTURES_DIR / f"{entry['id']}.tif" + + +def _fp_is_lossy(entry: dict[str, Any]) -> bool: + tol = entry.get("tolerance") or {} + return bool(tol.get("lossy", False)) + + +_FP_FIXTURES = _fp_resolved_fixtures() + + +def _fp_build_fixture_params() -> list: + """One ``pytest.param`` per manifest entry, with slow/skip marks.""" + if not (_HAS_YAML and _HAS_RASTERIO): + return [pytest.param( + None, id="no-manifest", + marks=pytest.mark.skip(reason="yaml + rasterio required"), + )] + out = [] + for entry in _FP_FIXTURES: + fid = entry["id"] + marks = list(_fp_fast_slow_marks_for(entry)) + if fid in _FP_INTENTIONAL_SKIPS: + marks.append(pytest.mark.skip(reason=_FP_INTENTIONAL_SKIPS[fid])) + out.append(pytest.param(entry, id=fid, marks=marks)) + return out + + +def _fp_build_backend_params() -> list: + """One ``pytest.param`` per backend; unavailable rows skip.""" + out = [] + for backend in _FP_BACKENDS: + marks = [] + if not backend.available: + marks.append(pytest.mark.skip(reason=backend.unavailable_reason)) + out.append(pytest.param(backend, id=backend.backend_id, marks=marks)) + return out + + +_FP_FIXTURE_PARAMS = _fp_build_fixture_params() +_FP_BACKEND_PARAMS = _fp_build_backend_params() + + +def _fp_is_nan_sentinel(value: Any) -> bool: + if value is None: + return False + try: + return bool(np.isnan(float(value))) + except (TypeError, ValueError): + return False + + +def _fp_assert_pixels_close( + ref: np.ndarray, cand: np.ndarray, *, lossy: bool, label: str, +) -> None: + assert ref.shape == cand.shape, ( + f"{label}: shape mismatch ref={ref.shape} cand={cand.shape}" + ) + if lossy: + return + assert ref.dtype == cand.dtype, ( + f"{label}: dtype mismatch ref={ref.dtype} cand={cand.dtype}" + ) + if ref.dtype.kind == "f": + # Bit-exact today across decode paths. 
``rtol=1e-12`` tracks + # data magnitude so small-magnitude fixtures aren't held to a + # slacker bar. ``atol=0`` keeps zeros strict. + ok = np.allclose(ref, cand, rtol=1e-12, atol=0.0, equal_nan=True) + if not ok: + diff = np.abs(np.where( + np.isnan(ref) & np.isnan(cand), 0.0, ref - cand + )) + raise AssertionError( + f"{label}: pixel allclose failed; max abs diff=" + f"{np.nanmax(diff)!r}" + ) + else: + if not np.array_equal(ref, cand): + raise AssertionError( + f"{label}: integer pixels differ (bit-exact comparison " + f"failed) ref.dtype={ref.dtype}" + ) + + +def _fp_assert_dims_and_coords( + ref: xr.DataArray, cand: xr.DataArray, *, label: str, +) -> None: + assert ref.dims == cand.dims, ( + f"{label}: dims mismatch ref={ref.dims!r} cand={cand.dims!r}" + ) + for axis in ref.dims: + if axis not in ref.coords: + assert axis not in cand.coords, ( + f"{label}: candidate has coord {axis!r} that the " + f"reference does not" + ) + continue + assert axis in cand.coords, ( + f"{label}: candidate is missing coord {axis!r}" + ) + ref_c = np.asarray(ref.coords[axis].values) + cand_c = np.asarray(cand.coords[axis].values) + assert ref_c.dtype == cand_c.dtype, ( + f"{label}: coord {axis!r} dtype ref={ref_c.dtype} " + f"cand={cand_c.dtype}" + ) + if ref_c.dtype.kind == "f": + assert np.allclose(ref_c, cand_c, rtol=0.0, atol=1e-9), ( + f"{label}: coord {axis!r} values differ" + ) + else: + assert np.array_equal(ref_c, cand_c), ( + f"{label}: coord {axis!r} values differ" + ) + + +def _fp_assert_transform_attrs( + ref: xr.DataArray, cand: xr.DataArray, *, label: str, +) -> None: + ref_t = ref.attrs.get("transform") + cand_t = cand.attrs.get("transform") + if ref_t is None and cand_t is None: + return + assert ref_t is not None and cand_t is not None, ( + f"{label}: transform presence differs ref={ref_t!r} cand={cand_t!r}" + ) + ref_tup = tuple(float(v) for v in ref_t) + cand_tup = tuple(float(v) for v in cand_t) + assert len(ref_tup) == 6 and len(cand_tup) == 6, ( + 
f"{label}: transform must be a 6-tuple" + ) + for i, (a, b) in enumerate(zip(ref_tup, cand_tup)): + assert abs(a - b) <= 1e-9, ( + f"{label}: transform[{i}] differs ref={a!r} cand={b!r}" + ) + + +def _fp_assert_crs_attrs( + ref: xr.DataArray, cand: xr.DataArray, *, label: str, +) -> None: + for key in ("crs", "crs_wkt"): + ref_v = ref.attrs.get(key) + cand_v = cand.attrs.get(key) + assert ref_v == cand_v, ( + f"{label}: attr {key!r} differs ref={ref_v!r} cand={cand_v!r}" + ) + + +def _fp_assert_nodata_attrs( + ref: xr.DataArray, cand: xr.DataArray, *, label: str, +) -> None: + ref_nd = ref.attrs.get("nodata") + cand_nd = cand.attrs.get("nodata") + if ref_nd is None and cand_nd is None: + pass + else: + ref_is_nan = _fp_is_nan_sentinel(ref_nd) + cand_is_nan = _fp_is_nan_sentinel(cand_nd) + if not (ref_is_nan and cand_is_nan): + assert ref_nd == cand_nd, ( + f"{label}: nodata differs ref={ref_nd!r} cand={cand_nd!r}" + ) + ref_masked = ref.attrs.get("masked_nodata") + cand_masked = cand.attrs.get("masked_nodata") + assert ref_masked == cand_masked, ( + f"{label}: masked_nodata differs ref={ref_masked!r} " + f"cand={cand_masked!r}" + ) + ref_dtype = np.dtype(ref.dtype) + cand_dtype = np.dtype(cand.dtype) + assert ref_dtype == cand_dtype, ( + f"{label}: pixel dtype differs ref={ref_dtype} cand={cand_dtype}" + ) + + +def _fp_assert_canonical_metadata_attrs( + ref: xr.DataArray, cand: xr.DataArray, *, label: str, +) -> None: + for key in _FP_CANONICAL_METADATA_KEYS: + in_ref = key in ref.attrs + in_cand = key in cand.attrs + assert in_ref == in_cand, ( + f"{label}: canonical attr {key!r} presence differs " + f"ref={in_ref} cand={in_cand}" + ) + if in_ref: + ref_v = ref.attrs[key] + cand_v = cand.attrs[key] + assert ref_v == cand_v, ( + f"{label}: canonical attr {key!r} value differs " + f"ref={ref_v!r} cand={cand_v!r}" + ) + + +@pytest.fixture(scope="module") +def _fp_reference_cache() -> dict[str, xr.DataArray]: + """Cache eager-numpy reads keyed by fixture id.""" + 
return {} + + +def _fp_reference_for( + entry: dict[str, Any], cache: dict[str, xr.DataArray], +) -> xr.DataArray: + fid = entry["id"] + if fid not in cache: + cache[fid] = open_geotiff(str(_fp_fixture_path(entry)), **_FP_OPTIN) + return cache[fid] + + +@pytest.mark.parametrize("backend", _FP_BACKEND_PARAMS) +@pytest.mark.parametrize("manifest_entry", _FP_FIXTURE_PARAMS) +def test_backend_full_parity( + manifest_entry, + backend, + _fp_reference_cache, +): + """Full-corpus contract gate for every (backend, fixture) cell. + + 1. Look up (or read) the eager-numpy reference for the fixture. + 2. Read the same fixture through ``backend.read``. + 3. Assert pixels, dims + coords, transform/georef, CRS, nodata, + and the curated canonical metadata attrs. + """ + if manifest_entry is None: + pytest.skip("yaml + rasterio required for the manifest fixture set") + + fixture_id = manifest_entry["id"] + path = _fp_fixture_path(manifest_entry) + if not path.exists(): + pytest.skip( + f"fixture {fixture_id!r} has no .tif on disk; run " + f"`python -m xrspatial.geotiff.tests.golden_corpus.generate`" + ) + + if fixture_id in backend.skips: + pytest.skip( + f"backend={backend.backend_id} cannot read fixture=" + f"{fixture_id}: {backend.skips[fixture_id]}" + ) + + reference = _fp_reference_for(manifest_entry, _fp_reference_cache) + try: + candidate = backend.read(path, fixture_id) + except Exception as exc: + raise AssertionError( + f"backend={backend.backend_id} failed to read fixture=" + f"{fixture_id}: {type(exc).__name__}: {exc}" + ) from exc + + label = f"fixture={fixture_id} backend={backend.backend_id}" + + ref_px = _materialise(reference) + cand_px = _materialise(candidate) + _fp_assert_pixels_close( + ref_px, cand_px, lossy=_fp_is_lossy(manifest_entry), label=label, + ) + _fp_assert_dims_and_coords(reference, candidate, label=label) + _fp_assert_transform_attrs(reference, candidate, label=label) + _fp_assert_crs_attrs(reference, candidate, label=label) + 
_fp_assert_nodata_attrs(reference, candidate, label=label) + _fp_assert_canonical_metadata_attrs(reference, candidate, label=label) + + +def test_taxonomy_ids_are_in_manifest(): + """Every fixture id in a skip table must exist in the manifest.""" + if not (_HAS_YAML and _HAS_RASTERIO): + pytest.skip("yaml + rasterio required") + manifest_ids = {e["id"] for e in _FP_FIXTURES} + referenced: set[str] = set(_FP_INTENTIONAL_SKIPS) + for backend in _FP_BACKENDS: + referenced.update(backend.skips) + stale = referenced - manifest_ids + assert not stale, ( + f"skip tables reference unknown fixture ids: {sorted(stale)}" + ) + + +def test_gpu_skip_reason_is_loud(): + """GPU + Dask+GPU skips must be explicit, not silent.""" + for backend_id in ("gpu", "dask_gpu"): + backend = next(b for b in _FP_BACKENDS if b.backend_id == backend_id) + if backend.available: + continue + reason = backend.unavailable_reason + assert "skipped LOUDLY" in reason or "skipped" in reason, ( + f"{backend_id} unavailable_reason is not explicit enough: " + f"{reason!r}" + ) + + +def _fp_first_eligible_fixture() -> dict[str, Any] | None: + """Pick a fast, on-disk fixture that none of the skip tables flag.""" + if not (_HAS_YAML and _HAS_RASTERIO): + return None + for entry in _FP_FIXTURES: + if entry["id"] in _FP_INTENTIONAL_SKIPS: + continue + if not _fp_fixture_path(entry).exists(): + continue + if "fast" in (entry.get("tags") or []): + return entry + for entry in _FP_FIXTURES: + if (entry["id"] not in _FP_INTENTIONAL_SKIPS + and _fp_fixture_path(entry).exists()): + return entry + return None + + +@pytest.mark.skipif(not _HAS_GPU, reason=_FP_GPU_UNAVAILABLE_REASON) +def test_gpu_backend_returns_cupy_array(): + """Sanity check: the gpu row returns a cupy-backed DataArray.""" + import cupy + entry = _fp_first_eligible_fixture() + if entry is None: + pytest.skip("no eligible fixture on disk") + da = _fp_read_gpu(_fp_fixture_path(entry), entry["id"]) + assert isinstance(da.data, cupy.ndarray), ( + f"gpu 
backend on fixture {entry['id']!r} returned " + f"{type(da.data).__name__}, expected cupy.ndarray" + ) + + +@pytest.mark.skipif(not _HAS_DASK, reason=_FP_DASK_UNAVAILABLE_REASON) +def test_dask_backend_returns_dask_array(): + """Sanity check: the dask_numpy row returns a dask-backed DataArray.""" + entry = _fp_first_eligible_fixture() + if entry is None: + pytest.skip("no eligible fixture on disk") + da = _fp_read_dask_numpy(_fp_fixture_path(entry), entry["id"]) + assert hasattr(da.data, "dask"), ( + f"dask_numpy backend on fixture {entry['id']!r} returned " + f"data of type {type(da.data).__name__}, expected a " + "dask-backed array." + ) + + +@pytest.mark.skipif( + not (_HAS_GPU and _HAS_DASK), + reason=( + f"{_FP_GPU_UNAVAILABLE_REASON} (or dask missing: " + f"{_FP_DASK_UNAVAILABLE_REASON})" + ), +) +def test_dask_gpu_backend_returns_dask_of_cupy(): + """Sanity check: the dask_gpu row returns a dask-graph-of-cupy DataArray.""" + import cupy + entry = _fp_first_eligible_fixture() + if entry is None: + pytest.skip("no eligible fixture on disk") + da = _fp_read_dask_gpu(_fp_fixture_path(entry), entry["id"]) + assert hasattr(da.data, "dask"), ( + f"dask_gpu backend on fixture {entry['id']!r} dropped the " + f"dask wrapping: data is {type(da.data).__name__}" + ) + meta = getattr(da.data, "_meta", None) + assert isinstance(meta, cupy.ndarray), ( + f"dask_gpu backend on fixture {entry['id']!r} carries a " + f"non-cupy chunk prototype: {type(meta).__name__}" + ) + + +# =========================================================================== +# Attrs-parity across backends (canonical attrs + key sets) +# =========================================================================== +# +# Two layers: +# +# 1. The canonical-attrs parity gate: every backend (eager, dask, GPU, +# dask-GPU, VRT) stamps the same canonical attrs for the same +# fixture, modulo documented backend-specific keys. +# 2. 
The pass-through TIFF tag parity: x_resolution, y_resolution, +# resolution_unit, image_description, extra_samples now agree +# across numpy / dask / cupy / dask+cupy on a TIFF that has those +# tags set. + +# Canonical fixture geometry shared by the attrs writers. +_AP_ORIGIN_X = -100.0 +_AP_ORIGIN_Y = 40.0 +_AP_PIXEL = 0.001 +_AP_CRS_EPSG = 4326 +_AP_HEIGHT = 32 +_AP_WIDTH = 32 + + +# Keys excluded from the cross-backend attrs comparison because they +# are documented as backend-specific: +# +# * ``vrt_holes`` is VRT-only. +# * ``nodata_pixels_present`` rides on eager + VRT paths but stays +# absent on dask paths (lazy would have to force compute). +# * TIFF tag pass-through attrs are VRT-omitted (the VRT carries no +# TIFF tags of its own). They are pinned for non-VRT backends in +# ``test_pass_through_tags_match_across_backends``. +_AP_BACKEND_SPECIFIC_KEYS = frozenset({ + 'vrt_holes', + 'nodata_pixels_present', + 'extra_tags', + 'image_description', + 'extra_samples', + 'gdal_metadata', + 'gdal_metadata_xml', + 'x_resolution', + 'y_resolution', + 'resolution_unit', + 'colormap', +}) + + +_AP_TRANSFORM_RTOL = 1e-9 +_AP_TRANSFORM_ATOL = 1e-9 + + +def _ap_coord_array(arr: np.ndarray) -> xr.DataArray: + """Wrap a 2-D ``arr`` with axis-aligned x/y coords and EPSG CRS.""" + assert arr.ndim == 2, "test fixtures only use 2-D arrays" + h, w = arr.shape + assert (h, w) == (_AP_HEIGHT, _AP_WIDTH), ( + "fixture geometry constants are out of sync with array shape" + ) + y = np.linspace(_AP_ORIGIN_Y, _AP_ORIGIN_Y - _AP_PIXEL * (h - 1), h) + x = np.linspace(_AP_ORIGIN_X, _AP_ORIGIN_X + _AP_PIXEL * (w - 1), w) + da = xr.DataArray(arr, dims=['y', 'x'], coords={'y': y, 'x': x}) + da.attrs['crs'] = _AP_CRS_EPSG + return da + + +def _ap_attrs_for_parity(attrs) -> dict: + """Drop backend-specific keys before comparing attrs across paths.""" + return {k: v for k, v in dict(attrs).items() + if k not in _AP_BACKEND_SPECIFIC_KEYS} + + +def _ap_attrs_close(a: dict, b: dict) -> bool: + 
"""Compare attrs dicts, allowing tiny numeric drift in ``transform``.""" + if set(a.keys()) != set(b.keys()): + return False + for k, va in a.items(): + vb = b[k] + if k == 'transform' and isinstance(va, tuple) and isinstance(vb, tuple): + if len(va) != len(vb): + return False + for x, y in zip(va, vb): + if not np.isclose( + float(x), float(y), + rtol=_AP_TRANSFORM_RTOL, atol=_AP_TRANSFORM_ATOL, + ): + return False + else: + if va != vb: + return False + return True + + +@dataclass(frozen=True) +class _ApFixture: + """One row in the attrs-parity fixture set.""" + name: str + writer: Callable[[str], '_ApFixtureMeta'] + vrt_compatible: bool = True + + +@dataclass(frozen=True) +class _ApFixtureMeta: + """Layout facts the VRT helper needs to wrap the on-disk TIFF.""" + vrt_dtype: str + nodata: Any = None + + +def _ap_write_plain_float(path) -> _ApFixtureMeta: + arr = np.random.default_rng(seed=2227).random( + (_AP_HEIGHT, _AP_WIDTH)).astype(np.float32) + to_geotiff(_ap_coord_array(arr), path) + return _ApFixtureMeta(vrt_dtype='Float32') + + +def _ap_write_float_with_nodata(path) -> _ApFixtureMeta: + rng = np.random.default_rng(seed=2227) + arr = rng.random((_AP_HEIGHT, _AP_WIDTH)).astype(np.float32) + arr[0:4, 0:4] = -9999.0 + da = _ap_coord_array(arr) + da.attrs['nodata'] = -9999.0 + to_geotiff(da, path) + return _ApFixtureMeta(vrt_dtype='Float32', nodata=-9999.0) + + +def _ap_write_int_with_nodata(path) -> _ApFixtureMeta: + rng = np.random.default_rng(seed=2227) + arr = rng.integers(0, 1000, size=(_AP_HEIGHT, _AP_WIDTH), dtype=np.uint16) + arr[0:4, 0:4] = 65535 + da = _ap_coord_array(arr) + da.attrs['nodata'] = 65535 + to_geotiff(da, path) + return _ApFixtureMeta(vrt_dtype='UInt16', nodata=65535) + + +def _ap_write_uint8_no_nodata(path) -> _ApFixtureMeta: + rng = np.random.default_rng(seed=2227) + arr = rng.integers(0, 256, size=(_AP_HEIGHT, _AP_WIDTH), dtype=np.uint8) + to_geotiff(_ap_coord_array(arr), path) + return _ApFixtureMeta(vrt_dtype='Byte') + + 
+_AP_FIXTURES = ( + _ApFixture('plain_float', _ap_write_plain_float), + _ApFixture('float_with_nodata', _ap_write_float_with_nodata), + _ApFixture('int_with_nodata', _ap_write_int_with_nodata), + _ApFixture('uint8_no_nodata', _ap_write_uint8_no_nodata), +) + + +@dataclass(frozen=True) +class _ApBackend: + name: str + open_fn: Callable + available: bool = True + + +def _ap_open_eager(path): + return open_geotiff(path) + + +def _ap_open_dask(path): + return open_geotiff(path, chunks=16) + + +def _ap_open_gpu(path): + return open_geotiff(path, gpu=True) + + +def _ap_open_dask_gpu(path): + return open_geotiff(path, gpu=True, chunks=16) + + +def _ap_open_vrt(path, meta): + """Wrap the TIFF in a single-source VRT and read via ``read_vrt``. + + GDAL GeoTransform XML expects the upper-left CORNER as origin while + ``_ap_coord_array`` uses center-based coords, so the corner is + shifted by half a pixel here. + """ + import os + from pyproj import CRS + + height = _AP_HEIGHT + width = _AP_WIDTH + corner_x = _AP_ORIGIN_X - _AP_PIXEL / 2.0 + corner_y = _AP_ORIGIN_Y + _AP_PIXEL / 2.0 + geo_transform = ( + f"{corner_x:.6f}, {_AP_PIXEL:.6f}, 0.0, " + f"{corner_y:.6f}, 0.0, {-_AP_PIXEL:.6f}" + ) + crs_wkt = CRS.from_epsg(_AP_CRS_EPSG).to_wkt() + nodata_xml = (f"<NoDataValue>{meta.nodata}</NoDataValue>" + if meta.nodata is not None else '') + vrt_path = path + '.vrt' + abs_src = os.path.abspath(path) + xml = ( + f'<VRTDataset rasterXSize="{width}" rasterYSize="{height}">' + f' <SRS>{crs_wkt}</SRS>' + f' <GeoTransform>{geo_transform}</GeoTransform>' + f' <VRTRasterBand dataType="{meta.vrt_dtype}" band="1">' + f' {nodata_xml}' + f' <SimpleSource>' + f' <SourceFilename relativeToVRT="0">{abs_src}</SourceFilename>' + f' <SourceBand>1</SourceBand>' + f' <SrcRect xOff="0" yOff="0" xSize="{width}" ySize="{height}"/>' + f' <DstRect xOff="0" yOff="0" xSize="{width}" ySize="{height}"/>' + f' </SimpleSource>' + f' </VRTRasterBand>' + f'</VRTDataset>' + ) + with open(vrt_path, 'w') as f: + f.write(xml) + return read_vrt(vrt_path) + + +_AP_BACKENDS = ( + _ApBackend('eager_numpy', lambda path, meta: _ap_open_eager(path)), + _ApBackend('dask_numpy', lambda path, meta: _ap_open_dask(path)), + _ApBackend('gpu', lambda path, meta: _ap_open_gpu(path), available=_HAS_GPU), + _ApBackend('dask_gpu', lambda path, meta: _ap_open_dask_gpu(path), + available=_HAS_GPU), + _ApBackend('vrt', _ap_open_vrt), +) + + 
+_AP_AVAILABLE_BACKENDS = tuple(b for b in _AP_BACKENDS if b.available) + + +@pytest.mark.parametrize('fixture', _AP_FIXTURES, ids=lambda f: f.name) +def test_canonical_attrs_match_across_backends(tmp_path, fixture): + """Every backend stamps the same canonical attrs for the same fixture. + + The eager numpy path is the canonical reference. Documented + backend-specific keys (see ``_AP_BACKEND_SPECIFIC_KEYS``) are + carved out before the comparison. + """ + path = str(tmp_path / f'attrs_parity_{fixture.name}.tif') + meta = fixture.writer(path) + + baseline = _ap_attrs_for_parity(open_geotiff(path).attrs) + + divergences = {} + for backend in _AP_AVAILABLE_BACKENDS: + if backend.name == 'vrt' and not fixture.vrt_compatible: + continue + if backend.name == 'eager_numpy': + continue + try: + da = backend.open_fn(path, meta) + except Exception as exc: # pragma: no cover + divergences[backend.name] = f"open failed: {exc!r}" + continue + candidate = _ap_attrs_for_parity(da.attrs) + if not _ap_attrs_close(candidate, baseline): + only_in_baseline = { + k: baseline.get(k) for k in baseline + if baseline.get(k) != candidate.get(k) + } + only_in_candidate = { + k: candidate.get(k) for k in candidate + if candidate.get(k) != baseline.get(k) + } + divergences[backend.name] = { + 'baseline_diff': only_in_baseline, + 'candidate_diff': only_in_candidate, + } + + assert not divergences, ( + f"attrs diverged from eager-numpy baseline for fixture " + f"{fixture.name!r}:\n baseline: {baseline}\n diffs: {divergences}" + ) + + +@pytest.mark.parametrize('fixture', _AP_FIXTURES, ids=lambda f: f.name) +def test_canonical_attrs_keys_match_across_backends(tmp_path, fixture): + """Stronger contract: the set of canonical attr keys is identical.""" + path = str(tmp_path / f'attrs_parity_keys_{fixture.name}.tif') + meta = fixture.writer(path) + + baseline_keys = set(_ap_attrs_for_parity(open_geotiff(path).attrs).keys()) + + diffs = {} + for backend in _AP_AVAILABLE_BACKENDS: + if backend.name == 
'vrt' and not fixture.vrt_compatible: + continue + if backend.name == 'eager_numpy': + continue + try: + da = backend.open_fn(path, meta) + except Exception as exc: # pragma: no cover + diffs[backend.name] = f"open failed: {exc!r}" + continue + keys = set(_ap_attrs_for_parity(da.attrs).keys()) + if keys != baseline_keys: + diffs[backend.name] = { + 'missing': sorted(baseline_keys - keys), + 'extra': sorted(keys - baseline_keys), + } + + assert not diffs, ( + f"canonical attrs keyset diverged from eager-numpy baseline for " + f"fixture {fixture.name!r}:\n baseline keys: {sorted(baseline_keys)}\n" + f" diffs: {diffs}" + ) + + +# Pass-through TIFF tag parity across the four core backends. +# +# Before the fix, the dask and cupy paths emitted a narrower attrs set +# than the eager numpy path. The fix factored a single helper that +# every backend calls; this section pins that contract. + +_AP_PASS_THROUGH_KEYS = ( + 'x_resolution', + 'y_resolution', + 'resolution_unit', + 'image_description', + 'extra_samples', +) + + +def _ap_write_tiff_with_pass_through_tags(path): + """Write a tiled 2-band float32 TIFF with the pass-through TIFF tags. + + Uses tifffile's first-class ``resolution`` / ``resolutionunit`` / + ``description`` kwargs. ``metadata=None`` suppresses tifffile's + auto-generated shape JSON in ImageDescription so the fixture + description survives. 
+ """ + tifffile = pytest.importorskip("tifffile") + arr = np.random.default_rng(seed=1548).random( + (64, 64, 2)).astype(np.float32) + tifffile.imwrite( + path, arr, photometric='minisblack', planarconfig='contig', + tile=(32, 32), compression='deflate', + resolution=(300, 300), resolutionunit=2, + description='attrs parity fixture', + metadata=None, + ) + return arr + + +def _ap_attrs_subset(attrs, keys): + return {k: attrs.get(k) for k in keys} + + +def test_pass_through_tags_eager_numpy_baseline(tmp_path): + """Eager numpy is the canonical reference for the pass-through keys.""" + pytest.importorskip("tifffile") + path = str(tmp_path / 'pass_through_baseline.tif') + _ap_write_tiff_with_pass_through_tags(path) + + da = open_geotiff(path) + for key in _AP_PASS_THROUGH_KEYS: + assert key in da.attrs, ( + f"eager numpy is the canonical reference and should always " + f"emit '{key}'; got attrs={sorted(da.attrs.keys())}" + ) + + +def test_pass_through_tags_dask_matches_numpy(tmp_path): + """The dask read path emits the same pass-through attrs as numpy.""" + pytest.importorskip("tifffile") + path = str(tmp_path / 'pass_through_dask.tif') + _ap_write_tiff_with_pass_through_tags(path) + + np_da = open_geotiff(path) + dk_da = open_geotiff(path, chunks=32) + + np_subset = _ap_attrs_subset(np_da.attrs, _AP_PASS_THROUGH_KEYS) + dk_subset = _ap_attrs_subset(dk_da.attrs, _AP_PASS_THROUGH_KEYS) + + assert dk_subset == np_subset, ( + f"dask attrs diverge from numpy:\n" + f" numpy: {np_subset}\n" + f" dask : {dk_subset}" + ) + + +@_skip_no_gpu +def test_pass_through_tags_cupy_matches_numpy(tmp_path): + """Cupy / GPU read emits the same pass-through attrs as numpy.""" + pytest.importorskip("tifffile") + path = str(tmp_path / 'pass_through_cupy.tif') + _ap_write_tiff_with_pass_through_tags(path) + + np_da = open_geotiff(path) + gpu_da = open_geotiff(path, gpu=True) + + np_subset = _ap_attrs_subset(np_da.attrs, _AP_PASS_THROUGH_KEYS) + gpu_subset = _ap_attrs_subset(gpu_da.attrs, 
_AP_PASS_THROUGH_KEYS) + + assert gpu_subset == np_subset, ( + f"cupy attrs diverge from numpy:\n" + f" numpy: {np_subset}\n" + f" cupy : {gpu_subset}" + ) + + +@_skip_no_gpu +def test_pass_through_tags_dask_cupy_matches_numpy(tmp_path): + """Combined dask+cupy read still emits the pass-through attrs.""" + pytest.importorskip("tifffile") + path = str(tmp_path / 'pass_through_dask_cupy.tif') + _ap_write_tiff_with_pass_through_tags(path) + + np_da = open_geotiff(path) + combined = open_geotiff(path, gpu=True, chunks=32) + + np_subset = _ap_attrs_subset(np_da.attrs, _AP_PASS_THROUGH_KEYS) + combined_subset = _ap_attrs_subset(combined.attrs, _AP_PASS_THROUGH_KEYS) + + assert combined_subset == np_subset, ( + f"dask+cupy attrs diverge from numpy:\n" + f" numpy : {np_subset}\n" + f" dask+cupy : {combined_subset}" + ) + + +def test_pass_through_tags_all_backend_keysets_equal(tmp_path): + """The full set of attrs keys is identical across available backends. + + Guards against a future read path silently dropping a different + attr that no per-key test happens to cover. 
+ """ + pytest.importorskip("tifffile") + path = str(tmp_path / 'pass_through_keysets.tif') + _ap_write_tiff_with_pass_through_tags(path) + + np_keys = set(open_geotiff(path).attrs.keys()) + dk_keys = set(open_geotiff(path, chunks=32).attrs.keys()) + + backend_keys = {'numpy': np_keys, 'dask+numpy': dk_keys} + if _HAS_GPU: + backend_keys['cupy'] = set(open_geotiff(path, gpu=True).attrs.keys()) + backend_keys['dask+cupy'] = set( + open_geotiff(path, gpu=True, chunks=32).attrs.keys()) + + differences = { + name: keys ^ np_keys + for name, keys in backend_keys.items() + if keys != np_keys + } + assert not differences, ( + f"backend attrs keysets diverge from numpy:\n" + f" numpy keys: {sorted(np_keys)}\n" + f" diffs : " + + "\n ".join( + f"{name}: symmetric_diff={sorted(diff)}" + for name, diff in differences.items() + ) + ) diff --git a/xrspatial/geotiff/tests/test_backend_pixel_parity_matrix_1813.py b/xrspatial/geotiff/tests/parity/test_pixel_equality.py similarity index 57% rename from xrspatial/geotiff/tests/test_backend_pixel_parity_matrix_1813.py rename to xrspatial/geotiff/tests/parity/test_pixel_equality.py index 4069894c4..d51374284 100644 --- a/xrspatial/geotiff/tests/test_backend_pixel_parity_matrix_1813.py +++ b/xrspatial/geotiff/tests/parity/test_pixel_equality.py @@ -1,20 +1,29 @@ -"""End-to-end pixel-byte-parity matrix across read backends and entry points. - -Locks in the no-regression contract for issue #1813's multi-PR refactor of -``xrspatial/geotiff/__init__.py``. Every read entry point (``open_geotiff``, -``read_geotiff_dask``, ``read_geotiff_gpu``, ``read_vrt``) must produce -byte-identical pixels, bitwise-equal coords, and matching ``attrs`` across -the four backends (numpy, dask+numpy, cupy, dask+cupy) for a representative -matrix of dtypes, compressions, and layouts (stripped, tiled, COG, -BigTIFF, MinIsWhite, VRT). 
- -When a subsequent PR in #1813 moves an entry-point body to a new module, -this matrix is the first thing that breaks if the move drops a kwarg, -inverts a photometric, or reorders an attrs-population step. +"""Strict pixel-equality and kwarg-threading parity across read backends. + +The strictest mode of backend parity. Four concerns share the file +because they fail in the same ways: + +* Pixel-byte parity across (numpy / dask+numpy / cupy / dask+cupy) on a + representative dtype + compression + layout matrix, plus VRT, COG, + BigTIFF, and MinIsWhite fixtures. +* Cross-entry-point parity: ``read_geotiff_dask``, ``read_geotiff_gpu``, + and ``read_vrt`` agree with ``open_geotiff`` for the same source. +* Kwarg threading through the dispatcher: ``open_geotiff`` and + ``to_geotiff`` forward window / band / max_pixels / tiled / etc. to + the backend-specific entry points instead of silently dropping them. +* MinIsWhite photometric handling: local / dask / HTTP / GPU read + paths agree on the inverted pixel domain. + +If a refactor moves an entry-point body to a new module, this file is +the first thing that breaks when the move drops a kwarg, inverts a +photometric, or reorders an attrs-population step. 
""" from __future__ import annotations +import http.server import importlib.util +import socketserver +import threading from pathlib import Path import numpy as np @@ -24,22 +33,13 @@ from xrspatial.geotiff import (open_geotiff, read_geotiff_dask, read_geotiff_gpu, read_vrt, to_geotiff, write_vrt) +from .._helpers.markers import gpu_available + # --------------------------------------------------------------------------- # Environment gating # --------------------------------------------------------------------------- - -def _gpu_available() -> bool: - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() +_HAS_GPU = gpu_available() _HAS_TIFFFILE = importlib.util.find_spec("tifffile") is not None _skip_no_gpu = pytest.mark.skipif(not _HAS_GPU, reason="cupy + CUDA required") @@ -396,3 +396,262 @@ def test_fixture_builders_produce_readable_files(fixture_factory, fix_id): da = open_geotiff(str(path)) assert da.ndim in (2, 3) assert da.size > 0 + + +# =========================================================================== +# Kwarg threading through the dispatcher (originally test_backend_kwarg_parity) +# =========================================================================== +# +# ``open_geotiff`` and ``to_geotiff`` route to backend-specific entry +# points (``read_geotiff_dask``, ``write_geotiff_gpu``) whose kwarg sets +# were narrower than the dispatcher's. The dispatcher silently dropped +# the missing kwargs when it routed to the smaller-API backend; the +# fix pins them through. These tests gate that contract. 
+ + +@pytest.fixture +def small_tiff_path(tmp_path): + """4x6 single-band tiled tiff with a small CRS+transform.""" + arr = np.arange(24, dtype=np.float32).reshape(4, 6) + da = xr.DataArray( + arr, + dims=['y', 'x'], + coords={ + 'y': np.array([0.5, 1.5, 2.5, 3.5]), + 'x': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), + }, + attrs={'crs': 4326}, + ) + p = tmp_path / 'kwarg_parity_small.tif' + to_geotiff(da, str(p), tile_size=16) + return str(p), arr + + +@pytest.fixture +def small_multiband_tiff_path(tmp_path): + """4x6 three-band tiled tiff.""" + arr = np.arange(72, dtype=np.float32).reshape(4, 6, 3) + da = xr.DataArray( + arr, + dims=['y', 'x', 'band'], + coords={ + 'y': np.array([0.5, 1.5, 2.5, 3.5]), + 'x': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), + 'band': [0, 1, 2], + }, + attrs={'crs': 4326}, + ) + p = tmp_path / 'kwarg_parity_mb.tif' + to_geotiff(da, str(p), tile_size=16) + return str(p), arr + + +def test_read_geotiff_dask_window_clips_region(small_tiff_path): + """``window=`` restricts the lazy region; chunks span only the window.""" + path, arr = small_tiff_path + da = read_geotiff_dask(path, chunks=2, window=(1, 2, 4, 6)) + assert da.shape == (3, 4) + np.testing.assert_array_equal(da.values, arr[1:4, 2:6]) + + +def test_read_geotiff_dask_window_via_dispatcher(small_tiff_path): + """``open_geotiff(window=..., chunks=...)`` keeps the window.""" + path, arr = small_tiff_path + da = open_geotiff(path, window=(0, 1, 3, 4), chunks=2) + assert da.shape == (3, 3) + np.testing.assert_array_equal(da.values, arr[0:3, 1:4]) + + +def test_read_geotiff_dask_band_selects_single_band(small_multiband_tiff_path): + """``band=`` produces a 2D DataArray with the selected band.""" + path, arr = small_multiband_tiff_path + da = read_geotiff_dask(path, chunks=4, band=1) + assert da.ndim == 2 + np.testing.assert_array_equal(da.values, arr[:, :, 1]) + + +def test_read_geotiff_dask_band_via_dispatcher(small_multiband_tiff_path): + """``open_geotiff(band=..., chunks=...)`` keeps the 
band.""" + path, arr = small_multiband_tiff_path + da = open_geotiff(path, band=2, chunks=4) + assert da.ndim == 2 + np.testing.assert_array_equal(da.values, arr[:, :, 2]) + + +def test_read_geotiff_dask_max_pixels_rejects_oversized(small_tiff_path): + """``max_pixels=`` rejects the windowed region up front.""" + path, _ = small_tiff_path + with pytest.raises(ValueError, match="exceeds max_pixels"): + read_geotiff_dask(path, chunks=2, max_pixels=10) + + +def test_read_geotiff_dask_window_band_combined(small_multiband_tiff_path): + """``window`` and ``band`` cooperate.""" + path, arr = small_multiband_tiff_path + da = read_geotiff_dask(path, chunks=2, window=(1, 1, 4, 5), band=0) + assert da.shape == (3, 4) + np.testing.assert_array_equal(da.values, arr[1:4, 1:5, 0]) + + +def test_read_geotiff_dask_invalid_window_raises(small_tiff_path): + """Out-of-bounds windows fail loudly instead of silently clipping.""" + path, _ = small_tiff_path + with pytest.raises(ValueError, match="window=.* is outside"): + read_geotiff_dask(path, chunks=2, window=(0, 0, 100, 100)) + + +def test_read_geotiff_dask_invalid_band_raises(small_multiband_tiff_path): + """Out-of-range band indexes fail with IndexError.""" + path, _ = small_multiband_tiff_path + with pytest.raises(IndexError, match="band=5 out of range"): + read_geotiff_dask(path, chunks=4, band=5) + + +def test_write_geotiff_gpu_rejects_tiled_false(tmp_path): + """The GPU writer is tiled-only; ``tiled=False`` must fail loudly.""" + from xrspatial.geotiff import write_geotiff_gpu + + dummy = np.zeros((2, 2), dtype=np.float32) + with pytest.raises(ValueError, match="tiled=True"): + write_geotiff_gpu(dummy, str(tmp_path / 'never.tif'), tiled=False) + + +def test_write_geotiff_gpu_rejects_nonzero_max_z_error(tmp_path): + """LERC budget is not implementable on the GPU path.""" + from xrspatial.geotiff import write_geotiff_gpu + + dummy = np.zeros((2, 2), dtype=np.float32) + with pytest.raises(ValueError, match="max_z_error is not 
supported"): + write_geotiff_gpu(dummy, str(tmp_path / 'never.tif'), max_z_error=1.0) + + +@_skip_no_gpu +def test_write_geotiff_gpu_accepts_streaming_buffer_bytes_as_noop(tmp_path): + """``streaming_buffer_bytes`` is accepted for API parity (no-op).""" + import cupy + + from xrspatial.geotiff import write_geotiff_gpu + + arr = cupy.arange(16, dtype=cupy.float32).reshape(4, 4) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': np.arange(4, dtype=np.float64), + 'x': np.arange(4, dtype=np.float64)}) + p = tmp_path / 'kwarg_parity_streaming.tif' + write_geotiff_gpu(da, str(p), streaming_buffer_bytes=4096, tile_size=16) + rd = open_geotiff(str(p)) + np.testing.assert_array_equal(rd.values, arr.get()) + + +@_skip_no_gpu +def test_to_geotiff_threads_tiled_false_into_gpu_dispatcher(tmp_path): + """``to_geotiff(..., gpu=True, tiled=False)`` rejects, no silent flip.""" + import cupy + + arr = cupy.zeros((2, 2), dtype=cupy.float32) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': [0.0, 1.0], 'x': [0.0, 1.0]}) + with pytest.raises(ValueError, match="tiled=False"): + to_geotiff(da, str(tmp_path / 'never.tif'), + gpu=True, tiled=False) + + +# =========================================================================== +# MinIsWhite photometric backend parity (originally test_miniswhite_backend_parity) +# =========================================================================== +# +# MinIsWhite (photometric=0) inverts pixel intensity at decode time. +# The local, dask, HTTP, and GPU read paths must all agree on the +# inverted pixel domain. 
+ + +class _MwRangeHandler(http.server.BaseHTTPRequestHandler): + payload: bytes = b'' + + def do_GET(self): # noqa: N802 + rng = self.headers.get('Range') + if rng and rng.startswith('bytes='): + spec = rng[len('bytes='):] + start_s, _, end_s = spec.partition('-') + start = int(start_s) + end = int(end_s) if end_s else len(self.payload) - 1 + chunk = self.payload[start:end + 1] + self.send_response(206) + self.send_header('Content-Type', 'application/octet-stream') + self.send_header( + 'Content-Range', + f'bytes {start}-{start + len(chunk) - 1}/{len(self.payload)}', + ) + self.send_header('Content-Length', str(len(chunk))) + self.end_headers() + self.wfile.write(chunk) + return + self.send_response(200) + self.send_header('Content-Type', 'application/octet-stream') + self.send_header('Content-Length', str(len(self.payload))) + self.end_headers() + self.wfile.write(self.payload) + + def log_message(self, *_args, **_kwargs): + pass + + +def _mw_serve(payload: bytes): + handler_cls = type( + 'MwRangeHandler', (_MwRangeHandler,), {'payload': payload} + ) + httpd = socketserver.TCPServer(('127.0.0.1', 0), handler_cls) + port = httpd.server_address[1] + thread = threading.Thread(target=httpd.serve_forever, daemon=True) + thread.start() + return httpd, port + + +@pytest.fixture +def miniswhite_http_url(tmp_path, monkeypatch): + tifffile = pytest.importorskip("tifffile") + monkeypatch.setenv('XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS', '1') + stored = np.array([[0, 1, 2], [10, 128, 255]], dtype=np.uint8) + path = tmp_path / "tmp_miniswhite.tif" + tifffile.imwrite(str(path), stored, photometric='miniswhite') + httpd, port = _mw_serve(path.read_bytes()) + try: + yield f'http://127.0.0.1:{port}/tmp_miniswhite.tif', stored + finally: + httpd.shutdown() + httpd.server_close() + + +def test_miniswhite_http_matches_local_reader(miniswhite_http_url): + """HTTP read of a MinIsWhite TIFF returns the inverted pixel domain.""" + url, stored = miniswhite_http_url + got = 
open_geotiff(url) + np.testing.assert_array_equal(got.values, np.iinfo(stored.dtype).max - stored) + + +def test_miniswhite_http_dask_matches_local_reader(miniswhite_http_url): + """Dask HTTP read agrees with the eager HTTP read on inversion.""" + url, stored = miniswhite_http_url + got = open_geotiff(url, chunks=2).compute() + np.testing.assert_array_equal(got.values, np.iinfo(stored.dtype).max - stored) + + +@_skip_no_gpu +def test_miniswhite_gpu_matches_cpu_reader(tmp_path): + """GPU MinIsWhite read agrees with the CPU reader. + + The writer pre-inverts MinIsWhite pixels so the reader's + unconditional inversion restores the user-domain values; the + round-trip is the identity for both backends. + """ + from xrspatial.geotiff._writer import write + + stored = np.array([[0, 1, 2], [10, 128, 255]], dtype=np.uint8) + path = str(tmp_path / "tmp_miniswhite_gpu.tif") + write(stored, path, compression='deflate', tiled=True, tile_size=16, + photometric='miniswhite') + + cpu = open_geotiff(path) + gpu = open_geotiff(path, gpu=True) + + np.testing.assert_array_equal(cpu.values, stored) + np.testing.assert_array_equal(gpu.data.get(), cpu.values) diff --git a/xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py b/xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py deleted file mode 100644 index 4db88fcad..000000000 --- a/xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py +++ /dev/null @@ -1,471 +0,0 @@ -"""Table-driven attrs parity across all read backends (PR-D of #2211). - -After PRs #2200, #2205, #2207, #2209 from epic #2162, every read -backend routes its attrs assembly through one of two finalization -entry points in ``xrspatial.geotiff._attrs``: - -* :func:`_finalize_eager_read` for the eager numpy + GPU paths. -* :func:`_finalize_lazy_read_attrs` for the dask + dask-GPU + VRT - paths. 
- -PR-D of #2211 closes the loop on issue #2227 by removing the last -post-helper ``attrs[k] = v`` writes from the backends (the VRT eager -path's ``nodata_pixels_present`` stamp now rides through the helper's -``pixels_present`` kwarg). The test below pins the resulting contract: -for the same on-disk fixture, every backend that handles a given -read emits the same canonical attrs. - -The test is parametrized over a small fixture matrix (no nodata, -float sentinel, integer sentinel, uint8 no-nodata) and over the -backends available on the runner (eager numpy is always present; -dask+numpy ditto; GPU + dask+GPU only when CuPy + CUDA are usable; -VRT exercised via a tiny wrapper that points at the underlying TIFF). -A small set of backend-specific keys is excluded from the comparison: - -* The VRT path intentionally omits TIFF tag pass-through attrs: - ``extra_tags``, ``image_description``, ``extra_samples``, - ``colormap``, ``gdal_metadata``, ``gdal_metadata_xml``, - ``x_resolution``, ``y_resolution``, and ``resolution_unit``. - The VRT carries no TIFF tags of its own; the non-VRT path - documented those keys as TIFF-only. -* The VRT path adds ``vrt_holes`` on missing-source reads. -* ``nodata_pixels_present`` rides on the eager + VRT paths via a - one-pass scan but stays absent on dask paths (issue #2135). - -These exclusions are encoded in :data:`_BACKEND_SPECIFIC_KEYS` so a -future migration that promotes one of those keys to all-backends has -a single place to remove the carve-out. -""" -from __future__ import annotations - -import importlib.util -from dataclasses import dataclass -from typing import Any, Callable - -import numpy as np -import pytest -import xarray as xr - -from xrspatial.geotiff import open_geotiff, read_vrt, to_geotiff - -tifffile = pytest.importorskip("tifffile") - - -def _coord_array(arr: np.ndarray) -> xr.DataArray: - """Wrap a 2-D ``arr`` in a DataArray with axis-aligned x/y coords + CRS. 
- - ``to_geotiff`` stamps the TIFF GeoKey set when the source DataArray - has y/x coords and ``attrs['crs']``. Using the writer for the - fixtures keeps the GeoKey emission identical to a real read/write - round-trip so the test exercises the same code path users hit. - - Only 2-D arrays are supported because every fixture in this file - is single-band. A multi-band fixture would also need a ``band`` - coord and the VRT helper would need per-band nodata bookkeeping; - that is out of scope for the canonical-attrs parity assertion. - """ - assert arr.ndim == 2, "test fixtures only use 2-D arrays" - h, w = arr.shape - assert (h, w) == (_FIX_HEIGHT, _FIX_WIDTH), ( - "fixture geometry constants are out of sync with array shape; " - "update _FIX_HEIGHT / _FIX_WIDTH together with the fixture writers" - ) - y = np.linspace(_FIX_ORIGIN_Y, _FIX_ORIGIN_Y - _FIX_PIXEL * (h - 1), h) - x = np.linspace(_FIX_ORIGIN_X, _FIX_ORIGIN_X + _FIX_PIXEL * (w - 1), w) - da = xr.DataArray( - arr, dims=['y', 'x'], coords={'y': y, 'x': x}, - ) - da.attrs['crs'] = _FIX_CRS_EPSG - return da - - -def _gpu_available() -> bool: - """Return True iff cupy is importable and the runtime sees a CUDA device.""" - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() - - -# Canonical fixture geometry. Both the TIFF writer (via ``_coord_array``) -# and the VRT wrapper read from these constants so the VRT helper does -# not have to call ``open_geotiff`` to discover the on-disk transform. -# A silent drift in the eager read path therefore cannot propagate -# into the VRT helper. -_FIX_ORIGIN_X = -100.0 -_FIX_ORIGIN_Y = 40.0 -_FIX_PIXEL = 0.001 -_FIX_CRS_EPSG = 4326 -_FIX_HEIGHT = 32 -_FIX_WIDTH = 32 - - -# Keys to drop before comparing attrs across backends. 
Each key is -# documented as backend-specific in the attrs contract (issue #1984) -# and the surrounding modules: -# -# * ``vrt_holes`` -- VRT-only, populated from skipped sources at decode -# time. Plain TIFF reads never see it. -# * ``nodata_pixels_present`` -- emitted by the eager + VRT paths after -# a one-pass scan but absent on dask paths (#2135). The dask backends -# would have to force ``.compute()`` to produce it, breaking lazy. -# * TIFF tag pass-through attrs -- the VRT path documents these as -# omitted because the VRT carries no TIFF tags of its own. They are -# pinned in ``test_attrs_parity_1548`` for the non-VRT backends. -_BACKEND_SPECIFIC_KEYS = frozenset({ - 'vrt_holes', - 'nodata_pixels_present', - 'extra_tags', - 'image_description', - 'extra_samples', - 'gdal_metadata', - 'gdal_metadata_xml', - 'x_resolution', - 'y_resolution', - 'resolution_unit', - 'colormap', -}) - - -def _attrs_for_parity(attrs) -> dict: - """Drop backend-specific keys before comparing attrs across paths.""" - return {k: v for k, v in dict(attrs).items() - if k not in _BACKEND_SPECIFIC_KEYS} - - -# Tolerance for the lone numeric key that needs one (``transform``). -# The VRT writer emits the geo-transform as ``%.6f`` ASCII (GDAL's own -# convention) while the TIFF writer keeps the original float64 values, -# so the same logical transform comes back as ``0.001`` vs -# ``0.0010000000000047748``. The diff sits well below GDAL's own -# rounding step and below any sane pixel-size tolerance. -_TRANSFORM_RTOL = 1e-9 -_TRANSFORM_ATOL = 1e-9 - - -def _attrs_close(a: dict, b: dict) -> bool: - """Compare attrs dicts, allowing tiny numeric drift in ``transform``. - - All keys other than ``transform`` must match exactly. Only the - ``transform`` 6-tuple is compared with a tolerance (see - :data:`_TRANSFORM_RTOL` / :data:`_TRANSFORM_ATOL`). 
- """ - if set(a.keys()) != set(b.keys()): - return False - for k, va in a.items(): - vb = b[k] - if k == 'transform' and isinstance(va, tuple) and isinstance(vb, tuple): - if len(va) != len(vb): - return False - for x, y in zip(va, vb): - if not np.isclose( - float(x), float(y), - rtol=_TRANSFORM_RTOL, atol=_TRANSFORM_ATOL, - ): - return False - else: - if va != vb: - return False - return True - - -# --------------------------------------------------------------------- -# Fixture writers -# --------------------------------------------------------------------- - -@dataclass(frozen=True) -class _Fixture: - """One row in the parity matrix. - - ``writer`` materializes the TIFF on disk and returns a - :class:`_FixtureMeta` describing the on-disk layout (dtype, - declared nodata sentinel). The VRT helper consumes the meta - directly rather than re-deriving it from a TIFF read, which - keeps the VRT wrapper independent of the eager read path. - """ - name: str - writer: Callable[[str], '_FixtureMeta'] - vrt_compatible: bool = True - - -@dataclass(frozen=True) -class _FixtureMeta: - """Layout facts the VRT helper needs to wrap the on-disk TIFF.""" - vrt_dtype: str # GDAL VRT DataType label ('Float32', 'UInt16', ...) - nodata: Any = None - - -def _write_plain_float(path) -> _FixtureMeta: - """Plain float32 TIFF: no nodata, axis-aligned transform, EPSG:4326.""" - arr = np.random.default_rng(seed=2227).random( - (32, 32)).astype(np.float32) - to_geotiff(_coord_array(arr), path) - return _FixtureMeta(vrt_dtype='Float32') - - -def _write_float_with_nodata(path) -> _FixtureMeta: - """Float32 TIFF with a declared sentinel (-9999.0). 
Some pixels match.""" - rng = np.random.default_rng(seed=2227) - arr = rng.random((32, 32)).astype(np.float32) - arr[0:4, 0:4] = -9999.0 - da = _coord_array(arr) - da.attrs['nodata'] = -9999.0 - to_geotiff(da, path) - return _FixtureMeta(vrt_dtype='Float32', nodata=-9999.0) - - -def _write_int_with_nodata(path) -> _FixtureMeta: - """uint16 TIFF with a representable sentinel pixel.""" - rng = np.random.default_rng(seed=2227) - arr = rng.integers(0, 1000, size=(32, 32), dtype=np.uint16) - arr[0:4, 0:4] = 65535 - da = _coord_array(arr) - da.attrs['nodata'] = 65535 - to_geotiff(da, path) - return _FixtureMeta(vrt_dtype='UInt16', nodata=65535) - - -def _write_uint8_no_nodata(path) -> _FixtureMeta: - """uint8 photometric MinIsBlack baseline (no MinIsWhite via to_geotiff). - - ``to_geotiff`` does not currently expose a MinIsWhite kwarg, so the - fixture is a uint8 with photometric=MinIsBlack. The MinIsWhite - branch of ``_finalize_eager_read``'s sentinel resolution is - covered by ``test_eager_finalization_parity_2162``; here we just - exercise the uint8 dtype against the canonical schema. 
- """ - rng = np.random.default_rng(seed=2227) - arr = rng.integers(0, 256, size=(32, 32), dtype=np.uint8) - to_geotiff(_coord_array(arr), path) - return _FixtureMeta(vrt_dtype='Byte') - - -_FIXTURES = ( - _Fixture('plain_float', _write_plain_float), - _Fixture('float_with_nodata', _write_float_with_nodata), - _Fixture('int_with_nodata', _write_int_with_nodata), - _Fixture('uint8_no_nodata', _write_uint8_no_nodata), -) - - -# --------------------------------------------------------------------- -# Backend table -# --------------------------------------------------------------------- - -@dataclass(frozen=True) -class _Backend: - name: str - open_fn: Callable - available: bool = True - - -def _open_eager(path): - return open_geotiff(path) - - -def _open_dask(path): - return open_geotiff(path, chunks=16) - - -def _open_gpu(path): - return open_geotiff(path, gpu=True) - - -def _open_dask_gpu(path): - return open_geotiff(path, gpu=True, chunks=16) - - -def _open_vrt(path, meta): - """Wrap the TIFF in a single-source VRT and read it via ``read_vrt``. - - Building a one-liner VRT on the fly lets the test exercise the - VRT eager backend through the same finalization helper without - needing a hand-written VRT fixture per case. ``meta`` carries the - on-disk dtype and nodata sentinel that the fixture writer just - used; the geometry, CRS, and pixel size come from the module-level - fixture constants. Both inputs are derived from the same source of - truth as ``_coord_array`` (via ``to_geotiff``), so a silent drift - in :func:`open_geotiff` cannot propagate into this helper. - """ - import os - - from pyproj import CRS - - height = _FIX_HEIGHT - width = _FIX_WIDTH - - # GDAL GeoTransform XML wants (origin_x, pixel_width, row_skew, - # origin_y, col_skew, pixel_height). y axis decreases. 
- # - # The TIFF writer follows the RasterPixelIsArea convention and - # stamps the upper-left CORNER as the origin, while the fixture - # constants name the upper-left CENTER (matching ``_coord_array``, - # which uses center-based y/x coords). Shift by half a pixel here - # so the VRT-side transform matches the TIFF-side transform - # within ``_TRANSFORM_RTOL``. - corner_x = _FIX_ORIGIN_X - _FIX_PIXEL / 2.0 - corner_y = _FIX_ORIGIN_Y + _FIX_PIXEL / 2.0 - geo_transform = ( - f"{corner_x:.6f}, {_FIX_PIXEL:.6f}, 0.0, " - f"{corner_y:.6f}, 0.0, {-_FIX_PIXEL:.6f}" - ) - crs_wkt = CRS.from_epsg(_FIX_CRS_EPSG).to_wkt() - - nodata_xml = (f"{meta.nodata}" - if meta.nodata is not None else '') - - vrt_path = path + '.vrt' - abs_src = os.path.abspath(path) - xml = ( - f'' - f' {crs_wkt}' - f' {geo_transform}' - f' ' - f' {nodata_xml}' - f' ' - f' {abs_src}' - f' 1' - f' ' - f' ' - f' ' - f' ' - f'' - ) - with open(vrt_path, 'w') as f: - f.write(xml) - return read_vrt(vrt_path) - - -_BACKENDS = ( - _Backend('eager_numpy', lambda path, meta: _open_eager(path)), - _Backend('dask_numpy', lambda path, meta: _open_dask(path)), - _Backend('gpu', lambda path, meta: _open_gpu(path), available=_HAS_GPU), - _Backend('dask_gpu', lambda path, meta: _open_dask_gpu(path), - available=_HAS_GPU), - _Backend('vrt', _open_vrt), -) - - -_AVAILABLE_BACKENDS = tuple(b for b in _BACKENDS if b.available) - - -# --------------------------------------------------------------------- -# The parity test -# --------------------------------------------------------------------- - -@pytest.mark.parametrize('fixture', _FIXTURES, ids=lambda f: f.name) -def test_canonical_attrs_match_across_backends(tmp_path, fixture): - """Every backend stamps the same canonical attrs for the same fixture. - - The eager numpy path is the canonical reference. 
For each fixture - we open the file via every available backend (skipping the VRT - backend on fixtures whose wrapper cannot model them) and assert - the comparable attrs (canonical contract minus the documented - backend-specific carve-outs) match the eager-numpy baseline. - - Any divergence here means a backend has slipped out of lockstep - with the finalization helpers in ``_attrs.py``. The expected fix - is to route that backend through the helper rather than papering - over the diff with a new entry in :data:`_BACKEND_SPECIFIC_KEYS`. - """ - path = str(tmp_path / f'parity_2227_{fixture.name}.tif') - meta = fixture.writer(path) - - baseline = _attrs_for_parity(open_geotiff(path).attrs) - - divergences = {} - for backend in _AVAILABLE_BACKENDS: - if backend.name == 'vrt' and not fixture.vrt_compatible: - continue - if backend.name == 'eager_numpy': - # Already the baseline; comparing it to itself adds noise. - continue - try: - da = backend.open_fn(path, meta) - except Exception as exc: # pragma: no cover - surfaced via the assert - divergences[backend.name] = f"open failed: {exc!r}" - continue - candidate = _attrs_for_parity(da.attrs) - if not _attrs_close(candidate, baseline): - only_in_baseline = { - k: baseline.get(k) for k in baseline - if baseline.get(k) != candidate.get(k) - } - only_in_candidate = { - k: candidate.get(k) for k in candidate - if candidate.get(k) != baseline.get(k) - } - divergences[backend.name] = { - 'baseline_diff': only_in_baseline, - 'candidate_diff': only_in_candidate, - } - - assert not divergences, ( - f"attrs diverged from eager-numpy baseline for fixture " - f"{fixture.name!r}:\n baseline: {baseline}\n diffs: {divergences}" - ) - - -@pytest.mark.parametrize('fixture', _FIXTURES, ids=lambda f: f.name) -def test_canonical_attrs_keys_match_across_backends(tmp_path, fixture): - """Stronger contract: the set of canonical attr keys is identical. 
- - Even when values match the per-key comparison above, a backend - that silently *drops* a key from the canonical set would slip - through if the value happens to be ``None``. This check pins the - keyset so a future regression that omits ``georef_status`` or - ``crs_wkt`` from one backend surfaces immediately. - """ - path = str(tmp_path / f'parity_2227_keys_{fixture.name}.tif') - meta = fixture.writer(path) - - baseline_keys = set(_attrs_for_parity(open_geotiff(path).attrs).keys()) - - diffs = {} - for backend in _AVAILABLE_BACKENDS: - if backend.name == 'vrt' and not fixture.vrt_compatible: - continue - if backend.name == 'eager_numpy': - continue - try: - da = backend.open_fn(path, meta) - except Exception as exc: # pragma: no cover - diffs[backend.name] = f"open failed: {exc!r}" - continue - keys = set(_attrs_for_parity(da.attrs).keys()) - if keys != baseline_keys: - diffs[backend.name] = { - 'missing': sorted(baseline_keys - keys), - 'extra': sorted(keys - baseline_keys), - } - - assert not diffs, ( - f"canonical attrs keyset diverged from eager-numpy baseline for " - f"fixture {fixture.name!r}:\n baseline keys: {sorted(baseline_keys)}\n" - f" diffs: {diffs}" - ) - - -def test_backend_specific_keys_carveout_is_documented(): - """Sanity: every key in the carve-out is documented in this module. - - The module docstring lists which keys are backend-specific and - therefore excluded from the parity assertions. The carve-out and - the docstring drift apart easily; the check here is a string scan - so a future maintainer who adds a key to the frozenset has to - update the docstring too. 
- """ - module_doc = __doc__ or '' - missing = [k for k in _BACKEND_SPECIFIC_KEYS if k not in module_doc] - assert not missing, ( - f"keys in _BACKEND_SPECIFIC_KEYS are not mentioned in the module " - f"docstring: {missing}" - ) diff --git a/xrspatial/geotiff/tests/test_attrs_parity_1548.py b/xrspatial/geotiff/tests/test_attrs_parity_1548.py deleted file mode 100644 index 15a104eb5..000000000 --- a/xrspatial/geotiff/tests/test_attrs_parity_1548.py +++ /dev/null @@ -1,186 +0,0 @@ -"""4-backend attrs parity tests for issue #1548. - -Before the fix, ``open_geotiff`` returned a different ``attrs`` set -depending on which backend handled the read: - -* numpy (eager): full set, including ``x_resolution``, ``y_resolution``, - ``resolution_unit``, ``extra_tags``, ``image_description``, - ``extra_samples``. -* dask: only ``crs``, ``transform``, ``raster_type``, ``nodata``. -* cupy / dask+cupy: only ``crs``, ``crs_wkt``, ``transform``, ``nodata``. - -The fix factors a single ``_populate_attrs_from_geo_info`` helper and -calls it from every read path, so all four backends now emit the same -keys with the same values for the same input file. - -These tests pin that contract. -""" -from __future__ import annotations - -import importlib.util - -import numpy as np -import pytest - -from xrspatial.geotiff import open_geotiff - -tifffile = pytest.importorskip("tifffile") - - -def _gpu_available() -> bool: - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() -_gpu_only = pytest.mark.skipif(not _HAS_GPU, reason="cupy + CUDA required") - - -def _write_tiff_with_pass_through_tags(path): - """Write a tiled 2-band float32 TIFF that exercises the pass-through - TIFF tags called out in the issue. 
- - Uses tifffile's first-class ``resolution`` / ``resolutionunit`` / - ``description`` kwargs (its preferred path; ``extratags`` would be - silently dropped for the resolution rationals). ``metadata=None`` - suppresses tifffile's auto-generated shape JSON in ImageDescription - so the fixture description survives. ``ExtraSamples`` (338) is - auto-derived from the 2-band layout. - """ - arr = np.random.default_rng(seed=1548).random( - (64, 64, 2)).astype(np.float32) - tifffile.imwrite( - path, arr, photometric='minisblack', planarconfig='contig', - tile=(32, 32), compression='deflate', - resolution=(300, 300), resolutionunit=2, - description='issue-1548 parity fixture', - metadata=None, - ) - return arr - - -def _attrs_subset(attrs, keys): - """Return a dict containing only the requested attr keys. - - This helper performs a simple ``attrs.get`` lookup for each key and - does not special-case any values. - """ - return {k: attrs.get(k) for k in keys} - - -_PASS_THROUGH_KEYS = ( - 'x_resolution', - 'y_resolution', - 'resolution_unit', - 'image_description', - 'extra_samples', -) - - -def test_numpy_attrs_includes_pass_through_tags(tmp_path): - """Sanity baseline: the eager numpy path emits all pass-through keys.""" - path = str(tmp_path / 'attrs_parity_1548_baseline.tif') - _write_tiff_with_pass_through_tags(path) - - da = open_geotiff(path) - for key in _PASS_THROUGH_KEYS: - assert key in da.attrs, ( - f"eager numpy is the canonical reference and should always " - f"emit '{key}'; got attrs={sorted(da.attrs.keys())}" - ) - - -def test_dask_attrs_match_numpy(tmp_path): - """The dask read path now emits the same pass-through attrs as numpy.""" - path = str(tmp_path / 'attrs_parity_1548_dask.tif') - _write_tiff_with_pass_through_tags(path) - - np_da = open_geotiff(path) - dk_da = open_geotiff(path, chunks=32) - - np_subset = _attrs_subset(np_da.attrs, _PASS_THROUGH_KEYS) - dk_subset = _attrs_subset(dk_da.attrs, _PASS_THROUGH_KEYS) - - assert dk_subset == np_subset, ( - 
f"dask attrs diverge from numpy:\n" - f" numpy: {np_subset}\n" - f" dask : {dk_subset}" - ) - - -@_gpu_only -def test_cupy_attrs_match_numpy(tmp_path): - """Cupy / GPU read emits the same pass-through attrs as numpy.""" - path = str(tmp_path / 'attrs_parity_1548_cupy.tif') - _write_tiff_with_pass_through_tags(path) - - np_da = open_geotiff(path) - gpu_da = open_geotiff(path, gpu=True) - - np_subset = _attrs_subset(np_da.attrs, _PASS_THROUGH_KEYS) - gpu_subset = _attrs_subset(gpu_da.attrs, _PASS_THROUGH_KEYS) - - assert gpu_subset == np_subset, ( - f"cupy attrs diverge from numpy:\n" - f" numpy: {np_subset}\n" - f" cupy : {gpu_subset}" - ) - - -@_gpu_only -def test_dask_cupy_attrs_match_numpy(tmp_path): - """Combined dask+cupy read still emits the pass-through attrs.""" - path = str(tmp_path / 'attrs_parity_1548_dask_cupy.tif') - _write_tiff_with_pass_through_tags(path) - - np_da = open_geotiff(path) - combined = open_geotiff(path, gpu=True, chunks=32) - - np_subset = _attrs_subset(np_da.attrs, _PASS_THROUGH_KEYS) - combined_subset = _attrs_subset(combined.attrs, _PASS_THROUGH_KEYS) - - assert combined_subset == np_subset, ( - f"dask+cupy attrs diverge from numpy:\n" - f" numpy : {np_subset}\n" - f" dask+cupy : {combined_subset}" - ) - - -def test_all_backend_attrs_keysets_equal(tmp_path): - """Strong contract: the *set* of attrs keys is identical across all - available backends, not just the pass-through subset. - - This guards against any future read path silently dropping a - different attr that nobody happened to test for above. 
- """ - path = str(tmp_path / 'attrs_parity_1548_keysets.tif') - _write_tiff_with_pass_through_tags(path) - - np_keys = set(open_geotiff(path).attrs.keys()) - dk_keys = set(open_geotiff(path, chunks=32).attrs.keys()) - - backend_keys = {'numpy': np_keys, 'dask+numpy': dk_keys} - if _HAS_GPU: - backend_keys['cupy'] = set(open_geotiff(path, gpu=True).attrs.keys()) - backend_keys['dask+cupy'] = set( - open_geotiff(path, gpu=True, chunks=32).attrs.keys()) - - differences = { - name: keys ^ np_keys - for name, keys in backend_keys.items() - if keys != np_keys - } - assert not differences, ( - f"backend attrs keysets diverge from numpy:\n" - f" numpy keys: {sorted(np_keys)}\n" - f" diffs : " - + "\n ".join( - f"{name}: symmetric_diff={sorted(diff)}" - for name, diff in differences.items() - ) - ) diff --git a/xrspatial/geotiff/tests/test_backend_full_parity_2211.py b/xrspatial/geotiff/tests/test_backend_full_parity_2211.py deleted file mode 100644 index cd42e91e8..000000000 --- a/xrspatial/geotiff/tests/test_backend_full_parity_2211.py +++ /dev/null @@ -1,952 +0,0 @@ -"""Backend full-parity gate (PR-F of #2211 / closes #2229). - -A single table-driven parity test that runs the same fixture through -every read backend ``xrspatial.geotiff`` ships and asserts the full -observable contract on each output: pixel values, dims and coords, -transform/georef attrs, CRS attrs, nodata sentinel and dtype, and a -curated set of canonical metadata attrs. - -This module is the **contract gate** for the geotiff refactor epic -#2211. Earlier phases (PRs A through E) shared finalization helpers -across the eager and lazy backends so the four CPU/GPU x eager/dask -paths produce the same DataArray. PR-F locks that result in: any drift -between backends on the curated attrs set raises here, so a refactor -in one backend that silently changes an attr cannot land without an -update to this test. 
- -Design notes ------------- - -* The ``eager_numpy`` read of each fixture is the reference every - other backend is compared against. The reference is read once per - fixture and cached for the module's lifetime so a 6-backend - matrix does not multiply IO by 6x. -* Backends covered (per #2229): - - ``eager_numpy`` -- ``open_geotiff(path)`` (also the reference) - - ``dask_numpy`` -- ``open_geotiff(path, chunks=...)`` - - ``gpu`` -- ``open_geotiff(path, gpu=True, on_gpu_failure='strict')`` - - ``dask_gpu`` -- ``open_geotiff(path, gpu=True, chunks=..., on_gpu_failure='strict')`` - - ``vrt_eager`` -- ``open_geotiff(vrt_path)`` (one-source VRT - generated per fixture) - - ``http_fsspec`` -- ``open_geotiff('memory:///corpus/.tif')`` - through fsspec's in-process memory filesystem -* Fixtures come from the existing golden corpus - (``xrspatial/geotiff/tests/golden_corpus/``) so the fixture surface - is the same one Phase 3 (#1930) exercised. The set is filtered to - fixtures the full parity matrix can compare cleanly today; known - divergences (e.g. JPEG-YCbCr axis order, MinIsWhite inversion) live - in ``_INTENTIONAL_SKIPS`` with citations. -* The reference is the ``eager_numpy`` read of the same fixture, taken - once per fixture and reused across backends. Pixels are compared - ``allclose`` for float dtypes (NaN-aware) and bit-exact for ints. -* GPU and Dask+GPU rows skip with a **loud** reason when cupy or CUDA - is missing: the issue text calls this out explicitly so a silent - collection of zero GPU tests in CI is itself a bug. - -Adding a new backend or a new fixture -------------------------------------- - -To add a backend: append a ``_Backend`` to ``_BACKENDS`` with its -``read`` callable, an availability predicate, and the list of fixture -ids it cannot handle (with citations). The parametrize matrix lights -it up on the next run. - -To add a fixture: drop it into the manifest at -``golden_corpus/manifest.yaml`` and regenerate. 
The matrix picks it -up automatically. -""" -from __future__ import annotations - -import importlib.util -import pathlib -from dataclasses import dataclass, field -from typing import Any, Callable - -import numpy as np -import pytest -import xarray as xr - -pytest.importorskip("yaml") -pytest.importorskip("rasterio") - -from xrspatial.geotiff import open_geotiff, write_vrt # noqa: E402 -from xrspatial.geotiff.tests.golden_corpus import generate # noqa: E402 -from xrspatial.geotiff.tests.golden_corpus._marks import fast_slow_marks_for # noqa: E402 - -FIXTURES_DIR = ( - pathlib.Path(generate.__file__).resolve().parent / "fixtures" -) - -# Chunk size for the dask backends. Most corpus fixtures are 64x64 or -# smaller, so 32 produces either a 2x2 chunk grid or a single chunk; -# either way the dask plumbing fires. -_CHUNK_SIZE = 32 - - -# --------------------------------------------------------------------------- -# Environment gating -# --------------------------------------------------------------------------- - -def _gpu_available() -> bool: - """True iff cupy is importable AND a CUDA device is actually usable. - - The two-step check matches ``conftest.gpu_available``: some sandboxes - ship cupy without a working CUDA runtime, so ``import cupy`` alone - is not enough. - """ - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: # pragma: no cover - defensive - return False - - -_HAS_GPU = _gpu_available() -_HAS_FSSPEC = importlib.util.find_spec("fsspec") is not None -_HAS_DASK = importlib.util.find_spec("dask") is not None - - -# --------------------------------------------------------------------------- -# Curated rich metadata attrs (the contract this gate locks in) -# --------------------------------------------------------------------------- - -# Geo / CRS / nodata core. Every fixture-carrying georef ought to agree -# on these across every backend. 
-_GEOREF_KEYS: tuple[str, ...] = ( - "transform", - "crs", - "crs_wkt", -) - -# Nodata semantics. ``masked_nodata`` is the flag from #2092 that says -# "the reader rewrote sentinel pixels to NaN"; pinning it across -# backends catches a backend that silently disagreed on the mask. -_NODATA_KEYS: tuple[str, ...] = ( - "nodata", - "masked_nodata", -) - -# Canonical metadata attrs from the #1984 contract. ``raster_type`` is -# absent when implicit-area; ``x_resolution`` / ``y_resolution`` / -# ``resolution_unit`` only fire when the fixture set them. The check -# below compares presence + value; an attr absent on the reference -# must also be absent on the candidate, and vice versa. -_CANONICAL_METADATA_KEYS: tuple[str, ...] = ( - "raster_type", - "x_resolution", - "y_resolution", - "resolution_unit", - "georef_status", -) - - -# --------------------------------------------------------------------------- -# Skip / xfail taxonomy -# --------------------------------------------------------------------------- - -# Fixtures the entire parity matrix skips. Each entry cites the source -# of the divergence (an issue link or a sibling test that asserts the -# divergent behaviour). Skips here are **intentional**: they describe -# behaviour the matrix would compare unequal across backends *by -# design*, not bugs. -_INTENTIONAL_SKIPS: dict[str, str] = { - "nodata_miniswhite_uint8": ( - "MinIsWhite photometric inversion: xrspatial inverts pixels per " - "#1797; rasterio leaves them raw. The matrix would compare " - "inverted-vs-raw and fail on every row. Covered by " - "test_miniswhite_backend_parity_1797.py." - ), - "compression_jpeg_uint8_ycbcr": ( - "JPEG-YCbCr is lossy and exposes a (bands, y, x) vs (y, x, band) " - "axis-order divergence that the golden-corpus oracle handles " - "via _normalise_axis_order but this gate's dims/coords check " - "cannot, because the dims tuple itself differs. 
Covered by " - "test_golden_corpus_eager_numpy_1930.py et al., which use the " - "axis-normalising oracle." - ), -} - - -# Per-backend skips. Outer key is the backend id; inner maps fixture id -# to a reason. Entries describe backend-specific limitations that -# cannot be resolved at this layer. -_BACKEND_SKIPS: dict[str, dict[str, str]] = { - "vrt_eager": { - # The VRT writer requires a CRS to lay out the mosaic; the - # citation-only fixture's user-defined WKT does not survive - # the round trip through rasterio's VRT serializer cleanly - # enough for a bit-exact parity check. Eager numpy and dask - # still parity-check this fixture directly. - "crs_citation_only": ( - "VRT round-trip mutates user-defined CRS WKT; covered by " - "test_user_defined_crs_wkt_1632.py." - ), - # External sidecar .ovr does not survive a VRT wrap. Base-IFD - # coverage stays in the per-backend corpus tests. - "overview_external_ovr_uint16": ( - "External .ovr sidecar is not preserved through VRT wrap." - ), - # Sparse tiles with explicit holes are not a clean VRT input - # (the writer would have to invent fill bytes). Covered by - # test_sparse_cog.py and test_vrt_holes_attr_1734.py. - "sparse_tiled_uint16": ( - "Sparse-tile holes are not preserved through VRT wrap." - ), - # VRT wrap does not surface the source TIFF's XResolution / - # YResolution / ResolutionUnit tags, nor extra_tags. The base - # IFD parity is covered by the eager / dask backends directly; - # the VRT cell would compare missing-vs-present canonical - # attrs and fail by design. - "extra_tags_uint16": ( - "VRT wrap does not propagate source TIFF resolution tags or " - "extra_tags; covered by test_extra_tags_safe_filter_1657.py." - ), - }, - "http_fsspec": { - # External sidecar .ovr fetch is not implemented for the cloud - # source path; the base IFD still reads cleanly, but the gate - # checks transform/coords that depend on the overview chain - # surfacing the same way as on the eager path. 
Per-backend - # corpus tests cover the no-overview case. - "overview_external_ovr_uint16": ( - "External .ovr sidecar reader is not wired into the cloud " - "source path." - ), - }, -} - - -# --------------------------------------------------------------------------- -# Backend table -# --------------------------------------------------------------------------- - -@dataclass(frozen=True) -class _Backend: - """One row of the parity matrix. - - ``read`` opens the fixture and returns a DataArray. ``available`` - reports whether the row can run in this environment; when it - cannot, the parametrize layer attaches a ``pytest.skip`` with the - ``unavailable_reason`` text. ``skips`` lists per-fixture skips - keyed by fixture id. - """ - - backend_id: str - read: Callable[[pathlib.Path, str], xr.DataArray] - available: bool - unavailable_reason: str - skips: dict[str, str] = field(default_factory=dict) - - -# PR 4 of epic #2340: experimental and internal-only codecs require an -# explicit opt-in on the read side. The full parity matrix tests every -# fixture including ``compression_lerc_float32``; the gate's purpose is -# orthogonal to the parity check, so pass both flags through every -# opener. The matrix continues to test what it was meant to test. 
-_OPTIN = { - "allow_experimental_codecs": True, - "allow_internal_only_jpeg": True, -} - - -def _read_eager_numpy(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: - return open_geotiff(str(path), **_OPTIN) - - -def _read_dask_numpy(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: - return open_geotiff(str(path), chunks=_CHUNK_SIZE, **_OPTIN) - - -def _read_gpu(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: - return open_geotiff( - str(path), gpu=True, on_gpu_failure="strict", **_OPTIN) - - -def _read_dask_gpu(path: pathlib.Path, _fixture_id: str) -> xr.DataArray: - return open_geotiff( - str(path), gpu=True, chunks=_CHUNK_SIZE, on_gpu_failure="strict", - **_OPTIN, - ) - - -def _read_vrt_eager(path: pathlib.Path, fixture_id: str) -> xr.DataArray: - """Wrap ``path`` in a one-source VRT and read it back. - - Lives behind a session-scoped cache so the per-fixture VRT - materialisation runs once even though every backend test - parametrizes over fixtures independently. The wrap uses - ``write_vrt`` directly so the test exercises xrspatial's own VRT - plumbing, not rasterio's. - - The source ``.tif`` is **copied** into the VRT cache directory - rather than referenced in place. The VRT reader's path-containment - guard (#1671) refuses to follow a SourceFilename that resolves - outside the VRT directory unless ``XRSPATIAL_VRT_ALLOWED_ROOTS`` - is set; copying keeps the test self-contained instead of mutating - a global env var for every cell. - """ - import shutil - cache_dir = _vrt_cache_dir(path.parent) - local_src = cache_dir / f"{fixture_id}.tif" - if not local_src.exists(): - shutil.copy2(path, local_src) - vrt_path = cache_dir / f"{fixture_id}.vrt" - if not vrt_path.exists(): - write_vrt(str(vrt_path), [str(local_src)]) - return open_geotiff(str(vrt_path), **_OPTIN) - - -def _read_http_fsspec(path: pathlib.Path, fixture_id: str) -> xr.DataArray: - """Serve the fixture bytes via fsspec's in-process memory FS. 
- - The ``memory://`` scheme drives the same ``_CloudSource`` path an - ``s3://`` or ``gs://`` URL would, minus the network and - credentials. Matches how - ``test_golden_corpus_fsspec_1930.py`` exercises the cloud-eager - read path in restricted-sandbox CI. - - The fsspec memory filesystem is a process-global singleton, so the - fixture's bytes are written under a unique key and the key is - deleted before returning. The cloud-eager read path materialises - pixels inline (``_CloudSource`` downloads under the - ``max_cloud_bytes`` budget; see #1928), so the DataArray no longer - needs the memory entry once ``open_geotiff`` returns. - """ - import fsspec - fs = fsspec.filesystem("memory") - key = f"/corpus_full_parity_2211/{fixture_id}.tif" - with open(path, "rb") as f: - fs.pipe(key, f.read()) - try: - da = open_geotiff(f"memory://{key}", **_OPTIN) - finally: - # Best-effort cleanup; fsspec memory store deletions are - # idempotent. The cloud-eager path has already pulled the - # bytes into the DataArray by this point. - try: - fs.rm(key) - except FileNotFoundError: - pass - return da - - -def _vrt_cache_dir(fixtures_dir: pathlib.Path) -> pathlib.Path: - """Return the per-session VRT scratch directory. - - Materialising one-source VRTs into the fixtures directory itself - would pollute the corpus; ``tmp_path`` is fixture-scoped and not - shared across parametrize cells. A module-level cache directory - under the system tmp keeps the VRTs around for the duration of the - pytest process and lets every backend cell hit the same .vrt. - """ - import hashlib - import tempfile - base = pathlib.Path(tempfile.gettempdir()) / "xrspatial_2229_vrt_cache" - base.mkdir(parents=True, exist_ok=True) - # Pin the cache directory name to a stable digest of the - # fixtures path. ``hashlib.sha1`` is deterministic across - # processes (unlike ``hash()`` which uses ``PYTHONHASHSEED`` - # salting), so two pytest runs against the same fixtures dir - # reuse the same cache. 
The digest is truncated to 12 hex chars - # for path readability; collisions are not security-relevant - # because the cache directory is per-user-tmp. - digest = hashlib.sha1(str(fixtures_dir).encode()).hexdigest()[:12] - sub = base / f"fix_{digest}" - sub.mkdir(parents=True, exist_ok=True) - return sub - - -_GPU_UNAVAILABLE_REASON = ( - "GPU backend skipped LOUDLY: cupy + CUDA are not available in this " - "environment. Per issue #2229, GPU and Dask+GPU rows must skip " - "explicitly rather than silently collect zero tests. To exercise " - "this row, install cupy and ensure a CUDA device is reachable." -) - -_DASK_UNAVAILABLE_REASON = ( - "dask backend skipped: dask is not installed. Install xarray-spatial " - "with the dask extra to exercise this row." -) - -_FSSPEC_UNAVAILABLE_REASON = ( - "http_fsspec backend skipped: fsspec is not installed. Install " - "fsspec to exercise the cloud-source dispatch." -) - - -_BACKENDS: list[_Backend] = [ - _Backend( - backend_id="eager_numpy", - read=_read_eager_numpy, - available=True, - unavailable_reason="", - ), - _Backend( - backend_id="dask_numpy", - read=_read_dask_numpy, - available=_HAS_DASK, - unavailable_reason=_DASK_UNAVAILABLE_REASON, - ), - _Backend( - backend_id="gpu", - read=_read_gpu, - available=_HAS_GPU, - unavailable_reason=_GPU_UNAVAILABLE_REASON, - ), - _Backend( - backend_id="dask_gpu", - read=_read_dask_gpu, - available=_HAS_GPU and _HAS_DASK, - unavailable_reason=( - _GPU_UNAVAILABLE_REASON - if not _HAS_GPU - else _DASK_UNAVAILABLE_REASON - ), - skips=dict(_BACKEND_SKIPS.get("dask_gpu", {})), - ), - _Backend( - backend_id="vrt_eager", - read=_read_vrt_eager, - available=True, - unavailable_reason="", - skips=dict(_BACKEND_SKIPS["vrt_eager"]), - ), - _Backend( - backend_id="http_fsspec", - read=_read_http_fsspec, - available=_HAS_FSSPEC, - unavailable_reason=_FSSPEC_UNAVAILABLE_REASON, - skips=dict(_BACKEND_SKIPS["http_fsspec"]), - ), -] - - -# 
--------------------------------------------------------------------------- -# Manifest / fixture helpers -# --------------------------------------------------------------------------- - -def _resolved_fixtures() -> list[dict[str, Any]]: - """Return manifest entries with defaults merged, sorted by id.""" - manifest = generate.load_manifest() - entries = generate.validate(manifest) - entries.sort(key=lambda e: e["id"]) - return entries - - -def _fixture_path(entry: dict[str, Any]) -> pathlib.Path: - return FIXTURES_DIR / f"{entry['id']}.tif" - - -def _is_lossy(entry: dict[str, Any]) -> bool: - tol = entry.get("tolerance") or {} - return bool(tol.get("lossy", False)) - - -_FIXTURES = _resolved_fixtures() - - -# --------------------------------------------------------------------------- -# Parametrize helpers -# --------------------------------------------------------------------------- - -def _build_fixture_params() -> list[pytest.param]: - """One ``pytest.param`` per manifest entry, with slow/skip marks.""" - out: list[pytest.param] = [] - for entry in _FIXTURES: - fid = entry["id"] - marks = list(fast_slow_marks_for(entry)) - if fid in _INTENTIONAL_SKIPS: - marks.append(pytest.mark.skip(reason=_INTENTIONAL_SKIPS[fid])) - out.append(pytest.param(entry, id=fid, marks=marks)) - return out - - -def _build_backend_params() -> list[pytest.param]: - """One ``pytest.param`` per backend; unavailable ones carry skip marks.""" - out: list[pytest.param] = [] - for backend in _BACKENDS: - marks = [] - if not backend.available: - marks.append(pytest.mark.skip(reason=backend.unavailable_reason)) - out.append(pytest.param(backend, id=backend.backend_id, marks=marks)) - return out - - -_FIXTURE_PARAMS = _build_fixture_params() -_BACKEND_PARAMS = _build_backend_params() - - -# --------------------------------------------------------------------------- -# Pixel + attrs assertions -# --------------------------------------------------------------------------- - -def 
_is_nan_sentinel(value: Any) -> bool: - """True when ``value`` is a NaN sentinel, regardless of scalar type. - - Accepts Python floats, numpy scalars, and any object castable to - ``float``. Non-numeric values (including ``None``) return False so - the caller can fall through to a strict ``==`` comparison and get - a useful diff message. - """ - if value is None: - return False - try: - return bool(np.isnan(float(value))) - except (TypeError, ValueError): - return False - - -def _materialise(da: xr.DataArray) -> np.ndarray: - """Return a host-side numpy view regardless of backend. - - Mirrors the corpus oracle's ``_candidate_pixels``. Order matters: - a dask-of-cupy array needs ``.compute()`` before ``.get()`` because - the wrapping is graph-on-device, not device-on-graph. - """ - raw = da.data - if hasattr(raw, "compute"): - raw = raw.compute() - if hasattr(raw, "get"): - raw = raw.get() - return np.asarray(raw) - - -def _assert_pixels_close( - ref: np.ndarray, cand: np.ndarray, *, lossy: bool, label: str, -) -> None: - """Compare reference and candidate pixels. - - Float dtypes use NaN-aware ``allclose`` so a NaN-marked nodata - cell compares equal to itself. Integer dtypes go bit-exact. Lossy - fixtures (JPEG today) drop to a shape-only assertion because the - codec is intrinsically inexact. - """ - assert ref.shape == cand.shape, ( - f"{label}: shape mismatch ref={ref.shape} cand={cand.shape}" - ) - if lossy: - return - assert ref.dtype == cand.dtype, ( - f"{label}: dtype mismatch ref={ref.dtype} cand={cand.dtype}" - ) - if ref.dtype.kind == "f": - # ``allclose`` with ``equal_nan=True`` and a relative tolerance: - # every shipped fixture compares bit-exact today (the four - # eager/dask/gpu/dask+gpu backends share decode primitives and - # the VRT path leaves pixel bytes alone). The relative - # tolerance is here to absorb a hypothetical future codec / - # predictor that does a real floating-point op rather than as - # documented headroom for an existing drift. 
``rtol=1e-12`` - # tracks data magnitude so a small-magnitude fixture is not - # secretly held to a slacker bar than a large-magnitude one; - # ``atol=0`` keeps zero values strict. The strict-zero - # behaviour is deliberate: a future fixture comparing an - # exact 0.0 against a sub-ULP non-zero is a real decode - # divergence at the smallest representable value and should - # fail rather than slip past under a generous absolute floor. - # Do not loosen ``atol`` without confirming the new fixture's - # expectations. - ok = np.allclose(ref, cand, rtol=1e-12, atol=0.0, equal_nan=True) - if not ok: - # Report the worst offender so a regression is debuggable. - diff = np.abs(np.where( - np.isnan(ref) & np.isnan(cand), 0.0, ref - cand - )) - raise AssertionError( - f"{label}: pixel allclose failed; max abs diff=" - f"{np.nanmax(diff)!r}" - ) - else: - if not np.array_equal(ref, cand): - raise AssertionError( - f"{label}: integer pixels differ (bit-exact comparison " - f"failed) ref.dtype={ref.dtype}" - ) - - -def _assert_dims_and_coords( - ref: xr.DataArray, cand: xr.DataArray, *, label: str, -) -> None: - """Dims tuple and per-axis coord values + dtype agree. - - The matrix only compares fixtures whose dims tuple is identical - across backends -- multi-band axis-order divergences (JPEG-YCbCr) - live in ``_INTENTIONAL_SKIPS`` for that reason. So the dim - equality is a strict tuple compare. - """ - assert ref.dims == cand.dims, ( - f"{label}: dims mismatch ref={ref.dims!r} cand={cand.dims!r}" - ) - for axis in ref.dims: - if axis not in ref.coords: - # Reference does not carry this coord (e.g. band index - # dropped); only fail if the candidate disagrees on - # presence. 
- assert axis not in cand.coords, ( - f"{label}: candidate has coord {axis!r} that the " - f"reference does not" - ) - continue - assert axis in cand.coords, ( - f"{label}: candidate is missing coord {axis!r} that the " - f"reference carries" - ) - ref_c = np.asarray(ref.coords[axis].values) - cand_c = np.asarray(cand.coords[axis].values) - assert ref_c.dtype == cand_c.dtype, ( - f"{label}: coord {axis!r} dtype ref={ref_c.dtype} " - f"cand={cand_c.dtype}" - ) - # ``allclose`` for floating-point coords, bit-exact for ints. - # Coords drive transform reconstruction so the tolerance is - # tight; a difference at the ULP level still means a different - # transform. - if ref_c.dtype.kind == "f": - assert np.allclose(ref_c, cand_c, rtol=0.0, atol=1e-9), ( - f"{label}: coord {axis!r} values differ" - ) - else: - assert np.array_equal(ref_c, cand_c), ( - f"{label}: coord {axis!r} values differ" - ) - - -def _assert_transform_attrs( - ref: xr.DataArray, cand: xr.DataArray, *, label: str, -) -> None: - """Transform 6-tuple agrees, allowing a ULP for float rounding.""" - ref_t = ref.attrs.get("transform") - cand_t = cand.attrs.get("transform") - if ref_t is None and cand_t is None: - return - assert ref_t is not None and cand_t is not None, ( - f"{label}: transform presence differs ref={ref_t!r} cand={cand_t!r}" - ) - ref_tup = tuple(float(v) for v in ref_t) - cand_tup = tuple(float(v) for v in cand_t) - assert len(ref_tup) == 6 and len(cand_tup) == 6, ( - f"{label}: transform must be a 6-tuple, got ref={ref_tup} " - f"cand={cand_tup}" - ) - for i, (a, b) in enumerate(zip(ref_tup, cand_tup)): - assert abs(a - b) <= 1e-9, ( - f"{label}: transform[{i}] differs ref={a!r} cand={b!r}" - ) - - -def _assert_crs_attrs( - ref: xr.DataArray, cand: xr.DataArray, *, label: str, -) -> None: - """``crs`` (EPSG int or string) and ``crs_wkt`` agree by value.""" - for key in ("crs", "crs_wkt"): - ref_v = ref.attrs.get(key) - cand_v = cand.attrs.get(key) - assert ref_v == cand_v, ( - 
f"{label}: attr {key!r} differs ref={ref_v!r} cand={cand_v!r}" - ) - - -def _assert_nodata_attrs( - ref: xr.DataArray, cand: xr.DataArray, *, label: str, -) -> None: - """nodata sentinel and the masked-nodata flag agree (NaN-aware).""" - ref_nd = ref.attrs.get("nodata") - cand_nd = cand.attrs.get("nodata") - if ref_nd is None and cand_nd is None: - pass - else: - # NaN sentinel equality: float('nan') != float('nan'), but the - # two are the same nodata for our purposes. The float cast - # below accepts numpy scalars (``numpy.float32(nan)``) and - # python floats alike; a non-numeric sentinel (e.g. None on - # one side only) falls through to the ``==`` branch which - # surfaces the mismatch with a useful message. - ref_is_nan = _is_nan_sentinel(ref_nd) - cand_is_nan = _is_nan_sentinel(cand_nd) - if not (ref_is_nan and cand_is_nan): - assert ref_nd == cand_nd, ( - f"{label}: nodata differs ref={ref_nd!r} cand={cand_nd!r}" - ) - ref_masked = ref.attrs.get("masked_nodata") - cand_masked = cand.attrs.get("masked_nodata") - assert ref_masked == cand_masked, ( - f"{label}: masked_nodata differs ref={ref_masked!r} " - f"cand={cand_masked!r}" - ) - - # Pixel dtype must match too: the masked-nodata contract upcasts - # an integer raster to float so NaN can live in it, so a drift here - # is a drift in the nodata semantics, not just in pixel storage. - ref_dtype = np.dtype(ref.dtype) - cand_dtype = np.dtype(cand.dtype) - assert ref_dtype == cand_dtype, ( - f"{label}: pixel dtype differs ref={ref_dtype} cand={cand_dtype}" - ) - - -def _assert_canonical_metadata_attrs( - ref: xr.DataArray, cand: xr.DataArray, *, label: str, -) -> None: - """The #1984 canonical-metadata subset agrees on presence + value. - - An attr absent on the reference must also be absent on the - candidate, and vice versa. This catches a backend that silently - stamps an attr the others omit, or omits an attr the others stamp. 
- """ - for key in _CANONICAL_METADATA_KEYS: - in_ref = key in ref.attrs - in_cand = key in cand.attrs - assert in_ref == in_cand, ( - f"{label}: canonical attr {key!r} presence differs " - f"ref={in_ref} cand={in_cand}" - ) - if in_ref: - ref_v = ref.attrs[key] - cand_v = cand.attrs[key] - assert ref_v == cand_v, ( - f"{label}: canonical attr {key!r} value differs " - f"ref={ref_v!r} cand={cand_v!r}" - ) - - -# --------------------------------------------------------------------------- -# Eager-numpy reference cache (one read per fixture, reused per backend) -# --------------------------------------------------------------------------- - -@pytest.fixture(scope="module") -def _reference_cache() -> dict[str, xr.DataArray]: - """Cache eager-numpy reads keyed by fixture id. - - Reading the same fixture once per backend in a 6-backend x 30+ - fixture matrix would multiply IO by 6x; cache the eager-numpy - candidate (which is also the reference) once per session. - """ - return {} - - -def _reference_for( - entry: dict[str, Any], cache: dict[str, xr.DataArray], -) -> xr.DataArray: - fid = entry["id"] - if fid not in cache: - cache[fid] = open_geotiff(str(_fixture_path(entry)), **_OPTIN) - return cache[fid] - - -# --------------------------------------------------------------------------- -# The single parity-gate test -# --------------------------------------------------------------------------- - -@pytest.mark.parametrize("backend", _BACKEND_PARAMS) -@pytest.mark.parametrize("manifest_entry", _FIXTURE_PARAMS) -def test_backend_full_parity( - manifest_entry: dict[str, Any], - backend: _Backend, - _reference_cache: dict[str, xr.DataArray], -) -> None: - """Single contract gate for every (backend, fixture) cell. - - Each cell: - - 1. Looks up (or reads) the eager-numpy reference for the fixture. - 2. Reads the same fixture through ``backend.read``. - 3. Asserts pixel values, dims + coords, transform/georef, CRS, - nodata, and the curated canonical metadata attrs. 
- - Per-backend skips (``backend.skips``) and intentional fixture - skips (``_INTENTIONAL_SKIPS``, applied via the parametrize marks) - cover cells the matrix legitimately cannot compare. - """ - fixture_id = manifest_entry["id"] - path = _fixture_path(manifest_entry) - if not path.exists(): - pytest.skip( - f"fixture {fixture_id!r} has no .tif on disk; run " - f"`python -m xrspatial.geotiff.tests.golden_corpus.generate`" - ) - - # Per-backend fixture skip with a loud reason. - if fixture_id in backend.skips: - pytest.skip( - f"backend={backend.backend_id} cannot read fixture=" - f"{fixture_id}: {backend.skips[fixture_id]}" - ) - - reference = _reference_for(manifest_entry, _reference_cache) - try: - candidate = backend.read(path, fixture_id) - except Exception as exc: - # Surface the read failure in the assertion message so a CI - # log can be triaged without re-running locally. - raise AssertionError( - f"backend={backend.backend_id} failed to read fixture=" - f"{fixture_id}: {type(exc).__name__}: {exc}" - ) from exc - - label = f"fixture={fixture_id} backend={backend.backend_id}" - - # 1. Pixel values. - ref_px = _materialise(reference) - cand_px = _materialise(candidate) - _assert_pixels_close( - ref_px, cand_px, lossy=_is_lossy(manifest_entry), label=label, - ) - - # 2. Dims + coords. - _assert_dims_and_coords(reference, candidate, label=label) - - # 3. Transform / georef. - _assert_transform_attrs(reference, candidate, label=label) - - # 4. CRS. - _assert_crs_attrs(reference, candidate, label=label) - - # 5. Nodata + dtype. - _assert_nodata_attrs(reference, candidate, label=label) - - # 6. Curated canonical metadata attrs. 
- _assert_canonical_metadata_attrs(reference, candidate, label=label) - - -# --------------------------------------------------------------------------- -# Taxonomy hygiene: skip tables must reference real fixture ids -# --------------------------------------------------------------------------- - -def test_taxonomy_ids_are_in_manifest() -> None: - """Every fixture id in a skip table must exist in the manifest. - - Catches typos and stale entries left behind when a fixture is - renamed or removed: a stale entry would silently keep a known- - incompatible cell skipped even after the underlying fixture has - been replaced. - """ - manifest_ids = {e["id"] for e in _FIXTURES} - referenced: set[str] = set(_INTENTIONAL_SKIPS) - for backend in _BACKENDS: - referenced.update(backend.skips) - stale = referenced - manifest_ids - assert not stale, ( - f"skip tables reference unknown fixture ids: {sorted(stale)}" - ) - - -def test_gpu_skip_reason_is_loud() -> None: - """Per issue #2229: GPU + Dask+GPU skips must be explicit, not silent. - - Sanity-check the reason strings carry the LOUD marker and cite the - issue. If a future refactor downgrades the reason to a generic - string, this test fails to point the developer at the contract. - """ - for backend_id in ("gpu", "dask_gpu"): - backend = next(b for b in _BACKENDS if b.backend_id == backend_id) - if backend.available: - continue - # Read the reason; it must mention the contract for clarity. 
- reason = backend.unavailable_reason - assert "skipped LOUDLY" in reason or "skipped" in reason, ( - f"{backend_id} unavailable_reason is not explicit enough: " - f"{reason!r}" - ) - assert "#2229" in reason or "cupy" in reason or "dask" in reason, ( - f"{backend_id} unavailable_reason does not cite the contract " - f"or the missing dep: {reason!r}" - ) - - -# --------------------------------------------------------------------------- -# Storage-type sanity checks (the parity battery alone cannot see them) -# --------------------------------------------------------------------------- - -def _first_eligible_fixture() -> dict[str, Any] | None: - """Pick a fast, on-disk fixture none of the intentional-skip - tables flag, so the storage-type sanity checks run against a - fixture every backend reads cleanly. - """ - for entry in _FIXTURES: - if entry["id"] in _INTENTIONAL_SKIPS: - continue - if not _fixture_path(entry).exists(): - continue - # Prefer a fast fixture; the sanity check just needs *any* - # eligible fixture so falling back to slow is fine. - if "fast" in (entry.get("tags") or []): - return entry - for entry in _FIXTURES: - if entry["id"] not in _INTENTIONAL_SKIPS and _fixture_path(entry).exists(): - return entry - return None - - -@pytest.mark.skipif(not _HAS_GPU, reason=_GPU_UNAVAILABLE_REASON) -def test_gpu_backend_returns_cupy_array() -> None: - """Sanity check: the gpu row returns a cupy-backed DataArray. - - Catches the failure mode the parity battery cannot see: a silent - fallback that returns a numpy array when the caller asked for - ``gpu=True``. The pixels would still compare equal to the eager - reference and the test would pass without exercising the GPU - decode path at all. 
- """ - import cupy - entry = _first_eligible_fixture() - if entry is None: - pytest.skip("no eligible fixture on disk") - da = _read_gpu(_fixture_path(entry), entry["id"]) - assert isinstance(da.data, cupy.ndarray), ( - f"gpu backend on fixture {entry['id']!r} returned " - f"{type(da.data).__name__}, expected cupy.ndarray. " - "A silent CPU fallback under gpu=True would let the parity " - "matrix pass while exercising the wrong code path." - ) - - -@pytest.mark.skipif(not _HAS_DASK, reason=_DASK_UNAVAILABLE_REASON) -def test_dask_backend_returns_dask_array() -> None: - """Sanity check: the dask_numpy row returns a dask-backed - DataArray. Catches a regression where ``chunks=`` is silently - dropped and the read goes through the eager path. - """ - entry = _first_eligible_fixture() - if entry is None: - pytest.skip("no eligible fixture on disk") - da = _read_dask_numpy(_fixture_path(entry), entry["id"]) - assert hasattr(da.data, "dask"), ( - f"dask_numpy backend on fixture {entry['id']!r} returned " - f"data of type {type(da.data).__name__}, expected a " - "dask-backed array." - ) - - -@pytest.mark.skipif( - not (_HAS_GPU and _HAS_DASK), - reason=( - f"{_GPU_UNAVAILABLE_REASON} (or dask missing -- " - f"{_DASK_UNAVAILABLE_REASON})" - ), -) -def test_dask_gpu_backend_returns_dask_of_cupy() -> None: - """Sanity check: the dask_gpu row returns a dask-graph-of-cupy - DataArray. Both layers must be present; a regression that strips - one would compute the right pixels via a different storage type. 
- """ - import cupy - entry = _first_eligible_fixture() - if entry is None: - pytest.skip("no eligible fixture on disk") - da = _read_dask_gpu(_fixture_path(entry), entry["id"]) - assert hasattr(da.data, "dask"), ( - f"dask_gpu backend on fixture {entry['id']!r} dropped the " - f"dask wrapping: data is {type(da.data).__name__}" - ) - # Compute one chunk to peek at the device-side type without - # materialising the whole array; ``_meta`` carries the chunk - # prototype that dask uses for shape/dtype introspection. - meta = getattr(da.data, "_meta", None) - assert isinstance(meta, cupy.ndarray), ( - f"dask_gpu backend on fixture {entry['id']!r} carries a " - f"non-cupy chunk prototype: {type(meta).__name__}. A " - "dask-of-numpy graph would compute the right pixels but " - "skip the GPU decode path." - ) diff --git a/xrspatial/geotiff/tests/test_backend_kwarg_parity_1561.py b/xrspatial/geotiff/tests/test_backend_kwarg_parity_1561.py deleted file mode 100644 index e3f622027..000000000 --- a/xrspatial/geotiff/tests/test_backend_kwarg_parity_1561.py +++ /dev/null @@ -1,218 +0,0 @@ -"""Regression tests for issue #1561. - -``open_geotiff`` and ``to_geotiff`` route to backend-specific entry -points (``read_geotiff_dask``, ``write_geotiff_gpu``) whose kwarg sets -were narrower than the dispatcher's. The dispatcher silently dropped -the missing kwargs when it routed to the smaller-API backend. - -These tests pin the kwargs through to each backend so dispatcher calls -no longer lose them. 
-""" -from __future__ import annotations - -import importlib.util - -import numpy as np -import pytest -import xarray as xr - - -def _gpu_available() -> bool: - """True if cupy is importable and CUDA is initialized.""" - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() -_gpu_only = pytest.mark.skipif( - not _HAS_GPU, - reason="cupy + CUDA required", -) - - -@pytest.fixture -def small_tiff_path(tmp_path): - """4x6 single-band tiled tiff with a small CRS+transform.""" - from xrspatial.geotiff import to_geotiff - - arr = np.arange(24, dtype=np.float32).reshape(4, 6) - da = xr.DataArray( - arr, - dims=['y', 'x'], - coords={ - 'y': np.array([0.5, 1.5, 2.5, 3.5]), - 'x': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), - }, - attrs={'crs': 4326}, - ) - p = tmp_path / 'parity_1561_small.tif' - to_geotiff(da, str(p), tile_size=16) - return str(p), arr - - -@pytest.fixture -def small_multiband_tiff_path(tmp_path): - """4x6 three-band tiled tiff.""" - from xrspatial.geotiff import to_geotiff - - arr = np.arange(72, dtype=np.float32).reshape(4, 6, 3) - da = xr.DataArray( - arr, - dims=['y', 'x', 'band'], - coords={ - 'y': np.array([0.5, 1.5, 2.5, 3.5]), - 'x': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), - 'band': [0, 1, 2], - }, - attrs={'crs': 4326}, - ) - p = tmp_path / 'parity_1561_mb.tif' - to_geotiff(da, str(p), tile_size=16) - return str(p), arr - - -# -------------------------------------------------------------------- -# read_geotiff_dask: window / band / max_pixels now threaded through -# -------------------------------------------------------------------- - - -def test_read_geotiff_dask_window_clips_region(small_tiff_path): - """``window=`` restricts the lazy region; chunks span only the window.""" - from xrspatial.geotiff import read_geotiff_dask - - path, arr = small_tiff_path - da = read_geotiff_dask(path, chunks=2, window=(1, 2, 4, 
6)) - assert da.shape == (3, 4) - np.testing.assert_array_equal(da.values, arr[1:4, 2:6]) - - -def test_read_geotiff_dask_window_via_dispatcher(small_tiff_path): - """``open_geotiff(window=..., chunks=...)`` now keeps the window.""" - from xrspatial.geotiff import open_geotiff - - path, arr = small_tiff_path - da = open_geotiff(path, window=(0, 1, 3, 4), chunks=2) - assert da.shape == (3, 3) - np.testing.assert_array_equal(da.values, arr[0:3, 1:4]) - - -def test_read_geotiff_dask_band_selects_single_band(small_multiband_tiff_path): - """``band=`` produces a 2D DataArray with the selected band.""" - from xrspatial.geotiff import read_geotiff_dask - - path, arr = small_multiband_tiff_path - da = read_geotiff_dask(path, chunks=4, band=1) - assert da.ndim == 2 - np.testing.assert_array_equal(da.values, arr[:, :, 1]) - - -def test_read_geotiff_dask_band_via_dispatcher(small_multiband_tiff_path): - """``open_geotiff(band=..., chunks=...)`` now keeps the band.""" - from xrspatial.geotiff import open_geotiff - - path, arr = small_multiband_tiff_path - da = open_geotiff(path, band=2, chunks=4) - assert da.ndim == 2 - np.testing.assert_array_equal(da.values, arr[:, :, 2]) - - -def test_read_geotiff_dask_max_pixels_rejects_oversized(small_tiff_path): - """``max_pixels=`` rejects the windowed region up front.""" - from xrspatial.geotiff import read_geotiff_dask - - path, _ = small_tiff_path - with pytest.raises(ValueError, match="exceeds max_pixels"): - read_geotiff_dask(path, chunks=2, max_pixels=10) - - -def test_read_geotiff_dask_window_band_combined(small_multiband_tiff_path): - """``window`` and ``band`` cooperate.""" - from xrspatial.geotiff import read_geotiff_dask - - path, arr = small_multiband_tiff_path - da = read_geotiff_dask(path, chunks=2, window=(1, 1, 4, 5), band=0) - assert da.shape == (3, 4) - np.testing.assert_array_equal(da.values, arr[1:4, 1:5, 0]) - - -def test_read_geotiff_dask_invalid_window_raises(small_tiff_path): - """Out-of-bounds windows fail 
loudly instead of silently clipping.""" - from xrspatial.geotiff import read_geotiff_dask - - path, _ = small_tiff_path - with pytest.raises(ValueError, match="window=.* is outside"): - read_geotiff_dask(path, chunks=2, window=(0, 0, 100, 100)) - - -def test_read_geotiff_dask_invalid_band_raises(small_multiband_tiff_path): - """Out-of-range band indexes fail with IndexError.""" - from xrspatial.geotiff import read_geotiff_dask - - path, _ = small_multiband_tiff_path - with pytest.raises(IndexError, match="band=5 out of range"): - read_geotiff_dask(path, chunks=4, band=5) - - -# -------------------------------------------------------------------- -# write_geotiff_gpu: bigtiff / tiled / max_z_error / streaming_buffer_bytes -# now accepted (with appropriate rejections where the GPU path can't -# implement them). -# -------------------------------------------------------------------- - - -def test_write_geotiff_gpu_rejects_tiled_false(tmp_path): - """The GPU writer is tiled-only; ``tiled=False`` must fail loudly.""" - from xrspatial.geotiff import write_geotiff_gpu - - dummy = np.zeros((2, 2), dtype=np.float32) - with pytest.raises(ValueError, match="tiled=True"): - # path is irrelevant -- validation fires before any file I/O. 
- write_geotiff_gpu(dummy, str(tmp_path / 'never.tif'), tiled=False) - - -def test_write_geotiff_gpu_rejects_nonzero_max_z_error(tmp_path): - """LERC budget is not implementable on the GPU path.""" - from xrspatial.geotiff import write_geotiff_gpu - - dummy = np.zeros((2, 2), dtype=np.float32) - with pytest.raises(ValueError, match="max_z_error is not supported"): - write_geotiff_gpu(dummy, str(tmp_path / 'never.tif'), max_z_error=1.0) - - -@_gpu_only -def test_write_geotiff_gpu_accepts_streaming_buffer_bytes_as_noop(tmp_path): - """``streaming_buffer_bytes`` is accepted for API parity (no-op).""" - import cupy - - from xrspatial.geotiff import open_geotiff, write_geotiff_gpu - - arr = cupy.arange(16, dtype=cupy.float32).reshape(4, 4) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': np.arange(4, dtype=np.float64), - 'x': np.arange(4, dtype=np.float64)}) - p = tmp_path / 'parity_1561_streaming.tif' - # Argument is accepted; result must round-trip identically to a - # call without it. - write_geotiff_gpu(da, str(p), streaming_buffer_bytes=4096, tile_size=16) - rd = open_geotiff(str(p)) - np.testing.assert_array_equal(rd.values, arr.get()) - - -@_gpu_only -def test_to_geotiff_threads_tiled_false_into_gpu_dispatcher(tmp_path): - """``to_geotiff(..., gpu=True, tiled=False)`` rejects, not silently flips.""" - import cupy - - from xrspatial.geotiff import to_geotiff - - arr = cupy.zeros((2, 2), dtype=cupy.float32) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': [0.0, 1.0], 'x': [0.0, 1.0]}) - with pytest.raises(ValueError, match="tiled=False"): - to_geotiff(da, str(tmp_path / 'never.tif'), - gpu=True, tiled=False) diff --git a/xrspatial/geotiff/tests/test_backend_parity_matrix.py b/xrspatial/geotiff/tests/test_backend_parity_matrix.py deleted file mode 100644 index e2fbd876e..000000000 --- a/xrspatial/geotiff/tests/test_backend_parity_matrix.py +++ /dev/null @@ -1,1034 +0,0 @@ -"""Required backend parity matrix per high-risk fixture (issue #1985, #2132). 
- -Single source of truth for "does backend X still match the reference on -fixture Z." Existing scattered parity files (``test_attrs_parity_1548.py``, -``test_backend_pixel_parity_matrix_1813.py``, etc.) stay in place as -named regression markers for their bug numbers; new parity assertions go -here. - -Harness contract ----------------- - -Every cell calls a single :func:`assert_parity` helper that checks the -same set of fields on the same fixture across every wired-up backend: - -* pixel array (byte-equal for int, NaN-aware closeness for float) -* dtype -* dims and dim order -* coord values and coord dtype (per axis) -* transform tuple (rasterio 6-tuple) -* CRS as EPSG int when present, plus ``crs_wkt`` string -* declared nodata sentinel -* masking state (``attrs.get('masked_nodata')`` from #2092) -* a small subset of canonical attrs whose round-trip semantics are - already settled in the module (``raster_type``, ``transform``, - ``crs``, ``crs_wkt``). - -Backends (issue #2132 plan) ---------------------------- - -The matrix is parametrised over up to 8 entries that span every -public dispatch path the reader supports: - -* ``numpy`` -- eager local file -* ``dask+numpy`` -- chunked local file -* ``gpu`` -- eager local file via cupy -* ``dask+gpu`` -- chunked local file via cupy -* ``vrt-eager`` -- ``.vrt`` mosaic, eager -* ``vrt-dask`` -- ``.vrt`` mosaic, chunked -* ``http-cog`` -- HTTP range-read of a COG -* ``fsspec-memory`` -- ``memory://`` URI through fsspec - -GPU rows skip when cupy + CUDA are missing. HTTP and fsspec rows skip -when their network or fsspec deps are absent. VRT rows are gated by -the writer being able to lay out the mosaic on disk -- always true on -local filesystems. - -Cells that pair a backend with a source that physically cannot be -fed to it (e.g. a HTTP URL into ``vrt-eager``) skip via the -per-backend ``compat`` predicate on :class:`_BackendSpec`. 
-""" -from __future__ import annotations - -import http.server -import importlib.util -import socketserver -import threading -from dataclasses import dataclass, field -from pathlib import Path -from typing import Any, Callable - -import numpy as np -import pytest -import xarray as xr - -from xrspatial.geotiff import open_geotiff, to_geotiff, write_vrt -from xrspatial.geotiff._errors import RotatedTransformError - -# --------------------------------------------------------------------------- -# Environment gating -# --------------------------------------------------------------------------- - - -def _gpu_available() -> bool: - """True iff cupy is importable and the CUDA runtime is available. - - Mirrors the helper in ``test_backend_pixel_parity_matrix_1813.py`` - so GPU cells skip cleanly on hosts where cupy is installed but - CUDA is not. - """ - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: # pragma: no cover - defensive - return False - - -_HAS_GPU = _gpu_available() -_HAS_TIFFFILE = importlib.util.find_spec("tifffile") is not None -_HAS_FSSPEC = importlib.util.find_spec("fsspec") is not None - -_skip_no_gpu = pytest.mark.skipif(not _HAS_GPU, reason="cupy + CUDA required") -_skip_no_tifffile = pytest.mark.skipif( - not _HAS_TIFFFILE, reason="tifffile required for MinIsWhite fixture") -_skip_no_fsspec = pytest.mark.skipif( - not _HAS_FSSPEC, reason="fsspec required for memory:// source") - - -# --------------------------------------------------------------------------- -# Source-type taxonomy -# --------------------------------------------------------------------------- - -# Source types name how the fixture is delivered to ``open_geotiff``. The -# read backends accept a subset of source types; the compatibility matrix -# lives on :class:`_BackendSpec.compat`. 
-_SRC_LOCAL_TIFF = "local-tiff" -_SRC_LOCAL_VRT = "local-vrt" -_SRC_HTTP = "http" -_SRC_FSSPEC = "fsspec" - - -# --------------------------------------------------------------------------- -# Backend descriptors -# --------------------------------------------------------------------------- - -@dataclass(frozen=True) -class _BackendSpec: - """Declarative description of one read backend. - - Attributes - ---------- - backend_id - Stable id used in the parametrize call. Appears in test names. - kwargs - Static ``open_geotiff`` kwargs that select this backend. - compat - Set of source-type ids this backend accepts. Cells with an - incompatible (backend, source) pair skip with a clear reason. - marks - Pytest marks (e.g. skipif) applied to every cell using this - backend. Used to gate GPU and fsspec backends behind their - optional deps. - source_type_override - If set, the matrix dispatches the fixture path through this - source type rather than the fixture's native type. Used by the - HTTP and fsspec backends to deliver the same on-disk TIFF - through a different transport. - """ - - backend_id: str - kwargs: dict[str, Any] - compat: frozenset[str] - marks: tuple = field(default_factory=tuple) - source_type_override: str | None = None - - -_BACKENDS: list[_BackendSpec] = [ - _BackendSpec( - backend_id="numpy", - kwargs={}, - # VRT fixtures are owned by the ``vrt-eager`` / ``vrt-dask`` - # rows below; routing them through ``numpy`` too would - # duplicate identical cells. - compat=frozenset({_SRC_LOCAL_TIFF, _SRC_HTTP, _SRC_FSSPEC}), - ), - _BackendSpec( - backend_id="dask+numpy", - kwargs={"chunks": 16}, - # Dask path supports fsspec URIs (#1749) but does not accept - # raw BytesIO. VRT lives on the ``vrt-dask`` row. - compat=frozenset({_SRC_LOCAL_TIFF, _SRC_FSSPEC}), - ), - _BackendSpec( - backend_id="gpu", - kwargs={"gpu": True}, - # GPU reader is local-file only. HTTP / fsspec deliver bytes - # through code paths the GPU reader does not consume. 
- compat=frozenset({_SRC_LOCAL_TIFF}), - marks=(_skip_no_gpu,), - ), - _BackendSpec( - backend_id="dask+gpu", - kwargs={"gpu": True, "chunks": 16}, - compat=frozenset({_SRC_LOCAL_TIFF}), - marks=(_skip_no_gpu,), - ), - _BackendSpec( - backend_id="vrt-eager", - kwargs={}, - # VRT-only backend: only the VRT fixture is in scope. - compat=frozenset({_SRC_LOCAL_VRT}), - ), - _BackendSpec( - backend_id="vrt-dask", - kwargs={"chunks": 16}, - compat=frozenset({_SRC_LOCAL_VRT}), - ), - _BackendSpec( - backend_id="http-cog", - kwargs={}, - # HTTP backend re-routes any local TIFF fixture through a - # loopback HTTP server. Not all fixtures are valid COGs but - # the HTTP reader will still pull the bytes via range reads - # for any TIFF that the local server can serve. - compat=frozenset({_SRC_LOCAL_TIFF}), - source_type_override=_SRC_HTTP, - ), - _BackendSpec( - backend_id="fsspec-memory", - kwargs={}, - # fsspec memory:// route accepts any local TIFF fixture - # whose bytes can be uploaded into the in-memory filesystem. - compat=frozenset({_SRC_LOCAL_TIFF}), - source_type_override=_SRC_FSSPEC, - marks=(_skip_no_fsspec,), - ), -] - - -def _backend_params() -> list: - """Build the pytest.param list for the backend matrix.""" - out = [] - for spec in _BACKENDS: - out.append(pytest.param(spec, id=spec.backend_id, marks=spec.marks)) - return out - - -# --------------------------------------------------------------------------- -# Fixture descriptors -# --------------------------------------------------------------------------- - -@dataclass(frozen=True) -class _FixtureSpec: - """Declarative description of one high-risk fixture. - - Attributes - ---------- - fix_id - Stable id used in the parametrize call. Appears in test names. - dtype - Pixel dtype of the underlying array (and the on-disk SampleFormat). - expected_dims - Tuple of dim names in expected order. - expected_crs_epsg - EPSG int the read path should emit under ``attrs['crs']``. 
- expected_nodata - Declared nodata sentinel that the read path should surface under - ``attrs['nodata']``. ``None`` means the fixture has no declared - nodata; the harness then asserts ``'nodata' not in attrs``. - expected_masked - Tri-valued. ``True`` / ``False`` pin ``attrs['masked_nodata']``. - ``None`` means "do not assert" -- used for fixtures without - nodata. - source_type - How the fixture is laid out on disk. Drives the - backend-compatibility filter via :class:`_BackendSpec.compat`. - read_kwargs - Extra kwargs forwarded to every ``open_geotiff`` call for this - fixture (e.g. ``mask_nodata=False``). - marks - Pytest marks applied to every cell using this fixture (e.g. - ``_skip_no_tifffile`` for the MinIsWhite cell). - builder - Callable receiving a directory ``Path`` and the resolved target - ``Path`` (cache-key filename). Writes the file at ``target`` and - returns the final on-disk path. Most builders just return - ``target`` unchanged; sidecar-producing builders (e.g. a - ``.vrt`` over auxiliary tiles) may write multiple files and - return the entry path. - """ - - fix_id: str - dtype: np.dtype - expected_dims: tuple[str, ...] - expected_crs_epsg: int | None - expected_nodata: object - expected_masked: bool | None - source_type: str - builder: Callable[[Path, Path], Path] - read_kwargs: dict[str, Any] = field(default_factory=dict) - marks: tuple = field(default_factory=tuple) - - -def _wrap_2d(arr: np.ndarray, *, crs: int | None, - nodata: object | None = None) -> xr.DataArray: - """Wrap a 2-D numpy array as a writer-ready DataArray. - - Uses unit-pixel descending-y coords (``y = height-1 .. 0``, - ``x = 0 .. width-1``). The read-back transform tuple for a height-H - fixture is ``(1.0, 0.0, -0.5, 0.0, -1.0, H - 0.5)`` -- the half-pixel - offsets come from the PixelIsArea convention (origin is the pixel - edge, coords are pixel centres) that the writer round-trips. 
- """ - height, width = arr.shape - attrs: dict[str, Any] = {} - if crs is not None: - attrs["crs"] = crs - if nodata is not None: - attrs["nodata"] = nodata - return xr.DataArray( - arr, dims=["y", "x"], - coords={ - "y": np.arange(height - 1, -1, -1, dtype=np.float64), - "x": np.arange(width, dtype=np.float64), - }, - attrs=attrs, - ) - - -def _wrap_3d(arr: np.ndarray, *, crs: int) -> xr.DataArray: - """Wrap a 3-D (y, x, band) array as a writer-ready DataArray.""" - height, width, n_bands = arr.shape - return xr.DataArray( - arr, dims=["y", "x", "band"], - coords={ - "y": np.arange(height - 1, -1, -1, dtype=np.float64), - "x": np.arange(width, dtype=np.float64), - "band": np.arange(n_bands), - }, - attrs={"crs": crs}, - ) - - -# --------------------------------------------------------------------------- -# Fixture builders -# --------------------------------------------------------------------------- - -def _build_int16_single_band(dir_path: Path, target: Path) -> Path: - """High-risk fixture: int16 single-band stripped TIFF, EPSG:4326, no nodata.""" - del dir_path - rng = np.random.default_rng(seed=19850) - arr = rng.integers(-30000, 30000, size=(32, 32), dtype=np.int16) - to_geotiff( - _wrap_2d(arr, crs=4326), str(target), - compression="none", tiled=False, - ) - return target - - -def _build_uint16_multiband_tiled(dir_path: Path, target: Path) -> Path: - """Multiband tiled fixture: uint16, three bands, deflate-compressed.""" - del dir_path - rng = np.random.default_rng(seed=21320) - arr = rng.integers(0, 60000, size=(32, 32, 3), dtype=np.uint16) - to_geotiff( - _wrap_3d(arr, crs=4326), str(target), - compression="deflate", tiled=True, tile_size=16, - ) - return target - - -def _build_float32_with_nodata(dir_path: Path, target: Path) -> Path: - """Float32 single-band fixture with a -9999.0 nodata sentinel.""" - del dir_path - rng = np.random.default_rng(seed=21321) - arr = (rng.standard_normal((32, 32)) * 100.0).astype(np.float32) - # Sprinkle nodata sentinels 
into a few pixels so masking has work to do. - arr[0, 0] = -9999.0 - arr[5, 7] = -9999.0 - arr[31, 31] = -9999.0 - to_geotiff( - _wrap_2d(arr, crs=4326, nodata=-9999.0), str(target), - compression="none", tiled=False, - ) - return target - - -def _build_int8_unmasked(dir_path: Path, target: Path) -> Path: - """Int8 single-band fixture with a -128 nodata sentinel. - - Read back with ``mask_nodata=False`` so the literal sentinel survives - in the int8 buffer (locks the #2092 / #2127 masked-flag contract). - """ - del dir_path - rng = np.random.default_rng(seed=21322) - arr = rng.integers(-100, 100, size=(32, 32), dtype=np.int8) - arr[0, 0] = -128 - arr[4, 4] = -128 - to_geotiff( - _wrap_2d(arr, crs=4326, nodata=-128), str(target), - compression="none", tiled=False, - ) - return target - - -def _build_cog(dir_path: Path, target: Path) -> Path: - """COG fixture: float32 tiled with one overview level.""" - del dir_path - rng = np.random.default_rng(seed=21323) - arr = (rng.standard_normal((64, 64)) * 100.0).astype(np.float32) - to_geotiff( - _wrap_2d(arr, crs=4326), str(target), - compression="deflate", cog=True, tile_size=16, - overview_levels=[2], - ) - return target - - -def _build_vrt_mosaic(dir_path: Path, target: Path) -> Path: - """VRT fixture: 2-tile mosaic of float32 stripes laid out side by side.""" - tile_h, tile_w = 16, 16 - tile_paths: list[str] = [] - for c in range(2): - arr = np.full((tile_h, tile_w), - float(c + 1), dtype=np.float32) - origin_x = float(c * tile_w) - da = xr.DataArray( - arr, dims=["y", "x"], - coords={ - "y": np.arange(tile_h - 1, -1, -1, dtype=np.float64), - "x": np.arange(origin_x, origin_x + tile_w, dtype=np.float64), - }, - attrs={"crs": 4326}, - ) - p = dir_path / f"{target.stem}_tile_{c}.tif" - to_geotiff(da, str(p), compression="none", tiled=False) - tile_paths.append(str(p)) - write_vrt(str(target), tile_paths, relative=False, crs=4326) - return target - - -def _build_miniswhite(dir_path: Path, target: Path) -> Path: - 
"""MinIsWhite uint8 fixture written via tifffile (photometric=0).""" - del dir_path - import tifffile # local import: only this builder needs tifffile - rng = np.random.default_rng(seed=21324) - arr = rng.integers(0, 256, size=(32, 32), dtype=np.uint8) - tifffile.imwrite( - str(target), arr, photometric="miniswhite", - compression="none", metadata=None, - ) - return target - - -_FIXTURES: list[_FixtureSpec] = [ - _FixtureSpec( - fix_id="int16-single-band", - dtype=np.dtype("int16"), - expected_dims=("y", "x"), - expected_crs_epsg=4326, - expected_nodata=None, - expected_masked=None, - source_type=_SRC_LOCAL_TIFF, - builder=_build_int16_single_band, - ), - _FixtureSpec( - fix_id="uint16-multiband-tiled", - dtype=np.dtype("uint16"), - expected_dims=("y", "x", "band"), - expected_crs_epsg=4326, - expected_nodata=None, - expected_masked=None, - source_type=_SRC_LOCAL_TIFF, - builder=_build_uint16_multiband_tiled, - ), - _FixtureSpec( - fix_id="float32-nodata", - dtype=np.dtype("float32"), - expected_dims=("y", "x"), - expected_crs_epsg=4326, - expected_nodata=-9999.0, - expected_masked=True, - source_type=_SRC_LOCAL_TIFF, - builder=_build_float32_with_nodata, - ), - _FixtureSpec( - fix_id="int8-unmasked", - dtype=np.dtype("int8"), - expected_dims=("y", "x"), - expected_crs_epsg=4326, - expected_nodata=-128, - expected_masked=False, - source_type=_SRC_LOCAL_TIFF, - builder=_build_int8_unmasked, - read_kwargs={"mask_nodata": False}, - ), - _FixtureSpec( - fix_id="cog-float32", - dtype=np.dtype("float32"), - expected_dims=("y", "x"), - expected_crs_epsg=4326, - expected_nodata=None, - expected_masked=None, - source_type=_SRC_LOCAL_TIFF, - builder=_build_cog, - ), - _FixtureSpec( - fix_id="vrt-mosaic", - dtype=np.dtype("float32"), - expected_dims=("y", "x"), - expected_crs_epsg=4326, - expected_nodata=None, - expected_masked=None, - source_type=_SRC_LOCAL_VRT, - builder=_build_vrt_mosaic, - ), - _FixtureSpec( - fix_id="miniswhite", - dtype=np.dtype("uint8"), - 
expected_dims=("y", "x"), - expected_crs_epsg=None, - expected_nodata=None, - expected_masked=None, - source_type=_SRC_LOCAL_TIFF, - builder=_build_miniswhite, - marks=(_skip_no_tifffile,), - ), -] - - -def _fixture_params() -> list: - """Build the pytest.param list for the fixture matrix.""" - return [pytest.param(spec, id=spec.fix_id, marks=spec.marks) - for spec in _FIXTURES] - - -@pytest.fixture(scope="session") -def _parity_matrix_dir(tmp_path_factory): - """Session-scoped scratch dir, one write per fixture id. - - Tests reuse files across cells. The matrix has up to 8 backends - x 7 fixtures; without caching every backend-row would rewrite the - fixture from scratch. - """ - return tmp_path_factory.mktemp("parity_matrix_2132") - - -@pytest.fixture -def parity_fixture(_parity_matrix_dir): - """Resolve a :class:`_FixtureSpec` to an on-disk path. - - Files are cached across the session: a fixture already present on - disk is returned without rewriting. - """ - dir_path = _parity_matrix_dir - - def _resolve(spec: _FixtureSpec) -> Path: - safe_id = spec.fix_id.replace("/", "-") - suffix = ".vrt" if spec.source_type == _SRC_LOCAL_VRT else ".tif" - path = dir_path / f"parity_2132_{safe_id}{suffix}" - if path.exists(): - return path - return spec.builder(dir_path, path) - return _resolve - - -# --------------------------------------------------------------------------- -# Transport adapters for the HTTP and fsspec backend rows -# --------------------------------------------------------------------------- - -class _MatrixRangeHandler(http.server.BaseHTTPRequestHandler): - """HTTP handler with Range support, serving a payload dict by path. - - The dict ``payload_by_path`` is set by the server fixture and maps - URL paths (``/parity_2132_int16-single-band.tif``) to bytes. 
- """ - - payload_by_path: dict[str, bytes] = {} - - def do_GET(self): # noqa: N802 - payload = self.payload_by_path.get(self.path) - if payload is None: - self.send_response(404) - self.end_headers() - return - rng = self.headers.get("Range") - if rng and rng.startswith("bytes="): - spec = rng[len("bytes="):] - start_s, _, end_s = spec.partition("-") - start = int(start_s) - end = int(end_s) if end_s else len(payload) - 1 - chunk = payload[start:end + 1] - self.send_response(206) - self.send_header("Content-Type", "application/octet-stream") - self.send_header( - "Content-Range", - f"bytes {start}-{start + len(chunk) - 1}/{len(payload)}", - ) - self.send_header("Content-Length", str(len(chunk))) - self.end_headers() - self.wfile.write(chunk) - return - self.send_response(200) - self.send_header("Content-Type", "application/octet-stream") - self.send_header("Content-Length", str(len(payload))) - self.end_headers() - self.wfile.write(payload) - - def log_message(self, *_args, **_kwargs): # noqa: A003 - # Silence the default access log during tests. - pass - - -@pytest.fixture(scope="session") -def _matrix_http_server_session(): - """Shared loopback HTTP server for the http-cog backend row. - - Started once per pytest session and torn down on session exit. The - payload dict on the handler is cleared between tests by the - function-scoped ``_matrix_http_server`` wrapper below; this fixture - only owns the socket and the thread. 
- """ - handler_cls = type( - "MatrixRangeHandler", (_MatrixRangeHandler,), - {"payload_by_path": dict(_MatrixRangeHandler.payload_by_path)}, - ) - httpd = socketserver.TCPServer(("127.0.0.1", 0), handler_cls) - port = httpd.server_address[1] - thread = threading.Thread(target=httpd.serve_forever, daemon=True) - thread.start() - try: - yield f"http://127.0.0.1:{port}", handler_cls - finally: - httpd.shutdown() - httpd.server_close() - - -@pytest.fixture -def _matrix_http_server(_matrix_http_server_session): - """Function-scoped HTTP server view: clears stale payloads after each test. - - Without this, the session-scoped ``payload_by_path`` dict accumulates - one entry per cell and never releases the bytes. Keeping it - function-scoped means a test only sees the URL paths it uploaded. - """ - base_url, handler_cls = _matrix_http_server_session - handler_cls.payload_by_path.clear() - try: - yield base_url, handler_cls - finally: - handler_cls.payload_by_path.clear() - - -def _deliver_via_http(spec: "_FixtureSpec | _ErrorFixtureSpec", on_disk: Path, - base_url: str, handler_cls, - monkeypatch) -> str: - """Upload an on-disk fixture into the shared HTTP server and return URL. - - The success matrix passes a :class:`_FixtureSpec`; the error - sub-matrix passes an :class:`_ErrorFixtureSpec`. Both expose - ``fix_id`` so the function consumes either. - """ - del spec # the spec is unused; signature kept for symmetry with fsspec - monkeypatch.setenv("XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS", "1") - with open(on_disk, "rb") as f: - payload = f.read() - url_path = f"/{on_disk.name}" - handler_cls.payload_by_path[url_path] = payload - return f"{base_url}{url_path}" - - -def _deliver_via_fsspec(spec: "_FixtureSpec | _ErrorFixtureSpec", - on_disk: Path) -> str: - """Pipe an on-disk fixture into fsspec's memory:// filesystem. - - Returns the ``memory://`` URI the read path should consume. 
The - memory filesystem persists for the pytest process, so the URI path - is namespaced by the fixture id to avoid collisions across cells. - """ - import fsspec - fs = fsspec.filesystem("memory") - safe_id = spec.fix_id.replace("/", "-") - uri_path = f"/parity_2132_{safe_id}.tif" - with open(on_disk, "rb") as f: - payload = f.read() - fs.pipe(uri_path, payload) - return f"memory://{uri_path}" - - -# --------------------------------------------------------------------------- -# Materialisation + comparison helpers -# --------------------------------------------------------------------------- - -def _materialise(da: xr.DataArray) -> np.ndarray: - """Return a numpy view of ``da.data`` regardless of backend.""" - raw = da.data - if hasattr(raw, "compute"): - raw = raw.compute() - if hasattr(raw, "get"): - raw = raw.get() - return np.asarray(raw) - - -def _coord_view(da: xr.DataArray, name: str) -> np.ndarray: - return np.asarray(da.coords[name].values) - - -def _assert_pixels_equal(ref: np.ndarray, actual: np.ndarray, *, label: str) -> None: - """Pixel equality, dtype-aware. - - Integer arrays must be byte-identical; float arrays compare NaN-aware - with ``equal_nan=True``. Diverging dtypes always fail -- a backend - that silently upcasts has a bug. 
- """ - assert ref.dtype == actual.dtype, ( - f"{label}: dtype differs ref={ref.dtype} actual={actual.dtype}" - ) - assert ref.shape == actual.shape, ( - f"{label}: shape differs ref={ref.shape} actual={actual.shape}" - ) - if ref.dtype.kind == "f": - assert np.array_equal(ref, actual, equal_nan=True), ( - f"{label}: float pixels differ (NaN-aware)" - ) - else: - assert ref.tobytes() == actual.tobytes(), ( - f"{label}: integer pixel bytes differ" - ) - - -# --------------------------------------------------------------------------- -# The matrix cell -# --------------------------------------------------------------------------- - -def assert_parity( - da: xr.DataArray, - spec: _FixtureSpec, - *, - ref: xr.DataArray, - label: str, -) -> None: - """Assert every parity field for one (fixture, backend) cell. - - Run against an already-read DataArray rather than re-opening here so - the same helper applies to both ``open_geotiff(path, **kwargs)`` and - the explicit ``read_geotiff_dask`` / ``read_geotiff_gpu`` / - ``read_vrt`` entry points wired up in follow-up PRs. ``ref`` is the - eager-numpy read of the same fixture, used as the reference for the - pixel array, coord values, dims, and transform tuple. - - ``spec.dtype`` and ``spec.expected_crs_epsg`` / - ``spec.expected_nodata`` are asserted against the actual - independently of the reference, so a bug that silently changes - them in *every* backend still fails this cell. - """ - # Pixel array, dtype, shape. - actual_arr = _materialise(da) - _assert_pixels_equal( - _materialise(ref), actual_arr, label=label, - ) - - # Dtype against the spec, not just against the reference. Catches a - # silent upcast that the reference would also exhibit. - assert actual_arr.dtype == spec.dtype, ( - f"{label}: dtype {actual_arr.dtype} != spec dtype {spec.dtype}" - ) - - # Dims + order. 
- assert da.dims == spec.expected_dims, ( - f"{label}: dims {da.dims!r} != expected {spec.expected_dims!r}" - ) - - # Coord values and coord dtype, per axis. Skip axes that the - # reference does not carry as a coord (e.g. ``band`` for some - # multiband layouts when the writer drops the index). - for axis in spec.expected_dims: - if axis not in ref.coords: - continue - ref_c = _coord_view(ref, axis) - actual_c = _coord_view(da, axis) - assert ref_c.dtype == actual_c.dtype, ( - f"{label}: coord {axis!r} dtype " - f"ref={ref_c.dtype} actual={actual_c.dtype}" - ) - assert ref_c.tobytes() == actual_c.tobytes(), ( - f"{label}: coord {axis!r} bytes differ" - ) - - # Transform tuple. The VRT path uses ``rasterio.Affine`` instances - # which compare equal to 6-tuples via ``__eq__``. - ref_t = ref.attrs.get("transform") - actual_t = da.attrs.get("transform") - assert ref_t == actual_t, ( - f"{label}: transform tuple differs ref={ref_t!r} actual={actual_t!r}" - ) - - # CRS: EPSG int + WKT string. - if spec.expected_crs_epsg is not None: - assert da.attrs.get("crs") == spec.expected_crs_epsg, ( - f"{label}: attrs['crs'] {da.attrs.get('crs')!r} != " - f"expected {spec.expected_crs_epsg!r}" - ) - ref_wkt = ref.attrs.get("crs_wkt") - actual_wkt = da.attrs.get("crs_wkt") - assert ref_wkt == actual_wkt, ( - f"{label}: crs_wkt differs ref={ref_wkt!r} actual={actual_wkt!r}" - ) - - # Nodata sentinel + masking state. - if spec.expected_nodata is None: - assert "nodata" not in da.attrs, ( - f"{label}: fixture declares no nodata but attrs['nodata']=" - f"{da.attrs.get('nodata')!r}" - ) - else: - assert da.attrs.get("nodata") == spec.expected_nodata, ( - f"{label}: attrs['nodata'] {da.attrs.get('nodata')!r} != " - f"expected {spec.expected_nodata!r}" - ) - - # Masking state: ``attrs['masked_nodata']`` reflects whether the - # reader replaced sentinel pixels with NaN (#2092 / #2127). The - # contract is fixed once a fixture declares a sentinel. 
- if spec.expected_masked is not None: - actual_masked = da.attrs.get("masked_nodata") - assert actual_masked == spec.expected_masked, ( - f"{label}: attrs['masked_nodata'] {actual_masked!r} != " - f"expected {spec.expected_masked!r}" - ) - - # Selected canonical attrs: reference and actual agree on presence - # and value. The list is intentionally narrow until issue #1984's - # contract version stamp lands. - canonical_keys = ("raster_type", "transform", "crs", "crs_wkt") - for key in canonical_keys: - ref_v = ref.attrs.get(key) - actual_v = da.attrs.get(key) - assert ref_v == actual_v, ( - f"{label}: canonical attr {key!r} differs " - f"ref={ref_v!r} actual={actual_v!r}" - ) - - -# --------------------------------------------------------------------------- -# Source-delivery wrapper: hands one fixture to a specific source type -# --------------------------------------------------------------------------- - -def _resolve_source( - spec: _FixtureSpec, on_disk: Path, backend: _BackendSpec, - *, - http_state, monkeypatch, -) -> object: - """Return the value that should be passed as ``source`` to ``open_geotiff``. - - Most backends consume the on-disk path verbatim. The ``http-cog`` - and ``fsspec-memory`` backends override the source type, so the - fixture bytes are re-served through the requested transport. 
- """ - target_type = backend.source_type_override or spec.source_type - if target_type == _SRC_LOCAL_TIFF or target_type == _SRC_LOCAL_VRT: - return str(on_disk) - if target_type == _SRC_HTTP: - base_url, handler_cls = http_state - return _deliver_via_http(spec, on_disk, base_url, handler_cls, monkeypatch) - if target_type == _SRC_FSSPEC: - return _deliver_via_fsspec(spec, on_disk) - raise AssertionError(f"unknown source type: {target_type}") - - -# --------------------------------------------------------------------------- -# The single matrix test entry point -# --------------------------------------------------------------------------- - -@pytest.mark.parametrize("spec", _fixture_params()) -@pytest.mark.parametrize("backend", _backend_params()) -def test_backend_parity_matrix( - parity_fixture, spec, backend, - _matrix_http_server, monkeypatch, -): - """One cell per (fixture, backend). Asserts every parity field. - - A new backend or fixture lights up automatically on the next pytest - run -- no per-cell test function needed. Incompatible (backend, - source) pairs skip cleanly rather than failing. - """ - if spec.source_type not in backend.compat: - pytest.skip( - f"backend={backend.backend_id} does not consume source_type=" - f"{spec.source_type} (fixture={spec.fix_id})" - ) - - path = parity_fixture(spec) - - # Eager-numpy reference: read the same on-disk fixture through the - # default backend so the matrix compares like-for-like. - ref = open_geotiff(str(path), **spec.read_kwargs) - - # Resolve the source the backend should actually consume (the - # on-disk path for local backends, an HTTP URL for the HTTP row, - # or a memory:// URI for the fsspec row). 
- source = _resolve_source( - spec, path, backend, - http_state=_matrix_http_server, monkeypatch=monkeypatch, - ) - - da = open_geotiff(source, **backend.kwargs, **spec.read_kwargs) - label = ( - f"fixture={spec.fix_id} backend={backend.backend_id} " - f"kwargs={backend.kwargs}" - ) - assert_parity(da, spec, ref=ref, label=label) - - -# --------------------------------------------------------------------------- -# Error-fixture sub-matrix: rotated ModelTransformationTag without opt-in -# --------------------------------------------------------------------------- - -_ROTATED_M = ( - 8.660254037844387, -5.0, 0.0, 100.0, # x row (30 deg rotation, pix=10) - 5.0, 8.660254037844387, 0.0, 200.0, # y row - 0.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 1.0, -) - - -def _write_rotated_tiff(path: Path, arr: np.ndarray) -> None: - """Hand-build a TIFF with a rotated ``ModelTransformationTag``. - - Mirrors the minimal writer used by - ``test_allow_rotated_geotiff_2115.py`` so the matrix can assert - error behaviour without depending on rasterio / GDAL. - """ - import struct - h, w = arr.shape - arr = np.ascontiguousarray(arr.astype(" Path: - del dir_path - arr = np.arange(20, dtype=" Path: - safe_id = spec.fix_id.replace("/", "-") - path = dir_path / f"parity_2132_err_{safe_id}.tif" - if path.exists(): - return path - return spec.builder(dir_path, path) - return _resolve - - -@pytest.mark.parametrize("error_spec", _ERROR_FIXTURES, - ids=lambda s: s.fix_id) -@pytest.mark.parametrize("backend", _backend_params()) -def test_backend_parity_matrix_errors( - error_parity_fixture, error_spec, backend, - _matrix_http_server, monkeypatch, -): - """Error fixtures raise the same exception on every compatible backend. - - Backends incompatible with the error fixture's source type skip; - every remaining cell asserts the same ``pytest.raises`` contract. 
- """ - if error_spec.source_type not in backend.compat: - pytest.skip( - f"backend={backend.backend_id} does not consume source_type=" - f"{error_spec.source_type}" - ) - - path = error_parity_fixture(error_spec) - - # Re-route the path through the requested transport (HTTP, fsspec) - # so the error surfaces on the same code path as the success - # matrix. - source = _resolve_source( - error_spec, path, backend, - http_state=_matrix_http_server, monkeypatch=monkeypatch, - ) - - with pytest.raises(error_spec.exc, match=error_spec.match): - # ``open_geotiff`` may return lazily for chunked reads, so - # force a materialisation inside the ``pytest.raises`` block - # so the error surfaces here regardless of laziness. - out = open_geotiff(source, **backend.kwargs) - _materialise(out) diff --git a/xrspatial/geotiff/tests/test_miniswhite_backend_parity_1797.py b/xrspatial/geotiff/tests/test_miniswhite_backend_parity_1797.py deleted file mode 100644 index 0c177eaba..000000000 --- a/xrspatial/geotiff/tests/test_miniswhite_backend_parity_1797.py +++ /dev/null @@ -1,123 +0,0 @@ -"""MinIsWhite photometric handling must be backend-consistent (#1797).""" -from __future__ import annotations - -import http.server -import importlib.util -import socketserver -import threading - -import numpy as np -import pytest - -from xrspatial.geotiff import open_geotiff - -tifffile = pytest.importorskip("tifffile") - - -def _gpu_available() -> bool: - """True if cupy is importable and CUDA is initialised.""" - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() -_gpu_only = pytest.mark.skipif( - not _HAS_GPU, - reason="cupy + CUDA required", -) - - -class _RangeHandler(http.server.BaseHTTPRequestHandler): - payload: bytes = b'' - - def do_GET(self): # noqa: N802 - rng = self.headers.get('Range') - if rng and rng.startswith('bytes='): - spec = 
rng[len('bytes='):] - start_s, _, end_s = spec.partition('-') - start = int(start_s) - end = int(end_s) if end_s else len(self.payload) - 1 - chunk = self.payload[start:end + 1] - self.send_response(206) - self.send_header('Content-Type', 'application/octet-stream') - self.send_header( - 'Content-Range', - f'bytes {start}-{start + len(chunk) - 1}/{len(self.payload)}', - ) - self.send_header('Content-Length', str(len(chunk))) - self.end_headers() - self.wfile.write(chunk) - return - self.send_response(200) - self.send_header('Content-Type', 'application/octet-stream') - self.send_header('Content-Length', str(len(self.payload))) - self.end_headers() - self.wfile.write(self.payload) - - def log_message(self, *_args, **_kwargs): - pass - - -def _serve(payload: bytes): - handler_cls = type( - 'RangeHandler1797', (_RangeHandler,), {'payload': payload} - ) - httpd = socketserver.TCPServer(('127.0.0.1', 0), handler_cls) - port = httpd.server_address[1] - thread = threading.Thread(target=httpd.serve_forever, daemon=True) - thread.start() - return httpd, port - - -@pytest.fixture -def miniswhite_http_url(tmp_path, monkeypatch): - monkeypatch.setenv('XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS', '1') - stored = np.array([[0, 1, 2], [10, 128, 255]], dtype=np.uint8) - path = tmp_path / "tmp_1797_miniswhite.tif" - tifffile.imwrite(str(path), stored, photometric='miniswhite') - httpd, port = _serve(path.read_bytes()) - try: - yield f'http://127.0.0.1:{port}/tmp_1797_miniswhite.tif', stored - finally: - httpd.shutdown() - httpd.server_close() - - -def test_http_miniswhite_matches_local_reader(miniswhite_http_url): - url, stored = miniswhite_http_url - - got = open_geotiff(url) - - np.testing.assert_array_equal(got.values, np.iinfo(stored.dtype).max - stored) - - -def test_http_dask_miniswhite_matches_local_reader(miniswhite_http_url): - url, stored = miniswhite_http_url - - got = open_geotiff(url, chunks=2).compute() - - np.testing.assert_array_equal(got.values, 
np.iinfo(stored.dtype).max - stored) - - -@_gpu_only -def test_gpu_miniswhite_matches_cpu_reader(tmp_path): - from xrspatial.geotiff._writer import write - - stored = np.array([[0, 1, 2], [10, 128, 255]], dtype=np.uint8) - path = str(tmp_path / "tmp_1797_miniswhite_gpu.tif") - write(stored, path, compression='deflate', tiled=True, tile_size=16, - photometric='miniswhite') - - cpu = open_geotiff(path) - gpu = open_geotiff(path, gpu=True) - - # After #1836 the writer pre-inverts MinIsWhite pixels so the reader's - # unconditional inversion restores the user-domain values -- the - # round-trip is the identity for both backends. - np.testing.assert_array_equal(cpu.values, stored) - np.testing.assert_array_equal(gpu.data.get(), cpu.values) From a3b51376f6786af7b4957a0ee93516348fdd81bd Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Mon, 25 May 2026 20:27:36 -0700 Subject: [PATCH 2/3] Address review suggestions and nits (#2398) * Fix module docstrings on test_backend_matrix.py and test_pixel_equality.py: three -> four sections. * Drop unused _FP_GEOREF_KEYS and _FP_NODATA_KEYS constants; only _FP_CANONICAL_METADATA_KEYS is consumed. * Add _FP_FIXTURES_DIR = None fallback so the gated path never raises NameError if a future refactor drops the guard. * Replace per-file _skip_no_gpu definition with the shared requires_gpu marker from _helpers/markers.py in both files. * Replace the ignored _fixture_id parameter on _fp_read_* with *_ so the unused-arg signal is gone. * Drop redundant ``from pathlib import Path`` in test_backend_matrix.py; alias Path = pathlib.Path to keep existing signatures working. 
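
For illustration only, a minimal sketch of the fallback pattern described in the ``_FP_FIXTURES_DIR = None`` bullet above. The module name ``hypothetical_corpus_generator``, the ``FIXTURES_DIR`` constant, and the ``fixture_path`` helper are assumptions made for this sketch and are not names from the patch; the real gate in test_backend_matrix.py may differ.

```python
import importlib.util
import pathlib

# Hypothetical optional dependency; stands in for the corpus generator
# module that the real test file gates on.
_HAS_CORPUS = (
    importlib.util.find_spec("hypothetical_corpus_generator") is not None
)

if _HAS_CORPUS:
    FIXTURES_DIR = pathlib.Path("fixtures")
else:
    # Bind the name on both branches so a later reference sees None
    # instead of raising NameError when the gate above is false.
    FIXTURES_DIR = None


def fixture_path(name):
    """Return the fixture path, or None when the corpus is unavailable."""
    if FIXTURES_DIR is None:
        return None
    return FIXTURES_DIR / name
```

With the name always bound, callers can test for ``None`` explicitly, and static analysis no longer flags the constant as possibly undefined.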
--- .../tests/parity/test_backend_matrix.py | 47 +++++++++---------- .../tests/parity/test_pixel_equality.py | 7 +-- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/xrspatial/geotiff/tests/parity/test_backend_matrix.py b/xrspatial/geotiff/tests/parity/test_backend_matrix.py index c822cc33a..ed636ec0b 100644 --- a/xrspatial/geotiff/tests/parity/test_backend_matrix.py +++ b/xrspatial/geotiff/tests/parity/test_backend_matrix.py @@ -1,16 +1,19 @@ """Matrix-style backend parity across high-risk fixtures. Single source of truth for "does backend X still match the eager-numpy -reference on fixture Z." Covers three layers of parity: +reference on fixture Z." Four sections: -* The high-risk fixture matrix: every (backend, fixture) cell runs - through ``assert_parity`` plus an error sub-matrix. -* Full-fixture parity over the golden corpus, using the manifest as +* High-risk fixture matrix plus an error sub-matrix: every + (backend, fixture) cell runs through ``assert_parity``. +* Full-corpus parity over the golden corpus, using the manifest as the fixture set and the same ``open_geotiff`` entry-point across every backend. -* Attrs-key parity: the set of attrs emitted by each backend agrees - with the eager-numpy baseline, with a documented carve-out for +* Canonical-attrs parity: each backend stamps the same canonical + attrs for the same fixture, with a documented carve-out for backend-specific keys. +* Pass-through TIFF tag parity: ``x_resolution``, ``y_resolution``, + ``resolution_unit``, ``image_description``, and ``extra_samples`` + agree across the four core backends. Harness contract ---------------- @@ -62,9 +65,11 @@ import socketserver import threading from dataclasses import dataclass, field -from pathlib import Path from typing import Any, Callable +# Alias so existing base-section signatures that say ``Path`` keep working. 
+Path = pathlib.Path
+
 import numpy as np
 import pytest
 import xarray as xr
@@ -72,7 +77,7 @@
 from xrspatial.geotiff import open_geotiff, read_vrt, to_geotiff, write_vrt
 from xrspatial.geotiff._errors import RotatedTransformError
 
-from .._helpers.markers import gpu_available
+from .._helpers.markers import gpu_available, requires_gpu
 
 # ---------------------------------------------------------------------------
 # Environment gating
@@ -83,7 +88,8 @@
 _HAS_FSSPEC = importlib.util.find_spec("fsspec") is not None
 _HAS_DASK = importlib.util.find_spec("dask") is not None
 
-_skip_no_gpu = pytest.mark.skipif(not _HAS_GPU, reason="cupy + CUDA required")
+# Use the shared marker from ``_helpers/markers.py`` for the GPU gate.
+_skip_no_gpu = requires_gpu
 _skip_no_tifffile = pytest.mark.skipif(
     not _HAS_TIFFFILE, reason="tifffile required for MinIsWhite fixture")
 _skip_no_fsspec = pytest.mark.skipif(
@@ -1048,23 +1054,16 @@ def test_backend_parity_matrix_errors(
     _FP_FIXTURES_DIR = (
         pathlib.Path(_fp_generate.__file__).resolve().parent / "fixtures"
     )
+else:
+    # Defined so attribute access in gated paths never raises NameError
+    # under static analysis or a future refactor that drops a guard.
+    _FP_FIXTURES_DIR = None
 
 
 # Chunk size for the dask rows. Most corpus fixtures are 64x64 or
 # smaller, so 32 produces either a 2x2 chunk grid or a single chunk.
 _FP_CHUNK_SIZE = 32
 
-_FP_GEOREF_KEYS: tuple[str, ...] = (
-    "transform",
-    "crs",
-    "crs_wkt",
-)
-
-_FP_NODATA_KEYS: tuple[str, ...] = (
-    "nodata",
-    "masked_nodata",
-)
-
 _FP_CANONICAL_METADATA_KEYS: tuple[str, ...] = (
     "raster_type",
     "x_resolution",
@@ -1137,20 +1136,20 @@ class _FpBackend:
 }
 
 
-def _fp_read_eager_numpy(path: pathlib.Path, _fixture_id: str) -> xr.DataArray:
+def _fp_read_eager_numpy(path: pathlib.Path, *_: object) -> xr.DataArray:
     return open_geotiff(str(path), **_FP_OPTIN)
 
 
-def _fp_read_dask_numpy(path: pathlib.Path, _fixture_id: str) -> xr.DataArray:
+def _fp_read_dask_numpy(path: pathlib.Path, *_: object) -> xr.DataArray:
     return open_geotiff(str(path), chunks=_FP_CHUNK_SIZE, **_FP_OPTIN)
 
 
-def _fp_read_gpu(path: pathlib.Path, _fixture_id: str) -> xr.DataArray:
+def _fp_read_gpu(path: pathlib.Path, *_: object) -> xr.DataArray:
     return open_geotiff(
         str(path), gpu=True, on_gpu_failure="strict", **_FP_OPTIN)
 
 
-def _fp_read_dask_gpu(path: pathlib.Path, _fixture_id: str) -> xr.DataArray:
+def _fp_read_dask_gpu(path: pathlib.Path, *_: object) -> xr.DataArray:
     return open_geotiff(
         str(path), gpu=True, chunks=_FP_CHUNK_SIZE,
         on_gpu_failure="strict", **_FP_OPTIN,
diff --git a/xrspatial/geotiff/tests/parity/test_pixel_equality.py b/xrspatial/geotiff/tests/parity/test_pixel_equality.py
index d51374284..8790ca6ec 100644
--- a/xrspatial/geotiff/tests/parity/test_pixel_equality.py
+++ b/xrspatial/geotiff/tests/parity/test_pixel_equality.py
@@ -1,6 +1,6 @@
 """Strict pixel-equality and kwarg-threading parity across read backends.
 
-The strictest mode of backend parity. Three concerns share the file
+The strictest mode of backend parity. Four concerns share the file
 because they fail in the same ways:
 
 * Pixel-byte parity across (numpy / dask+numpy / cupy / dask+cupy) on a
@@ -33,7 +33,7 @@
 from xrspatial.geotiff import (open_geotiff, read_geotiff_dask,
                                read_geotiff_gpu, read_vrt, to_geotiff, write_vrt)
 
-from .._helpers.markers import gpu_available
+from .._helpers.markers import gpu_available, requires_gpu
 
 # ---------------------------------------------------------------------------
 # Environment gating
@@ -42,7 +42,8 @@
 _HAS_GPU = gpu_available()
 _HAS_TIFFFILE = importlib.util.find_spec("tifffile") is not None
 
-_skip_no_gpu = pytest.mark.skipif(not _HAS_GPU, reason="cupy + CUDA required")
+# Use the shared marker from ``_helpers/markers.py`` for the GPU gate.
+_skip_no_gpu = requires_gpu
 _skip_no_tifffile = pytest.mark.skipif(
     not _HAS_TIFFFILE, reason="tifffile required for MinIsWhite fixture")
 

From 8d249d2f71c382f9e326a0cbe5e21881826e3a9e Mon Sep 17 00:00:00 2001
From: Brendan Collins
Date: Mon, 25 May 2026 20:28:10 -0700
Subject: [PATCH 3/3] Remove CLUSTER_AUDIT_PR4.md before merge (#2398)

---
 xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md | 110 -------------------
 1 file changed, 110 deletions(-)
 delete mode 100644 xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md

diff --git a/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md b/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md
deleted file mode 100644
index 773c3c975..000000000
--- a/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR4.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# Cluster audit, PR 4 (backend parity)
-
-Temporary mapping document, deleted in a final commit before approval.
-
-## Folded files
-
-| Old file | New file | Notes |
-|---|---|---|
-| `test_backend_parity_matrix.py` | `parity/test_backend_matrix.py` | Core matrix harness moved verbatim (with markers re-imported from `_helpers/markers.py`). Contains `test_backend_parity_matrix` and `test_backend_parity_matrix_errors`. |
-| `test_backend_full_parity_2211.py` | `parity/test_backend_matrix.py` | Full-corpus parity gate appended with `_fp_` prefix on internals to avoid collision. Contains `test_backend_full_parity`, `test_taxonomy_ids_are_in_manifest`, `test_gpu_skip_reason_is_loud`, `test_gpu_backend_returns_cupy_array`, `test_dask_backend_returns_dask_array`, `test_dask_gpu_backend_returns_dask_of_cupy`. |
-| `test_attrs_finalization_parity_2211.py` | `parity/test_backend_matrix.py` | Appended with `_ap_` prefix on internals. Contains `test_canonical_attrs_match_across_backends`, `test_canonical_attrs_keys_match_across_backends`. Dropped: `test_backend_specific_keys_carveout_is_documented` (docstring scan no longer mapped to the new module's structure; the carve-out comment in the appended block names every key). |
-| `test_attrs_parity_1548.py` | `parity/test_backend_matrix.py` | Appended as pass-through TIFF tag parity. Contains `test_pass_through_tags_eager_numpy_baseline`, `test_pass_through_tags_dask_matches_numpy`, `test_pass_through_tags_cupy_matches_numpy`, `test_pass_through_tags_dask_cupy_matches_numpy`, `test_pass_through_tags_all_backend_keysets_equal`. |
-| `test_backend_pixel_parity_matrix_1813.py` | `parity/test_pixel_equality.py` | Strict pixel-byte parity harness moved verbatim (with markers re-imported). Test ids retain descriptive form: `stripped-uint8-none`, `tiled-float32-none`, `cog-float32-deflate`, etc. |
-| `test_backend_kwarg_parity_1561.py` | `parity/test_pixel_equality.py` | Appended as kwarg-threading section. Contains the original `read_geotiff_dask` window / band / max_pixels tests and `write_geotiff_gpu` tiled / max_z_error / streaming_buffer_bytes tests. |
-| `test_miniswhite_backend_parity_1797.py` | `parity/test_pixel_equality.py` | Appended as MinIsWhite section. Contains `test_miniswhite_http_matches_local_reader`, `test_miniswhite_http_dask_matches_local_reader`, `test_miniswhite_gpu_matches_cpu_reader`. |
-
-## Per-test mapping highlights
-
-### From `test_backend_parity_matrix.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_backend_parity_matrix[numpy-int16-single-band]` | `parity/test_backend_matrix.py::test_backend_parity_matrix[numpy-int16-single-band]` |
-| `test_backend_parity_matrix[gpu-uint16-multiband-tiled]` | `parity/test_backend_matrix.py::test_backend_parity_matrix[gpu-uint16-multiband-tiled]` |
-| `test_backend_parity_matrix_errors[numpy-rotated-no-allow_rotated]` | `parity/test_backend_matrix.py::test_backend_parity_matrix_errors[numpy-rotated-no-allow_rotated]` |
-
-(All parametrize ids preserved.)
-
-### From `test_backend_full_parity_2211.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_backend_full_parity[-]` | `parity/test_backend_matrix.py::test_backend_full_parity[-]` |
-| `test_taxonomy_ids_are_in_manifest` | `parity/test_backend_matrix.py::test_taxonomy_ids_are_in_manifest` |
-| `test_gpu_skip_reason_is_loud` | `parity/test_backend_matrix.py::test_gpu_skip_reason_is_loud` |
-| `test_gpu_backend_returns_cupy_array` | `parity/test_backend_matrix.py::test_gpu_backend_returns_cupy_array` |
-| `test_dask_backend_returns_dask_array` | `parity/test_backend_matrix.py::test_dask_backend_returns_dask_array` |
-| `test_dask_gpu_backend_returns_dask_of_cupy` | `parity/test_backend_matrix.py::test_dask_gpu_backend_returns_dask_of_cupy` |
-
-### From `test_attrs_finalization_parity_2211.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_canonical_attrs_match_across_backends[plain_float]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[plain_float]` |
-| `test_canonical_attrs_match_across_backends[float_with_nodata]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[float_with_nodata]` |
-| `test_canonical_attrs_match_across_backends[int_with_nodata]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[int_with_nodata]` |
-| `test_canonical_attrs_match_across_backends[uint8_no_nodata]` | `parity/test_backend_matrix.py::test_canonical_attrs_match_across_backends[uint8_no_nodata]` |
-| `test_canonical_attrs_keys_match_across_backends[]` | `parity/test_backend_matrix.py::test_canonical_attrs_keys_match_across_backends[]` |
-| `test_backend_specific_keys_carveout_is_documented` | dropped; the carve-out keys are now listed in a comment inside the appended section. Replacing the docstring scan with a marker comment keeps the carve-out greppable without coupling to the new module's docstring layout. |
-
-### From `test_attrs_parity_1548.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_numpy_attrs_includes_pass_through_tags` | `parity/test_backend_matrix.py::test_pass_through_tags_eager_numpy_baseline` |
-| `test_dask_attrs_match_numpy` | `parity/test_backend_matrix.py::test_pass_through_tags_dask_matches_numpy` |
-| `test_cupy_attrs_match_numpy` | `parity/test_backend_matrix.py::test_pass_through_tags_cupy_matches_numpy` |
-| `test_dask_cupy_attrs_match_numpy` | `parity/test_backend_matrix.py::test_pass_through_tags_dask_cupy_matches_numpy` |
-| `test_all_backend_attrs_keysets_equal` | `parity/test_backend_matrix.py::test_pass_through_tags_all_backend_keysets_equal` |
-
-### From `test_backend_pixel_parity_matrix_1813.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_open_geotiff_pixel_bytes_match[-]` | `parity/test_pixel_equality.py::test_open_geotiff_pixel_bytes_match[-]` |
-| `test_open_geotiff_coords_match[-]` | `parity/test_pixel_equality.py::test_open_geotiff_coords_match[-]` |
-| `test_open_geotiff_attrs_match[-]` | `parity/test_pixel_equality.py::test_open_geotiff_attrs_match[-]` |
-| `test_read_geotiff_dask_matches_open_geotiff[]` | `parity/test_pixel_equality.py::test_read_geotiff_dask_matches_open_geotiff[]` |
-| `test_read_geotiff_gpu_matches_open_geotiff[]` | `parity/test_pixel_equality.py::test_read_geotiff_gpu_matches_open_geotiff[]` |
-| `test_read_vrt_pixel_bytes_match[]` | `parity/test_pixel_equality.py::test_read_vrt_pixel_bytes_match[]` |
-| `test_read_vrt_coords_match[]` | `parity/test_pixel_equality.py::test_read_vrt_coords_match[]` |
-| `test_open_geotiff_dot_vrt_routes_to_read_vrt[]` | `parity/test_pixel_equality.py::test_open_geotiff_dot_vrt_routes_to_read_vrt[]` |
-| `test_fixture_builders_produce_readable_files[]` | `parity/test_pixel_equality.py::test_fixture_builders_produce_readable_files[]` |
-
-### From `test_backend_kwarg_parity_1561.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_read_geotiff_dask_window_clips_region` | `parity/test_pixel_equality.py::test_read_geotiff_dask_window_clips_region` |
-| `test_read_geotiff_dask_window_via_dispatcher` | `parity/test_pixel_equality.py::test_read_geotiff_dask_window_via_dispatcher` |
-| `test_read_geotiff_dask_band_selects_single_band` | `parity/test_pixel_equality.py::test_read_geotiff_dask_band_selects_single_band` |
-| `test_read_geotiff_dask_band_via_dispatcher` | `parity/test_pixel_equality.py::test_read_geotiff_dask_band_via_dispatcher` |
-| `test_read_geotiff_dask_max_pixels_rejects_oversized` | `parity/test_pixel_equality.py::test_read_geotiff_dask_max_pixels_rejects_oversized` |
-| `test_read_geotiff_dask_window_band_combined` | `parity/test_pixel_equality.py::test_read_geotiff_dask_window_band_combined` |
-| `test_read_geotiff_dask_invalid_window_raises` | `parity/test_pixel_equality.py::test_read_geotiff_dask_invalid_window_raises` |
-| `test_read_geotiff_dask_invalid_band_raises` | `parity/test_pixel_equality.py::test_read_geotiff_dask_invalid_band_raises` |
-| `test_write_geotiff_gpu_rejects_tiled_false` | `parity/test_pixel_equality.py::test_write_geotiff_gpu_rejects_tiled_false` |
-| `test_write_geotiff_gpu_rejects_nonzero_max_z_error` | `parity/test_pixel_equality.py::test_write_geotiff_gpu_rejects_nonzero_max_z_error` |
-| `test_write_geotiff_gpu_accepts_streaming_buffer_bytes_as_noop` | `parity/test_pixel_equality.py::test_write_geotiff_gpu_accepts_streaming_buffer_bytes_as_noop` |
-| `test_to_geotiff_threads_tiled_false_into_gpu_dispatcher` | `parity/test_pixel_equality.py::test_to_geotiff_threads_tiled_false_into_gpu_dispatcher` |
-
-### From `test_miniswhite_backend_parity_1797.py`
-
-| Old test id | New test id |
-|---|---|
-| `test_http_miniswhite_matches_local_reader` | `parity/test_pixel_equality.py::test_miniswhite_http_matches_local_reader` |
-| `test_http_dask_miniswhite_matches_local_reader` | `parity/test_pixel_equality.py::test_miniswhite_http_dask_matches_local_reader` |
-| `test_gpu_miniswhite_matches_cpu_reader` | `parity/test_pixel_equality.py::test_miniswhite_gpu_matches_cpu_reader` |
-
-## Files left alone (decisions)
-
-| File | Reason |
-|---|---|
-| `test_vrt_backend_parity_2321.py` | VRT-specific backend parity. Belongs to PR 6 per the epic. Not touched here. |
-
-## Updates to existing references
-
-`docs/source/reference/release_gate_geotiff.rst` rows that cited the old paths now point at the consolidated `parity/test_backend_matrix.py` / `parity/test_pixel_equality.py`. Verified by `test_release_gate_2321.py::test_release_gate_cites_only_existing_test_files`.
-
-In-source comments in other test files and source modules still reference the old filenames; they are documentation strings, not file lookups, so they do not break collection. They will be updated as those files get folded in later PRs.
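
Reviewer note (not part of the patch): the audit cites `test_release_gate_cites_only_existing_test_files` as the guard for the renamed paths, but that test's body is not shown in this series. The sketch below is only an illustration of how such a check could work. It assumes cited paths appear as double-backtick literals in the RST and are relative to the repository root. The helper name `missing_cited_tests` and the regular expression are hypothetical, not taken from the repository.

```python
import re
from pathlib import Path

# Hypothetical pattern: test-file paths cited as RST inline literals,
# e.g. ``xrspatial/geotiff/tests/parity/test_backend_matrix.py``.
_CITED_TEST = re.compile(r"``([\w./-]*test_[\w-]+\.py)``")


def missing_cited_tests(rst_text: str, repo_root: Path) -> list[str]:
    """Return cited test paths that do not exist under ``repo_root``."""
    # Deduplicate and sort so the result is stable across runs.
    cited = sorted(set(_CITED_TEST.findall(rst_text)))
    return [p for p in cited if not (repo_root / p).is_file()]
```

A gate built on this idea would assert that the returned list is empty, so a rename that leaves a stale path in the release-gate table fails loudly instead of silently pointing at a deleted file.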