diff --git a/docs/source/reference/geotiff.rst b/docs/source/reference/geotiff.rst index d6511587a..1eab532c4 100644 --- a/docs/source/reference/geotiff.rst +++ b/docs/source/reference/geotiff.rst @@ -285,9 +285,9 @@ turn the process into a port scanner. The knobs are: and ``xrspatial/geotiff/tests/test_http_read_all_bounded_2051.py``. * ``XRSPATIAL_COG_MAX_TILE_BYTES``. Per-tile / per-strip compressed byte cap (default 256 MiB). Locked by - ``xrspatial/geotiff/tests/test_local_tile_byte_cap_1664.py``, + ``xrspatial/geotiff/tests/read/test_tiling.py``, ``xrspatial/geotiff/tests/test_cloud_read_byte_limit_1928.py``, and - ``xrspatial/geotiff/tests/test_gpu_tile_byte_cap_2026_05_18.py``. + ``xrspatial/geotiff/tests/read/test_tiling.py``. * ``XRSPATIAL_GEOTIFF_HTTP_CONNECT_TIMEOUT`` and ``XRSPATIAL_GEOTIFF_HTTP_READ_TIMEOUT``. Per-request connect / read timeouts in seconds. Positive floats only; other values fall back diff --git a/docs/source/reference/release_gate_geotiff.rst b/docs/source/reference/release_gate_geotiff.rst index b616af390..58a45a0df 100644 --- a/docs/source/reference/release_gate_geotiff.rst +++ b/docs/source/reference/release_gate_geotiff.rst @@ -231,7 +231,7 @@ Local GeoTIFF read and write - stable - Lossless byte-for-byte round-trip on integer and float dtypes. - ``xrspatial/geotiff/tests/test_supported_features_tiers_2137.py``, - ``xrspatial/geotiff/tests/test_compression.py`` + ``xrspatial/geotiff/tests/read/test_compression.py`` - `#2340`_ * - Stable codec round-trip (read / write / read) - stable @@ -354,7 +354,7 @@ HTTP / fsspec reads - Tile or strip declared sizes exceeding ``XRSPATIAL_COG_MAX_TILE_BYTES`` (default 256 MiB) raise ``ValueError``. - ``xrspatial/geotiff/tests/test_cloud_read_byte_limit_1928.py``, - ``xrspatial/geotiff/tests/test_gpu_tile_byte_cap_2026_05_18.py`` + ``xrspatial/geotiff/tests/read/test_tiling.py`` - `#2344`_ * - ``max_cloud_bytes`` dispatcher pass-through - stable @@ -665,7 +665,7 @@ GPU paths (experimental) - Integer and float nodata sentinels survive the GPU read / write round-trip. - ``xrspatial/geotiff/tests/test_gpu_nodata_1542.py``, - ``xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_inplace_1934.py`` + ``xrspatial/geotiff/tests/read/test_nodata.py`` - `#2341`_ Internal-only surfaces (not promised) diff --git a/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR8.md b/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR8.md new file mode 100644 index 000000000..02fa2fc6d --- /dev/null +++ b/xrspatial/geotiff/tests/CLUSTER_AUDIT_PR8.md @@ -0,0 +1,193 @@ +# CLUSTER_AUDIT_PR8.md — Reader-path tests + +Temporary audit table mapping every old `file::test` to its new home in +the `read/` cluster. Deleted in a follow-up commit on the same branch +before merge, per the epic #2390 contract. + +## Cluster split + +PR 8 owns the reader-side cluster. The following eight files land under +`xrspatial/geotiff/tests/read/`: + +- `read/test_basic.py` — minimal read paths, band validation. +- `read/test_dtypes.py` — reader dtype handling (eager / dask / GPU). +- `read/test_compression.py` — decompression-codec round-trips and + bomb caps (DEFLATE / LZW / ZSTD / PACKBITS / LZ4 / LERC / JPEG2000 / + JPEG). +- `read/test_tiling.py` — tile / strip byte-count cap on CPU and GPU. +- `read/test_endianness.py` — big-endian multi-byte read paths. +- `read/test_nodata.py` — nodata propagation on read (GPU helper). +- `read/test_coords.py` — descending / ascending coord round-trip. +- `read/test_streaming.py` — streaming BigTIFF threshold (folds in + `xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py`). + +PR 3's `read/test_crs.py` (rotated / dropped / missing CRS) is the +parallel sibling and is left for that PR. + +## Folded files + +### `test_band_validation_1673.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `test_band_validation_1673.py::test_read_to_array_negative_band_rejected` | `read/test_basic.py::TestBandValidationLocal::test_negative_band_rejected` | Renamed, class-grouped. Same assertion. | +| `test_band_validation_1673.py::test_read_to_array_band_equal_to_samples_rejected` | `read/test_basic.py::TestBandValidationLocal::test_band_equal_to_samples_rejected` | Same. | +| `test_band_validation_1673.py::test_read_to_array_band_far_above_samples_rejected` | `read/test_basic.py::TestBandValidationLocal::test_band_far_above_samples_rejected` | Same. | +| `test_band_validation_1673.py::test_read_to_array_valid_band_still_works` | `read/test_basic.py::TestBandValidationLocal::test_valid_band_still_works` | Same. | +| `test_band_validation_1673.py::test_read_to_array_band_none_still_returns_all_bands` | `read/test_basic.py::TestBandValidationLocal::test_band_none_returns_all_bands` | Same. | +| `test_band_validation_1673.py::test_backend_parity_negative_band` | `read/test_basic.py::TestBandValidationBackendParity::test_negative_band` | Class-grouped. | +| `test_band_validation_1673.py::test_backend_parity_band_equal_to_samples` | `read/test_basic.py::TestBandValidationBackendParity::test_band_equal_to_samples` | Class-grouped. | +| (fixture) `multiband_tiff_path` | same fixture in `read/test_basic.py` | Filename in tmp_path renamed `mb_1673.tif` -> `mb_band_validation.tif`. | + +### `test_dtype_read.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `test_dtype_read.py::TestDtypeEager::*` | `read/test_dtypes.py::TestDtypeEager::*` | Verbatim. Fixture filenames renamed `test_1083_*.tif` -> `dtype_*.tif`. | +| `test_dtype_read.py::TestDtypeDask::*` | `read/test_dtypes.py::TestDtypeDask::*` | Same. | + +### `test_float16_read_1941.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestDtypeMap::*` | `read/test_dtypes.py::TestFloat16DtypeMap::*` | Renamed to disambiguate from generic dtype map tests. Body unchanged. | +| `TestEagerFloat16Read::*` | `read/test_dtypes.py::TestEagerFloat16Read::*` | Verbatim. | +| `TestPredictor3Float16::*` | `read/test_dtypes.py::TestPredictor3Float16::*` | Verbatim. | +| `TestRegressionGuards::*` | `read/test_dtypes.py::TestFloat16RegressionGuards::*` | Class renamed (no name collisions with other regression-guard classes). | + +### `test_float16_read_gpu_1941.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestEagerGPUReadFloat16::*` | `read/test_dtypes.py::TestEagerGPUReadFloat16::*` | Body unchanged. Module-level `pytestmark` skip replaced with per-method `@_gpu_only` since the consolidated file mixes GPU and non-GPU tests. | +| `TestGPUWindowedFloat16::*` | `read/test_dtypes.py::TestGPUWindowedFloat16::*` | Same. | +| `TestDaskGPUFloat16::*` | `read/test_dtypes.py::TestDaskGPUFloat16::*` | Same. | +| `TestGDSPathGatedOffForFloat16::*` | `read/test_dtypes.py::TestGDSPathGatedOffForFloat16::*` | Same. | +| `TestBackendParityFloat16::*` | `read/test_dtypes.py::TestBackendParityFloat16::*` | Same. | +| `TestPredictor3Float16GPU::*` | `read/test_dtypes.py::TestPredictor3Float16GPU::*` | Same. | + +### `test_compression.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestDeflate::*` | `read/test_compression.py::TestDeflate::*` | Verbatim. | +| `TestLZW::*` | `read/test_compression.py::TestLZW::*` | Verbatim. | +| `TestPredictor::*` | `read/test_compression.py::TestPredictor::*` | Verbatim. | +| `TestDispatch::*` | `read/test_compression.py::TestDispatch::*` | Verbatim. | + +### `test_decompression_caps.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestCodecDirect::*` | `read/test_compression.py::TestCodecDirect::*` | Verbatim. | +| `TestZstdDirect::*` | `read/test_compression.py::TestZstdDirect::*` | Verbatim. | +| `TestLz4Direct::*` | `read/test_compression.py::TestLz4Direct::*` | Verbatim. | +| `test_deflate_bomb_rejected` | `read/test_compression.py::test_deflate_bomb_rejected` | Verbatim. | +| `test_zstd_bomb_rejected` | `read/test_compression.py::test_zstd_bomb_rejected` | Verbatim. | +| `test_lz4_bomb_rejected` | `read/test_compression.py::test_lz4_bomb_rejected` | Verbatim. | +| `test_packbits_bomb_rejected` | `read/test_compression.py::test_packbits_bomb_rejected` | Verbatim. | +| `test_legitimate_high_compression_passes` | `read/test_compression.py::test_legitimate_high_compression_passes` | Verbatim. | +| `test_cap_includes_metadata_margin` | `read/test_compression.py::test_cap_includes_metadata_margin` | Verbatim. | +| `TestLercDirect::*` | `read/test_compression.py::TestLercDirect::*` | Verbatim. | +| `TestJpeg2000Direct::*` | `read/test_compression.py::TestJpeg2000Direct::*` | Verbatim. | +| `TestJpegDirect::*` | `read/test_compression.py::TestJpegDirect::*` | Verbatim. | + +### `test_local_tile_byte_cap_1664.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestLocalTileByteCap::*` | `read/test_tiling.py::TestLocalTileByteCap::*` | Verbatim. Fixture filenames renamed `forged_local_*_1664.tif` -> `forged_*.tif`. | +| `TestLocalStripByteCap::*` | `read/test_tiling.py::TestLocalStripByteCap::*` | Same. | +| `test_max_tile_bytes_env_negative_falls_back` | `read/test_tiling.py::test_max_tile_bytes_env_negative_falls_back` | Verbatim. | +| `test_max_tile_bytes_env_zero_falls_back` | `read/test_tiling.py::test_max_tile_bytes_env_zero_falls_back` | Verbatim. | +| `test_max_tile_bytes_env_garbage_falls_back` | `read/test_tiling.py::test_max_tile_bytes_env_garbage_falls_back` | Verbatim. | +| Import: `from ._helpers.tiff_surgery import ...` | `from .._helpers.tiff_surgery import ...` | One-level deeper under `read/`. | + +### `test_gpu_tile_byte_cap_2026_05_18.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestGpuTileByteCap::*` | `read/test_tiling.py::TestGpuTileByteCap::*` | Verbatim. Shares `_build_forged_tiled_cog` helper with the CPU class via a `basename` parameter so the two CPU-vs-GPU forged-tile groups do not collide on `tmp_path`. | +| `TestGpuChunkedTileByteCap::*` | `read/test_tiling.py::TestGpuChunkedTileByteCap::*` | Verbatim. | + +### `test_gpu_byteswap_1508.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `test_read_geotiff_gpu_big_endian_multibyte[*]` | `read/test_endianness.py::test_read_geotiff_gpu_big_endian_multibyte[*]` | Verbatim. | +| `test_read_geotiff_gpu_big_endian_uncompressed` | `read/test_endianness.py::test_read_geotiff_gpu_big_endian_uncompressed` | Verbatim. | +| `test_xp_byteswap_preserves_dtype` | `read/test_endianness.py::test_xp_byteswap_preserves_dtype` | Verbatim. | +| `test_xp_byteswap_uint8_passthrough` | `read/test_endianness.py::test_xp_byteswap_uint8_passthrough` | Verbatim. | + +### `test_apply_nodata_mask_gpu_inplace_1934.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `test_apply_nodata_mask_gpu_float_masks_sentinel_to_nan_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_float_masks_sentinel_to_nan` | Issue number dropped from name. Body unchanged. | +| `test_apply_nodata_mask_gpu_float_in_place_no_copy_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_float_in_place_no_copy` | Same. | +| `test_apply_nodata_mask_gpu_float_alloc_count_unchanged_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_float_alloc_count_unchanged` | Same. | +| `test_apply_nodata_mask_gpu_int_promotes_and_masks_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_int_promotes_and_masks` | Same. | +| `test_apply_nodata_mask_gpu_int_no_extra_buffer_after_astype_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_int_no_extra_buffer_after_astype` | Same. | +| `test_apply_nodata_mask_gpu_float_nan_sentinel_noop_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_float_nan_sentinel_noop` | Same. | +| `test_apply_nodata_mask_gpu_none_nodata_passthrough_1934` | `read/test_nodata.py::test_apply_nodata_mask_gpu_none_nodata_passthrough` | Same. | + +### `test_apply_nodata_mask_gpu_with_presence_removed_2208.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `test_apply_nodata_mask_gpu_with_presence_not_importable_2208` | `read/test_nodata.py::test_apply_nodata_mask_gpu_with_presence_not_importable` | Issue number dropped. Same `ImportError` assertion. | +| `test_apply_nodata_mask_gpu_still_present_2208` | `read/test_nodata.py::test_apply_nodata_mask_gpu_still_present` | Same. | + +### `test_descending_coords_1716.py` (deleted) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `test_descending_x_roundtrip` | `read/test_coords.py::TestDescendingCoordsRoundTrip::test_descending_x_roundtrip` | Class-grouped. tmp_path filenames renamed (`tmp_1716_desc_x.tif` -> `desc_x.tif`). | +| `test_ascending_y_roundtrip` | `read/test_coords.py::TestDescendingCoordsRoundTrip::test_ascending_y_roundtrip` | Same. | +| `test_descending_x_and_ascending_y_roundtrip` | `read/test_coords.py::TestDescendingCoordsRoundTrip::test_descending_x_and_ascending_y_roundtrip` | Same. | +| `test_north_up_still_uses_pixel_scale_and_tiepoint` | `read/test_coords.py::TestOrientationTagSelection::test_north_up_uses_pixel_scale_and_tiepoint` | Class-grouped, name slimmed. | +| `test_descending_x_uses_transformation_tag` | `read/test_coords.py::TestOrientationTagSelection::test_descending_x_uses_transformation_tag` | Same. | +| `test_ascending_y_uses_transformation_tag` | `read/test_coords.py::TestOrientationTagSelection::test_ascending_y_uses_transformation_tag` | Same. | + +### `xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py` (deleted — cross-directory move) + +| Old `file::test` | New `file::test_id` | Notes | +|---|---|---| +| `TestShouldUseBigTIFFStreaming::*` | `read/test_streaming.py::TestShouldUseBigTIFFStreaming::*` | Verbatim. | +| `TestStreamingBigTIFFUserOverride::*` | `read/test_streaming.py::TestStreamingBigTIFFUserOverride::*` | Verbatim. Fixture filenames renamed `*_1785.tif` -> issue-number-free. | + +## Files NOT folded in (justified) + +Several files in the prompt's "key examples" list turned out to be +writer-side or unit-level on inspection and would conflict with another +PR's surface. They are left in place for their natural cluster: + +| File | Reason left in place | +|---|---| +| `test_accuracy_1081.py` | Mixed read/write numerical accuracy with parity surface area; folding into `read/test_basic.py` would expand PR scope beyond the reader-only contract. Defer to PR 11 unit-cleanup. | +| `test_ambiguous_metadata_hooks_1987.py` | Metadata contract / parity surface — overlaps with PR 4 (parity) and PR 5 (attrs contract). | +| `test_assemble_layout_no_bytes_copy_1756.py` | Tests `_assemble_standard_layout`, `_assemble_cog_layout`, `_assemble_tiff` — writer internals. Belongs to PR 7. | +| `test_bytesio_source.py` | Mixed BytesIO read/write; round-trip surface area is large and the file already groups its own concerns coherently. Defer to PR 11. | +| `test_chunked_gpu_declared_dtype_1909.py` | Mixed dtype/dask coverage that overlaps with the parity matrix (PR 4). | +| `test_compression_docstring_1644.py` | Tests `write_geotiff_gpu` docstring + GPU writer codec acceptance — writer-side. Belongs to PR 7. | +| `test_compression_level.py` | Tests `to_geotiff(compression_level=...)` — writer-side. Belongs to PR 7. | +| `test_conflicting_crs_write_1987.py` | Writer-side (CRS conflict on write). Belongs to PR 7. | +| `test_coord_regularity_1720.py` | Tests `_coords_to_transform` validation on the writer path. Belongs to PR 7. | +| `test_coords_1813.py` | Unit tests of `xrspatial.geotiff._coords` helpers — fits `unit/` (PR 11). | +| `test_coords_to_transform_3d_1643.py` | Writer-side coord-to-transform. Belongs to PR 7. | +| `test_predictor2_big_endian.py` / `test_predictor2_big_endian_gpu_1517.py` / `test_predictor3_big_endian.py` / `test_predictor3_int_dtype*` / `test_predictor_fp_write_*` | Predictor coverage overlaps with the writer codec matrix (PR 7) and the parity matrix (PR 4). Defer to a future endianness/predictor sub-cluster rather than risk colliding mid-PR. | + +## Verification + +- 134 tests collected in `xrspatial/geotiff/tests/read/` after PR 8 (8 + modules, including PR 3's `test_crs.py` once that PR lands). +- Total `test_*.py` files removed across the PR: 13 (12 inside + `geotiff/tests/`, plus the one cross-directory move from + `xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py`). +- New `test_*.py` files added under `read/`: 8 (plus the empty + `__init__.py`). +- Net delta inside `geotiff/tests/`: -12 + 8 = -4 `test_*.py` files + (`find xrspatial/geotiff/tests -name 'test_*.py' | wc -l` goes from + 352 to 348). +- Net delta inside `xrspatial/tests/`: -1 `test_*.py` file. +- Total PR-wide `test_*.py` delta: -5. diff --git a/xrspatial/geotiff/tests/read/__init__.py b/xrspatial/geotiff/tests/read/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/xrspatial/geotiff/tests/read/test_basic.py b/xrspatial/geotiff/tests/read/test_basic.py new file mode 100644 index 000000000..c5ebe3795 --- /dev/null +++ b/xrspatial/geotiff/tests/read/test_basic.py @@ -0,0 +1,112 @@ +"""Minimal reader paths: band validation, byte-band, eager read. + +Consolidates the reader-side band-validation regression coverage +(formerly ``test_band_validation_1673.py``). The contract is that every +backend rejects out-of-range ``band`` arguments with the same typed +``IndexError`` so callers see consistent diagnostics regardless of +which path they pick. +""" +from __future__ import annotations + +import numpy as np +import pytest +import xarray as xr + + +@pytest.fixture +def multiband_tiff_path(tmp_path): + """4x6 three-band tiled tiff for band-validation tests.""" + from xrspatial.geotiff import to_geotiff + + arr = np.arange(72, dtype=np.float32).reshape(4, 6, 3) + da = xr.DataArray( + arr, + dims=['y', 'x', 'band'], + coords={ + 'y': np.array([0.5, 1.5, 2.5, 3.5]), + 'x': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), + 'band': [0, 1, 2], + }, + attrs={'crs': 4326}, + ) + p = tmp_path / 'mb_band_validation.tif' + to_geotiff(da, str(p), tile_size=16) + return str(p), arr + + +class TestBandValidationLocal: + """``read_to_array`` rejects out-of-range band indices.""" + + def test_negative_band_rejected(self, multiband_tiff_path): + """``band=-1`` no longer silently selects the last channel.""" + from xrspatial.geotiff._reader import read_to_array + + path, _ = multiband_tiff_path + with pytest.raises(IndexError, match="band=-1 out of range"): + read_to_array(path, band=-1) + + def test_band_equal_to_samples_rejected(self, multiband_tiff_path): + """``band=samples_per_pixel`` (off-by-one) raises a typed error.""" + from xrspatial.geotiff._reader import read_to_array + + path, _ = multiband_tiff_path + with pytest.raises(IndexError, match="band=3 out of range"): + read_to_array(path, band=3) + + def test_band_far_above_samples_rejected(self, multiband_tiff_path): + """A wildly out-of-range band index gives the same typed error.""" + from xrspatial.geotiff._reader import read_to_array + + path, _ = multiband_tiff_path + with pytest.raises(IndexError, match="band=103 out of range"): + read_to_array(path, band=103) + + def test_valid_band_still_works(self, multiband_tiff_path): + """Valid band indices keep working after the validation guard.""" + from xrspatial.geotiff._reader import read_to_array + + path, arr = multiband_tiff_path + out, _ = read_to_array(path, band=1) + np.testing.assert_array_equal(out, arr[:, :, 1]) + + def test_band_none_returns_all_bands(self, multiband_tiff_path): + """``band=None`` still returns the full multi-band array.""" + from xrspatial.geotiff._reader import read_to_array + + path, arr = multiband_tiff_path + out, _ = read_to_array(path) + np.testing.assert_array_equal(out, arr) + + +class TestBandValidationBackendParity: + """Local eager and dask paths agree on the rejection contract.""" + + def test_negative_band(self, multiband_tiff_path): + """Both paths raise the same error for ``band=-1``.""" + from xrspatial.geotiff import read_geotiff_dask + from xrspatial.geotiff._reader import read_to_array + + path, _ = multiband_tiff_path + + with pytest.raises(IndexError) as eager_exc: + read_to_array(path, band=-1) + with pytest.raises(IndexError) as dask_exc: + read_geotiff_dask(path, chunks=4, band=-1) + + assert "band=-1 out of range" in str(eager_exc.value) + assert "band=-1 out of range" in str(dask_exc.value) + + def test_band_equal_to_samples(self, multiband_tiff_path): + """Both paths agree on the off-by-one rejection.""" + from xrspatial.geotiff import read_geotiff_dask + from xrspatial.geotiff._reader import read_to_array + + path, _ = multiband_tiff_path + + with pytest.raises(IndexError) as eager_exc: + read_to_array(path, band=3) + with pytest.raises(IndexError) as dask_exc: + read_geotiff_dask(path, chunks=4, band=3) + + assert "band=3 out of range" in str(eager_exc.value) + assert "band=3 out of range" in str(dask_exc.value) diff --git a/xrspatial/geotiff/tests/test_decompression_caps.py b/xrspatial/geotiff/tests/read/test_compression.py similarity index 70% rename from xrspatial/geotiff/tests/test_decompression_caps.py rename to xrspatial/geotiff/tests/read/test_compression.py index 5954c8e43..19e42d583 100644 --- a/xrspatial/geotiff/tests/test_decompression_caps.py +++ b/xrspatial/geotiff/tests/read/test_compression.py @@ -1,17 +1,22 @@ -"""Tests for decompression-bomb defenses (security finding S1). - -Each codec used by the TIFF reader (deflate, zstd, lz4, packbits) accepts an -``expected_size`` argument and refuses to produce more than ~5% above that -size before raising ``ValueError``. Without these caps a small malicious -TIFF could expand to many GB during decode and OOM the reader before the -post-decode size check ran. - -Each end-to-end test here builds a minimal TIFF that declares a 1024x1024 -uint8 image (1 MiB of legitimate pixel data) and feeds in a strip whose -decoded size is several MiB. That ratio is enough to trip the cap (~1.05 -MiB) without forcing the test process to allocate a multi-gigabyte -payload host-side -- the audit's original 1024:1 framing was symbolic; -what we actually verify is "compressed size << decoded size > cap". +"""Reader compression-codec coverage. + +Consolidates: + +* ``test_compression.py`` -- codec round-trip unit tests for the + reader's decompression entry points (deflate, LZW, predictor encode / + decode, the dispatcher). +* ``test_decompression_caps.py`` -- decompression-bomb defenses + (security finding S1) across deflate, ZSTD, LZ4, PackBits, LERC, + JPEG 2000, and JPEG. + +Several classes here (``TestDeflate``, ``TestLZW``, ``TestPredictor``, +``TestDispatch``, the ``Test*Direct`` codec bomb classes, and the JPEG +SOF cap class) call the codec functions directly rather than going +through ``open_geotiff`` / ``read_to_array``. They live under ``read/`` +because the reader is the only consumer of those decode entry points; +the writer side of the same codecs is exercised from PR 7's writer +cluster. Future maintainers scanning ``read/`` should treat these as +reader-internal codec coverage rather than end-to-end read paths. """ from __future__ import annotations @@ -22,8 +27,11 @@ import numpy as np import pytest -from xrspatial.geotiff._compression import (deflate_decompress, lz4_decompress, packbits_decompress, - zstd_decompress) +from xrspatial.geotiff._compression import (COMPRESSION_DEFLATE, COMPRESSION_LZW, COMPRESSION_NONE, + compress, decompress, deflate_compress, + deflate_decompress, lz4_decompress, lzw_compress, + lzw_decompress, packbits_decompress, + predictor_decode, predictor_encode, zstd_decompress) from xrspatial.geotiff._reader import read_to_array @@ -50,9 +58,123 @@ def _module_available(name: str) -> bool: # --------------------------------------------------------------------------- -# Helpers +# Codec round-trips (formerly test_compression.py) # --------------------------------------------------------------------------- + +class TestDeflate: + def test_round_trip(self): + data = b'hello world! ' * 100 + compressed = deflate_compress(data) + assert compressed != data + assert deflate_decompress(compressed) == data + + def test_empty(self): + compressed = deflate_compress(b'') + assert deflate_decompress(compressed) == b'' + + def test_binary_data(self): + data = bytes(range(256)) * 10 + compressed = deflate_compress(data) + assert deflate_decompress(compressed) == data + + +class TestLZW: + def test_round_trip_simple(self): + data = b'ABCABCABCABC' + compressed = lzw_compress(data) + decompressed = lzw_decompress(compressed, len(data)) + assert decompressed.tobytes() == data + + def test_round_trip_repetitive(self): + data = b'\x00' * 1000 + compressed = lzw_compress(data) + decompressed = lzw_decompress(compressed, len(data)) + assert decompressed.tobytes() == data + + def test_round_trip_sequential(self): + data = bytes(range(256)) + compressed = lzw_compress(data) + decompressed = lzw_decompress(compressed, len(data)) + assert decompressed.tobytes() == data + + def test_round_trip_random(self): + rng = np.random.RandomState(42) + data = bytes(rng.randint(0, 256, size=500, dtype=np.uint8)) + compressed = lzw_compress(data) + decompressed = lzw_decompress(compressed, len(data)) + assert decompressed.tobytes() == data + + def test_round_trip_large(self): + rng = np.random.RandomState(123) + data = bytes(rng.randint(0, 256, size=10000, dtype=np.uint8)) + compressed = lzw_compress(data) + decompressed = lzw_decompress(compressed, len(data)) + assert decompressed.tobytes() == data + + def test_empty(self): + compressed = lzw_compress(b'') + decompressed = lzw_decompress(compressed, 0) + assert decompressed.tobytes() == b'' + + +class TestPredictor: + def test_round_trip_uint8(self): + # 4x4 image, 1 byte per sample + data = np.array([10, 20, 30, 40, 50, 60, 70, 80, + 90, 100, 110, 120, 130, 140, 150, 160], + dtype=np.uint8) + encoded = predictor_encode(data.copy(), 4, 4, 1) + decoded = predictor_decode(encoded.copy(), 4, 4, 1) + np.testing.assert_array_equal(decoded, data) + + def test_round_trip_float32(self): + # 2x3 image, 4 bytes per sample + arr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], dtype=np.float32) + raw = np.frombuffer(arr.tobytes(), dtype=np.uint8).copy() + encoded = predictor_encode(raw.copy(), 3, 2, 4) + decoded = predictor_decode(encoded.copy(), 3, 2, 4) + np.testing.assert_array_equal(decoded, raw) + + def test_predictor_encode_differences(self): + # First pixel unchanged, rest are differences + data = np.array([10, 20, 30, 40], dtype=np.uint8) + encoded = predictor_encode(data.copy(), 4, 1, 1) + assert encoded[0] == 10 + assert encoded[1] == 10 # 20 - 10 + assert encoded[2] == 10 # 30 - 20 + assert encoded[3] == 10 # 40 - 30 + + +class TestDispatch: + def test_none(self): + data = b'hello' + assert decompress(data, COMPRESSION_NONE).tobytes() == data + assert compress(data, COMPRESSION_NONE) == data + + def test_deflate(self): + data = b'test data ' * 50 + compressed = compress(data, COMPRESSION_DEFLATE) + assert decompress(compressed, COMPRESSION_DEFLATE).tobytes() == data + + def test_lzw(self): + data = b'ABCABC' * 20 + compressed = compress(data, COMPRESSION_LZW) + decompressed = decompress(compressed, COMPRESSION_LZW, len(data)) + assert decompressed.tobytes() == data + + def test_unsupported(self): + with pytest.raises(ValueError, match="Unsupported compression"): + decompress(b'', 99) + with pytest.raises(ValueError, match="Unsupported compression"): + compress(b'', 99) + + +# --------------------------------------------------------------------------- +# Decompression-bomb caps (formerly test_decompression_caps.py) +# --------------------------------------------------------------------------- + + def _build_tiff_with_strip(strip_bytes: bytes, *, compression: int, width: int, height: int) -> bytes: """Build a minimal little-endian uint8 TIFF with one strip of opaque bytes. @@ -64,9 +186,6 @@ def _build_tiff_with_strip(strip_bytes: bytes, *, compression: int, """ bo = '<' - # Tags: (tag_id, type_id, count, value_or_offset_bytes_4) - # We keep every tag's value inline so the TIFF body order is: - # header(8) | IFD | strip tags = [] def add_short(tag, val): @@ -88,11 +207,10 @@ def add_long(tag, val): tags.sort(key=lambda t: t[0]) num_entries = len(tags) - ifd_size = 2 + 12 * num_entries + 4 # count + entries + next-IFD + ifd_size = 2 + 12 * num_entries + 4 ifd_start = 8 strip_offset = ifd_start + ifd_size - # Patch StripOffsets (tag 273) with the real strip location. patched = [] for tag_id, typ, count, raw in tags: if tag_id == 273: @@ -101,23 +219,18 @@ def add_long(tag, val): tags = patched out = bytearray() - out += b'II' # little-endian - out += struct.pack(f'{bo}H', 42) # magic - out += struct.pack(f'{bo}I', ifd_start) # offset to IFD0 - out += struct.pack(f'{bo}H', num_entries) # IFD entry count + out += b'II' + out += struct.pack(f'{bo}H', 42) + out += struct.pack(f'{bo}I', ifd_start) + out += struct.pack(f'{bo}H', num_entries) for tag_id, typ, count, raw in tags: out += struct.pack(f'{bo}HHI', tag_id, typ, count) - # Pad raw to 4 bytes (all our tags fit inline). out += raw.ljust(4, b'\x00') - out += struct.pack(f'{bo}I', 0) # no next IFD + out += struct.pack(f'{bo}I', 0) out += strip_bytes return bytes(out) -# --------------------------------------------------------------------------- -# Codec-level direct tests -# --------------------------------------------------------------------------- - # Direct-codec bomb size: 4 MiB of zeros, well above the 1 KiB cap used # in those tests but small enough to keep CI host allocations under # control. 4 MiB / 1 KiB ~ 4000:1 is still bomb-shaped territory; the @@ -141,14 +254,8 @@ def test_deflate_legitimate_passes(self): assert out == data def test_deflate_no_cap_when_expected_size_zero(self): - """Backward-compat: ``expected_size=0`` (default) disables the cap. - - Callers that haven't been updated to supply a size must keep - getting the unbounded library decode -- otherwise a default - cap would silently break them. Round-tripping data larger than - any plausible cap proves the disable path is intact. - """ - data = b'A' * (256 * 1024) # 256 KiB, well above any cap default + """Backward-compat: ``expected_size=0`` (default) disables the cap.""" + data = b'A' * (256 * 1024) comp = zlib.compress(data, 9) out = deflate_decompress(comp) # no expected_size kwarg assert out == data @@ -168,8 +275,6 @@ def test_packbits_legitimate_passes(self): assert out == b'ABCD' def test_packbits_no_cap_when_expected_size_zero(self): - # Same literal-run pattern, no expected_size argument: the - # backward-compat path must skip the cap and decode in full. data = b'\x03ABCD' * 1024 out = packbits_decompress(data) assert out == b'ABCD' * 1024 @@ -224,15 +329,13 @@ def test_lz4_no_cap_when_expected_size_zero(self): # --------------------------------------------------------------------------- -# End-to-end TIFF tests (audit reproducer shape) +# End-to-end TIFF bomb tests (audit reproducer shape) # --------------------------------------------------------------------------- # 1024 x 1024 uint8 = 1 MiB declared image, cap is ~1.05 MiB. We feed a # strip whose decoded size is 8 MiB. The cap is exceeded by ~7x, which is # enough to prove the codec rejects rather than silently truncates, while # keeping the test's host-side allocation small enough for any CI runner. -# (The audit's original 1024:1 framing was symbolic; the defense fires the -# moment decoded > cap, not at any specific ratio.) _DECLARED_W = 1024 _DECLARED_H = 1024 _DECLARED_BYTES = _DECLARED_W * _DECLARED_H # 1 MiB @@ -271,24 +374,16 @@ def test_lz4_bomb_rejected(tmp_path): import lz4.frame payload = b'\x00' * _BOMB_BYTES strip = lz4.frame.compress(payload) - # LZ4 has a higher floor than deflate/zstd for runs of zeros, but - # still well below the bomb size. assert len(strip) < _BOMB_BYTES // 4 tiff = _build_tiff_with_strip(strip, compression=50004, width=_DECLARED_W, height=_DECLARED_H) path = tmp_path / "lz4_bomb.tif" path.write_bytes(tiff) - # LZ4 is the Experimental read tier (PR 4 of epic #2340); pass the - # opt-in so the test exercises the bomb cap rather than the codec - # gate. with pytest.raises(ValueError, match="exceed"): read_to_array(str(path), allow_experimental_codecs=True) def test_packbits_bomb_rejected(tmp_path): - # Packbits "repeat next byte 128 times" header is 0x81 0x00 (2 bytes). - # We declare 1024x1024=1 MiB image but supply a 2 MiB strip that - # decodes to 128 MiB. The cap should fire long before allocation. strip = b'\x81\x00' * (1024 * 1024) tiff = _build_tiff_with_strip(strip, compression=32773, width=_DECLARED_W, height=_DECLARED_H) @@ -302,11 +397,11 @@ def test_packbits_bomb_rejected(tmp_path): # Negative tests: legitimate high-ratio compression must still pass # --------------------------------------------------------------------------- + def test_legitimate_high_compression_passes(tmp_path): """All-zero array compresses to a fraction of declared size — must pass.""" arr = np.zeros((_DECLARED_H, _DECLARED_W), dtype=np.uint8) strip = zlib.compress(arr.tobytes(), 9) - # Confirm we actually have a high ratio (not a degenerate test). assert len(strip) < _DECLARED_BYTES // 50 tiff = _build_tiff_with_strip(strip, compression=8, width=_DECLARED_W, height=_DECLARED_H) @@ -319,14 +414,8 @@ def test_legitimate_high_compression_passes(tmp_path): def test_cap_includes_metadata_margin(): - """The cap allows ~5% of legitimate codec metadata above expected size. - - Some encoders emit small framing or trailing bytes; the cap must not - reject them. We feed a payload exactly at expected_size + a few bytes - and confirm it decodes. - """ + """The cap allows ~5% of legitimate codec metadata above expected size.""" expected = 1000 - # Decompressed size: expected + 30 (3% over). Within the 5% margin. data = b'A' * (expected + 30) comp = zlib.compress(data, 9) out = deflate_decompress(comp, expected_size=expected) @@ -334,31 +423,19 @@ def test_cap_includes_metadata_margin(): # --------------------------------------------------------------------------- -# LERC and JPEG 2000 codec-level bomb tests (issue #1625) +# LERC and JPEG 2000 codec-level bomb tests # --------------------------------------------------------------------------- -# -# LERC and JPEG 2000 use external libraries (lerc / glymur) that materialise -# the full decoded buffer before returning, so the existing post-decode -# size check in ``_decode_strip_or_tile`` fires only after the bomb is -# already in memory. The wrappers in ``_compression.py`` now query each -# codestream's declared dimensions (LERC via ``getLercBlobInfo``, JPEG 2000 -# via ``Jp2k.shape``) and raise before invoking the underlying decoder. + @pytest.mark.skipif(not _HAS_LERC, reason="lerc not installed") class TestLercDirect: def test_lerc_bomb_raises(self): - """A LERC blob whose declared dimensions exceed the cap must raise. - - Constant-value rasters compress at >700,000:1 in LERC, so a - 4096x4096 float32 (64 MiB) encodes to ~94 bytes. The cap is set to - 1 KiB, well below the declared 64 MiB, and the wrapper must reject - the blob before ``lerc.decode`` allocates the output buffer. - """ + """A LERC blob whose declared dimensions exceed the cap must raise.""" import lerc arr = np.zeros((4096, 4096), dtype=np.float32) encoded = lerc.encode(arr, 1, False, None, 0.0, 1) blob = bytes(encoded[2]) - assert len(blob) < 1024 # confirm this is a real high-ratio blob + assert len(blob) < 1024 from xrspatial.geotiff._compression import lerc_decompress with pytest.raises(ValueError, match="exceed"): lerc_decompress(blob, expected_size=1024) @@ -381,7 +458,7 @@ def test_lerc_no_cap_when_expected_size_zero(self): encoded = lerc.encode(arr, 1, False, None, 0.0, 1) blob = bytes(encoded[2]) from xrspatial.geotiff._compression import lerc_decompress - out = lerc_decompress(blob) # no expected_size + out = lerc_decompress(blob) decoded = np.frombuffer(out, dtype=np.float32).reshape(128, 128) assert decoded.shape == arr.shape @@ -389,24 +466,16 @@ def test_lerc_no_cap_when_expected_size_zero(self): @pytest.mark.skipif(not _HAS_GLYMUR, reason="glymur not installed") class TestJpeg2000Direct: def test_jpeg2000_bomb_raises(self, tmp_path): - """A JPEG 2000 codestream whose declared shape exceeds the cap raises. - - Glymur reports ``Jp2k(file).shape`` from the SIZ marker without - triggering pixel decoding, so the wrapper validates the declared - ``H * W * dtype_bytes`` against the bomb cap before calling - ``jp2[:]``. - """ + """A JPEG 2000 codestream whose declared shape exceeds the cap raises.""" import glymur - # Build a real 2000x2000 uint8 codestream (~150 bytes for zeros). arr = np.zeros((2000, 2000), dtype=np.uint8) tmp = tmp_path / "src.j2k" glymur.Jp2k(str(tmp), data=arr) blob = tmp.read_bytes() - assert len(blob) < 10_000 # confirm high ratio + assert len(blob) < 10_000 from xrspatial.geotiff._compression import jpeg2000_decompress with pytest.raises(ValueError, match="exceed"): - # declared output 4 MiB, cap 1 KiB jpeg2000_decompress( blob, width=2000, height=2000, samples=1, expected_size=1024) @@ -415,8 +484,6 @@ def test_jpeg2000_legitimate_passes(self, tmp_path): """A JPEG 2000 blob whose declared output matches expected_size passes.""" import glymur - # Use a 64x64 raster: large enough for the default 6-resolution - # OpenJPEG pyramid without tripping its min-tile-size check. arr = (np.arange(64 * 64, dtype=np.uint8) % 200).reshape(64, 64) tmp = tmp_path / "legit.j2k" glymur.Jp2k(str(tmp), data=arr) @@ -442,14 +509,7 @@ def test_jpeg2000_no_cap_when_expected_size_zero(self, tmp_path): def test_jpeg2000_unreadable_shape_fails_closed( self, tmp_path, monkeypatch): - """If the SIZ marker is unreadable, refuse to call ``jp2[:]``. - - Earlier the wrapper silently disabled the cap on - ``Jp2k.shape``/``dtype`` failure, which would let an attacker - bypass the bomb guard with a malformed-but-decodable - codestream. The current behaviour is fail-closed: raise - ``ValueError`` before any pixel-decoding work runs. - """ + """If the SIZ marker is unreadable, refuse to call ``jp2[:]``.""" import glymur arr = np.zeros((64, 64), dtype=np.uint8) tmp = tmp_path / "broken.j2k" @@ -479,31 +539,15 @@ def __getitem__(self, _): # --------------------------------------------------------------------------- -# JPEG (issue #1792) +# JPEG SOF cap # --------------------------------------------------------------------------- -# -# Pillow has its own DecompressionBombError, but it fires only at -# ~178M pixels (~500 MB RGB). A malicious TIFF can declare a small tile -# (e.g. 256x256 RGB, ~196 KiB expected) while shipping a JPEG payload -# whose SOF marker declares a much larger image; that lets ~500 MB -# allocate per tile before the downstream chunk.size != expected -# reshape check fires. ``jpeg_decompress`` now parses the JPEG SOF -# marker and raises before handing the blob to Pillow when the -# declared output exceeds ``expected * 1.05 + 1`` bytes. See -# https://github.com/xarray-contrib/xarray-spatial/issues/1792 . + def _forge_jpeg_with_sof_dimensions(real_h: int, real_w: int, real_c: int, declared_h: int, declared_w: int) -> bytes: - """Build a real JPEG and rewrite the SOF marker's H/W fields. - - Pillow encodes a small valid image so the bytestream is a complete - JPEG; we then overwrite the height/width fields in the SOF segment - so the SOF claims a much larger image than the payload can decode. - The decoder never gets the chance to fail on the mismatch because - the pre-decode cap fires first -- which is the property under test. - """ + """Build a real JPEG and rewrite the SOF marker's H/W fields.""" import io from PIL import Image @@ -512,7 +556,6 @@ def _forge_jpeg_with_sof_dimensions(real_h: int, real_w: int, buf = io.BytesIO() img.save(buf, format='JPEG', quality=75) data = bytearray(buf.getvalue()) - # Find SOF0..SOF3,SOF5..SOF7,SOF9..SOF11,SOF13..SOF15 marker. sof_codes = { 0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7, 0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF, @@ -524,7 +567,6 @@ def _forge_jpeg_with_sof_dimensions(real_h: int, real_w: int, raise AssertionError("forged JPEG lost marker alignment") marker = data[i + 1] if marker in sof_codes: - # SOF: 0xFF Cx | len(2) | precision(1) | H(2) | W(2) | components(1) data[i + 5] = (declared_h >> 8) & 0xFF data[i + 6] = declared_h & 0xFF data[i + 7] = (declared_w >> 8) & 0xFF @@ -541,24 +583,13 @@ def _forge_jpeg_with_sof_dimensions(real_h: int, real_w: int, @pytest.mark.skipif(not _HAS_PILLOW, reason="Pillow not installed") class TestJpegDirect: def test_jpeg_bomb_raises(self): - """A JPEG whose SOF dimensions exceed the per-tile cap must raise. - - The forged JPEG payload itself is small (a real 16x16 image with - the SOF marker rewritten to declare 8000x8000x3). The wrapper - is given a per-tile expected size of 32x32x3 = 3072 bytes; the - declared output 8000*8000*3 = 192_000_000 bytes is well above - the ``3072 * 1.05 + 1`` cap and the decode must be refused. - """ + """A JPEG whose SOF dimensions exceed the per-tile cap must raise.""" blob = _forge_jpeg_with_sof_dimensions( real_h=16, real_w=16, real_c=3, declared_h=8000, declared_w=8000, ) from xrspatial.geotiff._compression import jpeg_decompress - # Match the full diagnostic so a regression that swaps in a - # different error path (e.g. Pillow's own DecompressionBombError - # with a different wording, or a numeric overflow before the - # explicit guard) fails the test instead of silently passing. with pytest.raises( ValueError, match=r"jpeg decode would exceed.*Likely a decompression bomb", @@ -579,20 +610,11 @@ def test_jpeg_legitimate_passes(self): assert arr.shape == (32, 32, 3) def test_jpeg_no_cap_when_size_kwargs_default(self): - """Backward-compat: omitting size kwargs falls back to Pillow's guard. - - Direct callers and round-trip tests pass no dimensions; the - pre-check must be a no-op so those keep working. - """ + """Backward-compat: omitting size kwargs falls back to Pillow's guard.""" blob = _forge_jpeg_with_sof_dimensions( real_h=16, real_w=16, real_c=3, declared_h=64, declared_w=64, ) - # With no dimension kwargs, the cap is disabled. The forged JPEG - # declares 64x64 but encodes only 16x16 of payload -- libjpeg - # raises on the truncation; the bomb cap is what we're checking - # is *not* the source of any exception here. Catch whatever - # Pillow raises and assert it isn't our bomb message. from PIL import Image as _Img # noqa: F401 from xrspatial.geotiff._compression import jpeg_decompress @@ -601,42 +623,20 @@ def test_jpeg_no_cap_when_size_kwargs_default(self): except ValueError as exc: assert "Likely a decompression bomb" not in str(exc) except Exception: - # libjpeg/Pillow errors are acceptable -- we only care that - # the bomb cap did not fire. pass def test_jpeg_malformed_falls_through_to_pillow(self): - """A JPEG without a parseable SOF defers to Pillow's own guard. - - We don't want the pre-check to misclassify weird-but-valid - streams; if the helper can't read the SOF it should return - ``None`` and let Pillow raise its own error. - """ - # SOI followed by EOI -- a syntactically valid but empty stream - # with no SOF marker. + """A JPEG without a parseable SOF defers to Pillow's own guard.""" blob = bytes([0xFF, 0xD8, 0xFF, 0xD9]) from xrspatial.geotiff._compression import jpeg_decompress - # No SOF -> bomb cap returns None -> Pillow raises on the empty - # stream. with pytest.raises(Exception): jpeg_decompress(blob, width=32, height=32, samples=3) def test_jpeg_sof_with_truncated_segment_length_returns_none(self): - """A SOF segment whose declared length runs past EOF returns None. - - Without segment-length validation, ``_read_jpeg_sof`` would happily - read height/width/components at fixed offsets even when those - offsets pointed past the segment. The pre-check now demands - ``seg_len >= 8`` and ``i + 2 + seg_len <= n`` before reading; - truncated SOFs are treated as "unknown size" and the bomb cap - defers to Pillow. - """ + """A SOF segment whose declared length runs past EOF returns None.""" from xrspatial.geotiff._compression import _read_jpeg_sof - # SOI | SOF0 | seg_len=64 (advertises 64 bytes of segment, but - # the buffer ends after only 10 bytes of segment payload). - # Truncation -> _read_jpeg_sof must return None. truncated = bytes([ 0xFF, 0xD8, # SOI 0xFF, 0xC0, # SOF0 diff --git a/xrspatial/geotiff/tests/read/test_coords.py b/xrspatial/geotiff/tests/read/test_coords.py new file mode 100644 index 000000000..0fe2f32b4 --- /dev/null +++ b/xrspatial/geotiff/tests/read/test_coords.py @@ -0,0 +1,129 @@ +"""Coordinate / geotransform reconstruction on read. + +Consolidates the descending / ascending coord round-trip coverage +formerly in ``test_descending_coords_1716.py``. The reader has to +reconstruct the original axis direction from the file's +``ModelTransformationTag`` (34264) when the writer chose a non-standard +orientation, so the round-trip check pins both halves of the contract. +""" +from __future__ import annotations + +import numpy as np +import xarray as xr + +from xrspatial.geotiff import open_geotiff, to_geotiff +from xrspatial.geotiff._geotags import (TAG_MODEL_PIXEL_SCALE, TAG_MODEL_TIEPOINT, + TAG_MODEL_TRANSFORMATION) +from xrspatial.geotiff._header import parse_all_ifds, parse_header + + +def _ifd_tag_ids(path: str) -> set[int]: + with open(path, 'rb') as fh: + data = fh.read() + header = parse_header(data) + ifds = parse_all_ifds(data, header) + return set(ifds[0].entries.keys()) + + +def _make_da(x_coords: np.ndarray, y_coords: np.ndarray) -> xr.DataArray: + arr = np.arange(len(y_coords) * len(x_coords), dtype=np.float32) + arr = arr.reshape(len(y_coords), len(x_coords)) + return xr.DataArray( + arr, + dims=('y', 'x'), + coords={'y': y_coords, 'x': x_coords}, + ) + + +class TestDescendingCoordsRoundTrip: + """Round-trip read of non-standard-orientation rasters.""" + + def test_descending_x_roundtrip(self, tmp_path): + """Descending x coords survive the round trip.""" + # x decreases left-to-right (unusual but valid) + x = np.array([200.0, 190.0, 180.0, 170.0, 160.0], dtype=np.float64) + y = np.array([50.0, 40.0, 30.0, 20.0], dtype=np.float64) # north-up + da = _make_da(x, y) + + out = tmp_path / 'desc_x.tif' + to_geotiff(da, str(out), crs=4326) + + loaded = open_geotiff(str(out)) + np.testing.assert_allclose(loaded.coords['x'].values, x) + np.testing.assert_allclose(loaded.coords['y'].values, y) + np.testing.assert_array_equal(loaded.values, da.values) + + def test_ascending_y_roundtrip(self, tmp_path): + """Ascending y coords survive the round trip.""" + x = np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype=np.float64) + # y increases top-to-bottom (south-up) + y = np.array([20.0, 30.0, 40.0, 50.0], dtype=np.float64) + da = _make_da(x, y) + + out = tmp_path / 'asc_y.tif' + to_geotiff(da, str(out), crs=4326) + + loaded = open_geotiff(str(out)) + np.testing.assert_allclose(loaded.coords['x'].values, x) + np.testing.assert_allclose(loaded.coords['y'].values, y) + np.testing.assert_array_equal(loaded.values, da.values) + + def test_descending_x_and_ascending_y_roundtrip(self, tmp_path): + """Both axes flipped relative to north-up.""" + x = np.array([200.0, 190.0, 180.0, 170.0, 160.0], dtype=np.float64) + y = np.array([20.0, 30.0, 40.0, 50.0], dtype=np.float64) + da = _make_da(x, y) + + out = tmp_path / 'desc_x_asc_y.tif' + to_geotiff(da, str(out), crs=4326) + + loaded = open_geotiff(str(out)) + np.testing.assert_allclose(loaded.coords['x'].values, x) + np.testing.assert_allclose(loaded.coords['y'].values, y) + np.testing.assert_array_equal(loaded.values, da.values) + + +class TestOrientationTagSelection: + """The writer picks the right tags for the orientation; the reader + has to be able to read either flavour.""" + + def test_north_up_uses_pixel_scale_and_tiepoint(self, tmp_path): + """North-up keeps ModelPixelScale + ModelTiepoint (no transformation).""" + x = np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype=np.float64) + y = np.array([50.0, 40.0, 30.0, 20.0], dtype=np.float64) + da = _make_da(x, y) + + out = tmp_path / 'north_up.tif' + to_geotiff(da, str(out), crs=4326) + + tag_ids = _ifd_tag_ids(str(out)) + assert TAG_MODEL_PIXEL_SCALE in tag_ids + assert TAG_MODEL_TIEPOINT in tag_ids + assert TAG_MODEL_TRANSFORMATION not in tag_ids + + def test_descending_x_uses_transformation_tag(self, tmp_path): + """Non-standard orientation emits ModelTransformationTag.""" + x = np.array([200.0, 190.0, 180.0, 170.0, 160.0], dtype=np.float64) + y = np.array([50.0, 40.0, 30.0, 20.0], dtype=np.float64) + da = _make_da(x, y) + + out = tmp_path / 'desc_x_tags.tif' + to_geotiff(da, str(out), crs=4326) + + tag_ids = _ifd_tag_ids(str(out)) + assert TAG_MODEL_TRANSFORMATION in tag_ids + assert TAG_MODEL_PIXEL_SCALE not in tag_ids + assert TAG_MODEL_TIEPOINT not in tag_ids + + def test_ascending_y_uses_transformation_tag(self, tmp_path): + x = np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype=np.float64) + y = np.array([20.0, 30.0, 40.0, 50.0], dtype=np.float64) + da = _make_da(x, y) + + out = tmp_path / 'asc_y_tags.tif' + to_geotiff(da, str(out), crs=4326) + + tag_ids = _ifd_tag_ids(str(out)) + assert TAG_MODEL_TRANSFORMATION in tag_ids + assert TAG_MODEL_PIXEL_SCALE not in tag_ids + assert TAG_MODEL_TIEPOINT not in tag_ids diff --git a/xrspatial/geotiff/tests/read/test_dtypes.py b/xrspatial/geotiff/tests/read/test_dtypes.py new file mode 100644 index 000000000..3de820ee9 --- /dev/null +++ b/xrspatial/geotiff/tests/read/test_dtypes.py @@ -0,0 +1,530 @@ +"""Reader dtype handling. + +Consolidates: + +* ``test_dtype_read.py`` -- ``dtype=`` kwarg on ``open_geotiff`` (eager + + dask, float -> float / int -> int casts, float -> int rejection). +* ``test_float16_read_1941.py`` -- IEEE half-precision auto-promotion to + float32 on read (eager + dask). +* ``test_float16_read_gpu_1941.py`` -- the same float16 promotion on + ``read_geotiff_gpu`` and ``open_geotiff(gpu=True)``. +""" +from __future__ import annotations + +import numpy as np +import pytest +import xarray as xr + +from xrspatial.geotiff import open_geotiff, read_geotiff_dask, to_geotiff +from xrspatial.geotiff._dtypes import (SAMPLE_FORMAT_FLOAT, SAMPLE_FORMAT_INT, SAMPLE_FORMAT_UINT, + tiff_dtype_to_numpy, tiff_storage_dtype) + +from .._helpers.markers import requires_gpu as _gpu_only + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + + +@pytest.fixture +def float64_tif(tmp_path): + """Write a float64 GeoTIFF for dtype cast tests.""" + arr = np.random.default_rng(99).random((80, 80)).astype(np.float64) + y = np.linspace(40.0, 41.0, 80) + x = np.linspace(-105.0, -104.0, 80) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': y, 'x': x}, + attrs={'crs': 4326}) + path = str(tmp_path / 'dtype_f64.tif') + to_geotiff(da, path, compression='none') + return path, arr + + +@pytest.fixture +def uint16_tif(tmp_path): + """Write a uint16 GeoTIFF for dtype cast tests.""" + arr = np.random.default_rng(77).integers(0, 10000, (60, 60), + dtype=np.uint16) + y = np.linspace(40.0, 41.0, 60) + x = np.linspace(-105.0, -104.0, 60) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': y, 'x': x}, + attrs={'crs': 4326}) + path = str(tmp_path / 'dtype_u16.tif') + to_geotiff(da, path, compression='none') + return path, arr + + +@pytest.fixture +def float16_tif(tmp_path): + """Write a small float16 GeoTIFF using tifffile. + + tifffile encodes numpy float16 with ``BitsPerSample=16`` and + ``SampleFormat=3``, which is what an external rasterio / GDAL caller + would produce. + """ + tifffile = pytest.importorskip("tifffile") + arr = np.array( + [[0.0, 1.0, 2.0, 3.0], + [-1.0, -2.0, -3.0, -4.0], + [0.5, 1.5, 2.5, 3.5], + [100.0, 200.0, 300.0, 400.0]], + dtype=np.float16, + ) + path = tmp_path / "f16.tif" + tifffile.imwrite(str(path), arr, compression=None) + return path, arr + + +@pytest.fixture +def float16_stripped_tif(tmp_path): + """Stripped float16 GeoTIFF: triggers the bps_mismatch CPU fallback.""" + tifffile = pytest.importorskip("tifffile") + arr = np.array( + [[0.0, 1.0, 2.0, 3.0], + [-1.0, -2.0, -3.0, -4.0], + [0.5, 1.5, 2.5, 3.5], + [100.0, 200.0, 300.0, 400.0]], + dtype=np.float16, + ) + path = tmp_path / "f16_stripped.tif" + tifffile.imwrite(str(path), arr, compression=None) + return path, arr + + +@pytest.fixture +def float16_tiled_tif(tmp_path): + """Multi-tile float16 GeoTIFF: 32x32 image, 16x16 tiles (2x2 grid).""" + tifffile = pytest.importorskip("tifffile") + arr = np.arange(1024, dtype=np.float16).reshape(32, 32) + path = tmp_path / "f16_tiled.tif" + tifffile.imwrite( + str(path), arr, compression="deflate", tile=(16, 16)) + return path, arr + + +@pytest.fixture +def float16_tiled_uncompressed_tif(tmp_path): + """Tiled uncompressed float16 GeoTIFF.""" + tifffile = pytest.importorskip("tifffile") + arr = np.arange(256, dtype=np.float16).reshape(16, 16) + path = tmp_path / "f16_tiled_none.tif" + tifffile.imwrite( + str(path), arr, compression=None, tile=(16, 16)) + return path, arr + + +# --------------------------------------------------------------------------- +# dtype= kwarg on open_geotiff (eager) +# --------------------------------------------------------------------------- + + +class TestDtypeEager: + def test_float64_to_float32(self, float64_tif): + path, orig = float64_tif + result = open_geotiff(path, dtype='float32') + assert result.dtype == np.float32 + np.testing.assert_array_almost_equal( + result.values, orig.astype(np.float32), decimal=6) + + def test_float64_to_float16(self, float64_tif): + path, orig = float64_tif + result = open_geotiff(path, dtype=np.float16) + assert result.dtype == np.float16 + + def test_uint16_to_int32(self, uint16_tif): + path, orig = uint16_tif + result = open_geotiff(path, dtype='int32') + assert result.dtype == np.int32 + np.testing.assert_array_equal(result.values, orig.astype(np.int32)) + + def test_uint16_to_uint8(self, uint16_tif): + path, _ = uint16_tif + result = open_geotiff(path, dtype='uint8') + assert result.dtype == np.uint8 + + def test_float_to_int_raises(self, float64_tif): + path, _ = float64_tif + with pytest.raises(ValueError, match='float.*int'): + open_geotiff(path, dtype='int32') + + def test_dtype_none_preserves_native(self, float64_tif): + path, _ = float64_tif + result = open_geotiff(path, dtype=None) + assert result.dtype == np.float64 + + def test_int_with_nodata_float_to_int_raises(self, tmp_path): + """uint16 file with nodata: nodata masking promotes to float64, so float->int validation fires.""" # noqa: E501 + arr = np.array([[1, 2], [3, 9999]], dtype=np.uint16) + y = np.linspace(40.0, 41.0, 2) + x = np.linspace(-105.0, -104.0, 2) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': y, 'x': x}, + attrs={'crs': 4326, 'nodata': 9999.0}) + path = str(tmp_path / 'dtype_nodata_int_eager.tif') + to_geotiff(da, path, compression='none') + with pytest.raises(ValueError, match='float.*int'): + open_geotiff(path, dtype='int32') + + +# --------------------------------------------------------------------------- +# dtype= kwarg on open_geotiff (dask) +# --------------------------------------------------------------------------- + + +class TestDtypeDask: + def test_float64_to_float32_dask(self, float64_tif): + path, orig = float64_tif + result = open_geotiff(path, dtype='float32', chunks=40) + assert result.dtype == np.float32 + computed = result.values + np.testing.assert_array_almost_equal( + computed, orig.astype(np.float32), decimal=6) + + def test_chunks_are_target_dtype(self, float64_tif): + path, _ = float64_tif + result = open_geotiff(path, dtype='float32', chunks=40) + assert result.data.dtype == np.float32 + + def test_float_to_int_raises_dask(self, float64_tif): + path, _ = float64_tif + with pytest.raises(ValueError, match='float.*int'): + open_geotiff(path, dtype='int32', chunks=40) + + def test_int_with_nodata_float_to_int_raises_dask(self, tmp_path): + """uint16 file with nodata: nodata masking promotes to float64, so float->int validation fires.""" # noqa: E501 + arr = np.array([[1, 2], [3, 9999]], dtype=np.uint16) + y = np.linspace(40.0, 41.0, 2) + x = np.linspace(-105.0, -104.0, 2) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': y, 'x': x}, + attrs={'crs': 4326, 'nodata': 9999.0}) + path = str(tmp_path / 'dtype_nodata_int_dask.tif') + to_geotiff(da, path, compression='none') + with pytest.raises(ValueError, match='float.*int'): + open_geotiff(path, dtype='int32', chunks=2) + + +# --------------------------------------------------------------------------- +# Float16 dtype-map: auto-promotion on read +# --------------------------------------------------------------------------- + + +class TestFloat16DtypeMap: + """The dtype map auto-promotes float16 on read.""" + + def test_tiff_dtype_to_numpy_float16(self): + assert tiff_dtype_to_numpy(16, SAMPLE_FORMAT_FLOAT) == np.float32 + + def test_tiff_storage_dtype_float16(self): + assert tiff_storage_dtype(16, SAMPLE_FORMAT_FLOAT) == np.float16 + + def test_tiff_storage_dtype_delegates_for_non_promoted(self): + # Non-promoted keys behave identically. + for bps, sf in [ + (8, SAMPLE_FORMAT_UINT), + (16, SAMPLE_FORMAT_UINT), + (16, SAMPLE_FORMAT_INT), + (32, SAMPLE_FORMAT_FLOAT), + (64, SAMPLE_FORMAT_FLOAT), + ]: + assert tiff_storage_dtype(bps, sf) == tiff_dtype_to_numpy(bps, sf) + + +# --------------------------------------------------------------------------- +# Float16 eager + dask reads +# --------------------------------------------------------------------------- + + +class TestEagerFloat16Read: + """``open_geotiff`` decodes an external float16 file to float32.""" + + def test_open_geotiff_returns_float32(self, float16_tif): + path, arr = float16_tif + result = open_geotiff(str(path)) + assert result.dtype == np.float32 + # Float16 values fit exactly in float32, so equality is well-defined. + np.testing.assert_array_equal(result.values, arr.astype(np.float32)) + + def test_open_geotiff_dask_returns_float32(self, float16_tif): + path, arr = float16_tif + result = read_geotiff_dask(str(path), chunks=2) + assert result.dtype == np.float32 + np.testing.assert_array_equal( + result.compute().values, arr.astype(np.float32)) + + +class TestPredictor3Float16: + """Predictor=3 + float16 on disk also decodes correctly.""" + + def test_predictor3_float16_round_trip(self, tmp_path): + tifffile = pytest.importorskip("tifffile") + pytest.importorskip("imagecodecs") # required for predictor=3 + arr = np.linspace(-1.0, 1.0, 16).astype(np.float16).reshape(4, 4) + path = tmp_path / "pred3_f16.tif" + tifffile.imwrite( + str(path), arr, predictor=3, compression="deflate") + + result = open_geotiff(str(path)) + assert result.dtype == np.float32 + np.testing.assert_array_equal( + result.values, arr.astype(np.float32)) + + +class TestFloat16RegressionGuards: + """The float16 promotion did not change non-float16 behaviour.""" + + def test_float32_still_float32(self, tmp_path): + tifffile = pytest.importorskip("tifffile") + arr = np.arange(16, dtype=np.float32).reshape(4, 4) + path = tmp_path / "f32.tif" + tifffile.imwrite(str(path), arr) + + result = open_geotiff(str(path)) + assert result.dtype == np.float32 + np.testing.assert_array_equal(result.values, arr) + + def test_float64_still_float64(self, tmp_path): + tifffile = pytest.importorskip("tifffile") + arr = np.arange(16, dtype=np.float64).reshape(4, 4) + path = tmp_path / "f64.tif" + tifffile.imwrite(str(path), arr) + + result = open_geotiff(str(path)) + assert result.dtype == np.float64 + np.testing.assert_array_equal(result.values, arr) + + def test_uint16_still_uint16(self, tmp_path): + tifffile = pytest.importorskip("tifffile") + arr = np.arange(16, dtype=np.uint16).reshape(4, 4) + path = tmp_path / "u16.tif" + tifffile.imwrite(str(path), arr) + + result = open_geotiff(str(path)) + assert result.dtype == np.uint16 + np.testing.assert_array_equal(result.values, arr) + + +# --------------------------------------------------------------------------- +# Float16 GPU read paths +# --------------------------------------------------------------------------- + + +class TestEagerGPUReadFloat16: + """``read_geotiff_gpu`` returns float32 for stripped float16 input.""" + + @_gpu_only + def test_read_geotiff_gpu_stripped_returns_float32( + self, float16_stripped_tif + ): + from xrspatial.geotiff import read_geotiff_gpu + + path, arr = float16_stripped_tif + result = read_geotiff_gpu(str(path)) + assert result.dtype == np.float32, ( + f"GPU read of float16 must return float32, got {result.dtype}" + ) + np.testing.assert_array_equal( + result.data.get(), arr.astype(np.float32)) + + @_gpu_only + def test_read_geotiff_gpu_tiled_returns_float32( + self, float16_tiled_tif + ): + from xrspatial.geotiff import read_geotiff_gpu + + path, arr = float16_tiled_tif + result = read_geotiff_gpu(str(path)) + assert result.dtype == np.float32 + np.testing.assert_array_equal( + result.data.get(), arr.astype(np.float32)) + + @_gpu_only + def test_read_geotiff_gpu_tiled_uncompressed_returns_float32( + self, float16_tiled_uncompressed_tif + ): + from xrspatial.geotiff import read_geotiff_gpu + + path, arr = float16_tiled_uncompressed_tif + result = read_geotiff_gpu(str(path)) + assert result.dtype == np.float32 + np.testing.assert_array_equal( + result.data.get(), arr.astype(np.float32)) + + @_gpu_only + def test_open_geotiff_gpu_dispatcher_float16(self, float16_tiled_tif): + """``open_geotiff(gpu=True)`` dispatches correctly for float16.""" + path, arr = float16_tiled_tif + result = open_geotiff(str(path), gpu=True) + assert result.dtype == np.float32 + np.testing.assert_array_equal( + result.data.get(), arr.astype(np.float32)) + + +class TestGPUWindowedFloat16: + """Windowed GPU reads honour the bps_mismatch fallback path.""" + + @_gpu_only + def test_read_geotiff_gpu_windowed_stripped(self, float16_stripped_tif): + from xrspatial.geotiff import read_geotiff_gpu + + path, arr = float16_stripped_tif + result = read_geotiff_gpu(str(path), window=(0, 0, 2, 2)) + assert result.dtype == np.float32 + assert result.shape == (2, 2) + np.testing.assert_array_equal( + result.data.get(), arr[:2, :2].astype(np.float32)) + + @_gpu_only + def test_read_geotiff_gpu_windowed_tiled(self, float16_tiled_tif): + from xrspatial.geotiff import read_geotiff_gpu + + path, arr = float16_tiled_tif + result = read_geotiff_gpu(str(path), window=(0, 0, 8, 8)) + assert result.dtype == np.float32 + assert result.shape == (8, 8) + np.testing.assert_array_equal( + result.data.get(), arr[:8, :8].astype(np.float32)) + + +class TestDaskGPUFloat16: + """``open_geotiff(chunks=, gpu=True)`` decodes float16 correctly.""" + + @_gpu_only + def test_dask_gpu_tiled_float16(self, float16_tiled_tif): + path, arr = float16_tiled_tif + result = open_geotiff(str(path), chunks=8, gpu=True) + assert result.dtype == np.float32, ( + f"dask+GPU read of float16 must return float32, got {result.dtype}" + ) + computed = result.compute() + np.testing.assert_array_equal( + computed.data.get(), arr.astype(np.float32)) + + @_gpu_only + def test_read_geotiff_gpu_chunks_kwarg_float16(self, float16_tiled_tif): + """``read_geotiff_gpu(chunks=)`` also routes correctly.""" + from xrspatial.geotiff import read_geotiff_gpu + + path, arr = float16_tiled_tif + result = read_geotiff_gpu(str(path), chunks=8) + assert result.dtype == np.float32 + computed = result.compute() + np.testing.assert_array_equal( + computed.data.get(), arr.astype(np.float32)) + + +class TestGDSPathGatedOffForFloat16: + """``_gds_chunk_path_available`` returns False for (bps=16, sf=3).""" + + @_gpu_only + def test_gds_path_gated_off_for_float16(self, float16_tiled_tif): + pytest.importorskip("kvikio", exc_type=ImportError) + + from xrspatial.geotiff._backends.gpu import _gds_chunk_path_available + from xrspatial.geotiff._header import parse_all_ifds, parse_header + + path, _ = float16_tiled_tif + with open(str(path), "rb") as f: + data = f.read() + header = parse_header(data) + ifds = parse_all_ifds(data, header) + ifd = ifds[0] + + assert ifd.is_tiled, "fixture sanity: tiled layout expected" + bps_first = ifd.bits_per_sample + if isinstance(bps_first, tuple): + bps = bps_first[0] if bps_first else 0 + else: + bps = bps_first + assert bps == 16, "fixture sanity: bps=16 expected" + assert ifd.sample_format == SAMPLE_FORMAT_FLOAT + + result = _gds_chunk_path_available( + str(path), ifd, has_sparse_tile=False, orientation=1) + assert result is False, ( + "_gds_chunk_path_available must return False for " + "(bps=16, sf=float) so the GDS chunked path does not " + "mis-decode half-precision tiles." + ) + + @_gpu_only + def test_gds_path_allowed_for_float32_tiled(self, tmp_path): + """Sanity: GDS path remains allowed for a float32 tiled file.""" + tifffile = pytest.importorskip("tifffile") + pytest.importorskip("kvikio", exc_type=ImportError) + + arr = np.arange(256, dtype=np.float32).reshape(16, 16) + path = tmp_path / "f32_tiled.tif" + tifffile.imwrite( + str(path), arr, compression="deflate", tile=(16, 16)) + + from xrspatial.geotiff._backends.gpu import _gds_chunk_path_available + from xrspatial.geotiff._header import parse_all_ifds, parse_header + + with open(str(path), "rb") as f: + data = f.read() + header = parse_header(data) + ifds = parse_all_ifds(data, header) + + result = _gds_chunk_path_available( + str(path), ifds[0], has_sparse_tile=False, orientation=1) + assert result is True, ( + "_gds_chunk_path_available must remain True for " + "(bps=32, sf=float) tiled files so the kvikio GDS chunk " + "path still applies." + ) + + +class TestBackendParityFloat16: + """All four backends agree pixel-exact on float16 input.""" + + @_gpu_only + def test_eager_numpy_equals_gpu(self, float16_tiled_tif): + path, _ = float16_tiled_tif + cpu = open_geotiff(str(path)) + gpu = open_geotiff(str(path), gpu=True) + + assert cpu.dtype == gpu.dtype == np.float32 + np.testing.assert_array_equal(np.asarray(cpu), gpu.data.get()) + + @_gpu_only + def test_eager_numpy_equals_dask_gpu(self, float16_tiled_tif): + path, _ = float16_tiled_tif + cpu = open_geotiff(str(path)) + dask_gpu = open_geotiff(str(path), chunks=8, gpu=True).compute() + + assert cpu.dtype == dask_gpu.dtype == np.float32 + np.testing.assert_array_equal( + np.asarray(cpu), dask_gpu.data.get()) + + @_gpu_only + def test_dask_numpy_equals_dask_gpu(self, float16_tiled_tif): + path, _ = float16_tiled_tif + dask_cpu = read_geotiff_dask(str(path), chunks=8).compute() + dask_gpu = open_geotiff(str(path), chunks=8, gpu=True).compute() + + np.testing.assert_array_equal( + np.asarray(dask_cpu), dask_gpu.data.get()) + + +class TestPredictor3Float16GPU: + """Predictor=3 + float16 on disk also decodes correctly on GPU.""" + + @_gpu_only + def test_predictor3_float16_gpu_round_trip(self, tmp_path): + tifffile = pytest.importorskip("tifffile") + pytest.importorskip("imagecodecs") # required for predictor=3 + + from xrspatial.geotiff import read_geotiff_gpu + + arr = np.linspace(-1.0, 1.0, 16).astype(np.float16).reshape(4, 4) + path = tmp_path / "pred3_f16.tif" + tifffile.imwrite( + str(path), arr, predictor=3, compression="deflate") + + result = read_geotiff_gpu(str(path)) + assert result.dtype == np.float32 + np.testing.assert_array_equal( + result.data.get(), arr.astype(np.float32)) diff --git a/xrspatial/geotiff/tests/test_gpu_byteswap_1508.py b/xrspatial/geotiff/tests/read/test_endianness.py similarity index 69% rename from xrspatial/geotiff/tests/test_gpu_byteswap_1508.py rename to xrspatial/geotiff/tests/read/test_endianness.py index 4cde5cc40..152051d7c 100644 --- a/xrspatial/geotiff/tests/test_gpu_byteswap_1508.py +++ b/xrspatial/geotiff/tests/read/test_endianness.py @@ -1,16 +1,11 @@ -"""Regression test for issue #1508. - -Big-endian multi-byte TIFFs read via ``read_geotiff_gpu`` used to crash -inside the GPU decode pipeline with:: - - AttributeError: 'ndarray' object has no attribute 'byteswap' - -because ``cupy.ndarray`` (as of cupy 13.x) does not expose ``byteswap()``. -The dispatcher in ``read_geotiff_gpu`` caught the error and silently fell -back to CPU, so results stayed correct but the GPU fast path was lost. - -These tests confirm the GPU path now decodes BE multi-byte data directly -(result is a CuPy array, not a NumPy fallback) and matches the CPU read. +"""Big-endian / little-endian GeoTIFF reader paths. + +Consolidates the GPU byteswap regression coverage formerly in +``test_gpu_byteswap_1508.py``. Pre-fix big-endian multi-byte TIFFs read +via ``read_geotiff_gpu`` crashed inside the GPU decode pipeline because +``cupy.ndarray`` does not expose ``byteswap()``. The dispatcher caught +the error and silently fell back to CPU, so results stayed correct but +the GPU fast path was lost. """ from __future__ import annotations @@ -19,22 +14,11 @@ import numpy as np import pytest +from .._helpers.markers import gpu_available -def _gpu_available() -> bool: - """True if cupy is importable and CUDA is initialised.""" - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() _HAS_TIFFFILE = importlib.util.find_spec("tifffile") is not None _gpu_only = pytest.mark.skipif( - not (_HAS_GPU and _HAS_TIFFFILE), + not (gpu_available() and _HAS_TIFFFILE), reason="cupy + CUDA + tifffile required", ) @@ -69,20 +53,12 @@ def test_read_geotiff_gpu_big_endian_multibyte(tmp_path, dtype): gpu_da = read_geotiff_gpu(str(path)) - # The GPU path was actually exercised (no silent CPU fallback masking - # a crash inside gpu_decode_tiles_from_file). assert isinstance(gpu_da.data, cupy.ndarray), ( "expected cupy-backed DataArray, got " f"{type(gpu_da.data).__name__} -- the GPU path likely fell back " "to CPU again" ) - # The fix must preserve the native dtype contract. An earlier version - # used ``arr.view(arr.dtype.newbyteorder()).copy()`` which produced an - # array tagged with non-native byteorder (``>u2`` instead of ``= " f"array_bytes {array_bytes}; in-place mutation regressed" ) - # And the returned buffer is the same one we passed in. assert out.data.ptr == arr_gpu.data.ptr finally: cupy.cuda.set_allocator(prev_allocator) @@ -112,7 +87,7 @@ def test_apply_nodata_mask_gpu_float_alloc_count_unchanged_1934(): @_gpu_only -def test_apply_nodata_mask_gpu_int_promotes_and_masks_1934(): +def test_apply_nodata_mask_gpu_int_promotes_and_masks(): """Integer path still promotes to float64 and masks the sentinel.""" import cupy @@ -131,13 +106,8 @@ def test_apply_nodata_mask_gpu_int_promotes_and_masks_1934(): @_gpu_only -def test_apply_nodata_mask_gpu_int_no_extra_buffer_after_astype_1934(): - """Integer path: only the ``astype(float64)`` buffer is allocated. - - Before the fix the trailing ``cupy.where`` allocated a second - chunk-sized float64 buffer. After the fix the ``astype`` buffer is - mutated in place. - """ +def test_apply_nodata_mask_gpu_int_no_extra_buffer_after_astype(): + """Integer path: only the ``astype(float64)`` buffer is allocated.""" import cupy from xrspatial.geotiff import _apply_nodata_mask_gpu @@ -158,13 +128,8 @@ def test_apply_nodata_mask_gpu_int_no_extra_buffer_after_astype_1934(): isolated_pool.free_all_blocks() total_after = isolated_pool.total_bytes() - # Required: one float64 buffer (512*512*8 = 2 MiB) from astype. - # Pre-fix would have allocated a second float64 buffer for - # cupy.where (another 2 MiB) on top of that. float64_bytes = out.nbytes growth = total_after - total_before - # Allow some slack for the bool mask + .any() scalar (well under - # one float64 buffer of slack). assert growth < 2 * float64_bytes, ( f"unexpected allocation growth {growth} bytes >= " f"2 * float64_bytes {2 * float64_bytes}; pre-fix double-alloc" @@ -175,7 +140,7 @@ def test_apply_nodata_mask_gpu_int_no_extra_buffer_after_astype_1934(): @_gpu_only -def test_apply_nodata_mask_gpu_float_nan_sentinel_noop_1934(): +def test_apply_nodata_mask_gpu_float_nan_sentinel_noop(): """NaN nodata on a float array stays a no-op.""" import cupy @@ -186,13 +151,12 @@ def test_apply_nodata_mask_gpu_float_nan_sentinel_noop_1934(): ) input_ptr = arr_gpu.data.ptr out = _apply_nodata_mask_gpu(arr_gpu, float('nan')) - # Same buffer back, untouched. assert out.data.ptr == input_ptr np.testing.assert_array_equal(out.get(), [[1.0, 2.0], [3.0, 4.0]]) @_gpu_only -def test_apply_nodata_mask_gpu_none_nodata_passthrough_1934(): +def test_apply_nodata_mask_gpu_none_nodata_passthrough(): """``nodata is None`` returns the input array untouched.""" import cupy @@ -203,3 +167,22 @@ def test_apply_nodata_mask_gpu_none_nodata_passthrough_1934(): out = _apply_nodata_mask_gpu(arr_gpu, None) assert out.data.ptr == input_ptr assert out.dtype == cupy.int32 + + +# --------------------------------------------------------------------------- +# Helper removal pin (#2208) +# --------------------------------------------------------------------------- + + +def test_apply_nodata_mask_gpu_with_presence_not_importable(): + """The dead sibling helper stays removed after #2207.""" + # Covers both module-attribute absence and the import-time surface. + with pytest.raises(ImportError): + from xrspatial.geotiff._backends._gpu_helpers import \ + _apply_nodata_mask_gpu_with_presence # noqa: F401 + + +def test_apply_nodata_mask_gpu_still_present(): + """``_apply_nodata_mask_gpu`` is still on the chunked GPU dask path.""" + assert hasattr(_gpu_helpers, '_apply_nodata_mask_gpu') + assert callable(_gpu_helpers._apply_nodata_mask_gpu) diff --git a/xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py b/xrspatial/geotiff/tests/read/test_streaming.py similarity index 74% rename from xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py rename to xrspatial/geotiff/tests/read/test_streaming.py index d960d299f..565b8e4fa 100644 --- a/xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py +++ b/xrspatial/geotiff/tests/read/test_streaming.py @@ -1,4 +1,11 @@ -"""Regression tests for issue #1785. +"""Streaming / chunked read paths. + +Folds in the streaming-BigTIFF threshold tests from +``xrspatial/tests/test_geotiff_streaming_bigtiff_threshold_1785.py`` +per the epic #2390 PR 8 directive. The cluster covers the +streaming-decision helper that the chunked write/read pipeline uses to +pick classic vs. BigTIFF, plus the integration check that the user's +``bigtiff=`` override still wins on the streaming code path. The streaming writer's auto-BigTIFF decision used to compare only the uncompressed pixel-data size against ``UINT32_MAX``. For rasters just @@ -6,16 +13,15 @@ file past the classic-TIFF uint32 offset ceiling, and the write failed late with ``struct.error``. -These tests pin the corrected decision: +The pinned contract: * The helper takes an actual ``ifd_overhead_bytes`` value (computed from the real tag list via ``_compute_classic_ifd_overhead``) rather than a 200-byte fudge constant; large ``gdal_metadata_xml`` or ``extra_tags`` - payloads must not silently undercount overhead. See the Copilot review - on PR #1787. + payloads must not silently undercount overhead. * The comparison is ``> UINT32_MAX``, matching the eager - ``_assemble_tiff`` decision (``estimated_file_size > UINT32_MAX``). A - file that is exactly ``UINT32_MAX`` bytes still fits classic. + ``_assemble_tiff`` decision. A file that is exactly ``UINT32_MAX`` + bytes still fits classic. * The explicit ``bigtiff=True``/``False`` user override still wins. """ from __future__ import annotations @@ -29,22 +35,12 @@ from xrspatial.geotiff import to_geotiff from xrspatial.geotiff._dtypes import ASCII, LONG, SHORT -from xrspatial.geotiff._header import ( - TAG_BITS_PER_SAMPLE, - TAG_COMPRESSION, - TAG_GDAL_METADATA, - TAG_IMAGE_LENGTH, - TAG_IMAGE_WIDTH, - TAG_PHOTOMETRIC, - TAG_SAMPLE_FORMAT, - TAG_SAMPLES_PER_PIXEL, - TAG_STRIP_BYTE_COUNTS, - TAG_STRIP_OFFSETS, -) -from xrspatial.geotiff._writer import ( - _compute_classic_ifd_overhead, - _should_use_bigtiff_streaming, -) +from xrspatial.geotiff._header import (TAG_BITS_PER_SAMPLE, TAG_COMPRESSION, TAG_GDAL_METADATA, + TAG_IMAGE_LENGTH, TAG_IMAGE_WIDTH, TAG_PHOTOMETRIC, + TAG_SAMPLE_FORMAT, TAG_SAMPLES_PER_PIXEL, + TAG_STRIP_BYTE_COUNTS, TAG_STRIP_OFFSETS) +from xrspatial.geotiff._writer import (_compute_classic_ifd_overhead, + _should_use_bigtiff_streaming) UINT32_MAX = 0xFFFFFFFF @@ -79,11 +75,7 @@ def _minimal_tag_list(n_entries: int, gdal_metadata_size: int = 0) -> list: class TestShouldUseBigTIFFStreaming: def test_just_under_uint32_max_promotes(self): - """uncompressed = UINT32_MAX - 50 with non-trivial overhead promotes. - - Even ~50 bytes of slack disappears once IFD + strip-table overhead - is added, so this case must promote to BigTIFF. - """ + """uncompressed = UINT32_MAX - 50 with non-trivial overhead promotes.""" # 1024 entries: strip table contributes 8 * 1024 = 8 KiB. tags = _minimal_tag_list(n_entries=1024) overhead = _compute_classic_ifd_overhead(tags) @@ -104,14 +96,7 @@ def test_half_uint32_max_stays_classic(self): ) is False def test_exactly_uint32_max_stays_classic(self): - """Boundary: total file size == UINT32_MAX bytes still fits classic. - - Eager ``_assemble_tiff`` uses ``estimated_file_size > UINT32_MAX``; - the streaming helper must match. A file of exactly ``UINT32_MAX`` - bytes has its last byte at offset ``UINT32_MAX - 1``, which is a - valid classic-TIFF offset. - """ - # Construct uncompressed_bytes so total = exactly UINT32_MAX. + """Boundary: total file size == UINT32_MAX bytes still fits classic.""" tags = _minimal_tag_list(n_entries=1) overhead = _compute_classic_ifd_overhead(tags) header = 8 @@ -139,15 +124,7 @@ def test_small_raster_no_overhead_stays_classic(self): ) is False def test_large_strip_table_alone_can_promote(self): - """Even a small pixel payload can need BigTIFF if n_entries is huge. - - Documents the strip-table contribution: ~536 M entries puts the - table itself near 4 GiB and forces BigTIFF with no pixel data. - Driven through the ``n_entries`` parameter (8 bytes per entry) - to avoid allocating a 536 M-element Python list at test time; - the ``ifd_overhead_bytes`` path is exercised by - ``test_overhead_pushes_just_under_threshold_over``. - """ + """Even a small pixel payload can need BigTIFF if n_entries is huge.""" n_entries = (UINT32_MAX // 8) + 1 assert _should_use_bigtiff_streaming( uncompressed_bytes=0, @@ -156,13 +133,12 @@ def test_large_strip_table_alone_can_promote(self): ) is True def test_overhead_pushes_just_under_threshold_over(self): - """Regression: a payload that fits classic by raw bytes but not - once header + IFD + strip table is added must promote. + """A payload that fits classic by raw bytes but not once header + + IFD + strip table is added must promote. """ n_entries = 100_000 # ~800 KB strip table tags = _minimal_tag_list(n_entries=n_entries) overhead = _compute_classic_ifd_overhead(tags) - # Choose uncompressed so the total equals exactly UINT32_MAX + 1. header = 8 uncompressed = UINT32_MAX + 1 - header - overhead assert _should_use_bigtiff_streaming( @@ -178,13 +154,7 @@ def test_overhead_pushes_just_under_threshold_over(self): ) is False def test_large_gdal_metadata_flips_decision(self): - """A 5000-byte gdal_metadata blob must flip a borderline case. - - Under the old 200-byte fudge, ``uncompressed + 200 < UINT32_MAX`` - could stay classic even when a multi-KB gdal_metadata overflow - pushed real overhead well past 200 bytes. With the actual - overhead computed from the tag list, the decision flips. - """ + """A 5000-byte gdal_metadata blob must flip a borderline case.""" n_entries = 1024 big_blob = 5000 # ASCII overflow heap entry plain_tags = _minimal_tag_list(n_entries=n_entries) @@ -193,21 +163,15 @@ def test_large_gdal_metadata_flips_decision(self): plain_overhead = _compute_classic_ifd_overhead(plain_tags) meta_overhead = _compute_classic_ifd_overhead(meta_tags) - # Metadata blob really does increase computed overhead. assert meta_overhead - plain_overhead >= big_blob - # Pick uncompressed so plain_overhead path stays classic but - # the metadata path tips over. header = 8 uncompressed = UINT32_MAX - header - plain_overhead - # Plain: total == UINT32_MAX -> classic. assert _should_use_bigtiff_streaming( uncompressed_bytes=uncompressed, n_entries=0, ifd_overhead_bytes=plain_overhead, ) is False - # With the large metadata blob folded into the real overhead, - # the total now exceeds UINT32_MAX and we must promote. assert _should_use_bigtiff_streaming( uncompressed_bytes=uncompressed, n_entries=0, @@ -217,6 +181,7 @@ def test_large_gdal_metadata_flips_decision(self): # -- Integration tests against the writer ------------------------------------ + def _read_tiff_magic(path: str) -> int: """Return the TIFF version field: 42 (0x002A) classic, 43 (0x002B) BigTIFF.""" with open(path, 'rb') as f: @@ -246,18 +211,18 @@ def small_dask_raster(): class TestStreamingBigTIFFUserOverride: def test_bigtiff_true_forces_bigtiff_on_small_raster( self, small_dask_raster, tmp_path): - path = str(tmp_path / 'force_bigtiff_1785.tif') + path = str(tmp_path / 'force_bigtiff.tif') to_geotiff(small_dask_raster, path, bigtiff=True) assert _read_tiff_magic(path) == 43 def test_bigtiff_false_forces_classic_on_small_raster( self, small_dask_raster, tmp_path): - path = str(tmp_path / 'force_classic_1785.tif') + path = str(tmp_path / 'force_classic.tif') to_geotiff(small_dask_raster, path, bigtiff=False) assert _read_tiff_magic(path) == 42 def test_bigtiff_none_small_raster_stays_classic( self, small_dask_raster, tmp_path): - path = str(tmp_path / 'auto_classic_1785.tif') + path = str(tmp_path / 'auto_classic.tif') to_geotiff(small_dask_raster, path, bigtiff=None) assert _read_tiff_magic(path) == 42 diff --git a/xrspatial/geotiff/tests/test_local_tile_byte_cap_1664.py b/xrspatial/geotiff/tests/read/test_tiling.py similarity index 59% rename from xrspatial/geotiff/tests/test_local_tile_byte_cap_1664.py rename to xrspatial/geotiff/tests/read/test_tiling.py index 25216f539..bdff3631f 100644 --- a/xrspatial/geotiff/tests/test_local_tile_byte_cap_1664.py +++ b/xrspatial/geotiff/tests/read/test_tiling.py @@ -1,14 +1,11 @@ -"""Local-file tile/strip byte-count cap (issue #1664). +"""Tiled-read paths, tile boundaries, byte caps. -Before #1664, ``XRSPATIAL_COG_MAX_TILE_BYTES`` only fired in the HTTP -fetch path. A crafted local TIFF with a huge ``TileByteCounts`` / -``StripByteCounts`` could still feed an enormous slice into the -decompressor, which can balloon into gigabytes of decoded output even -when the underlying mmap slice is bounded by the file size. +Consolidates: -These tests fabricate small COGs / strip-TIFFs, rewrite their byte -counts to oversized values, and check that the cap raises before the -decoder runs. +* ``test_local_tile_byte_cap_1664.py`` -- local-file ``TileByteCounts`` / + ``StripByteCounts`` cap and the env-driven override (CPU path). +* ``test_gpu_tile_byte_cap_2026_05_18.py`` -- the matching GPU eager and + dask + GPU chunked paths. """ from __future__ import annotations @@ -17,20 +14,23 @@ import xarray as xr from xrspatial.geotiff import _reader as _reader_mod -from xrspatial.geotiff import open_geotiff, to_geotiff +from xrspatial.geotiff import open_geotiff, read_geotiff_gpu, to_geotiff + +from .._helpers.markers import requires_gpu as _gpu_only +from .._helpers.tiff_surgery import patch_byte_counts as _patch_byte_counts -from ._helpers.tiff_surgery import patch_byte_counts as _patch_byte_counts # noqa: E402 # --------------------------------------------------------------------------- -# Helpers -- patch in-place IFD entries for tile / strip byte counts +# Helpers # --------------------------------------------------------------------------- -def _build_forged_tiled_cog(tmp_path, byte_count_value: int) -> str: +def _build_forged_tiled_cog(tmp_path, byte_count_value: int, + *, basename: str = "forged_tiles") -> str: """Write a real tiled COG, patch every TileByteCounts entry, return path.""" arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64) da = xr.DataArray(arr, dims=['y', 'x']) - path = str(tmp_path / "forged_local_tiles_1664.tif") + path = str(tmp_path / f"{basename}.tif") to_geotiff(da, path, tile_size=32, compression='deflate') with open(path, 'rb') as f: data = bytearray(f.read()) @@ -44,9 +44,7 @@ def _build_forged_stripped_tif(tmp_path, byte_count_value: int) -> str: """Write a strip-organized TIFF, patch every StripByteCounts entry.""" arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64) da = xr.DataArray(arr, dims=['y', 'x']) - path = str(tmp_path / "forged_local_strips_1664.tif") - # tiled=False forces strip layout; deflate gets the decompressor on - # the hot path so a huge declared size matters. + path = str(tmp_path / "forged_strips.tif") to_geotiff(da, path, tiled=False, compression='deflate') with open(path, 'rb') as f: data = bytearray(f.read()) @@ -64,7 +62,6 @@ def _build_forged_stripped_tif(tmp_path, byte_count_value: int) -> str: class TestLocalTileByteCap: def test_huge_tile_byte_count_rejected(self, tmp_path, monkeypatch): """A local tile with a huge TileByteCount raises before decode.""" - # 100 MB > the 1 MB cap we set below. path = _build_forged_tiled_cog(tmp_path, 100 * 1024 * 1024) monkeypatch.setenv('XRSPATIAL_COG_MAX_TILE_BYTES', str(1024 * 1024)) @@ -78,7 +75,6 @@ def test_error_message_names_value_and_cap(self, tmp_path, monkeypatch): with pytest.raises(ValueError) as excinfo: open_geotiff(path) msg = str(excinfo.value) - # The forged value (52,428,800) and the cap (1,024) both appear. assert "52,428,800" in msg or "52428800" in msg assert "1,024" in msg or "1024" in msg assert "denial-of-service" in msg.lower() or "malformed" in msg @@ -87,7 +83,7 @@ def test_normal_local_cog_under_default_cap(self, tmp_path): """Legitimate local reads with the default cap still succeed.""" arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64) da = xr.DataArray(arr, dims=['y', 'x']) - path = str(tmp_path / "normal_local_1664.tif") + path = str(tmp_path / "normal_local.tif") to_geotiff(da, path, tile_size=32, compression='deflate') result = open_geotiff(path) @@ -95,27 +91,16 @@ def test_normal_local_cog_under_default_cap(self, tmp_path): def test_env_override_lifts_cap(self, tmp_path, monkeypatch): """A user with legitimate large tiles can lift the cap via env.""" - # 50 MB declared. With cap=64 MB the read succeeds even though - # the underlying compressed slice is smaller (mmap truncates at - # EOF). path = _build_forged_tiled_cog(tmp_path, 50 * 1024 * 1024) monkeypatch.setenv( 'XRSPATIAL_COG_MAX_TILE_BYTES', str(64 * 1024 * 1024)) - # Read may raise inside the decompressor (the truncated mmap - # slice is garbage to deflate) but it must NOT raise the cap - # error. The thing we care about is that the cap check passes. try: open_geotiff(path) except ValueError as e: assert "exceeds the per-tile safety cap" not in str(e) -# --------------------------------------------------------------------------- -# Strip-organized local reads -# --------------------------------------------------------------------------- - - class TestLocalStripByteCap: def test_huge_strip_byte_count_rejected(self, tmp_path, monkeypatch): path = _build_forged_stripped_tif(tmp_path, 100 * 1024 * 1024) @@ -141,13 +126,7 @@ def test_strip_error_message_mentions_strip(self, tmp_path, monkeypatch): def test_max_tile_bytes_env_negative_falls_back(monkeypatch): - """Negative env value falls back to the default, not a 1-byte cap. - - Earlier drafts clamped to ``max(1, val)`` which made a typo - (``XRSPATIAL_COG_MAX_TILE_BYTES=-1``) silently reject every tile. - The current policy matches ``_http_timeout_from_env``: any non- - positive integer is ignored. - """ + """Negative env value falls back to the default, not a 1-byte cap.""" monkeypatch.setenv('XRSPATIAL_COG_MAX_TILE_BYTES', '-5') assert ( _reader_mod._max_tile_bytes_from_env() @@ -170,3 +149,73 @@ def test_max_tile_bytes_env_garbage_falls_back(monkeypatch): _reader_mod._max_tile_bytes_from_env() == _reader_mod.MAX_TILE_BYTES_DEFAULT ) + + +# --------------------------------------------------------------------------- +# GPU eager path: per-tile byte cap +# --------------------------------------------------------------------------- + + +class TestGpuTileByteCap: + @_gpu_only + def test_huge_tile_byte_count_rejected(self, tmp_path, monkeypatch): + """A local tile with a huge TileByteCount raises before GPU decode.""" + path = _build_forged_tiled_cog( + tmp_path, 100 * 1024 * 1024, basename="forged_gpu_tiles") + monkeypatch.setenv("XRSPATIAL_COG_MAX_TILE_BYTES", str(1024 * 1024)) + + with pytest.raises(ValueError, match="TileByteCount"): + read_geotiff_gpu(path) + + @_gpu_only + def test_error_message_names_value_and_cap(self, tmp_path, monkeypatch): + path = _build_forged_tiled_cog( + tmp_path, 50 * 1024 * 1024, basename="forged_gpu_tiles_msg") + monkeypatch.setenv("XRSPATIAL_COG_MAX_TILE_BYTES", str(1024)) + + with pytest.raises(ValueError) as excinfo: + read_geotiff_gpu(path) + msg = str(excinfo.value) + assert "52,428,800" in msg or "52428800" in msg + assert "1,024" in msg or "1024" in msg + assert "denial-of-service" in msg.lower() or "malformed" in msg + + @_gpu_only + def test_normal_gpu_read_under_default_cap(self, tmp_path): + """Legitimate GPU reads with the default cap still succeed.""" + arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64) + da = xr.DataArray(arr, dims=["y", "x"]) + path = str(tmp_path / "normal_gpu.tif") + to_geotiff(da, path, tile_size=32, compression="deflate") + + result = read_geotiff_gpu(path) + np.testing.assert_array_equal(result.data.get(), arr) + + @_gpu_only + def test_env_override_lifts_cap(self, tmp_path, monkeypatch): + """A user with legitimate large tiles can lift the cap via env.""" + path = _build_forged_tiled_cog( + tmp_path, 50 * 1024 * 1024, basename="forged_gpu_tiles_override") + monkeypatch.setenv( + "XRSPATIAL_COG_MAX_TILE_BYTES", str(64 * 1024 * 1024)) + + try: + read_geotiff_gpu(path) + except Exception as exc: + assert "exceeds the per-tile safety cap" not in str(exc), ( + "cap loop fired despite the env override lifting the cap" + ) + + +class TestGpuChunkedTileByteCap: + @_gpu_only + def test_chunked_huge_tile_byte_count_rejected( + self, tmp_path, monkeypatch): + """Sibling check on the dask + GPU chunked path.""" + path = _build_forged_tiled_cog( + tmp_path, 100 * 1024 * 1024, basename="forged_gpu_chunked") + monkeypatch.setenv( + "XRSPATIAL_COG_MAX_TILE_BYTES", str(1024 * 1024)) + + with pytest.raises(ValueError, match="TileByteCount"): + read_geotiff_gpu(path, chunks=32) diff --git a/xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_with_presence_removed_2208.py b/xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_with_presence_removed_2208.py deleted file mode 100644 index 9a4b27207..000000000 --- a/xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_with_presence_removed_2208.py +++ /dev/null @@ -1,34 +0,0 @@ -"""Regression test for issue #2208. - -After #2207 routed all three GPU eager sites through -``_finalize_eager_read``, the sibling helper -``_apply_nodata_mask_gpu_with_presence`` had no remaining callers. The -helper was removed in this PR. This test pins the removal so a future -PR cannot quietly re-introduce a dead callable. - -``_apply_nodata_mask_gpu`` is still alive on the chunked GPU dask path -(``_backends/gpu.py`` ``_chunk_task``), so this test also asserts that -helper is still importable as a sanity check that the removal was -surgical. -""" -import pytest - -from xrspatial.geotiff._backends import _gpu_helpers - - -def test_apply_nodata_mask_gpu_with_presence_not_importable_2208(): - # Covers both module-attribute absence and the import-time surface. - # _apply_nodata_mask_gpu_with_presence was removed in #2208 after - # #2207 routed all GPU eager sites through _finalize_eager_read; - # the helper had zero remaining callers. - with pytest.raises(ImportError): - from xrspatial.geotiff._backends._gpu_helpers import \ - _apply_nodata_mask_gpu_with_presence # noqa: F401 - - -def test_apply_nodata_mask_gpu_still_present_2208(): - # _apply_nodata_mask_gpu is still on the chunked GPU dask path - # (_chunk_task in _backends/gpu.py); removal in #2208 was scoped - # to the dead sibling only. - assert hasattr(_gpu_helpers, '_apply_nodata_mask_gpu') - assert callable(_gpu_helpers._apply_nodata_mask_gpu) diff --git a/xrspatial/geotiff/tests/test_band_validation_1673.py b/xrspatial/geotiff/tests/test_band_validation_1673.py deleted file mode 100644 index 36e4ede92..000000000 --- a/xrspatial/geotiff/tests/test_band_validation_1673.py +++ /dev/null @@ -1,125 +0,0 @@ -"""Regression tests for issue #1673. - -``read_to_array`` accepts a ``band`` argument and applies it to the -decoded array via ``arr[:, :, band]`` without validating the index. -Two failure modes follow: - -* ``band=-1`` silently selects the last channel via numpy negative - indexing. The public contract is "0-based non-negative index", so - this is a silent semantic shift, not an explicit selection. -* ``band=N`` with ``N >= samples_per_pixel`` raises a raw numpy - ``IndexError`` whose message ("index N is out of bounds for axis - 2 with size M") leaks the internal slice shape. - -The dask path (``read_geotiff_dask``) and the GPU path both validate -``band`` up front and raise ``IndexError("band=N out of range for -M-band file.")``. These tests pin the local eager path to the same -contract so backend parity holds. -""" -from __future__ import annotations - -import numpy as np -import pytest -import xarray as xr - - -@pytest.fixture -def multiband_tiff_path(tmp_path): - """4x6 three-band tiled tiff for band-validation tests.""" - from xrspatial.geotiff import to_geotiff - - arr = np.arange(72, dtype=np.float32).reshape(4, 6, 3) - da = xr.DataArray( - arr, - dims=['y', 'x', 'band'], - coords={ - 'y': np.array([0.5, 1.5, 2.5, 3.5]), - 'x': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), - 'band': [0, 1, 2], - }, - attrs={'crs': 4326}, - ) - p = tmp_path / 'mb_1673.tif' - to_geotiff(da, str(p), tile_size=16) - return str(p), arr - - -def test_read_to_array_negative_band_rejected(multiband_tiff_path): - """``band=-1`` no longer silently selects the last channel.""" - from xrspatial.geotiff._reader import read_to_array - - path, _ = multiband_tiff_path - with pytest.raises(IndexError, match="band=-1 out of range"): - read_to_array(path, band=-1) - - -def test_read_to_array_band_equal_to_samples_rejected(multiband_tiff_path): - """``band=samples_per_pixel`` (off-by-one) raises a typed error.""" - from xrspatial.geotiff._reader import read_to_array - - path, _ = multiband_tiff_path - # File has 3 bands; valid indices are 0, 1, 2. - with pytest.raises(IndexError, match="band=3 out of range"): - read_to_array(path, band=3) - - -def test_read_to_array_band_far_above_samples_rejected(multiband_tiff_path): - """A wildly out-of-range band index gives the same typed error.""" - from xrspatial.geotiff._reader import read_to_array - - path, _ = multiband_tiff_path - with pytest.raises(IndexError, match="band=103 out of range"): - read_to_array(path, band=103) - - -def test_read_to_array_valid_band_still_works(multiband_tiff_path): - """Valid band indices keep working after the validation guard.""" - from xrspatial.geotiff._reader import read_to_array - - path, arr = multiband_tiff_path - out, _ = read_to_array(path, band=1) - np.testing.assert_array_equal(out, arr[:, :, 1]) - - -def test_read_to_array_band_none_still_returns_all_bands(multiband_tiff_path): - """``band=None`` still returns the full multi-band array.""" - from xrspatial.geotiff._reader import read_to_array - - path, arr = multiband_tiff_path - out, _ = read_to_array(path) - np.testing.assert_array_equal(out, arr) - - -def test_backend_parity_negative_band(multiband_tiff_path): - """Local eager and dask paths raise the same error for ``band=-1``.""" - from xrspatial.geotiff import read_geotiff_dask - from xrspatial.geotiff._reader import read_to_array - - path, _ = multiband_tiff_path - - with pytest.raises(IndexError) as eager_exc: - read_to_array(path, band=-1) - with pytest.raises(IndexError) as dask_exc: - read_geotiff_dask(path, chunks=4, band=-1) - - # Same error type and same diagnostic substring; the dask message - # is "band=-1 out of range for 3-band file." so any 0-based caller - # gets identical signal regardless of which backend they pick. - assert "band=-1 out of range" in str(eager_exc.value) - assert "band=-1 out of range" in str(dask_exc.value) - - -def test_backend_parity_band_equal_to_samples(multiband_tiff_path): - """Local eager and dask paths agree on the off-by-one rejection.""" - from xrspatial.geotiff import read_geotiff_dask - from xrspatial.geotiff._reader import read_to_array - - path, _ = multiband_tiff_path - - with pytest.raises(IndexError) as eager_exc: - read_to_array(path, band=3) - with pytest.raises(IndexError) as dask_exc: - read_geotiff_dask(path, chunks=4, band=3) - - assert "band=3 out of range" in str(eager_exc.value) - assert "band=3 out of range" in str(dask_exc.value) diff --git a/xrspatial/geotiff/tests/test_compression.py b/xrspatial/geotiff/tests/test_compression.py deleted file mode 100644 index b8f5bc5d1..000000000 --- a/xrspatial/geotiff/tests/test_compression.py +++ /dev/null @@ -1,118 +0,0 @@ -"""Tests for compression codecs.""" -from __future__ import annotations - -import numpy as np -import pytest - -from xrspatial.geotiff._compression import (COMPRESSION_DEFLATE, COMPRESSION_LZW, COMPRESSION_NONE, - compress, decompress, deflate_compress, - deflate_decompress, lzw_compress, lzw_decompress, - predictor_decode, predictor_encode) - - -class TestDeflate: - def test_round_trip(self): - data = b'hello world! ' * 100 - compressed = deflate_compress(data) - assert compressed != data - assert deflate_decompress(compressed) == data - - def test_empty(self): - compressed = deflate_compress(b'') - assert deflate_decompress(compressed) == b'' - - def test_binary_data(self): - data = bytes(range(256)) * 10 - compressed = deflate_compress(data) - assert deflate_decompress(compressed) == data - - -class TestLZW: - def test_round_trip_simple(self): - data = b'ABCABCABCABC' - compressed = lzw_compress(data) - decompressed = lzw_decompress(compressed, len(data)) - assert decompressed.tobytes() == data - - def test_round_trip_repetitive(self): - data = b'\x00' * 1000 - compressed = lzw_compress(data) - decompressed = lzw_decompress(compressed, len(data)) - assert decompressed.tobytes() == data - - def test_round_trip_sequential(self): - data = bytes(range(256)) - compressed = lzw_compress(data) - decompressed = lzw_decompress(compressed, len(data)) - assert decompressed.tobytes() == data - - def test_round_trip_random(self): - rng = np.random.RandomState(42) - data = bytes(rng.randint(0, 256, size=500, dtype=np.uint8)) - compressed = lzw_compress(data) - decompressed = lzw_decompress(compressed, len(data)) - assert decompressed.tobytes() == data - - def test_round_trip_large(self): - rng = np.random.RandomState(123) - data = bytes(rng.randint(0, 256, size=10000, dtype=np.uint8)) - compressed = lzw_compress(data) - decompressed = lzw_decompress(compressed, len(data)) - assert decompressed.tobytes() == data - - def test_empty(self): - compressed = lzw_compress(b'') - decompressed = lzw_decompress(compressed, 0) - assert decompressed.tobytes() == b'' - - -class TestPredictor: - def test_round_trip_uint8(self): - # 4x4 image, 1 byte per sample - data = np.array([10, 20, 30, 40, 50, 60, 70, 80, - 90, 100, 110, 120, 130, 140, 150, 160], - dtype=np.uint8) - encoded = predictor_encode(data.copy(), 4, 4, 1) - decoded = predictor_decode(encoded.copy(), 4, 4, 1) - np.testing.assert_array_equal(decoded, data) - - def test_round_trip_float32(self): - # 2x3 image, 4 bytes per sample - arr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], dtype=np.float32) - raw = np.frombuffer(arr.tobytes(), dtype=np.uint8).copy() - encoded = predictor_encode(raw.copy(), 3, 2, 4) - decoded = predictor_decode(encoded.copy(), 3, 2, 4) - np.testing.assert_array_equal(decoded, raw) - - def test_predictor_encode_differences(self): - # First pixel unchanged, rest are differences - data = np.array([10, 20, 30, 40], dtype=np.uint8) - encoded = predictor_encode(data.copy(), 4, 1, 1) - assert encoded[0] == 10 - assert encoded[1] == 10 # 20 - 10 - assert encoded[2] == 10 # 30 - 20 - assert encoded[3] == 10 # 40 - 30 - - -class TestDispatch: - def test_none(self): - data = b'hello' - assert decompress(data, COMPRESSION_NONE).tobytes() == data - assert compress(data, COMPRESSION_NONE) == data - - def test_deflate(self): - data = b'test data ' * 50 - compressed = compress(data, COMPRESSION_DEFLATE) - assert decompress(compressed, COMPRESSION_DEFLATE).tobytes() == data - - def test_lzw(self): - data = b'ABCABC' * 20 - compressed = compress(data, COMPRESSION_LZW) - decompressed = decompress(compressed, COMPRESSION_LZW, len(data)) - assert decompressed.tobytes() == data - - def test_unsupported(self): - with pytest.raises(ValueError, match="Unsupported compression"): - decompress(b'', 99) - with pytest.raises(ValueError, match="Unsupported compression"): - compress(b'', 99) diff --git a/xrspatial/geotiff/tests/test_descending_coords_1716.py b/xrspatial/geotiff/tests/test_descending_coords_1716.py deleted file mode 100644 index ff045b820..000000000 --- a/xrspatial/geotiff/tests/test_descending_coords_1716.py +++ /dev/null @@ -1,127 +0,0 @@ -"""Regression tests for issue #1716. - -``to_geotiff`` previously stored ``abs(pixel_width)`` / ``abs(pixel_height)`` -in ModelPixelScaleTag and the reader hard-coded a north-up reconstruction. -DataArrays with descending x or ascending y silently round-tripped with the -wrong georeference. The writer now emits ModelTransformationTag (34264) -for non-standard orientations so the sign survives the round trip. -""" -from __future__ import annotations - -import numpy as np -import xarray as xr - -from xrspatial.geotiff import open_geotiff, to_geotiff -from xrspatial.geotiff._geotags import (TAG_MODEL_PIXEL_SCALE, TAG_MODEL_TIEPOINT, - TAG_MODEL_TRANSFORMATION) -from xrspatial.geotiff._header import parse_all_ifds, parse_header - - -def _ifd_tag_ids(path: str) -> set[int]: - with open(path, 'rb') as fh: - data = fh.read() - header = parse_header(data) - ifds = parse_all_ifds(data, header) - return set(ifds[0].entries.keys()) - - -def _make_da(x_coords: np.ndarray, y_coords: np.ndarray) -> xr.DataArray: - arr = np.arange(len(y_coords) * len(x_coords), dtype=np.float32) - arr = arr.reshape(len(y_coords), len(x_coords)) - return xr.DataArray( - arr, - dims=('y', 'x'), - coords={'y': y_coords, 'x': x_coords}, - ) - - -def test_descending_x_roundtrip(tmp_path): - """Descending x coords survive the round trip.""" - # x decreases left-to-right (unusual but valid) - x = np.array([200.0, 190.0, 180.0, 170.0, 160.0], dtype=np.float64) - y = np.array([50.0, 40.0, 30.0, 20.0], dtype=np.float64) # north-up - da = _make_da(x, y) - - out = tmp_path / 'tmp_1716_desc_x.tif' - to_geotiff(da, str(out), crs=4326) - - loaded = open_geotiff(str(out)) - np.testing.assert_allclose(loaded.coords['x'].values, x) - np.testing.assert_allclose(loaded.coords['y'].values, y) - np.testing.assert_array_equal(loaded.values, da.values) - - -def test_ascending_y_roundtrip(tmp_path): - """Ascending y coords survive the round trip.""" - x = np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype=np.float64) - # y increases top-to-bottom (south-up) - y = np.array([20.0, 30.0, 40.0, 50.0], dtype=np.float64) - da = _make_da(x, y) - - out = tmp_path / 'tmp_1716_asc_y.tif' - to_geotiff(da, str(out), crs=4326) - - loaded = open_geotiff(str(out)) - np.testing.assert_allclose(loaded.coords['x'].values, x) - np.testing.assert_allclose(loaded.coords['y'].values, y) - np.testing.assert_array_equal(loaded.values, da.values) - - -def test_descending_x_and_ascending_y_roundtrip(tmp_path): - """Both axes flipped relative to north-up.""" - x = np.array([200.0, 190.0, 180.0, 170.0, 160.0], dtype=np.float64) - y = np.array([20.0, 30.0, 40.0, 50.0], dtype=np.float64) - da = _make_da(x, y) - - out = tmp_path / 'tmp_1716_desc_x_asc_y.tif' - to_geotiff(da, str(out), crs=4326) - - loaded = open_geotiff(str(out)) - np.testing.assert_allclose(loaded.coords['x'].values, x) - np.testing.assert_allclose(loaded.coords['y'].values, y) - np.testing.assert_array_equal(loaded.values, da.values) - - -def test_north_up_still_uses_pixel_scale_and_tiepoint(tmp_path): - """Standard north-up orientation keeps ModelPixelScale + ModelTiepoint.""" - x = np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype=np.float64) - y = np.array([50.0, 40.0, 30.0, 20.0], dtype=np.float64) - da = _make_da(x, y) - - out = tmp_path / 'tmp_1716_north_up.tif' - to_geotiff(da, str(out), crs=4326) - - tag_ids = _ifd_tag_ids(str(out)) - assert TAG_MODEL_PIXEL_SCALE in tag_ids - assert TAG_MODEL_TIEPOINT in tag_ids - assert TAG_MODEL_TRANSFORMATION not in tag_ids - - -def test_descending_x_uses_transformation_tag(tmp_path): - """Non-standard orientation emits ModelTransformationTag and skips - the scale/tiepoint pair.""" - x = np.array([200.0, 190.0, 180.0, 170.0, 160.0], dtype=np.float64) - y = np.array([50.0, 40.0, 30.0, 20.0], dtype=np.float64) - da = _make_da(x, y) - - out = tmp_path / 'tmp_1716_desc_x_tags.tif' - to_geotiff(da, str(out), crs=4326) - - tag_ids = _ifd_tag_ids(str(out)) - assert TAG_MODEL_TRANSFORMATION in tag_ids - assert TAG_MODEL_PIXEL_SCALE not in tag_ids - assert TAG_MODEL_TIEPOINT not in tag_ids - - -def test_ascending_y_uses_transformation_tag(tmp_path): - x = np.array([160.0, 170.0, 180.0, 190.0, 200.0], dtype=np.float64) - y = np.array([20.0, 30.0, 40.0, 50.0], dtype=np.float64) - da = _make_da(x, y) - - out = tmp_path / 'tmp_1716_asc_y_tags.tif' - to_geotiff(da, str(out), crs=4326) - - tag_ids = _ifd_tag_ids(str(out)) - assert TAG_MODEL_TRANSFORMATION in tag_ids - assert TAG_MODEL_PIXEL_SCALE not in tag_ids - assert TAG_MODEL_TIEPOINT not in tag_ids diff --git a/xrspatial/geotiff/tests/test_dtype_read.py b/xrspatial/geotiff/tests/test_dtype_read.py deleted file mode 100644 index 0026e1364..000000000 --- a/xrspatial/geotiff/tests/test_dtype_read.py +++ /dev/null @@ -1,116 +0,0 @@ -"""Tests for dtype parameter on open_geotiff.""" -import numpy as np -import pytest -import xarray as xr - -from xrspatial.geotiff import open_geotiff, to_geotiff - - -@pytest.fixture -def float64_tif(tmp_path): - """Write a float64 GeoTIFF for dtype cast tests.""" - arr = np.random.default_rng(99).random((80, 80)).astype(np.float64) - y = np.linspace(40.0, 41.0, 80) - x = np.linspace(-105.0, -104.0, 80) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326}) - path = str(tmp_path / 'test_1083_f64.tif') - to_geotiff(da, path, compression='none') - return path, arr - - -@pytest.fixture -def uint16_tif(tmp_path): - """Write a uint16 GeoTIFF for dtype cast tests.""" - arr = np.random.default_rng(77).integers(0, 10000, (60, 60), - dtype=np.uint16) - y = np.linspace(40.0, 41.0, 60) - x = np.linspace(-105.0, -104.0, 60) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326}) - path = str(tmp_path / 'test_1083_u16.tif') - to_geotiff(da, path, compression='none') - return path, arr - - -class TestDtypeEager: - def test_float64_to_float32(self, float64_tif): - path, orig = float64_tif - result = open_geotiff(path, dtype='float32') - assert result.dtype == np.float32 - np.testing.assert_array_almost_equal( - result.values, orig.astype(np.float32), decimal=6) - - def test_float64_to_float16(self, float64_tif): - path, orig = float64_tif - result = open_geotiff(path, dtype=np.float16) - assert result.dtype == np.float16 - - def test_uint16_to_int32(self, uint16_tif): - path, orig = uint16_tif - result = open_geotiff(path, dtype='int32') - assert result.dtype == np.int32 - np.testing.assert_array_equal(result.values, orig.astype(np.int32)) - - def test_uint16_to_uint8(self, uint16_tif): - path, _ = uint16_tif - result = open_geotiff(path, dtype='uint8') - assert result.dtype == np.uint8 - - def test_float_to_int_raises(self, float64_tif): - path, _ = float64_tif - with pytest.raises(ValueError, match='float.*int'): - open_geotiff(path, dtype='int32') - - def test_dtype_none_preserves_native(self, float64_tif): - path, _ = float64_tif - result = open_geotiff(path, dtype=None) - assert result.dtype == np.float64 - - def test_int_with_nodata_float_to_int_raises(self, tmp_path): - """uint16 file with nodata: nodata masking promotes to float64, so float->int validation fires.""" # noqa: E501 - arr = np.array([[1, 2], [3, 9999]], dtype=np.uint16) - y = np.linspace(40.0, 41.0, 2) - x = np.linspace(-105.0, -104.0, 2) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326, 'nodata': 9999.0}) - path = str(tmp_path / 'test_1083_nodata_int_eager.tif') - to_geotiff(da, path, compression='none') - with pytest.raises(ValueError, match='float.*int'): - open_geotiff(path, dtype='int32') - - -class TestDtypeDask: - def test_float64_to_float32_dask(self, float64_tif): - path, orig = float64_tif - result = open_geotiff(path, dtype='float32', chunks=40) - assert result.dtype == np.float32 - computed = result.values - np.testing.assert_array_almost_equal( - computed, orig.astype(np.float32), decimal=6) - - def test_chunks_are_target_dtype(self, float64_tif): - path, _ = float64_tif - result = open_geotiff(path, dtype='float32', chunks=40) - assert result.data.dtype == np.float32 - - def test_float_to_int_raises_dask(self, float64_tif): - path, _ = float64_tif - with pytest.raises(ValueError, match='float.*int'): - open_geotiff(path, dtype='int32', chunks=40) - - def test_int_with_nodata_float_to_int_raises_dask(self, tmp_path): - """uint16 file with nodata: nodata masking promotes to float64, so float->int validation fires.""" # noqa: E501 - arr = np.array([[1, 2], [3, 9999]], dtype=np.uint16) - y = np.linspace(40.0, 41.0, 2) - x = np.linspace(-105.0, -104.0, 2) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326, 'nodata': 9999.0}) - path = str(tmp_path / 'test_1083_nodata_int_dask.tif') - to_geotiff(da, path, compression='none') - with pytest.raises(ValueError, match='float.*int'): - open_geotiff(path, dtype='int32', chunks=2) diff --git a/xrspatial/geotiff/tests/test_float16_read_1941.py b/xrspatial/geotiff/tests/test_float16_read_1941.py deleted file mode 100644 index e642bc90d..000000000 --- a/xrspatial/geotiff/tests/test_float16_read_1941.py +++ /dev/null @@ -1,138 +0,0 @@ -"""Regression tests for issue #1941. - -External GeoTIFFs that store IEEE half-precision floats (``BitsPerSample -=16`` + ``SampleFormat=3``) used to raise ``ValueError("Unsupported -BitsPerSample=16, SampleFormat=3")`` from ``tiff_dtype_to_numpy``. The -writer auto-promotes float16 inputs to float32 before encoding, so the -write side could not produce such a file, but reads from rasterio / -GDAL / tifffile-produced files broke read-parity. - -The fix: - -* ``tiff_dtype_to_numpy(16, 3)`` returns ``np.float32`` (symmetric with - the writer's auto-promotion). -* A new ``tiff_storage_dtype`` returns ``np.float16`` for the same key - so the byte-view in ``_decode_strip_or_tile`` reads the raw 2-byte - samples correctly before casting to float32. -* The GPU paths fall back to CPU decode when bps != dtype.itemsize * 8, - matching the existing stripped-layout fallback. -""" -from __future__ import annotations - -import numpy as np -import pytest - -from xrspatial.geotiff import open_geotiff, read_geotiff_dask -from xrspatial.geotiff._dtypes import (SAMPLE_FORMAT_FLOAT, SAMPLE_FORMAT_INT, SAMPLE_FORMAT_UINT, - tiff_dtype_to_numpy, tiff_storage_dtype) - - -class TestDtypeMap: - """The dtype map auto-promotes float16 on read.""" - - def test_tiff_dtype_to_numpy_float16(self): - assert tiff_dtype_to_numpy(16, SAMPLE_FORMAT_FLOAT) == np.float32 - - def test_tiff_storage_dtype_float16(self): - assert tiff_storage_dtype(16, SAMPLE_FORMAT_FLOAT) == np.float16 - - def test_tiff_storage_dtype_delegates_for_non_promoted(self): - # Non-promoted keys behave identically. - for bps, sf in [ - (8, SAMPLE_FORMAT_UINT), - (16, SAMPLE_FORMAT_UINT), - (16, SAMPLE_FORMAT_INT), - (32, SAMPLE_FORMAT_FLOAT), - (64, SAMPLE_FORMAT_FLOAT), - ]: - assert tiff_storage_dtype(bps, sf) == tiff_dtype_to_numpy(bps, sf) - - -@pytest.fixture -def float16_tif(tmp_path): - """Write a small float16 GeoTIFF using tifffile. - - tifffile encodes numpy float16 with ``BitsPerSample=16`` and - ``SampleFormat=3``, which is what an external rasterio / GDAL caller - would produce. - """ - tifffile = pytest.importorskip("tifffile") - arr = np.array( - [[0.0, 1.0, 2.0, 3.0], - [-1.0, -2.0, -3.0, -4.0], - [0.5, 1.5, 2.5, 3.5], - [100.0, 200.0, 300.0, 400.0]], - dtype=np.float16, - ) - path = tmp_path / "f16.tif" - tifffile.imwrite(str(path), arr, compression=None) - return path, arr - - -class TestEagerFloat16Read: - """``open_geotiff`` decodes an external float16 file to float32.""" - - def test_open_geotiff_returns_float32(self, float16_tif): - path, arr = float16_tif - result = open_geotiff(str(path)) - assert result.dtype == np.float32 - # Float16 values fit exactly in float32, so equality is well-defined. - np.testing.assert_array_equal(result.values, arr.astype(np.float32)) - - def test_open_geotiff_dask_returns_float32(self, float16_tif): - path, arr = float16_tif - result = read_geotiff_dask(str(path), chunks=2) - assert result.dtype == np.float32 - np.testing.assert_array_equal( - result.compute().values, arr.astype(np.float32)) - - -class TestPredictor3Float16: - """Predictor=3 + float16 on disk also decodes correctly.""" - - def test_predictor3_float16_round_trip(self, tmp_path): - tifffile = pytest.importorskip("tifffile") - pytest.importorskip("imagecodecs") # required for predictor=3 - arr = np.linspace(-1.0, 1.0, 16).astype(np.float16).reshape(4, 4) - path = tmp_path / "pred3_f16.tif" - tifffile.imwrite( - str(path), arr, predictor=3, compression="deflate") - - result = open_geotiff(str(path)) - assert result.dtype == np.float32 - np.testing.assert_array_equal( - result.values, arr.astype(np.float32)) - - -class TestRegressionGuards: - """The promotion did not change non-float16 behaviour.""" - - def test_float32_still_float32(self, tmp_path): - tifffile = pytest.importorskip("tifffile") - arr = np.arange(16, dtype=np.float32).reshape(4, 4) - path = tmp_path / "f32.tif" - tifffile.imwrite(str(path), arr) - - result = open_geotiff(str(path)) - assert result.dtype == np.float32 - np.testing.assert_array_equal(result.values, arr) - - def test_float64_still_float64(self, tmp_path): - tifffile = pytest.importorskip("tifffile") - arr = np.arange(16, dtype=np.float64).reshape(4, 4) - path = tmp_path / "f64.tif" - tifffile.imwrite(str(path), arr) - - result = open_geotiff(str(path)) - assert result.dtype == np.float64 - np.testing.assert_array_equal(result.values, arr) - - def test_uint16_still_uint16(self, tmp_path): - tifffile = pytest.importorskip("tifffile") - arr = np.arange(16, dtype=np.uint16).reshape(4, 4) - path = tmp_path / "u16.tif" - tifffile.imwrite(str(path), arr) - - result = open_geotiff(str(path)) - assert result.dtype == np.uint16 - np.testing.assert_array_equal(result.values, arr) diff --git a/xrspatial/geotiff/tests/test_float16_read_gpu_1941.py b/xrspatial/geotiff/tests/test_float16_read_gpu_1941.py deleted file mode 100644 index 3983e08e0..000000000 --- a/xrspatial/geotiff/tests/test_float16_read_gpu_1941.py +++ /dev/null @@ -1,340 +0,0 @@ -"""GPU backend coverage for issue #1941 (float16 read). - -#1941 added float16 auto-promotion on read by making -``tiff_dtype_to_numpy(16, SAMPLE_FORMAT_FLOAT)`` return ``float32`` and -adding the on-disk ``tiff_storage_dtype`` companion. The eager numpy and -dask paths are covered by ``test_float16_read_1941.py``; this module -closes the GPU and dask+GPU coverage gap. - -A regression that: - -* dropped the ``bps_mismatch`` stripped/odd-bps fallback at - ``_backends/gpu.py:357`` would route float16 stripped reads through - the tiled GPU decoder and mis-decode the half-precision samples; -* dropped the ``bps_first == 16 and sample_format == SAMPLE_FORMAT_FLOAT`` - early-out at ``_backends/gpu.py:791`` in ``_gds_chunk_path_available`` - would send tiled float16 chunked reads down the kvikIO GDS path and - mis-stride the buffer; -* dropped the entry at ``(16, SAMPLE_FORMAT_FLOAT) -> float32`` in - ``tiff_dtype_to_numpy`` would surface as ``ValueError("Unsupported - BitsPerSample=16, SampleFormat=3")`` from the GPU read paths. - -Every test ships through ``read_geotiff_gpu`` directly or through -``open_geotiff(..., gpu=True)`` so the dispatcher path is also wired in. -``cuda-unavailable`` builds skip the suite via the project's standard -``CUDA_AVAILABLE`` gate. -""" -from __future__ import annotations - -import importlib.util - -import numpy as np -import pytest - - -def _gpu_available() -> bool: - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _gpu_available() -pytestmark = pytest.mark.skipif( - not _HAS_GPU, reason="cupy + CUDA required for GPU float16 read tests", -) - - -@pytest.fixture -def float16_stripped_tif(tmp_path): - """Stripped float16 GeoTIFF: triggers the bps_mismatch CPU fallback. - - ``tifffile.imwrite`` without ``tile=`` produces a stripped layout, so - the GPU reader hits ``bps_mismatch=True`` (file_dtype.itemsize*8 == 32 - but bps == 16) and falls back to ``_read_to_array`` on CPU before - copying to device. - """ - tifffile = pytest.importorskip("tifffile") - arr = np.array( - [[0.0, 1.0, 2.0, 3.0], - [-1.0, -2.0, -3.0, -4.0], - [0.5, 1.5, 2.5, 3.5], - [100.0, 200.0, 300.0, 400.0]], - dtype=np.float16, - ) - path = tmp_path / "f16_stripped.tif" - tifffile.imwrite(str(path), arr, compression=None) - return path, arr - - -@pytest.fixture -def float16_tiled_tif(tmp_path): - """Multi-tile float16 GeoTIFF: 32x32 image, 16x16 tiles (2x2 grid). - - Tiled and deflate-compressed. The 2x2 tile grid exercises inter-tile - reassembly in the decoder path so a regression that mis-stitched - adjacent tiles would surface here. ``bps_mismatch`` short-circuits - the tiled GPU decode path and routes through the CPU decoder; the - GDS path is also gated off via ``_gds_chunk_path_available`` - returning False for (bps=16, sf=3). - """ - tifffile = pytest.importorskip("tifffile") - arr = np.arange(1024, dtype=np.float16).reshape(32, 32) - path = tmp_path / "f16_tiled.tif" - tifffile.imwrite( - str(path), arr, compression="deflate", tile=(16, 16)) - return path, arr - - -@pytest.fixture -def float16_tiled_uncompressed_tif(tmp_path): - """Tiled uncompressed float16 GeoTIFF. - - Mirrors ``float16_tiled_tif`` but with ``compression=None`` so the - tile-decode path is exercised without an extra deflate codec call. - Tile size 16 is the smallest tifffile allows. - """ - tifffile = pytest.importorskip("tifffile") - arr = np.arange(256, dtype=np.float16).reshape(16, 16) - path = tmp_path / "f16_tiled_none.tif" - tifffile.imwrite( - str(path), arr, compression=None, tile=(16, 16)) - return path, arr - - -class TestEagerGPUReadFloat16: - """``read_geotiff_gpu`` returns float32 for stripped float16 input.""" - - def test_read_geotiff_gpu_stripped_returns_float32( - self, float16_stripped_tif - ): - from xrspatial.geotiff import read_geotiff_gpu - - path, arr = float16_stripped_tif - result = read_geotiff_gpu(str(path)) - assert result.dtype == np.float32, ( - f"GPU read of float16 must return float32, got {result.dtype}" - ) - np.testing.assert_array_equal( - result.data.get(), arr.astype(np.float32)) - - def test_read_geotiff_gpu_tiled_returns_float32( - self, float16_tiled_tif - ): - from xrspatial.geotiff import read_geotiff_gpu - - path, arr = float16_tiled_tif - result = read_geotiff_gpu(str(path)) - assert result.dtype == np.float32 - np.testing.assert_array_equal( - result.data.get(), arr.astype(np.float32)) - - def test_read_geotiff_gpu_tiled_uncompressed_returns_float32( - self, float16_tiled_uncompressed_tif - ): - from xrspatial.geotiff import read_geotiff_gpu - - path, arr = float16_tiled_uncompressed_tif - result = read_geotiff_gpu(str(path)) - assert result.dtype == np.float32 - np.testing.assert_array_equal( - result.data.get(), arr.astype(np.float32)) - - def test_open_geotiff_gpu_dispatcher_float16(self, float16_tiled_tif): - """``open_geotiff(gpu=True)`` dispatches correctly for float16.""" - from xrspatial.geotiff import open_geotiff - - path, arr = float16_tiled_tif - result = open_geotiff(str(path), gpu=True) - assert result.dtype == np.float32 - np.testing.assert_array_equal( - result.data.get(), arr.astype(np.float32)) - - -class TestGPUWindowedFloat16: - """Windowed GPU reads honour the bps_mismatch fallback path.""" - - def test_read_geotiff_gpu_windowed_stripped(self, float16_stripped_tif): - from xrspatial.geotiff import read_geotiff_gpu - - path, arr = float16_stripped_tif - result = read_geotiff_gpu(str(path), window=(0, 0, 2, 2)) - assert result.dtype == np.float32 - assert result.shape == (2, 2) - np.testing.assert_array_equal( - result.data.get(), arr[:2, :2].astype(np.float32)) - - def test_read_geotiff_gpu_windowed_tiled(self, float16_tiled_tif): - from xrspatial.geotiff import read_geotiff_gpu - - path, arr = float16_tiled_tif - result = read_geotiff_gpu(str(path), window=(0, 0, 8, 8)) - assert result.dtype == np.float32 - assert result.shape == (8, 8) - np.testing.assert_array_equal( - result.data.get(), arr[:8, :8].astype(np.float32)) - - -class TestDaskGPUFloat16: - """``open_geotiff(chunks=, gpu=True)`` decodes float16 correctly.""" - - def test_dask_gpu_tiled_float16(self, float16_tiled_tif): - from xrspatial.geotiff import open_geotiff - - path, arr = float16_tiled_tif - result = open_geotiff(str(path), chunks=8, gpu=True) - assert result.dtype == np.float32, ( - f"dask+GPU read of float16 must return float32, got {result.dtype}" - ) - # Compute the dask array; under dask+cupy, .compute() yields a - # cupy-backed DataArray, so the .data.get() step pulls to host. - computed = result.compute() - np.testing.assert_array_equal( - computed.data.get(), arr.astype(np.float32)) - - def test_read_geotiff_gpu_chunks_kwarg_float16(self, float16_tiled_tif): - """``read_geotiff_gpu(chunks=)`` also routes correctly.""" - from xrspatial.geotiff import read_geotiff_gpu - - path, arr = float16_tiled_tif - result = read_geotiff_gpu(str(path), chunks=8) - assert result.dtype == np.float32 - computed = result.compute() - np.testing.assert_array_equal( - computed.data.get(), arr.astype(np.float32)) - - -class TestGDSPathGatedOffForFloat16: - """``_gds_chunk_path_available`` returns False for (bps=16, sf=3). - - Direct structural test of the gating logic added in #1941 to keep the - KvikIO GDS chunked path from mis-decoding half-precision tiles. A - regression dropping the float16 guard would silently corrupt every - chunked GPU read of a float16 source. - """ - - def test_gds_path_gated_off_for_float16(self, float16_tiled_tif): - pytest.importorskip("kvikio", exc_type=ImportError) - - from xrspatial.geotiff._backends.gpu import _gds_chunk_path_available - from xrspatial.geotiff._header import parse_all_ifds, parse_header - - path, _ = float16_tiled_tif - with open(str(path), "rb") as f: - data = f.read() - header = parse_header(data) - ifds = parse_all_ifds(data, header) - ifd = ifds[0] - - # Sanity-check fixture: tiled, bps=16, sample_format=3 (float) - from xrspatial.geotiff._dtypes import SAMPLE_FORMAT_FLOAT - assert ifd.is_tiled, "fixture sanity: tiled layout expected" - # Mirror the production unpacking pattern at gpu.py:791 - # (bps_first[0] if bps_first else 0) so an empty BitsPerSample - # tuple would not raise IndexError here. - bps_first = ifd.bits_per_sample - if isinstance(bps_first, tuple): - bps = bps_first[0] if bps_first else 0 - else: - bps = bps_first - assert bps == 16, "fixture sanity: bps=16 expected" - assert ifd.sample_format == SAMPLE_FORMAT_FLOAT - - result = _gds_chunk_path_available( - str(path), ifd, has_sparse_tile=False, orientation=1) - assert result is False, ( - "_gds_chunk_path_available must return False for " - "(bps=16, sf=float) so the GDS chunked path does not " - "mis-decode half-precision tiles." - ) - - def test_gds_path_allowed_for_float32_tiled(self, tmp_path): - """Sanity: GDS path remains allowed for a float32 tiled file. - - Pins that the float16 guard at gpu.py:791 fires only on - (bps=16, sf=float), not on every tiled float file. A regression - widening the guard to all floats would silently disable the - GDS path on every float32 tiled COG. - """ - tifffile = pytest.importorskip("tifffile") - pytest.importorskip("kvikio", exc_type=ImportError) - - arr = np.arange(256, dtype=np.float32).reshape(16, 16) - path = tmp_path / "f32_tiled.tif" - tifffile.imwrite( - str(path), arr, compression="deflate", tile=(16, 16)) - - from xrspatial.geotiff._backends.gpu import _gds_chunk_path_available - from xrspatial.geotiff._header import parse_all_ifds, parse_header - - with open(str(path), "rb") as f: - data = f.read() - header = parse_header(data) - ifds = parse_all_ifds(data, header) - - result = _gds_chunk_path_available( - str(path), ifds[0], has_sparse_tile=False, orientation=1) - assert result is True, ( - "_gds_chunk_path_available must remain True for " - "(bps=32, sf=float) tiled files so the kvikio GDS chunk " - "path still applies." - ) - - -class TestBackendParityFloat16: - """All four backends agree pixel-exact on float16 input.""" - - def test_eager_numpy_equals_gpu(self, float16_tiled_tif): - from xrspatial.geotiff import open_geotiff - - path, _ = float16_tiled_tif - cpu = open_geotiff(str(path)) - gpu = open_geotiff(str(path), gpu=True) - - assert cpu.dtype == gpu.dtype == np.float32 - np.testing.assert_array_equal(np.asarray(cpu), gpu.data.get()) - - def test_eager_numpy_equals_dask_gpu(self, float16_tiled_tif): - from xrspatial.geotiff import open_geotiff - - path, _ = float16_tiled_tif - cpu = open_geotiff(str(path)) - dask_gpu = open_geotiff(str(path), chunks=8, gpu=True).compute() - - assert cpu.dtype == dask_gpu.dtype == np.float32 - np.testing.assert_array_equal( - np.asarray(cpu), dask_gpu.data.get()) - - def test_dask_numpy_equals_dask_gpu(self, float16_tiled_tif): - from xrspatial.geotiff import open_geotiff, read_geotiff_dask - - path, _ = float16_tiled_tif - dask_cpu = read_geotiff_dask(str(path), chunks=8).compute() - dask_gpu = open_geotiff(str(path), chunks=8, gpu=True).compute() - - np.testing.assert_array_equal( - np.asarray(dask_cpu), dask_gpu.data.get()) - - -class TestPredictor3Float16GPU: - """Predictor=3 + float16 on disk also decodes correctly on GPU.""" - - def test_predictor3_float16_gpu_round_trip(self, tmp_path): - tifffile = pytest.importorskip("tifffile") - pytest.importorskip("imagecodecs") # required for predictor=3 - - from xrspatial.geotiff import read_geotiff_gpu - - arr = np.linspace(-1.0, 1.0, 16).astype(np.float16).reshape(4, 4) - path = tmp_path / "pred3_f16.tif" - tifffile.imwrite( - str(path), arr, predictor=3, compression="deflate") - - result = read_geotiff_gpu(str(path)) - assert result.dtype == np.float32 - np.testing.assert_array_equal( - result.data.get(), arr.astype(np.float32)) diff --git a/xrspatial/geotiff/tests/test_gpu_tile_byte_cap_2026_05_18.py b/xrspatial/geotiff/tests/test_gpu_tile_byte_cap_2026_05_18.py deleted file mode 100644 index 127aef86a..000000000 --- a/xrspatial/geotiff/tests/test_gpu_tile_byte_cap_2026_05_18.py +++ /dev/null @@ -1,156 +0,0 @@ -"""GPU read path per-tile byte cap (security sweep follow-up). - -The CPU readers ``_read_tiles`` (xrspatial/geotiff/_reader.py:2084) and -``_fetch_decode_cog_http_tiles`` (xrspatial/geotiff/_reader.py:2563) -reject a tile whose declared ``TileByteCount`` exceeds the env-driven -``_max_tile_bytes_from_env()`` cap (default 256 MiB). The eager GPU -read path in ``xrspatial.geotiff._backends.gpu.read_geotiff_gpu`` did -not run the same check; ``validate_tile_layout`` bounds the offsets -array length but not the byte-count entries. A crafted local TIFF with -a multi-hundred-MB ``TileByteCount`` could then pass through to GPU -decode, where ``_check_gpu_memory`` only catches the aggregate at -~90% of free VRAM and not the per-tile asymmetry between the CPU and -GPU paths. - -The GPU eager path now applies the same per-tile cap so the CPU and -GPU contracts agree. These tests cover the rejection, the wording of -the rejection message, the env-override escape hatch, and the legit- -read pass-through under the default cap. - -Mirrors the structure of ``test_local_tile_byte_cap_1664.py`` for the -CPU paths so a side-by-side comparison is easy. -""" -from __future__ import annotations - -import importlib.util - -import numpy as np -import pytest -import xarray as xr - -from xrspatial.geotiff import read_geotiff_gpu, to_geotiff - -from ._helpers.tiff_surgery import patch_byte_counts as _patch_byte_counts - - -def _cupy_available() -> bool: - if importlib.util.find_spec("cupy") is None: - return False - try: - import cupy - - return bool(cupy.cuda.is_available()) - except Exception: - return False - - -_HAS_GPU = _cupy_available() -_gpu_only = pytest.mark.skipif( - not _HAS_GPU, reason="cupy + CUDA required for the GPU read path", -) - - -def _build_forged_tiled_cog(tmp_path, byte_count_value: int) -> str: - """Write a real tiled COG, patch every TileByteCounts entry, return path.""" - arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64) - da = xr.DataArray(arr, dims=["y", "x"]) - path = str(tmp_path / "forged_gpu_tiles_2026_05_18.tif") - to_geotiff(da, path, tile_size=32, compression="deflate") - with open(path, "rb") as f: - data = bytearray(f.read()) - _patch_byte_counts(data, 325, byte_count_value) - with open(path, "wb") as f: - f.write(data) - return path - - -# --------------------------------------------------------------------------- -# GPU eager path: per-tile byte cap -# --------------------------------------------------------------------------- - - -class TestGpuTileByteCap: - @_gpu_only - def test_huge_tile_byte_count_rejected(self, tmp_path, monkeypatch): - """A local tile with a huge TileByteCount raises before GPU decode.""" - path = _build_forged_tiled_cog(tmp_path, 100 * 1024 * 1024) - monkeypatch.setenv("XRSPATIAL_COG_MAX_TILE_BYTES", str(1024 * 1024)) - - with pytest.raises(ValueError, match="TileByteCount"): - read_geotiff_gpu(path) - - @_gpu_only - def test_error_message_names_value_and_cap(self, tmp_path, monkeypatch): - path = _build_forged_tiled_cog(tmp_path, 50 * 1024 * 1024) - monkeypatch.setenv("XRSPATIAL_COG_MAX_TILE_BYTES", str(1024)) - - with pytest.raises(ValueError) as excinfo: - read_geotiff_gpu(path) - msg = str(excinfo.value) - # The forged value (52,428,800) and the cap (1,024) both appear. - assert "52,428,800" in msg or "52428800" in msg - assert "1,024" in msg or "1024" in msg - assert "denial-of-service" in msg.lower() or "malformed" in msg - - @_gpu_only - def test_normal_gpu_read_under_default_cap(self, tmp_path): - """Legitimate GPU reads with the default cap still succeed.""" - arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64) - da = xr.DataArray(arr, dims=["y", "x"]) - path = str(tmp_path / "normal_gpu_2026_05_18.tif") - to_geotiff(da, path, tile_size=32, compression="deflate") - - result = read_geotiff_gpu(path) - # CuPy -> numpy for comparison. - np.testing.assert_array_equal(result.data.get(), arr) - - @_gpu_only - def test_env_override_lifts_cap(self, tmp_path, monkeypatch): - """A user with legitimate large tiles can lift the cap via env. - - The truncated forged payload makes the downstream codec raise; - the assertion below asserts only that whatever error fires is - *not* the cap rejection. Catch the broad ``Exception`` so the - test stays focused on the cap-loop contract rather than - chasing every decoder failure mode, but still inspect the - message string to make sure a regression that re-fires the cap - through a different error path would be visible. - """ - path = _build_forged_tiled_cog(tmp_path, 50 * 1024 * 1024) - monkeypatch.setenv( - "XRSPATIAL_COG_MAX_TILE_BYTES", str(64 * 1024 * 1024)) - - try: - read_geotiff_gpu(path) - except Exception as exc: - assert "exceeds the per-tile safety cap" not in str(exc), ( - "cap loop fired despite the env override lifting the cap" - ) - - -# --------------------------------------------------------------------------- -# Dask + GPU chunked path: same per-tile cap (added in the review pass) -# --------------------------------------------------------------------------- - - -class TestGpuChunkedTileByteCap: - @_gpu_only - def test_chunked_huge_tile_byte_count_rejected( - self, tmp_path, monkeypatch): - """Sibling check on the dask + GPU chunked path. - - ``_read_geotiff_gpu_chunked_gds`` parses the IFDs and then fans - out per-chunk GDS reads. Without the cap, the chunked path - would build a graph that still pulls the forged tile per task; - the metadata-time check rejects the file before any graph is - built. - """ - path = _build_forged_tiled_cog(tmp_path, 100 * 1024 * 1024) - monkeypatch.setenv( - "XRSPATIAL_COG_MAX_TILE_BYTES", str(1024 * 1024)) - - with pytest.raises(ValueError, match="TileByteCount"): - # ``chunks`` enables the dask + GPU pipeline; the read path - # internally routes through ``_read_geotiff_gpu_chunked_gds`` - # when the file qualifies for the GDS chunked fast path. - read_geotiff_gpu(path, chunks=32)