geotiff: reject predictor=3 + integer SampleFormat on read (#1933)#1940
Merged
Conversation
…1933) The reader accepted a malformed IFD that claimed Predictor=3 (Floating-Point Predictor, TIFF Technical Note 3) with an integer SampleFormat without complaint. The byte-swizzle unshuffle ran on integer bytes and produced garbage pixel values that look like valid integers, with no warning. The writer already rejected this combination via _writer._resolve_predictor. The fix adds _validate_predictor_sample_format in _validation.py and calls it at every IFD-read site (eager numpy, dask, GPU tiled, GPU stripped, HTTP/cloud metadata parse) so reader and writer share one contract. Regression test in test_predictor3_int_dtype_1933.py covers the helper, end-to-end eager and dask reads of a synthetic file, and a control that legitimate predictor=3 float files still round-trip.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the GeoTIFF reader against malformed TIFFs that declare Predictor=3 (floating-point predictor) while using a non-float SampleFormat, mirroring the writer’s existing rejection and preventing silent misdecodes.
Changes:
- Add
_validate_predictor_sample_formatinxrspatial/geotiff/_validation.pyto rejectPredictor=3whenSampleFormat != 3. - Invoke the new validator across eager, HTTP/COG, dask graph-build, and GPU read paths.
- Add regression tests that assert eager and dask reads raise on a synthetic malformed TIFF, while valid float predictor=3 files still round-trip.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
xrspatial/geotiff/tests/test_predictor3_int_dtype_1933.py |
Adds regression + round-trip tests for malformed predictor=3+integer SampleFormat and valid float predictor=3 cases. |
xrspatial/geotiff/_validation.py |
Introduces _validate_predictor_sample_format and the corresponding error message. |
xrspatial/geotiff/_reader.py |
Calls the validator in strip/tile and HTTP decode paths. |
xrspatial/geotiff/_backends/gpu.py |
Calls the validator in GPU tiled decode paths. |
xrspatial/geotiff/_backends/dask.py |
Calls the validator during HTTP/fsspec metadata parsing to fail at graph-build time. |
xrspatial/geotiff/__init__.py |
Calls the validator in _read_geo_info for both local and fsspec metadata reads. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+211
to
+216
| # Only the float-predictor case is asymmetric; predictor=1 (none) and | ||
| # predictor=2 (horizontal) are sample-format-agnostic by design. | ||
| if predictor == 3 and sample_format != 3: | ||
| raise ValueError( | ||
| f"Predictor=3 (floating-point) requires SampleFormat=3 " | ||
| f"(IEEE float), got SampleFormat={sample_format}. The TIFF " |
Comment on lines
+56
to
+61
| (TAG_IMAGE_LENGTH, LONG, 1, arr.shape[0]), | ||
| (TAG_BITS_PER_SAMPLE, SHORT, 1, bits_per_sample), | ||
| (TAG_COMPRESSION, SHORT, 1, 1), | ||
| (TAG_PHOTOMETRIC, SHORT, 1, 1), | ||
| (TAG_SAMPLES_PER_PIXEL, SHORT, 1, 1), | ||
| (TAG_SAMPLE_FORMAT, SHORT, 1, 1), # UINT |
- _validate_predictor_sample_format now normalizes a tuple-valued predictor (1-element or longer) to its first element, then falls back to predictor=1 on the empty-tuple case. IFD.predictor delegates to IFD.get_value, which can return a tuple for a malformed Predictor tag with count > 1; without this normalization the (3,) + integer-SampleFormat combo would slip past the guard. - Add a regression test that asserts (3,) + SF=1 still raises and that an empty tuple / non-3 tuple is a no-op. - Replace the hardcoded Compression tag value in the synthetic-TIFF builder with the already-imported COMPRESSION_NONE constant.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1933.
Summary
_apply_predictorin_reader.pyrouted tofp_predictor_decodeonPredictor=3with no check that the data is float. A malformed file with integerSampleFormatdecoded silently to garbage._writer._resolve_predictor. The fix gives the reader symmetric behaviour._validate_predictor_sample_formatin_validation.pyis invoked at every IFD-read site: eager numpy, dask local + HTTP, GPU tiled, GPU stripped, and the shared metadata parser used byread_geotiff_daskgraph build.Test plan
SampleFormatvalues.open_geotiffraises on a synthetic predictor=3 + uint32 file (built via_assemble_standard_layout).read_geotiff_daskraises at graph-build time on the same synthetic file.pytest xrspatial/geotiff/tests/non-GPU/HTTP: 2382 passed, 7 skipped.