Skip to content

geotiff: reject predictor=3 + integer SampleFormat on read (#1933)#1940

Merged
brendancol merged 2 commits into
mainfrom
rockout-accuracy-low-geotiff-2026-05-15-01
May 15, 2026
Merged

geotiff: reject predictor=3 + integer SampleFormat on read (#1933)#1940
brendancol merged 2 commits into
mainfrom
rockout-accuracy-low-geotiff-2026-05-15-01

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1933.

Summary

  • _apply_predictor in _reader.py routed to fp_predictor_decode on Predictor=3 with no check that the data is float. A malformed file with integer SampleFormat decoded silently to garbage.
  • Writer side already rejects this combination in _writer._resolve_predictor. The fix gives the reader symmetric behaviour.
  • New helper _validate_predictor_sample_format in _validation.py is invoked at every IFD-read site: eager numpy, dask local + HTTP, GPU tiled, GPU stripped, and the shared metadata parser used by read_geotiff_dask graph build.

Test plan

  • Unit-level helper accepts the legitimate float case and rejects the three non-float SampleFormat values.
  • End-to-end open_geotiff raises on a synthetic predictor=3 + uint32 file (built via _assemble_standard_layout).
  • End-to-end read_geotiff_dask raises at graph-build time on the same synthetic file.
  • Legitimate predictor=3 + float32 files still round-trip.
  • pytest xrspatial/geotiff/tests/ non-GPU/HTTP: 2382 passed, 7 skipped.

…1933)

The reader accepted a malformed IFD that claimed Predictor=3
(Floating-Point Predictor, TIFF Technical Note 3) with an integer
SampleFormat without complaint. The byte-swizzle unshuffle ran on
integer bytes and produced garbage pixel values that look like valid
integers, with no warning.

The writer already rejected this combination via
_writer._resolve_predictor. The fix adds _validate_predictor_sample_format
in _validation.py and calls it at every IFD-read site (eager numpy,
dask, GPU tiled, GPU stripped, HTTP/cloud metadata parse) so reader and
writer share one contract.

Regression test in test_predictor3_int_dtype_1933.py covers the helper,
end-to-end eager and dask reads of a synthetic file, and a control that
legitimate predictor=3 float files still round-trip.
Copilot AI review requested due to automatic review settings May 15, 2026 14:38
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the GeoTIFF reader against malformed TIFFs that declare Predictor=3 (floating-point predictor) while using a non-float SampleFormat, mirroring the writer’s existing rejection and preventing silent misdecodes.

Changes:

  • Add _validate_predictor_sample_format in xrspatial/geotiff/_validation.py to reject Predictor=3 when SampleFormat != 3.
  • Invoke the new validator across eager, HTTP/COG, dask graph-build, and GPU read paths.
  • Add regression tests that assert eager and dask reads raise on a synthetic malformed TIFF, while valid float predictor=3 files still round-trip.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
xrspatial/geotiff/tests/test_predictor3_int_dtype_1933.py Adds regression + round-trip tests for malformed predictor=3+integer SampleFormat and valid float predictor=3 cases.
xrspatial/geotiff/_validation.py Introduces _validate_predictor_sample_format and the corresponding error message.
xrspatial/geotiff/_reader.py Calls the validator in strip/tile and HTTP decode paths.
xrspatial/geotiff/_backends/gpu.py Calls the validator in GPU tiled decode paths.
xrspatial/geotiff/_backends/dask.py Calls the validator during HTTP/fsspec metadata parsing to fail at graph-build time.
xrspatial/geotiff/__init__.py Calls the validator in _read_geo_info for both local and fsspec metadata reads.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +211 to +216
# Only the float-predictor case is asymmetric; predictor=1 (none) and
# predictor=2 (horizontal) are sample-format-agnostic by design.
if predictor == 3 and sample_format != 3:
raise ValueError(
f"Predictor=3 (floating-point) requires SampleFormat=3 "
f"(IEEE float), got SampleFormat={sample_format}. The TIFF "
Comment on lines +56 to +61
(TAG_IMAGE_LENGTH, LONG, 1, arr.shape[0]),
(TAG_BITS_PER_SAMPLE, SHORT, 1, bits_per_sample),
(TAG_COMPRESSION, SHORT, 1, 1),
(TAG_PHOTOMETRIC, SHORT, 1, 1),
(TAG_SAMPLES_PER_PIXEL, SHORT, 1, 1),
(TAG_SAMPLE_FORMAT, SHORT, 1, 1), # UINT
- _validate_predictor_sample_format now normalizes a tuple-valued
  predictor (1-element or longer) to its first element, then falls
  back to predictor=1 on the empty-tuple case. IFD.predictor delegates
  to IFD.get_value, which can return a tuple for a malformed
  Predictor tag with count > 1; without this normalization the
  (3,) + integer-SampleFormat combo would slip past the guard.
- Add a regression test that asserts (3,) + SF=1 still raises and
  that an empty tuple / non-3 tuple is a no-op.
- Replace the hardcoded Compression tag value in the synthetic-TIFF
  builder with the already-imported COMPRESSION_NONE constant.
@brendancol brendancol merged commit 837ec08 into main May 15, 2026
1 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: predictor=3 + integer dtype silently misdecoded on read

2 participants