Reject contradictory GeoKey directories on read (#2417) #2422

Merged
brendancol merged 3 commits into main from issue-2417 on May 26, 2026
@brendancol
Contributor

Closes #2417

Summary

  • Add InconsistentGeoKeysError (subclass of GeoTIFFAmbiguousMetadataError) and a registered read-side check that rejects internally contradictory ModelType / ProjectedCSType / GeographicType GeoKey combinations.
  • Thread allow_inconsistent_geokeys through open_geotiff, read_geotiff_dask, read_geotiff_gpu, read_vrt, the two _finalize_* helpers, and _validate_read_geo_info. Default is fail-closed.
  • Three structural cases get rejected: ModelType=projected with only GeographicType populated, ModelType=geographic with ProjectedCSType populated, and both type keys resolved to different EPSG codes. User-defined codes (32767) are treated as "no resolved code" and do not trigger.
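
The three rejection cases above can be sketched as a small standalone check. This is a minimal illustration, not the code merged in this PR: the function name `check_geokeys`, its positional signature, the `ValueError` base class, and the reading of "populated" as "resolved to a non-user-defined code" are all assumptions. The ModelType values (1 = projected, 2 = geographic) and the user-defined code 32767 come from the GeoTIFF specification.

```python
# Hypothetical sketch; names follow the PR text, bodies are assumptions.
MODEL_TYPE_PROJECTED = 1    # GeoTIFF ModelType enum: projected CRS
MODEL_TYPE_GEOGRAPHIC = 2   # GeoTIFF ModelType enum: geographic CRS
USER_DEFINED = 32767        # placeholder code, treated as "no resolved code"


class GeoTIFFAmbiguousMetadataError(ValueError):
    """Stand-in for the library's base class (its real base is not shown here)."""


class InconsistentGeoKeysError(GeoTIFFAmbiguousMetadataError):
    """Raised when the type GeoKeys contradict one another."""


def _resolved(code):
    """Return the code if it names a concrete EPSG entry, else None."""
    if code is None or code == USER_DEFINED:
        return None
    return code


def check_geokeys(model_type, projected, geographic,
                  allow_inconsistent_geokeys=False):
    """Fail closed on the three structural contradictions."""
    if allow_inconsistent_geokeys:
        return  # opt-out: keep the legacy permissive behaviour

    proj = _resolved(projected)
    geog = _resolved(geographic)

    # Case 1: ModelType says projected, but only the geographic key resolves.
    if model_type == MODEL_TYPE_PROJECTED and proj is None and geog is not None:
        raise InconsistentGeoKeysError(
            "ModelType=projected with only GeographicType set. See issue #2417")

    # Case 2: ModelType says geographic, but a projected code is present.
    if model_type == MODEL_TYPE_GEOGRAPHIC and proj is not None:
        raise InconsistentGeoKeysError(
            "ModelType=geographic with ProjectedCSType set. See issue #2417")

    # Case 3: both keys resolve, to different EPSG codes.
    if proj is not None and geog is not None and proj != geog:
        raise InconsistentGeoKeysError(
            "ProjectedCSType and GeographicType disagree. See issue #2417")


# A user-defined geographic slot does not trigger any of the three cases.
check_geokeys(MODEL_TYPE_PROJECTED, 32633, USER_DEFINED)
```

Normalising user-defined codes to `None` before the comparisons keeps the placeholder handling in one place, so none of the three branches needs its own 32767 special case.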

Backend coverage

Read-side fix only. The check sits on the shared _validate_read_geo_info helper, so numpy / cupy / dask+numpy / dask+cupy all run it. VRT does not parse GeoKeys (it consumes the GDAL <SRS> field), so the check is a no-op on VRT reads; read_vrt still accepts the opt-out kwarg for signature symmetry.

Test plan

  • Unit tests for each rejection case via validate_read_metadata.
  • Full-stack tests with hand-built TIFFs that exercise the exact issue reproducer through open_geotiff().
  • Opt-out (allow_inconsistent_geokeys=True) restores legacy behaviour.
  • User-defined (32767) codes do not false-positive.
  • Existing geotiff suite (5740 tests) still passes after kwarg-order and doc-parity updates.

The reader took ProjectedCSTypeGeoKey first, fell back to
GeographicTypeGeoKey, and never cross-checked either against
ModelTypeGeoKey. A TIFF declaring ModelType=geographic with an EPSG
under ProjectedCSTypeGeoKey would publish trustworthy-looking
attrs['crs'] / attrs['crs_wkt'] fabricated from contradictory inputs.

Add InconsistentGeoKeysError (subclass of GeoTIFFAmbiguousMetadataError)
and a registered read-side check that refuses three structural
combinations: ModelType=projected with only GeographicType set,
ModelType=geographic with ProjectedCSType set, and both type keys
populated with different non-user-defined EPSG codes. Pass
allow_inconsistent_geokeys=True on the public read entry points to keep
the legacy permissive behaviour for known-quirky historical files.
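
To make the failure mode concrete, the legacy resolution order described in the first paragraph can be sketched as follows. The function name `legacy_resolve_epsg` and the plain-dict representation of the GeoKey directory are assumptions for illustration; the numeric GeoKey IDs are the ones defined by the GeoTIFF specification.

```python
# GeoKey IDs from the GeoTIFF specification.
GEOKEY_MODEL_TYPE = 1024
GEOKEY_GEOGRAPHIC_TYPE = 2048
GEOKEY_PROJECTED_CS_TYPE = 3072


def legacy_resolve_epsg(geokeys):
    """Hypothetical stand-in for the old behaviour.

    Takes the projected key first, falls back to the geographic key,
    and never looks at ModelType.
    """
    projected = geokeys.get(GEOKEY_PROJECTED_CS_TYPE)
    if projected is not None:
        return projected
    return geokeys.get(GEOKEY_GEOGRAPHIC_TYPE)


# ModelType=2 (geographic), yet an EPSG code sits under the projected key.
contradictory = {GEOKEY_MODEL_TYPE: 2, GEOKEY_PROJECTED_CS_TYPE: 32633}

# The legacy order returns the projected code without complaint.
epsg = legacy_resolve_epsg(contradictory)  # 32633
```

Because ModelType is never consulted, nothing in this path can notice that the file declares itself geographic while carrying a projected code.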
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 26, 2026
Contributor Author

@brendancol brendancol left a comment

PR Review: Reject contradictory GeoKey directories on read (#2417)

Blockers

None.

Suggestions

  1. Magic numeric GeoKey IDs in _attrs.py:1114-1116. The helper uses raw_geokeys.get(1024), .get(3072), .get(2048) instead of GEOKEY_MODEL_TYPE, GEOKEY_PROJECTED_CS_TYPE, GEOKEY_GEOGRAPHIC_TYPE from _geotags. _attrs.py already imports from _geotags at line 167, so the named constants are one import away and keep the IDs in one place.

  2. Validator crashes on NaN / inf model_type in _validation.py:1289-1294. _as_int(float('nan')) raises ValueError; _as_int(float('inf')) raises OverflowError. The reader never produces these for type-code GeoKeys, but validate_read_metadata takes an arbitrary dict and any caller can pass garbage. A try/except (ValueError, OverflowError) around the int(v) cast (returning None) keeps the validator from blowing up on bad input.

  3. VRT docstring at _backends/vrt.py:263-269 says "Forwarded to the per-source reader," but _read_vrt_internal is not called with allow_inconsistent_geokeys. Same pattern as allow_unparseable_crs today, so it's not a regression, but the docstring should reflect what's actually happening. Either trim the "Forwarded" line or thread the kwarg into _read_vrt_internal.

Nits

  1. _validation.py:1228-1232 declares _MODEL_TYPE_UNDEFINED and _MODEL_TYPE_GEOCENTRIC but never uses them. Reference them in the docstring or drop them.

  2. The hand-built TIFF helper in test_inconsistent_geokeys_2417.py:215-330 duplicates ~80 lines from test_remaining_fail_closed_1987.py::_write_minimal_tiff_with_wkt. A shared tests/_geotiff_fixtures.py would clean this up; fine to leave as a follow-up.

  3. Every raised InconsistentGeoKeysError ends with "See issue #2417," consistent with the family.

What looks good

  • Error class slots into the GeoTIFFAmbiguousMetadataError hierarchy and is exported in __init__.py __all__ plus the kwarg-order and doc-parity canonical tests.
  • Opt-out kwarg plumbed through every public read entry point with default False.
  • Three rejection cases cover the structural matrix. User-defined (32767) treated as "no resolved code," which avoids false positives on legitimate placeholder slots.
  • 21 unit + integration tests pass, including opt-out, float coercion, and non-numeric tolerance.
  • Doc row added to geotiff_safe_io.rst in the same format as the sibling rows.

Checklist

  • Algorithm matches the GeoTIFF ModelType enum
  • Read-side helper shared across numpy / cupy / dask, so backend parity is structural
  • NaN handling correct on control inputs
  • NaN/inf model_type fuzz case missing from tests (see Suggestion 2)
  • Validation runs at graph build, not per-chunk
  • No premature materialization
  • No benchmark needed (validation-only)
  • No README matrix change needed (no new function)
  • Docstrings present, modulo the VRT wording fix in Suggestion 3

- Use the named GEOKEY_MODEL_TYPE / GEOKEY_PROJECTED_CS_TYPE /
  GEOKEY_GEOGRAPHIC_TYPE constants from _geotags in _attrs instead of
  raw integer literals, so the IDs live in one place.
- Guard the validator's int coercion against NaN / inf floats. Pure
  defensive belt for callers that hand-build a context dict;
  validate_read_metadata is public-ish and should not crash on
  garbage input. Adds a parametrised fuzz test covering nan / inf /
  -inf across all three GeoKey slots.
- Reference the unused _MODEL_TYPE_UNDEFINED / _MODEL_TYPE_GEOCENTRIC
  enum constants in the check's docstring so they document the full
  spec rather than dangling.
- VRT docstring no longer claims the kwarg is forwarded to the
  per-source reader. _read_vrt_internal does not currently thread
  per-GeoTIFF-source allow_* kwargs (same pattern as
  allow_unparseable_crs), so the kwarg is documented as a no-op on
  the VRT path until that VRT-internal change happens separately.
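
As a rough idea of what the fuzz coverage in the second item amounts to, the sketch below sweeps nan / inf / -inf across three slots using a plain loop in place of pytest parametrisation. The context-dict keys and the `coerce_context` helper are hypothetical stand-ins, not the real test or validator.

```python
import itertools

# Hypothetical context keys; the real dict layout is not shown in this PR text.
SLOTS = ("model_type", "projected_cs_type", "geographic_type")
BAD_VALUES = (float("nan"), float("inf"), float("-inf"))


def _as_int(v):
    """Guarded int coercion: NaN / inf become None instead of raising."""
    try:
        return int(v)
    except (ValueError, OverflowError):
        return None


def coerce_context(context):
    """Stand-in for the validator's coercion step over all three slots."""
    return {slot: _as_int(context[slot]) for slot in SLOTS}


cases_run = 0
for slot, bad in itertools.product(SLOTS, BAD_VALUES):
    # Start from a well-formed context, then corrupt exactly one slot.
    context = {"model_type": 1, "projected_cs_type": 32633,
               "geographic_type": 32767}
    context[slot] = bad
    coerced = coerce_context(context)
    assert coerced[slot] is None  # garbage is neutralised, not raised
    cases_run += 1
```

Corrupting one slot at a time keeps each failure attributable to a single GeoKey, which is the same property a parametrised test gives.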

Follow-up issue #2423 tracks extracting the duplicated hand-built
TIFF writer in tests/test_inconsistent_geokeys_2417.py and
tests/test_remaining_fail_closed_1987.py into a shared
tests/_geotiff_fixtures.py helper.
Contributor Author

@brendancol brendancol left a comment

Follow-up Review: After review-pass-1 commits

Disposition of round-1 findings

  • Suggestion 1 (magic numeric GeoKey IDs in _attrs.py): fixed. _attrs.py now imports GEOKEY_MODEL_TYPE, GEOKEY_PROJECTED_CS_TYPE, GEOKEY_GEOGRAPHIC_TYPE from _geotags and uses the named constants.
  • Suggestion 2 (NaN / inf crash in _validation._as_int): fixed. int(v) is now wrapped in try / except (ValueError, OverflowError) returning None. A parametrised fuzz test covers nan / inf / -inf across all three GeoKey slots.
  • Suggestion 3 (VRT docstring overstates forwarding): fixed by trimming the docstring to say the kwarg is "currently a no-op on the VRT path." Threading the kwarg into _read_vrt_internal would mirror what is missing for allow_unparseable_crs today and is out of scope here.
  • Nit 1 (unused _MODEL_TYPE_UNDEFINED / _MODEL_TYPE_GEOCENTRIC): fixed by referencing both in the docstring's ModelType enum list so the constants document the full spec.
  • Nit 2 (duplicated hand-built TIFF helper): deferred. The dedupe sweep touches test_remaining_fail_closed_1987.py (not otherwise modified by this PR) and needs a parameterised helper signature. Filed as follow-up issue #2423.
  • Nit 3 (consistent "See issue #2417" tail on error messages): dismissed. The pattern matches the rest of the _check_read_* family and no action was needed.

Re-audit of round-2 changes

Blockers

None.

Suggestions

None.

Nits

None.

What looks good

  • The named-constants migration removes the magic integers without changing behaviour (the integer values were correct).
  • The NaN / inf guard is local to _as_int, has no effect on the happy path, and is covered by a parametrised test that exercises every slot.
  • The VRT docstring update is honest about the per-source forwarding gap and points at the parallel allow_unparseable_crs situation, so a future PR can fix both consistently.
  • The _MODEL_TYPE_UNDEFINED / _MODEL_TYPE_GEOCENTRIC reference makes the docstring a complete spec citation; the constants are no longer dead code.
  • All 67 touched tests still pass after the follow-ups.

Checklist

  • Round-1 Blockers: n/a (none)
  • Round-1 Suggestions: all three fixed in-PR
  • Round-1 Nits: 1 fixed, 1 deferred with issue link, 1 dismissed with reason
  • No new findings on re-audit

# Conflicts:
#	xrspatial/geotiff/__init__.py
@brendancol brendancol merged commit 30d9b83 into main May 26, 2026
6 of 7 checks passed

Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

GeoTIFF reader silently accepts contradictory ModelType/Projected/Geographic GeoKeys

1 participant