Reject contradictory GeoKey directories on read (#2417)#2422
Conversation
The reader took ProjectedCSTypeGeoKey first, fell back to GeographicTypeGeoKey, and never cross-checked either against ModelTypeGeoKey. A TIFF declaring ModelType=geographic with an EPSG under ProjectedCSTypeGeoKey would publish trustworthy-looking attrs['crs'] / attrs['crs_wkt'] fabricated from contradictory inputs. Add InconsistentGeoKeysError (subclass of GeoTIFFAmbiguousMetadataError) and a registered read-side check that refuses three structural combinations: ModelType=projected with only GeographicType set, ModelType=geographic with ProjectedCSType set, and both type keys populated with different non-user-defined EPSG codes. Pass allow_inconsistent_geokeys=True on the public read entry points to keep the legacy permissive behaviour for known-quirky historical files.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Reject contradictory GeoKey directories on read (#2417)
Blockers
None.
Suggestions
-
Magic numeric GeoKey IDs in
_attrs.py:1114-1116. The helper usesraw_geokeys.get(1024),.get(3072),.get(2048)instead ofGEOKEY_MODEL_TYPE,GEOKEY_PROJECTED_CS_TYPE,GEOKEY_GEOGRAPHIC_TYPEfrom_geotags._attrs.pyalready imports from_geotagsat line 167, so the named constants are one import away and keep the IDs in one place. -
Validator crashes on NaN / inf
model_typein_validation.py:1289-1294._as_int(float('nan'))raisesValueError;_as_int(float('inf'))raisesOverflowError. The reader never produces these for type-code GeoKeys, butvalidate_read_metadatatakes an arbitrary dict and any caller can pass garbage. Atry/except (ValueError, OverflowError)around theint(v)cast (returningNone) keeps the validator from blowing up on bad input. -
VRT docstring at
_backends/vrt.py:263-269says "Forwarded to the per-source reader," but_read_vrt_internalis not called withallow_inconsistent_geokeys. Same pattern asallow_unparseable_crstoday, so it's not a regression, but the docstring should reflect what's actually happening. Either trim the "Forwarded" line or thread the kwarg into_read_vrt_internal.
Nits
-
_validation.py:1228-1232declares_MODEL_TYPE_UNDEFINEDand_MODEL_TYPE_GEOCENTRICbut never uses them. Reference them in the docstring or drop them. -
The hand-built TIFF helper in
test_inconsistent_geokeys_2417.py:215-330duplicates ~80 lines fromtest_remaining_fail_closed_1987.py::_write_minimal_tiff_with_wkt. A sharedtests/_geotiff_fixtures.pywould clean this up; fine to leave as a follow-up. -
Every raised
InconsistentGeoKeysErrorends with "See issue #2417," consistent with the family.
What looks good
- Error class slots into the
GeoTIFFAmbiguousMetadataErrorhierarchy and is exported in__init__.py__all__plus the kwarg-order and doc-parity canonical tests. - Opt-out kwarg plumbed through every public read entry point with default
False. - Three rejection cases cover the structural matrix. User-defined (
32767) treated as "no resolved code," which avoids false positives on legitimate placeholder slots. - 21 unit + integration tests pass, including opt-out, float coercion, and non-numeric tolerance.
- Doc row added to
geotiff_safe_io.rstin the same format as the sibling rows.
Checklist
- Algorithm matches the GeoTIFF ModelType enum
- Read-side helper shared across numpy / cupy / dask, so backend parity is structural
- NaN handling correct on control inputs
- NaN/inf
model_typefuzz case missing from tests (see Suggestion 2) - Validation runs at graph build, not per-chunk
- No premature materialization
- No benchmark needed (validation-only)
- No README matrix change needed (no new function)
- Docstrings present, modulo the VRT wording fix in Suggestion 3
- Use the named GEOKEY_MODEL_TYPE / GEOKEY_PROJECTED_CS_TYPE / GEOKEY_GEOGRAPHIC_TYPE constants from _geotags in _attrs instead of raw integer literals, so the IDs live in one place. - Guard the validator's int coercion against NaN / inf floats. Pure defensive belt for callers that hand-build a context dict; validate_read_metadata is public-ish and should not crash on garbage input. Adds a parametrised fuzz test covering nan / inf / -inf across all three GeoKey slots. - Reference the unused _MODEL_TYPE_UNDEFINED / _MODEL_TYPE_GEOCENTRIC enum constants in the check's docstring so they document the full spec rather than dangling. - VRT docstring no longer claims the kwarg is forwarded to the per-source reader. _read_vrt_internal does not currently thread per-GeoTIFF-source allow_* kwargs (same pattern as allow_unparseable_crs), so the kwarg is documented as a no-op on the VRT path until that VRT-internal change happens separately. Follow-up issue #2423 tracks extracting the duplicated hand-built TIFF writer in tests/test_inconsistent_geokeys_2417.py and tests/test_remaining_fail_closed_1987.py into a shared tests/_geotiff_fixtures.py helper.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up Review: After review-pass-1 commits
Disposition of round-1 findings
- Suggestion 1 (magic numeric GeoKey IDs in
_attrs.py): fixed._attrs.pynow importsGEOKEY_MODEL_TYPE,GEOKEY_PROJECTED_CS_TYPE,GEOKEY_GEOGRAPHIC_TYPEfrom_geotagsand uses the named constants. - Suggestion 2 (NaN / inf crash in
_validation._as_int): fixed.int(v)is now wrapped intry / except (ValueError, OverflowError)returningNone. A parametrised fuzz test covers nan / inf / -inf across all three GeoKey slots. - Suggestion 3 (VRT docstring overstates forwarding): fixed by trimming the docstring to say the kwarg is "currently a no-op on the VRT path." Threading the kwarg into
_read_vrt_internalwould mirror what is missing forallow_unparseable_crstoday and is out of scope here. - Nit 1 (unused
_MODEL_TYPE_UNDEFINED/_MODEL_TYPE_GEOCENTRIC): fixed by referencing both in the docstring's ModelType enum list so the constants document the full spec. - Nit 2 (duplicated hand-built TIFF helper): deferred. The dedupe sweep touches
test_remaining_fail_closed_1987.py(not otherwise modified by this PR) and needs a parameterised helper signature. Filed as follow-up issue #2423. - Nit 3 (consistent "See issue #2417" tail on error messages): dismissed. The pattern matches the rest of the
_check_read_*family and no action was needed.
Re-audit of round-2 changes
Blockers
None.
Suggestions
None.
Nits
None.
What looks good
- The named-constants migration removes the magic integers without changing behaviour (the integer values were correct).
- The NaN / inf guard is local to
_as_int, has no effect on the happy path, and is covered by a parametrised test that exercises every slot. - The VRT docstring update is honest about the per-source forwarding gap and points at the parallel
allow_unparseable_crssituation, so a future PR can fix both consistently. - The
_MODEL_TYPE_UNDEFINED/_MODEL_TYPE_GEOCENTRICreference makes the docstring a complete spec citation; the constants are no longer dead code. - All 67 touched tests still pass after the follow-ups.
Checklist
- Round-1 Blockers: n/a (none)
- Round-1 Suggestions: all three fixed in-PR
- Round-1 Nits: 1 fixed, 1 deferred with issue link, 1 dismissed with reason
- No new findings on re-audit
# Conflicts: # xrspatial/geotiff/__init__.py
Closes #2417
Summary
InconsistentGeoKeysError(subclass ofGeoTIFFAmbiguousMetadataError) and a registered read-side check that rejects internally contradictoryModelType/ProjectedCSType/GeographicTypeGeoKey combinations.allow_inconsistent_geokeysthroughopen_geotiff,read_geotiff_dask,read_geotiff_gpu,read_vrt, the two_finalize_*helpers, and_validate_read_geo_info. Default is fail-closed.ModelType=projectedwith onlyGeographicTypepopulated,ModelType=geographicwithProjectedCSTypepopulated, and both type keys resolved to different EPSG codes. User-defined codes (32767) are treated as "no resolved code" and do not trigger.Backend coverage
Read-side fix only. The check sits on the shared
_validate_read_geo_infohelper, so numpy / cupy / dask+numpy / dask+cupy all run it. VRT does not parse GeoKeys (it consumes the GDAL<SRS>field), so VRT reads no-op on the GeoKey context but still accept the opt-out kwarg for signature symmetry.Test plan
validate_read_metadata.open_geotiff().allow_inconsistent_geokeys=True) restores legacy behaviour.32767) codes do not false-positive.