Reject compound EPSG codes on the stable-EPSG writer path (#2418)#2420
Conversation
The integer-EPSG writer path emits only GeographicTypeGeoKey (2048) and ProjectedCSTypeGeoKey (3072). Both GeoKeys are spec'd to hold a 2D horizontal CRS code. When the caller passes a compound CRS (horizontal + vertical, e.g. EPSG:6349 = "NAD83(2011) + NAVD88 height"), pyproj reports the compound CRS as is_geographic or is_projected based on its horizontal sub-CRS, so _model_type_from_epsg returns a 2D model type and the writer stores the compound code in the horizontal-CRS slot. rasterio / GDAL then reads the file back as the horizontal sub-CRS (vertical component dropped) or as no CRS at all. The xrspatial reader masks the corruption by re-resolving the stored integer through pyproj. Adds a NonRepresentableEPSGCRSError raised from _validate_crs_arg when the EPSG resolves to a compound CRS. Defense in depth in _model_type_from_epsg catches the case if validation is bypassed. _wkt_to_epsg returns None for compound CRSs so a caller who passes the full compound WKT routes through the user-defined CRS / citation path (#1768) instead of falling back into the integer path. Tests pin the rejection at the kwarg entry point and via the DataArray.attrs['crs'] path, confirm plain horizontal CRSs still round-trip through rasterio with the input code intact, and confirm the compound WKT path still writes. Tests use rasterio directly for round-trip assertions so the xrspatial reader cannot mask the corruption.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Reject compound EPSG codes on the stable-EPSG writer path (#2418)
Blockers (must fix before merge)
None. The fix is correct, and the reproduction from the issue body now raises at the call site instead of corrupting the file on disk.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/_geotags.py:1463: lazyfrom ._errors import NonRepresentableEPSGCRSErrorinside_model_type_from_epsg._errorsis already imported at the top of the module (line 7 importsUnknownCRSModelTypeErrorfrom the same module), so there is no circular-import reason for the deferred import. Move it to the top alongside the existing_errorsimport. -
xrspatial/geotiff/tests/test_compound_crs_reject_2418.py:154:test_compound_epsg_does_not_round_trip_via_integer_pathis a tautology. The docstring says it demonstrates corruption by bypassing the validator, but the body only asserts pyproj structural properties (crs.is_compound is True,crs.is_geographic is True). It does not exercise the write path. Either remove it, or call intobuild_geo_tagswith the validator monkey-patched and assert the rasterio readback is broken. The latter would be a useful regression in case a future refactor drops either guard.
Nits (optional improvements)
-
xrspatial/geotiff/_crs.py:48: the helper name_reject_non_representable_epsgis a mouthful. The current name is fine for clarity at the call site; no change needed unless someone has a shorter idea. -
xrspatial/geotiff/__init__.py:74: the multi-linefrom ._errors import (...)is getting wide. Not introduced by this PR; an isort pass would tidy it. -
GPU writer at
xrspatial/geotiff/_writers/gpu.py:472calls_validate_crs_arg, so the fix flows through. No explicit GPU-path test for compound EPSG rejection. Low value to add given the validator is shared, but worth noting that GPU coverage is structural only.
What looks good
- The fix is properly scoped. Only compound CRSs are rejected. Non-compound non-2D codes (EPSG:4979 3D-geographic, EPSG:4978 geocentric, EPSG:5773 vertical-only) are left alone because they round-trip through rasterio with the input code intact.
test_geographic_3d_crs_still_workspins that scoping. - The validator gate plus the defense-in-depth check in
_model_type_from_epsgmeans a future caller that bypasses_validate_crs_argstill cannot write a compound EPSG to disk. _wkt_to_epsgreturning None for compound CRSs is the right call. It routes the compound-WKT path to the user-defined CRS / citation fallback instead of silently downgrading to the broken integer path.- The negative-case tests use rasterio for round-trip assertions, which is what the issue body asked for. The xrspatial reader would mask the bug.
- The error message names the failing EPSG, names the CRS, and offers both the WKT fallback and the horizontal sub-CRS workaround.
Checklist
- Algorithm matches the spec (GeoTIFF GeoKey emission rules, Annex A)
- All implemented backends produce consistent results (fix lives in shared validation)
- NaN handling not applicable (metadata-only change)
- Edge cases covered (compound geographic, compound projected, attrs path, kwarg path)
- Dask chunk boundaries not relevant
- No premature materialization or unnecessary copies
- No benchmark needed (validation-only change)
- README feature matrix not applicable
- Docstrings present and accurate
Two follow-ups from the PR review: 1. Move ``from ._errors import NonRepresentableEPSGCRSError`` to the top of ``_geotags.py`` alongside the existing ``_errors`` import. The lazy import inside ``_model_type_from_epsg`` was unnecessary -- ``_errors`` is already imported at the top of the module, so there is no circular-import concern. 2. Replace ``test_compound_epsg_does_not_round_trip_via_integer_path`` (a tautology that only asserted pyproj structural properties) with ``test_compound_epsg_corruption_surfaces_when_validator_bypassed``. The new test monkey-patches both validator gates and runs the integer-EPSG write path for EPSG:6349, then asserts rasterio does not recover EPSG:6349 from the result. The test detects the exact corruption the fix prevents and catches a future refactor that drops either guard. Nit findings (helper name length, pre-existing wide import line, GPU path lacking an explicit test) are dismissed in the rockout summary: the helper name is descriptive and a rename would be pure churn, the import is pre-existing and out of scope, and the GPU writer flows through the same shared ``_validate_crs_arg`` gate that already has direct unit tests.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up Review (post-fixes)
Re-reviewed after commit ffcaee4.
Original findings -- disposition
- Suggestion: hoist
NonRepresentableEPSGCRSErrorimport -- fixed._geotags.py:7now imports it alongsideRotatedTransformErrorandUnknownCRSModelTypeError; the in-function import in_model_type_from_epsgis gone. - Suggestion: tautology test -- fixed. Replaced with
test_compound_epsg_corruption_surfaces_when_validator_bypassed, which monkey-patches both gates and assertsrasterio_epsg != 6349from the result. The new test pins the actual corruption the fix prevents. - Nit: helper name
_reject_non_representable_epsg-- dismissed. Name is descriptive at the call site, no shorter alternative offered, rename would be pure churn. - Nit: wide
__init__.pyimport line -- dismissed. Pre-existing, explicitly out of scope per the original finding. - Nit: no explicit GPU test -- dismissed. The reviewer themselves flagged it as low-value.
_validate_crs_argis the gate and has direct unit tests; the GPU writer calls it at_writers/gpu.py:472.
Current state
No new findings. The 23 tests in the new file pass. The full xrspatial/geotiff/tests/ suite (4863 tests) still passes locally.
Ready for human review.
Closes #2418.
Summary
GeographicTypeGeoKey(2048) andProjectedCSTypeGeoKey(3072). Both are spec'd to hold a 2D horizontal CRS code. Pass a compound CRS like EPSG:6349 and pyproj reportsis_geographic=True(because the horizontal sub-CRS is geographic), so the writer stores the compound code in the horizontal slot. rasterio / GDAL reads the file back as the wrong CRS or as no CRS at all.NonRepresentableEPSGCRSErrorand reject compound EPSG codes from_validate_crs_arg. Defense in depth in_model_type_from_epsgcatches a direct call intobuild_geo_tags._wkt_to_epsgnow returns None for compound CRSs so a caller who passes the full compound WKT routes through the user-defined CRS / citation path (geotiff: WKT-only CRS writes opaque citation, hurts non-libgeotiff interop #1768) instead of being downgraded into the broken integer path.Backend coverage
Pure metadata fix in
_crs.py/_geotags.py/_errors.py. Numpy, cupy, dask+numpy, and dask+cupy writers all flow through the same_validate_crs_arggate, so the fix applies to every backend.Test plan
pytest xrspatial/geotiff/tests/test_compound_crs_reject_2418.py(22 new tests, including rasterio round-trip assertions for EPSG:6349, 5498, 7415, 8360 and regression coverage for plain horizontal CRSs).pytest xrspatial/geotiff/tests/(4863 passed, 25 skipped, 3 xfailed; no regressions).NonRepresentableEPSGCRSErrorat theto_geotiff(..., crs=6349)call instead of silently producing a corrupt GeoTIFF.