Skip to content

Reject compound EPSG codes on the stable-EPSG writer path (#2418)#2420

Merged
brendancol merged 2 commits into
mainfrom
issue-2418
May 26, 2026
Merged

Reject compound EPSG codes on the stable-EPSG writer path (#2418)#2420
brendancol merged 2 commits into
mainfrom
issue-2418

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2418.

Summary

  • The integer-EPSG writer path emits only GeographicTypeGeoKey (2048) and ProjectedCSTypeGeoKey (3072). Both are spec'd to hold a 2D horizontal CRS code. Pass a compound CRS like EPSG:6349 and pyproj reports is_geographic=True (because the horizontal sub-CRS is geographic), so the writer stores the compound code in the horizontal slot. rasterio / GDAL reads the file back as the wrong CRS or as no CRS at all.
  • Add NonRepresentableEPSGCRSError and reject compound EPSG codes from _validate_crs_arg. Defense in depth in _model_type_from_epsg catches a direct call into build_geo_tags.
  • _wkt_to_epsg now returns None for compound CRSs so a caller who passes the full compound WKT routes through the user-defined CRS / citation path (geotiff: WKT-only CRS writes opaque citation, hurts non-libgeotiff interop #1768) instead of being downgraded into the broken integer path.

Backend coverage

Pure metadata fix in _crs.py / _geotags.py / _errors.py. Numpy, cupy, dask+numpy, and dask+cupy writers all flow through the same _validate_crs_arg gate, so the fix applies to every backend.

Test plan

  • pytest xrspatial/geotiff/tests/test_compound_crs_reject_2418.py (22 new tests, including rasterio round-trip assertions for EPSG:6349, 5498, 7415, 8360 and regression coverage for plain horizontal CRSs).
  • pytest xrspatial/geotiff/tests/ (4863 passed, 25 skipped, 3 xfailed; no regressions).
  • Manual repro from the issue body now raises NonRepresentableEPSGCRSError at the to_geotiff(..., crs=6349) call instead of silently producing a corrupt GeoTIFF.

The integer-EPSG writer path emits only GeographicTypeGeoKey (2048)
and ProjectedCSTypeGeoKey (3072). Both GeoKeys are spec'd to hold a
2D horizontal CRS code. When the caller passes a compound CRS
(horizontal + vertical, e.g. EPSG:6349 = "NAD83(2011) + NAVD88
height"), pyproj reports the compound CRS as is_geographic or
is_projected based on its horizontal sub-CRS, so _model_type_from_epsg
returns a 2D model type and the writer stores the compound code in
the horizontal-CRS slot. rasterio / GDAL then reads the file back as
the horizontal sub-CRS (vertical component dropped) or as no CRS at
all. The xrspatial reader masks the corruption by re-resolving the
stored integer through pyproj.

Adds a NonRepresentableEPSGCRSError raised from _validate_crs_arg
when the EPSG resolves to a compound CRS. Defense in depth in
_model_type_from_epsg catches the case if validation is bypassed.
_wkt_to_epsg returns None for compound CRSs so a caller who passes
the full compound WKT routes through the user-defined CRS / citation
path (#1768) instead of falling back into the integer path.

Tests pin the rejection at the kwarg entry point and via the
DataArray.attrs['crs'] path, confirm plain horizontal CRSs still
round-trip through rasterio with the input code intact, and confirm
the compound WKT path still writes. Tests use rasterio directly for
round-trip assertions so the xrspatial reader cannot mask the
corruption.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 26, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Reject compound EPSG codes on the stable-EPSG writer path (#2418)

Blockers (must fix before merge)

None. The fix is correct, and the reproduction from the issue body now raises at the call site instead of corrupting the file on disk.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_geotags.py:1463: lazy from ._errors import NonRepresentableEPSGCRSError inside _model_type_from_epsg. _errors is already imported at the top of the module (line 7 imports UnknownCRSModelTypeError from the same module), so there is no circular-import reason for the deferred import. Move it to the top alongside the existing _errors import.

  • xrspatial/geotiff/tests/test_compound_crs_reject_2418.py:154: test_compound_epsg_does_not_round_trip_via_integer_path is a tautology. The docstring says it demonstrates corruption by bypassing the validator, but the body only asserts pyproj structural properties (crs.is_compound is True, crs.is_geographic is True). It does not exercise the write path. Either remove it, or call into build_geo_tags with the validator monkey-patched and assert the rasterio readback is broken. The latter would be a useful regression in case a future refactor drops either guard.

Nits (optional improvements)

  • xrspatial/geotiff/_crs.py:48: the helper name _reject_non_representable_epsg is a mouthful. The current name is fine for clarity at the call site; no change needed unless someone has a shorter idea.

  • xrspatial/geotiff/__init__.py:74: the multi-line from ._errors import (...) is getting wide. Not introduced by this PR; an isort pass would tidy it.

  • GPU writer at xrspatial/geotiff/_writers/gpu.py:472 calls _validate_crs_arg, so the fix flows through. No explicit GPU-path test for compound EPSG rejection. Low value to add given the validator is shared, but worth noting that GPU coverage is structural only.

What looks good

  • The fix is properly scoped. Only compound CRSs are rejected. Non-compound non-2D codes (EPSG:4979 3D-geographic, EPSG:4978 geocentric, EPSG:5773 vertical-only) are left alone because they round-trip through rasterio with the input code intact. test_geographic_3d_crs_still_works pins that scoping.
  • The validator gate plus the defense-in-depth check in _model_type_from_epsg means a future caller that bypasses _validate_crs_arg still cannot write a compound EPSG to disk.
  • _wkt_to_epsg returning None for compound CRSs is the right call. It routes the compound-WKT path to the user-defined CRS / citation fallback instead of silently downgrading to the broken integer path.
  • The negative-case tests use rasterio for round-trip assertions, which is what the issue body asked for. The xrspatial reader would mask the bug.
  • The error message names the failing EPSG, names the CRS, and offers both the WKT fallback and the horizontal sub-CRS workaround.

Checklist

  • Algorithm matches the spec (GeoTIFF GeoKey emission rules, Annex A)
  • All implemented backends produce consistent results (fix lives in shared validation)
  • NaN handling not applicable (metadata-only change)
  • Edge cases covered (compound geographic, compound projected, attrs path, kwarg path)
  • Dask chunk boundaries not relevant
  • No premature materialization or unnecessary copies
  • No benchmark needed (validation-only change)
  • README feature matrix not applicable
  • Docstrings present and accurate

Two follow-ups from the PR review:

1. Move ``from ._errors import NonRepresentableEPSGCRSError`` to the
   top of ``_geotags.py`` alongside the existing ``_errors`` import.
   The lazy import inside ``_model_type_from_epsg`` was unnecessary --
   ``_errors`` is already imported at the top of the module, so there
   is no circular-import concern.

2. Replace ``test_compound_epsg_does_not_round_trip_via_integer_path``
   (a tautology that only asserted pyproj structural properties) with
   ``test_compound_epsg_corruption_surfaces_when_validator_bypassed``.
   The new test monkey-patches both validator gates and runs the
   integer-EPSG write path for EPSG:6349, then asserts rasterio does
   not recover EPSG:6349 from the result. The test detects the exact
   corruption the fix prevents and catches a future refactor that
   drops either guard.

Nit findings (helper name length, pre-existing wide import line, GPU
path lacking an explicit test) are dismissed in the rockout summary:
the helper name is descriptive and a rename would be pure churn, the
import is pre-existing and out of scope, and the GPU writer flows
through the same shared ``_validate_crs_arg`` gate that already has
direct unit tests.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review (post-fixes)

Re-reviewed after commit ffcaee4.

Original findings -- disposition

  • Suggestion: hoist NonRepresentableEPSGCRSError import -- fixed. _geotags.py:7 now imports it alongside RotatedTransformError and UnknownCRSModelTypeError; the in-function import in _model_type_from_epsg is gone.
  • Suggestion: tautology test -- fixed. Replaced with test_compound_epsg_corruption_surfaces_when_validator_bypassed, which monkey-patches both gates and asserts rasterio_epsg != 6349 from the result. The new test pins the actual corruption the fix prevents.
  • Nit: helper name _reject_non_representable_epsg -- dismissed. Name is descriptive at the call site, no shorter alternative offered, rename would be pure churn.
  • Nit: wide __init__.py import line -- dismissed. Pre-existing, explicitly out of scope per the original finding.
  • Nit: no explicit GPU test -- dismissed. The reviewer themselves flagged it as low-value. _validate_crs_arg is the gate and has direct unit tests; the GPU writer calls it at _writers/gpu.py:472.

Current state

No new findings. The 23 tests in the new file pass. The full xrspatial/geotiff/tests/ suite (4863 tests) still passes locally.

Ready for human review.

@brendancol brendancol merged commit 7b43786 into main May 26, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Writer emits structurally wrong CRS metadata for compound / vertical / geocentric EPSG codes

1 participant