Skip to content

Centralize VRT capability validation (#2329)#2339

Merged
brendancol merged 2 commits into
mainfrom
issue-2329
May 24, 2026
Merged

Centralize VRT capability validation (#2329)#2339
brendancol merged 2 commits into
mainfrom
issue-2329

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2329. Part of #2321.

Summary

  • New xrspatial/geotiff/_vrt_validation.py with one validate_parsed_vrt entry point that audits an already-parsed VRTDataset against every capability the read pipeline does not honour: band count, dtype kind, transform orientation, pixel-size, SrcRect/DstRect non-negativity and within-extent, the supported resampling set, plus the existing CRS / mixed-nodata policies.
  • Wired into both eager and chunked read_vrt dispatch in _backends/vrt.py. open_geotiff('foo.vrt') delegates to read_vrt, so the same validator runs there too. Direct and dispatched entry points produce equivalent failures for the same bad input.
  • The rejection lives ahead of any source decode. Previously some checks fired mid-decode (and per chunk task under chunked dispatch), so a malformed VRT could build a dask graph and then blow up deep inside compute().
  • New VRTUnsupportedError (a GeoTIFFAmbiguousMetadataError / ValueError subclass) carries the offending source path and field in its message. Existing typed errors (RotatedTransformError, MixedBandMetadataError, UnparseableCRSError) keep their contracts so except-by-subclass callers stay green; the validator delegates those cases to the existing registered hooks rather than swapping the type.

Backend coverage

  • numpy: yes (eager read path)
  • dask+numpy: yes (chunked dispatcher)
  • cupy: not applicable (VRT reads do not go through the GPU decoder pipeline)
  • dask+cupy: not applicable

Test plan

  • 17 new tests in test_vrt_validation_2321.py: one negative test per validator rule, plus entry-point parity tests asserting read_vrt and open_geotiff raise the same type and message for the same input.
  • Full geotiff suite: 5276 passed, 68 skipped.
  • Chunked dispatch: unsupported VRT raises at graph build, not inside a chunk function.

Add `_vrt_validation.validate_parsed_vrt` as the single entry point for
auditing a parsed `VRTDataset` against every capability the read
pipeline does not honour. Both `read_vrt` (eager and chunked dispatch)
and `open_geotiff('foo.vrt')` call the validator before any source
decode, so the two entry points produce equivalent failures for the
same bad input. Previously the per-source SrcRect/DstRect, zero
pixel-size, and unsupported-resample checks fired mid-decode (and per
chunk task under chunked dispatch), so a malformed VRT could build a
dask graph successfully and blow up deep in a `compute()` chunk
function.

Validator rules: band count sanity, dtype kind compatibility, transform
orientation (rotated/sheared), pixel-size compatibility,
SrcRect/DstRect non-negativity and within-extent, and the supported
resampling set. The mixed-band-nodata case keeps its existing
`MixedBandMetadataError` typed-error contract by delegating to the
already-registered `_check_read_mixed_band_metadata` hook. The
rotated-transform and unparseable-CRS cases keep their existing typed
subclasses (`RotatedTransformError`, `UnparseableCRSError`) so
`except`-by-subclass callers stay green; the validator adds the
source-path to the message and lifts the check ahead of any decode.
The new capability checks raise `VRTUnsupportedError`, a
`GeoTIFFAmbiguousMetadataError` subclass (and therefore `ValueError`).
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 23, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Centralize VRT capability validation (#2329)

Read every changed file in full from the issue-2329 worktree. Findings below; the rest of the diff is fine.

Blockers

None.

Suggestions

  • Duplicate GeoTIFFFallbackWarning on malformed CRS (_vrt_validation.py:246). The validator calls _wkt_to_epsg, which emits a GeoTIFFFallbackWarning on parse failure. The existing _check_read_unparseable_crs in _validation.py:1025 uses pyproj.CRS.from_user_input directly with no warning. Under the new code, a malformed VRT CRS emits a fallback warning that the existing check did not. You can see it in the test output for test_unparseable_crs_rejected_without_opt_in. Either swap to from_user_input directly, or wrap the probe in warnings.catch_warnings().

  • _looks_like_wkt and _NEAREST_RESAMPLE_ALGS are duplicated instead of imported (_vrt_validation.py:52-77). _NEAREST_RESAMPLE_ALGS already lives in _vrt.py:566 and _looks_like_wkt already lives in _crs.py:35. The docstring justifies the copy by pointing at an import cycle, but a lazy import inside the validator body sidesteps the cycle without the drift risk. If someone adds Nearest_Neighbor to one set but not the other later, the bug will be hard to spot.

Nits

  • mode='read' parameter (_vrt_validation.py:84, 132-137). The only accepted value is 'read'; the documented future write mode does not exist yet. Either drop the parameter until a write validator actually needs to share the entry point, or accept that every caller is going to pass mode='read' forever.

  • Rotated and unparseable-CRS branches shadow the registered hooks (_vrt_validation.py:181-193, 240-258). _check_read_rotated_transform and _check_read_unparseable_crs still run from validate_read_metadata right after the validator, but the validator's branches now preempt them by raising first. That is intentional (to embed the source path in the message), but it is worth a comment in the docstring or in validate_read_metadata so a future maintainer modifying the registered hook realises the hook is now reachable only when the validator passes the case (e.g. allow_rotated=True).

What looks good

  • TDD order is correct: 17 negative tests added, validator written to make them pass, full geotiff suite stays at 5276 passing.
  • The entry-point parity tests (test_resample_parity_across_entry_points, test_rotated_parity_across_entry_points, test_zero_bands_parity_across_entry_points) compare both error type and message between read_vrt and open_geotiff, which is the strongest form of the parity claim.
  • Chunked dispatch is wired (test_unsupported_resample_chunked_raises_at_build exercises it), so the "deep in a chunk function" failure mode the parent issue called out is closed.
  • VRTUnsupportedError is a GeoTIFFAmbiguousMetadataError subclass and therefore a ValueError, so except ValueError callers keep working.
  • Existing typed errors keep their contracts: the validator raises RotatedTransformError / MixedBandMetadataError / UnparseableCRSError for cases that already had those types; only the new capability rules raise VRTUnsupportedError.

Checklist

  • Algorithm matches reference: not applicable (no algorithmic change).
  • All implemented backends produce consistent results: validator runs at both eager and chunked dispatch.
  • NaN handling: not applicable.
  • Edge cases covered: zero bands, negative SrcRect/DstRect, zero pixel size, out-of-extent DstRect, unsupported resample, rotated transform, unparseable CRS, mixed-band nodata.
  • Dask chunk boundaries: rejection happens at graph build, not in a chunk function.
  • No premature materialization or unnecessary copies.
  • Benchmark not needed (validation overhead is a few ms per parse).
  • README feature matrix: not applicable (no new public API).
  • Docstrings present and accurate.

…2329)

- Replace ``_wkt_to_epsg`` (which emits ``GeoTIFFFallbackWarning`` on
  parse failure) with a direct ``pyproj.CRS.from_user_input`` probe so
  the validator's unparseable-CRS check matches the no-warning
  behaviour of the registered ``_check_read_unparseable_crs`` hook.
- Lazy-import ``_NEAREST_RESAMPLE_ALGS`` from ``_vrt`` and
  ``_looks_like_wkt`` from ``_crs`` instead of hand-copying both
  constants. Sidesteps the import cycle without the drift risk that
  comes with maintaining two copies.
- Add a Notes section to the ``validate_parsed_vrt`` docstring that
  flags the overlap with the registered ``validate_read_metadata``
  hooks so a future maintainer modifying the rotated-transform or
  unparseable-CRS check knows the validator preempts them on the
  VRT path.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Second pass after commit 8be5d84. Re-read the diff in the worktree.

Dispositions from the first review

  • Fixed: duplicate GeoTIFFFallbackWarning on malformed CRS. The validator now probes pyproj directly with from_user_input, matching the silent behaviour of the registered hook. Test output for test_unparseable_crs_rejected_without_opt_in no longer emits the spurious warning.
  • Fixed: duplicated _NEAREST_RESAMPLE_ALGS and _looks_like_wkt. The validator lazy-imports both from their canonical homes (_vrt._NEAREST_RESAMPLE_ALGS and _crs._looks_like_wkt). Drift risk gone.
  • Fixed: shadow on registered hooks. Added a Notes section to the validator docstring spelling out the overlap with _check_read_rotated_transform and _check_read_unparseable_crs so a future maintainer editing either side knows the validator preempts them on the VRT path.
  • Dismissed (with reason): drop the mode='read' parameter. The master plan (docs/superpowers/plans/2026-05-22-geotiff-vrt-release-hardening-2321.md) calls out validate_parsed_vrt(parsed, *, mode) as the canonical signature for the centralised entry point, with a write-mode follow-up implied. Dropping the parameter would diverge from the plan; keeping it costs a single mode != 'read' check per call.

Remaining

None. Full geotiff suite stays at 5276 passing.

Checklist

  • All first-pass blockers and suggestions addressed.
  • Nit dismissals recorded with reason.
  • Tests still pass on the eager and chunked paths.

@brendancol brendancol merged commit 7811426 into main May 24, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeoTIFF: centralize VRT capability validation (#2321 sub-task 2)

1 participant