GeoTIFF release hardening: define and lock down supported VRT contract #2321

@brendancol

Context

The GeoTIFF module is now large enough that release readiness should be judged by an explicit support contract, not by the fact that many edge cases have tests. VRT support is useful for xarray-spatial users, but it should remain a conservative interoperability layer rather than an attempt to reimplement GDAL VRT completely.

Recommendation: keep VRT support, but document and enforce a narrow supported subset. Anything outside that subset should fail closed with actionable errors rather than partially working, silently dropping metadata, or returning pixels whose downstream geospatial meaning is ambiguous.

Recommended VRT support contract

Supported for release:

  • Simple GDAL VRT mosaics backed by GeoTIFF sources.
  • Sources with compatible CRS, transform orientation, pixel size, dtype, band count, and band layout.
  • Windowed reads where source windows map cleanly to destination windows.
  • Lazy/dask reads over the same supported subset.
  • Explicit handling of nodata, including mixed-band nodata rejection by default.
  • Missing source policy controlled by missing_sources, with raise as the default.

Not promised for release unless explicitly added later:

  • Warped VRTs / reprojection VRTs.
  • Arbitrary resampling semantics beyond the implemented, tested subset.
  • Mixed CRS, mixed resolution, mixed dtype, or mixed band metadata unless there is a documented explicit opt-in.
  • Nested VRTs.
  • Complex source/mask/alpha semantics not fully modeled by the attrs contract.
  • Full GDAL VRT parity.

The docs should say this plainly: VRT is an advanced compatibility feature for simple mosaics, not a GDAL replacement.

Implementation plan

PR 1: Publish the VRT contract

  • Add a VRT support matrix to the GeoTIFF reference docs.
  • Mark VRT as advanced in prose wherever GeoTIFF tiers are described.
  • Document the supported subset and the known non-goals listed above.
  • Add examples of safe usage and examples that intentionally raise.
  • Ensure public docstrings for open_geotiff(... .vrt ...), read_vrt, and write_vrt match the same contract.

Acceptance criteria:

  • Docs clearly distinguish supported simple mosaics from unsupported GDAL VRT features.
  • No doc path implies full GDAL VRT compatibility.

PR 2: Centralize VRT capability validation

  • Add a single validator for parsed VRT metadata before read execution.
  • Validate CRS compatibility, dtype compatibility, band count, nodata policy, transform orientation, pixel-size compatibility, source rectangle/destination rectangle sanity, and supported resampling.
  • Make direct read_vrt calls and open_geotiff(... .vrt ...) use the same validator.
  • Ensure unsupported features fail before any partial mosaic work starts where feasible.

Acceptance criteria:

  • Unsupported VRT features raise typed, actionable errors at graph-build time or during eager read setup.
  • Direct and dispatcher entry points have equivalent failures.
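
A minimal sketch of what a single fail-closed validator could look like. Everything named here (UnsupportedVRTError, SourceInfo, validate_vrt_sources, and the field names) is hypothetical and not taken from the existing module. The set of supported resampling modes is also an assumption, and source/destination rectangle checks are left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


class UnsupportedVRTError(ValueError):
    """Hypothetical typed error for VRT features outside the supported subset."""


@dataclass(frozen=True)
class SourceInfo:
    """Hypothetical summary of one parsed VRT source (field names are illustrative)."""
    path: str
    crs: Optional[str]
    dtype: str
    band_count: int
    pixel_size: Tuple[float, float]  # (x_res, y_res); y_res < 0 for north-up rasters
    nodata: Optional[float] = None
    resampling: str = "nearest"


# Assumption: only nearest-neighbour is in the implemented, tested subset.
SUPPORTED_RESAMPLING = frozenset({"nearest"})


def _same_nodata(a: Optional[float], b: Optional[float]) -> bool:
    if a is None or b is None:
        return a is b
    if math.isnan(a) and math.isnan(b):
        return True  # NaN never compares equal to itself, so handle it explicitly
    return a == b


def _same_pixel_size(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    # Comparing each axis also rejects a flipped sign, i.e. a different orientation.
    return all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b))


def validate_vrt_sources(sources: Sequence[SourceInfo]) -> None:
    """Raise UnsupportedVRTError before any pixel is read; return None if all checks pass."""
    if not sources:
        raise UnsupportedVRTError("VRT lists no sources")
    ref = sources[0]
    for src in sources:
        if src.resampling not in SUPPORTED_RESAMPLING:
            raise UnsupportedVRTError(
                f"{src.path}: resampling {src.resampling!r} is not supported; "
                f"supported: {sorted(SUPPORTED_RESAMPLING)}"
            )
        checks = (
            ("CRS", src.crs == ref.crs),
            ("dtype", src.dtype == ref.dtype),
            ("band count", src.band_count == ref.band_count),
            ("pixel size or orientation", _same_pixel_size(src.pixel_size, ref.pixel_size)),
            ("nodata", _same_nodata(src.nodata, ref.nodata)),
        )
        for label, ok in checks:
            if not ok:
                raise UnsupportedVRTError(
                    f"{src.path}: {label} differs from {ref.path}; "
                    f"mixed {label} is outside the supported VRT subset"
                )
```

If both read_vrt and the open_geotiff dispatcher call one function like this before building a dask graph or starting an eager read, equivalent failures across entry points follow by construction rather than by parallel maintenance.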

PR 3: Add metadata parity tests for VRT, not just pixel tests

  • Add parity tests asserting attrs and coords for eager, dask, and GPU-enabled routes where applicable.
  • Cover transform, crs, crs_wkt, nodata, masked_nodata, georef_status, raster_type, gdal_metadata_xml, and extra_tags where the contract promises them.
  • Add negative tests for mixed CRS, mixed nodata, unsupported resampling, bad source/destination rectangles, and missing sources.

Acceptance criteria:

  • VRT tests fail if pixels match but metadata is wrong or missing.
  • Mixed or ambiguous metadata cannot silently flatten to one output value.
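
One way such a metadata parity check could be expressed, as a sketch only. The helper name attrs_mismatches is hypothetical, and the code assumes attrs values are plain Python objects (tuples, strings, floats). Array-valued attrs would need an array-aware comparison instead of ==.

```python
import math
from typing import Any, List, Mapping, Sequence

# Attr keys named in this plan; which of them each route promises is up to the contract.
PROMISED_ATTRS = (
    "transform", "crs", "crs_wkt", "nodata", "masked_nodata",
    "georef_status", "raster_type", "gdal_metadata_xml", "extra_tags",
)

_MISSING = object()


def _equal(a: Any, b: Any) -> bool:
    # Treat NaN nodata as equal to NaN nodata; plain == would report a false mismatch.
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def attrs_mismatches(
    reference: Mapping[str, Any],
    candidate: Mapping[str, Any],
    keys: Sequence[str] = PROMISED_ATTRS,
) -> List[str]:
    """Return one message per key whose presence or value differs between two routes."""
    problems = []
    for key in keys:
        ref = reference.get(key, _MISSING)
        got = candidate.get(key, _MISSING)
        if ref is _MISSING and got is _MISSING:
            continue  # neither route carries this key
        if ref is _MISSING or got is _MISSING:
            problems.append(f"{key}: present in only one route")
        elif not _equal(ref, got):
            problems.append(f"{key}: {ref!r} != {got!r}")
    return problems
```

A parity test would then assert that attrs_mismatches(eager.attrs, lazy.attrs) is empty in addition to comparing pixel values, so a route that drops or flattens metadata fails even when the pixels match.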

PR 4: Lock backend parity for VRT and sidecar/overview interactions

  • Add a focused backend parity matrix for VRT reads across eager and dask.
  • Include external overview sidecar sources where VRT references GeoTIFFs with .tif.ovr pyramids.
  • Assert metadata parity, not only values.
  • Confirm windowed reads shift coords and attrs['transform'] consistently.

Acceptance criteria:

  • The VRT path cannot pass by returning correct pixel values with wrong georeferencing attrs.
  • Windowed eager and lazy VRT reads agree on shape, coords, attrs, and values.
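
The windowed-read requirement can be stated as arithmetic. The sketch below assumes attrs['transform'] is a six-element affine tuple in (a, b, c, d, e, f) order, where x = a*col + b*row + c and y = d*col + e*row + f. That ordering is an assumption, not something this issue confirms, and both function names are hypothetical.

```python
from typing import List, Tuple

Affine6 = Tuple[float, float, float, float, float, float]


def shift_transform(transform: Affine6, col_off: int, row_off: int) -> Affine6:
    """Return the transform of a window whose top-left pixel is (col_off, row_off)."""
    a, b, c, d, e, f = transform
    # The window origin is the world position of the parent's (col_off, row_off) corner.
    return (a, b, c + a * col_off + b * row_off, d, e, f + d * col_off + e * row_off)


def pixel_center_coords(
    transform: Affine6, width: int, height: int
) -> Tuple[List[float], List[float]]:
    """Pixel-center x/y coordinates for an axis-aligned (non-rotated) raster."""
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated or sheared transforms need 2-D coordinates")
    xs = [c + a * (i + 0.5) for i in range(width)]
    ys = [f + e * (j + 0.5) for j in range(height)]
    return xs, ys
```

A consistency test can then check that the coords of a windowed read equal the matching slice of the full read's coords, and that the windowed attrs['transform'] equals shift_transform applied to the full transform, for both the eager and the dask route.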

PR 5: Harden URL/source routing used by GeoTIFF and VRT

  • Centralize source scheme classification using urlparse(...).scheme.lower().
  • Make HTTP(S) detection case-insensitive everywhere.
  • Ensure uppercase HTTP:// and HTTPS:// URLs still go through _HTTPSource and SSRF/DNS-pinning checks, not fsspec.
  • Add regression tests for uppercase schemes and private-host rejection.

Acceptance criteria:

  • No code path uses case-sensitive startswith(('http://', 'https://')) for security-relevant dispatch.
  • HTTP(S) SSRF protections apply regardless of scheme casing.
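
A sketch of centralized scheme classification using only the standard library. The function name classify_source and the route labels it returns are hypothetical. Treating single-letter schemes as Windows drive letters is an assumption about how local paths should be handled.

```python
from urllib.parse import urlparse

_HTTP_SCHEMES = frozenset({"http", "https"})


def classify_source(source: str) -> str:
    """Return 'http', 'fsspec', or 'local' for a source string (labels are illustrative)."""
    # The explicit .lower() keeps the case-insensitivity visible at the dispatch point.
    scheme = urlparse(source).scheme.lower()
    if scheme in _HTTP_SCHEMES:
        return "http"  # would route to _HTTPSource and its SSRF/DNS-pinning checks
    if len(scheme) <= 1:
        # Empty scheme: plain path. One letter: assumed to be a Windows drive like C:\.
        return "local"
    return "fsspec"
```

With one classifier like this, every entry point dispatches on its result instead of calling startswith(('http://', 'https://')), so an uppercase HTTP:// URL cannot fall through to the fsspec branch.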

PR 6: Release gate / audit checklist

  • Add a release checklist for GeoTIFF promises:
    • local GeoTIFF read/write
    • COG read/write
    • HTTP/fsspec reads
    • nodata lifecycle
    • attrs contract
    • VRT supported subset
    • GPU experimental paths
  • Add or update tests so each promised tier has at least one explicit parity gate.
  • Keep experimental/internal-only features documented as such.

Acceptance criteria:

  • Release notes can accurately state what is stable, advanced, experimental, and unsupported.
  • Every promised VRT behavior has a regression test.
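
The "at least one explicit parity gate per tier" rule could itself be checked mechanically. In this sketch the tier names come from the checklist above, while ungated_tiers and any test names passed to it are hypothetical.

```python
from typing import List, Mapping, Sequence

# Tier names copied from the release checklist in this issue.
PROMISED_TIERS = (
    "local GeoTIFF read/write",
    "COG read/write",
    "HTTP/fsspec reads",
    "nodata lifecycle",
    "attrs contract",
    "VRT supported subset",
    "GPU experimental paths",
)


def ungated_tiers(
    gates: Mapping[str, Sequence[str]],
    tiers: Sequence[str] = PROMISED_TIERS,
) -> List[str]:
    """Return the tiers that have no parity-gate test registered against them."""
    return [tier for tier in tiers if not gates.get(tier)]
```

A release check would fail whenever ungated_tiers returns a non-empty list, which keeps the checklist and the test suite from drifting apart.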

Notes from review

The current module already contains a lot of defensive work. The remaining risk is mostly boundary drift between entry points and backends: eager vs dask vs GPU, local vs HTTP/fsspec, GeoTIFF vs VRT, and pixel parity vs metadata parity. For release readiness, the priority should be to reduce those drift surfaces and make unsupported behavior impossible to mistake for supported behavior.
