95 changes: 95 additions & 0 deletions docs/source/reference/geotiff.rst
@@ -187,6 +187,101 @@ with spatial coords on both axes but no explicit transform raises
Multi-row / multi-column writes are unaffected. 1x1 inputs still
require ``attrs['transform']`` because neither axis has a step.

VRT support matrix (issue #2321)
================================

VRT reads sit at the ``advanced`` tier in
:data:`xrspatial.geotiff.SUPPORTED_FEATURES` (``reader.vrt``).
``open_geotiff``, ``read_vrt``, and ``write_vrt`` all target the same
narrow subset of GDAL's VRT spec. The reference below is the canonical
contract; the three docstrings echo it.

Supported
---------

* Simple GDAL VRT mosaics whose ``<SourceFilename>`` entries point at
GeoTIFF files. The VRT XML must resolve to source paths under the
VRT's own directory (or under a root listed in
``XRSPATIAL_VRT_ALLOWED_ROOTS``); see the source-path containment
note on ``read_vrt`` (#1671).
* Sources that agree on CRS, transform orientation (axis-aligned,
same sign on the y step), pixel size, dtype, and band count. The
reader rejects any mismatch with ``MixedBandMetadataError`` /
``ValueError`` rather than silently flattening.
* Windowed reads via ``window=(row_start, col_start, row_stop,
col_stop)``. Eager and dask paths shift coords and
``attrs['transform']`` together so a windowed eager read and a
windowed dask read agree on metadata.
* Lazy / dask reads over the same subset via ``chunks=``. Construction
parses the VRT XML and runs a parse-time existence sweep over every
referenced source so a missing file is surfaced at graph build, not
at ``compute()`` time (#2265).
* Explicit ``nodata``. The default (``band_nodata=None``) rejects a VRT
whose bands declare disagreeing per-band ``<NoDataValue>`` sentinels
with ``MixedBandMetadataError``. ``band_nodata='first'`` opts back
into the legacy flatten-to-band-0 behaviour explicitly (#1987).
* ``missing_sources='raise'`` (the default since #1860). Pass
``missing_sources='warn'`` to opt into the lenient partial-mosaic
path; see "VRT missing sources" below.
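
The windowed-read behaviour above can be illustrated with a minimal
sketch of how a window offset might shift a transform and the coordinates
together. It assumes a GDAL-style six-element geotransform
``(x0, dx, rx, y0, ry, dy)`` and pixel-centre coordinates on an
axis-aligned grid; the actual layout of ``attrs['transform']`` is not
specified in this section, and ``shift_geotransform`` / ``window_coords``
are hypothetical helper names, not library functions.

```python
def shift_geotransform(gt, row_start, col_start):
    """Move the origin of a GDAL-style geotransform to a window's corner.

    gt is (x0, dx, rx, y0, ry, dy), where
    x = x0 + col * dx + row * rx and y = y0 + col * ry + row * dy.
    """
    x0, dx, rx, y0, ry, dy = gt
    new_x0 = x0 + col_start * dx + row_start * rx
    new_y0 = y0 + col_start * ry + row_start * dy
    return (new_x0, dx, rx, new_y0, ry, dy)


def window_coords(gt, window):
    """Return pixel-centre x and y coordinates for a window.

    window is (row_start, col_start, row_stop, col_stop). The rotation
    terms are ignored, so this only holds for axis-aligned rasters.
    """
    row_start, col_start, row_stop, col_stop = window
    x0, dx, _, y0, _, dy = shift_geotransform(gt, row_start, col_start)
    xs = [x0 + (j + 0.5) * dx for j in range(col_stop - col_start)]
    ys = [y0 + (i + 0.5) * dy for i in range(row_stop - row_start)]
    return xs, ys


# A 10-unit north-up grid with its top-left corner at (100, 500).
full_gt = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)
window = (2, 3, 4, 6)
windowed_gt = shift_geotransform(full_gt, window[0], window[1])
xs, ys = window_coords(full_gt, window)
```

Deriving both the coordinates and the transform from the same shifted
origin is one way to keep an eager and a lazy read in agreement on
metadata.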

Non-goals (intentionally unsupported)
-------------------------------------

* Warped / reprojection VRTs (``<VRTDataset subClass="VRTWarpedDataset">``).
* Arbitrary resampling beyond the tested subset. The VRT reader honours
only the small set of resampling rules its test corpus covers; other
modes raise rather than silently picking a default.
* Mixed CRS, resolution, dtype, or band metadata across sources without
an explicit opt-in. The default behaviour is to fail closed.
* Nested VRTs (a ``<SourceFilename>`` that itself points at a ``.vrt``).
* Complex source / mask band / alpha band structures
(``<ComplexSource>`` with arbitrary scale and offset,
``<MaskBand>``, ``<AlphaBand>``).
* Full GDAL VRT parity. The contract above is the supported surface;
anything outside it is best-effort at most and may raise.
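
As an illustration of the non-goals listed above, the following sketch
scans a VRT document for structures outside the supported subset before
attempting a read. It uses only the standard library;
``find_unsupported_vrt_features`` is a hypothetical helper written for
this example and is not how the xrspatial reader is implemented. The
element and attribute names are the ones quoted in the list above.

```python
import xml.etree.ElementTree as ET


def find_unsupported_vrt_features(vrt_xml):
    """Return a list of human-readable reasons a VRT is out of scope."""
    root = ET.fromstring(vrt_xml)
    problems = []

    # Warped / reprojection VRTs are flagged on the root element.
    if root.get("subClass") == "VRTWarpedDataset":
        problems.append("warped VRT")

    # Complex source, mask band, and alpha band structures.
    for tag in ("ComplexSource", "MaskBand", "AlphaBand"):
        if root.find(f".//{tag}") is not None:
            problems.append(f"<{tag}> element")

    # Nested VRTs: a source filename that itself points at a .vrt file.
    for src in root.iter("SourceFilename"):
        if (src.text or "").strip().lower().endswith(".vrt"):
            problems.append("nested VRT source")
            break

    return problems


simple_vrt = (
    "<VRTDataset><VRTRasterBand><SimpleSource>"
    "<SourceFilename>tile_west.tif</SourceFilename>"
    "</SimpleSource></VRTRasterBand></VRTDataset>"
)
issues = find_unsupported_vrt_features(simple_vrt)  # empty for a simple mosaic
```

A pre-flight check like this only reports what it recognises; the
reader's own errors remain the authoritative signal.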

Safe usage
----------

A simple mosaic over two compatible GeoTIFF tiles, read eagerly with
the fail-closed defaults:

.. code-block:: python

    from xrspatial.geotiff import open_geotiff, write_vrt

    # Write a VRT that mosaics two tiles. Both tiles share CRS,
    # pixel size, dtype, and band count.
    vrt_path = write_vrt(
        'mosaic.vrt',
        source_files=['tile_west.tif', 'tile_east.tif'],
    )

    # Read with the defaults: missing_sources='raise',
    # band_nodata=None (fail closed on disagreeing per-band sentinels).
    da = open_geotiff(vrt_path)

Intentionally raises
--------------------

Reading a VRT whose source tiles disagree on their per-band nodata
sentinels triggers the fail-closed check:

.. code-block:: python

    from xrspatial.geotiff import open_geotiff, MixedBandMetadataError

    # tile_a.tif declares nodata=-9999, tile_b.tif declares nodata=0.
    # The default band_nodata=None rejects the mosaic rather than
    # flattening to one sentinel.
    try:
        open_geotiff('mixed_nodata.vrt')
    except MixedBandMetadataError:
        # Pass band_nodata='first' to opt back into the legacy
        # flatten-to-band-0 semantics, or fix the source tiles.
        pass
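
For intuition, the fail-closed rule can be sketched in a few lines of
standard-library Python. This is an assumption-laden illustration, not
the library's implementation: ``NodataMismatch`` is a stand-in for
``MixedBandMetadataError``, ``resolve_band_nodata`` is a hypothetical
helper, and NaN sentinels are not handled specially.

```python
import xml.etree.ElementTree as ET


class NodataMismatch(ValueError):
    """Hypothetical stand-in for the library's MixedBandMetadataError."""


def resolve_band_nodata(vrt_xml, band_nodata=None):
    """Pick one nodata sentinel for a VRT, failing closed on disagreement.

    band_nodata=None raises when bands declare different sentinels;
    band_nodata='first' returns band 0's sentinel regardless.
    """
    root = ET.fromstring(vrt_xml)
    values = []
    for band in root.iter("VRTRasterBand"):
        node = band.find("NoDataValue")
        has_value = node is not None and node.text is not None
        values.append(float(node.text) if has_value else None)

    if not values:
        return None
    if band_nodata == "first":
        return values[0]
    if len(set(values)) > 1:
        raise NodataMismatch(f"bands declare different nodata values: {values}")
    return values[0]


mixed_vrt = (
    "<VRTDataset>"
    "<VRTRasterBand band='1'><NoDataValue>-9999</NoDataValue></VRTRasterBand>"
    "<VRTRasterBand band='2'><NoDataValue>0</NoDataValue></VRTRasterBand>"
    "</VRTDataset>"
)
legacy_value = resolve_band_nodata(mixed_vrt, band_nodata="first")
```

Calling ``resolve_band_nodata(mixed_vrt)`` with the default raises
instead, which mirrors the opt-in shape of ``band_nodata='first'``
described above.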

VRT missing sources
===================

41 changes: 40 additions & 1 deletion xrspatial/geotiff/__init__.py
@@ -357,7 +357,22 @@ def open_geotiff(source: str | BinaryIO, *,
- ``gpu=True, chunks=N``: Dask+CuPy for out-of-core GPU pipelines
- Default: NumPy eager read

VRT files are auto-detected by extension.
VRT files are auto-detected by extension. The supported VRT subset
is narrow on purpose (issue #2321; see the "VRT support matrix"
section in ``docs/source/reference/geotiff.rst`` for the canonical
contract). In short:

* Supported: simple GDAL VRT mosaics over GeoTIFF sources;
compatible CRS, transform orientation, pixel size, dtype, and
band count across sources; clean windowed reads; lazy / dask
reads over the same subset; explicit nodata with mixed-band
rejection by default; ``missing_sources='raise'`` as the
default.
* Non-goals (allowed to raise): warped / reprojection VRTs,
arbitrary resampling beyond the tested subset, mixed CRS /
resolution / dtype / band metadata without an opt-in, nested
VRTs, complex source / mask band / alpha band structures, full
GDAL VRT parity.

Parameters
----------
@@ -517,6 +532,30 @@ def open_geotiff(source: str | BinaryIO, *,
then raises ``ValueError`` (float-to-int is lossy in a way users
rarely intend). When the file has no in-range sentinel match, the
promotion is skipped and ``dtype=<integer>`` works either way.

Examples
--------
Safe VRT usage. Mosaic two compatible tiles and read with the
fail-closed defaults:

>>> from xrspatial.geotiff import open_geotiff, write_vrt
>>> vrt_path = write_vrt(  # doctest: +SKIP
...     'mosaic.vrt',
...     source_files=['tile_west.tif', 'tile_east.tif'],
... )
>>> da = open_geotiff(vrt_path)  # doctest: +SKIP

Intentionally raises. A VRT whose source tiles disagree on their
per-band nodata sentinels is rejected by the default
``band_nodata=None``:

>>> from xrspatial.geotiff import MixedBandMetadataError
>>> try:  # doctest: +SKIP
...     open_geotiff('mixed_nodata.vrt')
... except MixedBandMetadataError:
...     pass  # pass band_nodata='first' to opt back into the
...           # legacy flatten-to-band-0 semantics, or fix the
...           # source tiles.
"""
from ._reader import _coerce_path

48 changes: 48 additions & 0 deletions xrspatial/geotiff/_backends/vrt.py
@@ -139,6 +139,30 @@ def read_vrt(source: str, *,
raises a typed error rather than silently flattening. See
:data:`xrspatial.geotiff.SUPPORTED_FEATURES` for the full tier map.

Supported subset (issue #2321; see the "VRT support matrix" section
in ``docs/source/reference/geotiff.rst`` for the canonical
contract):

* Simple GDAL VRT mosaics whose ``<SourceFilename>`` entries point
at GeoTIFF files (sources must resolve under the VRT's own
directory or an ``XRSPATIAL_VRT_ALLOWED_ROOTS`` root; #1671).
* Sources that agree on CRS, transform orientation, pixel size,
dtype, and band count. Mismatch raises rather than flattening.
* Windowed reads via ``window=``; eager and dask paths shift
coords and ``attrs['transform']`` together.
* Lazy / dask reads via ``chunks=`` over the same subset, with a
parse-time missing-source sweep (#2265).
* Explicit ``nodata``; ``band_nodata=None`` (the default) rejects
disagreeing per-band sentinels with ``MixedBandMetadataError``
(#1987).
* ``missing_sources='raise'`` is the default (#1860).

Non-goals (intentionally unsupported, allowed to raise): warped /
reprojection VRTs, arbitrary resampling beyond the tested subset,
mixed CRS / resolution / dtype / band metadata without an opt-in,
nested VRTs, complex source / mask band / alpha band structures,
full GDAL VRT parity.

The VRT's source GeoTIFFs are read via windowed reads and assembled
into a single array.

@@ -270,6 +294,30 @@ def read_vrt(source: str, *,
failures, which surface as per-task ``GeoTIFFFallbackWarning``
instead. Each worker still emits ``GeoTIFFFallbackWarning`` for
missing sources at execution time as well.

Examples
--------
Safe usage. Mosaic two compatible tiles and read with the
fail-closed defaults:

>>> from xrspatial.geotiff import read_vrt, write_vrt
>>> vrt_path = write_vrt(  # doctest: +SKIP
...     'mosaic.vrt',
...     source_files=['tile_west.tif', 'tile_east.tif'],
... )
>>> da = read_vrt(vrt_path)  # doctest: +SKIP

Intentionally raises. A VRT whose source tiles disagree on their
per-band nodata sentinels is rejected by the default
``band_nodata=None``:

>>> from xrspatial.geotiff import MixedBandMetadataError
>>> try:  # doctest: +SKIP
...     read_vrt('mixed_nodata.vrt')
... except MixedBandMetadataError:
...     pass  # pass band_nodata='first' to opt back into the
...           # legacy flatten-to-band-0 semantics, or fix the
...           # source tiles.
"""
from .._reader import _coerce_path
from .._vrt import _apply_integer_sentinel_mask_with_presence as _vrt_mask_with_presence
50 changes: 50 additions & 0 deletions xrspatial/geotiff/_writers/vrt.py
@@ -31,6 +31,24 @@ def write_vrt(path: str = _VRT_PATH_MISSING_SENTINEL,
disagreement. See :data:`xrspatial.geotiff.SUPPORTED_FEATURES` for
the full tier map.

Output targets the same narrow subset of GDAL's VRT spec that the
reader supports (issue #2321; see the "VRT support matrix" section
in ``docs/source/reference/geotiff.rst`` for the canonical
contract):

* Supported: simple GDAL VRT mosaics over GeoTIFF sources;
compatible CRS, transform orientation, pixel size, dtype, and
band count across sources; clean windowed reads on the
consumer side; lazy / dask reads over the same subset on the
consumer side; explicit nodata; ``missing_sources='raise'`` as
the read-side default.
* Non-goals (the writer does not emit these and the reader is
allowed to raise on them): warped / reprojection VRTs,
arbitrary resampling beyond the tested subset, mixed CRS /
resolution / dtype / band metadata without an opt-in, nested
VRTs, complex source / mask band / alpha band structures, full
GDAL VRT parity.

Parameters
----------
path : str
@@ -73,6 +91,38 @@ def write_vrt(path: str = _VRT_PATH_MISSING_SENTINEL,
-------
str
Path to the written VRT file.

Examples
--------
Safe usage. Mosaic two compatible tiles; the consumer can then
read the resulting VRT with the fail-closed defaults. Paths
below are illustrative; replace with paths to real GeoTIFF
files on disk:

>>> from xrspatial.geotiff import write_vrt, open_geotiff
>>> vrt_path = write_vrt(  # doctest: +SKIP
...     'mosaic.vrt',
...     source_files=['tile_west.tif', 'tile_east.tif'],
... )
>>> da = open_geotiff(vrt_path)  # doctest: +SKIP

Intentionally raises (on the read side). If the source tiles
disagree on their per-band nodata sentinels, the default
``band_nodata=None`` on ``open_geotiff`` / ``read_vrt`` rejects
the mosaic with ``MixedBandMetadataError``. The writer does not
pre-validate cross-tile metadata; the failure mode lives on the
read side:

>>> from xrspatial.geotiff import MixedBandMetadataError
>>> # tile_a.tif declares nodata=-9999; tile_b.tif declares nodata=0
>>> bad_path = write_vrt(  # doctest: +SKIP
...     'mixed_nodata.vrt',
...     source_files=['tile_a.tif', 'tile_b.tif'],
... )
>>> try:  # doctest: +SKIP
...     open_geotiff(bad_path)
... except MixedBandMetadataError:
...     pass  # fix the source tiles or pass band_nodata='first'.
"""
# Explicit signature (previously ``**kwargs``) so ``inspect.signature``,
# IDE autocomplete, and ``mypy --strict`` can see the accepted kwargs