geotiff: define a public contract for DataArray attrs (canonical / alias / pass-through) #1984

@brendancol

Reason or Problem

xrspatial.geotiff populates DataArray.attrs with a large surface area: crs, crs_wkt, transform, nodata, nodatavals, _FillValue, raster_type, extra_tags, gdal_metadata, gdal_metadata_xml, image_description, extra_samples, colormap, colormap_rgba, cmap, x_resolution, y_resolution, resolution_unit, crs_name, plus a long list of GeoKey-derived fields (geog_citation, datum_code, angular_units, linear_units, semi_major_axis, inv_flattening, projection_code, vertical_crs, vertical_citation, vertical_units).

Today there is no documented contract telling callers which keys are canonical, which are compatibility aliases, and which are pass-through, so downstream code is left guessing. For example, the mere presence of attrs['nodata'] currently doubles as a flag that NaN-masking has already run (see the comment at _attrs.py:91-92). The reviewer flagged this as overloaded: one key carries both the declared nodata value and the masking state.

Proposal

Define and publish a three-tier classification for every attr emitted by the read path and consumed by the write path:

  1. Canonical — owned by xrspatial. Round-trip stable. Examples: crs, crs_wkt, transform, raster_type, extra_tags, gdal_metadata/gdal_metadata_xml, the resolution group.
  2. Compatibility alias — read for interop with other ecosystems (rioxarray nodatavals, CF _FillValue). Never emitted by our writers when a canonical key is present.
  3. Best-effort pass-through — preserved through round-trip when possible but not guaranteed semantically. Examples: GeoKey-derived metadata that the writer cannot fully reconstruct.
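To make the three tiers concrete, here is a minimal sketch of how the classification could be recorded in code. The names `Tier`, `ATTR_TIERS`, and `unclassified` are hypothetical and not part of xrspatial today. The tier assignments only mirror the examples listed above. Reading "the resolution group" as x_resolution, y_resolution, and resolution_unit is an assumption, and nodata is deliberately left out because its semantics are still open.

```python
from enum import Enum


class Tier(Enum):
    CANONICAL = "canonical"
    ALIAS = "alias"
    PASS_THROUGH = "pass-through"


# Hypothetical registry. Assignments follow the examples in this issue only;
# the GeoKey-derived entries are tentative (see Unresolved Questions).
ATTR_TIERS = {
    "crs": Tier.CANONICAL,
    "crs_wkt": Tier.CANONICAL,
    "transform": Tier.CANONICAL,
    "raster_type": Tier.CANONICAL,
    "extra_tags": Tier.CANONICAL,
    "gdal_metadata": Tier.CANONICAL,
    "gdal_metadata_xml": Tier.CANONICAL,
    "x_resolution": Tier.CANONICAL,
    "y_resolution": Tier.CANONICAL,
    "resolution_unit": Tier.CANONICAL,
    "nodatavals": Tier.ALIAS,
    "_FillValue": Tier.ALIAS,
    "geog_citation": Tier.PASS_THROUGH,
    "datum_code": Tier.PASS_THROUGH,
}


def unclassified(attrs):
    """Return the attr keys that have no tier assigned yet, sorted by name."""
    return sorted(key for key in attrs if key not in ATTR_TIERS)
```

A check like `unclassified(da.attrs)` in the test suite would fail as soon as a reader starts emitting a key that nobody has assigned to a tier.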

Design:

  • New module docstring in xrspatial/geotiff/_attrs.py listing every attr key by tier with a one-line semantic definition.
  • An attrs_contract.rst page under docs/source/ with the same table, plus the round-trip invariant per tier.
  • Tests that lock the contract: a single fixture exercising every attr, with explicit assertions per tier.
  • Pin attrs['nodata'] semantics specifically. This is the subject of issue #5 (split declared vs masked nodata), which is a prerequisite for nailing this one down.
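The "explicit assertions per tier" in the design list could look roughly like the helper below. It is a sketch under stated assumptions, not the planned test code: `contract_violations`, `CANONICAL`, and `ALIASES` are hypothetical names, the canonical set is abbreviated, and treating nodata as the canonical counterpart of nodatavals and _FillValue is an assumption that depends on issue #5. The attrs read back after a write are used as a stand-in for what the writer emitted.

```python
# Abbreviated, hypothetical tier data for illustration only.
CANONICAL = {"crs", "transform", "raster_type"}
# Assumption: both aliases map onto a canonical "nodata" key.
ALIASES = {"nodatavals": "nodata", "_FillValue": "nodata"}


def contract_violations(before, after):
    """Compare attrs before a write and after reading the file back.

    Returns a list of human-readable problems; an empty list means the
    round trip respected the sketched contract.
    """
    problems = []
    # Canonical keys must survive the round trip unchanged.
    for key in sorted(CANONICAL & before.keys()):
        if key not in after:
            problems.append(f"canonical key {key!r} lost in round trip")
        elif after[key] != before[key]:
            problems.append(f"canonical key {key!r} changed in round trip")
    # An alias must not appear alongside its canonical key.
    for alias, canonical in sorted(ALIASES.items()):
        if alias in after and canonical in after:
            problems.append(f"alias {alias!r} emitted alongside {canonical!r}")
    # Pass-through keys are best-effort, so they are not checked here.
    return problems
```

A fixture-based test would then reduce to `assert contract_violations(written_attrs, reread_attrs) == []` for each backend.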

Usage:
Library users read the contract page to know which keys they can rely on. Internal code stops adding new attrs without first picking a tier.

Value:
The reviewer noted that "bugs in this module often come from one backend getting a fix and another backend lagging." A pinned contract is the missing reference for parity tests (issue #2) and round-trip tests (issue #3) to assert against.

Stakeholders and Impacts

  • Read paths (_reader.py, _backends/) — must keep emitting the canonical set.
  • Writer paths (_writer.py, _writers/, _vrt.py) — must accept canonical keys and the documented aliases.
  • Downstream xrspatial functions that read attrs — should switch to canonical keys.
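For writers and downstream functions, "accept canonical keys and the documented aliases" could be handled by one small resolver rather than ad hoc lookups in each function. The sketch below is illustrative only: `declared_nodata` is a hypothetical helper, the lookup order (nodata, then nodatavals, then _FillValue) is an assumption that this issue does not fix, and nodatavals is assumed to be a tuple or list with one entry per band, as in the rioxarray convention.

```python
def declared_nodata(attrs):
    """Resolve a single nodata value from canonical and alias keys.

    Assumed precedence: canonical 'nodata', then the first entry of
    'nodatavals', then CF '_FillValue'. Returns None if none is set.
    """
    if attrs.get("nodata") is not None:
        return attrs["nodata"]
    vals = attrs.get("nodatavals")
    if vals:  # assumes a non-empty tuple or list, not a NumPy array
        return vals[0]
    return attrs.get("_FillValue")
```

Centralizing the lookup means a later change in precedence, or the declared vs masked split from issue #5, only has to be made in one place.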

Drawbacks

A published contract is a constraint. Some current attrs may need to be deprecated rather than promoted to canonical.

Alternatives

Leave the contract implicit. Continue treating attrs as best-effort and let each downstream function defend itself. This is the current state and the reviewer flagged it as a source of churn.

Unresolved Questions

  • Should crs_wkt always be canonical, or canonical only when crs (EPSG) is missing?
  • Are the GeoKey-derived attrs canonical or pass-through? They round-trip today but the writer reconstructs them from crs/crs_wkt.
  • Does the contract need a version (attrs['_xrspatial_geotiff_contract'] = 1) to allow future evolution?
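If a version key were adopted, consumers could gate on it along these lines. This is a sketch of one possible answer to the last question, not a decision: `check_contract` and `SUPPORTED_VERSION` are hypothetical, and treating attrs without the key as version 0 (pre-contract) is an assumption.

```python
CONTRACT_KEY = "_xrspatial_geotiff_contract"
SUPPORTED_VERSION = 1  # hypothetical: highest version this code understands


def check_contract(attrs):
    """Return the contract version found in attrs.

    Attrs without the key are treated as version 0 (pre-contract).
    Raises ValueError for versions newer than this code supports.
    """
    version = int(attrs.get(CONTRACT_KEY, 0))
    if version > SUPPORTED_VERSION:
        raise ValueError(
            f"attrs use contract version {version}, "
            f"but only versions up to {SUPPORTED_VERSION} are supported"
        )
    return version
```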

Additional Notes or Context

Source of this proposal: code review feedback on the geotiff module, suggestion 1 of 5. Related issues: #2 (parity matrix), #3 (round-trip invariants), #4 (fail-closed), #5 (declared vs masked nodata).

Labels: api (API design and consistency), enhancement (New feature or request)
