From e9a33d8e3aefedc783483d0d7080b6936ed4241a Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Mon, 18 May 2026 08:24:09 -0700 Subject: [PATCH 1/2] geotiff: golden corpus phase 4 PR 2, baseline-update docs (#1930) Adds the contributor-facing page that explains how to bless intentional pixel-level baseline changes for the golden corpus. * ``docs/source/contributing/index.rst`` -- new contributing section, only entry today is the corpus baselines page. * ``docs/source/contributing/golden_corpus_baselines.rst`` -- what a corpus failure means, how to bless an intentional baseline change, how to add a new fixture, and how to use the fast / slow split (phase 4 PR 1). * ``docs/source/index.rst`` -- toctree entry for the new section. This is the last PR in the #1930 plan. Closes out the phase 4 ops set. --- .../contributing/golden_corpus_baselines.rst | 135 ++++++++++++++++++ docs/source/contributing/index.rst | 15 ++ docs/source/index.rst | 1 + 3 files changed, 151 insertions(+) create mode 100644 docs/source/contributing/golden_corpus_baselines.rst create mode 100644 docs/source/contributing/index.rst diff --git a/docs/source/contributing/golden_corpus_baselines.rst b/docs/source/contributing/golden_corpus_baselines.rst new file mode 100644 index 000000000..031f61a04 --- /dev/null +++ b/docs/source/contributing/golden_corpus_baselines.rst @@ -0,0 +1,135 @@ +.. _golden_corpus_baselines: + +******************************** +Updating golden-corpus baselines +******************************** + +The geotiff golden corpus (issue +`#1930 `_) +pins every backend to a rasterio reference read for every fixture. +The corpus lives at +``xrspatial/geotiff/tests/golden_corpus/`` and is generated +deterministically from ``manifest.yaml``. The committed ``.tif`` files +under ``fixtures/`` are byte-stable across regenerations; this is what +makes "pixel parity" mean something. + +This page is for the case where a code change legitimately changes the +pixels a backend produces. The corpus is intentionally rigid -- any +unintended change shows up as a test failure -- so blessing a change is +an explicit step. + +When a corpus test fails +======================== + +Failure modes you might see: + +1. ``compare_to_oracle`` raises ``AssertionError: pixel arrays differ``. + Either the reader changed how it decodes the fixture, or the fixture + bytes changed. Compare the fixture's ``md5sum`` against the value + from main; if it shifted, the fixture moved. + +2. ``compare_to_oracle`` raises ``AssertionError: dtype mismatch`` or + ``transform mismatch``. The fixture is intact but the reader changed + how it surfaces an attr. Check whether the attr is in the oracle's + canonical-attr set (today: ``crs`` / ``transform`` / ``nodata`` / + ``dtype``); if so, the canonical-attrs contract is what changed. + +3. An ``xfail(strict=True)`` test flips to ``XPASS``. This is the + "regression-of-regressions" surface: a parity gap that was previously + documented in ``_PARITY_GAPS`` is now passing the oracle. Remove the + entry from the ``_PARITY_GAPS`` dict and the matching docstring + bullet, and merge. + +Blessing a pixel-level change +============================= + +If a code change makes a backend produce different pixels and the new +output is correct, the corpus baseline needs updating. The workflow: + +1. **Confirm the change is intentional.** Read the failing test output + and the diff. If you cannot articulate why the new bytes are correct + in one sentence, the change is probably a bug, not a baseline + refresh. + +2. **Regenerate any affected fixtures.** From the repository root: + + .. code-block:: bash + + python -m xrspatial.geotiff.tests.golden_corpus.generate + + Or, for one fixture by id: + + .. code-block:: bash + + python -m xrspatial.geotiff.tests.golden_corpus.generate \ + --only + + The generator is deterministic. If two runs produce different bytes + for the same fixture, that is the bug -- file an issue rather than + committing the unstable output. + +3. **Inspect the changed bytes.** ``git diff --stat`` on the + ``fixtures/`` tree shows which files moved. ``rasterio info `` + on the old vs new fixture is a low-effort sanity check that the + change is what you expected (e.g. tile layout, compression). + +4. **Run the full corpus matrix.** All backend modules under + ``xrspatial/geotiff/tests/test_golden_corpus_*_1930.py`` should pass + against the new baseline. ``pytest`` runs them all; ``pytest -m "not + slow"`` runs the PR fast lane only. + +5. **Commit the manifest change and the regenerated bytes together.** + The PR description must answer: what changed, why is the new value + correct, and what code change made it correct. A baseline refresh + PR without a one-paragraph justification will be sent back. + +Adding a new fixture +==================== + +Adding a fixture is cheaper than blessing a baseline change because no +existing test is asserting against the new bytes yet: + +1. Add the entry to ``fixtures:`` in ``manifest.yaml``. Re-use the + ``defaults`` block; only override what is interesting for the new + axis you are exercising. The schema is documented at the top of + ``manifest.yaml`` and enforced by ``generate.validate()``. + +2. Run ``python -m xrspatial.geotiff.tests.golden_corpus.generate + --only ``. Confirm the file lands under ``fixtures/`` and + stays under the 12 KB per-fixture budget (the smoke tests enforce + this). + +3. If the new fixture exercises a backend gap that is not in any of the + per-module ``_PARITY_GAPS`` tables, expect the parametrised test to + fail. Either fix the gap or add the fixture id to ``_PARITY_GAPS`` + with a reason and a follow-up trail. Do not just merge a failing + test. + +Fast / slow split +================= + +Each fixture's ``tags:`` list controls which lane it runs in. A fixture +is fast iff ``"fast"`` is in its tags; everything else picks up +``pytest.mark.slow`` via the helper in +``xrspatial/geotiff/tests/golden_corpus/_marks.py``. + +* ``pytest``: runs every cell. +* ``pytest -m "not slow"``: PR fast lane. +* ``pytest -m slow``: only the slow cells; useful for a nightly job. + +When adding a fixture that decodes in well under a second, include +``fast`` in its tags. Reserve the slow lane for fixtures that are +genuinely heavy (large COGs, jpeg2000, multi-source VRTs). + +Reference points +================ + +* `Issue #1930 `_ + -- the tracking issue, including the phase plan. +* ``xrspatial/geotiff/tests/golden_corpus/_oracle.py`` -- the oracle + contract. ``compare_to_oracle(path, candidate, lossy=False)`` is the + single function every backend test calls; it asserts pixels (bit- + exact or shape-only when ``lossy``), dtype, transform, CRS, and + nodata. +* ``xrspatial/geotiff/tests/golden_corpus/manifest.yaml`` -- the + schema-documented list of fixtures. Edit here, regenerate, commit. diff --git a/docs/source/contributing/index.rst b/docs/source/contributing/index.rst new file mode 100644 index 000000000..235f9e43f --- /dev/null +++ b/docs/source/contributing/index.rst @@ -0,0 +1,15 @@ +.. _contributing: + +************ +Contributing +************ + +Developer-facing docs. These pages cover workflows for editing +xrspatial itself: regenerating test fixtures, blessing +pixel-level baseline changes, and so on. End-user docs are under +:ref:`user_guide` and :ref:`reference`. + +.. toctree:: + :maxdepth: 1 + + golden_corpus_baselines diff --git a/docs/source/index.rst b/docs/source/index.rst index 0b95a5812..cc0241c33 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -106,3 +106,4 @@ xarray-spatial does not depend on GDAL / GEOS, which makes it fully extensible i getting_started/index user_guide/index reference/index + contributing/index From 11cb2231f749bc1d61ac06f9782b24dbedcc0c8b Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Mon, 18 May 2026 09:01:38 -0700 Subject: [PATCH 2/2] geotiff: phase 4 PR 2 review fixes (#1930) * Add a note up front that the doc describes the system as of the full Phase 3 + Phase 4.1 PR set, so contributors do not chase a missing ``_marks.py`` while one of those PRs is still in flight. * Replace the ``\`` line-continuation in the per-fixture regeneration example with a single-line invocation; the backslash form is POSIX-shell-only and would fail in ``cmd.exe``. * Normalise file-path references to the full ``xrspatial/geotiff/tests/golden_corpus/`` form where the text used the bare filename. * Cross-link the corpus ``README.md`` from the Fast / Slow section rather than repeat its content; the two pages have different audiences (test-tree contributor vs docs-site reader) and the cross-reference is the lightest way to keep them aligned. --- .../contributing/golden_corpus_baselines.rst | 23 ++++++++++++++----- 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/docs/source/contributing/golden_corpus_baselines.rst b/docs/source/contributing/golden_corpus_baselines.rst index 031f61a04..32222662c 100644 --- a/docs/source/contributing/golden_corpus_baselines.rst +++ b/docs/source/contributing/golden_corpus_baselines.rst @@ -18,6 +18,14 @@ pixels a backend produces. The corpus is intentionally rigid -- any unintended change shows up as a test failure -- so blessing a change is an explicit step. +.. note:: + + This page describes the system as of the full Phase 3 (backend + wiring) and Phase 4.1 (fast / slow split) PR set. File and symbol + references (``_marks.py``, ``_PARITY_GAPS``, the per-backend test + modules) assume those PRs have merged. If one is still in flight, + check the matching open PR before chasing a missing file. + When a corpus test fails ======================== @@ -57,12 +65,12 @@ output is correct, the corpus baseline needs updating. The workflow: python -m xrspatial.geotiff.tests.golden_corpus.generate - Or, for one fixture by id: + Or, for one fixture by id (one-line invocation; portable across + POSIX shells and ``cmd.exe``): .. code-block:: bash - python -m xrspatial.geotiff.tests.golden_corpus.generate \ - --only + python -m xrspatial.geotiff.tests.golden_corpus.generate --only The generator is deterministic. If two runs produce different bytes for the same fixture, that is the bug -- file an issue rather than @@ -89,10 +97,11 @@ Adding a new fixture Adding a fixture is cheaper than blessing a baseline change because no existing test is asserting against the new bytes yet: -1. Add the entry to ``fixtures:`` in ``manifest.yaml``. Re-use the +1. Add the entry to ``fixtures:`` in + ``xrspatial/geotiff/tests/golden_corpus/manifest.yaml``. Re-use the ``defaults`` block; only override what is interesting for the new axis you are exercising. The schema is documented at the top of - ``manifest.yaml`` and enforced by ``generate.validate()``. + the manifest file and enforced by ``generate.validate()``. 2. Run ``python -m xrspatial.geotiff.tests.golden_corpus.generate --only ``. Confirm the file lands under ``fixtures/`` and @@ -111,7 +120,9 @@ Fast / slow split Each fixture's ``tags:`` list controls which lane it runs in. A fixture is fast iff ``"fast"`` is in its tags; everything else picks up ``pytest.mark.slow`` via the helper in -``xrspatial/geotiff/tests/golden_corpus/_marks.py``. +``xrspatial/geotiff/tests/golden_corpus/_marks.py``. See the corpus +``README.md`` next to that helper for a shorter version of this same +table aimed at someone editing the manifest directly. * ``pytest``: runs every cell. * ``pytest -m "not slow"``: PR fast lane.