diff --git a/docs/source/contributing/golden_corpus_baselines.rst b/docs/source/contributing/golden_corpus_baselines.rst new file mode 100644 index 000000000..32222662c --- /dev/null +++ b/docs/source/contributing/golden_corpus_baselines.rst @@ -0,0 +1,146 @@ +.. _golden_corpus_baselines: + +******************************** +Updating golden-corpus baselines +******************************** + +The geotiff golden corpus (issue +`#1930 `_) +pins every backend to a rasterio reference read for every fixture. +The corpus lives at +``xrspatial/geotiff/tests/golden_corpus/`` and is generated +deterministically from ``manifest.yaml``. The committed ``.tif`` files +under ``fixtures/`` are byte-stable across regenerations; this is what +makes "pixel parity" mean something. + +This page is for the case where a code change legitimately changes the +pixels a backend produces. The corpus is intentionally rigid -- any +unintended change shows up as a test failure -- so blessing a change is +an explicit step. + +.. note:: + + This page describes the system as of the full Phase 3 (backend + wiring) and Phase 4.1 (fast / slow split) PR set. File and symbol + references (``_marks.py``, ``_PARITY_GAPS``, the per-backend test + modules) assume those PRs have merged. If one is still in flight, + check the matching open PR before chasing a missing file. + +When a corpus test fails +======================== + +Failure modes you might see: + +1. ``compare_to_oracle`` raises ``AssertionError: pixel arrays differ``. + Either the reader changed how it decodes the fixture, or the fixture + bytes changed. Compare the fixture's ``md5sum`` against the value + from main; if it shifted, the fixture moved. + +2. ``compare_to_oracle`` raises ``AssertionError: dtype mismatch`` or + ``transform mismatch``. The fixture is intact but the reader changed + how it surfaces an attr. Check whether the attr is in the oracle's + canonical-attr set (today: ``crs`` / ``transform`` / ``nodata`` / + ``dtype``); if so, the canonical-attrs contract is what changed. + +3. An ``xfail(strict=True)`` test flips to ``XPASS``. This is the + "regression-of-regressions" surface: a parity gap that was previously + documented in ``_PARITY_GAPS`` is now passing the oracle. Remove the + entry from the ``_PARITY_GAPS`` dict and the matching docstring + bullet, and merge. + +Blessing a pixel-level change +============================= + +If a code change makes a backend produce different pixels and the new +output is correct, the corpus baseline needs updating. The workflow: + +1. **Confirm the change is intentional.** Read the failing test output + and the diff. If you cannot articulate why the new bytes are correct + in one sentence, the change is probably a bug, not a baseline + refresh. + +2. **Regenerate any affected fixtures.** From the repository root: + + .. code-block:: bash + + python -m xrspatial.geotiff.tests.golden_corpus.generate + + Or, for one fixture by id (one-line invocation; portable across + POSIX shells and ``cmd.exe``): + + .. code-block:: bash + + python -m xrspatial.geotiff.tests.golden_corpus.generate --only + + The generator is deterministic. If two runs produce different bytes + for the same fixture, that is the bug -- file an issue rather than + committing the unstable output. + +3. **Inspect the changed bytes.** ``git diff --stat`` on the + ``fixtures/`` tree shows which files moved. ``rasterio info `` + on the old vs new fixture is a low-effort sanity check that the + change is what you expected (e.g. tile layout, compression). + +4. **Run the full corpus matrix.** All backend modules under + ``xrspatial/geotiff/tests/test_golden_corpus_*_1930.py`` should pass + against the new baseline. ``pytest`` runs them all; ``pytest -m "not + slow"`` runs the PR fast lane only. + +5. **Commit the manifest change and the regenerated bytes together.** + The PR description must answer: what changed, why is the new value + correct, and what code change made it correct. A baseline refresh + PR without a one-paragraph justification will be sent back. + +Adding a new fixture +==================== + +Adding a fixture is cheaper than blessing a baseline change because no +existing test is asserting against the new bytes yet: + +1. Add the entry to ``fixtures:`` in + ``xrspatial/geotiff/tests/golden_corpus/manifest.yaml``. Re-use the + ``defaults`` block; only override what is interesting for the new + axis you are exercising. The schema is documented at the top of + the manifest file and enforced by ``generate.validate()``. + +2. Run ``python -m xrspatial.geotiff.tests.golden_corpus.generate + --only ``. Confirm the file lands under ``fixtures/`` and + stays under the 12 KB per-fixture budget (the smoke tests enforce + this). + +3. If the new fixture exercises a backend gap that is not in any of the + per-module ``_PARITY_GAPS`` tables, expect the parametrised test to + fail. Either fix the gap or add the fixture id to ``_PARITY_GAPS`` + with a reason and a follow-up trail. Do not just merge a failing + test. + +Fast / slow split +================= + +Each fixture's ``tags:`` list controls which lane it runs in. A fixture +is fast iff ``"fast"`` is in its tags; everything else picks up +``pytest.mark.slow`` via the helper in +``xrspatial/geotiff/tests/golden_corpus/_marks.py``. See the corpus +``README.md`` next to that helper for a shorter version of this same +table aimed at someone editing the manifest directly. + +* ``pytest``: runs every cell. +* ``pytest -m "not slow"``: PR fast lane. +* ``pytest -m slow``: only the slow cells; useful for a nightly job. + +When adding a fixture that decodes in well under a second, include +``fast`` in its tags. Reserve the slow lane for fixtures that are +genuinely heavy (large COGs, jpeg2000, multi-source VRTs). + +Reference points +================ + +* `Issue #1930 `_ + -- the tracking issue, including the phase plan. +* ``xrspatial/geotiff/tests/golden_corpus/_oracle.py`` -- the oracle + contract. ``compare_to_oracle(path, candidate, lossy=False)`` is the + single function every backend test calls; it asserts pixels (bit- + exact or shape-only when ``lossy``), dtype, transform, CRS, and + nodata. +* ``xrspatial/geotiff/tests/golden_corpus/manifest.yaml`` -- the + schema-documented list of fixtures. Edit here, regenerate, commit. diff --git a/docs/source/contributing/index.rst b/docs/source/contributing/index.rst new file mode 100644 index 000000000..235f9e43f --- /dev/null +++ b/docs/source/contributing/index.rst @@ -0,0 +1,15 @@ +.. _contributing: + +************ +Contributing +************ + +Developer-facing docs. These pages cover workflows for editing +xrspatial itself: regenerating test fixtures, blessing +pixel-level baseline changes, and so on. End-user docs are under +:ref:`user_guide` and :ref:`reference`. + +.. toctree:: + :maxdepth: 1 + + golden_corpus_baselines diff --git a/docs/source/index.rst b/docs/source/index.rst index 0b95a5812..cc0241c33 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -106,3 +106,4 @@ xarray-spatial does not depend on GDAL / GEOS, which makes it fully extensible i getting_started/index user_guide/index reference/index + contributing/index