geotiff: add Hypothesis property tests for round-trip metadata #2141

Merged
brendancol merged 4 commits into xarray-contrib:main from brendancol:issue-2134 on May 20, 2026
@brendancol
Contributor

Closes #2134.

Summary

  • Add xrspatial/geotiff/tests/test_roundtrip_properties.py. A composite Hypothesis strategy over eight metadata axes (coord dtype, axis direction, shape, CRS/transform presence, nodata mode, band layout, pixel dtype, CRS EPSG) feeds two test cases that assert a fixed-point invariant across two write -> read cycles.
  • One case runs on the numpy backend with the registered rockout_default profile (200 examples). A second case wraps the input in dask chunks and runs on the registered ci profile (50 examples) to exercise the streaming write path. GPU paths and byte equality stay out of scope per the issue.
  • Register hypothesis in the tests extras in setup.cfg. The new module guards its imports with pytest.importorskip("hypothesis") so the test still skips cleanly in environments that don't install the extras.
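
The fixed-point invariant described above can be illustrated without the GeoTIFF writer. The following is a minimal sketch, assuming "fixed point" means that the result of the first write -> read cycle is reproduced exactly by a second cycle. `toy_write`, `toy_read`, and the JSON file format are hypothetical stand-ins, not the xrspatial API.

```python
import json
import os
import tempfile


def toy_write(path, attrs):
    # Hypothetical writer: it normalises on the way out by dropping None
    # values; JSON itself turns tuples into lists.
    payload = {key: value for key, value in attrs.items() if value is not None}
    with open(path, "w") as fh:
        json.dump(payload, fh, sort_keys=True)


def toy_read(path):
    # Hypothetical reader: returns whatever the writer stored.
    with open(path) as fh:
        return json.load(fh)


def round_trip(attrs, directory, name):
    path = os.path.join(directory, name)
    toy_write(path, attrs)
    return toy_read(path)


def assert_fixed_point(attrs):
    # The first cycle may normalise the input; the second cycle must not
    # change anything further.
    with tempfile.TemporaryDirectory() as tmp:
        first = round_trip(attrs, tmp, "first.json")
        second = round_trip(first, tmp, "second.json")
    assert first == second, (first, second)
    return first, second


original = {"crs": 4326, "transform": (1.0, 0.0, 0.0, 0.0, -1.0, 0.0), "nodata": None}
first, second = assert_fixed_point(original)
```

Comparing `first` with `second` rather than with `original` lets the writer canonicalise metadata once while still catching drift on later cycles.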

Notes

Test plan

  • pytest xrspatial/geotiff/tests/test_roundtrip_properties.py::test_round_trip_fixed_point_numpy -- 200 passing examples
  • pytest xrspatial/geotiff/tests/test_roundtrip_properties.py::test_round_trip_fixed_point_dask -- 50 passing examples
  • pytest xrspatial/geotiff/tests/test_fuzz_hypothesis_1661.py still green

…y-contrib#2134)

Adds xrspatial/geotiff/tests/test_roundtrip_properties.py with a
composite Hypothesis strategy over eight metadata axes (coord dtype,
axis direction, shape, CRS/transform presence, nodata mode, band
layout, pixel dtype, CRS EPSG) and asserts a fixed-point invariant
across two write -> read cycles on the numpy and dask+numpy
backends.

Registers ``hypothesis`` in the ``tests`` extras in setup.cfg, so the
new module's ``importorskip`` only takes effect in environments that
don't install the extras.
@github-actions bot added the performance label (PR touches performance-sensitive code) on May 19, 2026
Contributor Author

@brendancol left a comment

PR Review: geotiff: add Hypothesis property tests for round-trip metadata

Blockers

None.

Suggestions

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:388-396 and :454-458: the except (ValueError, TypeError): assume(False) block silently swallows ~8.5% of executions (17/200 on the numpy property per Hypothesis stats). If a previously-supported combo starts raising ValueError, the test will just skip the draw instead of failing -- a real regression becomes invisible. Two ways out: (a) narrow the strategy upfront so the writer-reject rate is near zero, or (b) record which combos get skipped via hypothesis.event(...) so a regression that bumps the skip rate from 8% to 30% shows up in CI stats. The minimal version of (b) is one line: hypothesis.event(f"writer_rejected_{spec['nodata_mode']}_{spec['pixel_dtype']}") before the assume(False).

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:84-105: the profiles are registered at import time but no pytest hook calls settings.load_profile('ci') in CI. The @settings(parent=settings.get_profile('ci')) decorator does inherit those settings, so the dask test does run at 50 examples. But the name 'ci' looks like it's meant to be the active profile in CI runs, which it isn't. Rename to 'reduced' or 'fast', or add a note in the docstring spelling out that 'ci' here is just an inheritance source.

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:390: the bound e in except (ValueError, TypeError) as e: is unused. Drop the as e, or log it -- hypothesis.event(f"writer_rejected: {type(e).__name__}") would solve the previous suggestion too.
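
Option (b) from the first suggestion, combined with the third, might look like the sketch below. `toy_writer` and its rejection rule are hypothetical stand-ins for the GeoTIFF writer, and the `rejected` list exists only so the example can be inspected; `event` and `assume` are the real Hypothesis functions.

```python
from hypothesis import assume, event, given, settings, strategies as st

rejected = []  # illustration only: the draws the toy writer refused


def toy_writer(value):
    # Hypothetical stand-in for the writer: refuses multiples of 10.
    if value % 10 == 0:
        raise ValueError(f"unsupported value: {value}")
    return value


@settings(max_examples=50, deadline=None, database=None)
@given(st.integers(min_value=0, max_value=1000))
def check_writer(value):
    try:
        written = toy_writer(value)
    except (ValueError, TypeError) as exc:
        # Tag the skip so its frequency appears in Hypothesis statistics.
        event(f"writer_rejected:{type(exc).__name__}")
        rejected.append(value)
        assume(False)  # discard this draw instead of failing the test
    assert written == value


check_writer()
```

Under pytest, running with `--hypothesis-show-statistics` prints each event label with the share of examples that triggered it, which is where a jump in the reject rate would become visible.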

Nits

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:101: the profile name rockout_default reads as project-specific jargon. The issue's acceptance criterion says "default and ci". 'local' would do, or just drop the registration and apply @settings(max_examples=200, deadline=None, ...) inline since only one test uses it.

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:237 and :238: crs_epsg and n_bands are always drawn but only consumed for some draws (half and one third respectively). Strategy slots wasted, not a bug.

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:452: 200 + 50 = 250 tmpdirs per session, each with two small .tif files. pytest cleans them up at teardown, but a single tmpdir plus os.unlink after each round would keep the file count to two at any moment.

  • xrspatial/geotiff/tests/test_roundtrip_properties.py:175-179: docstring says "Returns None for the none case, or a Python scalar otherwise"; the nan branch returns float('nan'), which is a scalar but worth flagging explicitly.
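
The tmpdir nit can be sketched as follows: one shared directory, with both files removed in a `finally` block after every example. `run_example` and the byte payloads are hypothetical; only the cleanup pattern is the point.

```python
import os
import tempfile


def run_example(directory, payload):
    # Write the two per-example files, report how many files exist at the
    # peak, and always remove them again, even if an assertion fails.
    first = os.path.join(directory, "first.tif")
    second = os.path.join(directory, "second.tif")
    try:
        for path in (first, second):
            with open(path, "wb") as fh:
                fh.write(payload)
        return len(os.listdir(directory))
    finally:
        for path in (first, second):
            if os.path.exists(path):
                os.unlink(path)


with tempfile.TemporaryDirectory() as shared_dir:
    peak_counts = [run_example(shared_dir, b"x" * n) for n in range(1, 6)]
    leftover = os.listdir(shared_dir)
```

Each call sees exactly two files at its peak and leaves none behind, so the file count no longer grows with the number of examples.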

What looks good

  • All eight strategy axes from the issue are present.
  • assume(_is_legal_combo(spec)) is inside the composite strategy, so filtered draws don't burn the example budget.
  • _assert_fixed_point compares NaN-as-data via mask-and-equal, and handles the transform tuple with math.isclose.
  • _LOCKED_ATTRS includes _NO_GEOREF_KEY, so the #2120 marker is in the contract.
  • Docstring cross-references #2087, #2092, #2120, and test_backend_parity_matrix.py.
  • setup.cfg picks up the hypothesis dev-dep that the issue flagged as a follow-up.
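
The body of `_assert_fixed_point` is not shown in this thread, so the following is only a sketch of what a mask-and-equal NaN comparison and a `math.isclose` transform check could look like. The function names and tolerances are assumptions.

```python
import math

import numpy as np


def assert_same_pixels(a, b):
    # NaN != NaN, so compare the NaN masks first and then the remaining
    # finite values exactly.
    assert a.shape == b.shape and a.dtype == b.dtype
    if np.issubdtype(a.dtype, np.floating):
        nan_a, nan_b = np.isnan(a), np.isnan(b)
        assert np.array_equal(nan_a, nan_b)
        assert np.array_equal(a[~nan_a], b[~nan_b])
    else:
        assert np.array_equal(a, b)


def assert_same_transform(t1, t2, rel_tol=1e-9, abs_tol=1e-12):
    # Tolerances are assumed values; the affine coefficients are floats
    # that may pick up rounding noise on the way through a file.
    assert len(t1) == len(t2)
    for x, y in zip(t1, t2):
        assert math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)


a = np.array([[1.0, np.nan], [3.0, 4.0]], dtype="float32")
b = a.copy()
t1 = (0.1 + 0.2, 0.0, 10.0, 0.0, -0.3, 20.0)
t2 = (0.3, 0.0, 10.0, 0.0, -0.3, 20.0)

assert_same_pixels(a, b)
assert_same_transform(t1, t2)
```

A plain `np.array_equal(a, b)` would report these identical arrays as different because of the NaN cell, which is why the mask is compared separately.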

Checklist

  • Algorithm matches reference/paper -- N/A, test-only change.
  • All implemented backends produce consistent results -- numpy and dask+numpy covered; GPU out of scope per issue.
  • NaN handling is correct.
  • Edge cases covered by tests -- 1x1, 1xN, Nx1 shapes included.
  • Dask chunk boundaries handled correctly.
  • No premature materialization or unnecessary copies.
  • Benchmark exists or is not needed -- not needed for test-only PR.
  • README feature matrix updated (if applicable) -- N/A.
  • Docstrings present and accurate.

* Rename ``rockout_default`` / ``ci`` profiles to ``local`` / ``reduced``
  and document in the module docstring that both are inheritance
  sources for ``@settings(parent=...)``, not the active CI profile.
* Tag writer-reject ``assume(False)`` skips with
  ``hypothesis.event(...)`` so a regression that bumps the skip rate
  shows up in Hypothesis statistics output.
* Draw ``crs_epsg`` only when a CRS is going to be passed to the
  writer and ``n_bands`` only when the band layout uses a band axis.
* Clarify the ``_pick_nodata`` docstring on the ``nan`` return.
* Unlink the per-example .tif files eagerly so a 200-example session
  doesn't leave 400 files on disk until pytest teardown.
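
As an illustration of the profile change, the sketch below registers two profiles and uses one purely as an inheritance source. The profile names and example counts come from this PR; `deadline=None` and the trivial test body are assumptions.

```python
from hypothesis import given, settings, strategies as st

# Registered at import time. Neither profile is activated with
# settings.load_profile(...); they only serve as parents below.
settings.register_profile("local", max_examples=200, deadline=None)
settings.register_profile("reduced", max_examples=50, deadline=None)

# A settings object built from a parent inherits the parent's values.
inherited = settings(parent=settings.get_profile("reduced"))


@settings(parent=settings.get_profile("reduced"), database=None)
@given(st.integers())
def check_identity(value):
    # Trivial placeholder property.
    assert value == value


check_identity()
```

Because `load_profile` is never called, tests without an explicit `@settings(parent=...)` keep whatever profile is active in the session.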
Contributor Author

@brendancol left a comment

Follow-up review after 1478b36

All seven findings from the first pass are addressed in this push. Disposition:

Suggestions (3 of 3 fixed)

  • Skip-rate visibility -- hypothesis.event(f"writer_rejected:{type(exc).__name__}") now tags each writer-reject assume(False). Hypothesis stats output shows the rate (~4.29% writer_rejected:ValueError on the latest local run). A regression that bumps the rate will surface in CI.
  • 'ci' profile name -- renamed to 'reduced'. Docstring now explains that both profiles are inheritance sources for @settings(parent=...) rather than the active CI profile.
  • Unused e in except -- now bound as exc and consumed in the event(...) call.

Nits (4 of 4 fixed)

  • rockout_default profile name -- renamed to 'local'.
  • crs_epsg / n_bands always drawn -- now conditional inside the composite strategy. crs_epsg is drawn only when georef in ('crs_only', 'both'); n_bands only when band_layout != 'no_band'.
  • Tmpdir churn -- per-example .tif files are now os.unlink-ed in a finally block. The on-disk file count stays at 2 at any moment in both the numpy and dask tests, instead of growing to 400 + 100 by session teardown.
  • _pick_nodata docstring -- now explicitly notes the nan branch returns float('nan').
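
The conditional draws can be sketched with a small composite strategy. The conditions (`georef in ('crs_only', 'both')`, `band_layout != 'no_band'`) are taken from the disposition above; the other option values, the EPSG codes, and the band range are hypothetical.

```python
from hypothesis import given, settings, strategies as st

drawn = []  # illustration only: the specs the strategy produced


@st.composite
def metadata_spec(draw):
    spec = {
        # "none" and "transform_only" are assumed names for the other modes.
        "georef": draw(st.sampled_from(["none", "crs_only", "transform_only", "both"])),
        # "band_first" and "band_last" are assumed names as well.
        "band_layout": draw(st.sampled_from(["no_band", "band_first", "band_last"])),
    }
    # Draw dependent axes only when the writer will actually consume them.
    if spec["georef"] in ("crs_only", "both"):
        spec["crs_epsg"] = draw(st.sampled_from([4326, 3857, 32633]))
    if spec["band_layout"] != "no_band":
        spec["n_bands"] = draw(st.integers(min_value=1, max_value=4))
    return spec


@settings(max_examples=50, deadline=None, database=None)
@given(metadata_spec())
def check_spec(spec):
    drawn.append(spec)
    assert ("crs_epsg" in spec) == (spec["georef"] in ("crs_only", "both"))
    assert ("n_bands" in spec) == (spec["band_layout"] != "no_band")


check_spec()
```

Skipping the unused draws keeps shrunk counterexamples free of values that played no part in the failure.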

Tests pass post-merge with origin/main: 200 + 50 examples, 0 failures.

…b#2134)

``test_round_trip_property`` draws a compression codec from
``LOSSLESS_CODECS``, which included ``lz4`` and ``zstd``. Neither
package is in the ``[tests]`` extras, so whenever Hypothesis
sampled either codec the writer raised ``ImportError`` and the
fuzz run failed. This is a pre-existing flake that the new property
tests on PR 2141 happened to surface across all three CI platforms.

Drop ``lz4`` / ``zstd`` from the strategy when their packages are
missing, using the existing ``LZ4_AVAILABLE`` / ``ZSTD_AVAILABLE``
flags from ``_compression``. CI runners get a smaller codec set;
local runs with the packages installed still exercise both.
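
A minimal sketch of the codec filtering, assuming the availability flags boil down to an import check. The always-available codec names, the `zstandard` module name, and `lossless_codecs` itself are hypothetical; the real flags live in ``_compression``.

```python
import importlib.util


def module_available(name):
    # True when the named top-level module can be imported here.
    return importlib.util.find_spec(name) is not None


# Assumed module names; the project may probe different packages.
LZ4_AVAILABLE = module_available("lz4")
ZSTD_AVAILABLE = module_available("zstandard")

# Hypothetical list of codecs that need no optional dependency.
ALWAYS_AVAILABLE = ["none", "deflate", "lzw"]


def lossless_codecs(lz4_available, zstd_available):
    # Build the list a strategy such as st.sampled_from(...) would draw
    # from, leaving out codecs whose package is missing.
    codecs = list(ALWAYS_AVAILABLE)
    if lz4_available:
        codecs.append("lz4")
    if zstd_available:
        codecs.append("zstd")
    return codecs


codecs_here = lossless_codecs(LZ4_AVAILABLE, ZSTD_AVAILABLE)
```

Filtering at strategy construction time means Hypothesis never samples a codec the environment cannot run, so no draws are wasted on skips.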
@brendancol merged commit 6549a48 into xarray-contrib:main on May 20, 2026
5 checks passed
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

geotiff: add Hypothesis property tests for metadata round trips