geotiff: add Hypothesis property tests for round-trip metadata#2141
Conversation
…y-contrib#2134) Adds xrspatial/geotiff/tests/test_roundtrip_properties.py with a composite Hypothesis strategy over eight metadata axes (coord dtype, axis direction, shape, CRS/transform presence, nodata mode, band layout, pixel dtype, CRS EPSG) and asserts a fixed-point invariant across two write -> read cycles on the numpy and dask+numpy backends. Registers ``hypothesis`` in the ``tests`` extras in setup.cfg so the new module's ``importorskip`` is the only gate in environments that do install the extras.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: geotiff: add Hypothesis property tests for round-trip metadata
Blockers
None.
Suggestions
-
xrspatial/geotiff/tests/test_roundtrip_properties.py:388-396and:454-458: theexcept (ValueError, TypeError): assume(False)block silently swallows ~8.5% of executions (17/200 on the numpy property per Hypothesis stats). If a previously-supported combo starts raisingValueError, the test will just skip the draw instead of failing -- a real regression becomes invisible. Two ways out: (a) narrow the strategy upfront so the writer-reject rate is near zero, or (b) record which combos get skipped viahypothesis.event(...)so a regression that bumps the skip rate from 8% to 30% shows up in CI stats. The minimal version of (b) is one line:hypothesis.event(f"writer_rejected_{spec['nodata_mode']}_{spec['pixel_dtype']}")before theassume(False). -
xrspatial/geotiff/tests/test_roundtrip_properties.py:84-105: the profiles are registered at import time but no pytest hook callssettings.load_profile('ci')in CI. The@settings(parent=settings.get_profile('ci'))decorator does inherit those settings, so the dask test does run at 50 examples. But the name'ci'looks like it's meant to be the active profile in CI runs, which it isn't. Rename to'reduced'or'fast', or add a note in the docstring spelling out that'ci'here is just an inheritance source. -
xrspatial/geotiff/tests/test_roundtrip_properties.py:390: the boundeinexcept (ValueError, TypeError) as e:is unused. Drop theas e, or log it --hypothesis.event(f"writer_rejected: {type(e).__name__}")would solve the previous suggestion too.
Nits
-
xrspatial/geotiff/tests/test_roundtrip_properties.py:101: the profile namerockout_defaultreads as project-specific jargon. The issue's acceptance criterion says "default and ci".'local'would do, or just drop the registration and apply@settings(max_examples=200, deadline=None, ...)inline since only one test uses it. -
xrspatial/geotiff/tests/test_roundtrip_properties.py:237and:238:crs_epsgandn_bandsare always drawn but only consumed for some draws (half and one third respectively). Strategy slots wasted, not a bug. -
xrspatial/geotiff/tests/test_roundtrip_properties.py:452: 200 + 50 = 250 tmpdirs per session, each with two small.tiffiles. pytest cleans them up at teardown, but a single tmpdir plusos.unlinkafter each round would keep the file count to two at any moment. -
xrspatial/geotiff/tests/test_roundtrip_properties.py:175-179: docstring says "Returns None for the none case, or a Python scalar otherwise"; thenanbranch returnsfloat('nan'), which is a scalar but worth flagging explicitly.
What looks good
- All eight strategy axes from the issue are present.
assume(_is_legal_combo(spec))is inside the composite strategy, so filtered draws don't burn the example budget._assert_fixed_pointcompares NaN-as-data via mask-and-equal, and handles the transform tuple withmath.isclose._LOCKED_ATTRSincludes_NO_GEOREF_KEY, so the #2120 marker is in the contract.- Docstring cross-references #2087, #2092, #2120, and
test_backend_parity_matrix.py. setup.cfgpicks up thehypothesisdev-dep that the issue flagged as a follow-up.
Checklist
- Algorithm matches reference/paper -- N/A, test-only change.
- All implemented backends produce consistent results -- numpy and dask+numpy covered; GPU out of scope per issue.
- NaN handling is correct.
- Edge cases covered by tests -- 1x1, 1xN, Nx1 shapes included.
- Dask chunk boundaries handled correctly.
- No premature materialization or unnecessary copies.
- Benchmark exists or is not needed -- not needed for test-only PR.
- README feature matrix updated (if applicable) -- N/A.
- Docstrings present and accurate.
* Rename ``rockout_default`` / ``ci`` profiles to ``local`` / ``reduced`` and document in the module docstring that both are inheritance sources for ``@settings(parent=...)``, not the active CI profile. * Tag writer-reject ``assume(False)`` skips with ``hypothesis.event(...)`` so a regression that bumps the skip rate shows up in Hypothesis statistics output. * Draw ``crs_epsg`` only when a CRS is going to be passed to the writer and ``n_bands`` only when the band layout uses a band axis. * Clarify the ``_pick_nodata`` docstring on the ``nan`` return. * Unlink the per-example .tif files eagerly so a 200-example session doesn't leave 400 files on disk until pytest teardown.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review after 1478b36
All seven findings from the first pass are addressed in this push. Disposition:
Suggestions (3 of 3 fixed)
- Skip-rate visibility --
hypothesis.event(f"writer_rejected:{type(exc).__name__}")now tags each writer-rejectassume(False). Hypothesis stats output shows the rate (~4.29% writer_rejected:ValueErroron the latest local run). A regression that bumps the rate will surface in CI. -
'ci'profile name -- renamed to'reduced'. Docstring now explains that both profiles are inheritance sources for@settings(parent=...)rather than the active CI profile. - Unused
einexcept-- now bound asexcand consumed in theevent(...)call.
Nits (4 of 4 fixed)
-
rockout_defaultprofile name -- renamed to'local'. -
crs_epsg/n_bandsalways drawn -- now conditional inside the composite strategy.crs_epsgis drawn only whengeoref in ('crs_only', 'both');n_bandsonly whenband_layout != 'no_band'. - Tmpdir churn -- per-example
.tiffiles are nowos.unlink-ed in afinallyblock. Disk file count stays at 2 (numpy test) or 2 (dask test) at any moment instead of growing to 400 + 100 by session teardown. -
_pick_nodatadocstring -- now explicitly notes thenanbranch returnsfloat('nan').
Tests pass post-merge with origin/main: 200 + 50 examples, 0 failures.
…b#2134) ``test_round_trip_property`` draws a compression codec from ``LOSSLESS_CODECS`` which included ``lz4`` and ``zstd``. Neither package is in the ``[tests]`` extras, so the moment Hypothesis sampled either codec the writer raised ``ImportError`` and the fuzz run failed -- pre-existing flake the new property tests on PR 2141 happened to surface across all three CI platforms. Drop ``lz4`` / ``zstd`` from the strategy when their packages are missing, using the existing ``LZ4_AVAILABLE`` / ``ZSTD_AVAILABLE`` flags from ``_compression``. CI runners get a smaller codec set; local runs with the packages installed still exercise both.
Closes #2134.
Summary
xrspatial/geotiff/tests/test_roundtrip_properties.py. A composite Hypothesis strategy over eight metadata axes (coord dtype, axis direction, shape, CRS/transform presence, nodata mode, band layout, pixel dtype, CRS EPSG) feeds two test cases that assert a fixed-point invariant across two write -> read cycles.rockout_defaultprofile (200 examples). A second case wraps the input in dask chunks and runs on the registeredciprofile (50 examples) to exercise the streaming write path. GPU paths and byte equality stay out of scope per the issue.hypothesisin thetestsextras insetup.cfg. The new module guards its imports withpytest.importorskip("hypothesis")so the test still skips cleanly in environments that don't install the extras.Notes
assume(False)rather than treated as failures. Hypothesis stats show ~30-50% invalid draws on the strategy filter and ~5% from the in-test writer refusal, which is expected for the documented refusal surface.test_backend_parity_matrix.py, the no-georef marker from geotiff: to_geotiff silently strips georef on int64 step-1 user coords #2120, the nodata semantics split from Bug: attrs['masked_nodata'] reports True when masking was disabled #2092, and the int coords sentinel from Bug: integer spatial coords silently strip georef on write #2087, so future readers can trace the contract.Test plan
pytest xrspatial/geotiff/tests/test_roundtrip_properties.py::test_round_trip_fixed_point_numpy-- 200 passing examplespytest xrspatial/geotiff/tests/test_roundtrip_properties.py::test_round_trip_fixed_point_dask-- 50 passing examplespytest xrspatial/geotiff/tests/test_fuzz_hypothesis_1661.pystill green