VRT metadata parity tests across backends (#2333)#2338
Conversation
Locks the cross-backend metadata contract for VRT reads: eager numpy, dask via the chunks= dispatcher, and GPU eager via read_vrt(gpu=True). Each backend's attrs (transform, crs, georef_status, nodata, masked_nodata) and coord arrays must match the numpy baseline for the full-georef, transform-only, and integer-with-nodata cases. Negative tests pin the fail-closed posture for ambiguous VRT input: mixed CRS, mixed per-band nodata, unsupported resampling with a size-changing destination rectangle, malformed SrcRect/DstRect, and missing sources under both 'raise' and 'warn' policies. The mixed-CRS case is currently silently flattened and lands as xfail(strict=True) so it surfaces as XPASS the moment sub-PR 2's validator closes the gap. Part of #2321 (release hardening sub-task 3). Temp file names include '_2321_' per CLAUDE.md.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: VRT metadata parity tests across backends (#2338)
Tests-only PR adding backend parity coverage for VRT reads. The cross-backend assertion shape (key-set diff + value diff + paired pixel test) is solid, and the xfail(strict=True) on the mixed-CRS case is the correct way to pin a known gap. A few findings on tightening the safety net and on de-duplicating against the existing test infrastructure.
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:51-62 duplicates the project's shared GPU detection helper.
xrspatial/geotiff/tests/conftest.py:14-53already exportsgpu_available()andrequires_gpu = pytest.mark.skipif(...). The local_cupy_available/_HAS_GPUblock reaches the same answer, but it runs at import time twice (once here, once in conftest) and risks drifting from the canonical implementation. Replace the local_HAS_GPUwithfrom xrspatial.geotiff.tests.conftest import requires_gpuand applymarks=requires_gpuin the_BACKENDSlist instead of an inlinepytest.mark.skipif(not _HAS_GPU, ...). -
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:739-749 and 825-836:
test_unsupported_resample_alg_raisesandtest_negative_dstrect_size_rejectedassert against the exception type but not the message. A regression that swaps in an unrelatedNotImplementedError(e.g., a code path that hasn't been wired up yet) or an unrelatedValueError(e.g., the source validator firing on a different field) would keep the test green. Add amatch=clause naming the offending field:match=r"Bilinear|1751"for the resample test,match=r"DstRect"or similar for the dstrect test, so the test pins the actual rejection path rather than "something raised." -
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:896-928:
test_missing_sources_warn_records_holesweakens silently if the hole data structure changes. Line 924 readsholes[0].get('source') if isinstance(holes[0], dict) else Noneand line 925 only asserts whenhole_source is not None. If a future refactor changes hole entries from dicts to dataclass instances (or anything else), the path-equality assertion is silently skipped and the test still passes. Make the dict check a hard assertion:assert isinstance(holes[0], dict), f"hole entry type drifted: {type(holes[0])}", thenhole_source = holes[0]['source']and the existing path check. The contract documented in #1734 promises a dict shape, so the test should pin it. -
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:574-612:
_write_mixed_crs_vrt's docstring (lines 510-518) says "the existing reader rejects the conflicting WKT at parse time," but thexfailblock ontest_mixed_crs_vrt_does_not_silently_flattenrecords the opposite: the read succeeds and silently flattens. The two comments contradict each other. The helper docstring should match reality. Either rewrite it to say "today the per-source CRS check does NOT reject this; see the xfail on the consumer test," or drop the misleading sentence. Keeping it makes the next reader doubt which one is current.
Nits (optional improvements)
-
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:75-85:
_UTM33N_WKTis defined and commented "Used for the mixed-CRS negative" but never referenced anywhere in the file. The mixed-CRS fixture usesattrs={'crs': 32633}on the second source and letsto_geotiffresolve the WKT; the constant is dead. Either wire it into the fixture so the test exercises a literal-WKT mixed-CRS case (not just an EPSG-code one), or drop it. -
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:661-667: the dask parametrize for
test_mixed_nodata_vrt_fails_closed_by_defaultuses a barelambda, so the pytest test id renders asdask_chunks_2-<lambda>instead of a readable name. Define_read_dask_chunks_2(p): return open_geotiff(p, chunks=2)at module scope and pass that, or passid=explicitly on thepytest.param, so failures render[dask_chunks_2]cleanly. -
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:39-40:
to_geotiffis imported fromxrspatial.geotiffbutwriteis imported fromxrspatial.geotiff._writer. The two writers aren't interchangeable (public surface vs internal one). Worth a one-line comment near the import block stating why both are needed. The fixture helpers below useto_geotifffor full-coords DataArray sources andwritefor raw numpy arrays (the per-band integer fixtures), which is fine, just make the split visible to the next reader. -
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:611: the
# noqa: B017comment refers to flake8-bugbear's "do not assert blind exception" rule, but the project's lint config may not enable bugbear. IfB017isn't in the active ruleset the noqa pragma is a no-op and can be dropped. Defensive comments naming a non-active rule add noise without protecting anything. -
Module docstring at lines 1-29 lists
gdal_metadata_xmlandextra_tagsin the contract list (lines 8-9), but both keys live in_VRT_OMITTED_ATTR_KEYSand are never asserted on (the VRT path documented at_backends/vrt.pyintentionally omits them). The docstring overstates the file's coverage. Either trim the contract list at the top to match the keys the file actually asserts on, or add an explicit "this file's coverage stops at<list>;gdal_metadata_xml/extra_tagsparity is owned by the non-VRT backend parity suite" note so the next reader doesn't chase a phantom assertion.
What looks good
- Paired attrs/pixels/coords tests (one VRT case, three separate test functions): a regression that fixes one and breaks another surfaces on exactly one assertion. Helps triage.
xfail(strict=True)on the mixed-CRS gap flips to XPASS the moment sub-PR 2's validator closes the gap, which forces the follow-up commit to convert it to a properpytest.raises(VRTUnsupportedError). The reasoning is captured in the xfail message.- The eager-vs-dask
_BACKEND_LIFECYCLE_KEYSfilter fornodata_pixels_presentis documented (lines 114-120) with a citation to issue #2135. Without that note the cross-backend equality would either fail or silently drop a real attr. _to_numpyuses the.data.get()path for cupy buffers per the project's GPU convention rather than.values.- Every temp file name embeds
_2321_per CLAUDE.md so parallel runs do not collide.
Checklist
- No algorithmic code changed (tests-only PR)
- Backend parity exercised across numpy / dask / GPU
- NaN handling pinned via the masked-nodata case
- Edge cases: file does not cover single-pixel rasters, all-NaN VRTs, or empty SrcRect (probably out of scope here; the existing suite covers those individually)
- Dask chunk boundaries handled (chunks=2 on 4x4 grids forces non-trivial chunking)
- No premature materialization (only test paths that need value comparison call
.compute()/.values) - [N/A] Benchmark (tests-only)
- [N/A] README feature matrix (no new public function)
- Docstrings: module docstring slightly overstates coverage (see Nits); per-test docstrings are clear
- Reuse the project-wide requires_gpu marker from xrspatial.geotiff.tests.conftest instead of re-implementing the cupy + CUDA probe locally. - Tighten the unsupported-resample and bad-DstRect negatives with match= clauses on the algorithm name and the field name so an unrelated raise from elsewhere in the VRT stack cannot keep the tests green. - Replace the conditional isinstance check on the vrt_holes entry in the missing_sources='warn' test with a hard assertion + key access, so a future refactor that changes the entry shape surfaces instead of silently weakening the path-equality check. - Fix the misleading docstring on _write_mixed_crs_vrt; today the per-source CRS check does NOT reject the conflict, which is what the xfail on the consumer test documents. - Drop the unreferenced _UTM33N_WKT constant. - Replace the inline lambda in the mixed_nodata parametrize with a named module-level _read_dask_chunks_2 so the pytest id renders readably (no more <lambda> in failure output). - Add a comment in the import block explaining why to_geotiff and write are both pulled in. - Drop the # noqa: B017 pragma; the project lint config does not enable flake8-bugbear, so it was a no-op. - Move gdal_metadata_xml / extra_tags out of the module docstring's "covered" list and into a new "out of scope" section so the doc matches the actual assert surface.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review pass (#2338)
Commit 06268931 addresses every finding from the prior review. Disposition summary:
Suggestions (4): all fixed
- Reuse
requires_gpufromxrspatial/geotiff/tests/conftest.py:51instead of the local_cupy_available/_HAS_GPUblock. The local helper was deleted;_BACKENDSnow usesmarks=requires_gpu. test_unsupported_resample_alg_raisesnow passesmatch=r"Bilinear|1751".test_negative_dstrect_size_rejectednow passesmatch=r"DstRect.*negative size"(verified against the actual error message string).test_missing_sources_warn_records_holes: the conditionalisinstance(...) else Noneshim is gone. The test now hard-assertsisinstance(holes[0], dict)and readsholes[0]['source']directly, so a future entry-type drift surfaces here._write_mixed_crs_vrtdocstring rewritten so it matches the xfail on the consumer test: today the per-source CRS check does NOT reject the conflict.
Nits (5): all applied
_UTM33N_WKTconstant removed.- Inline
lambdain themixed_nodataparametrize replaced by a named module-level_read_dask_chunks_2. Test ids now render[dask_chunks_2-_read_dask_chunks_2]. - Import block carries a comment explaining why
to_geotiffandwriteare both pulled in. # noqa: B017pragma dropped; the project lint config does not enable bugbear so it was a no-op.- Module docstring contract list trimmed to the keys this file actually asserts on (
transform,crs,nodata,masked_nodata,georef_status,raster_type), withcrs_wkt/gdal_metadata_xml/extra_tagsmoved into an explicit "out of scope" section.
Test status
27 passed, 1 xfailed (the mixed-CRS gap that sub-PR 2 must close). 84 passed, 1 xfailed across the broader VRT test suite locally.
Closes #2333. Part of #2321 (release hardening sub-task 3 of 6).
Summary
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.pylocking the cross-backend metadata contract for VRT reads. Each backend's attrs and coord arrays must match the numpy baseline; a regression that fixes pixels but breaks attrs (or vice versa) trips one of the paired assertions, never both silently.pytest.importorskip('cupy').'raise'and'warn').xfail(strict=True)so it flips to XPASS the moment sub-PR 2's validator closes the gap.Backend coverage
open_geotiff(vrt)(dispatcher path)open_geotiff(vrt, chunks=2)(dispatcher path)read_vrt(vrt, gpu=True)(open_geotiffrejects.vrt + gpu=True; the direct entry point owns the gpu kwarg)pytest.importorskip('cupy')plus acupy.cuda.is_available()check guards the GPU cases.Test plan
pytest xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py -v(27 passed, 1 xfailed locally with a working CUDA device)pytestagainst the surrounding VRT regression suite (84 passed, 1 xfailed)Notes
VRTUnsupportedErroris not landed yet. The negative tests carryTODO(#2321)comments and assert against the current error types; the upgrade is mechanical when PR 2 ships._2321_perCLAUDE.md.