Refactor GeoTIFF Phase 4: complete metadata finalization centralization #2232
Merged
Closes #2227 (part of #2211).
* Add optional pixels_present kwarg to _finalize_lazy_read_attrs so the eager VRT path can stamp nodata_pixels_present through the shared finalization helper instead of writing it after the call. The default keeps the dask lazy contract from issue #2135.
* Drop stale _populate_attrs_from_geo_info, _set_nodata_attrs, and _validate_read_geo_info imports from _backends/gpu.py. The GPU paths now route every attrs write through _finalize_eager_read or _finalize_lazy_read_attrs and never call those lower-level helpers directly.
* Add test_attrs_finalization_parity_2211.py, a table-driven attrs parity test comparing eager numpy, dask+numpy, GPU (when available), dask+GPU (when available), and VRT for four fixture shapes. Backend-specific keys are carved out in one named frozenset.
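A minimal sketch of the kwarg contract described in the first bullet above, not the actual helper. The function name `finalize_lazy_read_attrs_sketch`, its parameter layout, and the `nodata` attr key are assumptions made for illustration; only the `pixels_present` kwarg and the `nodata_pixels_present` attr name come from this PR.

```python
from typing import Any, Dict, Optional


def finalize_lazy_read_attrs_sketch(
    attrs: Dict[str, Any],
    *,
    nodata: Optional[float] = None,
    pixels_present: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for a shared lazy finalization helper."""
    out = dict(attrs)  # copy so the caller's dict is never mutated
    if nodata is None:
        # No sentinel declared: a stray pixels_present forward is ignored.
        return out
    out["nodata"] = nodata  # assumed attr key, for illustration only
    if pixels_present is not None:
        # An eager caller (e.g. a VRT read) has already scanned the pixels.
        out["nodata_pixels_present"] = bool(pixels_present)
    # Lazy callers leave pixels_present=None, so no scan is forced here.
    return out


lazy_attrs = finalize_lazy_read_attrs_sketch({"crs": 4326}, nodata=-9999.0)
vrt_attrs = finalize_lazy_read_attrs_sketch(
    {"crs": 4326}, nodata=-9999.0, pixels_present=True
)
```

With the `None` default, a lazy caller gets attrs without `nodata_pixels_present`, while an eager caller that already knows the answer can pass it through the same entry point.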
brendancol (Contributor, Author) left a review comment on May 21, 2026
PR Review: GeoTIFF Phase 4 attrs finalization centralization
Blockers
None.
Suggestions
- `xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py:19`: the docstring still says "MinIsWhite photometric", but the fixture was renamed to `uint8_no_nodata`. Either restore MinIsWhite coverage or update the description to match the actual fixture.
- `xrspatial/geotiff/_attrs.py:1356`: the new `pixels_present` kwarg has no direct unit test in `test_finalization_helpers_2162.py`. The lazy "pixels_present absent by default" assertions stay green, but a positive test (pass `pixels_present=True/False`, check the attr lands) would prevent a future regression where the kwarg silently stops being threaded through to `_set_nodata_attrs`. The parity test covers it transitively via VRT, which is OK but indirect.
Nits
- `test_attrs_finalization_parity_2211.py:122`: `_attrs_close` declares `rtol` as a parameter but only consults it inside the `transform` branch. Either drop the parameter or apply it to every numeric key for consistency. The docstring says "all other keys must match exactly", so dropping it is the cleaner option.
- `test_attrs_finalization_parity_2211.py:67-70`: the 3-D dims branch (`['y', 'x', 'band']`) in `_coord_array` is dead code since none of the fixtures pass 3-D arrays. Either drop the branch or add a multi-band fixture.
- `test_attrs_finalization_parity_2211.py:259`: `_open_vrt` calls `open_geotiff(path)` to seed the VRT XML. A silent drift in `open_geotiff` would propagate into the VRT helper and only surface as a VRT-vs-dask divergence rather than a clean failure. Reading the values from the source DataArray that `_write_*` returns (or accepting them as parameters) would isolate the VRT path from the eager path.
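To make the first nit's "drop the parameter" option concrete, here is a rough sketch of a comparison that is tolerant only on `transform` and exact everywhere else. The function name `attrs_close_sketch` and both tolerance values are assumptions; the real `_attrs_close` may be shaped differently.

```python
import math
from typing import Any, Mapping

# Assumed tolerance values, chosen only for this illustration.
_TRANSFORM_RTOL = 1e-9
_TRANSFORM_ATOL = 1e-12


def attrs_close_sketch(a: Mapping[str, Any], b: Mapping[str, Any]) -> bool:
    """Compare two attrs mappings: tolerant on 'transform', exact elsewhere."""
    if set(a) != set(b):
        return False  # key sets must agree before values are compared
    for key, left in a.items():
        right = b[key]
        if key == "transform":
            # Affine coefficients can differ by float round-off across backends.
            if len(left) != len(right):
                return False
            if not all(
                math.isclose(x, y, rel_tol=_TRANSFORM_RTOL, abs_tol=_TRANSFORM_ATOL)
                for x, y in zip(left, right)
            ):
                return False
        elif left != right:
            return False  # every other key must match exactly
    return True
```

Keeping the tolerance out of the signature means a caller cannot accidentally loosen the comparison for keys that the docstring promises are exact.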
What looks good
- The diff is exactly what #2227 called for: one new kwarg with a backward-compatible default, one inline write removed, three stale imports cleaned up.
- The lazy contract from #2135 is preserved: `pixels_present` defaults to `None` and all existing `'nodata_pixels_present' not in out.attrs` assertions in `test_lazy_finalization_parity_2162.py` still pass.
- The carveout for backend-specific keys lives in one named `frozenset` with a self-test against the docstring, so a future maintainer adding a new key has a single place to update.
- The full geotiff suite stays green (the one failure in `test_lowlevel_write_pushdown_2138.py[lz4]` is the experimental-codec gate and predates this PR).
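The carve-out pattern praised above might look roughly like the following. This is a sketch under assumptions: the constant and function names are hypothetical, the documentation string stands in for the module docstring, and only two of the backend-specific keys (`vrt_holes`, `nodata_pixels_present`) are listed.

```python
from typing import Any, Mapping, Set

# Stand-in for the module docstring that documents the carve-out.
CARVE_OUT_DOC = """
Backend-specific keys excluded from the parity comparison:
    vrt_holes
    nodata_pixels_present
"""

# Single place to update when a key is promoted to every backend.
BACKEND_SPECIFIC_KEYS = frozenset({"vrt_holes", "nodata_pixels_present"})


def undocumented_keys(doc: str) -> Set[str]:
    """Self-test helper: carve-out keys that the documentation fails to mention."""
    return {key for key in BACKEND_SPECIFIC_KEYS if key not in doc}


def shared_attrs(attrs: Mapping[str, Any]) -> dict:
    """Drop the backend-specific keys before comparing."""
    return {k: v for k, v in attrs.items() if k not in BACKEND_SPECIFIC_KEYS}


def assert_attrs_parity(reference: Mapping[str, Any], other: Mapping[str, Any]) -> None:
    """Check key sets in both directions, then values."""
    ref, oth = shared_attrs(reference), shared_attrs(other)
    missing = set(ref) - set(oth)
    extra = set(oth) - set(ref)
    assert not missing and not extra, f"missing={missing}, extra={extra}"
    for key, value in ref.items():
        assert oth[key] == value, f"{key}: {value!r} != {oth[key]!r}"
```

Checking `missing` and `extra` separately catches a backend that adds a key as well as one that drops a key, which a one-directional loop over the reference would miss.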
Checklist
* Add four unit tests in test_finalization_helpers_2162.py covering the new pixels_present kwarg on _finalize_lazy_read_attrs: forwarded True / False values land on attrs, the None default still omits the attr (dask contract from #2135), and a stray forward is ignored when nodata is None.
* test_attrs_finalization_parity_2211.py: decouple the VRT helper from open_geotiff by reading dtype + nodata from a _FixtureMeta the writer returns, and the on-disk geometry from module-level constants shared with _coord_array. The VRT wrapper now generates the XML from those constants and shifts origin by half a pixel to match the TIFF AREA_OR_POINT=Area convention.
* Drop the unused rtol parameter from _attrs_close and move the tolerance into module-level constants.
* Drop the dead 3-D branch from _coord_array and assert 2-D input.
* Update module docstring to match the renamed uint8_no_nodata fixture (the previous MinIsWhite text was stale).
brendancol (Contributor, Author) left a review comment on May 21, 2026
PR Review (round 2): GeoTIFF Phase 4 attrs finalization centralization
Re-review after commit e30ab49.
Blockers
None.
Suggestions
None.
Nits
None blocking. The previous round's nits are all addressed.
Disposition of round-1 findings
- Suggestion 1 (docstring drift): fixed. Module docstring now matches the `uint8_no_nodata` fixture name.
- Suggestion 2 (missing direct unit test for `pixels_present` kwarg): fixed. Four new tests in `test_finalization_helpers_2162.py` cover the True / False / None / "nodata=None ignores stray forward" cases.
- Nit 1 (`_attrs_close` rtol parameter): fixed. Tolerance is now `_TRANSFORM_RTOL` / `_TRANSFORM_ATOL` module constants and only applies to the `transform` key, matching the docstring.
- Nit 2 (dead 3-D branch): fixed. `_coord_array` now asserts 2-D input.
- Nit 3 (`_open_vrt` calling `open_geotiff`): fixed. The VRT helper takes a `_FixtureMeta` from the writer and reads geometry from module-level constants. The same constants seed `_coord_array`, so the VRT path and the TIFF path now share one source of truth without going through `open_geotiff`.
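The Nit 2 and Nit 3 fixes can be pictured with a sketch like the one below. It uses plain nested lists instead of arrays so it runs without dependencies. Every name ending in `_sketch`, the geometry values, and the fields on the metadata class are assumptions, not the contents of the real test module.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical geometry values shared by the coordinate builder and the VRT side.
_FIX_HEIGHT = 4
_FIX_WIDTH = 6
_FIX_RES = 1.0
_FIX_X_CENTER0 = 0.5  # x of the first pixel center
_FIX_Y_CENTER0 = 3.5  # y of the first pixel center (north-up grid)


@dataclass(frozen=True)
class FixtureMetaSketch:
    """What the writer hands to the VRT helper instead of a read-back."""
    dtype: str
    nodata: Optional[float]


def coord_axes_sketch(arr: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Pixel-center (ys, xs) for a 2-D fixture; fails loudly on other shapes."""
    assert not isinstance(arr[0][0], (list, tuple)), "expected 2-D input"
    assert (len(arr), len(arr[0])) == (_FIX_HEIGHT, _FIX_WIDTH), "shape drifted"
    ys = [_FIX_Y_CENTER0 - i * _FIX_RES for i in range(_FIX_HEIGHT)]
    xs = [_FIX_X_CENTER0 + j * _FIX_RES for j in range(_FIX_WIDTH)]
    return ys, xs


def write_fixture_sketch(arr, dtype: str, nodata: Optional[float]) -> FixtureMetaSketch:
    """Stand-in writer: validates the shape and returns the metadata."""
    coord_axes_sketch(arr)  # a real writer would also write the TIFF here
    return FixtureMetaSketch(dtype=dtype, nodata=nodata)


def vrt_inputs_sketch(meta: FixtureMetaSketch) -> dict:
    """Everything a VRT builder needs, with no call back into the reader."""
    return {
        "width": _FIX_WIDTH,
        "height": _FIX_HEIGHT,
        "res": _FIX_RES,
        "dtype": meta.dtype,
        "nodata": meta.nodata,
    }
```

Because the VRT inputs never pass through a reader, a regression in the eager read path shows up as a failure in that path's own comparison rather than as a confusing VRT mismatch.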
What looks good
- The fixture-geometry constants are paired with an in-function assert (`_FIX_HEIGHT` / `_FIX_WIDTH` vs `arr.shape`), so a future fixture that uses a different shape fails loudly rather than silently producing a mis-sized VRT.
- The half-pixel shift in `_open_vrt` (corner-based GDAL convention vs center-based xarray coords) is explicit and explained in a comment, so a future maintainer can tell the offset is intentional rather than a copy-paste bug.
- The new `pixels_present=True` / `False` / `None` / `nodata=None` test matrix pins all four states the kwarg can land in, so a future regression where the kwarg stops being threaded through will fail loudly.
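For readers unfamiliar with the half-pixel shift mentioned above, a small worked sketch follows. It assumes a regular, north-up grid and hypothetical function names. The six-coefficient ordering follows the GDAL geotransform convention, in which the origin is the outer corner of the top-left pixel.

```python
from typing import Sequence, Tuple


def corner_geotransform_sketch(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, float, float, float, float, float]:
    """GDAL-style geotransform from pixel-center coordinates (regular grid)."""
    res_x = xs[1] - xs[0]
    res_y = ys[1] - ys[0]  # negative when y decreases down the rows
    # Move from the center of the first pixel to its outer corner.
    origin_x = xs[0] - res_x / 2.0
    origin_y = ys[0] - res_y / 2.0
    return (origin_x, res_x, 0.0, origin_y, 0.0, res_y)


def geotransform_element_sketch(gt: Sequence[float]) -> str:
    """Render the coefficients as a VRT <GeoTransform> element."""
    return "<GeoTransform>" + ", ".join(repr(v) for v in gt) + "</GeoTransform>"


gt = corner_geotransform_sketch([0.5, 1.5, 2.5], [3.5, 2.5, 1.5, 0.5])
# gt == (0.0, 1.0, 0.0, 4.0, 0.0, -1.0)
```

Here the first pixel center sits at (0.5, 3.5) with a 1.0 resolution, so the corner origin lands at (0.0, 4.0). Writing the center coordinates into the VRT unshifted would displace the raster by half a pixel in each axis.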
Checklist
- Refactor matches issue #2227 scope
- No public API change; existing contract tests still pass
- Backend keysets compared in both directions (value + key set)
- Lazy/dask contract from #2135 preserved
- Direct unit coverage for the new `pixels_present` kwarg
- VRT helper isolated from `open_geotiff`
- Stale imports removed
Ready to merge once CI is green.
Closes #2227
Part of #2211
Summary
Finishes the migration started by PRs #2200, #2205, #2207, #2209 from epic #2162. After this PR, every read backend assembles its attrs through one of two finalization entry points in `xrspatial/geotiff/_attrs.py`:
- `_finalize_eager_read` for the eager numpy + GPU paths.
- `_finalize_lazy_read_attrs` for the dask, dask+GPU, and VRT paths.
Changes
- `_attrs.py`: add an optional `pixels_present` kwarg to `_finalize_lazy_read_attrs`. The default stays `None` so the dask backends keep the lazy contract from #2135, "geotiff: split overloaded masked_nodata into separate nodata lifecycle signals" (a strict per-chunk reduction would force `.compute()`).
- `_backends/vrt.py`: the eager VRT path now threads its VRT-aware `nodata_pixels_present` scan through the helper's new kwarg instead of writing the attr after the call.
- `_backends/gpu.py`: drop stale imports of `_populate_attrs_from_geo_info`, `_set_nodata_attrs`, and `_validate_read_geo_info`. The GPU paths route every attrs write through the two top-level helpers.
- `test_attrs_finalization_parity_2211.py` is a table-driven parity check across eager numpy, dask+numpy, GPU (when available), dask+GPU (when available), and VRT. Four fixture shapes (plain float, float with sentinel, integer with sentinel, plain uint8). Backend-specific keys (`vrt_holes`, `nodata_pixels_present`, TIFF tag pass-throughs) are carved out in a single named frozenset so a future migration that promotes one of those keys to all backends has one place to update.
Backend coverage
numpy, cupy, dask+numpy, dask+cupy, VRT. GPU branches are gated on `cupy + CUDA` availability so the test still runs on CPU-only hosts.
Test plan
- `pytest xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py` passes (9/9)
- `pytest xrspatial/geotiff/tests/test_attrs_parity_1548.py test_attrs_contract_*.py test_finalization_helpers_2162.py test_eager_finalization_parity_2162.py test_lazy_finalization_parity_2162.py test_vrt_finalization_parity_2162.py` all green (129 tests)
- `test_lowlevel_write_pushdown_2138.py[lz4]` (experimental codec gate, untouched by this PR)
Coordination
PR-B (#2225) is also touching `_attrs.py` (georef_status). PR-C (#2226) is touching `_reader.py`/backends for nodata. PR-F (#2229) is adding a parity test. I kept the `_attrs.py` edit here to one small added kwarg so a rebase against any of those branches stays cheap.
Not in scope
`test_attrs_contract_*.py` and `test_attrs_parity_1548.py` were left untouched and still pass.
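As an illustration of the GPU gating described under Backend coverage, one possible availability probe is sketched here. The function names and the backend labels are assumptions; the actual test module may gate its GPU branches differently.

```python
import importlib.util
from typing import List


def gpu_available_sketch() -> bool:
    """True only when cupy imports and reports at least one CUDA device."""
    if importlib.util.find_spec("cupy") is None:
        return False  # cupy is not installed on this host
    try:
        import cupy

        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Broken install, missing driver, or no visible device.
        return False


def backends_to_test_sketch() -> List[str]:
    """CPU backends always run; GPU backends are added only when usable."""
    backends = ["numpy", "dask+numpy", "vrt"]
    if gpu_available_sketch():
        backends += ["cupy", "dask+cupy"]
    return backends
```

In a pytest suite the same probe could feed a `skipif` marker, so CPU-only hosts report the GPU cases as skipped rather than failed.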