Skip to content

GeoTIFF: VRT backend parity with .ovr sidecar interactions (#2321 sub-task 4)#2335

Merged
brendancol merged 3 commits into
mainfrom
issue-2330
May 23, 2026
Merged

GeoTIFF: VRT backend parity with .ovr sidecar interactions (#2321 sub-task 4)#2335
brendancol merged 3 commits into
mainfrom
issue-2330

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Adds xrspatial/geotiff/tests/test_vrt_backend_parity_2321.py, a small parity matrix that asserts pixel and metadata equality between eager and dask VRT reads, plus sidecar .ovr vs inline-overview equivalence.
  • Mirrors the structure of test_backend_parity_matrix.py: declarative fixture and backend specs, a shared _assert_metadata_parity helper, one parametrised cell per (fixture, backend) pair.
  • Windowed-coord shift parity is split out into its own test so a coord or transform drift on the dask path surfaces a single named failure instead of a matrix-wide flag.

Backend coverage

Tests-only. Backends exercised by the new cells:

  • numpy (eager)
  • dask+numpy

GPU + VRT parity already has dedicated coverage in test_vrt_lazy_chunks_1814.py and is out of scope here.

Test plan

  • pytest xrspatial/geotiff/tests/test_vrt_backend_parity_2321.py -v: 12 cells pass locally.
  • Sanity check: test_assert_metadata_parity_flags_transform_drift confirms the harness itself fails on a transform-only drift.
  • Sidecar fixture pair (overview_external_ovr_uint16.tif + .tif.ovr) and the inline-overview comparison fixture both already live under golden_corpus/fixtures/. No new on-disk fixtures added.

Closes #2330. Sub-task 4 of #2321.

Sub-task 4 of #2321. Locks eager vs dask parity on the VRT read surface
that is most likely to drift between backends: metadata (transform, crs,
crs_wkt, georef_status), windowed-coord shifts, and the .tif.ovr sidecar
lookup vs an equivalent inline-overview source.

The matrix mirrors test_backend_parity_matrix.py with a small declarative
fixture/backend layout, a shared materialise + parity helper, and one
parametrised cell per (fixture, backend) pair. Twelve cells in total.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 23, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: VRT backend parity with .ovr sidecar interactions (#2321 sub-task 4)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • test_vrt_backend_parity_2321.py:317-328 -- the sidecar-uint16-window cell only checks eager-vs-dask parity for its windowed transform; it does not pin absolute values. The float32 windowed test (test_windowed_vrt_shifts_coords_and_transform_consistently) does pin absolute coords and the pixel-size half of the transform tuple, so a regression where BOTH backends shift coords the same wrong way would slip past the sidecar cell. Either reuse that absolute pin (the bundled sidecar fixture has known pixel size 0.001 and origin -120.0, 45.0), or document explicitly that the sidecar cell is parity-only.

Nits (optional improvements)

  • test_vrt_backend_parity_2321.py:355-366 -- the vrt_fixture resolver globs *.vrt and returns the first match, then re-reads it via open_geotiff purely to recover the dtype. The dtype is already determined by the builder; storing it next to the path on first build (or hard-coding per _FixtureSpec) avoids a stray open_geotiff round-trip on every cache hit.
  • test_vrt_backend_parity_2321.py:431 -- test_sidecar_vrt_attrs_match_inline builds both VRTs fresh on every cell (no session cache) because it uses tmp_path directly. Two parametrised cells (eager + dask) and the build is cheap, so this is fine, but a session-scoped fixture would shave a few writes. Pure perf nit.
  • test_vrt_backend_parity_2321.py:299-329 -- the four _FIXTURES entries collapse onto two builders via builder.__name__. The collapse logic lives in the vrt_fixture resolver and is implicit; a one-line comment on _FIXTURES saying "fix_id is unique per (builder, window); the resolver caches per builder" would save a reader a hop.

What looks good

  • The harness mirrors test_backend_parity_matrix.py structure faithfully: same _materialise / _assert_pixels_equal shape, same dataclass-driven matrix, same labelled assertion messages. A future move to a shared parity harness is mechanical, as the docstring claims.
  • test_assert_metadata_parity_flags_transform_drift is the right kind of sanity check: it locks the helper's own behaviour so a regression that quietly drops one of the metadata assertions cannot let the matrix pass with empty checks.
  • The "windowed-cell straddles the tile seam" comment on two-tile-float32-window-spans-seam is the kind of intent-explaining note that catches the next reviewer up immediately.
  • Re-using the bundled overview_external_ovr_uint16.tif / .tif.ovr fixture (no new on-disk fixtures added) keeps the corpus small and matches the brief's "only if no existing case exercises the path."

Checklist

  • Algorithm matches reference -- tests-only, no algorithm changes.
  • All implemented backends produce consistent results -- exactly what the new cells assert.
  • NaN handling is correct -- _assert_pixels_equal uses NaN-aware comparison for floats.
  • Edge cases covered by tests -- window straddles tile seam; sidecar with and without window; cross-fixture pyramid comparison.
  • Dask chunk boundaries handled correctly -- chunks=(16, 16) on a 16x32 mosaic yields a 1x2 grid; window cells force the graph to read both backing sources.
  • No premature materialization or unnecessary copies -- _materialise is the only .compute() site, called once at assertion time.
  • [n/a] Benchmark exists or is not needed -- tests-only.
  • [n/a] README feature matrix updated -- tests-only.
  • Docstrings present and accurate -- module-level docstring explicitly names the parent issue and the acceptance bar.

- Cache (path, dtype) in the vrt_fixture resolver instead of re-opening
  the VRT on every cache hit just to recover the dtype.
- Add a dedicated test that pins the absolute coord/transform shift for
  the sidecar windowed cell. The parametrised matrix only checks
  eager-vs-dask equality; the absolute pin catches the regression where
  both backends drift the same way.
- Clarify that fix_id collapses to builder name in the cache via a comment
  on _FIXTURES.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (second pass): after follow-up commits

Status of original findings

  • Suggestion (sidecar window absolute pin): fixed in test_sidecar_window_shifts_to_known_coords. The bundled fixture's pixel size (0.001) and origin (-120.0, 45.0) are pinned, so a regression that drifts BOTH backends the same way now surfaces.
  • Nit (cache dtype lookup): fixed. vrt_fixture now caches (path, dtype) in an in-process dict; cache hits no longer round-trip through open_geotiff.
  • Nit (per-cell sidecar build in test_sidecar_vrt_attrs_match_inline): deferred. Two parametrised cells, builds are cheap, session-scoping adds complexity for no measurable gain.
  • Nit (comment on _FIXTURES collapse): fixed. One-line comment above _FIXTURES says the fix_id is unique per (builder, window) and the resolver caches per builder.

Blockers

None.

Suggestions

None.

Nits

None.

What looks good

  • The new sidecar absolute-shift test follows the same pattern as test_windowed_vrt_shifts_coords_and_transform_consistently, so the two windowed surfaces (float32 mosaic + sidecar) now have symmetric coverage.
  • The cache rework removes a stray open_geotiff call per cache hit, which is the kind of small cleanup that pays off when the matrix grows.

All 13 cells pass locally.

The cache dict introduced in the previous review-fix commit lived inside
the function-scoped ``vrt_fixture`` and was rebuilt every test, so every
cell that shared a builder still triggered a fresh write. On POSIX that
is just wasted I/O. On Windows ``to_geotiff`` renames a ``.tmp`` file
over the existing target, and the previous cell's read may still hold
the target open, producing PermissionError / OSError.

Move the cache to a session-scoped fixture so the build runs once per
builder for the entire pytest session.
@brendancol brendancol merged commit a614349 into main May 23, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeoTIFF: VRT backend parity with .ovr sidecar interactions (#2321 sub-task 4)

1 participant