
Release-gate: overview/sidecar metadata survival (PR 3 of 5 of epic #2341) #2363

Merged: brendancol merged 3 commits into main from issue-2359 on May 24, 2026

@brendancol
Contributor

Closes #2359.

Summary

  • Adds xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py with 12 tests covering the metadata-survival contract on overview reads.
  • Two fixture shapes are built in-test: an internal-overview COG via to_geotiff(cog=True, overview_levels=[2, 4], nodata=...) and a tiled base + external .ovr sidecar via rasterio with TIFF_USE_OVR=YES.
  • Each test runs through both eager (open_geotiff) and dask (read_geotiff_dask) read paths. Per-level attrs must agree on crs, crs_wkt, georef_status, raster_type, nodata, and masked_nodata, and transform must scale the pixel size by the level factor while holding the origin fixed.
  • A cross-source parity test asserts the internal-COG and external-sidecar reads return matching per-level metadata.
  • A row in docs/source/reference/release_gate_geotiff.rst under "Sidecar and overview interactions" points at the new file.
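The per-level contract and the cross-source parity check listed above can be sketched in plain Python. This is a minimal illustration, not the test file's code: the helper names (check_level, check_parity, scaled_transform) are hypothetical, attrs are modelled as plain dicts with made-up values, and transform is assumed to be an affine-ordered 6-tuple (a, b, c, d, e, f) with the origin in c and f. Only the key list comes from the PR.

```python
import math

# Contract keys listed in the PR; everything else here is illustrative.
EQUAL_KEYS = ("crs", "crs_wkt", "georef_status", "raster_type", "nodata", "masked_nodata")


def _same(x, y):
    """Equality that treats a NaN nodata as equal to a NaN nodata."""
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    return x == y


def _check_keys(x, y, label):
    for key in EQUAL_KEYS:
        # A key absent on one side must be absent on the other as well.
        assert (key in x) == (key in y), f"{label}: presence of {key!r} differs"
        if key in x:
            assert _same(x[key], y[key]), f"{label}: {key!r} differs"


def _close(t1, t2):
    return len(t1) == len(t2) and all(math.isclose(u, v) for u, v in zip(t1, t2))


def scaled_transform(transform, factor):
    """Assumed order (a, b, c, d, e, f): scale pixel terms, keep origin (c, f)."""
    a, b, c, d, e, f = transform
    return (a * factor, b * factor, c, d * factor, e * factor, f)


def check_level(base_attrs, level_attrs, factor):
    """Hypothetical helper: one overview level keeps the base metadata."""
    _check_keys(base_attrs, level_attrs, f"factor {factor}")
    expected = scaled_transform(base_attrs["transform"], factor)
    assert _close(level_attrs["transform"], expected), f"factor {factor}: transform not scaled"


def check_parity(levels_a, levels_b):
    """Hypothetical helper: two sources (e.g. COG vs .ovr) agree level by level."""
    assert len(levels_a) == len(levels_b), "sources expose different level counts"
    for i, (x, y) in enumerate(zip(levels_a, levels_b)):
        _check_keys(x, y, f"level {i}")
        assert _close(x["transform"], y["transform"]), f"level {i}: transform differs"


# Illustrative base attrs; raster_type and masked_nodata are absent on purpose.
base = {
    "crs": "EPSG:4326",
    "crs_wkt": "<wkt>",
    "georef_status": "<status>",
    "nodata": float("nan"),
    "transform": (0.5, 0.0, 100.0, 0.0, -0.5, 50.0),
}
factors = (2, 4)
levels = [base] + [dict(base, transform=scaled_transform(base["transform"], k)) for k in factors]
for k, attrs in zip(factors, levels[1:]):
    check_level(base, attrs, k)
check_parity(levels, [dict(attrs) for attrs in levels])
```

Scaling a, b, d and e by the factor is the same as right-multiplying the base affine by a uniform scale, which is why the origin terms c and f stay fixed at every level.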

Backend coverage

Read paths: numpy eager, dask+numpy. CPU-only because the metadata contract lives on the attrs dict assembled by the reader, which is dtype- and device-agnostic; GPU pixel reads inherit the same attrs, which the existing GPU sidecar tests (test_gpu_sidecar_georef_parity_2324.py) already cover.

Test plan

  • pytest xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py -v -- 12 passed locally
  • CI green on the PR

Epic #2341 flagged "overview reads lose CRS/transform/nodata metadata"
as a release-blocker risk. The existing overview tests cover pixel
correctness and specific nodata behaviour but do not pin the full
metadata contract across overview levels, and they do not cover parity
between internal COG overviews and external .ovr sidecars.

The new test file constructs both fixture shapes in-test and asserts,
through eager and dask reads, that crs, crs_wkt, georef_status,
raster_type, nodata, and masked_nodata agree across base + every
overview level, and that transform scales pixel size by the level
factor while keeping the origin fixed.

A row in docs/source/reference/release_gate_geotiff.rst under "Sidecar
and overview interactions" cites the new file.

Closes #2359
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 24, 2026
Contributor Author

@brendancol left a comment

PR Review: Release-gate: overview/sidecar metadata survival (PR 3 of 5 of epic #2341)

Test-only PR. 12 parametrized tests, in-test fixture construction, eager + dask read paths, two source shapes (internal COG, external .ovr sidecar) plus a cross-source parity test. All 12 tests pass locally.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:110-122 -- _write_internal_overview_cog does not assert the overview IFDs were actually written. A regression where to_geotiff(overview_levels=[2, 4], cog=True) silently writes a no-overview COG would only surface through the downstream shape-mismatch test. A one-line rasterio.open(path).overviews(1) == [2, 4] post-write check would localise the failure to the writer instead of the reader.

  • xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:61 -- _OVERVIEW_FACTORS = [2, 4] is a module-level mutable list. None of the tests mutate it, but using a tuple (2, 4) removes the foot-gun for future edits and matches the immutable shape of the other constants (_BASE_TRANSFORM is already a tuple).

Nits (optional improvements)

  • xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:45 -- dask_array = pytest.importorskip("dask.array") binds a name that is never used. The importorskip gate works whether the result is bound or not; calling pytest.importorskip("dask.array") on its own removes the unused-name lint signal.

  • xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:148 -- from rasterio.enums import Resampling lives inside _write_external_sidecar and runs on every sidecar build. Top-of-file would be marginally faster and consistent with the rasterio = pytest.importorskip(...) line at module scope. Cost is one extra import at module-load.

What looks good

  • In-test fixture construction for both COG and sidecar shapes, matching the issue spec's "build one in-test if not already present" guidance.
  • Inlined assertion helpers in this file only; no shared cross-PR helper module per the epic constraint.
  • Unique tmp paths use uuid.uuid4().hex plus the issue number and a test-specific label.
  • Both eager and dask read paths are parametrized; the cross-source parity test catches drift between the two pyramid paths.
  • The contract is articulated explicitly in _EQUAL_KEYS; the docs row in release_gate_geotiff.rst matches the assertion set.

Checklist

  • Algorithm matches reference/paper (N/A -- test-only)
  • All implemented backends produce consistent results (eager and dask read paths covered)
  • NaN handling is correct (the constructed fixture writes one NaN cell, exercising the masked_nodata lifecycle)
  • Edge cases are covered by tests (presence/absence symmetry on raster_type; absent-on-base behaviour handled)
  • Dask chunk boundaries handled correctly (chunks=8 straddles overview levels)
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed (release-gate test; no perf-sensitive surface)
  • README feature matrix updated (if applicable) -- N/A
  • Docstrings present and accurate (module docstring + per-helper docstrings)

- Add post-write sanity that to_geotiff actually emitted the requested
  overview IFDs so a writer regression localises near the writer call
  rather than as a downstream shape mismatch.
- Switch _OVERVIEW_FACTORS from a list to a tuple to match the other
  module-level constants and remove a foot-gun for future edits.
- Drop the unused dask_array binding from pytest.importorskip; the
  gate still fires whether the result is bound or not.
- Hoist `from rasterio.enums import Resampling` to module scope so it
  is not re-imported on every sidecar build.
Contributor Author

@brendancol left a comment

Follow-up review (post fixes for #2363)

All four findings from the previous review are addressed in 0b1afab:

  • Suggestion 1 (writer sanity check): fixed. _write_internal_overview_cog now reopens the file with rasterio.open and asserts ds.overviews(1) == [2, 4] post-write.
  • Suggestion 2 (_OVERVIEW_FACTORS to tuple): fixed. The call sites that need a list (to_geotiff and ds.build_overviews) wrap it with list(_OVERVIEW_FACTORS).
  • Nit 1 (unused dask_array binding): fixed. The line is now pytest.importorskip("dask.array") with no LHS.
  • Nit 2 (in-function Resampling import): fixed. Hoisted to module scope with a # noqa: E402 marker, next to the other post-importorskip imports.

12 tests still pass locally:

xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py ........ 12 passed in 0.28s

No new findings on this pass.

@brendancol
Contributor Author

CI status

The fast-lane workflow's run (macos-latest, 3.14) job failed with 4 VRT test failures, which then cancelled the sibling ubuntu-latest, 3.14 and windows-latest, 3.14 jobs under the workflow's fail-fast policy:

  • xrspatial/geotiff/tests/test_unsupported_features_2349.py::test_vrt_with_skewed_geotransform_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_unsupported_resample_alg_raises
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_srcrect_size_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_dstrect_size_rejected

All four failures pre-exist on main at the merge base (SHA 9c40df4); the same four appear in main-branch CI run 26363918141. The two files changed by this PR (test_release_gate_overview_sidecar_metadata_2341.py and one row in release_gate_geotiff.rst) do not touch VRT code or VRT test files.
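The triage argument above reduces to set arithmetic over failing test IDs. A sketch, assuming the PR and main-branch failure sets are exactly the four IDs reported in this comment and that the changed files are the two named in the PR description; the helper names (triage, failing_files_touched) are hypothetical.

```python
# Failing test IDs reported for the PR's macos-latest, 3.14 job.
PR_FAILURES = {
    "xrspatial/geotiff/tests/test_unsupported_features_2349.py::test_vrt_with_skewed_geotransform_rejected",
    "xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_unsupported_resample_alg_raises",
    "xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_srcrect_size_rejected",
    "xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_dstrect_size_rejected",
}
# The comment reports the same four failing on main (run 26363918141).
BASE_FAILURES = set(PR_FAILURES)

# Files changed by this PR, per the PR description.
CHANGED_FILES = {
    "xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py",
    "docs/source/reference/release_gate_geotiff.rst",
}


def triage(pr_failures, base_failures):
    """Split PR failures into (pre-existing on the merge base, introduced by the PR)."""
    return pr_failures & base_failures, pr_failures - base_failures


def failing_files_touched(failures, changed_files):
    """Test files that both fail and were modified by the PR."""
    return {test_id.split("::", 1)[0] for test_id in failures} & changed_files


pre_existing, introduced = triage(PR_FAILURES, BASE_FAILURES)
print(f"{len(pre_existing)} pre-existing, {len(introduced)} introduced")  # 4 pre-existing, 0 introduced
```

An empty "introduced" set combined with no overlap between failing test files and changed files is the condition under which treating the failures as out of scope is justified.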

The other workflows on this PR pass:

  • run (3.12) (rio-cogeo validator workflow) -- pass
  • run (ubuntu-latest, 3.14) (golden_corpus subset workflow) -- pass
  • label -- pass

The 12 tests added by this PR pass locally (eager + dask, COG + sidecar, cross-source parity). Treating the VRT failures as out of scope for this PR -- they are a separate regression on main that another PR will need to fix.

@brendancol merged commit 43cd541 into main on May 24, 2026
7 checks passed

Labels

performance PR touches performance-sensitive code

Development

Issue closed by this pull request:

Release-gate: overview / sidecar metadata survival (PR 3 of 5 of epic #2341)

1 participant