Route GPU eager sidecar reads through the base buffer for georef (#2324)#2328
Conversation
The GPU eager path in ``xrspatial/geotiff/_backends/gpu.py`` swapped the local ``data, header`` references to the sidecar's buffer before calling ``extract_geo_info_with_overview_inheritance``. For sidecar IFDs that lack their own geokeys (the common GDAL convention), the helper then parsed the inherited base IFD against the sidecar bytes. The CPU eager path and dask metadata path keep ``data, header`` bound to the base file's buffers and pass ``sidecar_origin=georef_origin`` to the helper; the GPU path did not. Mirror that pattern on the GPU path: * Keep ``data, header`` pointing at the base file. * Use ``ifd_data, ifd_header`` for tile / strip reads that need the sidecar buffer when the selected IFD lives in the sidecar. * Build the ``georef_origin`` mapping and pass it to the helper so a sidecar IFD that does carry its own georef payload routes the parse to sidecar bytes (the #2315 contract on the eager GPU path). Add a parity test that loads the same ``.tif`` + ``.tif.ovr`` pair via CPU eager, dask, and GPU and asserts georef attrs match. The GPU half is gated behind the standard cupy + CUDA skip so the test runs cleanly on CI machines without a GPU.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Route GPU eager sidecar reads through the base buffer for georef (#2324)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
- PR body slightly overstates the pre-fix harm. The body says the helper "parsed the inherited base IFD against the sidecar bytes" leading to silently wrong georef. In practice, IFD entries are pre-resolved at
parse_all_ifdstime (eachIFDEntry.valueis already extracted against the IFD's source buffer), andextract_geo_info->_parse_geokeysdoes not use thedataargument to look up offsets. It readsifd.get_value(TAG_GEO_DOUBLE_PARAMS)andifd.entries[...].value. So the pre-fix code returned correct georef in today's tree, even though the buffer pointer was wrong. The fix is still right -- it aligns the GPU path with the CPU and dask paths and protects against any future change that does usedatafor offset resolution -- but the PR body could acknowledge that the bug is latent today rather than user-visible, so a reader does not go looking for a regression test that exposes the divergence. The parity tests confirm agreement across backends, not pre-fix corruption.
Nits (optional improvements)
-
xrspatial/geotiff/_backends/gpu.py:1282: the chunked GPU GDS qualification probe also callsextract_geo_info_with_overview_inheritance(ifd, ifds, raw, header.byte_order, ...)withoutsidecar_origin. The chunked path does not attach sidecar IFDs to itsifdslist (parses the base file only) and falls through toread_geotiff_daskfor non-qualifying files, so the omission is not exercised today. Addingsidecar_origin=Noneexplicitly, or wiring the same mapping for completeness, would make the four call sites of this helper match shape. Out of scope for the current PR per issue #2324's GPU-eager framing, but a candidate follow-up if you want full alignment. -
xrspatial/geotiff/tests/test_gpu_sidecar_georef_parity_2324.py:79:_attrs_subsetbuildstuple(da.attrs.get("transform", ())). If a backend ever stores the transform as a numpy array,tuple(np.array)yields numpy scalars and the equality check may still pass but with surprising types.tuple(float(x) for x in da.attrs.get("transform", ()))normalizes that. Minor.
What looks good
- The
data, header->ifd_data, ifd_headerrename is consistent across all four downstream call sites (lines 689, 753, 774, 795). Comments updated to match the rename, and the new variable names make the sidecar-vs-base distinction obvious to a future reader. georef_originconstruction matches the CPU and dask paths byte-for-byte. Theif sidecar_origin else Noneshort-circuit avoids allocating an empty dict in the common no-sidecar case.- Tests share the
_write_pairfixture fromtest_sidecar_own_geokeys_2315.pyrather than duplicating the IFD-rewriting plumbing, with a docstring note explaining why. Keeps the two suites coupled to the same fixture format. - GPU assertions gated behind
@_gpu_only(cupy + CUDA), so the test file runs cleanly on CI machines without a GPU.
Checklist
- Algorithm matches reference: pattern mirrors
_reader.py:244and__init__.py:304. - All implemented backends produce consistent results: confirmed by the new parity test trio.
- NaN handling: not applicable, metadata-only change.
- Edge cases covered: with-geokeys, without-geokeys, and the cross-backend baseline.
- Dask chunk boundaries: not applicable.
- No premature materialization or unnecessary copies: no new allocations beyond the
georef_originmapping (skipped when there is no sidecar). - Benchmark: not applicable.
- README feature matrix: not applicable, no new public function.
- Docstrings present and accurate: helper docstrings on the test file and inline comments explain the bug, the fix, and the alignment with CPU and dask paths.
* Pass ``sidecar_origin=None`` explicitly at the chunked GPU GDS qualification probe call site so all four call sites of ``extract_geo_info_with_overview_inheritance`` in this file share the same call shape. No semantic change -- the probe parses base-file IFDs only and non-qualifying files fall through to ``read_geotiff_dask`` which already handles sidecars. * Coerce transform tuple elements to plain ``float`` in ``_attrs_subset`` so cross-backend equality is on native Python floats rather than a mix of Python floats and numpy scalars.
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up after nit fixes): Route GPU eager sidecar reads through the base buffer for georef (#2324)
Blockers
None.
Suggestions
None remaining.
Nits
None remaining. The previous review's nits are addressed:
xrspatial/geotiff/_backends/gpu.py:1282:sidecar_origin=Noneis now passed explicitly at the chunked GPU GDS qualification probe call site, with a comment that explains why the probe parses base IFDs only. All four call sites ofextract_geo_info_with_overview_inheritancein this file now share the same call shape.xrspatial/geotiff/tests/test_gpu_sidecar_georef_parity_2324.py:79:_attrs_subsetcoerces transform elements tofloatso the cross-backend equality check uses native Python floats.- PR body now flags that the bug is latent today rather than user-visible, so readers do not look for a regression test that exposes pre-fix corruption.
What looks good
- Follow-up commit is tightly scoped (2 files, +18 / -2). No drive-by refactors.
- All 37 sidecar-related tests pass locally after the follow-up.
Checklist
- All findings from the previous review either addressed or noted as out of scope with reason.
- No new findings introduced by the follow-up commit.
- PR body now matches the actual scope of the change.
Closes #2324
Summary
The GPU eager
read_geotiff_gpupath overwrote its localdata, headerreferences with the sidecar's buffer before callingextract_geo_info_with_overview_inheritance. The CPU eager path (_reader.py) and the dask metadata path (__init__.py) keepdata, headerbound to the base file and passsidecar_origin=georef_originto the helper. The GPU path did not.In today's tree the divergence is latent rather than user-visible: IFD entries are pre-resolved at
parse_all_ifdstime andextract_geo_info->_parse_geokeysreads fromifd.entries[...].valuerather than re-resolving offsets againstdata, so the pre-fix code happens to return the correct georef. The fix aligns the GPU path with the CPU and dask paths and protects against any future change that does start usingdatafor offset resolution. The intent of the variable names also becomes clear:dataandheaderare always the base file's;ifd_dataandifd_headerare the swapped buffer used for tile and strip reads when the selected IFD lives in the sidecar.This PR mirrors the CPU/dask pattern on the GPU eager path:
data, headerpointing at the base file's buffers.ifd_data, ifd_headerlocals for tile / strip reads that need the sidecar buffer when the selected IFD lives in the sidecar.georef_originmapping and pass it to the helper so a sidecar IFD that does carry its own georef payload routes the parse to sidecar bytes (the Sidecar georeferencing extraction ignores sidecar's own geokeys #2315 contract on the GPU eager path).sidecar_origin=Noneexplicitly at the chunked GPU GDS qualification probe site so all four call sites of the helper share the same call shape.Backend coverage
read_geotiff_daskfor sidecar files, which already routes through the CPU sidecar handling.Test plan
test_gpu_sidecar_georef_parity_2324.pyloads the same.tif+.tif.ovrpair via CPU eager, dask, and GPU and asserts georef attrs (transform, crs) match. The tests confirm cross-backend agreement, not pre-fix corruption (see Summary above).cupy + CUDA requiredskip via the project's existing_gpu_available()helper. Verified locally on a CUDA-enabled box (cupy.cuda.is_available() == True).-k "sidecar or gpu_window or gpu_nodata or remote_sidecar").