Route sidecar georef tags through the sidecar's bytes (#2315)#2320
Conversation
The local eager reader and the metadata-only path both swap pixel bytes over to the sidecar when the selected IFD lives there, but the georef extraction call right below the swap still passed the base file's data + header.byte_order. This worked under the usual GDAL convention -- sidecar IFDs carry no geokeys -- but silently broke when a sidecar declared its own GeoKeyDirectory or model tags. extract_geo_info_with_overview_inheritance now accepts an optional sidecar_origin mapping from id(ifd) to (data, byte_order). When the selected IFD is in the mapping AND declares one of the georef-bearing tags (33550, 33922, 34264, 34735), the helper resolves tag offsets against the sidecar's bytes. Sidecars without geokeys still inherit from the base file's level-0 IFD as before. Both call sites (_reader.py and __init__.py:_read_geo_info) now build and forward the mapping. Includes a synthetic regression test covering both cases: sidecar with its own ModelPixelScale / ModelTiepoint / GeoKeyDirectory, and sidecar with those tags stripped (today's GDAL-convention behavior).
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Route sidecar georef tags through the sidecar's bytes (#2315)
Blockers
- None.
Suggestions
-
xrspatial/geotiff/_cog_http.py:242-- the HTTP / fsspec metadata path callsextract_geo_info_with_overview_inheritance(ifd, ifds, header_bytes, header.byte_order, ...)without the newsidecar_originkwarg. A sidecar IFD over HTTP that declares its own georef payload still parses against the base file's bytes here. A sibling rockout is reportedly handling the HTTP / byte-order side of this finding; if that is the plan, leave a cross-reference comment in the source so future maintainers see the symmetry argument. Otherwise the one-line fix is to thread the sidecar's IFD ids and(sidecar.data, sidecar.header.byte_order)throughdiscover_remote_sidecar's return into a mapping and pass it through.
Nits
-
xrspatial/geotiff/_geotags.pyextract_geo_info_with_overview_inheritancebody -- the inheritance branch (~line 1174) still callsextract_geo_info(base_ifd, data, byte_order, ...). Ifbase_ifdsomehow comes from the sidecar (uncommon: the base file would need to have no full-res IFD of its own), the inherited georef parses against the wrong buffer. The newsidecar_originmap is the natural place to fix it -- the same lookup you do forifdwould work forbase_ifd. Low priority, but a 3-line guard is cheap. -
xrspatial/geotiff/tests/test_sidecar_own_geokeys_2315.py_mark_first_ifd_as_overviewassumes the new NewSubfileType entry (tag 254) sorts before every existing tag in the IFD, so it inserts at position 0. True forto_geotiffoutput today (smallest tag emitted is 256 / ImageWidth). Brittle if a future writer change adds a tag with id <= 254. Worth anassertthat the first existing tag id is > 254, or abisect-style insert.
What looks good
- The fix is conservatively gated on
_ifd_has_georef_payload(ifd). The GDAL-convention case (sidecar with no geokeys) is unchanged. - Both eager and metadata-only call sites are updated symmetrically.
- The regression test exercises both branches: sidecar wins when it has its own geokeys, base inherits when the sidecar has none.
- Signature change is backward compatible (new kwarg defaults to
None).
Checklist
- Algorithm matches reference
- NaN handling: not applicable (metadata path, no array ops)
- Edge cases covered (sidecar with vs without geokeys)
- Dask: the
_read_geo_infopath used by the dask graph builder is exercised - No premature materialization (metadata-only changes)
- Benchmark coverage: not applicable
- README feature matrix: not applicable (bug fix, no new function)
- Docstrings present and updated
* `_geotags.py`: when a base IFD is also tracked in `sidecar_origin` (uncommon: a file with no full-resolution IFD on the base side), resolve its georef against the sidecar's bytes too. Mirrors the lookup applied to the selected IFD. * `test_sidecar_own_geokeys_2315.py`: assert the first existing IFD tag id is > 254 before inserting the NewSubfileType entry at position 0, so a future writer change cannot silently break the fixture's "insert at front" assumption. * `_cog_http.py`: leave a cross-reference comment noting that the `sidecar_origin` routing added in #2315 is not threaded through the HTTP / fsspec metadata path yet; that gap is tracked by the separate HTTP sidecar-byte-order finding.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review
Disposition of the round-1 findings:
- Suggestion (
_cog_http.py:242, HTTP sidecar): deferred to the sibling HTTP / dask sidecar-byte-order rockout that already owns this surface. Added a cross-reference comment in the source so the symmetry argument is visible to future maintainers. - Nit (base_ifd inheritance): fixed in commit 9d8a3e3. The inheritance branch now applies the same
sidecar_originlookup tobase_ifdthat the main branch applies toifd. - Nit (test fixture insert assumption): fixed in commit 9d8a3e3.
_mark_first_ifd_as_overviewnow asserts the first existing IFD tag id is > 254 before inserting NewSubfileType at position 0, so a future writer change that emits a tag <= 254 fails loudly instead of silently producing an out-of-order IFD.
All 34 sidecar-focused tests still pass after the follow-up.
Summary
Closes #2315.
sidecar_originmapping toextract_geo_info_with_overview_inheritance. When a sidecar IFD declares its own GeoKeyDirectory, ModelPixelScale, ModelTiepoint, or ModelTransformation, the helper parses those tags against the sidecar's bytes and byte order instead of the base file's._reader.pyand_read_geo_infoin__init__.py) so the eager and metadata-only paths agree.Backend coverage
Read-side only. The fix applies to every read backend that shares the metadata parse: numpy eager, dask+numpy via
_read_geo_info, and cupy / dask+cupy which call back into the same metadata helper. No GPU or dask-specific code changes.Test plan
test_sidecar_own_geokeys_2315.pycovers both branches:open_geotiff(eager) and_read_geo_info(metadata-only / dask graph builder) exercised.test_sidecar_ovr_2112.pysuite still passes (28 tests).