Skip to content

Tolerate unreadable .ovr sidecar on base read (#2416)#2419

Merged
brendancol merged 2 commits into
mainfrom
issue-2416
May 26, 2026
Merged

Tolerate unreadable .ovr sidecar on base read (#2416)#2419
brendancol merged 2 commits into
mainfrom
issue-2416

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2416.

Summary

  • The release contract puts reader.local_file at the stable tier and reader.sidecar_ovr at advanced. Before this change a stale or malformed sibling .ovr would take the stable base read down on all three eager paths (CPU, GPU, metadata-only helper).
  • Catch sidecar load failures in _read_to_array, _read_geo_info, and the GPU backend; emit a RuntimeWarning and fall through to base-only. CloudSizeLimitError still propagates because the byte budget is a caller-set contract.
  • Requesting a specific external overview level (overview_level >= 1) still surfaces the underlying parse error. Silent fallback only applies when the caller did not ask for the sidecar surface.
  • Pre-existing discover_remote_sidecar (dask metadata path) already followed this rule. The eager paths now match.

Backend coverage

Test plan

  • pytest xrspatial/geotiff/tests/test_sidecar_bad_does_not_break_base_2416.py (12 new tests including 5 parametrized payloads, GPU eager path, CloudSizeLimitError pass-through, explicit-level error surfacing).
  • pytest xrspatial/geotiff/tests/test_sidecar_*.py xrspatial/geotiff/tests/test_remote_sidecar_*.py (70 existing sidecar tests still pass).
  • Full geotiff suite: pytest xrspatial/geotiff/tests/ -> 5731 passed, 68 skipped, 6 xfailed.

The release contract puts reader.local_file at the stable tier and
reader.sidecar_ovr at advanced. Before this change, the eager CPU
path, the eager GPU path, and the metadata-only helper all parsed a
sibling .ovr before IFD selection, so a stale or corrupt sidecar (a
gdaladdo dropping, a partial download, a third-party tool artefact)
turned an advanced-tier fault into a stable-tier base read failure.

Catch sidecar load failures during discovery, emit a RuntimeWarning,
and fall through to base-file-only behaviour. CloudSizeLimitError
still propagates because the byte budget is a caller-set contract.
Requesting a specific external overview level surfaces the underlying
parse error -- silent fallback only applies when the caller did not
ask for the broken surface. Mirrors the contract that
discover_remote_sidecar already uses on the dask metadata path.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The change is tight and the test surface is solid. A few items below; none are blockers.

Blockers

None.

Suggestions

  • xrspatial/geotiff/__init__.py:290-301 and xrspatial/geotiff/_backends/gpu.py:419-435: the _read_geo_info and GPU eager paths catch a bare Exception and do not re-raise CloudSizeLimitError the way _reader._read_to_array does. In practice both paths are local-file-only at this call site, so the budget error cannot fire today, but the asymmetry is easy to miss the next time someone routes a cloud source through here. Either re-raise CloudSizeLimitError for symmetry with _reader.py:244 or add a one-line comment at each site noting "local-file-only here; budget breach cannot fire" so the next reader does not wonder.
  • xrspatial/geotiff/_reader.py:248 (and the two parallel sites): stacklevel=2 points at the immediate caller of _read_to_array, not at the user's open_geotiff / read_geotiff_dask call. The warning location in the user's traceback may end up inside the library rather than at the user's call line. Consider bumping to stacklevel=3 or higher after walking the actual call chain.

Nits

  • The warning text says "Request a specific external overview level to surface the error instead." That is accurate but a bit indirect. A short hint like "Delete the .ovr file or pass overview_level=N >= 1 to surface the parse error" is more actionable. Optional.
  • xrspatial/geotiff/tests/test_sidecar_bad_does_not_break_base_2416.py has a _gpu_available() helper that duplicates the same pattern used in test_sidecar_ovr_2112.py. Not worth a refactor in this PR, but a candidate for a shared conftest.py fixture if anyone consolidates later.

What looks good

  • Three call sites covered: _reader._read_to_array, __init__._read_geo_info, _backends.gpu.read_geotiff_gpu. The contract is consistent across CPU eager, GPU eager, and dask metadata paths.
  • CloudSizeLimitError re-raise on the CPU eager path keeps the caller-set byte budget visible.
  • Tests pin the level-1-still-raises behaviour, the overview_level=0 path, the CloudSizeLimitError pass-through (via monkeypatch on the local-file path), and five corrupt-payload shapes.
  • Module-level import warnings added to _reader.py rather than function-scoped, avoiding the UnboundLocalError trap that bit the GPU file before the fix.
  • Release-contract row updated so the tier promise is documented, not just implemented.

Checklist

  • Algorithm matches reference: contract aligns with _sidecar.discover_remote_sidecar's existing dask-path behaviour.
  • Backends consistent: CPU eager, GPU eager, metadata-only, dask metadata helper all follow the same rule.
  • NaN handling: not applicable (read-path control flow).
  • Edge cases tested: empty file, short file, gzip magic, PNG magic, plain text; plus overview_level=0 and overview_level=1.
  • Dask chunk boundaries: not applicable.
  • No premature materialization.
  • Benchmark: not applicable (no perf-sensitive change).
  • README feature matrix: not applicable (no new function).
  • Docstrings present.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 26, 2026
- Re-raise CloudSizeLimitError on the GPU eager path and the
  metadata-only _read_geo_info helper, mirroring the eager CPU path
  in _reader._read_to_array. The exception cannot fire on a local
  mmap source today, but keeping the symmetry explicit prevents a
  silent regression if a future patch widens either call site to a
  cloud source. Add CloudSizeLimitError to the module-level imports
  in __init__.py and _backends/gpu.py.
- Bump warnings.warn stacklevel from 2 to 3 so the warning location
  resolves at the user's open_geotiff / read_geotiff_dask call site
  rather than inside the library.
- Rewrite the warning text to give an actionable next step:
  "Delete the .ovr file or pass overview_level>=1 to surface the
  parse error." Replaces the indirect previous wording.
- New test pinning the CloudSizeLimitError re-raise on the
  metadata-only path so the symmetry stays covered.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 2ddafa0

All Suggestions and Nits from the prior review applied:

Fixed

  • Suggestion 1: Re-raise CloudSizeLimitError on the GPU eager path (_backends/gpu.py:421-429) and the metadata-only helper (__init__.py:292-300). Both sites now match _reader._read_to_array:244 for symmetry. CloudSizeLimitError added to module-level imports in __init__.py and _backends/gpu.py.
  • Suggestion 2: warnings.warn stacklevel bumped from 2 to 3 at all three call sites so the warning resolves at the user's open_geotiff / read_geotiff_dask call rather than inside the library.
  • Nit 1: Warning text rewritten to give an actionable next step ("Delete the .ovr file or pass overview_level>=1 to surface the parse error.").

Dismissed

  • Nit 2 (shared _gpu_available() fixture): out of scope for this PR. The duplication is in a parallel test file (test_sidecar_ovr_2112.py); consolidating it would touch unrelated tests.

New test

  • test_read_geo_info_cloud_size_limit_error_is_not_silenced pins the re-raise on the metadata-only path so the new symmetry stays covered.

Verification

  • pytest xrspatial/geotiff/tests/: 5732 passed, 68 skipped, 6 xfailed.

@brendancol brendancol merged commit 3398063 into main May 26, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Corrupt .ovr sidecar breaks stable base read in eager paths

1 participant