Skip to content

Reject non-positive tile_size when cog=True regardless of tiled#2316

Merged
brendancol merged 2 commits into
mainfrom
issue-2311
May 22, 2026
Merged

Reject non-positive tile_size when cog=True regardless of tiled#2316
brendancol merged 2 commits into
mainfrom
issue-2311

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2311

Summary

  • Validate tile_size at the to_geotiff boundary whenever cog=True, regardless of tiled, so non-positive values raise ValueError instead of falling through to the auto-overview loop.
  • Validate tile_size at the top of the inner _write when cog=True, before any encode work runs. This converts the prior ZeroDivisionError (tile_size=0, tiled=True) and the infinite loop (tile_size<=0, tiled=False) into one typed ValueError for direct callers of the array-level writer.
  • Tighten the auto-overview halving loop in _writer.py to require oh, ow > 0, so a future caller that bypasses both validators still cannot drive an infinite loop.

Backend coverage

Writer-side change. The eager CPU writer dispatch covers numpy and dask+numpy callers. The GPU writer already runs _validate_tile_size_arg unconditionally at entry, so the hang path never reached the COG overview loop there.

Test plan

  • pytest xrspatial/geotiff/tests/test_cog_tile_size_hang_2311.py (7 passed)
  • pytest xrspatial/geotiff/tests/test_cog_invalid_input_errors_2286.py xrspatial/geotiff/tests/test_cog.py xrspatial/geotiff/tests/test_size_param_validation_gpu_vrt_1776.py (81 passed, no regressions)
  • pytest xrspatial/geotiff/tests/ -k "not http and not gpu and not cupy" (4028 passed, no regressions)

The new regression test uses a SIGALRM-based watchdog (POSIX) so a future regression that reintroduces the hang fails the test instead of locking up the run.

`to_geotiff(..., cog=True, tiled=False, tile_size=-1)` previously hung the
writer. tile_size validation only ran when `tiled=True`, but the COG path
consumed tile_size for overview auto-generation regardless of the strip-
vs-tiled flag. With a negative tile_size the auto-overview loop in
`_writer.py:490` had `oh > tile_size and ow > tile_size` permanently true
once oh, ow halved to 0 while the inner guard suppressed appends, so the
loop spun forever.

- Validate tile_size at the to_geotiff boundary when tiled OR cog is
  true, so a non-positive value raises ValueError up front.
- Validate tile_size at the top of `_write` when cog=True, before any
  encode work runs. This converts the prior ZeroDivisionError
  (tile_size=0, tiled=True) and the infinite loop (tile_size<=0,
  tiled=False) into one typed ValueError for direct callers.
- Tighten the auto-overview loop to require oh, ow > 0.

Regression test exercises cog=True with tile_size in {-1, 0} for both
tiled flags, with a SIGALRM watchdog so any future regression fails the
test instead of hanging the run.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 22, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Reject non-positive tile_size when cog=True regardless of tiled

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  1. xrspatial/geotiff/_writer.py:522-526 -- the inner-loop defense check does not isinstance(tile_size, (int, np.integer)) or tile_size <= 0 but does not explicitly reject bool. The top-of-_write gate at line 407-414 does reject bool (or isinstance(tile_size, bool)). The two defense layers are inconsistent. Either (a) drop the redundant inner check now that the top gate covers it, or (b) add the bool reject to the inner check for parity. (b) is cheap and matches the public validator wording.

  2. xrspatial/geotiff/tests/test_cog_tile_size_hang_2311.py -- no test exercises the auto-overview-loop guard with tiled=False directly. test_writer_auto_overview_loop_rejects_non_positive_tile_size only covers tiled=True, which actually trips the top-of-_write validator at line 407 before reaching the loop. To make the inner-loop guard reachable, monkeypatch the top gate to a no-op and then assert the loop guard at line 522 fires. Without that, the inner-loop guard is uncovered. Lower priority because the top gate is the load-bearing one for users, but worth at least one test row so a future refactor that removes the top gate is caught.

Nits (optional improvements)

  1. xrspatial/geotiff/_writer.py:407-414 -- the comment block is long (~10 lines) for a single isinstance/<=0 check. Consider folding the explanation into a one-line # defense in depth: #2311 and let the body comment carry the detail. Pure style preference; existing geotiff code mixes both conventions.

  2. xrspatial/geotiff/tests/test_cog_tile_size_hang_2311.py:88 -- import warnings inside the function body works but is non-idiomatic for this repo (the rest of the file imports at module scope). Move to module scope.

  3. xrspatial/geotiff/tests/test_cog_tile_size_hang_2311.py -- no test covers tile_size=None, tile_size=128.0, or tile_size=True. The top-of-_write gate rejects all three; a one-row parametrization would close that gap.

What looks good

  • Two-layer fix: public-boundary rejection in to_geotiff plus inner-writer rejection in _write, with comments cross-referencing the issue.
  • SIGALRM watchdog on every test so a regression that reintroduces the hang fails the test instead of locking up CI.
  • Existing GPU writer path already runs _validate_tile_size_arg unconditionally; no GPU-side change needed.
  • Sanity test confirms cog=False, tiled=False, tile_size=-1 is still accepted (matches the "ignored" warning shape) so the fix does not overcorrect into breaking existing strip-layout callers.
  • Test file message wording asserts both tile_size and positive substrings, which matches the existing _validate_tile_size convention from #1776/#2301.

Checklist

  • Algorithm matches reference/paper (N/A -- validation-only)
  • All implemented backends produce consistent results (GPU path was already validated)
  • NaN handling is correct (N/A)
  • Edge cases are covered by tests (tile_size in {-1, 0}; minor gap on tile_size=None/float/bool)
  • Dask chunk boundaries handled correctly (N/A)
  • No premature materialization or unnecessary copies (N/A)
  • Benchmark exists or is not needed (not needed -- validation)
  • README feature matrix updated (N/A -- no new public function)
  • Docstrings present and accurate (no signature change)

Suggestion 1 (eager.py:407 vs _writer.py:522 inconsistency): align the
inner-loop defense check at _writer.py:522 with the top-of-_write gate
by adding the bool reject. Both layers now use the same isinstance +
bool + <=0 trinity, matching the _validate_tile_size convention.

Suggestion 2 (inner-guard coverage): add a constants-inspection test
that pins the inner-overview guard's error message literal so a future
refactor that removes the loop-side defense fails the test even when
the top gate at line 407 still catches the bad input at runtime.

Nit 2 (import scope): move `import warnings` to module scope in the
test file to match the rest of the imports.

Nit 3 (non-int tile_size coverage): add parametrized test rows for
tile_size in {None, 128.0, True, False}, all of which the public
boundary rejects with a typed ValueError that names tile_size.

Nit 1 (long defense-in-depth comment) dismissed: the verbose block
matches the geotiff module's prevailing comment style (#2301, #2216,
#1987 PR 3) and removing it would diverge from the surrounding code.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (post-fixes)

Disposition of first-pass findings

  • Suggestion 1 (bool parity at _writer.py:522): fixed in commit a1767fa. Both defense layers now use the same isinstance + bool + <=0 trinity.
  • Suggestion 2 (inner-loop guard coverage): fixed via test_inner_overview_loop_guard_message_is_pinned. The constants-inspection approach pins the literal without requiring an invasive monkeypatch of the top gate. Reaching the inner guard through a real _write call requires either patching the top gate's inline isinstance/bool/<=0 check (which has no helper to monkeypatch) or duplicating most of _write in the test. The constants pin catches the case the second-line defense exists to catch (someone deletes it), without that complexity.
  • Nit 1 (long comment block): dismissed. The verbose comment matches the prevailing geotiff style (#2301, #2216, #1987 PR 3 all use ~10-line context comments on validation gates). Trimming would diverge from the surrounding code.
  • Nit 2 (import scope): fixed. import warnings is now at module scope.
  • Nit 3 (non-int tile_size rows): fixed. Added parametrized rows for tile_size in {None, 128.0, True, False}, all passing.

Second-pass findings

None. The follow-up commit is focused and the test count is now 12 passing in 0.09s.

@brendancol brendancol merged commit 35a0616 into main May 22, 2026
7 checks passed
@brendancol brendancol deleted the issue-2311 branch May 27, 2026 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_geotiff(cog=True, tile_size<=0) hangs when tiled=False

1 participant