Wrap eager source lifetime around read_all() (#2322) #2325
Merged
``_read_to_array`` constructed a source (``_FileSource``, ``_BytesIOSource``, or ``_CloudSource``) and immediately called ``src.read_all()`` BEFORE entering the ``try/finally`` block that calls ``src.close()``. If ``read_all()`` raised mid-read, the exception propagated up and ``src.close()`` was never called. Move the ``try`` to start right after ``src`` is constructed and pull ``src.read_all()`` inside the protected region so cleanup runs even when the eager read fails. ``_CloudSource.close()`` is a no-op today, so this is a structural guard rather than a fix for an observable leak. It mirrors the close-on-error contract that ``_read_cog_http`` already enforces (issue #1816), and prevents a future resource-holding source from leaking state on the failure path.
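The difference between the two shapes can be sketched with a stand-in source. `FakeSource`, `read_unguarded`, and `read_guarded` are hypothetical names used only for illustration; they are not the project's code, and `len(data)` is a placeholder for the real decoding work.

```python
class FakeSource:
    """Hypothetical stand-in for _FileSource / _BytesIOSource / _CloudSource."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def read_all(self):
        if self.fail:
            raise OSError("simulated mid-read failure")
        return b"II*\x00"

    def close(self):
        self.closed = True


def read_unguarded(src):
    # Old shape: the eager read runs before the protected region,
    # so a failure here never reaches the finally clause.
    data = src.read_all()
    try:
        return len(data)  # placeholder for parsing/decoding
    finally:
        src.close()


def read_guarded(src):
    # New shape: the try starts right after the source exists,
    # so close() runs whether read_all() succeeds or raises.
    try:
        data = src.read_all()
        return len(data)  # placeholder for parsing/decoding
    finally:
        src.close()
```

With `FakeSource(fail=True)`, `read_unguarded` propagates the `OSError` and leaves `closed` as `False`, while `read_guarded` propagates the same error with `closed` set to `True`.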
brendancol
commented
May 23, 2026
Contributor
Author
brendancol
left a comment
PR Review: Wrap eager source lifetime around read_all() (#2322)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
`xrspatial/geotiff/_reader.py:160`: `_CloudSource(source)` itself can
raise (e.g. an fsspec failure when fetching `size`), and that raise
still happens outside the new try/finally. Today the partially
constructed `_CloudSource` has no state to clean up, but the same
reasoning that motivates this PR applies: a future stateful
constructor would leak. Out of scope for this PR (the constructor
guard is a separate concern), but worth noting in a follow-up.
What looks good
- The fix is the minimal correct one: `try` now starts immediately
  after `src` is constructed, with `src.read_all()` inside.
  `finally: src.close()` already existed and now covers `read_all()`
  too.
- Comment block above the `try` explains the why, links to issue
  #1816 (the analogous `_read_cog_http` fix), and references #2322.
- The pre-existing `CloudSizeLimitError` paths at lines 167 and 175
  still call `src.close()` before raising, outside the new try.
  That's fine: they already had their own cleanup. No double-close
  risk because they raise immediately after.
- Tests cover all three source types (`_FileSource`, `_BytesIOSource`,
  `_CloudSource`) by patching the constructor to return a fake whose
  `read_all()` raises. The test file follows the same pattern as
  `test_cog_http_close_on_error_1816.py`.
- I verified the tests fail without the production change and pass
  with it, so they actually pin the contract.
- Temp filenames include the issue number
  (`tmp_2322_cleanup_file.tif`).
Checklist
- Algorithm matches reference/paper — N/A (cleanup correctness fix)
- All implemented backends produce consistent results — N/A
- NaN handling is correct — N/A
- Edge cases are covered by tests — all three source-type branches
- Dask chunk boundaries handled correctly — N/A
- No premature materialization or unnecessary copies — N/A
- Benchmark exists or is not needed — not needed
- README feature matrix updated — N/A
- Docstrings present and accurate — module docstring on test file
Closes #2322.
Summary
- `_read_to_array` used to call `src.read_all()` BEFORE entering the
  `try/finally` block that calls `src.close()`. Any exception from
  `read_all()` (fsspec network failure, transient S3 error, local I/O
  failure) would skip the cleanup.
- Moves the `try` to start right after `src` is constructed and pulls
  `src.read_all()` inside the protected region. Mirrors the
  close-on-error contract `_read_cog_http` already enforces (geotiff:
  _read_cog_http skips source.close() when tile fetch raises #1816).
- `_CloudSource.close()` is a no-op today, so this is a structural
  guard rather than a fix for an observable leak. The point is to keep
  the cleanup intact before any future resource-holding source
  (pinned credentials, persistent fsspec sessions, cached file
  handles) gets added.
Backend coverage
N/A. Pure correctness fix in the eager-read entry point. No backend
dispatch involved.
Test plan
- Tests in `xrspatial/geotiff/tests/test_eager_source_close_on_error_2322.py`
  cover the `_FileSource`, `_BytesIOSource`, and `_CloudSource` branches
  by injecting fake sources whose `read_all()` raises and asserting
  `close()` was still called.
- Related suites: `test_bytesio_source`,
  `test_cog_http_close_on_error_1816`, `test_cloud_read_byte_limit_1928`,
  `test_open_geotiff_missing_sources_1810` (40 tests).