Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)#2326
Conversation
URL schemes are case-insensitive per RFC 3986. The geotiff reader's
dispatch helpers used case-sensitive `startswith(('http://', 'https://'))`
checks, so an uppercase `HTTP://127.0.0.1/foo.tif` would skip
`_HTTPSource` (and its SSRF allow-list + pinned DNS from #1664 / #1846)
and route to the fsspec branch via `_is_fsspec_uri`.
Add `_is_http_url` in `_sources.py` that parses the URL once and
case-folds the scheme. Switch every dispatch site to use it:
- `_reader.read_to_array` (line 145)
- `_backends/dask` HTTP/fsspec branch (line 197) and the inner per-tile
source-class check (line 580)
- `_sources._is_fsspec_uri` (line 1454)
- `_sources._open_source` (line 1639)
- `_sidecar._is_http_url` (delegates to canonical helper)
- `_writer._is_fsspec_uri` (delegates to canonical helper)
- `_backends/gpu` eager path (line 350) and KvikIO eligibility check
(line 1056)
Add a focused regression test module (`test_uppercase_scheme_ssrf_2323`)
that:
- pins `_is_http_url` and `_is_fsspec_uri` against case-insensitive
variants (`HTTP`, `Http`, `hTTpS`)
- confirms `_open_source('HTTP://127.0.0.1/...')` and an uppercase URL
pointing at the cloud-metadata IP (169.254.169.254) raise
`UnsafeURLError` via the SSRF validator
- confirms `read_to_array('HTTP://...')` reaches the `_read_cog_http`
path (and not the fsspec `_CloudSource` path)
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)
Blockers
None.
Suggestions
-
Add a dask-backend dispatch test.
xrspatial/geotiff/_backends/dask.py:197-199and the per-tile branch around line 575-583 are touched here, but no test exercises the dask path for an uppercase URL.test_uppercase_scheme_ssrf_2323.pycovers_open_source,read_to_array, and the helpers -- not_read_geotiff_dask. A small monkeypatched test (stub_parse_cog_http_metalike the existing dask-HTTP coalesce tests do) confirming anHTTP://example.com/foo.tifsource lands on the http branch would close the gap. -
GPU
_can_use_kvikiolost its outside-the-try http rejection. Atxrspatial/geotiff/_backends/gpu.py:1057-1064the old code rejected http URLs before thetryblock that imported_is_fsspec_uri. Now both checks live inside the sametry: ... except Exception: pass. If the_readerimport ever fails (unlikely once the package is loaded, but the original code clearly cared), the function silently falls through to the kvikio eligibility check instead of refusing the HTTP URL. Either move_is_http_urlback outside the try, or drop the try and let an import error propagate -- if_readerwon't import, the kvikio path isn't the right thing to fix.
Nits
-
urlparseimport inside_is_http_url.xrspatial/geotiff/_sources.py:1464importsurlparseon every call. Python caches it, so the cost is a dict lookup, but the helper runs per-chunk in dask paths. Module-scope import would match_validate_http_url's neighbour code and read a little cleaner. Either way is fine. -
_sidecar._is_http_urland_writer._is_fsspec_uriindirection. Both now import-and-delegate inside the function body. A module-scope import would be tidier. Same caching argument.
What looks good
_is_http_urlis the right shape:urlparse+scheme.lower(), matching the pattern_validate_http_urlalready uses.- Coverage of the bug surface is thorough -- every grep hit for case-sensitive
startswithoutside tests has been converted. - The regression test parameterizes the case-insensitive variants (
HTTP,Http,hTTpS) against both the helpers and the dispatcher, including a loopback case and the cloud-metadata IP (169.254.169.254). - The
read_to_arraytest stubs_read_cog_httprather than making a network call, so it stays fast and deterministic. test_ssrf_hardening_1664andtest_dns_rebinding_pin_issue_1846still pass, so the lowercase path is unchanged.- Full geotiff suite: 5289 passed, 68 skipped.
Checklist
- Behaviour matches the cited fix direction (urlparse + lowercase scheme)
- No backend-specific logic changed (helper sits upstream of all backends)
- [N/A] NaN handling -- not relevant to URL dispatch
- Edge cases covered (mixed-case schemes, public IP, loopback, link-local metadata IP)
- [N/A] Dask chunk boundaries -- not relevant
- [N/A] Premature materialization -- not relevant
- [N/A] Benchmark -- pure correctness fix
- [N/A] README feature matrix -- no new public API
- Docstrings on the new helper
Follow-ups for PR #2326: - Add `TestDaskBackendUppercaseDispatch` covering `_read_geotiff_dask`'s own http/fsspec split. Stubs `_HTTPSource` / `_CloudSource` and asserts an uppercase `HTTP://` URL constructs `_HTTPSource`, an uppercase `S3://` URL constructs `_CloudSource`, and the lowercase http case still works. - `_can_use_kvikio`: move the `_is_http_url` check outside the `try/except Exception: pass` block. Inside the try, a hidden import failure would silently let an HTTP URL through to the kvikio branch (which opens the path as a local file). The fsspec check stays in the try because the rest of the kvikio eligibility logic should not break on a recoverable import error there. - Lift `urlparse` to a module-scope import in `_sources.py` so the helper does not re-resolve the import on every per-chunk dispatch. - Move the `_is_http_url` import in `_sidecar.py` and `_writer.py` to module scope. The function bodies now just call the canonical helper.
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)
Covers commit a73226ee on top of 8e2a9d69.
Blockers
None.
Suggestions
None remaining. All four items from the first pass are addressed:
- Dask-backend dispatch test (
TestDaskBackendUppercaseDispatchintest_uppercase_scheme_ssrf_2323.py) -- stubs_HTTPSourceand_CloudSource, asserts an uppercaseHTTP://URL constructs the http source, an uppercaseS3://URL still goes to the cloud source, and the lowercase http case still works. 3 new test methods. _can_use_kvikio(gpu.py:1057-1064) -- the_is_http_urlcheck now sits outside thetry/except Exception: pass. The fsspec check stays inside the try, which is the right split: an http URL must never reach the kvikio branch, but a transient import problem on the fsspec side should not poison kvikio eligibility for plain local files.urlparselifted to module scope in_sources.py:34._sidecar._is_http_urland_writer._is_fsspec_urinow import the canonical helper at module scope.
Nits
None.
What looks good
- The dask test uses sentinel exceptions on
_HTTPSource/_CloudSourceconstruction, so dispatch is observable without standing up a mock HTTP server or fsspec backend. Fast and explicit. - The
S3://counter-test confirms the fix did not over-correct -- fsspec URIs still go to fsspec. - Module-scope imports drop the per-call dict lookup on the dispatch hot path.
Test status
pytest xrspatial/geotiff/tests/test_uppercase_scheme_ssrf_2323.py-- 33 passedpytest xrspatial/geotiff/tests/-- 5292 passed, 68 skipped
Recommendation
Ready to merge once CI is green.
Reconciles the case-insensitive HTTP(S) dispatch helper introduced on both branches: * main #2326 added ``_is_http_url`` in ``_sources.py`` and routed the geotiff dispatch sites through it. * PR #2337 added the same fix as ``_is_http_source`` with broader test coverage, hoisted imports, and a typed ``NotImplementedError`` for HTTP(S) write targets in ``_write_bytes``. Resolution keeps ``_is_http_source`` as the canonical name (used by this PR's tests and the writer-side guard) and adds an ``_is_http_url`` alias in ``_sources.py``, plus a re-export from ``_reader.py``, so the test file shipped with #2326 (``test_uppercase_scheme_ssrf_2323.py``) keeps importing ``_is_http_url`` unchanged. The ``_sidecar.py`` wrapper ``_is_http_url`` now delegates to ``_is_http_source`` directly instead of through the canonical alias. Full geotiff suite: 5413 passed, 64 skipped, 1 xfailed, 1 xpassed.
Closes #2323.
URL schemes are case-insensitive per RFC 3986, but the geotiff reader's
dispatch helpers used case-sensitive
startswith(('http://', 'https://'))checks. An uppercase
HTTP://127.0.0.1/foo.tifslipped past_HTTPSource(and its SSRF allow-list and pinned DNS from #1664 / #1846) and landed on
the fsspec branch via
_is_fsspec_uri.Summary
_is_http_urlin_sources.pythat runs the source throughurlparseonce and case-folds the scheme.
_reader.read_to_array, the daskbackend HTTP/fsspec branch and per-tile source-class check,
_open_source,_is_fsspec_uri(reader and writer copies),_sidecar._is_http_url, andboth GPU-backend routing points.
test_uppercase_scheme_ssrf_2323.pywith case-insensitive coveragefor the helpers and for the dispatcher itself.
Backend coverage
No backend logic changed. The fix lives in scheme-detection helpers that
sit upstream of every backend (numpy, dask, gpu), so all four backends
benefit automatically.
Test plan
pytest xrspatial/geotiff/tests/test_uppercase_scheme_ssrf_2323.py(30 passed)pytest xrspatial/geotiff/tests/test_ssrf_hardening_1664.py xrspatial/geotiff/tests/test_dns_rebinding_pin_issue_1846.py(56 passed)pytest xrspatial/geotiff/tests/(5289 passed, 68 skipped)