Skip to content

Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)#2326

Merged
brendancol merged 2 commits into
mainfrom
issue-2323
May 23, 2026
Merged

Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)#2326
brendancol merged 2 commits into
mainfrom
issue-2323

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2323.

URL schemes are case-insensitive per RFC 3986, but the geotiff reader's
dispatch helpers used case-sensitive startswith(('http://', 'https://'))
checks. An uppercase HTTP://127.0.0.1/foo.tif slipped past _HTTPSource
(and its SSRF allow-list and pinned DNS from #1664 / #1846) and landed on
the fsspec branch via _is_fsspec_uri.

Summary

  • Add _is_http_url in _sources.py that runs the source through urlparse
    once and case-folds the scheme.
  • Switch every dispatch site to use it: _reader.read_to_array, the dask
    backend HTTP/fsspec branch and per-tile source-class check, _open_source,
    _is_fsspec_uri (reader and writer copies), _sidecar._is_http_url, and
    both GPU-backend routing points.
  • Add test_uppercase_scheme_ssrf_2323.py with case-insensitive coverage
    for the helpers and for the dispatcher itself.

Backend coverage

No backend logic changed. The fix lives in scheme-detection helpers that
sit upstream of every backend (numpy, dask, gpu), so all four backends
benefit automatically.

Test plan

  • pytest xrspatial/geotiff/tests/test_uppercase_scheme_ssrf_2323.py (30 passed)
  • pytest xrspatial/geotiff/tests/test_ssrf_hardening_1664.py xrspatial/geotiff/tests/test_dns_rebinding_pin_issue_1846.py (56 passed)
  • Full pytest xrspatial/geotiff/tests/ (5289 passed, 68 skipped)

URL schemes are case-insensitive per RFC 3986. The geotiff reader's
dispatch helpers used case-sensitive `startswith(('http://', 'https://'))`
checks, so an uppercase `HTTP://127.0.0.1/foo.tif` would skip
`_HTTPSource` (and its SSRF allow-list + pinned DNS from #1664 / #1846)
and route to the fsspec branch via `_is_fsspec_uri`.

Add `_is_http_url` in `_sources.py` that parses the URL once and
case-folds the scheme. Switch every dispatch site to use it:

- `_reader.read_to_array` (line 145)
- `_backends/dask` HTTP/fsspec branch (line 197) and the inner per-tile
  source-class check (line 580)
- `_sources._is_fsspec_uri` (line 1454)
- `_sources._open_source` (line 1639)
- `_sidecar._is_http_url` (delegates to canonical helper)
- `_writer._is_fsspec_uri` (delegates to canonical helper)
- `_backends/gpu` eager path (line 350) and KvikIO eligibility check
  (line 1056)

Add a focused regression test module (`test_uppercase_scheme_ssrf_2323`)
that:

- pins `_is_http_url` and `_is_fsspec_uri` against case-insensitive
  variants (`HTTP`, `Http`, `hTTpS`)
- confirms `_open_source('HTTP://127.0.0.1/...')` and an uppercase URL
  pointing at the cloud-metadata IP (169.254.169.254) raise
  `UnsafeURLError` via the SSRF validator
- confirms `read_to_array('HTTP://...')` reaches the `_read_cog_http`
  path (and not the fsspec `_CloudSource` path)
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 23, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)

Blockers

None.

Suggestions

  • Add a dask-backend dispatch test. xrspatial/geotiff/_backends/dask.py:197-199 and the per-tile branch around line 575-583 are touched here, but no test exercises the dask path for an uppercase URL. test_uppercase_scheme_ssrf_2323.py covers _open_source, read_to_array, and the helpers -- not _read_geotiff_dask. A small monkeypatched test (stub _parse_cog_http_meta like the existing dask-HTTP coalesce tests do) confirming an HTTP://example.com/foo.tif source lands on the http branch would close the gap.

  • GPU _can_use_kvikio lost its outside-the-try http rejection. At xrspatial/geotiff/_backends/gpu.py:1057-1064 the old code rejected http URLs before the try block that imported _is_fsspec_uri. Now both checks live inside the same try: ... except Exception: pass. If the _reader import ever fails (unlikely once the package is loaded, but the original code clearly cared), the function silently falls through to the kvikio eligibility check instead of refusing the HTTP URL. Either move _is_http_url back outside the try, or drop the try and let an import error propagate -- if _reader won't import, the kvikio path isn't the right thing to fix.

Nits

  • urlparse import inside _is_http_url. xrspatial/geotiff/_sources.py:1464 imports urlparse on every call. Python caches it, so the cost is a dict lookup, but the helper runs per-chunk in dask paths. Module-scope import would match _validate_http_url's neighbour code and read a little cleaner. Either way is fine.

  • _sidecar._is_http_url and _writer._is_fsspec_uri indirection. Both now import-and-delegate inside the function body. A module-scope import would be tidier. Same caching argument.

What looks good

  • _is_http_url is the right shape: urlparse + scheme.lower(), matching the pattern _validate_http_url already uses.
  • Coverage of the bug surface is thorough -- every grep hit for case-sensitive startswith outside tests has been converted.
  • The regression test parameterizes the case-insensitive variants (HTTP, Http, hTTpS) against both the helpers and the dispatcher, including a loopback case and the cloud-metadata IP (169.254.169.254).
  • The read_to_array test stubs _read_cog_http rather than making a network call, so it stays fast and deterministic.
  • test_ssrf_hardening_1664 and test_dns_rebinding_pin_issue_1846 still pass, so the lowercase path is unchanged.
  • Full geotiff suite: 5289 passed, 68 skipped.

Checklist

  • Behaviour matches the cited fix direction (urlparse + lowercase scheme)
  • No backend-specific logic changed (helper sits upstream of all backends)
  • [N/A] NaN handling -- not relevant to URL dispatch
  • Edge cases covered (mixed-case schemes, public IP, loopback, link-local metadata IP)
  • [N/A] Dask chunk boundaries -- not relevant
  • [N/A] Premature materialization -- not relevant
  • [N/A] Benchmark -- pure correctness fix
  • [N/A] README feature matrix -- no new public API
  • Docstrings on the new helper

Follow-ups for PR #2326:

- Add `TestDaskBackendUppercaseDispatch` covering
  `_read_geotiff_dask`'s own http/fsspec split. Stubs `_HTTPSource` /
  `_CloudSource` and asserts an uppercase `HTTP://` URL constructs
  `_HTTPSource`, an uppercase `S3://` URL constructs `_CloudSource`,
  and the lowercase http case still works.

- `_can_use_kvikio`: move the `_is_http_url` check outside the
  `try/except Exception: pass` block. Inside the try, a hidden import
  failure would silently let an HTTP URL through to the kvikio branch
  (which opens the path as a local file). The fsspec check stays in
  the try because the rest of the kvikio eligibility logic should not
  break on a recoverable import error there.

- Lift `urlparse` to a module-scope import in `_sources.py` so the
  helper does not re-resolve the import on every per-chunk dispatch.

- Move the `_is_http_url` import in `_sidecar.py` and `_writer.py` to
  module scope. The function bodies now just call the canonical helper.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up): Reject uppercase HTTP(S) schemes in geotiff URL dispatch (#2323)

Covers commit a73226ee on top of 8e2a9d69.

Blockers

None.

Suggestions

None remaining. All four items from the first pass are addressed:

  1. Dask-backend dispatch test (TestDaskBackendUppercaseDispatch in test_uppercase_scheme_ssrf_2323.py) -- stubs _HTTPSource and _CloudSource, asserts an uppercase HTTP:// URL constructs the http source, an uppercase S3:// URL still goes to the cloud source, and the lowercase http case still works. 3 new test methods.
  2. _can_use_kvikio (gpu.py:1057-1064) -- the _is_http_url check now sits outside the try/except Exception: pass. The fsspec check stays inside the try, which is the right split: an http URL must never reach the kvikio branch, but a transient import problem on the fsspec side should not poison kvikio eligibility for plain local files.
  3. urlparse lifted to module scope in _sources.py:34.
  4. _sidecar._is_http_url and _writer._is_fsspec_uri now import the canonical helper at module scope.

Nits

None.

What looks good

  • The dask test uses sentinel exceptions on _HTTPSource / _CloudSource construction, so dispatch is observable without standing up a mock HTTP server or fsspec backend. Fast and explicit.
  • The S3:// counter-test confirms the fix did not over-correct -- fsspec URIs still go to fsspec.
  • Module-scope imports drop the per-call dict lookup on the dispatch hot path.

Test status

  • pytest xrspatial/geotiff/tests/test_uppercase_scheme_ssrf_2323.py -- 33 passed
  • pytest xrspatial/geotiff/tests/ -- 5292 passed, 68 skipped

Recommendation

Ready to merge once CI is green.

@brendancol brendancol merged commit 402f085 into main May 23, 2026
7 checks passed
brendancol added a commit that referenced this pull request May 24, 2026
Reconciles the case-insensitive HTTP(S) dispatch helper introduced on
both branches:

* main #2326 added ``_is_http_url`` in ``_sources.py`` and routed the
  geotiff dispatch sites through it.
* PR #2337 added the same fix as ``_is_http_source`` with broader test
  coverage, hoisted imports, and a typed ``NotImplementedError`` for
  HTTP(S) write targets in ``_write_bytes``.

Resolution keeps ``_is_http_source`` as the canonical name (used by
this PR's tests and the writer-side guard) and adds an ``_is_http_url``
alias in ``_sources.py``, plus a re-export from ``_reader.py``, so the
test file shipped with #2326 (``test_uppercase_scheme_ssrf_2323.py``)
keeps importing ``_is_http_url`` unchanged. The ``_sidecar.py`` wrapper
``_is_http_url`` now delegates to ``_is_http_source`` directly instead
of through the canonical alias.

Full geotiff suite: 5413 passed, 64 skipped, 1 xfailed, 1 xpassed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP(S) SSRF hardening bypassable with uppercase schemes

1 participant