Epic: GeoTIFF remote/source safety hardening #2344

@brendancol

Description

Goal

Make remote and source dispatch paths safe, bounded, and consistent before the GeoTIFF-focused release.

Remote reads and source routing are high-risk because a small dispatch mistake can bypass SSRF defenses, fetch unbounded data, or route a URL into the wrong backend.

Scope

  • HTTP/HTTPS source classification.
  • fsspec/cloud source classification.
  • File-like and local path behavior.
  • HTTP private-host rejection and DNS-rebinding pinning.
  • Bounded metadata, tile, strip, cloud, and coalesced-range reads.
  • Consistent routing across eager, dask, GPU fallback, VRT source reads, and sidecar discovery.
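For readers unfamiliar with the private-host rejection and DNS-rebinding pinning items, here is a minimal sketch of the general idea, not the project's actual implementation. The names `is_blocked_address` and `resolve_and_pin` are hypothetical. The assumption is that the host is resolved once, every resolved address is checked, and the caller then connects to the returned IP rather than resolving the hostname a second time.

```python
import ipaddress
import socket


def is_blocked_address(ip: str) -> bool:
    """Return True for addresses a remote reader should refuse (hypothetical policy)."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def resolve_and_pin(host: str, port: int) -> str:
    """Resolve once, reject if any result is blocked, and return the IP to connect to.

    Connecting to the returned IP (instead of re-resolving the name) is what
    keeps a second DNS answer from swapping in a private address.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = [info[4][0] for info in infos]
    for ip in addresses:
        if is_blocked_address(ip):
            raise ValueError(f"refusing private or reserved address {ip} for {host}")
    return addresses[0]
```

The check only helps if every HTTP(S) URL actually reaches it, which is why the routing issue below matters.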

Known Priority Issue

Several paths use case-sensitive checks like:

source.startswith(('http://', 'https://'))

URL schemes are case-insensitive (RFC 3986, section 3.1). HTTP://... and HTTPS://... should still route through _HTTPSource and receive SSRF and DNS-pinning protections, not fall through to fsspec or local handling.
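A small reproduction of the mismatch, as a sketch only. The function names are hypothetical and the URL is an illustrative example. The first function uses the `startswith` pattern quoted above, and the second compares the normalized scheme.

```python
from urllib.parse import urlparse


def is_http_case_sensitive(source: str) -> bool:
    # The pattern quoted above: only matches lowercase schemes.
    return source.startswith(('http://', 'https://'))


def is_http(source: str) -> bool:
    # Compare the parsed scheme after lowercasing it.
    return urlparse(source).scheme.lower() in ('http', 'https')


url = 'HTTP://169.254.169.254/latest/meta-data'
print(is_http_case_sensitive(url))  # False: would fall through to another backend
print(is_http(url))                 # True: stays on the HTTP path
```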

Work Items

  • Centralize source scheme classification using urlparse(source).scheme.lower().
  • Replace security-relevant case-sensitive HTTP(S) checks.
  • Make _is_fsspec_uri exclude HTTP(S) case-insensitively.
  • Add regression tests for uppercase HTTP:// and HTTPS:// schemes.
  • Add private-host rejection tests that prove uppercase schemes cannot bypass _HTTPSource.
  • Audit eager, dask, GPU, VRT, and sidecar paths for source-routing divergence.
  • Verify cloud byte budget behavior for eager fsspec reads.
  • Verify dask remote reads use bounded metadata/range reads rather than full-object reads.
  • Wrap source cleanup so failed read_all() calls do not skip cleanup.
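One possible shape for the centralized classifier in the first work item, offered as a sketch under assumptions rather than a proposed final API. `classify_source` and its return labels are hypothetical. Treating a single-letter scheme as a Windows drive letter, and any other scheme followed by `://` as an fsspec URI, are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse


def classify_source(source) -> str:
    """Return 'file-like', 'http', 'fsspec', or 'local' (hypothetical labels)."""
    # Objects with a read() method are treated as file-like and never parsed as URLs.
    if hasattr(source, 'read'):
        return 'file-like'

    text = os.fspath(source)  # accepts str and os.PathLike
    scheme = urlparse(text).scheme.lower()

    # HTTP(S) is decided first and case-insensitively, so it can never
    # fall through to the fsspec or local branches.
    if scheme in ('http', 'https'):
        return 'http'

    # Assumption: a one-letter "scheme" is a Windows drive (C:\...), not a URI.
    if len(scheme) > 1 and '://' in text:
        return 'fsspec'

    return 'local'
```

With a single function like this, the eager, dask, GPU, VRT, and sidecar paths could all dispatch on the same answer instead of each repeating its own prefix check.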

Acceptance Criteria

  • No security-relevant path uses case-sensitive HTTP(S) dispatch.
  • HTTP(S) SSRF protections apply regardless of scheme casing.
  • fsspec/cloud reads remain bounded according to documented budgets.
  • Dask remote graph construction does not fetch full remote objects.
  • Source cleanup is exception-safe around read failures.
  • Remote behavior is documented as advanced, with explicit limits and escape hatches.
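The exception-safe cleanup criterion can be illustrated with a `try`/`finally` wrapper. This is a sketch only. `read_all()` is the method named in the work items, while `close()`, `FailingSource`, and `read_with_cleanup` are hypothetical stand-ins for whatever cleanup hook the real source objects expose.

```python
class FailingSource:
    """Hypothetical source whose read always fails, used to exercise cleanup."""

    def __init__(self):
        self.closed = False

    def read_all(self):
        raise OSError("simulated read failure")

    def close(self):
        self.closed = True


def read_with_cleanup(src):
    # finally runs whether read_all() returns or raises,
    # so a failed read cannot skip cleanup.
    try:
        return src.read_all()
    finally:
        src.close()
```

A regression test along these lines would assert that `closed` is true after the `OSError` propagates.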
