Skip to content

reproject: add bounds_policy parameter (#2187)#2199

Merged
brendancol merged 5 commits into
mainfrom
issue-2187
May 21, 2026
Merged

reproject: add bounds_policy parameter (#2187)#2199
brendancol merged 5 commits into
mainfrom
issue-2187

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2187.

Summary

  • Add bounds_policy parameter to reproject() and merge() to control the output-bounds heuristics in _compute_output_grid. Options: auto (default, matches historical behaviour), raw (no heuristic), clamp (geographic clamp only), percentile (force 2/98 fallback).
  • Emit a UserWarning when auto, clamp, or percentile actually alters the bounds. The warning names the policy and reports the per-side delta vs the raw projected extent. Silent cropping was the bug the issue flagged.
  • Plumb the parameter through reproject() and merge(). Validate unknown tokens at the API boundary. Skipped when explicit bounds is supplied.

Backend coverage

  • numpy: covered by unit tests
  • dask + numpy: covered (test_raw_policy_dask_backend)
  • cupy / dask + cupy: bounds policy lives in pure-numpy grid setup; no backend-specific code path

Test plan

  • pytest xrspatial/tests/test_reproject.py::TestBoundsPolicy -x -q (10 new tests)
  • pytest xrspatial/tests/test_reproject.py -x -q (278 tests, no regressions)
  • User guide notebook executes end-to-end against the new parameter

…2187)

_compute_output_grid silently clamped geographic bounds and fell back
to 2/98 percentile bounds when the projected extent blew up. Both
branches cropped real data without telling the caller. Add an
explicit bounds_policy parameter with four options: auto (default,
current behaviour), raw (no heuristic), clamp (geographic clamp only),
and percentile (force 2/98 fallback). When auto / clamp / percentile
actually alters the bounds, emit a UserWarning naming the policy and
reporting the per-side delta vs raw. Plumbed through reproject() and
merge(). Tests cover the four policies, the warning behaviour, an
explicit-bounds bypass, and the dask backend.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 20, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject bounds_policy parameter

Blockers

None.

Suggestions

  • xrspatial/reproject/_grid.py:230-238: the bounds_policy == "raw" branch appends four corner points to xs/ys, but when the policy is "raw" the geographic clamp at line 187 is skipped, so src_left/right/bottom/top already equal the raw values. The four extra points are duplicates of corners already in edge_xs/edge_ys. Drop the block, or move the clamp gating so the extra corners actually contribute. As written it is dead code that obscures intent.

  • xrspatial/reproject/__init__.py:1847: inside merge() the per-input _compute_output_grid call passes bounds_policy through, so each input that crosses a singularity emits its own warning. A mosaic of N near-antimeridian rasters yields N near-identical warnings. Suppress the warning during the per-input gather pass and only emit it on the final merged-bounds call at line 1864.

Nits

  • xrspatial/reproject/_grid.py:187: bounds_policy='clamp' is silently a no-op when source_crs is projected. Note that next to the "clamp" entry in the reproject() docstring so users do not assume something is happening on a UTM input.

  • xrspatial/tests/test_reproject.py::TestBoundsPolicy: no test covers bounds_policy='clamp' on a mid-latitude geographic input where the clamp runs but trims nothing. A short test would pin the silent-no-op behaviour so it does not regress into spurious warnings.

  • xrspatial/reproject/_grid.py:199-204: the if src_left >= src_right fallback resets clamp_applied = False after the clamp would have inverted the range. The condition can only fire on inputs where right is within 0.02 deg of left, which _validate_grid_params rejects upstream. The branch is unreachable on valid inputs. Either drop it or add a comment explaining why it stays.

What looks good

  • Validation lives at the API boundary in both reproject() and merge(), with the error message listing the valid tokens.
  • Warning text names the policy, the trigger, and the per-side delta vs the raw bounds. A caller can recover the uncropped bounds from the warning without re-running with 'raw'.
  • The bounds=... short-circuit at line 177 means explicit bounds skip the policy logic entirely, and test_explicit_bounds_skips_policy_logic exercises that.
  • Default value preserves historical behaviour, so no existing caller breaks.

Checklist

  • Algorithm matches reference: heuristics unchanged in shape, just gated by the new parameter.
  • All implemented backends produce consistent results: bounds policy runs in pure numpy at graph-build time.
  • NaN handling is correct: existing np.isfinite filters on transformed coords are preserved.
  • Edge cases covered by tests: invalid-token rejection, explicit-bounds bypass, benign-input no-warning, dask backend.
  • Dask chunk boundaries handled correctly: policy fires once at graph build, not per chunk.
  • No premature materialization or unnecessary copies.
  • Benchmark not needed (parameter on existing path).
  • README feature matrix not needed (no new function, no backend change).
  • Docstrings present and accurate.

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after f959b9f

Dispositions for the first-pass findings:

  • Suggestion 1 (dead 'raw' corner block in _grid.py:230-238): fixed -- removed the block. When 'raw' is set, the clamp above is skipped, so the edge samples already use the original src_*_raw values. Replaced with a clarifying comment.
  • Suggestion 2 (per-input warnings in merge()): fixed -- per-input gather now captures bounds_policy warnings, deduplicates them, and emits a single summary warning naming the count and first trigger. Non-policy warnings are re-emitted untouched.
  • Nit 1 ('clamp' docstring on projected CRS): fixed -- docstring now says it is a no-op for projected source CRSes.
  • Nit 2 (missing test for 'clamp' no-op): fixed -- added test_clamp_policy_noop_on_benign_geographic and test_clamp_policy_noop_on_projected_source. Also added test_merge_dedupes_per_input_warnings to pin the dedup behaviour.
  • Nit 3 (unreachable if src_left >= src_right branch): kept with comment -- left the defensive guard in place and added a comment that _validate_grid_params rejects the degenerate inputs that would reach it. Cheaper than carrying the dependency on upstream validation.

Tests: 13 in TestBoundsPolicy (was 10), full test_reproject.py at 281 pass (was 278).

No further findings on the diff.

@brendancol brendancol merged commit 20f474c into main May 21, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: output-grid bounds use silent heuristics that can crop real data near singularities

1 participant