Skip to content

reproject: validate source coords are monotonic and regularly spaced (#2184)#2192

Merged
brendancol merged 4 commits into
mainfrom
issue-2184
May 20, 2026
Merged

reproject: validate source coords are monotonic and regularly spaced (#2184)#2192
brendancol merged 4 commits into
mainfrom
issue-2184

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2184

Summary

  • _source_bounds() inferred pixel resolution from only the first two coord samples and reused that single value for the whole grid, so irregular or non-monotonic x/y silently produced wrong georeferencing.
  • Added _validate_source_coords() and called it at the top of reproject() and merge() before any CRS resolution or grid math. Per axis it checks finiteness, strict monotonicity, and that every step is within 1e-6 * |median(diff)| of the median. Failures raise ValueError naming the axis, median step, worst deviation, offending index, and the two coord values at that index. Sub-ULP drift from real GeoTIFFs still passes.
  • merge() runs the check per raster so the message reports which input failed.

Backend coverage

Validation is pure NumPy on coords.values and runs before backend dispatch, so it fires identically for numpy, cupy, dask+numpy, and dask+cupy inputs. Tests cover all four.

Test plan

  • Regular ascending and descending coords pass.
  • Sub-ULP drift from a typical GeoTIFF affine still passes.
  • Single-pixel rasters pass (nothing to validate).
  • One perturbed sample in x raises with axis, median step, deviation, index, and surrounding coord values.
  • One perturbed sample in y raises with the same message shape.
  • Non-monotonic coords raise.
  • Repeated values raise.
  • NaN in coords raises.
  • Validation fires before CRS resolution.
  • merge() reports which raster is bad.
  • dask, cupy, and dask+cupy backends all raise from the same path.

…2184)

`_source_bounds()` infers pixel resolution from only the first two
samples on each axis, but downstream pixel math uses that single value
for the whole grid. Irregular or non-monotonic coords silently produced
wrong georeferencing.

Add `_validate_source_coords()` at the top of `reproject()` and `merge()`
(before CRS resolution or grid math). For each spatial axis it checks:

- All values are finite.
- Steps are strictly monotonic (all ascending or all descending).
- |diff - median(diff)| <= 1e-6 * |median(diff)| for every step.

On failure it raises `ValueError` naming the axis, the median step, the
worst deviation, the offending index, and the two coord values at that
index. Sub-ULP drift from real-world GeoTIFFs still passes.

Tests cover regular ascending/descending coords, sub-ULP drift, single-
pixel rasters, single-sample perturbation in x and y, non-monotonic
coords, repeated values, NaN in coords, validation order vs CRS
detection, merge() per-raster reporting, and dask / cupy / dask+cupy
backends.

Closes #2184
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 20, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject: validate source coords are monotonic and regularly spaced (#2184)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/reproject/__init__.py:140 - the strict-monotonicity check uses np.all(diffs > 0) or np.all(diffs < 0). For a 2-element coord with identical values, diffs = [0.0] falls through both branches and gives the right error. A short inline comment would stop a future reader from "simplifying" it to a sign check that accepts zero steps.
  • xrspatial/tests/test_reproject.py:2010 - test_reproject_rejects_repeated_coord matches the loose pattern r"strictly". Tightening to r"must be strictly monotonic" would catch a future reword that drops the word "strictly" elsewhere in the sentence.
  • xrspatial/reproject/__init__.py:138 - arr.astype(np.float64) always copies, even when arr is already float64. np.asarray(arr, dtype=np.float64) skips the copy in that case. Coord arrays are small so the cost is negligible either way.

What looks good

  • Validation runs at the very top of reproject() and merge(), before CRS resolution or any grid math. The dedicated ordering test (test_reproject_rejects_irregular_before_crs_resolution) locks that in.
  • The error message names the axis, the median step, the worst absolute deviation, the offending step index, and the two coord values at that index. Enough to find the bad sample without re-running diff.
  • merge() runs the check per raster and tags the error with rasters[i], so callers know which input failed.
  • The 1e-6 relative tolerance is loose enough to accept sub-ULP drift from real GeoTIFFs (covered by test_reproject_accepts_tiny_floating_drift) and tight enough to reject the 0.1 single-sample perturbation in the issue reproducer.
  • All four backends are covered: numpy (implicit), dask, cupy, dask+cupy. The validator runs on coords.values, which is materialized NumPy regardless of the data array backend, so backend parity is structural.
  • Single-pixel rasters skip the check rather than falling into an np.diff on a zero-length array.
  • Non-finite check uses np.isfinite.

Checklist

  • Algorithm matches the spec in the issue.
  • All implemented backends produce consistent results.
  • NaN handling is correct.
  • Edge cases covered by tests.
  • Dask chunk boundaries handled correctly.
  • No premature materialization or unnecessary copies.
  • Benchmark not needed (defensive validation, not a perf-critical path).
  • README feature matrix update not applicable (no new public API).
  • Docstrings present and accurate.

#2184)

- Explain why the strict-monotonicity check uses two separate
  comparisons instead of a single sign test, so a future refactor
  does not accidentally accept zero steps from repeated coords.
- Switch `arr.astype(np.float64)` to `np.asarray(arr, dtype=np.float64)`
  to skip the copy when coords are already float64.
- Tighten `test_reproject_rejects_repeated_coord` to match the full
  "must be strictly monotonic" phrase instead of just "strictly".
@brendancol brendancol merged commit 9e189e5 into main May 20, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: irregular x/y coordinates accepted but interpreted as a regular grid

1 participant