reproject: validate source coords are monotonic and regularly spaced (#2184)#2192
Merged
Conversation
…2184) `_source_bounds()` infers pixel resolution from only the first two samples on each axis, but downstream pixel math uses that single value for the whole grid. Irregular or non-monotonic coords silently produced wrong georeferencing. Add `_validate_source_coords()` at the top of `reproject()` and `merge()` (before CRS resolution or grid math). For each spatial axis it checks: - All values are finite. - Steps are strictly monotonic (all ascending or all descending). - |diff - median(diff)| <= 1e-6 * |median(diff)| for every step. On failure it raises `ValueError` naming the axis, the median step, the worst deviation, the offending index, and the two coord values at that index. Sub-ULP drift from real-world GeoTIFFs still passes. Tests cover regular ascending/descending coords, sub-ULP drift, single- pixel rasters, single-sample perturbation in x and y, non-monotonic coords, repeated values, NaN in coords, validation order vs CRS detection, merge() per-raster reporting, and dask / cupy / dask+cupy backends. Closes #2184
brendancol
commented
May 20, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: reproject: validate source coords are monotonic and regularly spaced (#2184)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
xrspatial/reproject/__init__.py:140- the strict-monotonicity check usesnp.all(diffs > 0) or np.all(diffs < 0). For a 2-element coord with identical values,diffs = [0.0]falls through both branches and gives the right error. A short inline comment would stop a future reader from "simplifying" it to a sign check that accepts zero steps.xrspatial/tests/test_reproject.py:2010-test_reproject_rejects_repeated_coordmatches the loose patternr"strictly". Tightening tor"must be strictly monotonic"would catch a future reword that drops the word "strictly" elsewhere in the sentence.xrspatial/reproject/__init__.py:138-arr.astype(np.float64)always copies, even whenarris already float64.np.asarray(arr, dtype=np.float64)skips the copy in that case. Coord arrays are small so the cost is negligible either way.
What looks good
- Validation runs at the very top of
reproject()andmerge(), before CRS resolution or any grid math. The dedicated ordering test (test_reproject_rejects_irregular_before_crs_resolution) locks that in. - The error message names the axis, the median step, the worst absolute deviation, the offending step index, and the two coord values at that index. Enough to find the bad sample without re-running diff.
merge()runs the check per raster and tags the error withrasters[i], so callers know which input failed.- The 1e-6 relative tolerance is loose enough to accept sub-ULP drift from real GeoTIFFs (covered by
test_reproject_accepts_tiny_floating_drift) and tight enough to reject the 0.1 single-sample perturbation in the issue reproducer. - All four backends are covered: numpy (implicit), dask, cupy, dask+cupy. The validator runs on
coords.values, which is materialized NumPy regardless of the data array backend, so backend parity is structural. - Single-pixel rasters skip the check rather than falling into an
np.diffon a zero-length array. - Non-finite check uses
np.isfinite.
Checklist
- Algorithm matches the spec in the issue.
- All implemented backends produce consistent results.
- NaN handling is correct.
- Edge cases covered by tests.
- Dask chunk boundaries handled correctly.
- No premature materialization or unnecessary copies.
- Benchmark not needed (defensive validation, not a perf-critical path).
- README feature matrix update not applicable (no new public API).
- Docstrings present and accurate.
#2184) - Explain why the strict-monotonicity check uses two separate comparisons instead of a single sign test, so a future refactor does not accidentally accept zero steps from repeated coords. - Switch `arr.astype(np.float64)` to `np.asarray(arr, dtype=np.float64)` to skip the copy when coords are already float64. - Tighten `test_reproject_rejects_repeated_coord` to match the full "must be strictly monotonic" phrase instead of just "strictly".
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2184
Summary
_source_bounds()inferred pixel resolution from only the first two coord samples and reused that single value for the whole grid, so irregular or non-monotonic x/y silently produced wrong georeferencing._validate_source_coords()and called it at the top ofreproject()andmerge()before any CRS resolution or grid math. Per axis it checks finiteness, strict monotonicity, and that every step is within1e-6 * |median(diff)|of the median. Failures raiseValueErrornaming the axis, median step, worst deviation, offending index, and the two coord values at that index. Sub-ULP drift from real GeoTIFFs still passes.merge()runs the check per raster so the message reports which input failed.Backend coverage
Validation is pure NumPy on
coords.valuesand runs before backend dispatch, so it fires identically for numpy, cupy, dask+numpy, and dask+cupy inputs. Tests cover all four.Test plan
merge()reports which raster is bad.