Improve robustness in match_candidate_items_to_windows when re-projection is needed#541
favyen2
commented
Mar 3, 2026
- Refactor safely_reproject_within_valid_area and use it when matching items to windows.
- Call shapely.make_valid in STGeometry.to_projection to handle edge cases.
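For context on the shapely.make_valid bullet, here is a minimal sketch of what that call does to a self-intersecting polygon. It is a standalone illustration, not the code in STGeometry.to_projection; the bowtie polygon is a made-up input chosen because re-projection can produce rings that cross themselves in a similar way.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid

# Hypothetical input: a "bowtie" ring whose edges cross at (0.5, 0.5).
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

# make_valid rebuilds the geometry so that it satisfies the validity rules,
# here by splitting the bowtie into its two triangles.
repaired = make_valid(bowtie)

print(bowtie.is_valid)    # False
print(repaired.is_valid)  # True
print(repaired.geom_type)
```

The repaired geometry may have a different type than the input (for example a multi-part geometry), so callers that assume a single Polygon would need to handle that.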
@claude can you review this PR?
Claude finished @favyen2's task in 2m 30s
PR Review
Overall this is a well-motivated PR. The core idea (clip in WGS84 first, then reproject) is more robust than the old approach of either naively reprojecting large geometries or using the
Correctness
1. Behavioral change in The old In
# reproj may extend ~1 degree outside bounds
reprojected_features.append(Feature(reproj, feat.properties))
If downstream callers expect feature geometries to lie within the queried bounds (e.g., for rendering or spatial joins), this could silently produce wrong results. The removed
2.
if src_geom.projection == valid_geom.projection:
    results.append(src_geom)  # returned even if it doesn't intersect valid_geom
    continue
The docstring says "Returns None for source geometries that don't intersect valid_geom in WGS84", but the same-projection case skips this entirely and always returns
Minor Issues
3.
if self.projection.crs != CRS.from_epsg(WGS84_EPSG): # allocates a new CRS object
    ...
if other.projection.crs != CRS.from_epsg(WGS84_EPSG): # same
4. A comment explaining why 1 degree was chosen (and any known edge cases where it might be too small, such as features near projection zone boundaries) would help future readers.
5. Weaker assertions in The old test verified the exact reprojected bounds; the new test only checks
Positive Notes
…rojection-robustness
Many data sources re-project to WGS84 for the initial spatiotemporal intersection. Previously they didn't handle antimeridian splitting; this adds an STGeometry.to_wgs84 function that they can call to handle it.
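To illustrate what "antimeridian splitting" means here, a minimal sketch on plain longitude intervals. The helper split_lon_interval is hypothetical and is not how STGeometry.to_wgs84 is implemented; it assumes lon_min lies in [-180, 180] and that the interval spans less than 360 degrees, with lon_max > 180 signalling a crossing.

```python
def split_lon_interval(lon_min: float, lon_max: float) -> list[tuple[float, float]]:
    """Split a longitude interval into pieces that stay within [-180, 180].

    Hypothetical helper: lon_max > 180 means the interval crosses the antimeridian.
    """
    if lon_max <= 180:
        # No crossing, the interval is already representable as one piece.
        return [(lon_min, lon_max)]
    # Eastern piece runs up to the antimeridian; the rest wraps around by -360.
    return [(lon_min, 180.0), (-180.0, lon_max - 360)]


print(split_lon_interval(170.0, 190.0))  # [(170.0, 180.0), (-180.0, -170.0)]
print(split_lon_interval(10.0, 20.0))    # [(10.0, 20.0)]
```

Without the split, a geometry spanning 170 to -170 degrees would be interpreted as covering almost the whole globe rather than a 20 degree strip.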
@claude can you review this PR?
Claude finished @favyen2's task in 2m 46s
PR Review
This is a solid, well-motivated PR. The core strategy (clip in WGS84 to a buffered window before reprojecting) is more principled than both the old direct-reproject approach and the
Correctness Issues
1. The old
# vector_format.py:362: only intersection checked, geometry not clipped
if not reproj.shp.intersects(dst_geom.shp):
    continue
reprojected_features.append(Feature(reproj, feat.properties))
Whether this matters depends on callers. If downstream code only needs to find which features overlap, it's fine. If code assumes returned geometries are bounded (e.g. for area computations, rendering, or spatial joins), it can silently produce wrong results. Recommend either adding a
2.
for shp in flatten_shape(wgs84_geometry.shp):
    ...
    for lon_min in range(...):
        for lat_min in range(...):
            items.append(self._tile_to_item[(lon_min, lat_min)])  # no seen-set guard
For a window that is a MultiPolygon (e.g. from antimeridian splitting), if multiple parts cover overlapping tile coordinates, the same
Minor Issues
3.
if self.projection.crs == CRS.from_epsg(WGS84_EPSG):
4. Old: reproject This is the right tradeoff for antimeridian correctness, but worth noting if
5.
if clipped_shp.is_empty or clipped_shp.area == 0:
    results.append(None)
    continue
For line or point source geometries (area = 0 by definition), this will always append
Positive Notes
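The "no seen-set guard" point in item 2 can be sketched as follows. This is a simplified stand-in, not the data source's code: tiles_for_parts is a hypothetical function, each polygon part is reduced to integer (lon_min, lat_min, lon_max, lat_max) bounds, and tiles are assumed to be 1x1 degree.

```python
def tiles_for_parts(parts, tile_to_item):
    """Collect the item for every tile touched by any part, without duplicates.

    parts: list of (lon_min, lat_min, lon_max, lat_max) integer bounds.
    tile_to_item: dict keyed by (lon_min, lat_min) of each tile.
    """
    seen = set()
    items = []
    for lon0, lat0, lon1, lat1 in parts:
        for lon in range(lon0, lon1):
            for lat in range(lat0, lat1):
                key = (lon, lat)
                if key in seen:
                    # Another part already contributed this tile.
                    continue
                seen.add(key)
                items.append(tile_to_item[key])
    return items


tile_to_item = {(lon, lat): f"tile_{lon}_{lat}" for lon in range(3) for lat in range(2)}
# Two parts whose bounds both cover the tile at (1, 0).
parts = [(0, 0, 2, 1), (1, 0, 3, 1)]
print(tiles_for_parts(parts, tile_to_item))  # ['tile_0_0', 'tile_1_0', 'tile_2_0']
```

Tracking seen keys rather than de-duplicating the final list keeps the original ordering and avoids requiring items to be hashable.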
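The concern in item 5 about the area check can be reproduced with a small shapely example. The window and line below are made-up inputs, and has_content is a hypothetical alternative check that relies on emptiness alone; whether that is the right fix for the PR is a judgment call for the author.

```python
from shapely.geometry import LineString, box

window = box(0, 0, 10, 10)
line = LineString([(-5, 5), (15, 5)])

# Clip the line to the window; the result is the 10-unit segment inside it.
clipped = line.intersection(window)

print(clipped.is_empty)  # False
print(clipped.area)      # 0.0
print(clipped.length)    # 10.0


def has_content(shp) -> bool:
    """Hypothetical check: treat any non-empty geometry as a real result."""
    return not shp.is_empty


# An `is_empty or area == 0` test would discard this line; has_content keeps it.
print(has_content(clipped))  # True
```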
-    def is_too_large(self) -> bool:
-        """Returns whether this geometry's spatial coverage is too large.
+    def to_wgs84(self) -> "STGeometry":
+        """Convert to WGS84 with antimeridian splitting handling.

-        This means that it will likely have issues during re-projections and such.
+        For geometries already in WGS84, this is a no-op.
         """
-        wgs84_bounds = self.to_projection(WGS84_PROJECTION).shp.bounds
-        if wgs84_bounds[2] - wgs84_bounds[0] > MAX_GEOMETRY_DEGREES:
-            return True
-        if wgs84_bounds[3] - wgs84_bounds[1] > MAX_GEOMETRY_DEGREES:
-            return True
-        return False
We do use this function in Studio, though I suppose we could easily write our own version that does the same thing (and ideally we want to remove the rslearn dependency from Studio entirely anyway). But for now, maybe mark this as deprecated?
    Unlike direct to_projection(), this clips each source geometry in WGS84 to a
    buffered area around valid_geom before reprojecting. This minimizes distortions in
    case valid_geom is small but src_geoms may be large. It works best if src_geoms are
    also natively in WGS84; otherwise, there could be distortion issues re-projecting
I feel like it could be more clear in the overall design that this is only addressing problems when the item geometries (src_geoms here) are in WGS84. If there are big item geometries in other projections, they would still be distorted when we re-project them to WGS84. Ideally we would re-project valid_geom to the geometry of the src_geoms, but the problem is we don't know how to deal with antimeridian splitting and stuff like that in every single CRS possible.
    b = part.bounds
    boxes.append(
        shapely.box(
            b[0] - WGS84_CLIP_BUFFER_DEGREES,
If we add max(width, height) here then we would better deal with large window geometries.
This code hinges on windows being small relative to items but you could still have like 3x3 windows that are much smaller than 10x10 items or something like that.
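One way to read the max(width, height) suggestion is to grow the clip buffer with the window size. The sketch below works on plain (min_x, min_y, max_x, max_y) tuples in degrees; buffered_bounds is a hypothetical helper, and the value 1.0 for WGS84_CLIP_BUFFER_DEGREES is an assumption based on the "1 degree" mentioned in the reviews.

```python
WGS84_CLIP_BUFFER_DEGREES = 1.0  # assumed value, taken from the "1 degree" in the reviews


def buffered_bounds(bounds, base_buffer=WGS84_CLIP_BUFFER_DEGREES):
    """Expand WGS84 bounds by a buffer that scales with the window size.

    The buffer is the fixed base plus the larger of the window's width and
    height, so a 3x3 degree window gets a 4 degree margin instead of 1.
    """
    min_x, min_y, max_x, max_y = bounds
    width = max_x - min_x
    height = max_y - min_y
    buf = base_buffer + max(width, height)
    return (min_x - buf, min_y - buf, max_x + buf, max_y + buf)


print(buffered_bounds((10.0, 20.0, 13.0, 23.0)))  # (6.0, 16.0, 17.0, 27.0)
```

This sketch does not clamp the result to valid longitude and latitude ranges, which a real implementation near the poles or the antimeridian would likely need.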
    if reproj is None:
        continue
    reprojected_features.append(Feature(geom, feat.properties))
    # Only include shapes that intersect the given bounds.
    if not reproj.shp.intersects(dst_geom.shp):
        continue
Might be worth clipping to dst_geom.shp because the reprojected geometry clips to a +/- 1 degree buffer.
clipped_shp = reproj.shp.intersection(dst_geom.shp)
clipped_geom = STGeometry(reproj.projection, clipped_shp, reproj.time_range)
Then use clipped_geom instead of reproj below.
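The difference between the intersects filter and an explicit clip can be seen with two boxes. This sketch uses plain shapely geometries in place of the STGeometry wrappers; dst and reproj are made-up stand-ins for dst_geom.shp and reproj.shp.

```python
from shapely.geometry import box

dst = box(0, 0, 1, 1)            # stand-in for dst_geom.shp (the queried bounds)
reproj = box(-1, -1, 0.5, 0.5)   # stand-in for reproj.shp, overshooting the bounds

# The intersects filter keeps the feature but leaves its geometry untouched.
print(reproj.intersects(dst))  # True
print(reproj.bounds)           # (-1.0, -1.0, 0.5, 0.5)

# Clipping with intersection() restricts the geometry to the queried bounds.
clipped = reproj.intersection(dst)
print(clipped.bounds)          # (0.0, 0.0, 0.5, 0.5)
```

If the two shapes only touch along an edge or at a corner, intersection() returns a line or point rather than a polygon, so a caller may want to check the result's type or emptiness before using it.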