[New feature] Implement temporal aggregations #555
…RAL_MAX, and TEMPORAL_MIN methods; update dataset format and related classes to handle group time ranges.
|
@favyen2 any thoughts? If this is a NOGO I need to start looking at alternative approaches. Thanks |
favyen2
left a comment
Adding the group time ranges looks good to me. It allows using temporal stack with period_duration, which previously wouldn't have worked, and might mean we can get rid of this special case:
rslearn/rslearn/train/dataset.py
Line 71 in 700082a
However, I don't think it should be handled in manage.py; instead, it feels like something that should come out of the match_candidate_items_to_window function. As it stands, the per-period matching logic appears to be duplicated.
The temporal stack seems fine to me. So far we have been doing aggregation in a transform so that we can try different aggregations (so the materialized data is the most fine-grained data we might use), but I could see it being useful. One alternative would be to change the MEAN and MEDIAN modes to operate over multiple timesteps, but that wouldn't have the clipping from temporal stack. Another limitation is that it doesn't support aggregating over periods, e.g. you might want daily max precipitation or something like that, but we could add that later when it is needed.
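To make the clip-then-aggregate idea in this comment concrete, here is a minimal sketch in plain Python. It is not the PR's implementation: the function name temporal_reduce, the reducer names, the nested-list raster layout, and the half-open [start, end) request interval are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical reducer names; the compositing method names in the PR may differ.
REDUCERS = {"mean": mean, "max": max, "min": min}


def temporal_reduce(frames, timestamps, request_start, request_end, method):
    """Reduce a time series of rasters to a single raster.

    frames: list of 2D nested lists (rows x cols), one per timestep.
    timestamps: list of datetimes, one per frame.
    Only frames with request_start <= t < request_end are kept (the clipping step).
    """
    kept = [f for f, t in zip(frames, timestamps) if request_start <= t < request_end]
    if not kept:
        raise ValueError("no timesteps fall inside the request interval")
    reducer = REDUCERS[method]
    # zip(*kept) walks rows in parallel across time; zip(*rows) walks pixels.
    return [[reducer(values) for values in zip(*rows)] for rows in zip(*kept)]


# Four hourly 1x2 rasters; the request interval covers only the middle two.
t0 = datetime(2024, 1, 1)
timestamps = [t0 + timedelta(hours=h) for h in range(4)]
frames = [[[1.0, 10.0]], [[2.0, 20.0]], [[4.0, 40.0]], [[100.0, 1000.0]]]
start, end = t0 + timedelta(hours=1), t0 + timedelta(hours=3)

clipped_max = temporal_reduce(frames, timestamps, start, end, "max")
clipped_mean = temporal_reduce(frames, timestamps, start, end, "mean")
```

With the interval covering only hours 1 and 2, clipped_max is [[4.0, 40.0]] and clipped_mean is [[3.0, 30.0]]; the large values in the last frame are excluded by the clipping step rather than by the reducer.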
|
Here is Claude review:
I've launched a review agent to analyze PR #555 "[New feature] Implement temporal aggregations". It's reading through the full diff and relevant source files. I'll share the detailed review once it completes.
Here's the full review of PR #555:
Code Review: PR #555, Implement Temporal Aggregations
Overview
The PR adds three new compositing methods (
Bugs
Bug 1 (High Impact): Non-ingest path never receives
In ERA5Land and similar climate data sources use
Bug 2: Misleading error message
If
Bug 3: Malformed JSON example in
The diff adds an extra
Breaking Changes (Despite Claims Otherwise)
This is a public function (no
The new optional
Code checking
The PR implies it's only written for period-duration layers, but the non-period-duration path also wraps groups in
Design Concern:
|
| Issue | Severity |
|---|---|
| Non-ingest path missing group_time_ranges | High — feature incomplete for climate sources |
| compute_expected_timestamps removed without deprecation | Medium — breaking API change |
| Malformed JSON in docs | Medium |
| Misleading error message for TEMPORAL_* on non-temporal sources | Low |
| MatchedItemGroup subclassing list is fragile | Low |
| Missing tests for edge cases | Low |
The most critical item to resolve before merge is Bug 1 — the non-ingest path not forwarding group_time_ranges to materializers. Without that fix, the feature won't work for the climate data sources it was apparently designed for.
|
I think claude is right that MatchedItemGroup may be better with dataclass especially now that all data sources use it (so there is no need to maintain interoperability with list). For |
|
I guess it might involve adding |
|
Implemented the direct-materialization plumbing for
Implemented MatchedItemGroup as dataclass |
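The fragility concern with subclassing list, and why a dataclass avoids it, can be illustrated with a small sketch. The class and field names below (ListGroup, MatchedItemGroupSketch, items, time_range) are hypothetical and are not taken from the rslearn code; plain strings stand in for real item objects.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


class ListGroup(list):
    """List subclass carrying an extra attribute, shown only for comparison."""

    def __init__(self, items, time_range=None):
        super().__init__(items)
        self.time_range = time_range


@dataclass
class MatchedItemGroupSketch:
    """Hypothetical stand-in for MatchedItemGroup; field names are assumptions."""

    items: list[str]
    # None is taken here to mean "use the whole window time range".
    time_range: Optional[tuple[datetime, datetime]] = None


period = (datetime(2024, 1, 1), datetime(2024, 1, 11))

# Slicing a list subclass returns a plain list, so the attribute is silently lost.
list_group = ListGroup(["a", "b"], time_range=period)
sliced = list_group[:1]

# With a dataclass the metadata is an explicit field and survives a copy.
group = MatchedItemGroupSketch(items=["a", "b"], time_range=period)
trimmed = replace(group, items=group.items[:1])
```

Here sliced is an ordinary list with no time_range attribute, while trimmed still carries the original time range alongside its shortened item list.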
…aggregations # Conflicts: # rslearn/data_sources/earthdaily.py
…ith group_time_ranges.
…d code references
Address #554
This PR adds native temporal reducers for multi-temporal raster layers, making it possible to materialize climate variables directly as aggregated features over the same temporal windows used by EO layers such as Sentinel-2.
Added new raster compositing methods:
Implemented temporal reduction over the raster time dimension after clipping to the effective request interval.
Extended prepare/materialize metadata so item groups can retain their exact per-group time ranges.
Updated documentation to describe the new compositing methods and the additional group time metadata.
Added unit tests covering:
No intended breaking changes.
The changes are backward-compatible in the main paths:
The only behavioral change is for raster layers that use period_duration and a temporal stack/reducer compositing method. For those layers, prepare now records exact per-group time ranges and materialize uses those per-group intervals instead of only the whole window time range.
Two practical notes:
items.json may now include an extra optional field, group_time_ranges
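As a rough sketch of how per-group time ranges and an optional metadata field can stay backward-compatible, the example below splits a window into periods and reads the field with a fallback. Only the field name group_time_ranges comes from the description above; the item_groups key, the ISO-string encoding, the forward splitting order, and both helper functions are assumptions for illustration and may not match the actual items.json layout or the matching logic in rslearn.

```python
import json
from datetime import datetime, timedelta, timezone


def split_into_periods(window_start, window_end, period_duration):
    """Return consecutive [start, end) ranges covering the window; the last is clipped."""
    ranges = []
    cur = window_start
    while cur < window_end:
        nxt = min(cur + period_duration, window_end)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges


def load_group_time_ranges(layer_entry, window_range):
    """Read the optional field; entries without it fall back to the window range."""
    raw = layer_entry.get("group_time_ranges")
    if raw is None:
        return [window_range] * len(layer_entry["item_groups"])
    return [(datetime.fromisoformat(s), datetime.fromisoformat(e)) for s, e in raw]


window = (
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 31, tzinfo=timezone.utc),
)
ranges = split_into_periods(window[0], window[1], timedelta(days=10))

# A new-style entry carries one time range per item group.
new_entry = {
    "item_groups": [["a"], ["b"], ["c"]],
    "group_time_ranges": [[s.isoformat(), e.isoformat()] for s, e in ranges],
}
# An old-style entry has no such field and must still load.
old_entry = {"item_groups": [["a"]]}

roundtrip = json.loads(json.dumps(new_entry))
new_ranges = load_group_time_ranges(roundtrip, window)
old_ranges = load_group_time_ranges(old_entry, window)
```

Because the field is read with a default, files written before the change load unchanged and every group simply inherits the whole window time range.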