Skip to content

[New feature] Implement temporal aggregations#555

Merged
favyen2 merged 17 commits intoallenai:masterfrom
robmarkcole:implement-temporal-aggregations
Mar 27, 2026
Merged

[New feature] Implement temporal aggregations#555
favyen2 merged 17 commits intoallenai:masterfrom
robmarkcole:implement-temporal-aggregations

Conversation

@robmarkcole
Copy link
Copy Markdown
Collaborator

@robmarkcole robmarkcole commented Mar 11, 2026

Address #554

This PR adds native temporal reducers for multi-temporal raster layers, making it possible to materialize climate variables directly as aggregated features over the same temporal windows used by EO layers such as Sentinel-2.

Added new raster compositing methods:

  • TEMPORAL_MEAN
  • TEMPORAL_MAX
  • TEMPORAL_MIN

Implemented temporal reduction over the raster time dimension after clipping to the effective request interval.
Extended prepare/materialize metadata so item groups can retain their exact per-group time ranges.
Updated documentation to describe the new compositing methods and the additional group time metadata.

Added unit tests covering:

  • temporal reducer behavior
  • storage/serialization of group time ranges
  • prepare-stage handling of period-based group time ranges

No intended breaking changes.

The changes are backward-compatible in the main paths:

  • existing datasets still load because group_time_ranges is optional in stored layer metadata
  • existing compositing methods are unchanged
  • the new reducers are opt-in

The only behavioral change is for raster layers that use period_duration and a temporal stack/reducer compositing method. For those layers, prepare now records exact per-group time ranges and materialize uses those per-group intervals instead of only the whole window time range. T
Two practical notes:

  • items.json now may include an extra optional field, group_time_ranges
  • if any external tooling reads items.json with a strict schema, it may need to tolerate that additional field

…RAL_MAX, and TEMPORAL_MIN methods; update dataset format and related classes to handle group time ranges.
Comment thread rslearn/dataset/manage.py Outdated
@robmarkcole robmarkcole changed the title [DRAFT] Implement temporal aggregations [New feature] Implement temporal aggregations Mar 11, 2026
@robmarkcole
Copy link
Copy Markdown
Collaborator Author

@favyen2 any thoughts? If this is a NOGO I need to start looking at alternative approaches. Thanks

Copy link
Copy Markdown
Collaborator

@favyen2 favyen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the group time ranges looks good to me, it allows using temporal stack with period_duration which previously wouldn't have worked, and might mean we can get rid of this special case:

def compute_expected_timestamps(

However I don't think it should be handled in manage.py, instead it feels like something that should come out of the match_candidate_items_to_window function. Currently it seems the per-period matching logic is now duplicated.

The temporal stack seems fine to me, currently we have been doing aggregation in transform so that we can try different aggregations (so the materialized data is the most fine-grained data we might use), but I could see it being useful. One alternative would be to change the MEAN and MEDIAN modes to operate over multiple timesteps but it wouldn't have the clipping from temporal stack. Another limitation is it doesn't support aggregating over periods, e.g. you might want daily max precipitation or something like that, but we could add it later when it is needed.

Comment thread rslearn/dataset/manage.py Outdated
Comment thread rslearn/dataset/manage.py Outdated
Comment thread rslearn/dataset/manage.py Outdated
Comment thread rslearn/train/dataset.py Outdated
@favyen2
Copy link
Copy Markdown
Collaborator

favyen2 commented Mar 26, 2026

Here is Claude review:

I've launched a review agent to analyze PR #555 "[New feature] Implement temporal aggregations". It's reading through the full diff and relevant source files. I'll share the detailed review once it completes.

Here's the full review of PR #555:


Code Review: PR #555 — Implement Temporal Aggregations

Overview

The PR adds three new compositing methods (TEMPORAL_MEAN, TEMPORAL_MAX, TEMPORAL_MIN) and reworks per-group time range tracking across prepare/materialize/train. The core logic is sound but there are several issues worth addressing before merge.


Bugs

Bug 1 (High Impact) — Non-ingest path never receives group_time_ranges

In rslearn/dataset/manage.py, the non-ingest data_source.materialize() call does not forward group_time_ranges:

retry(
    fn=lambda: data_source.materialize(
        window, item_groups, layer_name, layer_cfg  # missing group_time_ranges
    ),
    ...
)

ERA5Land and similar climate data sources use ingest=False, so they use this code path. Since the PR's primary motivation appears to be temporal aggregation of climate variables, this means the feature is incomplete for exactly those data sources.

Bug 2 — Misleading error message

If TEMPORAL_MEAN/MAX/MIN is configured on a non-temporal data source, the error raised says "SPATIAL_MOSAIC_TEMPORAL_STACK requires items to have timestamps" — referencing the wrong method. Should be caught and re-raised with a context-appropriate message.

Bug 3 — Malformed JSON example in docs/DatasetFormat.md

The diff adds an extra ], that closes an already-closed bracket, likely making the JSON example invalid. Worth double-checking the rendered output.


Breaking Changes (Despite Claims Otherwise)

compute_expected_timestamps removed from rslearn/train/dataset.py

This is a public function (no _ prefix). Any code importing it from outside the package will get an ImportError. This should either be deprecated first or explicitly documented as a breaking change.

Materializer.materialize() signature change

The new optional group_time_ranges parameter means external subclasses with overridden materialize() will have a signature mismatch (mypy/pyright would flag this, though Python won't crash).

MatchedItemGroup returned instead of plain list

Code checking type(result[0]) is list (exact type identity, not isinstance) would break.

group_time_ranges written universally to items.json

The PR implies it's only written for period-duration layers, but the non-period-duration path also wraps groups in MatchedItemGroup, so group_time_ranges is written for all raster layers. Any strict schema reader of items.json will be affected.


Design Concern — MatchedItemGroup subclasses list

class MatchedItemGroup(list):
    def __init__(self, items, request_time_range):
        ...

This is fragile. Slicing or copying loses the metadata:

mg = MatchedItemGroup([item1], time_range)
copy = mg[:]  # plain list — request_time_range is gone

A dataclass or NamedTuple wrapping the list would be safer and more idiomatic.


Test Gaps

  • No test for all-nodata input (would produce RuntimeWarning from NumPy on fully-masked arrays)
  • No test for T=1 single-timestep input with temporal reducers
  • No test for temporal reducers configured on non-temporal data sources (error message verification)
  • No test for the non-ingest materialize path with group_time_ranges
  • No test for the non-period-duration path attaching geometry.time_range to all groups

The 7 removed compute_expected_timestamps tests covered edge cases (time offsets, no time range, etc.) that are now untested — though the simpler replacement function makes this less critical.


Minor Style Issues

  • Three nearly identical wrapper functions (build_temporal_mean_composite, etc.) could be a single factory
  • _extract_group_time_ranges docstring says the fallback "returns geometry time range" but that value can be None

Summary

Issue Severity
Non-ingest path missing group_time_ranges High — feature incomplete for climate sources
compute_expected_timestamps removed without deprecation Medium — breaking API change
Malformed JSON in docs Medium
Misleading error message for TEMPORAL_* on non-temporal sources Low
MatchedItemGroup subclassing list is fragile Low
Missing tests for edge cases Low

The most critical item to resolve before merge is Bug 1 — the non-ingest path not forwarding group_time_ranges to materializers. Without that fix, the feature won't work for the climate data sources it was apparently designed for.

@favyen2
Copy link
Copy Markdown
Collaborator

favyen2 commented Mar 26, 2026

I think claude is right that MatchedItemGroup may be better with dataclass especially now that all data sources use it (so there is no need to maintain interoperability with list).

For group_time_ranges, do you think there is a clean way to use it during direct materialization?

@favyen2
Copy link
Copy Markdown
Collaborator

favyen2 commented Mar 26, 2026

I guess it might involve adding group_time_ranges: list[tuple[datetime, datetime] | None] | None = None to DataSource.materialize signature? And then passing it to the RasterMaterializer?

@allenai allenai deleted a comment from claude bot Mar 26, 2026
@robmarkcole
Copy link
Copy Markdown
Collaborator Author

robmarkcole commented Mar 26, 2026

Implemented the direct-materialization plumbing for group_time_ranges:

  • DataSource.materialize now accepts optional group_time_ranges in the base interface
  • materialize_window now passes stored layer_data.group_time_ranges for direct materialization when supported, with backward compatibility for older custom sources (signature check)
  • Direct materializers now forward that argument into RasterMaterializer

Implemented MatchedItemGroup as dataclass

Comment thread rslearn/dataset/manage.py Outdated
Comment thread rslearn/data_sources/utils.py Outdated
@favyen2 favyen2 closed this Mar 27, 2026
@favyen2 favyen2 reopened this Mar 27, 2026
@favyen2 favyen2 merged commit da431b3 into allenai:master Mar 27, 2026
4 of 5 checks passed
@robmarkcole robmarkcole deleted the implement-temporal-aggregations branch March 27, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants