Rewrite clim benchmark with 366-DOY forecast-style climatology by colonesej · Pull Request #33 · ecmwf/hyve

colonesej · 2026-04-17T11:25:11Z

What (edited)

The main goal of this script is to create an ensemble forecast-like dataset from a reanalysis dataset. This forecast represents the climatology at each grid pixel where reanalysis is available. To build an ensemble, different climatological percentiles are used: for instance, with 10 ensemble members, the member values would be the climatological 10th, 20th, ..., 100th percentiles.
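To make the member-to-percentile mapping concrete, here is a minimal sketch. The helper name `ensemble_percentiles` is hypothetical and not part of this PR (the tool itself takes a configurable `--percentiles` list); it only illustrates the evenly spaced levels described above.

```python
def ensemble_percentiles(n_members: int) -> list[float]:
    """Evenly spaced percentile levels, one per ensemble member (illustrative only)."""
    if n_members < 1:
        raise ValueError("n_members must be >= 1")
    step = 100.0 / n_members
    # Member i (1-based) takes the (i * step)-th climatological percentile.
    return [round(step * (i + 1), 6) for i in range(n_members)]


levels = ensemble_percentiles(10)
```

With 10 members this yields the 10th through 100th percentiles in steps of 10.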

The script should build one forecast for each day of year (DOY).

Sampling is determined by window and stride: the window is the number of steps around the target DOY to sample from, and the stride is the sampling frequency inside this window (every day, week or month).
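A rough sketch of how window and stride could translate into day offsets around a target DOY. Two things here are assumptions and not confirmed by the PR: the window is treated as a half-width on each side of the target, and the stride steps are taken as 1, 7 and 30 days. The names `STRIDE_STEP_DAYS` and `window_offsets` are hypothetical.

```python
# Assumed step size in days for each categorical stride value.
STRIDE_STEP_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def window_offsets(window_days: int, stride: str) -> list[int]:
    """Day offsets sampled around a target DOY (symmetric window assumed)."""
    step = STRIDE_STEP_DAYS[stride]
    n_steps = window_days // step  # whole strides that fit on each side
    return [k * step for k in range(-n_steps, n_steps + 1)]


daily = window_offsets(2, "daily")
weekly = window_offsets(7, "weekly")
```

Under these assumptions a weekly stride with a 7-day window samples the target day plus the days one week before and after it.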

For sub-daily data (like EFAS), we can centre the sampling on the usual forecast issue times (00 and 12), which produces slightly different windows for each.
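As an illustration of splitting sub-daily timesteps by issue time, the sketch below groups timestep indices into issue-hour blocks. The grouping rule (each timestep belongs to the issue-hour block it falls in) is an assumption, and both function names are hypothetical; the PR's actual splitting logic lives in `dates.py` and may differ.

```python
from datetime import datetime, timedelta


def issue_hours(issue_frequency_hours: int) -> list[int]:
    """Issue hours in a day for a frequency that divides 24."""
    if issue_frequency_hours < 1 or 24 % issue_frequency_hours != 0:
        raise ValueError("issue_frequency_hours must be a divisor of 24")
    return list(range(0, 24, issue_frequency_hours))


def partition_by_issue_hour(times, issue_frequency_hours: int) -> dict[int, list[int]]:
    """Group timestep indices by the issue-hour block they fall in (assumed rule)."""
    groups: dict[int, list[int]] = {h: [] for h in issue_hours(issue_frequency_hours)}
    for i, t in enumerate(times):
        block = (t.hour // issue_frequency_hours) * issue_frequency_hours
        groups[block].append(i)
    return groups


# Two days of 6-hourly data, split into 00 and 12 issue times.
start = datetime(2001, 1, 1)
times = [start + timedelta(hours=6 * k) for k in range(8)]
groups = partition_by_issue_hour(times, 12)
```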

Computationally, the strategy is to manually define blocks of required indices for each targeted step, then compute each block separately to keep the memory footprint under control.

The simplest case is daily data, a daily window, one issue time at 00, one quantile (the median) and one lead-time step. This should produce the climatological median for each day of the reanalysis. It can be verified using GloFAS data.

Leap years: We are changing the leap-year convention slightly from the original script, so that the output always has 366 days. 29 February is assigned its natural DOY of 60 (which matches 1 March in non-leap years). DOY 366 always receives the value valid for 31 December: a unique value in leap years, or a duplicate of DOY 365 in non-leap years.
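The convention above can be sketched with the standard library alone. `doy_slots` is a hypothetical helper, not a function from this PR; it returns the DOY pools a single date would contribute to.

```python
import calendar
from datetime import date


def doy_slots(d: date) -> list[int]:
    """DOY pools that a date contributes to under the 366-day convention."""
    doy = d.timetuple().tm_yday  # natural day of year: 1..365, or 1..366 in leap years
    slots = [doy]
    # 31 December of a non-leap year is also used for DOY 366.
    if not calendar.isleap(d.year) and d.month == 12 and d.day == 31:
        slots.append(366)
    return slots


leap_day = doy_slots(date(2020, 2, 29))
non_leap_dec31 = doy_slots(date(2021, 12, 31))
```

Note that 29 February 2020 and 1 March 2021 both land in DOY 60, which is the behaviour described in the paragraph above.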

Why

The previous hyve-clim-benchmark date handling was brittle (leap-day behavior, hardcoded output calendar assumptions, stride semantics tied to raw timestep indexing). This PR rewrites the tool around explicit 366-day DOY logic and forecast-like output structure.

Original plan (executed)

Replace monolithic script with a modular package.
Implement pure date/index logic with explicit validation and deterministic DOY pools.
Compute climatology as (doy, issue_hour, ensemble, *space).
Add fit-for-purpose tests for edge cases and the main operational use cases.
Remove legacy implementation after coverage is in place.

What changed

Replaced src/hyve/tools/clim_benchmark.py with package modules:
- src/hyve/tools/clim_benchmark/config.py
- src/hyve/tools/clim_benchmark/dates.py
- src/hyve/tools/clim_benchmark/sampling.py
- src/hyve/tools/clim_benchmark/percentiles.py
- src/hyve/tools/clim_benchmark/io.py
- src/hyve/tools/clim_benchmark/cli.py
- src/hyve/tools/clim_benchmark/__init__.py

Date and calendar model

Output is always indexed by integer doy=1..366.
Non-leap Dec 31 contributes to both the DOY 365 and DOY 366 pools (fallback semantics).
Leap-day is preserved in DOY handling (no leap-day drop/remap bug).
Window sampling wraps around year boundaries on the 366-day canonical calendar.
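The wrap-around can be illustrated with the same modular expression that appears in the diff reviewed in this PR (`((target_doy - 1 + int(off)) % 366) + 1`). The function names below are hypothetical wrappers around that expression.

```python
def wrap_doy(target_doy: int, offset: int) -> int:
    """Shift a 1-based DOY by an offset, wrapping on the 366-day calendar."""
    return ((target_doy - 1 + offset) % 366) + 1


def pool_doys(target_doy: int, offsets: list[int]) -> list[int]:
    """Source DOYs sampled for one target DOY."""
    return [wrap_doy(target_doy, off) for off in offsets]


# A +/- 2 day window around 1 January reaches back into late December.
new_year_pool = pool_doys(1, [-2, -1, 0, 1, 2])
```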

CLI and configuration

New categorical stride argument: --stride {daily,weekly,monthly}.
Validation for stride/window consistency:
- daily requires window_days >= 1
- weekly requires window_days >= 7
- monthly requires window_days >= 30
Added --issue-frequency (hours, divisor of 24) and validation against inferred input timestep.
Added configurable --percentiles list.
Output keeps input variable name (no hardcoded dis).
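A minimal sketch of the validation rules listed above, written with a plain dataclass so it runs without extra dependencies. The class name `WindowConfig`, the helper `validation_error` and the error messages are all hypothetical; the PR's real config lives in `config.py` and may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional

# Minimum window per stride, as listed in the PR description.
MIN_WINDOW_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


@dataclass(frozen=True)
class WindowConfig:
    window_days: int
    stride: str
    issue_frequency_hours: int = 24

    def __post_init__(self) -> None:
        if self.stride not in MIN_WINDOW_DAYS:
            raise ValueError(f"unknown stride: {self.stride!r}")
        minimum = MIN_WINDOW_DAYS[self.stride]
        if self.window_days < minimum:
            raise ValueError(f"stride={self.stride} requires window_days >= {minimum}")
        if self.issue_frequency_hours < 1 or 24 % self.issue_frequency_hours != 0:
            raise ValueError("issue_frequency_hours must be a divisor of 24")


def validation_error(**kwargs) -> Optional[str]:
    """Return the validation message for a config, or None if it is valid."""
    try:
        WindowConfig(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None
```

For example, `validation_error(window_days=6, stride="weekly")` reports that a weekly stride needs at least a 7-day window.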

Output semantics

Produces forecast-like climatology shape: (doy, issue_hour, ensemble, *space).
Adds auxiliary month and day coordinates on DOY (Option A decision).
Adds output attrs: time_unit, window_days, stride, issue_frequency_hours, percentiles.
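One plausible way to derive the auxiliary month and day labels for DOY 1..366 is to read them off a leap reference year. This is an assumption for illustration: the reference year and the helper name `doy_month_day` are not taken from the PR.

```python
from datetime import date, timedelta

REFERENCE_LEAP_YEAR = 2000  # assumed; any leap year gives the same labels


def doy_month_day(doy: int) -> tuple[int, int]:
    """(month, day) label for a 1-based DOY on a 366-day calendar."""
    if not 1 <= doy <= 366:
        raise ValueError("doy must be in 1..366")
    d = date(REFERENCE_LEAP_YEAR, 1, 1) + timedelta(days=doy - 1)
    return d.month, d.day


# Parallel label sequences that could be attached as coordinates on the doy dimension.
months, days = zip(*(doy_month_day(d) for d in range(1, 367)))
```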

Computation strategy

Uses pool-by-pool dask.delayed percentile computation with chunk({'time': -1}) for bounded memory (Option A decision).
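The pool-by-pool idea can be sketched in plain NumPy: each DOY pool is reduced on its own and written into a pre-allocated, NaN-filled output. This is a simplified stand-in, not the PR's implementation: it omits the `issue_hour` dimension, uses a sequential loop where the PR uses `dask.delayed`, and the function name `climatology_from_pools` is hypothetical.

```python
import numpy as np


def climatology_from_pools(values, pools, percentiles):
    """Per-pool percentiles of `values` (time, *space) -> (366, ensemble, *space)."""
    q = np.asarray(percentiles, dtype=float) / 100.0
    out = np.full((366, len(percentiles), *values.shape[1:]), np.nan)
    for doy, idx in pools.items():
        idx = np.asarray(idx, dtype=int)
        if idx.size == 0:
            continue  # empty pool: leave NaN for this DOY
        # Only the samples of one pool are reduced at a time.
        out[doy - 1] = np.quantile(values[idx], q, axis=0)
    return out


# Four timesteps, two grid points; DOY 1 pools three samples, DOY 2 pools one.
values = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
pools = {1: [0, 1, 2], 2: [3], 3: []}
clim = climatology_from_pools(values, pools, [50, 100])
```

Processing one pool at a time keeps the working set to a single pool's samples, which is the memory argument made above.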

Tests added

tests/test_clim_benchmark_dates.py
tests/test_clim_benchmark_end_to_end.py

Coverage includes:

irregular/duplicate/non-monotonic timestep detection,
DOY mapping edge cases,
window/stride pool sizes and wrap-around,
leap-day regression cases,
issue-frequency partitioning for sub-daily data,
output shape/metadata/variable-name preservation,
primary use-cases:
- daily data + daily issue,
- 6-hourly data + 12-hour issue,
CLI smoke behavior.
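To show the kind of check the first coverage item refers to, here is a sketch of timestep inference that rejects duplicate, non-monotonic and irregular time axes. It is a hypothetical re-implementation using only the standard library; the PR's own `infer_timestep` operates on a pandas `DatetimeIndex` and its exact rules and messages are not shown on this page.

```python
from datetime import datetime, timedelta


def infer_regular_timestep(times: list[datetime]) -> timedelta:
    """Return the single spacing of a strictly increasing, regular time axis."""
    if len(times) < 2:
        raise ValueError("need at least two timesteps to infer a timestep")
    deltas = {later - earlier for earlier, later in zip(times, times[1:])}
    if any(delta <= timedelta(0) for delta in deltas):
        raise ValueError("duplicate or non-monotonic timesteps")
    if len(deltas) != 1:
        raise ValueError("irregular timestep spacing")
    return deltas.pop()


regular = [datetime(2001, 1, 1) + timedelta(hours=6 * k) for k in range(4)]
step = infer_regular_timestep(regular)
```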

Validation

Full repository test suite passes locally: 62 passed.

Notes

This intentionally introduces a CLI behavior update for hyve-clim-benchmark consistent with the rewrite goals.

codecov-commenter · 2026-04-17T11:27:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (91a2952) to head (2ed6a66).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##              main       #33    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            2         5     +3     
  Lines          108       385   +277     
==========================================
+ Hits           108       385   +277


colonesej · 2026-04-17T12:25:51Z

Hi @andreas-grafberger, can I ask you to review the main requirements and test coverage in this PR? Do they deviate too much from the previous implementation? Is it a good starting point for reviewing the clim-benchmark script?

Copilot

Pull request overview

This PR rewrites the hyve-clim-benchmark tool into a modular package centered on an explicit 366-DOY calendar model and forecast-like output structure, replacing the legacy monolithic script and adding targeted unit + end-to-end tests.

Changes:

Replaced src/hyve/tools/clim_benchmark.py with a package implementation (config, dates, sampling, percentiles, io, cli) built around deterministic 366-DOY pooling and (doy, issue_hour, ensemble, *space) outputs.
Added CLI/config validation for stride/window, issue-frequency, and percentile selection; preserved input variable name in output.
Added unit tests for date/index logic and end-to-end tests for key operational scenarios and leap-year semantics.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/test_clim_benchmark_end_to_end.py	End-to-end coverage for daily + sub-daily use-cases, output shape/attrs, leap-year behavior, and CLI smoke.
tests/test_clim_benchmark_dates.py	Unit coverage for timestep inference, DOY mapping/pools, issue-hour splitting, and config validation.
src/hyve/tools/clim_benchmark/sampling.py	Builds (doy, issue_hour) slots and gathers time samples for per-slot computations.
src/hyve/tools/clim_benchmark/percentiles.py	Computes per-slot quantiles and assembles the final climatology array.
src/hyve/tools/clim_benchmark/io.py	Adds auxiliary `(month, day)` coords and writes NetCDF with metadata attrs.
src/hyve/tools/clim_benchmark/dates.py	Core DOY logic: DOY mapping, pool construction with wrap-around, issue-hour splitting.
src/hyve/tools/clim_benchmark/config.py	Pydantic config + validation for window/stride, issue frequency, and percentiles.
src/hyve/tools/clim_benchmark/cli.py	CLI argument parsing and orchestration of the pipeline.
src/hyve/tools/clim_benchmark/__init__.py	Exposes `main` for the `hyve-clim-benchmark` entrypoint.
src/hyve/tools/clim_benchmark.py	Removes the legacy script implementation.


+    delayed_results: dict[Slot, xr.DataArray] = {}
+    for slot_key, indices in slots.items():
+        sub = gather(da, indices)
+        delayed_results[slot_key] = dask.delayed(_quantile_slot)(sub, quantiles)
+
+    computed_list = dask.compute(*delayed_results.values())
+    computed: dict[Slot, xr.DataArray] = dict(zip(delayed_results.keys(), computed_list))
+
+    # Build the full output array: (doy, issue_hour, ensemble, *space).
+    issue_hours = sorted({ih for (_, ih) in slots.keys()})
+    space_dims = [d for d in da.dims if d != "time"]
+    space_sizes = {d: da.sizes[d] for d in space_dims}
+    space_coords = {d: da.coords[d] for d in space_dims if d in da.coords}
+
+    out_shape = (366, len(issue_hours), len(percentiles), *space_sizes.values())
+    out_dtype = next(iter(computed.values())).dtype if computed else da.dtype
+    data = np.full(out_shape, np.nan, dtype=out_dtype)




+    ds_in = xr.open_dataset(input_path, chunks={})
+    da = _select_variable(ds_in, variable)
+    if config.start_date is not None or config.end_date is not None:
+        da = da.sel(time=slice(config.start_date, config.end_date))
+
+    time_index = pd.DatetimeIndex(da["time"].values)
+    timestep = infer_timestep(time_index)
+    timestep_hours = timestep.total_seconds() / 3600.0
+    config.validate_against_data(timestep_hours)
+
+    logger.info(
+        "timestep=%s, window_days=%d, stride=%s, issue_frequency_hours=%d",
+        timestep,
+        config.window_days,
+        config.stride,
+        config.issue_frequency_hours,
+    )
+
+    slots = build_slots(
+        da,
+        window_days=config.window_days,
+        stride=config.stride,
+        issue_frequency_hours=config.issue_frequency_hours,
+    )
+    logger.info("Built %d (doy, issue_hour) slots", len(slots))
+
+    clim_da = compute_climatology(da, slots, config.percentiles)
+    ds_out = build_output_dataset(clim_da, config, timestep)
+
+    if output_path:
+        logger.info("Writing %s", output_path)
+        write_netcdf(ds_out, output_path)
+    return ds_out


+    buckets: dict[int, list[int]] = {d: [] for d in range(1, 367)}
+    for i, d in enumerate(doys):
+        buckets[int(d)].append(i)
+    # Duplicate non-leap Dec 31 into DOY 366.
+    non_leap_dec31 = np.where(
+        (~is_leap_year(years)) & (months == 12) & (days == 31)
+    )[0]
+    for i in non_leap_dec31:
+        buckets[366].append(int(i))
+
+    return {d: np.asarray(sorted(ix), dtype=np.int64) for d, ix in buckets.items()}


+            src_doy = ((target_doy - 1 + int(off)) % 366) + 1
+            parts.append(doy_to_idx[src_doy])
+        if parts:
+            pools[target_doy] = np.concatenate(parts)


Rewrite clim benchmark around 366-DOY forecast-style climatology

2ed6a66

colonesej requested a review from Copilot April 17, 2026 12:24

Copilot started reviewing on behalf of colonesej April 17, 2026 12:24

colonesej added this to HyVe Apr 17, 2026

github-project-automation Bot moved this to Backlog in HyVe Apr 17, 2026

Copilot AI reviewed Apr 17, 2026

colonesej assigned colonesej and unassigned colonesej May 26, 2026

colonesej moved this from Backlog to In progress in HyVe Jun 5, 2026

Reduce memory usage and add better error message

989d91e

colonesej moved this from In progress to In review in HyVe Jun 19, 2026

Merge pull request #35 from andy-fat-potato-uk/feature/clim-benchmark…

a1ecdd1

…-doy-rewrite Reduce memory usage and add better error message

colonesej self-assigned this Jun 23, 2026

colonesej wants to merge 3 commits into main from feature/clim-benchmark-doy-rewrite

3 participants
