Skip to content

Rewrite clim benchmark with 366-DOY forecast-style climatology#33

Open
colonesej wants to merge 3 commits into
mainfrom
feature/clim-benchmark-doy-rewrite
Open

Rewrite clim benchmark with 366-DOY forecast-style climatology#33
colonesej wants to merge 3 commits into
mainfrom
feature/clim-benchmark-doy-rewrite

Conversation

@colonesej

@colonesej colonesej commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

What (edited)

The main goal of this script is to create an ensemble forecast-like dataset from the reanalysis dataset. This forecast represents the climatology for each pixel grid where reanalysis is available. to build an ensemble, different climatology percentiles are used, for intance with 10 ensemble members, we would have their values as the climatological 10th, 20th, ...., 100th quantile.

The script should build one forecast for different day-of-years.

Sampling is determined by window and stride, where the first represents the number of steps around the target doy to sample from, and stripe represents the frequency inside this window to sample (every day, week or month)

For sub-daily data (like EFAS), we can sample centering on the usual forecast issue time (00 and 12) then, producing slightly different windows for each.

Computationally the strategy is to manually define computing blocks of required indices for each targeted step, then computing each block separately and keep memory footprint under control.

The simplest case, is daily data, daily window, 1 issue time, at 00, one quantile (median) and one leadtime step. This should produce the climatological median for each day of the reanalysis. It can be verified using GloFAS data.

Lead years: We are changing a bit the leap-year convention from the original script, meaning this should always produce a 366 days output, where 29thFeb is assigned its natural doy of 60 (will match 1st of March in non-leap years). the 366th doy always receive the value valid for the 31st of December, either a unique value or a duplicate for 365th in non-leap years.

Why

The previous hyve-clim-benchmark date handling was brittle (leap-day behavior, hardcoded output calendar assumptions, stride semantics tied to raw timestep indexing). This PR rewrites the tool around explicit 366-day DOY logic and forecast-like output structure.

Original plan (executed)

  1. Replace monolithic script with a modular package.
  2. Implement pure date/index logic with explicit validation and deterministic DOY pools.
  3. Compute climatology as (doy, issue_hour, ensemble, *space).
  4. Add fit-for-purpose tests for edge cases and the main operational use cases.
  5. Remove legacy implementation after coverage is in place.

What changed

  • Replaced src/hyve/tools/clim_benchmark.py with package modules:
    • src/hyve/tools/clim_benchmark/config.py
    • src/hyve/tools/clim_benchmark/dates.py
    • src/hyve/tools/clim_benchmark/sampling.py
    • src/hyve/tools/clim_benchmark/percentiles.py
    • src/hyve/tools/clim_benchmark/io.py
    • src/hyve/tools/clim_benchmark/cli.py
    • src/hyve/tools/clim_benchmark/__init__.py

Date and calendar model

  • Output is always indexed by integer doy=1..366.
  • Non-leap Dec 31 contributes to both DOY 365 and DOY 366 pool (fallback semantics).
  • Leap-day is preserved in DOY handling (no leap-day drop/remap bug).
  • Window sampling wraps around year boundaries on the 366-day canonical calendar.

CLI and configuration

  • New categorical stride argument: --stride {daily,weekly,monthly}.
  • Validation for stride/window consistency:
    • daily requires window_days >= 1
    • weekly requires window_days >= 7
    • monthly requires window_days >= 30
  • Added --issue-frequency (hours, divisor of 24) and validation against inferred input timestep.
  • Added configurable --percentiles list.
  • Output keeps input variable name (no hardcoded dis).

Output semantics

  • Produces forecast-like climatology shape: (doy, issue_hour, ensemble, *space).
  • Adds auxiliary month and day coordinates on DOY (Option A decision).
  • Adds output attrs: time_unit, window_days, stride, issue_frequency_hours, percentiles.

Computation strategy

  • Uses pool-by-pool dask.delayed percentile computation with chunk({'time': -1}) for bounded memory (Option A decision).

Tests added

  • tests/test_clim_benchmark_dates.py
  • tests/test_clim_benchmark_end_to_end.py

Coverage includes:

  • irregular/duplicate/non-monotonic timestep detection,
  • DOY mapping edge cases,
  • window/stride pool sizes and wrap-around,
  • leap-day regression cases,
  • issue-frequency partitioning for sub-daily data,
  • output shape/metadata/variable-name preservation,
  • primary use-cases:
    • daily data + daily issue,
    • 6-hourly data + 12-hour issue,
  • CLI smoke behavior.

Validation

  • Full repository test suite passes locally: 62 passed.

Notes

  • This intentionally introduces a CLI behavior update for hyve-clim-benchmark consistent with the rewrite goals.

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (91a2952) to head (2ed6a66).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main       #33    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            2         5     +3     
  Lines          108       385   +277     
==========================================
+ Hits           108       385   +277     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@colonesej

Copy link
Copy Markdown
Collaborator Author

Hi @andreas-grafberger, can I ask you to review the main requirements and test coverages in this PR!? Do they deviate too much from the previous implementation? Is it a good starting point in reviewing the clim-benchmark script?

@colonesej colonesej added this to HyVe Apr 17, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in HyVe Apr 17, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR rewrites the hyve-clim-benchmark tool into a modular package centered on an explicit 366-DOY calendar model and forecast-like output structure, replacing the legacy monolithic script and adding targeted unit + end-to-end tests.

Changes:

  • Replaced src/hyve/tools/clim_benchmark.py with a package implementation (config, dates, sampling, percentiles, io, cli) built around deterministic 366-DOY pooling and (doy, issue_hour, ensemble, *space) outputs.
  • Added CLI/config validation for stride/window, issue-frequency, and percentile selection; preserved input variable name in output.
  • Added unit tests for date/index logic and end-to-end tests for key operational scenarios and leap-year semantics.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_clim_benchmark_end_to_end.py End-to-end coverage for daily + sub-daily use-cases, output shape/attrs, leap-year behavior, and CLI smoke.
tests/test_clim_benchmark_dates.py Unit coverage for timestep inference, DOY mapping/pools, issue-hour splitting, and config validation.
src/hyve/tools/clim_benchmark/sampling.py Builds (doy, issue_hour) slots and gathers time samples for per-slot computations.
src/hyve/tools/clim_benchmark/percentiles.py Computes per-slot quantiles and assembles the final climatology array.
src/hyve/tools/clim_benchmark/io.py Adds auxiliary (month, day) coords and writes NetCDF with metadata attrs.
src/hyve/tools/clim_benchmark/dates.py Core DOY logic: DOY mapping, pool construction with wrap-around, issue-hour splitting.
src/hyve/tools/clim_benchmark/config.py Pydantic config + validation for window/stride, issue frequency, and percentiles.
src/hyve/tools/clim_benchmark/cli.py CLI argument parsing and orchestration of the pipeline.
src/hyve/tools/clim_benchmark/init.py Exposes main for the hyve-clim-benchmark entrypoint.
src/hyve/tools/clim_benchmark.py Removes the legacy script implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +54 to +70
delayed_results: dict[Slot, xr.DataArray] = {}
for slot_key, indices in slots.items():
sub = gather(da, indices)
delayed_results[slot_key] = dask.delayed(_quantile_slot)(sub, quantiles)

computed_list = dask.compute(*delayed_results.values())
computed: dict[Slot, xr.DataArray] = dict(zip(delayed_results.keys(), computed_list))

# Build the full output array: (doy, issue_hour, ensemble, *space).
issue_hours = sorted({ih for (_, ih) in slots.keys()})
space_dims = [d for d in da.dims if d != "time"]
space_sizes = {d: da.sizes[d] for d in space_dims}
space_coords = {d: da.coords[d] for d in space_dims if d in da.coords}

out_shape = (366, len(issue_hours), len(percentiles), *space_sizes.values())
out_dtype = next(iter(computed.values())).dtype if computed else da.dtype
data = np.full(out_shape, np.nan, dtype=out_dtype)
space_coords = {d: da.coords[d] for d in space_dims if d in da.coords}

out_shape = (366, len(issue_hours), len(percentiles), *space_sizes.values())
out_dtype = next(iter(computed.values())).dtype if computed else da.dtype
Comment on lines +94 to +126
ds_in = xr.open_dataset(input_path, chunks={})
da = _select_variable(ds_in, variable)
if config.start_date is not None or config.end_date is not None:
da = da.sel(time=slice(config.start_date, config.end_date))

time_index = pd.DatetimeIndex(da["time"].values)
timestep = infer_timestep(time_index)
timestep_hours = timestep.total_seconds() / 3600.0
config.validate_against_data(timestep_hours)

logger.info(
"timestep=%s, window_days=%d, stride=%s, issue_frequency_hours=%d",
timestep,
config.window_days,
config.stride,
config.issue_frequency_hours,
)

slots = build_slots(
da,
window_days=config.window_days,
stride=config.stride,
issue_frequency_hours=config.issue_frequency_hours,
)
logger.info("Built %d (doy, issue_hour) slots", len(slots))

clim_da = compute_climatology(da, slots, config.percentiles)
ds_out = build_output_dataset(clim_da, config, timestep)

if output_path:
logger.info("Writing %s", output_path)
write_netcdf(ds_out, output_path)
return ds_out
Comment on lines +94 to +104
buckets: dict[int, list[int]] = {d: [] for d in range(1, 367)}
for i, d in enumerate(doys):
buckets[int(d)].append(i)
# Duplicate non-leap Dec 31 into DOY 366.
non_leap_dec31 = np.where(
(~is_leap_year(years)) & (months == 12) & (days == 31)
)[0]
for i in non_leap_dec31:
buckets[366].append(int(i))

return {d: np.asarray(sorted(ix), dtype=np.int64) for d, ix in buckets.items()}
src_doy = ((target_doy - 1 + int(off)) % 366) + 1
parts.append(doy_to_idx[src_doy])
if parts:
pools[target_doy] = np.concatenate(parts)
@colonesej colonesej assigned colonesej and unassigned colonesej May 26, 2026
@colonesej colonesej moved this from Backlog to In progress in HyVe Jun 5, 2026
@colonesej colonesej moved this from In progress to In review in HyVe Jun 19, 2026
…-doy-rewrite

Reduce memory usage and add better error message
@colonesej colonesej self-assigned this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants