
feat(Germany): Add Germany solar generation data pipeline #134

Open
Sharkyii wants to merge 2 commits into openclimatefix:main from Sharkyii:germany

@Sharkyii

Pull Request

Description

Adds a complete end-to-end pipeline for Germany solar PV forecasting, including data download scripts for SMARD PV generation and GFS weather data, a processing and validation pipeline, and a baseline model training script, all configurable via YAML configs.

Fixes #121

How Has This Been Tested?

All scripts were validated end-to-end against real data:

  • PV data: 34,944 time steps, 1 GSP region (Germany, 2021)
  • GFS data: 28 init times, 17 forecast steps, 33×41 spatial grid
  • Verified data loading, validation (negative-value checks, NaN percentage), temporal alignment, and normalization constant calculation
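For reviewers, the validation checks listed above (negative values and NaN percentage) can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Python, not the code in germany_pipeline.py; the function name `validate_series` and the 5% NaN threshold are assumptions.

```python
import math
from typing import Optional, Sequence


def validate_series(
    values: Sequence[Optional[float]], max_nan_pct: float = 5.0
) -> dict:
    """Hypothetical validation helper (not the PR's code).

    Counts negative readings and the percentage of missing (None or NaN)
    entries, and marks the series as failing if any value is negative or
    the missing share exceeds max_nan_pct (an assumed threshold).
    """
    n_total = len(values)
    # Treat both None and float NaN as missing
    missing = [v is None or math.isnan(v) for v in values]
    n_missing = sum(missing)
    # Only count negatives among values that are present
    n_negative = sum(
        1 for v, is_missing in zip(values, missing) if not is_missing and v < 0
    )
    nan_pct = 100.0 * n_missing / n_total if n_total else 0.0
    return {
        "n_total": n_total,
        "n_negative": n_negative,
        "nan_pct": nan_pct,
        "ok": n_negative == 0 and nan_pct <= max_nan_pct,
    }
```

For example, `validate_series([0.0, 1.5, None, -0.2, float("nan")])` reports one negative value and 40% missing, so `ok` is `False`.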

  • Yes

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@Sharkyii
Author

Adds a complete end-to-end pipeline for training solar PV forecasting models for Germany, using GFS weather data (NOAA) and PV generation data from the SMARD API.
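As a rough sketch of how the SMARD download could work, the snippet below builds request URLs and parses a returned chunk. It assumes SMARD's public `chart_data` JSON layout (an index of chunk timestamps, then one file per chunk whose `series` holds `[timestamp_ms, value]` pairs) and filter ID 4068 for solar generation; both are assumptions to check against the SMARD API documentation, and none of it is taken from downloading_pv_germany.py. No network calls are made here.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# Assumptions: SMARD's public chart_data URL layout, and 4068 as the
# filter ID for solar (Photovoltaik) generation. Verify both before use.
SMARD_BASE = "https://www.smard.de/app/chart_data"
PV_FILTER = 4068
REGION = "DE"


def index_url(resolution: str = "quarterhour") -> str:
    """URL of the index listing available chunk timestamps (ms since epoch)."""
    return f"{SMARD_BASE}/{PV_FILTER}/{REGION}/index_{resolution}.json"


def chunk_url(timestamp_ms: int, resolution: str = "quarterhour") -> str:
    """URL of one data chunk, identified by a timestamp from the index."""
    return (
        f"{SMARD_BASE}/{PV_FILTER}/{REGION}/"
        f"{PV_FILTER}_{REGION}_{resolution}_{timestamp_ms}.json"
    )


def parse_series(payload: dict) -> List[Tuple[datetime, float]]:
    """Convert a chunk's [[timestamp_ms, value], ...] series into UTC rows,
    skipping entries whose value is null (missing)."""
    rows = []
    for ts_ms, value in payload.get("series", []):
        if value is None:
            continue
        ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        rows.append((ts, float(value)))
    return rows
```

In practice one would fetch `index_url()`, read its (assumed) `timestamps` list, and request `chunk_url(ts)` for each chunk in the date range before passing each response to `parse_series`.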

What's included

  • Data ingestion: Scripts to download PV generation data (downloading_pv_germany.py) and GFS weather forecasts (download_gfs_germany_fast.py), with support for both recent and historical data via herbie-data
  • Processing pipeline: germany_pipeline.py orchestrates inspection, validation, temporal alignment, and normalization constant generation
  • Model training: train_germany_baseline.py trains a baseline forecasting model from processed Zarr data
  • Configs: GFS download settings, PV data config, regional boundaries, and PVNet datamodule configuration for Germany
  • Utilities: Shared helpers in germany_utils.py
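The alignment and normalization steps that germany_pipeline.py orchestrates could look something like the sketch below. It is a hypothetical simplification using plain Python lists: it trims PV data to the window covered by GFS valid times (init time plus step) and computes a per-channel mean and standard deviation. The real pipeline works on Zarr data and may align differently, for example by matching each PV timestamp to the latest available init time.

```python
import math
from datetime import datetime, timedelta
from statistics import fmean, pstdev
from typing import Dict, Sequence, Tuple


def overlapping_window(
    pv_times: Sequence[datetime],
    init_times: Sequence[datetime],
    steps: Sequence[timedelta],
) -> Tuple[datetime, datetime]:
    """Return the (start, end) period covered by both the PV timestamps and
    the GFS valid times (init_time + step)."""
    valid_times = [t + s for t in init_times for s in steps]
    start = max(min(pv_times), min(valid_times))
    end = min(max(pv_times), max(valid_times))
    if start > end:
        raise ValueError("PV and GFS data do not overlap in time")
    return start, end


def normalization_constants(
    channels: Dict[str, Sequence[float]],
) -> Dict[str, Dict[str, float]]:
    """Per-channel mean and population standard deviation, ignoring NaNs."""
    consts = {}
    for name, values in channels.items():
        finite = [v for v in values if not math.isnan(v)]
        std = pstdev(finite)
        # Fall back to 1.0 for constant channels so (x - mean) / std is safe
        consts[name] = {"mean": fmean(finite), "std": std if std > 0 else 1.0}
    return consts
```

The aligned PV timestamps are then `[t for t in pv_times if start <= t <= end]`, and each channel is normalized as `(x - mean) / std`. Falling back to a standard deviation of 1.0 for constant channels is a choice made in this sketch to avoid division by zero.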

Data format

  • PV: dimensions (datetime_gmt, gsp_id); variables generation_mw, capacity_mwp
  • GFS: dimensions (init_time_utc, step, lat, lon); 14 weather channels at 0.25° resolution
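The GFS grid size follows from the 0.25° resolution: an inclusive regular grid from lo to hi has (hi - lo) / 0.25 + 1 points, so, assuming the 33×41 grid is ordered (lat, lon) as in the dimension list, it spans 8° of latitude by 10° of longitude. The sketch below checks that arithmetic using a hypothetical bounding box (47 to 55°N, 5 to 15°E), chosen only because it has those spans; the PR's actual limits are defined in its regional boundaries config. It also shows the capacity factor, generation_mw / capacity_mwp, as one derived quantity the two PV variables allow.

```python
import math

GFS_RES_DEG = 0.25  # 0.25° resolution, per the data format description

# Hypothetical bounding box, chosen only to illustrate the arithmetic;
# the real limits live in the PR's regional boundaries config.
LAT_MIN, LAT_MAX = 47.0, 55.0
LON_MIN, LON_MAX = 5.0, 15.0


def n_grid_points(lo: float, hi: float, res: float) -> int:
    """Number of points on an inclusive regular grid from lo to hi."""
    return round((hi - lo) / res) + 1


def capacity_factor(generation_mw: float, capacity_mwp: float) -> float:
    """Generation as a fraction of installed capacity (NaN if capacity <= 0)."""
    if capacity_mwp <= 0:
        return math.nan
    return generation_mw / capacity_mwp


n_lat = n_grid_points(LAT_MIN, LAT_MAX, GFS_RES_DEG)  # 8° / 0.25° + 1
n_lon = n_grid_points(LON_MIN, LON_MAX, GFS_RES_DEG)  # 10° / 0.25° + 1
```

Using `round` rather than `int` guards against floating-point results such as 31.999999 when the bounds are not exact multiples of the resolution.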

Check out the README for details.

Next Steps

Model training is still in development.

@Sharkyii Sharkyii changed the title Added germany pipeline feat(Germany): Add Germany solar generation data pipeline Feb 17, 2026

Development

Successfully merging this pull request may close these issues.

Country selection and coordination for PVNet training
