implement evaluation routine for SSL by iluise · Pull Request #1747 · ecmwf/WeatherGenerator

iluise · 2026-01-29T13:54:34Z

Description

Add code for standard inference and evaluation for JEPA, DINOv3, etc.
Usage:

agpu
uv run ssl_analysis --run-id <run id> (optional: --verbose)

Issue Number

Closes #1746

Is this PR a draft? Mark it as draft.

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

* add pin mem to IOReaderData
* add pin mem to sample & modelbatch class
* add pin mem to stream data
* add pin mem to training loop
* run ./scripts/actions.sh lint
* run ./scripts/actions.sh unit-test
* ignore check torch import in package
* move pinning to MultiStreamDataSampler
* add _pin_tensor & _pin_tensor_list helper func
* ruff the code
* move back pin mem. to train loop
* Remove the ignore-import-error rule and revert to the state before the change
* create protocol for pinnable obj
* remove pin_mem from IOReaderData class
* add pin_memory to Trainer.validate
* remove pin_memory from loader_params
* Revert export/export_inference.py to state before c3fc9a7
* change name
* revise Pinnable class description
* add memory_pinning in config, train & val loop
* use getattr to avoid CICD warning
* use setattr to avoid CICD warning
* disable pylint for self.source_tokens_lens
* Fixed issues with memory pinning due to rebasing and also adjusted config position of flag
* Reverting inadvertent changes
---------
Co-authored-by: Javad Kasravi <j.kasravi@fz-juelich.de>
Co-authored-by: Javad Kasravi <jkasravi@santis-ln002.cscs.ch>
Co-authored-by: Javad kasravi <kasravi66@gmail.com>

* split WeatherGenReader functionality to allow reading only JSON adding weathergen JSON reader to develop
* informative error when metrics are not there
* restore JSONreader after rebase
* JSONreader mostly restored
* MLFlow logging independent of JSON/zarr
* linting, properly checking fsteps, ens, samples in JSONreader
* tiny change to restore the MergeReader
* lint
* enabling JSONreader to skip plots and missing scores gracefully
* required reformatting
* move skipping of metrics to the reader class
* slightly more explicit formulations
---------
Co-authored-by: Sebastian Buschow <sbuschow@santis-ln001.cscs.ch>
Co-authored-by: Sebastian Buschow <sbuschow@santis-ln002.cscs.ch>
Co-authored-by: iluise <72020169+iluise@users.noreply.github.com>
Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com>

* Add target type value error
* Remove type
* Remove unused code
* Commit what shall have been committed
* Remove target readout type from config
* Add computing stream names to embedding engine
---------
Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int>

* add default streams + fix lead time error
* update config
* update ratio plots and bar plots for single run
* fix title
* Update config Added support information for forecast_step configuration.
---------
Co-authored-by: Savvas Melidonis <79579567+SavvasMel@users.noreply.github.com>

* caching get_shared_wg_path()
* renaming get_path_output to get_path_results
* model and results paths from get_shared_wg_path() and removed _get_config_attribute()
* marking get_shared_wg_path() as private
* removing set_path()
* fixed call to _get_shared_wg_path
* fixed import, code clean-up, change caching decorator
* changed way of caching _get_shared_wg_base_path
* fixed typing error
* changes in Refactor shared WG path handling and model config I/O
  - Simplify get_path_model/get_path_run to always resolve via _get_shared_wg_path()
  - Change _get_shared_wg_path() to cached, argument-free helper returning the shared working dir from private config
  - Adjust model config save/load to build filenames relative to the run’s model directory instead of passing parent paths around
  - Update load_run_config and load_merge_configs to use new path helpers and improve assertion/log messages
  - Replace internal _get_shared_wg_path("results") usages with get_path_run() in wegen_reader and train_logger
* fixed base_path in metrics_path
* fixed forgotten config.general
* fixed lint raised issues
* Improve path handling and add missing docstrings
  - Add docstrings to 10+ utility functions for better documentation
  - Refactor load_run_config to improve path construction logic
  - Move mini_epoch string formatting from _get_model_config_file_read_name to caller for better separation of concerns
  - Add validation for mini_epoch_str format with descriptive error messages
  - Fix multi-line docstring format in _load_private_conf
* fixed line too long
* reverting to previous _get_model_config_file_read_name()
* pretty fix for _get_model_config_file_read_name
* pretty fix for _get_model_config_file_read_name
* removed unused/undefined path


clessig · 2026-02-03T13:00:09Z

+
+def get_evaluation_config(run_id, verbose=False):
+    """Create evaluation configuration for multiple streams."""
+    cfg = omegaconf.OmegaConf.create(


We should use YAML config files here and not hard-code configs in the code.

Do you want it as part of the main config then? Can we do it in a separate PR?

sophie-xhonneux · 2026-02-16T16:03:20Z

Hey @iluise, thanks so much for working on this! Is there a summary or high-level overview of the changes, or is going through the code line by line the best I can do?

iluise · 2026-02-16T16:12:50Z

I think you should mainly test whether it does what we want for the SSL analysis (i.e., are the plots and the numbers what we want for testing the pre-training?). Feedback is more than welcome.

Most of the changes are there to remove verbosity, so they are not very interesting overall. The main novelty is the ssl_analysis.py code. Feedback on that is also welcome, since people working on pre-training might touch it in the future to change their analysis (NB: the config is currently hard-coded, but we will change that in the future).

tjhunter · 2026-02-25T15:12:22Z

 ]

 [project.scripts]
+ssl_analysis= "weathergen.evaluate.ssl.ssl_eval:ssl_analysis"


Can we fold that into an evaluation sub-task? I am happy to discuss reasons for that.
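As a rough sketch of what folding the script into an evaluation sub-task could look like, the example below uses `argparse` sub-commands from the standard library. The command name `evaluate`, the sub-task name `ssl`, and the `run_ssl_analysis` placeholder are hypothetical; the `--run-id` and `--verbose` options mirror the usage line in the PR description.

```python
import argparse
from typing import Optional, Sequence


def run_ssl_analysis(args: argparse.Namespace) -> str:
    # Placeholder body: the real analysis would be called from here.
    return f"ssl analysis for run {args.run_id} (verbose={args.verbose})"


def build_parser() -> argparse.ArgumentParser:
    # "evaluate" and "ssl" are hypothetical names chosen for this sketch.
    parser = argparse.ArgumentParser(prog="evaluate")
    subparsers = parser.add_subparsers(dest="task", required=True)

    ssl = subparsers.add_parser("ssl", help="run the SSL analysis for one run")
    ssl.add_argument("--run-id", required=True, help="run to analyse")
    ssl.add_argument("--verbose", action="store_true", help="verbose logging")
    # Each sub-task registers its own handler, so main() stays generic.
    ssl.set_defaults(handler=run_ssl_analysis)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.handler(args)


# Example invocation with an explicit argument list instead of sys.argv.
print(main(["ssl", "--run-id", "abc123", "--verbose"]))
# prints: ssl analysis for run abc123 (verbose=True)
```

With this layout a single `[project.scripts]` entry pointing at `main` would be enough, and further evaluation tasks could be added as extra sub-parsers.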

iluise · 2026-04-09T11:54:19Z

Closing, as this was implemented elsewhere by Simone.

iluise and others added 30 commits January 16, 2026 16:56

latent_space evaluation scripts + propagate verbose

c5c1a05

lint

a4b0f12

add usage

40eccbd

fix log

3ed01fc

Jk/develop/1639 fix shard val forward (#1642)

2f9f125

* rm model_forward assignment in val
* rm clutter from diffusion branch
* reverse if order

Clessig/develop/fix finetuning 1640 (#1641)

7c4bb82

* Fix bug with diagnostic streams
* Avoid that empty decoders are allocated

Sophiex/dev/synop nppatms finetuning configs (#1644)

9144c64

* Doing something wrong
* Make fine-tuning work
* Rename sensibly

Enable multiple student views for one target for JEPA (#1617)

699a8aa

* Enable multiple student views for one target
* Improved readability

Fix test for empty targets in decoder creation (#1646)

88e809d

add regions to integration tests (#1648)

9bdd7d0

Allows for writing normalized samples; fixed config to keep it well-structured (#1653)

15a8c29

add default streams + fix lead time error (#1670)

c054552

* add default streams + fix lead time error
* update config
* Correct a bug creating aggr issues on scores (#1685)
---------
Co-authored-by: Savvas Melidonis <79579567+SavvasMel@users.noreply.github.com>

Update normlise output flag (#1681)

3aeb324

slurm script inference (#1675)

25948f3

* add argument
* check stage argument
* removed unnecessary code
* arbitrary position arguments
* Fix error text
* get stage info from environment variable.
* Update run_train.py
---------
Co-authored-by: Simon Grasse <s.grasse@fz-juelich.de>

[infra] consistent cli options (#1668)

b37706c

* replace '_' with '-'
* cli options underscore to dash
* change underscores to hyphens
* rename options in cli unit test

Fix bug for missing run_id path in model path (#1704)

9710f81

fix bar plot (#1698)

fa952ff

Co-authored-by: Savvas Melidonis <79579567+SavvasMel@users.noreply.github.com>

Fix output generation during inference (#1707)

34fa89a

* rename write_num_samples to num_samples
* Fixing linting
---------
Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int>

backwards compatible run_id look up (#1715)

d368400

remove misleading logging of mini_epoch (#1679)

e84d8d8

* remove misleading logging of mini_epoch
* add forecast_steps logging

Fix duplicate run_id in results and runplots paths (#1716)

cfcad2e

* Fix duplicate run_id in results and runplots paths. Linting.
* remove duplicate run_id also from metrics directory
* Linting

latent_space evaluation scripts + propagate verbose

16f790d

lint

d0701eb

update latent space eval

49601ff

rebase to develop

9824a54

iluise added this to WeatherGen-dev Jan 29, 2026

iluise added 8 commits January 29, 2026 15:01

Merge branch 'develop' into iluise/develop/eval-latent-space

a24bbde

Update default_config.yml

9a713a8

Update default_forecast_config.yml

a097b3c

Update default_forecast_config.yml

2744792

Update csv_reader.py

74760c8

Update wegen_reader.py

668bd2c

Update plot_utils.py

594efb1

Update utils.py

725e145

iluise changed the title ~~Iluise/develop/eval latent space~~ implement evaluation routine for SSL Jan 29, 2026

iluise and others added 6 commits January 29, 2026 15:14

Update plotter.py

04a4872

lint

a984045

fix verbose

139a2e2

rename to ssl analysis

85bc1b2

lint

1c436bc

Merge branch 'develop' into iluise/develop/eval-latent-space

71ce8e4

clessig reviewed Feb 3, 2026

Merge branch 'develop' into iluise/develop/eval-latent-space

4667baa

iluise requested a review from sophie-xhonneux February 16, 2026 14:30

Fix logger reference in plotter.py

d5ff544

iluise mentioned this pull request Feb 25, 2026

Chain inference and evaluation to training #1925

Closed

6 tasks

tjhunter reviewed Feb 25, 2026

iluise added 3 commits March 2, 2026 18:37

merge develop

06e3acd

rebase to develop

2e64dd7

fix run inference

bdda1a6

iluise closed this Apr 9, 2026

github-project-automation Bot moved this to Done in WeatherGen-dev Apr 9, 2026

iluise wants to merge 51 commits into develop from iluise/develop/eval-latent-space


11 participants
