[codex] Add eval prognostic model configs by loliverhennigh · Pull Request #860 · NVIDIA/earth2studio

loliverhennigh · 2026-05-14T21:21:24Z

Summary

This adds a standard prognostic model catalog for the eval recipe, plus one narrow model-specific fix needed by the validated FengWu CPU fallback path.

The PR includes configs for ACE2, AIFS, AIFSENS, Atlas, Aurora, cBottleVideo, FCN, FengWu, GenCast Mini, GraphCast Small/Operational, Pangu 3/6/24, and SFNO. Existing DLWP, FCN3, DLESyM, and StormScope configs remain in the catalog.

What changed

Add standard recipes/eval/cfg/model/*.yaml configs for the prognostic model catalog.
Use concrete model module paths in the new configs, avoiding package-level prognostic model export changes.
Instantiate nested Hydra model.load_args, which is needed by models with configurable sources/load helpers.
Preserve tensor dtype/device after forecast-grid interpolation and initialize DistributedManager before rank-0-only work if needed.
Keep DLESyM-specific handling inside the eval recipe pipeline without adding a model-package import shim.
Add a FengWu CPU ONNXRuntime fallback path and update the FengWu ONNX test fixture to use dynamic_axes with the legacy exporter.
Document how the added model configs map to optional Earth2Studio extras without changing the eval lockfile.
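
The nested `model.load_args` instantiation can be pictured as a recursive walk over the config: any mapping that carries a `_target_` key is turned into an object before it is passed to its parent. The sketch below is a stdlib-only stand-in for that idea. It is not Hydra's implementation and not the code in this PR; the `instantiate_nested` helper and the example `load_args` mapping are hypothetical.

```python
import importlib
from typing import Any


def instantiate_nested(node: Any) -> Any:
    """Recursively build objects from mappings that carry a `_target_` key."""
    if isinstance(node, dict):
        # Resolve children first so nested targets become real objects.
        resolved = {
            key: instantiate_nested(value)
            for key, value in node.items()
            if key != "_target_"
        }
        if "_target_" in node:
            # Split "package.module.Name" into module path and attribute name.
            module_path, _, attr = node["_target_"].rpartition(".")
            target = getattr(importlib.import_module(module_path), attr)
            return target(**resolved)
        return resolved
    if isinstance(node, list):
        return [instantiate_nested(value) for value in node]
    return node


# Hypothetical load_args: a plain flag plus a nested object described by _target_.
load_args = {
    "verbose": False,
    "step": {"_target_": "datetime.timedelta", "hours": 6},
}
resolved_args = instantiate_nested(load_args)
```

With real Hydra configs the equivalent step would presumably go through `hydra.utils.instantiate`, which resolves nested `_target_` entries recursively by default.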

Scope guardrails

This PR intentionally avoids broad import/export changes:

No changes to earth2studio/models/px/__init__.py.
No lazy package-level model exports.
No private PyTorch DTensor import shim in the DLESyM model source.
No GraphCast/GenCast source changes; the earlier xarray-copy cleanup was removed because it was not clearly necessary for inference.
No eval uv.lock or pyproject.toml changes.

The only Earth2Studio model implementation touched is FengWu, where CPU execution needs to bypass ONNXRuntime IO binding.
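
A minimal sketch of what bypassing IO binding on CPU could look like, assuming the session object behaves like `onnxruntime.InferenceSession`. The `run_onnx` helper and its `device` argument are hypothetical and are not taken from the FengWu source; only the session and binding method names follow the ONNX Runtime Python API.

```python
from typing import Any, Mapping, Sequence


def run_onnx(session: Any, inputs: Mapping[str, Any], device: str) -> Sequence[Any]:
    """Run an ONNX Runtime session, skipping IO binding when device is "cpu".

    `session` is expected to behave like onnxruntime.InferenceSession and
    `inputs` maps input names to host (NumPy) arrays.
    """
    if device == "cpu":
        # CPU fallback: a plain run call, no IO binding involved.
        return session.run(None, dict(inputs))

    # Accelerator path: bind host inputs and device-side outputs explicitly.
    binding = session.io_binding()
    for name, array in inputs.items():
        binding.bind_cpu_input(name, array)
    for output in session.get_outputs():
        # Let ONNX Runtime allocate each output on the target device.
        binding.bind_output(output.name, device)
    session.run_with_iobinding(binding)
    # Copy the bound outputs back to host memory.
    return binding.copy_outputs_to_cpu()
```

Keeping the branch in one helper means the rest of the model code does not need to know which execution path was taken.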

FuXi is intentionally not included in the recipe catalog in this PR. CPU ONNXRuntime fails on its fp16 com.microsoft.Gelu node, and the GPU retry confirmed CUDA/ORT visibility but still created the initial session on CPU during model construction.

Validation

Current clean PR checks:

uv run pre-commit run --all-files
From recipes/eval: PYTHONPATH=../.. uv run --extra dev pytest test -q
- 333 passed, 4 skipped, 147 warnings
PYTHONPATH=recipes/eval /Users/oliverhennigh/Documents/New\ project\ 5/earth2studio/.venv/bin/python -m pytest recipes/eval/test/test_models.py recipes/eval/test/test_data.py -q
- 49 passed, 21 warnings
/Users/oliverhennigh/Documents/New\ project\ 5/earth2studio/.venv/bin/python -m pytest test/models/px/test_fengwu.py -q -k 'test_fengwu_call and cpu'
- 2 passed, 10 deselected, 13 warnings
git diff --check upstream/main

NVL72 sweep notes:

The broader model sweep ran the new configs through recipes/eval/main.py with nsteps=1 and produced readable forecast.zarr outputs for the working model set.
Temporary broad import/export investigations from that sweep have been removed from this PR.

loliverhennigh force-pushed the codex/eval-prognostic-models branch 10 times, most recently from d3c5229 to 25413f9 on May 18, 2026 18:21

Add eval prognostic model configs

ef2deea

loliverhennigh force-pushed the codex/eval-prognostic-models branch from 25413f9 to ef2deea on May 18, 2026 21:03

loliverhennigh wants to merge 1 commit into NVIDIA:main from loliverhennigh:codex/eval-prognostic-models
