TC hunt vol. 4 by mariusaurus · Pull Request #871 · NVIDIA/earth2studio

mariusaurus · 2026-05-20T15:36:58Z

Earth2Studio Pull Request

Description

This PR adds some plotting functionality for the TC pipeline.
The main components are two notebooks (stored as Python cells, with instructions on how to convert them) that produce field plots with tracks and compare track data against reference tracks.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies


greptile-apps · 2026-05-22T12:37:11Z

Greptile Summary

This PR delivers the TC tracking visualisation tooling that was previously marked "coming soon": two JupyText notebook scripts (tracks_slayground_notebook.py and plot_tracks_n_fields_notebook.py) plus three supporting library modules (data_handling.py, plotting_helpers.py, analyse_n_plot.py). It also widens great_circle_distance in tc_hunt_utils.py to accept array inputs and exports EARTH_RADIUS_M as a shared constant.

data_handling.py contains two logic issues: remove_trailing_nans crashes with IndexError when a merged track has no non-NaN rows, and rebase_by_lead_time hardcodes "msl" as the trim variable regardless of the variables argument.
plot_tracks_n_fields_notebook.py uses xr.Dataset.sel with a floating-point step for spatial sub-selection, which is fragile for scale > 1.
analyse_n_plot.py will raise NameError if called with an empty cases list because err_dict is only assigned inside the loop.

Confidence Score: 3/5

The new library modules work for the documented happy path but have unchecked crash paths reachable in ordinary use.

Two crash paths in data_handling.py are reachable in ordinary use: remove_trailing_nans raises IndexError on any track with no valid overlapping timesteps, and rebase_by_lead_time hardcodes msl as the trim variable, breaking when callers omit it from variables.

data_handling.py needs the most attention; plot_tracks_n_fields_notebook.py and analyse_n_plot.py each have one smaller issue worth fixing before wide use.

Important Files Changed

Filename	Overview
recipes/tc_tracking/plotting/data_handling.py	New library module for track ingestion and error computation; contains a crashable IndexError path and a hardcoded variable assumption in the lead-time rebasing logic.
recipes/tc_tracking/plotting/analyse_n_plot.py	Batch analysis entry point; well-structured with a latent NameError if an empty case list is supplied.
recipes/tc_tracking/plotting/plotting_helpers.py	New plotting helper library; mostly clean.
recipes/tc_tracking/plotting/plot_tracks_n_fields_notebook.py	New JupyText notebook for animated field+track plots; uses xr.Dataset.sel for coarsening which is fragile for scale > 1.
recipes/tc_tracking/plotting/tracks_slayground_notebook.py	New JupyText notebook for ensemble track analysis; no significant issues.
recipes/tc_tracking/src/tc_hunt_utils.py	Exports EARTH_RADIUS_M constant and widens great_circle_distance to accept array inputs; clean, backward-compatible change.
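
For orientation, an array-capable great-circle distance can be sketched with the haversine formula as below. This is an illustration only, not the code in tc_hunt_utils.py: the function name, the degree-based inputs, and the radius value of 6 371 000 m are assumptions, and the real great_circle_distance signature may differ.

```python
import numpy as np

# Assumed mean Earth radius in metres; the value exported by the PR may differ.
EARTH_RADIUS_M_ASSUMED = 6_371_000.0


def great_circle_distance_sketch(lat1, lon1, lat2, lon2):
    """Haversine distance in metres; accepts scalars or broadcastable arrays (degrees)."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(x, dtype=float))
                              for x in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    # Clip guards against values marginally outside [0, 1] from rounding.
    return 2.0 * EARTH_RADIUS_M_ASSUMED * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Scalar call and array call share the same code path via NumPy broadcasting.
quarter = great_circle_distance_sketch(0.0, 0.0, 0.0, 90.0)
batch = great_circle_distance_sketch([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [90.0, 0.0])
```

Because every operation is a NumPy ufunc, scalar callers keep working unchanged, which matches the backward-compatibility note in the table above.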

Comments Outside Diff (4)

recipes/tc_tracking/plotting/data_handling.py, line 909-914 (link)

IndexError when track has no valid overlapping data

np.where(~either_nans)[0][-1] raises IndexError when every row has a NaN in at least one of var or var_tru — e.g. when a predicted track has no temporal overlap with the reference track after the left-join. This can realistically occur for short-lived predicted tracks or when the true track CSV is missing msl values for early timesteps.
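
One possible guard is sketched below. It is not the PR's implementation: the function name and the explicit var_tru argument are hypothetical, and returning an empty frame for a track with no valid rows is an assumed policy that callers would need to handle.

```python
import numpy as np
import pandas as pd


def remove_trailing_nans_safe(track: pd.DataFrame, var: str, var_tru: str) -> pd.DataFrame:
    """Drop trailing rows where either column is NaN.

    Returns an empty frame (same columns) when no row has both values,
    instead of raising IndexError.
    """
    either_nans = track[var].isna().to_numpy() | track[var_tru].isna().to_numpy()
    valid_idx = np.where(~either_nans)[0]
    if valid_idx.size == 0:
        # No overlapping valid data at all: nothing to keep.
        return track.iloc[0:0]
    return track.iloc[: valid_idx[-1] + 1]


overlap = pd.DataFrame({"msl": [1.0, np.nan, 3.0, np.nan],
                        "msl_tru": [1.0, 2.0, 3.0, 4.0]})
no_overlap = pd.DataFrame({"msl": [np.nan, np.nan],
                           "msl_tru": [1.0, np.nan]})
trimmed = remove_trailing_nans_safe(overlap, "msl", "msl_tru")
empty = remove_trailing_nans_safe(no_overlap, "msl", "msl_tru")
```

Interior NaN rows are kept on purpose; only the tail after the last fully valid row is removed.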
recipes/tc_tracking/plotting/data_handling.py, line 946 (link)

Hardcoded "msl" in rebase_by_lead_time breaks for custom variable lists

remove_trailing_nans(merged_track, "msl") assumes "msl" is always a column in both the predicted track and tru_track, regardless of the variables argument passed by the caller. If a user calls compute_averages_of_errors_over_lead_time with variables=["dist"] (no "msl"), the merged frame won't have a "msl" column and a KeyError is raised. The trimming variable should be derived from variables instead of hard-wired.
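
A minimal sketch of deriving the trim variable from the caller's variables follows. The helper name pick_trim_variable and the choice to prefer "msl" when it is present are assumptions for illustration, not part of the PR.

```python
def pick_trim_variable(variables, columns, preferred="msl"):
    """Choose the column used for trimming from the requested variables.

    Keeps the current behaviour when `preferred` is requested and present,
    otherwise falls back to the first requested variable found in `columns`.
    """
    candidates = [v for v in variables if v in columns]
    if not candidates:
        raise ValueError(
            f"None of the requested variables {list(variables)} are in the merged track."
        )
    return preferred if preferred in candidates else candidates[0]


# Hypothetical call sites: the second one is the case that currently fails.
with_msl = pick_trim_variable(["dist", "msl"], ["time", "dist", "msl"])
without_msl = pick_trim_variable(["dist"], ["time", "dist"])
```

Raising a ValueError with the requested names gives a clearer failure than the KeyError described above.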
recipes/tc_tracking/plotting/plot_tracks_n_fields_notebook.py, line 1358-1362 (link)

xr.Dataset.sel breaks for scale > 1 due to floating-point step mismatch

ds.sel(lat=list(np.arange(lat_min, lat_max, scale * 0.25)), ...) selects coordinates by exact label. For scale=1 the 0.25-degree step matches the grid, but np.arange with a float step accumulates floating-point rounding errors, and for larger scale values the step may not align with dataset coordinate values. Using ds.sel(..., method="nearest") or ds.isel(lat=slice(None, None, scale), lon=slice(None, None, scale)) after sub-setting the region would be more robust.
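
The isel variant suggested above could look roughly like the following sketch. The function name and the synthetic dataset are hypothetical, and the example assumes ascending lat and lon coordinates; a dataset with descending latitude would need the slice bounds swapped.

```python
import numpy as np
import xarray as xr


def coarsen_region(ds, lat_min, lat_max, lon_min, lon_max, scale):
    """Cut out a region by label slice, then thin it by integer stride."""
    # Label-based slicing only needs the bounds, not every grid value.
    region = ds.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    # Positional striding cannot miss a coordinate label.
    return region.isel(lat=slice(None, None, scale), lon=slice(None, None, scale))


# Synthetic 0.25-degree grid standing in for the real field data.
lat = np.arange(0.0, 10.25, 0.25)
lon = np.arange(100.0, 110.25, 0.25)
ds = xr.Dataset(
    {"msl": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)
coarse = coarsen_region(ds, 2.0, 6.0, 102.0, 106.0, scale=3)
```

The stride works on positions, so the result does not depend on whether scale * 0.25 lines up with the stored coordinate values.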
recipes/tc_tracking/plotting/analyse_n_plot.py, line 504-506 (link)

NameError when cases is an empty list

If cases=[] is passed, the loop body never runs and err_dict is never assigned. The subsequent storm_metrics["var"] = list(err_dict.keys()) line raises NameError: name 'err_dict' is not defined. A guard or pre-loop initialisation would prevent the confusing crash.
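
The pre-loop initialisation can be sketched as follows. The function collect_metrics and the compute_errors callback are hypothetical stand-ins for the loop in analyse_n_plot.py, which is not shown on this page.

```python
def collect_metrics(cases, compute_errors):
    """Gather per-case errors; safe to call with an empty `cases` list."""
    storm_metrics = {}
    err_dict = {}  # initialised before the loop so the lookup below never fails
    for case in cases:
        err_dict = compute_errors(case)
        storm_metrics[case] = err_dict
    # With no cases this records an empty variable list instead of raising NameError.
    storm_metrics["var"] = list(err_dict.keys())
    return storm_metrics


empty_result = collect_metrics([], lambda case: {"msl": 0.0})
one_result = collect_metrics(["storm_a"], lambda case: {"msl": 1.5, "dist": 42.0})
```

An explicit early check that raises ValueError on an empty list would be an equally reasonable guard if an empty run should count as a user error.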

Reviews (1): Last reviewed commit: "Merge branch 'main' into mkoch/tc_hunt_4"

mariusaurus and others added 30 commits December 1, 2025 07:52

resolved conflict

d3a2245

update changelog

0189d52

move seed initialization and fix dxwrapper tests

f11b18b

tempest extremes diagnostic model

d063760

error message

a4d2544

testing if TE is available and works

c1cdca0

started working on support for batch sizes >1, currently works for bs 1

016f16b

halfway to larger batch support

68e33b5

enabling TE for batch sizes of >1. async version seems to work as well but needs further testing with batching.

7bd60e1

option to pass file names to TE connector

3b0c00e

array equal test

1e9bbe8

first stable try

d6be6dd

support for per-member parallel execution and lets user controll max num workers

1e9b275

precommit hooks

b5f5c18

vibe-coded some tests, need to be hand-tested and selected

af8bc71

vibe-coded some tests, need to be hand-tested and selected

a9fd2bc

passing all pre-commit tests, still need to sub-select tests as there are too many right now

526e6bf

subselected tests

c3258d9

install doc

3fd145d

throwing an error in case cleanup is not called before object goes out of scope

c26f453

custom depenmdency failure message for TE

d2a8e4a

moved tensor tiling and concatenation to utils

0ab6d67

enable setting fcn3 random seed

8ca3fae

add proper noise handling for fcn3

e93932e

fix linting and test issues

bc9e3ac

update lockfile

2685f90

move seed initialization and fix dxwrapper tests

e3a4e3d

tc tracking pipeline

1dec990

update

02945f1

updated uv.lock

f89efe3

mariusaurus and others added 23 commits April 8, 2026 07:50

removed some stale code, replaced use_ram with shm location, some cleanups

01b1a00

replace TempestExtremes module-level globals with singleton and flatten nested thread pools

f099214

removed SFNO support, replaced print statements with loggers.

c86bfe0

refactored pyproject.toml

2252140

Merge branch 'main' into mkoch/tc_hunt_1

29c75a7

updated install notes

a6c0b3b

added reproducibility

3d3adcd

store overwrite

8198ac1

pt 3 init

fca4d39

updated install docs

0434fe7

Merge branch 'main' into mkoch/tc_hunt_1

eb341e7

Merge branch 'mkoch/tc_hunt_1' into mkoch/tc_hunt_2

f3cd9dc

Merge branch 'mkoch/tc_hunt_2' into mkoch/tc_hunt_3

05feb7c

Merge branch 'main' into mkoch/tc_hunt_1

d9cdda9

setuptools version

58a361f

optional AIFS/FCN3 deps

02a900c

Merge branch 'mkoch/tc_hunt_1' into mkoch/tc_hunt_2

36de792

Merge branch 'mkoch/tc_hunt_2' into mkoch/tc_hunt_3

5cdb8b9

data sources in config

c7d01c7

greptile feedback

cdca059

plotting

4105167

updated jupyter cells

6d634e0

merged main

45cf91d

mariusaurus requested a review from NickGeneva May 20, 2026 15:36

mariusaurus self-assigned this May 20, 2026

mariusaurus added 3 commits May 22, 2026 05:25

claude review

4f3ae2b

removed some dead code

8e3853c

Merge branch 'main' into mkoch/tc_hunt_4

1e7397d

mariusaurus marked this pull request as ready for review May 22, 2026 12:33


TC hunt vol. 4 #871
mariusaurus wants to merge 127 commits into NVIDIA:main from mariusaurus:mkoch/tc_hunt_4
