
[1438] Adding padding to end_date to avoid duplicate samples#1749

Open
enssow wants to merge 1 commit into ecmwf:develop from enssow:sorcha/dev/1438


enssow (Contributor) commented Jan 29, 2026

Description

TimeWindowHandler does not produce enough forecast initialisation times to draw samples from when running inference on a model trained with $n_{fstep}$ forecast steps and $n_{samples} \cdot dt \geq t_{end} - t_{start}$.
Here $n_{fstep}$ is --forecast_steps, $n_{samples}$ is --samples, $dt$ is --step_hours, $t_{start}$ is --start, and $t_{end}$ is --end.
(See #1438 and #1085 for more info.)

This PR pads the end date: it works out how many distinct initialisation times are available and extends the end of the time window to accommodate the requested number of samples, also taking into account the extra time needed to roll out the requested number of forecast steps.
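
The padding idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `padded_end_date` is a hypothetical helper, the trigger condition is taken directly from the inequality in the description, and the amount of padding (one `dt` per sample plus one `dt` per forecast step) is an assumed formula that may differ from what TimeWindowHandler actually does.

```python
from datetime import datetime, timedelta


def padded_end_date(start, end, n_samples, n_fstep, dt):
    """Return an end date late enough for n_samples distinct init times.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # The window is already large enough when n_samples * dt < end - start,
    # i.e. the condition from the description does not hold.
    if n_samples * dt < end - start:
        return end
    # Assumed padding: one dt per sample, plus n_fstep * dt so the rollout
    # from the last initialisation time still has data to cover it.
    return start + (n_samples + n_fstep) * dt


if __name__ == "__main__":
    new_end = padded_end_date(
        start=datetime(2022, 1, 1),
        end=datetime(2022, 1, 2),
        n_samples=8,
        n_fstep=2,
        dt=timedelta(hours=6),
    )
    print(new_end)
```

With a one-day window, 8 samples, 2 forecast steps and a 6-hour step, the sketch extends the end date to 2022-01-03 12:00; a window that already satisfies the inequality is returned unchanged.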

Issue Number

Closes #1438

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hedgedoc in the github issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

enssow (Contributor, Author) commented Jan 29, 2026

Tested on SANTIS for:

  • uv run inference --from-run-id f4duf5ji --samples 254 --streams-output ERA5 --options training_config.forecast.num_steps=5
  • uv run inference --from-run-id f4duf5ji --samples 254 --streams-output ERA5
    Neither run returns the duplication warning any more, and inference_id cixysv6l ran to completion successfully.

ankitpatnala (Contributor) commented Feb 3, 2026

Thanks @enssow for handling this issue
I tested the code using
srun uv run inference --from-run-id f4duf5ji --samples 10 -start="2022-10-01" -end="2022-10-02" --streams-output ERA5 --options training_config.forecast.num_steps=5
The code functioned the way it was described.

But I still do not know which is the better strategy: should we pad with available dates, or reduce num_samples to fit the defined date range? What if there is no data after the defined end_date? It will either throw an error or return empty tensors.
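
The second option, reducing the number of samples to what the requested range supports, could be sketched roughly like this. `clamp_num_samples` is a hypothetical name, and the counting rule (an initialisation time is usable only if its full rollout of `n_fstep` steps ends at or before `end`) is an assumption made for illustration, not existing behaviour of the code base.

```python
from datetime import datetime, timedelta


def clamp_num_samples(start, end, n_samples, n_fstep, dt):
    """Reduce n_samples to the number of init times the window supports.

    Hypothetical helper for illustration only.
    """
    # Time left for initialisation times once one full rollout is reserved.
    usable = end - start - n_fstep * dt
    # Init times sit at start + k * dt for k = 0 .. usable // dt.
    # Floor division of a negative timedelta is negative, so clamp at zero.
    n_available = max(usable // dt + 1, 0)
    return min(n_samples, n_available)


if __name__ == "__main__":
    n = clamp_num_samples(
        start=datetime(2022, 10, 1),
        end=datetime(2022, 10, 2),
        n_samples=10,
        n_fstep=2,
        dt=timedelta(hours=6),
    )
    print(n)
```

Under these assumptions a one-day window with a 6-hour step and 2 forecast steps supports 3 initialisation times, so a request for 10 samples would be reduced to 3 without reading any data past end_date.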

ankitpatnala (Contributor) commented

Can you run some inference with these options?
srun uv run inference --from-run-id f4duf5ji --samples 10 -start=2022-10-01 -end=2022-10-02 --streams-output ERA5 --options training_config.forecast.num_steps=10 training_config.forecast.time_step=03:00:00

I saw some unwanted behaviour there:

Logging set up. Logs are in ./output/uv3yi4ac
DDP initialization: rank=0, world_size=1
Using adjusted end date 2022-10-03T00:00:00.000000000 instead of 2022-10-02T00:00:00.000000000
TimeWindowHandler: start=2022-10-01T00:00:00.000000000, end=2022-10-03T00:00:00.000000000, len=06:00:00, step=06:00:00


Successfully merging this pull request may close these issues.

Duplicate samples during inference due to different length assumtions in MultiStreamDataReader
