
perf: load cutout into memory and use dask threads #2136

Closed
FabianHofmann wants to merge 1 commit into master from perf/dask-tweaks

Conversation

@FabianHofmann
Contributor

This PR is an alternative to #2135 and tries to optimize dask parameters instead of relying on a new atlite backend proposed in PyPSA/atlite#497.

The branch replaces the dask distributed scheduler (LocalCluster + Client) with the simpler dask threaded scheduler across all scripts that process atlite cutouts.
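As a rough illustration, the switch boils down to something like this (a minimal sketch; the actual changes in the scripts may differ):

```python
import dask
import dask.array as da

# Before (removed in this PR): a distributed LocalCluster + Client
# from dask.distributed import Client, LocalCluster
# client = Client(LocalCluster(n_workers=4))

# After: the simpler threaded scheduler, set globally
dask.config.set(scheduler="threads")

# All subsequent .compute() calls now run in a local thread pool
x = da.ones((100, 100), chunks=(10, 10))
total = x.sum().compute()  # computed with the threaded scheduler
```

The threaded scheduler avoids the inter-process serialization overhead of the distributed scheduler, at the cost of being subject to library-level thread-safety constraints such as the netCDF lock discussed below.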

As far as I understood, the main issue with the threaded dask scheduler is that NetCDF reads are not thread-safe. Therefore, to get some performance gain, the whole Cutout needs to be loaded into memory before the computation is actually scheduled.

A new helper function _get_netcdf_chunk_sizes() reads the on-disk chunk sizes from netCDF files to align dask chunks with the stored layout. load_cutout() then eagerly loads and re-chunks the data, which makes threaded-scheduler usage safe. Without the pre-loading, the dask workers would run on a partial load, only using 20-30% of the cores.

Current performance gains:

22:10 min -> ~15 min

I will need to do some more research on whether I am missing something, but perhaps this is a good place to continue the discussion @coroa.

Checklist

Required:

  • Changes are tested locally and behave as expected.
  • Code and workflow changes are documented.
  • A release note entry is added to doc/release_notes.rst.

If applicable:

  • Changes in configuration options are reflected in scripts/lib/validation.
  • For new data sources or versions, these instructions have been followed.
  • New rules are documented in the appropriate doc/*.rst files.

@coroa
Member

coroa commented Apr 7, 2026

As far as I understood, the main issue with the threaded dask scheduler is that NetCDF reads are not thread-safe. Therefore, to get some performance gain, the whole Cutout needs to be loaded into memory before the computation is actually scheduled.

Source? That would be defeating, i.e. loading the cutout fully would be impossible in many cases.

@FabianHofmann
Contributor Author

As far as I understood, the main issue with the threaded dask scheduler is that NetCDF reads are not thread-safe. Therefore, to get some performance gain, the whole Cutout needs to be loaded into memory before the computation is actually scheduled.

Source? That would be defeating, i.e. loading the cutout fully would be impossible in many cases.

Very true, so this would not be an option. Here are some links on netCDF IO locking

It seems that the real solution would be using zarr as the format, but that would likely be another story.

@FabianHofmann
Contributor Author

It seems to be deep in the xarray history: there is a mention of a global lock in the release notes at https://github.com/pydata/xarray/blob/main/doc/whats-new.rst#v052-16-july-2015

xray.open_dataset and xray.open_mfdataset now use a global thread lock by default for reading from netCDF files with dask. This avoids possible segmentation faults for reading from netCDF4 files when HDF5 is not configured properly for concurrent access (:issue:444).

