perf: load cutout into memory and use dask threads #2136
FabianHofmann wants to merge 1 commit into master
Conversation
Source? That would be self-defeating, i.e. loading the cutout fully would be impossible in many cases.
Very true - so this would not be an option. Here are some links on netCDF IO locking.
It seems that the real solution would be using zarr as the format, but that would likely be another story.
It seems to be deep in the xarray history: there is a mention of a global lock in the release notes at https://github.com/pydata/xarray/blob/main/doc/whats-new.rst#v052-16-july-2015
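The global-lock pattern referenced above can be illustrated with the standard library alone. This is a minimal sketch of the idea, not xarray's actual internals: a non-thread-safe resource (like the netCDF/HDF5 C layer) is guarded by one global lock, so threads run, but file access is fully serialised, which is why threading alone yields little speedup.

```python
import threading

# Illustrative stand-in for a non-thread-safe reader (e.g. the netCDF C
# layer): concurrent calls could corrupt state, so historically xarray
# serialised all file access through a single global lock.
GLOBAL_READ_LOCK = threading.Lock()
results = []

def read_chunk(i):
    # Only one thread at a time may touch the "file"; all others block here,
    # so the reads happen one after another despite the thread pool.
    with GLOBAL_READ_LOCK:
        results.append(i * i)  # pretend this is an on-disk chunk read

threads = [threading.Thread(target=read_chunk, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # every chunk was read exactly once
```

This is the crux of the trade-off discussed here: the lock makes threaded access safe but removes the parallelism you wanted in the first place.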
This PR is an alternative to #2135 and tries to optimize dask parameters instead of relying on a new atlite backend proposed in PyPSA/atlite#497.
The branch replaces the dask distributed scheduler (`LocalCluster` + `Client`) with the simpler dask threaded scheduler across all scripts that process atlite cutouts.

As far as I understood, the main issue with the threaded dask scheduler is that netCDF file access is not thread-safe, so to get any performance gains the whole `Cutout` needs to be loaded into memory before the computation is scheduled.

A new helper function `_get_netcdf_chunk_sizes()` reads the on-disk chunk sizes from netCDF files to align the dask chunks with the stored layout. We then eagerly load and re-chunk in `load_cutout()`, which makes threaded-scheduler usage safe. Without the pre-loading, the dask workers would run on a partial load, only using 20-30% of the cores.

Current performance gains:
22:10 min -> ~15 min
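The load-then-rechunk-then-thread pattern described above can be sketched with plain dask arrays. This is a hedged illustration of the approach, assuming dask and numpy are installed; the array shape, the chunk sizes, and the names (`disk_chunks`, `in_memory`, `rechunked`) are illustrative assumptions, not the PR's actual code.

```python
import dask
import dask.array as da

# Stand-in for a cutout variable: on disk it would be stored in small
# (time, y, x) chunks; here we fabricate an in-memory dask array instead
# of opening a real netCDF file.
disk_chunks = (10, 10, 10)
data = da.ones((40, 20, 20), chunks=disk_chunks)

# Step 1: eagerly load everything into memory, so the threaded scheduler
# never has to touch the (non-thread-safe) netCDF layer during compute.
in_memory = data.compute()  # plain numpy array, fully loaded

# Step 2: wrap it again with larger chunks suited to threaded execution.
rechunked = da.from_array(in_memory, chunks=(20, 20, 20))

# Step 3: run with the simple threaded scheduler, no LocalCluster/Client.
with dask.config.set(scheduler="threads"):
    total = rechunked.sum().compute()

print(total)  # 40 * 20 * 20 ones
```

The design point is that the expensive, serialised part (reading the file) happens once up front, and only the thread-safe numpy work is parallelised afterwards.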
I will need to do some more research on whether I am missing something, but perhaps this is a good place to continue the discussion @coroa
Checklist

Required:
- doc/release_notes.rst

If applicable:
- scripts/lib/validation
- doc/*.rst files