perf: load cutout into memory and use dask threads #2136
FabianHofmann wants to merge 1 commit into master
Conversation
Source? That would be self-defeating, i.e. loading the cutout fully would be impossible in many cases.
Very true - so this would not be an option. Here are some links on netCDF IO locking.
It seems that the real solution would be using zarr as the format, but that would likely be another story.
It seems to be deep in the xarray history: there is a mention of a global lock in the release notes at https://github.com/pydata/xarray/blob/main/doc/whats-new.rst#v052-16-july-2015
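The global-lock pattern referenced above can be illustrated with the standard library alone. This is a minimal sketch of the idea, not xarray's actual internals: a non-thread-safe resource (like the netCDF/HDF5 C layer) is guarded by one global lock, so threads run, but file access is fully serialised, which is why threading alone yields little speedup.

```python
import threading

# Illustrative stand-in for a non-thread-safe reader (e.g. the netCDF C
# layer): concurrent calls could corrupt state, so historically xarray
# serialised all file access through a single global lock.
GLOBAL_READ_LOCK = threading.Lock()
results = []

def read_chunk(i):
    # Only one thread at a time may touch the "file"; all others block here,
    # so the reads happen one after another despite the thread pool.
    with GLOBAL_READ_LOCK:
        results.append(i * i)  # pretend this is an on-disk chunk read

threads = [threading.Thread(target=read_chunk, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # every chunk was read exactly once
```

This is the crux of the trade-off discussed here: the lock makes threaded access safe but removes the parallelism you wanted in the first place.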
This PR is an alternative to #2135 and tries to optimize dask parameters instead of relying on a new atlite backend proposed in PyPSA/atlite#497.
The branch replaces the dask distributed scheduler (`LocalCluster` + `Client`) with the simpler dask threaded scheduler across all scripts that process atlite cutouts.

As far as I understood, the main issue with the threaded dask scheduler is that netCDF file access is not thread-safe, so to get any performance gains the whole `Cutout` needs to be loaded into memory before the computation is scheduled.

A new helper function `_get_netcdf_chunk_sizes()` reads the on-disk chunk sizes from netCDF files to align the dask chunks with the stored layout. We then eagerly load and re-chunk in `load_cutout()`, which makes threaded-scheduler usage safe. Without the pre-loading, the dask workers would run on a partial load, only using 20-30% of the cores.

Current performance gains:
22:10 min -> ~15 min
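The load-then-rechunk-then-thread pattern described above can be sketched with plain dask arrays. This is a hedged illustration of the approach, assuming dask and numpy are installed; the array shape, the chunk sizes, and the names (`disk_chunks`, `in_memory`, `rechunked`) are illustrative assumptions, not the PR's actual code.

```python
import dask
import dask.array as da

# Stand-in for a cutout variable: on disk it would be stored in small
# (time, y, x) chunks; here we fabricate an in-memory dask array instead
# of opening a real netCDF file.
disk_chunks = (10, 10, 10)
data = da.ones((40, 20, 20), chunks=disk_chunks)

# Step 1: eagerly load everything into memory, so the threaded scheduler
# never has to touch the (non-thread-safe) netCDF layer during compute.
in_memory = data.compute()  # plain numpy array, fully loaded

# Step 2: wrap it again with larger chunks suited to threaded execution.
rechunked = da.from_array(in_memory, chunks=(20, 20, 20))

# Step 3: run with the simple threaded scheduler, no LocalCluster/Client.
with dask.config.set(scheduler="threads"):
    total = rechunked.sum().compute()

print(total)  # 40 * 20 * 20 ones
```

The design point is that the expensive, serialised part (reading the file) happens once up front, and only the thread-safe numpy work is parallelised afterwards.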
I will need to do some more research on whether I am missing something, but perhaps this is a good place to continue the discussion @coroa
Checklist

Required:
- doc/release_notes.rst

If applicable:
- scripts/lib/validation
- doc/*.rst files