@slevang slevang commented Nov 20, 2025

Implements this proposal: when an explicit string engine is passed and it is one of the standard engines, the list_engines call is bypassed. This can shave up to a few seconds off the first open_dataset call, depending on how many packages are installed in your environment.

The only side effect I can see is that it changes the error reporting for missing engines. Now we get:

ImportError: The zarr package is required for working with Zarr stores but could not be imported. Please install it with your package manager (e.g. conda or pip).

Instead of:

ValueError: unrecognized engine 'zarr' must be one of your download engines: ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib', 'gini', 'rasterio', 'store']. To install additional dependencies, see:
https://docs.xarray.dev/en/stable/user-guide/io.html 
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html

This also doesn't help for non-standard engines, e.g. open_dataset(..., engine="cfgrib"), but if speed is crucial you can pass the backend object itself in that case.
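
To make the behavior described above concrete, here is a minimal, self-contained sketch of the fast path. It is not xarray's actual code: `STANDARD_ENGINES`, `slow_discover_engines`, `discovery_log`, and the engine names are hypothetical stand-ins chosen so the example runs without any backend installed. It shows the three cases from this comment: a standard engine resolves without discovery, a standard engine whose package is missing raises ImportError rather than ValueError, and a non-standard engine still goes through discovery unless a backend object is passed directly.

```python
import importlib

# Hypothetical table: engine name -> module that must be importable.
# These names are stand-ins so the sketch runs in any environment.
STANDARD_ENGINES = {
    "json_engine": "json",                    # importable: stdlib module
    "missing_engine": "no_such_backend_pkg",  # assumed not installed
}

# Records every call to the slow discovery step.
discovery_log = []


def slow_discover_engines():
    """Stand-in for the expensive entry-point scan (list_engines in the PR)."""
    discovery_log.append("scan")
    return {"plugin_engine": "a third-party backend"}


def get_backend(engine):
    """Resolve an engine, skipping discovery whenever possible."""
    # A backend object passed directly needs no lookup at all.
    if not isinstance(engine, str):
        return engine
    # Fast path: a known standard name is imported directly.
    if engine in STANDARD_ENGINES:
        module_name = STANDARD_ENGINES[engine]
        try:
            return importlib.import_module(module_name)
        except ImportError as err:
            raise ImportError(
                f"The {module_name} package is required for engine "
                f"{engine!r} but could not be imported."
            ) from err
    # Slow path: anything else still needs the full scan.
    engines = slow_discover_engines()
    if engine not in engines:
        raise ValueError(
            f"unrecognized engine {engine!r} must be one of: {list(engines)}"
        )
    return engines[engine]
```

In this sketch only the last branch ever touches `slow_discover_engines`, which is why the error type changes for a standard engine that is not installed: the import fails before any list of available engines has been built.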

jsignell commented Dec 5, 2025

The new error message kind of seems better than the old one honestly for engine='zarr'. Do you think it's worth adding a test for this change? Would it be too heavy to try to have a test that proves that for these default engines you never need to call list_engines?
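
One lightweight way such a test could look is sketched below. This is an illustration under assumptions, not the test added in the PR: `plugins` is a hypothetical stand-in namespace rather than xarray's real module, and `get_backend` here is a toy resolver. The idea is to replace the discovery function with a mock that fails loudly if called, resolve every default engine, and assert that the mock was never touched.

```python
import types
from unittest import mock

# Hypothetical stand-in for a plugins module; not xarray's real code.
plugins = types.SimpleNamespace()
plugins.STANDARD = {"netcdf4", "h5netcdf", "scipy", "zarr"}
plugins.list_engines = lambda: {"cfgrib": "plugin backend"}


def get_backend(engine):
    """Toy resolver: standard names never reach list_engines."""
    if engine in plugins.STANDARD:
        return f"builtin:{engine}"
    return plugins.list_engines()[engine]


def check_standard_engines_skip_discovery():
    """Return True if no standard engine triggers discovery."""
    with mock.patch.object(
        plugins,
        "list_engines",
        side_effect=AssertionError("discovery was called"),
    ) as spy:
        for engine in sorted(plugins.STANDARD):
            get_backend(engine)
        spy.assert_not_called()
    return True
```

Against the real library, the patch target would be wherever list_engines is actually defined and looked up. The mock itself is cheap, so a test in this style should not be heavy.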

@jsignell jsignell left a comment

This looks great! I love the test 😍

Development

Successfully merging this pull request may close these issues.

First invocation of open_dataset takes 3 seconds due to backend entrypoint discovery being slow