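✨ Feature Request
Since v4.8.0, the netCDF-C library can read and write Zarr files using the `nczarr` backend. Ref: https://docs.unidata.ucar.edu/nug/current/ncZarr_head.html

This means that, with some small updates, the current Iris netCDF `loader.py` and `saver.py` should be able to handle the Zarr format implicitly.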
Details
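To invoke the nczarr data model in netCDF, a URL must be passed to the `Dataset` rather than a normal file path. For example, a Zarr dataset stored locally at `/data/users/joe.bloggs/air_pressure.zarr` would need to be specified using the URL:

`file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file`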
Tip: you can use the `ncdump` program on the command line to confirm this behaviour without going through the Python netCDF4 package, e.g.:

`ncdump -h file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file`
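Here is a minimal sketch of building that URL from an ordinary local path. `to_nczarr_url` is a hypothetical helper, not part of Iris or netCDF4. It assumes a POSIX path and reproduces the four-slash `file:////` form used in the examples by prefixing `file:///` to the absolute path.

```python
import os


def to_nczarr_url(path: str) -> str:
    """Return the netCDF-C nczarr URL for a local Zarr store (hypothetical helper).

    Assumes a POSIX path: 'file:///' + absolute path gives the four-slash
    form used in this issue, and '#mode=nczarr,file' selects the nczarr
    data model with storage in the local file system.
    """
    return f"file:///{os.path.abspath(path)}#mode=nczarr,file"


if __name__ == "__main__":
    print(to_nczarr_url("/data/users/joe.bloggs/air_pressure.zarr"))
    # file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file
```

The resulting string is what would be handed to `netCDF4.Dataset` (or to `ncdump -h`) in place of the plain path.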
The implementation of this could be done in two ways:
Option 1: Allow users to pass a URL to the loader/saver.
- This already works 🥳 if you call the netCDF loader directly with the URL, e.g. `iris.fileformats.netcdf.loader.load_cubes`.
- Saving works with one small modification:
  - Don't take the `abspath` of URLs (the target needs to stay a URL, not become a path).
  - If using `iris.save`, you also need to explicitly pass the `saver='nc'` keyword.
- Loading currently fails if you try to load via `iris.load` because:
  - `iris.loading` treats the URL as a plain file path (the `#mode` part ends up in the file name). The URL needs to be passed as-is to `iris.fileformats.netcdf.loader`.
  - There is nothing in `FORMAT_AGENT` that can be matched against a Zarr store (tricky, as it is a directory, not a file).
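The two Option 1 fixes could look roughly like the sketch below. All names are hypothetical and none of this reflects Iris's existing code. `is_url` decides whether a target should bypass `os.path.abspath` and be passed through as-is. `zarr_store_version` is a directory-based stand-in for a `FORMAT_AGENT` match: it sniffs the Zarr V2 root metadata keys (`.zgroup`, `.zarray`) or the V3 key (`zarr.json`).

```python
import os
import tempfile
from typing import Optional
from urllib.parse import urlsplit


def is_url(target: str) -> bool:
    """True if `target` looks like a URL such as 'file:////...#mode=nczarr,file'."""
    # Requiring '://' stops a Windows drive letter ('C:\\data') being read as a scheme.
    return "://" in target and bool(urlsplit(target).scheme)


def normalise_target(target: str) -> str:
    """Saver-side fix: absolutise plain paths, but pass URLs through untouched."""
    return target if is_url(target) else os.path.abspath(target)


def zarr_store_version(path: str) -> Optional[int]:
    """Return 2 or 3 if `path` is a local Zarr store directory, else None.

    A stand-in for a FORMAT_AGENT match: it looks for the root metadata keys
    of Zarr V2 ('.zgroup' or '.zarray') or Zarr V3 ('zarr.json').
    """
    if not os.path.isdir(path):
        return None
    names = set(os.listdir(path))
    if names & {".zgroup", ".zarray"}:
        return 2  # readable by nczarr
    if "zarr.json" in names:
        return 3  # not readable by nczarr (see Caveats)
    return None


if __name__ == "__main__":
    url = "file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file"
    print(normalise_target(url))  # unchanged: the '#mode=...' fragment survives
    with tempfile.TemporaryDirectory() as store:
        open(os.path.join(store, ".zgroup"), "w").close()
        print(zarr_store_version(store))  # 2
```

A content-based check like this is only one option. The alternative, raised under Option 2, is to avoid sniffing altogether and let the caller say which backend to use.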
Option 2: Update the loading and saving API to include an `nczarr` keyword.
- This could handle some of the complexity above by allowing the user to pass a normal file path.
- The loader/saver would convert this to the relevant `file://...#mode=nczarr,file` URL internally.
- Would special handling still be needed for the `FORMAT_AGENT` check?
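The following sketch shows how a hypothetical `nczarr=True` keyword might resolve the user's target before it reaches the netCDF loader/saver. `resolve_target` is illustrative only, not an existing Iris function. Like the URL form in the examples, it assumes POSIX paths.

```python
import os


def resolve_target(target: str, *, nczarr: bool = False) -> str:
    """Map a user-supplied target to the string handed to netCDF-C (hypothetical).

    URLs are passed through unchanged. Otherwise the path is made absolute and,
    when nczarr=True, wrapped in the URL form used in this issue:
    'file:///' + absolute path + '#mode=nczarr,file' (assumes POSIX paths).
    """
    if "://" in target:
        # Already a URL (e.g. built by the user, as in Option 1): leave it alone.
        return target
    abs_path = os.path.abspath(target)
    if not nczarr:
        return abs_path  # plain netCDF path, no nczarr URL
    return f"file:///{abs_path}#mode=nczarr,file"


if __name__ == "__main__":
    print(resolve_target("/data/users/joe.bloggs/air_pressure.zarr", nczarr=True))
```

Because the caller states the backend explicitly, a keyword like this could let the nczarr route skip file-type sniffing. `iris.load` would, however, still need somewhere to route the keyword.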
Caveats
- The `nczarr` data model is based on Zarr Specification V2, whereas the current Zarr Specification is V3.
  - This means that files written to the newer V3 spec will not be readable by `nczarr` (at least for now; presumably the `nczarr` data model will eventually support V3 too).
- A quick proof-of-concept showed that the nczarr netCDF driver does not write scalar variables. Instead, it creates a single-element array with `shape=(1,)` and `dimension_names=["_scalar_"]`. As a result, variables such as `grid_mapping` variables are written out as dimensioned arrays rather than scalars.
  - This seems to confuse Iris on loading: these variables are not recognised as their expected CF type and end up as `CFDataVariable`s. This would need investigating in `cf.py`, and perhaps some specific handling could be put in place (e.g. a check for `dimensions == ["_scalar_"]`).
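Here is a rough sketch of the check suggested above. It assumes the variable's dimension names and shape are available as they are from a netCDF4 variable. `is_nczarr_scalar` and `effective_dimensions` are hypothetical names. How this would hook into the CF variable identification in `cf.py` is left open. Note that netCDF4 reports `dimensions` as a tuple, so a literal comparison with the list `["_scalar_"]` would be `False`; the sketch normalises both sides to tuples.

```python
from typing import Sequence, Tuple

# Dimension name that the issue's proof-of-concept saw nczarr write for scalars.
NCZARR_SCALAR_DIM = "_scalar_"


def is_nczarr_scalar(dimensions: Sequence[str], shape: Sequence[int]) -> bool:
    """True if a variable looks like a scalar that nczarr stored as shape (1,)."""
    # Normalise to tuples: netCDF4 reports `dimensions` as a tuple, and
    # ("_scalar_",) == ["_scalar_"] is False in Python.
    return tuple(dimensions) == (NCZARR_SCALAR_DIM,) and tuple(shape) == (1,)


def effective_dimensions(dimensions: Sequence[str], shape: Sequence[int]) -> Tuple[str, ...]:
    """Dimensions that CF classification could use: () for nczarr pseudo-scalars."""
    return () if is_nczarr_scalar(dimensions, shape) else tuple(dimensions)


if __name__ == "__main__":
    # A grid_mapping variable as written by nczarr, versus an ordinary data variable.
    print(effective_dimensions(("_scalar_",), (1,)))  # ()
    print(effective_dimensions(("time", "latitude"), (4, 3)))  # ('time', 'latitude')
```

The check also requires `shape == (1,)`, so a genuine length-1 dimension that happened to be named `_scalar_` with a different size would not be collapsed.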