Implicit Zarr support via nczarr #6967

@ukmo-ccbunney

✨ Feature Request

Since v4.8.0, the netCDF-C library can read/write Zarr files using the nczarr backend.
Ref: https://docs.unidata.ucar.edu/nug/current/ncZarr_head.html

This means that, with some small updates, the current Iris NetCDF loader.py and saver.py should be able to handle the Zarr format implicitly.

Details

To invoke the nczarr data model in NetCDF, you must pass a URL to the Dataset rather than a normal file path. For example, a Zarr dataset stored locally at:

/data/users/joe.bloggs/air_pressure.zarr

would need to be specified using the URL:

file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file
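The path-to-URL mapping above can be sketched as a small helper. This is an illustration only: to_nczarr_url is a hypothetical name, not part of Iris or netCDF4, and it assumes a POSIX-style path so that the result has the same file://// form as the example above.

```python
import os


def to_nczarr_url(path: str) -> str:
    """Build an nczarr URL for a local Zarr store (hypothetical helper).

    Assumes a POSIX path: the absolute path starts with "/", so the
    result has the same "file:////..." form used in this issue.
    """
    abs_path = os.path.abspath(os.path.expanduser(path))
    # "#mode=nczarr,file" selects the nczarr data model with local file storage.
    return f"file:///{abs_path}#mode=nczarr,file"


print(to_nczarr_url("/data/users/joe.bloggs/air_pressure.zarr"))
```

The resulting string is what would be handed to the netCDF Dataset in place of the plain path.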

Tip

You can use the ncdump program on the command line to verify this behaviour without going through the Python netCDF4 package, e.g.:
ncdump -h file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file

The implementation of this could be done in two ways:

  • Option 1: Allow users to pass a URL to the loader/saver.

    • This already works 🥳 if you call the netCDF loader directly with the URL, e.g. iris.fileformats.netcdf.loader.load_cubes
    • Works for saving with one small modification:
      • Don't get abspath for URLs (we need to keep it as a URL, not a path)
      • If you use iris.save, you also need to explicitly pass the saver='nc' keyword.
    • Currently fails if you try to load via iris.load because:
      • iris.loading will fail as it treats the URL as a plain file (the #mode part ends up in the file name).
        • The URL needs to be passed as-is to iris.fileformats.netcdf.loader
      • There is no FORMAT_AGENT that can be matched against a Zarr file (tricky as it is a directory, not a file)
  • Option 2: Update the loading and saving API to include an nczarr keyword

    • This could handle some of the complexity above by allowing the user to pass a normal file path
    • The loader/saver would convert this to the relevant file://...#mode=nczarr,file URL internally.
      • Would still need special handling for the FORMAT_AGENT check?
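As a rough sketch of the "don't get abspath for URLs" change in Option 1, the target could be checked for a URL scheme before any path normalisation. All names here (URL_SCHEMES, is_url, requests_nczarr, normalise_target) are hypothetical and do not exist in Iris, and the set of schemes is an assumption made for illustration.

```python
import os
from urllib.parse import urlsplit

# Schemes treated as URLs in this sketch; the exact set is an assumption.
URL_SCHEMES = {"file", "http", "https", "s3"}


def is_url(target: str) -> bool:
    """Return True if the target has one of the recognised URL schemes."""
    return urlsplit(target).scheme in URL_SCHEMES


def requests_nczarr(target: str) -> bool:
    """Return True if a URL fragment such as '#mode=nczarr,file' is present."""
    if not is_url(target):
        return False
    fragment = urlsplit(target).fragment
    for item in fragment.split("&"):
        key, _, value = item.partition("=")
        if key == "mode" and "nczarr" in value.split(","):
            return True
    return False


def normalise_target(target: str) -> str:
    """Leave URLs untouched; expand and absolutise plain file paths."""
    if is_url(target):
        return target
    return os.path.abspath(os.path.expanduser(target))


url = "file:////data/users/joe.bloggs/air_pressure.zarr#mode=nczarr,file"
print(normalise_target(url) == url)  # the URL is passed through as-is
print(requests_nczarr(url))
```

A check like requests_nczarr could also stand in for a file-based FORMAT_AGENT match, since the store itself is a directory; whether that fits the existing format-picking machinery would need investigating.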

Caveats

  • The nczarr Data Model is based on Zarr Specification V2. The current Zarr Specification is V3.

    • This means that files generated with the more modern V3 Spec will not be readable by nczarr (at the moment at least; presumably the nczarr data model will eventually support the V3 spec too)
  • A quick proof-of-concept showed that the nczarr netCDF driver does not write scalar variables, but instead creates a single-element array with shape=(1,) and dimension_names=["_scalar_"]. This means that things like grid_mapping variables get written out as a dimensioned array, rather than as a scalar.

    • This seems to confuse Iris when loading these variables: they fail to be recognised as their expected CF type and end up as CFDataVariables. This would need investigating in cf.py, with some specific handling put in place (check for dimensions == ["_scalar_"])?
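The suggested dimensions check might look something like the following. This is a minimal sketch, not existing Iris code: is_nczarr_pseudo_scalar is a hypothetical helper, and it assumes the dimension name and shape reported in the proof-of-concept above.

```python
from typing import Sequence

# Dimension name observed on nczarr-written "scalar" variables (see above).
NCZARR_SCALAR_DIM = "_scalar_"


def is_nczarr_pseudo_scalar(dimensions: Sequence[str], shape: Sequence[int]) -> bool:
    """Return True for a variable that nczarr wrote in place of a true scalar.

    Such variables have exactly one dimension, named "_scalar_", of length 1.
    """
    return tuple(dimensions) == (NCZARR_SCALAR_DIM,) and tuple(shape) == (1,)


print(is_nczarr_pseudo_scalar(["_scalar_"], (1,)))
print(is_nczarr_pseudo_scalar(["time"], (1,)))
```

Variables matching this test could then be treated as dimensionless when cf.py decides their CF type.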
