Open Parflow files with xarray
pf-xarray is an extension to xarray that aims to make working with Parflow
input and output data much easier.
It hooks directly into xarray so that these datasets can be loaded into xarray data structures in a standard and robust way.
This library is still under rapid development, so APIs may change and features will be added at a reasonable pace. Check back often to stay in sync!
This project is in the very early stages, so you will have to install
from source to use pf-xarray:
git clone https://github.com/arbennett/pf-xarray.git
cd pf-xarray
python setup.py develop
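If you prefer pip, an editable install should behave the same way (this assumes the repository keeps a standard setuptools layout):
pip install -e .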
Once installed there is nothing to import; this package implements backends
and extensions that hook directly into xarray itself. You can read
Parflow data with:
import xarray as xr
# Read a single 'pfb' file
da = xr.open_dataarray(path_to_pfb, name=varname)
# Read several 'pfb' files given a 'pfmetadata' file
ds = xr.open_dataset(path_to_pfmetadata)
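Everything returned is a standard xarray object, so the usual xarray operations apply to it. The lines below are a minimal sketch; the 'pressure' variable name and the 'time' dimension are assumptions about your particular Parflow run, not guarantees made by the backend.
# Continue from the dataset opened above
print(ds.data_vars)                              # variables discovered via the pfmetadata file
mean_pressure = ds['pressure'].mean(dim='time')  # hypothetical variable/dimension names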
The xarray backend uses the underlying ParflowBinaryReader class to read
the Parflow binary files. It can also be used on its own if you prefer to work
with numpy arrays rather than xarray objects. The reader is used like any other
context manager. To open a single file and read the entire dataset you can do:
from pf_xarray import ParflowBinaryReader, read_pfb, read_stack_of_pfbs
with ParflowBinaryReader(your_pfb_file) as pfb:
    data = pfb.read_all_subgrids()
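The result is a plain numpy array covering the full domain; a quick sanity check might look like the following (the exact axis ordering is an assumption to verify against your own run):
print(type(data))   # <class 'numpy.ndarray'>
print(data.shape)   # full domain extents; confirm the axis order for your run
print(data.dtype)   # Parflow binary files store double precision (float64) values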
Similarly, you can select subsets of the data with the read_subarray method.
For instance, to read a (100, 100, 1) slice of the data you would use:
start_x = 100
start_y = 100
start_z = 0
nx, ny, nz = 100, 100, 1
with ParflowBinaryReader(your_pfb_file) as pfb:
    data = pfb.read_subarray(start_x, start_y, start_z, nx, ny, nz)
For convenience, we have also implemented two helper functions, read_pfb
and read_stack_of_pfbs. To load a single pfb file in full you can do:
data = read_pfb(your_pfb_file)
If you want to load a timeseries of pfb files that all have the
same shape, you can use read_stack_of_pfbs to efficiently read them into a
single array of shape (n_timesteps, x, y, z). The function does this by reducing the
overhead of reading all of the subgrid headers, simply reusing the offsets to read
subgrid data across all of the files. It can be used as:
data = read_stack_of_pfbs(your_list_of_pfb_files)
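How you build the file list is up to you; a common approach is to glob for the files and sort them. The sketch below assumes a typical Parflow output naming pattern ('run_name.out.press.*.pfb' with zero-padded timestep numbers), so adjust it to your own run:
from glob import glob
from pf_xarray import read_stack_of_pfbs

# Zero-padded timestep numbers make a plain lexicographic sort match time order.
your_list_of_pfb_files = sorted(glob('run_name.out.press.*.pfb'))
data = read_stack_of_pfbs(your_list_of_pfb_files)
print(data.shape)  # (n_timesteps, x, y, z) as described above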
read_stack_of_pfbs also supports indexing so that you don't have
to read the entire spatial extent of every file. This is accomplished by passing
a key dictionary to the function call. The format of key should be:
{'x': {'start': start_x, 'stop': end_x},
 'y': {'start': start_y, 'stop': end_y},
 'z': {'start': start_z, 'stop': end_z}}
For example, to do something similar to the ParflowBinaryReader.read_subarray example:
key = {'x': {'start': 100, 'stop': 200},
       'y': {'start': 100, 'stop': 200},
       'z': {'start': 0, 'stop': 1}}
data = read_stack_of_pfbs(your_list_of_pfb_files, key)
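With this key the stacked array should come back with shape (n_timesteps, 100, 100, 1), following the (n_timesteps, x, y, z) ordering described above.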
In the future we hope that this package can provide broader functionality and
make use of the PFTools ecosystem. Some planned features are:
- Compatibility with the pftools.hydrology module so that we can calculate derived variables seamlessly and efficiently.
- Ability to be constructed from a pftools.Run object.
- Ability to write out pfb files.
Check back for more development notes!