Pint vs cf-units: an investigation #6929

@stephenworsley

📰 Custom Issue

Pint is a popular units package which, unlike cf-units, is pure Python and does not rely on a C library (cf-units relies on UDUNITS). This allows it to be pip installable. Switching to a CF-compliant implementation of units based on Pint would therefore make Iris and all its dependencies pip installable.

There are, however, several notable differences between Pint and cf-units which would need to be either addressed or accepted as a (possibly breaking) change in behaviour. Some of these could arguably be improvements, though where possible we would aim for a faithful emulation of UDUNITS, which the CF conventions treat as an authoritative source.


Different Units Supported

Differences in which units are supported by cf-units and Pint fall into several categories, described below. I don't aim to give an exhaustive list of exactly which units fall into each category, but rather to hint at what the scope of such a list might be.

It should be noted that Pint makes it easy to support the defining of additional units, so these differences are not expected to be a major obstacle.

Units in UDUNITS but not in Pint

e.g. degrees_north...

Units in CF and cf-units, but not in UDUNITS or Pint

e.g. level, layer...

Units in cf-units but not in CF, UDUNITS or Pint

e.g. unknown, no-unit

Units in Pint but not in cf-units

e.g. delta_degree_Celsius...

Units unique to cfunits

The package cfunits (not to be confused with cf-units), which also offers a Python implementation of UDUNITS, has the unit calendar_year, which is unique to it.

Units in both cf-units and Pint which have different values

e.g. year. In Pint a year is 365.25 days, in CF it is 365.242198781 days.
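To give a sense of scale, the gap between the two year lengths quoted above can be worked out directly. This is plain arithmetic on those two values and does not call either library.

```python
SECONDS_PER_DAY = 86_400

PINT_YEAR_DAYS = 365.25        # year length quoted above for Pint
CF_YEAR_DAYS = 365.242198781   # year length quoted above for CF

# Difference between the two definitions of one year.
difference_days = PINT_YEAR_DAYS - CF_YEAR_DAYS
difference_seconds = difference_days * SECONDS_PER_DAY

print(f"{difference_days:.9f} days, or about {difference_seconds:.0f} seconds, per year")
```

The difference is roughly 674 seconds (a little over 11 minutes) per year, so any conversion involving years would give slightly different results under the two definitions.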


Differences in Representation

There are differences in the way units are represented in cf-units and Pint, both as input and output.

Exponentials

In cf-units, exponentials are implicit; in Pint they are explicit. This is illustrated by this issue: hgrecco/pint#851.

>>> import pint
>>> ureg = pint.UnitRegistry()
>>> test_str = '1e6 Hz s-2'
>>> print(ureg(test_str))

999998.0 dimensionless

>>> from cf_units import Unit
>>> test_str = '1e6 Hz s-2'
>>> print(Unit(test_str).definition)

1000000 s-3

Note: it is shown in this issue that there is an easy fix which allows Pint to behave more like cf-units.

Dimensionless Units

When Pint represents dimensionless quantities, the word "dimensionless" is added (as described in hgrecco/pint#1486). This is done in a way which is not simple to modify.


Differences in Behaviour

There are further particular differences in behaviour with respect to unit multiplication and conversion.

Conversion From Units and Arrays vs Quantities

Within cf-units, unit conversion can be expressed as a function of Units applied to a value or array, e.g.

>>> import numpy as np
>>> Unit("m").convert(np.array([100, 200]), Unit("km"))
array([0.1, 0.2])

Within Pint, as far as I'm aware, conversion is done by first making a Quantity out of an array and calling a method which returns another Quantity.

>>> ureg.Quantity(np.array([100, 200]), "m").to("km")
<Quantity([0.1 0.2], 'kilometer')>

This suggests that the proper way of shifting to Pint is to replace whichever arrays have associated Units with Quantity objects. It should still be possible to perform unit conversions using the existing Iris paradigm of holding a unitless dask/numpy array in cube.data and an associated unit in cube.units, but performing unit conversion may become more cumbersome (at least with respect to API) in such a case.

Simplification After Multiplication

cf-units will simplify units after multiplying them:

>>> Unit("km") * Unit("m")
Unit('1000 m2')

Pint tends to keep the units as they were originally written:

>>> ureg.Unit("km") * ureg.Unit("m")
<Unit('kilometer * meter')>

It is possible to perform such a simplification manually, but it appears this must be done for Quantities rather than Units:

>>> (ureg.Quantity(1, "km") * ureg.Quantity(1, "m")).to_base_units()
<Quantity(1000.0, 'meter ** 2')>

Special Behaviour of unknown/no-unit

The Units "unknown" and "no-unit" behave differently under multiplication from any unit in Pint that I am aware of. Multiplying any unit (other than "no-unit") by "unknown" will result in "unknown". Multiplying any unit by "no-unit" will result in an error.
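The rules just described are simple enough to state as code. The sketch below works on plain strings and uses neither cf-units nor Pint; the function name, the choice of ValueError, and the placeholder result for ordinary units are all assumptions made for illustration.

```python
UNKNOWN = "unknown"
NO_UNIT = "no-unit"


def multiply_special(left, right):
    """Hypothetical sketch of the unknown/no-unit multiplication rules."""
    # Rule 1: multiplying anything by "no-unit" is an error.
    if NO_UNIT in (left, right):
        raise ValueError("Cannot multiply by 'no-unit'")
    # Rule 2: otherwise, multiplying by "unknown" gives "unknown".
    if UNKNOWN in (left, right):
        return UNKNOWN
    # Ordinary units: placeholder only; a real implementation would
    # delegate to the underlying units library here.
    return f"{left} {right}"


result = multiply_special("m", UNKNOWN)
```

Any Pint-based replacement would need a layer like this around ordinary Pint units, since these two units do not follow normal unit algebra.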

Special Behaviour of Units With Offsets

Units with offsets, e.g. Celsius, have some subtle and unusual behaviours. One problem which often arises is an ambiguity as to whether Celsius refers to an actual temperature (degrees Celsius) or to a difference in temperature (Celsius).

Awkwardly, cf-units considers "degrees_celsius" and "celsius" to be equivalent.

>>> Unit("celsius") == Unit("degree_celsius")
True

Further complicating this, when "celsius" is multiplied, it is simplified to "Kelvin".

>>> Unit("degrees_celsius") * Unit(1)
Unit('K')
>>> Unit("celsius") * Unit(1)
Unit('K')

Pint, on the other hand, includes the unit "delta_degree_Celsius", which explicitly represents a difference in Celsius. This automatically replaces Celsius whenever Celsius is involved in a compound unit:

>>> ureg.Unit("celsius")
<Unit('degree_Celsius')>
>>> ureg.Unit("celsius s ** -1")
<Unit('delta_degree_Celsius / second')>

Notably, "delta_degree_Celsius" and "celsius" convert differently in Pint:

>>> ureg.Quantity(1, "celsius").to_base_units()
<Quantity(274.15, 'kelvin')>
>>> ureg.Quantity(1, "delta_degree_Celsius").to_base_units()
<Quantity(1, 'kelvin')>
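The two results differ because only an absolute temperature has the 273.15 offset applied, while a temperature difference is unchanged between Celsius and kelvin. The arithmetic is shown below in plain Python, without either library; the function names are hypothetical.

```python
CELSIUS_OFFSET = 273.15  # kelvin value of 0 degrees Celsius


def absolute_celsius_to_kelvin(temperature_c):
    """An absolute temperature: the offset is applied."""
    return temperature_c + CELSIUS_OFFSET


def delta_celsius_to_kelvin(difference_c):
    """A temperature difference: the scale is 1:1 and no offset applies."""
    return difference_c


absolute_k = absolute_celsius_to_kelvin(1)
delta_k = delta_celsius_to_kelvin(1)
```

These correspond to the 274.15 and 1 kelvin results above, which is why a compound unit such as Celsius per second only makes sense with the delta interpretation.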

Also worth noting, "celsius" does not allow multiplication in Pint and throws an error, likely due to this inherent ambiguity. The approach Pint takes seems somewhat more robust than that of cf-units, though the conversion of "celsius" to "delta_degree_Celsius" may be slightly arbitrary. Introducing these changes, however, seems like it would be disruptive.


cftime

cftime is a major part of cf-units, but it is pip installable on its own, so there may be an option to inherit from it directly. Any replacement for cf-units ought to reckon with how cftime would relate to it. It's worth investigating how feasible it would be to implement cftime either through or alongside Pint. Is there a way of integrating cftime into Pint, or would it make sense to create an object which can represent either a cftime object or a Pint object? It's also worth considering how this would relate to other packages which make use of cftime, such as nc-time-axis.
