Roundtripping inconsistent for objects of shape=() and numpy scalars... #98

@hmaarrfk

Description

It seems that there are some edge cases when serializing numpy scalars and 0-dimensional numpy arrays.

The distinction has always confused me, so for precision, I quote numpy's documentation:

An array scalar is an instance of the types/classes float32, float64, etc., whereas a 0-dimensional array is an ndarray instance containing precisely one array scalar.
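To make the quoted distinction concrete, here is a small sketch using plain numpy only (the variable names are just for illustration). It shows that an array scalar is not an ndarray, that a 0-dimensional array is one, and that indexing the latter with an empty tuple yields the former.

```python
import numpy as np

scalar = np.uint32(123)                  # array scalar
zero_d = np.array(123, dtype="uint32")   # 0-dimensional ndarray

# An array scalar is a numpy.generic, not an ndarray
print(type(scalar).__name__, isinstance(scalar, np.ndarray))

# A 0-dimensional array is an ndarray with an empty shape
print(type(zero_d).__name__, zero_d.shape, zero_d.ndim)

# Indexing with an empty tuple extracts the single array scalar it contains
extracted = zero_d[()]
print(type(extracted).__name__)
```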

So I added the following test to the file tests/test_np.py to see if things serialize correctly:

def test_nd_array_shape_empty():
    to_dump = zeros((), dtype='uint32')
    to_dump[...] = 123

    the_dumps = dumps(to_dump)
    the_double_dumps = dumps(loads(dumps(to_dump)))

    assert the_dumps == the_double_dumps
_________________________________ test_nd_array_shape_empty __________________________________

    def test_nd_array_shape_empty():
        to_dump = zeros((), dtype='uint32')
        to_dump[...] = 123
    
        the_dumps = dumps(to_dump)
        the_double_dumps = dumps(loads(dumps(to_dump)))
    
>       assert the_dumps == the_double_dumps
E       assert '{"__ndarray_... "shape": []}' == '123'
E         - 123
E         + {"__ndarray__": 123, "dtype": "uint32", "shape": []}

tests/test_np.py:197: AssertionError

After round-tripping, the 0-dimensional shape is not preserved: loads downcasts the array to a numpy scalar, which then serializes as the bare value 123.
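For comparison, here is a minimal sketch of a decoder that keeps the 0-dimensional shape. It is not the pyjson_tricks implementation: encode_array and decode_array are hypothetical helpers built on the standard json module, and they only borrow the "__ndarray__", "dtype" and "shape" keys seen in the output above.

```python
import json
import numpy as np

def encode_array(arr):
    """Hypothetical encoder: store the values together with dtype and shape."""
    return {
        "__ndarray__": arr.tolist(),   # a plain Python scalar for 0-d arrays
        "dtype": str(arr.dtype),
        "shape": list(arr.shape),
    }

def decode_array(dct):
    """Hypothetical decoder: always rebuild an ndarray, even for shape == ()."""
    data = np.asarray(dct["__ndarray__"], dtype=dct["dtype"])
    return data.reshape(tuple(dct["shape"]))

to_dump = np.zeros((), dtype="uint32")
to_dump[...] = 123

first = json.dumps(encode_array(to_dump), sort_keys=True)
restored = decode_array(json.loads(first))
second = json.dumps(encode_array(restored), sort_keys=True)

print(first == second, restored.shape, restored.dtype)
```

Because the decoder never unwraps the value, the second dump is identical to the first, which is the property the failing test checks.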

This strange behavior leads the test suite to use encode_scalars_inplace to work around the warning that "scalars cannot be reliably encoded", and then to rely on the same "automatic downcast" behavior to recover the original structure.
https://github.com/mverleg/pyjson_tricks/blob/master/tests/test_np.py#L170

However, this would be incorrect if the user mixes 0-dimensional numpy arrays with numpy scalars, since both come back as scalars.

I originally came here to address a fairly specific use case of ours: serializing numpy datetime64 scalars. I tried converting them to 0-dimensional arrays, but this breaks two assumptions made in pyjson_tricks:

  • numpy.datetime64 objects are not numpy.generics used in the encoder
  • numpy.datetime64 constructors cannot be obtained with the getattr used in the decoder since they have units (annoying I know, date is complicated)
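The second point can be illustrated with numpy alone. The sketch below makes no claim about how the pyjson_tricks decoder is written; it only shows that a unit-qualified dtype name cannot be looked up as an attribute of the numpy module, whereas np.dtype accepts it and can be used to rebuild the value.

```python
import numpy as np

stamp = np.datetime64("2020-01-01T12:00", "m")
dtype_name = str(stamp.dtype)            # the name includes the unit, e.g. minutes

# A getattr lookup on the numpy module finds nothing for a unit-qualified name
found = getattr(np, dtype_name, None)
print(dtype_name, found)

# np.dtype parses the unit, so the value can be rebuilt through a 0-d array
dtype = np.dtype(dtype_name)
restored = np.array(str(stamp), dtype=dtype)[()]
print(restored == stamp, restored.dtype == stamp.dtype)
```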

My proposal unfortunately requires breaking anybody who uses encode_scalars_inplace, including the test suite.

Other references
