As of v0.6.3, I'm not sure if it's possible to serialize and deserialize NumPy arrays of nested record types.
Example model:
ChildRecord: !record
  fields:
    c: int
ParentRecord: !record
  fields:
    p: int
    child: ChildRecord
MyProtocol: !protocol
  sequence:
    records: ParentRecord[,]
Now, walking through the example as if I'm a new user of Yardl:
Step 1
If I attempt to write a NumPy array of ParentRecord, I get a Yardl error about its dtype:
import numpy as np
import issue  # the generated Yardl package for the model above

child = issue.ChildRecord(c=42)
parent = issue.ParentRecord(p=7, child=child)
records = np.tile(parent, (3, 4))
with issue.BinaryMyProtocolWriter("data.bin") as w:
    w.write_records(records)
...
  File "/workspaces/yardl/joe/issue-#194/python/issue/_binary.py", line 1129, in _write_data
    raise ValueError(message)
ValueError: Expected dtype {'names': ['p', 'child'], 'formats': ['<i4', [('c', '<i4')]], 'offsets': [0, 4], 'itemsize': 8, 'aligned': True}, got object
This is documented behavior: https://microsoft.github.io/yardl/python/language.html#arrays.
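The `object` dtype in that error can be reproduced without Yardl. Here is a minimal sketch, assuming hypothetical `Child` and `Parent` dataclasses as stand-ins for the generated record classes (they are not the generated code). NumPy cannot infer a structured dtype from an arbitrary Python object, so `np.tile` falls back to an object array:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Child:  # hypothetical stand-in for issue.ChildRecord
    c: int


@dataclass
class Parent:  # hypothetical stand-in for issue.ParentRecord
    p: int
    child: Child


parent = Parent(p=7, child=Child(c=42))

# np.tile first wraps the Python object in an array. NumPy has no way to map
# an arbitrary object onto a structured dtype, so the result holds references
# to the same Python object with dtype=object.
records = np.tile(parent, (3, 4))
print(records.shape, records.dtype)
```

Every element of `records` is a reference to the same `parent` object, which is why the writer sees `object` where it expects the structured dtype.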
Step 2
Unfortunately, we can't just "set" the correct dtype, e.g.
records = np.tile(parent, (3, 4)).astype(issue.get_dtype(issue.ParentRecord))
  File "/workspaces/yardl/joe/issue-#194/python/test.py", line 60, in main
    records = np.tile(parent, (3, 4)).astype(issue.get_dtype(issue.ParentRecord))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'ParentRecord'
Step 3
So I'll manually transform my data to a NumPy structured array (noting that this is not very user-friendly).
records = np.tile(
    np.array(
        (parent.p, (parent.child.c,)), dtype=issue.get_dtype(issue.ParentRecord)
    ),
    (3, 4),
)
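The manual conversion could be generalized so that the nested tuple does not have to be written out by hand for every record type. This is a rough sketch, not part of Yardl: `to_nested_tuple` is a hypothetical helper, `Child` and `Parent` are stand-in classes, and `PARENT_DTYPE` is assumed to mirror the field layout shown in the Step 1 error message.

```python
from dataclasses import dataclass

import numpy as np

# Assumed to mirror the field names and formats from the Step 1 error message.
PARENT_DTYPE = np.dtype([("p", "<i4"), ("child", [("c", "<i4")])])


@dataclass
class Child:  # hypothetical stand-in for issue.ChildRecord
    c: int


@dataclass
class Parent:  # hypothetical stand-in for issue.ParentRecord
    p: int
    child: Child


def to_nested_tuple(obj, dtype):
    """Recursively convert a record-like object into the nested tuple
    that NumPy accepts for a structured dtype (hypothetical helper)."""
    if dtype.names is None:
        # Not a structured dtype: the value is stored as-is.
        return obj
    # dtype.fields[name] is (field_dtype, offset[, title]); recurse per field.
    return tuple(
        to_nested_tuple(getattr(obj, name), dtype.fields[name][0])
        for name in dtype.names
    )


parent = Parent(p=7, child=Child(c=42))
row = to_nested_tuple(parent, PARENT_DTYPE)  # (7, (42,))

# Build a 3x4 structured array from 12 copies of the same row.
records = np.array([row] * 12, dtype=PARENT_DTYPE).reshape(3, 4)
print(records["child"]["c"])
```

With the generated code, the dtype would come from `issue.get_dtype(issue.ParentRecord)` as above; the helper only relies on attribute names matching the dtype's field names.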
This lets me write the array successfully, but now I get an error when reading it back:
with issue.BinaryMyProtocolReader("data.bin") as r:
    records_read = r.read_records()
...
  File "/workspaces/yardl/joe/issue-#194/python/test.py", line 72, in main
    records_read = r.read_records()
                   ^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/protocols.py", line 113, in read_records
    value = self._read_records()
            ^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/binary.py", line 43, in _read_records
    return _binary.NDArraySerializer(ParentRecordSerializer(), 2).read(self._stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/_binary.py", line 1251, in read
    return self._read_data(stream, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/_binary.py", line 1149, in _read_data
    result[i] = self.element_serializer.read_numpy(stream)
    ~~~~~~^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'ChildRecord'
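The traceback suggests that `read_numpy` returns a value containing a `ChildRecord` object, which NumPy then cannot store in the nested `child` field. That is my inference from the error message, not something I have confirmed in the Yardl source. Here is a minimal sketch of the same failure mode in plain NumPy, with a hypothetical `Child` class standing in for the generated record and a dtype assumed to match the one in the Step 1 error:

```python
import numpy as np


class Child:  # hypothetical stand-in for issue.ChildRecord
    def __init__(self, c):
        self.c = c


# Assumed to mirror the field names and formats from the Step 1 error message.
dtype = np.dtype([("p", "<i4"), ("child", [("c", "<i4")])])
result = np.empty(1, dtype=dtype)

# A fully nested tuple can be assigned to a structured element.
result[0] = (7, (42,))

# A Python object in place of the inner tuple cannot: NumPy tries to convert
# the object itself to the integer field 'c' and fails.
try:
    result[0] = (7, Child(42))
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

If this is indeed what happens, the reader would need to hand NumPy nested tuples (or a structured scalar) for nested records rather than record objects.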