Is your feature request related to a problem? Please describe.
lmdb is a memory-mapped database format that provides random access to individual training events faster than SQLite.
The main benefits of lmdb are faster queries, a lower storage footprint, and serialized data, which allows for an elegant implementation of storing arbitrary DataRepresentations (see #781).
For larger datasets, and particularly large events, SQLite can become prohibitive as a dataset format.
Describe the solution you'd like
An LMDBWriter that outputs events in lmdb format. It should accept a serialization method (msgpack, for example). It should optionally accept a DataRepresentation; if one is given, representations are pre-computed and serialized using dill or a similar pickle-based method (see Graph construction before training #781). A field in the database should record which serializer was used, so that the file is a self-contained object that users can read without prior knowledge of the serializer.
An LMDBDataset that is compatible with the lmdb database format. It should automatically detect which serializer was used, so the user doesn't have to guess. It should be able to retrieve pre-computed data representations.
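To make the "self-contained file" idea concrete, here is a minimal sketch of how the serializer could be recorded next to the events and looked up again on read. It is not a proposed implementation: `SERIALIZERS`, `META_KEY`, `write_events`, `read_event` and `DictTxn` are hypothetical names, the stdlib `json` and `pickle` codecs stand in for msgpack and dill, and an in-memory stand-in replaces the real lmdb transaction so the snippet runs without extra dependencies.

```python
import json
import pickle

# Hypothetical registry: serializer name -> (dumps, loads).
# msgpack or dill could be registered the same way; stdlib codecs are
# used here only to keep the sketch dependency-free.
SERIALIZERS = {
    "json": (
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda raw: json.loads(raw.decode("utf-8")),
    ),
    "pickle": (pickle.dumps, pickle.loads),
}

# Hypothetical reserved key that holds the metadata record.
META_KEY = b"__meta__"


def write_events(txn, events, serializer="json"):
    """Store each event under its index and record the serializer used."""
    dumps, _ = SERIALIZERS[serializer]
    for index, event in events.items():
        txn.put(str(index).encode("ascii"), dumps(event))
    # The metadata itself is always JSON, so a reader can decode it
    # before it knows which serializer the events were written with.
    meta = {"serializer": serializer, "n_events": len(events)}
    txn.put(META_KEY, json.dumps(meta).encode("utf-8"))


def read_event(txn, index):
    """Look up the serializer from the metadata, then decode one event."""
    meta = json.loads(txn.get(META_KEY).decode("utf-8"))
    _, loads = SERIALIZERS[meta["serializer"]]
    raw = txn.get(str(index).encode("ascii"))
    if raw is None:
        raise KeyError(index)
    return loads(raw)


class DictTxn:
    """In-memory stand-in exposing only the put/get subset of a transaction."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value
        return True

    def get(self, key, default=None):
        return self._data.get(key, default)


txn = DictTxn()
write_events(txn, {0: {"charge": [1.0, 2.5]}, 1: {"charge": [0.3]}}, serializer="pickle")
event = read_event(txn, 1)  # the reader never names the serializer
```

With the py-lmdb package, the same two functions should work on a real transaction, for example inside `with env.begin(write=True) as txn:` after `env = lmdb.open(path, map_size=...)`, since those transactions also take `bytes` keys and values through `put` and `get`. Pre-computed data representations could be stored the same way, with the metadata record naming the pickle-style serializer used for them.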