minimum-viable-product checklist

This is a fairly detailed list of all features to be implemented for the first genuinely useful version of scribe, which might then be called "1.0".

Tome type

finalize and document the overall design of Tome. Done; see docs/tome.md for detailed docs.
libfmt support for Tome
use xtensor as the array backend.
use std::map<std::string, ...> as the backend for dicts
compact "array-of-numbers" implementation for integer/float/complex.
rewrite examples in Tome docs to use top-level dicts
rename "rank" to "ndim" (number of dimensions)
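
A minimal sketch of the compact "array-of-numbers" layout, to make the ndim naming and row-major storage concrete. It assumes one contiguous buffer plus a shape; std::vector stands in for xtensor so the sketch has no dependencies, and the name NumberArray is hypothetical, not part of scribe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical compact "array-of-numbers": one contiguous buffer plus a
// shape, instead of one full Tome object per element.
template <class T>
struct NumberArray {
    std::vector<std::size_t> shape;  // e.g. {2, 3} for a 2x3 array
    std::vector<T> data;             // row-major (C order) storage

    // Allocates product(shape) value-initialized elements.
    explicit NumberArray(std::vector<std::size_t> s)
        : shape(std::move(s)),
          data(std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                               std::multiplies<std::size_t>{})) {}

    // Number of dimensions (the quantity formerly called "rank").
    std::size_t ndim() const { return shape.size(); }

    // Map a multi-index to a position in the flat buffer (row-major).
    std::size_t flat_index(const std::vector<std::size_t>& idx) const {
        if (idx.size() != ndim()) throw std::out_of_range("wrong number of indices");
        std::size_t flat = 0;
        for (std::size_t d = 0; d < ndim(); ++d) {
            if (idx[d] >= shape[d]) throw std::out_of_range("index out of bounds");
            flat = flat * shape[d] + idx[d];
        }
        return flat;
    }

    T& at(const std::vector<std::size_t>& idx) { return data[flat_index(idx)]; }
};
```

Because elements are plain numbers rather than Tome objects, a 2x3 array costs six doubles plus the shape, which is the point of the compact representation.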

Schema validation

Validation has to happen both in the explicit scribe validate command and in the scribe::read(...) and scribe::write(...) functions that take a schema. Unvalidated read/write only performs basic consistency checks.

string validation: min/max length
integer validation: implicit range of the type (e.g. [-128,127] for int8)
array validation: check shape and recurse to elements
dict validation: check keys and recurse to elements
validation errors should indicate a (human-readable) location of the failure
if chunksize is part of the schema, it has to match.
schema documentation. See docs/schema.md
write a json-schema for the format of a schema itself.
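
A rough sketch of recursive validation that reports a human-readable location such as root/names[1]. The Value and Schema types here are hypothetical, heavily simplified stand-ins (only strings, arrays and dicts; no integers or chunksize), not scribe's actual Tome or schema classes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class Kind { String, Array, Dict };

// Hypothetical simplified data value.
struct Value {
    Kind kind = Kind::String;
    std::string str;                    // used when kind == String
    std::vector<Value> array;           // used when kind == Array
    std::map<std::string, Value> dict;  // used when kind == Dict
};

// Hypothetical simplified schema.
struct Schema {
    Kind kind = Kind::String;
    std::size_t min_length = 0;  // strings
    std::size_t max_length = std::numeric_limits<std::size_t>::max();
    std::optional<std::size_t> length;   // arrays: expected length, if fixed
    std::vector<Schema> element;         // arrays: exactly one element schema
    std::map<std::string, Schema> keys;  // dicts: the exact set of allowed keys
};

// Returns std::nullopt on success, otherwise an error that starts with the
// location of the first failure, e.g. "at 'root/names[1]': ...".
std::optional<std::string> validate(const Value& v, const Schema& s,
                                    const std::string& where = "root") {
    auto fail = [&](const std::string& msg) {
        return std::optional<std::string>("at '" + where + "': " + msg);
    };
    if (v.kind != s.kind) return fail("wrong kind of value");
    switch (s.kind) {
    case Kind::String:
        if (v.str.size() < s.min_length || v.str.size() > s.max_length)
            return fail("string length " + std::to_string(v.str.size()) + " out of range");
        return std::nullopt;
    case Kind::Array:
        if (s.length && v.array.size() != *s.length)
            return fail("expected " + std::to_string(*s.length) + " elements, got " +
                        std::to_string(v.array.size()));
        for (std::size_t i = 0; i < v.array.size(); ++i)
            if (auto err = validate(v.array[i], s.element.at(0),
                                    where + "[" + std::to_string(i) + "]"))
                return err;
        return std::nullopt;
    case Kind::Dict:
        for (const auto& [key, sub] : s.keys) {
            auto it = v.dict.find(key);
            if (it == v.dict.end()) return fail("missing key '" + key + "'");
            if (auto err = validate(it->second, sub, where + "/" + key)) return err;
        }
        for (const auto& kv : v.dict)
            if (s.keys.count(kv.first) == 0) return fail("unexpected key '" + kv.first + "'");
        return std::nullopt;
    }
    return std::nullopt;
}
```

Building the path on the way down keeps the error message cheap to produce and means the caller never has to reconstruct where in the tree the failure happened.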

Reading/Writing

JSON

integers: precise domains according to the type (e.g. [-128,127] for int8). Needs careful attention to nlohmann's handling of numbers close to the limits.
floats
complex
strings
arrays. Assume everything is 1D when no schema is given.
dicts. ~~Make sure to reject duplicate keys.~~ Duplicate keys are considered undefined.
'any' schema reading
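
A sketch of the exact integer-domain check, assuming the parsed number arrives in one of three representations (signed 64-bit, unsigned 64-bit, or double), which mirrors how nlohmann::json distinguishes integer, unsigned and float numbers. The JsonNumber alias and to_exact_integer are hypothetical names; rejecting float literals such as 3.0 is an assumed policy, not something decided above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for a parsed JSON number: signed integer,
// unsigned integer, or floating point.
using JsonNumber = std::variant<std::int64_t, std::uint64_t, double>;

// Convert to integer type T only if the value lies exactly in T's domain,
// e.g. [-128, 127] for int8_t. Floats are rejected (assumed policy).
template <class T>
std::optional<T> to_exact_integer(const JsonNumber& n) {
    static_assert(std::is_integral_v<T>, "T must be an integer type");
    const auto t_max = static_cast<std::uint64_t>(std::numeric_limits<T>::max());
    if (const auto* s = std::get_if<std::int64_t>(&n)) {
        const std::int64_t v = *s;
        if constexpr (std::is_signed_v<T>) {
            if (v < std::numeric_limits<T>::min() || v > std::numeric_limits<T>::max())
                return std::nullopt;
        } else {
            if (v < 0 || static_cast<std::uint64_t>(v) > t_max) return std::nullopt;
        }
        return static_cast<T>(v);
    }
    if (const auto* u = std::get_if<std::uint64_t>(&n)) {
        // Compare in unsigned arithmetic so values near 2^64 are not mangled.
        if (*u > t_max) return std::nullopt;
        return static_cast<T>(*u);
    }
    return std::nullopt;  // double: not accepted as an integer
}
```

Keeping the unsigned case separate matters near the limits: a value such as 9223372036854775808 exceeds int64_t and can only be represented as an unsigned number, so it has to be compared in unsigned arithmetic rather than cast first.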

HDF5

float: precision has to match exactly
integer: exact width/signedness match
complex, string: document the precise mapping (because these are not part of the HDF5 spec). We follow what HighFive does by default. Check how Grid saves complex numbers. Might need a flag in schema/traits/hdf5.
arrays of numbers via datasets
arrays of non-numbers via groups (potentially nested)
activate chunking (single-chunk is okay as a default)
activate fletcher32
reading with 'any' schema
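
A sketch of the strict numeric type check, assuming the stored type can be summarized as class, size in bytes and signedness (the HDF5 C API exposes these via H5Tget_class, H5Tget_size and H5Tget_sign). StoredType, describe and exact_match are hypothetical names; complex numbers and strings are deliberately left out, since their mapping still needs to be documented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical summary of a stored numeric type.
struct StoredType {
    enum class Class { Integer, Float };
    Class cls;
    std::size_t size;  // in bytes
    bool is_signed;    // ignored for floats
};

// Describe the C++ type T requested by the schema in the same terms.
template <class T>
constexpr StoredType describe() {
    static_assert(std::is_arithmetic_v<T>, "only plain numbers in this sketch");
    if constexpr (std::is_floating_point_v<T>)
        return {StoredType::Class::Float, sizeof(T), true};
    else
        return {StoredType::Class::Integer, sizeof(T), std::is_signed_v<T>};
}

// Strict check: no widening, narrowing or sign conversion is allowed,
// so a float64 dataset does not satisfy a float32 schema and vice versa.
template <class T>
bool exact_match(const StoredType& stored) {
    const StoredType want = describe<T>();
    if (stored.cls != want.cls || stored.size != want.size) return false;
    return stored.cls == StoredType::Class::Float || stored.is_signed == want.is_signed;
}
```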

Code generation

Generate C++ code using the scribe codegen command.
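
A toy sketch of what scribe codegen might emit for a dict schema, following the design decision that arrays always map to xtensor. The FieldSpec description, generate_struct, and the choice of xt::xarray<T> as the xtensor container are assumptions for illustration; nested dicts (which would become nested structs) are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flattened field description derived from a schema.
struct FieldSpec {
    std::string name;
    std::string scalar_type;  // C++ element type, e.g. "double" or "std::string"
    bool is_array = false;    // any array, regardless of ndim
};

// Emit a C++ struct declaration for a dict schema with the given fields.
// Arrays always map to an xtensor container, regardless of ndim.
std::string generate_struct(const std::string& struct_name,
                            const std::vector<FieldSpec>& fields) {
    std::string out = "struct " + struct_name + " {\n";
    for (const auto& f : fields) {
        const std::string type =
            f.is_array ? "xt::xarray<" + f.scalar_type + ">" : f.scalar_type;
        out += "    " + type + " " + f.name + ";\n";
    }
    out += "};\n";
    return out;
}
```

For a schema with a string field name and a double array values, generate_struct("Config", ...) produces a struct with the members std::string name; and xt::xarray<double> values;.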

Python binding

read data given a data file (HDF5) and a schema
python binding of validation

Testing infrastructure

Some design decisions

What should Tome::operator[] return? Answer: Tome &, even though that does not work with compact numerical arrays. Using .get<double>(index) is preferred anyway.
Should we interpret non-existing data as an empty dict/array/string? Answer: No, let's be strict for now.
What does a (multi-dimensional) array map to when generating C++ code? Answer: always xtensor, regardless of rank. Customizability can be added later.
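
A miniature sketch of why Tome::operator[] returning Tome & clashes with compact numerical arrays: a reference needs an actual Tome object to point to, and a compact array only stores raw numbers. MiniTome and its factory functions are hypothetical; only operator[] and get<T>(index) correspond to names mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical miniature Tome: a single number, a generic array of
// Tomes, or a compact array of doubles (no per-element Tome objects).
class MiniTome {
  public:
    enum class Kind { Number, Array, CompactArray };

    static MiniTome number(double x) {
        MiniTome t;
        t.kind_ = Kind::Number;
        t.number_ = x;
        return t;
    }
    static MiniTome array(std::vector<MiniTome> xs) {
        MiniTome t;
        t.kind_ = Kind::Array;
        t.items_ = std::move(xs);
        return t;
    }
    static MiniTome compact(std::vector<double> xs) {
        MiniTome t;
        t.kind_ = Kind::CompactArray;
        t.numbers_ = std::move(xs);
        return t;
    }

    // Returning MiniTome & needs a MiniTome object to refer to. A compact
    // array only stores raw doubles, so there is nothing to return.
    MiniTome& operator[](std::size_t i) {
        if (kind_ != Kind::Array) throw std::logic_error("operator[] needs a generic array");
        return items_.at(i);
    }

    // get<T>(i) works for both layouts because it returns a value.
    template <class T>
    T get(std::size_t i) const {
        static_assert(std::is_arithmetic_v<T>, "numbers only in this sketch");
        if (kind_ == Kind::CompactArray) return static_cast<T>(numbers_.at(i));
        if (kind_ == Kind::Array) return items_.at(i).get<T>();
        throw std::logic_error("not an array");
    }

    template <class T>
    T get() const {
        if (kind_ != Kind::Number) throw std::logic_error("not a number");
        return static_cast<T>(number_);
    }

  private:
    Kind kind_ = Kind::Number;
    double number_ = 0.0;
    std::vector<MiniTome> items_;
    std::vector<double> numbers_;
};
```

This is the trade-off behind the answer above: operator[] stays simple for generic arrays, while code that should work on any numeric array uses get<double>(index).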

large files

partial reads (not the same as "lazy reads")
performance tests: create datasets with millions of entries
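
A sketch of a partial read in its simplest form: selecting a rectangular block by (start, count) from a row-major 2D array. The in-memory read_block function is hypothetical; in an HDF5 backend the same start/count pair would presumably become a hyperslab selection (H5Sselect_hyperslab), so that only the selected block is read from disk.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copy rows [row0, row0 + nrows) and columns [col0, col0 + ncols) out of a
// row-major matrix that has total_cols columns.
std::vector<double> read_block(const std::vector<double>& data, std::size_t total_cols,
                               std::size_t row0, std::size_t col0,
                               std::size_t nrows, std::size_t ncols) {
    const std::size_t total_rows = total_cols == 0 ? 0 : data.size() / total_cols;
    if (row0 + nrows > total_rows || col0 + ncols > total_cols)
        throw std::out_of_range("block exceeds array bounds");
    std::vector<double> out;
    out.reserve(nrows * ncols);
    for (std::size_t r = row0; r < row0 + nrows; ++r)
        for (std::size_t c = col0; c < col0 + ncols; ++c)
            out.push_back(data[r * total_cols + c]);
    return out;
}
```

Unlike a lazy read, the block is materialized immediately; only the amount of data touched is reduced.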
