Speaking personally, the best outcome of this would be to find that someone has already solved the problems we're thinking about, or at least has a solution that can be extended to cover this project's specific application focuses (#1 and #10).
Below is a continuously updated list of other projects. Everything here should be taken with a grain of salt, as it's an attempt to glean information from many different specifications :)
MOSAIC
Formats: XML, HDF5 (with straightforward extensions based on the data model)
License: Creative Commons 3.0
Units: List of supported units in spec
Design criteria: https://mosaic-data-model.github.io/design_criteria.html
Data stored: topology, CG info, selections (i.e., subsets of the file's data), references to other "universes"; "properties" (unclear if these are whole-system properties or atomic properties?)
Specification: https://mosaic-data-model.github.io/
Rich molecule format
todo
H5MD
Type: Binary (HDF5)
Self-describing: yes
Domain: molecular dynamics
Flexible units: yes
Human readable: not without HDF5 viewer
License: GPL (need to understand copyleft implications here)
Data stored: MD state data; atoms and their connectivity. Arbitrary atom lists/groups can be defined (nothing specific for chains/residues/etc)
Specification: http://nongnu.org/h5md/
Amber NetCDF
Type: Binary (NetCDF)
Self-describing: Yes
Human readable: not directly (GUI viewers available)
Flexible units: yes
Domain: biomolecular dynamics
Data stored: Trajectory (no topology)
Specification: http://ambermd.org/netcdf/nctraj.xhtml
MDTraj HDF5
Type: HDF5
Self-describing: Yes
Human readable: no
Data stored: Trajectory (+ topology as a JSON string)
Flexible units: yes
Domain: biomolecular dynamics
Specification: https://github.com/mdtraj/mdtraj/wiki/HDF5-Trajectory-Format
Notes: This is an extension of Amber NetCDF format. FF-focused topology storage (as JSON)
Chemical Markup Language (CML)
Type: XML
Human readable: yes
Self-describing: sort of - must adhere to a schema
Specification: http://www.xml-cml.org/
Flexible units: yes
Domain: small molecule modeling
Data stored: coordinates, molecular properties, topology w/ stereochemistry, calculation parameters, electronic wavefunctions, computational metadata (e.g., hostname, programVersion). No support for biomolecules or trajectories.
Notes: I like this project's aims, but there's a LOT of conceptual overhead for understanding XML schema. I don't think I've ever used software that supports CML.
PDBx / mmCIF
Type: CIF (text-based, see http://www.iucr.org/resources/cif/spec/version1.1/cifsyntax)
Self-describing: yes
Flexible units: no
Domain: Crystallography / NMR
Specification: http://mmcif.wwpdb.org/docs/tutorials/content/atomic-description.html
http://mmcif.wwpdb.org/docs/tutorials/content/molecular-entities.html
Notes: Vast improvement over original PDB. Medium-to-high conceptual overhead. Parsers are still hard to come by.
Data stored: everything you'd expect in a PDB file: topology + coordinates + crystallographic metadata.
Chemical JSON
Type: JSON (text-based)
Self-describing: yes
Notes: I think this is more of a proof-of-principle (implemented in Avogadro) than a mature spec, but interesting nonetheless. JSON is by far the easiest format here to read and write, both by machine and by hand.
Specification: http://wiki.openchemistry.org/Chemical_JSON
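To illustrate the "easy to read and write" point, here is a minimal sketch of a Chemical JSON-style record round-tripped with Python's standard json module. The key names (`"chemical json"`, `atoms.elements.number`, `atoms.coords.3d`, `bonds.connections.index`) reflect my understanding of the layout and should be checked against the specification above; the water geometry and the `atom_positions` helper are purely illustrative.

```python
import json

# Illustrative water molecule. Key names are an assumption based on my
# reading of Chemical JSON; verify them against the specification.
water = {
    "chemical json": 0,
    "name": "water",
    "atoms": {
        # Atomic numbers, one per atom
        "elements": {"number": [8, 1, 1]},
        # Flat list of x, y, z values (three per atom); made-up geometry
        "coords": {"3d": [0.0, 0.0, 0.1173,
                          0.0, 0.7572, -0.4692,
                          0.0, -0.7572, -0.4692]},
    },
    "bonds": {
        # Flat list of atom index pairs: (0, 1) and (0, 2)
        "connections": {"index": [0, 1, 0, 2]},
        "order": [1, 1],
    },
}

# Serialize to human-readable text, then parse it back
text = json.dumps(water, indent=2)
restored = json.loads(text)


def atom_positions(mol):
    """Hypothetical helper: pair each atomic number with its (x, y, z)."""
    numbers = mol["atoms"]["elements"]["number"]
    flat = mol["atoms"]["coords"]["3d"]
    return [(z, tuple(flat[3 * i:3 * i + 3])) for i, z in enumerate(numbers)]


for number, xyz in atom_positions(restored):
    print(number, xyz)
```

No schema tooling or third-party library is needed here, which is the main contrast with the XML and HDF5 entries above.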