Evaluate alternative formats #13

@avirshup

Speaking personally, the best outcome of this would be to find that someone has already solved the problems we're thinking about, or at least has a solution that can be extended to cover this project's specific application focuses (#1 and #10).

Below is a continuously updated list of other projects. Everything here should be taken with a grain of salt, as it's an attempt to glean information from many different specifications :)

MOSAIC
Formats: XML, HDF5 (with straightforward extensions based on the data model)
License: Creative Commons 3.0
Units: List of supported units in spec
Design criteria: https://mosaic-data-model.github.io/design_criteria.html
Data stored: topology, CG info, selections (i.e., subsets of the file's data), references to other "universes"; "properties" (unclear if these are whole-system properties or atomic properties?)
Specification: https://mosaic-data-model.github.io/

Rich molecule format
todo

H5MD
Type: Binary (HDF5)
Self-describing: yes
Domain: molecular dynamics
Flexible units: yes
Human readable: not without an HDF5 viewer
License: GPL (need to understand copyleft implications here)
Data stored: MD state data; atoms and their connectivity. Arbitrary atom lists/groups can be defined (nothing specific for chains/residues/etc)
Specification: http://nongnu.org/h5md/

Amber NetCDF
Type: Binary (NetCDF)
Self-describing: Yes
Human readable: not directly (GUI viewers available)
Flexible units: yes
Domain: biomolecular dynamics
Data stored: Trajectory (no topology)
Specification: http://ambermd.org/netcdf/nctraj.xhtml

MDTraj HDF5
Type: HDF5
Self-describing: Yes
Human readable: no
Data stored: Trajectory (+ topology as a JSON string)
Flexible units: yes
Domain: biomolecular dynamics
Specification: https://github.com/mdtraj/mdtraj/wiki/HDF5-Trajectory-Format
Notes: This is an extension of the Amber NetCDF format. Topology storage (as JSON) is force-field (FF) focused.

Chemical Markup Language (CML)
Type: XML
Human readable: yes
Self-describing: sort of - must adhere to a schema
Specification: http://www.xml-cml.org/
Flexible units: yes
Domain: small molecule modeling
Data stored: coordinates, molecular properties, topology w/ stereochemistry, calculation parameters, electronic wavefunctions, computational metadata (i.e. hostname, programVersion, etc.). No support for biomolecules or trajectories.
Notes: I like this project's aims, but there's a LOT of conceptual overhead in understanding the XML schema. I don't think I've ever used software that supports CML.
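
To give a feel for the overhead, here is a minimal sketch of reading atoms and bonds from a CML-style fragment with Python's standard library. The sample molecule is made up for illustration, and the element and attribute names (`atomArray`, `elementType`, `x3`, `atomRefs2`) are written from memory of the CML schema rather than validated against it, so treat them as assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the CML schema URL; adjust if the real file differs.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written illustrative fragment (not validated against the CML schema).
SAMPLE = """\
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""

root = ET.fromstring(SAMPLE)

# Map atom id -> (element symbol, x, y, z)
atoms = {
    a.get("id"): (
        a.get("elementType"),
        float(a.get("x3")),
        float(a.get("y3")),
        float(a.get("z3")),
    )
    for a in root.findall("cml:atomArray/cml:atom", CML_NS)
}

# Each bond is a pair of atom ids
bonds = [
    tuple(b.get("atomRefs2").split())
    for b in root.findall("cml:bondArray/cml:bond", CML_NS)
]
```

Even in this tiny case the reader has to know the namespace and the attribute naming scheme up front, which is the kind of overhead noted above.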

PDBx/mmCIF
Type: CIF (text-based, see http://www.iucr.org/resources/cif/spec/version1.1/cifsyntax)
Self-describing: yes
Flexible units: no
Domain: Crystallography / NMR
Specification: http://mmcif.wwpdb.org/docs/tutorials/content/atomic-description.html and http://mmcif.wwpdb.org/docs/tutorials/content/molecular-entities.html
Notes: Vast improvement over original PDB. Medium-to-high conceptual overhead. Parsers are still hard to come by.
Data stored: everything you'd expect in a PDB file: topology + coordinates + crystallographic metadata.
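
Since parsers are scarce, here is a rough sketch of how the `loop_` tables in a CIF-style file could be read with nothing but the standard library. The sample data and the `parse_loops` helper are hypothetical. It only handles single-line, whitespace-separated rows with shell-style quoting, and it ignores multi-line text fields, `save_` frames, and the other parts of the CIF grammar a real parser must cover.

```python
import shlex

# Hand-written illustrative data using a few atom_site tags.
SAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N N 11.104 6.134 -6.504
ATOM 2 C CA 11.639 6.071 -5.147
HETATM 3 O "O 1" 10.500 5.000 -4.000
#
"""


def parse_loops(text):
    """Return {category: [row dicts]} for each loop_ block in text."""
    tables = {}
    tags, rows, in_header = [], [], False

    def flush():
        # Store the loop collected so far, keyed by its category name.
        if tags:
            category = tags[0].split(".")[0].lstrip("_")
            names = [t.split(".", 1)[1] for t in tags]
            tables[category] = [dict(zip(names, r)) for r in rows]

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line == "loop_":
            flush()
            tags, rows, in_header = [], [], True
        elif line.startswith("_"):
            if in_header:
                tags.append(line)
            else:
                # A tag outside a loop header: close any open loop and skip it.
                flush()
                tags, rows = [], []
        elif tags:
            in_header = False
            rows.append(shlex.split(line))  # honours "quoted values"
    flush()
    return tables


atom_rows = parse_loops(SAMPLE)["atom_site"]
```

Each row comes back as a dict of strings (for example `atom_rows[0]["Cartn_x"]`), so numeric conversion is left to the caller.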

Chemical JSON
Type: JSON (text-based)
Self-describing: yes
Notes: I think this is more of a proof-of-principle (implemented in Avogadro) than a mature spec, but interesting nonetheless. JSON is by far the easiest language here to read and write, both by machine and by hand.
Specification: http://wiki.openchemistry.org/Chemical_JSON
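
As a small illustration of the read/write point, the sketch below round-trips a molecule through JSON using only the standard library. The key layout is an assumption loosely modeled on Chemical JSON (flat coordinate and bond-index arrays) and has not been checked against the spec; `atom_positions` is a hypothetical helper.

```python
import json

# Illustrative water molecule; the key names are assumptions, not the verified schema.
water = {
    "name": "water",
    "atoms": {
        "elements": {"number": [8, 1, 1]},
        "coords": {
            "3d": [0.0, 0.0, 0.117, 0.0, 0.757, -0.469, 0.0, -0.757, -0.469]
        },
    },
    "bonds": {
        "connections": {"index": [0, 1, 0, 2]},
        "order": [1, 1],
    },
}

# Writing and reading are one call each.
text = json.dumps(water, indent=2)
restored = json.loads(text)


def atom_positions(mol):
    """Group the flat coordinate list into one (x, y, z) tuple per atom."""
    flat = mol["atoms"]["coords"]["3d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


positions = atom_positions(restored)
```

The serialized `text` is also easy to inspect or edit by hand, which none of the binary formats above allow without a viewer.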
