Description
netCDF allows for a lot more information than exists in exchange files. With the CCHDO documentation metadata extraction project under way, we will eventually need a place to put that metadata. For a while now, I have wanted to store the information contained in the "Bob Headers" in a more structured way. The following ACDD attributes, when pushed down from the global to the variable level, should enable the creation of "Bob Headers": processing_level, comment, creator_name, project, date_modified, date_metadata_modified. Each of these is examined further below.
processing_level
In ACDD the processing level is a freeform string. We should use this to indicate the following statuses, which very roughly correspond to the satellite community's L0 through L4 processing levels:
- collected - water was taken but the data have not been received
- raw - used for CTD but not discrete
- preliminary - data in the file that may not have had final calibration applied
- final - data for which no further updates are expected
- product - we probably won't use this, but included since that is what L4 tends to be
We should look for an existing controlled vocabulary that covers these statuses.
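Until an external vocabulary is found, the five statuses above could be enforced locally. The following is a minimal sketch, not an existing CCHDO API: the ProcessingLevel class and validate_processing_level function are hypothetical names, and the string values are simply the ones proposed in the list above.

```python
from enum import Enum


class ProcessingLevel(str, Enum):
    """Proposed processing_level values, ordered roughly from L0 to L4."""

    COLLECTED = "collected"
    RAW = "raw"
    PRELIMINARY = "preliminary"
    FINAL = "final"
    PRODUCT = "product"


def validate_processing_level(value: str) -> ProcessingLevel:
    """Return the matching level, or raise ValueError for an unknown string."""
    try:
        return ProcessingLevel(value)
    except ValueError:
        allowed = ", ".join(level.value for level in ProcessingLevel)
        raise ValueError(
            f"unknown processing_level {value!r}; expected one of: {allowed}"
        ) from None
```

Because ACDD itself leaves processing_level freeform, a check like this would be a local convention layered on top of the standard.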
comment
Free text notes; these are usually very short for each parameter. This is the "notes" part of the Bob Headers.
creator_name
This is the PI for the parameter in question. We should use an array of strings for multiple PIs in our at-rest data files. This is the "who" part of the Bob Headers. There is also a creator_url attribute that we might consider storing ORCiDs in.
project
We need a way to tie multiple variables with the same PI/status together, e.g. nutrients are usually 3 to 5 variables. In the ACDD docs, a program (GO-SHIP) is made up of multiple projects (Total Carbon, pH, Nutrients, CTD, etc.). Variables that have the same project value would be grouped into the "includes" list in the Bob Headers; the comment and creator_name would need to be the same to avoid ambiguity.
There probably is not a single controlled vocabulary for these project names; they would also likely benefit from some coordination with GO-SHIP.
date_modified
If the data itself is changed, this would be updated to the date it was changed in the data file. The merge_fq accessor already updates this.
date_metadata_modified
If only the metadata were modified, this attribute would be updated to the date the change was done. The merge_fq accessor already updates this if the print format is different.
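To illustrate how the attributes examined above might combine, here is a rough sketch of the grouping described under project. It models variable attributes as plain dicts rather than reading a real netCDF file. The variable names, PI names, the bob_header_groups function, and the who/notes/includes output keys are all hypothetical, and creator_name is assumed to always be a list of strings.

```python
# Hypothetical variable-level attributes, standing in for a netCDF file.
variables = {
    "nitrate": {"project": "Nutrients", "creator_name": ["A. Example"], "comment": "final cal"},
    "phosphate": {"project": "Nutrients", "creator_name": ["A. Example"], "comment": "final cal"},
    "silicate": {"project": "Nutrients", "creator_name": ["A. Example"], "comment": "final cal"},
    "ph_total": {"project": "pH", "creator_name": ["B. Example", "C. Example"], "comment": ""},
}


def bob_header_groups(variables):
    """Group variables by project into who/notes/includes records."""
    groups = {}
    for name, attrs in variables.items():
        project = attrs.get("project")
        if project is None:
            continue  # variable is not assigned to any project
        who = tuple(attrs.get("creator_name", ()))
        notes = attrs.get("comment", "")
        group = groups.setdefault(project, {"who": who, "notes": notes, "includes": []})
        # Variables in one project must agree on creator_name and comment,
        # otherwise the grouped header would be ambiguous.
        if (group["who"], group["notes"]) != (who, notes):
            raise ValueError(f"{name}: creator_name/comment differ within project {project!r}")
        group["includes"].append(name)
    return groups


groups = bob_header_groups(variables)
```

Raising on a mismatch is one possible policy; splitting the project into several groups would be another.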
The only non-standard ACDD usages above are placing these attributes at the variable level rather than the global level, and the possible use of arrays of strings. We could define combining rules to put all this information into global attributes that fully conform to ACDD, but this would likely be one-way (update the globals from the variables, not the other way around). For example, the global date_modified would be set to the most recent date seen across all the variables that also have date_modified.
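The date_modified combining rule could look something like the sketch below. It assumes the timestamps are stored as ISO 8601 strings without a timezone suffix and again uses plain dicts in place of a real file; latest_attr and the sample values are hypothetical.

```python
from datetime import datetime

# Hypothetical variable-level attributes; one variable has no date_modified.
variables = {
    "ctd_salinity": {"date_modified": "2023-06-01T00:00:00"},
    "nitrate": {"date_modified": "2024-01-15T12:00:00"},
    "cfc_11": {},
}


def latest_attr(variables, attr):
    """Return the most recent value of a date attribute, or None if unset everywhere."""
    stamps = [
        datetime.fromisoformat(attrs[attr])
        for attrs in variables.values()
        if attr in attrs
    ]
    return max(stamps).isoformat() if stamps else None


# One-way update: globals are derived from the variables, never the reverse.
global_attrs = {"date_modified": latest_attr(variables, "date_modified")}
```

The same helper would work for date_metadata_modified; other attributes such as creator_name would need their own rule (for example, a union of names).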
Things this might make possible:
- Getting a list of updated files since (or even between/before) a certain date could be done at a per-variable level by examining the date_modified attribute. We can even exclude simple metadata updates that didn't change the values used in science.
- Find all the preliminary data or exclude preliminary data from a result set.
- Know who has not turned in their data yet by examining the processing_level attribute for "collected" and the creator_name attribute. This can also be done for bottle data with flag 1.
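The three queries above can be sketched against the same kind of per-variable attribute dicts. Everything here is illustrative: the helper names and the sample data are hypothetical, and dates are assumed to be ISO 8601 strings without a timezone suffix.

```python
from datetime import datetime

# Hypothetical variable-level attributes for a single file.
variables = {
    "ctd_salinity": {
        "processing_level": "final",
        "creator_name": ["A. Example"],
        "date_modified": "2023-06-01T00:00:00",
        "date_metadata_modified": "2024-03-01T00:00:00",
    },
    "nitrate": {
        "processing_level": "preliminary",
        "creator_name": ["B. Example"],
        "date_modified": "2024-02-10T00:00:00",
    },
    "cfc_11": {"processing_level": "collected", "creator_name": ["C. Example"]},
}


def modified_since(variables, since, attr="date_modified"):
    """Names of variables whose `attr` date is on or after `since`.

    Using date_modified (the default) ignores metadata-only updates.
    """
    cutoff = datetime.fromisoformat(since)
    return [
        name
        for name, attrs in variables.items()
        if attr in attrs and datetime.fromisoformat(attrs[attr]) >= cutoff
    ]


def with_processing_level(variables, level):
    """Names of variables whose processing_level equals `level`."""
    return [n for n, a in variables.items() if a.get("processing_level") == level]


def outstanding_creators(variables):
    """Map each still-"collected" variable to the PIs who owe its data."""
    return {
        n: a.get("creator_name", [])
        for n, a in variables.items()
        if a.get("processing_level") == "collected"
    }
```

In this sample, ctd_salinity had a metadata-only change in 2024, so it shows up when querying date_metadata_modified but not date_modified.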