OSD 2014 environmental data csv documentation
This page documents the syntactic structure and content of the OSD 2014 environmental data CSV file.
Note that this repository contains an R script which imports this data into an R session and performs some basic pre-processing to ready it for analysis. View the script here
This CSV is UTF-8 encoded and has a header row.
Fields are separated by the pipe symbol |.
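The file properties above (UTF-8, header row, pipe separator) can be sketched in Python as follows. This is only an illustration, not the R import script mentioned above; the function name `read_osd_csv` and the two-column sample are assumptions made for the example.

```python
import csv
import io


def read_osd_csv(handle):
    """Read a pipe-separated CSV with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="|")
    return list(reader)


# Hypothetical two-column sample; the real file has many more columns.
sample = io.StringIO("osd_id|water_depth\n76|0.5\n")
rows = read_osd_csv(sample)
```

For a file on disk, the handle would be opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the CSV was saved. All values arrive as strings and need converting before numeric use.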
For brevity, the values of this column contain only the number part of
the OSD sampling site identifier (OSD id) from the OSD Registry. To derive the
full OSD identifier, prefix each number with OSD, e.g. the number 70 becomes OSD70.
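A minimal Python sketch of that derivation; the helper name `full_osd_id` is hypothetical.

```python
def full_osd_id(number):
    """Prefix the numeric part of an OSD id with 'OSD'."""
    # Accept both int and str input; strip stray whitespace from CSV values.
    return f"OSD{str(number).strip()}"


example = full_osd_id(70)
```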
This is only a label designating the sample, not a stable identifier: if any of the underlying values changes, the label changes too!
The label is just the concatenation of values of different columns:
- OSD id (including OSD prefix) (see column osd_id)
- date of sampling (see column local_date)
- depth of sampling including unit meter (m) (see column water_depth)
- The short form label of the protocol used for filtering the sample (see column protocol)
Each value is separated by an underscore _.
Example: OSD76_2014-06-20_0.5m_NE08
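The concatenation rule can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that osd_id holds only the number part and that water_depth holds the bare number without its unit.

```python
def build_label(osd_id, local_date, water_depth, protocol):
    """Concatenate the four column values into a sample label."""
    parts = [f"OSD{osd_id}", local_date, f"{water_depth}m", protocol]
    return "_".join(str(part) for part in parts)


def split_label(label):
    """Split a label into its four parts; maxsplit keeps the protocol whole."""
    osd, local_date, depth, protocol = label.split("_", 3)
    return {
        "osd_id": osd[3:] if osd.startswith("OSD") else osd,
        "local_date": local_date,
        "water_depth": depth[:-1] if depth.endswith("m") else depth,
        "protocol": protocol,
    }


label = build_label("76", "2014-06-20", "0.5", "NE08")
parts = split_label(label)
```

Because the label is not a stable identifier, it is better suited for display than for joining tables.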
tbd.
The European Nucleotide Archive accession number of the archived sample data.
The accession number of the BioSample database as issued by ENA during sample data submission.
Please note: for unknown reasons not all ENA samples got a corresponding BioSample accession number.
All geographic coordinates of the actual site where an OSD sample was
collected are given in WGS 84 decimal degrees. start_lat and start_lon
refer to the location where the actual sampling started and stop_lat
and stop_lon refer to the location where the sampling
stopped. Therefore, stop_lat and stop_lon only differ from
start_lat and start_lon if sampling was done on a moving platform
(e.g. a research vessel) and the difference was recorded. In most
cases there is no difference.
The latitude of the sampling site.
The longitude of the sampling site.
The latitude of the sampling site.
The longitude of the sampling site.
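A small Python sketch for spotting samples taken from a moving platform, based on the start and stop coordinates described above. The function name `platform_moved` is hypothetical, and the sketch assumes all four coordinate fields are populated with decimal numbers.

```python
def platform_moved(row):
    """Return True if the recorded stop position differs from the start position."""
    # Assumes all four fields hold decimal degrees; float() raises ValueError otherwise.
    start = (float(row["start_lat"]), float(row["start_lon"]))
    stop = (float(row["stop_lat"]), float(row["stop_lon"]))
    return start != stop


# Made-up coordinates for illustration only.
fixed = {"start_lat": "54.18", "start_lon": "7.90", "stop_lat": "54.18", "stop_lon": "7.90"}
moving = {"start_lat": "54.18", "start_lon": "7.90", "stop_lat": "54.20", "stop_lon": "7.95"}
```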
This column would be better named sampling depth: it gives the depth of sampling
in the water column in meters (m). A value of 0 denotes surface water
without a precise depth measurement.
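Because 0 is a code rather than a measurement, it may be worth handling separately. A minimal Python sketch, with a hypothetical helper name:

```python
def describe_depth(water_depth):
    """Render a water_depth value, treating 0 as 'surface, depth not measured'."""
    depth = float(water_depth)
    if depth == 0:
        return "surface water (no precise depth measurement)"
    return f"{depth:g} m"


surface = describe_depth("0")
shallow = describe_depth("0.5")
```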
The date of sampling in year-month-day (YYYY-MM-DD) format (according to ISO 8601).
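Dates in this format can be parsed with the Python standard library. A minimal sketch; the helper name `parse_local_date` is hypothetical.

```python
from datetime import date


def parse_local_date(value):
    """Parse a YYYY-MM-DD string into a datetime.date."""
    return date.fromisoformat(value.strip())


sampled_on = parse_local_date("2014-06-20")
```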
The time of day at the sampling site when the actual sampling started.
The time of day at the sampling site when the actual sampling started.
The time of day, in the UTC/Greenwich time standard, when the actual sampling started.
The time of day, in the UTC/Greenwich time standard, when the actual sampling started.
Name of the site as given by the OSD participants.
The name of the IHO Sea Area assigned to the sampling site based on the data provided by Marine Regions.
Source data documentation: http://www.marineregions.org/sources.php#iho
Please note: Not all OSD sites have an IHO region assigned. In some cases they are actually on land or too far away from any marine coastline (e.g. inside fjords or rivers).
The Marine Regions Geographic IDentifier (MRGID) corresponding to the IHO Sea Area. See the iho_label column.
The abbreviated protocol name used for DNA filtration.
Please refer to the OSD Handbook for detailed documentation of the protocols.
The objective as stated by the people sampling.
tbd.
tbd.
tbd.
As measured at the time of sampling.
Please refer to the OSD Handbook for detailed documentation of these parameters and corresponding units.
As measured at the time of sampling.
Please refer to the OSD Handbook for detailed documentation of these parameters and corresponding units.
The textual representation of a biome term. All terms are taken from the Environment Ontology (ENVO) as of 2015-09-01; see its GitHub repository for technical details.
For brevity, the values of this column contain only the number part of
the ENVO identifier of the biome-related term in column biome. To derive the
full ENVO identifier, prefix the number with ENVO:, e.g. the number 00000447 becomes
ENVO:00000447. See the
ENVO Readme for more details.
One can also create ENVO term PURL references by prefixing the number with http://purl.obolibrary.org/obo/ENVO_ (e.g. http://purl.obolibrary.org/obo/ENVO_00000447).
The R script which imports and prepares this data for analysis (noted above, available here) expands the truncated identifiers into PURLs.
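For readers not using the R script, the two expansions described above can be sketched in Python. This is an illustration, not a port of that script; the function names are hypothetical, and the zero-padding to eight digits is an assumption meant to restore leading zeros lost if the column was read as a number.

```python
PURL_PREFIX = "http://purl.obolibrary.org/obo/ENVO_"


def _number_part(number):
    # Assumption: ENVO numbers are eight digits wide; re-pad if zeros were dropped.
    return str(number).strip().zfill(8)


def envo_id(number):
    """Expand the number part into a full ENVO identifier."""
    return f"ENVO:{_number_part(number)}"


def envo_purl(number):
    """Expand the number part into an ENVO term PURL."""
    return f"{PURL_PREFIX}{_number_part(number)}"


term = envo_id("00000447")
purl = envo_purl(447)
```

Reading the id columns as strings in the first place avoids the padding step altogether.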
Like the biome column.
Like the biome_id column.
Like the biome column.
Like the biome_id column.
As measured at the time of sampling.
Please refer to the OSD Handbook for detailed documentation of the following parameters and their corresponding units.
ph
phosphate
nitrate
carbon_organic_particulate
nitrite
carbon_organic_dissolved_doc
nano_microplankton
downward_par
conductivity
primary_production_isotope_uptake
primary_production_oxygen
dissolved_oxygen_concentration
nitrogen_organic_particulate_pon
meso_macroplankton
bacterial_production_isotope_uptake
nitrogen_organic_dissolved_don
ammonium
silicate
bacterial_production_respiration
turbidity
fluorescence
pigment_concentration
picoplankton_flow_cytometry