OSD 2014 environmental data csv documentation

renzo edited this page Dec 4, 2015 · 10 revisions

Overview

This page documents the syntactic structure and content of the OSD 2014 environmental data CSV file.

Note that this repository contains an R script which imports this data into an R session and performs some basic pre-processing to ready it for analysis. View the script here

Syntax

This CSV is UTF-8 encoded and has a header row.

Fields are separated by the pipe symbol |.
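As a minimal sketch of reading a file with this syntax, Python's standard csv module can be given the pipe as delimiter. The single sample row below is reconstructed from the label example on this page, and the column order is an assumption made for illustration. A real file would be opened with open(path, encoding="utf-8", newline="") instead of the in-memory string.

```python
import csv
import io

# Illustrative excerpt only: the column subset and order are assumptions,
# the values are reconstructed from the label example on this page.
SAMPLE = (
    "osd_id|label|local_date|water_depth|protocol\n"
    "76|OSD76_2014-06-20_0.5m_NE08|2014-06-20|0.5|NE08\n"
)

def read_rows(handle):
    """Return the rows of a pipe-delimited CSV with a header row as dicts."""
    return list(csv.DictReader(handle, delimiter="|"))

rows = read_rows(io.StringIO(SAMPLE))
print(rows[0]["label"])
```

Every value is read as a string, which keeps identifiers with leading zeros intact.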

Field descriptions

Identifiers and labels

osd_id

For brevity, the values of this column contain only the numeric part of the OSD sampling site identifier (OSD id) from the OSD Registry. To derive the full OSD identifier, prefix each number with OSD, e.g. the number 70 becomes OSD70.
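The prefixing rule can be sketched as a one-line helper. The function name full_osd_id is hypothetical, and the sketch assumes that every osd_id value is a plain integer.

```python
def full_osd_id(number):
    """Prefix the numeric osd_id value with 'OSD' (assumes an integer value)."""
    return f"OSD{int(number)}"

print(full_osd_id("70"))
```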

label

This is only a label that designates the sample. It is not a stable identifier, because the label changes whenever one of its underlying values changes.

The label is the concatenation of the values of the following columns:

  1. OSD id (including OSD prefix) (see column osd_id)
  2. date of sampling (see column local_date)
  3. depth of sampling, including the unit meter (m) (see column water_depth)
  4. short-form label of the protocol used for filtering the sample (see column protocol)

The values are separated by an underscore _.

Example: OSD76_2014-06-20_0.5m_NE08
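The concatenation rule can be sketched as follows. The helper names build_label and split_label are hypothetical. The sketch assumes that the depth appears in the label exactly as written in water_depth, and that only the protocol part could ever contain an underscore.

```python
def build_label(osd_id, local_date, water_depth, protocol):
    """Join the four components with underscores, adding the OSD prefix and unit m."""
    parts = [f"OSD{osd_id}", local_date, f"{water_depth}m", protocol]
    return "_".join(parts)

def split_label(label):
    """Split a label into its four components (at most three splits)."""
    osd, date, depth, protocol = label.split("_", 3)
    return {"osd": osd, "local_date": date, "water_depth": depth, "protocol": protocol}

label = build_label("76", "2014-06-20", "0.5", "NE08")
print(label)
print(split_label(label))
```

Because the label is derived from other columns, it is safer to join tables on osd_id or the accession numbers than on label.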

bioarchive_code

tbd.

ena_acc

The European Nucleotide Archive accession number of the archived sample data.

biosample_acc

The accession number of the BioSample database as issued by ENA during sample data submission.

Please note: for unknown reasons, not all ENA samples received a corresponding BioSample accession number.

Geographic localization and time

All geographic coordinates of the site where an OSD sample was collected are given in WGS 84 decimal degrees. start_lat and start_lon give the location where sampling started; stop_lat and stop_lon give the location where sampling stopped. stop_lat and stop_lon therefore differ from start_lat and start_lon only if sampling was done on a moving platform (e.g. a research vessel) and the difference was recorded. In most cases there is no difference.
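To find the samples taken from a moving platform, the start and stop positions can be compared. This is a sketch with made-up coordinates. The function name platform_moved is hypothetical, and the sketch assumes that all four coordinate fields are populated with numeric text; how missing values are encoded is not documented here.

```python
def platform_moved(row):
    """True when the recorded stop position differs from the start position."""
    start = (float(row["start_lat"]), float(row["start_lon"]))
    stop = (float(row["stop_lat"]), float(row["stop_lon"]))
    return start != stop

# Made-up coordinates for illustration only.
fixed = {"start_lat": "54.18", "start_lon": "7.9", "stop_lat": "54.18", "stop_lon": "7.9"}
moving = dict(fixed, stop_lat="54.19")

print(platform_moved(fixed), platform_moved(moving))
```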

start_lat

The latitude of the location where sampling started.

start_lon

The longitude of the location where sampling started.

stop_lat

The latitude of the location where sampling stopped.

stop_lon

The longitude of the location where sampling stopped.

water_depth

This column would be better named sampling depth: it gives the depth of sampling in the water column in meters (m). A value of 0 codes for surface water without a precise depth measurement.

local_date

The date of sampling in year-month-day (YYYY-MM-DD) format, following ISO 8601.

local_start

The time of day at the sampling site when actual sampling started.

local_end

The time of day at the sampling site when actual sampling ended.

start_date_time_utc

The time of day, coded in the UTC/Greenwich time standard, when actual sampling started.

end_date_time_utc

The time of day, coded in the UTC/Greenwich time standard, when actual sampling ended.

site_name

Name of the site as given by the OSD participants.

iho_label

The name of the IHO Sea Area assigned to the sampling site, based on the data provided by Marine Regions.

Source data documentation: http://www.marineregions.org/sources.php#iho

Please note: not all OSD sites have an IHO region assigned. In some cases they are actually on land or too far away from any marine coastline (e.g. inside fjords or rivers).

mrgid

The Marine Regions Geographic IDentifier corresponding to the IHO Sea Area. See the iho_label column.

Sampling data

protocol

The abbreviated protocol name used for DNA filtration.

Please refer to the OSD Handbook for detailed documentation of the protocols.

objective

The objective as stated by the people who took the sample.

platform

tbd.

device

tbd.

description

tbd.

Mandatory environmental data

water_temperature

As measured at the time of sampling.

Please refer to the OSD Handbook for detailed documentation of these parameters and corresponding units.

salinity

As measured at the time of sampling.

Please refer to the OSD Handbook for detailed documentation of these parameters and corresponding units.

biome

The textual representation of a biome term. All terms are taken from the Environment Ontology (ENVO) as of 2015-09-01; see their GitHub repository for technical details.

biome_id

For brevity, the values of this column contain only the numeric part of the ENVO identifier of the biome-related term in column biome. To derive the full ENVO identifier, prefix the number with ENVO: so that, e.g., the number 00000447 becomes ENVO:00000447. See the ENVO Readme for more details.

One can also create ENVO term PURL references by prefixing the number with http://purl.obolibrary.org/obo/ENVO_, e.g. http://purl.obolibrary.org/obo/ENVO_00000447

The R script that imports and prepares this data for analysis (noted above, available here) expands the truncated identifiers into PURLs.
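The two expansions can be sketched in Python as below. This is an illustration of the prefixing rules only, not the code of the R script, and the function names envo_curie and envo_purl are hypothetical.

```python
ENVO_PURL_PREFIX = "http://purl.obolibrary.org/obo/ENVO_"

def envo_curie(number):
    """Expand the numeric part into the full ENVO identifier, e.g. ENVO:00000447."""
    return f"ENVO:{number}"

def envo_purl(number):
    """Expand the numeric part into an ENVO term PURL."""
    return f"{ENVO_PURL_PREFIX}{number}"

# The value must stay a string so that the leading zeros are preserved.
print(envo_curie("00000447"))
print(envo_purl("00000447"))
```

Note that a CSV reader which converts this column to integers would strip the leading zeros, so the *_id columns should be read as text.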

feature

Like the biome column, but for the feature term.

feature_id

Like the biome_id column, but for the term in column feature.

material

Like the biome column, but for the material term.

material_id

Like the biome_id column, but for the term in column material.

Optional environmental data

As measured at the time of sampling.

Please refer to the OSD Handbook for detailed documentation of the following parameters and their corresponding units.

ph

phosphate

nitrate

carbon_organic_particulate

nitrite

carbon_organic_dissolved_doc

nano_microplankton

downward_par

conductivity

primary_production_isotope_uptake

primary_production_oxygen

dissolved_oxygen_concentration

nitrogen_organic_particulate_pon

meso_macroplankton

bacterial_production_isotope_uptake

nitrogen_organic_dissolved_don

ammonium

silicate

bacterial_production_respiration

turbidity

fluorescence

pigment_concentration

picoplankton_flow_cytometry