Data Formatting and Loading
Phinch 2.0 currently supports downstream analyses of Biological Observation Matrix (BIOM) files, a standardized file format used to represent diverse types of genomic data and other biological observations. The app supports both BIOM v1.0 (JSON-formatted) and BIOM v2.0 (HDF5-formatted) files, the standard outputs from the QIIME1 and QIIME2 pipelines, respectively. BIOM files most often hold environmental rRNA amplicon or shotgun metagenomic data, although any type of sample/observation data can be represented as .biom files (RNA-seq, gene variants, morphological character matrices, etc.). Any tab-delimited file containing biological data can be converted into a BIOM file using the QIIME pipeline or the standalone biom-format Python package; see below for file conversion instructions.
Note: All sample metadata and taxonomy/ontology information MUST be embedded in the BIOM file before being loaded into the Phinch App.
For users having trouble with QIIME 1.9 file conversion, please troubleshoot according to this GitHub thread: https://github.com/PitchInteractiveInc/Phinch/issues/46
If your BIOM 1.0 file is correctly formatted and still not working, try the suggestions in this thread (converting back to a classic OTU table and then back to BIOM with the metadata re-added): https://groups.google.com/forum/#!topic/phinch/rsIk8DCQ0VM
To prepare your files for visualization, follow these steps:
Sample metadata is defined as any descriptive information about your biological samples or the environment where they were collected; you should include any type of metadata that may be useful for interpreting and analyzing patterns in your data. Some common types of sample metadata include geographic coordinates (latitude/longitude), collection date, state/country, sampling matrix (water, air, soil, sediment), etc. Mapping files can contain as much or as little sample metadata as is useful or necessary. For example, sample metadata for a human microbiome study might also include information about patient sex, body site where samples were collected, or patient age. Mapping files should be prepared according to QIIME formatting conventions; we recommend the Keemei plugin for Google Sheets to check and validate your mapping file headers and values: https://keemei.qiime2.org/
To label your samples in Phinch and export graphics with human-readable IDs, include a column in your metadata mapping file with the header phinchID (these entries can be the same as or different from those in the first SampleID column). The phinchID values will be pulled through into the visualizations to populate graph axes. If this column is not included, an arbitrary numerical ID will be assigned to each sample. For optimal visualization, phinchIDs should be no longer than 15 characters.
An example sample mapping file might look like this:
| #SampleID | BarcodeSequence | LinkerPrimerSequence | CollectionDate | Material | phinchID | Description |
|---|---|---|---|---|---|---|
| 0.SandCoralPond1.1 | ACTGAAGT | TATGGTAATTGTGTGCCAGCMGCCGCGGTAA | 2012-11-30T11:00:00 | Sand | CP.Day0.Sand | aquarium |
| 0.WaterCoralPond1.1 | ACTGGGG | TATGGTAATTGTGTGCCAGCMGCCGCGGTAA | 2012-11-30T11:00:00 | Water | CP.Day0.Water | aquarium |
| 0.WipesCoralPond1.1 | ACTGAAAA | TATGGTAATTGTGTGCCAGCMGCCGCGGTAA | 2012-11-30T11:00:00 | Wipes | CP.Day0.Wipes | aquarium |
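A mapping file can be checked for the phinchID column before it is embedded in a BIOM file. The following is a minimal sketch, not part of Phinch or QIIME: the function name `check_phinch_ids` is hypothetical, and it assumes a tab-delimited mapping file whose first column holds the sample IDs, as in the example above.

```python
import csv
import io

MAX_PHINCH_ID_LENGTH = 15  # length limit recommended in the text above


def check_phinch_ids(mapping_text):
    """Return a list of problems found in the phinchID column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    if "phinchID" not in fieldnames:
        return ["no phinchID column: arbitrary numerical IDs will be assigned"]
    sample_column = fieldnames[0]  # assumed to be the sample ID column
    problems = []
    for row in reader:
        phinch_id = row["phinchID"] or ""
        if not phinch_id:
            problems.append(f"{row[sample_column]}: empty phinchID")
        elif len(phinch_id) > MAX_PHINCH_ID_LENGTH:
            problems.append(
                f"{row[sample_column]}: phinchID '{phinch_id}' is longer "
                f"than {MAX_PHINCH_ID_LENGTH} characters"
            )
    return problems


# Small in-memory mapping file; the second phinchID is deliberately too long.
example = "\n".join([
    "\t".join(["#SampleID", "Material", "phinchID", "Description"]),
    "\t".join(["0.SandCoralPond1.1", "Sand", "CP.Day0.Sand", "aquarium"]),
    "\t".join(["0.WaterCoralPond1.1", "Water", "CoralPond.Day0.Water", "aquarium"]),
])
for problem in check_phinch_ids(example):
    print(problem)
```

To check a real file, read it with `open(path, newline="")` and pass the text to the function.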
Some notes on metadata formatting:
To be properly detected, all date/time metadata must follow the MIxS standard format (more information at the Genomic Standards Consortium wiki) and be entered into one column in your original sample metadata mapping file, as follows:
[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]-[Z]
This date format lists the year, month, and day, followed by a 24-hour timestamp with a UTC offset (Z). The timestamp and UTC offset are both optional; metadata columns can include the date only. For example, metadata for a sample collected at 2:30pm EST on May 4, 2007 would be entered as: 2007-05-04T14:30:00-05:00
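Date values can be checked against this pattern before the mapping file is embedded. The sketch below is an illustration only, not Phinch's own detection logic: the name `is_valid_collection_date` is hypothetical, and it assumes the UTC offset is written as `+hh:mm` or `-hh:mm`.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}"        # date: YYYY-MM-DD
    r"(T\d{2}:\d{2}:\d{2}"      # optional timestamp: Thh:mm:ss
    r"([+-]\d{2}:\d{2})?)?"     # optional UTC offset (assumed +hh:mm / -hh:mm)
)


def is_valid_collection_date(value):
    """Return True if value has the expected shape and is a real calendar date."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    try:
        # Rejects impossible values such as month 13 or hour 25.
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


for candidate in ["2012-11-30", "2012-11-30T11:00:00",
                  "2012-11-30T11:00:00-05:00", "11/30/2012"]:
    print(candidate, is_valid_collection_date(candidate))
```

The regular expression enforces the layout, while `datetime.fromisoformat` catches values that have the right layout but are not valid dates or times.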
Similarly, any geographic coordinates or GPS data must be entered as decimal degrees (the format used by Google Maps, e.g. -90.017926). We recommend using separate columns labeled “Latitude” and “Longitude” in your original sample metadata mapping file, to ensure that GPS metadata is correctly detected.
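Coordinates recorded as degrees, minutes, and seconds need converting before they go into the mapping file. The sketch below shows the standard conversion and a simple range check; both function names are hypothetical, and the range limits (90 for latitude, 180 for longitude) are the usual geographic bounds rather than anything specific to Phinch.

```python
def is_decimal_degrees(value, limit):
    """Return True if value parses as a number within [-limit, limit]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    return -limit <= number <= limit


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative in decimal degrees.
    return -decimal if hemisphere.upper() in ("S", "W") else decimal


# Use limit=90 for a Latitude column and limit=180 for a Longitude column.
print(is_decimal_degrees("-90.017926", 180))
print(is_decimal_degrees("29°57'N", 90))
print(dms_to_decimal(29, 57, 0, "N"))
```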
For units of measurement: include a space between the measurement value and the unit, e.g. 2421 m instead of 2421 or 2421m.
For columns or fields with missing metadata: Enter 'no_data' if there is no measurement available for a given sample (for example, columns listing temperature, pH, or other chemical measurements where a given value was not recorded for some samples). We do not recommend leaving a blank space, since this may lead to improper importing and formatting of metadata values.
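These two conventions can be applied automatically to measurement columns. The following is a rough sketch under stated assumptions: `clean_measurement` is a hypothetical helper, and it should only be run on columns known to hold measurements, since it would also split identifiers that start with digits (for example, a value such as 16S).

```python
import re

# A number immediately followed by a unit, with no space in between.
UNIT_WITHOUT_SPACE = re.compile(r"(-?\d+(?:\.\d+)?)([A-Za-z°%]+)")


def clean_measurement(value):
    """Fill blank cells with 'no_data' and separate a value from its unit."""
    value = value.strip()
    if not value:
        return "no_data"
    match = UNIT_WITHOUT_SPACE.fullmatch(value)
    if match:
        return f"{match.group(1)} {match.group(2)}"
    return value


for raw in ["2421m", "2421 m", "", "7.2"]:
    print(repr(raw), "->", repr(clean_measurement(raw)))
```

A bare number without a unit is left unchanged, because the script cannot know which unit was intended; those cells still need to be fixed by hand.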
BIOM files are now the default output for metabarcoding (rRNA amplicons and other marker loci) and the analysis of shotgun metagenome data in the QIIME software package. Users wanting to visualize other data types, or convert files prepared outside the QIIME pipeline, should follow these file conversion instructions.
Follow the commands and instructions listed on the BIOM website to prepare your files or convert from other formats: http://biom-format.org/documentation/adding_metadata.html
The basic command to add sample metadata to a BIOM table in QIIME2 is as follows:
biom add-metadata -i otu_table_w_tax.biom -o table_with_sample_metadata.biom -m sample_metadata.txt
Note: You MUST have taxonomy or ontology information ("observation metadata") embedded in your BIOM table in addition to sample metadata. If you do not already have this information embedded in the BIOM file you can add it along with sample metadata using the following command:
biom add-metadata -i otu_table.biom -o table_with_metadata.biom --observation-metadata-fp taxonomy_mapping.txt --sample-metadata-fp sample_metadata.txt
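For BIOM v1.0 (JSON) files, it is possible to confirm that both kinds of metadata were embedded by inspecting the file directly. In the BIOM v1.0 layout, `rows` lists the observations and `columns` lists the samples, each with an `id` and a `metadata` field. The sketch below is an informal pre-check, not the validation that Phinch performs; the function name `find_missing_metadata` is hypothetical, and the approach does not apply to HDF5-formatted BIOM v2.0 files.

```python
import json


def find_missing_metadata(biom_json_text):
    """List the IDs of observations (rows) and samples (columns) without metadata."""
    table = json.loads(biom_json_text)
    report = {}
    for axis in ("rows", "columns"):
        report[axis] = [
            entry["id"]
            for entry in table.get(axis, [])
            if not entry.get("metadata")  # null or empty metadata
        ]
    return report


# Tiny illustrative table fragment: OTU_2 has no taxonomy embedded.
example = json.dumps({
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
        {"id": "OTU_2", "metadata": None},
    ],
    "columns": [
        {"id": "0.SandCoralPond1.1", "metadata": {"phinchID": "CP.Day0.Sand"}},
    ],
})
print(find_missing_metadata(example))
```

Empty lists for both `rows` and `columns` suggest that observation and sample metadata are present for every entry.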
In QIIME1 (version 1.7 or later), users can prepare a .biom file for visualization by executing the following commands.
First, construct an OTU table:
make_otu_table.py -i final_otu_map_mc2.txt -o otu_table_mc2_w_tax.biom -t rep_set_tax_assignments.txt
Here, the input file (-i) is your OTU map (defining clusters of raw sequence reads), and the taxonomy file (-t) contains the taxonomy or gene ontology strings that correspond to each OTU.
Second, add your sample metadata to your .biom file.
All sample metadata and taxonomy/ontology information MUST be embedded in the .biom file before being uploaded into Phinch.
In QIIME version 1.8 and above this can be done using the following command:
biom add-metadata -i otu_table_mc2_w_tax.biom -o otu_table_mc2_w_tax_and_metadata.biom -m sample_metadata_mapping_file.txt
In QIIME version 1.7 or below, you can add metadata with the following command:
add_metadata.py -i otu_table_mc2_w_tax.biom -o otu_table_mc2_w_tax_and_metadata.biom -m sample_metadata_mapping_file.txt
Here, the input file (-i) is the .biom file from the previous step, and the mapping file (-m) is the tab-delimited mapping file that you prepared in Step 1 (formatted according to QIIME instructions).
Step 3: Download the Phinch 2.0 app and load in your BIOM file with embedded observation/sample data.
The Phinch 2.0 app will check that your file and its associated metadata are formatted correctly, and if everything looks OK you will see a message saying "File Validates: YES" and the app will then allow you to proceed forward to the filter data and visualization screens.
If you want to visualize biological data currently formatted as a tab-delimited text file (e.g. the style of OTU tables produced by older versions of QIIME, or any other type of genomic/morphological data that can be represented in matrix format), please refer to the BIOM documentation for conversion instructions. Phinch will accept BIOM v1.0 files (plain text JSON format) as well as BIOM v2.0 files (binary HDF5 format). Full documentation for the BIOM file format can be found at http://biom-format.org
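If it is unclear which BIOM version a file uses, the first few bytes usually settle it: HDF5 files begin with a fixed 8-byte signature, while JSON text begins with an opening brace. The following sketch relies on that observation; `guess_biom_format` is a hypothetical helper, and it assumes the HDF5 signature sits at the very start of the file (the HDF5 format also allows it at later offsets when a user block is present).

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # fixed signature at the start of HDF5 files


def guess_biom_format(first_bytes):
    """Guess the BIOM flavour from the leading bytes of a file."""
    if first_bytes.startswith(HDF5_SIGNATURE):
        return "BIOM v2.0 (HDF5)"
    if first_bytes.lstrip().startswith(b"{"):
        return "BIOM v1.0 (JSON)"
    return "unknown"


def guess_biom_format_of_file(path):
    """Read the first bytes of a file on disk and guess its format."""
    with open(path, "rb") as handle:
        return guess_biom_format(handle.read(64))


print(guess_biom_format(b'{"id": null, "format": "Biological Observation Matrix 1.0.0"}'))
print(guess_biom_format(HDF5_SIGNATURE + b"\x00" * 8))
```

A result of "unknown" often means the file is still a tab-delimited table that needs converting first.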