07. Subsetting Data
GenePiper provides two approaches to subset data.
First, most of the analysis modules contain a filter panel for subsetting the data ad hoc. Second, the Filter Phyloseq module subsets the data and saves the result separately under a new data label. Both approaches use the same data filter panel.
The filter panel provides two ways to subset samples. The simpler one is the Filter By Column method, in which the user chooses one column of the sample data table as the variable for the subsetting expression. The other is the Command Line method, which is more flexible. It is based on the subset_samples function of the phyloseq package, and the user has to enter a valid subsetting expression. Filtering options for taxa are also available. Multiple filters may be stacked and are applied from top to bottom. A summary of the subsetted phyloseq-class object is then shown.


The Filter Phyloseq module subsets the data and stores the result under a new data label.
First, load data in the Load Data panel.

In the Filter panel, press the + button to add filter options.

There are two categories of filters, Sample and Taxa:
- Samples can be filtered by any Column in the sample data table. Different filtering options are provided depending on the data type of the selected column (variable).

- For numeric variables, an operator and a numeric value can be set to filter the samples.

- Basic operators provided for numeric variables are:
  - "==" equal to
  - ">" greater than
  - ">=" greater than or equal to
  - "<" less than
  - "<=" less than or equal to
- For character variables, choose the values from the checkboxes.

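- More sophisticated filtering of samples can be achieved with the Formula option, where the user enters a valid subsetting expression, just as when using the subset_samples function from the phyloseq package. For example, with hypothetical columns pH and Site, the expression pH > 7 & Site == "A" keeps only the samples from site A with a pH above 7.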

To learn more about valid subsetting expressions, see the documentation of the subset_samples function in the phyloseq package and of the base R subset function that it wraps.
- Taxa can be filtered by Count, where the user defines a prevalence cutoff (the fraction of all samples in which a taxon is observed) and an abundance cutoff.

- Taxa can be filtered by Rank, where the user selects a taxonomic rank and chooses the taxa from the checkboxes.
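
To make the Count filter concrete, here is a minimal Python sketch of one plausible reading of the prevalence and abundance cutoffs. It is not GenePiper's own code (GenePiper builds on the R package phyloseq). The function name, the dictionary layout of the count table, and the rule "a taxon counts as observed in a sample when its count is above the abundance cutoff" are all assumptions made for illustration.

```python
def filter_taxa_by_count(otu_table, min_prevalence, min_abundance=0):
    """Keep taxa observed in at least `min_prevalence` of the samples.

    otu_table: dict mapping taxon name -> list of per-sample counts
               (hypothetical layout, one list entry per sample).
    A taxon is treated as observed in a sample when its count is
    strictly greater than `min_abundance` (an assumption of this sketch).
    """
    kept = {}
    for taxon, counts in otu_table.items():
        if not counts:
            continue  # no samples, so prevalence is undefined; drop the taxon
        n_observed = sum(1 for c in counts if c > min_abundance)
        prevalence = n_observed / len(counts)
        if prevalence >= min_prevalence:
            kept[taxon] = counts
    return kept


# Toy count table with four samples per taxon (made-up numbers).
otu_table = {
    "taxon_A": [10, 0, 5, 7],  # observed in 3 of 4 samples
    "taxon_B": [0, 0, 1, 0],   # observed in 1 of 4 samples
    "taxon_C": [3, 2, 0, 0],   # observed in 2 of 4 samples
}

filtered = filter_taxa_by_count(otu_table, min_prevalence=0.5)
print(sorted(filtered))  # ['taxon_A', 'taxon_C']
```

Raising the abundance cutoff makes the filter stricter: with min_abundance=2 only taxon_A still reaches a prevalence of 0.5 in this toy table.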

Multiple filters may be stacked. Any filter may be removed by pressing the - button. After setting up the filters, click the Filter button to start subsetting the data. The filters will be processed from top to bottom. The summary of the resultant dataset will be shown.
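
The top-to-bottom processing of stacked filters can be sketched in Python as follows. This only illustrates the logic and is not how GenePiper is implemented. The helper names, the sample records, and the column names pH and site are hypothetical; the five operators are the ones listed for numeric variables.

```python
import operator

# The five numeric operators offered for numeric variables.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def numeric_filter(column, op, value):
    """Build a filter that keeps samples where `column op value` holds."""
    compare = OPERATORS[op]
    return lambda samples: [s for s in samples if compare(s[column], value)]


def category_filter(column, allowed):
    """Build a filter that keeps samples whose `column` is a ticked value."""
    allowed = set(allowed)
    return lambda samples: [s for s in samples if s[column] in allowed]


def apply_filters(samples, filters):
    """Apply the filters in order; each one receives the previous output."""
    for f in filters:
        samples = f(samples)
    return samples


# Hypothetical sample data table with one numeric and one character column.
samples = [
    {"id": "S1", "pH": 6.5, "site": "river"},
    {"id": "S2", "pH": 5.2, "site": "river"},
    {"id": "S3", "pH": 7.1, "site": "lake"},
    {"id": "S4", "pH": 6.8, "site": "soil"},
]

stack = [
    numeric_filter("pH", ">=", 6.0),             # first (top) filter
    category_filter("site", ["river", "lake"]),  # second filter
]
result = apply_filters(samples, stack)
print([s["id"] for s in result])  # ['S1', 'S3']
```

Because each filter only sees the samples that survived the filters above it, removing a filter from the stack can only keep the same samples or bring some back.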
The filtered dataset can be saved in any existing Project. The user may also create a new project by clicking the New Project button.

To save the subsetted dataset, a unique data label should be provided.

The subsetted dataset can be downloaded by clicking the Download button.
