
How to analyse pooled data

Caitlin Cherryh edited this page Nov 5, 2024 · 2 revisions

This how-to guide shows you how to analyse pooled data with the different models available in PoolTools. You must ensure that your dataset is formatted correctly. See How to prepare your data for analysis for the requirements.

First, upload your spreadsheet through Upload data and select the columns for Results and Pool size. An incorrectly formatted results or pool size column will return an error. Once the formatting and column selection are correct, you can apply different models to the same uploaded data.
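Before uploading, it can help to check the two columns yourself. The following is a minimal sketch, not PoolTools code. It assumes that results are coded 1 (positive) or 0 (negative) and that pool sizes are positive whole numbers; the authoritative requirements are in How to prepare your data for analysis. The column names Result and NumInPool are hypothetical.

```python
def check_pool_columns(rows, result_col, size_col):
    """Return a list of problems; an empty list means the columns look usable."""
    problems = []
    for i, row in enumerate(rows, start=1):
        result = str(row.get(result_col, "")).strip()
        size = str(row.get(size_col, "")).strip()
        # Assumed convention: 1 = positive pool, 0 = negative pool.
        if result not in {"0", "1"}:
            problems.append(f"row {i}: result {result!r} is not 0 or 1")
        # Assumed convention: pool size is a whole number of at least 1.
        if not size.isdecimal() or int(size) < 1:
            problems.append(f"row {i}: pool size {size!r} is not a positive integer")
    return problems


# Hypothetical example data: the third row is wrong in both columns.
rows = [
    {"Result": "1", "NumInPool": "10"},
    {"Result": "0", "NumInPool": "10"},
    {"Result": "pos", "NumInPool": "0"},
]
for problem in check_pool_columns(rows, "Result", "NumInPool"):
    print(problem)
```

To check a real spreadsheet, export it as a .csv file and pass the rows from Python's csv.DictReader to the same function.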

Prevalence can be estimated either:

  • On the whole dataset, or
  • Separately for selected variables

In addition, a hierarchical (cluster) sampling model can be applied to either of the above options to obtain more accurate uncertainty estimates.

Once completed, results can be downloaded as a .csv file for further analysis.

How to estimate prevalence on the whole dataset

The simplest use case is to estimate marker prevalence on the whole dataset. This option produces a single prevalence estimate and ignores any underlying subgroups or hierarchical sampling structure in the data. After uploading your data and selecting the columns:

  • Deselect "Stratify data?"
  • Click "Estimate prevalence"
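To give a sense of what a single whole-dataset estimate involves, here is a minimal sketch of a maximum-likelihood prevalence estimate from pooled results. It is an illustration only and not necessarily how PoolTools computes its estimates. It assumes a perfect test, so a pool of size s is positive with probability 1 - (1 - p)^s; the function name is hypothetical.

```python
def pooled_prevalence_mle(results, pool_sizes, tol=1e-10):
    """Maximum-likelihood prevalence from pooled tests, assuming a perfect test.

    results: 1 for a positive pool, 0 for a negative pool.
    pool_sizes: number of individuals in each pool.
    """
    if not any(results):
        return 0.0  # every pool negative: the likelihood is maximised at p = 0
    if all(results):
        return 1.0  # every pool positive: the likelihood is maximised at p = 1

    def score(p):
        # Derivative of the log-likelihood with respect to p.
        q = 1.0 - p
        total = 0.0
        for y, s in zip(results, pool_sizes):
            if y:
                total += s * q ** (s - 1) / (1.0 - q ** s)
            else:
                total -= s / q
        return total

    # The log-likelihood is concave in p, so the score crosses zero exactly once.
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical data: 10 pools of 5 individuals, 3 of them positive.
estimate = pooled_prevalence_mle([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [5] * 10)
print(estimate)
```

When all pools are the same size, the estimate has the closed form 1 - (1 - k/n)^(1/s) for k positive pools out of n, which gives a quick way to check the numerical result.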

How to estimate prevalence by strata

Prevalence can be estimated independently for selected subgroups or variables. This is useful for identifying differences across, for example, time, species, or sampling locations. To do so:

  • Select "Stratify data?"
  • Select the columns you would like prevalence to be estimated for
  • Click "Estimate prevalence"
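Stratifying amounts to splitting the rows by the selected columns and estimating prevalence separately within each group. The sketch below illustrates that idea under simplifying assumptions that are not taken from PoolTools: a perfect test and one common pool size, so the closed-form estimate 1 - (1 - k/n)^(1/s) applies. The column names Site, Year, and Result are hypothetical.

```python
from collections import defaultdict


def closed_form_prevalence(n_positive, n_pools, pool_size):
    """MLE when every pool holds the same number of individuals and the test is perfect."""
    return 1.0 - (1.0 - n_positive / n_pools) ** (1.0 / pool_size)


def prevalence_by_stratum(rows, strata_cols, result_col, pool_size):
    """Group rows by the chosen columns and estimate prevalence in each group."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [positive pools, total pools]
    for row in rows:
        key = tuple(row[col] for col in strata_cols)
        counts[key][0] += int(row[result_col])
        counts[key][1] += 1
    return {
        key: closed_form_prevalence(positive, total, pool_size)
        for key, (positive, total) in counts.items()
    }


# Hypothetical data: two sites sampled in the same year, pools of 10 individuals.
rows = [
    {"Site": "A", "Year": 2020, "Result": 1},
    {"Site": "A", "Year": 2020, "Result": 0},
    {"Site": "B", "Year": 2020, "Result": 0},
    {"Site": "B", "Year": 2020, "Result": 0},
]
estimates = prevalence_by_stratum(rows, ["Site", "Year"], "Result", pool_size=10)
for stratum, prevalence in estimates.items():
    print(stratum, prevalence)
```

Each combination of the selected columns forms its own stratum, so selecting more columns produces more, smaller groups and correspondingly less data per estimate.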

Tip

Bayesian estimates of prevalence can be obtained for non-hierarchical models by toggling "Bayesian calculations" under Advanced settings.

How to obtain more accurate estimates of uncertainty (hierarchical/cluster sampling)

We recommend adjusting analyses for hierarchy to obtain more accurate Bayesian credible intervals for the prevalence estimates (to see why this is important, refer to Hierarchical sampling structure). The hierarchical model can be applied when estimating a single prevalence for the whole dataset, or across subgroups.

  • Select whether data should be stratified or not (see above sections)
  • Select "Adjust for hierarchical sampling"
  • Identify the hierarchical variables in your data and drag them into the "Hierarchical variables" bucket. A minimum of two variables must be selected.
  • Reorder hierarchical variables from the largest to smallest sampling area
  • Click "Estimate prevalence"
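If you are unsure how to order the hierarchical variables, a quick check outside PoolTools can help. The sketch below is a heuristic, not part of PoolTools. It assumes that larger sampling areas have fewer distinct values than the areas nested within them, and it verifies that each smaller unit belongs to exactly one larger unit. The column names Region, Village, and Site are hypothetical.

```python
def order_hierarchy(rows, columns):
    """Order columns from fewest to most distinct values (a proxy for largest to smallest area)."""
    return sorted(columns, key=lambda col: len({row[col] for row in rows}))


def is_nested(rows, outer, inner):
    """True if every value of `inner` occurs under exactly one value of `outer`."""
    parent = {}
    for row in rows:
        # Remember the first outer unit seen for this inner unit, then compare.
        if parent.setdefault(row[inner], row[outer]) != row[outer]:
            return False
    return True


# Hypothetical data: sites within villages within regions.
rows = [
    {"Region": "North", "Village": "V1", "Site": "S1"},
    {"Region": "North", "Village": "V1", "Site": "S2"},
    {"Region": "North", "Village": "V2", "Site": "S3"},
    {"Region": "South", "Village": "V3", "Site": "S4"},
    {"Region": "South", "Village": "V3", "Site": "S5"},
]
order = order_hierarchy(rows, ["Site", "Region", "Village"])
print(order)
for outer, inner in zip(order, order[1:]):
    print(outer, "contains", inner, ":", is_nested(rows, outer, inner))
```

If a nesting check fails, the same label is probably reused in different larger units (for example, a site called "S1" in two villages), which is worth resolving before running the hierarchical model.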

PoolTools default options

Important

When estimating prevalence from data with hierarchical/cluster sampling in PoolTools, the user cannot control the kind of point estimate and interval used when all pools are negative.

When all pools are negative, PoolTools uses 0 as both the point estimate and the lower bound of the credible interval, and the posterior quantile at the interval's level (for example, the 95% quantile for a 95% interval) as the upper bound.
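To make this convention concrete, the sketch below works through a deliberately simplified, non-hierarchical case. It is not the model PoolTools fits. It assumes a perfect test and a uniform Beta(1, 1) prior, so that with N individuals tested across all-negative pools the posterior is Beta(1, N + 1) and its quantile has a closed form. The function name is hypothetical.

```python
def all_negative_interval(total_individuals, level=0.95):
    """Point estimate, lower bound, and upper bound when every pool is negative.

    Assumes a uniform Beta(1, 1) prior and a perfect test, so the posterior
    is Beta(1, N + 1), where N is the total number of individuals tested.
    """
    # Quantile of Beta(1, N + 1): solve 1 - (1 - p)^(N + 1) = level for p.
    upper = 1.0 - (1.0 - level) ** (1.0 / (total_individuals + 1))
    # Convention described above: 0 for the point estimate and the lower bound.
    return 0.0, 0.0, upper


# Hypothetical data: 20 negative pools of 10 individuals each.
point, lower, upper = all_negative_interval(20 * 10, level=0.95)
print(point, lower, upper)
```

In this example the upper bound is about 0.015. The bound shrinks as more individuals are tested, which is the behaviour to expect from the interval PoolTools reports even though its hierarchical model and priors differ from this simplification.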

Using PoolTestR gives the user greater control over the individual arguments. To conduct a hierarchical/cluster sampling analysis in PoolTestR, use the function PoolTestR::HierPoolPrev(). To control the kind of point estimate to use when all pools are negative, see the parameter all.negative.pools within HierPoolPrev().

For more details about this option, see the PoolTestR documentation on CRAN or GitHub.


TODO: How to interpret output (Note: I think this should be a separate Reference page, else the content will quickly look like something suited for a Tutorial)
