\subsection{Key Definitions}
In order to make the discussion that follows more precise, we first define key terms used in subsequent sections.
We caution that while many of these concepts are familiar, our terminology follows the {\it International Vocabulary of Metrology} (VIM) \citep{JCGM:VIM2012}, a standard that sometimes differs from the conventional or common language of engineering statistics. For additional information about or clarification of the statistical meaning of terms in the VIM, we suggest that readers consult the {\it Guide to the expression of uncertainty in measurement} (GUM) \citep{JCGM:GUM2008}.
For clarity, we highlight a few differences between conventional terms and the VIM usage employed throughout this article.
In particular, note the term ``standard uncertainty,'' which is often estimated by what is commonly called the ``standard error of the mean''; the VIM term for the latter is the ``experimental standard deviation of the mean.''
In cases of lexical ambiguity, the reader should assume that we hold to the definition of terms as given in the VIM.
Note also that the glossary is presented in a logical, rather than alphabetical, order. We strongly encourage reading it through in its entirety because of the structure and potentially unfamiliar terminology. Importantly, we also recommend reading the discussion that immediately follows, since this (i) explains the rationale for adopting the chosen language, (ii) discusses the limited relationship between statistics and uncertainty quantification, and (iii) thereby clarifies our perspective on best practices.
\subsubsection{Glossary of Statistical Terms}
\begin{itemize}
\item {\bf Random quantity}: A quantity whose numerical value is inherently unknowable or unpredictable.
Observations or measurements taken from a molecular simulation are treated as random quantities\footnote{Most molecular simulations (even those using pseudo-random number generators) are deterministic in that the sequence of visited states is generated by a fixed and known algorithm. As such, the simulation output is never truly random.
In practice, however, the chaotic nature of the simulation allows for application of the principles of statistics to the analysis of simulation observations.
Thus, observations/measurements taken at points along the simulation may be treated as random quantities.
See Ref.~\cite{Leimkuhler} for more discussion of this rather deep point.}.
\item {\bf True value:} The value of a quantity that is consistent with its definition and is the objective of an idealized measurement or simulation. The adjective ``true'' is often dropped when reference to the definition is clear by context \citep{JCGM:GUM2008,JCGM:VIM2012}.
\label{def:true_value}
\item {\bf Expectation value}: If $P(x)$ is the probability density of a continuous random quantity $x$, then the expectation value is given by the formula
\begin{align}
\expval{x} = \int {\rm d}x\, P(x) x.
\label{linavg_cont}
\end{align}
In the case that $x$ adopts discrete values $x_1,x_2,...$ with corresponding fractional (absolute) probabilities $P(x_j)$, we instead write
\begin{align}
\expval{x} = \sum_j x_j P(x_j).
\label{linavg}
\end{align}
Note that $P(x)$ is dimensionless when $x$ is discrete as shown above.
When $x$ is continuous, as in Eq.~\ref{linavg_cont}, $P(x)$ must have units reciprocal to $x$, e.g., if $x$ has units of kg, then $P(x)$ has units of 1/kg.
Furthermore, whether $x$ is discrete or continuous, $P(x)$ should always be normalized to ensure a total probability of unity.
\item {\bf Variance}:\footnote{The true probability density $P(x)$ is inherently unknowable, given that we can only collect a finite amount of data about $x$. As such, we can only estimate its properties (e.g., mean and variance) and approximate its analytical form (e.g.\ see the end of Ref.\ \cite{SMC}).} Taking $P(x)$ as defined previously, the variance of a random quantity is a measure of how much it can fluctuate, given by the formula
\begin{align}
\sigma_x^2 = \int {\rm d}x\, P(x) \left( x - \expval{x} \right)^2.
\end{align}
If $x$ assumes discrete values, the corresponding definition becomes
\begin{align}
\sigma_x^2 = \sum_j P(x_j) \left( x_j - \expval{x} \right)^2.
\end{align}
\item {\bf Standard Deviation}: The positive square root of the variance, denoted $\sigma_x$.\label{def:st_dev} This is a measure of the width of the distribution of $x$, and is, in itself, \emph{not} a measure of the statistical uncertainty; see below.
\item {\bf Arithmetic mean}: An \emph{estimate} of the (true) expectation value of a random quantity, given by the formula
%
\begin{equation}
\mean{x} = \dfrac{1}{n} \sum_{j=1}^{n} x_j \label{def:arith_mean}
\end{equation}
%
where $x_j$ is an experimental or simulated realization of the random variable and $n$ is the number of samples.
\smallskip
\textbf{\textit{Remark:}} This quantity is often called the ``sample mean.''
Note that a proper realization of a random variable (with no systematic bias) will yield values distributed according to $P(x)$, so $\mean{x} \rightarrow \expval{x}$ as $n \rightarrow \infty$.
\item {\bf Standard Uncertainty}: Uncertainty in a result (e.g., estimation of a true value) as expressed in terms of a \hyperref[def:st_dev]{standard deviation}.\footnote{The definition of standard uncertainty does not specify how to calculate the standard deviation. This choice ultimately rests with the modeler and should be dictated by the details of the uncertainty relevant to the problem at hand. Intuitively, this quantity should reflect the degree to which an estimate would vary if recomputed using new and independent data.}
\label{def:std_unc}
\item {\bf Experimental standard deviation:\footnote{
The term ``experimental'' can refer to simulated data, since these are the results of numerical experiments.
}
} An \emph{estimate} of the (true) standard deviation of a random variable, given by the formula\footnote{The factor of $n-1$ (as opposed to $n$) appearing in the denominator of Eq.~\ref{def:exp_st_dev} is needed to ensure that the variance estimate is {\it unbiased}, meaning that on average $\var{x}$ is equal to the true variance. Physically, we can interpret the $-1$ as accounting for the fact that one degree-of-freedom (e.g., piece of data) is lost via the appearance of $\mean{x}$ in the definition of $\stdev{x}$. Equivalently, it accounts for the fact that the arithmetic mean is linearly correlated with each $x_j$ (cf.\ \hyperref[def:unc_obs]{Linearly Uncorrelated Observables}).}
\begin{equation}
\stdev{x} = \sqrt{\dfrac{\sum_{j=1}^n\left(x_j - \mean{x}\right)^2}{n-1}} \label{def:exp_st_dev}
\end{equation}
\smallskip
The square of the experimental standard deviation, denoted $\var{x}$, is the experimental variance.\\
\textbf{\textit{Remark:}} This quantity is often called the ``sample standard deviation.''
Additionally, $\stdev{x}$ is a statistical property of the specific set of observations $\left\{x_1,x_2,...,x_n\right\}$, not of the random quantity $x$ in general.
Thus, $\stdev{x}$ is sometimes written as $\stdev{x_j}$ for emphasis of this property.
\item {\bf Linearly uncorrelated observables}: If quantities $x$ and $y$ have mean values $\expval{x}$ and $\expval{y}$, then $x$ and $y$ are linearly uncorrelated if
\begin{equation}
\expval{ \left(x - \expval{x} \right) \left(y - \expval{y} \right) } = 0 \label{def:unc_obs}
\end{equation}
\textbf{\textit{Remark:}} The concepts of linear uncorrelation and independence of random variables are often conflated. Two variables can still be (nonlinearly) dependent even if Eq.~\ref{def:unc_obs} holds, e.g., when a scatter plot of the two variables forms a circle. Truly independent variables have zero linear and higher-order correlations, such that the joint density of two random variables $x$ and $y$ can be decomposed as $P(x,y)=P(x)P(y)$, which is a stronger condition than linear uncorrelation. Empirically testing for independence, however, is not practical, nor is it necessary for any of the estimates discussed in this work.
\item {\bf Experimental standard deviation of the mean}: An estimate of the standard deviation of the distribution of the \hyperref[def:arith_mean]{arithmetic mean}, given by the formula
%
\begin{equation}
\stdevmean{\mean{x}} = \dfrac{\stdev{x}}{\sqrt{n}}, \label{def:exp_st_dev_mean}
\end{equation}
where the realizations of $x_j$ are assumed to be linearly uncorrelated.\footnote{The true variance of the mean goes as $\sigma_x^2/n$, which assumes exact knowledge of $\sigma_x$. Thus, the factor of $n$ (as opposed to $n-1$) appearing in Eq.~\eqref{def:exp_st_dev_mean} is motivated by the observation that $\langle \stdev{x}^2 \rangle = \sigma_x^2$, i.e., the experimental variance provides an unbiased estimate of the true variance, so that $\stdev{x}^2/n$ is an unbiased estimate of the variance of the mean. It is important and somewhat counterintuitive, however, that Eq.~\eqref{def:exp_st_dev_mean} on average {\it underestimates} the true standard deviation of the mean, a consequence of Jensen's inequality (the square root is concave, so $\langle \stdev{x} \rangle \le \sigma_x$). }
\smallskip
\textbf{\textit{Remark:}} This quantity is often called the ``standard error.''
%%% Terms that might be added in a future iteration:
% \item {\bf Precision}: The amount of variability in an estimate (e.g., based on repeating a given simulation protocol multiple times).
%\item {\bf Accuracy}: The degree to which a result agrees with a reference value.
\item {\bf Raw data}: The numbers that the computer program directly generates as it proceeds through a sequence of states.
For example, a MC simulation generates a sequence of configurations, for which there are associated properties such as the instantaneous pressure, temperature, volume, etc.
\label{def:raw_data}
\item {\bf Derived observables}: Quantities derived from ``non-trivial'' analyses of raw data, e.g., properties, such as free energies, that cannot be computed from a single configuration.
\label{def:deriv_obs}
\item {\bf Correlation time}: In time-series data of a random quantity $x(t)$ (e.g., a physical property from a MC or MD trajectory; the sequence of trial moves is treated as a ``time series'' in MC), the correlation time (denoted here as $\tau$) is the {\it longest} separation time $\Delta t$ over which $x(t)$ and $x(t+\Delta t)$ remain (linearly) correlated.\footnote{Generally speaking, MC and MD trajectories generate new configurations from preceding ones.} (See Eq.~\ref{def:obs_ACF} for a mathematical definition and Sec.~\ref{sec:autocorrelation} for discussion.)
Thus, the correlation time can be interpreted as the time over which the system retains memory of its previous states.
Such correlations are often {\bf stationary}, meaning that $\tau$ is independent of $t$.
Roughly speaking, the total simulation time divided by the longest correlation time yields an order-of-magnitude estimate of the number of (linearly) {\it uncorrelated} samples generated by a simulation. See Sec.~\ref{sec:autocorrelation}. Note that the correlation time can be infinite.
\label{def:corr_time}
\item {\bf Two-sided confidence interval}: An interval, typically stated as $\expval{x} = \mean{x} \pm U$, which is expected to contain the possible values attributed to $\expval{x}$ given the experimental measurements of $x_j$ and a certain {\it level of confidence}, denoted $p$.
The size of the confidence interval, known as the {\it expanded uncertainty}, is defined by $U = k \stdevmean{\mean{x}}$ where $k$ is the \hyperref[def:coveragefactor]{\it coverage factor} \cite{JCGM:VIM2012}.\footnote{This conceptual description of a confidence interval is only applicable when certain conditions are met, including the important stipulation that all uncertainty contained in $\stdevmean{\mean{x}}$ is determined only by statistical evaluation of the random experimental measurements of $x_j$ \cite{JCGM:GUM2008}.}
The level of confidence $p$ is typically given as a percentage, e.g., 95~\%. Hence, the confidence interval is typically described as ``the $p$~\% confidence interval'' for a given value of $p$.
\label{def:conf_int}
\item {\bf Coverage Factor}:
The factor $k$, typically in the range of 2 to 3, which is multiplied by the experimental standard deviation of the mean \hyperref[def:exp_st_dev_mean]{$\stdevmean{\mean{x}}$} to obtain the expanded uncertainty. In general, $k$ is selected based on the chosen level of confidence $p$ and the probability distribution that characterizes the measurement results $x_j$. For Gaussian-distributed data, $k$ is determined from the $t$-distribution, based on the level of confidence $p$ and the number of measurements in the experimental sample.\footnote{For discussion regarding the selection of $k$ for non-Gaussian-distributed data, consult Annex G of Ref.~\cite{JCGM:GUM2008}.} See Sec.~\ref{sec:conf_int} for further discussion on the selection of $k$ and the resultant computation of confidence intervals.
\label{def:coveragefactor}
\end{itemize}
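To make the estimators above concrete, the following is a minimal Python sketch (standard library only) that computes the arithmetic mean, the experimental standard deviation, the experimental standard deviation of the mean, and an approximate two-sided 95~\% confidence interval. The synthetic Gaussian data, the sample size, and the large-sample coverage factor $k \approx 1.96$ are illustrative assumptions, not values taken from the text.

```python
import math
import random
import statistics

# Synthetic "measurements": i.i.d. Gaussian draws standing in for
# (assumed linearly uncorrelated) simulation output.  The parameters
# below are illustrative assumptions only.
random.seed(1)
n = 400
x = [random.gauss(0.0, 2.0) for _ in range(n)]

# Arithmetic mean: an estimate of the (true) expectation value <x>.
mean_x = sum(x) / n

# Experimental standard deviation, with n-1 in the denominator
# (Bessel's correction) so the variance estimate is unbiased.
s_x = math.sqrt(sum((xj - mean_x) ** 2 for xj in x) / (n - 1))

# Experimental standard deviation of the mean (the "standard error").
s_mean = s_x / math.sqrt(n)

# Expanded uncertainty U = k * s_mean.  For Gaussian data the coverage
# factor k comes from the t-distribution with n-1 degrees of freedom;
# for n = 400 and p = 95 % it is close to the large-sample value 1.96.
k = 1.96
U = k * s_mean

print(f"<x> estimate: {mean_x:.3f} +/- {U:.3f} (approx. 95 % confidence)")
```

Note that `s_x` estimates the width of the distribution of $x$ itself, while `s_mean` (smaller by a factor of $\sqrt{n}$) is the quantity that expresses the standard uncertainty of the mean; conflating the two is a common error.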
\subsubsection{Terminology and its relation to our broader perspective on uncertainty}
As surveyed by Refs.~\citep{JCGM:GUM2008,JCGM:VIM2012}, the discussion that originally motivated many of these definitions appears rather philosophical. However, there are practical issues at stake related to both the content of the definitions and the need to adopt their usage. We review such issues now.
At the heart of the matter is the observation that any uncertainty analysis, no matter how thorough, is inherently subjective. This can be understood, for example, by noting that the arithmetic mean is itself a random quantity that only approximates the true expectation value.\footnote{Notably, the same observation applies to the experimental standard deviation and the corresponding experimental standard deviation of the mean.} Because its deviation from the true value shrinks as the number of samples grows (barring a little bad luck), one could argue that a better mean is always obtained by collecting more data. We cannot collect data indefinitely, however, so the {\it quality} of an estimate necessarily depends on a {\it choice} of when to stop. Ultimately, this discussion forces us to acknowledge that {\it the role of any uncertainty estimate is to facilitate decision making,} and, as such, the thoroughness of any analysis should be tailored to the decision at hand.
Practically speaking, the definitions as put forth by the VIM attempt to reflect this perspective while also capturing ideas that the statistics community has long found useful. For example, the ``experimental standard deviation of the mean'' is nothing more than the ``standard error of the mean.'' However, the adjective ``experimental'' explicitly acknowledges that the estimate is in fact obtained from observation (and not analytical results), while the use of ``deviation'' in place of ``error'' emphasizes that the latter is unknowable. Similar considerations apply to the term ``experimental standard deviation,'' which is more commonly referred to as the ``sample standard deviation.''
It is important to note that subjectivity as identified in this discussion does not arise just from questions of sampling. In particular, methods such as parametric bootstrap and correlation analyses (discussed below) invoke modeling assumptions that can never be objectively tested. Moreover, experts may not even agree on how to compute a derived quantity, which leads to ambiguity in what we mean by a ``true value'' \cite{patrone1}. That we should consider these issues carefully and assess their impacts on any prediction is reflected in the definition of the ``standard uncertainty,'' which does not actually tell us how to compute uncertainties. {\it Rather it is the task of the modeler to consider the impacts of their assumptions and choices when formulating a final uncertainty estimate. To this end, the language we use plays a large role in how well these considerations are communicated.}
As a final thought, we reiterate that the goal of an uncertainty analysis is not necessarily to perform the most thorough computations possible, but rather to communicate clearly and openly what has been assumed and done. We cannot predict every use-case for the data that we generate, nor can we anticipate the decisions that will be made on the basis of our predictions. Clear communication matters precisely because it allows others to decide for themselves whether our analysis is sufficient or requires revisiting. To this end, consistent and precise use of language plays an important, if understated, role.