You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: vignettes/tractable-autocorrelations.Rmd
+75-21Lines changed: 75 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -15,15 +15,15 @@ knitr::opts_chunk$set(
15
15
)
16
16
```
17
17
18
-
This vignette demonstrates the use of an AR1 model within a GAM to account for
19
-
autocorrelation of the error terms of the model. We will use data from the
20
-
Sarica dataset [@Sarica2017] to demonstrate the how to include an autocorrelation
21
-
term and its impact on the model statistics.
18
+
This vignette demonstrates the use of an AR1 model within the GAM to account for
19
+
the spatial autocorrelation of the errors of the model. The ideas that we use closely follow the work of Van Rij and colleagues [@VanRij2019]. While their work used data that is different in many respects from our data (time-series of eye pupil size in psycholingustic experiments), there are also some similarities to the tract profile data that we are analyzing in `tractable`. For example, the signals tend to change slowly over time in their analysis, and tend to change rather slowly in space in our analysis. In both cases, this means that the models fail to capture some characteristics of the data, and that some inferences from the GAM models tend to be anti-conservative, unless a mitigation strategy and careful model checking are implemented. In particular, in both cases, the residuals of the model may exhibit substantial auto-correlation. They may also end up being centered not on zero. It is always worth checking the underlying assumption of normally-distributed residuals by plotting a so-called QQ plot. Finally, we can use formal model comparison methods to adjudicate between alternative models.
22
20
23
-
We start by loading the `tractable` library:
21
+
As an example of this approach, we will use data from the Sarica dataset [@Sarica2017] to demonstrate how to include an autocorrelation term in the GAM, and its impact on model statistics. We start by loading the `tractable` library, as well as the [itsadug](https://www.rdocumentation.org/packages/itsadug/versions/2.4.1/topics/itsadug) and [gratia](https://gavinsimpson.github.io/gratia/) libraries, which both provide functionality to assess GAMs fit by `mgcv` (our workhorse for GAM fitting).
24
22
25
23
```{r setup}
26
24
library(tractable)
25
+
library(itsadug)
26
+
library(gratia)
27
27
```
28
28
29
29
Next, we will use a function that is included in `tractable` to read this dataset
We will first fit a GAM model without any autocorrelation structure using the
41
-
`tractable_single_bundle` function. This model will use "group" and "age" to
42
-
predict FA, while smoothing over the tract nodes. We will also use the automated
43
-
procedure implemented in `tractable_single_bundle` to determine the ideal value
44
-
for `k`, a parameter used to determine the number of spline functions. The default behavior for `tractable_single_bundle` is to include the AR1 model. To avoid this, we set the parameter `autocor` to `FALSE`.
40
+
We will first fit a GAM model that does not account for autocorrelation structure
41
+
in the residuals using the `tractable_single_bundle` function. This model will use
42
+
"group" and "age" to account for the measured FA, while smoothing over the tract
43
+
nodes. We will also use the automated procedure implemented in
44
+
`tractable_single_bundle` to determine the ideal value for `k`, a parameter used
45
+
to determine the number of spline functions. The default behavior for
46
+
`tractable_single_bundle` is to include an AR1 model to account for
47
+
autocorrelations, as we will see below. But for now, to avoid this, we first set
48
+
the parameter `autocor` to `FALSE`.
45
49
46
50
47
51
```{r fit no ac model}
@@ -67,22 +71,46 @@ In addition, we see that there is a statistically significant effect of group
67
71
summary(gam_fit_cst_no_ac)
68
72
```
69
73
74
+
In this model, no steps were taken to account for autocorrelated residuals. We will use a few model diagnostics to evaluate the model. First, we will use the `gratia::appraise` function, which presents a few different visuals about the model:
The top left plot is a QQ plot, it shows the residuals as a function of their quantile. In a perfect world (or at least a world in which model assumptions are valid), these points should fall along the equality line. This plot is not terrible, but there are some deviations. Another plot to look at is the plot of the residuals as a function of the prediction. Here, we look both at the overall location and shape of the point cloud, as well as for any signs of clear structure. In this case, we see some signs of trouble: the whole cloud is shifted away from zero, and there are what appear as trails of points, suggesting that some points form patterns. The first of these issues is also apparent in the bottom left, where residuals are not zero centerd in the histogram of model residuals.
82
+
83
+
Another diagnostic plot that is going to be crucial as a diagnostic is the plot of the autocorrelation function of the residuals. A plot of this sort is provided by the `itsadug` library:
84
+
85
+
```{r plot acf residuals without ac, echo=TRUE, fig.height=6, fig.width=6}
86
+
itsadug::acf_resid(gam_fit_cst_no_ac)
87
+
```
88
+
89
+
The dashed blue lines indicate the 95% confidence interval for the auto-correlation
90
+
function of white noise. Here, we see that the auto-correlation at many of the lags
91
+
(and particularly in the immediate neighbor, lag-1) is substantially larger than
92
+
would be expected for a function with no autocorrelations. These autocorrelations
93
+
pose a danger to inference not only because of mis-specification of the model, but
94
+
also because we are going to under-estimate the standard error of the model in
95
+
this setting and this will result in false positives (this is what Van Rij et al.
96
+
elegantly refer to as "anti-conservative" models)
97
+
70
98
To account for potential spatial autocorrelation of FA values along the length
71
-
of the tract profile, we can incorporate a AR1 model into our GAM. Briefly, this
72
-
AR1 model is a linear model that estimates the amount of influence of the FA value of the preceding node on the FA value of the current node (see [@VanRij2019](https://journals.sagepub.com/doi/pdf/10.1177/2331216519832483) for an overview of accounting for autocorrelation using the `mgcv` package).
99
+
of the tract profile, we can incorporate an AR1 model into our GAM. Briefly, the
100
+
AR1 model is a linear model that estimates and accounts for the amount of influence
101
+
of the model residual FA value of each node on the residual FA value of
102
+
their neighbor node. This is somewhat akin to "pre-whitening" that fMRI researchers undertake, to account for temporal auto-correlations in the time-series measured with fMRI (see e.g. [@Olszowy2019-ge]).
73
103
74
104
The AR1 model takes a parameter $\rho$ to estimate autocorrelation effects. We
75
105
can pass our initial model into the function `itsadug::start_value_rho` to
76
-
automatically determine the value of $\rho$. We can also plot the autocorrelation
77
-
by setting `plot=TRUE` within that function (pictured below).
By default, the `tractable_single_bundle` function will determine the value of
85
-
$\rho$ and incorporate the AR1 model into the GAM estimation.
113
+
By default, the `tractable_single_bundle` function empirically determines the value of $\rho$ based on the data and uses it to incorporate the AR1 model of the residuals into the GAM estimation.
Notice some improvements to model characteristics: the residuals are more centered around zero and the QQ plot is somewhat improved. There is some residual structure in the scatter plot of residuals as a function of prediction. We can ask how bad this structure is in terms of the residual autocorrelation:
144
+
145
+
```{r plot ACF with AR1, echo=TRUE, fig.height=6, fig.width=6}
146
+
rho2 <- itsadug::acf_resid(gam_fit_cst)
147
+
print(rho2)
148
+
```
149
+
150
+
This shows that the lag-1 autocorrelation has been reduced from approximately 0.94 to approximately 0.29.
151
+
152
+
Finally, formal model comparison can tell us which of these models better fit the
153
+
data. Using the `itsadug` library this can be done using the Akaike Information
154
+
Criterion as a comparator. In this case, this also indicates that the model that
155
+
accounts for autocorrelations also has smaller residuals considering the number of
156
+
parameters, suggesting that it is overall a better model of the data.
0 commit comments