Merged
12 changes: 6 additions & 6 deletions 07-CI.qmd
@@ -29,7 +29,7 @@ where *N* is the size of the population, and *n* is the size of the sample. When

## What is a Confidence Interval?

- Confidence intervals are a statement about the percentage of confidence intervals that contain the true parameter value. This behavior of confidence intervals is nicely visualized on this website by Kristoffer Magnusson: <http://rpsychologist.com/d3/CI/>. In @fig-cisim we see blue dots that represent means from a sample, and that fall around a red dashed vertical line, which represents the true value of the parameter in the population. Due to variation in the sample, the estimates do not all fall on the red dashed line. The horizontal lines around the blue dots are the confidence intervals. By default, the visualization shows 95% confidence intervals. Most of the lines are black (which means the confidence interval overlaps with the orange dashed line indicating the true population value), but some are red (indicating they do not capture the true population value). In the long run, 95% of the horizontal bars will be black, and 5% will be red.
+ Confidence intervals are a statement about the percentage of confidence intervals that contain the true parameter value. This behavior of confidence intervals is nicely visualized on this website by Kristoffer Magnusson: <http://rpsychologist.com/d3/CI/>. In @fig-cisim we see blue dots that represent means from a sample, and that fall around an orange dashed vertical line, which represents the true value of the parameter in the population. Due to variation in the sample, the estimates do not all fall on the orange dashed line. The horizontal lines around the blue dots are the confidence intervals. By default, the visualization shows 95% confidence intervals. Most of the lines are black (which means the confidence interval overlaps with the orange dashed line indicating the true population value), but some are red (indicating they do not capture the true population value). In the long run, 95% of the horizontal bars will be black, and 5% will be red.

```{r fig-cisim, fig.margin = FALSE, echo = FALSE}
#| fig-cap: "Series of simulated point estimates and confidence intervals."
@@ -90,7 +90,7 @@ It is tempting to use a Bayesian interpretation of a single confidence interval,

There is a direct relationship between the CI around an effect size and statistical significance of a null-hypothesis significance test. For example, if an effect is statistically significant (*p* \< 0.05) in a two-sided independent *t*-test with an alpha of .05, the 95% CI for the mean difference between the two groups will not include zero. Confidence intervals are sometimes said to be more informative than *p*-values, because they not only provide information about whether an effect is statistically significant (i.e., when the confidence interval does not overlap with the value representing the null hypothesis), but also communicate the precision of the effect size estimate. This is true, but as mentioned in the chapter on [*p*-values](#sec-pvalue) it is still recommended to add exact *p*-values, which facilitates the re-use of results for secondary analyses [@appelbaum_journal_2018], and allows other researchers to compare the *p*-value to an alpha level they would have preferred to use [@lehmann_testing_2005].

- In order to maintain the direct relationship between a confidence interval and a *p*-value it is necessary to adjust the confidence interval level whenever the alpha level is adjusted. For example, if an alpha level of 5% is corrected for three comparisons to 0.05/3 - 0.0167, the corresponding confidence interval would be a 1 - 0.0167 = 0.9833 confidence interval. Similarly, if a *p*-value is computed for a one-sided *t*-test, there is only an upper or lower limit of the interval, and the other end of the interval ranges to −∞ or ∞.
+ In order to maintain the direct relationship between a confidence interval and a *p*-value it is necessary to adjust the confidence interval level whenever the alpha level is adjusted. For example, if an alpha level of 5% is corrected for three comparisons to 0.05/3 = 0.0167, the corresponding confidence interval would be a 1 - 0.0167 = 0.9833 confidence interval. Similarly, if a *p*-value is computed for a one-sided *t*-test, there is only an upper or lower limit of the interval, and the other end of the interval ranges to −∞ or ∞.
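The alpha-to-CI-level conversion described in the corrected line is a one-step computation. A minimal sketch (Python is used here for illustration; the chapter's own code is R, and all values follow the text):

```python
# Convert a multiple-comparison-corrected alpha level into the
# matching confidence interval level.
alpha = 0.05
comparisons = 3

alpha_adj = alpha / comparisons  # 0.05 / 3 ≈ 0.0167
ci_level = 1 - alpha_adj         # ≈ 0.9833, i.e. a 98.33% CI

print(round(alpha_adj, 4), round(ci_level, 4))  # 0.0167 0.9833
```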

To maintain a direct relationship between an *F*-test and its confidence interval, a 90% CI for effect sizes from an *F*-test should be provided. The reason for this is explained by [Karl Wuensch](https://web.archive.org/web/20140104080701/http://core.ecu.edu/psyc/wuenschk/docs30/CI-Eta2-Alpha.doc). Where Cohen’s *d* can take both positive and negative values, r² or η² are squared, and can therefore only take positive values. This is related to the fact that *F*-tests (as commonly used in ANOVA) are one-sided. If you calculate a 95% CI, you can get situations where the confidence interval includes 0, but the test reveals a statistical difference with a *p* < .05 (for a more mathematical explanation, see @steiger_test_2004). This means that a 95% CI around Cohen's *d* in an independent *t*-test equals a 90% CI around η² for exactly the same test performed as an ANOVA. As a final detail, because eta-squared cannot be smaller than zero, the lower bound for the confidence interval cannot be smaller than 0. This means that a confidence interval for an effect that is not statistically different from 0 has to start at 0. You report such a CI as 90% CI [.00; .XX] where the XX is the upper limit of the CI.

@@ -132,7 +132,7 @@ title("Forest plot for a simulated meta-analysis")

We can see, based on the fact that the confidence intervals do not overlap with 0, that studies 1 and 3 were statistically significant. The diamond shape named the FE model (Fixed Effect model) is the meta-analytic effect size. Instead of using a black horizontal line, the upper limit and lower limit of the confidence interval are indicated by the left and right points of the diamond, and the center of the diamond is the meta-analytic effect size estimate. A meta-analysis calculates the effect size by combining and weighing all studies. The confidence interval for a meta-analytic effect size estimate is always narrower than that for a single study, because of the combined sample size of all studies included in the meta-analysis.
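Why the meta-analytic confidence interval is always narrower can be seen from inverse-variance weighting, the pooling rule behind a fixed-effect (FE) model. A sketch with made-up study standard errors (Python for illustration; the chapter's forest plot is produced in R):

```python
import math

# Hypothetical standard errors for three studies. A fixed-effect
# meta-analysis weights each study by the inverse of its variance,
# so the pooled SE is smaller than any single study's SE.
study_se = [0.20, 0.25, 0.30]

weights = [1 / se**2 for se in study_se]        # inverse-variance weights
pooled_se = math.sqrt(1 / sum(weights))         # SE of the pooled estimate

print(pooled_se < min(study_se))  # True: pooled CI is narrower than any study's
```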

- In the preceding section, we focused on examining whether the confidence interval overlapped with 0. This is a confidence interval approach to a null-hypothesis significance test. Even though we are not computing a *p*-value, we can directly see from the confidence interval whether *p* < $\alpha$. The confidence interval approach to hypothesis testing makes it quite intuitive to think about performing tests against non-zero null hypotheses [@bauer_unifying_1996]. For example, we could test whether we can reject an effect of 0.5 by examining if the 95% confidence interval does not overlap with 0.5. We can test whether an effect is *smaller* that 0.5 by examining if the 95% confidence interval falls completely *below* 0.5. We will see that this leads to a logical extension of null-hypothesis testing where, instead of testing to reject an effect of 0, we can test whether we can reject other effects of interest in **range predictions** and [**equivalence tests**](#sec-equivalencetest).
+ In the preceding section, we focused on examining whether the confidence interval overlapped with 0. This is a confidence interval approach to a null-hypothesis significance test. Even though we are not computing a *p*-value, we can directly see from the confidence interval whether *p* < $\alpha$. The confidence interval approach to hypothesis testing makes it quite intuitive to think about performing tests against non-zero null hypotheses [@bauer_unifying_1996]. For example, we could test whether we can reject an effect of 0.5 by examining if the 95% confidence interval does not overlap with 0.5. We can test whether an effect is *smaller* than 0.5 by examining if the 95% confidence interval falls completely *below* 0.5. We will see that this leads to a logical extension of null-hypothesis testing where, instead of testing to reject an effect of 0, we can test whether we can reject other effects of interest in **range predictions** and [**equivalence tests**](#sec-equivalencetest).
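These interval-based tests reduce to simple comparisons against the interval limits. A sketch with hypothetical 95% CI limits (Python for illustration; the chapter's code is R):

```python
# Hypothetical 95% CI for an effect size.
ci_lower, ci_upper = 0.10, 0.42

# Classic NHST: reject an effect of 0 if 0 is outside the interval.
rejects_zero = not (ci_lower <= 0 <= ci_upper)

# Non-zero null: reject an effect of 0.5 if 0.5 is outside the interval.
rejects_half = not (ci_lower <= 0.5 <= ci_upper)

# Effect *smaller* than 0.5: the whole interval falls below 0.5.
smaller_than_half = ci_upper < 0.5

print(rejects_zero, rejects_half, smaller_than_half)  # True True True
```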

## The Standard Error and 95% Confidence Intervals

@@ -241,10 +241,10 @@ ggplot(as.data.frame(x), aes(x)) + # plot data
To calculate the prediction interval, we need a slightly different formula for the standard error than that which was used for the confidence interval, namely:

$$
- Standard \ Error \ (SE) = \sigma/\sqrt(1+1/n)
+ Standard \ Error \ (SE) = \sigma\sqrt(1+1/n)
$$

- When we rewrite the formula used for the confidence interval to $\sigma/\sqrt(1/N)$, we see that the difference between a confidence interval and the prediction interval is in the “1+” which always leads to wider intervals. Prediction intervals are **wider**, because they are constructed so that they will contain **a single future value** 95% of the time, instead of the **mean**. The fact that prediction intervals are wide is a good reminder that it is difficult to predict what will happen for any single individual.
+ When we rewrite the formula used for the confidence interval to $\sigma\sqrt(1/n)$, we see that the difference between a confidence interval and the prediction interval is in the “1+” which always leads to wider intervals. Prediction intervals are **wider**, because they are constructed so that they will contain **a single future value** 95% of the time, instead of the **mean**. The fact that prediction intervals are wide is a good reminder that it is difficult to predict what will happen for any single individual.
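The effect of the "1+" is easy to see numerically. A sketch comparing the two standard errors under a normal approximation, with hypothetical sigma and n (Python for illustration; the chapter's code is R):

```python
import math

# Hypothetical values; z = 1.96 is the normal critical value for 95% intervals.
sigma, n, z = 1.0, 100, 1.96

se_ci = sigma * math.sqrt(1 / n)      # SE for the mean (confidence interval)
se_pi = sigma * math.sqrt(1 + 1 / n)  # SE for a single future value (prediction interval)

print(round(z * se_ci, 3))  # 0.196 — half-width of the 95% CI
print(round(z * se_pi, 3))  # 1.97  — half-width of the 95% PI, much wider
```

Even with n = 100, the prediction interval is roughly ten times as wide as the confidence interval, because the uncertainty about a single observation does not shrink with sample size.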

## Capture Percentages

@@ -574,7 +574,7 @@ cat(longmcq(opts_p))

Capture percentages are rarely directly used to make statistical inferences. The main reason we discuss them here is really to prevent the common misunderstanding that 95% of future means fall within a single confidence interval: Capture percentages clearly show that is not true. Prediction intervals are also rarely used in psychology, but are more common in data science.

- **Q9** So far we have looked at confidence intervals around means, but we can also compute confidence intervals around standard deviations. If you run lines the first lines of the code below, you will see that with an alpha level of 0.05, 100 observations, and a true standard deviation of 1, the 95% CI around the standard deviation is [0.88; 1.16]. Change the assumed population standard deviation from 1 to 2 (st_dev <- 2). Keep all other settings the same. What is the 95% CI around the standard deviation of 2 with 100 observations?
+ **Q9** So far we have looked at confidence intervals around means, but we can also compute confidence intervals around standard deviations. If you run the first lines of the code below, you will see that with an alpha level of 0.05, 100 observations, and a true standard deviation of 1, the 95% CI around the standard deviation is [0.88; 1.16]. Change the assumed population standard deviation from 1 to 2 (st_dev <- 2). Keep all other settings the same. What is the 95% CI around the standard deviation of 2 with 100 observations?
```
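The standard chi-square interval for a standard deviation reproduces the [0.88; 1.16] quoted above. A sketch in Python (assumes SciPy is installed; the chapter's own code below is R, and this is an illustration, not that code):

```python
import math
from scipy import stats

# Chi-square based 95% CI for a standard deviation, using the
# values from the text: alpha = 0.05, n = 100, true SD = 1.
alpha_level, n, st_dev = 0.05, 100, 1.0

df = n - 1
lower = st_dev * math.sqrt(df / stats.chi2.ppf(1 - alpha_level / 2, df))
upper = st_dev * math.sqrt(df / stats.chi2.ppf(alpha_level / 2, df))

print(round(lower, 2), round(upper, 2))  # 0.88 1.16, matching the text
```

Because the standard deviation enters the formula as a multiplier, doubling `st_dev` simply scales both limits, which is the pattern Q9 asks you to observe.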

```{r, eval=FALSE}
alpha_level <- 0.05 # set alpha level