
Commit a775b79

committed
Note 13-15 for 334.
1 parent 66bc367 commit a775b79

4 files changed

Lines changed: 514 additions & 0 deletions


Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
1+
---
2+
title: Hypothesis Testing
3+
date: 2026-03-04
4+
---
5+
6+
## 1. Introduction to Hypothesis Testing
7+
8+
Hypothesis testing is a formal statistical procedure used to make decisions about the underlying properties of a population based on a sample of observations. The objective is to evaluate whether there is sufficient statistical evidence to reject a default baseline assumption in favor of an alternative claim.
9+
10+
We formalize our problem using two competing hypotheses:
11+
12+
- **Null hypothesis ($H\_0$):** This represents the default status quo, a statement of "no effect," "no discovery," or "no difference."
13+
- **Alternative hypothesis ($H\_1$):** This is the claim we seek evidence for, representing a "discovery" or a significant deviation from the baseline.
14+
15+
---
16+
17+
## 2. Statistical Formulation
18+
19+
Suppose we observe data $X = (X\_1, \dots, X\_n)$ drawn from a probability distribution $f\_\theta$, where $\theta \in \Theta$ is an unknown parameter. The parameter space $\Theta$ is partitioned into two disjoint subsets, $\Theta\_0$ and $\Theta\_1$.
20+
21+
The formal hypothesis testing problem is stated as:
22+
$$
23+
H\_0: \theta \in \Theta\_0 \quad \text{vs.} \quad H\_1: \theta \in \Theta\_1
24+
$$
25+
By definition, we require $\Theta\_0 \cap \Theta\_1 = \emptyset$.
26+
27+
> This holds because $\Theta\_0$ and $\Theta\_1$ form a partition of the parameter space $\Theta$.
28+
29+
**Definition (Statistical Test):**
30+
A test is a formal decision rule, defined as a function $T$ that maps the observed data $X$ to the set of hypotheses $\{H\_0, H\_1\}$. Based on the observed values, the test explicitly instructs us to either "Accept $H\_0$" or "Reject $H\_0$" (which implies accepting $H\_1$).
31+
32+
### 2.1 Types of Hypotheses
33+
34+
Hypotheses are broadly categorized based on the specific number of parameter values they contain.
35+
36+
- **Simple Hypothesis:** The hypothesis specifies exactly one value for the parameter. For example, $H\_0: \theta = \theta\_0$. Thus, $|\Theta\_0| = 1$.
37+
- **Composite Hypothesis:** The hypothesis specifies a range or multiple possible values for the parameter. For example, $H\_1: \theta > \theta\_0$ or $H\_1: \theta \neq \theta\_0$. Thus, $|\Theta\_1| > 1$.
38+
39+
### 2.2 Examples of Testing Scenarios
40+
41+
**Example 1: Coin Tossing (Simple vs. Simple)**
42+
Suppose I toss a coin with bias $p$ exactly $4$ times, and $X$ of the tosses turn out to be heads. Suppose we have some prior knowledge that the bias is either $0.5$ or $0.7$.
43+
We formulate the hypothesis testing problem as:
44+
$$
45+
H\_0: p = 0.5 \quad \text{vs.} \quad H\_1: p = 0.7
46+
$$
47+
A test $T$ would take the observed number of heads $X \in \{0, 1, 2, 3, 4\}$ and map it to a decision in $\{H\_0, H\_1\}$.
48+
49+
**Example 2: Normal Mean Testing**
50+
Suppose we have a sample $X\_1, \dots, X\_n \sim \mathcal{N}(\mu, \sigma^2)$ with a known variance $\sigma^2$ but an unknown mean $\mu$.
51+
52+
- **Simple vs. Simple:** $H\_0: \mu = \mu\_0$ vs. $H\_1: \mu = \mu\_1$.
53+
- **Two-sided test:** $H\_0: \mu = \mu\_0$ vs. $H\_1: \mu \neq \mu\_0$. (Simple vs. Composite)
54+
- **One-sided test:** $H\_0: \mu = \mu\_0$ vs. $H\_1: \mu > \mu\_0$. (Simple vs. Composite)
55+
56+
---
57+
58+
## 3. Evaluating a Statistical Test
59+
60+
Whenever we make a decision using a statistical test, we risk making one of two distinct types of errors:
61+
62+
1. **$\alpha$ - Type I Error (False Positive):** We incorrectly reject the null hypothesis $H\_0$ when it is actually true.
63+
- The probability of committing a Type I Error is called the **Significance Level**, denoted by $\alpha$.
64+
- $\alpha = \prob(\text{Output } H\_1 \mid H\_0 \text{ is true})$.
65+
2. **$\beta$ - Type II Error (False Negative):** We incorrectly accept the null hypothesis $H\_0$ when the alternative $H\_1$ is actually true.
66+
- The probability of a Type II error is denoted by $\beta$.
67+
- The **Power** of the test is defined as $1 - \beta$, which is the probability of correctly rejecting $H\_0$ when $H\_1$ is true.
68+
- $1 - \beta = \prob(\text{Output } H\_1 \mid H\_1 \text{ is true})$.
69+
70+
For a fixed sample size $n$, $\alpha$ and $\beta$ generally cannot both be made arbitrarily small: shrinking the rejection region lowers $\alpha$ but raises $\beta$, and vice versa. The standard frequentist approach is therefore to fix the significance level $\alpha$ at a pre-determined threshold (such as $0.05$ or $0.01$) and then seek the test that maximizes the power $1 - \beta$.
71+
72+
> $\alpha$ and $\beta$ move in opposite directions. $\alpha$ and Power move in the same direction.
73+
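The trade-off can be checked by hand on the coin-tossing example from Section 2.2. The sketch below assumes a hypothetical family of threshold tests, "output $H\_1$ when $X \ge k$", which the notes do not prescribe; the helper names `make_test` and `error_rates` are illustrative only.

```python
from math import comb

N_TOSSES = 4
P_NULL, P_ALT = 0.5, 0.7  # the two candidate biases from Example 1


def binom_pmf(x, n, p):
    """Probability of exactly x heads in n independent tosses with bias p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def make_test(k):
    """Hypothetical threshold rule: output H1 when x >= k, otherwise H0."""
    return lambda x: "H1" if x >= k else "H0"


def error_rates(test):
    """Return (alpha, beta, power) of a test on the number of heads."""
    outcomes = range(N_TOSSES + 1)
    alpha = sum(binom_pmf(x, N_TOSSES, P_NULL) for x in outcomes if test(x) == "H1")
    power = sum(binom_pmf(x, N_TOSSES, P_ALT) for x in outcomes if test(x) == "H1")
    return alpha, 1 - power, power


for k in range(N_TOSSES + 2):
    alpha, beta, power = error_rates(make_test(k))
    print(f"k={k}: alpha={alpha:.4f}  beta={beta:.4f}  power={power:.4f}")
```

Raising the threshold $k$ shrinks the rejection region, so $\alpha$ and the power fall together while $\beta$ rises.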
74+
---
75+
76+
## 4. The Likelihood Ratio Test (Simple vs. Simple)
77+
78+
When both $H\_0$ and $H\_1$ are simple hypotheses (e.g., $H\_0: \theta = \theta\_0$ and $H\_1: \theta = \theta\_1$), the **Neyman-Pearson Lemma** identifies the most powerful test at a given significance level $\alpha$. This optimal test is the **Likelihood Ratio (LR) Test**.
79+
80+
### 4.1 The Decision Rule
81+
82+
The Likelihood Ratio is defined as the ratio of the likelihood of the data under the alternative hypothesis to the likelihood of the data under the null hypothesis:
83+
$$
84+
\text{LR}(X) = \frac{f\_{\theta\_1}(X\_1, \dots, X\_n)}{f\_{\theta\_0}(X\_1, \dots, X\_n)}
85+
$$
86+
The formal decision rule for the Likelihood Ratio Test states that we should reject $H\_0$ if the likelihood ratio strictly exceeds a specific critical threshold $c$:
87+
$$
88+
\text{Reject } H\_0 \iff \text{LR}(X) > c
89+
$$
90+
The critical value $c$ is chosen so that the probability of a Type I error equals the desired significance level $\alpha$, that is:
91+
$$
92+
\prob(\text{LR}(X) > c \mid \theta = \theta\_0) = \alpha
93+
$$
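As a small illustration of the decision rule, the sketch below evaluates the likelihood ratio for the coin-tossing example of Section 2.2 ($H\_0: p = 0.5$ vs. $H\_1: p = 0.7$, four tosses). The threshold `c = 2.0` is an arbitrary value chosen for illustration, not one derived in the notes.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of exactly x heads in n independent tosses with bias p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def likelihood_ratio(x, n=4, p0=0.5, p1=0.7):
    """LR(x): likelihood under the alternative divided by likelihood under the null."""
    return binom_pmf(x, n, p1) / binom_pmf(x, n, p0)


def lr_test(x, c):
    """Reject H0 (output 'H1') exactly when LR(x) > c."""
    return "H1" if likelihood_ratio(x) > c else "H0"


c = 2.0  # illustrative threshold (assumption)
for x in range(5):
    print(f"x={x}: LR={likelihood_ratio(x):.4f} -> {lr_test(x, c)}")
```

Here the ratio increases with the number of heads, so thresholding $\text{LR}(X)$ is the same as thresholding $X$. Because $X$ is discrete, only a few values of $\alpha$ are attainable this way; hitting an arbitrary $\alpha$ exactly would generally require a randomized rule.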
94+
95+
### 4.2 Example: Normal Mean Testing
96+
97+
Suppose $X\_1, \dots, X\_n \sim \mathcal{N}(\mu, \sigma^2)$. We want to find the exact LR test for $H\_0: \mu = \mu\_0$ versus $H\_1: \mu = \mu\_1$, assuming $\mu\_1 > \mu\_0$.
98+
99+
**Step 1: Construct the Likelihood Ratio**
100+
$$
101+
\text{LR}(X) = \frac{\exp\left(-\frac{1}{2\sigma^2} \sum\_{i=1}^n (X\_i - \mu\_1)^2\right)}{\exp\left(-\frac{1}{2\sigma^2} \sum\_{i=1}^n (X\_i - \mu\_0)^2\right)}
102+
$$
103+
By expanding the squares inside the exponential and simplifying, we get:
104+
$$
105+
\text{LR}(X) = \exp\left( \frac{n(\mu\_1 - \mu\_0)}{\sigma^2} \overline{X}\_n - \frac{n(\mu\_1^2 - \mu\_0^2)}{2\sigma^2} \right)
106+
$$
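The simplification can be sanity-checked numerically. The sketch below compares the ratio of the two exponentials with the closed form on simulated data; the parameter values and the seed are arbitrary choices for illustration.

```python
import math
import random


def lr_direct(xs, mu0, mu1, sigma):
    """Ratio of the two Gaussian likelihoods (normalizing constants cancel)."""
    s1 = sum((x - mu1) ** 2 for x in xs)
    s0 = sum((x - mu0) ** 2 for x in xs)
    return math.exp(-(s1 - s0) / (2 * sigma**2))


def lr_closed_form(xs, mu0, mu1, sigma):
    """Simplified expression that depends on the data only through the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(
        n * (mu1 - mu0) / sigma**2 * xbar - n * (mu1**2 - mu0**2) / (2 * sigma**2)
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(20)]
direct = lr_direct(xs, mu0=0.0, mu1=0.5, sigma=1.0)
closed = lr_closed_form(xs, mu0=0.0, mu1=0.5, sigma=1.0)
print(direct, closed)
```

The two values should agree up to floating-point rounding, which confirms that the data enter the ratio only through $\overline{X}\_n$.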
107+
108+
**Step 2: Simplify the Rejection Region**
109+
We reject $H\_0$ when $\text{LR}(X) > c$. Taking the natural logarithm of both sides:
110+
$$
111+
\begin{align*}
112+
\frac{n(\mu\_1 - \mu\_0)}{\sigma^2} \overline{X}\_n - \frac{n(\mu\_1^2 - \mu\_0^2)}{2\sigma^2} &> \ln c \\\\
113+
\overline{X}\_n &> \frac{\sigma^2}{n(\mu\_1 - \mu\_0)} \ln c + \frac{\mu\_1 + \mu\_0}{2} = \tau
114+
\end{align*}
115+
$$
116+
Because $\mu\_1 > \mu\_0$, the inequality direction is strictly preserved. The test mathematically reduces to: **Reject $H\_0$ if $\overline{X}\_n > \tau$.**
117+
118+
**Step 3: Determine the Critical Threshold**
119+
We want $\prob(\overline{X}\_n > \tau \mid \mu = \mu\_0) = \alpha$.
120+
Under $H\_0$, the sample mean follows $\overline{X}\_n \sim \mathcal{N}(\mu\_0, \sigma^2/n)$.
121+
Standardizing this variable gives:
122+
$$
123+
\prob\left( \frac{\overline{X}\_n - \mu\_0}{\sigma/\sqrt{n}} > \frac{\tau - \mu\_0}{\sigma/\sqrt{n}} \right) = \alpha
124+
$$
125+
Because the standardized variable is a standard Normal $Z$, we set $\frac{\tau - \mu\_0}{\sigma/\sqrt{n}} = z\_\alpha$, where $z\_\alpha$ is the upper $\alpha$-quantile of the standard normal distribution. This yields the final threshold:
126+
$$
127+
\tau = \mu\_0 + z\_\alpha \frac{\sigma}{\sqrt{n}}
128+
$$
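A minimal sketch of this threshold in code, using only the Python standard library. The values $\mu\_0 = 0$, $\sigma = 1$, $n = 25$, $\alpha = 0.05$ and the Monte Carlo settings are assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean


def critical_threshold(mu0, sigma, n, alpha):
    """tau = mu0 + z_alpha * sigma / sqrt(n), with z_alpha the upper alpha-quantile."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / n**0.5


mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
tau = critical_threshold(mu0, sigma, n, alpha)

# Monte Carlo check: simulate under H0 and count how often the sample mean exceeds tau.
rng = random.Random(1)
trials = 20000
rejections = sum(
    fmean([rng.gauss(mu0, sigma) for _ in range(n)]) > tau for _ in range(trials)
)
type_one_rate = rejections / trials
print(f"tau = {tau:.4f}, simulated Type I error = {type_one_rate:.4f}")
```

The simulated rejection frequency under $H\_0$ should land close to $\alpha$, up to Monte Carlo noise.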
129+
130+
---
131+
132+
## References
133+
134+
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
135+
2. Han, Y. (2026). Lecture 13: Simple Hypothesis Testing.
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
1+
---
2+
title: p-Values and Confidence Sets
3+
date: 2026-03-09
4+
---
5+
6+
## 1. Moving Beyond Simple Hypothesis Testing
7+
8+
In the previous lecture, we established that the Likelihood Ratio (LR) test is the most powerful level-$\alpha$ test for distinguishing between two simple hypotheses. However, this setting is too restrictive for many practical applications.
9+
10+
1. **Composite Hypotheses:** The LR test cannot be directly applied when the alternative hypothesis $H\_1$ is composite (e.g., $H\_1: \theta > \theta\_0$).
11+
2. **Difficult Power Calculations:** Controlling the Type II error probability ($\beta$) is mathematically difficult when dealing with composite alternative spaces, because $\beta$ must be calculated for every single parameter configuration within $H\_1$:
12+
$$
13+
\beta = \sup\_{\theta \in \Theta\_1} \prob(\text{Accept } H\_0 \mid \theta)
14+
$$
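To see why the Type II error depends on where the truth sits inside $H\_1$, the sketch below evaluates the power of the one-sided normal-mean test from the previous lecture (reject when $\overline{X}\_n > \mu\_0 + z\_\alpha \sigma / \sqrt{n}$) at several alternatives $\mu > \mu\_0$. The default values of $\mu\_0$, $\sigma$, $n$ and $\alpha$ are assumptions for illustration.

```python
from statistics import NormalDist


def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(reject H0 | true mean = mu) for the one-sided z-test with known sigma."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu - mu0) * n**0.5 / sigma  # standardized distance from the null
    return 1 - std_normal.cdf(z_alpha - shift)


for mu in (0.01, 0.1, 0.3, 0.5, 1.0):
    pw = power(mu)
    print(f"mu={mu:4.2f}  power={pw:.4f}  beta={1 - pw:.4f}")
```

As $\mu$ approaches $\mu\_0$ the power falls toward $\alpha$, so the worst-case $\beta$ over this alternative approaches $1 - \alpha$. This is one reason such tests are calibrated through $\alpha$ alone.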
15+
16+
Therefore, in practice, most statistical tests are designed to control the Type I error (the significance level $\alpha$) without explicitly optimizing $\beta$. We accomplish this by relying on test statistics with known null distributions.
17+
18+
> In addition, under a practical scenario, we fix $\alpha$ because false positives are considered worse than false negatives.
19+
20+
---
21+
22+
## 2. Test Statistics and Critical Regions
23+
24+
The standard strategy for composite hypothesis testing is to construct a specific measurable function of the data and the parameter, $h(X, \theta)$, such that under the null hypothesis $H\_0$, the sampling distribution of $h$ is completely known and free of unknown parameters.
25+
26+
For a test of $H\_0: \theta = \theta\_0$, we calculate the test statistic evaluated at the null parameter: $T(X) = h(X, \theta\_0)$.
27+
28+
We then establish a **rejection region** based strictly on critical values. For a one-sided test, we reject $H\_0$ if $T(X) > c\_\alpha$. The threshold $c\_\alpha$ is systematically chosen such that:
29+
$$
30+
\prob(T(X) > c\_\alpha \mid H\_0) = \alpha
31+
$$
32+
If the observed test statistic falls inside this rejection region, that is, $T(X) > c\_\alpha$, we conclude that the data are unlikely under the null hypothesis and we reject $H\_0$; otherwise we do not reject $H\_0$.
33+
34+
---
35+
36+
## 3. The $p$-Value
37+
38+
When simply reporting "Reject" or "Accept", a significant amount of statistical context is lost. A test statistic that barely crosses the threshold is treated identically to one that massively exceeds it. The **$p$-value** addresses this limitation by reporting the continuous strength of the evidence against the null hypothesis.
39+
40+
**Definition (P-Value):**
41+
The $p$-value is the probability, calculated precisely under the assumption that the null hypothesis $H\_0$ is true, of observing a test statistic at least as extreme as the one that was actually observed in the sample data.
42+
43+
If $T\_\text{obs}$ is the realized, observed value of our test statistic $T(X)$, the $p$-value for a right-sided test is:
44+
$$
45+
\text{$p$-value} = \prob(T(X) \ge T\_\text{obs} \mid H\_0)
46+
$$
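As a concrete instance, the sketch below computes a right-sided $p$-value when the test statistic is standard normal under $H\_0$ (a z-test with known $\sigma$). That choice of statistic, and the numbers used, are assumptions for illustration.

```python
from statistics import NormalDist


def z_statistic(xbar, mu0, sigma, n):
    """Standardized sample mean; standard normal under H0 when sigma is known."""
    return (xbar - mu0) / (sigma / n**0.5)


def right_sided_p_value(t_obs):
    """P(T >= t_obs | H0) for a statistic that is N(0, 1) under H0."""
    return 1 - NormalDist().cdf(t_obs)


t_obs = z_statistic(xbar=0.4, mu0=0.0, sigma=1.0, n=25)
p_value = right_sided_p_value(t_obs)
print(f"T_obs = {t_obs:.2f}, p-value = {p_value:.4f}")
```

With these inputs the statistic equals 2, and the $p$-value is the standard normal tail area beyond 2.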
47+
48+
### 3.1 Properties of the $p$-Value
49+
50+
1. **Decision Rule:** A $p$-value perfectly acts as an alternative decision mechanism. We reject $H\_0$ if and only if the calculated $\text{$p$-value} \le \alpha$.
51+
2. **Uniform Distribution Under the Null:** If the null hypothesis is true (and simple, so the null distribution is fully specified) and the test statistic is continuous, the $p$-value is itself a random variable that is uniformly distributed on the interval $[0, 1]$.
52+
$$
53+
\text{$p$-value} \sim \text{Unif}[0, 1] \quad \text{under } H\_0
54+
$$
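The uniformity property can be checked by simulation. The sketch below draws a standard normal test statistic under $H\_0$ many times, converts each draw into a right-sided $p$-value, and compares the empirical distribution with the uniform one; the statistic, seed, and number of replications are assumptions for illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(2)  # fixed seed for reproducibility
std_normal = NormalDist()

# Each draw plays the role of T(X) under H0; its right-sided p-value is 1 - Phi(T).
p_values = [1 - std_normal.cdf(rng.gauss(0.0, 1.0)) for _ in range(20000)]


def fraction_below(u):
    """Empirical P(p-value <= u); close to u if the p-values are Unif[0, 1]."""
    return sum(p <= u for p in p_values) / len(p_values)


for u in (0.01, 0.05, 0.5, 0.9):
    print(f"P(p <= {u}) is approximately {fraction_below(u):.4f}")
```

In particular, the fraction of $p$-values at or below $\alpha$ is about $\alpha$, which is why rejecting when the $p$-value is at most $\alpha$ gives a level-$\alpha$ test.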
55+
56+
---
57+
58+
## 4. Confidence Sets and Duality
59+
60+
Hypothesis testing aims to determine if a specific, isolated parameter value $\theta\_0$ is plausible. A **Confidence Set** essentially extends this logic by finding *all* possible parameter values that are plausible given the observed data.
61+
62+
**Definition (Confidence Set):**
63+
A $(1 - \alpha)$-confidence set $CI(X)$ is a data-dependent set (an interval, in the one-dimensional examples below) constructed so that, before the data are observed, it contains the true parameter $\theta\_0$ with probability at least $1 - \alpha$:
64+
$$
65+
\prob(\theta\_0 \in CI(X) \mid \theta = \theta\_0) \ge 1 - \alpha \quad \text{for every } \theta\_0
66+
$$
67+
68+
### 4.1 The Duality Principle
69+
70+
There is a profound mathematical duality between hypothesis testing and confidence intervals. A confidence interval simply consists of all the null hypothesis values that would *not* be rejected by a level-$\alpha$ hypothesis test.
71+
72+
Let $A(\theta\_0)$ be the acceptance region of a level-$\alpha$ test for $H\_0: \theta = \theta\_0$.
73+
$$
74+
\prob(X \in A(\theta\_0) \mid \theta = \theta\_0) \ge 1 - \alpha
75+
$$
76+
The corresponding confidence set is obtained by inverting the test: for the observed data $X$, collect every null value $\theta\_0$ whose acceptance region contains $X$:
77+
$$
78+
CI(X) = \{ \theta\_0 : X \in A(\theta\_0) \}
79+
$$
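The duality can be made concrete by brute force: run the test at every candidate $\theta\_0$ on a grid and keep those that are not rejected. The sketch below does this for a two-sided z-test with known $\sigma$, a simpler setting than the examples in these notes; the data, $\sigma$, and grid are made up for illustration.

```python
from statistics import NormalDist, fmean


def accepts(xs, mu0, sigma, alpha):
    """True when the two-sided level-alpha z-test does not reject H0: mu = mu0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    t = (fmean(xs) - mu0) / (sigma / len(xs) ** 0.5)
    return abs(t) <= z


def confidence_set_by_inversion(xs, sigma, alpha, grid):
    """CI(X) = {mu0 in grid : X lies in the acceptance region A(mu0)}."""
    return [mu0 for mu0 in grid if accepts(xs, mu0, sigma, alpha)]


xs = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.5, 5.1]  # illustrative data
sigma, alpha = 0.3, 0.05                                  # assumed known sigma
grid = [4.0 + 0.001 * i for i in range(2001)]             # candidates in [4, 6]

kept = confidence_set_by_inversion(xs, sigma, alpha, grid)
grid_lo, grid_hi = kept[0], kept[-1]

# Closed-form interval for comparison: xbar +/- z * sigma / sqrt(n).
half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / len(xs) ** 0.5
exact_lo, exact_hi = fmean(xs) - half_width, fmean(xs) + half_width
print((grid_lo, grid_hi), (exact_lo, exact_hi))
```

Up to the grid resolution, the set of accepted null values matches the familiar closed-form interval.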
80+
81+
### 4.2 Example 1: Normal Mean with Unknown Variance
82+
83+
Suppose $X\_1, \dots, X\_n \sim \mathcal{N}(\mu, \sigma^2)$, with both parameters fully unknown. We wish to test $H\_0: \mu = \mu\_0$ versus $H\_1: \mu \neq \mu\_0$.
84+
85+
Because the true variance $\sigma^2$ is unknown, we use the sample variance $S\_n^2$. Our standard test statistic leverages the Student's t-distribution:
86+
$$
87+
T(X) = \frac{\overline{X}\_n - \mu\_0}{S\_n / \sqrt{n}} \sim t\_{n-1} \quad \text{under } H\_0
88+
$$
89+
90+
The symmetric acceptance region for a level-$\alpha$ test, where $t\_{n-1, \alpha/2}$ denotes the upper $\alpha/2$-quantile of the $t\_{n-1}$ distribution, is:
91+
$$
92+
A(\mu\_0) = \{ X : -t\_{n-1, \alpha/2} \le \frac{\overline{X}\_n - \mu\_0}{S\_n / \sqrt{n}} \le t\_{n-1, \alpha/2} \}
93+
$$
94+
95+
To find the corresponding confidence interval, we mathematically pivot the inequality inside the acceptance region to isolate $\mu\_0$ in the center:
96+
$$
97+
\begin{align*}
98+
-t\_{n-1, \alpha/2} &\le \frac{\overline{X}\_n - \mu\_0}{S\_n / \sqrt{n}} \le t\_{n-1, \alpha/2} \\\\
99+
-t\_{n-1, \alpha/2} \frac{S\_n}{\sqrt{n}} &\le \overline{X}\_n - \mu\_0 \le t\_{n-1, \alpha/2} \frac{S\_n}{\sqrt{n}} \\\\
100+
-\overline{X}\_n - t\_{n-1, \alpha/2} \frac{S\_n}{\sqrt{n}} &\le -\mu\_0 \le -\overline{X}\_n + t\_{n-1, \alpha/2} \frac{S\_n}{\sqrt{n}} \\\\
101+
\overline{X}\_n - t\_{n-1, \alpha/2} \frac{S\_n}{\sqrt{n}} &\le \mu\_0 \le \overline{X}\_n + t\_{n-1, \alpha/2} \frac{S\_n}{\sqrt{n}}
102+
\end{align*}
103+
$$
104+
Thus, the exact $(1-\alpha)$ confidence interval for $\mu$ is directly derived from the hypothesis test's acceptance criteria.
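A minimal sketch of this interval in code. The standard library has no $t$ quantile function, so the critical value for $n - 1 = 9$ degrees of freedom and $\alpha = 0.05$ is hard-coded from a standard $t$ table; the data are made up for illustration, and $S\_n$ is taken to be the sample standard deviation with the $n - 1$ denominator.

```python
from statistics import fmean, stdev

# Upper 0.025 critical value of the t distribution with 9 degrees of freedom,
# copied from a standard t table (assumption: alpha = 0.05, n = 10).
T_CRIT_9DF = 2.262


def t_interval(xs, t_crit):
    """xbar +/- t_crit * S_n / sqrt(n), with S_n using the n - 1 denominator."""
    n = len(xs)
    xbar = fmean(xs)
    half_width = t_crit * stdev(xs) / n**0.5
    return xbar - half_width, xbar + half_width


def t_test_accepts(xs, mu0, t_crit):
    """True when |T(X)| <= t_crit, i.e. X lies in the acceptance region A(mu0)."""
    t = (fmean(xs) - mu0) / (stdev(xs) / len(xs) ** 0.5)
    return abs(t) <= t_crit


xs = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.5, 5.1]  # illustrative data
lo, hi = t_interval(xs, T_CRIT_9DF)
print(f"95% confidence interval for mu: ({lo:.3f}, {hi:.3f})")
```

A candidate $\mu\_0$ lies inside the printed interval exactly when `t_test_accepts` returns `True` for it, which is the duality stated in Section 4.1.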
105+
106+
### 4.3 Example 2: Normal Variance Testing
107+
108+
Suppose we instead wish to test the variance of our normal sample, setting $H\_0: \sigma = \sigma\_0$ versus a two-sided alternative $H\_1: \sigma \neq \sigma\_0$.
109+
110+
A natural test statistic is derived using the sample variance $S\_n^2$:
111+
$$
112+
T(X) = \frac{(n-1)S\_n^2}{\sigma\_0^2} = \frac{\sum\_{i=1}^n (X\_i - \overline{X}\_n)^2}{\sigma\_0^2} \sim \chi\_{n-1}^2 \quad \text{under } H\_0
113+
$$
114+
115+
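The acceptance region for a level-$\alpha$ test uses the critical values of the chi-square distribution, where $c\_\gamma$ denotes the upper $\gamma$-quantile of $\chi^2\_{n-1}$ (so $c\_{1-\alpha/2} < c\_{\alpha/2}$):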
116+
$$
117+
A(\sigma\_0) = \{ X : c\_{1 - \alpha/2} \le \frac{\sum\_{i=1}^n (X\_i - \overline{X}\_n)^2}{\sigma\_0^2} \le c\_{\alpha/2} \}
118+
$$
119+
120+
By pivoting this acceptance region, we extract a confidence interval for the unknown variance $\sigma^2$:
121+
$$
122+
\begin{align*}
123+
c\_{1 - \alpha/2} &\le \frac{(n-1)S\_n^2}{\sigma\_0^2} \le c\_{\alpha/2} \\\\
124+
\frac{1}{c\_{\alpha/2}} &\le \frac{\sigma\_0^2}{(n-1)S\_n^2} \le \frac{1}{c\_{1 - \alpha/2}} \\\\
125+
\frac{(n-1)S\_n^2}{c\_{\alpha/2}} &\le \sigma\_0^2 \le \frac{(n-1)S\_n^2}{c\_{1 - \alpha/2}}
126+
\end{align*}
127+
$$
128+
This precisely constructs the $(1-\alpha)$ confidence interval for $\sigma^2$.
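The same construction can be sketched in code. As the standard library has no chi-square quantile function, the two critical values for $n - 1 = 9$ degrees of freedom and $\alpha = 0.05$ are hard-coded from a standard chi-square table; the data are made up for illustration.

```python
from statistics import fmean

# Chi-square critical values with 9 degrees of freedom, copied from a standard
# table (assumption: alpha = 0.05, n = 10).
C_UPPER = 19.023  # c_{alpha/2}: upper 0.025 quantile
C_LOWER = 2.700   # c_{1 - alpha/2}: upper 0.975 quantile


def variance_interval(xs, c_upper, c_lower):
    """Interval (n-1)S_n^2 / c_{alpha/2} <= sigma^2 <= (n-1)S_n^2 / c_{1-alpha/2}."""
    xbar = fmean(xs)
    sum_sq = sum((x - xbar) ** 2 for x in xs)  # equals (n - 1) * S_n^2
    return sum_sq / c_upper, sum_sq / c_lower


xs = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.5, 5.1]  # illustrative data
lo, hi = variance_interval(xs, C_UPPER, C_LOWER)
print(f"95% confidence interval for sigma^2: ({lo:.4f}, {hi:.4f})")
```

Unlike the interval for the mean, this interval is not symmetric around $S\_n^2$, because the chi-square distribution is skewed.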
129+
130+
---
131+
132+
## References
133+
134+
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
135+
2. Han, Y. (2026). Lecture 14: P-value, confidence set.
