
Commit 06cfdfa

Update 334 notes.
1 parent 0f52015 commit 06cfdfa

6 files changed

Lines changed: 224 additions & 128 deletions


notes/courses/MATH-UA-334/01-sample-probability.md

Lines changed: 28 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ title: Sample space and probability
33
date: 2026-01-21
44
---
55

6-
## Roadmap
6+
## 0. Roadmap
77

88
We model a “random experiment” with a triple $(\Omega, \mathcal F, \mathbb P)$:
99

@@ -23,9 +23,9 @@ The rest of the course basically builds on these ideas: compute probabilities by
2323

2424
**Examples:**
2525

26-
* **Coin Toss:** Toss a coin three times. The sample space is:
26+
- **Coin Toss:** Toss a coin three times. The sample space is:
2727
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$
28-
* **Card Draw:** Draw one card from a standard 52-card deck. The sample space contains 52 distinct elements.
28+
- **Card Draw:** Draw one card from a standard 52-card deck. The sample space contains 52 distinct elements.
2929

3030
### 1.2 Events
3131

@@ -47,13 +47,15 @@ For events $A, B \subseteq \Omega$, we define the following set operations:
4747
**Example (Coin tossed 3 times):**
4848

4949
Let $\Omega$ be the set of 3 coin flips. Define the following events:
50-
- $A =$ “at most one tail” $= \{HHH, HHT, HTH, THH\}$
51-
- $B =$ “first flip is tails” $= \{THH, THT, TTH, TTT\}$
50+
51+
- $A =$ “at most one tail” $= \{HHH, HHT, HTH, THH \}$
52+
- $B =$ “first flip is tails” $= \{THH, THT, TTH, TTT \}$
5253

5354
**Operations:**
54-
1. **Union ($A \cup B$):** "At most one tail OR first flip is tails".
55+
56+
1. **Union ($A \cup B$):** "At most one tail OR first flip is tails".
5557
$$A \cup B = \{HHH, HHT, HTH, THH, THT, TTH, TTT\} = \Omega \setminus \{HTT\}$$
56-
2. **Intersection ($A \cap B$):** "At most one tail AND first flip is tails".
58+
2. **Intersection ($A \cap B$):** "At most one tail AND first flip is tails".
5759
$$A \cap B = \{THH\}$$
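The union and intersection above can be checked mechanically. The following is a minimal Python sketch, assuming each outcome is encoded as a three-character string over `H` and `T`; the variable names are illustrative and do not come from the notes.

```python
from itertools import product

# Sample space: all sequences of three flips, e.g. "HTH".
omega = {"".join(flips) for flips in product("HT", repeat=3)}

# A = "at most one tail", B = "first flip is tails".
A = {w for w in omega if w.count("T") <= 1}
B = {w for w in omega if w[0] == "T"}

union = A | B          # A ∪ B
intersection = A & B   # A ∩ B

print(sorted(union))         # every outcome except HTT
print(sorted(intersection))  # ['THH']
```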
5860

5961
---
@@ -66,23 +68,29 @@ A probability measure $\mathbb P$ is a function that assigns a real number to ea
6668

6769
A function $\mathbb P: \mathcal{F} \to [0, 1]$ is a probability measure if it satisfies the following three axioms:
6870

69-
1. **Normalization:** $\mathbb P(\Omega) = 1$.
70-
2. **Non-negativity:** $\mathbb P(A) \ge 0$ for all $A \in \mathcal F$.
71-
3. **Countable Additivity (Disjoint Unions):** If $A_1, A_2, \dots$ are disjoint events (i.e., $A_i \cap A_j = \emptyset$ for $i \neq j$), then:
71+
1. **Normalization:** $\mathbb P(\Omega) = 1$.
72+
2. **Non-negativity:** $\mathbb P(A) \ge 0$ for all $A \in \mathcal F$.
73+
3. **Countable Additivity (Disjoint Unions):** If $A_1, A_2, \dots$ are disjoint events (i.e., $A_i \cap A_j = \emptyset$ for $i \neq j$), then:
7274
$$\mathbb P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb P(A_i)$$
7375
*Special case:* For two disjoint events $A$ and $B$, $\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B)$.
7476

7577
### 2.2 Properties Derived from Axioms
7678

7779
Using the axioms, we can derive several useful properties:
7880

79-
1. **Complement Rule:** $\mathbb P(A^c) = 1 - \mathbb P(A)$.
81+
1. **Complement Rule:** $\mathbb P(A^c) = 1 - \mathbb P(A)$.
82+
8083
*Proof:* Since $A \cup A^c = \Omega$ and $A \cap A^c = \emptyset$, by Axioms 3 and 1: $\mathbb P(A) + \mathbb P(A^c) = \mathbb P(\Omega) = 1$.
81-
2. **Empty Set:** $\mathbb P(\emptyset) = 0$.
84+
85+
2. **Empty Set:** $\mathbb P(\emptyset) = 0$.
86+
8287
*Proof:* $\emptyset = \Omega^c$, so $\mathbb P(\emptyset) = 1 - \mathbb P(\Omega) = 0$.
83-
3. **Monotonicity:** If $A \subseteq B$, then $\mathbb P(A) \le \mathbb P(B)$.
88+
89+
3. **Monotonicity:** If $A \subseteq B$, then $\mathbb P(A) \le \mathbb P(B)$.
90+
8491
*Proof:* Write $B = A \cup (B \cap A^c)$. Since these are disjoint, $\mathbb P(B) = \mathbb P(A) + \mathbb P(B \cap A^c) \ge \mathbb P(A)$ (by Axiom 2).
85-
4. **Inclusion-Exclusion Principle:** For any two events $A$ and $B$:
92+
93+
4. **Inclusion-Exclusion Principle:** For any two events $A$ and $B$:
8694
$$\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B) - \mathbb P(A \cap B)$$
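The four derived properties can be sanity-checked on the three-flip sample space. This sketch assumes the equally-likely measure $\mathbb P(A) = |A|/|\Omega|$ and reuses the events $A$ and $B$ from the coin example; `prob` is a hypothetical helper, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

omega = frozenset("".join(f) for f in product("HT", repeat=3))

def prob(event):
    """Uniform probability of an event (a subset of omega)."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w.count("T") <= 1}   # at most one tail
B = {w for w in omega if w[0] == "T"}         # first flip is tails

complement_ok = prob(omega - A) == 1 - prob(A)
empty_ok = prob(set()) == 0
monotone_ok = prob(A & B) <= prob(A)          # A ∩ B is a subset of A
incl_excl_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(complement_ok, empty_ok, monotone_ok, incl_excl_ok)
```

A check on one example is not a proof, but it is a quick way to catch a mis-stated identity.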
8795

8896
---
@@ -100,14 +108,15 @@ $$\mathbb P(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of outcomes in } A}{\
100108

101109
To determine $|A|$ and $|\Omega|$, we often use combinatorial methods:
102110

103-
* **Multiplication Rule:** If an experiment has $k$ steps with $n_1, n_2, \dots, n_k$ choices respectively, the total number of outcomes is $n_1 \times n_2 \times \dots \times n_k$.
104-
* **Permutations:** Ordering $k$ distinct items from a set of $n$.
111+
- **Multiplication Rule:** If an experiment has $k$ steps with $n_1, n_2, \dots, n_k$ choices respectively, the total number of outcomes is $n_1 \times n_2 \times \dots \times n_k$.
112+
- **Permutations:** Ordering $k$ distinct items from a set of $n$.
105113
$$P(n, k) = \frac{n!}{(n-k)!}$$
106-
* **Combinations:** Choosing $k$ items from a set of $n$ without regard to order.
114+
- **Combinations:** Choosing $k$ items from a set of $n$ without regard to order.
107115
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
108116

109117
**Example:**
110118
What is the probability of drawing a King from a standard deck?
119+
111120
- $|\Omega| = 52$
112121
- $|A| = 4$ (King of Hearts, Diamonds, Clubs, Spades)
113122
- $\mathbb P(A) = \frac{4}{52} = \frac{1}{13}$
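The King example and the counting formulas can be reproduced by brute-force enumeration. This is a small sketch; the rank and suit labels, and the choice $n = 5$, $k = 2$, are illustrative assumptions. `math.perm` and `math.comb` require Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, perm

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]

# Multiplication rule: 13 ranks times 4 suits gives 52 cards.
deck = list(product(ranks, suits))
kings = [card for card in deck if card[0] == "K"]
p_king = Fraction(len(kings), len(deck))    # |A| / |Omega|

# Counting formulas checked against the standard library for n = 5, k = 2.
n, k = 5, 2
p_nk = factorial(n) // factorial(n - k)                    # ordered selections
c_nk = factorial(n) // (factorial(k) * factorial(n - k))   # unordered selections

print(p_king)  # 1/13
```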
@@ -116,5 +125,5 @@ What is the probability of drawing a King from a standard deck?
116125

117126
## References
118127

119-
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
120-
2. Han, Y. (2026). Lecture 1: Sample space & probability.
128+
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
129+
2. Han, Y. (2026). Lecture 1: Sample space & probability.

notes/courses/MATH-UA-334/02-random-variables.md

Lines changed: 5 additions & 3 deletions
@@ -54,6 +54,7 @@ The PMF $p_X(x)$ gives the probability that $X$ takes the specific value $x$.
5454
$$p_X(x_i) = \mathbb{P}(X = x_i)$$
5555

5656
**Properties:**
57+
5758
* $p_X(x_i) \ge 0$
5859
* $\sum_{i} p_X(x_i) = 1$
5960
* Relation to CDF: $F_X(x) = \sum_{x_i \le x} p_X(x_i)$ (a step function).
@@ -77,9 +78,10 @@ A random variable $X$ is **continuous** if there exists a non-negative function
7778
$$\mathbb{P}(X \in B) = \int_B f_X(x) \, dx$$
7879

7980
**Properties:**
81+
8082
* $f_X(x) \ge 0$.
81-
* $\int_{-\infty}^{\infty} f_X(x) \, dx = 1$.
82-
* Relation to CDF: $F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$.
83+
* $\int_{-\infty}^{\infty} f_X(x) dx = 1$.
84+
* Relation to CDF: $F_X(x) = \int_{-\infty}^{x} f_X(t) dt$.
8385
* Fundamental Theorem of Calculus: $f_X(x) = F'_X(x)$ (where derivative exists).
8486
* **Important:** For continuous RVs, $\mathbb{P}(X = c) = 0$ for any specific point $c$.
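These properties can be illustrated numerically. The sketch below assumes an Exponential(1) random variable, which is not taken from the notes, and uses a hand-written midpoint rule (`integrate` is a hypothetical helper) rather than any particular library.

```python
import math

def pdf(x, rate=1.0):
    """Density of an Exponential(rate) random variable (illustrative choice)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def cdf(x, rate=1.0):
    """Closed-form CDF of the same distribution."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

total_mass = integrate(pdf, 0.0, 50.0)   # tail beyond 50 is negligible
cdf_at_2 = integrate(pdf, 0.0, 2.0)      # should agree with cdf(2.0)
point_mass = integrate(pdf, 2.0, 2.0)    # P(X = 2) as a zero-width integral

# Central difference of the CDF approximates the density at x = 2.
eps = 1e-5
slope_at_2 = (cdf(2.0 + eps) - cdf(2.0 - eps)) / (2 * eps)
```

The zero-width integral is exactly zero, which mirrors the statement that a continuous random variable puts no mass on any single point.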
8587

@@ -126,4 +128,4 @@ This matches the density of $\mathcal{N}(\mu, \sigma^2)$.
126128
## References
127129

128130
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
129-
2. Han, Y. (2026). Lecture 2: Random Variables.
131+
2. Han, Y. (2026). Lecture 2: Random Variables.

notes/courses/MATH-UA-334/03-joint-distributions.md

Lines changed: 16 additions & 12 deletions
@@ -18,8 +18,9 @@ Let $X$ and $Y$ be discrete random variables. The behavior of the pair $(X, Y)$
1818
$$p_{X,Y}(x, y) = \mathbb{P}(X = x, Y = y)$$
1919

2020
**Properties:**
21-
1. **Non-negativity:** $p_{X,Y}(x, y) \ge 0$.
22-
2. **Normalization:** Summing over all possible pairs equals 1.
21+
22+
1. **Non-negativity:** $p_{X,Y}(x, y) \ge 0$.
23+
2. **Normalization:** Summing over all possible pairs equals 1.
2324
$$\sum_{x} \sum_{y} p_{X,Y}(x, y) = 1$$
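A joint PMF is easy to represent as a dictionary keyed by pairs. The following sketch uses a hypothetical example that does not appear in the notes: two fair coin flips, with $X$ the first flip (1 for heads) and $Y$ the total number of heads.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

joint = defaultdict(Fraction)   # missing keys default to Fraction(0)
for first, second in product((0, 1), repeat=2):   # 1 = heads, 0 = tails
    x, y = first, first + second
    joint[(x, y)] += Fraction(1, 4)               # each flip pair has mass 1/4

non_negative = all(p >= 0 for p in joint.values())
total = sum(joint.values())

print(dict(joint))
print(non_negative, total)
```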
2425

2526
### 2.2 Marginal Distribution
@@ -76,17 +77,20 @@ $$f_{U,V}(u, v) = f_{X,Y}(x, y) \cdot |J|^{-1}$$
7677
where $x, y$ are expressed in terms of $u, v$, and $J$ is the Jacobian determinant of the forward transformation $(x,y) \to (u,v)$ (so $|J|^{-1}$ is the absolute Jacobian determinant of the inverse transformation $(u,v) \to (x,y)$).
7778

7879
Specifically, if we compute the Jacobian of the transformation **from $(x, y)$ to $(u, v)$**:
79-
$$J = \det \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}$$
80+
$$
81+
J = \det \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\\\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}
82+
$$
8083
Then:
8184
$$f_{U,V}(u, v) = f_{X,Y}(x(u,v), y(u,v)) \cdot \frac{1}{|J(x,y)|}$$
8285

8386
### Example 1: Sum of Random Variables
8487

8588
Let $U = X + Y$. To use the method, we introduce a dummy variable $V = Y$.
89+
8690
* Transformation: $u = x+y, v = y$.
8791
* Inverse: $x = u-v, y = v$.
8892
* Jacobian: the forward map $(x, y) \to (u, v)$ has partial derivatives $\frac{\partial u}{\partial x} = 1$, $\frac{\partial u}{\partial y} = 1$, $\frac{\partial v}{\partial x} = 0$, $\frac{\partial v}{\partial y} = 1$, which gives:
89-
$$\frac{\partial(u,v)}{\partial(x,y)} = \det \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = 1$$
93+
$$\frac{\partial(u,v)}{\partial(x,y)} = \det \begin{bmatrix} 1 & 1 \\\\ 0 & 1 \end{bmatrix} = 1$$
9094
* Density:
9195
$$f_{U,V}(u, v) = f_{X,Y}(u-v, v) \cdot 1$$
9296
* Marginal of U (Convolution Formula):
@@ -96,18 +100,18 @@ Let $U = X + Y$. To use the method, we introduce a dummy variable $V = Y$.
96100
### Example 2: Polar Coordinates
97101

98102
Let $(X, Y)$ be independent standard normals, i.e., $f_{X,Y}(x,y) = \frac{1}{2\pi} e^{-(x^2+y^2)/2}$.
99-
Transform to polar coordinates $(R, \Theta)$:
100-
$$X = R \cos \Theta, \quad Y = R \sin \Theta$$
103+
Transform to polar coordinates $(R, \theta)$:
104+
$$X = R \cos \theta, \quad Y = R \sin \theta$$
101105

102-
* The Jacobian of the transformation from $(R, \Theta)$ to $(X, Y)$ is $r$.
103-
* $f_{R, \Theta}(r, \theta) = f_{X,Y}(r\cos\theta, r\sin\theta) \cdot r$
104-
* $f_{R, \Theta}(r, \theta) = \frac{1}{2\pi} e^{-r^2/2} \cdot r$ for $r \ge 0, \theta \in [0, 2\pi)$.
106+
* The Jacobian of the transformation from $(R, \theta)$ to $(X, Y)$ is $r$.
107+
* $f_{R, \theta}(r, \theta) = f_{X,Y}(r\cos\theta, r\sin\theta) \cdot r$
108+
* $f_{R, \theta}(r, \theta) = \frac{1}{2\pi} e^{-r^2/2} \cdot r$ for $r \ge 0, \theta \in [0, 2\pi)$.
105109

106-
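Since this density factors as $\frac{1}{2\pi} \cdot r e^{-r^2/2}$, a function of $\theta$ alone times a function of $r$ alone, $R$ and $\Theta$ are independent, with $\Theta \sim \text{Unif}[0, 2\pi)$ and $R$ following a Rayleigh distribution.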
110+
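Since this density factors as $\frac{1}{2\pi} \cdot r e^{-r^2/2}$, a function of $\theta$ alone times a function of $r$ alone, $R$ and $\theta$ are independent, with $\theta \sim \text{Unif}[0, 2\pi)$ and $R$ following a Rayleigh distribution.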
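The Jacobian factor $r$ and the factorization of the polar density can both be checked numerically. This is a rough sketch: the finite-difference step `h` and the test point `(1.5, 0.7)` are arbitrary assumptions, and the function names are illustrative.

```python
import math

def to_cartesian(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(r, theta, h=1e-6):
    """Central-difference estimate of det d(x, y)/d(r, theta)."""
    x_r1, y_r1 = to_cartesian(r + h, theta)
    x_r0, y_r0 = to_cartesian(r - h, theta)
    x_t1, y_t1 = to_cartesian(r, theta + h)
    x_t0, y_t0 = to_cartesian(r, theta - h)
    dx_dr, dy_dr = (x_r1 - x_r0) / (2 * h), (y_r1 - y_r0) / (2 * h)
    dx_dt, dy_dt = (x_t1 - x_t0) / (2 * h), (y_t1 - y_t0) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr

def density_xy(x, y):
    """Joint density of two independent standard normals."""
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)

def density_r_theta(r, theta):
    """Polar density: Cartesian density times the Jacobian factor r."""
    x, y = to_cartesian(r, theta)
    return density_xy(x, y) * r

r, theta = 1.5, 0.7
det_estimate = jacobian_det(r, theta)       # close to r
uniform_part = 1 / (2 * math.pi)            # density of the angle
rayleigh_part = r * math.exp(-r * r / 2)    # density of the radius
```

The product `uniform_part * rayleigh_part` agrees with `density_r_theta(r, theta)` up to floating-point error, which is the factorization behind the independence claim.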
107111

108112
---
109113

110114
## References
111115

112-
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
113-
2. Han, Y. (2026). Lecture 3: Joint Distributions.
116+
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
117+
2. Han, Y. (2026). Lecture 3: Joint Distributions.

notes/courses/MATH-UA-334/04-expectation.md

Lines changed: 57 additions & 38 deletions
@@ -13,42 +13,55 @@ Once we know the distribution of a random variable (RV), it is often useful to c
1313

1414
### 2.1 Definition
1515

16-
The expected value $E[X]$ is the probability-weighted average of the possible values of $X$.
16+
The expected value $\E[X]$ is the probability-weighted average of the possible values of $X$.
1717

1818
* **Discrete Case:**
1919
If $X$ has PMF $p(x)$:
20-
$$E[X] = \sum_{x} x \cdot p(x)$$
20+
$$\E[X] = \sum_{x} x \cdot p(x)$$
2121
(Provided $\sum |x|p(x) < \infty$, otherwise undefined).
2222

2323
* **Continuous Case:**
2424
If $X$ has PDF $f(x)$:
25-
$$E[X] = \int_{-\infty}^{+\infty} x \cdot f(x) \, dx$$
26-
(Provided $\int |x|f(x) \, dx < \infty$).
25+
$$\E[X] = \int_{-\infty}^{+\infty} x \cdot f(x) dx$$
26+
(Provided $\int |x|f(x) dx < \infty$).
2727

28-
**Interpretation:** $E[X]$ represents the long-run average. By the Law of Large Numbers, if $X_1, \dots, X_n$ are i.i.d. copies of $X$, then $\frac{1}{n}\sum X_i \to E[X]$.
28+
**Interpretation:** $\E[X]$ represents the long-run average. By the Law of Large Numbers, if $X_1, \dots, X_n$ are i.i.d. copies of $X$, then $\frac{1}{n}\sum X_i \to \E[X]$.
2929

3030
### 2.2 Examples
3131

32-
1. **Bernoulli:** $X \sim \text{Bern}(p)$.
33-
$$E[X] = 1 \cdot p + 0 \cdot (1-p) = p$$
34-
35-
2. **Binomial:** $X \sim B(n, p)$.
36-
$$E[X] = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} = np$$
37-
*Derivation hint:* Factor out $n$ and $p$, and re-index the sum to look like a Binomial $(n-1, p)$ sum which equals 1.
38-
39-
3. **Normal:** $X \sim \mathcal{N}(\mu, \sigma^2)$.
32+
1. **Bernoulli:** $X \sim \text{Bern}(p)$.
33+
$$\E[X] = 1 \cdot p + 0 \cdot (1-p) = p$$
34+
35+
2. **Binomial:** $X \sim B(n, p)$.
36+
$$\E[X] = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} = np$$
37+
38+
> Since the first term of the sum is $0$ (when $k=0$), we can start the summation from $k=1$. Let $m = n-1$ and $j = k-1$. As $k$ goes from $1$ to $n$, $j$ goes from $0$ to $m$. Note that $n-k = (m+1) - (j+1) = m-j$.
39+
> $$
40+
> \begin{align*}
41+
> \E[X] &= \sum_{k=1}^n k \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k} \\\\
42+
> &= \sum_{k=1}^n \frac{n \cdot (n-1)!}{(k-1)!(n-k)!} p \cdot p^{k-1} (1-p)^{n-k} \\\\
43+
> &= np \sum_{k=1}^n \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k} \\\\
44+
> &= np \sum_{j=0}^m \frac{m!}{j!(m-j)!} p^j (1-p)^{m-j} \\\\
45+
> &= np \sum_{j=0}^m \binom{m}{j} p^j (1-p)^{m-j} \\\\
46+
> &= np(p + (1-p))^m \\\\
47+
> &= np(1) \\\\
48+
> &= np
49+
> \end{align*}
50+
> $$
51+
52+
3. **Normal:** $X \sim \mathcal{N}(\mu, \sigma^2)$.
4053
Using the substitution $y = \frac{x-\mu}{\sigma}$:
41-
$$E[X] = \int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx = \mu$$
54+
$$\E[X] = \int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx = \mu$$
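The Bernoulli and Binomial results above can be confirmed by summing $x \cdot p(x)$ directly. In this sketch the parameters $n = 10$ and $p = 3/10$ are arbitrary illustrative choices, and `expectation` and `binomial_pmf` are hypothetical helper names; exact fractions avoid rounding.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def expectation(pmf):
    """E[X] as the sum of x * p(x) over a dict mapping values to probabilities."""
    return sum(x * px for x, px in pmf.items())

n, p = 10, Fraction(3, 10)   # illustrative parameters
bernoulli = {1: p, 0: 1 - p}
binomial = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print(expectation(bernoulli))  # 3/10
print(expectation(binomial))   # 3
```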
4255

4356
### 2.3 Properties
4457

45-
1. **Linearity:** For constants $a, b$:
46-
$$E[aX + bY] = aE[X] + bE[Y]$$
58+
1. **Linearity:** For constants $a, b$:
59+
$$\E[aX + bY] = a\E[X] + b\E[Y]$$
4760
(This holds even if $X$ and $Y$ are dependent).
4861

49-
2. **LOTUS (Law of the Unconscious Statistician):**
50-
To compute $E[g(X)]$, we don't need the PDF of $g(X)$; we can use the PDF of $X$.
51-
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx$$
62+
2. **LOTUS (Law of the Unconscious Statistician):**
63+
To compute $\E[g(X)]$, we don't need the PDF of $g(X)$; we can use the PDF of $X$.
64+
$$\E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) dx$$
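Both properties can be illustrated with a discrete analogue, where the integral becomes a sum over the PMF. The sketch below assumes $X$ uniform on $\{-2, \dots, 2\}$ and $g(x) = x^2$, neither of which comes from the notes. It computes $\E[g(X)]$ once via LOTUS and once by first building the PMF of $Y = g(X)$, then checks linearity for the dependent pair $(X, Y)$.

```python
from collections import defaultdict
from fractions import Fraction

# X is uniform on {-2, -1, 0, 1, 2}; g(x) = x^2 (illustrative choices).
pmf_x = {x: Fraction(1, 5) for x in range(-2, 3)}

def g(x):
    return x * x

# LOTUS: weight g(x) by the PMF of X.
lotus = sum(g(x) * px for x, px in pmf_x.items())

# Long way: build the PMF of Y = g(X) first, then average over y.
pmf_y = defaultdict(Fraction)
for x, px in pmf_x.items():
    pmf_y[g(x)] += px
direct = sum(y * py for y, py in pmf_y.items())

# Linearity with dependent variables: E[2X + 3Y] = 2E[X] + 3E[Y], Y = X^2.
e_x = sum(x * px for x, px in pmf_x.items())
lhs = sum((2 * x + 3 * g(x)) * px for x, px in pmf_x.items())
rhs = 2 * e_x + 3 * lotus

print(lotus, direct, lhs, rhs)  # 2 2 6 6
```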
5265

5366
---
5467

@@ -58,40 +71,46 @@ The variance measures the spread or dispersion of a distribution around its mean
5871

5972
### 3.1 Definition
6073

61-
$$\text{Var}(X) = E[(X - E[X])^2]$$
74+
$$\Var{X} = \E[(X - \E[X])^2]$$
6275

63-
Standard Deviation: $\sigma_X = \sqrt{\text{Var}(X)}$.
76+
Standard Deviation: $\sigma_X = \sqrt{\Var{X}}$.
6477

6578
### 3.2 Computational Formula
6679

67-
Expanding the square in the definition:
68-
$$\text{Var}(X) = E[X^2 - 2X E[X] + (E[X])^2]$$
69-
Using linearity (noting that $E[X]$ is a constant):
70-
$$\text{Var}(X) = E[X^2] - (E[X])^2$$
80+
Expanding the square in the definition and using linearity:
81+
$$
82+
\begin{align*}
83+
\Var{X}
84+
&= \E[X^2 - 2X \E[X] + (\E[X])^2]\\\\
85+
&= \E[X^2] - \E[2X \E[X]] + \E[(\E[X])^2]\\\\
86+
&= \E[X^2] - 2 \E[X] \E[X] + (\E[X])^2\\\\
87+
&= \E[X^2] - (\E[X])^2
88+
\end{align*}
89+
$$
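As a quick check of the computational formula, the sketch below evaluates both expressions for a fair six-sided die. The die is an assumed example, and `expect` is a hypothetical helper.

```python
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}   # fair die (illustrative)

def expect(h):
    """E[h(X)] under pmf, computed as a probability-weighted sum."""
    return sum(h(x) * px for x, px in pmf.items())

mean = expect(lambda x: x)
var_definition = expect(lambda x: (x - mean) ** 2)     # E[(X - E[X])^2]
var_shortcut = expect(lambda x: x * x) - mean**2       # E[X^2] - (E[X])^2

print(mean, var_definition, var_shortcut)  # 7/2 35/12 35/12
```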
7190

7291
### 3.3 Properties of Variance
7392

74-
1. **Non-negative:** $\text{Var}(X) \ge 0$.
75-
2. **Scaling:** $\text{Var}(aX + b) = a^2 \text{Var}(X)$.
93+
1. **Non-negative:** $\Var{X} \ge 0$.
94+
2. **Scaling:** $\Var{aX + b} = a^2 \Var{X}$.
7695
* Adding a constant ($b$) shifts the distribution but does not change the spread.
7796
* Multiplying by $a$ scales the spread by $|a|$, so variance scales by $a^2$.
78-
3. **Sum of Independent RVs:** If $X$ and $Y$ are independent:
79-
$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$
97+
3. **Sum of Independent RVs:** If $X$ and $Y$ are independent:
98+
$$\Var{X + Y} = \Var{X} + \Var{Y}$$
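The scaling and additivity properties can be verified exactly on small examples. This sketch assumes two independent fair dice and the constants $a = -3$, $b = 7$, all of which are illustrative; `pushforward` is a hypothetical helper that builds the PMF of a function of a random variable.

```python
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair die (illustrative)

def variance(pmf):
    """Var(X) = E[(X - E[X])^2] for a dict mapping values to probabilities."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

def pushforward(pmf, h):
    """PMF of h(X) given the PMF of X."""
    out = {}
    for x, p in pmf.items():
        out[h(x)] = out.get(h(x), 0) + p
    return out

a, b = -3, 7
scaled = pushforward(die, lambda x: a * x + b)

# Independence: the joint PMF of two dice is the product of the marginals.
joint = {(x, y): px * py for (x, px), (y, py) in product(die.items(), repeat=2)}
total = pushforward(joint, lambda xy: xy[0] + xy[1])

print(variance(scaled), variance(total))  # 105/4 35/6
```

The additivity check relies on building `joint` as a product of marginals; with a dependent pair the same code would generally give a different variance for the sum.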
8099

81100
### 3.4 Examples
82101

83-
1. **Bernoulli($p$):**
84-
* $E[X] = p$.
85-
* $E[X^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p$.
86-
* $\text{Var}(X) = p - p^2 = p(1-p)$.
102+
1. **Bernoulli($p$):**
103+
* $\E[X] = p$.
104+
* $\E[X^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p$.
105+
* $\Var{X} = p - p^2 = p(1-p)$.
87106

88-
2. **Normal($\mu, \sigma^2$):**
89-
* Calculation of $\text{Var}(X)$ involves integration by parts or recognizing the second moment of the standard normal is 1.
90-
* Result: $\text{Var}(X) = \sigma^2$.
107+
2. **Normal($\mu, \sigma^2$):**
108+
* Calculation of $\Var{X}$ involves integration by parts or recognizing the second moment of the standard normal is 1.
109+
* Result: $\Var{X} = \sigma^2$.
91110

92111
---
93112

94113
## References
95114

96-
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
97-
2. Han, Y. (2026). Lecture 4: Expectation.
115+
1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
116+
2. Han, Y. (2026). Lecture 4: Expectation.
