Update 334 notes.

localhost433 · localhost433 · commit 06cfdfaacdb6 · 2026-03-18T13:56:37.000-04:00
diff --git a/notes/courses/MATH-UA-334/01-sample-probability.md b/notes/courses/MATH-UA-334/01-sample-probability.md
@@ -3,7 +3,7 @@ title: Sample space and probability
 date: 2026-01-21
 ---
 
-## Roadmap
+## 0. Roadmap
 
 We model a “random experiment” with a triple $(\Omega, \mathcal F, \mathbb P)$:
 
@@ -23,9 +23,9 @@ The rest of the course basically builds on these ideas: compute probabilities by
 
 **Examples:**
 
-* **Coin Toss:** Toss a coin three times. The sample space is:
+- **Coin Toss:** Toss a coin three times. The sample space is:
     $$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$
-* **Card Draw:** Draw one card from a standard 52-card deck. The sample space contains 52 distinct elements.
+- **Card Draw:** Draw one card from a standard 52-card deck. The sample space contains 52 distinct elements.
 
 ### 1.2 Events
 
@@ -47,13 +47,15 @@ For events $A, B \subseteq \Omega$, we define the following set operations:
 **Example (Coin tossed 3 times):**
 
 Let $\Omega$ be the set of 3 coin flips. Define the following events:
-- $A =$ “at most one tail” $= \{HHH, HHT, HTH, THH\}$
-- $B =$ “first flip is tails” $= \{THH, THT, TTH, TTT\}$
+
+- $A =$ “at most one tail” $= \{HHH, HHT, HTH, THH \}$
+- $B =$ “first flip is tails” $= \{THH, THT, TTH, TTT \}$
 
 **Operations:**
-1.  **Union ($A \cup B$):** "At most one tail OR first flip is tails".
+
+1. **Union ($A \cup B$):** "At most one tail OR first flip is tails".
     $$A \cup B = \{HHH, HHT, HTH, THH, THT, TTH, TTT\} = \Omega \setminus \{HTT\}$$
-2.  **Intersection ($A \cap B$):** "At most one tail AND first flip is tails".
+2. **Intersection ($A \cap B$):** "At most one tail AND first flip is tails".
     $$A \cap B = \{THH\}$$
 
 ---
@@ -66,23 +68,29 @@ A probability measure $\mathbb P$ is a function that assigns a real number to ea
 
 A function $\mathbb P: \mathcal{F} \to [0, 1]$ is a probability measure if it satisfies the following three axioms:
 
-1.  **Normalization:** $\mathbb P(\Omega) = 1$.
-2.  **Non-negativity:** $\mathbb P(A) \ge 0$ for all $A \subseteq \Omega$.
-3.  **Countable Additivity (Disjoint Unions):** If $A_1, A_2, \dots$ are disjoint events (i.e., $A_i \cap A_j = \emptyset$ for $i \neq j$), then:
+1. **Normalization:** $\mathbb P(\Omega) = 1$.
+2. **Non-negativity:** $\mathbb P(A) \ge 0$ for all $A \in \mathcal F$.
+3. **Countable Additivity (Disjoint Unions):** If $A_1, A_2, \dots$ are disjoint events (i.e., $A_i \cap A_j = \emptyset$ for $i \neq j$), then:
     $$\mathbb P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb P(A_i)$$
     *Special case:* For two disjoint events $A$ and $B$, $\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B)$.
 
 ### 2.2 Properties Derived from Axioms
 
 Using the axioms, we can derive several useful properties:
 
-1.  **Complement Rule:** $\mathbb P(A^c) = 1 - \mathbb P(A)$.
+1. **Complement Rule:** $\mathbb P(A^c) = 1 - \mathbb P(A)$.
+
     *Proof:* Since $A \cup A^c = \Omega$ and $A \cap A^c = \emptyset$, by Axioms 3 and 1: $\mathbb P(A) + \mathbb P(A^c) = \mathbb P(\Omega) = 1$.
-2.  **Empty Set:** $\mathbb P(\emptyset) = 0$.
+
+2. **Empty Set:** $\mathbb P(\emptyset) = 0$.
+
     *Proof:* $\emptyset = \Omega^c$, so $\mathbb P(\emptyset) = 1 - \mathbb P(\Omega) = 0$.
-3.  **Monotonicity:** If $A \subseteq B$, then $\mathbb P(A) \le \mathbb P(B)$.
+
+3. **Monotonicity:** If $A \subseteq B$, then $\mathbb P(A) \le \mathbb P(B)$.
+
     *Proof:* Write $B = A \cup (B \cap A^c)$. Since these are disjoint, $\mathbb P(B) = \mathbb P(A) + \mathbb P(B \cap A^c) \ge \mathbb P(A)$ (by Axiom 2).
-4.  **Inclusion-Exclusion Principle:** For any two events $A$ and $B$:
+
+4. **Inclusion-Exclusion Principle:** For any two events $A$ and $B$:
     $$\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B) - \mathbb P(A \cap B)$$
 
 ---
@@ -100,14 +108,15 @@ $$\mathbb P(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of outcomes in } A}{\
 
 To determine $|A|$ and $|\Omega|$, we often use combinatorial methods:
 
-* **Multiplication Rule:** If an experiment has $k$ steps with $n_1, n_2, \dots, n_k$ choices respectively, the total number of outcomes is $n_1 \times n_2 \times \dots \times n_k$.
-* **Permutations:** Ordering $k$ distinct items from a set of $n$.
+- **Multiplication Rule:** If an experiment has $k$ steps with $n_1, n_2, \dots, n_k$ choices respectively, the total number of outcomes is $n_1 \times n_2 \times \dots \times n_k$.
+- **Permutations:** Ordering $k$ distinct items from a set of $n$.
     $$P(n, k) = \frac{n!}{(n-k)!}$$
-* **Combinations:** Choosing $k$ items from a set of $n$ without regard to order.
+- **Combinations:** Choosing $k$ items from a set of $n$ without regard to order.
     $$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
 
 **Example:**
 What is the probability of drawing a King from a standard deck?
+
 - $|\Omega| = 52$
 - $|A| = 4$ (King of Hearts, Diamonds, Clubs, Spades)
 - $\mathbb P(A) = \frac{4}{52} = \frac{1}{13}$
@@ -116,5 +125,5 @@ What is the probability of drawing a King from a standard deck?
 
 ## References
 
-1.  Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
-2.  Han, Y. (2026). Lecture 1: Sample space & probability.
+1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
+2. Han, Y. (2026). Lecture 1: Sample space & probability.
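Reviewer note on `01-sample-probability.md`: the coin-toss example and the two-event inclusion-exclusion identity can be checked by brute-force enumeration. The sketch below is only an illustration and assumes all 8 outcomes are equally likely (the classical model from the counting section); the helper name `prob` is hypothetical and not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coin tosses: strings such as "HTH".
omega = {"".join(t) for t in product("HT", repeat=3)}

# Events from the notes.
A = {w for w in omega if w.count("T") <= 1}  # at most one tail
B = {w for w in omega if w[0] == "T"}        # first flip is tails


def prob(event):
    """Classical probability |event| / |omega| (assumes equally likely outcomes)."""
    return Fraction(len(event), len(omega))


union = A | B
intersection = A & B

# Both sides of P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
lhs = prob(union)
rhs = prob(A) + prob(B) - prob(intersection)

print(sorted(omega - union), sorted(intersection), lhs, rhs)
```

Using `Fraction` keeps the arithmetic exact, so the two sides of the identity can be compared with `==` rather than a floating-point tolerance.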
diff --git a/notes/courses/MATH-UA-334/02-random-variables.md b/notes/courses/MATH-UA-334/02-random-variables.md
@@ -54,6 +54,7 @@ The PMF $p_X(x)$ gives the probability that $X$ takes the specific value $x$.
 $$p_X(x_i) = \mathbb{P}(X = x_i)$$
 
 **Properties:**
+
 * $p_X(x_i) \ge 0$
 * $\sum_{i} p_X(x_i) = 1$
 * Relation to CDF: $F_X(x) = \sum_{x_i \le x} p_X(x_i)$ (a step function).
@@ -77,9 +78,10 @@ A random variable $X$ is **continuous** if there exists a non-negative function
 $$\mathbb{P}(X \in B) = \int_B f_X(x) \, dx$$
 
 **Properties:**
+
 * $f_X(x) \ge 0$.
-* $\int_{-\infty}^{\infty} f_X(x) \, dx = 1$.
-* Relation to CDF: $F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$.
+* $\int_{-\infty}^{\infty} f_X(x) dx = 1$.
+* Relation to CDF: $F_X(x) = \int_{-\infty}^{x} f_X(t) dt$.
 * Fundamental Theorem of Calculus: $f_X(x) = F'_X(x)$ (where derivative exists).
 * **Important:** For continuous RVs, $\mathbb{P}(X = c) = 0$ for any specific point $c$.
 
@@ -126,4 +128,4 @@ This matches the density of $\mathcal{N}(\mu, \sigma^2)$.
 ## References
 
 1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
-2. Han, Y. (2026). Lecture 2: Random Variables.
+2. Han, Y. (2026). Lecture 2: Random Variables.
diff --git a/notes/courses/MATH-UA-334/03-joint-distributions.md b/notes/courses/MATH-UA-334/03-joint-distributions.md
@@ -18,8 +18,9 @@ Let $X$ and $Y$ be discrete random variables. The behavior of the pair $(X, Y)$
 $$p_{X,Y}(x, y) = \mathbb{P}(X = x, Y = y)$$
 
 **Properties:**
-1.  **Non-negativity:** $p_{X,Y}(x, y) \ge 0$.
-2.  **Normalization:** Summing over all possible pairs equals 1.
+
+1. **Non-negativity:** $p_{X,Y}(x, y) \ge 0$.
+2. **Normalization:** Summing over all possible pairs equals 1.
     $$\sum_{x} \sum_{y} p_{X,Y}(x, y) = 1$$
 
 ### 2.2 Marginal Distribution
@@ -76,17 +77,20 @@ $$f_{U,V}(u, v) = f_{X,Y}(x, y) \cdot |J|^{-1}$$
 where $x, y$ are expressed in terms of $u, v$, and $J$ is the Jacobian determinant of the transformation $(x,y) \to (u,v)$, so that $|J|^{-1}$ is the absolute Jacobian determinant of the inverse transformation $(u,v) \to (x,y)$.
 
 Specifically, if we compute the Jacobian of the transformation **from $(x, y)$ to $(u, v)$**:
-$$J = \det \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}$$
+$$
+J = \det \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\\\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}
+$$
 Then:
 $$f_{U,V}(u, v) = f_{X,Y}(x(u,v), y(u,v)) \cdot \frac{1}{|J(x,y)|}$$
 
 ### Example 1: Sum of Random Variables
 
 Let $U = X + Y$. To use the method, we introduce a dummy variable $V = Y$.
+
 * Transformation: $u = x+y, v = y$.
 * Inverse: $x = u-v, y = v$.
 * Jacobian: the forward map has $\frac{\partial u}{\partial x} = 1$, $\frac{\partial u}{\partial y} = 1$, $\frac{\partial v}{\partial x} = 0$, $\frac{\partial v}{\partial y} = 1$, so
-    $$\frac{\partial(u,v)}{\partial(x,y)} = \det \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = 1$$
+    $$\frac{\partial(u,v)}{\partial(x,y)} = \det \begin{bmatrix} 1 & 1 \\\\ 0 & 1 \end{bmatrix} = 1$$
 * Density:
     $$f_{U,V}(u, v) = f_{X,Y}(u-v, v) \cdot 1$$
 * Marginal of U (Convolution Formula):
@@ -96,18 +100,18 @@ Let $U = X + Y$. To use the method, we introduce a dummy variable $V = Y$.
 ### Example 2: Polar Coordinates
 
 Let $(X, Y)$ be independent standard normals, i.e., $f_{X,Y}(x,y) = \frac{1}{2\pi} e^{-(x^2+y^2)/2}$.
-Transform to polar coordinates $(R, \Theta)$:
-$$X = R \cos \Theta, \quad Y = R \sin \Theta$$
+Transform to polar coordinates $(R, \theta)$:
+$$X = R \cos \theta, \quad Y = R \sin \theta$$
 
-* The Jacobian of the transformation from $(R, \Theta)$ to $(X, Y)$ is $r$.
-* $f_{R, \Theta}(r, \theta) = f_{X,Y}(r\cos\theta, r\sin\theta) \cdot r$
-* $f_{R, \Theta}(r, \theta) = \frac{1}{2\pi} e^{-r^2/2} \cdot r$ for $r \ge 0, \theta \in [0, 2\pi)$.
+* The Jacobian of the transformation from $(R, \theta)$ to $(X, Y)$ is $r$.
+* $f_{R, \theta}(r, \theta) = f_{X,Y}(r\cos\theta, r\sin\theta) \cdot r$
+* $f_{R, \theta}(r, \theta) = \frac{1}{2\pi} e^{-r^2/2} \cdot r$ for $r \ge 0, \theta \in [0, 2\pi)$.
 
-This implies $R$ and $\Theta$ are independent, with $\Theta \sim \text{Unif}[0, 2\pi]$ and $R$ following a Rayleigh distribution.
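+Because the joint density factors into a function of $r$ alone times a constant in $\theta$, $R$ and $\theta$ are independent, with $\theta \sim \text{Unif}[0, 2\pi)$ and $R$ following a Rayleigh distribution with density $r e^{-r^2/2}$ for $r \ge 0$.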
 
 ---
 
 ## References
 
-1.  Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
-2.  Han, Y. (2026). Lecture 3: Joint Distributions.
+1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
+2. Han, Y. (2026). Lecture 3: Joint Distributions.
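Reviewer note on `03-joint-distributions.md`: the polar-coordinates example relies on two facts, that the Jacobian determinant of $(r, \theta) \to (x, y)$ equals $r$, and that $r e^{-r^2/2}$ integrates to 1 on $[0, \infty)$. A rough numerical sanity check is sketched below. The function names, the finite-difference step, and the truncation of the integral at $r = 12$ are all illustrative assumptions, not anything taken from the notes.

```python
import math


def polar_to_cartesian(r, theta):
    """The map (r, theta) -> (x, y) used in the example."""
    return r * math.cos(theta), r * math.sin(theta)


def jacobian_det(r, theta, h=1e-6):
    """Central-difference estimate of det d(x, y) / d(r, theta)."""
    x_r1, y_r1 = polar_to_cartesian(r + h, theta)
    x_r0, y_r0 = polar_to_cartesian(r - h, theta)
    x_t1, y_t1 = polar_to_cartesian(r, theta + h)
    x_t0, y_t0 = polar_to_cartesian(r, theta - h)
    dx_dr = (x_r1 - x_r0) / (2 * h)
    dy_dr = (y_r1 - y_r0) / (2 * h)
    dx_dt = (x_t1 - x_t0) / (2 * h)
    dy_dt = (y_t1 - y_t0) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr


def rayleigh_density(r):
    """Marginal density of R implied by the factorization: r * exp(-r^2 / 2)."""
    return r * math.exp(-r * r / 2)


def integrate(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step


# The determinant should be close to r; the density should integrate to about 1.
print(jacobian_det(2.0, 0.7))
print(integrate(rayleigh_density, 0.0, 12.0))
```

The upper limit 12 is arbitrary; the neglected tail mass is $e^{-72}$, which is far below the trapezoid error.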
diff --git a/notes/courses/MATH-UA-334/04-expectation.md b/notes/courses/MATH-UA-334/04-expectation.md
@@ -13,42 +13,55 @@ Once we know the distribution of a random variable (RV), it is often useful to c
 
 ### 2.1 Definition
 
-The expected value $E[X]$ is the probability-weighted average of the possible values of $X$.
+The expected value $\E[X]$ is the probability-weighted average of the possible values of $X$.
 
 * **Discrete Case:**
     If $X$ has PMF $p(x)$:
-    $$E[X] = \sum_{x} x \cdot p(x)$$
+    $$\E[X] = \sum_{x} x \cdot p(x)$$
     (Provided $\sum |x|p(x) < \infty$, otherwise undefined).
 
 * **Continuous Case:**
     If $X$ has PDF $f(x)$:
-    $$E[X] = \int_{-\infty}^{+\infty} x \cdot f(x) \, dx$$
-    (Provided $\int |x|f(x) \, dx < \infty$).
+    $$\E[X] = \int_{-\infty}^{+\infty} x \cdot f(x) dx$$
+    (Provided $\int |x|f(x) dx < \infty$).
 
-**Interpretation:** $E[X]$ represents the long-run average. By the Law of Large Numbers, if $X_1, \dots, X_n$ are i.i.d. copies of $X$, then $\frac{1}{n}\sum X_i \to E[X]$.
+**Interpretation:** $\E[X]$ represents the long-run average. By the Law of Large Numbers, if $X_1, \dots, X_n$ are i.i.d. copies of $X$, then $\frac{1}{n}\sum_{i=1}^n X_i \to \E[X]$ as $n \to \infty$.
 
 ### 2.2 Examples
 
-1.  **Bernoulli:** $X \sim \text{Bern}(p)$.
-    $$E[X] = 1 \cdot p + 0 \cdot (1-p) = p$$
-
-2.  **Binomial:** $X \sim B(n, p)$.
-    $$E[X] = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} = np$$
-    *Derivation hint:* Factor out $n$ and $p$, and re-index the sum to look like a Binomial $(n-1, p)$ sum which equals 1.
-
-3.  **Normal:** $X \sim \mathcal{N}(\mu, \sigma^2)$.
+1. **Bernoulli:** $X \sim \text{Bern}(p)$.
+    $$\E[X] = 1 \cdot p + 0 \cdot (1-p) = p$$
+
+2. **Binomial:** $X \sim B(n, p)$.
+    $$\E[X] = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} = np$$
+
+    > Since the first term of the sum is $0$ (when $k=0$), we can start the summation from $k=1$. Let $m = n-1$ and $j = k-1$. As $k$ goes from $1$ to $n$, $j$ goes from $0$ to $m$. Note that $n-k = (m+1) - (j+1) = m-j$.
+    > $$
+    >     \begin{align*}
+    >         \E[X] &= \sum_{k=1}^n k \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k} \\\\
+    >         &= \sum_{k=1}^n \frac{n \cdot (n-1)!}{(k-1)!(n-k)!} p \cdot p^{k-1} (1-p)^{n-k} \\\\
+    >         &= np \sum_{k=1}^n \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k} \\\\
+    >         &= np \sum_{j=0}^m \frac{m!}{j!(m-j)!} p^j (1-p)^{m-j} \\\\
+    >         &= np \sum_{j=0}^m \binom{m}{j} p^j (1-p)^{m-j} \\\\
+    >         &= np(p + (1-p))^m \\\\
+    >         &= np(1) \\\\
+    >         &= np
+    >     \end{align*}
+    > $$
+
+3. **Normal:** $X \sim \mathcal{N}(\mu, \sigma^2)$.
     Using the substitution $y = \frac{x-\mu}{\sigma}$:
-    $$E[X] = \int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx = \mu$$
+    $$\E[X] = \int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx = \mu$$
 
 ### 2.3 Properties
 
-1.  **Linearity:** For constants $a, b$:
-    $$E[aX + bY] = aE[X] + bE[Y]$$
+1. **Linearity:** For constants $a, b$:
+    $$\E[aX + bY] = a\E[X] + b\E[Y]$$
     (This holds even if $X$ and $Y$ are dependent).
 
-2.  **LOTUS (Law of the Unconscious Statistician):**
-    To compute $E[g(X)]$, we don't need the PDF of $g(X)$; we can use the PDF of $X$.
-    $$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx$$
+2. **LOTUS (Law of the Unconscious Statistician):**
+    To compute $\E[g(X)]$, we don't need the PDF of $g(X)$; we can use the PDF of $X$.
+    $$\E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) dx$$
 
 ---
 
@@ -58,40 +71,46 @@ The variance measures the spread or dispersion of a distribution around its mean
 
 ### 3.1 Definition
 
-$$\text{Var}(X) = E[(X - E[X])^2]$$
+$$\Var{X} = \E[(X - \E[X])^2]$$
 
-Standard Deviation: $\sigma_X = \sqrt{\text{Var}(X)}$.
+Standard Deviation: $\sigma_X = \sqrt{\Var{X}}$.
 
 ### 3.2 Computational Formula
 
-Expanding the square in the definition:
-$$\text{Var}(X) = E[X^2 - 2X E[X] + (E[X])^2]$$
-Using linearity (noting that $E[X]$ is a constant):
-$$\text{Var}(X) = E[X^2] - (E[X])^2$$
+Expanding the square in the definition and using linearity:
+$$
+    \begin{align*}
+        \Var{X}
+        &= \E[X^2 - 2X \E[X] + (\E[X])^2]\\\\
+        &= \E[X^2] - \E[2X \E[X]] + \E[(\E[X])^2]\\\\
+        &= \E[X^2] - 2 \E[X] \E[X] + (\E[X])^2\\\\
+        &= \E[X^2] - (\E[X])^2
+    \end{align*}
+$$
 
 ### 3.3 Properties of Variance
 
-1.  **Non-negative:** $\text{Var}(X) \ge 0$.
-2.  **Scaling:** $\text{Var}(aX + b) = a^2 \text{Var}(X)$.
+1. **Non-negative:** $\Var{X} \ge 0$.
+2. **Scaling:** $\Var{aX + b} = a^2 \Var{X}$.
     * Adding a constant ($b$) shifts the distribution but does not change the spread.
     * Multiplying by $a$ scales the spread by $|a|$, so variance scales by $a^2$.
-3.  **Sum of Independent RVs:** If $X$ and $Y$ are independent:
-    $$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$
+3. **Sum of Independent RVs:** If $X$ and $Y$ are independent:
+    $$\Var{X + Y} = \Var{X} + \Var{Y}$$
 
 ### 3.4 Examples
 
-1.  **Bernoulli($p$):**
-    * $E[X] = p$.
-    * $E[X^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p$.
-    * $\text{Var}(X) = p - p^2 = p(1-p)$.
+1. **Bernoulli($p$):**
+    * $\E[X] = p$.
+    * $\E[X^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p$.
+    * $\Var{X} = p - p^2 = p(1-p)$.
 
-2.  **Normal($\mu, \sigma^2$):**
-    * Calculation of $\text{Var}(X)$ involves integration by parts or recognizing the second moment of the standard normal is 1.
-    * Result: $\text{Var}(X) = \sigma^2$.
+2. **Normal($\mu, \sigma^2$):**
+    * Calculation of $\Var{X}$ involves integration by parts or recognizing the second moment of the standard normal is 1.
+    * Result: $\Var{X} = \sigma^2$.
 
 ---
 
 ## References
 
-1.  Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
-2.  Han, Y. (2026). Lecture 4: Expectation.
+1. Rice, J. A. (2007). *Mathematical Statistics and Data Analysis* (3rd ed.). Thomson Brooks/Cole.
+2. Han, Y. (2026). Lecture 4: Expectation.
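Reviewer note on `04-expectation.md`: the results $\E[X] = np$ for the Binomial and $\Var{X} = p(1-p)$ for the Bernoulli can be confirmed exactly for a specific case by summing over the PMF. The sketch below is illustrative only; the choice $n = 5$, $p = 1/3$ and the helper names are assumptions. The Binomial variance $np(1-p)$ is not stated in the notes, but it follows from the Bernoulli variance and the rule for sums of independent RVs.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """PMF of B(n, p) as a dict {k: P(X = k)}; exact when p is a Fraction."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def mean(pmf):
    """E[X] = sum over k of k * p(k)."""
    return sum(k * q for k, q in pmf.items())


def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2, the computational formula."""
    second_moment = sum(k * k * q for k, q in pmf.items())
    return second_moment - mean(pmf) ** 2


n, p = 5, Fraction(1, 3)
binom = binomial_pmf(n, p)
bern = binomial_pmf(1, p)  # Bernoulli(p) is the n = 1 case

print(mean(binom), n * p)
print(variance(bern), p * (1 - p))
print(variance(binom), n * p * (1 - p))
```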
diff --git a/notes/courses/MATH-UA-334/05-cov-cond-exp.md b/notes/courses/MATH-UA-334/05-cov-cond-exp.md
diff --git a/notes/courses/MATH-UA-334/06-limit-theorems.md b/notes/courses/MATH-UA-334/06-limit-theorems.md