Commit 025bbb9

Note 10-27 for CS 310
1 parent ecddb6f commit 025bbb9

12 files changed

Lines changed: 2340 additions & 1 deletion
Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
1+
---
2+
title: 08/09 - Hashing
3+
date: 2026-02-16/18
4+
---
5+
6+
## Roadmap
7+
8+
These lectures introduce **hash tables**, the primary data structure for implementing dynamic sets with fast average-case operations. We motivate the design through direct-address tables, introduce hash functions and collision resolution via chaining, then cover **open addressing** and **universal hashing**, which gives provable expected-time guarantees for every input.
9+
10+
1. **The Dictionary Problem**: Operations and motivation.
11+
2. **Direct-Address Tables**: A simple starting point.
12+
3. **Hash Tables**: Hash functions and collisions.
13+
4. **Chaining**: Collision resolution via linked lists.
14+
5. **Analysis of Chaining**: Expected cost under simple uniform hashing.
15+
6. **Open Addressing**: Linear probing, quadratic probing, double hashing.
16+
7. **Analysis of Open Addressing**: Expected number of probes.
17+
8. **Universal Hashing**: Input-independent expected-time guarantees via randomized hash functions.
18+
9. **A Universal Hash Family**: Construction and proof.
19+
20+
---
21+
22+
## 1. The Dictionary Problem
23+
24+
We want a data structure supporting:
25+
26+
* `INSERT(S, x)`: Insert element $x$ into set $S$.
27+
* `DELETE(S, x)`: Remove element $x$ from set $S$.
28+
* `SEARCH(S, k)`: Find element with key $k$ in $S$.
29+
30+
**Goal**: All three operations in $O(1)$ expected time.
31+
32+
---
33+
34+
## 2. Direct-Address Tables
35+
36+
**Idea**: If keys are drawn from a universe $U = \{0, 1, \dots, m-1\}$, allocate an array $T[0 \dots m-1]$. Slot $k$ holds a pointer to the element with key $k$ (or `NIL`).
37+
38+
**Performance**: All operations take $\Theta(1)$ worst-case time.
39+
40+
**Problem**: If $|U|$ is large (e.g., all 64-bit integers), the table is impractically large, even though the number of keys actually stored satisfies $n \ll |U|$ in practice.
41+
42+
---
43+
44+
## 3. Hash Tables
45+
46+
**Idea**: Use a **hash function** $h: U \to \{0, 1, \dots, m-1\}$ to map keys to table slots. Store element with key $k$ in slot $h(k)$.
47+
48+
$$h: U \to \{0, 1, \dots, m-1\}$$
49+
50+
The table size $m$ is much smaller than $|U|$.
51+
52+
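**Collision**: A **collision** occurs when two keys $k_1 \neq k_2$ satisfy $h(k_1) = h(k_2)$. Because $|U| > m$, some pair of keys in $U$ must collide (pigeonhole principle), and a collision among the stored keys is guaranteed once $n > m$.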
53+
54+
### Hash Function Design
55+
56+
A good hash function approximately satisfies the **simple uniform hashing assumption**: each key is equally likely to hash to any of the $m$ slots, independently of where any other key hashes.
57+
58+
**Division method**: $h(k) = k \bmod m$. Choose $m$ to be a prime not close to a power of 2.
59+
60+
**Multiplication method**: $h(k) = \lfloor m \cdot (kA \bmod 1) \rfloor$ for some constant $0 < A < 1$.
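
To make the two methods concrete, here is a minimal Python sketch. The function names are illustrative, and the default constant $A = (\sqrt{5} - 1)/2$ is an assumption (a commonly suggested value), not something these notes prescribe. The multiplication method is computed in floating point here, so it is only reliable for keys small enough that `k * A` keeps enough fractional precision.

```python
import math

def hash_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m

def hash_multiplication(k: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1)).

    A is an assumed constant in (0, 1); floating point limits this
    sketch to moderately sized keys.
    """
    frac = (k * A) % 1.0          # fractional part of k*A
    return int(m * frac)          # floor, since m * frac >= 0

if __name__ == "__main__":
    print(hash_division(100, 13))
    print(hash_multiplication(123456, 16384))
```

Both functions return a slot index in $\{0, \dots, m-1\}$ for non-negative integer keys.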
61+
62+
---
63+
64+
## 4. Chaining
65+
66+
**Chaining** resolves collisions by placing all elements that hash to the same slot into a **linked list**.
67+
68+
* Slot $j$ contains a pointer to the head of the list of all elements with $h(k) = j$.
69+
70+
### Pseudocode
71+
72+
```text
73+
CHAINED-HASH-INSERT(T, x)
74+
1. insert x at the head of list T[h(x.key)]
75+
76+
CHAINED-HASH-SEARCH(T, k)
77+
1. search for an element with key k in list T[h(k)]
78+
79+
CHAINED-HASH-DELETE(T, x)
80+
1. delete x from the list T[h(x.key)]
81+
```
82+
83+
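`INSERT` takes $O(1)$ worst-case time (assuming $x$ is not already in the table). `DELETE` takes $O(1)$ worst-case time if the lists are doubly linked, since we are given a pointer to $x$ itself. `SEARCH` takes time proportional to the length of the list $T[h(k)]$.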
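
A small Python sketch of chaining follows. It departs from the pseudocode in two labeled ways: each chain is a Python list rather than a linked list, and `insert` scans the chain so that an existing key is overwritten instead of duplicated (so it is not an $O(1)$ head insertion). The class name and the use of Python's built-in `hash` are assumptions made for illustration.

```python
class ChainedHashTable:
    """Hash table with chaining; chains are plain Python lists."""

    def __init__(self, m: int = 8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key) -> int:
        # Assumption: built-in hash() reduced modulo m (division method).
        return hash(key) % self.m

    def insert(self, key, value) -> None:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None                           # plays the role of NIL

    def delete(self, key) -> bool:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                return True
        return False
```

With `m = 8`, the integer keys 1, 9, and 17 all land in slot 1, which makes the chain behaviour easy to observe.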
84+
85+
---
86+
87+
## 5. Analysis of Chaining
88+
89+
Define the **load factor** $\alpha = n/m$ (average number of elements per slot).
90+
91+
**Theorem**: Under simple uniform hashing, an unsuccessful search takes expected time $\Theta(1 + \alpha)$.
92+
93+
**Proof sketch**: An unsuccessful search examines all elements in slot $h(k)$. The expected list length is $\alpha = n/m$. Adding $O(1)$ for computing $h(k)$ gives $\Theta(1 + \alpha)$.
94+
95+
**Theorem**: Under simple uniform hashing, a successful search takes expected time $\Theta(1 + \alpha)$.
96+
97+
**Interpretation**: If $n = O(m)$ (i.e., $\alpha = O(1)$), all operations take $O(1)$ expected time.
98+
99+
---
100+
101+
## 6. Open Addressing
102+
103+
In **open addressing**, all elements are stored in the hash table itself (no linked lists). On collision, we **probe** for an alternative slot.
104+
105+
A **probe sequence** for key $k$ is a permutation $\langle h(k,0), h(k,1), \dots, h(k,m-1) \rangle$ of $\{0,1,\dots,m-1\}$.
106+
107+
```text
108+
HASH-INSERT(T, k)
109+
1. i = 0
110+
2. repeat
111+
3. j = h(k, i)
112+
4. if T[j] == NIL
113+
5. T[j] = k
114+
6. return j
115+
7. else i = i + 1
116+
8. until i == m
117+
9. error "hash table overflow"
118+
```
119+
120+
```text
121+
HASH-SEARCH(T, k)
122+
1. i = 0
123+
2. repeat
124+
3. j = h(k, i)
125+
4. if T[j] == k
126+
5. return j
127+
6. i = i + 1
128+
7. until T[j] == NIL or i == m
129+
8. return NIL
130+
```
131+
132+
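**Deletion** is tricky: setting a slot back to `NIL` would cut off the probe sequences of keys that were inserted past it, so a later `HASH-SEARCH` could stop early and miss them. Instead, mark the slot with a special `DELETED` sentinel: `HASH-SEARCH` probes past `DELETED` slots, while `HASH-INSERT` may reuse them. Once deletions occur, search time no longer depends only on the load factor $\alpha$.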
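
The following Python sketch shows one way the `DELETED` sentinel can work. It assumes non-negative integer keys and linear probing with $h'(k) = k \bmod m$; the class and method names are illustrative only.

```python
NIL = None
DELETED = object()   # unique tombstone marker, never equal to a real key

class OpenAddressTable:
    def __init__(self, m: int):
        self.m = m
        self.T = [NIL] * m

    def _probe(self, k: int, i: int) -> int:
        # Assumption: linear probing, h(k, i) = (k + i) mod m.
        return (k + i) % self.m

    def insert(self, k: int) -> int:
        for i in range(self.m):
            j = self._probe(k, i)
            if self.T[j] is NIL or self.T[j] is DELETED:   # tombstones are reusable
                self.T[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k: int):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.T[j] is NIL:          # truly empty slot: k cannot be further on
                return None
            if self.T[j] is not DELETED and self.T[j] == k:
                return j
        return None

    def delete(self, k: int):
        j = self.search(k)
        if j is not None:
            self.T[j] = DELETED           # keep the probe chain intact
        return j
```

Like `HASH-INSERT`, this `insert` does not check whether the key is already present.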
133+
134+
### Probing Strategies
135+
136+
**Linear Probing**: $h(k, i) = (h'(k) + i) \bmod m$.
137+
* Simple but causes **primary clustering**: long runs of occupied slots form and grow.
138+
139+
**Quadratic Probing**: $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$.
140+
* Reduces primary clustering but causes **secondary clustering**: two keys with the same $h'(k)$ have identical probe sequences.
141+
142+
**Double Hashing**: $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$.
143+
* Uses two independent hash functions.
144+
* Gives $\Theta(m^2)$ distinct probe sequences; approximates uniform hashing well.
145+
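* Requirement: $h_2(k)$ must be nonzero and relatively prime to $m$ for every $k$, so that the probe sequence visits all $m$ slots (e.g., choose $m$ prime and $1 \leq h_2(k) < m$, or $m$ a power of 2 and $h_2(k)$ odd).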
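
To compare the three strategies, the sketch below generates the full probe sequence for a given starting value. The function names are illustrative. For quadratic probing it assumes the specific choice $c_1 = c_2 = 1/2$ with $m$ a power of 2, one known setting in which the sequence is a permutation of the slots; other constants may not visit every slot.

```python
def linear_probe(h1: int, m: int) -> list[int]:
    """h(k, i) = (h'(k) + i) mod m, where h1 = h'(k)."""
    return [(h1 + i) % m for i in range(m)]

def quadratic_probe(h1: int, m: int) -> list[int]:
    """h(k, i) = (h'(k) + i/2 + i^2/2) mod m.

    Assumes c1 = c2 = 1/2 and m a power of 2; (i + i*i) is always even,
    so integer division is exact.
    """
    return [(h1 + (i + i * i) // 2) % m for i in range(m)]

def double_hash_probe(h1: int, h2: int, m: int) -> list[int]:
    """h(k, i) = (h1(k) + i * h2(k)) mod m."""
    return [(h1 + i * h2) % m for i in range(m)]

if __name__ == "__main__":
    print(linear_probe(3, 7))
    print(quadratic_probe(3, 8))
    print(double_hash_probe(3, 2, 7))   # gcd(2, 7) = 1: visits every slot
    print(double_hash_probe(3, 2, 8))   # gcd(2, 8) = 2: visits only half the slots
```

The last call illustrates the coprimality requirement: when $h_2(k)$ shares a factor with $m$, the sequence cycles before covering the table.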
146+
147+
---
148+
149+
## 7. Analysis of Open Addressing
150+
151+
Assume **uniform hashing**: each key is equally likely to have any of the $m!$ permutations as its probe sequence.
152+
153+
**Theorem**: Under uniform hashing with load factor $\alpha = n/m < 1$:
154+
155+
* Expected number of probes in an **unsuccessful search**: $\leq \dfrac{1}{1 - \alpha}$.
156+
* Expected number of probes in a **successful search**: $\leq \dfrac{1}{\alpha} \ln \dfrac{1}{1-\alpha}$.
157+
158+
**Implication**: For $\alpha$ bounded away from 1, operations take $O(1)$ expected time. As $\alpha \to 1$, performance degrades sharply.
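
To get a feel for how sharply the bounds grow, the sketch below simply evaluates the two formulas from the theorem at a few load factors. The function names are illustrative, and the values printed are upper bounds on expected probe counts under the uniform hashing assumption, not measurements.

```python
import math

def unsuccessful_bound(alpha: float) -> float:
    """Upper bound 1 / (1 - alpha) on expected probes, unsuccessful search."""
    return 1.0 / (1.0 - alpha)

def successful_bound(alpha: float) -> float:
    """Upper bound (1 / alpha) * ln(1 / (1 - alpha)), successful search."""
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))

if __name__ == "__main__":
    for alpha in (0.5, 0.75, 0.9, 0.99):
        print(f"alpha={alpha:<5} unsuccessful<={unsuccessful_bound(alpha):7.2f} "
              f"successful<={successful_bound(alpha):5.2f}")
```

For example, a half-full table needs at most 2 expected probes for an unsuccessful search, while a 90%-full table needs up to about 10.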
159+
160+
---
161+
162+
## 8. Universal Hashing
163+
164+
**Problem with fixed hash functions**: For any deterministic $h$, an adversary can choose $n$ keys that all hash to the same slot, giving $\Theta(n)$ worst-case time per operation.
165+
166+
**Solution**: Choose the hash function **randomly** at runtime from a family $\mathcal{H}$.
167+
168+
**Definition**: A family $\mathcal{H}$ of hash functions from $U$ to $\{0, \dots, m-1\}$ is **universal** if for any two distinct keys $k, \ell \in U$:
169+
$$\Pr_{h \in \mathcal{H}}[h(k) = h(\ell)] \leq \frac{1}{m}$$
170+
171+
**Theorem**: If $h$ is chosen uniformly from a universal family $\mathcal{H}$, and we use chaining, then for any key $k$:
172+
$$E[\text{number of keys in the table that collide with } k] \leq \frac{n}{m} = \alpha$$
173+
174+
So all operations take $O(1 + \alpha) = O(1)$ expected time when $n = O(m)$, regardless of the input.
175+
176+
---
177+
178+
## 9. A Universal Hash Family
179+
180+
**Construction**: Let $p$ be a prime larger than $|U|$. For $a \in \{1, \dots, p-1\}$ and $b \in \{0, \dots, p-1\}$, define:
181+
$$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$$
182+
183+
The family $\mathcal{H} = \{ h_{a,b} : a \in \{1,\dots,p-1\}, b \in \{0,\dots,p-1\} \}$ is universal.
184+
185+
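**Proof sketch**: Fix distinct $k, \ell \in U$ and let $r = (ak + b) \bmod p$ and $s = (a\ell + b) \bmod p$. Since $p$ is prime, $a \not\equiv 0$, and $k \not\equiv \ell \pmod{p}$, we have $r \neq s$. Moreover, as $(a, b)$ ranges over its $p(p-1)$ choices, $(r, s)$ takes each of the $p(p-1)$ pairs with $r \neq s$ exactly once, so the pair is uniformly distributed. For a fixed $r$, at most $\lceil p/m \rceil - 1 \leq (p-1)/m$ of the $p - 1$ values $s \neq r$ satisfy $s \equiv r \pmod{m}$, so the collision probability is at most $\dfrac{(p-1)/m}{p-1} = \dfrac{1}{m}$.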
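
The universality bound can be checked exhaustively for small parameters. The sketch below draws a random member of the family and, separately, counts collisions over every $(a, b)$ pair for a fixed pair of keys. The helper names and the toy parameters $p = 17$, $m = 6$ are assumptions chosen only to keep the enumeration tiny.

```python
import random
from fractions import Fraction

def random_hash(p: int, m: int, rng=random):
    """Pick h_{a,b} uniformly from the family; returns a function k -> slot."""
    a = rng.randrange(1, p)      # a in {1, ..., p-1}
    b = rng.randrange(0, p)      # b in {0, ..., p-1}
    def h(k: int) -> int:
        return ((a * k + b) % p) % m
    return h

def collision_probability(k: int, l: int, p: int, m: int) -> Fraction:
    """Exact Pr[h(k) = h(l)] over all p(p-1) functions in the family."""
    collisions = sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * k + b) % p) % m == ((a * l + b) % p) % m
    )
    return Fraction(collisions, p * (p - 1))

if __name__ == "__main__":
    p, m = 17, 6                 # toy parameters; keys are taken from {0, ..., 15}
    worst = max(collision_probability(k, l, p, m)
                for k in range(16) for l in range(16) if k != l)
    print(worst, "<=", Fraction(1, m))
```

Using `Fraction` keeps the comparison with $1/m$ exact rather than subject to floating-point rounding.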
186+
187+
---
188+
189+
## References
190+
191+
* **CLRS**: Chapter 11 — Hash Tables (Sections 11.1–11.5).
Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
1+
---
2+
title: 10 - Binary Search Trees
3+
date: 2026-02-23
4+
---
5+
6+
## Roadmap
7+
8+
This lecture introduces **Binary Search Trees (BSTs)** as a data structure for dynamic ordered sets. We define the BST property, present the fundamental operations, and analyze their running time.
9+
10+
1. **BST Property**: Definition and representation.
11+
2. **Traversal**: In-order, pre-order, post-order.
12+
3. **Searching**: `SEARCH`, `MINIMUM`, `MAXIMUM`, `SUCCESSOR`, `PREDECESSOR`.
13+
4. **Modification**: `INSERT` and `DELETE`.
14+
5. **Analysis**: Running time and the problem of balance.
15+
16+
---
17+
18+
## 1. The BST Property
19+
20+
A **binary search tree** is a rooted binary tree where each node $x$ has fields: `key`, `left`, `right`, and `parent`.
21+
22+
**BST Property**: For any node $x$:
23+
24+
* If $y$ is in the **left subtree** of $x$: $y.\text{key} \leq x.\text{key}$.
25+
* If $y$ is in the **right subtree** of $x$: $y.\text{key} \geq x.\text{key}$.
26+
27+
The **height** $h$ of a BST determines the cost of the search and update operations below. In the worst case $h = \Theta(n)$ (degenerate tree); for a balanced tree $h = \Theta(\log n)$.
28+
29+
---
30+
31+
## 2. Tree Traversal
32+
33+
**In-order traversal** visits nodes in sorted order.
34+
35+
```text
36+
INORDER-TREE-WALK(x)
37+
1. if x != NIL
38+
2. INORDER-TREE-WALK(x.left)
39+
3. print x.key
40+
4. INORDER-TREE-WALK(x.right)
41+
```
42+
43+
Running time: $\Theta(n)$ for a tree with $n$ nodes.
44+
45+
Similarly, **pre-order** (root, left, right) and **post-order** (left, right, root) traversals are defined.
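
The three traversals differ only in where the node itself is visited relative to its subtrees. A minimal Python sketch, using a hypothetical `Node` class without parent pointers and generators in place of `print`:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder(x):
    """Left subtree, node, right subtree: yields keys in sorted order."""
    if x is not None:
        yield from inorder(x.left)
        yield x.key
        yield from inorder(x.right)

def preorder(x):
    """Node, left subtree, right subtree."""
    if x is not None:
        yield x.key
        yield from preorder(x.left)
        yield from preorder(x.right)

def postorder(x):
    """Left subtree, right subtree, node."""
    if x is not None:
        yield from postorder(x.left)
        yield from postorder(x.right)
        yield x.key

if __name__ == "__main__":
    root = Node(5, Node(3, Node(2), Node(4)), Node(8))
    print(list(inorder(root)))
    print(list(preorder(root)))
    print(list(postorder(root)))
```

Each traversal touches every node once, matching the $\Theta(n)$ running time.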
46+
47+
---
48+
49+
## 3. Searching
50+
51+
### SEARCH
52+
53+
```text
54+
TREE-SEARCH(x, k)
55+
1. if x == NIL or k == x.key
56+
2. return x
57+
3. if k < x.key
58+
4. return TREE-SEARCH(x.left, k)
59+
5. else return TREE-SEARCH(x.right, k)
60+
```
61+
62+
Running time: $O(h)$.
63+
64+
### MINIMUM and MAXIMUM
65+
66+
```text
67+
TREE-MINIMUM(x)
68+
1. while x.left != NIL
69+
2. x = x.left
70+
3. return x
71+
```
72+
73+
```text
74+
TREE-MAXIMUM(x)
75+
1. while x.right != NIL
76+
2. x = x.right
77+
3. return x
78+
```
79+
80+
Both run in $O(h)$.
81+
82+
### SUCCESSOR
83+
84+
The **successor** of node $x$ is the node with the smallest key greater than $x.\text{key}$.
85+
86+
```text
87+
TREE-SUCCESSOR(x)
88+
1. if x.right != NIL
89+
2. return TREE-MINIMUM(x.right)
90+
3. y = x.parent
91+
4. while y != NIL and x == y.right
92+
5. x = y
93+
6. y = y.parent
94+
7. return y
95+
```
96+
97+
Running time: $O(h)$.
98+
99+
---
100+
101+
## 4. Modification
102+
103+
### INSERT
104+
105+
```text
106+
TREE-INSERT(T, z)
107+
1. y = NIL
108+
2. x = T.root
109+
3. while x != NIL
110+
4. y = x
111+
5. if z.key < x.key
112+
6. x = x.left
113+
7. else x = x.right
114+
8. z.parent = y
115+
9. if y == NIL
116+
10. T.root = z // tree was empty
117+
11. elif z.key < y.key
118+
12. y.left = z
119+
13. else y.right = z
120+
```
121+
122+
Running time: $O(h)$.
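
The insertion procedure, together with the query routines from Section 3, can be sketched in Python as follows. The `Node` and `BST` classes are hypothetical containers mirroring the `key`, `left`, `right`, and `parent` fields; `tree_search` is written iteratively here rather than recursively, which is a stylistic substitution and not a change in behaviour.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None

class BST:
    def __init__(self):
        self.root = None

def tree_insert(T, z):
    """Walk down from the root to find z's parent y, then link z below y."""
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:
        T.root = z                 # tree was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z

def tree_search(x, k):
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.parent
    while y is not None and x is y.right:   # climb until we leave a left subtree
        x, y = y, y.parent
    return y
```

Starting from `tree_minimum(T.root)` and calling `tree_successor` repeatedly visits the keys in sorted order, which is a convenient sanity check for an implementation.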
123+
124+
### DELETE
125+
126+
Three cases when deleting node $z$:
127+
128+
1. **$z$ has no children**: Simply remove $z$.
129+
2. **$z$ has one child**: Splice out $z$, linking $z$'s parent to $z$'s child.
130+
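3. **$z$ has two children**: Find $z$'s successor $y$ (the minimum of $z$'s right subtree), which has no left child. If $y$ is not $z$'s right child, first replace $y$ by its own right child and make $z$'s right subtree the right subtree of $y$. Then replace $z$ by $y$ and attach $z$'s left subtree as $y$'s left subtree.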
131+
132+
We use a helper `TRANSPLANT` to replace one subtree with another:
133+
134+
```text
135+
TRANSPLANT(T, u, v)
136+
1. if u.parent == NIL
137+
2. T.root = v
138+
3. elif u == u.parent.left
139+
4. u.parent.left = v
140+
5. else u.parent.right = v
141+
6. if v != NIL
142+
7. v.parent = u.parent
143+
```
144+
145+
```text
146+
TREE-DELETE(T, z)
147+
1. if z.left == NIL
148+
2. TRANSPLANT(T, z, z.right)
149+
3. elif z.right == NIL
150+
4. TRANSPLANT(T, z, z.left)
151+
5. else y = TREE-MINIMUM(z.right)
152+
6. if y.parent != z
153+
7. TRANSPLANT(T, y, y.right)
154+
8. y.right = z.right
155+
9. y.right.parent = y
156+
10. TRANSPLANT(T, z, y)
157+
11. y.left = z.left
158+
12. y.left.parent = y
159+
```
160+
161+
Running time: $O(h)$.
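
A direct Python transcription of `TRANSPLANT` and `TREE-DELETE` is sketched below. The `Node` and `BST` classes, `tree_insert`, and `inorder_keys` are hypothetical helpers included only so the example is self-contained.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

class BST:
    def __init__(self):
        self.root = None

def tree_insert(T, z):
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def transplant(T, u, v):
    """Replace the subtree rooted at u with the subtree rooted at v."""
    if u.parent is None:
        T.root = v
    elif u is u.parent.left:
        u.parent.left = v
    else:
        u.parent.right = v
    if v is not None:
        v.parent = u.parent

def tree_delete(T, z):
    if z.left is None:
        transplant(T, z, z.right)          # zero children, or right child only
    elif z.right is None:
        transplant(T, z, z.left)           # left child only
    else:
        y = tree_minimum(z.right)          # successor of z; has no left child
        if y.parent is not z:
            transplant(T, y, y.right)
            y.right = z.right
            y.right.parent = y
        transplant(T, z, y)
        y.left = z.left
        y.left.parent = y

def inorder_keys(x):
    return [] if x is None else inorder_keys(x.left) + [x.key] + inorder_keys(x.right)
```

Deleting a node with two children moves the successor node itself into place rather than copying its key, so any outside references to other nodes stay valid.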
162+
163+
---
164+
165+
## 5. Analysis
166+
167+
All BST search and update operations run in $O(h)$ time (a full traversal takes $\Theta(n)$). The height $h$ depends on how balanced the tree is:
168+
169+
| Tree Shape | Height | Operation Cost |
170+
|---|---|---|
171+
| Balanced | $\Theta(\log n)$ | $\Theta(\log n)$ |
172+
| Degenerate (sorted input) | $\Theta(n)$ | $\Theta(n)$ |
173+
| Random insertions (expected) | $\Theta(\log n)$ | $\Theta(\log n)$ |
174+
175+
**Problem**: Without rebalancing, adversarial input order yields a degenerate tree. This motivates **Red-Black Trees**, which maintain balance explicitly.
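
The effect of insertion order on height is easy to observe directly. The sketch below builds two trees from the same 15 keys, once in sorted order and once in an order chosen to produce a perfectly balanced tree. The helper names are illustrative, and height is counted in edges (an empty tree has height $-1$).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Unbalanced BST insertion; returns the (possibly new) root."""
    if root is None:
        return Node(key)
    x = root
    while True:
        if key < x.key:
            if x.left is None:
                x.left = Node(key)
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = Node(key)
                return root
            x = x.right

def height(root):
    """Height in edges, computed with an explicit stack."""
    best = -1
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            best = max(best, depth)
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return best

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

if __name__ == "__main__":
    sorted_order = list(range(1, 16))
    balanced_order = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
    print(height(build(sorted_order)))     # degenerate chain
    print(height(build(balanced_order)))   # perfectly balanced
```

Sorted input produces a chain of height $n - 1$, while the second order gives height $\lfloor \log_2 n \rfloor$ on the same keys.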
176+
177+
---
178+
179+
## References
180+
181+
* **CLRS**: Chapter 12 — Binary Search Trees.
