Commit 7bcf0dd

Update README.md with Chapter 5 structure and clickable navigation
1 parent: d81d74d

1 file changed

Lines changed: 56 additions & 12 deletions

README.md

@@ -7,6 +7,15 @@ It provides clean, minimal, and well-tested implementations of key reinforcement
 
 ---
 
+## 📑 Chapter Navigation
+
+- [Chapter 2: The RL Problem Formulation](./ch2_rl_formulation)
+- [Chapter 3: Multi-Armed Bandits](./ch3_multi_armed_bandits)
+- [Chapter 4: Dynamic Programming Approaches](./ch4_dynamic_programming)
+- [Chapter 5: Monte Carlo Methods](./ch5_monte_carlo)
+
+---
+
 ## 📂 Repository Structure
 
 ```
@@ -22,30 +31,35 @@ rl-fundamentals-code/
 ├─ ch3_multi_armed_bandits/ # Chapter 3: Multi-Armed Bandits
 │ ├─ bandits.py # Bernoulli & Gaussian bandit environments
 │ ├─ epsilon_greedy.py # Sample-average ε-greedy agent
-│ ├─ ucb.py # UCB1 agent (with tunable exploration constant)
-│ ├─ thompson.py # Beta–Bernoulli Thompson Sampling agent
+│ ├─ ucb.py # UCB1 agent
+│ ├─ thompson.py # Thompson Sampling (Beta–Bernoulli)
 │ ├─ experiments.py # Run algorithms, generate regret plots
-│ ├─ plots/ # Saved figures (regret_bernoulli.png, etc.)
+│ ├─ plots/ # Saved figures
 │ └─ tests/ # Regression tests (ordering, sublinear regret)
 
 ├─ ch4_dynamic_programming/ # Chapter 4: Dynamic Programming Approaches
-│ ├─ gridworld.py # 4x4 deterministic GridWorld MDP
+│ ├─ gridworld.py # 4x4 deterministic GridWorld
 │ ├─ policy_evaluation.py # Iterative policy evaluation
 │ ├─ policy_iteration.py # Howard’s policy iteration
 │ ├─ value_iteration.py # Bellman optimality (value iteration)
-│ ├─ utils.py # Uniform random + greedy helpers
+│ ├─ utils.py # Random + greedy helpers
 │ ├─ examples/ # Run PI/VI demos
-│ └─ tests/ # Pytest checks for DP convergence/optimality
+│ └─ tests/ # Pytest checks for DP convergence
 
-├─ utils/ # Shared helper utilities (future use)
+├─ ch5_monte_carlo/ # Chapter 5: Monte Carlo Methods
+│ ├─ examples/
+│ │ ├─ mc_prediction_demo.py # First-visit vs every-visit MC (two-state MDP)
+│ │ ├─ mc_control_es_gridworld.py # MC control with Exploring Starts
+│ │ ├─ mc_control_onpolicy_gridworld.py# On-policy MC control with ε-soft policies
+│ │ └─ mc_offpolicy_is_demo.py # Off-policy IS: ordinary vs weighted
+│ └─ tests/
+│ ├─ test_mc_control.py # GridWorld control tests (MC-ES & on-policy)
+│ └─ test_offpolicy_is.py # Off-policy IS variance checks
 
+├─ utils/ # Shared helper utilities (future use)
 ├─ .github/workflows/ # CI: runs pytest on every push/PR
 │ └─ python-tests.yml
-
-├─ requirements.txt # Global dependencies (numpy, matplotlib, pytest)
-├─ requirements_ch2.txt # Chapter 2–specific dependencies
-├─ requirements_ch3.txt # Chapter 3–specific dependencies
-├─ requirements_ch4.txt # Chapter 4–specific dependencies (optional)
+├─ requirements.txt # Global dependencies
 └─ README.md # Project overview + usage
 ```
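
The tree in this hunk describes `mc_offpolicy_is_demo.py` as comparing ordinary and weighted importance sampling (IS). The sketch below is not taken from the repository. It is a minimal illustration of the two estimators, assuming each episode has already been reduced to a `(rho, G)` pair (importance ratio and return). The names `is_estimates` and `sample_episodes`, and the one-step task itself, are hypothetical.

```python
import random


def is_estimates(episodes):
    """Return (ordinary, weighted) IS estimates from (rho, G) pairs."""
    weighted_returns = sum(rho * g for rho, g in episodes)
    total_weight = sum(rho for rho, _ in episodes)
    # Ordinary IS: divide by the number of episodes (unbiased, can be high-variance).
    ordinary = weighted_returns / len(episodes)
    # Weighted IS: divide by the sum of ratios (biased, but bounded by the returns).
    weighted = weighted_returns / total_weight if total_weight > 0 else 0.0
    return ordinary, weighted


def sample_episodes(n, seed=0):
    """Hypothetical one-step task: reward 1.0 for action 1, 0.0 for action 0."""
    rng = random.Random(seed)
    target = {0: 0.1, 1: 0.9}    # assumed target policy pi(a)
    behavior = {0: 0.5, 1: 0.5}  # assumed behavior policy b(a)
    episodes = []
    for _ in range(n):
        action = 1 if rng.random() < behavior[1] else 0
        rho = target[action] / behavior[action]
        episodes.append((rho, float(action)))
    return episodes


# With a single episode the ordinary estimate can leave the range of returns,
# while the weighted estimate cannot.
print(is_estimates([(1.8, 1.0)]))  # (1.8, 1.0)

# With many episodes both estimates should approach the target value 0.9.
print(is_estimates(sample_episodes(5000)))
```

The single-episode case shows why variance checks are a natural thing to test: the ordinary estimate scales the return by the raw ratio, whereas the weighted estimate is a convex combination of observed returns.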

@@ -102,6 +116,12 @@ Run only Chapter 4 tests:
 python -m pytest -q ch4_dynamic_programming/tests
 ```
 
+Run only Chapter 5 tests:
+
+```bash
+python -m pytest -q ch5_monte_carlo/tests
+```
+
 ---
 
 ## 🧪 Examples
@@ -130,6 +150,30 @@ Run Value Iteration demo (Chapter 4):
 python -m ch4_dynamic_programming.examples.run_value_iteration
 ```
 
+Run MC prediction demo (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_prediction_demo
+```
+
+Run MC control with Exploring Starts (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_control_es_gridworld
+```
+
+Run on-policy MC control with ε-soft policies (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_control_onpolicy_gridworld
+```
+
+Run off-policy IS demo (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_offpolicy_is_demo
+```
+
 ---
 
 ## ⚙️ Continuous Integration
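
The README change describes the MC prediction demo as contrasting first-visit and every-visit Monte Carlo. As a rough illustration of that difference (not the repository's code), the sketch below averages returns per state over a list of episodes. The function name `mc_prediction` and the episode format, a list of `(state, reward)` pairs where `reward` is the reward received after leaving `state`, are assumptions made for this example.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging sampled returns."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Walk backward so that G_t = r_{t+1} + gamma * G_{t+1}.
        g = 0.0
        returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.append((state, g))
        returns.reverse()  # restore time order

        seen = set()
        for state, g in returns:
            # First-visit MC keeps only the earliest return per state per episode.
            if first_visit and state in seen:
                continue
            seen.add(state)
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# One episode that visits state "A" twice before reaching "B".
episode = [("A", 1.0), ("A", 1.0), ("B", 0.0)]
print(mc_prediction([episode], first_visit=True))   # {'A': 2.0, 'B': 0.0}
print(mc_prediction([episode], first_visit=False))  # {'A': 1.5, 'B': 0.0}
```

The two estimates differ only for states revisited within an episode: every-visit averages the returns 2.0 and 1.0 for "A", while first-visit keeps only 2.0.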
