@@ -7,6 +7,15 @@ It provides clean, minimal, and well-tested implementations of key reinforcement
 
 ---
 
+## 📑 Chapter Navigation
+
+- [Chapter 2: The RL Problem Formulation](./ch2_rl_formulation)
+- [Chapter 3: Multi-Armed Bandits](./ch3_multi_armed_bandits)
+- [Chapter 4: Dynamic Programming Approaches](./ch4_dynamic_programming)
+- [Chapter 5: Monte Carlo Methods](./ch5_monte_carlo)
+
+---
+
 ## 📂 Repository Structure
 
 ```
@@ -22,30 +31,35 @@ rl-fundamentals-code/
 ├─ ch3_multi_armed_bandits/    # Chapter 3: Multi-Armed Bandits
 │   ├─ bandits.py              # Bernoulli & Gaussian bandit environments
 │   ├─ epsilon_greedy.py       # Sample-average ε-greedy agent
-│   ├─ ucb.py                  # UCB1 agent (with tunable exploration constant)
-│   ├─ thompson.py             # Beta–Bernoulli Thompson Sampling agent
+│   ├─ ucb.py                  # UCB1 agent
+│   ├─ thompson.py             # Thompson Sampling (Beta–Bernoulli)
 │   ├─ experiments.py          # Run algorithms, generate regret plots
-│   ├─ plots/                  # Saved figures (regret_bernoulli.png, etc.)
+│   ├─ plots/                  # Saved figures
 │   └─ tests/                  # Regression tests (ordering, sublinear regret)
 │
 ├─ ch4_dynamic_programming/    # Chapter 4: Dynamic Programming Approaches
-│   ├─ gridworld.py            # 4x4 deterministic GridWorld MDP
+│   ├─ gridworld.py            # 4x4 deterministic GridWorld
 │   ├─ policy_evaluation.py    # Iterative policy evaluation
 │   ├─ policy_iteration.py     # Howard's policy iteration
 │   ├─ value_iteration.py      # Bellman optimality (value iteration)
-│   ├─ utils.py                # Uniform random + greedy helpers
+│   ├─ utils.py                # Random + greedy helpers
 │   ├─ examples/               # Run PI/VI demos
-│   └─ tests/                  # Pytest checks for DP convergence/optimality
+│   └─ tests/                  # Pytest checks for DP convergence
 │
-├─ utils/                      # Shared helper utilities (future use)
+├─ ch5_monte_carlo/            # Chapter 5: Monte Carlo Methods
+│   ├─ examples/
+│   │   ├─ mc_prediction_demo.py             # First-visit vs every-visit MC (two-state MDP)
+│   │   ├─ mc_control_es_gridworld.py        # MC control with Exploring Starts
+│   │   ├─ mc_control_onpolicy_gridworld.py  # On-policy MC control with ε-soft policies
+│   │   └─ mc_offpolicy_is_demo.py           # Off-policy IS: ordinary vs weighted
+│   └─ tests/
+│       ├─ test_mc_control.py                # GridWorld control tests (MC-ES & on-policy)
+│       └─ test_offpolicy_is.py              # Off-policy IS variance checks
 │
+├─ utils/                      # Shared helper utilities (future use)
 ├─ .github/workflows/          # CI: runs pytest on every push/PR
 │   └─ python-tests.yml
-│
-├─ requirements.txt            # Global dependencies (numpy, matplotlib, pytest)
-├─ requirements_ch2.txt        # Chapter 2–specific dependencies
-├─ requirements_ch3.txt        # Chapter 3–specific dependencies
-├─ requirements_ch4.txt        # Chapter 4–specific dependencies (optional)
+├─ requirements.txt            # Global dependencies
 └─ README.md                   # Project overview + usage
 ```
 
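For readers skimming the tree above, the sample-average ε-greedy agent in `epsilon_greedy.py` can be illustrated with a minimal self-contained sketch (hypothetical function name and a toy Bernoulli bandit, not the repository's code):

```python
import random

def epsilon_greedy_bandit(probs, steps=5000, epsilon=0.1, seed=0):
    """Sample-average epsilon-greedy on a Bernoulli bandit (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # pulls per arm
    values = [0.0] * k  # sample-average reward estimates Q(a)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                         # explore
        else:
            arm = max(range(k), key=lambda a: values[a])   # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental sample-average update: Q <- Q + (r - Q) / n
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With these arm probabilities, the agent should pull arm 2 (mean 0.8) far more often than the others, and its estimate for that arm should settle near 0.8.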
@@ -102,6 +116,12 @@ Run only Chapter 4 tests:
 python -m pytest -q ch4_dynamic_programming/tests
 ```
 
+Run only Chapter 5 tests:
+
+```bash
+python -m pytest -q ch5_monte_carlo/tests
+```
+
 ---
 
 ## 🧪 Examples
@@ -130,6 +150,30 @@ Run Value Iteration demo (Chapter 4):
 python -m ch4_dynamic_programming.examples.run_value_iteration
 ```
 
+Run MC prediction demo (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_prediction_demo
+```
+
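To give a feel for what the MC prediction demo computes, here is a minimal sketch of first-visit MC evaluation on a hypothetical two-state chain (start in A, stay with probability 0.5 for reward 0, otherwise move to terminal B for reward 1); this is an editorial illustration, not the repository's demo:

```python
import random

def run_episode(rng, p_stay=0.5):
    """Generate one episode of the two-state chain: rewards until termination."""
    rewards = []
    while True:
        if rng.random() < p_stay:
            rewards.append(0.0)   # stay in A
        else:
            rewards.append(1.0)   # move to terminal B, episode ends
            return rewards

def first_visit_mc_value(num_episodes=20000, gamma=0.9, seed=0):
    """Estimate V(A) by averaging returns from the first visit to A (t=0 here)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_episodes):
        rewards = run_episode(rng)
        G = 0.0
        for r in reversed(rewards):   # accumulate return backwards
            G = gamma * G + r
        total += G
    return total / num_episodes

v_a = first_visit_mc_value()
```

For this chain the exact value is 0.5 / (1 - 0.45) ≈ 0.909, so the Monte Carlo estimate should land close to that.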
+Run MC control with Exploring Starts (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_control_es_gridworld
+```
+
+Run on-policy MC control with ε-soft policies (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_control_onpolicy_gridworld
+```
+
+Run off-policy IS demo (Chapter 5):
+
+```bash
+python -m ch5_monte_carlo.examples.mc_offpolicy_is_demo
+```
+
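The off-policy IS demo above contrasts ordinary and weighted importance sampling. A minimal self-contained sketch of that contrast on a hypothetical one-step problem (behavior policy uniform over two actions, target policy always taking action 0, which yields reward 1, so the true target value is 1.0); this is not the repository's code:

```python
import random

def estimate_offpolicy_value(num_samples=10000, seed=0):
    """Ordinary vs weighted importance sampling (illustrative sketch)."""
    rng = random.Random(seed)
    num = 0.0  # sum of rho * G over samples
    den = 0.0  # sum of rho (normalizer for weighted IS)
    for _ in range(num_samples):
        a = rng.randrange(2)                  # sample action from behavior policy b
        g = 1.0 if a == 0 else 0.0            # observed return under that action
        rho = (1.0 if a == 0 else 0.0) / 0.5  # importance ratio pi(a) / b(a)
        num += rho * g
        den += rho
    ordinary = num / num_samples              # unbiased, higher variance
    weighted = num / den if den > 0 else 0.0  # biased, lower variance
    return ordinary, weighted

ordinary, weighted = estimate_offpolicy_value()
```

In this toy case the weighted estimate equals exactly 1.0 (every retained sample has return 1), while the ordinary estimate fluctuates around 1.0, which is the variance gap the demo is meant to show.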
 ---
 
 ## ⚙️ Continuous Integration