 [![ch8](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch8.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch8.yml)
 [![ch9](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch9.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch9.yml)
 [![ch10](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch10.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch10.yml)
+[![ch11](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch11.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch11.yml)
 
 ---
 
@@ -28,24 +29,26 @@ It provides clean, minimal, and well-tested implementations of key reinforcement
 - [Chapter 8: Eligibility Traces and TD(λ)](./ch8_td_lambda)
 - [Chapter 9: Model-Based RL and Planning](./ch9_model_based_planning)
 - [Chapter 10: Function Approximation Basics](./ch10_function_approx)
+- [Chapter 11: Policy Gradient Fundamentals (REINFORCE)](./ch11_policy_gradient)
 
 
 ---
 
 ## 📊 Chapter Progress
 
-| Chapter | Title                           | Status        | Notes                                                  |
-|---------|---------------------------------|---------------|--------------------------------------------------------|
-| 1       | Introduction                    | ✅ Complete   | Book only (no code needed)                             |
-| 2       | The RL Problem Formulation      | ✅ Complete   | GridWorld, evaluation, policies, examples              |
-| 3       | Multi-Armed Bandits             | ✅ Complete   | Bandit envs, ε-greedy, UCB, Thompson                   |
-| 4       | Dynamic Programming Approaches  | ✅ Complete   | Policy Iteration, Value Iteration                      |
-| 5       | Monte Carlo Methods             | ✅ Complete   | Prediction, Control, On/Off-Policy                     |
-| 6       | Temporal-Difference Learning    | ✅ Complete   | TD(0), n-step TD, prediction examples                  |
-| 7       | TD Control                      | ✅ Complete   | SARSA, Q-learning, Cliff-Walking, exploration          |
-| 8       | Eligibility Traces and TD(λ)    | ✅ Complete   | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos    |
-| 9       | Model-Based RL and Planning     | ✅ Complete   | Dyna-Q, planning with rollouts, gridworld demos        |
-| 10      | Function Approximation Basics   | ✅ Complete   | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
+| Chapter | Title                            | Status        | Notes                                                  |
+|---------|----------------------------------|---------------|--------------------------------------------------------|
+| 1       | Introduction                     | ✅ Complete   | Book only (no code needed)                             |
+| 2       | The RL Problem Formulation       | ✅ Complete   | GridWorld, evaluation, policies, examples              |
+| 3       | Multi-Armed Bandits              | ✅ Complete   | Bandit envs, ε-greedy, UCB, Thompson                   |
+| 4       | Dynamic Programming Approaches   | ✅ Complete   | Policy Iteration, Value Iteration                      |
+| 5       | Monte Carlo Methods              | ✅ Complete   | Prediction, Control, On/Off-Policy                     |
+| 6       | Temporal-Difference Learning     | ✅ Complete   | TD(0), n-step TD, prediction examples                  |
+| 7       | TD Control                       | ✅ Complete   | SARSA, Q-learning, Cliff-Walking, exploration          |
+| 8       | Eligibility Traces and TD(λ)     | ✅ Complete   | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos    |
+| 9       | Model-Based RL and Planning      | ✅ Complete   | Dyna-Q, planning with rollouts, gridworld demos        |
+| 10      | Function Approximation Basics    | ✅ Complete   | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
+| 11      | Policy Gradient Fundamentals     | ✅ Complete   | REINFORCE, baselines, softmax & Gaussian policies      |
 
 ---
 
@@ -61,6 +64,8 @@ rl-fundamentals-code/
 ├─ ch7_td_control/            # Chapter 7
 ├─ ch8_td_lambda/             # Chapter 8
 ├─ ch9_model_based_planning/  # Chapter 9
+├─ ch10_function_approx/      # Chapter 10
+├─ ch11_policy_gradient/      # Chapter 11
 ├─ utils/
 └─ .github/workflows/
 ```
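The newly added Chapter 11 covers REINFORCE with baselines and softmax/Gaussian policies. As a rough illustration of what that chapter implements (not the repository's actual code — the function names `softmax`, `discounted_returns`, and `reinforce_update` below are hypothetical), a minimal softmax-policy REINFORCE update can be sketched as:

```python
import numpy as np


def softmax(prefs):
    """Softmax policy over action preferences (numerically stable)."""
    z = prefs - prefs.max()
    e = np.exp(z)
    return e / e.sum()


def discounted_returns(rewards, gamma=1.0):
    """Return G_t for each step t, computed backwards through the episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def reinforce_update(theta, actions, rewards, lr=0.1, gamma=1.0):
    """One REINFORCE step per visited action: theta += lr * G_t * grad log pi(a_t).

    Illustrative sketch for a stateless (bandit-style) softmax policy;
    a baseline term would be subtracted from G_t to reduce variance.
    """
    for a, g in zip(actions, discounted_returns(rewards, gamma)):
        pi = softmax(theta)
        grad_log = -pi          # grad of log-softmax w.r.t. preferences ...
        grad_log[a] += 1.0      # ... is (one-hot(a) - pi)
        theta = theta + lr * g * grad_log
    return theta
```

After an episode in which the rewarded action was taken, the update shifts probability mass toward that action, e.g. `softmax(reinforce_update(np.zeros(2), actions=[1], rewards=[1.0]))[1]` exceeds 0.5.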