 [![ch8](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch8.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch8.yml)
 [![ch9](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch9.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch9.yml)
 [![ch10](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch10.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch10.yml)
+[![ch11](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch11.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch11.yml)
 
 ---
 
@@ -28,24 +29,26 @@ It provides clean, minimal, and well-tested implementations of key reinforcement
 - [Chapter 8: Eligibility Traces and TD(λ)](./ch8_td_lambda)
 - [Chapter 9: Model-Based RL and Planning](./ch9_model_based_planning)
 - [Chapter 10: Function Approximation Basics](./ch10_function_approx)
+- [Chapter 11: Policy Gradient Fundamentals (REINFORCE)](./ch11_policy_gradient)
 
 
 ---
 
 ## 📊 Chapter Progress
 
-| Chapter | Title                           | Status        | Notes                                                  |
-|---------|---------------------------------|---------------|--------------------------------------------------------|
-| 1       | Introduction                    | ✅ Complete   | Book only (no code needed)                             |
-| 2       | The RL Problem Formulation      | ✅ Complete   | GridWorld, evaluation, policies, examples              |
-| 3       | Multi-Armed Bandits             | ✅ Complete   | Bandit envs, ε-greedy, UCB, Thompson                   |
-| 4       | Dynamic Programming Approaches  | ✅ Complete   | Policy Iteration, Value Iteration                      |
-| 5       | Monte Carlo Methods             | ✅ Complete   | Prediction, Control, On/Off-Policy                     |
-| 6       | Temporal-Difference Learning    | ✅ Complete   | TD(0), n-step TD, prediction examples                  |
-| 7       | TD Control                      | ✅ Complete   | SARSA, Q-learning, Cliff-Walking, exploration          |
-| 8       | Eligibility Traces and TD(λ)    | ✅ Complete   | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos    |
-| 9       | Model-Based RL and Planning     | ✅ Complete   | Dyna-Q, planning with rollouts, gridworld demos        |
-| 10      | Function Approximation Basics   | ✅ Complete   | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
+| Chapter | Title                            | Status        | Notes                                                  |
+|---------|----------------------------------|---------------|--------------------------------------------------------|
+| 1       | Introduction                     | ✅ Complete   | Book only (no code needed)                             |
+| 2       | The RL Problem Formulation       | ✅ Complete   | GridWorld, evaluation, policies, examples              |
+| 3       | Multi-Armed Bandits              | ✅ Complete   | Bandit envs, ε-greedy, UCB, Thompson                   |
+| 4       | Dynamic Programming Approaches   | ✅ Complete   | Policy Iteration, Value Iteration                      |
+| 5       | Monte Carlo Methods              | ✅ Complete   | Prediction, Control, On/Off-Policy                     |
+| 6       | Temporal-Difference Learning     | ✅ Complete   | TD(0), n-step TD, prediction examples                  |
+| 7       | TD Control                       | ✅ Complete   | SARSA, Q-learning, Cliff-Walking, exploration          |
+| 8       | Eligibility Traces and TD(λ)     | ✅ Complete   | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos    |
+| 9       | Model-Based RL and Planning      | ✅ Complete   | Dyna-Q, planning with rollouts, gridworld demos        |
+| 10      | Function Approximation Basics    | ✅ Complete   | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
+| 11      | Policy Gradient Fundamentals     | ✅ Complete   | REINFORCE, baselines, softmax & Gaussian policies      |
 
 ---
 
@@ -61,6 +64,8 @@ rl-fundamentals-code/
 ├─ ch7_td_control/            # Chapter 7
 ├─ ch8_td_lambda/             # Chapter 8
 ├─ ch9_model_based_planning/  # Chapter 9
+├─ ch10_function_approx/      # Chapter 10
+├─ ch11_policy_gradient/      # Chapter 11
 ├─ utils/
 └─ .github/workflows/
 ```
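The newly added Chapter 11 covers REINFORCE with baselines and softmax/Gaussian policies. As a rough illustration of what that chapter implements (not the repository's actual code — the function names `softmax`, `discounted_returns`, and `reinforce_update` below are hypothetical), a minimal softmax-policy REINFORCE update can be sketched as:

```python
import numpy as np


def softmax(prefs):
    """Softmax policy over action preferences (numerically stable)."""
    z = prefs - prefs.max()
    e = np.exp(z)
    return e / e.sum()


def discounted_returns(rewards, gamma=1.0):
    """Return G_t for each step t, computed backwards through the episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def reinforce_update(theta, actions, rewards, lr=0.1, gamma=1.0):
    """One REINFORCE step per visited action: theta += lr * G_t * grad log pi(a_t).

    Illustrative sketch for a stateless (bandit-style) softmax policy;
    a baseline term would be subtracted from G_t to reduce variance.
    """
    for a, g in zip(actions, discounted_returns(rewards, gamma)):
        pi = softmax(theta)
        grad_log = -pi          # grad of log-softmax w.r.t. preferences ...
        grad_log[a] += 1.0      # ... is (one-hot(a) - pi)
        theta = theta + lr * g * grad_log
    return theta
```

After an episode in which the rewarded action was taken, the update shifts probability mass toward that action, e.g. `softmax(reinforce_update(np.zeros(2), actions=[1], rewards=[1.0]))[1]` exceeds 0.5.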