Commit 00778b3

Update README with Chapter 11 (Policy Gradient REINFORCE)
1 parent 511a6f8 commit 00778b3

1 file changed

Lines changed: 17 additions & 12 deletions

README.md

@@ -10,6 +10,7 @@
 [![ch8](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch8.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch8.yml)
 [![ch9](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch9.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch9.yml)
 [![ch10](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch10.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch10.yml)
+[![ch11](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch11.yml/badge.svg)](https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code/actions/workflows/ch11.yml)
 
 ---
 
@@ -28,24 +29,26 @@ It provides clean, minimal, and well-tested implementations of key reinforcement
 - [Chapter 8: Eligibility Traces and TD(λ)](./ch8_td_lambda)
 - [Chapter 9: Model-Based RL and Planning](./ch9_model_based_planning)
 - [Chapter 10: Function Approximation Basics](./ch10_function_approx)
+- [Chapter 11: Policy Gradient Fundamentals (REINFORCE)](./ch11_policy_gradient)
 
 
 ---
 
 ## 📊 Chapter Progress
 
-| Chapter | Title | Status | Notes |
-|---------|---------------------------------|---------------|-----------------------------------------------------|
-| 1 | Introduction | ✅ Complete | Book only (no code needed) |
-| 2 | The RL Problem Formulation | ✅ Complete | GridWorld, evaluation, policies, examples |
-| 3 | Multi-Armed Bandits | ✅ Complete | Bandit envs, ε-greedy, UCB, Thompson |
-| 4 | Dynamic Programming Approaches | ✅ Complete | Policy Iteration, Value Iteration |
-| 5 | Monte Carlo Methods | ✅ Complete | Prediction, Control, On/Off-Policy |
-| 6 | Temporal-Difference Learning | ✅ Complete | TD(0), n-step TD, prediction examples |
-| 7 | TD Control | ✅ Complete | SARSA, Q-learning, Cliff-Walking, exploration |
-| 8 | Eligibility Traces and TD(λ) | ✅ Complete | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos |
-| 9 | Model-Based RL and Planning | ✅ Complete | Dyna-Q, planning with rollouts, gridworld demos |
-| 10 | Function Approximation Basics | ✅ Complete | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
+| Chapter | Title | Status | Notes |
+|---------|----------------------------------|---------------|-----------------------------------------------------|
+| 1 | Introduction | ✅ Complete | Book only (no code needed) |
+| 2 | The RL Problem Formulation | ✅ Complete | GridWorld, evaluation, policies, examples |
+| 3 | Multi-Armed Bandits | ✅ Complete | Bandit envs, ε-greedy, UCB, Thompson |
+| 4 | Dynamic Programming Approaches | ✅ Complete | Policy Iteration, Value Iteration |
+| 5 | Monte Carlo Methods | ✅ Complete | Prediction, Control, On/Off-Policy |
+| 6 | Temporal-Difference Learning | ✅ Complete | TD(0), n-step TD, prediction examples |
+| 7 | TD Control | ✅ Complete | SARSA, Q-learning, Cliff-Walking, exploration |
+| 8 | Eligibility Traces and TD(λ) | ✅ Complete | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos |
+| 9 | Model-Based RL and Planning | ✅ Complete | Dyna-Q, planning with rollouts, gridworld demos |
+| 10 | Function Approximation Basics | ✅ Complete | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
+| 11 | Policy Gradient Fundamentals | ✅ Complete | REINFORCE, baselines, softmax & Gaussian policies |
 
 ---
 
@@ -61,6 +64,8 @@ rl-fundamentals-code/
 ├─ ch7_td_control/ # Chapter 7
 ├─ ch8_td_lambda/ # Chapter 8
 ├─ ch9_model_based_planning/ # Chapter 9
+├─ ch10_function_approx/ # Chapter 10
+├─ ch11_policy_gradient/ # Chapter 11
 ├─ utils/
 └─ .github/workflows/
 ```
