Saheb/rl-notebooks


🧠 RL Notebooks

A 17-notebook journey through Reinforcement Learning — from Bellman to AlphaZero to RLHF.

Learn by doing: each notebook is a workbook with guided # TODO scaffolds you fill in yourself. Solutions are included so you can check your work.
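To give a flavor of the workbook format, here is a hypothetical exercise cell. It is not taken from the repository: the function name, the discount factor, and the task are assumptions chosen only to illustrate a `# TODO` scaffold next to the solution you would compare it against.

```python
def discounted_return(rewards, gamma=0.99):
    """Return G_0 = sum over t of gamma**t * rewards[t]."""
    # TODO (exercise version): accumulate the return by walking the
    # rewards backwards, using G_t = r_t + gamma * G_{t+1}.
    # --- illustrative solution, as a completed notebook might show it ---
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# 1 + 0.5 * 1 + 0.25 * 1 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

In the exercise copy you would see only the docstring and the `# TODO` comment; the completed copy fills in the body.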

📖 Structure

| # | Title | Key Ideas | Colab | GitHub |
|---|---|---|---|---|
| | Act I: Mathematical Foundations | | | |
| 01 | The Game of Life | Agents, Environments, MDPs | notebook / exercise | notebook / exercise |
| 02 | Time Travel | Returns, Bellman Equation | notebook / exercise | notebook / exercise |
| 03 | The Spreadsheet of the Mind | Dynamic Programming, Value Iteration | notebook / exercise | notebook / exercise |
| | Act II: Value-Based | | | |
| 04 | Learning by Stumbling | Q-Learning, Explore/Exploit | notebook / exercise | notebook / exercise |
| 05 | Giving the AI Eyes | DQN, Replay Buffers | notebook / exercise | notebook / exercise |
| 06 | Brain Hacks | Double DQN, Prioritized Replay | notebook / exercise | notebook / exercise |
| | Act III: Policy-Based | | | |
| 07 | Throwing Away the Table | Policy Gradients, REINFORCE | notebook / exercise | notebook / exercise |
| 08 | The Player and the Coach | Actor-Critic | notebook / exercise | notebook / exercise |
| 09 | Stepping Carefully | PPO, GAE, Entropy Bonus | notebook / exercise | notebook / exercise |
| 10 | The Steering Wheel | Continuous Actions, PPO | notebook / exercise | notebook / exercise |
| | Act IV: Engineering | | | |
| 11 | The Clone Army | Distributed RL, A3C/IMPALA | notebook / exercise | notebook / exercise |
| 12 | Learning in the Dark | Offline RL, CQL | notebook / exercise | notebook / exercise |
| 13 | The Dream Machine | Model-Based RL, Dyna-Q | notebook / exercise | notebook / exercise |
| | Act V: LLM Alignment | | | |
| 14 | Slaying the Memory Monster | GRPO (DeepSeek) | notebook / exercise | notebook / exercise |
| 15 | The Great Bypass | DPO | notebook / exercise | notebook / exercise |
| | Act VI: Grandmasters | | | |
| 16 | The Infinite Curriculum | Self-Play, MCTS, AlphaZero | notebook / exercise | notebook / exercise |
| 17 | The Hidden Board | CFR, Poker | notebook / exercise | notebook / exercise |
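
As a taste of the Act I material, the Bellman optimality backup behind value iteration (notebook 03) can be sketched on a toy problem. The three-state chain MDP, the `transitions` layout, and the function name below are illustrative assumptions, not code from the repository.

```python
# Hypothetical deterministic chain MDP: states 0 and 1 are non-terminal,
# state 2 is terminal. transitions[state][action] = (next_state, reward).
transitions = {
    0: {"stay": (0, 0.0), "right": (1, 0.0)},
    1: {"stay": (1, 0.0), "right": (2, 1.0)},
}
GAMMA = 0.9


def value_iteration(transitions, gamma, terminal_states=(2,), tol=1e-8):
    """Repeat V(s) <- max_a [r + gamma * V(s')] until values stop changing."""
    values = {s: 0.0 for s in list(transitions) + list(terminal_states)}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup for a deterministic transition model.
            best = max(r + gamma * values[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration(transitions, GAMMA)
print(values)  # V(1) = 1.0 (one step from the reward), V(0) = 0.9 * V(1)
```

Terminal states keep a value of zero, so the only reward (1.0 for stepping from state 1 to state 2) is discounted once for each extra step needed to reach it.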

📂 Folders

  • notebooks/ — Complete notebooks with solutions filled in
  • exercises/ — Same notebooks with # TODO blocks for you to complete
  • utils/ — Shared plotting and environment helpers

🚀 Getting Started

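```bash
# Clone the repo (replace YOUR_USERNAME with the account that hosts your copy)
git clone https://github.com/YOUR_USERNAME/rl-notebooks.git
cd rl-notebooks

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate

# Install dependencies (requires uv; without it, use: pip install -r requirements.txt)
uv pip install -r requirements.txt

# Launch Jupyter
jupyter notebook
```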

Start with exercises/01_agents_envs_mdps.ipynb if you want to learn by doing, or notebooks/01_agents_envs_mdps.ipynb if you want to read the completed version.

🎯 Prerequisites

  • Python 3.10+
  • Basic comfort with NumPy
  • High-school math (we introduce all the RL math gently)
  • Curiosity 🙂
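
If you want to confirm the first two prerequisites before opening a notebook, a quick check along these lines may help. The helper name `check_prerequisites` is hypothetical and not part of the repository; it uses only the standard library and tests whether NumPy is importable, not which version is installed.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 10)):
    """Report whether the interpreter and NumPy meet the stated prerequisites."""
    return {
        # Compare the running interpreter against the minimum version.
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when the package cannot be located.
        "numpy_installed": importlib.util.find_spec("numpy") is not None,
    }


print(check_prerequisites())
```

Run it inside the activated virtual environment so that it inspects the interpreter and packages the notebooks will actually use.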

☁️ Google Colab

Every notebook works in Colab — click the badges in the table above to open any notebook or exercise directly.

📝 License

MIT
