RL study guide — foundations through RLHF, DPO, GRPO, RLVR, agentic RL, and offline RL. Hand-written CS294 notes, 19 lecture drafts, 5 tested exercises, citations that resolve.
Updated May 15, 2026 - Python
Comparison of n-armed bandit algorithms, with a simulation app
Interactive RL learning platform: 13 chapters from Sutton & Barto, 18K+ lines, fill-in-the-blank exercises with bilingual explanations. Bandits → DP → MC → TD → Policy Gradient → DQN → PPO → SAC → MARL → RLHF
Fork of ShangtongZhang/reinforcement-learning-an-introduction - Python implementations of algorithms from Sutton and Barto's RL textbook (2nd Edition)
Reinforcement learning algorithms with mathematical derivations and Sutton & Barto figure reproductions.