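Introductory Reinforcement Learning Experiments: Implemented from Scratch, Understood Line by Line

This is the companion experiment code for my study of Sutton & Barto's "Reinforcement Learning: An Introduction".
Each experiment is written by hand from scratch, without frameworks such as gym or stable-baselines, so that every line of code can be fully understood.
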
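| # | Experiment | Chapter | Core Concepts |
|---|---|---|---|
| 1 | Multi-armed bandit (ε-greedy) | Ch.2 | Exploration vs. exploitation, incremental updates, ε-greedy |
| 2 | Q-learning grid world | Ch.6 | TD learning, Q-table, ε-greedy policy, convergence |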
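
To make the concepts listed for experiment 1 concrete, here is a minimal sketch of an ε-greedy agent with incremental sample-average updates. It is not the contents of `bandit/epsilon_greedy.py`. The function name `run_bandit`, the Gaussian reward model with unit variance, and all parameter defaults are assumptions chosen for illustration, and it uses only the standard library so it can run on its own.

```python
import random


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run one epsilon-greedy agent on a Gaussian bandit (hypothetical setup).

    Returns the value estimates, the pull counts, and the average reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each arm
    n = [0] * k     # number of times each arm was pulled
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            a = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate.
            a = max(range(k), key=lambda i: q[i])

        # Assumed reward model: Gaussian around the arm's true mean.
        r = rng.gauss(true_means[a], 1.0)

        # Incremental update: Q <- Q + (R - Q) / N, which equals the
        # running sample average without storing past rewards.
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
        total += r

    return q, n, total / steps


if __name__ == "__main__":
    estimates, counts, avg_reward = run_bandit([0.0, 1.0, 3.0])
    print("estimates:", [round(v, 2) for v in estimates])
    print("pull counts:", counts)
    print("average reward:", round(avg_reward, 2))
```

Raising `epsilon` spends more pulls on exploration, which speeds up finding the best arm but lowers the long-run average reward, since a fixed fraction of pulls stays random.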
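
The concepts listed for experiment 2 can be sketched in the same spirit: a tabular Q-learning agent with an ε-greedy behavior policy on a small deterministic grid. This is an illustrative sketch, not the contents of `gridworld/q_learning.py`. The 4x4 layout, the start and goal cells, the reward of -1 per step, and the names `step`, `train`, and `greedy_path` are all assumptions.

```python
import random

# Assumed action set: up, down, left, right as (row, column) offsets.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, size):
    """Move one cell; bumping into a wall leaves the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    next_state = (min(max(row + d_row, 0), size - 1),
                  min(max(col + d_col, 0), size - 1))
    done = next_state == (size - 1, size - 1)  # goal: bottom-right corner
    return next_state, -1.0, done              # assumed reward: -1 per step


def train(size=4, episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Learn a Q-table with one-step Q-learning."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(size) for c in range(size)}
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # Epsilon-greedy behavior policy.
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = max(range(4), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action, size)
            # TD target bootstraps from the best next action (off-policy);
            # a terminal transition contributes no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_path(q, size=4, max_steps=50):
    """Follow the greedy policy from the start cell and record the cells."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action, size)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Initializing the table to zero is optimistic here because every true action value is negative, so untried actions look attractive and get explored even when `epsilon` is small.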
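
    # Install dependencies (only numpy + matplotlib are needed)
    pip install numpy matplotlib
    # Experiment 1: bandit
    python bandit/epsilon_greedy.py
    # Experiment 2: grid world
    python gridworld/q_learning.py

Liu Daihong (岱宗): first-year undergraduate, doing RL research with a team of classmates.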