@@ -12,18 +12,33 @@ It provides clean, minimal, and well-tested implementations of key reinforcement
 
 ```
 rl-fundamentals-code/
- ├─ ch2_rl_formulation/ # Chapter 2: The RL Problem Formulation
- │ ├─ gridworld.py # 4x4 GridWorld MDP
- │ ├─ evaluation.py # Policy evaluation, q_from_v
- │ ├─ policies.py # Greedy & ε-greedy policies
- │ ├─ examples/ # Numeric examples + GridWorld demo
- │ └─ tests/ # Pytest-based validation
- ├─ ch3_multi_armed_bandits/ # Chapter 3: Multi-Armed Bandits (placeholder)
- │ └─ tests/
- ├─ utils/ # Shared helper utilities
- ├─ .github/workflows/ # CI: runs PyTest on every push/PR
- ├─ requirements.txt # Global dependencies
- └─ README.md
+ ├─ ch2_rl_formulation/ # Chapter 2: The RL Problem Formulation
+ │ ├─ gridworld.py # 4x4 GridWorld MDP (tabular P,R builder)
+ │ ├─ evaluation.py # Policy evaluation, q_from_v(), greedy_from_q()
+ │ ├─ policies.py # Deterministic & ε-greedy policies
+ │ ├─ value_iteration.py # Bellman optimality, value iteration
+ │ ├─ examples/ # Numeric examples, GridWorld demo, plotting
+ │ └─ tests/ # Pytest-based checks for chapter numbers
+ │
+ ├─ ch3_multi_armed_bandits/ # Chapter 3: Multi-Armed Bandits
+ │ ├─ bandits.py # Bernoulli & Gaussian bandit environments
+ │ ├─ epsilon_greedy.py # Sample-average ε-greedy agent
+ │ ├─ ucb.py # UCB1 agent (with tunable exploration constant)
+ │ ├─ thompson.py # Beta–Bernoulli Thompson Sampling agent
+ │ ├─ experiments.py # Run algorithms, generate regret plots
+ │ ├─ plots/ # Saved figures (regret_bernoulli.png, etc.)
+ │ └─ tests/ # Regression tests (ordering, sublinear regret)
+ │
+ ├─ utils/ # Shared helper utilities (future use)
+ │
+ ├─ .github/workflows/ # CI: runs pytest on every push/PR
+ │ └─ python-tests.yml
+ │
+ ├─ requirements.txt # Global dependencies (numpy, matplotlib, pytest)
+ ├─ requirements_ch2.txt # Chapter 2–specific dependencies
+ ├─ requirements_ch3.txt # Chapter 3–specific dependencies
+ └─ README.md # Project overview + usage
+
 ```
 
 ---