This repository contains a collection of Reinforcement Learning (RL) algorithms implemented as Jupyter notebooks. It serves as a sandbox for experimenting with and understanding RL algorithms. The goal is to provide clean, easy-to-follow implementations with explanations that support learning and experimentation in RL.
Currently, the repository includes the following algorithms:
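- Policy Gradient with Baseline (`policy-gradient-baseline.ipynb`)
  - This notebook implements the vanilla policy gradient algorithm with a baseline. Subtracting the baseline from the observed returns reduces the variance of the gradient estimates and leads to more stable learning.
  - The baseline is typically a learned estimate of the state-value function V(s). Because it depends only on the state and not on the action taken, it lowers variance without biasing the gradient, which usually speeds up convergence.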
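- Trust Region Policy Optimization (TRPO) (`trpo.ipynb`)
  - This notebook demonstrates TRPO, which constrains each policy update so that the average KL divergence between the old and new policies stays within a small trust region. This keeps updates stable and, under the assumptions of its theory, approximately monotonic in performance.
  - TRPO is known for reliable learning on complex tasks such as continuous-control locomotion, although each update costs more computation than a vanilla policy gradient step.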
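The policy-gradient-with-baseline update described above can be sketched in NumPy. This is a minimal sketch under assumptions that are not taken from `policy-gradient-baseline.ipynb`: a tabular softmax policy with logits `theta[state, action]`, a tabular baseline `values[state]` that is moved toward observed returns, and hypothetical helper names (`discounted_returns`, `reinforce_with_baseline_update`). Like many practical implementations, it omits the `gamma**t` weighting factor on each step's gradient term.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()


def reinforce_with_baseline_update(theta, values, states, actions, rewards,
                                   gamma=0.99, lr_policy=0.1, lr_value=0.1):
    """One policy-gradient-with-baseline update from a single episode.

    theta:  (n_states, n_actions) logits of a tabular softmax policy (assumed setup)
    values: (n_states,) tabular baseline V(s), updated in place
    """
    states = np.asarray(states)
    returns = discounted_returns(rewards, gamma)
    # Advantage estimate: return minus the baseline *before* this episode's update.
    advantages = returns - values[states]

    grad = np.zeros_like(theta)
    for s, a, adv in zip(states, actions, advantages):
        # For a softmax policy, d log pi(a|s) / d theta[s] = onehot(a) - pi(.|s).
        grad_log_pi = -softmax(theta[s])
        grad_log_pi[a] += 1.0
        grad[s] += adv * grad_log_pi

    theta += lr_policy * grad                      # gradient ascent on expected return
    for s, g in zip(states, returns):
        values[s] += lr_value * (g - values[s])    # move the baseline toward observed returns
    return theta, values
```

For a one-state, two-action bandit in which only action 1 is rewarded, repeated calls shift probability toward action 1. In a neural-network version, the same advantage-weighted log-probability would typically be written as a loss and differentiated automatically rather than by the closed-form softmax gradient used here.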
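The TRPO update described above can be sketched in NumPy. This sketch assumes the caller supplies the surrogate gradient `g`, a Fisher-vector-product function `fvp`, and callables for the surrogate objective and the mean KL divergence; these names are hypothetical and not taken from `trpo.ipynb`. It follows the common recipe: solve F x = g with conjugate gradient, scale the resulting direction so the quadratic KL approximation equals the trust-region radius `max_kl`, then backtrack until the exact KL constraint holds and the surrogate improves.

```python
import numpy as np


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = np.zeros_like(g)
    r = g.copy()            # residual g - F x (x starts at 0)
    p = r.copy()
    rs_old = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def trpo_step(theta, g, fvp, surrogate, kl, max_kl=0.01,
              backtrack=0.5, max_backtracks=10):
    """Return parameters after one TRPO-style step (hypothetical interface).

    g:         gradient of the surrogate objective at theta
    fvp:       callable v -> F v, with F the Fisher matrix at theta
    surrogate: callable theta -> surrogate objective (to maximize)
    kl:        callable theta -> mean KL(pi_old || pi_theta)
    """
    step_dir = conjugate_gradient(fvp, g)                 # approx. F^{-1} g
    shs = step_dir @ fvp(step_dir)                        # s^T F s
    full_step = np.sqrt(2.0 * max_kl / shs) * step_dir    # 0.5 * step^T F step == max_kl
    old_obj = surrogate(theta)
    for i in range(max_backtracks):
        candidate = theta + (backtrack ** i) * full_step
        # Accept the first candidate that satisfies the exact constraint and improves.
        if kl(candidate) <= max_kl and surrogate(candidate) > old_obj:
            return candidate
    return theta    # no acceptable step found; keep the old policy
```

Practical implementations usually compute `fvp` by differentiating the KL divergence twice with automatic differentiation instead of forming F explicitly, which is why the solver only accesses F through matrix-vector products.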
Make sure you have the following installed:
- Python 3.x
- Jupyter Notebook
- Recommended: a virtual environment tool such as `venv` or `conda`.