RL Sandbox

This repository contains a collection of Reinforcement Learning (RL) algorithms implemented as Jupyter notebooks. It is a sandbox for experimenting with RL algorithms and understanding how they work. The goal is to provide clean, easy-to-follow implementations with explanations.

Overview

Currently, the repository includes the following algorithms:

Policy Gradient with Baseline (policy-gradient-baseline.ipynb)
- This notebook implements the vanilla policy gradient algorithm with a baseline. Subtracting a baseline from the return reduces the variance of the gradient estimates without biasing them, which leads to more stable learning.
- The baseline is typically the state-value function; the lower-variance updates usually speed up convergence.
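
The variance reduction described above can be checked on a toy problem. The following is a minimal sketch, not the notebook's implementation: it assumes a hypothetical two-armed bandit (arm 0 pays about 10, arm 1 about 11) and a one-parameter Bernoulli policy, and the names `sample_gradient` and `gradient_stats` are made up for illustration. Because the bandit has a single state, the state-value baseline reduces to a constant, the expected reward under the current policy.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_gradient(theta, baseline, rng):
    """One-sample REINFORCE estimate of d/dtheta E[reward] on a two-armed bandit."""
    p = sigmoid(theta)  # probability of choosing arm 1
    action = 1 if rng.random() < p else 0
    # Hypothetical reward model (an assumption of this sketch).
    reward = (11.0 if action == 1 else 10.0) + rng.gauss(0.0, 1.0)
    grad_log_prob = action - p  # d/dtheta log pi(action) for a Bernoulli policy
    return (reward - baseline) * grad_log_prob


def gradient_stats(theta, baseline, n_samples=20000, seed=0):
    """Empirical mean and variance of the gradient estimate."""
    rng = random.Random(seed)
    samples = [sample_gradient(theta, baseline, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.variance(samples)


theta = 0.0
# No baseline: the raw reward scales the score function.
mean_plain, var_plain = gradient_stats(theta, baseline=0.0)
# With theta = 0 both arms are equally likely, so the expected reward is 10.5.
mean_base, var_base = gradient_stats(theta, baseline=10.5)

print(f"no baseline:   mean={mean_plain:.3f}  variance={var_plain:.3f}")
print(f"with baseline: mean={mean_base:.3f}  variance={var_base:.3f}")
```

Both estimators target the same gradient, which for this toy setup is p(1 - p)(11 - 10) = 0.25 at theta = 0. Only the spread of the samples differs, and that spread is what the baseline is meant to shrink.
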
Trust Region Policy Optimization (TRPO) (trpo.ipynb)
- This notebook demonstrates the TRPO algorithm, which constrains each policy update to a trust region (a bound on the KL divergence between the old and new policies) so that updates stay stable. In theory this yields monotonic improvement; the practical algorithm only approximates that guarantee.
- TRPO is generally regarded as a reliable choice in complex RL environments.
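
To make the trust region idea concrete, here is a simplified sketch of the acceptance test TRPO applies to a candidate update: shrink the step until the KL divergence from the old policy is within a bound and the surrogate objective improves. It is not the notebook's code and it is not full TRPO. It assumes a single-state categorical policy, uses the plain gradient as the search direction instead of the natural gradient obtained via conjugate gradient, and all function names and default values (`trust_region_step`, `max_kl=0.01`, and so on) are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def surrogate(old_probs, new_probs, advantages):
    """Importance-weighted advantage: E_{a ~ old}[(new(a) / old(a)) * A(a)]."""
    return sum(po * (pn / po) * adv
               for po, pn, adv in zip(old_probs, new_probs, advantages))


def trust_region_step(logits, advantages, max_kl=0.01, step=1.0,
                      shrink=0.5, max_backtracks=20):
    """Backtracking line search that enforces the KL trust region."""
    old_probs = softmax(logits)
    old_objective = surrogate(old_probs, old_probs, advantages)
    # Gradient of the surrogate w.r.t. the logits at the old policy:
    # p_i * (A_i - sum_j p_j * A_j).
    direction = [p * (adv - old_objective)
                 for p, adv in zip(old_probs, advantages)]
    for _ in range(max_backtracks):
        candidate = [l + step * d for l, d in zip(logits, direction)]
        new_probs = softmax(candidate)
        kl = kl_divergence(old_probs, new_probs)
        improved = surrogate(old_probs, new_probs, advantages) > old_objective
        if kl <= max_kl and improved:
            return candidate, kl
        step *= shrink  # step too large or not an improvement: back off
    return list(logits), 0.0  # no acceptable step found; keep the old policy


# Toy usage: three actions with made-up advantage estimates.
new_logits, kl = trust_region_step([0.0, 0.0, 0.0], [1.0, 0.0, -1.0])
new_probs = softmax(new_logits)
print("new action probabilities:", [round(p, 3) for p in new_probs])
print("KL(old || new):", round(kl, 5))
```

The accepted step shifts probability toward the action with the highest advantage while keeping the new policy within the KL bound of the old one. A full implementation would also estimate advantages from rollouts and compute the step direction with a Fisher-vector product.
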

Getting Started

Prerequisites

Make sure you have the following installed:

- Python 3.x
- Jupyter Notebook
- Recommended: a virtual environment such as venv or conda
