This guide covers setting up the reinforcement learning development environment using either native Python or Docker.
The setup.sh script automatically detects your GPU and guides you through setup:
./setup.sh
Direct modes:
./setup.sh native # Native Python (auto-detect GPU)
./setup.sh docker # Docker container (auto-detect GPU)
./setup.sh native cpu # Force CPU-only
./setup.sh docker cuda # Force CUDA
./setup.sh docker rocm # Force ROCm (AMD GPUs)
Native Python prerequisites:
- Python 3.11 or later
- pip and venv
- (Optional) NVIDIA CUDA drivers or AMD ROCm drivers
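This guide doesn't spell out how setup.sh detects your GPU. One plausible heuristic is to check whether the vendor CLI tools (nvidia-smi for NVIDIA, rocminfo for ROCm) are on PATH. The Python sketch below works that way and also checks the Python 3.11 prerequisite. The function names are hypothetical, and the real script may use different checks.

```python
import shutil
import sys
from typing import Callable, Optional


def detect_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess which PyTorch variant to install: 'cuda', 'rocm', or 'cpu'.

    Assumption: a vendor CLI tool on PATH signals a usable GPU stack.
    This is a heuristic sketch, not necessarily what setup.sh does.
    """
    if which("nvidia-smi"):
        return "cuda"
    if which("rocminfo"):
        return "rocm"
    return "cpu"


def python_is_supported(version: tuple = tuple(sys.version_info[:2])) -> bool:
    """Check the 'Python 3.11 or later' prerequisite."""
    return version >= (3, 11)


if __name__ == "__main__":
    print(f"Suggested backend: {detect_backend()}")
    print(f"Python {sys.version.split()[0]} supported: {python_is_supported()}")
```

The `which` parameter can be injected, so the detection logic can be exercised on machines without any GPU tooling.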
- Create virtual environment:
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
- Install base dependencies:
pip install -r requirements/requirements-base.txt
- Install PyTorch (choose one):
CPU-only (lightweight, ~500MB):
pip install -r requirements/requirements-torch-cpu.txt
NVIDIA CUDA (requires an NVIDIA driver that supports CUDA 12.x):
pip install -r requirements/requirements-torch-cuda.txt
AMD ROCm (requires ROCm 6.x drivers):
pip install -r requirements/requirements-torch-rocm.txt
- Verify installation:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
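The one-liner above reports CUDA availability. On ROCm builds, though, torch.cuda.is_available() also returns True, so that check alone cannot tell the two GPU backends apart. A slightly longer check can inspect torch.version.hip and torch.version.cuda instead. This is a sketch, not a script shipped with this repo, and it degrades gracefully when PyTorch isn't installed.

```python
import importlib.util


def describe_torch() -> dict:
    """Summarize the installed PyTorch build without failing if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "backend": None, "gpu_available": False}

    import torch  # imported lazily so this runs on machines without torch

    # ROCm builds set torch.version.hip and still expose GPUs via torch.cuda.
    if getattr(torch.version, "hip", None):
        backend = "rocm"
    elif torch.version.cuda:
        backend = "cuda"
    else:
        backend = "cpu"

    gpu = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "backend": backend,
        "gpu_available": gpu,
        "device": torch.cuda.get_device_name(0) if gpu else None,
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```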
Docker prerequisites:
- Docker Engine 20.10+
- Docker Compose v2
- (Optional) NVIDIA Container Toolkit for CUDA
- (Optional) ROCm drivers for AMD GPUs
- CPU-only (lightweight, ~500MB base):
bash docker/run.sh cpu
- NVIDIA CUDA (requires the NVIDIA Container Toolkit):
bash docker/run.sh cuda
- AMD ROCm (requires ROCm drivers):
bash docker/run.sh rocm
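docker/run.sh accepts the same cpu/cuda/rocm mode names, but its internals aren't shown here. Assuming it maps each mode to a Compose service (the drl-cpu and drl-cuda service names appear elsewhere in this guide), the mapping could be sketched in Python as follows. The drl-rocm service name is a guess, and compose_run_cmd is a hypothetical helper.

```python
import shlex

COMPOSE_FILE = "docker/docker-compose.yml"
# drl-cpu and drl-cuda appear in this guide; drl-rocm is an assumed name.
SERVICES = {"cpu": "drl-cpu", "cuda": "drl-cuda", "rocm": "drl-rocm"}


def compose_run_cmd(mode: str, *command: str) -> list[str]:
    """Build a `docker compose run --rm` command for the given mode."""
    if mode not in SERVICES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SERVICES)}")
    return ["docker", "compose", "-f", COMPOSE_FILE, "run", "--rm", SERVICES[mode], *command]


if __name__ == "__main__":
    print(shlex.join(compose_run_cmd("cpu", "python", "--version")))
```

To actually launch the container, pass the returned list to subprocess.run(cmd, check=True). Building an argument list rather than a shell string avoids quoting problems with example paths and flags.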
Build and run specific containers:
# Build
docker compose -f docker/docker-compose.yml build drl-cpu
# Run interactively
docker compose -f docker/docker-compose.yml run --rm drl-cpu
# Run specific example
docker compose -f docker/docker-compose.yml run --rm drl-cpu \
python modules/module_01_intro/examples/bandit_epsilon_greedy.py
Verify the environment:
python --version
pip list | grep -E "(torch|numpy|gymnasium|rich)"
NVIDIA CUDA:
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"
AMD ROCm:
rocminfo
python -c "import torch; print(torch.cuda.is_available())"  # ROCm builds also report GPUs through the torch.cuda API
Run an example:
# Simple bandit (NumPy only, works everywhere)
python modules/module_01_intro/examples/bandit_epsilon_greedy.py --arms 5 --steps 1000
# Deep RL example (requires PyTorch)
python modules/module_02_value_methods/examples/dqn_cartpole.py --episodes 50
Troubleshooting:
1. PyTorch not using GPU:
- Verify drivers: nvidia-smi or rocminfo
- Check that the installed CUDA/ROCm versions match what the PyTorch build requires
- Ensure the GPU variant of PyTorch (CUDA or ROCm) is installed, not the CPU-only build
2. Import errors in examples:
- Activate the virtual environment: source .venv/bin/activate
- Install missing packages: pip install -r requirements/requirements-base.txt
3. Docker GPU not accessible:
- NVIDIA: Install the NVIDIA Container Toolkit
- AMD: Ensure ROCm drivers are installed and /dev/kfd and /dev/dri are accessible
4. Python 3.13 compatibility:
- PyTorch wheels may not be available for Python 3.13
- Use Python 3.11 or Docker containers instead
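Import errors and Python version problems can be diagnosed together. The sketch below uses only the standard library. It flags interpreter versions named in this guide and lists modules that cannot be found in the active environment. The REQUIRED list mirrors the base dependencies this guide names. Treating torch, ray, and optuna as optional is an assumption drawn from the smoke-test description.

```python
import importlib.util
import sys

REQUIRED = ["numpy", "rich", "gymnasium", "tensorboard"]  # base dependencies
OPTIONAL = ["torch", "ray", "optuna"]                     # assumed optional extras


def python_warnings(version: tuple[int, int]) -> list[str]:
    """Flag interpreter versions this guide calls out."""
    warnings = []
    if version < (3, 11):
        warnings.append("Python 3.11 or later is required")
    if version >= (3, 13):
        warnings.append("PyTorch wheels may not be available; consider Python 3.11 or Docker")
    return warnings


def missing_modules(names: list[str]) -> list[str]:
    """Return the modules that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for warning in python_warnings(sys.version_info[:2]):
        print(f"warning: {warning}")
    print("missing required:", missing_modules(REQUIRED) or "none")
    print("missing optional:", missing_modules(OPTIONAL) or "none")
    print("interpreter:", sys.executable)  # should point inside .venv when activated
```

If the printed interpreter path is not inside .venv, the virtual environment is probably not activated. That is the most common cause of "missing required" entries.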
Tips:
Native environment:
- Use a virtual environment to avoid dependency conflicts
- Install only the PyTorch variant you need (CPU/CUDA/ROCm)
- Consider using mamba instead of pip for faster installs
Docker environment:
- Use volume caching for faster pip installs (already configured)
- Keep containers running with docker compose up -d for repeated use
- Limit GPU visibility:
NVIDIA_VISIBLE_DEVICES=0 docker compose run drl-cuda
requirements/
├── requirements-base.txt # Core dependencies (NumPy, Rich, Gymnasium, TensorBoard)
├── requirements-torch-cpu.txt # PyTorch CPU-only
├── requirements-torch-cuda.txt # PyTorch with CUDA support
└── requirements-torch-rocm.txt # PyTorch with ROCm support
Base dependencies (all environments):
- numpy>=1.24: Numerical computing
- rich>=13.7: CLI formatting
- gymnasium[classic-control]>=1.0: RL environments
- tensorboard>=2.16: Logging and metrics
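To confirm that an existing environment meets these minimums without reinstalling anything, you can use the standard library's importlib.metadata. The minimal sketch below does this. Its version parsing is deliberately simplified: it compares only the leading numeric release segment and ignores pre-release tags and local suffixes such as "+cpu". If packaging.version is available, it is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed for requirements-base.txt in this guide.
MINIMUMS = {"numpy": "1.24", "rich": "13.7", "gymnasium": "1.0", "tensorboard": "2.16"}


def release_tuple(v: str) -> tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2.0.0+cpu' -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(installed: str, minimum: str) -> bool:
    """Compare releases after zero-padding, so '1' counts as equal to '1.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums: dict[str, str]) -> dict[str, str]:
    """Map each package to 'ok', 'too old (X)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_minimums(MINIMUMS).items():
        print(f"{name:12} {status}")
```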
PyTorch variants:
- CPU: Lightweight, ~500MB, works everywhere
- CUDA: NVIDIA GPUs, requires an NVIDIA driver that supports CUDA 12.x
- ROCm: AMD GPUs, requires ROCm 6.x drivers
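Because the variants live in separate files, a complete environment consists of the base file plus exactly one PyTorch file. The hypothetical helper below (not part of the repo) builds that pip command. It uses sys.executable -m pip so that packages are installed into whichever interpreter runs it, which is the venv when it is active.

```python
import sys

REQ_DIR = "requirements"
TORCH_REQS = {
    "cpu": "requirements-torch-cpu.txt",
    "cuda": "requirements-torch-cuda.txt",
    "rocm": "requirements-torch-rocm.txt",
}


def pip_install_cmd(backend: str) -> list[str]:
    """pip command installing the base deps plus one PyTorch variant."""
    try:
        torch_file = TORCH_REQS[backend]
    except KeyError:
        raise ValueError(f"backend must be one of {sorted(TORCH_REQS)}") from None
    return [
        sys.executable, "-m", "pip", "install",
        "-r", f"{REQ_DIR}/requirements-base.txt",
        "-r", f"{REQ_DIR}/{torch_file}",
    ]


if __name__ == "__main__":
    print(" ".join(pip_install_cmd("cpu")))
```

Passing the list to subprocess.run(cmd, check=True) performs the install. pip accepts multiple -r options in a single invocation.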
After setup is complete:
- Run smoke test:
python scripts/smoke_test.py
The runner reports detected versions of PyTorch, Ray, and Optuna up front, and will gracefully skip optional checks (for example, Box2D-based environments) if those extras aren’t installed. Use --core-only or --skip-optional for the quickest validation loops.
- Try examples:
# Module 1: Multi-armed bandits
python modules/module_01_intro/examples/bandit_epsilon_greedy.py
python modules/module_01_intro/examples/bandit_ucb.py  # UCB exploration strategy
# Module 2: Deep Q-Learning
python modules/module_02_value_methods/examples/dqn_cartpole.py
- Read module content:
  - Start with modules/module_01_intro/content.md
  - Each module has theory + runnable examples
- Experiment:
  - Modify hyperparameters using CLI flags
  - Add --help to any example to list its options
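If you want a feel for what the Module 1 bandit examples do before opening them, here is a self-contained sketch of epsilon-greedy and UCB1 action selection on a simulated bandit with Gaussian rewards. It is not the repository's bandit_epsilon_greedy.py or bandit_ucb.py, and it uses only the standard library rather than NumPy. The reward model, the arm means, and the names run_bandit and strategy are illustrative assumptions.

```python
import math
import random


def epsilon_greedy(values: list[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon explore a random arm, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)


def ucb1(values: list[float], counts: list[int], t: int, c: float = 2.0) -> int:
    """UCB1: try each arm once, then maximize estimate + sqrt(c * ln t / N)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]))


def run_bandit(true_means: list[float], steps: int, epsilon: float = 0.1,
               strategy: str = "epsilon", seed: int = 0) -> tuple[list[float], list[int]]:
    """Simulate a Gaussian bandit and return (value estimates, pull counts)."""
    rng = random.Random(seed)
    k = len(true_means)
    values, counts = [0.0] * k, [0] * k
    for t in range(1, steps + 1):
        if strategy == "ucb":
            arm = ucb1(values, counts, t)
        else:
            arm = epsilon_greedy(values, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


if __name__ == "__main__":
    means = [0.1, 0.5, 0.9, 0.3, 0.7]  # 5 illustrative arms
    for strategy in ("epsilon", "ucb"):
        _, counts = run_bandit(means, steps=1000, strategy=strategy)
        print(strategy, "pulls per arm:", counts)
```

The incremental update avoids storing every reward. Comparing the pull counts of the two strategies shows how epsilon-greedy spends a fixed share of steps exploring, while UCB1 shrinks its exploration bonus as arms are sampled more.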