Codebase for offline DQN-style control experiments comparing the SOAP and AdamW optimizers on pendulum dynamics. This repository implements the SOAP optimizer from *SOAP: Improving and Stabilizing Shampoo using Adam* (arXiv:2409.11321) and applies it to neural fitted Q-iteration tasks.
- SOAP Optimizer: Approximate Gauss-Newton training
- Training Framework: Discrete-action control and Q-iteration training using Neuromancer and PyTorch
- Dynamics: Inverted Pendulum and Acrobot systems modeled as ODEs
- Reproducibility: Experiment sweeps with multiple seeds and W&B integration for tracking results
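The dynamics used here are continuous-time ODE models. As a rough illustration only (the actual classes and integrator live in `pendulum.py` and build on Neuromancer), a frictionless inverted pendulum with angle measured from upright follows $\dot\theta = \omega$, $\dot\omega = (g/l)\sin\theta + u/(ml^2)$, and can be stepped with forward Euler:

```python
import math

def pendulum_rhs(theta, omega, u, g=9.81, m=1.0, l=1.0):
    """Right-hand side of the frictionless inverted pendulum ODE.
    theta is measured from the upright position; u is the input torque."""
    dtheta = omega
    domega = (g / l) * math.sin(theta) + u / (m * l**2)
    return dtheta, domega

def euler_step(theta, omega, u, dt=0.01):
    """One forward-Euler integration step of the pendulum ODE."""
    dtheta, domega = pendulum_rhs(theta, omega, u)
    return theta + dt * dtheta, omega + dt * domega

# With zero torque the state drifts away from the unstable upright equilibrium
theta, omega = 0.1, 0.0
for _ in range(100):
    theta, omega = euler_step(theta, omega, u=0.0)
```

The repository's models are richer than this sketch (Acrobot and double-pendulum variants, discrete action inputs), but the ODE-plus-integrator structure is the same.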
```
soapy-rl/
├── soap.py                 # SOAP optimizer implementation (https://arxiv.org/abs/2409.11321)
├── train.py                # Main training entry point for single runs
├── pendulum.py             # ODE dynamics: InvertedPendulum, Acrobot, DoubleInvertedPendulum, DoubleIntegrator
├── pyproject.toml          # Project metadata and dependencies
├── uv.lock                 # Reproducible dependency lock file (uv package manager)
├── scripts/
│   ├── training/
│   │   └── main_trainer.py # Multi-run sweep launcher (20 experiments: 10 seeds × 2 optimizers)
│   └── analysis/
│       ├── wandb_data.py   # Download W&B run histories to local pickle files
│       └── plot.py         # Generate comparison plots from processed run data
├── archive/                # Historical experiments (reference implementations)
├── data/
│   ├── raw/                # Seed data and raw inputs
│   └── processed/          # Generated processed datasets (.pkl files)
├── figures/                # Generated plot outputs
├── runs/                   # TensorBoard logs and checkpoints (not committed)
└── wandb/                  # W&B cache (not committed)
```
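The sweep launcher enumerates every (seed, optimizer) combination. The grid presumably looks something like the following sketch (the real `main_trainer.py` may construct its runs differently; the flag strings here mirror the `train.py` options documented below):

```python
import itertools

seeds = range(1, 11)             # 10 seeds
optimizers = ["soap", "adamw"]   # the two optimizers being compared

# One experiment per (seed, optimizer) pair: 10 × 2 = 20 runs
experiments = list(itertools.product(seeds, optimizers))

# Each pair would map to a single-run train.py invocation
commands = [
    f"python train.py --seed={s} --optimizer={opt}" for s, opt in experiments
]
```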
Installation is reproducible via `uv.lock`.

1. Install `uv` (if not already installed):

   ```
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Sync dependencies and create the virtual environment:

   ```
   uv sync
   ```

3. Activate the environment:

   ```
   source .venv/bin/activate
   ```

4. Run training:

   ```
   python train.py
   ```
```
# Default configuration (SOAP optimizer, 200 epochs, seed=1)
python train.py

# With custom parameters
python train.py --seed=42 --optimizer=adamw --learning_rate=0.001 --total_epochs=100
```

```
# Runs 20 experiments: 10 seeds × 2 optimizers (SOAP vs AdamW)
python scripts/training/main_trainer.py

# Download run histories as local pickle files
python scripts/analysis/wandb_data.py

# Create plots comparing optimizer performance
python scripts/analysis/plot.py
```

Flags accepted by `train.py`:

- `--seed`: Random seed (default: 1)
- `--optimizer`: Optimizer choice: `soap`, `adam`, `adamw`, `sgd`, `rmsprop`, `asgd` (default: `soap`)
- `--learning_rate`: Learning rate (default: 1e-4)
- `--total_epochs`: Number of training epochs (default: 200)
- `--batch_size`: Batch size for training (default: 512)
- `--num_data`: Size of initial dataset (default: 100000)
- `--track`: Enable Weights & Biases logging (default: True)
- `--wandb_project_name`: W&B project name (default: `cleanRL`)
- `--save_model`: Save trained model to disk (default: True)
- `--env_id`: Environment name used in run naming (default: `pendulum`)
- `--cuda`: Use GPU if available (default: True)
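Since the repository uses tyro for CLI parsing, these flags most likely come from a config dataclass. A sketch of what that dataclass would look like, with the documented defaults (field names follow the flags; the real dataclass in `train.py` may carry additional fields):

```python
from dataclasses import dataclass

@dataclass
class Args:
    """Sketch of the train.py config; tyro.cli(Args) would parse sys.argv into it."""
    seed: int = 1
    optimizer: str = "soap"          # one of: soap, adam, adamw, sgd, rmsprop, asgd
    learning_rate: float = 1e-4
    total_epochs: int = 200
    batch_size: int = 512
    num_data: int = 100_000
    track: bool = True
    wandb_project_name: str = "cleanRL"
    save_model: bool = True
    env_id: str = "pendulum"
    cuda: bool = True

args = Args()  # defaults; on the command line, tyro would override these
```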
Experiment tracking is enabled by default and logged to W&B. To use this feature:
- Create a Weights & Biases account
- Authenticate locally:

  ```
  wandb login
  ```

- Runs will automatically sync to your W&B project (`cleanRL` by default)
To disable W&B logging:
```
python train.py --track=False
```

Outputs produced by a run:

- `runs/{run_name}/`: TensorBoard logs and model checkpoints
  - `{exp_name}.cleanrl_model`: Saved neural network weights
  - `events.out.tfevents.*`: TensorBoard event files (training curves, metrics)
- `figures/`: Generated comparison plots (matplotlib/plotly)
- `data/processed/`: Processed datasets (pickle files)
- `wandb/`: Local W&B cache (not committed to version control)
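The processed files under `data/processed/` are ordinary pickles, so loading one for custom analysis is standard. The dictionary keys below are hypothetical; the real payload is whatever `wandb_data.py` writes:

```python
import os
import pickle
import tempfile

# Hypothetical run-history payload; the actual keys come from wandb_data.py
history = {"step": [0, 1, 2], "loss": [1.0, 0.7, 0.5], "optimizer": "soap"}

path = os.path.join(tempfile.mkdtemp(), "example_run.pkl")
with open(path, "wb") as f:
    pickle.dump(history, f)          # what wandb_data.py does per run

with open(path, "rb") as f:
    loaded = pickle.load(f)          # what plot.py does per run file
```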
Key dependencies (see pyproject.toml for full list):
- PyTorch: `torch>=2.6.0` (deep learning framework)
- Neuromancer: `neuromancer>=1.5.6` (physics-informed ML and systems modeling)
- Gymnasium: `gymnasium>=1.2.2` (RL environment API)
- W&B: `wandb>=0.21.0` (experiment tracking)
- TensorBoard: `tensorboard>=2.20.0` (visualization)
- tyro: `tyro>=0.9.35` (CLI argument parsing)
- Reproducibility: The `uv.lock` file ensures exact reproducibility of dependencies. Always use `uv sync` to maintain consistency.
- Archive: The `archive/` directory contains historical experiments and alternative implementations, included for reference but not part of the primary release.
- Development Environment: Active TensorBoard logs and W&B caches are stored locally in `runs/` and `wandb/` but are not committed to version control.
- GPU: GPU acceleration is enabled by default if CUDA is available. To force CPU, use `--cuda=False`.
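Per-run determinism comes from the `--seed` flag. A stdlib-only sketch of the usual seeding pattern (in the actual training code, the same seed would presumably also be passed to `numpy` and `torch`):

```python
import random

def set_seed(seed: int) -> None:
    """Seed the RNGs in use. A full training script would also call
    numpy.random.seed(seed) and torch.manual_seed(seed) here (assumption)."""
    random.seed(seed)

set_seed(42)
a = [random.random() for _ in range(3)]
set_seed(42)
b = [random.random() for _ in range(3)]  # identical draws after reseeding
```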
If you use this code, please cite our paper:
```bibtex
@article{bestmckay2026errorwhitening,
  title={Error whitening: Why {Gauss-Newton} outperforms {Newton}},
  author={Best Mckay, Maricela and Lawrence, Nathan P and Wetton, Brian and Gopaluni, Bhushan},
  year={2026}
}
```