A beginner-friendly collection of optimizer implementations for learning and benchmarking.
The goal: Implement each optimizer yourself, from scratch, to deeply understand how it works. The repository provides detailed docstrings with math, pseudocode, and step-by-step implementation guides -- you fill in the actual code.
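Each stub follows the usual `torch.optim.Optimizer` pattern. Roughly (an illustrative skeleton, not the repo's exact template -- the class name and defaults here are assumptions):

```python
import torch


class YourOptimizer(torch.optim.Optimizer):
    """Docstring with the math, pseudocode, and a step-by-step implementation guide."""

    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        # This is the part you fill in: loop over param_groups,
        # read per-parameter state, and apply the update rule.
        raise NotImplementedError
```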
Recommended order (each builds on the previous):
| # | Optimizer | Difficulty | Key Concept | Paper |
|---|---|---|---|---|
| 1 | SGD + Momentum | Beginner | Velocity buffer, Nesterov lookahead | Sutskever 2013 |
| 2 | Adam / AdamW | Beginner | Adaptive learning rates, bias correction | Kingma 2014, Loshchilov 2017 |
| 3 | Lion | Beginner | sign() update, program search discovery | Chen 2023 |
| 4 | Muon | Intermediate | Newton-Schulz orthogonalization, polar decomposition | Jordan 2025 |
| 5 | Schedule-Free | Intermediate | Iterate averaging, no lr schedule needed | Defazio 2024 |
| 6 | Sophia | Advanced | Diagonal Hessian estimation, clipped updates | Liu 2023 |
| 7 | SOAP | Advanced | Kronecker preconditioners, eigenbasis rotation | Vyas 2024 |
| 8 | PSGD-Kron | Expert | Lie group preconditioner fitting | Li 2015 |
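Most of these key concepts can be prototyped in isolation before touching the optimizer itself. As one example, the orthogonalization at the heart of Muon (row 4 above) can be sketched with a plain cubic Newton-Schulz iteration -- a toy version for intuition only; Muon's published implementation uses a tuned quintic polynomial and runs in bfloat16, and the function name and step count below are mine:

```python
import torch


def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the orthogonal polar factor U @ V.T of a 2-D gradient/momentum matrix."""
    x = g / (g.norm() + 1e-7)        # scale so every singular value is <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                      # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
    for _ in range(steps):
        # each singular value s is mapped to 1.5*s - 0.5*s**3, which converges to 1
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x
```

In Muon, an iteration of this kind is applied to the 2-D momentum of each weight matrix, and the (approximately) orthogonalized result is used as the update direction.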
Repository layout:

    optimizer-zoo/
    ├── optimizers/              # YOUR IMPLEMENTATIONS GO HERE
    │   ├── __init__.py          # Registry (get_optimizer, list_optimizers)
    │   ├── sgd_momentum.py      # SGD + Momentum + Nesterov
    │   ├── adam.py              # Adam / AdamW
    │   ├── lion.py              # Lion
    │   ├── muon.py              # Muon
    │   ├── sophia.py            # Sophia-H
    │   ├── soap.py              # SOAP
    │   ├── schedule_free.py     # Schedule-Free AdamW
    │   └── psgd.py              # PSGD-Kron
    │
    ├── models/                  # Benchmark networks (pre-built)
    │   ├── mlp.py               # MLP for MNIST (~1 min)
    │   ├── resnet.py            # ResNet-18 for CIFAR-10 (~5 min)
    │   ├── vit.py               # ViT-Tiny for CIFAR-10 (~10 min)
    │   └── gpt2.py              # Small GPT-2 for WikiText-2 (~20 min)
    │
    ├── benchmarks/              # Training & comparison scripts
    │   ├── train.py             # Unified training entry point
    │   ├── compare.py           # Plot comparison charts
    │   └── configs/             # Default hyperparameters per task
    │
    ├── tests/                   # Correctness verification
    │   └── test_optimizers.py
    │
    └── scripts/
        └── run_all.sh           # Run all benchmarks for one optimizer
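The benchmark scripts construct optimizers through the registry in `optimizers/__init__.py`. Its exact signatures are defined there, but usage is presumably along these lines (`get_optimizer` returning a class is an assumption -- check the registry file):

```python
import torch
from optimizers import get_optimizer, list_optimizers  # registry in optimizers/__init__.py

model = torch.nn.Linear(10, 2)

print(list_optimizers())               # e.g. ['sgd_momentum', 'adam', 'lion', ...]

OptimizerCls = get_optimizer("adam")   # look the optimizer class up by name...
optimizer = OptimizerCls(model.parameters(), lr=1e-3)  # ...then construct it like any torch optimizer
```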
Install the repo in editable mode:

    cd optimizer-zoo
    pip install -e ".[dev]"

Step 1: Read the docstring in the optimizer file. It contains the math, pseudocode, and a step-by-step implementation guide.
Step 2: Implement the `step()` method (replace the `raise NotImplementedError`).
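For the very first optimizer, a filled-in `step()` looks roughly like this (a sketch of plain SGD with a momentum buffer; the stub in `optimizers/sgd_momentum.py` also asks for Nesterov, and its exact argument names may differ):

```python
import torch


class SGDMomentum(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-2, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            lr, mu = group["lr"], group["momentum"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "velocity" not in state:          # lazily create the per-parameter buffer
                    state["velocity"] = torch.zeros_like(p)
                v = state["velocity"]
                v.mul_(mu).add_(p.grad)              # v <- mu * v + g
                p.add_(v, alpha=-lr)                 # p <- p - lr * v
```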
Step 3: Test your implementation:
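Conceptually, a convergence test just checks that the optimizer can drive the loss of a tiny problem down. A hedged sketch of the idea (the real tests in `tests/test_optimizers.py` are the source of truth; `get_optimizer` returning a class is an assumption):

```python
import torch
from optimizers import get_optimizer


def check_convergence(optimizer_name: str, lr: float = 1e-2, steps: int = 200) -> None:
    """Minimize a simple quadratic and assert the loss drops by a large factor."""
    torch.manual_seed(0)
    x = torch.nn.Parameter(torch.randn(10))
    target = torch.randn(10)
    optimizer = get_optimizer(optimizer_name)([x], lr=lr)

    initial = ((x - target) ** 2).sum().item()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = ((x - target) ** 2).sum()
        loss.backward()
        optimizer.step()
    assert loss.item() < 0.1 * initial, f"{optimizer_name} failed to reduce the loss"
```

To run the real tests: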
    # Run convergence test for all implemented optimizers
    pytest tests/test_optimizers.py -v

    # Run test for a specific optimizer
    pytest tests/test_optimizers.py -v -k "adam"

Step 4: Benchmark against other optimizers:
    # Quick sanity check: MLP on MNIST
    python benchmarks/train.py --model mlp --dataset mnist --optimizer adam --lr 1e-3

    # Full benchmark: ViT on CIFAR-10
    python benchmarks/train.py --config benchmarks/configs/cifar10_vit.yaml --optimizer adam

    # Run all benchmarks for one optimizer
    ./scripts/run_all.sh adam

    # Compare results
    python benchmarks/compare.py --dir results/

For example, to get started with Lion:

- Open `optimizers/lion.py` and read the docstring
- The core is just 3 lines (expanded into a full class after this list):

      update = torch.sign(beta1 * m + (1 - beta1) * grad)  # direction
      param -= lr * update                                  # step
      m = beta2 * m + (1 - beta2) * grad                    # update momentum

- Run `pytest tests/test_optimizers.py -v -k "lion"` to verify
- Run `python benchmarks/train.py --model mlp --dataset mnist --optimizer lion --lr 1e-4`
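Wrapped into a full optimizer class, those three lines look roughly like this (an illustrative sketch: the stub in `optimizers/lion.py` may organize state differently, and the paper's Lion also adds decoupled weight decay):

```python
import torch


class Lion(torch.optim.Optimizer):
    """Sign-based optimizer from Chen et al., 2023 ("Symbolic Discovery of Optimization Algorithms")."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.99)):
        super().__init__(params, dict(lr=lr, betas=betas))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            lr, (beta1, beta2) = group["lr"], group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if "m" not in state:
                    state["m"] = torch.zeros_like(p)
                m = state["m"]
                update = (beta1 * m + (1 - beta1) * grad).sign_()  # direction
                p.add_(update, alpha=-lr)                          # step
                m.mul_(beta2).add_(grad, alpha=1 - beta2)          # update momentum
```

Note the asymmetry: the update direction is interpolated with `beta1`, but the stored momentum is updated with `beta2`.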
Fill in the tables below as you implement each optimizer:
MNIST (MLP):

| Optimizer | LR | Best Test Acc | Train Time |
|---|---|---|---|
| SGD+Momentum | | | |
| AdamW | | | |
| Lion | | | |
| Muon | | | |

CIFAR-10:

| Optimizer | LR | Best Test Acc | Train Time |
|---|---|---|---|
| AdamW | | | |
| Lion | | | |
| Muon | | | |
| SOAP | | | |

WikiText-2 (GPT-2):

| Optimizer | LR | Best Test PPL | Train Time |
|---|---|---|---|
| AdamW | | | |
| Muon | | | |
| SOAP | | | |
- Start simple: Implement SGD first to understand the `Optimizer` base class
- Read PyTorch source: `torch.optim.Adam` is a great reference for AdamW (see the weight-decay sketch after this list)
- Use the tests: They catch most implementation bugs automatically
- Compare carefully: Different optimizers need different learning rates. The configs provide reasonable defaults
- Focus on Transformers: Muon and SOAP shine on ViT and GPT-2, not so much on MLP/CNN
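On the `torch.optim.Adam` tip: the main thing AdamW changes is where weight decay is applied. Adam-style L2 regularization adds `weight_decay * p` to the gradient; AdamW decays the parameter directly, before the same adaptive update. A functional sketch of one AdamW step on a single tensor (illustrative only, not the repo's or PyTorch's code):

```python
import torch


def adamw_update(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW step on a single parameter tensor; p, m, v are updated in place, t is the 1-indexed step count."""
    beta1, beta2 = betas
    # Decoupled weight decay (the AdamW part). Plain Adam with L2 reg would
    # instead fold it into the gradient: grad = grad + weight_decay * p
    p.mul_(1 - lr * weight_decay)
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first moment (EMA of gradients)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment (EMA of squared gradients)
    m_hat = m / (1 - beta1 ** t)                         # bias correction
    v_hat = v / (1 - beta2 ** t)
    p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
```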
License: MIT