Setup and run guide for the training pipeline:
replay_gen -> behaviour cloning seed -> PPO train loop
If you cloned the repo without submodules, initialize them first:
git submodule update --init --recursive
Install Python 3.13 into uv if you do not already have it:
uv python install 3.13
Create the virtualenv and install Python dependencies from pyproject.toml / uv.lock:
uv sync
From here on, run Python commands through uv:
uv run python <script>
The project battles against a local Pokemon Showdown server.
Install the server dependencies once:
cd pokemon-showdown
npm install
cd ..
For replay generation, start one local server in a separate terminal:
cd pokemon-showdown
node pokemon-showdown start --no-security
The server listens on port 8000 by default, which is the port the replay generation scripts expect.
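replay_gen.py needs that server to be up. As a quick pre-flight check, this standard-library sketch tests whether anything accepts TCP connections on port 8000. server_is_up is a hypothetical helper, not part of the repo, and it only confirms that the port is open, not that the process behind it is actually Showdown.

```python
import socket


def server_is_up(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it checks the port only, not that the process
    listening there is Pokemon Showdown.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if server_is_up():
        print("Something is listening on port 8000")
    else:
        print("Nothing is listening on port 8000; start the Showdown server first")
```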
Run this from the repo root while the local Showdown server is running:
uv run python src/replay_gen.py -n <number of replays>
This writes replay shards under:
- replays/fuzzy_heuristic/
- replays/simple_heuristic/
- replays/max_base_power/
Increase -n if you want more expert trajectories.
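To confirm that replay_gen.py wrote something to each of the three folders listed above, a sketch like this counts the files in each one. It assumes shards are regular files sitting directly inside each folder; shard names and extensions are not documented here, so every file is counted, and count_shards is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Folder names come from this guide; no shard file name or extension is assumed.
REPLAY_DIRS = ("fuzzy_heuristic", "simple_heuristic", "max_base_power")


def count_shards(root: str | Path = "replays") -> dict[str, int]:
    """Return the number of regular files in each replay folder (0 if the folder is missing)."""
    root = Path(root)
    counts: dict[str, int] = {}
    for name in REPLAY_DIRS:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    for name, n in count_shards().items():
        print(f"replays/{name}/: {n} file(s)")
```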
Train behaviour-cloned policies from those replay folders:
uv run python src/seed_pool.py
This creates these seed checkpoints:
- checkpoints/pool/seed_max_base_power.pt
- checkpoints/pool/seed_simple_heuristic.pt
- checkpoints/pool/seed_fuzzy_heuristic.pt
src/train_loop.py will automatically bootstrap from seed_fuzzy_heuristic.pt if there is no PPO checkpoint yet.
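That fallback amounts to a file-existence check. This sketch assumes "no PPO checkpoint" means checkpoints/ppo_checkpoint.pt (the main checkpoint path listed under Training outputs) does not exist. It is not copied from src/train_loop.py: pick_start_checkpoint is a hypothetical name, returning None when neither file exists is an assumption about the error case, and loading the chosen file is left out.

```python
from __future__ import annotations

from pathlib import Path

# Paths taken from this guide.
PPO_CHECKPOINT = Path("checkpoints/ppo_checkpoint.pt")
SEED_CHECKPOINT = Path("checkpoints/pool/seed_fuzzy_heuristic.pt")


def pick_start_checkpoint(ppo: Path = PPO_CHECKPOINT, seed: Path = SEED_CHECKPOINT) -> Path | None:
    """Prefer an existing PPO checkpoint; otherwise fall back to the behaviour-cloned seed.

    Returns None when neither file exists (for example, seed_pool.py has not been run).
    """
    if ppo.is_file():
        return ppo  # resume PPO training from the main checkpoint
    if seed.is_file():
        return seed  # bootstrap from the fuzzy heuristic BC policy
    return None
```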
Start training from the repo root:
uv run python src/train_loop.py
train_loop.py starts its own local Showdown processes, so you do not need to start the server manually for this step.
Training outputs:
- main checkpoint: checkpoints/ppo_checkpoint.pt
- opponent snapshots: checkpoints/pool/
- TensorBoard logs: runs/ppo_training/
- text log: training.log
Training reads optional overrides from .ppoconfig in the repo root.
Current local example .ppoconfig (tuned for my CPU):
num_episodes=4
num_envs=1
n_jobs=2
batch_size=4
If .ppoconfig is missing, defaults from src/ppo_utils.py are used.
Quick reference (the same steps, condensed):
uv sync
cd pokemon-showdown && npm install && cd ..
# terminal 1
cd pokemon-showdown && node pokemon-showdown start --no-security
# terminal 2
uv run python src/replay_gen.py -n <number of replays>
uv run python src/seed_pool.py
uv run python src/train_loop.py