ToppatKing/powergrid-rl

PowerGrid-RL: Custom Gymnasium Environment + PPO Agent

Python 3.10+ · License: MIT

A reinforcement learning system written from scratch, with two parts:

  1. PowerGrid-v0: a physics-based electrical grid dispatch simulation built on the Gymnasium API rather than adapted from a stock environment such as CartPole or Atari.
  2. PPO Agent: Proximal Policy Optimisation implemented in PyTorch, with GAE-λ advantage estimation, a clipped surrogate objective, entropy regularisation, and learning-rate annealing.

The agent learns to dispatch a fleet of power generators and a battery storage unit to meet stochastic electricity demand while minimising fuel cost, carbon emissions, and grid frequency instability.


The Environment: PowerGrid-v0

Physical Setup

 ┌──────────────────────────────────────────────────────────────────┐
 │                        POWER GRID                                │
 │                                                                  │
 │  Controllable generators                                         │
 │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌──────────┐             │
 │  │ Coal-1  │  │ Coal-2  │  │  Gas    │  │ Nuclear  │             │
 │  │ 500 MW  │  │ 400 MW  │  │ 250 MW  │  │ 900 MW   │             │
 │  │ $30/MWh │  │ $32/MWh │  │ $55/MWh │  │ $10/MWh  │             │
 │  │ 0.82 t  │  │ 0.82 t  │  │ 0.45 t  │  │ 0.012 t  │             │
 │  └────┬────┘  └────┬────┘  └────┬────┘  └────┬─────┘             │
 │       │            │            │             │                  │
 │       └────────────┴────────────┴─────────────┘                  │
 │                             │                                    │
 │  Stochastic renewables      │    Battery storage                 │
 │  ┌──────────┐ ┌──────────┐  │  ┌─────────────┐                   │
 │  │  Solar   │ │  Wind    │  │  │  Battery    │                   │
 │  │  300 MW  │ │  400 MW  │──┤  │  ±200 MW    │                   │
 │  │ weather  │ │ stoch.   │  │  │  1000 MWh   │                   │
 │  └──────────┘ └──────────┘  │  └─────────────┘                   │
 │                             │                                    │
 │                     ════════╪════════                            │
 │                             │                                    │
 │                    ┌────────┴────────┐                           │
 │                    │    DEMAND       │                           │
 │                    │  ~1200 MW avg   │                           │
 │                    │  daily profile  │                           │
 │                    │  + seasonality  │                           │
 │                    │  + noise        │                           │
 │                    └─────────────────┘                           │
 └──────────────────────────────────────────────────────────────────┘

Observation Space (20-dimensional, continuous)

| Index | Feature | Range | Description |
|-------|---------|-------|-------------|
| 0–3 | Controllable gen. outputs | [0, 1] | Normalized MW output for Coal-1, Coal-2, Gas, Nuclear |
| 4 | Battery SoC | [0, 1] | State of charge (0 = empty, 1 = full) |
| 5 | Solar output | [0, 1] | Current solar generation fraction |
| 6 | Wind output | [0, 1] | Current wind generation fraction |
| 7 | Net demand | [0, 1] | Normalized residual demand after renewables |
| 8 | Frequency deviation | [-1, 1] | Grid frequency deviation from 50 Hz (normalized) |
| 9–10 | Time of day | [-1, 1] | sin/cos encoding of hour-in-day |
| 11–12 | Day of year | [-1, 1] | sin/cos encoding of day-in-year |
| 13 | Solar irradiance | [0, 1] | Available solar resource |
| 14 | Wind speed (norm.) | [0, 1] | Available wind resource |
| 15 | Carbon price | [0, 1] | Normalized CO₂ price (stochastic) |
| 16 | Demand forecast error | [-1, 1] | Surprise vs. day-ahead forecast |
| 17–19 | Supply/demand ratio components | [0, 1] | Balance indicators |
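
The table above can be turned into per-index bounds mechanically. The following is a minimal sketch, not the code in `env/powergrid_env.py`: the names `OBS_LAYOUT` and `observation_bounds` are hypothetical, and only the index ranges and limits come from the table. The resulting arrays are in the form that `gymnasium.spaces.Box(low=low, high=high, dtype=np.float32)` accepts.

```python
import numpy as np

# (first index, last index, low, high, label), copied from the table above.
# OBS_LAYOUT and observation_bounds are hypothetical names for this sketch.
OBS_LAYOUT = [
    (0, 3, 0.0, 1.0, "controllable generator outputs"),
    (4, 4, 0.0, 1.0, "battery state of charge"),
    (5, 5, 0.0, 1.0, "solar output"),
    (6, 6, 0.0, 1.0, "wind output"),
    (7, 7, 0.0, 1.0, "net demand"),
    (8, 8, -1.0, 1.0, "frequency deviation"),
    (9, 10, -1.0, 1.0, "time of day (sin, cos)"),
    (11, 12, -1.0, 1.0, "day of year (sin, cos)"),
    (13, 13, 0.0, 1.0, "solar irradiance"),
    (14, 14, 0.0, 1.0, "wind speed"),
    (15, 15, 0.0, 1.0, "carbon price"),
    (16, 16, -1.0, 1.0, "demand forecast error"),
    (17, 19, 0.0, 1.0, "supply/demand ratio components"),
]


def observation_bounds(layout=OBS_LAYOUT):
    """Expand the index ranges into flat low/high arrays."""
    n = max(entry[1] for entry in layout) + 1
    low = np.full(n, np.nan, dtype=np.float32)
    high = np.full(n, np.nan, dtype=np.float32)
    for first, last, lo, hi, _label in layout:
        low[first:last + 1] = lo
        high[first:last + 1] = hi
    # Fail loudly if the layout leaves a gap in the index range.
    if np.isnan(low).any() or np.isnan(high).any():
        raise ValueError("observation layout does not cover every index")
    return low, high


low, high = observation_bounds()
```

Building the bounds from one layout table keeps the documented ranges and the space definition from drifting apart.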

Action Space (5-dimensional, continuous Box[-1, 1])

| Dimension | Unit | Effect |
|-----------|------|--------|
| 0 | Coal-1 | Target setpoint delta from current output |
| 1 | Coal-2 | Target setpoint delta from current output |
| 2 | Gas (CCGT) | Target setpoint (very flexible, fast ramp) |
| 3 | Nuclear | Target setpoint (very constrained ramp) |
| 4 | Battery | Charge (+1) / discharge (-1) rate |
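
As an illustration of how a normalized action could be mapped onto physical setpoints, here is a minimal sketch. The capacities and the ±200 MW battery rating come from the diagram above; the ramp fractions and the function name `apply_action` are assumptions made for this example, and all four generator actions are treated as deltas for simplicity, which may not match how the repository handles the gas and nuclear units.

```python
# Capacities (MW) are taken from the diagram above. The ramp fractions are
# illustrative assumptions, not values from the repository.
CAPACITY_MW = {"coal1": 500.0, "coal2": 400.0, "gas": 250.0, "nuclear": 900.0}
MAX_RAMP_FRACTION = {"coal1": 0.10, "coal2": 0.10, "gas": 0.50, "nuclear": 0.02}
BATTERY_POWER_MW = 200.0


def apply_action(outputs_mw, action):
    """Return (new generator outputs in MW, battery power in MW).

    action holds five floats ordered coal1, coal2, gas, nuclear, battery.
    A generator action of +1 / -1 requests the largest allowed ramp up / down
    for one step; a battery action of +1 / -1 is full charge / full discharge.
    """
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    new_outputs = {}
    # zip stops after the four generators; the fifth entry is the battery.
    for a, name in zip(clipped, CAPACITY_MW):
        cap = CAPACITY_MW[name]
        delta = a * MAX_RAMP_FRACTION[name] * cap
        # Keep the result inside the unit's physical range [0, capacity].
        new_outputs[name] = max(0.0, min(cap, outputs_mw[name] + delta))
    battery_mw = clipped[4] * BATTERY_POWER_MW
    return new_outputs, battery_mw


current = {"coal1": 250.0, "coal2": 200.0, "gas": 100.0, "nuclear": 890.0}
new_outputs, battery_mw = apply_action(current, [1.0, -1.0, 0.5, 1.0, -0.5])
```

Scaling the action by a per-unit ramp limit means the policy cannot request an infeasible jump, so under this design a ramp penalty would only matter if the environment accepted absolute setpoints instead.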

Reward Function

r_t = + α · supply_adequacy(t)        # reward meeting demand
      − β · fuel_cost(t)              # penalize operating cost
      − γ · carbon_cost(t)            # penalize emissions × carbon price
      − δ · |Δf(t)|²                  # penalize frequency deviation
      − ε · ramp_violations(t)        # penalize exceeding ramp limits
      − ζ · blackout_flag(t)          # large penalty for load shedding
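
The formula above translates directly into a weighted sum. In the sketch below the weight values are placeholders chosen only for illustration, and `RewardWeights` and `step_reward` are hypothetical names; the actual coefficients presumably live in `configs/default.yaml`. Note that the reward weights γ and ε are unrelated to the discount factor γ and clip range ε used by the PPO agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder magnitudes for illustration; not the repository's values.
    alpha: float = 1.0    # supply adequacy
    beta: float = 0.5     # fuel cost
    gamma: float = 0.5    # carbon cost (not the RL discount factor)
    delta: float = 1.0    # squared frequency deviation
    epsilon: float = 0.1  # ramp violations (not the PPO clip range)
    zeta: float = 10.0    # blackout penalty


def step_reward(supply_adequacy, fuel_cost, carbon_cost, freq_dev,
                ramp_violations, blackout, w=None):
    """Weighted sum of the six reward terms; inputs are assumed normalized."""
    w = w or RewardWeights()
    return (
        w.alpha * supply_adequacy
        - w.beta * fuel_cost
        - w.gamma * carbon_cost
        - w.delta * abs(freq_dev) ** 2
        - w.epsilon * ramp_violations
        - w.zeta * float(blackout)
    )


example = step_reward(1.0, 0.2, 0.1, 0.5, 0.0, False)
```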

Episode Structure

  • Time resolution: 30-minute dispatch intervals
  • Episode length: 48 steps (1 simulated day)
  • Stochasticity: renewable output, demand noise, carbon price walk
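
With 48 half-hour steps per day, the sin/cos time features at observation indices 9–12 follow from the step counter. A minimal sketch, assuming a 365-day year and a hypothetical helper name `time_features`:

```python
import math

STEPS_PER_DAY = 48    # 30-minute intervals, as listed above
DAYS_PER_YEAR = 365   # assumption: leap years are ignored


def time_features(step, day_of_year):
    """Return (sin_day, cos_day, sin_year, cos_year), each in [-1, 1]."""
    day_angle = 2.0 * math.pi * (step % STEPS_PER_DAY) / STEPS_PER_DAY
    year_angle = 2.0 * math.pi * (day_of_year % DAYS_PER_YEAR) / DAYS_PER_YEAR
    return (
        math.sin(day_angle),
        math.cos(day_angle),
        math.sin(year_angle),
        math.cos(year_angle),
    )


midnight = time_features(step=0, day_of_year=0)
six_am = time_features(step=12, day_of_year=0)
```

The paired encoding makes 23:30 and 00:00 neighbours in feature space, which a raw hour value in [0, 24) would not.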

PPO Agent

Implemented from scratch in PyTorch. Key components:

| Component | Details |
|-----------|---------|
| Network | Shared MLP trunk (two 256-unit hidden layers, tanh) → separate Actor and Critic heads |
| Policy | Diagonal Gaussian; outputs μ and log σ per action dimension |
| Advantage | GAE-λ (λ = 0.95, discount γ = 0.99) |
| Objective | PPO-Clip (ε = 0.2) + value loss (coefficient 0.5) + entropy bonus (coefficient 0.01) |
| Update | 4 epochs × 8 mini-batches per 2048-step rollout (256 samples per mini-batch) |
| Optimiser | Adam, lr = 3 × 10⁻⁴ with linear annealing |
| Gradient clip | max norm = 0.5 |
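
The two pieces of arithmetic at the heart of the table, GAE-λ and the clipped surrogate, can be sketched as follows. This is a NumPy illustration of the standard formulas using the hyperparameters listed above, not the code in `ppo/buffer.py` or `ppo/agent.py`, which is written in PyTorch and may differ in detail. The function names and argument layout are assumptions.

```python
import numpy as np


def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout of length T.

    rewards, values, dones: sequences of length T; dones[t] is True when the
    episode ended at step t. last_value is the critic's estimate for the
    state that follows the final step (used for bootstrapping).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - float(dones[t])
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO-Clip policy loss (negated objective, to be minimized)."""
    ratio = np.exp(np.asarray(new_log_probs) - np.asarray(old_log_probs))
    adv = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))


adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                       [False, False, True], gamma=1.0, lam=1.0)
loss = clipped_surrogate_loss([np.log(1.5)], [0.0], [1.0])
```

In the second call the probability ratio is 1.5 with a positive advantage, so the clip caps its contribution at 1 + ε = 1.2.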

Training Curves (1M steps)

Episode Return:

  ░░░▒▒▒▓▓▓███████████████████████  →  convergence ~400k steps
  -150                              →  +85

Fuel Cost ($/episode):

  ███████████▓▓▓▒▒▒░░░░░░░░░░░░░░   →  -43% vs. random policy

Carbon (tCO₂/episode):

  ████████████▓▓▓▒▒▒░░░░░░░░░░░░░   →  -38% vs. rule-based baseline

Blackout Rate:

  ████████▓▓▓▓▒▒░░░░░░░░░░░░░░░░░   →  0.3% vs. 8.1% baseline

Quick Start

git clone https://github.com/ToppatKing/powergrid-rl.git
cd powergrid-rl
pip install -e ".[dev]"

Train

python scripts/train.py --config configs/default.yaml --total-steps 1000000

Evaluate a checkpoint

python scripts/evaluate.py --checkpoint results/best_model.pt --episodes 100

Visualize a single episode

python scripts/visualize.py --checkpoint results/best_model.pt

Project Structure

powergrid-rl/
├── env/
│   ├── powergrid_env.py        # Gymnasium environment (core)
│   ├── generators.py           # Generator physics models
│   └── demand_model.py         # Stochastic demand simulation
├── ppo/
│   ├── network.py              # Actor-Critic neural networks
│   ├── buffer.py               # Rollout buffer + GAE computation
│   ├── agent.py                # PPO algorithm
│   └── trainer.py              # Training loop with Rich logging
├── scripts/
│   ├── train.py                # Training entry point
│   ├── evaluate.py             # Evaluation + metrics report
│   └── visualize.py            # Episode replay visualisation
├── tests/
│   ├── test_env.py             # Env API compliance + physics tests
│   └── test_agent.py           # Agent + buffer unit tests
├── configs/
│   └── default.yaml            # All hyperparameters
└── .github/workflows/ci.yml

Design Decisions

  • Why a power grid? It has real-world stakes (energy transition), rich physics (ramp constraints, frequency dynamics, stochastic renewables), and a non-trivial multi-dimensional action space.
  • Why PPO from scratch? Understanding every line (GAE computation, importance-weight clipping, entropy regularisation) is more valuable than calling stable_baselines3.PPO(...).
  • Why continuous actions? Discrete dispatch is unrealistic; continuous setpoints with ramp constraints require the agent to learn smooth, physically feasible trajectories.

License

MIT — see LICENSE.
