Continuum: A Dimension-Agnostic Neural ODE Deep Reinforcement Learning Framework for Physics-Based Environments
Kaden Seto1, Ryan Qian1, Kane Pan1
1University of Toronto
In this work, we propose Continuum, a deep RL framework and neural network architecture for physics-informed reinforcement learning. The architecture combines Neural Ordinary Differential Equations (NODEs), autoencoders, and model-free RL algorithms, where the latent space of the autoencoder is governed by a time-dependent NODE that learns the continuous-time dynamics of the environment. With this architecture, we aim to build a neural network with stronger physics alignment and interpretability, encouraging policies to make predictions from structured latent representations of the learned system dynamics, which promotes stability and performance.
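As a rough illustration of this design (not Continuum's actual API; every name in this sketch is made up for exposition), an autoencoder whose latent state is evolved by a time-dependent neural ODE can be written in plain PyTorch, with a fixed-step Euler integrator standing in for a real ODE solver:

```python
# Minimal, illustrative sketch of the core idea (not Continuum's actual API):
# an autoencoder whose latent code is evolved by a time-dependent neural ODE.
import torch
import torch.nn as nn

class LatentNODE(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, obs_dim))
        # f(z, t): time-dependent vector field over the latent space
        self.ode_func = nn.Sequential(nn.Linear(latent_dim + 1, hidden), nn.Tanh(),
                                      nn.Linear(hidden, latent_dim))

    def forward(self, obs, t0=0.0, t1=1.0, steps=10):
        z = self.encoder(obs)
        dt = (t1 - t0) / steps
        t = obs.new_full((obs.shape[0], 1), t0)
        # Fixed-step Euler for exposition; a real NODE uses a proper ODE solver.
        for _ in range(steps):
            z = z + dt * self.ode_func(torch.cat([z, t], dim=-1))
            t = t + dt
        return self.decoder(z)
```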
Follow the steps below to set up Continuum.

Clone the repository:

```bash
git clone https://github.com/kseto06/Continuum-
```
Open a terminal and run the following commands to install the necessary libraries and dependencies.

Create a virtual environment:

```bash
python -m venv venv
```

Activate it on macOS / Linux:

```bash
source venv/bin/activate
```

or on Windows:

```bash
venv\Scripts\activate
```

Install the requirements:

```bash
pip install -r requirements.txt
```
Once the dependencies are installed, navigate to the project directory:

```bash
cd continuum
```
Follow the instructions in main.py for setting up training:
1. Set the Gym/MuJoCo environment name from the list below. The supported physics-based Classic Control / Box2D / MuJoCo environments are:

   - Classic Control: CartPole-v1, MountainCar-v0 / MountainCarContinuous-v0, Acrobot-v1, Pendulum-v1
   - Box2D: LunarLander-v3, BipedalWalker-v3
   - MuJoCo: HalfCheetah-v5, Hopper-v5, Walker2d-v5, Ant-v5, Humanoid-v5, HumanoidStandup-v5, Swimmer-v5, Reacher-v5, InvertedPendulum-v5, InvertedDoublePendulum-v5

   ```python
   env_name = "<env_name>"
   ```
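   Before launching a long run, it can help to sanity-check that the chosen name resolves in your installed Gymnasium version (environment versions shift between releases). This snippet is purely illustrative and not part of main.py:

   ```python
   # Illustrative sanity check (not part of main.py): confirm the environment
   # name resolves in your installed Gymnasium version.
   import gymnasium as gym

   env_name = "HalfCheetah-v5"
   env = gym.make(env_name)
   obs, info = env.reset(seed=0)
   print(obs.shape, env.action_space)
   env.close()
   ```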
2. If resuming training from a checkpoint, set the path of the `.pkl` file from the training checkpoint in `rl-model`:

   ```python
   env = VecNormalize.load("<path>", env)
   ```

   Otherwise, use the default `VecNormalize` arguments provided, i.e.:

   ```python
   env = make_vec_env(env_name, n_envs=16, vec_env_cls=SubprocVecEnv)
   ```
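   Both lines use Stable-Baselines3's vectorized-environment utilities. A self-contained sketch of the two branches, with placeholder paths and the standard SB3 imports (how main.py wires this up may differ in detail):

   ```python
   # Illustrative SB3 setup: build a vectorized env, then either restore
   # saved normalization statistics or start with fresh ones.
   from stable_baselines3.common.env_util import make_vec_env
   from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize

   env_name = "HalfCheetah-v5"
   env = make_vec_env(env_name, n_envs=16, vec_env_cls=SubprocVecEnv)

   resume_from_checkpoint = False  # placeholder flag
   if resume_from_checkpoint:
       env = VecNormalize.load("model/rl-model/<checkpoint>.pkl", env)
   else:
       env = VecNormalize(env)  # fresh observation/reward normalization
   ```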
3. Optionally, set the model architecture using PyTorch's `nn.Sequential` class; otherwise, the network architecture will be defaulted for you. An example of an architecture:

   ```python
   latent_dim = 64
   features_dim = 128
   network_arch = nn.Sequential(
       DepthCat(1),
       nn.Linear(latent_dim + 1, features_dim),
       nn.Tanh(),
       LSTMOutputExtractor(input_size=features_dim, hidden_size=features_dim,
                           num_layers=1, batch_first=True),
       nn.Tanh(),
       nn.Linear(features_dim, latent_dim)
   )
   ```
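   `DepthCat` and `LSTMOutputExtractor` are helpers from the Continuum codebase. Judging from the `latent_dim + 1` input width, `DepthCat(1)` appends the current integration time (depth) to the latent state along dim 1, so downstream layers can condition on time. A DepthCat-style module, written purely for illustration (the repo ships its own version, which may differ):

   ```python
   # Illustrative DepthCat-style module: append the current integration
   # time t to the state along a given dim so layers can condition on it.
   import torch
   import torch.nn as nn

   class DepthCat(nn.Module):
       def __init__(self, idx_cat: int = 1):
           super().__init__()
           self.idx_cat = idx_cat
           self.t = 0.0  # updated by the ODE-solver wrapper before each call

       def forward(self, x: torch.Tensor) -> torch.Tensor:
           t = x.new_full((*x.shape[:-1], 1), self.t)
           return torch.cat([x, t], dim=self.idx_cat)
   ```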
4. Ensure that the corresponding `Extractor` class matches the network architecture:

   - Standard MLP: use `MlpNodeExtractor`
   - CNN: use `CnnNodeExtractor`
   - LSTM: use `MlpLstmNodeExtractor`

   If the network architecture in Step 3 was defaulted, then the `Extractor` class used will be automatically defaulted as well.
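   In Stable-Baselines3, custom feature extractors are wired into the policy through `policy_kwargs`; Continuum's extractor classes plug in the same way. A generic, runnable illustration with a stand-in extractor (the `ToyExtractor` class below is made up for the example):

   ```python
   # Generic SB3 custom-extractor wiring; Continuum's NodeExtractor classes
   # plug in the same way via policy_kwargs.
   import torch
   import torch.nn as nn
   from stable_baselines3 import PPO
   from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

   class ToyExtractor(BaseFeaturesExtractor):
       """Stand-in for MlpNodeExtractor, just to show the wiring."""
       def __init__(self, observation_space, features_dim: int = 128):
           super().__init__(observation_space, features_dim)
           self.net = nn.Sequential(
               nn.Linear(observation_space.shape[0], features_dim), nn.Tanh())

       def forward(self, obs: torch.Tensor) -> torch.Tensor:
           return self.net(obs)

   policy_kwargs = dict(features_extractor_class=ToyExtractor,
                        features_extractor_kwargs=dict(features_dim=128))
   model = PPO("MlpPolicy", "Pendulum-v1", policy_kwargs=policy_kwargs)
   ```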
5. To run training, run this command in a terminal with the expected arguments:

   ```bash
   python main.py <solver_name (str)> <total_timesteps (int)> <checkpoint_interval (int)>
   ```
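   For example, a run that trains for 1,000,000 timesteps and checkpoints every 100,000 might look like this (the solver name `euler` here is hypothetical; check main.py for the solver names the repo actually accepts):

   ```bash
   python main.py euler 1000000 100000
   ```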
In inference.py:
1. Set the Gym/MuJoCo environment name to run inference on. We provide pretrained models on our NODE architecture and on standard PPO for `Humanoid-v5`, `HumanoidStandup-v5`, `Ant-v5`, and `HalfCheetah-v5`.

   ```python
   env_name = "<env_name>"
   ```
2. Set the `.pkl` path and the model's `.zip` path to load the model. The file paths to our provided pretrained models are given below. For the `vec_path` and `model_path` variables, load ONLY either NODE files or PPO files:

   - `Humanoid-v5`:
     - NODE `.pkl` path: `model/rl-model/Humanoid-v5/Humanoid-v5_NODE_Pretrained.pkl`
     - NODE `.zip` path: `model/rl-model/Humanoid-v5/Humanoid-v5_NODE_Pretrained.zip`
     - PPO `.pkl` path: `model/rl-model/Humanoid-v5/Humanoid-v5_PPO_Pretrained.pkl`
     - PPO `.zip` path: `model/rl-model/Humanoid-v5/Humanoid-v5_PPO_Pretrained.zip`
   - `HumanoidStandup-v5`:
     - NODE `.pkl` path: `model/rl-model/HumanoidStandup-v5/HumanoidStandup-v5_NODE_Pretrained.pkl`
     - NODE `.zip` path: `model/rl-model/HumanoidStandup-v5/HumanoidStandup-v5_NODE_Pretrained.zip`
     - PPO `.pkl` path: `model/rl-model/HumanoidStandup-v5/HumanoidStandup-v5_PPO_Pretrained.pkl`
     - PPO `.zip` path: `model/rl-model/HumanoidStandup-v5/HumanoidStandup-v5_PPO_Pretrained.zip`
   - `Ant-v5`:
     - NODE `.pkl` path: `model/rl-model/Ant-v5/Ant-v5_NODE_Pretrained.pkl`
     - NODE `.zip` path: `model/rl-model/Ant-v5/Ant-v5_NODE_Pretrained.zip`
     - PPO `.pkl` path: `model/rl-model/Ant-v5/Ant-v5_PPO_Pretrained.pkl`
     - PPO `.zip` path: `model/rl-model/Ant-v5/Ant-v5_PPO_Pretrained.zip`
   - `HalfCheetah-v5`:
     - NODE `.pkl` path: `model/rl-model/HalfCheetah-v5/HalfCheetah-v5_NODE_Pretrained.pkl`
     - NODE `.zip` path: `model/rl-model/HalfCheetah-v5/HalfCheetah-v5_NODE_Pretrained.zip`
     - PPO `.pkl` path: `model/rl-model/HalfCheetah-v5/HalfCheetah-v5_PPO_Pretrained.pkl`
     - PPO `.zip` path: `model/rl-model/HalfCheetah-v5/HalfCheetah-v5_PPO_Pretrained.zip`

   ```python
   vec_path = "<path_to_pkl>"
   model_path = "<path_to_model>"
   ```
3. Run the file from a terminal:

   ```bash
   python inference.py
   ```
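   For reference, a typical SB3 evaluation loop over a normalized vectorized environment looks roughly like the following (inference.py's internals may differ; the paths are the pretrained Humanoid-v5 PPO files listed above):

   ```python
   # Illustrative SB3 inference loop (inference.py's internals may differ).
   from stable_baselines3 import PPO
   from stable_baselines3.common.env_util import make_vec_env
   from stable_baselines3.common.vec_env import VecNormalize

   env = make_vec_env("Humanoid-v5", n_envs=1)
   env = VecNormalize.load(
       "model/rl-model/Humanoid-v5/Humanoid-v5_PPO_Pretrained.pkl", env)
   env.training = False     # freeze normalization statistics at eval time
   env.norm_reward = False  # report raw rewards

   model = PPO.load(
       "model/rl-model/Humanoid-v5/Humanoid-v5_PPO_Pretrained.zip", env=env)

   obs = env.reset()
   for _ in range(1000):
       action, _ = model.predict(obs, deterministic=True)
       obs, reward, done, info = env.step(action)
   ```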
