🎥 Demo Video • 🌐 Website • 📄 Paper
Tested on Ubuntu 22.04 with Python 3.10
```shell
conda create -n mpail2 python=3.10
conda activate mpail2
pip install --upgrade pip
pip install -e .
```
This is the default install for mpail2, including Gym / MuJoCo and the Python packages used by the real-robot stacks. IsaacLab and the real-robot environments still require their own system-level setup first, so install Isaac Sim / IsaacLab, ROS 2 / ros2_kortex, or the Franka control stack before using those backends. See docs/INSTALL.md for environment-specific setup.
One entry point: `python -m mpail2.train.train` or `mpail2-train`. If `--env` matches an Isaac task (the default is `push` when omitted), the Hydra + Isaac Lab pipeline runs; otherwise the Gymnasium / MuJoCo script runs (e.g. `Ant-v5`).
```shell
python mpail2/train/train.py --env push --headless log.no_wandb=True  # Isaac
python mpail2/train/train.py --env Ant-v5                             # Gym (MuJoCo)
python mpail2/train/train.py --env Humanoid-v5 log.video=True
```

- `runner.py`: Outer-most loop. Steps the environment and calls `act()` on the learner.
- `learner.py`: Stores interactions, calls the planner, and updates the component models.
- `planner.py`: Performs online planning (MPPI) using the component models.
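The runner/learner/planner split can be sketched as a plain control loop. The classes and method bodies below are toy stand-ins to show the call structure, not mpail2's actual interfaces:

```python
# Illustrative sketch of the runner -> learner -> planner control flow.
# Planner, Learner, and run are toy stand-ins, not mpail2 classes.
class Planner:
    def plan(self, z):
        return -z  # toy "plan": act against the latent

class Learner:
    def __init__(self):
        self.planner, self.buffer = Planner(), []

    def act(self, obs):
        return self.planner.plan(obs)      # planner.py: online planning

    def store_and_update(self, obs, action):
        self.buffer.append((obs, action))  # learner.py: store interactions
        # ... gradient updates on the component models would go here ...

def run(env_step, learner, obs, steps):
    """runner.py: outer-most loop stepping the env and calling act()."""
    for _ in range(steps):
        action = learner.act(obs)
        learner.store_and_update(obs, action)
        obs = env_step(obs, action)
    return obs

learner = Learner()
final = run(lambda o, a: o + a, learner, 1.0, steps=3)
```

The point of the sketch is the direction of the calls: the runner only ever talks to the learner, and the learner owns both the replay storage and the planner.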
All loss computations and gradient updates happen inside learner.py, so if you want to read the implementation, learner.py is the place to start.
The planner composes the component models shown in the figure above and discussed in Section 3 of the paper.
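At its core, the MPPI update the planner performs is a softmax-weighted average over sampled action sequences. A minimal NumPy sketch of that step, with shapes and the temperature chosen purely for illustration:

```python
import numpy as np

def mppi_update(action_samples, returns, temperature=1.0):
    """Weight sampled action sequences by exp(return / temperature)
    and return their weighted average as the new plan."""
    w = np.exp((returns - returns.max()) / temperature)  # stabilized softmax
    w = w / w.sum()
    return (w[:, None, None] * action_samples).sum(axis=0)

rng = np.random.default_rng(0)
samples = rng.normal(size=(64, 5, 2))  # 64 candidates, horizon 5, action dim 2
returns = -np.square(samples).sum(axis=(1, 2))  # toy return: prefer small actions
plan = mppi_update(samples, returns, temperature=0.5)
```

In the repo, the returns would come from rolling the candidates through the learned dynamics, reward, and value models rather than a toy objective.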
- `encoder.py`: expects `Dict[str, Tensor]` observations
- `dynamics.py`: $f:\mathcal{Z} \times \mathcal{A}^{H} \rightarrow \mathcal{Z}^{H+1}$
- `reward.py`: $r:\mathcal{Z}\times\mathcal{Z}\rightarrow \mathbb{R}$
- `value.py`: ensembled, $Q:\mathcal{Z}\times\mathcal{A}\rightarrow \mathbb{R}$
- `sampling.py` (policy): composes the policy $\pi(\mathbf{a}_{t:t+H} \mid z)$. Uses the policy and the previous plan to sample candidate plans from the policy and a fitted Gaussian.
Except for sampling.py, these files are primarily `torch.nn.Module`s whose `forward` definitions mirror the mathematical signatures above.
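The dynamics signature $f:\mathcal{Z} \times \mathcal{A}^{H} \rightarrow \mathcal{Z}^{H+1}$ amounts to unrolling a one-step latent model over a horizon. A NumPy sketch of that shape contract (the repo uses `torch.nn.Module`s; the linear `step` here is a toy stand-in for the learned network):

```python
import numpy as np

def rollout_dynamics(z0, actions, step):
    """Unroll a latent z0 through H actions, returning H+1 latents,
    matching the signature f: Z x A^H -> Z^{H+1}."""
    latents = [z0]
    for a in actions:                      # H one-step predictions
        latents.append(step(latents[-1], a))
    return np.stack(latents)               # shape (H+1, latent_dim)

step = lambda z, a: 0.9 * z + 0.1 * a      # toy stand-in for the dynamics net
zs = rollout_dynamics(np.zeros(4), np.ones((3, 4)), step)
```

The planner scores each latent in the rollout with the reward and value models to evaluate a candidate plan.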
If you found this work useful in your efforts, please consider citing:
```bibtex
@article{han2026planning,
  title   = {Planning from Observation and Interaction},
  author  = {Han, Tyler and Shen, Siyang and Baijal, Rohan and Ravichandran, Harine and Nemekhbold, Bat and Huang, Kevin and Jung, Sanghun and Boots, Byron},
  journal = {arXiv preprint arXiv:2602.24121},
  year    = {2026}
}
```

