This repository provides a PyTorch implementation of R2-Dreamer (ICLR 2026), a computationally efficient world model that achieves high performance on continuous control benchmarks. It also includes an efficient PyTorch DreamerV3 reproduction that trains ~5x faster than a widely used codebase, along with other baselines. Selecting R2-Dreamer via the config provides an additional ~1.6x speedup over this baseline.
Install dependencies. This repository is tested with Ubuntu 24.04 and Python 3.11.
If you prefer Docker, follow docs/docker.md.
# Installing via a virtual env like uv is recommended.
pip install -r requirements.txtRun training on default settings:
python3 train.py logdir=./logdir/testMonitoring results:
tensorboard --logdir ./logdirSwitching algorithms:
# Choose an algorithm via model.rep_loss:
# r2dreamer|dreamer|infonce|dreamerpro|nedreamer
python3 train.py model.rep_loss=r2dreamernedreamer (NE-Dreamer, Bredis et al., 2026) replaces the pixel decoder
with a causal temporal transformer that predicts the next-step encoder
embedding and aligns it to a stop-gradient target via Barlow Twins. Hyper-
parameters live under model.nedreamer.*; the three Sec. 4.3 ablations are
exposed as model.nedreamer.use_transformer, model.nedreamer.use_shift,
and model.nedreamer.use_projector. The implementation plan is in
docs/nedreamer_plan.md.
Curious Replay (Kauvar & Doyle et al.,
ICML 2023) is available as a prioritized sampling option, orthogonal to the
choice of rep_loss. Enable with:
python3 train.py model.rep_loss=nedreamer model.curious_replay.enabled=True env=crafterThe buffer's per-transition priority follows Eq. 1 of the paper:
p_i = c * beta^v_i + (|L_i| + eps)^alpha
where v_i is the visit count and L_i = |dyn + rew + cont| is the
per-step world-model loss (computed in dreamer.py:_cal_grad and threaded
back through buffer.update_priority). All five c, beta, alpha, eps, p_max knobs are exposed under model.curious_replay.* with the paper's
defaults.
Trained on a single A100 80GB; ~380 env-steps/sec with model.compile=True,
~25 minutes wall-clock per game. 3 eval episodes per checkpoint
(env.eval_episode_num=3, trainer.eval_every=2e4).
| Game | Init eval | Best eval | Final eval (400k) | Notes |
|---|---|---|---|---|
| Pong | -20.7 | -11.3 @ 400k | -11.3 | Monotonic improvement; loss/ne 1025 → 40 |
| Breakout | 0.0 | 7.3 @ 380k | 3.7 | Eval oscillates between scoring and the no-FIRE time-limit stall (a known Atari-100k Breakout pathology when autostart: False) |
| Boxing | -11.0 | 65.7 @ 380k | 61.0 | Strong learning; agent dominates the bot late in training |
For easier code reading, inline tensor shape annotations are provided. See docs/tensor_shapes.md.
At the moment, the following benchmarks are available in this repository.
| Environment | Observation | Action | Budget | Description |
|---|---|---|---|---|
| Meta-World | Image | Continuous | 1M | Robotic manipulation with complex contact interactions. |
| DMC Proprio | State | Continuous | 500K | DeepMind Control Suite with low-dimensional inputs. |
| DMC Vision | Image | Continuous | 1M | DeepMind Control Suite with high-dimensional images inputs. |
| DMC Subtle | Image | Continuous | 1M | DeepMind Control Suite with tiny task-relevant objects. |
| Atari 100k | Image | Discrete | 400K | 26 Atari games. |
| Crafter | Image | Discrete | 1M | Survival environment to evaluates diverse agent abilities. |
| Memory Maze | Image | Discrete | 100M | 3D mazes to evaluate RL agents' long-term memory. |
Use Hydra to select a benchmark and a specific task using env and env.task, respectively.
python3 train.py ... env=dmc_vision env.task=dmc_walker_walkIf you run MuJoCo-based environments (DMC / MetaWorld) on headless machines, you may need to set MUJOCO_GL for offscreen rendering. Using EGL is recommended as it accelerates rendering, leading to faster simulation throughput.
# For example, when using EGL (GPU)
export MUJOCO_GL=egl
# (optional) Choose which GPU EGL uses
export MUJOCO_EGL_DEVICE_ID=0More details: Working with MuJoCo-based environments
If you want automatic formatting/basic checks before commits, you can enable pre-commit:
pip install pre-commit
# This sets up a pre-commit hook so that checks are run every time you commit
pre-commit install
# Manual pre-commit run on all files
pre-commit run --all-filesIf you find this code useful, please consider citing:
@inproceedings{
morihira2026rdreamer,
title={R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation},
author={Naoki Morihira and Amal Nahar and Kartik Bharadwaj and Yasuhiro Kato and Akinobu Hayashi and Tatsuya Harada},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=Je2QqXrcQq}
}