GowU: Go-With-Uncertainty

GitHub | Paper

By Zakaria Mhammedi and James Cohan, Google Research.

GowU decouples exploration from policy optimization in reinforcement learning. Instead of training agents with intrinsic motivation, GowU uses an uncertainty-guided tree search, inspired by the Go-With-The-Winner algorithm, to systematically drive exploration without the overhead of policy optimization. This enables an order of magnitude more efficient exploration than standard intrinsic motivation baselines on hard exploration benchmarks, achieving state-of-the-art performance on Montezuma's Revenge, Pitfall!, and Venture, and solving sparse-reward MuJoCo Adroit and AntMaze tasks directly from pixels without expert demonstrations.
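To make the clone-and-prune idea concrete, here is a toy sketch in the spirit of Go-With-The-Winners. It is not the GowU implementation: the 1-D chain environment, the random-step "policy", and the visit-count novelty score (a stand-in for GowU's uncertainty estimate) are all assumptions chosen for illustration, and `explore_chain` is a hypothetical name.

```python
import random
from collections import Counter


def explore_chain(num_particles=8, rounds=50, length=30, seed=0):
    """Toy particle exploration on the chain 0..length (illustration only)."""
    rng = random.Random(seed)
    particles = [0] * num_particles        # every particle starts at state 0
    visits = Counter({0: num_particles})   # visit counts stand in for uncertainty

    for _ in range(rounds):
        # 1. Each particle takes one random step, clipped to the chain.
        for i, state in enumerate(particles):
            step = rng.choice((-1, 1))
            particles[i] = min(max(state + step, 0), length)
            visits[particles[i]] += 1

        # 2. Rank particles by novelty: rarely visited states come first.
        ranked = sorted(particles, key=lambda s: visits[s])
        winners = ranked[: max(1, num_particles // 2)]

        # 3. Replace the least novel particles with clones of the winners.
        missing = num_particles - len(winners)
        clones = [winners[i % len(winners)] for i in range(missing)]
        particles = winners + clones

    return particles, visits


if __name__ == "__main__":
    final_particles, visit_counts = explore_chain()
    print("final particle states:", final_particles)
    print("furthest state visited:", max(visit_counts))
```

No policy is trained anywhere in this loop; exploration is driven entirely by which particles are kept and cloned, which is the separation the paragraph above describes.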

Installation

GowU requires Python 3.11 and careful version alignment between JAX and DeepMind's legacy RL libraries (Acme, Reverb, Launchpad). Follow the steps below in order.

1. Create a Virtual Environment

python3.11 -m venv gowu_env
source gowu_env/bin/activate

2. Install GowU and Pinned Dependencies

# Install base package in editable mode (--no-deps to avoid resolving
# dm-launchpad, which must be built from source in Step 3)
pip install --no-deps -e .


pip install \
  'numpy<2.0.0' \
  'scipy==1.11.4' \
  'jax==0.4.18' \
  'jaxlib==0.4.18' \
  'dm-haiku==0.0.10' \
  'optax==0.1.7' \
  'chex==0.1.85' \
  'tensorflow==2.15.1' \
  'tensorflow-probability==0.23.0' \
  'dm-env==1.6' \
  'dm-tree==0.1.8' \
  'gym>=0.26.0' \
  'ale-py==0.8.1' \
  'gym[accept-rom-license]' \
  'fasteners' \
  'tqdm' \
  'frozendict>=2.3.0' \
  'ml-collections>=0.1.1' \
  'absl-py>=1.0.0' \
  'Pillow' \
  'portpicker' \
  'mock' \
  'xmanager'

# Install Reverb, Acme, and Sonnet (--no-deps to prevent
# pip from pulling newer incompatible transitive dependencies)
pip install --no-deps \
  'dm-acme==0.4.0' \
  'dm-reverb==0.13.0' \
  'dm-sonnet==2.0.2'

# Pin protobuf to match TensorFlow
pip install 'protobuf==4.25.9'

3. Install dm-launchpad

dm-launchpad is no longer distributed on PyPI. Build it from source using Bazel:

git clone https://github.com/google-deepmind/launchpad.git
cd launchpad
./oss_build.sh --python 3.11
pip install /tmp/launchpad/dist/dm_launchpad-*.whl

4. Patch dm-acme for Python 3.11

dm-acme 0.4.0 references deprecated JAX and NumPy symbols that were removed in the versions required by this stack. Apply these three patches inside your virtual environment's site-packages:

Patch acme/jax/utils.py:

SITE=$(python -c "import site; print(site.getsitepackages()[0])")

sed -i 's/isinstance(x, jax\.xla\.DeviceArray)/isinstance(x, jax.Array)/g' \
    "$SITE/acme/jax/utils.py"
sed -i 's/jax\.xla\.Device/jax.Device/g' \
    "$SITE/acme/jax/utils.py"
sed -i 's/jnp\.DeviceArray/jax.Array/g' \
    "$SITE/acme/jax/utils.py"

Patch acme/jax/variable_utils.py:

sed -i 's/jax\.xla\.Device/jax.Device/g' \
    "$SITE/acme/jax/variable_utils.py"

Patch acme/wrappers/atari_wrapper.py:

sed -i 's/np\.float\b/np.float64/g' \
    "$SITE/acme/wrappers/atari_wrapper.py"
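If GNU sed is not available (the `-i` and `\b` usage above is GNU-flavoured), the same substitutions can be sketched in Python. The patterns below mirror the sed expressions in the same order; the helper names `apply_rules` and `patch_file` are hypothetical and not part of GowU or Acme.

```python
import re
import site
from pathlib import Path

# (pattern, replacement) pairs mirroring the sed expressions above, in order.
JAX_UTILS_RULES = [
    (r"isinstance\(x, jax\.xla\.DeviceArray\)", "isinstance(x, jax.Array)"),
    (r"jax\.xla\.Device", "jax.Device"),
    (r"jnp\.DeviceArray", "jax.Array"),
]
VARIABLE_UTILS_RULES = [(r"jax\.xla\.Device", "jax.Device")]
ATARI_WRAPPER_RULES = [(r"np\.float\b", "np.float64")]


def apply_rules(text, rules):
    """Apply each regex substitution to the text, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def patch_file(path, rules):
    """Rewrite the file in place; return True if its content changed."""
    path = Path(path)
    original = path.read_text()
    patched = apply_rules(original, rules)
    if patched != original:
        path.write_text(patched)
        return True
    return False


if __name__ == "__main__":
    acme = Path(site.getsitepackages()[0]) / "acme"
    targets = [
        (acme / "jax" / "utils.py", JAX_UTILS_RULES),
        (acme / "jax" / "variable_utils.py", VARIABLE_UTILS_RULES),
        (acme / "wrappers" / "atari_wrapper.py", ATARI_WRAPPER_RULES),
    ]
    for target, rules in targets:
        if target.is_file():  # skip quietly if Acme is not installed here
            print(target, "patched" if patch_file(target, rules) else "unchanged")
```

Run it with the target virtual environment activated so that `site.getsitepackages()` resolves to that environment. Re-running is harmless because the patched text no longer matches any pattern.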

5. Verify Installation

python -c "
import jax; print(f'JAX:        {jax.__version__}')
import acme; print('Acme:       OK')
import reverb; print('Reverb:     OK')
import launchpad; print('Launchpad:  OK')
import gowu; print('GowU:       OK')
print()
print('All imports successful!')
"
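The import check above confirms that the packages load, but not that the pinned versions were actually installed. A minimal sketch of a version check using only the standard library follows; the pins are copied from Step 2, and the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
from importlib import metadata

# Pins copied from Step 2 of this README (distribution name -> version).
PINNED = {
    "jax": "0.4.18",
    "jaxlib": "0.4.18",
    "dm-haiku": "0.0.10",
    "optax": "0.1.7",
    "chex": "0.1.85",
    "tensorflow": "2.15.1",
    "dm-acme": "0.4.0",
    "dm-reverb": "0.13.0",
    "dm-sonnet": "2.0.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned versions match.")
```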

MuJoCo & D4RL Setup (Adroit, AntMaze)

These additional steps are only needed for continuous control tasks (not for Atari games like MontezumaRevenge or Pitfall).

Note on Gym compatibility: Gym 0.26.2 (used in gowu_env for Atari) introduced breaking API changes that are incompatible with D4RL environments. All MuJoCo-based tasks — both Adroit (hammer-v0, relocate-v0, door-v0) and AntMaze — require a separate gowu_mujoco_env with Gym 0.23.1.

1. Install OS Dependencies & Mesa Graphics

sudo apt update
sudo apt install -y libosmesa6-dev patchelf python3-dev

2. Install MuJoCo 2.1.0 Binaries

mkdir -p ~/.mujoco
cd ~/.mujoco
curl -sSL https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -o mujoco.tar.gz
tar -xzf mujoco.tar.gz
rm mujoco.tar.gz

3. Create gowu_mujoco_env (for Adroit & AntMaze)

All D4RL/MuJoCo environments require Gym 0.23.1 because Gym 0.26+ introduced breaking API changes that are incompatible with D4RL's environment wrappers.

python3.11 -m venv gowu_mujoco_env
source gowu_mujoco_env/bin/activate

pip install --no-deps -e .

# Same dependency stack as gowu_env, but with gym==0.23.1 in place of gym>=0.26.0,
# the Atari packages (ale-py, gym[accept-rom-license]) dropped, and dm-control and mujoco added
pip install \
  'numpy<2.0.0' \
  'scipy==1.11.4' \
  'jax==0.4.18' \
  'jaxlib==0.4.18' \
  'dm-haiku==0.0.10' \
  'optax==0.1.7' \
  'chex==0.1.85' \
  'tensorflow==2.15.1' \
  'tensorflow-probability==0.23.0' \
  'dm-env==1.6' \
  'dm-tree==0.1.8' \
  'gym==0.23.1' \
  'dm-control==1.0.41' \
  'mujoco==3.8.1' \
  'fasteners' \
  'tqdm' \
  'frozendict>=2.3.0' \
  'ml-collections>=0.1.1' \
  'absl-py>=1.0.0' \
  'Pillow' \
  'portpicker' \
  'mock' \
  'xmanager'

pip install --no-deps \
  'dm-acme==0.4.0' \
  'dm-reverb==0.13.0' \
  'dm-sonnet==2.0.2'
pip install 'protobuf==4.25.9'

pip install 'Cython<3' 'setuptools==69.5.1' imageio
pip install --no-deps 'mujoco-py<2.2,>=2.1'
pip install --no-deps --ignore-requires-python git+https://github.com/Farama-Foundation/d4rl@master
pip install --no-deps git+https://github.com/aravindr93/mjrl.git

# dm-launchpad must be built from source (see Step 3 in the main guide)

Apply the same dm-acme patches (Step 4 of Installation) with gowu_mujoco_env activated, so that SITE resolves to this environment's site-packages.

Environment summary:

  • gowu_env (Gym 0.26.2) — Atari tasks (MontezumaRevenge, Pitfall)
  • gowu_mujoco_env (Gym 0.23.1) — Adroit (hammer-v0, relocate-v0, door-v0) and AntMaze tasks

4. Export Environment Variables

These must be set in every shell session before launching GowU on MuJoCo tasks:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
export MUJOCO_GL=osmesa

Hardware Requirements

GowU spawns many parallel processes (actors, learners, replay buffer, coordinator). The default configuration (64 particles, 32 actors, 2 groups) has the following requirements:

Resource     Recommended
CPU cores    48+
RAM          64+ GB
Disk         64 GB (for checkpoints)
GPU          Not required (CPU-only)

Typical memory breakdown (64 particles, 32 actors, 2 groups):

  • Coordinator (gowu): ~5–10 GB RSS (includes exploration tree)
  • Actors: ~1 GB each × 32 = ~32 GB
  • Reverb + RND + other workers: ~1–2 GB
  • Exploration tree: ~2–10 GB (grows over time; included in coordinator RSS)
  • Total: 40–60 GB in steady state

For smaller machines, reduce num_particles, num_actors, and agent.num_ensemble proportionally, and lower n_groups if needed. For example, on a 32 GB machine:

--config.num_particles=16 --config.agent.num_ensemble=16 \
--config.n_groups=1 --config.num_actors=8
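When several machine sizes are in play, it can help to keep the overrides in one place and render them as flags. The sketch below reproduces the 32 GB example above; `config_flags`, `launch_command`, and `SMALL_MACHINE` are hypothetical names, and the command layout simply follows the launch commands shown elsewhere in this README.

```python
def config_flags(overrides):
    """Render {"agent.num_ensemble": 16} as ["--config.agent.num_ensemble=16"]."""
    return [f"--config.{key}={value}" for key, value in overrides.items()]


# The 32 GB example from this README, expressed as overrides.
SMALL_MACHINE = {
    "num_particles": 16,
    "agent.num_ensemble": 16,
    "n_groups": 1,
    "num_actors": 8,
}


def launch_command(env_name, overrides):
    """Build the argument list for a local_mp run with the given overrides."""
    return [
        "python", "-m", "gowu.run_gowu",
        "--lp_launch_type=local_mp",
        "--config=gowu/configs/base_config.py",
        f"--config.env_name={env_name}",
        *config_flags(overrides),
    ]


if __name__ == "__main__":
    print(" ".join(launch_command("MontezumaRevenge", SMALL_MACHINE)))
```

The list returned by `launch_command` can be passed directly to `subprocess.run(..., check=True)`.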

Running

GowU uses Launchpad for distributed execution.

Local multiprocessing

The simplest way to run GowU is with local_mp, which spawns all nodes (actors, learners, curriculum server) as separate processes on a single machine.

By default, Launchpad opens a tmux session with one pane per node. For long runs, set LAUNCHPAD_LAUNCH_LOCAL_TERMINAL=output_to_files to write all node logs to files instead.

cd /path/to/gowu-opensource
source gowu_env/bin/activate

# Write logs to files instead of tmux (optional)
export LAUNCHPAD_LAUNCH_LOCAL_TERMINAL=output_to_files

# Launch training
python -m gowu.run_gowu \
  --lp_launch_type=local_mp \
  --config=gowu/configs/base_config.py \
  --config.env_name=MontezumaRevenge \
  --config.num_particles=64 \
  --config.agent.num_ensemble=64 \
  --config.n_groups=2 \
  --config.num_actors=32

For Adroit or AntMaze tasks, use gowu_mujoco_env instead:

cd /path/to/gowu-opensource
source gowu_mujoco_env/bin/activate
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
export MUJOCO_GL=osmesa
export D4RL_SUPPRESS_IMPORT_ERROR=1
export LAUNCHPAD_LAUNCH_LOCAL_TERMINAL=output_to_files

# Adroit example (hammer-v0):
python -m gowu.run_gowu \
  --lp_launch_type=local_mp \
  --config=gowu/configs/base_config.py \
  --config.env_name=hammer-v0 \
  --config.num_particles=64 \
  --config.agent.num_ensemble=64 \
  --config.n_groups=4 \
  --config.num_actors=32

# AntMaze example (antmaze-large-diverse-v0):
python -m gowu.run_gowu \
  --lp_launch_type=local_mp \
  --config=gowu/configs/base_config.py \
  --config.env_name=antmaze-large-diverse-v0 \
  --config.num_particles=64 \
  --config.agent.num_ensemble=64 \
  --config.n_groups=4 \
  --config.num_actors=32

When using output_to_files, logs are written to /tmp/launchpad_out/:

/tmp/launchpad_out/
├── gowu/0          # Curriculum server (coordinator) log
├── reverb/0        # Replay buffer log
├── rnd_learner/0   # RND learner log
└── actor/
    ├── 0           # Actor 0 log
    ├── 1           # Actor 1 log
    └── ...         # One file per actor

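You can follow the coordinator log in real time, or list the reward lines logged so far:

# Stream new log lines as they are written
tail -f /tmp/launchpad_out/gowu/0

# Print every reward line logged so far
grep "Reward =" /tmp/launchpad_out/gowu/0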
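For a quick summary rather than raw lines, the reward values can be pulled out of the coordinator log with a few lines of Python. This sketch assumes that a plain number follows "Reward =" on the same line; the exact log format is not documented here, so adjust the pattern if your logs differ. `extract_rewards` and `best_reward` are hypothetical helpers.

```python
import re

# ASSUMPTION: a plain number follows "Reward =" on the same line.
REWARD_RE = re.compile(r"Reward =\s*(-?\d+(?:\.\d+)?)")


def extract_rewards(lines):
    """Return every reward value found in an iterable of log lines."""
    rewards = []
    for line in lines:
        match = REWARD_RE.search(line)
        if match:
            rewards.append(float(match.group(1)))
    return rewards


def best_reward(path="/tmp/launchpad_out/gowu/0"):
    """Return the highest reward logged so far, or None if there is none."""
    with open(path, errors="replace") as log:
        rewards = extract_rewards(log)
    return max(rewards) if rewards else None
```

Calling `best_reward()` while a run is in progress reads the log as it currently stands; it does not follow the file the way `tail -f` does.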

Monitoring with TensorBoard

GowU logs training metrics (rewards, losses, active tree size, steps) to TensorBoard. By default, logs are written to the tb_logs subdirectory under the checkpoint directory (e.g. /tmp/gowu_checkpoints/tb_logs).

To start TensorBoard and view the metrics, run:

tensorboard --logdir=/tmp/gowu_checkpoints/tb_logs

Navigate to http://localhost:6006 in your browser.

Distributed execution

Launchpad supports several launch types, including ones that span multiple machines. Set --lp_launch_type to one of the following:

  • local_mp — All nodes on one machine (default, shown above).
  • vertex_ai — Launch on Vertex AI with each node running in its own container.
  • ssh — Launch across multiple machines via SSH. Requires passwordless SSH access to all worker machines.

For example, to launch on Vertex AI:

python -m gowu.run_gowu \
  --lp_launch_type=vertex_ai \
  --config=gowu/configs/base_config.py \
  --config.env_name=hammer-v0 \
  --config.num_actors=32

See the Launchpad documentation for details on configuring each launch type.

Structure

  • gowu.py: The core curriculum server.
  • actor.py: Distributed actor implementation.
  • run_gowu.py: Launchpad program entry point.
  • learners/: Contains different learners (policy, rnd, contrastive).
  • models/: Network architectures.
  • env_utils/: Environment utilities and factories.
  • configs/: Configuration files.
  • optimizers/: Custom optimizers (Muon).
  • utils/: Utility functions.

Citation

If you use GowU in your research, please cite:

@misc{mhammedi2026decouplingexplorationpolicyoptimization,
      title={Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration},
      author={Zakaria Mhammedi and James Cohan},
      year={2026},
      eprint={2603.22273},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.22273},
}

Contributing

See CONTRIBUTING.md for guidelines. For bugs and feature requests, please open an issue on GitHub.

License

Apache 2.0. See LICENSE for details.

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
