GowU: Go-With-Uncertainty

By Zakaria Mhammedi and James Cohan, Google Research.

GowU decouples exploration from policy optimization in reinforcement learning. Instead of training agents with intrinsic motivation, GowU drives exploration with an uncertainty-guided tree search inspired by the Go-With-The-Winners algorithm, which avoids the overhead of policy optimization during exploration. This makes exploration an order of magnitude more efficient than standard intrinsic-motivation baselines on hard-exploration benchmarks. GowU achieves state-of-the-art performance on Montezuma's Revenge, Pitfall!, and Venture, and solves sparse-reward MuJoCo Adroit and AntMaze tasks directly from pixels without expert demonstrations.
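The select-and-restart idea behind Go-With-The-Winners can be illustrated with a toy example. The sketch below is not GowU's implementation: it runs a small population of particles on a 1-D chain of states, scores each particle's state with a count-based novelty bonus (a hypothetical stand-in for GowU's learned uncertainty), and restarts the least novel particles from copies of the most novel ones. The function names and parameters are illustrative assumptions.

```python
import math
import random
from collections import Counter


def novelty(counts: Counter, state: int) -> float:
    """Count-based novelty score; a hypothetical stand-in for GowU's uncertainty."""
    return 1.0 / math.sqrt(counts[state] + 1)


def go_with_the_winners(num_particles=8, num_rounds=50, steps_per_round=5,
                        keep_fraction=0.5, chain_length=100, seed=0):
    """Toy Go-With-The-Winners loop on states 0..chain_length-1."""
    rng = random.Random(seed)
    particles = [0] * num_particles  # all particles start at state 0
    counts = Counter()               # visit counts used by the novelty score
    for _ in range(num_rounds):
        # 1. Extend every particle with a short random-action rollout.
        for i, state in enumerate(particles):
            for _ in range(steps_per_round):
                state = min(max(state + rng.choice((-1, 1)), 0), chain_length - 1)
                counts[state] += 1
            particles[i] = state
        # 2. Rank particles by novelty; the most novel ones are the "winners".
        ranked = sorted(particles, key=lambda s: novelty(counts, s), reverse=True)
        num_keep = max(1, int(keep_fraction * num_particles))
        winners = ranked[:num_keep]
        # 3. Restart the losers from copies of the winners' states.
        particles = winners + [rng.choice(winners)
                               for _ in range(num_particles - num_keep)]
    return particles, counts


if __name__ == "__main__":
    particles, counts = go_with_the_winners()
    print("furthest state visited:", max(counts))
```

Restarting from the most novel states concentrates new rollouts near the frontier of what has been visited; the toy omits the exploration tree, the learned uncertainty model, and the distributed actors.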

Installation

GowU requires Python 3.11 and careful version alignment between JAX and DeepMind's legacy RL libraries (Acme, Reverb, Launchpad). Follow the steps below in order.

1. Create a Virtual Environment

python3.11 -m venv gowu_env
source gowu_env/bin/activate

2. Install GowU and Pinned Dependencies

# Install base package in editable mode (--no-deps to avoid resolving
# dm-launchpad, which must be built from source in Step 3)
pip install --no-deps -e .

# Install the pinned dependency stack
pip install \
  'numpy<2.0.0' \
  'scipy==1.11.4' \
  'jax==0.4.18' \
  'jaxlib==0.4.18' \
  'dm-haiku==0.0.10' \
  'optax==0.1.7' \
  'chex==0.1.85' \
  'tensorflow==2.15.1' \
  'tensorflow-probability==0.23.0' \
  'dm-env==1.6' \
  'dm-tree==0.1.8' \
  'gym>=0.26.0' \
  'ale-py==0.8.1' \
  'gym[accept-rom-license]' \
  'fasteners' \
  'tqdm' \
  'frozendict>=2.3.0' \
  'ml-collections>=0.1.1' \
  'absl-py>=1.0.0' \
  'Pillow' \
  'portpicker' \
  'mock' \
  'xmanager'

# Install Reverb, Acme, and Sonnet (--no-deps to prevent
# pip from pulling newer incompatible transitive dependencies)
pip install --no-deps \
  'dm-acme==0.4.0' \
  'dm-reverb==0.13.0' \
  'dm-sonnet==2.0.2'

# Pin protobuf to match TensorFlow
pip install 'protobuf==4.25.9'

3. Install dm-launchpad

dm-launchpad is no longer distributed on PyPI. Build it from source using Bazel:

git clone https://github.com/google-deepmind/launchpad.git
cd launchpad
./oss_build.sh --python 3.11
pip install /tmp/launchpad/dist/dm_launchpad-*.whl

4. Patch dm-acme for Python 3.11

dm-acme 0.4.0 references JAX and NumPy symbols that were removed in the versions pinned above. With the virtual environment activated, apply the following patches to three files in its site-packages directory:

Patch acme/jax/utils.py:

SITE=$(python -c "import site; print(site.getsitepackages()[0])")

sed -i 's/isinstance(x, jax\.xla\.DeviceArray)/isinstance(x, jax.Array)/g' \
    "$SITE/acme/jax/utils.py"
sed -i 's/jax\.xla\.Device/jax.Device/g' \
    "$SITE/acme/jax/utils.py"
sed -i 's/jnp\.DeviceArray/jax.Array/g' \
    "$SITE/acme/jax/utils.py"

Patch acme/jax/variable_utils.py:

sed -i 's/jax\.xla\.Device/jax.Device/g' \
    "$SITE/acme/jax/variable_utils.py"

Patch acme/wrappers/atari_wrapper.py:

sed -i 's/np\.float\b/np.float64/g' \
    "$SITE/acme/wrappers/atari_wrapper.py"
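To confirm the patches took effect, a small checker can search the three files for the symbols the sed commands replace. This is a convenience sketch, not part of GowU: the name unpatched_symbols is hypothetical, and the patterns simply mirror the sed expressions above. Run it with the virtual environment activated.

```python
import pathlib
import re
import site

# Files and deprecated symbols targeted by the sed commands above.
DEPRECATED_SYMBOLS = {
    "acme/jax/utils.py": [r"jax\.xla\.Device", r"jnp\.DeviceArray"],
    "acme/jax/variable_utils.py": [r"jax\.xla\.Device"],
    "acme/wrappers/atari_wrapper.py": [r"np\.float\b"],
}


def unpatched_symbols(site_dir):
    """Return (file, problem) pairs for files that still need patching."""
    problems = []
    for rel_path, patterns in DEPRECATED_SYMBOLS.items():
        path = pathlib.Path(site_dir) / rel_path
        if not path.is_file():
            problems.append((rel_path, "file not found"))
            continue
        text = path.read_text()
        # Record every deprecated pattern still present in the file.
        problems.extend((rel_path, p) for p in patterns if re.search(p, text))
    return problems


if __name__ == "__main__":
    site_dir = site.getsitepackages()[0]  # same directory as $SITE above
    problems = unpatched_symbols(site_dir)
    for rel_path, problem in problems:
        print(f"{rel_path}: {problem}")
    if not problems:
        print("All dm-acme patches applied.")
```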

5. Verify Installation

python -c "
import jax; print(f'JAX:        {jax.__version__}')
import acme; print('Acme:       OK')
import reverb; print('Reverb:     OK')
import launchpad; print('Launchpad:  OK')
import gowu; print('GowU:       OK')
print()
print('All imports successful!')
"
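Successful imports do not catch a dependency that pip silently upgraded. The sketch below compares installed versions against a subset of the Step 2 pins using the standard-library importlib.metadata module. PINS and pin_mismatches are illustrative names; numpy is left out because it is pinned as <2.0.0 rather than to an exact version.

```python
from importlib import metadata

# A subset of the exact pins from Step 2; extend as needed.
PINS = {
    "jax": "0.4.18",
    "jaxlib": "0.4.18",
    "dm-haiku": "0.0.10",
    "optax": "0.1.7",
    "chex": "0.1.85",
    "tensorflow": "2.15.1",
    "dm-acme": "0.4.0",
    "dm-reverb": "0.13.0",
    "dm-sonnet": "2.0.2",
    "protobuf": "4.25.9",
}


def pin_mismatches(pins):
    """Map each distribution whose installed version differs from its pin to
    (expected, installed); installed is None when the package is missing."""
    mismatches = {}
    for dist, expected in pins.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[dist] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for dist, (expected, installed) in pin_mismatches(PINS).items():
        print(f"{dist}: expected {expected}, found {installed}")
```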

MuJoCo & D4RL Setup (Adroit, AntMaze)

These additional steps are only needed for continuous control tasks (not for Atari games like MontezumaRevenge or Pitfall).

Note on Gym compatibility: Gym 0.26 (gowu_env uses Gym 0.26.2 for Atari) introduced breaking API changes that are incompatible with D4RL environments. All MuJoCo-based tasks, both Adroit (hammer-v0, relocate-v0, door-v0) and AntMaze, therefore require a separate virtual environment, gowu_mujoco_env, with Gym 0.23.1.

1. Install OS Dependencies & Mesa Graphics

sudo apt update
sudo apt install -y libosmesa6-dev patchelf python3-dev

2. Install MuJoCo 2.1.0 Binaries

mkdir -p ~/.mujoco
cd ~/.mujoco
curl -sSL https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -o mujoco.tar.gz
tar -xzf mujoco.tar.gz
rm mujoco.tar.gz

3. Create `gowu_mujoco_env` (for Adroit & AntMaze)

Create the second virtual environment and install the same dependency stack, with Gym pinned to 0.23.1 and the MuJoCo, mujoco-py, and D4RL packages added:

python3.11 -m venv gowu_mujoco_env
source gowu_mujoco_env/bin/activate

pip install --no-deps -e .

# Install the same dependency stack but with gym==0.23.1 instead of gym>=0.26.0
pip install \
  'numpy<2.0.0' \
  'scipy==1.11.4' \
  'jax==0.4.18' \
  'jaxlib==0.4.18' \
  'dm-haiku==0.0.10' \
  'optax==0.1.7' \
  'chex==0.1.85' \
  'tensorflow==2.15.1' \
  'tensorflow-probability==0.23.0' \
  'dm-env==1.6' \
  'dm-tree==0.1.8' \
  'gym==0.23.1' \
  'dm-control==1.0.41' \
  'mujoco==3.8.1' \
  'fasteners' \
  'tqdm' \
  'frozendict>=2.3.0' \
  'ml-collections>=0.1.1' \
  'absl-py>=1.0.0' \
  'Pillow' \
  'portpicker' \
  'mock' \
  'xmanager'

pip install --no-deps \
  'dm-acme==0.4.0' \
  'dm-reverb==0.13.0' \
  'dm-sonnet==2.0.2'
pip install 'protobuf==4.25.9'

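# Build prerequisites for mujoco-py 2.1, which does not build with Cython 3
pip install 'Cython<3' 'setuptools==69.5.1' imageio
pip install --no-deps 'mujoco-py<2.2,>=2.1'
# D4RL and mjrl from GitHub (--no-deps keeps the pins above intact)
pip install --no-deps --ignore-requires-python git+https://github.com/Farama-Foundation/d4rl@master
pip install --no-deps git+https://github.com/aravindr93/mjrl.git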

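# dm-launchpad is not on PyPI: build it as in Installation, Step 3, or reinstall
# the Python 3.11 wheel already built there
pip install /tmp/launchpad/dist/dm_launchpad-*.whl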

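Then apply the dm-acme patches from Installation, Step 4, to this venv as well, with gowu_mujoco_env activated so that $SITE points to its site-packages directory.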

Environment summary:

gowu_env (Gym 0.26.2) — Atari tasks (MontezumaRevenge, Pitfall)

gowu_mujoco_env (Gym 0.23.1) — Adroit (hammer-v0, relocate-v0, door-v0) and AntMaze tasks

4. Export Environment Variables

Set these in every new shell before launching an Adroit or AntMaze run (they are not needed for Atari):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
export MUJOCO_GL=osmesa
export D4RL_SUPPRESS_IMPORT_ERROR=1
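A quick sanity check for the MuJoCo binaries and environment variables can save a failed launch. This sketch assumes the default ~/.mujoco/mujoco210 location used in Step 2; mujoco_env_problems is a hypothetical helper, not part of GowU. Run it in the same shell after exporting the variables.

```python
import os
import pathlib


def mujoco_env_problems(environ=None, home=None):
    """Return a list of problems with the MuJoCo setup from Steps 2 and 4."""
    environ = os.environ if environ is None else environ
    home = pathlib.Path.home() if home is None else pathlib.Path(home)
    bin_dir = home / ".mujoco" / "mujoco210" / "bin"
    problems = []
    if not bin_dir.is_dir():
        problems.append(f"missing MuJoCo 2.1.0 binaries: {bin_dir}")
    # LD_LIBRARY_PATH is a colon-separated list on Linux.
    ld_entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if str(bin_dir) not in ld_entries:
        problems.append(f"LD_LIBRARY_PATH does not contain {bin_dir}")
    if environ.get("MUJOCO_GL") != "osmesa":
        problems.append("MUJOCO_GL is not set to osmesa")
    return problems


if __name__ == "__main__":
    for problem in mujoco_env_problems():
        print(problem)
```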

Hardware Requirements

GowU spawns many parallel processes (actors, learners, replay buffer, coordinator). The default configuration (64 particles, 32 actors, 2 groups) has the following requirements:

Resource	Recommended
CPU cores	48+
RAM	64+ GB
Disk	64 GB (for checkpoints)
GPU	Not required (CPU-only)

Typical memory breakdown (64 particles, 32 actors, 2 groups):

Coordinator (gowu): ~5–10 GB RSS, including the exploration tree (~2–10 GB, growing over time)
Actors: ~1 GB each × 32 = ~32 GB
Reverb + RND + other workers: ~1–2 GB
Total: 40–60 GB in steady state

For smaller machines, reduce num_particles, agent.num_ensemble, num_actors, and n_groups proportionally. For example, on a 32 GB machine:

--config.num_particles=16 --config.agent.num_ensemble=16 \
--config.n_groups=1 --config.num_actors=8
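Because actor memory dominates and scales linearly with num_actors, the breakdown above gives a rough way to size a run. This sketch holds the coordinator and other-worker figures fixed at the values measured for the default configuration, which is an assumption: they may change with num_particles and n_groups. The per-component figures also sum to slightly less than the stated 40–60 GB total, so treat the result as a lower bound. estimate_memory_gb is a hypothetical helper.

```python
def estimate_memory_gb(num_actors, coordinator_gb=(5, 10), actor_gb=1.0,
                       other_gb=(1, 2)):
    """Rough (low, high) steady-state RAM estimate in GB.

    Defaults come from the breakdown above (64 particles, 2 groups); the
    coordinator figure already includes the exploration tree.
    """
    low = coordinator_gb[0] + actor_gb * num_actors + other_gb[0]
    high = coordinator_gb[1] + actor_gb * num_actors + other_gb[1]
    return low, high


print(estimate_memory_gb(32))  # default configuration → (38.0, 44.0)
print(estimate_memory_gb(8))   # 32 GB machine example → (14.0, 20.0)
```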

Running

GowU uses Launchpad for distributed execution.

Local multiprocessing

The simplest way to run GowU is with local_mp, which spawns all nodes (actors, learners, curriculum server) as separate processes on a single machine.

By default, Launchpad opens a tmux session with one pane per node. For long runs, set LAUNCHPAD_LAUNCH_LOCAL_TERMINAL=output_to_files to write all node logs to files instead.

cd /path/to/gowu-opensource
source gowu_env/bin/activate

# Write logs to files instead of tmux (optional)
export LAUNCHPAD_LAUNCH_LOCAL_TERMINAL=output_to_files

# Launch training
python -m gowu.run_gowu \
  --lp_launch_type=local_mp \
  --config=gowu/configs/base_config.py \
  --config.env_name=MontezumaRevenge \
  --config.num_particles=64 \
  --config.agent.num_ensemble=64 \
  --config.n_groups=2 \
  --config.num_actors=32

For Adroit or AntMaze tasks, use gowu_mujoco_env instead:

cd /path/to/gowu-opensource
source gowu_mujoco_env/bin/activate
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
export MUJOCO_GL=osmesa
export D4RL_SUPPRESS_IMPORT_ERROR=1
export LAUNCHPAD_LAUNCH_LOCAL_TERMINAL=output_to_files

# Adroit example (hammer-v0):
python -m gowu.run_gowu \
  --lp_launch_type=local_mp \
  --config=gowu/configs/base_config.py \
  --config.env_name=hammer-v0 \
  --config.num_particles=64 \
  --config.agent.num_ensemble=64 \
  --config.n_groups=4 \
  --config.num_actors=32

# AntMaze example (antmaze-large-diverse-v0):
python -m gowu.run_gowu \
  --lp_launch_type=local_mp \
  --config=gowu/configs/base_config.py \
  --config.env_name=antmaze-large-diverse-v0 \
  --config.num_particles=64 \
  --config.agent.num_ensemble=64 \
  --config.n_groups=4 \
  --config.num_actors=32
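The three launch commands above differ only in the virtual environment, the task name, and n_groups. A hypothetical helper can assemble them; launch_command and is_mujoco_task are illustrative names, the task-name prefixes cover only the tasks named in this README, and the n_groups defaults mirror the examples rather than any documented rule.

```python
import shlex

# Task-name prefixes for the MuJoCo tasks named in this README.
MUJOCO_PREFIXES = ("hammer-", "relocate-", "door-", "antmaze-")


def is_mujoco_task(env_name):
    """True for Adroit/AntMaze tasks, which need gowu_mujoco_env."""
    return env_name.startswith(MUJOCO_PREFIXES)


def launch_command(env_name, num_particles=64, num_actors=32, n_groups=None):
    """Build the local_mp launch command used in the examples above."""
    if n_groups is None:
        # The examples use 4 groups for Adroit/AntMaze and 2 for Atari.
        n_groups = 4 if is_mujoco_task(env_name) else 2
    return [
        "python", "-m", "gowu.run_gowu",
        "--lp_launch_type=local_mp",
        "--config=gowu/configs/base_config.py",
        f"--config.env_name={env_name}",
        f"--config.num_particles={num_particles}",
        # The examples always set num_ensemble equal to num_particles.
        f"--config.agent.num_ensemble={num_particles}",
        f"--config.n_groups={n_groups}",
        f"--config.num_actors={num_actors}",
    ]


if __name__ == "__main__":
    env_name = "hammer-v0"
    venv = "gowu_mujoco_env" if is_mujoco_task(env_name) else "gowu_env"
    print(f"# run inside {venv}")
    print(shlex.join(launch_command(env_name)))
```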

When using output_to_files, logs are written to /tmp/launchpad_out/:

/tmp/launchpad_out/
├── gowu/0          # Curriculum server (coordinator) log
├── reverb/0        # Replay buffer log
├── rnd_learner/0   # RND learner log
└── actor/
    ├── 0           # Actor 0 log
    ├── 1           # Actor 1 log
    └── ...         # One file per actor

Monitor the run with:

# Follow the coordinator log as it is written
tail -f /tmp/launchpad_out/gowu/0
# Show the reward lines logged so far
grep "Reward =" /tmp/launchpad_out/gowu/0
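To summarize rewards rather than scroll through grep output, a short script can parse the coordinator log. It assumes each reward line contains "Reward = <number>", which is inferred only from the grep pattern above; adjust REWARD_RE if the actual log format differs. rewards_from_log is a hypothetical helper name.

```python
import re
import sys

# Assumption: reward lines contain "Reward = <number>".
REWARD_RE = re.compile(r"Reward = (-?\d+(?:\.\d+)?)")


def rewards_from_log(lines):
    """Extract every reward value from an iterable of log lines."""
    rewards = []
    for line in lines:
        match = REWARD_RE.search(line)
        if match:
            rewards.append(float(match.group(1)))
    return rewards


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/launchpad_out/gowu/0"
    try:
        with open(path) as f:
            rewards = rewards_from_log(f)
    except OSError:
        print(f"could not read log: {path}")
    else:
        if rewards:
            print(f"{len(rewards)} reward lines, best = {max(rewards)}")
        else:
            print("no reward lines found yet")
```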

Monitoring with TensorBoard

GowU logs training metrics (rewards, losses, active tree size, steps) to TensorBoard. By default, logs are written to the tb_logs subdirectory under the checkpoint directory (e.g. /tmp/gowu_checkpoints/tb_logs).

To start TensorBoard and view the metrics, run:

tensorboard --logdir=/tmp/gowu_checkpoints/tb_logs

Navigate to http://localhost:6006 in your browser.

Distributed execution

Launchpad can also distribute nodes across multiple machines. Select the launch type with --lp_launch_type:

local_mp — All nodes on one machine (default, shown above).
vertex_ai — Launch on Vertex AI with each node running in its own container.
ssh — Launch across multiple machines via SSH. Requires passwordless SSH access to all worker machines.

For example, to launch on Vertex AI:

python -m gowu.run_gowu \
  --lp_launch_type=vertex_ai \
  --config=gowu/configs/base_config.py \
  --config.env_name=hammer-v0 \
  --config.num_actors=32

See the Launchpad documentation for details on configuring each launch type.

Structure

gowu.py: The core curriculum server.
actor.py: Distributed actor implementation.
run_gowu.py: Launchpad program entry point.
learners/: Contains different learners (policy, rnd, contrastive).
models/: Network architectures.
env_utils/: Environment utilities and factories.
configs/: Configuration files.
optimizers/: Custom optimizers (Muon).
utils/: Utility functions.

Citation

If you use GowU in your research, please cite:

@misc{mhammedi2026decouplingexplorationpolicyoptimization,
      title={Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration},
      author={Zakaria Mhammedi and James Cohan},
      year={2026},
      eprint={2603.22273},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.22273},
}

Contributing

See CONTRIBUTING.md for guidelines. For bugs and feature requests, please open an issue on GitHub.

License

Apache 2.0. See LICENSE for details.

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
