Skip to content

Remove Arena's Gr00t Docker; use Isaac-Gr00t clients#607

Draft
cvolkcvolk wants to merge 14 commits intomainfrom
cvolk/refactor-remove-policy-server
Draft

Remove Arena's Gr00t Docker; use Isaac-Gr00t clients#607
cvolkcvolk wants to merge 14 commits intomainfrom
cvolk/refactor-remove-policy-server

Conversation

@cvolkcvolk
Copy link
Copy Markdown
Collaborator

@cvolkcvolk cvolkcvolk commented Apr 15, 2026

Summary [Draft; Do not merge]

This is a first shot at removing the Gr00t Docker and its dependecies. The Gr00t model is run through the official provided Isaac-Gr00t policy-server implementation.

  • Remove Arena's duplicated remote_policy/ module that re-implemented Isaac-GR00T's server_client.py (ZMQ+msgpack server, client, serializer)
  • Remove ClientSidePolicy, ActionChunkingClientSidePolicy, and Gr00tRemoteServerSidePolicy wrapper classes
  • Remove is_remote/shutdown_remote pattern from PolicyBase and policy_runner.py
  • Remove Dockerfile.gr00t_server and run_gr00t_server.sh (GR00T repo provides its own server)
  • Add Gr00tRemoteClosedloopPolicy — a thin wrapper that uses GR00T's native PolicyClient and reuses the same obs/action translation pipeline as the local Gr00tClosedloopPolicy

Arena no longer maintains any networking code. Each policy framework (GR00T, OpenPI, etc.) owns its own server and client.

Motivation

Arena's remote_policy/ module duplicated Isaac-GR00T's gr00t/policy/server_client.py — same ZMQ REP/REQ pattern, same msgpack serializer, same endpoint dispatch. Every protocol change in GR00T required a matching change in Arena. Meanwhile, other frameworks use entirely different protocols (OpenPI uses WebSocket, OpenVLA uses HTTP REST), making Arena's single-protocol approach a dead end for "bring any policy" evaluation.

The observation and action translation (gr00t_core.py) is the hard, valuable part and it already exists and is kept.

What stays the same

  • PolicyBase ABC (minus is_remote)
  • ActionChunkingState (multi-env GPU chunk buffer)
  • gr00t_core.py (obs/action translation pipeline)
  • Gr00tClosedloopPolicy (local, in-process)
  • policy_runner.py / eval_runner.py evaluation loop

Example: Gr00t policy running through the official Isaac-Gr00t repository (instead of IsaacLab-Arena-Gr00t)

The policy model runs in GR00T's own Docker container, and Arena runs in its base Docker container.

You will need two terminals.

Terminal 1: Start the GR00T Policy Server

This starts the official Isaac-GR00T Docker container and launches the policy server inside it. The server loads the model and listens for inference requests over ZMQ.

Build the GR00T Docker image (first time only):

cd submodules/Isaac-GR00T
bash docker/build.sh

This builds a Docker image called gr00t-dev based on nvcr.io/nvidia/pytorch:25.04-py3 with all GR00T dependencies installed.

Find the server IP — inside the GR00T container, run:

hostname -I

Note this IP (e.g., 192.168.222.2). You will use it in Terminal 2.

Launch the policy server:

docker run -it --rm --gpus all \
  -p 5555:5555 \
  gr00t-dev \
  bash -c "cd /workspace/gr00t && uv run python gr00t/eval/run_gr00t_server.py \
    --model-path nvidia/GR00T-N1.6-DROID \
    --embodiment-tag OXE_DROID \
    --device cuda \
    --host 0.0.0.0 --port 5555"

Wait until you see: Server is ready and listening on tcp://0.0.0.0:5555

Terminal 2: Run Arena Evaluation

This starts the Arena base Docker container (Isaac Sim + Arena, no GR00T model dependencies) and runs the evaluation against the remote policy server.

Start the Arena Docker container:

./docker/run_docker.sh

Run the evaluation inside the Arena container:

/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
  --viz kit \
  --policy_type isaaclab_arena_gr00t.policy.gr00t_remote_closedloop_policy.Gr00tRemoteClosedloopPolicy \
  --policy_config_yaml_path isaaclab_arena_gr00t/policy/config/droid_manip_gr00t_closedloop_config.yaml \
  --remote_host <GROOT_SERVER_IP> --remote_port 5555 \
  --enable_cameras --num_episodes 5 \
  pick_and_place_maple_table --embodiment droid_abs_joint_pos

Replace <GROOT_SERVER_IP> with the IP from Terminal 1 (e.g., 192.168.222.2).

The Arena container has no GR00T model dependencies. It only uses GR00T's lightweight PolicyClient (ZMQ + msgpack) to send observations and receive actions. All model inference happens in the GR00T container.

Delete the entire networking module: PolicyServer, PolicyClient,
MessageSerializer, ServerSidePolicy, ActionProtocol, RemotePolicyConfig,
and the server runner CLI. These duplicate Isaac-GR00T's
gr00t/policy/server_client.py. Each policy framework owns its own
server and client.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
These classes wrapped Arena's now-deleted PolicyClient. Framework
wrappers will use each framework's native client instead.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
This class wrapped GR00T's Gr00tPolicy inside Arena's ServerSidePolicy
abstraction. The GR00T repo already provides its own server via
gr00t/eval/run_gr00t_server.py.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
test_policy_client.py and test_action_chunking_client.py tested the
now-removed PolicyClient and ActionChunkingClientSidePolicy.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
The remote/local distinction is no longer an Arena-level concern. Each
framework wrapper handles its own lifecycle internally.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
The is_remote/shutdown_remote pattern is removed. Framework wrappers
handle their own client lifecycle internally.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Dockerfile.gr00t_server launched Arena's remote_policy_server_runner,
which is now deleted. The GR00T repo provides its own server via
gr00t/eval/run_gr00t_server.py with its own Docker setup.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Thin wrapper that connects to a GR00T policy server via GR00T's own
PolicyClient (gr00t.policy.server_client). Reuses the same obs/action
translation pipeline as the local Gr00tClosedloopPolicy — only the
inference call changes from in-process to remote.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Remove load_gr00t_policy_from_config() and the top-level
Gr00tPolicy import from gr00t_core.py so the module no longer
pulls in transformers/torch model stack at import time.

The local Gr00tClosedloopPolicy now loads the model via its own
_load_policy() method. This keeps gr00t_core.py lightweight for
the remote wrapper which only needs obs/action translation.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Replace the import of gr00t.experiment.launch_finetune (which pulls in
tyro, FinetuneConfig, and the training experiment module) with the
10-line inlined equivalent. The remote wrapper no longer requires any
GR00T training dependencies.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@xyao-nv
Copy link
Copy Markdown
Collaborator

xyao-nv commented Apr 15, 2026

I'm sorry I'd prefer if we do MRs in small chunks. Claude is good at generating +++ lines of codes, and for this refactoring, I think we shall be careful how it was written by Claude.

Copy link
Copy Markdown
Collaborator

@xyao-nv xyao-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's break down into smaller MRs once we agree on the design.

Update ensure_groot_deps_in_path to automatically find and add the
Isaac-GR00T submodule to sys.path if gr00t is not already importable.
This allows the remote wrapper to import PolicyClient without requiring
a full pip install of the GR00T package or manual PYTHONPATH setup.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk
Copy link
Copy Markdown
Collaborator Author

@xyao-nv yes this is just prototyping. Most test will likely fail. Not meant for review.

@cvolkcvolk cvolkcvolk changed the title Remove Arena's networking layer; use framework-native clients Remove Arena's Gr00t Docker; use Isaac-Gr00t clients Apr 15, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR cleanly removes Arena's duplicated networking stack (remote_policy/ module, Docker files, server/client wrappers) in favor of using GR00T's native PolicyClient. The new Gr00tRemoteClosedloopPolicy is a well-designed thin wrapper that reuses the existing obs/action translation pipeline. The introduction of the ActionScheduler ABC is a solid design improvement that enables future scheduling strategies (temporal ensembling, pass-through, etc.). One initialization-order bug needs fixing before merge.

Design Assessment

Design is sound. The "each framework owns its own server and client" philosophy is the right direction. Removing ~1,800 lines of duplicated networking code and replacing it with a thin 237-line wrapper around the upstream client significantly reduces maintenance burden. The ActionScheduler ABC is a clean strategy pattern that decouples scheduling logic from policy logic.

Findings

🔴 Critical: _load_policy() references self.device before it is assignedisaaclab_arena_gr00t/policy/gr00t_closedloop_policy.py:84

In Gr00tClosedloopPolicy.__init__, _load_policy() is called at line 84, but self.device is not assigned until line 88. Inside _load_policy(), self.device is passed to Gr00tPolicy(... device=self.device ...), which will raise AttributeError: 'Gr00tClosedloopPolicy' object has no attribute 'device'.

The old code avoided this because load_gr00t_policy_from_config() read from policy_config.policy_device directly rather than self.device.

Fix: either move self.device = config.policy_device (and the multi-GPU rank logic) above the _load_policy() call, or have _load_policy() use self.policy_config.policy_device instead of self.device:

        # Basic attributes — must be set before _load_policy() which uses self.device
        self.num_envs = config.num_envs
        self.device = config.policy_device
        self.task_mode = TaskMode(self.policy_config.task_mode_name)

        world_size = get_world_size()
        if world_size > 1 and "cuda" in config.policy_device:
            local_rank = get_local_rank()
            self.device = f"cuda:{local_rank}"

        self.policy: Gr00tPolicy = self._load_policy()

🟡 Warning: reset() always resets the full remote server even for partial env resetsisaaclab_arena_gr00t/policy/gr00t_remote_closedloop_policy.py:231

When policy_runner.py calls policy.reset(env_ids=...) for a subset of terminated/truncated environments, Gr00tRemoteClosedloopPolicy.reset() unconditionally calls self._client.reset() which resets the entire server-side model state. In a multi-env setup, this resets the server even when only one env needs resetting, potentially invalidating in-flight predictions for other envs.

This matches the local policy's behavior (self.policy.reset() also resets everything), so it's not a regression — but it's worth noting as a correctness concern for multi-env evaluation. Consider guarding the server reset:

def reset(self, env_ids: torch.Tensor | None = None):
    if env_ids is None:
        env_ids = slice(None)
        self._client.reset()  # Only full-reset the server on full reset
    self._action_scheduler.reset(env_ids)

🟡 Warning: sys.path pollution in inlined load_modality_configisaaclab_arena_gr00t/utils/io_utils.py:221

The inlined modality config loader does sys.path.append(str(path.parent)) without cleanup. If load_gr00t_modality_config_from_file() is called multiple times (e.g., different configs), each call permanently appends to sys.path. Additionally, if the modality config module's stem collides with another module already on sys.path, importlib.import_module(path.stem) could import the wrong module.

Consider using a temporary sys.path modification or importlib.util.spec_from_file_location for more isolated loading:

spec = importlib.util.spec_from_file_location(path.stem, str(path))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

🔵 Suggestion: Gr00tRemoteClosedloopPolicy has no close() or lifecycle management for the clientisaaclab_arena_gr00t/policy/gr00t_remote_closedloop_policy.py

The old ClientSidePolicy had a shutdown_remote() method to explicitly close the ZMQ socket. The new Gr00tRemoteClosedloopPolicy holds a Gr00tPolicyClient reference but provides no way to clean up the connection. While policy_runner.py no longer calls shutdown logic (the remote cleanup block was removed), it would be good practice to implement close() on the policy so callers can deterministically release the network connection.

Test Coverage

  • Deleted tests: test_policy_client.py and test_action_chunking_client.py were correctly removed — they tested the now-deleted PolicyClient and ActionChunkingClientSidePolicy.
  • Missing: No new tests for Gr00tRemoteClosedloopPolicy. This is understandable since it requires a running GR00T policy server, but an integration test (similar to the deleted test_action_chunking_client.py which used a dummy server fixture) would be valuable to verify the obs/action translation pipeline works end-to-end through the remote path.
  • Missing: No unit tests for the new ActionScheduler / ActionChunkScheduler rename. The backward-compat alias ActionChunkingState = ActionChunkScheduler should be exercised by existing tests that import ActionChunkingState.

CI Status

⏳ Pre-commit check is pending.

Verdict

Minor fixes needed

The self.device initialization order bug (🔴) must be fixed — it will cause an AttributeError at construction time for Gr00tClosedloopPolicy. The other findings are improvements worth considering but aren't blocking. Overall this is a well-structured refactoring that significantly reduces Arena's maintenance surface area.

config.policy_config_yaml_path, Gr00tClosedloopPolicyConfig
)
self.policy: Gr00tPolicy = load_gr00t_policy_from_config(self.policy_config)
self.policy: Gr00tPolicy = self._load_policy()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: _load_policy() uses self.device which hasn't been assigned yet.

self.device is set on line 88, but _load_policy() is called here on line 84. Inside _load_policy(), Gr00tPolicy(... device=self.device ...) will raise AttributeError.

The old load_gr00t_policy_from_config() avoided this by reading policy_config.policy_device directly.

Fix: move the self.device assignment (lines 88-93, including the multi-GPU rank logic) above this line, or have _load_policy() use self.policy_config.policy_device.

if path.exists() and path.suffix == ".py":
sys.path.append(str(path.parent))
importlib.import_module(path.stem)
else:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 sys.path.append(str(path.parent)) permanently pollutes sys.path and risks module name collisions if the stem matches an existing package. Consider importlib.util.spec_from_file_location() for isolated loading:

spec = importlib.util.spec_from_file_location(path.stem, str(path))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants