Skip to content

Pim/feat/mujoco render decoupling#2070

Open
Nabla7 wants to merge 26 commits into
mainfrom
pim/feat/mujoco-render-decoupling
Open

Pim/feat/mujoco render decoupling#2070
Nabla7 wants to merge 26 commits into
mainfrom
pim/feat/mujoco-render-decoupling

Conversation

@Nabla7
Copy link
Copy Markdown
Collaborator

@Nabla7 Nabla7 commented May 13, 2026

Problem

We would like a runnable sim for navigation testing before the larger policy/control integration lands. After discussion with Mustafa, the higher-priority path is to land the sim-facing pieces first: load the G1/WBC MJCF into a scene, expose lidar/depth point clouds, support WASD control, and make custom scene testing straightforward.

This will reduce the scope of the current open GROOT/WBC PR while still giving navigation a collision mesh MuJoCo environment to test against.

Solution

Added the first g1-groot-wbc target, currently backed by MuJoCo, plus the mesh tooling needed to run it in scene environments.

Key changes:

  • Added mesh loading for USD/USDZ and GLB/GLTF scenes.
  • Added MJCF wrapper baking so scene meshes can be converted into MuJoCo collision geometry.
  • Extended MujocoSimModule with native viewer/headless support, kinematic base control, and lidar/depth point cloud output.
  • Added g1-groot-wbc, wired to mapping, costmaps, replanning, MovementManager, and optional Command Center control.
  • Added sim dependencies for scene mesh loading/baking: trimesh, usd-core.
  • Declared the existing Command Center Socket.IO runtime dependency in the web extra.

This PR intentionally does not add the full GROOT policy/WBC integration yet. It creates the sim/nav test surface that work can plug into next.

How to Test

Run the G1 GROOT/WBC target:

dimos run g1-groot-wbc

Open the native MuJoCo viewer:

DIMOS_MUJOCO_HEADLESS=0 dimos run g1-groot-wbc

Load a custom scene mesh:

DIMOS_MUJOCO_HEADLESS=0 \
DIMOS_SCENE_MESH_PATH=/path/to/scene.usdz \
DIMOS_SCENE_MESH_SCALE=0.05 \
dimos run g1-groot-wbc

Optional scene alignment knobs:

DIMOS_SCENE_MESH_Y_UP=1
DIMOS_SCENE_MESH_ROTATION_ZYX_DEG=0,0,0
DIMOS_SCENE_MESH_TRANSLATION=0,0,0

Contributor License Agreement

  • I have read and approved the CLA.

Comment thread pyproject.toml

web = [
"fastapi>=0.115.6",
"python-socketio>=5.11.0,<6",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed this for command center

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 33.44371% with 201 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
dimos/simulation/engines/mujoco_sim_module.py 26.17% 110 Missing ⚠️
dimos/simulation/engines/mujoco_engine.py 12.82% 68 Missing ⚠️
.../robot/unitree/g1/blueprints/basic/g1_groot_wbc.py 69.44% 17 Missing and 5 partials ⚠️
dimos/simulation/engines/mujoco_shm.py 66.66% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR introduces the g1-groot-wbc navigation test target backed by MuJoCo, along with the mesh tooling to load USD/GLB scenes as MuJoCo collision geometry. Several issues flagged in earlier review rounds (inverted boolean env-var checks, _cache_hit checking for .obj files, missing __all__ exports, full-file hash for cache keys) have been addressed.

  • Mesh bake pipeline (scene_mesh_to_mjcf.py, mesh_scene.py): per-prim VHACD convex decomposition, geometry-only GLB load to avoid OOM on large PBR scenes, stat-based cache keys, and correct __all__ exports.
  • MujocoSimModule (mujoco_sim_module.py, mujoco_engine.py): unified module owning the engine, SHM writer, lidar/depth point cloud publisher, kinematic base control via apply_root_twist, and joint-hold via enforce_position_targets.
  • Three issues from prior rounds remain open: get_camera_pose reads _data.cam_xpos/cam_xmat without _lock; wrapper.xml is written non-atomically (crash mid-write poisons the cache); and the cmd_vel subscription is wrapped in Disposable(subscribe(...)) instead of being registered directly, preventing cleanup on stop().

Confidence Score: 3/5

Three defects from prior review rounds remain unresolved and affect runtime correctness of the new sim module.

The cmd_vel subscription is never properly disposed on stop() — each restart stacks another active listener, causing duplicate velocity commands to accumulate. get_camera_pose reads cam_xpos/cam_xmat without holding the physics lock, so the point cloud publisher thread can read a torn pose mid-mj_step. The wrapper.xml write is non-atomic, so a Ctrl-C during the first VHACD bake leaves a corrupt cache file that blocks every subsequent launch until manually removed.

mujoco_sim_module.py (subscription lifecycle), mujoco_engine.py (get_camera_pose locking), and scene_mesh_to_mjcf.py (atomic cache write) need the most attention before this is used in any multi-restart or production-like environment.

Important Files Changed

Filename Overview
dimos/robot/all_blueprints.py Registers the new g1-groot-wbc blueprint entry; single-line addition, no issues.
dimos/robot/unitree/g1/blueprints/basic/g1_groot_wbc.py New blueprint wiring G1 to MuJoCo sim with nav stack. All boolean env-var reads now correctly use _env_bool.
dimos/simulation/engines/mujoco_engine.py Adds apply_root_twist, get_root_pose, enforce_position_targets, get_camera_pose. get_camera_pose reads _data.cam_xpos/cam_xmat without holding _lock — a confirmed data race from a prior review thread, still unresolved.
dimos/simulation/engines/mujoco_sim_module.py New unified sim module. The cmd_vel subscription is wrapped in Disposable(subscribe(...)) instead of being registered directly, so stop() cannot dispose it — prior thread, still unresolved.
dimos/simulation/engines/mujoco_shm.py New shared-memory IPC layer. create() has a resource-leak path if buffer allocation fails mid-loop; otherwise logic is sound.
dimos/simulation/mujoco/scene_mesh_to_mjcf.py VHACD-bake pipeline. Cache key now uses stat() metadata; _cache_hit correctly checks wrapper_path.exists(). wrapper_path.write_text() is non-atomic — prior thread, still unresolved.
dimos/simulation/mujoco/mesh_scene.py Scene mesh loading with USD/USDZ and GLB/GLTF support. all now correctly exports ScenePrimMesh and load_scene_prims.
pyproject.toml Adds trimesh and usd-core sim dependencies, declares socketio/fastapi/uvicorn in the web extra. No issues.

Reviews (8): Last reviewed commit: "Merge branch 'main' into pim/feat/mujoco..." | Re-trigger Greptile

Comment on lines +151 to +162
def _cache_key(
scene_mesh_path: Path,
robot_mjcf_path: Path,
alignment: SceneMeshAlignment,
meshdir: Path,
) -> str:
h = hashlib.sha256()
h.update(scene_mesh_path.read_bytes())
h.update(robot_mjcf_path.read_bytes())
h.update(repr(sorted(asdict(alignment).items())).encode())
h.update(str(meshdir).encode())
return h.hexdigest()[:_CACHE_KEY_LEN]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The cache key hashes the full byte content of the scene mesh file via read_bytes(). For large GLB assets — the PR description explicitly calls out scenes that allocate ~10 GB transiently during texture decode — this reads the entire file into memory a second time just to compute the key, potentially triggering the very OOM the geometry-only load path was designed to prevent. Using stat() fields (size + mtime) produces a cheap, collision-resistant key for content that doesn't change without also changing those metadata fields.

Suggested change
def _cache_key(
scene_mesh_path: Path,
robot_mjcf_path: Path,
alignment: SceneMeshAlignment,
meshdir: Path,
) -> str:
h = hashlib.sha256()
h.update(scene_mesh_path.read_bytes())
h.update(robot_mjcf_path.read_bytes())
h.update(repr(sorted(asdict(alignment).items())).encode())
h.update(str(meshdir).encode())
return h.hexdigest()[:_CACHE_KEY_LEN]
def _cache_key(
scene_mesh_path: Path,
robot_mjcf_path: Path,
alignment: SceneMeshAlignment,
meshdir: Path,
) -> str:
h = hashlib.sha256()
# Avoid read_bytes() on potentially multi-GB scene files; stat fields are
# a cheap, collision-resistant proxy that changes whenever content does.
def _file_sig(p: Path) -> str:
st = p.stat()
return f"{p}:{st.st_size}:{st.st_mtime_ns}"
h.update(_file_sig(scene_mesh_path).encode())
h.update(_file_sig(robot_mjcf_path).encode())
h.update(repr(sorted(asdict(alignment).items())).encode())
h.update(str(meshdir).encode())
return h.hexdigest()[:_CACHE_KEY_LEN]

Comment on lines +63 to +65
def _command_center_blueprints() -> list[Blueprint]:
if os.environ.get("DIMOS_ENABLE_COMMAND_CENTER", "1") in {"", "0"}:
return []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The disable check tests only for "" or "0", so DIMOS_ENABLE_COMMAND_CENTER=false, =False, =no, or =off all leave the command center enabled. The _env_bool helper defined in the same file already handles these cases correctly and should be used here for consistency.

Suggested change
def _command_center_blueprints() -> list[Blueprint]:
if os.environ.get("DIMOS_ENABLE_COMMAND_CENTER", "1") in {"", "0"}:
return []
def _command_center_blueprints() -> list[Blueprint]:
if not _env_bool("DIMOS_ENABLE_COMMAND_CENTER", True):
return []

Comment on lines +557 to +561
__all__ = [
"SceneMeshAlignment",
"load_scene_mesh",
"make_raycasting_scene",
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 ScenePrimMesh and load_scene_prims are consumed directly by scene_mesh_to_mjcf.py and are clearly part of this module's public interface, but they are absent from __all__. Any from dimos.mapping.mesh_scene import * usage or auto-documentation tooling will silently miss them.

Suggested change
__all__ = [
"SceneMeshAlignment",
"load_scene_mesh",
"make_raycasting_scene",
]
__all__ = [
"SceneMeshAlignment",
"ScenePrimMesh",
"load_scene_mesh",
"load_scene_prims",
"make_raycasting_scene",
]

@Nabla7
Copy link
Copy Markdown
Collaborator Author

Nabla7 commented May 13, 2026

FYI If you want to see the collision mesh in MuJoCo press 3

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

Comment on lines +282 to +284
self._gripper_ctrl_range = ctrl_range
self._gripper_joint_range = joint_range
logger.info(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Subscription not unsubscribed on stop()

Disposable(self.cmd_vel.subscribe(self._on_cmd_vel)) stores the Disposable returned by subscribe as the action of a new outer Disposable. When the outer Disposable.dispose() is called on module stop, it tries to invoke subscription() — but a reactivex.Disposable instance is not callable, so this raises TypeError and is silently swallowed by the framework's cleanup loop. The net effect is that _on_cmd_vel keeps firing after stop(), and every subsequent start() adds a second listener, causing duplicate velocity commands to accumulate. The subscription returned by subscribe() should be registered directly, matching every other subscription registration in this file.

Nabla7 and others added 4 commits May 13, 2026 03:24
Add an opt-in visual mesh passthrough for scene meshes and make VHACD more willing to decompose low-prim, high-component scenes.

Note: scenes with no well-defined object hierarchy can still produce spotty convex decomposition because the source mesh may be material-batched instead of object-batched.
"yes",
"on",
}
_disable_lidar = os.environ.get("DIMOS_DISABLE_LIDAR", "0") not in {"", "0"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Inverted semantics for "false" / "off" / "no" on several boolean flags

Three env-var checks use not in {"", "0"} instead of _env_bool. Because the set only whitelists the empty string and "0" as falsy, any other typical false-like value — "false", "off", "no" — passes the not in test and is treated as truthy. For _disable_lidar in particular the inversion is especially confusing: setting DIMOS_DISABLE_LIDAR=false would actually disable lidar (the variable evaluates to True). The same pattern recurs for _scene_mesh_collision (line 104) and DIMOS_KINEMATIC_BASE_CONTROL (line 159).

Suggested change
_disable_lidar = os.environ.get("DIMOS_DISABLE_LIDAR", "0") not in {"", "0"}
_disable_lidar = _env_bool("DIMOS_DISABLE_LIDAR", False)

Comment on lines +204 to +205
def _cache_hit(wrapper_path: Path, cache_dir: Path) -> bool:
return wrapper_path.exists() and any(cache_dir.glob("*.obj"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 _cache_hit gives a false negative when no OBJ files are produced

If every prim in the scene is degenerate enough to fall through to an inline box-geom fallback (no _write_hull_obj call is reached), the bake completes and writes a valid wrapper.xml, but no .obj files are present in the cache directory. On every subsequent startup any(cache_dir.glob("*.obj")) returns False, so _cache_hit returns False and the full bake — including potentially expensive VHACD decomposition — is repeated. Checking for the wrapper alone is sufficient, since the wrapper is written only after baking succeeds.

Suggested change
def _cache_hit(wrapper_path: Path, cache_dir: Path) -> bool:
return wrapper_path.exists() and any(cache_dir.glob("*.obj"))
def _cache_hit(wrapper_path: Path, cache_dir: Path) -> bool:
return wrapper_path.exists()

asset_meshes="\n".join(asset_lines),
scene_geoms="\n".join(geom_lines),
)
wrapper_path.write_text(wrapper_xml)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Non-atomic cache write poisons the cache on interruption

wrapper_path.write_text(wrapper_xml) opens, writes, and closes the file in a single call, but it is not atomic. If the process is killed or the user presses Ctrl-C mid-write (plausible during a 30-minute VHACD bake), a partially-written wrapper.xml is left on disk. Because _cache_hit only checks wrapper_path.exists(), every subsequent run finds the file, returns the corrupt path as a "cache hit", and MujocoEngine then fails to load it with an opaque MuJoCo XML parse error. The try/except in g1_groot_wbc.py only wraps bake_scene_mjcf, not the engine load, so the error propagates as a hard failure. Recovery requires the user to manually delete ~/.cache/dimos/scene_meshes/<key>/.

Write to a sibling temp file and atomically rename it into place so that wrapper.xml either doesn't exist or contains complete content.

Comment on lines +577 to +580
cam_id = mujoco.mj_name2id(self._model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name)
if cam_id < 0:
return None
return self._data.cam_xpos[cam_id].copy(), self._data.cam_xmat[cam_id].copy()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 get_camera_pose reads cam_xpos and cam_xmat directly from self._data without holding self._lock. Every other new method in this PR that touches self._dataapply_root_twist, get_root_pose, enforce_position_targets — acquires the lock first. A caller on the RxPY scheduler thread or the pointcloud publisher thread races with the physics sim thread, which updates these arrays during mj_step and mj_forward, and could read a torn pose mid-update.

Suggested change
cam_id = mujoco.mj_name2id(self._model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name)
if cam_id < 0:
return None
return self._data.cam_xpos[cam_id].copy(), self._data.cam_xmat[cam_id].copy()
cam_id = mujoco.mj_name2id(self._model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name)
if cam_id < 0:
return None
with self._lock:
return self._data.cam_xpos[cam_id].copy(), self._data.cam_xmat[cam_id].copy()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant