Pim/feat/mujoco render decoupling#2070
Conversation
|
|
||
| web = [ | ||
| "fastapi>=0.115.6", | ||
| "python-socketio>=5.11.0,<6", |
There was a problem hiding this comment.
Needed this for command center
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR introduces the
Confidence Score: 3/5Three defects from prior review rounds remain unresolved and affect runtime correctness of the new sim module. The cmd_vel subscription is never properly disposed on stop() — each restart stacks another active listener, causing duplicate velocity commands to accumulate. get_camera_pose reads cam_xpos/cam_xmat without holding the physics lock, so the point cloud publisher thread can read a torn pose mid-mj_step. The wrapper.xml write is non-atomic, so a Ctrl-C during the first VHACD bake leaves a corrupt cache file that blocks every subsequent launch until manually removed. mujoco_sim_module.py (subscription lifecycle), mujoco_engine.py (get_camera_pose locking), and scene_mesh_to_mjcf.py (atomic cache write) need the most attention before this is used in any multi-restart or production-like environment. Important Files Changed
Reviews (8): Last reviewed commit: "Merge branch 'main' into pim/feat/mujoco..." | Re-trigger Greptile |
| def _cache_key( | ||
| scene_mesh_path: Path, | ||
| robot_mjcf_path: Path, | ||
| alignment: SceneMeshAlignment, | ||
| meshdir: Path, | ||
| ) -> str: | ||
| h = hashlib.sha256() | ||
| h.update(scene_mesh_path.read_bytes()) | ||
| h.update(robot_mjcf_path.read_bytes()) | ||
| h.update(repr(sorted(asdict(alignment).items())).encode()) | ||
| h.update(str(meshdir).encode()) | ||
| return h.hexdigest()[:_CACHE_KEY_LEN] |
There was a problem hiding this comment.
The cache key hashes the full byte content of the scene mesh file via
read_bytes(). For large GLB assets — the PR description explicitly calls out scenes that allocate ~10 GB transiently during texture decode — this reads the entire file into memory a second time just to compute the key, potentially triggering the very OOM the geometry-only load path was designed to prevent. Using stat() fields (size + mtime) produces a cheap, collision-resistant key for content that doesn't change without also changing those metadata fields.
| def _cache_key( | |
| scene_mesh_path: Path, | |
| robot_mjcf_path: Path, | |
| alignment: SceneMeshAlignment, | |
| meshdir: Path, | |
| ) -> str: | |
| h = hashlib.sha256() | |
| h.update(scene_mesh_path.read_bytes()) | |
| h.update(robot_mjcf_path.read_bytes()) | |
| h.update(repr(sorted(asdict(alignment).items())).encode()) | |
| h.update(str(meshdir).encode()) | |
| return h.hexdigest()[:_CACHE_KEY_LEN] | |
| def _cache_key( | |
| scene_mesh_path: Path, | |
| robot_mjcf_path: Path, | |
| alignment: SceneMeshAlignment, | |
| meshdir: Path, | |
| ) -> str: | |
| h = hashlib.sha256() | |
| # Avoid read_bytes() on potentially multi-GB scene files; stat fields are | |
| # a cheap, collision-resistant proxy that changes whenever content does. | |
| def _file_sig(p: Path) -> str: | |
| st = p.stat() | |
| return f"{p}:{st.st_size}:{st.st_mtime_ns}" | |
| h.update(_file_sig(scene_mesh_path).encode()) | |
| h.update(_file_sig(robot_mjcf_path).encode()) | |
| h.update(repr(sorted(asdict(alignment).items())).encode()) | |
| h.update(str(meshdir).encode()) | |
| return h.hexdigest()[:_CACHE_KEY_LEN] |
| def _command_center_blueprints() -> list[Blueprint]: | ||
| if os.environ.get("DIMOS_ENABLE_COMMAND_CENTER", "1") in {"", "0"}: | ||
| return [] |
There was a problem hiding this comment.
The disable check tests only for
"" or "0", so DIMOS_ENABLE_COMMAND_CENTER=false, =False, =no, or =off all leave the command center enabled. The _env_bool helper defined in the same file already handles these cases correctly and should be used here for consistency.
| def _command_center_blueprints() -> list[Blueprint]: | |
| if os.environ.get("DIMOS_ENABLE_COMMAND_CENTER", "1") in {"", "0"}: | |
| return [] | |
| def _command_center_blueprints() -> list[Blueprint]: | |
| if not _env_bool("DIMOS_ENABLE_COMMAND_CENTER", True): | |
| return [] |
| __all__ = [ | ||
| "SceneMeshAlignment", | ||
| "load_scene_mesh", | ||
| "make_raycasting_scene", | ||
| ] |
There was a problem hiding this comment.
ScenePrimMesh and load_scene_prims are consumed directly by scene_mesh_to_mjcf.py and are clearly part of this module's public interface, but they are absent from __all__. Any from dimos.mapping.mesh_scene import * usage or auto-documentation tooling will silently miss them.
| __all__ = [ | |
| "SceneMeshAlignment", | |
| "load_scene_mesh", | |
| "make_raycasting_scene", | |
| ] | |
| __all__ = [ | |
| "SceneMeshAlignment", | |
| "ScenePrimMesh", | |
| "load_scene_mesh", | |
| "load_scene_prims", | |
| "make_raycasting_scene", | |
| ] |
|
FYI If you want to see the collision mesh in MuJoCo press |
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
| self._gripper_ctrl_range = ctrl_range | ||
| self._gripper_joint_range = joint_range | ||
| logger.info( |
There was a problem hiding this comment.
Subscription not unsubscribed on
stop()
Disposable(self.cmd_vel.subscribe(self._on_cmd_vel)) stores the Disposable returned by subscribe as the action of a new outer Disposable. When the outer Disposable.dispose() is called on module stop, it tries to invoke subscription() — but a reactivex.Disposable instance is not callable, so this raises TypeError and is silently swallowed by the framework's cleanup loop. The net effect is that _on_cmd_vel keeps firing after stop(), and every subsequent start() adds a second listener, causing duplicate velocity commands to accumulate. The subscription returned by subscribe() should be registered directly, matching every other subscription registration in this file.
Add an opt-in visual mesh passthrough for scene meshes and make VHACD more willing to decompose low-prim, high-component scenes. Note: scenes with no well-defined object hierarchy can still produce spotty convex decomposition because the source mesh may be material-batched instead of object-batched.
| "yes", | ||
| "on", | ||
| } | ||
| _disable_lidar = os.environ.get("DIMOS_DISABLE_LIDAR", "0") not in {"", "0"} |
There was a problem hiding this comment.
Inverted semantics for
"false" / "off" / "no" on several boolean flags
Three env-var checks use not in {"", "0"} instead of _env_bool. Because the set only whitelists the empty string and "0" as falsy, any other typical false-like value — "false", "off", "no" — passes the not in test and is treated as truthy. For _disable_lidar in particular the inversion is especially confusing: setting DIMOS_DISABLE_LIDAR=false would actually disable lidar (the variable evaluates to True). The same pattern recurs for _scene_mesh_collision (line 104) and DIMOS_KINEMATIC_BASE_CONTROL (line 159).
| _disable_lidar = os.environ.get("DIMOS_DISABLE_LIDAR", "0") not in {"", "0"} | |
| _disable_lidar = _env_bool("DIMOS_DISABLE_LIDAR", False) |
| def _cache_hit(wrapper_path: Path, cache_dir: Path) -> bool: | ||
| return wrapper_path.exists() and any(cache_dir.glob("*.obj")) |
There was a problem hiding this comment.
_cache_hit gives a false negative when no OBJ files are produced
If every prim in the scene is degenerate enough to fall through to an inline box-geom fallback (no _write_hull_obj call is reached), the bake completes and writes a valid wrapper.xml, but no .obj files are present in the cache directory. On every subsequent startup any(cache_dir.glob("*.obj")) returns False, so _cache_hit returns False and the full bake — including potentially expensive VHACD decomposition — is repeated. Checking for the wrapper alone is sufficient, since the wrapper is written only after baking succeeds.
| def _cache_hit(wrapper_path: Path, cache_dir: Path) -> bool: | |
| return wrapper_path.exists() and any(cache_dir.glob("*.obj")) | |
| def _cache_hit(wrapper_path: Path, cache_dir: Path) -> bool: | |
| return wrapper_path.exists() |
| asset_meshes="\n".join(asset_lines), | ||
| scene_geoms="\n".join(geom_lines), | ||
| ) | ||
| wrapper_path.write_text(wrapper_xml) |
There was a problem hiding this comment.
Non-atomic cache write poisons the cache on interruption
wrapper_path.write_text(wrapper_xml) opens, writes, and closes the file in a single call, but it is not atomic. If the process is killed or the user presses Ctrl-C mid-write (plausible during a 30-minute VHACD bake), a partially-written wrapper.xml is left on disk. Because _cache_hit only checks wrapper_path.exists(), every subsequent run finds the file, returns the corrupt path as a "cache hit", and MujocoEngine then fails to load it with an opaque MuJoCo XML parse error. The try/except in g1_groot_wbc.py only wraps bake_scene_mjcf, not the engine load, so the error propagates as a hard failure. Recovery requires the user to manually delete ~/.cache/dimos/scene_meshes/<key>/.
Write to a sibling temp file and atomically rename it into place so that wrapper.xml either doesn't exist or contains complete content.
| cam_id = mujoco.mj_name2id(self._model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name) | ||
| if cam_id < 0: | ||
| return None | ||
| return self._data.cam_xpos[cam_id].copy(), self._data.cam_xmat[cam_id].copy() |
There was a problem hiding this comment.
get_camera_pose reads cam_xpos and cam_xmat directly from self._data without holding self._lock. Every other new method in this PR that touches self._data — apply_root_twist, get_root_pose, enforce_position_targets — acquires the lock first. A caller on the RxPY scheduler thread or the pointcloud publisher thread races with the physics sim thread, which updates these arrays during mj_step and mj_forward, and could read a torn pose mid-update.
| cam_id = mujoco.mj_name2id(self._model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name) | |
| if cam_id < 0: | |
| return None | |
| return self._data.cam_xpos[cam_id].copy(), self._data.cam_xmat[cam_id].copy() | |
| cam_id = mujoco.mj_name2id(self._model, mujoco.mjtObj.mjOBJ_CAMERA, camera_name) | |
| if cam_id < 0: | |
| return None | |
| with self._lock: | |
| return self._data.cam_xpos[cam_id].copy(), self._data.cam_xmat[cam_id].copy() |
Problem
We would like a runnable sim for navigation testing before the larger policy/control integration lands. After discussion with Mustafa, the higher-priority path is to land the sim-facing pieces first: load the G1/WBC MJCF into a scene, expose lidar/depth point clouds, support WASD control, and make custom scene testing straightforward.
This will reduce the scope of the current open GROOT/WBC PR while still giving navigation a collision mesh MuJoCo environment to test against.
Solution
Added the first
g1-groot-wbctarget, currently backed by MuJoCo, plus the mesh tooling needed to run it in scene environments.Key changes:
MujocoSimModulewith native viewer/headless support, kinematic base control, and lidar/depth point cloud output.g1-groot-wbc, wired to mapping, costmaps, replanning, MovementManager, and optional Command Center control.trimesh,usd-core.webextra.This PR intentionally does not add the full GROOT policy/WBC integration yet. It creates the sim/nav test surface that work can plug into next.
How to Test
Run the G1 GROOT/WBC target:
Open the native MuJoCo viewer:
Load a custom scene mesh:
Optional scene alignment knobs:
Contributor License Agreement