Skip to content

RosNav DockerModule#1568

Draft
jeff-hykin wants to merge 244 commits intodevfrom
jeff/fix/rosnav3
Draft

RosNav DockerModule#1568
jeff-hykin wants to merge 244 commits intodevfrom
jeff/fix/rosnav3

Conversation

@jeff-hykin
Copy link
Member

@jeff-hykin jeff-hykin commented Mar 15, 2026

DRAFT: Docker PR needs to be reviewed first

File change count is so high because g1 blueprints were redone

Problem

G1 parity with Go2

Solution

Breaking Changes

How to Test

Contributor License Agreement

  • I have read and approved the CLA.

spomichter and others added 30 commits January 23, 2026 07:34
… Unitree Go2 Navigation & Exploration Beta

Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates, Rerun fixes, Person follow, Readme updates

## What's Changed
* Small docs clarification about stream getters by @leshy in #1043
* Fix split view on wide monitors by @jeff-hykin in #1048
* Docs: Install & Develop  by @jeff-hykin in #1022
* Add uv to nix and fix resulting problems by @jeff-hykin in #1021
* v0.0.8 by @paul-nechifor in #1050
* Style changes in docs by @paul-nechifor in #1051
* Revert "Add uv to nix and fix resulting problems" by @leshy in #1053
* Transport benchmarks + Raw ros transport by @leshy in #1038
* feat: default to rerun-web and auto-open browser on startup (browser … by @Nabla7 in #1019
* bbox detections visual check by @leshy in #1017
* fix: only auto-open browser for rerun-web viewer backend by @Nabla7 in #1066
* move slow tests to integration by @paul-nechifor in #1063
* Streamline transport start/stop methods by @Kaweees in #1062
* Person follow skill with EdgeTAM by @paul-nechifor in #1042
* fix: increase costmap floor z_offset to avoid z-fighting by @Nabla7 in #1073
* Fixed issue #1074 by @alexlin2 in #1075
* ROS transports initial by @leshy in #1057
* Fix System Config Values for LCM on MacOS and Refactor by @jeff-hykin in #1065
* SHM Transport basic fixes by @leshy in #1041
* commented out Mem Transport test case by @leshy in #1077
* Docs/advanced streams update 2 by @leshy in #1078
* Fix more tests by @paul-nechifor in #1071
* feat: navigation docker updates from bona_local_dev by @baishibona in #1081
* Fix missing dependencies by @Kaweees in #1085
* Release readme fixes by @spomichter in #1076

## New Contributors
* @baishibona made their first contribution in #1081

**Full Changelog**: v0.0.7...v0.0.8
… Unitree Go2 Navigation & Exploration Beta

Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates, Rerun fixes, Person follow, Readme updates

## What's Changed
* Small docs clarification about stream getters by @leshy in #1043
* Fix split view on wide monitors by @jeff-hykin in #1048
* Docs: Install & Develop  by @jeff-hykin in #1022
* Add uv to nix and fix resulting problems by @jeff-hykin in #1021
* v0.0.8 by @paul-nechifor in #1050
* Style changes in docs by @paul-nechifor in #1051
* Revert "Add uv to nix and fix resulting problems" by @leshy in #1053
* Transport benchmarks + Raw ros transport by @leshy in #1038
* feat: default to rerun-web and auto-open browser on startup (browser … by @Nabla7 in #1019
* bbox detections visual check by @leshy in #1017
* fix: only auto-open browser for rerun-web viewer backend by @Nabla7 in #1066
* move slow tests to integration by @paul-nechifor in #1063
* Streamline transport start/stop methods by @Kaweees in #1062
* Person follow skill with EdgeTAM by @paul-nechifor in #1042
* fix: increase costmap floor z_offset to avoid z-fighting by @Nabla7 in #1073
* Fixed issue #1074 by @alexlin2 in #1075
* ROS transports initial by @leshy in #1057
* Fix System Config Values for LCM on MacOS and Refactor by @jeff-hykin in #1065
* SHM Transport basic fixes by @leshy in #1041
* commented out Mem Transport test case by @leshy in #1077
* Docs/advanced streams update 2 by @leshy in #1078
* Fix more tests by @paul-nechifor in #1071
* feat: navigation docker updates from bona_local_dev by @baishibona in #1081
* Fix missing dependencies by @Kaweees in #1085
* Release readme fixes by @spomichter in #1076

## New Contributors
* @baishibona made their first contribution in #1081

**Full Changelog**: v0.0.7...v0.0.8
…HTTPS from SSH, get_data change, LFS changes

v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes
…HTTPS from SSH, get_data change, LFS changes

v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes
Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization via Rerun


## Highlights

88+ commits, 20 contributors, 700+ files changed.

The TLDR: **a complete manipulation stack**, **MuJoCo simulation**, **DDS transport**, and **a rewritten visualization pipeline**. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.

---

## 🚀 New Features

### Simulation
- **MuJoCo simulation module** — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, monotonic clock timing (no `time.sleep`). `dimos --simulation run unitree-go2` ([#1035](#1035)) by @jca0
- **Simulation teleop blueprints** — Added simulation teleop blueprints for Piper, xArm6, and xArm7. ([#1308](#1308)) by @mustafab0

### Manipulation
- **Modular manipulation stack** — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. ([#1079](#1079)) by @mustafab0
- **Joint servo and cartesian controllers** — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. ([#1116](#1116)) by @mustafab0
- **GraspGen integration** — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC `generate_grasps()` returns ranked PoseArray. ([#1119](#1119), [#1234](#1234)) by @JalajShuklaSS
- **Gripper control** — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. ([#1213](#1213)) by @mustafab0
- **Detection3D and Object support** — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. ([#1236](#1236)) by @mustafab0
- **Agentic pick and place** — Reimplemented manipulation skills for agent-driven pick-and-place workflows. ([#1237](#1237)) by @mustafab0

### Teleoperation
- **Quest VR teleoperation** — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. ([#1215](#1215)) by @ruthwikdasyam
- **Phone teleoperation** — Control Go2 from your phone with a web-based teleop interface. ([#1280](#1280)) by @ruthwikdasyam
- **Arm teleop with Pinocchio IK** — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. ([#1246](#1246)) by @ruthwikdasyam

### Transports & Infrastructure
- **DDS transport protocol** — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. ([#1174](#1174)) by @Kaweees
- **Pubsub pattern subscriptions** — Glob and regex pattern matching for topic subscriptions. `subscribe_all()` for bridge-style consumers. Topic type encoding in channel strings (`/topic#module.ClassName`). ([#1114](#1114)) by @leshy
- **LCM raw bytes passthrough** — Skip `lcm_encode()` when message is already bytes. ([#1223](#1223)) by @leshy
- **Unified TimeSeriesStore** — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. ([#1080](#1080)) by @leshy
- **DimosROS benchmark tests** — Benchmark suite for ROS transport performance. ([#1087](#1087)) by @leshy

### Navigation
- **FASTLIO2 support** — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. ([#1149](#1149)) by @baishibona
- **Native Livox + FASTLIO2 module** — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. ([#1235](#1235)) by @leshy

### Visualization
- **RerunBridge module and CLI** — New bridge that subscribes to all LCM messages and logs those with `to_rerun()` to Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old rerun initialization system. ([#1154](#1154)) by @leshy
- **Webcam rerun visualization** — Camera module logs to Rerun with pinhole projection for 3D visualization. ([#1117](#1117)) by @ruthwikdasyam
- **Default viewer switched to rerun-web** — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. ([#1324](#1324)) by @spomichter

### Agents
- **Agent refactor** — Restructured agent module with cleaner imports and global config integration. ([#1211](#1211)) by @paul-nechifor
- **Timestamp knowledge** — Agents now have timestamp awareness in prompts for temporal reasoning. ([#1093](#1093)) by @ClaireBookworm
- **Observe skill** — Go2 can now observe (capture and describe) its environment via agent skill. ([#1109](#1109)) by @paul-nechifor

### Platform & Hardware
- **G1 without ROS** — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. ([#1221](#1221)) by @jeff-hykin
- **ARM (aarch64) support** — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. ([#1229](#1229)) by @jeff-hykin
- **Universal joint/hardware schema** — `HardwareComponent` dataclass with `JointState`, `JointName` type aliases. Backend registry with auto-discovery for SDK adapters. ([#1040](#1040), [#1067](#1067)) by @mustafab0

---

## 🔧 Improvements

- **Optional Dask** — Start without Dask using `--no-dask` flag. Startup time reduced from ~60s to ~45s. ([#1111](#1111), [#1232](#1232)) by @paul-nechifor
- **RPC rework** — Renamed `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream`. Added `ModuleRef`, improved type hints throughout. ([#1143](#1143)) by @jeff-hykin
- **Image class simplification** — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. ([#1161](#1161)) by @leshy
- **Odometry message cleanup** — Simplified Odometry message type. ([#1256](#1256)) by @leshy
- **Remove all ROS message dependencies** — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. ([#1230](#1230)) by @alexlin2
- **Removed bad function serialization** — Eliminated unnecessary serialization of Python functions. ([#1121](#1121)) by @paul-nechifor
- **Benchmark IEC units** — Switched bandwidth benchmarks from SI to IEC units for accuracy. ([#1147](#1147)) by @leshy
- **Pubsub typing improvements** — Thread-safety locks on `subscribe_new_topics` and `subscribe_all`. Proper type params across pubsub stack. ([#1153](#1153)) by @leshy
- **Autogenerated blueprint list** — Blueprints are now auto-discovered and listed. ([#1100](#1100)) by @paul-nechifor
- **Generic Buttons message** — Renamed `QuestButtons` to `Buttons` with generic field names for cross-platform teleop. ([#1261](#1261)) by @ruthwikdasyam
- **Dev container uses ros-dev image** — `./bin/dev` now runs the ROS-enabled dev image. ([#1170](#1170)) by @leshy
- **LSP support** — Added python-lsp-server and python-lsp-ruff to dev dependencies. ([#1169](#1169)) by @leshy
- **Lazy-load pyrealsense2** — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. ([#1309](#1309)) by @spomichter
- **Removed unused mmcv and mmengine** — Dead Detic dependencies removed, eliminating slow source builds from install. ([#1319](#1319)) by @spomichter
- **Simplified installation** — Removed direnv requirement, streamlined install instructions across all platforms. ([#1315](#1315)) by @spomichter
- **DDS extra excluded from --all-extras** — `cyclonedds` requires a source build, so `dds` is now excluded from `uv sync --all-extras` by default. ([#1318](#1318)) by @spomichter
- **Nix pre-commit skip** — Skip pre-commit install if hooks already exist. ([#1162](#1162)) by @leshy
- **Removed base-requirements** — Consolidated dependency management. ([#1098](#1098)) by @paul-nechifor
- **Removed old graspnet** — Cleaned up deprecated graspnet version. ([#1248](#1248)) by @paul-nechifor
- **Code cleanup** — Removed `tofix` markers ([#1216](#1216)), fixed ruff issues ([#1112](#1112)), removed old README_installation.md ([#1101](#1101)) by @paul-nechifor

---

## 🐛 Bug Fixes

- Fix LFS updating (move from .local to venv) ([#1090](#1090)) by @jeff-hykin
- Launch hotfixes: git clone HTTPS, get_data main branch ([#1091](#1091)) by @spomichter
- Fix camera demo not showing in Rerun ([#1148](#1148)) by @jeff-hykin
- Default to rerun native viewer ([#1099](#1099)) by @Nabla7
- Fix exploration blocking agent loop ([#1258](#1258)) by @paul-nechifor
- Fix person-follow blocking agent loop ([#1278](#1278)) by @paul-nechifor
- Skip metric3d tests on unsupported xformers GPUs (Blackwell compute capability >9.0) ([#1225](#1225)) by @leshy
- Fix manipulation tests ([#1218](#1218), [#1247](#1247)) by @jeff-hykin, @paul-nechifor
- Fix control coordinator e2e test ([#1212](#1212)) by @mustafab0
- Fix xarm7-sim broken e2e tests ([#1294](#1294)) by @paul-nechifor
- Pin langchain to restore supported providers ([#1241](#1241)) by @spomichter
- Fix missing library dependencies in Nix flake ([#1240](#1240)) by @Kaweees
- Fix discord invite link ([#1122](#1122)) by @spomichter
- macOS edgecase fix ([#1096](#1096)) by @jeff-hykin
- Fix second N in logo ([#1250](#1250)) by @jeff-hykin
- Fix Unitree Go2 minor issues ([#1307](#1307)) by @paul-nechifor
- Fix broken tests ([#1305](#1305)) by @ruthwikdasyam
- Fix `uv sync` for some macOS systems ([#1322](#1322)) by @jeff-hykin
- Fix mmcv install ([#1313](#1313)) by @paul-nechifor
- Fix mypy issues ([#1150](#1150), [#1167](#1167), [#1257](#1257)) by @leshy, @paul-nechifor, @jeff-hykin
- Fix Nix install uv pip extras ([#1321](#1321)) by @spomichter

---

## 📚 Documentation

- **Major docs overhaul** — New README with feature grid, hardware table, quickstart. Navigation, transports, data streams, and agent docs. ([#1295](#1295)) by @leshy
- **Day 1 docs** — Comprehensive getting started guide, development docs, contributing guide, architecture overview. Executable blueprint docs via md-babel-py. ([#1064](#1064)) by @jeff-hykin
- **Arm integration guide** — How-to for integrating new robotic arms with DimOS. ([#1238](#1238)) by @mustafab0
- **MCP documentation update** — Updated MCP install and usage instructions. ([#1251](#1251)) by @Kaweees
- **Docker docs** — First pass on Docker deployment documentation. ([#1151](#1151)) by @leshy
- **Transports documentation** — Encode/decode mixins, SHM examples, ROS/DDS transport docs. ([#1107](#1107)) by @leshy
- **Rerun API examples** — Updated examples for the new RerunBridge API. ([#1262](#1262)) by @jeff-hykin
- **PR template** added ([#1172](#1172)) by @christiefhyang
- **Simplified install instructions** — Removed direnv, streamlined across all platforms. ([#1315](#1315)) by @spomichter
- **Python example restored** — Added back the Python usage example. ([#1317](#1317)) by @jeff-hykin
- **Nix install updated** — Replaced uv with pip for Nix compatibility. ([#1326](#1326)) by @ruthwikdasyam
- **README improvements** ([#1311](#1311)) by @paul-nechifor
- **Simplified writing docs** — Consolidated writing_docs to a single markdown file. ([#1254](#1254)) by @jeff-hykin

---

## 🏗️ CI & Build

- **ci-complete gate** — Dynamic branch protection via single aggregated status check. MD-only PRs no longer blocked. ([#1279](#1279)) by @spomichter
- **Path-based test filtering** — Test jobs fully skip (no container spin-up) when no relevant code changed. ([#1284](#1284), [#1286](#1286)) by @spomichter
- **Navigation docker build workflow** — CI builds for the ROS navigation stack. ([#1259](#1259)) by @spomichter
- **CUDA test marker** — `@pytest.mark.cuda` for GPU-dependent tests. ([#1220](#1220)) by @jeff-hykin
- **e2e test marker** — Marked end-to-end tests for selective CI runs. ([#1110](#1110)) by @paul-nechifor
- **pytest stdin fix** — Added `-s` to default addopts for LCM autoconf compatibility. ([#1320](#1320)) by @spomichter

---

## ⚠️ Breaking Changes

- **RPC renames**: `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream` ([#1143](#1143))
- **Image class rewrite**: `CudaImage` and `NumpyImage` removed. Image is now a pure NumPy dataclass. Methods like `solve_pnp`, `csrt_tracker`, `from_depth`, `to_depth_meters` removed. ([#1161](#1161))
- **ROS messages removed from core**: All `to_ros`/`from_ros` conversion methods removed. Use `ROSTransport` instead. ([#1230](#1230))
- **QuestButtons → Buttons**: Renamed with generic field names. ([#1261](#1261))
- **RerunBridge replaces old rerun init**: `dimos.dashboard.rerun_init` removed. Use `RerunBridgeModule` or the `rerun-bridge` CLI. ([#1154](#1154))
- **Unitree directory restructuring**: `unitree_go2` → `unitree/go2`, `unitree_g1` → `unitree/g1`. Blueprint names updated. ([#1221](#1221))
- **Default viewer is now rerun-web**: Use `--viewer-backend rerun` to restore native viewer. ([#1324](#1324))

---

## Quickstart

```bash
# Install
uv pip install dimos[base,unitree]

# Try it (no hardware needed)
# NOTE: First run downloads ~2.4 GB from LFS
dimos --replay run unitree-go2

# Simulate
uv pip install dimos[base,unitree,sim]
dimos --simulation run unitree-go2
```

---

## New Contributors 🎉

- @ruthwikdasyam — Quest VR teleoperation, phone teleop, arm teleop, webcam rerun viz
- @JalajShuklaSS — GraspGen integration
- @jca0 — MuJoCo simulation module
- @christiefhyang — PR template

---

**Full Changelog**: [v0.0.9...v0.0.10](v0.0.9...v0.0.10)
Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization via Rerun


## Highlights

88+ commits, 20 contributors, 700+ files changed.

The TLDR: **a complete manipulation stack**, **MuJoCo simulation**, **DDS transport**, and **a rewritten visualization pipeline**. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.

---

## 🚀 New Features

### Simulation
- **MuJoCo simulation module** — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, monotonic clock timing (no `time.sleep`). `dimos --simulation run unitree-go2` ([#1035](#1035)) by @jca0
- **Simulation teleop blueprints** — Added simulation teleop blueprints for Piper, xArm6, and xArm7. ([#1308](#1308)) by @mustafab0

### Manipulation
- **Modular manipulation stack** — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. ([#1079](#1079)) by @mustafab0
- **Joint servo and cartesian controllers** — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. ([#1116](#1116)) by @mustafab0
- **GraspGen integration** — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC `generate_grasps()` returns ranked PoseArray. ([#1119](#1119), [#1234](#1234)) by @JalajShuklaSS
- **Gripper control** — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. ([#1213](#1213)) by @mustafab0
- **Detection3D and Object support** — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. ([#1236](#1236)) by @mustafab0
- **Agentic pick and place** — Reimplemented manipulation skills for agent-driven pick-and-place workflows. ([#1237](#1237)) by @mustafab0

### Teleoperation
- **Quest VR teleoperation** — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. ([#1215](#1215)) by @ruthwikdasyam
- **Phone teleoperation** — Control Go2 from your phone with a web-based teleop interface. ([#1280](#1280)) by @ruthwikdasyam
- **Arm teleop with Pinocchio IK** — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. ([#1246](#1246)) by @ruthwikdasyam

### Transports & Infrastructure
- **DDS transport protocol** — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. ([#1174](#1174)) by @Kaweees
- **Pubsub pattern subscriptions** — Glob and regex pattern matching for topic subscriptions. `subscribe_all()` for bridge-style consumers. Topic type encoding in channel strings (`/topic#module.ClassName`). ([#1114](#1114)) by @leshy
- **LCM raw bytes passthrough** — Skip `lcm_encode()` when message is already bytes. ([#1223](#1223)) by @leshy
- **Unified TimeSeriesStore** — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. ([#1080](#1080)) by @leshy
- **DimosROS benchmark tests** — Benchmark suite for ROS transport performance. ([#1087](#1087)) by @leshy

### Navigation
- **FASTLIO2 support** — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. ([#1149](#1149)) by @baishibona
- **Native Livox + FASTLIO2 module** — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. ([#1235](#1235)) by @leshy

### Visualization
- **RerunBridge module and CLI** — New bridge that subscribes to all LCM messages and logs those with `to_rerun()` to Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old rerun initialization system. ([#1154](#1154)) by @leshy
- **Webcam rerun visualization** — Camera module logs to Rerun with pinhole projection for 3D visualization. ([#1117](#1117)) by @ruthwikdasyam
- **Default viewer switched to rerun-web** — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. ([#1324](#1324)) by @spomichter

### Agents
- **Agent refactor** — Restructured agent module with cleaner imports and global config integration. ([#1211](#1211)) by @paul-nechifor
- **Timestamp knowledge** — Agents now have timestamp awareness in prompts for temporal reasoning. ([#1093](#1093)) by @ClaireBookworm
- **Observe skill** — Go2 can now observe (capture and describe) its environment via agent skill. ([#1109](#1109)) by @paul-nechifor

### Platform & Hardware
- **G1 without ROS** — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. ([#1221](#1221)) by @jeff-hykin
- **ARM (aarch64) support** — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. ([#1229](#1229)) by @jeff-hykin
- **Universal joint/hardware schema** — `HardwareComponent` dataclass with `JointState`, `JointName` type aliases. Backend registry with auto-discovery for SDK adapters. ([#1040](#1040), [#1067](#1067)) by @mustafab0

---

## 🔧 Improvements

- **Optional Dask** — Start without Dask using `--no-dask` flag. Startup time reduced from ~60s to ~45s. ([#1111](#1111), [#1232](#1232)) by @paul-nechifor
- **RPC rework** — Renamed `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream`. Added `ModuleRef`, improved type hints throughout. ([#1143](#1143)) by @jeff-hykin
- **Image class simplification** — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. ([#1161](#1161)) by @leshy
- **Odometry message cleanup** — Simplified Odometry message type. ([#1256](#1256)) by @leshy
- **Remove all ROS message dependencies** — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. ([#1230](#1230)) by @alexlin2
- **Removed bad function serialization** — Eliminated unnecessary serialization of Python functions. ([#1121](#1121)) by @paul-nechifor
- **Benchmark IEC units** — Switched bandwidth benchmarks from SI to IEC units for accuracy. ([#1147](#1147)) by @leshy
- **Pubsub typing improvements** — Thread-safety locks on `subscribe_new_topics` and `subscribe_all`. Proper type params across pubsub stack. ([#1153](#1153)) by @leshy
- **Autogenerated blueprint list** — Blueprints are now auto-discovered and listed. ([#1100](#1100)) by @paul-nechifor
- **Generic Buttons message** — Renamed `QuestButtons` to `Buttons` with generic field names for cross-platform teleop. ([#1261](#1261)) by @ruthwikdasyam
- **Dev container uses ros-dev image** — `./bin/dev` now runs the ROS-enabled dev image. ([#1170](#1170)) by @leshy
- **LSP support** — Added python-lsp-server and python-lsp-ruff to dev dependencies. ([#1169](#1169)) by @leshy
- **Lazy-load pyrealsense2** — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. ([#1309](#1309)) by @spomichter
- **Removed unused mmcv and mmengine** — Dead Detic dependencies removed, eliminating slow source builds from install. ([#1319](#1319)) by @spomichter
- **Simplified installation** — Removed direnv requirement, streamlined install instructions across all platforms. ([#1315](#1315)) by @spomichter
- **DDS extra excluded from --all-extras** — `cyclonedds` requires a source build, so `dds` is now excluded from `uv sync --all-extras` by default. ([#1318](#1318)) by @spomichter
- **Nix pre-commit skip** — Skip pre-commit install if hooks already exist. ([#1162](#1162)) by @leshy
- **Removed base-requirements** — Consolidated dependency management. ([#1098](#1098)) by @paul-nechifor
- **Removed old graspnet** — Cleaned up deprecated graspnet version. ([#1248](#1248)) by @paul-nechifor
- **Code cleanup** — Removed `tofix` markers ([#1216](#1216)), fixed ruff issues ([#1112](#1112)), removed old README_installation.md ([#1101](#1101)) by @paul-nechifor

---

## 🐛 Bug Fixes

- Fix LFS updating (move from .local to venv) ([#1090](#1090)) by @jeff-hykin
- Launch hotfixes: git clone HTTPS, get_data main branch ([#1091](#1091)) by @spomichter
- Fix camera demo not showing in Rerun ([#1148](#1148)) by @jeff-hykin
- Default to rerun native viewer ([#1099](#1099)) by @Nabla7
- Fix exploration blocking agent loop ([#1258](#1258)) by @paul-nechifor
- Fix person-follow blocking agent loop ([#1278](#1278)) by @paul-nechifor
- Skip metric3d tests on unsupported xformers GPUs (Blackwell compute capability >9.0) ([#1225](#1225)) by @leshy
- Fix manipulation tests ([#1218](#1218), [#1247](#1247)) by @jeff-hykin, @paul-nechifor
- Fix control coordinator e2e test ([#1212](#1212)) by @mustafab0
- Fix xarm7-sim broken e2e tests ([#1294](#1294)) by @paul-nechifor
- Pin langchain to restore supported providers ([#1241](#1241)) by @spomichter
- Fix missing library dependencies in Nix flake ([#1240](#1240)) by @Kaweees
- Fix discord invite link ([#1122](#1122)) by @spomichter
- macOS edgecase fix ([#1096](#1096)) by @jeff-hykin
- Fix second N in logo ([#1250](#1250)) by @jeff-hykin
- Fix Unitree Go2 minor issues ([#1307](#1307)) by @paul-nechifor
- Fix broken tests ([#1305](#1305)) by @ruthwikdasyam
- Fix `uv sync` for some macOS systems ([#1322](#1322)) by @jeff-hykin
- Fix mmcv install ([#1313](#1313)) by @paul-nechifor
- Fix mypy issues ([#1150](#1150), [#1167](#1167), [#1257](#1257)) by @leshy, @paul-nechifor, @jeff-hykin
- Fix Nix install uv pip extras ([#1321](#1321)) by @spomichter

---

## 📚 Documentation

- **Major docs overhaul** — New README with feature grid, hardware table, quickstart. Navigation, transports, data streams, and agent docs. ([#1295](#1295)) by @leshy
- **Day 1 docs** — Comprehensive getting started guide, development docs, contributing guide, architecture overview. Executable blueprint docs via md-babel-py. ([#1064](#1064)) by @jeff-hykin
- **Arm integration guide** — How-to for integrating new robotic arms with DimOS. ([#1238](#1238)) by @mustafab0
- **MCP documentation update** — Updated MCP install and usage instructions. ([#1251](#1251)) by @Kaweees
- **Docker docs** — First pass on Docker deployment documentation. ([#1151](#1151)) by @leshy
- **Transports documentation** — Encode/decode mixins, SHM examples, ROS/DDS transport docs. ([#1107](#1107)) by @leshy
- **Rerun API examples** — Updated examples for the new RerunBridge API. ([#1262](#1262)) by @jeff-hykin
- **PR template** added ([#1172](#1172)) by @christiefhyang
- **Simplified install instructions** — Removed direnv, streamlined across all platforms. ([#1315](#1315)) by @spomichter
- **Python example restored** — Added back the Python usage example. ([#1317](#1317)) by @jeff-hykin
- **Nix install updated** — Replaced uv with pip for Nix compatibility. ([#1326](#1326)) by @ruthwikdasyam
- **README improvements** ([#1311](#1311)) by @paul-nechifor
- **Simplified writing docs** — Consolidated writing_docs to a single markdown file. ([#1254](#1254)) by @jeff-hykin

---

## 🏗️ CI & Build

- **ci-complete gate** — Dynamic branch protection via single aggregated status check. MD-only PRs no longer blocked. ([#1279](#1279)) by @spomichter
- **Path-based test filtering** — Test jobs fully skip (no container spin-up) when no relevant code changed. ([#1284](#1284), [#1286](#1286)) by @spomichter
- **Navigation docker build workflow** — CI builds for the ROS navigation stack. ([#1259](#1259)) by @spomichter
- **CUDA test marker** — `@pytest.mark.cuda` for GPU-dependent tests. ([#1220](#1220)) by @jeff-hykin
- **e2e test marker** — Marked end-to-end tests for selective CI runs. ([#1110](#1110)) by @paul-nechifor
- **pytest stdin fix** — Added `-s` to default addopts for LCM autoconf compatibility. ([#1320](#1320)) by @spomichter

---

## ⚠️ Breaking Changes

- **RPC renames**: `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream` ([#1143](#1143))
- **Image class rewrite**: `CudaImage` and `NumpyImage` removed. Image is now a pure NumPy dataclass. Methods like `solve_pnp`, `csrt_tracker`, `from_depth`, `to_depth_meters` removed. ([#1161](#1161))
- **ROS messages removed from core**: All `to_ros`/`from_ros` conversion methods removed. Use `ROSTransport` instead. ([#1230](#1230))
- **QuestButtons → Buttons**: Renamed with generic field names. ([#1261](#1261))
- **RerunBridge replaces old rerun init**: `dimos.dashboard.rerun_init` removed. Use `RerunBridgeModule` or the `rerun-bridge` CLI. ([#1154](#1154))
- **Unitree directory restructuring**: `unitree_go2` → `unitree/go2`, `unitree_g1` → `unitree/g1`. Blueprint names updated. ([#1221](#1221))
- **Default viewer is now rerun-web**: Use `--viewer-backend rerun` to restore native viewer. ([#1324](#1324))

---

## Quickstart

```bash
# Install
uv pip install dimos[base,unitree]

# Try it (no hardware needed)
# NOTE: First run downloads ~2.4 GB from LFS
dimos --replay run unitree-go2

# Simulate
uv pip install dimos[base,unitree,sim]
dimos --simulation run unitree-go2
```

---

## New Contributors 🎉

- @ruthwikdasyam — Quest VR teleoperation, phone teleop, arm teleop, webcam rerun viz
- @JalajShuklaSS — GraspGen integration
- @jca0 — MuJoCo simulation module
- @christiefhyang — PR template

---

**Full Changelog**: [v0.0.9...v0.0.10](v0.0.9...v0.0.10)
docs(go2): add getting started guide (#1339)
docs(go2): add getting started guide (#1339)
- Fix imports to use direct module paths (no __init__.py)
- Rename 'image' stream to 'color_image' to match ROSNav output
- Update StreamCollector constructor to **kwargs-only (new Module API)
RpcCall.__getstate__/__setstate__ only serialized name and remote_name,
losing the _timeout attribute. This caused 'RpcCall has no attribute
_timeout' errors when skills were invoked via rpc_calls on Docker-hosted
modules (the RpcCall gets pickled across the LCM RPC boundary).

Also add NavigationInterface to ROSNav's base classes.
test_rosnav_goal_navigation: sends a nav goal via set_goal() and verifies
the robot's odom converges to within tolerance of the target.

test_rosnav_agentic: full agent stack (MockModel + NavSkillProxy + ROSNav)
where the LLM agent receives 'go to (2, 0)', calls goto_global via the
rpc_calls skill proxy, and the test verifies the robot reaches the target.

Includes NavSkillProxy module that bridges Agent skills to DockerModule
RPCs (same pattern as production agentic blueprints), and TestAgent that
filters DockerModules from on_system_modules to avoid stale pickle issue.
The agent starts processing messages as soon as build() completes,
but the test was waiting 10s warmup before marking the start position.
By then, navigation had already happened. Mark start immediately after
first odom to capture position before agent moves the robot.

Also simplified the test architecture with better comments explaining
the DockerModule serialization constraints and NavSkillBridge pattern.
- Add proper _disposables cleanup for stream subscriptions
- Use subprocess.check_output instead of subprocess.run
- Move inline import (autoconnect) to top of file
@jeff-hykin jeff-hykin marked this pull request as draft March 15, 2026 17:31
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 15, 2026

Greptile Summary

Large PR that achieves G1 parity with Go2 by refactoring the Docker module system, adding new G1 blueprints/effectors, and introducing a Docker-based ROS navigation module. The core Docker deployment pipeline is significantly improved with content-hash rebuilds, container reconnection, per-method RPC timeouts, and structured parallel deployment with cleanup on partial failure.

  • Docker module overhaul: DockerModule now implements ModuleProxyProtocol, moves container lifecycle into __init__ with proper cleanup, adds reconnect support (docker_reconnect_container), and uses content-hash-based rebuild detection instead of image-exists checks
  • RPC timeout refactor: Removed hardcoded 1200s start timeout from call_sync; timeout resolution is now per-method via rpc_timeouts dict on module classes, with RPCClient.default_rpc_timeout (120s) and RPCClient.start_rpc_timeout (1200s) as defaults
  • Parallel deploy with cleanup: New safe_thread_map utility and DockerWorkerManager ensure that when parallel deployments partially fail, successfully-started modules/containers are cleaned up before raising ExceptionGroup
  • G1 blueprints reorganized: Old G1 blueprints moved to legacy/ subdirectory; new blueprints added for agentic, basic, and perceptive configurations with MuJoCo and onboard variants
  • G1 high-level effectors: New HighLevelG1Spec protocol with two implementations — G1HighLevelDdsSdk (native Unitree SDK2 over DDS for onboard) and G1HighLevelWebRtc (remote control)
  • ROS navigation Docker module: New ROSNavModule (~940 lines) encapsulating full ROS2 nav stack in a Docker container with LCM↔ROS bridge, teleop override, and Dockerfile for amd64/arm64
  • aarch64/Jetson fixes: Worker entrypoint preloads static-TLS native libs on aarch64 to prevent cannot allocate memory in static TLS block errors; is_jetson() utility added
  • Note: PR is marked DRAFT — author mentions needing to remove irrelevant changes (like dot.py)

Confidence Score: 3/5

  • This is a large DRAFT PR with solid architecture but a couple of latent issues in the deployment pipeline that should be addressed before merge.
  • Score of 3 reflects the PR's large scope (76 files, +6682/-596 lines) with two notable issues: (1) RpcCall's default timeout=0 is a latent trap that will cause instant TimeoutError for any future caller that omits the argument, and (2) deploy_parallel's cross-group cleanup gap where successful worker modules aren't stopped when Docker deployment fails (or vice versa). The PR is also marked DRAFT by the author. The overall design is well-thought-out with good test coverage for the new parallel deployment logic.
  • Pay close attention to dimos/core/rpc_client.py (timeout default), dimos/core/module_coordinator.py (cross-group cleanup in deploy_parallel), and dimos/web/websocket_vis/websocket_vis_module.py (debug logging).

Important Files Changed

Filename Overview
dimos/core/docker_runner.py Major refactor: DockerModule now implements ModuleProxyProtocol, container lifecycle moved into init with proper cleanup, reconnect support added, build-hash-based rebuild, and per-method RPC timeout resolution. Well-structured with good error handling.
dimos/core/docker_build.py Added content-hash-based rebuild detection using Docker image labels, configurable build timeout for long aarch64 builds, and extra build args support. Clean implementation.
dimos/core/module_coordinator.py Docker vs worker module routing added to deploy/deploy_parallel. deploy_parallel has a cross-group cleanup gap where successful modules from one group aren't stopped when the other group fails.
dimos/core/rpc_client.py Added ModuleProxyProtocol, per-method timeout resolution, and explicit timeout parameter on RpcCall. The default timeout=0 for RpcCall is a latent bug that will cause instant TimeoutError for any future caller that omits the argument.
dimos/utils/safe_thread_map.py New utility for parallel thread execution with structured error handling and cleanup callbacks. Includes ExceptionGroup polyfill for Python 3.10. Well-designed with good documentation.
dimos/core/worker.py Added aarch64 static TLS preloading workaround for Jetson devices and improved deploy_module error handling with finally block for slot reservation. The TLS fix is well-documented and targeted.
dimos/core/worker_manager.py Replaced ThreadPoolExecutor with safe_thread_map for deploy_parallel, adding proper cleanup of successful RPCClients when siblings fail. Good improvement to error handling.
dimos/navigation/rosnav/rosnav_module.py New 942-line Docker-based ROS navigation module for Unitree robots. Comprehensive ROS topic remapping, teleop override, and stream bridging between DimOS and ROS2 container.
dimos/core/docker_worker_manager.py New static helper for parallel Docker module deployment with proper cleanup of successful containers when siblings fail.
dimos/protocol/rpc/spec.py Removed hardcoded 1200s start timeout from call_sync; timeout resolution is now handled by the caller (RpcCall/RPCClient). Clean separation of concerns.
dimos/robot/unitree/g1/effectors/high_level/dds_sdk.py New G1 high-level control module using native Unitree SDK2 over DDS. Implements HighLevelG1Spec with velocity control, arm gestures, and FSM state management.
dimos/web/websocket_vis/websocket_vis_module.py Improved error logging for stream subscriptions and moved browser open to a background thread. Contains leftover [DEBUG] log lines that should be cleaned up.
dimos/visualization/vis_module.py New shared visualization module factory that creates viewer-specific blueprints based on global config (foxglove, rerun, etc.).
dimos/core/module.py Added rpc_timeouts dict for per-method timeout overrides. Removed configure_stream RPC (replaced by set_transport forwarding in DockerModule). Clean changes.
dimos/agents/agent.py Added rpc_timeouts override for on_system_modules (180s) and explicit timeout parameter on RpcCall construction.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    MC[ModuleCoordinator.deploy_parallel] --> Split{Is Docker module?}
    Split -->|Yes| DI[DockerWorkerManager.deploy_parallel]
    Split -->|No| WI[WorkerManager.deploy_parallel]
    
    DI --> DM1[DockerModule.__init__]
    DI --> DM2[DockerModule.__init__]
    
    DM1 --> Hash{Build hash changed?}
    Hash -->|Yes| Build[build_image]
    Hash -->|No| Pull{Image exists?}
    Pull -->|No| DockerPull[docker pull]
    Pull -->|Yes| Run
    Build --> Run[docker run -d]
    DockerPull --> Run
    Run --> WaitRPC[_wait_for_rpc]
    WaitRPC --> Ready[Container Ready]
    
    WI --> W[Worker.deploy_module]
    W --> RPC[RPCClient]
    
    Ready --> STM[start_all_modules via safe_thread_map]
    RPC --> STM
    STM --> StartRPC[module.start via RPC]
    STM --> OnSys[on_system_modules]
    
    subgraph Error Handling
        STM -->|Failure| EG[ExceptionGroup]
        DI -->|Partial Failure| Cleanup[Stop successful modules]
        WI -->|Partial Failure| CleanupW[Stop successful RPCClients]
    end
Loading

Last reviewed commit: a98cfe3

remote_name: str,
unsub_fns: list, # type: ignore[type-arg]
stop_client: Callable[[], None] | None = None,
timeout: float = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dangerous default timeout of 0

timeout: float = 0 will cause event.wait(0) in spec.py:58 to return immediately (non-blocking), resulting in an instant TimeoutError for any RpcCall that doesn't pass an explicit timeout. While all current call sites do pass an explicit value, this is a dangerous default — any future caller that omits timeout will get a confusing immediate timeout. Consider defaulting to RPCClient.default_rpc_timeout (120.0) to make the safe path the easy path.

Suggested change
timeout: float = 0,
timeout: float = 120.0,

Comment on lines +199 to +204
transport = getattr(self.global_costmap, "_transport", "MISSING")
logger.info(f"[DEBUG] global_costmap transport before subscribe: {transport}")
try:
unsub = self.global_costmap.subscribe(self._on_global_costmap)
self._disposables.add(Disposable(unsub))
except Exception:
...
logger.info(f"[DEBUG] Subscribed to global_costmap OK, transport={transport}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug logging left in production code

These [DEBUG] log lines appear to be leftover debugging instrumentation. They should be removed or demoted to logger.debug() before merge.

Suggested change
transport = getattr(self.global_costmap, "_transport", "MISSING")
logger.info(f"[DEBUG] global_costmap transport before subscribe: {transport}")
try:
unsub = self.global_costmap.subscribe(self._on_global_costmap)
self._disposables.add(Disposable(unsub))
except Exception:
...
logger.info(f"[DEBUG] Subscribed to global_costmap OK, transport={transport}")
try:
unsub = self.global_costmap.subscribe(self._on_global_costmap)
self._disposables.add(Disposable(unsub))
logger.info("Subscribed to global_costmap")

Comment on lines +185 to +193
def _on_errors(
_outcomes: list[Any], _successes: list[Any], errors: list[Exception]
) -> None:
_register()
raise ExceptionGroup("deploy_parallel failed", errors)

safe_thread_map([_deploy_workers, _deploy_docker], lambda fn: fn(), _on_errors)
_register()
return results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial failure doesn't clean up cross-group deployments

If _deploy_docker fails but _deploy_workers already succeeded, the successfully-deployed worker modules in results are registered via _register() but never cleaned up. Similarly, if _deploy_workers fails but Docker modules already succeeded, those Docker modules are orphaned.

The inner safe_thread_map calls within each deployer have their own cleanup for intra-group failures, but there's no cross-group teardown. When one of the two group functions raises, _on_errors calls _register() and re-raises — leaving the modules from the other (successful) group running without any shutdown path.

Consider stopping all successfully-deployed modules in _on_errors before re-raising:

def _on_errors(
    _outcomes: list[Any], _successes: list[Any], errors: list[Exception]
) -> None:
    # Stop any modules that deployed successfully before the failure
    for module in results:
        if module is not None:
            with suppress(Exception):
                module.stop()
    raise ExceptionGroup("deploy_parallel failed", errors)

@@ -0,0 +1,160 @@
#!/usr/bin/env python3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adaptation of existing code, renamed because we now have both Unity and Mujoco Sim

@jeff-hykin jeff-hykin changed the title Jeff/fix/rosnav3 RosNav DockerModule Mar 15, 2026
jeff-hykin and others added 18 commits March 15, 2026 11:58
- Module.rpc_timeouts dict allows per-method timeout overrides
- RPCClient resolves timeouts from module's rpc_timeouts, with defaults:
  start=1200s, everything else=120s
- RpcCall carries resolved timeout, passes it to call_sync
- DockerModule mirrors the same pattern via _resolve_timeout()
- call_sync no longer auto-detects 'start' — caller is responsible
- Pickle compat: RpcCall supports both old 2-tuple and new 3-tuple state
Not related to rosnav feature work.
- Remove @DataClass(kw_only=True) from HelloDockerConfig (conflicts with Pydantic)
- Pop global_config from kwargs before passing to config class
- Store rpc_timeouts/default_rpc_timeout in PubSubRPCMixin (not Protocol)
- Remove __init__ from RPCClient Protocol and RPCSpec (structural typing only)
- Use short 3s timeout for readiness probe polling (was using 120s default)
- Extract NavigationStrategy/VlModelName into lightweight types.py files
  (same fix as jeff/fix/help — prevents torch import in Docker containers)
Address Paul's review comment to use check_output with plain 'cowsay'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants