Skip to content

Jalaj bt pickplace#1604

Open
JalajShuklaSS wants to merge 6 commits intodevfrom
jalaj_BT_pickplace
Open

Jalaj bt pickplace#1604
JalajShuklaSS wants to merge 6 commits intodevfrom
jalaj_BT_pickplace

Conversation

@JalajShuklaSS
Copy link
Contributor

Problem

The existing PickAndPlaceModule tightly couples perception, grasp generation, planning, and execution into a single monolithic skill-based module. This makes it difficult to compose behavior-tree (BT) driven pick-and-place workflows where the BT orchestrates each phase independently.

Solution

New BTManipulationModule — a slim ManipulationModule subclass in dimos/manipulation/bt/ that provides:

  • Perception integration (objects stream from OSR, obstacle monitor)
  • GraspGen Docker grasp generation via RPC
  • MeshCat grasp visualization
  • No @skill methods — only the BT PickPlaceModule registers skills with the agent, avoiding skill registration conflicts

BT PickPlaceModule updated to route all RPC calls through BTManipulationModule instead of the monolithic PickAndPlaceModule, and adds a detect skill that chains set_prompts -> refresh_obstacles -> list_cached_detections.

Bug fixes:

  • Pydantic pickle errors: Changed Iterable[T] / Sequence[T] type hints to list[T] in RobotModelConfig — Pydantic v2 wraps these in unpicklable ValidatorIterator objects, breaking worker process deployment
  • OSR init: Added **kwargs passthrough so ObjectSceneRegistrationModule accepts the g=GlobalConfig(...) kwarg from the module system
  • PointCloud2 pickle: Added __getstate__ to strip cached Open3D OrientedBoundingBox objects that can't be pickled for LCM transport
  • xArm adapter: Added clean_error() + motion_enable() on connect and error-state logging (error_code, warn_code, state, mode) on command failure
  • Docker: Added langchain-core to [docker] extras since dimos.core.module imports it at the top level

Key design decision: Full architectural separation between BT and non-BT workflows. The BT blueprint uses BTManipulationModule (no ties to PickAndPlaceModule), keeping both paths independent and composable.

Breaking Changes

None — the existing PickAndPlaceModule, xarm_perception, and xarm_perception_agent blueprints are unchanged.

How to Test

# Terminal 1: Start coordinator + BT agent
dimos run coordinator-xarm7 bt-pick-place-agent

# Terminal 2: Start human CLI
humancli

# In the CLI, test the pipeline:
# 1. Detection: "detect the red cup"
# 2. Pick: "pick the red cup"
# 3. Place: "place back or place in front of the coconut water"
# 4. Go home: "go home"
# Also just get amazed by opening the drake simulator to see grasp poses 

Jalaj Shukla added 2 commits March 18, 2026 20:19
Introduce a py_trees Behavior Tree module (PickPlaceModule) that wraps
PickAndPlaceModule RPCs with retry, recovery, grasp verification, and
interruptible execution. Includes gripper adaptation, workspace filtering,
and DL-based grasp generation via GraspGen Docker.

- New dimos/manipulation/bt/ package: actions, conditions, trees, pick_place_module
- New gripper_adapter for cross-gripper grasp pose adaptation
- Add visualize_grasps RPC to PickAndPlaceModule (MeshCat wireframes)
- Add rpc_timeout passthrough in RpcCall for long Docker RPCs
- Fix GraspGen Dockerfile: --no-deps, spconv-cu120, full runtime deps
- Add angular_distance to Quaternion for pose verification
- Add xarm7 joint limits, bt-pick-place-agent blueprint
- Add py-trees>=2.2,<3 dependency
- Remove @DataClass from PickPlaceModuleConfig (ModuleConfig is now Pydantic)
- Replace dataclasses.field with pydantic.Field for home_joints default
- Update __init__ signature: drop *args (new module system is kwargs-only)
- Fix all_blueprints path: manipulation_blueprints to blueprints (file renamed on dev)
@JalajShuklaSS JalajShuklaSS self-assigned this Mar 19, 2026
…issues

- Add BTManipulationModule: slim ManipulationModule subclass for BT workflow
  with perception integration, GraspGen Docker, and MeshCat visualization
- Fix Pydantic pickle errors: change Iterable/Sequence to list in config
- Fix OSR __init__ to accept **kwargs for GlobalConfig passthrough
- Fix PointCloud2 pickle: strip cached Open3D bounding boxes in __getstate__
- Fix detect skill: accept prompts as keyword list instead of varargs
- Fix xArm adapter: add clean_error/motion_enable on connect, error logging
- Add langchain-core to Docker extras for module.py import
- Update all BT actions/conditions to use BTManipulationModule RPC names
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR introduces a full BT-driven pick-and-place architecture (BTManipulationModule + PickPlaceModule) that keeps perception, grasp generation, and motion planning as independent, composable RPC modules orchestrated by a py_trees Behavior Tree, cleanly separated from the existing monolithic PickAndPlaceModule. It also ships several targeted bug fixes: Pydantic pickle errors from Iterable/Sequence type hints, ObjectSceneRegistrationModule missing **kwargs, PointCloud2 Open3D bounding-box pickle failures, xArm error-state logging, and langchain-core missing from Docker extras.

Key changes:

  • New dimos/manipulation/bt/ package: BTManipulationModule, PickPlaceModule, actions.py, conditions.py, trees.py — full two-level retry BT with DL + heuristic grasp fallback
  • New bt_pick_place_agent blueprint registered in all_blueprints.py
  • RobotModelConfig fields changed from Iterable/Sequence to list[T] to fix Pydantic v2 pickling
  • ObjectSceneRegistrationModule.__init__ now accepts **kwargs for global config passthrough
  • PointCloud2.__getstate__ strips cached Open3D OrientedBoundingBox / bounding-box properties
  • xArm adapter: clean_error() + motion_enable() on connect, structured error logging on failure
  • rpc_timeout forwarded through RpcCall.__call__ via a kwargs.pop pattern

Issues found:

  • visualize_grasps in both BTManipulationModule and PickAndPlaceModule accesses self._world_monitor.world._meshcat without a None guard, which will crash with AttributeError if the world monitor is unavailable — every other method in the same class has this guard
  • GenerateGrasps BT action has a thread safety issue: calling terminate(INVALID) detaches the running thread without stopping it, allowing it to write stale grasp results into self._result after the next initialise() has already started
  • Dockerfile comment promises a reproducibility pin but the code still points at the floating main branch
  • detect skill imports time redundantly inside the function body when it is already imported at module level

Confidence Score: 3/5

  • Largely solid architecture with targeted bug fixes, but two logic issues — an unguarded world-monitor dereference that will crash at runtime and a thread race condition in grasp generation — should be addressed before merge.
  • The new BT infrastructure is well-designed and the bug fixes (Pydantic pickle, OSR kwargs, PointCloud2 getstate, xArm adapter) are correct and targeted. However, the missing None-check in visualize_grasps is a guaranteed crash path that is present in two files, and the GenerateGrasps race condition can cause the robot to attempt grasps from a cancelled run on a retry, which is both a correctness and a potential safety concern in a physical robot context. These reduce confidence from 5 to 3.
  • dimos/manipulation/bt/bt_manipulation_module.py (visualize_grasps None crash), dimos/manipulation/bt/actions.py (GenerateGrasps thread race), dimos/manipulation/pick_and_place_module.py (same visualize_grasps crash), dimos/manipulation/grasping/docker_context/Dockerfile (floating branch)

Important Files Changed

Filename Overview
dimos/manipulation/bt/bt_manipulation_module.py New BTManipulationModule extending ManipulationModule with perception + GraspGen RPC methods; contains a potential NoneType crash in visualize_grasps and relies on double-checked locking that could expose uninitialized GraspGen state.
dimos/manipulation/bt/actions.py BT action leaf nodes; GenerateGrasps has a thread safety issue where a terminated background RPC thread can write stale results into self._result after the next initialise() call has already started.
dimos/manipulation/bt/pick_place_module.py BT PickPlaceModule with @Skill methods and async BT execution loop; detect skill has a redundant inner import; stop uses a benign TOCTOU; overall async design is sound.
dimos/manipulation/bt/trees.py Tree builders for pick, place, and go-home BTs with two-level retry; RetryOnFailure decorator implementation is correct; build_pick_tree and build_place_tree are well-structured.
dimos/manipulation/grasping/gripper_adapter.py New GripperAdapter shifting grasp poses along the approach axis; target.tcp_offset is stored in GripperGeometry but unused in the shift formula, which may be intentional (shift to TCP contact point).
dimos/manipulation/pick_and_place_module.py Added visualize_grasps RPC; duplicates the same None-unchecked _world_monitor.world._meshcat access pattern as BTManipulationModule, risking an AttributeError crash.
dimos/manipulation/grasping/docker_context/Dockerfile Expanded dependency list for GraspGen, switched to --no-deps install, added spconv-cu120; comment claims reproducibility pin but still uses 'git checkout main' (floating branch, not a commit SHA).

Sequence Diagram

sequenceDiagram
    participant Agent
    participant PickPlaceModule
    participant BT as BehaviorTree
    participant BTManipModule as BTManipulationModule
    participant OSR as ObjectSceneRegistrationModule
    participant GraspGen as GraspGenModule (Docker)
    participant Coordinator

    Agent->>PickPlaceModule: pick("red cup")
    PickPlaceModule->>PickPlaceModule: _start_async(_exec_pick)
    PickPlaceModule-->>Agent: None (async started)

    PickPlaceModule->>BT: build_pick_tree() + _tick_tree()

    loop BT Tick Loop (20Hz)
        BT->>BTManipModule: reset() [RPC]
        BT->>OSR: set_prompts(["red cup"]) [RPC]
        Note over BT,OSR: prompt_settle_time (~3s sleep)
        BT->>BTManipModule: refresh_obstacles(duration) [RPC]
        BT->>BTManipModule: list_cached_detections() [RPC]
        BTManipModule-->>BT: detections list

        BT->>OSR: get_object_pointcloud_by_name("red cup") [RPC]
        OSR-->>BT: PointCloud2

        BT->>BTManipModule: generate_grasps(pc, scene_pc) [RPC, background thread]
        Note over BT,GraspGen: GenerateGrasps returns RUNNING while Docker starts
        BTManipModule->>GraspGen: generate_grasps(pointcloud) [Docker RPC]
        GraspGen-->>BTManipModule: PoseArray (grasp candidates)
        BTManipModule-->>BT: PoseArray

        BT->>BTManipModule: visualize_grasps(candidates) [RPC, MeshCat]

        loop GraspWithRetry
            BT->>BTManipModule: plan_to_pose(pre_grasp_pose) [RPC]
            BTManipModule->>Coordinator: plan trajectory
            Coordinator-->>BTManipModule: trajectory
            BT->>BTManipModule: execute() [RPC]
            Coordinator->>Coordinator: execute trajectory
            BT->>BTManipModule: get_trajectory_status() [RPC poll]
            BTManipModule-->>BT: COMPLETED
            BT->>BTManipModule: set_gripper(0.0) [RPC close]
            BT->>BTManipModule: get_gripper() [RPC verify]
            BTManipModule-->>BT: position > threshold → SUCCESS
        end

        BT-->>PickPlaceModule: SUCCESS
    end

    PickPlaceModule->>PickPlaceModule: _notify_agent("Pick complete")
    PickPlaceModule-)Agent: /human_input publish
Loading

Comments Outside Diff (2)

  1. dimos/manipulation/bt/bt_manipulation_module.py, line 1329 (link)

    P1 visualize_grasps will crash when _world_monitor is None

    self._world_monitor can be None (e.g., when planning initialization fails). Every other method that touches it in this class guards with an explicit None check (e.g., refresh_obstacles, start), but visualize_grasps dereferences it unconditionally, causing an AttributeError: 'NoneType' object has no attribute 'world' at runtime.

    The identical pattern is duplicated in dimos/manipulation/pick_and_place_module.py at the new visualize_grasps method (line ~258 in that file).

  2. dimos/manipulation/bt/pick_place_module.py, line 1989 (link)

    P2 Redundant import time inside function body

    time is already imported at the module level (line 7 of the file). This shadowed import is unnecessary and slightly misleading.

Last reviewed commit: "fix(bt): integrate B..."

Comment on lines +504 to +511

def update(self) -> Status:
if not self._execute_ok:
self.bb.error_message = "Error: Trajectory execution send failed"
return Status.FAILURE

try:
status = self.module.get_rpc_calls("BTManipulationModule.get_trajectory_status")()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Race condition: old RPC thread can overwrite next attempt's results

When the BT is interrupted (terminate(INVALID)) while GenerateGrasps is running, self._thread is set to None but the background thread is still alive and will eventually write to self._result or self._error. If the BT retries immediately:

  1. terminate() sets _thread = None, _result = None, _error = None
  2. initialise() for the new attempt sets _result = None and starts a new thread T2
  3. The old thread T1 (still running) writes self._result = <stale result>
  4. T2 also writes self._result = <correct result>, but T1 may finish last

If T1 finishes after T2, update() will read T1's stale grasp candidates, causing the robot to attempt grasps that are no longer valid.

The fix is to use a per-run generation counter so the thread can detect it has been superseded before writing:

def initialise(self) -> None:
    self._run_id = object()          # fresh token each run
    current_id = self._run_id
    self._result = None
    self._error = None
    ...
    def _run() -> None:
        try:
            result = ...
            if current_id is self._run_id:   # still current?
                self._result = result
        except Exception as e:
            if current_id is self._run_id:
                self._error = e
    ...

@@ -25,7 +25,7 @@ RUN conda init bash && \
echo 'yes' | conda tos accept --override-channels --channel defaults 2>/dev/null || true && \
conda create -n app python=3.10 -y && conda clean -afy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Comment promises reproducibility but still uses a floating branch

The comment was updated to say "pin to a known-good commit for reproducibility", but the command still checks out main — a floating branch that will silently drift with every upstream push. This means two Docker image builds on different days may use different GraspGen code and produce different results, which is the exact problem the comment claims to have fixed.

Pin to the specific commit SHA that was tested:

Suggested change
conda create -n app python=3.10 -y && conda clean -afy
RUN git clone https://github.com/NVlabs/GraspGen.git && cd GraspGen && git checkout <tested-commit-sha>

# Conflicts:
#	dimos/robot/all_blueprints.py
@mustafab0
Copy link
Contributor

mustafab0 commented Mar 19, 2026

Runnnig blueprint bt-pick-place-agent first time. The container dies when starting.

20:00:50.032[inf][dimos/core/docker_runner.py   ] Starting docker container: dimos_graspgenmodule_1881383_1773949155
20:00:50.336[inf][dimos/core/docker_runner.py   ] Waiting for GraspGenModule to be ready...
20:01:10.500[err][n/bt/bt_manipulation_module.py] Grasp generation failed: Container died during startup:

Traceback (most recent call last):
  File "/opt/conda/envs/app/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants