observe() wastes 2 LLM round trips per call #2054

@Mgczacki

Description


System

macOS 14.5 (Darwin 23.5.0)

Robot/Sim/Hardware (including firmware version)

Unitree Go2 Mujoco simulator — invoked via dimos --simulation run unitree-go2-agentic. The affected code is in the agent/MCP layer and is hardware-independent.

Steps to reproduce

  1. dimos --simulation run unitree-go2-agentic
  2. Prompt the agent: "Go forward towards the wall and take a pic. What do you see?"
  3. The model emits [relative_move(forward=1), observe()]. Both execute. relative_move returns text; observe returns "Tool call started with UUID: . You will be updated with the result soon." even though the actual image is in hand at the same instant.
  4. The state graph re-invokes the LLM on the placeholder. The model emits another observe() (wasted round trip 1).
  5. The graph re-invokes the LLM again. The model emits a hedging text reply (wasted round trip 2), which terminates the graph.
  6. _thread_loop drains a pre-staged HumanMessage (the actual image, queued by _append_image_to_history during step 3) and starts a fresh graph invocation, in which the model finally describes the image.

This causes 2 wasted LLM round trips inside the first invocation, plus the round trip in the second invocation that produces the real answer. ~10s of avoidable latency, extra tokens, and a user-visible apologetic message.
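The re-invocation loop above can be reproduced in miniature. This is a hedged sketch with hypothetical names (`FakeLLM`, `run_graph` are stand-ins, not the actual DimOS state-graph API): the graph loops back to the model whenever the last message is a tool result, so a placeholder carrying no image forces two extra model calls before the graph exits on plain text.

```python
from dataclasses import dataclass

PLACEHOLDER = "Tool call started with UUID: ... You will be updated with the result soon."

@dataclass
class FakeLLM:
    """Stand-in for the real model: re-issues observe() when it only sees a placeholder."""
    calls: int = 0

    def invoke(self, history):
        self.calls += 1
        if "Tool call started" in history[-1]:
            # Round trip 2: duplicate observe(). Round trip 3: a hedging text reply.
            return "tool_calls=[observe()]" if self.calls == 2 else "Let me capture an image..."
        return "tool_calls=[relative_move(forward=1), observe()]"

def run_graph(llm, history):
    """Loop until the model emits plain text (mirrors the state-graph exit rule)."""
    while True:
        reply = llm.invoke(history)
        history.append(reply)
        if "tool_calls" not in reply:
            return reply  # plain-text reply terminates the graph
        # The tool result appended here is only the placeholder; the image
        # is staged on a separate queue and never enters this invocation.
        history.append(PLACEHOLDER)

llm = FakeLLM()
run_graph(llm, ["Go forward towards the wall and take a pic."])
# llm.calls is now 3: one useful round trip plus the two wasted ones
```

The key property is that nothing inside the loop can ever satisfy the model's need for the image, so the two extra round trips are structural, not a model quirk.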

DimOS version

a8ee67867 (branch main, fork of dimensionalOS/dimos)

Logs / screenshots

Langfuse trace of the affected turn (public):
https://us.cloud.langfuse.com/project/cmp23t80n09ooad08jnw1lksy/traces/9fa3b09bd507434c25b9fa9b6797ea7b?observation=trace-9fa3b09bd507434c25b9fa9b6797ea7b&timestamp=2026-05-12T04%3A31%3A16.928Z

Annotated transcript — the two wasted LLM round trips are inside a single state-graph invocation:

--- state graph invocation 1 ---
user > "Go forward towards the wall, and take a pic. what you see?"
assistant > tool_calls=[relative_move(forward=1), observe()] <- LLM round trip 1
tool > relative_move -> "Navigation goal reached"
tool > observe -> "Tool call started with UUID: 5f93444b..."
assistant > tool_calls=[observe()] <- LLM round trip 2
(WASTED - duplicate)
tool > observe -> "Tool call started with UUID: c4b3a345..."
assistant > "I'm approaching the wall. Let me capture an image..." <- LLM round trip 3
(WASTED hedge)
--- state graph exits ---

--- _thread_loop drains the pre-staged HumanMessage from the queue ---

--- state graph invocation 2 ---
user > "This is the artefact for the 'observe' tool with UUID:=5f93444b-..."
assistant > "I can see a room with chairs and tables, a wall with a curtain..." <- the real answer
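One possible mitigation, sketched with hypothetical names (`observe_result`, `pending_images`, and the `image_ready` flag are illustrative assumptions, not the actual DimOS interface): when the artifact already exists at tool-return time, hand it back as the tool result instead of the UUID placeholder, and fall back to the async placeholder path only when capture is genuinely pending.

```python
import queue
import uuid

# Stand-in for the queue that _append_image_to_history stages artifacts on.
pending_images = queue.Queue()

def observe_result(image_ready, image_b64=None):
    """Return the observe() tool result.

    Fast path: the image is in hand right now, so return it synchronously
    and avoid a second state-graph invocation entirely.
    Slow path: capture is still pending, so keep the existing placeholder
    contract and let the queue deliver the artifact later.
    """
    call_id = str(uuid.uuid4())
    if image_ready and image_b64 is not None:
        return {"type": "image", "data": image_b64, "uuid": call_id}
    pending_images.put(call_id)
    return {
        "type": "text",
        "data": f"Tool call started with UUID: {call_id}. "
                "You will be updated with the result soon.",
    }
```

Under this sketch the transcript's first invocation would receive the image at round trip 1's tool step, eliminating both wasted round trips and the second invocation.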
