Description
System
macOS 14.5 (Darwin 23.5.0)
Robot/Sim/Hardware (including firmware version)
Unitree Go2 Mujoco simulator — invoked via dimos --simulation run unitree-go2-agentic. The affected code is in the agent/MCP layer and is hardware-independent.
Steps to reproduce
- dimos --simulation run unitree-go2-agentic
- Prompt the agent: "Go forward towards the wall and take a pic. What do you see?"
- The model emits [relative_move(forward=1), observe()]. Both execute. relative_move returns text; observe returns "Tool call started with UUID: . You will be updated with the result soon." even though the actual image is in hand at the same instant.
- The state graph re-invokes the LLM on the placeholder. The model emits another observe() (wasted round trip 1).
- The graph re-invokes the LLM again. The model emits a hedging text reply (wasted round trip 2), which terminates the graph.
- _thread_loop drains a pre-staged HumanMessage (the actual image, queued by _append_image_to_history during step 3) and starts a fresh graph invocation, in which the model finally describes the image.
The result is two wasted LLM round trips inside the first invocation, plus the round trip in the second invocation that produces the real answer: roughly 10 s of avoidable latency, extra tokens, and a user-visible apologetic message.
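The round-trip count above can be reproduced with a toy model of the message flow. This is only a sketch, not DimOS code: `run_turn`, the scripted `llm` policy, and the `inline_results` switch are all hypothetical names invented for illustration, and returning the image inline is just one possible direction for a fix. The `staged` deque stands in for the queue that `_thread_loop` drains.

```python
from collections import deque

PLACEHOLDER = "Tool call started with UUID: {uuid}. You will be updated with the result soon."


def run_turn(inline_results: bool) -> int:
    """Count LLM round trips until the scripted model describes the image."""
    staged = deque()  # stands in for the queue of pre-staged follow-up messages
    round_trips = 0

    def observe(uuid: str) -> str:
        image = f"<image {uuid}>"
        if inline_results:
            return image  # hypothetical fix: hand the result back directly
        staged.append(image)  # reported behaviour: stage it for a later invocation
        return PLACEHOLDER.format(uuid=uuid)

    def llm(last_message: str, retried: bool) -> str:
        """Scripted stand-in for the model, mirroring the reported transcript."""
        nonlocal round_trips
        round_trips += 1
        if last_message.startswith("<image"):
            return "describe"
        if last_message.startswith("Tool call started"):
            # First placeholder: retry observe. Second placeholder: hedge.
            return "hedge" if retried else "observe"
        return "move+observe"

    # Invocation 1: the user prompt leads to relative_move + observe.
    action = llm("user prompt", retried=False)
    result = observe("uuid-1")
    action = llm(result, retried=False)
    if action == "observe":
        result = observe("uuid-2")
        action = llm(result, retried=True)  # hedging reply ends the invocation

    # Invocation 2: only needed if the image was staged instead of returned.
    if action != "describe" and staged:
        action = llm(staged.popleft(), retried=False)

    assert action == "describe"
    return round_trips


print(run_turn(inline_results=False))  # placeholder path
print(run_turn(inline_results=True))   # inline path
```

Under this scripted policy, the placeholder path takes four round trips and the inline path takes two, which matches the two wasted round trips reported above.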
DimOS version
a8ee67867 (branch main, fork of dimensionalOS/dimos)
Logs / screenshots
Langfuse trace of the affected turn (public):
https://us.cloud.langfuse.com/project/cmp23t80n09ooad08jnw1lksy/traces/9fa3b09bd507434c25b9fa9b6797ea7b?observation=trace-9fa3b09bd507434c25b9fa9b6797ea7b&timestamp=2026-05-12T04%3A31%3A16.928Z
Annotated transcript — the two wasted LLM round trips are inside a single state-graph invocation:
--- state graph invocation 1 ---
user > "Go forward towards the wall, and take a pic. what you see?"
assistant > tool_calls=[relative_move(forward=1), observe()] <- LLM round trip 1
tool > relative_move -> "Navigation goal reached"
tool > observe -> "Tool call started with UUID: 5f93444b..."
assistant > tool_calls=[observe()] <- LLM round trip 2 (WASTED - duplicate)
tool > observe -> "Tool call started with UUID: c4b3a345..."
assistant > "I'm approaching the wall. Let me capture an image..." <- LLM round trip 3 (WASTED hedge)
--- state graph exits ---
--- _thread_loop drains the pre-staged HumanMessage from the queue ---
--- state graph invocation 2 ---
user > "This is the artefact for the 'observe' tool with UUID:=5f93444b-..."
assistant > "I can see a room with chairs and tables, a wall with a curtain..." <- the real answer