6 changes: 5 additions & 1 deletion livekit-agents/livekit/agents/voice/agent_activity.py
@@ -1944,6 +1944,7 @@ def _tool_execution_started_cb(fnc_call: llm.FunctionCall) -> None:
 # reset the `created_at` to the start time of the tool execution
 fnc_call.created_at = time.time()
 speech_handle._item_added([fnc_call])
+self._session._update_agent_state("processing")
Contributor @longcw, Jan 14, 2026

What if the tool call has a text message alongside it, or there is a session.say in the tool call? The state may go thinking -> speaking -> processing (while the agent is still speaking), or thinking -> processing -> speaking (while the function tool is still running).

The main problem is that function call execution can run in parallel with other states. I am not sure what the original purpose of adding this state was, but we already have a function_tools_executed event. What about adding a function_tools_started event? Would that solve the issue?
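
To make the overlap concrete, here is a minimal, self-contained sketch. It is not livekit code: `SingleState`, `speak`, and `run_tool` are hypothetical stand-ins. Speech and a tool call run concurrently and both write to one state field, so the recorded history shows `processing` replacing `speaking` while the simulated audio is still playing.

```python
import asyncio


class SingleState:
    """Hypothetical stand-in for a session that exposes one agent-state field."""

    def __init__(self) -> None:
        self.state = "thinking"
        self.history: list[str] = ["thinking"]

    def update(self, new_state: str) -> None:
        # Record only real transitions, like a state-changed event would.
        if new_state != self.state:
            self.state = new_state
            self.history.append(new_state)


async def speak(s: SingleState, started: asyncio.Event, duration: float) -> None:
    s.update("speaking")
    started.set()
    await asyncio.sleep(duration)  # audio is still playing out during this sleep


async def run_tool(s: SingleState, speech_started: asyncio.Event) -> None:
    await speech_started.wait()  # the tool starts while the agent is speaking
    s.update("processing")
    await asyncio.sleep(0.01)  # simulated tool work


async def main() -> list[str]:
    s = SingleState()
    started = asyncio.Event()
    await asyncio.gather(speak(s, started, 0.05), run_tool(s, started))
    return s.history


history = asyncio.run(main())
print(history)
```

Under these assumptions a single field cannot report "speaking" and "tool running" at the same time, which is the ambiguity raised in this comment.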

Contributor

Thanks for the comment.

I mentioned in #4460 that I can already fire that event from the server to the worker and simulate it, so event-based handling isn't an issue.

The problem is that our client side also communicates with LiveKit Cloud for the agent state, and that state will still be thinking while a tool is being used. Sure, I could communicate between the client and my backend/worker, but that would circumvent the entire LiveKit agent state management.

Member Author

@MonkeyLeeT What do you think of supporting something like a ToolState, which would switch between executing and idle? Perhaps this could be an AgentSession property? Let me know if this could address your use case!
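
One way such a ToolState could behave is sketched here, purely as an illustration. `ToolStateTracker` and its `executing()` context manager are hypothetical names, not part of AgentSession. The tracker counts in-flight tool calls so that overlapping calls produce a single executing -> idle cycle.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Literal

ToolState = Literal["idle", "executing"]


class ToolStateTracker:
    """Hypothetical tracker: 'executing' while at least one tool call is in flight."""

    def __init__(self, on_change: Callable[[ToolState], None]) -> None:
        self._in_flight = 0
        self._on_change = on_change

    @property
    def state(self) -> ToolState:
        return "executing" if self._in_flight > 0 else "idle"

    @contextmanager
    def executing(self) -> Iterator[None]:
        self._in_flight += 1
        if self._in_flight == 1:  # first tool call started
            self._on_change("executing")
        try:
            yield
        finally:
            self._in_flight -= 1
            if self._in_flight == 0:  # last tool call finished
                self._on_change("idle")


changes: list[str] = []
tracker = ToolStateTracker(changes.append)
with tracker.executing():
    with tracker.executing():  # a second, overlapping tool call
        pass
print(changes, tracker.state)
```

Keeping this separate from the agent state means speaking and tool execution can be reported independently.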

Contributor

That could work, as long as it's synced via LiveKit Cloud so that any connected client can get it.

Contributor

@MonkeyLeeT You can sync the tool state to the client via room.local_participant.set_attributes; for example, agent_state is updated this way in https://github.com/livekit/agents/blob/livekit-agents@1.3.10/livekit-agents/livekit/agents/voice/room_io/room_io.py#L425-L429.

I think you can track the tool state in the function tool itself and sync it to the client via the set_attributes API.
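
A rough sketch of that suggestion follows, with loudly labeled assumptions: `FakeParticipant` stands in for room.local_participant so the snippet runs without a room, `tool_state` is a made-up attribute key, and `run_with_tool_state` is a hypothetical helper. It assumes set_attributes is awaitable and takes a dict of string keys and values.

```python
import asyncio
from typing import Awaitable, Callable

TOOL_STATE_ATTR = "tool_state"  # hypothetical attribute key, not a livekit constant


class FakeParticipant:
    """Stand-in for room.local_participant; records every set_attributes call."""

    def __init__(self) -> None:
        self.calls: list[dict[str, str]] = []

    async def set_attributes(self, attributes: dict[str, str]) -> None:
        self.calls.append(dict(attributes))


async def run_with_tool_state(
    participant: FakeParticipant, tool: Callable[[], Awaitable[str]]
) -> str:
    # Publish "executing" before the tool runs and "idle" afterwards,
    # even if the tool raises.
    await participant.set_attributes({TOOL_STATE_ATTR: "executing"})
    try:
        return await tool()
    finally:
        await participant.set_attributes({TOOL_STATE_ATTR: "idle"})


async def lookup_weather() -> str:
    await asyncio.sleep(0)  # simulated tool work
    return "sunny"


participant = FakeParticipant()
result = asyncio.run(run_with_tool_state(participant, lookup_weather))
print(result, participant.calls)
```

In a real function tool the same wrapper logic would sit around the tool body, with the actual local participant in place of the fake.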

Contributor

Would the client side get an event when an attribute is updated?

Member Author

Maybe you can listen for "participant_attributes_changed"?

docs
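
For illustration only, a listener might look like the following sketch. `FakeRoom` is a tiny stand-in emitter so the snippet runs on its own, `tool_state` is a made-up attribute key, and the handler signature `(changed_attributes, participant)` is an assumption about how the SDK invokes the callback, so check it against the linked docs.

```python
from typing import Any, Callable


class FakeRoom:
    """Minimal stand-in for a room object with on()/emit() style events."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[..., None]]] = {}

    def on(self, event: str, callback: Callable[..., None]) -> None:
        self._handlers.setdefault(event, []).append(callback)

    def emit(self, event: str, *args: Any) -> None:
        for callback in self._handlers.get(event, []):
            callback(*args)


seen: list[tuple[str, str]] = []


def on_attributes_changed(changed: dict[str, str], participant: Any) -> None:
    # React only when the hypothetical tool_state key is among the changes.
    if "tool_state" in changed:
        seen.append((str(participant), changed["tool_state"]))


room = FakeRoom()
room.on("participant_attributes_changed", on_attributes_changed)

# Simulated updates; in a real room these would come from the agent participant.
room.emit("participant_attributes_changed", {"tool_state": "executing"}, "agent-1")
room.emit("participant_attributes_changed", {"unrelated": "x"}, "agent-1")
room.emit("participant_attributes_changed", {"tool_state": "idle"}, "agent-1")
print(seen)
```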

Contributor

@tinalenguyen Let me give it a try, but are there any other concerns about supporting this as a state? It feels very natural to me.


 def _tool_execution_completed_cb(out: ToolExecutionOutput) -> None:
     if out.fnc_call_out:
@@ -2214,6 +2215,7 @@ async def _realtime_generation_task(
 with tracer.start_as_current_span(
     "agent_turn", context=self._session._root_span_context
 ) as current_span:
+    self._session._update_agent_state("thinking")
     current_span.set_attribute(trace_types.ATTR_AGENT_TURN_ID, speech_handle._generation_id)
     if parent_id := speech_handle._parent_generation_id:
         current_span.set_attribute(trace_types.ATTR_AGENT_PARENT_TURN_ID, parent_id)
@@ -2414,6 +2416,7 @@ async def _read_fnc_stream() -> None:
 )

 def _tool_execution_started_cb(fnc_call: llm.FunctionCall) -> None:
+    self._session._update_agent_state("processing")
     speech_handle._item_added([fnc_call])
     self._agent._chat_ctx.items.append(fnc_call)
     self._session._tool_items_added([fnc_call])
@@ -2444,7 +2447,8 @@ def _tool_execution_completed_cb(out: ToolExecutionOutput) -> None:
 await speech_handle.wait_if_not_interrupted(
     [asyncio.ensure_future(audio_output.wait_for_playout())]
 )
-self._session._update_agent_state("listening")
+if exe_task.done():
+    self._session._update_agent_state("listening")
Comment on lines +2450 to +2451

P2: Restore listening state after long tool runs

In _realtime_generation_task_impl, the state flips back to listening only when exe_task is already done at audio playout time. If a tool execution outlasts playout and does not require a follow‑up reply, exe_task completes later but there is no subsequent state update in this function, so the agent can remain stuck in processing/thinking until the next turn. This makes state consumers (e.g., room attributes or UI) believe the agent is still busy even though tool execution finished; consider updating the state after await exe_task when no reply is generated.
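
A simplified sketch of the suggested fix, under stated assumptions: `finish_turn`, `long_tool`, and `short_playout` are hypothetical stand-ins, the state is a plain list rather than the session, and interruption handling is ignored. The idea is to wait for the tool task when it outlasts playout and then move to listening if no follow-up reply is generated.

```python
import asyncio
from typing import Awaitable


async def finish_turn(
    states: list[str],
    playout: Awaitable[None],
    exe_task: "asyncio.Future[None]",
    reply_needed: bool,
) -> None:
    await playout  # audio playout finishes first in this scenario
    if not exe_task.done():
        await exe_task  # the tool outlasted playout, so wait for it
    if not reply_needed:
        # No follow-up generation will update the state, so do it here.
        states.append("listening")


async def main() -> list[str]:
    states = ["processing"]

    async def long_tool() -> None:
        await asyncio.sleep(0.05)

    async def short_playout() -> None:
        await asyncio.sleep(0.01)

    exe_task = asyncio.ensure_future(long_tool())
    await finish_turn(states, short_playout(), exe_task, reply_needed=False)
    return states


states = asyncio.run(main())
print(states)
```

With only the `if exe_task.done()` check from the diff, this scenario would end with the state still at processing.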


 current_span.set_attribute(
     trace_types.ATTR_SPEECH_INTERRUPTED, speech_handle.interrupted
 )
2 changes: 1 addition & 1 deletion livekit-agents/livekit/agents/voice/events.py
@@ -95,7 +95,7 @@ async def wait_for_playout(self) -> None:
 ]

 UserState = Literal["speaking", "listening", "away"]
-AgentState = Literal["initializing", "idle", "listening", "thinking", "speaking"]
+AgentState = Literal["initializing", "idle", "listening", "thinking", "speaking", "processing"]


 class UserStateChangedEvent(BaseModel):
8 changes: 4 additions & 4 deletions tests/test_agent_session.py
@@ -177,14 +177,14 @@ async def test_tool_call() -> None:
 check_timestamp(playback_finished_events[0].playback_position, 2.0, speed_factor=speed)
 check_timestamp(playback_finished_events[1].playback_position, 3.0, speed_factor=speed)

-assert len(agent_state_events) == 6
+assert len(agent_state_events) == 7
 assert agent_state_events[0].old_state == "initializing"
 assert agent_state_events[0].new_state == "listening"
 assert agent_state_events[1].new_state == "thinking"
-assert agent_state_events[2].new_state == "speaking"
 assert (
-    agent_state_events[3].new_state == "thinking"
-)  # from speaking to thinking when tool call is executed
+    agent_state_events[2].new_state == "processing"
+)  # from thinking to processing when tool call is executed
+assert agent_state_events[3].new_state == "thinking"
 check_timestamp(agent_state_events[3].created_at - t_origin, 5.5, speed_factor=speed)
 assert agent_state_events[4].new_state == "speaking"
 assert agent_state_events[5].new_state == "listening"