add processing agentstate #4518
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1944,6 +1944,7 @@ def _tool_execution_started_cb(fnc_call: llm.FunctionCall) -> None: | |
| # reset the `created_at` to the start time of the tool execution | ||
| fnc_call.created_at = time.time() | ||
| speech_handle._item_added([fnc_call]) | ||
| self._session._update_agent_state("processing") | ||
|
|
||
| def _tool_execution_completed_cb(out: ToolExecutionOutput) -> None: | ||
| if out.fnc_call_out: | ||
|
|
@@ -2214,6 +2215,7 @@ async def _realtime_generation_task( | |
| with tracer.start_as_current_span( | ||
| "agent_turn", context=self._session._root_span_context | ||
| ) as current_span: | ||
| self._session._update_agent_state("thinking") | ||
| current_span.set_attribute(trace_types.ATTR_AGENT_TURN_ID, speech_handle._generation_id) | ||
| if parent_id := speech_handle._parent_generation_id: | ||
| current_span.set_attribute(trace_types.ATTR_AGENT_PARENT_TURN_ID, parent_id) | ||
|
|
@@ -2414,6 +2416,7 @@ async def _read_fnc_stream() -> None: | |
| ) | ||
|
|
||
| def _tool_execution_started_cb(fnc_call: llm.FunctionCall) -> None: | ||
| self._session._update_agent_state("processing") | ||
| speech_handle._item_added([fnc_call]) | ||
| self._agent._chat_ctx.items.append(fnc_call) | ||
| self._session._tool_items_added([fnc_call]) | ||
|
|
@@ -2444,7 +2447,8 @@ def _tool_execution_completed_cb(out: ToolExecutionOutput) -> None: | |
| await speech_handle.wait_if_not_interrupted( | ||
| [asyncio.ensure_future(audio_output.wait_for_playout())] | ||
| ) | ||
| - self._session._update_agent_state("listening") | ||
| + if exe_task.done(): | ||
| +     self._session._update_agent_state("listening") | ||
|
Comment on lines +2450 to +2451
||
| current_span.set_attribute( | ||
| trace_types.ATTR_SPEECH_INTERRUPTED, speech_handle.interrupted | ||
| ) | ||
|
|
||
what if the tool call has a text message alongside, or there is a `session.say` in the tool call? the state may become `thinking -> speaking -> processing` (while the agent is still speaking), or `thinking -> processing -> speaking` (while the function tool is running).

the main problem is that function call execution can run in parallel with other states. I am not sure what the original purpose of adding this state is, but we already have a `function_tools_executed` event; what about adding a `function_tools_started` event? does that solve the issue?
Thanks for the comment.
I mentioned in #4460 that I can already fire that event from server to worker, and simulate that. So event-based handling isn't an issue.
The problem is that our client side also communicates with LiveKit Cloud for agent state, and that state will still be `thinking` while a tool is being used. Sure, I could communicate between the client and my backend/worker, but that's kinda circumventing the entire LiveKit agent state management.
@MonkeyLeeT What do you think of supporting something like a `ToolState`, which would switch between `executing` and `idle`? Perhaps this could be an `AgentSession` property? Let me know if this could address your use case!
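To make the `ToolState` idea concrete, here is a minimal sketch of a tracker that lives next to the agent state instead of inside it. `ToolStateTracker`, `running`, and the `on_change` callback are hypothetical names for illustration, not part of livekit-agents. It counts in-flight tool calls so that parallel executions (the concern raised earlier in the thread) only produce one `executing` and one `idle` transition.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Literal

ToolState = Literal["idle", "executing"]


class ToolStateTracker:
    """Hypothetical helper: tracks tool execution separately from the agent state."""

    def __init__(self, on_change: Callable[[ToolState], None]) -> None:
        self._active = 0  # number of tool calls currently running
        self._on_change = on_change

    @property
    def state(self) -> ToolState:
        return "executing" if self._active > 0 else "idle"

    @contextmanager
    def running(self) -> Iterator[None]:
        """Wrap one tool call; notify only on idle <-> executing transitions."""
        self._active += 1
        if self._active == 1:
            self._on_change("executing")
        try:
            yield
        finally:
            self._active -= 1
            if self._active == 0:
                self._on_change("idle")
```

Because the tool state is independent, the agent state can keep moving through `thinking`, `speaking`, and `listening` while a tool is still running.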
That could work! As long as that's synced via livekit cloud so any client connecting to that can get this.
@MonkeyLeeT you can sync the tool state to the client via `room.local_participant.set_attributes`; for example, the `agent_state` is updated in https://github.com/livekit/agents/blob/livekit-agents@1.3.10/livekit-agents/livekit/agents/voice/room_io/room_io.py#L425-L429.

I think you can track the tool state in the function tool itself and sync the state to the client via the `set_attributes` API.
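A rough sketch of that suggestion, under stated assumptions: the attribute key `tool_state` and the helper `publish_tool_state` are hypothetical, and the participant is only assumed to expose an awaitable `set_attributes(dict[str, str])`, as `room.local_participant` does in the Python rtc SDK. The helper is written against a small protocol so it can be exercised without a live room.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Protocol


class SupportsSetAttributes(Protocol):
    """Anything with an async set_attributes, e.g. room.local_participant (assumed)."""

    async def set_attributes(self, attributes: dict[str, str]) -> None: ...


TOOL_STATE_KEY = "tool_state"  # hypothetical attribute name


@asynccontextmanager
async def publish_tool_state(participant: SupportsSetAttributes) -> AsyncIterator[None]:
    """Publish "executing" while the wrapped block runs, then "idle"."""
    await participant.set_attributes({TOOL_STATE_KEY: "executing"})
    try:
        yield
    finally:
        # Runs even if the tool body raises, so clients do not stay stuck on "executing".
        await participant.set_attributes({TOOL_STATE_KEY: "idle"})
```

Inside a function tool this would be used as `async with publish_tool_state(room.local_participant): ...`, however the tool obtains the room. This simple version does not account for tools running in parallel; a counter would be needed for that.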
Would the client side get an event when an attribute is updated?
maybe you can listen on `"participant_attributes_changed"`? (docs)
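As an illustration of the listening side, here is a minimal sketch of a handler for `"participant_attributes_changed"`. It assumes the callback receives the changed attributes as a dict plus the participant object, and that registration looks like `room.on("participant_attributes_changed", handler)` as in the Python rtc SDK; other client SDKs may differ. The `tool_state` key and `make_tool_state_handler` are hypothetical names.

```python
from typing import Any, Callable, Dict

TOOL_STATE_KEY = "tool_state"  # hypothetical attribute name set by the agent


def make_tool_state_handler(
    on_tool_state: Callable[[str, str], None],
) -> Callable[[Dict[str, str], Any], None]:
    """Build a handler that forwards only tool-state changes."""

    def handler(changed_attributes: Dict[str, str], participant: Any) -> None:
        state = changed_attributes.get(TOOL_STATE_KEY)
        if state is None:
            return  # some other attribute changed; ignore it
        identity = getattr(participant, "identity", "")
        on_tool_state(identity, state)

    return handler


# Assumed registration on a connected room object:
# room.on("participant_attributes_changed", make_tool_state_handler(update_ui))
```

The handler ignores unrelated attribute updates, so it only fires the callback when the agent's tool state actually changes.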
@tinalenguyen Let me give it a try, but is there any other concern with supporting this as a state? It feels very natural to me.