read frames from pre-connect audio buffer #2156

longcw · 2025-04-29T10:31:26Z

No description provided.

github-actions · 2025-04-29T10:31:39Z

✅ Changeset File Detected

The following changeset entries were found:

patch - livekit-agents

Change description:
read frames from pre-connect audio buffer and remove await from room_io start (#2156)

theomonnom · 2025-04-29T10:34:38Z

I'm wondering if there is a way to do this automatically, so that we don't expose any API for it at all.

theomonnom · 2025-04-29T10:38:09Z

livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py

+        return self._data
+
+    @utils.log_exceptions(logger=logger)
+    async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str):


Let's make sure we're reading from the right participant_id, i.e. the one that is registered inside the RoomIO.

We don't know which participant we are listening to at this moment, so maybe we buffer the pre-connect audio for all participants and decide whether to use it based on the participant id and the received timestamp.

Imagine a multi-user room: the second user joins with pre-connect audio, but the agent is not listening to that user until set_participant is called. By then that pre-connect buffer is out of date and we should ignore it.

fixed for multiple participants
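For readers following the thread, here is a minimal sketch of the idea described above: keep one buffer per participant and discard it if it is too old by the time the agent starts listening. It is not the code from this PR. `BufferedAudio`, `PreConnectBuffers`, `store` and `take` are hypothetical names, and `max_delta_s` only mirrors the parameter name visible in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BufferedAudio:
    # Hypothetical container, not the PR's _PreConnectAudioBuffer.
    frames: list[bytes]
    received_at: float  # seconds, on the same clock as `now` in take()


@dataclass
class PreConnectBuffers:
    # Maximum accepted age of a buffer when the agent starts listening.
    max_delta_s: float = 1.0
    _by_participant: dict[str, BufferedAudio] = field(default_factory=dict)

    def store(self, participant_id: str, frames: list[bytes], received_at: float) -> None:
        """Buffer audio for any participant, whether or not we listen to them yet."""
        self._by_participant[participant_id] = BufferedAudio(frames, received_at)

    def take(self, participant_id: str, now: float) -> list[bytes]:
        """Return the buffered frames if they are fresh enough, else an empty list."""
        buf = self._by_participant.pop(participant_id, None)
        if buf is None:
            return []
        if now - buf.received_at > self.max_delta_s:
            # The agent switched to this participant too late; the audio is stale.
            return []
        return buf.frames
```

In this sketch `take` would be called at the point where the agent switches to a participant (for example from `set_participant`), so a buffer received long before the switch is dropped rather than replayed.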

longcw · 2025-04-29T10:40:31Z

examples/voice_agents/precompute_audio_buffer.py

+async def entrypoint(ctx: JobContext):
+    # register a pre-connect audio handler before connecting to the room so that
+    # it won't miss the audio buffer (TODO (long): does this makes sense?)
+    pre_connect_audio = PreConnectAudioHandler(ctx.room).register()


@theomonnom we can do that automatically if this line can run after ctx.connect(), i.e. by registering the byte stream handler inside the room io after ctx.connect(). I am not sure whether we would miss the stream reader if ctx.connect() auto-subscribes to the audio track.

@bcherry to enable it automatically, is it okay to register the handler after ctx.connect()? What happens if the handler is registered after the byte stream has been sent?

My concern is that if we move the registration somewhere like room io, the user may do anything between ctx.connect() and session.start(), so the handler may not be there when the audio is sent. There might also be a gap between the audio track being subscribed (done in ctx.connect) and the audio stream being created from the track (in room_io.start()).

(That is, if the pre-connect buffer is not sent until the audio track gets subscribed.)
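The ordering concern above can be illustrated with a toy model. It assumes a room that silently drops a stream when no handler is registered for its topic. Whether LiveKit actually behaves this way is exactly the open question in this thread, so treat `ToyRoom`, `register_handler`, `deliver` and `run` as hypothetical names rather than the SDK's API.

```python
from __future__ import annotations

from typing import Callable

TOPIC = "lk.agent.pre-connect-audio-buffer"


class ToyRoom:
    """Toy stand-in for a room. This is NOT the LiveKit API."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[bytes], None]] = {}
        self.dropped: list[str] = []

    def register_handler(self, topic: str, handler: Callable[[bytes], None]) -> None:
        self._handlers[topic] = handler

    def deliver(self, topic: str, payload: bytes) -> None:
        # Assumption: a stream with no registered handler is dropped, not queued.
        handler = self._handlers.get(topic)
        if handler is None:
            self.dropped.append(topic)
            return
        handler(payload)


def run(register_first: bool) -> tuple[list[bytes], list[str]]:
    """Deliver one buffer, registering the handler before or after delivery."""
    room = ToyRoom()
    received: list[bytes] = []
    if register_first:
        room.register_handler(TOPIC, received.append)
    room.deliver(TOPIC, b"hello")
    if not register_first:
        room.register_handler(TOPIC, received.append)
    return received, room.dropped
```

Under that assumption, `run(True)` receives the buffer while `run(False)` loses it, which is why the example registers the handler before connecting to the room.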

pblazej · 2025-04-29T11:37:15Z

Works fine for me (in a single participant scenario) 👍

bcherry · 2025-04-29T14:14:48Z

yeah i think this needs to be automatic from the agent side so the developer only has to "turn it on" in one spot (which would be on the client). right now the client sets a participant attribute indicating this feature is enabled but I think we're going to move that to be a track feature instead. ideally the agents framework can read that and automatically do the right thing

pblazej · 2025-04-30T10:17:36Z

Token detection seems to work fine 👍 I'm leaving the final decision here to @bcherry (more moving parts)

lukasIO

Please hold this for a moment, until we can get protocol level support for the pre connect buffer as part of the audio track features

lukasIO · 2025-04-30T11:12:41Z

livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py

+from ..agent import logger, utils
+
+PRE_CONNECT_AUDIO_BUFFER_STREAM = "lk.agent.pre-connect-audio-buffer"
+PRE_CONNECT_AUDIO_ATTRIBUTE = "lk.agent.pre-connect-audio"


we discussed this in the client team and the most reliable solution would be to use the audioTrackFeature enum on the publication itself to figure out if the preconnect buffer should be handled for that track

@longcw are you fine with this alternative?

I think it's okay. How do we read that?

class TrackInfo(_message.Message):
    ...
    audio_features: _containers.RepeatedScalarFieldContainer[AudioTrackFeature]

but the exact case hasn't been added yet, am I right @lukasIO?

that's correct, here's the PR to add it

what is audio_features? pls let me know how to access it from python sdk when it's ready

it's part of the track info object, might be that the rust layer is not yet exposing this

I think it is mapped, the snippet above is copied from models.pyi
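A rough sketch of what the check discussed in this thread could look like once the enum case exists. The thread notes that the exact case had not been added at this point, so the member name `PRECONNECT_BUFFER`, its value, and the stand-in `TrackInfo` below are all hypothetical and are not taken from the protocol or the Python SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class AudioTrackFeature(IntEnum):
    # Stand-in enum. Member names and values are placeholders, not the protocol's.
    OTHER_FEATURE = 0
    PRECONNECT_BUFFER = 99  # hypothetical name for the not-yet-added case


@dataclass
class TrackInfo:
    # Stand-in for the protobuf message; only the field used here is modeled.
    sid: str
    audio_features: list[AudioTrackFeature] = field(default_factory=list)


def wants_pre_connect_buffer(info: TrackInfo) -> bool:
    """True if the track advertises the (hypothetical) pre-connect buffer feature."""
    return AudioTrackFeature.PRECONNECT_BUFFER in info.audio_features
```

With a check like this, the agent side could decide per track whether to wait for a pre-connect buffer, instead of relying on a participant attribute.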

lukasIO · 2025-04-30T11:14:14Z

livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py

+        self._timeout = timeout
+        self._max_delta_s = max_delta_s
+
+        self._buffers: dict[str, asyncio.Future[_PreConnectAudioBuffer]] = {}


instead of a dictionary with participant identity keys, it would make more sense to use the track id. We don't know the publication id yet on the client side, but the mediastreamtrack id should work, I think

I don't think we need to distinguish between tracks from the same participant; we always read the participant's first audio track.

The track id in transcription makes it hard to use IMO; I don't know if it's worth adding here.

in what way does it make it harder to use?

just so that this doesn't get lost after switching PRs, what's your thinking here @longcw ?

I'm not against it, but let's see what the new protocol looks like. For example, if the buffer is bound to a track id, then it's fine to use the track id as the key.

protocol PR was merged btw livekit/protocol#1057

It's not added to the Python SDK yet, right? Also, can we update this example to use the track feature flag as well so I can test it: livekit-examples/agent-starter-swift#16

longcw · 2025-04-30T13:47:11Z

separate the room io updates to #2167

longcw · 2025-05-01T02:58:32Z

Separated the RoomIO change into #2167 and created a new PR for this feature, #2171 (basically reverted a few commits).

longcw added 4 commits April 29, 2025 16:32

add pre-connect audio buffer

355b47f

read buffer as a list

b732d68

clean logs

ca15aa0

add wait_for_data for PreConnectAudioData

c161ebd

longcw requested review from a team, bcherry and pblazej April 29, 2025 10:31

theomonnom reviewed Apr 29, 2025


longcw commented Apr 29, 2025


longcw added 2 commits April 29, 2025 21:33

Merge remote-tracking branch 'origin/main' into longc/pre-connect-audio

b77db38

support multi participant

4b0073f

update PreConnectAudioData

ad3e6b7

longcw requested a review from theomonnom April 29, 2025 14:42

longcw added 5 commits April 30, 2025 12:04

move PreConnectAudioHandler to room io

1ea9d6b

update comments

13171f9

update comments

f066d26

clean up timeout

92fa347

check PRE_CONNECT_AUDIO_ATTRIBUTE == true

ca5537e

remove await from room io start

c24ea79

lukasIO requested changes Apr 30, 2025


longcw added 5 commits April 30, 2025 19:24

fix room connection handler

60ada10

revert datastream chat listener

9664d5b

Merge remote-tracking branch 'origin/main' into longc/pre-connect-audio

509d91e

revert examples

f24f9b4

Create changeset-5c01f156.md

e95e07b

longcw mentioned this pull request May 1, 2025

add pre-connected audio buffer #2171

Merged

longcw closed this May 1, 2025
