Skip to content

Conversation

@longcw
Copy link
Contributor

@longcw longcw commented Apr 29, 2025

No description provided.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 29, 2025

✅ Changeset File Detected

The following changeset entries were found:

  • patch - livekit-agents

Change description:
read frames from pre-connect audio buffer and remove await from room_io start (#2156)

@longcw longcw requested review from a team, bcherry and pblazej April 29, 2025 10:31
@theomonnom
Copy link
Member

I'm wondering if there is a way to do this automatically, so we don't even expose any APIs for it, it's just automatic

return self._data

@utils.log_exceptions(logger=logger)
async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure we're reading from the right participant_id? The one that is registered inside the RoomIO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know which participant listening to at this moment, so maybe we buffer all the pre-connect audios for participants and decide if we want to use them based on the participant id and the received timestamp.

Imagine in a multi-user room, the second user joined the room with the pre-connect audio but the agent is not listening to that user until the set_participant is called, then that pre-connect buffer was actually out of date we should ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed for multiple participants

async def entrypoint(ctx: JobContext):
# register a pre-connect audio handler before connecting to the room so that
# it won't miss the audio buffer (TODO (long): does this makes sense?)
pre_connect_audio = PreConnectAudioHandler(ctx.room).register()
Copy link
Contributor Author

@longcw longcw Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theomonnom we can do that automatically if this line can run after the ctx.connect(), basically register the byte stream handler inside the room io after the ctx.connect(). I am not sure if we are going to miss the stream reader if the ctx connect will auto subscribe the audio track.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcherry to enable it automatically, is that okay to register the handler after ctx.connect, what will happen if the handler is registered after the byte stream sent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that if we move the registering to somewhere like room io, user may do anything between the ctx.connect and session.start(), the handler may not be there when the audio is sent, also there might be a gap between the audio track subscribed (done in ctx.connect) and audio stream created from the track (in room_io.start()).

(If the pre-connect buffer is sent until the audio track got subscribed.)

@pblazej
Copy link

pblazej commented Apr 29, 2025

Works fine for me (in a single participant scenario) 👍

@bcherry
Copy link
Contributor

bcherry commented Apr 29, 2025

yeah i think this needs to be automatic from the agent side so the developer only has to "turn it on" in one spot (which would be on the client). right now the client sets a participant attribute indicating this feature is enabled but I think we're going to move that to be a track feature instead. ideally the agents framework can read that and automatically do the right thing

@longcw longcw requested a review from theomonnom April 29, 2025 14:42
@pblazej
Copy link

pblazej commented Apr 30, 2025

Token detection seems to work fine 👍 I'm leaving the final decision here to @bcherry (more moving parts)

Copy link
Contributor

@lukasIO lukasIO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please hold this for a moment, until we can get protocol level support for the pre connect buffer as part of the audio track features

from ..agent import logger, utils

PRE_CONNECT_AUDIO_BUFFER_STREAM = "lk.agent.pre-connect-audio-buffer"
PRE_CONNECT_AUDIO_ATTRIBUTE = "lk.agent.pre-connect-audio"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discussed this in the client team and the most reliable solution would be to use the audioTrackFeature enum on the publication itself to figure out if the preconnect buffer should be handled for that track

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@longcw are you fine with this alternative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay, how to read that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class TrackInfo(_message.Message):
...
    audio_features: _containers.RepeatedScalarFieldContainer[AudioTrackFeature]

but the exact case hasn't been added yet, am I right @lukasIO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct, here's the PR to add it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is audio_features? pls let me know how to access it from python sdk when it's ready

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's part of the track info object, might be that the rust layer is not yet exposing this

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is mapped, the snippet above is copied from models.pyi

self._timeout = timeout
self._max_delta_s = max_delta_s

self._buffers: dict[str, asyncio.Future[_PreConnectAudioBuffer]] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a dictionary with participant identity keys, it would make more sense to use the track id. We don't know the publication id yet on the client side, but the mediastreamtrack id should work, I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's needed to distinguish the tracks from the same participant, we always read the first audio track from the participant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The track id in transcription makes it's hard to use IMO, I don't know if it's worth to add it here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in what way does it make it harder to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just so that this doesn't get lost after switching PRs, what's your thinking here @longcw ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't against it, but let's see what is the new protocol... for example if the buffer is bound with a track id, then it's fine to use track id as the key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protocol PR was merged btw livekit/protocol#1057

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not added to the python sdk right? and can we update this example to use the track feature flag as well so I can test it livekit-examples/agent-starter-swift#16

@longcw
Copy link
Contributor Author

longcw commented Apr 30, 2025

separate the room io updates to #2167

@longcw
Copy link
Contributor Author

longcw commented May 1, 2025

separated the change in RoomIO into #2167, created a new one for this feature #2171 (basically reverted a few commits)

@longcw longcw closed this May 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants