read frames from pre-connect audio buffer #2156
Conversation
✅ Changeset File Detected
The following changeset entries were found:
Change description:
I'm wondering if there is a way to do this automatically, so we don't even expose any APIs for it; it's just automatic.
```python
        return self._data

    @utils.log_exceptions(logger=logger)
    async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str):
```
Let's make sure we're reading from the right participant_id: the one that is registered inside the RoomIO.
We don't know which participant we are listening to at this moment, so maybe we buffer the pre-connect audio for all participants and decide whether to use it based on the participant id and the received timestamp.
Imagine a multi-user room: the second user joins the room with pre-connect audio, but the agent is not listening to that user until set_participant is called. By then that pre-connect buffer is out of date and we should ignore it.
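To make that concrete, here is a minimal sketch of per-participant buffering with a staleness check. The store class and method names are illustrative, not this PR's code; only `_PreConnectAudioBuffer` and `max_delta_s` echo names that appear in the diff.

```python
# Illustrative sketch only: buffer pre-connect audio per participant and drop
# buffers that are older than max_delta_s when the agent finally picks a participant.
import time
from dataclasses import dataclass, field

from livekit import rtc


@dataclass
class _PreConnectAudioBuffer:
    received_at: float  # when the byte stream arrived
    frames: list[rtc.AudioFrame] = field(default_factory=list)


class _PreConnectAudioStore:
    def __init__(self, max_delta_s: float = 5.0) -> None:
        self._max_delta_s = max_delta_s
        self._buffers: dict[str, _PreConnectAudioBuffer] = {}

    def put(self, participant_identity: str, frames: list[rtc.AudioFrame]) -> None:
        # keep one buffer per participant, stamped with the receive time
        self._buffers[participant_identity] = _PreConnectAudioBuffer(
            received_at=time.time(), frames=frames
        )

    def take(self, participant_identity: str) -> list[rtc.AudioFrame]:
        # only hand the buffer out if it is recent enough; otherwise it predates
        # set_participant and should be ignored
        buf = self._buffers.pop(participant_identity, None)
        if buf is None or time.time() - buf.received_at > self._max_delta_s:
            return []
        return buf.frames
```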
fixed for multiple participants
```python
async def entrypoint(ctx: JobContext):
    # register a pre-connect audio handler before connecting to the room so that
    # it won't miss the audio buffer (TODO (long): does this make sense?)
    pre_connect_audio = PreConnectAudioHandler(ctx.room).register()
```
@theomonnom we can do that automatically if this line can run after ctx.connect(), basically registering the byte stream handler inside the room io after ctx.connect(). I am not sure whether we would miss the stream reader if ctx.connect auto-subscribes the audio track.
@bcherry to enable it automatically, is it okay to register the handler after ctx.connect? What will happen if the handler is registered after the byte stream has been sent?
My concern is that if we move the registration to somewhere like room io, the user may do anything between ctx.connect and session.start(), so the handler may not be there when the audio is sent. There might also be a gap between the audio track being subscribed (done in ctx.connect) and the audio stream being created from the track (in room_io.start()).
(That is, if the pre-connect buffer isn't sent until the audio track has been subscribed.)
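For context, a minimal sketch of the ordering this concern is about, assuming the handler API shown in this PR's diff; the import path of PreConnectAudioHandler and the body after connect are placeholders.

```python
# Ordering sketch only; not the final API.
from livekit.agents import JobContext

# from ... import PreConnectAudioHandler  # assumed: wherever this PR exports it


async def entrypoint(ctx: JobContext):
    # 1) register the byte stream handler before connecting, so a buffer sent
    #    right after the agent joins is not missed
    pre_connect_audio = PreConnectAudioHandler(ctx.room).register()

    # 2) connect; audio tracks may be auto-subscribed during this call
    await ctx.connect()

    # 3) anything the user does here runs before room_io creates the audio
    #    stream from the subscribed track, which is the gap described above
    ...
```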
Works fine for me (in a single participant scenario) 👍
Yeah, I think this needs to be automatic on the agent side so the developer only has to "turn it on" in one spot (which would be on the client). Right now the client sets a participant attribute indicating this feature is enabled, but I think we're going to move that to a track feature instead. Ideally the agents framework can read that and automatically do the right thing.
Token detection seems to work fine 👍 I'm leaving the final decision here to @bcherry (more moving parts)
lukasIO left a comment
Please hold this for a moment, until we can get protocol-level support for the pre-connect buffer as part of the audio track features.
```python
from ..agent import logger, utils


PRE_CONNECT_AUDIO_BUFFER_STREAM = "lk.agent.pre-connect-audio-buffer"
PRE_CONNECT_AUDIO_ATTRIBUTE = "lk.agent.pre-connect-audio"
```
We discussed this in the client team, and the most reliable solution would be to use the audioTrackFeature enum on the publication itself to figure out whether the pre-connect buffer should be handled for that track.
@longcw are you fine with this alternative?
I think it's okay. How do we read that?
```python
class TrackInfo(_message.Message):
    ...
    audio_features: _containers.RepeatedScalarFieldContainer[AudioTrackFeature]
```

but the exact case hasn't been added yet, am I right @lukasIO?
that's correct, here's the PR to add it
What is audio_features? Please let me know how to access it from the Python SDK when it's ready.
It's part of the track info object; it might be that the Rust layer is not yet exposing this.
I think it is mapped, the snippet above is copied from models.pyi
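For illustration, here is a hedged sketch of what reading the flag could look like once TrackInfo.audio_features is usable from Python; the enum case name TF_PRECONNECT_BUFFER is hypothetical, since the exact value hadn't been added yet at the time of this discussion.

```python
# Sketch only: the feature enum case and how a publication hands over its
# TrackInfo are assumptions, not the current Python SDK surface.
from livekit.protocol import models as proto_models


def has_preconnect_buffer(track_info: proto_models.TrackInfo) -> bool:
    # hypothetical enum case; the real name comes from the protocol PR
    # referenced later in this thread
    preconnect_feature = getattr(proto_models, "TF_PRECONNECT_BUFFER", None)
    if preconnect_feature is None:
        return False
    return preconnect_feature in track_info.audio_features
```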
```python
        self._timeout = timeout
        self._max_delta_s = max_delta_s

        self._buffers: dict[str, asyncio.Future[_PreConnectAudioBuffer]] = {}
```
instead of a dictionary with participant identity keys, it would make more sense to use the track id. We don't know the publication id yet on the client side, but the mediastreamtrack id should work, I think
I don't think we need to distinguish tracks from the same participant; we always read the first audio track from the participant.
The track id in transcription makes it hard to use, IMO; I don't know if it's worth adding it here.
in what way does it make it harder to use?
just so that this doesn't get lost after switching PRs, what's your thinking here @longcw ?
I'm not against it, but let's see what the new protocol looks like... for example, if the buffer is bound to a track id, then it's fine to use the track id as the key.
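To illustrate the trade-off with made-up names (not this PR's code): keying by participant identity only needs the identity RoomIO is already tracking, while keying by track id requires the subscribed track's sid before the buffer can be looked up.

```python
# Illustrative only: how the buffer lookup differs with each key choice.
import asyncio
from typing import Optional

# option A (this PR): one buffer future per participant identity
buffers_by_identity: dict[str, asyncio.Future] = {}

# option B (suggested): one buffer future per track id, which distinguishes
# multiple audio tracks published by the same participant
buffers_by_track_sid: dict[str, asyncio.Future] = {}


def lookup_by_identity(participant_identity: str) -> Optional[asyncio.Future]:
    # the identity is known to RoomIO as soon as set_participant is called
    return buffers_by_identity.get(participant_identity)


def lookup_by_track(track_sid: str) -> Optional[asyncio.Future]:
    # needs the subscribed track's sid, so the buffer can only be resolved
    # once the publication is known
    return buffers_by_track_sid.get(track_sid)
```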
protocol PR was merged btw livekit/protocol#1057
It's not added to the Python SDK yet, right? And can we update this example to use the track feature flag as well, so I can test it: livekit-examples/agent-starter-swift#16
Separated the room io updates into #2167.