
Feature Request: Utterance timestamps in the ChatContext or Transcript #2326

@zaheerabbas-prodigal

Description


I want to capture word-level, or at least utterance-level, timestamps for both the user and agent transcripts and store them in the ChatContext. The use case is to use the ChatContext as the transcript for post-processing, for example displaying calls in a UI, redaction, and summarization.

Currently, the SDK exposes only the text of the user transcript; utterance-level and word-level timestamps are not exposed at all.

The agent's transcript has no timestamps either, even though the ElevenLabs and Cartesia TTS APIs support timestamps.
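To make the request concrete, here is a rough sketch of the kind of data I would like to end up with and how the redaction use case could consume it. All names here (`WordTiming`, `TimedUtterance`, `redaction_spans`) are hypothetical and are not part of livekit-agents; the sketch only assumes that each utterance carries start/end times in seconds and, optionally, per-word timings.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class WordTiming:
    """Hypothetical per-word timing (seconds from the start of the call)."""
    text: str
    start: float
    end: float


@dataclass
class TimedUtterance:
    """Hypothetical transcript entry with utterance- and word-level timestamps."""
    role: str  # "user" or "assistant"
    text: str
    start: float
    end: float
    words: List[WordTiming] = field(default_factory=list)


def redaction_spans(
    utterances: Iterable[TimedUtterance], sensitive: Set[str]
) -> List[Tuple[float, float]]:
    """Return the (start, end) audio ranges of words found in `sensitive`.

    Matching is case-insensitive and ignores trailing punctuation.
    """
    spans = []
    for utterance in utterances:
        for word in utterance.words:
            if word.text.lower().strip(".,!?") in sensitive:
                spans.append((word.start, word.end))
    return spans


utterance = TimedUtterance(
    role="user",
    text="my pin is 1234",
    start=0.0,
    end=2.0,
    words=[
        WordTiming("my", 0.0, 0.3),
        WordTiming("pin", 0.3, 0.7),
        WordTiming("is", 0.7, 0.9),
        WordTiming("1234", 0.9, 2.0),
    ],
)
spans = redaction_spans([utterance], {"1234"})
```

With utterance-level timestamps only, `words` would simply stay empty and post-processing would fall back to the `start`/`end` of the whole utterance.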

Has anyone found a way to get this timestamp data from the livekit-agents SDK?

I am happy to submit a PR to add this feature. Would such a PR be merged?

Labels: question (Further information is requested)
