streaming tts with piper #339

@bchinnari

The official Piper repo currently splits the input text into sentences using phonemize().
It generates the audio for a whole sentence at once and then yields the audio for each sentence, one after the other.

So, if the first sentence in the input text is long, it will take some time before the first audio is generated.

Does this repo do the same if I select piperEngine? Or does it somehow generate audio word by word instead of sentence by sentence? If it produces audio word by word, latency will be much lower.
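To illustrate the difference I mean, here is a rough sketch (not Piper's or this repo's actual code). `fake_synthesize` is a hypothetical stand-in for a real TTS call, and the regex-based splitters are my own simplification. The point is only that the first chunk handed to the synthesizer is shorter when splitting at clause boundaries than at sentence boundaries.

```python
import re
from typing import Callable, Iterable, Iterator, List


def split_sentences(text: str) -> List[str]:
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_clauses(text: str) -> List[str]:
    # Finer split: also break after ',', ';' and ':' (sub-sentence level).
    return [c.strip() for c in re.split(r"(?<=[.!?,;:])\s+", text) if c.strip()]


def fake_synthesize(chunk: str) -> bytes:
    # Hypothetical stand-in for a TTS engine: one zero byte per character,
    # so "audio" length (and, by assumption, compute time) grows with text length.
    return b"\x00" * len(chunk)


def stream_audio(chunks: Iterable[str],
                 synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    # Yield audio chunk by chunk; playback can start after the first yield.
    for chunk in chunks:
        yield synthesize(chunk)


if __name__ == "__main__":
    text = ("This is a fairly long first sentence, with several clauses, "
            "that takes a while to synthesize. Short one.")
    first_by_sentence = next(stream_audio(split_sentences(text), fake_synthesize))
    first_by_clause = next(stream_audio(split_clauses(text), fake_synthesize))
    print(len(first_by_sentence), len(first_by_clause))
```

The trade-off, as far as I understand it, is that smaller chunks reduce the time to first audio but may hurt prosody, since the model no longer sees the whole sentence at once.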

General question: Is there an open-source TTS system that has both of the following?

  1. Generates speech word by word, or at a sub-sentence level, so that latency is lower.
  2. Has a script for fine-tuning the pre-trained model with our custom data.
