As of now, the official Piper repo splits the input text into sentences using phonemize().
It generates the audio chunk for a whole sentence at once and then "yields" the audio for each sentence, one after the other.
So, if the first sentence in the input text is long, it will take some time before the first audio is generated.
Does this repo do the same if I select piperEngine? Or does it somehow generate audio word by word instead of sentence by sentence? If it produced audio word by word, latency would be much lower.
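
To make the latency concern concrete, here is a minimal sketch of the two chunking strategies. It does not use Piper's or this repo's actual API: `split_sentences`, `split_fragments`, `fake_synthesize`, and `stream` are hypothetical names, and the placeholder synthesizer simply assumes that work grows with the length of the chunk it receives.

```python
import re
from typing import Iterator, List


def split_sentences(text: str) -> List[str]:
    # Naive sentence split: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_fragments(text: str, max_words: int = 4) -> List[str]:
    # Sub-sentence split: break each sentence at , ; : and then cap
    # every piece at max_words words.
    fragments: List[str] = []
    for sentence in split_sentences(text):
        for clause in re.split(r"(?<=[,;:])\s+", sentence):
            words = clause.split()
            for i in range(0, len(words), max_words):
                fragments.append(" ".join(words[i:i + max_words]))
    return fragments


def fake_synthesize(chunk: str) -> bytes:
    # Hypothetical stand-in for a TTS call (assumption: cost scales with
    # chunk length). Returns one dummy byte per input character.
    return b"\x00" * len(chunk)


def stream(chunks: List[str]) -> Iterator[bytes]:
    # Yield audio chunk by chunk; the first yield can only happen after
    # the whole first chunk has been synthesized.
    for chunk in chunks:
        yield fake_synthesize(chunk)


text = (
    "This is a fairly long first sentence, with several clauses, "
    "that delays the first audio. Short one."
)
sentences = split_sentences(text)
fragments = split_fragments(text)

# Size of the first unit that must be synthesized before any audio plays.
print(len(sentences[0]), len(fragments[0]))
```

With sentence-level chunking, the first yield has to wait for the entire first sentence, while fragment-level chunking only waits for the first few words. The likely trade-off is that a model trained on whole sentences may produce less natural prosody on short fragments, which is part of why I am asking how this repo handles it.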
General question: Is there an open source TTS system that has both of the following?
- Generates speech word by word, or at some sub-sentence level, so that latency is lower
- Has a script for fine-tuning the pre-trained model with our custom data