Hi! First off, thanks for creating KittenTTS - it's been great for adding audio feedback to my projects.
I've noticed that audio generation sometimes cuts off abruptly at the end of messages, making the speech sound unnatural or incomplete. I've been experimenting with adding padding to the text to get smoother endings.
What I've tried
Currently, I'm appending "....." (five dots) to the end of messages. This seems to help with the audio rendering and gives a more natural trailing-off, but artifacts still appear occasionally. I'm also using the same pattern within messages for better speech cadence.
Questions
- Is there a recommended approach for padding text to ensure clean audio endings?
- Are there specific characters or patterns that work better than others for this purpose?
Example
// Current approach
speak({ text: "Task completed successfully....." })
// vs without padding
speak({ text: "Task completed successfully" }) // Sometimes cuts off abruptly
The padded version seems to give the TTS engine something to "decay" into, but I'm wondering if there's a more elegant solution or if I'm approaching this the wrong way.
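For reference, here is a minimal sketch of the kind of wrapper I mean. `padForSpeech` is a hypothetical helper name of my own, not part of KittenTTS, and the five-dot default is just the value I happened to settle on, not a recommended setting.

```javascript
// Hypothetical helper (not a KittenTTS API): normalizes the end of a message
// and appends padding before the text is handed to the TTS call.
function padForSpeech(text, padding = ".....") {
  // Drop trailing whitespace and dots so repeated calls do not stack padding
  // ("Done." and "Done....." both reduce to "Done").
  const trimmed = text.replace(/[\s.]+$/, "");
  // Nothing left to speak: return an empty string rather than bare padding.
  if (trimmed.length === 0) return "";
  return trimmed + padding;
}
```

Usage would look like `speak({ text: padForSpeech("Task completed successfully") })`, where `speak` is the same call as in the example above. Stripping existing trailing dots first keeps the padding length constant regardless of how the message was originally punctuated.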
Any guidance would be appreciated! Happy to contribute to docs if there's a standardized approach you'd recommend.