Question: Best practices for text padding to improve audio generation endings #12

@dhofheinz

Hi! First off, thanks for creating KittenTTS - it's been great for adding audio feedback to my projects.

I've noticed that audio generation sometimes cuts off abruptly at the end of messages, making the speech sound unnatural or incomplete. I've been experimenting with adding padding to the text to get smoother endings.

What I've tried

Currently, I'm appending "....." (five dots) to the end of messages. This seems to help with the audio rendering and gives a more natural trailing off, but it sometimes introduces artifacts. I'm also using this pattern within messages for better speech cadence.

Questions

  1. Is there a recommended approach for padding text to ensure clean audio endings?
  2. Are there specific characters or patterns that work better than others for this purpose?

Example

// Current approach
speak({ text: "Task completed successfully....." })

// vs without padding
speak({ text: "Task completed successfully" }) // Sometimes cuts off abruptly

The padded version seems to give the TTS engine something to "decay" into, but I'm wondering if there's a more elegant solution or if I'm approaching this the wrong way.
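
To keep the padding consistent while experimenting, the workaround could be wrapped in a small helper. This is only a sketch of what I'm doing by hand: `padForTts` is a hypothetical name, not part of KittenTTS, and the five-dot default is just the pattern described above, not a recommended value.

```javascript
// Hypothetical helper (not a KittenTTS API): normalizes the end of a message
// and appends a fixed padding string before the text goes to the TTS engine.
function padForTts(text, pad = ".....") {
  // Drop trailing whitespace and trailing dots so the padding is never doubled.
  const trimmed = text.replace(/[\s.]+$/u, "");
  // Return empty input unchanged instead of speaking only the padding.
  if (trimmed.length === 0) return "";
  return trimmed + pad;
}

// Intended use with the speak() wrapper from the example above:
//   speak({ text: padForTts("Task completed successfully") })
```

Stripping existing trailing dots first means a message that already ends in "..." gets the same padding as one that ends without punctuation, which makes it easier to compare padding patterns against each other.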

Any guidance would be appreciated! Happy to contribute to docs if there's a standardized approach you'd recommend.
