ltts

Quick CLI for local text-to-speech with two backends: Qwen3-TTS (default) and Kokoro TTS.

Install

Recommended (fast, reproducible):

uv tool install ltts

Run without installing:

uvx ltts "hello world" --say

With pip:

pip install ltts

NVIDIA GPU (Optional)

For faster inference on NVIDIA GPUs:

pip install 'ltts[cuda]'

Usage

# Generate speech (saves to output.mp3 by default)
ltts "Hello, world!"

# Play through speakers
ltts "Hello, world!" --say

# Save to specific file
ltts "Hello, world!" -o speech.wav

# Read from stdin
echo "Hello from pipe" | ltts --say
cat article.txt | ltts -o article.mp3
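For scripted use, the same invocations can be driven from Python. Below is a minimal sketch of a helper that assembles the argv list for the flags shown above; the helper name `build_ltts_args` is hypothetical, not part of the ltts package.

```python
import shlex

def build_ltts_args(text, output=None, say=False):
    """Assemble an argv list for the ltts CLI.

    Hypothetical helper covering only the flags shown above;
    not part of the ltts package itself.
    """
    args = ["ltts", text]
    if output is not None:
        args += ["-o", output]
    if say:
        args.append("--say")
    return args

# Equivalent of: ltts "Hello, world!" -o speech.wav --say
print(shlex.join(build_ltts_args("Hello, world!", output="speech.wav", say=True)))
```

Passing the list to `subprocess.run` avoids shell quoting issues with arbitrary input text.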

Backends

Qwen3-TTS (default)

Higher-quality output, with voice cloning and emotional control. Supports 10 languages.

# Preset voices
ltts "Hello, world!" -v Ryan --say       # English male (default)
ltts "Hello, world!" -v Aiden --say      # English male
ltts "你好世界" -v Vivian --say           # Chinese female
ltts "こんにちは" -v Ono_Anna --say       # Japanese female
ltts "안녕하세요" -v Sohee --say           # Korean female

# Voice cloning (3+ seconds of reference audio)
ltts "Hello in your voice" --ref-audio voice.wav --say
ltts "Hello" --ref-audio voice.wav --ref-text "transcript" --say

# Emotional control
ltts "I can't believe we won!" --instruct "speak with excitement" --say

# Smaller model for faster inference
ltts "Hello world" --model-size 0.6B --say

Preset voices: Ryan, Aiden (English), Vivian, Serena, Dylan, Eric, Uncle_Fu (Chinese), Ono_Anna (Japanese), Sohee (Korean)

Languages: en, zh, ja, ko, de, fr, es, pt, it, ru
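The preset-voice and language lists above can be captured in code, for example to validate a voice choice before shelling out. A sketch, with the mappings taken from the lists above (the helper itself is hypothetical):

```python
# Preset voices and their languages, per the list above.
QWEN_PRESETS = {
    "Ryan": "en", "Aiden": "en",
    "Vivian": "zh", "Serena": "zh", "Dylan": "zh", "Eric": "zh", "Uncle_Fu": "zh",
    "Ono_Anna": "ja",
    "Sohee": "ko",
}

# Supported language codes, per the list above.
SUPPORTED_LANGS = {"en", "zh", "ja", "ko", "de", "fr", "es", "pt", "it", "ru"}

def voice_lang(voice):
    """Return the language code for a preset voice; raise for unknown names."""
    try:
        return QWEN_PRESETS[voice]
    except KeyError:
        raise ValueError(f"unknown preset voice: {voice!r}") from None

print(voice_lang("Sohee"))  # prints "ko"
```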

Kokoro TTS

Lightweight with 50+ voices. Supports streaming for faster time-to-first-audio.

# Use Kokoro backend
ltts "Hello world" -b kokoro -v af_heart --say
ltts "こんにちは" -b kokoro -v jf_alpha --say

# Stream chunks as generated (lower latency)
ltts "Hello world" -b kokoro --say --chunk

Voices: af_heart, af_alloy, af_bella, am_adam, am_michael (American), bf_alice, bf_emma, bm_daniel (British), jf_alpha, jm_kumo (Japanese), zf_xiaobei, zm_yunxi (Chinese), ef_dora, em_alex (Spanish), ff_siwis (French), and more.

Full voice list: https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
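As the examples above suggest, Kokoro voice ids follow a naming convention: the first letter encodes the language or accent, the second the gender, followed by an underscore and a name. A sketch that decodes ids on that assumption (see the full voice list for the authoritative mapping):

```python
# Prefix letters as the examples above suggest; not an official API.
LANG = {"a": "American English", "b": "British English", "j": "Japanese",
        "z": "Chinese", "e": "Spanish", "f": "French"}
GENDER = {"f": "female", "m": "male"}

def describe_voice(voice):
    """Decode a Kokoro voice id like 'af_heart' into a readable description."""
    prefix, _, name = voice.partition("_")
    return f"{name}: {GENDER[prefix[1]]} {LANG[prefix[0]]} voice"

print(describe_voice("af_heart"))  # prints "heart: female American English voice"
```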

Options

# Device selection
ltts "Hello" -d cpu --say    # CPU (default)
ltts "Hello" -d cuda --say   # NVIDIA GPU
ltts "Hello" -d mps --say    # Apple Silicon

# Output formats
ltts "test" -o out.mp3       # MP3 (default)
ltts "test" -o out.wav       # WAV
ltts "test" -o out.ogg       # OGG
ltts "test" -o out.flac      # FLAC

# Language override
ltts "Bonjour" -l fr --say

Notes

  • First run downloads models to ~/.cache/huggingface/ (~3GB for Qwen 1.7B, ~330MB for Kokoro)
  • Audio playback (--say) runs at 24 kHz
  • On Linux, ensure PulseAudio/PipeWire is running for audio playback
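If disk space is a concern, the first-run download sizes above can be checked against the Hugging Face cache. A sketch, assuming the default cache location noted above:

```python
from pathlib import Path

def hf_cache_size_bytes(cache=Path.home() / ".cache" / "huggingface"):
    """Total size in bytes of the Hugging Face cache directory, 0 if absent."""
    if not cache.exists():
        return 0
    return sum(p.stat().st_size for p in cache.rglob("*") if p.is_file())

print(f"{hf_cache_size_bytes() / 1e9:.1f} GB")
```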

Development

uv sync
uv run ltts "hello world" --say
uv run ltts "hello world" -b kokoro -v af_heart --say
./scripts/release.sh
