vibevoice-cpu

Offline VibeVoice text-to-speech that turns text into audio files. It is built to run natively on CPU and uses a GPU automatically when one is present.

This is a file generator, not a live read-aloud voice. Point it at some text and a reference voice, and it writes a WAV. It downloads the model and the reference voices on first use, and warns you when RAM is tight so a long run on a small machine isn't a surprise.

Why

Microsoft's VibeVoice is a great long-form, expressive TTS model — but the published code is GPU-oriented, needs a manual model download, and has no clean "just give me a WAV" entry point. This package wraps it into something small and CPU-first: automatic device selection, on-demand model + voice downloads, a one-call Python API, and a CLI.

The package grew out of Quill, a screen-reader-first editor and Community Access project, which needed to export a document to a spoken audio file without requiring a GPU.

Install

pip install "vibevoice-cpu[model]"     # CLI/helpers + the inference stack (torch, etc.)
pip install "vibevoice-cpu[model,ram]" # also better RAM detection (psutil)

The first run downloads the model (several GB for 1.5B) and the reference voices into a cache ($VIBEVOICE_CPU_HOME, else ~/.cache/vibevoice-cpu).
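The cache lookup described above can be sketched in a few lines. This is an illustration only, not the package's actual code: `resolve_cache_dir` is a hypothetical name, and treating an empty `VIBEVOICE_CPU_HOME` as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: choose the cache directory as the README describes.

    Uses $VIBEVOICE_CPU_HOME when it is set, else ~/.cache/vibevoice-cpu.
    """
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as "not set".
    override = env.get("VIBEVOICE_CPU_HOME", "").strip()
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "vibevoice-cpu"


if __name__ == "__main__":
    print(resolve_cache_dir())
```

Setting `VIBEVOICE_CPU_HOME` before the first run is therefore the way to keep the multi-gigabyte download off a small home partition.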

Use it (Python)

from vibevoice_cpu import synthesize, list_voices

print(list_voices(download=True))          # ['Alice', 'Carter', 'Frank', ...]
synthesize("Hello there, this is VibeVoice.", "out.wav", voice="Alice")

Use it (CLI)

vibevoice-cpu download                     # pre-fetch model + voices
vibevoice-cpu voices                       # list reference voices
vibevoice-cpu synth -o out.wav -v Alice "Hello there."
echo "A whole paragraph…" | vibevoice-cpu synth -o out.wav

Devices & speed

CUDA (NVIDIA) → used automatically, bfloat16.
MPS (Apple Silicon) → used automatically, float32.
CPU → the fallback, float32. It works — but VibeVoice is a large model, so on CPU expect minutes per passage, not real time. With less than ~8 GB free it will be very slow and may swap; the library prints a warning before it starts.

This is why it's a file tool: generate in the background, save the WAV, play it when it's ready.
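The device order listed above can be sketched as a small pure function. The names `DeviceChoice`, `choose_device`, and `detect_device` are hypothetical and not part of the package. Checking CUDA before MPS is an assumption; the README states only which dtype goes with which device.

```python
from typing import NamedTuple


class DeviceChoice(NamedTuple):
    device: str  # "cuda", "mps", or "cpu"
    dtype: str   # dtype name paired with that device in the README


def choose_device(cuda_available: bool, mps_available: bool) -> DeviceChoice:
    """Hypothetical sketch of the selection rule: GPU if present, else CPU."""
    if cuda_available:
        return DeviceChoice("cuda", "bfloat16")
    if mps_available:
        return DeviceChoice("mps", "float32")
    return DeviceChoice("cpu", "float32")


def detect_device() -> DeviceChoice:
    """Probe the local machine with PyTorch; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return choose_device(False, False)
    return choose_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Keeping the rule separate from the hardware probe makes it easy to check without a GPU, or without torch installed at all.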

API

synthesize(text, output_path, *, voice="Alice", model="1.5B", on_log=...) -> Path
VibeVoiceEngine(model="1.5B", *, on_log=..., cfg_scale=1.3, cpu_threads=None) — .load(), .synthesize(text, output_path, voice=...), .device.
list_voices(download=False), available_ram_gb(), ram_warning(model="1.5B").
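Both `synthesize` and `VibeVoiceEngine.synthesize` take `(text, output_path, voice=...)`, so a batch export can accept either one as a callable. The sketch below is a hypothetical helper, not part of the package: `export_passages` and the `passage_NNN.wav` naming are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Union


def export_passages(
    passages: Iterable[str],
    out_dir: Union[str, Path],
    synth: Callable[..., object],
    voice: str = "Alice",
) -> List[Path]:
    """Hypothetical helper: write one WAV per non-empty passage.

    `synth` is any callable with the documented shape
    synth(text, output_path, voice=...).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Skip blank passages so the output files are numbered without gaps.
    texts = [p.strip() for p in passages if p.strip()]
    written: List[Path] = []
    for index, text in enumerate(texts, start=1):
        target = out_dir / f"passage_{index:03d}.wav"
        synth(text, target, voice=voice)
        written.append(target)
    return written
```

For several passages, passing the `synthesize` method of one `VibeVoiceEngine` on which `.load()` has already been called would presumably avoid reloading the model for each file. That is an inference from the API shape above, not something the README confirms.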

Models

"1.5B" (default) and "7B" map to the community Hugging Face mirrors; you can also pass any repo id or local path.

License

MIT — made by Taylor Arndt, a Community Access project. Contributions welcome.
