The open-source voice synthesis studio.
Clone voices. Generate speech. Build voice-powered apps.
All running locally on your machine.
voicebox.sh • Download • Features • API • Roadmap
Click the image above to watch the demo video on voicebox.sh
Voice AI is exploding, but most tools are either cloud-locked, expensive, or a nightmare to set up. Voicebox is different:
- 100% Local — Your voice data never leaves your machine
- Lightweight — No bloated Electron, native Tauri performance
- Fast — Near-instant on CUDA, optimized for Apple Silicon
- Flexible — Use the app, integrate the API, or both
- Open Source — No subscriptions, no limits, no lock-in
Built with Tauri (Rust), TypeScript, React, and Python. Native performance meets modern DX.
Voicebox is available now for macOS and Windows.
| Platform | Download |
|---|---|
| macOS (Apple Silicon) | voicebox_aarch64.app.tar.gz |
| macOS (Intel) | voicebox_x64.app.tar.gz |
| Windows (MSI) | voicebox_0.1.0_x64_en-US.msi |
| Windows (Setup) | voicebox_0.1.0_x64-setup.exe |
Linux builds coming soon — Currently blocked by GitHub runner disk space limitations.
Powered by Alibaba's Qwen3-TTS — a breakthrough model that achieves near-perfect voice cloning from just a few seconds of audio.
- Instant cloning — Upload a sample, get a voice profile
- High fidelity — Natural prosody, emotion, and cadence
- Multi-language — English, Chinese, and more coming
- Create profiles from audio files or record directly in-app
- Import/Export profiles to share or backup
- Organize with descriptions and language tags
- Text-to-speech with any cloned voice
- Batch generation for long-form content
- Smart caching — regenerate instantly with voice prompt caching
- In-app recording with waveform visualization
- Automatic transcription powered by Whisper
- Export recordings in multiple formats
- Full history of all generated audio
- Search & filter by voice, text, or date
- Re-generate any past generation with one click
- Local mode — Everything runs on your machine
- Remote mode — Connect to a GPU server on your network
- One-click server — Turn any machine into a Voicebox server
Voicebox exposes a full REST API, so you can integrate voice synthesis into your own apps.
# Generate speech
curl -X POST http://localhost:8000/api/generate \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "profile_id": "abc123"}'
# List voice profiles
curl http://localhost:8000/api/profiles
# Create a profile from audio
curl -X POST http://localhost:8000/api/profiles \
-F "audio=@voice-sample.wav" \
-F "name=My Voice"Use cases:
- Game dialogue systems
- Podcast/video production pipelines
- Accessibility tools
- Voice assistants
- Content creation automation
Full API documentation available at http://localhost:8000/docs when running.
| Layer | Technology |
|---|---|
| Desktop App | Tauri (Rust) |
| Frontend | React, TypeScript, Tailwind CSS |
| State | Zustand, React Query |
| Backend | FastAPI (Python) |
| Voice Model | Qwen3-TTS |
| Transcription | Whisper |
| Database | SQLite |
| Audio | WaveSurfer.js, librosa |
Why this stack?
- Tauri over Electron — 10x smaller bundle, native performance, lower memory
- FastAPI — Async Python with automatic OpenAPI schema generation
- Type-safe end-to-end — Generated TypeScript client from OpenAPI spec
Voicebox is the beginning of something bigger. Here's what's coming:
| Feature | Description |
|---|---|
| Real-time Synthesis | Stream audio as it generates, word by word |
| Conversation Mode | Multi-speaker dialogues with automatic turn-taking |
| Voice Effects | Pitch shift, reverb, M3GAN-style effects |
| Timeline Editor | Audio studio with word-level precision editing |
| More Models | XTTS, Bark, and other open-source voice models |
- Voice Design — Create new voices from text descriptions
- Project System — Save and load complex multi-voice sessions
- Plugin Architecture — Extend with custom models and effects
- Mobile Companion — Control Voicebox from your phone
Voicebox aims to be the one-stop shop for everything voice — cloning, synthesis, editing, effects, and beyond.
See CONTRIBUTING.md for detailed setup and contribution guidelines.
# Clone the repo
git clone https://github.com/voicebox-sh/voicebox.git
cd voicebox
# Install dependencies
bun install
# Install Python dependencies
cd backend && pip install -r requirements.txt && cd ..
# Start development
bun run devPrerequisites: Bun, Rust, Python 3.11+. CUDA-capable GPU recommended (CPU inference supported but slower).
voicebox/
├── app/ # Shared React frontend
├── tauri/ # Desktop app (Tauri + Rust)
├── web/ # Web deployment
├── backend/ # Python FastAPI server
├── landing/ # Marketing website
└── scripts/ # Build & release scripts
Contributions welcome! See CONTRIBUTING.md for guidelines.
- Fork the repo
- Create a feature branch
- Make your changes
- Submit a PR
Found a security vulnerability? Please report it responsibly. See SECURITY.md for details.
MIT License — see LICENSE for details.