Voicebox

The open-source voice synthesis studio.
Clone voices. Generate speech. Build voice-powered apps.
All running locally on your machine.

voicebox.sh • Download • Features • API • Roadmap

Watch the demo video on voicebox.sh

Why Voicebox?

Voice AI is exploding, but most tools are cloud-locked, expensive, or a nightmare to set up. Voicebox is different:

100% Local — Your voice data never leaves your machine
Lightweight — No bloated Electron, native Tauri performance
Fast — Near-instant on CUDA, optimized for Apple Silicon
Flexible — Use the app, integrate the API, or both
Open Source — No subscriptions, no limits, no lock-in

Built with Tauri (Rust), TypeScript, React, and Python. Native performance meets modern DX.

Download

Voicebox is available now for macOS and Windows.

Platform	Download
macOS (Apple Silicon)	voicebox_aarch64.app.tar.gz
macOS (Intel)	voicebox_x64.app.tar.gz
Windows (MSI)	voicebox_0.1.0_x64_en-US.msi
Windows (Setup)	voicebox_0.1.0_x64-setup.exe

Linux builds are coming soon. They are currently blocked by GitHub runner disk space limitations.

Features

Voice Cloning with Qwen3-TTS

Powered by Alibaba's Qwen3-TTS — a breakthrough model that achieves near-perfect voice cloning from just a few seconds of audio.

Instant cloning — Upload a sample, get a voice profile
High fidelity — Natural prosody, emotion, and cadence
Multi-language — English, Chinese, and more coming

Voice Profile Management

Create profiles from audio files or record directly in-app
Import/Export profiles to share or back up
Organize with descriptions and language tags

Speech Generation

Text-to-speech with any cloned voice
Batch generation for long-form content
Smart caching — regenerate instantly with voice prompt caching
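
To give a feel for batch generation on long-form content, here is a minimal sketch of one way to split a long script into sentence-aligned chunks before sending each chunk to the generator. This is not Voicebox's actual batching logic: the function name `split_into_chunks` and the `max_chars` limit are hypothetical and chosen only for illustration.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Group sentences into chunks of roughly max_chars characters.

    Hypothetical helper: a single sentence longer than max_chars is kept
    whole rather than cut mid-sentence, so a chunk can exceed the limit.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Either the sentence still fits, or the chunk is empty and the
            # sentence must be accepted even if it is oversized.
            current = candidate
        else:
            # Close the current chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be submitted as a separate generation request, which keeps individual requests short and lets a failed chunk be retried on its own.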

Recording & Transcription

In-app recording with waveform visualization
Automatic transcription powered by Whisper
Export recordings in multiple formats

Generation History

Full history of all generated audio
Search & filter by voice, text, or date
Re-generate any past generation with one click

Flexible Deployment

Local mode — Everything runs on your machine
Remote mode — Connect to a GPU server on your network
One-click server — Turn any machine into a Voicebox server

API

Voicebox exposes a full REST API, so you can integrate voice synthesis into your own apps.

# Generate speech
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123"}'

# List voice profiles
curl http://localhost:8000/api/profiles

# Create a profile from audio
curl -X POST http://localhost:8000/api/profiles \
  -F "audio=@voice-sample.wav" \
  -F "name=My Voice"

Use cases:

Game dialogue systems
Podcast/video production pipelines
Accessibility tools
Voice assistants
Content creation automation

Full API documentation is available at http://localhost:8000/docs while the server is running.
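
The generate call shown with curl above can also be made from Python. The sketch below uses only the standard library and mirrors that request: the `/api/generate` path and the `text` and `profile_id` fields come from the curl example, while the helper names, the timeout, and the decision to return raw bytes are assumptions. The response format is not described here, so check the `/docs` page before parsing it.

```python
import json
import urllib.request


def build_generate_request(
    text: str,
    profile_id: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/generate.

    Hypothetical helper. base_url can point at a remote Voicebox server
    instead of the local one.
    """
    payload = json.dumps({"text": text, "profile_id": profile_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw response body.

    The body is returned unparsed because its format (JSON, audio bytes,
    or something else) is an unknown here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a server running, `send(build_generate_request("Hello world", "abc123"))` performs the same call as the first curl example. For remote mode, pass the server's address as `base_url`, for example `base_url="http://192.168.1.50:8000"` (a made-up address).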

Tech Stack

Layer	Technology
Desktop App	Tauri (Rust)
Frontend	React, TypeScript, Tailwind CSS
State	Zustand, React Query
Backend	FastAPI (Python)
Voice Model	Qwen3-TTS
Transcription	Whisper
Database	SQLite
Audio	WaveSurfer.js, librosa

Why this stack?

Tauri over Electron — 10x smaller bundle, native performance, lower memory
FastAPI — Async Python with automatic OpenAPI schema generation
Type-safe end-to-end — Generated TypeScript client from OpenAPI spec
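
Because FastAPI generates the OpenAPI schema automatically, the same document that feeds the TypeScript client can be inspected from any language. The sketch below lists the operations a schema declares. It assumes the backend keeps FastAPI's default schema location, `/openapi.json`, which is not confirmed here; `list_operations` and `fetch_schema` are hypothetical helper names.

```python
import json
import urllib.request

# HTTP methods that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema from a running server.

    Assumes FastAPI's default /openapi.json path has not been changed.
    """
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/openapi.json") as response:
        return json.load(response)
```

With the server running, `list_operations(fetch_schema())` gives a quick overview of the available endpoints without opening the interactive docs.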

Roadmap

Voicebox is the beginning of something bigger. Here's what's coming:

Coming Soon

Feature	Description
Real-time Synthesis	Stream audio as it generates, word by word
Conversation Mode	Multi-speaker dialogues with automatic turn-taking
Voice Effects	Pitch shift, reverb, M3GAN-style effects
Timeline Editor	Audio studio with word-level precision editing
More Models	XTTS, Bark, and other open-source voice models

Future Vision

Voice Design — Create new voices from text descriptions
Project System — Save and load complex multi-voice sessions
Plugin Architecture — Extend with custom models and effects
Mobile Companion — Control Voicebox from your phone

Voicebox aims to be the one-stop shop for everything voice — cloning, synthesis, editing, effects, and beyond.

Development

See CONTRIBUTING.md for detailed setup and contribution guidelines.

Quick Start

# Clone the repo
git clone https://github.com/voicebox-sh/voicebox.git
cd voicebox

# Install dependencies
bun install

# Install Python dependencies
cd backend && pip install -r requirements.txt && cd ..

# Start development
bun run dev

Prerequisites: Bun, Rust, Python 3.11+. CUDA-capable GPU recommended (CPU inference supported but slower).

Project Structure

voicebox/
├── app/              # Shared React frontend
├── tauri/            # Desktop app (Tauri + Rust)
├── web/              # Web deployment
├── backend/          # Python FastAPI server
├── landing/          # Marketing website
└── scripts/          # Build & release scripts

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Fork the repo
Create a feature branch
Make your changes
Submit a PR

Security

Found a security vulnerability? Please report it responsibly. See SECURITY.md for details.

License

MIT License — see LICENSE for details.

voicebox.sh
