voxcpm-mac-service

Runs OpenBMB VoxCPM as a service on macOS, on the Apple GPU (MPS), without Pinokio in the picture.

Fourth in the family, after kokoro-tts-mac-service, xtts-mac-service, and f5-tts-mac-service. Same shape, different model.

Why this exists

VoxCPM is the heaviest of the four engines but currently the strongest at preserving the style of the reference voice (rhythm, breath, micropauses). Useful when you want the cloned voice to actually sound like the person speaking, not just share their timbre.

The catch on Mac:

The upstream app.py only knows about CUDA. We patch it for MPS detection.
The upstream app.py always passes optimize=True, which inside the lib raises ValueError("VoxCPMModel can only be optimized on CUDA device") on MPS or CPU. We patch it to disable optimize when not on CUDA.
The upstream app.py binds to 0.0.0.0 (every network interface) and has no --host flag to change that. We add one.

voxcpm itself (the Python package) already detects MPS internally, so the patch is small and only touches the demo app, not the model lib.

What you get

VoxCPM at http://127.0.0.1:7863, 24/7.
Inference on the Apple GPU (with CPU fallback for ops without an MPS kernel).
Two model choices via VOXCPM_MODEL_ID:
- openbmb/VoxCPM-0.5B (default, ~3 GB, low VRAM, recommended for MPS)
- openbmb/VoxCPM2 (full 2B, ~5 GB, slower on MPS, higher quality)
Co-exists with Kokoro on 7860, XTTS on 7861, F5 on 7862.
Auto-restart if the process crashes.
48 kHz output (vs 24 kHz from XTTS and F5).
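
To see which of the four sibling services are currently answering, you can probe their ports. This is a minimal sketch, not something shipped in this repo: the names `SERVICES` and `port_open` are made up for illustration, and the check only confirms that something accepts TCP connections on each port, not that it is the expected engine.

```python
import socket

# Ports as listed above; the mapping itself is illustrative, not part of the repo.
SERVICES = {"kokoro": 7860, "xtts": 7861, "f5": 7862, "voxcpm": 7863}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as "down".
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_open("127.0.0.1", port) else "down"
        print(f"{name:8s} :{port}  {state}")
```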

Before you start

macOS (tested on Sequoia, Apple Silicon).
Python 3.12 on PATH. If you don't have it: brew install python@3.12.
~5 GB of disk for the 0.5B model + deps; ~7 GB for the 2B model.
Some patience: VoxCPM is heavier than the other three engines. First call after cold boot takes a while because of MPS JIT warmup, and even warm calls are slower than F5 or XTTS.

Install

git clone https://github.com/linuxelitebr/voxcpm-mac-service.git
cd voxcpm-mac-service
./scripts/install.sh

To use the full 2B model instead of the default 0.5B:

VOXCPM_MODEL_ID=openbmb/VoxCPM2 ./scripts/install.sh

The installer will:

Create ./env with Python 3.12 if it doesn't exist.
pip install voxcpm from PyPI (and a pile of friends such as funasr and modelscope).
Apply the MPS + host patch to app.py (with backup).
Render com.voxcpm.tts.plist with your real paths, drop it in ~/Library/LaunchAgents/.
Load the LaunchAgent and wait for Gradio to answer on :7863.

First boot downloads the model, the SenseVoice ASR, and the denoiser. Plan for 5 to 15 minutes on a decent connection.

Use it

From the browser

Open http://127.0.0.1:7863. Upload a 5 to 30 second reference WAV, optionally write a control instruction in natural language ("warm voice, calm pacing"), type the target text, generate.

From Python

from gradio_client import Client, handle_file

c = Client("http://127.0.0.1:7863")
audio_path = c.predict(
    "Hello in my own voice, generated on the Apple GPU.",  # text
    "",                                                     # control instruction
    handle_file("path/to/your/voice-sample.wav"),           # reference_wav
    True,                                                   # show_prompt_text
    "transcript of the reference audio",                    # prompt_text
    2.0,                                                    # cfg_value (1.0 to 3.0)
    True,                                                   # DoNormalizeText
    False,                                                  # DoDenoisePromptAudio
    10,                                                     # dit_steps (1 to 50)
    api_name="/generate",
)
print(audio_path)
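
To sanity-check the file that comes back, you can read its header with the standard library. This is a sketch under one assumption: that the returned path points to a PCM WAV file that Python's `wave` module can open. The helper name `wav_info` is hypothetical.

```python
import wave


def wav_info(path: str) -> tuple[int, float]:
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration
```

Calling `wav_info(audio_path)` after the `predict` call above gives the sample rate and length of the generated clip, which is a quick way to confirm the output rate on your own machine.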

Tuning knobs

cfg_value (1.0 to 3.0, default 2.0): higher follows the reference more strictly; lower lets the model improvise.
dit_steps (1 to 50, default 10): diffusion steps. More = cleaner output, slower. Sweet spot 10 to 20.
DoNormalizeText: recommended True for technical writing with punctuation.
DoDenoisePromptAudio: only turn on if your reference WAV has noise.
control_instruction: free text in any of the 30 supported languages. Examples: "warm young woman, calm pacing", "voz masculina madura, leitura técnica pausada". Empty = take cues from reference.
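
Because `/generate` takes nine positional arguments, it is easy to put a value in the wrong slot. The helper below is a hypothetical convenience wrapper, not part of this repo: it checks the two numeric knobs against the ranges listed above and returns the arguments in the same order as the Python example. The parameter names are illustrative.

```python
def build_generate_args(
    text,
    reference_wav,
    prompt_text="",
    control_instruction="",
    show_prompt_text=True,
    cfg_value=2.0,
    dit_steps=10,
    normalize_text=True,
    denoise_prompt=False,
):
    """Validate the tuning knobs and return positional args for /generate."""
    if not 1.0 <= cfg_value <= 3.0:
        raise ValueError(f"cfg_value must be in [1.0, 3.0], got {cfg_value}")
    if not 1 <= dit_steps <= 50:
        raise ValueError(f"dit_steps must be in [1, 50], got {dit_steps}")
    # Order mirrors the positional call shown in the "From Python" example.
    return (
        text,
        control_instruction,
        reference_wav,
        show_prompt_text,
        prompt_text,
        float(cfg_value),
        normalize_text,
        denoise_prompt,
        int(dit_steps),
    )
```

With a `gradio_client` client `c`, usage would look like `c.predict(*build_generate_args("Hello", handle_file("voice.wav"), "transcript", dit_steps=20), api_name="/generate")`.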

Day to day

Thing	Command	What it does
Start	`./scripts/start.sh`	Idempotent. Returns 0 instantly if already up.
Stop	`./scripts/stop.sh`	Unloads the agent for this session.
Status	`./scripts/status.sh`	Plist + LaunchAgent + HTTP, all in one screen.
Tail stdout	`tail -f logs/voxcpm.out.log`
Tail stderr	`tail -f logs/voxcpm.err.log`

start.sh defaults to WAIT_TIMEOUT=300 (5 min) because cold boot on MPS can be slow.
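
start.sh is a shell script and its internals are not reproduced here. If a Python client should block until the service answers, the same wait-until-ready idea can be sketched as below. The function name `wait_for_http` and the 2-second poll interval are assumptions; only the 300-second default comes from the text above.

```python
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll url until it returns a successful response or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen raises on connection errors and on HTTP 4xx/5xx.
            with urllib.request.urlopen(url, timeout=5):
                return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: not ready yet.
            time.sleep(interval)
    return False
```

For example, `wait_for_http("http://127.0.0.1:7863")` returns True once Gradio is up and False if five minutes pass without a successful response.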

Tests

VOXCPM_REF_AUDIO=/path/to/your-voice.wav ./scripts/test.sh

Four checks: launchd state, MPS in logs, HTTP, and an E2E voice cloning call.

How it works

The patch

Three changes to app.py (full diff in patches/01-mps-and-host.patch):

1. Detect MPS

if torch.cuda.is_available():
    self.device = "cuda"
elif torch.backends.mps.is_available():
    self.device = "mps"
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
else:
    self.device = "cpu"

2. Don't pass optimize=True outside CUDA

self.voxcpm_model = voxcpm.VoxCPM.from_pretrained(
    self._model_id,
    optimize=(self.device == "cuda"),
)

3. Add --host flag so we can pin to 127.0.0.1 from the LaunchAgent.

The actual voxcpm lib already supports MPS internally, so we only patch the demo app.
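
The third change has no snippet above, so here is a rough idea of what a --host flag looks like with argparse. This is an illustrative sketch, not the contents of patches/01-mps-and-host.patch: the function name `parse_args` and the 0.0.0.0 default (chosen to mirror the upstream behavior) are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI flags; --host decides which interface the server binds to."""
    parser = argparse.ArgumentParser(description="VoxCPM demo (sketch)")
    parser.add_argument(
        "--host",
        default="0.0.0.0",  # assumed default, matching upstream's bind address
        help="interface to bind, e.g. 127.0.0.1 for local-only access",
    )
    return parser.parse_args(argv)


args = parse_args(["--host", "127.0.0.1"])
print(f"would bind to {args.host}")
# In a Gradio app the value would typically be passed along as:
#   demo.launch(server_name=args.host, server_port=7863)
```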

The LaunchAgent

~/Library/LaunchAgents/com.voxcpm.tts.plist runs as your user and starts at login. Same shape as the sister projects: RunAtLoad=true, KeepAlive only on crash, ThrottleInterval=10, PYTHONUNBUFFERED=1, TOKENIZERS_PARALLELISM=false.

The model id is rendered into the plist at install time, so to switch models you uninstall and reinstall with a different VOXCPM_MODEL_ID.
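
To make the settings above concrete, here is a sketch that builds an equivalent property list with Python's `plistlib`. It is not the installer's actual template. Several details are assumptions: the `env/bin/python` interpreter path, expressing "KeepAlive only on crash" as the launchd `Crashed` key, and passing the model id through a `VOXCPM_MODEL_ID` environment variable. The function name `render_plist` is hypothetical.

```python
import plistlib
from pathlib import Path


def render_plist(repo_dir: str, model_id: str = "openbmb/VoxCPM-0.5B") -> bytes:
    """Return XML plist bytes for a LaunchAgent resembling the one described."""
    repo = Path(repo_dir)
    agent = {
        "Label": "com.voxcpm.tts",
        "ProgramArguments": [
            str(repo / "env" / "bin" / "python"),  # assumed venv layout
            str(repo / "app.py"),
            "--host",
            "127.0.0.1",
        ],
        "WorkingDirectory": str(repo),
        "RunAtLoad": True,
        "KeepAlive": {"Crashed": True},  # assumed spelling of "only on crash"
        "ThrottleInterval": 10,
        "EnvironmentVariables": {
            "PYTHONUNBUFFERED": "1",
            "TOKENIZERS_PARALLELISM": "false",
            "VOXCPM_MODEL_ID": model_id,  # assumed mechanism for the model id
        },
        "StandardOutPath": str(repo / "logs" / "voxcpm.out.log"),
        "StandardErrorPath": str(repo / "logs" / "voxcpm.err.log"),
    }
    return plistlib.dumps(agent)
```

Since the model id is baked into the rendered file, this also shows why switching models means rendering and loading the plist again rather than changing a variable in your shell.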

Uninstall

./scripts/uninstall.sh           # just unloads the service
./scripts/uninstall.sh --revert  # also reverts the patch in app.py
./scripts/uninstall.sh --purge   # also nukes ./env and the model cache

Heads up

Slow on MPS. VoxCPM is genuinely heavy, and several diffusion ops fall back to CPU. Even the 0.5B model is significantly slower than F5 or XTTS on the same Apple Silicon. If you need throughput, run it on a CUDA box and point your client at it.
First call is the slowest. Plan for several minutes on the first generation while MPS JIT warms up.
Model cache lives in ~/.cache/modelscope/hub/openbmb/ for the VoxCPM weights, and ~/.cache/huggingface/ for the funasr ASR.
Apache-2.0 license on the weights, free for commercial use.

Credits

Model and code: OpenBMB/VoxCPM
PyPI package: voxcpm (maintained by OpenBMB)

License

MIT for the wrapper code in this repo. VoxCPM weights and lib code are Apache-2.0. See LICENSE.
