Voice Mode now includes MCP tools to automatically install and configure whisper.cpp and kokoro-fastapi, making it easier to set up free, private, open-source voice services.
These tools handle:
- System detection (macOS/Linux)
- Dependency installation
- GPU support configuration
- Model downloads
- Service configuration
The `install_whisper_cpp` tool installs whisper.cpp for speech-to-text (STT) functionality. Its features include:
- Automatic OS detection (macOS/Linux)
- GPU acceleration (Metal on macOS, CUDA on Linux)
- Model download management
- Build optimization
- Service configuration (launchd on macOS, systemd on Linux)
- Environment variable support for model selection
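The OS and GPU detection described above can be sketched roughly as follows. This is a simplified illustration, not the tool's actual implementation; `detect_platform` is a hypothetical helper name.

```python
import platform
import shutil

def detect_platform() -> dict:
    """Hypothetical sketch of the OS/GPU detection described above."""
    system = platform.system()  # "Darwin" on macOS, "Linux" on Linux
    if system == "Darwin":
        # Metal is built into macOS, so GPU acceleration is assumed available
        gpu = "metal"
    elif system == "Linux" and shutil.which("nvidia-smi"):
        # Presence of nvidia-smi suggests NVIDIA drivers (and likely CUDA)
        gpu = "cuda"
    else:
        gpu = "cpu"
    return {"system": system, "gpu_acceleration": gpu}
```

The returned keys mirror the `performance_info` fields shown in the return value below.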
```python
# Basic installation with defaults
result = await install_whisper_cpp()

# Custom installation
result = await install_whisper_cpp(
    install_dir="~/my-whisper",
    model="large-v3",
    use_gpu=True,
    force_reinstall=False
)
```

Parameters:

- `install_dir` (str, optional): Installation directory (default: `~/.voicemode/whisper.cpp`)
- `model` (str, optional): Whisper model to download (default: `large-v2`)
  - Available models: `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large-v2`, `large-v3`
  - Note: `large-v2` is the default for best accuracy (requires ~3GB RAM)
- `use_gpu` (bool, optional): Enable GPU support (default: auto-detect)
- `force_reinstall` (bool, optional): Force reinstallation (default: false)
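For illustration, the model name maps onto the ggml filename used in the return value below (`ggml-large-v2.bin`). A hypothetical validation helper following that naming pattern:

```python
# Hypothetical helper: validate a model name and derive its ggml filename.
# The allowed set mirrors the "Available models" list above.
WHISPER_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v2", "large-v3",
}

def whisper_model_filename(model: str = "large-v2") -> str:
    if model not in WHISPER_MODELS:
        raise ValueError(f"Unknown whisper model: {model!r}")
    return f"ggml-{model}.bin"
```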
Returns:

```python
{
    "success": True,
    "install_path": "/Users/user/.voicemode/whisper.cpp",
    "model_path": "/Users/user/.voicemode/whisper.cpp/models/ggml-large-v2.bin",
    "gpu_enabled": True,
    "gpu_type": "metal",  # or "cuda" or "cpu"
    "performance_info": {
        "system": "Darwin",
        "gpu_acceleration": "metal",
        "model": "large-v2",
        "binary_path": "/Users/user/.voicemode/whisper.cpp/main",
        "server_port": 2022,
        "server_url": "http://localhost:2022"
    },
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.whisper-server.plist",  # macOS
    "systemd_service": "/home/user/.config/systemd/user/whisper-server.service",  # Linux
    "start_script": "/Users/user/.voicemode/whisper.cpp/start-whisper-server.sh"
}
```

The `install_kokoro_fastapi` tool installs kokoro-fastapi for text-to-speech (TTS) functionality. Its features include:
- Python environment management with UV
- Automatic model downloads
- Service configuration (launchd on macOS, systemd on Linux)
- Auto-start capability
```python
# Basic installation with defaults
result = await install_kokoro_fastapi()

# Custom installation
result = await install_kokoro_fastapi(
    install_dir="~/my-kokoro",
    models_dir="~/my-models",
    port=8881,
    auto_start=True,
    install_models=True,
    force_reinstall=False
)
```

Parameters:

- `install_dir` (str, optional): Installation directory (default: `~/.voicemode/kokoro-fastapi`)
- `models_dir` (str, optional): Models directory (default: `~/.voicemode/kokoro-models`)
- `port` (int, optional): Service port (default: 8880)
- `auto_start` (bool, optional): Start service after installation (default: true)
- `install_models` (bool, optional): Download Kokoro models (default: true)
- `force_reinstall` (bool, optional): Force reinstallation (default: false)
Returns:

```python
{
    "success": True,
    "install_path": "/home/user/.voicemode/kokoro-fastapi",
    "service_url": "http://127.0.0.1:8880",
    "service_status": "managed_by_systemd",  # "managed_by_launchd" on macOS
    "systemd_service": "/home/user/.config/systemd/user/kokoro-fastapi-8880.service",  # Linux
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.kokoro-8880.plist",  # macOS
    "start_script": "/home/user/.voicemode/kokoro-fastapi/start-cpu.sh",
    "message": "Kokoro-fastapi installed. Run: cd /home/user/.voicemode/kokoro-fastapi && ./start-cpu.sh"
}
```

Prerequisites:

macOS:
- Xcode Command Line Tools
- Homebrew (for cmake)
- Metal support (built-in)

Linux:
- Build essentials (gcc, g++, make)
- CMake
- CUDA toolkit (optional, for NVIDIA GPU support)

Both platforms:
- Python 3.10+
- Git
- ~5GB disk space for models
- UV package manager (installed automatically if missing)
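Since UV is installed automatically when missing, a minimal check along those lines could look like the sketch below. This is hypothetical, not the installer's real code; the install command shown is Astral's official standalone installer script.

```python
import shutil
from typing import Optional

def uv_install_command() -> Optional[list]:
    """Return an install command if `uv` is missing, else None.

    Hypothetical sketch; the real installer's behavior may differ.
    """
    if shutil.which("uv"):
        return None  # already on PATH, nothing to do
    # Astral's standalone installer script for UV
    return ["sh", "-c", "curl -LsSf https://astral.sh/uv/install.sh | sh"]
```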
After installation, the services integrate automatically with Voice Mode:

- whisper.cpp:
  - Runs automatically on boot (port 2022)
  - OpenAI-compatible API endpoint
  - Model selection via the `VOICEMODE_WHISPER_MODEL` environment variable
  - View installed models: `claude resource read whisper://models`
- kokoro-fastapi:
  - Automatically detected by Voice Mode's provider registry when running
  - 67 voices available
  - OpenAI-compatible API endpoint
```python
# Install both services with defaults
whisper_result = await install_whisper_cpp()  # Uses large-v2 by default
kokoro_result = await install_kokoro_fastapi()

# Check installation status
if whisper_result["success"] and kokoro_result["success"]:
    print("Voice services installed successfully!")
    print(f"Whisper: {whisper_result['install_path']}")
    print(f"Whisper server: {whisper_result['performance_info']['server_url']}")
    print(f"Kokoro API: {kokoro_result['service_url']}")
```

```python
# Force reinstall with a larger model
result = await install_whisper_cpp(
    model="large-v3",
    force_reinstall=True
)
```

```python
# Install kokoro-fastapi on a different port
result = await install_kokoro_fastapi(
    port=9000,
    models_dir="/opt/models/kokoro"
)
```
- Missing Dependencies
  - The tools report missing dependencies with installation instructions
  - Follow the provided commands to install the required packages
- Port Conflicts
  - If port 8880 is in use, specify a different port for kokoro-fastapi
  - Check running services: `lsof -i :8880`
- GPU Not Detected
  - On Linux, ensure NVIDIA drivers and CUDA are installed
  - Use `nvidia-smi` to verify GPU availability
  - Force CPU mode with `use_gpu=False` if needed
- Model Download Failures
  - Check internet connection
  - Verify sufficient disk space
  - Try smaller models first (`tiny`, `base`)
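The port-conflict check can also be done from Python. A small sketch, equivalent in spirit to `lsof -i :8880`:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success, an errno otherwise
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8880)` is true, pass a different `port` to `install_kokoro_fastapi`.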
macOS (launchd):

```bash
# Whisper
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Kokoro
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro.plist
```

Linux (systemd):

```bash
# Whisper
systemctl --user start whisper-server
systemctl --user stop whisper-server
systemctl --user status whisper-server

# Kokoro
systemctl --user start kokoro-fastapi-8880
systemctl --user stop kokoro-fastapi-8880
systemctl --user status kokoro-fastapi-8880
```

Changing the Whisper model:

```bash
# Set environment variable before restarting
export VOICEMODE_WHISPER_MODEL=base.en  # or tiny, small, medium, large-v2, large-v3

# Restart the service to apply the change
# macOS
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Linux
systemctl --user restart whisper-server
```

Run the test suite to verify the installation tools:
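The platform-specific restart commands above can be selected programmatically. A hypothetical helper that only builds the argv lists, without executing anything:

```python
import platform
from typing import Optional

WHISPER_PLIST = "~/Library/LaunchAgents/com.voicemode.whisper-server.plist"

def whisper_restart_commands(system: Optional[str] = None) -> list:
    """Build the whisper-server restart command(s) for this platform.

    Hypothetical sketch based on the commands shown above; it returns
    the command lists rather than running them.
    """
    system = system or platform.system()
    if system == "Darwin":
        return [
            ["launchctl", "unload", WHISPER_PLIST],
            ["launchctl", "load", WHISPER_PLIST],
        ]
    return [["systemctl", "--user", "restart", "whisper-server"]]
```

The commands could then be run with `subprocess.run` after setting `VOICEMODE_WHISPER_MODEL`.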
```bash
cd /path/to/voicemode
python -m pytest tests/test_installers.py -v

# Skip integration tests (no actual installation)
SKIP_INTEGRATION_TESTS=1 python -m pytest tests/test_installers.py -v
```

When adding new installation tools:

- Create a new function in `voice_mode/tools/installers.py`
- Use the `@mcp.tool()` decorator
- Follow the existing pattern for error handling and return values
- Add comprehensive tests in `tests/test_installers.py`
- Update this documentation
- All installations are performed in user space (no sudo required)
- Models are downloaded from official sources
- Services bind to localhost only by default
- No external network access without explicit configuration