Voice Mode Installation Tools

Voice Mode now includes MCP tools to automatically install and configure whisper.cpp and kokoro-fastapi, making it easier to set up free, private, open-source voice services.

Overview

These tools handle:

  • System detection (macOS/Linux)
  • Dependency installation
  • GPU support configuration
  • Model downloads
  • Service configuration
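
The system-detection step can be as simple as inspecting platform.system(); a minimal sketch of the idea (the actual tool internals may differ):

```python
import platform

def detect_os() -> str:
    """Map platform.system() to the OS names these tools support."""
    system = platform.system()
    if system == "Darwin":
        return "macos"
    if system == "Linux":
        return "linux"
    return "unsupported"
```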

Available Tools

install_whisper_cpp

Installs whisper.cpp for speech-to-text (STT) functionality.

Features

  • Automatic OS detection (macOS/Linux)
  • GPU acceleration (Metal on macOS, CUDA on Linux)
  • Model download management
  • Build optimization
  • Service configuration (launchd on macOS, systemd on Linux)
  • Environment variable support for model selection

Usage

# Basic installation with defaults
result = await install_whisper_cpp()

# Custom installation
result = await install_whisper_cpp(
    install_dir="~/my-whisper",
    model="large-v3",
    use_gpu=True,
    force_reinstall=False
)

Parameters

  • install_dir (str, optional): Installation directory (default: ~/.voicemode/whisper.cpp)
  • model (str, optional): Whisper model to download (default: large-v2)
    • Available models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v2, large-v3
    • Note: large-v2 is the default for best accuracy (requires ~3GB RAM)
  • use_gpu (bool, optional): Enable GPU support (default: auto-detect)
  • force_reinstall (bool, optional): Force reinstallation (default: false)
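
A hypothetical validation helper for the model parameter, built from the model list above (this helper is illustrative, not part of the actual tool API):

```python
# Model names taken from the documented list above.
VALID_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v2", "large-v3",
}

def validate_model(model: str = "large-v2") -> str:
    """Return the model name if valid, otherwise raise with the allowed list."""
    if model not in VALID_MODELS:
        raise ValueError(f"Unknown model {model!r}; choose one of {sorted(VALID_MODELS)}")
    return model
```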

Return Value

{
    "success": True,
    "install_path": "/Users/user/.voicemode/whisper.cpp",
    "model_path": "/Users/user/.voicemode/whisper.cpp/models/ggml-large-v2.bin",
    "gpu_enabled": True,
    "gpu_type": "metal",  # or "cuda" or "cpu"
    "performance_info": {
        "system": "Darwin",
        "gpu_acceleration": "metal",
        "model": "large-v2",
        "binary_path": "/Users/user/.voicemode/whisper.cpp/main",
        "server_port": 2022,
        "server_url": "http://localhost:2022"
    },
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.whisper-server.plist",  # macOS
    "systemd_service": "/home/user/.config/systemd/user/whisper-server.service",  # Linux
    "start_script": "/Users/user/.voicemode/whisper.cpp/start-whisper-server.sh"
}

install_kokoro_fastapi

Installs kokoro-fastapi for text-to-speech (TTS) functionality.

Features

  • Python environment management with UV
  • Automatic model downloads
  • Service configuration (launchd on macOS, systemd on Linux)
  • Auto-start capability

Usage

# Basic installation with defaults
result = await install_kokoro_fastapi()

# Custom installation
result = await install_kokoro_fastapi(
    install_dir="~/my-kokoro",
    models_dir="~/my-models",
    port=8881,
    auto_start=True,
    install_models=True,
    force_reinstall=False
)

Parameters

  • install_dir (str, optional): Installation directory (default: ~/.voicemode/kokoro-fastapi)
  • models_dir (str, optional): Models directory (default: ~/.voicemode/kokoro-models)
  • port (int, optional): Service port (default: 8880)
  • auto_start (bool, optional): Start service after installation (default: true)
  • install_models (bool, optional): Download Kokoro models (default: true)
  • force_reinstall (bool, optional): Force reinstallation (default: false)

Return Value

{
    "success": True,
    "install_path": "/home/user/.voicemode/kokoro-fastapi",
    "service_url": "http://127.0.0.1:8880",
    "service_status": "managed_by_systemd",  # Linux
    "service_status": "managed_by_launchd",  # macOS
    "systemd_service": "/home/user/.config/systemd/user/kokoro-fastapi-8880.service",  # Linux
    "launchagent": "/Users/user/Library/LaunchAgents/com.voicemode.kokoro-8880.plist",  # macOS
    "start_script": "/home/user/.voicemode/kokoro-fastapi/start-cpu.sh",
    "message": "Kokoro-fastapi installed. Run: cd /home/user/.voicemode/kokoro-fastapi && ./start-cpu.sh"
}

System Requirements

whisper.cpp

macOS

  • Xcode Command Line Tools
  • Homebrew (for cmake)
  • Metal support (built-in)

Linux

  • Build essentials (gcc, g++, make)
  • CMake
  • CUDA toolkit (optional, for NVIDIA GPU support)

kokoro-fastapi

All Systems

  • Python 3.10+
  • Git
  • ~5GB disk space for models
  • UV package manager (installed automatically if missing)
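
The ~5GB model requirement can be checked up front with shutil.disk_usage; a sketch (the threshold comes from the requirement above):

```python
import shutil

def has_space_for_models(path: str = ".", required_gb: float = 5.0) -> bool:
    """True if the filesystem containing `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```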

Integration with Voice Mode

After installation, the services integrate automatically with Voice Mode:

  1. whisper.cpp:

    • Starts automatically via the installed service (port 2022)
    • OpenAI-compatible API endpoint
    • Model selection via VOICEMODE_WHISPER_MODEL environment variable
    • View installed models: claude resource read whisper://models
  2. kokoro-fastapi:

    • Automatically detected by Voice Mode's provider registry when running
    • 67 voices available
    • OpenAI-compatible API endpoint
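
Because both services expose OpenAI-compatible endpoints on localhost, client configuration reduces to a pair of base URLs plus the model environment variable; a sketch using the default ports documented above:

```python
import os

def voice_service_config() -> dict:
    """Build client settings for the locally installed voice services."""
    return {
        "stt_base_url": "http://localhost:2022",   # whisper.cpp server (default port)
        "tts_base_url": "http://127.0.0.1:8880",   # kokoro-fastapi (default port)
        "whisper_model": os.environ.get("VOICEMODE_WHISPER_MODEL", "large-v2"),
    }
```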

Examples

Complete Setup

# Install both services with defaults
whisper_result = await install_whisper_cpp()  # Uses large-v2 by default
kokoro_result = await install_kokoro_fastapi()

# Check installation status
if whisper_result["success"] and kokoro_result["success"]:
    print("Voice services installed successfully!")
    print(f"Whisper: {whisper_result['install_path']}")
    print(f"Whisper server: {whisper_result['performance_info']['server_url']}")
    print(f"Kokoro API: {kokoro_result['service_url']}")

Upgrade Existing Installation

# Force reinstall with larger model
result = await install_whisper_cpp(
    model="large-v3",
    force_reinstall=True
)

Custom Configuration

# Install kokoro-fastapi on different port
result = await install_kokoro_fastapi(
    port=9000,
    models_dir="/opt/models/kokoro"
)

Troubleshooting

Common Issues

  1. Missing Dependencies

    • The tools will report missing dependencies with installation instructions
    • Follow the provided commands to install required packages
  2. Port Conflicts

    • If port 8880 is in use, specify a different port for kokoro-fastapi
    • Check running services: lsof -i :8880
  3. GPU Not Detected

    • On Linux, ensure NVIDIA drivers and CUDA are installed
    • Use nvidia-smi to verify GPU availability
    • Force CPU mode with use_gpu=False if needed
  4. Model Download Failures

    • Check internet connection
    • Verify sufficient disk space
    • Try smaller models first (tiny, base)
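
As a cross-platform alternative to lsof for the port-conflict case, a port can be probed directly from Python; a minimal sketch:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0
```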

Manual Service Management

macOS (launchd)

# Whisper
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Kokoro
launchctl load ~/Library/LaunchAgents/com.voicemode.kokoro-8880.plist
launchctl unload ~/Library/LaunchAgents/com.voicemode.kokoro-8880.plist

Linux (systemd)

# Whisper
systemctl --user start whisper-server
systemctl --user stop whisper-server
systemctl --user status whisper-server

# Kokoro
systemctl --user start kokoro-fastapi-8880
systemctl --user stop kokoro-fastapi-8880
systemctl --user status kokoro-fastapi-8880

Change Whisper Model

# Set environment variable before restarting
export VOICEMODE_WHISPER_MODEL=base.en  # or tiny, small, medium, large-v2, large-v3

# Restart service to apply change
# macOS
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper-server.plist
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper-server.plist

# Linux
systemctl --user restart whisper-server

Testing

Run the test suite to verify installation tools:

cd /path/to/voicemode
python -m pytest tests/test_installers.py -v

# Skip integration tests (no actual installation)
SKIP_INTEGRATION_TESTS=1 python -m pytest tests/test_installers.py -v

Contributing

When adding new installation tools:

  1. Create a new function in voice_mode/tools/installers.py
  2. Use the @mcp.tool() decorator
  3. Follow the existing pattern for error handling and return values
  4. Add comprehensive tests in tests/test_installers.py
  5. Update this documentation
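
A hypothetical skeleton following that pattern (the decorator and return-value shape are assumed from the existing tools; the real implementations live in voice_mode/tools/installers.py):

```python
# In voice_mode/tools/installers.py this function would be wrapped with
# @mcp.tool(); the decorator is omitted here so the sketch stays self-contained.
async def install_example_service(
    install_dir: str = "~/.voicemode/example-service",
    force_reinstall: bool = False,
) -> dict:
    """Install a new voice service, mirroring the existing return-value shape."""
    try:
        # ... perform OS detection, downloads, and service setup here ...
        return {"success": True, "install_path": install_dir}
    except Exception as exc:  # report failures in-band rather than raising
        return {"success": False, "error": str(exc)}
```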

Security Notes

  • All installations are performed in user space (no sudo required)
  • Models are downloaded from official sources
  • Services bind to localhost only by default
  • No external network access without explicit configuration