Whisper.cpp STT Setup

Whisper.cpp is a local speech-to-text engine that provides an OpenAI-compatible API. Voice Mode can use it as an alternative to OpenAI's speech-to-text service.

How Voice Mode Uses Whisper

Voice Mode automatically checks for local STT services before falling back to OpenAI:

First: Checks for Whisper.cpp on http://127.0.0.1:2022/v1
Fallback: Uses OpenAI API (requires OPENAI_API_KEY)

Setting Up Whisper.cpp

Quick Installation

Voice Mode includes an installation tool that sets up Whisper.cpp automatically:

# Install with default large-v2 model (recommended)
claude run install_whisper_cpp

# Install with a specific model
claude run install_whisper_cpp --model base.en

This will:

Clone and build Whisper.cpp with GPU support (if available)
Download the specified model (default: large-v2)
Create a start script with environment variable support
Set up automatic startup (launchd on macOS, systemd on Linux)

Prerequisites

macOS:

Xcode Command Line Tools (xcode-select --install)
Homebrew (https://brew.sh)
cmake (brew install cmake)

Linux:

Build essentials (sudo apt install build-essential on Ubuntu/Debian)

Manual Installation

Alternatively, install Whisper.cpp following the official instructions.

Running the OpenAI-Compatible Server

To run Whisper.cpp with an OpenAI-compatible API endpoint:

whisper-server \
  --model models/ggml-large-v2.bin \
  --host 127.0.0.1 \
  --port 2022 \
  --inference-path "/v1/audio/transcriptions" \
  --threads 4 \
  --processors 1 \
  --convert \
  --print-progress

Key options:

--model: Model file path (supports tiny, base, small, medium, large-v2, large-v3)
--host: Server host (default: 127.0.0.1)
--port: Server port (Voice Mode expects 2022)
--inference-path: OpenAI-compatible endpoint path
--threads: Number of threads for processing
--processors: Number of parallel processors
--convert: Convert audio to required format automatically
--print-progress: Show transcription progress

Voice Mode will automatically detect and use it when running on port 2022!

Manual Configuration (Optional)

To use a different Whisper endpoint or force its use:

export STT_BASE_URL=http://127.0.0.1:2022/v1

Or add to your MCP configuration:

"voice-mode": {
  "env": {
    "STT_BASE_URL": "http://127.0.0.1:2022/v1"
  }
}

Model Selection

Available Models

Model	Size	RAM Usage	Accuracy	Speed
tiny	39 MB	~390 MB	Low	Fastest
base	142 MB	~500 MB	Fair	Fast
small	466 MB	~1 GB	Good	Moderate
medium	1.5 GB	~2.6 GB	Very Good	Slow
large-v2	3 GB	~3.9 GB	Excellent	Slower
large-v3	3 GB	~3.9 GB	Best	Slowest

Switching Models

Set the VOICEMODE_WHISPER_MODEL environment variable:

# Use base model for faster processing
export VOICEMODE_WHISPER_MODEL=base.en

# Use large-v2 for best accuracy (default)
export VOICEMODE_WHISPER_MODEL=large-v2

Viewing Available Models

Use the MCP resource to see installed models:

claude resource read whisper://models

Hardware Optimization

The installation tool automatically detects and enables:

Mac (Apple Silicon): Metal acceleration
NVIDIA GPU: CUDA acceleration
CPU: Optimized CPU builds

Performance

Local Whisper typically processes speech in 1-3 seconds depending on:

Hardware (GPU/CPU)
Model size
Audio length

Fully Local Setup

For completely offline voice processing, combine Whisper with Kokoro:

export STT_BASE_URL=http://127.0.0.1:2022/v1
export TTS_BASE_URL=http://127.0.0.1:8880/v1
export TTS_VOICE=af_sky

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Whisper.cpp STT Setup

How Voice Mode Uses Whisper

Setting Up Whisper.cpp

Quick Installation

Prerequisites

Manual Installation

Running the OpenAI-Compatible Server

Manual Configuration (Optional)

Model Selection

Available Models

Switching Models

Viewing Available Models

Hardware Optimization

Performance

Fully Local Setup

FilesExpand file tree

whisper.cpp.md

Latest commit

History

whisper.cpp.md

File metadata and controls

Whisper.cpp STT Setup

How Voice Mode Uses Whisper

Setting Up Whisper.cpp

Quick Installation

Prerequisites

Manual Installation

Running the OpenAI-Compatible Server

Manual Configuration (Optional)

Model Selection

Available Models

Switching Models

Viewing Available Models

Hardware Optimization

Performance

Fully Local Setup