> Pinned to whisper.cpp v1.8.3 (fork: `rmorse/whisper.cpp`, branch: `stream-pcm`)

Safe Rust bindings for whisper.cpp (OpenAI's Whisper speech recognition model) with real-time PCM streaming and VAD support.
## Quick start

```rust
use whisper_cpp_plus::WhisperContext;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = WhisperContext::new("models/ggml-base.en.bin")?;

    // Audio must be 16kHz mono f32
    let audio: Vec<f32> = load_audio("audio.wav");
    let text = ctx.transcribe(&audio)?;
    println!("{}", text);
    Ok(())
}
```

## Features

- Thread-safe — `WhisperContext` is `Send + Sync`, share via `Arc`
- Streaming — real-time transcription via `WhisperStream` and `WhisperStreamPcm`
- VAD — Silero Voice Activity Detection integration
- Enhanced VAD — segment aggregation for optimal transcription chunks
- Temperature fallback — quality-based retry with multiple temperatures
- Async — `tokio::task::spawn_blocking` wrappers (feature = `async`)
- Cross-platform — Windows (MSVC), Linux, macOS (Intel & Apple Silicon)
- Quantization — model compression via `WhisperQuantize` (feature = `quantization`)
- Hardware acceleration — SIMD auto-detected, GPU via feature flags
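The `load_audio` helper used in the quick start is not part of the crate. A minimal sketch that reads raw little-endian f32 PCM from disk might look like the following; for real WAV files you would decode with a crate such as `hound` and convert to 16 kHz mono f32 first:

```rust
use std::fs;

// Hypothetical helper, not part of the crate's API: load raw
// little-endian f32 PCM (assumed to already be 16 kHz mono).
fn load_audio(path: &str) -> Vec<f32> {
    let bytes = fs::read(path).expect("failed to read audio file");
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```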
## Installation

```toml
[dependencies]
whisper-cpp-plus = "0.1.4"

# Optional
hound = "3.5"  # WAV file loading
```

Requirements:

- Rust 1.70.0+
- CMake 3.14+
- C++ compiler (MSVC on Windows, GCC/Clang on Linux/macOS)
### Feature flags

```toml
whisper-cpp-plus = { version = "0.1.4", features = ["quantization"] }  # Model quantization
whisper-cpp-plus = { version = "0.1.4", features = ["async"] }         # Async API
whisper-cpp-plus = { version = "0.1.4", features = ["cuda"] }          # NVIDIA GPU
whisper-cpp-plus = { version = "0.1.4", features = ["metal"] }         # macOS GPU
```

For CUDA, install the CUDA Toolkit and build:

```sh
cargo build --features cuda
```

The build script uses CMake to compile whisper.cpp with CUDA support automatically. The CUDA toolkit is located via `CUDA_PATH` → `CUDA_HOME` → standard install paths.

Advanced: prebuilt libraries — for CI or to skip recompilation, set `WHISPER_PREBUILT_PATH` to a directory containing pre-compiled static libs. See docs/CACHING_GUIDE.md.
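The documented lookup order can be sketched as a simple fallback chain; the default path below is illustrative, not the build script's actual code:

```rust
use std::env;

// Sketch of the lookup order: CUDA_PATH, then CUDA_HOME, then a
// typical default install location (illustrative only).
fn cuda_root() -> String {
    env::var("CUDA_PATH")
        .or_else(|_| env::var("CUDA_HOME"))
        .unwrap_or_else(|_| "/usr/local/cuda".to_string())
}
```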
## Crates

| Crate | Description |
|---|---|
| `whisper-cpp-plus` | High-level safe Rust bindings |
| `whisper-cpp-plus-sys` | Low-level FFI bindings |

Model quantization is available via `features = ["quantization"]`.
## Core types

| Type | Description | whisper.cpp equivalent |
|---|---|---|
| `WhisperContext` | Model context (`Send + Sync`) | `whisper_context*` |
| `WhisperState` | Transcription state (`Send` only) | `whisper_state*` |
| `FullParams` | Transcription parameters | `whisper_full_params` |
| `TranscriptionResult` | Text + timestamped segments | — |
| `WhisperStream` | Chunked real-time streaming | — |
| `WhisperStreamPcm` | Streaming from raw PCM input | stream-pcm.cpp |
| `WhisperVadProcessor` | Silero voice activity detection | `whisper_vad_*` |
| `EnhancedWhisperVadProcessor` | VAD + segment aggregation | — |
| `EnhancedWhisperState` | Transcription with temperature fallback | — |
| `WhisperQuantize` | Model quantization (feature) | quantize.cpp |
## Usage

### Transcription with parameters

```rust
use whisper_cpp_plus::{WhisperContext, TranscriptionParams};

let ctx = WhisperContext::new("model.bin")?;
let params = TranscriptionParams::builder()
    .language("en")
    .temperature(0.0)
    .enable_timestamps()
    .n_threads(4)
    .build();

let result = ctx.transcribe_with_params(&audio, params)?;
for segment in &result.segments {
    println!("[{:.2}s - {:.2}s] {}",
        segment.start_seconds(), segment.end_seconds(), segment.text);
}
```

### Concurrent transcription
```rust
use std::sync::Arc;

let ctx = Arc::new(WhisperContext::new("model.bin")?);

// Each thread gets its own WhisperState internally
let handles: Vec<_> = files.iter().map(|file| {
    let ctx = Arc::clone(&ctx);
    let file = file.clone(); // spawned thread needs an owned copy
    std::thread::spawn(move || ctx.transcribe(&load_audio(&file)))
}).collect();
```

### Streaming
```rust
use whisper_cpp_plus::{WhisperStream, FullParams, SamplingStrategy};

let ctx = WhisperContext::new("model.bin")?;
let params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
let mut stream = WhisperStream::new(&ctx, params)?;

loop {
    let chunk = get_audio_chunk(); // your audio source
    stream.feed_audio(&chunk);
    let segments = stream.process_pending()?;
    for seg in &segments {
        println!("{}", seg.text);
    }
}
```

### VAD preprocessing
```rust
use whisper_cpp_plus::{WhisperVadProcessor, VadParams};

let mut vad = WhisperVadProcessor::new("models/ggml-silero-vad.bin")?;
let params = VadParams::default();
let segments = vad.segments_from_samples(&audio, &params)?;

for (start, end) in segments.get_all_segments() {
    let start_sample = (start * 16000.0) as usize;
    let end_sample = (end * 16000.0) as usize;
    let text = ctx.transcribe(&audio[start_sample..end_sample])?;
    println!("[{:.1}s-{:.1}s] {}", start, end, text);
}
```

### Enhanced VAD with segment aggregation
```rust
use whisper_cpp_plus::enhanced::{EnhancedWhisperVadProcessor, EnhancedVadParams};

let mut vad = EnhancedWhisperVadProcessor::new("models/ggml-silero-vad.bin")?;
let params = EnhancedVadParams::default();
let chunks = vad.process_with_aggregation(&audio, &params)?;

for chunk in &chunks {
    let text = ctx.transcribe(&chunk.audio)?;
    println!("[{:.1}s, {:.1}s long] {}", chunk.offset_seconds, chunk.duration_seconds, text);
}
```

### Temperature fallback for difficult audio
```rust
let params = TranscriptionParams::builder()
    .language("en")
    .build();

// Automatically retries with higher temperatures if quality thresholds aren't met
let result = ctx.transcribe_with_params_enhanced(&audio, params)?;
```

More examples in whisper-cpp-plus/examples/.
## Enhanced features

Beyond standard whisper.cpp bindings, this crate provides optimizations inspired by faster-whisper:
EnhancedWhisperVadProcessor aggregates Silero VAD speech segments into optimal-sized chunks for transcription. Instead of transcribing hundreds of tiny segments, it merges adjacent speech into configurable windows — 2-3x faster on audio with significant silence.
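The aggregation idea can be sketched in a few lines; this is an illustrative simplification, not the crate's actual implementation (which also handles padding and silence gaps):

```rust
// Merge adjacent VAD speech segments (start, end in seconds) into
// chunks, starting a new chunk once a merge would exceed `max_window`.
fn aggregate(segments: &[(f32, f32)], max_window: f32) -> Vec<(f32, f32)> {
    let mut chunks: Vec<(f32, f32)> = Vec::new();
    for &(start, end) in segments {
        let merge = matches!(chunks.last(), Some(c) if end - c.0 <= max_window);
        if merge {
            chunks.last_mut().unwrap().1 = end; // extend the current chunk
        } else {
            chunks.push((start, end)); // start a new chunk
        }
    }
    chunks
}
```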
EnhancedWhisperState automatically retries transcription at higher temperatures when quality thresholds aren't met (compression ratio, log probability, no-speech probability). Handles noisy/difficult audio without manual intervention.
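The retry logic amounts to walking a temperature schedule until the quality checks pass. The schedule and shape below are a sketch under assumptions, not the crate's internals:

```rust
// Try temperatures in increasing order and return the first one whose
// transcription passes the quality checks (compression ratio, log
// probability, no-speech probability). The schedule is illustrative.
fn pick_temperature(passes_quality: impl Fn(f32) -> bool) -> Option<f32> {
    let schedule = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    schedule.into_iter().find(|&t| passes_quality(t))
}
```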
Both features are orthogonal — use one, both, or neither. See docs/ARCHITECTURE.md for design details.
## Design

- Safety — all unsafe FFI encapsulated with null checks, lifetime enforcement, RAII cleanup
- Zero-copy — audio slices passed directly to C++ via pointer, no intermediate copies
- Progressive enhancement — `Enhanced*` types are opt-in; the base API stays clean
- Idiomatic Rust — builder patterns, `thiserror`, correct `Send`/`Sync` bounds
- Cross-platform — Windows (MSVC), Linux, macOS; SIMD auto-detected, GPU via feature flags
## Models

The easiest way to get test models:

```sh
cargo xtask test-setup
```

This downloads ggml-tiny.en.bin and the Silero VAD model into whisper-cpp-plus-sys/whisper.cpp/models/ using whisper.cpp's own download scripts.

For production models, download from Hugging Face:

```sh
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

| Model | Size | English-only | Multilingual |
|---|---|---|---|
| tiny | 39 MB | tiny.en | tiny |
| base | 142 MB | base.en | base |
| small | 466 MB | small.en | small |
| medium | 1.5 GB | medium.en | medium |
| large-v3 | 3.1 GB | — | large-v3 |
## Testing

```sh
# Download test models (tiny.en + Silero VAD)
cargo xtask test-setup

# Run all tests
cargo test -p whisper-cpp-plus

# With async tests
cargo test -p whisper-cpp-plus --features async
```

Tests that require models skip gracefully if not downloaded.

```sh
cargo test --lib                                        # Unit tests (32 tests)
cargo test --test integration                           # Core integration (10 tests)
cargo test --test type_safety                           # Send/Sync verification (11 tests)
cargo test --test real_audio                            # JFK audio transcription
cargo test --test enhanced_integration                  # Enhanced VAD + fallback
cargo test --test stream_pcm_integration                # WhisperStreamPcm modes
cargo test --test vad_integration                       # Silero VAD
cargo test --test quantization --features quantization  # Model quantization
```

### Benchmarks

```sh
cargo bench --bench transcription            # Core transcription
cargo bench --bench enhanced_vad_bench       # VAD segment aggregation
cargo bench --bench enhanced_fallback_bench  # Quality threshold checks
```

## Build caching

C++ compilation takes several minutes. Use xtask to cache the compiled library:
```sh
# Build and cache once
cargo xtask prebuild

# Subsequent builds use cache (< 1 second)
cargo build
```

```sh
cargo xtask prebuild          # Build precompiled library
cargo xtask prebuild --force  # Force rebuild
cargo xtask info              # Show available prebuilt libraries
cargo xtask clean             # Remove cached libraries
cargo xtask test-setup        # Download test models
```

See docs/CACHING_GUIDE.md for details.
## Thread safety

- `WhisperContext`: `Send + Sync` — share via `Arc`
- `WhisperState`: `Send` only — one per thread
- `FullParams`: not `Send`/`Sync` — create per transcription

All unsafe FFI operations are encapsulated with null pointer checks, lifetime enforcement, and RAII cleanup.
"Failed to load model" — check file path, permissions, available memory
"Invalid audio format" — must be 16kHz mono f32, normalized to [-1, 1]
Linking errors on Windows — install Visual Studio Build Tools 2022, ensure x64 MSVC toolchain. See docs/TECHNICAL_REFERENCE.md.
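To satisfy the audio-format requirement, i16 PCM (the usual WAV sample format) can be normalized to f32 with a small helper; this is a sketch, not part of the crate's API:

```rust
// Convert i16 PCM samples to the normalized f32 range [-1.0, 1.0]
// expected by `transcribe`.
fn pcm_i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}
```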
## Contributing

```sh
git clone --recursive git@github.com:operator-kit/whisper-cpp-plus-rs
cd whisper-cpp-plus-rs
cargo xtask test-setup
cargo test
```

See docs/ARCHITECTURE.md for design decisions and module layout.
## Acknowledgments

- whisper.cpp by Georgi Gerganov (MIT)
- OpenAI Whisper by OpenAI (MIT)