A Claude Code skill that gives Claude local audio AI superpowers on macOS — speech-to-text (25 languages), speaker diarization, voice activity detection, and live transcription. All processing runs locally on Apple Neural Engine. Zero cloud dependencies.
This is a Claude Code skill — a set of instructions and reference docs that teach Claude how to use FluidAudio for audio processing tasks.
When installed, Claude can:
- Transcribe audio/video files in 25 languages (auto-detected)
- Identify who spoke when (speaker diarization)
- Detect speech vs silence (VAD)
- Live-transcribe your microphone or system audio
- Extract audio from video files (MP4, MOV, MKV, etc.)
Requirements:
- macOS 14+ with Apple Silicon (M1/M2/M3/M4)
- Homebrew (brew.sh): `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
- Xcode Command Line Tools: `xcode-select --install` (takes 5-20 min on first install)
- ffmpeg (for video/format conversion): `brew install ffmpeg`
- Claude Code: Install guide
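Before building, it can help to confirm that the requirements above are actually met. The following is a minimal sketch and not part of the skill itself: the function names `check_cmd`, `macos_version_ok`, and `check_prereqs` are hypothetical, and the script assumes `uname`, `sw_vers`, and `xcode-select -p` behave as they do on a stock macOS install.

```shell
#!/bin/sh
# Hypothetical prerequisite check; names and output format are illustrative only.

# Print "ok" if the command is on PATH, otherwise "missing".
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Succeed when the major part of a version string (e.g. "14" in "14.5") is >= 14.
macos_version_ok() {
  major="${1%%.*}"
  [ "$major" -ge 14 ] 2>/dev/null
}

# Print one status line per requirement.
check_prereqs() {
  for tool in brew ffmpeg; do
    printf '%-12s %s\n' "$tool" "$(check_cmd "$tool")"
  done
  if [ "$(uname -s)" != "Darwin" ]; then
    printf '%-12s %s\n' "macOS" "not running on macOS"
    return 1
  fi
  # xcode-select -p exits non-zero when no developer directory is configured.
  if xcode-select -p >/dev/null 2>&1; then
    printf '%-12s %s\n' "xcode-clt" "ok"
  else
    printf '%-12s %s\n' "xcode-clt" "missing"
  fi
  # uname -m prints arm64 on Apple Silicon (x86_64 inside a Rosetta shell).
  printf '%-12s %s\n' "arch" "$(uname -m)"
  if macos_version_ok "$(sw_vers -productVersion)"; then
    printf '%-12s %s\n' "macOS" "ok"
  else
    printf '%-12s %s\n' "macOS" "older than 14"
  fi
}
```

Source the file and run `check_prereqs` to get one status line per requirement. It only reports and does not install anything.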
Build FluidAudio from source:
cd ~/Projects # or wherever you keep repos
git clone https://github.com/FluidInference/FluidAudio.git
cd FluidAudio
swift build -c release
First run note: The first `swift build` compiles the project (~2-5 min). The first transcription also downloads AI models automatically (~500MB). After that, the compiled binary and the downloaded models are reused from the local cache, so later runs skip both steps.
Add to your ~/.zshrc or ~/.bashrc:
export FLUIDAUDIO_HOME="$HOME/Projects/FluidAudio"
Then reload: source ~/.zshrc
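A quick sanity check for the variable set above can catch a wrong path early. This is a sketch under stated assumptions: `fluidaudio_home_ok` is a hypothetical helper, and it assumes the checkout is a standard Swift package (a `Package.swift` at the root) whose release build lives under `.build/release`, which is where `swift build -c release` normally places it.

```shell
#!/bin/sh
# Hypothetical check: does the given path look like a built FluidAudio checkout?
# Assumes a standard Swift package layout; adjust if the repo differs.
fluidaudio_home_ok() {
  [ -n "$1" ] &&
    [ -d "$1" ] &&
    [ -f "$1/Package.swift" ] &&
    [ -d "$1/.build/release" ]
}
```

For example, `fluidaudio_home_ok "$FLUIDAUDIO_HOME" && echo "ready" || echo "check FLUIDAUDIO_HOME"` prints `ready` only when the directory exists and has been built.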
Install the skill:
git clone https://github.com/b1rd33/fluidaudio-skill.git ~/.claude/skills/fluidaudio
In Claude Code, just ask:
- "Transcribe this audio file"
- "Extract audio from this video and transcribe it"
- "Who is speaking in this meeting recording?"
- "Read this text aloud"
- "What's in this Dutch voice memo?"
25 European languages with automatic detection — no language flag needed:
English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovenian, Ukrainian, Greek, Turkish, Finnish, Swedish, Norwegian, Danish, Catalan
Approximate processing speed on an Apple M4 Pro, relative to real time:
| Task | Speed |
|---|---|
| Speech-to-text | ~190x real-time (1hr audio in ~19s) |
| Diarization (offline) | ~50x real-time |
| VAD | ~500x real-time |
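The real-time factors in the table convert to wall-clock time by dividing the audio duration by the factor. The helper below is only an illustration of that arithmetic; `estimate_seconds` is a hypothetical name, and the result is an estimate that ignores model load time and first-run downloads.

```shell
#!/bin/sh
# Hypothetical helper: estimated processing time in seconds.
# $1 = audio duration in seconds, $2 = real-time factor (e.g. 190 for ~190x).
estimate_seconds() {
  LC_ALL=C awk -v dur="$1" -v rtf="$2" 'BEGIN { printf "%.1f\n", dur / rtf }'
}

estimate_seconds 3600 190   # one hour of speech-to-text: prints 18.9
estimate_seconds 3600 50    # one hour of offline diarization: prints 72.0
estimate_seconds 3600 500   # one hour of VAD: prints 7.2
```

The first result matches the "~19s" figure quoted in the table for one hour of audio.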
Repository layout:
SKILL.md                  # Main skill document (loaded by Claude Code)
references/               # Detailed reference docs per feature
  commands.md             # Complete CLI reference
  asr.md                  # Speech-to-text guide
  multilingual.md         # Language support & WER benchmarks
  video.md                # Video/format conversion
  diarization.md          # Speaker identification
  vad.md                  # Voice activity detection
  live.md                 # Real-time transcription
  benchmarks.md           # Performance testing
scripts/                  # Helper shell scripts
  extract_audio.sh        # Video → WAV extraction
  batch_transcribe.sh     # Batch transcription
  process_meeting.sh      # Full meeting pipeline
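To give a sense of what the video-to-WAV step involves, here is a minimal sketch using ffmpeg. It is not the contents of `extract_audio.sh`: the function names `wav_path_for` and `extract_wav` are hypothetical, and the 16 kHz mono 16-bit PCM output is an assumption (a common input format for speech models), not a documented FluidAudio requirement.

```shell
#!/bin/sh
# Hypothetical sketch of video -> WAV extraction; the real script may differ.

# Derive the output path by swapping the last extension for .wav,
# e.g. "talk.mp4" -> "talk.wav". Expects a file name that has an extension.
wav_path_for() {
  printf '%s.wav\n' "${1%.*}"
}

# Extract the audio track as 16 kHz mono 16-bit PCM (assumed target format).
#   -nostdin      do not read from the terminal (safe inside loops)
#   -y            overwrite an existing output file
#   -vn           drop the video stream
#   -ac 1         downmix to one channel
#   -ar 16000     resample to 16 kHz
#   -c:a pcm_s16le  encode as signed 16-bit little-endian PCM
extract_wav() {
  out="$(wav_path_for "$1")"
  ffmpeg -nostdin -y -i "$1" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$out"
}
```

Calling `extract_wav meeting.mov` would write `meeting.wav` next to the input, which can then be passed to transcription.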
License: MIT (see LICENSE)