FluidAudio Skill for Claude Code

A Claude Code skill that gives Claude local audio AI superpowers on macOS: speech-to-text (25 languages), speaker diarization, voice activity detection (VAD), and live transcription. All processing runs locally on the Apple Neural Engine, with zero cloud dependencies.

What This Is

This is a Claude Code skill — a set of instructions and reference docs that teach Claude how to use FluidAudio for audio processing tasks.

When installed, Claude can:

  • Transcribe audio/video files in 25 languages (auto-detected)
  • Identify who spoke when (speaker diarization)
  • Detect speech vs silence (VAD)
  • Live-transcribe your microphone or system audio
  • Extract audio from video files (MP4, MOV, MKV, etc.)

Requirements

  • macOS 14+ with Apple Silicon (M1/M2/M3/M4)
  • Homebrew (brew.sh): /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • Xcode Command Line Tools: xcode-select --install (takes 5-20 min on first install)
  • ffmpeg (for video/format conversion): brew install ffmpeg
  • Claude Code: Install guide
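
As a quick sanity check of the requirements above, a minimal sketch like the following can report which command-line tools are missing. The helper name `missing_tools` is hypothetical and not part of this repository; the tool names checked (`brew`, `ffmpeg`, `git`, `swift`) are taken from the list above and from the installation steps, on the assumption that `git` and `swift` come with the Xcode Command Line Tools.

```shell
# Hypothetical helper: print each given command that is NOT on PATH,
# one name per line. Prints nothing when every command is found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools this README relies on (assumed list, see the lead-in).
missing="$(missing_tools brew ffmpeg git swift)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi

# Apple Silicon machines report "arm64" here.
echo "CPU architecture: $(uname -m)"
```

Anything listed as missing can be installed with the commands from the Requirements list before continuing.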

Installation

1. Install FluidAudio

mkdir -p ~/Projects && cd ~/Projects  # or wherever you keep repos
git clone https://github.com/FluidInference/FluidAudio.git
cd FluidAudio
swift build -c release

First run note: The first swift build compiles the project from scratch (~2-5 min). The first transcription also downloads the AI models automatically (~500MB). Later runs reuse the compiled binary and the cached models, so they start without these delays.

2. Set FLUIDAUDIO_HOME

Add to your ~/.zshrc or ~/.bashrc:

export FLUIDAUDIO_HOME="$HOME/Projects/FluidAudio"

Then reload your shell configuration: source ~/.zshrc (or source ~/.bashrc if you use bash)
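
To confirm the variable took effect, a small check along these lines may help. It is only a sketch: the function name `looks_like_checkout` is hypothetical, and the test for a `Package.swift` file assumes nothing more than that FluidAudio is a Swift package built with `swift build`.

```shell
# Hypothetical helper: succeed if the given directory exists and
# contains a Package.swift (the SwiftPM manifest), fail otherwise.
looks_like_checkout() {
  [ -n "$1" ] && [ -d "$1" ] && [ -f "$1/Package.swift" ]
}

if looks_like_checkout "${FLUIDAUDIO_HOME:-}"; then
  echo "FLUIDAUDIO_HOME OK: $FLUIDAUDIO_HOME"
else
  echo "FLUIDAUDIO_HOME is unset or does not point at a Swift package" >&2
fi
```

If the second message appears in a new terminal, the export line is probably in a startup file that your shell does not read.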

3. Install the skill

git clone https://github.com/b1rd33/fluidaudio-skill.git ~/.claude/skills/fluidaudio

4. Use it

In Claude Code, just ask:

  • "Transcribe this audio file"
  • "Extract audio from this video and transcribe it"
  • "Who is speaking in this meeting recording?"
  • "Read this text aloud"
  • "What's in this Dutch voice memo?"

Supported Languages (ASR)

25 European languages with automatic detection — no language flag needed:

English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovenian, Ukrainian, Greek, Turkish, Finnish, Swedish, Norwegian, Danish, Catalan

Performance

On Apple M4 Pro:

| Task | Speed |
|---|---|
| Speech-to-text | ~190x real-time (1hr audio in ~19s) |
| Diarization (offline) | ~50x real-time |
| VAD | ~500x real-time |
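
The speed figures above translate into wall-clock time by dividing the audio duration by the real-time factor. A minimal sketch of that arithmetic follows; the function name `estimate_seconds` is hypothetical, and the results are rough estimates that ignore model loading and file I/O.

```shell
# Estimate processing time in seconds for audio of a given length (seconds)
# at a given real-time factor (e.g. 190 for "~190x real-time").
estimate_seconds() {
  awk -v dur="$1" -v rtf="$2" 'BEGIN { printf "%.1f\n", dur / rtf }'
}

estimate_seconds 3600 190   # 1 hour, speech-to-text      -> 18.9
estimate_seconds 3600 50    # 1 hour, offline diarization -> 72.0
estimate_seconds 3600 500   # 1 hour, VAD                 -> 7.2
```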

Skill Structure

SKILL.md              # Main skill document (loaded by Claude Code)
references/           # Detailed reference docs per feature
  commands.md         # Complete CLI reference
  asr.md              # Speech-to-text guide
  multilingual.md     # Language support & WER benchmarks
  video.md            # Video/format conversion
  diarization.md      # Speaker identification
  vad.md              # Voice activity detection
  live.md             # Real-time transcription
  benchmarks.md       # Performance testing
scripts/              # Helper shell scripts
  extract_audio.sh    # Video → WAV extraction
  batch_transcribe.sh # Batch transcription
  process_meeting.sh  # Full meeting pipeline
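
The contents of the helper scripts are not reproduced in this README. As an illustration of the Video → WAV step only, a sketch using standard ffmpeg options might look like the following. The function names are hypothetical, and the 16 kHz mono 16-bit PCM output format is an assumption (a common input format for speech models), not a value taken from this repository.

```shell
# Derive the output path: same directory and base name, .wav extension.
wav_path_for() {
  printf '%s\n' "${1%.*}.wav"
}

# Extract the audio track of a media file as a WAV file next to it.
# ASSUMPTION: 16 kHz, mono, 16-bit PCM; adjust to whatever the tool expects.
extract_audio() {
  local input="$1"
  local output
  output="$(wav_path_for "$input")"
  if [ "$input" = "$output" ]; then
    echo "Input is already a .wav file: $input" >&2
    return 1
  fi
  # -vn: drop video, -ac 1: mono, -ar 16000: 16 kHz, pcm_s16le: 16-bit PCM
  ffmpeg -nostdin -y -i "$input" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$output"
}
```

With these definitions loaded, `extract_audio meeting.mp4` would write `meeting.wav` alongside the video, overwriting any existing file of that name because of `-y`.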

License

MIT — see LICENSE
