This guide will help you set up WhisperS2T for GPU-accelerated push-to-talk speech-to-text on Windows, with support for both Whisper and Parakeet models.
- GPU: NVIDIA RTX series (recommended: RTX 30-series or newer with 8GB+ VRAM)
- CUDA: Version 12.0 or higher
- RAM: 16GB minimum
- Storage: 10GB free space for models and dependencies
- OS: Windows 10/11
| Backend | Best For | Install Complexity |
|---|---|---|
| Parakeet | English-only, best accuracy | Medium (NeMo has many deps) |
| Whisper (CTranslate2) | Multilingual, good balance | Easy |
| Whisper (TensorRT) | Maximum speed | Complex |
Recommendation:
- For English: Use Parakeet
- For multilingual: Use Whisper with CTranslate2
- Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html
- Install with default settings
- Restart your terminal/command prompt
Verify installation:
```
conda --version
```
Verify your CUDA installation:
```
nvidia-smi
```
This should show your GPU and CUDA version.
- Download CuDNN from: https://developer.nvidia.com/cudnn
- Choose the version matching your CUDA (e.g., cuDNN v8.9.7 for CUDA 12.x)
- Extract to: `C:\Program Files\NVIDIA\CUDNN\v8.x.x\`
- Add the `bin` folder to your system PATH
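To confirm the cuDNN `bin` folder actually made it onto PATH, here is a minimal stdlib-only sketch. The `cudnn64_*.dll` filename pattern matches cuDNN v8 builds; adjust it for your version.

```python
import os
import fnmatch

def find_on_path(pattern, path_var=None):
    """Return the first file matching `pattern` in any PATH directory, or None."""
    path_var = path_var if path_var is not None else os.environ.get("PATH", "")
    for folder in path_var.split(os.pathsep):
        try:
            for name in os.listdir(folder):
                if fnmatch.fnmatch(name.lower(), pattern):
                    return os.path.join(folder, name)
        except OSError:
            continue  # skip missing or unreadable PATH entries
    return None

# cuDNN v8 ships DLLs named like cudnn64_8.dll; adjust the pattern for your version.
hit = find_on_path("cudnn64_*.dll")
print("cuDNN found at:", hit) if hit else print("cuDNN not found on PATH")
```

If this prints "not found", re-check the PATH entry and open a new terminal so the change takes effect.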
- Download from: https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip
- Extract to a folder (e.g., `C:\ffmpeg\`)
- Add to system PATH:
  - Search for "Environment Variables" in Windows
  - Edit "Path" in System Variables
  - Add: `C:\ffmpeg\bin`
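A quick sketch to confirm FFmpeg is reachable and report its version. The banner format shown ("ffmpeg version X ...") is what current builds print, but treat the parsing as best-effort:

```python
import shutil
import subprocess

def parse_ffmpeg_version(banner):
    """Extract the version token from a banner like 'ffmpeg version 6.1.1 ...'."""
    first = banner.splitlines()[0] if banner else ""
    parts = first.split()
    if len(parts) >= 3 and parts[0] == "ffmpeg" and parts[1] == "version":
        return parts[2]
    return None

def ffmpeg_version():
    """Return the ffmpeg version string, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], capture_output=True, text=True).stdout
    return parse_ffmpeg_version(out)

print(ffmpeg_version() or "ffmpeg not found - check your PATH")
```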
```
# Create new environment
conda create -n whisper python=3.10 -y

# Activate environment
conda activate whisper
```

Install PyTorch with CUDA support:

```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Verify CUDA installation:

```
python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"
```

For the Whisper backend:

```
# Install Whisper dependencies
pip install -r requirements-whisper.txt

# Install package
pip install -e .
```

For the Parakeet backend:

```
# Install Parakeet dependencies (NeMo)
pip install -r requirements-parakeet.txt

# Install package
pip install -e .

# Download or place your Parakeet model
# Option 1: Download from NGC (happens automatically on first use)
# Option 2: Place .nemo file in models/ folder
```

For both backends in one environment:

```
# Install Whisper first
pip install -r requirements-whisper.txt

# Then add NeMo (may upgrade some packages)
pip install nemo_toolkit[asr]

# Install package
pip install -e .
```

Note: NeMo may upgrade some packages that Whisper depends on. In practice, this usually works fine, but if you encounter issues, use separate conda environments.
```
python verify_setup.py
```
This checks:
- ✅ Python version
- ✅ CUDA availability
- ✅ CuDNN installation
- ✅ FFmpeg installation
- ✅ WhisperS2T package
- ✅ PyAudio (microphone support)
- ✅ Available microphones
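If you prefer a quick standalone sanity check, a minimal sketch using `importlib.util.find_spec` covers most of the same ground (the module names `torch`, `pyaudio`, and `keyboard` are assumptions; adjust to what your install actually requires):

```python
import importlib.util
import shutil

def check_setup():
    """Report which pieces of the stack are importable/findable; purely informational."""
    modules = ["torch", "pyaudio", "keyboard"]  # assumed dependency names
    results = {m: importlib.util.find_spec(m) is not None for m in modules}
    results["ffmpeg"] = shutil.which("ffmpeg") is not None
    for name, ok in results.items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
    return results

check_setup()
```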
Copy the example configuration and customize:
```
copy .env.example .env
```
Edit `.env` with your settings:
For Whisper:

```
MIC_DEVICE=5                    # Your microphone index
MODEL=large-v3                  # Whisper model
BACKEND=CTranslate2             # Whisper backend
LANGUAGE=en                     # Language code
HOTKEY=ctrl+alt+shift+space     # Push-to-talk hotkey
```

For Parakeet:

```
MIC_DEVICE=5                               # Your microphone index
MODEL=models/parakeet-tdt-0.6b-v2.nemo     # Path to .nemo file
BACKEND=Parakeet                           # Parakeet backend
LANGUAGE=en                                # English only
HOTKEY=ctrl+alt+shift+space                # Push-to-talk hotkey
```

Find your microphone device index:

```
python -c "import pyaudio; p=pyaudio.PyAudio(); [print(f'{i}: {p.get_device_info_by_index(i)[\"name\"]}') for i in range(p.get_device_count()) if p.get_device_info_by_index(i)['maxInputChannels'] > 0]; p.terminate()"
```

Start the app:

```
python whisper_hotkey.py
```

- Wait for the model to load:
- Whisper: First run downloads ~3GB for large-v3
- Parakeet: Uses your local .nemo file or downloads from NGC
- Press and hold your hotkey
- Speak into your microphone
- Release the hotkey
- The transcription appears and is copied to your clipboard!
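The PyAudio one-liner used earlier for finding your microphone index can be written as a more readable helper; the filtering logic is separated so it works even without a sound device. The `name` and `maxInputChannels` fields are standard PyAudio device-info keys.

```python
def input_devices(device_infos):
    """Keep only devices that can record (at least one input channel)."""
    return [(i, d["name"]) for i, d in enumerate(device_infos)
            if d.get("maxInputChannels", 0) > 0]

if __name__ == "__main__":
    try:
        import pyaudio
        p = pyaudio.PyAudio()
        infos = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
        p.terminate()
    except ImportError:
        infos = []  # pyaudio not installed; nothing to list
    for index, name in input_devices(infos):
        print(f"{index}: {name}")
```

Use the printed index as your `MIC_DEVICE` value in `.env`.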
After setup, the daily workflow is simple:
```
# Open a command prompt
conda activate whisper
python whisper_hotkey.py
```
Then:
- Hold hotkey → Record (you'll hear a pop sound)
- Release hotkey → Transcribe & copy to clipboard
- Paste anywhere with `Ctrl+V`
Run Command Prompt as Administrator, then:
```
conda activate whisper
python whisper_hotkey.py
```
Verify PyTorch sees CUDA:
```
python -c "import torch; print(torch.cuda.is_available())"
```
If it prints `False`:
- Update NVIDIA drivers
- Reinstall PyTorch with CUDA support
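Before reinstalling, it can help to see which PyTorch build you actually have. A hedged diagnostic that never raises, even if torch is missing entirely:

```python
import importlib.util

def torch_cuda_summary():
    """Return a one-line status string; safe to run even without torch installed."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds - a common cause of False
        return f"torch {torch.__version__} (CUDA build: {torch.version.cuda}) - CUDA NOT available"
    return f"torch {torch.__version__} - {torch.cuda.get_device_name(0)}"

print(torch_cuda_summary())
```

If the CUDA build shows as `None`, you installed a CPU-only wheel; reinstall using the `--index-url` shown in the setup steps.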
- Check Windows Sound Settings → Recording devices
- Verify the device index in `.env` is correct
- Ensure no other apps are using the microphone
- Verify you're using the correct microphone device
- Check that the .nemo file path is correct
- Ensure audio is actually being recorded (check file size)
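One way to check that audio is actually being captured is to inspect the recorded WAV with the stdlib `wave` module. The `recording.wav` path below is a hypothetical example; point it at wherever your recording lands.

```python
import wave

def wav_stats(path):
    """Return (duration_seconds, n_frames) for a WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return frames / float(rate), frames

# Example: a capture with zero frames or a duration of a few milliseconds
# almost certainly means the wrong microphone was recorded.
# duration, frames = wav_stats("recording.wav")  # hypothetical path
# print(f"{duration:.2f}s, {frames} frames")
```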
If NeMo fails to import:
```
# Try reinstalling in a fresh environment
conda create -n parakeet python=3.10 -y
conda activate parakeet
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install nemo_toolkit[asr]
```

The first run downloads the model. Be patient, or use a smaller model:

```
# Whisper
MODEL=base                                # ~74MB, faster download

# Parakeet - use local file to avoid download
MODEL=models/parakeet-tdt-0.6b-v2.nemo
```

| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | ~39MB | ⚡ Fastest | Basic | Testing only |
| base | ~74MB | ⚡ Very Fast | Good | Quick notes |
| small | ~244MB | 🚀 Fast | Better | Daily use |
| medium | ~769MB | 🐌 Slower | Very Good | Important recordings |
| large-v3 | ~1550MB | 🐌 Slowest | Best | Maximum accuracy |
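As a rough rule of thumb for picking a Whisper model from the table above by free VRAM, something like the following sketch works; the thresholds are ballpark assumptions, not measured requirements.

```python
def suggest_whisper_model(free_vram_gb):
    """Map free VRAM (GB) to a Whisper model size. Thresholds are rough guesses."""
    if free_vram_gb >= 10:
        return "large-v3"
    if free_vram_gb >= 5:
        return "medium"
    if free_vram_gb >= 2:
        return "small"
    return "base"

print(suggest_whisper_model(12))  # → large-v3
```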
| Model | Size | Speed | Accuracy | Notes |
|---|---|---|---|---|
| parakeet-tdt-0.6b-v2 | ~600MB | 🚀 Fast | Best | Recommended for English |
| parakeet-tdt-1.1b | ~1.1GB | 🚀 Fast | Best | Larger, slightly better |
Your push-to-talk speech-to-text is ready!
```
conda activate whisper
python whisper_hotkey.py
```
Hold your hotkey, speak, release, and paste! 🎤✨
See USAGE_GUIDE.md for detailed configuration options.