conda activate whisper
python whisper_hotkey.py

Then hold your hotkey to record and release to transcribe. The transcription is automatically copied to your clipboard!
- Press and hold your configured hotkey (default: `ctrl+alt+shift+space`)
- Speak - you'll hear a pop sound when recording starts
- Release the hotkey when done
- The transcription appears and is automatically copied to the clipboard
- Paste anywhere with `Ctrl+V`
All settings are in the `.env` file. Edit it to customize:
# =============================================================================
# AUDIO SETTINGS
# =============================================================================
MIC_DEVICE=5 # Your microphone device index
SAMPLE_RATE=16000 # Keep at 16000 for Whisper
# =============================================================================
# MODEL SETTINGS
# =============================================================================
MODEL=large-v3 # Whisper model (see options below)
BACKEND=CTranslate2 # Fastest backend
LANGUAGE=en # Language code
# --- OR use Parakeet for best English accuracy ---
# MODEL=models/parakeet-tdt-0.6b-v2.nemo # Local .nemo file
# MODEL=nvidia/parakeet-tdt-0.6b-v2 # Download from NGC
# BACKEND=Parakeet
# LANGUAGE=en # Parakeet is English-only
# =============================================================================
# RECORDING SETTINGS
# =============================================================================
CHUNK_DURATION=10 # Seconds per chunk (10 recommended)
CHUNK_OVERLAP=2 # Overlap between chunks
# =============================================================================
# HOTKEY SETTINGS
# =============================================================================
HOTKEY=ctrl+alt+shift+space # Your push-to-talk hotkey
# =============================================================================
# STOP MODE SETTINGS
# =============================================================================
AUTO_STOP_ENABLED=false # Set to true for auto-stop on silence
SILENCE_THRESHOLD=2.0 # Seconds of silence before auto-stop
# =============================================================================
# OUTPUT SETTINGS
# =============================================================================
COPY_TO_CLIPBOARD=true # Auto-copy transcription
SHOW_PROGRESS=true # Show recording progress
PRINT_TRANSCRIPTION=true # Print final result to console

To find the index for `MIC_DEVICE`, list the available input devices:

python -c "import pyaudio; p=pyaudio.PyAudio(); [print(f'{i}: {p.get_device_info_by_index(i)[\"name\"]}') for i in range(p.get_device_count()) if p.get_device_info_by_index(i)['maxInputChannels'] > 0]; p.terminate()"

| Model | Parameters | Speed | Accuracy | Recommended For |
|---|---|---|---|---|
| `tiny` | ~39M | ⚡ Fastest | Basic | Testing only |
| `base` | ~74M | ⚡ Very Fast | Good | Quick notes |
| `small` | ~244M | 🚀 Fast | Better | Daily use |
| `medium` | ~769M | 🐌 Slower | Very Good | Important recordings |
| `large-v2` | ~1550M | 🐌 Slowest | Best | Maximum accuracy |
| `large-v3` | ~1550M | 🐌 Slowest | Best | Recommended |
| Model | Parameters | Speed | Accuracy | Notes |
|---|---|---|---|---|
| `nvidia/parakeet-tdt-0.6b-v2` | ~0.6B | 🚀 Fast | Best | Recommended for English |
| `nvidia/parakeet-tdt-1.1b` | ~1.1B | 🚀 Fast | Best | Larger, slightly better |
| Local `.nemo` file | Varies | 🚀 Fast | Best | Use your own model |

Note: Parakeet models require the NeMo toolkit: `pip install nemo_toolkit[asr]`

| Backend | Best For | Notes |
|---|---|---|
| `CTranslate2` | General Whisper use | Default, fast, good balance |
| `TensorRT` | Maximum Whisper speed | Requires TensorRT-LLM setup |
| `HuggingFace` | Distil models, flexibility | Slower, more features |
| `OpenAI` | Original implementation | Reference, not optimized |
| `Parakeet` | Best English accuracy | English-only, requires NeMo |

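
The model and backend tables imply a few consistency rules, for example that Parakeet is English-only. The sketch below shows one way to sanity-check a settings dictionary before launching. `check_settings` and `KNOWN_BACKENDS` are hypothetical names, and the rule that Parakeet models must be paired with `BACKEND=Parakeet` is an assumption drawn from the example configuration, not documented behavior of the app.

```python
# Hypothetical pre-flight check; not part of whisper_hotkey.py.
KNOWN_BACKENDS = {"CTranslate2", "TensorRT", "HuggingFace", "OpenAI", "Parakeet"}


def check_settings(settings):
    """Return a list of human-readable problems found in the settings dict."""
    problems = []
    backend = settings.get("BACKEND", "")
    model = settings.get("MODEL", "")
    language = settings.get("LANGUAGE", "")

    if backend not in KNOWN_BACKENDS:
        problems.append(f"Unknown BACKEND: {backend!r}")

    looks_like_parakeet = "parakeet" in model.lower() or model.endswith(".nemo")
    if backend == "Parakeet":
        if language != "en":
            problems.append("Parakeet is English-only; set LANGUAGE=en")
        if not looks_like_parakeet:
            problems.append("BACKEND=Parakeet expects a Parakeet or .nemo model")
    elif looks_like_parakeet:
        # Assumption: Parakeet models only load through the Parakeet backend.
        problems.append("Parakeet models need BACKEND=Parakeet")
    return problems


if __name__ == "__main__":
    example = {"MODEL": "large-v3", "BACKEND": "CTranslate2", "LANGUAGE": "en"}
    print(check_settings(example) or "Settings look consistent")
```

An empty list means none of these checks found a problem; it does not guarantee that the model will load.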
HOTKEY=ctrl+alt+shift+space # 4-key combo
HOTKEY=ctrl+shift+r # 3-key combo
HOTKEY=f9 # Single function key

Common language codes: en, es, fr, de, it, pt, ru, ja, zh
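
The settings shown earlier all follow a `KEY=value # comment` layout. As an illustration of how such a file can be read, here is a minimal standard-library sketch. `parse_env` and `as_bool` are hypothetical helpers, and the app itself may well load its settings differently (for instance through a dotenv library).

```python
# Hypothetical helpers: a minimal reader for KEY=value # comment lines.
def parse_env(text):
    """Return a dict of raw string settings from .env-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        # Assumption: " #" starts an inline comment, as in the example file.
        settings[key.strip()] = value.split(" #", 1)[0].strip()
    return settings


def as_bool(value):
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in ("1", "true", "yes", "on")


sample = """
# AUDIO SETTINGS
MIC_DEVICE=5 # Your microphone device index
SAMPLE_RATE=16000 # Keep at 16000 for Whisper
HOTKEY=ctrl+alt+shift+space # Your push-to-talk hotkey
AUTO_STOP_ENABLED=false # Set to true for auto-stop on silence
SILENCE_THRESHOLD=2.0 # Seconds of silence before auto-stop
"""

settings = parse_env(sample)
mic_device = int(settings["MIC_DEVICE"])
hotkey_parts = settings["HOTKEY"].split("+")  # modifiers followed by the key
auto_stop = as_bool(settings["AUTO_STOP_ENABLED"])
silence_seconds = float(settings["SILENCE_THRESHOLD"])
```

Values stay as strings until converted, so a typo such as `MIC_DEVICE=five` surfaces as a `ValueError` at the `int(...)` call rather than later during recording.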
# Normal push-to-talk mode
python whisper_hotkey.py
# Show current configuration
python whisper_hotkey.py --config
# Simple one-shot mode (no hotkey, just record for X seconds)
python whisper_hotkey.py --simple --duration 10
# Use a different .env file
python whisper_hotkey.py --env /path/to/custom.env

- Model Selection:
  - For English: Use the `Parakeet` backend with `nvidia/parakeet-tdt-0.6b-v2` for state-of-the-art accuracy
  - For multilingual: Use `large-v3` with the `CTranslate2` backend
- Chunk Duration: 10 seconds works well. The app automatically handles longer recordings by chunking and stitching.
- Speak Naturally: The intelligent stitching algorithm handles sentence boundaries well. Don't worry about pausing between chunks.
- Wait for the Pop: The audio notification confirms recording has started. Speak after you hear it.
- Clean Release: Release the hotkey cleanly after you finish speaking. The transcription starts immediately.
- Parakeet Setup: If using Parakeet, install NeMo first: `pip install nemo_toolkit[asr]`
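
To make the chunking and stitching tips more concrete, here is a rough sketch of how overlapping chunks can be laid out and how two overlapping transcripts can be merged. It uses the `CHUNK_DURATION`, `CHUNK_OVERLAP`, and `SAMPLE_RATE` values from the example `.env`. `chunk_bounds` and `stitch` are hypothetical names, and the word-overlap merge is a deliberately naive stand-in: the app's own stitching algorithm is not described here and is likely more sophisticated.

```python
# Illustrative only; not the app's actual chunking or stitching code.
def chunk_bounds(total_samples, sample_rate=16000, chunk_s=10, overlap_s=2):
    """Return (start, end) sample indices for overlapping chunks."""
    chunk = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk duration")
    if total_samples <= 0:
        return []
    bounds, start = [], 0
    while True:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end >= total_samples:
            break
        start += step
    return bounds


def stitch(left, right, max_words=20):
    """Join two transcripts, dropping words repeated across the overlap."""
    lw, rw = left.split(), right.split()
    for n in range(min(len(lw), len(rw), max_words), 0, -1):
        # Compare the last n words of `left` with the first n of `right`.
        if [w.lower() for w in lw[-n:]] == [w.lower() for w in rw[:n]]:
            return " ".join(lw + rw[n:])
    return " ".join(lw + rw)  # no overlap found; simply concatenate


if __name__ == "__main__":
    # A 25-second recording at 16 kHz yields three overlapping chunks.
    print(chunk_bounds(25 * 16000))
    print(stitch("the quick brown fox", "brown fox jumps over"))
```

With a 10-second chunk and a 2-second overlap, each new chunk starts 8 seconds after the previous one, so words spoken near a boundary appear in both transcripts and can be deduplicated.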
Hotkey not working:
- Run as Administrator: The `keyboard` library may need admin privileges for global hotkeys
- Try a simpler hotkey: Change to `f9` or `ctrl+shift+r` in `.env`
- Check for conflicts: Another app might be using the same hotkey

No pop sound:
- Ensure `files/pop.wav` exists
- Check Windows sound settings

Transcription not copied to the clipboard:
- Verify `pyperclip` is installed: `pip install pyperclip`
- Check `COPY_TO_CLIPBOARD=true` in `.env`

Microphone not recording:
- Verify the microphone index with the device-listing command above
- Check Windows sound settings → Recording devices
- Ensure no other apps are using the microphone

GPU not detected:
- Run `python verify_setup.py` to check GPU status
- Ensure NVIDIA drivers are up to date
- Check that PyTorch sees your GPU: `python -c "import torch; print(torch.cuda.is_available())"`

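
Several of the checks above can be bundled into one quick diagnostic. The sketch below is a hypothetical helper, not the project's `verify_setup.py`; it only tests whether `files/pop.wav` exists and whether the Python packages mentioned in this section can be found, without importing them.

```python
# Hypothetical diagnostic; verify_setup.py in the repository may differ.
import importlib.util
from pathlib import Path


def module_available(name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def quick_checks(base_dir="."):
    """Return a dict mapping each check to True (ok) or False (problem)."""
    return {
        "files/pop.wav present": (Path(base_dir) / "files" / "pop.wav").is_file(),
        "keyboard installed": module_available("keyboard"),
        "pyperclip installed": module_available("pyperclip"),
        "pyaudio installed": module_available("pyaudio"),
        "torch installed": module_available("torch"),
    }


if __name__ == "__main__":
    for label, ok in quick_checks().items():
        print(f"{'OK ' if ok else 'FAIL'} {label}")
```

Run it from the project directory so the relative `files/pop.wav` path resolves. A passing `torch installed` check only means the package is present; GPU visibility still needs the `torch.cuda.is_available()` command.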
With an RTX 4080 and the `large-v3` model:
- 10-second chunk: ~1.5s transcription time
- Real-time factor: ~6-7x faster than real-time
- Memory usage: ~3-4GB VRAM
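
The real-time factor quoted above is simply audio duration divided by transcription time. A small sketch of that arithmetic, with `real_time_factor` as a hypothetical helper name:

```python
def real_time_factor(audio_seconds, transcription_seconds):
    """How many times faster than real time the transcription ran."""
    if transcription_seconds <= 0:
        raise ValueError("transcription time must be positive")
    return audio_seconds / transcription_seconds


# Figures quoted above: a 10-second chunk transcribed in about 1.5 seconds.
rtf = real_time_factor(10, 1.5)
print(f"{rtf:.1f}x faster than real-time")  # prints "6.7x faster than real-time"
```

Timing your own chunks the same way gives a comparable figure for other GPUs or models.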
conda activate whisper
python whisper_hotkey.py

Hold your hotkey, speak, release, paste! 🎤✨