An open-source, lightweight, and high-performance dictation tool with AI-powered formatting - an unofficial Python remake of wisprflow.
- OS: Windows 10/11 (uses Windows APIs for window detection and keyboard input)
- GPU: NVIDIA GPU with CUDA support (recommended for best performance)
- CPU mode available but significantly slower
- RAM: 4GB+ (8GB+ recommended for larger Whisper models)
- Microphone: Any input device (USB microphones like HyperX QuadCast work great)
Download and install Python from python.org
- Download the latest driver from NVIDIA's website
- Select your GPU model and install
- Download CUDA Toolkit 12.4
- Run the installer and select:
- CUDA Toolkit
- CUDA Development
- CUDA Runtime
- Default installation path: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4`
- Download cuDNN 9.17 for CUDA 12.x
- Extract the archive
- Copy files to the CUDA installation:

  ```
  cuDNN/bin/*.dll     → C:\Program Files\NVIDIA\CUDNN\v9.17\bin\12.9\
  cuDNN/include/*.h   → C:\Program Files\NVIDIA\CUDNN\v9.17\include\
  cuDNN/lib/*.lib     → C:\Program Files\NVIDIA\CUDNN\v9.17\lib\
  ```
Verify the installation:

```
nvcc --version
nvidia-smi
```

Note: If you don't have an NVIDIA GPU, you can still use CPU mode by changing `WHISPER_DEVICE = "cpu"` in wsprv2.py (line 30).
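If you'd rather have the script fall back automatically than edit the constant by hand, a cheap heuristic is to check whether the NVIDIA driver tools are on PATH. This is a sketch; `detect_device` is a hypothetical helper, not part of wsprv2.py:

```python
import shutil

def detect_device() -> str:
    """Return "cuda" if the NVIDIA driver appears to be installed, else "cpu".

    Uses the presence of nvidia-smi on PATH as a cheap heuristic: a machine
    without the NVIDIA driver will not have it.
    """
    return "cuda" if shutil.which("nvidia-smi") else "cpu"

print(detect_device())
```

Note this only detects the driver, not whether CUDA/cuDNN are correctly installed.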
While not strictly required for this project, FFmpeg can improve audio compatibility:
- Download from ffmpeg.org
- Extract to a folder (e.g., `C:\ffmpeg`)
- Add `C:\ffmpeg\bin` to PATH
```
git clone https://github.com/yourusername/wsprflowpy.git
cd wsprflowpy
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

- Copy the example environment file:

  ```
  copy .env.example .env
  ```

- Edit `.env` and add your API keys:

  ```
  # Optional: Hugging Face token for downloading models
  HF_TOKEN=your_huggingface_token_here

  # Required: OpenRouter API key for Claude formatting
  OPENROUTER_API_KEY=your_openrouter_api_key_here
  ```
- OpenRouter: Sign up at openrouter.ai and create an API key
- Used for AI-powered transcript formatting with Claude
- Pay-per-use pricing (very affordable for personal use)
- Hugging Face (optional): Sign up at huggingface.co
- Only needed if you encounter model download issues
The application automatically adds CUDA to PATH (lines 22-23 in wsprv2.py). Verify these paths match your installation:
```python
os.environ['PATH'] = r'C:\Program Files\NVIDIA\CUDNN\v9.17\bin\12.9' + os.pathsep + os.environ['PATH']
os.environ['PATH'] = r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin' + os.pathsep + os.environ['PATH']
```

Adjust these paths if your CUDA/cuDNN is installed elsewhere.
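A wrong install location fails late with a cryptic DLL error; the prepends can be guarded with an existence check to catch it early. This is a sketch with a hypothetical `prepend_to_path` helper, not the app's actual code:

```python
import os

def prepend_to_path(directory: str) -> bool:
    """Prepend directory to PATH if it exists; return whether it was added."""
    if not os.path.isdir(directory):
        return False
    os.environ['PATH'] = directory + os.pathsep + os.environ['PATH']
    return True

# The same default locations as in wsprv2.py; adjust to your install.
for d in (r'C:\Program Files\NVIDIA\CUDNN\v9.17\bin\12.9',
          r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin'):
    if not prepend_to_path(d):
        print(f'Warning: {d} not found; CUDA libraries may fail to load')
```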
- Start the application:

  ```
  python wsprv2.py
  ```

- Microphone selection: on first run, you'll be prompted to select your microphone:

  ```
  Available input devices:
    [0] Microsoft Sound Mapper - Input
    [1] HyperX QuadCast S (2- HyperX)
    [2] Microphone Array (Realtek)
    ...
  Select input device index: 1
  ```

  - Your selection is saved to `mic_config.json` and remembered for future runs
  - The app will auto-suggest preferred mics (HyperX, QuadCast, etc.)
- Wait for initialization:

  ```
  --- Initializing Clients ---
  Loading Whisper model...
  Whisper model loaded in 2.34s.
  OpenRouter client initialized.
  ```
You're ready! The status badge will appear at the bottom center of your screen.
- Ctrl + Alt (Hold): Start recording for dictation
- Hold both keys and speak
- Release to stop recording
- Transcribed and formatted text is automatically pasted
The minimal UI badge shows current status:
- Thin line: Idle, ready to record
- Red waveform: Recording in progress
- Pulsing dots: Processing audio
Edit wsprv2.py to customize:
```python
# Whisper STT Configuration (lines 29-32)
WHISPER_MODEL_NAME = "small.en"   # tiny.en, base.en, small.en, medium.en, large-v3
WHISPER_DEVICE = "cuda"           # "cuda" or "cpu"
WHISPER_COMPUTE_TYPE = "float16"  # float16 (GPU), int8 (CPU/low-end GPU)
WHISPER_BEAM_SIZE = 5             # Higher = more accurate but slower

# LLM for formatting (line 35)
MODEL_FORMATTER = "anthropic/claude-haiku-4.5"  # Fast and cost-effective
```

| Model | Size | Speed (GPU) | Accuracy | Use Case |
|---|---|---|---|---|
| tiny.en | 39MB | Fastest | Good | Quick notes, drafts |
| base.en | 74MB | Very fast | Better | General use |
| small.en | 244MB | Fast | Great | Recommended |
| medium.en | 769MB | Moderate | Excellent | High accuracy needed |
| large-v3 | 1.5GB | Slower | Best | Maximum accuracy |
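The device/compute-type pairing can be wrapped in a small helper. This is a sketch (the helper name and defaults are mine; the keyword names mirror faster-whisper's `WhisperModel` parameters):

```python
def whisper_kwargs(device: str = "cuda", model: str = "small.en") -> dict:
    """Build WhisperModel keyword arguments with a compute type that suits
    the device: float16 needs a CUDA GPU, int8 runs anywhere."""
    return {
        "model_size_or_path": model,
        "device": device,
        "compute_type": "float16" if device == "cuda" else "int8",
    }

print(whisper_kwargs("cpu"))
```

With faster-whisper installed this would be used as `WhisperModel(**whisper_kwargs())`.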
- Saved config: `mic_config.json` stores your microphone selection
- Reset config: delete `mic_config.json` to reselect your microphone
- Preferred mics: edit `MIC_PREFERRED_KEYWORDS` in `wsprv2.py` (line 72)
Place custom sound files in the `sounds/` directory:

- `dictation-start.wav`: recording started
- `dictation-stop.wav`: recording stopped
- `Notification.wav`: errors or short recordings
- Verify the CUDA installation: `nvcc --version`
- Check that the cuDNN files are in the correct location
- As a fallback, switch to CPU mode:

  ```python
  WHISPER_DEVICE = "cpu"
  WHISPER_COMPUTE_TYPE = "int8"
  ```
- Check Windows sound settings - ensure your mic is set as default recording device
- Delete `mic_config.json` and restart to reselect
- Try selecting the "system default" option when prompted
- Use a better microphone (USB mics recommended)
- Speak clearly and reduce background noise
- Try a larger Whisper model: `medium.en` or `large-v3`
- Increase the beam size: `WHISPER_BEAM_SIZE = 10`
- Ensure you're using CUDA (`WHISPER_DEVICE = "cuda"`)
- Use a smaller model: `tiny.en` or `base.en`
- Check GPU usage with `nvidia-smi` during transcription
```
# Reinstall dependencies
pip install --upgrade -r requirements.txt

# If a specific package fails, install it individually:
pip install faster-whisper --upgrade
```

- Ensure the app has focus (click the status badge)
- Try running as Administrator
- Check if another app is intercepting the hotkeys
To create a standalone .exe:
```
pip install pyinstaller
pyinstaller --onefile --windowed --add-data "sounds;sounds" wsprv2.py
```

The executable will be in the `dist/` folder.
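A onefile build unpacks `--add-data` resources into a temporary folder at runtime, so relative paths like `sounds/` break. The standard pattern for locating bundled files uses PyInstaller's `sys._MEIPASS`; a sketch (`resource_path` is a hypothetical helper, not the app's actual code):

```python
import os
import sys

def resource_path(relative: str) -> str:
    """Resolve a bundled resource path, working both when run from source
    and from a PyInstaller onefile build (which sets sys._MEIPASS)."""
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative)

print(resource_path(os.path.join("sounds", "dictation-start.wav")))
```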
```
wsprflowpy/
├── wsprv2.py           # Main application
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create from .env.example)
├── .env.example        # Example environment file
├── mic_config.json     # Microphone settings (auto-generated)
├── sounds/             # Sound effect files
│   ├── dictation-start.wav
│   ├── dictation-stop.wav
│   └── Notification.wav
└── README.md           # This file
```
- Recording: Hold Ctrl+Alt to capture audio via sounddevice
- Transcription: Audio is processed by faster-whisper (local, no network needed)
- Formatting: Raw transcript is sent to Claude via OpenRouter for cleanup
- Context: Active window is detected to apply appropriate formatting style
- Pasting: Formatted text is copied to clipboard and pasted automatically
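The context and formatting steps can be sketched as a pure function from window title and transcript to the prompt sent to the LLM. The style table and function name here are illustrative, not the app's actual identifiers:

```python
# Hypothetical style table; the real app sends the transcript to Claude via
# OpenRouter with a context-dependent prompt.
STYLES = {
    "slack": "casual chat message",
    "outlook": "professional email",
    "visual studio code": "code comment or commit message",
}

def formatting_prompt(window_title: str, transcript: str) -> str:
    """Pick a formatting style from the active window title and build the prompt."""
    title = window_title.lower()
    style = next((s for key, s in STYLES.items() if key in title), "plain prose")
    return (f"Reformat this dictated text as a {style}. "
            f"Fix punctuation and remove filler words.\n\n{transcript}")

print(formatting_prompt("Inbox - Outlook", "um so about the meeting tomorrow"))
```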
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details
- Inspired by wisprflow.ai
- Built with faster-whisper
- Formatting powered by Anthropic's Claude
If you encounter issues:
- Check the Troubleshooting section
- Open an issue on GitHub with:
- Error message
  - Python version (`python --version`)
  - CUDA version (`nvcc --version`)
  - GPU model
Note: This is an unofficial remake and is not affiliated with Wisprflow.