A local voice-to-text dictation application using OpenAI Whisper with AI post-processing from Qwen 2.5. Type with your voice in any application - browsers, chat windows, code editors, and more.
- Push-to-talk recording: Press and hold a hotkey to record, release to transcribe
- System-wide input: Works in any application (browsers, editors, chat apps, Claude, etc.)
- Local processing: Uses Whisper AI running locally on your computer (no cloud, no API costs)
- Fast transcription: Uses faster-whisper (4x faster than standard Whisper)
- AI post-processing: Optional cleanup using a local Qwen 2.5 language model — fixes grammar, capitalization, punctuation, contractions, quotations, and sentence breaks; removes filler words (um, uh) and stutters
- On-Screen Recognition (OSR): Captures the active window via OCR to improve name accuracy and adapt formatting for chat, email, code, terminal, and documents
- Self-learning recognition: Passively builds per-app profiles over time — learns vocabulary, communication style, and app types so transcription accuracy improves the more you use it
- Estimated accuracy: Displays Whisper's confidence score on each transcription with detected app context
- Custom dictionary: Post-transcription word replacement with exact and fuzzy matching
- Recording overlay: Floating pill widget with live waveform visualization, feature badges, and app detection
- Sound effects: Audible start/stop tones with custom sound support
- Simple interface: Dark-themed system tray application with toast notifications
- Configurable: Customize hotkey, model size, audio device, and typing method
- Automatic model download: Downloads the speech model on first launch with progress animation
- Cross-platform: Works on Windows, Linux, and macOS
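The custom dictionary's exact and fuzzy matching could look roughly like this — a minimal sketch using Python's standard `difflib`; the `REPLACEMENTS` entries and the cutoff value are illustrative, not the app's actual configuration:

```python
import difflib

# Hypothetical dictionary: transcribed form -> preferred replacement
REPLACEMENTS = {"kubernetes": "Kubernetes", "pyside": "PySide6"}

def apply_dictionary(text: str, cutoff: float = 0.8) -> str:
    """Replace words using exact matches first, then fuzzy matches."""
    out = []
    for word in text.split():
        key = word.lower().strip(".,!?")
        if key in REPLACEMENTS:                      # exact match
            out.append(REPLACEMENTS[key])
        else:                                        # fuzzy match (misspellings)
            close = difflib.get_close_matches(key, REPLACEMENTS, n=1, cutoff=cutoff)
            out.append(REPLACEMENTS[close[0]] if close else word)
    return " ".join(out)

print(apply_dictionary("deploy it on kubernets"))  # fuzzy hit on "kubernets"
```

Fuzzy matching lets the dictionary catch Whisper's near-miss spellings without requiring an entry for every variant.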
- Windows 10/11, Linux, or macOS
- Python 3.12 (managed via uv)
- Microphone
- Tesseract OCR (Linux/macOS only, optional — for On-Screen Recognition)
- Linux: `sudo apt-get install tesseract-ocr`
- macOS: `brew install tesseract`
- Windows: Not needed (uses built-in Windows OCR)
- Clone or download this repository
- Install uv (if not already installed):

  Windows (PowerShell):

  ```powershell
  powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
  ```

  Linux / macOS (terminal):

  ```sh
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Linux/macOS only — install Tesseract OCR (optional, for On-Screen Recognition):

  ```sh
  # Linux (Debian/Ubuntu)
  sudo apt-get install tesseract-ocr
  # macOS
  brew install tesseract
  ```
- Run the application:

  | Platform | Script |
  |---|---|
  | Windows | Double-click `Start Resonance (Windows).bat` |
  | Linux / macOS | Run `./Start Resonance (Linux, Mac).sh` |

  On first launch, dependencies install automatically and the speech model downloads (~140 MB).

  Troubleshooting? Use the "with console" variant (`Start Resonance (Windows, with console).bat` or `Start Resonance (Linux, Mac, with console).sh`) to see detailed output.
- Launch the application (system tray icon will appear)
- On first launch, the Balanced model (~140 MB) downloads automatically
- Open any application where you want to type
- Press and hold `Ctrl+Alt` (default hotkey)
- Speak into your microphone
- Release the hotkey to transcribe
- Text will be automatically typed into the active window
Right-click the system tray icon and select Settings to configure:
- Hotkey: Change the push-to-talk keyboard shortcut
- Quality: Choose a Whisper model for speech recognition
- Fastest: Whisper Tiny (~70 MB), sub-second
- Balanced: Whisper Base (~140 MB), sub-second (default)
- Accurate: Whisper Small (~500 MB), ~2s
- Precision: Whisper Medium (~1.5 GB), ~5s
- Post-Processing (AI): Enable/disable AI-powered transcription cleanup using Qwen 2.5 1.5B (downloaded automatically, runs locally via llama.cpp)
- On-Screen Recognition (OSR): Enable OCR-based screen capture for app-aware formatting and name accuracy (requires Post-Processing)
- Self-Learning Recognition: Enable persistent per-app learning that improves over time (requires OSR)
- Audio Device: Select which microphone to use (WASAPI devices only for clean device list)
- Entry Method: Choose between clipboard paste or character-by-character typing
- Custom Dictionary: Add word replacements applied after transcription
- Usage Statistics: Track words dictated, transcriptions, time saved, and more
- Learning Statistics: Apps learned, words learned, top app, and average confidence
- Bug Report: Submit a pre-filled GitHub issue with system info and recent logs directly from Settings
- Python 3.12 (pinned via `.python-version`, managed by uv)
- PySide6: GUI framework (system tray, settings, overlays, toast notifications)
- QtMultimedia: Cross-platform audio playback for notification tones
- sounddevice: Audio recording
- faster-whisper: Speech recognition (CTranslate2 backend, CPU-optimized)
- llama.cpp (llama-server): Local inference server for post-processing
- Qwen 2.5 1.5B Instruct (GGUF Q4_K_M): Language model for transcription cleanup
- winocr (Windows) / pytesseract (Linux/macOS): OCR for screen context capture
- pywinctl: Cross-platform window management (Linux/macOS active window detection)
- mss: Screenshot capture for OCR
- pynput: Global hotkeys and keyboard simulation
- pyperclip: Clipboard-based text entry
- Global hotkey listener detects when you press/release the configured hotkey
- Audio is recorded from your microphone at 16kHz (Whisper's native sample rate)
- If OSR is enabled, OCR captures the active window in a background thread during recording (~56ms). If self-learning is also enabled, the captured data updates per-app vocabulary and style profiles
- When you release the hotkey, the audio is sent to faster-whisper for transcription (OCR-detected names are passed as vocabulary hints)
- If post-processing is enabled, the text is cleaned up by Qwen 2.5 via a local llama-server instance with an app-type-specific prompt (chat, email, code, terminal, or document)
- Custom dictionary replacements are applied
- Text is typed into the currently focused window via clipboard paste or keyboard simulation
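The release-to-type flow above can be sketched structurally — this is a minimal illustration with stand-in functions, not the app's actual code; the real transcriber, post-processor, and typer are the components listed earlier:

```python
from typing import Callable, Optional

def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes, Optional[str]], str],   # faster-whisper stand-in
    postprocess: Optional[Callable[[str], str]],         # Qwen 2.5 stand-in (None = disabled)
    dictionary: Callable[[str], str],                    # custom dictionary replacements
    type_out: Callable[[str], None],                     # clipboard paste / key simulation
    vocab_hints: Optional[str] = None,                   # OCR-detected names, if OSR is on
) -> str:
    """Hotkey released -> transcribe -> optional AI cleanup -> dictionary -> type."""
    text = transcribe(audio, vocab_hints)
    if postprocess is not None:
        text = postprocess(text)
    text = dictionary(text)
    type_out(text)
    return text

# Demo with stubs standing in for the real components:
typed = []
result = run_pipeline(
    b"...",
    transcribe=lambda audio, hints: "hey sarah its done",
    postprocess=lambda t: t.replace("its", "it's").capitalize() + ".",
    dictionary=lambda t: t,
    type_out=typed.append,
)
# result == "Hey sarah it's done."
```

The key design point is that each stage is optional and composable: disabling post-processing simply skips that step, while the dictionary and typing stages always run.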
Each feature builds on the previous one. Here's what changes at each level, using real examples of what you'd get if you said the same thing out loud.
Raw transcription from Whisper. It catches most words accurately and strips obvious filler sounds (um, uh), but punctuation, capitalization, and grammar are inconsistent.
| You say | You get |
|---|---|
| "yeah i was talking to jake about the uh the kubernetes deployment and he said its basically done" | yeah I was talking to Jake about the the Kubernetes deployment and he said its basically done |
| "hey sarah i wanted to follow up on the meeting about the robinson account" | hey Sarah I wanted to follow up on the meeting about the Robinson account |
| "can you check if the env variable for the redis connection string is set" | can you check if the env variable for the Redis connection string is set |
| "thanks for getting back to me so quickly i really appreciate it talk to you soon" | thanks for getting back to me so quickly I really appreciate it talk to you soon |
Whisper strips "uh" and "um" but misses stutters ("the the"), drops punctuation, and has inconsistent capitalization ("i" vs "I"). Every app gets the same raw output.
Enables the local Qwen 2.5 language model to clean up Whisper's output. Fixes grammar, capitalization, punctuation, contractions, stutters, and sentence breaks.
| You say | Whisper only | + Post-Processing |
|---|---|---|
| "yeah i was talking to jake about the uh the kubernetes deployment and he said its basically done" | yeah I was talking to Jake about the the Kubernetes deployment and he said its basically done |
Yeah, I was talking to Jake about the Kubernetes deployment and he said it's basically done. |
| "hey sarah i wanted to follow up on the meeting about the robinson account" | hey Sarah I wanted to follow up on the meeting about the Robinson account |
Hey Sarah, I wanted to follow up on the meeting about the Robinson account. |
| "thanks for getting back to me so quickly i really appreciate it talk to you soon" | thanks for getting back to me so quickly I really appreciate it talk to you soon |
Thanks for getting back to me so quickly. I really appreciate it. Talk to you soon. |
| "the the project is almost done i think we should uh deploy it tomorrow" | the the project is almost done I think we should deploy it tomorrow |
The project is almost done. I think we should deploy it tomorrow. |
Stutter ("the the") removed, contractions fixed ("its" to "it's"), sentence breaks added, proper punctuation and capitalization throughout. However, the same formal style is applied everywhere — a Discord message gets the same treatment as an email.
OCR captures your active window during recording. Two things improve:
- Name accuracy: Proper nouns visible on screen (colleague names, project names, technical terms) are fed to Whisper as vocabulary hints, so it spells them correctly
- App-aware formatting: The post-processing prompt changes based on what app you're in
| App type | What changes |
|---|---|
| Chat (Discord, Slack, Teams) | Keeps slang (lol, lmao, tbh, ngl), preserves "like", lowercase start, no trailing period, keeps casual emphasis (yeah yeah, fr fr), preserves informal contractions (tryna, gonna, wanna) |
| Email (Outlook, Gmail) | Professional tone, complete sentences, proper greetings preserved |
| Code (VS Code, PyCharm) | Preserves camelCase, snake_case, technical terms, file extensions |
| Terminal (PowerShell, cmd) | Preserves command names, flags, paths, technical terms |
| Document (Word, Notion) | Well-structured sentences, breaks run-on speech into clear paragraphs |
Example — same sentence, different apps:
| You say | Post-Processing only | + OSR in Discord (Chat) | + OSR in Outlook (Email) |
|---|---|---|---|
| "yeah ngl i think we should just push it to tomorrow tbh" | Yeah, I think we should just push it to tomorrow. |
yeah ngl I think we should just push it to tomorrow tbh |
Yeah, I think we should just push it to tomorrow. |
| "hey can you send me that report when you get a chance" | Hey, can you send me that report when you get a chance? |
hey can you send me that report when you get a chance? |
Hey, can you send me that report when you get a chance? |
The Chat prompt keeps the message casual — slang stays, lowercase start, no unnecessary period. The Email prompt keeps it professional. Without OSR, everything gets the same generic treatment.
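The app-type-specific prompts can be modeled as a simple lookup with a generic fallback — the prompt wording below is illustrative, not the app's actual prompts:

```python
# Illustrative system prompts keyed by detected app type.
PROMPTS = {
    "chat": "Clean up lightly. Keep slang, lowercase starts, and informal contractions.",
    "email": "Clean up into professional, complete sentences. Preserve greetings.",
    "code": "Clean up but preserve camelCase, snake_case, and file extensions.",
    "terminal": "Clean up but preserve command names, flags, and paths.",
    "document": "Clean up into well-structured sentences and clear paragraphs.",
}

def system_prompt(app_type: str, style_hint: str = "") -> str:
    """Pick the prompt for the detected app type; unknown apps get a generic
    cleanup prompt. An optional learned style hint is appended at the end."""
    base = PROMPTS.get(app_type, "Clean up grammar, punctuation, and capitalization.")
    return f"{base} {style_hint}".strip() if style_hint else base
```

The fallback branch is what produces the "same generic treatment" described above when OSR is off or the app type is unknown.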
Name accuracy example: If your coworker "Priya Raghavan" is visible in a Slack thread, OSR feeds that name to Whisper. Without it, Whisper might transcribe "Priya Ragavan" or "Priya Raghaven". With OSR, the correct spelling is hinted.
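Combining on-screen names with learned vocabulary into a single hint string might look like this — a sketch of the merge step only (in faster-whisper, such hints are typically supplied via the transcriber's initial prompt):

```python
def build_vocab_hints(screen_names, learned_names, limit=20):
    """Merge on-screen proper nouns with learned vocabulary into one hint string,
    deduplicating case-insensitively while keeping on-screen names first."""
    seen, hints = set(), []
    for name in list(screen_names) + list(learned_names):
        key = name.lower()
        if key not in seen:
            seen.add(key)
            hints.append(name)
    return ", ".join(hints[:limit])

print(build_vocab_hints(["Priya Raghavan"], ["Dmitri", "priya raghavan"]))
# -> "Priya Raghavan, Dmitri"
```

On-screen names take priority because they are the most likely to be spoken in the current context; the `limit` keeps the hint short enough not to crowd out the audio itself.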
Builds persistent per-app profiles that improve over time. Two things are added on top of OSR:
- Vocabulary from past sessions: Names and terms you've encountered before in an app are used as Whisper hints even when they aren't visible on the current screen. If "Priya Raghavan" appeared in Slack last week, the learning engine remembers and hints it for future transcriptions in Slack
- Style adaptation: After 3+ sessions in an app, the engine learns communication patterns (formality level, punctuation habits, capitalization style) and adjusts the post-processing prompt to match
| Feature | OSR only | + Self-Learning |
|---|---|---|
| Vocabulary hints | Only names visible on screen right now | Names from screen + all names seen in this app before |
| Style prompt | Fixed per app type | Adapts to observed patterns (casual vs formal, punctuation density, etc.) |
| Overlay badge | Generic type ("Chat", "Email") | Specific app name ("Discord", "Outlook") |
| Persistence | None — starts fresh each session | Profiles saved to disk, improve across sessions |
Practical example: You use Slack daily with teammates named "Dmitri", "Xiaowen", and "Kayleigh". After a few sessions with self-learning on, these names are in your Slack vocabulary profile. Even when none of them are visible on screen, Whisper gets them as hints and spells them correctly. Without self-learning, Whisper would only get hints for names currently visible on the screen.
| Layer | What it adds | Requires |
|---|---|---|
| Whisper only | Raw speech-to-text | Nothing |
| + Post-Processing | Grammar, punctuation, capitalization, stutter removal, sentence breaks | Qwen 2.5 model (~1.1 GB download) |
| + OSR | App-aware formatting, name accuracy from screen | Post-Processing |
| + Self-Learning | Persistent vocabulary, style adaptation, improves over time | OSR |
Make sure uv is installed (see Installation step 2) and restart your terminal/command prompt after installation to refresh your PATH.
The launch scripts set a local cache (`UV_CACHE_DIR`) to avoid OneDrive hardlink issues. If you still see hardlink errors, your uv cache may be in a OneDrive-synced AppData folder.
Solution: Edit the launch script and add `--link-mode=copy` to the `uv sync` command.
- Check that your microphone is working and selected in Settings
- Ensure you're speaking clearly and loudly enough
- Try a larger model size for better accuracy
- Check for conflicts with other applications using the same hotkey
- Try changing to a different hotkey combination in Settings
- Some applications with anti-cheat or security features may block global hotkeys
- Some security-focused applications may block simulated keyboard input
- Try switching to clipboard paste in Settings
- The Whisper model downloads automatically on first launch (~140 MB for the default Balanced model)
- Models are cached locally, so subsequent runs start instantly
MIT License
- Light / Dark / System theme — Choose light mode, dark mode, or follow the system setting, selectable in Settings.
- In-app updater — Shipped in v3.1.1.
- On-screen recognition (OCR) — Shipped in v3.0.0.
- Self-learning recognition — Shipped in v3.0.0.
- Cross-platform (Linux & macOS) — Shipped in v3.3.0.
- Fix: auto-update applies but version unchanged (infinite update loop) — The update batch script used `xcopy` to copy new files over old, but xcopy doesn't delete files that no longer exist. Both old and new `resonance-*.dist-info/` directories coexisted in `_internal/`, and `importlib.metadata.version()` found the old version first alphabetically. The app still thought it was on the old version and offered the update again in a loop. Fixed by deleting all old `resonance-*.dist-info/` directories before copying new files
- Auto-updater test release
- Fix: worker thread callbacks not running on main thread — `QueuedConnection` on plain Python functions doesn't work in PySide6 (no receiver QObject for thread dispatch). All worker signal connections now relay through QObject signals: worker → VTTApplication relay signal → callback. This guarantees callbacks run on the main GUI thread. Fixes update toast invisible, settings crash on Check for Updates, and post-download startup toast missing
- Fix: auto-update not applying — The update batch script was created inside the app directory (visible clutter) and launched with `DETACHED_PROCESS`, which silently failed on some systems. Now extracts to system temp, writes the batch script to system temp, uses `CREATE_NEW_PROCESS_GROUP` for reliable subprocess launch, and includes diagnostic logging in the batch script
- Fix: update toast not appearing — Auto-update check found new versions but the toast never showed. Signal callbacks from worker threads to plain Python functions used `AutoConnection`, which defaults to `DirectConnection` (runs on the worker thread). Qt widgets created/modified from non-GUI threads silently fail. Fixed by using explicit `QueuedConnection` for all update worker signals
- Fix: "Check for Updates" crash in Settings — Clicking the button found the update but then crashed the app. Same root cause — the callback modified GUI widgets from the worker thread, causing a segfault. Fixed with `QueuedConnection`
- Fix: post-processor answering questions instead of cleaning them — The question-answer hallucination guard failed when input started with filler words containing punctuation (e.g., "Okay, how does it look?"). The filler stripping compared `"okay,"` against `"okay"` and didn't match, so the question word "how" was never reached and the guard was skipped. Fixed by stripping punctuation before the filler comparison. Also added explicit `?` detection so any input containing a question mark triggers the guard
- Fix: model download crash in EXE — PyInstaller windowed mode sets `sys.stderr` to `None`, causing `tqdm`/`huggingface_hub` to crash with "NoneType has no attribute 'write'" when downloading models from Settings. Fixed by redirecting to devnull
- Fix: crash on cancelling model download — Closing or cancelling a download dialog while `snapshot_download()` was running would crash the app. Worker signals are now disconnected before cleanup, and stuck threads are safely detached
- Fix: model combo not reverting on failed download — After a failed download, the dropdown stayed on the failed model, causing repeated download attempts on Save. Now reverts to the previously saved model
- Fix: download toast stuck after first-run install — The "Installing model" toast would not dismiss after the download completed. Now cleanly hides and shows the startup toast
- Fix: "Learning OSR" badge shown without dependencies — The overlay badge checked the config flag (default: true) instead of the actual engine instance. Fresh installs showed "Learning OSR" even with post-processing and OSR off. Default changed to false and badge now checks engine state
- Fix: bundled sounds missing — PyInstaller spec only bundled icons, not the custom piano tone WAV files. Added `src/resources/sounds/` to the build
- Fix: SSL DLL mismatch in EXE — PyInstaller picked up PySide6's OpenSSL DLLs instead of Python's, causing `_ssl` import failures on machines without Python. Spec now force-bundles Python's own `libssl`/`libcrypto` DLLs
- Portable EXE: Distributable as a single folder — extract the ZIP, double-click `Resonance.exe`, no Python or installer required. All data (models, config, logs) stored relative to the app directory
- Auto-updater: Checks GitHub Releases 8 seconds after launch. Shows an interactive toast with Yes/No (auto-dismisses after 10s). On accept, downloads the update and writes a batch script that restarts the app with the new version. Also adds a "Check for Updates" button and version display in Settings
- Download auto-recovery: Detects and cleans up partially downloaded models (`.incomplete` blobs or missing `model.bin`) on startup and before retries, so interrupted downloads no longer cause cryptic errors
- PyInstaller build spec: `resonance.spec` bundles faster-whisper, CTranslate2 native DLLs, icons, and package metadata for `importlib.metadata.version()` support
- packaging dependency: Added `packaging>=23.0` for semantic version comparison in the updater
- Self-learning pipeline wiring: Learned vocabulary from past sessions is now merged with OCR proper nouns and fed to Whisper as vocabulary hints. Style adaptation hints are appended to the post-processing system prompt. Previously, self-learning only observed and recorded data — now it actively improves transcription accuracy
- Feature Layers documentation: Added a detailed section to README showing concrete before/after examples for each feature level (Whisper only → Post-Processing → OSR → Self-Learning)
- On-Screen Recognition (OSR): OCR captures the active window during recording to extract proper nouns as Whisper vocabulary hints and detect app type (chat, email, code, terminal, document) for format-specific post-processing prompts
- Self-learning recognition: Passively builds per-app profiles over time — learns vocabulary, communication style (message length, capitalization, punctuation, formality), and app types with increasing confidence. Profiles persist across sessions in a separate JSON store
- App detection badges: During typing, shows detected app context above the transcription pill — generic type ("Chat", "Email") with OSR only, specific app name ("Discord", "Outlook") with self-learning enabled. Hidden for general/unknown apps
- Estimated accuracy badge: Displays Whisper's confidence score (derived from avg_logprob) on every transcription
- Terminal app type: Discriminates terminals from code editors with dedicated formatting prompt
- WASAPI audio filtering: Microphone dropdown shows only WASAPI devices, eliminating duplicate entries from MME/DirectSound/WDM-KS
- Dependency-chained settings: Post-Processing → OSR → Self-Learning toggles with grayed-out labels showing requirements
- Learning statistics: Four stat cards in settings — Apps Learned, Words Learned, Top App, Avg Confidence
- Comma-spam guard: Post-processor rejects output with excessive comma insertion
- Larger typing indicator: "Text Entered" and "Typing" pill enlarged with bigger text for better visibility
- Bug report button: Settings dialog includes a "Report Bug..." button that opens a pre-filled GitHub issue with system info and recent logs
- Scrollable settings dialog: Settings now scroll vertically on small screens with fixed Save/Cancel buttons at bottom
- Startup model download: Automatically downloads the speech model on first launch with animated progress toast; hotkey is disabled until download completes
- Default model changed: New installations default to Balanced (base, ~140 MB) instead of Accurate (small, ~500 MB) for faster first-run
- No speech detected: Recording overlay shows "No speech detected" in red when transcription returns empty
- Scroll wheel fix: Mouse wheel no longer accidentally changes dropdown selections while scrolling settings
- Improved scrollbar styling: Thinner, transparent scrollbar that blends with the dark theme
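The comma-spam guard listed above could work roughly like this — the threshold is illustrative, not the app's actual value:

```python
def is_comma_spam(original: str, cleaned: str, max_new_commas: int = 4) -> bool:
    """Flag post-processed output where the model inserted an excessive
    number of new commas compared with the raw transcription."""
    return cleaned.count(",") - original.count(",") > max_new_commas
```

When the guard trips, the post-processor would fall back to the raw Whisper output rather than accept the over-punctuated rewrite.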
- AI post-processing: Local Qwen 2.5 model cleans up grammar, punctuation, capitalization, contractions, quotations, sentence breaks, filler words, and stutters
- Recording overlay badges: Shows active features (Post-Processing: ON) above the recording pill
- Startup toast: Displays model, post-processing status, and entry method on launch
- Clipboard/typing toast: Visual confirmation showing "Text entered" or "Typing" after transcription
- Overlay typing states: Green "Complete" and "Text Entered" states, animated dots during character-by-character output
- Usage statistics: Dashboard with 8 stat cards (words dictated, transcriptions, time saved, avg WPM, and more)
- Dark theme with rounded frameless dialogs
- Model download progress UI in settings
- Recording overlay with live waveform visualization
- Custom dictionary with fuzzy matching
- Sound effects (start/stop tones) with custom WAV support
- Thread-safe hotkey handling
Built with:
- OpenAI Whisper
- faster-whisper
- Qwen 2.5 (post-processing)
- llama.cpp (local inference)
- PySide6