feat: history tab, configurable hotkey, clipboard control #7
Open
nicremo wants to merge 12 commits into
- Cloud transcription via Groq/OpenAI-compatible APIs (whisper-large-v3)
- Auto/Cloud/Local transcription mode with automatic offline fallback
- API key encrypted via macOS Keychain (Electron safeStorage)
- Default text model changed from gemma4:e4b (9.6GB) to qwen3.5:2b (2.7GB)
- Configurable API base URL (Groq, OpenAI, Lemonfox, any compatible provider)
- Language selector (German default, 11 languages available)
- Stronger same-language prompt to prevent LLM translation
- Built-in microphone preferred over external devices (AirPods fix)
- New TranscriptionCard UI with source selector, API key management
- Setup wizard with cloud/local transcription choice
- Relaxed hotkey validation: Ollama not required when enhancement is off
Dictionary & Corrections:
- Custom vocabulary tab with words and misspelling corrections
- Words sent as Whisper prompt hints for better transcription
- Corrections injected into LLM system prompt for auto-replacement
- Two-column layout: Words (left) + Misspellings (right)
- Async file lock prevents race conditions on concurrent writes
- Whisper prompt truncated at ~800 chars (224 token limit)
- IPC handlers with runtime input validation

LLM Performance:
- Disable thinking mode (think: false) for qwen3.5 models
- Reduces rewrite time from ~14s to ~0.3s
- Strip <think> tags from output as safety fallback

Pipeline logging:
- Log transcription settings, raw text, and final text for debugging
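The prompt truncation and the `<think>`-stripping fallback listed above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function names `stripThinkTags` and `buildWhisperPrompt` are hypothetical, and the comma-separated prompt format is an assumption.

```typescript
// Hypothetical helper names; the PR's actual implementation is not shown here.

/** Remove <think>...</think> blocks that a reasoning model may emit before its answer. */
function stripThinkTags(text: string): string {
  return text
    .replace(/<think>[\s\S]*?<\/think>/gi, "") // closed blocks
    .replace(/<think>[\s\S]*$/i, "") // unterminated block at the end of the output
    .trim();
}

/** Join vocabulary words into a prompt hint without exceeding a character budget. */
function buildWhisperPrompt(words: string[], maxChars: number = 800): string {
  const kept: string[] = [];
  let length = 0;
  for (const word of words) {
    const w = word.trim();
    if (!w) continue;
    const added = (kept.length > 0 ? 2 : 0) + w.length; // 2 accounts for ", "
    if (length + added > maxChars) break; // drop whole words rather than cutting one in half
    kept.push(w);
    length += added;
  }
  return kept.join(", ");
}
```

Cutting at word boundaries keeps the hint from ending in a partial word, which would otherwise be passed to the transcription model as if it were vocabulary.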
App Rules:
- Auto-detect active app and switch style/enhancement level
- 31 pre-configured dev tools (Ghostty, VS Code, Cursor, Zed, JetBrains, etc.)
- All terminals, IDEs, git clients, API/DB tools default to Vibe Coding High
- Non-dev apps use the user's manual default setting
- Editable per-app style and level on the Style page
- Rules stored in app-rules.json, add/remove/update via UI

Credits:
- Sidebar footer shows "Original by @giusmarci" and "Enhanced by Fabian Bitz"
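The rule lookup described above, with its fallback to the user's manual default, might be sketched as follows. The `AppRule` shape, the `resolveStyle` name, and matching on a case-insensitive app name are all assumptions; the actual schema of app-rules.json is not shown in this PR.

```typescript
// Hypothetical types: the real app-rules.json schema may differ.
interface StyleSetting {
  style: string;
  level: string;
}

interface AppRule extends StyleSetting {
  appName: string;
}

/** Pick the per-app rule when one exists, otherwise keep the user's manual default. */
function resolveStyle(
  activeApp: string | null,
  rules: AppRule[],
  userDefault: StyleSetting,
): StyleSetting {
  if (!activeApp) return userDefault; // active app could not be detected
  const wanted = activeApp.trim().toLowerCase();
  const rule = rules.find((r) => r.appName.toLowerCase() === wanted);
  return rule ? { style: rule.style, level: rule.level } : userDefault;
}
```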
Hotkey:
- Switched from Fn to right Option key (keyCode 61)
- Hold to record (release to transcribe)
- Double-press for Handsfree mode (records until next single press)

Handsfree Mode:
- Debounced double-click detection (300ms window)
- Red pulsing dot indicator in overlay when active
- Red border around overlay bar during handsfree recording
- Label shows "Handsfree" to distinguish from hold mode
- Single press while handsfree stops recording and transcribes

Idle Overlay:
- Minimal 5-dot indicator always visible at bottom of screen
- Shows app is ready without being intrusive
- Same border-radius (14px) as active overlay
- Semi-transparent with backdrop blur

German Language Fix:
- Language-specific reinforcement prompts written in target language
- German prompt instructs LLM in German to output German
- Preserves English technical terms when present in input
- Dynamic prompt generation based on cloudLanguage setting

Build:
- Moved electron/electron-builder to devDependencies for packaging
App Icon:
- Generated .icns from OpenWhisp logo (339px PNG -> all icon sizes)
- Placed in build/icon.icns for electron-builder to pick up

Tray Icon:
- Added build/icons to extraResources so tray icons are bundled
- Fixes missing menubar icon in packaged app

Microphone Permission:
- Check current status before requesting
- If already denied, open System Settings directly (askForMediaAccess shows no dialog when previously denied)
- Extended wait timeout to 15 attempts for manual permission grant
Root cause: Hardened Runtime blocks microphone access without explicit com.apple.security.device.audio-input entitlement. The app never appeared in macOS Microphone privacy list because the OS silently denied the request.
- Added build/entitlements.mac.plist with audio-input entitlement
- Configured electron-builder to use entitlements for both main and child processes
- Improved mic permission flow: opens System Settings when previously denied
- Cloud rewrite as primary, Ollama as fallback (same pattern as transcription)
- Default model: openai/gpt-oss-20b (1000 tokens/sec, practically free)
- 5 cloud models available: GPT-OSS 20B/120B, Qwen3 32B, Llama 3.3 70B, Llama 3.1 8B
- Cloud/Local toggle on Models page under Text Enhancement
- Auto-fallback to local Ollama when cloud is unavailable
- Ollama no longer required when rewrite mode is Cloud
- Same API key and base URL as transcription (Groq)
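The cloud-first, local-fallback pattern described above can be sketched as below. The `Rewrite` function type and `rewriteWithFallback` name are hypothetical, and the two backends are injected as plain functions so the sketch makes no claims about the real Groq or Ollama clients.

```typescript
// Hypothetical names; backends are injected so nothing here depends on a real API client.
type Rewrite = (text: string) => Promise<string>;
type RewriteMode = "cloud" | "local";

interface RewriteResult {
  text: string;
  source: RewriteMode; // which backend actually produced the text
}

async function rewriteWithFallback(
  text: string,
  mode: RewriteMode,
  cloud: Rewrite,
  local: Rewrite,
): Promise<RewriteResult> {
  if (mode === "cloud") {
    try {
      return { text: await cloud(text), source: "cloud" };
    } catch {
      // Cloud unavailable (offline, rejected key, rate limit): fall through to local.
    }
  }
  return { text: await local(text), source: "local" };
}
```

Returning the `source` alongside the text lets the caller log or display which backend was used, which helps when diagnosing a silent fallback.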
History:
- New History tab with persistent storage of all dictations
- Each entry stores final text, raw transcription, style, app name, timestamp
- Expandable entries with copy/remove actions and raw transcription diff
- Thread-safe JSON persistence with async lock (max 500 entries)

Clipboard:
- New "Copy to clipboard" toggle in Preferences (default: off)
- Clipboard write only happens when explicitly enabled or when auto-paste needs it
- Status messages updated to reference history instead of clipboard
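One common way to get the serialized, capped persistence described above is a promise-chain lock around a read-modify-write step. The sketch below assumes that approach; the field names, `withLock`, `addEntry`, and `appendHistory` are hypothetical, and storage is injected as `load`/`save` callbacks rather than real file access.

```typescript
// Hypothetical entry shape based on the fields listed in the commit message.
interface HistoryEntry {
  finalText: string;
  rawText: string;
  style: string;
  appName: string;
  timestamp: number;
}

const MAX_ENTRIES = 500;

/** Prepend the newest entry and drop the oldest ones beyond the cap. */
function addEntry(entries: HistoryEntry[], entry: HistoryEntry, max: number = MAX_ENTRIES): HistoryEntry[] {
  return [entry, ...entries].slice(0, max);
}

// Promise-chain lock: each task starts only after the previous one has settled.
let tail: Promise<void> = Promise.resolve();

function withLock<T>(task: () => Promise<T>): Promise<T> {
  const run = tail.then(task);
  tail = run.then(() => undefined, () => undefined); // keep the chain alive after failures
  return run;
}

/** Read-modify-write under the lock so concurrent appends cannot overwrite each other. */
async function appendHistory(
  entry: HistoryEntry,
  load: () => Promise<HistoryEntry[]>,
  save: (entries: HistoryEntry[]) => Promise<void>,
): Promise<void> {
  await withLock(async () => {
    const entries = await load();
    await save(addEntry(entries, entry));
  });
}
```

Without the lock, two dictations finishing close together could both load the same list and the second save would silently discard the first entry.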
- Add HotkeyConfig type with keyCode, modifiers, and label
- Swift helper accepts keyCode + modifiers as CLI arguments
- Support three hotkey modes: single modifier (e.g. Fn), modifier combo (e.g. Cmd+Option), and regular key with modifiers (e.g. Cmd+Space)
- Fn key (keyCode 63) as default with maskSecondaryFn detection
- Hotkey recorder UI in Preferences with Fn toggle and custom key capture
- Listener auto-restarts when hotkey setting changes
- Escape to cancel recording, unknown keys ignored
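To make the three hotkey modes above concrete, here is a minimal sketch of a `HotkeyConfig` and a classifier. Only the field names `keyCode`, `modifiers`, and `label` come from the commit message; the `Modifier` spelling, the way a modifier combo is represented (a modifier key code plus extra modifiers), and the functions `classifyHotkey` and `buildLabel` are assumptions.

```typescript
// Assumed modifier spelling; the PR only says the type has keyCode, modifiers, and label.
type Modifier = "cmd" | "option" | "ctrl" | "shift";

interface HotkeyConfig {
  keyCode: number;
  modifiers: Modifier[];
  label: string;
}

type HotkeyMode = "single-modifier" | "modifier-combo" | "key-with-modifiers";

// macOS virtual key codes of modifier keys: left/right Command, Shift, Option, Control, and Fn (63).
const MODIFIER_KEY_CODES = new Set<number>([54, 55, 56, 58, 59, 60, 61, 62, 63]);

/** Assumed representation: a combo is a modifier key code plus at least one extra modifier. */
function classifyHotkey(config: HotkeyConfig): HotkeyMode {
  if (!MODIFIER_KEY_CODES.has(config.keyCode)) return "key-with-modifiers";
  return config.modifiers.length === 0 ? "single-modifier" : "modifier-combo";
}

const MODIFIER_LABELS: Record<Modifier, string> = {
  cmd: "Cmd",
  option: "Option",
  ctrl: "Ctrl",
  shift: "Shift",
};

// Small illustrative subset of key names; a full table would cover every key code.
const KEY_LABELS: Record<number, string> = {
  49: "Space",
  55: "Cmd",
  58: "Option",
  61: "Right Option",
  63: "Fn",
};

function buildLabel(keyCode: number, modifiers: Modifier[]): string {
  const parts = modifiers.map((m) => MODIFIER_LABELS[m]);
  parts.push(KEY_LABELS[keyCode] ?? `Key ${keyCode}`);
  return parts.join("+");
}
```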
Clipboard: previous fix restored clipboard content immediately after triggerPaste, but CGEvent key posts are fire-and-forget. The target app had not processed Cmd+V yet, so it read the old clipboard content. Fix: clear clipboard after 500ms delay instead of restoring.

Listening: short key taps (< 300ms) started recording via the double-click timer even after key release. Fix: track physical key state and skip recording if key was already released when timer fires.

History: added live refresh via history:updated IPC broadcast so the History tab updates immediately after each dictation.
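The listening fix above (track the physical key state, and skip recording if the key was already released when the double-click timer fires) can be sketched as a small state tracker. This is an illustration under assumptions, not the PR's code: `createGestureTracker`, the handler names, and the injectable `Schedule` type are hypothetical, and the timer is injected so the logic can be exercised without real delays.

```typescript
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;
type RecordingMode = "hold" | "handsfree";

interface GestureHandlers {
  onStart: (mode: RecordingMode) => void;
  onStop: () => void;
}

// Default scheduler backed by setTimeout; swap in a manual one to drive the timer by hand.
const realSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

function createGestureTracker(
  handlers: GestureHandlers,
  schedule: Schedule = realSchedule,
  windowMs: number = 300,
) {
  let keyHeld = false; // physical key state
  let pending: Cancel | null = null; // running double-press timer, if any
  let mode: "idle" | RecordingMode = "idle";

  function down(): void {
    keyHeld = true;
    if (mode === "handsfree") {
      mode = "idle"; // a single press ends handsfree recording
      handlers.onStop();
      return;
    }
    if (pending) {
      pending(); // second press inside the window: treat as double-press
      pending = null;
      mode = "handsfree";
      handlers.onStart("handsfree");
      return;
    }
    pending = schedule(() => {
      pending = null;
      if (!keyHeld) return; // short tap: key already released, do not record
      mode = "hold";
      handlers.onStart("hold");
    }, windowMs);
  }

  function up(): void {
    keyHeld = false; // the timer keeps running so a second down can still count as a double-press
    if (mode === "hold") {
      mode = "idle";
      handlers.onStop();
    }
  }

  return { down, up };
}
```

Leaving the timer running on key release is what keeps double-press detection working, while the `keyHeld` check is what stops a short tap from starting a recording that nothing would end.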
- IPC handler broadcasts history:updated to all windows after dictation
- Preload exposes onHistoryUpdated listener for live UI refresh
- Settings update triggers hotkey listener restart when hotkey changes
- Add broadcast dependency to IPC handler interface
f7794e5 to 7c7dfcb
Summary
Adds a persistent dictation history, configurable hotkey with Fn key support, optional clipboard copying, and several critical bug fixes. All features are backwards-compatible.
History Tab
- `history:updated` IPC broadcast to all windows
- `history.json` in the Electron userData directory

Configurable Dictation Hotkey
- `src/shared/hotkeys.ts`: macOS key code mappings, modifier flags, label builder

Optional Clipboard Copying
Bug Fixes
- … `post(tap:)` is fire-and-forget, the Swift process exits before the target app reads the clipboard. Fix: clear clipboard after 500ms delay instead of immediate restore.
- … `up` event, then fired `handleHotkeyDown` with no subsequent `up` to stop it. Fix: track physical key state via `keyHeldRef` and cancel recording if key was released when timer fires.
- … `up` event, preventing the second `down` from recognizing a double-click. Fix: only set `keyHeldRef = false` on `up`, do not cancel the timer.

What changed
- `src/shared/hotkeys.ts`
- `src/shared/types.ts`
- `src/main/history.ts`
- `src/main/dictation.ts`
- `src/main/ipc.ts`
- `src/main/native-helper.ts`
- `src/main/index.ts`
- `src/main/defaults.ts`
- `src/preload/index.ts`
- `src/renderer/env.d.ts`
- `src/renderer/App.tsx`
- `src/renderer/styles.css`
- `swift/OpenWhispHelper.swift`

Test plan