feat: right Option key (optional), handsfree mode, UI upgrades #6
Open
nicremo wants to merge 8 commits into
- Cloud transcription via Groq/OpenAI-compatible APIs (whisper-large-v3)
- Auto/Cloud/Local transcription mode with automatic offline fallback
- API key encrypted via macOS Keychain (Electron safeStorage)
- Default text model changed from gemma4:e4b (9.6GB) to qwen3.5:2b (2.7GB)
- Configurable API base URL (Groq, OpenAI, Lemonfox, any compatible provider)
- Language selector (German default, 11 languages available)
- Stronger same-language prompt to prevent LLM translation
- Built-in microphone preferred over external devices (AirPods fix)
- New TranscriptionCard UI with source selector, API key management
- Setup wizard with cloud/local transcription choice
- Relaxed hotkey validation: Ollama not required when enhancement is off
Dictionary & Corrections:
- Custom vocabulary tab with words and misspelling corrections
- Words sent as Whisper prompt hints for better transcription
- Corrections injected into LLM system prompt for auto-replacement
- Two-column layout: Words (left) + Misspellings (right)
- Async file lock prevents race conditions on concurrent writes
- Whisper prompt truncated at ~800 chars (224 token limit)
- IPC handlers with runtime input validation

LLM Performance:
- Disable thinking mode (think: false) for qwen3.5 models
- Reduces rewrite time from ~14s to ~0.3s
- Strip <think> tags from output as safety fallback

Pipeline logging:
- Log transcription settings, raw text, and final text for debugging
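The prompt truncation and the `<think>` fallback described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `buildWhisperPrompt` and `stripThinkTags` are hypothetical names, and joining words with `", "` is an assumption about the hint format.

```typescript
// Hypothetical helper: join vocabulary words into a Whisper prompt hint,
// stopping before the result would exceed maxChars (the PR cites ~800 chars).
function buildWhisperPrompt(words: string[], maxChars = 800): string {
  const kept: string[] = [];
  let length = 0;
  for (const raw of words) {
    const word = raw.trim();
    if (word.length === 0) continue; // skip empty entries
    // Every word after the first also costs a ", " separator (2 chars).
    const added = (kept.length === 0 ? 0 : 2) + word.length;
    if (length + added > maxChars) break; // keep whole words only
    kept.push(word);
    length += added;
  }
  return kept.join(", ");
}

// Hypothetical helper: remove any <think>...</think> block the model emits
// even though thinking mode was disabled.
function stripThinkTags(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}
```

Cutting at word boundaries rather than at an exact character offset avoids sending Whisper a half-word as a hint.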
App Rules:
- Auto-detect active app and switch style/enhancement level
- 31 pre-configured dev tools (Ghostty, VS Code, Cursor, Zed, JetBrains, etc.)
- All terminals, IDEs, git clients, API/DB tools default to Vibe Coding High
- Non-dev apps use the user's manual default setting
- Editable per-app style and level on the Style page
- Rules stored in app-rules.json, add/remove/update via UI

Credits:
- Sidebar footer shows "Original by @giusmarci" and "Enhanced by Fabian Bitz"
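As an illustration of the rule lookup described above, here is a minimal sketch. The `AppRule` shape, the field names, and the case-insensitive match on the app name are assumptions; the actual layout of app-rules.json is not shown in this PR.

```typescript
// Assumed shape of one entry in app-rules.json (hypothetical field names).
interface AppRule {
  app: string;
  style: string;
  level: string;
}

interface StyleSettings {
  style: string;
  level: string;
}

// Return the per-app rule if one matches the active app,
// otherwise fall back to the user's manual default.
function resolveStyle(
  activeApp: string,
  rules: AppRule[],
  manualDefault: StyleSettings
): StyleSettings {
  const key = activeApp.trim().toLowerCase();
  const rule = rules.find((r) => r.app.toLowerCase() === key);
  return rule ? { style: rule.style, level: rule.level } : manualDefault;
}

// Example data: two of the dev tools named in the commit message.
const exampleRules: AppRule[] = [
  { app: "Ghostty", style: "Vibe Coding", level: "High" },
  { app: "Zed", style: "Vibe Coding", level: "High" },
];
const exampleDefault: StyleSettings = { style: "Casual", level: "Low" }; // placeholder values
```

With this data, `resolveStyle("ghostty", exampleRules, exampleDefault)` picks the Ghostty rule, while an app without a rule gets `exampleDefault` back unchanged.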
Hotkey:
- Switched from Fn to right Option key (keyCode 61)
- Hold to record (release to transcribe)
- Double-press for Handsfree mode (records until next single press)

Handsfree Mode:
- Debounced double-click detection (300ms window)
- Red pulsing dot indicator in overlay when active
- Red border around overlay bar during handsfree recording
- Label shows "Handsfree" to distinguish from hold mode
- Single press while handsfree stops recording and transcribes

Idle Overlay:
- Minimal 5-dot indicator always visible at bottom of screen
- Shows app is ready without being intrusive
- Same border-radius (14px) as active overlay
- Semi-transparent with backdrop blur

German Language Fix:
- Language-specific reinforcement prompts written in target language
- German prompt instructs LLM in German to output German
- Preserves English technical terms when present in input
- Dynamic prompt generation based on cloudLanguage setting

Build:
- Moved electron/electron-builder to devDependencies for packaging
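The hold versus handsfree behaviour above can be modelled as a small state machine. The sketch below is one possible reading, not the implementation in this PR: the class and action names are hypothetical, timestamps are passed in explicitly so the logic can be exercised without real key events, and discarding a tap shorter than the 300 ms window (so the first half of a double-press is not transcribed) is an assumption.

```typescript
type HotkeyAction =
  | "start-hold"
  | "start-handsfree"
  | "stop-and-transcribe"
  | "discard"
  | "none";

type HotkeyMode = "idle" | "hold" | "handsfree";

class HotkeyStateMachine {
  private mode: HotkeyMode = "idle";
  private lastDownAt: number = Number.NEGATIVE_INFINITY;
  private windowMs: number;

  constructor(windowMs: number = 300) {
    this.windowMs = windowMs;
  }

  // Call when the right Option key goes down; `now` is a timestamp in ms.
  keyDown(now: number): HotkeyAction {
    if (this.mode === "handsfree") {
      // A single press while handsfree ends the recording.
      this.mode = "idle";
      this.lastDownAt = Number.NEGATIVE_INFINITY;
      return "stop-and-transcribe";
    }
    const isDoublePress = now - this.lastDownAt <= this.windowMs;
    if (isDoublePress) {
      this.mode = "handsfree";
      this.lastDownAt = Number.NEGATIVE_INFINITY; // reset the detector
      return "start-handsfree";
    }
    this.lastDownAt = now;
    this.mode = "hold";
    return "start-hold";
  }

  // Call when the key is released.
  keyUp(now: number): HotkeyAction {
    if (this.mode !== "hold") return "none"; // releases are ignored in handsfree
    this.mode = "idle";
    // Assumption: a very short press is a tap, so its audio is dropped.
    return now - this.lastDownAt < this.windowMs ? "discard" : "stop-and-transcribe";
  }
}
```

In an Electron app the two methods would be driven by whatever native hook reports the key's flagsChanged events; that wiring is omitted here.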
App Icon:
- Generated .icns from OpenWhisp logo (339px PNG -> all icon sizes)
- Placed in build/icon.icns for electron-builder to pick up

Tray Icon:
- Added build/icons to extraResources so tray icons are bundled
- Fixes missing menubar icon in packaged app

Microphone Permission:
- Check current status before requesting
- If already denied, open System Settings directly (askForMediaAccess shows no dialog when previously denied)
- Extended wait timeout to 15 attempts for manual permission grant
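The permission flow above boils down to a decision on the current status. Here is a minimal sketch of that decision as a pure function. The status strings are the values Electron's `systemPreferences.getMediaAccessStatus("microphone")` returns; `nextMicStep` and the step names are hypothetical, and treating "restricted" like "denied" is an assumption.

```typescript
type MicStatus = "not-determined" | "granted" | "denied" | "restricted" | "unknown";
type MicStep = "proceed" | "ask" | "open-settings";

// Decide what to do before recording, given the current permission status.
function nextMicStep(status: MicStatus): MicStep {
  switch (status) {
    case "granted":
      return "proceed"; // nothing to request
    case "denied":
    case "restricted":
      // askForMediaAccess shows no dialog once access was denied,
      // so the user has to be sent to System Settings instead.
      return "open-settings";
    default:
      return "ask"; // "not-determined" or "unknown": the system prompt can still appear
  }
}
```

The caller would map "ask" to `systemPreferences.askForMediaAccess("microphone")` and "open-settings" to opening the Microphone privacy pane, then poll the status for the manual grant.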
Root cause: Hardened Runtime blocks microphone access without explicit com.apple.security.device.audio-input entitlement. The app never appeared in macOS Microphone privacy list because the OS silently denied the request.
- Added build/entitlements.mac.plist with audio-input entitlement
- Configured electron-builder to use entitlements for both main and child processes
- Improved mic permission flow: opens System Settings when previously denied
- Cloud rewrite as primary, Ollama as fallback (same pattern as transcription)
- Default model: openai/gpt-oss-20b (1000 tokens/sec, practically free)
- 5 cloud models available: GPT-OSS 20B/120B, Qwen3 32B, Llama 3.3 70B, Llama 3.1 8B
- Cloud/Local toggle on Models page under Text Enhancement
- Auto-fallback to local Ollama when cloud is unavailable
- Ollama no longer required when rewrite mode is Cloud
- Same API key and base URL as transcription (Groq)
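The cloud-first, local-fallback pattern described above can be sketched as a small wrapper. This is illustrative only: `Rewriter` and `rewriteWithFallback` are hypothetical names, and the real code presumably also distinguishes error types (a bad API key versus being offline), which this sketch does not.

```typescript
// A rewriter takes the raw transcript and resolves to the enhanced text.
type Rewriter = (text: string) => Promise<string>;

interface RewriteResult {
  text: string;
  source: "cloud" | "local";
}

// Try the cloud rewriter first; on any failure, fall back to the local one.
async function rewriteWithFallback(
  text: string,
  cloud: Rewriter,
  local: Rewriter
): Promise<RewriteResult> {
  try {
    return { text: await cloud(text), source: "cloud" };
  } catch {
    // Cloud unavailable (offline, timeout, HTTP error): use local Ollama.
    return { text: await local(text), source: "local" };
  }
}
```

Returning the `source` alongside the text lets the caller log which backend actually produced the result, which fits the pipeline logging added earlier in this PR.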
15 tasks
Summary
Switches the dictation hotkey to the right Option key, adds a handsfree (continuous) dictation mode activated by a double-press, adds a minimal dot indicator to the idle overlay, and includes several other enhancements.
Right Option Key
flagsChanged events on keyCode 61
Handsfree Mode
UI Upgrades
Cloud LLM Rewrite
llama-3.3-70b-versatile at 1000 tokens/sec)
Build Fixes
icon.icns) included for packaged builds
entitlements.mac.plist with audio-input entitlement for microphone access
Test plan
npm run package produces working .dmg with correct icon