feat: right Option key (optional), handsfree mode, UI upgrades #6

Open: nicremo wants to merge 8 commits into giusmarci:main from nicremo:feat/option-key-handsfree-ui

nicremo commented Apr 13, 2026

Summary

Switches the dictation hotkey to the right Option key, adds a handsfree (continuous) dictation mode activated by double-click, improves the idle overlay with a minimal dot indicator, and adds app-aware style switching, optional cloud LLM rewriting, and packaging fixes.

Depends on: #5 (dictionary & corrections) and #4 (cloud transcription)

Right Option Key

  • Replaces the Fn key with the right Option key (keyCode 61) as the dictation trigger
  • The right Option key is rarely used in daily workflows, making it a better default
  • Swift helper updated to listen for flagsChanged events on keyCode 61
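macOS reports modifier keys through flagsChanged events rather than keyDown/keyUp, so the helper has to work out for itself whether keyCode 61 was pressed or released. The sketch below shows that decision in TypeScript; in the PR the logic lives in the Swift helper. It assumes the raw modifier flags are available and that the device-dependent right-Option bit is 0x40 (NX_DEVICERALTKEYMASK). Checking that bit, rather than the generic Option flag, keeps a held left Option from masking the release. The event shape and function name are hypothetical.

```typescript
// Hypothetical shape of a flagsChanged event as seen by the helper.
interface FlagsChangedEvent {
  keyCode: number;       // virtual key code (61 = right Option on macOS)
  modifierFlags: number; // raw modifier flags, including device-dependent bits
}

const RIGHT_OPTION_KEYCODE = 61;
// Assumed device-dependent bit for the right Option (Alt) key.
const DEVICE_RIGHT_ALT_MASK = 0x40;

type KeyTransition = "down" | "up" | null;

// flagsChanged fires on both press and release; the flags *after* the change
// tell us which one happened for this specific key.
function rightOptionTransition(event: FlagsChangedEvent): KeyTransition {
  if (event.keyCode !== RIGHT_OPTION_KEYCODE) return null; // another modifier changed
  return (event.modifierFlags & DEVICE_RIGHT_ALT_MASK) !== 0 ? "down" : "up";
}
```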

Handsfree Mode

  • Double-click the right Option key to enter handsfree (continuous) dictation mode
  • Recording continues until you press the key again
  • Visual indicator in the overlay shows when handsfree mode is active
  • Clean state management with proper cleanup on mode exit
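Hold, double-press, and handsfree fit in a small state machine. The sketch below assumes the Electron side receives "down"/"up" transitions with millisecond timestamps. The 300 ms double-press window comes from the commit message. The class, the action names, and the rule that a press shorter than that window counts as a tap whose audio is discarded are assumptions, not the PR's code.

```typescript
// Actions the hypothetical controller asks the recorder to perform.
type HotkeyAction =
  | "start-hold"      // begin recording while the key is held
  | "stop-hold"       // released after a real hold: transcribe
  | "discard-tap"     // released too quickly: may be the first click of a double-press
  | "start-handsfree" // second press within the window: keep recording
  | "stop-handsfree"  // single press while handsfree: stop and transcribe
  | null;

const DOUBLE_PRESS_WINDOW_MS = 300; // window stated in the commit message

class HotkeyController {
  private mode: "idle" | "hold" | "handsfree" = "idle";
  private pressedAt = 0;
  private lastTapAt: number | null = null; // release time of the last short tap

  keyDown(now: number): HotkeyAction {
    if (this.mode === "handsfree") {
      this.mode = "idle";
      return "stop-handsfree";
    }
    if (this.lastTapAt !== null && now - this.lastTapAt <= DOUBLE_PRESS_WINDOW_MS) {
      this.mode = "handsfree";
      this.lastTapAt = null;
      return "start-handsfree";
    }
    this.mode = "hold";
    this.pressedAt = now;
    return "start-hold";
  }

  keyUp(now: number): HotkeyAction {
    // Releases while (or right after) handsfree is toggled are ignored.
    if (this.mode !== "hold") return null;
    this.mode = "idle";
    if (now - this.pressedAt < DOUBLE_PRESS_WINDOW_MS) {
      this.lastTapAt = now; // too short to be dictation; remember it as a tap
      return "discard-tap";
    }
    this.lastTapAt = null;
    return "stop-hold";
  }
}
```

Passing timestamps in, instead of reading a clock or starting timers inside the class, keeps the logic deterministic and easy to test.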

UI Upgrades

  • Idle indicator: A minimal 5-dot animation replaces the previous idle state and looks much cleaner
  • App-aware auto style switching: Define per-app rules (e.g., "VS Code" -> Vibe Coding style, "Slack" -> Conversation style)
  • App rules are persisted to app-rules.json and can be added, edited, and removed from the UI
  • Credits section added to preferences
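The following sketch covers the add/edit/remove operations on app rules and the JSON round trip. It assumes the file holds an array of `{ appName, style, level }` objects; that schema and the function names are hypothetical. App names are matched case-insensitively, so "vs code" and "VS Code" refer to the same rule.

```typescript
// Hypothetical rule shape; the real app-rules.json schema may differ.
interface AppRule {
  appName: string; // e.g. "VS Code"
  style: string;   // e.g. "Vibe Coding"
  level: string;   // enhancement level, e.g. "High"
}

const sameApp = (a: string, b: string): boolean => a.toLowerCase() === b.toLowerCase();

// Create or update: replace the rule for the same app in place, else append.
function upsertRule(rules: AppRule[], rule: AppRule): AppRule[] {
  const i = rules.findIndex((r) => sameApp(r.appName, rule.appName));
  if (i === -1) return [...rules, rule];
  return rules.map((r, j) => (j === i ? rule : r));
}

// Delete: drop the rule for the given app, if any.
function removeRule(rules: AppRule[], appName: string): AppRule[] {
  return rules.filter((r) => !sameApp(r.appName, appName));
}

// Persist: pretty-printed JSON for app-rules.json.
function serializeRules(rules: AppRule[]): string {
  return JSON.stringify(rules, null, 2);
}

// Read: parse and keep only well-formed entries.
function parseRules(json: string): AppRule[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("app-rules.json must contain an array");
  return data.filter(
    (r): r is AppRule =>
      typeof r === "object" &&
      r !== null &&
      typeof r.appName === "string" &&
      typeof r.style === "string" &&
      typeof r.level === "string",
  );
}
```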

Cloud LLM Rewrite

  • Optional cloud-based text rewriting via Groq or any OpenAI-compatible provider
  • Uses the chat completions API (default model openai/gpt-oss-20b at ~1000 tokens/sec; Llama 3.3 70B and others are also selectable)
  • Automatic fallback to local Ollama if cloud rewrite fails
  • Configurable on the Models page with model selector
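Here is a sketch of the cloud-first rewrite with local fallback. It builds a standard OpenAI-style `/chat/completions` request; Groq serves one under `https://api.groq.com/openai/v1`. The cloud and Ollama calls are injected as functions, so the fallback logic runs without a network. All function names are hypothetical.

```typescript
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build an OpenAI-compatible chat completions request for any base URL.
function buildChatRequest(
  baseUrl: string, // e.g. "https://api.groq.com/openai/v1"
  apiKey: string,
  model: string,   // e.g. "openai/gpt-oss-20b"
  systemPrompt: string,
  text: string,
): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`, // tolerate a trailing slash
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({
        model,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: text },
        ],
      }),
    },
  };
}

// Try the cloud rewriter first; on any error, use the local (Ollama) one.
async function rewriteWithFallback(
  text: string,
  cloud: (t: string) => Promise<string>,
  local: (t: string) => Promise<string>,
): Promise<{ text: string; source: "cloud" | "local" }> {
  try {
    return { text: await cloud(text), source: "cloud" };
  } catch {
    return { text: await local(text), source: "local" };
  }
}
```

In the app, `req.url` and `req.init` would be passed to `fetch`, and the rewritten text read from `choices[0].message.content` of the JSON response.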

Build Fixes

  • App icon (icon.icns) included for packaged builds
  • Tray icon path resolution fixed for both dev and packaged mode
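  • build/entitlements.mac.plist with the com.apple.security.device.audio-input entitlement, which Hardened Runtime requires for microphone access
  • Microphone permission flow improved: checks the current status first and opens System Settings if access was previously denied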

Test plan

  • Right Option key: hold to dictate, release to process
  • Double-click right Option: enters handsfree mode; a single press exits it
  • Handsfree indicator: overlay shows active handsfree state
  • Idle state: 5-dot animation displays in overlay
  • App rules: add a rule (e.g., VS Code -> Vibe Coding), verify style switches automatically
  • App rules: edit and remove rules
  • Cloud rewrite: enable on Models page, verify text is polished via cloud LLM
  • Cloud rewrite fallback: disconnect internet, verify fallback to Ollama
  • Packaged build: npm run package produces a working .dmg with the correct icon
  • Microphone permission: fresh install prompts for mic access correctly

nicremo added 8 commits April 13, 2026 12:24
- Cloud transcription via Groq/OpenAI-compatible APIs (whisper-large-v3)
- Auto/Cloud/Local transcription mode with automatic offline fallback
- API key encrypted via macOS Keychain (Electron safeStorage)
- Default text model changed from gemma4:e4b (9.6GB) to qwen3.5:2b (2.7GB)
- Configurable API base URL (Groq, OpenAI, Lemonfox, any compatible provider)
- Language selector (German default, 11 languages available)
- Stronger same-language prompt to prevent LLM translation
- Built-in microphone preferred over external devices (AirPods fix)
- New TranscriptionCard UI with source selector, API key management
- Setup wizard with cloud/local transcription choice
- Relaxed hotkey validation: Ollama not required when enhancement is off
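The Auto/Cloud/Local choice reduces to a small decision. The sketch below assumes Auto means "cloud when online and an API key is stored, otherwise local". The function name and parameters are hypothetical, and connectivity detection (for example Electron's `net.isOnline()`) is left out.

```typescript
type TranscriptionMode = "auto" | "cloud" | "local";
type TranscriptionSource = "cloud" | "local";

// Hypothetical helper: decide where a finished recording is transcribed.
function pickTranscriptionSource(
  mode: TranscriptionMode,
  online: boolean,
  hasApiKey: boolean,
): TranscriptionSource {
  if (mode === "local") return "local";
  // Assumption: an explicit Cloud choice is honored as-is.
  if (mode === "cloud") return "cloud";
  // Auto: use the cloud only when it can be reached and authenticated.
  return online && hasApiKey ? "cloud" : "local";
}
```

This check only picks the starting point. A request that still fails at runtime would need the same try-cloud-then-local pattern used for the rewrite step.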

Dictionary & Corrections:
- Custom vocabulary tab with words and misspelling corrections
- Words sent as Whisper prompt hints for better transcription
- Corrections injected into LLM system prompt for auto-replacement
- Two-column layout: Words (left) + Misspellings (right)
- Async file lock prevents race conditions on concurrent writes
- Whisper prompt truncated at ~800 chars (224 token limit)
- IPC handlers with runtime input validation
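This sketch shows both uses of the dictionary. Vocabulary is joined into a Whisper prompt and capped at about 800 characters, since Whisper only considers the last 224 tokens of its prompt. Corrections are appended to the LLM system prompt. The comma-separated format, the cut on a whole-word boundary, and the prompt wording are assumptions.

```typescript
const MAX_PROMPT_CHARS = 800; // cap stated in the commit message

// Join vocabulary words into a Whisper prompt without cutting a word in half.
function buildWhisperPrompt(words: string[], maxChars = MAX_PROMPT_CHARS): string {
  let prompt = "";
  for (const word of words) {
    const next = prompt ? `${prompt}, ${word}` : word;
    if (next.length > maxChars) break; // stop before exceeding the cap
    prompt = next;
  }
  return prompt;
}

// Hypothetical correction entry from the Misspellings column.
interface Correction {
  wrong: string; // what the transcriber tends to produce
  right: string; // what it should become
}

// Append corrections to the LLM system prompt as explicit replacements.
function appendCorrections(systemPrompt: string, corrections: Correction[]): string {
  if (corrections.length === 0) return systemPrompt;
  const lines = corrections.map((c) => `- "${c.wrong}" -> "${c.right}"`);
  return `${systemPrompt}\n\nAlways apply these corrections:\n${lines.join("\n")}`;
}
```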

LLM Performance:
- Disable thinking mode (think: false) for qwen3.5 models
- Reduces rewrite time from ~14s to ~0.3s
- Strip <think> tags from output as safety fallback
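Ollama's `/api/chat` endpoint accepts a `think` flag for models that support thinking. The sketch below builds a request body with thinking disabled, plus a regex fallback that removes any `<think>` block still present in the output. Function names are hypothetical.

```typescript
// Request body for Ollama's /api/chat with thinking disabled.
function buildOllamaChatBody(model: string, system: string, text: string): string {
  return JSON.stringify({
    model, // e.g. "qwen3.5:2b"
    messages: [
      { role: "system", content: system },
      { role: "user", content: text },
    ],
    stream: false,
    think: false, // skip the reasoning phase (the PR reports ~14s -> ~0.3s)
  });
}

// Safety net: drop complete <think>...</think> blocks, and an unterminated
// <think> block at the end (e.g. if generation was cut off).
function stripThinkTags(output: string): string {
  return output
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/<think>[\s\S]*$/, "")
    .trim();
}
```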

Pipeline logging:
- Log transcription settings, raw text, and final text for debugging

App Rules:
- Auto-detect active app and switch style/enhancement level
- 31 pre-configured dev tools (Ghostty, VS Code, Cursor, Zed, JetBrains, etc.)
- All terminals, IDEs, git clients, API/DB tools default to Vibe Coding High
- Non-dev apps use the user's manual default setting
- Editable per-app style and level on the Style page
- Rules stored in app-rules.json, add/remove/update via UI
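Resolving the active app to a style is a lookup with a fallback to the user's manual default. The sketch below uses hypothetical types. It does not cover how the frontmost app's name is obtained. Exact case-insensitive name matching is an assumption; the real rules may match bundle IDs instead.

```typescript
// Hypothetical setting/rule shapes for illustration.
interface StyleSetting {
  style: string; // e.g. "Vibe Coding"
  level: string; // e.g. "High"
}

interface AppRule extends StyleSetting {
  appName: string; // e.g. "Ghostty"
}

// Use the rule for the active app if one exists; otherwise the user's default.
function resolveStyle(
  activeApp: string | null,
  rules: AppRule[],
  userDefault: StyleSetting,
): StyleSetting {
  if (activeApp) {
    const key = activeApp.toLowerCase();
    const rule = rules.find((r) => r.appName.toLowerCase() === key);
    if (rule) return { style: rule.style, level: rule.level };
  }
  return userDefault;
}
```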

Credits:
- Sidebar footer shows "Original by @giusmarci" and "Enhanced by Fabian Bitz"

Hotkey:
- Switched from Fn to right Option key (keyCode 61)
- Hold to record (release to transcribe)
- Double-press for Handsfree mode (records until next single press)

Handsfree Mode:
- Debounced double-click detection (300ms window)
- Red pulsing dot indicator in overlay when active
- Red border around overlay bar during handsfree recording
- Label shows "Handsfree" to distinguish from hold mode
- Single press while handsfree stops recording and transcribes

Idle Overlay:
- Minimal 5-dot indicator always visible at bottom of screen
- Shows app is ready without being intrusive
- Same border-radius (14px) as active overlay
- Semi-transparent with backdrop blur

German Language Fix:
- Language-specific reinforcement prompts written in target language
- German prompt instructs LLM in German to output German
- Preserves English technical terms when present in input
- Dynamic prompt generation based on cloudLanguage setting
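Here is a sketch of dynamic prompt generation keyed on `cloudLanguage`. It assumes the setting holds ISO 639-1 codes such as `de` and `en`. The reinforcement sentences are illustrative wording, not the PR's prompts.

```typescript
// Reinforcement written in the target language itself (illustrative wording).
const REINFORCEMENT: Record<string, string> = {
  de:
    "Antworte ausschließlich auf Deutsch. Übersetze den Text nicht. " +
    "Englische Fachbegriffe, die im Text vorkommen, bleiben unverändert.",
  en:
    "Respond only in English. Do not translate the text. " +
    "Keep technical terms exactly as they appear.",
};

// Append the reinforcement for the configured language, if one exists.
function buildSystemPrompt(basePrompt: string, cloudLanguage: string): string {
  const reinforcement = REINFORCEMENT[cloudLanguage.toLowerCase()];
  return reinforcement ? `${basePrompt}\n\n${reinforcement}` : basePrompt;
}
```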

Build:
- Moved electron/electron-builder to devDependencies for packaging

App Icon:
- Generated .icns from OpenWhisp logo (339px PNG -> all icon sizes)
- Placed in build/icon.icns for electron-builder to pick up

Tray Icon:
- Added build/icons to extraResources so tray icons are bundled
- Fixes missing menubar icon in packaged app
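Tray icons live in different places in dev and packaged mode. The sketch below takes `app.isPackaged`, `process.resourcesPath`, and the project root as parameters, so it runs without Electron. The `icons` subfolder under resources assumes a matching `extraResources` destination. Paths use '/' separators because the app targets macOS; `path.join` would be the idiomatic choice in Node.

```typescript
interface PathContext {
  isPackaged: boolean;   // app.isPackaged in Electron
  resourcesPath: string; // process.resourcesPath in a packaged app
  appRoot: string;       // project root when running from source (assumption)
}

// Resolve a tray icon for the current mode (hypothetical helper).
function trayIconPath(ctx: PathContext, fileName: string): string {
  return ctx.isPackaged
    ? `${ctx.resourcesPath}/icons/${fileName}` // copied there by extraResources
    : `${ctx.appRoot}/build/icons/${fileName}`; // source tree during development
}
```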

Microphone Permission:
- Check current status before requesting
- If already denied, open System Settings directly (askForMediaAccess
  shows no dialog when previously denied)
- Extended wait timeout to 15 attempts for manual permission grant
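The following sketch implements the permission flow described above, with injected dependencies so it runs outside Electron. In the app these would wrap `systemPreferences.getMediaAccessStatus("microphone")`, `systemPreferences.askForMediaAccess("microphone")`, and `shell.openExternal` with the Privacy > Microphone settings URL. The limit of 15 attempts comes from the commit. The 1-second poll interval and the decision to route every non-granted, already-determined status to System Settings are assumptions.

```typescript
type MediaAccessStatus = "not-determined" | "granted" | "denied" | "restricted" | "unknown";

// Hypothetical dependency bundle; see the lead-in for the Electron calls.
interface MicDeps {
  getStatus: () => MediaAccessStatus;
  askForAccess: () => Promise<boolean>;
  openSettings: () => Promise<void>; // e.g. x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone
  sleep: (ms: number) => Promise<void>;
}

const MAX_ATTEMPTS = 15;       // from the commit message
const POLL_INTERVAL_MS = 1000; // assumption: interval not stated in the PR

async function ensureMicrophoneAccess(deps: MicDeps): Promise<boolean> {
  const status = deps.getStatus();
  if (status === "granted") return true;
  // Only in this state will macOS actually show the permission dialog.
  if (status === "not-determined") return deps.askForAccess();
  // Previously denied: the dialog won't appear again, so open System Settings
  // and poll while the user grants access manually.
  await deps.openSettings();
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    await deps.sleep(POLL_INTERVAL_MS);
    if (deps.getStatus() === "granted") return true;
  }
  return false;
}
```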

Root cause: Hardened Runtime blocks microphone access without explicit
com.apple.security.device.audio-input entitlement. The app never appeared
in macOS Microphone privacy list because the OS silently denied the request.

- Added build/entitlements.mac.plist with audio-input entitlement
- Configured electron-builder to use entitlements for both main and child processes
- Improved mic permission flow: opens System Settings when previously denied
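The packaging settings described in this PR can be written as a plain TypeScript object. Key names follow electron-builder's `mac` configuration (`hardenedRuntime`, `entitlements`, `entitlementsInherit`) and its `extraResources` option. The project may keep this config in `package.json` under `build` or in `electron-builder.yml` instead, and the `icons` destination folder is an assumption. The plist shows only the entitlement named in the PR.

```typescript
// Sketch of the packaging settings; not the project's actual config file.
const builderConfig = {
  mac: {
    hardenedRuntime: true,
    // Same entitlements for the main app and for inherited child processes.
    entitlements: "build/entitlements.mac.plist",
    entitlementsInherit: "build/entitlements.mac.plist",
  },
  // build/icon.icns needs no key: electron-builder looks for it in the
  // default buildResources directory ("build").
  // Bundle the tray icons so they exist in the packaged app's resources.
  extraResources: [{ from: "build/icons", to: "icons" }],
};

// Minimal build/entitlements.mac.plist with the audio-input entitlement.
const entitlementsPlist = `<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>com.apple.security.device.audio-input</key>
  <true/>
</dict>
</plist>
`;
```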

- Cloud rewrite as primary, Ollama as fallback (same pattern as transcription)
- Default model: openai/gpt-oss-20b (1000 tokens/sec, practically free)
- 5 cloud models available: GPT-OSS 20B/120B, Qwen3 32B, Llama 3.3 70B, Llama 3.1 8B
- Cloud/Local toggle on Models page under Text Enhancement
- Auto-fallback to local Ollama when cloud is unavailable
- Ollama no longer required when rewrite mode is Cloud
- Same API key and base URL as transcription (Groq)