ET Mac Voxtral Realtime Desktop App #219
Open
seyeong-han wants to merge 18 commits into meta-pytorch:main from
Native SwiftUI macOS app that wraps ExecuTorch's voxtral_realtime_runner for on-device speech transcription using Voxtral-Mini-4B (Metal int4).
Features:
- Live transcription with real-time token streaming
- Model preloading with loading progress indicators
- Pause/resume within the same session
- Session history with search, rename, and persistence
- Audio level waveform visualization
- Bundled runner binary, libomp, and model artifacts via build phase
Uses XcodeGen (project.yml) to generate the Xcode project.
Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor
- Introduced DictationManager to handle dictation state and hotkey registration.
- Implemented startDictation and stopDictation methods in TranscriptStore for managing dictation sessions.
- Added DictationOverlayView and DictationPanel for the user interface during dictation.
- Updated VoxtralRealtimeApp to integrate dictation features, including accessibility checks and hotkey registration.
- Enhanced user experience with real-time dictation text display and command menu options for starting/stopping dictation.
- Raise silence threshold from 0.005 to 0.02 so background noise doesn't prevent auto-stop
- Save frontmost app reference before showing panel and re-activate it before pasting
- Use nil CGEventSource and .cgSessionEventTap for reliable paste
- Add 300ms delay after panel dismiss for focus to settle
- Remove AXIsProcessTrusted guard (unreliable with Debug builds)
Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor
Overlay text area grows smoothly from 40pt to 200pt as transcribed text exceeds two lines, with animated height transition.
Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor
- Introduced new preferences for silence detection: silence threshold and silence timeout, allowing users to customize sensitivity and auto-stop delay.
- Updated SettingsView to include sliders for adjusting silence detection parameters.
- Added a script to create a DMG for easy application distribution with a drag-to-Applications UI.
- Included new app icon assets for better visual representation in the app.
… model repo
- Rename project directory from apps/macos/speech-studio to apps/macos/VoxtralRealtimeApp
- Rename branch from et-speech-studio to et-voxtral-realtime
- Rewrite README as HF showcase app with end-user and developer sections
- Update DMG volume name and all text references from "Speech Studio" to "Voxtral Realtime"
- Update SetupGuideView with context-aware instructions (bundled vs developer build)
- Update context.md to reflect model bundling in DMG and new paths
Made-with: Cursor
- Add scripts/build.sh: one-command pipeline (check prereqs → download models → xcodegen → xcodebuild → create DMG), supports --download-models
- Update create_dmg.sh: validates that all 5 required files (runner, libomp, model, preprocessor, tokenizer) exist in the .app bundle before creating the DMG
- Update README: add Download section pointing to GitHub Releases for end users, add quick-build section for developers, clarify that models are not in git and must be downloaded before building
- Update context.md with distribution model and build pipeline decisions
Made-with: Cursor
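The bundle validation described for create_dmg.sh can be sketched as a small POSIX shell function. This is an illustration only, not the script from the PR: the function name `check_bundle_files` is hypothetical, and the directory and file names are passed as arguments because the actual bundle layout and artifact file names are not given here.

```shell
#!/bin/sh
# Hypothetical helper: report every required file that is missing from a
# directory. Returns 0 when all files exist, 1 otherwise.
# Usage: check_bundle_files DIR FILE...
check_bundle_files() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            # Report on stderr and keep going so that all gaps are listed.
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (paths and file names are assumptions, shown for shape only):
#   check_bundle_files "build/MyApp.app/Contents/Resources" \
#       voxtral_realtime_runner libomp.dylib || exit 1
```

Listing every missing file before failing, instead of stopping at the first, makes a broken build easier to diagnose in one pass.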
- build.sh now checks CONDA_DEFAULT_ENV is set before proceeding, with full setup instructions if no env is active
- README restructured: conda env creation is step 1, all subsequent steps (ExecuTorch install, runner build, model download) run inside the env
- Consolidated pip installs (huggingface_hub, sounddevice) into one step
- Added DYLD_LIBRARY_PATH to CLI test section
- Updated context.md constraints with conda env requirement
Made-with: Cursor
Tested: full pipeline runs end-to-end producing a 3.5 GB DMG with all 5 required files bundled (runner, libomp, model, preprocessor, tokenizer).
- build.sh: default EXECUTORCH_PATH changed to ~/executorch, enforces non-base conda env with full setup guide for et-metal, --help shows complete one-time setup sequence
- project.yml: post-compile script reads EXECUTORCH_PATH and MODEL_DIR env vars (defaults to ~/executorch and ~/voxtral_realtime_quant_metal)
- Preferences.swift: fallback runner path updated to ~/executorch
- create_dmg.sh: osascript layout step is now non-fatal (skipped in non-interactive shells), hdiutil detach tolerates errors
- context.md: paths and constraints updated for et-metal + ~/executorch
Made-with: Cursor
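A minimal sketch of the environment handling these notes describe: reject a missing or `base` conda environment, and fall back to the stated default paths when `EXECUTORCH_PATH` and `MODEL_DIR` are unset. The function name `require_conda_env` and the error wording are assumptions; the real build.sh may structure this differently.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when a non-base conda env is active.
require_conda_env() {
    env_name=${CONDA_DEFAULT_ENV:-}
    if [ -z "$env_name" ] || [ "$env_name" = "base" ]; then
        echo "error: activate a non-base conda environment first" >&2
        return 1
    fi
    return 0
}

# Use the documented defaults unless the caller already set these variables.
: "${EXECUTORCH_PATH:=$HOME/executorch}"
: "${MODEL_DIR:=$HOME/voxtral_realtime_quant_metal}"
# Export so that child processes (for example xcodebuild) inherit them.
export EXECUTORCH_PATH MODEL_DIR
```

The `: "${VAR:=default}"` form assigns only when the variable is unset or empty, so a value supplied by the caller always wins.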
Microphone:
- Check AVCaptureDevice.authorizationStatus live before every startTranscription, resumeTranscription, and startDictation instead of relying on cached healthResult
- Add HealthCheck.liveMicPermission() for direct, non-cached checks
- Validate AudioEngine input format after start; throw microphoneNotAvailable if hardware returns zero sample rate
- Re-run health check when app returns to foreground so UI reflects permission changes made in System Settings
- Error messages now tell user to "quit and relaunch the app" since macOS caches permission grants per process lifetime
Accessibility (auto-paste):
- Re-check AXIsProcessTrustedWithOptions right before paste, not just at startup; this catches trust invalidated by debug rebuilds
- Handle nil CGEvents explicitly: log clear error instead of silently failing via optional chaining
- Copy text to clipboard before attempting paste so it's always available even if CGEvent fails
- Remove startup Accessibility prompt; defer to first paste attempt to avoid confusing users who don't use dictation
Made-with: Cursor
Explains how to clear stale permission entries when mic or Accessibility prompts stop appearing after multiple builds/installs.
Made-with: Cursor
The entitlements file was empty while Hardened Runtime was enabled, which caused macOS to silently deny microphone access without showing the permission prompt.
- com.apple.security.device.audio-input: required for mic access under Hardened Runtime
- com.apple.security.cs.disable-library-validation: required to load the bundled unsigned voxtral_realtime_runner and libomp.dylib
Made-with: Cursor
Made-with: Cursor
Root cause: xcodegen's `entitlements:` block without `properties:` was overwriting VoxtralRealtime.entitlements to an empty dict on every `xcodegen generate`. The built app under Hardened Runtime had no audio-input entitlement, so macOS silently denied mic access without showing the permission prompt.
Fix:
- Add `properties:` to the entitlements block in project.yml so xcodegen generates the correct keys every time
- Export EXECUTORCH_PATH and MODEL_DIR in build.sh so xcodebuild's post-compile script inherits them
- Remove CODE_SIGN_ALLOW_ENTITLEMENTS_MODIFICATION (no longer needed)
Verified: codesign -d --entitlements shows both com.apple.security.device.audio-input and com.apple.security.cs.disable-library-validation in the built app. Mic permission prompt appears on first launch after TCC reset.
Made-with: Cursor
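The verification step above could be automated along the following lines. This is a sketch under assumptions: the function name `has_required_entitlements` is hypothetical, and it only scans text on stdin for the two entitlement keys named in the commit message, so it works on whatever `codesign -d --entitlements` prints without depending on the exact output format.

```shell
#!/bin/sh
# Hypothetical helper: read entitlements text on stdin and succeed only if
# both keys named in the commit message are present.
has_required_entitlements() {
    text=$(cat)
    for key in com.apple.security.device.audio-input \
               com.apple.security.cs.disable-library-validation; do
        printf '%s\n' "$text" | grep -qF "$key" || {
            echo "missing entitlement: $key" >&2
            return 1
        }
    done
}

# Example on macOS (the app path is an assumption):
#   codesign -d --entitlements - "build/MyApp.app" 2>/dev/null \
#       | has_required_entitlements || exit 1
```

Running a check like this after every `xcodegen generate` and build would catch a regression to empty entitlements before it shows up as a silent microphone denial.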
- Add BSD license headers to all 20 Swift source files
- Add BSD license headers to shell scripts (build.sh, create_dmg.sh)
- Update bundle identifier from com.younghan to org.pytorch.executorch
- Update GitHub release URL from personal fork to official pytorch repo
- Update .gitignore to exclude DMG files (binary artifacts)
- Update LICENSE file with proper BSD license text
Made-with: Cursor
- Compressed from 17 MB (3456x2234, 240fps .mov) to 563 KB (1728p, 30fps .mp4)
- Uploaded to v1.0.0 release assets for GitHub README rendering
- Removed large .mov from git history
Made-with: Cursor
Made-with: Cursor
No description provided.