ET Mac Voxtral Realtime Desktop App by seyeong-han · Pull Request #219 · meta-pytorch/executorch-examples

seyeong-han · 2026-03-05T23:11:55Z

No description provided.

Native SwiftUI macOS app that wraps ExecuTorch's voxtral_realtime_runner for on-device speech transcription using Voxtral-Mini-4B (Metal int4).

Features:
- Live transcription with real-time token streaming
- Model preloading with loading progress indicators
- Pause/resume within the same session
- Session history with search, rename, and persistence
- Audio level waveform visualization
- Bundled runner binary, libomp, and model artifacts via build phase

Uses XcodeGen (project.yml) to generate the Xcode project.

Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor

- Introduced DictationManager to handle dictation state and hotkey registration.
- Implemented startDictation and stopDictation methods in TranscriptStore for managing dictation sessions.
- Added DictationOverlayView and DictationPanel for user interface during dictation.
- Updated VoxtralRealtimeApp to integrate dictation features, including accessibility checks and hotkey registration.
- Enhanced user experience with real-time dictation text display and command menu options for starting/stopping dictation.

- Raise silence threshold from 0.005 to 0.02 so background noise doesn't prevent auto-stop
- Save frontmost app reference before showing panel and re-activate it before pasting
- Use nil CGEventSource and .cgSessionEventTap for reliable paste
- Add 300ms delay after panel dismiss for focus to settle
- Remove AXIsProcessTrusted guard (unreliable with Debug builds)

Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor

Overlay text area grows smoothly from 40pt to 200pt as transcribed text exceeds two lines, with animated height transition.

Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor

- Introduced new preferences for silence detection: silence threshold and silence timeout, allowing users to customize sensitivity and auto-stop delay.
- Updated SettingsView to include sliders for adjusting silence detection parameters.
- Added a script to create a DMG for easy application distribution with a drag-to-Applications UI.
- Included new app icon assets for better visual representation in the app.

… model repo

- Rename project directory from apps/macos/speech-studio to apps/macos/VoxtralRealtimeApp
- Rename branch from et-speech-studio to et-voxtral-realtime
- Rewrite README as HF showcase app with end-user and developer sections
- Update DMG volume name and all text references from "Speech Studio" to "Voxtral Realtime"
- Update SetupGuideView with context-aware instructions (bundled vs developer build)
- Update context.md to reflect model bundling in DMG and new paths

Made-with: Cursor

- Add scripts/build.sh: one-command pipeline (check prereqs → download models → xcodegen → xcodebuild → create DMG), supports --download-models
- Update create_dmg.sh: validates all 5 required files (runner, libomp, model, preprocessor, tokenizer) exist in .app bundle before creating DMG
- Update README: add Download section pointing to GitHub Releases for end users, add quick-build section for developers, clarify that models are not in git and must be downloaded before building
- Update context.md with distribution model and build pipeline decisions

Made-with: Cursor
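
The bundle validation described for create_dmg.sh could look roughly like the sketch below. It is not the script from this PR: the directory argument and the three model file names (model.pte, preprocessor.pte, tokenizer.json) are hypothetical placeholders, and only the runner and libomp names come from the commit messages.

```shell
# Sketch of a pre-packaging check: succeed only if every required file exists.
# Assumption: the caller passes the directory inside the .app bundle that holds
# the bundled artifacts; the model/preprocessor/tokenizer names are hypothetical.
validate_bundle() {
    vb_dir="$1"
    vb_missing=0
    for vb_name in \
        voxtral_realtime_runner \
        libomp.dylib \
        model.pte \
        preprocessor.pte \
        tokenizer.json; do
        if [ ! -f "$vb_dir/$vb_name" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $vb_name" >&2
            vb_missing=$((vb_missing + 1))
        fi
    done
    # The function's exit status is 0 only when nothing is missing.
    [ "$vb_missing" -eq 0 ]
}
```

A packaging script would call `validate_bundle "$APP/Contents/Resources" || exit 1` before invoking hdiutil, so an incomplete bundle never becomes a DMG. The `Contents/Resources` location is likewise an assumption.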

- build.sh now checks CONDA_DEFAULT_ENV is set before proceeding, with full setup instructions if no env is active
- README restructured: conda env creation is step 1, all subsequent steps (ExecuTorch install, runner build, model download) run inside the env
- Consolidated pip installs (huggingface_hub, sounddevice) into one step
- Added DYLD_LIBRARY_PATH to CLI test section
- Updated context.md constraints with conda env requirement

Made-with: Cursor
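
A guard like the CONDA_DEFAULT_ENV check described above can be sketched as follows. The function name and the messages are illustrative assumptions, not the wording used in build.sh; CONDA_DEFAULT_ENV itself is the variable conda sets when an environment is activated.

```shell
# Sketch: refuse to continue unless a conda environment is active.
# Assumption: the function name and the hint text are hypothetical.
require_conda_env() {
    if [ -z "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "error: no conda environment is active" >&2
        echo "hint: run 'conda activate <env-name>' and re-run the build" >&2
        return 1
    fi
    echo "using conda env: $CONDA_DEFAULT_ENV"
}
```

Calling `require_conda_env || exit 1` at the top of a build script stops the pipeline before any download or compile step runs in the wrong Python environment.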

Tested: full pipeline runs end-to-end producing a 3.5 GB DMG with all 5 required files bundled (runner, libomp, model, preprocessor, tokenizer).

- build.sh: default EXECUTORCH_PATH changed to ~/executorch, enforces non-base conda env with full setup guide for et-metal, --help shows complete one-time setup sequence
- project.yml: post-compile script reads EXECUTORCH_PATH and MODEL_DIR env vars (defaults to ~/executorch and ~/voxtral_realtime_quant_metal)
- Preferences.swift: fallback runner path updated to ~/executorch
- create_dmg.sh: osascript layout step is now non-fatal (skipped in non-interactive shells), hdiutil detach tolerates errors
- context.md: paths and constraints updated for et-metal + ~/executorch

Made-with: Cursor
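
The default-path and non-base-environment behavior can be illustrated with a short sketch. The two function names are hypothetical; the variable names and default locations are the ones listed in the commit message, and the export is an assumption about how a build phase would see the values.

```shell
# Sketch: apply the documented defaults when the variables are unset or empty.
resolve_build_paths() {
    EXECUTORCH_PATH="${EXECUTORCH_PATH:-$HOME/executorch}"
    MODEL_DIR="${MODEL_DIR:-$HOME/voxtral_realtime_quant_metal}"
    # Export so that child processes (for example a build-phase script) inherit them.
    export EXECUTORCH_PATH MODEL_DIR
}

# Sketch: succeed only when a conda env other than "base" is active.
reject_base_env() {
    [ -n "${CONDA_DEFAULT_ENV:-}" ] && [ "$CONDA_DEFAULT_ENV" != "base" ]
}
```

With these, `EXECUTORCH_PATH=/opt/et ./build.sh` would override the default, while a plain `./build.sh` falls back to `~/executorch`.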

Microphone:
- Check AVCaptureDevice.authorizationStatus live before every startTranscription, resumeTranscription, and startDictation instead of relying on cached healthResult
- Add HealthCheck.liveMicPermission() for direct, non-cached checks
- Validate AudioEngine input format after start; throw microphoneNotAvailable if hardware returns zero sample rate
- Re-run health check when app returns to foreground so UI reflects permission changes made in System Settings
- Error messages now tell user to "quit and relaunch the app" since macOS caches permission grants per process lifetime

Accessibility (auto-paste):
- Re-check AXIsProcessTrustedWithOptions right before paste, not just at startup; this catches trust invalidated by debug rebuilds
- Handle nil CGEvents explicitly: log clear error instead of silently failing via optional chaining
- Copy text to clipboard before attempting paste so it's always available even if CGEvent fails
- Remove startup Accessibility prompt; defer to first paste attempt to avoid confusing users who don't use dictation

Made-with: Cursor

Explains how to clear stale permission entries when mic or Accessibility prompts stop appearing after multiple builds/installs.

Made-with: Cursor
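
The README text itself is not reproduced on this page. As a rough illustration, a TCC reset on macOS is usually done with `tccutil reset <service> <bundle-id>`. The helper below only prints the commands so they can be reviewed first; the helper name is hypothetical, and the bundle identifier must be supplied by the caller because the app's full identifier is not stated here.

```shell
# Sketch: print the tccutil commands that would clear stale permission entries
# for one app. Assumption: Microphone and Accessibility are the two services
# relevant to this app, based on the prompts the tip mentions.
tcc_reset_commands() {
    tr_bundle_id="$1"
    for tr_service in Microphone Accessibility; do
        echo "tccutil reset $tr_service $tr_bundle_id"
    done
}
```

On a Mac, `tcc_reset_commands com.example.MyApp | sh` would run both resets (com.example.MyApp is a placeholder), after which the app should prompt again on next launch.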

The entitlements file was empty while Hardened Runtime was enabled, which caused macOS to silently deny microphone access without showing the permission prompt.

- com.apple.security.device.audio-input: required for mic access under Hardened Runtime
- com.apple.security.cs.disable-library-validation: required to load the bundled unsigned voxtral_realtime_runner and libomp.dylib

Made-with: Cursor

Made-with: Cursor

Root cause: xcodegen's `entitlements:` block without `properties:` was overwriting VoxtralRealtime.entitlements to an empty dict on every `xcodegen generate`. The built app under Hardened Runtime had no audio-input entitlement, so macOS silently denied mic access without showing the permission prompt.

Fix:
- Add `properties:` to the entitlements block in project.yml so xcodegen generates the correct keys every time
- Export EXECUTORCH_PATH and MODEL_DIR in build.sh so xcodebuild's post-compile script inherits them
- Remove CODE_SIGN_ALLOW_ENTITLEMENTS_MODIFICATION (no longer needed)

Verified: codesign -d --entitlements shows both com.apple.security.device.audio-input and com.apple.security.cs.disable-library-validation in the built app. Mic permission prompt appears on first launch after TCC reset.

Made-with: Cursor
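
The verification step can be automated with a small filter over the entitlements XML. The sketch below is an assumption about how one might script it, not part of this PR: the function name is hypothetical, and it only checks that both keys are present, not that their values are true.

```shell
# Sketch: read an entitlements plist (XML) on stdin and fail if either of the
# two entitlements named in the commit message is absent.
has_required_entitlements() {
    he_xml=$(cat)
    for he_key in \
        com.apple.security.device.audio-input \
        com.apple.security.cs.disable-library-validation; do
        # -F treats the key as a fixed string, so the dots are not regex wildcards.
        if ! printf '%s\n' "$he_xml" | grep -qF "<key>$he_key</key>"; then
            echo "missing entitlement: $he_key" >&2
            return 1
        fi
    done
}
```

On macOS the input would come from codesign, for example `codesign -d --entitlements - --xml <path-to-app> | has_required_entitlements`; the exact flags differ between macOS releases, so treat that invocation as an assumption to check against the local man page.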

- Add BSD license headers to all 20 Swift source files
- Add BSD license headers to shell scripts (build.sh, create_dmg.sh)
- Update bundle identifier from com.younghan to org.pytorch.executorch
- Update GitHub release URL from personal fork to official pytorch repo
- Update .gitignore to exclude DMG files (binary artifacts)
- Update LICENSE file with proper BSD license text

Made-with: Cursor

- Compressed from 17 MB (3456x2234, 240fps .mov) to 563 KB (1728p, 30fps .mp4)
- Uploaded to v1.0.0 release assets for GitHub README rendering
- Removed large .mov from git history

Made-with: Cursor

Made-with: Cursor

seyeong-han and others added 15 commits March 3, 2026 13:32

Auto-expand dictation overlay for longer transcriptions

7930545


Add TCC reset tip to troubleshooting section

12958b9


Allow entitlements modification to fix stale build cache error

fe5fa1d

Made-with: Cursor

meta-cla bot added the CLA Signed label (this label is managed by the Meta Open Source bot) Mar 5, 2026

seyeong-han added 3 commits March 5, 2026 15:15

Add demo video to README

651a202

Made-with: Cursor

Replace demo video with compressed mp4 hosted on GitHub Releases

561f573


Use GitHub user-attachments URL for demo video

cbaf029

Made-with: Cursor

seyeong-han wants to merge 18 commits into meta-pytorch:main from seyeong-han:et-voxtral-realtime
