Ci/GitHub workflows #30
Merged
…le output
- Real mic capture via ffmpeg (avfoundation/pulse/dshow)
- ElevenLabs Scribe v2 Realtime STT with auto-reconnect
- Streaming LLM translation (OpenRouter/OpenAI/Anthropic)
- ElevenLabs Flash v2.5 TTS with on-demand reconnect
- BlackHole virtual mic output via ffmpeg audiotoolbox
- Auto-calibrate VAD threshold from mic noise floor
- Stable-partial timer triggers translation after 1.2s pause
- Delayed TTS flush batches translations (300ms window)
- Device selection by name (survives plug/unplug)
- Mic selector in settings, BlackHole disabled as input
- VAD sensitivity selector (low/medium/high)
- Curated model dropdown per LLM provider
- Backpressure handling for ffmpeg output pipe
…onnect issues
- REST TTS: POST /v1/text-to-speech/{voiceId} with pcm_24000 output
- Each utterance is an independent HTTP request — no shared WebSocket state
- No flush/isFinal/reconnect dance — just request → audio → play
- Resample 24kHz→48kHz and write to BlackHole in one shot
- Remove all TTS WebSocket code (connect, heartbeat, queue, flush timer)
- LLM still streams tokens but collects full translation before TTS
- Simpler, more reliable, works every time
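The per-utterance REST call described in this list can be sketched as a small request builder. This is a minimal sketch, not the code from the PR: the host name, the `output_format` query parameter, the `xi-api-key` header, and the `model_id` body field are assumptions based on the public ElevenLabs API, and `buildTtsRequest` is a hypothetical helper name.

```typescript
// Describes one independent HTTP request per utterance (no shared socket state).
interface TtsRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the request but does not send it.
function buildTtsRequest(
  voiceId: string,
  text: string,
  apiKey: string,
  modelId: string = 'eleven_flash_v2_5', // model named elsewhere in this PR
): TtsRequest {
  // Assumed host; the path /v1/text-to-speech/{voiceId} is from the commit message.
  const url = new URL(
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
  );
  // Assumed query parameter name for selecting raw 24 kHz PCM output.
  url.searchParams.set('output_format', 'pcm_24000');

  return {
    url: url.toString(),
    init: {
      method: 'POST',
      headers: { 'xi-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}
```

The result can be handed to `fetch(req.url, req.init)`; the response body would then hold the audio for that single utterance.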
… OBS frame buffer
- Every utterance queued and processed one at a time
- No concurrent translations/TTS, which eliminates race conditions between utterances
- Queue drains serially: translate → TTS → write → next
- Delay is OK, but nothing gets lost
- Queue cleared on session stop
- Log shows queue depth for monitoring
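The serial drain described above could look roughly like the following. This is a sketch under assumptions: `SerialQueue`, `enqueue`, `clear`, and `depth` are hypothetical names, and each job stands in for one translate → TTS → write pipeline.

```typescript
type Job = () => Promise<void>;

// Runs queued jobs strictly one at a time, in arrival order.
class SerialQueue {
  private jobs: Job[] = [];
  private running = false;

  // Number of jobs still waiting (excludes the one currently running).
  get depth(): number {
    return this.jobs.length;
  }

  enqueue(job: Job): void {
    this.jobs.push(job);
    void this.drain();
  }

  // Drops pending jobs, e.g. on session stop. A running job is not cancelled.
  clear(): void {
    this.jobs.length = 0;
  }

  private async drain(): Promise<void> {
    if (this.running) return; // another drain loop already owns the queue
    this.running = true;
    try {
      while (this.jobs.length > 0) {
        const job = this.jobs.shift()!;
        try {
          await job();
        } catch (err) {
          // One failed utterance should not stall the ones behind it.
          console.error('utterance failed', err);
        }
      }
    } finally {
      this.running = false;
    }
  }
}
```

Because only one drain loop is ever active, a slow utterance delays the ones behind it rather than overlapping with them, which matches the "delay is OK, but nothing gets lost" trade-off.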
…onboarding
- Add push-to-talk mode (default ON): hold SPACE or button to talk
- PTT uses manual commit strategy, prevents double-commit/throttle
- Fix STT: proper query params (audio_format, commit_strategy, language_code)
- Fix STT: add commit field to input_audio_chunk (required by API)
- Fix STT: auto-reconnect on close + 10s watchdog for silent connections
- Fix STT: handle commit_throttled gracefully without crash loop
- Remove dead TTS WebSocket code (now REST-only)
- Add speaker playback via afplay (temp WAV per chunk, macOS built-in)
- Add prerequisites onboarding step (ffmpeg, sox, BlackHole per platform)
- Add prerequisite check/install IPC handlers
- Fix auto-calibration: only use silence samples for noise floor
- Fix VAD default to high sensitivity
- Clean up unused imports and fields
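To illustrate the STT query-parameter fix, here is a small sketch that assembles the three parameters named above. Only the parameter names and the use of a manual commit strategy in PTT mode come from the commit message. The values `pcm_16000` and `vad`, and the helper name `buildSttQuery`, are assumptions for illustration.

```typescript
// 'manual' is named in the commit message; 'vad' is an assumed alternative value.
type CommitStrategy = 'manual' | 'vad';

interface SttOptions {
  audioFormat: string;   // e.g. 'pcm_16000' (assumed value)
  pushToTalk: boolean;   // PTT mode selects the manual commit strategy
  languageCode: string;  // e.g. 'en'
}

// Hypothetical helper: returns the query string to append to the STT socket URL.
function buildSttQuery(opts: SttOptions): string {
  const strategy: CommitStrategy = opts.pushToTalk ? 'manual' : 'vad';
  const params = new URLSearchParams();
  params.set('audio_format', opts.audioFormat);
  params.set('commit_strategy', strategy);
  params.set('language_code', opts.languageCode);
  return params.toString();
}
```

Building the string with `URLSearchParams` rather than by hand keeps values correctly escaped if a language code or format name ever contains reserved characters.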
- Auto-restart session on language change (stop + start with new lang)
- Reset STT connection after each PTT release (1.5s delay for transcript)
- Prevents accumulated buffer from concatenating across PTT presses
- Disable watchdog in PTT mode (silence between presses is expected)
- Default VAD sensitivity to high
- Replace persistent ffmpeg pipe with per-utterance short-lived processes
- Each TTS chunk spawns fresh ffmpeg → BlackHole (stdin + EOF = play + exit)
- Speaker playback via afplay temp WAV (unchanged)
- Eliminates audiotoolbox internal buffering that caused delayed/stacked playback
- Each PTT press-release is fully independent: fresh STT, fresh audio output
- Add screenshots (onboarding, voice clone, main view, settings)
- Add demo.mov video
- Rewrite README with demo/screenshots at top, push-to-talk docs
- Add Remotion reel project (reel/) for IG/TikTok promo video
- Add public/ folder for media assets
- Exclude reel/out/ from git
writeVirtualMic was playing TTS through both BlackHole (virtual mic) AND system speakers via afplay. The speaker output caused double sound since the meeting app already plays the virtual mic audio back to the user. Now only writes to BlackHole — user hears translation through the meeting app's audio output.
In PTT mode, the STT WebSocket was kept alive between presses, receiving no audio, causing ElevenLabs to close it every ~5s. The auto-reconnect handler would immediately reconnect, creating an infinite connect/close cycle that wasted API quota and caused race conditions when PTT was pressed during a reconnect. Now: STT connects on PTT press, disconnects after getting the committed transcript on release. No reconnect loop when idle.
… to BlackHole
The old approach piped raw PCM to ffmpeg stdin with an empty output filename, causing ffmpeg to misinterpret the stream and produce garbled/blurpy audio. Now writes a proper WAV file with correct headers and lets ffmpeg read it, which also handles resampling from 24kHz (TTS output) to BlackHole's native rate internally. Removed the manual linear interpolation resampler, since ffmpeg's built-in sinc resampler produces much cleaner output.
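For reference, "a proper WAV file with correct headers" amounts to a 44-byte RIFF header in front of the PCM data. The sketch below assumes the TTS output is 16-bit signed little-endian mono PCM at 24 kHz. The bit depth and channel count are assumptions, and `pcm16ToWav` is a hypothetical helper name.

```typescript
// Wraps raw 16-bit PCM in a canonical 44-byte WAV (RIFF) header.
function pcm16ToWav(
  pcm: Uint8Array,
  sampleRate: number = 24000, // TTS output rate from the commit message
  channels: number = 1,       // assumed mono
): Uint8Array {
  const bytesPerSample = 2;   // assumed 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  const ascii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  ascii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true);         // file size minus 8 bytes
  ascii(8, 'WAVE');
  ascii(12, 'fmt ');
  view.setUint32(16, 16, true);                     // fmt chunk size for PCM
  view.setUint16(20, 1, true);                      // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bytesPerSample * 8, true);     // bits per sample
  ascii(36, 'data');
  view.setUint32(40, pcm.length, true);             // payload size in bytes
  out.set(pcm, 44);
  return out;
}
```

With the sample rate declared in the header, a reader such as ffmpeg no longer has to guess the stream format, which is the failure mode the commit message describes for the raw pipe.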
- Added dist:mac, dist:win, dist:linux scripts using electron-builder
- Added electron-builder config in package.json (dmg, nsis, deb)
- Workflows now trigger on ci/github-workflows branch for testing
- GitHub Releases only created on master push
- All platforms upload build artifacts for branch builds
Root cause: raw PCM piped through ffmpeg's audiotoolbox output was producing garbled/blurpy audio due to format mismatches and pipe buffering issues.
Fix:
- Request mp3_44100_128 from ElevenLabs instead of pcm_24000
- Write MP3 to temp file (proper encoded format, no header issues)
- Use sox with coreaudio output to play directly to BlackHole
- sox handles decoding, resampling, and device output natively
- Falls back to afplay if sox is unavailable
- Linux: uses sox with pulseaudio output to voicebridge sink
…itoring
- Fixed ffmpeg audiotoolbox output (empty string arg works from Node spawn)
- Fixed device discovery command to use proper dummy input
- Added debug logging for ffmpeg exit codes
- After playing to BlackHole, also plays through speakers via afplay so user can hear the translation locally without a meeting app
… MacBook speakers
…2kbps
- stability: 0.5 → 0.75 (less random variation, more natural for clones)
- similarity_boost: 0.75 → 0.85 (closer to original voice)
- style: 0.3 → 0.05 (minimal expressiveness, avoids pitch distortion)
- Added use_speaker_boost: true (normalizes volume)
- Bumped MP3 to 192kbps for cleaner audio
…ic prosody
- Model: eleven_flash_v2_5 → eleven_multilingual_v2 (better pitch/tone for Japanese, Korean, Hindi, etc.)
- stability: 0.75 → 0.5 (allow natural intonation variation)
- style: 0.05 → 0.35 (enable language-appropriate expressiveness)
- similarity_boost: 0.85 → 0.8 (balanced clone fidelity vs naturalness)
ElevenLabs output is too quiet by default. Added:
- ffmpeg -af volume=3.0 for BlackHole output (3x amplification)
- afplay --volume 2 for speaker output (2x)
- Same boost on Linux pulse output
Previous approach: volume filter on ffmpeg audiotoolbox output didn't actually boost the audio heard through speakers (afplay was separate). New approach: first convert MP3 → boosted WAV using loudnorm (broadcast standard -14 LUFS) + volume=2.0, then play the boosted WAV through both BlackHole and speakers. Both outputs get the same loud audio.
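The MP3 → boosted WAV step could be expressed as an ffmpeg argument list along these lines. This is a sketch, not the command used in the PR: `buildBoostArgs` is a hypothetical helper and the file names are placeholders. The `loudnorm` and `volume` filters and the `-af`, `-i`, and `-y` flags are standard ffmpeg options.

```typescript
// Hypothetical helper: ffmpeg arguments that normalize loudness, then apply gain.
function buildBoostArgs(
  inputMp3: string,
  outputWav: string,
  targetLufs: number = -14, // integrated loudness target from the commit message
  gain: number = 2.0,       // extra linear gain applied after normalization
): string[] {
  // Filters in a chain are comma-separated and run left to right.
  const filter = `loudnorm=I=${targetLufs},volume=${gain.toFixed(1)}`;
  return ['-y', '-i', inputMp3, '-af', filter, outputWav];
}
```

The array is meant to be passed to something like `spawn('ffmpeg', args)` from `node:child_process`, so the same boosted file can then be played on every output.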
Playing to both BlackHole (ffmpeg) and speakers (afplay) simultaneously created a reverb effect due to timing offset between the two processes. Now only plays to BlackHole. Dropped loudnorm filter (added latency), kept simple volume=3.0 boost.
Create a volume-boosted MP3 (6x) first, then play that same loud file to both BlackHole and speakers. Both outputs get the same boosted audio. Previous approach only boosted BlackHole (ffmpeg volume filter) but afplay got the quiet original.
…-addon
- rootDir: src/main → src (includes shared/ and native/ directories)
- Removed string fallback returns from #findBlackHoleIndex (was returning 'BlackHole 2ch' string where number | null was expected)
… BlackHole
Root cause of double playback: audio went to both BlackHole (ffmpeg) and speakers (afplay). With headphones, both outputs were audible. With MacBook speakers, only afplay was heard (nothing was reading BlackHole).
Fix: removed the BlackHole ffmpeg output entirely. Audio now plays only through the default audio output (afplay), which behaves the same on speakers and headphones. One play, no reverb, no double sound.
For meetings: the user sets up a macOS Multi-Output Device (BlackHole + speakers) in Audio MIDI Setup, or the meeting app monitors BlackHole separately.
- settings-store: use Record<string, unknown> cache to avoid Partial type issues
- driver-installer: remove unused exec import and #nativeAddon field
- electron-ipc: remove unused VALID_RENDERER_CHANNELS import
- main.ts: fix unused vars, exactOptionalPropertyTypes, IPC type mismatches
- native-addon: suppress unused #bhIdx and #findBlackHoleIndex warnings
- Remove unused _p progress helper in driver-installer.ts #installLinux
- Replace loose _tray/_autoStart vars with retained object to satisfy noUnusedLocals
- Widen tsconfig.renderer.json rootDir from src/renderer to src (includes src/shared)
- electron-builder requires electron in devDependencies, not dependencies
- Add missing author field to satisfy electron-builder validation
CI was running tsc, which emitted dist/main/main/main.js because of rootDir: src, while electron-builder expected dist/main/main.js. The preload script was also not built in CI at all.
- Add desktop/scripts/build.mjs (esbuild production bundler)
- Update build:main script to use esbuild instead of tsc
- Update all three CI workflows (macOS, Windows, Linux)
- Add homepage, repository, author email (required for Linux .deb maintainer)
- Set publish: null to prevent auto-update info generation errors on all platforms
- Output main process as .cjs to avoid 'require is not defined' in ES module scope (package.json has type: module, esbuild outputs CJS)
- Fix production renderer load path (was dist/renderer/src/renderer/index.html, now dist/renderer/index.html)
- Add custom VoiceBridge app icon (Nothing design: black bg, bridge motif, red accent dot, dot grid, monospace VB)
- Generate icon.icns (macOS), icon.ico (Windows), icon.png (Linux)
- Add generate-icons.mjs script for regenerating from SVG
Vite's default base: '/' produces absolute paths (/assets/...) which resolve to filesystem root under file:// in Electron. Set base: './' for relative paths.
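The path behavior can be checked with the standard URL resolution rules that a file:// page follows. In this sketch the install location and asset name are hypothetical; only the difference between the two results matters.

```typescript
// Hypothetical location of the packaged renderer page.
const indexHtml =
  'file:///Applications/VoiceBridge.app/Contents/Resources/app/dist/renderer/index.html';

// Resolves an asset reference the way the page loading it would.
function resolveAsset(assetRef: string, pageUrl: string): string {
  return new URL(assetRef, pageUrl).href;
}

// base: '/' emits references like this one, which resolve to the filesystem root.
const absoluteRef = resolveAsset('/assets/index.js', indexHtml);

// base: './' emits references like this one, which resolve next to index.html.
const relativeRef = resolveAsset('./assets/index.js', indexHtml);

console.log(absoluteRef); // file:///assets/index.js
console.log(relativeRef); // file:///Applications/VoiceBridge.app/Contents/Resources/app/dist/renderer/assets/index.js
```

The first result points at a directory that does not exist on a normal machine, which is why the renderer loads a blank page until the base is made relative.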
- Production window: 420x680 (was 360x480), min 380x520
- Enable frame, resizable, and hiddenInset title bar for macOS
- Show in taskbar in both dev and production
Electron apps launched from Finder/Dock don't inherit the user's shell PATH. This caused createNativeAddon() to fall back to MockNativeAddon (silence) because 'which ffmpeg' failed without /opt/homebrew/bin or /usr/local/bin. Prepend common install locations to process.env.PATH at startup.
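A startup fix of this kind might be sketched as a pure function plus one assignment. The name `prependToPath` and the de-duplication step are assumptions for illustration. The two directories are the ones named in the commit message.

```typescript
// Hypothetical helper: puts extra directories first and drops duplicates.
function prependToPath(
  current: string | undefined,
  extra: readonly string[],
  delimiter: string = ':', // POSIX PATH separator; Windows uses ';'
): string {
  const existing = (current ?? '').split(delimiter).filter(Boolean);
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const dir of [...extra, ...existing]) {
    if (!seen.has(dir)) {
      seen.add(dir);
      merged.push(dir);
    }
  }
  return merged.join(delimiter);
}

// Locations named in the commit message (Homebrew on Apple Silicon and Intel).
const COMMON_BIN_DIRS = ['/opt/homebrew/bin', '/usr/local/bin'];

// Example with a minimal PATH such as a GUI-launched app might see.
const patched = prependToPath('/usr/bin:/bin', COMMON_BIN_DIRS);
console.log(patched); // /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin
```

In the main process this would run before any tool lookup, for example `process.env.PATH = prependToPath(process.env.PATH, COMMON_BIN_DIRS)`, so that a later `which ffmpeg` can find a Homebrew install.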
- Add NSMicrophoneUsageDescription to Info.plist via extendInfo
- Add entitlements.mac.plist with audio-input entitlement
- Update CI workflow to rename and upload both x64 and arm64 DMGs
- Release notes now show architecture table for downloads
No description provided.