Ci/GitHub workflows #30

Merged
AlleyBo55 merged 52 commits into master from ci/github-workflows on Apr 23, 2026
@AlleyBo55
Owner

No description provided.

…le output

- Real mic capture via ffmpeg (avfoundation/pulse/dshow)
- ElevenLabs Scribe v2 Realtime STT with auto-reconnect
- Streaming LLM translation (OpenRouter/OpenAI/Anthropic)
- ElevenLabs Flash v2.5 TTS with on-demand reconnect
- BlackHole virtual mic output via ffmpeg audiotoolbox
- Auto-calibrate VAD threshold from mic noise floor
- Stable-partial timer triggers translation after 1.2s pause
- Delayed TTS flush batches translations (300ms window)
- Device selection by name (survives plug/unplug)
- Mic selector in settings, BlackHole disabled as input
- VAD sensitivity selector (low/medium/high)
- Curated model dropdown per LLM provider
- Backpressure handling for ffmpeg output pipe
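The stable-partial timer in the list above can be sketched as a small tracker: a partial transcript that stays unchanged for 1.2 s is released for translation exactly once. This is a minimal sketch, not the project's code. `StablePartialTracker`, `onPartial`, and `poll` are hypothetical names, and the current time is passed in explicitly (instead of using `setTimeout`) so the logic stays deterministic.

```typescript
// Hypothetical sketch: releases a partial STT transcript for translation once it
// has stayed unchanged for `stableMs` (1.2 s per the commit notes).
class StablePartialTracker {
  private text = "";
  private lastChangeMs = 0;
  private fired = false;
  private readonly stableMs: number;

  constructor(stableMs = 1200) {
    this.stableMs = stableMs;
  }

  // Record the latest partial; any change restarts the stability window.
  onPartial(text: string, nowMs: number): void {
    if (text !== this.text) {
      this.text = text;
      this.lastChangeMs = nowMs;
      this.fired = false;
    }
  }

  // Call periodically (e.g. from a short interval). Returns the text to translate
  // the first time it has been stable long enough, otherwise null.
  poll(nowMs: number): string | null {
    if (this.fired || this.text.trim() === "") return null;
    if (nowMs - this.lastChangeMs < this.stableMs) return null;
    this.fired = true;
    return this.text;
  }

  // Clear state after a committed transcript so the next utterance starts fresh.
  reset(): void {
    this.text = "";
    this.lastChangeMs = 0;
    this.fired = false;
  }
}
```

Because `poll` only fires once per distinct text, a speaker who pauses mid-sentence and then continues produces a new partial, which restarts the window instead of triggering a second translation of the same words.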
…onnect issues

- REST TTS: POST /v1/text-to-speech/{voiceId} with pcm_24000 output
- Each utterance is an independent HTTP request — no shared WebSocket state
- No flush/isFinal/reconnect dance — just request → audio → play
- Resample 24kHz→48kHz and write to BlackHole in one shot
- Remove all TTS WebSocket code (connect, heartbeat, queue, flush timer)
- LLM still streams tokens but collects full translation before TTS
- Simpler and more reliable: no connection state can go stale between utterances
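A per-utterance REST call like the one described above might look roughly like this. It is a sketch under stated assumptions, not the project's implementation: the `api.elevenlabs.io` base URL, the `xi-api-key` header, the `output_format` query parameter, the JSON body fields `text` and `model_id`, and the raw 16-bit mono layout of `pcm_24000` are assumptions to check against the ElevenLabs API reference. `buildTtsRequest` and `synthesize` are hypothetical names.

```typescript
// Hypothetical sketch of one independent TTS request per utterance.
interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildTtsRequest(
  voiceId: string,
  text: string,
  apiKey: string,
  modelId = "eleven_flash_v2_5",
  outputFormat = "pcm_24000",
): TtsRequest {
  const url = new URL(
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
  );
  url.searchParams.set("output_format", outputFormat);
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}

// No socket, heartbeat, or flush state: request, read the audio, return it.
async function synthesize(voiceId: string, text: string, apiKey: string): Promise<Uint8Array> {
  const { url, init } = buildTtsRequest(voiceId, text, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
  }
  // Assumed: raw 16-bit little-endian mono PCM at 24 kHz for pcm_24000.
  return new Uint8Array(await res.arrayBuffer());
}
```

Keeping the request construction separate from `fetch` makes the URL and body easy to inspect in logs or tests without spending API quota.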
… OBS frame buffer

- Every utterance queued and processed one at a time
- No concurrent translations/TTS — eliminates all race conditions
- Queue drains serially: translate → TTS → write → next
- Added delay is acceptable, but nothing gets lost
- Queue cleared on session stop
- Log shows queue depth for monitoring
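The serial queue described above can be sketched as a small promise-driven class: each utterance runs translate → TTS → write to completion before the next one starts, failures are logged and skipped, and `clear()` drops everything not yet started. `SerialQueue` and its methods are hypothetical names; this is one way to get the described behaviour, not the project's code.

```typescript
// Hypothetical sketch: one utterance in flight at a time, others wait in order.
type UtteranceHandler<T> = (item: T) => Promise<void>;

class SerialQueue<T> {
  private pending: T[] = [];
  private running = false;
  private idleWaiters: Array<() => void> = [];
  private readonly handler: UtteranceHandler<T>;
  private readonly log: (msg: string) => void;

  constructor(handler: UtteranceHandler<T>, log: (msg: string) => void = () => {}) {
    this.handler = handler;
    this.log = log;
  }

  // Items waiting plus the one currently being processed.
  get depth(): number {
    return this.pending.length + (this.running ? 1 : 0);
  }

  enqueue(item: T): void {
    this.pending.push(item);
    this.log(`queue depth: ${this.depth}`);
    if (!this.running) void this.drain();
  }

  // Drop everything not yet started (e.g. on session stop); the in-flight item finishes.
  clear(): void {
    this.pending = [];
  }

  // Resolves once the queue has fully drained.
  idle(): Promise<void> {
    if (!this.running && this.pending.length === 0) return Promise.resolve();
    return new Promise((resolve) => this.idleWaiters.push(resolve));
  }

  private async drain(): Promise<void> {
    this.running = true;
    while (this.pending.length > 0) {
      const item = this.pending.shift() as T;
      try {
        await this.handler(item);
      } catch (err) {
        // A failed utterance is logged and skipped; the queue keeps draining.
        this.log(`utterance failed: ${String(err)}`);
      }
    }
    this.running = false;
    const waiters = this.idleWaiters;
    this.idleWaiters = [];
    waiters.forEach((resolve) => resolve());
  }
}
```

The trade-off matches the commit note: latency grows with queue depth, but because there is never more than one pipeline in flight, translations and audio writes cannot interleave.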
…onboarding

- Add push-to-talk mode (default ON): hold SPACE or the PTT button to talk
- PTT uses the manual commit strategy, preventing double commits and commit throttling
- Fix STT: proper query params (audio_format, commit_strategy, language_code)
- Fix STT: add commit field to input_audio_chunk (required by API)
- Fix STT: auto-reconnect on close + 10s watchdog for silent connections
- Fix STT: handle commit_throttled gracefully without crash loop
- Remove dead TTS WebSocket code (now REST-only)
- Add speaker playback via afplay (temp WAV per chunk, macOS built-in)
- Add prerequisites onboarding step (ffmpeg, sox, BlackHole per platform)
- Add prerequisite check/install IPC handlers
- Fix auto-calibration: only use silence samples for noise floor
- Fix VAD default to high sensitivity
- Clean up unused imports and fields
- Auto-restart session on language change (stop + start with new lang)
- Reset STT connection after each PTT release (1.5s delay for transcript)
- Prevents accumulated buffer from concatenating across PTT presses
- Disable watchdog in PTT mode (silence between presses is expected)
- Default VAD sensitivity to high
- Replace persistent ffmpeg pipe with per-utterance short-lived processes
- Each TTS chunk spawns fresh ffmpeg → BlackHole (stdin + EOF = play + exit)
- Speaker playback via afplay temp WAV (unchanged)
- Eliminates audiotoolbox internal buffering that caused delayed/stacked playback
- Each PTT press-release is fully independent: fresh STT, fresh audio output
- Add screenshots (onboarding, voice clone, main view, settings)
- Add demo.mov video
- Rewrite README with demo/screenshots at top, push-to-talk docs
- Add Remotion reel project (reel/) for IG/TikTok promo video
- Add public/ folder for media assets
- Exclude reel/out/ from git
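The auto-calibration fix (noise floor from silence samples only) and the low/medium/high sensitivity setting can be sketched as follows. Several details are assumptions, not taken from the project: 16-bit PCM frames, treating the quieter half of the calibration frames as "silence", and the per-sensitivity multipliers. All names are hypothetical.

```typescript
// Hypothetical sketch of noise-floor calibration for a VAD threshold.
type VadSensitivity = "low" | "medium" | "high";

// Assumed multipliers: lower multiplier puts the threshold closer to the noise
// floor, so quieter speech is detected (more sensitive).
const THRESHOLD_MULTIPLIER: Record<VadSensitivity, number> = {
  low: 4,
  medium: 3,
  high: 2,
};

// Root-mean-square level of one 16-bit frame, normalised to 0..1.
function frameRms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, sample) => {
    const x = sample / 32768;
    return acc + x * x;
  }, 0);
  return Math.sqrt(sumSquares / frame.length);
}

function calibrateVadThreshold(frames: Int16Array[], sensitivity: VadSensitivity = "high"): number {
  if (frames.length === 0) throw new Error("no calibration frames");
  const levels = frames.map(frameRms).sort((a, b) => a - b);
  // Only the quietest half counts as silence, so speech picked up during
  // calibration cannot inflate the noise floor.
  const silence = levels.slice(0, Math.max(1, Math.floor(levels.length / 2)));
  const noiseFloor = silence.reduce((sum, v) => sum + v, 0) / silence.length;
  return noiseFloor * THRESHOLD_MULTIPLIER[sensitivity];
}
```

Averaging every frame instead would let a single loud word during calibration push the threshold up, which is the failure mode the "only use silence samples" fix addresses.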
writeVirtualMic was playing TTS through both BlackHole (virtual mic)
AND system speakers via afplay. The speaker output caused double sound
since the meeting app already plays the virtual mic audio back to the
user. Now only writes to BlackHole — user hears translation through
the meeting app's audio output.
In PTT mode, the STT WebSocket was kept alive between presses,
receiving no audio, causing ElevenLabs to close it every ~5s.
The auto-reconnect handler would immediately reconnect, creating
an infinite connect/close cycle that wasted API quota and caused
race conditions when PTT was pressed during a reconnect.

Now: STT connects on PTT press, disconnects after getting the
committed transcript on release. No reconnect loop when idle.
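The per-press lifecycle described above (connect on press, commit on release, disconnect after the committed transcript, nothing open while idle) can be modelled as a pure state machine. This is a hypothetical sketch: the caller performs the returned actions and is assumed to schedule the `timeout` event, for example 1.5 s after release as mentioned in the earlier commit notes.

```typescript
// Hypothetical sketch of the PTT/STT lifecycle as a pure reducer.
type PttState = "idle" | "recording" | "awaitingTranscript";

type PttEvent =
  | { type: "press" }
  | { type: "release" }
  | { type: "committed"; text: string }
  | { type: "timeout" }
  | { type: "socketClosed" };

type PttAction =
  | { type: "connect" }
  | { type: "commit" }
  | { type: "disconnect" }
  | { type: "emit"; text: string };

interface PttStep {
  state: PttState;
  actions: PttAction[];
}

function pttStep(state: PttState, event: PttEvent): PttStep {
  if (state === "idle" && event.type === "press") {
    // Fresh STT connection for every press.
    return { state: "recording", actions: [{ type: "connect" }] };
  }
  if (state === "recording" && event.type === "release") {
    // Manual commit of the audio captured during this press.
    return { state: "awaitingTranscript", actions: [{ type: "commit" }] };
  }
  if (state === "awaitingTranscript" && event.type === "committed") {
    return { state: "idle", actions: [{ type: "emit", text: event.text }, { type: "disconnect" }] };
  }
  if (state === "awaitingTranscript" && event.type === "timeout") {
    // Give up waiting for a transcript and close the connection anyway.
    return { state: "idle", actions: [{ type: "disconnect" }] };
  }
  // Everything else is ignored; in particular a socket closing while idle
  // triggers no reconnect, which removes the connect/close loop.
  return { state, actions: [] };
}
```

Keeping the transitions pure makes the race described above easy to reason about: there is simply no transition that reconnects from `idle` without a press.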
… to BlackHole

The old approach piped raw PCM to ffmpeg stdin with an empty output
filename, causing ffmpeg to misinterpret the stream and produce
garbled, distorted audio. Now writes a proper WAV file with correct
headers and lets ffmpeg read it, which also handles resampling
from 24kHz (TTS output) to BlackHole's native rate internally.

Removed the manual linear interpolation resampler — ffmpeg's
built-in sinc resampler produces much cleaner output.
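Writing "a proper WAV file with correct headers" amounts to prepending the standard 44-byte RIFF/WAVE header to the raw PCM. A minimal sketch, assuming 16-bit mono PCM at 24 kHz as the defaults; `pcmToWav` is a hypothetical name.

```typescript
// Hypothetical sketch: wrap raw PCM in a 44-byte RIFF/WAVE header so ffmpeg
// reads the sample rate and format from the file instead of guessing.
function pcmToWav(pcm: Uint8Array, sampleRate = 24000, channels = 1, bitsPerSample = 16): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8;
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size: file size minus 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size in bytes
  out.set(pcm, 44);
  return out;
}
```

The result can be written to a temp file (e.g. with `fs.writeFileSync`) and passed to ffmpeg with `-i`; ffmpeg then resamples from the declared 24 kHz to the output device rate itself.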
- Added dist:mac, dist:win, dist:linux scripts using electron-builder
- Added electron-builder config in package.json (dmg, nsis, deb)
- Workflows now trigger on ci/github-workflows branch for testing
- GitHub Releases only created on master push
- All platforms upload build artifacts for branch builds
Root cause: raw PCM piped through ffmpeg's audiotoolbox output was
producing garbled, distorted audio due to format mismatches and pipe
buffering issues.

Fix:
- Request mp3_44100_128 from ElevenLabs instead of pcm_24000
- Write MP3 to temp file (proper encoded format, no header issues)
- Use sox with coreaudio output to play directly to BlackHole
- sox handles decoding, resampling, and device output natively
- Falls back to afplay if sox is unavailable
- Linux: uses sox with pulseaudio output to voicebridge sink
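The playback selection above could be expressed as a small command builder plus a spawn wrapper. This is a sketch under assumptions: the macOS device name `BlackHole 2ch`, the Linux sink name `voicebridge`, and a sox build that can decode MP3 are not confirmed by the commit log. `playbackCommand` and `play` are hypothetical names.

```typescript
import { spawn } from "node:child_process";

// Hypothetical sketch of choosing a player for the temp MP3 file.
interface PlaybackCommand {
  cmd: string;
  args: string[];
}

function playbackCommand(file: string, platform: string, soxAvailable: boolean): PlaybackCommand | null {
  if (platform === "darwin") {
    // sox decodes, resamples, and writes straight to the named CoreAudio device;
    // afplay plays to the default output device instead.
    return soxAvailable
      ? { cmd: "sox", args: [file, "-t", "coreaudio", "BlackHole 2ch"] }
      : { cmd: "afplay", args: [file] };
  }
  if (platform === "linux" && soxAvailable) {
    return { cmd: "sox", args: [file, "-t", "pulseaudio", "voicebridge"] };
  }
  return null; // no supported player on this platform
}

// Run the player and resolve with its exit code once it finishes.
function play(command: PlaybackCommand): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(command.cmd, command.args, { stdio: "ignore" });
    child.on("error", reject);
    child.on("close", (code) => resolve(code));
  });
}
```

Passing arguments as an array (rather than a shell string) keeps device names with spaces, such as `BlackHole 2ch`, intact without quoting.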
…itoring

- Fixed ffmpeg audiotoolbox output (empty string arg works from Node spawn)
- Fixed device discovery command to use proper dummy input
- Added debug logging for ffmpeg exit codes
- After playing to BlackHole, also plays through speakers via afplay
  so user can hear the translation locally without a meeting app
…2kbps

- stability: 0.5 → 0.75 (less random variation, more natural for clones)
- similarity_boost: 0.75 → 0.85 (closer to original voice)
- style: 0.3 → 0.05 (minimal expressiveness, avoids pitch distortion)
- Added use_speaker_boost: true (normalizes volume)
- Bumped MP3 to 192kbps for cleaner audio
…ic prosody

- Model: eleven_flash_v2_5 → eleven_multilingual_v2 (better pitch/tone
  for Japanese, Korean, Hindi, etc.)
- stability: 0.75 → 0.5 (allow natural intonation variation)
- style: 0.05 → 0.35 (enable language-appropriate expressiveness)
- similarity_boost: 0.85 → 0.8 (balanced clone fidelity vs naturalness)
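The resulting TTS settings can be captured as one typed constant and a body builder. This is a hypothetical sketch: `use_speaker_boost: true` is assumed to carry over from the previous commit, and the `[0, 1]` range check is an assumption about the API's accepted values.

```typescript
// Hypothetical sketch of the TTS request body after the multilingual switch.
interface VoiceSettings {
  stability: number;
  similarity_boost: number;
  style: number;
  use_speaker_boost: boolean;
}

const MULTILINGUAL_VOICE_SETTINGS: VoiceSettings = {
  stability: 0.5, // room for natural intonation variation
  similarity_boost: 0.8, // clone fidelity vs. naturalness
  style: 0.35, // language-appropriate expressiveness
  use_speaker_boost: true, // assumed unchanged from the earlier commit
};

function ttsBody(
  text: string,
  modelId = "eleven_multilingual_v2",
  settings: VoiceSettings = MULTILINGUAL_VOICE_SETTINGS,
): string {
  // Reject out-of-range values locally instead of relying on an API error.
  for (const key of ["stability", "similarity_boost", "style"] as const) {
    const value = settings[key];
    if (value < 0 || value > 1) throw new RangeError(`${key} must be in [0, 1], got ${value}`);
  }
  return JSON.stringify({ text, model_id: modelId, voice_settings: settings });
}
```

Keeping the tuning in one constant makes each commit's change (0.75 → 0.5, 0.05 → 0.35, and so on) a one-line diff.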
ElevenLabs output is too quiet by default. Added:
- ffmpeg -af volume=3.0 for BlackHole output (3x amplification)
- afplay --volume 2 for speaker output (2x)
- Same boost on Linux pulse output
Previous approach: volume filter on ffmpeg audiotoolbox output didn't
actually boost the audio heard through speakers (afplay was separate).

New approach: first convert MP3 → boosted WAV using loudnorm (target
-14 LUFS, a common streaming loudness level) + volume=2.0, then play the boosted WAV through
both BlackHole and speakers. Both outputs get the same loud audio.
Playing to both BlackHole (ffmpeg) and speakers (afplay) simultaneously
created a reverb effect due to timing offset between the two processes.
Now only plays to BlackHole. Dropped loudnorm filter (added latency),
kept simple volume=3.0 boost.
Create a volume-boosted MP3 (6x) first, then play that same loud
file to both BlackHole and speakers. Both outputs get the same
boosted audio. Previous approach only boosted BlackHole (ffmpeg
volume filter) but afplay got the quiet original.
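Creating one boosted file that every output then plays can be sketched as an ffmpeg argument builder. This is a hypothetical sketch: it assumes an ffmpeg build with the `libmp3lame` encoder, and the 192k bitrate is taken from the earlier commit. A linear gain of 6 is about +15.6 dB, so loud peaks may clip.

```typescript
// Hypothetical sketch: ffmpeg arguments that write a single volume-boosted MP3.
function boostArgs(input: string, output: string, gain = 6, bitrate = "192k"): string[] {
  return [
    "-y", // overwrite the temp output if it already exists
    "-i", input,
    "-af", `volume=${gain}`, // linear gain factor applied to every sample
    "-c:a", "libmp3lame",
    "-b:a", bitrate,
    output,
  ];
}
```

The arguments would be passed to `spawn("ffmpeg", boostArgs(src, dst))`; once it exits, the same `dst` file goes to every player, so all outputs hear identical loudness.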
…-addon

- rootDir: src/main → src (includes shared/ and native/ directories)
- Removed string fallback returns from #findBlackHoleIndex (was returning
  'BlackHole 2ch' string where number | null was expected)
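A `#findBlackHoleIndex`-style lookup that always returns `number | null` could parse the device list ffmpeg prints on stderr for `ffmpeg -f avfoundation -list_devices true -i ""`. Matching by name (rather than caching an index) also fits the "device selection by name" item above. This is a hypothetical sketch; `findAudioDeviceIndex` is an invented name, and the sample output used in testing is illustrative.

```typescript
// Hypothetical sketch: find an avfoundation audio device index by exact name.
function findAudioDeviceIndex(listing: string, name: string): number | null {
  let inAudioSection = false;
  for (const line of listing.split("\n")) {
    if (line.includes("AVFoundation audio devices:")) {
      inAudioSection = true;
      continue;
    }
    if (line.includes("AVFoundation video devices:")) {
      inAudioSection = false;
      continue;
    }
    // Lines look like "[AVFoundation indev @ 0x...] [1] BlackHole 2ch".
    const match = /\[(\d+)\]\s+(.+?)\s*$/.exec(line);
    if (inAudioSection && match && match[2] === name) {
      return Number(match[1]);
    }
  }
  return null; // never a string fallback
}
```

Returning `null` when the device is missing lets the caller decide how to degrade (for example, skip virtual-mic output) instead of passing a device name where an index was expected.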
… BlackHole

Root cause of double playback: audio went to both BlackHole (ffmpeg)
and speakers (afplay). With headphones, both outputs were audible.
With MacBook speakers, only afplay was heard (nothing reading BlackHole).

Fix: removed BlackHole ffmpeg output entirely. Now only plays through
the default audio output (afplay) — works the same on speakers and
headphones. One play, no reverb, no double sound.

For meetings: user sets up a macOS Multi-Output Device (BlackHole +
speakers) in Audio MIDI Setup, or the meeting app monitors BlackHole
separately.
- settings-store: use Record<string, unknown> cache to avoid Partial type issues
- driver-installer: remove unused exec import and #nativeAddon field
- electron-ipc: remove unused VALID_RENDERER_CHANNELS import
- main.ts: fix unused vars, exactOptionalPropertyTypes, IPC type mismatches
- native-addon: suppress unused #bhIdx and #findBlackHoleIndex warnings
- Remove unused _p progress helper in driver-installer.ts #installLinux
- Replace loose _tray/_autoStart vars with retained object to satisfy noUnusedLocals
- Widen tsconfig.renderer.json rootDir from src/renderer to src (includes src/shared)
- electron-builder requires electron in devDependencies, not dependencies
- Add missing author field to satisfy electron-builder validation
CI was running tsc, which emitted dist/main/main/main.js because of
rootDir: src, but electron-builder expected dist/main/main.js.
The preload script was also not built in CI at all.

- Add desktop/scripts/build.mjs (esbuild production bundler)
- Update build:main script to use esbuild instead of tsc
- Update all three CI workflows (macOS, Windows, Linux)
- Add homepage, repository, author email (required for Linux .deb maintainer)
- Set publish: null to prevent auto-update info generation errors on all platforms
- Output main process as .cjs to avoid 'require is not defined' in ES module
  scope (package.json has type: module, esbuild outputs CJS)
- Fix production renderer load path (was dist/renderer/src/renderer/index.html,
  now dist/renderer/index.html)
- Add custom VoiceBridge app icon (Nothing design: black bg, bridge motif,
  red accent dot, dot grid, monospace VB)
- Generate icon.icns (macOS), icon.ico (Windows), icon.png (Linux)
- Add generate-icons.mjs script for regenerating from SVG
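The options such a build script might pass to esbuild's `build()` can be sketched as plain objects. The entry points and output paths here are assumptions, not taken from the repository; the `.cjs` extension follows from the note above, since `"type": "module"` in package.json makes Node treat plain `.js` files as ES modules.

```typescript
// Hypothetical sketch of esbuild options for the Electron main and preload bundles.
const sharedOptions = {
  bundle: true,
  platform: "node" as const,
  format: "cjs" as const, // CommonJS output, so require() works at runtime
  external: ["electron"], // provided by the Electron runtime, never bundled
  minify: true,
  sourcemap: false,
};

const buildTargets = [
  // .cjs forces CommonJS even though package.json declares "type": "module".
  { ...sharedOptions, entryPoints: ["src/main/main.ts"], outfile: "dist/main/main.cjs" },
  // The preload script is bundled as well, so CI no longer skips it.
  { ...sharedOptions, entryPoints: ["src/preload/preload.ts"], outfile: "dist/preload/preload.cjs" },
];
```

A build script would then run something like `await Promise.all(buildTargets.map((opts) => build(opts)))` with `build` imported from `esbuild`, and the package.json `main` field would point at the emitted main bundle.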
Vite's default base: '/' produces absolute paths (/assets/...), which resolve
to the filesystem root under file:// in Electron. Set base: './' for relative paths.
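The base-path problem can be demonstrated with the WHATWG `URL` class, which applies the same resolution rules as the renderer. The install path below is hypothetical; only the difference between the two results matters.

```typescript
// Illustration with a hypothetical install path: resolving the same asset
// reference against a file:// page, as emitted with base '/' versus base './'.
const pageUrl =
  "file:///Applications/VoiceBridge.app/Contents/Resources/app/dist/renderer/index.html";

function resolveAsset(assetRef: string): string {
  return new URL(assetRef, pageUrl).href;
}

const absoluteRef = resolveAsset("/assets/index.js"); // emitted with base: '/'
const relativeRef = resolveAsset("./assets/index.js"); // emitted with base: './'
```

`absoluteRef` becomes `file:///assets/index.js` (the filesystem root, so the load fails), while `relativeRef` stays inside `dist/renderer/assets/`.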
- Production window: 420x680 (was 360x480), min 380x520
- Enable frame, resizable, and hiddenInset title bar for macOS
- Show in taskbar in both dev and production
Electron apps launched from Finder/Dock don't inherit the user's shell PATH.
This caused createNativeAddon() to fall back to MockNativeAddon (silence)
because 'which ffmpeg' failed when /opt/homebrew/bin and /usr/local/bin were missing from PATH.

Prepend common install locations to process.env.PATH at startup.
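The PATH fix can be sketched as a pure helper that puts the common Homebrew locations first (`/opt/homebrew/bin` on Apple Silicon, `/usr/local/bin` on Intel Macs) without duplicating entries. `withExtraPathDirs` is a hypothetical name.

```typescript
// Hypothetical sketch: prepend extra directories to a PATH string, removing
// any existing copies so each directory appears exactly once.
function withExtraPathDirs(currentPath: string | undefined, extraDirs: string[], delimiter = ":"): string {
  const existing = (currentPath ?? "").split(delimiter).filter((dir) => dir !== "");
  const rest = existing.filter((dir) => extraDirs.indexOf(dir) === -1);
  return [...extraDirs, ...rest].join(delimiter);
}
```

At startup, before any `which ffmpeg` probe, the main process could run `process.env.PATH = withExtraPathDirs(process.env.PATH, ["/opt/homebrew/bin", "/usr/local/bin"])` when `process.platform === "darwin"`, so child processes spawned later inherit the augmented PATH.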
- Add NSMicrophoneUsageDescription to Info.plist via extendInfo
- Add entitlements.mac.plist with audio-input entitlement
- Update CI workflow to rename and upload both x64 and arm64 DMGs
- Release notes now show architecture table for downloads
AlleyBo55 merged commit f877180 into master on Apr 23, 2026
4 checks passed