Replace BlackHole with ScreenCaptureKit for audio capture #2

Open
morellid wants to merge 20 commits into ajayrmk:main from morellid:feature/ScreenCaptureKit

Conversation

@morellid

Summary

  • Rewrite audio recording to use ScreenCaptureKit (via a Swift helper) for capturing meeting app audio directly, replacing the BlackHole virtual audio device dependency. Mic audio is captured in parallel via sounddevice and mixed on stop.
  • Improve meeting detection: gate auto-record on meeting app presence (avoids recording YouTube/Spotify), add Firefox tab detection, fix Google Meet landing page false positives, and exclude Firefox from stop detection (unreliable window titles).
  • Reliability improvements: BLE device-switch resilience for mic recording, stream audio to temp files instead of memory buffering, allow new recordings to start while previous transcription runs, and auto-enrich transcripts via Claude API.

Key changes

  • swift/sck-capture/ — New Swift helper binary for ScreenCaptureKit audio capture
  • trnscrb/sck.py — Python subprocess wrapper for the Swift helper
  • trnscrb/recorder.py — Rewritten with dual-source SCK+mic capture
  • trnscrb/screen_capture.py — Screen Recording permission check via CoreGraphics
  • trnscrb/watcher.py — Enhanced meeting detection with app gating, Firefox support, grace period fixes
  • trnscrb/menu_bar.py — Updated for new recorder API and auto-enrichment
  • trnscrb/settings.py — Added auto_enrich setting

Why ScreenCaptureKit over BlackHole

BlackHole requires installing a virtual audio driver, which is fragile across macOS updates and requires user configuration. ScreenCaptureKit is a first-party Apple API (audio capture requires macOS 13+) that captures app audio directly: no extra install, no audio routing changes, and it captures the specific meeting app rather than all system audio.

morellid added 20 commits March 30, 2026 08:12
ScreenCaptureKit helper that captures audio from a specific app by
bundle ID and writes raw float32 PCM (16kHz mono) to stdout.

- sck-capture <bundle-id>: capture audio, pipe PCM to stdout
- sck-capture --check: exit 0 if Screen Recording permission granted
- Status/errors to stderr, READY signal when capture starts
- Handles SIGTERM/SIGINT for clean shutdown
- Requires macOS 13+, built with: cd swift/sck-capture && swift build

ctypes bindings to CGPreflightScreenCaptureAccess / CGRequestScreenCaptureAccess
from CoreGraphics. No PyObjC dependency needed.
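
For reviewers unfamiliar with the ctypes approach, here is a minimal sketch of what such a permission check could look like. The function name `screen_capture_allowed` is hypothetical and is not taken from trnscrb/screen_capture.py; only `CGPreflightScreenCaptureAccess` itself comes from CoreGraphics. The sketch returns `None` when CoreGraphics cannot be loaded (for example on a non-macOS host).

```python
import ctypes
import ctypes.util
import sys


def screen_capture_allowed():
    """Return True/False on macOS, or None if CoreGraphics is unavailable.

    Hypothetical helper for illustration, not the PR's actual function.
    """
    if sys.platform != "darwin":
        return None
    path = ctypes.util.find_library("CoreGraphics")
    if path is None:
        return None
    try:
        cg = ctypes.cdll.LoadLibrary(path)
        preflight = cg.CGPreflightScreenCaptureAccess
    except (OSError, AttributeError):
        # Library failed to load, or the symbol is missing on an old macOS.
        return None
    # The C function takes no arguments and returns a C bool.
    preflight.argtypes = []
    preflight.restype = ctypes.c_bool
    return bool(preflight())
```

The preflight call only reports the current state; prompting the user is what `CGRequestScreenCaptureAccess` is for.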
Manages the sck-capture Swift helper subprocess. Reads raw float32 PCM
from stdout in 1024-sample chunks, collects frames under a lock — same
format as sounddevice callbacks for easy mixing.
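
The read loop described above can be sketched as follows. This is an illustration under assumptions, not the code in trnscrb/sck.py: the class name `PcmCollector` and its methods are hypothetical, and it uses the stdlib `array` module where the real wrapper may well use NumPy. It reads from any binary stream, so a `subprocess.Popen(...).stdout` pipe or an in-memory buffer both work.

```python
import io
import threading
from array import array

CHUNK_SAMPLES = 1024   # chunk size from the commit message
BYTES_PER_SAMPLE = 4   # float32


class PcmCollector:
    """Collects float32 PCM chunks from a binary stream under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frames = []

    def pump(self, stream):
        """Read fixed-size chunks until EOF; intended to run in a reader thread."""
        chunk_bytes = CHUNK_SAMPLES * BYTES_PER_SAMPLE
        while True:
            data = stream.read(chunk_bytes)
            if not data:
                break
            # Drop a trailing partial sample if the stream ended mid-float.
            usable = len(data) - (len(data) % BYTES_PER_SAMPLE)
            samples = array("f")
            samples.frombytes(data[:usable])
            with self._lock:
                self._frames.append(samples)

    def drain(self):
        """Hand over all collected chunks and reset the buffer."""
        with self._lock:
            frames, self._frames = self._frames, []
        return frames
```

Usage with an in-memory stream standing in for the helper's stdout: `c = PcmCollector(); c.pump(io.BytesIO(raw_bytes)); frames = c.drain()`.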
When app_bundle_id is provided, captures two streams in parallel:
- ScreenCaptureKit for meeting app audio (device-independent)
- sounddevice for mic (user's voice)

Streams are mixed on stop() with clipping. Falls back to mic-only
when no bundle ID, no SCK binary, or no permission.

Removes BlackHole detection (find_blackhole_device, list_input_devices).
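
A rough sketch of the mix-and-fallback logic, assuming both streams are sequences of float samples in [-1.0, 1.0] at the same sample rate. The names `mix_with_clipping` and `use_sck` are hypothetical; the real recorder may pad, align, or clip differently.

```python
def use_sck(app_bundle_id, sck_binary, has_permission):
    """Dual-source capture only when all three preconditions hold;
    otherwise the recorder falls back to mic-only."""
    return bool(app_bundle_id) and sck_binary is not None and bool(has_permission)


def mix_with_clipping(app, mic):
    """Sum two sample sequences, padding the shorter with silence,
    and clamp each mixed sample to [-1.0, 1.0]."""
    n = max(len(app), len(mic))
    mixed = []
    for i in range(n):
        a = app[i] if i < len(app) else 0.0
        m = mic[i] if i < len(mic) else 0.0
        mixed.append(max(-1.0, min(1.0, a + m)))
    return mixed
```

Hard clipping is the simplest way to keep the sum in range; it distorts loud overlaps, which is presumably acceptable for speech transcription.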
If sounddevice reports a device error (e.g. Bluetooth earbuds
disconnect mid-meeting), auto-restart the mic stream with the
new default input device. Brief ~1s gap in mic audio, no crash.
SCK capture is unaffected by device changes.

detect_meeting() now returns (name, bundle_id) tuple. Native apps
map to static bundle IDs. Browser-based meetings return the bundle
ID of whichever browser (Chrome, Safari) the meeting tab was found in.

on_start callback signature updated to (meeting_name, bundle_id).
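
To make the new return shape concrete, here is a small sketch. Everything in it is illustrative: the function `detect_meeting_sketch`, its two parameters, and the lookup tables are hypothetical, and the bundle IDs listed are commonly seen values that should be checked against the actual mapping in trnscrb/watcher.py.

```python
# Illustrative tables only; not copied from the PR.
NATIVE_APP_BUNDLE_IDS = {
    "Zoom": "us.zoom.xos",
    "Microsoft Teams": "com.microsoft.teams2",
}
BROWSER_BUNDLE_IDS = {
    "Chrome": "com.google.Chrome",
    "Safari": "com.apple.Safari",
}


def detect_meeting_sketch(running_apps, browser_meeting_tabs):
    """Return (name, bundle_id) for the first meeting found, else None.

    running_apps: set of running app names (assumed input).
    browser_meeting_tabs: dict of browser name -> meeting name (assumed input).
    """
    for app, bundle_id in NATIVE_APP_BUNDLE_IDS.items():
        if app in running_apps:
            return (app, bundle_id)
    for browser, meeting_name in browser_meeting_tabs.items():
        if browser in BROWSER_BUNDLE_IDS:
            # Browser meetings report the browser's own bundle ID.
            return (meeting_name, BROWSER_BUNDLE_IDS[browser])
    return None


def on_start(meeting_name, bundle_id):
    """Callback with the updated two-argument signature."""
    return f"recording {meeting_name} via {bundle_id}"
```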
- menu_bar: pass bundle_id from watcher, use audio_source_description
- mcp_server: use Recorder() without device arg
- cli watch: pass bundle_id from watcher on_start callback

Removes all BlackHole references from consumer sites.

- Replace BlackHole install step with Xcode CLI tools check and SCK
  binary build (swift build -c release, copies to ~/.local/share/trnscrb/)
- Add Screen Recording permission check to install wizard
- Add helper functions: _xcode_cli_installed, _sck_binary_built, _build_sck_helper
- Remove _blackhole_installed helper
- Fix devices command to use sounddevice directly (Recorder.list_input_devices removed)
- Fix mic-status command for detect_meeting() returning (name, bundle_id) tuple
- Replace pip install with uv add in package install step

- Replace BlackHole references with ScreenCaptureKit audio capture
- Document dual-source recording (SCK + mic), BLE device resilience
- Add Xcode CLI tools to requirements
- Update install guide to reflect new setup steps
- Add sck.py and screen_capture.py to CLAUDE.md architecture

Normalize mic RMS to match SCK RMS before mixing so Whisper hears
both the user's voice and remote participants at similar levels.
Gain is capped at 5x to avoid amplifying background noise.
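
The level matching can be sketched as below. The helper names `rms` and `mic_gain` are hypothetical; the 5x cap is from the commit message, while returning a gain of 1.0 for a silent stream is an assumption made here to avoid dividing by zero.

```python
import math

MAX_GAIN = 5.0  # cap from the commit message


def rms(samples):
    """Root-mean-square level of a sample sequence (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mic_gain(mic, app, max_gain=MAX_GAIN):
    """Gain that scales the mic RMS to the app-audio RMS, capped at max_gain."""
    mic_rms = rms(mic)
    app_rms = rms(app)
    if mic_rms == 0.0 or app_rms == 0.0:
        # Assumption: leave the mic untouched when either stream is silent.
        return 1.0
    return min(app_rms / mic_rms, max_gain)
```

The cap matters when the user is mostly silent: without it, a near-zero mic RMS would produce a huge gain and amplify room noise.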
Only transition from warming to recording when a meeting app or
browser meeting tab is actually detected. Prevents false triggers
from YouTube, Spotify, or other non-meeting mic usage.
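
One way to picture the gating is as a small state transition function. This is a sketch only: "warming" and "recording" come from the commit message, while the "idle" state, the function name `next_state`, and its parameters are assumptions.

```python
def next_state(state, mic_active, meeting):
    """Advance the watcher state.

    meeting is a (name, bundle_id) tuple when a meeting app or browser
    meeting tab is detected, otherwise None.
    """
    if state == "idle" and mic_active:
        return "warming"
    if state == "warming":
        if not mic_active:
            return "idle"
        if meeting is not None:
            # Mic use alone is not enough; a meeting must be detected.
            return "recording"
    return state
```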
After leaving a Google Meet call, Chrome navigates to
meet.google.com/landing — the browser tab script was still matching
this as an active meeting, preventing auto-stop. Exclude /landing
and root Meet URLs in both Chrome and Safari tab checks.
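
The URL rule can be expressed as a small predicate. The actual check lives in the browser tab scripts, so this Python version is only an illustration of the rule, and `is_active_meet_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def is_active_meet_url(url):
    """True for a Meet URL that looks like a call, False for the
    root page and the post-call /landing page."""
    parts = urlparse(url)
    if parts.hostname != "meet.google.com":
        return False
    path = parts.path.rstrip("/")
    if path == "" or path == "/landing" or path.startswith("/landing/"):
        return False
    return True
```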
- Add Firefox window-title based detection for Meet, Teams, Zoom
- Only match "Meet – <code>" pattern (active call), not bare
  "Google Meet" title (landing page after leaving)
- Firefox bundle ID (org.mozilla.firefox) passed to SCK for capture
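
A sketch of the title match for Google Meet in Firefox. The pattern assumes meeting codes of the form abc-defg-hij, which is an assumption about Meet's code format and not something stated in the PR; `is_active_meet_title` is a hypothetical name.

```python
import re

# "Meet", an en dash (or plain hyphen), then a code such as abc-defg-hij.
ACTIVE_MEET_TITLE = re.compile(r"\bMeet [\u2013-] [a-z]{3}-[a-z]{4}-[a-z]{3}\b")


def is_active_meet_title(window_title):
    """True when the window title looks like an active Meet call."""
    return ACTIVE_MEET_TITLE.search(window_title) is not None
```

A bare "Google Meet" title has no code after the dash, so it does not match.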
- Copy sck-capture Swift source into trnscrb/sck-capture/ so it ships
  with pip/uv installs (setuptools package-data)
- Update _build_sck_helper() to find bundled source first, fall back
  to repo root swift/ for development
- Update sck.py find_binary() to check bundled build dir
- Fix package install step to use pip (uv add won't work for end users)
- Add .build/ to .gitignore for Swift build artifacts

Both mic and SCK audio are now written to temp files during recording
instead of accumulating in Python lists. This keeps memory usage
constant regardless of meeting length (~4 KB vs ~460 MB for 1h).

On stop, temp files are read back for mixing and transcription.
Files use /tmp with trnscrb_ prefix and are cleaned up after use.
On crash, the OS eventually purges them.
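
The figures above follow from the stream format: one 1024-sample float32 chunk is 4 KB, and an hour of 16 kHz mono float32 is about 230 MB per stream, so about 460 MB for mic plus SCK. Below is a minimal sketch of the temp-file approach; the class `TempPcmWriter` and the `.pcm` suffix are hypothetical, while the `/tmp` location and `trnscrb_` prefix are from the commit message.

```python
import os
import tempfile
from array import array

SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 4   # float32
CHUNK_SAMPLES = 1024


class TempPcmWriter:
    """Streams float32 samples to a temp file instead of a Python list."""

    def __init__(self):
        self._file = tempfile.NamedTemporaryFile(
            prefix="trnscrb_", suffix=".pcm", dir="/tmp", delete=False
        )
        self.path = self._file.name

    def write(self, samples):
        """Append one chunk; only this chunk is held in memory."""
        self._file.write(array("f", samples).tobytes())

    def finish(self):
        """Close the file, read all samples back, and delete it."""
        self._file.close()
        out = array("f")
        with open(self.path, "rb") as fh:
            out.frombytes(fh.read())
        os.remove(self.path)
        return out


# Per-chunk memory versus one hour of two buffered streams.
chunk_bytes = CHUNK_SAMPLES * BYTES_PER_SAMPLE
two_streams_one_hour = 2 * SAMPLE_RATE * BYTES_PER_SAMPLE * 3600
```

Note that `finish()` still loads the whole recording for mixing, so the constant-memory property applies to the recording phase, not to stop.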
Firefox window titles don't change reliably after leaving a call
("Meet – code" stays even after leaving), which prevents auto-stop.
Split browser scripts into broad (all browsers, for start) and
narrow (Chrome/Safari only, for stop). Firefox meetings now stop
via mic-idle detection instead.

Back-to-back meetings: if Meeting A is still transcribing when
Meeting B starts, the recorder was blocked. Now only blocks if
already recording (not transcribing), since transcription runs
in a separate thread and doesn't need the recorder.
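
The new blocking rule can be sketched like this. `RecorderGate` and its methods are hypothetical names, not the menu_bar or recorder API; the point is only that `try_start` consults the recording flag and ignores in-flight transcriptions.

```python
import threading


class RecorderGate:
    """Blocks a new recording only while another recording is active."""

    def __init__(self):
        self._lock = threading.Lock()
        self.recording = False
        self.transcribing = 0  # number of transcriptions in flight

    def try_start(self):
        """Start recording unless one is already running."""
        with self._lock:
            if self.recording:
                return False
            self.recording = True
            return True

    def stop_and_transcribe(self, transcribe):
        """Free the recorder immediately, then transcribe in a thread."""
        with self._lock:
            self.recording = False
            self.transcribing += 1

        def run():
            try:
                transcribe()
            finally:
                with self._lock:
                    self.transcribing -= 1

        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        return worker
```

With this shape, Meeting B's `try_start()` succeeds as soon as Meeting A has stopped recording, even if A's transcription thread is still running.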