Your meetings, your machine, no exceptions.
A macOS menu bar app that records, transcribes, and summarizes your meetings — entirely on your device. Robota uses Apple's on-device SpeechAnalyzer for transcription and Apple Intelligence for summaries, so your words never leave your Mac. It sits quietly in your menu bar, detects when a call starts, captures both your mic and the room, and hands you structured notes the moment you hang up. No subscriptions, no data brokers, no surprises.
- macOS 26.0+ (Tahoe) — Apple Silicon or Intel
- Apple Intelligence — for summaries, chat, and translation. Zero setup on eligible Macs — the on-device model is managed by the system.
- Ollama (optional) — power-user alternative for longer transcripts. Install Ollama, then:

  ```sh
  ollama pull llama3.2
  ```

- No external model downloads needed for transcription — the system manages speech models automatically.
Coming soon — check Releases for the latest build.
```sh
git clone https://github.com/worldtiki/robota.git
cd robota
swift build -c release
```

To create a bundled .app:

```sh
./bundle.sh
```

- Launch Robota — a lime 🍋🟩 appears in your menu bar
- Grant Screen & System Audio Recording when prompted
- Grant Microphone access
- Grant Speech Recognition access
- Click the icon → Record — or wait for automatic meeting detection to prompt you
- Your conversations stay yours. Cloud-based meeting tools process your audio on remote servers — often to train models, serve ads, or comply with data requests. Robota runs everything locally. There is no server to breach, no retention policy to read, and no account to delete.
- No subscription tax. Pay nothing per month, per seat, or per hour of audio. The only "cost" is the disk space for a few audio files that get deleted automatically after transcription.
- Works without internet. Airplane mode? Corporate firewall? VPN that blocks third-party APIs? Robota does not care. Transcription and summarization are fully offline.
- Separate streams, better notes. Rather than mixing your voice with the call audio into one recording, Robota captures them as independent files. This means cleaner speaker separation — `[You]` and `[They]` segments — without sending audio to a diarization API.
- Plays well with your tools. Export polished Markdown notes directly to your Obsidian vault. Chat with your transcript using Apple Intelligence or any Ollama-compatible model. Configure everything with a plain JSON file. It fits into your workflow instead of replacing it.
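Capturing mic and system audio as separate files turns speaker attribution into a simple timestamp merge rather than a diarization problem. A minimal sketch of that idea, with illustrative types and names (not Robota's internals):

```swift
import Foundation

// Each transcribed segment already knows its speaker, because it came
// from a speaker-specific audio file (mic vs. system).
struct Segment {
    let speaker: String     // "[You]" (mic) or "[They]" (system audio)
    let start: TimeInterval // seconds from session start
    let text: String
}

/// Interleave the two per-source transcripts into one chronological list.
func mergeTranscripts(mic: [Segment], system: [Segment]) -> [Segment] {
    (mic + system).sorted { $0.start < $1.start }
}

let mic = [Segment(speaker: "[You]", start: 0.0, text: "Hi all."),
           Segment(speaker: "[You]", start: 9.2, text: "Works for me.")]
let system = [Segment(speaker: "[They]", start: 3.5, text: "Can we ship Friday?")]
let merged = mergeTranscripts(mic: mic, system: system)
// merged reads [You], [They], [You] in call order, with no speaker model involved.
```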
No dock icon. No onboarding wizard. A small lime in your menu bar that wakes up when you need it and disappears when you don't. The floating control pill anchors to the right edge of your screen — never far, never in the way.
Robota watches your audio devices in the background. When a headset or external mic appears — the classic signal that a call is starting — a quiet alert slides in with a one-click prompt to start recording. If you ignore it, it dismisses itself after ten seconds.
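The detection heuristic can be pictured as a diff between snapshots of the input-device list. A hedged sketch, in which the function name is invented for illustration; in a real app the snapshots would come from an audio-device change listener rather than hard-coded sets:

```swift
import Foundation

/// A device present now but not in the previous snapshot is treated as a
/// "call may be starting" signal (e.g. AirPods or a USB headset connecting).
func newlyAppearedDevices(previous: Set<String>, current: Set<String>) -> Set<String> {
    current.subtracting(previous)
}

let before: Set = ["MacBook Pro Microphone"]
let after: Set = ["MacBook Pro Microphone", "AirPods Pro"]
let appeared = newlyAppearedDevices(previous: before, current: after)
// appeared == ["AirPods Pro"] → show the one-click "start recording" prompt
```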
Your voice and the call audio are captured separately via ScreenCaptureKit, producing two clean files: mic.caf for what you said and system.caf for what everyone else said. This gives you accurate speaker attribution without the guesswork. Works transparently with Zoom, Teams, Google Meet, and any other conferencing app — ScreenCaptureKit captures mic audio passively without interfering with active calls.
Pause recording mid-call without stopping the session. The timer pauses, audio buffers are skipped, and the menu bar shows ⏸. Resume when you're ready — the elapsed time accounts for the gap.
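"The elapsed time accounts for the gap" means paused intervals are excluded from the displayed timer. One way to model that, as an illustrative sketch rather than the app's actual timer code:

```swift
import Foundation

/// Tracks recording time as a list of (resume, pause) intervals, so paused
/// gaps never count toward the elapsed total.
struct SessionClock {
    private var segments: [(start: Date, end: Date?)] = []

    mutating func resume(at date: Date) { segments.append((start: date, end: nil)) }
    mutating func pause(at date: Date) {
        if let last = segments.indices.last, segments[last].end == nil {
            segments[last].end = date
        }
    }
    /// Total recorded time, excluding paused gaps.
    func elapsed(now: Date) -> TimeInterval {
        segments.reduce(0) { $0 + ($1.end ?? now).timeIntervalSince($1.start) }
    }
}

let t0 = Date(timeIntervalSince1970: 0)
var clock = SessionClock()
clock.resume(at: t0)                          // record for 60 s
clock.pause(at: t0.addingTimeInterval(60))    // pause for 30 s
clock.resume(at: t0.addingTimeInterval(90))   // record for 10 s more
let total = clock.elapsed(now: t0.addingTimeInterval(100))
// total == 70: the 30 s pause is excluded
```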
Hardware-level acoustic echo cancellation via AVAudioEngine — the same technology FaceTime uses. It monitors system audio output as a reference signal and subtracts it from the mic input, so remote participants' voices don't appear as false [You] segments. Enabled by default, toggleable from the control pill.
Mark key moments during a call with a tap of the star button. Bookmarks appear as yellow markers in the review transcript and are included in Obsidian exports. Add optional labels to remember why a moment mattered.
Watch your conversation transcribe in real time in the floating widget. [You] segments appear in teal, [They] segments in amber. Non-final words appear dimmed until the recognizer confirms them. All powered by Apple SpeechAnalyzer — no network hop, no latency, no monthly API bill, no model to download.
Working across languages? Enable the globe toggle and each finalized transcript segment gets translated to your target language via Ollama. The original text stays visible; the translation appears below. Toggle off anytime without interrupting the recording.
Hit "Summarize" and get a structured breakdown — Key Decisions, Action Items, Discussion Points, and Next Steps — streamed live in the widget.
Apple Intelligence (default) uses the on-device ~3B model via FoundationModels. Zero setup on eligible hardware. Best for typical-length meetings.
Ollama is the power-user alternative with 16K token context for longer transcripts. Use any model you already have pulled — llama3.2 by default, but yours to configure.
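Talking to a local Ollama instance is a plain HTTP POST to localhost. This sketch follows Ollama's public `/api/generate` endpoint; the struct and function names are illustrative, not Robota's code:

```swift
import Foundation

// Request body for Ollama's /api/generate endpoint.
struct OllamaRequest: Encodable {
    let model: String
    let prompt: String
    let stream: Bool
}

/// Build a summarization request against the local Ollama server.
/// Nothing leaves the machine: the request targets localhost:11434.
func makeSummarizeRequest(transcript: String, model: String = "llama3.2") throws -> URLRequest {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        OllamaRequest(model: model,
                      prompt: "Summarize this meeting transcript:\n\(transcript)",
                      stream: true)) // stream tokens for live display in the widget
    return request
}
```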
Different meetings need different formats. Choose from 6 built-in recipes — General, Sales Call, 1:1, Sprint Planning, Standup, and SWE Interview — or create your own custom recipes in Settings with a name, icon, and system prompt. The recipe picker appears in the review toolbar next to the Summarize button.
Type a question — "What did we decide about the timeline?" — and get an answer with context from the full transcript. Conversation history carries over, so follow-up questions work naturally. Available during live recording and after transcription.
Search across all your exported meeting notes from within the floating widget. Full-text search returns results with dates and context snippets, sorted by most recent. Click a result to read the full note with rendered markdown.
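Context snippets can be produced with straightforward string windowing. A pure-logic sketch of the idea (not Robota's search index):

```swift
import Foundation

/// Case-insensitive search returning a short window of text around the
/// first match, or nil when the query does not appear in the note.
func snippet(in note: String, matching query: String, radius: Int = 20) -> String? {
    guard let range = note.range(of: query, options: .caseInsensitive) else { return nil }
    // Clamp the window to the note's bounds.
    let lower = note.index(range.lowerBound, offsetBy: -radius, limitedBy: note.startIndex) ?? note.startIndex
    let upper = note.index(range.upperBound, offsetBy: radius, limitedBy: note.endIndex) ?? note.endIndex
    return String(note[lower..<upper])
}

let note = "We agreed the launch timeline moves to March, pending QA sign-off."
let hit = snippet(in: note, matching: "TIMELINE") // matches despite the case difference
```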
One click sends your transcript and summary to your Obsidian vault as a formatted Markdown note with YAML frontmatter — date, duration, language, tags, bookmarks, all included. Auto-save after transcription is available for zero friction.
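The exported note shape can be sketched as a template: YAML frontmatter followed by the Markdown body. Field names below mirror the description above, but the exact frontmatter Robota writes may differ:

```swift
import Foundation

/// Assemble an Obsidian-ready Markdown note with YAML frontmatter.
func obsidianNote(title: String, date: String, duration: String,
                  language: String, tags: [String], body: String) -> String {
    """
    ---
    date: \(date)
    duration: \(duration)
    language: \(language)
    tags: [\(tags.joined(separator: ", "))]
    ---

    # \(title)

    \(body)
    """
}

let note = obsidianNote(title: "Sprint Planning", date: "2025-01-15",
                        duration: "42m", language: "en",
                        tags: ["meeting", "robota"],
                        body: "## Summary\n- Shipped scope agreed.")
```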
Robota never connects to the internet. Audio is written to ~/Documents/Robota/ during recording and deleted after transcription. Transcripts are kept in memory only — they are not saved to disk. LLM processing happens on-device via Apple Intelligence or locally via Ollama on localhost:11434. There are no analytics, no telemetry, and no accounts.
Meeting audio is some of the most sensitive data you generate — business strategy, personnel discussions, client details, candid off-the-record moments. Robota starts from the position that this data belongs on your machine and nowhere else, not as a premium tier or a compliance checkbox, but as the only option.
All settings live in ~/Library/Application Support/com.worldtiki.robota/settings.json. The file is optional — all values have sensible defaults. Settings auto-save on every change.
```json
{
  "summarization_provider": "apple",
  "ollama_model": "llama3.2",
  "language": null,
  "live_transcript": false,
  "meeting_detection_enabled": true,
  "echo_suppression_enabled": true,
  "start_muted": false,
  "notifications_enabled": true,
  "obsidian_vault_path": "/path/to/vault",
  "obsidian_folder": "Meetings/Robota",
  "obsidian_auto_save": false,
  "obsidian_open_after_save": true,
  "translate_to_language": null,
  "custom_recipes": [],
  "last_recipe_id": "general"
}
```

| Key | Default | Description |
|---|---|---|
| `summarization_provider` | `"apple"` | LLM provider: `"apple"` (Apple Intelligence, on-device) or `"ollama"` (local Ollama) |
| `ollama_model` | `"llama3.2"` | Ollama model for summarization and chat (only when provider is `"ollama"`) |
| `language` | system locale | Transcription language as a locale identifier (e.g. `"en"`, `"pt-BR"`) |
| `live_transcript` | `false` | Auto-expand transcript panel when recording starts |
| `meeting_detection_enabled` | `true` | Auto-detect meetings via audio device monitoring |
| `echo_suppression_enabled` | `true` | Hardware echo cancellation for cleaner transcription |
| `start_muted` | `false` | Start recording with microphone muted |
| `notifications_enabled` | `true` | macOS notifications for transcription events |
| `obsidian_vault_path` | `null` | Path to Obsidian vault for note export |
| `obsidian_folder` | `"Meetings/Robota"` | Subfolder within vault for meeting notes |
| `obsidian_auto_save` | `false` | Auto-export to Obsidian after transcription |
| `obsidian_open_after_save` | `true` | Open the note in Obsidian after saving |
| `translate_to_language` | `null` | Target language for live translation (e.g. `"English"`, `"Portuguese"`) |
| `custom_recipes` | `[]` | User-defined summary recipes |
| `last_recipe_id` | `"general"` | Last-used summary recipe, persisted across sessions |
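"All values have sensible defaults" implies every key in the file is optional. One way to implement that with Codable, shown for three of the keys above; the struct name and approach are a sketch, not necessarily how the app does it:

```swift
import Foundation

// Each key falls back to its documented default when missing from the file.
struct RobotaSettings: Decodable {
    var summarizationProvider = "apple"
    var ollamaModel = "llama3.2"
    var obsidianAutoSave = false

    enum CodingKeys: String, CodingKey {
        case summarizationProvider = "summarization_provider"
        case ollamaModel = "ollama_model"
        case obsidianAutoSave = "obsidian_auto_save"
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        summarizationProvider = try c.decodeIfPresent(String.self, forKey: .summarizationProvider) ?? "apple"
        ollamaModel = try c.decodeIfPresent(String.self, forKey: .ollamaModel) ?? "llama3.2"
        obsidianAutoSave = try c.decodeIfPresent(Bool.self, forKey: .obsidianAutoSave) ?? false
    }
}

// A partial settings file still decodes; unspecified keys take their defaults.
let json = Data(#"{"summarization_provider": "ollama"}"#.utf8)
let settings = try! JSONDecoder().decode(RobotaSettings.self, from: json)
// settings.summarizationProvider == "ollama"; ollamaModel falls back to "llama3.2"
```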
```
ScreenCaptureKit (single SCStream)
 ├── .audio      → system.caf → SpeechAnalyzer (live + batch)
 └── .microphone → mic.caf    → SpeechAnalyzer (live + batch)
        ↕ AVAudioEngine AEC (echo cancellation)

After recording stops:

  SpeechAnalyzer batch → transcript (in-memory)
  Audio files          → deleted
  Transcript           → Apple Intelligence → structured summary (default)
                       → Ollama             → structured summary (alternative)
                       → ObsidianExporter   → vault
                       → chat (ask questions)
```
Both audio sources flow through one SCStream — system audio as .audio output, mic as .microphone output (captureMicrophone = true). ScreenCaptureKit captures mic audio passively at the system mixer level, so device switches during calls are handled transparently by the OS without interfering with Zoom, Teams, or other apps.
Contributions are welcome! See CONTRIBUTING.md for development setup, code conventions, and how to submit changes.



