feat: add optional cloud transcription via Groq (free Whisper Large v3) #4

Open
nicremo wants to merge 2 commits into giusmarci:main from nicremo:feat/groq-cloud-transcription

nicremo commented Apr 13, 2026

Summary

Adds cloud-based speech-to-text as an optional alternative to the local Whisper pipeline. The existing fully-local mode continues to work unchanged.

  • Cloud transcription via Groq: Uses whisper-large-v3 for significantly better accuracy than local whisper-base
  • Completely free: Groq offers a generous free tier (7,200 audio seconds per hour)
  • Automatic offline fallback: When internet is unavailable, falls back to local Whisper seamlessly
  • 3 modes: Auto (cloud + fallback), Cloud-only, Local-only (original behavior)
  • Any OpenAI-compatible provider: Works with Groq, OpenAI, Lemonfox, or any compatible API
  • API key security: Encrypted via macOS Keychain (Electron safeStorage), never stored in plaintext
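
The three modes and the offline fallback described above can be pictured roughly as follows. This is only a sketch under stated assumptions: `TranscriptionSource`, `RoutingOptions`, and `routeTranscription` are hypothetical names, and the actual routing in dictation.ts may be structured differently.

```typescript
// Hypothetical names throughout; the PR's real dictation.ts is not shown here.
type TranscriptionSource = "auto" | "cloud" | "local";

// A transcriber takes recorded audio bytes and resolves to the recognized text.
type Transcriber = (audio: Uint8Array) => Promise<string>;

interface RoutingOptions {
  source: TranscriptionSource;
  hasApiKey: boolean;
  cloud: Transcriber; // e.g. a Groq-backed client
  local: Transcriber; // e.g. the existing local Whisper pipeline
}

async function routeTranscription(
  audio: Uint8Array,
  opts: RoutingOptions,
): Promise<string> {
  // Local mode, or Auto without an API key: same behavior as before the PR.
  if (opts.source === "local" || (opts.source === "auto" && !opts.hasApiKey)) {
    return opts.local(audio);
  }
  // Cloud-only mode: errors propagate to the caller, no fallback.
  if (opts.source === "cloud") {
    return opts.cloud(audio);
  }
  // Auto mode with a key: try the cloud first, fall back to local on any failure.
  try {
    return await opts.cloud(audio);
  } catch {
    return opts.local(audio);
  }
}
```

Passing the two transcribers in as functions keeps the routing decision separate from the network and model code, which makes the fallback path easy to exercise without going offline.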

Why Groq?

| Provider | Price/min | Model            | Free tier    |
|----------|-----------|------------------|--------------|
| Groq     | $0.0002   | Whisper Large v3 | 7,200 sec/hr |
| OpenAI   | $0.006    | Whisper v2       | None         |

Groq is 30x cheaper than OpenAI and the free tier covers normal dictation usage entirely.
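
For reference, the arithmetic behind the two figures, using only the numbers from the table above. The variable names are illustrative.

```typescript
// Prices per audio minute, as listed in the comparison table.
const groqPricePerMin = 0.0002;
const openaiPricePerMin = 0.006;

// Rounded because binary floating point cannot represent these decimals exactly.
const priceRatio = Math.round(openaiPricePerMin / groqPricePerMin);

// Free tier: 7,200 audio seconds per hour, converted to minutes.
const freeAudioMinutesPerHour = 7200 / 60;

console.log(priceRatio, freeAudioMinutesPerHour); // prints: 30 120
```

A single real-time dictation stream can produce at most 3,600 seconds of audio per hour, so a quota of 7,200 audio seconds per hour leaves headroom for one user, assuming no other rate limit applies first.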

What changed

  • New files: api-key.ts (encrypted key storage), cloud-transcription.ts (OpenAI-compatible API client)
  • Modified: dictation.ts (transcription routing), App.tsx (TranscriptionCard UI, updated setup wizard), types.ts (new settings fields)
  • No breaking changes: Default mode is "Auto", which tries cloud first and falls back to local. Users without an API key get the exact same local-only experience as before.
  • No new npm dependencies: Uses native fetch, FormData, and Blob (Node 22+ in Electron 41)
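
A dependency-free client of the kind described above might look roughly like this. It is a sketch, not the contents of cloud-transcription.ts: `CloudConfig`, `FetchLike`, and `transcribeInCloud` are hypothetical names, and it assumes the provider exposes the OpenAI-style `POST {baseUrl}/audio/transcriptions` multipart endpoint that returns JSON with a `text` field.

```typescript
// Hypothetical types and names; only fetch, FormData, and Blob are real globals.
interface CloudConfig {
  baseUrl: string;   // e.g. "https://api.groq.com/openai/v1"
  apiKey: string;
  model: string;     // e.g. "whisper-large-v3"
  language?: string; // optional language code such as "de"
}

interface FetchResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Narrow view of fetch so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: FormData },
) => Promise<FetchResponseLike>;

async function transcribeInCloud(
  audio: Blob,
  cfg: CloudConfig,
  fetchImpl: FetchLike = (url, init) => fetch(url, init),
): Promise<string> {
  // Multipart form body as expected by OpenAI-compatible transcription endpoints.
  const form = new FormData();
  form.append("file", audio, "audio.wav");
  form.append("model", cfg.model);
  if (cfg.language) form.append("language", cfg.language);

  // Strip trailing slashes so the configurable base URL joins cleanly.
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/audio/transcriptions`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${cfg.apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription request failed with HTTP ${res.status}`);
  }

  const data = (await res.json()) as { text?: unknown };
  if (typeof data.text !== "string") {
    throw new Error("Unexpected response shape: missing text field");
  }
  return data.text;
}
```

Because only the base URL and model differ between providers, the same function would cover Groq, OpenAI, or another compatible API by changing `cfg`.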

UI additions

  • Transcription card on the Models page: source selector (Auto/Cloud/Local), API key input, cloud model dropdown, language selector
  • Updated setup wizard: New "Transcription Engine" step for choosing cloud vs local
  • Relaxed validation: Ollama is no longer required when enhancement level is set to "No Filter"
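
The relaxed validation rule can be sketched like this. `ReadinessInput` and `missingRequirements` are hypothetical names and the real check in the app may cover more conditions; only the "No Filter" level name comes from this PR.

```typescript
// Enhancement level at which no LLM post-processing runs (name taken from the PR text).
const NO_FILTER = "No Filter";

// Hypothetical shape of the state the validation looks at.
interface ReadinessInput {
  enhancementLevel: string;
  ollamaReachable: boolean;
}

// Returns the list of missing requirements; an empty list means dictation can start.
function missingRequirements(input: ReadinessInput): string[] {
  const missing: string[] = [];
  // Ollama is only needed when the transcript is going to be enhanced.
  const needsOllama = input.enhancementLevel !== NO_FILTER;
  if (needsOllama && !input.ollamaReachable) {
    missing.push("Ollama");
  }
  return missing;
}
```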

Test plan

  • Fresh install: setup wizard shows transcription engine choice
  • Enter Groq API key, verify it saves and encrypts (check settings.json for openaiApiKeyEncrypted)
  • Cloud mode: Fn dictation transcribes via Groq, text is polished by Ollama
  • Disconnect internet: Auto mode falls back to local Whisper
  • Local mode: behaves exactly like the original app
  • No API key: app works in local-only mode without errors
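
For the key-storage check in the test plan, the round trip might look roughly like the sketch below. `SafeStorageLike` mirrors the three Electron `safeStorage` methods it names (`isEncryptionAvailable`, `encryptString`, `decryptString`) so the logic can run without Electron; `saveApiKey`, `loadApiKey`, and the base64 encoding are assumptions, and only the `openaiApiKeyEncrypted` field name comes from this PR.

```typescript
// Subset of Electron's safeStorage API; in the app, Electron's own safeStorage
// object would be passed in for this parameter.
interface SafeStorageLike {
  isEncryptionAvailable(): boolean;
  encryptString(plainText: string): Buffer;
  decryptString(encrypted: Buffer): string;
}

// Only the field relevant here; the real settings.json has more fields.
interface StoredSettings {
  openaiApiKeyEncrypted?: string;
}

function saveApiKey(
  settings: StoredSettings,
  apiKey: string,
  storage: SafeStorageLike,
): StoredSettings {
  // Refuse to store the key rather than silently writing plaintext.
  if (!storage.isEncryptionAvailable()) {
    throw new Error("OS-level encryption is not available");
  }
  // Assumption: the encrypted bytes are stored as base64 so they fit in JSON.
  const encrypted = storage.encryptString(apiKey).toString("base64");
  return { ...settings, openaiApiKeyEncrypted: encrypted };
}

function loadApiKey(
  settings: StoredSettings,
  storage: SafeStorageLike,
): string | null {
  if (!settings.openaiApiKeyEncrypted) return null; // no key saved: local-only
  const bytes = Buffer.from(settings.openaiApiKeyEncrypted, "base64");
  return storage.decryptString(bytes);
}
```

With this shape, the settings file would only ever contain the encrypted value, and a missing field maps naturally to the "no API key" case in the last test-plan item.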

nicremo added 2 commits April 13, 2026 12:24
- Cloud transcription via Groq/OpenAI-compatible APIs (whisper-large-v3)
- Auto/Cloud/Local transcription mode with automatic offline fallback
- API key encrypted via macOS Keychain (Electron safeStorage)
- Default text model changed from gemma4:e4b (9.6GB) to qwen3.5:2b (2.7GB)
- Configurable API base URL (Groq, OpenAI, Lemonfox, any compatible provider)
- Language selector (German default, 11 languages available)
- Stronger same-language prompt to prevent LLM translation
- Built-in microphone preferred over external devices (AirPods fix)
- New TranscriptionCard UI with source selector, API key management
- Setup wizard with cloud/local transcription choice
- Relaxed hotkey validation: Ollama not required when enhancement is off
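
One way the built-in microphone preference from the commit message could work is sketched below. This is a guess at the approach, not the commit's code: `AudioInputLike` and `pickMicrophone` are hypothetical names, and matching on "built-in" in the device label is an assumption about how devices are labeled.

```typescript
// Minimal shape of the entries returned by navigator.mediaDevices.enumerateDevices().
interface AudioInputLike {
  deviceId: string;
  kind: string;  // "audioinput" for microphones
  label: string; // human-readable name reported by the OS
}

// Prefer a built-in microphone; otherwise use the first available input.
function pickMicrophone(devices: AudioInputLike[]): AudioInputLike | null {
  const inputs = devices.filter((d) => d.kind === "audioinput");
  // Assumption: built-in devices carry "built-in" somewhere in their label.
  const builtIn = inputs.find((d) => /built-in/i.test(d.label));
  return builtIn ?? inputs[0] ?? null;
}
```

The returned `deviceId` could then be passed as an audio constraint to `getUserMedia`, so that connected Bluetooth headphones do not become the recording source by default.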
@cekimilf

Amazing! But can a version of this fork be made for Windows as well?


nicremo commented Apr 13, 2026

@cekimilf I will try to cook something together. Check out https://github.com/nicremo/openwhisp-enhanced

@cekimilf

> @cekimilf I will try to cook something together. Check out https://github.com/nicremo/openwhisp-enhanced

Amazing, thank you! I will check it out once it's ready! I already downloaded your version, and it looks really good. I like how you can play around with the settings, choose whichever model and approach you prefer, and change your keybinding. (I like Ctrl + Shift + Space, as in the default program.)

I would love to use this for English and Serbian (and German, which I am slowly learning).


nicremo commented Apr 13, 2026

@cekimilf can you hit me up via mail? kontakt@bitzer-fabian.de
I created a Windows version but need someone to test it on Windows.
