Skip to content

jcddc83/simple-transcriber

Repository files navigation

Simple Transcriber for Podcasts & Videos

A small desktop app that turns a podcast or video URL into a speaker-labeled, audio-synced HTML transcript you can read, edit, and quote from. Meant to replace the functionality, ease of use, speed and accuracy of Scroll.ai's transcription capabilities.

Transcript view

Paste a YouTube link or a podcast URL, get back a transcript with:

  • Per-paragraph play buttons synced to the audio
  • Word-level highlighting that follows playback
  • Editable speaker labels and title
  • Inline text editing for fixing mis-transcribed names
  • Paragraph bookmarks collected at the top as "Saved quotes"
  • A browsable library of every transcript you've made

Built as a personal replacement for Scroll, which shut down in 2026 and had offered free access for journalists and freelancers. Uses Groq's whisper-large-v3 for transcription and AssemblyAI for speaker diarization.

If you're looking for a more robust but still easy-to-install version and/or a Docker setup, try Easy Transcriber from ReadTedium.


Install (Windows)

  1. Download SimpleTranscriber-Setup.exe from the Releases page.
  2. Run it. The installer handles WebView2 (if not already on your system), creates a Start menu and desktop shortcut, and launches the app.
  3. On first launch, paste your two API keys (see below). They're saved locally to config.json next to the app — nothing leaves your machine except the audio you submit for transcription.

That's it. After setup, paste a URL and click Transcribe.

Getting API keys

Both are free, take under a minute, and have generous free tiers.

Service Sign up Notes
Groq https://console.groq.com → API Keys Used for transcription. ~$0.11/hr of audio.
AssemblyAI https://www.assemblyai.com → Dashboard Used for speaker diarization. $50 free credit covers months of casual use.

A typical 45-minute interview costs well under $0.50 total.


Features

  • Per-paragraph playback — click ▶ next to any paragraph to jump there
  • Word-level highlighting — the active word lights up as audio plays
  • Auto-scroll with "Jump to now" pill — follows playback; pauses scrolling when you scroll up to re-read, with a one-click way back to the live position
  • Editable speakers — click a label to rename ("Speaker A" → "Jane Smith"); choose to rename just one occurrence or all matching labels
  • Editable text and title — click any paragraph or the title to fix typos
  • Bookmarks — star paragraphs worth quoting; they appear in a "Saved quotes" section at the top with jump links
  • Hints field — paste proper nouns ("John Doe, ACME Co., NASA") before transcribing to help Whisper spell them correctly
  • URL queue — paste multiple URLs (one per line) to process in sequence. Press Shift+Enter to start a new line.
  • Library page — every transcript, searchable by title, date, or transcript content, sorted by when you transcribed it
  • Copy and export — copy a single quote (pre-formatted with attribution and timestamp), copy the full transcript, or export as .txt / .md

Keyboard shortcuts (in a transcript)

Key Action
Space Play / pause
Seek back 10 seconds
Seek forward 10 seconds

Shortcuts are disabled while editing text so they don't interfere with typing.


Where files live

  • Transcripts: Desktop\Transcripts\<title>\ — each gets its own folder with the HTML transcript, MP3 audio, and a small .meta.json sidecar
  • Library index: Desktop\Transcripts\index.html — regenerated after every run
  • Config: config.json next to the installed .exe — your API keys and window geometry

You can move the Transcripts folder if you want — the app reads/writes to %USERPROFILE%\Desktop\Transcripts.


Privacy

Audio you transcribe is uploaded to Groq (transcription) and AssemblyAI (diarization). Both have published privacy policies; neither is used for training by default. Everything else — your API keys, the transcripts themselves, the library — stays on your machine.

Limits

  • Groq free tier caps uploads at 25MB. The app encodes audio as mono MP3 at 32kbps (~14MB/hour), so typical interviews up to ~90 min fit comfortably. Longer files are auto-chunked and stitched back together.
  • Transcription quality is high (Whisper large-v3, ~95–98% on clear audio). Speaker labels are good but not perfect — expect occasional misattributions at speaker transitions, especially with more than two speakers. Suitable as a readable working transcript, not a substitute for human review before publication.

Install from source (instead of the installer)

If you'd rather run the Python directly:

git clone git clone https://github.com/jcddc83/simple-transcriber.git
cd simple-transcriber
pip install -r requirements.txt
python transcribe.py

Requires Python 3.11+ and FFmpeg on your PATH (or ffmpeg.exe next to the script). On Windows, the easiest FFmpeg install is via gyan.dev — download "release essentials", unzip, and drop ffmpeg.exe next to transcribe.py.

Building the installer

build.bat
iscc installer.iss

build.bat downloads FFmpeg and the WebView2 bootstrapper automatically if they aren't already present, then packages everything into dist\SimpleTranscriber.exe via PyInstaller. iscc installer.iss (from Inno Setup) wraps it into Output\SimpleTranscriber-Setup.exe.

macOS

A build.sh script exists for producing a Mac .app bundle, but it hasn't been tested end-to-end yet. The Python code itself is cross-platform; pywebview uses WKWebView on macOS (built in, no extra dependency).


Troubleshooting

  • "Windows protected your PC" on first launch — because the app isn't code-signed, Windows SmartScreen shows an "unrecognized app" warning the first time you run the installer. Click More info → Run anyway. This is expected for independent apps without a commercial code-signing certificate.
  • Antivirus false positive — PyInstaller-packaged apps occasionally trigger Windows Defender or other AV scanners (the unpack-then-run pattern looks suspicious to heuristic scanners). If your AV quarantines the installer, add an exception for SimpleTranscriber-Setup.exe, or run from source instead.
  • "ffmpeg not found" (running from source) — make sure ffmpeg.exe is in the transcriber/ folder or on your system PATH.
  • YouTube rate-limitedyt-dlp occasionally gets throttled by YouTube. Wait an hour and try again, or run with browser cookies (advanced).
  • Audio >25MB after encoding — automatic chunking handles it, but very long files (3+ hours) may need manual splitting.
  • Window opens but stays blank — make sure WebView2 Runtime is installed. The installer handles this; if you're running from source, download it from Microsoft.

Credits

License

Personal use, MIT-licensed. See LICENSE if present.