
nock4/transcribe

transcribe

Universal transcription. Any video, podcast, or audio source — one command.

Works with 1,800+ platforms via yt-dlp. Runs OpenAI Whisper locally for speech-to-text. No API keys. No data leaves your machine.

Install

pip install yt-dlp openai-whisper

# ffmpeg: needed for video file support (mp4, mkv, avi, etc.); openai-whisper also uses it to decode audio
brew install ffmpeg  # macOS
# sudo apt install ffmpeg  # Linux

Usage

# Any URL
python3 scripts/transcribe.py "https://www.youtube.com/watch?v=..."

# Local file
python3 scripts/transcribe.py /path/to/recording.mp4

# Podcast feed (latest 3 episodes)
python3 scripts/transcribe.py "https://feeds.example.com/podcast.xml" --latest 3

# Twitter/X Space
python3 scripts/transcribe.py "https://x.com/i/spaces/1yxBeMYdqgnJN"

# Batch mode
python3 scripts/transcribe.py file1.mp3 "https://youtube.com/..." file2.mp4

# Higher accuracy
python3 scripts/transcribe.py "URL" --model small

# Just download audio
python3 scripts/transcribe.py "URL" --download-only
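
As an illustration of what the `--latest` example above might do with a podcast feed, here is a minimal sketch that pulls enclosure URLs from RSS 2.0 XML using the standard library. The function name `latest_enclosures` is hypothetical, and the sketch assumes the feed lists the newest episodes first; the actual script may work differently.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urlparse


def latest_enclosures(feed_xml: str, latest: Optional[int] = None) -> List[str]:
    """Return http(s) enclosure URLs from an RSS 2.0 feed, in feed order.

    Hypothetical helper for illustration only. Assumes the feed lists
    the newest episode first, which is common but not guaranteed.
    """
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item without audio attached
        url = enclosure.get("url", "")
        # Keep only http(s) enclosures, skipping anything else.
        if urlparse(url).scheme in ("http", "https"):
            urls.append(url)
    # Slice after filtering, so "latest N" means the N newest valid episodes.
    return urls if latest is None else urls[:latest]
```

With `latest=None` every valid episode is returned, which matches the documented default of transcribing all episodes.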

Supported Sources

| Source | Example |
|---|---|
| YouTube | https://www.youtube.com/watch?v=... |
| Twitter/X | https://x.com/i/spaces/..., tweet videos |
| Twitch | https://www.twitch.tv/videos/... |
| Vimeo | https://vimeo.com/... |
| SoundCloud | https://soundcloud.com/... |
| Podcast RSS | https://feeds.example.com/podcast.xml |
| Local audio | recording.mp3, audio.m4a, interview.wav |
| Local video | meeting.mp4, lecture.mkv, clip.mov |
| 1,800+ more | Anything yt-dlp supports |
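
To make the source categories above concrete, the following sketch shows one way an input string could be sorted into URL, RSS feed, local audio, or local video. The function `classify_source`, the extension sets, and the ".xml/.rss means feed" heuristic are all assumptions for illustration, not the script's actual logic.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extension sets taken from the examples in this README (assumed, not exhaustive).
AUDIO_EXTS = {".mp3", ".m4a", ".wav"}
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi"}


def classify_source(source: str) -> str:
    """Classify an input as 'rss', 'url', 'local_audio', 'local_video', or 'unknown'."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Hypothetical heuristic: a path ending in .xml or .rss is treated as a feed.
        if parsed.path.lower().endswith((".xml", ".rss")):
            return "rss"
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix in AUDIO_EXTS:
        return "local_audio"
    if suffix in VIDEO_EXTS:
        return "local_video"
    return "unknown"
```

A real implementation would likely need a more robust feed check (for example, inspecting the response content type), since many feeds do not end in `.xml`.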

Options

| Flag | Default | Description |
|---|---|---|
| --model | base | Whisper model: tiny, base, small, medium, large |
| --language | en | Language code |
| -o / --output | stdout | Write transcript to file |
| --text-only | off | Skip metadata header |
| --download-only | off | Download audio only, skip transcription |
| --keep-audio | off | Keep audio file after transcription |
| --latest N | all | For RSS feeds: transcribe latest N episodes |
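
For readers who want to see how these flags fit together, here is a sketch of an `argparse` parser that mirrors the table. It is a reconstruction from the documented flags and defaults only; `build_parser` and the positional name `sources` are hypothetical, and the real script may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe URLs, local files, or podcast feeds.",
    )
    # One or more inputs, which is what batch mode implies.
    parser.add_argument("sources", nargs="+", help="URLs, file paths, or RSS feeds")
    parser.add_argument(
        "--model",
        default="base",
        choices=["tiny", "base", "small", "medium", "large"],
        help="Whisper model to use",
    )
    parser.add_argument("--language", default="en", help="Language code")
    # None stands for "write to stdout".
    parser.add_argument("-o", "--output", default=None, help="Write transcript to file")
    parser.add_argument("--text-only", action="store_true", help="Skip metadata header")
    parser.add_argument("--download-only", action="store_true",
                        help="Download audio only, skip transcription")
    parser.add_argument("--keep-audio", action="store_true",
                        help="Keep audio file after transcription")
    # None stands for "all episodes".
    parser.add_argument("--latest", type=int, default=None, metavar="N",
                        help="For RSS feeds: transcribe latest N episodes")
    return parser
```

Parsing `["file1.mp3", "--model", "small", "--latest", "3"]` with this parser yields `model == "small"` and `latest == 3`, with every other option left at its documented default.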

Model Selection

| Model | Parameters | Speed (35 min audio) | Quality |
|---|---|---|---|
| tiny | 39 M | ~5s | Low |
| base | 74 M | ~15s | Good |
| small | 244 M | ~45s | Better |
| medium | 769 M | ~2 min | High |
| large | 1550 M | ~5 min | Best |

All models run on CPU. No GPU required.
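
The speed column can be turned into a rough estimate for other recording lengths. The sketch below takes the table's timings at face value and assumes transcription time scales linearly with audio duration, which is only an approximation; actual speed depends heavily on hardware. The names `TABLE_SECONDS`, `estimate_seconds`, and `realtime_factor` are hypothetical.

```python
# Approximate transcription times for 35 minutes of audio, copied from the table.
REFERENCE_MINUTES = 35
TABLE_SECONDS = {
    "tiny": 5,
    "base": 15,
    "small": 45,
    "medium": 120,  # ~2 min
    "large": 300,   # ~5 min
}


def estimate_seconds(model: str, audio_minutes: float) -> float:
    """Scale the table's timing linearly to a different audio length (assumption)."""
    return TABLE_SECONDS[model] * audio_minutes / REFERENCE_MINUTES


def realtime_factor(model: str) -> float:
    """How many seconds of audio are processed per second of compute, per the table."""
    return REFERENCE_MINUTES * 60 / TABLE_SECONDS[model]
```

Under these assumptions, a 70-minute recording with `base` would take about 30 seconds, and `large` processes roughly 7 seconds of audio per second of compute.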

Security

  • URL validation — only http(s) URLs accepted
  • RSS enclosure URL validation before download
  • No shell injection — all subprocess calls use list form
  • Symlink protection on output paths
  • Predictable path avoidance (PID-based temp filenames)
  • No XXE risk (stdlib ElementTree)
  • Metadata sanitization (control characters stripped, length capped)
  • Timeout enforcement on all subprocess calls
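
Several of the measures above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the script's code: `is_allowed_url`, `sanitize_metadata`, `run_checked`, and the 200-character cap are all assumptions chosen for the example.

```python
import re
import subprocess
from typing import Sequence
from urllib.parse import urlparse

MAX_METADATA_LEN = 200  # hypothetical cap; the real limit is not documented here
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def is_allowed_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # malformed input such as a broken IPv6 literal
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def sanitize_metadata(value: str, max_len: int = MAX_METADATA_LEN) -> str:
    """Strip ASCII control characters, then cap the length."""
    return _CONTROL_CHARS.sub("", value)[:max_len]


def run_checked(cmd: Sequence[str], timeout: float) -> str:
    """Run a command in list form (no shell) with a timeout and return stdout.

    Because no shell is involved, characters such as ';' or '$' inside an
    argument are passed through literally rather than being interpreted.
    Raises subprocess.TimeoutExpired or subprocess.CalledProcessError on failure.
    """
    result = subprocess.run(
        list(cmd), capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

The same validator could be applied both to URLs given on the command line and to enclosure URLs read from an RSS feed before anything is downloaded.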

How It Works

Source (URL / file / RSS)
    │
    ▼
┌─────────────┐
│  yt-dlp     │  Downloads audio from 1,800+ platforms
│ (or ffmpeg) │  Extracts audio from local video files
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Whisper    │  Local speech-to-text (no API)
└──────┬──────┘
       │
       ▼
  Clean transcript
  + metadata header
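
The stages in the diagram map onto three external tools. The sketch below builds list-form command lines for each stage without running them. The helper names are hypothetical and the specific options (mp3 output, 16 kHz mono audio) are assumptions; the actual script may call the Whisper Python API rather than the `whisper` command-line tool.

```python
from typing import List


def ytdlp_audio_cmd(url: str, out_template: str) -> List[str]:
    """yt-dlp command that downloads a URL and extracts its audio track."""
    return [
        "yt-dlp",
        "--extract-audio",
        "--audio-format", "mp3",  # assumed format choice
        "--output", out_template,
        url,
    ]


def ffmpeg_extract_cmd(video_path: str, audio_path: str) -> List[str]:
    """ffmpeg command that drops the video stream and writes 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",           # no video
        "-ac", "1",      # one audio channel
        "-ar", "16000",  # 16 kHz sample rate
        audio_path,
    ]


def whisper_cmd(audio_path: str, model: str = "base", language: str = "en") -> List[str]:
    """Whisper CLI command for local speech-to-text."""
    return ["whisper", audio_path, "--model", model, "--language", language]
```

Each list can be passed directly to `subprocess.run` with a timeout, which keeps the arguments out of any shell.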

Also Available As

  • Hermes skill — install via hermes skills or place in ~/.hermes/skills/media/transcribe/

License

MIT
