VoxHub

Download audio from videos via yt-dlp + FFmpeg, record audio in-browser, and transcribe files/recordings using an AI transcription model. Includes a simple Web UI, CLI, and Docker support for streamlined setup.

Highlights

Web UI: paste a video URL (YouTube/Shorts) and pick mp3 or m4a/mp4.
Transcription: upload or record audio and get text instantly; copy button included.
Recording: in-browser recording (ogg/webm) with playable previews.
CLI: fast downloads with bitrate control and templates.
Docker: one-command setup with FFmpeg preinstalled.
Windows-friendly: no need to modify PATH when using Docker.

Prerequisites (local run)

Python 3.10+.
FFmpeg installed and available on PATH.
- Windows: winget install Gyan.FFmpeg or choco install ffmpeg
- Or install a local copy with scripts\install_ffmpeg.ps1 and use --ffmpeg.
Transcription API key:
- Set GEMINI_API_KEY as an environment variable before starting the server.

Quick Start (Web UI)

# Install dependencies
pip install -r requirements.txt

# Set transcription API key
$env:GEMINI_API_KEY="<your_key_here>"

# Run the server
python src/web_app.py

Open http://localhost:5000/, paste a video URL, choose mp3 or m4a/mp4, and optionally set MP3 bitrate or a custom FFmpeg path.

Browser Extension (Chrome/Edge)

A simple extension (Manifest V3) is available in src/extension/ to open the local app with the video URL pre-filled and start conversion automatically.

How to install (developer mode)

Enable developer mode in chrome://extensions (or edge://extensions).
Click "Load unpacked" and select the folder src/extension.
Keep the server running (python src/web_app.py).
Click the extension icon:
- Paste the YouTube URL or click "Use current" to grab the active tab URL.
- Choose the format (MP3 or M4A).
- Click "Download". The extension will open http://127.0.0.1:5000 with URL parameters and the app will start automatically.

Tip

You can also open the app manually with parameters:

http://127.0.0.1:5000/?url=<VIDEO_URL>&format=mp3&submit=1

Note: By default the app focuses on extracting audio. If you need to download video (MP4), the backend can be extended to support that flow.

Docker (recommended for a consistent setup)

Using Docker Compose

Optionally create a .env file with:

GEMINI_API_KEY=<your_key_here>

Start the app:

docker compose up --build

Open http://localhost:5000/.

Compose mounts local folders for development:

./src and ./src/static for live changes in debug.
./downloads to persist outputs on the host.

Using plain Docker

# Build image
docker build -t audio-downloader-web .

# Run container (Windows PowerShell example)
docker run --rm -p 5000:5000 \
  -e GEMINI_API_KEY="<your_key_here>" \
  -e HOST=0.0.0.0 -e PORT=5000 -e DEBUG=1 \
  -v "$(Get-Location)\downloads":/app/downloads \
  audio-downloader-web

On bash shells, replace the volume flag with -v "$(pwd)/downloads:/app/downloads".

CLI Usage

Basic (Windows):

python src/download_audio.py --format mp3 --output downloads "https://example.com/video-url"

Multiple URLs:

python src/download_audio.py -f m4a -o downloads "https://url1" "https://url2"

Control bitrate for MP3 (kbps):

python src/download_audio.py -f mp3 -b 192 "https://url"

Skip playlists:

python src/download_audio.py --no-playlist "https://playlist-url"

Custom output template:

python src/download_audio.py -t "downloads/%(title)s.%(ext)s" "https://url"

Use cookies file (for sites requiring login):

python src/download_audio.py -c "C:\path\to\cookies.txt" "https://url"

Use a local FFmpeg without touching system PATH:

python src/download_audio.py --ffmpeg "tools/ffmpeg/bin" -f mp3 "https://url"

Configuration

Environment variables for the Web UI:
- GEMINI_API_KEY: required for transcription.
- HOST: default 0.0.0.0 in Docker, 127.0.0.1 locally.
- PORT: default 5000.
- DEBUG: 1 or 0.
FFmpeg path can be set in the Web UI if not on PATH.

Project Structure

audio/
├─ src/
│  ├─ web_app.py
│  ├─ download_audio.py
│  └─ static/
│     └─ style.css
├─ scripts/
│  ├─ setup.ps1
│  ├─ build.ps1
│  └─ install_ffmpeg.ps1
├─ Dockerfile
├─ docker-compose.yml
├─ .dockerignore
├─ .gitignore
├─ requirements.txt
└─ README.md

Troubleshooting

500 Internal Server Error on transcription: ensure GEMINI_API_KEY is set.
Clipboard copy issues: the UI includes a fallback method if permissions are restricted.
"Open downloads" inside containers cannot launch the host’s file explorer; open ./downloads on your machine directly.
If FFmpeg is missing locally, either install it or run with Docker.

Notes

Respect platform terms of use and copyright.
Do not use this tool to circumvent technical protection measures.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VoxHub

Highlights

Prerequisites (local run)

Quick Start (Web UI)

Browser Extension (Chrome/Edge)

How to install (developer mode)

Tip

Docker (recommended for a consistent setup)

Using Docker Compose

Using plain Docker

CLI Usage

Configuration

Project Structure

Troubleshooting

Notes

About

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
scripts		scripts
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt
runtime.txt		runtime.txt
supabase_schema.sql		supabase_schema.sql
vercel.json		vercel.json

Folders and files

Latest commit

History

Repository files navigation

VoxHub

Highlights

Prerequisites (local run)

Quick Start (Web UI)

Browser Extension (Chrome/Edge)

How to install (developer mode)

Tip

Docker (recommended for a consistent setup)

Using Docker Compose

Using plain Docker

CLI Usage

Configuration

Project Structure

Troubleshooting

Notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages