Separates audio into vocals, drums, bass, and other instruments using Demucs by Meta Research.
- AI-Powered Separation: Uses state-of-the-art Demucs models for high-quality stem separation
- Multiple Models: Support for 4-stem and 6-stem (guitar + piano) models
- Multiple Formats: Output as WAV, MP3, FLAC, OGG, or AAC
- YouTube Support: Download and process YouTube videos directly, including playlists
- Spotify Support: Download and process Spotify tracks (requires spotdl)
- Batch Processing: Process entire directories of audio files
- Custom Remixing: Mix stems with custom volume levels
- Audio Preview: Preview separated stems before saving
- GPU Acceleration: Automatic GPU detection for faster processing
- Metadata Preservation: Preserve ID3 tags from source files
- Configuration File: Save your preferences in `~/.stem-separator.yaml`

Windows:
```bat
install.bat
```
Linux/macOS:
```bash
chmod +x install.sh && ./install.sh
```
Or install manually:

- Install FFmpeg (required):
  ```bash
  # Windows
  winget install FFmpeg.FFmpeg
  # macOS
  brew install ffmpeg
  # Linux (Debian/Ubuntu)
  sudo apt install ffmpeg
  ```
- Install Python packages:
  ```bash
  pip install demucs yt-dlp soundfile scipy pyyaml rich
  ```
- Optional dependencies:
  ```bash
  # Spotify support
  pip install spotdl
  # Audio preview
  pip install sounddevice
  # Enhanced metadata handling
  pip install mutagen
  ```
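
To confirm the manual steps worked, a small script can check for FFmpeg on `PATH` and for each Python module. This is a sketch, not part of the project: the module list is transcribed from the install commands above, and two import names differ from their pip names (`yt-dlp` imports as `yt_dlp`, `pyyaml` as `yaml`).

```python
import importlib.util
import shutil

# Import names for the required packages (pip names differ for yt-dlp and pyyaml)
REQUIRED_MODULES = ["demucs", "yt_dlp", "soundfile", "scipy", "yaml", "rich"]
# Optional modules and the feature each one enables
OPTIONAL_MODULES = {
    "spotdl": "Spotify support",
    "sounddevice": "audio preview",
    "mutagen": "metadata handling",
}


def check_dependencies() -> dict:
    """Report whether ffmpeg and each Python module can be found."""
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in REQUIRED_MODULES + list(OPTIONAL_MODULES):
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        label = OPTIONAL_MODULES.get(name)
        suffix = f" (optional: {label})" if label else ""
        print(f"{'OK     ' if ok else 'MISSING'} {name}{suffix}")
```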

To install the tool as an editable package:
```bash
pip install -e .
```
This enables:
- Running as `stem-separator song.mp3` or `stems song.mp3`
- Running as `python -m stem_separator song.mp3`

Run with Docker for the easiest setup; nothing besides Docker needs to be installed:
```bash
# CPU mode (works on any machine)
docker compose --profile cpu up -d
# GPU mode (requires NVIDIA GPU + Container Toolkit)
docker compose --profile gpu up -d
# Access the web UI at http://localhost:8080
```
See DOCKER.md for complete Docker documentation, including:
- Building custom images
- Volume mounting for input/output
- Model caching
- CLI usage in Docker
- API documentation

```bash
# Process a local file
python stem_separator.py song.mp3
# YouTube URL
python stem_separator.py "https://www.youtube.com/watch?v=VIDEO_ID"
# Specify output folder
python stem_separator.py song.mp3 -o ./output
# Use 6-stem model (adds guitar + piano separation)
python stem_separator.py song.mp3 --model htdemucs_6s
# Export as MP3
python stem_separator.py song.mp3 --format mp3
# Extract only specific stems
python stem_separator.py song.mp3 --stems vocals,drums
# Create karaoke version (everything except vocals)
python stem_separator.py song.mp3 --stems karaoke --format mp3
# Extract acapella (vocals only)
python stem_separator.py song.mp3 --stems acapella
```

```bash
# Process all audio files in a directory
python stem_separator.py ./music_folder --batch -o ./stems
# Process recursively
python stem_separator.py ./music_folder --batch --recursive
# Dry run (see what would be processed)
python stem_separator.py ./music_folder --batch --dry-run
```

```bash
# Process an entire playlist
python stem_separator.py "https://youtube.com/playlist?list=..." --playlist
# Resume an interrupted playlist download by re-running the same command
python stem_separator.py "https://youtube.com/playlist?list=..." --playlist
```

```bash
# Single track (requires spotdl)
python stem_separator.py "https://open.spotify.com/track/..."
# Playlist or album
python stem_separator.py "https://open.spotify.com/playlist/..." --playlist
```

```bash
# Create a custom mix with per-stem volume levels
python stem_separator.py song.mp3 --remix "vocals:0.5,drums:1.0,bass:0.8"
# Mute vocals, boost drums and bass, lower the other instruments
python stem_separator.py song.mp3 --remix "vocals:0,drums:1.5,bass:1.2,other:0.5"
```

```bash
# Preview stems interactively after processing (requires sounddevice)
python stem_separator.py song.mp3 --preview
```

```bash
# Analyze separation quality
python stem_separator.py song.mp3 --quality
```

| Option | Description |
|---|---|
| `-o, --output` | Output directory (default: current directory) |
| `--format` | Output format: wav, mp3, flac, ogg, aac (default: wav) |
| `--naming` | Output naming template (default: `{name}_{stem}`) |

| Option | Description |
|---|---|
| `--model` | AI model: htdemucs, htdemucs_ft, htdemucs_6s (default: htdemucs) |
| `--stems` | Stems to export: a comma-separated list of stem names, or a preset name |

| Option | Description |
|---|---|
| `--cpu` | Force CPU mode (skip GPU) |
| `--normalize` | Normalize audio levels in output stems |
| `--low-memory` | Low-memory mode for very long tracks |
| `--quality` | Analyze and report stem separation quality |
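
This README does not say how `--normalize` measures levels. One common approach is peak normalization: scale a stem so its loudest sample reaches a fixed target. The sketch below assumes that approach and works on a plain list of float samples in [-1, 1]; `peak_normalize` is a hypothetical helper, not the tool's code.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.99) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak.

    Silent input (all zeros, or empty) is returned unchanged to avoid dividing by zero.
    """
    if not 0.0 < target_peak <= 1.0:
        raise ValueError("target_peak must be in (0, 1]")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak  # one gain for the whole stem keeps its dynamics intact
    return [s * gain for s in samples]


quiet_stem = [0.1, -0.25, 0.2]
print(peak_normalize(quiet_stem, target_peak=0.5))  # [0.2, -0.5, 0.4]
```

Normalizing each stem independently changes their relative levels, so normalized stems no longer sum back to the original mix.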

| Option | Description |
|---|---|
| `--batch` | Process a directory of files |
| `-r, --recursive` | Search directories recursively |
| `-j, --parallel N` | Number of parallel jobs |
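
`--batch`, `--recursive`, and `--dry-run` suggest a discover-then-process flow. A minimal sketch with `pathlib`, assuming files are matched by extension; the extension set is borrowed from the web UI's upload list and may differ from what the CLI accepts. `find_audio_files` and `dry_run` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path

# Assumed extension set (from the web UI upload list, not the CLI source)
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".aac", ".m4a"}


def find_audio_files(folder: str | Path, recursive: bool = False) -> list[Path]:
    """Return audio files in folder, sorted for a stable processing order."""
    root = Path(folder)
    # rglob walks subdirectories (like --recursive); glob stays at the top level
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def dry_run(folder: str | Path, recursive: bool = False) -> None:
    """Print what would be processed without touching any file."""
    files = find_audio_files(folder, recursive)
    for path in files:
        print(f"would process: {path}")
    print(f"{len(files)} file(s) found")
```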

| Option | Description |
|---|---|
| `--playlist` | Process a YouTube/Spotify playlist |
| `--browser` | Use cookies from a browser (chrome, firefox, edge, safari) |
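
The same positional argument accepts a local file, a YouTube URL, or a Spotify URL. A hypothetical helper that tells these apart and flags playlist/album URLs; the host list and path patterns are assumptions, not the CLI's documented detection logic. Note that a YouTube watch URL carrying a `list=` parameter is treated as a playlist here.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse

# Assumed set of YouTube hostnames
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com", "youtu.be"}


def classify_input(source: str) -> tuple[str, bool]:
    """Return (kind, is_collection) where kind is 'youtube', 'spotify', or 'local'."""
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        return "local", False  # plain paths, including Windows paths like C:\...
    host = parsed.netloc.lower()
    if host in YOUTUBE_HOSTS:
        # YouTube playlist URLs carry a list= query parameter
        return "youtube", "list" in parse_qs(parsed.query)
    if host == "open.spotify.com":
        # Spotify paths look like /track/<id>, /album/<id>, /playlist/<id>
        first_segment = parsed.path.strip("/").split("/")[0]
        return "spotify", first_segment in ("album", "playlist")
    raise ValueError(f"unsupported URL host: {host}")
```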

| Option | Description |
|---|---|
| `--remix` | Remix stems with per-stem volume levels, e.g. `"vocals:0.5,drums:1.0"` |
| `--preview` | Preview stems interactively |
| `--no-metadata` | Don't preserve metadata |
| `-v, --verbose` | Verbose output |
| `-q, --quiet` | Quiet mode |
| `--dry-run` | Show what would be done without processing anything |
| `--version` | Show version |
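
The `--remix` value is a comma-separated list of `stem:gain` pairs; judging by the examples, `0` mutes a stem and values above `1` boost it. A minimal sketch of parsing that string and mixing stems, assuming equal-length float sample lists and that stems omitted from the spec are left out of the mix (the tool's behavior for omitted stems isn't documented here). `parse_remix_spec` and `mix` are hypothetical names.

```python
from __future__ import annotations


def parse_remix_spec(spec: str) -> dict[str, float]:
    """Parse 'vocals:0.5,drums:1.0' into {'vocals': 0.5, 'drums': 1.0}."""
    gains: dict[str, float] = {}
    for item in spec.split(","):
        name, sep, value = item.strip().partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'stem:gain', got {item!r}")
        gain = float(value)
        if gain < 0:
            raise ValueError(f"gain must be non-negative: {item!r}")
        gains[name.strip()] = gain
    return gains


def mix(stems: dict[str, list[float]], gains: dict[str, float]) -> list[float]:
    """Weighted sum of equal-length stems; stems missing from gains are left out."""
    if not stems:
        raise ValueError("no stems given")
    missing = set(gains) - set(stems)
    if missing:
        raise ValueError(f"unknown stem(s): {sorted(missing)}")
    length = len(next(iter(stems.values())))
    out = [0.0] * length
    for name, gain in gains.items():
        for i, sample in enumerate(stems[name]):
            out[i] += gain * sample
    # Hard-clip to [-1, 1] so boosted stems (gain > 1) cannot exceed full scale
    return [max(-1.0, min(1.0, s)) for s in out]
```

Hard clipping is the simplest guard against overload; scaling the whole mix down when it exceeds full scale avoids distortion at the cost of overall loudness.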

| Model | Stems | Description |
|---|---|---|
| `htdemucs` | 4 | Default model (vocals, drums, bass, other) |
| `htdemucs_ft` | 4 | Fine-tuned version (better quality, but slower) |
| `htdemucs_6s` | 6 | Adds guitar and piano separation |

| Preset | Description |
|---|---|
| `all` | All stems (default) |
| `karaoke` | Everything except vocals (instrumental) |
| `acapella` | Vocals only |
| `instrumental` | Same as karaoke |
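
The presets expand to stem names that depend on the model. A sketch of that resolution, with stem names taken from the model output lists in this README; `resolve_stems` is hypothetical, and it only selects stems. Whether `karaoke` writes them as separate files or mixes them into one instrumental track is not specified here.

```python
from __future__ import annotations

# Stem names per model, as listed in this README
MODEL_STEMS = {
    "htdemucs": ["vocals", "drums", "bass", "other"],
    "htdemucs_ft": ["vocals", "drums", "bass", "other"],
    "htdemucs_6s": ["vocals", "drums", "bass", "other", "guitar", "piano"],
}


def resolve_stems(spec: str, model: str = "htdemucs") -> list[str]:
    """Expand a --stems value (preset or comma-separated list) into stem names."""
    available = MODEL_STEMS[model]
    preset = spec.strip().lower()
    if preset == "all":
        return list(available)
    if preset in ("karaoke", "instrumental"):
        return [s for s in available if s != "vocals"]
    if preset == "acapella":
        return ["vocals"]
    # Otherwise treat the value as an explicit comma-separated list
    requested = [s.strip() for s in spec.split(",") if s.strip()]
    unknown = [s for s in requested if s not in available]
    if unknown:
        raise ValueError(f"{model} has no stem(s): {', '.join(unknown)}")
    return requested
```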

Create `~/.stem-separator.yaml` to set defaults:
```yaml
# Default settings
model: htdemucs
format: wav
output_dir: .
normalize: false
naming_template: "{name}_{stem}"
# Processing
cpu: false
low_memory: false
# YouTube
browser: chrome
# Output
verbose: false
quiet: false
```
Generate a sample config:
```bash
# Note: ">" overwrites an existing config file
python stem_separator.py --generate-config > ~/.stem-separator.yaml
```
Processing a song creates a folder named `{song}_stems` containing the separated stems.

4-stem model output:
- `vocals`: Singing/voice
- `drums`: Drums and percussion
- `bass`: Bass
- `other`: Guitar, piano, synths, etc.

6-stem model output (`htdemucs_6s`):
- All of the above (with guitar and piano moved out of `other`), plus:
- `guitar`: Guitar
- `piano`: Piano
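
Output paths follow from the `{song}_stems` folder and the `--naming` template (default `{name}_{stem}`). A sketch that predicts them, assuming `{name}` is the input file name without its extension and that the file extension matches `--format` (for `aac` the real tool may use a different container extension). `expected_outputs` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def expected_outputs(
    input_file: str,
    stems: list[str],
    output_dir: str = ".",
    fmt: str = "wav",
    naming: str = "{name}_{stem}",
) -> list[Path]:
    """Predict output paths, assuming <output_dir>/<name>_stems/<naming>.<fmt>."""
    name = Path(input_file).stem  # "song.mp3" -> "song"
    folder = Path(output_dir) / f"{name}_stems"
    return [folder / f"{naming.format(name=name, stem=stem)}.{fmt}" for stem in stems]


for path in expected_outputs("song.mp3", ["vocals", "drums"], fmt="mp3"):
    print(path.as_posix())
# song_stems/song_vocals.mp3
# song_stems/song_drums.mp3
```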

The script automatically tries the GPU first and falls back to the CPU if needed. For NVIDIA GPUs, install a CUDA-enabled PyTorch build:
```bash
# Standard CUDA support
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# RTX 5090 / Blackwell GPUs (requires nightly)
pip install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
On Apple Silicon:
```bash
pip install torch torchaudio
# MPS backend is used automatically
```

| Setup | Time per Song |
|---|---|
| GPU (CUDA) | ~30 seconds |
| Apple Silicon (MPS) | ~1 minute |
| CPU | 2-4 minutes |
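
The GPU-first fallback can be sketched with PyTorch's availability checks. This is an illustration, not the script's code: `pick_device` is hypothetical, and the CUDA, then MPS, then CPU order is an assumption that matches the speed ranking above. It only detects a device; falling back after a runtime failure such as running out of GPU memory would need separate error handling.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return 'cuda', 'mps', or 'cpu', preferring the fastest available backend."""
    if force_cpu:  # same intent as the --cpu flag
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: nothing to accelerate with
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```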
The Docker container includes a modern web interface at http://localhost:8080 with:
- File Upload: Drag & drop audio files (MP3, WAV, FLAC, OGG, AAC, M4A)
- URL Processing: Paste YouTube or Spotify URLs directly
- Playlist Support: Process entire playlists
- All CLI Options: Model selection, output format, stem selection, etc.
- Real-time Progress: Live progress updates via WebSocket
- Easy Downloads: Download individual stems or all as ZIP

```python
from pathlib import Path

from stem_separator import separate_audio, StemSeparator, remix_stems

# Simple usage
result = separate_audio(
    input_file="song.mp3",
    output_dir="./output",
    model_name="htdemucs",
    output_format="mp3",
)

# With model pre-loading for batch processing
songs = sorted(Path("./music_folder").glob("*.mp3"))  # example input list
output_dir = "./output"
separator = StemSeparator(model_name="htdemucs_6s")
separator.load_model()
try:
    for song in songs:
        result = separator.separate(str(song), output_dir)
finally:
    separator.unload_model()  # free model memory even if a song fails

# Remix stems
result = remix_stems(
    stems_dir="./output/song_stems",
    output_path="./remix.mp3",
    mix_components="vocals:0.5,drums:1.0,bass:0.8",
)
```
If YouTube downloads fail, try passing cookies from your browser:
```bash
python stem_separator.py URL --browser edge
```
If the GPU is not detected:
- Check that the NVIDIA driver is installed and sees the GPU: `nvidia-smi`
- Check that PyTorch can use CUDA: `python -c "import torch; print(torch.cuda.is_available())"`
- For the newest GPUs, install nightly PyTorch

If a long track runs out of memory:
```bash
# Use low-memory mode for long tracks
python stem_separator.py long_song.mp3 --low-memory
```
MIT License
