🎸 Essentia Music Tagger

Intelligent audio analysis and automatic genre/mood tagging for your music library

Analyze your music collection using machine learning and write accurate genre and mood tags directly to your audio files. No internet required - all processing happens locally using Essentia, an advanced audio analysis library from the Music Technology Group.

✨ Features

🎯 Audio-based analysis - Analyzes actual audio content, not metadata lookups
🎼 400 genre classifications - Uses Discogs taxonomy for detailed genre tagging
😊 Mood detection - Detects moods like energetic, dark, happy, aggressive, etc.
🎚️ Analysis mode selection - Choose genres only, moods only, or both per run
� Default library path - Save your library root; browse it with an arrow-key folder navigator on every run
�🔄 Batch processing - Recursively process entire music libraries
🎛️ Fully configurable - Interactive prompts for all settings on each run
📝 Comprehensive logging - Detailed logs with confidence scores and predictions
🏷️ Multiple tag formats - Choose how genre tags are formatted
💾 Wide format support - Writes tags to FLAC, MP3, OGG, Opus, M4A/MP4, AAC, WMA, AIFF, WAV, WavPack, APE, Musepack, and DSF
🧪 Dry run mode - Test before making changes
🚀 CPU-only - No GPU required (though it helps!)
🐳 Docker support - Run with Docker, with optional GPU acceleration
🤖 Automation support - CLI arguments for scripted/automated workflows
🔄 Picard integration - Auto-tag files saved by MusicBrainz Picard.

🎵 How It Works

Unlike tools like MusicBrainz Picard or beets that look up metadata from online databases, Essentia Music Tagger analyzes the actual audio waveform using deep learning models:

Loads audio - Reads your music files
Extracts features - Analyzes spectral, tonal, and rhythmic characteristics
Runs ML models - Pre-trained neural networks predict genres and moods
Writes tags - Saves predictions to your music files

No internet connection required after initial setup!

Example Predictions

Input: Alternative rock track
Output:

🎸 Genres: Rock - Alternative Rock (32%), Rock - Indie Rock (23%), Rock - Brit Pop (22%) 😊 Moods: Energetic (2.3%), Dark (1.8%)

Tags written:

GENRE: Rock - Alternative Rock; Rock - Indie Rock; Rock - Brit Pop
MOOD: Energetic; Dark

🚀 Quick Start

Prerequisites

Python 3.8+
Linux (Debian/Ubuntu recommended, also works on macOS)
~100MB disk space for models
8GB+ RAM recommended

Installation

# 1. Clone the repository
cd /srv/dev-disk-by-uuid-dc4918d5-6597-465b-9567-ce442fbd8e2a/Github
git clone https://github.com/WB2024/Essentia-to-Metadata.git
cd Essentia-to-Metadata

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install essentia-tensorflow mutagen numpy

# 4. Download ML models (~87MB)
bash download_models.sh

First Run

Run the script

python tag_music.py

You'll be prompted for:

📂 Music directory to analyze
🎸 Number of genres to tag (1-10)
📊 Confidence thresholds for genre/mood
🎨 Tag formatting style
🧪 Dry run mode (test first!)
And more...

Recommendation: Run in dry-run mode first to preview results!

🐳 Docker

CPU mode

# Build the image
docker compose build essentia-tagger

# Process a music directory
MUSIC_DIR=/path/to/music docker compose run --rm essentia-tagger

# Dry run
MUSIC_DIR=/path/to/music docker compose run --rm essentia-tagger /music --auto --dry-run

# Process a single file
MUSIC_DIR=/path/to/music docker compose run --rm essentia-tagger /music/song.flac --auto --single-file

GPU mode (NVIDIA)

Requires the NVIDIA Container Toolkit installed on the host. Uses a dedicated Dockerfile.gpu based on nvidia/cuda with CUDA 11.2 + cuDNN 8 for actual GPU acceleration.

# Build the GPU image
docker compose build essentia-tagger-gpu

# Process with GPU acceleration
MUSIC_DIR=/path/to/music docker compose run --rm essentia-tagger-gpu

If no GPU is available on the host, use the CPU profile instead.

🤖 Command Line / Automation Mode

For scripting or integration with other tools, use CLI arguments:

# Basic automated mode
python tag_music.py /path/to/music --auto

# Process a single file (e.g., from a file watcher)
python tag_music.py /path/to/song.flac --auto --single-file

# Custom settings
python tag_music.py /path/to/music --auto \
    --genres 4 \
    --genre-threshold 20 \
    --mood-threshold 1 \
    --genre-format child_only \
    --overwrite

# Dry run for testing
python tag_music.py /path/to/music --auto --dry-run

CLI Arguments Reference

Argument	Short	Description	Default
`--auto`	`-a`	Non-interactive mode	-
`--single-file`	`-f`	Process single file	-
`--genres N`	`-g`	Number of genres	3
`--genre-threshold PCT`	`-gt`	Genre confidence %	15
`--genre-format STYLE`	`-gf`	Format style	parent_child
`--no-genres`	-	Disable genre analysis	-
`--no-moods`	-	Disable mood analysis	-
`--mood-threshold PCT`	`-mt`	Mood confidence %	0.5
`--dry-run`	`-d`	Don't write tags	-
`--overwrite`	`-o`	Overwrite existing tags	-
`--quiet`	`-q`	Minimal output	-
`--log-dir DIR`	-	Log file directory	./
`--model-dir DIR`	-	Essentia models directory	~/essentia_models
`--library DIR`	-	Set & save default library path	-

Note: --no-genres and --no-moods cannot be used together — at least one analysis type must be enabled.

Tip: Use --library /path/to/library once to save your library root. It is stored in ~/.essentia_tagger.json and used automatically in future interactive runs.

🎵 MusicBrainz Picard Automation

Automatically tag files whenever Picard saves them! Perfect for server setups.

See PICARD_AUTOMATION_SETUP.md for complete setup instructions.

How it works:

Picard (in Docker) saves files to your music directory
A file watcher (systemd service) detects new files
Essentia analyzes and tags the files automatically

Included files for automation:

essentia_watcher.sh - File watcher script using inotifywait
essentia-tagger.service - Systemd service file

📖 Detailed Usage

Interactive Configuration

Every run prompts you to configure:

Default Library Path

The first setting lets you define (or update) a default music library root path, saved persistently to ~/.essentia_tagger.json.

Once set, every subsequent run begins with a path selection menu:

Option	Description
1. Scan entire library (default)	Recursively scan from the library root
2. Browse & select a folder	Open the interactive folder browser to pick a sub-folder
3. Enter a custom path	Type a path manually (original behaviour)

Interactive Folder Browser

Option 2 opens a full-screen CLI folder navigator:

   📂 Browsing: /2Pac
   📍 Full path: /srv/.../Music/Sources/Clean/2Pac
   Use ↑↓ arrows to navigate, Enter to select, 'q' to cancel
   ──────────────────────────────────────────────────
   ▶ ✅ SELECT THIS FOLDER
     ⬆️  ../ (go up)
     📁 [1994] Me Against the World
     📁 [1996] All Eyez on Me
     📁 [1996] The Don Killuminati

Key	Action
↑ / ↓	Move selection up/down
Enter	Select folder or navigate into it
Backspace	Go up one directory
q	Cancel and return to path selection

Analysis Mode

Choose what to analyse for each run:

Option	Description
1. Both (default)	Analyse and write both genre and mood tags
2. Genres only	Run only the genre model; skip mood analysis entirely
3. Moods only	Run only the mood model; skip genre analysis entirely

Only the models that are needed are loaded, saving memory and time when running in a single-mode.

Genre Settings (shown when mode is Both or Genres only)

Number of genres (1-10) - How many genre tags per song
Confidence threshold (1-50%) - Minimum prediction confidence
Format style:
- Rock - Alternative Rock (parent - child) ← default
- Alternative Rock - Rock (child - parent)
- Alternative Rock (child only)
- Rock---Alternative Rock (raw)

Mood Settings (shown when mode is Both or Moods only)

Confidence threshold (0.1-20%) - Moods have lower confidence than genres

Other Options

Dry run mode - Test without writing tags
Confidence tags - Write detailed scores to custom tags
Overwrite existing - Skip or replace existing genre/mood tags independently
Verbose output - Show detailed predictions

Example Session

$ python tag_music.py

🎸 ESSENTIA MUSIC TAGGER - INTERACTIVE MODE ══════════════════════════════════════════════════════════════════════

Enter the path to analyze: /music/2Pac

📂 Directory: /music/2Pac 🎵 Found ~45 audio files

⚙️ CONFIGURATION ────────────────────────────────────────────────────────────────────── 🧪 DRY RUN MODE Enable dry run mode? [Y/n]: y

�️ ANALYSIS MODE

Both genres and moods (default)
Genres only
Moods only Select mode [1]: 1

🎸 GENRE SETTINGS Number of genres to write [3]: 3 Genre threshold (%) [15]: 15 Genre format [1]: 1

😊 MOOD SETTINGS Mood threshold (%) [0.5]: 0.5

🔄 Loading models... ✅ Models loaded successfully!

[1/45] 2Pac/Me Against the World/05 - Temptations.flac 🎸 Raw: Hip-Hop---Gangsta (45.2%), Hip-Hop---East Coast Hip Hop (32.4%) 🎸 Formatted: Hip-Hop - Gangsta, Hip-Hop - East Coast Hip Hop 😊 Raw: energetic (2.3%), dark (1.8%) 😊 Formatted: Energetic, Dark [DRY RUN] Would write: Genres: Hip-Hop - Gangsta, Hip-Hop - East Coast Hip Hop | Moods: Energetic, Dark

🎛️ Configuration Guide

Confidence Thresholds

Understanding confidence scores:

Genres:

Model predicts across 400 possible classes
Top prediction typically 15-40%
15% threshold = balanced (recommended)
25% threshold = strict (fewer genres)
5% threshold = inclusive (more genres)

Moods:

Naturally much lower than genres (0.1-5% range)
0.5% threshold = good starting point
1-3% = more selective

Tag Formatting Examples

Raw Prediction	parent_child	child_parent	child_only
`Rock---Alternative Rock`	`Rock - Alternative Rock`	`Alternative Rock - Rock`	`Alternative Rock`
`Hip-Hop---Gangsta`	`Hip-Hop - Gangsta`	`Gangsta - Hip-Hop`	`Gangsta`
`Electronic---Techno`	`Electronic - Techno`	`Techno - Electronic`	`Techno`

📁 Output

Tags Written

Tags are written using the native tagging format for each container:

FLAC / OGG Vorbis / OGG Opus (Vorbis Comments):

GENRE - Formatted genre tags (semicolon-separated)
MOOD - Formatted mood tags (semicolon-separated)
ESSENTIA_GENRE - Raw predictions with confidence scores (optional)
ESSENTIA_MOOD - Raw mood predictions with scores (optional)

MP3 / AIFF / WAV / DSF (ID3v2):

TCON (Genre) - Formatted genre tags
COMM (Comment) - Confidence scores (optional)

M4A / MP4 / AAC (iTunes atoms):

©gen - Formatted genre tags
©cmt - Confidence scores (optional)
----:com.apple.iTunes:MOOD - Mood tags (freeform atom)

WMA (ASF/Windows Media):

WM/Genre - Formatted genre tags
WM/Mood - Mood tags
WM/Provider - Confidence scores (optional)

WavPack / APE / Musepack (APEv2):

Genre - Formatted genre tags
Mood - Mood tags
Essentia Genre / Essentia Mood - Confidence scores (optional)

Log Files

Each run creates a timestamped log: essentia_tagger_YYYYMMDD_HHMMSS.log

Example log content:

FILE: 2Pac/Me Against the World/05 - Temptations.flac ──────────────────────────────────────────────────────────────────────────── GENRES (raw predictions): • Hip-Hop---Gangsta: 45.23% • Hip-Hop---East Coast Hip Hop: 32.45% • Hip-Hop---Golden Age Hip Hop: 21.34%

GENRES (formatted for tags): • Hip-Hop - Gangsta • Hip-Hop - East Coast Hip Hop

ALL GENRE PREDICTIONS (top 10): • Hip-Hop---Gangsta: 45.23% • Hip-Hop---East Coast Hip Hop: 32.45% ...

MOODS (passed threshold - 2 total): • energetic: 2.34% • dark: 1.87%

🎓 Understanding the Models

Genre Model: Discogs-400

Classes: 400 genre/style categories from Discogs taxonomy
Architecture: EfficientNet-based CNN
Training: Supervised learning on Discogs-tagged releases
Strengths: Very detailed genre classification
Example classes: Hip-Hop---Golden Age Hip Hop, Rock---Shoegaze, Electronic---Deep House

Mood Model: MTG-Jamendo

Classes: Mood and theme tags (energetic, dark, happy, sad, etc.)
Architecture: Multi-label classification
Training: MTG-Jamendo dataset with crowd-sourced tags
Note: Lower confidence than genres (this is normal!)

Embedding Model: Discogs-Effnet

Purpose: Extracts audio features for downstream tasks
Input: 16kHz audio resampled from any format
Output: High-dimensional embedding vectors

⚡ Performance

Processing Speed (CPU-only, Intel i3-4150T @ 3.00GHz):

~5-15 seconds per track (varies by length and complexity)
~500 tracks = 2-4 hours
~2000 tracks = 8-16 hours

Tips for faster processing:

Use newer/faster CPU
Process in batches by artist/album
Run overnight for large libraries
GPU support (if available) can 10x speed

Memory Usage:

~2-3GB RAM during processing
Models loaded once, reused for all files

🛠️ Troubleshooting

"Could not load model" errors

Ensure models are downloaded: bash download_models.sh
Check ~/essentia_models/ contains .pb and .json files

"No moods above threshold"

Moods have very low confidence (0.1-5%)
Try lowering mood threshold to 0.3% or 0.1%
Check log file for raw mood predictions

TensorFlow warnings

Could not load dynamic library 'libcudart.so.11.0'

Safe to ignore - means no GPU, will use CPU
Performance is still good on CPU

Out of memory

Reduce batch processing
Close other applications
Upgrade RAM if processing very large files

🤝 Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Ideas for contributions:

GUI interface
Progress bars
Resume/checkpoint system for interrupted runs
Custom model support
Genre mapping/translation tables
ALAC (Apple Lossless) support

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: Essentia library itself is licensed under AGPL-3.0. Pre-trained models from MTG are available under CC BY-NC-ND 4.0 for non-commercial use.

🙏 Credits

Built With

Essentia - Audio analysis library by Music Technology Group
Mutagen - Python library for audio metadata
NumPy - Numerical computing

Models

Pre-trained models provided by:

Music Technology Group (MTG) - Universitat Pompeu Fabra
Trained on public datasets: Discogs, MTG-Jamendo, AcousticBrainz

Inspiration

AcousticBrainz - Crowdsourced acoustic analysis
beets - Music library management
MusicBrainz Picard - Music tagger

📚 Further Reading

💬 Support

Issues: GitHub Issues
Discussions: GitHub Discussions

🎶 Example Use Cases

1. Organize Your Music Library

Tag your entire collection with accurate genres for better browsing in music players.

2. DJ / Producer Workflow

Quickly find tracks by mood/energy for mixing and production.

3. Music Research

Analyze genre distributions in music collections.

4. Playlist Generation

Use mood tags to create dynamic playlists (energetic workout mixes, relaxing evening music, etc.).

📋 Changelog

v2.0.0 (Latest)

New Features

Analysis mode selection — New 3-way mode menu in interactive mode lets you choose to run genres only, moods only, or both. The corresponding ML models are only loaded when needed, reducing memory usage when running in single-mode. CLI equivalents: --no-genres and --no-moods.
Default library path & folder browser — Set a default music library root path (saved to ~/.essentia_tagger.json). On each interactive run you can scan the whole library, drill into a subfolder using an arrow-key folder browser, or type a custom path. No more copying and pasting long paths. CLI equivalent: --library /path/to/library.
Wide audio format support — Tag writing now covers 12 audio container formats via native tag systems:
- Vorbis Comments: FLAC, OGG Vorbis, OGG Opus
- ID3v2: MP3, AIFF, WAV, DSF
- iTunes atoms: M4A, MP4, AAC
- ASF: WMA
- APEv2: WavPack (.wv), Monkey's Audio (.ape), Musepack (.mpc)
Per-tag-type overwrite logic — Genre and mood tags are now checked and potentially skipped independently. Previously, if a genre tag existed and overwrite was disabled the entire file was skipped, including moods.

Bug Fixes

Fixed MP3 mood tags never being written when genres were disabled — mood writing was incorrectly nested inside the genres block.
Fixed bare except: clause in MP3 tag writing replaced with except Exception: to avoid swallowing system exits and keyboard interrupts.
Fixed mood_threshold config default mismatch — the internal default was 0.01 while the CLI help text advertised 0.5% (0.005). Both are now consistently 0.005.
Removed unused imports (re, from mutagen.mp3 import MP3).

Made with ❤️ for music lovers and data nerds

If this project helps you, consider ⭐ starring the repo!

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.github		.github
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Dockerfile.gpu		Dockerfile.gpu
LICENSE		LICENSE
PICARD_AUTOMATION_SETUP.md		PICARD_AUTOMATION_SETUP.md
README.md		README.md
docker-compose.yml		docker-compose.yml
docker-entrypoint.py		docker-entrypoint.py
download_models.sh		download_models.sh
essentia-tagger.service		essentia-tagger.service
essentia_watcher.sh		essentia_watcher.sh
requirements.txt		requirements.txt
tag_music.py		tag_music.py

Folders and files

Latest commit

History

Repository files navigation

🎸 Essentia Music Tagger

✨ Features

🎵 How It Works

Example Predictions

🚀 Quick Start

Prerequisites

Installation

First Run

Run the script

🐳 Docker

CPU mode

GPU mode (NVIDIA)

🤖 Command Line / Automation Mode

CLI Arguments Reference

🎵 MusicBrainz Picard Automation

📖 Detailed Usage

Interactive Configuration

Default Library Path

Interactive Folder Browser

Analysis Mode

Genre Settings (shown when mode is Both or Genres only)

Mood Settings (shown when mode is Both or Moods only)

Other Options

Example Session

🎛️ Configuration Guide

Confidence Thresholds

Tag Formatting Examples

📁 Output

Tags Written

Log Files

🎓 Understanding the Models

Genre Model: Discogs-400

Mood Model: MTG-Jamendo

Embedding Model: Discogs-Effnet

⚡ Performance

🛠️ Troubleshooting

"Could not load model" errors

"No moods above threshold"

TensorFlow warnings

Out of memory

🤝 Contributing

Ideas for contributions:

📜 License

🙏 Credits

Built With

Models

Inspiration

📚 Further Reading

💬 Support

🎶 Example Use Cases

1. Organize Your Music Library

2. DJ / Producer Workflow

3. Music Research

4. Playlist Generation

📋 Changelog

v2.0.0 (Latest)

New Features

Bug Fixes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages