This script processes and harmonizes the MusicBrainz release.tar.xz data dump. It extracts essential release information, including:
- Release titles and dates
- Artist names
- Track listings with durations
- Genres and subgenres (from tags)
- Labels and country of release
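
As a rough illustration of the field extraction listed above, the sketch below reduces one parsed release record to those fields. It assumes the record follows the MusicBrainz JSON layout (keys such as `artist-credit`, `media`, `tags` and `label-info`). The function name `extract_release` and its output keys are hypothetical and are not taken from `prepare20M.py`. Tags are only ordered by vote count here; how the script splits them into genres and subgenres is not shown.

```python
from typing import Any


def extract_release(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: reduce one release record to the essential fields.

    Assumes MusicBrainz JSON keys ("artist-credit", "media", "tags", "label-info").
    """
    # Artist names: prefer the credited name, fall back to the artist entity's name.
    artists = [
        credit.get("name") or (credit.get("artist") or {}).get("name")
        for credit in record.get("artist-credit") or []
        if isinstance(credit, dict)
    ]
    # Flatten all media (discs, sides, ...) into one track list.
    tracks = [
        {"title": track.get("title"), "length_ms": track.get("length")}  # length is in ms, may be null
        for medium in record.get("media") or []
        for track in medium.get("tracks") or []
    ]
    # Order tags by vote count, highest first.
    tags = sorted(record.get("tags") or [], key=lambda t: t.get("count") or 0, reverse=True)
    labels = [
        info["label"]["name"]
        for info in record.get("label-info") or []
        if info.get("label") and info["label"].get("name")
    ]
    return {
        "id": record.get("id"),
        "title": record.get("title"),
        "date": record.get("date"),
        "country": record.get("country"),
        "artists": [a for a in artists if a],
        "tracks": tracks,
        "tags": [t["name"] for t in tags if t.get("name")],
        "labels": labels,
    }
```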

The script handles large datasets efficiently using multiprocessing and logs critical errors for review.
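
A minimal sketch of the streaming-plus-multiprocessing pattern, assuming the dump stores one JSON release per line in an archive member named `mbdump/release` (an assumption; check the archive listing). Opening the archive with `tarfile` in `r|xz` stream mode decompresses it once, front to back, and `Pool.imap` with a `chunksize` spreads JSON parsing across worker processes while preserving input order. The helpers `iter_release_lines`, `process_line` and `run` are hypothetical, the per-line extraction is a placeholder, and the stdlib `json` module stands in for `ujson`, whose `loads` can be swapped in. `run` should be called under an `if __name__ == "__main__":` guard so that workers can start on platforms that spawn processes.

```python
import json
import logging
import tarfile
from multiprocessing import Pool
from typing import Iterator, Optional

# Hypothetical logger; run() routes its output to mb_processing_errors.log.
logger = logging.getLogger("mb_processing")


def iter_release_lines(tar_path: str, member_name: str = "mbdump/release") -> Iterator[str]:
    """Yield decoded JSON lines from one archive member, reading the archive once."""
    # "r|xz" is tarfile's forward-only stream mode: no seeking, no extraction to disk.
    with tarfile.open(tar_path, mode="r|xz") as archive:
        for member in archive:
            if member.isfile() and member.name == member_name:
                handle = archive.extractfile(member)
                for raw in handle:
                    yield raw.decode("utf-8")
                return
    raise FileNotFoundError(f"{member_name} not found in {tar_path}")


def process_line(line: str) -> tuple[Optional[dict], Optional[str]]:
    """Parse one line in a worker; return (record, None) or (None, error message)."""
    try:
        record = json.loads(line)  # ujson.loads could be used here instead
    except ValueError as exc:
        return None, f"invalid JSON: {exc}"
    if not isinstance(record, dict) or "id" not in record:
        return None, "missing release id"
    # Placeholder extraction: keep only the id and title.
    return {"id": record["id"], "title": record.get("title")}, None


def run(tar_path: str, out_path: str, processes: int = 4) -> int:
    """Process the dump in parallel, write good records as JSON lines, log failures."""
    logging.basicConfig(filename="mb_processing_errors.log", level=logging.ERROR)
    written = 0
    with Pool(processes) as pool, open(out_path, "w", encoding="utf-8") as out:
        # imap preserves input order; chunksize batches lines to cut IPC overhead.
        results = pool.imap(process_line, iter_release_lines(tar_path), chunksize=1000)
        for record, error in results:
            if error is not None:
                logger.error(error)
                continue
            out.write(json.dumps(record) + "\n")
            written += 1
    return written
```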

Ensure you have the required dependencies installed:

    pip install ujson tqdm

Run the script with:

    python prepare20M.py --tar-file path/to/release.tar.xz --output-dir output_directory

Replace `path/to/release.tar.xz` with the path to your MusicBrainz data file and `output_directory` with your desired output folder.
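
The command line above could be parsed as in this sketch. Only the two documented flags are modeled; treating both as required, and the `parse_args` helper itself, are assumptions rather than the script's actual interface.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for the two documented flags of prepare20M.py."""
    parser = argparse.ArgumentParser(
        description="Process the MusicBrainz release.tar.xz data dump."
    )
    # Assumed required: the README always passes both flags.
    parser.add_argument("--tar-file", required=True, help="path to release.tar.xz")
    parser.add_argument("--output-dir", required=True, help="folder for processed JSON files")
    # argparse maps --tar-file to args.tar_file and --output-dir to args.output_dir.
    return parser.parse_args(argv)
```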

The script produces:

- Processed JSON files containing essential release data.
- A log file, `mb_processing_errors.log`, capturing critical errors.
- A `rejected_records` folder with samples of records that were not processed.
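
One way to keep a bounded sample of rejected records is sketched below: it stores the first `per_reason` raw lines for each rejection reason in a `rejected_records` folder and only counts the rest. The `RejectedSampler` class, its file naming and the default cap of 5 are hypothetical. It uses simple first-N sampling rather than a uniform (reservoir) sample, and it is meant to be called from the parent process so that workers never write to the folder concurrently.

```python
import os
import re
from collections import Counter


class RejectedSampler:
    """Hypothetical helper: keep up to `per_reason` example lines per rejection reason."""

    def __init__(self, out_dir: str = "rejected_records", per_reason: int = 5) -> None:
        self.out_dir = out_dir
        self.per_reason = per_reason
        self.counts = Counter()  # total rejections seen per reason
        os.makedirs(out_dir, exist_ok=True)

    def add(self, reason: str, raw_line: str) -> bool:
        """Count one rejection; write the raw line to disk only while under the cap."""
        self.counts[reason] += 1
        n = self.counts[reason]
        if n > self.per_reason:
            return False
        # Turn the reason into a filesystem-safe filename component.
        slug = re.sub(r"[^A-Za-z0-9]+", "_", reason).strip("_") or "unknown"
        path = os.path.join(self.out_dir, f"{slug}_{n}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(raw_line)
        return True
```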

This project is licensed under the MIT License.