This alternative discogsxml2db is written in C# and runs on Microsoft .NET Core,
although the latter is required only for development; builds that require
no installation are provided with each release.
It provides a significant speedup over the python version:
| File | Record Count | Python | C# |
|---|---|---|---|
| discogs_20200806_artists.xml.gz | 7,046,615 | 6:22 | 2:35 |
| discogs_20200806_labels.xml.gz | 1,571,873 | 1:15 | 0:22 |
| discogs_20200806_masters.xml.gz | 1,734,371 | 3:56 | 1:57 |
| discogs_20200806_releases.xml.gz | 12,867,980 | 1:45:16 | 42:38 |
Done:
- parsing all four discogs dumps, both .xml and .xml.gz;
- exporting to csv and compressed csv. Produces the exact same files that the Python version does;
- displaying progress of import/export process;
- "dry runs": only parsing the files and displaying counts, not producing any csv files;
TODO:
- option to track progress display against the most recently reported
discogs record counts (
--api-countsargument); - option to import the resulting csv files into the database; this process is currently manual or done through the python DB-specific scripts;
- option to specify the output folder for csv files;
Unlike the Python version, this version requires no installation.
Simply download from the release page
the archive appropriate for your platform. Unzip,
and you should have 2 files: a discogs executable (or discogs.exe on
Windows) and a "discogs.pdb" support file.
That's it.
Executing discogs without any parameters or passing --help will
output a list of available arguments:
Usage: discogs [options] [files...]
Options:
--dry-run Parse the files, output counts, but don't write any actual files
--verbose More verbose output
--gz Compress output files (gzip)
files... Path to discogs_[date]_[type].xml, or .xml.gz files.
Can specify multiple files.
To export one or more discogs xml files to csv, simply pass it as parameters
to the executable: discogs /tmp/discogs_20200806_artists.xml.gz /tmp/discogs_20200806_labels.xml.gz.
Currently, the program exports the csv files in the same folder as each of the
original xml files. If you would like the csv files to be compressed to .csv.gz,
pass the --gz argument: discogs --gz /tmp/discogs_20200806_artists.xml.gz.