Just reporting some statistics here: the current method for storing MadGraph files for later running duplicates a lot of code. MG5 is bad, but MG4 is even worse, with some files copied over 40 times. The tarball below has full listings of all the files with their md5sums, as well as a sorted list of unique md5sums, each with a corresponding example file. This does not catch duplicated code within files, but it is a start.
mg-unique-file-listing.tar.gz
How
calculate md5sum of each file [1]
cd generators/madgraphN
fd -tf -x md5sum | sort > md5sum.list
get unique files sorted by number of copies
uniq -c -w 32 md5sum.list | sort -nr > uniq.list
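To turn the sorted list into a single headline number, something like the following could be run on uniq.list. This is only a sketch: the function name `summarize_uniq_list` is made up for illustration, and it assumes each line of uniq.list has the form "count, md5sum, example path" as produced by the `uniq -c` step above.

```shell
# Hypothetical helper (not part of the original steps): summarize a uniq.list.
# Assumes each line looks like "<count> <md5sum>  <example path>".
summarize_uniq_list() {
    awk '
        # $1 is the number of copies reported by uniq -c for this md5sum
        { total += $1; unique += 1 }
        END {
            # every copy beyond the first of each md5sum is redundant
            printf "files=%d unique=%d redundant=%d\n", total, unique, total - unique
        }
    ' "$1"
}
```

Calling `summarize_uniq_list uniq.list` prints the total number of files, the number of distinct md5sums, and how many files are redundant copies.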
Footnotes
1. Using fd instead of find here since it's faster. The find equivalent is: find -type f -exec md5sum {} ';' | sort > md5sum.list