Right now, the repo size is unnecessarily large.
`git filter-repo --analyze`:

| Unpacked | Packed |
| --- | --- |
| 844.83MB | 605.33MB |
Folder sizes:

| .git | Worktree | Combined |
| --- | --- | --- |
| 548MB | 318MB | 866MB |
From what I could find, this is mostly caused by large jpgs/pngs/webps, plus a few large PDF files.
## Solution
`git filter-repo` should be used to clean up history in one of two ways: either strip only old, deleted, unused blobs with a bash script, or run `git filter-repo --strip-blobs-bigger-than 5M` to get rid of everything bigger than 5M, including currently active files/blobs. The second option is riskier, and anything still needed in `public/` should be moved to the CDN first, but the payoff will definitely be worth the size reduction, because I don't think anyone wants to download a ~548MB git repo just to edit a single file.
This will most likely need someone with force-push permissions, and everyone else would need to re-clone the repo afterwards, if I'm correct.
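For context on why simply deleting files (or moving them to the CDN) in a normal commit can't shrink the clone, and why history actually has to be rewritten, here's a self-contained sketch in a throwaway repo; every path and name in it is made up:

```bash
# throwaway repo in a temp dir; nothing here touches the real repo
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo
head -c 1048576 /dev/zero > big.bin        # a 1MB file
git add big.bin && git commit -qm 'add big file'
git rm -q big.bin && git commit -qm 'remove big file'
# the worktree no longer has the file, but the 1MB blob is still
# reachable through history, so every clone keeps paying for it:
git rev-list --objects --all | grep big.bin
```

The blob only disappears once a tool like `git filter-repo` rewrites the commits that reference it, which is exactly why the force push and re-clone are unavoidable.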
I tested this bash script and got `.git` down to 386M by removing only unused files:
```bash
# get every filename ever in history from %(rest)
# (note: paths containing spaces would need extra handling here)
git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  awk '/^blob/ {print $4}' | sort -u > all_files.txt

# get currently used files
git ls-tree -r --name-only main | sort > current_files.txt

# get dead files by comparing all files to current files
comm -23 all_files.txt current_files.txt > deleted_files.txt

# strip dead files from history
git filter-repo --invert-paths --paths-from-file deleted_files.txt
```
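The `comm -23` step is the core of the script: given two sorted lists, it prints the lines unique to the first one, which here is exactly the set of dead files. A toy illustration with made-up filenames:

```bash
# every file ever seen in history
printf 'a.png\nb.pdf\nc.jpg\n' > all_files.txt
# files still present on main
printf 'a.png\nc.jpg\n' > current_files.txt
# lines only in the first list = files that only exist in old history
comm -23 all_files.txt current_files.txt > deleted_files.txt
cat deleted_files.txt   # → b.pdf
```

Both inputs must be sorted (as they are in the script, via `sort -u` and `sort`), or `comm` will produce garbage.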
By just running `git filter-repo --strip-blobs-bigger-than 3M`, I got the `.git` size down to 219M.
By combining the bash script with `--strip-blobs-bigger-than 3M`, I got it down to 159M, which is obviously a lot better than 548MB.
## Largest files
Largest files in history, from running:

```bash
git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  awk '/^blob/ {print $3, $4}' | sort -u | sort -n | \
  awk '{printf "| %.2fMB | %s |\n", $1/1024/1024, $2}' | tail -10
```
| Size | File |
| --- | --- |
| 9.96MB | public/jobs/zephyr-group-pic.jpg |
| 10.25MB | public/winter/2.png |
| 12.28MB | public/hc-cdn/7bf19e299e3e8253096906cef8d599c7aedeed09_image.png |
| 13.05MB | public/fiscal-sponsorship/hcb-gource.gif |
| 16.58MB | public/winter/11.gif |
| 22.57MB | public/home/assemble.jpg |
| 22.96MB | public/train_starry_night.png |
| 29.65MB | public/home/outernet-110.jpg |
| 38.72MB | public/philanthropy/hackclub.pdf |
| 40.43MB | public/onboard/first_and_hack_club.pdf |
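The final `awk` stage of that pipeline does nothing git-specific: it just converts the raw byte count in the first column into the MB table rows above. A standalone check with a made-up 10485760-byte blob:

```bash
# 10485760 bytes = exactly 10MB; the path is a made-up example
echo '10485760 public/example.png' | \
  awk '{printf "| %.2fMB | %s |\n", $1/1024/1024, $2}'
# → | 10.00MB | public/example.png |
```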