How to prune a git repo of large files
Problem
A git repo contains only a handful of tiny files, yet its size is over 1 GB.
Diagnosis
- People have not been using .gitignore, and have not been checking what they stage before committing.
- The repo has large files in its .git history.
- Even after a file is deleted and the deletion is committed, its contents stay in the .git folder, because earlier commits still reference them.
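To confirm this diagnosis, it helps to list the biggest blobs anywhere in history. The following is a minimal sketch that builds a throwaway demo repo (the file names big.bin and small.txt are made up for illustration) and shows that a deleted file is still reported. On a real repo, only the final pipeline is needed.

```shell
set -e
# Throwaway demo repo so the commands can be tried safely.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a 2 MiB file, then delete it in a second commit.
head -c 2097152 /dev/urandom > big.bin
echo "hello" > small.txt
git add big.bin small.txt
git commit -q -m "Add files"
git rm -q big.bin
git commit -q -m "Remove big.bin"

# List blobs in all of history as "size path", largest first.
# big.bin is gone from the working tree but still shows up here.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $3, $4 }' \
  | sort -rn \
  | head -n 10)
echo "$largest"
```

The awk step assumes paths without spaces, which is good enough for a quick look.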
Solution
1) Understand .gitignore
- Read the documentation here:
https://git-scm.com/docs/gitignore
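As a rough illustration of what a .gitignore does, the sketch below ignores installer-style binaries and then asks git which rule matched. The patterns and file names are assumptions chosen for the demo, not a recommended list. Adjust them to whatever your project produces.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Hypothetical patterns for binaries that should never be committed.
cat > .gitignore <<'EOF'
*.msi
*.rpm
*.deb
*.dmg
*.zip
EOF

touch installer.msi notes.txt

# check-ignore -v prints the file, line and pattern that matched.
matched=$(git check-ignore -v installer.msi)
echo "$matched"

# Only non-ignored files are offered for staging.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```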
2) Understand git add
- DO NOT stage files in bulk. "git commit -am" stages and commits every modified tracked file without review, and "git add ." or "git add -A" also pick up untracked files. This is bad practice, as it results in unwanted files being committed to git.
- It's better to run "git status" first, then "git add <filename>" for each file you actually want.
- Then run "git status" again, then 'git commit -m "Useful message here"', then "git push".
Read more here:
https://git-scm.com/docs/git-add
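The workflow above can be sketched as follows, in a throwaway repo. The file names app.py and dump.bin are hypothetical. dump.bin stands in for a stray large file that should not be committed.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "print('hi')" > app.py
head -c 1048576 /dev/urandom > dump.bin   # stray 1 MiB file we do not want

git status --short      # review what changed first
git add app.py          # stage only the file we mean to commit
git status --short      # confirm: app.py staged, dump.bin still untracked
git commit -q -m "Add app entry point"

# Only the file that was staged explicitly ends up in the commit.
committed=$(git ls-tree -r --name-only HEAD)
echo "$committed"
```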
3) Prune the repo
- Prune the large files from history using BFG Repo-Cleaner, a Java tool:
https://rtyley.github.io/bfg-repo-cleaner/
Important:
By default the BFG doesn’t modify the contents of your latest commit on your master (or ‘HEAD’) branch, even though it will clean all the commits before it.
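Because of this protection, a large file that is still present in the latest commit will survive the cleanup. One way to handle that, sketched here in a throwaway repo, is to stop tracking the file in a normal commit before running the BFG. The file name package.zip is used purely as an example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

head -c 1048576 /dev/urandom > package.zip
echo "readme" > README.txt
git add package.zip README.txt
git commit -q -m "Initial commit"

# Untrack the file but keep it on disk, and ignore it from now on.
git rm -q --cached package.zip
echo "package.zip" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking package.zip"

# The latest commit is now clean...
head_files=$(git ls-tree -r --name-only HEAD)
echo "$head_files"

# ...but the blob is still in history, which is what the BFG removes.
still_in_history=$(git rev-list --objects --all | grep -c 'package.zip')
echo "$still_in_history"
```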
Step 1
- From WSL, clone the full, bare repo (1.45 GiB!) with --mirror (basically just the .git folder, with no checked-out files).
ak@sys:/mnt/c/Users/ak
$ git clone --mirror <git-repo-url>
Cloning into bare repository 'repo.git'...
remote: Counting objects: 173, done.
Receiving objects: 100% (173/173), 1.45 GiB | 1.11 MiB/s, done.
Resolving deltas: 100% (102/102), done.
Checking connectivity... done.
Step 2
- From PowerShell, because I don't have Java installed on WSL at the moment.
cd C:\Users\ak
java -jar C:\apps\bfg\bfg-1.13.0.jar --strip-blobs-bigger-than 1M repo.git
# Deleted files
# ------------------------------------------------------------------------------
# AWS Schema Conversion Tool-1.0.628.msi | 6c8977ee (287.6 MB)
# aws-cassandra-extractor-1.0.628-1.x86_64.rpm | 28cf6ab1 (119.9 MB)
# aws-cassandra-extractor-1.0.628.deb | 27db22dd (116.0 MB)
# aws-schema-conversion-tool-dms-agent-3.3.0-R1.x86_64.rpm | 2538ad16 (102.2 MB)
# aws-schema-conversion-tool-extractor-1.0.628-1.x86_64.rpm | 2136707b (29.9 MB)
# aws-schema-conversion-tool-extractor-1.0.628.deb | ace85e24 (29.8 MB)
# aws-schema-conversion-tool-extractor-1.0.628.dmg | 4fd92a15 (32.3 MB)
# aws-schema-conversion-tool-extractor-1.0.628.msi | 8cfa88fc (32.7 MB)
# package.zip
Step 3
- Back to WSL to finish the pruning task and push it back up to the server.
ak@sys:/mnt/c/Users/ak/repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 173, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (164/164), done.
Writing objects: 100% (173/173), done.
Total 173 (delta 102), reused 16 (delta 0)
ak@sys:/mnt/c/Users/ak/repo.git
$ git push
Counting objects: 159, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (56/56), done.
Writing objects: 100% (159/159), 123.02 MiB | 1.14 MiB/s, done.
Total 159 (delta 95), reused 158 (delta 94)
remote: processing ......To <git-repo-url>
+ 6501d49...93a33d9 master -> master (forced update)
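The reflog expiry in this step matters because git gc treats reflog entries as still in use. The sketch below shows this in a throwaway repo: a commit on a deleted branch stands in for the history the BFG rewrote, and the names tmp, keep.txt and big.bin are made up for the demo.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "keep" > keep.txt
git add keep.txt
git commit -q -m "Base commit"

# Commit a 2 MiB file on a temporary branch, then delete that branch.
git checkout -q -b tmp
head -c 2097152 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big.bin"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D tmp

# No branch references the blob, but HEAD's reflog still does,
# so a plain gc keeps it.
git gc -q --prune=now
before=$(git cat-file -t "$blob")

# Expire the reflog first, and gc can finally drop the blob.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi

echo "before: $before, after: $after"
git count-objects -vH    # human-readable size of the object store
```

Running "git count-objects -vH" before and after the cleanup on the mirror clone is a quick way to check how much space was reclaimed.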
Step 4
- Back up and remove your old local clone, clone the repo again, and check its size.
ak@sys:/mnt/c/Users/ak/repos
$ git clone <repo-url> demo-repo
Cloning into 'demo-repo'...
remote: Counting objects: 173, done.
Receiving objects: 100% (173/173), 31.64 KiB | 0 bytes/s, done.
Resolving deltas: 100% (99/99), done.
Checking connectivity... done.
ak@sys:/mnt/c/Users/ak/repos
$ cd demo-repo
ak@sys:/mnt/c/Users/ak/repos/demo-repo
$ du -sh
112K .
Conclusion
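The BFG is a very handy tool, but it rewrites history, so you must coordinate with the other developers on your team while doing this: anyone who keeps working from an old clone can push the large files straight back.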
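Re-cloning is the simplest route for teammates. As an alternative, and only for a clone with no unpushed work, a teammate could fetch the rewritten history and hard-reset onto it. The sketch below simulates this with a local bare repo. The names server.git, alice and bob are hypothetical, and an amended commit stands in for the BFG rewrite.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git

# Alice creates the first commit and pushes it.
git clone -q server.git alice 2>/dev/null
cd alice
git config user.email "alice@example.com"
git config user.name "Alice"
echo "one" > file.txt
git add file.txt
git commit -q -m "First"
git push -q origin HEAD
cd ..

# Bob clones the original history.
git clone -q server.git bob

# Alice rewrites history (stand-in for the BFG run) and force-pushes.
cd alice
git commit -q --amend -m "First (rewritten)"
git push -q --force origin HEAD
rewritten=$(git rev-parse HEAD)
cd ../bob

# Bob discards his old history and moves onto the rewritten branch.
# WARNING: reset --hard throws away local commits and changes.
branch=$(git rev-parse --abbrev-ref HEAD)
git fetch -q origin
git reset -q --hard "origin/$branch"
bob_head=$(git rev-parse HEAD)
echo "$bob_head"
```

A teammate who merges or pushes from the old history instead would reintroduce the pruned objects, which is why everyone needs to switch at the same time.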