Published in Cloud Techies

How to prune a git repo of large files

Problem

A git repo contains only a handful of tiny files, yet its size is over 1 GB. Why?

Diagnosis

  • Contributors have not been using .gitignore, and have not been careful about which files they commit to git.
  • The repo has large files in its .git history.
  • Even AFTER the files are deleted in a later commit, they are still stored in the repo's history inside the .git folder, so every clone still downloads them.
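To confirm this diagnosis, you can list the biggest blobs anywhere in a repo's history. The sketch below builds a throwaway demo repo (the file name big-installer.msi and its 2 MB size are hypothetical), commits a large file, deletes it, and shows that its blob is still stored. It uses the common git rev-list / git cat-file --batch-check recipe; in a real repo you would run only the final pipeline.

```shell
#!/usr/bin/env bash
# Demo (hypothetical repo and file names): a deleted file still lives in git history.
set -eu

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"

# Commit a ~2 MB file by mistake, then delete it in a later commit.
head -c 2097152 /dev/urandom > big-installer.msi
git add big-installer.msi
git commit -q -m "Accidentally add installer"
git rm -q big-installer.msi
git commit -q -m "Remove installer"

# The file is gone from the working tree...
if [ ! -e big-installer.msi ]; then
  echo "big-installer.msi is not in the working tree"
fi

# ...but its blob is still reachable from history.
# List every blob in all refs, largest first: "blob <sha> <size-in-bytes> <path>".
LARGEST=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k3,3 -n -r \
  | head -n 10)
echo "$LARGEST"
```

Sorting on the third field keeps the full line intact, so paths containing spaces (like the "AWS Schema Conversion Tool" installer in this repo) are still printed correctly.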

Solution

1) Understand gitignore

  • Read the gitignore documentation here:
https://git-scm.com/docs/gitignore
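As an illustration, a .gitignore like the one below would have kept the kinds of installer and package files found in this repo out of git. The patterns are an assumption based on the file types the BFG removes later in this post; adjust them for your own project. The demo uses git check-ignore to verify which files are ignored.

```shell
#!/usr/bin/env bash
# Demo: ignore installer/package artifacts with .gitignore (patterns are an assumption).
set -eu

REPO=$(mktemp -d)
cd "$REPO"
git init -q

# One glob per line; each matches files with that extension in any directory.
cat > .gitignore <<'EOF'
*.msi
*.rpm
*.deb
*.dmg
*.zip
EOF

# A source file plus two artifacts we do not want in git.
touch notes.md "AWS Schema Conversion Tool-1.0.628.msi" package.zip

# Only .gitignore and notes.md show up as untracked (??); the artifacts are hidden.
STATUS=$(git status --porcelain)
echo "$STATUS"

# Show which .gitignore rule matches each ignored file.
git check-ignore -v package.zip "AWS Schema Conversion Tool-1.0.628.msi"
```

Note that .gitignore only affects files git is not already tracking: a file committed before the rule was added stays tracked until you run git rm --cached <file>.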

2) Understand git add

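  • DO NOT use catch-all commands such as “git add .” or “git commit -am”. “git add .” stages every new and modified file in the directory, and “git commit -a” stages and commits every change to tracked files without you reviewing them. This is bad practice, as it results in unwanted files being committed to git.
  • Instead, run git status first, then stage only the files you mean to commit with “git add <filename>”.
  • Run git status again to check what is staged, then commit with git commit -m "Useful message here", and finally git push.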

Read more here:

https://git-scm.com/docs/git-add
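The review-then-stage routine can be sketched in a throwaway repo. The file names are hypothetical, and the final git push is commented out because the demo repo has no remote.

```shell
#!/usr/bin/env bash
# Demo: stage files explicitly instead of committing everything (hypothetical files).
set -eu

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"

echo "# Project notes" > notes.md
head -c 1048576 /dev/zero > package.zip    # a 1 MB artifact that should NOT be committed

# 1. Review: both files are listed as untracked (??).
git status --short

# 2. Stage only the file you actually want to commit.
git add notes.md

# 3. Review again: notes.md is staged (A), package.zip is still untracked.
git status --short

# 4. Commit with a useful message.
git commit -q -m "Add project notes"

# 5. Push (commented out: this demo repo has no remote).
# git push

TRACKED=$(git ls-files)
echo "Tracked files: $TRACKED"
```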

3) Prune the repo

  • Prune the large files from the repo's history using the BFG Repo-Cleaner, a Java tool:
https://rtyley.github.io/bfg-repo-cleaner/

Important:

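By default the BFG doesn’t modify the contents of your latest commit on your master (or ‘HEAD’) branch, even though it will clean all the commits before it. Any large file that is still in your latest commit therefore survives the cleanup, so delete it in a normal commit (and push that commit) before running the BFG.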
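A minimal sketch of cleaning the latest commit before running the BFG, in a throwaway repo with a hypothetical package.zip that you want to keep on disk but stop tracking:

```shell
#!/usr/bin/env bash
# Demo: remove a large file from the latest commit so the BFG can clean all history.
set -eu

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"

echo "readme" > README.md
head -c 2097152 /dev/zero > package.zip    # hypothetical 2 MB artifact
git add README.md package.zip
git commit -q -m "Initial commit (includes a large file by mistake)"

# Untrack the file but keep the local copy, and ignore it from now on.
git rm -q --cached package.zip
echo "package.zip" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking package.zip"

# git push   # push this cleanup commit before making the --mirror clone for the BFG

# HEAD is now clean; only older commits still contain package.zip.
echo "Files in HEAD: $(git ls-tree --name-only HEAD | tr '\n' ' ')"
```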

Step 1

  • From WSL, clone a full bare copy of the repo (1.45 GiB!!!) with --mirror. This is basically just the .git folder, with no checked-out files.
ak@sys:/mnt/c/Users/ak
$ git clone --mirror <git-repo-url>
Cloning into bare repository 'repo.git'...
remote: Counting objects: 173, done.
Receiving objects: 100% (173/173), 1.45 GiB | 1.11 MiB/s, done.
Resolving deltas: 100% (102/102), done.
Checking connectivity... done.
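A --mirror clone is a bare repository configured to copy every ref, which is why a plain git push from it later updates all branches and tags on the server. The sketch below mirrors a hypothetical local repo instead of the real server URL, checks that the copy is bare and in mirror mode, and prints its object-store size with git count-objects so you can compare before and after the cleanup.

```shell
#!/usr/bin/env bash
# Demo: inspect a --mirror clone (a local repo stands in for <git-repo-url>).
set -eu

WORK=$(mktemp -d)
SRC="$WORK/source"

# Hypothetical source repo with one commit and one extra branch.
git init -q "$SRC"
git -C "$SRC" config user.email "demo@example.com"   # throwaway identity
git -C "$SRC" config user.name "Demo"
echo "hello" > "$SRC/README.md"
git -C "$SRC" add README.md
git -C "$SRC" commit -q -m "Initial commit"
git -C "$SRC" branch feature

# Same command as Step 1, pointed at the local repo.
git clone -q --mirror "$SRC" "$WORK/repo.git"

cd "$WORK/repo.git"
IS_BARE=$(git rev-parse --is-bare-repository)     # "true": no working tree
MIRROR=$(git config --get remote.origin.mirror)   # "true": push mirrors all refs
echo "bare: $IS_BARE, mirror: $MIRROR"

# Human-readable size of the object store.
git count-objects -vH
```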

Step 2

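  • Run the BFG from PowerShell, because I don’t have Java installed in WSL at the moment. --strip-blobs-bigger-than 1M removes every file larger than 1 MB from the history of repo.git, apart from the files protected in your latest commit.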
cd C:\Users\ak
java -jar C:\apps\bfg\bfg-1.13.0.jar --strip-blobs-bigger-than 1M repo.git

# Deleted files
# ------------------------------------------------------------------------------
# AWS Schema Conversion Tool-1.0.628.msi | 6c8977ee (287.6 MB)
# aws-cassandra-extractor-1.0.628-1.x86_64.rpm | 28cf6ab1 (119.9 MB)
# aws-cassandra-extractor-1.0.628.deb | 27db22dd (116.0 MB)
# aws-schema-conversion-tool-dms-agent-3.3.0-R1.x86_64.rpm | 2538ad16 (102.2 MB)
# aws-schema-conversion-tool-extractor-1.0.628-1.x86_64.rpm | 2136707b (29.9 MB)
# aws-schema-conversion-tool-extractor-1.0.628.deb | ace85e24 (29.8 MB)
# aws-schema-conversion-tool-extractor-1.0.628.dmg | 4fd92a15 (32.3 MB)
# aws-schema-conversion-tool-extractor-1.0.628.msi | 8cfa88fc (32.7 MB)
# package.zip

Step 3

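  • Back in WSL, expire the reflog and run an aggressive garbage collection so the objects the BFG stripped are actually deleted from disk, then push the cleaned history back up to the server.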
ak@sys:/mnt/c/Users/ak/repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 173, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (164/164), done.
Writing objects: 100% (173/173), done.
Total 173 (delta 102), reused 16 (delta 0)

ak@sys:/mnt/c/Users/ak/repo.git
$ git push
Counting objects: 159, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (56/56), done.
Writing objects: 100% (159/159), 123.02 MiB | 1.14 MiB/s, done.
Total 159 (delta 95), reused 158 (delta 94)
remote: processing ......
To <git-repo-url>
+ 6501d49...93a33d9 master -> master (forced update)
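The sketch below demonstrates the effect of the Step 3 commands in a throwaway repo: a hypothetical big.bin blob is made unreachable by deleting a temporary branch (standing in for the BFG's history rewrite), stays on disk, and disappears only after the reflog is expired and gc prunes it.

```shell
#!/usr/bin/env bash
# Demo: unreachable blobs stay on disk until the reflog is expired and gc prunes them.
set -eu

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"
echo "readme" > README.md
git add README.md
git commit -q -m "Initial commit"
MAIN=$(git symbolic-ref --short HEAD)       # "master" or "main", depending on git config

# Commit a hypothetical large blob on a temporary branch, then delete the branch.
git checkout -q -b tmp
head -c 2097152 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big.bin"
BLOB=$(git rev-parse HEAD:big.bin)
git checkout -q "$MAIN"
git branch -q -D tmp

blob_state() { if git cat-file -e "$BLOB" 2>/dev/null; then echo present; else echo gone; fi; }

# No branch points at big.bin any more, yet the object (and a HEAD reflog entry) remain.
BEFORE=$(blob_state)

# The same cleanup as Step 3.
git reflog expire --expire=now --all && git gc --prune=now --aggressive --quiet
AFTER=$(blob_state)

echo "before gc: $BEFORE, after gc: $AFTER"
```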

Step 4

  • Back up and remove your old local clone, clone the repo again, and check its size.
ak@sys:/mnt/c/Users/ak/repos
$ git clone <repo-url> demo-repo
Cloning into 'demo-repo'...
remote: Counting objects: 173, done.
Receiving objects: 100% (173/173), 31.64 KiB | 0 bytes/s, done.
Resolving deltas: 100% (99/99), done.
Checking connectivity... done.

ak@sys:/mnt/c/Users/ak/repos
$ cd demo-repo

ak@sys:/mnt/c/Users/ak/repos/demo-repo
$ du -sh
112K .

Conclusion

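The BFG is a very handy tool, but it rewrites history, so commit IDs change. You must coordinate with the other developers on your team while doing this: anyone with an old clone should delete it and clone again, because pushing from it could bring the large files back.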
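If a teammate cannot simply re-clone, one option is to fetch and hard-reset their branch to the server's rewritten version, discarding local commits that still carry the old blobs. The sketch below simulates this locally: a bare repo stands in for the server, a hypothetical teammate clone holds the old history, and an amended commit pushed with --force stands in for the BFG rewrite. All paths and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Demo: a teammate syncing to force-pushed (rewritten) history. All names are hypothetical.
set -eu

WORK=$(mktemp -d)
git init -q --bare "$WORK/server.git"          # stands in for <git-repo-url>

# You: make an initial commit that contains a large file, and push it.
git clone -q "$WORK/server.git" "$WORK/you" 2>/dev/null
cd "$WORK/you"
git config user.email "you@example.com"; git config user.name "You"
head -c 2097152 /dev/zero > package.zip
git add package.zip
git commit -q -m "Initial commit with a large file"
BRANCH=$(git symbolic-ref --short HEAD)
git push -q origin "$BRANCH"

# Teammate: clones the old (dirty) history.
git clone -q "$WORK/server.git" "$WORK/teammate"
OLD=$(git -C "$WORK/teammate" rev-parse HEAD)

# You: rewrite history (amend stands in for the BFG) and force-push.
git rm -q --cached package.zip
echo "readme" > README.md
git add README.md
git commit -q --amend -m "Initial commit without the large file"
git push -q --force origin "$BRANCH"
NEW=$(git rev-parse HEAD)

# Teammate: fetch the rewritten branch and hard-reset to it (discards local changes!).
cd "$WORK/teammate"
git fetch -q origin
git reset -q --hard "origin/$BRANCH"
echo "old: $OLD, new: $NEW, teammate: $(git rev-parse HEAD)"
```

The old blobs still sit in the teammate's .git until their reflog expires and gc runs, which is one reason a fresh clone, as in Step 4, is the simpler choice.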

Arun Kumar

Cloud Architect | AWS, GCP, Azure, Python, Kubernetes, Terraform, Ansible