Checking a Git repository for binaries

Jacob Lorenzen
Oct 13, 2017

It is good practice to track your binaries with Git LFS. You can certainly check for untracked binaries as part of your code review process, but a better approach is to let the computer do that work :)

Git LFS uses .gitattributes to keep track of which files to store in LFS. A simple shell script can list all binary files in the repository and compare them with the items that are already tracked.

Sindre Sorhus has made a package, binary-extensions, that lists binary file extensions. At its core it is a simple JSON file that can be downloaded with curl.

curl -s https://raw.githubusercontent.com/sindresorhus/binary-extensions/master/binary-extensions.json | sed 's/[][",[:space:]]//g; /^$/d' > binaries

Since I don’t need the JSON syntax, I use sed to strip out the noise: brackets, quotes, commas and indentation. The final output is saved in a file I have named binaries.
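To see what the filter does without downloading anything, here is a small sketch that runs a portable variant of the sed expression over a made-up sample shaped like binary-extensions.json. It uses the POSIX class [:space:] because \t inside a bracket expression is a GNU sed extension, and it deletes the blank lines left behind by the [ and ] lines. The function name strip_json_list is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads a JSON array of strings on stdin and
# prints one entry per line.
strip_json_list() {
  # Remove "[", "]", double quotes, commas and all whitespace (including
  # tab indentation), then delete the lines that ended up empty.
  sed 's/[][",[:space:]]//g; /^$/d'
}

# Made-up sample in the same tab-indented layout as the real file.
printf '[\n\t"3dm",\n\t"png",\n\t"zip"\n]\n' | strip_json_list
```

Run as is, this prints 3dm, png and zip on separate lines, which is the shape the binaries file is meant to have.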

The next step is to find the binaries that are already tracked. Git LFS uses the following syntax in the .gitattributes file:

*.png filter=lfs diff=lfs merge=lfs -text

In this case it is tracking PNG files. Based on this pattern I can extract the extensions that are already tracked.

I’m using cut to get the first column and sed again to keep only the extensions. The output is saved in a file named tracked. The script reads repo/.gitattributes rather than just .gitattributes; why that path? We will get to that.
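The exact command is not shown here, so the following is only a sketch of the cut and sed step under two assumptions: entries look like the *.png line above with space-separated fields, and only simple *.ext patterns count as tracked extensions. The function name tracked_extensions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the extensions tracked by Git LFS in the
# given .gitattributes file, one per line.
# Assumes lines like: *.png filter=lfs diff=lfs merge=lfs -text
tracked_extensions() {
  # Keep LFS entries only, take the first space-separated column,
  # turn "*.png" into "png" (other patterns are skipped), de-duplicate.
  grep 'filter=lfs' "$1" | cut -d ' ' -f 1 | sed -n 's/^\*\.//p' | sort -u
}
```

In the script this would be used as tracked_extensions repo/.gitattributes > tracked. Patterns such as assets/*.png or *.tar.gz are not reduced to a bare extension by this sketch.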

The last step is to find all file extensions in the repository, keep those that appear in the downloaded list, and exclude those that are already tracked.

I’m using find to get all extensions and sort them.
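Again a sketch rather than the original command: it assumes the extension is whatever follows the last dot in a file name, skips files without a dot, and prunes the .git directory so Git’s own internal files are not counted. repo_extensions is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: prints every file extension found under the
# given directory, sorted and de-duplicated.
repo_extensions() {
  # Prune .git, list regular files whose name contains a dot,
  # keep the text after the last dot, then sort uniquely.
  find "$1" -name .git -prune -o -type f -name '*.*' -print \
    | sed 's/.*\.//' \
    | sort -u
}
```

Because -name only looks at the file name, the last dot in each printed path always belongs to the file name, so dots in directory names do not confuse the sed step. A call could look like repo_extensions repo > extensions, where extensions is a hypothetical file name.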

Finally, I set the exit code to 1 if any binaries were found that are not currently tracked by Git LFS. A non-zero exit code is easy for a computer, such as a CI server, to interpret.
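A sketch of the comparison and the exit code, assuming the three one-per-line lists from the earlier steps (the function name and argument order are hypothetical). grep -Fxf keeps the repository extensions that appear in the binary list, and grep -Fvxf drops the ones that are tracked. It relies on GNU grep treating an empty pattern file as matching nothing, which matters when tracked is empty.

```shell
#!/bin/sh
# Hypothetical helper.
# $1 = repository extensions, $2 = binary extension list, $3 = tracked list
check_untracked() {
  # Binary extensions present in the repository, minus the tracked ones.
  # "|| true" keeps the assignment from failing under "set -e" when
  # nothing is left.
  untracked=$(grep -Fxf "$2" "$1" | grep -Fvxf "$3" || true)
  if [ -n "$untracked" ]; then
    # Report on STDERR and signal failure with a non-zero status.
    echo "Binaries not tracked by Git LFS:" >&2
    echo "$untracked" >&2
    return 1
  fi
  return 0
}
```

The script could end with something like check_untracked extensions binaries tracked; exit $?, so the container exits with 1 when untracked binaries are found. The report goes to STDERR, the stream the docker run command attaches to.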

Running the script as part of CI

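How do I make the script run as part of continuous integration? One way is to create a Docker image with the script inside, then use a volume to map the repository under test into the container and run the script there. This is also the reason for the repo/.gitattributes path: the repository is mounted at /app/repo, next to the script at /app/run.sh.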

The Docker image is simple. The curl command is part of the image build, as it doesn’t make sense to download the list every time the script runs.

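I’m starting the container with the following command and attaching only to STDERR for output. The --rm flag removes the container when it exits, so a later build can reuse the container name binary.

docker run -a STDERR --rm --name binary -v $(Build.Repository.LocalPath):/app/repo jaxwood/binary /app/run.sh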

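Most CI engines can run Docker commands, so this should work across different providers; only the variable holding the checkout path, here the VSTS variable $(Build.Repository.LocalPath), needs to change. Since docker run exits with the container’s exit code, a non-zero exit from the script fails the build step.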

The code could certainly be improved, but it does the job. I have put the full example in my GitHub repository and the Docker image on Docker Hub.

If you find ways to improve it, I will gladly accept pull requests.
