PROGRAMMING JOURNEY

Making a self-committing micro Git

Christian Clausen
Published in Webtips
4 min read · Jul 20, 2020

In this post, I describe the steps to implement a minimal Git, only complete enough to commit itself.

My programming journeys are intended to be guided tours. Programming along takes anywhere from a few hours to a day, with ample opportunities to wander off on side quests. We visit the sites in the order in which one might implement them, so the optimal architecture is left for the reader to discover through reflection.

What is fun about implementing Git is that it cares only about what is on our disk. No internet required, no fancy caching or background processes, just plain old files.

If you would like company on this journey, I streamed my implementation live. You can watch the recording here: https://www.youtube.com/watch?v=AuaNhJE6XfU

git init

Let’s start with the easy part: git init. It really only sets up a few folders and the master branch. This is your chance to change what the initial branch is called, and you can call it anything a file can be called. To mimic the real Git, I call mine master.

  • Create folders .git/objects and .git/refs/heads.
  • Create file .git/HEAD with the content ref: refs/heads/master.
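
The two steps above can be sketched in Java roughly like this. The class and method names (MicroGitInit, init) are my own placeholders, and I am assuming a trailing newline after the HEAD content, which is what real Git writes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; the names are illustrative, not part of any real Git API. */
final class MicroGitInit {
    /** Creates the minimal .git layout inside root and returns the .git folder. */
    static Path init(Path root) {
        Path gitDir = root.resolve(".git");
        try {
            // createDirectories also creates .git itself and any missing parents.
            Files.createDirectories(gitDir.resolve("objects"));
            Files.createDirectories(gitDir.resolve("refs").resolve("heads"));
            // HEAD names the branch that the next commit will update.
            Files.write(gitDir.resolve("HEAD"),
                    "ref: refs/heads/master\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return gitDir;
    }
}
```

Calling MicroGitInit.init(Paths.get(".")) from your own main method would initialize the current folder.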

Boom! That’s it, you just implemented a valid git init. You can even confirm that by running:

> git status
ERROR
> [your git] init
> git status
Great SUCCESS!

Sidequest (easy): Error if we are already in a git repo.

git commit

Git stores pretty much everything in the tree (I find that reading this with Morgan Freeman’s voice helps underline how important the tree is; I warmly recommend it). The tree is the central storage. This is why we created the objects folder. If at any point you get lost or feel like the exercises are not clear, have a regular Git repo open and run git cat-file -p [hex] to look directly inside the tree.

Because we want this to work with regular Git we need to be careful that the bytes are identical; otherwise, the hashes are going to be incompatible.
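
When the bytes do not match, it can help to look at what a real Git repo wrote. The files under .git/objects are compressed in the zlib format that Java's InflaterInputStream reads, so a small helper like the one below can show the raw bytes. This is only a debugging sketch; MicroGitInspect, inflate, and header are names I made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.InflaterInputStream;

/** Hypothetical debugging helper for peeking inside stored objects. */
final class MicroGitInspect {
    /** Decompresses the bytes of one file from .git/objects. */
    static byte[] inflate(byte[] compressed) {
        try (InflaterInputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the text before the first NUL byte, for example "blob 6". */
    static String header(byte[] inflated) {
        int nul = 0;
        while (nul < inflated.length && inflated[nul] != 0) {
            nul++;
        }
        return new String(inflated, 0, nul, StandardCharsets.US_ASCII);
    }
}
```

Feeding it Files.readAllBytes of an object file from a regular repo and then of the same object from your own implementation makes byte-level differences easy to spot.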

For each file in the directory:

  • Get the content (as bytes) and prepend blob [content length]\0 (as bytes).
  • We need to do two things with this:
  1. Compress it using zlib (in Java you can use DeflaterOutputStream, which writes the zlib format by default; GZIPOutputStream would produce incompatible bytes).
  2. Hash it using SHA-1 (in Java you can use MessageDigest).
  • Convert the hash to a hex string.
  • Split the hex string into the first two characters and the rest.
  • Now store the compressed bytes in this file:
.git/objects/[first two characters]/[the rest]
  • Keep track of the hash (as bytes); we are not done with it yet.
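
Put together, the steps above might look like the following sketch. MicroGitObjects and its method names are placeholders of mine. It takes the type as a parameter because the same steps are reused later with other prefixes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical helper; the names are illustrative only. */
final class MicroGitObjects {
    /** Prepends "[type] [content length]\0" to the content. */
    static byte[] frame(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] framed = new byte[header.length + content.length];
        System.arraycopy(header, 0, framed, 0, header.length);
        System.arraycopy(content, 0, framed, header.length, content.length);
        return framed;
    }

    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available", e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Stores one object under gitDir/objects and returns its raw 20-byte hash. */
    static byte[] store(Path gitDir, String type, byte[] content) {
        byte[] framed = frame(type, content);
        // The hash is taken over the uncompressed header plus content.
        byte[] hash = sha1(framed);
        String name = hex(hash);
        Path file = gitDir.resolve("objects")
                .resolve(name.substring(0, 2))
                .resolve(name.substring(2));
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(compressed)) {
                out.write(framed);
            }
            Files.createDirectories(file.getParent());
            Files.write(file, compressed.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hash;
    }
}
```

A quick sanity check: an empty file should hash to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same value git hash-object reports for an empty file.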

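Having stored the files in the tree, we also need to keep track of which files are in the commit. We do this by (virtually) making a file with an entry like this for each file in the commit, where [hash] is the 20 raw hash bytes rather than the hex string. The entries are written back to back, with no newline between them, sorted by filename: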

100644 [filename]\0[hash]

This virtual file is called a tree object (not to be confused with the tree, our name for the whole object store), and of course, we store it as an actual file in… the tree, following the same steps as with real files, except replacing blob with tree.
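
Here is a rough sketch of building that virtual file in memory. MicroGitTree and treeContent are hypothetical names. Real Git writes the entries back to back without newlines and keeps them sorted by name, which the sketch approximates with a TreeMap (this matches Git's byte order for plain ASCII filenames):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; the names are illustrative only. */
final class MicroGitTree {
    /** Builds the tree content from a map of filename to raw 20-byte blob hash. */
    static byte[] treeContent(Map<String, byte[]> hashesByFilename) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Copying into a TreeMap gives us the entries in sorted filename order.
        for (Map.Entry<String, byte[]> entry : new TreeMap<>(hashesByFilename).entrySet()) {
            byte[] prefix = ("100644 " + entry.getKey() + "\0")
                    .getBytes(StandardCharsets.UTF_8);
            out.write(prefix, 0, prefix.length);
            // The hash goes in as raw bytes, not as a hex string.
            byte[] hash = entry.getValue();
            out.write(hash, 0, hash.length);
        }
        return out.toByteArray();
    }
}
```

The returned bytes are only the content; they still need the tree [content length]\0 prefix, the hash, and the compression, exactly like a blob.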

Sidequest (hard): Folders are stored as trees as well, so by doing everything we just did, recursively, we get a git that can store folders as well.

We also need to store the commit metadata: the commit message, the author, and commit time. In the name of minimalism, we fake most of these, but make another virtual file with this content:

tree [hex string of tree hash]
author A <a@a.a> 1
committer A <a@a.a> 1

Now comes the big surprise: this file should be stored in the tree! This time with commit instead of blob.
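
As a sketch, the commit content could be assembled like this. MicroGitCommit and commitContent are names I invented, and the optional parent and message parameters are my assumption about how the sidequests below might fit in (pass null to leave them out). Note that real Git also writes a time zone offset such as +0000 after the timestamp; the sketch keeps the minimal form shown above:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the names are illustrative only. */
final class MicroGitCommit {
    /** Builds the commit content; parentHex and message may be null. */
    static byte[] commitContent(String treeHex, String parentHex, String message) {
        StringBuilder text = new StringBuilder();
        text.append("tree ").append(treeHex).append('\n');
        if (parentHex != null) {
            // Real Git writes the parent line directly after the tree line.
            text.append("parent ").append(parentHex).append('\n');
        }
        // Faked author, committer, and timestamp, as in the article.
        text.append("author A <a@a.a> 1\n");
        text.append("committer A <a@a.a> 1\n");
        if (message != null) {
            // A blank line separates the headers from the message.
            text.append('\n').append(message).append('\n');
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The result is content only. Storing it works the same way as for blobs and trees, with commit as the type in the prefix.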

Sidequest (easy): Add parent commit by adding a line parent [hex string of commit hash].

Sidequest (easy): Put the correct timestamp.

Sidequest (easy): Add commit message by adding a blank line and then the message.

Sidequest (hard): Add configurable author and committer.

Finally, create the file .git/refs/heads/master containing the hex string of the commit hash.
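
One possible sketch of this last step follows. Instead of hard-coding master, it reads .git/HEAD to find the branch file, which is my own choice so that it still works if you renamed the initial branch. MicroGitRefs and updateCurrentBranch are hypothetical names, and the trailing newline is an assumption that matches what real Git writes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; the names are illustrative only. */
final class MicroGitRefs {
    /** Writes commitHex into the branch file that HEAD refers to and returns that file. */
    static Path updateCurrentBranch(Path gitDir, String commitHex) {
        try {
            String head = new String(Files.readAllBytes(gitDir.resolve("HEAD")),
                    StandardCharsets.UTF_8).trim();
            if (!head.startsWith("ref: ")) {
                throw new IllegalStateException("HEAD does not point at a branch");
            }
            // For "ref: refs/heads/master" this resolves to .git/refs/heads/master.
            Path ref = gitDir.resolve(head.substring("ref: ".length()));
            Files.createDirectories(ref.getParent());
            Files.write(ref, (commitHex + "\n").getBytes(StandardCharsets.UTF_8));
            return ref;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```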

Sidequest (easy): This also hints at how to do branches: duplicate the file that .git/HEAD refers to under the new branch name, then update .git/HEAD to point to the new file.

Sidequest (very hard): Branches are more fun if we can do git checkout.

Sidequest: Implement git add instead of looping through all files.

Sidequest: Add support for .gitignore.

Congratulations

You have just implemented enough Git to commit itself. You can test it using:

> [your git] init
> git status
> [your git] commit
> git status
> git show
> git log

If you got this far you must be some kind of coding wizard, and as such, I think you should checkout (pun intended) some of the magic in my refactoring book:
