PROGRAMMING JOURNEY
Making a self-committing micro Git
In this post, I describe the steps to implement a minimal Git, only complete enough to commit itself.
My programming journeys are intended to be guided tours. Programming along takes anywhere from a few hours to a day, with ample opportunities for going off and doing side quests. We visit the sites in the order in which one might implement them, so the optimal architecture is left for the reader to discover through reflection.
What is fun about implementing Git is that it cares only about what is on our disk. No internet required, no fancy caching or background processes, just plain old files.
If you would like company on this journey, I streamed my implementation live; you can watch the recording here: https://www.youtube.com/watch?v=AuaNhJE6XfU
git init
Let’s start with the easy part: git init. Git init really only sets up a few folders and the master branch. This is your chance to change what the initial branch is called, and you can call it anything a file can be called. To mimic the real Git, I call mine master.
- Create folders .git/objects and .git/refs/heads.
- Create file .git/HEAD with the content ref: refs/heads/master.
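The two steps above could be sketched in Java roughly like this. The class name MicroGitInit and the workDir parameter are my own placeholders, and the trailing newline in HEAD is an assumption made to mirror what real Git writes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MicroGitInit {
    // Creates the minimal .git layout inside workDir (hypothetical helper).
    static void init(Path workDir) throws IOException {
        Path gitDir = workDir.resolve(".git");
        // createDirectories also creates missing parents, so .git itself appears here.
        Files.createDirectories(gitDir.resolve("objects"));
        Files.createDirectories(gitDir.resolve("refs").resolve("heads"));
        // HEAD names the initial branch; the trailing newline is an assumption.
        Files.write(gitDir.resolve("HEAD"),
                "ref: refs/heads/master\n".getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling MicroGitInit.init(Paths.get(".")) would set this up in the current directory.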
Boom! That’s it, you just implemented a valid git init. You can even confirm that by running:
> git status
ERROR
> [your git] init
> git status
Great SUCCESS!
Sidequest (easy): Error if we are already in a git repo.
git commit
Git stores pretty much everything in the tree (I find that reading this with Morgan Freeman’s voice helps underline how important the tree is; I warmly recommend it). The tree is the central storage; this is why we created the objects folder. If at any point you get lost or feel like the exercises are not clear, have a regular Git repo open and run git cat-file -p [hex] to look directly inside the tree.
Because we want this to work with regular Git we need to be careful that the bytes are identical; otherwise, the hashes are going to be incompatible.
For each file in the directory:
- Get the content (as bytes) and prepend blob [content length]\0 (as bytes), where the length is the number of content bytes written as a decimal number.
- We need to do two things with this:
  - Compress it using zlib, not GZIP (in Java you can use DeflaterOutputStream).
  - Hash it using SHA-1 (in Java you can use MessageDigest).
- Convert the hash to a hex string.
- Split the hex string into the first two characters and the rest.
- Now store the compressed bytes in this file: .git/objects/[two first characters]/[the rest]
- Keep track of the hash (as bytes); we are not done with it yet.
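Here is a minimal sketch of the steps above, assuming Java. The class ObjectStore, its store method, and the toHex helper are names I made up for illustration. It takes the object type as a parameter so the same code can be reused beyond blobs. DeflaterOutputStream with its default settings writes zlib-wrapped DEFLATE data, which is the format Git reads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.DeflaterOutputStream;

class ObjectStore {
    // Prepends "[type] [length]\0", hashes and compresses the result, writes it
    // below objectsDir, and returns the raw 20-byte SHA-1 hash.
    static byte[] store(Path objectsDir, String type, byte[] content)
            throws IOException, NoSuchAlgorithmException {
        byte[] header = (type + " " + content.length + "\0")
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream full = new ByteArrayOutputStream();
        full.write(header, 0, header.length);
        full.write(content, 0, content.length);
        byte[] bytes = full.toByteArray();

        // The hash is taken over the uncompressed header plus content.
        byte[] hash = MessageDigest.getInstance("SHA-1").digest(bytes);
        String hex = toHex(hash);

        // First two hex characters name the folder, the rest names the file.
        Path dir = objectsDir.resolve(hex.substring(0, 2));
        Files.createDirectories(dir);
        try (OutputStream out = new DeflaterOutputStream(
                Files.newOutputStream(dir.resolve(hex.substring(2))))) {
            out.write(bytes);
        }
        return hash;
    }

    // Lowercase hex string of the given bytes.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

A quick sanity check: storing an empty byte array as a blob should give e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is the id real Git assigns to an empty file.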
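Having stored the files in the tree, we also need to keep track of which files are in the commit. We do this by (virtually) making a file with an entry like this for each file in the commit. Here [hash] is the raw hash bytes, not the hex string, and the entries follow each other directly with no newline in between. Real Git also expects the entries to be sorted by file name.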
100644 [filename]\0[hash]
This virtual file is called a tree (note: not in italics), and of course, we store the tree as an actual file in… the tree, following the same steps as with real files, except replacing blob with tree.
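Building the content of that virtual file might look something like the sketch below. TreeContent and build are hypothetical names, and the input is assumed to be a map from file name to the raw 20-byte hash we kept from storing each blob. The sketch sorts by name using a TreeMap, which matches the byte order real Git expects as long as the file names are plain ASCII; that restriction is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class TreeContent {
    // Builds the body of a flat tree (files only, no folders) from
    // file name -> raw 20-byte blob hash.
    static byte[] build(Map<String, byte[]> hashesByName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Copy into a TreeMap so the entries come out sorted by name.
        for (Map.Entry<String, byte[]> entry : new TreeMap<>(hashesByName).entrySet()) {
            byte[] prefix = ("100644 " + entry.getKey() + "\0")
                    .getBytes(StandardCharsets.UTF_8);
            out.write(prefix, 0, prefix.length);
            // Raw hash bytes, not hex, and no newline after the entry.
            out.write(entry.getValue(), 0, entry.getValue().length);
        }
        return out.toByteArray();
    }
}
```

The resulting bytes are what gets the tree [content length]\0 header prepended before hashing and compressing.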
Sidequest (hard): Folders are stored as trees as well, so by doing everything we just did, recursively, we get a git that can store folders as well.
We also need to store the commit metadata: the commit message, the author, and commit time. In the name of minimalism, we fake most of these, but make another virtual file with this content:
tree [hex string of tree hash]
author A <a@a.a> 1
committer A <a@a.a> 1
Now comes the big surprise: this file should be stored in the tree! This time with commit instead of blob.
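As a rough sketch, the commit content can be assembled like this. CommitContent and build are made-up names, and ending every line with a newline is my assumption. The three lines are taken verbatim from the text; note that real Git normally also writes a time zone offset such as +0000 after the timestamp, which this minimal version leaves out.

```java
import java.nio.charset.StandardCharsets;

class CommitContent {
    // Builds the body of a commit object pointing at the given tree,
    // with the faked author and committer lines from the text.
    static byte[] build(String treeHex) {
        String text = "tree " + treeHex + "\n"
                + "author A <a@a.a> 1\n"
                + "committer A <a@a.a> 1\n";
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

These bytes then go through the same header, hash, and compress steps as before, with commit as the type.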
Sidequest (easy): Add a parent commit by adding a line parent [hex string of commit hash] right after the tree line.
Sidequest (easy): Put the correct timestamp.
Sidequest (easy): Add commit message by adding a blank line and then the message.
Sidequest (hard): Add configurable author and committer.
Finally, create the file .git/refs/heads/master containing the hex string of the commit hash.
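This last step is small enough to sketch in a few lines. BranchRef and update are hypothetical names, and the trailing newline after the hash is an assumption made to mirror what real Git writes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BranchRef {
    // Points the given branch at the commit by writing its hex hash
    // into .git/refs/heads/[branch].
    static void update(Path gitDir, String branch, String commitHex) throws IOException {
        Path ref = gitDir.resolve("refs").resolve("heads").resolve(branch);
        Files.createDirectories(ref.getParent());
        Files.write(ref, (commitHex + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

With the branch called master, BranchRef.update(Paths.get(".git"), "master", hex) would finish the commit.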
Sidequest (easy): This also hints at how to do branches: duplicate the file that .git/HEAD refers to, with the new branch name, and update .git/HEAD to point to the new file.
Sidequest (very hard): Branches are more fun if we can do git checkout.
Sidequest: Implement git add instead of looping through all files.
Sidequest: Add support for .gitignore.
Congratulations
You have just implemented enough Git to commit itself. You can test it using:
> [your git] init
> git status
> [your git] commit
> git status
> git show
> git log
If you got this far you must be some kind of coding wizard, and as such, I think you should checkout (pun intended) some of the magic in my refactoring book: