How does git work internally

Shalitha Suranga
Sep 1, 2018 · 7 min read

A friendly introduction

When we work on very straightforward code projects (say, writing a simple bash script), there are only two points in our development timeline: start and finish. We start coding, then we finalize and ship the project. Obviously, many projects will get more than two points in their development timeline due to feature requests, bug fixes and sometimes reverts.

Why Version Control Systems (VCS)

As mentioned above, if we have many points in our development timeline, we really need a VCS. VCS tools let users manage their development paths (versions, features, patches or, technically, branches) and development histories without too much effort.

Git: from the guy who wrote the Linux kernel

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is a distributed system. This means that Git users are not just sending their code to a centralized codebase in order to record the history. Everyone has their own copy of the development history.

Haha.. this article is about internals, so let's begin. We'll skip the Git basics. I found a good git-cheatsheet here

Walking to the door

We type git add and git commit on our keyboards. In other words, we stage changes to files and then commit them to the history. What happens internally? Maybe some magic? Or does Git manage a centralized database? Then how is the entire history available after git clone?

Opening the door..

Hashes, file-based key-value storage and a tree data structure: these are the key things behind Git. Each tree node, commit and file has its own unique 40-character SHA-1 representation (we can call that the key). These elements are added to a tree data structure that is persisted inside the .git/objects folder.
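To make the "hash as key" idea concrete, here is a small sketch. It assumes the default SHA-1 object format and that the sha1sum tool is installed; the content "hello" is just example data. It computes the key by hand and compares it with what Git reports.

```shell
# The content to store; "hello" is only example data.
content='hello'

# Git hashes "blob <size>\0<content>", not the raw content alone.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the key of the same content (nothing is written without -w).
from_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual:   $manual"
echo "from git: $from_git"
```

Both lines should show the same 40-character value, which is why identical content always lands under the same key.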

.git directory

This is created automatically when a new repo is initialized or cloned. Git saves the history (file contents and commits) and configuration inside this folder.

Go ahead and get your fingers on these commands
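The original commands are not shown here, so the following is only a sketch of what you might type. It uses a throwaway directory from mktemp so that nothing in your real projects is touched.

```shell
# Create a throwaway repository (mktemp keeps it out of your real projects).
cd "$(mktemp -d)"
git init -q

# List what Git created; -F marks directories with a trailing slash.
ls -F .git
```

Depending on your Git version and installed templates, some entries (for example branches or hooks) may be missing from the listing.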

branches: Git no longer uses this folder (deprecated).

config: Stores the repo's configuration.

HEAD: A reference to the currently checked-out branch.

hooks: Scripts that are triggered by Git events (before committing etc.). These hooks are not enabled by default; you need to remove the .sample extension to make them work.

objects: File-based key-value storage that holds commits, tree nodes and file contents (in blob form).

Hey!! you are now inside ..

Plumbing commands (core commands) help us understand Git internals. Yes, you understood! There is a harder way to commit changes than using simple high-level commands like git add and git commit.

git add (hard way)

Adding changes to the stage is just like writing in a diary anonymously. The data is saved to .git/objects, but there is no commit message. In other words, no history is actually written yet.

git hash-object calculates the SHA-1 hash of a file and, with the -w flag, writes the blob into the key-value storage.
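A minimal sketch of this step, assuming a fresh throwaway repository and an empty myfile.txt as in the walkthrough:

```shell
cd "$(mktemp -d)" && git init -q

# Start with an empty file, as the walkthrough does.
touch myfile.txt

# -w writes the blob into .git/objects in addition to printing its hash.
blob=$(git hash-object -w myfile.txt)
echo "$blob"

# The object is stored at .git/objects/<first 2 hex chars>/<remaining 38>.
find .git/objects -type f
```

The first two characters of the hash become a folder name and the rest becomes the file name, which is how the folder acts as a key-value store.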

Mm.. now we have something in our database. So let's try to read it with cat.

Wow, binary.. We can't simply cat the object file, because Git stores objects zlib-compressed rather than as plain text. The git cat-file command reads them for us.
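Here is a hedged sketch of reading an object back with git cat-file, again in a throwaway repository with an empty myfile.txt:

```shell
cd "$(mktemp -d)" && git init -q
touch myfile.txt
blob=$(git hash-object -w myfile.txt)

# -t prints the object type, -s its size in bytes, -p pretty-prints the content.
git cat-file -t "$blob"
git cat-file -s "$blob"
git cat-file -p "$blob"
```

The type and size come from the small header Git stores in front of the content.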

git cat-file returns empty content here, since myfile.txt has no content yet. So add some content to myfile.txt and hash it again with git hash-object.

This returns a different hash because the file content changed. So run git cat-file on the new hash.
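The two steps above could look roughly like this. The text "Hello Git" is only example content, not taken from the original post.

```shell
cd "$(mktemp -d)" && git init -q
touch myfile.txt
empty_blob=$(git hash-object -w myfile.txt)

# Change the content ("Hello Git" is example text) and store it again.
echo 'Hello Git' > myfile.txt
new_blob=$(git hash-object -w myfile.txt)

echo "empty: $empty_blob"
echo "new:   $new_blob"

# Read the new version back from the object store.
git cat-file -p "$new_blob"
```

Both blobs now live side by side in .git/objects; nothing was overwritten.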

Mm.. we got our file content back. Now we can start the staging process.

git update-index adds your file to .git/index, which holds the staging information for files. Check the staged entries in the index using git ls-files.
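A sketch of staging by hand, assuming the blob was written with git hash-object as before. The exact arguments used in the original post are not shown, so treat these as one plausible form.

```shell
cd "$(mktemp -d)" && git init -q
echo 'Hello Git' > myfile.txt
blob=$(git hash-object -w myfile.txt)

# Register the blob in the index under a file mode and a path.
# 100644 is the mode Git uses for a regular, non-executable file.
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# --stage shows mode, blob hash, stage number and path.
git ls-files --stage
```

Note that the index only records the hash and the path; the content itself stays in .git/objects.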

Now what do you think? Yes, hit git status.
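For reference, a short sketch of what git status sees after manual staging. The --short flag is my choice here to keep the output compact.

```shell
cd "$(mktemp -d)" && git init -q
echo 'Hello Git' > myfile.txt
blob=$(git hash-object -w myfile.txt)
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# "A" in the first column means the file is staged as a new addition.
git status --short
```

Git cannot tell whether the index was filled by git add or by plumbing commands; the result is the same.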

Congratulations!! You staged a file the hard way.

git commit (hard way)

We wrote things in our diary, so now we have two choices. We can either tear out the page (git reset --hard) or put our signature on it (git commit).

So as good people we simply go ahead and put our signature on what we wrote. Verify your details first..
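One way to check the details is to read them from the Git config. In this sketch the name and email are placeholders, and they are stored only in the throwaway repository's own config.

```shell
cd "$(mktemp -d)" && git init -q

# Placeholder identity, stored only in this repository's config.
git config user.name 'Example User'
git config user.email 'user@example.com'

# Read the values back; Git records them in every commit you create.
git config user.name
git config user.email
```

On your own machine you would normally just run the last two commands to see what is already configured.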

Awesome!! Your signature is okay. A commit object has a SHA-1 hash (like any other Git object) and it points to a tree node.

So.. where is the tree node? We need to create one.

git write-tree creates a tree node from the current index (remember, we staged our blob there). It returns a new hash that represents our new tree node.
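A minimal sketch of creating and inspecting the tree node. Staging is condensed into a single update-index call here to keep the example short.

```shell
cd "$(mktemp -d)" && git init -q
echo 'Hello Git' > myfile.txt

# Condensed staging: this hashes, stores and indexes the file in one step.
git update-index --add myfile.txt

# Snapshot the current index as a tree object and print its hash.
tree=$(git write-tree)
echo "$tree"

# A tree lists mode, object type, hash and name for each entry.
git cat-file -t "$tree"
git cat-file -p "$tree"
```

The tree is where file names live; the blob itself knows nothing about being called myfile.txt.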

Now we have everything we need to create a commit with git commit-tree.
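Roughly, the commit step could look like this. The identity and the message "First commit" are placeholders of mine.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name 'Example User'        # placeholder identity
git config user.email 'user@example.com'
echo 'Hello Git' > myfile.txt
git update-index --add myfile.txt          # condensed staging
tree=$(git write-tree)

# commit-tree wraps the tree in a commit object and prints the commit's hash.
# The commit message is read from standard input.
commit=$(echo 'First commit' | git commit-tree "$tree")
echo "$commit"
```

Unlike blob and tree hashes, this hash will differ on every run because the commit includes a timestamp.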

See the first commit's content

You can also see the history via git log, passing it the commit hash.
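Here is a sketch of both inspections. Since no branch points at the commit yet, git log is given the hash explicitly; the identity and message are placeholders.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name 'Example User'        # placeholder identity
git config user.email 'user@example.com'
echo 'Hello Git' > myfile.txt
git update-index --add myfile.txt          # condensed staging
tree=$(git write-tree)
commit=$(echo 'First commit' | git commit-tree "$tree")

# Show the raw commit: tree hash, author, committer and message.
git cat-file -p "$commit"

# Walk the history starting from this commit.
git --no-pager log "$commit"
```

The first line of the cat-file output is the tree hash, which is the link between the commit and the snapshot.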

Let’s make our second commit by updating myfile.txt

Now the file has another version. Therefore we are going to create another tree node for this change.

Since the file is already in the Git index, we can simply pass one argument to update-index.

Since commits happen in a linear manner over time, we need to pass the previous commit's hash as an argument for the new commit.

This will return the second commit's hash value.
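Put together, the second commit might be built like this. File contents, messages and identity are example values, and the first commit is replayed in condensed form.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name 'Example User'        # placeholder identity
git config user.email 'user@example.com'

# Condensed replay of the first commit.
echo 'Hello Git' > myfile.txt
git update-index --add myfile.txt
tree1=$(git write-tree)
commit1=$(echo 'First commit' | git commit-tree "$tree1")

# Second version: the file is already tracked, so the path alone is enough.
echo 'Hello Git again' > myfile.txt
git update-index myfile.txt
tree2=$(git write-tree)

# -p names the parent commit, which links the new commit to the old one.
commit2=$(echo 'Second commit' | git commit-tree "$tree2" -p "$commit1")
echo "$commit2"
```

The -p link is the only thing that turns two separate commits into a history.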

If we run a plain git log now, we still get no results, because no branch points to our commits yet. Therefore we need to set a reference to our latest commit.
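A sketch of setting the reference. It updates HEAD rather than a named branch such as refs/heads/master, so it works whatever your default branch is called; that choice is mine, not necessarily what the original post did.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name 'Example User'        # placeholder identity
git config user.email 'user@example.com'

# Condensed replay of the two commits made so far.
echo 'Hello Git' > myfile.txt
git update-index --add myfile.txt
commit1=$(echo 'First commit' | git commit-tree "$(git write-tree)")
echo 'Hello Git again' > myfile.txt
git update-index myfile.txt
commit2=$(echo 'Second commit' | git commit-tree "$(git write-tree)" -p "$commit1")

# Point the current branch (whatever HEAD names) at the latest commit.
git update-ref HEAD "$commit2"

# Now a plain git log can find the history.
git --no-pager log --oneline
```

After this, git log lists both commits without needing a hash.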

Wow! Commits and tree nodes are now connected: each commit points to a tree node and to its parent commit. Tree nodes can in turn contain other tree nodes, depending on the directory structure you staged. This is the basic internal process behind Git's functionality.

Moreover, branching is a very powerful feature in a VCS. Basically, branches are just movable pointers to commits.
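To see that a branch is only a pointer, here is a small sketch that creates one by hand. The branch name "feature" and the identity are example values.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name 'Example User'        # placeholder identity
git config user.email 'user@example.com'
echo 'Hello Git' > myfile.txt
git update-index --add myfile.txt
commit=$(echo 'First commit' | git commit-tree "$(git write-tree)")
git update-ref HEAD "$commit"

# Create a branch by hand; "feature" is just an example name.
git update-ref refs/heads/feature "$commit"

# The branch resolves to the hash of a commit object.
git rev-parse feature
git cat-file -t feature
```

Moving a branch is just another update-ref call with a different commit hash.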


This explanation focused on Git staging, the tree data structure and commit internals. There are other useful features when a remote repository is used, such as pulling and pushing.



Happy version controlling!!!

Written by Shalitha Suranga, Software Engineer at 99xt | Apache Committer
