Git is an essential part of every developer’s day-to-day life, but how many of us really know what goes on inside git under the hood? What does git actually do when you run git commit, or any other command?
But does it really matter what goes on under the hood, and why should anyone care?
- As developers we should understand the internals of the tools we use, especially the ones we use this frequently.
- Knowing the internals helps us work more efficiently, whether we are creating a branch, resolving merge conflicts, or fixing any other git-related issue.
- It gives us the confidence to run commands, particularly when we think we have messed something up.
Is this article for you?
- If you are new to git and its basic commands, I would suggest reading about them first to get an overview of what we’ll be talking about.
- If you already know basic commands like git pull and want to know how they work internally, you are good to go: this article is for you.
What’s coming next?
We’ll create a git repo from scratch without using the git init command. That will give us a sense of how git creates the .git directory and what it contains. We’ll also be able to relate many of the top-level commands to what we see as we go, for example what happens when we add something to the git staging area.
We’ll start by creating a directory which does not have git initialised yet.
In the step above we created a directory and ran git status in it. Git is clearly not happy with the command: it says this is not a git repository.
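A minimal sketch of this step, assuming a throwaway directory made with mktemp (the directory used in the original walkthrough is not shown here):

```shell
# Create a scratch directory that is not inside any git repository.
demo=$(mktemp -d)
cd "$demo"

# git status exits with a non-zero code and prints a
# "fatal: not a git repository ..." message on stderr.
git status || echo "git status failed with exit code $?"
```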
So, let’s first use the git init command and see what git does when we initialise a repo.
Notice that once we have initialised the repo, git has created a directory for us called .git. Now let’s explore what’s inside the .git directory.
At this point we can clearly see that .git has two main directories inside it, “objects” and “refs”, each with some subdirectories of its own. There is one more directory called hooks, but we won’t worry about it for now.
There are also two files to notice, HEAD and config. We’ll look at what these are in a while.
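The exploration can be reproduced with a sketch like the one below. The scratch directory is an assumption, and the exact entries inside .git vary a little between git versions, so no particular listing is promised:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# List the top-level entries git created inside .git.
# Expect at least HEAD, config, objects/ and refs/.
ls -F .git
```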
Now let’s create a file and add it to git to see what it does.
We can see that after adding the file, git has created something inside “.git/objects”, with an interesting-looking number as its name. We have now seen the basic structure of .git. Having seen what the git init command produces, let’s delete .git and initialise the repo on our own.
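Here is a small sketch of the staging step. The file name cool.txt and its content are borrowed from later in the article and are assumptions for this point in the walkthrough:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# Placeholder file for this sketch.
echo "git is cool" > cool.txt
git add cool.txt

# Staging the file writes one object file under .git/objects.
# With the default SHA-1 format the path is <2 hex chars>/<38 hex chars>.
find .git/objects -type f
```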
Since we deleted the .git directory, this is no longer a git repository.
Unfold the “git init” magic
As we’ve already seen, a git repo must have a .git directory, and inside .git there are two important subdirectories called “objects” and “refs”. Let’s create them and see if git is happy with that.
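As a rough sketch (again in an assumed scratch directory), creating only the two directories looks like this:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p .git/objects .git/refs

# git still does not recognise this .git directory as a repository,
# so the command fails just as it did before.
git status || echo "still not a git repository"
```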
Creating .git and its subdirectories didn’t help: git is still not happy. So let’s pause here for a moment and understand:
“What are these two subdirectories and how are they useful to git?”
Let us think of git as maintaining a file system, specifically snapshots of its files over time. A file system consists of files, directories and subdirectories.
The git objects directory stores three main kinds of objects:
BLOB (Binary Large Object): A blob stores nothing but the content of a file. It differs from a typical file in that it carries no metadata such as a name, only the content. Every blob is identified by its “sha-1” hash.
TREE: A tree is git’s equivalent of a directory. It is a directory listing that stores pointers to blobs and other trees, together with their names, and it is also identified by a “sha-1” hash.
Taken together, trees and blobs represent a directory structure.
COMMIT: A commit is a snapshot of the working tree at a point in time. It stores a reference to the top-level tree object along with some metadata: author name, time, commit message and the parent commit, if any. It is also identified by a “sha-1” hash.
A commit stores an entire snapshot of the working tree and not just a diff against the previous commit. Doesn’t that mean it has to store a lot of data?
To solve that problem, git only creates a new blob when the content of a file has changed; otherwise it reuses the existing sha to point to the same blob object. For example, if we create 100 files with the same content, git will create only a single blob object and make every file refer to it.
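The deduplication is easy to check with a small sketch. The three file names and their content are hypothetical, and three files stand in for the hundred mentioned above:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# Three files (hypothetical names) with identical content.
for name in a.txt b.txt c.txt; do
  echo "same content" > "$name"
done
git add .

# Three index entries, all carrying the same blob id...
git ls-files -s
# ...so only a single object file exists on disk.
find .git/objects -type f | wc -l
```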
.git/refs or .git/refs/heads
We probably already know about git branches. When we create a new git repository nowadays, we typically get main as our default branch.
A branch is nothing but a named reference to a commit. We could certainly refer to a commit directly, but as humans we are not comfortable remembering sha ids, hence we give it a name.
The .git/refs/heads directory holds the branches. Each branch is nothing but a file named after the branch, with the id of the commit it refers to as its content. Since we can have many branches, there can be multiple files inside .git/refs/heads.
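To see a branch file in an ordinary repository, a sketch along these lines can help. It assumes git’s default on-disk ref storage, and the identity values and file name are placeholders:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
# Name the not-yet-created branch "main" regardless of the local default.
git symbolic-ref HEAD refs/heads/main

# Placeholder identity so the commit works without any git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "git is cool" > cool.txt
git add cool.txt
git commit -q -m "first commit"

# The branch is a plain file whose content is the commit id,
# so these two commands print the same value.
cat .git/refs/heads/main
git rev-parse HEAD
```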
Now that we know about objects and refs, let’s create a default branch inside .git/refs/heads.
Oh, git is still not happy, it seems. What else is missing?
We know we can have multiple branches at any given point, so the question arises: “how does git know which branch is the current one?”
To solve that problem, git has a file called HEAD at the top level of .git, which we have already seen. It stores a reference to the current branch. So let’s add a HEAD that points to main.
Wow! Git seems to be happy this time, and it now points to the main branch.
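The hand-made repository can be sketched as follows. This version creates only the refs/heads directory and leaves out the empty branch file, since git does not need that file until the first commit. That is a simplification of this sketch, not necessarily the exact steps used above:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p .git/objects .git/refs/heads

# HEAD is a one-line text file naming the current branch.
echo "ref: refs/heads/main" > .git/HEAD

# git now treats the directory as a repository on branch main
# that has no commits yet.
git status
git symbolic-ref HEAD
```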
What do we do now?
We are now ready to create a commit, and we’ll do that using some more internal commands. For the time being we’ll remove the cool.txt file to keep everything clean.
Our .git structure looks better now:
Creating a git commit
The hash-object command takes an input, in this case from --stdin, and generates a blob for it.
Using git hash-object we’ve added a blob with the content “git is cool”. Notice that we haven’t given it a file name; it’s just the content. We can also see that it has created a file in the objects directory. If we try to cat this file it will show garbage, because git stores objects compressed, but git has a command, cat-file, that solves the problem by printing the type and content of an object.
Notice that the object type is blob, and that the first two characters of the sha are the directory name while the rest is the file name. Git creates these directories for easier lookups.
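A sketch of these two commands, run in an assumed scratch repository. The -w flag is what makes hash-object actually write the blob; without it git only prints the id:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# Write a blob holding the text "git is cool" and capture its id.
sha=$(echo "git is cool" | git hash-object -w --stdin)

# The first two characters name the directory, the rest names the file.
dir=$(printf '%s' "$sha" | cut -c1-2)
file=$(printf '%s' "$sha" | cut -c3-)
ls ".git/objects/$dir/$file"

git cat-file -t "$sha"   # prints the object type
git cat-file -p "$sha"   # prints the stored content
```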
Now that we have a blob, we can add it to our index using the git update-index command, which git uses internally when we say git add. We can see that git thinks the file cool.txt is deleted: when we updated the index we gave it the file name cool.txt with mode 100644, but the root directory does not contain that file. For now we’ll not worry about the mode, only the name.
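One way to sketch the staging step is shown below. The comma-separated form of --cacheinfo is used here; the original walkthrough may have used the older space-separated form:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
sha=$(echo "git is cool" | git hash-object -w --stdin)

# Stage the blob under the name cool.txt with mode 100644
# (a regular, non-executable file).
git update-index --add --cacheinfo 100644,"$sha",cool.txt

git ls-files -s      # shows the new index entry
git status --short   # cool.txt is staged but missing from the working directory
```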
Let’s dump the content of the blob into a new cool.txt file and see how git behaves then.
Now that the file exists, git seems happier and greener. We are at a stage where we can make a commit, but first we need to make a tree containing this blob, because we can only commit a tree. Let’s do that using the git write-tree command, which picks up everything we’ve staged and creates a tree out of it.
The tree has now been created, and we can see that it points to our blob and records its file name (cool.txt).
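The last two steps, restoring the file and writing the tree, can be sketched like this in an assumed scratch repository:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
sha=$(echo "git is cool" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$sha",cool.txt

# Recreate the working-directory file from the blob.
git cat-file -p "$sha" > cool.txt

# Snapshot the current index as a tree object.
tree=$(git write-tree)
git cat-file -t "$tree"   # prints the object type
git cat-file -p "$tree"   # one entry: mode, type, blob id and the name cool.txt
```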
We’re all set to make a commit, so let’s do that using the git commit-tree command. By passing it a tree id and a message we’ve made a commit, and the command gives us back a commit id. When we cat-file its type, we can see that it’s a commit object. Its content has the tree we passed in, along with the message, and the author picked up from the local git config.
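A sketch of the commit step follows. The identity is supplied through environment variables with placeholder values so that it runs without any git config, and the message is passed on stdin; both are choices made for this sketch:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
sha=$(echo "git is cool" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$sha",cool.txt
tree=$(git write-tree)

# Placeholder identity for the author and committer fields.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# commit-tree reads the message from stdin and prints the new commit id.
commit=$(echo "first commit" | git commit-tree "$tree")

git cat-file -t "$commit"   # prints the object type
git cat-file -p "$commit"   # tree id, author, committer and the message
```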
At this point, if we look at the structure of the .git folder, we can see three objects lying there: a blob, a tree and a commit.
Let’s now do a git log and see whether our commit shows up in the log.
Oh no! Where did our commit go?
Remember that when we created the branch using the touch command, we didn’t specify the commit id it should refer to. That means the main branch does not know where it needs to point and fetch commits from.
To solve that problem, we need to assign the commit id to the main branch.
Yay! After assigning the commit id, branch main now points to this commit, and git log has started working.
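Putting the whole walkthrough together, an end-to-end sketch might look like the following. The scratch directory and identity values are placeholders, and the commit id is written straight into the branch file, which assumes git’s default on-disk ref storage:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hand-made repository: objects, refs/heads and a HEAD pointing at main.
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

# Placeholder identity for the commit.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# blob -> index -> tree -> commit
sha=$(echo "git is cool" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$sha",cool.txt
tree=$(git write-tree)
commit=$(echo "first commit" | git commit-tree "$tree")

# Point the main branch at the new commit so git log can find it.
echo "$commit" > .git/refs/heads/main

git log --oneline
```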
We’ve now come to the end: we have created a git repository and made a commit without using any of the top-level commands.
I hope you now know a bit more about how git works internally. If you want to know more about the commands I’ve used in this article, you can refer to my previous article on git, “Knowing Git Inside-Out”.
Thanks for reading :)