When we talk about DevOps tools, the very first one that comes to mind is Git. Git is one of the most powerful distributed version control systems, created by Linus Torvalds (a Finnish-American software engineer) in 2005. Since then, Git has evolved a great deal and become an essential part of a developer's life.
This blog does not cover the introductory part of Git; I expect you to have a clear understanding of basic Git commands such as git init, git add, and git commit.
As we all know, Git is responsible for tracking and storing all the changes that we make in our project directory.
In this blog, we will dive deep and see
- how Git works internally behind the scenes,
- how it manages to store all the changes, and
- what a single commit looks like when we dissect it.
This will cover most of the advanced concepts that work behind the scenes to track and store changes. After reading this blog, you will be able to look at Git from a different angle and understand more closely how it works inside. Let's get started.
If you do a good deed, don’t expect anything back — the other person can still be a complete git. —
Myths and Misconceptions about commits
To understand how Git stores a commit internally, we first need to understand what a commit is. One of the most common definitions is: “It's a snapshot of all the changes that you have made in your files at a certain point in time.”
Many people misinterpret this definition. Because it says that Git takes a snapshot of your changes, it is tempting to assume that a commit is some kind of image that Git captures and then stores in an image format such as JPG or PNG inside the repository (the .git folder).
But this is not the case. Git does not store a commit as an image or take a picture of your code; that is simply not how Git works behind the scenes. The misconception comes from the word “snapshot” in the definition.
Understanding the term Snapshot
The term “snapshot” is used in the definition to hide the complex process involved in creating a single commit. A lot happens internally to store even a small change to a file, so “snapshot” gives us a simple way to think about a commit: it records the state of your tracked files at that moment. One thing is clear, though: Git does not store commits as images. Doing so would bloat the repository for no benefit, and it is not how Git works behind the scenes.
This idea can be understood through the concept of abstraction in OOP. Abstraction means showing only the essential information and hiding the details. In the image below, we can see how most people picture a car and how an automobile engineer sees it.
Similarly, a commit can be seen in two ways: for most people it is just a snapshot with a unique id, but behind the scenes it is a lot more than that, as we will see in detail in this blog.
How git stores a single commit internally
So now you have a clear understanding that whenever you make a commit, Git does not take a screenshot to store your changes. The question then is: how does Git manage to store a commit, or the changes we make in our repository?
To answer this question, we need to dive deep into one topic: the application of the object model in Git.
Application of Object Model in Git
Linus Torvalds, the creator of Git, used the concept of objects to store every change that we make in our project directory. Git stores large amounts of data as key-value pairs, where the key is a unique id and the value is an object. So a commit is not just a snapshot with a unique id; it is a lot more than that, and we will dive deeper to understand how commits are really stored as objects.
How commits are stored in the form of an object
Git stores every commit that we make as an object inside the object database.
What is the object database?
The object database is simply a folder where everything we commit is stored as key-value pairs, i.e. in the form of objects.
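To make the key-value idea concrete, here is a minimal sketch of a store whose keys are derived from the values themselves. `ObjectStore`, `put` and `get` are hypothetical names chosen for illustration, not Git's API, and the sketch hashes the raw data only, whereas Git also hashes a small header in front of it.

```python
import hashlib


class ObjectStore:
    """Hypothetical in-memory stand-in for Git's object database."""

    def __init__(self):
        self._objects = {}  # key: 40-character hex id, value: raw bytes

    def put(self, data: bytes) -> str:
        # The key is computed from the value, so identical data
        # always ends up under the same key.
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ObjectStore()
key = store.put(b"print('hello')\n")
print(key, store.get(key))
```

Because the key is a hash of the content, asking the store for a key either returns exactly the data that produced it or fails; nothing can be silently swapped underneath an id.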
The next question is where this object database lives. As we know, every commit is stored inside the .git folder, and the object database is inside that folder too. Let's see exactly where it sits.
Dissection of .git folder
The above figure shows the inside view of the .git folder. This folder is where all the magic happens, and it is intentionally hidden (to see it, we can run ls -a in our terminal).
Inside the .git folder there is an objects folder, also known as the object database. This is where our commits are stored in the form of objects.
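As a rough sketch of how an id maps to a location in that folder: for objects stored as individual files ("loose" objects), Git uses the first two hex characters of the id as a directory name and the remaining 38 as the file name. The helper `loose_object_path` below is a hypothetical name, and the sketch ignores objects that Git has compressed together into packfiles.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str) -> PurePosixPath:
    """Return the path of a loose object relative to the repository root."""
    if len(object_id) != 40:
        raise ValueError("expected a full 40-character SHA-1 id")
    # First two hex characters name the directory, the other 38 the file.
    return PurePosixPath(".git", "objects", object_id[:2], object_id[2:])


print(loose_object_path("3b18e512dba79e4c8300dd08aeb37f8e728b8dad"))
```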
Since the object database stores only objects, the commits we keep in it must also be objects. Now let's understand how the commits that we make are stored in the form of objects.
Dissecting a single commit
Now it's clear that a commit is not just a snapshot; it's actually an object. This becomes easier to visualize when we split a single commit apart.
This is exactly what a single commit looks like. If we dissect a single commit, it can be broken down into
- Commit Object
- Tree Object
- Blob Object
These are the three major objects that Git stores inside the object database. Let's explore all three one by one.
When we make a commit, Git stores it in the form of a commit object and gives it a unique commit id, which is generated using the Secure Hash Algorithm (SHA-1).
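The id calculation can be sketched in a few lines. For SHA-1 repositories, Git hashes a short header of the form "type size" followed by a NUL byte, and then the object's content. `git_object_id` is a hypothetical helper name; the example hashes a blob because its content is easy to write down, but commits and trees are hashed the same way with a different type word.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does for SHA-1 repositories."""
    # Git prepends "<type> <size>" and a NUL byte to the body before hashing.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

This is why the same content always receives the same id, on any machine and in any repository.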
In the above figure, we can see the inside view of a commit object. It holds data about the commit we have made and stores it in the form of key-value pairs. The data stored inside a commit object is:
- Tree object (the id of the tree for this commit)
- Parent object (the id of the previous commit)
- About author
- About committer
- Date and message associated with the commit
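The fields listed above can be sketched as a small parser. A commit object's content is plain text: header lines of the form "key value", a blank line, then the message. `parse_commit` is a hypothetical helper, and the author name, email, timestamps and parent id in the example are made up. The sketch does not handle signed commits, whose headers span several lines.

```python
def parse_commit(body: str) -> dict:
    """Split a commit object's text into header fields and the message."""
    header_text, _, message = body.partition("\n\n")
    fields = {"parent": []}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # a merge commit has several
        else:
            fields[key] = value
    fields["message"] = message
    return fields


fake_parent = "1" * 40  # made-up id, for illustration only
example = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    f"parent {fake_parent}\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Add README\n"
)
commit = parse_commit(example)
print(commit["tree"], commit["parent"], commit["message"])
```

Note that the date is not a separate field: it travels as a timestamp at the end of the author and committer lines.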
A tree object is always present inside the commit object (the commit refers to it by id). It is responsible for storing blob objects as well as other tree objects in the form of key-value pairs. A tree object is also assigned a unique id using the SHA-1 algorithm, which avoids duplication of data.
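Here is a hedged sketch of what those key-value pairs look like in bytes. Each tree entry is a file mode, a space, a name, a NUL byte, and then the 20 raw bytes of the referenced object's id. `tree_entry` and `tree_id` are hypothetical helper names, the ids used are only placeholders, and the sketch assumes the entries are already in the sorted order Git expects.

```python
import hashlib


def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """Encode one entry: mode, space, name, NUL, then the raw 20-byte id."""
    return f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(object_id)


def tree_id(entries: list) -> str:
    """Hash a tree body made of already-sorted entries."""
    body = b"".join(entries)
    header = f"tree {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Mode 100644 marks a regular file, 40000 a sub-tree (a directory).
readme = tree_entry("100644", "README.md", "3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
src = tree_entry("40000", "src", "4b825dc642cb6eb9a060e54bf8d69288fbee4904")
print(tree_id([readme, src]))
```

A tree never contains file content, only names and ids, which is what lets one tree point at blobs and at other trees alike.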
A blob object is present inside the tree object (the tree refers to it by id) and is responsible for storing the content of a file, i.e. all the code that you have written in that file. A blob object is assigned a unique id as well.
This is how a single commit is stored in the form of nested objects inside the object database.
Overall working and Implementation of the above concept
Let's now combine all the concepts we have dealt with so far and look at how Git really works behind the scenes, with a practical approach.
When we make a commit, Git stores it in the form of nested objects: the commit object refers to a tree object, the tree object refers to blob objects, and each blob object holds the content of a file. The nesting is by reference: each object records the id of the object it contains rather than a copy of it. Let's explore this process in depth.
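The nesting can be sketched end to end with an in-memory dictionary standing in for the object database. Since each outer object needs the id of the object inside it, the sketch builds from the inside out: blob, then tree, then commit. `store` and `write_object` are hypothetical names, the file name and author details are made up, and this is an illustration of the idea rather than Git's actual implementation.

```python
import hashlib

store = {}  # hypothetical in-memory object database: id -> raw object


def write_object(obj_type: str, body: bytes) -> str:
    """Store header + body under its SHA-1 id and return the id."""
    raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    object_id = hashlib.sha1(raw).hexdigest()
    store[object_id] = raw
    return object_id


# 1. The file content becomes a blob.
blob_id = write_object("blob", b"hello world\n")

# 2. A tree maps the file name to that blob.
tree_body = b"100644 hello.txt\x00" + bytes.fromhex(blob_id)
tree_id = write_object("tree", tree_body)

# 3. The commit points at the tree; author details here are made up.
commit_body = (
    f"tree {tree_id}\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
).encode()
commit_id = write_object("commit", commit_body)

print(commit_id, "->", tree_id, "->", blob_id)
```

Changing a single byte of the file would change the blob id, which changes the tree body and its id, which in turn changes the commit id.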
The commit we make is stored in the form of key-value pairs called a commit object. Inside the commit object there is other data, also known as metadata: the tree object, the parent object, the name of the author, the name of the committer, the date, and the message associated with that commit.
The tree entry points to the state of the project at this commit, whereas the parent entry points to the previous commit. This is how Git forms a parent-child relationship between commits, and this is how all the commits are linked together.
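The parent-child linking can be sketched as a walk over a small made-up commit graph. The ids `c1` to `c3`, the messages, and the `history` helper are all hypothetical; real ids are 40-character hashes, and a merge commit would have more than one parent, of which this sketch follows only the first.

```python
# Hypothetical commit graph: each id maps to its parent ids and message.
commits = {
    "c3": {"parents": ["c2"], "message": "Fix typo"},
    "c2": {"parents": ["c1"], "message": "Add feature"},
    "c1": {"parents": [], "message": "Initial commit"},
}


def history(start: str) -> list:
    """Follow first-parent links from `start` back to the root commit."""
    log = []
    current = start
    while current is not None:
        entry = commits[current]
        log.append(f"{current} {entry['message']}")
        parents = entry["parents"]
        current = parents[0] if parents else None
    return log


print("\n".join(history("c3")))
```

The very first commit has no parent, which is where the walk stops.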
The tree object also holds its data in the form of key-value pairs; it stores blob objects as well as other tree objects.
The blob object is the innermost nested object. It holds the actual content of a file, so all your code ends up here, and we can say that the blob is the last layer of the commit object. One important point about blobs: we can't store the same blob twice, but we can refer to it many times.
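That last point about blobs can be sketched as follows. Because a blob's id depends only on its content, two files with identical content resolve to the same id, so the content is kept once and referenced twice. `blobs`, `add_blob` and the flat `tree` dictionary are hypothetical simplifications; a real tree would hold the nested directory as its own tree object.

```python
import hashlib

blobs = {}  # hypothetical blob storage: id -> content


def add_blob(content: bytes) -> str:
    header = f"blob {len(content)}".encode() + b"\x00"
    blob_id = hashlib.sha1(header + content).hexdigest()
    blobs.setdefault(blob_id, content)  # stored only if not already present
    return blob_id


# Two differently named files with identical content.
tree = {
    "LICENSE": add_blob(b"MIT License\n"),
    "docs/LICENSE": add_blob(b"MIT License\n"),
}

print(len(blobs), tree["LICENSE"] == tree["docs/LICENSE"])
# 1 True
```

The file name plays no part in the blob's id; names live in the tree, which is why renaming a file does not create a new blob.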
This is how a single commit works behind the scenes, and this is how Git stores it in the form of nested objects.
Practical Implementation of the above concept
Let's see how a single commit can actually be broken down in a terminal and where it is stored inside the objects folder.
Let's say we have multiple commits and we want to dissect the second commit, with commit id 8583d66, as shown in the image below.
To see how a single commit can be broken down into nested objects, we can use the command below.
After executing the above command, we can see what the commit object actually looks like.
Now we have peeled off the first layer of the object. Let's dive deeper and see what the tree object looks like; for that, we can use the command below.
After executing the above command, we can see what the tree object actually looks like.
Now we can see how blobs are stored inside the tree object, and as we know, blobs are responsible for storing all the content. Let's peel off the last layer of the nested object to see how our content is really stored inside the blob object. For that, we can use the command below.
After executing the above command, we can see what the blob object looks like.
Now we can see the content that we stored while making the commit, and this is how it is kept inside the blob object behind the scenes.
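For the curious, the same peeling can be sketched by hand. Git's own `git cat-file -p <id>` prints an object in readable form; underneath, a loose object file is zlib-compressed bytes consisting of a header, a NUL byte, and the content. `decode_loose_object` is a hypothetical helper, and instead of reading a real file from the objects folder the sketch compresses a small blob itself to simulate one.

```python
import zlib


def decode_loose_object(compressed: bytes) -> tuple:
    """Return (type, body) from the zlib-compressed bytes of a loose object."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), body


# Simulate what Git would have written to disk for a small blob.
on_disk = zlib.compress(b"blob 12\x00hello world\n")
print(decode_loose_object(on_disk))
# ('blob', b'hello world\n')
```

The same function would work on a commit or a tree, since all three object types share this header layout; only the meaning of the body differs.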
So, this is how the world's most powerful version control system works behind the scenes to store every change that we make in our project directory.
So far, we have dealt with
- Myths and Misconceptions about commits
- Understanding the term snapshot
- How git stores a single commit internally
- Application of Object Model in git
- Dissecting a single commit
- Overall working and Practical implementation
I hope this blog helped you understand how Git works behind the scenes.
If you would like to explore more of Git behind the scenes, I would recommend these resources