Decoding the Git Magic: Unveiling the Inner Workings of Version Control

Prince Gola
6 min read · Jun 22, 2023

This article will help you become more productive with Git. After reading it, whenever you execute a Git command, you should be able to visualize how data moves between Git's internal areas :)

“We rise by lifting others.” — Robert Ingersoll

Introduction

Git is a popular version control system used in software development. It helps track changes to projects and is essential for coordinating work among developers. It has a significant presence in the open-source community, with millions of developers and repositories on platforms like GitHub.

How Are Your Changes Stored Upon Commit?

When you commit changes in Git, Git creates objects to store the necessary information. The main object created is called a commit object, which records a snapshot of your project at that moment. It includes details like who made the commit, when it was made, the commit message, a reference to the tree for the project's root directory, and a reference to the previous (parent) commit, if there is one.

In addition to commit objects, Git stores tree and blob objects. The tree object represents the directory structure and file hierarchy of your project at the time of the commit. The blob objects store the content of each individual file.

How Are Tree Objects Created in Git?

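A Git tree object helps organize files in a Git repository. It records which files and subdirectories a directory contains, mapping each name to the blob or tree that holds its content. GitHub also exposes tree objects through the Git database part of its REST API, so you can read and write them programmatically.

When you commit, Git writes a tree object for each directory in the snapshot, and each tree points to the blob objects that hold the file contents. (The blobs themselves are usually written earlier, when you stage files with git add.) You can think of trees as the organizing structure of a Git repository.

You can view the commit history as a graph by running git log --graph on the command line. Note that this shows how commits relate to each other; to list the entries of a tree object itself, use git ls-tree HEAD.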

What Is a Blob?

In Git, a blob (Binary Large Object) is the fundamental unit of data storage. It holds the content of a file, such as source code or any other type of file, within a Git repository. The file's name is not part of the blob; names are recorded in tree objects.

When you make changes to a file, a new blob is created for the updated content. Each blob has a unique identifier based on its content.
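To make "an identifier based on its content" concrete, here is a minimal Python sketch of how a blob's ID can be computed. It assumes a repository using Git's default SHA-1 object format; the function name git_blob_id is made up for this illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a blob holding `content`.

    Git hashes a small header ("blob <size>" followed by a NUL byte)
    together with the raw file content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Identical content always gives the same ID; any change gives a new one.
same = git_blob_id(b"Hello, world!\n") == git_blob_id(b"Hello, world!\n")
different = git_blob_id(b"Hello, world!\n") != git_blob_id(b"Hello, World!\n")
print(same, different)  # True True
```

If Git is installed, you can compare the result with git hash-object, which prints the ID Git itself would use for a file.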

Let’s Dive into Blob Magic: An Assessment of Blob Working

Imagine we have an app project. At its core, we find a README.md file and a src directory. Inside the src directory, there’s a text file and a hello.js source file. Together, we’ll explore how Git’s Blob objects efficiently store the content of these files.

Pro Tip: if you’re unfamiliar with what the echo command is doing, see I/O redirection. In a nutshell, echo 'Hello, world!' > greeting redirects the output of echo 'Hello, world!' to a file called greeting.

Currently, there is nothing committed to our Git repository, so let’s make the initial commit that adds these files and the directory. Once the commit is made, we’ll inspect the objects directory to see the various objects created.

First, run ls -a .git/objects to verify that no objects have been stored yet (a fresh repository contains only the empty info and pack subdirectories).

When we list .git/objects after the commit, the values we see are hashes, such as:

67/f67f4664981e4397625791c8eabbb5f2279a31

You will see these 40-character strings all over the place in Git. The first two characters (here, “67”) become the folder name, and the remaining 38 become the name of the object's file inside that folder. In each case, the name is calculated by taking the SHA-1 hash of the object's contents, prefixed with a short header that gives the object's type and size. SHA-1 is a cryptographic hash function. What that means to us is that, in practice, two different objects will not end up with the same name. This has a number of advantages; among others:

  • Git can quickly determine whether two objects are identical or not, just by comparing names.
  • Since object names are computed the same way in every repository, the same content stored in two repositories will always be stored under the same name.
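The folder/file split described above can be sketched in a few lines of Python. This is only an illustration for loose (unpacked) objects in a SHA-1 repository; the helper name loose_object_path is hypothetical.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str) -> PurePosixPath:
    """Map a 40-character object ID to its location under .git/objects."""
    if len(object_id) != 40 or any(c not in "0123456789abcdef" for c in object_id):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    # First 2 characters -> folder name, remaining 38 -> file name.
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


path = loose_object_path("67f67f4664981e4397625791c8eabbb5f2279a31")
print(path)  # .git/objects/67/f67f4664981e4397625791c8eabbb5f2279a31
```

Spreading objects across folders named after the first two characters keeps any single directory from holding every object in the repository.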

We see one commit in the commit history, but inside the objects directory we see several objects. Let’s inspect them one by one using the git cat-file command, starting with the commit shown by git log.

In summary, git cat-file -t shows the type of an object, while git cat-file -p displays the contents of an object in a more readable format.
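To see roughly what git cat-file does under the hood, here is a hedged Python sketch that writes a loose object into a temporary folder and reads it back. It assumes the loose-object layout (zlib-compressed "type size", a NUL byte, then the body) and ignores packfiles entirely; write_loose_object and read_loose_object are names invented for this example, not Git commands.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store an object the way Git stores loose objects and return its ID."""
    raw = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(raw).hexdigest()
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Return (type, body), similar in spirit to cat-file -t and cat-file -p."""
    raw = zlib.decompress((objects_dir / object_id[:2] / object_id[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stand-in for .git/objects
    oid = write_loose_object(objects_dir, "blob", b"Hello, world!\n")
    obj_type, body = read_loose_object(objects_dir, oid)

print(obj_type)       # blob
print(body.decode())  # Hello, world!
```

The real command also handles packed objects and pretty-prints trees, which this sketch does not attempt.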

As we can see, the commit object contains a short piece of text. Whenever a commit is made (using git commit), Git generates this text, computes its hash, and then stores it in much the same way a blob is stored. The commit text contains all the metadata about the commit. Apart from the author, committer, and date of the commit, it holds the hash of a tree object. This commit (171490) points to the tree for the root directory of the project. Let’s fetch the contents of this tree object to understand it further.

Just like a commit object, a tree object is displayed by git cat-file -p as a piece of text (on disk, Git stores tree entries in a compact binary form). It lists the contents of the directory, each identified by its hash. In this case, we have a blob and another tree with their corresponding names. The blob is the README.md file in the root, while the tree is the src directory in the root.
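As a rough illustration of what a tree object holds, the Python sketch below builds the tree for a project shaped like ours and hashes it. The file contents and the name notes.txt are made up for the example, and the helpers object_id and tree_body are hypothetical; the sketch assumes SHA-1 and the usual modes (100644 for a regular file, 40000 for a directory).

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: header + NUL + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries into a tree object's body.

    Each entry is "<mode> <name>", a NUL byte, then the 20 raw ID bytes.
    Git sorts entries by name, treating directories as if they ended in "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        out += bytes.fromhex(hex_id)
    return out


# Hypothetical file contents, just to have something to hash.
readme_id = object_id("blob", b"# app\n")
hello_id = object_id("blob", b"console.log('Hello, world!');\n")
notes_id = object_id("blob", b"some notes\n")

src_tree = tree_body([("100644", "hello.js", hello_id),
                      ("100644", "notes.txt", notes_id)])
src_id = object_id("tree", src_tree)

root_tree = tree_body([("40000", "src", src_id),
                       ("100644", "README.md", readme_id)])
root_id = object_id("tree", root_tree)
print(root_id)  # the ID a commit would reference as its "tree"
```

Because the src tree's ID is embedded in the root tree, changing any file under src changes the root tree's ID as well.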

Let’s use git cat-file again to get the contents of the blob first.

So, the contents of the blob are the same as what is inside our README.md file.

To summarize, a commit points to a tree (signifying root), and this tree points to a blob (README.md file), and another tree (src directory). The blob is just a piece of content present inside the README.md file.
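The chain "commit points to tree, tree points to blobs and other trees" can be walked in a small in-memory model. This is a sketch under assumptions: the dictionary store stands in for .git/objects, and the file contents, the name notes.txt, and the author line are invented for illustration.

```python
import hashlib

store: dict[str, tuple[str, bytes]] = {}  # object ID -> (type, body)


def put(obj_type: str, body: bytes) -> str:
    """Hash an object like Git does and keep it in the in-memory store."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    store[oid] = (obj_type, body)
    return oid


def entry(mode: str, name: str, oid: str) -> bytes:
    """One tree entry: "<mode> <name>", NUL, then 20 raw ID bytes."""
    return mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)


def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        pos = nul + 21
    return entries


def list_files(tree_id: str, prefix: str = "") -> list[str]:
    """Follow tree -> subtree -> blob links and collect file paths."""
    paths = []
    for mode, name, oid in parse_tree(store[tree_id][1]):
        if mode == "40000":  # a directory, stored as another tree
            paths += list_files(oid, prefix + name + "/")
        else:
            paths.append(prefix + name)
    return paths


# Build blobs, then trees (entries already in sorted order), then the commit.
readme = put("blob", b"# app\n")
hello = put("blob", b"console.log('Hello, world!');\n")
notes = put("blob", b"some notes\n")
src = put("tree", entry("100644", "hello.js", hello) + entry("100644", "notes.txt", notes))
root = put("tree", entry("100644", "README.md", readme) + entry("40000", "src", src))
commit = put("commit", (
    f"tree {root}\n"
    "author Example Author <author@example.com> 1687392000 +0000\n"
    "committer Example Author <author@example.com> 1687392000 +0000\n"
    "\n"
    "Initial commit\n"
).encode())

# Start from the commit, read its "tree" line, then walk the trees.
tree_of_commit = store[commit][1].split(b"\n", 1)[0].split(b" ")[1].decode()
paths = list_files(tree_of_commit)
print(paths)  # ['README.md', 'src/hello.js', 'src/notes.txt']
```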

Bonus Questions:

1. Do you know why a blank folder in a directory doesn’t get pushed to GitHub?

Git does not store empty directories, so a blank folder never reaches GitHub when you push. Git's staging area (the index) tracks files and their content, not directories on their own; a directory exists in a commit only because some tracked file lives inside it. An empty directory contains no files, so there is nothing to record for it. Only directories that contain files, or other directories with files, are tracked and included in the repository. A common workaround is to put a placeholder file (by convention often named .gitkeep) in the directory.

2. Did you know that Git has a secret storage system that makes version control possible? How does Git’s enchanting content-addressable storage work its magic?

Git’s content-addressable storage system is a fundamental component that empowers its version control capabilities. It organizes data into three main types of objects: blobs, trees, and commits. (A fourth type, the tag object, is used for annotated tags.) Blobs hold the content of individual files, while trees maintain the structure of directories and file hierarchies. Commits capture snapshots of the project at points in its history.

The magic lies in the unique hash assigned to each object, derived from its content. This ensures data integrity and enables efficient storage. With this sorcery, Git enables seamless branching, merging, and tracking of changes.
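Both claims, data integrity and efficient storage, can be seen in a toy content-addressable store. The class below is a simplified Python sketch, not how Git is implemented internally: ContentStore is a hypothetical name, it keeps objects in a dictionary, and it only handles blobs.

```python
import hashlib


class ContentStore:
    """A toy key-value store whose keys are derived from the values."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def _key(content: bytes) -> str:
        # Same scheme Git uses for blobs: header + NUL + content, then SHA-1.
        return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

    def put(self, content: bytes) -> str:
        key = self._key(content)
        self._objects[key] = content  # identical content lands on the same key
        return key

    def get(self, key: str) -> bytes:
        content = self._objects[key]
        if self._key(content) != key:  # integrity check on every read
            raise ValueError("stored content does not match its key")
        return content

    def __len__(self) -> int:
        return len(self._objects)


demo = ContentStore()
first_key = demo.put(b"Hello, world!\n")
second_key = demo.put(b"Hello, world!\n")  # same content, stored once
print(first_key == second_key, len(demo))  # True 1

# Simulate corruption: the key no longer matches the content.
demo._objects[first_key] = b"tampered\n"
try:
    demo.get(first_key)
    corruption_detected = False
except ValueError:
    corruption_detected = True
print(corruption_detected)  # True
```

Deduplication falls out of the design for free: two files with the same content, in any commit or directory, share one stored object.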

In simpler terms, Git’s content-addressable storage system acts like a spellbook, preserving your code’s history, organizing files, and making version control a breeze. It’s the secret ingredient that keeps your projects in perfect harmony, no matter how complex they become.
