What Is the Git Object Model

Amit Prajapati
Published in MindOrks
9 min read · Sep 26, 2019

Today’s Motivation

“As long as you feel pain, you’re still alive. As long as you make mistakes, you’re still human. And as long as you keep trying, there’s still hope.”

Get started… 👩‍💻

What is a repository

A repository is simply a database containing all the information needed to retain and manage the revisions and history of a project. Within a repository, Git maintains two primary data structures: the object store and the index.

The object store is designed to be efficiently copied during a clone operation, as part of the mechanism that supports a fully distributed version control system (DVCS).

The index is transitory information that is private to a repository and can be created or modified on demand.

The SHA (Secure Hash Algorithm)

All the information needed to represent the history of a project is stored in files referenced by a 40-digit “object name” that looks something like this:

6ff87c4664981e4397625791c8ea3bbb5f2279a3

You will see these 40-character strings all over the place in Git. In each case, the name is calculated by taking the SHA-1 hash of the object's contents (prefixed with a short header giving its type and size). SHA-1 is a cryptographic hash function. What that means to us is that it is extremely unlikely for two different objects to end up with the same name. This has a number of advantages; among others:

  • Git can quickly determine whether two objects are identical or not, just by comparing names.
  • Since object names are computed the same way in every repository, the same content stored in two repositories will always be stored under the same name.
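To make the naming rule concrete, here is a minimal Python sketch of how an object name can be computed. It assumes Git's usual layout of a "<type> <size>" header, a NUL byte, and then the content; the function name git_object_name and the sample strings are made up for illustration.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    """Return the 40-character hex name Git would give this object."""
    # Git hashes a small header ("<type> <size>" plus a NUL byte)
    # followed by the raw content.
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always produces the identical name...
same_a = git_object_name("blob", b"hello world\n")
same_b = git_object_name("blob", b"hello world\n")
# ...while even a one-character change produces a different name.
different = git_object_name("blob", b"hello world!\n")

print(same_a)
print(same_a == same_b)
print(same_a == different)
```

For the content "hello world" followed by a newline, this prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the same name git hash-object reports for that content, which is why comparing names is enough to compare objects.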

Git Object Store

The object store contains your original data files and all the log messages, author information, dates, and other information required to rebuild any revision or branch of the project.

Every object consists of three things: a type, a size, and content. The size is simply the size of the content, and the content depends on the type of the object. There are four different types of objects: “blob”, “tree”, “commit”, and “tag”. Git stores all of these objects in .git/objects.

  • A “blob” is used to store file data; it is generally a file.
  • A “tree” is basically like a directory; it references a bunch of other trees and blobs (i.e. files and sub-directories).
  • A “commit” object holds metadata for each change introduced in the repository, including the author, committer, commit date, and log message.
  • A “tag” object assigns an arbitrary human-readable name to a specific object, usually a commit.

Different from SVN (Subversion)

It is important to note that this is very different from most SCM (Source Code Management) systems that you may be familiar with. Subversion and CVS store the differences between one commit and the next. Git does not do this — it stores a snapshot of all the files in your project each time you commit. This is a very important concept to understand when using Git.

Here we make examples of each of these object types in a new repository.

First, we make the working tree and initialize the repository with git init.

Git will inform us it has created a .git directory in our project’s directory, so let’s take a quick peek at what it looks like:

Some of these files and directories may sound familiar to you (particularly HEAD), but for now we will focus on the .git/objects directory, which contains no objects right now. We will change that in a moment.

Let’s create an “index.txt” and “README.md” file with some content.

Now, let’s stage and commit them:

Okay, nothing special here, adding and committing — we’ve all “been there, done that”.

If we take a look again at our .git directory we can see that the .git/objects directory has some subdirectories and files now:

(Note: directories and files will have different names on your computer)

We will get back to .git/objects, but for now notice that every directory name is two characters long. Git generates a 40-character SHA-1 checksum for every object; the first two characters of that checksum are used as the directory name and the remaining 38 as the file (object) name.
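The two-plus-38 split can be sketched in Python as follows. This is only an illustration under assumptions: it writes a zlib-compressed "loose" object into a temporary directory instead of a real .git/objects, and the helper name write_loose_object is made up for this example.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> Path:
    """Store an object the way a loose object is laid out on disk."""
    # Header and content are hashed together to get the object name.
    store = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    name = hashlib.sha1(store).hexdigest()
    # First two hex characters become the directory, the other 38 the file.
    path = objects_dir / name[:2] / name[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # The stored bytes are zlib-compressed.
    path.write_bytes(zlib.compress(store))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_loose_object(Path(tmp), "blob", b"hello world\n")
    print(path.parent.name)  # two-character directory name
    print(path.name)         # 38-character file name
```

Splitting on the first two characters keeps any single directory from filling up with every object in the repository.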

The first kind of object Git creates for the files we commit is the blob. In our case there are two of them, one for each file we committed.

Blob objects associated with our index.txt and README.md files.

They contain snapshots of our files (the content of our files at the time of the commit), and each is stored under its own checksum.

Note: I’ll use the command git cat-file to show the contents of the hashed files in .git/objects, but cat-file is a relatively obscure git command that you will probably not need in your daily git work.

git cat-file: provides content or type and size information for repository objects.

The git cat-file command has multiple options, but we will mainly focus on three: -t (show the object’s type), -s (show its size), and -p (pretty-print its content).
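To get a feel for what those three options report, here is a rough Python sketch that pulls the same three pieces of information out of a compressed object. It is not how git cat-file is implemented; parse_object is a hypothetical helper, and the sample object is built in memory rather than read from .git/objects.

```python
import zlib


def parse_object(compressed: bytes):
    """Split a zlib-compressed object into (type, size, content)."""
    raw = zlib.decompress(compressed)
    # Everything before the first NUL byte is the "<type> <size>" header.
    header, _, content = raw.partition(b"\x00")
    obj_type, size_text = header.decode().split(" ")
    size = int(size_text)
    if size != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, size, content


# A sample blob, built in memory for the example.
sample = zlib.compress(b"blob 12\x00hello world\n")

obj_type, size, content = parse_object(sample)
print(obj_type)                   # what -t would report
print(size)                       # what -s would report
print(content.decode(), end="")   # what -p would show for a blob
```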

Blob Object

Blob is an abbreviation for “binary large object”. Blobs are the first kind of object Git creates for our files, one for each file we committed: “index.txt” and “README.md”. A blob stores only the contents of a file, not its name. The listing of .git/objects gave us the hashes of the blobs for “index.txt” and “README.md”; each of these objects is of type “blob” and contains a snapshot of the file.

We can look at blob object representing (for example) index.txt with the cat-file command:

git cat-file blob 980a0d5f19a64b4b30a87d4206aade58726b60e3

and we see that it contains our index.txt file’s content.

From Git, we expect there will now be four objects in .git/objects:

  • Two blobs, storing the snapshots of the index.txt and README.md files.
  • One tree, storing the directory listing for the commit.
  • One commit, storing the commit message and metadata.

Tree Object

The next kind of objects git creates are tree objects. In our case, there is only one and it contains a list of all files in our project with a pointer to the blob object assigned to them (this is how git associates your files with their blob objects):

Tree object pointing to blob objects

The tree object contains one line per file or subdirectory, with each line giving file permissions, object type, object hash, and filename. An object type is usually one of “blob” for a file or “tree” for a subdirectory.

Let’s walk through the output of the command git cat-file -p 5fbbad8, which prints one line per entry in the tree. Each line has four parts:

  • The first part states the file’s mode, i.e. 100644. The leading 100 marks a regular file, and the trailing 644 is the permission: the owner of the file has read and write access, while group members and other users on the system only have read access. (By comparison, 600 would mean that the owner has read and write access while no other user can access the file.) Git itself only records 644 or 755 (executable) for regular files.
  • The second part states that the content of this entry is represented by a blob, rather than a tree.
  • The third part states the hash of the blob.
  • The fourth part states the filename.
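The four parts above can be sketched in Python. This is a simplified illustration, assuming the raw tree layout of "<mode> <name>", a NUL byte, and the 20-byte binary hash for each entry; the file contents are made up, and sorting by name alone only holds for a flat directory of files like ours.

```python
import hashlib


def object_name(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def build_tree(entries):
    """Serialize (mode, name, hex_hash) entries into raw tree content."""
    body = b""
    # Simplification: sort by name only (fine for files in one directory).
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_hash)
    return body


def parse_tree(body: bytes):
    """Read raw tree content back into (mode, name, hex_hash) entries."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        hex_hash = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, hex_hash))
        pos = nul + 21
    return entries


# Hypothetical file contents, just for the example.
index_blob = object_name("blob", b"Hello from index.txt\n")
readme_blob = object_name("blob", b"# Demo project\n")

tree = build_tree([
    ("100644", "index.txt", index_blob),
    ("100644", "README.md", readme_blob),
])

# Print the entries in a layout similar to `git cat-file -p <tree>`.
for mode, name, hex_hash in parse_tree(tree):
    kind = "tree" if mode == "40000" else "blob"
    print(f"{mode:0>6} {kind} {hex_hash}\t{name}")

print(object_name("tree", tree))  # the name of the tree object itself
```

Note that the tree holds the file names and modes, while the blobs hold only content; that separation is what lets two files with the same content share one blob.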

And finally, git creates a commit object that has a pointer to its tree object (along with some other information):

Commit Object

The commit object contains the directory tree object hash, parent commit hash, author, committer, date, and message.

git log will show us the hash of the commit:

Commit object points to its tree object

git cat-file -p shows the contents of the object associated with this hash.

As you can see, a commit is defined by:

  • A tree: The SHA1 name of a tree object, representing the contents of a directory at a certain point in time.
  • A parent(s): The SHA1 name of some number of commits which represent the immediately previous step(s) in the history of the project. The example above has no parent; merge commits may have more than one. A commit with no parent is called a “root” commit and represents the initial revision of a project. Each project must have at least one root.
  • An author: The name of the person responsible for this change, together with its date.
  • A committer: The name of the person who actually created the commit, with the date it was done. This may be different from the author; for example, if the author wrote a patch and emailed it to another person who used the patch to create the commit.
  • A comment describing the commit.

A commit is usually created by git commit, which creates a commit whose parent is normally the current HEAD, and whose tree is taken from the content currently stored in the index.
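Here is a small Python sketch of how those fields fit together in a commit object's content. It is an illustration only: build_commit is a hypothetical helper, the author identity and timestamp are invented, and the well-known empty-tree name is used so the example needs no files.

```python
import hashlib


def object_name(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def build_commit(tree, parents, author, committer, message) -> bytes:
    """Assemble commit content: header lines, a blank line, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # name of the empty tree
# Invented identity and timestamp, purely for illustration.
who = "A U Thor <author@example.com> 1569456000 +0000"

# A root commit has no parent line at all.
root = build_commit(EMPTY_TREE, [], who, who, "Initial commit")
root_name = object_name("commit", root)

# The next commit points back at the root through its parent line.
second = build_commit(EMPTY_TREE, [root_name], who, who, "Second commit")

print(root.decode())
print(second.decode())
```

Because the parent's name is part of the content that gets hashed, changing any earlier commit would change the name of every commit that follows it.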

Tag Object

A tag object contains an object name (called simply ‘object’), object type, tag name, the name of the person (“tagger”) who created the tag, and a message.

Git uses this object type for annotated tags, which are created with git tag -a. We don’t have one of these yet, so let’s make one:

This gives us a new object in .git/objects :

The object is of type “tag”:

The tag object contains the hash of the tagged object, the type of the tagged object (usually a commit), the tag name, tagger, date, and message:

Notice that the “object” the tag points to, via its hash, is the commit object, as we were expecting.
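As a rough sketch, the content of an annotated tag can be built and read back in Python like this. The helpers build_tag and parse_tag are hypothetical, and the target hash, tag name, and tagger below are placeholders rather than values from our repository.

```python
import hashlib


def object_name(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def build_tag(target, target_type, tag_name, tagger, message) -> bytes:
    """Assemble tag content: four header lines, a blank line, the message."""
    text = (
        f"object {target}\n"
        f"type {target_type}\n"
        f"tag {tag_name}\n"
        f"tagger {tagger}\n"
        f"\n{message}\n"
    )
    return text.encode()


def parse_tag(raw: bytes) -> dict:
    """Read the header fields and message back out of tag content."""
    head, _, message = raw.decode().partition("\n\n")
    fields = dict(line.split(" ", 1) for line in head.splitlines())
    fields["message"] = message.rstrip("\n")
    return fields


# Placeholder values, purely for illustration.
target = "0123456789abcdef0123456789abcdef01234567"
tagger = "A U Thor <author@example.com> 1569456000 +0000"

raw_tag = build_tag(target, "commit", "v1.0", tagger, "First release")
fields = parse_tag(raw_tag)

print(fields["object"])               # the tagged object's hash
print(fields["type"])                 # the tagged object's type
print(object_name("tag", raw_tag))    # the tag is an object with its own name
```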

Now, we’ll do another commit, this time let’s say we made some changes to our index.txt file and committed those changes:

Git creates a new blob object for the file that has changed.

As we see, Git has now created 3 new objects for the second commit: a blob with a new snapshot of index.txt, a new tree, and a new commit. Since README.md hasn’t changed, no new blob object is created for it; Git will reuse the existing one instead (we’ll see in a second how).

Now, when Git creates the tree object, the blob pointer assigned to index.txt is updated, and the blob pointer assigned to README.md simply stays the same as in the previous commit’s tree.

Pointer to index.txt blob is updated and a pointer to README.md blob stays the same
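The reuse falls out of content-based naming, which a few lines of Python can show. This is a sketch with invented file contents; snapshot is a hypothetical helper that maps each file name to the name of its blob, much like a flat tree does.

```python
import hashlib


def blob_name(content: bytes) -> str:
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict) -> dict:
    """Map each file name to the name of the blob holding its content."""
    return {name: blob_name(content) for name, content in files.items()}


# Invented contents: only index.txt changes between the two commits.
first = snapshot({
    "index.txt": b"first version\n",
    "README.md": b"# Demo project\n",
})
second = snapshot({
    "index.txt": b"second version\n",
    "README.md": b"# Demo project\n",
})

reused = [name for name in first if first[name] == second[name]]
changed = [name for name in first if first[name] != second[name]]

print(reused)   # files whose existing blob is shared by both snapshots
print(changed)  # files that needed a new blob
```

Each commit is still a full snapshot, yet an unchanged file costs nothing extra, because both trees point at the same blob.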

And at the end, git creates a commit object with a pointer to its tree object.

Commit object points to its tree and also has a pointer to its parent commit object

Every commit except the first one has at least one parent.

I hope you find this helpful. If you liked it then don’t forget to give 👏 and share it with your friends.

Thank you, Readers!
