Creating Git Repository From Scratch

Chandan Kumar
Mar 13 · 9 min read

Git is an essential part of every developer’s day-to-day life, but how many of us really know what goes on inside git under the hood? How does git perform a commit when you run git commit, or any other command?

But does it really matter what goes on under the hood, and why should one care?

  • As developers we should be aware of the internals of the tools we use, especially the ones we use very frequently.
  • Knowing the internals helps us do things such as creating a branch, resolving merge conflicts or solving other git-related issues more efficiently.
  • Knowing the internals gives us confidence when using commands, particularly when we think we have messed something up.

Is this article for you?

  • If you are new to git and its basic commands, I would suggest reading about them first to get an overview of what we’ll be talking about.
  • If you already know the basic commands like git commit, git add, git branch, git pull etc. and want to know how they work internally, you are good to go; this article is for you.

What’s coming next?

We’ll create a git repo from scratch without using the git init command. That will give us a sense of how git creates the .git directory and what it contains. We’ll also be able to relate many of the top-level commands to these internals as we go, for example what happens when we add something to the git staging area.

Let’s start

We’ll start by creating a directory which does not have git initialised yet.

We created a directory and ran git status in it. Git is clearly not happy with the command: it says this is not a git repository.

So, let’s first use the git init command and see what git does when we initialise a git repo.

Notice that once we initialise the repo, git creates a directory called .git for us. Now let’s explore what’s inside the .git folder.

At this point we can see that .git has two main directories inside it, “objects” and “refs”, each with some subdirectories of its own. There is one more directory called hooks, but we won’t worry about it for now.

There are also two files to notice, HEAD and config; we’ll look at what these are in a while.

Now let’s create a file and add it to git to see what it does.

We can see that after adding the file, git has created something inside “.git/objects” with an interesting-looking hexadecimal name. We have now seen the basic structure of .git. Having seen the initialisation done by git init, let’s delete .git and initialise the repo on our own.

Since we deleted the .git directory, this is no longer a git repository.

Unfold the “git init” magic

As we’ve already seen, a git repo must have a .git directory, and inside .git there are two important subdirectories called “objects” and “refs”. Let’s create them and see if git is happy with that.

Creating .git and its subdirectories didn’t help; git is still not happy. So let’s pause here for a moment and understand:

“What are these two subdirectories, and how are they useful to git?”

.git/objects

Let us think of git as maintaining a file system, specifically snapshots of the files over time. A file system consists of files, directories and subdirectories.

The objects directory is used to store three main kinds of objects:

  • blob
  • tree
  • commit

BLOB (Binary Large Object): A blob stores nothing but the content of a file. It differs from a typical file because it contains no metadata (not even the file name), only the content. Every blob is identified by its “sha-1” hash.
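To make “identified by its sha-1 hash” concrete, here is a small Python sketch of how a blob id can be derived from content alone. It assumes the default SHA-1 object format; blob_id is a name made up for this example and is not part of git.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the id a blob with this content gets (SHA-1 object format)."""
    # The hashed bytes are the header "blob <size>\0" followed by the content.
    # No file name or permission bits are involved.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known id:
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Assumption: the article's content was piped in with a trailing newline.
print(blob_id(b"git is cool\n"))
```

Because only the content goes into the hash, two files with different names but identical bytes end up with the same blob id.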

TREE: The git equivalent of a directory is a tree. It is a directory listing that stores pointers to blobs and other trees, and it is also identified by a “sha-1” hash.

Figure: a directory-like tree structure

Looked at closely, the figure represents a directory structure in the form of trees and blobs.
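As a rough illustration of what a tree holds, the Python sketch below serialises a directory listing the way a tree object stores it: one entry per name, each made of a mode, the name, and the raw 20-byte id of the blob or subtree. The helper names are hypothetical, and the SHA-1 object format is assumed.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object body together with its "<kind> <size>\\0" header."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialise (mode, name, hex_id) entries into a tree object body."""

    def sort_key(entry):
        mode, name, _ = entry
        # Entries are ordered by name; subtrees (mode 40000) compare as if
        # their name ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return body


# A tree with a single file, cool.txt, pointing at its blob.
blob = object_id("blob", b"git is cool\n")
body = tree_body([("100644", "cool.txt", blob)])
print(object_id("tree", body))
```

Note that the file name lives in the tree entry, not in the blob, which is why a blob on its own has no name.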

COMMIT: In git, a snapshot of the working tree is called a commit. It stores a reference to the top-level tree along with some metadata such as the author name, time, commit message and the parent commit, if any. It is also identified by a “sha-1” hash.
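The commit description above can be sketched in Python as plain text that names a tree, optional parents, the author and committer, and then the message. The function name, the author identity and the timestamp below are placeholders invented for the example; the tree id used is the well-known id of the empty tree.

```python
import hashlib


def commit_body(tree_id, message, author, timestamp, tz="+0000", parents=()):
    """Build the text of a commit object (a sketch, not git's own code)."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]  # a first commit has no parent
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
body = commit_body(
    EMPTY_TREE,
    "first commit",
    "A U Thor <author@example.com>",  # hypothetical identity
    1700000000,                       # hypothetical timestamp
)
commit_id = hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()
print(body.decode())
print(commit_id)
```

Since the author, time and parent are part of the hashed text, the same tree committed twice at different times gets two different commit ids.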

A commit stores an entire snapshot of the working tree and not just a diff between commits. Doesn’t that mean it has to store a lot of data?

To solve that problem, git only creates a new blob if the content of a file has changed; otherwise it reuses the same sha to point to the existing blob object. For example, if we create 100 files with the same content, git will create only a single blob object and make every file refer to it.
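The 100-files example can be tried out with a toy content-addressed store in Python. This is only an in-memory model of the idea (BlobStore is a made-up class); real git writes the objects to disk.

```python
import hashlib


class BlobStore:
    """Toy model of a content-addressed store: blob id -> content."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> str:
        oid = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        # Identical content hashes to the same id, so it is stored only once.
        self.objects.setdefault(oid, content)
        return oid


store = BlobStore()
# A "snapshot" maps each file name to the id of the blob with its content.
snapshot = {f"file{i}.txt": store.add(b"same content\n") for i in range(100)}
print(len(snapshot), "files,", len(store.objects), "blob")  # 100 files, 1 blob
```

The snapshot still lists all 100 names, but every name points at the same stored blob.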

.git/refs or .git/refs/heads

We probably already know about git branches. When we create a new git repository nowadays, we get master or main as the default branch.

A branch is nothing but a named reference to a commit. We could certainly refer to a commit directly, but as humans we are not comfortable remembering sha ids, hence we give it a name.

The .git/refs/heads directory contains information about branches. Each branch is nothing but a file named after the branch, with the id of the commit it refers to as its content. Since we can have many branches, there can be multiple files inside .git/refs/heads.

Now that we know about objects and refs, let’s create a default branch called main inside refs/heads.

Oh, git is still not happy, it seems. What else is missing?

We know we can have multiple branches at any given point, so the question arises: “How does git know which branch is the current one?”

To solve that problem, git has a file called HEAD at the top level of .git, which we have seen already. It stores a reference to the current branch. So let’s add a HEAD file which points to the main branch.

Wow! Git seems to be happy this time; it now points to the main branch.
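Putting the pieces so far together, the skeleton we built by hand can be sketched in Python as two directories plus a HEAD file. The function names are invented for this example, and the sketch writes into a temporary directory. Unlike our manual steps it does not create the branch file itself, which matches what git init leaves behind before the first commit.

```python
import tempfile
from pathlib import Path


def init_repo(root: Path, branch: str = "main") -> Path:
    """Create the minimal .git layout described above (a sketch only)."""
    git_dir = root / ".git"
    (git_dir / "objects").mkdir(parents=True, exist_ok=True)
    (git_dir / "refs" / "heads").mkdir(parents=True, exist_ok=True)
    # HEAD is a symbolic reference naming the current branch.
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
    return git_dir


def current_branch(git_dir: Path):
    """Return the branch HEAD points to, or None if HEAD is not symbolic."""
    head = (git_dir / "HEAD").read_text().strip()
    prefix = "ref: refs/heads/"
    return head[len(prefix):] if head.startswith(prefix) else None


with tempfile.TemporaryDirectory() as tmp:
    git_dir = init_repo(Path(tmp))
    print(sorted(p.name for p in git_dir.iterdir()))  # ['HEAD', 'objects', 'refs']
    print(current_branch(git_dir))                    # main
```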

What do we do now?

We are now ready to create a commit, and we’ll do that using some more internal commands. For the time being we’ll remove the cool.txt file to keep everything clean.

Our .git structure looks better now:

Creating git commit

The hash-object command takes an input, in this case from stdin (--stdin), and generates a blob id for it; when given the -w flag it also writes the blob into the objects directory.

Using git hash-object we’ve added a blob with the content “git is cool”. Notice that we haven’t given it a file name; it’s just the content. We can also see that it has created a file in the objects directory. If we try to cat this file it shows garbage, because the object is stored compressed, but git has a command, git cat-file, which solves the problem by printing the type (-t) or the content (-p) of an object.

Notice that the object type is blob, and that the two characters of the directory name followed by the file name make up the complete sha. Git creates these directories for easy lookups.
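The “garbage” we saw is zlib-compressed data. The Python sketch below writes and reads a loose object the same way: compress header plus content, then store it under a directory named after the first two hex characters of the id. The function names are made up here, and the SHA-1 object format is assumed.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_object(objects_dir: Path, kind: str, body: bytes) -> str:
    """Store an object as a zlib-compressed loose file and return its id."""
    store = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    oid = hashlib.sha1(store).hexdigest()
    # First two hex characters name the directory, the other 38 the file.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return oid


def read_object(objects_dir: Path, oid: str):
    """Return (type, content) of a loose object, roughly what cat-file shows."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    assert int(size) == len(body), "size in header must match the content"
    return kind.decode(), body


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / ".git" / "objects"
    oid = write_object(objects, "blob", b"git is cool\n")
    print(oid[:2], oid[2:])
    print(read_object(objects, oid))
```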

Now that we have a blob, we can add it to our staging area (the index).

We’ve used the git update-index command, which git uses internally when we say git add. We can see that git thinks the file cool.txt is deleted, because the root directory does not have this file. When we updated the index we also gave the file name cool.txt with the mode 100644; for now we’ll not worry about the permissions, only the name.

Let’s dump the content of the blob into a new file called cool.txt and see how git behaves then.

Now that the file exists, git seems happier and greener. We are at a stage where we can make a commit, but before that we need to make a tree containing this blob, because we can only commit a tree. Let’s do that using the git write-tree command, which picks up everything we’ve staged and creates a tree out of it.

The tree has now been created, and we can see that it points to a blob along with its file name (cool.txt).

We’re all set to make a commit; let’s do that using the git commit-tree command.

By using git commit-tree and passing a tree id and a message, we’ve made a commit. Committing a tree gives us a commit id. When we cat-file its type we can see that it’s a commit object, and its content has the tree we passed in along with the message; the author is picked up from the local git config.

At this point, if we look at the structure of the .git folder, we can see three objects lying there: a blob, a tree and a commit.

Let’s now run git log and see whether our commit shows up in the log.

Oh no! Where did our commit go?

Remember that when we created the branch using the touch command, we didn’t specify the commit id it should refer to. That means the main branch does not know where it needs to point and fetch commits from.

To solve that problem we need to assign the commit id to the main branch.

Yay! Having assigned the commit id to main, the branch now points to this commit, and git log starts working.
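The fix above boils down to writing the commit id into the branch file. A Python sketch of that step, and of how HEAD is then followed to a commit, could look like this. The function names and the commit id are placeholders, and the sketch ignores packed refs, which real git also consults.

```python
import tempfile
from pathlib import Path


def update_ref(git_dir: Path, ref: str, commit_id: str) -> None:
    """Point a ref such as refs/heads/main at a commit."""
    path = git_dir / ref
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(commit_id + "\n")


def resolve_head(git_dir: Path):
    """Follow HEAD to a commit id; None if the branch has no commit yet."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit id directly
    ref_file = git_dir / head[len("ref: "):]
    if not ref_file.exists():
        return None
    return ref_file.read_text().strip() or None  # an empty file means no commit


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").touch()  # empty branch file, as with touch
    print(resolve_head(git_dir))                   # None

    placeholder_commit = "0123456789abcdef0123456789abcdef01234567"  # hypothetical id
    update_ref(git_dir, "refs/heads/main", placeholder_commit)
    print(resolve_head(git_dir) == placeholder_commit)  # True
```

This is the same chain git log walks first: HEAD names the branch, the branch file names the commit, and the commit names its tree and parents.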

We’ve now come to the end: we have created a git repository and made a commit without using any of the top-level commands.

I hope you now know a bit more about how git works internally. If you want to know more about the commands I’ve used in this article, you can refer to my previous article on git, Knowing Git Inside-Out.

Thanks for reading :)
