Analytics Vidhya

Analytics Vidhya is a community of Generative AI and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Git Part 3: Discover the .git Folder

Have you ever wondered what the hell the .git folder is for? You haven’t? No problem. Today we’re going to look at it in detail.

Photo by Thom Holmes on Unsplash

Along the way, we will get an overview of the storage system and understand why the phrase ‘a branch is just a pointer’ makes sense.

Agenda

  • .git what???
  • Structure of the .git folder
  • Conclusion

.git what???

You probably already know that if you want to create a git repository, you have to type git init in your terminal. But did you also know that a new .git folder is automatically generated at that moment? When you look at your project directory, you will find it. Because its name starts with a dot, it is hidden by default on most systems.

Git stores everything there. If we look inside, we can see many files and folders. It may seem confusing at first glance, but don’t worry, we’ll examine everything step by step.
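If you want to try this yourself, here is a minimal sketch. It assumes git is installed and uses a throwaway temporary directory (the variable name repo is just an illustration), so it does not touch any of your real projects.

```shell
# Create a throwaway repository (the temporary path is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q

# .git is hidden, so a plain `ls` would not show it; -a lists hidden entries too.
ls -a

# Peek inside the freshly created .git folder.
ls .git
```

The exact listing can vary slightly between Git versions, but entries such as HEAD, config, objects and refs should be there.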

Structure of the .git folder

First of all, your folder might contain a few more files and folders than the ones shown above, because Git adds more of them as you work with the repository. The ones shown above are those that usually appear right after creating the git repository. So let’s start with the objects directory.

objects

When we open it, we only see an info and a pack folder. But later many more directories will be added. Git stores the content of every staged file in it.

But that is not the only thing. Later we will look at two more types of objects that are stored in the objects folder. Overall, we can say that this folder is Git’s object database. Go back to the project folder and enter the following lines in the console:

echo 'test' >> test.txt
git add test.txt

First, we create a new file test.txt with the content test (the >> operator appends to the file and creates it if it does not exist yet). Next, we add it to the staging area.

Not sure what the staging area is? Take a break and read my previous article ‘Git the three worlds system’. There I briefly explain the three different areas in Git.

Now we can go back into the .git/objects folder.

You will see a new directory. Git identifies every object by a 160-bit SHA-1 hash value, which is written as a 40-character hexadecimal number. You’re probably wondering, “Wait a minute. I can only see two hexadecimal characters. Where is the rest?” Git uses the first two characters of each hash as the name of a folder, and that folder contains a file named with the remaining 38 characters. So at first you only see the first two characters of each hash. Git does this so that no single directory ends up holding too many files, which keeps lookups fast. But the whole 40 characters are the key, with the (compressed) contents of the file being the value. To examine the object behind a hash, we can type:

git cat-file -p 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
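To check the split into a two-character folder and a 38-character file name yourself, a sketch like the following may help. It assumes git is installed and works in a throwaway temporary directory; the variable names (repo, hash, dir, file) are only illustrative.

```shell
# Recreate the example in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'test' >> test.txt
git add test.txt

# Ask Git which hash it assigns to the content of test.txt.
hash=$(git hash-object test.txt)

# The first two characters name the directory, the remaining 38 name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)

echo "object path: .git/objects/$dir/$file"
ls ".git/objects/$dir"
```

Using git hash-object saves you from copying the hash out of the directory listing by hand.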

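In git cat-file, the -p option pretty-prints the content of the object. In our case, you only see “test”. A blob’s hash is computed from the file content alone, so if your test.txt contains exactly the same bytes, you should see the same hash. Commit hashes also cover metadata such as the author and the timestamp, so those will be different for you. If we want to link the file to the local git history, we need to make a commit. Let’s do that.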

git commit -m "feat: Create a test.txt file with the content test"

With that, we have created our first commit. Change back to the .git/objects folder and type ls -la. You should have the same number of directories as below.

Two new folders were created. But why two new directories? We only made one commit. Let’s analyze them and see what type each object is. To get the type of an object, we replace -p with -t in the git cat-file command.

# our tracked changes
git cat-file -t 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
# blob

# new commit
git cat-file -t 2b297e643c551e76cfa1f93810c50811382f9117
# tree
git cat-file -t b9ca915ed5e9507d44dbfaebc8a64b0f2ba52649
# commit

Now we can see that the 2b… hash stores something called a tree and the b9… hash stores something called a commit. To get a more detailed understanding, we would have to dive deeper into the storage system of git. But I think that would be too much for now. We will discuss this in more detail in my next article. For the moment, to get a better orientation, we can say: the blob stores only the content. The tree stores additional information, such as which file the content belongs to and what kind of file it is. Finally, the commit anchors the changes in the history.
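If you would like to look at the tree and the commit without copying hashes around, here is a small sketch. It assumes git is installed, uses a throwaway directory, and passes a placeholder identity (Example User) so the commit works even without a global configuration. HEAD and HEAD^{tree} are standard Git revision names for the current commit and its tree.

```shell
# Recreate the example commit in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'test' >> test.txt
git add test.txt
# The identity below is a placeholder, not a real user.
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m "feat: Create a test.txt file with the content test"

# HEAD resolves to the new commit, HEAD^{tree} to the tree it points to.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
tree_listing=$(git cat-file -p 'HEAD^{tree}')

echo "$commit_type"
echo "$tree_type"

# Each tree line lists: file mode, object type, hash, file name.
echo "$tree_listing"

# The commit names its tree, the author, the committer and the message.
git cat-file -p HEAD
```

The tree listing is where the file name test.txt shows up, which is exactly the information the blob itself does not carry.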

refs

Below you can see the structure of the refs folder. In it, we find two subdirectories. The heads directory contains all branches and the tags directory contains all tags, which work like bookmarks in the history.

heads
We will start with the heads directory. When we look inside, we’ll find a file called master. It contains a reference to the last commit -> b9ca915ed5e9507d44dbfaebc8a64b0f2ba52649.

But what does that mean? Branches are an important part of git. Each branch can move forward independently of all the others.

If you want to learn more about branches, read my git fundamentals article.

As soon as we create a new branch called second_branch and look into the heads directory, we will find a second file with the same name as our new branch.

The file contains exactly the same reference as the file master. Now we create a new commit in our second_branch and display the contents of both files again.

master
---------
b9ca915ed5e9507d44dbfaebc8a64b0f2ba52649
second_branch
-----------
c225e0c7175b7467eb6cc5f283413b2eee027ff3

The branch master still has the reference -> b9ca915ed5e9507d44dbfaebc8a64b0f2ba52649, whereas second_branch now has a reference to our new commit.

In summary, when you create a branch, a new file with the same name is automatically created in the heads directory, containing a reference to the current commit. Each time you create a new commit on that branch, the reference that the file contains is updated.

So, branches are just pointers.
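The whole experiment can be replayed with a sketch like the one below. It assumes git is installed and that the repository stores its branches as plain files under .git/refs/heads, which is what a freshly created repository normally does. The identity is a placeholder, and because the default branch may be called master or main depending on your configuration, the script asks Git for its name.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'test' >> test.txt
git add test.txt
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m "first commit"

# The default branch may be master or main, depending on configuration.
default_branch=$(git symbolic-ref --short HEAD)

# A new branch is just a new file holding the current commit hash.
git branch second_branch
before=$(cat .git/refs/heads/second_branch)

# Commit on second_branch; only its file should change.
git checkout -q second_branch
echo 'more' >> test.txt
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -a -m "second commit"

after=$(cat .git/refs/heads/second_branch)
default_ref=$(cat ".git/refs/heads/$default_branch")

echo "$default_branch: $default_ref"
echo "second_branch before: $before"
echo "second_branch after:  $after"
```

The first two printed hashes should match, while the third one points to the new commit.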

tags
There are two different types of tags. Both get an entry in the tags directory. The first type is the lightweight tag, which is only a reference to a commit, like the files in the heads directory. The second type is the annotated tag, which is stored as a full object in the object database and carries much more information. Below is a short list:

  • tagger name
  • email
  • date
  • tagging message

In most situations, it’s recommended to use annotated tags, for example for releases. Lightweight tags are fine for private or temporary labels.
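To see the difference between the two kinds of tags, you could try a sketch like this one. It assumes git is installed and that tags are stored as plain files under .git/refs/tags, as in a freshly created repository. The tag names v0.1-light and v0.1 and the identity are made up for illustration.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'test' >> test.txt
git add test.txt
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m "first commit"

# Lightweight tag: only a reference to the commit.
git tag v0.1-light

# Annotated tag: a separate tag object with tagger, date and message.
git -c user.name='Example User' -c user.email='user@example.com' \
    tag -a v0.1 -m "First tagged version"

# Both tags have a file in the tags directory.
ls .git/refs/tags

# The files point to different kinds of objects.
light_type=$(git cat-file -t "$(cat .git/refs/tags/v0.1-light)")
annotated_type=$(git cat-file -t "$(cat .git/refs/tags/v0.1)")
echo "v0.1-light -> $light_type"
echo "v0.1       -> $annotated_type"

# Show the extra information stored in the annotated tag.
git cat-file -p v0.1
```

The lightweight tag points straight at the commit, while the annotated tag points at a tag object that in turn refers to the commit.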

HEAD

HEAD is quickly explained. It’s a reference to the active branch, so git knows which branch is currently in use. To see what the HEAD file contains, type cat HEAD inside the .git folder.
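A quick way to watch HEAD change is sketched below. It assumes git is installed; the branch name second_branch and the identity are placeholders, and the name of the default branch depends on your configuration.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'test' >> test.txt
git add test.txt
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m "first commit"

# HEAD holds a line of the form "ref: refs/heads/<branch name>".
head_before=$(cat .git/HEAD)
echo "$head_before"

# Switching branches rewrites the HEAD file.
git checkout -q -b second_branch
head_after=$(cat .git/HEAD)
echo "$head_after"
```

So HEAD does not store a commit hash here; it stores the path of the branch file that does.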

info

In the info directory, we find additional information about our repository. One of the best-known files is the exclude file. It holds patterns for files that git should ignore in this repository only, and it is not committed or shared with others. To define ignored files and folders for everyone working on the project, we use a file in our project called .gitignore.
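Here is a small sketch of the exclude file in action. It assumes git is installed; the pattern *.log and the file names are made up for the example.

```shell
# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Add a local ignore pattern; mkdir -p is only a safeguard in case
# the info directory was not created by git init.
mkdir -p .git/info
echo '*.log' >> .git/info/exclude

# One file that matches the pattern and one that does not.
echo 'noise' > debug.log
echo 'test' > test.txt

# Only test.txt should be reported as untracked.
status=$(git status --porcelain)
echo "$status"

# check-ignore reports which pattern, from which file, matched.
git check-ignore -v debug.log
```

Since .git/info/exclude lives inside the .git folder, the pattern never ends up in a commit.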

config

The configuration file, as its name suggests, stores the configuration of your repository. You can define configuration globally for all repositories or locally for a single one. The config file inside the .git folder applies only to this local repository. Below are several settings that you can put in your configuration file. To see a complete list, visit the git configuration page.

  • user.name
  • user.email
  • core.editor
  • core.excludesFile
  • help.autocorrect
description

Here you can create a short description of your repository. But it’s pretty irrelevant if you’re not using gitweb.

hooks

Hooks are scripts that git executes on certain events. In the hooks directory, we can see a few examples. One such event is the completion of the entire commit process.

Above is a sample list of hooks from my tutorial repository. Each of them has the suffix ‘.sample’ appended to its name. This ensures that all hooks are disabled at the beginning. If you want to enable a hook, you just have to remove the ‘.sample’ suffix and make sure the file is executable. You can also edit or write your own git hooks. For more information, see the git documentation on hooks.
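To see a hook fire, you could write a tiny one of your own, as in this sketch. It assumes git is installed and that scripts in the temporary directory may be executed. The marker file hook-ran.txt and the identity are made up for the example; post-commit is one of the standard hook names and runs after a commit has been created.

```shell
# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write a minimal post-commit hook (no .sample suffix, so it is active).
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Runs after a commit has been created; here it only leaves a marker file.
echo "commit finished" > hook-ran.txt
EOF
chmod +x .git/hooks/post-commit

# Create a commit, which triggers the hook.
echo 'test' >> test.txt
git add test.txt
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m "first commit"

# The marker file was written by the hook, not by us.
cat hook-ran.txt
```

Hooks live inside the .git folder, so they are not part of your commits and every clone has to set them up again.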

Conclusion

That’s it. Today we discovered the .git folder together. By now, you should know which files and folders you typically find in the .git directory and what they do. Additionally, we got an overview of the git storage and understood why the phrase ‘a branch is just a pointer’ makes sense. I hope the article was helpful in getting a deeper understanding of Git and the .git folder. If you have any questions or feedback, please let me know in the comments.

In my next article, we will look at how the storage system of git works. Well then, see you soon.

Published in Analytics Vidhya

Written by Henry Steinhauer

Passionate software developer who enjoys exploring new programming languages, design patterns and frameworks.
