Git Without Fear for Human Beings

Henrique Mota
Published in Full-Stack tips
Jun 13, 2018

If you work on a development team like I do, you know that git becomes a bigger part of your daily life than any single programming language.

that fear of committing time.

Many developers fear the moment when they have to push their changes to the server. I’ve been there, and it is perfectly understandable. For someone who was used to svn or cvs and starts using git reactively, just knowing a bunch of commands doesn’t reduce the stress levels.

Hopefully, after reading this article, you will be able to enjoy programming again without fearing the moment when you have to push your code to the server.

Seeing the big picture of version control

It all starts with a commit, because the commit is the atomic point of interaction with git. When you create a commit, git stores a version of your project under the hood, so that you can later perform operations starting from that stored version.

So let’s see this in practice, starting from scratch:

> mkdir demo-project
> cd demo-project
> git init
> echo "A simple file" > A
> echo "Another file" > B
> git add A B
> git commit -m "First commit"
> cat A
A simple file
> cat B
Another file

We created our first commit, containing two files with a message “First commit”. This is the base version of our project, since it’s our first commit.

Let’s build another version by committing changes to these files:

> echo "Another line" >> A # >> appends a new line to our file
> echo "Another line" >> B
> git add A B
> git commit -m "Second commit"
> cat A
A simple file
Another line
> cat B
Another file
Another line

So we committed our second version on top of the previous commit and, just like in a database, we inserted another “record” in the table.

Yes, this is nothing more than a database, and if it’s a database there is a way to perform “queries” on our commits, right?

Right. First of all, let’s list the commits in our version “database” with git log and a couple of options for readability’s sake:

> git log --pretty --graph --oneline

If you want to add this to your git aliases:

> git config --global alias.lg 'log --pretty --graph --oneline'
# now you can use:
> git lg

Our git log will output the following:

* 720ae00 (HEAD -> master) Second commit
* 347c154 First commit

As expected, we have two commits, each with a hash ID, and as you may notice we have (HEAD -> master) on the second commit.

master, as you may know, is the branch you are currently on.

HEAD is just a pointer to the checkout point. Our current checkout point is the master branch, which is signalled by the -> after HEAD. Let’s check out the first commit:

> git checkout 347c154
> git lg --all
* 720ae00 (master) Second commit
* 347c154 (HEAD) First commit

Performing a git checkout moves the HEAD pointer to a new commit, and consequently that commit’s state is reflected in your project folder.

As you can see, our HEAD is now pointing to our first commit, in a detached state. Being in a detached state only means that HEAD is not pointing to a branch.
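If you want to check which state HEAD is in without reading the log, here is a minimal sketch. It assumes a throwaway repository created with mktemp and a placeholder identity ("Demo User"), neither of which is part of the demo project above, and it drops the > prompts so it can be pasted as a script. The branch name is read from git rather than hard-coded, since your default may be master or main.

```shell
set -eu
# Throwaway repository in a temporary directory; the identity is a placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "A simple file" > A
echo "Another file" > B
git add A B
git commit -q -m "First commit"
echo "Another line" >> A
echo "Another line" >> B
git add A B
git commit -q -m "Second commit"

branch=$(git symbolic-ref --short HEAD)      # attached: HEAD names a branch
first=$(git rev-list --max-parents=0 HEAD)   # hash of the root (first) commit

git checkout -q "$first"                     # detach HEAD at the first commit
if git symbolic-ref -q HEAD >/dev/null; then
  state="attached"
else
  state="detached"                           # HEAD holds a commit hash directly
fi
head_now=$(git rev-parse HEAD)
echo "HEAD is $state at $head_now"

git checkout -q "$branch"                    # re-attach HEAD to the branch
echo "HEAD is back on $(git symbolic-ref --short HEAD)"
```

git symbolic-ref succeeds only while HEAD points to a branch, which makes it a handy test for the detached state.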

But what is a branch?

Well, conceptually a branch is any deviation in the git database. As I said previously, git is just a database of commits built on top of each other, and this forms a directed acyclic graph. If you want to know more about graphs, check out Vaidehi Joshi’s amazing article on the topic.

scenario where we have 3 branches

In our previous example we only had one branch, the master branch. In the previous figure we see three branches, because the history deviates at two points: commit 2 and commit 3.

How does git store all this? The shocking truth is that a branch is just a reference (a pointer) to a commit. If you think about it, since this is a graph, knowing the last commit is enough to know all its ancestors. But this is only the start.
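To see that a branch really is just a pointer, here is a small sketch. The temporary directory and the "Demo User" identity are placeholders of mine, and the > prompts are dropped so the block can run as a script; branch-2 is simply a name for the new pointer.

```shell
set -eu
# Throwaway repository in a temporary directory; the identity is a placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "A simple file" > A
echo "Another file" > B
git add A B
git commit -q -m "First commit"
echo "Another line" >> A
echo "Another line" >> B
git add A B
git commit -q -m "Second commit"

branch=$(git symbolic-ref --short HEAD)
tip=$(git rev-parse "$branch")               # the commit the branch points to

# Creating a branch only adds another pointer; no commit is copied.
git branch branch-2
git for-each-ref --format='%(refname:short) -> %(objectname)' refs/heads

# From the tip alone, git can walk back to every ancestor.
ancestors=$(git rev-list --count branch-2)
echo "branch-2 reaches $ancestors commits"
```

Both branches print the same hash. With git's default ref storage, a freshly created branch is a small file under .git/refs/heads holding that hash, although git may later pack refs into a single file.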

how git represents a commit

Besides being a reference, a branch is also a scope: whenever HEAD points to a branch and you make a commit, HEAD, and consequently the branch, automatically moves to the new commit. git reset, on the other hand, does the same thing backwards: it moves the current branch to whichever commit you name, typically an older one.

So:

> git checkout branch-2 # our HEAD is pointing to branch-2
> git reset --hard <hash-commit-7> # reset HEAD -> branch-2 to commit 7

So we will end up with this graph:

git graph after a branch-2 reset to commit 7

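Do you see how branch-2 is now pointing to commit 7? But wait, commit 8 is still there. No, I didn’t forget commit 8 when I made this figure. Git commits aren’t erased by a reset, so you don’t lose any committed version when you do this: the commit stays in the database and can still be reached by its hash until git eventually garbage-collects unreachable objects. Uncommitted changes in your project folder are a different story, since --hard discards them.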
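You can try this safely in a sandbox. The sketch below assumes a temporary directory and a placeholder identity ("Demo User"), and uses the two-commit demo project instead of the eight-commit figure, with the > prompts dropped. It moves the branch forward with a commit, backwards with a reset, and then checks that the "lost" commit is still stored.

```shell
set -eu
# Throwaway repository in a temporary directory; the identity is a placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "A simple file" > A
echo "Another file" > B
git add A B
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)
first=$(git rev-parse "$branch")

echo "Another line" >> A
echo "Another line" >> B
git add A B
git commit -q -m "Second commit"
second=$(git rev-parse "$branch")            # the branch moved forward

git reset -q --hard "$first"                 # move the branch backwards
after_reset=$(git rev-parse "$branch")

# The second commit is off the branch but still in the object database.
kind=$(git cat-file -t "$second")
echo "$second is still a $kind"

git reset -q --hard "$second"                # point the branch at it again
restored=$(git rev-parse "$branch")
```

If you did not write the hash down, git reflog lists the commits HEAD pointed to recently.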

In the next topic we are going to see something that you are not used to seeing every day. Let’s use some out-of-the-box commands to understand what is happening under the hood.

Going behind the git atomic point of interaction — the commit

So we already saw that git is nothing more than a database of several commits, but the question is:

How the hell can a commit be materialised into a version of your project?

It’s not something you have to worry about, but knowing how it works will give you one more reason to trust git:

Let’s go back to our scenario where we had two commits:

> git lg
* 720ae00 (HEAD -> master) Second commit
* 347c154 First commit

Now I am going to use a command that may be new to you, git cat-file. It has these options:

  • -p # pretty print
  • -t # type
  • -s # size

Let’s see the type of our first commit:

> git cat-file -t 347c154
commit

As expected it’s a commit.

I hope you are excited, because right now we are going to see the content of a commit, an aha moment:

> git cat-file -p 347c154
tree 99d89c723829ec2352809c52e507b2a46119a948
author **** <****@gmail.com> 1528907497 +0100
committer **** <****@gmail.com> 1528907497 +0100
First commit

Hmm, very curious: we have a tree, an author, a committer and the commit message. I’m intrigued by the tree, so let’s confirm that it really is a tree:

> git cat-file -t 99d89c723829ec2352809c52e507b2a46119a948
tree

Now let’s see the content:

> git cat-file -p 99d89c723829ec2352809c52e507b2a46119a948
100644 blob d34fd419c2aed7ceee1247760bb2b951df961959 A
100644 blob b0b9fc8f6cc2f8f110306ed7f6d1ce079541b41f B

Cool, we see that the tree is actually the tree of our files A and B. Each entry has the file mode (type and permissions), the object type, the ID of the blob and the name of the file.
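If you prefer not to copy hashes by hand, the same walk from commit to tree entries can be scripted. This is only a sketch: the temporary directory and the "Demo User" identity are placeholders, the > prompts are dropped, and the variable names are mine.

```shell
set -eu
# Throwaway repository in a temporary directory; the identity is a placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "A simple file" > A
echo "Another file" > B
git add A B
git commit -q -m "First commit"

# The first line of a commit object names its tree.
tree_id=$(git cat-file -p HEAD | sed -n '1s/^tree //p')

# Each tree entry is: mode, object type, object ID, then the file name.
git cat-file -p "$tree_id" | while read -r mode type hash name; do
  echo "name=$name type=$type mode=$mode id=$hash"
done

# git ls-tree prints the same entries straight from a commit.
names=$(git ls-tree --name-only HEAD | tr '\n' ' ')
echo "files in the tree: $names"
```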

Let’s see the content of the blob associated with the name A:

> git cat-file -p d34fd419c2aed7ceee1247760bb2b951df961959
A simple file

A blob is a snapshot of a file’s content.
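Where does the blob ID come from? It is a hash of the content prefixed with a small header, which you can reproduce by hand. The sketch below assumes a repository using git's default SHA-1 object format and that the sha1sum tool is available (on macOS, shasum plays the same role); the temporary directory is a placeholder and the > prompts are dropped.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
echo "A simple file" > A

by_git=$(git hash-object A)                  # the ID git would give this blob

# By hand: SHA-1 over the header "blob <size in bytes>", a NUL byte,
# and then the file content.
size=$(($(wc -c < A)))
by_hand=$( { printf 'blob %s\000' "$size"; cat A; } | sha1sum | cut -d' ' -f1 )

echo "git:     $by_git"
echo "by hand: $by_hand"
```

Since the ID depends only on the content, and not on the file name or the commit, two files with the same content always share one blob.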

Congratulations, we traversed a git commit all the way through to the blob, and now you know how git stores its objects. By the way, all this magic is stored in the .git folder of your project.

But wait, how does a commit know where it comes from? So far we have only seen the first commit, the starting point of our project. Let’s see the contents of the second commit:

> git cat-file -p 720ae00
tree e30ead0c3157e6e2a355e49cdb76ec67f6542d21
parent 347c154abd66d914ea067f749aabb23cd1b07c77
author *** <***@gmail.com> 1528907929 +0100
committer *** <***@gmail.com> 1528907929 +0100
Second commit

So in the second commit we have something we didn’t have in the first: a parent reference. Every commit except the first one stores a reference to its parent, and some special commits, the “merge commits”, store two or more parents.

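We can also see that, although we didn’t make any change to the structure of our project tree, the tree ID changed. This is because every object in git is immutable and identified by a hash of its content: changing a file produces a blob with a new ID, and the tree that lists that blob therefore gets a new ID too.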
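Both observations, the parent link and the changing tree ID, can be checked with a few commands. This is a minimal sketch in a throwaway repository; the temporary directory and the "Demo User" identity are placeholders, and the > prompts are dropped.

```shell
set -eu
# Throwaway repository in a temporary directory; the identity is a placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "A simple file" > A
echo "Another file" > B
git add A B
git commit -q -m "First commit"
echo "Another line" >> A
echo "Another line" >> B
git add A B
git commit -q -m "Second commit"

# The parent line of the second commit is the hash of the first commit.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "parent of HEAD: $parent"

# The first commit has no parent line at all.
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
echo "parent lines in the first commit: $root_parents"

# Same file names in both versions, but different content, so different trees.
tree1=$(git rev-parse 'HEAD~1^{tree}')
tree2=$(git rev-parse 'HEAD^{tree}')
echo "tree of first commit:  $tree1"
echo "tree of second commit: $tree2"
```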

Let me leave you with a schema representing what we’ve seen in this section:

git storing file versions

As a bonus, I am going to present a very curious case that shows how well designed git is. Let me copy file A into a new file called C. Then I will commit that file, and we’ll look at the hashes of both blobs.

> cp A C
> git add C
> git commit -m "Third commit"
> git cat-file -p HEAD
tree 948e4085e60fb71230e7e4fae2a9edbf6c7af780
parent 720ae00f0a865e56bf3ca39062cc7d3c192394bb
author *** <***@gmail.com> 1528924849 +0100
committer *** <***@gmail.com> 1528924849 +0100
Third commit

So again we have a new tree, let’s see the contents:

> git cat-file -p 948e4085e60fb71230e7e4fae2a9edbf6c7af780
100644 blob 791b4c25c92cf5bd6c5a7a05343132f69ed3ff3d A
100644 blob a091a72e3cf3e81b8d0b0bc0c19abee7aab03fe0 B
100644 blob 791b4c25c92cf5bd6c5a7a05343132f69ed3ff3d C

The content is the same, so there is no need to create another object to store C. Let’s see the content of the previous commit’s main tree:

> git cat-file -p e30ead0c3157e6e2a355e49cdb76ec67f6542d21
100644 blob 791b4c25c92cf5bd6c5a7a05343132f69ed3ff3d A
100644 blob a091a72e3cf3e81b8d0b0bc0c19abee7aab03fe0 B

Git doesn’t waste any space: it creates new objects only when it has to.
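If you want to verify the sharing without comparing hashes by eye, here is a sketch that repeats the three commits in a throwaway repository and counts the blobs. The temporary directory and the "Demo User" identity are placeholders of mine, and the > prompts are dropped.

```shell
set -eu
# Throwaway repository in a temporary directory; the identity is a placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "A simple file" > A
echo "Another file" > B
git add A B
git commit -q -m "First commit"
echo "Another line" >> A
echo "Another line" >> B
git add A B
git commit -q -m "Second commit"
cp A C
git add C
git commit -q -m "Third commit"

# <commit>:<path> resolves to the blob stored for that path.
blob_a=$(git rev-parse HEAD:A)
blob_c=$(git rev-parse HEAD:C)
echo "A -> $blob_a"
echo "C -> $blob_c"

# Count the distinct blobs reachable from the whole history:
# two versions of A, two versions of B, and nothing new for C.
blobs=$(git rev-list --objects --all | cut -d' ' -f1 \
  | git cat-file --batch-check='%(objecttype)' | grep -c '^blob$')
echo "blobs in the repository: $blobs"
```

Three commits and five file versions on disk over time, yet only four blobs are stored, because C reuses the blob of A.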

Hope you liked it,

Stupid Gopher
