Git: Get a Basic Understanding

Jun 27, 2024

I knew nothing about Git. Despite using Git every day, I was running the same shell commands I’ve memorised since I first started university. This article is my effort to demystify Git for myself and hopefully for whoever is reading it. It will not cover the commands, just the operations and logic behind Git. For a quick and comprehensive guide to the commands I recommend git - the simple guide - no deep shit! (rogerdudler.github.io)

What is Git?

Git is a version control system that stores its data as snapshots of a file system, rather than tracking the changes to individual files. Every time you save your changes, i.e. commit them with Git, it “takes a picture of what all your files look like at that moment” and stores a reference to that snapshot. If a file is unchanged, Git does not store it again; it stores a reference to the identical file it has already stored. A diagram for the stream of snapshots over time can be seen below.
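To convince myself of the "unchanged files are only referenced" idea, I find a toy model helpful. The sketch below is not how Git is implemented; `object_store`, `store_content` and `take_snapshot` are names I made up, and the store is just a dictionary keyed by a checksum of the content.

```python
import hashlib

# Hypothetical in-memory store: checksum of content -> content.
object_store = {}


def store_content(content: bytes) -> str:
    """Store the content once and return the reference (checksum) to it."""
    reference = hashlib.sha1(content).hexdigest()
    object_store.setdefault(reference, content)
    return reference


def take_snapshot(files: dict) -> dict:
    """Record a reference for every file as it looks right now."""
    return {name: store_content(content) for name, content in files.items()}


first = take_snapshot({"a.txt": b"hello\n", "b.txt": b"world\n"})
# Only a.txt changes before the second snapshot is taken.
second = take_snapshot({"a.txt": b"hello again\n", "b.txt": b"world\n"})

# b.txt is unchanged, so both snapshots hold the same reference to it.
print(first["b.txt"] == second["b.txt"])  # True
# Three distinct contents were stored, not four.
print(len(object_store))  # 3
```

Every snapshot lists all of the files, yet the content of an unchanged file is only ever stored once.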

Three Main Sections of Git:

A repository is the most basic element of a Git project, and of hosting services such as GitHub. It contains all of the project files (including documentation), and stores each file’s revision history. On GitHub, repositories can have multiple collaborators and can be either public or private.

The Git directory is where the object database and metadata of your project are stored. It is what gets copied when you clone a repository into a new directory; checking out a specific version reads that version out of the Git directory. When you initialise a repository, the .git subdirectory is created.

The working directory, or working tree, is a single checked-out version of the project. Its files are pulled out of the Git directory and placed on disk, where you can use and modify them as you please.

The staging area, or index, is a file in your Git directory that records what will go into your next commit, i.e. it lists the tracked files and the version of each that is currently staged.

The Pro Git book defines the basic workflow as below:

You modify files in your working tree.
You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
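The three steps above can be sketched with three plain Python containers. This is only my mental model, with made-up names (`working_tree`, `staging_area`, `git_directory`, `stage`, `commit`); the real staging area stores checksums rather than file contents.

```python
# Hypothetical stand-ins for the three sections.
working_tree = {"README": "v1", "main.py": "v1"}
staging_area = {}   # what the next commit will contain
git_directory = []  # committed snapshots, oldest first


def stage(path: str) -> None:
    """Copy the current version of one file into the staging area."""
    staging_area[path] = working_tree[path]


def commit(message: str) -> dict:
    """Store the staging area, as it is now, as a permanent snapshot."""
    snapshot = dict(staging_area)
    git_directory.append({"message": message, "snapshot": snapshot})
    return snapshot


stage("README")
stage("main.py")
commit("first commit")

# Modify both files, but stage only one of them.
working_tree["README"] = "v2"
working_tree["main.py"] = "v2"
stage("README")
latest = commit("update README")

print(latest)  # {'README': 'v2', 'main.py': 'v1'}
```

The change to main.py stays in the working tree only: the second commit still contains the previously staged version of that file.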

Sections of Git, source: Git — What is Git? (git-scm.com)

In Detail:

When a change is committed, Git stores a commit object that contains a pointer to the snapshot of the staged content, along with metadata such as the author’s name and email address, the commit message, and pointers to the commit or commits that came directly before it. The figure below shows the commit object pointing to the tree, which in turn points to the blobs, i.e. the contents of the files.

The figure below shows a sequence of commit objects, each with a pointer to the commit that came before it.

Series of commits. Most recent being Snapshot C.
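The commit, tree and blob relationship described above maps quite naturally onto a few small classes. The following is a rough sketch under my own assumptions: the class and field names are mine, the author details are placeholders, and real Git links these objects by checksum rather than by Python references.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Blob:
    content: bytes  # the contents of one file


@dataclass
class Tree:
    entries: Dict[str, Blob]  # file name -> blob


@dataclass
class Commit:
    tree: Tree  # pointer to the snapshot
    author: str
    email: str
    message: str
    parents: List["Commit"] = field(default_factory=list)


def history(commit: Commit) -> List[str]:
    """Follow the first-parent pointers back to the very first commit."""
    messages = []
    current: Optional[Commit] = commit
    while current is not None:
        messages.append(current.message)
        current = current.parents[0] if current.parents else None
    return messages


readme = Blob(b"Read me\n")
script = Blob(b"pass\n")
a = Commit(Tree({"README": readme}), "Ada", "ada@example.com", "Snapshot A")
b = Commit(Tree({"README": readme, "main.py": script}),
           "Ada", "ada@example.com", "Snapshot B", parents=[a])
c = Commit(Tree({"README": Blob(b"Read me first\n"), "main.py": script}),
           "Ada", "ada@example.com", "Snapshot C", parents=[b])

print(history(c))  # ['Snapshot C', 'Snapshot B', 'Snapshot A']
```

Snapshots B and C point at the very same `script` blob, which is the "reference to the already-stored file" idea from earlier.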

Git uses the SHA-1 hash to checksum the contents of everything it stores, and keeps these objects in the Git directory under their checksums. A checksum is the value returned from a one-way hash algorithm. It is used to validate the integrity of the data, as modifying the data will change the value returned. When you stage a modified file, its checksum is calculated, that version of the file is stored in the Git directory, and the checksum is added to the staging area; the commit then records the staged snapshot.
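The checksum itself is easy to reproduce. As far as I understand it, for a file's contents (a blob) Git hashes a small header, the word "blob", the size in bytes and a null byte, followed by the content. The sketch below assumes the default SHA-1 object format; `git_blob_id` is my own helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 checksum Git would assign to this file content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"hello world\n")
modified = git_blob_id(b"hello world!\n")

# Changing a single character changes the checksum.
print(original != modified)  # True

# The checksum of an empty file.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the checksum depends only on the content, two files with identical contents always get the same checksum, whatever they are called.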

Git File Stages:

Untracked: Files that are in your project directory but not in the last snapshot of the repository or in the staging area.

Tracked: Files that are in the last snapshot, and newly added files that are staged.

Staged: A modified or newly added file that has been marked to go into the next commit snapshot in its current modified version.

Modified: The file has been changed but not committed to the database.

Committed: Data stored in the local database in a snapshot.
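The stages above overlap (a staged file is also tracked, for example), which confused me until I wrote them down as comparisons between three versions of a file. The function below is a simplified sketch with names of my own choosing; it assumes the staging area lists every tracked file and it ignores deleted files.

```python
def file_states(path, last_snapshot, staging_area, working_tree):
    """Return the states from the list above that apply to one file.

    Each of the last three arguments maps a file name to its content.
    """
    tracked = path in last_snapshot or path in staging_area
    if not tracked:
        return {"untracked"}
    states = {"tracked"}
    # Staged: the staging area differs from the last snapshot.
    if staging_area.get(path) != last_snapshot.get(path):
        states.add("staged")
    # Modified: the working tree differs from what is staged.
    if working_tree.get(path) != staging_area.get(path):
        states.add("modified")
    # Committed: nothing differs from the last snapshot.
    if states == {"tracked"}:
        states.add("committed")
    return states


last_snapshot = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
staging_area = {"a.txt": "1", "b.txt": "2", "c.txt": "1", "new.txt": "x"}
working_tree = {"a.txt": "1", "b.txt": "2", "c.txt": "3",
                "new.txt": "x", "notes.txt": "todo"}

for name in sorted(working_tree):
    print(name, sorted(file_states(name, last_snapshot, staging_area, working_tree)))
```

Here a.txt is committed, b.txt and new.txt are staged, c.txt is modified, and notes.txt is untracked.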

Git Branching:

A branch in Git is a simple, movable pointer to a commit. The default branch is usually named master, and it is not special: whichever branch you have checked out moves forward automatically every time you make a commit, so it always points to the most recent commit on that branch.

When you branch, Git creates a new pointer to the commit you are currently on. Git knows which branch you are on by keeping a special pointer called HEAD. Switching from one branch to another simply means moving this pointer (and updating the files in your working directory to match). Moving HEAD is called checkout. It is important to note that if your staging area or working directory has uncommitted changes that conflict with the branch you are checking out, Git won’t let you switch branches. The diagram below shows the pointers for a checked out testing branch.

When you make a commit, the branch that is pointed at by HEAD (checked out branch) will move forward to point at that commit object. The pointer diagram below shows a commit made while on the testing branch.
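The pointer behaviour described above fits in a few lines. This is a toy model of my own (the `Repo` class and its method names are hypothetical), with commit ids like C1 and C2 standing in for real checksums.

```python
class Repo:
    """Hypothetical model: commits know their parent, branches are pointers."""

    def __init__(self):
        self.parent_of = {"C1": None, "C2": "C1"}  # commit id -> parent id
        self.branches = {"master": "C2"}           # branch name -> commit id
        self.head = "master"                       # HEAD names the checked-out branch

    def branch(self, name):
        """Create a new pointer to the commit we are currently on."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        """Switch branches by moving HEAD."""
        if name not in self.branches:
            raise KeyError(name)
        self.head = name

    def commit(self, commit_id):
        """Add a commit and move only the checked-out branch forward."""
        self.parent_of[commit_id] = self.branches[self.head]
        self.branches[self.head] = commit_id


repo = Repo()
repo.branch("testing")
repo.checkout("testing")
repo.commit("C3")

print(repo.branches)  # {'master': 'C2', 'testing': 'C3'}
```

Creating the branch copied nothing but a pointer, and the commit moved testing forward while master stayed where it was.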

Merging, Rebasing and Divergent History:

Merging is joining two or more development histories together. It incorporates the committed changes from one or more branches into the current branch.

If the branch you are merging in points to a commit that is directly ahead of the commit your current branch points to (say, master), then the merge is ‘fast-forwarded’: Git simply moves the master pointer forward to that commit. However, things are often not this straightforward in reality, because work on separate branches results in divergent history.

You can see in the above diagram that the branch experiment has diverged from an older commit C2. Merging the experiment branch will be less straightforward than simply moving the pointer forward. Now Git will perform a three-way merge using the two snapshots C3 and C4 and their common ancestor C2. The result of this will be a new snapshot with a new commit object that points to it. This merge commit will have two parents, C3 and C4, as shown below.
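The decision between a fast-forward and a three-way merge comes down to asking who is an ancestor of whom. Below is a rough sketch using the commit names from the text; the functions are my own, the commit graph is hard-coded, and the sketch only decides where the branch pointer ends up, it does not combine any file contents.

```python
# Hypothetical history: commit id -> list of parent ids.
parents = {"C0": [], "C1": ["C0"], "C2": ["C1"], "C3": ["C2"], "C4": ["C2"]}


def ancestors(commit):
    """Return the commit itself plus everything reachable through parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def common_ancestor(a, b):
    """Pick the shared ancestor with the longest history, i.e. the nearest one."""
    shared = ancestors(a) & ancestors(b)
    return max(shared, key=lambda commit: len(ancestors(commit)))


def merge(current, other, merge_id):
    """Return the commit the current branch points to after merging other."""
    if current in ancestors(other):
        return other  # fast-forward: just move the pointer
    if other in ancestors(current):
        return current  # nothing new to merge
    parents[merge_id] = [current, other]  # merge commit with two parents
    return merge_id


print(merge("C2", "C4", "unused"))  # C4
print(common_ancestor("C3", "C4"))  # C2
print(merge("C3", "C4", "C5"))      # C5
print(parents["C5"])                # ['C3', 'C4']
```

The first call is the fast-forward case; the last one is the divergent case, where a new commit with two parents is created.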

Sometimes Git is not able to merge diverged branches cleanly because of merge conflicts. This happens when the same part of a file has been changed differently in the two branches. In this case Git pauses the merge until you resolve the conflict. Once it is resolved, you stage the file and the merge commit can be completed.

There is another way of integrating branches, called rebasing. In this case Git takes the patch of changes made in C4 and reapplies it on top of C3. In other words, all the changes committed on one branch are replayed on another. Git goes back to the common ancestor of the two branches, gets the diff introduced by each commit on the branch you are on, resets the current branch to the same commit as the branch you are rebasing onto, and applies each of those diffs in turn. This is shown below.

Once the rebase is complete, the merge operation between this rebased branch and master can be fast-forwarded.
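Replaying commits is easier to picture with a small model. In the sketch below each commit carries a "patch", which I simplify to a dictionary of files it sets; the names `chain`, `rebase` and `files_at` are my own, and the replayed commit gets a new id (C4') because it is a new commit object with a different parent.

```python
# Hypothetical history: master points at C3, experiment at C4.
commits = {
    "C1": {"parent": None, "patch": {"a.txt": "first"}},
    "C2": {"parent": "C1", "patch": {"a.txt": "second"}},
    "C3": {"parent": "C2", "patch": {"b.txt": "from master"}},
    "C4": {"parent": "C2", "patch": {"c.txt": "from experiment"}},
}


def chain(tip):
    """List the commits from tip back to the first commit."""
    result = []
    while tip is not None:
        result.append(tip)
        tip = commits[tip]["parent"]
    return result


def rebase(branch_tip, onto):
    """Replay the commits unique to branch_tip on top of onto."""
    already_there = set(chain(onto))
    to_replay = [c for c in chain(branch_tip) if c not in already_there]
    new_tip = onto
    for old in reversed(to_replay):  # oldest first
        new_id = old + "'"
        commits[new_id] = {"parent": new_tip, "patch": commits[old]["patch"]}
        new_tip = new_id
    return new_tip


def files_at(tip):
    """Apply every patch from the first commit up to tip."""
    files = {}
    for commit_id in reversed(chain(tip)):
        files.update(commits[commit_id]["patch"])
    return files


experiment = rebase("C4", onto="C3")
print(experiment)            # C4'
print(files_at(experiment))  # {'a.txt': 'second', 'b.txt': 'from master', 'c.txt': 'from experiment'}
```

After the rebase, C3 is an ancestor of the new experiment commit, which is why merging it into master only needs a fast-forward.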

Remote Branches:

Remote references are references (pointers) in your remote repositories, including branches, tags, and so on. Remote-tracking branches are local references to the state of the remote branches. You can’t move them yourself; Git moves them whenever you communicate with the remote over the network, so that they represent the state of the remote repository as of that last communication.

These branches are named as follows: <remote>/<branch>. The default name for the remote you cloned from is origin, so the remote-tracking branch for the master branch on your server is named origin/master. To synchronise your local work you run the git fetch <remote> command, which fetches any data from the server that you don’t have in your local database and updates your remote-tracking branches. This will not modify your working directory. To bring the fetched work into your files, you need to merge the fetched remote-tracking branch into your current local branch.

A git pull is a command that combines a git fetch and a merge. If you’re on a tracking branch and execute a pull, Git automatically knows which server to fetch from and which branch to merge in.

To share a local branch with the remote server, you need to execute a push, as local branches are not automatically synchronised to the remotes.
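The difference between fetching, pulling and pushing is again just a question of which pointers move. Here is a deliberately small sketch with made-up names (`server`, `local`, `fetch`, `pull`, `push`); it assumes every merge is a fast-forward and it leaves out the transfer of the commits themselves.

```python
server = {"master": "C3"}                        # branches on the remote
local = {"master": "C1", "origin/master": "C1"}  # local and remote-tracking branches


def fetch(remote_name="origin"):
    """Update the remote-tracking branches; local branches are untouched."""
    for branch, commit in server.items():
        local[f"{remote_name}/{branch}"] = commit


def pull(branch, remote_name="origin"):
    """Fetch, then merge the remote-tracking branch (assumed fast-forward)."""
    fetch(remote_name)
    local[branch] = local[f"{remote_name}/{branch}"]


def push(branch, remote_name="origin"):
    """Share a local branch with the server."""
    server[branch] = local[branch]
    local[f"{remote_name}/{branch}"] = local[branch]


fetch()
after_fetch = dict(local)
print(after_fetch)  # {'master': 'C1', 'origin/master': 'C3'}

pull("master")
print(local["master"])  # C3

local["feature"] = "C4"        # a new local branch
print("feature" in server)     # False
push("feature")
print(server["feature"])       # C4
```

Fetching alone moved origin/master but not master, and the new feature branch did not exist on the server until it was pushed.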

Checking out a local branch from a remote-tracking branch automatically creates what is called a “tracking branch” (and the branch it tracks is called an “upstream branch”). Tracking branches are local branches that have a direct relationship to a remote branch.

References

Git — Book (git-scm.com)