Git-The fundamentals Part I 🕵

5 min read · Sep 29, 2018

Among distributed version control systems, Git is by far the most widely used. According to the most recent Stack Overflow survey at the time of writing this article, over 88% of professional developers use Git. Its benefits include a complete timeline of changes and progress, regardless of the time zone contributors work in, along with the communication and collaboration features that teams need.

Stack Overflow survey results for version control

Throughout this article about Git fundamentals, we will provide a quick explanation of how Git works and an overview of some of Git’s most fundamental concepts like the main sections of a Git project as well as commits, references, branches and stashing. We will explore merging and rebasing in Part II.

How Git works 🤖

At the core of Git is a simple key-value data store. The value is your data, and it is retrieved using the key, which is a hash of that data.

Git uses the SHA-1 function (Secure Hash Algorithm 1) to generate every key. SHA-1 is a checksumming mechanism that produces a 40-digit hexadecimal number from any given content. This provides data integrity: because the key is a hash of all the information, any change in the data results in a different SHA-1 hash. The same holds for commits even when the files don’t change, because the recorded date does. This explains why a commit can never be changed in place: an edited commit is a new commit with a new hash.
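As a minimal sketch of this idea, the Python snippet below reproduces the key Git computes for a file's content. It assumes the standard blob layout (a short "blob <size>" header, a null byte, then the raw bytes); the function name git_blob_hash is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 key Git assigns to a blob holding `content`."""
    # Git hashes a small header ("blob <size>" plus a null byte)
    # followed by the raw file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


key = git_blob_hash(b"test content\n")
print(key)       # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(len(key))  # 40

# Changing a single character produces a completely different key.
print(git_blob_hash(b"test content!\n") == key)  # False
```

Running `echo 'test content' | git hash-object --stdin` in a terminal should print the same key, which is a quick way to check the sketch against a real Git installation.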

Git objects

Git stores the content of each file, compressed, in a blob. A blob holds only the content, not the file name. The hierarchy between files in a Git repository is provided by Git trees: a tree represents a directory, listing the names of the files and subdirectories it contains and pointing to the corresponding blobs and trees.
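To make blobs and trees more concrete, here is a toy, in-memory version of such an object store. It is only a sketch: the class ToyObjectStore and its methods are invented for this article, and the tree payload is a simplified text listing rather than Git's real binary tree format, so the tree keys will not match what Git itself would produce.

```python
import hashlib
import zlib


class ToyObjectStore:
    """In-memory stand-in for Git's object database (illustration only)."""

    def __init__(self):
        self._objects = {}  # key (hex digest) -> zlib-compressed object

    def put(self, kind: str, payload: bytes) -> str:
        # The key is the hash of a "<kind> <size>\0" header plus the payload.
        data = f"{kind} {len(payload)}".encode() + b"\0" + payload
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        # Decompress, then drop the header to recover the payload.
        data = zlib.decompress(self._objects[key])
        return data.split(b"\0", 1)[1]


store = ToyObjectStore()

# Blobs hold file content only; note that no file name is stored here.
readme = store.put("blob", b"Hello Git\n")
main_py = store.put("blob", b"print('hi')\n")

# Simplified trees: one "<name> <key>" line per entry.
src_tree = store.put("tree", f"main.py {main_py}\n".encode())
root_tree = store.put("tree", f"README.md {readme}\nsrc {src_tree}\n".encode())

print(store.get(readme))             # b'Hello Git\n'
print(store.get(root_tree).decode())
```

Storing the same content twice returns the same key, which is why identical files take up space only once in the store.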

The main sections of a Git project

In a Git project, code can live in three different areas: the working area, the staging area and the repository.

The working area contains a checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk so that they can be used or modified. A file in the working area that is neither in the staging area nor part of a previous snapshot (commit) is an untracked file, meaning that Git does not handle it. Git will not include it in your next commits until you explicitly add it, which prevents you from accidentally committing unwanted files or generated binaries.
The staging area is a file, generally contained in your Git directory, that records what will go into your next commit. Comparing it with the current snapshot shows the changes that the next commit will introduce. The technical name for the staging area is the index.
The repository (Git directory) contains everything that Git handles, including all of your commits. It is where Git stores the metadata and the object database of the project.

Basic Git workflow

A basic Git workflow can be as follows: by checking out a project from the repository, you get all of the project files in your working directory, ready to be used or modified. Once you have modified files, you stage only the ones that you would like to include in your next commit, and only those changes are added to the staging area. Then you make a commit, which takes the files as they are in the staging area and stores that snapshot in your repository.
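The workflow above can be sketched with a toy model of the three areas. This is a deliberately simplified illustration, not how Git is implemented: the class ToyRepo and its methods are hypothetical, and files are plain strings kept in dictionaries.

```python
class ToyRepo:
    """Toy model of the working area, staging area and repository."""

    def __init__(self):
        self.working = {}  # path -> content currently on disk
        self.index = {}    # path -> content staged for the next commit
        self.commits = []  # stored snapshots, oldest first

    def add(self, path):
        # Staging copies the file's current content into the index.
        self.index[path] = self.working[path]

    def commit(self, message):
        # A commit stores a snapshot of the index, not of the working area.
        snapshot = {"message": message, "files": dict(self.index)}
        self.commits.append(snapshot)
        return snapshot

    def untracked(self):
        # Files on disk that were never staged.
        return sorted(p for p in self.working if p not in self.index)


repo = ToyRepo()
repo.working["app.py"] = "print('v1')"
repo.working["build.log"] = "generated output"

repo.add("app.py")
first = repo.commit("Add app")

repo.working["app.py"] = "print('v2')"  # modified, but not staged
second = repo.commit("Nothing new staged")

print(repo.untracked())           # ['build.log']
print(second["files"]["app.py"])  # print('v1')
```

The second commit still contains the first version of app.py, because the modification was never staged. With real Git, the equivalent steps are `git add` followed by `git commit`.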

Commits

A Git commit is a snapshot: the hierarchy is represented by Git trees and the content of the files by Git blobs. Every commit is identified by a SHA-1 hash computed from its metadata (author, committer, dates, commit message and parent commits) together with the hash of the root tree object. Since the root tree hash depends on every tree and blob beneath it, this is the source of the data integrity mentioned earlier.
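The following sketch shows why two commits with identical files still get different hashes. It builds a commit object by hand in the plain-text layout Git uses (tree, parent, author and committer lines, a blank line, then the message). The helper names, the author and the timestamps are made up for the example, and the tree is the well-known hash of Git's empty tree.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of an empty tree


def git_object_hash(kind: str, payload: bytes) -> str:
    # Same scheme as for blobs: "<kind> <size>\0" header plus payload.
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def commit_payload(tree, parents, author, timestamp, message):
    # Hypothetical helper: assemble the text of a commit object.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


author = "Jane Doe <jane@example.com>"  # made-up identity

first = git_object_hash(
    "commit", commit_payload(EMPTY_TREE, [], author, 1538179200, "Initial commit")
)
# Same tree, same author, same message: only the date differs.
second = git_object_hash(
    "commit", commit_payload(EMPTY_TREE, [], author, 1538179260, "Initial commit")
)

print(first == second)  # False
```

Because a child commit includes its parent's hash in its own content, changing any commit would also change the hash of every commit that comes after it.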

Aspects of a good commit 😍

A good commit should encapsulate one logical idea without introducing any breaking changes, such as failing tests or violations of code quality rules. Commit often, with a commit message that explains the current behaviour, the reason for your changes and their potential side effects. Commits document the history of your code and make proper code reviews, rollbacks and troubleshooting possible.

References

A reference is an alternative way to access a commit. In order to be able to access a commit with a simple name instead of using the commit hash, it is possible to create a reference (or ref) under the directory .git/refs.

There are three different types of references. HEAD is a symbolic reference to the current branch: the HEAD file contains a pointer to the current branch rather than a SHA-1 of its own. A tag is a user-friendly name that always points to the same commit; an annotated tag is a Git object with a tagger, a date, a message and a pointer. A remote reference records the state of a branch on a remote repository, which is what enables collaboration on the same Git project through pushing your local branches to it. Remote references are considered read-only because you cannot move one by committing. Git updates them itself, storing the last known value of every remote branch under refs/remotes whenever you communicate with the remote.
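To illustrate how a symbolic reference such as HEAD leads to a commit hash, the sketch below builds a tiny fake Git directory in a temporary folder and resolves HEAD by following the "ref: ..." pointer. The function resolve_ref and the placeholder hash are hypothetical, and the sketch ignores details of a real repository such as packed refs or a detached HEAD.

```python
import tempfile
from pathlib import Path

FAKE_COMMIT = "1234567890abcdef1234567890abcdef12345678"  # placeholder hash


def resolve_ref(git_dir: Path, name: str = "HEAD") -> str:
    """Follow symbolic refs until a plain hash is found."""
    content = (git_dir / name).read_text().strip()
    while content.startswith("ref: "):
        # A symbolic ref stores the path of another ref, not a hash.
        target = content[len("ref: "):]
        content = (git_dir / target).read_text().strip()
    return content


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)

    # The branch file holds the hash of the commit it points to.
    (git_dir / "refs" / "heads" / "main").write_text(FAKE_COMMIT + "\n")
    # HEAD points at the current branch instead of at a commit.
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")

    resolved = resolve_ref(git_dir)

print(resolved == FAKE_COMMIT)  # True
```

In a real repository you can inspect the same files with `cat .git/HEAD`, and `git rev-parse HEAD` performs the resolution for you.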

Branches

A branch is a pointer to a specific commit that allows you to develop along a new and independent line of history. Switching branches updates the working directory and the staging area to match the branch. Each time you make a new commit, the current branch pointer moves to that commit without altering any existing commit. Managing branches relies mainly on three commands: branch, checkout and merge. The branch command creates, deletes, renames and lists branches. Checkout switches between existing branches, and merge combines the histories of two different branches.
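The idea of a branch as a movable pointer can be sketched in a few lines. This is a toy model under simplifying assumptions: the class ToyHistory is invented for the example, commit identifiers are short labels instead of hashes, and every commit has at most one parent, so merging is left out.

```python
class ToyHistory:
    """Toy model of branches as pointers into a chain of commits."""

    def __init__(self):
        self.parents = {}               # commit id -> parent id (or None)
        self.branches = {"main": None}  # branch name -> commit id
        self.head = "main"              # name of the current branch

    def commit(self, commit_id):
        # The new commit's parent is wherever the current branch points,
        # then the current branch pointer moves to the new commit.
        self.parents[commit_id] = self.branches[self.head]
        self.branches[self.head] = commit_id

    def branch(self, name):
        # A new branch is just another pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        if name not in self.branches:
            raise KeyError(f"unknown branch: {name}")
        self.head = name

    def log(self, name):
        # Walk the parent links back from the branch tip.
        history, current = [], self.branches[name]
        while current is not None:
            history.append(current)
            current = self.parents[current]
        return history


history = ToyHistory()
history.commit("c1")
history.commit("c2")

history.branch("feature")
history.checkout("feature")
history.commit("c3")

print(history.branches)        # {'main': 'c2', 'feature': 'c3'}
print(history.log("feature"))  # ['c3', 'c2', 'c1']
print(history.log("main"))     # ['c2', 'c1']
```

Creating the feature branch copied nothing: both branches share c1 and c2, and only the feature pointer moved when c3 was committed.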

Stashing

Stashing is a great tool when you want to keep your work safe without making a commit. It is especially useful when your work is still too messy to commit, yet you don’t want to lose it while switching to a different branch. Stashes are stored in the repository, so they are safe from destructive operations on the working directory. Given the current state of a working directory, you can record it as a stash, show the content of a stash, apply a specific stash and list the existing stashes. You can also control whether untracked or ignored files are included.

Conclusion

This article is the first of two parts that provide an overview of Git fundamentals, along with some best practices to keep in mind when using the features covered. Part II will explore two other powerful Git features that let you integrate changes between branches: merging and rebasing.