The philosophy of Git

Amir Ebrahimi Fard
Data Management for Researchers
Jul 26, 2021

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Combined with platforms that host Git-based projects (such as GitHub or GitLab), Git can form a powerful version control infrastructure that supports remote collaboration and keeps an annotated history of a project's development.

Source of the figure: “Linus-Torvalds” by laboratoriolinux is licensed under CC BY-NC-SA 2.0
Source of the figure: “linus-torvalds-facts” by laboratoriolinux is licensed under CC BY-NC-SA 2.0

Designing a curriculum for learning Git is difficult: Git is a large, interdependent system with many commands and options, and it is hard to isolate and explain one topic without referencing others. With that in mind, we recommend a “spiral” approach to learning Git. Don’t worry too much about mastering one topic at a time; instead, let new information plug into what you already know and update your overall understanding. This tutorial follows the same logic, so a concept may be mentioned before it is explained in detail later on. You may therefore need to revisit some sections of this article more than once to see the full picture.

First, we’ll take a conceptual approach to explain the basics of how Git works and how it keeps track of files.

A brief overview of Git concepts and workflow

As explained above, Git is a system for keeping track of changes during the development of a project. Once a Git repository has been initialised in a project directory, files can be located in three environments: the working directory, the staging area, and the repository. Figure 1 displays those environments and how they relate to each other. The working directory is where Git’s tracking system is set up. Everything in the working directory is on Git’s “radar”, so to speak: any change in this directory (such as adding, removing, or modifying a file) is detected by Git the next time it inspects the directory, for example when you run git status. However, Git does not record anything on its own. To record a specific set of changes, you as a user must first add them to Git’s staging area and then commit them to your repository.
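To make the three environments more concrete, here is a minimal shell sketch that follows a single file from the working directory to the staging area and into the repository, printing git status --porcelain at each step. It assumes Git is installed and works in a throwaway temporary directory; the file name notes.txt and the user name and e-mail address are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: follow one file through the three Git environments.
# Assumes Git is installed; the file name and identity below are hypothetical.
set -e

repo=$(mktemp -d)                            # throwaway project directory
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity, stored only in this repo
git config user.email "user@example.com"

# 1. Working directory: a new file that Git has noticed but does not track yet.
echo "first draft" > notes.txt
state_working=$(git status --porcelain)
echo "working directory: $state_working"

# 2. Staging area: the file is marked for inclusion in the next commit.
git add notes.txt
state_staged=$(git status --porcelain)
echo "staging area:      $state_staged"

# 3. Repository: after the commit there is nothing left to report.
git commit -q -m "Add first draft of notes"
state_committed=$(git status --porcelain)
echo "after commit:      '$state_committed'"

# Editing the file again is noticed, but not recorded until it is staged and committed.
echo "second draft" > notes.txt
state_modified=$(git status --porcelain)
echo "after editing:     $state_modified"
```

In the porcelain format, ?? marks an untracked file in the working directory, A in the first column marks a file added to the staging area, M in the second column marks a change that exists only in the working directory, and an empty listing means there is nothing left to record.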

Figure 1: Git environments and how they are related to each other.

Each new commit is added to a history line that records past commits in order, with every commit pointing back to the one before it (Figure 2).
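The history line can be inspected with git log. The sketch below mirrors Figure 2 by making three commits in a throwaway repository and then printing each commit together with its parent. The file name analysis.txt, the commit messages, and the identity are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: build a history of three commits and inspect it.
# Assumes Git is installed; the file name, messages, and identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three successive commits, each changing the same file.
for n in 1 2 3; do
  echo "version $n" > analysis.txt
  git add analysis.txt
  git commit -q -m "Commit $n"
done

# Newest commit first: abbreviated hash and message.
git log --oneline

# %h is a commit's abbreviated hash and %p its parent's; the first commit has no parent.
git log --format='%h (parent: %p) %s'

commit_count=$(git rev-list --count HEAD)
echo "commits in history: $commit_count"
```

Because each commit stores a reference to its parent, Git can walk back from the latest commit to the very first one, which is what git log does.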

Figure 2: The history of commits after the first three commits.

One of Git’s most useful features is branching, which facilitates the division of labor and remote collaboration. Branching lets us work on a project in a parallel branch without changing the original history of commits. When the work on the parallel branch is finished, it can be merged into the main commit history (Figure 3).
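A minimal shell sketch of the branch-and-merge cycle follows. It assumes a throwaway repository; the branch name add-figures, the file names, and the identity are hypothetical. The name of the main branch ("main" or "master") depends on your Git configuration, so the script reads it instead of assuming it. The --no-ff option is used so that the merge produces an explicit merge commit, which keeps the parallel branch visible in the history.

```shell
#!/bin/sh
# Sketch: create a parallel branch, commit on it, and merge it back.
# Assumes Git is installed; branch name, file names, and identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "shared results" > README.txt
git add README.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on configuration

# Work on a parallel branch without touching the main history.
git checkout -q -b add-figures
echo "figure script" > figures.txt
git add figures.txt
git commit -q -m "Add figure script"

# Back on the main branch, the new file does not exist yet.
git checkout -q "$main_branch"
on_main_before=$(ls)
echo "files on $main_branch before merging: $on_main_before"

# Merge the finished branch; --no-ff records an explicit merge commit.
git merge -q --no-ff -m "Merge branch add-figures" add-figures
git log --oneline --graph
```

The --graph output draws the parallel line of development and the point where it rejoins the main history.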

Figure 3: The branch creation and merge in Git.

Another fundamental concept in Git is the repository. There are two kinds of repositories in Git: local and remote. A local repository keeps the commit history on your own machine, whereas a remote repository is located elsewhere (for example, on a server or a hosting platform) and holds the commit history shared by all project members, outside any single local machine.
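The relationship between local and remote repositories can be tried out without any server. In the sketch below, a bare repository in a temporary directory stands in for the remote; in practice the remote would usually live on a server or a hosting platform. The directory names, the file name, and the identity are hypothetical, and "origin" is simply the conventional name for a remote.

```shell
#!/bin/sh
# Sketch: a local repository publishing its history to a (simulated) remote.
# Assumes Git is installed; paths, file name, and identity are hypothetical.
set -e

base=$(mktemp -d)

# A bare repository (no working directory) plays the role of the remote.
git init -q --bare "$base/shared.git"

# A local repository on "your" machine with one commit.
git init -q "$base/local"
cd "$base/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "data dictionary" > data.txt
git add data.txt
git commit -q -m "Add data dictionary"

# Register the remote under the conventional name "origin" and publish the commit.
git remote add origin "$base/shared.git"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# A collaborator clones the remote and receives the same commit history.
git clone -q "$base/shared.git" "$base/collaborator"
git -C "$base/collaborator" log --oneline
```

After the push, the local and remote repositories point at the same commit, and anyone who clones the remote starts from that shared history.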

Initialising Git and tracking changes

To initialise a local Git repository and keep track of changes to files in a particular directory, we use the git init command. This creates a hidden .git folder containing the files in which Git stores the project’s history and configuration. Git then considers all files in the directory and its subdirectories, but a file is only tracked once it has been added; until then, Git reports it as untracked. When a file changes, Git notices, but the change is not added to the staging area or committed to the repository until you do so explicitly. The git add <file_name(s)> command copies the current state of the file(s) from the working directory to the staging area, and the git commit -m <commit_message> command records the staged changes in the local repository as a new commit, together with an informative message that makes it easy to reference later.
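The three commands described above can be sketched as follows, starting from a directory that already contains a file before Git is initialised. The sketch assumes Git is installed; the file name data.csv and the identity are hypothetical placeholders. It uses git ls-files to show that nothing is tracked right after git init.

```shell
#!/bin/sh
# Sketch: git init, git add, and git commit on an existing project directory.
# Assumes Git is installed; the file name and identity are hypothetical.
set -e

project=$(mktemp -d)                   # stands in for an existing project directory
cd "$project"
echo "raw measurements" > data.csv

git init -q                            # creates the hidden .git folder
tracked_before=$(git ls-files)         # empty: nothing is tracked yet
echo "tracked before git add: '$tracked_before'"

git config user.name "Example User"    # hypothetical identity, stored only in this repo
git config user.email "user@example.com"

git add data.csv                       # copy the file's current state to the staging area
git commit -q -m "Add raw measurements"

tracked_after=$(git ls-files)          # data.csv is now tracked
echo "tracked after git commit: $tracked_after"
git log --oneline
```

The commit message passed with -m is what git log shows later, which is why a short, descriptive message pays off.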

Figure 4: Three main commands to start working with Git.

Amir Ebrahimi Fard: postdoc researcher on AI explainability, interested in the intersection of data, algorithms, and society.