“Git” fundamentals (History, Architecture & Popular commands)

Mehul Gala
Nov 3 · 8 min read

In modern software development, a version control system is an absolute must. It is not a luxury but an out-and-out necessity. Even if you’re working alone on a small project, you cannot afford to lose your intermediate milestones or let in-progress features break the existing working system. Look around the market and you will hear two popular names: Git and SVN.

Number of public repositories on popular version control systems

Not surprisingly, Git rules the market hands down. But why is it so popular? What gives? At the most fundamental level, it offers what any file tracking system should: tracking changes to a file, listing them as versions, moving back and forth between versions, comparing them, and viewing logs. What sets it apart is its decentralized architecture, which keeps most operations fast because they run locally. Day-to-day work does not require network access or a central server. Every local Git directory is a full-fledged repository in itself, with complete history and full version tracking capabilities. A lot of people confuse it with GitHub, which is a separate code-sharing platform that I’ve covered in a separate article. You don’t need GitHub to use Git.

One of Git’s taglines

History

Git was developed in 2005 by Linus Torvalds, the creator of the Linux kernel, to support the massive collaborative development of that kernel. Prior to Git, the Linux developers used BitKeeper, a proprietary version control system that had been free for them to use, but in 2005 its copyright holder revoked the free license. This did not sit well with Linus, who firmly believed in open-source, collaborative software development. Although he loved BitKeeper’s distributed model, he chose to abandon it.

To add to his frustrations, none of the other source control systems of the time met his requirements. Most followed a centralized model, where the entire history lives in a central repository, and they were slow: one system he cited needed about 30 seconds to apply a single patch, far too slow for the rapid pace of Linux development. What he wanted was “a distributed version control system && free for use”. Nothing suitable existed, so he decided to make one. With an early version of Git, he managed to record 6–7 patches per second.

One last bit of trivia before we move on: the name git means an unpleasant person in British English slang. That’s what the introverted Linus believes he is to most people, and hence Git is named after him. Jokingly, he calls himself ‘an egotistical bastard’ who names all his projects after himself: first Linux, and then Git. He does not take the name too seriously; a living example is Git’s own ‘man’ page, which calls Git “the stupid content tracker”.


Architecture

Right then, with history done, let’s explore its architecture.

Git Architecture

It is easy to get overwhelmed by this diagram, but you will understand every bit of it after this series. I must admit, I have seen systems with nasty terminology for their operations, but Git’s commands are fairly intuitive and easy to pick up after using them a couple of times.

Everything begins with a Repository. It is nothing but a collection of files under a single project name. There are four sections where different versions of a file can reside:

  1. Workspace (Working directory): It is your working copy of the file, the physical state of the file which you can see and modify. You are free to add, modify, or remove anything you want. Git does not record any of these changes until you stage and commit them.
  2. Staging: When you feel you have completed one logical piece of the task you are doing, you can add the current state of the file (with these changes) to the staging area. This creates a temporary snapshot of the file in what is called the staging index.
  3. Local Repository: You move the staged changes to the local repository by committing them. This is usually done when the work reaches a cohesive state (like completion of one logical portion of a feature or a hotfix). A commit creates a permanent version in the local repository. It records the name of the person committing the change, the commit date, the commit message, and a cryptographic hash computed over the commit’s contents and metadata, which acts as the commit ID. Only the state that is staged at commit time is recorded; earlier staged states of the file are not kept in the history.
  4. Remote Repository: Wait, didn’t we talk about the decentralized architecture? Then why is there a remote repository? Well, this step is completely optional. If you are working alone on a project, you can achieve full version control capabilities with the first three steps. But if you are working in a collaborative environment across multiple people and teams, you need to sync your changes with a remote repository so that everyone has access to the latest code (which is where GitHub comes in).

The working directory, staging area, and local repository are often called “The Three Trees of Git”.
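
To watch a file travel through the first three sections, here is a minimal sketch. It assumes Git is installed; the temporary directory, the file name notes.txt, the identity, and the commit message are all made up for illustration.

```shell
# Create a throwaway repository (path and names are illustrative only).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity, stored in this repo only
git config user.email "user@example.com"

echo "hello" > notes.txt
git status --short     # prints "?? notes.txt": the file exists only in the working directory

git add notes.txt
git status --short     # prints "A  notes.txt": the file is now in the staging area

git commit -q -m "Add notes"
git status --short     # prints nothing: all three sections hold the same version
```

The two-letter codes come from the short status format: the first column describes the staging area and the second the working directory.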


Commands

Since Git offers a whole lot of things, it has an extensive list of commands. It is important to note that Git was developed for Linux, so all its commands are Unix-style. Even if you are working on Windows, ‘Git for Windows’ comes with Git Bash, a Bash shell that lets you interact with Git using Unix commands. You can open Git Bash from the context menu of any Windows folder.

Let’s talk about the popular commands which you will be using often. I found this a good source for command reference.

  1. Init: You can turn any existing or new folder into a Git repository using the init command. This is the first step in the workflow.
  2. Clone: Alternatively, if you already have a remote repository set up, you can simply clone it using the clone command. This copies the remote repository, including its history, into a new local repository that you can work on.
  3. Config: You need to do a bit of configuration, like setting up your username and email, and other preferences like colors and aliases. All of this can be done using the config command. You can apply these settings in the local scope (i.e. only for the current repository) or the global scope (for all your repositories).
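
The setup steps above can be sketched as follows. This is only an illustration under stated assumptions: the folder names, the identity, and the st alias are hypothetical, and a local folder stands in for a real remote URL in the clone step.

```shell
# Turn a new folder into a repository (folder names are illustrative).
base="$(mktemp -d)"
git init -q "$base/project"
cd "$base/project"

# Local scope: stored in this repository's .git/config only.
git config user.name "Example User"
git config user.email "user@example.com"
git config alias.st "status --short"      # hypothetical alias: "git st"

# Global scope uses --global and applies to every repository of this user, e.g.:
#   git config --global color.ui auto

git config --local --list                 # print the settings stored for this repository

# Make one commit so there is some history to clone.
echo "first" > readme.txt
git add readme.txt
git commit -q -m "Initial commit"

# Clone accepts any repository location; a local path stands in for a remote URL here.
git clone -q "$base/project" "$base/copy"
git -C "$base/copy" --no-pager log --oneline   # the clone carries the same history
```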

After the setup, you can now start adding files to the repo and start making changes.

  1. Add: You can move any file from the working directory to the staging area using the add command. You can add a particular file or an entire directory to the staging area.
  2. Commit: You can move all staged files to the local repository using the commit command. You need to pass a commit message to this command so that the change is documented. There is a shorthand (git commit -a) which stages and commits in one step, but it only picks up files that Git already tracks.
  3. Diff: It is very handy to look at the differences between versions from time to time to get a perspective on what has changed. This is done using the diff command. Git diff marks added lines with a plus sign and removed lines with a minus sign. You can color-code them to make them easier to read.
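
Here is a small sketch of these three commands working together. It assumes Git is installed; the file list.txt, its contents, and the commit messages are invented for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

printf 'one\ntwo\n' > list.txt
git add list.txt                  # stage one file ("git add ." would stage the whole directory)
git commit -q -m "Add list"       # -m supplies the commit message

printf 'one\nthree\n' > list.txt  # change the second line
git --no-pager diff               # shows "-two" (removed) and "+three" (added)

# -a stages every modified tracked file and commits in one step.
git commit -q -a -m "Replace two with three"
```

Note that the shorthand in the last line would not have worked for the very first commit, because list.txt was not tracked yet at that point.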

I want to pause here and explain a popular piece of Git terminology called HEAD. It points to the commit you currently have checked out, which is normally the latest commit in the local repository. We talked earlier about how each committed version is uniquely identified by a hash which acts as the commit ID. It is a very common need to compare your working directory with the latest commit, and it would be painful to go and fetch the commit ID (hash) of the latest version every time. To avoid this pain, a separate label called “HEAD” is associated with the latest commit.
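
To see the label in action, here is a minimal sketch, assuming Git is installed. The file name, contents, and commit messages are made up, and first_id and second_id are just shell variables introduced for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
first_id="$(git rev-parse HEAD)"     # the full hash HEAD resolves to right now

echo "v2" > file.txt
git commit -q -a -m "Second"
second_id="$(git rev-parse HEAD)"    # HEAD has moved on to the new commit

echo "First commit:  $first_id"
echo "Second commit: $second_id"
git rev-parse HEAD~1                 # HEAD~1 names the parent of HEAD, i.e. the first commit

echo "v3" > file.txt
git --no-pager diff HEAD             # working directory vs latest commit, no hash needed
```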

Returning to the diff command, here is what we can do with it:

  • Check the difference between the file state in the working directory and the staging area.
  • Check the difference between the file state in the staging area and the local repository.
  • Check the difference between the file state in the working directory and the local repository.
  • Check the difference between two different commits.
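
The four comparisons map onto four forms of the command. The sketch below assumes Git is installed and uses an invented file f.txt that holds a different word in each place, so the output of every form is easy to tell apart.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "one" > f.txt
git add f.txt
git commit -q -m "First"             # first commit holds "one"
echo "two" > f.txt
git commit -q -a -m "Second"         # latest commit holds "two"
echo "three" > f.txt
git add f.txt                        # staging area holds "three"
echo "four" > f.txt                  # working directory holds "four"

git --no-pager diff                  # working directory vs staging area:  -three +four
git --no-pager diff --staged         # staging area vs latest commit:      -two   +three
git --no-pager diff HEAD             # working directory vs latest commit: -two   +four
git --no-pager diff HEAD~1 HEAD      # between two commits:                -one   +two
```
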

  1. Status: You will need to check the status of your repository from time to time, to see which files are modified, staged, or untracked. This can be done using the status command.
  2. Log: To see the commit history of the current branch, use the log command. The vanilla version displays the entire log (commit ID, author, date, and message for every commit). There are many switches to tailor the log output to your needs, including one that shows the changes made in each commit.
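
A short sketch of both commands, assuming Git is installed. The three file names and the commit messages are invented so that the status output shows one staged and one untracked file.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "a" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"
echo "b" >> tracked.txt
git commit -q -a -m "Extend tracked file"

echo "c" > staged.txt
git add staged.txt                   # staged but not committed
echo "d" > untracked.txt             # never added

git status --short                   # "A  staged.txt" and "?? untracked.txt"

git --no-pager log                   # full entries: commit ID, author, date, message
git --no-pager log --oneline         # one line per commit
git --no-pager log -p -1             # latest commit together with its changes
git --no-pager log --oneline -- tracked.txt   # only commits that touched one file
```
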

  1. Revert: There will be cases where you want to undo a particular commit. This can be easily achieved using the revert command. We can revert HEAD or any other commit we want. Revert creates a new commit containing the “inverse” of the changes in the commit you chose to revert. It is important to note that reverting a commit undoes only that commit’s changes; the commits made after it stay in place. To throw away everything after a given commit, you would use the reset command.
  2. Reset: To discard all the commits after a given commit and move the current branch back to it, you need the reset command. It is a very powerful command that calls for caution. If not used properly, it will make a mess of your project. It comes in three flavors (soft, mixed, and hard). It can work on the whole commit history or, in a more limited form, on individual files.
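
Before going into the reset flavors, here is a minimal sketch of a revert, assuming Git is installed. The files and commit messages are made up; the middle commit is the one being undone, and the commit after it survives.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"       # the commit we will undo
echo "more" >> app.txt
git commit -q -a -m "Extend app"     # a later, unrelated commit

# Revert the middle commit; --no-edit accepts the default commit message.
git revert --no-edit HEAD~1

git --no-pager log --oneline         # four commits: the original three plus the revert
ls                                   # feature.txt is gone, app.txt still has both lines
```

Nothing is removed from the history here: the revert is recorded as one more commit on top.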

When we reset to a previously committed version, we have to choose one of the three flavors.

Soft (--soft)

Local Repository: Affected. HEAD is moved to the specified commit.

Staging Area: Unaffected. It still holds the changes from the latest commit.

Working Directory: Unaffected. It still holds the latest changes.

Mixed (--mixed, the default)

Local Repository: Affected. HEAD is moved to the specified commit.

Staging Area: Affected. All the staged files are reset to the specified commit.

Working Directory: Unaffected. It still holds the latest changes.

Hard (--hard)

Local Repository: Affected. HEAD is moved to the specified commit.

Staging Area: Affected. All the staged files are reset to the specified commit.

Working Directory: Affected. All tracked files in the working directory are reset to the specified commit, and any uncommitted changes to them are lost for good. Untracked files are left alone.
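
The three flavors can be compared side by side with the sketch below. It assumes Git is installed; make_repo is a hypothetical helper written for this example (not a Git command) that builds a fresh two-commit repository, so that every reset starts from the same state.

```shell
# Hypothetical helper: a new repository with two commits of file.txt (v1, then v2).
make_repo() {
  dir="$(mktemp -d)"
  cd "$dir"
  git init -q
  git config user.name "Example User"
  git config user.email "user@example.com"
  echo "v1" > file.txt
  git add file.txt
  git commit -q -m "First"
  echo "v2" > file.txt
  git commit -q -a -m "Second"
}

make_repo
git reset -q --soft HEAD~1
soft="$(git status --short)"
echo "soft:  [$soft]"      # [M  file.txt]  the v2 change is still staged

make_repo
git reset -q --mixed HEAD~1
mixed="$(git status --short)"
echo "mixed: [$mixed]"     # [ M file.txt]  the v2 change is only in the working directory

make_repo
git reset -q --hard HEAD~1
hard="$(git status --short)"
echo "hard:  [$hard]"      # []             nothing left, file.txt contains v1 again
```

In the short status format the first column is the staging area and the second is the working directory, which is why the M shifts one column to the right between the soft and the mixed reset.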


That concludes the fundamentals. Git may appear a little overwhelming and intimidating at first, and getting the hang of it takes a bit of time, but considering how ubiquitous it is, putting in the effort to get a good grasp of it is totally worth it.
