What is a Version Control Tool? Explore Git and GitHub

Siddhesh Gunjal
Published in Analytics Vidhya
7 min read · Sep 21, 2018

Git is a free, open source distributed Version Control System Tool designed to handle everything from small to very large projects with speed and efficiency. Git has the functionality, performance, security and flexibility that most teams and individual developers need.
According to the official Git website:
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Git was created by Linus Torvalds in 2005 for the development of the Linux kernel.

Why did Git come into existence?

Whenever developers design and develop an application, it is common for multiple versions of the software to exist at different sites, because developers work simultaneously on updates and bug fixes. Bugs are often present only in some versions, and developers frequently roll back to a previous version because of a bug and then release a new version with the fix. To locate and fix bugs, it is therefore important to be able to retrieve and run different versions of the software and determine in which versions the problem occurs. It may also be necessary to develop two versions of the software concurrently: for instance, one version in which bugs are fixed but no new features are added, while new features are developed on the other version (the trunk). This means we need to maintain proper documentation and configuration files along with the updated source code. This process is called Version Control, Revision Control or Source Control.
The need for a logical way to organise and control versions has existed for almost as long as writing has existed, but version control became much more important and complicated when the era of computing began. So let’s step back and learn all about Version Control Systems (VCS).

Version Control System (VCS):

A Version Control System (VCS) can be defined as a component of software configuration management that records changes to a file or set of files so that specific versions can be recalled in the future. The primary use of version control is to **Track changes**.
A version control system is mostly based on one concept which is tracking changes that happen within directories or files. Depending on the version control system, this could vary from knowing a file changed to knowing specific characters or bytes in a file that have changed.
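The two ends of that range can be illustrated with a small Python sketch. This is not how any particular VCS is implemented; the function names `file_changed` and `line_changes` are made up for this example. The first function only knows that a file changed (by comparing content hashes), while the second reports exactly which lines were removed or added.

```python
import difflib
import hashlib


def file_changed(old: bytes, new: bytes) -> bool:
    """Coarse tracking: report only whether the content changed at all."""
    return hashlib.sha256(old).digest() != hashlib.sha256(new).digest()


def line_changes(old: str, new: str) -> list[str]:
    """Fine tracking: report which lines were removed (-) or added (+)."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # Keep only removed/added lines; drop unchanged lines and '?' hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical file contents, used only for illustration.
old_text = "print('hello')\nprint('world')\n"
new_text = "print('hello')\nprint('Git')\n"

changed = file_changed(old_text.encode(), new_text.encode())
changes = line_changes(old_text, new_text)

print(changed)
for line in changes:
    print(line)
```

A real version control system stores this kind of information for every revision, so that any earlier state of a file can be rebuilt later.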
There are two types of VCS:

  • Centralised Version Control System (CVCS)
  • Distributed Version Control System (DVCS)

A Centralized Version Control System (CVCS) uses a central server to store all files and enable team collaboration. It works on a single repository hosted on the central server, which users access directly.

The repository in the above diagram represents a central server, which could be local or remote and is directly connected to each programmer’s workstation.
Every programmer can check out data from the repository to update their workstation, or change the data and commit it to the repository. Every operation is performed directly on the repository.
Even though it seems pretty convenient to maintain a single repository, it has some major drawbacks. Some of them are:

  • It is not locally available; meaning you always need to be connected to a network to perform any action.
  • Since everything is centralised, a crash or corruption of the central server can result in losing the entire data of the project.

Distributed Version Control Systems (DVCS) do not necessarily rely on a central server to store all the versions of a project’s files.

In Distributed VCS, every contributor has a local copy or “clone” of the main repository i.e. everyone maintains a local repository of their own which contains all the files and metadata present in the main repository.
As you can see in the above diagram, every programmer maintains a local repository of their own, which is actually a copy or clone of the central repository on their hard drive. They can commit to and update their local repository without any interference.
They can update their local repositories with new data from the central server by an operation called “pull” and apply changes to the main repository by an operation called “push” from their local repository.
Distributed VCS gives us the following advantages:

  • All operations (except push and pull) are very fast because the tool only needs to access the hard drive, not a remote server. Hence, you do not always need an internet connection.
  • Committing new change-sets can be done locally without manipulating the data on the main repository. Once you have a group of change-sets ready, you can push them all at once.
  • Since every contributor has a full copy of the project repository, they can share changes with one another if they want to get some feedback before applying changes to the main repository.
  • If the central server crashes at any point, the lost data can be recovered from any one of the contributors’ local repositories.
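The clone, commit, push and pull operations described above can be sketched with a toy model in Python. This is a deliberately simplified illustration, not Git's actual data model: the `Repo` class, its methods and the names `central`, `alice` and `bob` are all hypothetical, and a "commit" here is just a message in a list. The sketch also assumes histories never diverge, so it refuses to push or pull when a real tool would perform a merge.

```python
class Repo:
    """A toy repository: an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def clone(self):
        # A clone carries the full history, not just the latest state.
        return Repo(self.history)

    def commit(self, message):
        # Committing touches only this repository; no server is involved.
        self.history.append(message)

    def push(self, remote):
        # Send every commit the remote is missing, all at once.
        _fast_forward(target=remote, source=self)

    def pull(self, remote):
        # Fetch every commit this repository is missing.
        _fast_forward(target=self, source=remote)


def _fast_forward(target, source):
    """Append source's extra commits to target if target is simply behind."""
    known = len(target.history)
    if source.history[:known] != target.history:
        raise ValueError("histories diverged; a real VCS would merge here")
    target.history.extend(source.history[known:])


central = Repo(["initial commit"])
alice = central.clone()
bob = central.clone()

# Alice works offline: two local commits, nothing sent to the server yet.
alice.commit("add login page")
alice.commit("fix typo")
commits_on_server_before_push = len(central.history)

alice.push(central)  # both change-sets are pushed at once
bob.pull(central)    # Bob catches up from the main repository

# Simulate losing the central server and restoring it from Bob's clone.
central = None
restored = bob.clone()
print(restored.history)
```

Because every clone holds the complete history, any contributor's copy is enough to rebuild the main repository after a server failure.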

Why Git?

Git is a Distributed Version Control tool that supports distributed, non-linear workflows and provides data integrity assurance for developing quality software. Git provides the user with all the Distributed VCS facilities mentioned earlier. Git repositories are very easy to find and access. You will see how flexible and compatible Git is with your system when you go through the features below:

  • Free and Open source: Git is released under the GNU General Public License (GPL) version 2, an open source license. It is absolutely free and, as it is open source, you can modify the source code as per your requirement.
  • Speed and Offline: Since most operations run locally and do not need an internet connection, they complete very fast. You only need internet access to Pull files from the main repository into your Local Repository; after that you can work with Git offline. Once your modifications are done, you can verify your changes and Push them to the main repository when you have internet access again.
  • Scalable: Git is very scalable. Although each clone holds the entire repository, the data stored on the client’s side is kept small because Git compresses it with a lossless compression technique. And even if the number of collaborators increases in future, Git can handle the change.
  • Reliable: Since every contributor has their own local repository, in the event of a system crash the lost data can be recovered from any of the local repositories. You will always have a backup of all your files.
  • Secure: Git uses SHA-1 (Secure Hash Algorithm 1), a cryptographic hash function, to name and identify objects within its repository.
  • Supports Non-Linear Development: Git supports rapid branching and merging, and includes specific tools for visualising and navigating a non-linear development history.
  • Easy Branching: Branch management with Git is very simple. It takes only a few seconds to create, delete, and merge branches. Feature branches provide an isolated environment for every change to your codebase.
  • Distributed Development: Git gives each developer a local copy of the entire development history, and changes are copied from one such repository to another. These changes are imported as additional development branches, and can be merged in the same way as a locally developed branch.
  • Compatibility with existing systems and protocols: Repositories can be published via HTTP, FTP or a Git protocol over either a plain socket or SSH. Git also has a Concurrent Versions System (CVS) server emulation, which enables the use of existing CVS clients and IDE plugins to access Git repositories. Apache Subversion (SVN) and SVK repositories can be used directly with git-svn.
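To make the "Scalable" and "Secure" points more concrete, here is a minimal Python sketch of how a file's content can be named with SHA-1 and compressed losslessly. It assumes Git's loose "blob" object layout in a repository using the SHA-1 object format (a short header, a NUL byte, then the content, compressed with zlib); the helper name `git_blob` and the sample content are made up for this example, and real repositories add further optimisations that are not shown here.

```python
import hashlib
import zlib


def git_blob(content: bytes) -> tuple[str, bytes]:
    """Return (object id, compressed bytes) for a blob with this content."""
    # Header: object type, content size in bytes, then a NUL separator.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()  # 40 hexadecimal characters
    compressed = zlib.compress(store)            # lossless compression
    return object_id, compressed


# Hypothetical, highly repetitive content so the compression is visible.
content = b"version control\n" * 100
object_id, compressed = git_blob(content)

print(object_id)
print(len(content), "bytes of content ->", len(compressed), "bytes stored")

# Lossless: decompressing gives back the header plus the exact content.
roundtrip_ok = zlib.decompress(compressed).endswith(content)
print(roundtrip_ok)
```

The same content always produces the same identifier, and changing even one byte produces a different one, which is how the hash doubles as an integrity check.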

Tools like Git enable communication between the development and operations teams. When you are developing a large project with a huge number of collaborators, it is very important that collaborators communicate while making changes to the project. Commit messages in Git play a very important role in this communication. The bits and pieces that we all deploy live in a Version Control System like Git. To succeed in DevOps, you need to have all of that communication in Version Control. Hence, Git plays a vital role in succeeding at DevOps.

Git compared with other VCTs:

Git has earned far more popularity than other version control tools available in the market, such as Apache Subversion (SVN), Concurrent Versions System (CVS) and Mercurial, because of the advantages stated above.
You can compare interest in Git over time with interest in other version control tools using the Google Trends graph below:

Data from Google Trends

In large companies, products are generally developed by developers located all around the world. Git is a solution that enables communication among them.
Some companies that use Git for version control are: Facebook, Yahoo, Zynga, Quora, Twitter, eBay, Salesforce, Microsoft and many more.

Thank you for reading the article. Please share your comments and feedback below.

Siddhesh Gunjal
Analytics Vidhya

ML Engineer | Creator of Slackker (PyPI package) | Former Adjunct Faculty @upGrad | Former Professor @BSE (Bombay stock exchange)