What is GitHub?

Maria Gusarova
7 min read · Jul 13, 2022
drawing by author

This article is part of a series where we walk step by step through solving fintech problems with different Machine Learning techniques using the “All lending club loan” dataset. Here you can find the complete end-to-end data science project for beginners to learn data science.

GitHub is a cloud hosting service for software development. You upload files from your computer and they are stored on a remote server, and for basic use it is free of charge.

GitHub is used by 83 million developers (as of June 2022) and hosts more than 200 million repositories (including at least 28 million public repositories). It was the largest source code host as of November 2021. (ref.)

But a question arises: why do we need GitHub when there are cloud storage services such as Google Cloud, Dropbox, etc.? Why GitHub?

To answer that, you need to understand how the service works.

GitHub is a collaborative development and project hosting service. With it, any number of programmers from anywhere in the world can work on the same project. GitHub is built around the Git version control system, which lets you view and track every code change made by any developer and return to the state before a change.

In general, GitHub is a social network for developers where you can find open-source projects from other developers, practice coding, and store your portfolio.

After you register an account on GitHub, you can create repositories.

GitHub page screenshot

A repository is essentially a folder that holds the set of files making up your project.

Inside the repository, all code changes are stored as branches and commits.

A commit is the basic unit of development history: it records the state of the code after one iteration of changes. In effect, it is a record of the current changes plus a link to the previous commit. Each commit has attributes: a name (its unique identifier), a creation date, an author, and a message describing the change.

GitHub page screenshot

A branch is a pointer to a commit. For example, two developers start from the same commit, and each makes their own changes to the code, creating a new commit. The project now has two branches with different code, and a developer can choose which one to continue working on.

By convention, the primary branch of a project is called main or master, and developers create new branches from it. You can create any number of branches to make new changes without interfering with the main project.
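To make the relationship between commits and branches concrete, here is a minimal Python sketch. It is a toy model, not how Git is implemented: the Commit class, the history helper, and the developer and branch names are all made up for illustration. It only shows the two ideas described above: each commit links back to its parent, and a branch is just a name pointing at a commit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Commit:
    """Toy commit: a message, an author, a file snapshot, and a parent link."""
    message: str
    author: str
    files: dict
    parent: Optional["Commit"] = None
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def history(commit):
    """Walk the parent links from a commit back to the first commit."""
    messages = []
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parent
    return messages


# A branch is only a name that points at a commit.
branches = {}
root = Commit("Initial commit", "alice", {"app.py": "print('v1')"})
branches["main"] = root

# Two developers start from the same commit and each adds a new one.
branches["feature-a"] = Commit(
    "Add greeting", "bob", {"app.py": "print('hello')"}, parent=root
)
branches["feature-b"] = Commit(
    "Add logging", "carol", {"app.py": "print('v1')", "log.py": ""}, parent=root
)

for name, tip in branches.items():
    print(name, "->", history(tip))
```

Both feature branches share the same parent commit, while main still points at the original commit and is unaffected by either change.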

GitHub is an online code storage and synchronization service for programmers and developers. Git is the name of the version control software created by Linus Torvalds, the Finnish programmer who also created Linux. The word Hub here can be read as “community” or “portal”.

When it comes to building an application or a web service, Git is an indispensable tool. It often happens that a fix made to the code breaks other working parts of the project, and even after removing the fix, the situation does not improve. Git is the answer: it protects your project from such surprises and makes it much harder to lose edits or files by accident.

Thanks to the way it stores data, Git can quickly roll a project back to a working state when errors occur. You do not have to hunt down the problem a change introduced right away, because at any time you can return to one of the older versions. This lets the people involved in development ‘deep dive’ into the code without fear of harming other people’s edits or the project as a whole.
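The idea of rolling back can be sketched in a few lines of Python. This is a simplified illustration only: TinyHistory and its methods are hypothetical names, and real Git stores its history very differently. The point is that once every version is recorded, returning to a working one is a lookup rather than a repair job.

```python
class TinyHistory:
    """Toy version store: every commit keeps a full copy of the files."""

    def __init__(self):
        self._snapshots = []  # list of (message, files) in commit order

    def commit(self, message, files):
        """Record a copy of the files and return the version number."""
        self._snapshots.append((message, dict(files)))
        return len(self._snapshots) - 1

    def restore(self, version):
        """Return the files exactly as they were at the given version."""
        _message, files = self._snapshots[version]
        return dict(files)


project = TinyHistory()
working = project.commit("Working version", {"app.py": "print('works')"})
broken = project.commit("Risky fix", {"app.py": "print('oops'"})

# The risky fix broke the project, so go back to the known good version.
restored = project.restore(working)
print(restored)
```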

With Git, you can maintain a working version and develop new ones in parallel, then merge or separate them with a single command. This speeds up the development process and makes it more efficient.

Git webpage screenshot

Git is a distributed version control system, which provides three things at once:

  • Storage, processing and transmission of data.
  • Control over all variations of the project.
  • Parallel development. Different people can edit the same files at the same time, because each project participant commits edits to a local repository on their own device. Those edits do not affect anyone else until they are uploaded to the server and merged with the working version, and Git flags any conflicting changes at that point so they can be resolved.
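What happens when parallel edits are merged can be sketched roughly as follows. This is a simplified, file-level illustration under my own assumptions: merge_files is a hypothetical helper, and real Git compares changes line by line inside each file. It compares two edited copies against their common starting point, combines changes that do not overlap, and reports files that both people changed differently so a person can resolve them.

```python
def merge_files(base, ours, theirs):
    """Combine two sets of edits that started from the same base version.

    Each argument maps a file name to its contents. Returns the merged
    files and a list of file names that need manual resolution.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (or neither changed the file)
            result = o
        elif o == b:      # only the other side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both changed it in different ways
            conflicts.append(name)
            continue
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1"}

# Edits to different files combine cleanly.
clean, clean_conflicts = merge_files(
    base, {"a.txt": "2", "b.txt": "1"}, {"a.txt": "1", "b.txt": "3"}
)
print(clean, clean_conflicts)

# Different edits to the same file are reported for manual resolution.
partial, found_conflicts = merge_files(
    base, {"a.txt": "2", "b.txt": "1"}, {"a.txt": "9", "b.txt": "1"}
)
print(partial, found_conflicts)
```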

How to get started with Git

To include Git in your development process, you first need to install it on your computer. You can download the latest version of the Git toolkit from the official Git website here, which offers downloads for Windows and macOS as well as Linux.
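Once Git is installed, you can confirm that it is available. The usual way is to run git --version in a terminal; the small Python sketch below does the same check from a script. The function name installed_git_version is my own, and the script assumes Git, if present, is on your PATH.

```python
import shutil
import subprocess


def installed_git_version():
    """Return the output of `git --version`, or None if Git is unavailable."""
    git_path = shutil.which("git")
    if git_path is None:
        return None
    completed = subprocess.run(
        [git_path, "--version"], capture_output=True, text=True
    )
    if completed.returncode != 0:
        return None
    return completed.stdout.strip()


version = installed_git_version()
print(version or "Git was not found on PATH")
```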

Git webpage screenshot

If you are a complete beginner with no experience, I suggest visiting this page to learn the basics of using Git:

https://git-scm.com/doc

Or this GitHowTo page:

https://githowto.com

These will give you enough basic Git knowledge to start your project with GitHub and Git.

You can use Git without GitHub, but you cannot use GitHub without Git. There are also alternatives to GitHub, such as GitLab and Bitbucket.

You can interact with the files on your GitHub page through the web interface: download their contents, browse them, and even edit them.

The repositories are maintained by Git.

You might wonder why we are talking about Git when we are trying to understand what GitHub is.

Git controls the versions of your project, while GitHub is a web service where you can collaborate on, share, release, and host that project.

For example, say you created a project on GitHub and made it publicly accessible to the world. Anyone can access it, and if someone finds your project interesting, they can fork it, which creates their own copy on GitHub. A contribution then usually goes like this:

  1. The person clones the project to their local machine using the git command git clone.
  2. They create a separate branch with a git command, change the code in the newly created branch, and push these changes to the remote branch.
  3. After pushing the changes, they create a pull request on GitHub, which notifies you about the changes.
  4. You review the changes, discuss questions or ask for more changes, and finally merge the new features or fixes into your project by accepting the stranger's pull request.

A great environment for open source development! And if something goes wrong, you can always revert the changes using GitHub.

Git and GitHub work as one piece, since there is no GitHub without Git.
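The Git side of this workflow can be sketched as a short sequence of commands. The Python snippet below only builds and prints the commands, it does not run them. The function name contribution_commands, the repository URL, the branch name, and the commit message are all placeholders I made up. The commands after git clone are meant to be run inside the cloned project folder, and the pull request itself is opened on GitHub afterwards.

```python
import shlex


def contribution_commands(repo_url, branch, message):
    """Return the Git commands for a typical contribution, in order."""
    return [
        ["git", "clone", repo_url],               # copy the project locally
        ["git", "checkout", "-b", branch],        # create and switch to a branch
        ["git", "add", "."],                      # stage the edited files
        ["git", "commit", "-m", message],         # record the changes
        ["git", "push", "-u", "origin", branch],  # upload the branch
    ]


# Hypothetical values, replace them with your own fork and branch.
url = "https://github.com/example-user/example-project.git"
commands = contribution_commands(url, "fix-typo", "Fix typo in README")

for command in commands:
    print(shlex.join(command))
```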

For more tutorials and information about git please visit the links below:

  1. Git Cheat Sheets
  2. Learn Git Branching
  3. Exploring git visually

Want to learn more? Here is the complete end-to-end data science project for beginners to learn data science. By completing this project: 1) you will experience the entire data science cycle yourself, 2) you will develop a project that you can use to prove your experience, and 3) you will answer the most popular interview questions in case you decide to pursue the career of a data scientist.

What do you struggle with in your early journey? Please share it with me here, and I am happy to help! I listen to your stories carefully and want to produce content that helps you in this journey. For more content like this, sign up for my newsletter.
