Why Should Data Analysts Use GitHub?

Kevin Adinata
Published in tiket.com
3 min read · May 6, 2022

Imagine Google Docs but for programming code.

What is GitHub?

Before asking “What is GitHub?”, we need to understand “What is Git?”.

Git is a distributed revision control and source code management system with an emphasis on speed. Essentially, Git lets multiple people work on separate offline copies of the same code and then merge those copies together later.

GitHub, on the other hand, is one of the most popular Git hosting services.
It is widely used by organizations and lets people showcase their coding projects.

Important features in GitHub

  • Branch
    Users can create a new branch whenever they want to work on a new version of the code. Another common scenario is when multiple users want to work on different parts of the same code: each user creates their own branch and merges it later.
  • Pull Request
    A pull request proposes merging a branch into another branch, typically the master (or main) branch. The project owners can review the proposed edits and accept or reject the pull request.
  • Fork
    Forking in GitHub lets users copy another user's repository into their own account. Separately, if your organization has its own repository, you can clone it to save the code locally and edit your own copy of the master files.
  • Gists
    Gists let users share code snippets, entire files, or even applications. Each gist is a repository that can be cloned or forked by other people.
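To make the branch and pull request idea more concrete, here is a minimal Python sketch of the kind of line-by-line comparison a reviewer sees when a branch is proposed for merging. It uses Python's standard `difflib` module rather than Git or GitHub itself. The query text, the file name `daily_orders.sql`, and the helper `review_diff` are all hypothetical and exist only for illustration.

```python
import difflib

# Hypothetical query as it exists on the main branch.
main_version = """SELECT order_id, amount
FROM orders
WHERE order_date >= '2022-01-01'
"""

# The same hypothetical query after a teammate's edit on a separate branch.
branch_version = """SELECT order_id, amount
FROM orders
WHERE order_date >= '2022-01-01'
  AND status = 'paid'
"""


def review_diff(old: str, new: str, name: str = "daily_orders.sql") -> str:
    """Return a unified diff of two versions of a file.

    This only imitates the kind of view a pull request page offers;
    it does not talk to Git or GitHub.
    """
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"main/{name}",
        tofile=f"branch/{name}",
    )
    return "".join(lines)


if __name__ == "__main__":
    # Lines starting with "+" were added on the branch,
    # lines starting with "-" were removed.
    print(review_diff(main_version, branch_version))
```

Reading the change as a diff is what makes review practical: the team only needs to look at the lines that differ, not the whole query.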

Why should Data Analysts use GitHub?

  • Collaboration
    As Data Analysts, we find it difficult to collaborate the conventional way, by sending our code back and forth via email. GitHub exists to improve active collaboration within your data team. Suppose a team member has improved the efficiency of a query: she can push the change to a new branch, the team can review the new code together, and it can be merged afterwards.
  • Code Library
    Creating a query library in a GitHub repository lets us share our SQL queries simply by sharing a URL. Everything is saved in the repository, so we avoid rewriting a query from scratch in the future. Most importantly, other team members will not need to write their own query from scratch should they need to create a similar analysis.
  • Integration
    GitHub can be integrated with code editors and development tools, which makes it easy to keep a local copy of your organization's repository and navigate through its files. From my experience, GitHub works well with Sublime Text and Microsoft Visual Studio.
  • Version Tracking
    Should you make a mistake and merge a new version by accident, GitHub lets you revert to the previous version with just a few clicks. It keeps track of all the changes that have been pushed to the repository. This is similar to pressing “ctrl+z” in Microsoft Word or Google Docs to get back the previous version of a document. With GitHub, you can see the version history of your code, so previous versions are not lost with every update.
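As a rough illustration of the code library idea, the sketch below assumes a cloned repository that keeps one `.sql` file per query in a `queries` folder. That layout, the helper names `load_query` and `run_saved_query`, and the sample `orders` table are assumptions made for this example, not a GitHub feature. An in-memory SQLite database stands in for a real data warehouse so the example can run on its own.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_query(library_dir: Path, name: str) -> str:
    """Read a saved query from the library (assumed layout: <name>.sql)."""
    return (library_dir / f"{name}.sql").read_text(encoding="utf-8")


def run_saved_query(conn: sqlite3.Connection, library_dir: Path,
                    name: str, params: tuple = ()) -> list:
    """Load a query by name and run it with the given parameters."""
    return conn.execute(load_query(library_dir, name), params).fetchall()


def demo() -> list:
    # A temporary folder stands in for the local clone of the repository.
    with tempfile.TemporaryDirectory() as tmp:
        library = Path(tmp) / "queries"
        library.mkdir()
        (library / "revenue_by_status.sql").write_text(
            "SELECT status, SUM(amount) FROM orders "
            "WHERE amount >= ? GROUP BY status ORDER BY status",
            encoding="utf-8",
        )

        # Hypothetical sample data in an in-memory database.
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, status TEXT, amount REAL)"
        )
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(1, "paid", 100.0), (2, "paid", 50.0), (3, "refunded", 20.0)],
        )

        rows = run_saved_query(conn, library, "revenue_by_status", (10,))
        conn.close()
        return rows


if __name__ == "__main__":
    print(demo())  # [('paid', 150.0), ('refunded', 20.0)]
```

Because the query lives in a file rather than in someone's editor history, a teammate who needs a similar analysis can reuse it by name instead of rewriting it.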

Summary

All in all, GitHub is a very useful tool for Data Analysts in any organization. It lets users create repositories and query libraries, so we don’t need to write a query from scratch whenever we want to redo an analysis or create a similar report. Although the main scope of a Data Analyst is to solve business problems, create reports, and perform market research, analysts do spend most of their work hours writing very complex queries. Having GitHub on their side helps keep those queries versioned, shared, and reusable.
