
As an aspiring data scientist, one of the first things I was advised to do was to create and set up a GitHub repository. So what is GitHub? In short, I would describe GitHub as Facebook for programmers: every developer has a profile and posts their work there. Underneath it is Git, a version control system that tracks changes in your projects, much like a detailed version history for a Word document. If you’re not satisfied with new edits, Git allows you to revert to a previous version. GitHub also works as a great backup in case your computer crashes or worse.
I quickly skimmed a couple of introductory GitHub tutorials and began to upload my work through the browser. Focused mainly on learning Python and statistics, I didn’t set aside the time to fully understand the beauty of GitHub. Soon enough, I developed bad habits and my GitHub repository was a mess. It took a while to fix, and after four months I’m still experimenting with Git commands.
To those who are new to or struggling with GitHub, here are five lessons I’ve learned.
Properly set up your GitHub repository
Mistake #1
As a beginner, I only interacted with GitHub through the website since it was the easiest way. I had a folder of my projects on my desktop. Once a draft was completed, I used GitHub’s drag and drop feature to upload the document. Things started to go wrong when I wanted to edit a project that had already been uploaded. After I made changes to the local file, I would upload the revised version. Eventually, I accumulated a folder full of different versions of the same project. This didn’t seem to be a productive workflow, and it wasn’t. If I had taken the time to properly set up my repository through the command line, I could have avoided some confusion and stress.
What I should have done
If I had properly connected my local repository (the folder on my desktop) to the remote repository (hosted under my account on GitHub), I would have been able to push my work to GitHub with a few commands.
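As a rough sketch of what that connection looks like, the commands below turn a folder into a local repository and link it to a remote. To keep the example runnable offline, a local bare repository stands in for GitHub; the folder name, file, and identity are placeholders I made up, and with a real GitHub repository you would pass its URL to `git remote add` instead.

```shell
set -e
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/remote.git"   # stand-in for the GitHub repository (assumption)

mkdir "$workdir/my-projects"                    # hypothetical project folder
cd "$workdir/my-projects"
git init --quiet                                # turn the folder into a local repository
git config user.name "Example User"             # placeholder identity for this demo
git config user.email "user@example.com"

echo "# My projects" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M master                            # name the branch master, as in this post

git remote add origin "$workdir/remote.git"     # connect the local repository to the remote
git push --quiet -u origin master               # upload and remember the upstream branch
git remote -v                                   # list the configured remote
```

After the one-time `git remote add` and `git push -u`, later updates only need `git push`.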
Here are some great tutorials on how to set up your GitHub repository:
Alex Aklson — How to Properly Setup Your Github Repository (Mac Version)
Anne Bonner — Getting started with Git and GitHub: the complete beginner’s guide
Challenge yourself and use the terminal
Mistake #2
Starting out, I had no prior programming experience or exposure to Linux commands. This was definitely one of the reasons I was afraid to access my GitHub repository through the terminal. However, avoiding the terminal kept me from properly adding changes, writing commit messages, and pushing new commits to the remote repository.
Hands off the GUI/website
Once your repository is properly set up, please do not continue to upload files by using the website. Practice git add, commit and push in the command line.
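Here is a small sketch of that add, commit, and push loop. The first half is only setup so the example runs on its own: it again uses a local bare repository as a stand-in for GitHub, and the file name and identity are made up for illustration.

```shell
set -e
workdir=$(mktemp -d)

# --- Setup (assumption: a local bare repository plays the role of GitHub) ---
git init --quiet --bare "$workdir/remote.git"
mkdir "$workdir/project"
cd "$workdir/project"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "First draft" > notes.md              # hypothetical file
git add notes.md
git commit --quiet -m "Add first draft of notes"
git branch -M master
git remote add origin "$workdir/remote.git"
git push --quiet -u origin master

# --- The everyday loop: edit, check, add, commit, push ---
echo "Second paragraph" >> notes.md        # make a change
git status --short                         # see which files changed
git add notes.md                           # stage the change
git commit --quiet -m "Add second paragraph to notes"
git push --quiet                           # send the new commit to the remote
```

Running `git status` before `git add` is a cheap habit that shows exactly what is about to be staged.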
Below are simple guides/cheatsheets of Git commands:
Roger Dudler — git — the simple guide
Sviatoslav A. — Most Basic Git Commands with Examples
Be consistent with commit messages
Mistake #3
After I became comfortable with Git commands, I didn’t bother to look into best practices for commits, so my workflow varied greatly. Things ran smoothly when I committed after a single change. However, there were times when I edited multiple files before making one commit. The only staging command I used was ‘git add .’, which stages every new and modified file in the current directory, so the next ‘git commit’ applied one message to all of those changes. The edits I made to those files were often unrelated, so a single commit message was poor documentation.
Best Practice for Commits
- Each commit should be a single logical change. Don’t make several logical changes in one commit.
- Commit early and often. Small, self-contained commits are easier to understand and revert when something goes wrong.
- Commits should be ordered logically. For example, if commit X depends on changes done in commit Y, then commit Y should come before commit X.
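To make the first two points concrete, here is a minimal sketch that stages two unrelated edits separately, so each gets its own commit message instead of sharing one via ‘git add .’. The file names, contents, and messages are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Starting point: two tracked files (hypothetical names)
printf 'print("load data")\n' > clean_data.py
printf 'Project notes\n' > README.md
git add clean_data.py README.md
git commit --quiet -m "Add cleaning script and README"

# Two unrelated edits now sit in the working directory
printf 'print("drop missing rows")\n' >> clean_data.py
printf 'Data source: survey export\n' >> README.md
git status --short                         # both files show up as modified

# One logical change per commit: stage each file on its own
git add clean_data.py
git commit --quiet -m "Drop rows with missing values"
git add README.md
git commit --quiet -m "Document data source in README"

git log --oneline                          # each change has its own entry
```

If two unrelated edits land in the same file, `git add -p` lets you stage them piece by piece.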
These articles provide good tips on how to write a commit message and how to fix one if you make a mistake:
Chris Beams — How to Write a Git Commit Message
Karl Broman — Amend the last commit message
Learn essential Git log commands
Mistake #4
After you make your first commit, git log is a helpful tool for viewing previous commits in the repository. This is extremely handy when troubleshooting Git. For example, I once tried to upload a very large dataset to GitHub. I compressed the file, but things got complicated and I couldn’t push new commits to the remote repository.
Use Git log commands
With git log, I retraced my steps and removed the commits that were blocking my pushes to the remote repository.
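The sketch below shows one way this kind of cleanup can go. It assumes the oversized file was added in the most recent commit and that the commit has not been pushed yet; the file name is invented and a 1 KB placeholder stands in for the real dataset. Other situations, such as a large file buried several commits back, need different commands.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Project notes" > README.md
git add README.md
git commit --quiet -m "Add README"

head -c 1024 /dev/zero > big_dataset.zip   # small placeholder for a very large file
git add big_dataset.zip
git commit --quiet -m "Add compressed dataset"

git log --oneline                          # one line per commit, newest first
git log --stat -1                          # files touched by the latest commit

git reset --quiet HEAD~1                   # undo the last commit, keep the file on disk
echo "big_dataset.zip" > .gitignore        # stop Git from tracking the file again
git add .gitignore
git commit --quiet -m "Ignore large dataset"

git log --oneline                          # the dataset commit is gone from history
```

`git reset` rewrites local history, so it is safest on commits that exist only on your machine.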
Here are some helpful guides to git log and related commands:
Aaron Tabor — 10 Essential Git Log Command Examples on Linux to View Commits
David Walsh — Squash Commits with Git
Learn how to branch
Mistake #5
While working on projects, there were multiple occasions where I had new ideas and trying them out was a risk. It was great when these ideas worked out. However, there were times when things didn’t go well. Once, I decided to completely restructure the outline of my project. It turned out to be a big mistake, and I spent many hours undoing the changes. If I had learned how to create a new branch when testing new ideas, it would have saved me so much time when things didn’t work out.
Practice branching
The branch command creates a separate branch from the master branch. Basically, we can experiment on a side branch without affecting the master branch. If you’re satisfied with the new changes, you can merge the side branch into the master branch. If not, you can delete the side branch and continue with the undisturbed master branch.
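Here is a minimal sketch of that workflow: try a risky change on a side branch, then either merge it or throw it away. The branch name, file, and commit messages are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Outline v1" > outline.md             # hypothetical project file
git add outline.md
git commit --quiet -m "Add project outline"
git branch -M master                       # name the branch master, as in this post

git checkout --quiet -b restructure        # create a side branch and switch to it
echo "Outline v2 (experimental)" > outline.md
git commit --quiet -am "Try a restructured outline"

git checkout --quiet master                # back on master
cat outline.md                             # master still holds the original outline

# Option A: the experiment worked, so merge it and clean up
git merge --quiet restructure
git branch -d restructure

# Option B (instead of A): the experiment failed, so discard the side branch
# git branch -D restructure
```

Either way, master is only changed at the moment you decide to merge.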
Here’s a simple tutorial on how to branch and merge:
Karl Broman — Branching and merging
To recap
I’m not a GitHub expert, but that’s OK. I learned so much from making these mistakes. If you are starting out with GitHub, I hope this was helpful. If not, I hope you are able to learn from your own mistakes and keep on chugging along.
