A Collision of Terminal and GitHub
Any data scientist getting their first crack at exploring the world of data should begin by learning how to properly use two things: Terminal and GitHub. The two go hand in hand in building your data profile, both on your personal computer and in the greater data science world.
Before we get to GitHub, we must learn the essentials of Terminal:
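When you open Terminal, you will see a message saying when it was last opened, followed by a prompt in your home directory. Strictly speaking, this is not the root directory: the root directory, written /, is the top folder of your whole file system, while your home directory is your own user folder inside it. For Mac users, that is /Users/yourname, which Terminal abbreviates as ~.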
To see what is inside a folder, simply type “ls” after the $ sign and press Enter, and Terminal will list the contents of that folder.
To dive deeper into any folder, use the command cd followed by the name of the folder. For example, to open a Documents folder shown by ls, I would simply type “cd Documents”. If you want to go back a step and exit a folder, all you have to enter is “cd ..”. A nice shortcut when accessing folders with long names is to type the beginning of the name and press the Tab key: if what you have typed is unique among the items in that folder, the rest of the name will appear.
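The navigation commands above can be practised without any risk to your files. The following is a minimal sketch, assuming a bash-compatible shell: it creates a temporary sandbox folder with mktemp, and the Documents and Downloads folders inside it are hypothetical stand-ins for your own. The pwd command, which is not mentioned above, simply prints the folder you are currently in.

```shell
# Create a throwaway sandbox so nothing in your real folders is touched
SANDBOX=$(mktemp -d)
cd "$SANDBOX"

# Hypothetical folders standing in for the ones in your home directory
mkdir Documents Downloads

# List the contents of the current folder
ls

# Move into Documents, then print where we are
cd Documents
pwd

# Go back a step, up to the sandbox folder
cd ..
pwd
```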
Keep in mind that Terminal can be case sensitive, so type commands and folder names with their exact capitalization.
Now we can go deeper and deeper into different directories throughout your computer. However, if you are time conscious and do not want to cd through each folder every time you open Terminal to reach a file, you can type the path to it all at once. An absolute path starts at the root directory, /, and names every folder along the way, while a relative path starts from the folder you are currently in. Because Terminal opens in your home directory, a relative path from there is considerably shorter than the absolute path to the same file.
To do the same for folders, you can type “cd ~/YourFolder” to skip many a cd and go right to your prime location. The ~ is shorthand for your home directory, so this command works no matter which folder you are currently in.
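Here is a sketch of the difference, again inside a temporary sandbox. The folder Documents/project and the file data.csv are hypothetical, and cat (which prints a file's contents) is used only to show that both paths reach the same file. The last line shows that ~ is expanded from your home directory, whatever folder you are in.

```shell
# Throwaway sandbox so your real folders are untouched
SANDBOX=$(mktemp -d)

# Hypothetical project folder and file standing in for your own
mkdir -p "$SANDBOX/Documents/project"
echo "id,value" > "$SANDBOX/Documents/project/data.csv"

# Absolute path: begins at the root directory, /, and names every folder
ABSOLUTE_PATH="$SANDBOX/Documents/project/data.csv"
cat "$ABSOLUTE_PATH"

# Relative path: begins at the current folder, so it is shorter
cd "$SANDBOX"
cat Documents/project/data.csv

# ~ expands to your home directory, so ~/YourFolder means the same
# location no matter which folder you are currently in
echo ~/YourFolder
```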
So what does all of this have to do with GitHub?
Before we get to that, here is some brief information on what Git is, as defined by the wonderful source of Wikipedia:
“a version control system (VCS) for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files. As a distributed revision control system it is aimed at speed, data integrity, and support for distributed, non-linear workflows.”
Now we go from the Git world deeper into GitHub. GitHub is a hosting service for Git repositories, the codebases of individual and group accounts. It also works as a sort of social network for programmers and data scientists, letting you follow other users and star (favourite) projects across the site.
We have finally reached the point where GitHub and Terminal meet: copying a repository's files to your local computer, working on them, and then sending your changes back to GitHub.
The first step in this process is to “Fork” the GitHub repository you have selected. Forking makes a copy of the repository under your own GitHub account; it is not on your computer yet. To do this, simply click the “Fork” button in the upper right-hand corner and select your account, and the copy will appear on your GitHub page. This lets you keep track of your own changes and new versions of the repository on your page without affecting the original repository.
Once you have forked the repository, you clone it to your computer. On your fork's page, open the “Clone or download” tab and copy the SSH URL, which looks like git@github.com:username/repository.git (using the SSH URL requires an SSH key added to your GitHub account). Before you grab the repository's files, go into Terminal and “cd” to the folder you would like the GitHub project to live in, so it is easy to find later. From there, save the repository's files on your computer with the command “git clone” followed by the URL you copied.
Do not clone a repository into another repository: run git clone from a folder that is not already part of a Git project.
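Cloning from GitHub needs a network connection and an SSH key, so this sketch imitates the step offline, assuming git is installed: a local bare repository, hypothetically named fork.git, stands in for your fork on GitHub. With a real fork you would paste the SSH URL you copied in place of the local path. The git rev-parse check is one way to confirm you are not already inside a repository before cloning.

```shell
# Throwaway sandbox; nothing here touches your real folders or GitHub
SANDBOX=$(mktemp -d)
cd "$SANDBOX"

# A local bare repository standing in for your fork on GitHub
git init --bare fork.git

# The folder where you want your projects to live
mkdir projects
cd projects

# git rev-parse fails outside a repository, so the message prints
# only when it is safe to clone here
git rev-parse --is-inside-work-tree 2>/dev/null || echo "Not inside a repository: safe to clone here"

# Clone the stand-in fork into a new folder called my-project
# (the stand-in is empty, so git may warn about cloning an empty repository)
git clone "$SANDBOX/fork.git" my-project
```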
Now you can do whatever you want with the repository's files on your computer, whether that means editing them or adding new documents, Python notebooks, or really any other kind of file.
Once you have completed your goals with the repository on your local computer, you can upload your work to your repository on GitHub. In Terminal, you should be inside the folder (within your cloned repository) that you wish to upload. I would advise against uploading large datasets: GitHub warns about files larger than 50 MiB and rejects files larger than 100 MiB.
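One way to act on this advice is to look for oversized files before uploading anything. The sketch below uses find to list files larger than 50 MiB; the sandbox and both files in it (one small, one of about 51 MiB created with dd) are hypothetical examples.

```shell
# Throwaway sandbox with one small and one large hypothetical file
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
echo "id,value" > small.csv
dd if=/dev/zero of=big_dataset.csv bs=1048576 count=51 2>/dev/null

# List regular files larger than 50 MiB in this folder and its subfolders;
# these are the ones to keep out of your GitHub repository
find . -type f -size +50M
```

Run from inside your own cloned repository, the same find command lists any large files you may want to leave out before uploading.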
Now, to upload the folder, we start by entering:
git add .
The . stands for the current folder; this stages it, telling Git that we are ready to add the folder to the repository. We then commit the changes using:
git commit -m "Message"
Anything can go in the message, but I recommend using the folder's name followed by a hyphen and your initials, so that if others access your repository they know which folder is yours.
Finally, in Terminal we enter:
git push
This uploads your committed files to your fork on GitHub. If you wish to show your progress to the original repository you forked from earlier, simply open your fork on your GitHub page, click the “New pull request” button, and create the pull request. This proposes your changes to the original repository, where others can view them and its maintainers can decide whether to merge them.
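The whole upload sequence can be rehearsed offline before you try it on GitHub. This sketch assumes git is installed; a local bare repository stands in for your fork, and the names YourFolder and notes.txt, the commit identity, and the message YourFolder-AB (folder name plus hypothetical initials) are placeholders. It pushes with git push origin HEAD rather than plain git push, because a freshly cloned empty repository may not have an upstream branch set yet.

```shell
# Throwaway sandbox; a local bare repository plays the role of your fork
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
git init --bare fork.git
git clone fork.git my-project
cd my-project

# Commit identity for this repository only (placeholder values)
git config user.name "Your Name"
git config user.email "you@example.com"

# Hypothetical folder holding your work; step inside it before adding
mkdir YourFolder
echo "first notes" > YourFolder/notes.txt
cd YourFolder

# Stage the current folder, then commit with folder name plus initials
git add .
git commit -m "YourFolder-AB"

# Send the commit to the stand-in fork; HEAD means the current branch.
# In a clone of a non-empty fork, plain git push usually does the same.
git push origin HEAD

# The commit is now stored in the stand-in fork as well
git --git-dir="$SANDBOX/fork.git" log --all --oneline
```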
I hope that this guide has been helpful.
If you would like a more visual version of the entire process on GitHub, I highly suggest Mark Mummert's “Submitting Work in Github”.