Photo by bongkarn thanyakij from Pexels

Preparing Github Repositories For Portfolio Part 1: Organizing and Cleaning Repo Files on a Mac.

Alexander Beat


I recently graduated from Flatiron’s Data Science Bootcamp and am cleaning up my repositories and projects on GitHub to make them more organized for a portfolio. We had some lessons on cleaning up repo files, so I picked the parts I found most helpful to share in this blog post. This first part covers organizing and handling the files in the repo. The next part will deal with cleaning up the actual project notebook code, and maybe some work on the README.

Update your repo name and URL

You’ll want a new name for your repository that explains the project better, and while you’re at it, be sure to add a description. Go to the repository on GitHub and click on Settings.

Click on Settings.

Then you’ll be able to change the repo name on the next page, as shown below.

Type a new name and then click “Rename”.

Change the description.

Back on the main page for the repository, on the far right, you’ll select the gear icon to add a description.

Select the gear icon.

This will show the edit details screen and you can type in your description and add some tags too.

Checking and updating remote origin.

Changing the repo name will also change its URL, so you need to make sure the local repository on your machine still connects to the one on GitHub by updating the remote origin. Use these commands in Terminal to do so:

Check what your current remote repository is.

git remote -v

Update your remote origin to the new URL.

git remote set-url origin <new repo url> 
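To see the two commands working together, here is a minimal sketch that runs them in a throwaway repository so nothing real is touched. The user name and repo names in the URLs are placeholders I made up, so swap in your own.

```shell
# Work in a temporary demo repo (hypothetical names, not a real project).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Pretend this repo was cloned before the rename.
git remote add origin https://github.com/example-user/old-repo-name.git

# Check the current remote, then point it at the renamed repo.
git remote -v
git remote set-url origin https://github.com/example-user/new-repo-name.git

# Confirm the change took effect.
current_url="$(git remote get-url origin)"
echo "origin now points to: $current_url"
```

Running git remote -v again afterwards is a quick way to double check both the fetch and push URLs.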

Updating broken links from URL change.

Also note that since the URL changed, anything within your project files that links to it will need to be updated. For example, in one of my projects the README used links to the repo to display images, and changing the URL would break those image links. Below you can see what I mean.

Some examples of links used in the README code that could be broken if I were to change my repo name and URL. Be sure to update links like these in your notebooks and in your readme.
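One way to track down these links is to search the project for the old repo name. This is only a sketch: the old name, the sample README line, and the demo folder are assumptions for illustration, and in a real project you would run the grep line from the root of your repo.

```shell
# Hypothetical old repo name to search for.
old_name="old-repo-name"

# Build a small demo folder with one file that still uses the old URL.
demo_dir="$(mktemp -d)"
printf '![header](https://github.com/example-user/old-repo-name/blob/master/images/header.jpg)\n' > "$demo_dir/README.md"
printf 'No links in this file.\n' > "$demo_dir/notes.md"

# List every Markdown or notebook file that still mentions the old name.
matches="$(grep -rl --include='*.md' --include='*.ipynb' "$old_name" "$demo_dir")"
echo "$matches"
```

Swapping -l for -n prints the matching lines with their line numbers, which makes the links easier to find when you edit the files.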

Updating the Colab button.

Another place where this became a problem was when using GitHub with Colab. If you saved your Colab notebook to GitHub with the option to include the “Open in Colab” button at the top of your project notebook, then that link will need to be updated as well. See the image below to know what button I’m talking about.

There are a couple of ways you could go about changing this manually, though I don’t find them very intuitive. You can open the notebook editing screen in GitHub and directly edit the link so that it points to your new repo URL, like in the image below. You’ll find the link code towards the top of the edit window.

The link code for the “Open in Colab” button.
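If you edit the link by hand, it helps to know how it is put together. The sketch below assumes the usual pattern for Colab links to GitHub notebooks (user, repo, branch, then the notebook path). All four values are placeholders I made up.

```shell
# Hypothetical values: replace with your own account, repo, branch and notebook.
github_user="example-user"
repo_name="new-repo-name"
branch="master"
notebook_path="project-notebook.ipynb"

# Assemble the link that the "Open in Colab" button should point to.
colab_url="https://colab.research.google.com/github/${github_user}/${repo_name}/blob/${branch}/${notebook_path}"
echo "$colab_url"
```

After a rename, only the repo name part of that link should need to change.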

Another way to fix this is to save the Colab notebook to GitHub again with the button option selected, as seen here.

The file menu for your notebook in Colab. Save a copy in Github.

The next window shows your options where you can select the checkbox to include the link to Colab (that same button at the top of your notebook I showed you above).

Neither of these solutions is very elegant, but both get the job done. If you have a smoother way of handling this Colab/GitHub button disconnect, I would love to hear it. It would help me out a ton.

.gitignore file.

Something else that will help clean up your repo is a .gitignore file. We had a lesson in our post-grad work about cleaning up repositories that I found really helpful, and I had no clue what a .gitignore even was until I researched it. Here’s a screenshot of the example repository, and you can see that there’s no .gitignore file.

This file tells Git not to track certain files that you don’t want included in your commits. Good candidates for the .gitignore file are system files, caches that could cause problems when the repo is cloned to another user’s computer, private data you don’t want committed to GitHub and available for others to clone, and large datasets or directories that you don’t want uploaded. GitHub warns you when you push a file larger than 50 MB and rejects any file larger than 100 MB. You don’t necessarily need to list your data in the .gitignore file if you keep it somewhere outside your local repository. But if your data lives inside the local repo and you don’t want to accidentally commit it and push it to GitHub, listing it in your .gitignore file is useful. It depends on how you organize your data and where you like to store it.
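To spot files that might run into those size limits before you push, something like the sketch below could help. The function name list_large_files is my own invention, not a Git or GitHub command, and the 50 MB threshold is just the warning size mentioned above.

```shell
# Hypothetical helper: list files under a folder that exceed a size in MB,
# skipping Git's own internal .git folder.
list_large_files() {
    # $1 = folder to scan, $2 = size threshold in MB
    find "$1" -type f -size "+${2}M" -not -path '*/.git/*'
}

# Scan the current folder for anything over 50 MB.
list_large_files . 50
```

Anything it prints is a candidate for the .gitignore file or for storing outside the repo.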

There are a lot of different .gitignore templates, depending on the technologies or languages you’re coding with.

This is the template from GitHub that we used in our lesson for Python notebooks.

And a link to all of the other templates that GitHub provides.

Here is the command to create a .gitignore using Terminal:

touch .gitignore

This will open the file in your text editor once you’ve created it.

open .gitignore

Copy the code from the Github template and paste it into your .gitignore file.

Example of what the Github .gitignore template page looks like. Copy all of that code.
Paste it into your open .gitignore file that you created.

I’m on a Mac, where Finder creates a hidden file named “.DS_Store” in the folders you open. It stores Finder’s metadata for that folder, such as icon positions and view settings, so it only clutters the repo and doesn’t belong on a public GitHub page. Add it to your .gitignore.
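If you’d rather add the entry from Terminal, here is a small sketch that appends it and then asks Git whether the file is really ignored. It runs in a throwaway demo repo, and the .ipynb_checkpoints line is an extra entry I added as an example.

```shell
# Throwaway demo repo so your real project is not touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Append the entries to .gitignore (creates the file if it is missing).
printf '.DS_Store\n.ipynb_checkpoints/\n' >> .gitignore

# Simulate the file Finder would create, then check that Git ignores it.
touch .DS_Store
if git check-ignore -q .DS_Store; then
    ignored="yes"
else
    ignored="no"
fi
echo ".DS_Store ignored: $ignored"
```

Keep in mind that .gitignore only affects files Git is not tracking yet. A .DS_Store that was already committed has to be removed from the repo separately.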

Remove any other files you don’t need in the repo. You can do this manually from your Finder window, or in Terminal using the command:

git rm <filename>

One file you might remove is the dataset I mentioned previously, but make sure it’s backed up elsewhere first, since git rm also deletes the file from your local folder. The .DS_Store file and the .ipynb_checkpoints folder can be removed from the repo too (removing a folder with git rm requires the -r flag).
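If you want Git to stop tracking a file but keep the copy on your machine, git rm has a --cached option for that. Here is a sketch in a throwaway demo repo. The checkpoint file name is made up for the example.

```shell
# Throwaway demo repo with files that were added by accident.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
touch .DS_Store
mkdir .ipynb_checkpoints
touch .ipynb_checkpoints/notebook-checkpoint.ipynb
git add -f .DS_Store .ipynb_checkpoints

# Stop tracking them, but leave the files on disk.
git rm -q --cached .DS_Store
git rm -q -r --cached .ipynb_checkpoints

# Nothing should be tracked now, while the files still exist locally.
tracked_files="$(git ls-files)"
echo "tracked files: ${tracked_files:-none}"
ls -a
```

This pairs well with a .gitignore entry for the same files, so they don’t get added again on the next commit.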

In the example we are working with, there are random images just sitting in the root of the repo. We can move them into a folder to clean things up a bit.

You can make an images folder to hold them, either manually in your Finder window or in Terminal using the command:

mkdir images

Then move the images into the images folder, either manually in Finder or using the Terminal command:

git mv <oldlocation/filename> <newlocation/filename>

Here’s an example of that command to move the file.

git mv for-sale-header.jpg images/for-sale-header.jpg
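When there are several images, git mv can also take more than one file followed by the destination folder. Here is a sketch in a throwaway demo repo. The second image name, price-chart.jpg, is made up for the example.

```shell
# Throwaway demo repo with two tracked images in the root folder.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
touch for-sale-header.jpg price-chart.jpg
git add for-sale-header.jpg price-chart.jpg

# Create the folder and move both images into it in one command.
mkdir images
git mv for-sale-header.jpg price-chart.jpg images/

# Show where Git now tracks the files.
git ls-files
```

Remember that any README or notebook link pointing at the old image location will need the images/ prefix added.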

You may also want to rename files such as your notebook to something more descriptive. The same git mv command handles renames.

Afterwards, once the changes are pushed, the example repo has a much cleaner look to it.
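For completeness, here is a sketch of committing and pushing the cleanup. To keep it safe to run, it pushes to a local bare repository standing in for GitHub. The folders, commit message, and user details are all placeholders. In your real project, origin would already point to GitHub and you would only need the add, commit, and push lines.

```shell
# A local bare repo stands in for the GitHub remote in this demo.
remote_dir="$(mktemp -d)"
work_dir="$(mktemp -d)"
git init -q --bare "$remote_dir"

cd "$work_dir"
git init -q
git config user.name "Example User"        # demo identity only
git config user.email "user@example.com"
git remote add origin "$remote_dir"

# Simulate the cleaned-up repo contents.
mkdir images
touch images/for-sale-header.jpg
printf '.DS_Store\n' > .gitignore

# Stage everything, commit, and push the current branch.
git add -A
git commit -q -m "Organize images and add .gitignore"
git push -q origin HEAD

pushed_commits="$(git --git-dir="$remote_dir" rev-list --count --all)"
echo "commits on remote: $pushed_commits"
```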

I hope this gives you some ideas for cleaning up your repositories and creating better-looking work for your portfolio. I’ll go into detail on cleaning up notebooks and README files in the next post. If you have advice or additional insight on any of this, I’d love to hear what other steps you used to clean things up or reorganize. Thanks.


Alexander Beat

Data scientist. Flatiron grad. Artist converted to tech. Fascinated by technology, space, global culture and history. linkedin.com/in/alexanderbeat