Published in Practicum Bootcamp

Data Science Portfolio: Making the Most Out of GitHub

by Alex Kim, Consultant at Practicum Bootcamp

  • Project repositories
  • Profile README
  • Open-source collaboration
  • GitHub Pages

Project repositories

As a data scientist, you can focus a project repository on one or more of the following:

  • Dataset collection: a set of useful shell and/or Python scripts that collect interesting data from a website or a web API. Data scientists building their portfolios often overlook this type of project. That is a shame, because suitable datasets are in short supply. Sharing an interesting dataset (or at least a way to collect it) can therefore be a shortcut to building a good reputation among your peers in the data science community.
  • Data storytelling and visualization: a project where you thoroughly analyze a particular dataset, uncover unique insights, and build elegant visualizations. These projects are great practice for one of the most valuable skills a data scientist can have: communicating findings to people with little to no programming or statistical knowledge.
  • Machine learning: a project containing code that performs data cleaning, feature engineering, and ML modeling. Here you can either 1) train a model on a unique and interesting dataset that no one has approached before, or 2) apply novel techniques and ML algorithms to achieve state-of-the-art performance on a benchmark dataset. Unless your ultimate goal is a research-oriented position, we recommend the former approach.

A well-organized repository also helps. Good practices include:

  • Keeping the source code separate from your Jupyter notebooks and unit tests
  • Having a separate directory for all static files and media assets (images, audio files, etc.)
  • Having all project configs (such as virtual environment files) at the root of the project

Figure: Automatic Python code linting in Visual Studio Code.
Figure: Markdown editing and preview in Visual Studio Code.
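
The layout guidelines in this section can be sketched as a small scaffolding script. This is only an illustration under assumptions: the `scaffold_project` helper, the directory names (`src`, `notebooks`, `tests`, `assets`), and the root-level file names are hypothetical choices, not a layout prescribed by this post.

```python
from pathlib import Path

# Hypothetical layout: these names are illustrative assumptions, not a standard.
DIRECTORIES = ["src", "notebooks", "tests", "assets"]
ROOT_FILES = ["README.md", "requirements.txt", ".gitignore"]


def scaffold_project(root):
    """Create an empty project skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    # Source code, notebooks, tests, and media assets each get their own directory.
    for name in DIRECTORIES:
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    # Project-wide config and documentation files live at the root.
    for name in ROOT_FILES:
        file_path = root / name
        file_path.touch(exist_ok=True)
        created.append(file_path)
    return created
```

Calling `scaffold_project("my-project")` would create the four directories and three empty files under `my-project`. Existing files are left untouched, so the function is safe to rerun.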

Profile README

GitHub lets you create a special Profile README file that gives visitors to your profile a quick overview of you, your skills, and the projects you are working on.

To set it up, create a repository with the same name as your username.

Figure: Example of what a super simple Profile README looks like.
Figure: Example of a card with GitHub stats.
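
As a rough sketch of what such a file could contain, the snippet below assembles a minimal Profile README in Markdown. The `build_profile_readme` helper is hypothetical, and the name, tagline, skills, and project entry (including its `example.com` URL) are placeholders. The resulting text would go into `README.md` in the repository named after your username.

```python
def build_profile_readme(name, tagline, skills, projects):
    """Return Markdown text for a simple Profile README.

    `projects` is a list of (title, url, description) tuples.
    """
    lines = [f"# Hi, I'm {name}", "", tagline, "", "## Skills", ""]
    # One Markdown bullet per skill.
    lines += [f"- {skill}" for skill in skills]
    lines += ["", "## Projects", ""]
    # Each project becomes a bullet with a Markdown link and a short description.
    lines += [
        f"- [{title}]({url}): {description}"
        for title, url, description in projects
    ]
    return "\n".join(lines) + "\n"


# Placeholder data for illustration only.
readme = build_profile_readme(
    name="Jane Doe",
    tagline="Data scientist interested in open data and visualization.",
    skills=["Python", "SQL", "Machine learning"],
    projects=[
        (
            "city-bike-data",
            "https://example.com/city-bike-data",
            "Scripts that collect bike-share trip data",
        ),
    ],
)
print(readme)
```

Writing the file by hand works just as well. The point here is only to show the handful of Markdown elements (headings, bullets, links) that a simple profile page needs.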

Open source contributions

Another great way to hone your skills is to contribute to other GitHub projects. It could be a small project that your friend is trying to get off the ground, or a big open-source project, like scikit-learn or pandas, that has been around for many years and has an active community of collaborators.

Personal blog on GitHub Pages

Have you been thinking about creating your own data science blog?

Summary

This post covers the four major GitHub services and products that will help you boost your portfolio. If you are new to GitHub and only recently created an account, we suggest starting with your Profile README, since it lets you learn Markdown syntax and get familiar with GitHub’s user interface. After that, you might want to clean up the data science projects that have been living on your personal computer and upload them to GitHub. Don’t forget about code quality, documentation, and a README. Next, set up your blog, try out a few different static site generators and design themes, then start posting! Or maybe fixing that pesky bug in a Python library you’ve been using is more up your alley. If so, go ahead, fix it, and create a pull request. Hopefully, we’ve convinced you that establishing your presence on GitHub is one of the best ways to boost your portfolio.
