Turning our API hackathon into an end-to-end data science portfolio project

Deepsha Menghani
Data Science at Microsoft
10 min read · Apr 18, 2023

We (authors Deepsha Menghani and Riesling Walker) worked together on a Microsoft Global Hackathon project to analyze data around a common passion using Python and R. Through this work, we wanted to create an end-to-end data science portfolio project that highlights our ability to learn relevant techniques for different parts of the project and drive data-based decisions. If you missed our last two articles in this three-part series, they describe how we pulled data from the Ravelry API with Python and then brought it into R for analysis.

These were useful skills to learn, and we had fun! But how would we package this project as a portfolio piece to use on our resumes? In this concluding article, we describe how we did it.

What is a data science portfolio?

A data science portfolio is a collection of work aimed at demonstrating your skills for potential employers, clients, or business partners. A portfolio helps you stand out from a typical resume because it shows concrete examples of your abilities, highlights your personal interests, and demonstrates your ability to go above and beyond.

A portfolio could be as fancy as a self-hosted personal website, embedded with appealing graphic design, contact forms, and references, or it could be as simple as a link to your personal GitHub on your resume. Our advice is to start simple: just link a project on your resume. Remember, this is a way to show your data skills, not your website-building and design skills.

What exactly should be included in a data science portfolio?

It comes down to four things: a clear description, a focus on specific skills, a clear takeaway, and sharing your work for feedback.

Clear communication of the project description. When someone looks at your portfolio, you will not be there to explain it. So, make sure that you include clear communication about what the project is, why you selected it, how to run it or what order to look at the files, and the output results. Without this, the person viewing your portfolio will not know where to start and will likely not take the time to try to understand. Additionally, this highlights your written communication skills, which are important for any role.

Does this sound intimidating? This could be as simple as a README in a GitHub repo. Want to knock this out of the park? Film yourself talking through a one- to three-minute demo of your portfolio work!

Have a clear focus that highlights specific skills. You have a lot of skills, and that’s great! Limiting your focus to only a few skills helps you tell your story, and helps the person viewing your portfolio determine whether that project is relevant to what they are looking for. Some example skills that you could highlight for data science jobs are data cleaning, data exploration, data visualization, data scraping, using a specific model or package, using a model to solve a business problem like churn or to inform a recommendation, or presenting an A/B testing readout.

Have a clear takeaway. Just like a project at your job, your portfolio piece should have a clear takeaway. Doing data cleaning is awesome, but what could someone then do with the cleaned dataset? Model building is cool, but what could someone do with the results of your model? Could it drive a decision? Making a data visualization is great, but what should someone learn from looking at your visual?

Having a defined clear takeaway helps show a manager that you are results oriented. This is important because managers are always looking for people who are excited about the impact of their work and avoid doing projects that don’t have the potential to improve the business.

Share your work. Post your work on Medium or LinkedIn! Share it with friends, coworkers, and mentors. This is a great way to gain confidence in your work and gather feedback about how to improve.

Portfolio projects to avoid

Just as there are things to include, there are also things not to include:

Common tutorials or college homework. Often an interviewer will have seen these before, done them themselves, or know that the whole problem is easily found online. This includes things like survival classification on the Titanic dataset, hand-written digit classification on the MNIST dataset, and flower species classification using the iris dataset, among others.

Anything you are not passionate about. Your portfolio shows your skillset and interests. If you include projects that you aren’t interested in, or feature ones that use skills you don’t like using, you’re going to pigeonhole yourself into jobs that you don’t want.

Something for which it takes too much time to explain the context. Remember, you’re trying to show off your data skills, not teach someone about your niche interest. The more time you spend explaining context, the less time you get to spend highlighting your skills.

Okay, okay, so where should I start with finding a portfolio project?

A lot of people get overwhelmed when trying to think of a possible portfolio project, and don’t know where to start! To help you begin, consider the following:

Start with your own personal interests or needs! As you can see in our articles, Riesling and Deepsha love knitting, and love data, so we used the Ravelry API to pull data about our knitting! We’ve also heard of people who start portfolio projects out of necessity.

Try a Kaggle competition. Kaggle competitions are great because they provide a clear business problem, clean data to work with, a community of people to partner with, and the potential to earn prize money! You can easily search for projects that you find interesting and applicable to your field to demonstrate your skills.

Look at the places that you like to learn for inspiration. For example, Tableau has a daily viz that you can use as inspiration for your own data visualizations or datasets to start with. Towards Data Science is packed with tutorials and projects that could be their authors’ portfolio pieces and might give you inspiration. As a bonus, after you use these types of resources for inspiration, you could submit your project for consideration to become a Tableau daily viz or get published in the Towards Data Science publication, which would be a recognition that you could highlight within your portfolio or resume!

Look at job postings that you’re really interested in. What is the one skill that they all have listed that you don’t have? Try to find a project that you could do to learn and highlight that skill.

Think about the companies and industries that you’re interested in. What problems are they facing? Can you do a project that is related to that problem or space? This will help you stand out as a candidate because it shows how passionate you are about that industry!

Our portfolio piece using the Ravelry API

As discussed in our last two articles, we worked together to use our data science skills to analyze our knitting queues on Ravelry, a social networking and organizational website for yarn-related crafts, by pulling data through the Ravelry API. But don’t worry — you don’t need to know anything about knitting to understand this portfolio piece.

To present this project in our portfolios, we chose a GitHub README for ease of use and collaboration. Additionally, this article series doubles as a portfolio presentation! Below is a description of how we took the raw data from the API and turned it into business insights with a clear takeaway.

Where to find the code files

You can find the code to recreate everything in this and previous articles at this GitHub repo.

Our primary business decision

With the Ravelry API providing knitting queue data, we wanted to use analytics to figure out what yarn to buy for a friend as a gift. With this business decision in mind, we took the following steps.

  1. We used the Ravelry API through Flask in Python and defined functions that output relevant information.
  2. We then used reticulate to pull the output of those function calls into R to analyze for our business decision.
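To make step 1 more concrete, here is a minimal Python sketch of the kind of helper that turns nested queue records into flat rows. It is an illustration under stated assumptions, not the code from our repo: the function name `flatten_queue`, the record fields (`queued_on`, `yarn_weight`, `pattern`), and the sample values are all hypothetical, and the real Ravelry API response has its own layout and requires an authenticated request.

```python
from datetime import date

# Made-up records for illustration only; the real API response has its own layout.
SAMPLE_QUEUE = [
    {"queued_on": "2020-05-01", "yarn_weight": "DK",
     "pattern": {"type": "pullover", "designer": "Designer X"}},
    {"queued_on": "2022-11-15", "yarn_weight": "Lace",
     "pattern": {"type": "shawl", "designer": "Designer Y"}},
]


def flatten_queue(username, records):
    """Turn nested queue records into flat rows, one per queued project."""
    rows = []
    for record in records:
        # Tolerate records that have no pattern attached.
        pattern = record.get("pattern") or {}
        rows.append({
            "user": username,
            "queued_year": date.fromisoformat(record["queued_on"]).year,
            "yarn_weight": record.get("yarn_weight"),
            "pattern_type": pattern.get("type"),
            "designer": pattern.get("designer"),
        })
    return rows


rows = flatten_queue("knitter_a", SAMPLE_QUEUE)
```

Keeping each row flat, with one key per column, makes it straightforward to reshape the result into a data frame for analysis.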

With our data in R, it’s easy to explore with summary statistics and ggplot2 visualizations. Using this information, we can draw conclusions about what kind of yarn to buy for a friend. For the sake of this “business case” of buying yarn for a friend, Deepsha is looking to buy yarn for Riesling and is comparing their individual queuing patterns to gain insight.

Our Ravelry usernames are “rieslingm” and “yarnsandcoffee”. For our visualizations, we plotted the Ravelry behavior of both of us as users for comparison, although this analysis could have been done with only one username. We refer to these usernames as “Riesling” and “Deepsha” in all visualizations for readability.

Question: Is Riesling still active?

While Deepsha knows that Riesling uses Ravelry, Deepsha wants to know whether Riesling has recently continued to add projects to her knitting queue before Deepsha jumps into analyzing Riesling’s queue. To confirm that, Deepsha plotted the number of projects each of them queued over time.

It’s interesting to note that both Deepsha and Riesling had a “pandemic year” hobby jump. It’s more relevant, though, that Riesling has continued to add projects to her queue recently, which means that Deepsha can analyze her more recent projects to figure out a perfect yarn gift.
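The counting behind a plot like this can be sketched in a few lines of Python. This is only an illustration: the helper names `projects_per_year` and `is_recently_active`, the user labels, and the sample years are made up for the example, and our actual analysis was done in R.

```python
from collections import Counter

# Made-up (user, year queued) pairs; real values would come from the API.
queued = [
    ("knitter_a", 2019), ("knitter_a", 2020), ("knitter_a", 2020),
    ("knitter_a", 2022),
    ("knitter_b", 2020), ("knitter_b", 2020), ("knitter_b", 2021),
]


def projects_per_year(pairs):
    """Count queued projects per user and year."""
    result = {}
    for (user, year), n in sorted(Counter(pairs).items()):
        result.setdefault(user, {})[year] = n
    return result


def is_recently_active(per_year, user, since_year):
    """True if the user queued at least one project in or after since_year."""
    return any(year >= since_year for year in per_year.get(user, {}))


per_year = projects_per_year(queued)
active_a = is_recently_active(per_year, "knitter_a", since_year=2022)
active_b = is_recently_active(per_year, "knitter_b", since_year=2022)
```

A check like `is_recently_active` answers the question directly, while the per-year counts are what a line or bar chart would display.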

Question: What yarn weight does Riesling like now?

Yarn weight describes the thickness of yarn. While looking for a gift, it is important for Deepsha to figure out what yarn weight she should get for Riesling, as yarn weight varies a lot among projects. For this, Deepsha looked at five of the most common yarn weight categories ranging from very thin yarn (lace weight) to very thick yarn (bulky weight). She looked at Riesling’s queued projects and the yarn weight that these projects require.

Deepsha and Riesling both moved from thicker yarn to thinner yarn as they gained experience. This is the case with a lot of people learning knitting as a new hobby, as they tend to start with thicker yarn to make it easier to pick up the craft.

Unlike Deepsha, who has been queuing a lot more fingering weight yarn lately, Riesling has been queuing more lace and DK weight. So, if Deepsha wants to get a gift for Riesling, rather than going with her new favorite yarn weight (fingering), she should opt for DK or lace. Good thing she checked the data!
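One way to put a number on "what has she been queuing lately" is to compute the share of each yarn weight among recent projects. The Python sketch below assumes flat rows with hypothetical `queued_year` and `yarn_weight` fields and uses made-up sample values; it is not the code behind our charts, which were built in R.

```python
from collections import Counter

# Made-up rows for illustration; field names are assumptions for this example.
rows = [
    {"queued_year": 2019, "yarn_weight": "Bulky"},
    {"queued_year": 2019, "yarn_weight": "Worsted"},
    {"queued_year": 2022, "yarn_weight": "DK"},
    {"queued_year": 2022, "yarn_weight": "DK"},
    {"queued_year": 2023, "yarn_weight": "Lace"},
    {"queued_year": 2023, "yarn_weight": "DK"},
]


def weight_shares(rows, since_year):
    """Share of each yarn weight among projects queued in or after since_year."""
    recent = [
        r["yarn_weight"] for r in rows
        if r["queued_year"] >= since_year and r.get("yarn_weight")
    ]
    counts = Counter(recent)
    total = sum(counts.values())
    return {weight: n / total for weight, n in counts.items()} if total else {}


shares = weight_shares(rows, since_year=2022)
favorite = max(shares, key=shares.get)
```

Filtering on a cutoff year before counting is what keeps an early thick-yarn phase from drowning out more recent preferences.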

Question: How much yarn should be in the gift?

The amount of yarn required to make something can vary dramatically among projects. Knitters would rather get no yarn at all than too little to finish their favorite project. So, it is important for Deepsha to know what type of projects Riesling likes to make, so that she can figure out what quantity of yarn to buy her.

Deepsha and Riesling have three pattern types in common among their top favorite patterns. Even though Riesling has the most shawls/wraps in her queue, she also has a lot of pullovers. Because Deepsha also has a lot of pullovers, she’s probably familiar with how much yarn goes into one of those, so she will be more likely to buy the right yarn quantity for that item.
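Finding the pattern types two knitters have in common comes down to taking each person's top categories and intersecting them. The following Python sketch shows the idea with made-up queues; the function names `top_types` and `shared_top_types` and the choice of a top-three cutoff are assumptions for illustration only.

```python
from collections import Counter

# Made-up pattern types for two hypothetical knitters.
queue_a = ["shawl"] * 4 + ["pullover"] * 3 + ["socks"] * 2 + ["hat"]
queue_b = ["pullover"] * 4 + ["cardigan"] * 3 + ["shawl"] * 2 + ["hat"]


def top_types(pattern_types, k=3):
    """Return the k most frequently queued pattern types, most common first."""
    return [name for name, _ in Counter(pattern_types).most_common(k)]


def shared_top_types(a, b, k=3):
    """Pattern types that appear in both knitters' top-k lists."""
    return sorted(set(top_types(a, k)) & set(top_types(b, k)))


top_a = top_types(queue_a)
shared = shared_top_types(queue_a, queue_b)
```

The shared types are a reasonable place to look for a gift, because the giver already has a feel for how much yarn those projects need.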

Question: What pattern should Deepsha get for Riesling?

Finally, Deepsha has decided to be extra generous and buy Riesling a gift card for a pattern to use with the new yarn. So, she must figure out which designer to buy a gift card from. For this, Deepsha needed to look at which designers’ patterns are most common in Riesling’s queue.

Deepsha has a few designers to choose from, but she also might take this opportunity to introduce Riesling to her own favorite designer, PetiteKnit! Riesling and Deepsha also have a common pattern author that they both like: Andrea Mowry. Deepsha can decide to give Riesling an Andrea Mowry gift card so they could knit a pattern of hers together!

The clear business takeaway of our portfolio project (drumroll!!!)

Overall, through this project, Deepsha now has a lot of information to take to her local yarn store to buy Riesling a wonderful gift! She will likely buy 1,400 yards of DK weight yarn for an Andrea Mowry pullover.

Conclusion

Hopefully, through this three-part article series, Riesling and Deepsha have demystified:

So, what are you waiting for!? Go learn a new skill, or apply a skill you have to data you’re interested in. We hope this article inspires you to create a fun portfolio project with the information and learning shared by Riesling and Deepsha. If you want to learn how to create your data science portfolio website easily with Posit Quarto, you can check out this talk that Deepsha did for Posit PBC (previously RStudio). It has live coding, so make sure to come with some popcorn!

If you’re reading this article because you’re interested in changing roles and want to look for ways to improve your application, check out Riesling’s other articles on career development:

References:

Follow Deepsha, Riesling, or Data Science @ Microsoft (all on Medium.com) to see the previous articles in this series.
