How I Spent My Time as a Product Data Scientist

Andrew Wong
Published in Human Science AI
Aug 6, 2019

I have been working through a few data science workflows, in areas ranging from the socio-economics of universal income to peer-to-peer lending on LendingClub.

What can I design and develop based on the new algorithm or predictive model that I have just produced? That’s always been my ultimate goal.

Let’s get into the mindset of a Product Data Scientist: someone who can design and develop a product or service out of data science results (for example, a new algorithm, a newly engineered feature, or a new predictive model).

Let me show straight away where the Product Data Scientist sits in the Venn diagram (some credit to Drew Conway for the original Data Science Venn Diagram).

So, what does it mean to become a Product Data Scientist?

Let me start with TIME. That’s right! I’d like to share how a product data scientist like myself splits time and effort, in percentage terms, across a typical project whose ultimate aim is to churn out product and service ideas.

Prior to any project, I invest a significant amount of time in understanding the domain knowledge of the industry I will be working in. In this case, I invested time to understand peer-to-peer lending (why it exists and how it differs from traditional bank lending), competitive analysis (who else is in the peer-to-peer lending marketplace), and customer analysis (why there is demand for peer-to-peer lending).

Next, I invested about 40% of my time and effort in scrubbing/cleaning the LendingClub dataset. This is one of the crucial parts of being a data scientist: getting the dataset ready for feature engineering and predictive modeling. The main parts of data scrubbing consist of handling missing values, checking for multicollinearity, and scrubbing categorical and numerical data.
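To make those scrubbing steps concrete, here is a minimal sketch using only the Python standard library. It is not the code from the project: the column names (loan_amnt, installment, int_rate, grade), the toy values, the median fill, and the 0.9 correlation threshold are all assumptions chosen for illustration, and pairwise correlation is only one simple way to look for multicollinearity.

```python
from itertools import combinations
from statistics import median

# Toy rows with made-up values; column names are assumptions for illustration.
rows = [
    {"loan_amnt": 10000.0, "installment": 320.0, "int_rate": 11.5, "grade": "B"},
    {"loan_amnt": 20000.0, "installment": 640.0, "int_rate": None, "grade": "C"},
    {"loan_amnt": 5000.0, "installment": 165.0, "int_rate": 7.9, "grade": "A"},
    {"loan_amnt": 15000.0, "installment": 480.0, "int_rate": 16.2, "grade": None},
]
NUMERIC = ["loan_amnt", "installment", "int_rate"]
CATEGORICAL = ["grade"]


def fill_missing(rows, numeric_cols, cat_cols):
    """Fill numeric gaps with the column median and categorical gaps with 'missing'."""
    medians = {
        c: median(r[c] for r in rows if r[c] is not None) for c in numeric_cols
    }
    cleaned = []
    for r in rows:
        new = dict(r)  # copy so the raw rows stay untouched
        for c in numeric_cols:
            if new[c] is None:
                new[c] = medians[c]
        for c in cat_cols:
            if new[c] is None:
                new[c] = "missing"
        cleaned.append(new)
    return cleaned


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def collinear_pairs(rows, numeric_cols, threshold=0.9):
    """List numeric column pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for a, b in combinations(numeric_cols, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if abs(r) > threshold:
            pairs.append((a, b))
    return pairs


def one_hot(rows, col):
    """Replace a categorical column with one 0/1 indicator column per level."""
    levels = sorted({r[col] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != col}
        for level in levels:
            new[f"{col}_{level}"] = int(r[col] == level)
        encoded.append(new)
    return encoded


cleaned = fill_missing(rows, NUMERIC, CATEGORICAL)
flagged = collinear_pairs(cleaned, NUMERIC)
encoded = one_hot(cleaned, "grade")
print(flagged)
```

In this toy data the installment is almost a fixed fraction of the loan amount, so that pair gets flagged and one of the two columns would be a candidate to drop before modeling.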

Then, the next most crucial task is feature engineering. According to Andrew Ng, “Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” The data science workflow (like the OSEMN workflow) is fluid and iterative, so there’s no single right or optimal answer. Essentially, feature engineering is all about creating and designing new features from the existing dataset to improve predictive model performance.
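As a rough illustration of what "creating new features from the existing dataset" can look like for a lending dataset, here is a small sketch. The column names (loan_amnt, installment, annual_inc), the sample values, and the three derived features are assumptions for the example, not the features actually engineered in the project.

```python
import math


def add_features(row):
    """Return a copy of a loan record with a few derived (hypothetical) features."""
    if row["annual_inc"] <= 0:
        raise ValueError("annual_inc must be positive to compute income ratios")
    new = dict(row)
    monthly_income = row["annual_inc"] / 12
    # Size of the loan relative to what the borrower earns in a year.
    new["loan_to_income"] = row["loan_amnt"] / row["annual_inc"]
    # Share of monthly income consumed by the monthly installment.
    new["installment_to_income"] = row["installment"] / monthly_income
    # Log transform to compress the long right tail typical of income data.
    new["log_annual_inc"] = math.log1p(row["annual_inc"])
    return new


# Made-up record for illustration.
loan = {"loan_amnt": 12000.0, "installment": 400.0, "annual_inc": 60000.0}
features = add_features(loan)
print(features["loan_to_income"], features["installment_to_income"])
```

Ratios like these encode domain knowledge (affordability) that the raw columns only carry implicitly, which is why the time spent on understanding the industry feeds directly into this step.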

The next tasks that I am about to describe will likely be new to most data scientists.

I realise that getting better at designing a product or service from predictive modeling is a far more valuable and creative part of data science. It is about being able to leverage what we have just learned. It is about being able to translate predictive modeling into potential product modeling (aka designing).

This is something that I enjoy doing: project storytelling. This is about weaving all the threads of what you have learned so far into a cohesive and memorable story. This is the selling side of data science, i.e. the ability to show and tell what you have learned and what the next steps are.

Here’s the big picture of what I have been through as a Product Data Scientist.

And, after going through the Design Sprint, the Agile way of working, and the OSEMN data science workflow, I have attempted to design a few applications for LendingClub.

Tell me what you think.
